Reading Large Files in Python

2014.11.12

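1. Introduction

A few days ago I was working on a log analysis system and had to process files of tens of gigabytes. I first tried my usual for line in open(filepath).readlines(), but the program sat there for a long time with no visible progress. readlines() reads the entire file into a list before the loop even starts, so it uses a huge amount of memory. After some searching online, I found two ways to read large files.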
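To make the memory cost concrete, here is a rough sketch that compares readlines() with reading the same file one line at a time, using the standard tracemalloc module to record peak allocations. It assumes Python 3. The sample file and the helper names (peak_memory, count_with_readlines, count_with_iteration, make_sample_log) are made up for illustration, and the exact numbers will vary from machine to machine.

```python
import os
import tempfile
import tracemalloc


def peak_memory(read_func, path):
    """Return the peak number of bytes allocated while read_func(path) runs."""
    tracemalloc.start()
    try:
        read_func(path)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def count_with_readlines(path):
    # readlines() builds a list holding every line of the file at once.
    count = 0
    with open(path) as f:
        for line in f.readlines():
            count += 1
    return count


def count_with_iteration(path):
    # Iterating over the file object yields one line at a time.
    count = 0
    with open(path) as f:
        for line in f:
            count += 1
    return count


def make_sample_log(num_lines=50000):
    """Write a throwaway sample file (hypothetical content) and return its path."""
    fd, path = tempfile.mkstemp(suffix=".log")
    with os.fdopen(fd, "w") as f:
        for i in range(num_lines):
            f.write("2014-11-12 00:00:00 INFO request number %d\n" % i)
    return path


if __name__ == "__main__":
    path = make_sample_log()
    try:
        print("readlines peak bytes:", peak_memory(count_with_readlines, path))
        print("iteration peak bytes:", peak_memory(count_with_iteration, path))
    finally:
        os.remove(path)
```

On a small sample file the gap is only a few megabytes, but the readlines() peak grows with the size of the file, which is why a file of tens of gigabytes appears to hang.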

2. Reading large files with with

Reading a file inside a with block is a very Pythonic approach. For example:

with open(filepath) as f:
    for line in f:
        # do something with line
        pass

I found this approach on Stack Overflow, where the answerer explains reading with with like this:

The with statement handles opening and closing the file, including if an exception is raised in the inner block. The for line in f treats the file object f as an iterable, which automatically uses buffered IO and memory management so you don't have to worry about large files.

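In other words, with takes care of opening and closing the file, and it still closes the file if an exception is raised inside the block. Meanwhile, for line in f treats the file object f as an iterable that hands back one line at a time through buffered I/O, so memory use stays small even for very large files.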
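Both halves of that explanation can be checked with a small sketch. It assumes Python 3. The function names count_errors and file_closed_after_exception, the "ERROR" marker, and the sample lines are all invented for illustration and are not taken from the Stack Overflow answer.

```python
import os
import tempfile


def count_errors(path):
    """Count lines containing 'ERROR', holding only one line in memory at a time."""
    errors = 0
    with open(path) as f:
        for line in f:
            if "ERROR" in line:
                errors += 1
    return errors


def file_closed_after_exception(path):
    """Raise inside the with block and report whether the file was still closed."""
    try:
        with open(path) as f:
            for line in f:
                raise ValueError("simulated failure while processing a line")
    except ValueError:
        pass
    # with has already run f.close(), even though the block exited via an exception.
    return f.closed


if __name__ == "__main__":
    # Build a tiny throwaway log file (hypothetical content) to run against.
    fd, path = tempfile.mkstemp(suffix=".log")
    with os.fdopen(fd, "w") as out:
        out.write("INFO started\nERROR disk full\nINFO retrying\n")
    try:
        print("error lines:", count_errors(path))
        print("closed after exception:", file_closed_after_exception(path))
    finally:
        os.remove(path)
```

The same loop works unchanged on a file of any size, since nothing in it depends on how many lines there are.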

3. Using fileinput

This approach uses Python's fileinput module. In my own test it also ran without any stalls. Sample code:

import fileinput
for line in fileinput.input(['sum.log']):
    print(line, end='')  # Python 3 syntax; line already ends with a newline
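fileinput is most useful when several files should be read as one continuous stream, which is common with rotated logs. The sketch below assumes Python 3.2 or later, where fileinput.input() can be used as a context manager. The function name count_lines_per_file and the sample files are hypothetical.

```python
import fileinput
import os
import tempfile


def count_lines_per_file(paths):
    """Read several files as one stream and count the lines in each."""
    counts = {}
    with fileinput.input(files=paths) as stream:
        for line in stream:
            # filename() reports which file the current line came from.
            name = stream.filename()
            counts[name] = counts.get(name, 0) + 1
    return counts


if __name__ == "__main__":
    # Create two small throwaway files (hypothetical content) to read back.
    paths = []
    for text in ("a\nb\nc\n", "d\ne\n"):
        fd, path = tempfile.mkstemp(suffix=".log")
        with os.fdopen(fd, "w") as out:
            out.write(text)
        paths.append(path)
    try:
        for name, count in count_lines_per_file(paths).items():
            print(name, count)
    finally:
        for path in paths:
            os.remove(path)
```

Like the with version, this reads line by line, so memory use does not depend on how large the files are. Note that an empty file never yields a line, so it would not appear in the result.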

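4. Summary

Both methods worked in my own tests. The first is clearly more Pythonic: it needs no import, and it also closes the file for you even when an exception occurs, so it is the one I recommend.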

python bigfile

Comments

aa

2018-02-04 16:21:28

I can read the file, but processing it raises an error:

TypeError: string indices must be integers, not str
