Python - A Guide to the Third-Party HTTP Library Requests, Part 3 (File Downloads)

Author: hangge | 2022-06-29 10:10

IV. File Downloads

1. Basic download

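(1) For non-text responses, the content attribute of the Response object gives access to the response body as bytes.

Note: this approach is only suitable for small files. The entire response body is held in memory and reaches the disk only when write is called, so a large file consumes a correspondingly large amount of memory.

(2) The following code downloads an image from the web and saves it locally under its original file name: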

import requests

url = 'http://www.hangge.com/blog/images/logo.png'
r = requests.get(url)
with open("logo.png", "wb") as code:
    code.write(r.content)

(3) After the code runs, the image has been downloaded to the current directory.
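
The example above hard-codes "logo.png" to keep the original file name. As a minimal sketch, the name could instead be derived from the URL. The helper `filename_from_url` and its `default` argument are hypothetical names introduced here for illustration; they are not part of Requests.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url, default="download.bin"):
    """Return the last path segment of url, or default if there is none.

    Hypothetical helper for illustration; not part of Requests.
    """
    path = urlsplit(url).path  # the path only, without query string or fragment
    name = posixpath.basename(unquote(path))  # decode %xx escapes, keep last segment
    return name or default


print(filename_from_url("http://www.hangge.com/blog/images/logo.png"))
```

The result can then be passed to open() in place of the literal "logo.png". This sketch looks only at the URL; it does not consult any file name the server may suggest in its response headers.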

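2. Streaming download

The code below switches to a streaming download, which saves the data as it arrives instead of buffering the whole body first. This approach is suitable for large files.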

import requests

url = 'http://www.hangge.com/blog/images/logo.png'
r = requests.get(url, stream=True)
with open("logo.png", "wb") as f:
    for bl in r.iter_content(chunk_size=1024):
        if bl:
            f.write(bl)
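
The write loop above can be separated from the network call so that it is easy to reuse and to try out offline. The following is a minimal sketch under that assumption; `save_chunks` is a hypothetical helper name, and the commented usage at the bottom assumes the same URL as above plus a larger chunk size chosen only for illustration.

```python
import os
import tempfile


def save_chunks(chunks, file_path):
    """Write an iterable of byte chunks to file_path and return the bytes written.

    Hypothetical helper for illustration; not part of Requests.
    """
    written = 0
    with open(file_path, "wb") as f:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                f.write(chunk)
                written += len(chunk)
    return written


# Offline demo with in-memory chunks instead of a real response
demo_path = os.path.join(tempfile.gettempdir(), "save_chunks_demo.bin")
print(save_chunks([b"abc", b"", b"de"], demo_path))

# With Requests, the chunks would come from iter_content, for example:
#   with requests.get(url, stream=True) as r:
#       r.raise_for_status()  # stop early on 4xx/5xx instead of saving an error page
#       save_chunks(r.iter_content(chunk_size=8192), "logo.png")
```

Only one chunk is held in memory at a time, which is what makes the streaming approach workable for large files.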

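3. Download with progress display

(1) When a file is large, it is best to show the current download progress in real time. For convenience, I have wrapped this in a download function here (it also uses a streaming download internally).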

import requests
from contextlib import closing


# File downloader
def down_load(file_url, file_path):
    with closing(requests.get(file_url, stream=True)) as response:
        chunk_size = 1024  # maximum number of bytes read per chunk
        # Total body size; raises KeyError if the server sends no Content-Length
        content_size = int(response.headers['content-length'])
        data_count = 0  # bytes written so far
        with open(file_path, "wb") as file:
            for data in response.iter_content(chunk_size=chunk_size):
                file.write(data)
                data_count = data_count + len(data)
                now_jd = (data_count / content_size) * 100  # progress in percent
                # "\r" returns to the start of the line so the message updates in place
                print("\r Download progress: %d%%(%d/%d) - %s"
                      % (now_jd, data_count, content_size, file_path), end=" ")


if __name__ == '__main__':
    fileUrl = 'http://www.hangge.com/hangge.zip'  # file URL
    filePath = "hangge.zip"  # local file path
    down_load(fileUrl, filePath)

(2) When the script runs, the current progress is displayed in real time while the file downloads.
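
The function above reads `response.headers['content-length']` directly, which fails when a server does not send that header (for example with chunked transfer encoding). As a hedged sketch of one way to cope, the two helpers below, `parse_content_length` and `format_progress`, are hypothetical names introduced here; they fall back to a plain byte count when the total is unknown.

```python
def parse_content_length(headers):
    """Return Content-Length as an int, or None if it is missing or malformed.

    Hypothetical helper; headers is any mapping, such as response.headers.
    """
    value = headers.get("content-length")
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def format_progress(done, total=None):
    """Build a one-line progress message; total may be None when unknown."""
    if total:  # a positive total allows a percentage
        # Cap at 100: with a compressed response, the decoded bytes counted
        # by the caller can exceed the Content-Length of the encoded body.
        percent = min(done * 100 // total, 100)
        return "%d%% (%d/%d bytes)" % (percent, done, total)
    return "%d bytes (total size unknown)" % done


print(format_progress(512, parse_content_length({"content-length": "2048"})))
print(format_progress(512, parse_content_length({})))
```

Inside the download loop, `format_progress(data_count, content_size)` could replace the percentage arithmetic, with `content_size` obtained from `parse_content_length(response.headers)`.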

4. Download with speed display

Here the function above is improved to also calculate and display the download speed in real time:

import requests
import time
from contextlib import closing


# File downloader
def down_load(file_url, file_path):
    start_time = time.time()  # time at which the download started
    with closing(requests.get(file_url, stream=True)) as response:
        chunk_size = 1024  # maximum number of bytes read per chunk
        # Total body size; raises KeyError if the server sends no Content-Length
        content_size = int(response.headers['content-length'])
        data_count = 0  # bytes written so far
        with open(file_path, "wb") as file:
            for data in response.iter_content(chunk_size=chunk_size):
                file.write(data)
                data_count = data_count + len(data)
                now_jd = (data_count / content_size) * 100  # progress in percent
                # Average speed in KB/s since start_time (includes connection setup)
                speed = data_count / 1024 / (time.time() - start_time)
                print("\r Download progress: %d%%(%d/%d) Download speed: %dKB/s - %s"
                      % (now_jd, data_count, content_size, speed, file_path), end=" ")


if __name__ == '__main__':
    fileUrl = 'http://www.hangge.com/hangge.zip'  # file URL
    filePath = "hangge.zip"  # local file path
    down_load(fileUrl, filePath)
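
The speed shown above is the average since the function started, computed from time.time(). As a minimal sketch of the same arithmetic in isolation, the helpers below use time.monotonic(), which is unaffected by system clock adjustments, and guard against a zero elapsed time. `average_speed_kb` and `format_speed` are hypothetical names, and the switch to MB/s at 1024 KB/s is an assumption made for readability.

```python
import time


def average_speed_kb(bytes_done, start, now=None):
    """Average speed in KB/s since start (a time.monotonic() reading)."""
    if now is None:
        now = time.monotonic()
    elapsed = now - start
    if elapsed <= 0:  # avoid dividing by zero on the very first chunk
        return 0.0
    return bytes_done / 1024 / elapsed


def format_speed(kb_per_s):
    """Render a speed in KB/s, switching to MB/s for larger values."""
    if kb_per_s >= 1024:
        return "%.1f MB/s" % (kb_per_s / 1024)
    return "%.1f KB/s" % kb_per_s


# 2048 bytes transferred over 2 seconds
print(format_speed(average_speed_kb(2048, start=10.0, now=12.0)))
```

In the download function, `start` would be recorded with time.monotonic() before the request, and `average_speed_kb(data_count, start)` would be called once per chunk.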
