152

I know the URL of an image on the Internet.

For example, http://www.digimouth.com/news/media/2011/09/google-logo.jpg, which contains the Google logo.

Now, how can I download this image using Python, without opening the URL in a browser and saving the file manually?

python web-scraping

— Pankaj Vatsa

1

Possible duplicate of "How do I download a file over HTTP using Python?"

— Jaydev

316

Python 2

If all you need to do is save it as a file, there is a simpler way:

import urllib

urllib.urlretrieve("http://www.digimouth.com/news/media/2011/09/google-logo.jpg", "local-filename.jpg")

The second argument is the local path where the file should be saved.

Python 3

As SergO suggested, the code below should work with Python 3.

import urllib.request

urllib.request.urlretrieve("http://www.digimouth.com/news/media/2011/09/google-logo.jpg", "local-filename.jpg")

— Liquid_Fire

55

A good way to get the filename from the link is filename = link.split('/')[-1]

— heltonbiker
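Splitting on '/' is fine for simple links, but a URL with a query string or a trailing slash gives an odd or empty name. Here is a minimal sketch using only the standard library (Python 3). The helper name filename_from_url and the fallback name "download" are assumptions made for illustration.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url, default="download"):
    """Return the last path segment of url, ignoring query and fragment."""
    path = urlsplit(url).path          # drops "?query" and "#fragment"
    name = posixpath.basename(unquote(path))
    return name or default             # fall back when the path ends in "/"


print(filename_from_url("http://www.digimouth.com/news/media/2011/09/google-logo.jpg"))
```

For the URL from the question this yields google-logo.jpg, the same result as the split, and it keeps working when the link carries parameters.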

2

With urlretrieve I just get a 1 KB file containing a dict and a 404 error text. Why? Entering the same URL in my browser behaves differently.

— Yebach

2

@Yebach: The site you are downloading from may use cookies, the User-Agent, or other headers to decide what content to serve you. These will differ between your browser and Python.

— Liquid_Fire
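If the server does filter on the User-Agent, one option is to send an explicit header with the request. This is a rough Python 3 sketch with the standard library. The header value and the helper names build_request and download are assumptions, and whether a given site accepts them is not guaranteed.

```python
import shutil
import urllib.request

# Assumed header value; any browser-like string can be substituted.
DEFAULT_USER_AGENT = "Mozilla/5.0 (compatible; example-downloader)"


def build_request(url, user_agent=DEFAULT_USER_AGENT):
    """Create a Request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def download(url, local_path, user_agent=DEFAULT_USER_AGENT):
    """Fetch url with the custom header and write the body to local_path."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=10) as response, \
            open(local_path, "wb") as out_file:
        shutil.copyfileobj(response, out_file)


# Example call (needs network access):
# download("http://www.digimouth.com/news/media/2011/09/google-logo.jpg", "local-filename.jpg")
```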

27

Python 3: import urllib.request and use urllib.request.urlretrieve() accordingly.

— SergO

1

@SergO - Could you add the Python 3 part to the original answer?

— Sreejith Menon

27

import urllib
resource = urllib.urlopen("http://www.digimouth.com/news/media/2011/09/google-logo.jpg")
output = open("file01.jpg","wb")
output.write(resource.read())
output.close()

file01.jpg will contain your image.

— Noufal Ibrahim

2

You should open the file in binary mode: open("file01.jpg", "wb"). Otherwise you may corrupt the image.

— Liquid_Fire
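To illustrate the point about binary mode: image data contains bytes such as "\r\n" that a text-mode write on Windows can translate (Python 2), and in Python 3 writing bytes to a text-mode file raises a TypeError. The sketch below (Python 3) writes a few bytes in "wb" mode and reads them back. The helper name save_bytes and the temporary path are assumptions for the example.

```python
import os
import tempfile


def save_bytes(data, path):
    """Write raw bytes to path without any newline or encoding translation."""
    with open(path, "wb") as out_file:   # "wb" = write, binary
        out_file.write(data)


# The first eight bytes of every PNG file; they include "\r\n" and "\n",
# exactly the kind of bytes that newline translation would alter.
png_signature = b"\x89PNG\r\n\x1a\n"

target = os.path.join(tempfile.mkdtemp(), "file01.png")
save_bytes(png_signature, target)

with open(target, "rb") as in_file:
    assert in_file.read() == png_signature   # bytes survive unchanged
```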

2

urllib.urlretrieve can save the image directly.

— heltonbiker

17

I wrote a script that does just this, and it is available on my GitHub for your use.

I used BeautifulSoup so that I could parse any website for images. If you will be doing a lot of web scraping (or intend to use my tool), I suggest you sudo pip install beautifulsoup4, the package that provides the bs4 module imported below. More information is available in the BeautifulSoup documentation.

For convenience, here is my code:

from bs4 import BeautifulSoup
from urllib2 import urlopen
import urllib

# use this image scraper from the location that 
#you want to save scraped images to

def make_soup(url):
    html = urlopen(url).read()
    return BeautifulSoup(html)

def get_images(url):
    soup = make_soup(url)
    #this makes a list of bs4 element tags
    images = [img for img in soup.findAll('img')]
    print (str(len(images)) + " images found.")
    print 'Downloading images to current working directory.'
    #compile our unicode list of image links
    image_links = [each.get('src') for each in images]
    for each in image_links:
        filename=each.split('/')[-1]
        urllib.urlretrieve(each, filename)
    return image_links

#a standard call looks like this
#get_images('http://www.wookmark.com')

— Yup.
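For readers who cannot install BeautifulSoup, the step of collecting the img links can also be sketched with the standard library's html.parser (Python 3). This swaps in a different parser from the one used in the answer above. The class name ImgSrcParser, the helper find_image_sources, and the sample HTML are made up for illustration.

```python
from html.parser import HTMLParser


class ImgSrcParser(HTMLParser):
    """Collect the src attribute of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lower-cased.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:                      # skip <img> tags without a src
                self.sources.append(src)


def find_image_sources(html):
    """Return the src values of all <img> tags in an HTML string."""
    parser = ImgSrcParser()
    parser.feed(html)
    parser.close()
    return parser.sources


sample = '<p><img src="/a.png"><img alt="no source"><IMG SRC="b.jpg"/></p>'
print(find_image_sources(sample))
```

The returned values are whatever the page wrote in src, so relative links still have to be resolved against the page URL before downloading.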

11

This can be done with requests. Load the page and dump the binary content to a file.

import os
import requests

url = 'https://apod.nasa.gov/apod/image/1701/potw1636aN159_HST_2048.jpg'
page = requests.get(url)

f_ext = os.path.splitext(url)[-1]
f_name = 'img{}'.format(f_ext)
with open(f_name, 'wb') as f:
    f.write(page.content)

— AlexG

1

Pass user headers to requests if you get a bad request :)

— 1UC1F3R616

8

Python 3

urllib.request - extensible library for opening URLs

from urllib.error import HTTPError
from urllib.request import urlretrieve

try:
    urlretrieve(image_url, image_local_path)
except FileNotFoundError as err:
    print(err)   # something wrong with local path
except HTTPError as err:
    print(err)  # something wrong with url

— Sergo
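Building on the error handling above, here is a sketch of a helper that reports success as a boolean instead of only printing (Python 3). The name try_download and the 10-second timeout are assumptions. It uses urlopen rather than urlretrieve, which is a substitution and not part of the answer.

```python
import shutil
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def try_download(url, local_path, timeout=10):
    """Return True if url was saved to local_path, False on a known failure."""
    try:
        with urlopen(url, timeout=timeout) as response, \
                open(local_path, "wb") as out_file:
            shutil.copyfileobj(response, out_file)
    except HTTPError as err:      # server answered with an error status, e.g. 404
        print("HTTP error:", err.code, err.reason)
    except URLError as err:       # DNS failure, refused connection, missing file: URL
        print("URL error:", err.reason)
    except OSError as err:        # bad local path, or a socket timeout while reading
        print("I/O error:", err)
    else:
        return True
    return False
```

The order of the except clauses matters: HTTPError is a subclass of URLError, which is itself a subclass of OSError, so the most specific one has to come first.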

6

A solution that works with both Python 2 and Python 3:

try:
    from urllib.request import urlretrieve  # Python 3
except ImportError:
    from urllib import urlretrieve  # Python 2

url = "http://www.digimouth.com/news/media/2011/09/google-logo.jpg"
urlretrieve(url, "local-filename.jpg")

or, if the additional requirement requests is acceptable and it is an http(s) URL:

def load_requests(source_url, sink_path):
    """
    Load a file from an URL (e.g. http).

    Parameters
    ----------
    source_url : str
        Where to load the file from.
    sink_path : str
        Where the loaded file is stored.
    """
    import requests
    r = requests.get(source_url, stream=True)
    if r.status_code == 200:
        with open(sink_path, 'wb') as f:
            for chunk in r:
                f.write(chunk)

— Martin Thoma
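The chunked write above keeps memory use flat for large files. If requests is not available, roughly the same idea can be sketched with the standard library (Python 3). The function name download_in_chunks and the 64 KiB default chunk size are assumptions.

```python
from urllib.request import urlopen


def download_in_chunks(url, sink_path, chunk_size=64 * 1024):
    """Stream url to sink_path chunk by chunk; return the bytes written."""
    written = 0
    with urlopen(url, timeout=10) as response, open(sink_path, "wb") as sink:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:            # empty bytes object means end of stream
                break
            sink.write(chunk)
            written += len(chunk)
    return written
```

The returned byte count can be compared with the Content-Length header, when the server sends one, as a cheap completeness check.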

5

I made a script that expands on Yup.'s script. I fixed some things. It now bypasses 403: Forbidden problems. It won't crash when an image fails to be retrieved. It tries to avoid corrupted previews. It gets the right absolute URLs. It gives out more information. It can be run with an argument from the command line.

# getem.py
# python2 script to download all images in a given url
# use: python getem.py http://url.where.images.are

from bs4 import BeautifulSoup
import urllib2
import shutil
import requests
from urlparse import urljoin
import sys
import time

def make_soup(url):
    req = urllib2.Request(url, headers={'User-Agent' : "Magic Browser"}) 
    html = urllib2.urlopen(req)
    return BeautifulSoup(html, 'html.parser')

def get_images(url):
    soup = make_soup(url)
    images = [img for img in soup.findAll('img')]
    print (str(len(images)) + " images found.")
    print 'Downloading images to current working directory.'
    image_links = [each.get('src') for each in images]
    for each in image_links:
        try:
            filename = each.strip().split('/')[-1].strip()
            src = urljoin(url, each)
            print 'Getting: ' + filename
            response = requests.get(src, stream=True)
            # delay to avoid corrupted previews
            time.sleep(1)
            with open(filename, 'wb') as out_file:
                shutil.copyfileobj(response.raw, out_file)
        except:
            print '  An error occurred. Continuing.'
    print 'Done.'

if __name__ == '__main__':
    url = sys.argv[1]
    get_images(url)

— madprops
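The "right absolute URLs" part comes from urljoin, which resolves each src value against the address of the page it was found on. A small Python 3 sketch of that step on its own. The page URL and the src values are hypothetical, and absolute_image_urls is a name made up for the example.

```python
from urllib.parse import urljoin


def absolute_image_urls(page_url, sources):
    """Resolve each src value against the URL of the page it came from."""
    return [urljoin(page_url, src.strip()) for src in sources]


page_url = "http://example.com/gallery/page.html"   # hypothetical page
sources = ["img/a.jpg", "/static/b.png", "//cdn.example.com/c.gif",
           "http://other.example.org/d.jpg"]
for resolved in absolute_image_urls(page_url, sources):
    print(resolved)
```

Relative paths are resolved against the page's directory, root-relative paths against the host, scheme-relative links inherit the page's scheme, and links that are already absolute pass through unchanged.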

3

Using the requests library

import requests
import shutil,os

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'
}
currentDir = os.getcwd()
path = os.path.join(currentDir,'Images')#saving images to Images folder

def ImageDl(url):
    attempts = 0
    while attempts < 5:#retry 5 times
        try:
            filename = url.split('/')[-1]
            r = requests.get(url,headers=headers,stream=True,timeout=5)
            if r.status_code == 200:
                with open(os.path.join(path,filename),'wb') as f:
                    r.raw.decode_content = True
                    shutil.copyfileobj(r.raw,f)
            print(filename)
            break
        except Exception as e:
            attempts+=1
            print(e)


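url = "http://www.digimouth.com/news/media/2011/09/google-logo.jpg"  # example URL from the question
ImageDl(url)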

— Sohan Das
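The retry loop can also be pulled out into a small reusable helper, so the download code itself stays short. This is only a sketch (Python 3). The name retry, its parameters, and the growing delay between attempts are assumptions, not part of the answer above.

```python
import time


def retry(action, attempts=5, delay=1.0, exceptions=(Exception,)):
    """Call action() until it succeeds or attempts run out.

    Returns the result of the first successful call and re-raises the
    last exception if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except exceptions as err:
            print("attempt %d failed: %s" % (attempt, err))
            if attempt == attempts:
                raise
            time.sleep(delay * attempt)   # wait a little longer each time
```

A call could look like retry(lambda: requests.get(url, headers=headers, timeout=5)), assuming requests, url and headers are defined as in the answer.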

It seems the header really matters; I was getting 403 errors. This worked.

— Ishtiyaq Husain

2

This is a very short answer.

import urllib
urllib.urlretrieve("http://photogallery.sandesh.com/Picture.aspx?AlubumId=422040", "Abc.jpg")

— OO7

2

Python 3 version

I adapted @madprops' code for Python 3

# getem.py
# python3 script to download all images in a given url
# use: python getem.py http://url.where.images.are

from bs4 import BeautifulSoup
import urllib.request
import shutil
import requests
from urllib.parse import urljoin
import sys
import time

def make_soup(url):
    req = urllib.request.Request(url, headers={'User-Agent' : "Magic Browser"}) 
    html = urllib.request.urlopen(req)
    return BeautifulSoup(html, 'html.parser')

def get_images(url):
    soup = make_soup(url)
    images = [img for img in soup.findAll('img')]
    print (str(len(images)) + " images found.")
    print('Downloading images to current working directory.')
    image_links = [each.get('src') for each in images]
    for each in image_links:
        try:
            filename = each.strip().split('/')[-1].strip()
            src = urljoin(url, each)
            print('Getting: ' + filename)
            response = requests.get(src, stream=True)
            # delay to avoid corrupted previews
            time.sleep(1)
            with open(filename, 'wb') as out_file:
                shutil.copyfileobj(response.raw, out_file)
        except:
            print('  An error occurred. Continuing.')
    print('Done.')

if __name__ == '__main__':
    get_images('http://www.wookmark.com')

— Giovanni G. PY

1

Something fresh for Python 3 using requests:

Comments are in the code. Ready-to-use function.


import requests
from os import path

def get_image(image_url):
    """
    Get image based on url.
    :return: Image name if everything OK, False otherwise
    """
    image_name = path.split(image_url)[1]
    try:
        image = requests.get(image_url)
    except OSError:  # A little too broad, but works without extra imports; catches all connection problems
        return False
    if image.status_code == 200:  # we could have retrieved error page
        base_dir = path.join(path.dirname(path.realpath(__file__)), "images") # Use your own path or "" to use current working directory. Folder must exist.
        with open(path.join(base_dir, image_name), "wb") as f:
            f.write(image.content)
        return image_name

get_image("https://apod.nasddfda.gov/apod/image/2003/S106_Mishra_1947.jpg")

— Pavel Pančocha
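A 200 status does not guarantee that the body is an image, since some servers return an HTML error page with status 200. One cheap extra check is to look at the first bytes of the downloaded content. This is a sketch in Python 3. The function name sniff_image_type is an assumption, and only three common formats are covered.

```python
# Leading bytes ("magic numbers") of a few common image formats.
IMAGE_SIGNATURES = {
    "jpeg": (b"\xff\xd8\xff",),
    "png": (b"\x89PNG\r\n\x1a\n",),
    "gif": (b"GIF87a", b"GIF89a"),
}


def sniff_image_type(data):
    """Return 'jpeg', 'png' or 'gif' if data starts like one, else None."""
    for kind, prefixes in IMAGE_SIGNATURES.items():
        if data.startswith(prefixes):   # startswith accepts a tuple of prefixes
            return kind
    return None


print(sniff_image_type(b"<html><body>404 Not Found</body></html>"))  # None
```

In the function above, a check such as sniff_image_type(image.content) is not None could be placed before the file is written.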

0

Late answer, but for python>=3.6 you can use dload, i.e.:

import dload
dload.save("http://www.digimouth.com/news/media/2011/09/google-logo.jpg")

If you need the image as bytes, use:

img_bytes = dload.bytes("http://www.digimouth.com/news/media/2011/09/google-logo.jpg")

Install it using pip3 install dload

— CONvid19

-2

import requests

img_data = requests.get('https://apod.nasa.gov/apod/image/1701/potw1636aN159_HST_2048.jpg')

with open('file_name.jpg', 'wb') as handler:
    # write the raw bytes; passing the Response object itself raises TypeError
    handler.write(img_data.content)

— Lewis Mann

4

Welcome to Stack Overflow! While you may have solved this user's problem, code-only answers are not very helpful to users who come to this question in the future. Please edit your answer to explain why your code solves the original problem.

— Joe C

1

TypeError: a bytes-like object is required, not 'Response'. It should be handler.write(img_data.content)

— TitanFighter

It should be handler.write(img_data.read()).

— jdhao

How can I save an image locally using Python when I already know its URL?
