Find all files in a directory with the extension .txt in Python

1043

How can I find all the files in a directory that have the extension .txt in Python?

python file-io

— usertest

2354

You can use glob:

import glob, os
os.chdir("/mydir")
for file in glob.glob("*.txt"):
    print(file)

or simply os.listdir:

import os
for file in os.listdir("/mydir"):
    if file.endswith(".txt"):
        print(os.path.join("/mydir", file))

or, if you want to traverse the directory tree, use os.walk:

import os
for root, dirs, files in os.walk("/mydir"):
    for file in files:
        if file.endswith(".txt"):
            print(os.path.join(root, file))

— ghostdog74

11

Using solution #2, how would you create a file or a list with that information?

— Merlin
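One possible way to do what this comment asks, sketched under assumptions: the helper names list_txt_files and save_listing are hypothetical and not part of the answer above, and the output file is any path you choose. The first function collects the matches from solution #2 into a list, the second writes that list to a file, one path per line.

```python
import os


def list_txt_files(directory, extension=".txt"):
    """Return full paths of files in `directory` whose names end with `extension`."""
    return [
        os.path.join(directory, name)
        for name in sorted(os.listdir(directory))
        if name.endswith(extension)
    ]


def save_listing(paths, output_file):
    """Write one path per line to `output_file` (a destination chosen by the caller)."""
    with open(output_file, "w", encoding="utf-8") as handle:
        for path in paths:
            handle.write(path + "\n")
```

For example, save_listing(list_txt_files("/mydir"), "txt_files.log") would store the listing; sorting the names is only there to make the order predictable.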

72

@ghostdog74: In my opinion it would be more appropriate to write for file in f than for files in f, since what is in the variable is a single filename. Even better would be to change the f to files, and then the for loops could become for file in files.

— martineau

45

@computermacgyver: No, file is not a reserved word, just the name of a predefined function, so it is quite possible to use it as a variable name in your own code. Although it is true that one should generally avoid collisions like that, file is a special case because there is hardly ever a need to use it, so it is often considered an exception to the guideline. If you don't want to do that, PEP8 recommends appending a single underscore to such names, i.e. file_, which you'd have to agree is still quite readable.

— martineau

9

Thanks martineau, you're absolutely right. I jumped to conclusions too quickly.

— computermacgyver

40

A more Pythonic way for #2 would be for file in [f for f in os.listdir('/mydir') if f.endswith('.txt')]:

— ozgur

247

Use glob.

>>> import glob
>>> glob.glob('./*.txt')
['./outline.txt', './pip-log.txt', './test.txt', './testingvim.txt']

— Muhammed Alkarouri

This is not only easy, it is also case-insensitive. (At least it is on Windows, as it should be. I'm not sure about other operating systems.)

— Jon Coombs

35

Beware that glob cannot find files recursively if your Python is older than 3.5.

— qun
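A minimal sketch of how the version difference in this comment could be handled, assuming a hypothetical helper named find_recursive. On Python 3.5+ it uses glob with recursive=True; on older versions it falls back to os.walk with fnmatch. The two branches are not perfectly equivalent: for instance, glob skips hidden directories when expanding "**", while os.walk does not.

```python
import fnmatch
import glob
import os
import sys


def find_recursive(top, pattern="*.txt"):
    """Return paths below `top` whose last component matches `pattern`."""
    if sys.version_info >= (3, 5):
        # "**" matches zero or more directory levels when recursive=True.
        return glob.glob(os.path.join(top, "**", pattern), recursive=True)
    # Fallback for older interpreters: walk the tree and filter by name.
    matches = []
    for root, _dirs, files in os.walk(top):
        for name in fnmatch.filter(files, pattern):
            matches.append(os.path.join(root, name))
    return matches
```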

The best part is that you can use a wildcard pattern such as test*.txt (a shell-style glob pattern, not a full regular expression).

— Alex Punnen

@JonCoombs No. At least not on Linux.

— Karuhanga

157

Something like this should do the job

import os

directory = "."  # placeholder: the directory to search
for root, dirs, files in os.walk(directory):
    for file in files:
        if file.endswith('.txt'):
            print(file)

— Adam Byrtek

73

+1 for naming your variables root, dirs, files instead of r, d, f. Much more readable.

— Clément

27

Note that this is case-sensitive (it won't match .TXT or .Txt), so you'll probably want to use if file.lower().endswith('.txt'):.

— Jon Coombs
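The suggestion in this comment could look like the following sketch. The generator name walk_txt_case_insensitive is hypothetical; it lowercases both the file name and the requested extension before comparing, so .txt, .TXT and .Txt all match.

```python
import os


def walk_txt_case_insensitive(top, extension=".txt"):
    """Yield full paths below `top` whose extension matches, ignoring case."""
    extension = extension.lower()
    for root, _dirs, files in os.walk(top):
        for name in files:
            if name.lower().endswith(extension):
                yield os.path.join(root, name)
```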

1

Your answer deals with subdirectories.

— Sam Liao

117

Something like this will work:

>>> import os
>>> path = '/usr/share/cups/charmaps'
>>> text_files = [f for f in os.listdir(path) if f.endswith('.txt')]
>>> text_files
['euc-cn.txt', 'euc-jp.txt', 'euc-kr.txt', 'euc-tw.txt', ... 'windows-950.txt']

— Seth

How do I save the path of the text_files? ['path/euc-cn.txt', ... 'path/windows-950.txt']

— IceQueeny

5

You could use os.path.join on each element of text_files. It could be something like text_files = [os.path.join(path, f) for f in os.listdir(path) if f.endswith('.txt')].

— Seth

55

You can simply use pathlib's glob¹:

import pathlib

list(pathlib.Path('your_directory').glob('*.txt'))

or in a loop:

for txt_file in pathlib.Path('your_directory').glob('*.txt'):
    # do something with "txt_file"
    print(txt_file)

If you want it recursive, you can use .glob('**/*.txt')

¹ The pathlib module was added to the standard library in Python 3.4. You can also install back-ports of the module on older Python versions (e.g. using conda or pip): pathlib and pathlib2.

— MSeifert

**/*.txt is not supported by older Python versions. So I solved it with: foundfiles= subprocess.check_output("ls **/*.txt", shell=True) for foundfile in foundfiles.splitlines(): print foundfile

— Roman

1

@Roman Yes, it was just showcasing what pathlib can do, and I had already included the Python version requirements. :) But if your approach hasn't been posted yet, why not add it as another answer?

— MSeifert

1

Yes, posting an answer would definitely have given me better formatting options. I posted it here because I think this is a more appropriate place for it.

— Roman

5

Note that you can also use rglob if you want to look for items recursively, e.g. .rglob('*.txt')

— Bram Vanroy
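As a small sketch of the rglob suggestion (the function name txt_files_recursive is made up for illustration), Path.rglob('*.txt') behaves like glob('**/*.txt'). The extra is_file() check is an assumption about what is wanted: it drops directories whose names happen to end in .txt.

```python
from pathlib import Path


def txt_files_recursive(directory):
    """Return sorted Path objects for every *.txt file below `directory`."""
    return sorted(p for p in Path(directory).rglob("*.txt") if p.is_file())
```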

40

import os

path = 'mypath/path' 
files = os.listdir(path)

files_txt = [i for i in files if i.endswith('.txt')]

— user3281344

29

I like os.walk():

import os

top = '/mydir'  # placeholder: the directory to search
for root, dirs, files in os.walk(top):
    for f in files:
        if os.path.splitext(f)[1] == '.txt':
            fullpath = os.path.join(root, f)
            print(fullpath)

Or with generators:

import os

top = '/mydir'  # placeholder: the directory to search
fileiter = (os.path.join(root, f)
    for root, _, files in os.walk(top)
    for f in files)
txtfileiter = (f for f in fileiter if os.path.splitext(f)[1] == '.txt')
for txt in txtfileiter:
    print(txt)

— hughdbrown

28

Here are more versions of the same that produce slightly different results:

glob.iglob()

import glob
for f in glob.iglob("/mydir/*/*.txt"): # generator, search immediate subdirectories 
    print(f)

glob.glob1()

print(glob.glob1("/mydir", "*.tx?"))  # literal_directory, basename_pattern

fnmatch.filter()

import fnmatch, os
print(fnmatch.filter(os.listdir("/mydir"), "*.tx?")) # include dot-files

— jfs

3

For the curious, glob1() is a helper function in the glob module that is not listed in the Python documentation. There are some inline comments in the source file describing what it does, see .../Lib/glob.py.

— martineau

1

@martineau: glob.glob1() is not public, but it is available on Python 2.4-2.7; 3.0-3.2; pypy; jython github.com/zed/test_glob1

— jfs

1

Thanks, that is good additional information to have when deciding whether to use an undocumented private function in a module. ;-) Here's a little more. The Python 2.7 version is only 12 lines long and looks like it could easily be extracted from the glob module.

— martineau
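For readers who would rather not depend on an undocumented helper, here is a rough approximation built only from documented functions. It is not the actual glob1 source; the name glob_in_directory is hypothetical, and the dot-file handling is an assumption about the behavior described in this thread (hidden names are skipped unless the pattern itself starts with a dot).

```python
import fnmatch
import os


def glob_in_directory(directory, pattern):
    """Match `pattern` against the base names found directly in `directory`."""
    names = os.listdir(directory)
    if not pattern.startswith("."):
        # Mimic shell globbing: hide dot-files unless explicitly requested.
        names = [name for name in names if not name.startswith(".")]
    return fnmatch.filter(names, pattern)
```

Unlike glob.glob, this returns bare file names, so join them with the directory if full paths are needed.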

21

path.py is another alternative: https://github.com/jaraco/path.py

from path import path
p = path('/path/to/the/directory')
for f in p.files(pattern='*.txt'):
    print(f)

— Anuvrat Parashar

Cool, it also accepts a regular expression in the pattern. I'm using for f in p.walk(pattern='*.txt') to go through all subfolders.

— Kostanos

1

Or there's pathlib. You can do something like: list(p.glob('**/*.py'))

— user2233949

15

Python v3.5+

A fast method using os.scandir in a recursive function. It searches for all files with the specified extension in the folder and its sub-folders.

import os

def findFilesInFolder(path, pathList, extension, subFolders = True):
    """  Recursive function to find all files of an extension type in a folder (and optionally in all subfolders too)

    path:        Base directory to find files
    pathList:    A list that stores all paths
    extension:   File extension to find
    subFolders:  Bool.  If True, find files in all subfolders under path. If False, only searches files in the specified folder
    """

    try:   # Trapping a OSError:  File permissions problem I believe
        for entry in os.scandir(path):
            if entry.is_file() and entry.path.endswith(extension):
                pathList.append(entry.path)
            elif entry.is_dir() and subFolders:   # if its a directory, then repeat process as a nested function
                pathList = findFilesInFolder(entry.path, pathList, extension, subFolders)
    except OSError:
        print('Cannot access ' + path +'. Probably a permissions error')

    return pathList

dir_name = r'J:\myDirectory'
extension = ".txt"

pathList = []
pathList = findFilesInFolder(dir_name, pathList, extension, True)

Update April 2019

If you are searching directories that contain tens of thousands of files, appending to a list becomes inefficient. 'Yielding' the results is a better solution. I have also included a function to convert the output to a Pandas DataFrame.

import os
import re
import pandas as pd
import numpy as np


def findFilesInFolderYield(path,  extension, containsTxt='', subFolders = True, excludeText = ''):
    """  Recursive function to find all files of an extension type in a folder (and optionally in all subfolders too)

    path:               Base directory to find files
    extension:          File extension to find.  e.g. 'txt'.  Regular expression. Or  'ls\d' to match ls1, ls2, ls3 etc
    containsTxt:        List of Strings, only finds file if it contains this text.  Ignore if '' (or blank)
    subFolders:         Bool.  If True, find files in all subfolders under path. If False, only searches files in the specified folder
    excludeText:        Text string.  Ignore if ''. Will exclude if text string is in path.
    """
    if type(containsTxt) == str: # if a string and not in a list
        containsTxt = [containsTxt]

    myregexobj = re.compile(r'\.' + extension + '$')    # Makes sure the file extension is at the end and is preceded by a .

    try:   # Trapping a OSError or FileNotFoundError:  File permissions problem I believe
        for entry in os.scandir(path):
            if entry.is_file() and myregexobj.search(entry.path): # 

                bools = [True for txt in containsTxt if txt in entry.path and (excludeText == '' or excludeText not in entry.path)]

                if len(bools)== len(containsTxt):
                    yield entry.stat().st_size, entry.stat().st_atime_ns, entry.stat().st_mtime_ns, entry.stat().st_ctime_ns, entry.path

            elif entry.is_dir() and subFolders:   # if its a directory, then repeat process as a nested function
                yield from findFilesInFolderYield(entry.path,  extension, containsTxt, subFolders, excludeText)
    except FileNotFoundError as fnf:   # must come first: it is a subclass of OSError
        print(path +' not found ', fnf)
    except OSError as ose:
        print('Cannot access ' + path +'. Probably a permissions error ', ose)

def findFilesInFolderYieldandGetDf(path,  extension, containsTxt, subFolders = True, excludeText = ''):
    """  Converts returned data from findFilesInFolderYield and creates and Pandas Dataframe.
    Recursive function to find all files of an extension type in a folder (and optionally in all subfolders too)

    path:               Base directory to find files
    extension:          File extension to find.  e.g. 'txt'.  Regular expression. Or  'ls\d' to match ls1, ls2, ls3 etc
    containsTxt:        List of Strings, only finds file if it contains this text.  Ignore if '' (or blank)
    subFolders:         Bool.  If True, find files in all subfolders under path. If False, only searches files in the specified folder
    excludeText:        Text string.  Ignore if ''. Will exclude if text string is in path.
    """

    fileSizes, accessTimes, modificationTimes, creationTimes , paths  = zip(*findFilesInFolderYield(path,  extension, containsTxt, subFolders, excludeText))
    df = pd.DataFrame({
            'FLS_File_Size':fileSizes,
            'FLS_File_Access_Date':accessTimes,
            'FLS_File_Modification_Date':np.array(modificationTimes).astype('timedelta64[ns]'),
            'FLS_File_Creation_Date':creationTimes,
            'FLS_File_PathName':paths,
                  })

    df['FLS_File_Modification_Date'] = pd.to_datetime(df['FLS_File_Modification_Date'],infer_datetime_format=True)
    df['FLS_File_Creation_Date'] = pd.to_datetime(df['FLS_File_Creation_Date'],infer_datetime_format=True)
    df['FLS_File_Access_Date'] = pd.to_datetime(df['FLS_File_Access_Date'],infer_datetime_format=True)

    return df

ext =   'txt'  # regular expression 
containsTxt=[]
path = r'C:\myFolder'
df = findFilesInFolderYieldandGetDf(path,  ext, containsTxt, subFolders = True)

— DougR

14

Python has all the tools to do this:

import os

the_dir = 'the_dir_that_want_to_search_in'
all_txt_files = filter(lambda x: x.endswith('.txt'), os.listdir(the_dir))

— xxxo

1

If you want all_txt_files to be a list: all_txt_files = list(filter(lambda x: x.endswith('.txt'), os.listdir(the_dir)))

— Ena

12

To get all '.txt' file names inside the 'dataPath' folder as a list in a Pythonic way:

from os import listdir
from os.path import isfile, join
path = "/dataPath/"
onlyTxtFiles = [f for f in listdir(path) if isfile(join(path, f)) and  f.endswith(".txt")]
print(onlyTxtFiles)

— ewalel

12

Try this; it will find all your files recursively:

import glob, os
os.chdir("H:\\wallpaper")  # use whatever directory you want

# use a double backslash (\\), not a single one (\)

for file in glob.glob("**/*.txt", recursive = True):
    print(file)

— mayank

Not with the recursive version (double star: **). That is only available in Python 3. The thing I don't like is the chdir part. There is no need for that.

— Jean-François Fabre
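A sketch of how the chdir call might be avoided, assuming a hypothetical helper find_txt_relative: the directory is joined into the pattern instead, and os.path.relpath turns the matches back into the relative paths that the chdir version would have printed. glob.escape is used so that special characters in the directory name are not treated as wildcards.

```python
import glob
import os


def find_txt_relative(base_dir):
    """Return *.txt paths below `base_dir`, relative to it, without os.chdir."""
    pattern = os.path.join(glob.escape(base_dir), "**", "*.txt")
    return sorted(
        os.path.relpath(path, base_dir)
        for path in glob.glob(pattern, recursive=True)
    )
```

Because the working directory is never changed, the rest of the program is unaffected by the search.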

2

Well, you can use the os library to build the path, e.g. filepath = os.path.join('wallpaper'), and then use it as glob.glob(os.path.join(filepath, "**/*.psd"), recursive = True), which would yield the same result.

— Mitalee Rao

8

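import os
import sys

# Expect exactly two parameters: a directory and an extension mask.
if len(sys.argv) != 3:
    print('usage: %s <directory> <mask>' % sys.argv[0])
    sys.exit(1)

directory = sys.argv[1]
mask = sys.argv[2]

files = os.listdir(directory)

# A list comprehension gives a real list on Python 3 (filter() would be lazy).
res = [name for name in files if name.endswith(mask)]

print(res)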

— mrgloom

8

I ran a test (Python 3.6.4, W7x64) to see which solution is fastest for a single folder, without subdirectories, when building a list of complete file paths for files with a specific extension.

To make it short, os.listdir() is the fastest for this task: 1.7x as fast as the next best, os.walk() (with a break!), 2.7x as fast as pathlib, 3.2x as fast as os.scandir() and 3.3x as fast as glob.
Please keep in mind that these results will change when you need recursive results. If you copy/paste one of the methods below, please add a .lower(); otherwise .EXT will not be found when searching for .ext.

import os
import pathlib
import timeit
import glob

def a():
    path = pathlib.Path().cwd()
    list_sqlite_files = [str(f) for f in path.glob("*.sqlite")]

def b(): 
    path = os.getcwd()
    list_sqlite_files = [f.path for f in os.scandir(path) if os.path.splitext(f)[1] == ".sqlite"]

def c():
    path = os.getcwd()
    list_sqlite_files = [os.path.join(path, f) for f in os.listdir(path) if f.endswith(".sqlite")]

def d():
    path = os.getcwd()
    os.chdir(path)
    list_sqlite_files = [os.path.join(path, f) for f in glob.glob("*.sqlite")]

def e():
    path = os.getcwd()
    list_sqlite_files = [os.path.join(path, f) for f in glob.glob1(str(path), "*.sqlite")]

def f():
    path = os.getcwd()
    list_sqlite_files = []
    for root, dirs, files in os.walk(path):
        for file in files:
            if file.endswith(".sqlite"):
                list_sqlite_files.append( os.path.join(root, file) )
        break



print(timeit.timeit(a, number=1000))
print(timeit.timeit(b, number=1000))
print(timeit.timeit(c, number=1000))
print(timeit.timeit(d, number=1000))
print(timeit.timeit(e, number=1000))
print(timeit.timeit(f, number=1000))

Results:

# Python 3.6.4
0.431
0.515
0.161
0.548
0.537
0.274

— user136036

The Python 3.6.5 documentation states: The os.scandir() function returns directory entries along with file attribute information, giving better performance [than os.listdir()] for many common use cases.

— Bill Oldroyd
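The kind of use case the documentation has in mind can be sketched as follows. The helper name scandir_txt is hypothetical; the point is that each DirEntry already carries its name and path, and is_file() can often be answered without a separate stat call, so checking the file type costs little extra. Whether this ends up faster than os.listdir() for plain name filtering depends on the platform and the directory, as the timings above suggest.

```python
import os


def scandir_txt(directory, extension=".txt"):
    """Return sorted full paths of regular files in `directory` ending in `extension`."""
    with os.scandir(directory) as entries:
        return sorted(
            entry.path
            for entry in entries
            if entry.is_file() and entry.name.endswith(extension)
        )
```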

I'm missing the scale of this test. How many files did you use in it? How do the methods compare if you scale the number up or down?

— N4ppeL
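One way to explore the scaling question yourself is sketched below. The function time_listing is an assumption for illustration and not part of the benchmark above: it fills a temporary directory with a chosen number of empty files, half of them .sqlite, and times two of the strategies with timeit. No results are claimed here, since the numbers depend on the machine and file system.

```python
import os
import tempfile
import timeit


def time_listing(file_count, repeats=20):
    """Time listdir- and scandir-based filtering on `file_count` temporary files."""
    with tempfile.TemporaryDirectory() as tmp:
        for i in range(file_count):
            ext = ".sqlite" if i % 2 == 0 else ".dat"
            open(os.path.join(tmp, "f{}{}".format(i, ext)), "w").close()

        def with_listdir():
            return [os.path.join(tmp, f) for f in os.listdir(tmp)
                    if f.endswith(".sqlite")]

        def with_scandir():
            with os.scandir(tmp) as entries:
                return [e.path for e in entries if e.name.endswith(".sqlite")]

        # Both strategies must agree before their timings are compared.
        assert sorted(with_listdir()) == sorted(with_scandir())
        return {
            "listdir": timeit.timeit(with_listdir, number=repeats),
            "scandir": timeit.timeit(with_scandir, number=repeats),
        }
```

Calling it in a loop, for example for n in (10, 1000, 100000): print(n, time_listing(n)), would show how the gap changes with directory size.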

5

This code makes my life simpler.

import os

top = '.'  # placeholder: the directory to search
fnames = [file for root, dirs, files in os.walk(top)
    for file in files
    if file.endswith('.txt') #or file.endswith('.png') or file.endswith('.pdf')
    ]
for fname in fnames: print(fname)

— praba230890

5

Use fnmatch: https://docs.python.org/2/library/fnmatch.html

import fnmatch
import os

for file in os.listdir('.'):
    if fnmatch.fnmatch(file, '*.txt'):
        print(file)

— YÜCER

5

To get a list of ".txt" file names from a folder called "data" in the same directory, I usually use this simple line of code:

import os
fileNames = [fileName for fileName in os.listdir("data") if fileName.endswith(".txt")]

— Kamen Tsvetkov

3

I suggest you use fnmatch and the upper method. That way you can find any of the following:

Name.txt;
Name.TXT;
Name.Txt

import fnmatch
import os

for file in os.listdir("/Users/Johnny/Desktop/MyTXTfolder"):
    if fnmatch.fnmatch(file.upper(), '*.TXT'):
        print(file)

— Nicolaesse

3

Here's one with extend()

import glob
import os
path = '.'  # placeholder: the directory to search
types = ('*.jpg', '*.png')
images_list = []
for pattern in types:
    images_list.extend(glob.glob(os.path.join(path, pattern)))

— Efreeto

Not for use with .txt :)

— Efreeto
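An alternative sketch for matching several extensions at once, not taken from the answer above: str.endswith accepts a tuple of suffixes, so a single pass over os.listdir is enough. The helper name files_with_extensions and the default extensions are assumptions for illustration; lowercasing makes the match case-insensitive.

```python
import os


def files_with_extensions(directory, extensions=(".jpg", ".png")):
    """Return sorted full paths in `directory` ending with any of `extensions`."""
    wanted = tuple(ext.lower() for ext in extensions)
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.lower().endswith(wanted)  # endswith accepts a tuple of suffixes
    )
```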

2

Functional solution with sub-directories:

from fnmatch import filter
from functools import partial
from itertools import chain
from os import path, walk

print(*chain(*(map(partial(path.join, root), filter(filenames, "*.txt")) for root, _, filenames in walk("mydir"))))

— Adam Chrapkowski

15

Is this code something you would want to maintain in the long term?

— Simeon Visser

2

If the folder contains a lot of files, or if memory is a constraint, consider using generators:

import os

def yield_files_with_extensions(folder_path, file_extension):
    for _, _, files in os.walk(folder_path):
        for file in files:
            if file.endswith(file_extension):
                yield file

Option A: Iterate

for f in yield_files_with_extensions('.', '.txt'): 
    print(f)

Option B: Get all

files = [f for f in yield_files_with_extensions('.', '.txt')]

— tashuhka

2

A copy-pastable solution similar to the one by ghostdog:

def get_all_filepaths(root_path, ext):
    """
    Search all files which have a given extension within root_path.

    This ignores the case of the extension and searches subdirectories, too.

    Parameters
    ----------
    root_path : str
    ext : str

    Returns
    -------
    list of str

    Examples
    --------
    >>> get_all_filepaths('/run', '.lock')
    ['/run/unattended-upgrades.lock',
     '/run/mlocate.daily.lock',
     '/run/xtables.lock',
     '/run/mysqld/mysqld.sock.lock',
     '/run/postgresql/.s.PGSQL.5432.lock',
     '/run/network/.ifstate.lock',
     '/run/lock/asound.state.lock']
    """
    import os
    all_files = []
    for root, dirs, files in os.walk(root_path):
        for filename in files:
            if filename.lower().endswith(ext):
                all_files.append(os.path.join(root, filename))
    return all_files

— Martin Thoma

1

Use the Python os module to find files with a specific extension.

A simple example is here:

import os

# This is the path where you want to search
path = r'd:'  

# this is extension you want to detect
extension = '.txt'   # this can be : .jpg  .png  .xls  .log .....

for root, dirs_list, files_list in os.walk(path):
    for file_name in files_list:
        if os.path.splitext(file_name)[-1] == extension:
            file_name_path = os.path.join(root, file_name)
            print(file_name)
            print(file_name_path)   # This is the full path of the filtered file

— Rajiv Sharma

0

Many users have replied with os.walk answers, which include all files but also all directories and subdirectories and their files.

import os


def files_in_dir(path, extension=''):
    """
       Generator: yields all of the files in <path> ending with
       <extension>

       \param   path       Absolute or relative path to inspect,
       \param   extension  [optional] Only yield files matching this,

       \yield              [filenames]
    """


    for _, dirs, files in os.walk(path):
        dirs[:] = []  # do not recurse directories.
        yield from [f for f in files if f.endswith(extension)]

# Example: print all the .py files in './python'
for filename in files_in_dir('./python', '.py'):
    print("-", filename)

Or for a one-off where you don't need a generator:

path, ext = "./python", ".py"
for _, _, dirfiles in os.walk(path):
    matches = (f for f in dirfiles if f.endswith(ext))
    break

for filename in matches:
    print("-", filename)

If you are going to use matches for something else, you may want to make it a list rather than a generator expression:

    matches = [f for f in dirfiles if f.endswith(ext)]

— kfsone

0

A simple method using a for loop:

import os

dir = ["e","x","e"]

p = os.listdir('E:')  #path

for n in range(len(p)):
   name = p[n]
   myfile = [name[-3],name[-2],name[-1]]  # last three characters; use ["t","x","t"] above for .txt
   if myfile == dir :
      print(name)
   else:
      print("nops")

Though this can be made more general.

— borris

A very unpythonic way of checking for an extension. It is also unsafe. What if the name is too short? And why use a list of characters instead of a string?

— Jean-François Fabre
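A safer check along the lines this comment hints at could look like the sketch below. The helper name has_extension is hypothetical; it relies on os.path.splitext, which returns an empty extension for names that are too short or contain no dot, so no index can go out of range.

```python
import os


def has_extension(name, extension):
    """Return True if `name` has exactly `extension` (e.g. ".exe"), ignoring case."""
    return os.path.splitext(name)[1].lower() == extension.lower()
```

With this, the loop body reduces to if has_extension(name, ".exe"): print(name).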