How to convert a color PDF to black-and-white?

18

I would like to convert a PDF with colored text and images into another PDF that is black and white only, in order to reduce its size. Moreover, I would like to keep the text as text, without turning the page elements into images. I tried the following command:

convert -density 150 -threshold 50% input.pdf output.pdf

I found that command in another question, but it does what I don't want: the text in the output is turned into a poor raster image and is no longer selectable. I then tried Ghostscript:

gs      -sOutputFile=output.pdf \
        -q -dNOPAUSE -dBATCH -dSAFER \
        -sDEVICE=pdfwrite \
        -dCompatibilityLevel=1.3 \
        -dPDFSETTINGS=/screen \
        -dEmbedAllFonts=true \
        -dSubsetFonts=true \
        -sColorConversionStrategy=/Mono \
        -sColorConversionStrategyForImages=/Mono \
        -sProcessColorModel=/DeviceGray \
        $1

but it gives me the following error message:

./script.sh: 19: ./script.sh: output.pdf: not found

Is there any other way to create the file?

— BowPark

This looks very good: superuser.com/questions/200378/…

— slackmart

1

Related: unix.stackexchange.com/questions/84709/…

— slm

Be careful with some of the approaches on Super User: they convert the PDF to a rasterized version, so it is no longer vector graphics.

— slm

1

Is that the whole script? It doesn't look like it. Could you post the entire script?

— terdon

23

gs example

The gs command you're running above has a trailing $1, which is typically meant for passing command-line arguments into a script. So I'm not sure what you actually tried, but I'm guessing that you put that command into a script, script.sh:

#!/bin/bash

gs      -sOutputFile=output.pdf \
        -q -dNOPAUSE -dBATCH -dSAFER \
        -sDEVICE=pdfwrite \
        -dCompatibilityLevel=1.3 \
        -dPDFSETTINGS=/screen \
        -dEmbedAllFonts=true \
        -dSubsetFonts=true \
        -sColorConversionStrategy=/Mono \
        -sColorConversionStrategyForImages=/Mono \
        -sProcessColorModel=/DeviceGray \
        $1

And running it gave you this error:

$ ./script.sh: 19: ./script.sh: output.pdf: not found

I'm not sure how you set this script up, but it needs to be executable.

$ chmod +x script.sh

Something is definitely not right with that script, though. When I tried it, I got this error instead:

Unrecoverable error: rangecheck in .putdeviceprops

An alternative

Instead of that script, I'd use this one from the SU question.

#!/bin/bash

gs \
 -sOutputFile=output.pdf \
 -sDEVICE=pdfwrite \
 -sColorConversionStrategy=Gray \
 -dProcessColorModel=/DeviceGray \
 -dCompatibilityLevel=1.4 \
 -dNOPAUSE \
 -dBATCH \
 "$1"

Then run it like this:

$ ./script.bash LeaseContract.pdf 
GPL Ghostscript 8.71 (2010-02-10)
Copyright (C) 2010 Artifex Software, Inc.  All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
Processing pages 1 through 2.
Page 1
Page 2
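
A slightly more defensive variant of the same command, as a sketch only: it quotes the file names so that paths containing spaces survive, and it derives the output name from the input instead of always writing output.pdf. The function names gray_name and to_gray and the "-gray.pdf" suffix are assumptions made for illustration; they are not part of the SU script.

```shell
#!/bin/bash

# Derive "<name>-gray.pdf" from "<name>.pdf" (hypothetical naming scheme).
gray_name() {
    printf '%s-gray.pdf\n' "${1%.pdf}"
}

# Convert one PDF to grayscale with the same gs options as the script above.
to_gray() {
    in_file="$1"
    out_file="$(gray_name "$in_file")"
    gs \
        -sOutputFile="$out_file" \
        -sDEVICE=pdfwrite \
        -sColorConversionStrategy=Gray \
        -dProcessColorModel=/DeviceGray \
        -dCompatibilityLevel=1.4 \
        -dNOPAUSE \
        -dBATCH \
        "$in_file"
}

# Only convert when a file name is given on the command line.
if [ -n "${1:-}" ]; then
    to_gray "$1"
fi
```

Called as ./script.bash "Lease Contract.pdf", this would write "Lease Contract-gray.pdf" next to the input.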

— slm

You're right, there is something wrong in the script: in this case the "something" is sProcessColorModel, which should be dProcessColorModel instead.

— Sora.

8

I found a script here that can do this. It requires gs, which you seem to have, but also pdftk. You didn't mention your distribution, but on Debian-based systems you should be able to install it with

sudo apt-get install pdftk

You can find RPMs here.

Once you have installed pdftk, save the script as graypdf.sh and run it like this:

./graypdf.sh input.pdf

It will create a file called input-gray.pdf. To avoid link rot, I am including the entire script here:

# convert pdf to grayscale, preserving metadata
# "AFAIK graphicx has no feature for manipulating colorspaces. " http://groups.google.com/group/latexusersgroup/browse_thread/thread/5ebbc3ff9978af05
# "> Is there an easy (or just standard) way with pdflatex to do a > conversion from color to grayscale when a PDF file is generated? No." ... "If you want to convert a multipage document then you better have pdftops from the xpdf suite installed because Ghostscript's pdf to ps doesn't produce nice Postscript." http://osdir.com/ml/tex.pdftex/2008-05/msg00006.html
# "Converting a color EPS to grayscale" - http://en.wikibooks.org/wiki/LaTeX/Importing_Graphics
# "\usepackage[monochrome]{color} .. I don't know of a neat automatic conversion to monochrome (there might be such a thing) although there was something in Tugboat a while back about mapping colors on the fly. I would probably make monochrome versions of the pictures, and name them consistently. Then conditionally load each one" http://newsgroups.derkeiler.com/Archive/Comp/comp.text.tex/2005-08/msg01864.html
# "Here comes optional.sty. By adding \usepackage{optional} ... \opt{color}{\includegraphics[width=0.4\textwidth]{intro/benzoCompounds_color}} \opt{grayscale}{\includegraphics[width=0.4\textwidth]{intro/benzoCompounds}} " - http://chem-bla-ics.blogspot.com/2008/01/my-phd-thesis-in-color-and-grayscale.html
# with gs:
# http://handyfloss.net/2008.09/making-a-pdf-grayscale-with-ghostscript/
# note - this strips metadata! so:
# http://etutorials.org/Linux+systems/pdf+hacks/Chapter+5.+Manipulating+PDF+Files/Hack+64+Get+and+Set+PDF+Metadata/
COLORFILENAME=$1
OVERWRITE=$2
FNAME=${COLORFILENAME%.pdf}
# NOTE: pdftk does not work with logical page numbers / pagination;
# gs kills it as well;
# so check for existence of 'pdfmarks' file in calling dir;
# if there, use it to correct gs logical pagination
# for example, see
# http://askubuntu.com/questions/32048/renumber-pages-of-a-pdf/65894#65894
PDFMARKS=
if [ -e pdfmarks ] ; then
    PDFMARKS="pdfmarks"
    echo "$PDFMARKS exists, using..."
    # convert to gray pdf - this strips metadata!
    gs -sOutputFile="$FNAME-gs-gray.pdf" -sDEVICE=pdfwrite \
        -sColorConversionStrategy=Gray -dProcessColorModel=/DeviceGray \
        -dCompatibilityLevel=1.4 -dNOPAUSE -dBATCH "$COLORFILENAME" "$PDFMARKS"
else # not really needed ?!
    gs -sOutputFile="$FNAME-gs-gray.pdf" -sDEVICE=pdfwrite \
        -sColorConversionStrategy=Gray -dProcessColorModel=/DeviceGray \
        -dCompatibilityLevel=1.4 -dNOPAUSE -dBATCH "$COLORFILENAME"
fi
# dump metadata from original color pdf
## pdftk $COLORFILENAME dump_data output $FNAME.data.txt
# also: pdfinfo -meta $COLORFILENAME
# grep to avoid BookmarkTitle/Level/PageNumber:
pdftk "$COLORFILENAME" dump_data output | grep 'Info\|Pdf' > "$FNAME.data.txt"
# "pdftk can take a plain-text file of these same key/value pairs and update a PDF's Info dictionary to match. Currently, it does not update the PDF's XMP stream."
pdftk "$FNAME-gs-gray.pdf" update_info "$FNAME.data.txt" output "$FNAME-gray.pdf"
# (http://wiki.creativecommons.org/XMP_Implementations : Exempi ... allows reading/writing XMP metadata for various file formats, including PDF ... )
# clean up
rm "$FNAME-gs-gray.pdf"
rm "$FNAME.data.txt"
if [ "$OVERWRITE" == "y" ] ; then
    echo "Overwriting $COLORFILENAME..."
    mv "$FNAME-gray.pdf" "$COLORFILENAME"
fi
# BUT NOTE:
# Mixing TEX & PostScript : The GEX Model - http://www.tug.org/TUGboat/Articles/tb21-3/tb68kost.pdf
# VTEX is a (commercial) extended version of TEX, sold by MicroPress, Inc. Free versions of VTEX have recently been made available, that work under OS/2 and Linux. This paper describes GEX, a fast fully-integrated PostScript interpreter which functions as part of the VTEX code-generator. Unless specified otherwise, this article describes the functionality in the free- ware version of the VTEX compiler, as available on CTAN sites in systems/vtex.
# GEX is a graphics counterpart to TEX. .. Since GEX may exercise subtle influence on TEX (load fonts, or change TEX registers), GEX is op- tional in VTEX implementations: the default oper- ation of the program is with GEX off; it is enabled by a command-line switch.
# \includegraphics[width=1.3in, colorspace=grayscale 256]{macaw.jpg}
# http://mail.tug.org/texlive/Contents/live/texmf-dist/doc/generic/FAQ-en/html/FAQ-TeXsystems.html
# A free version of the commercial VTeX extended TeX system is available for use under Linux, which among other things specialises in direct production of PDF from (La)TeX input. Sadly, it's no longer supported, and the ready-built images are made for use with a rather ancient Linux kernel.
# NOTE: another way to capture metadata; if converting via ghostscript:
# http://compgroups.net/comp.text.pdf/How-to-specify-metadata-using-Ghostscript
# first:
# grep -a 'Keywo' orig.pdf
# /Author(xxx)/Title(ttt)/Subject()/Creator(LaTeX)/Producer(pdfTeX-1.40.12)/Keywords(kkkk)
# then - copy this data in a file prologue.ini:
#/pdfmark where {pop} {userdict /pdfmark /cleartomark load put} ifelse
#[/Author(xxx)
#/Title(ttt)
#/Subject()
#/Creator(LaTeX with hyperref package + gs w/ prologue)
#/Producer(pdfTeX-1.40.12)
#/Keywords(kkkk)
#/DOCINFO pdfmark
#
# finally, call gs on the orig file,
# asking to process pdfmarks in prologue.ini:
# gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 \
# -dPDFSETTINGS=/screen -dNOPAUSE -dQUIET -dBATCH -dDOPDFMARKS \
# -sOutputFile=out.pdf in.pdf prologue.ini
# then the metadata will be in output too (which is stripped otherwise;
# note bookmarks are preserved, however).
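
To see what the grep filter in the script keeps before the metadata is written back with update_info, here is a small sketch that runs the same pattern over a few lines in the style of pdftk's dump_data output. The sample values are made up for illustration, and grep -E 'Info|Pdf' is used as an equivalent spelling of the script's 'Info\|Pdf'.

```shell
#!/bin/bash

# Made-up sample in the style of `pdftk file.pdf dump_data` output.
sample_dump() {
    printf '%s\n' \
        'InfoKey: Title' \
        'InfoValue: Lease Contract' \
        'PdfID0: 0123456789abcdef' \
        'NumberOfPages: 2' \
        'BookmarkTitle: Chapter 1' \
        'BookmarkLevel: 1' \
        'BookmarkPageNumber: 1'
}

# Keep only lines containing "Info" or "Pdf", as the script does.
keep_info() {
    grep -E 'Info|Pdf'
}

sample_dump | keep_info
```

Only the InfoKey, InfoValue and PdfID0 lines pass the filter; the page count and bookmark lines are dropped, which is what the comment "grep to avoid BookmarkTitle/Level/PageNumber" in the script is after.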

— terdon

3

I also had some scanned color PDFs and grayscale PDFs that I wanted to convert to bw. I tried using gs with the code listed here, and the image quality is good, with the PDF text still there. However, that gs code only converts to grayscale (as asked in the question) and still produces a large file. convert gives very poor results when used directly.

I wanted bw PDFs with good image quality and a small file size. I would have tried terdon's solution, but I could not get pdftk on CentOS 7 using yum (at the time of writing).

My solution uses gs to extract grayscale bmp files from the PDF, convert to threshold those bmps to bw and save them as tiff files, and then img2pdf to compress the tiff images and merge them all into one PDF.

I tried going directly from the PDF to tiff, but the quality is not the same, so I save each page as bmp. For a one-page PDF file, convert does a great job going from bmp to PDF. Example:

gs -sDEVICE=bmpgray -dNOPAUSE -dBATCH -r300x300 \
   -sOutputFile=./pdf_image.bmp ./input.pdf

convert ./pdf_image.bmp -threshold 40% -compress zip ./bw_out.pdf
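
As a rough illustration of what -threshold 40% does to an 8-bit grayscale value: values above 40% of the maximum (255) become white, and everything else becomes black. The helper name bw_pixel is hypothetical, and the sketch only mimics the arithmetic; it does not call ImageMagick.

```shell
#!/bin/bash

# bw_pixel GRAY PERCENT: print 255 (white) if the 8-bit gray value exceeds
# PERCENT% of 255, otherwise print 0 (black). Integer arithmetic only.
bw_pixel() {
    if [ $(( $1 * 100 )) -gt $(( $2 * 255 )) ]; then
        echo 255
    else
        echo 0
    fi
}

bw_pixel 200 40   # light gray pixel, prints 255
bw_pixel 60 40    # dark gray pixel, prints 0
```

Lowering the percentage turns more of the page white (useful for removing scanner haze); raising it keeps more pixels black (useful for faint text).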

For multiple pages, gs can merge multiple PDF files into one, but img2pdf yields a smaller file size than gs. The tiff files must be uncompressed as input to img2pdf. Keep in mind that for a large number of pages, the intermediate bmp and tiff files are large. pdftk or joinpdf would be better if they can merge the compressed PDF files from convert.

I imagine there is a more elegant solution. However, my method produces results with very good image quality and a much smaller file size. To get text back into the bw PDF, run OCR again.

My shell script uses gs, convert, and img2pdf. Change the parameters listed at the beginning (number of pages, scan dpi, threshold %, etc.) as needed, and run chmod +x ./pdf2bw.sh. Here is the full script (pdf2bw.sh):

#!/bin/bash

num_pages=12
dpi_res=300
input_pdf_name=color_or_grayscale.pdf
bw_threshold=40%
output_pdf_name=out_bw.pdf
#-------------------------------------------------------------------------
gs -sDEVICE=bmpgray -dNOPAUSE -dBATCH -q -r$dpi_res \
   -sOutputFile=./%d.bmp ./$input_pdf_name
#-------------------------------------------------------------------------
for file_num in `seq 1 $num_pages`
do
  convert ./$file_num.bmp -threshold $bw_threshold \
          ./$file_num.tif
done
#-------------------------------------------------------------------------
input_files=""

for file_num in `seq 1 $num_pages`
do
  input_files+="./$file_num.tif "
done

img2pdf -o ./$output_pdf_name --dpi $dpi_res $input_files
#-------------------------------------------------------------------------
# clean up bmp and tif files used in conversion

for file_num in `seq 1 $num_pages`
do
  rm ./$file_num.bmp
  rm ./$file_num.tif
done
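
Rather than hard-coding num_pages, the page count could be read from the PDF itself. A minimal sketch, assuming pdfinfo from poppler-utils is installed (the script above does not use it); the function name count_pages is hypothetical.

```shell
#!/bin/bash

# Print the page count reported on pdfinfo's "Pages:" line.
count_pages() {
    pdfinfo "$1" | awk '/^Pages:/ { print $2 }'
}

# Example: set num_pages from the input file when one is given.
if [ -n "${1:-}" ]; then
    num_pages="$(count_pages "$1")"
    echo "num_pages=$num_pages"
fi
```

In pdf2bw.sh this would replace the fixed num_pages=12 with num_pages="$(count_pages "$input_pdf_name")".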

— OccamsRazor

1

RHEL6 and RHEL5, both of which ship Ghostscript based on 8.70, could not use the forms of the command given above. Assuming a script or function that expects the PDF file as its first argument, "$1", the following should be more portable:

gs \
    -sOutputFile="grey_$1" \
    -sDEVICE=pdfwrite \
    -sColorConversionStrategy=Mono \
    -sColorConversionStrategyForImages=/Mono \
    -dProcessColorModel=/DeviceGray \
    -dCompatibilityLevel=1.3 \
    -dNOPAUSE -dBATCH \
    "$1"

The output file name is the input name prefixed with "grey_".

RHEL6 and 5 can use CompatibilityLevel=1.4, which is much faster, but I was aiming for portability.
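
One caveat with -sOutputFile="grey_$1": if $1 contains a directory (for example docs/report.pdf), the prefix ends up in front of the directory name rather than the file name. A sketch of a helper that prefixes only the file name; grey_path is a hypothetical name, not something from the command above.

```shell
#!/bin/bash

# Print the path with "grey_" prepended to the file name only.
grey_path() {
    printf '%s/grey_%s\n' "$(dirname "$1")" "$(basename "$1")"
}

grey_path "docs/report.pdf"   # prints docs/grey_report.pdf
grey_path "report.pdf"        # prints ./grey_report.pdf
```

The gs call would then use -sOutputFile="$(grey_path "$1")".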

— Zengin

The devs (1, 2, 3, 4) say there is no sColorConversionStrategyForImages switch.

— Igor

Thanks @Igor - I have no idea where I got that snippet from! I do know that I tested it and it worked at the time. (And that, folks, is why you should always give references for your code.)

— Zengin

1

This "fake parameter" seems to be incredibly popular around the web. GS (sadly) ignores unknown switches, so it works anyway.

— Igor

1

I get reliable results cleaning up scanned PDFs for good contrast with this script:

#!/bin/bash
# 
# $ sudo apt install poppler-utils img2pdf pdftk imagemagick
#
# Output is still greyscale, but lots of scanner light tone fuzz removed.
#

pdfimages $1 pages

ls ./pages*.ppm | xargs -L1 -I {} convert {}  -quality 100 -density 400 \
  -fill white -fuzz 80% -auto-level -depth 4 +opaque "#000000" {}.jpg

ls -1 ./pages*jpg | xargs -L1 -I {} img2pdf {} -o {}.pdf

pdftk pages*.pdf cat output ${1/.pdf/}_bw.pdf

rm pages*
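
Because the script writes its pages* files into the current directory and ends with rm pages*, it can delete unrelated files that happen to match. One way around that, sketched here as an assumption rather than as part of the original script, is to run the work inside a temporary directory that is removed afterwards. The function name in_tmpdir is hypothetical.

```shell
#!/bin/bash

# Run a command inside a fresh temporary directory, then delete the directory.
# The command's exit status is passed through.
in_tmpdir() {
    workdir="$(mktemp -d)" || return 1
    ( cd "$workdir" && "$@" )
    status=$?
    rm -rf "$workdir"
    return "$status"
}

# Example: list the (empty) scratch directory.
in_tmpdir ls -A
```

To use it with the pipeline above, the steps would need to be wrapped in one function and given absolute paths for the input and output PDFs, since the command runs from inside the scratch directory.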

— Bijou Smith

0

There are many great answers above. I took one of them and added some user interaction to it. Maybe someone will find it useful.

#!/bin/bash

pdf2Gray()
{
    if [ -z "$1" ]; then
        return 1
    else
        inputFile="$1"
    fi

    if [ ! -f "$inputFile" ]; then
        echo "File not found"
        echo "$inputFile"
        return 2
    fi

    fileType="`file -b --mime-type \"$inputFile\"`"

    if [ "$fileType" != 'application/pdf' ]; then
        echo "This file is not a pdf"
        echo "$inputFile"
        return 3
    fi

    outFile="`basename -s .pdf \"$inputFile\"`-gray.pdf"

    if [ -f "$outFile" ]; then

        echo -en "File Exists, overwrite it? (Y/n) "
        read overWrite

        if [ -z "$overWrite" ]; then
            overWrite="y"
        fi

        if [[ "$overWrite" != "y" && "$overWrite" != "Y" ]]; then
            return 4
        fi
    fi
gs \
-q \
-sOutputFile="$outFile" \
-sDEVICE=pdfwrite \
-sColorConversionStrategy=Gray \
-dProcessColorModel=/DeviceGray \
-dCompatibilityLevel=1.4 \
-dNOPAUSE \
-dBATCH \
"$inputFile"

echo -en "\033[1;32m"
echo -n "$PWD/$outFile"
echo -e "\033[0m"

}

if [ -z "$1" ]; then
    echo "usage:"
    echo "$0 file1.pdf file2.pdf file3.pdf ..."
else
    for file in "$@"
    do
        pdf2Gray "$file"
        #echo $? #debug
    done
fi

— abear2