How can I merge PDF files (or PS if not possible) such that every file will begin on an odd page? - unix

I am working on a UNIX system and I'd like to merge thousands of PDF files into one file in order to print it. I don't know in advance how many pages each file has.
I'd like to print it double-sided, such that two files never share the same sheet of paper.
Therefore I'd like the merged file to be aligned such that every file begins on an odd page, with a blank page added whenever the next file would otherwise start on an even page.

Here's the solution I use (it's based on @Dingo's basic principle, but uses an easier approach for the PDF manipulation):
Create PDF file with a single blank page
First, create a PDF file with a single blank page somewhere (in my case, it is located at /path/to/blank.pdf). This command should work (from this thread):
touch blank.ps && ps2pdf blank.ps blank.pdf
Run Bash script
Then, from the directory that contains all my PDF files, I run a little script that appends the blank.pdf file to each PDF file that has an odd number of pages:
#!/bin/bash
for f in *.pdf; do
    let npages=$(pdfinfo "$f"|grep 'Pages:'|awk '{print $2}')
    let modulo="($npages %2)"
    if [ $modulo -eq 1 ]; then
        pdftk "$f" "/path/to/blank.pdf" output "aligned_$f"
        # or
        # pdfunite "$f" "/path/to/blank.pdf" "aligned_$f"
    else
        cp "$f" "aligned_$f"
    fi
done
Combine the results
Now, all aligned_-prefixed files have an even number of pages, and I can join them using
pdftk aligned_*.pdf output result.pdf
# or
pdfunite aligned_*.pdf result.pdf
Tool info:
ps2pdf is in the ghostscript package in most Linux distros
pdfinfo, pdfunite are from the Poppler PDF rendering library (usually the package name is poppler-utils or poppler_utils)
pdftk is usually its own package, the pdftk package
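The padding rule behind this approach can be checked without touching any PDF. The following is only a minimal sketch of the arithmetic, not part of the answer's scripts; the function names and the sample page counts are made up for illustration.

```python
def blanks_needed(page_counts):
    """Return, per document, how many blank pages (0 or 1) to append."""
    return [count % 2 for count in page_counts]


def start_pages(page_counts):
    """Return the 1-based page on which each document starts after padding."""
    starts = []
    next_page = 1
    for count in page_counts:
        starts.append(next_page)
        # Odd-length documents get one blank page, so the next one starts odd.
        next_page += count + (count % 2)
    return starts


counts = [3, 2, 5]  # hypothetical page counts of three input files
print(blanks_needed(counts))
print(start_pages(counts))
```

Because every padded document has an even length, each start page stays odd no matter how many files are merged.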

Your problem is easier to solve if you look at it from another point of view.
To make sure that, when printing, page 1 of the second PDF file does not end up on the same sheet of paper as the last page of the first PDF file (and, more generally, that the first page of each subsequent PDF file is not printed on the back of the last page of the preceding one),
you need to selectively add one blank page only to the PDF files that have an odd number of pages.
I wrote a simple script named addblankifneeded that you can put in a file and then copy to /usr/bin or /usr/local/bin,
and then invoke in the folder where you have your PDFs with this syntax:
for f in *.pdf; do addblankifneeded "$f"; done
This script adds a blank page at the end of PDF files that have an odd number of pages, skips PDF files that already have an even number of pages, and then joins all the PDFs together into one.
Requirements: pdftk, pdfinfo
NOTE: depending on your shell environment, you may need to replace the sh interpreter with the bash interpreter in the first line of the script.
#!/bin/sh
# script to automatically add a blank page at the end of a PDF document if its page count is not a multiple of 2, and then to join all PDFs into one
#
# made by Dingo
#
# dokupuppylinux.co.cc
#
#http://pastebin.com/u/dingodog (my pastebin toolbox for pdf scripts)
#
filename=$1
altxlarg="`pdfinfo -box $filename| grep MediaBox | cut -d : -f2 | awk '{print $3 FS $4}'`"
echo "%PDF-1.4
%µí®û
3 0 obj
<<
/Length 0
>>
stream
endstream
endobj
4 0 obj
<<
/ProcSet [/PDF ]
/ExtGState <<
/GS1 1 0 R
>>
>>
endobj
5 0 obj
<<
/Type /Halftone
/HalftoneType 1
/HalftoneName (Default)
/Frequency 60
/Angle 45
/SpotFunction /Round
>>
endobj
1 0 obj
<<
/Type /ExtGState
/SA false
/OP false
/HT /Default
>>
endobj
2 0 obj
<<
/Type /Page
/Parent 7 0 R
/Resources 4 0 R
/Contents 3 0 R
>>
endobj
7 0 obj
<<
/Type /Pages
/Kids [2 0 R ]
/Count 1
/MediaBox [0 0 595 841]
>>
endobj
6 0 obj
<<
/Type /Catalog
/Pages 7 0 R
>>
endobj
8 0 obj
<<
/CreationDate (D:20110915222508)
/Producer (libgnomeprint Ver: 2.12.1)
>>
endobj
xref
0 9
0000000000 65535 f
0000000278 00000 n
0000000357 00000 n
0000000017 00000 n
0000000072 00000 n
0000000146 00000 n
0000000535 00000 n
0000000445 00000 n
0000000590 00000 n
trailer
<<
/Size 9
/Root 6 0 R
/Info 8 0 R
>>
startxref
688
%%EOF" | sed -e "s/595 841/$altxlarg/g">blank.pdf
pdftk blank.pdf output fixed.pdf
mv fixed.pdf blank.pdf
pages=$(pdftk "$filename" dump_data | grep NumberOfPages | cut -d : -f2)
if [ $(( $pages % 2 )) -eq 0 ]
then echo "$filename already has a multiple of 2 pages ($pages ). Script will be skipped for this file" >>report.txt
else
    pdftk A="$filename" B=blank.pdf cat A B output blankadded.pdf
    mv blankadded.pdf "$filename"
    pdffiles=`ls *.pdf | grep -v -e blank.pdf -e joinedtogether.pdf| xargs -n 1`; pdftk $pdffiles cat output joinedtogether.pdf
fi
exit 0
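The script sizes its blank page by taking the third and fourth numbers of the MediaBox line printed by pdfinfo -box. As a rough illustration of that extraction step only, here is a sketch in Python; the function name and the sample text are assumptions, and the exact layout of pdfinfo output may differ between versions.

```python
def media_box_size(pdfinfo_box_output):
    """Return (width, height) as strings from the first MediaBox line, or None."""
    for line in pdfinfo_box_output.splitlines():
        if line.startswith("MediaBox:"):
            # The four numbers after the label are x0 y0 x1 y1.
            x0, y0, x1, y1 = line.split(":", 1)[1].split()
            # Like the shell script, this assumes the box origin is 0 0.
            return x1, y1
    return None


# Hypothetical sample of what `pdfinfo -box` might print.
sample = """Pages:          3
MediaBox:           0.00     0.00   595.00   842.00
CropBox:            0.00     0.00   595.00   842.00"""
print(media_box_size(sample))
```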

You can use PDFsam:
gratis
runs on Microsoft Windows, Mac OS X and Linux
portable version available (at least on Windows)
can add a blank page after each merged document if the document has an odd number of pages

Disclaimer: I'm the author of the tools I'm mentioning here.
sejda-console
It's a free and open source command line interface for performing pdf manipulations such as merge or split. The merge command has an option stating:
[--addBlanks] : add a blank page after each merged document if the number of pages is odd (optional)
Since you just need to print the pdf I'm assuming you don't care about the order your documents are merged. This is the command you can use:
sejda-console merge -d /path/to/pdfs_to_merge -o /outputpath/merged_file.pdf --addBlanks
It can be downloaded from the official website sejda.org.
sejda.com
This is a web application backed by Sejda and has the same functionalities mentioned above but through a web interface. You are required to upload your files so, depending on the size of your input set, it might not be the right solution for you.
If you select the merge command and upload your pdf documents you will have to flag the checkbox Add blank page if odd page number to get the desired behaviour.

Here is a PowerShell version of the most popular solution using pdftk. I did this for Windows, but you can use PowerShell Core on other platforms.
# install pdftk server if on windows
# https://www.pdflabs.com/tools/pdftk-server/
$blank_pdf_path = ".\blank.pdf"
$input_folder = ".\input\"
$aligned_folder = ".\aligned\"
$final_output_path = ".\result.pdf"
foreach($file in (Get-ChildItem $input_folder -Filter *.pdf))
{
    # easy but might break if pdfinfo output changes
    # takes the line at index 7 (expected to be "Pages: N") and matches only the number
    (pdfinfo $file.FullName)[7] -match "(\d+)" | Out-Null
    $npages = $Matches[1]
    $modulo = $npages % 2
    if($modulo -eq 1)
    {
        $output_path = Join-Path $aligned_folder $file.Name
        pdftk $file.FullName $blank_pdf_path output $output_path
    }
    else
    {
        Copy-Item $file.FullName -Destination $aligned_folder
    }
}
$aligned_pdfs = Join-Path $aligned_folder "*.pdf"
pdftk $aligned_pdfs output $final_output_path
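The script's own comment notes that picking a fixed line from the pdfinfo output is fragile. One way around that is to look for the Pages: label instead of a line position. The sketch below shows the idea in Python; the function name and the sample text are illustrative assumptions, not real pdfinfo output.

```python
import re


def page_count(pdfinfo_output):
    """Return the integer following 'Pages:' in pdfinfo-style text, or None."""
    match = re.search(r"^Pages:[ \t]*(\d+)", pdfinfo_output, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


# Hypothetical pdfinfo-style text; the Pages: line can sit anywhere.
sample = "Title:          demo\nProducer:       example\nPages:          12\nEncrypted:      no\n"
print(page_count(sample))
```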

Preparation
Install Python and make sure you have the pyPDF package.
Create a PDF file with a single blank page in /path/to/blank.pdf (I've created blank pdf pages here).
Save this as pdfmerge.py in any directory of your $PATH. (I'm not a Windows user. This is straightforward under Linux. Please let me know if you get errors / if it works.)
Make pdfmerge.py executable
Every time you need it
Run pdfmerge.py in a directory that contains only the PDF files you want to merge.
pdfmerge.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from argparse import ArgumentParser
from glob import glob
from pyPdf import PdfFileReader, PdfFileWriter

def merge(path, blank_filename, output_filename):
    blank = PdfFileReader(open(blank_filename, "rb"))
    output = PdfFileWriter()
    for pdffile in glob('*.pdf'):
        if pdffile == output_filename:
            continue
        print("Parse '%s'" % pdffile)
        document = PdfFileReader(open(pdffile, 'rb'))
        for i in range(document.getNumPages()):
            output.addPage(document.getPage(i))
        if document.getNumPages() % 2 == 1:
            output.addPage(blank.getPage(0))
            print("Add blank page to '%s' (had %i pages)" % (pdffile, document.getNumPages()))
    print("Start writing '%s'" % output_filename)
    output_stream = open(output_filename, "wb")
    output.write(output_stream)
    output_stream.close()

if __name__ == "__main__":
    parser = ArgumentParser()
    # Add more options if you like
    parser.add_argument("-o", "--output", dest="output_filename", default="merged.pdf",
                        help="write merged PDF to FILE", metavar="FILE")
    parser.add_argument("-b", "--blank", dest="blank_filename", default="blank.pdf",
                        help="path to blank PDF file", metavar="FILE")
    parser.add_argument("-p", "--path", dest="path", default=".",
                        help="path of source PDF files")
    args = parser.parse_args()
    merge(args.path, args.blank_filename, args.output_filename)
Testing
Please make a comment if this works on Windows and Mac.
Please always leave a comment if it doesn't work / it could be improved.
It works on Linux. Joining 3 PDFs to a single 200-page PDF took less than a second.

Martin had a good start. I updated it to PyPDF2 and made a few tweaks, like sorting the output by filename.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from argparse import ArgumentParser
from glob import glob
from PyPDF2 import PdfFileReader, PdfFileWriter
import os.path

def merge(pdfpath, blank_filename, output_filename):
    # Keep the blank PDF open until the output is written:
    # PyPDF2 reads page data lazily from the underlying file.
    with open(blank_filename, "rb") as f:
        blank = PdfFileReader(f)
        output = PdfFileWriter()
        filelist = sorted(glob(os.path.join(pdfpath, '*.pdf')))
        for pdffile in filelist:
            if pdffile == output_filename:
                continue
            print("Parse '%s'" % pdffile)
            document = PdfFileReader(open(pdffile, 'rb'))
            for i in range(document.getNumPages()):
                output.addPage(document.getPage(i))
            if document.getNumPages() % 2 == 1:
                output.addPage(blank.getPage(0))
                print("Add blank page to '%s' (had %i pages)" % (pdffile, document.getNumPages()))
        print("Start writing '%s'" % output_filename)
        with open(output_filename, "wb") as output_stream:
            output.write(output_stream)

if __name__ == "__main__":
    parser = ArgumentParser()
    # Add more options if you like
    parser.add_argument("-o", "--output", dest="output_filename", default="merged.pdf",
                        help="write merged PDF to FILE", metavar="FILE")
    parser.add_argument("-b", "--blank", dest="blank_filename", default="blank.pdf",
                        help="path to blank PDF file", metavar="FILE")
    parser.add_argument("-p", "--path", dest="path", default=".",
                        help="path of source PDF files")
    args = parser.parse_args()
    merge(args.path, args.blank_filename, args.output_filename)

The code by @Chris Lercher in https://stackoverflow.com/a/12761103/1369181 did not quite work for me. I do not know whether that is because I am working on Cygwin/mintty. Also, I have to use qpdf instead of pdftk. Here is the code that has worked for me:
#!/bin/bash
for f in *.pdf; do
    npages=$(pdfinfo "$f"|grep 'Pages:'|sed 's/[^0-9]*//g')
    modulo=$(($npages %2))
    if [ $modulo -eq 1 ]; then
        qpdf --empty --pages "$f" "path/to/blank.pdf" -- "aligned_$f"
    else
        cp "$f" "aligned_$f"
    fi
done
Now, all "aligned_" files have an even number of pages, and I can join them using qpdf (thanks to https://stackoverflow.com/a/51080927):
qpdf --verbose --empty --pages aligned_* -- all.pdf
And here is the useful code from https://unix.stackexchange.com/a/272878 that I used for creating the blank page:
echo "" | ps2pdf -sPAPERSIZE=a4 - blank.pdf

This one worked for me. I used pdfcpu on macOS.
It can be installed this way:
brew install pdfcpu
I slightly adjusted the code from https://stackoverflow.com/a/12761103/1369181:
#!/bin/bash
mkdir aligned
for f in *.pdf; do
    let npages=$(pdfcpu info "$f"|grep 'Page count:'|awk '{print $3}')
    let modulo="($npages %2)"
    if [ $modulo -eq 1 ]; then
        pdfcpu page insert -pages l -mode after "$f" "aligned/$f"
    else
        cp "$f" "aligned/$f"
    fi
done
pdfcpu merge merged-aligned.pdf aligned/*.pdf
rm -rf aligned
NB! It creates and removes an "aligned" directory in the current directory, so feel free to improve it to make it safe to use.

Related

parse error near '\n' in .zshrc file

When I open my MacOS terminal, it shows
Last login: Sun Jan 2 15:50:48 on ttys000
/Users/rajeshrao/.zshrc:18: parse error near `\n'
(base) rajeshrao@Rajeshs-MacBook-Air ~ %
I tried opening that .zshrc file, which had this code in it:
export PATH="/usr/local/sbin:$PATH"
# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/Users/rajeshrao/opt/anaconda3/bin/conda' 'shell.zsh' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
eval "$__conda_setup"
else
if [ -f "/Users/rajeshrao/opt/anaconda3/etc/profile.d/conda.sh" ]; then
. "/Users/rajeshrao/opt/anaconda3/etc/profile.d/conda.sh"
else
export PATH="/Users/rajeshrao/opt/anaconda3/bin:$PATH"
fi
fi
unset __conda_setup
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk-17.0.1.jdk/Contents/Home
<<<conda initialize<<<
I don't know much about how shell works and am just a beginner. I didn't interfere with anything before using the terminal.
The last line has lost its initial '#' (i.e., it is no longer a comment).
Edit the file to end with this line:
# <<<conda initialize<<<

Generating consecutive numbered urls

I want to generate a text file containing the following lines:
http://example.com/file1.pdf
http://example.com/file2.pdf
http://example.com/file3.pdf
.
.
http://example.com/file1000.pdf
Can anyone advise how to do it using the Unix command line, please?
Thank you
With an iterating for loop:
for (( i=1;i<=1000;i++ ));
do
    echo "http://example.com/file$i.pdf";
done > newfile
With seq:
while read i;
do
    echo "http://example.com/file$i.pdf";
done <<< "$(seq 1000)" > newfile
It is also possible to create and run a Python script to generate this. Using vim, nano, or any other terminal editor, create a Python file as follows:
def genFile(fn, start, end):
    with open(fn, "w+") as f:
        f.writelines([f"http://example.com/file{str(i)}.pdf\n" for i in range(start, end+1)])

try:
    fn = input("File Path: ") # can be relative
    start = int(input("Start: ")) # inclusive
    end = int(input("End: ")) # inclusive
    genFile(fn, start, end)
except (ValueError, OSError):
    # non-numeric start/end, or a path that cannot be opened for writing
    print("Invalid Input")
Once this is written to a file (let's call it script.py), we can run the following command to execute the script:
python script.py
Then, fill out the prompts for the file path, start, and end. This should result in all those lines printed in the file specified delimited by '\n'.
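If the prompts get in the way, for example when the script is called from another script, the same idea works without them. This is only a sketch; make_urls, write_urls and the template parameter are names invented here, not part of the answer above.

```python
def make_urls(start, end, template="http://example.com/file{}.pdf"):
    """Return the URLs for start..end (both inclusive)."""
    return [template.format(i) for i in range(start, end + 1)]


def write_urls(path, start, end):
    """Write one URL per line to the file at path."""
    with open(path, "w") as f:
        f.write("\n".join(make_urls(start, end)) + "\n")


print(make_urls(1, 3))
```

Calling write_urls("newfile", 1, 1000) would then produce the file asked for in the question.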

How to exclude parent Unix processes from grepped output from ps

I have got a file of pids and am using ps -f to get information about the pids.
Here is an example..
ps -eaf | grep -f myfilename
myuser 14216 14215 0 10:00 ? 00:00:00 /usr/bin/ksh /home/myScript.ksh
myuser 14286 14216 0 10:00 ? 00:00:00 /usr/bin/ksh /home/myScript.ksh
where myfilename contains only 14216.
I've got a tiny problem where the output is giving me parent process id's as well as the child. I want to exclude the line for the parent process id.
Does anyone know how I could modify my command to exclude parent process keeping in mind that I could have many process id's in my input file?
Hard to do with just grep but easy to do with awk.
Invoke the awk script below from the following command:
ps -eaf | awk -f script.awk myfilename -
Here's the script:
# process the first file on the command line (aka myfilename)
# this is the list of pids
ARGIND == 1 {
    pids[$0] = 1
}
# second and subsequent files ("-"/stdin in the example)
ARGIND > 1 {
    # is column 2 of the ps -eaf output [i.e.] the pid in the list of desired
    # pids? -- if so, print the entire line
    if ($2 in pids)
        printf("%s\n",$0)
}
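The reason plain grep also returns the extra line is that it matches the number anywhere in the line, including the PPID column, whereas the awk script compares only column 2. The same column-based comparison, sketched in Python on the two sample lines from the question (filter_by_pid is a made-up name):

```python
def filter_by_pid(ps_lines, wanted_pids):
    """Keep only the lines whose second column (the PID) is in wanted_pids."""
    wanted = set(wanted_pids)
    kept = []
    for line in ps_lines:
        fields = line.split()
        # fields[1] is the PID; fields[2] (the PPID) is deliberately ignored.
        if len(fields) > 1 and fields[1] in wanted:
            kept.append(line)
    return kept


ps_lines = [
    "myuser 14216 14215 0 10:00 ? 00:00:00 /usr/bin/ksh /home/myScript.ksh",
    "myuser 14286 14216 0 10:00 ? 00:00:00 /usr/bin/ksh /home/myScript.ksh",
]
for line in filter_by_pid(ps_lines, ["14216"]):
    print(line)
```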
UPDATE:
When using GNU awk (gawk), the following may be ignored. For other [obsolete] versions, insert the following code at the top:
# work around old, obsolete versions
ARGIND == 0 {
    defective_awk_flag = 1
}
defective_awk_flag != 0 {
    if (FILENAME != defective_awk_file) {
        defective_awk_file = FILENAME
        ARGIND += 1
    }
}
UPDATE #2:
The above is all fine. Just for fun, here's an alternate way to do the same thing with perl. One of the advantages is that everything can be contained in the script and no pipeline is necessary.
Invoke the script via:
./script.pl myfilename
And, here's script.pl. Note: I don't write idiomatic perl. My style is more akin to what one would expect to see in other languages like C, javascript, etc.:
#!/usr/bin/perl
master(@ARGV);
exit(0);
# master -- master control
sub master
{
    my(@argv) = @_;
    my($xfsrc);
    my($pidfile);
    my($buf);
    # NOTE: "chomp" is a perl function that strips newlines
    # get filename with list of pids (e.g. myfilename)
    $pidfile = shift(@argv);
    open($xfsrc,"<$pidfile") ||
        die("master: unable to open '$pidfile' -- $!\n");
    # create an associative array (a "hash" in perl parlance) of the desired
    # pid numbers
    while ($pid = <$xfsrc>) {
        chomp($pid);
        $pid_desired{$pid} = 1;
    }
    close($xfsrc);
    # run the 'ps' command and capture its output into an array
    @pslist = (`ps -eaf`);
    # process the command output, line-by-line
    foreach $buf (@pslist) {
        chomp($buf);
        # the pid number we want is in the second column
        (undef,$pid) = split(" ",$buf);
        # print the line if the pid is one of the ones we want
        print($buf,"\n")
            if ($pid_desired{$pid});
    }
}
Use this command:
ps -eaf | grep -f myfilename | grep -v grep | grep -f myfilename

how to copy the dynamic file name and append some string while copying into other directory in unix

I have many files like ABC_Timestamp.txt and RAM_Timestamp.txt; here the timestamp is different every time. I want to copy these files into another directory, but while copying I want to append a string to the end of the file name, so the names become ABC_Timestamp.txt.OK and RAM_Timestamp.txt.OK. How can I append the string to such dynamic file names? Please suggest.
My 2 pence:
(cat file.txt; echo "append a line"; date +"perhaps with a timestamp: %T") > file.txt.OK
Or more complete for your filenames:
while sleep 3;
do
    for a in ABC RAM
    do
        (echo "appending one string at the end of the file" | cat ${a}_Timestamp.txt -) > ${a}_Timestamp.txt.OK
    done
done
Execute this on command line.
ls -1|awk '/ABC_.*\.txt/||/RAM_.*\.txt/ {
old=$0;
new="/new_dir/"old".OK";
system("cp "old" "new); }'
Taken from here
You can say:
for i in *.txt; do cp "${i}" targetdirectory/"${i}".OK ; done
or
for i in ABC_*.txt RAM_*.txt; do cp "${i}" targetdirectory/"${i}".OK ; done
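For comparison, the same copy-and-rename step can be written in Python. This is a minimal sketch under the assumption that the files match ABC_*.txt and RAM_*.txt as in the question; copy_with_suffix and its parameters are hypothetical names.

```python
import shutil
from pathlib import Path


def copy_with_suffix(src_dir, dst_dir, patterns=("ABC_*.txt", "RAM_*.txt"), suffix=".OK"):
    """Copy matching files into dst_dir, appending suffix to each file name."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for pattern in patterns:
        for src in sorted(Path(src_dir).glob(pattern)):
            target = dst / (src.name + suffix)
            shutil.copy2(src, target)  # copies content and timestamps
            copied.append(target.name)
    return copied
```

A call such as copy_with_suffix(".", "targetdirectory") mirrors the second shell loop above and returns the names of the files it created.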
How about first dumping the names of the files into another file and then copying the files one by one?
find . -name "*.txt" >fileNames
while IFS= read -r line
do
    newName="${line}appendText"
    echo "$newName"
    cp "$line" "$newName"
done < fileNames

insert header to a file

I would like to hear your suggestions on how to insert header lines (all the lines of one file) at the top of another file (much bigger, several GB). I prefer Unix/awk/sed ways of doing the job.
# The header lines I need to insert into the other file; they are in a file named "header".
##fileformat=VCFv4.0
##fileDate=20090805
##source=myImputationProgramV3.1
##reference=1000GenomesPilot-NCBI36
##phasing=partial
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=AF,Number=.,Type=Float,Description="Allele Frequency">
##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele">
##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership, build 129">
##INFO=<ID=H2,Number=0,Type=Flag,Description="HapMap2 membership">
##FILTER=<ID=q10,Description="Quality below 10">
##FILTER=<ID=s50,Description="Less than 50% of samples have data">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=HQ,Number=2,Type=Integer,Description="Haplotype Quality">
#CHROM POS ID REF ALT QUAL FILTER INFO
header="/name/of/file/containing/header"
for file in "$@"
do
    cat "$header" "$file" > /tmp/xx.$$
    mv /tmp/xx.$$ "$file"
done
You might prefer to locate the temporary file on the same file system as the file you are editing, but anything that requires inserting data at the front of the file is going to end up working very close to this. If you are going to be doing this all day, every day, you might assemble something a little slicker, but the chances are the savings will be minuscule (fractions of a second per file).
If you really, really must use sed, then I suppose you could use:
header="/name/of/file/containing/header"
for file in "$@"
do
    sed -e "0r $header" "$file" > /tmp/xx.$$
    mv /tmp/xx.$$ "$file"
done
The command reads the content of header 'after' line 0 (before line 1), and then everything else is passed through unchanged. This isn't as swift as cat though.
An analogous construct using awk is:
header="/name/of/file/containing/header"
for file in "$@"
do
    awk '{print}' "$header" "$file" > /tmp/xx.$$
    mv /tmp/xx.$$ "$file"
done
This simply prints each input line on the output; again, not as swift as cat.
One more advantage of cat over sed or awk; cat will work even if the big files are mainly binary data (it is oblivious to the content of the files). Both sed and awk are designed to handle data split into lines; while modern versions will probably handle even binary data fairly well, it is not what they are designed for.
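The same write-to-a-temporary-file-then-rename pattern can be sketched in Python. It follows the two suggestions above: the temporary file is created in the target's own directory, and the data is copied as raw bytes so binary content is not a problem. The function name prepend_header is an assumption for illustration.

```python
import os
import shutil
import tempfile


def prepend_header(header_path, target_path):
    """Rewrite target_path so that it starts with the bytes of header_path."""
    target_dir = os.path.dirname(os.path.abspath(target_path))
    # Same directory as the target, so the final rename stays on one file system.
    fd, tmp_path = tempfile.mkstemp(dir=target_dir)
    try:
        with os.fdopen(fd, "wb") as out:
            for source in (header_path, target_path):
                with open(source, "rb") as src:
                    shutil.copyfileobj(src, out)  # streams in chunks
        shutil.copymode(target_path, tmp_path)  # keep the original permissions
        os.replace(tmp_path, target_path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```

As with the shell versions, the whole target file is still rewritten once; nothing here avoids that cost for multi-gigabyte inputs.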
I did it all with a Perl script, because I had to traverse a directory tree and handle various file types differently. The basic script was
#!perl -w
process_directory(".");
sub process_directory {
    my $dir = shift;
    opendir DIR, $dir or die "$dir: not a directory\n";
    my @files = readdir DIR;
    closedir DIR;
    foreach(@files) {
        next if(/^\./ or /bin/ or /obj/); # ignore some directories
        if(-d "$dir/$_") {
            process_directory("$dir/$_");
        } else {
            fix_file("$dir/$_");
        }
    }
}
sub fix_file {
    my $file = shift;
    open SRC, $file or die "Can't open $file\n";
    # write the fixed copy next to the original
    my $fix = "$file-f";
    open FIX, ">$fix" or die "Can't open $fix\n";
    print FIX <<EOT;
-- Text to insert
EOT
    while(<SRC>) {
        print FIX;
    }
    close SRC;
    close FIX;
    # keep the original as name-old.ext, then move the fixed copy into place
    my $oldFile = $file;
    $oldFile =~ s/(.*)\.(\w+)$/$1-old.$2/;
    if(rename $file, $oldFile) {
        rename $fix, $file;
    }
}
Share and enjoy! Or not -- I'm no Perl hacker, so this is probably double-plus-unoptimal Perl code. Still, it worked for me!
