[Trisquel-users] finding particular pages within PDFs

adel.afzal at gmail.com
Fri Aug 29 20:29:03 CEST 2014


MB, having the text would be way more useful than the PDF pages!  Thanks for  
recommending pdftotext and the -layout option.
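
For reference, here is a sketch of the basic call. It assumes bash and the poppler-utils pdftotext that most GNU/Linux distributions package. `pdf_to_text` is a hypothetical wrapper name, not an existing command. `-layout` asks pdftotext to keep the text's original physical layout as closely as it can.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: convert a whole PDF to a text file next to it,
# keeping the physical page layout as closely as pdftotext can (-layout).
# Usage: pdf_to_text book.pdf    # writes book.txt
pdf_to_text() {
    local pdf="$1"
    local txt="${pdf%.pdf}.txt"    # book.pdf -> book.txt
    pdftotext -layout "$pdf" "$txt"
}
```

For example, `pdf_to_text book.pdf && grep -n -i 'some phrase' book.txt` converts the file once. It then lists the matching lines with their line numbers, which are line numbers, not page numbers.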

I have some questions -- could you help me break this process down into  
smaller steps?

I looked up how to split a PDF with pdfjam online -- I think it may be a little
time-consuming (my PDFs are a few thousand pages long):

http://0x2a.at/blog/2011/02/pdf_manipulation_on_the_cli/

http://tex.stackexchange.com/questions/79623/quickly-extracting-individual-pages-from-a-document
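
If splitting is still needed, pdftk's `burst` operation can do it in a single command. This is a sketch assuming bash and an installed pdftk, which is a separate package from pdftotext. `split_pdf` and the `page_%04d.pdf` naming are illustrative choices. Its speed on a few-thousand-page file has not been measured.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a PDF into one file per page in a single command.
# pdftk's "burst" writes one PDF per page; the printf-style name page_%04d.pdf
# gives zero-padded names (page_0001.pdf, page_0002.pdf, ...), so a shell
# glob such as pages/* lists them in page order (up to 9999 pages).
# Usage: split_pdf book.pdf [PAGES_DIR]
split_pdf() {
    local pdf="$1" outdir="${2:-pages}"
    mkdir -p "$outdir"
    pdftk "$pdf" burst output "$outdir/page_%04d.pdf"
}
```

The zero padding matters. With plain `%d`, a glob would list page_10.pdf before page_2.pdf.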

I also looked at PDF Shuffler (the GUI tool), but it can only split files one
by one.  Are there other options?
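
Another option may be to skip splitting altogether. By default, pdftotext ends each page's text with a form-feed character; the `-nopgbrk` option turns this off. So awk can treat each page as one record, and the record number is then the page number. This is a minimal sketch assuming bash and an awk that uses a single-character record separator, which POSIX awk does. `find_pages` is a hypothetical name. The case-insensitive match is a simplification that lowercases both the text and the pattern.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the page numbers of a PDF whose text matches
# REGEX, without splitting the PDF first.
# pdftotext ends each page with a form feed (\f) unless -nopgbrk is given,
# so with RS='\f' awk reads one page per record and NR is the page number.
# The pattern is passed through the environment (PAT) so awk -v does not
# reinterpret its backslashes; both sides are lowercased to ignore case.
# Usage: find_pages book.pdf 'some phrase'
find_pages() {
    local pdf="$1" pattern="$2"
    pdftotext -layout "$pdf" - |
        PAT="$pattern" awk -v RS='\f' \
            'tolower($0) ~ tolower(ENVIRON["PAT"]) { print NR }'
}
```

For example, `find_pages book.pdf 'some phrase'` prints one page number per line. You can then view a single page as text with `pdftotext -layout -f 42 -l 42 book.pdf -`. If pdftk is installed, `pdftk book.pdf cat 2 4 output hits.pdf` pulls several pages into one PDF.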


Once I have split the files into single pages, the next step is the shell
loop 'for file in pages/*'.  I don't understand what this step will do.
Could you please explain it too?
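
Roughly, `for file in pages/*` works like this. The shell first expands `pages/*` into the sorted list of every file name in the pages directory. It then runs the loop body once per name, with `$file` set to that name each time. Here is a small illustration, assuming bash. `list_pages` is a hypothetical name, and its body only prints each name where the real loop would test the page.

```shell
#!/usr/bin/env bash
# Hypothetical illustration of `for file in pages/*`:
# the shell expands "$dir"/* into the sorted list of names in that directory,
# then runs the loop body once per name, with $file holding the current name.
# Usage: list_pages [PAGES_DIR]
list_pages() {
    local dir="${1:-pages}"
    local file
    for file in "$dir"/*; do
        [ -e "$file" ] || continue    # empty directory: the glob stays literal, skip it
        echo "processing $file"       # the real loop would test this page here
    done
}
```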

About this step: 'if pdftotext "$file" - | grep -i regexps' -- does this copy
all the PDF text into one text file and then search (grep) that file?  Does it
take the text from many single-page PDFs, or does it only run after the "hit"
pages are joined into one document?
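
The command itself copies nothing into a text file. The `-` output argument tells pdftotext to write the text to standard output, and the pipe feeds it straight into grep. If the command sits inside the `for` loop, it runs once per single-page PDF. `if` then looks only at the pipeline's exit status, which is grep's: success when at least one line matches. Here is a sketch of that test on its own, assuming bash. `page_matches` is a hypothetical name, and `-q` is added so grep prints nothing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit status 0 if one PDF file contains text matching
# REGEX (case-insensitive), non-zero otherwise. Nothing is written to disk:
#   pdftotext FILE -   "-" sends the extracted text to standard output
#   | grep -q -i -e    reads that text from the pipe; -q prints nothing,
#                      -i ignores case, -e marks the pattern
# Without `set -o pipefail`, the pipeline's status is grep's, which `if` tests.
# Usage: if page_matches pages/page_0001.pdf 'some phrase'; then echo hit; fi
page_matches() {
    local file="$1" pattern="$2"
    pdftotext "$file" - | grep -q -i -e "$pattern"
}
```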

What does it mean to "append the file to a Shell variable"?  What is the
goal of this step?  Could you please explain how I can do it too?
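
One plausible reading of that step: inside the loop, each matching page is added to a list held in a shell variable. After the loop, the list names every "hit" page and can be handed to a tool that joins them into one PDF. Here is a sketch under those assumptions, in bash. It uses an array, appending with `hits+=("$file")`, and pdftk's `cat` operation for the merge. `collect_hits` is a hypothetical name, and it assumes pdftotext and pdftk are installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: find every single-page PDF in a directory whose text
# matches a regex (case-insensitive), remember each one by appending it to
# a bash array, then join the hits into one PDF with pdftk.
# Usage: collect_hits REGEX [PAGES_DIR] [OUTPUT_PDF]
collect_hits() {
    local pattern="$1" dir="${2:-pages}" out="${3:-hits.pdf}"
    local file
    local -a hits=()                  # starts empty; one entry per matching page

    for file in "$dir"/*.pdf; do      # glob order = page order with zero-padded names
        [ -e "$file" ] || continue    # no .pdf files: skip the literal pattern
        if pdftotext "$file" - | grep -q -i -e "$pattern"; then
            hits+=("$file")           # "append the file to a shell variable"
        fi
    done

    if [ "${#hits[@]}" -eq 0 ]; then
        echo "no pages match: $pattern" >&2
        return 1
    fi

    printf '%s\n' "${hits[@]}"              # list the matching pages
    pdftk "${hits[@]}" cat output "$out"    # join them, in list order
}
```

For example, `collect_hits 'some phrase' pages hits.pdf` would list the hit pages and write them to hits.pdf. An array is used instead of a plain string such as `hits="$hits $file"` so that file names containing spaces stay intact. In a plain POSIX sh without arrays, the string form is the usual fallback.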

