Split PDF document from command line in Linux?

I would like to extract page ranges from a PDF document into a new PDF document using the command line in Linux. Note that:

$ pdftk input.pdf cat 1 verbose output output.pdf
Error: Failed to open PDF file: input.pdf
Errors encountered. No output created.
Done. Input errors, so no output created.

Turns out that "You (should) know that Pdftk is nothing more than a very old version of iText.... The keywords in the above statement are "VERY OLD"." (from pdftk can't open pdf file)

$ java -classpath /path/to/Multivalent20091027.jar tool.pdf.Split -page 1 input.pdf
Exception in thread "main" java.lang.NoClassDefFoundError: tool/pdf/Split
Caused by: java.lang.ClassNotFoundException: tool.pdf.Split
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
Could not find the main class: tool.pdf.Split. Program will exit.

It turns out this software is a bit tricky: although it is hosted on SourceForge and states that "Practical Thought generously provides these tools for free use on the command line", it also says: "The browser is open source. The document tools are a free bonus and not open source." That finally clarifies the comment from conversion - Gluing (Imposition) PDF documents - Stack Overflow:

All releases of Multivalent linked from the official sourceforge site are missing the tools package.

(edit: there seems to be an old Multivalent version with the tools included, see the SO link; but as it looks somewhat like abandonware, I'd rather not use it)

  • Finally, I'd like to avoid tools that are essentially front-ends for LaTeX, like PDFjam

 

So, are there any options for such a pdf-splitting command line tool under Linux?


4 Answers

I find pdfseparate very convenient to split ranges into individual pages. This command would extract pages 1 - 5 of input.pdf into files named output-page1.pdf, output-page2.pdf, ...

pdfseparate -f 1 -l 5 input.pdf output-page%d.pdf

If you want to recombine them into page ranges, for example pages 1-3 in one document and pages 4-5 in another, you can use the companion program, pdfunite, as follows:

pdfunite output-page1.pdf output-page2.pdf output-page3.pdf final-pages1-3.pdf
pdfunite output-page4.pdf output-page5.pdf final-pages4-5.pdf
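The two steps can also be wrapped into one helper that extracts a page range straight into a single file. This is only a sketch: the function name extract_range and the temporary file names are made up for illustration, and it assumes pdfseparate and pdfunite are on your PATH.

```shell
# Sketch: extract pages FIRST..LAST of INPUT into one OUTPUT file.
# extract_range is a hypothetical helper name, not part of poppler.
# The body runs in a subshell, so its variables do not leak to the caller.
extract_range() (
    first=$1; last=$2; input=$3; output=$4
    tmpdir=$(mktemp -d) || exit 1
    status=0

    # Write one single-page PDF per page: page-1.pdf, page-2.pdf, ...
    if pdfseparate -f "$first" -l "$last" "$input" "$tmpdir/page-%d.pdf"; then
        # Build the file list in numeric order (a glob would sort
        # page-10.pdf before page-2.pdf).
        set --
        n=$first
        while [ "$n" -le "$last" ]; do
            set -- "$@" "$tmpdir/page-$n.pdf"
            n=$((n + 1))
        done
        if [ "$#" -eq 1 ]; then
            # A single page needs no merging.
            cp "$1" "$output" || status=1
        else
            pdfunite "$@" "$output" || status=1
        fi
    else
        status=1
    fi

    rm -rf "$tmpdir"
    exit "$status"
)

# Example: extract_range 1 3 input.pdf final-pages1-3.pdf
```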

I believe these tools are part of poppler (usually packaged as poppler-utils) and may already be installed on your system.


Using pdftk 2.02 worked for me on Debian, and I think it should work for you too.

pdftk input.pdf cat 2-4 output out1.pdf

For the general case where you have to split a single PDF into multiple files, I could not find a way with pdftk alone, so I'm using a Bash script.
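The script itself is not shown in this answer. A minimal sketch of what such a wrapper might look like is below: it calls the same pdftk cat command once per page range. The function name split_ranges and the part1.pdf, part2.pdf, ... output names are assumptions for illustration.

```shell
# Sketch: split one PDF into several files, one per page range.
# split_ranges and the partN.pdf naming are hypothetical choices.
split_ranges() (
    input=$1
    shift
    i=1
    for range in "$@"; do
        # Same invocation as above, repeated for each range.
        pdftk "$input" cat "$range" output "part$i.pdf" || exit 1
        i=$((i + 1))
    done
)

# Example: split_ranges input.pdf 1-3 4-5
# would write pages 1-3 to part1.pdf and pages 4-5 to part2.pdf.
```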

I'll put this as an answer, so as not to clog the question: here is a related link on unix.se:

... and the accepted answer uses a Python script with PyPDF (but that answer implements a split of one page into two - and that script thus needs to be modified for page ranges, for it to work as asked in OP).

EDIT: I just found this: Stapler - A python utility for manipulating PDF docs based on pypdf (Page 3) / Community Contributions / Arch Linux Forums; which is, apparently "A small utility making use of the pypdf library to provide a (somewhat) lighter alternative to pdftk" (note that the mailing list notes some problems with it, however)...

You can use the pdfjam tool with the syntax

pdfjam <input-file> <page-ranges> -o <output-file>

and an example of page ranges would be

3,67-70,80

Source: by Vincent Nivoliers
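Putting the syntax and the example range together, a concrete call could be wrapped like this. The function name extract_pages and the file names are placeholders; the page-range string should be quoted so the shell passes it through as one argument.

```shell
# Sketch: thin wrapper around the pdfjam syntax shown above.
# extract_pages is a hypothetical name used only for this example.
extract_pages() {
    # $1 = input file, $2 = page ranges (e.g. '3,67-70,80'), $3 = output file
    pdfjam "$1" "$2" -o "$3"
}

# Example: extract_pages input.pdf '3,67-70,80' output.pdf
```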
