Home » What’s the easiest way to merge (server-side) a collection of PDF documents into one big PDF document in JAVA

What’s the easiest way to merge (server-side) a collection of PDF documents into one big PDF document in JAVA

Solutons:


@J D OConal, thanks for the tip, the article you sent me was very outdated, but it did point me towards iText. I found this page that explains how to do exactly what I need:
http://java-x.blogspot.com/2006/11/merge-pdf-files-with-itext.html

Thanks for the other answers, but I don’t really want to have to spawn other processes if I can avoid it, and our project already has itext.jar, so I’m not adding any external dependancies

Here’s the code I ended up writing:

public class PdfMergeHelper {

    /**
     * Merges the passed in PDFs, in the order that they are listed in the java.util.List.
     * Writes the resulting PDF out to the OutputStream provided.
     * 
     * Sample Usage:
     * List<InputStream> pdfs = new ArrayList<InputStream>();
     * pdfs.add(new FileInputStream("/location/of/pdf/OQS_FRSv1.5.pdf"));
     * pdfs.add(new FileInputStream("/location/of/pdf/PPFP-Contract_Genericv0.5.pdf"));
     * pdfs.add(new FileInputStream("/location/of/pdf/PPFP-Quotev0.6.pdf"));
     * FileOutputStream output = new FileOutputStream("/location/to/write/to/merge.pdf");
     * PdfMergeHelper.concatPDFs(pdfs, output, true);
     * 
     * @param streamOfPDFFiles the list of files to merge, in the order that they should be merged
     * @param outputStream the output stream to write the merged PDF to
     * @param paginate true if you want page numbers to appear at the bottom of each page, false otherwise
     */
    public static void concatPDFs(List<InputStream> streamOfPDFFiles, OutputStream outputStream, boolean paginate) {
        Document document = new Document();
        try {
            List<InputStream> pdfs = streamOfPDFFiles;
            List<PdfReader> readers = new ArrayList<PdfReader>();
            int totalPages = 0;
            Iterator<InputStream> iteratorPDFs = pdfs.iterator();

            // Create Readers for the pdfs.
            while (iteratorPDFs.hasNext()) {
                InputStream pdf = iteratorPDFs.next();
                PdfReader pdfReader = new PdfReader(pdf);
                readers.add(pdfReader);
                totalPages += pdfReader.getNumberOfPages();
            }
            // Create a writer for the outputstream
            PdfWriter writer = PdfWriter.getInstance(document, outputStream);

            document.open();
            BaseFont bf = BaseFont.createFont(BaseFont.HELVETICA, BaseFont.CP1252, BaseFont.NOT_EMBEDDED);
            PdfContentByte cb = writer.getDirectContent(); // Holds the PDF
            // data

            PdfImportedPage page;
            int currentPageNumber = 0;
            int pageOfCurrentReaderPDF = 0;
            Iterator<PdfReader> iteratorPDFReader = readers.iterator();

            // Loop through the PDF files and add to the output.
            while (iteratorPDFReader.hasNext()) {
                PdfReader pdfReader = iteratorPDFReader.next();

                // Create a new page in the target for each source page.
                while (pageOfCurrentReaderPDF < pdfReader.getNumberOfPages()) {
                    document.newPage();
                    pageOfCurrentReaderPDF++;
                    currentPageNumber++;
                    page = writer.getImportedPage(pdfReader, pageOfCurrentReaderPDF);
                    cb.addTemplate(page, 0, 0);

                    // Code for pagination.
                    if (paginate) {
                        cb.beginText();
                        cb.setFontAndSize(bf, 9);
                        cb.showTextAligned(PdfContentByte.ALIGN_CENTER, "" + currentPageNumber + " of " + totalPages,
                                520, 5, 0);
                        cb.endText();
                    }
                }
                pageOfCurrentReaderPDF = 0;
            }
            outputStream.flush();
            document.close();
            outputStream.close();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            if (document.isOpen()) {
                document.close();
            }
            try {
                if (outputStream != null) {
                    outputStream.close();
                }
            } catch (IOException ioe) {
                ioe.printStackTrace();
            }
        }
    }
}

I’ve used pdftk to great effect. It’s an external application that you’ll have to run from your java app.

iText seems to have changed and now has commercial licencing requirements, along with not that good help (Want documentation? Buy our book!).

We ended up finding PDFSharp http://www.pdfsharp.net/ and using that. The sample for concatenating multiple pdf documents together is simple and easy to follow: http://www.pdfsharp.net/wiki/ConcatenateDocuments-sample.ashx

Enjoy
Random

Related Solutions

Explaining computational complexity theory

Hoooo, doctoral comp flashback. Okay, here goes. We start with the idea of a decision problem, a problem for which an algorithm can always answer "yes" or "no." We also need the idea of two models of computer (Turing machine, really): deterministic and...

Building a multi-level menu for umbraco

First off, no need pass the a parent parameter around. The context will transport this information. Here is the XSL stylesheet that should solve your problem: <!-- update this variable on how deep your menu should be --> <xsl:variable...

How to generate a random string?

My favorite way to do it is by using /dev/urandom together with tr to delete unwanted characters. For instance, to get only digits and letters: tr -dc A-Za-z0-9 </dev/urandom | head -c 13 ; echo '' Alternatively, to include more characters from the OWASP...

How to copy a file from a remote server to a local machine?

The syntax for scp is: If you are on the computer from which you want to send file to a remote computer: scp /file/to/send username@remote:/where/to/put Here the remote can be a FQDN or an IP address. On the other hand if you are on the computer wanting to...

What is the difference between curl and wget?

The main differences are: wget's major strong side compared to curl is its ability to download recursively. wget is command line only. There's no lib or anything, but curl's features are powered by libcurl. curl supports FTP, FTPS, HTTP, HTTPS, SCP, SFTP, TFTP,...

Using ‘sed’ to find and replace [duplicate]

sed is the stream editor, in that you can use | (pipe) to send standard streams (STDIN and STDOUT specifically) through sed and alter them programmatically on the fly, making it a handy tool in the Unix philosophy tradition; but can edit files directly, too,...

How do I loop through only directories in bash?

You can specify a slash at the end to match only directories: for d in */ ; do echo "$d" done If you want to exclude symlinks, use a test to continue the loop if the current entry is a link. You need to remove the trailing slash from the name in order for -L to...

How to clear journalctl

The self maintenance method is to vacuum the logs by size or time. Retain only the past two days: journalctl --vacuum-time=2d Retain only the past 500 MB: journalctl --vacuum-size=500M man journalctl for more information. You don't typically clear the journal...

How can I run a command which will survive terminal close?

One of the following 2 should work: $ nohup redshift & or $ redshift & $ disown See the following for a bit more information on how this works: man nohup help disown Difference between nohup, disown and & (be sure to read the comments too) If your...

Get exit status of process that’s piped to another

bash and zsh have an array variable that holds the exit status of each element (command) of the last pipeline executed by the shell. If you are using bash, the array is called PIPESTATUS (case matters!) and the array indicies start at zero: $ false | true $...

Execute vs Read bit. How do directory permissions in Linux work?

When applying permissions to directories on Linux, the permission bits have different meanings than on regular files. The read bit (r) allows the affected user to list the files within the directory The write bit (w) allows the affected user to create, rename,...

What are the pros and cons of Vim and Emacs? [closed]

I use both, although if I had to choose one, I know which one I would pick. Still, I'll try to make an objective comparison on a few issues. Available everywhere? If you're a professional system administrator who works with Unix systems, or a power user on...

How do I use pushd and popd commands?

pushd, popd, and dirs are shell builtins which allow you manipulate the directory stack. This can be used to change directories but return to the directory from which you came. For example start up with the following directories: $ pwd /home/saml/somedir $ ls...

How to forward X over SSH to run graphics applications remotely?

X11 forwarding needs to be enabled on both the client side and the server side. On the client side, the -X (capital X) option to ssh enables X11 forwarding, and you can make this the default (for all connections or for a specific connection) with ForwardX11 yes...

What does “LC_ALL=C” do?

LC_ALL is the environment variable that overrides all the other localisation settings (except $LANGUAGE under some circumstances). Different aspects of localisations (like the thousand separator or decimal point character, character set, sorting order, month,...

What is a bind mount?

What is a bind mount? A bind mount is an alternate view of a directory tree. Classically, mounting creates a view of a storage device as a directory tree. A bind mount instead takes an existing directory tree and replicates it under a different point. The...

Turn off buffering in pipe

Another way to skin this cat is to use the stdbuf program, which is part of the GNU Coreutils (FreeBSD also has its own one). stdbuf -i0 -o0 -e0 command This turns off buffering completely for input, output and error. For some applications, line buffering may...