How to Generate PDFs in Python for Google App Engine

One of my last projects, built on Google App Engine (GAE) and Python, involved storing form data in the GAE datastore and generating PDF documents that the user can download. Storing the data was the easier part, since Google's big data API is pretty well documented; the trickier part was converting that data to PDF using Python.


This was especially difficult because GAE does not provide an easy mechanism for writing to disk, which most PDF generation libraries require. To share what I learned, I'm writing this post about how to generate PDFs in Python for Google App Engine.

The solution I came across was, as far as I know, the only workable way of generating PDFs in Python on GAE. There are about three PDF generation utilities in Python, each suited to a different use:

  • ReportLab PDF library: This is the ideal library if you want to create a PDF from scratch. It provides objects like canvas, pdfmetrics and ttfonts that help you add lines, shapes, images and paragraphs. It is roughly comparable to the comprehensive iText Java library or its C# port, iTextSharp. Its documentation is also good.
  • xhtml2pdf: If you simply want to convert an existing HTML document to PDF, the xhtml2pdf library comes in very handy.
  • pyPdf: If all you want to do is merge two PDFs page by page quickly, this library is the way to go.

I figured out after researching the above three libraries that a combination of xhtml2pdf and pyPdf was what I needed. Since I already had the HTML document template ready, I just put in placeholders for my form data, like __name__, __occupation__, etc., so that I could fill these in before converting to PDF.
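The placeholder idea can be sketched with the standard library alone. This is a minimal illustration, not the code from my project: the helper name fill_placeholders and the sample values are made up for the example, and it is written for Python 3 (on the old Python 2.7 runtime, cgi.escape would stand in for html.escape). It escapes each value so that form input cannot inject markup into the page.

```python
import html
import re

# Matches markers such as __name__ or __first_name__.
_MARKER = re.compile(r"__(\w+?)__")


def fill_placeholders(template_html, values):
    """Replace each __key__ marker with the HTML-escaped value for that key."""
    def substitute(match):
        key = match.group(1)
        if key not in values:
            return match.group(0)  # leave unknown markers untouched
        return html.escape(str(values[key]))

    # A single regex pass, so text inserted for one marker is never re-scanned.
    return _MARKER.sub(substitute, template_html)


page = fill_placeholders(
    "<p>__name__ works as __occupation__</p>",
    {"name": "Ann & Bob", "occupation": "baker"},
)
```

Doing the substitution in one pass also means a value that happens to contain something like __address__ is not expanded a second time, which can happen with chained replace() calls.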

Now, I could fill in these values from my Python program, but the real challenge was storing the resulting PDF to disk, which Google App Engine did not allow. It turns out we don't need to store anything to disk at all. By sending the CreatePDF() output to a StringIO object, which lives in memory instead of on the filesystem, I could bypass the disk entirely.

sourceHtml = unicode(, errors='ignore')
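sourceHtml = template.render(tvals) #render the html template
sourceHtml = sourceHtml.replace('__name__',sname) #fill in the placeholders
sourceHtml = sourceHtml.replace('__address__',saddress)
sourceHtml = sourceHtml.replace('__occupation__',will.occupation)
packet = StringIO.StringIO() #write to memory
pisa.CreatePDF(sourceHtml, dest=packet) #convert html to pdf in memory (call reconstructed from the text above)
packet.seek(0) #rewind so the pdf can be read back from the start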
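The in-memory trick is not tied to any one library: anything that writes to a file-like object can be pointed at a buffer instead of a file. Here is a small sketch of that pattern, assuming Python 3, where io.BytesIO plays the role of StringIO.StringIO. The names render_to_memory and fake_writer are hypothetical, and fake_writer is only a stand-in that writes a few bytes, not a real PDF generator.

```python
import io


def render_to_memory(write_pdf):
    """Call write_pdf(dest) with an in-memory buffer and return the bytes written."""
    buffer = io.BytesIO()
    write_pdf(buffer)
    return buffer.getvalue()


def fake_writer(dest):
    # Stand-in for a real generator: it only needs dest to have a write() method.
    dest.write(b"%PDF-1.4\n")
    dest.write(b"%%EOF\n")


pdf_bytes = render_to_memory(fake_writer)
```

With xhtml2pdf, the callable would presumably be something like lambda dest: pisa.CreatePDF(source_html, dest=dest), where source_html is your filled-in template.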

Now, it would have been simple to just call self.response.write(packet.getvalue()) to send this PDF to the user as a download, but in my case I had to merge the generated PDF with a template PDF containing symbols, images and page numbers that, for some reason, could not be placed in the HTML document. So I had to create a PdfFileReader object (courtesy of the pyPdf library!) and then merge each page of my generated document with the template document. And where do I write this merged output? Any guesses? Another StringIO object! Finally, I write this StringIO object to self.response so the user can download it.
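new = PdfFileReader(packet) #generated pdf
templatePdf = PdfFileReader(open("template.pdf", "rb")) #template pdf (renamed so it does not shadow the html template)
output = PdfFileWriter() #writer for the merged pdf
for i in range(new.getNumPages()):
    #loop body missing from the original listing; reconstructed from the text above,
    #assuming the template has at least as many pages as the generated pdf
    page = templatePdf.getPage(i)
    page.mergePage(new.getPage(i)) #draw the generated page over the template page
    output.addPage(page)

outputStream = StringIO.StringIO()
output.write(outputStream) #write merged output to the StringIO object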
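One thing to watch in the page-by-page merge is what happens when the generated document has more pages than the template. The sketch below is one way to handle that, not the code from my project: merge_onto_template is a hypothetical helper, and it only assumes that the reader and writer objects expose the pyPdf-style methods getNumPages(), getPage(i), mergePage(page) and addPage(page).

```python
def merge_onto_template(generated, template, writer):
    """Overlay each generated page on the matching template page.

    Pages beyond the end of the template are added to the writer unmerged.
    """
    template_pages = template.getNumPages()
    for i in range(generated.getNumPages()):
        page = generated.getPage(i)
        if i < template_pages:
            background = template.getPage(i)
            # mergePage appends the other page's content, so it is drawn on top.
            background.mergePage(page)
            page = background
        writer.addPage(page)
    return writer
```

Taking the reader and writer as arguments keeps the helper independent of where the PDFs come from, so the same function works with files on disk or with in-memory buffers.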

self.response.headers['Content-Type'] = 'application/pdf'
fname = ( if mirror=='n' else will.partner)
self.response.headers['Content-Disposition'] = 'attachment; filename=' + str(fname).replace(' ','_') + '.pdf'
self.response.write(outputStream.getvalue()) #send the merged pdf to the user
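Since the download filename is built from form data, it may be worth cleaning it a little more than just replacing spaces. The following is a rough sketch under my own assumptions: pdf_attachment_header is a hypothetical helper, and the set of allowed characters is deliberately conservative (ASCII letters, digits, underscore, dot and hyphen).

```python
import re


def pdf_attachment_header(name, fallback="document"):
    """Build a Content-Disposition value with a conservative ASCII filename."""
    # Collapse runs of whitespace into single underscores.
    safe = re.sub(r"\s+", "_", name.strip())
    # Drop everything outside a small safe set (quotes, semicolons, non-ASCII, ...).
    safe = re.sub(r"[^A-Za-z0-9_.-]", "", safe)
    if not safe:
        safe = fallback
    return 'attachment; filename="%s.pdf"' % safe


header = pdf_attachment_header("Jane  Q. Doe")
```

The result can be assigned to self.response.headers['Content-Disposition'] in place of the string concatenation above.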

Remember to import the libraries below before you do this:

import StringIO
import xhtml2pdf.pisa as pisa
from pyPdf import PdfFileWriter,PdfFileReader
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import A4
from reportlab.pdfbase import pdfmetrics,ttfonts



10 thoughts on “How to Generate PDFs in Python for Google App Engine”

  1. Nice article. I would say this is one of the best articles I encountered in a web search. I tried this, but I am still facing some difficulties. Will this work without using a Django app? I want to use it in a Google App Engine project.

  2. Hello, thanks for your post.
    I cannot install the xhtml2pdf package on Google App Engine.
    When I deploy the project, I get the following error.

    running build_ext
    The headers or library files could not be found for zlib,
    a required dependency when compiling Pillow from source.
    Please see the install instructions at:
    Traceback (most recent call last):
    File “”, line 1, in
    File “/tmp/pip_build_dservice/Pillow/”, line 804, in
    raise RequiredDependencyException(msg)

    And this is my requirements.txt file.

    And this is my app.yaml file.
    - name: MySQLdb
      version: "1.2.5"
    - name: PIL
      version: "1.1.7"

    Expect your quick reply.

    • If you cannot perform a ‘standard install’ using pip, then just download their repo from the GitHub link below and copy the xhtml2pdf package folder into your project root:

      If I remember correctly, that’s how I had used it last time. Also, google app-engine doesn’t allow a pip install of all python packages, but only those in the approved list from Google, at least that’s how it was the last time I’d used it.

    • None that I’m aware of. Your problem relates to google app-engine’s restriction of python package installs to their own filtered list. I’m not aware of any way to install custom packages except the one that I suggested (copy the package folder to your own app folder).

      Try posting in appengine specific forums or google groups to find more about it.

  3. Is this possible on Google App Engine?
    As far as I know, it is only possible on Google Compute Engine.
    Maybe I have to create a Compute Engine instance?

  4. Which packages do I have to upload?
    The xhtml2pdf install contains the following packages:
    html5lib, httplib2, PIL, pip, pkg_resources, PyPDF2, reportlab, setuptools, webencodings, xhtml2pdf.

    • When I used it with App Engine, I copied all of the following into my source folder:

      1. xhtml2pdf
      2. reportlab
      3. PyPDF (now PyPDF2)
      4. html5lib

      But that was back in 2014, when App Engine hardly allowed you to install any official Python packages at all and my app made extensive use of all of them. I don’t know about the current situation. You can still copy all four of them, or try installing some the “official” App Engine way first and then copy the rest; it’s up to you! If you are in doubt, I’d recommend copying all four of them.
