I recently worked on an ETL project where I often had to refer to screenshots from the legacy application. I had a folder full of screenshot images, but clicking through them one by one to find the one I wanted was inconvenient. A better solution would be to have all of the images in a single PDF file that I could just scroll through. One option would be to paste the images into a Word document and then export that to PDF, but there had to be a better way. Python to the rescue, with a learning opportunity thrown in.
In today’s digital age, working with images and documents is a common task in various fields. Sometimes, you might need to consolidate a collection of image files into a single PDF document. In this blog post, I will walk you through the process of creating a Python program to achieve just that, using the Python img2pdf library. This library simplifies the process of converting a list of image files into a PDF file, making it an excellent tool for automating PDF document creation.
What is img2pdf?
The img2pdf library provides a straightforward way to convert image files into PDF documents from Python. It supports several image formats, including PNG and JPEG, so you can build a PDF from a variety of source images. The library is lightweight, easy to use, and a valuable addition to your Python toolkit for image-to-PDF conversion tasks.
Step-by-Step Walkthrough
- Import required modules:
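To begin, we import the two Python modules used in our program, ‘os’ and ‘img2pdf’. The ‘os’ module is part of the standard library; ‘img2pdf’ is a third-party package that has to be installed first (for example with pip install img2pdf).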
import os
import img2pdf
- Define the working directory:
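We specify the path to the directory containing the image files you want to convert into a PDF. Make sure to replace dir_path with the actual path to your image directory. Note that the script opens the output file by a relative name, so the new PDF is saved in the directory you run the script from, which is the image folder only if you run it there.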
dir_path = "/Users/nigel/Documents/10_Code/Python/ImgToPdf"
- Generate a list of image files:
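We create an empty list called ‘images’ and fill it with the full paths of the files in the specified directory whose names end in “.png”. Change the extension if you are not working with PNG images. Two things to keep in mind: the ‘endswith’ check is case-sensitive, so a file named “shot.PNG” is skipped, and ‘os.listdir’ returns names in arbitrary order, so sort the list if the page order matters.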
images = []
for fname in os.listdir(dir_path):
    if fname.endswith(".png"):
        p = os.path.join(dir_path, fname)
        images.append(p)
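If the page order matters, or if some of your files use an upper-case extension, the loop above can be wrapped in a small helper. The following is only a sketch of one way to do it: the names collect_images and natural_key are my own and not part of img2pdf, and the "natural" ordering (so that shot2 comes before shot10) is an assumption about how screenshot files tend to be named.

```python
import os
import re


def natural_key(name):
    """Split a name into text and integer chunks so 'shot2' sorts before 'shot10'."""
    # re.split with a capture group alternates text and digit chunks,
    # so the two kinds always line up when two keys are compared.
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


def collect_images(dir_path, extensions=(".png",)):
    """Return full paths of matching files in dir_path, in natural filename order.

    `extensions` is a hypothetical parameter: lower-case suffixes to accept.
    """
    suffixes = tuple(extensions)  # str.endswith needs a tuple, not a list
    names = [
        fname for fname in os.listdir(dir_path)
        if fname.lower().endswith(suffixes)  # case-insensitive match
    ]
    return [os.path.join(dir_path, fname)
            for fname in sorted(names, key=natural_key)]
```

With this in place, the list could be built with images = collect_images(dir_path), or with collect_images(dir_path, extensions=(".png", ".jpg")) to pick up more than one file type.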
- Create the PDF file:
With the image file paths collected in the ‘images’ list, we can proceed to create the PDF document. We name the output PDF after the source directory, taking its base name with the ‘basename’ function from the ‘os.path’ module.
out_file = os.path.basename(dir_path)
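Two details of this step are easy to trip over: os.path.basename returns an empty string when the path ends with a slash, and a bare file name is written relative to wherever the script is run. If you would rather have the PDF land inside the image folder itself, something along these lines might work. It is a sketch only, and pdf_path_for is a hypothetical helper name of my own.

```python
import os


def pdf_path_for(dir_path):
    """Return '<folder>/<folder name>.pdf' for the given image folder."""
    # abspath normalises the path and drops any trailing separator;
    # without that, os.path.basename("/some/folder/") would be "".
    folder = os.path.abspath(dir_path)
    return os.path.join(folder, os.path.basename(folder) + ".pdf")
```

The result is a full path including the ".pdf" suffix, so it would be passed to open() as is, without appending the extension again.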
- Write the PDF file:
We open a PDF file for writing in binary mode (‘wb’) and use img2pdf’s ‘convert’ function to convert the list of image files into a single PDF. The resulting PDF data is written to the file.
with open(out_file + ".pdf", "wb") as f:
    f.write(img2pdf.convert(images))
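One side effect of opening the file first is that a failed conversion leaves an empty PDF behind. A slightly more defensive variant, sketched below under my own assumptions, converts first and only then opens the file, and it refuses to run on an empty list. The function name write_pdf and its convert parameter are hypothetical; the parameter exists only so the function can be tried out with a stand-in converter, and by default it falls back to img2pdf.convert as used above.

```python
def write_pdf(images, out_path, convert=None):
    """Convert the image paths to PDF bytes, then write them to out_path."""
    if not images:
        # Fail early with a clear message instead of producing an empty file.
        raise ValueError("no image files found; nothing to convert")
    if convert is None:
        # Imported lazily so the check above works even without img2pdf.
        import img2pdf
        convert = img2pdf.convert
    # Convert before opening the file: if this raises, no file is created.
    pdf_bytes = convert(images)
    with open(out_path, "wb") as f:
        f.write(pdf_bytes)
    return out_path
```

In the script this would replace the two lines above with a single call such as write_pdf(images, out_file + ".pdf").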
The completed code should look like this:
import os
import img2pdf

# path to folder containing images
dir_path = "/Users/nigel/Documents/10_Code/Python/ImgToPdf"

# name the output pdf file as the base folder name
out_file = os.path.basename(dir_path)

# collect the full path of every .png file in the folder
images = []
for fname in os.listdir(dir_path):
    if fname.endswith(".png"):
        p = os.path.join(dir_path, fname)
        images.append(p)

# convert the images and write the result as a single pdf
with open(out_file + ".pdf", "wb") as f:
    f.write(img2pdf.convert(images))
Conclusion
In this blog post, we’ve explored the process of using the img2pdf library to create a Python program that consolidates a collection of image files into a single PDF document. The program is barely a dozen lines long and a good example of how Python can take the tedium out of a repetitive task. Along the way we’ve covered importing Python modules, working with file paths, and using the img2pdf library for image-to-PDF conversion. With these pieces you can automate similar document-related chores in your own Python projects.