How to convert PDF File to CSV File using iText API in Java

In this example I am going to show you how to convert a PDF file to a CSV file. I will read the PDF file using the iText library and write the data to the CSV file using the Java programming language. In my previous example I showed how to convert a CSV file to a PDF file using the iText library.

CSV stands for comma-separated values, so I assume that the PDF file holds its data in tabular format, which will be converted into comma-separated values.

Prerequisites

At least Java 1.8, Gradle 6.5.1 or Maven 3.6.3, iText library 5.5.13.1

Convert PDF to CSV

I will use the same PDF file that was generated in the example on how to convert a CSV file to a PDF file.

In the code below, the line PdfReader pdfReader = new PdfReader("student.pdf"); reads the PDF file from the project’s root directory.

I first determine the number of pages and then loop through each page to extract the content from the PDF file using the line below:

String content = PdfTextExtractor.getTextFromPage(pdfReader, i);

Next I skip the title part of the table content, which is the first line of each page. Then I split the content into lines, replace every white space in each line with a comma (,) and write the line to the CSV file.

package com.roytuts.java.pdf.to.csv;

import java.io.FileWriter;
import java.io.IOException;

import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfTextExtractor;

public class PdfToCsvConverter {

	public static void main(String[] args) throws IOException, DocumentException {
		// Read the PDF file from the project's root directory
		PdfReader pdfReader = new PdfReader("student.pdf");

		// try-with-resources flushes and closes the CSV file even if extraction fails
		try (FileWriter csvWriter = new FileWriter("student.csv")) {
			int pages = pdfReader.getNumberOfPages();

			// iText page numbers start at 1
			for (int i = 1; i <= pages; i++) {
				String content = PdfTextExtractor.getTextFromPage(pdfReader, i);

				String[] splitContents = content.split("\n");

				// Start at index 1 to skip the first line of each page (the title)
				for (int j = 1; j < splitContents.length; j++) {
					// Every white space becomes a column separator
					csvWriter.append(splitContents[j].replace(' ', ','));
					csvWriter.append("\n");
				}
			}
		} finally {
			// Release the PDF file
			pdfReader.close();
		}
	}

}
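
The line handling in the class above can be tried without a PDF file. The following is a minimal sketch of that step only, with no iText involved. The class name LineToCsvSketch, the method toCsvLines and the sample page text are made up for illustration; the real content of student.pdf may look different.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone sketch of the line handling only; no iText involved.
class LineToCsvSketch {

    // Mirrors the converter: drop the first line (the title) and
    // turn every space in the remaining lines into a comma.
    static List<String> toCsvLines(String pageText) {
        String[] lines = pageText.split("\n");
        List<String> csvLines = new ArrayList<>();
        for (int j = 1; j < lines.length; j++) {
            csvLines.add(lines[j].replace(' ', ','));
        }
        return csvLines;
    }

    public static void main(String[] args) {
        // Hypothetical page text, assumed only for this sketch.
        String pageText = "Student Details\n"
                + "1 Sumit 12 Park Street\n"
                + "2 Anita Kolkata";
        for (String line : toCsvLines(pageText)) {
            System.out.println(line);
        }
        // prints:
        // 1,Sumit,12,Park,Street
        // 2,Anita,Kolkata
    }
}
```

Notice that the sample address 12 Park Street is spread over three columns, because every white space is treated as a column separator.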

An address may contain white spaces that are needed for clarity. I did not handle white spaces in the address field in the code above, so each word of the address ends up in its own column. You need to handle this case instead of simply replacing every white space with a comma (,).
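
One possible way to keep the address together is sketched below. It is not part of the original example and rests on assumptions: the address is the last column, every column before it is a single word, and the number of columns is known (3 in the sample). The names AddressAwareCsvSketch, toCsvLine and quoteIfNeeded are hypothetical. The sketch splits each line into a limited number of fields, so the last field keeps its white spaces, and it wraps a field in double quotes when the field itself contains a comma.

```java
// Sketch: keep a trailing free-text column (such as an address) intact.
class AddressAwareCsvSketch {

    // Splits the line into at most columnCount fields; the last field keeps
    // its internal spaces. columnCount is an assumption about the table layout.
    static String toCsvLine(String line, int columnCount) {
        String[] fields = line.trim().split(" +", columnCount);
        StringBuilder csv = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                csv.append(',');
            }
            csv.append(quoteIfNeeded(fields[i]));
        }
        return csv.toString();
    }

    // Quotes a field that contains a comma, a double quote or a line break,
    // doubling any embedded double quotes.
    static String quoteIfNeeded(String field) {
        if (field.contains(",") || field.contains("\"") || field.contains("\n")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    public static void main(String[] args) {
        // Hypothetical rows: id, name, address (assumed layout).
        System.out.println(toCsvLine("1 Sumit 12 Park Street", 3));
        // prints: 1,Sumit,12 Park Street
        System.out.println(toCsvLine("2 Anita 5 Lake Road, Kolkata", 3));
        // prints: 2,Anita,"5 Lake Road, Kolkata"
    }
}
```

This approach breaks down when an earlier column, such as a full name, also contains white spaces. In that case the column boundaries cannot be recovered from the extracted text alone.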

Running the above class writes the student.csv file to the project’s root directory:

[Image: convert PDF to CSV using iText in Java]

Thanks for reading.
