How to Merge Multiple CSV Files into One in Java

Introduction

This tutorial shows how to merge multiple CSV files into one in Java. You may need this when business requirements call for consolidating data that is spread across several files.

Suppose there are n CSV files, each with a different set of headers. This example reads all of them and writes every record into a single CSV file. The merged file contains the union of the headers from all input files, with each header appearing only once.

Prerequisites

Java at least 8 (the build scripts below target Java 12; change the version to match your JDK), Gradle 4.10.2 – 6.8.3, Maven 3.6.3

Project Setup

Create a Gradle or Maven based project in your favorite tool or IDE. The name of the project is java-merge-multiple-csv-files.

You don’t need any special library for reading or writing CSV files, so the build script is simple:

plugins {
    id 'java-library'
}

sourceCompatibility = 12
targetCompatibility = 12

repositories {
    mavenCentral()
}

dependencies {
}

If you are creating a Maven based project, you can use the following pom.xml file:

<?xml version="1.0" encoding="UTF-8"?>

<project xmlns="http://maven.apache.org/POM/4.0.0"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
	<modelVersion>4.0.0</modelVersion>

	<groupId>com.roytuts</groupId>
	<artifactId>java-merge-multiple-csv-files</artifactId>
	<version>0.0.1-SNAPSHOT</version>

	<properties>
		<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
		<maven.compiler.source>12</maven.compiler.source>
		<maven.compiler.target>12</maven.compiler.target>
	</properties>

	<dependencies>
	</dependencies>

	<build>
		<plugins>
			<plugin>
				<groupId>org.apache.maven.plugins</groupId>
				<artifactId>maven-compiler-plugin</artifactId>
				<version>3.8.1</version>
			</plugin>
		</plugins>
	</build>
</project>

VO Class

Create a VO (value object) class that represents one CSV record as a map from header name to field value.

package com.roytuts.java.merge.multiple.csv.files.vo;

import java.util.LinkedHashMap;
import java.util.Map;

public class CsvVo {

	private Map<String, String> keyVal;

	public CsvVo(String id) {
		// id (the source file name in this example) is accepted but not stored
		keyVal = new LinkedHashMap<>();// you may also use HashMap if you don't need to keep order
	}

	public Map<String, String> getKeyVal() {
		return keyVal;
	}

	public void setKeyVal(Map<String, String> keyVal) {
		this.keyVal = keyVal;
	}

	public void put(String key, String val) {
		keyVal.put(key, val);
	}

	public String get(String key) {
		return keyVal.get(key);
	}

}

CSV Parser Class

Create a CSV parser class that reads records from CSV files and writes records back to a CSV file.

package com.roytuts.java.merge.multiple.csv.files.parser;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

import com.roytuts.java.merge.multiple.csv.files.vo.CsvVo;

public class CsvParser {

	public static List<CsvVo> getRecodrsFromACsv(File file, List<String> keys) throws IOException {
		List<CsvVo> records = new ArrayList<>();
		boolean isHeader = true;

		// try-with-resources closes the reader even if reading fails
		try (BufferedReader br = new BufferedReader(new FileReader(file))) {
			String line;
			while ((line = br.readLine()) != null) {
				if (isHeader) {// first line is header
					isHeader = false;
					continue;
				}
				CsvVo record = new CsvVo(file.getName());
				// naive split: quoted fields containing commas are not supported;
				// limit -1 keeps trailing empty fields that split(",") would drop
				String[] lineSplit = line.split(",", -1);
				// values beyond the last header are ignored instead of throwing
				for (int i = 0; i < lineSplit.length && i < keys.size(); i++) {
					record.put(keys.get(i), lineSplit[i]);
				}
				records.add(record);
			}
		}

		return records;
	}

	public static List<String> getHeadersFromACsv(File file) throws IOException {
		// try-with-resources closes the reader even if reading fails
		try (BufferedReader br = new BufferedReader(new FileReader(file))) {
			String line = br.readLine();// only the first line holds the headers
			if (line == null) {
				return new ArrayList<>();// empty file: no headers, avoids returning null
			}
			return new ArrayList<>(Arrays.asList(line.split(",")));
		}
	}

	public static void writeToCsv(final File file, final Set<String> headers, final List<CsvVo> records)
			throws IOException {
		String[] headersArr = headers.toArray(new String[0]);

		// try-with-resources flushes and closes the writer even if writing fails
		try (FileWriter csvWriter = new FileWriter(file)) {
			// write headers
			String sep = "";
			for (String header : headersArr) {
				csvWriter.append(sep);
				csvWriter.append(header);
				sep = ",";
			}

			csvWriter.append("\n");

			// write records at each line
			for (CsvVo record : records) {
				sep = "";
				for (String header : headersArr) {
					csvWriter.append(sep);
					// a header missing from this record returns null,
					// which append() writes as the text "null"
					csvWriter.append(record.get(header));
					sep = ",";
				}
				csvWriter.append("\n");
			}
		}
	}
}
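
Splitting a line on commas with String.split has one subtlety worth seeing in isolation: without a limit argument, trailing empty fields are dropped, while a limit of -1 keeps them. The following is a minimal sketch of that behavior; the class name SplitDemo and the method splitKeepingEmpty are made up for illustration and are not part of the project above.

```java
public class SplitDemo {

	// Hypothetical helper: splits on commas and keeps trailing empty fields.
	static String[] splitKeepingEmpty(String line) {
		return line.split(",", -1);
	}

	public static void main(String[] args) {
		String line = "Fred,,Krueger,";

		// Default split drops the trailing empty field.
		System.out.println(line.split(",").length); // prints 3

		// Limit -1 keeps it, so the value count matches the header count.
		System.out.println(splitKeepingEmpty(line).length); // prints 4
	}
}
```

Neither form understands quoted fields such as "Krueger, Fred". If your data may contain commas inside values, a dedicated CSV library is likely a better fit than a plain split.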

Main Class

Create a main class that will test your application.

package com.roytuts.java.merge.multiple.csv.files;

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import com.roytuts.java.merge.multiple.csv.files.parser.CsvParser;
import com.roytuts.java.merge.multiple.csv.files.vo.CsvVo;

public class CsvApplication {

	public static void main(String[] args) throws IOException {
		File csv1 = new File("C:/csv/csv1.csv");
		File csv2 = new File("C:/csv/csv2.csv");

		List<String> csv1Headers = CsvParser.getHeadersFromACsv(csv1);
		// csv1Headers.forEach(h -> System.out.print(h + " "));
		// System.out.println();
		List<String> csv2Headers = CsvParser.getHeadersFromACsv(csv2);
		// csv2Headers.forEach(h -> System.out.print(h + " "));
		// System.out.println();

		List<String> allCsvHeaders = new ArrayList<>();
		allCsvHeaders.addAll(csv1Headers);
		allCsvHeaders.addAll(csv2Headers);
		// allCsvHeaders.forEach(h -> System.out.print(h + " "));
		// System.out.println();

		Set<String> uniqueHeaders = new HashSet<>(allCsvHeaders);
		// uniqueHeaders.forEach(h -> System.out.print(h + " "));
		// System.out.println();

		List<CsvVo> csv1Records = CsvParser.getRecodrsFromACsv(csv1, csv1Headers);
		List<CsvVo> csv2Records = CsvParser.getRecodrsFromACsv(csv2, csv2Headers);

		List<CsvVo> allCsvRecords = new ArrayList<>();
		allCsvRecords.addAll(csv1Records);
		allCsvRecords.addAll(csv2Records);

		CsvParser.writeToCsv(new File("C:/csv/csv.csv"), uniqueHeaders, allCsvRecords);
	}

}
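
The main class collects the unique headers in a HashSet, which does not keep insertion order, so the column order of the merged file depends on hash codes. If you would rather keep the headers in the order they are first seen, a LinkedHashSet is one option. The sketch below assumes that is what you want; HeaderOrderDemo and mergeHeaders are illustrative names, not part of the tutorial's classes.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class HeaderOrderDemo {

	// Hypothetical helper: unique headers across all files, in first-seen order.
	static Set<String> mergeHeaders(List<List<String>> headerLists) {
		Set<String> merged = new LinkedHashSet<>();
		for (List<String> headers : headerLists) {
			merged.addAll(headers); // duplicates are ignored, order is preserved
		}
		return merged;
	}

	public static void main(String[] args) {
		List<String> csv1Headers = Arrays.asList("NAME", "MIDDLENAME", "SURNAME", "AGE");
		List<String> csv2Headers = Arrays.asList("MIDDLENAME", "NAME", "AGE", "SURNAME", "EMAIL");

		Set<String> unique = mergeHeaders(Arrays.asList(csv1Headers, csv2Headers));
		System.out.println(String.join(",", unique)); // prints NAME,MIDDLENAME,SURNAME,AGE,EMAIL
	}
}
```

Because the result is still a Set of String, it could be passed to a method with the same signature as writeToCsv without other changes.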

Testing the Application

Let’s say you have two CSV files, csv1.csv and csv2.csv, in the C:/csv directory that the main class reads from, with the following data.

csv1.csv

The first row in the following file contains the header names and subsequent rows contain values.

Notice that this file has four header fields with corresponding values in the rows below; another CSV file may have more or fewer header fields.

NAME,MIDDLENAME,SURNAME,AGE
Jason,Noname,Scarry,16

csv2.csv

As noted above, the other CSV file may have more or fewer header fields.

The file below has five header fields, with values in the subsequent rows.

MIDDLENAME,NAME,AGE,SURNAME,EMAIL
,Fred,Unknown,Krueger,fred.krueger@email.com
Noname,Jason,16,Scarry,jason.scarry@email.com

csv.csv

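Merging the two files above produces the output below. The EMAIL column is null for the record from csv1.csv because that file has no EMAIL header, and the column order comes from the HashSet's iteration order rather than from either input file.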

SURNAME,MIDDLENAME,EMAIL,NAME,AGE
Scarry,Noname,null,Jason,16
Krueger,,fred.krueger@email.com,Fred,Unknown
Scarry,Noname,jason.scarry@email.com,Jason,16

The merged file csv.csv contains each header only once but holds the data from both csv1.csv and csv2.csv.
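
The null in the merged output comes from the writer: appending a null CharSequence to a java.io.Writer writes the four characters null. If an empty field is preferable for columns a source file does not have, the value can be checked before it is written. The sketch below shows one way to do that on a plain map; NullFieldDemo and toCsvLine are hypothetical names used only for this illustration.

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NullFieldDemo {

	// Hypothetical helper: builds one CSV line, writing "" for missing columns.
	static String toCsvLine(Map<String, String> record, List<String> headers) {
		StringBuilder sb = new StringBuilder();
		String sep = "";
		for (String header : headers) {
			String value = record.get(header); // null when the record lacks this column
			sb.append(sep).append(value == null ? "" : value);
			sep = ",";
		}
		return sb.toString();
	}

	public static void main(String[] args) {
		// Appending null to a Writer produces the text "null".
		System.out.println(new StringWriter().append((CharSequence) null)); // prints null

		Map<String, String> record = new LinkedHashMap<>();
		record.put("NAME", "Jason");
		record.put("AGE", "16");

		System.out.println(toCsvLine(record, Arrays.asList("NAME", "EMAIL", "AGE"))); // prints Jason,,16
	}
}
```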

That’s all. You should now have an idea of how to merge multiple CSV files into one in Java.

The same approach applies when you have more than two CSV files: collect the headers of every file into one set and append each file's records to the combined list.
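
To make the more-than-two-files case concrete, here is a minimal sketch that merges any number of inputs in a loop. It is not the tutorial's code: the class MultiCsvMergeSketch and the method mergeLines are illustrative names, each input is assumed to be a list of lines with the header line first, missing columns are written as empty fields rather than null, and quoted fields are not handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class MultiCsvMergeSketch {

	// Hypothetical helper: merges the lines of any number of CSV files.
	static List<String> mergeLines(List<List<String>> files) {
		Set<String> headers = new LinkedHashSet<>(); // unique headers, first-seen order
		List<Map<String, String>> records = new ArrayList<>();

		for (List<String> lines : files) {
			if (lines.isEmpty()) {
				continue; // empty file: no headers, no records
			}
			String[] fileHeaders = lines.get(0).split(",", -1);
			headers.addAll(Arrays.asList(fileHeaders));

			for (String line : lines.subList(1, lines.size())) {
				if (line.isEmpty()) {
					continue; // skip blank lines
				}
				String[] values = line.split(",", -1);
				Map<String, String> record = new LinkedHashMap<>();
				// pair each value with the header at the same position
				for (int i = 0; i < values.length && i < fileHeaders.length; i++) {
					record.put(fileHeaders[i], values[i]);
				}
				records.add(record);
			}
		}

		List<String> merged = new ArrayList<>();
		merged.add(String.join(",", headers));
		for (Map<String, String> record : records) {
			List<String> fields = new ArrayList<>();
			for (String header : headers) {
				String value = record.get(header);
				fields.add(value == null ? "" : value); // empty when the file lacks this column
			}
			merged.add(String.join(",", fields));
		}
		return merged;
	}

	public static void main(String[] args) {
		List<String> merged = mergeLines(Arrays.asList(
				Arrays.asList("NAME,AGE", "Jason,16"),
				Arrays.asList("NAME,EMAIL", "Fred,fred@example.com"),
				Arrays.asList("CITY,NAME", "Oslo,Ann")));
		merged.forEach(System.out::println);
	}
}
```

With the three in-memory inputs above, the merged header line is NAME,AGE,EMAIL,CITY, followed by one line per record. To work with real files, the lines of each file can be obtained with java.nio.file.Files.readAllLines and the result written with Files.write.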
