How to process large JSON files in Java

Processing large JSON files can be challenging due to memory constraints and performance issues. In this tutorial, we explore best practices for efficiently handling large JSON files in Java.

Why Handle Large JSON Files Efficiently?

Large JSON files can quickly exhaust memory and slow down applications if processed naïvely. By employing streaming APIs and other optimizations, you can reduce memory usage and enhance performance.

Prerequisites

  • Java Development Kit (JDK) installed
  • Jackson library for JSON processing

Add the following dependency to your Maven project:

<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-databind</artifactId>
    <version>2.15.0</version>
</dependency>

Example JSON File

Suppose we have a JSON file named large-file.json with the following structure:

[
  { "id": 1, "name": "Alice", "email": "alice@example.com" },
  { "id": 2, "name": "Bob", "email": "bob@example.com" },
  ...
]
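If you do not have a large file at hand, a small generator can produce one for trying out the approaches below. This is a minimal sketch, assuming synthetic records with the same shape as the sample above. The class name SampleJsonWriter, the user1, user2, ... names and the example.com addresses are all hypothetical. Records are written one at a time through a BufferedWriter, so memory use does not grow with the number of records.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SampleJsonWriter {

    // Writes `count` synthetic records as a JSON array, one object per line.
    // The values are plain ASCII, so no JSON string escaping is needed here.
    public static void writeSample(Path target, int count) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            out.write("[");
            out.newLine();
            for (int i = 1; i <= count; i++) {
                out.write("  { \"id\": " + i + ", \"name\": \"user" + i
                        + "\", \"email\": \"user" + i + "@example.com\" }");
                if (i < count) {
                    out.write(","); // no comma after the last element
                }
                out.newLine();
            }
            out.write("]");
            out.newLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // One million records is an arbitrary size; adjust as needed
        writeSample(Paths.get("large-file.json"), 1_000_000);
    }
}
```

Writing the text directly keeps the sketch free of dependencies; for values that may contain quotes or control characters, a JSON writer that handles escaping would be the safer choice.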

Approaches for Handling Large JSON Files

1. Streaming API with Jackson

The Jackson library provides a streaming API that processes JSON data incrementally, reducing memory consumption.

Code Example

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import java.io.File;
import java.io.IOException;

public class LargeJsonStreamExample {
    public static void main(String[] args) throws IOException {
        File jsonFile = new File("large-file.json");
        JsonFactory jsonFactory = new JsonFactory();

        try (JsonParser parser = jsonFactory.createParser(jsonFile)) {
            // isClosed() becomes true once nextToken() reaches the end of input
            while (!parser.isClosed()) {
                JsonToken token = parser.nextToken();

                if (token == JsonToken.START_OBJECT) {
                    // Read the fields of one object until its closing brace
                    while ((token = parser.nextToken()) == JsonToken.FIELD_NAME) {
                        String fieldName = parser.currentName();
                        token = parser.nextToken(); // move to the field's value
                        if (token.isStructStart()) {
                            // Skip nested objects/arrays so their braces don't end the loop early
                            parser.skipChildren();
                        } else {
                            System.out.println(fieldName + ": " + parser.getValueAsString());
                        }
                    }
                }
            }
        }
    }
}

Explanation

  • The JsonParser reads tokens from the JSON file sequentially.
  • Memory usage stays low and roughly constant regardless of file size, because the parser holds only the current token and a small read buffer.
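Printing every field works for a demo, but usually you want typed objects. A common pattern is to let the streaming parser walk the top-level array and hand each object to ObjectMapper.readValue(JsonParser, Class), so that only one record is materialized at a time. This is a sketch, assuming the file is a single top-level array of flat objects; the User record and the UserStreamReader class are hypothetical, and binding to Java records needs Jackson 2.12 or later on Java 16 or later.

```java
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Consumer;

public class UserStreamReader {

    // Hypothetical record matching the objects in large-file.json
    public record User(long id, String name, String email) {}

    private static final ObjectMapper MAPPER = new ObjectMapper();

    // Streams a top-level JSON array, binding one element at a time to User.
    // Returns the number of records passed to the handler.
    public static long forEachUser(Reader source, Consumer<User> handler) throws IOException {
        long count = 0;
        try (JsonParser parser = MAPPER.getFactory().createParser(source)) {
            if (parser.nextToken() != JsonToken.START_ARRAY) {
                throw new IOException("Expected a JSON array at the top level");
            }
            // Each iteration moves to the next object's '{'; the loop stops at the closing ']'
            while (parser.nextToken() == JsonToken.START_OBJECT) {
                // readValue consumes exactly this object and leaves the parser on its '}'
                User user = MAPPER.readValue(parser, User.class);
                handler.accept(user);
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        String json = "[{\"id\":1,\"name\":\"Alice\",\"email\":\"alice@example.com\"},"
                + "{\"id\":2,\"name\":\"Bob\",\"email\":\"bob@example.com\"}]";
        long n = forEachUser(new StringReader(json), u -> System.out.println(u.name()));
        System.out.println(n + " records");
    }
}
```

For a real file, pass something like Files.newBufferedReader(Paths.get("large-file.json")) as the Reader. The handler receives each User and can discard it afterwards, which is what keeps memory flat.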

2. Reading JSON Line by Line

For JSON files with newline-delimited objects (NDJSON), reading line by line can be effective.

Code Example

import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class NdjsonReaderExample {
    public static void main(String[] args) {
        String filePath = "large-file.ndjson";

        // Read explicitly as UTF-8 rather than relying on the platform default charset
        try (BufferedReader reader = Files.newBufferedReader(Paths.get(filePath), StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println("Processing: " + line);
                // Process each JSON object line by line
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Explanation

  • Suitable for JSON files where each line is a complete JSON object.
  • Memory-efficient as only one line is loaded at a time.
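The loop above only prints each line. To actually parse the records, each line can be handed to Jackson on its own. This is a sketch, assuming Jackson's tree model (ObjectMapper.readTree) is acceptable for per-line records; the NdjsonParser class and its skip-and-report policy for malformed lines are hypothetical choices, not part of the NDJSON format. Because every line is parsed independently, one malformed record can be reported by line number and skipped without aborting the whole run.

```java
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class NdjsonParser {

    private static final ObjectMapper MAPPER = new ObjectMapper();

    // Parses each non-blank line as a standalone JSON value and passes it to the handler.
    // Returns the 1-based numbers of the lines that failed to parse.
    public static List<Integer> parseLines(BufferedReader reader, Consumer<JsonNode> handler)
            throws IOException {
        List<Integer> badLines = new ArrayList<>();
        String line;
        int lineNumber = 0;
        while ((line = reader.readLine()) != null) {
            lineNumber++;
            if (line.isBlank()) {
                continue; // tolerate empty lines, e.g. a trailing newline
            }
            try {
                handler.accept(MAPPER.readTree(line));
            } catch (JsonProcessingException e) {
                badLines.add(lineNumber); // record the failure and keep going
            }
        }
        return badLines;
    }

    public static void main(String[] args) throws IOException {
        String data = "{\"id\":1,\"name\":\"Alice\"}\n{broken\n{\"id\":2,\"name\":\"Bob\"}\n";
        List<Integer> bad = parseLines(new BufferedReader(new StringReader(data)),
                node -> System.out.println(node.get("name").asText()));
        System.out.println("Malformed lines: " + bad);
    }
}
```

Whether to skip, log, or stop on a bad line is a policy decision; returning the line numbers leaves that choice to the caller.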

3. Using Jackson’s ObjectReader for Bulk Processing

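Jackson's ObjectReader offers a middle ground between raw streaming and full data binding: readValues() returns a MappingIterator that deserializes the elements of a top-level JSON array one at a time.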

Code Example

import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.File;
import java.io.IOException;
import java.util.Map;

public class ChunkedProcessingExample {
    public static void main(String[] args) throws IOException {
        File jsonFile = new File("large-file.json");
        ObjectMapper mapper = new ObjectMapper();

        // readValues() unwraps the top-level array and binds one element per call to next()
        try (MappingIterator<Map<String, Object>> iterator =
                     mapper.readerFor(Map.class).readValues(jsonFile)) {
            while (iterator.hasNext()) {
                Map<String, Object> jsonObject = iterator.next();
                System.out.println("Processing: " + jsonObject);
            }
        }
    }
}

Explanation

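  • Deserializes one array element at a time, so only the current element is held in memory as a Map.
  • Combines the convenience of data binding with much of the memory efficiency of the streaming API.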
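Binding to Map<String, Object> keeps the example generic, but every record then becomes a map of boxed values. When the record shape is known, the same MappingIterator approach works with a typed class. This is a sketch, assuming the hypothetical User record below mirrors the sample objects (records need Jackson 2.12 or later and Java 16 or later); the TypedIteratorExample class and its domain-counting logic are illustrative only.

```java
import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.ObjectReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class TypedIteratorExample {

    // Hypothetical record mirroring the objects in large-file.json
    public record User(long id, String name, String email) {}

    // ObjectReader is immutable and thread-safe, so one instance can be reused
    private static final ObjectReader USER_READER = new ObjectMapper().readerFor(User.class);

    // Binds each element of a top-level JSON array to User, one at a time,
    // and counts those whose email ends with the given suffix (e.g. "@example.com")
    public static int countWithEmailDomain(Reader source, String domain) throws IOException {
        int matches = 0;
        try (MappingIterator<User> users = USER_READER.readValues(source)) {
            // hasNextValue()/nextValue() throw IOException directly,
            // unlike hasNext()/next(), which wrap it in an unchecked exception
            while (users.hasNextValue()) {
                User user = users.nextValue();
                if (user.email() != null && user.email().endsWith(domain)) {
                    matches++;
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) throws IOException {
        String json = "[{\"id\":1,\"name\":\"Alice\",\"email\":\"alice@example.com\"},"
                + "{\"id\":2,\"name\":\"Bob\",\"email\":\"bob@example.org\"}]";
        System.out.println(countWithEmailDomain(new StringReader(json), "@example.com"));
    }
}
```

Each User is aggregated and then dropped, so memory use depends on the size of one record rather than the size of the file.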

Best Practices

  1. Prefer Streaming APIs: Use streaming for very large files to minimize memory usage.
  2. Split Large Files: When possible, divide large JSON files into smaller parts.
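  3. Optimize Data Structures: Bind records to typed classes rather than generic Map<String, Object> where possible, and avoid accumulating parsed records in collections; aggregate or write them out as you go.
  4. Validate JSON Early: Check that the file is well-formed before processing, so a malformed file fails fast instead of partway through a long job.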
  5. Monitor Memory Usage: Use tools like JVisualVM to monitor and optimize memory consumption.
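For the "Validate JSON Early" practice, a syntax check does not need to build any objects: pulling every token through the streaming parser is enough, because JsonParser throws a JsonParseException at the first malformed token. This is a sketch, assuming syntax-only validation is what you need (it does not check field names or types against a schema); the JsonSyntaxCheck class is hypothetical. By default Jackson's parser also accepts several root-level values separated by whitespace, so NDJSON files pass this check too.

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonLocation;
import com.fasterxml.jackson.core.JsonParseException;
import com.fasterxml.jackson.core.JsonParser;
import java.io.File;
import java.io.IOException;
import java.util.Optional;

public class JsonSyntaxCheck {

    private static final JsonFactory FACTORY = new JsonFactory();

    // Reads every token without building objects, so memory stays flat.
    // Returns a description of the first syntax error, or empty if the input is well-formed.
    public static Optional<String> findSyntaxError(JsonParser parser) throws IOException {
        try (parser) {
            while (parser.nextToken() != null) {
                // Nothing to do: advancing the parser is what checks the syntax
            }
            return Optional.empty();
        } catch (JsonParseException e) {
            JsonLocation loc = e.getLocation();
            String where = (loc == null) ? "unknown position" : "line " + loc.getLineNr();
            return Optional.of(where + ": " + e.getOriginalMessage());
        }
    }

    public static Optional<String> findSyntaxError(File file) throws IOException {
        return findSyntaxError(FACTORY.createParser(file));
    }

    public static Optional<String> findSyntaxError(String json) throws IOException {
        return findSyntaxError(FACTORY.createParser(json));
    }

    public static void main(String[] args) throws IOException {
        findSyntaxError(new File("large-file.json"))
                .ifPresentOrElse(err -> System.out.println("Invalid JSON at " + err),
                        () -> System.out.println("Well-formed"));
    }
}
```

This costs one extra sequential read of the file, which is usually cheap compared with discovering a truncated file after most of it has already been processed.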

Conclusion

Handling large JSON files in Java requires careful planning and the right tools. By leveraging Jackson’s streaming API, line-by-line processing, or chunked processing, you can efficiently manage large datasets without running into memory or performance issues.

Was this article helpful? We need your support to keep MasterTheBoss alive!