CSV Comparator User Guide

Quick Navigation

Overview
Getting Started
How to Use
Understanding Results
Use Cases
Best Practices
Privacy & Security
Troubleshooting

Overview

What is CSV Comparator?

CSV Comparator is a free browser-based tool that compares two CSV files and identifies differences with field-level precision. It is well suited to data validation, quality checks, and tracking changes over time.

Compare up to 100,000 rows per file - Large datasets process in seconds
Auto-detect identifiers - Automatically finds email, ID, or unique key columns
Field-level tracking - See exactly which fields changed and how
100% private - All processing happens in your browser, no uploads

Key Features

Instant Results: Process large files in seconds
Smart Detection: Automatically identifies unique identifier columns
Three Result Types: Missing records, new records, and changed fields
Export Reports: Download detailed comparison results
Drag & Drop: Simple, intuitive interface
Works Offline: No internet connection required after initial load

Getting Started

What You Need

A modern web browser (Chrome, Firefox, Safari, Edge)
Two CSV files to compare
Files with a unique identifier column (email, ID, customer number, etc.)

Supported File Formats

CSV files (.csv) - Comma-separated values
TSV files (.tsv) - Tab-separated values
UTF-8 encoding - Supports special characters and multiple languages
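
To illustrate the difference between the two formats, here is a minimal Python sketch, not the tool's own code. The helper name parse_delimited is hypothetical, and it assumes the delimiter can be chosen from the file extension alone.

```python
import csv
import io

def parse_delimited(text: str, filename: str) -> list[dict[str, str]]:
    """Parse CSV or TSV text into a list of row dictionaries.

    Assumption: the delimiter is chosen from the file extension only.
    """
    delimiter = "\t" if filename.lower().endswith(".tsv") else ","
    reader = csv.DictReader(io.StringIO(text, newline=""), delimiter=delimiter)
    return list(reader)

# UTF-8 text may contain non-ASCII characters such as accented names.
csv_rows = parse_delimited("Email,Name\na@example.com,Zoë\n", "before.csv")
tsv_rows = parse_delimited("Email\tName\na@example.com\tZoë\n", "before.tsv")
```

Both calls produce the same rows; only the separator between fields differs.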

File Size Limits

Maximum rows: 100,000 rows per file
Performance: Files under 50,000 rows process instantly
Recommendation: Test with sample data first for very large files

💡 Quick Start

Access the tool: https://piraiai.com/pirai-csv-comparator

No installation, registration, or setup required. Just open and start comparing!

How to Use

Upload Two CSV Files

Drag and drop or click to upload your "before" and "after" CSV files. The tool will display a preview of the first few rows.

Select Identifier Field

Choose the column that uniquely identifies each record (e.g., Email, CustomerID, ResponseID). The tool will auto-detect common identifiers.

Run Comparison

Click "Compare" to start the analysis. Processing typically takes seconds, even for large files.

Review Results

Explore three result categories: Missing records (🔴), New records (🟢), and Changed fields (🟡). View field-by-field changes.

Export Report

Download a detailed comparison report with all changes documented. Results are exported in CSV format.

💡 Pro Tip

Make sure both CSV files use the same identifier column name and format. Column names are case-sensitive: if File 1 uses "Email", File 2 must also use "Email", not "email".
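
The guide does not describe how the auto-detection in the "Select Identifier Field" step works. One plausible heuristic, sketched in Python under stated assumptions, is to pick the first column whose name contains a common hint and whose values are all non-empty and unique. The function name guess_identifier and the hint list are hypothetical.

```python
from typing import Optional

def guess_identifier(rows: list[dict[str, str]]) -> Optional[str]:
    """Return the first column that looks like a unique identifier, or None.

    Assumed heuristic: the column name contains a common hint and every
    value in the column is non-empty and unique.
    """
    if not rows:
        return None
    hints = ("email", "id", "key", "number")
    for column in rows[0]:
        values = [row[column].strip() for row in rows]
        is_unique = all(values) and len(set(values)) == len(values)
        if is_unique and any(hint in column.lower() for hint in hints):
            return column
    return None

sample = [
    {"Name": "Ana", "CustomerID": "C123", "Status": "Active"},
    {"Name": "Ben", "CustomerID": "C456", "Status": "Active"},
]
detected = guess_identifier(sample)
```

If no column qualifies, the sketch returns None, which corresponds to choosing the identifier manually.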

Understanding Results

The comparison produces three types of results, each with a distinct meaning:

🔴 Missing Records

Definition: Records that exist in File 1 (before) but are missing in File 2 (after).

Common causes:

Records were deleted or removed
Data export filtered out certain records
Respondents dropped out (in survey contexts)

Example: Customer ID "C123" appears in your old file but not in the updated version.

🟢 New Records

Definition: Records that exist in File 2 (after) but were not in File 1 (before).

Common causes:

New records were added
Additional data was collected
New customers/respondents joined

Example: Customer ID "C456" appears in your updated file but wasn't in the previous version.

🟡 Changed Fields

Definition: Records that exist in both files but have different values in one or more fields.

Shows:

Which fields changed
Old value → New value
Field-by-field comparison

Example: Customer C123's "Status" field changed from "Active" to "Inactive".
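
The three result types map onto simple set operations over the identifier column. The following Python sketch is an illustration of the idea rather than the tool's implementation. The function name compare_records and the sample data are hypothetical, and it assumes identifiers are unique within each file.

```python
def compare_records(before: list[dict], after: list[dict], key: str):
    """Return (missing, added, changed) for two lists of row dictionaries."""
    # Index rows by identifier; with duplicate identifiers the last row wins.
    old = {row[key]: row for row in before}
    new = {row[key]: row for row in after}

    missing = sorted(old.keys() - new.keys())  # in File 1 only
    added = sorted(new.keys() - old.keys())    # in File 2 only

    changed = {}
    for ident in sorted(old.keys() & new.keys()):
        # Compare only the fields that exist in both versions of the record.
        diffs = {
            field: (old[ident][field], new[ident][field])
            for field in old[ident]
            if field in new[ident] and old[ident][field] != new[ident][field]
        }
        if diffs:
            changed[ident] = diffs
    return missing, added, changed

before = [
    {"CustomerID": "C123", "Status": "Active"},
    {"CustomerID": "C200", "Status": "Active"},
]
after = [
    {"CustomerID": "C200", "Status": "Inactive"},
    {"CustomerID": "C456", "Status": "Active"},
]
missing, added, changed = compare_records(before, after, key="CustomerID")
```

Here C123 is reported as missing, C456 as new, and C200 as changed, with its "Status" field recorded as an (old value, new value) pair.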

Summary Statistics

At the top of the results, you'll see:

Total records compared - How many unique identifiers were found
Match rate - Percentage of records that appeared in both files
Missing count - Number of records removed
New count - Number of records added
Changed count - Number of records with field changes
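
The guide does not state how the match rate is calculated. As a rough sketch, the Python below assumes it is the share of all unique identifiers (across both files) that appear in both files. The function name summarize is hypothetical.

```python
def summarize(old_ids: set, new_ids: set, changed_count: int) -> dict:
    """Compute summary statistics from the identifier sets of both files."""
    total = len(old_ids | new_ids)    # unique identifiers across both files
    matched = len(old_ids & new_ids)  # identifiers present in both files
    return {
        "total": total,
        # Assumed definition: matched identifiers as a percentage of the total.
        "match_rate": round(100 * matched / total, 1) if total else 0.0,
        "missing": len(old_ids - new_ids),
        "new": len(new_ids - old_ids),
        "changed": changed_count,
    }

stats = summarize(
    {"C1", "C2", "C3", "C4"},
    {"C2", "C3", "C4", "C5"},
    changed_count=1,
)
```

With five unique identifiers, three of which appear in both files, this definition gives a match rate of 60%.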

Reading Changed Field Results

For changed records, you'll see a table showing:

Identifier	Field Name	Old Value	New Value
C123	Status	Active	Inactive
C123	Segment	Premium	Standard
C456	Email	old@email.com	new@email.com
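
A table like the one above has one row per changed field, so a record with two changed fields appears twice. The Python sketch below shows how such rows could be written out as CSV text. It assumes the exported report mirrors the on-screen columns, which this guide does not confirm, and the function name changes_to_csv is hypothetical.

```python
import csv
import io

def changes_to_csv(changed: dict) -> str:
    """Render changed fields as CSV text, one row per changed field.

    `changed` maps identifier -> {field name: (old value, new value)}.
    """
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Identifier", "Field Name", "Old Value", "New Value"])
    for ident, fields in changed.items():
        for field, (old_value, new_value) in fields.items():
            writer.writerow([ident, field, old_value, new_value])
    return buffer.getvalue()

report = changes_to_csv({
    "C123": {"Status": ("Active", "Inactive"), "Segment": ("Premium", "Standard")},
    "C456": {"Email": ("old@email.com", "new@email.com")},
})
```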

Use Cases

1. Qualtrics Supplemental Data Sources (SDS) Validation

Qualtrics allows you to upload CSV files as Supplemental Data Sources (SDS) to enrich survey responses with additional data.

The challenge: When updating SDS files, you need to verify what changed before uploading to Qualtrics.

How CSV Comparator helps:

Compare old vs new SDS CSV files before uploading
Verify which records were added, removed, or updated
Ensure no critical data was accidentally deleted
Track field-level changes across customer/respondent data

💡 Qualtrics SDS Pro Tip

Before uploading updated Supplemental Data Sources to Qualtrics:

Export your current SDS file from Qualtrics
Upload both old and new files to CSV Comparator
Use CustomerID or Email as identifier
Review all changes - missing, new, and field updates
Correct any accidental deletions or incorrect updates before uploading

This 2-minute check can save hours of troubleshooting!

2. Survey Response Validation

Compare exported survey responses before and after bulk updates
Verify that embedded data updates were applied correctly
Check for unexpected changes after data processing
Validate data transformations

3. Data Quality Checks

Track data drift over time
Identify unexpected changes in master data files
Validate data synchronization between systems
Monitor changes in customer/respondent information

4. QA Testing

Validate data migrations between systems
Compare test vs production data
Ensure data integrity after system updates
Verify ETL processes worked correctly

5. Respondent Tracking

Compare participant lists across survey waves
Identify new respondents in follow-up studies
Track respondent dropouts between waves
Monitor panel composition changes

Best Practices

✅ Do's

Use unique identifiers (email, ID, customer number)
Ensure consistent column names across both files
Clean your data before comparison (remove extra spaces, standardize formats)
Test with a small sample first if working with very large files
Export and save comparison results for documentation
Verify the identifier column has no duplicate values

❌ Don'ts

Don't use non-unique identifiers (names, dates, generic IDs with duplicates)
Don't compare files with completely different structures
Don't forget to download results before closing the browser
Don't assume a missing identifier means missing data - check for typos first

Performance Tips

💡 Optimize for Large Files

Files under 50K rows: Process instantly
Files 50K-100K rows: May take 10-30 seconds
Files over 100K rows: Exceed the per-file limit, so split them into chunks of 100K rows or fewer
Browser recommendation: Chrome or Edge for best performance
Memory: Close unnecessary browser tabs for very large comparisons
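
Splitting an oversized file can be done outside the tool. The Python sketch below is one possible approach, and the function name split_csv_text is hypothetical. It cuts CSV text into chunks of at most 100,000 data rows and repeats the header row in each chunk.

```python
import csv
import io

MAX_ROWS = 100_000  # per-file limit stated in this guide

def split_csv_text(text: str, max_rows: int = MAX_ROWS) -> list[str]:
    """Split CSV text into chunks of at most `max_rows` data rows each.

    The header row is repeated at the top of every chunk.
    """
    rows = list(csv.reader(io.StringIO(text, newline="")))
    if not rows:
        return []
    header, data = rows[0], rows[1:]
    chunks = []
    for start in range(0, len(data), max_rows):
        buffer = io.StringIO(newline="")
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(data[start:start + max_rows])
        chunks.append(buffer.getvalue())
    return chunks

# A tiny limit is used here only to keep the example readable.
parts = split_csv_text("ID,Status\n1,A\n2,B\n3,C\n", max_rows=2)
```

When comparing chunks, the same identifiers must land in corresponding chunks of both files. Otherwise a record that exists in both files is reported as missing in one comparison and new in another. Sorting both files by identifier and splitting on the same identifier ranges avoids this.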

Data Preparation Checklist

Verify identifier uniqueness: No duplicate IDs in either file
Consistent naming: Same column names in both files
Same data types: Numbers as numbers, dates in the same format
Trim whitespace: Remove leading/trailing spaces
Standardize case: Decide on uppercase/lowercase for identifiers
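
The whitespace and case items in the checklist can be automated before uploading. The following Python sketch uses the hypothetical function name normalize_rows. It assumes lowercase is the chosen convention for identifiers and that every cell is a string.

```python
def normalize_rows(rows: list[dict], key: str, lowercase_key: bool = True) -> list[dict]:
    """Trim whitespace everywhere and optionally lowercase the identifier."""
    cleaned = []
    for row in rows:
        # Strip leading/trailing spaces from column names and values.
        new_row = {name.strip(): value.strip() for name, value in row.items()}
        if lowercase_key:
            # Assumption: identifiers are compared in lowercase.
            new_row[key] = new_row[key].lower()
        cleaned.append(new_row)
    return cleaned

raw = [{"Email": "  Ana@Example.com ", "Status": "Active "}]
clean = normalize_rows(raw, key="Email")
```

Apply the same normalization to both files so that the comparison does not report differences caused only by spacing or letter case.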

🔐 Privacy & Security

Your Data Stays Private

CSV Comparator is designed with privacy as the top priority. All comparison processing happens entirely in your browser - your data never leaves your computer.

How It Works

100% browser-based: All processing happens locally on your device
No server uploads: Your CSV files are never sent to any server
No data storage: Files are processed in memory and discarded when you close the tab
Works offline: After initial page load, no internet connection required
Client-side JavaScript: All logic runs in your browser

Technical Details

Processing: JavaScript FileReader API reads files locally
Memory: Data stored temporarily in browser RAM only
Network: Zero network requests after page load
Sessions: All data clears when you close the browser tab

Best Practices for Sensitive Data

💡 Security Tips

Use on trusted devices only
Clear browser cache after working with highly sensitive data
Close the browser tab when finished to clear all data from memory
For extremely sensitive data, consider using the tool in an incognito/private window

⚠️ Important: While the tool itself is secure and private, always follow your organization's data handling policies when working with sensitive information.

Troubleshooting

❌ "File won't upload"

Solutions:

Check file format - must be CSV or TSV
Verify the file has no more than 100,000 rows
Ensure file is not corrupted - try opening in Excel first
Check file encoding - should be UTF-8
Remove special characters from filename
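
The encoding check can be done with a few lines of Python. This sketch, using the hypothetical function name is_utf8, simply tries to decode the file's bytes as UTF-8.

```python
def is_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# The same text encoded two ways: only the first is valid UTF-8.
utf8_bytes = "Email,Name\na@example.com,Zoë\n".encode("utf-8")
latin1_bytes = "Email,Name\na@example.com,Zoë\n".encode("latin-1")
```

For a file on disk, pass its raw contents, for example the bytes returned by opening the file in binary mode. A file that fails the check was probably saved in a legacy encoding and should be re-saved as UTF-8 from the application that produced it.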

❌ "No identifier field detected"

Solutions:

Manually select the identifier column from the dropdown
Ensure the identifier column has unique values
Check for empty cells in the identifier column
Verify both files have the same identifier column name
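
Empty cells and duplicate values in the identifier column can be located before uploading. The Python sketch below uses the hypothetical function name identifier_problems. Row numbers count data rows from 1 and exclude the header.

```python
from collections import Counter

def identifier_problems(rows: list[dict], key: str) -> dict:
    """List data rows with an empty identifier and identifiers used twice."""
    values = [(row.get(key) or "").strip() for row in rows]
    # 1-based data row numbers (the header row is not counted).
    empty_rows = [number for number, value in enumerate(values, start=1) if not value]
    counts = Counter(value for value in values if value)
    duplicates = sorted(value for value, count in counts.items() if count > 1)
    return {"empty_rows": empty_rows, "duplicates": duplicates}

problems = identifier_problems(
    [{"ID": "C1"}, {"ID": ""}, {"ID": "C2"}, {"ID": "C1"}],
    key="ID",
)
```

In this sample, the second data row has no identifier and "C1" appears twice, so the column would not work as a unique key until both are fixed.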

❌ "Comparison taking too long"

Solutions:

Allow large files time to finish - 100K rows may take up to 30 seconds
Close other browser tabs to free up memory
Use Chrome or Edge for best performance
Split the file if it has more than 100K rows

❌ "Unexpected results / Too many changes"

Solutions:

Verify you selected the correct identifier column
Check for duplicate identifiers in either file
Ensure both files use the same identifier column name, including case (e.g., "email" in both files, not "Email" in one and "email" in the other)
Look for leading/trailing spaces in identifier values
Verify data types are consistent (e.g., numbers stored as text in one file but as numbers in the other)

❌ "All records showing as new/missing"

Solutions:

Check if identifier column names match exactly in both files
Verify you didn't swap File 1 and File 2
Look for case sensitivity issues (ID vs id)
Check if identifier format changed (added prefixes/suffixes)
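
When every record shows as new or missing, it helps to count how many identifiers match exactly and how many match once spacing and letter case are ignored. The Python sketch below is a diagnostic aid under that assumption, and the function name diagnose_no_overlap is hypothetical.

```python
def diagnose_no_overlap(old_ids: list[str], new_ids: list[str]) -> dict:
    """Count identifier matches before and after normalizing case and spaces."""
    exact = len(set(old_ids) & set(new_ids))
    normalized = len(
        {ident.strip().lower() for ident in old_ids}
        & {ident.strip().lower() for ident in new_ids}
    )
    return {"exact_matches": exact, "matches_after_normalizing": normalized}

# File 1 has a trailing space and a lowercase prefix; File 2 does not.
result = diagnose_no_overlap(["C123 ", "c456"], ["C123", "C456"])
```

If the normalized count is much higher than the exact count, whitespace or case differences are the likely cause. If both counts are zero, the identifier format itself has probably changed, for example through an added prefix or suffix.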

Still Having Issues?

Contact our support team: admin@piraiai.com

Include:

Description of the issue
Browser and version
Approximate file sizes
Screenshot of error (if any)