Note: Archival Project
This was my second major project in Go, built as a deep dive into the language's idiomatic concurrency patterns and high-performance I/O. It is now archived but serves as a solid reference for ETL (Extract, Transform, Load) implementations in Golang.
Go File Processor is a high-performance command-line tool and library designed to efficiently convert massive CSV files (millions of records) into structured JSON. It demonstrates the power of Go's concurrency primitives to achieve maximum throughput with minimal memory overhead.
This project was a hands-on laboratory to master several Go concepts:
- **Concurrency via Worker Pool:** Leveraging `goroutines` and `channels` to process data in parallel without overwhelming the system.
- **Memory Efficiency (Streaming):** Using `io.Reader` and `io.Writer` to process gigabytes of data with a constant, tiny memory footprint.
- **The Middleware Pattern:** Implementing a "Chain of Responsibility" for data transformation that is both flexible and type-safe.
- **Atomic Operations:** Using `sync/atomic` for high-speed metrics tracking, avoiding the overhead of mutexes.
- **Idiomatic Project Layout:** Following standard Go folder structures (`cmd/`, `internal/`) and build automation with a `Makefile`.
```go
proc := processor.NewCSVToJSONProcessor()
config := processor.Config{WorkerCount: 8}

// Fluent transformation chain
config.AddTransformer(processor.EmailFilter(`@company.com$`))
config.AddTransformer(processor.FieldMasker("email"))

metrics, err := proc.Process("input.csv", "output.json", config)
```

```sh
./fileproc -input data.csv -output data.json -workers 4
```

| Technology | What I Learned |
|---|---|
| Worker Pool | How to orchestrate multiple goroutines for parallel work. |
| Channels | Managing safe communication and backpressure between stages. |
| Streaming I/O | Processing files record-by-record instead of loading to RAM. |
| Atomic Counters | Implementing thread-safe counters with maximum performance. |
| Structured Logs | Using slog for modern, machine-readable observability. |
The system uses a streaming model to maintain low memory usage:
```
Input CSV -> Producer -> Job Channel -> [Workers + Transformers] -> Result Channel -> Consumer -> Output JSON
```
| Target | Description |
|---|---|
| `make build` | Compiles the `fileproc` binary. |
| `make test` | Runs the full unit test suite. |
| `make bench` | Runs benchmarks comparing parallel vs. sequential processing. |
| `make generate-data` | Generates a 100k-row test file for performance testing. |
Building this project taught me that Go isn't just about syntax; it's about a philosophy of simplicity and performance. The transition from sequential processing to a parallel worker pool showed me how Go empowers developers to build tools that scale effortlessly.
