# Data Engineer Interview Prep

An 8-week structured practice track for data engineering interviews: weekly schedule, graded problem sets, self-assessment rubric, and a journal template.


Schedule · Weekly cycle · Scoring · Journal · Companion repos


Reading prep material is necessary but not sufficient. Doing problems on a schedule with self-assessment is what produces results. This repo is the schedule.

Six hours per week for eight weeks. Every problem links to a runnable browser sandbox at datadriven.io. No local setup.

## Schedule

| Week | Phase | Focus | Target avg score | Primer | Practice |
| --- | --- | --- | --- | --- | --- |
| 1 | Foundations | SQL fundamentals: joins, aggregation, dates | 1.5 | joins, aggregation | SQL bank, 9 problems |
| 2 | Foundations | Python data wrangling | 1.5 | foundations, collections | Python bank, 9 problems |
| 3 | Patterns | SQL window functions | 2.0 | window functions | Window drill, 9 problems |
| 4 | Patterns | Python: sessionization, dedup, retries, partitioning | 2.0 | Python for DE | 9 problems across patterns |
| 5 | Design | Schema design | 2.0 | data modeling lessons (read all 8) | 4 schema problems, sketched on paper |
| 6 | Design | Pipeline architecture | 2.0 | system design framework | 3 case studies, designed end to end |
| 7 | Polish | Mocks and behavioral story bank | 2.5 | none | One full mock loop, six STAR stories |
| 8 | Polish | Company-specific prep | 2.5 | Company guides | 90 minutes per target company |

## Weekly cycle

| Day | Activity | Time |
| --- | --- | --- |
| Mon | Read the focus primer | 30 min |
| Tue | Problem set 1 (3 easy/medium) | 60 min |
| Wed | Problem set 2 (3 medium) | 60 min |
| Thu | Problem set 3 (3 hard) | 90 min |
| Fri | Self-review and journal entry | 30 min |
| Sat | Long-form problem (schema or pipeline) | 90 min |
| Sun | Off | |

Total: about 6 hours per week. If you can only commit 4 hours, drop Thursday.

## Scoring

After each problem, score yourself 0 to 3:

| Score | Meaning |
| --- | --- |
| 0 | Could not start. Need to learn from scratch. |
| 1 | Solved with significant help. Need more reps. |
| 2 | Solved without help, but took longer than expected. |
| 3 | Solved cleanly within time. Could explain it to a colleague. |

Track scores in /journal/week-XX.md. Goal at the end of week 8: average 2.5.

If a week's average is below the target in the schedule, repeat the week before moving on.
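The average-and-gate rule above can be sketched in a few lines of Python. This is a hypothetical helper, not part of the repo; the scores below are made up for illustration.

```python
def week_average(scores):
    """Average of a week's 0-3 self-assessment scores."""
    return sum(scores) / len(scores)

def should_repeat(scores, target):
    """True if the weekly average falls below the schedule's target,
    meaning the week should be repeated before moving on."""
    return week_average(scores) < target

# Nine problems from a hypothetical week 3, each scored 0-3.
week3_scores = [2, 3, 1, 2, 2, 2, 3, 2, 1]
print(week_average(week3_scores))               # 2.0
print(should_repeat(week3_scores, target=2.0))  # False: move on
```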

## Journal

Every Friday, fill in the template:

```markdown
# Week XX

## Score summary
- SQL: avg X.X / 3
- Python: avg X.X / 3
- Schema: avg X.X / 3
- Pipeline: avg X.X / 3

## What I missed
- (concrete, problem by problem)

## What I will repeat next week
- (specific topics or patterns to drill again)

## One thing I learned that surprised me
- (1 to 2 sentences)
```

The journal is the program. Skip it and you are just doing problems.

## Week-by-week deep dives

### Week 1: SQL fundamentals

Drill 9 problems from datadriven.io/sql-interview-questions covering joins, aggregating, and date filtering. Start with 10 Lowest Uptime Services, 2FA Confirmation Rate, 30 Day Page View Counts.
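If you want to warm up locally before the sandbox, the Week 1 ingredients (a join, an aggregate, a date filter) fit in a few lines with Python's built-in `sqlite3`. The toy schema below is illustrative, not one of the linked problems.

```python
import sqlite3

# Toy schema, made up for this warm-up: users and their login days.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER, name TEXT);
    CREATE TABLE logins (user_id INTEGER, day TEXT);
    INSERT INTO users VALUES (1, 'ana'), (2, 'bob');
    INSERT INTO logins VALUES
        (1, '2024-01-05'), (1, '2024-01-20'), (2, '2024-02-01');
""")

# Join, date-filter to January, then aggregate per user.
rows = conn.execute("""
    SELECT u.name, COUNT(*) AS january_logins
    FROM users u
    JOIN logins l ON l.user_id = u.id
    WHERE l.day BETWEEN '2024-01-01' AND '2024-01-31'
    GROUP BY u.name
    ORDER BY u.name
""").fetchall()
print(rows)  # [('ana', 2)] -- bob only logged in during February
```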

### Week 2: Python wrangling

Work through Batch Records, Column Sum, Activity Time Ledger, Batch Partitioner, Batch With Metadata.

### Week 3: Window functions

Window functions show up in most senior SQL screens. Run the window functions drill timed. Target patterns: rolling totals, top N per group, sessionization, gaps and islands, percent of total, second to last X.
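One of the listed patterns, top N per group, can be sketched with Python's built-in `sqlite3` (window functions require SQLite 3.25+). The table and data are made up, not taken from the drill.

```python
import sqlite3

# Toy data: order amounts per customer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, amount INTEGER);
    INSERT INTO orders VALUES
        ('ana', 50), ('ana', 90), ('ana', 70),
        ('bob', 20), ('bob', 80);
""")

# Top 2 orders per customer: rank within each partition, then filter.
rows = conn.execute("""
    SELECT customer, amount FROM (
        SELECT customer, amount,
               ROW_NUMBER() OVER (
                   PARTITION BY customer ORDER BY amount DESC
               ) AS rn
        FROM orders
    ) WHERE rn <= 2
    ORDER BY customer, amount DESC
""").fetchall()
print(rows)  # [('ana', 90), ('ana', 70), ('bob', 80), ('bob', 20)]
```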

### Week 4: Python patterns

Sessionization, dedup, hash partitioning, interval merging, retries with backoff, schema evolution, top N with ties, parsing semi structured logs, streaming aggregation. Pick 9 from the Python bank covering each.
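As a taste of the sessionization pattern, here is a minimal sketch assuming a 30-minute inactivity gap; the event format and gap are illustrative, not the repo's exact problem spec.

```python
from datetime import datetime, timedelta

def sessionize(timestamps, gap=timedelta(minutes=30)):
    """Split timestamps into sessions: a new session starts whenever
    the time since the previous event exceeds `gap`."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)   # within the gap: same session
        else:
            sessions.append([ts])     # gap exceeded: new session
    return sessions

events = [datetime(2024, 1, 1, 9, 0),
          datetime(2024, 1, 1, 9, 10),
          datetime(2024, 1, 1, 11, 0)]
print(len(sessionize(events)))  # 2 sessions: 9:00-9:10, then 11:00
```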

### Week 5: Schema design

Read all 8 data modeling lessons on Mon and Tue. Then sketch four schemas for 30 minutes each before reading the solution: A/B Experiment Assignment Schema, Customer Address History, Insurance Claims Lifecycle, Clickstream and Session Schema.

### Week 6: Pipeline architecture

Memorize the eight-beat framework. Sketch three pipelines end to end on paper, 45 minutes each: Card Transaction Streaming Pipeline, Cellular Connectivity and App Log Data Warehouse, and one from datadriven.io/data-pipeline-interview-questions matching your target company's stack.

### Week 7: Mocks and behavioral

90-minute mock loop on Saturday: 25 min SQL, 25 min Python, 30 min design, 10 min behavioral. No partner? Record yourself and watch the recording. Write six STAR stories during the week.

### Week 8: Company-specific prep

For each target company, 90 minutes: read the loop guide, read three recent engineering blog posts, pick three problems from this bank that match the company's style, map two STAR stories to the company's leveling rubric.

| Company | Guide |
| --- | --- |
| Netflix | companies/netflix/interview |
| Uber | companies/uber/interview |
| Amazon | companies/amazon/interview |
| Google | companies/google/interview |
| Meta | companies/meta/interview |

## When you finish

You should be able to:

  1. Write any window function from memory under time pressure.
  2. Implement any of the nine Python patterns without lookups.
  3. Defend a star schema with explicit grain and SCD choices.
  4. Walk through the eight-beat framework on a new pipeline question.
  5. Tell six STAR stories without rambling.
  6. Articulate the loop structure of your top three target companies.

If yes to all six, schedule the loop.

## Companion repos

## Contributing

The schedule is a starting point. If you have a better one with evidence, open a PR. Include your timeline, target role, starting level, and what worked.

## License

CC BY-SA 4.0. Sandboxes hosted at datadriven.io.
