An 8-week structured practice track for data engineering interviews: a weekly schedule, graded problem sets, a self-assessment rubric, and a journal template.
Schedule · Weekly cycle · Scoring · Journal · Companion repos
Reading prep material is necessary but not sufficient. Doing problems on a schedule with self-assessment is what produces results. This repo is the schedule.
Six hours per week for eight weeks. Every problem links to a runnable browser sandbox at datadriven.io. No local setup.
| Week | Phase | Focus | Target avg score | Primer | Practice |
|---|---|---|---|---|---|
| 1 | Foundations | SQL fundamentals: joins, aggregation, dates | 1.5 | joins, aggregation | SQL bank, 9 problems |
| 2 | Foundations | Python data wrangling | 1.5 | foundations, collections | Python bank, 9 problems |
| 3 | Patterns | SQL window functions | 2.0 | window functions | Window drill, 9 problems |
| 4 | Patterns | Python: sessionization, dedup, retries, partitioning | 2.0 | Python for DE | 9 problems across patterns |
| 5 | Design | Schema design | 2.0 | data modeling lessons (read all 8) | 4 schema problems, sketched on paper |
| 6 | Design | Pipeline architecture | 2.0 | system design framework | 3 case studies, designed end to end |
| 7 | Polish | Mocks and behavioral story bank | 2.5 | none | One full mock loop, six STAR stories |
| 8 | Polish | Company specific prep | 2.5 | Company guides | 90 minutes per target company |

| Day | Activity | Time |
|---|---|---|
| Mon | Read the focus primer | 30 min |
| Tue | Problem set 1 (3 easy/medium) | 60 min |
| Wed | Problem set 2 (3 medium) | 60 min |
| Thu | Problem set 3 (3 hard) | 90 min |
| Fri | Self review and journal entry | 30 min |
| Sat | Long form problem (schema or pipeline) | 90 min |
| Sun | Off | 0 min |
Total: about 6 hours per week. If you can only commit 4 hours, drop Thursday.
After each problem, score yourself 0 to 3:
| Score | Meaning |
|---|---|
| 0 | Could not start. Need to learn from scratch. |
| 1 | Solved with significant help. Need more reps. |
| 2 | Solved without help, took longer than expected. |
| 3 | Solved cleanly within time. Could explain it to a colleague. |
Track scores in /journal/week-XX.md. Goal at the end of week 8: average 2.5.
If a week's average is below the target in the schedule, repeat the week before moving on.
Every Friday, fill in the template:
# Week XX
## Score summary
- SQL: avg X.X / 3
- Python: avg X.X / 3
- Schema: avg X.X / 3
- Pipeline: avg X.X / 3
## What I missed
- (concrete, problem by problem)
## What I will repeat next week
- (specific topics or patterns to drill again)
## One thing I learned that surprised me
- (1 to 2 sentences)

The journal is the program. Skip it and you are just doing problems.
Drill 9 problems from datadriven.io/sql-interview-questions covering joins, aggregation, and date filtering. Start with 10 Lowest Uptime Services, 2FA Confirmation Rate, and 30 Day Page View Counts.
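The join + aggregate + date-filter shape can be rehearsed locally before opening the sandbox. A minimal sketch using Python's built-in sqlite3; the table names and data here are invented for illustration, not taken from the datadriven.io bank:

```python
import sqlite3

# Hypothetical two-table setup: users and their page views.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, signup_date TEXT);
CREATE TABLE page_views (user_id INTEGER, viewed_at TEXT);
INSERT INTO users VALUES (1, '2024-01-01'), (2, '2024-01-15');
INSERT INTO page_views VALUES
  (1, '2024-01-02'), (1, '2024-01-20'), (2, '2024-02-05');
""")

# Views per user within 30 days of signup.
# LEFT JOIN (not INNER) keeps users with zero views in the result;
# the date filter lives in the ON clause so those users still appear.
rows = conn.execute("""
SELECT u.user_id,
       COUNT(p.user_id) AS views_30d
FROM users u
LEFT JOIN page_views p
  ON p.user_id = u.user_id
 AND p.viewed_at < date(u.signup_date, '+30 days')
GROUP BY u.user_id
ORDER BY u.user_id
""").fetchall()
print(rows)  # [(1, 2), (2, 1)]
```

The LEFT JOIN vs INNER JOIN distinction, and where the date predicate goes (ON vs WHERE), is exactly what the week 1 problems probe.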
Work through Batch Records, Column Sum, Activity Time Ledger, Batch Partitioner, Batch With Metadata.
Window functions show up in most senior SQL screens. Run the window functions drill timed. Target patterns: rolling totals, top N per group, sessionization, gaps and islands, percent of total, second to last X.
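Top N per group is the rank-then-filter shape worth having in muscle memory. A sketch against a throwaway sqlite3 table (assumes a SQLite build with window function support, 3.25+; the sales schema is invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, rep TEXT, amount INTEGER);
INSERT INTO sales VALUES
  ('east', 'ann', 50), ('east', 'bob', 80), ('east', 'cat', 30),
  ('west', 'dan', 70), ('west', 'eve', 90);
""")

# Top 2 reps per region: rank inside each partition, then filter
# in an outer query (window results cannot go in WHERE directly).
rows = conn.execute("""
SELECT region, rep, amount
FROM (
  SELECT region, rep, amount,
         ROW_NUMBER() OVER (PARTITION BY region ORDER BY amount DESC) AS rn
  FROM sales
)
WHERE rn <= 2
ORDER BY region, amount DESC
""").fetchall()
print(rows)
# [('east', 'bob', 80), ('east', 'ann', 50), ('west', 'eve', 90), ('west', 'dan', 70)]
```

Swapping ROW_NUMBER for RANK or DENSE_RANK changes tie behavior, which interviewers routinely ask about.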
Sessionization, dedup, hash partitioning, interval merging, retries with backoff, schema evolution, top N with ties, parsing semi structured logs, streaming aggregation. Pick 9 from the Python bank covering each.
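As one example of the patterns above, sessionization reduces to a single pass over sorted timestamps with a gap threshold. A minimal sketch (the 30-minute gap is a common convention, not a requirement of any particular problem in the bank):

```python
from datetime import datetime, timedelta

def sessionize(timestamps, gap=timedelta(minutes=30)):
    """Split event timestamps into sessions: a new session starts
    whenever the gap since the previous event exceeds `gap`."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)   # continue the current session
        else:
            sessions.append([ts])     # gap exceeded: open a new session
    return sessions

events = [datetime(2024, 1, 1, 9, 0),
          datetime(2024, 1, 1, 9, 10),
          datetime(2024, 1, 1, 10, 0),
          datetime(2024, 1, 1, 10, 5)]
print(len(sessionize(events)))  # 2 sessions: 9:00-9:10 and 10:00-10:05
```

Interview variants usually add a user_id key (sessionize per user) or ask for session-level aggregates, but the gap comparison is the core.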
Read all 8 data modeling lessons on Mon and Tue. Then sketch four schemas for 30 minutes each before reading the solution: A/B Experiment Assignment Schema, Customer Address History, Insurance Claims Lifecycle, Clickstream and Session Schema.
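For the Customer Address History sketch, one common answer shape is SCD Type 2: the grain is one row per customer per address validity interval. A hedged illustration in sqlite3 (the column names and the close-then-insert move are my assumptions, not the official solution):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_address_history (
  customer_id INTEGER NOT NULL,
  address     TEXT    NOT NULL,
  valid_from  TEXT    NOT NULL,           -- inclusive start of interval
  valid_to    TEXT,                       -- NULL marks the open interval
  is_current  INTEGER NOT NULL DEFAULT 1,
  PRIMARY KEY (customer_id, valid_from)
);
-- On a move: close the old row, insert the new current row.
INSERT INTO customer_address_history VALUES
  (7, '1 Old St',  '2023-01-01', '2024-03-01', 0),
  (7, '2 New Ave', '2024-03-01', NULL,         1);
""")

current = conn.execute("""
SELECT address FROM customer_address_history
WHERE customer_id = 7 AND is_current = 1
""").fetchone()
print(current)  # ('2 New Ave',)
```

Being able to state the grain out loud ("one row per customer per interval") and defend valid_to vs an is_current flag (here both, for cheap point-in-time and current-row queries) is what the rubric rewards.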
Memorize the eight beat framework. Sketch three pipelines end to end on paper for 45 minutes each: Card Transaction Streaming Pipeline, Cellular Connectivity and App Log Data Warehouse, and one from datadriven.io/data-pipeline-interview-questions matching your target company's stack.
90 minute mock loop on Saturday: 25 min SQL, 25 min Python, 30 min design, 10 min behavioral. No partner? Record yourself and watch the recording. Write six STAR stories during the week.
For each target company, 90 minutes: read the loop guide, read three recent engineering blog posts, pick three problems from this bank that match the company's style, map two STAR stories to the company's leveling rubric.
| Company | Guide |
|---|---|
| Netflix | companies/netflix/interview |
| Uber | companies/uber/interview |
| Amazon | companies/amazon/interview |
| Google | companies/google/interview |
| Meta | companies/meta/interview |
You should be able to:
- Write any window function from memory under time pressure.
- Implement any of the eight Python patterns without lookups.
- Defend a star schema with explicit grain and SCD choices.
- Walk through the eight beat framework on a new pipeline question.
- Tell six STAR stories without rambling.
- Articulate the loop structure of your top three target companies.
If yes to all six, schedule the loop.
- data-engineering-interview-handbook. Flagship handbook.
- data-engineering-interview-questions. 1418 question bank.
- system-design-for-data-engineers. 120 case studies.
- data-engineering-cheatsheet. Single page reference.
- data-engineer-interview-handbook. 7 day sprint.
- awesome-data-engineering-interview. Curated resources.
The schedule is a starting point. If you have a better one with evidence, open a PR. Include your timeline, target role, starting level, and what worked.
CC BY-SA 4.0. Sandboxes hosted at datadriven.io.