Feature/ml solution articles new #5666

Open

ahmadbasyouni10 wants to merge 3 commits into main from
feature/ml-solution-articles-new

Conversation

@ahmadbasyouni10 (Collaborator)

  • File(s) Modified: articles/batch-normalization.md, articles/multi-layer-backpropagation.md, articles/weight-initialization.md, articles/multi-headed-self-attention.md, articles/transformer-block.md
  • Language(s) Used: python
  • Submission URL: N/A (ML course solution articles, not LeetCode submissions)

Summary

This PR adds solution articles for the three new ML problems, plus a W_o output projection fix for two existing problems:

New articles:

  • multi-layer-backpropagation.md — Chain rule through 2-layer MLP with ReLU, covers forward/backward pass, ReLU mask, outer product gradients
  • weight-initialization.md — Xavier vs Kaiming initialization, activation stability analysis across layers using raw matrix approach
  • batch-normalization.md — Training vs inference modes, running statistics, axis=0 vs axis=1 distinction
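As a rough sketch of what the multi-layer-backpropagation article walks through (the article's actual code may differ), here is a minimal 2-layer MLP forward/backward pass showing the chain rule, the ReLU mask, and the outer-product gradients:

```python
import numpy as np

def mlp_backward(x, y, W1, b1, W2, b2):
    """Forward and backward pass for a 2-layer MLP with ReLU
    and squared-error loss (illustrative; x and y are vectors)."""
    # Forward pass
    z1 = W1 @ x + b1          # pre-activation, layer 1
    a1 = np.maximum(0, z1)    # ReLU
    z2 = W2 @ a1 + b2         # linear output
    loss = 0.5 * np.sum((z2 - y) ** 2)

    # Backward pass (chain rule, output to input)
    dz2 = z2 - y                 # dL/dz2
    dW2 = np.outer(dz2, a1)      # outer-product gradient for W2
    db2 = dz2
    da1 = W2.T @ dz2             # propagate through W2
    dz1 = da1 * (z1 > 0)         # ReLU mask: gradient flows only where z1 > 0
    dW1 = np.outer(dz1, x)       # outer-product gradient for W1
    db1 = dz1
    return loss, dW1, db1, dW2, db2
```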

Updated articles:

  • multi-headed-self-attention.md — Added W_o output projection (nn.Linear(attention_dim, attention_dim, bias=False)) to solution code and explanation
  • transformer-block.md — Updated inner MultiHeadedSelfAttention class to include W_o output projection, matching the updated problem
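For context on the W_o fix, a minimal illustrative sketch (not the articles' exact code; constructor arguments and shapes are assumptions) of where the output projection sits: heads are computed, concatenated, and then passed through `nn.Linear(attention_dim, attention_dim, bias=False)`:

```python
import torch
import torch.nn as nn

class MultiHeadedSelfAttention(nn.Module):
    """Sketch: compute heads, concatenate, then apply the W_o projection."""
    def __init__(self, embedding_dim, attention_dim, num_heads):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = attention_dim // num_heads
        self.qkv = nn.Linear(embedding_dim, 3 * attention_dim, bias=False)
        # W_o: projects the concatenated heads back to attention_dim
        self.output_proj = nn.Linear(attention_dim, attention_dim, bias=False)

    def forward(self, x):
        B, T, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Split into heads: (B, num_heads, T, head_dim)
        shape = (B, T, self.num_heads, self.head_dim)
        q, k, v = (t.view(shape).transpose(1, 2) for t in (q, k, v))
        scores = q @ k.transpose(-2, -1) / (self.head_dim ** 0.5)
        weights = torch.softmax(scores, dim=-1)
        out = weights @ v                            # (B, num_heads, T, head_dim)
        out = out.transpose(1, 2).reshape(B, T, -1)  # concatenate heads
        return self.output_proj(out)                 # apply W_o
```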

Each new article follows the existing format: Prerequisites → Concept → Solution (Intuition, Implementation, Walkthrough, Time & Space Complexity) → Common Pitfalls → In the GPT Project → Key Takeaways.

Add solution articles for multi-layer backpropagation, weight
initialization, and batch normalization. These follow the same
format as the existing 27 ML solution articles (Prerequisites,
Concept, Solution with Python tabs, Common Pitfalls, In the GPT
Project, Key Takeaways).

Made-with: Cursor
- multi-headed-self-attention: Add output_proj (W_o) linear layer after
  concatenating heads, matching standard practice
- transformer-block: Add output_proj to inner MultiHeadedSelfAttention
  class, consistent with multi-head attention problem
- weight-initialization: Rewrite check_activations to use raw weight
  matrices (torch.randn * std) instead of nn.Linear for cross-platform
  determinism

Made-with: Cursor
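A hypothetical sketch of the raw-matrix approach described in that commit (function name, layer sizes, and return value are illustrative, not the article's actual API): weights are built with `torch.randn * std` under a fixed seed instead of `nn.Linear`, so the same values are produced on every platform.

```python
import torch

def check_activations(layer_sizes, init="kaiming", seed=0):
    """Track activation std across layers using raw weight matrices
    (torch.randn * std) for cross-platform determinism."""
    torch.manual_seed(seed)
    x = torch.randn(256, layer_sizes[0])
    stds = []
    for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
        if init == "kaiming":
            std = (2.0 / fan_in) ** 0.5   # factor 2 compensates for ReLU
        else:  # "xavier" (fan-in form)
            std = (1.0 / fan_in) ** 0.5
        W = torch.randn(fan_in, fan_out) * std   # raw matrix, no nn.Linear
        x = torch.relu(x @ W)
        stds.append(round(x.std().item(), 2))    # 2 d.p., per the precision commit
    return stds
```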
Reduces precision from 4 to 2 decimal places for the
check_activations method to absorb cross-platform floating
point differences in multi-layer matrix operations.

Made-with: Cursor
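A toy illustration (with made-up values) of why 2 decimal places absorbs cross-platform drift that 4 decimal places does not:

```python
# Hypothetical per-platform results that drift in the 4th decimal place
linux_value = 0.8167
macos_value = 0.8169

# At 4 decimal places the comparison flags a spurious mismatch...
assert round(linux_value, 4) != round(macos_value, 4)
# ...while 2 decimal places absorbs the floating point drift
assert round(linux_value, 2) == round(macos_value, 2)
```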
