Feature/ml solution articles new #5666

Open

ahmadbasyouni10 wants to merge 3 commits into main from
feature/ml-solution-articles-new

Conversation

@ahmadbasyouni10 (Collaborator)

  • File(s) Modified: articles/batch-normalization.md, articles/multi-layer-backpropagation.md, articles/weight-initialization.md, articles/multi-headed-self-attention.md, articles/transformer-block.md
  • Language(s) Used: python
  • Submission URL: N/A (ML course solution articles, not LeetCode submissions)

Summary

This PR adds solution articles for the three new ML problems, plus a W_o output projection fix for two existing problems:

New articles:

  • multi-layer-backpropagation.md — Chain rule through 2-layer MLP with ReLU, covers forward/backward pass, ReLU mask, outer product gradients
  • weight-initialization.md — Xavier vs Kaiming initialization, activation stability analysis across layers using raw matrix approach
  • batch-normalization.md — Training vs inference modes, running statistics, axis=0 vs axis=1 distinction
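As a rough sketch of what the multi-layer-backpropagation article walks through (the article's actual code may differ), here is a minimal 2-layer MLP forward/backward pass showing the chain rule, the ReLU mask, and the outer-product gradients:

```python
import numpy as np

def mlp_backward(x, y, W1, b1, W2, b2):
    """Forward and backward pass for a 2-layer MLP with ReLU
    and squared-error loss (illustrative; x and y are vectors)."""
    # Forward pass
    z1 = W1 @ x + b1          # pre-activation, layer 1
    a1 = np.maximum(0, z1)    # ReLU
    z2 = W2 @ a1 + b2         # linear output
    loss = 0.5 * np.sum((z2 - y) ** 2)

    # Backward pass (chain rule, output to input)
    dz2 = z2 - y                 # dL/dz2
    dW2 = np.outer(dz2, a1)      # outer-product gradient for W2
    db2 = dz2
    da1 = W2.T @ dz2             # propagate through W2
    dz1 = da1 * (z1 > 0)         # ReLU mask: gradient flows only where z1 > 0
    dW1 = np.outer(dz1, x)       # outer-product gradient for W1
    db1 = dz1
    return loss, dW1, db1, dW2, db2
```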

Updated articles:

  • multi-headed-self-attention.md — Added W_o output projection (nn.Linear(attention_dim, attention_dim, bias=False)) to solution code and explanation
  • transformer-block.md — Updated inner MultiHeadedSelfAttention class to include W_o output projection, matching the updated problem
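For context on the W_o fix, a minimal illustrative sketch (not the articles' exact code; constructor arguments and shapes are assumptions) of where the output projection sits: heads are computed, concatenated, and then passed through `nn.Linear(attention_dim, attention_dim, bias=False)`:

```python
import torch
import torch.nn as nn

class MultiHeadedSelfAttention(nn.Module):
    """Sketch: compute heads, concatenate, then apply the W_o projection."""
    def __init__(self, embedding_dim, attention_dim, num_heads):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = attention_dim // num_heads
        self.qkv = nn.Linear(embedding_dim, 3 * attention_dim, bias=False)
        # W_o: projects the concatenated heads back to attention_dim
        self.output_proj = nn.Linear(attention_dim, attention_dim, bias=False)

    def forward(self, x):
        B, T, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Split into heads: (B, num_heads, T, head_dim)
        shape = (B, T, self.num_heads, self.head_dim)
        q, k, v = (t.view(shape).transpose(1, 2) for t in (q, k, v))
        scores = q @ k.transpose(-2, -1) / (self.head_dim ** 0.5)
        weights = torch.softmax(scores, dim=-1)
        out = weights @ v                            # (B, num_heads, T, head_dim)
        out = out.transpose(1, 2).reshape(B, T, -1)  # concatenate heads
        return self.output_proj(out)                 # apply W_o
```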

Each new article follows the existing format: Prerequisites → Concept → Solution (Intuition, Implementation, Walkthrough, Time & Space Complexity) → Common Pitfalls → In the GPT Project → Key Takeaways.

Add solution articles for multi-layer backpropagation, weight
initialization, and batch normalization. These follow the same
format as the existing 27 ML solution articles (Prerequisites,
Concept, Solution with Python tabs, Common Pitfalls, In the GPT
Project, Key Takeaways).

Made-with: Cursor
- multi-headed-self-attention: Add output_proj (W_o) linear layer after
  concatenating heads, matching standard practice
- transformer-block: Add output_proj to inner MultiHeadedSelfAttention
  class, consistent with multi-head attention problem
- weight-initialization: Rewrite check_activations to use raw weight
  matrices (torch.randn * std) instead of nn.Linear for cross-platform
  determinism

Made-with: Cursor
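A hypothetical sketch of the raw-matrix approach described in that commit (function name, layer sizes, and return value are illustrative, not the article's actual API): weights are built with `torch.randn * std` under a fixed seed instead of `nn.Linear`, so the same values are produced on every platform.

```python
import torch

def check_activations(layer_sizes, init="kaiming", seed=0):
    """Track activation std across layers using raw weight matrices
    (torch.randn * std) for cross-platform determinism."""
    torch.manual_seed(seed)
    x = torch.randn(256, layer_sizes[0])
    stds = []
    for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
        if init == "kaiming":
            std = (2.0 / fan_in) ** 0.5   # factor 2 compensates for ReLU
        else:  # "xavier" (fan-in form)
            std = (1.0 / fan_in) ** 0.5
        W = torch.randn(fan_in, fan_out) * std   # raw matrix, no nn.Linear
        x = torch.relu(x @ W)
        stds.append(round(x.std().item(), 2))    # 2 d.p., per the precision commit
    return stds
```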
Reduces precision from 4 to 2 decimal places for the
check_activations method to absorb cross-platform floating
point differences in multi-layer matrix operations.

Made-with: Cursor
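A toy illustration (with made-up values) of why 2 decimal places absorbs cross-platform drift that 4 decimal places does not:

```python
# Hypothetical per-platform results that drift in the 4th decimal place
linux_value = 0.8167
macos_value = 0.8169

# At 4 decimal places the comparison flags a spurious mismatch...
assert round(linux_value, 4) != round(macos_value, 4)
# ...while 2 decimal places absorbs the floating point drift
assert round(linux_value, 2) == round(macos_value, 2)
```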
