176 changes: 176 additions & 0 deletions March_ML_Mania_2026_Kaggle_Submission_Marktechpost.ipynb
@@ -0,0 +1,176 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# March Machine Learning Mania 2026: Kaggle Submission Notebook\n",
"\n",
"**Competition:** [March Machine Learning Mania 2026](https://www.kaggle.com/competitions/march-machine-learning-mania-2026)\n",
"\n",
"This notebook generates a valid baseline submission covering both the Men's and Women's 2026 NCAA Basketball Tournaments.\n",
"\n",
"> **Assumption:** The competition data files are available at `/kaggle/input/march-machine-learning-mania-2026/`. \n",
"> We use `SampleSubmissionStage2.csv` as the template for the current-season submission."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Kaggle Notebook Code"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import pandas as pd\n",
"\n",
"# ------------------------------------------------------------------\n",
"# Path to competition data (standard Kaggle input directory)\n",
"# ------------------------------------------------------------------\n",
"DATA_DIR = \"/kaggle/input/march-machine-learning-mania-2026\"\n",
"OUTPUT_DIR = \"/kaggle/working\" # Kaggle saves output files here\n",
"\n",
"# ------------------------------------------------------------------\n",
"# Load the Stage 2 sample submission\n",
"# Stage 2 lists every possible 2026 team matchup that needs a prediction\n",
"# ------------------------------------------------------------------\n",
"sample_sub_path = os.path.join(DATA_DIR, \"SampleSubmissionStage2.csv\")\n",
"submission = pd.read_csv(sample_sub_path)\n",
"\n",
"print(f\"Sample submission shape: {submission.shape}\")\n",
"print(submission.head())"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# ------------------------------------------------------------------\n",
"# Validate the required columns are present\n",
"# The competition requires: ID (string) and Pred (float 0–1)\n",
"# ------------------------------------------------------------------\n",
"assert \"ID\" in submission.columns, \"Missing required column: ID\"\n",
"assert \"Pred\" in submission.columns, \"Missing required column: Pred\"\n",
"\n",
"print(f\"Required columns present: {list(submission.columns)}\")\n",
"print(f\"Total predictions required: {len(submission)}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# ------------------------------------------------------------------\n",
"# Baseline model: predict 0.5 for every matchup (coin flip)\n",
"# This is a valid first submission; it shows a score of 0.0 until tournament games are played\n",
"# Pred = probability that the LOWER TeamID team wins the matchup\n",
"# ------------------------------------------------------------------\n",
"submission[\"Pred\"] = 0.5\n",
"\n",
"print(\"Prediction column set to 0.5 (baseline) for all matchups.\")\n",
"print(submission.head(10))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# ------------------------------------------------------------------\n",
"# Quick sanity checks before saving\n",
"# ------------------------------------------------------------------\n",
"# Check all IDs follow the format SSSS_XXXX_YYYY (fullmatch anchors both ends; match would accept trailing characters)\n",
"assert submission[\"ID\"].str.fullmatch(r\"\\d{4}_\\d{4}_\\d{4}\").all(), \\\n",
" \"Some IDs do not match expected format SSSS_XXXX_YYYY\"\n",
"\n",
"# Check predictions are within [0, 1]\n",
"assert submission[\"Pred\"].between(0, 1).all(), \\\n",
" \"Some predictions are outside the valid range [0, 1]\"\n",
"\n",
"# Confirm no duplicate IDs\n",
"assert submission[\"ID\"].is_unique, \"Duplicate IDs found in submission\"\n",
"\n",
"print(\"All sanity checks passed!\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# ------------------------------------------------------------------\n",
"# Save the submission file to the Kaggle working directory\n",
"# ------------------------------------------------------------------\n",
"output_path = os.path.join(OUTPUT_DIR, \"submission.csv\")\n",
"submission.to_csv(output_path, index=False)\n",
"\n",
"print(f\"Submission saved to: {output_path}\")\n",
"print(f\"File size: {os.path.getsize(output_path) / 1024:.1f} KB\")\n",
"print(\"\\nFinal submission preview:\")\n",
"print(submission.head())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Kaggle Submit Command\n",
"\n",
"After the notebook finishes, submit the output file using the Kaggle CLI:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Run this cell to submit directly from the notebook (or paste into a terminal)\n",
"# Make sure your Kaggle API credentials are configured (~/.kaggle/kaggle.json)\n",
"\n",
"!kaggle competitions submit \\\n",
" -c march-machine-learning-mania-2026 \\\n",
" -f /kaggle/working/submission.csv \\\n",
" -m \"Baseline 0.5 submission\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Notes\n",
"\n",
"- **File used:** `SampleSubmissionStage2.csv`, which contains all 2026 season matchup IDs (both Men's and Women's).\n",
"- **Baseline:** Every matchup is predicted with probability `0.5` (equal chance for either team). As stated in the competition rules, this scores **0.0** before the tournament begins, so that value is a placeholder and not a measure of quality. If the metric is the Brier score, as in recent editions of this competition, a constant `0.5` scores 0.25 on every game once results are in.\n",
"- **Pred column meaning:** Probability that the team with the **lower** `TeamID` wins. Men's TeamIDs are 1000–1999; Women's TeamIDs are 3000–3999.\n",
"- **Submission deadline:** March 19, 2026 4PM UTC. Select your best submission manually before the deadline; do not rely on automatic selection.\n",
"- To improve on the baseline, replace the `submission[\"Pred\"] = 0.5` line with model-predicted probabilities derived from the historical game data provided."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.10.0"
}
},
"nbformat": 4,
"nbformat_minor": 0
}