HW_01_202601 #163
Description
Tasks – Web Scraping & REST API
Course: Python Programming
Total Score: 20 points (10 pts each task)
Task 1: Web Scraping – UNMSM Admission Exam Results
Score: 10 points
Description
The goal of this task is to automatically extract the admission exam results
from Universidad Nacional Mayor de San Marcos using Python and Selenium,
career by career, and consolidate all the information into an Excel file.
Expected Repository Structure
Create a repository named exactly Scraping_data with the following structure:
Scraping_data/
│
├── scraper.py                      # Main extraction script
├── README.md                       # Project explanation
├── output/
│   └── resultados_sanmarcos.xlsx   # Consolidated Excel with all results
└── video/
    └── link.txt                    # Link to your explanatory video
Place the link to your repository and video here:
https://docs.google.com/spreadsheets/d/16i_gtlZV08QARXl8FM5yX503XjDyKPiRFRbo56cjR2k/edit?usp=sharing
The script must:
- Access https://admision.unmsm.edu.pe/Website20262/A/A.html
- Automatically extract the links of all careers
- Iterate career by career, extracting all applicants (not just the first 50)
- Save the result in a consolidated Excel file inside the output/ folder
Important tips based on the San Marcos page:
1. The table uses DataTables (JavaScript pagination)
The page loads all the data into memory but only displays 50 records by default.
You need to find a way to solve this challenge.
GitHub Workflow (MANDATORY)
- Do not work directly on `main` – points will be deducted
- Create a working branch
- Make progressive commits with descriptive messages
- When finished, merge into `main` via a Pull Request
Explanatory Video (2.5 points)
- Duration: 3 minutes maximum
- Must show:
- Brief explanation of the code
- The script running live
- The final Excel file with the extracted data
- Upload the link in the file `video/link.txt`
README.md
Must include at least:
- What does the project do?
- How to install the dependencies?
- How to run the script?
- What does the output contain?
Grading Rubric – Task 1 (0-10 pts)
| Criteria | Points |
|---|---|
| Script works and correctly extracts all careers | 2.5 pts |
| Explanatory video (3 min, shows code and output) | 2.5 pts |
| Branch workflow + merge via Pull Request | 1.5 pts |
| Complete consolidated Excel in output/ folder | 1.5 pts |
| Explanatory README.md | 1.0 pt |
| Progressive commits with descriptive messages | 0.5 pt |
| Error handling in the code (try/except) | 0.5 pt |
| TOTAL | 10 pts |
Penalty: If you are found to have worked directly on `main`
without using branches, 1.5 points will be automatically deducted.
Task 2: REST API – RAWG Video Games Database
Score: 10 points
Description
The goal of this task is to consume the RAWG API to extract, analyze,
and compare video game data using Python.
You will create a new notebook inside the same repository Scraping_data.
Expected Repository Structure
Add the following to your existing Scraping_data repository:
Scraping_data/
│
├── scraper.py                  # (Task 1 – already exists)
├── README.md                   # Update with the API section
│
└── api/
    ├── tarea_rawg_api.ipynb    # Main task notebook
    └── output/
        └── top20_rawg.csv      # CSV file generated in the task
Step 1 – Get your RAWG API Key
- Go to https://rawg.io and create your account
- Visit https://rawg.io/apidocs
- Click Get API Key and fill out the form:
| Field | What to enter |
|---|---|
| Site/App URL | https://localhost |
| Description | API Python Class |
- Copy your API Key and paste it into your notebook
Warning: Do not upload your API Key to GitHub. Store it in a local variable.
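One safe pattern for keeping the key out of the notebook (an assumption, not a requirement of the task) is to read it from an environment variable; the variable name below is an example:

```python
import os

def get_api_key(var_name="RAWG_API_KEY"):
    # Read the key from the environment so the literal string never
    # appears in the notebook or in any commit.
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"Set {var_name} before running the notebook")
    return key
```

Set the variable in your shell (`export RAWG_API_KEY=...`) before launching Jupyter.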
Notebook Structure
Each section must have Markdown cells explaining what you are doing
and code cells with the output executed and visible.
Part A – General Exploration (2 pts)
A1 – (1 pt)
How many games does RAWG have registered in total?
Print the number with a clear message.
Hint: the `count` field is in the response from the `/games` endpoint.
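The hint above can be sketched as a single request. The endpoint and parameter names follow RAWG's public documentation; the helper names are illustrative:

```python
import requests

RAWG_GAMES = "https://api.rawg.io/api/games"

def extract_count(payload):
    # Pure helper: the global total lives in the top-level "count" field.
    return payload["count"]

def total_games(api_key):
    # page_size=1 keeps the response tiny; we only need the "count".
    resp = requests.get(
        RAWG_GAMES, params={"key": api_key, "page_size": 1}, timeout=30
    )
    resp.raise_for_status()
    return extract_count(resp.json())

# Example usage (needs a valid key):
# print(f"RAWG has {total_games(API_KEY):,} games registered")
```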
Part B – Category Analysis (2 pts)
B1 – (1 pt)
What are the top 5 highest-rated games of all time according to Metacritic?
Show: name, rating, and metacritic score.
B2 – (1 pt)
What are the 10 best games available on Steam (store_id=1)?
Show name, rating, and metacritic score.
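B1 and B2 reduce to the same query shape: sort by Metacritic descending, and for B2 additionally filter by store. A sketch of the parameters (names taken from the RAWG docs; the helper itself is an assumption):

```python
def top_rated_params(api_key, n=5, store_id=None):
    # "-metacritic" sorts from highest to lowest score; "stores" limits
    # results to one storefront (1 = Steam in RAWG's catalogue).
    params = {"key": api_key, "page_size": n, "ordering": "-metacritic"}
    if store_id is not None:
        params["stores"] = str(store_id)
    return params

# Example usage (B2):
# requests.get("https://api.rawg.io/api/games",
#              params=top_rated_params(API_KEY, n=10, store_id=1))
```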
Part C – Comparisons (3 pts)
C1 – (0.5 pts)
Compare the top 5 games on PC (platform_id=4) vs top 5 on PS5 (platform_id=187).
Which platform has the highest rated games?
C2 – (0.5 pts)
Choose 3 famous games and build a comparison table with:
name, rating, metacritic, genres, and platforms.
C3 – (0.5 pts)
Query the top 5 games from at least 4 different genres, calculate the
average rating for each, and determine which genre produces
the best games according to users.
C4 – (0.5 pts)
Compare the best games from 3 different years of your choice.
In which year were the games with the highest average metacritic score released?
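C1 through C4 all reuse the same `/games` filters with different values. A hypothetical helper collecting them (parameter names from the RAWG docs; everything else, including the averaging helper, is illustrative):

```python
def compare_params(api_key, n=5, platform_id=None, genre=None, year=None):
    params = {"key": api_key, "page_size": n, "ordering": "-rating"}
    if platform_id is not None:
        params["platforms"] = str(platform_id)   # e.g. 4 = PC, 187 = PS5
    if genre is not None:
        params["genres"] = genre                 # a slug such as "action"
    if year is not None:
        # RAWG filters by a date range, so one year spans Jan 1 - Dec 31.
        params["dates"] = f"{year}-01-01,{year}-12-31"
    return params

def average_rating(games):
    # Mean user rating of a result list (for the C3/C4 comparisons).
    return sum(g["rating"] for g in games) / len(games)
```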
C5 – (1.0 pt)
Export the top 20 games of all time to a CSV file named
`top20_rawg.csv` inside the `api/output/` folder.
The CSV must have the following columns:
name, rating, metacritic, release_date, main_genre
Display the first 5 rows of the generated file in the notebook.
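Assuming the top-20 results are already in a list of dicts from the API, the export might look like the sketch below. The RAWG result fields `released` and `genres` are real; mapping `main_genre` to the first genre in the list is an assumption, since the task does not define it:

```python
import pandas as pd

COLUMNS = ["name", "rating", "metacritic", "release_date", "main_genre"]

def to_record(game):
    # Map one RAWG result to the required CSV columns. "released" and
    # "genres" are RAWG's field names; "main_genre" = first listed genre
    # (an assumption about what the task means).
    genres = game.get("genres") or []
    return {
        "name": game["name"],
        "rating": game["rating"],
        "metacritic": game["metacritic"],
        "release_date": game.get("released"),
        "main_genre": genres[0]["name"] if genres else None,
    }

def export_top20(games, path="api/output/top20_rawg.csv"):
    df = pd.DataFrame([to_record(g) for g in games], columns=COLUMNS)
    df.to_csv(path, index=False)
    return df.head()  # display the first 5 rows in the notebook
```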
Part D – Insights & Conclusions (3 pts)
D1 – (1.0 pt)
In a Markdown cell write your personal conclusions answering:
- What was the most interesting thing you found in the data?
- Which genre or platform surprised you the most and why?
- What other question would you ask this API if you had more time?
- How many requests did you use in total? (call `client.resumen_requests()`)
This question is graded on the depth of your analysis,
not on having a "correct" answer.
GitHub Workflow (MANDATORY – same rules as Task 1)
- Do not work directly on `main`
- Create a branch for this task (e.g. `feature/api-rawg`)
- Progressive commits with descriptive messages
- Merge into `main` via a Pull Request
Grading Rubric – Task 2 (0-10 pts)
| Criteria | Points |
|---|---|
| Part A – General Exploration | 2.0 pts |
| Part B – Category Analysis | 2.0 pts |
| Part C – Comparisons + CSV exported | 3.0 pts |
| Part D – Insights + personal conclusions | 3.0 pts |
| TOTAL | 10 pts |
Penalty: Working directly on `main` deducts 3 points.
Code cells without visible output deduct 0.5 points per cell.
Checklist before submitting
- The notebook is named `tarea_rawg_api.ipynb` and is inside the `api/` folder
- The file `top20_rawg.csv` is inside `api/output/`
- All code cells are executed with visible output
- The code is commented
- You used a branch + Pull Request for the merge
- The `README.md` mentions both tasks
Global Score Summary
| Task | Description | Score |
|---|---|---|
| Task 1 | Web Scraping – UNMSM | 10 pts |
| Task 2 | REST API – RAWG | 10 pts |
| Course Total | | 20 pts |
Deadline
Friday, April 10 – 11:59 PM
Submit the link to your repository in the same Google Sheets form.
Any questions? Reach out on Discord. Good luck!