HW_01_202601 #163

@The-Paul2002

Tasks: Web Scraping & REST API

Course: Python Programming
Total Score: 20 points (10 pts per task)


🕷️ Task 1: Web Scraping - UNMSM Admission Exam Results

Score: 10 points

Description

The goal of this task is to automatically extract the admission exam results
from Universidad Nacional Mayor de San Marcos using Python and Selenium,
career by career, and consolidate all the information into an Excel file.


🗂️ Expected Repository Structure

Create a repository named exactly Scraping_data with the following structure:

Scraping_data/
│
├── scraper.py                        # Main extraction script
├── README.md                         # Project explanation
├── output/
│   └── resultados_sanmarcos.xlsx     # Consolidated Excel with all results
└── video/
    └── link.txt                      # Link to your explanatory video

Place the link to your repository and video here:
https://docs.google.com/spreadsheets/d/16i_gtlZV08QARXl8FM5yX503XjDyKPiRFRbo56cjR2k/edit?usp=sharing


🔧 The script must:

  • Access: https://admision.unmsm.edu.pe/Website20262/A/A.html
  • Automatically extract the links of all careers
  • Iterate career by career extracting all applicants (not just the first 50)
  • Save the result in a consolidated Excel file inside the output/ folder
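The four steps above can be sketched as a loop that scrapes one career at a time, keeps going when a single page fails, and writes everything to one Excel file. This is only an outline under assumptions: `scrape_all_careers`, `save_to_excel`, and the `scrape_one` callable are hypothetical names, and the real extraction logic (Selenium) is left to `scrape_one`.

```python
from __future__ import annotations

import os
from typing import Callable, Iterable


def scrape_all_careers(
    career_links: Iterable[tuple[str, str]],
    scrape_one: Callable[[str], list[dict]],
) -> tuple[list[dict], list[tuple[str, str]]]:
    """Run scrape_one on every (career_name, url) pair and merge the rows.

    scrape_one is a hypothetical function that returns one dict per applicant.
    """
    all_rows: list[dict] = []
    failures: list[tuple[str, str]] = []
    for career, url in career_links:
        try:
            rows = scrape_one(url)
        except Exception as exc:  # one broken page should not stop the whole run
            failures.append((career, str(exc)))
            continue
        for row in rows:
            # Tag each applicant with the career so the Excel file is consolidated
            all_rows.append({"career": career, **row})
    return all_rows, failures


def save_to_excel(rows: list[dict], path: str = "output/resultados_sanmarcos.xlsx") -> None:
    """Write all rows to a single Excel sheet (needs pandas and openpyxl installed)."""
    import pandas as pd  # imported here so the loop above works without pandas

    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    pd.DataFrame(rows).to_excel(path, index=False)
```

Returning the list of failures alongside the rows makes it easy to print which careers need a retry at the end of the run.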

💡 Important tips based on the San Marcos page:

1. The table uses DataTables (JavaScript pagination)
The page loads all the data into memory but only displays 50 records by default.
You need to find a way to solve this challenge.
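One possible approach, sketched below, is to ask DataTables itself to show every row and then read the table body. It assumes the page exposes jQuery globally and uses DataTables 1.10 or later (where `page.len(-1)` means "show all"); the `table` selector and the function name `extract_all_rows` are placeholders, so inspect the real page before relying on them.

```python
# JavaScript run inside the browser through Selenium's execute_script.
# Assumption: jQuery and DataTables 1.10+ are loaded on the page.
SHOW_ALL_JS = "jQuery(arguments[0]).DataTable().page.len(-1).draw();"

READ_ROWS_JS = """
return Array.from(document.querySelectorAll(arguments[0] + ' tbody tr')).map(
    tr => Array.from(tr.cells).map(td => td.innerText.trim())
);
"""


def extract_all_rows(driver, table_css="table"):
    """Disable pagination on the table, then return every row as a list of cell texts.

    driver is a Selenium WebDriver already pointing at a career page;
    table_css is a hypothetical CSS selector for the results table.
    """
    driver.execute_script(SHOW_ALL_JS, table_css)
    return driver.execute_script(READ_ROWS_JS, table_css)


# Usage (requires Selenium and a browser driver):
# from selenium import webdriver
# driver = webdriver.Chrome()
# driver.get(career_url)
# rows = extract_all_rows(driver)
```

Other options are to pick the largest value in the "show N entries" dropdown or to click the "next" button until it is disabled; both work, but they need more waits than a single script call.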


🌿 GitHub Workflow (MANDATORY)

  • โŒ Do not work directly on main โ€” points will be deducted
  • โœ… Create a working branch
  • โœ… Make progressive commits with descriptive messages
  • โœ… When finished, merge into main via a Pull Request

📹 Explanatory Video (2.5 points)

  • Duration: 3 minutes maximum
  • Must show:
    • Brief explanation of the code
    • The script running live
    • The final Excel file with the extracted data
  • Upload the link in the file video/link.txt

📝 README.md

Must include at least:

  • What does the project do?
  • How to install the dependencies?
  • How to run the script?
  • What does the output contain?

๐Ÿ† Grading Rubric โ€” Task 1 (0 - 10 pts)

Criteria Points
Script works and correctly extracts all careers 2.5 pts
Explanatory video (3 min, shows code and output) 2.5 pts
Branch workflow + merge via Pull Request 1.5 pts
Complete consolidated Excel in output/ folder 1.5 pts
Explanatory README.md 1.0 pt
Progressive commits with descriptive messages 0.5 pt
Error handling in the code (try/except) 0.5 pt
TOTAL 10 pts

⚠️ Penalty: If you are found to have worked directly on main
without using branches, 1.5 points will be automatically deducted.


🎮 Task 2: REST API - RAWG Video Games Database

Score: 10 points

Description

The goal of this task is to consume the RAWG API to extract, analyze,
and compare video game data using Python.
You will create a new notebook inside the same repository Scraping_data.


🗂️ Expected Repository Structure

Add the following to your existing Scraping_data repository:

Scraping_data/
│
├── scraper.py                        # (Task 1, already exists)
├── README.md                         # Update with the API section
│
├── api/
│   ├── tarea_rawg_api.ipynb          # Main task notebook
│   └── output/
│       └── top20_rawg.csv            # CSV file generated in the task

🔑 Step 1 - Get your RAWG API Key

  1. Go to https://rawg.io and create your account
  2. Visit https://rawg.io/apidocs
  3. Click Get API Key and fill out the form:

| Field | What to enter |
| --- | --- |
| Site/App URL | https://localhost |
| Description | API Python Class |

  4. Copy your API Key and paste it into your notebook

⚠️ Do not upload your API Key to GitHub. Store it in a local variable.
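One way to keep the key out of the committed notebook is to read it at run time instead of typing it into a cell. The sketch below is a suggestion, not a requirement of the task: the environment variable name `RAWG_API_KEY` and the function `load_api_key` are assumptions.

```python
import os
from getpass import getpass


def load_api_key(env=None, var_name="RAWG_API_KEY"):
    """Return the API key from an environment variable, or prompt for it.

    var_name is a hypothetical variable name; use whatever you set locally.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        # getpass hides the typed value, so it never appears in the cell output
        key = getpass("RAWG API key: ").strip()
    return key


# In the notebook: API_KEY = load_api_key()
```

The key still ends up in a local variable, but its value is never written into the .ipynb file that gets pushed.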


📓 Notebook Structure

Each section must have Markdown cells explaining what you are doing
and code cells with the output executed and visible.


🟢 Part A - General Exploration (2 pts)

A1 (1 pt)

How many games does RAWG have registered in total?
Print the number with a clear message.

Hint: the count field is in the response from the /games endpoint.
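As a rough illustration of the hint, the sketch below builds a `/games` URL, fetches it, and formats the `count` field. It uses only the standard library so it runs anywhere (the `requests` package works just as well); the helper names `build_url`, `fetch_json`, and `total_games_message` are made up for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://api.rawg.io/api"


def build_url(endpoint, api_key, **params):
    """Build a RAWG URL such as .../games?key=...&page_size=1."""
    query = urlencode({"key": api_key, **params})
    return f"{BASE_URL}/{endpoint.strip('/')}?{query}"


def fetch_json(url, timeout=30):
    """Download a URL and decode the JSON body."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def total_games_message(payload):
    """Format the count field of a /games response as a readable sentence."""
    return f"RAWG has {payload['count']:,} games registered in total."


# In the notebook (needs a valid key and network access):
# payload = fetch_json(build_url("games", API_KEY, page_size=1))
# print(total_games_message(payload))
```

Requesting `page_size=1` keeps the response small, since only the `count` field is needed here.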


🔵 Part B - Category Analysis (2 pts)

B1 (1 pt)

What are the top 5 highest-rated games of all time according to Metacritic?
Show: name, rating, and metacritic score.

B2 (1 pt)

What are the 10 best games available on Steam (store_id=1)?
Show name, rating, and metacritic score.
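Both B1 and B2 come down to the same pattern: sort the `/games` endpoint by Metacritic score, optionally filter, and keep three fields per game. A minimal sketch follows, assuming the `ordering`, `page_size`, and `stores` query parameters of the RAWG `/games` endpoint; the function names and the sample record are invented for illustration.

```python
def top_games_params(n, **filters):
    """Query parameters for the n best games by Metacritic, plus optional filters."""
    return {"ordering": "-metacritic", "page_size": n, **filters}


def summarize(results):
    """Keep only name, rating, and metacritic from each game in a results list."""
    return [
        {
            "name": game.get("name"),
            "rating": game.get("rating"),
            "metacritic": game.get("metacritic"),
        }
        for game in results
    ]


# B1: top_games_params(5)
# B2: top_games_params(10, stores=1)   # store_id=1 is Steam, as stated above

# Made-up record, only to show the shape of the output
sample_results = [{"name": "Game A", "rating": 4.4, "metacritic": 97, "slug": "game-a"}]
print(summarize(sample_results))
```

Check the first rows returned in your notebook: if games without a Metacritic score appear at the top, filter them out before building the table.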


🟡 Part C - Comparisons (3 pts)

C1 (0.5 pts)

Compare the top 5 games on PC (platform_id=4) vs top 5 on PS5 (platform_id=187).
Which platform has the highest rated games?

C2 (0.5 pts)

Choose 3 famous games and build a comparison table with:
name, rating, metacritic, genres, and platforms.

C3 (0.5 pts)

Query the top 5 games from at least 4 different genres, calculate the
average rating for each, and determine which genre produces
the best games according to users.
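The calculation in C3 can be sketched as follows, assuming you have already collected the top 5 results for each genre into a dictionary. The function names and the sample ratings are invented; only the `rating` field comes from the API response.

```python
from statistics import mean


def average_rating_by_genre(results_by_genre):
    """Map each genre to the mean user rating of its games (rounded to 2 decimals)."""
    return {
        genre: round(mean(game["rating"] for game in games), 2)
        for genre, games in results_by_genre.items()
        if games  # skip genres that returned no games
    }


def best_genre(averages):
    """Return the genre with the highest average rating."""
    return max(averages, key=averages.get)


# Made-up ratings, only to show the shape of the input
sample = {
    "action": [{"rating": 4.0}, {"rating": 4.5}],
    "puzzle": [{"rating": 3.0}],
}
averages = average_rating_by_genre(sample)
print(averages, "->", best_genre(averages))
```

The same two functions work for C4 if you group games by year and average `metacritic` instead of `rating`.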

C4 (0.5 pts)

Compare the best games from 3 different years of your choice.
In which year were the games with the highest average metacritic score released?

C5 (1.0 pt)

Export the top 20 games of all time to a CSV file named
top20_rawg.csv inside the api/output/ folder.

The CSV must have the following columns:
name, rating, metacritic, release_date, main_genre

Display the first 5 rows of the generated file in the notebook.
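A possible shape for the export step is shown below, using the standard `csv` module (pandas `to_csv` is an equally valid choice). It assumes each game dictionary carries the `released` and `genres` fields of a RAWG `/games` result, and it treats the first listed genre as the "main" one, which is an interpretation rather than something the API defines.

```python
import csv
import os

COLUMNS = ["name", "rating", "metacritic", "release_date", "main_genre"]


def to_csv_row(game):
    """Flatten one game dict into the five required columns."""
    genres = game.get("genres") or []
    return {
        "name": game.get("name"),
        "rating": game.get("rating"),
        "metacritic": game.get("metacritic"),
        "release_date": game.get("released"),
        # Assumption: the first genre in the list is the main genre
        "main_genre": genres[0]["name"] if genres else "",
    }


def export_games(games, path="api/output/top20_rawg.csv"):
    """Write the games to a CSV file, creating the output folder if needed."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(to_csv_row(game) for game in games)
    return path
```

If the notebook runs from inside the api/ folder, pass `path="output/top20_rawg.csv"` so the file lands in api/output/.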


🔴 Part D - Insights & Conclusions (3 pts)

D1 (1.0 pt)

In a Markdown cell write your personal conclusions answering:

  • What was the most interesting thing you found in the data?
  • Which genre or platform surprised you the most and why?
  • What other question would you ask this API if you had more time?
  • How many requests did you use in total? (call client.resumen_requests())

This question is graded on the depth of your analysis,
not on having a "correct" answer.
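The last bullet in D1 assumes a `client` object with a `resumen_requests()` method. That class is not shown in this description; if you need to write your own, a hypothetical minimal version could look like the sketch below. Only the method name `resumen_requests` comes from the task; the class name `RawgClient`, the `get` method, and the `fetch` argument are assumptions.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


class RawgClient:
    """Tiny RAWG wrapper that counts how many requests it has made."""

    BASE_URL = "https://api.rawg.io/api"

    def __init__(self, api_key, fetch=None):
        self.api_key = api_key
        self.request_count = 0
        # fetch can be replaced (for example in tests) to avoid real HTTP calls
        self._fetch = fetch or self._http_get

    @staticmethod
    def _http_get(url):
        with urlopen(url, timeout=30) as response:
            return json.load(response)

    def get(self, endpoint, **params):
        """Call an endpoint such as 'games' and return the decoded JSON."""
        query = urlencode({"key": self.api_key, **params})
        self.request_count += 1
        return self._fetch(f"{self.BASE_URL}/{endpoint.strip('/')}?{query}")

    def resumen_requests(self):
        """Print and return the number of requests made so far."""
        print(f"Total requests made: {self.request_count}")
        return self.request_count
```

Routing every call in the notebook through one client object is what makes the final count trustworthy.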


🌿 GitHub Workflow (MANDATORY, same rules as Task 1)

  • ❌ Do not work directly on main
  • ✅ Create a branch for this task (e.g. feature/api-rawg)
  • ✅ Progressive commits with descriptive messages
  • ✅ Merge into main via a Pull Request

๐Ÿ† Grading Rubric โ€” Task 2 (0 - 10 pts)

Criteria Points
Part A โ€” General Exploration 2.0 pts
Part B โ€” Category Analysis 2.0 pts
Part C โ€” Comparisons + CSV exported 3.0 pts
Part D โ€” Insights + personal conclusions 3.0 pts
TOTAL 10 pts

⚠️ Penalty: Working directly on main deducts 3 points.
Code cells without visible output deduct 0.5 points per cell.


✅ Checklist before submitting

  • The notebook is named tarea_rawg_api.ipynb and is inside the api/ folder
  • The file top20_rawg.csv is inside api/output/
  • All code cells are executed with visible output
  • The code is commented
  • You used a branch + Pull Request for the merge
  • The README.md mentions both tasks

📊 Global Score Summary

| Task | Description | Score |
| --- | --- | --- |
| Task 1 | Web Scraping - UNMSM | 10 pts |
| Task 2 | REST API - RAWG | 10 pts |
| Course Total | | 20 pts |

📅 Deadline

Friday, April 10, 11:59 PM
Submit the link to your repository in the same Google Sheets form.

💬 Any questions, reach out on Discord. Good luck! 🎮
