Commit f9ad372

Merge pull request #216 from VarshaUN/gsoc2025-report
Add GSoC 2025 Final report
2 parents 9bd7f14 + 20a26a3 commit f9ad372

3 files changed

+116
-0
lines changed

docs/source/archive/gsoc-toc.rst

Lines changed: 1 addition & 0 deletions
@@ -14,6 +14,7 @@ GSoC 2025
 .. toctree::
    :maxdepth: 2
 
+   gsoc/reports/2025/scancodeio_varsha
    gsoc/reports/2025/scancodeio_aayush
    gsoc/reports/2025/scancodeio_manit
    gsoc/reports/2025/scancode_toolkit_alok
Lines changed: 115 additions & 0 deletions
@@ -0,0 +1,115 @@
###################################################
 Adding ability to store/query downloaded packages
###################################################

**Organization:** `AboutCode <https://aboutcode.org>`_

**Project:** `ScanCode.io
<https://github.com/aboutcode-org/scancode.io>`_

| **Varsha U N**
| GitHub: `VarshaUN <https://github.com/VarshaUN>`_
| LinkedIn: `Varsha U N <https://www.linkedin.com/in/varsha-un/>`_

**Mentors:**

- `Philippe Ombredanne <https://github.com/pombredanne>`_
- `Ayan Sinha Mahapatra <https://github.com/AyanSinhaMahapatra>`_

**********
 Overview
**********

Before this project, ScanCode.io scanned packages but did not store
them. This made it difficult for users to maintain a reference of the
packages used in their projects, meet source redistribution
obligations, or revisit scanned packages in the future.

This project enhanced ScanCode.io by adding the ability to store and
query downloaded packages locally and to re-use packages that were
already scanned.

----

****************
 Implementation
****************

The project involved the following key components and steps:

.. figure:: /_static/gsoc2025/scancodeio_varsha/project_flow.png
   :alt: Project Flow Diagram
   :align: center
   :width: 70%

Where ScanCode.io previously downloaded packages without storing them,
the new archiving system stores downloaded packages on the local
filesystem and allows querying them.

Storage System Development:

- Created a ``DownloadStore`` abstract base class in ``archiving.py``
  to define the interface for managing package content and metadata
  storage.

- Built the ``LocalFilesystemProvider`` class to store downloads on the
  local filesystem, using a SHA256-based nested directory structure.

- Implemented methods for storing (``put``), retrieving (``get``),
  listing (``list``), and searching (``find``) downloads, with metadata
  saved in ``origin-<hash>.json`` files.
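
The storage layer described above can be sketched as follows. This is a
minimal illustration and not the code from the pull request: the method
signatures, the ``content`` file name, the two-level directory split, the
choice to hash the download URL for the ``origin-<hash>.json`` name, and the
``ValueError`` raised for a duplicate origin are all assumptions made for
the example.

```python
import hashlib
import json
from abc import ABC, abstractmethod
from pathlib import Path


class DownloadStore(ABC):
    """Interface for storing package content together with origin metadata."""

    @abstractmethod
    def put(self, content, download_url, download_date, filename):
        """Store content and its origin; return the SHA256 of the content."""

    @abstractmethod
    def get(self, sha256):
        """Return the stored bytes for a SHA256, or None if absent."""

    @abstractmethod
    def list(self):
        """Return the SHA256 of every stored download."""

    @abstractmethod
    def find(self, download_url):
        """Return the origin records recorded for a download URL."""


class LocalFilesystemProvider(DownloadStore):
    def __init__(self, root):
        self.root = Path(root)

    def _dir_for(self, sha256):
        # Assumed nesting: <root>/<2 hex chars>/<2 hex chars>/<remaining chars>/
        return self.root / sha256[:2] / sha256[2:4] / sha256[4:]

    def put(self, content, download_url, download_date, filename):
        sha256 = hashlib.sha256(content).hexdigest()
        directory = self._dir_for(sha256)
        directory.mkdir(parents=True, exist_ok=True)

        # Identical content is written once, whatever its origin.
        content_path = directory / "content"
        if not content_path.exists():
            content_path.write_bytes(content)

        # Assumption: the origin file name is derived from the download URL.
        origin_hash = hashlib.sha256(download_url.encode("utf-8")).hexdigest()
        origin_path = directory / f"origin-{origin_hash}.json"
        if origin_path.exists():
            raise ValueError(f"Origin already stored: {download_url}")
        origin = {
            "sha256": sha256,
            "download_url": download_url,
            "download_date": download_date,
            "filename": filename,
        }
        origin_path.write_text(json.dumps(origin, indent=2))
        return sha256

    def get(self, sha256):
        content_path = self._dir_for(sha256) / "content"
        return content_path.read_bytes() if content_path.exists() else None

    def list(self):
        # Rebuild each SHA256 from the three nested directory names.
        return sorted(
            "".join(path.parent.relative_to(self.root).parts)
            for path in self.root.glob("*/*/*/content")
        )

    def find(self, download_url):
        matches = []
        for origin_path in self.root.glob("*/*/*/origin-*.json"):
            origin = json.loads(origin_path.read_text())
            if origin["download_url"] == download_url:
                matches.append(origin)
        return matches
```

Keying the directory on the content hash means the same archive downloaded
from two URLs is stored once, with one origin file per URL beside it. The
``find`` shown here scans every origin file, which is adequate for a sketch
but would need an index for a large store.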

Integration with ScanCode.io:

- Updated ``pipelines/__init__.py`` to incorporate the archiving system
  into ScanCode.io’s pipeline workflow, ensuring downloaded packages
  are stored during execution.

- Revised ``input.py`` to process package download inputs, passing
  content, ``download_url``, ``download_date``, and ``filename`` to the
  archiving system.
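
The hand-off from input handling to the archive can be pictured with the
sketch below. It is an illustration under assumptions and not the code in
``input.py``: ``fetch_input`` is a hypothetical helper, ``archive`` is an
in-memory dict standing in for the real archiving system, and ``downloader``
is any callable that maps a URL to bytes. Only the four fields named above
(content, ``download_url``, ``download_date``, ``filename``) come from the
report; the ``sha256`` field is added for the example.

```python
import hashlib
from datetime import datetime, timezone


def fetch_input(download_url, filename, archive, downloader):
    """Return (record, reused) for a URL, downloading only when it is missing.

    ``archive`` is a hypothetical dict keyed by download URL; the real
    archiving system stores records on the local filesystem instead.
    """
    record = archive.get(download_url)
    if record is not None:
        # The package was archived earlier, so it is re-used as-is.
        return record, True

    content = downloader(download_url)
    record = {
        "content": content,
        "sha256": hashlib.sha256(content).hexdigest(),
        "download_url": download_url,
        "download_date": datetime.now(timezone.utc).isoformat(),
        "filename": filename,
    }
    archive[download_url] = record
    return record, False
```

Calling ``fetch_input`` twice with the same URL invokes ``downloader`` once;
the second call returns the archived record and reports it as re-used.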

User Interface Enhancements:

- Modified the project resource view to display stored package
  information, including download URLs and dates.

Validation and Testing:

- Wrote unit tests in ``test_archiving.py`` to verify
  ``LocalFilesystemProvider`` functionality (``put``, ``get``,
  ``list``, ``find``), covering normal cases, edge cases (e.g., empty
  files), and errors (e.g., duplicate origins).

**********************
 Linked Pull Request:
**********************

`Add download archiving system with local filesystem provider
<https://github.com/aboutcode-org/scancode.io/pull/1815>`_

****************
 Related Issue:
****************

`Store and retrieve on demand scanned packages/archives
<https://github.com/aboutcode-org/scancode.io/issues/1063>`_

********
 Links:
********

| Project Idea: `Idea Link
  <https://github.com/aboutcode-org/aboutcode/wiki/GSOC-2025-project-ideas#scancodeio-add-ability-to-storequery-downloaded-packages>`_
| GSoC Project Page: `GSoC 2025
  <https://summerofcode.withgoogle.com/programs/2025/projects/x7sA6uN6>`_
| Proposal: `Proposal Link
  <https://docs.google.com/document/d/1LfTGfatLfg9RB-OyLhlS4_h0-Tc9Q8QU1ObsCVDV_sM/edit?usp=sharing>`_

***************
 Closing Notes
***************

During the GSoC coding period, my mentors and I had weekly meetings to
discuss progress, challenges, and next steps. Thank you so much to my
mentors for being there every step of the way during GSoC 2025. Your
encouragement and insights made a huge difference in my learning
journey.
