3 changes: 3 additions & 0 deletions Climmate_academy/Readme.md
@@ -0,0 +1,3 @@
This document contains the encyclopedia of a publication called **Climate Academy**.

It contains the keyphrases from the publication.
7,282 changes: 7,282 additions & 0 deletions Climmate_academy/climatereport.html


10 changes: 10 additions & 0 deletions Cross Chaptor 7/ccp6_processed.html


63 changes: 63 additions & 0 deletions Cross Chaptor 7/html_id cleaning.py
@@ -0,0 +1,63 @@
import requests
from bs4 import BeautifulSoup
import re

def fetch_html(url):
    """
    Fetches HTML content from a given URL.
    """
    headers = {"User-Agent": "Mozilla/5.0"}
    response = requests.get(url, headers=headers)

    if response.status_code == 200:
        return response.text
    else:
        print(f"Error: Unable to fetch page (Status Code: {response.status_code})")
        return None

def clean_html(html_content):
    """
    Removes unnecessary elements and attributes from the parsed HTML.
    """
    soup = BeautifulSoup(html_content, "html.parser")

    # Remove <script> and <style> tags
    for tag in soup(["script", "style"]):
        tag.decompose()

    # Remove framework-generated attributes (data-*, aria-*, event handlers),
    # inline styles, classes, and non-essential ids
    for tag in soup.find_all(True):
        attrs_to_remove = [attr for attr in tag.attrs if re.match(r"^(data-|aria-|on)", attr)]
        for attr in attrs_to_remove:
            del tag[attr]
        if 'class' in tag.attrs:
            del tag['class']
        if 'style' in tag.attrs:
            del tag['style']
        if 'id' in tag.attrs and not tag['id'].startswith('item-'):
            del tag['id']

    return str(soup)


def save_html(content, filename="ccp6_processed.html"):
    """
    Saves cleaned HTML to a file.
    """
    with open(filename, "w", encoding="utf-8") as file:
        file.write(content)
    print(f"Processed HTML saved as {filename}")


# IPCC webpage URL (replace with the specific URL you want)
ipcc_url = "https://www.ipcc.ch/report/ar6/wg2/chapter/ccp6/"

# Fetch, clean and save HTML
html_content = fetch_html(ipcc_url)
if html_content:
    cleaned_html = clean_html(html_content)
    save_html(cleaned_html)
25 changes: 25 additions & 0 deletions Daily_report.md
@@ -0,0 +1,25 @@
## Date : 16/04/2025 Wednesday

## Component
Resolving issues with `docanalysis` in the **Google Colab environment**
### **Current task** : Getting `docanalysis` to work in the **Google Colab** environment
### **Current status** : Unable to install `docanalysis` in Colab


## Date : 04/06/2025 Wednesday

- **Task**: Tested `pygetpapers` for downloading research articles using a query.
- **Initial Command**:
```bash
python -m pygetpapers.pygetpapers --query '"wildlife" AND "biodiversity"' --pdf --limit 5 --output downloaded_file --api openalex --output Wildlife
```

- **Output** :
```bash
pygetpapers.py: error: unrecognized arguments: AND biodiversity'
```
- **Reason** : The query string was not correctly quoted for the shell, so the logical operator `AND` and the quoted search terms were split into separate arguments.

- **Solution** :
```bash
python -m pygetpapers.pygetpapers --query "\"wildlife\" AND \"biodiversity\"" --pdf --limit 5 --output Wildlife --api openalex
```
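The quoting failure above can be reproduced without running `pygetpapers` at all, using Python's standard-library `shlex` module, which mimics POSIX shell word splitting (an illustrative sketch; the literal strings below stand in for what the shell receives in each case):

```python
import shlex

# If the quotes around the terms are lost before argument parsing,
# the query is split into four separate tokens:
broken = shlex.split('--query "wildlife" AND "biodiversity"')
print(broken)   # ['--query', 'wildlife', 'AND', 'biodiversity']

# With escaped inner quotes, the whole query survives as one argument:
fixed = shlex.split('--query "\\"wildlife\\" AND \\"biodiversity\\""')
print(fixed)    # ['--query', '"wildlife" AND "biodiversity"']
```

The first result matches the error message: `AND` and `biodiversity` arrive as unrecognized extra arguments instead of staying inside the single `--query` value.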
25 changes: 25 additions & 0 deletions Dictionary creation/README.MD
@@ -0,0 +1,25 @@
# Dictionary Management

## Overview

This document outlines the process of managing the dictionary created from wordlists extracted from IPCC chapters. It includes quality checks, rectification, and updates to ensure consistency, accuracy, and usability.

## Tasks Involved

### 1. Managing the Dictionary
- Organizing and maintaining the dictionary derived from IPCC chapter wordlists.
- Ensuring proper structuring and accessibility for further processing and analysis.

### 2. Quality Check & Rectification
- Identifying and *removing repetitive words* to avoid redundancy.
- Removing words that *do not have an associated Wikipedia link* to maintain relevance.
- Identifying *ambiguous words* and linking them to *Wiktionary* for better clarity.
- Ensuring the dictionary remains well-structured and meaningful for users.
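The deduplication and Wikipedia-link checks above could be sketched as follows (a minimal illustration, not the project's actual code; the function names `clean_wordlist` and `has_wikipedia_page` are hypothetical, and the Wikipedia check assumes the public MediaWiki REST summary endpoint, which returns 404 for missing articles):

```python
import requests

def clean_wordlist(words):
    """Remove repeated words (case-insensitively), keeping first-seen order."""
    seen = set()
    unique = []
    for word in words:
        key = word.strip().lower()
        if key and key not in seen:
            seen.add(key)
            unique.append(word.strip())
    return unique

def has_wikipedia_page(term):
    """Return True if an English Wikipedia article exists for the term."""
    url = "https://en.wikipedia.org/api/rest_v1/page/summary/" + term.replace(" ", "_")
    response = requests.get(url, headers={"User-Agent": "dictionary-qc"})
    return response.status_code == 200
```

Terms for which `has_wikipedia_page` returns `False` would then be candidates for removal or for a Wiktionary link instead.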

### 3. Updating the Dictionary Code
- Making necessary modifications to the code responsible for generating and managing the dictionary.
- Documenting all updates and improvements for future reference.
- Ensuring changes are reflected in the latest version of the dictionary.

## Contribution & Feedback
If you have suggestions for improvements or encounter any issues, feel free to raise an issue or submit a pull request.
10 changes: 9 additions & 1 deletion README.md
@@ -1,3 +1,11 @@
# internship_sC

This repository tracks the interns' work progress and discussions for different tasks.
## Project overview
I am currently working on **IPCC Working Group 2, Cross-Chapter Paper 6: Polar Regions.**
The focus of this project is to develop resources that enhance the understanding of climate-related terminology and concepts.

## Objectives
* Wordlist
* Dictionary
* Table of contents of my IPCC chapter
* Network graph of the IPCC main page and Synthesis Report