Commit efe47aa

Update README.md
1 parent 6734f3f commit efe47aa

1 file changed: +14 -4 lines changed

README.md

Lines changed: 14 additions & 4 deletions
@@ -59,9 +59,8 @@ This dataset serves aims to be a resource for researchers focusing on AI-generat
 <!-- GETTING STARTED -->
 ## Getting Started
 
-### Composition
+### 📌 Dataset Structure
 
-Here's a breakdown of the files in this dataset:
 * 76,089 total files
 * 58,524 files of original authors from the 2020 Google Code Jam
 * 17,565 rewritten files using GPT-4o
@@ -76,13 +75,24 @@ Researchers can use this dataset to:
 
 <p align="right">(<a href="#readme-top">back to top</a>)</p>
 
+## 🔗 Citation
+If you use this dataset, please cite:
+
+```bibtex
+@misc{P24_GCJ,
+author = {Paek, Timothy},
+title = {GPT Java GCJ Dataset: The Largest LLM-Generated Code Dataset from Google Code Jam},
+year = {2024},
+howpublished = {GitHub Repository},
+url = {https://github.com/tipaek/GPT-Java-GCJ-Dataset}
+}
+```
+
 <!-- CONTACT -->
 ## Contact
 
 Timothy Paek - [LinkedIn](https://www.linkedin.com/in/timothy-paek/) - tipaek@syr.edu
 
-Project Link: [https://github.com/tipaek/GPT-Java-GCJ-Dataset](https://github.com/tipaek/GPT-Java-GCJ-Dataset)
-
 <p align="right">(<a href="#readme-top">back to top</a>)</p>
 
 <!-- ACKNOWLEDGMENTS -->
