28 | 28 | <li> |
29 | 29 | <a href="#getting-started">Getting Started</a> |
30 | 30 | <ul> |
31 | | - <li><a href="#composition">File Composiiton</a></li> |
32 | | - <li><a href="#installation">Installation</a></li> |
| 31 | + <li><a href="#dataset-structure">File Composition</a></li> |
| 32 | + <li><a href="#citation">Citation</a></li> |
33 | 33 | </ul> |
34 | 34 | </li> |
35 | | - <li><a href="#usage">Usage</a></li> |
36 | 35 | <li><a href="#contact">Contact</a></li> |
37 | 36 | <li><a href="#acknowledgments">Acknowledgments</a></li> |
38 | 37 | </ol> |
@@ -62,29 +61,36 @@ Of course, there are limitations to this dataset as code classification by an LL |
62 | 61 | <!-- GETTING STARTED --> |
63 | 62 | ## Getting Started |
64 | 63 |
|
65 | | -### Composition |
| 64 | +### Dataset Structure |
66 | 65 |
|
67 | 66 | Here's a breakdown of the files in this dataset: |
68 | 67 | * 976 total files |
69 | 68 | * 666 files of original authors |
70 | 69 | * 108 rewritten files using Bing GPT-4 (61 formatted, 47 non-formatted) |
71 | 70 | * 202 rewritten files using ChatGPT-3.5 (59 formatted, 143 non-formatted) |
72 | 71 |
|
73 | | -### Installation |
74 | | - |
75 | | -To download this dataset, simply download it as a zip file and extract it from this GitHub page. |
76 | 72 |
|
77 | 73 | <p align="right">(<a href="#readme-top">back to top</a>)</p> |
78 | 74 |
|
| 75 | +## Citation |
| 76 | +If you use this dataset, please cite: |
| 77 | + |
| 78 | +```bibtex |
| 79 | +@misc{P24_Java, |
| 80 | + author = {Paek, Timothy}, |
| 81 | + title = {GPT Java Dataset: A Dataset for LLM-Generated Code Detection}, |
| 82 | + year = {2024}, |
| 83 | + howpublished = {GitHub Repository}, |
| 84 | + url = {https://github.com/tipaek/GPT-Java-Dataset} |
| 85 | +} |
| 86 | +``` |
79 | 87 |
|
80 | 88 |
|
81 | 89 | <!-- CONTACT --> |
82 | 90 | ## Contact |
83 | 91 |
|
84 | 92 | Timothy Paek - [Linked-In](https://www.linkedin.com/in/timothy-paek/) - tipaek@syr.edu |
85 | 93 |
|
86 | | -Project Link: [https://github.com/tipaek/GPTJavaDataset](https://github.com/tipaek/GPTJavaDataset) |
87 | | - |
88 | 94 | <p align="right">(<a href="#readme-top">back to top</a>)</p> |
89 | 95 |
|
90 | 96 |
|
|