Open
643 commits
8bc8114
docs: fix Chinese docs path
ChenZiHong-Gavin Oct 28, 2025
c0088b2
tests: add ollama_client tests
ChenZiHong-Gavin Oct 28, 2025
556264e
fix: fix generate_topk_per_token in ollmam_client
ChenZiHong-Gavin Oct 28, 2025
c66176f
fix: delete useless tests
ChenZiHong-Gavin Oct 28, 2025
a1cc4ec
fix: fix transformers warning not using GenerationConfig
ChenZiHong-Gavin Oct 28, 2025
776c9c2
fix: fix _build_inputs in hf_wrapper
ChenZiHong-Gavin Oct 28, 2025
9463da2
fix: fix gen_kwargs
ChenZiHong-Gavin Oct 28, 2025
4231526
chore: delete ds_wrapper
ChenZiHong-Gavin Oct 28, 2025
f81c90f
feat: add vllm_wrapper
ChenZiHong-Gavin Oct 28, 2025
41c5871
wip:sglang backend
ChenZiHong-Gavin Oct 29, 2025
c061e16
fix: change llm_wrapper type
ChenZiHong-Gavin Oct 29, 2025
964bddb
wip: add sglang_wrapper
ChenZiHong-Gavin Oct 29, 2025
068ad9c
docs: update .env.example
ChenZiHong-Gavin Oct 29, 2025
dc61561
fix: fix parsing token_logprobs in sglang_wrapper
ChenZiHong-Gavin Oct 29, 2025
8c74d9f
Merge pull request #74 from open-sciencelab/feature/inference-backend
ChenZiHong-Gavin Oct 29, 2025
d628e5f
fix: fix loss_entropy
ChenZiHong-Gavin Oct 29, 2025
e5fc07a
Merge pull request #77 from open-sciencelab/fix/fix-logprobs
ChenZiHong-Gavin Oct 29, 2025
b7ba5a5
docs: update README
ChenZiHong-Gavin Oct 30, 2025
92d494f
docs: update README
ChenZiHong-Gavin Oct 30, 2025
4396ae4
docs: update README
ChenZiHong-Gavin Oct 30, 2025
ebe34a8
docs: update README
ChenZiHong-Gavin Oct 30, 2025
f7cc3ad
docs: update README
ChenZiHong-Gavin Oct 30, 2025
76b9979
docs: update README
ChenZiHong-Gavin Oct 30, 2025
edebfb2
docs: update README
ChenZiHong-Gavin Oct 30, 2025
3707500
docs: update README
ChenZiHong-Gavin Oct 30, 2025
79a7752
merge from main
ChenZiHong-Gavin Oct 30, 2025
12f26d2
fix: fix lint errors
ChenZiHong-Gavin Oct 30, 2025
cad0548
delete search_mo
ChenZiHong-Gavin Oct 30, 2025
6eecc2d
feat: add mo_kg_builder
ChenZiHong-Gavin Nov 3, 2025
77d33f7
chore: downgrade numpy in requirements.txt
ChenZiHong-Gavin Nov 3, 2025
e1fa3a2
fix: fix dependencies
ChenZiHong-Gavin Nov 3, 2025
86d0df0
wip
ChenZiHong-Gavin Nov 4, 2025
f53c910
chore: delete code-review
ChenZiHong-Gavin Nov 4, 2025
35fb399
refactor: refactor graphgen.insert for scalability
ChenZiHong-Gavin Nov 4, 2025
a98c02a
Merge pull request #78 from open-sciencelab/refactor/insert
ChenZiHong-Gavin Nov 4, 2025
d92dc6f
merge from main
ChenZiHong-Gavin Nov 4, 2025
55a017f
Update graphgen/models/generator/protein_qa_generator.py
ChenZiHong-Gavin Nov 4, 2025
f3ff41b
Update graphgen/models/kg_builder/mo_kg_builder.py
ChenZiHong-Gavin Nov 4, 2025
03b400f
fix: attach additional data to node from image chunks
ChenZiHong-Gavin Nov 4, 2025
7b2e642
Merge pull request #79 from open-sciencelab/fix/attach-additional-data
ChenZiHong-Gavin Nov 4, 2025
f47cb17
fix: fix operator partition_kg
ChenZiHong-Gavin Nov 4, 2025
ebdc9f3
fix: fix partition_kg
ChenZiHong-Gavin Nov 4, 2025
c3bb489
Merge branch 'feature/protein-qa' of https://github.com/ChenZiHong-Ga…
ChenZiHong-Gavin Nov 4, 2025
efdddf5
feat: add get_by_fasta in UniProtSearch
ChenZiHong-Gavin Nov 4, 2025
42d078c
fix: add docstring
ChenZiHong-Gavin Nov 4, 2025
51cf700
Merge pull request #80 from open-sciencelab/feature/uniprot_search
ChenZiHong-Gavin Nov 4, 2025
6206568
merge from main
ChenZiHong-Gavin Nov 4, 2025
8304bea
feat: add PickleReader
ChenZiHong-Gavin Nov 5, 2025
301d045
Merge pull request #81 from open-sciencelab/feature/pickle-reader
ChenZiHong-Gavin Nov 5, 2025
11d8035
feat: add ParquetReader
ChenZiHong-Gavin Nov 5, 2025
aad04b1
Merge pull request #82 from open-sciencelab/feature/parquet-reader
ChenZiHong-Gavin Nov 5, 2025
c3d092c
feat: add rdf_reader
ChenZiHong-Gavin Nov 5, 2025
509d7bd
feat: update read_files
ChenZiHong-Gavin Nov 5, 2025
431885e
docs: add schema_guided_config
ChenZiHong-Gavin Nov 5, 2025
eb72993
wip
ChenZiHong-Gavin Nov 5, 2025
201cf93
feat: add schema_guided extraction prompt template
ChenZiHong-Gavin Nov 6, 2025
a7074ec
feat: add schema_guided_extraction config
ChenZiHong-Gavin Nov 6, 2025
5335e73
feat: add extract_schema_guided.sh
ChenZiHong-Gavin Nov 6, 2025
8787e9b
wip: add schema_guided_extractor
ChenZiHong-Gavin Nov 6, 2025
e5a98a8
feat: add orchestration engine for GraphGen and tests
ChenZiHong-Gavin Nov 6, 2025
343b9b2
styles: add todos
ChenZiHong-Gavin Nov 6, 2025
19b830f
chore: update requirements
ChenZiHong-Gavin Nov 6, 2025
0229e7d
Merge branch 'main' of https://github.com/open-sciencelab/GraphGen in…
ChenZiHong-Gavin Nov 6, 2025
296292e
refactor: refactor graphgen to integrete orchestration engine
ChenZiHong-Gavin Nov 6, 2025
e84f236
style: rename generate.py to run.py
ChenZiHong-Gavin Nov 6, 2025
b74a0cd
fix: switch to new configs
ChenZiHong-Gavin Nov 6, 2025
8c48998
feat: validate content of input files
ChenZiHong-Gavin Nov 6, 2025
6a45ae2
Merge pull request #86 from open-sciencelab/feature/input-file-valida…
ChenZiHong-Gavin Nov 6, 2025
4e0272d
Merge branch 'main' of https://github.com/open-sciencelab/GraphGen in…
ChenZiHong-Gavin Nov 6, 2025
af3d2d6
Merge branch 'feature/orchestration-operation' of https://github.com/…
ChenZiHong-Gavin Nov 6, 2025
5bce338
wip: add extract_info
ChenZiHong-Gavin Nov 6, 2025
3fc1924
wip
ChenZiHong-Gavin Nov 7, 2025
6b73a01
feat: add _meta.json to record processed chunks
ChenZiHong-Gavin Nov 7, 2025
6ed1bc7
merge
ChenZiHong-Gavin Nov 7, 2025
06ee93e
wip
ChenZiHong-Gavin Nov 7, 2025
3c38374
fix: update gradio
ChenZiHong-Gavin Nov 7, 2025
297c8e6
Merge pull request #85 from open-sciencelab/feature/orchestration-ope…
ChenZiHong-Gavin Nov 7, 2025
05d8801
Merge branch 'main' of https://github.com/open-sciencelab/GraphGen in…
ChenZiHong-Gavin Nov 7, 2025
e7137b7
fix: fix txt_reader, not splitting lines
ChenZiHong-Gavin Nov 7, 2025
292ca73
docs: update example data
ChenZiHong-Gavin Nov 7, 2025
fb3ce70
feat: complete extract_info pipeline
ChenZiHong-Gavin Nov 7, 2025
2abaa63
fix: update webui
ChenZiHong-Gavin Nov 7, 2025
f4c3fa6
Merge pull request #84 from open-sciencelab/feature/schema_guided_build
ChenZiHong-Gavin Nov 7, 2025
6ed6204
feat: support input folders
ChenZiHong-Gavin Nov 7, 2025
413148b
Merge pull request #87 from open-sciencelab/feature/input-folder
ChenZiHong-Gavin Nov 7, 2025
525dae3
docs: add example data for extraction task
ChenZiHong-Gavin Nov 7, 2025
1941f0c
fix: fix missing content in extraction prompt
ChenZiHong-Gavin Nov 7, 2025
dd93840
feat: add allow_suffix to limit input files
ChenZiHong-Gavin Nov 7, 2025
bc41334
feat: support .md files using TXTReader
ChenZiHong-Gavin Nov 7, 2025
37d7d88
fix: fix init_llm.py
ChenZiHong-Gavin Nov 7, 2025
57c1c66
fix: fix init_llm.py
ChenZiHong-Gavin Nov 7, 2025
ac60534
style: adjust example schema
ChenZiHong-Gavin Nov 10, 2025
10167d4
style: adjust example schema
ChenZiHong-Gavin Nov 10, 2025
5c2a57e
feat: add search_any in uniprot_search
ChenZiHong-Gavin Nov 10, 2025
e63c1a7
style: delete useless code
ChenZiHong-Gavin Nov 10, 2025
9451be2
refactor: refactor search_all to support uniprot_search
ChenZiHong-Gavin Nov 10, 2025
4703b85
Merge pull request #88 from open-sciencelab/refactor/refactor-search
ChenZiHong-Gavin Nov 10, 2025
0662d2b
feat: add local blast
ChenZiHong-Gavin Nov 11, 2025
f1d3797
fix: fix async search
ChenZiHong-Gavin Nov 11, 2025
398a244
Merge pull request #90 from open-sciencelab/refactor/refactor-search
ChenZiHong-Gavin Nov 11, 2025
cdddfac
merge
ChenZiHong-Gavin Nov 12, 2025
14a1ddc
Create contributing.md
ChenZiHong-Gavin Nov 12, 2025
8b3ca1c
fix: fix protein generator
ChenZiHong-Gavin Nov 12, 2025
6192964
Merge branch 'main' of https://github.com/open-sciencelab/GraphGen in…
ChenZiHong-Gavin Nov 12, 2025
c668af7
style: change config name
ChenZiHong-Gavin Nov 12, 2025
a8d475b
feat: remove think content when text containing only </think>
ChenZiHong-Gavin Nov 12, 2025
f08f6db
Merge branch 'main' of https://github.com/open-sciencelab/GraphGen in…
ChenZiHong-Gavin Nov 12, 2025
b168327
feat: support tp_size for sglang
ChenZiHong-Gavin Nov 12, 2025
0bf75ff
fix: store graph
ChenZiHong-Gavin Nov 12, 2025
5e0a608
Merge branch 'main' of https://github.com/open-sciencelab/GraphGen in…
ChenZiHong-Gavin Nov 12, 2025
00c2733
fix: add tp_size for sglang_wrapper
ChenZiHong-Gavin Nov 12, 2025
ec70077
fix: add tp_size for sglang_wrapper
ChenZiHong-Gavin Nov 12, 2025
eafa48f
Merge branch 'main' of https://github.com/open-sciencelab/GraphGen in…
ChenZiHong-Gavin Nov 12, 2025
031c94a
fix: fix duplicate param
ChenZiHong-Gavin Nov 12, 2025
1b69b32
fix: fix tp_size
ChenZiHong-Gavin Nov 12, 2025
11dc4ee
Merge branch 'main' of https://github.com/open-sciencelab/GraphGen in…
ChenZiHong-Gavin Nov 12, 2025
457c953
fix: update prompt for protein_kg
ChenZiHong-Gavin Nov 12, 2025
02fbd7a
fix: refactor e2e tests to extract common test logic
CHERRY-ui8 Nov 19, 2025
6964da5
Merge pull request #93 from CHERRY-ui8/refactor/extract-e2e-test-comm…
ChenZiHong-Gavin Nov 19, 2025
84ed1c4
fix: OpenAIClient parameter from model_name to model to resolve key m…
CHERRY-ui8 Nov 20, 2025
a5a9d7c
Merge branch 'main' into fix/quiz-refactor
CHERRY-ui8 Nov 20, 2025
d06c053
refactor: implement QuizGenerator and refactor quiz_and_judge to stan…
CHERRY-ui8 Nov 20, 2025
00c1028
refactor: use run_concurrent in quiz operator and fix language codes
CHERRY-ui8 Nov 20, 2025
cf2911d
Merge pull request #95 from CHERRY-ui8/update-baselines-and-clients
ChenZiHong-Gavin Nov 20, 2025
bad177d
fix: change language detection to use detect_main_language directly
CHERRY-ui8 Nov 20, 2025
828ba5c
fix: change language detection to use detect_main_language directly
CHERRY-ui8 Nov 20, 2025
8620acc
fix: change language detection to use detect_main_language directly i…
CHERRY-ui8 Nov 20, 2025
16d6eed
feat: add progress bar support and refactor concurrent operations
CHERRY-ui8 Nov 20, 2025
232d1e1
Merge pull request #97 from CHERRY-ui8/fix/quiz-refactor
ChenZiHong-Gavin Nov 21, 2025
477b2c6
Merge pull request #98 from CHERRY-ui8/feat/add-progress-bar-and-refa…
ChenZiHong-Gavin Nov 21, 2025
1ecded0
docs: update README
ChenZiHong-Gavin Nov 21, 2025
48b1e38
feat: added support for azure open ai api
muhammadyaseen Nov 21, 2025
7f33fac
misc: removed unused import
muhammadyaseen Nov 21, 2025
c3833c4
misc: fixing pylint issues
muhammadyaseen Nov 21, 2025
c78c62c
misc: pylint fixes
muhammadyaseen Nov 21, 2025
63ab858
Merge pull request #99 from muhammadyaseen/feat/azure-openai-support
ChenZiHong-Gavin Nov 21, 2025
a991a51
refactor: refactor storage methods to non-async as not necessary
ChenZiHong-Gavin Nov 25, 2025
8964709
chore: fix gradio version
ChenZiHong-Gavin Nov 25, 2025
e72c570
Merge pull request #100 from open-sciencelab/fix/fix-async-storage
ChenZiHong-Gavin Nov 25, 2025
e2e5aec
feat: enable UniProt search in protein QA pipeline
CHERRY-ui8 Nov 25, 2025
e1b25c8
Merge branch 'main' of https://github.com/open-sciencelab/GraphGen in…
CHERRY-ui8 Nov 25, 2025
af3ec45
feat: add parallel_file_scanner & make read_files streamable
ChenZiHong-Gavin Nov 26, 2025
2c1e141
chore: update requirements(diskcache)
ChenZiHong-Gavin Nov 26, 2025
d3e4e3c
Merge pull request #101 from open-sciencelab/feature/parallel_file_sc…
ChenZiHong-Gavin Nov 26, 2025
33d335e
workflow: add search_uniprot example
ChenZiHong-Gavin Nov 26, 2025
4d8a807
Merge pull request #102 from open-sciencelab/workflow/add-search-work…
ChenZiHong-Gavin Nov 26, 2025
01429bd
Merge remote-tracking branch 'origin/main' into feature/protein-qa
CHERRY-ui8 Nov 26, 2025
a140798
feat: support complex configs
ChenZiHong-Gavin Nov 26, 2025
771b8b2
feat(webui): update gradio webui
ChenZiHong-Gavin Nov 26, 2025
adc9c9a
Merge pull request #103 from open-sciencelab/feature/enable-complex-c…
ChenZiHong-Gavin Nov 26, 2025
7ede5f1
Revert "feat: enable UniProt search in protein QA pipeline"
CHERRY-ui8 Nov 26, 2025
69bd582
chore: add test_database_access.py to .gitignore
CHERRY-ui8 Nov 26, 2025
2e4afab
refactor: delete meta storage
ChenZiHong-Gavin Nov 26, 2025
48dc3f6
fix: fix lint problem
ChenZiHong-Gavin Nov 26, 2025
e01541c
Merge pull request #104 from open-sciencelab/feature/optimize-pipeline
ChenZiHong-Gavin Nov 26, 2025
1e07969
Merge pull request #105 from CHERRY-ui8/feat/add-dna-rna-search
CHERRY-ui8 Dec 1, 2025
b527bfb
docs: update README
ChenZiHong-Gavin Dec 1, 2025
3282ff5
docs: update README
ChenZiHong-Gavin Dec 1, 2025
38d4d9d
merge from main
ChenZiHong-Gavin Dec 2, 2025
5eebc20
Update star history link in README
tpoisonooo Dec 2, 2025
9e52d49
Update star history chart link in README_zh.md
tpoisonooo Dec 2, 2025
bfffe2c
Merge pull request #106 from Intern-Science/tpoisonooo-patch-1
tpoisonooo Dec 2, 2025
8adb434
feat: add bds baseline (#48)
ChenZiHong-Gavin Dec 2, 2025
b39493e
fix: fix bds baseline (#108)
ChenZiHong-Gavin Dec 2, 2025
e3940b5
Fix repository link in star history chart
ChenZiHong-Gavin Dec 3, 2025
4f302a3
Fix repository link in star history chart
ChenZiHong-Gavin Dec 3, 2025
a3d2396
refactor: replace sqlite with rocksdb (#109)
ChenZiHong-Gavin Dec 3, 2025
1006490
feat: add config and operator node types
ChenZiHong-Gavin Dec 3, 2025
3b59e82
refactor: refactor readers with ray data
ChenZiHong-Gavin Dec 3, 2025
82c7d36
fix: delete param parallelism for readers
ChenZiHong-Gavin Dec 3, 2025
7d6328f
fix: fix import error
ChenZiHong-Gavin Dec 3, 2025
6546efd
refactor read and chunk operators with no side effects
ChenZiHong-Gavin Dec 4, 2025
8aae44b
fix: fix import error
ChenZiHong-Gavin Dec 4, 2025
25af276
fix: fix return logic
ChenZiHong-Gavin Dec 4, 2025
b1cbf00
refactor: rename operator split to chunk
ChenZiHong-Gavin Dec 4, 2025
9c9a61a
refactor: refactor build_kg to accomodate ray data
ChenZiHong-Gavin Dec 4, 2025
6757c0e
feat: add StorageFactory & global params
ChenZiHong-Gavin Dec 4, 2025
c5e8486
fix: fix dna/rna local blast
CHERRY-ui8 Dec 4, 2025
f980460
refactor: refactor quiz to accomodata ray data engine
ChenZiHong-Gavin Dec 5, 2025
b009342
fix: fix dna/rna local blast (#111)
CHERRY-ui8 Dec 5, 2025
7751c10
fix: reload graph before quizzing
ChenZiHong-Gavin Dec 5, 2025
7fb2b00
Merge branch 'main' of https://github.com/open-sciencelab/GraphGen in…
ChenZiHong-Gavin Dec 5, 2025
e284bee
Potential fix for pull request finding 'Unreachable code'
ChenZiHong-Gavin Dec 5, 2025
57dfb7a
fix: fix quiz params
ChenZiHong-Gavin Dec 5, 2025
31b08f9
Merge remote-tracking branch 'origin/main' into main
CHERRY-ui8 Dec 5, 2025
a3b34ee
fix: fix rna search with no gene info
CHERRY-ui8 Dec 9, 2025
094fd20
Update graphgen/models/searcher/db/rnacentral_searcher.py
CHERRY-ui8 Dec 9, 2025
a6e5eb1
refactor: refactor quiz&judge to ray actors
ChenZiHong-Gavin Dec 10, 2025
224038a
Merge branch 'refactor/refactor-with-ray-data' of https://github.com/…
ChenZiHong-Gavin Dec 10, 2025
bd1f033
fix: disable API fallback when local BLAST is enabled
CHERRY-ui8 Dec 10, 2025
01791c0
fix: fix transferring quizzed data to JudgeService
ChenZiHong-Gavin Dec 10, 2025
3cff4ca
refactor: refactor partition to accomodate ray data
ChenZiHong-Gavin Dec 10, 2025
33c0625
fix: fix lint problem
ChenZiHong-Gavin Dec 10, 2025
9dd68c7
refactor: refactor op generate
ChenZiHong-Gavin Dec 11, 2025
965d497
feat: write results in output folder
ChenZiHong-Gavin Dec 11, 2025
94bd2eb
fix: raise error when no dataset is created
ChenZiHong-Gavin Dec 11, 2025
b155a0f
fix: return generator in ece_partitioner
ChenZiHong-Gavin Dec 11, 2025
bfbf932
fix: return generator in ece_partitioner
ChenZiHong-Gavin Dec 11, 2025
65adb45
refactor: refactor data format to support multi-modal input
ChenZiHong-Gavin Dec 11, 2025
70f2347
fix: delete fetching schema to avoid ray's duplicate execution
ChenZiHong-Gavin Dec 11, 2025
9e3e5dc
fix: fix operators' registry
ChenZiHong-Gavin Dec 11, 2025
e7ae3b6
feat: refactor schema_guided_extraction & add examples
ChenZiHong-Gavin Dec 11, 2025
0c1cdde
add: add local rna databases and merge
CHERRY-ui8 Dec 11, 2025
77b2969
add: add local dna databases of more species
CHERRY-ui8 Dec 11, 2025
79bf775
add: add local UniProt mirror and more download options
CHERRY-ui8 Dec 11, 2025
66d37d5
feat: seperate ray logs and service logs
ChenZiHong-Gavin Dec 12, 2025
b574922
feat: enable faster search
CHERRY-ui8 Dec 12, 2025
64071ae
add: enable mid-auto save in searcher
CHERRY-ui8 Dec 12, 2025
b722bd8
fix: accept both U and T as RNA seq
CHERRY-ui8 Dec 13, 2025
3592099
add: add blast threads and search max_concurrent
CHERRY-ui8 Dec 14, 2025
3ad0de7
fix: fix max_concurrent parameter in search_all
CHERRY-ui8 Dec 14, 2025
0804a40
add: support multi-file database search in dna
CHERRY-ui8 Dec 14, 2025
9796822
feat: enhance JSONL reading and storage with streaming and batch proc…
CHERRY-ui8 Dec 14, 2025
be32678
add: add retry for all API usage and support extract sequence from lo…
CHERRY-ui8 Dec 14, 2025
9770f58
add: add retry for all API usage in RNA search
CHERRY-ui8 Dec 14, 2025
d8497d3
feat: use storage actor
ChenZiHong-Gavin Dec 15, 2025
873c45b
feat: add kuzu graph database
ChenZiHong-Gavin Dec 15, 2025
6e656ae
feat: add llm as actors
ChenZiHong-Gavin Dec 15, 2025
7ff6c7d
refactor: delete old runner
ChenZiHong-Gavin Dec 15, 2025
7b1adac
add: resume downloading in DNA local blast buiding
CHERRY-ui8 Dec 15, 2025
3b3e723
add: add rfam in selected RNA databases
CHERRY-ui8 Dec 15, 2025
0a390fe
Merge PR #110: refactor with ray-data, preserving streaming read func…
CHERRY-ui8 Dec 15, 2025
2ff2272
fix: fix vllm wrapper
ChenZiHong-Gavin Dec 15, 2025
01fe026
Merge remote-tracking branch 'intern/refactor/refactor-with-ray-data'…
CHERRY-ui8 Dec 15, 2025
3e0a8bb
add: omic KG buiding
CHERRY-ui8 Dec 16, 2025
30102cb
add: genarate omics qa configs
CHERRY-ui8 Dec 16, 2025
bfc7eb8
fix: support async search in ray pipenline
CHERRY-ui8 Dec 16, 2025
d103175
refactor: delete files no longer exist in ray pipeline
CHERRY-ui8 Dec 16, 2025
6bc49a5
fix: change search configs to fit ray
CHERRY-ui8 Dec 16, 2025
b7fec27
refactor: refactor pipeline engine using ray data (#110)
ChenZiHong-Gavin Dec 16, 2025
7174872
Merge main branch (PR #110) into fix/rna-search-gene-info
CHERRY-ui8 Dec 16, 2025
fd4a2a7
docs: update README
ChenZiHong-Gavin Dec 16, 2025
fe25abe
Merge fix/rna-search-gene-info into feature/protein-qa: integrate new…
CHERRY-ui8 Dec 16, 2025
d021f7c
fix: fix data type of num_gpus & generate_topk_per_token of vllmwrapp…
ChenZiHong-Gavin Dec 16, 2025
aaf2e6a
feat: expand protein-qa to omics (dna rna prot)
CHERRY-ui8 Dec 16, 2025
51fa274
fix: disable metrics exporter to prevent RpcError
CHERRY-ui8 Dec 16, 2025
f452b4f
feat: enhance OmicsQAGenerator and PartitionService for data extraction
CHERRY-ui8 Dec 16, 2025
9cbe5ad
Merge fork/feature/protein-qa: resolve conflicts and remove deprecate…
CHERRY-ui8 Dec 16, 2025
689560b
fix: delete deplicate --output param
ChenZiHong-Gavin Dec 17, 2025
a8dcc9c
Merge branch 'main' of https://github.com/open-sciencelab/GraphGen in…
ChenZiHong-Gavin Dec 17, 2025
52ee782
fix: allow multi input sources and ensure sequence extraction in omic…
CHERRY-ui8 Dec 17, 2025
d4511c1
fix: update asyncio event loop retrieval in evaluators and correct ge…
CHERRY-ui8 Dec 17, 2025
7634a5b
add: allow search result as input for omics KG
CHERRY-ui8 Dec 17, 2025
0bca1e7
Merge branch 'main' into feature/protein-qa
CHERRY-ui8 Dec 17, 2025
35ffa68
fix: fix pylint problems
CHERRY-ui8 Dec 17, 2025
1e08c61
Refactor logging setup and clean up whitespace in various files. Adju…
CHERRY-ui8 Dec 17, 2025
6dd2dfc
Merge remote-tracking branch 'fork/feature/protein-qa' into feature/p…
CHERRY-ui8 Dec 17, 2025
57faab0
chore: update .gitignore
CHERRY-ui8 Dec 17, 2025
65c2270
fix: remove MOKGBuilder and related functionality to avoid duplication
CHERRY-ui8 Dec 17, 2025
082e25a
fix: fix some small problems on gemini bot's advice
CHERRY-ui8 Dec 17, 2025
55 changes: 53 additions & 2 deletions .env.example
Original file line number Diff line number Diff line change
@@ -1,6 +1,57 @@
-SYNTHESIZER_MODEL=
# Tokenizer
TOKENIZER_MODEL=

# LLM
# Supported backends: http_api, openai_api, azure_openai_api, ollama_api, ollama, huggingface, tgi, sglang, vllm, tensorrt

# http_api / openai_api
SYNTHESIZER_BACKEND=openai_api
SYNTHESIZER_MODEL=gpt-4o-mini
SYNTHESIZER_BASE_URL=
SYNTHESIZER_API_KEY=
-TRAINEE_MODEL=
TRAINEE_BACKEND=openai_api
TRAINEE_MODEL=gpt-4o-mini
TRAINEE_BASE_URL=
TRAINEE_API_KEY=

# azure_openai_api
# SYNTHESIZER_BACKEND=azure_openai_api
# The following is the same as your "Deployment name" in Azure
# SYNTHESIZER_MODEL=<your-deployment-name>
# SYNTHESIZER_BASE_URL=https://<your-resource-name>.openai.azure.com/openai/deployments/<your-deployment-name>/chat/completions
# SYNTHESIZER_API_KEY=
# SYNTHESIZER_API_VERSION=<api-version>

# # ollama_api
# SYNTHESIZER_BACKEND=ollama_api
# SYNTHESIZER_MODEL=gemma3
# SYNTHESIZER_BASE_URL=http://localhost:11434
#
# Note: TRAINEE with ollama_api backend is not supported yet as ollama_api does not support logprobs.

# # huggingface
# SYNTHESIZER_BACKEND=huggingface
# SYNTHESIZER_MODEL=Qwen/Qwen2.5-0.5B-Instruct
#
# TRAINEE_BACKEND=huggingface
# TRAINEE_MODEL=Qwen/Qwen2.5-0.5B-Instruct

# # sglang
# SYNTHESIZER_BACKEND=sglang
# SYNTHESIZER_MODEL=Qwen/Qwen2.5-0.5B-Instruct
# SYNTHESIZER_TP_SIZE=1
# SYNTHESIZER_NUM_GPUS=1

# TRAINEE_BACKEND=sglang
# TRAINEE_MODEL=Qwen/Qwen2.5-0.5B-Instruct
# TRAINEE_TP_SIZE=1
# TRAINEE_NUM_GPUS=1

# # vllm
# SYNTHESIZER_BACKEND=vllm
# SYNTHESIZER_MODEL=Qwen/Qwen2.5-0.5B-Instruct
# SYNTHESIZER_NUM_GPUS=1

# TRAINEE_BACKEND=vllm
# TRAINEE_MODEL=Qwen/Qwen2.5-0.5B-Instruct
# TRAINEE_NUM_GPUS=1
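
Reviewer note: the variables in `.env.example` follow a `<ROLE>_<SETTING>` naming pattern (`SYNTHESIZER_*`, `TRAINEE_*`). The following is a minimal sketch of how such settings could be grouped by role, assuming they are read from the process environment. The helper name `load_llm_env` and the shape of the returned dictionary are hypothetical and are not taken from GraphGen's code.

```python
import os
from typing import Dict, Mapping, Optional


def load_llm_env(role: str, environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Collect ROLE_* variables (e.g. SYNTHESIZER_BACKEND) into a lowercase-keyed dict.

    Hypothetical helper for illustration only. Empty values are skipped so that
    placeholders such as `SYNTHESIZER_BASE_URL=` are treated as "not set".
    """
    env = os.environ if environ is None else environ
    prefix = role.upper() + "_"
    settings: Dict[str, str] = {}
    for key, value in env.items():
        if key.startswith(prefix) and value != "":
            # Strip the role prefix and lowercase the remainder: BACKEND -> backend
            settings[key[len(prefix):].lower()] = value
    return settings


# Example values mirror the keys shown in .env.example.
example_env = {
    "SYNTHESIZER_BACKEND": "openai_api",
    "SYNTHESIZER_MODEL": "gpt-4o-mini",
    "SYNTHESIZER_BASE_URL": "",
    "TRAINEE_BACKEND": "vllm",
    "TRAINEE_NUM_GPUS": "1",
}
synthesizer = load_llm_env("synthesizer", example_env)
trainee = load_llm_env("trainee", example_env)
print(synthesizer)
print(trainee)
```

All values stay strings in this sketch; converting settings such as `num_gpus` to integers is left to the caller.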
24 changes: 24 additions & 0 deletions .github/ISSUE_TEMPLATE/bug_report.md
@@ -0,0 +1,24 @@
---
name: Bug report
about: Create a report to help us improve
title: ''
labels: ''
assignees: ''

---

**Describe the bug**
A clear and concise description of what the bug is.

**To Reproduce**
Steps to reproduce the behavior:
1. Go to '...'
2. Click on '....'
3. Scroll down to '....'
4. See error

**Expected behavior**
A clear and concise description of what you expected to happen.

**Additional context**
Add any other context about the problem here.
20 changes: 20 additions & 0 deletions .github/ISSUE_TEMPLATE/feature_request.md
@@ -0,0 +1,20 @@
---
name: Feature request
about: Suggest an idea for this project
title: ''
labels: ''
assignees: ''

---

**Is your feature request related to a problem? Please describe.**
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

**Describe the solution you'd like**
A clear and concise description of what you want to happen.

**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features you've considered.

**Additional context**
Add any other context or screenshots about the feature request here.
34 changes: 34 additions & 0 deletions .github/contributing.md
@@ -0,0 +1,34 @@
## Contribution Guide
Here are the steps to contribute to this project:

1. Star this repository.
2. Fork this repository.

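Then clone your fork by running the following command in a terminal (for example, Git Bash), replacing `<your-username>` with your GitHub username:
```bash
git clone https://github.com/<your-username>/GraphGen.git
```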

3. Create a new branch

Before making changes, open a terminal in the repository you just cloned and run:

```bash
git checkout -b add-my-name
```

This creates a new branch called `add-my-name` and checks it out. The new branch starts from the commit history of the branch you were on previously (normally `main`).

4. Make your changes and push your code.

```bash
git add .
git commit -m "xxx"
git push -u origin add-my-name
```

This creates a new commit with your changes and pushes it to the remote repository.

5. Create a pull request and give it a descriptive title.

Sit back and relax while your pull request is being reviewed and merged.
17 changes: 17 additions & 0 deletions .github/sync-config.yml
@@ -0,0 +1,17 @@
sync:
  - source: graphgen/
    dest: graphgen/
  - source: resources/nltk_data/
    dest: resources/nltk_data/
  - source: resources/examples/
    dest: resources/examples/
  - source: resources/images/logo.png
    dest: resources/images/logo.png
  - source: webui/
    dest: webui/
  - source: webui/app.py
    dest: app.py
  - source: requirements.txt
    dest: requirements.txt
  - source: LICENSE
    dest: LICENSE
51 changes: 51 additions & 0 deletions .github/workflows/push-to-hf.yml
@@ -0,0 +1,51 @@
name: Push demo branch to Hugging Face

on:
  workflow_call:
    inputs:
      ref:
        required: false
        default: demo
        type: string
    secrets:
      HF_TOKEN:
        required: true

jobs:
  push-hf:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          ref: ${{ inputs.ref }}
          token: ${{ secrets.GITHUB_TOKEN }}

      - name: Configure Git identity
        run: |
          git config --global user.email "actions@github.com"
          git config --global user.name "github-actions[bot]"

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install huggingface_hub

      - name: Push to Hugging Face
        env:
          HF_TOKEN: ${{ secrets.HF_TOKEN }}
          HF_REPO_TYPE: spaces
          HF_REPO_ID: chenzihong/GraphGen
        run: |
          git config --global credential.helper store
          echo "https://user:${HF_TOKEN}@huggingface.co" > ~/.git-credentials

          [[ -d hf-repo ]] && rm -rf hf-repo
          git clone https://huggingface.co/${HF_REPO_TYPE}/${HF_REPO_ID} hf-repo

          rsync -a --delete --exclude='.git' --exclude='hf-repo' --exclude='README.md' ./ hf-repo/

          cd hf-repo
          git add .
          git diff-index --quiet HEAD || \
            (git commit -m "Auto-sync from ${{ inputs.ref }} at $(date -u)" && git push)
50 changes: 50 additions & 0 deletions .github/workflows/push-to-ms.yml
@@ -0,0 +1,50 @@
name: Push demo branch to ModelScope

on:
  workflow_call:
    inputs:
      ref:
        required: false
        default: demo
        type: string
    secrets:
      MS_TOKEN:
        required: true

jobs:
  push-ms:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          ref: ${{ inputs.ref }}
          token: ${{ secrets.GITHUB_TOKEN }}

      - name: Configure Git identity
        run: |
          git config --global user.email "actions@github.com"
          git config --global user.name "github-actions[bot]"

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          # ModelScope official SDK (optional, install only if you need to call the platform API)
          pip install modelscope

      - name: Push to ModelScope
        env:
          MS_TOKEN: ${{ secrets.MS_TOKEN }}
          MS_REPO_TYPE: studios
          MS_REPO_ID: chenzihong/GraphGen
        run: |
          [[ -d ms-repo ]] && rm -rf ms-repo
          git clone https://oauth2:${MS_TOKEN}@www.modelscope.cn/${MS_REPO_TYPE}/${MS_REPO_ID}.git ms-repo

          rsync -a --delete --exclude='.git' --exclude='ms-repo' --exclude='README.md' ./ ms-repo/

          cd ms-repo
          git add .
          git diff-index --quiet HEAD || \
            (git commit -m "Auto-sync from ${{ inputs.ref }} at $(date -u)" && \
             git push "https://oauth2:${MS_TOKEN}@www.modelscope.cn/${MS_REPO_TYPE}/${MS_REPO_ID}.git")
2 changes: 1 addition & 1 deletion .github/workflows/pylint.yml
@@ -13,7 +13,7 @@ jobs:
     runs-on: ubuntu-latest
     strategy:
       matrix:
-        python-version: ["3.10", "3.11"]
+        python-version: ["3.10", "3.11", "3.12"]

     steps:
     - uses: actions/checkout@v4
File renamed without changes.
96 changes: 96 additions & 0 deletions .github/workflows/sync-demo.yml
@@ -0,0 +1,96 @@
name: Sync Demo Branch

on:
  push:
    branches:
      - main
  workflow_dispatch:

jobs:
  sync-demo:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout main branch
        uses: actions/checkout@v4
        with:
          ref: main
          token: ${{ secrets.GITHUB_TOKEN }}

      - name: Create demo branch if it doesn't exist
        run: |
          if ! git ls-remote --heads origin demo | grep -q demo; then
            echo "Creating demo branch..."
            git checkout -b demo
            git push origin demo
          else
            echo "Demo branch already exists"
          fi

      - name: Checkout demo branch
        uses: actions/checkout@v4
        with:
          ref: demo
          token: ${{ secrets.GITHUB_TOKEN }}
          path: demo

      - name: Clean demo directory
        run: |
          cd demo
          find . -mindepth 1 -path './.git' -prune -o -exec rm -rf {} + 2>/dev/null || true

      - name: Copy files using config
        run: |
          yq eval '.sync[] | .source + ":" + .dest' .github/sync-config.yml | while IFS=: read -r src dst; do
            src=$(echo "$src" | xargs)
            dst=$(echo "$dst" | xargs)

            [ -z "$src" ] && continue

            if [ -e "$src" ]; then
              target_path="demo/$dst"

              # Handle directories (dest ends with / or source is itself a directory)
              if [[ "$dst" == */ ]] || [ -d "$src" ]; then
                mkdir -p "$target_path"
                # Copy the directory's contents, not the directory itself
                cp -r "$src"/* "$target_path"
                echo "Copied $src/* → $target_path"
              else
                mkdir -p "$(dirname "$target_path")"
                cp "$src" "$target_path"
                echo "Copied $src → $target_path"
              fi
            else
              echo "Source not found: $src"
            fi
          done

      - name: Commit and push changes
        run: |
          cd demo
          git config --global user.email "actions@github.com"
          git config --global user.name "github-actions[bot]"

          # Check whether there are any changes
          if [[ -n $(git status --porcelain) ]]; then
            git add .
            git commit -m "Auto-sync demo branch with main branch"
            git push origin demo
            echo "Changes pushed to demo branch"
          else
            echo "No changes to sync"
          fi

  push-hf:
    needs: sync-demo
    uses: ./.github/workflows/push-to-hf.yml
    secrets:
      HF_TOKEN: ${{ secrets.HF_TOKEN }}
  push-ms:
    needs: sync-demo
    uses: ./.github/workflows/push-to-ms.yml
    secrets:
      MS_TOKEN: ${{ secrets.MS_TOKEN }}
    with:
      ref: demo
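
Reviewer note: the "Copy files using config" step applies each `source`/`dest` pair from `.github/sync-config.yml` to the `demo` checkout. The sketch below restates that rule in Python so it can be tried locally. It is a simplified illustration, not the workflow's implementation: the function name `sync_entries` is hypothetical, it branches only on whether the source is a directory, and `shutil.copytree` also copies hidden files, which `cp -r "$src"/*` does not.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Dict, List


def sync_entries(entries: List[Dict[str, str]], src_root: Path, demo_root: Path) -> List[str]:
    """Copy each source/dest pair into demo_root; return the sources that were copied."""
    copied = []
    for entry in entries:
        src = src_root / entry["source"]
        dst = demo_root / entry["dest"]
        if not src.exists():
            # Mirrors the workflow's "Source not found" branch: report and move on.
            print(f"Source not found: {entry['source']}")
            continue
        if src.is_dir():
            # Copy the directory's contents into the target directory.
            shutil.copytree(src, dst, dirs_exist_ok=True)  # requires Python 3.8+
        else:
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy(src, dst)
        copied.append(entry["source"])
    return copied


# Small self-contained demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    root, demo = Path(tmp) / "main", Path(tmp) / "demo"
    (root / "webui").mkdir(parents=True)
    (root / "webui" / "app.py").write_text("print('demo')\n")
    demo.mkdir()
    entries = [
        {"source": "webui/", "dest": "webui/"},
        {"source": "webui/app.py", "dest": "app.py"},
        {"source": "LICENSE", "dest": "LICENSE"},  # intentionally missing
    ]
    copied = sync_entries(entries, root, demo)
    result = sorted(p.relative_to(demo).as_posix() for p in demo.rglob("*") if p.is_file())
    print(copied, result)
```

Note that `webui/app.py` is listed twice in the config on purpose: once as part of the `webui/` directory and once mapped to a top-level `app.py` in the demo branch.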