feat(agent): add GraphNet Agent with single-model and multi-GPU parallel extraction #704
… docs
- Remove tests/ directory (broken test cases referencing non-existent methods)
- Fix concurrent output dir collision: remove time-based Strategy 3 in SubprocessGraphExtractor._find_output_dir_robust to prevent workers from grabbing each other's output directories
- Fix generated code missing import torchvision for resnet/vgg/densenet; then remove torchvision entirely (all models now go through AutoConfig)
- Cap input sequence length to 128 and image size to 512 in ConfigMetadataAnalyzer to prevent OOM from large max_position_embeddings
- Remove hardcoded paths (/work/graphnet_workspace, GPU list [2,3,4,5], python3.12 nvidia path, /root/.comate path injection); workspace now resolves from GRAPH_NET_EXTRACT_WORKSPACE or ~/graphnet_workspace, GPUs auto-detected via CUDA_VISIBLE_DEVICES / nvidia-smi, nvidia lib path via sysconfig, PATH injection derived from found binary
- Fix broken import of deleted tests module in parallel_extract.py; inline load_models_from_file / get_models_from_hf / HUGGINGFACE_HUB_AVAILABLE
- Update README: remove torchvision dep, remove tests section, add parallel_extract.py detailed docs, LLM retry section, OOM limits section

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
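The workspace and GPU resolution order described in this commit message can be sketched as follows. This is a minimal illustration, not the PR's code: the helper names `resolve_workspace` and `visible_gpus` are hypothetical, and the `nvidia-smi` fallback is left to the caller.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional

WORKSPACE_ENV = "GRAPH_NET_EXTRACT_WORKSPACE"


def resolve_workspace(explicit: Optional[str] = None,
                      env: Optional[Mapping[str, str]] = None) -> Path:
    """Explicit argument wins, then the env var, then ~/graphnet_workspace."""
    env = os.environ if env is None else env
    chosen = explicit or env.get(WORKSPACE_ENV) or "~/graphnet_workspace"
    return Path(chosen).expanduser()


def visible_gpus(env: Optional[Mapping[str, str]] = None) -> Optional[List[int]]:
    """Parse CUDA_VISIBLE_DEVICES into integer indices.

    Returns None when the variable is unset, so the caller can fall back to
    another detection method (the commit message mentions nvidia-smi).
    Non-numeric entries such as GPU UUIDs are skipped in this sketch.
    """
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return None
    return [int(tok) for tok in raw.split(",") if tok.strip().isdigit()]
```

Passing `env` explicitly keeps both helpers easy to test without touching the real process environment.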
Thanks for your contribution!
@@ -222,4 +208,8 @@ def _find_hash_named_dir(self, workspace_path: Path) -> Optional[Path]:
def _is_valid_sample_dir(self, dir_path: Path) -> bool:
    """

class ForwardVerifier(BaseSampleVerifier):
This should be named ModelRunnableVerifier; alternatively, you could directly use the ModelRunnablePredictor already implemented in graph_net.
"""Basic verifier that checks file existence and basic structure"""
"""Basic verifier that checks file existence and basic structure.

Supports both single-graph and multi-subgraph (subgraph_0/, subgraph_1/, …) layouts.
Doesn't this BasicSampleVerifier duplicate the functionality of the earlier _is_valid_sample_dir?
Args:
    timeout: seconds to wait for each forward-pass subprocess (default 5 min)
"""
self._basic = BasicSampleVerifier()
    help="Number of models to fetch from HuggingFace Hub (used when model-list is not specified; default 100)",
)
parser.add_argument(
    "--task",
So the task is indeed used inside the agent; it needs to be written into graph_net.json.
_print_summary(results)
print(f"\n[DONE] Total elapsed: {elapsed_total:.0f}s")

return 0 if results["success_rate"] > 0 else 1
Pull request overview
Adds a GraphNet Agent workflow for HuggingFace model extraction with config-only model loading, LLM-assisted retry, forward verification, and multi-GPU batch extraction.
Changes:
- Adds LLM retry support, forward-pass verification, and multi-subgraph verification support.
- Adds `parallel_extract.py` for multi-GPU batch extraction and updates Agent docs.
- Changes fetching/codegen behavior to avoid downloading weights and cap input sizes; removes existing agent tests.
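The LLM-assisted retry mentioned in the change list can be pictured as a small loop around script execution. The sketch below is only an illustration under stated assumptions: `run_with_llm_retry`, `run`, and `fix` are hypothetical names, the real fixer invokes the `ducc`/`claude` CLI, and the limit of 2 retries is taken from the PR description.

```python
from typing import Callable, Tuple

MAX_LLM_RETRIES = 2  # "up to 2 retries" per the PR description


def run_with_llm_retry(
    script: str,
    run: Callable[[str], Tuple[bool, str]],
    fix: Callable[[str, str], str],
    max_retries: int = MAX_LLM_RETRIES,
) -> Tuple[bool, str, int]:
    """Run `script`; on failure ask `fix` for a repaired script and try again.

    `run` returns (succeeded, error_text); `fix` maps (script, error_text)
    to a new script. Returns (succeeded, final_script, fix_attempts).
    """
    ok, error = run(script)
    attempts = 0
    while not ok and attempts < max_retries:
        script = fix(script, error)  # hypothetical LLM repair step
        attempts += 1
        ok, error = run(script)
    return ok, script, attempts
```

Injecting `run` and `fix` as callables keeps the retry policy separate from the subprocess and CLI details, which makes the policy testable without a GPU or an LLM.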
Reviewed changes
Copilot reviewed 20 out of 21 changed files in this pull request and generated 24 comments.
| File | Description |
|---|---|
| `graph_net/agent/graph_net_agent.py` | Updates the core extraction pipeline with optional workspace defaults, LLM retry, forward verifier, and model-name JSON fixups. |
| `graph_net/agent/code_generator/template_generator.py` | Switches generated scripts to config-only random-weight loading and static graph extraction names. |
| `graph_net/agent/code_generator/llm_code_fixer.py` | Adds LLM-based script repair via ducc/claude. |
| `graph_net/agent/code_generator/__init__.py` | Exports the new LLM fixer. |
| `graph_net/agent/graph_extractor/subprocess_graph_extractor.py` | Changes subprocess execution, timeout cleanup, workspace handling, and output directory discovery. |
| `graph_net/agent/metadata_analyzer/config_metadata_analyzer.py` | Caps sequence length and image size during metadata-derived input generation. |
| `graph_net/agent/model_fetcher/huggingface_fetcher.py` | Adds retry behavior, endpoint support, and weight-file ignore patterns for downloads. |
| `graph_net/agent/sample_verifier/basic_sample_verifier.py` | Extends basic verification to multi-subgraph outputs. |
| `graph_net/agent/sample_verifier/forward_verifier.py` | Adds subprocess-based eager forward verification. |
| `graph_net/agent/sample_verifier/__init__.py` | Exports ForwardVerifier. |
| `graph_net/agent/parallel_extract.py` | Adds shared-queue multi-GPU batch extraction CLI. |
| `graph_net/agent/README.md` | Updates setup, usage, workflow, LLM retry, and parallel extraction documentation. |
| `graph_net/agent/agent_usage.md` | Adds a detailed usage guide for single and batch extraction. |
| `graph_net/agent/tests/__init__.py` | Removes the agent tests package marker. |
| `graph_net/agent/tests/test_utils.py` | Removes utility/workspace tests. |
| `graph_net/agent/tests/test_model_metadata.py` | Removes metadata validation tests. |
| `graph_net/agent/tests/test_integration.py` | Removes integration workflow tests. |
| `graph_net/agent/tests/test_code_generator.py` | Removes template code generator tests. |
| `graph_net/agent/tests/test_batch_success_rate.py` | Removes batch success-rate test script. |
| `graph_net/agent/tests/run_500_models_test.py` | Removes large batch test runner. |
Comments suppressed due to low confidence (4)
graph_net/agent/tests/test_integration.py:1
- The PR deletes the agent's existing unit and integration coverage while adding new extraction, retry, verifier, and parallel scheduling behavior. Please keep or replace these tests so the existing API contracts and new paths remain covered.
graph_net/agent/agent_usage.md:136
- This flow still says the final step archives the script, but `GraphNetAgent.extract_sample()` no longer calls an archive method after verification. Keeping this in the guide makes the documented pipeline disagree with the actual behavior.
⑥ Generate graph_hash.txt + deduplication check + verify output file integrity + archive the script
graph_net/agent/agent_usage.md:50
- This table repeats `/work/graphnet_workspace` as the default, but `GraphNetAgent` defaults to `~/graphnet_workspace` when no workspace is provided. Please align the documented default with the implementation.
| Parameter | Default | Description |
|------|--------|------|
| `workspace` | `/work/graphnet_workspace` | Working directory; the subdirectory structure is created automatically |
| `hf_token` | `None` | HF access token; not needed for public models |
graph_net/agent/agent_usage.md:190
- This success checklist still includes `run_model.py`, but the archive method and call were removed from `GraphNetAgent`. As written, users can see `extract_sample()` return `True` while this documented file is absent.
**Q: How do I check whether an extraction succeeded?**
`extract_sample()` returning `True` indicates success; you can also check that the output directory contains these 7 files:
`model.py`, `graph_net.json`, `input_meta.py`, `input_tensor_constraints.py`,
`weight_meta.py`, `graph_hash.txt`, `run_model.py`.
# Ensure GRAPH_NET_EXTRACT_WORKSPACE points to our workspace
if "GRAPH_NET_EXTRACT_WORKSPACE" not in env:
    env["GRAPH_NET_EXTRACT_WORKSPACE"] = str(self.workspace)
# Check if all workers are done
alive = [p for p in processes if p.is_alive()]
if not alive:
    break
# image_size may be an int or a [H, W] list
raw_size = config.get("image_size", 224)
if isinstance(raw_size, (list, tuple)):
    raw_size = raw_size[0]
image_size = min(int(raw_size), _MAX_IMAGE_SIZE)
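To make the capping behavior concrete, here is a self-contained sketch that applies both limits to a plain config dictionary. The caps of 128 and 512 come from the commit message; the function name `capped_input_sizes`, the `_MAX_SEQ_LEN` constant name, and the way the sequence length is derived from `max_position_embeddings` are assumptions for illustration only.

```python
from typing import Any, Mapping, Tuple

_MAX_SEQ_LEN = 128     # cap stated in the commit message
_MAX_IMAGE_SIZE = 512  # cap stated in the commit message


def capped_input_sizes(config: Mapping[str, Any]) -> Tuple[int, int]:
    """Return (sequence_length, image_size), each clamped to its cap."""
    # Assumption: sequence length is taken from max_position_embeddings.
    seq_len = min(
        int(config.get("max_position_embeddings", _MAX_SEQ_LEN)), _MAX_SEQ_LEN
    )
    # image_size may be an int or a [H, W] list; use the first entry.
    raw_size = config.get("image_size", 224)
    if isinstance(raw_size, (list, tuple)):
        raw_size = raw_size[0]
    return seq_len, min(int(raw_size), _MAX_IMAGE_SIZE)
```

For scale: with standard dense attention, a single attention-score matrix at 131072 tokens has 131072² = 2³⁴ entries per head, which is 64 GiB in float32, so clamping the dummy input length is what keeps extraction within GPU memory.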
return f'model = AutoModel.from_pretrained("{model_path}")'
return (
    f"from transformers import AutoConfig\n"
    f'_config = AutoConfig.from_pretrained("{model_path}", trust_remote_code=True)\n'
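The hunk above is cut off before the generated script actually builds the model. A minimal sketch of what a complete config-only loader snippet could look like is given below; the function name `config_only_load_snippet` and the `AutoModel.from_config` line are assumptions, not the PR's actual template, chosen because `from_config` instantiates a model with randomly initialized weights and downloads no checkpoint.

```python
def config_only_load_snippet(model_path: str) -> str:
    """Build source code that creates a model from its config only.

    The generated code fetches the config and instantiates the architecture
    with random weights, so no weight files are needed.
    """
    return (
        "from transformers import AutoConfig, AutoModel\n"
        # !r quotes and escapes the path so the generated line stays valid.
        f"_config = AutoConfig.from_pretrained({model_path!r}, trust_remote_code=True)\n"
        "model = AutoModel.from_config(_config, trust_remote_code=True)\n"
        "model.eval()\n"
    )
```

Using `!r` instead of wrapping the path in hand-written quotes avoids generating broken code when a path contains quote characters.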
subgraph_dirs = sorted(sample_dir.glob("subgraph_*/"))
targets = subgraph_dirs if subgraph_dirs else [sample_dir]
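The two lines above choose which directories to verify: every `subgraph_*` directory when present, otherwise the sample directory itself. A standalone sketch of the same idea follows; the name `verification_targets` is hypothetical, and the explicit `is_dir()` filter is an addition here because the handling of a trailing slash in `Path.glob` patterns has differed between Python versions.

```python
from pathlib import Path
from typing import List


def verification_targets(sample_dir: Path) -> List[Path]:
    """Return subgraph directories if any exist, else the sample dir itself."""
    # Filter explicitly so stray files named subgraph_* are never treated
    # as subgraph directories.
    subgraph_dirs = sorted(p for p in sample_dir.glob("subgraph_*") if p.is_dir())
    return subgraph_dirs if subgraph_dirs else [sample_dir]
```

Note that `sorted` orders paths lexicographically, so `subgraph_10` sorts before `subgraph_2`; that is harmless when each directory is verified independently.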
# Set the proxy (required to access HuggingFace)
export http_proxy=http://agent.baidu.com:8891
export https_proxy=http://agent.baidu.com:8891

# The LLM fallback feature requires the ducc CLI (optional)
export PATH="/root/.comate/baidu-cc/bin:$PATH"
# Candidate binary names / paths to search for ducc CLI
_DUCC_CANDIDATES = [
    "ducc",
    "claude",
    "/root/.comate/baidu-cc/bin/ducc",
    "/usr/local/bin/ducc",
    os.path.expanduser("~/.local/bin/ducc"),
]
def __init__(
    self,
    workspace: str,
    workspace: Optional[str] = None,
    hf_token: Optional[str] = None,
    llm_retry: bool = True,
):
_GRAPHNET_ROOT = _SCRIPT_DIR.parent.parent  # GraphNet/
if str(_GRAPHNET_ROOT) not in sys.path:
    sys.path.insert(0, str(_GRAPHNET_ROOT))
…_extract to English
- Remove baidu proxy settings and baidu-cc PATH from agent_usage.md
- Remove /root/.comate/baidu-cc/bin/ducc hardcoded path from llm_code_fixer.py
- Translate all Chinese comments/docstrings/help text in parallel_extract.py to English
- Remove _setup_nvidia_ld_library_path and unused sysconfig import from parallel_extract.py

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
luotao1 left a comment
parallel_extract_20260418_040238.json can be deleted.
# Directory
Just run from the GraphNet directory; no installation is required.

| ``` |
PR Category
Feature Enhancement
Description
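Adds the GraphNet Agent module, which implements a fully automated extraction pipeline from a HuggingFace model ID to a GraphNet Sample, with support for single-model extraction and multi-GPU parallel batch extraction.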
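- … `ducc`/`claude -p` repairs the script, with up to 2 retries
- `parallel_extract.py` schedules work dynamically from a shared task queue, which balances load naturally, and supports fetching the model list in bulk from a file or from HuggingFace Hub
- … `max_position_embeddings` (which can reach 131072) directly causing OOM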
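The shared-task-queue scheduling described above can be illustrated with a small sketch. To keep the example self-contained, threads stand in for the per-GPU worker processes that the PR uses, and `run_shared_queue` and `extract` are hypothetical names; the point is only that each worker pulls the next model when it becomes free, so faster workers naturally take on more tasks.

```python
import queue
import threading
from typing import Callable, Dict, List


def run_shared_queue(models: List[str], gpu_ids: List[int],
                     extract: Callable[[str, int], bool]) -> Dict[str, bool]:
    """Process `models` with one worker per GPU id, all pulling from one queue."""
    tasks: "queue.Queue[str]" = queue.Queue()
    for name in models:
        tasks.put(name)
    results: Dict[str, bool] = {}
    lock = threading.Lock()

    def worker(gpu_id: int) -> None:
        while True:
            try:
                name = tasks.get_nowait()
            except queue.Empty:
                return  # queue drained: this worker is done
            ok = extract(name, gpu_id)  # hypothetical per-model extraction
            with lock:
                results[name] = ok

    workers = [threading.Thread(target=worker, args=(g,)) for g in gpu_ids]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results
```

Compared with splitting the model list into fixed per-GPU chunks up front, a single queue avoids the case where one GPU sits idle while another is still working through a chunk of slow models.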