Description
The Docker API server lags behind the Python library. This issue tracks adding endpoints/parameters to expose the following library features:
1. Adaptive crawling
- AdaptiveCrawler, AdaptiveConfig, CrawlState, CrawlStrategy, StatisticalStrategy
- Missing: endpoints to run/tune adaptive crawls
2. C4A Script language
- c4a_compile, c4a_validate, c4a_compile_file, CompilationResult, ValidationResult, ErrorDetail
- Missing: submit/validate/execute script endpoints
3. URL seeding
- AsyncUrlSeeder, SeedingConfig
- Missing: sitemap, Common Crawl, and URL-discovery endpoints
4. Chunking
- ChunkingStrategy, RegexChunking
- Missing: request parameters to select and configure a chunking strategy
5. Browser adapters
- BrowserAdapter, PlaywrightAdapter, UndetectedAdapter
- Missing: adapter/stealth selection
6. Proxy rotation
- ProxyRotationStrategy, RoundRobinProxyStrategy
- Missing: rotation strategy selection (beyond a single raw proxy setting)
7. Dispatchers
- SemaphoreDispatcher, BaseDispatcher
- Missing: dispatcher selection (the server only uses MemoryAdaptiveDispatcher internally)
8. Link preview
- LinkPreview, LinkPreviewConfig
- Missing: link preview/scoring endpoint
9. Profiling/monitoring
- BrowserProfiler, CrawlerMonitor
- Missing: profiling/monitoring endpoints
10. HTTP-only crawling
- HTTPCrawlerConfig
- Missing: methods/parameters for non-browser HTTP crawling. The API currently always crawls through a browser, using LXMLWebScrapingStrategy
11. Virtual scroll
- VirtualScrollConfig
- Missing: infinite-scroll capture configuration
12. Undetected/stealth browser
- UndetectedAdapter; browser_config/browser_type='undetected'; stealth options
- Missing: explicit stealth mode controls
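To make the gap concrete, here is a minimal sketch of what request models for three of the missing features could look like in schemas.py. It uses standard-library dataclasses instead of the server's Pydantic models, and every class name, field name, default, and the validate() helper is hypothetical, only loosely modeled on the library's AdaptiveConfig, c4a_compile/c4a_validate, and SeedingConfig. It is not the server's actual API.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical request models for proposed endpoints.
# Names, fields, and defaults are illustrative assumptions, not the crawl4ai Docker API.


@dataclass
class AdaptiveCrawlRequest:
    """Body for a proposed POST /adaptive/crawl."""
    start_url: str
    query: str
    confidence_threshold: float = 0.7  # assumed default: stop once coverage is "good enough"
    max_pages: int = 20                # assumed hard cap on pages visited

    def validate(self) -> None:
        if not self.start_url.startswith(("http://", "https://")):
            raise ValueError("start_url must be an http(s) URL")
        if not self.query.strip():
            raise ValueError("query must not be empty")
        if not 0.0 < self.confidence_threshold <= 1.0:
            raise ValueError("confidence_threshold must be in (0, 1]")
        if self.max_pages < 1:
            raise ValueError("max_pages must be >= 1")


@dataclass
class C4AScriptRequest:
    """Body for a proposed POST /c4a-script/compile; mode picks compile vs. validate-only."""
    script: str
    mode: str = "compile"  # assumed values: "compile" or "validate"

    def validate(self) -> None:
        if self.mode not in ("compile", "validate"):
            raise ValueError("mode must be 'compile' or 'validate'")
        if not self.script.strip():
            raise ValueError("script must not be empty")


@dataclass
class SeedUrlsRequest:
    """Body for a proposed URL-seeding endpoint (sitemap and/or Common Crawl)."""
    domain: str
    source: str = "sitemap"  # assumed values: "sitemap", "cc", "sitemap+cc"
    max_urls: Optional[int] = None
    pattern: str = "*"

    def validate(self) -> None:
        if self.source not in ("sitemap", "cc", "sitemap+cc"):
            raise ValueError("source must be 'sitemap', 'cc', or 'sitemap+cc'")
        if "://" in self.domain:
            raise ValueError("domain should be a bare host name, e.g. 'example.com'")
        if self.max_urls is not None and self.max_urls < 1:
            raise ValueError("max_urls must be >= 1 when set")


if __name__ == "__main__":
    # A valid adaptive-crawl request passes silently.
    AdaptiveCrawlRequest(start_url="https://example.com", query="pricing").validate()
    # A seeding request with a full URL instead of a host name is rejected.
    try:
        SeedUrlsRequest(domain="https://example.com").validate()
    except ValueError as exc:
        print(f"rejected: {exc}")
```

Keeping validation next to each model means the same rules can back both the endpoint's 4xx responses and the "validation" e2e tests requested below.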
Acceptance criteria
1. New/extended endpoints and/or request schemas added
- New endpoints: Add missing API routes (e.g., /adaptive/crawl, /deep-crawl, /c4a-script/compile, /hub/crawlers)
- Extended schemas: Enhance existing endpoints to accept new parameters (e.g., add virtual_scroll_config to /crawl, add table_extraction_strategy options)
- Request schemas: Update schemas.py to include new request models for the missing features
2. Docs and examples updated
- API documentation: Update the docs to show new endpoints and parameters
- Parameter documentation: Add descriptions, examples, and validation rules for new fields
- Examples: Add working code examples showing how to use each new feature.
3. Minimal e2e tests per feature group
- Test coverage: Create integration tests that verify each new feature works end-to-end
- Happy path: Test successful usage of each feature
- Validation: Test error handling (invalid parameters, edge cases, etc.)
- Feature groups: Organize tests by category (adaptive crawling, deep crawling, C4A scripts, etc.)
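As an illustration of the "extended schemas" and "minimal e2e tests" criteria, the sketch below builds a /crawl request body carrying a proposed virtual_scroll_config, then groups happy-path and validation checks for that feature in one unittest class. It assumes the {"type": ..., "params": ...} wrapping the server uses for serialized config objects also applies to the nested scroll config, and that the scroll fields (container_selector, scroll_count, scroll_by) mirror the library's VirtualScrollConfig. build_crawl_payload is a hypothetical helper. A real e2e test would POST this body to a running server instead of only checking its shape.

```python
import json
import unittest
from typing import Any, Dict, List, Optional


def build_crawl_payload(
    urls: List[str], virtual_scroll: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Hypothetical helper: build a /crawl body, optionally with a virtual_scroll_config."""
    if not urls:
        raise ValueError("urls must not be empty")
    crawler_params: Dict[str, Any] = {"cache_mode": "bypass"}
    if virtual_scroll is not None:
        if not virtual_scroll.get("container_selector"):
            raise ValueError("virtual_scroll.container_selector is required")
        if virtual_scroll.get("scroll_count", 10) < 1:
            raise ValueError("virtual_scroll.scroll_count must be >= 1")
        # Assumption: nested config objects use the same {"type", "params"} wrapping.
        crawler_params["virtual_scroll_config"] = {
            "type": "VirtualScrollConfig",
            "params": dict(virtual_scroll),
        }
    return {
        "urls": urls,
        "browser_config": {"type": "BrowserConfig", "params": {"headless": True}},
        "crawler_config": {"type": "CrawlerRunConfig", "params": crawler_params},
    }


class VirtualScrollTests(unittest.TestCase):
    """One test class per feature group: a happy path plus validation cases."""

    def test_happy_path_serializes(self) -> None:
        body = build_crawl_payload(
            ["https://example.com/feed"],
            {"container_selector": "#feed", "scroll_count": 5, "scroll_by": "container_height"},
        )
        vs = body["crawler_config"]["params"]["virtual_scroll_config"]
        self.assertEqual(vs["type"], "VirtualScrollConfig")
        self.assertEqual(vs["params"]["scroll_count"], 5)
        json.dumps(body)  # the request body must be JSON-serializable

    def test_rejects_missing_selector(self) -> None:
        with self.assertRaises(ValueError):
            build_crawl_payload(["https://example.com"], {"scroll_count": 3})

    def test_rejects_empty_urls(self) -> None:
        with self.assertRaises(ValueError):
            build_crawl_payload([])
```

Run it with `python -m unittest <module>`. Other feature groups (adaptive crawling, C4A scripts, URL seeding, and so on) would each get their own test class following the same happy-path/validation split.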