diff --git a/README.md b/README.md
index 877172e23..d10987a2a 100644
--- a/README.md
+++ b/README.md
@@ -15,46 +15,53 @@ OpenEvolve implements a comprehensive evolutionary coding system with:
 - **Evolutionary Coding Agent**: LLM-guided evolution of entire code files (not just functions)
 - **Distributed Controller Loop**: Asynchronous pipeline coordinating LLMs, evaluators, and databases
 - **Program Database**: Storage and sampling of evolved programs with evaluation metrics
-- **Prompt Sampling**: Context-rich prompts with past programs, scores, and problem descriptions 
+- **Prompt Sampling**: Context-rich prompts with past programs, scores, and problem descriptions
 - **LLM Ensemble**: Multiple language models working together for code generation
 - **Multi-objective Optimization**: Simultaneous optimization of multiple evaluation metrics
 - **Checkpoint System**: Automatic saving and resuming of evolution state
 
 #### 🔬 **Scientific Reproducibility**
+
 - **Comprehensive Seeding**: Full deterministic reproduction with hash-based component isolation
 - **Default Reproducibility**: Seed=42 by default for immediate reproducible results
 - **Granular Control**: Per-component seeding for LLMs, database, and evaluation pipeline
 
-#### 🤖 **Advanced LLM Integration** 
+#### 🤖 **Advanced LLM Integration**
+
 - **Ensemble Sophistication**: Weighted model combinations with intelligent fallback strategies
 - **Test-Time Compute**: Integration with [optillm](https://github.com/codelion/optillm) for Mixture of Agents (MoA) and enhanced reasoning
 - **Universal API Support**: Works with any OpenAI-compatible endpoint (Anthropic, Google, local models)
 - **Plugin Ecosystem**: Support for optillm plugins (readurls, executecode, z3_solver, etc.)
 
 #### 🧬 **Evolution Algorithm Innovations**
-- **MAP-Elites Implementation**: Quality-diversity algorithm for balanced exploration/exploitation 
+
+- **MAP-Elites Implementation**: Quality-diversity algorithm for balanced exploration/exploitation
 - **Island-Based Evolution**: Multiple populations with periodic migration for diversity maintenance
 - **Inspiration vs Performance**: Sophisticated prompt engineering separating top performers from diverse inspirations
 - **Multi-Strategy Selection**: Elite, diverse, and exploratory program sampling strategies
 
 #### 📊 **Evaluation & Feedback Systems**
+
 - **Artifacts Side-Channel**: Capture build errors, profiling data, and execution feedback for LLM improvement
 - **Cascade Evaluation**: Multi-stage testing with progressive complexity for efficient resource usage
 - **LLM-Based Feedback**: Automated code quality assessment and reasoning capture
 - **Comprehensive Error Handling**: Graceful recovery from evaluation failures with detailed diagnostics
 
 #### 🌐 **Multi-Language & Platform Support**
+
 - **Language Agnostic**: Python, Rust, R, Metal shaders, and more
 - **Platform Optimization**: Apple Silicon GPU kernels, CUDA optimization, CPU-specific tuning
 - **Framework Integration**: MLX, PyTorch, scientific computing libraries
 
 #### 🔧 **Developer Experience & Tooling**
+
 - **Real-Time Visualization**: Interactive web-based evolution tree viewer with performance analytics
 - **Advanced CLI**: Rich command-line interface with checkpoint management and configuration override
 - **Comprehensive Examples**: 12+ diverse examples spanning optimization, ML, systems programming, and scientific computing
 - **Error Recovery**: Robust checkpoint loading with automatic fix for common serialization issues
 
 #### 🚀 **Performance & Scalability**
+
 - **Threaded Parallelism**: High-throughput asynchronous evaluation pipeline
 - **Resource Management**: Memory limits, timeouts, and resource monitoring
 - **Efficient Storage**: Optimized database with artifact management and cleanup policies
@@ -68,17 +75,20 @@ OpenEvolve orchestrates a sophisticated evolutionary pipeline:
 ### Core Evolution Loop
 
 1. **Enhanced Prompt Sampler**: Creates rich prompts containing:
-   - Top-performing programs (for optimization guidance) 
+
+   - Top-performing programs (for optimization guidance)
    - Diverse inspiration programs (for creative exploration)
    - Execution artifacts and error feedback
    - Dynamic documentation fetching (via optillm plugins)
 
-2. **Intelligent LLM Ensemble**: 
+2. **Intelligent LLM Ensemble**:
+
    - Weighted model combinations for quality/speed tradeoffs
    - Test-time compute techniques (MoA, chain-of-thought, reflection)
    - Deterministic selection with comprehensive seeding
 
 3. **Advanced Evaluator Pool**:
+
    - Multi-stage cascade evaluation
    - Artifact collection for detailed feedback
    - LLM-based code quality assessment
@@ -95,6 +105,7 @@ OpenEvolve orchestrates a sophisticated evolutionary pipeline:
 ### Installation
 
 To install natively, use:
+
 ```bash
 git clone https://github.com/codelion/openevolve.git
 cd openevolve
@@ -108,18 +119,18 @@ pip install -e .
 OpenEvolve uses the OpenAI SDK, which means it works with any LLM provider that supports an OpenAI-compatible API:
 
 1. **Set the API Key**: Export the `OPENAI_API_KEY` environment variable:
+
    ```bash
    export OPENAI_API_KEY=your-api-key-here
    ```
 
-2. **Using Alternative LLM Providers**: 
+2. **Using Alternative LLM Providers**:
   - For providers other than OpenAI (e.g., Anthropic, Cohere, local models), update the `api_base` in your config.yaml:
     ```yaml
    llm:
      api_base: "https://your-provider-endpoint.com/v1"
     ```
-
-3. **Maximum Flexibility with optillm**: 
+3. **Maximum Flexibility with optillm**:
   - For advanced routing, rate limiting, or using multiple providers, we recommend [optillm](https://github.com/codelion/optillm)
   - optillm acts as a proxy that can route requests to different LLMs based on your rules
   - Simply point `api_base` to your optillm instance:
@@ -140,7 +151,7 @@ if not os.environ.get("OPENAI_API_KEY"):
 # Initialize the system
 evolve = OpenEvolve(
-    initial_program_path="path/to/initial_program.py",
+    initial_programs_paths=["path/to/initial_program.py"],
     evaluation_file="path/to/evaluator.py",
     config_path="path/to/config.yaml"
 )
@@ -172,6 +183,7 @@ python openevolve-run.py path/to/initial_program.py path/to/evaluator.py \
 ```
 
 When resuming from a checkpoint:
+
 - The system loads all previously evolved programs and their metrics
 - Checkpoint numbering continues from where it left off (e.g., if loaded from checkpoint_50, the next checkpoint will be checkpoint_60)
 - All evolution state is preserved (best programs, feature maps, archives, etc.)
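+
+Multiple initial programs can seed a single run; each seed is placed on its own island, so pass at most `database.num_islands` paths. A minimal sketch of the updated Python API (the variant file names are hypothetical):
+
+```python
+import asyncio
+
+from openevolve import OpenEvolve
+
+# Two hypothetical seed programs; each starts on its own island
+evolve = OpenEvolve(
+    initial_programs_paths=["variants/greedy.py", "variants/annealing.py"],
+    evaluation_file="path/to/evaluator.py",
+    config_path="path/to/config.yaml",
+)
+
+best_program = asyncio.run(evolve.run(iterations=1000))
+print(best_program.metrics)
+```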
@@ -234,6 +246,7 @@ python scripts/visualizer.py --path examples/function_minimization/openevolve_ou
 ```
 
 In the visualization UI, you can
+
 - see the branching of your program evolution in a network visualization, with node radius chosen by the program fitness (the currently selected metric),
 - see the parent-child relationship of nodes and click through them in the sidebar (use the yellow locator icon in the sidebar to center the node in the graph),
 - select the metric of interest (with the available metric choices depending on your data set),
@@ -246,6 +259,7 @@ In the visualization UI, you can
 ### Docker
 
 You can also install and execute via Docker:
+
 ```bash
 docker build -t openevolve .
 docker run --rm -v $(pwd):/app --network="host" openevolve examples/function_minimization/initial_program.py examples/function_minimization/evaluator.py --config examples/function_minimization/config.yaml --iterations 1000
@@ -258,38 +272,39 @@ OpenEvolve is highly configurable with advanced options:
 ```yaml
 # Example configuration showcasing advanced features
 max_iterations: 1000
-random_seed: 42  # Full reproducibility by default
+random_seed: 42 # Full reproducibility by default
 
 llm:
   # Advanced ensemble configuration
   models:
     - name: "gemini-2.0-flash-lite"
       weight: 0.7
-    - name: "moa&readurls-gemini-2.0-flash"  # optillm test-time compute
+    - name: "moa&readurls-gemini-2.0-flash" # optillm test-time compute
       weight: 0.3
   temperature: 0.7
-
+
 database:
   # MAP-Elites configuration
   population_size: 500
-  num_islands: 5  # Island-based evolution
+  num_islands: 5 # Island-based evolution
   migration_interval: 20
-  feature_dimensions: ["score", "complexity"]  # Quality-diversity features
-
+  feature_dimensions: ["score", "complexity"] # Quality-diversity features
+
 evaluator:
   # Advanced evaluation features
-  enable_artifacts: true  # Capture execution feedback
-  cascade_evaluation: true  # Multi-stage testing
-  use_llm_feedback: true  # AI-based code quality assessment
-
+  enable_artifacts: true # Capture execution feedback
+  cascade_evaluation: true # Multi-stage testing
+  use_llm_feedback: true # AI-based code quality assessment
+
 prompt:
   # Sophisticated prompt engineering
-  num_top_programs: 3  # Performance examples
-  num_diverse_programs: 2  # Creative inspiration
-  include_artifacts: true  # Execution feedback
+  num_top_programs: 3 # Performance examples
+  num_diverse_programs: 2 # Creative inspiration
+  include_artifacts: true # Execution feedback
 ```
 
 Sample configuration files are available in the `configs/` directory:
+
 - `default_config.yaml`: Comprehensive configuration with all available options
 - `island_config_example.yaml`: Advanced island-based evolution setup
@@ -317,18 +332,23 @@ return EvaluationResult(
 ```
 
 The next generation prompt will include:
+
 ```markdown
 ## Last Execution Output
+
 ### Stderr
+
 SyntaxError: invalid syntax (line 15)
 
 ### Traceback
+
 ...
 ```
 
 ## Example: LLM Feedback
 
-An example for an LLM artifact side channel is part of the default evaluation template, which ends with
+An example of an LLM artifact side channel is part of the default evaluation template, which ends with
+
 ```markdown
 Return your evaluation as a JSON object with the following format:
 {{
@@ -338,6 +358,7 @@ Return your evaluation as a JSON object with the following format:
     "reasoning": "[brief explanation of scores]"
 }}
 ```
+
-The non-float values, in this case the "reasoning" key of the json response that the evaluator LLM generates, will be available within the next generation prompt.
+Non-float values, in this case the "reasoning" key of the JSON response generated by the evaluator LLM, are made available in the next generation prompt.
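+
+Putting the two channels together: float metrics drive selection, while non-float artifacts are echoed back to the LLM. Below is a sketch of an evaluator that returns both, assuming `EvaluationResult` is importable as in the artifacts example above; `run_benchmark` is a hypothetical helper:
+
+```python
+from openevolve.evaluation_result import EvaluationResult
+
+def evaluate(program_path: str) -> EvaluationResult:
+    score = run_benchmark(program_path)  # hypothetical scoring helper
+    return EvaluationResult(
+        # Float metrics: used for selection and feature binning
+        metrics={"score": score},
+        # Non-float artifacts: surfaced in the next generation prompt
+        artifacts={"reasoning": "Vectorized the inner loop; memory use is still high."},
+    )
+```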
 
 ### Configuration
@@ -351,7 +372,7 @@ evaluator:
 prompt:
   include_artifacts: true
-  max_artifact_bytes: 4096  # 4KB limit in prompts
+  max_artifact_bytes: 4096 # 4KB limit in prompts
   artifact_security_filter: true
 ```
@@ -374,57 +395,76 @@ See the `examples/` directory for complete examples of using OpenEvolve on vario
 ### Mathematical Optimization
 
 #### [Function Minimization](examples/function_minimization/)
+
 A comprehensive example demonstrating evolution from random search to sophisticated simulated annealing.
 
 #### [Circle Packing](examples/circle_packing/)
+
 Our implementation of the circle packing problem. For the n=26 case, we achieve state-of-the-art results matching published benchmarks.
 
 Below is the optimal packing found by OpenEvolve after 800 iterations:
 
 ![circle-packing-result](https://github.com/user-attachments/assets/00100f9e-2ac3-445b-9266-0398b7174193)
 
 ### Advanced AI & LLM Integration
 
 #### [Web Scraper with optillm](examples/web_scraper_optillm/)
+
 Demonstrates integration with [optillm](https://github.com/codelion/optillm) for test-time compute optimization, including:
+
 - **readurls plugin**: Automatic documentation fetching
-- **Mixture of Agents (MoA)**: Multi-response synthesis for improved accuracy 
+- **Mixture of Agents (MoA)**: Multi-response synthesis for improved accuracy
 - **Local model optimization**: Enhanced reasoning with smaller models
 
 #### [LLM Prompt Optimization](examples/llm_prompt_optimazation/)
+
 Evolving prompts themselves for better LLM performance, demonstrating self-improving AI systems.
 
 ### Systems & Performance Optimization
 
 #### [MLX Metal Kernel Optimization](examples/mlx_metal_kernel_opt/)
+
 Automated discovery of custom GPU kernels for Apple Silicon, achieving:
+
 - **2-3x speedup** over baseline attention implementations
 - **Hardware-aware optimizations** for unified memory architecture
 - **Metal shader evolution** with numerical correctness validation
 
 #### [Rust Adaptive Sort](examples/rust_adaptive_sort/)
+
 Evolution of sorting algorithms that adapt to data patterns, showcasing OpenEvolve's language-agnostic capabilities.
 
 ### Scientific Computing & Discovery
 
 #### [Symbolic Regression](examples/symbolic_regression/)
+
 A comprehensive example demonstrating automated discovery of mathematical expressions from scientific datasets using the LLM-SRBench benchmark.
 
 #### [R Robust Regression](examples/r_robust_regression/)
+
 Developing robust regression methods resistant to outliers using R language support.
 
 #### [Signal Processing](examples/signal_processing/)
+
 Automated design of digital filters with superior performance characteristics.
 
 ### Web and Integration Examples
 
 #### [Online Judge Programming](examples/online_judge_programming/)
+
 Automated competitive programming solution generation with external evaluation systems.
 
 #### [LM-Eval Integration](examples/lm_eval/)
-Working with standard ML evaluation harnesses for automated benchmark improvement. 
-
+
+Working with standard ML evaluation harnesses for automated benchmark improvement.
 ## Preparing Your Own Problems
@@ -448,5 +488,3 @@ If you use OpenEvolve in your research, please cite:
   url = {https://github.com/codelion/openevolve}
 }
 ```
-
-
diff --git a/configs/README.md b/configs/README.md
index 6ce24383c..4fd43c4fa 100644
--- a/configs/README.md
+++ b/configs/README.md
@@ -5,7 +5,9 @@ This directory contains configuration files for OpenEvolve with examples for dif
 ## Configuration Files
 
 ### `default_config.yaml`
+
 The main configuration file containing all available options with sensible defaults. This file includes:
+
 - Complete documentation for all configuration parameters
 - Default values for all settings
 - **Island-based evolution parameters** for proper evolutionary diversity
@@ -13,15 +15,19 @@ The main configuration file containing all available options with sensible defau
 Use this file as a template for your own configurations.
 
 ### `island_config_example.yaml`
+
 A practical example configuration demonstrating proper island-based evolution setup. Shows:
+
 - Recommended island settings for most use cases
 - Balanced migration parameters
 - Complete working configuration
 
 ### `island_examples.yaml`
+
 Multiple example configurations for different scenarios:
+
 - **Maximum Diversity**: Many islands, frequent migration
-- **Focused Exploration**: Few islands, rare migration 
+- **Focused Exploration**: Few islands, rare migration
 - **Balanced Approach**: Default recommended settings
 - **Quick Exploration**: Small-scale rapid testing
 - **Large-Scale Evolution**: Complex optimization runs
@@ -34,9 +40,9 @@ The key new parameters for proper evolutionary diversity are:
 ```yaml
 database:
-  num_islands: 5  # Number of separate populations
-  migration_interval: 50  # Migrate every N generations
-  migration_rate: 0.1  # Fraction of top programs to migrate
+  num_islands: 5 # Number of separate populations
+  migration_interval: 50 # Migrate every N generations
+  migration_rate: 0.1 # Fraction of top programs to migrate
 ```
 
 ### Parameter Guidelines
@@ -66,8 +72,8 @@ Then use with OpenEvolve:
 ```python
 from openevolve import OpenEvolve
 evolve = OpenEvolve(
-    initial_program_path="program.py",
-    evaluation_file="evaluator.py", 
+    initial_programs_paths=["program.py"],
+    evaluation_file="evaluator.py",
     config_path="my_config.yaml"
 )
 ```
diff --git a/openevolve/cli.py b/openevolve/cli.py
index dd5d707dd..cb10daf80 100644
--- a/openevolve/cli.py
+++ b/openevolve/cli.py
@@ -19,12 +19,17 @@ def parse_args() -> argparse.Namespace:
     """Parse command-line arguments"""
     parser = argparse.ArgumentParser(description="OpenEvolve - Evolutionary coding agent")
 
-    parser.add_argument("initial_program", help="Path to the initial program file")
-
-    parser.add_argument(
-        "evaluation_file", help="Path to the evaluation file containing an 'evaluate' function"
-    )
+    # Keep the documented positional order: program paths first, then the evaluator.
+    # argparse resolves a leading nargs="+" positional followed by a single
+    # positional, so `prog1.py prog2.py evaluator.py` parses as expected.
+    parser.add_argument(
+        "initial_programs",
+        nargs="+",
+        help="Path(s) to one or more initial program files",
+    )
+    parser.add_argument(
+        "evaluation_file", help="Path to the evaluation file containing an 'evaluate' function"
+    )
 
     parser.add_argument("--config", "-c", help="Path to configuration file (YAML)", default=None)
     parser.add_argument("--output", "-o", help="Output directory for results", default=None)
@@ -69,11 +74,13 @@ async def main_async() -> int:
     """
     args = parse_args()
 
-    # Check if files exist
-    if not os.path.exists(args.initial_program):
-        print(f"Error: Initial program file '{args.initial_program}' not found")
-        return 1
+    # Check that all input files exist
+    for program in args.initial_programs:
+        if not os.path.isfile(program):
+            print(f"Error: Initial program file '{program}' not found")
+            return 1
+
     if not os.path.exists(args.evaluation_file):
         print(f"Error: Evaluation file '{args.evaluation_file}' not found")
         return 1
@@ -100,7 +107,7 @@ async def main_async() -> int:
     # Initialize OpenEvolve
     try:
         openevolve = OpenEvolve(
-            initial_program_path=args.initial_program,
+            initial_programs_paths=args.initial_programs,
             evaluation_file=args.evaluation_file,
             config=config,
             config_path=args.config if config is None else None,
diff --git a/openevolve/controller.py b/openevolve/controller.py
index 0c0ab4d29..18405d98f 100644
--- a/openevolve/controller.py
+++ b/openevolve/controller.py
@@ -72,7 +72,7 @@ class OpenEvolve:
     def __init__(
         self,
-        initial_program_path: str,
+        initial_programs_paths: List[str],
         evaluation_file: str,
         config_path: Optional[str] = None,
         config: Optional[Config] = None,
@@ -86,9 +86,15 @@ def __init__(
         # Load from file or use defaults
         self.config = load_config(config_path)
 
-        # Set up output directory
+        # Require a non-empty list of initial program paths
+        if not initial_programs_paths:
+            raise ValueError("initial_programs_paths must be a non-empty list of file paths")
+
+        # Set up output directory: use output_dir if specified, otherwise
+        # default to an "openevolve_output" directory next to the first
+        # initial program.
         self.output_dir = output_dir or os.path.join(
-            os.path.dirname(initial_program_path), "openevolve_output"
+            os.path.dirname(initial_programs_paths[0]), "openevolve_output"
         )
         os.makedirs(self.output_dir, exist_ok=True)
@@ -122,13 +128,15 @@ def __init__(
         logger.debug(f"Generated LLM seed: {llm_seed}")
 
-        # Load initial program
-        self.initial_program_path = initial_program_path
-        self.initial_program_code = self._load_initial_program()
+        # Load initial programs
+        self.initial_programs_paths = initial_programs_paths
+        self.initial_programs_code = self._load_initial_programs()
+
+        # Assume all initial programs are in the same language
         if not self.config.language:
-            self.config.language = extract_code_language(self.initial_program_code)
+            self.config.language = extract_code_language(self.initial_programs_code[0])
 
         # Extract file extension from initial program
-        self.file_extension = os.path.splitext(initial_program_path)[1]
+        self.file_extension = os.path.splitext(initial_programs_paths[0])[1]
         if not self.file_extension:
             # Default to .py if no extension found
             self.file_extension = ".py"
@@ -136,6 +144,15 @@ def __init__(
         # Make sure it starts with a dot
         if not self.file_extension.startswith("."):
             self.file_extension = f".{self.file_extension}"
+
+        # Check that all files have the same extension
+        for path in initial_programs_paths[1:]:
+            ext = os.path.splitext(path)[1]
+            if ext != self.file_extension:
+                raise ValueError(
+                    f"All initial program files must have the same extension. "
+                    f"Expected {self.file_extension}, but got {ext} for {path}"
+                )
 
     # Initialize components
     self.llm_ensemble = LLMEnsemble(self.config.llm.models)
@@ -160,8 +177,8 @@ def __init__(
         )
 
         self.evaluation_file = evaluation_file
-        logger.info(f"Initialized OpenEvolve with {initial_program_path}")
-
+        logger.info(f"Initialized OpenEvolve with {initial_programs_paths}")
+
         # Initialize improved parallel processing components
         self.parallel_controller = None
@@ -189,10 +206,13 @@ def _setup_logging(self) -> None:
         logger.info(f"Logging to {log_file}")
 
-    def _load_initial_program(self) -> str:
-        """Load the initial program from file"""
-        with open(self.initial_program_path, "r") as f:
-            return f.read()
+    def _load_initial_programs(self) -> List[str]:
+        """Load the initial programs from their files"""
+        programs: List[str] = []
+        for path in self.initial_programs_paths:
+            with open(path, "r") as f:
+                programs.append(f.read())
+        return programs
 
     async def run(
         self,
@@ -226,29 +246,34 @@ async def run(
         should_add_initial = (
             start_iteration == 0
             and len(self.database.programs) == 0
-            and not any(
-                p.code == self.initial_program_code for p in self.database.programs.values()
-            )
         )
 
         if should_add_initial:
-            logger.info("Adding initial program to database")
-            initial_program_id = str(uuid.uuid4())
+            logger.info("Adding initial programs to database")
 
-            # Evaluate the initial program
-            initial_metrics = await self.evaluator.evaluate_program(
-                self.initial_program_code, initial_program_id
-            )
+            if len(self.initial_programs_code) > len(self.database.islands):
+                raise ValueError(
+                    f"Number of initial programs ({len(self.initial_programs_code)}) "
+                    f"exceeds number of islands ({len(self.database.islands)})"
+                )
 
-            initial_program = Program(
-                id=initial_program_id,
-                code=self.initial_program_code,
-                language=self.config.language,
-                metrics=initial_metrics,
-                iteration_found=start_iteration,
-            )
+            for i, code in enumerate(self.initial_programs_code):
+                initial_program_id = str(uuid.uuid4())
+
+                # Evaluate the initial program
+                initial_metrics = await self.evaluator.evaluate_program(
+                    code, initial_program_id
+                )
+
+                initial_program = Program(
+                    id=initial_program_id,
+                    code=code,
+                    language=self.config.language,
+                    metrics=initial_metrics,
+                    iteration_found=start_iteration,
+                )
 
-            self.database.add(initial_program)
+                # Place the i-th initial program on island i
+                # TODO: Should the current island be incremented and reset here?
+                self.database.add(initial_program, 0, i)
         else:
             logger.info(
                 f"Skipping initial program addition (resuming from iteration {start_iteration} "
diff --git a/tests/test_checkpoint_resume.py b/tests/test_checkpoint_resume.py
index 08baaf956..c2db03b7c 100644
--- a/tests/test_checkpoint_resume.py
+++ b/tests/test_checkpoint_resume.py
@@ -86,7 +86,7 @@ async def run_test():
                 mock_evaluator_class.return_value = mock_evaluator
 
                 controller = OpenEvolve(
-                    initial_program_path=self.test_program_path,
+                    initial_programs_paths=[self.test_program_path],
                     evaluation_file=self.evaluator_path,
                     config=self.config,
                     output_dir=self.test_dir,
@@ -127,7 +127,7 @@ async def run_test():
                 mock_evaluator_class.return_value = mock_evaluator
 
                 controller = OpenEvolve(
-                    initial_program_path=self.test_program_path,
+                    initial_programs_paths=[self.test_program_path],
                     evaluation_file=self.evaluator_path,
                     config=self.config,
                     output_dir=self.test_dir,
@@ -169,7 +169,7 @@ async def run_test():
                 mock_evaluator_class.return_value = mock_evaluator
 
                 controller = OpenEvolve(
-                    initial_program_path=self.test_program_path,
+                    initial_programs_paths=[self.test_program_path],
                     evaluation_file=self.evaluator_path,
                     config=self.config,
                     output_dir=self.test_dir,
@@ -219,7 +219,7 @@ async def run_test():
                 mock_evaluator_class.return_value = mock_evaluator
 
                 controller = OpenEvolve(
-                    initial_program_path=self.test_program_path,
+                    initial_programs_paths=[self.test_program_path],
                     evaluation_file=self.evaluator_path,
                     config=self.config,
                     output_dir=self.test_dir,
@@ -269,7 +269,7 @@ async def run_test():
                 mock_evaluator_class.return_value = mock_evaluator
 
                 controller = OpenEvolve(
-                    initial_program_path=self.test_program_path,
+                    initial_programs_paths=[self.test_program_path],
                     evaluation_file=self.evaluator_path,
                     config=self.config,
                     output_dir=self.test_dir,
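With program paths listed before the evaluator, the documented single-program invocation keeps working and several seed programs can be passed at once. A sketch of both invocations (paths are illustrative):

```bash
# Single seed program (unchanged behavior)
python openevolve-run.py path/to/initial_program.py path/to/evaluator.py \
  --config path/to/config.yaml --iterations 1000

# Multiple seed programs, one per island (requires database.num_islands >= 3)
python openevolve-run.py variants/a.py variants/b.py variants/c.py path/to/evaluator.py \
  --config path/to/config.yaml --iterations 1000
```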