Commit 586c4f5

Merge branch 'main' into reflective-mouse-click
2 parents 3b9263f + 90dd056 commit 586c4f5

3 files changed, with 31 additions and 15 deletions

CONTRIBUTING.md

Lines changed: 19 additions & 2 deletions
@@ -1,8 +1,25 @@
-Contributing
-============
+# Contributing
+We appreciate your contributions!
 
+## Process
 1. Fork it
 2. Create your feature branch (`git checkout -b my-new-feature`)
 3. Commit your changes (`git commit -am 'Add some feature'`)
 4. Push to the branch (`git push origin my-new-feature`)
 5. Create new Pull Request
+
+## Contribution Ideas
+- **Remove necessity for `pip install .`**: I think by uploading packages to PyPi we can reduce the installation code steps by consolidating `pip install -r requirements.txt` and `pip install .`. If that's possible that'd be great.
+- **Improve performance by finding optimal screenshot grid**: A primary element of the framework is that it overlays a percentage grid on the screenshot which GPT-4v uses to estimate click locations. If someone is able to find the optimal grid and some evaluation metrics to confirm it is an improvement on the current method then we will merge that PR.
+- **Improve the `SUMMARY_PROMPT`**
+- **Create an evaluation system**
+- **Improve Linux and Windows compatibility**: There are still some issues with Linux and Windows compatibility. PRs to fix the issues are encouraged.
+- **Enabling New Mouse Capabilities**: (drag, hover, etc.)
+- **Adding New Multimodal Models**: Integration of new multimodal models is welcomed. If you have a specific model in mind that you believe would be a valuable addition, please feel free to integrate it and submit a PR.
+- **Framework Architecture Improvements**: Think you can enhance the framework architecture described in the intro? We welcome suggestions and PRs.
+
+## Guidelines
+This will primarily be a [Software 2.0](https://karpathy.medium.com/software-2-0-a64152b37c35) project. For this reason:
+
+- Let's try to hold off refactors into separate files until `main.py` is more than 1000 lines
+
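
The screenshot-grid idea in the contribution list above can be illustrated with a small sketch. It assumes the model answers with click positions as percentages of the screen width and height, and that candidate grids are compared by mean pixel error against known targets. The function names (`grid_lines`, `percent_to_pixels`, `mean_click_error`) and the default grid step are hypothetical and are not taken from `main.py`.

```python
import math


def grid_lines(width, height, step_percent=10):
    """Pixel positions of the vertical (xs) and horizontal (ys) grid lines.

    step_percent is a hypothetical tuning knob: the spacing between
    grid lines, expressed as a percentage of the screen dimension.
    """
    percents = range(step_percent, 100, step_percent)
    xs = [round(width * p / 100) for p in percents]
    ys = [round(height * p / 100) for p in percents]
    return xs, ys


def percent_to_pixels(x_percent, y_percent, width, height):
    """Convert a percentage estimate into a pixel coordinate on screen."""
    # Clamp to the valid percentage range so a bad estimate stays on screen.
    x = min(max(x_percent, 0.0), 100.0)
    y = min(max(y_percent, 0.0), 100.0)
    # The last valid pixel index is width - 1 (or height - 1).
    px = min(round(width * x / 100), width - 1)
    py = min(round(height * y / 100), height - 1)
    return px, py


def mean_click_error(predicted, actual):
    """Mean Euclidean distance in pixels between predicted and true clicks."""
    distances = [math.hypot(px - ax, py - ay)
                 for (px, py), (ax, ay) in zip(predicted, actual)]
    return sum(distances) / len(distances)
```

Under these assumptions, two grid spacings could be compared by running the same labeled screenshots through each and keeping the one with the lower `mean_click_error`.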

README.md

Lines changed: 12 additions & 12 deletions
@@ -89,17 +89,13 @@ operate
   <img src="https://github.com/OthersideAI/self-operating-computer/blob/main/readme/terminal-access-2.png" width="300" style="margin: 10px;"/>
 </div>
 
-### Contributions are Welcomed! Some Ideas:
-- **Improve performance by finding optimal screenshot grid**: A primary element of the framework is that it overlays a percentage grid on the screenshot which GPT-4v uses to estimate click locations. If someone is able to find the optimal grid and some evaluation metrics to confirm it is an improvement on the current method then we will merge that PR.
-- **Improve the `SUMMARY_PROMPT`**
-- **Create an evaluation system**
-- **Improve Linux and Windows compatibility**: There are still some issues with Linux and Windows compatibility. PRs to fix the issues are encouraged.
-- **Enabling New Mouse Capabilities**: (drag, hover, etc.)
-- **Adding New Multimodal Models**: Integration of new multimodal models is welcomed. If you have a specific model in mind that you believe would be a valuable addition, please feel free to integrate it and submit a PR.
-- **Framework Architecture Improvements**: Think you can enhance the framework architecture described in the intro? We welcome suggestions and PRs.
-- **Implement a Reflective Mouse Click Mode**: Introduce a new mode that enhances click accuracy by adding a 'reflect and correct' step. In this mode, the system will 'move mouse, reflect on position, and click if accurate; otherwise, adjust position closer.' This approach, more akin to human interaction, could increase accuracy before the implementation of `Agent-1-vision` for precise clicking. The main challenge is the increased time due to current multimodal model latency. We propose an optional `-accurate` terminal flag to activate this mode. This feature has the potential to significantly boost performance and offers an interesting area for development.
-
-For any input on improving this project, feel free to reach out to [Josh](https://twitter.com/josh_bickett) on Twitter.If you want to contribute yourself, see [CONTRIBUTING.md](https://github.com/OthersideAI/self-operating-computer/blob/main/CONTRIBUTING.md).
+### Contributions are Welcomed!:
+
+If you want to contribute yourself, see [CONTRIBUTING.md](https://github.com/OthersideAI/self-operating-computer/blob/main/CONTRIBUTING.md).
+
+### Feedback
+
+For any input on improving this project, feel free to reach out to [Josh](https://twitter.com/josh_bickett) on Twitter.
 
 ### Follow HyperWriteAI for More Updates
 
@@ -108,4 +104,8 @@ Stay updated with the latest developments:
 - Follow HyperWriteAI on [LinkedIn](https://www.linkedin.com/company/othersideai/).
 
 ### Compatibility
-- This project is compatible with Mac OS, Windows, and Linux (with X server installed).
+- This project is compatible with Mac OS, Windows, and Linux (with X server installed).
+
+### Star History
+
+[![Star History Chart](https://api.star-history.com/svg?repos=OthersideAI/self-operating-computer&type=Timeline)](https://star-history.com/#OthersideAI/self-operating-computer&Timeline)
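
The "Reflective Mouse Click Mode" item removed from the README in the hunk above describes a loop: move the mouse, reflect on the position, click if accurate, otherwise adjust closer. A minimal sketch of that loop might look like the following. Everything here is an assumption for illustration: `reflective_click` and its three callables (`move_mouse`, `reflect`, `click`) are hypothetical names, and `reflect` stands in for a multimodal model call that returns an estimated pixel offset from the cursor to the intended target.

```python
def reflective_click(start, move_mouse, reflect, click,
                     max_steps=3, tolerance=5):
    """Move, reflect, and click only once the position looks accurate.

    start      -- initial (x, y) guess in pixels
    move_mouse -- hypothetical callable that moves the cursor to (x, y)
    reflect    -- hypothetical callable returning the estimated (dx, dy)
                  offset from the cursor to the target
    click      -- hypothetical callable that clicks at (x, y)
    Returns True if a click was issued, False if max_steps ran out.
    """
    x, y = start
    for _ in range(max_steps):
        move_mouse(x, y)
        dx, dy = reflect(x, y)
        if abs(dx) <= tolerance and abs(dy) <= tolerance:
            click(x, y)
            return True
        # Not accurate yet: adjust the position closer and try again.
        x, y = x + dx, y + dy
    return False


def demo(max_steps):
    """Simulate a target at (100, 100) with a model that corrects only
    half of the remaining error on each reflection."""
    moves, clicks = [], []

    def move_mouse(x, y):
        moves.append((x, y))

    def reflect(x, y):
        return (100 - x) // 2, (100 - y) // 2

    def click(x, y):
        clicks.append((x, y))

    clicked = reflective_click((60, 100), move_mouse, reflect, click,
                               max_steps=max_steps)
    return clicked, moves, clicks
```

Each pass through the loop costs one reflection call, so `max_steps` bounds the extra latency that the README text names as the main challenge of this mode.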

operate/main.py

Lines changed: 0 additions & 1 deletion
@@ -3,7 +3,6 @@
 """
 import os
 import time
-import random
 import base64
 import json
 import math
