Open
1168 commits
4a959d4
fix rustic install
williamstein Nov 17, 2025
181241c
fix another failing test involving file watching on the backend
williamstein Nov 17, 2025
5fa10b5
fix the sandbox unit tests
williamstein Nov 17, 2025
15beb60
fixing more unit tests (due to chokidar subs for node watch)
williamstein Nov 18, 2025
49d6703
fix in conat core service test
williamstein Nov 18, 2025
98bbb8d
fix some sync conflict unit tests
williamstein Nov 18, 2025
99c4d2f
fix unit tests of sync (which were broken by changes elsewhere)
williamstein Nov 18, 2025
7372756
fix another badly written sync test
williamstein Nov 18, 2025
465228e
sync: fix event not being emitted related to loading from disk
williamstein Nov 18, 2025
4251070
explain why a test fails involving sync editing and deleting a file
williamstein Nov 18, 2025
1c11569
updating more unit tests to chokidar semantics
williamstein Nov 18, 2025
edb6cc2
do not show disk usage when quota isn't known
williamstein Nov 18, 2025
ada9f62
only show ssh gateway when configured; move copy button to left side …
williamstein Nov 18, 2025
674df54
delete the "ghost file tabs" tracking -- that was a leftover hack tha…
williamstein Nov 18, 2025
c4c9cbd
simplify the file tab resize prevention code
williamstein Nov 18, 2025
d194c82
typescript error
williamstein Nov 18, 2025
e195276
lite mode: set that project is running in context (since it is always…
williamstein Nov 18, 2025
92d2e75
cocalc-lite --> cocalc-plus
williamstein Nov 18, 2025
01834b2
lite: better-sqlite --> node:sqlite
williamstein Nov 18, 2025
5604397
lite: hook user queries into the new sqlite system
williamstein Nov 18, 2025
f905108
add unit test of new sqlite hub lite functionality
williamstein Nov 18, 2025
5d781cd
lite: new sqlite db -- make changefeed provide initial result
williamstein Nov 18, 2025
c95a131
remove the old dkv-based user-query implementation of the lite databa…
williamstein Nov 18, 2025
36cb858
lite: surface admin settings via customize
williamstein Nov 18, 2025
98a31a0
lite: add no-op touch
williamstein Nov 18, 2025
3608272
fix how reflect-sync is installed into backend
williamstein Nov 18, 2025
07a62d5
add back normalize openai models
williamstein Nov 18, 2025
da9c590
fix issue with an import from file-server and our jest tests
williamstein Nov 18, 2025
97fb42d
switch to using evaluateWithLangChain for user defined model evaluation
williamstein Nov 18, 2025
f3ce894
llm: rewrite user defined llm to use langchain instead of our old cus…
williamstein Nov 18, 2025
dcbb4e2
new @cocalc/ai package
williamstein Nov 18, 2025
999d78f
drop heavy gpt3-tokenizer from @cocalc/ai and add a simple heuristic …
williamstein Nov 18, 2025
c846050
add basic test suite to @cocalc/ai
williamstein Nov 18, 2025
8fcfdc1
add more unit tests
williamstein Nov 18, 2025
a9ef434
@cocalc/ai: add streaming test
williamstein Nov 18, 2025
95bd9d1
@cocalc/ai: more tests of history
williamstein Nov 18, 2025
dd63513
ai: more unit tests
williamstein Nov 18, 2025
8c67958
ai: more tests
williamstein Nov 18, 2025
7bd445d
work in progress switching server/llm to use the new @cocalc/ai package
williamstein Nov 18, 2025
78cda79
ai dic generator -- don't allow request to create notebook if no kern…
williamstein Nov 18, 2025
4e6c1a5
switch to using @cocalc/ai from server
williamstein Nov 18, 2025
f05e19a
swap hint v solution
williamstein Nov 18, 2025
7b348bc
lite: support llm conat server endpoint
williamstein Nov 18, 2025
fbc478c
lite: more settings
williamstein Nov 19, 2025
cdedb9d
lite: get llm @mention to work
williamstein Nov 19, 2025
7085f85
lite: add easy way to configure AI api keys
williamstein Nov 19, 2025
33b5e64
make state of lite ai config better
williamstein Nov 19, 2025
2812a65
hopefully make ai keys not be mistaken for passwords
williamstein Nov 19, 2025
d37f2db
remove some leftover not-important websocketfs remnants
williamstein Nov 19, 2025
5da29fd
remove zstd-napi (switch to native nodejs)
williamstein Nov 19, 2025
b40ae85
be explicit about needing nodejs v22+
williamstein Nov 19, 2025
8d2c776
switch better-sqlite3 --> node:sqlite, which shrinks size and removes…
williamstein Nov 19, 2025
a6c1bd5
fix more typescript issues related to using node:sqlite
williamstein Nov 19, 2025
541c833
aligning versions
williamstein Nov 19, 2025
03279df
finish aligning version
williamstein Nov 19, 2025
a3b7fec
fix bug in better-sqlite3 --> node:sqlite switch
williamstein Nov 19, 2025
0e386cb
new lite version
williamstein Nov 19, 2025
c81e4ea
Merge branch 'master' into fs2
williamstein Nov 19, 2025
42452ce
Merge branch 'fs2' of github.com:sagemathinc/cocalc into fs2
williamstein Nov 19, 2025
45863ee
unblock langchain core upgrade and put in some path hacks to tsconfig…
williamstein Nov 20, 2025
30efdcc
Merge branch 'master' into fs2
williamstein Nov 20, 2025
d569527
fix using nextjs + newest langchain
williamstein Nov 20, 2025
a5e1902
added .agents/codex-plan.md for tracking progress on codex integration
williamstein Nov 20, 2025
eac5993
add minimal codex backend support
williamstein Nov 20, 2025
1981e2a
ai: end-to-end integration of codex...
williamstein Nov 20, 2025
a8eecff
get codex to work by making sure HOME is known properly to codex
williamstein Nov 20, 2025
876e30c
add codex config option; updated progress
williamstein Nov 20, 2025
d45e4da
if codex is enabled add it to the list of mentionable users (selectin…
williamstein Nov 20, 2025
1d77073
get @mentioning codex to work
williamstein Nov 20, 2025
5252125
add scratch to gitignore; update the plan
williamstein Nov 20, 2025
8048779
add new codex configuration button and model to chat
williamstein Nov 20, 2025
9606256
codex: wire up the api so it actually sends user requests and gets ba…
williamstein Nov 20, 2025
f132dba
codex: first attempt at rendering events
williamstein Nov 21, 2025
0ec10e6
codex: start work on codex config
williamstein Nov 21, 2025
eab392e
codex: working on more settings
williamstein Nov 21, 2025
d301634
codex: don't silently hide errors
williamstein Nov 21, 2025
435b0c4
codex: actually save/load config to the chatroom, at least
williamstein Nov 21, 2025
4375b25
codex: use the configured options
williamstein Nov 21, 2025
5798960
codex: show model info on button
williamstein Nov 21, 2025
d4e559b
codex: show remaining context info
williamstein Nov 21, 2025
bc32ea0
update the plan
williamstein Nov 21, 2025
d174c3a
codex: render activity
williamstein Nov 21, 2025
c596016
codex: very basic file change info
williamstein Nov 21, 2025
3cb331e
codex: first attempt at showing diffs; not useful yet
williamstein Nov 21, 2025
c4b0b0f
agent: wire up a full acp protocol hello world front-to-back
williamstein Nov 21, 2025
9c9acec
agent: acp support for codex!
williamstein Nov 22, 2025
809be9f
agent: switch to using acp in the frontend for the actual chat
williamstein Nov 22, 2025
9c96361
agent: deleting the codex-sdk approach, step (1)
williamstein Nov 22, 2025
bb9b94f
agent: finished deleting codex-cli types and rendering
williamstein Nov 22, 2025
35c104b
agent: make codex get configured
williamstein Nov 22, 2025
3416bbc
agent: codex -- add support for compactification
williamstein Nov 22, 2025
7ab8790
agent ACP: implement terminal support
williamstein Nov 22, 2025
8902e83
agent acp: add read/write text file support
williamstein Nov 22, 2025
187b6cb
agent: chat frontend -- combine adjacent messages
williamstein Nov 22, 2025
f334c8e
acp: add logger
williamstein Nov 22, 2025
d9ae70c
do not mess with the colors for messages from others.
williamstein Nov 22, 2025
b109dc9
decompressPatch now instantiates real PatchObjects, so toString() works
williamstein Nov 22, 2025
0b4c591
end-to-end proof of concept ACP diff/patch support
williamstein Nov 22, 2025
443269b
agent: frontend rendering -- fix issue with diff
williamstein Nov 22, 2025
aeb06c2
agent: add acp terminal display to the log
williamstein Nov 22, 2025
4b2afd7
wire in the cocalc/acp side of readFile
williamstein Nov 22, 2025
aea67af
ai agents -- improve logging of client capabilities
williamstein Nov 23, 2025
eb7e805
ai agent activity: lighten some backgrounds
williamstein Nov 23, 2025
5a4bd20
ai agents: end-to-end wiring to support persistent sessions with code…
williamstein Nov 23, 2025
37a9174
ai agents: work in progress adding image support
williamstein Nov 23, 2025
a156cd1
agent: images -- work in progress; trying to be more generic (not good)
williamstein Nov 23, 2025
07a4e62
agent: take a different approach to making pasted images available
williamstein Nov 23, 2025
3466958
agent: truncate terminal output in log
williamstein Nov 23, 2025
2a0b0da
agent: truncate stored chat
williamstein Nov 23, 2025
c348602
add view --> terminal command to chatrooms
williamstein Nov 23, 2025
7ac60ea
create new package @cocalc/chat and refactor a little bit of code fro…
williamstein Nov 23, 2025
15b0d86
basic unit test of new @cocalc/chat package
williamstein Nov 23, 2025
ada4fac
improve packages/lite README
williamstein Nov 23, 2025
b393400
Merge branch 'fs2' of github.com:sagemathinc/cocalc into fs2
williamstein Nov 23, 2025
237492b
adding ability to use syncdb to @cocalc/chat
williamstein Nov 23, 2025
431d260
Merge branch 'fs2' of github.com:sagemathinc/cocalc into fs2
williamstein Nov 23, 2025
e3be553
agent: handle inserting chat messages on the backend for better brows…
williamstein Nov 23, 2025
3027e0c
remove tsconfig.tsbuildinfo in all pnpm clean scripts
williamstein Nov 23, 2025
e3bfc7f
fix import error (due to our legacy module import strategy)
williamstein Nov 23, 2025
68d6924
agent chat: ensure project_id and path are defined
williamstein Nov 23, 2025
2667fd6
agent: backend chat support -- work in progress
williamstein Nov 23, 2025
ff351e5
agent chat: tighten up frontend code and fix issue with passing chat …
williamstein Nov 24, 2025
f8de736
agent: attempt to stream updates from the backend (not working)
williamstein Nov 24, 2025
2da5742
hide silly logs
williamstein Nov 24, 2025
2db1d47
just disable excessive conat socket logging
williamstein Nov 24, 2025
d82477d
acp -- backend evaluation
williamstein Nov 24, 2025
436cb77
agent: got backend only evaluation fully working
williamstein Nov 24, 2025
23f224e
update plan; show a timer and make context red when low
williamstein Nov 24, 2025
16cc98a
video chat button: popconfirm and not for ai
williamstein Nov 24, 2025
86ee1ad
agent: 3 sandbox modes (not sure yet...)
williamstein Nov 24, 2025
8e498ea
agent: option with codex to use native terminal, for easy sandboxing …
williamstein Nov 24, 2025
91b70ad
agent: end-to-end approvals -- not yet working or properly secure
williamstein Nov 24, 2025
dcd4096
acp: got basic approvals to work (at least for network)
williamstein Nov 25, 2025
38c0825
agent: approval attempts (may revert)
williamstein Nov 25, 2025
b760049
agent (for codex): add a hack so we get terminal output in sandbox mode
williamstein Nov 25, 2025
cc4e598
agents: codex config ui improvements
williamstein Nov 25, 2025
78c1d92
make the codex config dialog more user friendly
williamstein Nov 25, 2025
883eb88
agent: made the execution mode look nice
williamstein Nov 25, 2025
5ad6bc8
Merge branch 'master' into fs2
williamstein Nov 25, 2025
9ec2f98
disable render_dim_file_extensions due to merge conflict
williamstein Nov 25, 2025
9dba098
misc
williamstein Nov 25, 2025
fa3dbc0
rewrite file tabs to not use the @react-hook/mouse-position hook, sin…
williamstein Nov 25, 2025
7193467
cleanup display of chat messages (a lot)
williamstein Nov 25, 2025
a605e4e
messages: remove some deprecated code
williamstein Nov 25, 2025
4301936
chat: fix a crash when message is maybe malformed
williamstein Nov 25, 2025
f686e9c
agent: ability to interrupt running codex turn
williamstein Nov 25, 2025
79dafb0
agent: interrupt in full access mode works well with this change to a…
williamstein Nov 26, 2025
cfaa9dc
agent: add interrupt event message
williamstein Nov 26, 2025
e560ba3
agent: improve style of activity log
williamstein Nov 26, 2025
43aa176
codex: tweaking the display
williamstein Nov 26, 2025
683e5ab
Add codex compact button to codex config
williamstein Nov 26, 2025
44683ab
chat: make sender messages clearer in full chatroom mode
williamstein Nov 26, 2025
a6bde25
agent: don't print exit code on success
williamstein Nov 26, 2025
9f3b94c
codex/acp: fix bug with marking a turn as done
williamstein Nov 26, 2025
f57e6af
codex: maybe fix wrong context window info
williamstein Nov 26, 2025
3bf74b7
codex: hopefully fix issues with showing used context
williamstein Nov 26, 2025
46bf21d
ts
williamstein Nov 26, 2025
2617985
quick +New reorg
williamstein Nov 26, 2025
f2930ac
+new page ts errors
williamstein Nov 26, 2025
75693b0
codex: fix bug in setting thread_id; also improve frontend activity l…
williamstein Nov 26, 2025
bc135ab
Tweak Codex prompt to emit clickable file links
williamstein Nov 26, 2025
8245055
codex: swap the order of log and output
williamstein Nov 26, 2025
e82c8a7
codex: maybe fix issue with displaying thread context size
williamstein Nov 26, 2025
a1bbc27
codex: fix displaying used context
williamstein Nov 26, 2025
bbaab2a
Persist Codex activity log toggle across Virtuoso remounts
williamstein Nov 26, 2025
5e87aa6
Add per-thread activity indicators in chat thread list
williamstein Nov 26, 2025
d020cec
Preserve slash commands when adding file-link guidance
williamstein Nov 26, 2025
417cb8a
do not show hashtag bar in chat thread
williamstein Nov 26, 2025
c84b470
Capture stderr from the child codex-acp
williamstein Nov 26, 2025
00f3949
add comment about how to limit memory usage by codex-acp via an env var
williamstein Nov 26, 2025
69d1152
chat: remove the save_to_disks all over; really no need
williamstein Nov 26, 2025
a703842
chat: autoscroll when ai is generating unless user manually scrolls
williamstein Nov 26, 2025
fb118fe
codex/agent: do not show thinking by default
williamstein Nov 26, 2025
b18aa37
codex: show thinking log when approval is needed
williamstein Nov 26, 2025
e67af0b
codex: do NOT expand log whenever generating
williamstein Nov 26, 2025
d1606b2
codex: only show final summary when it is ready
williamstein Nov 26, 2025
c14fadb
Fix codex streaming content and autoscroll
williamstein Nov 26, 2025
9415659
Emphasize low-context warnings and thread indicators
williamstein Nov 26, 2025
ad8c86a
codex: refactor to avoid duplication
williamstein Nov 26, 2025
b3a6da6
Show low-context banner only on latest thread message
williamstein Nov 26, 2025
a71e4b7
see if updating AGENTS.md can prevent dumb autocommits
williamstein Nov 26, 2025
3158b64
codex: redo context warnings
williamstein Nov 26, 2025
69b3502
codex chat: fix scroll and final message again, hopefully
williamstein Nov 26, 2025
1e453e0
add confirmation title to chat thread delete
williamstein Nov 26, 2025
1f518ae
chat threads: scroll to first unread message
williamstein Nov 26, 2025
0013d67
chat threads: maintain scroll position
williamstein Nov 27, 2025
6e013c5
resizable side bar
williamstein Nov 27, 2025
504afe8
chat: add sidebar hide/show
williamstein Nov 27, 2025
11d6b12
chat: make chat input box height grow
williamstein Nov 27, 2025
f0b2390
chat: eliminate preview functionality entirely
williamstein Nov 27, 2025
ccd474b
chatroom: fix rendering now that composer resizes
williamstein Nov 27, 2025
27f4604
chat: eliminate animation when resizing side bar
williamstein Nov 27, 2025
de212ed
codex: compact causes a save and close
williamstein Nov 27, 2025
26d783b
autoGrow markdown editor flag so existing editors (e.g., tasks) still…
williamstein Nov 27, 2025
dfb543e
codex: improve interrupt robustness
williamstein Nov 27, 2025
9361463
codex: make path read paths clickable
williamstein Nov 27, 2025
7ea7623
codex/acp agents: make terminals look nicer
williamstein Nov 27, 2025
ac80155
codex: clean up terminal output
williamstein Nov 27, 2025
48c2551
uniform size avatars for @mentions
williamstein Nov 27, 2025
b77ff3b
chat: add "export to markdown" per thread
williamstein Nov 27, 2025
68190c7
add "Newest" button to codex chat gen to stay at bottom.
williamstein Nov 27, 2025
6f433c9
codex: heuristic diff info
williamstein Nov 27, 2025
d74800b
sage-chat --> chat rename, with backward compat and nice refactor
williamstein Nov 27, 2025
0386b09
add patchflow as a dependency
williamstein Nov 29, 2025
084f8c0
first steps to switch to using patchflow for sync
williamstein Nov 29, 2025
01112bd
add pnpm build-dev-force to make it possible to ignore all typescript…
williamstein Nov 29, 2025
ae3d151
sync: increase compat of our patchflow shims
williamstein Nov 29, 2025
321c8f3
Added the missing patchflow-style methods to the time-travel view doc…
williamstein Nov 29, 2025
bf2335f
sync: started work integrating patchflow into sync-doc.ts
williamstein Nov 29, 2025
0f77b78
temporarily revert some use of patchflow
williamstein Nov 29, 2025
77ac454
sync: getting undo to work with patchflow
williamstein Nov 29, 2025
b956dea
sync: using patchflow for getting specific version of doc
williamstein Nov 29, 2025
386a133
sync: history export using patchflow
williamstein Nov 29, 2025
3b0a324
sync: show_history via patch flow
williamstein Nov 29, 2025
0967ee2
sync
williamstein Nov 29, 2025
5d20822
more use of patch flow for sync
williamstein Nov 29, 2025
30ad7d7
sync: more patchflow
williamstein Nov 29, 2025
654a4ee
sync -- migrating to patchflow -- partial loading support
williamstein Nov 29, 2025
b7730f1
sync: migrate how we select snapshots
williamstein Nov 29, 2025
1effc76
fix some confusing issues found by auditing this code
williamstein Nov 29, 2025
3e94518
sync: delete sorted-patch-list entirely; it's now in patchflow
williamstein Nov 29, 2025
bcd61ca
eliminate pre-nats timetravel support entirely
williamstein Nov 30, 2025
9167a74
audit file-server/ssh code and make some fixes
williamstein Nov 30, 2025
1b85061
btrfs snapshots: ignore lock files in limits and await lock writes
williamstein Nov 30, 2025
bb8a4fa
btrfs clone: create qgroup for cloned subvolume
williamstein Nov 30, 2025
41529ff
btrfs quota: pick qgroup for subvolume id
williamstein Nov 30, 2025
d2eb297
sync: refactoring cursor handling
williamstein Nov 30, 2025
84a01c2
switch to using PatchFlow for cursors
williamstein Nov 30, 2025
c04a10f
sync: delete some dead code
williamstein Nov 30, 2025
b0fc4a2
fully switch to PatchFlow's undo/redo, instead of our own
williamstein Nov 30, 2025
011eebc
sync: fix before-change hook
williamstein Nov 30, 2025
5106952
remove before-sync event from core sync algorithm (broken)
williamstein Nov 30, 2025
865fa19
sync: codemirror -- change to only merging in upstream on idle to eli…
williamstein Nov 30, 2025
dbd93b3
attempt to improve use of sync by codemirror; result not good
williamstein Nov 30, 2025
81a7e1b
frontend sync: start refactoring how editors are plugged into sync
williamstein Nov 30, 2025
ce6c90a
Implemented a dedicated CodeMirror adapter for sync listeners -- just…
williamstein Nov 30, 2025
22797b7
Extracted a sync adapter to isolate SyncString wiring.
williamstein Nov 30, 2025
4884687
sync: removing old code from code-editor/actions.ts
williamstein Nov 30, 2025
66a4026
add sync unit test to frontend; remove unused flags/events
williamstein Nov 30, 2025
50f4a03
codemirror sync -- work in progress
williamstein Nov 30, 2025
40cd698
frontend cm sync: work in progress
williamstein Nov 30, 2025
9176c2a
cm sync integration -- fix issue with not firing
williamstein Nov 30, 2025
07cd789
update tasks to work with new sync
williamstein Dec 1, 2025
3b90279
slate -- adapt to work with sync
williamstein Dec 1, 2025
125efd9
update whiteboard text input for new sync
williamstein Dec 1, 2025
348de2e
update whiteboard code input for new sync
williamstein Dec 1, 2025
083ad9c
update jupyter to new sync model
williamstein Dec 1, 2025
4837816
remove before-change event from synctable, since we no longer need it
williamstein Dec 1, 2025
5f218ef
migrate jupyter sync to new model
williamstein Dec 1, 2025
9d3b631
switch to actual patchflow package
williamstein Dec 1, 2025
54 changes: 25 additions & 29 deletions .github/workflows/make-and-test.yml
@@ -39,7 +39,7 @@ jobs:
  detached: true
  - uses: actions/checkout@v4
  - name: Install python3 requests
- run: sudo apt-get install python3-requests
+ run: sudo apt-get install python3-requests python3-yapf
  - name: Check doc links
  run: cd src/scripts && python3 check_doc_urls.py || sleep 5 || python3 check_doc_urls.py

@@ -91,19 +91,15 @@ jobs:
  # cache: "pnpm"
  # cache-dependency-path: "src/packages/pnpm-lock.yaml"

- - name: Download and install Valkey
- run: |
- VALKEY_VERSION=8.1.2
- curl -LOq https://download.valkey.io/releases/valkey-${VALKEY_VERSION}-jammy-x86_64.tar.gz
- tar -xzf valkey-${VALKEY_VERSION}-jammy-x86_64.tar.gz
- sudo cp valkey-${VALKEY_VERSION}-jammy-x86_64/bin/valkey-server /usr/local/bin/
+ - name: Install btrfs-progs and bup for @cocalc/file-server
+ run: sudo apt-get update && sudo apt-get install -y btrfs-progs bup

  - name: Set up Python venv and Jupyter kernel
  run: |
  python3 -m pip install --upgrade pip virtualenv
  python3 -m virtualenv venv
  source venv/bin/activate
- pip install ipykernel
+ pip install ipykernel yapf
  python -m ipykernel install --prefix=./jupyter-local --name python3-local --display-name "Python 3 (Local)"


@@ -128,30 +124,30 @@ jobs:
  name: "test-results-node-${{ matrix.node-version }}-pg-${{ matrix.pg-version }}"
  path: 'src/packages/*/junit.xml'

- report:
- runs-on: ubuntu-latest
+ # report:
+ # runs-on: ubuntu-latest

- needs: [test]
+ # needs: [test]

- if: ${{ !cancelled() }}
+ # if: ${{ !cancelled() }}

- steps:
- - name: Checkout code
- uses: actions/checkout@v4
+ # steps:
+ # - name: Checkout code
+ # uses: actions/checkout@v4

- - name: Download all test artifacts
- uses: actions/download-artifact@v4
- with:
- pattern: "test-results-*"
- merge-multiple: true
- path: test-results/
+ # - name: Download all test artifacts
+ # uses: actions/download-artifact@v4
+ # with:
+ # pattern: "test-results-*"
+ # merge-multiple: true
+ # path: test-results/

- - name: Test Report
- uses: dorny/test-reporter@v2
- with:
- name: CoCalc Jest Tests
- path: 'test-results/**/junit.xml'
- reporter: jest-junit
- use-actions-summary: 'true'
- fail-on-error: false
+ # - name: Test Report
+ # uses: dorny/test-reporter@v2
+ # with:
+ # name: CoCalc Jest Tests
+ # path: 'test-results/**/junit.xml'
+ # reporter: jest-junit
+ # use-actions-summary: 'true'
+ # fail-on-error: false

35 changes: 30 additions & 5 deletions .gitignore
@@ -89,6 +89,8 @@ src/restart_hub
  src/restart_compute
  .coffee

+ src/packages/project/build
+
  src/smc-hub/run/hub[0-9].js
  src/smc-hub/lti
  src/smc-hub/landing
@@ -110,11 +112,6 @@ src/smc-build/prometheus/alertmanager.yml
  .gitignore
  postgres-env

- # related to testing
- src/smc-hub/test/*.js
- src/smc-hub/test/*.map
- src/smc_sagews/smc_sagews/metastore_db/
-
  # autogenerated
  src/smc-webapp/_colors.sass

@@ -164,8 +161,36 @@ src/.claude/settings.local.json

# test reports by jest-junit
junit.xml


sea-prep.blob
cocalc
cocalc-lite.tar.*
*.egg-info
.python-version
src/packages/lite/sea/cocalc*gz
src/packages/lite/sea/cocalc*xz
src/packages/lite/sea/cocalc*zip
src/packages/lite/sea/cocalc*gnu


# autogenerated docs
**/cocalc-api/site/**
*.pkg
*.zip

src/packages/lite/build/
src/packages/project/build/
src/packages/project-runner/build/


codex.sh
g
g-cs
g-cs2
g-ssh-server
g-bundle
g2
g-lite
scratch
patchflow
236 changes: 236 additions & 0 deletions docs/architecture.md
@@ -0,0 +1,236 @@
# CoCalc2 Architecture Overview (Draft)

> This is a working draft meant to capture the current design in one place.

---

## Goals & Non‑Goals

**Goals**

- Fast, durable, multi‑tenant project storage with clear quotas.
- Predictable save from project runner VMs to the central file server \(no “I did work but it can’t be saved”\).
- Efficient storage via transparent compression; simple mental model for users.
- Rolling snapshots for user self‑service restore. Separate quota for snapshots, which users mostly don't worry about.

**Non‑Goals**

- Per‑user UID separation on runners \(we rely on containerization and subvolume quotas instead\).
- Snapshots on runner VMs \(server owns snapshot history; runners are ephemeral\).

---

## High‑Level Components

1. **Central File Server** \(single large Btrfs filesystem\)
   - One Btrfs **subvolume per project** \(live working set\).
   - Compression enabled \(e.g., `zstd`\).
   - **Qgroups/Quotas** enabled for hard limits.
   - **Rolling snapshots** per project for user restore.
   - Named **user‑created snapshots**.

2. **Project Runner VMs** \(many; fast local SSD\)
   - Also Btrfs with compression and **per‑project subvolumes**.
   - Hard quotas sized slightly below the server quota to maintain save‑back headroom.
   - No persistent snapshots \(might use short‑lived read‑only snapshots for atomic rsync of rootfs\).

3. **Sync Layer**
   - **Mutagen**: near real‑time sync for user home files.
   - **rsync**: periodic sync for the container rootfs upper overlay.

4. **Web UI & Services**
   - Surfaces usage and limits \(live and snapshots\), provides a snapshot browser with restore, and shows warnings.

---

## Storage Model & Quotas

### Per‑Project Subvolume (File Server)

- Each project lives at `/mnt/project-<project-id>` as its **own subvolume**.
- **Compression** is enabled at the filesystem level; **quotas are enforced** _**after compression**_.
- Two distinct quota budget buckets:
  - **Live quota**: applies to the live subvolume.
  - **Snapshots quota**: applies to the aggregate of _all_ snapshots for that project.
- Quota for snapshots will be a simple function \(probably 2x\) of the live quota.

### Qgroups Structure

- Btrfs assigns each subvolume an implicit qgroup `0/<live-id>`.
- We create an **aggregate qgroup** `1/<live-id>` for that project’s snapshots.
- We apply limits:
  - **Live**: limit `0/<live-id>` \(or the path directly\) to, say, **10 GiB**.
  - **Snapshots**: limit `1/<live-id>` to, say, **20 GiB** total across all snapshots.
- On snapshot creation, we assign the snapshot’s `0/<snap-id>` **into** `1/<live-id>`.
- Using the **live subvolume ID as the aggregate id** avoids external ID bookkeeping.

### Runner VM Quotas

- Each runner has a **per‑project subvolume** with **quota set to ~85–90%** of the server’s live quota.
- Rationale: keeps **headroom** so save‑back to the server succeeds even if compression ratios differ.
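
The relationship between the three limits can be sketched as a pair of shell helpers. This is only an illustration: the function names are hypothetical, the 2x snapshot factor is the "probably 2x" rule stated above, and the 90% default is one point in the 85–90% range. The code is not taken from `packages/file-server`.

```bash
# Hypothetical helpers (not from packages/file-server): derive the snapshot
# and runner limits from a project's live quota. All sizes are in bytes.

# Snapshot budget: assumed to be 2x the live quota.
snapshot_quota_bytes() {
  echo $(( $1 * 2 ))
}

# Runner quota: a percentage of the server live quota (default 90).
runner_quota_bytes() {
  echo $(( $1 * ${2:-90} / 100 ))
}

GIB=$(( 1024 * 1024 * 1024 ))
LIVE=$(( 10 * GIB ))
echo "live=$LIVE snapshots=$(snapshot_quota_bytes "$LIVE") runner=$(runner_quota_bytes "$LIVE")"
```

The resulting byte counts could be passed as the size argument of `btrfs qgroup limit`, which accepts a plain number of bytes as well as suffixed sizes.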

### User‑Facing Explanation (docs‑ready blurb)

> **Storage quota is measured after compression.** Your project has a quota that measures the actual space consumed on disk. If your data compresses well, the sum of file sizes you see in the editor may exceed your quota and still fit. Snapshots have a separate quota \(twice the project quota\) that limits how much historical data is retained.

---

## Snapshots

- **Where**: server only, per project \(no long‑term snapshots on runners\).
- **How**: periodic RO snapshots \(e.g., 15 minute/hourly/daily/weekly retention\).
- **Budget**: snapshots all share the project’s **snapshot quota** \(`1/<live-id>` limit\). When the budget is exceeded, the snapshot retention policy prunes oldest automatic snapshots until under budget. Explicit user created named snapshots are not automatically deleted.
- **Self‑service**: UI lets users browse/restore from snapshots; command line restore via rsync is also supported.
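
The pruning rule in the **Budget** item can be sketched as follows. This is a simplified model under stated assumptions, not the actual pruner: the function name and the input format \(one snapshot per line, oldest first, as `<name> <bytes> <auto|named>`\) are hypothetical, and summing per‑snapshot sizes only approximates what the `1/<live-id>` qgroup would report, since snapshots share extents.

```bash
# Hypothetical sketch of the retention pruner's selection step.
# stdin:  one snapshot per line, oldest first: "<name> <bytes> <auto|named>"
# stdout: names of automatic snapshots to delete, oldest first, until the
#         running total fits the budget. Named snapshots are never selected.
snapshots_to_prune() {
  awk -v budget="$1" '
    { name[NR] = $1; size[NR] = $2; kind[NR] = $3; total += $2 }
    END {
      for (i = 1; i <= NR && total > budget; i++) {
        if (kind[i] == "auto") { print name[i]; total -= size[i] }
      }
    }'
}

# Example: budget of 8 units, 14 units currently used.
printf '%s\n' 'a 5 auto' 'b 4 named' 'c 3 auto' 'd 2 auto' | snapshots_to_prune 8
```

If named snapshots alone exceed the budget, this sketch prints every automatic snapshot and stops; how the real system reports that situation is not specified here.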

> **Note**: Runner nodes may take a **short‑lived RO snapshot** strictly for consistent `rsync` (copy‑on‑write point‑in‑time view), then delete it immediately after sync completes. This does not change policy: history lives on the server.

---

## Data Flow

1. **Active work on runner**
   - User edits files in their per‑project subvolume on a runner.
   - **Mutagen** streams home‑dir changes to the server nearly immediately. If file changes conflict, the central file server always wins.
   - **rsync** pushes the rootfs overlay periodically \(e.g., every minute\) from a transient snapshot for consistency.

2. **File Server receives changes**
   - Writes land in the project’s live subvolume, bounded by the live quota.
   - Periodic snapshots capture history and consume from the snapshots quota.

3. **Restore**
   - Users restore individual files or directories from snapshots via UI or CLI.

---

## Operational Procedures

The following is roughly what the actual JavaScript code in `packages/file-server` does.

### One‑Time Setup (per filesystem)

```bash
# Enable quotas once (the filesystem is mounted at /mnt in the examples below)
sudo btrfs quota enable /mnt
# Optional after bulk ops or enabling late
sudo btrfs quota rescan -w /mnt
```

### Create a New Project (Server)

```bash
# Live subvolume
sudo btrfs subvolume create /mnt/project-$PROJECT_ID

# Set live quota (example: 10 GiB)
sudo btrfs qgroup limit 10G /mnt/project-$PROJECT_ID

# Snapshot aggregate group uses the live subvolume ID
LIVEID=$(sudo btrfs subvolume show /mnt/project-$PROJECT_ID | awk '/Subvolume ID:/ {print $3}')

# Create and limit the snapshots group
sudo btrfs qgroup create 1/$LIVEID /mnt/
sudo btrfs qgroup limit 20G 1/$LIVEID /mnt/ # example snapshots budget
```

### Snapshot Creation (Server)

```bash
# Create RO snapshot
TS=$(date -u +%Y%m%dT%H%M%SZ)
SNAP=/mnt/project-$PROJECT_ID/.snapshots/$TS
sudo btrfs subvolume snapshot -r /mnt/project-$PROJECT_ID "$SNAP"

# Assign snapshot to the project’s snapshot group
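# Match "Subvolume ID:" exactly; a bare /ID:/ also matches the UUID, Parent ID and Top level ID lines
SNAPID=$(sudo btrfs subvolume show "$SNAP" | awk '/Subvolume ID:/ {print $3}')
LIVEID=$(sudo btrfs subvolume show /mnt/project-$PROJECT_ID | awk '/Subvolume ID:/ {print $3}')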
sudo btrfs qgroup assign 0/$SNAPID 1/$LIVEID /mnt
```
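
Extracting the numeric subvolume ID is the step most likely to go wrong in these scripts, so it helps to check the parser against sample output. The sketch below uses a hypothetical `subvolume_id` helper and a made‑up, abbreviated sample in the layout `btrfs subvolume show` prints; the values are illustrative only. The pattern is anchored on `Subvolume ID:` because the UUID, Parent ID, and Top level ID lines also contain `ID:`.

```bash
# Illustrative, abbreviated `btrfs subvolume show` output (values are made up).
sample_show_output() {
  printf '%s\n' \
    'project-demo' \
    '    Name:           project-demo' \
    '    UUID:           11111111-2222-3333-4444-555555555555' \
    '    Parent UUID:    -' \
    '    Subvolume ID:   256' \
    '    Parent ID:      5' \
    '    Top level ID:   5'
}

# Hypothetical helper: read `btrfs subvolume show` output on stdin and print
# only the numeric subvolume ID (third whitespace-separated field).
subvolume_id() {
  awk '/Subvolume ID:/ {print $3; exit}'
}

sample_show_output | subvolume_id
```

In a real script the helper would be fed by `sudo btrfs subvolume show <path>` instead of the sample function.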

### Runner Subvolume & Quota

```bash
# Create per‑project subvolume on runner
sudo btrfs subvolume create /runnerfs/project-$PROJECT_ID

# Set runner quota to ~90% of server limit (example: 9 GiB)
sudo btrfs qgroup limit 9G /runnerfs/project-$PROJECT_ID
```

### Rsync from Runner \(optional transient snapshot\)

```bash
# (TODO)
P=/runnerfs/project-$PROJECT_ID
TS=$(date -u +%Y%m%dT%H%M%SZ)
rsync -aHAX --delete ... file-server:/mnt/project-$PROJECT_ID/.local/overlay/...
```

### Inspecting Usage

```bash
# Qgroup usage (referenced/exclusive, human‑readable)
sudo btrfs qgroup show -reF /mnt | less

# Filesystem space by class (useful with compression)
sudo btrfs filesystem df /mnt
```

---

## Policies & Safety

- **Hard quotas**: enforced by the kernel via qgroups \(both server and runner\). When a project exceeds its quota, writes fail with EDQUOT \(“Disk quota exceeded”\) scoped to that subvolume.
- **Headroom on runners**: prevents the common failure mode where work done on a runner can’t be saved back to the server due to tighter server limits or different compression ratios.
- **User guidance**: expose a `~/scratch` directory \(separate subvolume and policy\) for large temporary files not intended for sync—reduces quota pressure on the live budget.
- **Performance knobs**: `compress=zstd[:3]`, `ssd`, `discard=async`. Consider `autodefrag` only for heavy small‑random‑write workloads. Set `chattr +C` sparingly on paths needing no‑CoW \(trades off checksumming\).
- **Dedup** on runners: optional **bees** on runners to reduce local SSD usage; measure CPU/IO overhead under realistic load. Use reflink copy\-on\-write when possible \(e.g., cloning projects\).
- **Dedup** on file server: optional **bees** to reduce disk usage. Also extensively use copy\-on\-write, e.g., when copying files between projects.

---

## Failure Modes & Mitigations

- **Runner quota exceeded** → user sees a quota error \(EDQUOT\) early; save‑back fails fast and visibly. UI should warn near 80–90%.
- **Server live quota exceeded** → incoming syncs fail; UI callouts \+ guidance to delete files or increase quota.
- **Snapshot budget exceeded** → retention pruner deletes oldest snapshots until under budget.
- **Qgroup counter drift** \(rare, after crashes/bulk ops\) → `btrfs quota rescan -w` to reconcile.
- **Filesystem nearly full** → monitor `btrfs filesystem df`; alert admins before metadata pools are pressured.
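
The "warn near 80–90%" guidance could be implemented along these lines. This is a sketch with hypothetical names and thresholds \(warn at 80%, critical at 90%\); it assumes `limit` is greater than zero and that both values are in the same unit, for example the referenced bytes and limit reported for the project's qgroup.

```bash
# Hypothetical helper: classify quota usage for UI warnings.
# Usage: usage_level <used> <limit>   (same unit for both; limit must be > 0)
usage_level() {
  percent=$(( $1 * 100 / $2 ))
  if [ "$percent" -ge 90 ]; then
    echo critical
  elif [ "$percent" -ge 80 ]; then
    echo warn
  else
    echo ok
  fi
}

usage_level 8500 10000   # 85% used
```

The same classification could be applied separately to the live and snapshot budgets, since each has its own limit.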

---

## Observability (What to Monitor)

- Live and snapshots usage per project (qgroup referenced/exclusive).
- Runner vs server usage deltas (to detect pathological compression differences).
- Snapshot creation latency; pruner actions count.
- Error rates from mutagen/rsync; EDQUOT and ENOSPC events; quota rescans.

---

## FAQ (User‑Facing)

**Q: My files add up to more than my quota, but I’m not blocked. Why?**
A: Quotas measure space **after compression**. If your data compresses well, you can store more than the sum of uncompressed file sizes.

**Q: Do snapshots count against my main quota?**
A: No. Snapshots have a **separate budget which is twice your main quota**. When that fills, older snapshots are pruned automatically.

**Q: What happens if I hit the quota while working?**
A: New writes fail with a “disk quota exceeded” error. Delete data or request a higher quota, then try again.

**Q: Can I keep big temporary outputs?**
A: Use `~/scratch` \(limited retention and a separate quota\). Only the project’s live area is synced and counted against your main quota.

---

## Appendix: Rationale for Design Choices

- **Per‑project subvolumes** enable kernel‑level quotas, small blast radius, and fast deletion.
- **Server‑side snapshots only** simplify reasoning about history, save SSD cycles on runners, and reduce operational complexity.
- **Aggregate snapshot qgroup** provides a single dial for “how much history a project can accumulate.”
- **Runner quotas < server quotas** provide a simple, robust guardrail against save‑back failures due to compression variance.

---

_End of draft._
