B2view plotting and other improvements #663
Merged
Include the sidecar's `path` in `_sidecar_handle_cache_key` so sibling columns in a compact (.b2z) store no longer collide on a single cache entry. Previously all columns shared the same `_array_key` (identical urlpath), so every column received the handle opened first — returning wrong data when the handle was read directly (e.g. plots, b2view).
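The collision can be illustrated with a toy cache. This is only a sketch: `cache_key_old`, `cache_key_new`, `open_all`, and the `/_indexes/<col>/summary` path layout are hypothetical stand-ins, not the actual blosc2 internals.

```python
# Toy model of a sidecar handle cache (hypothetical names, not blosc2's code).

def cache_key_old(array_key, sidecar_path):
    # Old behaviour: the sidecar path is ignored, so every column that
    # shares the same urlpath produces the same key.
    return (array_key,)


def cache_key_new(array_key, sidecar_path):
    # Fixed behaviour: the sidecar path distinguishes sibling columns.
    return (array_key, sidecar_path)


def open_all(key_fn, columns, urlpath="store.b2z"):
    cache = {}
    handles = {}
    for col in columns:
        path = f"/_indexes/{col}/summary"  # illustrative layout only
        key = key_fn(urlpath, path)
        # setdefault returns whatever handle was stored first under this key.
        handles[col] = cache.setdefault(key, f"handle:{path}")
    return handles


old = open_all(cache_key_old, ["a", "b"])
new = open_all(cache_key_new, ["a", "b"])
print(old["b"])  # column "b" receives the handle opened for "a"
print(new["b"])  # column "b" receives its own handle
```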
Compact (.b2z) stores share one urlpath, so the index store held only the first column's descriptor under "__self__" — multi-column predicates pruned by one column at best. Injecting all columns (per-column tokens + array_to_col threading) wasn't enough: the segment merge required left.base is right.base, so cross-column ANDs failed to merge and fell back to a full scan (~2x regression). Replace the base-identity check with a row-grid compatibility check (_grid_compatible_segment_plans): segments are row-aligned, so columns sharing level/segment_len/shape/row-count map every segment to the same rows and their candidate masks can be combined directly. _merge_segment_plans now intersects (AND) / unions (OR) across columns, with a safe fallback when grids differ (AND -> most-selective side, OR -> full scan). Cross-column AND now prunes instead of full-scanning; OR prunes when both sides are segment-selective. Adds tests for AND/OR pruning, correctness vs scan, and merge semantics.
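The merge rule described above can be sketched roughly as follows. `SegmentPlan`, `grid_compatible`, and `merge` are hypothetical names chosen for illustration, and ranking "most selective" by candidate row count is an assumption; the real `_merge_segment_plans` may differ in both shape and detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentPlan:
    # Hypothetical plan shape: a grid description plus one flag per segment.
    level: str
    segment_len: int
    nrows: int
    candidates: tuple  # True where the segment may contain matching rows


def grid_compatible(left, right):
    # Same level, segment length and row count: segment i covers the same
    # rows on both sides, so the masks can be combined element-wise.
    return (
        (left.level, left.segment_len, left.nrows)
        == (right.level, right.segment_len, right.nrows)
        and len(left.candidates) == len(right.candidates)
    )


def candidate_rows(plan):
    return sum(plan.candidates) * plan.segment_len


def merge(left, right, op):
    if grid_compatible(left, right):
        if op == "and":
            mask = tuple(a and b for a, b in zip(left.candidates, right.candidates))
        else:
            mask = tuple(a or b for a, b in zip(left.candidates, right.candidates))
        return SegmentPlan(left.level, left.segment_len, left.nrows, mask)
    if op == "and":
        # Safe fallback: either side alone is a superset of the AND result,
        # so keep the one that selects fewer rows.
        return min(left, right, key=candidate_rows)
    return None  # OR over mismatched grids: caller falls back to a full scan


a = SegmentPlan("chunk", 4, 16, (True, True, False, False))
b = SegmentPlan("chunk", 4, 16, (False, True, True, False))
c = SegmentPlan("block", 2, 16, (True,) + (False,) * 7)
print(merge(a, b, "and").candidates)
print(merge(a, b, "or").candidates)
```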
plot_series capped the exact min/max envelope at _PLOT_FULL_READ_MAX_BYTES (~1 GB) and fell back to a strided sample above it, which can step over peaks the envelope exists to preserve. For local objects (CTable columns, N-D arrays), stream the series in bounded spans and accumulate per-bucket min/max instead. Since min/max are associative, arbitrary span boundaries reproduce the single-read result exactly, so the envelope stays peak-preserving (method="reduce") at O(span) memory. Remote c2arrays keep the labeled strided sample to avoid many network round-trips. Adds _minmax_buckets_streaming / _bucket_geometry / _stream_span (reads aligned to native chunks, ~_PLOT_STREAM_BUFFER_BYTES each) and model-level tests in tests/b2view/test_plot_model.py: exactness vs full read, a spike a sample would miss, all-NaN/int/edge cases, and remote-stays-sample.
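The exactness argument can be checked with a small pure-Python sketch. The function names and the equal-width bucket geometry below are assumptions for illustration only; the real `_minmax_buckets_streaming` aligns reads to native chunks and also handles NaNs and dtypes, which this sketch omits.

```python
import math


def minmax_full(values, nbuckets):
    # Reference: one pass over the whole series.
    n = len(values)
    lo = [math.inf] * nbuckets
    hi = [-math.inf] * nbuckets
    for row, v in enumerate(values):
        b = row * nbuckets // n  # equal-width buckets (assumed geometry)
        lo[b] = min(lo[b], v)
        hi[b] = max(hi[b], v)
    return lo, hi


def minmax_streaming(read_span, n, nbuckets, span):
    # Read bounded spans and fold each value into its bucket. Because min
    # and max are associative, span boundaries do not affect the result.
    lo = [math.inf] * nbuckets
    hi = [-math.inf] * nbuckets
    for start in range(0, n, span):
        stop = min(start + span, n)
        for offset, v in enumerate(read_span(start, stop)):
            b = (start + offset) * nbuckets // n
            lo[b] = min(lo[b], v)
            hi[b] = max(hi[b], v)
    return lo, hi


values = [0.0] * 1000
values[617] = 99.0  # a single spike that a stride-100 sample steps over

streamed = minmax_streaming(lambda a, b: values[a:b], len(values), 10, span=37)
full = minmax_full(values, 10)
sampled_max = max(values[::100])
print(streamed == full, streamed[1][6], sampled_max)
```

Only one span of values is held in memory at a time, which is what keeps the cost at O(span) regardless of the series length.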
The 'p' plot showed only a whole-series min/max envelope, so detail within a region was bucketed away with no way to drill in. plot_series gains row_start/row_stop: the whole series still uses the fast SUMMARY tier, while a sub-range is read exactly (reduce/stream, or a strided sample only for large remote ranges) with x kept in absolute row coordinates. PlotScreen now holds a fetch closure + total n and re-queries the envelope on +/- (zoom about centre), left/right (pan), 0 (reset), and g (type an exact start:stop via the new PlotRangeScreen). A key-hint line and a '?'-help group advertise the keys.
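The range arithmetic behind zoom and pan might look like the sketch below. `zoom` and `pan` are hypothetical helpers, and the zoom factor and pan fraction are assumed values; the source does not state the step sizes PlotScreen actually uses.

```python
def zoom(start, stop, n, factor):
    # factor < 1 zooms in, factor > 1 zooms out, about the centre of
    # [start, stop); the result is clamped to [0, n).
    centre = (start + stop) / 2
    half = max((stop - start) * factor / 2, 0.5)
    new_start = max(0, int(centre - half))
    new_stop = min(n, max(new_start + 1, int(centre + half)))
    return new_start, new_stop


def pan(start, stop, n, fraction):
    # Shift the window by a fraction of its width, keeping the width fixed
    # and the window inside [0, n). Assumes stop - start <= n.
    width = stop - start
    shift = int(width * fraction)
    new_start = min(max(0, start + shift), n - width)
    return new_start, new_start + width


n = 1000
window = zoom(0, n, n, 0.5)      # '+' : zoom in about the centre
print(window)
print(pan(*window, n, 0.5))      # right: pan by half a window
print(zoom(*window, n, 2.0))     # '-' : zoom back out
```

Keeping `start` and `stop` in absolute row coordinates means the same pair can be passed straight to a `row_start`/`row_stop` query.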
b2view: 'v' locks the data grid to the plotted row range

Add a public CTable.slice(start, stop=None, /, *, copy=True): range- or slice-style bounds in live-row space; copy=False returns a zero-copy view (via _view_from_positions, like head/tail), copy=True a compact copy (via take), mirroring NDArray.slice. In the plot modal, 'v' now locks the data grid in place to the navigated row range instead of just jumping the cursor. For CTable sources the model registers a copy=False slice view per-path in _window_views (precedence over _filter_views, so it composes over an active filter); len(view) bounds paging for free. The app holds self.row_window, reloads in place via _enter/_exit_row_window, shows a WINDOW a:b header chip, and gains an esc unlock layer. NDArray plots still fall back to a cursor jump (follow-up: window them copy-free via the layout).
Add a public CTable.slice(start, stop=None, /, *, copy=True): range- or slice-style bounds in live-row space; copy=False returns a zero-copy view (via _view_from_positions, like head/tail), copy=True a compact copy (via take), mirroring NDArray.slice. In the plot modal, 'v' now locks the data grid in place to the navigated row range instead of jumping the cursor; esc unlocks. CTable sources use a copy=False slice view registered per-path in the model's _window_views (precedence over _filter_views, so it composes over a filter); len(view) bounds paging for free. NDArray sources are clamped copy-free via the layout: DataSliceLayout gains a row_window field, preview_array_from_layout reports nrows = stop-start and offsets reads by start. The app holds self.row_window, reloads in place, shows a WINDOW a:b header chip, and gains an esc unlock layer in action_dim_exit.
In the plot modal, 'h' renders the raw values of the currently shown range as a real image over the braille plot; q/esc/h return with the zoom intact. The braille envelope stays the fast navigator. model.read_series reads the exact values for [row_start, row_stop) (same series selection as plot_series, no bucketing). PlotScreen gains a raw_fetch closure and action_hires, capped at _HIRES_MAX_POINTS (50k; above that it asks you to zoom in). HiResPlotScreen renders matplotlib (Agg) to PNG and shows it via textual-image's auto Image (kitty/iTerm2/sixel -> half-cells), scaled to fill the dialog; a focusable VerticalScroll body keeps the screen's keys live, and it closes with pop_screen. Add textual-image and matplotlib to the 'plot' extra.
Mirror pandas/polars display conventions:
- CTable.to_string() now renders the whole table by default (all rows and columns), like pandas DataFrame.to_string(). New max_rows/max_width params truncate on demand. Decouple __str__ from to_string so str/print/repr stay truncated per the global options.
- repr(t) now shows the same truncated table as str(t) instead of the one-line CTable<...> summary; the summary remains on t.info.
- set_printoptions gains display_width (None=auto terminal, -1=all columns, int=fixed budget); display_rows now accepts -1 (all rows).
- Add blosc2.printoptions(...) context manager (set + restore on exit).

Behaviour changes: to_string() returns the full table (was truncated), and repr() returns the table (was the summary).
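The "set + restore on exit" pattern for a print-options context manager can be sketched with `contextlib`. The `_options` dict and its default values are hypothetical; this is not the blosc2 implementation, only the general shape such a helper usually takes.

```python
from contextlib import contextmanager

# Hypothetical global options store; the defaults here are assumed.
_options = {"display_rows": 10, "display_width": None}


def set_printoptions(**kwargs):
    unknown = set(kwargs) - set(_options)
    if unknown:
        raise TypeError(f"unknown print options: {sorted(unknown)}")
    _options.update(kwargs)


@contextmanager
def printoptions(**kwargs):
    saved = dict(_options)  # snapshot before changing anything
    try:
        set_printoptions(**kwargs)
        yield
    finally:
        # Restore even if the body raised.
        _options.clear()
        _options.update(saved)


with printoptions(display_rows=-1, display_width=-1):
    inside = dict(_options)
after = dict(_options)
print(inside, after)
```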
Make the path argument optional: with no path (or path=None), nothing is written and the CSV text is returned as a string, like pandas' DataFrame.to_csv(). Passing a path still writes the file and returns None; the returned string is byte-for-byte identical to the file content. No index column is written (matching polars and CTable's index-less data model).
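The optional-path contract can be sketched with the standard library alone. Here `to_csv` is a free function over a plain dict of columns, a hypothetical stand-in for the CTable method, written only to show the return-string versus write-file behaviour.

```python
import csv
import io
import os
import tempfile


def to_csv(columns, path=None):
    # columns: dict mapping column name -> list of values (stand-in table).
    buf = io.StringIO(newline="")
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(columns.keys())           # header row, no index column
    writer.writerows(zip(*columns.values()))  # one row per record
    text = buf.getvalue()
    if path is None:
        return text  # nothing is written; the caller gets the CSV text
    with open(path, "w", newline="", encoding="utf-8") as f:
        f.write(text)
    return None


table = {"a": [1, 2], "b": ["x", "y"]}
text = to_csv(table)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.csv")
    result = to_csv(table, path)
    with open(path, "rb") as f:
        on_disk = f.read()

print(repr(text))
```

Building the text once and writing that same string to disk is one simple way to guarantee the two outputs stay byte-for-byte identical.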
Expensive CTable columns (list/struct/object/ndarray) render a `<...; skipped>` placeholder to keep table navigation responsive. Pressing Enter on such a cell now decodes just that one cell into a CellDetailScreen modal (pretty-printed, scrollable, esc/q/enter to return) — the table stays underneath with its position intact.
The data panel showed a "not implemented" stub for SChunk nodes. Render them instead as an xxd-style hex dump, modeled as a grid preview so all the existing row navigation applies unchanged.
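Modeling a hex dump as grid rows could look like the following sketch. `hexdump_rows`, the 16-byte row width, and the three-column layout are assumptions for illustration; the b2view preview may format its rows differently.

```python
def hexdump_rows(data, width=16):
    # One grid row per `width` bytes: (offset, hex bytes, printable ASCII).
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        rows.append((f"{offset:08x}", hex_part, ascii_part))
    return rows


rows = hexdump_rows(b"Blosc2\x00\xff")
for offset, hex_part, ascii_part in rows:
    print(offset, hex_part, ascii_part)
```

Because each row covers a fixed number of bytes, row `i` starts at byte `i * width`, so paging through the dump reduces to ordinary row paging.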
With the braille 'p' plot now built in (textual-plotext is a core dep), the extra only covers the high-res 'h' image view, so `plot` was misleading. Rename it to `hires`. Unreleased (4.4.6.dev0), so no deprecation needed.
The WINDOW / DIM MODE indicators used plain [reverse] (white) reverse video. Switch them to the brand accent (yellow) so they match the theme:
- new _accent_chip() helper renders dark-text-on-yellow via theme vars
- WINDOW chips (zoomed/locked header and filter-chip path) use it
- the dim-mode full-line highlight becomes dark-on-yellow; the DIM MODE label is an inverted cutout (yellow-on-dark) so it still stands out
Startup focus was applied on a fixed 0.05s timer that raced the node selection; when selection won, its tree.focus() pulled focus back to the tree, so `b2view store /path --panel data` left the grid unfocused. Replace the timer with a one-shot flag applied at the end of update_panels, once the data panel's display and contents have settled, so focus lands deterministically on the requested panel (the grid for a leaf, the preview scroll for a group).
882f135 to 0cdd259
Pull request overview
This PR extends the b2view terminal UI with plotting and richer data inspection, while also improving core Blosc2 table/array ergonomics (CTable display/CSV), adding performance fast paths for strided reads, and fixing compact-store indexing correctness/performance (sidecar handle cache + cross-column SUMMARY pruning). It also adds a new cibuildwheel WASM/Pyodide job.
Changes:
- Add `b2view` features: zoomable plot modal (`p`), optional high-res matplotlib view (`h` via `hires` extra), row-window locking (`v`), on-demand decode of skipped CTable cells (Enter), and a paged SChunk hex dump preview.
- Revise `CTable` display APIs: `to_string()` shows full table by default, `str`/`repr` become the truncated “interactive” view controlled by print options, add `display_width`, and add `blosc2.printoptions(...)` context manager; `to_csv()` can return CSV text when no path is provided.
- Improve indexing/query planning and strided slicing performance (NDArray sparse-gather subsample path; Column identity fast path; fix sidecar cache keying + cross-column segment-plan merging).
Reviewed changes
Copilot reviewed 26 out of 26 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| todo/b2view.md | Updates b2view roadmap with newly completed features. |
| tests/test_b2view_model.py | Adds unit coverage for SChunk hex preview, on-demand cell decoding, and plot-series envelope behavior. |
| tests/ndarray/test_getitem.py | Adds tests for NDArray strided-slice sparse-gather dispatch/correctness. |
| tests/ctable/test_vlstring_vlbytes.py | Updates repr expectations after CTable display changes. |
| tests/ctable/test_select_describe_cov.py | Updates to_string footer expectations (pandas-style). |
| tests/ctable/test_getitem_access.py | Adds tests for printoptions, to_string truncation controls, and validation. |
| tests/ctable/test_ctable_take.py | Adds coverage for new CTable.slice() API (copy and zero-copy). |
| tests/ctable/test_ctable_indexing.py | Adds regressions for compact-store SUMMARY handle collisions and cross-column pruning. |
| tests/ctable/test_csv_interop.py | Adds coverage for to_csv() returning CSV text when no path is given. |
| tests/ctable/test_column.py | Updates repr/str semantics tests for CTable. |
| tests/ctable/test_column_slice_fastpath.py | New tests for Column identity-case fast path for strided slicing. |
| tests/b2view/test_plot_model.py | New unit tests for streamed plot envelope exactness and remote sampling behavior. |
| tests/b2view/test_basics.py | Adds Pilot regressions for focus, paging alignment, plotting, row window, cell decode modal, and SChunk hex dump paging. |
| src/blosc2/ndarray.py | Adds _try_subsample_gather fast path and updates slicing docs. |
| src/blosc2/indexing.py | Fixes sidecar handle cache keying (include path) and enables cross-column segment plan merging. |
| src/blosc2/ctable.py | Adds display-width options, printoptions context manager, repr/str behavior changes, to_string controls, CTable.slice, and to_csv return-string option. |
| src/blosc2/ctable_indexing.py | Adjusts index token handling for compact stores and passes column mapping into planning. |
| src/blosc2/b2view/model.py | Adds plot/read-series APIs, row-window support, SChunk hex preview, and cell decode API. |
| src/blosc2/b2view/cli.py | Adds a WASM/Pyodide guard with a clear error message. |
| src/blosc2/b2view/app.py | Implements plotting screens, high-res view, cell detail modal, theming, paging alignment, and row-window behavior. |
| src/blosc2/__init__.py | Exposes printoptions in the public API. |
| RELEASE_NOTES.md | Documents display/to_string/to_csv changes and new print options. |
| pyproject.toml | Adds textual-plotext to core deps; introduces hires extra (textual-image, matplotlib). |
| doc/getting_started/installation.rst | Documents extras (hires, parquet) and install syntax. |
| doc/getting_started/b2view.rst | Documents b2view plot/high-res dependencies and points to extras. |
| .github/workflows/cibuildwheels.yml | Adds WASM/Pyodide wheel build matrix and uploads artifacts. |
Main improvements: