Skip to content

add SofaStream.verify#138

Open
artpelling wants to merge 1 commit into
mainfrom
verify-stream
Open

add SofaStream.verify#138
artpelling wants to merge 1 commit into
mainfrom
verify-stream

Conversation

@artpelling

@artpelling artpelling commented Jun 19, 2026

Copy link
Copy Markdown
Member

SofaStream has no verify method. For verification of conventions, one would need to read the entire file into memory which defeats the purpose of SofaStream. I've added a verify method to solve that. The method reads everything but the actual numerical data from the (potentially too large for memory) .sofa into a Sofa in-memory object and calls Sofa.verify.

Changes proposed in this pull request:

  • added SofaStream.verify
  • changed np.array to np.asarray in utils.py to avoid always loading everything into memory
  • added tests

Disclaimer:

I partially used an LLM for the NetCDF4 reading stuff. Local tests seem to work well

@artpelling artpelling moved this from Backlog to Require review in Weekly Planning Jun 19, 2026
@artpelling artpelling requested review from a team, ahms5, f-brinkmann and hoyer-a June 19, 2026 14:52
@artpelling artpelling added this to the v1.2.3 milestone Jun 19, 2026
@artpelling artpelling changed the title add verify to SofaStream add SofaStream.verify Jun 19, 2026

@f-brinkmann f-brinkmann left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the effort. I think this is great to have. I suggested a few small changes and to think about a good way to efficiently and comprehesively test the added functionality.

Oh and please already add this to the HISTORY. We do not have a test to enforce this yet but should already start this new best practice.

Comment thread sofar/sofastream.py

def _verify_open_file(self, issue_handling, mode):
"""Verify currently opened NetCDF file without loading numeric data."""
import sofar as sf

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imports should be at the top. Any reason that this is here?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

slop :^)

Comment thread sofar/sofastream.py
Comment on lines +116 to +122
issue_handling : str, optional
Defines how detected issues are handled. See
:py:func:`~sofar.Sofa.verify` for details.
mode : str, optional
The SOFA standard is more strict for writing data than for reading
data. See :py:func:`~sofar.Sofa.verify` for details.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest to copy paste the parameter description from Sofa.verify to make this more self contained.

Comment thread sofar/sofastream.py
placeholders for numeric variables. Thus, numeric data are checked for
type and dimensions without loading the variable contents into memory.
String variables are loaded because their values and string lengths can
be relevant for convention verification.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add something like 'see :py:func:~sofar.Sofa.verify for more information on the verification process and rules.'?

Comment thread tests/test_sofastream.py
_ = file.Wrong_Attribute


def test_sofastream_verify(temp_sofa_file, tmp_path_factory, monkeypatch):

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two suggestions because this test two things at once:

How about having one test to only check that read_sofa is not called (and add a brief docstring to the test even though the existing tests do not follow this)

Since you are calling Sofa.verify, which is already tested in test_sofa_verify.py, we should think about a good way to test only the added functionality here. Would it be sufficient to test this against sf.io.read_sofa? This would mean to test if

  • string variables and attributes are loaded as intended
  • all variables exist and have the same type and shape

I think this could be done if the 'fake' sofa file that is created in SofaStream._verify_open_file is returned in an intermediate step (so we could test it) and verified in a next step. So maybe split the functionality into

  • SofaStream._read_open_file_for_verification and
  • SofaStream._verify_open_file

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am also thinking that we could also add upgrade_conventions in a similar way. Going through with this fully could imply implementing all Sofa methods. Then, we should probably work with inheritance.

@f-brinkmann should we just concentrate on verify in this PR or zoom out a bit already?

@f-brinkmann f-brinkmann Jun 24, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@artpelling I would definately focus on verify at the moment. I see how for example adding variables can be helpful for example to sequentially write data from measurements into SOFA files, but we should maybe check if its possible and worth the effort.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

Status: Require review

Development

Successfully merging this pull request may close these issues.

2 participants