add SofaStream.verify#138
Conversation
There was a problem hiding this comment.
Thanks for the effort. I think this is great to have. I suggested a few small changes and to think about a good way to efficiently and comprehesively test the added functionality.
Oh and please already add this to the HISTORY. We do not have a test to enforce this yet but should already start this new best practice.
|
|
||
| def _verify_open_file(self, issue_handling, mode): | ||
| """Verify currently opened NetCDF file without loading numeric data.""" | ||
| import sofar as sf |
There was a problem hiding this comment.
Imports should be at the top. Any reason that this is here?
| issue_handling : str, optional | ||
| Defines how detected issues are handled. See | ||
| :py:func:`~sofar.Sofa.verify` for details. | ||
| mode : str, optional | ||
| The SOFA standard is more strict for writing data than for reading | ||
| data. See :py:func:`~sofar.Sofa.verify` for details. | ||
|
|
There was a problem hiding this comment.
I would suggest to copy paste the parameter description from Sofa.verify to make this more self contained.
| placeholders for numeric variables. Thus, numeric data are checked for | ||
| type and dimensions without loading the variable contents into memory. | ||
| String variables are loaded because their values and string lengths can | ||
| be relevant for convention verification. |
There was a problem hiding this comment.
Can you add something like 'see :py:func:~sofar.Sofa.verify for more information on the verification process and rules.'?
| _ = file.Wrong_Attribute | ||
|
|
||
|
|
||
| def test_sofastream_verify(temp_sofa_file, tmp_path_factory, monkeypatch): |
There was a problem hiding this comment.
Two suggestions because this test two things at once:
How about having one test to only check that read_sofa is not called (and add a brief docstring to the test even though the existing tests do not follow this)
Since you are calling Sofa.verify, which is already tested in test_sofa_verify.py, we should think about a good way to test only the added functionality here. Would it be sufficient to test this against sf.io.read_sofa? This would mean to test if
- string variables and attributes are loaded as intended
- all variables exist and have the same type and shape
I think this could be done if the 'fake' sofa file that is created in SofaStream._verify_open_file is returned in an intermediate step (so we could test it) and verified in a next step. So maybe split the functionality into
SofaStream._read_open_file_for_verificationandSofaStream._verify_open_file
There was a problem hiding this comment.
I am also thinking that we could also add upgrade_conventions in a similar way. Going through with this fully could imply implementing all Sofa methods. Then, we should probably work with inheritance.
@f-brinkmann should we just concentrate on verify in this PR or zoom out a bit already?
There was a problem hiding this comment.
@artpelling I would definately focus on verify at the moment. I see how for example adding variables can be helpful for example to sequentially write data from measurements into SOFA files, but we should maybe check if its possible and worth the effort.
SofaStreamhas noverifymethod. For verification of conventions, one would need to read the entire file into memory which defeats the purpose ofSofaStream. I've added averifymethod to solve that. The method reads everything but the actual numerical data from the (potentially too large for memory).sofainto aSofain-memory object and callsSofa.verify.Changes proposed in this pull request:
SofaStream.verifynp.arraytonp.asarrayinutils.pyto avoid always loading everything into memoryDisclaimer:
I partially used an LLM for the NetCDF4 reading stuff. Local tests seem to work well