Enabling of MDAnalysis.analysis.align.AverageStructure parallelization #4738
Enabling of MDAnalysis.analysis.align.AverageStructure parallelization #4738talagayev wants to merge 19 commits intoMDAnalysis:developfrom
MDAnalysis.analysis.align.AverageStructure parallelization #4738Conversation
|
Hello @talagayev! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2025-01-11 21:40:18 UTC |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## develop #4738 +/- ##
========================================
Coverage 92.73% 92.74%
========================================
Files 180 180
Lines 22475 22491 +16
Branches 3190 3191 +1
========================================
+ Hits 20842 20859 +17
Misses 1176 1176
+ Partials 457 456 -1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
c37add2 to
03eef45
Compare
|
@marinegor I would ping you in this PR as well. Here basically I tried different ways to see if it is possible to parallelize the I added As for |
I think parallelization should not actually work with
If you're running out of ideas, I'd suggest making this PR for Also, I imagine there are problems with serialization of |
Yes makes sense. I think the current two ones that use Yes that would be good, I can then rename the PR to cover only |
package/MDAnalysis/analysis/align.py
Outdated
| if requested_backend not in (None, "serial"): | ||
| warnings.warn( | ||
| "The in-memory parallel trajectory usage is inefficient" | ||
| "and not supported. Falling back to serial.", | ||
| RuntimeWarning, | ||
| ) |
There was a problem hiding this comment.
I won't be in favor of a warning, and would rather explicitly raise ValueError because, well, how often do you switch off / ignore warnings?)
There was a problem hiding this comment.
True, adjusted it to raise a ValueError for that case and adjusted the test to cover the ValueError.
|
@talagayev ok, will be waiting for your message. |
MDAnalysis.analysis.align.AverageStructure parallelization
Added the Documentation, |
There was a problem hiding this comment.
@talagayev all looking good! I initially commented on one extra line but just realized I can remove it myself.
Also, may I ask you to create a separate issue for AlignTraj parallelization? Perhaps you could describe the issues you encountered there, and suggest the direction in which one should move to actually enable it.
Perfect :) Yes, I can create an Issue and write some Ideas in there. |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
marinegor
left a comment
There was a problem hiding this comment.
@talagayev sorry for postponing that but I think these comments will make the code better :)
package/MDAnalysis/analysis/align.py
Outdated
| if getattr(self, "_in_memory", False): | ||
| # We are in the in_memory case: always run serial. | ||
| if requested_backend not in (None, "serial"): | ||
| raise ValueError( | ||
| "The in-memory parallel trajectory usage is not supported. Use serial backend instead.", | ||
| ) | ||
| return super().run( | ||
| start=start, stop=stop, step=step, verbose=verbose | ||
| ) | ||
| else: | ||
| if requested_backend is not None: | ||
| kwargs["backend"] = requested_backend | ||
| return super().run( | ||
| start=start, stop=stop, step=step, verbose=verbose, **kwargs | ||
| ) |
There was a problem hiding this comment.
@talagayev sorry I didn't think about it earlier, but this actually feels a bit hacky to me, and I think I know why: run() isn't supposed to do any validation, it's performed by _configure_backend() method instead (docs). In your case, I'd patch it to be something like:
def _configure_backend(
self,
backend: Union[str, BackendBase],
n_workers: int,
unsupported_backend: bool = False,
) -> BackendBase:
configured_backend = super()._configure_backend(backend=backend, n_workers=n_workers, unsupported_backend=unsupported_backend)
if not isinstance(configured_backend, MDAnalysis.analysis.backends.BackendSerial) and self._in_memory:
raise ValueError('...')this way you don't have to patch run() with double-nested ifs, and generally write less code.
There was a problem hiding this comment.
yes was a hacky approach to do the go around with the serial case. Agree your approach looks cleaner, will try to adjust it in the upcoming days :)
There was a problem hiding this comment.
Adjusted now to use _configure_backend as suggested :)
|
@talagayev and regarding So even in this PR, set |
Fixes #4659 attempt
Changes made in this Pull Request:
backendsandaggregatorstoAlignTrajandAverageStructureinanalysis.align.client_AlignTrajandclient_AverageStructureinconftest.pyclient_AlignTrajandclient_AverageStructureinrun()intest_align.pyCurrently for
AlignTrajit only acceptsserialanddaskwithmultiprocessingleading to the pytests taking forever. An additional error that appears is the following:OSError: File opened in mode: self.mode. Reading only allow in mode "r"For
AverageStructurethe Failure that appears is the following:AttributeError: 'numpy.ndarray' object has no attribute 'load_new'Which leads me to believe that
AverageStructurecan not be parallelized, but I would need additional opinions on it and onAlignTraj:)PR Checklist
Developers certificate of origin
📚 Documentation preview 📚: https://mdanalysis--4738.org.readthedocs.build/en/4738/