Conversation
Delivers significant performance improvements by caching previously computed results.
# Conflicts: # edtf/fields.py # edtf/jdutil.py # edtf/natlang/en.py # edtf/natlang/tests.py # edtf/parser/grammar.py # edtf/parser/parser_classes.py # edtf/parser/tests.py # pyproject.toml # setup.py
|
Not sure why it's failing -- the tests all pass and work on my end. |
Also added Andrew Hankinson to the authors list in pyproject.toml
|
It occurs to me that a big part of the reason for the speed-up is the use of the functools lru_cache, which will cache the result of the same input string and skip the computation of that value, instead returning only the cached value. While this is useful in real-world contexts, it's probably not so useful in benchmark runs! |
|
@aweakley does this PR look OK? |
I've had a pass at the Parser Classes file, but there are a lot of problems still to be sorted out. I've added return types and argument types whereever it makes sense. The "UncertainOrApproximate" class is a hot mess. There are boolean values with property and method calls associated with them, and I would be surprised if it actually works. However, it doesn't seem to be tested or implemented, so I can't figure out where to go from here.
This wasn't actually used anywhere! Also removed a redundant regex group
|
I've done a bit of cleanup, but the parser_classes file is still quite buggy. Unfortunately I don't quite know how to trigger the bugs, but I'm at least confident that I didn't make it much worse. |
|
Thank you. I'll check it out. |
|
@ahankinson The tests are passing for me, are there particular bugs you were worried about in your comment above? |
|
Yeah the tests pass but I don’t think they exercise the parser classes that much. I have the ruff checker running on that and it still highlights a bunch of stuff |
|
BTW, ruff has also updated some rules so I have updated it accordingly now. |
|
This seems to be going nowhere. I'm going to be adding more changes on my fork, so to prevent it from running here I'll close it. |
|
Gah! sorry @ahankinson , I missed your comments. I'd like to merge but if I reopen this I think I'll pick up your most recent changes on your branch - are you happy for me to merge them too? |
Temporarily ignore UP031 pending merge of #73
|
Do you have any preference on whether I keep the The other changes I've been adding are fairly benign, so I'm happy for them to be merged. |
Also removed the one regex and replaced it with the "replace" method on strings.
|
I think let's keep it in please. |
|
I probably won’t be able to check it over until next week. I’m the meantime if you see something you want removed let me know. |
|
One thing I did in my branch is remove support in the CI for python 3.9, since a) it's EOL later this year, and b) from 3.10 onwards the type annotation |
4ad6171 to
9bd142d
Compare
# Conflicts: # .github/workflows/ci.yml
|
This should be ready for review now. I'm not sure why it's failing though. |
|
That's great, thank you very much indeed. |
|
I think the change to the signature here: python-edtf/edtf/parser/grammar.py Lines 346 to 351 in 95ebb72 parseAll->parse_all) means that we should release this as a new major version, so I'll prepare that.
|
It's great to see that this project has some life in it now. I have been using it for a while and maintaining my own fork with some performance enhancements.
Primarily, the main enhancements have been to enable packratting (which seems to now be enabled) and pre-declaring and compiling the regexes in the natural language processor.
If the current benchmarks are to be believed, this leads to a significant speedup. All the tests pass, so I guess it's OK?
https://rism-digital.github.io/python-edtf/dev/bench/
I've gone through and tried to integrate the latest changes, but that was a big PR so another set of eyes would be helpful.