-
Notifications
You must be signed in to change notification settings - Fork 2
⚡️ Speed up function find_query_preview_references by 14,668%
#29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
⚡️ Speed up function find_query_preview_references by 14,668%
#29
Conversation
The optimization achieves a **14,667% speedup** primarily through **LRU caching** of expensive SQL parsing operations. Here's what changed: **Key Optimizations:** 1. **LRU Cache for SQL Parsing**: Added `@lru_cache(maxsize=64)` to cache `sqlparse.parse()` results, which was the dominant bottleneck (97.6% of original runtime). The same SQL strings are parsed multiple times during recursive traversal of query references. 2. **Cache Table Reference Extraction**: The `extract_table_references` function now uses cached `_cached_extract_table_references` that returns immutable tuples for cache efficiency while maintaining list compatibility for callers. 3. **Eliminated Redundant Object Comparisons**: Replaced the expensive `any(id(variable) == id(ref) for ref in query_preview_references)` check with a simple dictionary key lookup (`if variable_name in query_preview_references`), reducing O(n) iterations. 4. **Minor Micro-optimizations**: Stored `token.ttype` in a local variable to reduce attribute access overhead. **Why This Works:** - **Repeated Parsing**: The line profiler shows `sqlparse.parse()` consuming 99.7% of `is_single_select_query` runtime and 97.6% of `extract_table_references`. Caching eliminates this redundancy. - **Recursive Query Analysis**: When analyzing nested query references, the same SQL strings are parsed multiple times - caching provides exponential benefits. - **Test Results Pattern**: All test cases show 25x-400x improvements, with larger improvements for complex recursive/multiple reference scenarios (up to 45,000x for large-scale tests). **Best Performance Gains**: The optimization excels with repeated query analysis, recursive query references, and large-scale scenarios with many table references - exactly the patterns shown in the test cases where speedups range from 554% to 45,845%.
📝 WalkthroughWalkthroughTwo SQL utility modules were optimized to reduce redundant work through caching. In Pre-merge checks✅ Passed checks (3 passed)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: ASSERTIVE Plan: Pro Disabled knowledge base sources:
📒 Files selected for processing (2)
🧰 Additional context used🪛 Ruff (0.14.4)deepnote_toolkit/sql/sql_utils.py20-20: Missing return type annotation for private function (ANN202) deepnote_toolkit/sql/sql_query_chaining.py216-216: Missing return type annotation for private function (ANN202) 221-221: Do not catch blind exception: (BLE001) 222-222: Unnecessary Rewrite as a literal (C408) 244-244: Possible hardcoded password assigned to: "normalized_token" (S105) 🔇 Additional comments (4)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
📄 14,668% (146.68x) speedup for
find_query_preview_referencesindeepnote_toolkit/sql/sql_query_chaining.py⏱️ Runtime :
86.1 milliseconds→583 microseconds(best of250runs)📝 Explanation and details
The optimization achieves a 14,667% speedup primarily through LRU caching of expensive SQL parsing operations. Here's what changed:
Key Optimizations:
LRU Cache for SQL Parsing: Added
@lru_cache(maxsize=64)to cachesqlparse.parse()results, which was the dominant bottleneck (97.6% of original runtime). The same SQL strings are parsed multiple times during recursive traversal of query references.Cache Table Reference Extraction: The
extract_table_referencesfunction now uses cached_cached_extract_table_referencesthat returns immutable tuples for cache efficiency while maintaining list compatibility for callers.Eliminated Redundant Object Comparisons: Replaced the expensive
any(id(variable) == id(ref) for ref in query_preview_references)check with a simple dictionary key lookup (if variable_name in query_preview_references), reducing O(n) iterations.Minor Micro-optimizations: Stored
token.ttypein a local variable to reduce attribute access overhead.Why This Works:
sqlparse.parse()consuming 99.7% ofis_single_select_queryruntime and 97.6% ofextract_table_references. Caching eliminates this redundancy.Best Performance Gains: The optimization excels with repeated query analysis, recursive query references, and large-scale scenarios with many table references - exactly the patterns shown in the test cases where speedups range from 554% to 45,845%.
✅ Correctness verification report:
⚙️ Existing Unit Tests and Runtime
unit/test_sql_query_chaining.py::TestSqlQueryChaining.test_find_query_preview_references_basicunit/test_sql_query_chaining.py::TestSqlQueryChaining.test_find_query_preview_references_circularunit/test_sql_query_chaining.py::TestSqlQueryChaining.test_find_query_preview_references_nestedunit/test_sql_query_chaining.py::TestSqlQueryChaining.test_find_query_preview_references_no_referencesunit/test_sql_query_chaining.py::TestSqlQueryChaining.test_find_query_preview_references_non_select_query🌀 Generated Regression Tests and Runtime
⏪ Replay Tests and Runtime
test_pytest_testsunittest_xdg_paths_py_testsunittest_jinjasql_utils_py_testsunittest_url_utils_py_testsun__replay_test_0.py::test_deepnote_toolkit_sql_sql_query_chaining_find_query_preview_referencesTo edit these changes
git checkout codeflash/optimize-find_query_preview_references-mhl9wno5and push.Summary by CodeRabbit