feat(algorithms, trie, index pairs): index pairs of strings #195
Merged
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,60 @@ | ||
| # Index Pairs of a String | ||
|
|
||
| Given a string `text` and an array of strings `words`, return a list of all index pairs `[i, j]` such that the substring | ||
| `text[i...j]` (both ends inclusive) is present in `words`. | ||
|
|
||
| Return the pairs [i, j] in sorted order: first by the value of i and, if two pairs have the same i, by the value of j. | ||
|
|
||
| ## Constraints | ||
|
|
||
| - 1 ≤ text.length ≤ 100 | ||
| - 1 ≤ words.length ≤ 20 | ||
| - 1 ≤ words[i].length ≤ 50 | ||
| - text and words[i] consist of lowercase English letters. | ||
| - All the strings of words are unique. | ||
|
|
||
| ## Examples | ||
|
|
||
|  | ||
|  | ||
|  | ||
|
|
||
| ## Solution | ||
|
|
||
| The algorithm uses a trie to find all pairs of start and end indexes for substrings in a given text that match words | ||
| from a list. First, it builds the trie by inserting each word from the list. Then, it iterates over each starting index | ||
| in the text to match substrings using the trie. For each character sequence, it checks if the current character exists | ||
| in the trie and traverses accordingly. If a word’s end is found (marked by a flag), the start and end indexes of the | ||
| matched substring are recorded in the result. Because the scan from a start index stops at the first character with no | ||
| matching child node, all words are matched in a single pass per start index instead of comparing each word separately. | ||
|
|
||
| The algorithm to solve this problem is as follows: | ||
|
|
||
| 1. Insert each word from the list into the trie. Each character is added as a node, and isEndOfWord (`is_end` in the | ||
| implementation) is set on the node of the last character to mark the end of a word. | ||
| 2. Loop through each character in text (starting at index i). For each starting index, try to find substrings that match | ||
| words in the trie by traversing them. | ||
| 3. For each start index i, the algorithm begins traversing the trie from the root node. It then checks whether each | ||
| subsequent character (at index j, from i onward) is a child of the current node. If it is, the traversal moves to that | ||
| child; if not, the scan for this start index stops. Whenever the current node marks the end of a word (i.e., | ||
| isEndOfWord is True), the index pair [i, j] is recorded, where i is the start index and j is the end index of the | ||
| matched word. The scan then continues, since a longer word may share the same prefix. | ||
| 4. After checking all starting indexes, return the list of index pairs representing matched words’ start and end positions. | ||
|
|
||
| ### Time Complexity | ||
|
|
||
| Inserting n words of average length m into the trie takes O(n∗m). For each of the l starting indexes in text, the search | ||
| follows at most k characters, where k is the length of the longest word (the scan also stops at the end of text). The | ||
| search therefore takes O(l∗k), and the overall time complexity is O(n∗m+l∗k). | ||
|
|
||
| ### Space Complexity | ||
|
|
||
| The space the trie uses depends on the number of characters in the list words. If there are n words with an average length | ||
| of m, the trie takes up to `O(n∗m)` space. This bound is reached when no two words share a prefix, so that every | ||
| character gets its own node; shared prefixes are stored once and reduce the node count. | ||
|
|
||
| The result list stores the index pairs [i, j]. In the worst case, every possible substring of text could be a word in | ||
| the list words. This would result in `O(l^2)` pairs, where l is the length of the string text. So, the space complexity | ||
| for the result list is `O(l^2)` in the worst case. | ||
|
|
||
| Thus, the overall space complexity is `O(n∗m+l^2)`. | ||
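The quadratic bound on the result list can be checked on a small input. The sketch below is a self-contained illustration, not the code from this pull request: it uses nested dictionaries as a stand-in trie, and the names `build_trie`, `find_pairs`, and the `END` marker are hypothetical.

```python
from typing import Dict, List

END = "$"  # hypothetical end-of-word marker; cannot clash with lowercase letters


def build_trie(words: List[str]) -> Dict:
    """Build a nested-dict trie; each key is a character, END marks a full word."""
    root: Dict = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def find_pairs(text: str, words: List[str]) -> List[List[int]]:
    """Return every inclusive [i, j] such that text[i..j] is one of the words."""
    root = build_trie(words)
    pairs = []
    for i in range(len(text)):
        node = root
        for j in range(i, len(text)):
            node = node.get(text[j])
            if node is None:
                break  # no word continues with this character
            if END in node:
                pairs.append([i, j])
    return pairs


# Worst case for the result list: every substring of the text is a word.
text = "aaaa"
words = ["a", "aa", "aaa", "aaaa"]
pairs = find_pairs(text, words)
print(len(pairs), len(text) * (len(text) + 1) // 2)
```

For a text of length l this input produces l∗(l+1)/2 pairs, which is where the `O(l^2)` term comes from.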
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,31 @@ | ||
| from typing import List | ||
| from datastructures.trees.trie import Trie | ||
|
|
||
| def index_pairs(text: str, words: List[str]) -> List[List[int]]: | ||
|     trie = Trie() | ||
|
|
||
|     for word in words: | ||
|         trie.insert(word) | ||
|
|
||
|     results = [] | ||
|
|
||
|     # Try every index in the text as the start of a match | ||
|     for idx in range(len(text)): | ||
|         # Start from the root of the Trie for each start index | ||
|         node = trie.root | ||
|
|
||
|         # Check each possible substring starting from index idx | ||
|         for j in range(idx, len(text)): | ||
|             ch = text[j] | ||
|             # If the character is not in the current Trie node's children, stop searching | ||
|             if ch not in node.children: | ||
|                 break | ||
|
|
||
|             # Move to the next node in the Trie | ||
|             node = node.children[ch] | ||
|
|
||
|             # If we reach the end of a word, record the indices | ||
|             if node.is_end: | ||
|                 results.append([idx, j]) | ||
|
|
||
|     return results | ||
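`Trie` is imported from `datastructures.trees.trie`, which is not part of this diff. The sketch below is a minimal stand-in, assuming only the members that `index_pairs` touches (`root`, `insert`, `children`, `is_end`). The `TrieNode` class name and everything else about the real class are assumptions.

```python
from typing import Dict


class TrieNode:
    """One trie node: child links keyed by character plus an end-of-word flag."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_end = False


class Trie:
    """Hypothetical stand-in exposing only what index_pairs relies on."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            # Create the child on first use, then descend into it
            if ch not in node.children:
                node.children[ch] = TrieNode()
            node = node.children[ch]
        # Mark the node of the last character as the end of a word
        node.is_end = True


trie = Trie()
for word in ["xyx", "xy"]:
    trie.insert(word)

# "xy" and "xyx" share the path x -> y; both end-of-word flags sit on that path
y_node = trie.root.children["x"].children["y"]
print(y_node.is_end, y_node.children["x"].is_end)
```

Because both words live on one path, a single scan from a start index can report `[i, i + 1]` and `[i, i + 2]` without restarting.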
Binary file added (+47.5 KB): ...e/index_pairs_of_a_string/images/examples/index_pairs_of_a_string_example_1.png
Binary file added (+44.9 KB): ...e/index_pairs_of_a_string/images/examples/index_pairs_of_a_string_example_2.png
Binary file added (+28.5 KB): ...e/index_pairs_of_a_string/images/examples/index_pairs_of_a_string_example_3.png
33 changes: 33 additions & 0 deletions
algorithms/trie/index_pairs_of_a_string/test_index_pairs_of_a_string.py
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,33 @@ | ||
| import unittest | ||
| from typing import List | ||
| from parameterized import parameterized | ||
| from algorithms.trie.index_pairs_of_a_string import index_pairs | ||
|
|
||
| INDEX_PAIRS_OF_A_STRING = [ | ||
|     ( | ||
|         "thestoryofeducativeandme", | ||
|         ["story", "feduc", "educative"], | ||
|         [[3, 7], [9, 13], [10, 18]], | ||
|     ), | ||
|     ("xyxyx", ["xyx", "xy"], [[0, 1], [0, 2], [2, 3], [2, 4]]), | ||
|     ("howareyou", ["how", "are", "you"], [[0, 2], [3, 5], [6, 8]]), | ||
|     ("weather", ["weather"], [[0, 6]]), | ||
|     ( | ||
|         "aquickbrownfoxjumpsoverthelazydog", | ||
|         ["quick", "fox", "dog"], | ||
|         [[1, 5], [11, 13], [30, 32]], | ||
|     ), | ||
| ] | ||
|
|
||
|
|
||
| class IndexPairsOfAStringTestCase(unittest.TestCase): | ||
|     @parameterized.expand(INDEX_PAIRS_OF_A_STRING) | ||
|     def test_index_pairs_of_a_string( | ||
|         self, text: str, words: List[str], expected: List[List[int]] | ||
|     ): | ||
|         actual = index_pairs(text, words) | ||
|         self.assertEqual(expected, actual) | ||
|
|
||
|
|
||
| if __name__ == "__main__": | ||
|     unittest.main() |
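One way to cross-check expected values like the ones in these cases is a brute-force scan that does not use a trie at all. The sketch below is an assumption for illustration and is not part of this pull request; `index_pairs_bruteforce` is a hypothetical name. It relies on `str.startswith` with a start offset and sorts the result explicitly.

```python
from typing import List


def index_pairs_bruteforce(text: str, words: List[str]) -> List[List[int]]:
    """Reference implementation: test every word at every start index."""
    pairs = []
    for i in range(len(text)):
        for word in words:
            # startswith(word, i) checks whether word occurs at offset i
            if text.startswith(word, i):
                pairs.append([i, i + len(word) - 1])
    # Lists compare element by element, so this orders by i, then by j
    pairs.sort()
    return pairs


print(index_pairs_bruteforce("xyxyx", ["xyx", "xy"]))
print(index_pairs_bruteforce("howareyou", ["how", "are", "you"]))
```

This version costs roughly O(l∗n∗m) comparisons plus the sort, so it is only meant as an oracle for small inputs, not as a replacement for the trie-based solution.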