Skip to content

Conversation

@MariusWirtz
Copy link
Collaborator

Switch verify_edges default to False

Fixes #1300

@onefloid
Copy link
Collaborator

onefloid commented Nov 6, 2025

I hate to ask it, but should we deprecate this first?

This isn’t a real breaking change, but rather a behavioral change with lower risk.

A gradual change could make sense here because the check only affects early error detection, not core functionality. Keeping it temporarily enabled helps avoid surprising users while still allowing the dependency to become optional later. Depending on how heavy the workflows are, it might still be beneficial to fail early to save resources. Also, if TM1 declines building the hierarchy, it should ideally raise the same ValueError; otherwise, the risk of breaking existing code increases due to differences in error handling in the calling code.

@vmitsenko
Copy link
Collaborator

Hi both, just a suggestion — it might be worth replacing the networkx library with a depth-first search implementation to maintain backward compatibility and simplify dependencies.

@onefloid
Copy link
Collaborator

@vmitsenko I like this approach. In theory, it could be faster and use less memory than networkx. Have you tested it on a large dataframe?

@vmitsenko
Copy link
Collaborator

@onefloid Current code is definitely not the most efficient implementation of the depth-first search - it’s simply the easiest and most straightforward to write. However, it appears to run faster and use less memory than when using networkx.

I ran tests using the following dataframe, and these are the results:

def generate_df(n_rows=100_000, n_parents=10):
    elements = np.arange(1, n_rows + 1)

    data = {"Element": elements}

    for p in range(1, n_parents + 1):
        parents = elements - p
        parents[parents < 1] = 0
        data[f"Parent_{p}"] = parents

    return pd.DataFrame(data)

depth-first search
Execution time: 7.2776 seconds
Peak memory usage: 65.047 MB

networkx
Execution time: 14.5668 seconds
Peak memory usage: 185.592 MB

@MariusWirtz
Copy link
Collaborator Author

@onefloid Current code is definitely not the most efficient implementation of the depth-first search - it’s simply the easiest and most straightforward to write. However, it appears to run faster and use less memory than when using networkx.

I ran tests using the following dataframe, and these are the results:

def generate_df(n_rows=100_000, n_parents=10):
    elements = np.arange(1, n_rows + 1)

    data = {"Element": elements}

    for p in range(1, n_parents + 1):
        parents = elements - p
        parents[parents < 1] = 0
        data[f"Parent_{p}"] = parents

    return pd.DataFrame(data)

depth-first search Execution time: 7.2776 seconds Peak memory usage: 65.047 MB

networkx Execution time: 14.5668 seconds Peak memory usage: 185.592 MB

Amazing! Thank you @vmitsenko

@MariusWirtz MariusWirtz force-pushed the 1300-make-networkx-optional-dependency branch from 63c5174 to c78bbe9 Compare November 24, 2025 21:02
@MariusWirtz MariusWirtz merged commit 4b09446 into master Nov 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Make networkx an optional dependency

4 participants