Skip to content

Provenance/Explanation #742

@saulshanabrook

Description

@saulshanabrook

Hey! I am still getting up to speed on DD, so I apologize if this question is already answered somewhere, but I was curious if anyone has done any experiments adding provenance/explanation on top DD systems. I did see the work from about ten years ago here: https://github.com/frankmcsherry/explanation but was curious if anyone has extended that or are using that approach in a production system.

I work work on Egglog, a framework that can be used to build optimzers/compilers based on equality saturation and datalog. I am currently working with @oflatt to investigate ergonomic, performant, and modular ways to add proofs to a system like this. They are meant both for users to be able to understand why a certain equality holds in the end to help with debugging and also for verification of a result.

We are experimenting with porting egglog to different computational backends through this work, such as DD, duckdb, and souffle. One of the open questions is how to design a proofs system for a framework like this that can added in a modular manner, without baking it into the backend directly. We are trying out different implementations, including a source to source translation as well as more tightly integrated options to try to understand the performance, modularity, and maintainability tradeoffs between them, similar to work done for duckdb.

So as part of learning about the potential for a differential dataflow backend to egglog, I wanted to see if there are any best practices that have emerged around tracking provenance in a performant and modular manner. Appreciate any thoughts, thank you!

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions