Hey! I am still getting up to speed on DD, so I apologize if this question is already answered somewhere, but I was curious if anyone has done any experiments adding provenance/explanation on top DD systems. I did see the work from about ten years ago here: https://github.com/frankmcsherry/explanation but was curious if anyone has extended that or are using that approach in a production system.
I work work on Egglog, a framework that can be used to build optimzers/compilers based on equality saturation and datalog. I am currently working with @oflatt to investigate ergonomic, performant, and modular ways to add proofs to a system like this. They are meant both for users to be able to understand why a certain equality holds in the end to help with debugging and also for verification of a result.
We are experimenting with porting egglog to different computational backends through this work, such as DD, duckdb, and souffle. One of the open questions is how to design a proofs system for a framework like this that can added in a modular manner, without baking it into the backend directly. We are trying out different implementations, including a source to source translation as well as more tightly integrated options to try to understand the performance, modularity, and maintainability tradeoffs between them, similar to work done for duckdb.
So as part of learning about the potential for a differential dataflow backend to egglog, I wanted to see if there are any best practices that have emerged around tracking provenance in a performant and modular manner. Appreciate any thoughts, thank you!
Hey! I am still getting up to speed on DD, so I apologize if this question is already answered somewhere, but I was curious if anyone has done any experiments adding provenance/explanation on top DD systems. I did see the work from about ten years ago here: https://github.com/frankmcsherry/explanation but was curious if anyone has extended that or are using that approach in a production system.
I work work on Egglog, a framework that can be used to build optimzers/compilers based on equality saturation and datalog. I am currently working with @oflatt to investigate ergonomic, performant, and modular ways to add proofs to a system like this. They are meant both for users to be able to understand why a certain equality holds in the end to help with debugging and also for verification of a result.
We are experimenting with porting egglog to different computational backends through this work, such as DD, duckdb, and souffle. One of the open questions is how to design a proofs system for a framework like this that can added in a modular manner, without baking it into the backend directly. We are trying out different implementations, including a source to source translation as well as more tightly integrated options to try to understand the performance, modularity, and maintainability tradeoffs between them, similar to work done for duckdb.
So as part of learning about the potential for a differential dataflow backend to egglog, I wanted to see if there are any best practices that have emerged around tracking provenance in a performant and modular manner. Appreciate any thoughts, thank you!