Commit 5d2802f

add tpc-ds tests and property-based testing utilities
This change introduces a new `property_based.rs` test utility that lets us evaluate correctness using properties. Properties are useful when we do not know the expected output of a test (e.g. if we were to fuzz the database with randomized data or randomized queries, we could only verify the output using properties). The two oracles are:
- ResultSetOracle: compares the result set between single-node and distributed datafusion
- OrderingOracle: uses plan properties to determine the expected ordering and asserts it

This change does not introduce a fuzz test, but it does introduce a TPC-DS test. The test randomly generates data using the duckdb CLI and runs 99 queries on a distributed cluster. The query outputs are validated against single-node datafusion using test utils in `metamorphic.rs`. The test also randomizes the test cluster parameters; there's no harm in doing so.

Next steps:
- Add fuzzing
  - Now that we have property-based testing utils, we can properly fuzz the project using SQLancer
  - SQLancer produces INSERT and SELECT statements, which we could point at a datafusion distributed cluster and verify against single-node datafusion
  - Although it doesn't support nested select statements, 70% of the queries were valid datafusion queries, meaning these are good test cases for us
- Add a metrics oracle to validate the output_rows metric / metrics propagation
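
For readers unfamiliar with property-based oracles, the two checks described above could be sketched roughly as follows. This is an illustrative sketch only, not the code in `property_based.rs`: the `Row` alias and the function names `result_sets_match` and `is_sorted_by_columns` are hypothetical, rows are modeled as plain integer vectors rather than Arrow record batches, and the ordering check assumes ascending order on the given columns.

```rust
use std::collections::HashMap;

/// Hypothetical row model for this sketch: one integer per column.
type Row = Vec<i64>;

/// Result-set property: both executions return the same multiset of rows.
/// Row order is ignored, but duplicate rows must appear the same number of times.
fn result_sets_match(single_node: &[Row], distributed: &[Row]) -> bool {
    // Count how many times each distinct row occurs.
    fn counts(rows: &[Row]) -> HashMap<&Row, usize> {
        let mut m = HashMap::new();
        for r in rows {
            *m.entry(r).or_insert(0) += 1;
        }
        m
    }
    counts(single_node) == counts(distributed)
}

/// Ordering property: rows are non-decreasing on the given column indices,
/// compared lexicographically (ascending only in this sketch).
fn is_sorted_by_columns(rows: &[Row], sort_columns: &[usize]) -> bool {
    // Project a row onto its sort key.
    let key = |row: &Row| -> Vec<i64> { sort_columns.iter().map(|&i| row[i]).collect() };
    // Every adjacent pair must be in order.
    rows.windows(2).all(|pair| key(&pair[0]) <= key(&pair[1]))
}
```

Neither check needs a hand-written expected output: the single-node result serves as the reference for the first, and the sort columns (which the commit derives from plan properties) serve as the reference for the second.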
1 parent 83efd9e commit 5d2802f

109 files changed, +7045 -530 lines changed

.gitignore

Lines changed: 2 additions & 1 deletion
@@ -1,4 +1,5 @@
 /.idea
 /target
 /benchmarks/data/
-testdata/tpch/data/
+testdata/tpch/data/
+testdata/tpcds/data/
