1 change: 1 addition & 0 deletions README/ReleaseNotes/v640/index.md
@@ -334,6 +334,7 @@ Given the risk of silently incorrect physics results, and the absence of known w

 - The change of default compression settings used by Snapshot for the TTree output data format introduced in 6.38 (was 101 before 6.38, became 505 in 6.38) is reverted. That choice was based on evidence available up to that point that indicated that ZSTD was outperforming ZLIB in all cases for the available datasets. New evidence demonstrated that this is not always the case, notably for TTree branches made of collections where many (up to all) of them are empty. The investigation is described at https://github.com/vepadulano/ttree-lossless-compression-studies. The new default compression settings for Snapshot are respectively `kUndefined` for the compression algorithm and `0` for the compression level. When Snapshot detects `kUndefined` used in the options, it changes the compression settings to the new defaults of 101 (for TTree) and 505 (for RNTuple).
 - Signatures of the HistoND and HistoNSparseD operations have been changed. Previously, the list of input column names was allowed to contain an extra column for event weights. This was done to align the logic with the THnBase::Fill method. But this signature was inconsistent with all other Histo* operations, which have a separate function argument that represents the column to get the weights from. Thus, HistoND and HistoNSparseD both now have a separate function argument for the weights. The previous signature is still supported, but deprecated: a warning will be raised if the user passes the column name of the weights as an extra element of the list of input column names. In a future version of ROOT this functionality will be removed. From now on, creating a (sparse) N-dim histogram with weights should be done by calling `HistoN[Sparse]D(histoModel, inputColumns, weightColumn)`.
+- The string expressions passed to `Vary` calls can now be shortened. If the string begins with '{' and ends with '}' (ignoring leading and trailing spaces, tabs and newlines), RDataFrame automatically injects the return type into the generated lambda expression before declaring it to the interpreter. This allows writing, for example, `{{px * 0.9, px * 1.1}, {py * 0.9, py * 1.1}}` instead of `ROOT::RVec<ROOT::RVec<float>>{{px * 0.9, px * 1.1}, {py * 0.9, py * 1.1}}`.

 ## Histograms

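Note on the numeric settings quoted in the first release note of this hunk: the sketch below assumes ROOT's usual encoding of a combined compression setting as 100 * algorithm + level, so that 101 reads as algorithm 1 (ZLIB) at level 1 and 505 as algorithm 5 (ZSTD) at level 5. `OutputFormat`, `DecodeSetting` and `ResolveSnapshotSetting` are hypothetical names chosen for illustration, not ROOT API, and the fallback logic only mirrors what the release note describes.

```cpp
#include <cassert>

// Illustrative stand-in, not a ROOT type.
enum class OutputFormat { TTree, RNTuple };

struct CompressionChoice {
   int algorithm; // assumed convention: 1 = ZLIB, 5 = ZSTD
   int level;
};

// Split a combined setting such as 505 into algorithm (5) and level (5),
// assuming the encoding 100 * algorithm + level.
inline CompressionChoice DecodeSetting(int setting)
{
   assert(setting >= 0);
   return {setting / 100, setting % 100};
}

// Hypothetical helper mirroring the release note: if the user left the
// algorithm undefined, fall back to the per-format default; otherwise keep
// the setting the user asked for.
inline int ResolveSnapshotSetting(bool algorithmUndefined, int userSetting, OutputFormat format)
{
   if (!algorithmUndefined)
      return userSetting;
   return format == OutputFormat::TTree ? 101 : 505;
}
```

Under these assumptions, an undefined algorithm resolves to ZLIB level 1 for TTree output and to ZSTD level 5 for RNTuple output, while an explicit user choice is passed through unchanged.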
3 changes: 2 additions & 1 deletion tree/dataframe/inc/ROOT/RDF/InterfaceUtils.hxx
@@ -433,7 +433,8 @@ std::shared_ptr<RJittedDefine> BookDefinePerSampleJit(std::string_view name, std
 std::shared_ptr<RJittedVariation>
 BookVariationJit(const std::vector<std::string> &colNames, std::string_view variationName,
                  const std::vector<std::string> &variationTags, std::string_view expression, RLoopManager &lm,
-                 RDataSource *ds, const RColumnRegister &colRegister, bool isSingleColumn);
+                 RDataSource *ds, const RColumnRegister &colRegister, bool isSingleColumn,
+                 const std::string &varyColType);

 std::string JitBuildAction(const ColumnNames_t &bl, const std::type_info &art, const std::type_info &at, TTree *tree,
                            const unsigned int nSlots, const RColumnRegister &colRegister, RDataSource *ds,
78 changes: 77 additions & 1 deletion tree/dataframe/inc/ROOT/RDF/RInterface.hxx
@@ -1072,6 +1072,18 @@ public:
    /// hx["pt:up"].Draw("SAME");
    /// ~~~
    ///
+   /// ## Short-hand expression syntax
+   ///
+   /// For convenience, when a C++ expression is passed to Vary, the return type can be omitted if the string begins
+   /// with '{' and ends with '}' (leading and trailing spaces, tabs and newlines are ignored). This means that
+   /// the following is equivalent to the example above:
+   ///
+   /// ~~~{.cpp}
+   /// auto nominal_hx =
+   ///    df.Vary("pt", "{pt*0.9, pt*1.1}", {"down", "up"})
+   ///    // The rest is the same as above
+   /// ~~~
+   ///
    /// \note See also This Vary() overload for more information.
    RInterface<Proxied> Vary(std::string_view colName, std::string_view expression,
                             const std::vector<std::string> &variationTags, std::string_view variationName = "")
@@ -1105,6 +1117,18 @@ public:
    /// hx["pt:1"].Draw("SAME");
    /// ~~~
    ///
+   /// ## Short-hand expression syntax
+   ///
+   /// For convenience, when a C++ expression is passed to Vary, the return type can be omitted if the string begins
+   /// with '{' and ends with '}' (leading and trailing spaces, tabs and newlines are ignored). This means that
+   /// the following is equivalent to the example above:
+   ///
+   /// ~~~{.cpp}
+   /// auto nominal_hx =
+   ///    df.Vary("pt", "{pt*0.9, pt*1.1}", 2)
+   ///    // The rest is the same as above
+   /// ~~~
+   ///
    /// \note See also This Vary() overload for more information.
    RInterface<Proxied> Vary(std::string_view colName, std::string_view expression, std::size_t nVariations,
                             std::string_view variationName = "")
@@ -1142,6 +1166,31 @@ public:
    /// hx["xy:1"].Draw("SAME");
    /// ~~~
    ///
+   /// ## Short-hand expression syntax
+   ///
+   /// For convenience, when a C++ expression is passed to Vary, the return type can be omitted if the string begins
+   /// with '{' and ends with '}' (leading and trailing spaces, tabs and newlines are ignored). This means that
+   /// the following is equivalent to the example above:
+   ///
+   /// ~~~{.cpp}
+   /// auto nominal_hx =
+   ///    df.Vary({"x", "y"}, "{{x*0.9, x*1.1}, {y*0.9, y*1.1}}", 2, "xy")
+   ///    // The rest is the same as above
+   /// ~~~
+   ///
+   /// or, spread over several lines with comments:
+   ///
+   /// ~~~{.cpp}
+   /// auto nominal_hx =
+   ///    df.Vary({"x", "y"}, R"(
+   ///       {
+   ///          {x*0.9, x*1.1}, // x variations
+   ///          {y*0.9, y*1.1}  // y variations
+   ///       }
+   ///    )", 2, "xy")
+   ///    // The rest is the same as above
+   /// ~~~
+   ///
    /// \note See also This Vary() overload for more information.
    RInterface<Proxied> Vary(const std::vector<std::string> &colNames, std::string_view expression,
                             std::size_t nVariations, std::string_view variationName)
@@ -1194,6 +1243,31 @@ public:
    /// hx["xy:up"].Draw("SAME");
    /// ~~~
    ///
+   /// ## Short-hand expression syntax
+   ///
+   /// For convenience, when a C++ expression is passed to Vary, the return type can be omitted if the string begins
+   /// with '{' and ends with '}' (leading and trailing spaces, tabs and newlines are ignored). This means that
+   /// the following is equivalent to the example above:
+   ///
+   /// ~~~{.cpp}
+   /// auto nominal_hx =
+   ///    df.Vary({"x", "y"}, "{{x*0.9, x*1.1}, {y*0.9, y*1.1}}", {"down", "up"}, "xy")
+   ///    // The rest is the same as above
+   /// ~~~
+   ///
+   /// or, spread over several lines with comments:
+   ///
+   /// ~~~{.cpp}
+   /// auto nominal_hx =
+   ///    df.Vary({"x", "y"}, R"(
+   ///       {
+   ///          {x*0.9, x*1.1}, // x variations
+   ///          {y*0.9, y*1.1}  // y variations
+   ///       }
+   ///    )", {"down", "up"}, "xy")
+   ///    // The rest is the same as above
+   /// ~~~
+   ///
    /// \note See also This Vary() overload for more information.
    RInterface<Proxied> Vary(const std::vector<std::string> &colNames, std::string_view expression,
                             const std::vector<std::string> &variationTags, std::string_view variationName)
@@ -3798,9 +3872,11 @@ private:
          throw std::logic_error("A column name was passed to the same Vary invocation multiple times.");
       }

+      // Columns varied together are assumed to share one type: use the type of the first
+      auto varyColType = GetColumnType(colNames[0]);
       auto jittedVariation =
          RDFInternal::BookVariationJit(colNames, variationName, variationTags, expression, *fLoopManager,
-                                       GetDataSource(), fColRegister, isSingleColumn);
+                                       GetDataSource(), fColRegister, isSingleColumn, varyColType);

       RDFInternal::RColumnRegister newColRegister(fColRegister);
       newColRegister.AddVariation(std::move(jittedVariation));
57 changes: 45 additions & 12 deletions tree/dataframe/src/RDFInterfaceUtils.cxx
@@ -219,8 +219,8 @@ std::unordered_map<std::string, std::string> &GetJittedExprs() {
    return jittedExpressions;
 }

-std::string
-BuildFunctionString(const std::string &expr, const ColumnNames_t &vars, const ColumnNames_t &varTypes)
+std::string BuildFunctionString(const std::string &expr, const ColumnNames_t &vars, const ColumnNames_t &varTypes,
+                                bool isSingleColumn = false, const std::string &varyColType = "")
 {
    assert(vars.size() == varTypes.size());

@@ -278,22 +278,53 @@ BuildFunctionString(const std::string &expr, const ColumnNames_t &vars, const Co
    if (!vars.empty())
       ss.seekp(-2, ss.cur);

-   if (hasReturnStmt)
-      ss << "){";
+   // When building the function expression for a Vary call, we try to help the
+   // user by removing the need to explicitly write the vector return type.
+   // For now, Vary works by returning a (nested) RVec, depending on how many
+   // variables need to vary in lockstep.
+   auto finalizeExprForVary = [&]() {
+      std::string trailRetType{};
+      // Trim formatting characters at the extremes of the user expression
+      auto first_not_space = expr.find_first_not_of(" \n\t");
+      auto last_not_space = expr.find_last_not_of(" \n\t");
+      if (first_not_space != std::string::npos && last_not_space != std::string::npos && expr[first_not_space] == '{' &&
+          expr[last_not_space] == '}') {
+         // User expression is of type '{...}', a potential constructor for an
+         // RVec, but the user has not spelled out the RVec return type.
+         // Add a trailing return type for the convenience of the user.
+         // The innermost value type is by default the type of the first given column.
+         trailRetType = " -> ";
+         if (isSingleColumn)
+            trailRetType += "ROOT::RVec<" + varyColType + ">";
+         else
+            trailRetType += "ROOT::RVec<ROOT::RVec<" + varyColType + ">>";
+         trailRetType += ' ';
+      }
+      std::string trailRetToken{trailRetType.empty() ? ") {" : ')' + trailRetType + '{'};
+      if (!hasReturnStmt)
+         trailRetToken += " return ";
+      return trailRetToken;
+   };
+
+   if (!varyColType.empty())
+      ss << finalizeExprForVary();
    else
-      ss << "){return ";
-   ss << expr << "\n;}";
+      ss << (hasReturnStmt ? ") {" : ") { return ");
+
+   // Must inject \n to avoid cases where the user puts a comment after the expression
+   ss << expr << "\n;}\n";

    return ss.str();
 }

 /// Declare a function to the interpreter in namespace R_rdf, return the name of the jitted function.
 /// If the function is already in GetJittedExprs, return the name for the function that has already been jitted.
-std::string DeclareFunction(const std::string &expr, const ColumnNames_t &vars, const ColumnNames_t &varTypes)
+std::string DeclareFunction(const std::string &expr, const ColumnNames_t &vars, const ColumnNames_t &varTypes,
+                            bool isSingleColumn = false, const std::string &varyColType = "")
 {
    R__LOCKGUARD(gROOTMutex);

-   const auto funcCode = BuildFunctionString(expr, vars, varTypes);
+   const auto funcCode = BuildFunctionString(expr, vars, varTypes, isSingleColumn, varyColType);
    auto &exprMap = GetJittedExprs();
    const auto exprIt = exprMap.find(funcCode);
    if (exprIt != exprMap.end()) {
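Note on the hunk above: the string handling added to `BuildFunctionString` can be exercised in isolation. The following is a minimal sketch using only the standard library; `IsBraceEnclosed` and `BuildLambdaTail` are hypothetical names, and the sketch reproduces only the part of the generated lambda that follows the parameter list, under the assumption that `varyColType` is non-empty (the Vary case).

```cpp
#include <cassert>
#include <string>

// True if expr has the form "{...}" once leading and trailing spaces, tabs
// and newlines are ignored.
inline bool IsBraceEnclosed(const std::string &expr)
{
   const auto first = expr.find_first_not_of(" \n\t");
   const auto last = expr.find_last_not_of(" \n\t");
   return first != std::string::npos && expr[first] == '{' && expr[last] == '}';
}

// Hypothetical helper: builds the text that follows the parameter list of
// the generated lambda, i.e. ")", an optional trailing return type, the
// opening brace, an optional "return", the user expression and the closing.
inline std::string BuildLambdaTail(const std::string &expr, bool hasReturnStmt, bool isSingleColumn,
                                   const std::string &varyColType)
{
   assert(!varyColType.empty() && "sketch covers only the Vary case");
   std::string tail = ")";
   if (IsBraceEnclosed(expr)) {
      // One RVec level per varied value, one more when several columns vary in lockstep.
      const std::string inner = "ROOT::RVec<" + varyColType + ">";
      tail += " -> " + (isSingleColumn ? inner : "ROOT::RVec<" + inner + ">");
   }
   tail += " {";
   if (!hasReturnStmt)
      tail += " return ";
   // The newline before ";}" keeps a trailing "// comment" in the user
   // expression from commenting out the end of the lambda.
   return tail + expr + "\n;}\n";
}
```

With these helpers, the short-hand `{pt*0.9f, pt*1.1f}` on a single `float` column yields a tail starting with `) -> ROOT::RVec<float> { return `, while an expression that already names its type, such as `ROOT::RVecD{pt, pt} // twice`, is left without a trailing return type and still ends on a separate `;}` line.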
@@ -728,20 +759,22 @@ std::shared_ptr<RJittedDefine> BookDefinePerSampleJit(std::string_view name, std
 std::shared_ptr<RJittedVariation>
 BookVariationJit(const std::vector<std::string> &colNames, std::string_view variationName,
                  const std::vector<std::string> &variationTags, std::string_view expression, RLoopManager &lm,
-                 RDataSource *ds, const RColumnRegister &colRegister, bool isSingleColumn)
+                 RDataSource *ds, const RColumnRegister &colRegister, bool isSingleColumn,
+                 const std::string &varyColType)
 {
    const auto &dsColumns = ds ? ds->GetColumnNames() : ColumnNames_t{};

    const auto parsedExpr = ParseRDFExpression(expression, colRegister, dsColumns);
    const auto exprVarTypes =
       GetValidatedArgTypes(parsedExpr.fUsedCols, colRegister, nullptr, ds, "Vary", /*vector2RVec=*/true);
-   const auto funcName = DeclareFunction(parsedExpr.fExpr, parsedExpr.fVarNames, exprVarTypes);
+   const auto funcName =
+      DeclareFunction(parsedExpr.fExpr, parsedExpr.fVarNames, exprVarTypes, isSingleColumn, varyColType);
    const auto type = RetTypeOfFunc(funcName);

    if (type.rfind("ROOT::VecOps::RVec", 0) != 0) {
       throw std::runtime_error(
-         "Jitted Vary expressions must return an RVec object. The following expression returns a " + type +
-         " instead:\n" + parsedExpr.fExpr);
+         "Jitted Vary expressions must return an RVec object. The following expression return type is '" + type +
+         "' instead:\n" + parsedExpr.fExpr);
    }

    auto jittedVariation = std::make_shared<RJittedVariation>(colNames, variationName, variationTags, type, colRegister,
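Note on the return type check in the hunk above: `type.rfind("ROOT::VecOps::RVec", 0) != 0` is the pre-C++20 idiom for "does not start with". Searching backwards from position 0 can only match at position 0, so the call returns either 0 or `npos`. The sketch below isolates that idiom; `StartsWith` and `CheckVaryReturnType` are hypothetical names, and the type strings in the usage are made-up inputs rather than output captured from the interpreter.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Prefix test: rfind with pos = 0 only considers a match starting at index 0.
inline bool StartsWith(const std::string &text, const std::string &prefix)
{
   return text.rfind(prefix, 0) == 0;
}

// Hypothetical stand-in for the validation step: reject any return type that
// is not spelled as an RVec at the outermost level.
inline void CheckVaryReturnType(const std::string &type, const std::string &expr)
{
   if (!StartsWith(type, "ROOT::VecOps::RVec")) {
      throw std::runtime_error(
         "Jitted Vary expressions must return an RVec object. The following expression return type is '" + type +
         "' instead:\n" + expr);
   }
}
```

A type such as `std::vector<ROOT::VecOps::RVec<float>>` is rejected by this check even though it contains the prefix, because the match must occur at the very start of the string.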
8 changes: 8 additions & 0 deletions tree/dataframe/src/RDataFrame.cxx
@@ -1204,6 +1204,14 @@ hx["pt:down"].Draw("SAME");
 hx["pt:up"].Draw("SAME");
 ~~~

+A shorter expression syntax is allowed for convenience (see the docs of the Vary overloads for more details):
+
+~~~{.cpp}
+auto nominal_hx =
+   df.Vary("pt", "{pt*0.9f, pt*1.1f}", {"down", "up"})
+   // The rest is the same as above
+~~~
+
 A list of variation "tags" is passed as the last argument to Vary(). The tags give names to the varied values that are returned
 as elements of an RVec of the appropriate C++ type. The number of variation tags must correspond to the number of elements of
 this RVec (2 in the example above: the first element will correspond to the tag "down", the second
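Note on the tag paragraph in the hunk above: the keys used in the examples (`hx["pt:down"]`, `hx["xy:1"]`) combine a variation name and a tag. The sketch below shows one way to picture that pairing. It assumes that full names have the form `<variationName>:<tag>` and that the overloads taking `nVariations` use zero-based indices as tags; both helpers are hypothetical and not part of RDataFrame.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: tags "0", "1", ... for the overloads that take a
// number of variations instead of explicit tags (assumed zero-based).
inline std::vector<std::string> IndexTags(std::size_t nVariations)
{
   std::vector<std::string> tags;
   tags.reserve(nVariations);
   for (std::size_t i = 0; i < nVariations; ++i)
      tags.push_back(std::to_string(i));
   return tags;
}

// Hypothetical helper: builds the keys under which varied results are looked
// up. There must be exactly one tag per element of the RVec returned by the
// Vary expression.
inline std::vector<std::string> FullVariationNames(const std::string &variationName,
                                                   const std::vector<std::string> &tags, std::size_t nVariedValues)
{
   if (tags.size() != nVariedValues)
      throw std::invalid_argument("number of tags must match number of varied values");
   std::vector<std::string> names;
   names.reserve(tags.size());
   for (const auto &tag : tags)
      names.push_back(variationName + ":" + tag);
   return names;
}
```

Under these assumptions, the tags `{"down", "up"}` for the variation `pt` map to the keys `pt:down` and `pt:up`, in the same order as the elements of the returned RVec.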