I'm encountering an error when trying to use multiple Small Speculative Models (SSMs) with FlexFlow-serve. The code fails with the following assertion error:
```
python3: /usr/flexflow-serve/src/runtime/request_manager.cc:3627: std::vector<std::pair<int, int> > FlexFlow::RequestManager::merge_dfs_trees(std::vector<std::vector<std::pair<int, int> > >, int, FlexFlow::RequestGuid): Assertion `input_trees.size() == 1 && "currently using one ssm"' failed.
```
The failure comes from `RequestManager::merge_dfs_trees` in `src/runtime/request_manager.cc` (line 3627), which asserts that exactly one SSM tree is passed in:
```cpp
std::vector<std::pair<BatchConfig::TokenId, int>> RequestManager::merge_dfs_trees(
    std::vector<std::vector<std::pair<BatchConfig::TokenId, int>>> input_trees,
    int root_abs_depth,
    RequestGuid guid) {
  assert(input_trees.size() == 1 && "currently using one ssm");
  // ...
}
```
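For context on what lifting this restriction would involve, here is a minimal sketch of merging several speculated token trees into one. It assumes (this is not taken from the FlexFlow source) that each input tree is a DFS preorder list of `(token, absolute depth)` pairs whose first entry is the shared root at `root_abs_depth`. Under that assumption, merging amounts to inserting every tree into a single trie, so shared prefixes collapse, and then re-serializing the trie in DFS order. `merge_dfs_trees_sketch`, `DfsTree`, and `Node` are hypothetical names, not FlexFlow APIs, and the sketch ignores anything else the real function may need to track.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical types: (token id, absolute depth) pairs in DFS preorder.
using TokenId = int;
using DfsTree = std::vector<std::pair<TokenId, int>>;

struct Node {
  TokenId token = 0;
  std::vector<std::unique_ptr<Node>> children;  // kept in first-seen order
};

// Return the child of `parent` holding `token`, creating it if absent,
// so identical prefixes from different trees share the same nodes.
Node *get_or_add_child(Node *parent, TokenId token) {
  for (auto &child : parent->children) {
    if (child->token == token) return child.get();
  }
  auto child = std::make_unique<Node>();
  child->token = token;
  parent->children.push_back(std::move(child));
  return parent->children.back().get();
}

// Serialize the trie back into DFS preorder with absolute depths.
void emit_dfs(const Node *node, int depth, DfsTree &out) {
  out.emplace_back(node->token, depth);
  for (const auto &child : node->children) emit_dfs(child.get(), depth + 1, out);
}

// Hypothetical multi-tree merge; assumes all trees share the same root token.
DfsTree merge_dfs_trees_sketch(const std::vector<DfsTree> &input_trees,
                               int root_abs_depth) {
  assert(!input_trees.empty() && !input_trees[0].empty());
  Node root;
  root.token = input_trees[0][0].first;
  for (const DfsTree &tree : input_trees) {
    assert(!tree.empty() && tree[0].first == root.token &&
           tree[0].second == root_abs_depth);
    // path[i] is the current node at depth root_abs_depth + i.
    std::vector<Node *> path{&root};
    for (std::size_t i = 1; i < tree.size(); ++i) {
      int rel = tree[i].second - root_abs_depth;
      assert(rel >= 1 && rel <= static_cast<int>(path.size()));
      path.resize(rel);  // pop back to the parent at depth rel - 1
      path.push_back(get_or_add_child(path.back(), tree[i].first));
    }
  }
  DfsTree merged;
  emit_dfs(&root, root_abs_depth, merged);
  return merged;
}
```

For example, merging `{(10,5), (11,6), (12,7), (13,6)}` with `{(10,5), (11,6), (14,7)}` should yield `{(10,5), (11,6), (12,7), (14,7), (13,6)}`: the shared prefix 10, 11 appears once, and sibling order follows first appearance.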
Is support for multiple SSMs not yet implemented in FlexFlow-serve, or is there a workaround for using multiple SSMs with the current implementation?
Thank you for your help.