Conversation
76ccf41 to
0371b3c
Compare
6d30b34 to
654d07b
Compare
|
Updated to rework OCI precomposition. The way I did it before resulted in two code paths which were essentially the same but couldn't be exactly the same. Now there is only one code path and it is exactly the same as itself. I am getting increasingly fearful that I am getting lost in a labyrinth of my own changes and although it all seems to work for me I would really value somebody else looking at it and kicking the tyres because what if I am not testing it right. I was sure that this last rework had broken something, and yet seemingly not, which makes me worry whether I am testing what I think I'm testing. |
Signed-off-by: itowlson <ivan.towlson@fermyon.com>
867c0be to
c08e5d1
Compare
|
One thing that is troubling me with all this is isolation behaviour. Like if you want a middleware to call an auth provider, then it needs network permissions. So:
We can probably work around 1 in the increasingly terrifying normalisation step but 2 is a gazillion times more complicated. The only way around that today is to use a service chaining approach where it passes through the host rather than being directly composed. (Which is do-able, but a non-trivial rewrite, and we'd need to check that this would work with downstream hosts who don't support service chaining.) |
| if self.precompose_only { | ||
| let Some(precompose_component_id) = self.precompose_component_id.as_ref() else { | ||
| anyhow::bail!("got --precompose-only but no --precompose-component-id"); | ||
| }; |
There was a problem hiding this comment.
Was this added to aid in the research? I'm not sure what's happening here.
There was a problem hiding this comment.
This is used by the OCI subsystem during precomposition. spin registry push needs to precompose Wasm binaries for cases like SpinKube where composition at load time isn't available. But the OCI subsystem doesn't know how to deal with HTTP middleware because that's an HTTP concern, not an OCI concern. So when the OCI subsystem sees complications in a trigger, such as HTTP middleware, it issues issues a command to the relevant trigger saying "hey, you're so smart, you precompose this for me."
|
|
||
| /// An object which composes extras onto the primary component. | ||
| /// TODO: the combination of functions and objects and traits is a bit funny and we may/should be able to streamline it. | ||
| fn complicator() -> impl spin_factors_executor::Complicator { |
There was a problem hiding this comment.
I know complicator is a placeholder so I'll suggest we use something generic here like fn compose_extras() -> impl spin_factors_executor::ComposeExtras.
There was a problem hiding this comment.
Yeah I think I originally had "extras" but if anything I hated that even more than "complication." I am not sure what a good name would be!
| middleware_blobs: impl Iterator<Item = &'a ComplicationData>, | ||
| ) -> anyhow::Result<Vec<u8>> { | ||
| const MW_NEXT_INBOUND: &str = "wasi:http/handler@0.3.0-rc-2026-01-06"; | ||
| const MW_NEXT_OUTBOUND: &str = "wasi:http/handler@0.3.0-rc-2026-01-06"; |
There was a problem hiding this comment.
Can we unify these into a single const MW_HANDLER_INTERFACE: &str = "wasi:http/handler@0.3.0-rc-2026-01-06";?
There was a problem hiding this comment.
We probably can. Originally they were different, and I guess I wanted to retain clarity about which context we were in at any given point in case we ever needed to prise them apart again. But yeah that does seem unlikely.
| /// but with different middleware. In this case, we will synthesise a component | ||
| /// for each such trigger, with the same main configuration but with its own | ||
| /// "extra" components. | ||
| fn reassign_extra_components_from_triggers(mut locked: LockedApp) -> LockedApp { |
There was a problem hiding this comment.
I think we can avoid cloning the locked app if we break this up into 3 parts:
- Collect a lightweight
TriggerInfostruct containing thetrigger_id,component_idand whether "extras" are present. - Use those to determine splitting, then clone only the individual components that need it.
- Move extras from triggers to components.
Something like (untested):
/// We want all component/composition graph information to be in the component,
/// because the component ID is how Spin looks this stuff up. So if a trigger
/// contains a `components` table, e.g. specifying middleware, we want to move
/// that to the component.
///
/// But it's possible to have two triggers pointing to the same primary component,
/// but with different middleware. In this case, we will synthesise a component
/// for each such trigger, with the same main configuration but with its own
/// "extra" components.
fn reassign_extra_components_from_triggers(mut locked: LockedApp) -> LockedApp {
use std::collections::{HashMap, HashSet};
fn trigger_component_id(trigger: &LockedTrigger) -> Option<String> {
trigger
.trigger_config
.get("component")
.and_then(|v| v.as_str())
.map(|s| s.to_string())
}
fn extra_components(trigger: &LockedTrigger) -> Option<&ValuesMap> {
trigger
.trigger_config
.get("components")
.and_then(|v| v.as_object())
}
fn has_extra_components(trigger: &LockedTrigger) -> bool {
extra_components(trigger).is_some_and(|xcs| !xcs.is_empty())
}
// Collect lightweight trigger metadata to determine which components
// need splitting. This avoids cloning the entire LockedApp.
struct TriggerInfo {
trigger_id: String,
component_id: String,
has_extras: bool,
}
let trigger_infos: Vec<TriggerInfo> = locked
.triggers
.iter()
.filter_map(|t| {
trigger_component_id(t).map(|cid| TriggerInfo {
trigger_id: t.id.clone(),
component_id: cid,
has_extras: has_extra_components(t),
})
})
.collect();
// Group triggers by component ID, find components that need splitting
// (multiple triggers reference the same component AND at least one has extras).
let mut cid_triggers: HashMap<&str, Vec<&TriggerInfo>> = HashMap::new();
for info in &trigger_infos {
cid_triggers
.entry(&info.component_id)
.or_default()
.push(info);
}
let mut seen = HashSet::new();
let mut disambiguator = 0;
// For components referenced by multiple triggers where at least
// one has extras, create synthetic component clones so each trigger gets
// its own composition graph.
let needs_splitting: Vec<_> = cid_triggers
.into_iter()
.filter(|(_, triggers)| {
triggers.len() > 1 && triggers.iter().any(|t| t.has_extras)
})
.collect();
for (cid, triggers) in &needs_splitting {
for info in triggers {
if !info.has_extras {
// Unenriched triggers can continue pointing to the original component.
continue;
}
let mut synthetic_id = format!("{cid}-for-{}", info.trigger_id);
if seen.contains(&synthetic_id) {
disambiguator += 1;
synthetic_id = format!("{synthetic_id}-d{disambiguator}");
}
seen.insert(synthetic_id.clone());
let component = locked
.components
.iter()
.find(|c| c.id == **cid)
.expect("trigger references non-existent component")
.clone();
let mut synthetic = component;
synthetic.id = synthetic_id.clone();
locked.components.push(synthetic);
// Update the trigger to point to the new synthetic component
let trigger = locked
.triggers
.iter_mut()
.find(|t| t.id == info.trigger_id)
.expect("trigger disappeared during splitting");
trigger
.trigger_config
.as_object_mut()
.expect("trigger config should be an object")
.insert("component".into(), synthetic_id.into());
}
}
// Move extras from triggers onto their respective components.
// Now each enriched trigger has its own component, so composition graphs
// are uniquely identified by component ID.
for trigger in &mut locked.triggers {
if let Some(extras) = extra_components(trigger) {
if let Some(component_id) = trigger_component_id(trigger) {
if let Some(component) = locked.components.iter_mut().find(|c| c.id == component_id)
{
component
.metadata
.insert("trigger-extras".into(), extras.clone().into());
component.metadata.insert(
"resolve-extras-using".into(),
trigger.trigger_type.clone().into(),
);
trigger
.trigger_config
.as_object_mut()
.expect("trigger config should be an object")
.remove("components");
}
}
}
}
locked
}
crates/loader/src/local.rs
Outdated
| .await | ||
| .unwrap(); | ||
| assert_eq!(4, locked_app.triggers.len()); | ||
| // Splitting should resultm in triggers pointing to different IDs, but the same primary source |
There was a problem hiding this comment.
| // Splitting should resultm in triggers pointing to different IDs, but the same primary source | |
| // Splitting should result in triggers pointing to different IDs, but the same primary source |
crates/trigger/src/loader.rs
Outdated
| for component_ref in components { | ||
| let component_ref = component_ref | ||
| .as_str() | ||
| .context("middleware should be strings curently")?; |
There was a problem hiding this comment.
| .context("middleware should be strings curently")?; | |
| .context("middleware should be strings currently")?; |
There was a problem hiding this comment.
Also i might say: "middleware references should be strings currently"
crates/environments/src/loader.rs
Outdated
| async fn load_source(&self, source: &Self::Source) -> anyhow::Result<Vec<u8>> { | ||
| let path = self | ||
| .wasm_loader | ||
| .load_component_source("bippety-boppety", source) |
There was a problem hiding this comment.
| .load_component_source("bippety-boppety", source) | |
| .load_component_source("<extra-component>", source) |
crates/loader/src/local.rs
Outdated
| assert!(t.trigger_config.get("components").is_none()); | ||
| } | ||
|
|
||
| println!("{}", serde_json::to_string_pretty(&locked_app).unwrap()); |
There was a problem hiding this comment.
TODO: delete this println!
| let value = toml::Value::deserialize(deserializer)?; | ||
| if let Ok(val) = T::deserialize(value.clone()) { | ||
| Ok(vec![val]) | ||
| if let Some(arr) = value.as_array() { |
There was a problem hiding this comment.
Maybe delete the old path commented out below and add a note here:
// NOTE: We explicitly check for array first rather than trying T::deserialize
// first, because toml's serde impl will treat an array as a sequence of fields
// to be assigned to struct members (e.g. Component), producing nonsensical results.
There was a problem hiding this comment.
nonsensical
would you compromise on "the valuable guano of our much-admired chiropterans"
There was a problem hiding this comment.
Prefer it, actually.😂
crates/oci/src/client.rs
Outdated
|
|
||
| let temp_dir = tempfile::tempdir().unwrap(); | ||
| let working_dir = temp_dir.path(); | ||
| let locked_url = write_locked_app(&locked, working_dir).await.unwrap(); |
There was a problem hiding this comment.
Are these .unwrap()s safe? Or should we .context(...)? here?
Capability isolation remains a thorn for composition. We either need to accept that middlewares inherit capability from the primary component (or alternatively deny-all by default via some adapter like we do for dependencies with an opt-in) or what you suggest with service-chaining (which I prefer less for the reason you postulate). |
| } | ||
| } | ||
|
|
||
| async fn compose_middlewares<'a>( |
There was a problem hiding this comment.
I didn't want to just push a commit but i would simplify this to use wac-graph (addressing the TODO). You can delete the wasm-compose, wasmparser and tempfile dependencies using this.
Note: I added chain as a general purpose thing thinking it would be useful to upstream but for now it can live here.
/// Chain a list of component packages into a middleware pipeline.
///
/// `packages` is ordered from **outermost** (first to receive a request) to
/// **innermost** (the final handler). Every component except the last must
/// import a name equal to `import_name` and every component must export a name
/// equal to `export_name`. In the common middleware pattern these are the same
/// (e.g. both `"handle"`), but they can differ if the WIT uses separate names.
///
/// Returns the [`NodeId`] of the alias for the outermost component's export,
/// ready to be passed to [`CompositionGraph::export`].
///
/// # Errors
///
/// Returns an error if fewer than two packages are provided, or if any
/// alias / argument wiring step fails.
fn chain(
graph: &mut CompositionGraph,
packages: &[PackageId],
import_name: &str,
export_name: &str,
) -> anyhow::Result<NodeId> {
if packages.len() < 2 {
bail!("chain requires at least 2 packages, got {}", packages.len());
}
// Start from the innermost component (last in the list) and work outward.
// The innermost component is instantiated first with no wiring — its
// unsatisfied imports (if any) will become implicit imports of the
// composed component.
let mut iter = packages.iter().rev();
let innermost = *iter.next().unwrap();
let mut instance = graph.instantiate(innermost);
let mut upstream_handle = graph.alias_instance_export(instance, export_name)?;
// For each remaining component (moving outward), instantiate it and
// wire the previous component's export into its import.
for &pkg in iter {
instance = graph.instantiate(pkg);
graph.set_instantiation_argument(instance, import_name, upstream_handle)?;
upstream_handle = graph.alias_instance_export(instance, export_name)?;
}
Ok(upstream_handle)
}
async fn compose_middlewares<'a>(
primary: Vec<u8>,
middleware_blobs: impl Iterator<Item = &'a ComplicationData>,
) -> anyhow::Result<Vec<u8>> {
const MW_HANDLER_INTERFACE: &str = "wasi:http/handler@0.3.0-rc-2026-01-06";
let mut graph = CompositionGraph::new();
let mut package_ids: Vec<PackageId> = Vec::new();
// Register middleware packages (outermost → innermost order).
for (index, blob) in middleware_blobs.enumerate() {
let bytes: Vec<u8> = match blob {
ComplicationData::InMemory(data) => data.clone(),
ComplicationData::OnDisk(path) => tokio::fs::read(path)
.await
.with_context(|| format!("reading middleware from {}", path.display()))?,
};
let name = format!("middleware{index}");
let package = Package::from_bytes(&name, None, bytes, graph.types_mut())
.context("parsing middleware component")?;
package_ids.push(graph.register_package(package)?);
}
// Register the primary component (innermost in the chain).
let package = Package::from_bytes("primary", None, primary, graph.types_mut())
.context("parsing primary component")?;
package_ids.push(graph.register_package(package)?);
// Wire the pipeline: outermost middleware → … → primary.
let outermost_export = chain(
&mut graph,
&package_ids,
MW_HANDLER_INTERFACE,
MW_HANDLER_INTERFACE,
)?;
// Export the outermost handler as the composed component's export.
graph.export(outermost_export, MW_HANDLER_INTERFACE)?;
Ok(graph.encode(Default::default())?)
}There was a problem hiding this comment.
Thank you @fibonacci1729 - this is so much clearer and well-structured than what I had - I really appreciate you taking the time to figure it out and share it!
Signed-off-by: itowlson <ivan.towlson@fermyon.com>
Fixes #3291.
This currently includes #3383; once that lands I will rebase. For now just ignore theDone!witdirectory.Example usage: https://github.com/itowlson/spin-middleware-terrifying-nonsense
Setting it up in spin.toml:
Implementing a middleware component in Rust (today: we'd presumably provide a wrapper for the onbound
handleimport):Needs lots of renaming, tests, etc. but anyway getting it into the system. (and yes, yes, I'll squash the commits)