ActivityPub MRF system

In this article we’ll look at something that will become an FEP in the future: a generic MRF system for ActivityPub.

What the f- is an MRF?

MRF is a term originating, at least in the fediverse context, from Pleroma. It stands for Message Rewrite Facility.

Imagine it as an ActivityPub-aware middleware. Any incoming or outgoing activity gets passed through it, and the MRF can modify or reject it.

Choosing a language

Originally, because Pleroma is written in Elixir, MRFs are simply Elixir modules you drop into a directory.

This works great because the BEAM, the VM Elixir runs on, can load modules at runtime. Simply add files to the project and they will be picked up at the next restart (or can even be hot-loaded!).

But now we have the issue of implementations that aren’t written in these kinds of languages. Languages that are compiled into a single binary.

Examples of this are GoToSocial, Lemmy, Kitsune, Mitra, WriteFreely, snac, etc.

So what could we do? Choose a scripting language to embed!

Popular candidates are.. uhh.. Lua! Or maybe Roblox’s fork “Luau”, which features a full type-system.. or maybe we can use QuickJS and just embed JavaScript?

No. That sucks.. that would force everyone to use a particular language, even if they prefer writing, for example, Rust. Maybe I want to write my plugins in Haskell or OCaml!

Wasm is the answer

Scratch the idea of choosing a language, users should be able to choose whatever language they want. Something that helps with that is a common bytecode target that languages can compile to: Wasm.

The name “Wasm” stands for “WebAssembly” which is funny. Because Wasm is neither Web nor Assembly. It’s a specification for a stack-based bytecode VM.

And almost everything compiles to Wasm. Rust, C#, Haskell, JS, Go, Java, Zig, C, even Elixir!

Sketching out an interface

I’m a Rust nerd, let’s sketch the API in terms of a Rust function signature.

We want to be able to take an activity and either modify it, return it as-is, or reject it.

enum Outcome {
    Accept(Activity),
    Reject,
}

fn mrf(activity: Activity) -> Outcome;

So, that looks like a good start.. what else do we need.. oh! Execution might fail.

enum Outcome {
    Accept(Activity),
    Reject,
}

fn mrf(activity: Activity) -> Result<Outcome>;

Detour: The “A” in “ABI” stands for “Actual Pain”

By default Wasm is very limited. You can call functions and pass integers as parameters. That’s it. Anything else is up to you.

So to even just pass a string you need to do a whole ceremony.

Guest module design

The guest module needs the following functions:

malloc
free

..because we need to somehow reserve memory inside the VM

And then you need a function that takes the string. The function needs to take a pointer and a length argument since the pointers obviously don’t have a length associated with them.

Caller design

As the caller, if you have a guest with the above-mentioned design, you need to:

Call malloc inside the guest with the capacity you need
Copy the string you want to pass into the VM. Your Wasm implementation provides primitives for this
Call the function with the correct length and pointer
Depending on the guest implementation you might have to call free to reclaim the memory (or the guest might do it)
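To make the ceremony a bit more tangible, here is a rough sketch of those four steps. It does not use a real Wasm runtime: `Guest` is a hypothetical stand-in that models linear memory as a `Vec<u8>` with a bump allocator, and `count_chars` is a made-up export. A real host would do the same dance through its runtime's memory and function-call primitives.

```rust
/// Hypothetical stand-in for a guest module: its linear memory plus its exports.
struct Guest {
    memory: Vec<u8>,
    next_free: usize,
}

impl Guest {
    fn new(size: usize) -> Self {
        Guest { memory: vec![0; size], next_free: 0 }
    }

    /// "Export": reserve `len` bytes and return their offset, i.e. the pointer.
    fn malloc(&mut self, len: usize) -> Option<u32> {
        let start = self.next_free;
        let end = start.checked_add(len)?;
        if end > self.memory.len() {
            return None; // out of guest memory
        }
        self.next_free = end;
        Some(start as u32)
    }

    /// "Export": a bump allocator can only give back the most recent allocation.
    fn free(&mut self, ptr: u32, len: usize) {
        if ptr as usize + len == self.next_free {
            self.next_free = ptr as usize;
        }
    }

    /// "Export": receives two integers and has to rebuild the string from them.
    fn count_chars(&self, ptr: u32, len: u32) -> Option<u32> {
        let start = ptr as usize;
        let end = start.checked_add(len as usize)?;
        let bytes = self.memory.get(start..end)?;
        let text = std::str::from_utf8(bytes).ok()?;
        Some(text.chars().count() as u32)
    }
}

/// Host side: the four steps from the list above.
fn call_with_string(guest: &mut Guest, input: &str) -> Option<u32> {
    let len = input.len();
    // 1. Reserve memory inside the guest.
    let ptr = guest.malloc(len)?;
    // 2. Copy the string into the guest's memory.
    let start = ptr as usize;
    guest.memory[start..start + len].copy_from_slice(input.as_bytes());
    // 3. Call the function with the pointer and the length.
    let result = guest.count_chars(ptr, len as u32);
    // 4. Reclaim the memory (in this sketch the host does it).
    guest.free(ptr, len);
    result
}
```

Note how nothing stops the host from passing a wrong length or a stale pointer. The only safety net is whatever checks the guest bothers to do.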

As you can see, passing anything in and out of the module is a very manual task. It’s basically like building C APIs and just as fragile (you need to document every API meticulously or risk UB).

So what are the alternatives?

There are two possible solutions to this for now:

Solution 1: Extism

Extism is a framework which builds SDKs for guest and host languages to make this API less error-prone.

Basically what it does is encode all the parameters in an intermediate codec, such as JSON or MessagePack, and just pass this encoded pile of bytes between the two sides.

This gets around this whole can of worms of having to define an ABI for every type, such as structs, arrays, strings, etc.

You just pass around a pointer and a length and both sides then deserialize the type using the well-known codec.

This works great. The downside is the overhead of having this codec in-between; the upside is that it builds on stable and widely available features.

Solution 2: Component Model

The Component Model is an alternative ABI championed by the Bytecode Alliance.

It abstracts the whole annoyance of having to build these low level functions and golfing the correct pointers and lengths around.

You first write your API in a WIT file, which just describes the entire API. You can imagine it like .proto files.

package world:hello@0.1.0;

world hello {
    export hello-world: func() -> string;
}

The above WIT basically says “the package with the name world:hello at version 0.1.0 contains a world called hello, and inside this world, a host can call the function hello-world, which takes no arguments and returns a string”.

Then the host takes its codegen tool and generates the required glue code on its side, the guest takes its codegen tool and generates the required glue code on its side, and then they can interface beautifully.

And if one of the sides doesn’t adhere to the contract laid out in the interface (i.e. one side thinks the function should return a struct and not a string), it will simply result in an error upon component instantiation.

This also has the advantage of technically having a lower overhead than the Extism approach, since values are passed through the component model's canonical ABI instead of being serialized into and parsed out of an intermediate format.

Note that this approach also has downsides. The biggest ones are that the component model is currently only implemented by Wasmtime, and that you can only write hosts in Rust.

Even though it has this big downside, I decided to bet on the component model for the MRF system.

I’ll come back to how we can work around the “Rust only” restriction later. For now, let’s finish this detour up.

So with that detour out of the way, let’s sketch out the MRF in WIT:

package fep:mrf@0.1.0;

interface types {
    /// The direction the activity is going
    enum direction {
        /// The activity is being received
        incoming,

        /// The activity is being sent out
        outgoing,
    }

    /// Outcome union
    variant outcome {
        /// Activity is accepted
        accept(string),

        /// Activity is rejected
        reject,
    }

    /// Error types
    variant error {
        /// An error occurred but the processing can continue
        error-continue(string),

        /// An error occurred and the processing should not continue
        error-reject(string),
    }
}

world mrf {
    use types.{direction, outcome, error};

    /// Transform an ActivityPub activity
    export transform: func(direction: direction, activity: string) -> result<outcome, error>;
}

This draft adds a few more things. It adds the actual error we can return, the idea of “ignorable” and “fatal” errors.

It also adds another parameter, indicating whether the activity is sent out or received. Maybe it makes a difference to the implementation. Makes sense to add that context.
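Here is how I imagine a host could act on these outcomes, as a rough sketch. The Rust types just mirror the WIT by hand, and the policy in `run_chain` is my assumption rather than anything the draft mandates: an “ignorable” error skips the failing module and keeps the activity as it was, a “fatal” error drops the activity. All the module functions are made up for illustration.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Direction {
    Incoming,
    Outgoing,
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Accept(String),
    Reject,
}

#[derive(Debug, PartialEq)]
enum Error {
    Continue(String),
    Reject(String),
}

/// Stand-in for a loaded module's `transform` export.
type Transform = fn(Direction, &str) -> Result<Outcome, Error>;

/// Runs the activity through every module in order.
/// Returns `None` if the activity was rejected along the way.
fn run_chain(modules: &[Transform], direction: Direction, activity: &str) -> Option<String> {
    let mut current = activity.to_string();
    for module in modules {
        match module(direction, &current) {
            // The module may have rewritten the activity; pass the new version on.
            Ok(Outcome::Accept(next)) => current = next,
            Ok(Outcome::Reject) => return None,
            // Assumed policy: skip the failing module, keep the activity untouched.
            Err(Error::Continue(msg)) => eprintln!("module failed, skipping it: {msg}"),
            // Assumed policy: a fatal error drops the activity.
            Err(Error::Reject(msg)) => {
                eprintln!("module failed, rejecting activity: {msg}");
                return None;
            }
        }
    }
    Some(current)
}

// Three toy modules, purely for illustration.
fn flaky(_: Direction, _: &str) -> Result<Outcome, Error> {
    Err(Error::Continue("cache unavailable".to_string()))
}

fn tag_incoming(direction: Direction, activity: &str) -> Result<Outcome, Error> {
    match direction {
        Direction::Incoming => Ok(Outcome::Accept(format!("{activity} [checked]"))),
        Direction::Outgoing => Ok(Outcome::Accept(activity.to_string())),
    }
}

fn block_spam(_: Direction, activity: &str) -> Result<Outcome, Error> {
    if activity.contains("spam") {
        Ok(Outcome::Reject)
    } else {
        Ok(Outcome::Accept(activity.to_string()))
    }
}
```

With the chain `[flaky, tag_incoming, block_spam]`, an incoming activity survives the flaky module, gets tagged, and is only dropped if the last module rejects it.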

Configuring our module

So we have the works. Transformation, rejection, errors, information about the direction. What more could we need?

Let’s imagine we have something like a primitive spam filter, regex-based, and the admin of the instance wants to add another regex.

With the current design, the admin would have to modify the filter module and recompile it. Maybe the filter is only distributed in binary form. What now?

We need a design for making the modules configurable and let the host handle where to load the config from.

package fep:mrf@0.1.0;

interface types {
    /// The direction the activity is going
    enum direction {
        /// The activity is being received
        incoming,

        /// The activity is being sent out
        outgoing,
    }

    /// Outcome union
    variant outcome {
        /// Activity is accepted
        accept(string),

        /// Activity is rejected
        reject,
    }

    /// Error types
    variant error {
        /// An error occurred but the processing can continue
        error-continue(string),

        /// An error occurred and the processing should not continue
        error-reject(string),
    }
}

world mrf {
    use types.{direction, outcome, error};

    /// Transform an ActivityPub activity
    ///
    /// TODO: Maybe make the configuration a `list<u8>`
    export transform: func(configuration: string, direction: direction, activity: string) -> result<outcome, error>;
}

This works! And it’s simple. The jury is still out on whether the configuration should be just a bag of bytes. But I’m tending towards even specifying it as “the configuration string passed has to be a JSON document”.

We’ll worry about that in the final spec.
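To show what the extra parameter buys us, here is a small sketch of a configurable filter on the guest side. Two loud assumptions: the types mirror the WIT by hand instead of being generated, and the configuration is a plain newline-separated list of blocked phrases (not JSON) so the example works without any dependencies. The direction parameter is left out for brevity.

```rust
#[derive(Debug, PartialEq)]
enum Outcome {
    Accept(String),
    Reject,
}

#[derive(Debug, PartialEq)]
enum Error {
    Continue(String),
    Reject(String),
}

/// Rejects activities containing any configured phrase (case-insensitive).
/// `configuration` is assumed to hold one blocked phrase per line.
fn transform(configuration: &str, activity: &str) -> Result<Outcome, Error> {
    let blocked: Vec<&str> = configuration
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .collect();

    // A filter without phrases is probably a misconfiguration, but not a fatal one.
    if blocked.is_empty() {
        return Err(Error::Continue("no blocked phrases configured".to_string()));
    }

    let lowered = activity.to_lowercase();
    if blocked.iter().any(|phrase| lowered.contains(&phrase.to_lowercase())) {
        Ok(Outcome::Reject)
    } else {
        Ok(Outcome::Accept(activity.to_string()))
    }
}
```

The admin can now add a phrase by editing the configuration on the host side. The module itself never has to be recompiled.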

Additional functionality

So this covers the basic functionality an MRF absolutely needs. Let’s get to the fun additional features.

I propose the following things:

WASI compatibility is a must: all I/O APIs are disabled by default
HTTP client with a function to send requests signed by the instance actor
Persistent and synchronized key-value storage
Logging facilities

This list is informed by my own opinions and by talking with Perilla who was designing a spam filter for the Fediverse.

WASI compatibility

This is from my “want” list

Modules should be able to use things such as clocks, randomness, or potentially even raw socket operations.

We’ll get to how to make the raw socket operations “safe” later

For the MVP, I propose that we require that all implementations must be able to run Wasm modules using WASI preview 2, but any operations that access the network, filesystem, etc. are disallowed.

This allows for future extensibility, and allows developers to access basic things such as clocks or randomness.

HTTP client with signing capability

This was requested by Perilla

For some spam patterns developers may need to load images or remote actors. Doing that without an HTTP client is kinda very much impossible.

The signing function is required because some implementations, such as GoToSocial, have authorized fetch enabled by default. This means every request to their ActivityPub resources must be signed.

Persistent key-value storage

This is also from my “want” list

Modules should have some mechanism to cache data between invocations or store some data.

Let’s take a silly module that drops every tenth post. A very stupid functionality but the easiest to visualize.

This module requires a store that’s persistent between restarts and synchronized between all module instantiations.
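As a sketch of that silly module: the `KeyValueStore` trait below is a hypothetical shape for the host-provided storage, and `MemoryStore` is an in-memory stand-in. A real host would back it with something persistent and shared between instances, which this toy version does not do.

```rust
use std::collections::HashMap;

/// Hypothetical shape of the storage API a host could expose to modules.
trait KeyValueStore {
    fn get(&self, key: &str) -> Option<Vec<u8>>;
    fn set(&mut self, key: &str, value: Vec<u8>);
}

/// In-memory stand-in; neither persistent nor synchronized.
#[derive(Default)]
struct MemoryStore {
    entries: HashMap<String, Vec<u8>>,
}

impl KeyValueStore for MemoryStore {
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.entries.get(key).cloned()
    }

    fn set(&mut self, key: &str, value: Vec<u8>) {
        self.entries.insert(key.to_string(), value);
    }
}

/// Returns `false` for every tenth post, `true` otherwise.
fn accept_post(store: &mut dyn KeyValueStore) -> bool {
    // Read the counter; a missing or malformed value counts as zero.
    let seen = match store.get("posts-seen") {
        Some(bytes) if bytes.len() == 8 => {
            let mut buf = [0u8; 8];
            buf.copy_from_slice(&bytes);
            u64::from_le_bytes(buf)
        }
        _ => 0,
    };

    let count = seen.wrapping_add(1);
    store.set("posts-seen", count.to_le_bytes().to_vec());
    count % 10 != 0
}
```

The read-modify-write in `accept_post` is exactly the part that needs the “synchronized” bit: with several module instances running concurrently, the store would need some atomic increment or locking for the count to stay correct.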

Logging facilities

This is another one from me

Modules should be able to call functions like Log.warn, Log.info, or Log.error to emit some messages.

It just makes sense. Things can go wrong, they should be able to give the admin some extra information on what went wrong.

Module manifests

As alluded to earlier, manifests are the mechanism I chose to try to secure modules. Manifests are embedded into the module as a custom section, making the module self-contained.

The manifest contains information such as:

Manifest version
Name
Version
Activities it handles
Configuration schema

Admins who want to install an MRF can then extract this manifest and inspect it to see what it can do.

For example, a spam filter may say that it handles all activities. An MRF that simply marks attachments from other instances as NSFW may only specify it handles Create activities.
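A host could use that information to skip modules that don't care about an activity. A minimal sketch, with every field name invented for illustration and a `"*"` wildcard for “all activities” that is purely my assumption:

```rust
/// Hypothetical in-memory form of a parsed manifest; the field names are illustrative.
struct Manifest {
    manifest_version: String,
    name: String,
    version: String,
    /// Activity types the module wants to see; "*" is assumed to mean "all of them".
    activity_types: Vec<String>,
    /// JSON Schema for the configuration, if the module has one.
    config_schema: Option<String>,
}

impl Manifest {
    /// Small constructor so the examples stay short.
    fn new(name: &str, activity_types: &[&str]) -> Self {
        Manifest {
            manifest_version: "v1".to_string(),
            name: name.to_string(),
            version: "0.1.0".to_string(),
            activity_types: activity_types.iter().map(|t| t.to_string()).collect(),
            config_schema: None,
        }
    }

    /// Should the host bother calling this module for the given activity type?
    fn handles(&self, activity_type: &str) -> bool {
        self.activity_types
            .iter()
            .any(|t| t == "*" || t == activity_type)
    }
}
```

With this, `Manifest::new("nsfw-marker", &["Create"])` would only be invoked for Create activities, while a spam filter declaring `"*"` sees everything.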

The manifest is simply a JSON document encoded using the OLPC canonical JSON specification. Why this particular spec? TUF uses it so there are already formatters.

In the future this might be expanded with something like “capabilities” that tells the host to expose certain directories to the module, or let it connect to some remote ports/address ranges.

The other fields are self-explanatory with the exception of “configuration schema”.

Configuration schema

Well, earlier I mentioned how modules can be configured. But how can we make this actually user friendly?

If there’s an admin UI for an implementation, all they can do is give the admin a text field in which they can throw some raw JSON and hope they didn’t make any mistakes.

This is where the schema comes in. The schema is simply a JSON Schema from which implementations can generate forms.

There are already multiple libraries that do exactly that: dynamically generate forms from JSON Schema.

Bridging the language gap

In the beginning I mentioned how you can basically only write component model hosts in Rust. This is an issue for implementations such as GoToSocial or Misskey which aren’t written in Rust.

I mean, yeah, pretty much only three implementations I know off the top of my head are written in Rust. The rest is Go, Ruby, or JavaScript.

For the time being I propose the solution of a single common implementation. We all rally around a single canonical implementation of the spec written in Rust and bind it to other languages via FFI.

Bridging the FFI gap from Rust to other languages is luckily already a solved problem. We have things such as napi for Node.js, Rustler for bridging to the BEAM (so Erlang, Elixir, Gleam, etc.), and UniFFI for Go, Python, Swift, Kotlin, Ruby, etc.
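For a feeling of what sits underneath those tools, here is a bare-bones sketch of a C-ABI entry point. Everything in it is hypothetical (the function name, the result codes, the “contains spam” check standing in for actually running the modules), and a real shared library would additionally need the symbol exported unmangled. The binding generators above wrap this sort of thing in something far more pleasant.

```rust
/// Hypothetical result codes for callers on the other side of the FFI boundary.
pub const MRF_ACCEPT: i32 = 0;
pub const MRF_REJECT: i32 = 1;
pub const MRF_INVALID_INPUT: i32 = -1;

/// Checks a single activity passed as a pointer and a length.
///
/// # Safety
///
/// `activity` must either be null or point to `len` readable bytes.
pub unsafe extern "C" fn mrf_check(activity: *const u8, len: usize) -> i32 {
    if activity.is_null() {
        return MRF_INVALID_INPUT;
    }

    // The caller promised that `activity` points to `len` readable bytes.
    let bytes = unsafe { std::slice::from_raw_parts(activity, len) };

    match std::str::from_utf8(bytes) {
        // Placeholder for "run the activity through the loaded MRF modules".
        Ok(text) if text.contains("spam") => MRF_REJECT,
        Ok(_) => MRF_ACCEPT,
        Err(_) => MRF_INVALID_INPUT,
    }
}
```

Yes, this is the same pointer-and-length ceremony as before. The difference is that it only has to be written once, in the shared implementation, instead of once per module.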

Using the FFI shouldn’t be the be-all and end-all of this specification. It’s just a means to an end until other runtimes have caught up with the component model spec and implemented it themselves.

As soon as that happens, I’m all for ending the monoculture of having a single implementation. It’s also a great way of testing the specification, whether we need to document some undocumented behaviours.
