FIDL Unknown Interactions
Background
I worked for Google on the Fuchsia operating system from January 2022 to January 2023.
Fuchsia is a general-purpose operating system under development by Google for a variety of use cases. In order to achieve better security and updatability, it uses a microkernel: many features that the monolithic kernels of operating systems like Windows or Linux normally provide are instead provided by userspace programs that communicate over OS-provided channels.
FIDL stands for Fuchsia Interface Definition Language and it is used to define the protocols that different programs use to communicate over Fuchsia’s channels, much the way Protocol Buffers and gRPC can be used to define interfaces that run over HTTP/2.
The FIDL language supports defining structs, tables, and unions for containing data and protocols with methods and events to define how data is exchanged between programs.
Before I joined, FIDL supported some protocol evolution through the keywords strict and flexible applied to tables and unions.
Unlike structs, which have a defined shape that is difficult to alter, tables and unions use numeric ordinals to identify the fields of a table or the variants of a union. This means it is possible to handle the situation where a program receives a table with an unknown field or a union with an unknown variant. The strict and flexible keywords control whether an application actually wants to handle unknown fields/variants. A strict table or union rejects unknown fields/variants, while a flexible table or union allows them and leaves it up to the application to decide how to handle them.
When I joined the FIDL team, no such feature existed for protocols and methods. Like table fields and union variants, methods are identified by ordinals (though by default they’re generated from the protocol and method names). However, there was no mechanism that let program code decide how to deal with unknown methods: if a protocol received a method call it didn’t recognize, it would just close the channel and end communication.
However, there was an RFC to change this, RFC-0138: Handling unknown interactions, and I was assigned to implement it.
Changes in the RFC
RFC-0138 added three new keywords for protocols, open, ajar, and closed, and extended the strict and flexible keywords to apply to protocol methods and events.
As I describe in the official documentation, the three new protocol keywords modify how a protocol is supposed to react to interactions it doesn’t recognize. open means that the program decides what to do for all unknown interactions. ajar means that the program decides for one-way methods and events, but unknown two-way methods always cause the channel to close. closed means that the protocol rejects all unknown interactions, closing the channel automatically. The protocol modifiers affect the receiving side of an interaction: the client in the case of events, and the server in the case of one-way and two-way methods.
As applied to FIDL methods and events, the strict and flexible keywords tell the receiver of an interaction how the sender wants it handled if the receiving side doesn’t recognize it. Accordingly, the sender’s choice of strict or flexible is sent along with the method call in the message header, alongside things like the method ordinal. strict means that the sender doesn’t want the receiver to handle the method if it’s unknown. That is, even if the protocol is open, the receiver should still hang up immediately if it doesn’t recognize a strict method. On the other hand, flexible means that the sender would like the receiver to let the program decide whether or not to handle the method call.
Note that this whole thing is about dealing with version skew between programs built with different versions of the same protocol. So the two sides of an interaction may disagree about which keyword was applied to the protocol or to any of its methods. If both sides recognize the method, that doesn’t matter: the header flag that says whether the method/event is strict or flexible is ignored when the interaction is recognized. If the interaction is unknown, however, the sender’s version determines whether the method/event is strict or flexible, while the receiver’s version determines whether the protocol is open, closed, or ajar.
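Taken together, the last few paragraphs describe a small decision table. Whether an unknown interaction reaches application code depends on the sender’s strictness, the receiver’s openness, and the kind of interaction. Here is a minimal Rust sketch of that table. The names (Openness, Strictness, on_unknown, and so on) are made up for illustration and are not the real bindings’ API. The sketch also leaves out any reply the bindings send back for an unknown two-way call.

```rust
/// How the receiver's version of the protocol is declared.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Openness {
    Open,
    Ajar,
    Closed,
}

/// Strictness as the sender declared it, carried in the message header.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Strictness {
    Strict,
    Flexible,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum InteractionKind {
    OneWay,
    TwoWay,
    Event,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Action {
    /// Give the unknown interaction to application code.
    PassToApplication,
    /// Hang up on the peer.
    CloseChannel,
}

/// What the receiver does with an interaction whose ordinal it doesn't recognize.
/// (Known interactions never reach this function; their strictness is ignored.)
pub fn on_unknown(receiver: Openness, sender: Strictness, kind: InteractionKind) -> Action {
    // A strict interaction is never handed to the application, even by an open protocol.
    if sender == Strictness::Strict {
        return Action::CloseChannel;
    }
    let handled_by_app = match (receiver, kind) {
        (Openness::Open, _) => true,
        (Openness::Ajar, InteractionKind::OneWay | InteractionKind::Event) => true,
        (Openness::Ajar, InteractionKind::TwoWay) => false,
        (Openness::Closed, _) => false,
    };
    if handled_by_app {
        Action::PassToApplication
    } else {
        Action::CloseChannel
    }
}
```

Note that the two inputs come from different programs: Strictness is read from the incoming header, while Openness comes from the receiver’s own generated code.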
Although the RFC makes open and flexible the defaults for protocols and methods/events respectively, the change to the wire format uses a 1 bit in a set of flag fields to mean flexible while keeping 0 as strict. This conveniently maintains binary backwards compatibility, since all protocols before RFC-0138 behaved as if they were closed and all their methods/events were strict. Because the source-level defaults were changing, though, releasing unknown interaction support would still require code changes to existing FIDL protocols (namely adding explicit closed and strict modifiers) to keep their behavior during the rollout. It would not, however, create any binary compatibility issues as protocols were updated.
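To illustrate why the 0-means-strict encoding is backwards compatible, here is a small Rust sketch of writing and reading that flag. The header is simplified. Treating the flag as the high bit of a one-byte dynamic_flags field is an assumption made for this example; the FIDL wire format specification is the authority on the real layout.

```rust
/// Simplified transactional message header. The field set and the bit
/// position of the flexible flag are assumptions for illustration.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct Header {
    pub txid: u32,
    pub dynamic_flags: u8,
    pub ordinal: u64,
}

/// Assumed position of the "flexible" bit. A cleared bit means "strict".
pub const FLEXIBLE_FLAG: u8 = 1 << 7;

impl Header {
    /// Builds a header, setting the flexible bit only for flexible methods.
    pub fn new(txid: u32, ordinal: u64, flexible: bool) -> Self {
        let dynamic_flags = if flexible { FLEXIBLE_FLAG } else { 0 };
        Header { txid, dynamic_flags, ordinal }
    }

    /// Senders built before the flag existed always leave this bit cleared,
    /// so their messages decode as strict, which matches how they behaved.
    pub fn is_flexible(&self) -> bool {
        self.dynamic_flags & FLEXIBLE_FLAG != 0
    }
}
```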
fidlc
The first thing I needed to change was the FIDL compiler, fidlc, which needed to be updated to understand the new keywords.
The FIDL compiler starts by parsing the raw token stream into a “raw AST”, then extracts data from that to build what is called the “flat AST”, then runs validation which performs a variety of semantic checks, then finally emits an intermediate representation of the FIDL file in JSON format, appropriately called the JSON-IR. Code generation for each of FIDL’s target languages is done later by separate programs that operate off of the JSON-IR.
I first added tokens to the language and updated the raw AST and flat AST to contain the new modifiers. I then updated the parser to recognize the new tokens in the appropriate locations and place them in the raw AST, and modified the raw-AST consumer to interpret them and add them to the flat AST. To allow the new feature to be released in a controlled manner, I gated it behind a flag.
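As a rough illustration of recognizing the new modifiers only when a feature flag is on, here is a toy Rust sketch. It parses an optional openness modifier in front of a protocol declaration. This is not how fidlc is structured: fidlc is written in C++ and builds a raw AST from a full token stream. The function name and the experimental_enabled parameter are hypothetical stand-ins.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Openness {
    Open,
    Ajar,
    Closed,
}

#[derive(Debug, PartialEq, Eq)]
pub struct ProtocolDecl {
    /// `None` means no modifier was written; a later pass applies the default.
    pub openness: Option<Openness>,
    pub name: String,
}

/// Parses `[open|ajar|closed] protocol <Name>` from pre-split tokens.
/// `experimental_enabled` is a hypothetical stand-in for the feature flag.
pub fn parse_protocol_header(
    tokens: &[&str],
    experimental_enabled: bool,
) -> Result<ProtocolDecl, String> {
    let mut rest = tokens;
    let mut openness = None;

    // Recognize an optional leading modifier.
    let modifier = match rest.first() {
        Some(&"open") => Some(Openness::Open),
        Some(&"ajar") => Some(Openness::Ajar),
        Some(&"closed") => Some(Openness::Closed),
        _ => None,
    };
    if let Some(m) = modifier {
        // Reject the new syntax unless the feature is switched on.
        if !experimental_enabled {
            return Err(format!("`{}` requires the unknown interactions flag", rest[0]));
        }
        openness = Some(m);
        rest = &rest[1..];
    }

    match rest {
        ["protocol", name] => Ok(ProtocolDecl { openness, name: name.to_string() }),
        _ => Err("expected `protocol <Name>`".to_string()),
    }
}
```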
I then had to add semantic validation. RFC-0138 has two main restrictions on how the unknown interaction modifiers can be used on protocols and methods. First, a flexible method or event cannot be declared in a protocol that would not be allowed to handle it. For example, you can’t declare a flexible event in a closed protocol or a flexible two-way method in an ajar protocol. Second, FIDL protocols can be composed together, which effectively adds all methods of the composed protocol to the composing protocol, and we don’t allow composing a more open protocol into a more closed one. So you can compose a closed protocol into an open or ajar one, but you can’t compose an open protocol into a closed protocol. This second rule is in a way an extension of the first: if you could compose an open protocol into a closed protocol, the composing protocol could gain a flexible method indirectly.
Note that these validation rules are mostly there to prevent some pretty obvious types of mistakes. For example, if you could declare a flexible method in a closed protocol, you might expect that would let you remove it later while still maintaining backwards compatibility with older clients. However, unless you also changed the new version of the protocol to open or ajar, you wouldn’t actually be able to handle the unknown calls to the deleted method. Of course, even with these validation rules, version skew still makes it possible for a closed protocol to receive a call to a flexible method, since a different version of the protocol could have been changed to open or ajar.
Fortunately, the FIDL compiler had a fairly robust visitor system for doing these kinds of validations already, so adding this one was relatively straightforward.
Once I had the new modifiers parsed, validated, and placed in the JSON-IR, I had to move on to updating the runtime libraries and code generation for each of FIDL’s supported languages. FIDL had generated bindings for Rust, C++, Dart, C++, Go, and C++, which would all need to be updated. I’m not kidding about C++ being there three times either. FIDL had a legacy high-level binding that was widely used but planned to be replaced. It also had a newer binding that was supposed to replace it, which came in two distinct flavors for high-level vs. low-level usage. The two flavors of the new binding shared a bit of their backing code, so maybe not 3 C++ bindings, but definitely at least like 2½.
Rust
I decided to start my code generation work with Rust, since it’s the language I’m most familiar with, and in some ways the simplest. The project for every language would involve the same set of tasks:
- Update the runtime library that all the generated code depends on so it has the necessary types and behaviors to support unknown interaction handling.
- Change the code generation to make it understand the new unknown interaction modifiers and create appropriate code for handling them.
- Test thoroughly.
Rather unfortunately, in my opinion, all of the code generators for all of FIDL’s official bindings are written in Go using Go’s standard text/template library. The reason that’s unfortunate isn’t anything particular to Go or its templating library. Rather, in my experience, doing code generation with templates designed for producing text or HTML is painful at best. My prior experience with generating code from text templates was when I was building a little toy Lisp-ish language and trying to compile it into JavaScript. I started out by trying to use Jinja templates to generate parts of the code. This ended up being impractical enough that I wrote my own pseudo-AST for JavaScript so I could generate code by building up an AST instead of fumbling with text.
On the other hand, the code for FIDL bindings is a fair bit more uniform and repetitive than trying to build up arbitrary expressions, so it lends itself better to text templates than something like cross-compiling between languages does, though it would still be kind of nice if, e.g., the Rust bindings could be generated using syn and quote instead (I even did some experiments aimed at just that as a side project while working on FIDL).
Anyway, getting unknown interactions working in Rust was fairly straightforward. First, add the type used for the responses to flexible two-way methods. Next, update code generation for methods to insert the flexible marker bit into the header (and update the header so it can hold that bit). Finally, change the code for processing incoming messages so that if the protocol is open or ajar, there’s a branch that catches unknown method ordinals and sends them to a handler.
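Here is a rough Rust sketch of that incoming-message branch for an ajar protocol like the Example protocol below. The enum, function, and ordinal value are hypothetical. The sketch relies on the FIDL convention that one-way messages carry a transaction ID of zero, which is how it tells one-way calls from two-way ones.

```rust
/// Hypothetical ordinal for `SomeMethod`; real ordinals are derived from names.
pub const SOME_METHOD_ORDINAL: u64 = 0x1234;

#[derive(Debug, PartialEq, Eq)]
pub enum ExampleRequest {
    SomeMethod,
    /// Extra variant that only exists for open and ajar protocols.
    UnknownMethod { ordinal: u64 },
}

#[derive(Debug, PartialEq, Eq)]
pub enum DispatchError {
    /// The bindings should close the channel.
    CloseChannel { ordinal: u64 },
}

/// Dispatch for an `ajar` protocol: unknown flexible one-way calls go to the
/// application, while unknown strict calls and unknown two-way calls close the channel.
pub fn dispatch_ajar(txid: u32, ordinal: u64, flexible: bool) -> Result<ExampleRequest, DispatchError> {
    let one_way = txid == 0; // One-way FIDL messages use transaction ID 0.
    match ordinal {
        SOME_METHOD_ORDINAL => Ok(ExampleRequest::SomeMethod),
        // This arm is the extra branch; a closed protocol would not have it.
        _ if flexible && one_way => Ok(ExampleRequest::UnknownMethod { ordinal }),
        _ => Err(DispatchError::CloseChannel { ordinal }),
    }
}
```

For an open protocol, the guard would drop the one_way check, since open protocols also pass unknown flexible two-way calls to the application.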
In Rust, the FIDL server receives method calls as an async stream of values whose type is a Rust enum, and the client has something similar for events. From the perspective of the generated code’s user, the only difference for an open or ajar protocol is that the enum type on each of these streams gains an additional variant that they have to handle. That variant is _UnknownInteraction on the request stream or _UnknownEvent on the event stream, and it carries some limited data about the interaction, such as its ordinal.
So for example, a FIDL protocol like this:
ajar protocol Example {
flexible SomeMethod();
flexible -> SomeEvent();
};
would generate a request and event stream with these types:
#[derive(Debug)]
pub enum ExampleRequest {
SomeMethod {
control_handle: ExampleControlHandle,
},
/// An interaction was received which does not match any known method.
_UnknownInteraction {
ordinal: u64,
control_handle: ExampleControlHandle,
direction: fidl::endpoints::UnknownInteractionDirection,
},
}
#[derive(Debug)]
pub enum ExampleEvent {
SomeEvent {},
_UnknownEvent { ordinal: u64 },
}
Dart
In Dart, FIDL makes use of the language’s built-in async support by declaring two-way methods as async and using Streams for each type of event. Conveniently, this means that the interface on the client and server is actually the same. On the client side, you call methods or listen to streams on an implementation generated by FIDL’s codegen, which plumbs those calls into the Fuchsia channel and events out of it. On the server, you implement the interface. The FIDL-generated code then calls your methods with data from the channel, listens to your events, and pipes them back into the channel.
Of course, because the client and server interfaces are the same, someone cleverly realized they could use a single generated interface for both client and server, and just have the codegen produce an implementation of it for the client side.
Problem is, with unknown interactions, the two sides are no longer symmetrical.
On the client side, we now need to handle the extra case of unknown events, which we can do by providing an extra event stream with a stream of unknown event metadata (which currently just contains the ordinal of the event). But we don’t want that stream to appear in the interface that the server uses, because the server shouldn’t intentionally send unknown events (and in fact doesn’t know what things would or wouldn’t be known to the client).
On the server side, we need a new handler for the case where we’ve received an unknown method call, which we can do by providing an extra method to implement which receives metadata about the unknown method. But again, we don’t want this to appear on the client side, since the client shouldn’t ever call it.
I initially proposed a few solutions, such as splitting the client and server interfaces completely. In practice, however, quite a bit of Dart code depended on both interfaces being the same. Some of it used the interface rather than concrete types. Some linked two channels by passing the generated client implementation from one channel as the server implementation for another. Some simply used the same interface for both client and server tests. I determined that refactoring all of this would have delayed the unknown interactions feature. So I instead decided to leave existing protocols (and effectively all closed protocols) alone and make a change that would only affect new ajar and open protocols.
On the client side, I added the new unknown events stream only to the generated implementation of the protocol. So to access it, you have to use the concrete implementation type rather than the interface, though that type still implements the common interface. This works fine because you have to construct the concrete type when making a new connection anyway. As long as you don’t change the variable’s declared type to the interface type, you retain access to the unknown events stream, and it all just works.
On the server side, I generated a new interface which inherits from the common interface and adds the unknown method handler. I then changed the generated binding to require you to implement this new “server interface” rather than just the common interface.
So, using the same example from above:
ajar protocol Example {
flexible SomeMethod();
flexible -> SomeEvent();
};
We would generate this common interface, client implementation, and server interface:
// Common interface
abstract class Example {
$fidl.ServiceData? get $serviceData => ExampleData();
$async.Future<void> someMethod();
$async.Stream<void>? get someEvent;
}
// Server interface (only generated for ajar and open protocols).
abstract class ExampleServer extends Example {
$async.Future<void> $unknownOneWay(int ordinal);
}
// Client implementation. The unknown event stream is added here for open and ajar
// protocols.
class ExampleProxy extends $fidl.AsyncProxy<Example> implements Example {
ExampleProxy() : super($fidl.AsyncProxyController<Example>($interfaceName: r'Example')) {
ctrl.onResponse = _handleResponse;
ctrl.whenClosed.then((_) {
_someEventEventStreamController.close();
_$unknownEventStreamController.close();
}, onError: (_) {});
}
@override
Null get $serviceData => null;
void _handleEvent($fidl.IncomingMessage $message) {
switch ($message.ordinal) {
case _kExample_SomeEvent_Ordinal:
// ... put the event into the event stream.
break;
default:
// ... check if the unknown event is strict or flexible, then either close or
// collect metadata on it and put it in the unknown event stream.
break;
}
}
@override
$async.Future<void> someMethod() {
// ... encode and send the `SomeMethod` message on the channel.
}
final _someEventEventStreamController = $async.StreamController<void>.broadcast();
@override
$async.Stream<void> get someEvent => _someEventEventStreamController.stream;
final _$unknownEventStreamController =
$async.StreamController<$fidl.UnknownEvent>.broadcast();
// While writing this essay I realized this `@override` is probably a mistake since it
// doesn't override a member in the common interface, but it is in the actual generated
// code, so oops.
// Anyway, this is the extra event stream that isn't declared in the common interface.
@override
$async.Stream<$fidl.UnknownEvent> get $unknownEvents =>
_$unknownEventStreamController.stream;
}
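For comparison, here is the same split sketched in Rust terms: a shared trait, a server trait that extends it with the unknown-method handler, and a client type that exposes unknown events only on the concrete type. This is a hypothetical translation of the Dart design, not generated code from any FIDL binding.

```rust
/// Shared surface used by both sides, like the Dart `Example` class.
pub trait Example {
    fn some_method(&mut self);
}

/// Server-only interface, like Dart's `ExampleServer`: it extends the shared
/// trait with a handler for unknown one-way methods.
pub trait ExampleServer: Example {
    fn unknown_one_way(&mut self, ordinal: u64);
}

/// Client proxy. It implements only the shared trait. Unknown events are
/// exposed through inherent methods, so code holding a `&mut dyn Example`
/// cannot see them, but code holding the concrete type can.
#[derive(Default)]
pub struct ExampleProxy {
    pub calls_sent: usize,
    unknown_events: Vec<u64>,
}

impl Example for ExampleProxy {
    fn some_method(&mut self) {
        // A real proxy would encode a message and write it to the channel.
        self.calls_sent += 1;
    }
}

impl ExampleProxy {
    /// Hypothetical hook the event dispatcher would call for an unknown flexible event.
    pub fn record_unknown_event(&mut self, ordinal: u64) {
        self.unknown_events.push(ordinal);
    }

    /// Drains the ordinals of unknown events received so far.
    pub fn take_unknown_events(&mut self) -> Vec<u64> {
        std::mem::take(&mut self.unknown_events)
    }
}

/// A server type records the unknown one-way ordinals it is given.
#[derive(Default)]
pub struct RecordingServer {
    pub unknown_ordinals: Vec<u64>,
}

impl Example for RecordingServer {
    fn some_method(&mut self) {}
}

impl ExampleServer for RecordingServer {
    fn unknown_one_way(&mut self, ordinal: u64) {
        self.unknown_ordinals.push(ordinal);
    }
}

/// Hypothetical server binding entry point. It requires `ExampleServer`
/// rather than just `Example`, so the unknown-method handler must exist.
pub fn dispatch_to_server<S: ExampleServer>(server: &mut S, known: bool, ordinal: u64) {
    if known {
        server.some_method();
    } else {
        server.unknown_one_way(ordinal);
    }
}

/// Code written against the shared surface keeps working with either side.
pub fn call_twice(target: &mut dyn Example) {
    target.some_method();
    target.some_method();
}
```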
C++ (The Legacy Bindings)
The legacy C++ bindings (also called HLCPP, for “high-level C++”) had a similar problem to Dart’s, with an interface shared between the client and server. However, assumptions about the structure of the client and server types were baked into many layers of complex SFINAE templates, so simply breaking out the extra server parts into a separate interface would have been impractical. Fortunately, since these bindings were headed for replacement anyway, I could take a slightly ugly approach.
On the client side, I did much the same thing as in Dart: I added the callback for unknown events only to the concrete Proxy type that is generated to implement the common interface.
However, for the server side, I had to take the messier approach of putting the unknown method handlers in the common interface. Fortunately, C++ has the protected access level, which I could hide them behind to prevent accidental calls on the client. I also had the generated Proxy type simply panic if the client side ever did try to call them.
Here’s what the header for that looks like for the same example protocol:
class Example {
public:
using Proxy_ = ::test::unknowninteractions::Example_Proxy;
using Stub_ = ::test::unknowninteractions::Example_Stub;
using EventSender_ = ::test::unknowninteractions::Example_EventSender;
using Sync_ = ::test::unknowninteractions::Example_Sync;
virtual ~Example();
virtual void SomeMethod() = 0;
using SomeEventCallback = fit::function<void()>;
protected:
// Allow the stub to access the handle_unknown_method method.
// (The stub is the part that dispatches method calls on the server).
friend class Example_Stub;
virtual void handle_unknown_method(uint64_t ordinal) = 0;
};
class Example_Proxy final : public ::fidl::internal::Proxy, public Example {
public:
explicit Example_Proxy(::fidl::internal::ProxyController* controller);
~Example_Proxy() override;
zx_status_t Dispatch_(::fidl::HLCPPIncomingMessage message) override;
void SomeMethod() override;
SomeEventCallback SomeEvent;
fit::function<void(uint64_t)> handle_unknown_event;
protected:
void handle_unknown_method(uint64_t ordinal) override {
// This method exists only to satisfy the base interface. It's protected,
// and the client-side generated code never calls it, so it should be
// unreachable.
ZX_PANIC("Unreachable: Example_Proxy::handle_unknown_method should not be callable");
}
private:
Example_Proxy(const ::test::unknowninteractions::Example_Proxy&) = delete;
Example_Proxy& operator=(const ::test::unknowninteractions::Example_Proxy&) = delete;
::fidl::internal::ProxyController* controller_;
};
C++ (The New Bindings)
The new C++ bindings come in two flavors, the “natural” bindings and the “wire” bindings (formerly called LLCPP, for low-level C++). The natural bindings fully own their contents and use common C++ types such as vector and string. The wire bindings, by contrast, are designed to be decoded in place in borrowed memory, so they don’t own their memory and have to use specialized collections.
The tradeoff is that the wire bindings are much faster, since incoming messages don’t require allocations for collections and most data doesn’t have to be copied anywhere else. The natural bindings, meanwhile, are much easier to work with, since you can move or copy subsets of the data without worrying about how long the buffer or arena it came from will live. The natural bindings are designed for programs without tight performance constraints, where developer convenience is more valuable than squeezing out every drop of performance. The wire bindings are targeted at use cases like network or graphics drivers, where their performance affects every part of the system and the traffic is heavy enough that even small overheads matter.
Fortunately, both of these bindings are structured very similarly, so while the exact implementation code for each is different and has to be generated separately, a few parts are shared and the overall layout is the same. Also both bindings completely avoid the problem of having a common interface between client and server.
In the new bindings, there is no generated named interface. Instead, a single type is generated as a marker to identify the protocol. The server interface and client proxy are then generated as template specializations: fidl::Server<T> or fidl::WireServer<T> for the server, and fidl::Client<T> or fidl::WireClient<T> for the client.
These template specializations are used because they reduce the likelihood of a name collision with code written by FIDL users. For me, they’re also very convenient, since they keep the client and server interfaces well separated, so the client and server never have to implement the same interface.
On the server side, I created a fidl::UnknownMethodHandler<T>, which is shared between the natural and wire bindings. The generated fidl::Server<T> or fidl::WireServer<T> then extends fidl::UnknownMethodHandler<T>, so when a programmer extends the appropriate Server<T>, they also have to implement the corresponding unknown method handler. I made UnknownMethodHandler specialized per protocol so that one class can implement more than one protocol and still have a distinct unknown method handler for each if it wants (if it doesn’t, it can just funnel them all into one method).
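Rust traits that are generic over a marker type give much the same effect as this per-protocol specialization. In the sketch below, one type implements the handler separately for two protocols, and the caller picks the protocol explicitly. All names are hypothetical; this is neither the real C++ API nor the Rust bindings.

```rust
/// Marker types standing in for two FIDL protocols.
pub struct ProtocolA;
pub struct ProtocolB;

/// Analogue of a per-protocol `UnknownMethodHandler<T>`: because it is generic
/// over the protocol marker, one type can implement it once per protocol.
pub trait UnknownMethodHandler<P> {
    fn handle_unknown_method(&mut self, ordinal: u64);
}

/// One server type that serves both protocols.
#[derive(Default)]
pub struct Server {
    pub a_unknown: Vec<u64>,
    pub b_unknown: Vec<u64>,
}

impl UnknownMethodHandler<ProtocolA> for Server {
    fn handle_unknown_method(&mut self, ordinal: u64) {
        self.a_unknown.push(ordinal);
    }
}

impl UnknownMethodHandler<ProtocolB> for Server {
    fn handle_unknown_method(&mut self, ordinal: u64) {
        self.b_unknown.push(ordinal);
    }
}

/// What a protocol-specific dispatcher might do with an unknown ordinal.
pub fn dispatch_unknown<P, S: UnknownMethodHandler<P>>(server: &mut S, ordinal: u64) {
    server.handle_unknown_method(ordinal);
}
```

Usage is dispatch_unknown::<ProtocolA, _>(&mut server, ordinal). A type that wants a single handler could forward both implementations to one shared method.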
On the client side, the new bindings implement event handling similarly to how they implement the server. Instead of registering individual callbacks, the client implements a fidl::WireEventHandlerInterface<T> or fidl::NaturalEventHandlerInterface<T> in order to receive events. This was very convenient for me, since I could just create a fidl::UnknownEventHandler<T> and have the corresponding generated event handler inherit from it.
Here’s that same example protocol in the new C++ bindings. I’m only showing the wire bindings, but the natural bindings are similar.
// Pure-virtual interface to be implemented by a server.
template<>
class ::fidl::WireServer<::examplepackage::Example> : public ::fidl::internal::IncomingMessageDispatcher, public ::fidl::UnknownMethodHandler<::examplepackage::Example> {
public:
WireServer() = default;
virtual ~WireServer() = default;
// The FIDL protocol type that is implemented by this server.
using _EnclosingProtocol = ::examplepackage::Example;
using Handler = fidl::ProtocolHandler<::examplepackage::Example>;
using SomeMethodCompleter =
::fidl::internal::WireCompleter<::examplepackage::Example::SomeMethod>;
virtual void SomeMethod(SomeMethodCompleter::Sync& completer) = 0;
// ... a couple other irrelevant utilities are generated.
};
template<>
class ::fidl::internal::WireEventHandlerInterface<::examplepackage::Example>
: public ::fidl::internal::BaseEventHandlerInterface,
public ::fidl::UnknownEventHandler<::examplepackage::Example> {
public:
WireEventHandlerInterface() = default;
virtual ~WireEventHandlerInterface() = default;
virtual void SomeEvent() = 0;
};
Go
The Go bindings were only used in one place in Fuchsia (the network stack), and were planned to be removed once the new netstack3 replaced it. They also have some very serious limitations. For example, to receive events, you have to know when a server is going to send you an event and what type it is going to be, because if you try to receive the wrong type of event, the bindings return an error. Obviously, if you have to correctly guess when you’re going to receive an event of a particular type, that’s absolutely a non-starter for handling unknown events.
Because of the planned deprecation and the limitations, I decided simply not to implement unknown interactions support in Go. Instead, I updated the code generator to generate an empty stub type for open or ajar protocols, with a little warning comment asking users to let the FIDL team know if they turn out to need unknown interactions support in Go.
Testing
Of course, I wrote extensive unit tests for each updated binding as I implemented it, but for something like this it’s also important to test that every binding behaves the same so that they can interconnect. Before I started on the FIDL team, they had designed a system called the dynsuite (dynamic test suite) to allow them to test the behavior of protocols in various situations, though they hadn’t really implemented much of it and it was barely used.
As part of my work on FIDL unknown interactions, I picked up the dynsuite and refactored it a bit and added a variety of utilities to help with writing tests.
The dynsuite had two main parts, the client suite and the server suite, which were structured similarly. One FIDL protocol, called the Runner protocol, was used to control the tests: it set up each test and issued instructions to the other side of the connection. Then there was a variety of protocols representing different systems under test, including open, ajar, and closed protocols with a variety of strict and flexible methods and events with various payloads. There were also protocols that allowed the system under test (SUT) to report the one-way methods and events it received. One of my significant refactorings was ensuring separation of concerns between the SUT and the protocols used for running tests and reporting state. An earlier version of the dynsuite had used one protocol both as a system under test and for reporting one-way interactions.
The test cases were all implemented in one program called the test harness. The test harness would start the server for the system under test, then use the Runner protocol to tell that server how the SUT should behave for each test. The SUT server could tell the harness to skip certain tests (for example, skipping all the open/ajar tests for Go, which didn’t implement them). In each test implementation, I used handcrafted requests and responses when talking to the SUT. That way the test case itself didn’t depend on the behavior of FIDL, and it was possible to send various kinds of invalid data during tests.
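The skip mechanism can be pictured with a small Rust sketch in which the harness asks the runner whether each test is enabled before running it. The Test variants, the Runner trait, and run_all are hypothetical simplifications of the Runner FIDL protocol, not the dynsuite’s actual definitions.

```rust
/// Hypothetical test identifiers.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Test {
    StrictUnknownOneWayClosesChannel,
    FlexibleUnknownOneWayOpenProtocol,
    FlexibleUnknownTwoWayAjarProtocol,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    Passed,
    Failed(String),
    Skipped,
}

/// The harness's view of a bindings-specific runner.
pub trait Runner {
    fn is_test_enabled(&self, test: Test) -> bool;
}

/// A runner for bindings without open/ajar support, like the Go bindings.
pub struct ClosedOnlyRunner;

impl Runner for ClosedOnlyRunner {
    fn is_test_enabled(&self, test: Test) -> bool {
        matches!(test, Test::StrictUnknownOneWayClosesChannel)
    }
}

/// A test body; a real one would exchange handcrafted bytes with the SUT.
pub type TestBody = fn() -> Result<(), String>;

/// Runs every case the runner has enabled and records the rest as skipped.
pub fn run_all(runner: &dyn Runner, cases: &[(Test, TestBody)]) -> Vec<(Test, Outcome)> {
    cases
        .iter()
        .map(|&(test, body)| {
            let outcome = if !runner.is_test_enabled(test) {
                Outcome::Skipped
            } else {
                match body() {
                    Ok(()) => Outcome::Passed,
                    Err(message) => Outcome::Failed(message),
                }
            };
            (test, outcome)
        })
        .collect()
}
```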
Once I had a basis, I could define a set of detailed tests for both the client side and server side of unknown interactions and implement them all in the harness.
Then I had to implement a separate test runner for every single version of the bindings that we wanted to test, which was a lot. It’s not just Rust, Dart, Go, and 3 × C++, though. I’ve so far glossed over the fact that C++ and Rust each had a choice between synchronous and asynchronous bindings on the client side. Each of those needed to be tested independently, since they didn’t necessarily share that much code. So I wrote a lot of very similar programs for those SUTs, each with subtle behavior differences depending on whether it had to deal with blocking or asynchrony.
However, once done, this allowed me to validate that all the bindings behaved correctly in a variety of circumstances, including lots of edge cases, and make sure the different bindings were consistent with each other.
Rollout Process
I then needed to roll out the new feature. This included updating the public documentation about FIDL to explain how the feature worked and what it is for. It also meant changing the compiler defaults to enable it, and updating all existing FIDL code to explicitly use closed and strict before the compiler defaults changed to open and flexible. All of this had to happen without interfering with the day-to-day work of other developers.
I developed a rollout plan that went something like this:
1. Create a migration mode where the modifiers open, ajar, and closed can be used on protocols and strict and flexible can be applied to methods. Leave the defaults (when no modifier is specified) as closed and strict to maintain source compatibility with existing FIDL files. Also, change the FIDL linter to start suggesting these modifiers, with links to relevant documentation.
2. Update all existing FIDL files with explicit closed and strict modifiers so that they will retain the same behavior when the compiler defaults change. Use a tool to process most of the files efficiently.
3. Change to a second migration mode where default values aren’t allowed at all and all protocols, methods, and events must be explicitly labeled with one of the modifiers. This prevents developers from backsliding by introducing new protocols without modifiers during this time. It also helps catch any protocols that slipped through step 2, since they now break on compilation. Finally, it ensures that no existing protocol will change when the default changes, and it creates a buffer period around the change in default value so developers are less likely to mix up the defaults.
4. Allow default values again, but now with the defaults as open and flexible.
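On the compiler side, steps 1, 3, and 4 of this plan amount to changing how an omitted modifier resolves. Here is a minimal Rust sketch for protocol openness with hypothetical mode names; strict and flexible would follow the same pattern. An explicit modifier resolves identically in every mode, which is why step 2 makes the final default flip safe.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Openness {
    Open,
    Ajar,
    Closed,
}

/// Hypothetical compiler modes for the staged rollout.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum MigrationMode {
    /// Step 1: modifiers accepted; an omitted modifier means `closed`.
    DefaultClosed,
    /// Step 3: every protocol must spell out a modifier.
    RequireExplicit,
    /// Step 4: an omitted modifier means `open`.
    DefaultOpen,
}

/// Resolves a protocol's effective openness under the given mode.
pub fn resolve_openness(explicit: Option<Openness>, mode: MigrationMode) -> Result<Openness, String> {
    match (explicit, mode) {
        // An explicit modifier always wins, whatever the mode.
        (Some(openness), _) => Ok(openness),
        (None, MigrationMode::DefaultClosed) => Ok(Openness::Closed),
        (None, MigrationMode::RequireExplicit) => {
            Err("protocol must be marked open, ajar, or closed".to_string())
        }
        (None, MigrationMode::DefaultOpen) => Ok(Openness::Open),
    }
}
```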
I had updated the public documentation, completed Step 1, and was in the process of working on Step 2, when Google announced they were laying off 12,000 people, and I was one of them. So, I never finished this project, since, although the project is open source, if Google didn’t think completing the project was worth my time, who am I to disagree? 🙃 Instead I went to 20 different National Parks and did a whole lot of hiking.