
Transitive Core Concepts — #1: Full-stack Packages

Christian Fritz
Founder & CEO of Transitive Robotics

Transitive, the open-source framework for full-stack robotics, is built on three core concepts: full-stack packages, real-time data-synchronization with full-stack reactivity, and topic-based access control. In this post, part of a mini-series exploring these concepts, we describe the first.

The Need for Full-stack Packages

Software package managers serve three high-level functions: they define a standard format for packages, including versioning; they provide dependency resolution; and they (typically) provide a mechanism for distributing and installing these packages. There are a good many package managers, typically tied to a specific context, e.g., apt, yum, snap, npm, yarn, pip, uv, and cargo, to name just a few. It would seem plausible that at least one of these would be suitable for managing robotic software as well. None of them is, and the reason is a very specific need in robotics: cross-device dependency management.

All of these package managers were designed to resolve dependencies on one device: the machine where the package is being installed. But the robotics stack consists of several devices that all need to work together. At a minimum this includes the robot, the cloud, and a front-end UI, but in practice it may involve even more devices, such as on-prem servers, multiple compute units on the robot, and several UIs (e.g., web + mobile apps). Since the cardinality, update cycles, and constraints of these devices all differ, it is naive to think that they could all run the same version of software all the time. As a result, at any point in time, the fleet is often very heterogeneous in terms of the software running, with versions differing between robots, sites, customers, countries, and other grouping criteria.

The need to coordinate software running on different devices is specific to robotics and it is hence not surprising to find, yet again, that tools that were designed for end-devices and servers do not provide an ideal solution in robotics.

The Approach

The Transitive framework addresses this need by defining full-stack packages that bundle and version code for all participating devices: robots, on-prem servers, cloud, and web UIs. The idea for this approach was heavily inspired by the Meteor framework, which defines packages that include both cloud back-end and web front-end code. This has many novel benefits similar to the ones we will describe below. Unlike Meteor, though, Transitive realizes full-stack packages without defining a new package format, tools, and distribution mechanism from scratch; instead it extends and exploits npm packages. As such, Transitive packages are npm packages with added structure and meta-data. The content of the package consists of the sub-packages robot/, cloud/, and web/, plus additional payloads if necessary. The meta-data, added to the package.json file, defines Transitive-specific information about the package such as title, system dependencies, and pricing (for commercial packages).
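To make the package layout a little more concrete, here is a minimal sketch in Python. It only illustrates the idea of "an npm package plus structure and meta-data": the key name transitive_meta, the example package name, and the helper functions are hypothetical and are not the framework's actual field names or tooling.

```python
import json
import tempfile
from pathlib import Path

# Sub-package directories named in the text above.
SUB_PACKAGES = ("robot", "cloud", "web")


def make_manifest(name, version, title):
    """Build a package.json-like dict.

    "transitive_meta" is a hypothetical stand-in for the real
    Transitive-specific metadata fields (title, system dependencies, ...).
    """
    return {
        "name": name,
        "version": version,
        "transitive_meta": {"title": title, "system_dependencies": []},
    }


def missing_sub_packages(package_dir):
    """Return the expected sub-package directories that are absent."""
    root = Path(package_dir)
    return [d for d in SUB_PACKAGES if not (root / d).is_dir()]


manifest = make_manifest("@example/demo-capability", "0.1.0", "Demo Capability")

# Lay out an incomplete package on disk and check what is missing.
with tempfile.TemporaryDirectory() as tmp:
    for sub in ("robot", "cloud"):
        (Path(tmp) / sub).mkdir()
    (Path(tmp) / "package.json").write_text(json.dumps(manifest, indent=2))
    missing = missing_sub_packages(tmp)

print(missing)
```

Here the web/ sub-package was deliberately left out, so the check reports it as missing.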

Figure, illustrating the three components of a full-stack package

Transitive capabilities combine components for multiple execution contexts, typically robot, cloud, and web browsers. These components communicate via namespaced MQTT topics in MQTTSync (data-sync). Cloud and robot components run in a sandboxed environment to make it safe to install and run third-party capabilities.

The choice of npm as the package format to build upon was driven by several important features of npm. Like several other formats we considered, npm supports arbitrary payloads (npm packages are actually just tarballs), but unlike some others, such as apt, npm registries can hold multiple versions of the same package. This, of course, is important for supporting different update cycles as well as strict semantic versioning and version pinning of dependencies. Furthermore, npm is extremely easy to extend due to the simplicity of its HTTP-based registry API and the flexibility of JSON as a package description format. As such, the "capability store" of Transitive, which displays all capabilities (full-stack packages) available for installation on robotics fleets, is designed like a robotics-specific analog of the npmjs.com registry. Other features of npm we exploit are package scripts (pre-install, post-install, etc.); scopes, which allow us to distinguish Transitive packages from regular npm packages (dependencies) and to distinguish package authors; and the included gyp-based build system for compiling native code on-device during install when necessary.
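As a rough illustration of why multi-version registries matter, the sketch below picks the newest published version within a pinned major.minor line from a registry document. It assumes the public npm registry convention that a GET on the package URL returns a JSON document with a "versions" object, and that the slash in a scoped name is percent-encoded. The sample document and package name are hand-written placeholders, not real registry data, and the version parser ignores pre-release tags.

```python
from urllib.parse import quote

REGISTRY = "https://registry.npmjs.org"


def packument_url(package_name, registry=REGISTRY):
    """URL of the registry document listing all versions of a package.

    The slash in a scoped name (@scope/name) is percent-encoded.
    """
    return f"{registry}/{quote(package_name, safe='@')}"


def parse_version(version):
    """Parse 'major.minor.patch' into a tuple of ints (no pre-release tags)."""
    return tuple(int(part) for part in version.split("."))


def latest_matching(packument, major, minor):
    """Highest published version with the given major.minor, or None."""
    candidates = [
        v for v in packument["versions"]
        if parse_version(v)[:2] == (major, minor)
    ]
    return max(candidates, key=parse_version, default=None)


# Hand-written stand-in for a registry response (not real data).
sample = {
    "name": "@example/demo-capability",
    "versions": {"0.3.9": {}, "0.4.0": {}, "0.4.2": {}, "0.4.10": {}, "1.0.0": {}},
}

print(packument_url(sample["name"]))
print(latest_matching(sample, 0, 4))
```

Comparing versions as integer tuples rather than strings is what makes 0.4.10 rank above 0.4.2.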

When a user adds a capability to their fleet, this triggers the installation of the robot component on the robot by the agent. Once that has succeeded, the capability has started, and the robot-agent has reported the running version to the cloud, two things follow: 1) the Transitive cloud starts up the corresponding cloud component of the specific capability version in a Docker container (if not already running), and 2) any requests for this robot's web components are served the front-end bundle of the capability matching the version running on the robot. This mechanism solves the problem of cross-device dependencies. It completely eliminates the need to either coordinate the update of robots and cloud, or require backward compatibility of cloud and web code versions with robot code versions, an alternative solution employed by some robotics companies we have worked for. In addition, just like on mobile OSs (iOS and Android), these components run in a sandbox. On the robot, the sandbox uses the same Linux container technology as Docker (kernel namespaces, overlayfs), but without using Docker and in particular without requiring sudo/root privileges. In the cloud, the sandboxing is provided by Docker. The sandboxing is designed to ensure that capabilities cannot access sensitive information on the robot or from other capabilities, just like your phone OS ensures that, e.g., your banking app's information cannot be accessed by other, potentially malicious apps.
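The version-directed flow described above can be sketched as a small in-memory model. This is a toy under stated assumptions, not Transitive's implementation: the class CloudSketch, its method names, and the bundle identifier format are all hypothetical, and "starting a container" is reduced to adding an entry to a set.

```python
class CloudSketch:
    """Toy model of the cloud following the versions reported by robots."""

    def __init__(self):
        # (device, capability) -> version reported by that robot
        self.robot_versions = {}
        # (capability, version) pairs whose cloud component is "running"
        self.running = set()

    def report_version(self, device, capability, version):
        """Record a robot's running version.

        Returns True if a cloud component had to be started for this
        capability version, False if one was already running.
        """
        self.robot_versions[(device, capability)] = version
        key = (capability, version)
        started = key not in self.running
        self.running.add(key)
        return started

    def web_bundle_for(self, device, capability):
        """Identify the front-end bundle matching the robot's version."""
        version = self.robot_versions[(device, capability)]
        return f"{capability}@{version}/web"


cloud = CloudSketch()
cloud.report_version("d_bot123", "health-monitoring", "0.4.2")
cloud.report_version("d_bot456", "health-monitoring", "0.5.0")
print(cloud.web_bundle_for("d_bot123", "health-monitoring"))
```

Each robot is served the web bundle for its own version, so two robots on different versions can coexist without either side needing to be backward compatible.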

Apart from the npm-based package format, Transitive capabilities also heavily rely on the other two core concepts of the framework, especially the namespaced data-synchronization provided by MQTTSync. This communication is namespaced by both the package name (including scope) and the version number, at the discretion of the package author either at the major, minor (recommended), or patch level. This gives the participants (robot, cloud, etc.) a separate space to communicate about specific packages and different versions. An example of the resulting MQTT topic namespace used by MQTTSync is: /superbots/d_bot123/@transitive-robotics/health-monitoring/0.4/, denoting the namespace for organization superbots, device d_bot123, scope @transitive-robotics, capability health-monitoring, and capability version 0.4.
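A minimal sketch of how such a namespace could be derived from a full package version follows. The function names are hypothetical and the full version 0.4.7 is an assumed example; only the resulting topic format is taken from the example above.

```python
# Number of version components kept at each namespacing level.
LEVELS = {"major": 1, "minor": 2, "patch": 3}


def version_namespace(version, level="minor"):
    """Truncate 'major.minor.patch' to the chosen namespacing level."""
    return ".".join(version.split(".")[:LEVELS[level]])


def capability_namespace(org, device, scoped_name, version, level="minor"):
    """Build the topic prefix for one capability version on one device."""
    return f"/{org}/{device}/{scoped_name}/{version_namespace(version, level)}/"


print(capability_namespace(
    "superbots", "d_bot123", "@transitive-robotics/health-monitoring", "0.4.7"))
```

With minor-level namespacing, patch releases 0.4.7 and 0.4.8 share one namespace, while 0.5.0 gets its own.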

The attentive reader may have noticed one caveat: some cloud and/or web components do need to be able to explicitly aggregate data from multiple robots and versions, for instance for fleet-wide health monitoring, or simply for showing multiple robots on the same map. This problem remains and is not automatically solved by this namespacing approach. But the explicit versioning of the data itself makes it much easier for capability authors to implement such aggregations. This is further helped by Transitive's built-in auto-upgrade mechanism for robot capabilities. This auto-upgrade policy is both made possible by the namespacing (hence avoiding the need to coordinate such upgrades among a multitude of robots) and at the same time adds to its value, as it reduces the number of versions running at any one time to typically just one or two.
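To illustrate how explicit versions in the topic help with aggregation, here is a sketch that groups data by version across devices using standard MQTT wildcard semantics ('+' for one level, '#' for the remainder). It is not MQTTSync's API: the helper names, the status sub-topic, and the payloads are made up for illustration, and it assumes scoped package names so that the version sits at a fixed position in the topic.

```python
from collections import defaultdict


def matches(topic_filter, topic):
    """MQTT-style match: '+' matches exactly one level, '#' the remainder."""
    f_parts, t_parts = topic_filter.split("/"), topic.split("/")
    for i, f in enumerate(f_parts):
        if f == "#":
            return True
        if i >= len(t_parts) or (f != "+" and f != t_parts[i]):
            return False
    return len(f_parts) == len(t_parts)


def group_by_version(messages, org, scoped_name):
    """Group payloads as {version: {device: payload}} for one capability."""
    topic_filter = f"/{org}/+/{scoped_name}/+/#"
    grouped = defaultdict(dict)
    for topic, payload in messages.items():
        if matches(topic_filter, topic):
            # ['', org, device, scope, name, version, ...]
            parts = topic.split("/")
            grouped[parts[5]][parts[2]] = payload
    return dict(grouped)


# Made-up messages; the "status" sub-topic is hypothetical.
prefix = "@transitive-robotics/health-monitoring"
messages = {
    f"/superbots/d_bot123/{prefix}/0.4/status": "ok",
    f"/superbots/d_bot456/{prefix}/0.5/status": "warn",
    f"/otherorg/d_bot789/{prefix}/0.4/status": "ok",
}
print(group_by_version(messages, "superbots", prefix))
```

Because the version is part of the key, an aggregating component can apply a different schema to each group before merging the results.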

The Benefits

Several benefits result from this approach to software version management in robotics.

Full-stack capabilities!

The first and foremost benefit, of course, is the creation of real end-to-end capabilities that Just Work™. By bundling all the pieces of code that need to work together across device boundaries, we take the headache out of managing and coordinating their deployment and operation. No more "legacy code", no more arduous implementation pathways for upgrading to "v2", and much less interest paid on tech debt. If you want to change the data schema used by the robot component of your capability, you can do so immediately, simply updating the corresponding cloud and web components at the same time (within the same version of the package you publish). No need to first implement and roll out a forward-compatible cloud or web version. Your design decisions have just become a lot less sticky, and that reduction in stickiness is rewarded by higher agility: a must-have for many robotics companies as they solve the really hard problems of robotics, specific to their application, industry, and customer workflows.

Auto-deploy galore

Because of the way the framework itself ensures cross-device compatibility, deployment management becomes a whole lot easier. As described above, the version running on the robot directs the deployment of all other components, and robot components themselves auto-update at least once every 24 hours as well, if a newer version exists. The choice of letting the robot's version direct the others is natural for two reasons: robots are typically the most constrained in terms of networking and update time windows, and they usually outnumber all other participants (cloud deployments and clients) by a lot, often by orders of magnitude. This combination makes it particularly tedious to ensure all robots run the (ideally same) version they should be running. Hence designing cloud and web to accommodate the robots' needs is easier than the opposite and avoids unnecessary downtime on the most critical component of any robotic application: the robot itself.

As robots upgrade to newer versions of software one by one, the cloud can follow suit and stop the Docker containers of capability versions no longer running on any robot, freeing up resources.
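The clean-up step amounts to a set difference between what is running in the cloud and what the robots still report. A minimal sketch, with hypothetical names and data structures (the real bookkeeping is not described in this post):

```python
def containers_to_stop(running, robot_versions):
    """Return cloud components no robot needs any more.

    running:        set of (capability, version) pairs running in the cloud
    robot_versions: dict mapping device id -> (capability, version) it runs
    """
    in_use = set(robot_versions.values())
    return sorted(running - in_use)


running = {("health-monitoring", "0.4"), ("health-monitoring", "0.5")}

# Both robots have upgraded to 0.5, so the 0.4 container is no longer needed.
robot_versions = {
    "d_bot123": ("health-monitoring", "0.5"),
    "d_bot456": ("health-monitoring", "0.5"),
}
print(containers_to_stop(running, robot_versions))
```

As long as a single robot still reports 0.4, that container stays up, which is what lets robots upgrade one by one without coordination.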

Encapsulation!

Just like in object-oriented programming, the concept and implementation of full-stack packages ensures that things that belong together stay together, and are versioned and deployed together. This encapsulation and separation of concerns makes authoring and maintaining capabilities a breeze. It reins in complexity by setting clear boundaries and empowers the developer to make local decisions, with less time spent in meetings. The clear separation into vertically integrated, functionality-specific packages reduces cross-bleed and the accidental creation of dependencies within the code-base.

Try it yourself!

It is easy to get started using Transitive and developing your own full-stack capabilities. Our npm package init script creates a fully-functional template capability for you so you can focus on coding the actual full-stack logic right away. We think that once you have experienced full-stack robot + cloud + web packages and real-time data-sync with UI reactivity, you will not want to go back to the old way of implementing such functionality for your robots. And if you are developing something new with Transitive, please join our community Slack and share.