There are many reasons and circumstances that require a robot operator to see through the eyes of a robot from afar. This is obviously the case for robots that are remote controlled, but the need also arises with autonomous robots. Examples include incident resolution, where a robot calls for help because it is aware of a problem it cannot resolve on its own, or where a customer has reported an issue with the robot. Other examples include routine fleet monitoring, applications where robots are used for remote inspection, and AI applications that process video data in the cloud. In fact, sight is such a fundamental sense to humans, and the ability to "see at a distance" feels so enabling to robotics customers, that a robotics company is well advised to offer it to its users whenever possible and appropriate.
Requirements and Criteria
In practice, robotics companies have taken a number of different approaches to streaming live video from their robots for these purposes. Before we discuss these approaches and their pros and cons, let us first enumerate some of the requirements and criteria of a good video streaming solution.
In our conversations with robotics companies, as well as in our own experience, low latency is one of the most important requirements. Remotely tele-operating a robot when there is significant lag in the video stream is not only frustrating and exhausting for the operator, who can then usually only proceed in small steps, it is also unsafe in most cases, since the surroundings of the robot may change faster than the operator is able to react. Similarly, other applications of video streaming become less valuable when the video is delayed. As a practical reference, latency should be less than 500 ms.
No matter how your robots are connected to the Internet, your available bandwidth will almost certainly be limited. This is especially true over a cellular connection, in which case your data usage will also incur significant cost. A 5 GB/month data plan on a 5G network in the US costs anywhere from $30 to $50 per robot, and video data can get large.
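To get a feel for how quickly video eats into such a plan, here is a small back-of-the-envelope sketch. The function name and the three bitrates are illustrative assumptions, not measurements; only the 5 GB plan size comes from the text above.

```python
def streaming_hours(plan_gb: float, bitrate_kbps: float) -> float:
    """Hours of continuous streaming that fit into a data plan.

    plan_gb is in decimal gigabytes (1 GB = 1e9 bytes);
    bitrate_kbps is the average video bitrate in kilobits per second.
    """
    if bitrate_kbps <= 0:
        raise ValueError("bitrate_kbps must be positive")
    plan_bits = plan_gb * 1e9 * 8
    seconds = plan_bits / (bitrate_kbps * 1e3)
    return seconds / 3600


# Assumed example bitrates (hypothetical, for illustration only).
for kbps in (250, 1000, 4000):
    hours = streaming_hours(5, kbps)
    print(f"{kbps:>5} kbit/s -> {hours:5.1f} h of video on a 5 GB plan")
```

At an assumed 1 Mbit/s, a 5 GB plan is used up after roughly 11 hours of streaming, which is why bandwidth efficiency matters so much on cellular connections.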
Ability to operate on unreliable networks (robust)
Network connections of robots are typically unreliable, especially as they move around. Wi-Fi connections tend to suffer from varying degrees of coverage and poor roaming support, but also from simple configuration mistakes by a customer. This happens at customers small and large, which is why many robotics companies prefer to rely on a cellular connection even indoors. However, cellular connections have their own problems. Both indoors and outdoors there can be gaps in coverage as well as significant fluctuations in available bandwidth. A reliable video streaming solution needs to handle these situations gracefully, in particular packet loss, disconnect and reconnect events, and fluctuations in available bandwidth.
Reasonable frame-rate (fast)
Together with low latency, many applications of video streaming also benefit from a reasonable frame rate -- at least 5 frames per second, better 15, and sometimes 30 is required. Note also that at a frame rate of 10 Hz, up to 100 ms of additional latency is introduced merely by the time between frame updates.
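The relationship between frame rate and this added latency is simply the frame interval, 1/fps. A minimal sketch (the function name is our own, chosen for illustration):

```python
def frame_interval_ms(fps: float) -> float:
    """Time between two consecutive frames, in milliseconds.

    This is the worst-case extra latency contributed by the frame
    rate alone: an event that happens right after a frame was
    captured only becomes visible one interval later.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


for fps in (5, 10, 15, 30):
    print(f"{fps:>2} fps -> up to {frame_interval_ms(fps):.0f} ms between frames")
```

At 5 frames per second the frame interval alone accounts for up to 200 ms, a sizable share of the 500 ms latency budget mentioned earlier.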
Access to the video stream is as important as the stream itself. To enable a large number of users of varying professional roles and backgrounds to watch the video, it is essential to remove hurdles. This means reducing the software that needs to be installed and configured on a watcher's device, but also simplifying authentication and authorization as much as possible. Ideally, a user would be able to start watching without having to set up any new software on their device or configure a VPN to gain access.
Last but definitely not least, the stream needs to be secure. End-to-end encryption is ideal as it maximizes privacy for the robotics customer. At the very least, transport-level encryption needs to be used to secure the video as it streams through the systems of network and cloud providers, and any required cloud proxies should be configured to relay the stream without decrypting and re-encrypting it. This means that a simple proxy that maintains two TLS connections, one with the robot and one with the watcher, is not sufficient in all cases, because the video is available in decrypted form on the proxy itself.
While many, and probably the majority of, robotics companies use ROS to write the software running on their robots, there is no standard platform like ROS for functionality like video-streaming that requires code running not just on the robot, but also in the cloud and on the front-end. As a result, different robotics companies have implemented their own robotic cloud stack and with it their own video-streaming solutions. We'll describe some common approaches and discuss their pros and cons. After looking at what people do in practice we'll evaluate these approaches against the requirements and criteria described above.
Sending individual images
This is by far the most straightforward approach and very often the first one taken by robotics companies. In this approach the robot captures still images from its cameras and transmits them as such over TCP to a cloud server. The cloud relays the images to web clients where they are displayed by repeatedly updating an HTML <img> tag. The primary benefit of this approach is that it is relatively quick to implement and doesn't require a lot of knowledge of video formats. In fact it requires none of that, which is also its greatest shortcoming: it is horribly inefficient. As many readers will already know and as we will see shortly, modern video compression algorithms are able to reduce the size of the stream by two orders of magnitude and thereby use the available network bandwidth far more efficiently. Without this compression, most practical implementations can only support very low frame rates -- typically 1-2 frames per second -- and often resort to grayscale images to reduce the size. This approach is also prone to man-in-the-middle attacks: in the naive implementation, several parties besides the robotics company itself may be able to tap into the video stream.
RViz via ssh-tunnel or VPN
RViz is a powerful visualization tool for ROS built on Qt. It is primarily meant for developers, but it does support visualizing video feeds from a robot's cameras and hence some companies use it for remote video streaming. To make this work, the watching RViz client needs to be put on a common network with the robot, which requires an SSH tunnel or some form of VPN. This makes the approach cumbersome to set up, especially on non-Ubuntu computers -- forget about mobile devices. It also suffers from the same terrible inefficiency as the previous approach, since ROS, too, transmits these streams as sequences of individual camera images.
Foxglove via a remote websocket connection
Foxglove can be described as a modern, web-based replacement for RViz that, by being available on all major operating systems as well as in the browser, elegantly solves the accessibility problem RViz suffers from. This enables a much greater variety of users to see robotic data, including video. Nevertheless, to use Foxglove for video streaming one still requires a VPN or a custom-made cloud proxy to establish the connection between the robot and the client. As of this writing, Foxglove doesn't natively support video compression yet either, so it suffers from the same bandwidth inefficiency as the previous two approaches, since again individual images are transmitted. Foxglove is a big step in the right direction, though, and the team is working on adding support for h264-compressed video as well.
web_video_server
The web_video_server is a ROS package that has been available for many years. It consumes images from a ROS topic and compresses them into a video stream using modern compression algorithms including VP8 and h264. It then exposes an HTTP server on the robot from which these streams can be consumed. This solves the problem of bandwidth inefficiency. However, HTTP, which uses TCP, is a sub-optimal choice for streaming over unreliable networks. This is because TCP guarantees in-order delivery of all data, retransmitting whatever was lost during an interruption -- a property one actually doesn't want with video, because it results in lag building up with each network interruption. The better policy is to drop frames that are too old and show the most recent video frames instead. The server is also not able to perform any sort of congestion control, i.e., reduce picture quality or frame rate when available bandwidth fluctuates. Lastly, in order to connect to this HTTP server from afar, again a VPN is required, or a cloud proxy that forwards this (unencrypted) stream.
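The "drop old frames, show the newest" policy can be sketched as a one-slot buffer between the camera and the network sender. This is an illustrative sketch only, not how web_video_server or any particular library works; the class names, the `dropped` counter, and the 0.5 s age limit (borrowed from the latency target above) are our own assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Frame:
    data: bytes
    captured_at: float  # capture time in seconds (e.g., from a monotonic clock)


class LatestFrameSlot:
    """Holds at most one frame; a newer frame replaces an unsent older one.

    Hypothetical helper for illustration. Not thread-safe: a real
    implementation would guard put() and take() with a lock.
    """

    def __init__(self, max_age_s: float = 0.5):
        self.max_age_s = max_age_s
        self._frame: Optional[Frame] = None
        self.dropped = 0  # number of frames discarded instead of sent

    def put(self, frame: Frame) -> None:
        if self._frame is not None:
            self.dropped += 1  # the unsent older frame is overwritten
        self._frame = frame

    def take(self, now: float) -> Optional[Frame]:
        """Return the newest frame, or None if there is none or it is stale."""
        frame, self._frame = self._frame, None
        if frame is None:
            return None
        if now - frame.captured_at > self.max_age_s:
            self.dropped += 1  # too old to be worth sending
            return None
        return frame


slot = LatestFrameSlot(max_age_s=0.5)
slot.put(Frame(b"frame-1", captured_at=0.0))
slot.put(Frame(b"frame-2", captured_at=0.1))  # replaces frame-1
sent = slot.take(now=0.2)
print(sent.data if sent else None, "dropped:", slot.dropped)
```

With a queue that never drops anything, as TCP effectively provides, every frame captured during an outage would still be delivered afterwards, and the viewer would stay behind by the length of the outage.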
WebRTC
WebRTC, Web Real-Time Communication, is still a relatively new protocol that so far is primarily implemented by browsers for use in video conferencing. It is supported by all modern web browsers, making it readily accessible on any device without additional setup, and, given its purpose, it has many desirable properties and features for live video streaming. These include support for modern video compression algorithms like VP8 and h264 (plus VP9 and h265 if desired), graceful handling of packet loss, use of the ICE framework for automatically finding the best network route between the two peers (robot and browser in our case), congestion control to dynamically adjust picture quality to account for fluctuations in available bandwidth, and end-to-end encryption. WebRTC uses UDP by default, and typical latency is around 200 ms.
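To make the idea of congestion control a bit more concrete, here is a deliberately simplified sketch of a sender adapting its target bitrate to observed packet loss using additive increase and multiplicative decrease (AIMD). This is not the algorithm WebRTC implementations use; real stacks rely on considerably more sophisticated bandwidth estimators. The function name and all thresholds and step sizes are assumptions chosen for illustration.

```python
def next_bitrate_kbps(
    current_kbps: float,
    loss_fraction: float,
    *,
    min_kbps: float = 100,       # assumed floor below which video is unusable
    max_kbps: float = 4000,      # assumed ceiling for the encoder
    loss_threshold: float = 0.02,
    step_kbps: float = 50,       # additive increase per feedback interval
    backoff: float = 0.7,        # multiplicative decrease on congestion
) -> float:
    """Return the encoder's target bitrate for the next feedback interval."""
    if not 0.0 <= loss_fraction <= 1.0:
        raise ValueError("loss_fraction must be between 0 and 1")
    if loss_fraction > loss_threshold:
        proposed = current_kbps * backoff    # back off quickly when packets are lost
    else:
        proposed = current_kbps + step_kbps  # probe slowly for more bandwidth
    return max(min_kbps, min(max_kbps, proposed))


# Simulated loss reports: a clean network, a congested stretch, then recovery.
bitrate = 1000.0
for loss in (0.0, 0.0, 0.10, 0.08, 0.0, 0.0):
    bitrate = next_bitrate_kbps(bitrate, loss)
    print(f"loss={loss:.2f} -> target {bitrate:.0f} kbit/s")
```

The encoder would then lower picture quality or frame rate to meet the new target, which is the behavior the image-based and HTTP-based approaches lack.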
If you've paid attention, you will have noticed that WebRTC scores high on all the requirements and criteria we've laid out above. So naturally everyone in the robotics industry is using it for video streaming, right? Wrong! As we will see, WebRTC is not yet used very much in robotics and the reason for that is actually quite simple: it's still a hell of a task to implement on an end device, i.e., anything that is not a browser. Even building your own WebRTC application for use between two browsers is not trivial. Outside of browsers there aren't many libraries one can use, and those that exist are still very much works in progress. There has been tremendous progress on such libraries in recent years, but in order to benefit from many of the great features of WebRTC named above, one still has to do a lot of work oneself. The truth about WebRTC is that it is not one protocol or standard; it is a loose collection of several RFCs, each specifying approaches for different aspects of what, as a whole, is needed for video conferencing. On top of that, robotics companies often run versions of Ubuntu that are already a few years old -- either because upgrading a whole fleet is not easy, or because they still run ROS 1, which is not supported beyond Ubuntu 20.04. This is a problem for implementing WebRTC since the libraries included in those older versions of Ubuntu are still missing a lot of the fixes and features that make WebRTC so desirable.
A report from the field
We polled roboticists on two different online forums to find out what people currently use for video streaming from their robotic fleet in practice. Here are the results for those respondents who do use video-streaming on their robots (around ⅔ of all respondents).
We invited people who built their own solution in-house to comment with details. None of those who commented used WebRTC, which is not to say that people don't use it. Through other channels we have heard from several companies that have already made the switch to WebRTC. It seems fair to say, though, that those are still exceptions.
Summing it all up, we evaluate the five approaches along the requirements and criteria laid out above as follows. Disclaimer: this evaluation is meant as a practical guide for someone choosing between these approaches and is definitely subjective in some regards.
Note that low latency is only provided by the first four approaches so long as there are no network disruptions, packet loss, or dips in available bandwidth. If any of these do happen then significant latency can build up -- even as much as 30 seconds.
Foxglove and web_video_server can be made accessible without the need for a VPN that each watcher would have to configure and join, provided an appropriate cloud proxy is set up.
Regarding security, it seems only fair to say that all of these approaches can be made almost as secure as WebRTC by choosing appropriate network-level encryption. However, with them the robotics company itself typically still retains the ability to watch ongoing streams, i.e., they do not provide privacy between the two peers. This last aspect may very well be a deciding factor when choosing a solution, especially when the robot is used in any kind of security or surveillance context, where customers may not want third parties to be able to tap into the stream.
After many years of robotics companies trying a variety of approaches to streaming video from their robots, there now exists a clear winner: WebRTC. It provides many features required for reliable, low-latency streaming that many of us were not even aware of as we were exploring other means in the past. The primary downside of WebRTC is the complexity involved in implementing it, but we believe it is worth it.
Still not convinced? Try it on your own robots by installing our ready-to-go WebRTC Video capability -- which you can do for free when you register for an account on our hosted offering of Transitive. It only takes a few minutes and the capability includes everything you need: the code for the robot to tap the video stream and a web UI component for displaying the video. All required cloud services are provided by us. Afterwards you can still decide to implement your own if you want, or just continue using ours by embedding the provided UI component in your own robot web portal, a.k.a. fleet management system.
Transitive is an open-source framework for full-stack robotic applications with a modular architecture. It is developed and maintained by Transitive Robotics, which also offers commercially supported capabilities ("apps") that run on Transitive such as video-streaming, remote-teleop, and health-monitoring. Transitive accelerates the development of commercial robotics applications by providing robotics companies with a growing collection of capabilities they can readily integrate into their fleet management systems.