AI: omnipotent or impotent?
Like “cloud” and “big data” a few years ago, “AI” has been pretty hype-y for the past couple of years. It’s always important to look beyond the hype, to try to understand if there’s a real technical advance underlying the trend, and what it is. In the case of AI, there sort of is actually, as this figure shows:
Basically, between 2011 and 2015, computers got to be about as good at recognizing objects in images as people are. They also got a lot better at understanding language; the two problems are similar. I worked in a speech lab for a summer in 2006, and it’s hard to overemphasize how much different this curve is than the progress back then. At that time, entire Ph.D. theses were written about low-single-digit improvements on standard tasks like this.
That’s cool! And that’s what all the hype around RNNs and so forth has been about. For the IoT world, this means that products which involve imaging in some way have gotten a lot better. For instance PIR sensors were state of the art a few years ago for occupancy sensing; now you can buy a VergeSense or PointGrab that does a great job not just of occupancy detection, but people counting or lots of other features you can extract from an image. Verkada will upgrade your CCTV system with structured data, extracted from the live feed. Nest will identify children, dogs, and fire alarms. There are plenty more floating around.
So, are we close to the omniscient AI overlord? In a word, no.
When I think about “AI” applications, it’s useful to divide them into a few types. The first, and easiest to deploy, are “sensor enriching” applications — things like Nest, VergeSense, and Moen Flo; these use AI to enrich an existing sensor stream with structured events — frequently but not always from images. Flo is actually my favorite residential IoT app so far, and extracts water-fixture-level data from the main water meter on your house, and shuts off the water when it detects a leak. Stuff like this has appeared in the literature for decades (for instance in 2011), but now you see it reaching the mass market. I would also place techniques like certain types of fault detection into this bucket (some of which involve more “AI” than others).
I like this kind of application because you basically take an existing type of sensor (water meter, camera, whatever) and add the ability to interpret the data automatically. They’re easy to deploy since they don’t really require cross-system integration: instead of a “dumb” camera, you now have a “smart” camera. You don’t have to be right all the time to be useful since you are layering on top of a lower-level data stream.
A second area that has been promising for a while, but may start to see some adoption is the use of AI and image processing techniques to populate semantic models, which we previously discussed. Sometimes called digital twins, companies like C3.iot are commercializing reading paper diagrams, (P&ID ones in this example) and building digital representations of the data they contain. The community has long looked for automatic mechanisms for applying semantic tags to IoT data streams (e.g, this one from 2015; there are many more). You’ll also start to see AI techniques appearing in features like the Azure IoT indoor maps ingestion as well, to build useable geographic representations from legacy formats like IFC, 3DXML, or other BIM formats. Much of the work the various schema groups supports this type of application by defining the target model being populated.
These applications suffer from a “correctness trap” — because they use inference algorithms or other stochastic techniques to extract structured data from noisy sources, you usually can’t guarantee correctness. For instance, if you use a neural net to read plans and extract fire damper symbols, there’s really no way to guarantee that you found them all. While this might not be a practical problem most of the time, it requires a mindset adjustment. I would probably not be happy with the architect who, in response to the question “how many smoke detectors does this building have” replied “120, +/- 2%”. It may also rule out certain types of applications where the inability to guarantee correctness, or the need for manual review makes the techniques uncompetitive.
The “holy grail,” and perhaps what most people jump to when they think of “AI for IoT” is some sort of omniscient puppet master/decision agent, orchestrating and optimizing nobs at multiple levels of abstraction, and across multiple systems to achieve some sort of global optimal. Things like the Google DeepMind datacenter (also this) result hint at this type of vision or end state. More narrowly, there has been a lot of work in specific domains, for instance applying MPC to building automation, or coordinating building operations with the electric grid. This is the kind of thing that digital twins might be used for.
While details about what is involved in implementing such techniques are not easily available, my suspicion is that we are years from any type of widespread application. Just like computer systems, the physical systems IoT connect to consist of a layered set of abstractions, with well-defined interfaces between them. Often, “AI” will try to look “through” these layers to find optimal operation points that would be difficult to otherwise determine. This type of optimization typically requires deep visibility across these interfaces and systems; or the ability to affect changes and observe the result.
Furthermore, although huge progress has been made in vision and signal processing, I’m not sure the same can be said about decision agents. For instance, much of the slow progress in self-driving cars seems more due to the difficulty of the planning problem, rather than perception challenges.
So what’s going to happen? Roughly speaking, this framework gives us a way to answer the question.
Data apps, which layer a structured interpretations of sensor data streams will see rapid deployment and adoption. The core techniques have seen substantial progress; and there are no significant barriers to deployment — they are a “better mousetrap.”
Twin construction applications are currently high in hype, low on results. Standards ontologies and even application use-cases are not widely agreed upon; and deploying these is significantly harder than interpreting single data stream due to the need to integrate across system, organization, and security boundaries. And, once deployed, it’s not always obvious what to do with a twin.
Puppet-master applications will continue to provide large wins for organizations with the expertise and patience to implement them, but will not see widespread deployment due to the challenge of achieving visibility and control through multiple layers of a complex stack, and the risks of this type of control.