(Note: This exercise was originally developed in the days when online maps were far less ubiquitous, but it stands the test of time, even with modern technology.)
You are assigned a task: create a map of New York City. You start with a blank piece of paper, as big as you want it to be. The question is: How do you draw the best possible map that can be drawn?
If you’re not a professional cartographer, don’t worry. We can step through it together informally, allowing stream of consciousness to be our guide.
Some basics are clear and not in doubt. For starters, the map needs to be neat and accurate, with things shown to scale.
Our map will need all the streets and street names. So let’s add those.
Our map also requires the names and locations of parks and subway stops. Maybe subway routes too. Add all those.
It needs the names of buildings. Maybe the location of alleys, walkways and bicycle paths. Add those.
It would be really great to know where the ramps from sidewalk to street are. And the restaurants and hotels and theaters and attractions. Add all those.
Location of curbs? Add them. Speed limits? Add those too. Fire hydrants, manhole covers, major and minor trees, addresses, apartment numbers, apartment layouts?? Add add add!
You get the idea by now. In a quest for perfection, our tendency is to add everything and subtract nothing. It’s an American attitude and it’s not always wrong: more is just better! This attitude is echoed by a common thought pattern taught to engineers from their earliest days: we find answers by digging deeper. If we just keep adding until we’ve added everything, then the map will be perfect. Right?
Wrong, of course! If we must add every single thing, then the resulting map of New York City becomes literally the size of New York City. In the end we have created not a map, but a life-size replica. With every grain of detail present, aggregating nothing, we have not created a solution. We have merely repeated the problem: a large, complex thing that cannot be easily traversed.
A map is useful precisely because it does not include everything. It aggregates and simplifies. When I see a map of the whole city on my phone, I can’t see much. A few bridges and tunnels connect Manhattan to the rest. The Bronx is up and the Battery’s down; Central Park is in between. That’s about all I can see on my iPhone screen; and that’s fine. If the map program attempts to show me every street, much less the name of every street, it is immediately choked with information. I see that information when, and only when, I zoom in tighter.
We marvel at Google Maps (and similar programs from Apple and others) because of the breathtaking detail available to us at lower levels. All that detail is indeed a triumph, especially to old guys like me who grew up with paper maps. But no less a triumph than the detail is the intuitive scaling and re-scaling of these electronic maps as we zoom in and out. Each scale shows the relevant information that can be shown at that scale. And not more. In fact, “not more” is the whole point.
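To make the “each scale shows only what fits at that scale” idea concrete, here is a minimal sketch of level-of-detail filtering. It is not how any real map engine works; the feature list and zoom thresholds are invented for illustration.

```python
# Hypothetical level-of-detail filter: each feature declares the minimum
# zoom level at which it is worth drawing. Features and thresholds are invented.
FEATURES = [
    {"name": "Manhattan",          "kind": "borough",      "min_zoom": 1},
    {"name": "Central Park",       "kind": "park",         "min_zoom": 4},
    {"name": "Broadway",           "kind": "street",       "min_zoom": 8},
    {"name": "W 42nd St hydrant",  "kind": "fire_hydrant", "min_zoom": 16},
]

def visible_features(zoom_level):
    """Return only the features appropriate to draw at this zoom level."""
    return [f for f in FEATURES if zoom_level >= f["min_zoom"]]

print([f["name"] for f in visible_features(4)])   # boroughs and parks only
print([f["name"] for f in visible_features(16)])  # everything, down to hydrants
```

The useful map is the one that throws information away at every zoom level except the last.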
Application Example: Safety Monitor Strategies
For people in the software safety business, there’s an important application of this problem to safety-critical vision processing systems. Or rather, to the challenges of safely applying vision-processing systems in critical applications.
In some fields of safety-relevant controls engineering, we deploy a so-called “safety monitoring” technique. This typically means that some primary, complex software performs the control function (on a motor controller, for example), and a simpler, independent piece of software “monitors” the primary software for safety. The primary control can do whatever it wants, so long as the simple safety monitor is satisfied that no unsafe command has been issued. But if the safety monitor sees something unsafe, it is empowered to shut down the system to ensure safety.
This “safety monitor” architecture is widely used in motor controls. For example, it’s highly likely the engine or e-motor in your car employs such a safety strategy. Multiple times per second, a piece of software looks for hazardous conditions. (For example, if your engine is at wide-open throttle and burning up the tires, yet your foot is not pushing the accelerator pedal, that is, ahem, hazardous!) If such a hazardous condition is observed, the safety software intervenes and shuts everything down. Otherwise, in normal conditions, the safety monitor software stands down and allows the much more complex engine control software to do its thing. Safety monitoring strategies (along with many deeper and more complex implementation details) are industry standard in automobiles today.
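To show just how small such a monitor can be, here is a toy sketch of the core check in the throttle-versus-pedal example. The signal names and thresholds are invented, and a real automotive monitor runs on independent hardware with qualified sensor data and far more checks; this is only the shape of the idea.

```python
# Toy sketch of a periodic safety-monitor check for an engine controller.
# Signal names and thresholds are invented for illustration only.
THROTTLE_FAULT_THRESHOLD = 0.9   # commanded throttle considered "wide open"
PEDAL_IDLE_THRESHOLD = 0.05      # driver's foot effectively off the pedal

def monitor_step(commanded_throttle, pedal_position, shutdown):
    """Runs many times per second, independently of the main controller.

    If the engine is commanded to full power while the driver is not pressing
    the accelerator, that is a hazardous mismatch: shut everything down.
    """
    if (commanded_throttle > THROTTLE_FAULT_THRESHOLD
            and pedal_position < PEDAL_IDLE_THRESHOLD):
        shutdown("throttle command inconsistent with driver input")

# Example: a hazardous condition triggers the shutdown path.
monitor_step(commanded_throttle=0.95, pedal_position=0.0,
             shutdown=lambda reason: print("SHUTDOWN:", reason))
```

The point is that the monitor never needs to understand how the primary controller works; it only needs a simple, checkable definition of “unsafe.”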
This safety-monitor approach is a great safety architecture for motor controls, and for some other applications as well. But it’s difficult or impossible to deploy this strategy for safety-critical video processing; for example, to enable self-driving or driver-assist features by looking for pedestrians and other cars in real-time video. Why? Can’t we process video with our main application software, and then check with a safety monitor to see whether it was processed correctly?
Not exactly. The whole strategy assumes that the monitor software (a simple, small piece of code) can “see” unsafe conditions as they occur. In motor controller situations, such conditions are readily visible with simple sensors and algorithms. But for complex machine-learned sensor fusion algorithms, such as computer vision for pedestrians and other objects, there is no clean definition of “something unsafe” and no simple way to perform a safety-vision-monitor function. The only way to do it would be to create a new and independent computer vision function, repeating an implementation of all those very complex machine-learned sensor fusion algorithms. In other words, the monitor software cannot see safety problems well enough unless we make it just as capable and as complex as the primary function software. The map of New York City has just ballooned into the size of New York City. The strategy is no longer valuable.
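Comparing the two monitors side by side makes the problem visible. The motor monitor above checks a couple of scalar signals against fixed thresholds. A “vision monitor” has no such scalar: to decide whether the primary perception system missed a pedestrian, it would itself need to find pedestrians in the image. In sketch form, with the detector as a stub standing in for what would have to be a full perception pipeline (none of these names are from a real API):

```python
# Why a vision safety monitor balloons: to judge the primary system's output,
# the monitor must itself perceive the scene. The detector below is a stub
# standing in for a full machine-learned pipeline, which is exactly the point.
def stand_in_pedestrian_detector(camera_frame):
    # In reality this would be a second, independent perception stack,
    # roughly as large and as complex as the one it is supposed to check.
    raise NotImplementedError("requires its own full perception pipeline")

def vision_monitor_step(camera_frame, primary_detections, shutdown):
    """Hypothetical monitor: only meaningful if it can perceive independently."""
    independent_detections = stand_in_pedestrian_detector(camera_frame)
    if independent_detections != primary_detections:
        shutdown("perception disagreement")
```

There is no one-line threshold check hiding inside that stub; the “simple” monitor has become a second copy of the hard problem.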
I don’t mean to say that vision processing can’t be safe, or that safety monitors are totally worthless in such an application. They can work to some degree. But they’re a challenge, because the old ideas of abstraction and simplification do not solve the basic safety problem. It means you can’t abstract much… and in systems terms, that is akin to being stuck at the street level where you live, unable to access the bird’s-eye view that an aggregated map would provide.
Mammalian brains handle vision processing in two major layers (an oversimplification). Parts of the thalamus and midbrain are responsible for not bumping into things and for eye-hand coordination, while the occipital lobe at the back of the brain is responsible for recognizing what things are and describing them. It could be worth taking inspiration from this and building a safety system that only handles the not-bumping-into-things part... except most cars' vision systems are only that anyway.
Okay, it might be useful to have what is normally the whole system serve as a safety system, if you're Tesla and also have an image-processing system built from loads of driver data.
You know, I could get a lot of mileage out of metaphors where safety systems and hardware controls are compared to the parts of the brain inherited from our common ancestors with fish or invertebrates. Except either you'd have to be incredibly nerdy to understand, or I would end up explaining everything twice. >_<