AI Building Cameras

In my professional life I’m part of a team/org that builds camera products. I spent some time in my graduate school days learning about image formation and cameras. Modern digital cameras are highly complex systems; there is a lot going on inside a simple-looking digital device.

I was listening to the Dwarkesh Patel podcast episode “Sarah Paine: How Imperial Japan Crushed Tsarist Russia and Qing China.” There are several threads running through this excellent analysis, but the TL;DR is:

Japan’s reforms helped it gain state capacity
China’s strategic missteps created opportunity for Japan
Russia’s Tsarist regime didn’t have institutions to avoid strategic blunders

It illustrates the need for building institutions and state capacity. One interesting thread in the episode was how important it is for adversaries to understand each other’s capabilities. For example, Sarah makes the case that Japan saw what happened to China during the Opium Wars as an existential threat to itself. The Japanese viewed building state capacity as essential for survival - to avoid colonization. When they went to battle with the Russians, in the Japanese people’s minds there was a lot to lose, whereas for the Russians the battle was framed in terms of expanding territory. Her argument is that if you know the internal logic on both sides, you can almost predict the outcome of the war. The side that has more to lose should be favored to win - provided it doesn’t make strategic errors.

Why am I talking about wars in a post about AI? It occurred to me that if we pit ourselves against “AGI”, then the analysis of wars provides a framework for understanding who is likely to win. That is the case I want to make. Furthermore, it might help us devise evaluations that tell us where we are on the journey to AGI.

It’s really important to understand what “AGIs” can and cannot do from multiple perspectives. Let’s dig a little into what it takes to build a single complex system, to understand what’s hype and what’s not. Cameras are an important complex system - they let us (or an AI) see the world remotely. An AGI would have to rely on cameras to know what’s unfolding at a remote location, or even locally, since it doesn’t have biological eyes.

Let’s take a modern-day digital security camera and look at the constraints it needs to satisfy. A security camera is built to:

handle all-season weather
connect to the internet
have optimal algorithms to raise alarms (or detect events)

The camera contains the following components:

Mechanical hardware
- Optics
- Materials and design for operation in a wide range of temperatures
- Mounts
Electronic hardware
- Imager
- Silicon/PCB
- Battery
- LEDs
- Wires/cables
- Mic/speaker
- Radio
- Sensors
- Power management
Software
- Drivers
- Operating systems
- Firmware and application software
- AI models

The mechanical hardware deals with problems of the environment such as:

how to make the camera IP65 rated
how to manage the heat produced by the various chips during, say, an Arizona summer
how to keep the camera stable during operation
how to make it easy for people to change batteries
how to position the sensors for their optimal operation
how to provide any material/mechanical support to improve sensor fidelity

Electronic hardware deals with functional requirements of the camera:

what field of view should the camera have
how well can the camera record video
can it record video and live stream at the same time
how to make all these electronics work together, supply them with the right currents, etc.
can the camera record and emit audio
does the camera have its own illumination
does the camera handle other EM waves (antennas for Wi-Fi, Bluetooth)

Software deals with camera operations:

what format should the camera record video in?
how to let the camera record and live stream at the same time?
how to communicate with other devices? how to communicate with phones and Wi-Fi access points?
when should the camera record and raise an alarm?
how to produce the best quality video with the available resources?
how to produce the best quality audio with the available resources?
how to keep the camera secure while connected to the internet?
how to update the camera without rendering it useless?
how to add new capabilities to the camera?
how to deal with hardware limitations such as maximum disk write limits, sensor limitations etc.?

I’m sure I’m missing something important despite the long list of questions above. A lot of these problems are “solved” to a degree, but even then, every new product and every bit of technological progress brings new challenges that need to be solved in each of these three major domains. One way companies deal with this is by becoming institutions. They discuss these problems, document hard-won lessons, and preserve their knowledge even as people come and go. This is the institutional capacity of a corporation, and it is essential in driving progress towards organizational objectives. Every corporation is built on the assumption that the people occupying the various roles on its teams are skilled and have agency and intelligence. The problems above are all engineering-focused. Companies employ more than just engineers, even if they are technology companies. They need to solve many other problems to survive and make money.

Can AGI solve all these problems on its own, without relying on human resources? I doubt it. Just look at the number of people it takes to build a camera. You need even more people to sell it and make money. Each of these people is a general intelligence in their own right. And this is just for building one single complex system.

In the episode itself, there is some discussion of how institutions and state capacity are key to achieving a nation state’s goals, including economic and productivity growth. Productivity growth is related to progress. You can’t have one without the other. If that is true for human societies, shouldn’t it be true for AGI as well? What is the equivalent of state capacity for AGI? Since AGI is trained on human data, it’s pretty reasonable to assume that it might think that it’s “human.” It would inherit or copy the institutions of human society, in the manner of Japan’s Meiji reforms. Would that work?