Video Analytics is a broad term for technology that analyzes video frames and either alerts a user or client to specific things it has learned to recognize, such as a person walking, or raises an alert when ‘unusual’ activity is detected. A lot of work needs to happen to accomplish these goals. It’s helpful to think about how such systems are built so that we can reason about their components.
graph LR
Frames --> Information-Extraction --> Information-Aggregation
Information-Aggregation --> Distill
Information-Aggregation --> Categorize
The abstract pattern for a Video Analytics system contains the following layers:
- Information Extraction - in this layer, all of the low-level processing happens to extract information from individual frames. Computer Vision algorithms are used, and the key requirement is efficient processing that maintains a high level of accuracy.
- Information Aggregation - in this layer, information from multiple frames is aggregated so that the client can act on it. For example, a person detected across several frames can be identified as either approaching the camera (or something valuable) or not. The key requirement is to maintain correct state information across frames.
- Categorize - in this layer, the state information is categorized as either requiring action or not.
- Distill - in this layer, the state information is ranked and reduced to remove any redundancies.
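
To make the layers above more concrete, here is a minimal Python sketch of how they might fit together. It is an illustration under stated assumptions, not a reference implementation: the Information Extraction layer is stubbed out with hand-written `Detection` records, and the names `Detection`, `TrackState`, `track_id`, `box_area`, and `growth_threshold` are all hypothetical. The sketch also assumes that a growing bounding box is a reasonable proxy for "approaching the camera", which a real system would need to validate.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Detection:
    """Hypothetical output of the Information Extraction layer for one frame."""
    frame_index: int
    track_id: int    # assumed to be assigned by an upstream tracker
    label: str       # e.g. "person"
    box_area: float  # bounding-box area in pixels


@dataclass
class TrackState:
    """State kept by the Information Aggregation layer for one tracked object."""
    track_id: int
    label: str
    areas: List[float] = field(default_factory=list)


def aggregate(detections: Iterable[Detection]) -> Dict[int, TrackState]:
    """Information Aggregation: merge per-frame detections into per-track state."""
    states: Dict[int, TrackState] = {}
    for det in sorted(detections, key=lambda d: d.frame_index):
        state = states.setdefault(det.track_id, TrackState(det.track_id, det.label))
        state.areas.append(det.box_area)
    return states


def growth(state: TrackState) -> float:
    """Ratio of the last to the first box area; 0.0 if it cannot be computed."""
    if len(state.areas) < 2 or state.areas[0] <= 0:
        return 0.0
    return state.areas[-1] / state.areas[0]


def categorize(state: TrackState, growth_threshold: float = 1.5) -> bool:
    """Categorize: decide whether a track needs to be acted upon.

    Assumption: a person whose box has grown by growth_threshold is approaching.
    """
    return state.label == "person" and growth(state) >= growth_threshold


def distill(states: Dict[int, TrackState]) -> List[Tuple[int, float]]:
    """Distill: reduce each track to one summary entry and rank by growth."""
    summaries = [(s.track_id, growth(s)) for s in states.values()]
    return sorted(summaries, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Stand-in for the extraction layer: two frames, two tracked people.
    frames = [
        Detection(0, 1, "person", 100.0),
        Detection(1, 1, "person", 180.0),
        Detection(0, 2, "person", 200.0),
        Detection(1, 2, "person", 190.0),
    ]
    states = aggregate(frames)
    for track_id, state in states.items():
        print(track_id, "needs action:", categorize(state))
    print("ranked:", distill(states))
```

Keeping `categorize` and `distill` as separate functions over the same aggregated state mirrors the diagram, where both branch off Information Aggregation rather than feeding one another.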