
Artificial intelligence has changed how people collect and distribute information. Detecting objects within videos improves searchability by taking the user to the timestamps where an object appears, and it allows those objects to be redacted without manual effort.

The VIDIZMO Indexer App provides advanced intelligent detection capabilities utilizing powerful AI models. These capabilities include face detection, people detection, license plate recognition, vehicle detection, optical character recognition (OCR), and weapon detection.

Before you start

  • Ensure you are logged in to your VIDIZMO Portal as an Administrator or Manager so that you can configure VIDIZMO Indexer for object detection and enhanced searchability.

VIDIZMO Indexer App Configuration

Accessing the Portal's Homepage:

  1. Click on the menu icon on the top left-hand corner of the screen to bring up the left navigation pane.
  2. Then click on the down arrow to expand the Admin section.
  3. Select Portal Settings. 

Accessing Apps Option:

4. Navigate to the Portal Settings section and click on the Apps options in the navigation pane to expand the list of applications available in VIDIZMO.

5. Select Content Processing, where you can set up VIDIZMO Indexer.

6. Click the settings icon next to "VIDIZMO Indexer" to access the configuration options.

VIDIZMO Indexer Screen:

The fields shown on the VIDIZMO Indexer configuration screen depend on the selected object detection category, that is, the detection type, media type, and model version.

In this step, configure the settings based on the selected object detection category:

A. Detection Type

This is the type of detection that the VIDIZMO Indexer performs. VIDIZMO Indexer currently supports the following detection types:

  1. Vehicle 
  2. Face
  3. Person
  4. Weapon
  5. Optical Character Recognition 
  6. Activity Recognition

B. Media Type 

In this step, configure the media type settings based on the chosen object detection category. Choose the media type from the following options:

  1. Image
  2. Video
  3. Document 

Note: Document media type only works with OCR detection, and activity recognition works only for video media type.
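
The two restrictions in the note can be expressed as a small compatibility check. The following is a minimal Python sketch, not VIDIZMO code: the function name `is_valid_combination` and the lowercase string identifiers are hypothetical, and the sketch encodes only the two rules stated in the note, treating every other combination as allowed.

```python
# Illustrative only: these identifiers are assumptions, not a VIDIZMO API.
DETECTION_TYPES = {"vehicle", "face", "person", "weapon", "ocr", "activity"}
MEDIA_TYPES = {"image", "video", "document"}


def is_valid_combination(detection_type: str, media_type: str) -> bool:
    """Return True if the detection type can be paired with the media type."""
    if detection_type not in DETECTION_TYPES:
        raise ValueError(f"unknown detection type: {detection_type!r}")
    if media_type not in MEDIA_TYPES:
        raise ValueError(f"unknown media type: {media_type!r}")

    # Rule 1: the Document media type only works with OCR detection.
    if media_type == "document":
        return detection_type == "ocr"

    # Rule 2: activity recognition only works with the Video media type.
    if detection_type == "activity":
        return media_type == "video"

    # No other restriction is stated, so the sketch allows the rest.
    return True
```

For example, `is_valid_combination("face", "document")` is rejected by the first rule, while `is_valid_combination("activity", "video")` is accepted.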

1. Vehicle Detection

To configure the VIDIZMO Indexer for vehicle detection, you'll need to fill in the following fields:

  • Model Size: Choose from three available models: "Small," "Medium," or "Large." Larger sizes provide increased accuracy.
  • Confidence Threshold: Input a value between 25 and 95 to set the minimum confidence level required for an object to be detected. The default is 45.
  • Tracking Frame: Input a value between 7 and 25 to set the number of consecutive frames in which an object must be detected for it to be tracked. Applies to videos only.
  • Category: Select the appropriate object category for vehicle detection. Options include bus, bike, car, license plate, and truck.
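
To make the allowed values for these fields concrete, here is a minimal Python sketch that checks a set of vehicle detection settings before they are entered in the portal. It is not part of VIDIZMO: the class `VehicleDetectionSettings` and the function `validate` are hypothetical names, and the ranges are assumed to be inclusive at both ends. Only the Confidence Threshold has a documented default (45), so the other fields are required.

```python
from dataclasses import dataclass
from typing import List

# Values taken from the field descriptions above.
MODEL_SIZES = ("Small", "Medium", "Large")
VEHICLE_CATEGORIES = ("bus", "bike", "car", "license plate", "truck")


@dataclass
class VehicleDetectionSettings:
    """Hypothetical container for the vehicle detection fields."""
    model_size: str
    tracking_frame: int
    category: str
    confidence_threshold: int = 45  # documented default


def validate(settings: VehicleDetectionSettings) -> List[str]:
    """Return a list of problems; an empty list means the values are in range."""
    errors = []
    if settings.model_size not in MODEL_SIZES:
        errors.append("Model Size must be Small, Medium, or Large")
    # Assumption: both range limits are inclusive.
    if not 25 <= settings.confidence_threshold <= 95:
        errors.append("Confidence Threshold must be between 25 and 95")
    if not 7 <= settings.tracking_frame <= 25:
        errors.append("Tracking Frame must be between 7 and 25")
    if settings.category not in VEHICLE_CATEGORIES:
        errors.append("Category must be one of: " + ", ".join(VEHICLE_CATEGORIES))
    return errors
```

Calling `validate(VehicleDetectionSettings("Large", 30, "car"))` returns a one-item list describing the out-of-range Tracking Frame, while in-range values return an empty list.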

2. Face Detection

To configure the VIDIZMO Indexer for face detection, you'll need to fill in the following fields:

  • Model Size, Confidence Threshold, and Tracking Frame functionalities remain consistent with the descriptions provided earlier in this article.
  • Facial Attribute: Choose the facial attribute you want to detect from the options: age, gender, and race.
  • Face Process Threshold: Set a value between 70-90 to filter out blurry faces for accurate attribute predictions by the AI model. The default value is 85, and you can adjust it between 70-99 based on your preference.

Note: These facial attributes are detected when automatic processing is enabled. Upon uploading an image or video in VIDIZMO, the facial attributes will be detected and displayed in the VIDIZMO player. To perform facial detection when automatic processing is disabled, please refer to the detailed instructions provided in the associated article, "Person's Attribute Detection: An In-Depth Explanation."

3. Person Detection

To set up the VIDIZMO Indexer for person detection, fill in the Model Size, Confidence Threshold, and Tracking Frame fields. The functionalities remain consistent with the earlier descriptions in this article.

4. Weapon Detection

To configure the VIDIZMO Indexer for weapon detection, fill in the Model Size, Confidence Threshold, Tracking Frame, and Category fields. Note that for weapon detection, the available category is "gun detection." The functionalities align with the descriptions provided earlier in this article.

5. Optical Character Recognition

To configure the VIDIZMO Indexer for optical character recognition, you'll need to fill in the following fields:

  • OCR Provider: Choose between "Paddle OCR" and "Tesseract" for Optical Character Recognition.
  • OCR language: Select the OCR language from the dropdown menu. By default, VIDIZMO offers support for English and Chinese.

Note: OCR-related fields are specifically relevant when the selected object detection type is Optical Character Recognition. If you require additional information, we suggest referring to "How to Perform OCR using VIDIZMO Indexer." 

6. Activity Recognition

To configure the VIDIZMO Indexer app for activity recognition, you only need to fill in one field. Choose the desired activity from the three available options: robbery, shopping, or trespassing. Note that activity recognition is applicable only for the video media type.

C. Automatic Processing 

Choose whether to turn automatic processing on or off. To enable it, toggle the Automatic Processing switch to "On." Uploaded media is then processed for object detection automatically, without manual intervention.

D. Saving configuration

Once you have completed the configuration, click the "Save Changes" button to apply the configured settings.

Content Processing screen

7. Enable the toggle button to turn on automatic object detection for videos in your portal. A notification will appear briefly stating, "App Settings Updated Successfully."

VIDIZMO Indexer Configuration Field Descriptions 

Model Size

There are three models: Small, Medium, and Large.

Small Model: This model is the least resource-hungry. It requires the least memory and computation power and takes the least time to detect objects within the video. However, it detects objects with less accuracy than the medium and large models. This model is recommended for users who have machines with limited computation power and memory.

Medium Model: This model is more resource-hungry than the small model. It requires more memory and computation power and takes more time to detect objects within the video, but it detects objects with more accuracy than the small model. This model is recommended for users who have machines with moderate computation power and memory.

Large Model: This model is the most resource-hungry. It requires the most memory and computation power and takes the most time to detect objects within the video, but it detects objects with the highest accuracy of the three models. This model is recommended for users who have high-end machines (a GPU is a plus).

Confidence Threshold

The Confidence Threshold sets the minimum confidence the model must have in a detection for the VIDIZMO Indexer to keep it. The model saves and shows only those objects whose confidence is equal to or greater than the value entered in this field and disregards the rest.
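
The effect of this field can be illustrated with a short Python sketch. It is a simplified illustration under stated assumptions, not the Indexer's implementation: the function `filter_by_confidence` and the dictionary layout of a detection (a `label` and a `confidence` expressed as a percentage) are hypothetical.

```python
from typing import Dict, List


def filter_by_confidence(detections: List[Dict], threshold: float = 45) -> List[Dict]:
    """Keep only detections whose confidence is at least the threshold.

    Each detection is assumed to be a dict such as
    {"label": "car", "confidence": 62.0}, with confidence in percent.
    """
    return [d for d in detections if d["confidence"] >= threshold]


# Example data, invented for illustration.
sample = [
    {"label": "car", "confidence": 91.5},
    {"label": "truck", "confidence": 45.0},
    {"label": "bus", "confidence": 30.2},
]
kept = filter_by_confidence(sample, threshold=45)
```

With a threshold of 45, the car and the truck are kept (a confidence equal to the threshold passes) and the bus is disregarded.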

Tracking Frames

The Tracking Frames field sets the minimum number of consecutive frames in which an object must appear for the model to count it as a detected object. For example, with a value of 7, the model considers only those objects that appear in at least 7 consecutive frames and disregards objects that do not.
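
The consecutive-frame rule can be sketched in Python as follows. This is only an illustration of the rule as described here, not how the Indexer tracks objects internally: the function names are hypothetical, and the input is assumed to be a per-frame list of booleans saying whether one particular object was detected in each frame.

```python
from typing import List


def longest_consecutive_run(presence: List[bool]) -> int:
    """Return the length of the longest unbroken run of True values."""
    best = 0
    current = 0
    for seen in presence:
        # Extend the run while the object is seen; reset it on a miss.
        current = current + 1 if seen else 0
        best = max(best, current)
    return best


def passes_tracking(presence: List[bool], tracking_frames: int = 7) -> bool:
    """True if the object appears in at least `tracking_frames` consecutive frames."""
    return longest_consecutive_run(presence) >= tracking_frames


steady = [True] * 7                          # seen in 7 frames in a row
flicker = [True] * 6 + [False] + [True] * 6  # seen 12 times, never 7 in a row
```

Here `passes_tracking(steady)` is true, while `passes_tracking(flicker)` is false: although the object appears in 12 frames, its longest unbroken run is only 6.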


Category

The Category field is applicable when configuring the VIDIZMO Indexer for specific object types. Users can choose an appropriate category to classify the detected objects further.

OCR Provider

The OCR Provider refers to the technology used for Optical Character Recognition (OCR) during document processing. Users can select between "Paddle OCR" and "Tesseract" as OCR providers.

OCR Language

The OCR Language field allows users to choose the language for Optical Character Recognition (OCR) during document processing. It provides a dropdown box with multiple language options for selection.

Face Processing Threshold 

The Face Processing Threshold filters out blurry or unclear faces so that the AI model makes accurate attribute predictions. The default value is 85. When the confidence score of a detected face falls below the specified Face Processing Threshold, the model does not make attribute predictions and classifies the face as "Not Predicted." For example, with the default setting, a detected face with a confidence score below 85 receives no attribute predictions.
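
The gating behaviour described above can be summarized in a few lines of Python. This is a hedged sketch rather than the product's logic: the function `attribute_status` and the "Predicted" label are hypothetical, and only the "Not Predicted" label and the default of 85 come from this article.

```python
NOT_PREDICTED = "Not Predicted"  # label used in this article
PREDICTED = "Predicted"          # hypothetical label for the other case


def attribute_status(face_confidence: float, threshold: float = 85) -> str:
    """Decide whether attribute prediction would run for a detected face.

    A face whose confidence score falls below the threshold is skipped;
    a score equal to or above the threshold is passed on for prediction.
    """
    if face_confidence < threshold:
        return NOT_PREDICTED
    return PREDICTED
```

With the default threshold, a face scored at 84.9 is classified as "Not Predicted", and a face scored at 85 or higher proceeds to attribute prediction.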

Activity Recognition

Activity Recognition allows users to extract AI insights about predefined activities within their videos, such as robbery, trespassing, and shopping. Activity Recognition operates exclusively on videos and requires users to select at least one activity type when adding this Detection Type. This lets users tailor their AI-driven insights to the activities relevant to their video content.