Introduction

With the surge in video content generation and consumption on the internet, the Enterprise Video Content Management (EVCM) industry is working to infuse its content with the power of artificial intelligence, so that it can garner useful insights and use them as an enhanced tool for content moderation, advanced redaction, search optimization, content discovery, user engagement, and much more.


VIDIZMO combines the AI indexer engine with its all-encompassing, robust video content management application to provide a powerful platform for enterprises to upload, store, secure, stream, search, discover, share, and monetize their content according to their organizational needs. In a cost-effective package, VIDIZMO offers various utilities and functionalities that are key to streamlining content use and access for internal as well as external communication.


Concept

Using advanced media AI capabilities and technologies, VIDIZMO has introduced enhanced video processing features that enable users to view, analyze, and search hidden data within a video. This equips large enterprises with the tools they need to power their content with a stronger, more robust search engine. It is achieved by indexing the video on the basis of the keywords, spoken words, faces, and other essential data extracted from within it; these insights are discussed in the later sections of this article. For example, indexing spoken words and faces can greatly change the way we search our videos, enabling us to look for specific moments where a particular person spoke certain words or when two people were seen together.
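
The kind of moment search described above can be illustrated with a small sketch. It assumes, purely for illustration, that each insight is stored as a list of time spans on the video timeline; the Interval type, the index layout, and the sample names and timings are hypothetical and are not VIDIZMO's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A time span on the video timeline, in seconds."""
    start: float
    end: float


def overlap(a, b):
    """Return the spans where an interval from `a` and one from `b` coincide."""
    result = []
    for x in a:
        for y in b:
            start, end = max(x.start, y.start), min(x.end, y.end)
            if start < end:
                result.append(Interval(start, end))
    return sorted(result, key=lambda i: i.start)


# Hypothetical face index for one video: person -> spans where they are on screen.
face_index = {
    "Alice": [Interval(0, 30), Interval(60, 90)],
    "Bob": [Interval(20, 45), Interval(80, 120)],
}

# Hypothetical spoken-word index: (speaker, word) -> spans where the word is spoken.
word_index = {
    ("Alice", "budget"): [Interval(25, 27), Interval(100, 102)],
}

# Moments where Alice and Bob are seen together.
together = overlap(face_index["Alice"], face_index["Bob"])

# Moments where Alice says "budget" while she is also on screen.
alice_on_screen_budget = overlap(face_index["Alice"], word_index[("Alice", "budget")])
```

With the sample data, the two people appear together from 20 to 30 seconds and from 80 to 90 seconds, and only the first mention of "budget" falls within one of Alice's on-screen spans.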


The process of generating these insights starts when your content is submitted for processing. The URL of your content is submitted rather than the original media file, which requires your storage folder to be publicly shared through that link. This imposes a slight storage restriction: your content must be stored, and accessible, online. After the content is accessed and temporarily stored via the URL, it is processed by the Video Indexer. Because the Indexer runs multiple media processing jobs at once, the more processors are installed, the faster and more efficient the processing. Video encoding, processing, and indexing are resource-consuming and time-intensive, which is why they carry an additional cost, billed on a simple pay-as-you-go basis according to your usage. To learn more about the Video Indexer pricing, see: Cognitive Services Pricing - Video Indexer
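
Because a link is submitted instead of a file, it can be useful to check that a content location is at least shaped like an online URL before submitting it. The sketch below is an assumption-laden illustration: the function name is hypothetical, and the check only inspects the string, so it cannot confirm that the link is actually publicly shared.

```python
from urllib.parse import urlparse


def is_submittable_url(location: str) -> bool:
    """Return True if `location` looks like an absolute http(s) URL.

    This inspects only the shape of the string. It does not contact the
    server, so it cannot tell whether the link is publicly reachable.
    """
    parts = urlparse(location)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# Hypothetical content locations, used only to exercise the check.
candidates = [
    "https://example.com/media/town-hall.mp4",  # online link
    "/mnt/share/town-hall.mp4",                 # local path, not a link
]
results = {c: is_submittable_url(c) for c in candidates}
```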




Visual Analysis

The following insights, or meaningful data, are captured visually from within a video during processing:

  • Face Recognition: This enables users to recognize and capture faces in videos for later use. The application displays their appearances in a video on an easy-to-navigate timeline and groups them together into a summarized insight that identifies everyone who was part of a particular media item. VIDIZMO also allows you to search videos by a person's name and navigate to the points in the video where they were seen.
  • Celebrity Recognition: The database holds intelligently populated profiles of over 1 million celebrities, such as global leaders, actors, actresses, athletes, researchers, and business and tech leaders across the globe. This enables VIDIZMO to identify and index your videos based on any special/celebrity appearances.
  • Facial Thumbnail Generation: This detects the best captured face among all the different appearances of that face within the video. The choice is based on parameters such as the angle of the face when captured (to ensure all facial characteristics are taken into account) and the quality and size of the image.
  • Visual Text Recognition: Using OCR capabilities, text that is visually displayed in a video is captured and indexed so that it can be searched.
  • Labels Association: Using a vast base of verified data on common objects, scenes, visuals, and other detectable aspects, VIDIZMO adds labels to your videos, further enriching the search experience by associating rich details with your media without the hassle of manual tagging.
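
As an illustration of the facial thumbnail selection described in the list above, the sketch below scores each captured face on angle, image quality, and size, and keeps the highest-scoring one. The FaceCapture fields, the scoring formula, and the weights are hypothetical choices made for this example; the actual parameters and weighting used by the Indexer are not documented here.

```python
from dataclasses import dataclass


@dataclass
class FaceCapture:
    timestamp: float     # seconds into the video
    yaw_degrees: float   # 0 means the face looks straight at the camera
    sharpness: float     # 0.0 (blurry) to 1.0 (sharp)
    width: int           # face crop width in pixels
    height: int          # face crop height in pixels


def score(capture: FaceCapture, max_area: int) -> float:
    """Combine angle, quality, and size into one score (weights are assumptions)."""
    frontal = max(0.0, 1.0 - abs(capture.yaw_degrees) / 90.0)
    size = (capture.width * capture.height) / max_area
    return 0.5 * frontal + 0.3 * capture.sharpness + 0.2 * size


def best_thumbnail(captures):
    """Return the capture with the highest combined score."""
    max_area = max(c.width * c.height for c in captures)
    return max(captures, key=lambda c: score(c, max_area))


# Hypothetical appearances of the same face within one video.
captures = [
    FaceCapture(timestamp=12.0, yaw_degrees=60.0, sharpness=0.9, width=200, height=200),
    FaceCapture(timestamp=47.5, yaw_degrees=5.0, sharpness=0.8, width=180, height=180),
    FaceCapture(timestamp=90.0, yaw_degrees=0.0, sharpness=0.3, width=80, height=80),
]
best = best_thumbnail(captures)
```

With these sample values, the capture at 47.5 seconds is selected: it is nearly frontal, reasonably sharp, and almost as large as the biggest crop, whereas the others are either turned away or small and blurry.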


Audio Analysis

The following insights, or meaningful data, are captured via sound from within a video during processing:

  • Automatic Language Detection: Without the hassle of manually setting the language for a media item, VIDIZMO now identifies the dominant spoken language in the video. Supported languages include English, Spanish, French, German, Italian, Chinese (Simplified), Japanese, Russian, and Brazilian Portuguese. If the language is not supported, the default language, English, is selected.
  • Audio Transcription: With the transcription facility, you can now smoothly convert speech to text in 12 languages. Supported languages include English, Spanish, French, German, Italian, Chinese (Simplified), Japanese, Arabic, Russian, Brazilian Portuguese, Hindi, and Korean.
  • Translation: With AMS audio insights configured, you can translate the audio transcript into 54 different languages. To view the list of supported languages, read Language support for text and speech translation.
  • Closed Captioning: Creates closed captioning files in VTT format. These are downloadable and easily readable using our transcription pane. To learn more, see: Understanding Transcription Pane
  • Noise Reduction: For a clear and smooth transcription process, noise and external buzz that carry none of the spoken content must be left out. For this, VIDIZMO employs filters to eliminate telephony disturbances.
  • Speaker Enumeration: To enhance searchability within the video, this feature determines which speaker spoke which words and maps these instances to the speaker on the media timeline for ease of navigation.
  • Speaker Statistics: This gives you insight into the participation of a single speaker in the video: how many times they appeared, how long they spoke, and their speaking ratio compared to the other speakers in the video. These statistics can be fairly useful when determining how well someone spoke or performed.
  • Text-based Content Moderation: Using a pre-defined database of crude words in a language, explicit words are automatically detected and removed from the audio transcript. This saves you the trouble of manually moderating content for end-users.
  • Audio Effects: This identifies audio effects in a video, such as audience applause or continued silence.
  • Emotion Detection: Enabling detailed semantic analysis, emotions can be detected in a video based on what is being said and the way it is being delivered. The emotions could be joy, sadness, anger, or fear.
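
The speaker statistics described in the list above can be derived from speaker-labeled time segments. The following is a minimal sketch under that assumption; the segment format, the speaker names, and the field names are hypothetical, and the speaking ratio is computed here as a share of total spoken time, which is one reasonable reading of the description rather than a confirmed definition.

```python
from collections import defaultdict

# Hypothetical speaker-labeled segments: (speaker, start_seconds, end_seconds).
segments = [
    ("Speaker 1", 0.0, 30.0),
    ("Speaker 2", 30.0, 40.0),
    ("Speaker 1", 40.0, 60.0),
    ("Speaker 2", 60.0, 80.0),
]


def speaker_statistics(segments):
    """Count appearances, total speaking time, and speaking ratio per speaker."""
    totals = defaultdict(lambda: {"appearances": 0, "seconds": 0.0})
    for speaker, start, end in segments:
        totals[speaker]["appearances"] += 1
        totals[speaker]["seconds"] += end - start

    spoken = sum(entry["seconds"] for entry in totals.values())
    return {
        speaker: {**entry, "ratio": entry["seconds"] / spoken if spoken else 0.0}
        for speaker, entry in totals.items()
    }


stats = speaker_statistics(segments)
```

For the sample segments, Speaker 1 appears twice and speaks for 50 of the 80 spoken seconds (a ratio of 0.625), while Speaker 2 appears twice and speaks for the remaining 30 seconds (0.375).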


Other Valuable Insights

The following miscellaneous insights, or meaningful data, are captured from within a video during processing:

  • Keywords Extraction: Intelligent video processing techniques including semantic and logical analysis extract various keywords from the visuals and audio of the video to make it more searchable.
  • Brands Recognition: Using a pre-defined database of known brands, VIDIZMO is able to extract brands appearing in a video as either a visual or via a spoken word.
  • Topic Inference: Using a strong inference engine, topics are extracted intelligently from within the context of a video and the spoken/visual content in it. The 1st-level IPTC taxonomy is included.
  • Artifacts Extraction: Extracts a comprehensive set of intricate details as artifacts for each of the models. This enriches the video with closely relevant details.
  • Sentiment Analysis: Identifies positive, negative, and neutral sentiments from speech and visual text.


Use-Case

With the world constantly looking for something beyond the ordinary processing and playback of video content, industries ranging from healthcare to logistics are benefiting greatly from VIDIZMO's robust content management system powered by new forms of AI.

  • Facial recognition can come in handy in a great number of routine tasks that were previously done manually, such as monitoring employees' in/out times, counting the people involved in an event or scenario, or something as serious as theft detection using real-time video capture and processing. Specialized use cases include personal security and authentication, criminal/impostor identification for commercial security via surveillance, and national security for combating terrorism.
  • Similarly, with VIDIZMO's advanced content moderation capabilities for both text and visuals, self-learning platforms (LMS) that allow their institutions to generate content and educate subscribers are saved the hassle of manually curating each and every line of content that gets uploaded to their sites. This allows them to automatically redact or beep out any expletive language in videos and audio while also removing illegal content from their enterprise platforms with ease.
  • Other use cases include law enforcement agencies. The Los Angeles Police Department, for example, collects a tremendous amount of video and is using AI to determine which footage contains valuable data.
  • The retail and wholesale industry is benefitting largely from video AI technologies. One such example is Amazon Go, which has successfully launched its human-less, AI-powered store that lets you shop without the hassle of check-in and check-out by using object detection, cart analysis, and motion sensors.
  • The healthcare industry has seen a major transformation in its work processes after adopting AI for tedious yet meticulous tasks that were otherwise done manually. Video capture and analysis are being used to measure the ounces of blood a patient loses during orthopaedic procedures, without needing a nurse to continually monitor the process or relying on fragile medical apparatus to measure it manually.
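
The text side of the content moderation use case above, where expletive language is redacted automatically, can be sketched as a blocklist applied to a transcript. This is an illustration only: the word list and the function name are hypothetical, and masking with asterisks is a stand-in for whatever redaction or beeping the platform actually performs.

```python
import re

# Hypothetical blocklist; mild placeholder words stand in for a real crude-word list.
BLOCKED_WORDS = {"darn", "heck"}


def redact(transcript: str, blocked=BLOCKED_WORDS) -> str:
    """Mask every whole-word, case-insensitive match of a blocked word."""
    if not blocked:
        return transcript
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(word) for word in sorted(blocked)) + r")\b",
        re.IGNORECASE,
    )
    # Replace each match with asterisks of the same length.
    return pattern.sub(lambda match: "*" * len(match.group(0)), transcript)


clean = redact("Well, darn it. What the HECK?")
```

Matching on word boundaries keeps the filter from masking harmless words that merely contain a blocked word, which is a common pitfall of simple substring replacement.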