Data Storage Options In VIDIZMO
This article covers the factors to consider when planning your storage and presents the available options without favoring one over another.
Planning Ahead For Storage
As video becomes an increasingly popular medium for workplace communication, collaboration, and sharing, enterprises are turning to scalable storage solutions that can grow to meet their storage demands.
With huge file sizes, numerous formats, and a wide choice of technologies, managing a growing volume of video content can seem a daunting task. Storage capacity planning allows you to make informed decisions by taking into consideration how the content will be stored, how much of it there will be, and how frequently it will be accessed.
Understanding Storage Requirements
Since storage is an ever-changing technology, it's all about growth.
Eventually, businesses grow and so does their data. Businesses involved in video streaming see exponential growth as they generate more data in the form of videos in different resolutions and bit rates, all of which take up considerable space. All this data needs to be stored, managed, searched through, and maintained, often over extended periods of time.
So what should you look for when working out your storage requirements? Here are some factors to consider:
You will probably want to store all your digital media in centrally accessible storage that is large enough for your needs. Videos take up a considerable amount of space, especially in high resolution. In some cases, a large number of standard-resolution videos must be stored, or a business needs many versions of the same data available at the same time.
You might also want to monitor your storage growth trends in order to predict future capacity requirements for your business. Some software allows you to collect storage capacity data over time to identify storage issues and growth trends, enabling you to forecast peak capacity and take necessary steps to proactively prevent maxed-out storage or downtime.
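As a rough illustration of trend-based forecasting, the sketch below fits a straight line to monthly usage samples and estimates how long it would take to reach a given capacity. The function name, the sample figures, and the assumption of linear growth are all hypothetical; real monitoring software may use more sophisticated models.

```python
def months_until_full(usage_tb, capacity_tb):
    """Estimate months from the last sample until capacity is reached.

    usage_tb: storage used at the end of each month, oldest first.
    Assumes (for illustration only) that growth is roughly linear.
    Returns None if usage is flat or shrinking.
    """
    n = len(usage_tb)
    if n < 2:
        raise ValueError("need at least two samples")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(usage_tb) / n
    # Least-squares slope: average growth per month.
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, usage_tb))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    slope = numerator / denominator
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    # Month index at which the fitted line reaches capacity.
    full_at = (capacity_tb - intercept) / slope
    return max(0.0, full_at - (n - 1))


# Hypothetical samples: 10, 12, 14, 16 TB used over four months, 24 TB capacity.
print(months_until_full([10, 12, 14, 16], 24))  # 4.0
```

A forecast like this is only as good as the samples behind it, so it is best treated as an early warning rather than a precise date.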
With storage constantly being consumed, let us see how to work out how much storage an uncompressed, raw video requires, using the following examples:
1. Example #1:
This formula determines the number of megabytes required to store one uncompressed frame at full resolution:
(height in pixels) x (width in pixels) x (number of bits per channel) ÷ 2,097,152 = Space Occupied
The value 2,097,152 is a conversion factor that accounts for the number of bytes per megabyte (2 to power 20), the number of bits per byte (8), and the number of channels per pixel (4).
If we use 4 bits per channel, then according to this formula, space required for one uncompressed frame at full resolution would be:
720 x 1280 x 4 ÷ 2,097,152 = 1.76 MB
Therefore, one uncompressed frame of 720x1280 resolution will take up 1.76 MB of space on the drive.
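The formula above can be sketched in a few lines of Python. The function name is hypothetical; the default of 4 channels per pixel and the 2 to the power 20 bytes per megabyte come from the conversion factor described in the text.

```python
BYTES_PER_MB = 2 ** 20
BITS_PER_BYTE = 8


def uncompressed_frame_mb(height, width, bits_per_channel, channels=4):
    """Size in MB of one uncompressed frame at full resolution."""
    total_bits = height * width * bits_per_channel * channels
    # Dividing by 8 * 2**20 / 4 channels is the 2,097,152 factor in the text.
    return total_bits / (BITS_PER_BYTE * BYTES_PER_MB)


print(round(uncompressed_frame_mb(720, 1280, 4), 2))  # 1.76
```

Writing the channels as an explicit parameter makes it easy to see how the result changes for other pixel formats, for example 8 bits per channel.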
2. Example #2:
Here's another example: a 2-minute, 1920x1080 (1080p) video with a bit rate of 87 Mbps:
i. Convert megabits per second to megabytes per second:
Divide 87 Mbps by 8 (since each byte consists of 8 bits) = 10.875 MB/s
ii. Convert the per-second rate to a per-minute rate:
10.875 x 60 = 652.5 MB/min
iii. Multiply by the duration of the video (two minutes):
652.5 x 2 = 1,305 MB
iv. Convert megabytes to gigabytes (divide by 1,024):
1,305 ÷ 1,024 = 1.27 GB
Therefore, to store a 2-minute 1080p video at this bit rate, 1.27 GB of space is required.
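The same steps can be written as a small helper. This is a minimal sketch; the function name is hypothetical and the input is assumed to be an average bit rate in megabits per second.

```python
def video_size_gb(bitrate_mbps, duration_minutes):
    """Storage needed for a video, given its bit rate and duration."""
    mb_per_second = bitrate_mbps / 8        # 8 bits per byte
    mb_per_minute = mb_per_second * 60      # 60 seconds per minute
    total_mb = mb_per_minute * duration_minutes
    return total_mb / 1024                  # 1,024 MB per GB


print(round(video_size_gb(87, 2), 2))  # 1.27
```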
Size of a compressed video using the above example:
Uncompressed (raw) video normally occupies more space than video that has been encoded (compressed) for streaming, although the difference depends on the bitrate, resolution, and even the audio profile used when compressing the video.
Using the same 2-minute 1920x1080p video, the following are the sizes of the encoded renditions produced by VIDIZMO:
Resolution | Total Bitrate (kbps) | Frames/sec | Data Rate (kbps) | Space Required (MB) |
426 x 240 (240p) | 371 | 25 | 242 | 5.40 |
640 x 360 (360p) | 615 | 25 | 486 | 8.89 |
852 x 480 (480p) | 860 | 25 | 731 | 12.40 |
1280 x 720 (720p) | 3028 | 25 | 2900 | 43.4 |
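As a sanity check on figures like these, a rendition's size can be approximated from its total bitrate and duration. The sketch below assumes that 1 kbps is 1,000 bits per second and that 1 MB is 2 to the power 20 bytes; the function name is hypothetical. Under those assumptions the estimates land within a few percent of the table, and the remaining gap is presumably due to factors such as container overhead and variable bit rates.

```python
def rendition_size_mb(total_kbps, duration_seconds):
    """Approximate file size in MB from total bitrate and duration."""
    total_bytes = total_kbps * 1000 * duration_seconds / 8
    return total_bytes / 2 ** 20


# Total bitrates taken from the table; the video is 2 minutes (120 s) long.
for label, kbps in [("240p", 371), ("360p", 615), ("480p", 860), ("720p", 3028)]:
    print(label, round(rendition_size_mb(kbps, 120), 2))
```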
There are also a number of online video storage calculators that give an approximation of the space required for a given compression format. Selecting various compression types/codecs for a 30-minute video, here are the results from one such tool, Video Space Calculator:
Format | Resolution | Frame Rate | Video Length (min) | Space Required |
H.264, 720p | 1280 x 720 | 23.98 | 30 | 10.38 GB |
MPEG-2, 3.7Mbps fixed rate | 720 x 486 | 23.98 | 30 | 832.5 MB |
Photo-JPEG, 720p | 1280 x 720 | 23.98 | 30 | 26.32 GB |
Note:
As the tool suggests, the actual space taken up may differ slightly due to embedded audio, differing frame sizes and aspect ratios, and inter-frame compression/pulldown.
The process of storing digital videos and delivering them to your audiences encompasses how fast and how much data can be sent to all of them at the same time. For considerably large audiences, this depends greatly on the way storage devices are set up and how they are connected to the network. If the devices and the network are not efficiently set up, video storage, as well as video delivery, will be affected resulting in a slow or a buffered video playback experience.
The two most important components related to data delivery are the LAN card and the drive itself. In both of these components, what really matters is the throughput because the data delivery will be slow if either one of them has been set up with a low throughput according to the load.
To demonstrate how throughput affects delivery, let us consider the following:
i. Scenario 1
If you have a storage device with a data transfer rate of 155 megabytes/sec (155 x 8 = 1,240 megabits/sec throughput), and if there are no other processes running on the device, you can use that device to serve a stream to 1,240 concurrent viewers, with each viewer getting 1 Mbps (1 megabit/sec) of data.
ii. Scenario 2
An application transfers data at 200,000 Bps (0.2 MBps)
Storage device bandwidth (without overheads) = 150,000,000 Bps (150 MBps)
Concurrent users: 150,000,000 ÷ 200,000 = 750
According to these calculations, you would need a storage device with a data transfer rate of 150 MBps to serve 750 concurrent users.
It does not matter how fast your hard drive's throughput is if the LAN card is rated at just 100 Mbps, because the data will be transmitted at no more than 100 Mbps. On the other hand, if you use a 1 Gbps LAN card, you can take any 1 Mbps stream and serve it to 1,000 viewers, provided the drive can keep up.
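The scenarios above reduce to one rule: the slower of the drive and the LAN card sets the ceiling. The sketch below works in whole bits per second to avoid rounding surprises. The function and constant names are hypothetical, and overheads and competing processes are ignored.

```python
MBIT = 1_000_000          # bits in one megabit
MBYTE = 8 * MBIT          # bits in one megabyte (decimal)


def max_concurrent_viewers(disk_bps, nic_bps, stream_bps):
    """Viewers that can be served when the slowest component is the limit."""
    bottleneck = min(disk_bps, nic_bps)
    return bottleneck // stream_bps


# Scenario 1: 155 MB/s drive, a NIC assumed fast enough not to matter, 1 Mbps streams.
print(max_concurrent_viewers(155 * MBYTE, 10_000 * MBIT, 1 * MBIT))  # 1240

# Scenario 2: 150,000,000 Bps drive, 200,000 Bps per user, NIC not limiting.
print(max_concurrent_viewers(150_000_000 * 8, 10_000 * MBIT, 200_000 * 8))  # 750

# Same fast drive behind a 100 Mbps and a 1 Gbps LAN card.
print(max_concurrent_viewers(155 * MBYTE, 100 * MBIT, 1 * MBIT))    # 100
print(max_concurrent_viewers(155 * MBYTE, 1_000 * MBIT, 1 * MBIT))  # 1000
```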
What Impacts Storage Capacity
Another factor that greatly impacts video storage capacity is video compression, often also known as video encoding, which uses codecs. The word "codec" is short for compressor-decompressor (also expanded as coder-decoder). It is a piece of software that makes your video readable by your computer and allows you to play it. Without the correct codec, you won't be able to play the audio, the video, or both.
To transport video over the internet, the video information is encoded and, along the way, often compressed to make the file smaller and easier to transport. The more compressed the video is, the less quality is retained, and the more degraded the image usually becomes.
Compression is always a compromise between quality and file size, which is why you also need to consider file compression as a deciding factor when it comes to storage capacity. The lower the compression, the more space the video will occupy.
Some common compression formats are:
- H.264: One of the most efficient and widely used compression formats. It can reduce the size of a digital video file by as much as 80% compared with Motion JPEG and by as much as 50% compared with the MPEG-4 Part 2 standard. Using H.264 compression allows you to use less network bandwidth and storage space.
- MPEG-4: As with H.264, a number of variables affect the average bit rate, so storage calculations for MPEG-4 compression are not so clear-cut.
- Motion JPEG: Since Motion JPEG uses one individual file for each image, it is easier to calculate storage. Storage requirements for Motion JPEG recordings vary depending on the frame rate, resolution, and level of compression.
VIDIZMO has a number of encoding profiles built into it to encode uploaded videos that can be enabled or disabled according to the customer's requirements. One such profile, with its associated attributes, is provided here as an example:
- Encoding Profile Name: mpeg4_1080p
- EncodingProfile Output = mp4
- Extension = mp4
- Size = 1920x1080
- Bitrate = 6144k
- Video Codec = libx264
- Audio Profile = libfaac
- Two_Pass = Yes
- Constant Bit Rate (CBR) = No
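To show how a profile like this feeds into capacity planning, the sketch below models the attributes as a small data class and estimates the video payload per minute. The class and field names are hypothetical and are not VIDIZMO's API. It assumes "6144k" means 6,144,000 bits per second of video, treats that as an average since the profile is not constant bit rate, and leaves out audio and container overhead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncodingProfile:
    name: str
    extension: str
    width: int
    height: int
    video_bitrate_kbps: int
    video_codec: str
    audio_profile: str
    two_pass: bool
    constant_bit_rate: bool

    def video_mb_per_minute(self) -> float:
        """Approximate video payload per minute, in MB (2**20 bytes)."""
        bytes_per_minute = self.video_bitrate_kbps * 1000 * 60 / 8
        return bytes_per_minute / 2 ** 20


profile = EncodingProfile(
    name="mpeg4_1080p",
    extension="mp4",
    width=1920,
    height=1080,
    video_bitrate_kbps=6144,
    video_codec="libx264",
    audio_profile="libfaac",
    two_pass=True,
    constant_bit_rate=False,
)
print(round(profile.video_mb_per_minute(), 1))
```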
To learn more about encoding in VIDIZMO, see VIDIZMO Transcoding.
What Impacts Throughput?
As discussed earlier under Understanding Storage Requirements, throughput is the amount of data that enters and goes through a system, and how fast that data is sent out or received depends upon the type of disk used. Since it is the throughput of the disk that we are concerned with, the disks we use for storage must be able to process data quickly.
In this section, we discuss the types of disk so that we can decide which suits our storage needs, which type of storage to use (SAN, NAS, or DAS), and which interface is appropriate for connecting the devices to each other or over the LAN.
The choice is between the traditional spinning hard disk drive (HDD) and the solid-state drive (SSD). The HDD is essentially made up of magnetically coated metal platters, where data is stored on the magnetic coating. On the SSD, by contrast, the data is stored on interconnected flash memory chips. The data on these chips is retained even if there is no constant power or no power at all. Both perform the same function, but SSDs are more expensive, costing roughly twice as much per GB of storage as HDDs.
SSDs naturally perform better than HDDs in terms of speed, which is paramount for mission-critical applications.
Network storage devices are flexible and scalable and, depending on the capacity, can be set up to accommodate growth alongside the business's storage needs. Network storage can be set up as Network Attached Storage (NAS) or as a Storage Area Network (SAN). What makes the real difference is how the server accesses these storage devices: which protocol and which media it uses.
Direct-Attached Storage (DAS)
DAS is usually connected to one computer and is not accessible to others. Compared to networked storage devices, it provides better data transfer performance because the data does not have to travel from one server to another to be read or written. One of the greatest drawbacks of using DAS is that it cannot provide any failover should the server crash. DAS devices are low cost and suitable for businesses with low storage requirements.
Storage Area Network (SAN)
Storage Area Networks, or SANs, are used where high performance and high I/O are required. A SAN can be considered a technology that combines the best features of both DAS and NAS. The difference between NAS and SAN storage is mainly the way the storage is connected to the system and how input and output requests (e.g. SCSI, NFS, CIFS) are handled. Another factor to consider is the medium each type of storage uses to physically connect to a system, e.g. Ethernet or Fibre Channel.
A SAN provides only block-based storage and leaves file system concerns to the "client" side. SAN protocols include Fibre Channel, iSCSI, ATA over Ethernet (AoE), and HyperSCSI.
Network Attached Storage (NAS)
A NAS storage unit is connected to a network and provides safe, reliable data transfer and storage from a centralized location for authenticated users and heterogeneous clients. Similar to a private cloud, NAS units offer a fast, less expensive, and complete storage system deployed over the network.
NAS systems contain one or more hard disks, often arranged into logical, redundant storage containers or RAID arrays (redundant arrays of inexpensive/independent disks). NAS devices remove the responsibility of file serving from other servers on the network.
To sum up the difference between these two types of storage systems: NAS appears to the client as a file server, while a SAN appears as a disk, available for formatting and mounting just like any of the client's local disks in the operating system. SAN and NAS are not mutually exclusive and may be combined as a SAN-NAS hybrid, offering both file-level protocols (NAS) and block-level protocols (SAN) from the same system.
Table 1 shows the major differences between SAN and NAS.
Table 1: Major differences between SAN and NAS
SAN | NAS |
Block level data access | File Level Data access |
Fibre Channel is the primary media used with SAN | Ethernet is the primary media used by NAS |
SCSI is the main I/O protocol | NFS/CIFS is used as the main I/O protocol in NAS |
SAN storage appears to the computer as its own storage | NAS appears as a shared folder on the computer |
It can have excellent speeds and performance when used with Fibre Channel media | It can sometimes worsen performance if the network is being used for other things as well (which normally is the case) |
Used primarily for higher performance block level data storage | Is used for long distance small read and write operations |
Table 2 shows transfer rates, latency, and IOPS for some basic storage devices.
Table 2: Performance metrics for some basic storage devices
Device | Transfer Rate MBps (Read/Write) | Avg Latency ms (Read/Write) | IOPS (Read/Write) |
5400RPM Disk | 123 | 15 | 67 |
7200RPM Disk | 155 | 13.7 | 75 |
10K RPM Disk | 168 | 7.1 | 140 |
15K RPM Disk | 202 | 5.1 | 196 |
Micron P400e SATA MLC SSD | 350/140 | 0.5/3.5 | 50000/7500 |
Micron P320h PCIe SLC SSD | 3200/1900 | 0.009/0.042 | 785000/205000 |
iii. Interface (SATA vs eSATA vs Thunderbolt vs FireWire vs Ethernet)
USB 2.0, USB 3.0, eSATA, Thunderbolt, FireWire, and Ethernet are some of the technologies built into many of the computers sold today. Another major factor that affects throughput is the connector used for the drives. For example, if you pair a high-speed connector such as Thunderbolt 3 with a low-throughput disk, the connector alone will not increase the throughput of the drive.
The actual data speed depends on how many drives are connected using these connectors, how the drives are configured (SAN/NAS), and how full the drives are. The more data you store on a hard drive, the slower it gets. A drive is fastest when it is empty; when it is completely full, it can neither play back nor record data. It is good practice to keep at least 20% of the storage free for optimum performance.
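The 20% guideline is easy to check programmatically. The sketch below uses Python's standard `shutil.disk_usage`; the function names and the path passed in are placeholders, and the threshold simply mirrors the figure in the text.

```python
import shutil


def free_fraction(total_bytes, free_bytes):
    """Fraction of the volume that is still free (0.0 to 1.0)."""
    return free_bytes / total_bytes


def has_headroom(path, min_free=0.20):
    """True if the volume holding `path` keeps at least `min_free` free."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return free_fraction(usage.total, usage.free) >= min_free


# "." is only a placeholder; point this at the content storage path.
print(has_headroom("."))
```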
Table 3 shows ideal data transfer rates by storage connection type. Rated speeds are not real-world speeds, which come to about 70% to 80% of the maximum speed listed.
Table 3: Ideal data transfer rates by storage connection type
Connection | Data Transfer Speed |
Thunderbolt 3 | About 3,000 MB / second |
Thunderbolt 2 | About 1,400 MB / second |
USB 3.1 Gen 2 | About 1,000 MB / second |
10-Gig Ethernet | About 1,000 MB / second |
USB 3.1 Gen 1 | About 450 MB / second |
1-Gig Ethernet | 105 MB / second |
FireWire 800 | 70 – 80 MB / second |
Formats Too Slow to Use | |
USB 2.0 | 10 – 15 MB / second |
FireWire 400 | 20 – 25 MB / second |
iSCSI | 75 – 95 MB / second |
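To combine the connector ratings with disk speeds, the sketch below applies the 70% to 80% real-world range to a rated connector speed and then caps the result at the disk's own transfer rate. The function names are hypothetical, and the model deliberately ignores other factors such as the number of drives attached and how full they are.

```python
REAL_WORLD_RANGE = (0.70, 0.80)  # share of rated speed seen in practice


def effective_mb_per_s(rated_connector_mb_s, disk_mb_s):
    """Return (low, high) usable throughput for a disk behind a connector."""
    low = rated_connector_mb_s * REAL_WORLD_RANGE[0]
    high = rated_connector_mb_s * REAL_WORLD_RANGE[1]
    # A fast connector cannot make a slow disk faster.
    return min(low, disk_mb_s), min(high, disk_mb_s)


def transfer_seconds(size_mb, throughput_mb_s):
    """Time to move size_mb at a sustained throughput."""
    return size_mb / throughput_mb_s


# A 155 MB/s spinning disk behind a 3,000 MB/s connector is disk-limited.
print(effective_mb_per_s(3000, 155))
# A 3,200 MB/s SSD behind the same connector is connector-limited.
print(effective_mb_per_s(3000, 3200))
```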
Redundancy/ Backup Options
i. RAID - Redundant Array of Independent Disks
RAID groups individual physical drives into one set, known as the RAID set, which presents all the drives in the group to the server as one logical disk. This logical disk is identified by a logical unit number (LUN). Using more than one hard drive, as in a RAID array, increases the performance and reliability of the stored content. With more drives, read and write transactions are faster. RAID sets can also be scaled easily; for example, a RAID 5 configuration needs at least 3 drives but can be scaled up to 16. This type of configuration can use either a software or a hardware controller, but hardware controllers are recommended. To improve write performance, these controllers often carry extra cache memory.
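When sizing a RAID 5 set, it helps to remember that the equivalent of one drive is given over to parity. The sketch below applies that standard rule, assuming all drives are the same size. The function name is hypothetical, and no upper drive limit is enforced because that depends on the controller.

```python
def raid5_usable_tb(drive_count, drive_size_tb):
    """Usable capacity of a RAID 5 set built from equal-size drives."""
    if drive_count < 3:
        raise ValueError("RAID 5 needs at least 3 drives")
    # One drive's worth of space holds parity, spread across the set.
    return (drive_count - 1) * drive_size_tb


print(raid5_usable_tb(3, 4))   # 8
print(raid5_usable_tb(16, 4))  # 60
```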
Locally redundant storage (LRS) replicates data synchronously, usually as three copies within the same data center, which means a write request is not committed until it has been replicated to all three copies. Although local redundancy has the advantage of making it easier to identify the initiating events that cause failures in the system, it is used less often than geo-redundant storage. Local redundancy may still be used where data can be easily reconstructed or where replicating data to other locations is restricted.
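The "not committed until replicated to all copies" behaviour can be pictured with a toy model. The sketch below is purely illustrative and is not how any particular storage service is implemented; the class, the stage/commit/abort steps, and the in-memory dictionaries are all assumptions made for the example.

```python
class Replica:
    """One in-memory copy of the data (stand-in for a storage node)."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.committed = {}
        self.staged = {}

    def stage(self, key, data):
        """Accept the write provisionally; return False if unavailable."""
        if not self.healthy:
            return False
        self.staged[key] = data
        return True

    def commit(self, key):
        self.committed[key] = self.staged.pop(key)

    def abort(self, key):
        self.staged.pop(key, None)


def synchronous_write(replicas, key, data):
    """Commit only if every replica acknowledges the write."""
    acks = [replica.stage(key, data) for replica in replicas]
    if all(acks):
        for replica in replicas:
            replica.commit(key)
        return True
    for replica in replicas:
        replica.abort(key)
    return False


copies = [Replica("a"), Replica("b"), Replica("c")]
print(synchronous_write(copies, "video-1", b"payload"))  # True
copies[1].healthy = False
print(synchronous_write(copies, "video-2", b"payload"))  # False
```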
Enterprises that serve geographically dispersed customers have the advantage of being able to use network storage as geo-redundant devices, rerouting content storage to a completely different data center in any of their locations. This is particularly useful since data is replicated across two geographically distant sites, which allows applications to switch from one site to the other in case of a catastrophic or human failure and still have all the configuration data available at the second, remote site. In a High Availability configuration, the sites are set up in pairs and each geographic site has a name which is used for looking up data relevant to the site pair, with the name of the remote site defined in the local site.
To avoid unscheduled downtime, high availability databases are configured in such a way that single points of failure (SPOF) are eliminated and the databases are optimized to ensure that the end user does not experience an interruption in service or a degradation in user experience on hardware or network failure. In short, an HA system is continuously operational, or provides at least 99% uptime, meeting the demands for 24/7 availability.
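It can help to translate an uptime percentage into the downtime it still permits. The sketch below does that arithmetic for a 365-day year; the function name is hypothetical.

```python
def allowed_downtime_seconds(uptime_percent, period_days=365):
    """Downtime permitted over the period at a given uptime percentage."""
    period_seconds = period_days * 24 * 3600
    return period_seconds * (1 - uptime_percent / 100)


# 99% uptime still allows roughly 87.6 hours of downtime per year.
print(round(allowed_downtime_seconds(99) / 3600, 1))
# 99.9999% ("six nines") allows roughly 31.5 seconds per year.
print(round(allowed_downtime_seconds(99.9999), 1))
```

Seen this way, 99% is a fairly modest target, and each additional nine cuts the permitted downtime by a factor of ten.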
To learn more about High Availability in VIDIZMO, see How To Setup High Availability With VIDIZMO On-Premises/Private Cloud.
What Are The Available Storage Options?
Storage Solutions Optimized for Video Content
For reasons of security, control, and scale, many enterprises opt for On-Premise or Network Storage because unlike cloud-based storage, these solutions provide a higher level of control and flexibility as far as rollout and timeframes are concerned. This also allows support for applications running with data on one system where duplicating and backing up can be achieved efficiently.
If a Network Storage is used to serve content, VIDIZMO recommends using a fast and secure Network Storage (SAN/NAS) which can be mapped to a drive for VIDIZMO to store content, just as it uses the local CDN storage path in the Web server.
There are several storage solutions available in the market, some of which claim optimizations for video content. Gartner's Magic Quadrant for Distributed File Systems and Object Storage, released in October 2016, lists most of the major players in this space, providing an overview and some level of comparison of their storage solutions.
VIDIZMO once again reiterates that these solutions are discussed as options without favoring one over the other.
Among these, the top three, IBM Cleversafe, Dell EMC and Scality storage systems are briefly described below:
One such option is IBM's Cleversafe dsNet Storage System, which comes with a full set of features for on-premise deployment. It offers a variety of storage interfaces, including industry-standard APIs. It has a proprietary component that manages object data in the form of erasure-coded slices across a network of storage nodes.
Cleversafe uses Information Dispersal Algorithms (IDAs) that provide forward error correction and recovery. By coding and dispersing information, the reliability, security, and efficiency of data storage can be vastly improved over traditional copy and parity-based systems. By using zero-touch provisioning (ZTP), a switch feature that allows devices to be provisioned and configured automatically, Cleversafe eliminates most of the manual labor involved in adding more hardware to a network. Another Cleversafe feature is its carrier-grade security, which is stated to guarantee availability 99.9999% (six nines) of the time.
Cleversafe can be deployed as on-premise software on industry-standard, qualified hardware or as a pre-integrated appliance. To learn more, see this IBM Cleversafe presentation (gpfsug.org).
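To give a feel for what "erasure-coded slices" means, the sketch below splits data into slices, adds a single XOR parity slice, and rebuilds the data after one slice is lost. This is a deliberately simplified stand-in and is not Cleversafe's algorithm: production dispersal schemes use more general codes that can survive the loss of several slices at once. All function names are hypothetical.

```python
from functools import reduce


def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data, k):
    """Split data into k equal slices plus one XOR parity slice."""
    slice_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(slice_len * k, b"\0")
    slices = [padded[i * slice_len:(i + 1) * slice_len] for i in range(k)]
    parity = reduce(xor_bytes, slices)
    return slices + [parity]


def recover(slices, original_len):
    """Rebuild the data when at most one slice (marked None) is missing."""
    missing = [i for i, s in enumerate(slices) if s is None]
    if len(missing) > 1:
        raise ValueError("single-parity scheme tolerates only one lost slice")
    slices = list(slices)
    if missing:
        present = [s for s in slices if s is not None]
        # XOR of all surviving slices reproduces the missing one.
        slices[missing[0]] = reduce(xor_bytes, present)
    return b"".join(slices[:-1])[:original_len]


payload = b"example video payload"
stored = encode(payload, 4)
stored[2] = None  # simulate losing one storage node
print(recover(stored, len(payload)) == payload)  # True
```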
ii. Dell EMC Isilon Scale-Out NAS Storage
Dell EMC's Isilon is a scale-out network storage offering performance and capacity for a wide range of enterprise workloads. Among the use cases, it offers high-performance computing (HPC), file shares, home directories, archives, media content, video surveillance, and in-place data analytics. Customers can choose from flexible Isilon all-flash, hybrid, and archive storage systems.
With Dell adopting the All-Flash technology it is providing the foundation that drives modern infrastructure. Dell uses this to outperform the competition with its industry-leading software features and flash-designed architectures that come together to deliver higher performance, lower TCO and better business outcomes for any company.
As flash storage adoption grows in enterprise storage systems, it is increasingly being used to accelerate I/O-intensive applications, such as databases and virtual desktop infrastructures, as well as general enterprise workloads, since the cost of flash has dropped and businesses want to take advantage of its performance and low-latency benefits. Flash storage does not require power to preserve stored data with integrity, so a system can be turned off, or lose power, without losing data.
Here is a guide for administrators to set up Dell Storage Network Attached Storage (NAS) Systems Using Windows Storage Server 2016 or 2012 R2.
To learn more, click on the links below:
- All-Flash Storage - Speed, Efficiency, and Simplicity
- Get Modern. Transform It With All-Flash
Scality is another storage solution, which claims petabyte-scale storage without limits, guaranteed efficiency, and 100% reliability, while promising cost reductions of as much as 90% over legacy systems. Acting as a single, distributed system, Scality offers to scale linearly over multiple sites and an unlimited number of objects.
It ensures high throughput and low latency across small and large files through its unique any-to-any performance capabilities. The platform’s access and storage layers can scale independently to thousands of nodes, all of which can be accessed directly and concurrently.
Scality Ring solution is designed on the principles of delivering true customer value: massive capacity scaling, consolidation of multiple storage silos with reduced management costs, always-on data availability and the highest levels of data durability, all at the economics of cloud-scale data centers.
You can use CloudBerry Drive for scalable storage options. To learn more about CloudBerry, refer to Map Cloud Storage as a Network Drive | MSP360™ (CloudBerry Lab).
Performance
File performance: Up to 700 MB/sec for very large file reads and 900 MB/sec for mixed file writes per RING connector.
Object performance: Up to 1GB/sec very large object reads per RING Connector.
Operations per second: Up to 3000 S3 operations per second, per Bucket on S3 Connector.
Read-ahead cache for sequential IOs: System detects sequential access patterns and repeatedly doubles the amount of data fetched into the cache.
Cache striping optimization for small file random IO: System by default reads only the requested number of bytes into the cache (avoids fetching whole stripes – unless sequential access detected).
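The two caching behaviours listed above can be pictured with a small model: a read that continues where the previous one ended doubles the prefetch window, while any other read fetches only the bytes requested. This is an illustrative sketch and not Scality's implementation; the class name, the window sizes, and the sequential-detection rule are all assumptions.

```python
class ReadAhead:
    """Toy read-ahead policy: double the window on sequential access."""

    def __init__(self, initial_kb=64, max_kb=4096):
        self.initial_kb = initial_kb
        self.max_kb = max_kb
        self.window_kb = initial_kb
        self.next_expected = None  # offset a sequential read would start at

    def on_read(self, offset_kb, length_kb):
        """Return how many KB to fetch into the cache for this read."""
        if offset_kb == self.next_expected:
            # Sequential pattern: grow the window, up to the cap.
            self.window_kb = min(self.window_kb * 2, self.max_kb)
            fetch = max(length_kb, self.window_kb)
        else:
            # Random access: fetch only what was asked for.
            self.window_kb = self.initial_kb
            fetch = length_kb
        self.next_expected = offset_kb + length_kb
        return fetch


cache = ReadAhead()
reads = [(0, 64), (64, 64), (128, 64), (1000, 8)]
print([cache.on_read(offset, length) for offset, length in reads])
# [64, 128, 256, 8]
```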
iv. Microsoft Azure StorSimple
In addition to these options, Microsoft Azure StorSimple deserves a mention here too.
Microsoft Azure StorSimple is an integrated storage area network (SAN) solution that manages storage tasks between on-premises devices and Microsoft Azure cloud storage.
Using automatic storage tiering and thin provisioning, data is stored on various devices based on data access frequency with the current working set stored on on-premises SSDs, less frequently used data on HDDs and archived data on the cloud, thus providing a hybrid storage solution. To reduce the amount of storage the data consumes, StorSimple uses deduplication and compression.
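The tiering idea can be sketched as a simple rule that maps how recently data was accessed to a storage tier. The thresholds and names below are hypothetical and chosen only for illustration; StorSimple's actual placement logic is not described in this article.

```python
def choose_tier(days_since_last_access, hot_days=7, warm_days=90):
    """Pick a storage tier from how recently the data was accessed."""
    if days_since_last_access <= hot_days:
        return "ssd"    # current working set
    if days_since_last_access <= warm_days:
        return "hdd"    # less frequently used data
    return "cloud"      # archived data


for age in (1, 30, 365):
    print(age, choose_tier(age))
```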
Performance (StorSimple 8000 series)
Maximum client read/write throughput (when served from the SSD tier)*: 920/720 MB/s with a single 10 GbE network interface.
Maximum client read/write throughput (when served from the HDD tier)*: 120/250 MB/s.
Maximum client read/write throughput (when served from the cloud tier)*: 40/60 MB/s for tiered volumes. Read throughput depends on clients generating and maintaining sufficient I/O queue depth.
Data optimization: Automatic Storage Tiering, Thin Provisioning, Deduplication, and Compression.
* Maximum throughput per I/O type was measured with 100 percent read and 100 percent write scenarios. Actual throughput may be lower and depends on I/O mix and network conditions. (Ref: What are StorSimple 8000 series system limits?)
To learn more about the Microsoft Azure StorSimple device, see StorSimple 8000 series: A Hybrid Cloud Storage Solution.
Options Supported By VIDIZMO
When VIDIZMO is set up as an On-Premise installation, a website is created automatically in the Web server (IIS) and the path defined in the Web server is used as the physical location to save and serve content uploaded by VIDIZMO users. VIDIZMO offers complete integration with external storage systems and any solution that can be mapped to a network drive will work for us.
Conclusion
With the number of options available to choose from, live streaming and on-demand content require business-specific configuration to cater to their audiences. Selecting a storage solution that meets specific requirements can be quite a challenge.
This article covers some of the basic elements that influence storage types and performance, along with some suggested storage solutions optimized for video content, in the hope that this information eases the decision-making process.
The suggested storage solutions described in Storage Solutions Optimized for Video Content can be used as examples when looking for similar products and devices that can be integrated with VIDIZMO.