Visual examples of some of the compression tests. Each example shows a sub-panel of the original video alongside the difference between the original and compressed frames. Difference previews are amplified 2x to enhance visibility.

Video Acquisition

Imaging Hardware

All data was acquired with the same imaging equipment: Sentech cameras (Model: STC-MB33USB) with Computar lenses (Model: T3Z2910CS-IR), recording at 640x480px resolution, 8-bit monochrome depth, and 30fps. Exposure time and gain were controlled digitally to reach a target brightness of 190/255. The aperture was opened to its widest setting so that lower analog gain was needed to reach the target brightness, which in turn reduced amplification of baseline noise. Files were saved temporarily on a local hard drive using the “raw video” codec and “pal8” pixel format. Our typical assays run for two hours, yielding a raw video file of approximately 50GB. Overnight, we use FFmpeg (https://www.ffmpeg.org/) to apply a 480x480px crop, apply a de-noise filter, and compress with the mpeg4 codec (quality set to max) in the YUV420P pixel format, which yields a compressed video of approximately 600MB.
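The overnight transcode described above can be sketched as a small Python helper that assembles the FFmpeg arguments. This is an illustrative sketch, not our exact production command: the file names are hypothetical, the crop is assumed to be centered (the default when no offsets are given to the crop filter), and the de-noise step is assumed to be the hqdn3d filter discussed in the compression tests.

```python
import shlex

def build_transcode_command(src, dst, crop_size=480, quality=0):
    """Return an ffmpeg argument list: crop, de-noise, then MPEG4 encode."""
    # Assumption: a centered square crop followed by the hqdn3d de-noise filter.
    filters = f"crop={crop_size}:{crop_size},hqdn3d"
    return [
        "ffmpeg",
        "-i", src,               # raw acquisition file (hypothetical name below)
        "-vf", filters,          # video filter chain
        "-c:v", "mpeg4",         # MPEG4 video codec
        "-q:v", str(quality),    # quality value; lower means higher quality
        "-pix_fmt", "yuv420p",   # output pixel format
        dst,
    ]

cmd = build_transcode_command("assay_raw.avi", "assay_compressed.avi")
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` on a machine where FFmpeg is installed; printing it first makes it easy to check the command before a long overnight run.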

One camera and lens were mounted approximately 100cm above each arena to reduce perspective distortion. Zoom and focus were set manually to achieve a scale of 8px/cm. This resolution both minimizes the unused pixels around our arena border and yields an area of approximately 800 pixels per mouse.
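As a quick check of what the 8px/cm scale implies, the conversion below turns the stated pixel quantities into physical units. The function names are ours for illustration; only the scale, the 640x480px frame size, and the 800 pixel mouse area come from the text.

```python
PIXELS_PER_CM = 8  # scale set with the manual zoom

def pixels_to_cm(pixels):
    """Convert a length in pixels to centimeters."""
    return pixels / PIXELS_PER_CM

def pixel_area_to_cm2(pixel_area):
    """Convert an area in pixels to square centimeters."""
    return pixel_area / PIXELS_PER_CM ** 2

field_of_view_cm = (pixels_to_cm(640), pixels_to_cm(480))  # full sensor frame
mouse_area_cm2 = pixel_area_to_cm2(800)                    # approximate mouse footprint

print(field_of_view_cm, mouse_area_cm2)
```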

Video Compression

Currently our lab collects video data at 480 x 480 pixel resolution, 30 fps, and 8 bits per pixel. Below are several compression standards we have tested on a 1 hour 40 minute video. For direct comparison, the raw cropped file is included.
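The expected size of an uncompressed video follows directly from these acquisition parameters. The short calculation below is a sketch of that arithmetic; it ignores container overhead, so a real file should be slightly larger than the estimate.

```python
def raw_video_bytes(width, height, fps, seconds, bytes_per_pixel=1):
    """Uncompressed payload size: one byte per pixel for 8-bit monochrome."""
    return width * height * bytes_per_pixel * fps * seconds

duration_s = (1 * 60 + 40) * 60                       # 1 hour 40 minutes
estimate = raw_video_bytes(480, 480, 30, duration_s)

print(f"{estimate:,} bytes")
```

The estimate is about 41.5GB, within roughly 1% of the raw file size reported in the compression tests; we assume the remainder is container overhead or a slightly longer recording.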

We tested two lossless formats: Dirac and H264. Because these encodings are lossless, decoded frames are identical to the original. H264 yields a slightly smaller file but takes slightly more time to transcode. Dirac isn’t widely supported without transcoding again to another format.

Since we know that sensor noise exists, we can use strategies that enhance compression while only making changes to the video that are smaller than the observed noise in our setup. The typical approach is to lower a quality setting, but the resulting loss often appears as blocky, pixelated artifacts. We also include a temporal de-noising filter in our tests, which attempts to smooth temporal pixel variation. It functions as a low-pass filter designed to preserve precision and boundaries, making it ideal for smoothing out black noise while preserving the information we are interested in observing.
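To illustrate the idea of a temporal low-pass filter that preserves boundaries, the sketch below smooths each pixel over time but passes large changes through untouched. It is a deliberately simplified stand-in and not the algorithm used by the filter in our tests; the function name, the `strength` parameter, and the `edge_threshold` value are all assumptions chosen for illustration.

```python
def temporal_smooth(frames, strength=0.5, edge_threshold=12):
    """Smooth per-pixel intensity over time while keeping large changes.

    frames: list of frames, each a flat list of 8-bit intensities.
    strength: 0 keeps the new value, values near 1 smooth heavily.
    edge_threshold: changes larger than this are treated as real signal.
    """
    smoothed = []
    previous = None
    for frame in frames:
        if previous is None:
            current = [float(v) for v in frame]
        else:
            current = []
            for old, new in zip(previous, frame):
                if abs(new - old) > edge_threshold:
                    current.append(float(new))  # large change: keep as is
                else:
                    # small change: move only part of the way (low-pass)
                    current.append(old + (1 - strength) * (new - old))
        smoothed.append([int(round(v)) for v in current])
        previous = current
    return smoothed

# One pixel that flickers with noise, then changes for real.
noisy = [[100], [102], [98], [101], [200]]
result = temporal_smooth(noisy)
print(result)
```

In this toy input the flicker of a few intensity levels is flattened, while the jump to 200 is preserved exactly. Less frame-to-frame variation is what makes the filtered video easier to compress.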

Compression tests

Codec   Filters     File Size (Bytes)   Notes            ffmpeg Command
------  ----------  ------------------  ---------------  -----------------------------------
Raw     None        41,911,246,848
Dirac   None        15,568,246,208      Lossless         -c:v libschroedinger -q 0
H264    None        14,840,999,602      Lossless         -c:v libx264 -crf 0
MPEG4   Q0           2,471,731,352      Lossy            -c:v mpeg4 -q 0
Dirac   HQDN3D      10,113,632,202      Filtered         -c:v libschroedinger -q 0 -vf hqdn3d
H264    HQDN3D       8,008,906,680      Filtered         -c:v libx264 -crf 0 -vf hqdn3d
MPEG4   HQDN3D Q0      429,522,590      Lossy Filtered   -c:v mpeg4 -q 0 -vf hqdn3d
FMF     None        41,720,668,160      Lossless
UFMF    None        41,722,662,912      Lossless
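For easier comparison, the file sizes in the table above can be converted into compression ratios relative to the raw file. The snippet below is a small sketch of that calculation; the byte counts are copied from the table and the row labels are ours.

```python
RAW_BYTES = 41_911_246_848

# File sizes in bytes, copied from the compression tests table.
sizes = {
    "Dirac": 15_568_246_208,
    "H264": 14_840_999_602,
    "MPEG4 Q0": 2_471_731_352,
    "Dirac + HQDN3D": 10_113_632_202,
    "H264 + HQDN3D": 8_008_906_680,
    "MPEG4 Q0 + HQDN3D": 429_522_590,
    "FMF": 41_720_668_160,
    "UFMF": 41_722_662_912,
}

# Ratio of raw size to compressed size: larger means better compression.
ratios = {name: RAW_BYTES / size for name, size in sizes.items()}

for name, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:18s} {ratio:6.1f}x")
```

The filtered MPEG4 file comes out just under 100x smaller than raw, while FMF and UFMF stay within about 1% of the raw size.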

Within ffmpeg, the MPEG4 encoder makes variable bitrate encoding easy to configure through a single quality value (from 0-31, with 0 being near lossless). With the quality 0 parameter, approximately 0.01% of the pixels in our videos are changed from the original (increased or decreased by a maximum of 4% of their intensity). For our data, this amounts to about 25 pixels per frame. The majority of these pixels are located at the boundaries of shadows. Changes this small are on the scale of the camera’s own sensor noise. With larger quality values, artifacts are introduced to better compress the video data. As seen in the video above, even a Q value of 5 introduces severe blocky artifacts.
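The pixel-change figures quoted above can be measured by comparing a decoded frame against its original. The sketch below shows one way to do this on frames stored as flat sequences of 8-bit intensities. The function name is hypothetical, and expressing the largest change as a fraction of the full 0-255 range is one possible reading of "4% of their intensity"; the frame used here is synthetic, not our data.

```python
def changed_pixel_stats(original, compressed):
    """Return (fraction of pixels changed, largest change as a fraction of 255)."""
    if len(original) != len(compressed):
        raise ValueError("frames must have the same number of pixels")
    diffs = [abs(a - b) for a, b in zip(original, compressed)]
    changed = sum(1 for d in diffs if d > 0)
    return changed / len(diffs), max(diffs) / 255

# Synthetic frame: 10,000 pixels, one of which changes by 4 intensity levels.
original = [100] * 10_000
compressed = list(original)
compressed[1234] = 104

fraction_changed, max_change = changed_pixel_stats(original, compressed)
print(f"{fraction_changed:.4%} of pixels changed, max change {max_change:.1%}")

# At 480 x 480 pixels, 0.01% of the frame is about 23 pixels.
expected_changed = round(0.0001 * 480 * 480)
print(expected_changed)
```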

In addition to these formats, other researchers have investigated this issue and created lossless formats to accommodate their datasets. Two of these are the FMF (fly movie format) and UFMF (micro fly movie format) codecs (http://ctrax.sourceforge.net/any2ufmf.html). The purpose of these formats is to minimize extraneous information and optimize readability for tracking. Because these formats are lossless and rely on a static background model, unfiltered sensor noise prevented any substantial data compression. We suspect that we could achieve substantially better compression by applying the HQDN3D filter first, but we did not continue testing because of the restricted portability of these formats: the only readers are provided through a MATLAB interface.

We selected the MPEG4 codec with the HQDN3D filter because it improved our tracking and significantly reduced file size (approx. 100x smaller than raw). All loss of information was experimentally validated to be orders of magnitude smaller than the variation generated by sensor noise, which we measured from a video acquired in the absence of mice. This type of noise removal greatly enhances compressibility while preserving important information in the frame.

We do note that at our spatial and temporal resolution, whiskers can be observed, though unreliably, in the raw videos but are removed by the denoise filter. This is because they move at high frequency (~100Hz) and are small at our resolution (~5px long). The denoise filter has parameters that set the strength of the smoothing (spatial and temporal cutoff frequencies). Lowering these values preserves more of this information at the cost of increased file size.
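As a sketch of how the smoothing strength could be lowered, the helper below builds an hqdn3d filter string with explicit parameters. The four positional options (spatial and temporal strength for luma and chroma) and the defaults shown reflect our understanding of the FFmpeg filter documentation and should be checked against the installed version. The gentler values are purely illustrative and have not been validated on our data.

```python
def hqdn3d_filter(luma_spatial=4.0, chroma_spatial=3.0, luma_tmp=6.0, chroma_tmp=4.5):
    """Build an ffmpeg hqdn3d filter string; lower values smooth less."""
    return f"hqdn3d={luma_spatial}:{chroma_spatial}:{luma_tmp}:{chroma_tmp}"

default_filter = hqdn3d_filter()
# Illustrative weaker setting: every strength scaled to one quarter.
gentle_filter = hqdn3d_filter(1.0, 0.75, 1.5, 1.125)

print(default_filter)
print(gentle_filter)
```

The resulting string replaces the bare `hqdn3d` in the `-vf` argument, for example `-c:v mpeg4 -q 0 -vf hqdn3d=1.0:0.75:1.5:1.125`.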