The camera housings are rigidly mounted on either side of the submersible because the positions of the cameras relative to each other must remain constant if an object is to be tracked accurately. A control box remotely powers both cameras and electrically controls their irises. When the system is operated from a manned submersible, the control box is connected to the cameras by a cable that penetrates the submersible's hull to carry video and control signals (Fig. 1).

Video recording is controlled from two VCRs inside the submersible. Sequences on the two videotapes must be synchronized so that identical frames can be identified later. We use a time-code generator that lays down a time-code signal simultaneously on one audio track of each VCR. When the original tapes are dubbed onto work tapes, the audio time code is converted into a visual code, in minutes, seconds, and frame number at 30 fps, that appears on every frame. The greatest advantage of the time code over periodic synchronizing signals, such as a strobe flash or a tone, is that the continuously visible code lets the user be certain that identical sequences of frames on the multiple videotapes are selected for digitizing.
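Because the work tapes display the code as minutes, seconds, and frame number at 30 fps, matching frames across the two tapes reduces to simple arithmetic. The following minimal sketch (our illustration; the function name is hypothetical, not part of any commercial time-code system) converts a displayed code into an absolute frame index so that the same instant can be located on both tapes:

    # Convert a visual time code (min, sec, frame at 30 fps) into an
    # absolute frame index; two frames are simultaneous when the
    # indices computed from their codes match.
    FPS = 30  # NTSC video runs at 30 frames per second

    def code_to_frame(minutes: int, seconds: int, frame: int) -> int:
        """Absolute frame index since the time code started."""
        return (minutes * 60 + seconds) * FPS + frame

    # 1 min 2 s frame 15 and 62 s frame 15 name the same instant:
    assert code_to_frame(1, 2, 15) == code_to_frame(0, 62, 15)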
We use Super-VHS or Hi8 videocassette recorders because of their high video resolution. For recording from the Johnson Sea-Link, where space was limited, we used 2 portable VCRs. The model we chose had 2 audio inputs, allowing the observer to record commentary on the second audio track while the time code was recorded on the first. A portable monitor attached to the outputs of the two VCRs displayed what was actually being recorded and let the operator hear the time-code signal on the audio tracks.
For dives with the Johnson Sea-Link we calibrated the cameras with a "stick" box made of 12 black rods, each 71.0 cm long, inserted into 8 white plastic cubes that served as calibration targets at the corners of the box (Fig. 2). The cameras on the Johnson Sea-Link were positioned so that the box, held in front of the submersible, filled approximately 3/4 of the monitor screen in each camera's view, and both cameras were focused approximately 2 m in front of the submersible's sphere. Once the cameras were positioned, they were calibrated at depth by using the submersible's claw to hold the stick box motionless in front of the cameras while both cameras recorded it. Calibration at depth is easier than calibration on deck because of the uniform black background beyond the targets. We recorded animal behavior with the 3-D video system from the Johnson Sea-Link both in midwater and while the submersible rested on the ocean floor. One example is presented below.
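The 8 corner cubes provide 8 control points whose object-space coordinates are known from the rod length, which is what a standard stereo-calibration fit (for example, the direct linear transformation) requires alongside the digitized image coordinates of the targets from each camera. A minimal sketch of how those control points might be tabulated, assuming the rods form a cube so the targets sit 71.0 cm apart along each axis (our illustration, not the software we used):

    import itertools

    EDGE = 71.0  # rod length in cm; targets sit at the cube's corners

    # Known object-space (x, y, z) coordinates of the 8 white cubes,
    # with the origin at one corner and the axes along three rods.
    # Paired with each camera's digitized image coordinates, these
    # are the control points for a calibration fit such as the DLT.
    control_points = [(EDGE * i, EDGE * j, EDGE * k)
                      for i, j, k in itertools.product((0, 1), repeat=3)]

    for point in control_points:
        print(point)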
The data presented encompass 45 frames of video, or 1.5 seconds. Both the sergestid and the fish initially continued along the same headings on which they had been swimming since they first appeared. Fig. 3 shows the 2-D paths of the sergestid and the fish as computed from each camera angle individually; note that the computed path directions of the animals are strikingly different in the two views. Nonetheless, once the x and y coordinates from the two views were combined to track the animals in three-dimensional space, the true paths and other motion-related parameters could be calculated.
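Combining the two views amounts to triangulation: each camera's image point defines a ray in space, and the animal must lie where the two rays (nearly) intersect. The sketch below shows the standard linear least-squares triangulation for two calibrated cameras; it is our illustration of the principle under idealized pinhole assumptions, not the actual computation performed by the ExpertVision software:

    import numpy as np

    def triangulate(P1, P2, xy1, xy2):
        """Least-squares 3-D point from one image point per camera.

        P1, P2 : 3x4 projection matrices obtained from calibration.
        xy1, xy2 : (x, y) image coordinates of the same animal in the
        same (time-code-matched) frame of the two views.
        """
        x1, y1 = xy1
        x2, y2 = xy2
        # Each image point contributes two linear equations in the
        # unknown homogeneous position X; solve A @ X = 0 by SVD.
        A = np.vstack([x1 * P1[2] - P1[0],
                       y1 * P1[2] - P1[1],
                       x2 * P2[2] - P2[0],
                       y2 * P2[2] - P2[1]])
        X = np.linalg.svd(A)[2][-1]
        return X[:3] / X[3]

    # Hypothetical example: two ideal cameras 30 cm apart recover a
    # point about 2 m away (units in cm).
    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = np.hstack([np.eye(3), [[-30.0], [0.0], [0.0]]])
    print(triangulate(P1, P2, (0.05, 0.025), (-0.1, 0.025)))  # ~[10 5 200]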
Fig. 4 is a plot of the sergestid's speed over the 1.5 seconds, calculated by the ExpertVision system from these coordinates. For the first 17 frames the sergestid's speed averaged 12.2 cm s⁻¹. Its lobstertail escape response to the approaching fish then took it to a maximum speed of 140.0 cm s⁻¹ within 3 frames (0.1 sec). In the same sequence, the fish averaged 15.6 cm s⁻¹ for the first 17 frames. In response to the sergestid, the fish abruptly changed course and accelerated to a maximum speed of 90.8 cm s⁻¹ within 5 frames (0.17 sec). By frame 33, the last frame in which its image could be digitized, the fish had decelerated to 53.0 cm s⁻¹.
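The speed calculation itself is straightforward once the 3-D coordinates exist: the distance between the reconstructed positions in successive frames, divided by the interframe interval of 1/30 s. A minimal sketch, assuming the path is a sequence of (x, y, z) points in cm (our code, not the ExpertVision implementation):

    import numpy as np

    FPS = 30  # frames per second of the video

    def speeds(path_cm):
        """Per-frame speed (cm/s) from successive 3-D positions."""
        p = np.asarray(path_cm, dtype=float)               # (n_frames, 3)
        step = np.linalg.norm(np.diff(p, axis=0), axis=1)  # cm per frame
        return step * FPS

    # 0.5 cm of travel per frame corresponds to 15 cm/s:
    print(speeds([(0, 0, 0), (0.5, 0, 0)]))  # -> [15.]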
This example of three-dimensional behavioral quantification makes clear how inaccurate it is to rely on 2-D measurements. When the swimming sergestid and fish converged, the two animals reacted at almost the same moment, changing speed and direction. Although each camera view by itself suggests that the encounter affected both animals, the two views taken separately provide conflicting information. In the paths calculated by the EV3D program for each view (Fig. 3), the camera angles make the sergestid appear to dart upward along 2 different diagonals and make the fish appear to swim in opposite directions. Any behavioral measurements based on either of these 2-dimensional views alone would be wrong.
Correct lighting is important for subsequent digitization of the recorded images. The digitizing computer outlines target images by "thresholding", that is, by setting the grey-level transition that defines the edge of a target against the background. If lighting is uneven, the contrast between target and background changes from frame to frame, the appropriate threshold level changes with it, and the computer is unable to digitize the target throughout the sequence. Broad-beam lights are therefore essential to keep the light field as even as possible.
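In outline, thresholding reduces each frame to a binary image in which every pixel brighter than the chosen grey level is assigned to the target; when the lighting is even, one threshold holds for the whole sequence. A minimal sketch of the idea, assuming bright targets on a dark background (our illustration, not the digitizer's actual algorithm):

    import numpy as np

    def outline_target(frame, threshold):
        """Binary mask of pixels brighter than the grey-level threshold.

        frame : 2-D array of grey levels (0-255). With even lighting a
        single threshold separates target from background in every
        frame; with uneven lighting the required value drifts from
        frame to frame and digitizing fails.
        """
        return frame > threshold

    frame = np.array([[ 10,  12, 200],
                      [ 11, 210, 205],
                      [  9,  13,  12]])
    print(outline_target(frame, 100).astype(int))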
Cameras on the submersible should be spaced as far apart as possible, because increased separation improves the accuracy of coordinates measured along the axis perpendicular to the plane of the cameras. When possible, additional cameras should be used to increase the probability that the target remains in more than one view. With additional cameras, two could be aligned side by side to provide a stereo view that could be projected in 3-D. Humans see the world with stereo optics, and the ability to review three-dimensional events in 3-D would help us understand complex behaviors.
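The geometric reason for wide spacing can be made explicit. For an idealized rectified stereo pair with baseline B and focal length f, a target at range Z produces an image disparity d = fB/Z, so an error of Δd in digitizing the disparity produces a range error of roughly ΔZ ≈ Z²Δd/(fB): doubling the camera separation halves the error along the viewing axis. A numerical sketch with assumed, illustrative values:

    def range_error(Z, f, B, d_err):
        """Approximate range error Z**2 * d_err / (f * B) for an
        idealized rectified stereo pair (all lengths in cm)."""
        return Z**2 * d_err / (f * B)

    # Hypothetical numbers: target 2 m away, 0.8 cm focal length,
    # 10 um disparity error. Doubling the 30 cm baseline halves the
    # range error:
    print(range_error(200.0, 0.8, 30.0, 0.001))  # ~1.67 cm
    print(range_error(200.0, 0.8, 60.0, 0.001))  # ~0.83 cm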
In an earlier report (Hamner et al. 1988) we discussed three-dimensional videography and described several systems of 3-D video projection. Once the elements of three-dimensional image collection, viewing, and computer analysis are combined, we will have an extremely powerful new tool for analyzing behavior in the deep sea. (Peggy and Bill Hamner are scientists in the Department of Biology at UCLA.)