College Park Stereo Building Facade Dataset
Captured by Jeffrey Delmerico and Philip David, Summer 2010

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

This set of images was captured on the University of Maryland-College Park campus and in nearby College Park, MD, and is intended to include a wide variety of buildings seen from different camera angles and distances.  The data set was originally created for training and testing a system that segments individual building facades from an image, so the ground truth images include segmentations of the visible building facades as well as a two-class (building/background) annotation.  All ground truth images were human-annotated. The left camera image and the corresponding disparity map for each shot are also included.

-- We would like to thank the staff of the Asset Control and Behavior Branch of the Computational and Information Sciences Directorate at the US Army Research Laboratory for their assistance in data annotation. --

Camera
------

All images were captured with a Tyzx DeepSea V2 grayscale stereo camera, which performed the disparity calculation onboard. The camera has a 14 cm baseline and a 62 degree horizontal field of view, which provides an accuracy of approximately 0.5 m at a range of 15 m, so the useful range for these disparity images is somewhat shorter than 15 m. Please note that some of the ground truth annotations take this into account: some distant buildings are labeled as background. All images, disparity maps, and ground truth annotations have 500 x 312 resolution.
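
The README does not say how the quoted accuracy of approximately 0.5 m at 15 m was derived. As a rough sanity check, the sketch below applies the standard first-order stereo relation dZ ~ Z^2 / (f * b) * dD, using the 14 cm baseline and the rounded focal length of 412 px from the calibration section. The use of this relation, and both function names, are assumptions made for illustration.

```python
# First-order stereo depth uncertainty: dZ ~= Z**2 / (f * b) * dD.
# ASSUMPTION: this approximation is used only as a sanity check; the README
# does not state how its 0.5 m figure was obtained.
FOCAL_PX = 412.0      # focal length in pixels (rounded calibration value)
BASELINE_MM = 140.0   # baseline in millimeters (14 cm)


def depth_error_mm(z_mm, disparity_error_px):
    """Approximate depth error (mm) at range z_mm for a given disparity error (px)."""
    return z_mm ** 2 / (FOCAL_PX * BASELINE_MM) * disparity_error_px


def implied_disparity_error_px(z_mm, depth_error):
    """Disparity error (px) that would produce depth_error (mm) at range z_mm."""
    return depth_error * FOCAL_PX * BASELINE_MM / z_mm ** 2


if __name__ == "__main__":
    # Disparity error implied by 0.5 m of depth error at 15 m.
    print(implied_disparity_error_px(15000.0, 500.0))
    # Depth error at 15 m if the error were one sub-pixel step (1/32 px).
    print(depth_error_mm(15000.0, 1.0 / 32.0))
```

Under this approximation, depth error grows with the square of the range, which is why distant buildings carry much less reliable disparity than nearby ones.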

Camera Calibration
------------------

Camera matrices (in pixels):

      | 412.6333859792975                   0   255.8801216520595 |
K_L = |                 0   412.2242196849530   155.7647978962623 |
      |                 0                   0   1.000000000000000 |

      | 412.4237172388727                   0   256.0004730684672 |
K_R = |                 0   412.0888225016296   155.1708878432427 |
      |                 0                   0   1.000000000000000 |

Rotation matrix and translation vector (for right camera with respect to left camera, translation in millimeters):

    | 0.999999716528539  -0.000086238257306  -0.000748001206474 |
R = | 0.000087532034967   0.999998500087721   0.001729786244292 |
    | 0.000747850910787  -0.001729851228015   0.999998224165295 |

    | -139.82093808851 |
T = | -0.0610218277064 |
    |  0.2948105780855 |
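
As a minimal sketch of how these calibration values might be used, the snippet below loads them into NumPy arrays, computes the baseline as the length of T, and projects a 3-D point into both cameras. It assumes the convention X_right = R * X_left + T, which the text above suggests but does not spell out, and it assumes a plain pinhole model without lens distortion. The function names are hypothetical.

```python
import numpy as np

K_L = np.array([[412.6333859792975, 0.0, 255.8801216520595],
                [0.0, 412.2242196849530, 155.7647978962623],
                [0.0, 0.0, 1.0]])
K_R = np.array([[412.4237172388727, 0.0, 256.0004730684672],
                [0.0, 412.0888225016296, 155.1708878432427],
                [0.0, 0.0, 1.0]])
R = np.array([[0.999999716528539, -0.000086238257306, -0.000748001206474],
              [0.000087532034967, 0.999998500087721, 0.001729786244292],
              [0.000747850910787, -0.001729851228015, 0.999998224165295]])
T = np.array([-139.82093808851, -0.0610218277064, 0.2948105780855])  # mm


def project(K, point_mm):
    """Pinhole projection of a 3-D point (camera frame, mm) to pixel coordinates."""
    uvw = K @ point_mm
    return uvw[:2] / uvw[2]


def left_to_right(point_left_mm):
    """Map a point from the left to the right camera frame.

    ASSUMED convention: X_right = R @ X_left + T.
    """
    return R @ point_left_mm + T


# Length of the translation vector, i.e. the stereo baseline in millimeters.
baseline_mm = float(np.linalg.norm(T))

# A point 5 m in front of the left camera, on its optical axis.
point = np.array([0.0, 0.0, 5000.0])
u_left = project(K_L, point)[0]
u_right = project(K_R, left_to_right(point))[0]

if __name__ == "__main__":
    print("baseline (mm):", baseline_mm)
    print("horizontal offset between views (px):", u_left - u_right)
```

The horizontal offset computed this way comes from the raw calibration, so it need not match the onboard disparity exactly; the README does not describe the rectification applied by the camera.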

Disparity can be converted to depth with the transformation Z = C * fb/D, where Z is depth in millimeters, f is the focal length in pixels (412), b is the baseline of the camera in millimeters (140), and D is the disparity value stored in the image.  The stored disparity values are scaled up by a factor of 32 to accommodate the 5-bit sub-pixel accuracy of the camera, so the constant factor C = 32 is included to undo this scaling.  For example, if the stored disparity value at a point is 640, the actual disparity is 640/32 = 20 pixels, and the depth at that point is: Z = C * fb/D = 32 * (412 px)(140 mm)/(640) = 2884 mm, which matches the result of computing fb/D directly with the actual disparity of 20 pixels.
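
The conversion above can be sketched in Python with NumPy as follows. The function name is hypothetical, the rounded values f = 412 and b = 140 are taken from the text, and treating a stored value of 0 as "no depth" follows the note on invalid pixels in the Disparity Images section.

```python
import numpy as np

FOCAL_PX = 412.0      # f, focal length in pixels
BASELINE_MM = 140.0   # b, baseline in millimeters
SCALE = 32.0          # C, sub-pixel scale factor of the stored disparities


def disparity_to_depth_mm(raw_disparity):
    """Convert stored (scaled) disparity values to depth in millimeters.

    Stored values of 0 mark invalid pixels and are returned as NaN.
    """
    raw = np.asarray(raw_disparity, dtype=np.float64)
    # The division by zero for invalid pixels is masked out by np.where,
    # so the corresponding warnings are silenced here.
    with np.errstate(divide="ignore", invalid="ignore"):
        depth = SCALE * FOCAL_PX * BASELINE_MM / raw
    return np.where(raw > 0, depth, np.nan)


if __name__ == "__main__":
    print(disparity_to_depth_mm(640))
    print(disparity_to_depth_mm(np.array([[0, 640], [1280, 320]], dtype=np.uint16)))
```

Casting to floating point before dividing avoids integer truncation of the 16-bit stored values.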

File Organization
-----------------

There are 142 grayscale images along with their corresponding disparity maps and ground truth building facade segmentations.  
These images are separated into the following subdirectories:

Images - PNG files with prefix L, indexed from 0 to 141, taken from the left sensor of the camera.  
Disparity - 16-bit TIFF files storing the disparity maps corresponding to the files in "Images", but with prefix D. **
GroundTruth - Two-class (building/background) PNG annotations, with prefix L and suffix _GT.  
FacadeSegmentations - PNG annotations where each facade is represented with a different gray value. Prefix L and suffix _GTP.

** The disparity maps will appear to be all black when viewed with many graphics programs; the stored values are large enough to require 16-bit images (> 255), but small enough relative to the maximum value for 16-bit depth (65535) that they are indistinguishable from black when displayed.  To visualize the disparity maps, the "imagesc" command in MATLAB can provide a scaled, human-readable version.
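
For users without MATLAB, a similar scaled view can be produced with NumPy. The sketch below is one possible approach, not part of the original tools: it stretches the valid disparity values of an already-loaded array to the 8-bit range 1 to 255 and keeps invalid pixels (stored as 0) black. The function name is hypothetical, and loading the 16-bit TIFF is left to whichever image library is used, provided it preserves the full 16-bit values.

```python
import numpy as np


def scale_for_display(disparity):
    """Stretch a 16-bit disparity map to 8 bits for viewing.

    Valid pixels (> 0) are mapped linearly onto 1..255; invalid pixels stay 0.
    """
    disp = np.asarray(disparity, dtype=np.float64)
    valid = disp > 0
    out = np.zeros(disp.shape, dtype=np.uint8)
    if not valid.any():
        return out  # nothing to show
    lo = disp[valid].min()
    hi = disp[valid].max()
    if hi == lo:
        out[valid] = 255  # a single disparity level: show it at full brightness
        return out
    scaled = 1.0 + 254.0 * (disp[valid] - lo) / (hi - lo)
    out[valid] = np.round(scaled).astype(np.uint8)
    return out


if __name__ == "__main__":
    demo = np.array([[0, 320], [640, 1280]], dtype=np.uint16)
    print(scale_for_display(demo))
```

The resulting array can be saved or displayed with any 8-bit image viewer.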

Disparity Images
----------------

One important note about these disparity images: the disparity maps computed in real time by the System-on-a-Chip of the Tyzx camera are somewhat sparse, so some pixels in each image have not been assigned a valid disparity value.  This can be due to occlusion, distance outside the range of the camera, or failure of the stereo algorithm to find an appropriate match between the two camera images. The Tyzx camera assigns the maximum value (65535) to these invalid pixels, which would normally mean that each disparity image needs to be cleaned up after it is read.  In the provided disparity maps, these invalid pixels have instead been assigned the value 0, since a disparity of 0 has no meaningful physical interpretation here (by Z = fb/D it would correspond to a point at infinite depth).
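
A minimal sketch of separating valid from invalid pixels is shown below. It treats 0 as invalid, as described above, and also treats 65535 as invalid purely as a defensive assumption in case a raw camera value is encountered. The function names are hypothetical.

```python
import numpy as np

# 0 marks invalid pixels in the provided maps; 65535 is the raw camera marker
# and is included here only as a defensive ASSUMPTION.
INVALID_VALUES = (0, 65535)


def valid_mask(raw_disparity):
    """Boolean mask that is True where the stored disparity is usable."""
    raw = np.asarray(raw_disparity)
    return ~np.isin(raw, INVALID_VALUES)


def valid_fraction(raw_disparity):
    """Fraction of pixels with a valid disparity, a simple measure of sparsity."""
    return float(valid_mask(raw_disparity).mean())


if __name__ == "__main__":
    demo = np.array([[0, 640], [65535, 800]], dtype=np.uint16)
    print(valid_mask(demo))
    print(valid_fraction(demo))
```

Applying the mask before any depth conversion or statistics keeps the invalid pixels from being interpreted as measurements.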

If you have any questions, or need any assistance with this data set, please feel free to contact me at: jad12@buffalo.edu 
