Camera Tracking
libavg can use cameras to track objects. It supports several driver subsystems as camera interfaces. The tracker was built primarily for our multitouch table MTC, but it works for other camera tracking applications as well.
The tracker uses a configuration file stored either at /etc/avgtrackerrc or ./avgtrackerrc. There is an example file in source:trunk/libavg/src/imaging/avgtrackerrc.minimal.
Note: The following is outdated. libavg now supports several multitouch drivers, including TUIO and Linux drivers. For information on the new interface, see the reference. If you want to use the internal tracker, the general information here is valid but the details aren't. For the specific syntax, refer to avgtrackerrc.minimal.
avgtrackerrc contains configuration information for several stages of the image processing pipeline. Each stage of the pipeline is described in a different section of the file.
The first stage is the camera. The values in the config file determine device parameters for the camera to be used. <source> determines the type of device to use and must be either 'fw' (for Firewire) or 'v4l' (for Video4Linux2). <device> is the device file name. <size> is the requested size of the grabbed frames. The example gives the default Firewire device file under Linux; in the case of Video4Linux, the first camera is usually found under /dev/video0. <fps> is the number of frames per second that the camera should be set to. The other parameters map directly to Firewire camera settings (TheImagingSource provides a lot of generic documentation about Firewire cameras that applies to other vendors' products as well).
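As a rough illustration, the camera section might look something like the sketch below. The <camera> wrapper element name, the x/y attribute form of <size> and the concrete values (a Video4Linux camera at 640x480 and 30 fps) are assumptions for the purpose of the example; the authoritative syntax is in avgtrackerrc.minimal.

  <camera>
    <source>v4l</source>          <!-- 'fw' for Firewire, 'v4l' for Video4Linux2 -->
    <device>/dev/video0</device>  <!-- first Video4Linux camera under Linux -->
    <size x="640" y="480"/>       <!-- requested capture size (illustrative) -->
    <fps>30</fps>                 <!-- requested frame rate (illustrative) -->
  </camera>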
The <transform> stage is used to map camera image coordinates to a rectangle. It takes care of tilted or rotated cameras, barrel distortion due to wide-angle lenses, and differences between the camera resolution and the desired rectangle. There is an automatic configurator for this part of the file in avg_media/mtc/videochooser2/. The example file above contains a setting that simply passes input values through to output values unchanged for a 640x480 camera; it can be used if you don't want any transformation to take place. Except for the last two parameters (<displaydisplacement> and <displayscale>), these transformations are applied to the actual camera bitmap before any image processing is performed. The last two are used to transform the coordinates to screen coordinates. If applying them would result in coordinates outside of the screen, libavg will abort:
[07-12-30 15:53:04.538] ERROR: Impossible tracker configuration: Region of interest is ((0,0)-(1280,720)), camera image size is (640,480). Aborting.
In this case, for instance, setting <displayscale x="2" y="1.5"/> will solve the issue.
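For the situation in the error message above (a 1280x720 region of interest and a 640x480 camera image), the end of the transform section might look roughly like this. The scale values follow from the text; the x/y attribute form of <displaydisplacement> and its zero values are assumptions for illustration.

  <displaydisplacement x="0" y="0"/>  <!-- offset applied when mapping to screen coordinates (illustrative) -->
  <displayscale x="2" y="1.5"/>       <!-- 1280/640 = 2 horizontally, 720/480 = 1.5 vertically -->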
The middle stages of the pipeline are described in the <tracker> section. <historyupdateinterval> describes how long the algorithm for background subtraction takes before it assumes something new in the image is part of the 'background' and hence not interesting. A value of 1 corresponds to about 256 frames before objects are absorbed completely. Larger values mean proportionately longer times.
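To make that concrete, assuming for illustration a camera delivering 30 frames per second:

  <historyupdateinterval>1</historyupdateinterval>
  <!-- ~256 frames until new content is absorbed into the background;
       at an assumed 30 fps that is roughly 8.5 seconds.
       A value of 4 would mean ~1024 frames, or roughly 34 seconds. -->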
The <touch> and <track> sections configure the tracker for generic tracking and DI-specific touch detection. They correspond to two sub-pipelines that work in parallel to detect different types of blobs. If one of the sections is missing, only the other sub-pipeline runs. For this reason, the <touch> section should be left out completely if you're not doing table-based tracking. In these sections, the <threshold> value is the minimum brightness (above the background image) a pixel must have to be considered part of a blob. <similarity> is the maximum distance that a blob may travel in one frame and still be considered the same object. <areabounds> determines the minimum and maximum size in pixels that a blob may have and still be considered interesting, and <eccentricitybounds> determines how close to circular the blob's shape must be to be considered interesting. Blobs that aren't 'interesting' are discarded and not sent to the application as events. The values in the example above are very lenient: with them, only very small blobs (which are usually just camera noise) are discarded.
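Purely as an illustration, a lenient <track> section could be sketched as follows. The element names are the ones described above, but the min/max attribute format of the bounds and all of the numbers are assumptions; consult avgtrackerrc.minimal for the actual syntax and the example file for sensible values.

  <track>
    <threshold>5</threshold>                 <!-- min. brightness above the background image -->
    <similarity>31</similarity>              <!-- max. distance a blob may move per frame and stay the same object -->
    <areabounds min="3" max="450000"/>       <!-- blob size limits in pixels (hypothetical format, lenient values) -->
    <eccentricitybounds min="1" max="20"/>   <!-- how far from circular a blob may be (hypothetical format, lenient values) -->
  </track>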