Modeling and Comparing Spatiotemporal Events
Video query systems are used increasingly in both business and personal applications. Many applications of video data involve stationary cameras, resulting in a stable background and moving objects in the foreground. The movements of these objects can be extracted to form lifelines using techniques such as those developed in our lab. Our current task is to organize these lifelines and their attributes in a way that makes them easy to query, even by inexperienced users. To accomplish this, we have employed data cubes and other hierarchical structures, as well as new metadata structures. After a brief review of our ongoing work with lifelines, we discuss these additional components of our query system in more depth. Our comprehensive system has the potential to change the way in which video databases are organized and queried.

1. Extracting Video Lifeline Data

Work by members of our research group has addressed methods for extracting lifelines from video data sets (Stefanidis et al., 2001a, b). A lifeline can be defined as a sequence of the spatial locations (x, y, z) of an object over a time interval (t1, t2) during which the object has moved from one location to another. Lifelines are useful because they represent several attributes of an object's spatiotemporal progression, such as acceleration and cardinality, that can then be incorporated into a master database for use in future queries. Lifelines can also be aggregated into groups, which have their own properties, such as topology. Users who require a high level of detail will want many nodes per lifeline and many lifelines per group, while those more concerned with a general overview of the video contents might ask for fewer nodes and lifelines in order to save space and processing time.
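To make the lifeline notion concrete, the following is a minimal sketch of a lifeline as a sequence of (x, y, z, t) nodes, from which derived attributes such as speed and acceleration can be computed. The class and method names are illustrative, not the paper's actual data model.

```python
from dataclasses import dataclass
from typing import List, Tuple

# One lifeline node: spatial position (x, y, z) plus a timestamp t.
Node = Tuple[float, float, float, float]

@dataclass
class Lifeline:
    """A sequence of (x, y, z, t) nodes for one tracked object."""
    nodes: List[Node]

    def speeds(self) -> List[float]:
        """Average speed over each segment between consecutive nodes."""
        out = []
        for (x0, y0, z0, t0), (x1, y1, z1, t1) in zip(self.nodes, self.nodes[1:]):
            dist = ((x1 - x0) ** 2 + (y1 - y0) ** 2 + (z1 - z0) ** 2) ** 0.5
            out.append(dist / (t1 - t0))
        return out

    def accelerations(self) -> List[float]:
        """Rate of change of speed between consecutive segments."""
        v = self.speeds()
        # Midpoint time of each segment (index 3 of a node is t).
        mids = [(a[3] + b[3]) / 2 for a, b in zip(self.nodes, self.nodes[1:])]
        return [(v1 - v0) / (m1 - m0)
                for (v0, v1), (m0, m1) in zip(zip(v, v[1:]), zip(mids, mids[1:]))]
```

An object moving at constant speed yields zero acceleration throughout, so attributes like these can be stored alongside the geometry for later querying.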
Self-Organizing Maps (SOMs), a type of neural network, can handle the computations needed to generalize an object's movements, as shown in work done in our lab (Stefanidis et al., 2001c). SOMs are automated: the first nodes placed give a rough outline of the lifeline of interest, and additional nodes are clustered near these vertices to bring out the details. Our current work builds on this base and takes it a step further by allowing the user some input into exactly how many nodes will be used to delineate lifelines. This introduces the possibility of specifying a level of detail that works best for any given application. Once the nodes have been extracted from the objects in question through techniques such as Self-Organizing Maps, the next challenge is how to organize them for storage in our database. We have addressed this issue by looking into data cubes as well as other hierarchical structures, such as pyramids and scale space. The following section elaborates on these options and shows how the user's ability to choose the number of nodes defining a lifeline comes into play. Metadata about both the video contents and specific lifelines are also stored in the database, and we have developed some new metadata structures to make this aspect of the database available for querying. Metadata issues are presented in Section 3.

2. Data Cubes and Other Hierarchical Organizers

Once all the geometries of the lifelines and their corresponding attributes have been gathered, we shift our focus to how best to organize this input in the database. Current video database systems often use restrictive schemas to accomplish this. We prefer more flexible systems that allow some user input as well. We have developed a system in which the components of our lifelines can be easily extracted for storage in data cubes and other hierarchical structures.
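The group-lifeline-node nesting that these structures must capture can be sketched as follows. The field names and the `all_nodes` helper are hypothetical, chosen only to show how attributes attach at each level of the hierarchy and how a query at the group level can drill down to component lifelines and nodes.

```python
# Hypothetical nesting of the hierarchy described above: a group owns
# lifelines, each lifeline owns its (x, y, z, t) nodes, and attributes
# can be attached at every level. Names are illustrative only.
group = {
    "id": "G1",
    "attributes": {"topology": "convoy"},          # group-level property
    "lifelines": [
        {
            "id": "L1",
            "attributes": {"mean_speed": 4.2},     # lifeline-level property
            "nodes": [(0.0, 0.0, 0.0, 0.0), (3.0, 4.0, 0.0, 1.0)],
        },
        {
            "id": "L2",
            "attributes": {"mean_speed": 3.8},
            "nodes": [(1.0, 0.0, 0.0, 0.0), (4.0, 4.0, 0.0, 1.0)],
        },
    ],
}

def all_nodes(g):
    """Drill down from a group to the nodes of all its component lifelines."""
    return [n for lifeline in g["lifelines"] for n in lifeline["nodes"]]
```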
The data cubes are constructed such that querying for a group and its behavior from a fact table also retrieves information about its component lifelines, their respective nodes, and the attributes associated with them. These can be accessed as needed to answer key questions about the contents of the video sequence. While data cubes can be very useful if the user already has a set of predefined questions to answer and does not anticipate growing beyond them, they do present some problems. The main problem with a typical data cube setup is that all queries must be predefined: any advanced query for which structures have not already been set up will fail. This could happen if we wanted to increase our scale to something beyond groups, or to examine the contents of more than one video clip at a time. An example application of our system is traffic monitoring by municipalities. This application illustrates many of the problems encountered in existing video query systems, and the ways in which we propose to remedy them. For example, if an application monitors traffic flow through the streets of a busy downtown area, we will be primarily concerned with the level of detail that best captures these streets and their associated events. If the scope of the project then expands to include the entire metropolitan area, data cubes will not be able to expand along with it to provide more generalized views. These limitations have led us to consider pyramids and scale space as alternatives that are more flexible but still hierarchical.

Figure 1: Pyramid representations of the same lifeline

Pyramids are essentially discretized versions of scale space mechanisms. In a pyramid setup, there are predefined levels of the hierarchy that can be accessed (Chen et al., 2000).
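The fact-table behavior described above can be sketched as a small star schema in SQLite. The table and column names here are illustrative assumptions, not the paper's actual schema; the point is only that a query against the group fact table joins down to lifelines and nodes.

```python
import sqlite3

# A minimal star-schema sketch: a group-level fact table plus lifeline
# and node tables. Schema names are hypothetical.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE group_fact (group_id TEXT, behavior TEXT);
    CREATE TABLE lifeline   (lifeline_id TEXT, group_id TEXT);
    CREATE TABLE node       (lifeline_id TEXT, x REAL, y REAL, z REAL, t REAL);
""")
db.executemany("INSERT INTO group_fact VALUES (?, ?)", [("G1", "converging")])
db.executemany("INSERT INTO lifeline VALUES (?, ?)", [("L1", "G1"), ("L2", "G1")])
db.executemany("INSERT INTO node VALUES (?, ?, ?, ?, ?)",
               [("L1", 0, 0, 0, 0), ("L1", 3, 4, 0, 1),
                ("L2", 1, 0, 0, 0), ("L2", 4, 4, 0, 1)])

# Querying one group's behavior drills down to its lifelines and nodes.
rows = db.execute("""
    SELECT g.behavior, l.lifeline_id, n.x, n.y, n.t
    FROM group_fact g
    JOIN lifeline l ON l.group_id = g.group_id
    JOIN node n     ON n.lifeline_id = l.lifeline_id
    WHERE g.group_id = 'G1'
""").fetchall()
```

The rigidity noted in the text shows up here directly: any question not expressible as a join over these predefined tables (e.g., spanning several video clips) requires changing the schema.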
These levels correspond to choosing different values for n, where n is the number of nodes that define a lifeline, or the number of lifelines that define a group. A pyramid has predefined values of n, so the broadest level of the hierarchy might use only three nodes to define a lifeline (perhaps the beginning and end of the lifeline plus one key node between them). A more detailed level might have ten nodes, or a number proportional to the temporal length of the video sequence (Figure 1). In contrast to the pyramid structure, a scale space method is more continuous in the number of nodes that can be used to define a lifeline (Ogniewicz, 1994). While the pyramid approach may allow the user to choose only multiples of 10 for n (or some other number suggested by SOM principles) when looking beyond the coarsest level of detail, the scale space method allows any integer value of n. This gives the user much more flexibility in choosing exactly how much detail to extract from any given video sequence, and in our traffic monitoring application it allows us to zoom to whatever area of coverage a given situation requires. For that reason, we have chosen to implement scale space methods in our query system whenever possible.

3. A New Metadata Structure

Now that the data from our videos and the lifeline extraction process are stored in hierarchical structures within our database, we turn to their associated metadata. This metadata can be stored in the same database in a similarly hierarchical fashion. We can take advantage of the Federal Geographic Data Committee (FGDC)'s suggested metadata standards (FGDC, 1998). In this structure, there are seven main categories of metadata, each of which branches into several subcategories, which in turn have their own subcategories (Figure 2).
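The contrast between fixed pyramid levels and a freely chosen n can be illustrated with a simple node-reduction sketch. Uniform index-based subsampling is used here as a stand-in for the SOM/scale-space generalization itself; the point is only that n may be any integer from 2 up to the full node count, rather than one of a few predefined levels.

```python
def simplify(nodes, n):
    """Reduce a lifeline to n nodes, always keeping the first and last.

    Uniform subsampling: a stand-in for the SOM / scale-space
    generalization described in the text, illustrating that n can be
    any integer between 2 and len(nodes).
    """
    if not 2 <= n <= len(nodes):
        raise ValueError("n must be between 2 and the node count")
    # Evenly spaced indices from the first node to the last.
    idx = [round(i * (len(nodes) - 1) / (n - 1)) for i in range(n)]
    return [nodes[i] for i in idx]

# An 11-node lifeline moving along the x axis.
track = [(float(t), 0.0, 0.0, float(t)) for t in range(11)]
coarse = simplify(track, 3)   # broadest pyramid-style level: 3 nodes
fine = simplify(track, 7)     # scale space: any integer n is allowed
```

A pyramid would expose only a few such levels (say n = 3 and n = 10), whereas the scale space approach lets the user request any intermediate n.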