
IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART C: APPLICATIONS AND REVIEWS, VOL. 40, NO. 5, SEPTEMBER 2010


Survey on Contemporary Remote Surveillance Systems for Public Safety
Tomi D. Räty
Abstract—Surveillance systems provide the capability of collecting authentic and purposeful information and forming appropriate decisions to enhance safety. This paper concisely reviews the historical development and current state of the three generations of contemporary surveillance systems. Recently, in addition to the employment of an incessantly enlarging variety of sensors, the inclination has been to utilize more intelligence and situation-awareness capabilities to assist the human surveillance personnel. The most recent generation is decomposed into multisensor environments, video and audio surveillance, wireless sensor networks, distributed intelligence and awareness, architecture and middleware, and the utilization of mobile robots. The prominent difficulties of contemporary surveillance systems are highlighted. These challenging dilemmas comprise the attainment of real-time distributed architecture, awareness and intelligence, existing difficulties in video surveillance, the utilization of wireless networks, the energy efficiency of remote sensors, the location difficulties of surveillance personnel, and scalability difficulties. The paper concludes with a concise summary and an outlook on the future of surveillance systems for public safety.

Index Terms—Distributed systems, human safety, surveillance, survey.

[Manuscript received August 4, 2009; revised November 16, 2009 and January 28, 2010; accepted January 28, 2010. Date of publication March 1, 2010; date of current version August 18, 2010. This paper was recommended by Associate Editor L. Zhang. The author is with the VTT Technical Research Centre of Finland, Oulu 90571, Finland (e-mail: tomi.raty@vtt.fi). Digital Object Identifier 10.1109/TSMCC.2010.2042446]

I. INTRODUCTION

Surveillance systems enable the remote surveillance of widespread society for public safety and proprietary integrity. This paper reviews the background of surveillance systems and their three generations. The emphasis of this paper is on the third-generation surveillance system (3GSS) and its current and significant difficulties. The 3GSSs use multiple sensors. Domain-specific issues are omitted from this paper, despite being inherent to their own domains. The focus is on generic surveillance, which is applicable to public safety. Surveillance systems are typically categorized into three distinct generations, of which the 3GSS is the current one. The essential dilemmas of the 3GSSs are related to the attainment of real-time distributed architecture, awareness and intelligence, existing difficulties in video surveillance, the utilization of wireless networks, the energy efficiency of remote sensors, location difficulties of surveillance personnel, and scalability difficulties. These aspects repetitively occurred in the literature review.
In public safety, a real-time distributed architecture is required to transmit sensor data immediately for deduction. Awareness and intelligence are applied to address automatic deduction. Video surveillance is used thoroughly in public safety. The usage of wireless networks is growing in public safety, and it is accompanied by energy-efficiency concerns. Surveillance personnel often patrol surveyed areas, and their precise location must be known to exploit their benefit to the fullest. As surveyed areas become constantly larger and more complex, scalability is a crucial issue in the surveillance of public safety.

Public safety and homeland security are substantial concerns for governments worldwide, which must protect their people and the critical infrastructures that uphold them. Information technology plays a significant role in such initiatives. It can assist in reducing risk and enabling effective responses to disasters of natural or human origin [1]. There is an increasing demand for security in society, which results in a growing need for surveillance activities in many environments. Recent events, including terrorist attacks, have reinforced this demand and have influenced governments to make personal and asset security priorities in their policies. Valera and Velastin [2] state that the demand for remote surveillance relative to safety and security has received significant attention, especially in public places, the remote surveillance of human activities, forensic applications, and military applications. The public can be perceived either as individuals or as a crowd. Valera and Velastin [2] indicate that a future challenge is to develop a wide-area distributed multisensor surveillance system with robust, real-time computer algorithms that are executable with minimal manual reconfiguration for different applications [2].

There is a growing interest in surveillance applications because of the availability of sensors and processors at reasonable costs. There is also an emerging need from the public for improved safety and security in urban environments and for the significant utilization of resources in public infrastructure. This, together with the growing maturity of algorithms and techniques, enables the application of technology in miscellaneous sectors, such as security, transportation, and the automotive industry. The problem of remote surveillance of unattended environments has received particular attention in the past few years [3]. Intelligent remote monitoring systems allow users to survey sites from significant distances. This is especially useful when numerous sites require security surveillance simultaneously. These systems enable rapid and efficient corrective actions, which are executed immediately once a suspicious activity is detected.


An alert system can be used to warn security personnel of impending difficulties, and numerous sites can be monitored simultaneously. This considerably reduces the load of the security personnel [4]. A fundamental goal of surveillance systems is to acquire good coverage of the observed region with as few cameras as possible, to keep reasonable the costs of installing and maintaining cameras and transmission channels and the complexity of scene calibration [5].

In this paper, we first present the background and progression of surveillance systems. This is followed by careful descriptions of the three generations of surveillance systems. Then we present the difficulties of contemporary surveillance systems, which comprise the attainment of real-time distributed architecture, awareness and intelligence, existing difficulties in video surveillance, the utilization of wireless networks, the energy efficiency of remote sensors, location difficulties of surveillance personnel, and scalability difficulties. The paper concludes with a future prospect and a brief summary.

II. HISTORICAL SURVEILLANCE AND SURVEILLANCE SYSTEMS

The stone-age warrior used his eyes and ears from atop a mantle to survey his battle area and to distinguish targets against which he could utilize his primitive weapons. Despite advancements in weaponry to catapults, swords, and shields, the eyes and ears of warriors were still utilized for surveillance. The observation balloon and the telegraph significantly improved range in visibility and information transmission, respectively, but it was in the twentieth century that improvements beyond the eyes and ears made surveillance "modern" [6]. Military operations have demonstrated the importance of the combat surveillance problem. Locating target coordinates and shifting one's own troops accordingly requires dynamic actions accompanied by decisions. Rapid, complete, and precise information is needed to address this [7]. Such information includes the detection and approximate location of personnel, concentrations of troops, and the monitoring and storage of position data over time and according to movements [8]. Surveillance information must be delivered to the correct commander when he requires it, and the information must be presented in a meaningful form to address the problem of information processing [7]. The data-collection problem is addressed by the entities that perform the surveillance, e.g., intelligence sources and human observers, which transmit the information to the command [7].

The fundamental intention of a surveillance system is to acquire information about an aspect of the real world. Military surveillance systems enhance the sensory capabilities of a military commander. Surveillance systems have evolved from simple visual and verbal systems, but the purpose is still the same. Even the most primitive surveillance systems gathered information concerning reality and communicated it to the appropriate users [9].

Generic surveillance is composed of three essential parts: data acquisition, information analysis, and on-field operation. Any surveillance system requires means to monitor the environment and collect data in the form of, e.g., video, still images, or audio.

Such data are processed and analyzed by a human, a computer, or a combination of both at a command center. An administrator can decide to perform an on-field operation to put the environment back into a situation considered normal. On-field control operations are carried out by on-field agents, who require effective communication channels to uphold a close interaction with the command center [10].

A surveillance system can be defined as a technological tool that assists humans by offering an extended perception and reasoning capability about situations of interest that occur in the monitored environments. Human perception and reasoning are restricted by the capabilities and limits of the human senses and mind to simultaneously collect, process, and store a limited amount of data [3]. To address this amount of information, aspects such as scalability and usability become very significant, including how information needs to be given to the right people at the right time. To meet this growing demand, research and development has been carried out in commercial and academic environments to discover improvements or new solutions in signal processing, communications, system engineering, and computer vision [2].

III. PROGRESSION OF SURVEILLANCE SYSTEMS

Over the past two decades, surveillance systems have been an area of considerable research. Recently, much research has concentrated on video-based surveillance systems, particularly for public safety and transportation systems [11]. Data are collected by distributed sources and are then typically transmitted to some remote control center. The automatic capability to learn and adjust to altering scene conditions and the learning of statistical models of normal event patterns are growing issues in surveillance systems. The learning system offers a mechanism to flag potentially anomalous events by discovering the normal patterns of activity and flagging the least probable ones (see the sketch below). Two substantial restrictions that affect the deployment of these systems in the real world are real-time performance and low cost. Multisensor systems can capitalize on processing either the same or different types of information collected by sensors, e.g., video cameras and microphones, observing the same monitored area. Appropriate processing techniques and new sensors offering real-time information associated with different scene characteristics can assist both in enlarging the monitored environments and in enhancing the performance of alarm detection in regions monitored by multiple sensors [3].

Security surveillance systems are becoming crucial in situations in which personal safety could be compromised as a result of criminal activity. Video cameras are constantly being installed for security reasons in prisons, banks, automatic teller machines, petrol stations, and elevators, which are the locations most susceptible to criminal activity. Usually, the video camera is connected to a recorder or to a display screen from which security personnel constantly monitor suspicious activities. As security personnel typically monitor multiple locations simultaneously, this manual task is labor intensive and inefficient, and significant stress may be placed on the security personnel involved [4].
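
As an illustration of flagging the least probable events against learned normal patterns, the following is a minimal sketch. It assumes that each event has already been reduced to a numeric feature vector and models the normal patterns with a single Gaussian; a deployed system would use far richer models.

```python
import numpy as np

class NormalPatternModel:
    """Gaussian model of 'normal' event features; flags low-probability events."""

    def fit(self, features: np.ndarray) -> None:
        # features: one row per observed normal event (e.g., speed, duration, size).
        self.mean = features.mean(axis=0)
        cov = np.cov(features, rowvar=False) + 1e-6 * np.eye(features.shape[1])
        self.inv_cov = np.linalg.inv(cov)

    def anomaly_score(self, x: np.ndarray) -> float:
        # Mahalanobis distance: a large distance means unlike the learned patterns.
        d = x - self.mean
        return float(np.sqrt(d @ self.inv_cov @ d))

# Train on features of routine activity, then flag improbable events.
model = NormalPatternModel()
model.fit(np.random.default_rng(0).normal(size=(500, 3)))  # stand-in for real data
if model.anomaly_score(np.array([5.0, 5.0, 5.0])) > 3.0:    # threshold is tunable
    print("potentially anomalous event - alert an operator")
```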


Another technological breakthrough substantial to the development of surveillance systems is the capability of remotely transmitting and reproducing images and video information, e.g., TV broadcasting and the successive use of video-signal transmission and display in closed-circuit TV (CCTV) systems. CCTVs that provide data at acceptable quality date back to the 1960s. The availability of CCTVs can be considered the starting point that made online surveillance feasible, and 1960 can be considered the beginning date of the first-generation surveillance systems [3].

Surveillance systems have developed in three generations [11]. The first generation of surveillance systems (1GSSs) used analogue equipment throughout the complete system [11]. Analogue closed-circuit television (CCTV) cameras captured the observed scene and transmitted the video signals over analogue communication lines to the central back-end systems, which presented and archived the video data [11]. The main challenge in the 1GSS is that it uses analogue techniques for image distribution and storage [2]. The second generation of surveillance systems (2GSSs) uses digital back-end components [11]. They enable real-time automated analysis of the incoming video data [11]. Automated event detection and alarms substantially improve the amount of simultaneously monitored data and the quality of the surveillance system [11]. The difficulty in the 2GSS is that it does not support the robust detection and tracking algorithms that are needed for behavioral analysis [2]. The 3GSSs have finalized the digital transformation. In these systems, the video signal is converted into the digital domain at the cameras, which transmit the video data through a computer network, for instance, a local-area network. The back-end and transmission systems of a third-generation surveillance system have also improved their functionality [11].

There are immediate needs for automated surveillance systems in commercial, military, and law-enforcement applications. Mounting video cameras is inexpensive, but locating available human resources to survey the output is expensive. Despite the usage of surveillance cameras in banks, stores, and parking lots, video data are currently used only retrospectively as a forensic tool, thus losing their primary benefit as an active real-time medium. What is required is continuous 24-h monitoring of surveillance video to alert security officers of a burglary in progress or of a suspicious individual lingering in a parking lot, while there still is time to prevent the criminal offence [12].

IV. FIRST-GENERATION SURVEILLANCE SYSTEMS

First-generation video surveillance systems (1960-1980) considerably extended human perception capabilities in a spatial sense. The 1GSSs are based on analogue signal and image transmission and processing. In these systems, analogue video data from a collection of cameras, which view remote scenes, are presented to the human operators. The main disadvantage of these systems concerns the reasonably short attention span of operators, which may result in a significant miss rate of the events of interest.

From a communication perspective, these systems suffered from the main difficulties of analogue video communication, e.g., high-bandwidth requirements and poor allocation flexibility [3]. The 1GSS utilizes analogue CCTV systems. The advantage is that they provide good performance in some situations and that the technology is mature. The utilization of analogue techniques for image distribution and storage is inefficient. Current research related to the 1GSSs examines the usage of digital information against analogue, digital video recording, and CCTV video compression [2].

Computer vision is a significant artificial intelligence (AI) research area. From the 1970s to the 1990s, computer vision proved its practical value in a vast range of application domains, including medical diagnostics, automatic target recognition, and remote sensing [13].

V. SECOND-GENERATION SURVEILLANCE SYSTEMS

In this technological evolution, the 2GSSs (1980-2000) correspond to the maturity phase of the analogue 1GSS. The 2GSSs benefited from the early progression in digital video communications, e.g., digital compression, robust transmission, bandwidth reduction, and processing methods, which assist the human operator by prescreening important visual events [3]. In the 2GSS, automated visual surveillance is achieved through the combination of computer vision technology and CCTV systems. The benefit of the second generation is that the surveillance efficiency of CCTV is enhanced. The difficulties lie within the robust detection and tracking algorithms needed for behavioral analysis. Current 2GSS research rests in real-time robust computer vision algorithms, automatic learning of scene variability and patterns of behavior, and bridging the gap between the statistical analysis of a scene and its natural-language interpretation [2].

The 2GSS research addressed multiple areas with improved results in real-time analysis and segmentation of 2-D image sequences, identification and tracking of multiple objects in complex scenes, human behavior comprehension, and multisensor data fusion. The 2GSS also improved intelligent man-machine interfaces, performance evaluation of video processing algorithms, wireless and wired broadband access networks, signal processing for video compression, and multimedia transmission for video-based surveillance systems [3]. The majority of research efforts during the period of the 2GSSs were devoted to the development of automated real-time event-detection techniques for video surveillance. The availability of automated methods would significantly ease the monitoring of large sites with multiple cameras, as automated event detection enables prefiltering and the presentation of the main events [3].

VI. THIRD-GENERATION SURVEILLANCE SYSTEMS

The 3GSSs handle a large number of cameras, a geographical spread of resources, and many monitoring points. From an image-processing view, they are based on the distribution of processing capacities over the network and the use of embedded signal-processing devices to achieve the benefits of scalability and the potential robustness offered by distributed systems [14].


Fig. 1. Illustration of a typical processing flow in video surveillance systems [2].

In the 3GSS, the technology revolves around wide-area surveillance systems. This results in the advantages of collecting more accurate information by combining different types of sensors and of distributing the information. The difficulties are in the efficient integration and communication of information, the establishment of design methodologies, and moving and multisensor platforms. Current research on the 3GSSs concentrates on distributed and centralized intelligence, data fusion, probabilistic reasoning frameworks, and multicamera surveillance techniques [2]. The fundamental goals that are expected of a third-generation vision surveillance application, based on end-user requirements, are to offer good scene comprehension, surveillance information in real time in a multisensor environment, and the use of low-cost standard components.

Fig. 1 presents a typical processing flow of video surveillance systems. It comprises object detection, object recognition, tracking, behavior and activity analysis, and a database [2]. Once an object is detected, the object recognition task uses model-based techniques in recognition and tracking. This is followed by the behavior and activity analysis of the tracked objects. The database addresses storage and retrieval [2]. A minimal sketch of this flow appears below.

Research on distributed real-time video processing techniques in intelligent, open, and dedicated networks is anticipated to offer interesting results. This is largely due to the availability of enhanced computational power at reasonable expense, advanced video processing and comprehension methods, and multisensor data fusion [3]. The main objective of the fully digital 3GSSs is to ease efficient data communication, management, and extraction of events in real-time video from a large collection of sensors. To achieve this goal, improvements in automatic recognition functionalities and digital multiuser communications are required. The technologies that satisfy the requirements of the recognition algorithms concern computational speed, memory utilization, remote data access, and multiuser communications between distributed processors. The availability of this technology significantly eases 3GSS development and deployment [3].

The main application areas for the 3GSSs are in the region of public monitoring. This is required by the rapid growth of metropolitan localities and by the increasing need to offer enhanced safety and security to the general public. Other factors that drive the deployment of these systems include efficient resource management and rapid emergency assistance [3]. The essential limitation in the efficiency of CCTV surveillance systems is the cost of offering adequate human monitoring coverage for what is a considerably boring task.
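
The following is a minimal, hypothetical skeleton of the Fig. 1 flow; the stage functions are illustrative stubs standing in for real detection, recognition, and analysis algorithms.

```python
from dataclasses import dataclass, field

@dataclass
class Track:
    label: str                                      # class from the recognition stage
    positions: list = field(default_factory=list)   # (frame index, bbox) trajectory

# Placeholder stages; a real system would plug in vision algorithms here.
def detect_objects(frame):
    return list(frame)                              # frames are pre-annotated here

def recognize(bbox):
    return "person" if bbox[3] > bbox[2] else "object"   # toy aspect-ratio rule

def analyze_behavior(track):
    return "moving" if len(track.positions) > 1 else "static"

def surveillance_pipeline(frames, database: dict):
    """Skeleton of the Fig. 1 flow: detect -> recognize -> track -> analyze -> store."""
    tracks = {}
    for t, frame in enumerate(frames):
        for obj_id, bbox in detect_objects(frame):
            track = tracks.setdefault(obj_id, Track(recognize(bbox)))
            track.positions.append((t, bbox))
    database["tracks"] = tracks
    database["events"] = {oid: analyze_behavior(tr) for oid, tr in tracks.items()}

# Two toy 'frames', each a list of (object id, bounding box) pairs.
db = {}
surveillance_pipeline([[(1, (0, 0, 40, 90))], [(1, (5, 0, 40, 90))]], db)
print(db["events"])  # {1: 'moving'}
```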

Fig. 2. Example of combining the data of multiple sensors in different events: (a) walking, (b) running, (c) talking, (d) knocking on a door, and (e) shouting [17].

Additionally, CCTV is generally used as a reactive tool: if a problem occurs but is not noticed, it will proceed without any response [15]. The notable aspects of the 3GSSs are decomposed into the topics of the following subsections. They consist of multisensor environments, video surveillance, audio surveillance, wireless sensor networks, distributed intelligence and awareness, architecture and middleware, and the utilization of mobile robots.

A. Multiple Sensor-Enabled Environments

Spatially distributed multisensor environments offer interesting possibilities and challenges to surveillance. Recently, there have been studies on data-fusion techniques to support the sharing of information that results from different types of sensors [2]. The communication aspects within separate parts of the system play a crucial role, with particular challenges due either to bandwidth constraints or to the asymmetric characteristics of communication [2]. Rasheed et al. exploit data fusion over multiple modalities, including radar information, automatic identification systems (AIS), and global positioning system (GPS) receivers [16].

Fig. 2 illustrates how two discrete sensors, a video sensor and an audio sensor, can be fused to enhance information. The sequences comprise walking, running, talking, knocking, and shouting events. The recorded audio is segmented into audio frames of 50 ms. Each sequence was recorded over a time of 8 s [17].
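
As a sketch of the audio half of such a setup, the snippet below segments a signal into 50-ms frames, as in [17], and flags high-energy frames as candidate sound events. The energy threshold and the synthetic "shout" are illustrative assumptions.

```python
import numpy as np

def frame_audio(signal: np.ndarray, rate: int, frame_ms: int = 50) -> np.ndarray:
    """Split a mono signal into non-overlapping frames (e.g., 50 ms, as in [17])."""
    frame_len = int(rate * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    return signal[: n_frames * frame_len].reshape(n_frames, frame_len)

def detect_events(frames: np.ndarray, factor: float = 4.0) -> np.ndarray:
    """Flag frames whose short-time energy exceeds a multiple of the median energy."""
    energy = (frames.astype(float) ** 2).mean(axis=1)
    return energy > factor * np.median(energy)

# 8 s of synthetic audio at 16 kHz with a loud burst (a stand-in for a shout).
rate = 16000
signal = np.random.default_rng(1).normal(0, 0.1, 8 * rate)
signal[3 * rate : 3 * rate + rate // 2] += np.random.default_rng(2).normal(0, 1.0, rate // 2)
flags = detect_events(frame_audio(signal, rate))
print(f"{flags.sum()} of {len(flags)} frames flagged as possible events")
```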


Fig. 4. Illustration of tracking with an occluding object passing across the view of the camera [22].

Fig. 3. Generic view of networked cameras [18].

Junejo et al. state that a single camera is not sufficient for monitoring a large area. To address this problem, a network of cameras is established. Junejo et al. utilize an automatically configurable network of nonoverlapping cameras to attain sufficient monitoring capabilities over large areas of interest. Fig. 3 illustrates the principles of a network of cameras. In this example, each camera is mounted onto a moving platform while detecting and tracking objects [18].

B. Video Surveillance

Video surveillance has become an omnipresent aspect of the modern urban landscape, situated in a vast variety of environments, including shopping malls, railway stations, hospitals, government buildings, and commercial premises. In some cases, surveillance acts as a deterrent, discouraging unacceptable behavior that can no longer be performed anonymously; in other cases, it records and logs events for evidential reasons or offers remote observation of sensitive locations where access control is crucial [19].

Intelligent visual surveillance systems address the real-time monitoring of persistent and transient objects within a specific environment. The primary goals of these systems are to offer an automatic interpretation of scenes and to understand and predict the actions and interactions of the observed objects. The understanding and prediction are based on the information collected by sensors. The basic stages of processing in an intelligent visual surveillance system are moving-object detection, recognition, tracking, behavioral analysis, and retrieval. These stages draw on machine vision, pattern analysis, artificial intelligence, and data management [2].

As an active research topic in computer vision, visual surveillance in dynamic scenes attempts to detect, recognize, and track certain objects from image sequences. In addition, it is important to comprehend and depict object behaviors. The aim is to develop intelligent visual surveillance to replace traditional passive video surveillance, which is proving inefficient as the number of cameras exceeds the capability of human operators. In short, the goal of visual surveillance is not only to put cameras in the place of human eyes, but also to achieve the exhaustive surveillance task as automatically as possible [20].

Intelligent cameras execute a statically defined collection of low-level image-processing operations on the captured frames to enhance video compression and intelligent-host efficiency. Changing or reconfiguring the video processing and analysis during the operation of a surveillance system is difficult [11].

The difficulty of tracking an individual maneuvering in a cluttered environment is a well-studied area. Usually, the objective is to predict the state of an object based on a set of noisy and unclear measurements. There is a vast range of applications in which the target-tracking problem is present, including vehicle collision warning and avoidance, mobile robotics, speaker localization, people and animal tracking, and tracking of military targets [21]. Fig. 4 illustrates a tracking sequence with an individual passing through the view of a camera and causing an occlusion. The output of the background subtraction method for each frame is a binary image composed of foreground regions. When an occlusion occurs, multiple objects may merge into the same area. This requires an object model that can address split-and-merge cases. Each foreground pixel is assigned the object label for which the product of color and spatial probability is the highest [22]; a sketch of this labeling rule follows below.

However watchful the operators are, manual monitoring suffers from information overload, which results in periods of operator inattention due to weariness, distractions, and interruptions. In practice, it is unavoidable that a significant portion of the video channels is not usually monitored, and potentially important events are overlooked. Additionally, weariness grows significantly as the number of cameras in the system increases. The automation of all or part of this process would obviously offer dramatic benefits, ranging from a capability to alert an operator to a potential event of interest to a completely automatic detection and analysis system. However, the dependability of automated detection systems is an essential issue, because frequent false alarms introduce skepticism in the operators, who quickly learn to disregard the system [19].

It is desirable that visual surveillance systems understand the activity of the scene they are detecting and tracking. Ideally, this would be done in a manner consistent with that of a human observer. The task of automating the interpretation of video data is a detailed one and can depend on a vast range of factors, including location, context, time, and date. This information indicates where objects are and what they may be doing as they are observed, and it attempts to characterize usual behavior [19].
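
The pixel-labeling rule from [22] can be sketched as follows. The per-object Gaussian color and spatial models, and their parameters, are simplifying assumptions made for illustration.

```python
import numpy as np

def label_foreground(pixels_rgb, pixels_xy, objects):
    """Assign each foreground pixel the object label maximizing the product of
    color likelihood and spatial likelihood, as in split-and-merge handling."""
    labels = []
    for rgb, xy in zip(pixels_rgb, pixels_xy):
        scores = []
        for obj in objects:
            p_color = np.exp(-np.sum((rgb - obj["mean_rgb"]) ** 2) / (2 * obj["var_rgb"]))
            p_space = np.exp(-np.sum((xy - obj["mean_xy"]) ** 2) / (2 * obj["var_xy"]))
            scores.append(p_color * p_space)   # product of color and spatial terms
        labels.append(int(np.argmax(scores)))
    return labels

# Two partially occluding objects with distinct color and position models.
objects = [
    {"mean_rgb": np.array([200.0, 30, 30]), "var_rgb": 400.0,
     "mean_xy": np.array([40.0, 50]), "var_xy": 100.0},   # reddish person near (40, 50)
    {"mean_rgb": np.array([30.0, 30, 200]), "var_rgb": 400.0,
     "mean_xy": np.array([60.0, 52]), "var_xy": 100.0},   # bluish person near (60, 52)
]
print(label_foreground([np.array([190.0, 40, 35]), np.array([25.0, 20, 210])],
                       [np.array([43.0, 49]), np.array([58.0, 51])], objects))  # [0, 1]
```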


Research interests have shifted from ordinary static image-based analysis to video-based dynamic monitoring and analysis. Researchers have advanced in addressing static aspects such as illumination, color, background, and perspective. They have advanced in tracking and analyzing shapes related to moving human bodies and moving cameras. They have improved activity analysis and the control of multicamera systems. The research of Trivedi et al. [13] addresses a distributed collection of cameras, which provide wide-area monitoring and scene analysis at several levels of abstraction. Installing multiple sensors introduces new design aspects and challenges. Handoff schemes are needed to pass tracked objects between sensors and clusters, methods are required to specify the best view given the scene's context, and sensor-fusion algorithms capitalize on a given sensor's strengths [13].

Modern visual surveillance systems deploy multicamera clusters operating in real time with embedded adaptive algorithms. These advanced systems need to be operational constantly and to robustly and reliably detect events of interest in difficult weather conditions. This includes adjusting to natural and artificial changes in the illumination and withstanding hardware and software system failures [23].

Generally, the initial step of automatic video surveillance is adaptive background subtraction to extract foreground regions from the incoming frames. Object tracking is then executed on the foreground regions. In this case, tracking isolated objects is relatively easy. When multiple tracked objects form groups with various complexities of occlusion, tracking each individual object through crowds becomes a challenging task. First, when objects merge into a group, the visual characteristics of each object become unclear and obscure. Objects distant from the camera can be partially or completely occluded by the surrounding objects. Second, the poses and scales of the target objects may severely change when they are in crowds. Third, the motion speed and direction of the target objects may essentially change during occlusion [24].

Basically, moving objects are detected through background subtraction, which maintains a model of the background and detects moving objects as the regions that differ from this model. In comparison to other approaches, such as optical flow, this approach is computationally affordable for real-time applications. The main dilemma is its sensitivity to dynamic scene challenges and the subsequent need for background-model adaptation through background maintenance. This type of problem is known to be essential and demanding [25]. Fig. 5 illustrates a collection of images from a parking lot and the background subtraction output for these images. Object detection is achieved by constructing a representation of the scene, called a background model, and then locating the differences from the model in each incoming frame. The upper image sequence illustrates the complete scene, and the lower image sequence represents the resulting background subtraction output [22].
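
A minimal sketch of adaptive background subtraction with running-average background maintenance is given below; real systems typically use richer per-pixel statistical models.

```python
import numpy as np

def update_background(background: np.ndarray, frame: np.ndarray,
                      alpha: float = 0.05) -> np.ndarray:
    """Background maintenance via a running average (one simple adaptation scheme)."""
    return (1 - alpha) * background + alpha * frame

def foreground_mask(background: np.ndarray, frame: np.ndarray,
                    threshold: float = 25.0) -> np.ndarray:
    """Binary mask: pixels that differ enough from the background model."""
    return np.abs(frame - background) > threshold

# Synthetic grayscale frames: a static scene with a bright moving block.
rng = np.random.default_rng(0)
background = rng.uniform(0, 50, size=(120, 160))
frame = background + rng.normal(0, 2, size=background.shape)
frame[40:60, 30:50] = 255                     # the "moving object"

mask = foreground_mask(background, frame)
background = update_background(background, frame)
print("foreground pixels:", int(mask.sum()))  # roughly the block area (20 * 20)
```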

Fig. 5. Illustration of images and the output of background subtraction [22].

Fig. 6. Example of tracklet tracking [26].

Li et al. state that the aim of multitarget tracking is to infer the target trajectories from image observations in a video. This poses a significant challenge in crowded environments, where there are frequent occlusions and multiple targets have a similar appearance and intersecting trajectories. Data-association-based tracking (DAT) links short track fragments, i.e., tracklets, or detection responses into trajectories based on similarity in position, size, and appearance. This enables multitarget tracking from a single camera by progressively associating detection responses into longer tracklets to resolve target trajectories. Fig. 6 presents an image of tracklet tracking [26]. A sketch of this association step follows below.

Human motion tracking based on the input from red-green-blue (RGB) cameras can produce results in indoor scenes with consistent illumination and a steady background [27]. Outdoor scenes with significant background clutter resulting from illumination changes are a challenge for conventional charge-coupled device (CCD) cameras [27]. There have been contributions on pedestrian localization and tracking in visible and infrared videos [28]. Fig. 7 presents a thermal image and a color image of the same scene [28].

A significant problem encountered in numerous surveillance systems is the change in ambient light, particularly in outdoor environments, where the lighting conditions vary. This renders conventional digital color-image analysis very difficult.
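
The greedy tracklet association sketched below links time-ordered fragments by a product of position, size, and appearance affinities; the affinity definitions and thresholds are illustrative assumptions, not the actual formulation of [26].

```python
import numpy as np

def affinity(a: dict, b: dict) -> float:
    """Similarity between the end of tracklet a and the start of tracklet b,
    combining position, size, and appearance cues in the spirit of DAT."""
    pos = np.exp(-np.linalg.norm(np.subtract(a["end_pos"], b["start_pos"])) / 50.0)
    size = min(a["size"], b["size"]) / max(a["size"], b["size"])
    app = float(np.dot(a["hist"], b["hist"]))  # unit-norm appearance histograms
    return pos * size * app

def link_tracklets(tracklets: list, min_aff: float = 0.3) -> list:
    """Greedy association: repeatedly link the most similar (tail, head) pair."""
    succ, used_head = {}, set()
    pairs = [(affinity(a, b), i, j)
             for i, a in enumerate(tracklets)
             for j, b in enumerate(tracklets)
             if i != j and a["end_t"] < b["start_t"]]  # only forward in time
    for aff, i, j in sorted(pairs, reverse=True):
        if aff >= min_aff and i not in succ and j not in used_head:
            succ[i] = j
            used_head.add(j)
    trajectories = []                          # stitch links into full trajectories
    for i in range(len(tracklets)):
        if i not in used_head:                 # a trajectory starts here
            chain = [i]
            while chain[-1] in succ:
                chain.append(succ[chain[-1]])
            trajectories.append(chain)
    return trajectories

h = np.array([1.0, 0.0])  # shared toy appearance histogram
t = [
    {"end_pos": (10, 10), "start_pos": (0, 0), "size": 20, "hist": h, "start_t": 0, "end_t": 4},
    {"end_pos": (30, 30), "start_pos": (12, 11), "size": 22, "hist": h, "start_t": 5, "end_t": 9},
]
print(link_tracklets(t))  # [[0, 1]]: the two fragments form one trajectory
```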


Fig. 7. (Left) Thermal image and (right) color image of a scene [28].

Fig. 8. Example of a microphone array for measuring the bearing angle [32].

Thermography, or thermal visualization, is a type of infrared visualization. Thermal cameras have been utilized for imaging objects in the dark. These cameras use infrared (IR) sensors that capture the IR radiation of different objects in the environment and form IR images [29].

C. Audio Surveillance

The novelty of the research of Istrate et al. [30] is the use of sound as an informative source simultaneously with other sensors. Istrate et al. [30] suggest extracting and classifying everyday sounds, such as a door banging, glass shattering, and objects falling, with the intention of identifying serious accidents, for instance, a fall or somebody fainting. The approach of Istrate et al. [30] comprises the replacement of the video camera with a multichannel sound acquisition system, which analyzes the sound range of the location in real time and identifies situations of emergency. Only a previously detected sound event is transmitted to the alarm monitor, and only if it is considered to be a possible alarm. To reduce the computation time required for a multichannel real-time system, the sound extraction process has been split into detection and classification. Sound event detection is a complicated task, because the audio signals occur in a noisy environment [30].

Accurate and robust localization and tracking of acoustic sources is of interest to a variety of applications in surveillance, multimedia, and hearing enhancement. The miniaturization of microphone arrays combined with acoustic processing further enhances the advantages of these systems, but poses challenges to achieving precise localization performance due to the decreasing aperture. For surveillance, acoustic emissions from ground vehicles offer an easily detected signature, which can be used for unobtrusive and passive tracking. This results in a higher localization performance in distributed sensing environments, without the requirement for excessive data transfer and fine-grained time synchronization among nodes, and with low communication bandwidth and low complexity. Additional improvement can also be achieved through the fusion of other data modalities, such as video. Traditionally, large sensor arrays are used for source localization to guarantee adequate spatial diversity over the sensors to resolve time delays between source observations. The precision of delay-based bearing estimation degrades with decreasing dimensions (aperture) of the sensor array [31].

Sound localization using compact sensor nodes deployed in networks has applications in surveillance, security, and law enforcement. Numerous groups have reported noncoherent and coherent methods for sound localization, detection, classification, and tracking in sensor networks. Coherent methods are based on the arrival-time differences of the acoustic signal at the sensors. In standard systems, microphones are separated to maximize precision. The need for synchronization requires frequent communication, which is expensive in terms of power consumption. The nodes must achieve synchronization to produce a valid estimate [32]. Fig. 8 presents an example of sound localization. An array of microphones (M1, M2, M3, and M4), pairwise separated by a distance (d), is considered. The angle of the source of sound is presented against the coordinate axes. The bearing of the microphone pair M1 and M3 is given as the beta angle, and the bearing of the microphone pair M2 and M4 as the alpha angle [32]. A sketch of delay-based bearing estimation for one such pair follows below.

Considering the nature of the events that are desirable to detect, the content of the information created is more than just visual. Many of the events that are significant from a monitoring point of view are accompanied by audio information, which would be useful to examine. The significance of these events is provided by their semantic information and their temporal context. A monitoring system that must distinguish between a door opening and glass breaking should be expected to identify one and not the other at a given time and location. By expanding the range of information available to the system, the precision of the operation can be improved. The purpose of an audio sensor network would be to assist the end user in searching through data and returning the points of interest. This would not be done by adding an overwhelming amount of additional data, but by drawing attention to data that have already been collected but are difficult to locate [33].

The sound analysis system of Istrate et al. [30] is separated into three modules, as illustrated in Fig. 9. The first module is applied to every channel to detect sound events and to extract them from the signal flow. The source of speech or sound can be localized by comparing the predicted SNR for every channel. The fusion module chooses the best channel if multiple events are detected simultaneously. The third module receives the sound event extracted by the previous module and predicts the most probable sound class [30].
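
For a single microphone pair under a far-field (plane-wave) assumption, the bearing follows from the inter-microphone time delay; the sketch below estimates the delay by cross-correlation. The signal parameters are synthetic stand-ins.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, in air at roughly 20 degrees C

def bearing_from_tdoa(tdoa_s: float, spacing_m: float) -> float:
    """Far-field bearing (degrees) of a source relative to a microphone pair's axis.

    For a plane wave, the inter-microphone delay is tau = d*cos(theta)/c, so
    theta = arccos(c*tau/d). A single pair leaves a front/back ambiguity,
    which is why Fig. 8 uses two orthogonal pairs.
    """
    cos_theta = np.clip(SPEED_OF_SOUND * tdoa_s / spacing_m, -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_theta)))

def tdoa_by_cross_correlation(x: np.ndarray, y: np.ndarray, rate: int) -> float:
    """Estimate the delay between two microphone signals via cross-correlation."""
    corr = np.correlate(x, y, mode="full")
    lag = int(np.argmax(corr)) - (len(y) - 1)
    return lag / rate

# Synthetic check: a 1-ms delay across a 0.5-m pair gives about 46.7 degrees.
rate, d = 48000, 0.5
src = np.random.default_rng(3).normal(size=4800)
delay = int(0.001 * rate)  # 48 samples
x = np.concatenate([np.zeros(delay), src])   # delayed microphone
y = np.concatenate([src, np.zeros(delay)])   # reference microphone
print(round(bearing_from_tdoa(tdoa_by_cross_correlation(x, y, rate), d), 1))
```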


Fig. 9. Analysis of sound [30].

D. Wireless Sensor Networks

Wireless devices, such as wireless-enabled laptops and Palm Pilots, have become an integral part of daily life. A wireless network can be considered a sensor network in which the network nodes function as sensors. They sense changes in the environment according to the movement of objects or humans. A possible additional functionality could be the indoor surveillance of corporate buildings and private houses [34]. Wireless sensor networks represent a new type of ad hoc network, which integrates sensing, processing, and wireless communication in a distributed system [35]. Sensor networks are a growing technology that promises a novel ability to monitor and instrument the physical world [36]. In a sensing-covered network, each point in a geographic area of interest needs to be within the sensing range of at least one sensor [35]. Sensor networks comprise a significant number of inexpensive wireless devices (nodes) that are densely distributed over the region of interest [36]. They are usually battery powered, with restricted computation and communication abilities [36]. Every node is equipped with different sensing modalities, such as acoustic, infrared, and seismic [36].

Wireless sensor networks have the potential to improve the ability to develop user-centric applications to monitor and prevent harmful events. The availability of inexpensive low-power sensors, radios, and embedded processors enables the deployment of distributed sensor networks to offer information to users in distinct environments and to provide them control over undesirable situations. Networked sensors can collaborate to process and make deductions from the collected data and provide the user with access to continuous or selective observations of the environment.

In most situations, these devices must be small in size, low power, lightweight, and unobtrusive [37].

In addition to enabling new applications, wireless sensor networks offer an alternative to several existing technologies. Wiring costs restrict complicated environment controls and the reconfigurability of these systems. In many cases, the savings in wiring costs alone justify the use of wireless sensor nodes [38]. A basic issue that arises naturally in sensor networks is coverage. Due to the significant variety of sensors and their applications, sensor coverage is subject to a wide range of interpretations. Generally, coverage can be considered a measure of the quality of service of a sensor network. Coverage formulations can attempt to locate the weak points in a sensor field and suggest future deployment or reconfiguration schemes to enhance the total quality of service [38].

In previous years, wireless networks, such as IEEE 802.11a/b/g wireless local-area networks (WLANs), have become plentiful, and their popularity is only increasing. In the near future, wireless networks will become omnipresent, and they will supply high-speed communication capabilities almost anywhere. An immediate question is whether it is possible to utilize the wireless network infrastructure to implement other functionalities in addition to communication. WLANs have been used for positioning mobile terminals and tracking their movements. If the communication infrastructure could be utilized for security purposes, the deployment of additional infrastructure could be avoided or reduced, resulting in a considerably more cost-effective solution [34].

Fig. 10 illustrates the basic functionality of a store-and-forward wireless sensor network (WSN) in which video information is obtained with cameras and transmitted onward. The WSN is composed of shared-medium cameras, store-and-forward cameras, distributed servers, routing nodes, wireless cameras and base stations, and a control room. The cameras distribute their information through the nodes and the distributed server to the control room [39].

E. Distributed Intelligence and Awareness

The 3GSSs use distributed intelligence functionality. An important design issue is to determine the granularity at which the tasks can be distributed, based on available computational resources, network bandwidth, and task requirements. The distribution of intelligence can be achieved by the dynamic partition of all the logical processing tasks, including event recognition and communications. The dynamic task-allocation dilemma is studied through the usage of a computational complexity model for representation and communication tasks [3].

A surveillance task can be separated into four phases: 1) event detection, 2) event representation, 3) event recognition, and 4) event query. The detection phase addresses multisource spatiotemporal data fusion for the efficient and reliable extraction of motion trajectories from videos. The representation phase revises raw trajectory data to construct hierarchical, invariant, and adequate representations of the motion events. The recognition phase handles event recognition and classification. The query component indexes and retrieves videos that match some query criteria [40]. A sketch of these phases follows below.
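
The four phases can be sketched as follows; the trajectory features, the rule-based classifier, and the query criterion are toy assumptions standing in for the statistical methods of [40].

```python
from dataclasses import dataclass

@dataclass
class MotionEvent:
    camera: str
    trajectory: list            # (t, x, y) points extracted by the detection phase
    label: str = "unknown"

def represent(trajectory: list) -> dict:
    """Representation phase: reduce a raw trajectory to a few invariant features."""
    (t0, x0, y0), (t1, x1, y1) = trajectory[0], trajectory[-1]
    dist = ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5
    return {"duration": t1 - t0, "displacement": dist,
            "speed": dist / max(t1 - t0, 1e-9)}

def recognize(features: dict) -> str:
    """Recognition phase: a toy rule-based classifier over the features."""
    if features["speed"] > 2.0:
        return "running"
    return "loitering" if features["displacement"] < 1.0 else "walking"

def query(events: list, label: str) -> list:
    """Query phase: retrieve stored events matching some criteria."""
    return [e for e in events if e.label == label]

# The detection phase (not shown) would supply trajectories from fused sensor data.
ev = MotionEvent("cam-3", [(0.0, 0.0, 0.0), (2.0, 8.0, 2.0)])
ev.label = recognize(represent(ev.trajectory))
print(ev.label, len(query([ev], "running")))  # running 1
```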


Fig. 10. Wireless sensor network accompanied with distributed location servers [39].

Fig. 11. Generic architecture of a monitoring/control pervasive system [42].

The key to security is situation awareness. Awareness requires information that spans multiple scales of time and space. A security analyst must keep track of "who are the people and vehicles in a space" (identity tracking), "where are the people in a space" (location tracking), and "what are the people/vehicles/objects in a space doing" (activity tracking). The analyst must use historical context to interpret these data. Smart video surveillance systems are capable of enhancing situational awareness over multiple scales of time and space. Currently, however, the component technologies are evolving in isolation. For instance, face recognition technology handles the identity-tracking challenge while restricting the subject to be in front of the camera, and intelligent video surveillance technologies offer activity-detection capabilities on video streams while disregarding the identity-tracking challenge. To offer comprehensive, nonintrusive situation awareness, it is crucial to address the challenge of multiscale, spatiotemporal tracking [41].

Bandini and Sartori [42] present a monitoring and control system (MCS). An MCS attempts to support humans in decision making regarding problems that can occur in critical domains. It can be characterized by its functionalities: 1) to gather data on the monitored situation, 2) to evaluate whether the data concern an anomalous situation, and 3) in case of an anomalous situation, to perform the proper actions, e.g., to remedy the problem [42]. An action is typically the creation of an alarm to notify humans about the problem. MCSs should be intelligent. For this reason, MCSs have traditionally been developed by using artificial intelligence (AI) technologies, such as neural networks, data mining, and knowledge-based systems [42].

Fig. 11 presents an illustration of an MCS, which is structured into three logical levels: 1) observation, 2) interpretation, and 3) actuation. In observation, the state of a monitored field is periodically captured by a specified monitoring agency (MA). This is usually a set of sensors. In interpretation, the values detected by the sensors are evaluated by a specified interpretation agency (IA). In actuation, the specified actions are executed by a specific actuation agency (AA), depending on the interpretation results [42].

F. Architecture and Middleware

The field of automated video surveillance is quite novel, and the majority of contemporary approaches are engineered in an ad hoc manner. Recently, researchers have begun to consider architectures for video surveillance. Middleware that provides general support to video surveillance architectures is the logical next step. It should be noted that while video surveillance networks are a class of sensor networks, the engineering challenges are quite different. A large quantity of data flows through a surveillance network. In particular, the requirement for extreme economy in the use of power and network bandwidth, which is a dominating factor in most sensor networks, is absent from most surveillance networks [43].

Fig. 12 illustrates a simple architecture for information fusion. The nodes scan the environment periodically and transmit a signal. The received signal is first processed by a preprocessor to extract significant characteristics from the environment. The preprocessors are responsible for quantifying how much the environment differs from the steady state. The information fusion function then deduces whether an intruder is present or not [34]. A sketch of this scheme follows below.

Due to the availability of more advanced and powerful communications, sensors, and processing units, the architectural choices in the 3GSSs can potentially become extremely variable and flexibly customized to acquire a desired performance level. The system architecture represents a key factor. For instance, different levels of distributed intelligence can result in preattentive detection methods either closer to the sensors or deployed at different levels in a computational processing hierarchy. Another source of variability results from the usage of heterogeneous networks, either wireless or wired, and transmission modalities, both in terms of source and channel coding and in terms of multiuser access techniques.
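
A minimal sketch of the Fig. 12 fusion scheme is given below; the z-score deviation measure and the quorum rule are illustrative assumptions.

```python
import numpy as np

class Preprocessor:
    """Quantifies how much a node's readings deviate from the learned steady state."""

    def __init__(self, baseline: np.ndarray):
        self.mean = baseline.mean()
        self.std = baseline.std() + 1e-9

    def deviation(self, reading: float) -> float:
        return abs(reading - self.mean) / self.std  # z-score against steady state

def fuse(deviations: list, threshold: float = 3.0, quorum: int = 2) -> bool:
    """Information fusion: declare an intruder if enough nodes deviate strongly."""
    return sum(d > threshold for d in deviations) >= quorum

# Steady-state training data per node, then one suspicious measurement round.
rng = np.random.default_rng(4)
nodes = [Preprocessor(rng.normal(20.0, 1.0, 200)) for _ in range(4)]
readings = [20.3, 27.5, 26.9, 19.8]  # two nodes observe a strong disturbance
print(fuse([n.deviation(r) for n, r in zip(nodes, readings)]))  # True
```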


Fig. 12. Simple example of a basic architecture [34].

Temporal and spatial coding scalability can be extremely productive for reducing the quantity of information to be transmitted by every camera, depending on the intelligence level of the camera itself. Multiple-access techniques are a fundamental tool for allowing a significant number of sensors to share a communication channel in the most efficient and robust way [3]. Surveillance network management techniques are required in the 3GSSs to coordinate distributed intelligence modules, to acquire optimal performance, and to adjust the system behavior according to the variety of conditions occurring either in a scene or in the parameters of a system. All of these tools are crucial for designing efficient systems. Finally, a further evolution is the integration among surveillance networks that are based on different types of sensor information, such as audio or visual, but oriented according to completely different functionalities, e.g., face detection, and different types of sensors, e.g., standard cameras [3].

G. Utilization of Mobile Robots

Seals defines a robot as an automatic machine with a certain degree of autonomy, designed for active interaction with the environment. It integrates different systems for the perception of the environment, decision making, and the formation and execution of plans. In addition to these characteristics, a mobile robot must produce a transitable path and then follow this path [44].

The extremely hostile conditions imposed by combat, space, and deep-ocean environments created the need for practical autonomous vehicles for military applications and for space and ocean exploration. Several efforts formed the foundation for autonomous vehicle development, such as Shakey, Jason, and the Stanford Cart. These first-generation autonomous vehicles were used to explore fundamental issues in vision, planning, and robot control [45]. These systems were severely hampered by primitive sensing and computing hardware. Efforts in the 1980s created the second generation of autonomous vehicle testbeds. This era includes the development of the autonomous land vehicle (ALV) and the United States Marine Corps (USMC) ground surveillance robot (GSR). The GSR was an autonomous vehicle that transited from one known geographic location to another across completely unknown terrain [45].

In detail, the GSR was an experimental M114 personnel carrier that had been modified for computer control. It had sensors and computer control for vision, navigation, and proximity aspects. The vision subsystem was mounted on a transport platform. The proximity-sensor subsystem used acoustic ranging sensors to provide short-range obstacle position and target-tracking information. The proximity-sensor subsystem fused the information from the sensors into consistent target and obstacle position and velocity vectors. In target tracking, vision estimates of target bearing could be fused with proximity estimates to enhance the knowledge of target angular position and motion for an accurate vehicle response [46].

SURBOT was another notable mobile surveillance robot, developed in 1985. SURBOT was developed by Remote Technology Corporation (REMOTEC) to execute visual, sound, and radiation surveillance within rooms specified as radiologically hazardous at nuclear power plants. The results verified that SURBOT could be used for remote surveillance in 54 separate controlled radiation rooms at the plant [47].

Currently, the development of a completely automated surveillance system based on mobile multifunctional robots is an active research area. Mobility and multifunctionality are generally adopted to reduce the number of sensors required to cover a given region. Mobile robots can be organized in teams, which results in intelligent distributed surveillance over considerable areas. Several worldwide projects attempt to develop completely or semiautonomous mobile security systems. There are a few security robot guards commercially available, e.g., CyberGuard, RoboGuard, and Security Patrolbot [48].

Recent progression in automation technologies, combined with research in machine vision and robot control, should in the near future allow industrial robots to adapt to unexpected variations in their environments. Such autonomous systems are dependent on real-time sensor feedback to reliably and precisely detect, recognize, and continuously track objects within the robot's workspace, especially for applications such as on-the-fly object interception [49].

Traditionally, the number of different sensors mounted on the robot and the number of tasks related to navigation, exploration, monitoring, and detection operations make the design of the overall control system challenging. In recent years, there has been research on issues such as autonomous navigation in indoor and outdoor environments and on rough outdoor terrain, visual recognition, sensor fusion and modulation, and sensor scheduling. An essential part of the research has concentrated on behavior-based approaches, in which complexity is reduced with computationally simple algorithms that process sensor information in real time with high-level inference strategies [48].

The inclusion of distributed artificial intelligence has introduced the development of new technologies in detection (sensors and captors), robotics (actuators), and data communication. These technologies enable surveillance systems to detect a wider frequency range, to cover a wider sensor area, and to decide the character of a particular situation [50]. Researchers in robotics have addressed the surveillance issue. Installed robots and cameras can identify obstacles or humans in the environment, and the systems guide the robots around these obstacles.


Fig. 13. Platform model of iBot [52].

Fig. 14. Detected target tracked and geo-registered on the map [53].

These systems typically extract purposeful information from massive visual data, which requires substantial computation or manpower [51].

A security guard system that uses autonomous mobile guard robots can be used in buildings. The guard can be a wheeled autonomous robot that moves on a planned path. The robot is always on alert for anything unusual, from moving objects to leaking water. The robot is equipped with cameras. While the robot is patrolling, it transmits images back to the monitoring station. After the robot finishes patrolling, it can automatically return to and dock in a battery-recharging station. These security robot systems can improve the security of homes and offices [52].

A basic need in security is the ability to automatically verify an intruder, to alert remote guards, and to allow them to monitor the intruder when one enters a secure or prohibited area. To assure both mobility and automaticity, the camera is embedded onto a teleoperated robot. Mobility and teleoperation, in which security guards can remotely instruct a mobile robot to track and identify a potential intruder, are more attractive than conventional immovable security systems [52]. An example of such a mobile robot is "iBotGuard," which was developed by Liu et al. [52]. It is an Internet-based intelligent robot security system that can detect intruders utilizing invariant face recognition [52]. Fig. 13 illustrates the iBot platform model. This platform enables users to remotely control a robot in response to live video images captured by the camera on the robot. The iBot Server connects the robot and camera over a wireless channel, eliminating problems associated with cables [52].

The iBot Server includes two components: 1) a streaming media encoder (SME) and 2) a controller server (CS). The iBot client includes another two components: 1) a streaming media player (SMP) and 2) a controller client (CC). The SME captures and encodes the real-time video from the camera on the robot under the instruction of the CS. The encoded streams are delivered by the streaming media server (SMS) to the SMP. The SMP receives, decodes, and displays the media data. The CC communicates with the SMP, and the CC interacts with the CS to perform the intelligent control algorithms. The CS eventually deploys its robot-movement commands and camera pan-tilt-zoom commands [52].
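
The controller path of such a system can be sketched as below. The class and command names are hypothetical stand-ins for the CS/CC components described above, not the actual iBotGuard implementation.

```python
from dataclasses import dataclass

@dataclass
class Command:
    kind: str        # "move" or "ptz" (pan-tilt-zoom)
    args: tuple

class ControllerServer:
    """CS stand-in: receives client commands and deploys them to robot and camera."""
    def __init__(self):
        self.robot_log, self.camera_log = [], []

    def deploy(self, cmd: Command) -> None:
        (self.robot_log if cmd.kind == "move" else self.camera_log).append(cmd.args)

class ControllerClient:
    """CC stand-in: commands issued by the remote guard watching the live feed."""
    def __init__(self, server: ControllerServer):
        self.server = server

    def send(self, kind: str, *args) -> None:
        self.server.deploy(Command(kind, args))

# A guard steers the robot toward a suspected intruder and zooms the camera.
cs = ControllerServer()
cc = ControllerClient(cs)
cc.send("move", "forward", 1.5)   # meters
cc.send("ptz", 30, -10, 2.0)      # pan, tilt, zoom
print(cs.robot_log, cs.camera_log)
```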

Liu et al. [53] present an unmanned water vehicle (UWV), which performs automatic maritime visual surveillance. The UWV mobile platform is equipped with a GPS device and a high-resolution omnicamera; omnicameras provide a 360° view capability. Targets are detected with a saliency-based model and adaptively tracked through selective features. Each target is geo-registered to a longitude and latitude coordinate. The target geo-location and appearance information is then transmitted to the fusion sensor, where the target location and image are displayed on a map, as in Fig. 14 [53].

VII. DISCUSSION ON CURRENT DILEMMAS IN THE 3GSS

According to Pavlidis et al. [54], the contemporary security infrastructure can be summarized as follows: 1) security systems act locally and do not cooperate in an efficient manner; 2) extremely high-value assets are insufficiently protected by obsolete technology systems; and 3) there is a dependence on intensive human concentration to detect and assess threats. Considering these practical realities, Pavlidis et al. [54] recommend cooperating closely with both the business unit that would productize a surveillance prototype and the potential customers [54].

Security-related technology is a growing industry. Governments and corporations worldwide are spending billions of dollars on the research, development, and deployment of intelligent video surveillance systems, data-mining software, biometric systems, and Internet geolocation technology. The technologies target terrorists and violators of export restrictions. Surveillance technologies are typically shrouded in secrecy, because of the fear that exposing them will make them less efficient, but the growing utilization of these technologies has provoked public interest in, and resistance to, security-related technologies [55].

The following sections consist of the most notable aspects discovered in the literature review. They are composed of the attainment of real-time distributed architecture, awareness and intelligence, existing difficulties in video surveillance, the utilization of wireless networks, the energy efficiency of remote sensors, the location difficulties of surveillance personnel, and scalability difficulties.


A. Real-Time Distributed Architecture

It is fundamental to establish a framework or methodology for designing distributed wide-area surveillance systems. This ranges from the generation of requirements to the creation of design paradigms through the definition of functional and intercommunication models. The future realization of a wide-area distributed intelligent surveillance system should draw on a collection of distinct disciplines; computer vision, telecommunications, and systems engineering are clearly needed [2].

A distributed multiagent approach may provide numerous benefits. First, intelligent cooperation between agents may enable the use of less expensive sensors, and therefore a larger number of sensors may be deployed over a larger area. Second, robustness is enhanced, because even if some agents fail, others remain to perform the mission. Third, performance is more flexible, since tasks are distributed among groups of agents at miscellaneous locations. For instance, the likelihood of correctly classifying an object or target increases if multiple sensors are concentrated on it from different locations [2].

A video surveillance network is a complicated distributed application and requires sophisticated support from middleware, whose role is primarily to support communication between modules. The nonfunctional requirements for video surveillance networks are best defined in architectural terms and contain scalability (the middleware must offer tools suitable for the scalable re-implementation of the analysis algorithms), availability (the middleware needs to support sufficient fault tolerance to uphold acceptable levels of availability), evolvability (the capacity of the surveillance network to adjust to changes, including changes to the hardware and modifications to the software), integration (the middleware is the intermediary for intermodule communication), security (the middleware needs to offer security facilities to counter attacks), and manageability (the network middleware must support the on-demand requirement for manageability) [43]; a minimal sketch of the module-decoupling idea follows below.

Surveillance systems provide concrete and profitable assistance to forensic investigations, although their potential capabilities are reduced in practice by the limitations of storage capacities, frame skipping, and data compression. Currently, real-time reactivity is insufficient, because human operators cannot handle enormous amounts of surveillance streams [57].
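As a minimal illustration of the module-decoupling role that middleware plays, the sketch below implements an in-process publish/subscribe bus. It is a toy stand-in for the middleware facilities discussed in [43], not their API; the topic names and event fields are hypothetical.

```python
# Minimal in-process publish/subscribe bus: analysis modules exchange events
# without knowing about each other, which is the property that makes the
# re-implementation and evolution of individual modules tractable.
from collections import defaultdict
from typing import Callable

class EventBus:
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]):
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict):
        for handler in self._subscribers[topic]:
            handler(event)

bus = EventBus()
# A tracker module consumes detections; an alarm module consumes alarm events.
bus.subscribe("detections", lambda e: print("track update from", e["camera"]))
bus.subscribe("alarms", lambda e: print("ALARM:", e["reason"]))
bus.publish("detections", {"camera": "C1", "bbox": (12, 40, 64, 128)})
```

A real surveillance middleware would add distribution, fault tolerance, and security on top of this pattern, per the requirements listed above.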

1) Architectural Dilemmas in Video Surveillance: While existing research has addressed multiple issues in the analysis of surveillance video, there has been little work on more efficient information acquisition based on real-time automatic video analysis, such as the automatic acquisition of high-resolution face images. There is a challenge in transmitting information across different scales, and the interpretation of the information becomes essential. Multiscale techniques present a completely novel region of research, including camera control, processing video from moving cameras, resource allocation, and task-based camera management, in addition to challenges in performance modeling and evaluation [41].

The fundamental techniques for interpreting video and extracting information from it have received a substantial amount of attention. The successive set of challenges addresses how to use these techniques to construct large-scale deployable systems. Challenges of deployment contain the cost minimization of wiring, low-power hardware for battery-operated camera installations, automatic calibration of cameras, automatic fault detection, and the development of system management tools [41].

Improving smart cameras with additional sensors could transform them into high-performance multisensor systems. By combining visual, acoustic, tactile, or location-based information, smart cameras become more sensitive and can transmit results that are more precise, which makes the results more widely applicable [11].

The usual scenario in an industrial research and development unit developing vision systems is that a customer presents a system specification and its requirements. The engineer then interprets these requirements into a system design and validates that the design fulfils the user-specified requirements. The accuracy requirements are typically defined in terms of detection and false alarm rates for objects; a small sketch of these metrics is given at the end of this subsection. The computational requirement is commonly specified by the system response time to the presence of an object, e.g., real-time or delayed. The intention of the vision systems engineer is then to exploit these restrictions and design a system that is operational in the sense that it satisfies customer requirements regarding speed, accuracy, and expenses [58].

The essential dilemma is that there is no known systematic way for vision systems engineers to conduct this translation of system requirements into a detailed design. It is still an art to engineer systems that satisfy application-specific requirements. There are two basic steps in the design process, which are 1) the choice of the system architecture and the modules to achieve the task, and 2) the statistical analysis and validation of the system to check whether it fulfils user requirements. In real life, the system design and analysis phases usually follow each other in a cycle until the engineer creates a design and a suitable analysis that satisfies the user specifications [58].

Automation of the design process is a research area with multiple open issues, even though there have been some studies in the context of image analysis, e.g., automatic programming. The systems analysis (performance characterization) phase in the context of video processing systems has been an active region of research in recent years. Performance evaluation of image and video analysis components or systems is an active research topic in the vision community [58].
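Since the accuracy requirements above are expressed as detection and false alarm rates, the bookkeeping can be made concrete with a small helper. This is a generic sketch of the standard definitions rather than a procedure from [58]; the example counts are invented.

```python
# Detection rate and false-alarm rate over a labeled test run: detections are
# matched to ground-truth events, unmatched ground truth counts as a miss, and
# unmatched detections count as false alarms (here normalized per hour).
def accuracy_metrics(n_ground_truth, n_true_detections, n_false_alarms, hours):
    detection_rate = n_true_detections / n_ground_truth
    false_alarms_per_hour = n_false_alarms / hours
    return detection_rate, false_alarms_per_hour

# e.g., 47 of 50 intrusions detected, with 6 false alarms over a 24-hour run
dr, fa = accuracy_metrics(50, 47, 6, 24.0)
print(f"detection rate {dr:.2f}, false alarms/hour {fa:.2f}")
```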


2) Real-Time Data Constraints: Society requires the results of research activities to address new solutions in video surveillance and sensor networks. Security and safety call for new generations of multimedia surveillance systems, in which computers act not only as supporting platforms but as the essential core of a real-time data comprehension process; such systems are becoming a reality [57]. Most of the new research activities in surveillance are exploring larger dimensions, such as distributed video surveillance systems, heterogeneous video surveillance, audio surveillance, and biometric systems. In vast distributed environments, the exploitation of networks of small cooperative sensors should considerably improve the surveillance capability of high-level sensors, such as cameras [57].

As system size and diversity grow and the complexity consequently increases, the probability of inconsistency, unreliability, and nonresponsiveness grows. The design and implementation of distributed real-time systems present essential challenges in ensuring that these complicated systems function as required. To comprehend or implement any complex system, it is necessary to decompose it into component parts and functions. Distributed systems can be considered in terms of independent concurrent activities that need to exchange data in ways that do not weaken the overall predictability and performance of the system [59].

There are four crucial objectives that design methods for real-time systems should achieve: 1) to be able to structure the system into concurrent tasks, 2) to be capable of developing reusable software through information hiding, 3) to be able to determine the behavioral characteristics of the system, and 4) to be able to analyze the performance of the design by distinguishing its performance and the fulfillment of requirements [59].

The main motivation for the paradigm shift from a central to a distributed control surveillance system is an improvement in the functionality, availability, and autonomy of the surveillance system. These surveillance systems can respond autonomously to changes in the environment of the system and to detected events in the monitored scenes. A static surveillance system configuration is not desirable; the system architecture must support reconfiguration, migration, quality of service, and power adaptation in analysis tasks [11].

Recently, there has been rapid development in advanced surveillance systems to solve a collection of difficulties that vary from people recognition to behavior analysis, with the intention of enhancing security. These challenges have been approached from different perspectives and have been followed by a vast selection of system architectures. As cheaper and faster computing hardware accompanied by efficient and versatile sensors reached the consumer, there was a rapid development of multicamera systems. In spite of their large area coverage, they introduce new dilemmas that must be addressed in the architectural definition [60].

B. Difficulties in Video Surveillance

In realistic surveillance scenarios, it is impossible for a single sensor to view all areas simultaneously, or to visually track a moving object for a long period. Objects become occluded by buildings and trees, and the sensors themselves have confined fields of view. A promising solution to this difficulty is to use a network of video sensors to cooperatively monitor all the objects within an extended region and seamlessly track individual objects that cannot be viewed continuously by an individual sensor alone. Some of the technical challenges within this method are to 1) actively control sensors to cooperatively track multiple moving objects, 2) fuse information from multiple sensors into scene-level object representations, 3) survey the scene for events and activities that should "trigger" further processing or operator involvement, and 4) offer human users a high-level interface for dynamic scene visualization and system tasking [12].

Intelligent visual surveillance is a vital application area for computer vision. In situations in which networks of hundreds of cameras are used to cover a wide area, the obvious restriction is the ability of the user to manage vast amounts of information. For this reason, automated tools that can summarize activities or track objects are crucial to the operator. The ability to track objects across (spatially separated) camera scenes is the key to the user requirements. Extensive geometric knowledge of the site and camera positions is normally needed. This type of explicit mapping to camera placement is impossible for large installations, because it requires that the operator knows which camera to switch to when an object vanishes [61].

While detecting and tracking objects are crucial capabilities for smart surveillance, from the perspective of a human intelligence analyst, the most critical challenge in video-based surveillance is turning the automatic analysis of data into the detection of events of interest and the identification of trends. Contemporary systems have just begun to examine automatic event detection. The key points are video-based detection and tracking, video-based person identification, large-scale surveillance systems, and automatic system calibration [41].

Object tracking is a vital task for many applications in the region of computer vision, particularly those associated with video surveillance. Recently, the research community has concentrated its interest on developing smart applications to enhance event detection capabilities in video surveillance systems. Advanced visual-based surveillance systems need to process videos from multiple cameras to detect the presence of mobile objects in the monitored scene. Every detected object is tracked, and its trajectory is analyzed to deduce its movement in the scene, as illustrated by the tracker sketch below. Finally, at the highest levels of the system, detected objects are recognized and their behavior is analyzed to verify whether the state is normal or potentially dangerous [62].

Motion detection, tracking, behavior comprehension, and personal identification at a distance can be realized by single-camera visual surveillance systems. Multiple-camera visual surveillance systems can be helpful, because the surveillance region is enlarged and multiple-view information can overcome occlusion. Tracking with a single camera easily creates ambiguity resulting from occlusion or depth (see Fig. 15); this ambiguity may be removed by another view. Visual surveillance using multiple cameras introduces dilemmas, such as camera installation, camera calibration, object matching, automated camera switching, and data fusion [20].

The recognition of human activities in restricted settings, such as airports, parking lots, and banks, is of significant interest in security and automated surveillance systems. Albanese et al. [63] state that science is still far from achieving a systematic solution to this difficulty. The analysis of activities executed by humans in restricted settings is of great importance in applications such as automated security and surveillance systems. There has been essential interest in this area, where the challenge is to automatically recognize the activities occurring in the field of view of a camera and detect abnormalities [63].
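The per-object tracking and trajectory buildup described above can be reduced to a minimal nearest-neighbor centroid tracker, sketched below. It is a deliberately naive baseline (greedy matching, no occlusion reasoning or appearance model), not the tracker of any surveyed system, and the gating threshold is an assumption.

```python
# Greedy nearest-neighbor centroid tracker: each detection in a frame is matched
# to the closest live track; unmatched detections start new tracks. Real systems
# add appearance models and occlusion reasoning on top of this skeleton.
import math
from itertools import count

MAX_MATCH_DIST = 50.0          # pixels; assumed gating threshold
_track_ids = count()

def update_tracks(tracks, detections):
    """tracks: {id: [(x, y), ...]} trajectories; detections: [(x, y), ...]."""
    unmatched = list(detections)
    for tid, trajectory in tracks.items():
        if not unmatched:
            break
        last = trajectory[-1]
        nearest = min(unmatched, key=lambda d: math.dist(last, d))
        if math.dist(last, nearest) < MAX_MATCH_DIST:
            trajectory.append(nearest)          # extend this track's trajectory
            unmatched.remove(nearest)
    for det in unmatched:                       # leftover detections: new tracks
        tracks[next(_track_ids)] = [det]
    return tracks

tracks = {}
update_tracks(tracks, [(100, 120), (300, 310)])   # frame 1
update_tracks(tracks, [(104, 126), (296, 305)])   # frame 2: tracks extended
```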


Fig. 15. Example of occlusion [15].

Visual surveillance is a very active research area in computer vision because of the rapidly increasing number of surveillance cameras, which results in a strong demand for automatic methods of processing their output. The scientific challenge is to design and implement automatic systems that can detect and track moving objects and interpret their activities and behaviors. This need is a worldwide phenomenon, required by private companies as well as governmental and public institutions, with the aim of enhancing public safety. Visual surveillance is a key technology in public safety, e.g., in transport networks, town centers, schools, and hospitals. The main tasks in visual surveillance systems contain motion detection, object classification, tracking, activity understanding, and semantic classification [25].

Luan et al. state that tracking in low frame rate (LFR) video is a practical requirement for numerous real-time applications, including visual surveillance. For tracking systems, an LFR condition is equivalent to abrupt motion, which is typically encountered but difficult to address. Specifically, these difficulties include poor motion continuity, fast appearance variation of the target, and increased background clutter. The majority of existing approaches cannot be readily applied to LFR tracking problems because of their vulnerability to the motion and appearance discontinuity inflicted by LFR data [64].

1) Occlusions: Outdoor and indoor surveillance have some distinct requirements. Indoor surveillance can be considered less complicated than outdoor surveillance, since the operating conditions are stable in indoor environments: the cameras are typically fixed and not subject to vibration, weather conditions do not affect the scene, and the moving targets are generally limited to people. Regardless of these simplified conditions, indoor scenes are characterized by other eccentricities, which enlarge the dilemmas of surveillance systems [65]. Occlusions and operation in difficult weather conditions are fundamental challenges. In a multiple-target-tracking system, the key points of the local tracker are typically the detection subsystem and the measurements-to-tracks association subsystem. The design of the association system is dependent on the quality of the detection subsystem [66].

The difficulty of tracking multiple objects in complicated crowds in busy areas is far from being completely solved. The majority of existing algorithms are designed under one or more presumptions about occlusions, e.g., regarding the number of objects, partial occlusion, short-term occlusion, constant motion, and constant illumination.

Some methods use a human model for reasoning about the occlusion between standing humans. Exploiting a human appearance model can achieve better results in tracking multiple standing and walking people in a large crowd, but it may also result in difficulties in addressing occlusions involving objects, such as bags, luggage, children, sitting people, and vehicles. The change and interchange of labels of tracked objects after occlusion are the most common and significant errors of these methods [24].

Tracking multiple people in cluttered and crowded scenes is a demanding task primarily because of the occlusion between people. If a person is visually isolated, it is easier to perform the tasks of detection and tracking. An increase in the density of objects in the scene increases interobject occlusions. A foreground blob may not belong to a single individual; it may belong to several individuals in the scene. A person may even be completely occluded by other people, making it impossible to detect and track multiple individuals with a single camera. Using multiple views of the same scene attempts to acquire information that might be omitted in a particular view [67].

The usage of multiple cameras in visual surveillance has grown significantly, because it is very useful for addressing many difficulties, such as occlusion. Visual surveillance that uses multiple cameras nevertheless has numerous problems. These include camera installation, calibration of multiple cameras, correspondence between multiple cameras, automated camera switching, and data fusion [68].

2) Feature Extraction and Classification: The recognition of moving targets in a video stream still remains a difficulty. Moving target recognition entails two main steps, which are 1) feature extraction and 2) classification. The feature extraction process derives a collection of features from the video stream. Numerous machine-learning classification techniques have been studied for surveillance tasks [69].

The most typical approach to detect moving objects is background subtraction, in which each frame of a video sequence is compared against a background model; a minimal sketch is given below. One dilemma in background subtraction is caused by the detection of false objects when an object that belongs to the background, e.g., after remaining stationary for a period of time, moves away. This creates what are called "ghosts." It is vital to address this problem, because ghost objects will unfavorably affect many tasks, such as object classification, tracking, and event analysis, e.g., abandoned item detection [70]. Fig. 16 presents visual results from a basic motion tracker and a ghost detection algorithm. Boxes with dark borders indicate the valid moving tracks created by the tracker. Boxes with dashed dark borders denote valid but static tracks. Boxes with white borders represent invalid tracks, also known as ghost tracks. The patches presented in the boxes show the foreground pixels, which are detected as moving pixels [70].
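The following is a minimal background-subtraction loop using OpenCV's Gaussian-mixture background model, included to make the frame-versus-background comparison above concrete. It is a generic sketch rather than the detector evaluated in [70]; the file name and thresholds are assumptions.

```python
# Background subtraction with a Gaussian-mixture background model: each frame is
# compared against the learned background; pixels that deviate become foreground.
# Slow model adaptation is exactly what creates "ghosts" when a background
# object moves away, since its old location keeps differing from the model.
import cv2

cap = cv2.VideoCapture("lobby.avi")              # hypothetical input clip
bg = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16,
                                        detectShadows=True)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    fg_mask = bg.apply(frame)                    # 255 = moving, 127 = shadow
    fg_mask = cv2.morphologyEx(fg_mask, cv2.MORPH_OPEN,
                               cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3)))
    contours, _ = cv2.findContours(fg_mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    # Small contours are noise; surviving boxes feed the tracker stage.
    blobs = [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > 200]
cap.release()
```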


Fig. 16. Example of basic tracking and ghosts [70].

3) Automatic Video Analysis: The strategy proposed by Wang et al. [71] to support rapid decision making is to reduce the amount of information that human operators must process. For this reason, researchers have been studying automatic video content analysis technologies to extract information from videos. Even though substantial progress has been made, the high computational cost of these techniques limits their usage in real-time situations for the near future. Even though these techniques can essentially reduce the amount of video information that must be analyzed by human operators, the operators must still resolve the ambiguities in the videos, synthesize a vast range of context information within the videos, and make the final decisions. Therefore, it is important to design interactive visualizations that can support real-time information synthesis and decision making for video surveillance tasks [71].

Additionally, the number of cameras and the area under surveillance are restricted by the personnel available. To reduce the restrictions of traditional surveillance methods, there is an ongoing effort in the computer vision and artificial intelligence communities to develop automated systems for the real-time monitoring of people, vehicles, and other objects. These systems can create a depiction of the events occurring within their vicinity and raise alarms if they detect a suspicious person or unusual activity [22].

Camera systems for surveillance are in extensive use and produce considerable amounts of video data, which are stored for future or immediate utilization. In this context, efficient indexing and retrieval from surveillance video databases are crucial. Surveillance videos are rich in motion information, which is the most important cue for identifying the dynamic content of videos. The extraction, storage, and analysis of motion information in videos, and content-based surveillance video retrieval, are of importance [72].

C. Awareness and Intelligence

The ultimate goal of surveillance systems is to automatically evaluate the ongoing activities of the monitored environment by flagging and presenting suspicious events in real time to the operator to prevent dangerous situations. Data fusion techniques can be used to enhance estimation performance and system robustness by exploiting the redundancy offered by multiple sensors observing the same scene. With recent advancements in camera and processing technology, data fusion is being considered for video-based systems.

Intelligent sensors, which are equipped with microprocessors to execute distributed data processing and computation, are available and can decrease the computational burden of a central processing node [73].

The reliability of sensors is rarely explicitly considered. The difficulty of choosing the most relevant sensor or collection of sensors to execute a particular task often arises. The task could be target tracking, audio recording of a suspicious event, or triggering an alarm. It would be desirable to have a system that could automatically select the correct camera or collection of cameras. If data from multiple sensors are available and data fusion is performed, the results could be considerably affected by a malfunctioning sensor. A means to evaluate the performance of the sensors and to weight their contributions in the fusion process is therefore required [73].

In the contemporary generation of surveillance systems, in which multiple asynchronous and miscellaneous sensors are used, the adaption of the information acquired from them to derive the events of the environment is an important and challenging research problem. Information adaption refers to the process of combining the sensor and nonsensor information using context and past experience. The issue of information adaption is vital, because when information is acquired from multiple sources, adapted information offers more precise inferences about the environment than individual sources [74].

1) Context Awareness: To improve software autonomy, applications depend on context information to dynamically adapt their behavior to match the environment and user requirements. Context-aware applications require middleware for the transparent distribution of components. Context-aware applications are needed to support personalization and adaptation based on context awareness. The user must understand how the applications function, such as what context information and logic are utilized in particular automated actions. Context-aware applications must ensure that actions committed on behalf of users are both accountable and intelligible. The system cannot simply be trusted to act on behalf of users [75].

To address these dilemmas, autonomous context-aware systems need to provide mechanisms that strike a suitable balance between user control and software autonomy. This includes providing mechanisms to make users aware of application adaptations by indicating aspects of the application state, such as the context information and adaptation logic used in decision-making processes. The challenge is not only to identify what application state information should be presented, but also in what manner, e.g., with what level of explanation. In traditional applications, the tradeoff between user control and software autonomy has been fixed during the design phase. In contrast, context-aware applications may need to adjust the balance of software autonomy and user control at run-time by changing the level of feedback to users and the content of user input. The support for adaption includes the management of rules and user preferences that are used to distinguish how the context-aware system will respond to the available context information [75].

The design of aware systems, i.e., systems that have capabilities of automatic adaption to changes, learning from experience, and active interaction with external entities, is an active topic of research involving several disciplines ranging from computer vision to artificial intelligence.


Fig. 17. Example of a cognitive cycle [76].

To reach this goal, approaches based on the imitation of human brain skills are typical and, in the past, they have offered successful applications. In Fig. 17, Dore et al. present a possible model that contains sensing information from the external world, analyzing and representing the information, making decisions, and issuing actions and communications to the external world [76].

2) Data Fusion: Blasch and Plano [77] state that "data fusion" is a term used to refer to bottom-level, data-driven fusion. "Information fusion" refers to the processing of already-fused data, such as data from primary sensors or sources, into meaningful and preferably relevant information for another part of the system, human or not [77].

A multimedia system incorporates relevant media streams to accomplish a detection task. As different streams have different confidence levels in achieving distinct tasks, it is vital for the system to wisely identify the most appropriate streams for a specific analysis task in order to reach higher confidence. The confidence information of media streams is usually used in their incorporation by assigning weights to them accordingly. The confidence in a stream is normally determined by how it has assisted in performing the detection task previously. Arguably, if the system acquires precise results based on a particular stream, a higher confidence level is assigned to it in the adaption process [16], [17].

Data fusion from multiple cameras observing the same objects is a main challenge in multicamera surveillance systems and influences the optimal combination of data from different sources. It is necessary to estimate the reliability of the available sensors and processes to combine complementary information in regions where there are multiple views, in order to solve dilemmas of specific sensors, such as occlusions, overlaps, and shadows. Some traditional benefits, in addition to extended spatial coverage, are enhancements in accuracy through covariance reduction, improved robustness through the identification of malfunctioning sensors, and enhanced continuity through complementary detections [78].
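To make the confidence-weighted combination above concrete, the following sketch fuses estimates from several sensors by inverse-variance weighting, a standard scheme in which a less reliable (higher-variance) sensor contributes less. It is a generic illustration, not the fusion rule of any cited system; the sensor values are invented.

```python
# Inverse-variance weighted fusion of 1-D measurements: weight w_i = 1/var_i,
# fused = sum(w_i * x_i) / sum(w_i). A malfunctioning sensor can be handled by
# inflating its variance, which drives its weight toward zero.
def fuse(measurements):
    """measurements: list of (value, variance) pairs from different sensors."""
    weights = [1.0 / var for _, var in measurements]
    fused = sum(w * x for w, (x, _) in zip(weights, measurements)) / sum(weights)
    fused_var = 1.0 / sum(weights)          # the fused estimate is more certain
    return fused, fused_var

# Camera says the target is at x = 12.0 m (var 0.5); radar says 12.6 m (var 0.2).
print(fuse([(12.0, 0.5), (12.6, 0.2)]))    # fused value lands closer to radar
```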

Typically, surveillance systems are composed of numerous sensors that acquire data about each target in the environment. These systems encounter two types of dilemmas, which are 1) the fusion of data, which addresses the combination of data from distinct sources in an optimal manner, and 2) the management of multiple sensors, which addresses the optimization of the global management of the system through the application of individual operations in every sensor [14].

In Castanedo et al.'s [14] surveillance systems, autonomous agents can cooperate with other agents for two different objectives, which are 1) to acquire enhanced performance or precision for a specific surveillance task, for which complementary information can be incorporated and then combined through data fusion techniques, and 2) to use the capabilities of other agents to expand system coverage and execute tasks that they are not able to achieve individually [14].

Information adaption is a challenging task because of 1) the diversity and asynchrony of sensors, 2) the disagreement or agreement of media streams, and 3) the confidence regarding the media streams. There is an open issue in how to fuse individual pieces of information to establish comprehensive information. These are items of importance and essential challenges [74].

D. Wireless Networks and Their Applicability

The WSN has multiple applications in environment monitoring. Advances in microsensor and communication technologies have made it possible to manufacture cost-effective and small WSNs. Several interesting WSN applications have been specified, such as the active badge system, which locates individuals within a building. Radio frequency identification (RFID) technology is utilized in inventory management and monitoring, e.g., rail car tracking. The confidence in an object location may be improved with an RFID stream in comparison to an audio stream [16]. Berkeley Smart Dust can be used to periodically receive readings from sensors. The Massachusetts Institute of Technology (MIT) Cricket uses the time difference of arrival (TDoA) model to determine the position and orientation of a device [79]; a ranging sketch in the Cricket style is given below. The combination of these technologies could provide many new applications. The sensor networks can detect and indicate environment-related information and events. Through messaging systems, these events can be transmitted to the outside world for immediate processing. These events may trigger human or application programs to respond with actions, which may be further conveyed back into the sensor networks [79].

By adopting networks as the communications medium for real-time transmission of video signals in a security-sensitive operation, many technological issues need to be resolved. A great amount of data flow can cause network congestion. The system must provide real-time transmission of video signals even though there might be only a small amount of bandwidth available. Robust and efficient error control mechanisms and video compression techniques need to be used to prevent the difficulties related to limited bandwidth [4].
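The Cricket-style TDoA ranging mentioned above can be sketched as follows: a beacon emits an RF message and an ultrasonic pulse at the same instant, and since the RF signal arrives effectively immediately, the gap between the two arrival times multiplied by the speed of sound gives the distance. This is an illustrative reconstruction of the principle, not code from the Cricket system; the timing values are hypothetical.

```python
# TDoA ranging in the Cricket style: distance follows from the lag between the
# (effectively instantaneous) RF arrival and the slower ultrasonic arrival.
SPEED_OF_SOUND = 343.0            # m/s at ~20 degrees C; assumed constant

def tdoa_distance(t_rf_arrival, t_ultrasound_arrival):
    """Arrival times in seconds at the listener; returns distance in meters."""
    return (t_ultrasound_arrival - t_rf_arrival) * SPEED_OF_SOUND

# Ultrasound arriving 8.75 ms after the RF message puts the beacon ~3 m away.
print(f"{tdoa_distance(0.0, 0.00875):.2f} m")   # -> 3.00 m
```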


Recently, there has been an emphasis on the development of wide-area distributed wireless sensor networks with self-organization capabilities to tolerate sensor failures, changing environmental conditions, and distinct environmental sensing applications. In particular, mobile sensor networks (MSNs) require support from self-configuration mechanisms to guarantee adaptability, scalability, and optimal performance. The best network configuration is typically time-varying and context-dependent. Mobile sensors can physically change the network topology, responding to events in the environment or to changes in the mission [80].

E. Energy Efficiency of Remote Sensors

With the emergence of high-resolution image sensors, video transmission requires high-bandwidth communication networks. It is predicted that future intelligent video surveillance will require more computing power and higher communication bandwidth than today, as a result of higher resolution images, higher frame rates, and increasing numbers of cameras in video surveillance networks. Novel solutions are needed to handle the demanding restrictions of video surveillance systems, both in terms of communication bandwidth and computing power [81].

Intruder detection and data collection are examples of applications envisioned for battery-powered sensor networks. In many of these applications, the detection of a certain triggering event is the initial step executed prior to any other processing. If trigger events occur seldom, sensor nodes will spend a large majority of their lifetime in the detection loop. The efficient use of system resources in detection then plays a key role in the longevity of the sensor nodes. The energy consumption in the system includes the transmission energy, whereas the energy required by processing has usually not been considered directly in the detection problem [82].

It is crucial to note that technology scaling will gradually decrease processing costs while the transmission cost remains constant. With the usage of compression techniques, one can reduce the number of transmitted bits; the transmission cost is decreased at the expense of additional computation. This communication-computation tradeoff is the fundamental idea behind low-energy sensor networks, and it is in sharp contrast to classical distributed systems, in which the goal is usually to maximize the speed of execution. The most appropriate metric in wireless networks is power. Experimental measurements indicate that the communication cost in wireless ad hoc networks can be two orders of magnitude higher than the computation cost in terms of consumed power [38]; a back-of-envelope model of this tradeoff is sketched below.

Integrated video systems (IVSs) are based on the recent development of smart cameras. In addition to high demands on computing performance, power awareness is of major importance in IVS. Power savings may be achieved by graceful degradation of quality of service (QoS). There has been research on the tradeoff between image quality and power consumption; the work mainly concentrates on sophisticated image compression techniques [83].
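The communication-computation tradeoff can be illustrated with a toy energy model: compressing pays off whenever the radio energy saved on the removed bits exceeds the CPU energy spent removing them. All constants below are assumptions chosen only to reflect the order-of-magnitude gap reported in [38], not measured values.

```python
# Toy energy model for the communication-computation tradeoff: transmit energy
# scales with bits sent, compute energy with CPU cycles spent compressing.
E_TX_PER_BIT = 1e-6        # J/bit over the radio (assumed)
E_CPU_PER_CYCLE = 1e-9     # J/cycle on the node CPU (assumed, ~1000x cheaper)

def net_saving(bits_raw, compression_ratio, cycles_per_bit):
    bits_sent = bits_raw / compression_ratio
    radio_saved = (bits_raw - bits_sent) * E_TX_PER_BIT
    cpu_spent = bits_raw * cycles_per_bit * E_CPU_PER_CYCLE
    return radio_saved - cpu_spent          # positive => compressing pays off

# Compressing a 1-Mbit frame 4:1 at 100 CPU cycles/bit:
print(f"{net_saving(1e6, 4.0, 100):.3f} J saved")   # -> 0.650 J saved
```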

A sensor surveillance system comprises a set of wireless sensor nodes and a set of targets to be monitored. The wireless sensor nodes collaborate with each other to survey the targets and transmit the sensed data to a base station. The wireless sensor nodes are powered by batteries and have demanding power requirements. The lifetime of the system is the duration until no target can be surveyed by any wireless sensor node, or until data can no longer be forwarded for processing because of a lack of energy in the sensor nodes [84].

A client-side computing device has a crucial influence on the total performance of a surveillance system. The utilization of a cellular phone as a client of a surveillance system is notable because of its portability and omnipresent computing. The integration of video information and sensor networks has established the fundamental infrastructure for new generations of multimedia surveillance systems. In this infrastructure, different media streams, such as audio, video, and sensor signals, provide an automatic analysis of the controlled environment and a real-time interpretation of the scene [85].

F. Dilemmas in Scalability

A scalable system should be able to integrate the sensor data with contextual information and domain knowledge provided by both the humans and the physical environment, to maintain a coherent picture of the world over time. The performance of the majority of systems is far from what is required by real-world applications [86].

A large-scale distributed video surveillance system usually comprises many video sources distributed over a vast area, transmitting live video streams to a central location for monitoring and processing. Contemporary advances in video sensors and the increasing availability of networked digital video cameras have allowed the deployment of large-scale surveillance systems over existing IP-network infrastructure. Implementing an intelligent, scalable, and distributed video surveillance system remains a research problem. Researchers have not paid much attention to the scalability of video surveillance systems; they typically utilize a centralized architecture and assume the availability of all the required system resources, such as computational power and network bandwidth [87]. Fig. 18 presents an example of sensor coverage in a large complex, in which each sensor and its coverage is drawn and indicated, e.g., B1, C1, C2, and C3 [15].

The integration of heterogeneous digital networks in the same surveillance architecture requires a video encoding and distribution technology that can adapt to the currently available bandwidth, which may change in time for the same communication channel, and that is robust against transmission errors. The presence of clients with different processing power and display capabilities accessing video information requires a multiscale representation of the signal. The restrictions of surveillance applications regarding delay, security, complexity, and visual quality place strict demands on the technology of the video codec. In a large surveillance system, the digital network that enables remote monitoring, storage, control, and analysis is not contained within a single local area network (LAN).


It typically represents a collection of interconnected LANs, wired or wireless, with different bandwidths and QoS. Different types of clients connect to these networks, access one or multiple video sources, decode them at the temporal and spatial resolution they require, and provide different functions [88].

QoS is a fundamental concern in distributed IVS. In video-based surveillance, typical QoS parameters contain the frame rate, transfer delay, image resolution, and video-compression rate. The surveillance tasks might also provide multiple QoS levels. In addition, the offered QoS levels can change over time due to user instructions or modifications in the monitored environment. Novel IVS systems need to contain dedicated QoS management mechanisms [11].

1) Scalability in Testing: The testing of individual modules is called unit testing. Integration testing comprises rerunning the unit test cases after the system has been completely integrated. For feature testing, which is also called system testing, testers develop test cases based on the requirements of the system, choosing adequate test cases according to every expected result. Load testing comprises four subphases, which are 1) stability testing, 2) stress testing, 3) reliability testing, and 4) performance testing. Stability testing comprises the installation of the software in a field-like environment and the verification of its ability to appropriately handle data continuously. Stress testing comprises the verification of the ability of the software to handle heavy loads for short periods without crashing. Reliability testing comprises the verification that the software can fulfill reliability requirements. Performance testing comprises the verification that the software can achieve performance requirements [89].

A substantial pitfall in incorporating intelligent functions into real-world systems is the lack of robustness, the inability to test and validate these systems under a variety of use cases, and the lack of quantification of the performance of the system. Additionally, the system should degrade gracefully in performance as the complexity of the data grows. This is a very open research issue that is vital for the deployment of these systems [3].

Fig. 18. Schematic representation of sensor coverage in a large area [15].

G. Location Difficulties

Location techniques have numerous possible applications in wireless communication, surveillance, military equipment, tracking, and safety applications. Sagiraju et al. [56] concentrate on positioning in cellular wireless networks, but the results can be applied to other systems. In the GPS, code-modulated signals are transmitted by numerous satellites, which orbit the earth, and are received by GPS receivers to determine the current position. To calculate a position, the receiver must first acquire the satellite signals. Traditionally, GPS receivers have been designed with specific acquisition and tracking modes. After the signal has been acquired, the receiver switches to the tracking mode. If it loses the lock, the acquisition needs to be repeated [56].

The GPS system comprises at least 24 satellites in orbit around the world, with at least four satellites viewable from any point on Earth at a given time. Despite GPS being a sophisticated solution to the location discovery process, it has multiple network dilemmas. First, GPS is expensive both in terms of hardware and power requirements. Second, GPS requires line-of-sight between the receiver and the satellites; it does not function well when obstructions, such as buildings, block the direct "view" of the satellites. Locations can also be calculated by trilateration. For trilateration to be successful, a node needs to have at least three neighbors that are already aware of their positions [38]; a least-squares sketch is given below.

Security personnel review their wireless video systems for critical incident information. Complementary information in the form of maps and live video streaming can assist in locating the problematic zone and acting quickly with knowledge of the situation. The need to provide detailed real-time information to surveillance agents has been identified and is being addressed by the research community [10].

The analysis and fusion of different sensor information requires mapping observations to a common coordinate system to achieve situational awareness and scene comprehension. The availability of mapping capabilities enables critical operational tasks, such as the fusion of multiple target measurements across the network, the deduction of the relative size and speed of the target, and the assignment of tasks to pan-tilt-zoom (PTZ) and mobile sensors. This presents the need for an automated and efficient geo-registration mechanism for all sensors. For instance, target observations from multiple sensors may be mapped to a geodetic coordinate system and then displayed on a map-based interface. Fig. 19 illustrates an example of geo-registration in a visual sensor network [90].
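The trilateration mentioned above reduces to solving a small linear system: subtracting the range equation of one anchor from the others linearizes the unknown position. The sketch below uses three or more anchors and a least-squares solve; it is a textbook construction under ideal range measurements, not the positioning algorithm of [38] or [56].

```python
# Trilateration by linearization: for anchors (x_i, y_i) with measured ranges
# r_i, subtracting the first range equation from the rest yields a linear
# system A p = b in the unknown position p, solved in the least-squares sense.
import numpy as np

def trilaterate(anchors, ranges):
    """anchors: (n, 2) known positions, n >= 3; ranges: (n,) distances."""
    anchors = np.asarray(anchors, dtype=float)
    ranges = np.asarray(ranges, dtype=float)
    x0, y0 = anchors[0]
    A = 2.0 * (anchors[1:] - anchors[0])
    b = (ranges[0] ** 2 - ranges[1:] ** 2
         + np.sum(anchors[1:] ** 2, axis=1) - (x0 ** 2 + y0 ** 2))
    position, *_ = np.linalg.lstsq(A, b, rcond=None)
    return position

# A node at (3, 4) heard by anchors at the corners of a 10-m cell:
anchors = [(0, 0), (10, 0), (0, 10)]
ranges = [5.0, np.hypot(7, 4), np.hypot(3, 6)]
print(trilaterate(anchors, ranges))   # -> approximately [3. 4.]
```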


H. Challenges in Privacy

The surveillance of events poses ethical problems. For instance, when events involve humans, the right to monitor can conflict with the individual privacy rights of the monitored people. These privacy challenges depend heavily on the shared acceptance by the public of the surveillance task as a necessity for a given application [3].

The suitability of homeland security technology for this role is plagued by questions ranging from dependability to the risks that technologies, e.g., surveillance, profiling, and data aggregation, pose to privacy and civil liberties [1]. In many applications, surveillance data needs to be transmitted across open networks with multiuser access characteristics. Information protection on these networks is a crucial issue for upholding privacy in the surveillance service. The paternity of surveillance data can be extremely essential for efficient use in law enforcement. Legal requirements necessitate the development of watermarking and data-hiding techniques for secure sensor identity assessment [3].

Despite the relevance of contemporary surveillance systems, and their role in supporting human control, there is worldwide controversy about their utilization, connected with risks of privacy violations [57]. Advancements in sensor, communication, and storage capacities ease the large-scale collection of multimedia material. The value of this recorded data is only unlocked by technologies that can efficiently exploit the knowledge it contains. Regardless of the concerns over privacy issues, such capabilities are becoming more common in different environments, for example, in public transportation premises, cities, public buildings, and commercial establishments [91].

CCTV surveillance systems used in the field, with their centralized processing and recording architecture together with a simple multimonitor visualization of the crude video streams, have several disadvantages and restrictions. The most relevant dilemma is the complete lack of privacy. An automated and privacy-respecting surveillance system is a desirable goal. The latest video analysis systems emerging currently are based on centralized approaches that impose strict limitations on expandability and privacy [92].

To realize fusion for integrated situation awareness, Trivedi et al. [13] developed the networked sensor tapestry (NeST) framework for multilevel semantic integration. NeST ensures the tracked person's privacy by using a set of programmable plug-in privacy filters operating on incoming sensor data. The filters either inhibit access to the data or remove any personally identifiable information. Trivedi et al. [13] use privacy filters with a privacy grammar that can connect multiple low-level data filters and aspects to create data-dependent privacy definitions.
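A plug-in privacy filter chain in the spirit of NeST can be sketched as a pipeline through which every sensor record must pass before distribution; each filter may redact fields or drop the record entirely. The filters and record fields below are hypothetical illustrations, not the NeST API [13].

```python
# Plug-in privacy filter chain: each filter returns a (possibly redacted) record
# or None to block it. Records reach consumers only after passing every filter.
def strip_identity(record):
    """Remove personally identifiable fields before wider distribution."""
    return {k: v for k, v in record.items() if k not in ("face_crop", "name")}

def block_private_zone(record):
    """Suppress observations made inside a declared privacy zone entirely."""
    return None if record.get("zone") == "restroom_corridor" else record

def apply_filters(record, filters):
    for f in filters:
        record = f(record)
        if record is None:          # a filter inhibited access to this record
            return None
    return record

chain = [block_private_zone, strip_identity]
obs = {"zone": "lobby", "name": "J. Doe", "face_crop": b"...", "bbox": (4, 8)}
print(apply_filters(obs, chain))    # -> {'zone': 'lobby', 'bbox': (4, 8)}
```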

Fig. 19. Fields of view of four cameras at a port [90].

VIII. GROWING TECHNOLOGIES AND TRENDS

There are novel technologies and trends that have begun, or are beginning, to establish themselves; Kankanhalli and Rui [93] have indicated numerous ones. Prati et al. [94] introduced a multisensor surveillance system containing video cameras and passive infrared (PIR) sensors. Calderara et al. [95] state that visual sensors will continue to be the dominant sensors, but that they will be complemented with other appropriate sensors. Atrey et al. [96] claim that contemporary systems are constructed for specific physical environments with specific sensor types and sensor deployments. While this is efficient, it lacks the portability required for widespread deployment. The system architecture should be capable of using the sensors and resources to address the needs of the environment.

With the increasing variety and decreasing expense of miscellaneous types of sensors, there will be an increase in the usage of radically differentiated media, such as infrared, motion sensor information, text in diverse formats, optical sensor data, biological and satellite telemetric data, and location data obtained by GPS devices. Other developments are mobile sensors, such as moving cameras mounted on vehicles, e.g., public buses. Humans are also mobile sensors recording information in different media types, such as blogs. It would be beneficial to enhance the environment with suitable sensors to reduce the sensor and semantic omissions [93].

Accompanying the increased popularity of portable security applications, it is more important that the surveillance system has low power consumption, simple functionality, and compact size. This includes the integration of the miscellaneous functional blocks and a motion detection sensor (MDS) into a single chip [97].

The process of extracting and tracking human figures in image sequences is vital for video surveillance and video-indexing applications. A useful and popular approach is based on silhouette analysis with spatiotemporal representation, in which the goal is to achieve an invariant representation of the detected object. Symmetries of the silhouette can be used as a gait parameter for the identification of a person [98].

Biometrics has been vastly applied to secure surveillance, access control, and personal identification with high security. With the rise of pervasive and personal computation, cell phones and PDAs will become a major communication and computation platform for individuals and suitable organizations. Even though biometrics has been an appropriate method for attaching a physical identity to a digital correspondence, a flexible biometric system that can accommodate real-world applications in a secure manner is still a substantial challenge [99].

The next-generation video surveillance system will be a networked, intelligent, multicamera cooperative system with integrated situation awareness of complicated and dynamic scenes. It will be applicable to urban centers or indoor complexes. The essence of such a system is increasingly intelligent and robust video analysis that is capable of taking the videos from low-level image appearance and feature extraction, to middle-level object or event detection, and finally to high-level reasoning and scene comprehension. Significant steps have been taken in examining these issues by research laboratories in the last decade.


Currently, the focus is on the application of these integrated systems and the supply of automated solutions to realistic surveillance dilemmas [100].

There has been a dramatic progression in sensing for security applications and in the analysis and processing of sensor data. O'Sullivan and Pless [101] concentrate on two broad applications of sensors for security, which are 1) anomaly detection, and 2) object or pattern recognition [101]. In anomaly detection, the difficulty is to detect activity, behavior, objects, or substances that are atypical. "Typical" is defined with respect to historical data and is extremely scenario dependent. Algorithms for anomaly detection must adjust to the scenario and be robust to a vast range of possible assumptions. As a result, there is typically no model for an anomaly, and the models for location and time are derived from observations. Scenarios that need anomaly detection include perimeter, border, or gateway surveillance [101]. In object or pattern recognition, there is typically a model or prior information about the object or pattern, and the intention is to categorize the pattern. The level of categorization, the required system robustness, and the required system efficiency define and restrict the possible models and processing. The usage of biometrics for the recognition of people is a prime example of an application that is evolving rapidly [101].
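Since "typical" is defined with respect to historical data, the simplest concrete instance of the anomaly-detection setting above is a running statistical model of an observed quantity with a deviation threshold. The sketch below flags observations far from the history; it is a toy baseline, not a method from [101], and the threshold is an assumption.

```python
# Toy anomaly detector: "typical" is learned from historical observations, and
# a new observation is anomalous when it deviates strongly from that history.
import statistics

class AnomalyDetector:
    def __init__(self, history, z_threshold=3.0):   # threshold is an assumption
        self.mean = statistics.fmean(history)
        self.std = statistics.stdev(history)
        self.z_threshold = z_threshold

    def is_anomalous(self, x):
        return abs(x - self.mean) > self.z_threshold * self.std

# History: people counted per minute at a gateway during normal operation.
det = AnomalyDetector([4, 6, 5, 7, 5, 6, 4, 5, 6, 5])
print(det.is_anomalous(6))    # False: within normal variation
print(det.is_anomalous(42))   # True: crowd surge or a sensor fault
```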

Gupta et al. propose a leader-follower system, which receives multimodal sensor information from a wide array of sensors, including radars and cameras. In such a system, a fixed wide-field-of-view (FOV) sensor conducts the duties of the leader. The leader directs follower PTZ cameras to zoom in on targets of interest. One of the typical difficulties in a leader-follower system is that the follower camera can only follow the target while it remains in the FOV of the leader. Additionally, inaccuracies in the leader-follower calibration may result in imprecise zooming operations [102].

In general, there is plenty of prototypical research that has transformed into practical solutions. Environments with multiple sensors include solutions in which electronic locks and user identification have been incorporated into doors, both of which can be perceived as individual sensors. The electronic lock indicates its own status, and the user identification device denotes the access rights of the user. This also forms a simple realization of distributed intelligence and awareness, in which each sensor acts independently but a higher level of deduction can be performed based on the individual information of each sensor. Video surveillance has been employed in solutions such as the detection of the direction of movement; airports have utilized this technology to automatically raise alarms when a person goes through a passage in the wrong direction. Audio surveillance technology has been combined with video camera solutions that direct the cameras to the location of alarming sounds. Within various police forces, mobile robots have been used to remotely survey potentially hazardous environments and transmit a video feed to the user. Wireless sensor networks can be used to indicate the locations of nomadic guards to the control room within an indoor perimeter. All of these solutions have their own appropriate middleware and architecture, which serve their unique properties and purposes.

There are several major companies that deliver surveillance systems. GE Security offers integrated security management, intrusion and property protection, and video surveillance [103]. ObjectVideo provides intelligent video software for security, public safety, and other applications [104]. IOImage provides video surveillance, real-time detection, and alert and tracking services [105]. RemoteReality offers video surveillance services, including the detection and tracking of objects, in both the visible and infrared thermal spectra [106]. Point Grey Research offers digital camera technology for machine vision and computer vision applications [107].

IX. CONCLUSION

This paper presented the contemporary state of modern surveillance systems for public safety, with a special emphasis on the 3GSSs and especially the difficulties of present surveillance systems. The paper briefly reviewed the background and progression of surveillance systems, including a short review of the first and second generations of surveillance systems. The third generation of surveillance systems addresses topics such as multisensor environments, video surveillance, audio surveillance, wireless sensor networks, distributed intelligence and awareness, and architecture and middleware. According to modern science, the current difficulties of surveillance systems for public safety reside in the attainment of real-time distributed architecture, awareness and intelligence, existing difficulties in video surveillance, the utilization of wireless networks, the energy efficiency of remote sensors, location difficulties of surveillance personnel, and scalability difficulties. A portion of the difficulties are the same as declared for the 3GSSs, but with detailed descriptions of the characteristics of the dilemmas, such as the architectural, visual, and awareness aspects. Other difficulties are completely novel or substantially highlighted, such as surveillance personnel location, the application of wireless networks, energy efficiency, and scalability.

Novel sensors and new requirements will accompany surveillance systems. This places demanding challenges on the architecture and its real-time functionality. There are existing fundamental concepts, such as video and audio surveillance, but there is a lack of their intelligent usage and especially of their seamless interoperability through a united real-time architecture. Contemporary surveillance systems still reside in a state in which individual concepts may achieve functionality in specific cases, but their comprehensive on-site interoperability is yet to be reached. Substantial evidence of a distributed multisensor intelligent surveillance system does not exist. As the size of surveyed complexes and buildings grows, the deployment of wireless sensors and their energy consumption become more notable. Wireless sensors are easy to deploy, and their low-energy consumption is constantly improving. Scalability issues are fundamentally related to the magnitude of the areas under surveillance. Areas that require surveillance are growing, and the complexity of surveillance systems is also expanding; both pose great challenges to the scalability aspect.


Different sensors provide different information, and their exploitation in intelligent tasks remains a challenge. Sensor data should be decomposed into fundamental blocks, and the intelligent components should have the responsibility of composing the deductions from them. An attempt should be made to construct a multisensor distributed intelligent surveillance system that functions at a relatively high level, capturing alerting situations with a very low false-alarm rate. The surveillance personnel are one of the strongest aspects of a surveillance system and should be retained in it. Despite advancements in intelligence and awareness, the human being will remain the forerunner in adaptability and deduction.

The endless demand for, and abundance of, surveillance systems for public safety leaves multiple issues that still require resolution. Extensive intelligence and automation, accompanied by energy efficiency and scalability in large areas, must be adopted by suppliers to establish surveillance systems for civic and communal public safety.

REFERENCES
[1] M. Reiter and P. Rohatgi, “Homeland security guest editors' introduction,” IEEE Internet Comput., vol. 8, no. 6, pp. 16–17, Nov./Dec. 2004, doi: 10.1109/MIC.2004.62.
[2] M. Valera and S. A. Velastin, “Intelligent distributed surveillance systems: A review,” IEE Proc.-Vis. Image Signal Process., vol. 152, no. 2, pp. 192–204, Apr. 2005, doi: 10.1049/ip-vis:20041147.
[3] C. S. Regazzoni, V. Ramesh, and G. L. Foresti, “Scanning the issue/technology special issue on video communications, processing, and understanding for third generation surveillance systems,” Proc. IEEE, vol. 89, no. 10, pp. 1355–1367, Oct. 2001, doi: 10.1109/5.959335.
[4] A. C. M. Fong and S. C. Hui, “Web-based intelligent surveillance system for detection of criminal activities,” Comput. Control Eng. J., vol. 12, no. 6, pp. 263–270, Dec. 2001.
[5] K. Müller, A. Smolic, M. Dröse, P. Voigt, and T. Wiegand, “3-D reconstruction of a dynamic environment with a fully calibrated background for traffic scenes,” IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 4, pp. 538–549, Apr. 2005, doi: 10.1109/TCSVT.2005.844452.
[6] W. M. Thames, “From eye to electron—Management problems of the combat surveillance research and development field,” IRE Trans. Mil. Electron., vol. MIL-4, no. 4, pp. 548–551, Oct. 1960, doi: 10.1109/IRET-MIL.1960.5008288.
[7] H. A. Nye, “The problem of combat surveillance,” IRE Trans. Mil. Electron., vol. MIL-4, no. 4, pp. 551–555, Oct. 1960, doi: 10.1109/IRET-MIL.1960.5008289.
[8] A. S. White, “Application of signal corps radar to combat surveillance,” IRE Trans. Mil. Electron., vol. MIL-4, no. 4, pp. 561–565, Oct. 1960, doi: 10.1109/IRET-MIL.1960.5008291.
[9] C. E. Wolfe, “Information system displays for aerospace surveillance applications,” IEEE Trans. Aerosp., vol. AS-2, no. 2, pp. 204–210, Apr. 1964, doi: 10.1109/TA.1964.4319590.
[10] R. Ott, M. Gutierrez, D. Thalmann, and F. Vexo, “Advanced virtual reality technologies for surveillance and security applications,” in Proc. ACM SIGGRAPH Int. Conf. Virtual Real. Continuum Its Appl. (VRCIA), Jun. 2006, pp. 163–170.
[11] M. Bramberger, A. Doblander, A. Maier, B. Rinner, and H. Schwabach, “Distributed embedded smart cameras for surveillance applications,” Computer, vol. 39, no. 2, pp. 68–75, Feb. 2006, doi: 10.1109/MC.2006.55.
[12] R. T. Collins, A. J. Lipton, H. Fujiyoshi, and T. Kanade, “Algorithms for cooperative multisensor surveillance,” Proc. IEEE, vol. 89, no. 10, pp. 1456–1477, Oct. 2001, doi: 10.1109/5.959341.
[13] M. M. Trivedi, T. L. Gandhi, and K. S. Huang, “Homeland security: Distributed interactive video arrays for event capture and enhanced situational awareness,” IEEE Intell. Syst., vol. 20, no. 5, pp. 58–66, Sep./Oct. 2005, doi: 10.1109/MIS.2005.86.
[14] F. Castanedo, M. A. Patricio, J. Garcia, and J. M. Molina, “Extending surveillance systems capabilities using BDI cooperative sensor agents,” in Proc. 4th ACM Int. Workshop Video Surveill. Sens. Netw. (VSSN), Oct. 2006, pp. 131–138.
[15] S. A. Velastin, B. A. Boghossian, B. P. L. Lo, J. Sun, and M. A. Vicencio-Silva, “PRISMATICA: Toward ambient intelligence in public transport environments,” IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 35, no. 1, pp. 164–182, Jan. 2005, doi: 10.1109/TSMCA.2004.838461.
[16] Z. Rasheed, X. Cao, K. Shafique, H. Liu, L. Yu, M. Lee, K. Ramnath, T. Choe, O. Javed, and N. Haering, “Automated visual analysis in large scale sensor networks,” in Proc. 2nd ACM/IEEE Int. Conf. Distrib. Smart Cameras (ICDSC), Sep. 2008, pp. 1–10, doi: 10.1109/ICDSC.2008.4635678.
[17] P. K. Atrey and A. El Saddik, “Confidence evolution in multimedia systems,” IEEE Trans. Multimedia, vol. 10, no. 7, pp. 1288–1298, Nov. 2008, doi: 10.1109/TMM.2008.2004907.
[18] I. N. Junejo, X. Cao, and H. Foroosh, “Autoconfiguration of a dynamic nonoverlapping camera network,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 37, no. 4, pp. 803–816, Aug. 2007, doi: 10.1109/TSMCB.2007.895366.
[19] D. Makris and T. Ellis, “Learning semantic scene models from observing activity in visual surveillance,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 35, no. 3, pp. 397–408, Jun. 2005.
[20] W. Hu, T. Tan, L. Wang, and S. Maybank, “A survey on visual surveillance of object motion and behaviors,” IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 34, no. 3, pp. 334–352, Aug. 2004, doi: 10.1109/TSMCC.2004.829274.
[21] C. Kreucher, K. Kastella, and A. O. Hero, III, “Multitarget tracking using the joint multitarget probability density,” IEEE Trans. Aerosp. Electron. Syst., vol. 41, no. 4, pp. 1396–1414, Oct. 2005, doi: 10.1109/TAES.2005.1561892.
[22] M. Shah, O. Javed, and K. Shafique, “Automated visual surveillance in realistic scenarios,” IEEE Multimedia, vol. 14, no. 1, pp. 30–39, Jan.–Mar. 2007, doi: 10.1109/MMUL.2007.3.
[23] G. L. Foresti, C. Micheloni, L. Snidaro, P. Remagnino, and T. Ellis, “Active video-based surveillance system,” IEEE Signal Process. Mag., vol. 22, no. 2, pp. 25–37, Mar. 2005, doi: 10.1109/MSP.2005.1406473.
[24] L. Li, W. Huang, I. Y.-H. Gu, R. Luo, and Q. Tian, “An efficient sequential approach to tracking multiple objects through crowds for real-time intelligent CCTV systems,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 38, no. 5, pp. 1254–1269, Oct. 2008, doi: 10.1109/TSMCB.2008.927265.
[25] L. Maddalena and A. Petrosino, “A self-organizing approach to background subtraction for visual surveillance applications,” IEEE Trans. Image Process., vol. 17, no. 7, pp. 1168–1177, Jul. 2008, doi: 10.1109/TIP.2008.924285.
[26] Y. Li, C. Huang, and R. Nevatia, “Learning to associate: Hybrid boosted multi-target tracker for crowded scene,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2009, pp. 2953–2960, doi: 10.1109/CVPRW.2009.5206735.
[27] A. Leykin, Y. Ran, and R. Hammoud, “Thermal-visible video fusion for moving target tracking and pedestrian classification,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2007, pp. 1–8, doi: 10.1109/CVPR.2007.383444.
[28] A. Leykin and R. Hammoud, “Robust multi-pedestrian tracking in thermal-visible surveillance videos,” in Proc. Conf. Comput. Vis. Pattern Recognit. Workshop (CVPRW), Jun. 2006, pp. 136–143, doi: 10.1109/CVPRW.2006.175.
[29] W. K. Wong, P. N. Tan, C. K. Loo, and W. S. Lim, “An effective surveillance system using thermal camera,” in Proc. Int. Conf. Signal Acquis. Process. (ICSAP), Apr. 2009, pp. 13–17, doi: 10.1109/ICSAP.2009.12.
[30] D. Istrate, E. Castelli, M. Vacher, L. Besacier, and J. F. Serignat, “Information extraction from sound for medical telemonitoring,” IEEE Trans. Inf. Technol. Biomed., vol. 10, no. 2, pp. 264–274, Apr. 2006, doi: 10.1109/TITB.2005.859889.
[31] M. Stanacevic and G. Cauwenberghs, “Micropower gradient flow acoustic localizer,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 52, no. 10, pp. 2148–2157, Oct. 2005, doi: 10.1109/TCSI.2005.853356.
[32] P. Julian, A. G. Andreou, L. Riddle, S. Shamma, D. H. Goldberg, and G. Cauwenberghs, “A comparative study of sound localization algorithms for energy aware sensor network nodes,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 51, no. 4, pp. 640–648, Apr. 2004, doi: 10.1109/TCSI.2004.826205.
[33] A. F. Smeaton and M. McHugh, “Towards event detection in an audio-based sensor network,” in Proc. 3rd ACM Int. Workshop Video Surveill. Sens. Netw. (VSSN), Nov. 2005, pp. 87–94.
[34] J. Chen, Z. Safar, and J. A. Sorensen, “Multimodal wireless networks: Communication and surveillance on the same infrastructure,” IEEE Trans. Inf. Forensics Secur., vol. 2, no. 3, pp. 468–484, Sep. 2007, doi: 10.1109/TIFS.2007.904944.
[35] G. Xing, C. Lu, R. Pless, and Q. Huang, “Impact of sensing coverage on greedy geographic routing algorithms,” IEEE Trans. Parallel Distrib. Syst., vol. 17, no. 4, pp. 348–360, Apr. 2006, doi: 10.1109/TPDS.2006.48.
[36] R. R. Brooks, P. Ramanathan, and A. M. Sayeed, “Distributed target classification and tracking in sensor networks,” Proc. IEEE, vol. 91, no. 8, pp. 1163–1171, Aug. 2003, doi: 10.1109/JPROC.2003.814923.
[37] A. M. Tabar, A. Keshavarz, and H. Aghajan, “Smart home care network using sensor fusion and distributed vision-based reasoning,” in Proc. 4th ACM Int. Workshop Video Surveill. Sens. Netw. (VSSN), Oct. 2006, pp. 145–154.
[38] S. Megerian, F. Koushanfar, M. Potkonjak, and M. B. Srivastava, “Worst and best-case coverage in sensor networks,” IEEE Trans. Mobile Comput., vol. 4, no. 1, pp. 84–92, Jan./Feb. 2005, doi: 10.1109/TMC.2005.15.
[39] V. Chandramohan and K. Christensen, “A first look at wired sensor networks for video surveillance systems,” in Proc. 27th Annu. IEEE Conf. Local Comput. Netw. (LCN), Nov. 2002, pp. 728–729.
[40] Z. Dimitrijevic, G. Wu, and E. Y. Chang, “SFINX: A multi-sensor fusion and mining system,” in Proc. 2003 Joint Conf. Fourth Int. Conf. Inf., Commun. Signal Process., Dec. 2003, vol. 2, pp. 1128–1132, doi: 10.1109/ICICS.2003.1292636.
[41] A. Hampapur, L. Brown, J. Connell, A. Ekin, N. Haas, M. Lu, H. Merkl, S. Pankanti, A. Senior, C.-F. Shu, and Y. L. Tian, “Smart video surveillance: Exploring the concept of multiscale spatiotemporal tracking,” IEEE Signal Process. Mag., vol. 22, no. 2, pp. 38–51, Mar. 2005, doi: 10.1109/MSP.2005.1406476.
[42] S. Bandini and F. Sartori, “Improving the effectiveness of monitoring and control systems exploiting knowledge-based approaches,” Pers. Ubiquitous Comput., vol. 9, no. 5, pp. 301–311, Sep. 2005, doi: 10.1007/s00779-004-0334-3.
[43] H. Detmold, A. Dick, K. Falkner, D. S. Munro, A. Van Den Hengel, and P. Morrison, “Middleware for video surveillance networks,” in Proc. 1st Int. Workshop Middleware Sens. Netw. (MidSens), Nov.–Dec. 2006, pp. 31–36.
[44] R. Seals, “Mobile robotics,” Electron. Power, vol. 30, no. 7, pp. 543–546, Jul. 1984, doi: 10.1049/ep.1984.0286.
[45] S. Harmon, “The ground surveillance robot (GSR): An autonomous vehicle designed to transit unknown terrain,” IEEE J. Robot. Autom., vol. RA-3, no. 3, pp. 266–279, Jun. 1987, doi: 10.1109/JRA.1987.1087091.
[46] S. Harmon, G. Bianchini, and B. Pinz, “Sensor data fusion through a distributed blackboard,” in Proc. IEEE Int. Conf. Robot. Autom., Apr. 1986, pp. 1449–1454.
[47] J. White, H. Harvey, and K. Farnstrom, “Testing of mobile surveillance robot at a nuclear power plant,” in Proc. IEEE Int. Conf. Robot. Autom., Mar. 1987, pp. 714–719.
[48] D. Di Paola, D. Naso, A. Milella, G. Cicirelli, and A. Distante, “Multisensor surveillance of indoor environments by an autonomous mobile robot,” in Proc. 15th Int. Conf. Mechatronics Mach. Vis. Pract. (M2VIP), Dec. 2008, pp. 23–28, doi: 10.1109/MMVIP.2008.474501.
[49] A. Bakhtari, M. D. Naish, M. Eskandari, E. A. Croft, and B. Benhabib, “Active-vision-based multisensor surveillance—An implementation,” IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 36, no. 5, pp. 668–680, Sep. 2006, doi: 10.1109/TSMCC.2005.855525.
[50] J. J. Valencia-Jimenez and A. Fernandez-Caballero, “Holonic multi-agent systems to integrate multi-sensor platforms in complex surveillance,” in Proc. IEEE Int. Conf. Video Signal Based Surveill. (AVSS), Nov. 2006, p. 49, doi: 10.1109/AVSS.2006.58.
[51] Y.-C. Tseng, Y.-C. Wang, K.-Y. Cheng, and Y.-Y. Hsieh, “iMouse: An integrated mobile surveillance and wireless sensor system,” Computer, vol. 40, no. 6, pp. 60–66, Jun. 2007, doi: 10.1109/MC.2007.211.
[52] J. N. K. Liu, M. Wang, and B. Feng, “iBotGuard: An internet-based intelligent robot security system using invariant face recognition against intruder,” IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 35, no. 1, pp. 97–105, Feb. 2005, doi: 10.1109/TSMCC.2004.840051.
[53] H. Liu, O. Javed, G. Taylor, X. Cao, and N. Haering, “Omni-directional surveillance for unmanned water vehicles,” presented at the 8th Int. Workshop Vis. Surveill., Marseille, France, Oct. 2008.
[54] I. Pavlidis, V. Morellas, P. Tsiamyrtzis, and S. Harp, “Urban surveillance systems: From the laboratory to the commercial world,” Proc. IEEE, vol. 89, no. 10, pp. 1478–1497, Oct. 2001, doi: 10.1109/5.959342.
[55] J. Krikke, “Intelligent surveillance empowers security analysts,” IEEE Intell. Syst., vol. 21, no. 3, pp. 102–104, May/Jun. 2006.
[56] P. K. Sagiraju, S. Agaian, and D. Akopian, “Reduced complexity acquisition of GPS signals for software embedded applications,” IEE Proc.-Radar Sonar Navig., vol. 153, no. 1, pp. 69–78, Feb. 2006, doi: 10.1049/ip-rsn:20050091.
[57] R. Cucchiara, “Multimedia surveillance systems,” in Proc. 3rd ACM Int. Workshop Video Surveill. Sens. Netw. (VSSN), Nov. 2005, pp. 3–10.
[58] M. Greiffenhagen, D. Comaniciu, H. Niemann, and V. Ramesh, “Design, analysis, and engineering of video monitoring systems: An approach and a case study,” Proc. IEEE, vol. 89, no. 10, pp. 1498–1517, Oct. 2001, doi: 10.1109/5.959343.
[59] M. Valera and S. A. Velastin, “Real-time architecture for a large distributed surveillance system,” in Proc. IEE Intell. Surveill. Syst., London, U.K., Feb. 2004, pp. 41–45.
[60] C. Micheloni, L. Snidaro, L. Visentini, and G. L. Foresti, “Sensor bandwidth assignment through video annotation,” in Proc. IEEE Int. Conf. Video Signal Based Surveill. (AVSS), Nov. 2006, p. 48, doi: 10.1109/AVSS.2006.102.
[61] R. Bowden and P. KaewTraKulPong, “Towards automated wide area visual surveillance: Tracking objects between spatially-separated, uncalibrated views,” IEE Proc.-Vis. Image Signal Process., vol. 152, no. 2, pp. 213–223, Apr. 2005, doi: 10.1049/ip-vis:20041233.
[62] C. Micheloni, G. L. Foresti, and L. Snidaro, “A network of co-operative cameras for visual surveillance,” IEE Proc.-Vis. Image Signal Process., vol. 152, no. 2, pp. 205–212, Apr. 2005, doi: 10.1049/ip-vis:20041256.
[63] M. Albanese, R. Chellappa, V. Moscato, A. Picariello, V. S. Subrahmanian, P. Turaga, and O. Udrea, “A constrained probabilistic Petri net framework for human activity detection in video,” IEEE Trans. Multimedia, vol. 10, no. 8, pp. 1429–1443, Dec. 2008, doi: 10.1109/TMM.2008.2010417.
[64] Y. Li, H. Ai, T. Yamashita, S. Lao, and M. Kawade, “Tracking in low frame rate video: A cascade particle filter with discriminative observers of different life spans,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, no. 10, pp. 1728–1740, Oct. 2008, doi: 10.1109/TPAMI.2008.73.
[65] R. Cucchiara, C. Grana, A. Prati, and R. Vezzani, “Computer vision system for in-house video surveillance,” IEE Proc.-Vis. Image Signal Process., vol. 152, no. 2, pp. 242–249, Apr. 2005, doi: 10.1049/ip-vis:20041215.
[66] J. A. Besada, J. Garcia, J. Portillo, J. M. Molina, A. Varona, and G. Gonzalez, “Airport surface surveillance based on video images,” IEEE Trans. Aerosp. Electron. Syst., vol. 41, no. 3, pp. 1075–1082, Jul. 2005, doi: 10.1109/TAES.2005.1541452.
[67] S. M. Khan and M. Shah, “Tracking multiple occluding people by localizing on multiple scene planes,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 3, pp. 505–519, Mar. 2009, doi: 10.1109/TPAMI.2008.102.
[68] W. Hu, M. Hu, X. Zhou, T. Tan, J. Lou, and S. Maybank, “Principal axis-based correspondence between multiple cameras for people tracking,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 4, pp. 663–671, Apr. 2006, doi: 10.1109/TPAMI.2006.80.
[69] D.-Y. Chen, K. Cannons, H.-R. Tyan, S.-W. Shih, and H.-Y. M. Liao, “Spatiotemporal motion analysis for the detection and classification of moving targets,” IEEE Trans. Multimedia, vol. 10, no. 8, pp. 1578–1591, Dec. 2008, doi: 10.1109/TMM.2008.2007289.
[70] F. Yin, D. Makris, and S. A. Velastin, “Time efficient ghost removal for motion detection in visual surveillance systems,” Electron. Lett., vol. 44, no. 23, pp. 1351–1353, Nov. 2008, doi: 10.1049/el:20082118.
[71] Y. Wang, D. Bowman, D. Krum, E. Coelho, T. Smith-Jackson, D. Bailey, S. Peck, S. Anand, T. Kennedy, and Y. Abdrazakov, “Effects of video placement and spatial context presentation on path reconstruction tasks with contextualized videos,” IEEE Trans. Vis. Comput. Graph., vol. 14, no. 6, pp. 1755–1762, Nov./Dec. 2008, doi: 10.1109/TVCG.2008.126.
[72] W. Hu, D. Xie, Z. Fu, W. Zeng, and S. Maybank, “Semantic-based surveillance video retrieval,” IEEE Trans. Image Process., vol. 16, no. 4, pp. 1168–1181, Apr. 2007, doi: 10.1109/TIP.2006.891352.
[73] L. Snidaro, R. Niu, G. L. Foresti, and P. K. Varshney, “Quality-based fusion of multiple video sensors for video surveillance,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 37, no. 4, pp. 1044–1051, Aug. 2007, doi: 10.1109/TSMCB.2007.895331.
[74] P. K. Atrey, M. S. Kankanhalli, and R. Jain, “Timeline-based information assimilation in multimedia surveillance and monitoring systems,” in Proc. 3rd ACM Int. Workshop Video Surveill. Sens. Netw. (VSSN), Nov. 2005, pp. 103–112.
[75] B. Hardian, “Middleware support for transparency and user control in context-aware systems,” presented at the 3rd Int. Middleware Doctoral Symp. (MDS), Melbourne, Australia, Nov.–Dec. 2006.
[76] A. Dore, M. Pinasco, and C. S. Regazzoni, “A bio-inspired learning approach for the classification of risk zones in a smart space,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2007, pp. 1–8, doi: 10.1109/CVPR.2007.383440.
[77] E. Blasch and S. Plano, “Proactive decision fusion for site security,” in Proc. 8th Int. Conf. Inf. Fusion, Jul. 2005, pp. 1584–1591, doi: 10.1109/ICIF.2005.1592044.
[78] F. Castanedo, M. A. Patricio, J. Garcia, and J. M. Molina, “Robust data fusion in a visual sensor multi-agent architecture,” in Proc. 10th Int. Conf. Inf. Fusion, Jul. 2007, pp. 1–7, doi: 10.1109/ICIF.2007.4408121.
[79] Y.-C. Tseng, T.-Y. Lin, Y.-K. Liu, and B.-R. Lin, “Event-driven messaging services over integrated cellular and wireless sensor networks: Prototyping experiences of a visitor system,” IEEE J. Sel. Areas Commun., vol. 23, no. 6, pp. 1133–1145, Jun. 2005, doi: 10.1109/JSAC.2005.845623.
[80] J.-S. Lee, “A Petri net design of command filters for semiautonomous mobile sensor networks,” IEEE Trans. Ind. Electron., vol. 55, no. 4, pp. 1835–1841, Apr. 2008, doi: 10.1109/TIE.2007.911926.
[81] E. Norouznezhad, A. Bigdeli, A. Postula, and B. C. Lovell, “A high resolution smart camera with GigE Vision extension for surveillance applications,” in Proc. 2nd ACM/IEEE Int. Conf. Distrib. Smart Cameras (ICDSC), Sep. 2008, pp. 1–8, doi: 10.1109/ICDSC.2008.4635711.
[82] S. Appadwedula, V. V. Veeravalli, and D. L. Jones, “Energy-efficient detection in sensor networks,” IEEE J. Sel. Areas Commun., vol. 23, no. 4, pp. 693–702, Apr. 2005, doi: 10.1109/JSAC.2005.843536.
[83] A. Maier, B. Rinner, W. Schriebl, and H. Schwabach, “Online multi-criterion optimization for dynamic power-aware camera configuration in distributed embedded surveillance clusters,” in Proc. 20th Int. Conf. Adv. Inf. Netw. Appl. (AINA), Apr. 2006, pp. 307–312, doi: 10.1109/AINA.2006.250.
[84] H. Liu, X. Jia, P.-J. Wan, C.-W. Yi, S.-K. Makki, and N. Pissinou, “Maximizing lifetime of sensor surveillance systems,” IEEE/ACM Trans. Netw., vol. 15, no. 2, pp. 334–345, Apr. 2007, doi: 10.1109/TNET.2007.892883.
[85] Y. Imai, Y. Hori, and S. Masuda, “Development and a brief evaluation of a web-based surveillance system for cellular phones and other mobile computing clients,” in Proc. Conf. Hum. Syst. Interact. (HSI), May 2008, pp. 526–531, doi: 10.1109/HSI.2008.4581494.
[86] V. A. Petrushin, O. Shakil, D. Roqueiro, G. Wei, and A. V. Gershman, “Multiple-sensor indoor surveillance system,” in Proc. 3rd Can. Conf. Comput. Robot Vis. (CRV), Jun. 2006, p. 40, doi: 10.1109/CRV.2006.50.
[87] P. Korshunov and W. T. Ooi, “Critical video quality for distributed automated video surveillance,” in Proc. 13th Annu. ACM Int. Conf. Multimedia, Nov. 2005, pp. 151–160.
[88] A. May, J. Teh, P. Hobson, F. Ziliani, and J. Reichel, “Scalable video requirements for surveillance systems,” in Proc. IEE Intell. Surveill. Syst., Feb. 2004, pp. 17–20.
[89] A. Avritzer, J. P. Ros, and E. Weyuker, “Reliability testing of rule-based systems,” IEEE Softw., vol. 13, no. 5, pp. 76–82, Sep. 1996, doi: 10.1109/52.536461.
[90] K. Shafique, F. Guo, G. Aggarwal, Z. Rasheed, X. Cao, and N. Haering, “Automatic geo-registration and inter-sensor calibration in large sensor networks,” in Smart Cameras. New York: Springer-Verlag, 2009, pp. 245–257.
[91] C. Carincotte, X. Desurmont, B. Ravera, F. Bremond, J. Orwell, S. A. Velastin, J. M. Odobez, B. Corbucci, J. Palo, and J. Cernocky, “Toward generic intelligent knowledge extraction from video and audio: The EU-funded CARETAKER project,” in Proc. Inst. Eng. Technol. Conf. Crime Secur., Jun. 2006, pp. 470–475.
[92] S. Fleck and W. Strasser, “Smart camera based monitoring system and its application to assisted living,” Proc. IEEE, vol. 96, no. 10, pp. 1698–1714, Oct. 2008, doi: 10.1109/JPROC.2008.928765.
[93] M. S. Kankanhalli and Y. Rui, “Application potential of multimedia information retrieval,” Proc. IEEE, vol. 96, no. 4, pp. 712–720, Apr. 2008, doi: 10.1109/JPROC.2008.916383.
[94] A. Prati, R. Vezzani, L. Benini, E. Farella, and P. Zappi, “An integrated multi-modal sensor network for video surveillance,” in Proc. 3rd ACM Int. Workshop Video Surveill. Sens. Netw. (VSSN), Nov. 2005, pp. 95–102.
[95] S. Calderara, R. Cucchiara, and A. Prati, “Multimedia surveillance: Content-based retrieval with multicamera people tracking,” in Proc. 4th ACM Int. Workshop Video Surveill. Sens. Netw. (VSSN), Oct. 2006, pp. 95–100.
[96] P. K. Atrey, M. S. Kankanhalli, and R. Jain, “Information assimilation framework for event detection in multimedia surveillance systems,” ACM Multimedia Syst. J., vol. 12, no. 3, pp. 239–253, Dec. 2006.
[97] J. Kim, J. Park, K. Lee, K.-H. Baek, and S. Kim, “A portable surveillance camera architecture using one-bit motion detection,” IEEE Trans. Consum. Electron., vol. 53, no. 4, pp. 1254–1259, Nov. 2007, doi: 10.1109/TCE.2007.4429209.
[98] L. Havasi, Z. Szlavik, and T. Sziranyi, “Detection of gait characteristics for scene registration in video surveillance system,” IEEE Trans. Image Process., vol. 16, no. 2, pp. 503–510, Feb. 2007, doi: 10.1109/TIP.2006.88839.
[99] Y. Huang, X. Ao, Y. Li, and C. Wang, “Multiple biometrics system based on DaVinci platform,” in Proc. Int. Symp. Inf. Sci. Eng. (ISISE), Dec. 2008, pp. 88–92, doi: 10.1109/ISISE.2008.163.
[100] L.-Q. Xu, “Issues in video analytics and surveillance systems: Research/prototyping vs. applications/user requirements,” in Proc. IEEE Conf. Adv. Video Signal Based Surveill. (AVSS), Sep. 2007, pp. 10–14, doi: 10.1109/AVSS.2007.4425278.
[101] J. A. O'Sullivan and R. Pless, “Advances in security technologies: Imaging, anomaly detection, and target and biometric recognition,” in Proc. IEEE/MTT-S Int. Microw. Symp., Jun. 2007, pp. 761–764, doi: 10.1109/MWSYM.2007.380051.
[102] H. Gupta, X. Cao, and N. Haering, “Map-based active leader-follower surveillance system,” presented at the Workshop Multi-Camera Multi-Modal Sens. Fusion Algorithms Appl. (M2SFA2), Marseille, France, Oct. 2008.
[103] GE Security website. (2009). [Online]. Available: http://www.gesecurity.com/portal/site/GESecurity
[104] ObjectVideo website. (2009). [Online]. Available: http://www.objectvideo.com/company/
[105] IOImage website. (2009). [Online]. Available: http://www.ioimage.com/
[106] RemoteReality website. (2009). [Online]. Available: http://www.remotereality.com/
[107] Point Grey Research website. (2009). [Online]. Available: http://www.ptgrey.com/

Tomi D. Räty received the Ph.D. degree in information processing science from the University of Oulu, Oulu, Finland, in 2008. He is currently a Senior Research Scientist and the Team Leader of the Software Platforms Team at the VTT Technical Research Centre of Finland, Oulu. His research interests include surveillance systems, model-based testing, network monitoring, software platforms, and middleware. He is the author or coauthor of more than 20 papers published in various conferences and journals. Dr. Räty has served as a reviewer for the IEEE TRANSACTIONS ON MOBILE COMPUTING and for several conferences.
