Feature Article
Finding the Calls of Whales in a Sea of Sound
By Dr. Aaron N. Rice
Dr. Peter J. Dugan
Director of Applied Science and Engineering
Dr. Christopher W. Clark
Bioacoustics Research Program
Cornell Laboratory of Ornithology
Ithaca, New York
Recent expansion of commercial activities in the ocean has brought with it a major increase in the levels of anthropogenic noise propagating through marine habitats. This has been in large part due to an increase in seismic exploration and noise from ship traffic. However, it is still unclear what effect these increased noise levels may have on the ocean's inhabitants. Of principal concern are the many vulnerable, threatened and endangered species of marine mammals.
Under the Endangered Species Act (ESA) and Marine Mammal Protection Act (MMPA), government regulators have instituted a monitoring and mitigation requirement to accompany many shipping and construction-related activities. Through recent developments in acoustic technology—recording capabilities, duration, computational processing capabilities—it is now possible to use the same recording devices used for ESA and MMPA compliance to also monitor ambient ocean noise, marine mammals and other organisms.
Passive Acoustics Whale Monitoring
Acoustic communication plays an integral role in the lives of many marine mammals and fishes. Sounds are used for communicating, navigating, finding food and detecting predators. Unlike other communication modalities, acoustic communication can be observed remotely and passively, and it can be used to assess species-specific patterns of occurrence and behavior. After establishing baseline levels of bioacoustic activity in a monitoring program, marine mammal and fish vocalizations can serve as an indicator of overall environmental health or indicate organismal responses to anthropogenic noises through changes in the occurrence of sound production.
The MARU is anchored a few meters above the seafloor, allowing researchers to record long-term underwater acoustic data.
When these animals become stressed, their acoustic behavior is either altered or inhibited, and the recorded sound patterns deviate from those of control periods. Thus, passive acoustics can serve as a long-term method to record temporal or spatial changes in marine animal occurrence, behavior and ecology through the recording of their sounds. This is particularly true in cases where other methods of environmental monitoring are impractical due to seasonal conditions (e.g., aerial surveys in the winter) or inaccessible locations (e.g., sea ice cover in polar regions).
One of the essential elements of successful passive acoustic monitoring is the collection of adequate baseline data. Many whales and fishes have seasonal periods of residency or sound production in different geographical regions. Thus, initiating a passive acoustic recording effort in an area at a certain time of year may not yield the expected results, simply because the animals are either not there or not vocalizing. To circumvent this limitation, passive acoustic recording in previously unrecorded regions should be conducted for extended lengths of time in order to capture such seasonal profiles.
The development of autonomous, long-term recording technology has enabled passive acoustic monitoring in remote locations. Previously, long-term recordings were limited to ship or shore-bound facilities, but now recording units can be deployed anywhere around the world to listen for species of interest.
Cornell's Bioacoustics Research Program (BRP) has developed a marine autonomous recording unit (MARU), a digital audio recorder that can be programmed to record on a desired daily schedule and be deployed for periods of weeks or months in a remote environment. The instruments within MARUs are housed in positively buoyant glass spheres and anchored on the seafloor, such that the recording unit floats a few meters above the bottom. Underwater sounds are recorded through a hydrophone mounted outside the sphere.
The acoustic data generated by the MARUs are digitized and stored in binary digital audio format on an internal hard disk or flash memory. At the conclusion of the deployment, the MARU is sent an acoustic command to release itself from its anchor, and it floats to the surface for recovery. After the device is recovered, its recorded audio data are extracted, converted into multichannel sound files and stored on a server in preparation for analysis.
The relatively small size and low cost of the recorders allow many units to be deployed in different geographical configurations, allowing researchers to understand spatial as well as temporal patterns of marine animal vocalizations.
Time-frequency representation (spectrogram) of marine sounds, showing the temporal and spectral overlap of two different species: the North Atlantic right whale and black drum fish.
Spectrograms of contact calls produced by North Atlantic right whales (indicated by white boxes). Both the calls and background noise are often variable, posing a challenge for automated detection methods.
The capacity of acoustic recording technology has advanced dramatically over the past several decades. When recording media were limited to analog magnetic tape, recording duration was typically limited to the scale of hours. Today's digital storage media (such as hard drives and flash memory) have extended the duration of recording to the scale of many months in a single session, limited primarily by disk size and battery life.
Currently, MARUs record continuously for up to 100 days at a sampling rate of 2,000 hertz to listen for baleen whale species.
Approaches for Mining Data
Thousands of hours of audio are recorded during passive acoustic monitoring, making it logistically impossible for any individual scientist to listen to the recordings in their entirety. A typical deployment consists of an array of many recording units: some deployments have as many as 19 units recording over a 100-day period, resulting in 1,900 days (45,600 hours) of data needing inspection. Before automated detection, analysts identified animal vocalizations by listening to the recordings. With 1,900 days of audio, human listening is not a realistic option.
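The scale of such a deployment can be checked with a few lines of arithmetic. The sketch below uses the unit count, deployment length and sampling rate quoted in this article; the 16-bit sample size is an assumption made only for illustration, since no bit depth is stated.

```python
# Back-of-envelope data volume for the 19-unit, 100-day deployment.
UNITS = 19               # recorders in the array (from the article)
DAYS = 100               # deployment length per unit (from the article)
SAMPLE_RATE_HZ = 2_000   # sampling rate (from the article)
BYTES_PER_SAMPLE = 2     # ASSUMPTION: 16-bit samples; no bit depth is given

SECONDS_PER_DAY = 24 * 60 * 60

recorder_days = UNITS * DAYS                                 # 1,900
audio_hours = recorder_days * 24                             # 45,600
samples_per_unit = SAMPLE_RATE_HZ * SECONDS_PER_DAY * DAYS   # 17.28 billion
total_bytes = samples_per_unit * BYTES_PER_SAMPLE * UNITS

print(f"{recorder_days:,} recorder-days = {audio_hours:,} hours of audio")
print(f"{samples_per_unit:,} samples per unit")
print(f"about {total_bytes / 1e9:.0f} GB uncompressed across the array")
```

At eight hours of real-time listening per day, 45,600 hours of audio would occupy one analyst for 5,700 working days.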
Consequently, computational methods have become essential for handling data at this scale. First, the data are not analyzed in the acoustic domain in which they were recorded but are transformed to the visual domain, where each sound is represented as a spectrogram showing how the frequency content of the signal changes over time. Second, detection and classification methods, many of them based on image processing, are applied to this spectrographic representation to find sounds of interest.
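As an illustration of the first step, the sketch below turns a sampled waveform into a time-frequency power grid using a short-time Fourier transform. It is a minimal, dependency-free example and not the BRP software: the function name `spectrogram`, the frame length, the hop size and the Hann window are all assumptions chosen for readability, and a production system would use an FFT library rather than the naive transform shown here.

```python
import cmath
import math


def spectrogram(samples, sample_rate, frame_len=64, hop=32):
    """Short-time Fourier power: returns (times, freqs, power[frame][bin])."""
    # A Hann window tapers each frame to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1  # non-negative frequencies only
    # Precomputed DFT basis; the naive O(N^2) transform keeps the sketch
    # free of third-party dependencies.
    basis = [[cmath.exp(-2j * math.pi * k * n / frame_len)
              for n in range(frame_len)] for k in range(n_bins)]
    times, power = [], []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        power.append([abs(sum(x * b for x, b in zip(frame, basis[k]))) ** 2
                      for k in range(n_bins)])
        times.append((start + frame_len / 2) / sample_rate)  # frame centre, s
    freqs = [k * sample_rate / frame_len for k in range(n_bins)]
    return times, freqs, power


# Synthetic test signal: 1 s of a 125 Hz sine sampled at 2,000 Hz.
FS = 2_000
tone = [math.sin(2 * math.pi * 125 * n / FS) for n in range(FS)]
times, freqs, power = spectrogram(tone, FS)
peak_bin = max(range(len(freqs)), key=lambda k: power[0][k])
print(freqs[peak_bin])  # 125.0
```

Each row of `power` is one time frame and each column one frequency bin, which is the image-like grid that the detection methods then search.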
Various image-based algorithms have been developed that use pattern recognition to find sounds (or images of the sound) of particular interest. While these computer-aided methods allow automatic inspection, they are not error-free, and they will never remove the human experts from the process. For the foreseeable future, computers require guidance. Therefore, humans not only confirm detections but also ground-truth the recordings, which enables the computers to properly identify sounds.
Auto-Detection and Classification
Marine mammals produce highly variable sound features that span many orders of magnitude along the dimensions of time, frequency and amplitude. The main challenge becomes understanding and defining the acoustic parameters to guide the detection effort.
The main acoustic parameters guiding the development of a detector are time and frequency ranges. Among cetaceans, these range from the long-duration, low-frequency calls of blue whales to the short-duration, high-frequency calls of dolphins. Some calls, like the upcall of the North Atlantic right whale or the song of the fin whale, are highly stereotyped and show relatively little variability. Other calls, such as the songs of humpback whales or bowhead whales, are extremely variable and show dramatic variation in patterns of frequency modulation. The acoustic feature space (i.e., time and frequency) and the level of variability of these calls often dictate the strategy used for developing an automated detection procedure. Additional challenges can occur when sounds of interest overlap in both frequency and time with other co-occurring species, such as the calls of haddock fish and minke whales.
Building a signal-detection algorithm starts with understanding many aspects of the call pattern and the surrounding environment. This includes documenting the signal context, specifically sounds of interest and other calls in the geographic area that may be similar, and describing fluctuations in levels of ambient background noise that may mask sounds of interest. Signals of interest for detection are isolated, or 'clipped,' from the longer sound stream and put into a catalog consisting of many (often tens of thousands) short-duration clips for detector development. Including sounds from many individuals from different recording sites and locations helps document the variability of both the signal and the background noise.
Three common stages exist for automatically identifying sounds in passive acoustic monitoring data. These main steps include energy detection (Stage 1), feature extraction (Stage 2) and classification (Stage 3). The approaches described here provide only a sample of the range of acoustic processing technology and include some of the methods currently used at BRP.
Stage 1 is essentially an initial screening method to identify potential sounds of interest. Energy detection methods use a set of criteria combined with a threshold amplitude level: Sound above the threshold is flagged as a candidate, and all other sound is rejected. Criteria include a time-frequency threshold, in which a simple range of frequencies and time bounds is established; connected-region detection, which counts the connected pixels in the spectrographic image, helping distinguish biological sounds from random, incidental or ambient background noise; and data-template detection, in which a predefined spectrographic image template (or templates) of a sound of interest is set by the user and every candidate sound is scored by how well it correlates with the template.
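The three Stage 1 criteria can be sketched on a toy spectrogram grid. The example below is illustrative only: the function names, the 8-connected flood fill, the minimum region size and the normalized-correlation score are assumptions about one reasonable way to implement these ideas, not a description of the detectors used at BRP.

```python
import math


def threshold_mask(power, freqs, f_lo, f_hi, threshold):
    """Time-frequency threshold: True where an in-band cell exceeds threshold."""
    return [[f_lo <= f <= f_hi and p > threshold for p, f in zip(row, freqs)]
            for row in power]


def connected_regions(mask, min_cells):
    """Group touching True cells (8-connectivity); keep regions >= min_cells."""
    n_t = len(mask)
    n_f = len(mask[0]) if mask else 0
    seen = [[False] * n_f for _ in range(n_t)]
    regions = []
    for t0 in range(n_t):
        for f0 in range(n_f):
            if not mask[t0][f0] or seen[t0][f0]:
                continue
            seen[t0][f0] = True
            stack, cells = [(t0, f0)], []
            while stack:  # iterative flood fill
                t, f = stack.pop()
                cells.append((t, f))
                for u in range(max(t - 1, 0), min(t + 2, n_t)):
                    for v in range(max(f - 1, 0), min(f + 2, n_f)):
                        if mask[u][v] and not seen[u][v]:
                            seen[u][v] = True
                            stack.append((u, v))
            if len(cells) >= min_cells:
                regions.append(sorted(cells))
    return regions


def template_score(patch, template):
    """Normalized correlation (-1 to 1) between two equal-sized patches."""
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a)
                    * sum((y - mean_b) ** 2 for y in b))
    return num / den if den else 0.0


# Toy spectrogram: rows are time frames, columns are frequency bins.
freqs = [0, 50, 100, 150, 200, 250]
power = [
    [0, 0, 9, 0, 0, 0],
    [0, 0, 9, 9, 0, 0],
    [0, 0, 0, 9, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 9, 0, 0, 0, 9],  # two isolated loud cells, one of them out of band
]
mask = threshold_mask(power, freqs, f_lo=50, f_hi=200, threshold=5)
regions = connected_regions(mask, min_cells=3)
print(regions)  # [[(0, 2), (1, 2), (1, 3), (2, 3)]]
```

The rising four-cell region survives as a candidate, while the isolated loud cells are discarded as probable noise. A candidate patch could then be compared with a stored call image using `template_score`, which approaches 1 for a close match.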
In Stage 2, acoustic features are extracted from these candidate sounds. For algorithms developed for right whale detection, a total of 11 features are used, including duration, frequency range, bandwidth, rate of change, etc. The goal of feature extraction is to quantitatively describe the sounds of interest using a multitude of different parameters and to use these parameters to filter the candidate sounds for sounds of interest.
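A few of the features named above can be computed directly from a candidate region of spectrogram cells. The sketch below is a simplified illustration: it covers only duration, frequency range, bandwidth and rate of change rather than the full set of 11 features, and the function name, dictionary keys and the way the sweep rate is estimated are assumptions made for the example.

```python
def extract_features(cells, times, freqs):
    """Describe one candidate region given as (time_index, freq_index) cells."""
    t_idx = [t for t, _ in cells]
    f_idx = [f for _, f in cells]
    duration = times[max(t_idx)] - times[min(t_idx)]       # s, between frame centres
    f_low, f_high = freqs[min(f_idx)], freqs[max(f_idx)]   # Hz

    # The mean frequency of the region in its first and last frames gives a
    # crude estimate of how fast the call sweeps in frequency.
    by_frame = {}
    for t, f in cells:
        by_frame.setdefault(t, []).append(freqs[f])
    first, last = min(by_frame), max(by_frame)
    f_start = sum(by_frame[first]) / len(by_frame[first])
    f_end = sum(by_frame[last]) / len(by_frame[last])
    sweep_rate = (f_end - f_start) / duration if duration > 0 else 0.0  # Hz/s

    return {
        "duration_s": duration,
        "f_low_hz": f_low,
        "f_high_hz": f_high,
        "bandwidth_hz": f_high - f_low,
        "sweep_rate_hz_per_s": sweep_rate,
    }


# Hypothetical candidate: a short rising call on a 0.1 s by 50 Hz grid.
times = [0.0, 0.1, 0.2, 0.3, 0.4]
freqs = [0, 50, 100, 150, 200, 250]
upsweep = [(0, 2), (1, 2), (1, 3), (2, 3)]
features = extract_features(upsweep, times, freqs)
print(features)
```

For this candidate the region spans 100 to 150 Hz over 0.2 seconds, so the estimated sweep rate is about 250 Hz per second. Candidates whose features fall outside the ranges expected for the species of interest can be filtered out before classification.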
Stage 3 is where the quantitatively described signal is classified into either signal or noise. In classification procedures, the computer essentially is taught what the sounds of interest are, then informs the user whether candidate sounds from the recording match the sounds of interest. Three approaches used are linear discriminant analysis—a linear combination of features to classify the signal; classification and regression trees—a set of rules to make a set of tree-based decisions to classify the sound; and artificial neural networks—a series of interconnected nodes (mimicking the pattern of biological neurons) that function as a series of adaptive filters, processing the information as it flows through the system. In multiclassifier approaches, two or more of the above procedures are combined, and final classification of a sound occurs when two or more classifiers are in agreement about the nature of a signal.
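To make the classification step concrete, the sketch below fits a two-class linear discriminant on two features and then applies the agreement rule described above. It is a teaching example under stated assumptions: the training values are invented, only two features (duration and sweep rate) are used so the 2x2 matrix can be inverted by hand, and the votes attributed to a tree and a neural network are hard-coded stand-ins rather than real models.

```python
def fit_lda(signal_rows, noise_rows):
    """Fisher linear discriminant for two classes and two features."""
    def mean_vec(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    mu_s, mu_n = mean_vec(signal_rows), mean_vec(noise_rows)
    # Pooled within-class scatter matrix (2x2).
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows, mu in ((signal_rows, mu_s), (noise_rows, mu_n)):
        for x in rows:
            d = (x[0] - mu[0], x[1] - mu[1])
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    if det == 0:
        raise ValueError("scatter matrix is singular; features are collinear")
    diff = (mu_s[0] - mu_n[0], mu_s[1] - mu_n[1])
    # Weights w = S^-1 (mu_signal - mu_noise), using the closed-form inverse.
    w = ((s[1][1] * diff[0] - s[0][1] * diff[1]) / det,
         (s[0][0] * diff[1] - s[1][0] * diff[0]) / det)
    # Decision boundary passes through the midpoint of the class means.
    bias = w[0] * (mu_s[0] + mu_n[0]) / 2 + w[1] * (mu_s[1] + mu_n[1]) / 2
    return w, bias


def lda_predict(model, x):
    """True if the feature vector falls on the signal side of the boundary."""
    w, bias = model
    return w[0] * x[0] + w[1] * x[1] > bias


def classify_by_agreement(votes, min_agree=2):
    """Accept a candidate as signal when at least min_agree classifiers say so."""
    return sum(bool(v) for v in votes) >= min_agree


# INVENTED training data: (duration in s, sweep rate in Hz/s).
signal_rows = [(1.0, 100), (1.2, 120), (0.9, 90), (1.1, 115)]
noise_rows = [(0.2, 0), (0.3, 10), (0.1, -5), (0.25, 20)]
model = fit_lda(signal_rows, noise_rows)

candidate = (1.05, 105)
lda_vote = lda_predict(model, candidate)
tree_vote, network_vote = True, False   # stand-ins for two other classifiers
print(classify_by_agreement([lda_vote, tree_vote, network_vote]))  # True
```

Requiring agreement between classifiers trades some missed calls for fewer false detections, which is one reason a human analyst still reviews the output.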
Combining all three stages provides a more robust (though still not error-free) method for categorizing vast amounts of data. Thus, for large-scale monitoring efforts, acoustic data can be processed in a fraction of the time a human analyst would need to examine them.
As both the sound recording and computational technologies improve, the scope and scale of the application of these methods broaden and improve. In an ideal setting, the relationship between the biology and the technology creates a feedback loop where the biology guides the technological development, but these new technologies in turn create new questions, approaches and ideas for the biology.
For the past 10 years, research at BRP has focused on passive acoustic monitoring of cetaceans, particularly observing the occurrence of the critically endangered North Atlantic right whale along its migration route in many locations in the Atlantic Ocean and researching the ecology of bowhead whales in the Arctic Ocean. Through these ongoing efforts of marine archival recordings, an initial understanding of the acoustic ecologies for a number of other cetacean and fish species has emerged. This extensive archive of hundreds of thousands of hours of recordings represents a library encompassing a wide range of acoustic habitats, and it will be used for further development of baseline data and monitoring methods in these waters for a variety of taxa.
For a full list of references or more information on the technologies discussed, please contact Aaron Rice at firstname.lastname@example.org or Peter Dugan at email@example.com.
Dr. Aaron N. Rice is the science director at Cornell's Bioacoustics Research Program, where he leads a team of researchers investigating the acoustic behavior and ecology of whales and fishes.
Dr. Peter J. Dugan, director of applied science and engineering at the Bioacoustics Research Program, leads the development of detection and classification technologies used in the automated recognition of marine animal vocalizations.
Dr. Christopher W. Clark is the program director of the Bioacoustics Research Program, where, for more than 25 years, he has combined the innovation, development and application of acoustic technologies with biological inquiry to promote the understanding and conservation of marine mammals through the sounds they make and their acoustic environment.