The purpose of this project was to develop a proof-of-concept system that determines the spatial location of a sound source. The sound source has known characteristics, which were selected for ease of detection.
By examining the difference between the arrival times of a sound at different points in space, trigonometry can be used to determine the location of the sound source. If the sound is a tone, the difference in phase between the signals can be examined to determine the difference in arrival time at those points (the microphones). This process is subject to several constraints and assumptions, which are described later.
Data acquisition proved to be one of the biggest challenges in this project. The sensors from the DARPA Distributed Sensor Network project were not available and, in any case, had a sample rate of only 1 kHz. We ended up selecting a standard PC sound card instead.
The PC sound card is capable of capturing two channels, each at a rate of 44.1 kHz, with very precise time synchronization between the two channels. Using the standard Linux audio driver, each 32-bit DWORD read from the sound card contains two 16-bit samples -- one for each channel at that instant in time. This lockstep capture is precise enough to allow a temporal comparison between the two channels.
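The unpacking step described above can be sketched as follows. This is a minimal illustration, not the project's sndbuf code: the function name `split_frame` is hypothetical, and the placement of the first channel in the low 16 bits is an assumption (typical for little-endian interleaved 16-bit stereo) that should be checked against the actual driver.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Split one 32-bit stereo frame into its two signed 16-bit samples.
// ASSUMPTION: the first channel occupies the low 16 bits and the second
// channel the high 16 bits. Swap the halves if the driver orders them
// the other way around.
std::pair<std::int16_t, std::int16_t> split_frame(std::uint32_t frame) {
    const auto low  = static_cast<std::uint16_t>(frame & 0xFFFFu);
    const auto high = static_cast<std::uint16_t>(frame >> 16);
    // Reinterpret each half as a signed two's-complement sample.
    return {static_cast<std::int16_t>(low), static_cast<std::int16_t>(high)};
}
```

Because both samples come out of the same DWORD, they belong to the same instant in time, which is what makes the later channel-to-channel comparison possible.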
While the sound card is capable of capturing the two channels simultaneously from the line input or the CD audio input, the microphone input is mono. The microphone input expects a signal with an impedance of 500 ohms, ranging between one and negative one millivolt. The stereo line input expects a signal with an impedance of 47k ohms, ranging between one hundred millivolts and negative one hundred millivolts. This problem was solved by using the microphone pre-amps from two Sing-Along cassette recorders purchased at K&B toys. The amplifiers are noisy and unbalanced, but they boost the signal to a level that the sound card can read.
The signal is produced by the Stanford Research signal generator that is kept in Room 367 of the New Engineering Building, and fed into a standard amplified computer speaker.

The trigonometry depends on several assumptions about the environment in which this system will operate. First, the microphones must be no farther apart than one half the wavelength of the signal. This makes the angle of incidence unambiguous, and also means that if the waves appear to be more than one half wavelength out of phase, the measured difference can be shifted by a whole wavelength to a sane value without any loss of data.
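The shift to a sane value mentioned above can be sketched like this. The function name `wrap_delay` and the choice of the half-open range are assumptions made for illustration; the report does not say how the original code performs the shift.

```cpp
#include <cassert>

// Fold a measured delay (in samples) into the range
// [-wavelength/2, wavelength/2) by adding or removing whole wavelengths.
// ASSUMPTION: delays and the wavelength are whole numbers of samples.
int wrap_delay(int delay, int wavelength) {
    assert(wavelength > 0);
    const int half = wavelength / 2;
    int wrapped = (delay + half) % wavelength;  // remainder may be negative in C++
    if (wrapped < 0) {
        wrapped += wavelength;
    }
    return wrapped - half;
}
```

With a 64-sample wavelength, a measured delay of 40 samples folds to -24, which is the same phase relationship expressed within half a wavelength.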
Second, it is assumed that the audio wavefronts are flat. Theoretically, the wavefront from a point source is spherical, but the effect becomes insignificant when the sound source is far away. Also, the speaker that is used to generate the sound attempts to direct the sound in a particular direction.
Also worth noting is that all distance calculations are done in the unit of a "sample". This way, the program doesn't need to know the speed of sound, or waste any time doing unit conversions. The program does, though, have to know how long it would take for a wave to travel along the shortest path from one microphone to the other.
As you can see from the above diagram, we know the hypotenuse and the base of a right triangle. We want to know the angle Theta. So:
cos(Theta) = Adjacent / Hypotenuse

Now, Theta is a bearing to the sound source, relative to the line between the two microphones.

Theta = arccos(Adjacent / Hypotenuse)
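A minimal sketch of this calculation follows. It assumes that the adjacent side is the measured delay between the two channels and the hypotenuse is the microphone spacing, both expressed in samples; the diagram is not reproduced here, so that mapping is an interpretation. The function name `bearing_degrees` and the clamping step are illustrative additions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Bearing to the source in degrees, relative to the line between the
// microphones. ASSUMPTION: delay_samples plays the role of "Adjacent" and
// spacing_samples the role of "Hypotenuse".
double bearing_degrees(double delay_samples, double spacing_samples) {
    assert(spacing_samples > 0.0);
    // Noise can push the ratio slightly outside [-1, 1]; clamp it so that
    // acos stays defined.
    const double ratio =
        std::max(-1.0, std::min(1.0, delay_samples / spacing_samples));
    const double pi = std::acos(-1.0);
    return std::acos(ratio) * 180.0 / pi;
}
```

A delay of zero gives a bearing of 90 degrees (source broadside to the microphones), while a delay equal to the full spacing gives 0 or 180 degrees (source in line with the microphones).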
The signal we chose to look for is a sine wave with a frequency of 689.0625 Hz. This frequency was selected because, when recorded at 44.1 kHz, it has a wavelength of 64 samples (44100 / 64 = 689.0625). With the air at standard temperature and pressure, and ignoring the effects of humidity, the wavelength is approximately 0.4480 meters. This allowed the microphones to be placed approximately twenty centimeters apart, which is convenient given the size of the apparatus.
In addition to the amplifier noise, the system must be able to separate its signal from background noise consisting of human speech, recorded music, computer fans and power supplies, and any other ambient noise found in Room 367.
In order to do this, a software-based FIR filter was implemented with 1024 coefficients. The coefficients of the filter are simply a repeated sine wave of the frequency for which we are looking. For a signal with a wavelength of 64 samples, this is sixteen repetitions.
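The filter described above can be sketched as a direct-form FIR whose taps are a repeated sine wave. This is an illustrative version only: the function names are hypothetical, samples are held as `double`, no windowing or gain normalization is applied, and output is produced only where all taps overlap the input. None of these details are confirmed by the report.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Build the coefficients: a sine wave with the given period (in samples),
// repeated until `taps` coefficients have been produced.
std::vector<double> make_sine_taps(std::size_t taps, double period_samples) {
    assert(taps > 0 && period_samples > 0.0);
    const double two_pi = 2.0 * std::acos(-1.0);
    std::vector<double> h(taps);
    for (std::size_t n = 0; n < taps; ++n) {
        h[n] = std::sin(two_pi * static_cast<double>(n) / period_samples);
    }
    return h;
}

// Direct-form FIR: y[n] = sum over k of h[k] * x[n - k]. Only outputs for
// which every tap overlaps the input are returned, so the result has
// x.size() - h.size() + 1 samples (or none if the input is too short).
std::vector<double> fir_filter(const std::vector<double>& x,
                               const std::vector<double>& h) {
    assert(!h.empty());
    if (x.size() < h.size()) {
        return {};
    }
    std::vector<double> y(x.size() - h.size() + 1, 0.0);
    for (std::size_t n = 0; n < y.size(); ++n) {
        double acc = 0.0;
        for (std::size_t k = 0; k < h.size(); ++k) {
            acc += h[k] * x[n + h.size() - 1 - k];
        }
        y[n] = acc;
    }
    return y;
}
```

Calling `make_sine_taps(1024, 64.0)` gives the sixteen-repetition coefficient set. Because the taps span a whole number of periods, a constant (DC) input sums to essentially zero, while a tone at the target frequency passes with a large, unnormalized gain.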
![]()
Before filter
![]()
After filter
The results from this filtering are surprisingly good. The incoming signal has significant low and high frequency noise, which almost completely disappears after filtering.
Several ways of examining and averaging the zero crossings were tried. None of them produced satisfactory results, due either to coding problems or due to effects of leftover noise.
The chosen alternative was to compare the signals by convolving them against each other over two wavelengths. It should only have been necessary to compare one wavelength, but the results close to 180 degrees were better with two wavelengths.
The next step is to find the highest value in the convolved signal. This value marks the position where the second signal looks the most like the first signal. For the signal and situation described above, the index of this peak is the average phase difference between the two signals, plus or minus one wavelength.
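The peak search can be sketched as below. The report calls the comparison a convolution; this sketch uses a sliding dot product (a cross-correlation), which is one common way to find where one signal best matches another and is an assumption about what the original code does. The function name `best_lag`, the symmetric lag range, and the sign convention are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Return the lag (in samples) within [-max_lag, +max_lag] at which `b` best
// matches `a`. A positive result means `b` is delayed relative to `a`.
int best_lag(const std::vector<double>& a, const std::vector<double>& b,
             int max_lag) {
    assert(a.size() == b.size());
    assert(max_lag >= 0 && a.size() > 2 * static_cast<std::size_t>(max_lag));
    const int n = static_cast<int>(a.size());
    int best = 0;
    double best_score = 0.0;
    bool first = true;
    for (int lag = -max_lag; lag <= max_lag; ++lag) {
        double score = 0.0;
        // Sum only over indices that are valid for every lag, so that all
        // scores are built from the same number of products.
        for (int i = max_lag; i < n - max_lag; ++i) {
            score += a[i] * b[i + lag];
        }
        if (first || score > best_score) {
            best_score = score;
            best = lag;
            first = false;
        }
    }
    return best;
}
```

For a pure tone, lags that differ by exactly one wavelength score the same, so a search range of a full wavelength or more returns a result that is only meaningful modulo one wavelength, as noted above.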
The algorithms described above were implemented in C++. This language was chosen due to its high execution speed as compared to Matlab and Java. Also, C++ is a very flexible language -- it is suitable for both fast number-crunching code and for creating a GUI. Last, an application can be designed so that it can be ported, for example, from Linux to Windows with minimal effort.
A small suite of utilities was written for accessing the sound card and for testing the data analysis routines. All utilities share a common core of code (the sndbuf object), and read and write the same audio disk file format. Here is a brief description of each utility:
pcmics2 - a small GTK+ based GUI that performs the same function as snd_realtime. Instead of displaying the angle in degrees, it draws a diagram including the relative positions of the two microphones and a ray indicating the angle of incidence to the sound source. All images are calculated in near-realtime based on the incoming data. Also included are some basic data visualization capabilities. They were first included as a debugging tool, but the output is so pleasing aesthetically that they've been left in the project. Data visualization can be disabled at compile-time.
![]()
A typical session of the pcmics2 GUI.

In the above picture, the diagram with the microphones represents the physical layout of the microphones in the real world. The arrows show the angle from which the sound has been calculated to be coming. Two arrows are drawn to show the ambiguity that results from having only two microphones.
Buttons:
Defaults - Reset all scrollbars to default values.
Snapshot - Take one block of samples, analyze it, and display the results.
Start - Start continuously taking snapshots as fast as the machine can go.
Stop - The opposite of "Start".
Quit - Exit the program.

The waveforms show the filtered signal -- the lower waveform is however much of the signal will fit on the screen. The three waveforms in the upper left are one-wavelength sections of the filtered signal overlaid on one another. These two displays are a compile-time option implemented for debugging purposes. They've been kept because of the aesthetic appeal they add to the screen.
The architecture of the system is shown below:
![]()
The objects in the pcmics2 applet.

This project was designed to include several portability features. In order to use a different data acquisition device (high resolution ADC, Windows MCI), the methods in the class sndbuf whose names are prefixed with "dsp_" must be rewritten. Also, all GTK+ specific code is in the object pcmics2_gui. If GTK+ is not available on the target system, pcmics2_gui can be rewritten to use whatever GUI library is available.
snd_realtime - repeatedly capture a block of data from the sound card, then calculate and display the angle of incidence. This is a similar function to the pcmics2 applet, but non-graphical, and without any run-time configuration.
snd_compare - read a disk file containing a captured block of audio, then calculate and display the angle of incidence.
dsp_capture - record a block of data from the sound card and write it to a disk file.
dsp_play - play data from a disk file through the sound card.
gen_noise - generate random noise and write it to a disk file.
gen_tone - generate a tone of a specific frequency and write it to a disk file.
On the existing system, the 44.1 kHz sample rate allows for an angular resolution of about ±15 degrees at the extremes (0 degrees and 180 degrees) and a resolution of about ±2 degrees in the middle (around 90 degrees).
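These figures can be cross-checked with a short calculation of how far the bearing moves when the measured delay changes by one sample. The sketch below is only an estimate under stated assumptions: a microphone spacing of 0.20 m (the report says approximately twenty centimeters), a speed of sound of 331 m/s (an assumed value, not taken from the report), and hypothetical function names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Microphone spacing expressed in samples: the travel time between the
// microphones multiplied by the sample rate.
double spacing_in_samples(double spacing_m, double sample_rate_hz,
                          double speed_m_per_s) {
    assert(spacing_m > 0.0 && sample_rate_hz > 0.0 && speed_m_per_s > 0.0);
    return spacing_m * sample_rate_hz / speed_m_per_s;
}

// Change in bearing (degrees) when the measured delay moves one sample
// toward zero, using Theta = arccos(delay / spacing).
double bearing_step_degrees(double delay_samples, double spacing_samples) {
    assert(spacing_samples > 1.0);
    const double pi = std::acos(-1.0);
    auto bearing = [&](double d) {
        const double ratio = std::max(-1.0, std::min(1.0, d / spacing_samples));
        return std::acos(ratio) * 180.0 / pi;
    };
    const double neighbour =
        delay_samples > 0.0 ? delay_samples - 1.0 : delay_samples + 1.0;
    return std::fabs(bearing(delay_samples) - bearing(neighbour));
}
```

With `spacing_in_samples(0.20, 44100.0, 331.0)` as the spacing, `bearing_step_degrees` gives a step of roughly 2 degrees at zero delay (around 90 degrees) and roughly 16 degrees at the maximum delay (0 or 180 degrees), which is in line with the resolution quoted above. The arccos curve is steep near its ends, which is why the extremes are so much coarser.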
Even though the phase correlation algorithm averages the phase difference over the entire block of samples, the angle measured from one block to another varies. This variation appears to be about ± one sample, causing the direction arrow in pcmics2 to "jump" around in a small area. A higher sample rate would help to reduce this effect.
The system works, and the algorithms used are effective in solving the problem.
The system implemented differs from the original plan in one important way -- the Distributed Sensor Network sensors were not used. Instead of having an arbitrary number of microphones located in arbitrary places, the sensors consist of two microphones attached to a meter stick. While the geometry is simplified by this change, adding support for the Distributed Sensor Network sensors would be a nontrivial but straightforward modification. The other implications of this change were discussed above.
Future work should aim at determining the location of a sound source in three dimensions. In order to do this, the data acquisition must be significantly improved. At least four channels are required. Also, a higher sample rate will enable better spatial resolution.
A National Semiconductor ADC1175-50 Evaluation Board is available for the purpose. The board has a sample rate of 50 MHz. In order to use this board, though, new pre-amps should be built, and they need to be used with an analog mux circuit. Also, the ADC board must be interfaced to the PC -- either through the parallel port, or through the ISA bus.
I would like to give special thanks to Jae Hong Park for teaching me a great deal about signal processing, for his help with the filter design, and for his help in overcoming the quirks of this particular project.