Let's assume that we have two synchronized audio streams from two separate microphones. To calculate the sound direction, we need to know the delay between the microphones, expressed either as time or as a number of samples.
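
To illustrate why the delay matters, here is a minimal sketch that converts a sample delay into an arrival angle. It assumes a far-field source (plane wave) and a speed of sound of 343 m/s. The helper name `DelayToAngle` and its parameters are hypothetical and are not part of the TDE class.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical helper (not part of TDE): far-field arrival angle in radians.
// delaySamples: estimated delay between the microphones, in samples
// sampleRate:   sampling rate in Hz
// micDistance:  distance between the two microphones in metres
// speedOfSound: assumed to be 343 m/s unless stated otherwise
double DelayToAngle(double delaySamples, double sampleRate,
                    double micDistance, double speedOfSound = 343.0)
{
    // Extra distance the sound travels to reach the farther microphone.
    double pathDifference = delaySamples / sampleRate * speedOfSound;

    // Clamp to [-1, 1] so that noisy estimates cannot push asin out of range.
    double ratio = std::max(-1.0, std::min(1.0, pathDifference / micDistance));
    return std::asin(ratio);
}
```

With made-up numbers, `DelayToAngle(-5.0, 16000.0, 0.2)` gives roughly -0.57 rad, or about -32 degrees. The sign only tells which microphone heard the sound first.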

A study from the University of Auckland, A Comparative Study of Time-Delay Estimation Techniques Using Microphone Arrays, covers this topic and includes Matlab code.

Image 1 shows example data recorded using Kinect.

Image 1

##### Audio data buffers

Let's assume that we have the synchronized audio data in two vectors. The channel 0 data vector holds m_dataLength items. The channel 1 data vector has m_maxDelay additional items before and after, as shown in image 2, so it holds m_dataLength + 2 * m_maxDelay items.
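
The buffer layout described here can be sketched with plain `std::vector` buffers. The `Buffers` struct, the `MakeBuffers` helper and the use of `short` samples are assumptions for illustration; the TDE class may fill `m_channel0` and `m_channel1` differently.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical container for the two analysis buffers.
struct Buffers
{
    std::vector<short> channel0; // dataLength samples
    std::vector<short> channel1; // dataLength + 2 * maxDelay samples
};

// Cut both buffers out of two synchronized recordings.
// `start` is the index of the first channel 0 sample to analyze.
Buffers MakeBuffers(const std::vector<short>& rec0,
                    const std::vector<short>& rec1,
                    std::size_t start, std::size_t dataLength,
                    std::size_t maxDelay)
{
    if (start < maxDelay
        || start + dataLength > rec0.size()
        || start + dataLength + maxDelay > rec1.size())
    {
        throw std::out_of_range("not enough samples around the window");
    }

    Buffers b;
    b.channel0.assign(rec0.begin() + start,
                      rec0.begin() + start + dataLength);

    // Channel 1 starts maxDelay samples earlier and ends maxDelay samples
    // later, so channel1[pos + delay + maxDelay] is the channel 1 sample
    // that lies `delay` samples away from channel0[pos].
    b.channel1.assign(rec1.begin() + (start - maxDelay),
                      rec1.begin() + (start + dataLength + maxDelay));
    return b;
}
```

The padding lets every candidate delay in [-maxDelay, maxDelay] be evaluated over the full channel 0 window without running off either end of channel 1.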

Image 2

##### Peak Detection

The simplest way to calculate the delay is to find the highest peak in each audio stream and take the difference between the two peak positions. C++ code:

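``````
// std::abs needs <cmath> (floating point) or <cstdlib> (integer types).
DelayType TDE::FindPeak()
{
    CalcType value0Max = CalcZero;
    CalcType value1Max = CalcZero;

    size_t index0 = 0;
    size_t index1 = 0;

    // Find the largest absolute sample in channel 0.
    for (size_t pos = 0; pos < m_dataLength; pos++)
    {
        CalcType val = std::abs(m_channel0[pos]);
        if (val > value0Max)
        {
            value0Max = val;
            index0 = pos;
        }
    }

    // Channel 1 is padded with m_maxDelay samples on each side.
    for (size_t pos = 0; pos < m_dataLength + 2 * m_maxDelay; pos++)
    {
        CalcType val = std::abs(m_channel1[pos]);
        if (val > value1Max)
        {
            value1Max = val;
            index1 = pos;
        }
    }

    // Remove the padding offset so that aligned peaks give a delay of 0.
    return (DelayType)index1 - ((DelayType)index0 + m_maxDelay);
}
``````
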
##### Cross Correlation

Wikipedia defines cross-correlation as follows: "In signal processing, cross-correlation is a measure of similarity of two series as a function of the lag of one relative to the other. This is also known as a sliding dot product or sliding inner-product. It is commonly used for searching a long signal for a shorter, known feature. It has applications in pattern recognition, ..." The delay estimate is the lag with the highest correlation value. C++ code:

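``````
TDEVector* TDE::CrossCorrelation()
{
    // One result entry per candidate delay in [-m_maxDelay, m_maxDelay].
    TDEVector* res = new TDEVector(
        2 * m_maxDelay + 1, {0, CalcZero});

    for (DelayType delay = -m_maxDelay; delay <= m_maxDelay; delay++)
    {
        CalcType sum = 0;
        for (size_t pos = 0; pos < m_dataLength; pos++)
        {
            // m_maxDelay shifts the index into the padded channel 1 buffer.
            sum += m_channel0[pos]
                * m_channel1[pos + delay + m_maxDelay];
        }
        res->at(delay + m_maxDelay).delay = delay;
        res->at(delay + m_maxDelay).value = sum;
    }
    // Allocated with new, so the caller is responsible for deleting it.
    return res;
}
``````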
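
To turn the correlation values into a single delay estimate, one would typically pick the lag with the largest value. The sketch below is a self-contained variant that works on `std::vector<double>` buffers with the padded layout described earlier. The function name `EstimateDelayByCorrelation` and its signature are assumptions and do not come from the TDE source.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of TDE).
// channel1 must hold channel0.size() + 2 * maxDelay samples.
int EstimateDelayByCorrelation(const std::vector<double>& channel0,
                               const std::vector<double>& channel1,
                               int maxDelay)
{
    int bestDelay = -maxDelay;
    double bestValue = 0.0;
    bool first = true;

    for (int delay = -maxDelay; delay <= maxDelay; ++delay)
    {
        // delay + maxDelay is never negative, so the cast is safe.
        const std::size_t offset = static_cast<std::size_t>(delay + maxDelay);

        double sum = 0.0;
        for (std::size_t pos = 0; pos < channel0.size(); ++pos)
        {
            sum += channel0[pos] * channel1[pos + offset];
        }

        // Keep the lag with the largest correlation value.
        if (first || sum > bestValue)
        {
            bestValue = sum;
            bestDelay = delay;
            first = false;
        }
    }
    return bestDelay;
}
```

If several lags share the same maximum, this sketch returns the smallest one; real data rarely produces exact ties.
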
##### Phase Transform

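A way to sharpen the cross-correlation peak is to whiten the input signals by using a weighting function, which leads to the so-called generalized cross-correlation (GCC) technique. The phase transform weighting divides each frequency bin by its magnitude, so only the phase information remains. C++ code: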

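``````
// Needs <complex>, <vector> and the kissfft C++ header.
TDEVector* TDE::PhaseTransform()
{
    // Start from the plain cross-correlation and whiten its spectrum.
    TDEVector* res = CrossCorrelation();
    int nfft = (int)res->size();
    typedef kissfft<double> FFT;
    typedef std::complex<double> cpx_type;

    FFT fft(nfft, false);
    FFT ifft(nfft, true);

    std::vector<cpx_type> inbuf(nfft);
    std::vector<cpx_type> outbuf(nfft);

    for (int k = 0; k < nfft; ++k)
    {
        inbuf[k] = cpx_type((double)res->at(k).value, 0);
    }
    // transform() expects pointers to the first element of each buffer.
    fft.transform(inbuf.data(), outbuf.data());

    // Normalize every bin to unit magnitude; skip empty bins
    // to avoid dividing by zero.
    for (int k = 0; k < nfft; ++k)
    {
        double mag = std::abs(outbuf[k]);
        inbuf[k] = (mag > 0.0) ? outbuf[k] / mag : cpx_type(0, 0);
    }
    ifft.transform(inbuf.data(), outbuf.data());

    for (int k = 0; k < nfft; ++k)
    {
        res->at(k).value = (CalcType)std::abs(outbuf[k]);
    }
    return res;
}
``````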
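
For reference, the phase transform weighting is commonly written as below. The symbols are introduced here and do not appear in the code: $X_0(f)$ and $X_1(f)$ are the Fourier transforms of channel 0 and channel 1, $^{*}$ is the complex conjugate and $\tau$ is the lag. The conjugate is placed on $X_0$ to match the sign convention of the cross-correlation code, where channel 1 is the shifted signal.

```latex
R_{\mathrm{PHAT}}(\tau) =
  \mathcal{F}^{-1}\!\left\{
    \frac{X_0^{*}(f)\, X_1(f)}{\left| X_0^{*}(f)\, X_1(f) \right|}
  \right\}(\tau)
```

The code in this section obtains the cross spectrum $X_0^{*}(f)\,X_1(f)$ by transforming the time-domain cross-correlation instead of transforming each channel separately.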

##### Average Square Difference

Average square difference measures the mean squared error between the two signals for each candidate delay. The delay estimate is the candidate with the smallest value. C++ code:

``````
TDEVector* TDE::AverageSquareDifference()
{
    // One result entry per candidate delay in [-m_maxDelay, m_maxDelay].
    TDEVector* res = new TDEVector(2 * m_maxDelay + 1, { 0, CalcZero });
    for (DelayType delay = -m_maxDelay; delay <= m_maxDelay; delay++)
    {
        CalcType sum = 0;
        for (size_t pos = 0; pos < m_dataLength; pos++)
        {
            CalcType diff = m_channel0[pos] - m_channel1[pos + delay + m_maxDelay];
            sum += diff * diff;
        }
        res->at(delay + m_maxDelay).delay = delay;
        // Mean of the squared differences; the best delay has the smallest value.
        res->at(delay + m_maxDelay).value = sum / m_dataLength;
    }
    return res;
}
``````
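
Unlike cross-correlation, the best candidate here is the minimum, not the maximum. A self-contained sketch of that selection step follows, again on `std::vector<double>` buffers with the padded layout. The name `EstimateDelayByAsd` and its signature are assumptions and are not taken from the TDE source.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of TDE).
// channel0 must not be empty; channel1 must hold
// channel0.size() + 2 * maxDelay samples.
int EstimateDelayByAsd(const std::vector<double>& channel0,
                       const std::vector<double>& channel1,
                       int maxDelay)
{
    int bestDelay = -maxDelay;
    double bestValue = 0.0;
    bool first = true;

    for (int delay = -maxDelay; delay <= maxDelay; ++delay)
    {
        const std::size_t offset = static_cast<std::size_t>(delay + maxDelay);

        double sum = 0.0;
        for (std::size_t pos = 0; pos < channel0.size(); ++pos)
        {
            const double diff = channel0[pos] - channel1[pos + offset];
            sum += diff * diff;
        }
        const double asd = sum / static_cast<double>(channel0.size());

        // Keep the lag with the smallest average square difference.
        if (first || asd < bestValue)
        {
            bestValue = asd;
            bestDelay = delay;
            first = false;
        }
    }
    return bestDelay;
}
```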

In this example, all the methods returned the same delay (-5 samples).

The full TDE source code is available in the GitHub repository.