Abstract:
This paper examines the behavior of two energy-based voice activity detection (VAD) algorithms on noisy input signals. The
examined detectors use time-domain methods to find speech boundaries. Time-domain short-time energy features and/or the zero-crossing
rate of the speech signals are used to evaluate the performance of the methods. In the first stage of both algorithms, time-domain short-time
energy (STE) features are calculated for each speech segment. Energy ratios and threshold values are then used to detect voice
activity in the speech signal. The decision threshold is calculated from the average STE of an initial silence period. The
effectiveness of the selected methods is tested on clean speech samples and on noisy speech signals at different SNR levels.
The results indicate that both methods maintain reasonable accuracy, with slowly decreasing performance, down to an SNR of about 0 dB.
Below 0 dB SNR, however, both methods lose their effectiveness.
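
The general idea of an STE-based detector with a threshold taken from an initial silence period can be sketched as follows. This is only an illustrative sketch, not either of the two algorithms examined in the paper: the function names, the non-overlapping framing, and the parameters frame_len, silence_frames, and ratio are all assumptions chosen for the example, and the zero-crossing rate is not used.

```python
import math
import random


def short_time_energy(signal, frame_len):
    """Return the mean squared amplitude of each non-overlapping frame."""
    n_frames = len(signal) // frame_len
    return [
        sum(x * x for x in signal[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]


def energy_vad(signal, frame_len=160, silence_frames=10, ratio=4.0):
    """Flag a frame as speech when its STE exceeds ratio * noise floor.

    The noise floor is the average STE of the first `silence_frames` frames,
    which are assumed (hypothetically) to contain no speech.
    """
    ste = short_time_energy(signal, frame_len)
    if len(ste) < silence_frames:
        raise ValueError("signal is shorter than the assumed silence period")
    noise_floor = sum(ste[:silence_frames]) / silence_frames
    threshold = ratio * noise_floor
    return [energy > threshold for energy in ste]


# Synthetic test signal: 1 s of low-level noise at 8 kHz with a louder
# 200 Hz tone burst standing in for speech between samples 3200 and 4800.
random.seed(0)
fs = 8000
signal = [random.gauss(0.0, 0.01) for _ in range(fs)]
for n in range(3200, 4800):
    signal[n] += 0.5 * math.sin(2 * math.pi * 200 * n / fs)

decisions = energy_vad(signal)
speech_frames = [i for i, is_speech in enumerate(decisions) if is_speech]
```

In this synthetic high-SNR case only the frames covering the tone burst are flagged. As the noise level rises toward the burst level, the frame energies of noise and signal overlap and a fixed ratio threshold of this kind stops separating them, which is consistent with the degradation reported below 0 dB SNR.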