User guide

Aries Reference Manual Section 2 – Viewing and Analysing Data
Radio Systems Information Ltd. Page 42
3.1 How PESQ works
PESQ measures one-way, end-to-end voice quality and is designed for use with intrusive tests: a signal is
passed through the system under test, and the degraded output is compared with the input (reference)
signal.
The test signals must be speech-like, because many systems are optimised for speech, and respond in an
unrepresentative way to non-speech signals (e.g. tones, noise, ITU-T P.50). The processing carried out by
PESQ is illustrated in Figure 48 below.
Figure 48: Structure of PESQ
The model includes the following stages:
Level alignment. In order to compare signals, the reference speech signal and the degraded signal are
aligned to the same, constant power level. This corresponds to the normal listening level used in subjective
tests.
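As an illustration of level alignment, the sketch below scales two signals to a common RMS level. It is a simplified example only: the names rms, align_level and TARGET_RMS are hypothetical, the target value is arbitrary, and the level calculation inside PESQ itself is more involved than a plain RMS over all samples.

```python
import math

def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def align_level(samples, target_rms):
    """Scale samples so that their RMS level equals target_rms.

    A silent signal (RMS of zero) is returned unchanged.
    """
    level = rms(samples)
    if level == 0.0:
        return list(samples)
    gain = target_rms / level
    return [s * gain for s in samples]

# Assumption: an arbitrary target level, not the value used by PESQ.
TARGET_RMS = 0.1

reference = [0.5, -0.5, 0.25, -0.25]
degraded = [0.05, -0.04, 0.02, -0.03]

ref_aligned = align_level(reference, TARGET_RMS)
deg_aligned = align_level(degraded, TARGET_RMS)
```

After this step both signals have the same power, so later differences between them reflect distortion rather than a simple gain change.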
Input filtering. PESQ models and compensates for filtering that takes place in the telephone handset and in
the network.
Time alignment. The system under test may introduce a delay, which may be variable. In order to compare the
reference and degraded signals, they need to be lined up with each other. PESQ does this in a number of
stages. First it estimates the delay applied to each speech utterance, then it searches for delay
changes that occurred within utterances. Finally, bad intervals (sections which may have been mis-aligned)
are realigned. Delay variations during speech may be audible, so PESQ samples across each delay
change to determine its subjective effect.
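The idea of estimating a delay can be sketched with a brute-force cross-correlation, as below. This is not the alignment procedure used by PESQ, which works in several stages as described above. The function name estimate_delay, the max_delay parameter and the sample values are assumptions made for the example.

```python
def estimate_delay(reference, degraded, max_delay):
    """Return the lag, in samples, at which degraded best matches reference.

    Every lag from -max_delay to +max_delay is tried, and the lag with
    the largest cross-correlation is returned.
    """
    best_lag = 0
    best_score = float("-inf")
    for lag in range(-max_delay, max_delay + 1):
        score = 0.0
        for i, r in enumerate(reference):
            j = i + lag
            if 0 <= j < len(degraded):
                score += r * degraded[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# Made-up test signal: the degraded copy is the reference delayed by 3 samples.
reference = [0.0, 1.0, -2.0, 3.0, -1.0, 0.5, 0.0, 0.0, 0.0, 0.0]
degraded = [0.0, 0.0, 0.0] + reference[:7]

delay = estimate_delay(reference, degraded, max_delay=5)
```

Applying the same search to each utterance separately, rather than to the whole recording, would give a per-utterance delay estimate of the kind described above.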
Auditory transform. The reference and degraded signals are passed through an auditory transform that
mimics key properties of human hearing.
Disturbance processing. The disturbance parameters are calculated using non-linear averages over
specific areas of the error surface:
the absolute (symmetric) disturbance: a measure of absolute audible error
the additive (asymmetric) disturbance: a measure of audible errors that are significantly louder than the
reference
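A non-linear average of the kind mentioned above can be illustrated with an Lp-norm average, which gives more weight to large values as the exponent p increases. The sketch below is an assumption-laden example: the function name lp_average, the exponents and the disturbance values are chosen for illustration and are not taken from the PESQ model.

```python
def lp_average(values, p):
    """Lp-norm average: the p-th root of the mean of |v| to the power p."""
    return (sum(abs(v) ** p for v in values) / len(values)) ** (1.0 / p)

# Made-up per-frame disturbance values: three quiet frames and one loud error.
frame_disturbance = [0.1, 0.1, 0.1, 2.0]

# p = 1 is the ordinary mean of the absolute values.
mean_abs = lp_average(frame_disturbance, 1)

# A higher exponent (illustrative choice) lets the single large error dominate.
emphasised = lp_average(frame_disturbance, 6)
```

With a high exponent a short but severe error raises the result far more than it raises a plain mean, which is one way of reflecting the fact that listeners notice the worst moments of a call.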
These disturbance parameters are converted to a PESQ score, which ranges from –1 to 4.5. This may also
be converted to PESQ LQ, which is on a P.800 MOS-like scale from 1 to 5, as shown below:
Score   Speech Quality
5       Excellent
4       Good
3       Fair
2       Poor
1       Bad
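When reporting results it can be convenient to attach one of the labels above to a score. The sketch below picks the label of the nearest point on the 1 to 5 scale. The nearest-point rule and the names QUALITY_LABELS and quality_label are assumptions for this example, not a rule defined by PESQ or by this manual.

```python
# Labels taken from the scale above.
QUALITY_LABELS = {5: "Excellent", 4: "Good", 3: "Fair", 2: "Poor", 1: "Bad"}

def quality_label(score):
    """Return the label of the scale point nearest to a score from 1 to 5."""
    if not 1.0 <= score <= 5.0:
        raise ValueError("score must lie between 1 and 5")
    nearest = min(QUALITY_LABELS, key=lambda point: abs(point - score))
    return QUALITY_LABELS[nearest]

label = quality_label(4.2)
```

A score that lies exactly halfway between two points needs a tie-break rule, which should be chosen deliberately rather than left to the implementation.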
[Figure 48 block diagram: the reference signal, and the degraded signal from the system under test, each
pass through level alignment and an input filter. The two signals are then time aligned and equalised, and
each passes through an auditory transform. Disturbance processing and cognitive modelling follow, giving
the prediction of perceived speech quality. Bad intervals are identified and re-aligned.]