User guide

Aries Reference Manual Section 2 – Viewing and Analysing Data
Radio Systems Information Ltd. Page 43
3.2 Performance
Early models for quality assessment (for example, P.861 PSQM, P.861 MNB, PSQM+) were mainly
designed for assessing speech codecs and are unsuitable for use with today’s networks because they are:
- inaccurate in predicting quality with some important codecs
- unable to take proper account of noise or errors such as packet loss
- unable to account for the filtering effect of analogue elements (for example, handsets and 2-wire access)
- unable to deal with variable delay
PESQ compared with PSQM, PSQM+ and MNB
The ITU-T uses the correlation coefficient to measure how accurately models such as PESQ predict
subjective MOS, with P.800/P.830 subjective tests as the benchmark.
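As an illustration of the measure, the sketch below computes a plain Pearson correlation coefficient between a model's predicted scores and subjective MOS values. The score lists are hypothetical, invented only for the example, and the function name pearson is an assumption. Any further processing applied in the ITU-T evaluations is not described in this section and is not modelled here.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-condition scores, invented for illustration only.
subjective_mos = [1.8, 2.5, 3.1, 3.6, 4.2]
predicted_mos = [2.0, 2.4, 3.3, 3.5, 4.1]

r = pearson(predicted_mos, subjective_mos)
print(f"correlation coefficient: {r:.3f}")
```

A value close to 1 means the model ranks and spaces the test conditions in much the same way as the listening panel did.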
The table below presents correlation figures for 38 subjective tests that were available to the PESQ
developers.
No. tests  Type             Corr. coeff.  PESQ   PAMS   PSQM   PSQM+  MNB
19         Mobile Network   Average       0.962  0.954  0.924  0.935  0.884
                            Worst-case    0.905  0.895  0.843  0.859  0.731
9          Fixed Network    Average       0.942  0.936  0.881  0.897  0.801
                            Worst-case    0.902  0.805  0.657  0.652  0.596
10         VoIP Multi-type  Average       0.918  0.916  0.674  0.726  0.690
                            Worst-case    0.810  0.758  0.260  0.469  0.363
The table below presents figures from an independent evaluation of PESQ by four of the world’s leading
test labs. These tests cover a very broad range of fixed, mobile and VoIP networks as well as
combinations of different types of network.
Test        Type                                         Corr. coeff.
1           Mobile: real network measurements            0.979
2           Mobile: simulations                          0.943
3           Mobile: real network, per file only          0.927
4           Fixed: simulations, 4-32 kbit/s codecs       0.992
5           Fixed: simulations, 4-32 kbit/s codecs       0.974
6           VoIP: simulations                            0.971
7           Multiple network types: simulations          0.881
8           VoIP frame erasure concealment simulations   0.785
Average                                                  0.932
Worst-case                                               0.785
The average correlation measures how well a model performs across a wide range of conditions. The
worst-case correlation is equally important, because it shows what happens when a model is used in
the most challenging conditions.
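The two summary rows of the independent evaluation can be reproduced from the eight per-test figures. The sketch below is a simple check, assuming the average is the arithmetic mean of the eight correlations and the worst case is the lowest of them; the variable names are illustrative.

```python
from statistics import mean

# Correlation coefficients for tests 1-8, copied from the table above.
test_correlations = [0.979, 0.943, 0.927, 0.992, 0.974, 0.971, 0.881, 0.785]

# Average: arithmetic mean over all tests.
average = mean(test_correlations)

# Worst-case: the single lowest correlation across the tests.
worst_case = min(test_correlations)

print(f"Average    {average:.4f}")
print(f"Worst-case {worst_case:.3f}")
```

The unrounded mean is 0.9315, which the table reports as 0.932. The worst case, 0.785, comes from test 8.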
For every type of network, on both average and worst-case performance, PESQ is much better than
PSQM, PSQM+ and MNB. The gap is widest in the worst-case VoIP and multi-type tests (0.810 for PESQ
against 0.469 or less). PESQ is also slightly better than PAMS, particularly in worst-case performance.
In fact, the performance of PESQ was so good that the old recommendation P.861, which specified PSQM
and MNB, was withdrawn by the ITU as soon as it standardised PESQ as P.862.
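The size of PESQ's lead can be read directly from the first table in this section. The sketch below copies the table's figures and subtracts each model's correlation from PESQ's, row by row. The function name pesq_margins and the data layout are illustrative assumptions.

```python
# Correlation figures copied from the first table in this section.
MODELS = ["PESQ", "PAMS", "PSQM", "PSQM+", "MNB"]
ROWS = {
    ("Mobile Network", "Average"): [0.962, 0.954, 0.924, 0.935, 0.884],
    ("Mobile Network", "Worst-case"): [0.905, 0.895, 0.843, 0.859, 0.731],
    ("Fixed Network", "Average"): [0.942, 0.936, 0.881, 0.897, 0.801],
    ("Fixed Network", "Worst-case"): [0.902, 0.805, 0.657, 0.652, 0.596],
    ("VoIP Multi-type", "Average"): [0.918, 0.916, 0.674, 0.726, 0.690],
    ("VoIP Multi-type", "Worst-case"): [0.810, 0.758, 0.260, 0.469, 0.363],
}


def pesq_margins(rows):
    """For each table row, return PESQ's correlation minus each other model's."""
    margins = {}
    for key, values in rows.items():
        pesq = values[0]  # PESQ is the first column in MODELS
        margins[key] = {
            model: round(pesq - value, 3)
            for model, value in zip(MODELS[1:], values[1:])
        }
    return margins


margins = pesq_margins(ROWS)
for (network, kind), row in margins.items():
    cells = "  ".join(f"{model} +{gap:.3f}" for model, gap in row.items())
    print(f"{network:<16} {kind:<11} {cells}")
```

Every margin is positive. The margins over PAMS are small on average figures and larger on the worst-case rows, which matches the comparison made in the text.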