The Qualiverse - Intelligibility
I have been accused of “not listening” to my wife many times in the past. I should preface this by saying that I love my wife dearly, and on date nights or after the kids go to sleep, I am very attentive. However, there are times during the weekends where I am upstairs listening to a podcast, the kids are fighting about what to have for lunch, there is background music playing downstairs, and my wife picks THAT time to ask me, from all the way downstairs and at a whisper (probably), to take the trash out.
Fast forward an hour: the trash had not been taken out.
My wife argues that I wasn’t listening. I argue that her public address system did not have adequate intelligibility. That didn’t go over well, but the science was on my side!
Why is intelligibility so important?
The whole point of public address and speech reinforcement systems is to distribute voice audio from a talker to all expected listeners in an intelligible manner. Listeners should not have to struggle to understand what is being said over a properly installed audio system. Anyone who has attended a graduation ceremony in a middle school gym with horrible acoustics can attest to how fatiguing and difficult it can be to understand what is being said when the system’s intelligibility is not adequate. And that’s just at a cute ceremony. Imagine struggling to understand what is being said during an emergency evacuation! Measuring the ability of the system to accurately distribute audio signals is an incredibly important performance requirement, and one that is often overlooked.
Can it really be measured?
Why don’t integrators and technology managers demand that the intelligibility of their systems be measured? My guess is they don’t know they can. To be fair, it used to be very difficult and subjective to quantify how accurately a system delivers audio. If you think about it, human speech is a complex waveform, generated by many varying sources and transmitted to myriad unique receivers. Everyone’s voice is unique. Everyone’s ears are unique. Is it even possible to measure something as complex as speech reinforcement?
The first attempt at measurement
They tried. Just after World War II, scientists were trying to improve communication methods, ostensibly for submarine and airplane radio traffic. To do that, they needed a scheme to measure “speech quality”. They came up with 720 phonetically balanced sentences, which use phonemes at roughly the same frequency with which they occur in spoken English. The sentences are somewhat nonsensical or uncommon on purpose, so that when someone says them through a system, the listener has to rely on the speech quality (or intelligibility) of the system to determine the actual words. An example would be someone saying:
Tea served from the brown jug is tasty.
The receiver would have to interpret what they heard. Was it “tea served” or “teas served”? Was it “from the brown” or “from a brown”? Did they really just say “Tea soaked in Lebron James is tasty”?! (That is an actual mishearing of the phrase during testing performed at the University of Wisconsin-Madison [1].) This was a great first step, but it was very time-consuming, expensive, and subjective, and certainly not repeatable with different talkers and listeners. Something else was required.
The Introduction of STI
Enter the Speech Transmission Index (STI). This genius methodology starts from the understanding that speech is complex. To measure intelligibility, we have to look at audio information across the frequency range of speech, in the seven octave bands from 125 Hz to 8 kHz. We also need to look at the audio signal at different modulation frequencies to capture the effect the system has on the speech, which gives us the “Modulation Transfer Function” (MTF). By modulating the source at different frequencies, we can get a sense of the impact that background noise, reverberation, echo, and other room effects have on the audio system. STI takes audio in each of the 7 speech octave bands, modulates it at 14 different frequencies, and normalizes each result (puts it on a scale from 0 to 1), which gives us 7 × 14 = 98 points of data to play with. That STI matrix can clue us in to which room contributions are adversely affecting our audio signal, whether it’s a noisy space, a reverberant space, or an echo-y space. But people want an easy-to-understand, easy-to-consume description of how intelligible the system is. So STI takes this normalized matrix of information and reduces it, as a weighted sum, to one number for overall intelligibility.
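For readers who like to see the arithmetic, here is a minimal Python sketch of that reduction. It assumes the commonly described steps: each modulation ratio m becomes an apparent signal-to-noise ratio, is clipped to ±15 dB, and is rescaled to a 0 to 1 value. The equal band weights are a placeholder of my own; the actual standard applies its own band weighting and corrections, so treat the function names and weights as illustrative only.

```python
import math

# Seven octave bands and fourteen modulation frequencies (7 x 14 = 98 cells).
OCTAVE_BANDS_HZ = [125, 250, 500, 1000, 2000, 4000, 8000]
MODULATION_FREQS_HZ = [0.63, 0.8, 1.0, 1.25, 1.6, 2.0, 2.5,
                       3.15, 4.0, 5.0, 6.3, 8.0, 10.0, 12.5]

# ASSUMPTION: equal weights as a stand-in for the standard's band weighting.
BAND_WEIGHTS = [1.0 / len(OCTAVE_BANDS_HZ)] * len(OCTAVE_BANDS_HZ)


def transmission_index(m):
    """Map one modulation ratio m (0..1) to a normalized 0..1 value."""
    if m <= 0.0:
        return 0.0  # modulation completely lost
    if m >= 1.0:
        return 1.0  # modulation perfectly preserved
    snr_db = 10.0 * math.log10(m / (1.0 - m))  # apparent signal-to-noise ratio
    snr_db = max(-15.0, min(15.0, snr_db))     # clip to the +/-15 dB range
    return (snr_db + 15.0) / 30.0              # rescale to 0..1


def sti_from_mtf(mtf, weights=BAND_WEIGHTS):
    """Reduce a bands-by-modulations matrix of m values to a single index."""
    if len(mtf) != len(weights):
        raise ValueError("one row of modulation ratios is needed per band")
    # Average the normalized values within each octave band...
    band_indices = [
        sum(transmission_index(m) for m in row) / len(row) for row in mtf
    ]
    # ...then combine the bands as a weighted sum.
    return sum(w * b for w, b in zip(weights, band_indices))
```

With this sketch, a system that preserves exactly half of the modulation in every cell (m = 0.5 everywhere) lands at about 0.5, and a system that preserves all of it lands at about 1.0.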
STI | Speech Intelligibility
0.00 – 0.30 | Bad
0.30 – 0.45 | Poor
0.45 – 0.60 | Fair
0.60 – 0.75 | Good
0.75 – 1.00 | Excellent
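If you want to apply that scale in a report or a script, a small lookup is enough. This is only a sketch: the table's ranges share their end points, so I am assuming that a boundary value such as 0.30 belongs to the higher rating, and the function name is made up for illustration.

```python
# Upper bound of each rating band, in ascending order.
# ASSUMPTION: a value exactly on a boundary takes the higher rating.
RATING_BANDS = [
    (0.30, "Bad"),
    (0.45, "Poor"),
    (0.60, "Fair"),
    (0.75, "Good"),
    (1.00, "Excellent"),
]


def rate_intelligibility(sti):
    """Return the qualitative rating for an STI value between 0 and 1."""
    if not 0.0 <= sti <= 1.0:
        raise ValueError("STI must be between 0 and 1")
    for upper_bound, label in RATING_BANDS:
        if sti < upper_bound:
            return label
    return "Excellent"  # only reached when sti is exactly 1.0
```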
As great as this method is, it takes time and processing power. Coming up with that one number for the STI takes about 15 minutes. Obviously, we want to do it a few times to ensure accuracy, and we need to do it at several locations in the room to get a sampling of listening positions. Measuring intelligibility this way takes several hours, which makes that single test expensive to perform. It also raises the question: do we really need all 98 points of data in the MTF matrix?
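To put a rough number on “several hours”, here is the back-of-the-envelope calculation. The 15 minutes per measurement comes from the text above; the 3 repeats and 6 listening positions are assumptions I picked purely for illustration.

```python
def survey_time_hours(minutes_per_measurement, repeats_per_position, positions):
    """Total measurement time for a room survey, in hours."""
    total_minutes = minutes_per_measurement * repeats_per_position * positions
    return total_minutes / 60.0


# ASSUMPTION: 3 repeats at each of 6 listening positions (illustrative values).
full_sti_hours = survey_time_hours(15, 3, 6)
print(f"{full_sti_hours:.1f} hours")  # 15 * 3 * 6 = 270 minutes = 4.5 hours
```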
A Better Approach
Enter the Speech Transmission Index for Public Address (STIPA). STIPA still looks at 7 octave bands and 14 modulation frequencies, but by using a modulated noise source and measuring only 14 of the 98 data points (two per octave band), it greatly reduces the time to perform a measurement and compute the result, from 15 minutes per measurement to about 10 seconds. This is the way.
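The sketch below shows what “two per octave band” looks like in practice. The specific pairing of modulation frequencies to bands is the one commonly tabulated for STIPA, but I am reproducing it from memory, so treat it as an assumption and check it against the standard before relying on it.

```python
# Octave band (Hz) -> the two modulation frequencies (Hz) measured in that band.
# ASSUMPTION: commonly tabulated STIPA pairing; verify against the standard.
STIPA_MODULATION_HZ = {
    125: (1.6, 8.0),
    250: (1.0, 5.0),
    500: (0.63, 3.15),
    1000: (2.0, 10.0),
    2000: (1.25, 6.25),
    4000: (0.8, 4.0),
    8000: (2.5, 12.5),
}

FULL_STI_POINTS = 7 * 14  # the full 98-cell matrix

# Count the cells STIPA actually measures.
stipa_points = sum(len(pair) for pair in STIPA_MODULATION_HZ.values())

# Every modulation frequency is used once, so all 14 are still covered.
distinct_freqs = {f for pair in STIPA_MODULATION_HZ.values() for f in pair}

fraction_measured = stipa_points / FULL_STI_POINTS  # 14 / 98 = 1/7
speedup = (15 * 60) / 10  # 15 minutes vs. about 10 seconds per measurement

print(stipa_points, len(distinct_freqs), speedup)
```

In other words, STIPA samples one seventh of the matrix, and going from 15 minutes to about 10 seconds makes each measurement roughly 90 times faster.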
There is also a Speech Transmission Index for Telecommunication Systems (STITEL), which is another simplified version of STI. It measures only 7 data points, one modulation frequency in each of the 7 octave bands, so STIPA is actually more rigorous. As long as vocoders aren’t used in the audio compression algorithms, STITEL or STIPA may be a valid way to measure intelligibility through a conferencing system, which can be very useful. But let’s focus on STIPA for now.
STIPA produces an intelligibility index just like STI. Technology managers can then determine what level of intelligibility is acceptable for their users. Life safety systems, with typically short, repeated messages, can be as low as 0.55 and still be acceptable. However, for executive conference rooms, where meetings can last all day, it might be fatiguing to have systems with a STIPA of less than 0.70. We also need to keep in mind that there is a cost to improving the intelligibility of a system, and technology can only get us so far. Room reverberation time, ambient noise, HVAC system balancing, and construction materials and methods all play critical roles in how intelligible a space will be. The cost of improving these factors can add up quickly (think hundreds of thousands of dollars), so intelligibility requirements need to be carefully considered. But they should be considered. AQAV recommends requiring conference rooms to have a STIPA of AT LEAST 0.62. For most small conference spaces, where the loudspeakers are reasonably close to the listeners and the room volume is reasonably small with little reverberation, this can be easily attained.
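A requirement like this is easy to check once the measurements are in hand. The sketch below treats the three figures mentioned above as minimums per room type; the room-type labels, the function name, and the sample readings are all hypothetical, and your own project requirements should replace them.

```python
# Minimum acceptable STIPA per room type, using the figures discussed above.
# ASSUMPTION: the labels are illustrative; substitute your own requirements.
MIN_STIPA = {
    "life_safety": 0.55,
    "conference_room": 0.62,
    "executive_conference_room": 0.70,
}


def failing_positions(room_type, measurements):
    """Return the listening positions whose STIPA is below the minimum."""
    minimum = MIN_STIPA[room_type]
    return {pos: value for pos, value in measurements.items() if value < minimum}


# Hypothetical readings from two listening positions in one room.
readings = {"head of table": 0.71, "back corner": 0.58}
print(failing_positions("conference_room", readings))
```

Checking every measured position, instead of averaging them, keeps one good seat from hiding a bad one.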
Again, why is it so important?
As designers and installers of audio systems, our focus needs to be delivering intelligible audio that is easy and pleasurable to listen to. By specifying an intelligibility requirement, we are assuring system users that we have their listening enjoyment in mind. It is important to be able to measure intelligibility repeatably to avoid the subjective nature of describing how well a system is reproducing a source signal. It is not enough to rely on opinions when lives are potentially on the line in the case of life-safety systems. STIPA offers a relatively inexpensive, rapid, consistent method to provide that measurement for deployed public address systems, and perhaps other audio systems as well.
I tried to explain all this to my wife the last time I didn’t take the garbage out. I’m not saying this will win you any arguments with your significant other. It definitely won’t do that. But it might offer them an explanation as to why you “weren’t listening”, or at the very least, stun them for a second with the crazy turn the conversation has taken, and lighten the mood a bit. My dad used to say, “If you can’t dazzle them with brilliance, baffle them with nonsense” (or another colorful term that rhymes with “full pit”). Bringing up measuring intelligibility during a heated argument with your loved ones might just accomplish both!
References
- [1] The “Harvard Sentences” Secretly Shaped the Development of Audio Tech, by Sarah Zhang: https://gizmodo.com/the-harvard-sentences-secretly-shaped-the-development-1689793568