Audio Compression & Limiting - low CPU%

by **trogluddite** » Wed Feb 19, 2020 5:20 pm

Firstly, thanks for the detailed explanation of your project. I'm always fascinated by projects which are not the usual synthesisers/effects, and I also have a personal interest in atypical hearing (I have difficulties of my own resolving sound sources from background noise - perceptual rather than physiological in my case).

Much of the maths required for your project is beyond me - I shall defer to Martin there - but I can answer a couple of the simpler questions!

- In practical terms, "ducking" usually refers to connecting the compressor side-chain to a different audio source than the one for which gain is controlled. The usual example is the announcements on radio shows, where the announcer's voice is sent to the side-chain of a compressor on the music channel, so that the music reduces in level whenever they are speaking.

- Square root is possible, though there is a caveat. The SSE2 'sqrtps' opcode will compute the square root for all four SSE channels. However, 'sqrtps' is limited to approximately 11 bits of precision ("half-precision"), so I'm unsure whether this will be sufficient to meet your need for such a low threshold. Unfortunately, the FPU 'fsqrt' opcode doesn't seem to be recognised, so there is no instruction available which is more precise. If you need greater precision, you would have to use e.g. Newton's method to refine the result of 'sqrtps'.

- In assembly, it is possible to modify hopping so that the phase is controlled in addition to the interval. This allows the CPU peaks of hopped execution to be spread more evenly across time. This requires just one extra opcode...

Code:
mov eax, ecx;
add eax, 10;    // Hop phase in samples.
and eax, 127;   // Hop interval (must be a power of two) minus one.
jnz BypassCode;
// Hopped code here.
BypassCode:
// End of hopped code.
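To see what the phase offset does, here is a minimal Python sketch of the same test. It assumes, as in the snippet above, that ecx holds a running sample counter and that the hopped code runs only when the masked sum is zero (jnz skips it otherwise). The function names are illustrative and are not part of Flowstone.

```python
def hop_fires(sample_index, phase, interval):
    """Return True when the hopped code would run for this sample.

    Mirrors: mov eax,ecx / add eax,phase / and eax,interval-1 / jnz Bypass.
    Assumes `interval` is a power of two, as the AND mask requires.
    """
    if interval <= 0 or interval & (interval - 1):
        raise ValueError("interval must be a power of two")
    return ((sample_index + phase) & (interval - 1)) == 0


def firing_samples(phase, interval, n_samples):
    """List the sample indices at which the hopped code runs."""
    return [i for i in range(n_samples) if hop_fires(i, phase, interval)]


if __name__ == "__main__":
    # Two hopped routines with the same interval but different phases
    # never run on the same sample, so their CPU cost is interleaved.
    print(firing_samples(0, 128, 512))
    print(firing_samples(10, 128, 512))
```

With an interval of 128, a phase of 0 fires on samples 0, 128, 256, ... while a phase of 10 fires on samples 118, 246, 374, ..., so two routines given different phases do not land on the same sample.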

by **wlangfor@uoguelph.ca** » Wed Feb 19, 2020 5:23 pm

trogluddite wrote:Square root is possible, though there is a caveat. The SSE2 'sqrtps' opcode will compute the square root for all four SSE channels. However, 'sqrtps' is limited to approximately 11 bits of precision ("half-precision"), so I'm unsure whether this will be sufficient to meet your need for such a low threshold. Unfortunately, the FPU 'fsqrt' opcode doesn't seem to be recognised, so there is no instruction available which is more precise. If you need greater precision, you would have to use e.g. Newton's method to refine the result of 'sqrtps'.

- In assembly, it is possible to modify hopping so that the phase is controlled in addition to the interval. This allows the CPU peaks of hopped execution to be spread more evenly across time. This requires just one extra opcode...
Code:
mov eax, ecx;
add eax, 10;    // Hopping phase in samples.
and eax, 127;   // Hop interval (must be a power of two) minus one.
jnz BypassCode;
// Hopped code here.
BypassCode:
// End of hopped code.

Nice post, thanks for that info. And by the way, Cyto had some really good RMS examples too, should you wish to see more. Some use less CPU, others more; it comes down to trial and error.

by **trogluddite** » Wed Feb 19, 2020 5:37 pm

PS) I should also add that the effect of spreading the hopping phase may be limited unless the hop interval is significantly greater than the audio-driver (ASIO/DS) buffer size. Audio is always processed in buffer-sized blocks, for which our "single-sample" code provides only the "body" of the loop - and we have no control over when the CPU scheduler will switch threads.
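The buffer-size point can be checked with a small simulation. This Python sketch is an illustration only: it assumes buffers start on multiples of the buffer size and models hopping with the same masked-counter test as the assembly snippet earlier in the thread. `executions_per_buffer` is a made-up helper name, not a Flowstone function.

```python
def executions_per_buffer(phase, interval, buffer_size, n_buffers):
    """Count how many times hopped code runs inside each driver buffer.

    `interval` is assumed to be a power of two, as in the assembly version.
    """
    mask = interval - 1
    counts = []
    for b in range(n_buffers):
        start = b * buffer_size
        counts.append(sum(1 for i in range(start, start + buffer_size)
                          if ((i + phase) & mask) == 0))
    return counts


if __name__ == "__main__":
    # Hop interval shorter than the buffer: the phase changes nothing per buffer.
    print(executions_per_buffer(0, 128, 512, 4))
    print(executions_per_buffer(10, 128, 512, 4))
    # Hop interval longer than the buffer: the phase selects which buffer pays.
    print(executions_per_buffer(0, 2048, 512, 4))
    print(executions_per_buffer(1024, 2048, 512, 4))
```

With a 512-sample buffer and a hop interval of 128, every buffer contains four executions whatever the phase, so the load per buffer is unchanged. Only when the interval exceeds the buffer size (2048 here) do different phases fall into different buffers.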

by **wlangfor@uoguelph.ca** » Wed Feb 19, 2020 9:11 pm

trogluddite wrote:PS) I should also add that the effect of spreading the hopping phase may be limited unless the hop interval is significantly greater than the audio-driver (ASIO/DS) buffer size. Audio is always processed in buffer-sized blocks, for which our "single-sample" code provides only the "body" of the loop - and we have no control over when the CPU scheduler will switch threads.

Thanks, I hadn't known this, or frankly about the sqrt either. I've always used other examples, but it would seem this could save CPU as a hack, as you say.

by **steph_tsf** » Wed Feb 19, 2020 9:45 pm

Thanks, trogluddite. I think I see the duck (ambient music) floating on the water surface, then diving into it, then showing up again.

Regarding the square root, or any other x86 SSE routine, could you please attach an x86 SSE Flowstone canvas that lets me insert a particular routine between points A and B in the code, and that executes the routine under test 100 times in a row?

I think I will enjoy focusing on the true RMS detector and determining the cause of the issue I am facing. Is it the way I am using it, some internal noise, or some internal DC drift? This RMS detector (which is not my design) relies on Martin Vicanek's general-purpose x86 SSE "full blue" routine, which computes "A to the power of X". Here, A equals the signal, and X equals 0.5 (for the square root). Obviously this is sub-optimal. I guess Martin Vicanek has already authored and published a more efficient routine that could be used here, one that only computes "A to the power of 0.5", not in full 32-bit x86 SSE precision, but precise to 1e-7, which is -140 dBFS.
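For reference, the -140 dBFS figure follows from 20·log10(1e-7) = -140. The Python sketch below shows that conversion together with one common textbook form of running RMS detector (one-pole smoothing of the squared signal, then a square root). This is an assumption made for illustration: it is not the detector discussed in this post, and it does not use Martin Vicanek's power routine. The names `to_dbfs` and `rms_detector` are hypothetical.

```python
import math


def to_dbfs(amplitude):
    """Convert a linear amplitude (full scale = 1.0) to dBFS."""
    return 20.0 * math.log10(amplitude)


def rms_detector(samples, sample_rate, time_constant):
    """Running RMS: one-pole smoothing of the squared signal, then sqrt.

    `time_constant` is the smoothing time in seconds (an assumed parameter).
    """
    coeff = math.exp(-1.0 / (time_constant * sample_rate))
    mean_square = 0.0
    out = []
    for x in samples:
        mean_square = coeff * mean_square + (1.0 - coeff) * x * x
        out.append(math.sqrt(mean_square))
    return out


if __name__ == "__main__":
    print(to_dbfs(1e-7))
    # A constant input of 0.5 should settle at an RMS reading of 0.5.
    print(rms_detector([0.5] * 48000, 48000, 0.01)[-1])
```

In a detector of this shape the square root is applied to the smoothed mean square, so any error in the square root at very small values shows up directly in the detected level near the threshold.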

Apparently, all roads lead to Rome: if the octave band detector that relies on a Hilbert pair delivers a usable result, the exact same need arises for an "A to the power of 0.5" x86 SSE routine, not in full 32-bit x86 SSE precision, but precise to 1e-7, which is -140 dBFS.

I am curious about how to subjectively assess the parasitic ripple coming from an RMS detector or a Hilbert pair detector. I am considering adding the ripple to a constant signal that drives an AM modulator. This way I will be able to subjectively assess the maximum tolerable AM percentage as a function of the modulating signal frequency. A spectrum analyzer will of course show sidebands when one inputs a pure sine and relies on a long FFT, say a 16,384-sample FFT. Unfortunately, music and speech are not pure sine signals, so the spectrum analyzer is of little use. Are there conventions or standards to follow for assessing the subjective audio quality of a side-chain compressor?
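For the pure-sine case, the sideband level can be predicted from the modulation depth. If the gain is 1 + m·cos(2π·fm·t), each sideband at fc ± fm has amplitude m/2 relative to the carrier, i.e. 20·log10(m/2) dBc. The Python sketch below checks this numerically over 16,384 samples. It is a sketch under assumptions: the carrier and ripple are placed exactly on DFT bins to avoid leakage, the ripple is a single cosine, and all function names are made up for illustration.

```python
import math

N = 16384  # analysis length, matching the FFT size mentioned above


def bin_amplitude(signal, k):
    """Amplitude of the sinusoid occupying DFT bin k (single-bin DFT)."""
    n_total = len(signal)
    re = sum(x * math.cos(2.0 * math.pi * k * n / n_total)
             for n, x in enumerate(signal))
    im = sum(x * math.sin(2.0 * math.pi * k * n / n_total)
             for n, x in enumerate(signal))
    return 2.0 * math.hypot(re, im) / n_total


def am_signal(carrier_bin, ripple_bin, depth, n=N):
    """Sine carrier whose gain is 1 + depth*cos(ripple), both bin-centred."""
    return [(1.0 + depth * math.cos(2.0 * math.pi * ripple_bin * i / n))
            * math.sin(2.0 * math.pi * carrier_bin * i / n)
            for i in range(n)]


def sideband_dbc(depth):
    """Predicted level of each sideband relative to the carrier, in dBc."""
    return 20.0 * math.log10(depth / 2.0)


if __name__ == "__main__":
    sig = am_signal(1000, 20, 0.01)  # 1% ripple on the gain
    carrier = bin_amplitude(sig, 1000)
    upper = bin_amplitude(sig, 1020)
    print(20.0 * math.log10(upper / carrier), sideband_dbc(0.01))
```

A 1% gain ripple therefore puts each sideband roughly 46 dB below the carrier. This only quantifies the sine case; it does not answer the subjective question for music or speech.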

By the way, various building blocks may serve to emulate, on Flowstone, a modern, standardized open-source hearing aid that I am progressively building and testing here: viewtopic.php?f=2&t=38053

This helps to demystify and clarify all technical aspects of audio correction. How to measure a hearing impairment, and how to fit a hearing aid, both become evident once you are working on such a standardized open-source realtime platform. Such a platform is due to evolve in a well-structured way, propelled by demand on one side, and by silicon innovation, wireless innovation, etc. on the other side. It is hoped that all young people and scholars dealing with audiology will get accustomed to such an open-source realtime platform during their studies.

By open-source, I don't mean Flowstone itself. There is no intention of driving Flowstone towards open-source.
By open-source, I only mean the realtime audio DSP applications, designed using Flowstone, and running on Flowstone.

One needs to keep an eye on other silicon and operating systems.
Imagine M$ progressively dropping the x86 SSE architecture and steadily favoring the x64 architecture. I want to know the consequences for the Flowstone community.
Imagine young people and scholars turning away from computers that require more than 3 seconds to boot. What kind of silicon and what kind of operating system will they massively adopt? Does it exist yet?
Just curious.

by **trogluddite** » Wed Feb 19, 2020 10:44 pm

steph_tsf wrote:could you please attach an x86 SSE Flowstone canvas that lets me insert a particular routine between points A and B in the code, and that executes the routine under test 100 times in a row?

Here are templates for assembly loops with either a fixed loop count or the count set by a stream input.

Attachment: Assembly Looping.fsm (664 Bytes)

You may also find the 'Analyser' component from the toolbox useful - it allows any DSP/ASM code (including networks of multiple parts) to be executed in a loop of any size. It also returns an Array showing the CPU clock cycles for every iteration, so you can also see whether CPU load is consistent or has "spikes". Don't be concerned that it uses poly streams; this is only to allow FS to control the number of samples, the results are the same as if mono/mono4 were used.

Also there is this 'Average CPU Analyser' module which allows easy comparison of code-A vs. code-B, and shows the CPU cycles as a neat graph (it is from the SM days; I forget now who created it)...

Attachment: Average CPU Analyser.fsm (22.38 KiB)
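The idea behind these tools (run the routine many times, record the cost of every iteration, then look at both the average and the spikes) can be sketched outside Flowstone as well. The Python below is only an analogy: it measures wall-clock nanoseconds rather than CPU clock cycles, and `time_iterations` and `summarise` are hypothetical names, not part of the Analyser component or the attached schematics.

```python
import time


def time_iterations(routine, iterations=100):
    """Run `routine` repeatedly and return the elapsed nanoseconds of each call."""
    timings = []
    for _ in range(iterations):
        start = time.perf_counter_ns()
        routine()
        timings.append(time.perf_counter_ns() - start)
    return timings


def summarise(timings):
    """Return (mean, worst case) so that spikes stand out from the average."""
    return sum(timings) / len(timings), max(timings)


if __name__ == "__main__":
    # Compare two candidate routines, "A" and "B", over 100 runs each.
    mean_a, worst_a = summarise(time_iterations(lambda: sum(range(1000))))
    mean_b, worst_b = summarise(time_iterations(lambda: sum(range(2000))))
    print(mean_a, worst_a)
    print(mean_b, worst_b)
```

Keeping the per-iteration list, rather than only a total, is what makes it possible to tell a consistently slow routine from one that is fast on average but occasionally spikes.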

by **adamszabo** » Thu Feb 20, 2020 9:47 am

Trog, in the assembly looping schematic you posted, is it possible to input different variables? So let's say I write a simple sawtooth oscillator inside the loop and I loop it 10 times: is it possible to give each iteration a different frequency, then add their outputs together at the end?

by **trogluddite** » Thu Feb 20, 2020 1:51 pm

adamszabo wrote:Trog, in the assembly looping schematic you posted, is it possible to input different variables?

The only restriction is that the number of loop iterations has to be the same for all SSE channels; other than that you can put whatever code you like as the loop body, and there's no restriction on what variables and registers you use there (so long as you keep the matching eax push/pops). The first example, with the fixed loop iterations, is exactly the same assembly that gets produced by the DSP loop(x){...} instruction.
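As a language-neutral illustration of the oscillator-bank idea from the question, the Python sketch below runs one loop iteration per oscillator, with the frequency and phase for each iteration held in lists indexed by the loop counter. How that per-iteration state would be stored and indexed in Flowstone assembly is not covered in this thread, so the indexing scheme here is an assumption, and `saw_bank` is a made-up name.

```python
def saw_bank(frequencies, sample_rate, n_samples):
    """Sum several naive (non-band-limited) sawtooth oscillators.

    One inner-loop iteration per oscillator; each iteration reads its own
    frequency and updates its own phase, selected by the loop counter k.
    """
    phases = [0.0] * len(frequencies)  # per-oscillator state
    out = []
    for _ in range(n_samples):
        total = 0.0
        for k, freq in enumerate(frequencies):  # the "loop body"
            total += 2.0 * phases[k] - 1.0      # map phase 0..1 to -1..+1
            phases[k] = (phases[k] + freq / sample_rate) % 1.0
        out.append(total)
    return out


if __name__ == "__main__":
    # Two oscillators at 100 Hz and 200 Hz, summed, at a 400 Hz sample rate.
    print(saw_bank([100.0, 200.0], 400.0, 4))
```

The loop count is the same on every sample, which matches the one restriction mentioned above; only the data read inside the loop body changes from one iteration to the next.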

by **martinvicanek** » Thu Feb 20, 2020 9:12 pm

trogluddite wrote:[...]'sqrtps' is limited to approximately 11-bits of precision ("half-precision")

Are you sure about that? I seem to get full precision.

by **trogluddite** » Thu Feb 20, 2020 9:51 pm

martinvicanek wrote:Are you sure about that? I seem to get full precision.

Aaah - my mistake; it's the reciprocal square root "rsqrtps" that has limited precision, the same as the "rcpps" reciprocal. I must have missed the little "r" in my search results, because I did go looking to check (it might have helped if search engines stopped being so "helpful" by looking for random crap based on "social media" addicts' search histories, and just showed what I actually damned well asked for - well, that's my excuse, anyway! :oops:)

Thanks for the correction, Martin - can't argue with the numbers!
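For anyone who does use a low-precision reciprocal square root estimate, the Newton refinement mentioned earlier in the thread is a single step: given y ≈ 1/sqrt(x), compute y' = y·(1.5 − 0.5·x·y²), which roughly squares the relative error. The Python sketch below demonstrates the arithmetic only. `crude_rsqrt` merely imitates a limited-precision estimate by truncating a float32 mantissa to 11 bits; that is an assumption for illustration and not how the hardware instruction works. All names are hypothetical.

```python
import math
import struct


def crude_rsqrt(x, kept_bits=11):
    """Stand-in for a low-precision estimate of 1/sqrt(x).

    Takes the exact value and truncates its float32 mantissa to `kept_bits`
    bits. This only imitates the precision of a hardware estimate.
    """
    exact = 1.0 / math.sqrt(x)
    (bits,) = struct.unpack("<I", struct.pack("<f", exact))
    bits &= ~((1 << (23 - kept_bits)) - 1)  # clear the low mantissa bits
    (approx,) = struct.unpack("<f", struct.pack("<I", bits))
    return approx


def refine_rsqrt(x, y):
    """One Newton-Raphson step for y ~ 1/sqrt(x)."""
    return y * (1.5 - 0.5 * x * y * y)


def refined_sqrt(x):
    """sqrt(x) computed as x * (1/sqrt(x)) after one refinement step."""
    return x * refine_rsqrt(x, crude_rsqrt(x))


if __name__ == "__main__":
    x = 2.0
    exact = math.sqrt(x)
    print(abs(x * crude_rsqrt(x) - exact) / exact)  # error before refinement
    print(abs(refined_sqrt(x) - exact) / exact)     # error after one step
```

Since 'sqrtps' itself turned out to give full precision, this refinement only matters when the faster reciprocal estimate is chosen deliberately to save CPU.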
