can this be simplified / optimized for speed?

by **tester** » Sat Aug 09, 2014 3:47 pm

This is an old design created on the SM forum. Quick question: can this be optimized/simplified? If one such unit is running, there is no problem, but with 40 of them on board, things start to get heavy.

The alternative is to use Trog's FFT/iFFT module, but again, a single one (stereo) is fine, while two of them (at 32k points) together with other stuff on board start to be too much. (Or maybe not? While it shows low CPU usage, I get glitches...)

by **trogluddite** » Sat Aug 09, 2014 4:58 pm

You're using green and the mono4 channels well there, so the only optimisations I can see would be to reduce the amount of reading/writing to memory - pretty much all assembly stuff I'm afraid!

- The DSP code compiler is pretty rubbish - I can see a fair few redundant reads/writes in the bi-quad's compiler output.
- Combine the 'chains' of bi-quads into single primitives for fewer stream in/out reads and writes.
- That would also allow 'shufps' optimisation of the pack/unpack loop.

No idea how much more efficient that would be - but just one 'cache miss' when accessing memory can easily cost more CPU cycles than the rest of that code put together, so I suspect that it would be well worth doing for the number of copies that you need.
I'm pretty busy for the rest of the weekend, but if no-one else jumps in, I'll take a look when I get time - remind me in a few days if you haven't got anywhere with it.
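A rough Python sketch of the "combine the chain into one primitive" idea (plain Python, not FlowStone DSP code or assembly, so it shows the data flow rather than the speed; the function names and coefficients are hypothetical). Both functions compute the same chain of Direct Form I bi-quads. The first runs each stage over the whole buffer and writes an intermediate buffer for the next stage to read back; the second runs every stage per sample and stores each delay history only once, because a stage's input history is exactly the previous stage's output history.

```python
# Hypothetical sketch: a chain of Direct Form I bi-quads computed two ways.
# Each stage is a tuple (b0, b1, b2, a1, a2), already normalised by a0.

def cascade_stage_by_stage(x, stages):
    """Each stage processes the whole buffer and writes a new buffer that
    the next stage reads back (the memory traffic we want to avoid)."""
    buf = list(x)
    for b0, b1, b2, a1, a2 in stages:
        x1 = x2 = y1 = y2 = 0.0
        out = []
        for s in buf:
            y = b0 * s + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
            x2, x1 = x1, s
            y2, y1 = y1, y
            out.append(y)
        buf = out
    return buf


def cascade_fused(x, stages):
    """All stages run per sample. hist[0] holds (x[n-1], x[n-2]) of the chain
    input; hist[k] for k >= 1 holds the last two outputs of stage k-1, which
    are also the last two inputs of stage k, so each history is stored once."""
    hist = [[0.0, 0.0] for _ in range(len(stages) + 1)]
    out = []
    for s in x:
        v = s
        for k, (b0, b1, b2, a1, a2) in enumerate(stages):
            in1, in2 = hist[k]
            out1, out2 = hist[k + 1]
            y = b0 * v + b1 * in1 + b2 * in2 - a1 * out1 - a2 * out2
            hist[k] = [v, in1]  # update the history feeding this stage
            v = y               # this stage's output is the next stage's input
        hist[-1] = [v, hist[-1][0]]  # history of the final output
        out.append(v)
    return out
```

In compiled code the fused form would keep the running value in a register and roughly halve the stored history per stage, which is where any saving in memory traffic would come from.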

by **tester** » Sat Aug 09, 2014 5:37 pm

No rush, this is a prototyping part right now, so it may not work as expected at all.

I'm reconsidering a third solution: caching part of the audio and remixing it with the ongoing part, driven by your FFT/iFFT guts.

by **KG_is_back** » Sat Aug 09, 2014 6:33 pm

Yes... for a chain of biquads, there are many ways to reduce CPU. One thing that comes to mind is to put them into a single code block and use the output of the previous one directly as the input for the next. That avoids a lot of completely pointless memory writing, because what is "in" in one block is exactly the same as "out" in the previous one, including the delay chain.
Another way of reducing reads/writes is to use a circular buffer instead of a delay chain (in assembly only; in code, array reads/writes are done differently (per channel), so that would be a CPU blast). However, the delay chain is only 3 variables long, so the gain from writing less to memory might be eaten up by the need to calculate the index for the loop.
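A minimal Python sketch of the trade-off described above, with hypothetical class names: a shifting delay chain rewrites every slot on every sample, while a circular buffer writes a single slot but has to advance and wrap an index. For a chain only a few values long, that index arithmetic can cost about as much as the writes it saves.

```python
# Hypothetical sketch: two ways to delay a signal by `length` samples.

class ShiftingDelay:
    """Delay chain: every sample, each stored value moves one slot along,
    so a delay of length L costs L writes per sample."""

    def __init__(self, length):
        self.buf = [0.0] * length

    def process(self, s):
        out = self.buf[-1]  # oldest value, written `length` samples ago
        for i in range(len(self.buf) - 1, 0, -1):
            self.buf[i] = self.buf[i - 1]
        self.buf[0] = s
        return out


class CircularDelay:
    """Circular buffer: one read and one write per sample, plus the cost of
    advancing and wrapping the index."""

    def __init__(self, length):
        self.buf = [0.0] * length
        self.pos = 0

    def process(self, s):
        out = self.buf[self.pos]  # written `length` samples ago
        self.buf[self.pos] = s    # overwrite it with the newest sample
        self.pos = (self.pos + 1) % len(self.buf)  # index update and wrap
        return out
```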

By the way, are you concerned that your schematic will introduce a one-sample delay, because you have the pack/unpack connected sort of in feedback?

by **tester** » Sat Aug 09, 2014 10:03 pm

Small delays are not a problem.

Basically, I'm thinking of reconstructing (at least to some degree) a process that I do manually. Normally it takes a lot of HDD space and time... and invention if you wish to do something else. Originally I used the CoolEdit (aka Adobe Audition, old versions only) FFT filter, but that was back in SM times, when there were no Trog's FFT/iFFT modules. It turned out that it's possible to switch to Butterworth filters and get somewhat similar results, although I'm not sure they can really stay in the particular project I have in mind. There seem to be subtle but vital differences in the transparency produced by the FFT process.

The concept is this. Once I have prepared what is to be filtered, a lot of narrow, sharp filters are added to that layer. With the FFT filter (at high resolution), these are just array data windows combined together and placed on it. With Butterworths, there will be 10-20 individual sharp filters in place (like this one), tuned to cover ranges of a few Hz to 40 Hz. But before anything else is done, the filtered layer is mixed with a resampled version of itself, which means that in live mode the FFT filter can't be used (because the frequency windows don't rescale 1:1), and with Butterworths the number of filters doubles. Alternatively, the first layer could be recorded for some time (ideally it would be cached, but it's a 30-60 minute file...) and used as an in-mix during playback. Probably I will end up making just the top part, to keep the schematic running live (because I'd like to add some filters there for shaping the background, and some modulators, to get different directions of effects).
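Assuming "mixed with a resampled version of itself" means the layer is replayed at a different speed (ratio r), every frequency f moves to r·f and every band width scales by r as well. This short Python sketch (hypothetical helper names) shows why the filter bank doubles: the resampled copy needs its own set of bands, and their scaled widths no longer match the fixed bin width fs/N of an FFT filter.

```python
# Hypothetical sketch, assuming "resampled" means replayed at `ratio` times
# the original speed, which scales every frequency by `ratio`.

def resample_bands(bands, ratio):
    """Map (center_hz, width_hz) bands onto the resampled copy: both the
    center and the width scale by `ratio`."""
    return [(center * ratio, width * ratio) for center, width in bands]


def bank_for_mix(bands, ratio):
    """Band-pass filters needed when the filtered layer is mixed with its
    resampled copy: the original set plus the scaled set (twice as many)."""
    return sorted(bands + resample_bands(bands, ratio))
```

For example, a 16 Hz window at 1116 Hz with r = 1.5 becomes a 24 Hz window at 1674 Hz.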

Then the first destination layer is exported for further processing.

Now, what the filters do: they extract small portions of the dynamics of a background sound and create some sort of tonal representation of it. But what makes the effect is "how" they do it and "what" they use as a source.

KG, you might be right. For example, when I tried to replace the schematic with blue-ish modules (suited for modulation), the CPU wasn't that high, but I had glitches. Whether on green or blue, the destination pack of Butterworths should contain 20 such units per process, and either one or two such packs would be used.

One of my thoughts is: is there a simpler design that can do the same as a bunch of Butterworths combined like this? A bandpass filter with very (!) sharp edges. Maybe some parameters can be combined to get such a unit?

by **trogluddite** » Sun Aug 10, 2014 1:22 pm

Just thinking out loud...

You have in your 'filter pack' effectively 15 bi-quads chained per channel, each with four 'multiply-add/subtract' for the co-efficients.
CPU-wise, we could say that you're using the same power as 60 'taps'; in fact, much more than that, due to the parts that deal with the feedback, de-normal removal etc.
So, I wonder, how many 'taps' would an FIR filter with similar properties need? Really, I don't know; maybe KG or Martin would have some idea. The delay would be greater (tens of samples), but it would be numerically more stable (no feedback), and it could possibly be linear-phase. Whether those last two would give a more or less suitable sound, I can't say; I'm just thinking that maybe it could be done with lower CPU load, and still with much less latency than FFT.
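One rough way to answer the tap question is Kaiser's rule of thumb for windowed FIR design, which estimates the length from the stopband attenuation A (dB) and the transition width Δf: N ≈ (A - 8) / (2.285 · 2π·Δf/fs), plus one tap. The figures used with this sketch (44.1 kHz, 60 dB stopband, 10 Hz transition band) are assumptions, not values taken from the schematic, and the function names are hypothetical.

```python
import math

def biquad_chain_macs(n_biquads, macs_per_biquad=4):
    """Multiply-adds per sample per channel for the IIR chain, counting four
    coefficient multiply-adds per bi-quad as in the post above."""
    return n_biquads * macs_per_biquad


def kaiser_fir_taps(atten_db, transition_hz, fs):
    """Kaiser's rule-of-thumb length for a windowed FIR with the given
    stopband attenuation (dB) and transition-band width (Hz)."""
    delta_w = 2 * math.pi * transition_hz / fs  # transition width, rad/sample
    order = (atten_db - 8) / (2.285 * delta_w)
    return math.ceil(order) + 1
```

Under those assumptions the estimate comes out at roughly 16,000 taps, against 60 multiply-adds for the bi-quad chain, and a linear-phase FIR of that length delays by (N - 1)/2, around 8,000 samples. A direct FIR only looks competitive with much wider transition bands (about 1,600 taps at 100 Hz), or when computed by FFT-based convolution, which reintroduces block latency.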

by **KG_is_back** » Sun Aug 10, 2014 2:53 pm

I just connected the thing to an impulse and an analyser, and the IR of the thing is more than 20000 samples long. That is nothing surprising: filters with high Q have longer "ringing", and when you chain them, it is as if you convolved their impulse responses, so the IR lengths effectively add together. A band-pass filter with a Q of about 15 (as present in this schematic) has an IR about 1000-2000 samples long; with 15 chained, the length comes to roughly 20000.
That is not even a filter response; it's basically a reverb already.
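The measurement can be approximated offline. This Python sketch assumes identical band-pass bi-quads from the RBJ Audio EQ Cookbook (0 dB peak gain) with Q = 15, a 1116 Hz center and a 44.1 kHz sample rate; the real filter pack uses different tunings per stage, so its numbers will differ. It takes the IR length as the last sample within 60 dB of the peak. Function names are hypothetical.

```python
import math

def rbj_bandpass(f0, q, fs):
    """Band-pass bi-quad (RBJ cookbook, 0 dB peak gain), normalised by a0.
    Returns (b0, b1, b2, a1, a2)."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    return (alpha / a0, 0.0, -alpha / a0, -2 * math.cos(w0) / a0, (1 - alpha) / a0)


def impulse_response(coeffs, n_stages, n_samples):
    """Impulse response of n_stages identical bi-quads in series (DF I)."""
    b0, b1, b2, a1, a2 = coeffs
    sig = [1.0] + [0.0] * (n_samples - 1)
    for _ in range(n_stages):
        x1 = x2 = y1 = y2 = 0.0
        out = []
        for s in sig:
            y = b0 * s + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
            x2, x1, y2, y1 = x1, s, y1, y
            out.append(y)
        sig = out
    return sig


def ir_length(h, drop_db=60.0):
    """Index of the last sample whose magnitude is within drop_db of the peak."""
    peak = max(abs(v) for v in h)
    thresh = peak * 10 ** (-drop_db / 20)
    return max(i for i, v in enumerate(h) if abs(v) >= thresh)
```

For one stage the ring time comes out near 6.9/α samples, with α = sin(ω0)/(2Q), which is roughly 2.2·Q/f0 seconds: about 1,300 samples here, and longer at lower center frequencies. The 15-stage chain rings several times longer than a single stage.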

One thing that comes to mind that would greatly simplify this is to use a feedback comb filter as a starting point. This filter creates narrow, evenly spaced frequency peaks (harmonic tones) and removes everything between them. It is used in physical modelling to turn a noise burst into a string sound (the Karplus-Strong algorithm).
Then follow it with a series of band-pass filters to isolate only one peak (probably far fewer than 15 will be needed for that). You may even use a lowpass instead of a bandpass to remove the upper harmonics and leave only the fundamental.
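A minimal Python sketch of the two building blocks suggested above (function names are hypothetical; the one-pole lowpass is a stand-in, since the post doesn't say which lowpass design to use). The feedback comb y[n] = x[n] + g·y[n - D] puts peaks at multiples of fs/D, and the lowpass then attenuates the harmonics above the fundamental.

```python
import math

def feedback_comb(x, delay, g):
    """Feedback comb y[n] = x[n] + g * y[n - delay], with |g| < 1.
    Peaks sit at multiples of fs / delay (the Karplus-Strong core,
    without its loop filter)."""
    y = [0.0] * len(x)
    for n, s in enumerate(x):
        y[n] = s + (g * y[n - delay] if n >= delay else 0.0)
    return y


def one_pole_lowpass(x, cutoff_hz, fs):
    """Simple one-pole lowpass (unity gain at DC) to tame upper harmonics."""
    a = math.exp(-2 * math.pi * cutoff_hz / fs)
    y, state = [], 0.0
    for s in x:
        state = (1 - a) * s + a * state
        y.append(state)
    return y


def comb_magnitude(freq_hz, delay, g, fs):
    """|H| of the feedback comb at one frequency: 1 / |1 - g e^{-j w D}|."""
    w = 2 * math.pi * freq_hz / fs
    re = 1 - g * math.cos(w * delay)
    im = g * math.sin(w * delay)
    return 1 / math.hypot(re, im)
```

Running noise through feedback_comb and then one_pole_lowpass twice roughly mirrors a comb followed by two lowpass filters; the cutoff frequency is a free choice in this sketch.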

by **KG_is_back** » Sun Aug 10, 2014 3:20 pm

Here it is: a comb filter followed by two low-pass filters to remove the harmonics above the fundamental. It has a very narrow pass band, probably even narrower than your 15 biquads. The resonance setting of the comb filter greatly affects the pass band (the smaller the value, the more resonance and the narrower the filter).
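A rough way to see why more resonance gives a narrower pass band, assuming the comb has the form y[n] = x[n] + g·y[n - D] (an assumption; the schematic's "resonance" value might correspond to something like 1 - g). Near a peak, |1 - g·e^(-jφ)|² ≈ (1 - g)² + g·φ², so each peak's half-power width is about Δf ≈ fs·(1 - g)/(π·D·√g). The Python sketch (hypothetical function names) computes this approximation alongside the exact value; it ignores the lowpass stages.

```python
import math

def comb_peak_bandwidth_hz(g, delay, fs):
    """Approximate -3 dB width of each peak of y[n] = x[n] + g * y[n - delay]
    for g close to 1: half power is reached at phi = (1 - g) / sqrt(g)."""
    return fs * (1 - g) / (math.pi * delay * math.sqrt(g))


def comb_peak_bandwidth_exact_hz(g, delay, fs):
    """Same width without the small-angle approximation: solve
    2 * g * (1 - cos(phi)) = (1 - g)**2 for phi, then convert to Hz."""
    phi = math.acos(1 - (1 - g) ** 2 / (2 * g))
    return fs * phi / (math.pi * delay)
```

With fs = 44.1 kHz and D = 40, g = 0.99 gives peaks roughly 3.5 Hz wide, while g = 0.9 gives about 37 Hz.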

by **tester** » Sun Aug 10, 2014 5:26 pm

The tests I did (and still do) were through listening and comparing (that's why I ended up with so many filters per channel when I switched from FFT to Butterworths). No numerical science, no graphs. Having raw track A and processed track B, and expecting certain audible features, I just used what I had at hand. Since then I've been looking for comparable alternatives that I could modify for a quicker job.

I guess, this also illustrates the difference between programming market and sound production market. ;-)

*

I'm attaching a sample that I output from CoolEdit. It contains noise recorded from an FS schematic and a processed part (approx. 1116 Hz carrier, with a 16 Hz window around it set in the FFT filter). As you can hear, it's very different from yours. Things are not that simple when you listen to them, and they get even more complicated if you combine dozens of such spikes to get harmony and aspects of the source dynamics.
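To put the FFT filter setting in numbers: assuming a 44.1 kHz sample rate, a 32768-point FFT (the 32k size mentioned earlier for Trog's module; CoolEdit's actual FFT size isn't stated) and a window centered on the carrier, each bin is about 1.35 Hz wide, so a 16 Hz window around 1116 Hz keeps 12 bins and zeroes the rest. The helper name is hypothetical.

```python
import math

def fft_window_bins(center_hz, width_hz, fs, n_fft):
    """Bin spacing, and the FFT bins whose center frequencies fall inside a
    window of width_hz centered on center_hz (a brick-wall band-pass mask)."""
    bin_hz = fs / n_fft
    first = math.ceil((center_hz - width_hz / 2) / bin_hz)
    last = math.floor((center_hz + width_hz / 2) / bin_hz)
    return bin_hz, list(range(first, last + 1))
```

Because such a mask cuts off within roughly one bin, its edges are far steeper than a few chained bi-quads can produce, which may be part of the audible difference.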

I'm not sure how the attached schematic handles the window frequency (top, center, or bottom relative to the carrier?). And it seems to have limited narrowing capabilities (even at 1 Hz I hear very high vibrations coming out of it). My old bandpass design is narrower; just listen to it.
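On where the window sits: a feedback comb's magnitude response is symmetric around each peak, so its pass band is centered on fs/D (the following lowpass stages can tilt it). If the comb uses a whole-number delay (an assumption; this depends on how the attached schematic implements the delay), the available tunings near 1116 Hz at 44.1 kHz are coarse, as this small Python sketch (hypothetical helper name) shows.

```python
import math

def integer_delay_tunings(target_hz, fs):
    """Fundamentals a feedback comb can hit with an integer-sample delay:
    the exact delay fs / target_hz, plus the whole-number delays on either
    side of it paired with the fundamentals fs / D they produce."""
    exact = fs / target_hz
    return exact, [(d, fs / d) for d in (math.floor(exact), math.ceil(exact))]
```

D = 39 gives about 1130.8 Hz and D = 40 gives 1102.5 Hz, roughly 14.8 Hz and 13.5 Hz from the target, which is nearly the whole 16 Hz window. Hitting the carrier exactly would need a fractional delay, for example by interpolating between samples or with an allpass.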
