can this be simplified / optimized for speed?
This is an old design created back on the SM forum. Quick question: can this be optimized/simplified? With one such unit running there is no problem, but with 40 on board things start to get heavy.
The alternative is to use Trog's FFT/iFFT module, but again - a single one (stereo) is fine, while two of them (at 32kpts), with other stuff on board, start to be too much. (Or maybe not? It shows low CPU usage, yet I have glitches...)
- Attachments
-
- simplify.fsm
- (4.66 KiB) Downloaded 781 times
Need to take a break? I have something right for you.
Feel free to donate. Thank you for your contribution.
- tester
- Posts: 1786
- Joined: Wed Jan 18, 2012 10:52 pm
- Location: Poland, internet
Re: can this be simplified / optimized for speed?
You're using green and the mono4 channels well there, so the only optimisations I can see would be to reduce the amount of reading/writing to memory - pretty much all assembly stuff I'm afraid!
- The DSP code compiler is pretty rubbish - I can see a fair few redundant reads/writes in the bi-quad's compiler output.
- Combine the 'chains' of bi-quads into single primitives for less stream in/out reads and writes.
- Which would also allow 'shufps' optimisation of the pack/unpack loop.
No idea how much more efficient that would be - but just one 'cache miss' when accessing memory can easily cost more CPU cycles than the rest of that code put together, so I suspect that it would be well worth doing for the number of copies that you need.
I'm pretty busy for the rest of the weekend, but if no-one else jumps in, I'll take a look when I get time - remind me in a few days if you haven't got anywhere with it.
All schematics/modules I post are free for all to use - but a credit is always polite!
Don't stagnate, mutate to create!
- trogluddite
- Posts: 1730
- Joined: Fri Oct 22, 2010 12:46 am
- Location: Yorkshire, UK
Re: can this be simplified / optimized for speed?
No rush, this is a prototyping part right now, so it may not work as expected at all.
I'm rethinking a third solution - re-caching part of the audio and remixing it with the ongoing part, driven by your FFT/iFFT guts.
- tester
- Posts: 1786
- Joined: Wed Jan 18, 2012 10:52 pm
- Location: Poland, internet
Re: can this be simplified / optimized for speed?
Yes... for a chain of biquads there are many ways to reduce CPU. One that comes to mind is to put them all into a single code block and use the output of the previous one as the input for the next (directly). That avoids a lot of pointless memory writes, because the "in" of one block is exactly the same as the "out" of the previous one, including the delay chain.
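A minimal sketch of that idea, in Python rather than FlowStone DSP code or assembly, purely to show the data flow. It assumes direct-form I biquads with normalised coefficients (b0, b1, b2, a1, a2); the name `cascade_biquads` and the state layout are illustrative and not taken from the attached schematic. Each node between two stages keeps one pair of delayed samples, which serves as the output history of the stage before it and the input history of the stage after it.

```python
def cascade_biquads(samples, coeffs):
    """Run a chain of direct-form I biquads in one loop with shared delay states.

    coeffs is a list of (b0, b1, b2, a1, a2) tuples, one per stage (assumed layout).
    Node k is the input of stage k; node k+1 is its output.
    """
    n = len(coeffs)
    z1 = [0.0] * (n + 1)  # previous sample at each node
    z2 = [0.0] * (n + 1)  # sample before that at each node
    out = []
    for x in samples:
        for k, (b0, b1, b2, a1, a2) in enumerate(coeffs):
            # The feedback terms read the history of node k+1, which is
            # the same memory the next stage uses as its input history.
            y = b0 * x + b1 * z1[k] + b2 * z2[k] - a1 * z1[k + 1] - a2 * z2[k + 1]
            z2[k] = z1[k]
            z1[k] = x
            x = y
        # Shift the history of the final output node.
        z2[n] = z1[n]
        z1[n] = x
        out.append(x)
    return out
```

Compared with N separate blocks, each holding its own input and output history, this stores N+1 history pairs instead of 2N and never writes an intermediate stream out just to read it back.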
Another way of reducing reads/writes is to use a circular buffer instead of a delay chain (in assembly - in code, reading/writing arrays is done differently (per channel), so that would be a CPU blast). However, the delay chain is only 3 variables long, so the gain from writing less to memory might be cancelled out by the need to calculate the index for the loop.
By the way, are you concerned that your schematic will introduce a one-sample delay, because you have the pack/unpack connected sort of in feedback?
- KG_is_back
- Posts: 1196
- Joined: Tue Oct 22, 2013 5:43 pm
- Location: Slovakia
Re: can this be simplified / optimized for speed?
Small delays are not a problem.
Basically I'm thinking of reconstructing (at least to some degree) a process that I currently do manually. Normally it takes a lot of HDD space and time... and invention, if you wish to do something else. Originally I used the CoolEdit (aka Adobe Audition, only old versions) FFT filter, but that was in the SM days, when there were no Trog's FFT/iFFT modules. It turned out that it's possible to switch to Butterworth filters and get somewhat similar results, although I'm not sure they can really stand in for it in the particular project I have in mind. There seem to be subtle but vital differences in the transparency produced by the FFT process.
The concept is this. When I have prepared what is to be filtered, a lot of narrow, sharp filters are applied to that layer. With the FFT filter (at hi-res), these are just array data windows combined together and placed on it. With Butterworths, there will be 10-20 individual sharp filters in place (like this one), tuned to cover ranges from a few Hz to 40Hz. But before anything else is done, the filtered layer is mixed with a resampled version of itself, which means that in live mode the FFT filter can't be used (due to the non-1:1 rescaling of the frequency windows), and with Butterworths the number of them doubles. Or the first layer could be recorded for some time (best would be to get it cached - but it's a 30-60 minute long file...) and used as an in-mix with playback. I will probably end up making just the top part, to keep the schematic running live (because I'd like to add some filters there for shaping the background, and some modulators, to steer the effects in different directions).
And then the first destination layer is exported for further processing.
Now, what the filters do: they extract small portions of the dynamics of a background sound and create some sort of tonal representation of it. But what makes the effect is "how" they do it and "what" they use as a source.
KG, you might be right. For example, when I tried to replace the schematic with blue-ish modules (suited for modulation), CPU wasn't that high, but I had glitches. Whether on green or blue, the destination pack of Butterworths should contain 20 such units per process, and either one or two such packs would be used.
One of my thoughts is: is there a simpler design that can do the same as a bunch of Butterworths combined this way? A bandpass filter with very (!) sharp edges. Maybe there are some parameters that can be combined to get such a unit?
- Attachments
-
- blue-simplify.fsm
- (34.6 KiB) Downloaded 798 times
- tester
- Posts: 1786
- Joined: Wed Jan 18, 2012 10:52 pm
- Location: Poland, internet
Re: can this be simplified / optimized for speed?
Just thinking out loud...
You have in your 'filter pack' effectively 15 bi-quads chained per channel, each with four 'multiply-add/subtract' for the co-efficients.
CPU wise, we could say that you're using the same power as 60 'taps' - in fact, much more than that due to the parts that deal with the feedback, de-normal removal etc.
So, I wonder, how many 'taps' would an FIR filter with similar properties need? Really, I don't know; maybe KG or Martin would have some idea. The delay would be greater (tens of samples), but it would be numerically more stable (no feedback), and it could possibly be linear-phase. Whether those last two would give a more or less suitable sound, I can't say; I'm just thinking that maybe it could be done with lower CPU load, and still much less latency than FFT.
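For a rough feel of the numbers, one common rule of thumb (usually credited to fred harris) estimates the FIR length as N ≈ A / (22 · Δf / fs), where A is the stop-band attenuation in dB, Δf the transition width and fs the sample rate. The sketch below applies it next to the bi-quad count from above. The 60 dB, 8 Hz and 44.1 kHz values are assumptions picked for illustration, not measurements of the posted schematic, and the function names are made up for this example.

```python
import math

def biquad_chain_macs(num_biquads, macs_per_biquad=4):
    """Multiply-adds per sample for a chain of bi-quads (4 each, as counted above)."""
    return num_biquads * macs_per_biquad

def estimate_fir_taps(atten_db, transition_hz, sample_rate_hz):
    """Rule-of-thumb FIR length: N ~ A / (22 * df / fs). A rough estimate only."""
    return math.ceil(atten_db / (22.0 * transition_hz / sample_rate_hz))

if __name__ == "__main__":
    print("bi-quad chain MACs/sample:", biquad_chain_macs(15))
    # Assumed target: 60 dB rejection, 8 Hz transition band, 44.1 kHz.
    print("estimated FIR taps:", estimate_fir_taps(60.0, 8.0, 44100.0))
```

Under those assumed values the estimate comes out in the region of 15,000 taps, so a direct (non-FFT) FIR with edges this sharp would probably cost far more per sample than the 60 multiply-adds of the bi-quad chain, and its delay would be thousands of samples rather than tens.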
- trogluddite
- Posts: 1730
- Joined: Fri Oct 22, 2010 12:46 am
- Location: Yorkshire, UK
Re: can this be simplified / optimized for speed?
I just connected the thing to an impulse and an analyser, and its IR is more than 20000 samples long. That is nothing surprising - filters with high Q have longer "ringing", and when you chain them it is as if you had convolved their impulse responses. The IR lengths effectively add together. A band-pass filter with a Q of about 15 (such filters are present in this schematic) has an IR about 1000-2000 samples long. With 15 chained, the length roughly fits - about 20000.
That is not even a filter response - it's basically a reverb already.
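The 1000-2000 sample figure can be sanity-checked with a back-of-envelope estimate. For a two-pole resonator the impulse-response envelope decays roughly as exp(-π·BW·t) with BW ≈ f0/Q, so the time to fall by 60 dB is about ln(1000)/(π·BW). The sketch below assumes a 1116 Hz centre frequency and a 44.1 kHz sample rate, neither of which is confirmed for the schematic; the function names are illustrative.

```python
import math

def ring_time_samples(center_hz, q, sample_rate_hz, decay_db=60.0):
    """Approximate samples for a 2-pole resonator's IR envelope to drop by decay_db."""
    bandwidth_hz = center_hz / q                      # -3 dB bandwidth, high-Q approximation
    decay_nepers = decay_db * math.log(10.0) / 20.0   # 60 dB -> ln(1000)
    seconds = decay_nepers / (math.pi * bandwidth_hz)
    return seconds * sample_rate_hz

def cascade_ir_length(lengths):
    """Length of the convolution of finite IRs: the lengths add, minus one per join."""
    return sum(lengths) - (len(lengths) - 1)

if __name__ == "__main__":
    one = ring_time_samples(1116.0, 15.0, 44100.0)    # assumed f0 and fs
    print("one filter, samples to -60 dB:", round(one))
    print("15 chained, approx length:", cascade_ir_length([round(one)] * 15))
```

With these assumptions a single Q=15 stage rings for roughly 1300 samples, and fifteen of them in series come out a little under 20000, which is in line with the measurement described above.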
One thing that comes to mind that would greatly simplify the thing is to use a feedback comb filter as a starting point. This filter creates narrow, evenly spaced frequency peaks (harmonic tones), removing everything between them. It is used in physical modelling to turn a noise burst into a string sound (the Karplus-Strong algorithm).
Then follow that with a series of band-pass filters to isolate only one peak (probably far fewer than 15 will be needed for that). You may even use a low-pass instead of a band-pass to remove the upper harmonics and leave only the fundamental.
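A minimal Python sketch of that structure, assuming the textbook feedback comb y[n] = x[n] + g·y[n-D] (peaks at multiples of fs/D) followed by simple one-pole low-pass stages. The function names and the choice of a one-pole low-pass are assumptions for illustration; this is not a transcription of any attached schematic.

```python
def feedback_comb(samples, delay, feedback):
    """Feedback comb filter: y[n] = x[n] + feedback * y[n - delay]."""
    buf = [0.0] * delay   # circular buffer holding the last `delay` outputs
    pos = 0
    out = []
    for x in samples:
        y = x + feedback * buf[pos]   # buf[pos] is the output from `delay` samples ago
        buf[pos] = y
        pos = (pos + 1) % delay
        out.append(y)
    return out

def one_pole_lowpass(samples, coeff):
    """One-pole low-pass: y[n] = y[n-1] + coeff * (x[n] - y[n-1]), 0 < coeff <= 1."""
    y = 0.0
    out = []
    for x in samples:
        y += coeff * (x - y)
        out.append(y)
    return out

def comb_bandpass(samples, delay, feedback, lp_coeff, lp_stages=2):
    """Comb filter, then low-pass stages to suppress harmonics above the fundamental."""
    y = feedback_comb(samples, delay, feedback)
    for _ in range(lp_stages):
        y = one_pole_lowpass(y, lp_coeff)
    return y
```

The feedback must stay below 1 in magnitude for the comb to remain stable; the closer it gets to 1, the narrower the peaks and the longer the ringing.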
- KG_is_back
- Posts: 1196
- Joined: Tue Oct 22, 2013 5:43 pm
- Location: Slovakia
Re: can this be simplified / optimized for speed?
Here it is. A comb filter followed by two low-pass filters to remove the harmonics above the fundamental. It has a very narrow bandpass, probably even narrower than your 15 biquads. The resonance in the comb filter greatly affects the bandpass (the smaller it is, the more resonance and the narrower the filter).
- Attachments
-
- comb-bandpass.osm
- (38.42 KiB) Downloaded 819 times
- KG_is_back
- Posts: 1196
- Joined: Tue Oct 22, 2013 5:43 pm
- Location: Slovakia
Re: can this be simplified / optimized for speed?
The tests I did (and do) were/are through listening and comparing (that's why I ended up with so many filters per channel when I switched from FFT to Butterworths). No numerical science, no graphs. Having raw track A and processed track B, and expectations about the audible features to achieve, I just took what I had at hand. Since then I've been looking for comparable alternatives that I could modify for a quicker job.
I guess this also illustrates the difference between the programming market and the sound production market.
*
I'm attaching a sample that I can output from Cool Edit. It contains noise recorded from the FS schematic and the processed part (ca. 1116Hz carrier, 16Hz window around it set in the FFT filter). As you can hear, it's very different from yours. Things are not that simple when you listen to them, and they get even more complicated if you combine dozens of such spikes in order to get harmony and aspects of the source dynamics.
I'm not sure how the attached schematic handles the window frequency (top, center, or bottom relative to the carrier?). And it seems to have limited narrowing capabilities (even at 1Hz I hear very high vibrations coming out of it). My old bandpass design is narrower, just listen to it.
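To put the FFT-filter settings in perspective, here is a small sketch of how a frequency window maps onto FFT bins. It assumes the window is centred on the carrier, that "24kpts" means a 24000-point transform, and a 44.1 kHz sample rate; all three are guesses for illustration, and the function name is hypothetical.

```python
import math

def fft_bins_for_band(center_hz, width_hz, sample_rate_hz, fft_size):
    """Return (first_bin, last_bin) whose centre frequencies fall inside the band."""
    bin_hz = sample_rate_hz / fft_size          # spacing between adjacent bins
    low = center_hz - width_hz / 2.0            # assumes a window centred on the carrier
    high = center_hz + width_hz / 2.0
    first_bin = math.ceil(low / bin_hz)
    last_bin = math.floor(high / bin_hz)
    return first_bin, last_bin

if __name__ == "__main__":
    first, last = fft_bins_for_band(1116.0, 16.0, 44100.0, 24000)
    print("bins kept:", first, "to", last, "->", last - first + 1, "bins")
```

Under these assumptions the 16Hz window covers only a handful of bins, each a little under 2Hz wide, which gives an idea of how steep the edges of the FFT filter are compared with a chain of IIR band-passes.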
- Attachments
-
- 1116carrier+16Hzwindow(24kpts+blackman).zip
- (463.59 KiB) Downloaded 801 times
- tester
- Posts: 1786
- Joined: Wed Jan 18, 2012 10:52 pm
- Location: Poland, internet