FFT-based Audio Analyzer
Here is an FFT-based Audio Analyzer. It takes two audio streams as input: one is the reference signal, the other is the signal to be compared against the reference. The comparison can be rendered as a Bode plot.
The FFT-based Audio Analyzer is based on the Audio FFT, part of the STEM Example Projects published by Admin (http://www.dsprobotics.com/support/viewtopic.php?f=82&t=1211), itself based on the Analysis Toolkit published by Sambean in 2008 (http://synthmaker.co.uk/forum/viewtopic.php?t=2409 and http://www.synthmaker.co.uk/dokuwiki/doku.php?id=user_creations:analysis:analysis_toolkit:analysis_toolkit).
Any ideas for implementing an averaging function as a refinement?
Possibly relying on Green code only?
Or using a DSP Code block (even though the data to be processed is not a stream)?
Or using Ruby (with its speed penalty) if there is no other solution?
What's the wanted averaging function? In the present .fsm, each time the 200 ms Ticker generates a "Get" event, the Float Array containing the frequency-domain info gets updated. I would like to define up to sixteen Float Arrays "above the Sheet", used as an averaging buffer. On each "Get" event, the oldest Float Array gets replaced by the newest one; after that, a simple piece of Green code consisting of a few "Float Array Add" primitives combines the Float Arrays two by two in a pyramid layout, resulting in a "total" Float Array at the tip. All that remains is to divide every value of the "tip" Float Array by sixteen, and the wanted Moving Average is there. This is the most common implementation, based on a circular buffer (see the Ruby sketch below).
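In pseudo-Ruby, the idea would be something like the rough sketch below, written in the style of a Flowstone RubyEdit component (assumptions: the frequency-domain Float Array arrives on input 0, the averaged array goes to output 0, and the depth of 16 is hard-coded; the pyramid of "Float Array Add" primitives is replaced by a plain element-by-element sum, which gives the same result):

- Code:
# Rough sketch: 16-frame moving average using a circular buffer of frames.
def init
  @depth  = 16     # averaging depth (assumed value)
  @frames = []     # circular buffer holding the last @depth Float Arrays
end

def event i
  frame = @ins[0]
  return if frame.nil?
  @frames.push frame
  @frames.shift if @frames.size > @depth   # drop the oldest frame
  # Sum the stored frames element by element, then divide by the count.
  sum = Array.new(frame.size, 0.0)
  @frames.each { |f| f.each_with_index { |v, k| sum[k] += v } }
  output 0, sum.map { |v| v / @frames.size }
end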
Without a circular buffer, there is a dramatic waste of CPU: all sixteen buffers need to be rewritten on every 200 ms tick.
Even with a circular buffer, CPU usage remains heavy, as all sixteen Float Arrays need to be summed on every 200 ms tick.
CPU usage decreases if the Moving Average Float Array is kept in memory, only adding 1/16th of the newest Float Array and subtracting 1/16th of the oldest one. Admin described this method here: http://www.dsprobotics.com/support/viewtopic.php?f=57&t=368.
This fast implementation generates a tiny error at each iteration, because of arithmetic rounding.
It should therefore be assisted by a "leaky integrator" behaviour, to avoid the Float Array drifting beyond the +1/-1 limits over long periods as a consequence of the above-mentioned rounding errors.
Instead of programming an IIR lowpass filter (which itself also generates rounding errors), one may try something radically simpler, such as subtracting 1e-9 when the Moving Average is positive and adding 1e-9 when it is negative. Applying this to every element of the Float Array gives all elements a slight tendency to converge to zero (see the sketch below).
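Something along these lines might do it for the incremental update plus the leak, again a rough sketch in RubyEdit style (assumptions: newest frame on input 0, depth of 16; the queue of past frames is only kept so the oldest one can be subtracted; during the first sixteen ticks the average reads low because the divisor is fixed at 16):

- Code:
# Incremental 16-frame moving average with a tiny "leak" toward zero:
# one add and one subtract per element per tick, instead of summing 16 frames.
def init
  @depth   = 16
  @history = []     # past frames, kept only so the oldest can be subtracted
  @avg     = nil    # running moving average
end

def event i
  frame = @ins[0]
  return if frame.nil?
  @avg ||= Array.new(frame.size, 0.0)
  @history.push frame
  oldest = @history.size > @depth ? @history.shift : Array.new(frame.size, 0.0)
  @avg = @avg.each_with_index.map do |v, k|
    v += (frame[k] - oldest[k]) / @depth.to_f     # incremental update
    v -= 1e-9 * (v <=> 0)                         # leak toward zero
    v
  end
  output 0, @avg
end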
For implementing the averaging function, as an FS newbie I'm currently stuck because:
- On the FS worksheet, I don't see how and where I can store sixteen consecutive (in time) Float Arrays
- I need to access that storage as a circular buffer, to avoid rewriting the whole storage on each tick
Apparently, FS and the graphical programming concept (sketching a signal flow on a sheet) need some TLC when dealing with the time dimension. Working with FS, I get the impression that everything remains simple and intuitive, provided you don't need to export consecutive frames (Float Arrays) into some "global" memory sitting "above" the worksheet, together with the accompanying pointer.
I'm anxious to see the solution, or any workaround that may do the job. If there is a genuine, clean solution, I would say "yes, now FS has become a full-blown graphical DSP environment".
The above-mentioned Audio Analyzer, plus some averaged version of it, could be added to the STEM Example Projects.
I intend to deal with the phase (deliver the phase plot).
I intend to add delay detection & compensation, to ease the phase readout.
Any hint much appreciated,
Steph
- Attachments
- FFT-based Audio Analyzer (GreenLines LinLogF) (gain only).fsm
- steph_tsf
Re: FFT-based Audio Analyzer
Nice project, and a very clean one, too.
I don't know if the averaging method you described is that much more effective than the quite common one (an IIR filter). I've added an IIR filter to your schematic that smooths the display signal. I've also added a simple linear interpolator, which is actually a FIR (just averaging). The result isn't much better: there is so much noise on the signal that the smoothing filter needs a very low cutoff, which makes the response of the graph very slow.
- Attachments
- FFT-based Audio Analyzer (GreenLines LinLogF) (with filter).fsm
- MyCo
Re: FFT-based Audio Analyzer
This is a simple circular buffer in Ruby:
- Code:
# set in and out as Float Array
def init
  @depth = 16            # ring depth - assumed value, adjust as needed
  @ring_buffer = []
  input 0, nil
end

def event i
  @ring_buffer.push @ins[0]                          # append the newest frame
  @ring_buffer.shift if @ring_buffer.size > @depth   # drop the oldest once full
  output 0, @ring_buffer.flatten
end
-
digitalwhitebyte - Posts: 106
- Joined: Sat Jul 31, 2010 10:20 am
Re: FFT-based Audio Analyzer
1. "Zoom spectra" combined with FFT resolution change would be good for performance. You don't need high resolution for full view, but you need details when zooming spectral area.
2. I don't know whether sFFT (sparse FFT) would/could help to improve performance for higher resolutions (of some/most digital audio signals); steph_tsf and anyone else - could you comment on that? Is it doable in FS or worth of implementing? viewtopic.php?f=4&t=1513
- tester
Re: FFT-based Audio Analyzer
MyCo wrote: I don't know if that averaging method you described is that much more effective than the one that is quite common (IIR filter). I've added an IIR filter to your schematic that smooths the display signal. I've also added a simple linear interpolator which is actually a FIR (just averaging).

Mein Gott, that old averaging method looks like archaeology compared to what you are showing.
IIR filter: you call this a "moving average buffer". On my side I recognize a 1st-order IIR lowpass filter (0.95 z-1 feedback and 0.05 input scaling) applied to the dB value of each spectral band individually. I realize that such a little piece of Green code materializes a bank of 1025 IIR filters (if the FFT is done on 2048 time-domain samples), each processing a different spectral band. This is great! One could try a 4th-order Bessel lowpass using two 2nd-order cells in series. By the way, what about filtering not the decibel values, but the linear magnitude values?
FIR filter: you call this a linear interpolation. At the moment I'm not sure I understand the functionality. That Shift Float Array component makes me wonder what processing actually gets done over there. Allow me some time to digest it.
Playing with the .fsm you have published, I see that the response displayed by the Audio Analyzer gets improperly warped when changing the FFT length parameter. Have you noticed this? What's the cause?
I'm taking the opportunity to submit a brute-force averaging implementation, so that you remember the look and feel of an old-fashioned linear averager. I call it a linear averager because it averages the linear magnitudes before the decibel conversion takes place. See the attached .fsm.
I'm sure you are going to laugh when reading the kind of Ruby code I have produced for dealing with this. For sure, the Green code IIR lowpass approach you have shown is much more effective, and much more flexible. Again, 1000 x thanks for this.
Playing with Ruby as a newbie, I had difficulties:
- I asked myself how to declare the 32 required storage arrays without Ruby considering them as "nil" just after loading the diagram (leading to an error - Ruby complaining about variables being "nil")
- I asked myself how to tell Ruby to leave the storage intact each time the Ruby code got triggered by a new Magnitude Array arriving, to avoid Ruby re-instantiating it with a "nil" content.
- I went for "global" variables, declared in the "init" section of the Ruby code and processed in the "event" section - this is why their names begin with a $. I'm still asking myself whether global variables are mandatory. Isn't there a simpler solution relying only on local variables?
- I have a big question about Ruby managing to initialize the arrays with a size that's not the one I specify in the "init" section. In the "init" section I initialize the storage with a moderate size, and while running the program, if I select a bigger FFT size, Ruby seems to understand that the storage sizes need to be increased accordingly. This intrigues me. What's under the hood? In DSP, can we, should we, trust a system that takes decisions about array sizes? Isn't this going to create huge issues when dealing with audio buffers, and so on?
- Finally, I lost one hour of my life trying to understand why Ruby was always outputting a zero-magnitude Averaged Array. Let me explain. The brute-force (shall I say stupid, or naive) averager was working nicely until the last rescale, the rescale needed to compensate for the amplification coming from adding N successive frames together. Say I'm dealing with an N x averager. I wanted to be nice: instead of asking Ruby to divide each element of the final array by N, I remembered the old DSP days, preferring a 1-cycle multiply to a 20-cycle division. I thus asked Ruby to pre-compute a variable equal to 1/N, with N as an integer coming from outside the Ruby module. The Ruby module was OK with N; it could read it properly. To my surprise, because N was defined as an integer, Ruby evaluated 1/N as an integer too. That was the problem indeed: in the integer world, 1/N with N=8 doesn't equal 0.125, it equals zero! The very last line of my Ruby code was thus multiplying all array elements by zero instead of by 1/N. What a shame! And I still don't know how to get Ruby to output 1/N as a Float when N is defined as an Integer. What's the solution?
It is pretty amazing to see radically different computing tastes and habits co-residing in the same development environment like Flowstone: Assembly, DSP high-level, Ruby, and Green. I like this.
Thanks, again.
Steph
- Attachments
- FFT-based Audio Analyzer (GreenLines LinLogF) (gain only)(bruteforce avg).fsm
- steph_tsf
Re: FFT-based Audio Analyzer
I called the IIR filter a "moving average buffer" because it filters each sample of the incoming buffer in combination with the previous result buffer. It's a simple first-order lowpass (a sketch follows below). I think there is a difference between working on a linear scale vs. working on a log scale, but this only affects the display, so who cares?
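A minimal Ruby sketch of that per-bin first-order lowpass, assuming the 0.95 / 0.05 coefficients discussed earlier in this thread, a magnitude (or dB) array arriving on input 0, and a RubyEdit-style component (the actual filter in the .fsm is Green code, not Ruby):

- Code:
# One-pole lowpass applied independently to every bin of the incoming array:
# y[n] = 0.95 * y[n-1] + 0.05 * x[n]
def init
  @state = nil
end

def event i
  x = @ins[0]
  return if x.nil?
  @state = Array.new(x.size, 0.0) if @state.nil? || @state.size != x.size
  @state = @state.zip(x).map { |y_prev, xv| 0.95 * y_prev + 0.05 * xv }
  output 0, @state
end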
The FIR filter is a sample interpolator. It only works on one buffer: it smooths the waveform in the buffer by taking two adjacent samples and getting their mid point. This is done by shifting the incoming buffer by one sample, adding it to the original and then halving the result. The filter itself is:
- Code:
f[t] = (in[t] + in[t-1]) / 2
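In plain Ruby, the same two-point smoothing could look something like this (a sketch; buf is assumed to be the incoming magnitude array):

- Code:
# Average each pair of adjacent samples; the output is one sample shorter.
smoothed = buf.each_cons(2).map { |a, b| (a + b) / 2.0 }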
The error in my example was that, when you change the size, the IIR filter can't calculate correctly with the stored data. I've attached a fixed version that clears the previous data when the length changes.
Ruby is a very weird language. I had my problems with it, because it can do stuff that no other language can do, while for other trivial stuff you need a huge number of lines. The power of Ruby is doing huge processing tasks in just one line. Just as an example, this line replaces nearly all of your "bruteforce" averaging:
- Code:
sums = @avgBuf.transpose.map{|x| x.reduce(:+) / @in_avg.to_f}
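Incidentally, the .to_f in that line also answers the 1/N question above: in Ruby, dividing two Integers truncates to an Integer, so one operand has to be promoted to a Float first. A tiny illustration:

- Code:
n = 8
1 / n        # => 0      (Integer division truncates)
1.0 / n      # => 0.125
1 / n.to_f   # => 0.125  (same trick as @in_avg.to_f above)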
I've attached another schematic where I replaced your "bruteforce" averaging with a version that I think does the same. I've commented everything, so maybe you'll also understand the Ruby line above.
Regarding the array initialization: Ruby doesn't care about initialization. You can assign anything to anything. Let's say you write x = 5, and later x = "bla": you have automatically changed the type of x. The same happens with arrays: when you write at an index that is higher than the current array size, the array length is automatically adjusted (see the example below).
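For example, this is what that auto-growing behaviour looks like; note that the missing positions are filled with nil, not with 0.0:

- Code:
a = []
a[3] = 1.0
a            # => [nil, nil, nil, 1.0]  (the array grew automatically)
a.size       # => 4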
- Attachments
- FFT-based Audio Analyzer (GreenLines LinLogF) (gain only)(bruteforce avg)-3.fsm
- FFT-based Audio Analyzer (GreenLines LinLogF) (with filter)2.fsm
- MyCo
Re: FFT-based Audio Analyzer
@MyCo, about your True Averager implementation: indeed, your Ruby code behaves like a True Averager. Attached is a revision with only cosmetic changes, not involving coding. I'm very happy with this. Thanks again.
@MyCo about your IIR & FIR dB-domain filters in Green code. I would feel more comfortable, placing those filters before the dB conversion. Doing so, the FIR (possibly with lengths of 2,4,8,16,32) need to be considered as a True Averager generalization. The True Averager is indeed a trivial FIR implementation coming from the early eighties, when µP like the Zilog Z80 (8-bit) or the Motorola 68K (32-bit) could not execute a multiplication in one single cycle, albeit there were some slower CPU chips (around 1 Mips, only) like the Intel C51 family (8031, 8051) equipped with a 8x8 bit multiplier, and the Motorola 6809 equipped with a 16x16 hardware multiplier.
Back in those times, the early eighties, there were expensive machines costing the price of a luxury car, based on some trivial microprocessor (say a Zilog Z80, Motorola 6809 or Motorola 68K), with an IDT 7210, TRW TDC1010 or TRW TMC2210 hardware multiplier hooked on the bus, executing a 16x16-bit multiplication in 200 ns without the microprocessor having to wait for long.
In the mid eighties came the Intel 8087 and Motorola 68881/68882 math coprocessors (FPUs - Floating Point Units), easy to hook onto an Intel 8086 CPU or a Motorola 68030 CPU.
The big shake-up came in the early nineties, with the Intel 80486 and the Motorola 68040, both containing the FPU. Back then, although Intel was invading the market thanks to the PC, the Motorola 68040 still appeared as a winner for industry-oriented realtime applications, possibly hand-optimized in assembler thanks to the elegant 68K instruction set. Motorola made a fatal mistake inside the 68040, degrading the specification of the built-in FPU. Indeed, the 68040 had an FPU with fewer capabilities than the old external 68881 and 68882 FPUs. What a disappointment. Compatibility issues. Motorola did this because they knew their chip would overheat at 40 MHz if a full-blown 68881 or 68882 FPU were inside, operating at full speed. Clearly, Motorola had issues reaching 50 MHz with the 68040, while at the same time Intel was planning (and showing in their labs) 80486 chips operating without a hiccup well above 100 MHz.
Since that day (possibly a black Friday for all 68K fans) the Motorola 68K product line got deprecated. Motorola progressively divested the 68K line and recycled part of the instruction set to create a new industry-oriented product line that you know as the Freescale ColdFire MCF5xxx line. In parallel, so as not to leave DSP-oriented customers in a desert, Motorola continued the development of the 24-bit DSP56K audio-specialized chips like the DSP56000, the DSP56002 in 1994, and finally the DSP56300 (single core) and DSP56600 family (dual core) that you find in home theaters, executing Dolby AC-3 and other multichannel compressed audio algorithms. It could be that Motorola managed to license the DSP56K architecture as a second-source arrangement, because Analog Devices offers nearly the same, albeit with some improvements, through their SigmaDSP audio processors.
The game is now "nearly" over: nowadays, in 2013, any decent desktop or mobile CPU operating above 1 GHz, like the Intel x86 or the ARM Cortex-A8/A9, can execute a 32x32-bit multiply-accumulate in one or two nanoseconds, with double or quadruple throughput in the case of dual- or quad-core chips. When less performance is required, any ARM Cortex-M4 clocked around 100 MHz, possibly embedding an FPU, can execute a 32x32-bit multiply-accumulate in 10 or 20 nanoseconds.
Nowadays the trend is to target the same speed with twice the resolution, which leads to 64-bit audio. 64-bit audio is not as stupid as it sounds, because when executing a very long FFT (say one million samples) you face the same issue as when executing a very long FIR. For instance, if over 256 samples you correlate a signal evolving between -1 and +1 with another signal identical to it, the output may reach a value of +100 instead of +1: you get arithmetic saturation. Imagine correlating a DC signal at +0.5 with the first FFT spectral band, which is DC. This is why audio DSP chips like the DSP56K offer an 8-bit range extension within their accumulator. Thanks to this, they don't introduce any scaling resolution loss when dealing with FFTs and FIRs having a length of 256.
Imagine this now with FIRs or FFTs having a length of 1 million samples.
Let's see if a fixed-point 32x32-bit multiplier delivering a 64-bit result fits the requirement. It doesn't, as you basically need 24 bits (the audio) + 24 bits (the correlating wave) + 20 bits (the range extension for the 1 Meg accumulation) = 68 bits. Even if you carefully justify the audio and the correlating wave, you miss 4 bits. You can "only" remain exact for 256k FFTs and 256k FIRs. Is that a big issue? I don't think so; such quality may exceed most practical requirements. Now realize the advantages. Chips like the Microchip PIC32 (embedding a 32-bit MIPS CPU) or the ARM Cortex-M4 (a 32-bit ARM CPU) can execute this without a hiccup. They cost between 2 and 12 dollars for small orders. RAM size is the usual issue: you end up with complicated hardware if you need to add external RAM. If you read the Flowstone forum, you'll see statements from me saying that Flowstone should generate executables for the Microchip PIC32 and the ARM Cortex-M4. They are not equipped with an MMU; only minimalist MMU-free Linux distributions can run on them, which I find to be a big positive point. I would like to see a tiny VST host running on a PIC32 and an ARM Cortex-M4. Anybody in the room wanting to try this? If RAM needs to be large, there are better chips in the ARM family, like the ones equipping the BeagleBoard and, more recently, the Raspberry Pi, the 25-dollar computer. I'd like to hook a Creative Labs X-Fi Surround 5.1 Pro (a USB multichannel soundcard) onto a Raspberry Pi. Is this feasible? Are there Linux drivers to be used?
An elegant solution consists of a fixed-point 32x32-bit multiply unit (feed it the 24-bit audio and the 24-bit correlating wave, without caring about justification), boosted by a 20-bit range extender within the accumulator. The accumulator thus provides the 64-bit nominal width (the full result of a 32x32 multiplication), plus a 20-bit extension towards the MSB to accommodate the growth coming from accumulating 1 Meg results. You end up with an 84-bit accumulator. At this stage, to streamline the design, you may want to provide a 32-bit extension on the accumulator instead of those weird 20 bits. With such an inexpensive fixed-point architecture, once the multiply-accumulate is done over all samples (possibly 4 Gig samples with the 32-bit extension), you need three consecutive 32-bit register reads to access the result. Practically speaking, most of the time (like for hearing the result) you will only use the most significant 24 bits, so you will access the final result with a single 32-bit register read. This looks optimal.
Nowadays, as the cherry on the cake, such an audio architecture remains compatible with 32-bit audio. My best recommendation is that MIPS and ARM introduce a 32-bit range extender above their 64-bit accumulator, rendering those machines fully compatible with accurate 32-bit audio, and with accurate 4 Gig FFTs and FIRs applied to it.
A truly minimalist solution consists of improving the DSP56K architecture to cope with 1-million-long exact FIRs or FFTs. We thus need the same relatively small DSP56K multiplier executing a 24x24-bit multiply-accumulation, now featuring a 20-bit range extension within the accumulator instead of the standard 8-bit range extension provided by the DSP56K. Such a minimalist solution is, however, not compatible with 32-bit audio samples.
Today, using modern CPUs like the x86 chips, the "True Averager" looks like a dinosaur. We have plenty of computing power at our disposal. We can do more than adding successive frames together. The FIR you have suggested looks like a "True Averager" generalization. Using the processing power at our disposal, we can introduce weighting factors, which are the essence of FIR filters. Initially, you went for a 3-tap FIR after the dB conversion stage. Let's try a 4-tap FIR, an 8-tap FIR, a 16-tap FIR and a 32-tap FIR, all implementing embryonic Bessel or Butterworth lowpass filters, before the dB conversion stage. They should provide a more effective spike reduction than the dinosaur x4, x8, x16, x32 "True Averager".
Nowadays, on top of the computing power, we have the 32-bit resolution. This means that other filtering schemes can be used, like the 1st-order IIR lowpass filter you have implemented. Provided there is enough resolution (and with our 32-bit system, I would say there is), those IIR filters can duplicate analog filters, say a 4th-order Butterworth lowpass or a 4th-order Bessel lowpass, and take very few CPU cycles to compute. Initially, you went for a 1st-order IIR lowpass after the dB conversion stage. Let's try a 4th-order Butterworth IIR lowpass, or a 4th-order Bessel IIR lowpass, before the dB conversion stage. They should provide an even more effective spike reduction, and eat less computing power. Analyzing the coefficients of your 1st-order IIR filter (0.95 in the feedback and 0.05 at the input), I come to the conclusion that they lead to a -3 dB lowpass response at 1/122 of the sampling frequency. I'm not talking about the audio sampling frequency; I'm talking about the rate at which each spectral band gets recalculated, which should correspond to the 10 Hz Flowstone Ticker frequency (if set at 0.1 s). Thus, with a 0.1 s Ticker, the actual -3 dB response of the lowpass filter is 10 Hz / 122 = 0.082 Hz.
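For what it's worth, a quick way to check that 1/122 figure: for a one-pole lowpass y[n] = a*y[n-1] + (1-a)*x[n], the -3 dB point is approximately fc = -ln(a) * fs / (2*pi). A little Ruby sketch (fs here stands for the 10 Hz refresh rate of the spectral bands, not the audio sample rate):

- Code:
a  = 0.95
fs = 10.0                                  # spectral-band refresh rate in Hz
fc = -Math.log(a) * fs / (2 * Math::PI)    # ~0.0816 Hz, i.e. roughly fs / 122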
I'm attaching:
- a .fsm with a few cosmetic changes, which is the one implementing the Ruby "True Averager", with your nicely commented Ruby code. This is very pedagogical. Thanks.
- a .jpg screenshot of IIR Lab mini, a little program I made using MS Visual Basic for calculating IIR filters. To duplicate your IIR lowpass, I have set 44100 Hz as the virtual sampling frequency (it should be 10 Hz, but the program doesn't allow such a low value), a 1st order, and a 360 Hz frequency. The "LP" coefficient must be set to 1, and the "HP" coefficient must be set to zero (this is a pure lowpass). On the schematic, you get all the IIR coefficients properly displayed. Why 360 Hz here? Because 44100 Hz divided by 122 is roughly 360 Hz.
- a .zip containing the install files for IIR Lab mini.
Steph
@MyCo about your IIR & FIR dB-domain filters in Green code. I would feel more comfortable, placing those filters before the dB conversion. Doing so, the FIR (possibly with lengths of 2,4,8,16,32) need to be considered as a True Averager generalization. The True Averager is indeed a trivial FIR implementation coming from the early eighties, when µP like the Zilog Z80 (8-bit) or the Motorola 68K (32-bit) could not execute a multiplication in one single cycle, albeit there were some slower CPU chips (around 1 Mips, only) like the Intel C51 family (8031, 8051) equipped with a 8x8 bit multiplier, and the Motorola 6809 equipped with a 16x16 hardware multiplier.
Back in those times, the early eighties, there were expensive machines costing the price of a luxury car, basing on some trivial microprocessor (say a Zilog Z80, Motorola 6809 or Motorola 68K), having an IDT 7210, TRW TDC1010 or TRW TMC2210 as Hardware Multiplier hooked on the bus, executing a 16x16 bit multiplication in 200ns without the microprocessor needing to wait for long.
In the mid eighties came the Intel 8087 and Motorola 68881/68882 Math Coprocessors (FPU - Floating Points Units), easy to hook on a Intel 8086 CPU, easy to hook on a Motorola 68030 CPU.
The big shake came in the early nineties, with the Intel 80486 and the Motorola 68040, both containing the FPU. Back in those times, albeit Intel was invading the market thanks to the PC, the Motorola 68040 still appeared as a winner for industry-oriented realtime applications, possibly manually optimized in assembler because of the elegant 68K instruction set. Motorola made a fatal mistake inside the 68040, degrading the specification of the built-in FPU. Indeed, the 68040 had a FPU, with less capabilities than the old external 68881 and 68882 FPUs. What a deception. Compatibility issues. They did this at Motorola, because they knew their chip would overheat at 40 MHz, if there was a full-blown 68881 or 68882 FPU inside, operating at full speed. Clearly, Motorola had issues reaching 50 MHz with the 68040, while at the same time Intel was planning (and showing in their labs) 80486 chips operating without a hiccup, well above 100 MHz.
Since that day (possibly a black Friday for all 68K fans) the Motorola 68K product line got depreciated. Motorola progressively divested the 68K line, recycled some par of the instruction set for creating a new industry-oriented product line that you know as the Freescale Coldfire MFC5xxx product line. In parallel, for not leaving DSP-oriented customers into a desert, Motorola continued the development of the 24-bit DSP56K Audio-specialized chips like the DSP56000, the DSP56002 in 1994, and finally the DSP56300 (single core) and DSP56600 family (dual core) that you find in Home Theaters, executing the Dolby AC-3 and other Multichannel compressed Audio algorithms. Could be Motorola managed to license the DSP56K architecture, for offering a second-source service, becaue Analog Devices is offering nearly the same, albeit with some improvements, through their SigmaDSP Audio Processors.
The game is now "nearly" over, as nowadays in 2013 any decent desktop or mobile CPU operating above 1 GHz like the Intel x86, or the Arm Cortex-A8/A9, can execute a 32x32 bit multiply-accumulate in one or two nanosecond, with double or quad throughput in case of dual or quad core chips. When less performance is required any ARM Cortex-M4 clocked around 100 MHz, possibly embedding a FPU, can execute a 32x32 bit multiply-accumulate in 10 or 20 nanosecond.
Nowadays the trend is to target the same speed, with twice the resolution which leads to 64-bit audio. The 64-bit audio is not as stupid as it sounds, because when executing a very long FFT (say one million samples), you face the same issue as when executing a very long FIR. For instance, if during 256 samples you correlate a signal evolving between -1 and +1, with another signal that's identical to it, the output signal may attain a +100 value instead of a +1 value. You'll get arithmetic saturation. Imagine correlating a DC signal at +0.5, with the first FFT spectral band which is DC. This is why Audio DSP chips like the DSP56K offer a 8-bit range extension within their accumulator. Thanks to this, they don't introduce any scaling resolution loss, when dealing with FFTs and FIRs having a length of 256.
Imagine this, now with FIRs or FFTs, having a length of 1 million samples.
Let's see if a fixed point 32 x 32 bit multiplier, delivering a 64-bit result, fits the requirement. It doesn't fit, as basically you need 24 bits (the audio) + 24 bits (the correlating wave) + 20 bits (the range extension through the 1 Meg accumulation) = 68 bits. Even if you carefully justify the audio and the correlating wave, you miss 4 bits. You "only" can remain exact for 256k FFTs and 256k FIRs. Is that a big issue? I don't think so. Such quality may exceed most practical requirements. Now realize the advantages. Chips like the Microchip PIC32 (they embed a 32-bit MIPS CPU), or the ARM Cortex-M4 (they embed a 32-bit ARM CPU) can execute this without a hiccup. They cost between 2 dollars and 12 dollars, for small orders. RAM size is the usual issue. You will end up with a complicated hardware, if you need to add external RAM. If you read the Flowstone forum, you'll see statements from me, saying that Flowstone should generate executables for the Microchip PIC32 and the ARM Cortex-M4. They are not equipped with a MMU. Only minimalist MMU-free Linux distributions can run on these, which I find to be a big positive point. I would like to see a tiny VST host running on a PIC32 and a ARM Cortex-M4. Anybody in the room, wanting to try this? If RAM needs to be large, there are better chips in the ARM family, like the ones equipping BeagleBoard and more recently, RaspBerry Pi, the 25-dollar computer. I'd like to hook a Creative Labs X-Fi Surround 5.1 Pro on a Raspberry, which is a USB multichannel soundcard. Is this feasible? Are there Linux drivers to be used?
An elegant solution consists into a fixed point 32 x 32 bit multiply unit (put the 24-bit audio and the 24-bit correlating wave, without taking care about justification), boosted by a 20-bit range extender within the accumulator. The accumulator will thus provide the 64-bit nominal width (as full result of a 32x32 multiplication), plus a 20-bit extension towards the MSB, for accommodating the growth coming from accumulating 1 Meg results. You end up with a 84 bit accumulator. At this stage, for streamlining the design, you may want to provide a 32-bit extension towards the accumulator, instead of those weird 20-bits. With such inexpensive fixed-point architecture, once the Multiply-accumulate is done on all samples (possibly 4 Gig samples with the 32-bit extension), you need three consecutive 32-bit register reads for accessing it. Practically speaking, most of the time like for hearing the result, you will only use the most significant 24 bits, thus you will access the final result using a single 32-bit register read. This looks optimal.
Nowadays, as cherry on the pie, such audio architecture remains compatible with 32-bit audio. My best recommendation is that MIPS and ARM, manage to introduce a 32-bit range extender above their 64-bit accumulator, rendering those machines fully compatible with accurate 32-bit audio, and accurate 4 Gig FFTs and FIRs applied to it.
A truly minimalist solution consists into improving the DSP56K architecture, for coping with 1-million-long exact FIRs or FFTs. We thus need the same relatively small DSP56K multiplier executing a 24 x 24 bit multiply-accumulation, now featuring a 20-bit range extension within the accumulator, instead of the standard 8-bit range extension provided by the DSP56K. Such minimalist solution is however not compatible with 32-bit audio samples.
Today, using modern CPUs like the x86 chips, the "True Averager" looks like a dinosaur. We have plenty computing power at our disposition. We can do more than adding successive frames together. The FIR you have suggested looks like a "True Averager" generalization. Using the processing power at our disposition, we can introduce weighting factors, which are the essence of FIR filters. Initially, you went for a 3-tap FIR after the dB conversion stage. Let's try with a 4-tap FIR, a 8-tap FIR, a 16-tap FIR, and a 32-tap FIR all implementing embryonnary Bessel or Butterworth lowpass filters, before the dB conversion stage. They should provide a more effective spike reduction, than the dinosaur x4, x8, x16, x32 "True Averager".
Nowadays, on top of the computing power, we have the 32-bit resolution. This means that other filtering schemes can be used, like the 1st-order IIR lowpass filter you have implemented. Provided there is enough resolution (and with our 32-bit system, I would say there is enough resolution), those IIR filters can duplicate analog filters, say a 4th-order Butterworth lowpass, or a 4-th order Bessel lowpass, and take very few CPU cycles to compute. Initially, you went for a 1st-order IIR lowpass after the dB conversion stage. Let's try with a 4th-order Butterworth IIR lowpass, or a 4th-order Bessel IIR lowpass, before the dB conversion stage. They should provide an even more effective spike reduction, and eat less computing power. When analyzing the coefficients of your 1st-order IIR filter (0.95 in the feedback, and 0.05 from the output), I come to the conclusion that they lead to a -3dB lowpass response at 1/122 the sampling frequency. I'm not talking about the audio sampling frequency. I'm talking about the frequency each spectral band gets re-calculated, which should correspond to the 10 Hz Flowstone Ticker frequency (if set at 0.1s). Thus, with a 0.1s Ticker, the actual -3dB response of the lowpass filter got set to 10 Hz / 122 = 0.082 Hz.
I'm attaching :
- a .fsm, with a few cosmetic changes, which is the one implementing the Ruby "True Averager", with your nicely commented Ruby code. This is very pedagogic. Thanks.
- a .jpg screenshot of IIR Lab mini, a little program I made using MS Visual Basic for calculating IIR filters. For duplicating your IIR lowpass, I have set a 44100 Hz as virtual sampling frequency (it should be 10 Hz, but the program doesn't allow such low value), a 1st-order, and a 360 Hz frequency. The "LP" coefficient must be set to 1, and the "HP" coefficient must be set to zero (this is a pure lowpass). On the schematic, you get all IIR coefficients properly displayed. Why 360 Hz here ? Because 44100 Hz multiplied by 0.0825 = 360 Hz roughly.
- a .zip containing the install files for IIR Lab mini.
Steph
- Attachments
- IIR Lab mini.zip
- IIR Lab mini - screen capture.jpg
- FFT-based Audio Analyzer_GreenLines LinLogF_ gain only_True Averager using Ruby.fsm
- steph_tsf
Re: FFT-based Audio Analyzer
digitalwhitebyte wrote: This is a simple circular buffer in Ruby

Thanks for pointing this out. It would be nice to try the Incremental Averaging Method described by Admin in http://www.dsprobotics.com/support/viewtopic.php?f=57&t=368
- steph_tsf
Re: FFT-based Audio Analyzer
Nice excursion into DSP history! I don't think that DSP ICs are deprecated. They have their special abilities; a SHARC, for example, can easily reach 5 times the floating-point processing power of a modern 12-core desktop processor. Their benefit is that they execute several different instructions at once in a true single cycle. By true single cycle I mean they don't have to deal with instruction latencies, because of their memory structure. Also, all DSPs that I know have special (circular) buffer architectures that you'll never ever find in desktop CPUs.
If the dev environment and the emulator didn't cost $5000, the SHARCs would be my preferred DSP IC platform.
I've tested several other modern micros, but until now I haven't found a cheap one that could be abused as a cheap DSP. So far I've tested the dsPIC (completely useless), PIC32 (more useful), AVR32 (nice, but a special architecture), and STM32F4 (perfect, but still not fast enough). I've also tried a small Spartan-6 (built my own board for that)... That was just painful, but it has the benefit of doing everything in parallel. The IDE just sucks!
I've been playing with an idea for a very long time: building an environment like FlowStone, just for one single target board... without using any OS. The environment would just output machine code - like FlowStone does - and then you transfer it into a micro via a bootloader. The best pick for the micro right now would be an STM32F407, because it is cheap, available everywhere and powerful. But for doing heavy processing, my target board would probably need 9 of those, communicating via SPI: 8 of them for pure processing and one doing I/O and slow parameter control. This additional one would also do the programming of the other 8. I've already done my research for this project, but it's a lot of work.
Regarding that averaging: this FIR filtering isn't that good for implementing in a micro at all. Let's say you average 16 times over 1024 samples with 32-bit floating point; you would then need 512 kB of RAM for this alone. Even the modern Cortex-M4s don't have that much internally, and external RAM just slows down the processing. So the best way to do it is still an IIR: leak a little bit from the sum array, then add the scaled-down input. You don't even need division at all, even with floating point: a division by a power-of-2 value is, for floating point, just an integer subtraction. You need some bit mangling, but it's doable (see the sketch below).
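As a rough illustration of that exponent trick (a sketch in Ruby rather than micro C, assuming IEEE 754 single precision and a normalized, non-zero value, so no denormal or underflow handling):

- Code:
# Divide a 32-bit float by 2**k by subtracting k from its exponent field.
def div_pow2(x, k)
  bits = [x].pack('e').unpack('V').first   # raw IEEE 754 single-precision bits
  bits -= (k << 23)                        # exponent field starts at bit 23
  [bits].pack('V').unpack('e').first
end

div_pow2(10.0, 4)   # => 0.625 (same as 10.0 / 16)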
PS: I've played a little bit with your program, really nice. I noticed that you generate the filter response with noise, like you do in this audio analyzer project. If your goal for this analyzer project is only plotting an IIR filter response, you should have a closer look at my "Filter construction kit"; there are components there that can create filter responses just from coefficients or poles/zeros.
viewtopic.php?f=3&t=1234
BTW: How do you like LTspice? If you don't know why I ask this, let me know.
- MyCo
Re: FFT-based Audio Analyzer
MyCo wrote: I don't think that DSP ICs are deprecated...

Agree, especially about their memory structure (many parallel buses) and a couple of features like hardware support for circular buffer pointers. Please note, however, that the trend with ARM is to provide, between the cache memory and the CPU, a memory structure matching a DSP layout. In December 1995 ARM and DEC (Digital Equipment Corporation) introduced their StrongARM SA-110 chip, supposed to invade the world, powering the first mobile internet devices like some Apple Newton PDAs. One big year later, in 1997, Intel acquired the StrongARM division. Nine years later, in 2006, when the PDA market was booming and most PDAs were powered by a StrongARM inside, Intel sold the StrongARM division to Marvell for 600 million dollars. In 2013, Marvell is still there, making a lot of money. More recently, each time ARM introduces a new model, ARM manages to deliver a Harvard (DSP-like) layout between the cache memory and the CPU registers. A typical example is the low-cost Cortex-M4, like the STM32F407 you pointed out. You may want to use many Cortex-M4s in a "trench" DSP layout, stacking them in parallel. Like you pointed out, the issue with "trench" DSP is the interprocessor communication. So, like the InMos Transputer, you may end up hooking an SPI at each chip edge (North, South, East, West), clocked as fast as possible, say CPUclk/4, which is (only) 25 MHz for a 100 MHz chip. This is much slower than on-chip RAM access. On top of this you may need bulk DRAM (say a 1-gigabyte DIMM coming from the PC world), asking for a dedicated controller. Distant RAM accesses will translate into bandwidth consumption on the SPI links. It's going to be expensive, difficult to manage, and less powerful than an x86 Atom. Now, putting the ARM architecture evolution in perspective, one can foresee that ARM will introduce more DSP-like arrangements between the cache memory and the CPU. I bet that in a few years there will be an inexpensive quad-core ARM chip eating less than 5 watts, able to run TinyLinux, that you can hook a) onto a QSPI Flash memory containing the OS to be booted, and b) onto a PC3-6400 DRAM module (800 MHz). Each of the four cores will feature a 1 MB X data cache, a 1 MB Y data cache and a 1 MB code cache, all operating at 3.2 GHz, executing a single-cycle 32x32 = 64-bit multiply-accumulation in parallel with a register data move, featuring a 32-bit range extension (96-bit accumulator), with plenty of DSP features like hardware support and instructions for dual data processing (the real component and the imaginary component), circular buffer addressing, butterfly addressing, etc. All this in each core. Such a chip will exist in two versions: with or without an MMU. The one without an MMU is going to be very inexpensive, say 5 dollars, capable of executing TinyLinux, a kind of advanced embedded machine. The one featuring an MMU will sell for more, able to run a full-blown Linux, iOS and Android, perfect for mobile devices, car infotainment and set-top boxes. Now is the right moment for porting Flowstone to Linux, to be able to run Flowstone as a graphical DSP compiler on any Linux desktop. Actually, what's the equivalent of GDI+ on Linux, what's the equivalent of a VST host on Linux, and what's the equivalent of ASIO on Linux? One could recompile all existing SynthMaker/Flowstone plugins and executables in two different Linux blends: with MMU, and without MMU. This way, all TinyLinux boards equipped with an audio subsystem could run the plugins and the executables.
MyCo wrote: I've been playing with an idea for very long time: building an environment like FlowStone, just for one single target board... without using any OS.

That's a very good idea indeed, but you need to undertake this knowing that as soon as your system is up and running, you'll get thousands of people asking how your "audio main_loop" can get installed as the top-priority task on a TinyLinux target, and possibly also on a full-blown Linux, iOS and Android. Think about this! Without any hesitation, I'm willing to assist you in this.
MyCo wrote: Regarding that averaging: this FIR filtering isn't that good for implementing it in a micro at all.

Agree. For such an application, the FIR advantages like a tailor-made magnitude and phase response are irrelevant. Any well-designed IIR will eat less processing power, and possibly provide better results. I'm thus suggesting a 4th-order Bessel IIR lowpass at 1/200 of the video refresh rate as an initial guess. I'm attaching the IIR Lab application (the full one), now that you have gotten accustomed to IIR Lab mini. Using IIR Lab, you can design up to four IIR biquads in series and check how they behave in various fixed-point data-width representations. It is up to you to manage the scaling and dynamic headroom within your DSP application in a proper way. IIR Lab only tells you the maximum quality you can expect.
MyCo wrote: How do you like LTspice?

I like it very much, and I'm using it for modeling digital audio IIR and FIR filters. See the attached .zip, containing the roots of an LTspice-based graphical audio compiler for any kind of audio target, like a PIC32-based board or an ARM Cortex-M4-based board. I'm exploiting the LTspice netlist.
- Implementation A: what we send to the microcontroller is the bare LTspice netlist in text format. Inside the microcontroller, we have an embedded application parsing the netlist, recognizing the part names, having their prototype object code in Flash memory, instantiating them in a structured way, and establishing the signal flows (nothing else than pointers) compliant with the netlist (a toy sketch follows below).
- Implementation B: what we send to the microcontroller is PIC32 or ARM Cortex-M4 precompiled executable code to be copied into RAM and executed without parsing. This scheme allows an ultra-thin target, without any OS or application, requiring in essence only a bootloader.
What's your opinion on this? I'm attaching a .zip as a preliminary illustration. Those are LTspice files to be installed into the same directory.
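Purely as an illustration of what the Implementation A parser could look like, here is a toy sketch in Ruby rather than embedded C; the block names, the line format and the "netlist" string are all made up for the example and do not reflect the actual LTspice netlist grammar:

- Code:
# Toy netlist interpreter: each line names a block type, an instance id and
# the node feeding it, e.g. "BIQUAD f1 in". All names here are hypothetical.
Block = Struct.new(:kind, :id, :source)

def parse_netlist(text)
  text.each_line.map do |line|
    kind, id, source = line.split
    Block.new(kind, id, source)
  end
end

netlist = parse_netlist("BIQUAD f1 in\nGAIN g1 f1\n")
netlist.each { |b| puts "#{b.kind} #{b.id} fed by #{b.source}" }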
MyCo wrote: How do you like LTspice? If you don't know why I ask this, let me know.

No idea, really. I'm intrigued. Please tell me...
One more thing I would like to undertake is to illustrate the eight Alexander Potchinkov articles about audio DSP on the DSP56374 that Elektor published between May 2011 and February 2012. I'd like to simulate the examples using LTspice, and execute them using Flowstone. For the simulation to be exact, I need a way to accurately simulate the 24-bit fixed-point arithmetic, both in LTspice and in Flowstone. Any ideas are welcome for efficient 24-bit fixed-point arithmetic implementations.
Thanks,
Steph
- Attachments
- WM8731-Audio-Crossovers-Digital-XOs.zip
- IIR_Lab.zip
- steph_tsf