If you have a problem or need to report a bug, please email: support@dsprobotics.com
There are 3 sections to this support area:
DOWNLOADS: access to product manuals, support files and drivers
HELP & INFORMATION: tutorials and example files for learning or finding pre-made modules for your projects
USER FORUMS: meet other users and exchange ideas; you can also get help and assistance here
NEW REGISTRATIONS - please contact us if you wish to register on the forum
Users are reminded of the forum rules they signed up to, which prohibit any activity that violates any laws, including posting material covered by copyright.
optimization question - custom selectors
16 posts
Page 1 of 2
optimization question - custom selectors
I'm looking for an optimized, selector-like switcher for streams, since the native selector doesn't work well when there are multiple copies of it. At the moment, I'm using something like this:
Code:
streamin sw;
streamin in1;
streamin in2;
streamin in3;
streamout out1;
float a1,a2,a3,a4;
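a1 = in1&(sw==0);        // in1 when sw==0, else 0
a2 = in2&(sw==1);        // in2 when sw==1
a3 = (-1*in2)&(sw==2);   // inverted in2 when sw==2
a4 = in3&(sw==3);        // in3 when sw==3
out1 = a1+a2+a3+a4;      // at most one term is non-zero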
What would be a faster way?
(It needs to be Mono4 compatible.)
Also, I remember there used to be some ASM hack that lets you "stop" some inputs from being processed, the way the selectors do, but I don't remember the details now.
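Why `in1&(sw==0)` acts as a switch: the code above relies on a comparison producing an all-ones bit mask when true and zero when false, so ANDing a float with it either passes the value through unchanged or clears it to 0.0. The following is a minimal portable C model of one lane of that selector, not FlowStone code; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* A comparison yields an all-ones bit mask when true and 0 when false,
   as cmpps does per lane (the FlowStone (a==b) idiom relies on the same). */
static uint32_t eq_mask(float a, float b) {
    return (a == b) ? 0xFFFFFFFFu : 0u;
}

/* AND a float's raw bits with a mask, like andps / FlowStone's '&':
   an all-ones mask passes x through unchanged, a zero mask gives +0.0f. */
static float and_mask(float x, uint32_t mask) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits &= mask;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* One lane of the selector from the question:
   sw==0 -> in1, sw==1 -> in2, sw==2 -> -in2, sw==3 -> in3, anything else -> 0. */
static float select_lane(float sw, float in1, float in2, float in3) {
    float a1 = and_mask(in1, eq_mask(sw, 0.0f));
    float a2 = and_mask(in2, eq_mask(sw, 1.0f));
    float a3 = and_mask(-1.0f * in2, eq_mask(sw, 2.0f));
    float a4 = and_mask(in3, eq_mask(sw, 3.0f));
    return a1 + a2 + a3 + a4; /* at most one term is non-zero */
}
```

Mono4 simply runs four such lanes side by side, each with its own sw value.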
Need to take a break? I have something right for you.
Feel free to donate. Thank you for your contribution.
- tester
- Posts: 1786
- Joined: Wed Jan 18, 2012 10:52 pm
- Location: Poland, internet
Re: optimization question - custom selectors
You can save CPU by not storing a1 through a4, since they are not needed afterwards. The following ASM code uses about half the CPU:
Code:
streamin sw;
streamin in1;
streamin in2;
streamin in3;
streamout out1;
float F0=0.0;
float F1=1.0;
float F2=2.0;
float F3=3.0;
movaps xmm0,F0; cmpps xmm0,sw,0; andps xmm0,in1; // in1&(sw==0)
movaps xmm1,F1; cmpps xmm1,sw,0; andps xmm1,in2; // in2&(sw==1)
movaps xmm2,F2; cmpps xmm2,sw,0; andps xmm2,in2; // in2&(sw==2), subtracted below to give -in2
movaps xmm3,F3; cmpps xmm3,sw,0; andps xmm3,in3; // in3&(sw==3)
addps xmm0,xmm1; subps xmm0,xmm2; addps xmm0,xmm3; // xmm0 + xmm1 - xmm2 + xmm3
movaps out1,xmm0;
Further optimizations might be possible, depending on how often sw changes and whether it is Mono4 or has the same value in all 4 channels. In the latter case you could do the (sw==0) comparisons in green. You might also consider hopping, but there is not really much more to gain anyway.
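To check that the ASM version computes the same thing as the original DSP code, here is a portable C model of its instruction sequence across all four Mono4 lanes. It is only a sketch: the `xmm` struct and the helpers named after the SSE instructions are hypothetical stand-ins that emulate each lane in plain C, so they show the logic, not the speed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 128-bit xmm register: four 32-bit lanes as raw bits. */
typedef struct { uint32_t bits[4]; } xmm;

static float lane(xmm r, int i) { float f; memcpy(&f, &r.bits[i], sizeof f); return f; }
static void set_lane(xmm *r, int i, float f) { memcpy(&r->bits[i], &f, sizeof f); }

static xmm load4(float a, float b, float c, float d) {
    xmm r;
    set_lane(&r, 0, a); set_lane(&r, 1, b); set_lane(&r, 2, c); set_lane(&r, 3, d);
    return r;
}
static xmm splat(float f) { return load4(f, f, f, f); } /* like float F1=1.0 */

/* cmpps r,x,0 (predicate 0 = "equal"): all-ones where the lanes are equal, else 0. */
static xmm cmpeq(xmm r, xmm x) {
    for (int i = 0; i < 4; i++)
        r.bits[i] = (lane(r, i) == lane(x, i)) ? 0xFFFFFFFFu : 0u;
    return r;
}
/* andps: bitwise AND of the raw lane bits. */
static xmm andps(xmm r, xmm x) {
    for (int i = 0; i < 4; i++) r.bits[i] &= x.bits[i];
    return r;
}
static xmm addps(xmm r, xmm x) {
    for (int i = 0; i < 4; i++) set_lane(&r, i, lane(r, i) + lane(x, i));
    return r;
}
static xmm subps(xmm r, xmm x) {
    for (int i = 0; i < 4; i++) set_lane(&r, i, lane(r, i) - lane(x, i));
    return r;
}

/* Same steps as the ASM selector; the masked terms are never stored to memory. */
static xmm selector(xmm sw, xmm in1, xmm in2, xmm in3) {
    xmm x0 = andps(cmpeq(splat(0.0f), sw), in1); /* in1&(sw==0) */
    xmm x1 = andps(cmpeq(splat(1.0f), sw), in2); /* in2&(sw==1) */
    xmm x2 = andps(cmpeq(splat(2.0f), sw), in2); /* in2&(sw==2) */
    xmm x3 = andps(cmpeq(splat(3.0f), sw), in3); /* in3&(sw==3) */
    return addps(subps(addps(x0, x1), x2), x3);  /* x0 + x1 - x2 + x3 */
}
```

With sw = (0, 1, 2, 3) across the lanes, each lane picks a different input, and lane 2 comes out inverted, as in the original DSP code.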
-
martinvicanek - Posts: 1328
- Joined: Sat Jun 22, 2013 8:28 pm
Re: optimization question - custom selectors
Oh, nice one. I was attempting something complicated with stream selectors a couple of weeks back, but gave it up because of some weird behaviour. I'll certainly try again now, U-turn the U-turn.
Typo with that "sin2" obviously - had me puzzled for a moment!
And presumably .. 'addps xmm0,xmm2'? Or am I as confused as usual?
H
-
HughBanton - Posts: 265
- Joined: Sat Apr 12, 2008 3:10 pm
- Location: Evesham, Worcestershire
Re: optimization question - custom selectors
Thanks Martin,
This is for switching audio signals with full Mono4 usage, so hopping or removing channels isn't really an option.
And what would such optimized ASM code look like for a multiplexer? (Unused outputs = 0.)
- tester
- Posts: 1786
- Joined: Wed Jan 18, 2012 10:52 pm
- Location: Poland, internet
Re: optimization question - custom selectors
HughBanton wrote:And presumably .. 'addps xmm0,xmm2'? Or am I as confused as usual?
Normally yes, but Martin made the code behave like the very first example, where sw==2 outputs the inverted in2 (a3 = (-1*in2)&(sw==2)), so subps is correct there.
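Written out per lane, treating each AND-with-mask as multiplication by 0 or 1 (equivalent here, since the mask either keeps all bits or clears them), the ASM selector computes:

```latex
\text{out} = m_0\,\text{in1} + m_1\,\text{in2} - m_2\,\text{in2} + m_3\,\text{in3},
\qquad
m_k = \begin{cases} 1 & \text{if } \text{sw} = k \\ 0 & \text{otherwise} \end{cases}
```

For sw = 2 only m_2 = 1, so out = -in2, matching a3 = (-1*in2)&(sw==2). With addps in place of subps the result would be +in2 instead.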
- adamszabo
- Posts: 667
- Joined: Sun Jul 11, 2010 7:21 am
Re: optimization question - custom selectors
A simple multiplexer would go like this:
Code:
// inputs
streamin switch;
streamin in;
// outputs
streamout out0;
streamout out1;
streamout out2;
streamout out3;
// constants
float F0=0;
float F1=1;
float F2=2;
float F3=3;
// code
movaps xmm6,switch;
movaps xmm7,in;
movaps xmm0,F0; cmpps xmm0,xmm6,0; andps xmm0,xmm7; movaps out0,xmm0;
movaps xmm1,F1; cmpps xmm1,xmm6,0; andps xmm1,xmm7; movaps out1,xmm1;
movaps xmm2,F2; cmpps xmm2,xmm6,0; andps xmm2,xmm7; movaps out2,xmm2;
movaps xmm3,F3; cmpps xmm3,xmm6,0; andps xmm3,xmm7; movaps out3,xmm3;
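The multiplexer is the same mask trick run the other way round: one input, four outputs, each gated by its own compare, so unselected outputs are 0 as requested. A minimal portable C model of one lane, with hypothetical function names:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* in & (sw==k) for one lane: a true compare is an all-ones mask, so the
   bits of in pass through; a false compare clears them to +0.0f. */
static float masked(float in, float sw, float k) {
    uint32_t bits, mask = (sw == k) ? 0xFFFFFFFFu : 0u;
    memcpy(&bits, &in, sizeof bits);
    bits &= mask;
    memcpy(&in, &bits, sizeof in);
    return in;
}

/* One lane of the 1-in, 4-out multiplexer: the selected output carries in,
   all other outputs are 0. */
static void demux4(float sw, float in, float out[4]) {
    for (int k = 0; k < 4; k++)
        out[k] = masked(in, sw, (float)k);
}
```

A switch value outside 0..3 leaves all four outputs at 0.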
If the switch input does not change very often, you could hop the compares, although the CPU gain is only marginal:
Code:
// inputs
streamin switch;
streamin in;
// outputs
streamout out0;
streamout out1;
streamout out2;
streamout out3;
// constants
float F0=0;
float F1=1;
float F2=2;
float F3=3;
// masks
int mask0=0;
int mask1=0;
int mask2=0;
int mask3=0;
// code
mov eax,ecx; and eax,63; cmp eax,0; jnz skipCompares; // run the compares only when (ecx & 63) == 0, i.e. once every 64 samples
movaps xmm0,F0; cmpps xmm0,switch,0; movaps mask0,xmm0;
movaps xmm1,F1; cmpps xmm1,switch,0; movaps mask1,xmm1;
movaps xmm2,F2; cmpps xmm2,switch,0; movaps mask2,xmm2;
movaps xmm3,F3; cmpps xmm3,switch,0; movaps mask3,xmm3;
skipCompares:
movaps xmm7,in;
movaps xmm0,mask0; andps xmm0,xmm7; movaps out0,xmm0;
movaps xmm1,mask1; andps xmm1,xmm7; movaps out1,xmm1;
movaps xmm2,mask2; andps xmm2,xmm7; movaps out2,xmm2;
movaps xmm3,mask3; andps xmm3,xmm7; movaps out3,xmm3;
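The hopped version caches the four masks and recomputes them only when the low six bits of the counter are zero. Here is a portable C sketch of that control flow for one lane, assuming a counter that increments once per sample (standing in for ecx in the ASM); the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-lane state: cached masks (like mask0..mask3) and a
   sample counter standing in for ecx. */
typedef struct {
    uint32_t mask[4];
    uint32_t counter;
} HoppedDemux;

static float and_bits(float x, uint32_t mask) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits &= mask;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Process one sample. The compares run only when (counter & 63) == 0,
   i.e. once every 64 samples; in between, the cached masks are reused. */
static void hopped_demux4(HoppedDemux *s, float sw, float in, float out[4]) {
    if ((s->counter & 63u) == 0) {
        for (int k = 0; k < 4; k++)
            s->mask[k] = (sw == (float)k) ? 0xFFFFFFFFu : 0u;
    }
    for (int k = 0; k < 4; k++)
        out[k] = and_bits(in, s->mask[k]);
    s->counter++;
}
```

The trade-off is latency: a change of the switch only takes effect at the next 64-sample boundary, which is why hopping suits a switch that rarely changes.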
-
martinvicanek - Posts: 1328
- Joined: Sat Jun 22, 2013 8:28 pm
Re: optimization question - custom selectors
Thanks again.
I admit my domain is wiring green relationships rather than messing with ASM code.
- tester
- Posts: 1786
- Joined: Wed Jan 18, 2012 10:52 pm
- Location: Poland, internet
Re: optimization question - custom selectors
No need to use xmm1, 2 or 3 in the simple multiplexer, I think...
Code:
// 1-pole, 4-way Multiplexer
streamin switch, in;
streamout out0, out1, out2, out3;
float F0=0, F1=1, F2=2, F3=3;
movaps xmm6,switch;
movaps xmm7,in;
movaps xmm0,F0; cmpps xmm0,xmm6,0; andps xmm0,xmm7; movaps out0,xmm0; //(sw==0)
movaps xmm0,F1; cmpps xmm0,xmm6,0; andps xmm0,xmm7; movaps out1,xmm0; //(sw==1)
movaps xmm0,F2; cmpps xmm0,xmm6,0; andps xmm0,xmm7; movaps out2,xmm0; //(sw==2)
movaps xmm0,F3; cmpps xmm0,xmm6,0; andps xmm0,xmm7; movaps out3,xmm0; //(sw==3)
.. It may or may not matter in practice, but it means you could easily turn this into a super-efficient multi-pole multiplexer using the spare xmm registers. (Should you ever need such a device!)
Also note that any of these can generally be used in stage0 only, if you just need a one-off lookup of something at note-on.
H
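A multi-pole multiplexer routes several signals with the same switch, so each compare result can be reused for every pole instead of being recomputed. A portable C sketch of one lane of a hypothetical 2-pole, 4-way version (the names and the two-input layout are assumptions, not code from this thread):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static float and_bits(float x, uint32_t mask) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits &= mask;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Hypothetical 2-pole, 4-way multiplexer: inA and inB follow the same switch. */
static void demux4_2pole(float sw, float inA, float inB, float outA[4], float outB[4]) {
    for (int k = 0; k < 4; k++) {
        uint32_t mask = (sw == (float)k) ? 0xFFFFFFFFu : 0u; /* one compare per output */
        outA[k] = and_bits(inA, mask);                        /* gates pole A */
        outB[k] = and_bits(inB, mask);                        /* and pole B */
    }
}
```

In ASM terms this would be one cmpps per output followed by one andps per pole, with the extra poles living in the spare xmm registers.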
-
HughBanton - Posts: 265
- Joined: Sat Apr 12, 2008 3:10 pm
- Location: Evesham, Worcestershire
Re: optimization question - custom selectors
Since I just lerrrv messing with ASM code, I came up with this simplification for the Selector: OR instead of ADD.
Code:
//4-in, 1-out selector
streamin sw, in0, in1, in2, in3;
streamout out;
float F0=0, F1=1, F2=2, F3=3;
movaps xmm7,sw; //xmm7=switch
movaps xmm0,xmm7; cmpps xmm0,F0,0; andps xmm0,in0; movaps xmm1,xmm0; //(sw==0)
movaps xmm0,xmm7; cmpps xmm0,F1,0; andps xmm0,in1; orps xmm1,xmm0; //(sw==1)
movaps xmm0,xmm7; cmpps xmm0,F2,0; andps xmm0,in2; orps xmm1,xmm0; //(sw==2)
movaps xmm0,xmm7; cmpps xmm0,F3,0; andps xmm0,in3; orps xmm1,xmm0; //(sw==3)
movaps out,xmm1;
... seems to work OK?
H
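Why OR can replace ADD here: the compares are mutually exclusive, so at most one masked term is non-zero and the others have all bits clear. ORing all-zero bits into the accumulator leaves the selected value unchanged, which gives the same result as summing with 0.0. A portable C sketch comparing both forms for one lane (hypothetical function names):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static uint32_t masked_bits(float x, float sw, float k) {
    uint32_t bits, mask = (sw == k) ? 0xFFFFFFFFu : 0u;
    memcpy(&bits, &x, sizeof bits);
    return bits & mask; /* andps */
}

/* Selector combining the masked terms with bitwise OR (orps). */
static float selector_or(float sw, const float in[4]) {
    uint32_t acc = 0;
    for (int k = 0; k < 4; k++)
        acc |= masked_bits(in[k], sw, (float)k);
    float out;
    memcpy(&out, &acc, sizeof out);
    return out;
}

/* Same selector combining the masked terms with float addition (addps). */
static float selector_add(float sw, const float in[4]) {
    float sum = 0.0f;
    for (int k = 0; k < 4; k++) {
        uint32_t bits = masked_bits(in[k], sw, (float)k);
        float term;
        memcpy(&term, &bits, sizeof term);
        sum += term;
    }
    return sum;
}
```

The OR trick only works because the terms never overlap; it cannot express the subtraction the original sw==2 case needed.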
-
HughBanton - Posts: 265
- Joined: Sat Apr 12, 2008 3:10 pm
- Location: Evesham, Worcestershire
Re: optimization question - custom selectors
HughBanton wrote:No need to use xmm1, 2 or 3 in the simple multiplexer I think ...
Correct, you can keep xmm1 etc. free for something else if you need to. On the other hand, I like to use four separate registers when I can afford to. If anything, it might help the processor do things in parallel.
HughBanton wrote:[...] simplification for the Selector- OR instead of ADD
Yes! For a plain selector, OR will be somewhat lighter on CPU than ADD. I used ADD and SUB only to meet the OP's requirement that sw==2 outputs the inverted in2.
-
martinvicanek - Posts: 1328
- Joined: Sat Jun 22, 2013 8:28 pm