Fast Stream Array Access
Following KG's excellent ASM posts over at FS Guru, I stumbled upon a way to considerably cut down CPU load for stream array access. As an example I am attaching a low-CPU delay (integer and interpolated variants). The design borrows from Trogz Toolz; he has some smart and highly optimized stuff there. Hard to believe there was still a factor of 3(!) of optimization potential to gain.
Boy this opens up possibilities: fast lookup tables, fast wavetable oscillators, you name it.
- Attachment: fastDelay.fsm (15.65 KiB)
martinvicanek - Posts: 1328
- Joined: Sat Jun 22, 2013 8:28 pm
Re: Fast Stream Array Access
It's great news Martin. Another set of "impossible" (since the SM age) problems will be solved. I can't wait to see it.
- tester
- Posts: 1786
- Joined: Wed Jan 18, 2012 10:52 pm
- Location: Poland, internet
Re: Fast Stream Array Access
Excellent, nice work. Yes, stream arrays have always been very slow because of the need to unpack the channels.
This really is a game changer because a huge bottleneck has been removed.
Gonna have a look and see if I can optimize a few other things with this.
- Exo
- Posts: 426
- Joined: Wed Aug 04, 2010 8:58 pm
- Location: UK
Re: Fast Stream Array Access
Hi Martin, do you think it is possible to do this trick with this code?
This reads directly from the address of a mem, instead of from the mem input or an array. Here eax holds the actual memory address, and we read the actual value with [eax]. I know it can work easily with the mem input because that is copied into a standard code array.
Code:
polyintin addr;
polyintin max;
streamin index;
streamout out;
int zero = 0;
int temp = 0;
stage2;
mov eax,addr[0];
cmp eax,0;
jz bypass;
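cvtps2dq xmm0,index; //convert the four channel indices to integers
maxps xmm0,zero; //clamp below at 0
minps xmm0,max; //clamp above at max
pslld xmm0,2; //index*4 = byte offset (4 bytes per float)
paddd xmm0,addr; //add the base address: one read address per channel
movaps temp,xmm0; //store the addresses so eax can pick them up one by one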
//Read
mov eax,temp[0];
fld [eax] ; fstp out[0];
mov eax,temp[1];
fld [eax] ; fstp out[1];
mov eax,temp[2];
fld [eax] ; fstp out[2];
mov eax,temp[3];
fld [eax] ; fstp out[3];
bypass:
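
For readers more at home in C, the per-channel read above can be modelled roughly as follows. This is only a sketch: `gather4_scalar` and its parameters are hypothetical names, the table pointer stands in for the mem address, and the cast truncates where cvtps2dq would round according to the current SSE rounding mode.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: clamp each of the four channel indices to
   [0, max_index] and read table[index] one channel at a time,
   mirroring the four "fld [eax]; fstp out[n]" pairs. */
static void gather4_scalar(const float *table, int max_index,
                           const float index[4], float out[4])
{
    assert(table != NULL && max_index >= 0);
    for (int ch = 0; ch < 4; ++ch) {
        float f = index[ch];
        if (!(f > 0.0f)) f = 0.0f;                      /* clamp below, also catches NaN */
        if (f > (float)max_index) f = (float)max_index; /* clamp above */
        int i = (int)f;      /* truncates (assumption: rounding details ignored) */
        out[ch] = table[i];  /* same element as reading at base address + 4*i bytes */
    }
}
```

The four reads are independent scalar loads, which is why this variant needs no particular alignment of the table.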
- Exo
- Posts: 426
- Joined: Wed Aug 04, 2010 8:58 pm
- Location: UK
Re: Fast Stream Array Access
It should be possible, as I have posted on FS Guru: http://flowstone.guru/blog/how-to-use-assembler-part-3-alu-fpu-and-array-management/ (just after Martin's example post). I haven't tested it, though. In that particular case the problem is a little more complicated: you need to read values that sit in different channels and put each one into the desired channel. The only way to do that is code branching to pick the right shufps action.
Another concern is what happens when the array size is not a multiple of 4 samples: with the last values you would also read data outside the mem when using movaps (which works on 16-byte-aligned data). That may or may not crash. Further testing has to be done...
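
The block-plus-lane idea and the size concern can be sketched in C like this. It is an illustration only: `padded_len` and `read_via_block` are hypothetical names, and plain pointer arithmetic stands in for the aligned movaps load.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of floats a buffer needs so that every
   4-float block touched by a read lies fully inside it. */
static size_t padded_len(size_t n)
{
    return (n + 3u) & ~(size_t)3u;   /* round up to a multiple of 4 */
}

/* Read table[i] by first locating the 4-float block that contains it
   (what an aligned 16-byte load would fetch) and then picking one lane. */
static float read_via_block(const float *table, size_t padded, size_t i)
{
    size_t block = i & ~(size_t)3u;  /* index of the block's first float */
    size_t lane  = i & 3u;           /* position inside the block, i % 4 */
    assert(block + 4 <= padded);     /* the whole block must be in bounds */
    const float *vec = table + block;
    return vec[lane];
}
```

With 5 samples, a read of index 4 touches floats 4 to 7, so the buffer has to hold 8 floats for the block load to stay in bounds.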
- KG_is_back
- Posts: 1196
- Joined: Tue Oct 22, 2013 5:43 pm
- Location: Slovakia
Re: Fast Stream Array Access
Exo wrote:Hi Martin, do you think it is possible to do this trick with this code? [...]
Hehe, that's what I am after as well. So far I have only been able to do this with arrays declared in the same ASM module, though. KG has me lost; I'm curious what he will pull out of his sleeve next.
Last edited by martinvicanek on Sun Oct 19, 2014 9:46 pm, edited 1 time in total.
-
martinvicanek - Posts: 1328
- Joined: Sat Jun 22, 2013 8:28 pm
Re: Fast Stream Array Access
Nope... it seems movaps works only with data that was declared as an SSE array, which is not the case for mems.
- KG_is_back
- Posts: 1196
- Joined: Tue Oct 22, 2013 5:43 pm
- Location: Slovakia
Re: Fast Stream Array Access
Okay, that explains it. So could we declare an SSE array and copy the external mem to it in stage0 (basically what mem input in 3.0.5 does)? Then we'd have fast movaps/shufps access in stage2.
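
A rough C model of that copy step, under stated assumptions: `local_table`, `LOCAL_CAPACITY` and `copy_to_local` are hypothetical names, the capacity is an arbitrary choice, and `_Alignas(16)` stands in for whatever alignment an array declared inside the ASM module gets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LOCAL_CAPACITY 1024   /* assumed maximum size, a multiple of 4 */

/* 16-byte aligned local storage, standing in for an array declared
   in the same module as the reading code. */
static _Alignas(16) float local_table[LOCAL_CAPACITY];

/* Hypothetical "stage0" step: copy n floats from the external mem and
   zero the rest of the last 4-float block. Returns the padded length. */
static size_t copy_to_local(const float *mem, size_t n)
{
    size_t padded = (n + 3u) & ~(size_t)3u;   /* round up to a multiple of 4 */
    assert(padded <= LOCAL_CAPACITY);
    if (n > 0) {
        assert(mem != NULL);
        memcpy(local_table, mem, n * sizeof(float));
    }
    for (size_t i = n; i < padded; ++i)
        local_table[i] = 0.0f;                /* padding, never stale data */
    return padded;
}
```

The copy costs time once per table change, but afterwards every block read is aligned and stays inside the local buffer.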
-
martinvicanek - Posts: 1328
- Joined: Sat Jun 22, 2013 8:28 pm
Re: Fast Stream Array Access
That should do the trick.
BTW here is the code I came up with:
Code:
streamin addr;
streamin max;
streamin index;
streamout out;
int zero = 0;
int temp = 0;
int temp2=0;
int I0=0;
int I1=1;
int I2=2;
int I3=3; //mask that keeps only the lowest two bits, i.e. index%4
int In4=-4; //binary mask that clears the lowest two bits,
//i.e. it rounds down to the nearest multiple of 4
float array[4];
stage2;
mov eax,addr[0];
cmp eax,0;
jz bypass;
cvtps2dq xmm0,index;
maxps xmm0,zero;
minps xmm0,max;
movaps xmm1,xmm0;
andps xmm0,In4;
pslld xmm0,2;
paddd xmm0,addr; //this is the address for a 16-byte aligned read
movaps temp,xmm0;
andps xmm1,I3; //this will be used to shuffle the right sample into output
movaps temp2,xmm1;
pslld xmm1,4;
//read for channel1 and store into array
mov eax,temp[0];
movaps xmm2,[eax];
movd eax,xmm1;
movaps array[eax],xmm2;
//extract values from array and shuffle each value into index[0]
mov eax,0;
movaps xmm0,array[eax]; //xmm0 may contain desired value in ch(0) - no shuffling needed
movaps xmm4,I0;
cmpps xmm4,temp2,0; //true if index%4==0
andps xmm0,xmm4; //keep xmm0 only where index%4==0
add eax,16;
movaps xmm1,array[eax]; //xmm1 may contain desired value in ch(1) - shuffle it to 0
shufps xmm1,xmm1,1;
movaps xmm4,I1;
cmpps xmm4,temp2,0; //true if index%4==1
andps xmm1,xmm4;
add eax,16;
movaps xmm2,array[eax]; //...
shufps xmm2,xmm2,2;
movaps xmm4,I2;
cmpps xmm4,temp2,0; //true if index%4==2
andps xmm2,xmm4;
add eax,16;
movaps xmm3,array[eax];
shufps xmm3,xmm3,3;
movaps xmm4,I3;
cmpps xmm4,temp2,0; //true if index%4==3
andps xmm3,xmm4;
orps xmm0,xmm1;
orps xmm0,xmm2;
orps xmm0,xmm3;
movaps out,xmm0;
bypass:
it does not work because of the movaps xmm0,[eax] but replacing that with array should fix it.
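
The mask-and-OR selection used at the end of that code can be shown in isolation with a small C sketch. It is a scalar model, not a translation of the listing: `select_lane`, `f2u` and `u2f` are hypothetical helpers, and integer compares stand in for cmpps.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Bit pattern of a float, and back again. */
static uint32_t f2u(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float u2f(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Pick vec[lane] without branching: AND each candidate with an
   all-ones or all-zeros compare result, then OR the survivors together. */
static float select_lane(const float vec[4], uint32_t lane)
{
    assert(lane < 4);
    uint32_t acc = 0;
    for (uint32_t k = 0; k < 4; ++k) {
        /* 0xFFFFFFFF when lane == k, otherwise 0 */
        uint32_t mask = (uint32_t)0 - (uint32_t)(lane == k);
        acc |= f2u(vec[k]) & mask;
    }
    return u2f(acc);
}
```

Exactly one of the four masks is all ones, so the OR yields the bit pattern of the selected value unchanged.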
- KG_is_back
- Posts: 1196
- Joined: Tue Oct 22, 2013 5:43 pm
- Location: Slovakia
Re: Fast Stream Array Access
KG_is_back wrote:it does not work because of the movaps xmm0,[eax] but replacing that with array should fix it.
Yes movaps xmm0,[eax]; is the first thing I tried. Shame really. Should it work?
I was going to ask you guys: are there any opcodes you really want/need? If you can give clear examples of the benefits of certain opcodes, I could get on to Malc to add them (I'm usually quite good at getting him to add little things if I give him a clear example and make it simple for him).
Maybe a topic for another thread?
- Exo
- Posts: 426
- Joined: Wed Aug 04, 2010 8:58 pm
- Location: UK