Fast Stream Array Access

Postby martinvicanek » Sun Oct 19, 2014 7:27 pm

Following KG's excellent ASM posts over at FS Guru, I stumbled upon a way to cut down the CPU load of stream array access considerably. As an example I am attaching a low-CPU delay (integer and interpolated variants). The design borrows from Trogz Toolz; he has some smart and highly optimized stuff there. Hard to believe there was still a factor of 3(!) of optimization potential left. :shock:

Boy, this opens up possibilities: fast lookup tables, fast wavetable oscillators, you name it.
Attachments:
fastDelay.fsm (15.65 KiB)

Re: Fast Stream Array Access

Postby tester » Sun Oct 19, 2014 7:51 pm

That's great news, Martin. Another set of "impossible" (since the SM age) problems will be solved. I can't wait to see it. :-)

Re: Fast Stream Array Access

Postby Exo » Sun Oct 19, 2014 8:12 pm

Excellent, nice work. Yes, stream arrays have always been very slow because of the need to unpack the channels.
This really is a game changer because a huge bottleneck has been removed :)

Gonna have a look and see if I can optimize a few other things with this :)

Re: Fast Stream Array Access

Postby Exo » Sun Oct 19, 2014 8:27 pm

Hi Martin, do you think it is possible to do this trick with this code?

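Code:
polyintin addr;
polyintin max;
streamin index;
streamout out;

int zero = 0;
int temp = 0;
stage2;
mov eax,addr[0];
cmp eax,0;
jz bypass;              //skip the read if the address is zero

  cvtps2dq xmm0,index;  //convert the float index to integer
  maxps xmm0,zero;      //clamp to >= 0
  minps xmm0,max;       //clamp to <= max
  pslld xmm0,2;         //index*4 = byte offset (4 bytes per float)
  paddd xmm0,addr;      //add the base address
  movaps temp,xmm0;     //store the four per-channel addresses

  //Read: one scalar load per channel
  mov eax,temp[0];
  fld [eax] ; fstp out[0];

  mov eax,temp[1];
  fld [eax] ; fstp out[1];

  mov eax,temp[2];
  fld [eax] ; fstp out[2];

  mov eax,temp[3];
  fld [eax] ; fstp out[3];

bypass: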


This reads directly from the address of a mem, instead of from the mem input or an array. Here eax holds the actual memory address, and we read the value stored there with [eax]. I know it can work easily with the mem input because that is copied into a standard code array.
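
As a rough scalar model of what the code above does for a single channel, here is a sketch in C (not FlowStone code; `read_clamped` and its parameter names are made up for illustration, and the index is assumed to be converted to an integer already). It clamps the index, scales it to a byte offset and reads the float at that address. Returning 0 when there is no address is an assumption; the ASM simply skips the read.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: scalar equivalent of one channel of the lookup.
   base      - start of the float data (the "addr" input)
   max_index - highest valid index (the "max" input)
   index     - requested index, already converted to an integer */
static float read_clamped(const float *base, int max_index, int index)
{
    if (base == NULL)               /* mirrors the "jz bypass" guard */
        return 0.0f;                /* assumption: the ASM just skips the read */
    assert(max_index >= 0);
    if (index < 0)                  /* maxps with zero */
        index = 0;
    if (index > max_index)          /* minps with max */
        index = max_index;
    /* base + index points 4*index bytes past base (pslld 2, then paddd) */
    return *(base + index);
}
```

For a table `{10, 20, 30, 40}` with `max_index` 3, index 2 reads 30, while out-of-range indices are clamped to the first or last element.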

Re: Fast Stream Array Access

Postby KG_is_back » Sun Oct 19, 2014 8:37 pm

It should be possible, as I posted on FS Guru (http://flowstone.guru/blog/how-to-use-assembler-part-3-alu-fpu-and-array-management/) just after Martin's example post. I haven't tested it, though. In this particular case the problem is a little more complicated: you need to read values that sit in different channels and move each one into the desired channel. The only way to do that is code branching to pick the right shufps action.
Another concern is what happens when the array size (in samples) is not a multiple of 4: for the last values, movaps (which works on 16-byte aligned data) would also read data outside the mem. That may or may not crash. Further testing is needed...
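
To make both concerns concrete, here is a small sketch in C (not FlowStone code; all function names are illustrative). An index splits into the start of its 4-sample block and a lane inside that block, and a block read always touches all four samples, so it can run past the end of a buffer whose length is not a multiple of 4. Note that `~3` is the same bit pattern as the integer -4.

```c
#include <assert.h>

/* Start of the 4-sample block containing index i (i >= 0):
   clear the two lowest bits. */
static int block_start(int i) { return i & ~3; }

/* Position of sample i inside its block, i.e. i % 4 for i >= 0. */
static int lane(int i) { return i & 3; }

/* Smallest multiple of 4 that is >= n: the length a buffer needs so that
   a block read of its last sample stays in bounds. */
static int padded_length(int n) { return (n + 3) & ~3; }

/* Number of samples that a block read of the last element reads past the
   end of an n-sample buffer (0 when n is a multiple of 4). */
static int overread(int n)
{
    assert(n > 0);
    return block_start(n - 1) + 4 - n;
}
```

With 10 samples, for instance, the last sample (index 9) lives in lane 1 of the block starting at 8, so a block read covers indices 8 to 11 and overshoots by 2 samples unless the buffer is padded to 12.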

Re: Fast Stream Array Access

Postby martinvicanek » Sun Oct 19, 2014 9:28 pm

Exo wrote: Hi Martin, do you think it is possible to do this trick with this code? [...]

Hehe, that's what I am after as well. :mrgreen: So far I have only been able to do this with arrays declared in the same ASM module, though. KG has me lost; I'm curious what he will pull out of his sleeve next. :ugeek: :ugeek:

Re: Fast Stream Array Access

Postby KG_is_back » Sun Oct 19, 2014 9:36 pm

Nope... it seems that movaps works only with data that was declared as an SSE array, which mems are not.

Re: Fast Stream Array Access

Postby martinvicanek » Sun Oct 19, 2014 9:44 pm

Okay, that explains it. So could we declare an SSE array and copy the external mem to it in stage0 (basically what mem input in 3.0.5 does)? Then we'd have fast movaps/shufps access in stage2.
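
The idea could be sketched roughly like this in C (not FlowStone code; `padded_table`, `table_init`, `table_read` and `table_free` are hypothetical names, and C11 `aligned_alloc` stands in for declaring an SSE array). The external data is copied once into a private buffer that is 16-byte aligned and zero-padded to a multiple of 4 floats; each lookup then fetches a whole 4-float block and picks the wanted lane.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical table: a private, aligned, padded copy of the external data. */
typedef struct {
    float *data;    /* 16-byte aligned, zero-padded to a multiple of 4 floats */
    int    length;  /* number of valid samples */
} padded_table;

/* "stage0" step: copy n samples from src. Returns 0 on success, -1 on failure. */
static int table_init(padded_table *t, const float *src, int n)
{
    assert(n > 0);
    size_t padded = ((size_t)n + 3) & ~(size_t)3;   /* round up to 4 */
    t->data = aligned_alloc(16, padded * sizeof(float));
    if (t->data == NULL)
        return -1;
    memset(t->data, 0, padded * sizeof(float));     /* zero the padding */
    memcpy(t->data, src, (size_t)n * sizeof(float));
    t->length = n;
    return 0;
}

/* "stage2" step: clamped lookup via a whole-block read plus lane select. */
static float table_read(const padded_table *t, int index)
{
    float quad[4];
    if (index < 0)
        index = 0;
    if (index >= t->length)
        index = t->length - 1;
    /* Block read (the movaps role): always inside the padded buffer. */
    memcpy(quad, t->data + (index & ~3), sizeof quad);
    /* Lane select (the shufps role). */
    return quad[index & 3];
}

static void table_free(padded_table *t)
{
    free(t->data);
    t->data = NULL;
    t->length = 0;
}
```

The copy costs time and memory once, but afterwards every block read is aligned and in bounds regardless of the original length.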

Re: Fast Stream Array Access

Postby KG_is_back » Sun Oct 19, 2014 9:47 pm

That should do the trick.

BTW here is the code I came up with:
Code:
streamin addr;
streamin max;
streamin index;
streamout out;

int zero = 0;
int temp = 0;
int temp2=0;
int I0=0;
int I1=1;
int I2=2;
int I3=3;   //also used as a mask that keeps only the lowest two bits, i.e. N%4
int In4=-4; //this is a binary mask that makes the last two bits zero,
           //which means it rounds down to the nearest multiple of 4
float array[4];
stage2;
mov eax,addr[0];
cmp eax,0;
jz bypass;

  cvtps2dq xmm0,index;
  maxps xmm0,zero;
  minps xmm0,max;
  movaps xmm1,xmm0;
  andps xmm0,In4;
  pslld xmm0,2;
  paddd xmm0,addr; //this is the address for the 16-byte aligned read
  movaps temp,xmm0;
  andps xmm1,I3; //this will be used to shuffle the right sample into output
  movaps temp2,xmm1;
  pslld xmm1,4;
  //read for channel1 and store into array
  mov eax,temp[0];
  movaps xmm2,[eax];
  movd eax,xmm1;
  movaps array[eax],xmm2;
 
  //extract values from array and shuffle each value into index[0]
  mov eax,0;
  movaps xmm0,array[eax]; //xmm0 may contain desired value in ch(0) - no shuffling needed
  movaps xmm4,I0;
  cmpps xmm4,temp2,0; //true if index%4==0
  andps xmm0,xmm4; //mask xmm0 (the original masked xmm1, which is overwritten below)
 
  add eax,16;
  movaps xmm1,array[eax]; //xmm1 may contain desired value in ch(1) - shuffle it to 0
  shufps xmm1,xmm1,1;
  movaps xmm4,I1;
  cmpps xmm4,temp2,0; //true if index%4==1
  andps xmm1,xmm4;
 
  add eax,16;
  movaps xmm2,array[eax]; //...
  shufps xmm2,xmm2,2;
  movaps xmm4,I2;
  cmpps xmm4,temp2,0; //true if index%4==2
  andps xmm2,xmm4;
 
  add eax,16;
  movaps xmm3,array[eax];
  shufps xmm3,xmm3,3;
  movaps xmm4,I3;
  cmpps xmm4,temp2,0; //true if index%4==3
  andps xmm3,xmm4;
 
  orps xmm0,xmm1;
  orps xmm0,xmm2;
  orps xmm0,xmm3;
  movaps out,xmm0;
 
bypass:


it does not work because of the movaps xmm0,[eax] but replacing that with array should fix it.

Re: Fast Stream Array Access

Postby Exo » Sun Oct 19, 2014 10:00 pm

KG_is_back wrote: it does not work because of the movaps xmm0,[eax] but replacing that with array should fix it.

Yes, movaps xmm0,[eax]; is the first thing I tried. Shame, really. Should it work?

I was going to ask you guys: are there any opcodes you really want or need? If you can give clear examples of the benefits of certain opcodes, I could get on to Malc to add them (I'm usually quite good at getting him to add little things if I give him a clear example and make it simple for him).

Maybe a topic for another thread?