whats faster for repacking mono4 stream?
quick asm question again,
for optimizing my schematics i do a lot of repacking mono4 streams, after using just 2 channels most of the time (stereo) i often pack 2 stereo signals (from 2 mono4 nodes) into one mono4, instead of using unpacking and packing again i normally always used this:
- Code: Select all
fld in1[0]; fstp out1n2[0];
fld in1[1]; fstp out1n2[1];
fld in2[0]; fstp out1n2[2];
fld in2[1]; fstp out1n2[3];
but i also could use this:
- Code: Select all
movaps xmm0,in1;
movaps xmm1,in2;
shufps xmm0,xmm1,68;
movaps out,xmm0;
which i think should be faster? am i right that the shufps is faster?
-
Nubeat7 - Posts: 1347
- Joined: Sat Apr 14, 2012 9:59 am
- Location: Vienna
Re: whats faster for repacking mono4 stream?
The shufps takes only one cycle on most CPUs. In the first example you read four times from memory and write four times to memory, while in example 2 you read twice and write once, so it's definitely faster, as far as I can tell.
Have a look at the Opcode reference I've made recently and also you can easily use Code Speed tester to inspect the actual CPU load.
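For anyone wondering where the immediate 68 in the shufps version comes from, here is a minimal sketch in plain Python (not FlowStone code) that emulates the instruction on 4-element lists. The helper name `shufps` and the lane labels are only illustrative assumptions; the lane selection rule itself is the standard SSE one: two bits per output lane, with the two low lanes taken from the destination and the two high lanes from the source.

```python
def shufps(dst, src, imm):
    """Emulate SSE `shufps dst, src, imm` on 4-element lists (lane 0 first)."""
    # The 8-bit immediate holds four 2-bit lane selectors, lowest bits first.
    sel = [(imm >> (2 * i)) & 3 for i in range(4)]
    # Result lanes 0 and 1 are picked from dst, lanes 2 and 3 from src.
    return [dst[sel[0]], dst[sel[1]], src[sel[2]], src[sel[3]]]

in1 = ["L1", "R1", "x", "x"]   # first stereo pair sits in lanes 0 and 1
in2 = ["L2", "R2", "x", "x"]   # second stereo pair sits in lanes 0 and 1

packed = shufps(in1, in2, 68)  # 68 == 0b01_00_01_00 -> selectors 0, 1, 0, 1
print(packed)                  # ['L1', 'R1', 'L2', 'R2']
```

So 68 keeps lanes 0 and 1 of the first register and appends lanes 0 and 1 of the second, which is the same layout the four fld/fstp pairs produce.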
- KG_is_back
- Posts: 1196
- Joined: Tue Oct 22, 2013 5:43 pm
- Location: Slovakia
Re: whats faster for repacking mono4 stream?
Yes, shufps is much faster. Also avoid using the stock Pack and Unpack modules as they essentially use fld and fstp. The worst example of "Verschlimmbesserung" (sorry about the German term; roughly, an "improvement" that makes things worse) is the stock Stereo Clipper, where the Pack/Unpack modules' overhead outweighs by far any potential CPU savings.
-
martinvicanek - Posts: 1328
- Joined: Sat Jun 22, 2013 8:28 pm
Re: whats faster for repacking mono4 stream?
thanks martin for the confirmation
but how to do it the other way around without fld / fstp
so if i have one mono4 input (2 x stereo) and i want to route them into 2 mono4 streams again
- Code: Select all
fld in[0]; fstp out1[0];
fld in[1]; fstp out1[1];
fld in[2]; fstp out2[0];
fld in[3]; fstp out2[1];
couldn't figure out a way with shufps?
-
Nubeat7 - Posts: 1347
- Joined: Sat Apr 14, 2012 9:59 am
- Location: Vienna
Re: whats faster for repacking mono4 stream?
Like this?
- Code: Select all
streamin pack;
streamout out0;
streamout out1;
int true=-1; // binary 11111111111111111111111111111111
float mask01=0;
stage0;
fld true[0]; fst mask01[0]; fstp mask01[1];
stage 2;
movaps xmm0,pack;
movaps xmm1,xmm0;
shufps xmm1,xmm1,78; // 0123 -> 2301 (23 are first)
andps xmm0,mask01;
movaps out0,xmm0;
andps xmm1,mask01;
movaps out1,xmm1;
Or, depending on what you do with the two outputs further on, you might even drop the masking:
- Code: Select all
streamin pack;
streamout out0;
streamout out1;
movaps xmm0,pack;
movaps out0,xmm0;
shufps xmm0,xmm0,78; // 0123 -> 2301 (23 are first)
movaps out1,xmm0;
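To see what the two variants above leave in each output lane, here is a small plain-Python emulation (again only a sketch, not FlowStone code). The helper names `shufps` and `andps_mask01` are hypothetical stand-ins for the instructions, and the sample values 1.0 to 4.0 are arbitrary.

```python
def shufps(dst, src, imm):
    """Emulate SSE `shufps dst, src, imm` on 4-element lists (lane 0 first)."""
    sel = [(imm >> (2 * i)) & 3 for i in range(4)]
    return [dst[sel[0]], dst[sel[1]], src[sel[2]], src[sel[3]]]

def andps_mask01(v):
    """Emulate andps with a mask that is all ones in lanes 0-1, zero in lanes 2-3."""
    return [v[0], v[1], 0.0, 0.0]

pack = [1.0, 2.0, 3.0, 4.0]           # lanes 0-1: first pair, lanes 2-3: second pair
swapped = shufps(pack, pack, 78)      # 78 == 0b01_00_11_10 -> lanes 2, 3, 0, 1

# Masked variant: the unused upper lanes are cleared.
masked_out0 = andps_mask01(pack)      # [1.0, 2.0, 0.0, 0.0]
masked_out1 = andps_mask01(swapped)   # [3.0, 4.0, 0.0, 0.0]

# Unmasked variant: the other pair is still present in lanes 2-3.
unmasked_out0 = pack                  # [1.0, 2.0, 3.0, 4.0]
unmasked_out1 = swapped               # [3.0, 4.0, 1.0, 2.0]
```

The unmasked version is only safe if whatever follows reads lanes 0 and 1 and ignores the rest, which is the "depending on what you do with the two outputs" condition.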
-
martinvicanek - Posts: 1328
- Joined: Sat Jun 22, 2013 8:28 pm