Cache miss - big CPU eater

Post by KG_is_back » Thu Nov 06, 2014 8:30 pm

In order to speed up RAM access, your processor has a so-called memory cache (also known as the CPU cache). The processor analyzes your code as it runs and tries to predict which variables will be used next, then loads (prefetches) them from RAM into the cache. Reading and writing variables in the cache is several hundred times faster than going directly to RAM. Fortunately, variables in FlowStone are automatically aligned in a way that maximizes cache efficiency.

The problem is with arrays. The cache only has room for a few thousand values (samples). When you use big arrays and wavetables in your schematic, the cache cannot hold them whole, so the processor tries to predict which parts of the array will be used and prefetches only those. When you then read a value that wasn't prefetched, there is a massive CPU penalty for fetching it from main RAM.
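
This isn't FlowStone code, but if you want to reproduce the effect outside the schematic, here is a minimal stand-alone C sketch; the array size, LCG constants and use of clock() are my own assumptions for illustration. It times one sequential pass and one randomly ordered pass over an array far larger than any cache:

[code]
/* Time a sequential pass vs. a randomly ordered pass over a 64 MB array. */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <time.h>

#define N (1u << 24)   /* 16M floats = 64 MB, far beyond any CPU cache */

int main(void)
{
    float    *table = malloc(sizeof(float) * N);
    uint32_t *order = malloc(sizeof(uint32_t) * N);
    if (!table || !order) return 1;

    for (uint32_t i = 0; i < N; i++) { table[i] = 0.0f; order[i] = i; }

    /* Fisher-Yates shuffle with a small LCG, so the second pass visits
       the same elements but in an unpredictable order. */
    uint32_t lcg = 1234u;
    for (uint32_t i = N - 1; i > 0; i--) {
        lcg = lcg * 1664525u + 1013904223u;
        uint32_t j = lcg % (i + 1);
        uint32_t t = order[i]; order[i] = order[j]; order[j] = t;
    }

    volatile float sum = 0.0f;   /* volatile stops the reads being optimized away */

    clock_t t0 = clock();
    for (uint32_t i = 0; i < N; i++) sum += table[i];          /* ramp: prefetch-friendly */
    clock_t t1 = clock();
    for (uint32_t i = 0; i < N; i++) sum += table[order[i]];   /* random: cache misses */
    clock_t t2 = clock();

    printf("sequential pass: %.3f s\n", (double)(t1 - t0) / CLOCKS_PER_SEC);
    printf("random pass:     %.3f s\n", (double)(t2 - t1) / CLOCKS_PER_SEC);

    free(table); free(order);
    return 0;
}
[/code]

On most machines the random pass takes noticeably longer, even though both loops touch exactly the same elements the same number of times.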

The attached schematic illustrates this. The code component has a part that reads from an array (the array in the example is empty, but your processor doesn't know that). You can switch between two different index calculations: the first is a regular ramp (a very predictable pattern), the second a random number generator (a very unpredictable pattern). With the ramp the processor can easily predict and prefetch the right memory segment; with random indexes it fails to do so. The result is that, on my machine for example, the random indexing takes TWICE as much CPU as the regular ramp.
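
As a rough C analogue of the schematic's two modes (the function name, table size and LCG constants below are illustrative assumptions, not taken from the attached .fsm), note that both index calculations cost about the same arithmetic per sample - only the memory access pattern differs:

[code]
#include <stdint.h>

#define TABLE_SIZE 1048576          /* 1M-sample wavetable (power of two) */

float table[TABLE_SIZE];

/* mode 0: ramp index  - walks the table in order, easy to prefetch.
   mode 1: random index - a simple LCG, jumps all over the table.     */
void process_block(float *out, int frames, int mode)
{
    static uint32_t ramp = 0;
    static uint32_t lcg  = 22222u;

    for (int n = 0; n < frames; n++) {
        uint32_t idx;
        if (mode == 0) {
            ramp = (ramp + 1) & (TABLE_SIZE - 1);  /* predictable ramp */
            idx  = ramp;
        } else {
            lcg  = lcg * 1664525u + 1013904223u;   /* unpredictable index */
            idx  = lcg & (TABLE_SIZE - 1);
        }
        out[n] = table[idx];   /* same read either way; only the pattern differs */
    }
}
[/code]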

Note that switching changes nothing in the code - nothing gets bypassed, and the code runs in exactly the same way; the only thing that changes is the pattern in which the index moves.
The schematic also uses the Code Speed Tester - have a look at its description to use it correctly.

It is certainly another thing to consider when optimizing your code.
Attachments
cacheMiss.fsm
(61.94 KiB) Downloaded 1382 times
