Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I have used mclapply quite a bit and love it. It is a memory hog but very convenient. Alas, now I have a different problem that is not simply embarrassingly parallel.
Can R (esp Unix R) employ multiple CPU cores on a single computer, sharing the same memory space, without resorting to copying full OS processes, so that
there is minimal process overhead; and
modifications of global data by one CPU are immediately available to other CPUs?
If yes, can R lock regions of memory the way flock locks files?
I suspect that the answer is no, and learning this definitively would be very useful. If the answer is yes, please point me in the right direction.
regards,
/iaw
The Rdsm package provides distributed shared memory parallelism, i.e. multiple R processes using the same memory space.
Besides that, you can employ a multi-threaded BLAS/LAPACK (e.g. OpenBLAS or Intel MKL), and you can use C/C++ (and probably Fortran) code together with OpenMP. See the question "Assembling a matrix from diagonal slices with mclapply or %dopar%, like Matrix::bandSparse" for an example.
Have you taken a look at Microsoft R Open (available for Linux), which comes with the Math Kernel Library (MKL)?
I've seen very good performance improvements without rewriting code.
https://mran.microsoft.com/documents/rro/multithread
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 1 year ago.
This is more of a hypothetical question, but it might have great consequences. A lot of us in the Modelica community deal with large-scale systems with long simulation times. This is usually not an obstacle for bug fixing and development, but speeding up simulation might allow for better and faster optimizations.
Recently I came across Modia, which claims to have superb numerical solvers and to achieve better simulation times than Dymola, a state-of-the-art Modelica compiler. The syntax seems to cover all the important bits. Recreating large-scale component models in Modia is infeasible, but what about automatically translating the flattened Modelica to Modia? Is that realistic? Would that provide a speed-up? Has anyone tried it before? I have searched for some
This might also improve the integration of Modelica models with postprocessing / identification tooling within one language, instead of using FMI or invoking a separate executable.
Thanks for any suggestions.
For those interested, we might as well start developing this.
We in the Modia team agree that the modeling know-how in Modelica libraries must be reused. So we are working on a translator from Modelica to Modia (brief details are given in https://ep.liu.se/ecp/157/060/ecp19157060.pdf). The plan is to initially provide translated versions of Modelica.Blocks, Modelica.Electrical.Analog and Modelica.Mechanics together with Modia.
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 5 years ago.
Workflow for statistical analysis and report writing
This question had a lot of good answers, but as pointed out, they are outdated.
I mostly work on scripts that will probably never be re-run after a paper has been published. Are packages worth the trouble in cases where I don't need to redistribute the code to the world for easy access? What about the organization of data? How can makefiles be used?
I think if you use the basics laid out by Josh Reichs in that post you provided, making sure that you create a directory to save everything in, then you are good to go.
My added step for the modern world would be to produce a markdown report in one of the available formats.
R Markdown - which you can run right out of RStudio
R Notebooks - which you can run right out of RStudio
Jupyter Notebooks - which you can run out of Anaconda or Jupyter with some easy tweaking.
The beauty of these three report systems is that you get to integrate the thought process, code, data, graphs and visualizations in a single spot.
So if, as you say, no one will ever re-run your code, readers can at least see it, which helps allay suspicion. And if they do choose to repeat your process, they can just follow your logic in a copy of the document (especially easy with the notebooks).
As for using packages, that is a more complex question. If the packages are well orchestrated and save you a ton of time cleaning, sorting and structuring data, USE THEM! Time is money. If the things you are using them for are simple, straightforward, just as easy to program yourself and recognizable by those who would review your paper, it probably does not matter either way.
The one place where I feel it matters is complex processes that are difficult (read: easy to get wrong yourself) and have been implemented, tested and vetted by prior researchers.
Using those packages garners credibility and makes it easier for peers to accept your methods at face value. But if you are on the cutting edge, you should feel free to slice away. Maybe make a package of your own!
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I am working on optimizing ADAS algorithms that are written in C++.
I want to optimize them using OpenCL.
I have gone through some basic OpenCL documentation.
I learned that the kernel code, which does the optimization, is written in C.
But I want to know how the kernel internally splits the work into different work-items.
How does a single statement do the work of a for loop?
Please share your knowledge of OpenCL with me.
Tr,
Ashwin
First of all, the C code is not doing the optimization. Parallelism is. Optimization with OpenCL only works on algorithms that can make heavy use of parallelism. If you use OpenCL like regular C, you are probably slowing your algorithm down, because it takes a lot of time to move data between host and device.
Secondly, the kernel is not splitting the work into different work-items. Instead, the programmer splits it by launching many instances of the same kernel code (work-items) that run in parallel. You set how many work-items to launch through the global_work_size argument of clEnqueueNDRangeKernel.
If you have a for loop whose iterations do not depend on each other, it could be a good part to optimize with OpenCL. It also helps if the loop does quite a lot of calculation but has little data going in and out. In that case you turn the body of the loop into an OpenCL kernel and launch it with a global_work_size equal to the loop's total iteration count.
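To make the loop-to-kernel mapping concrete, here is a minimal pure-Python sketch. It is a conceptual emulation, not real OpenCL: the squaring loop, the names `square_loop`, `square_kernel` and `launch_ndrange`, and the sample data are all made up for illustration. The `gid` argument plays the role of the value that get_global_id(0) returns inside a kernel, and the launcher calls the kernel body once per index, which a real device would do in parallel rather than one after another.

```python
# Conceptual emulation only: a real OpenCL device runs work-items in parallel.


def square_loop(data):
    """Original serial form: one for loop over all elements."""
    out = [0.0] * len(data)
    for i in range(len(data)):
        out[i] = data[i] * data[i]
    return out


def square_kernel(gid, data, out):
    """The loop body rewritten as a kernel: it handles exactly one index.

    `gid` stands in for get_global_id(0) in OpenCL C.
    """
    out[gid] = data[gid] * data[gid]


def launch_ndrange(kernel, global_work_size, *args):
    """Hypothetical stand-in for clEnqueueNDRangeKernel.

    Runs one kernel instance (work-item) per index 0..global_work_size-1.
    They run sequentially here; a device would run them concurrently.
    """
    for gid in range(global_work_size):
        kernel(gid, *args)


data = [1.0, 2.0, 3.0, 4.0]
out = [0.0] * len(data)

# global_work_size equals the iteration count of the original loop.
launch_ndrange(square_kernel, len(data), data, out)
```

The kernel contains no loop of its own: the "for" is replaced by the launch, which is why a single statement in the kernel can do the work of the whole loop.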
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
I am working on an analysis of big data, which is based on social network data combined with data on the social network users from other internal sources, such as a CRM database.
I realize there are a lot of good memory profiling, CPU benchmarking, and HPC packages and code snippets out there. I'm currently using the following:
system.time() to measure the current CPU usage of my functions
Rprof(tf <- "rprof.log", memory.profiling=TRUE) to profile memory usage
Rprofmem("Rprofmem.out", threshold = 10485760) to log objects that exceed 10MB
require(parallel) to give me multicore and parallel functionality for use in my functions
source('http://rbenchmark.googlecode.com/svn/trunk/benchmark.R') to benchmark CPU usage differences in single core and parallel modes
sort( sapply(ls(),function(x){format(object.size(get(x)), units = "Mb")})) to list object sizes
print(object.size(x=lapply(ls(), get)), units="Mb") to give me total memory used at the completion of my script
The tools above give me lots of good data points, and I know that many more tools exist to provide related information, to minimize memory use, and to make better use of HPC/cluster technologies, such as those mentioned in this StackOverflow post and in CRAN's HPC task view. However, I don't know a straightforward way to synthesize this information and forecast my CPU, RAM and/or storage requirements as my input data grows over time with increased usage of the social network I'm analyzing.
Can anyone give examples or make recommendations on how to do this? For instance, is it possible to make a chart or a regression model or something like that that shows how many CPU cores I will need as the size of my input data increases, holding constant CPU speed and amount of time the scripts should take to complete?
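One possible approach, sketched here under stated assumptions: time the script on several input sizes, fit a power law runtime ≈ a·n^b by least squares on the log-log values, and extrapolate to the expected future size. The numbers in `sizes` and `seconds` are made-up placeholders, not real benchmarks, and all function names are hypothetical. Python is used only to keep the sketch self-contained; the same fit could be done with lm() on log-transformed data in R. The core estimate assumes ideal linear scaling across cores, which real workloads rarely achieve, so it is best read as a lower bound.

```python
import math


def fit_power_law(sizes, seconds):
    """Least-squares fit of log(seconds) = log(a) + b * log(size)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in seconds]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # scaling exponent
    a = math.exp(mean_y - b * mean_x)  # constant factor
    return a, b


def predict_seconds(a, b, size):
    """Single-core runtime forecast for a given input size."""
    return a * size ** b


def cores_needed(a, b, size, time_budget_seconds):
    """Cores required to meet the time budget, ASSUMING ideal linear speedup."""
    return max(1, math.ceil(predict_seconds(a, b, size) / time_budget_seconds))


# Placeholder single-core timings (hypothetical, not measured).
sizes = [1_000, 10_000, 100_000, 1_000_000]  # input rows
seconds = [0.5, 5.0, 50.0, 500.0]            # elapsed time per run

a, b = fit_power_law(sizes, seconds)
forecast = predict_seconds(a, b, 10_000_000)
cores = cores_needed(a, b, 10_000_000, time_budget_seconds=600)
```

The fitted exponent `b` is the useful summary: a value near 1 means runtime grows linearly with input size, while a value well above 1 means adding cores will only postpone the problem. The same fit can be repeated with peak memory in place of elapsed time to forecast RAM.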
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I've just finished and tested the core of a Common Lisp application and now want to optimize it for speed. It runs on SBCL and makes use of CLOS.
Could someone outline the way to optimize my code for speed?
Where will I have to start? Will I just have to provide some global declaration or will I have to blow up my code with type information for each binding? Is there a way to find out which parts of my code could be compiled better with further type information?
The program makes heavy use of a single one-dimensional array (indices 0..119) in which it shifts CLOS instances around.
Thank you in advance!
It's not great to optimize in a vacuum, because there's no limit to the ugliness you can introduce to make things some fraction of a percent faster.
If it's not fast enough, it's helpful to define what success means so you know when to stop.
With that in mind, a good first pass is to run your project under the statistical profiler (sb-sprof) to get an idea of where the time is spent. If it's in generic arithmetic, it can help to judiciously use modular arithmetic in inner loops. If it's in CLOS stuff, it might help to switch to structures for key bits of data. Whatever is most wasteful will tell you where to spend your optimization effort.
I think it could be helpful if, after profiling, you post a followup question along the lines of "A lot of my program's time is spent in <foo>, how do I make it faster?"