I ran make after all the preceding steps had completed correctly, and it takes too long to build all those libraries and tools. Is it possible to shorten that time, for example by leaving out unnecessary libraries and tools? It looks like the make time goes to infinity...
Try running make with parallel processes:
make -jN
Where N is the number of parallel jobs you want, typically the number of CPU cores in your computer (e.g. make -j4 on a quad-core machine).
I've been learning to parallelize code in R using the parallel package, and specifically, the mclapply() function with 14 cores.
Something I noticed, just from a few runs of the code, is that repeated calls of mclapply() (with the same arguments and the same number of cores) take significantly different lengths of time. For example, running back to back on the same input, the first run took 18s, the next took 23s, and the next took 34s. So I waited a minute, ran the code again, and it was back down to 18s.
Is there some equivalent of "the computer needs a second to cool down" after running the code, which would mean that running separate calls of mclapply() back to back might take longer and longer amounts of time, but waiting for a minute or so and then running mclapply() again gets it back to normal?
I don't have much experience with parallelizing in R, but this is the only ad-hoc explanation I can think of. It would be very helpful to know if my reasoning checks out, and hear in more detail about why this might be happening. Thanks!
To clarify, my calls are like:
RNGkind("L'Ecuyer-CMRG")
set.seed(1)
x <- mclapply(training_data, simulation, testing_data, mc.cores=14, mc.set.seed = TRUE)
Running this twice in a row takes a lot longer the second time for me. Waiting for a minute and then running it again, it becomes fast again.
I haven't used mclapply, but I have used the parallel, foreach and pbapply packages. I think the inconsistency comes down to the small overheads involved in firing up workers and in communicating the progress of tasks running in parallel.
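For what it's worth, one way to take the repeated worker start-up out of the picture is to create the workers once and reuse them across calls. This is only a rough sketch, using a socket cluster via parallel::makeCluster() rather than the forked workers mclapply() uses, and it assumes training_data, simulation and testing_data are as in the question and that simulation() does not depend on other global objects (otherwise export them first with clusterExport()):
library(parallel)

# Start the 14 workers once and reuse them, instead of forking a
# fresh set on every call (which is what mclapply() does).
cl <- makeCluster(14)
clusterSetRNGStream(cl, iseed = 1)   # reproducible L'Ecuyer-CMRG streams

# Roughly equivalent to the mclapply() call in the question; extra
# arguments after the function are passed through to simulation().
x1 <- parLapply(cl, training_data, simulation, testing_data)
x2 <- parLapply(cl, training_data, simulation, testing_data)  # reuses the same workers

stopCluster(cl)
If the run times of repeated calls stabilise with a persistent cluster, that points at per-call start-up overhead rather than anything like the machine needing to "cool down".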
I am writing a piece of code that utilizes the GPU using OpenCL. I succeeded in making a kernel that runs Vector addition (in a function called VecAdd), so I know it is working. Suppose I want to make a second kernel for Vector subtraction VecSub. How should I go about that? Or more specifically: can I use the same context for both the VecAdd and VecSub function?
Hi @debruss, welcome to StackOverflow!
Yes, you certainly can run multiple Kernels in the same Context.
You can define the Kernels in the same or multiple Programs. You could even run them simultaneously in two different Command Queues or a single Command Queue configured for out of order execution.
There is an example (in rust) of defining and running two Kernels in a Program here: opencl2_kernel_test.rs.
When I run an R script that trains a model with machine-learning frameworks such as mxnet or tensorflow, I see in Task Manager that CPU usage reaches 100%.
I have a 2 x 2.7 GHz machine, and the PC becomes so slow that it practically locks up.
Is there a way to limit CPU usage from R, accepting a slower model training time in return?
MXNet looks at some environment variables:
https://mxnet.incubator.apache.org/faq/env_var.html
You could experiment by setting MXNET_GPU_WORKER_NTHREADS=2 at the command line, for example.
Note that you may have to restart R after you set the environment variables for this to take effect.
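Since this is CPU-only training, the CPU-side variables are probably the relevant ones here: MXNET_CPU_WORKER_NTHREADS, plus OMP_NUM_THREADS for the OpenMP thread pools the operators use. A minimal sketch of setting them from within R, before mxnet is loaded (the value 2 is just an example):
# Set the thread-count variables *before* loading mxnet, so its
# worker pools are created with the reduced size.
Sys.setenv(MXNET_CPU_WORKER_NTHREADS = "2",
           OMP_NUM_THREADS = "2")

library(mxnet)
# ... train the model as before; the compute-heavy operators should
# now use only a couple of cores instead of saturating the machine.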
0) As mentioned above, you could manipulate the environment variables that dictate how many workers you want.
1) You could adjust the context in your code to use only one of the CPUs.
e.g. z = nd.ones(shape=(3,3), ctx=mx.cpu(0))
2) You could resort to OS-level tools; for Windows there are a few options: https://superuser.com/questions/214566/are-there-solutions-that-can-limit-the-cpu-usage-of-a-process
I am writing a script that needs to run continuously, storing information in a MySQL database.
However, at some point in the day I would like to produce some summaries of the data being collected, but doing this in the same script would stop the data collection while the summaries run. Here's a sketch of the problem:
while (TRUE) {
  # get data and store it on the relational database
  # At some point of the day (or time interval) do some summaries
  if (time == certain_time) {
    source("analyze_data.R")
  }
}
The problem is that I would like the data collection not to stop, with the summaries being executed on another core of the computer.
I have seen references to the parallel and multicore packages, but my impression is that they are aimed at repetitive tasks applied over vectors or lists.
You can use parallel to fork a process, but you are right that the program will wait for all the forked processes to come back together before proceeding (that is essentially the use case parallel is designed for).
Why not run two separate R programs, one that collects the data and one that grabs it? Then, you simply run one continuously in the background and the other at set times. The problem then becomes one of getting the data out of the continuous data gathering program and into the summary program.
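As a rough sketch of that split (the table name raw_data, the connection details and the fake data row are all placeholders, and it assumes the DBI and RMySQL packages):
# collector.R -- runs continuously in the background
library(DBI)
con <- dbConnect(RMySQL::MySQL(), dbname = "mydb")   # placeholder connection details
while (TRUE) {
  new_row <- data.frame(ts = Sys.time(), value = rnorm(1))  # stand-in for real data acquisition
  dbWriteTable(con, "raw_data", new_row, append = TRUE, row.names = FALSE)
  Sys.sleep(1)
}

# summarize.R -- run at set times (e.g. by a scheduler such as cron)
library(DBI)
con <- dbConnect(RMySQL::MySQL(), dbname = "mydb")
summary_today <- dbGetQuery(con, "SELECT COUNT(*) AS n, AVG(value) AS mean_value
                                  FROM raw_data WHERE DATE(ts) = CURDATE()")
print(summary_today)
dbDisconnect(con)
Since the two programs only meet through the database, the collector never has to pause for the summary.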
Do the logic outside of R:
Write two scripts: one with the while loop that stores data, the other with the summary/check code. Run the while loop in one process and just leave it running.
Meanwhile, run your other (checking script) on demand to crunch the data. Or, put it in a cron job.
There are robust tools outside of R to handle this kind of thing; why do it inside R?
Among the choices I have for quickly parallelizing simple code (snowfall, foreach, and so on), what are my options for showing the progress of all the slave processes? Do any of the offerings excel in this regard?
I've seen that snowfall 1.70 has sfCat(), but it doesn't seem to cat output to the master R session.
That's where it can turn into a black art... I notice that you did not list MPI or PVM -- those old workhorses of parallel computing do have monitors. You may also find solutions by going outside of R and relying on job schedulers (slurm, torque, ...).
If you can't do that (and hey, there are reasons why we like the simplicity of snow, foreach, ...), then maybe you can alter your jobs to log a 'heartbeat' or progress message every N steps. You can log to text files (if you have an NFS or SMB/CIFS share), log to a database, or heck, send a tweet with R. It will most likely be specific to your app, and yes, it will have some cost.
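As a rough illustration of the heartbeat idea (the file name, step interval and toy task are arbitrary), each worker can append a line to a shared log file every N tasks, and you can watch that file from the master session or with tail -f:
library(parallel)

heartbeat_file <- "progress.log"     # must live somewhere all workers can write to

work_fun <- function(i) {
  Sys.sleep(0.1)                     # stand-in for the real per-task computation
  if (i %% 10 == 0) {                # log a heartbeat every 10 tasks
    cat(sprintf("%s pid=%d finished task %d\n", Sys.time(), Sys.getpid(), i),
        file = heartbeat_file, append = TRUE)
  }
  i^2
}

res <- mclapply(seq_len(200), work_fun, mc.cores = 4)
# while it runs: readLines("progress.log") from another R session, or tail -f in a shell
Concurrent appends from different workers can interleave, but for a coarse progress signal that is usually acceptable.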