OpenMP, random variables, and reproducibility [closed] - r

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
I'm writing R code that calls C++, and the C++ functions rely heavily on parallel computing with OpenMP. This is my first code using OpenMP, and I found that even when I set the same C++ random seed, the code never gives the same results.
I have read many posts here suggesting that this is an issue with OpenMP, but they are all old (between 5 and 12 years ago).
I want to know whether there are solutions now, and whether there are published articles that explain this problem and/or possible solutions.
Thanks

You need to read up on parallel random number generation. This is not an OpenMP problem, but one that will afflict any use of random numbers in a parallel code.
Start with
Parallel Random Numbers: As Easy as 1, 2, 3 - The Salmons
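One common way to get reproducible results, sketched here in Python rather than C++/OpenMP, is to give every task (loop iteration) its own generator seeded from the base seed and the task index, instead of sharing one generator across threads. BASE_SEED, simulate_task, and run are hypothetical names, and deriving seeds from a string like this is only an illustration; it does not give the statistical guarantees of the counter-based generators described in the paper.

```python
import random
from concurrent.futures import ThreadPoolExecutor

BASE_SEED = 12345  # hypothetical base seed chosen by the user


def simulate_task(task_index, n_draws=1000):
    # Each task owns a generator keyed by (seed, task index), not by thread,
    # so its draws do not depend on which worker runs it or in what order.
    rng = random.Random(f"{BASE_SEED}:{task_index}")
    return sum(rng.random() for _ in range(n_draws)) / n_draws


def run(n_tasks, n_workers):
    # pool.map returns results in task order, whatever the scheduling was
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(simulate_task, range(n_tasks)))


results_1 = run(8, 1)  # one worker
results_4 = run(8, 4)  # four workers, same seed
```

Because each result depends only on the seed and the task index, results_1 and results_4 are identical regardless of how the work is scheduled. The same idea should carry over to an OpenMP loop by constructing the generator inside the loop body from the seed and the loop index.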

Related

Which command saves data faster in R [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
Does anyone know which method of saving data is faster: fwrite from data.table or saveWorkbook from openxlsx?
Not quite an answer, but too long for a comment.
The easy comment is: Just try to benchmark your code with bench::mark
library(bench)
...
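mark(
  data.table::fwrite(data, tempfile(fileext = ".csv")),
  # saveWorkbook() expects a Workbook object, so write.xlsx() is used for a plain data.frame
  openxlsx::write.xlsx(data, tempfile(fileext = ".xlsx")),
  check = FALSE  # the two calls return different values, so skip the equality check
)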
The slightly longer comment is: Do you just want the fastest read/write? Then you might want to look into fst and/or qs.
I presented a lightning talk at our last R User Group where I benchmarked different read/write speeds, memory usage, file sizes, etc. You can find the slides here.
Hope that helps

Why is 'DO LOOP' missing in 328eForth? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
I’m trying to learn Forth directly on an embedded system, using Starting Forth by Leo Brodie as a text. The Forth version I’m using is 328eForth (a port of eForth to the ATmega328), which I’ve flashed onto an Arduino Uno.
It appears that the DO LOOP words are not implemented in 328eForth - which puts a kink in my learning with Brodie. But looking at the dictionary using “WORDS” shows that a series of looping words exist e.g. BEGIN UNTIL WHILE FOR NEXT AFT EXIT AGAIN REPEAT amongst others.
My questions are as follows:
Q1 Why was DO LOOP omitted from 328eForth?
Q2 Can DO LOOP be implemented in terms of other existing words? If so, how, and if not, why not? (I guess there must be a very good reason for the omission of DO LOOP...)
Q3 Can you give some commented examples of the 328eForth looping words?
Q1: A choice was made for a different loop construct.
Q2: The words FOR and NEXT perform a similar function: the loop just counts down to 0 and runs exactly the specified number of times, including zero times.
The ( n2 n1 -- ) DO ... LOOP always runs at least once, which requires additional (mental) bookkeeping. People have been complaining about that for as long as I can remember.
Q3: The 328eForth documentation ForthArduino_1.pdf contains some examples.
Edit: Added some exposé to Q2

How do I determine which parallel processing package for R to use? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 7 years ago.
I am exploring parallel programming in R and I have a good understanding of how the foreach function works, but I don't understand the differences between parallel, doParallel, doMC, doSNOW, snow, multicore, etc.
After doing a bunch of reading, it seems that these packages work differently depending on the operating system, and I see that some packages use the word multicore and others use cluster (I am not sure whether those are different), but beyond that it isn't clear what advantages or disadvantages each has.
I am working on Windows, and I want to calculate standard errors using replicate weights in parallel so I don't have to calculate each replicate one at a time (if I have n cores I should be able to do n replicates at once). I was able to implement it using doSNOW, but it looks like plyr and the R community in general use doMC, so I am wondering whether using doSNOW is a mistake.
Regards,
Carl
My understanding is that parallel is a conglomeration of snow and multicore, and is meant to incorporate the best parts of both.
For parallel computing on a single machine, I find parallel to have been very effective.
For parallel computing using a cluster of multiple machines, I've never succeeded in completing the cluster setup using parallel, but I have succeeded using snow.
I've never used any of the do* packages, so I'm afraid I'm unable to comment.

Variable selection in R package data.table : $ vs [,,] [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 years ago.
I'm currently using the R package data.table to process big datasets.
I'm wondering if there is a difference between the syntax
DT[,v]
and the syntax :
DT$v
if DT is my data.table object and v the variable I want to select.
I know that the dollar sign is usually used for data frames and that [,v] is always used in data.table examples. However, they both work and seem to take similar times to execute (in my experience with 5 million rows).
Do you know whether they are processed differently, and whether one is more efficient when processing even larger datasets?

Application for solving linear system of equations [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 7 years ago.
I need an application for solving linear systems of equations (N up to 10), so I collected several codes and compiled them. They seem to work, but I get lots of problems with precision: the solvers are very sensitive to small changes in the system.
So, could somebody recommend a reliable command-line application for this purpose? Or some useful open source code that is easy to compile?
Thanks
GNU Octave is essentially a free version of Matlab (the syntax is identical for basic operations), so you can try things out there and see how they compare to the answers that you're getting.
Having said that, if your answer is very sensitive to the input, it's possible that your problem is ill-conditioned - you can check this by computing the condition number of the matrix in Octave. It's hard to say what to do in that case without knowing more specifics on the problem.
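To illustrate what ill-conditioning looks like, here is a small Python sketch (not Octave; the 2x2 matrix and the helper names solve_2x2 and cond_inf_2x2 are made up for the example). It solves the system with Cramer's rule and computes the infinity-norm condition number from the explicit inverse, which is only sensible for a toy 2x2 case.

```python
def solve_2x2(A, b):
    """Solve a 2x2 system A x = b with Cramer's rule (toy example only)."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    return [(b[0] * a22 - a12 * b[1]) / det,
            (a11 * b[1] - a21 * b[0]) / det]


def cond_inf_2x2(A):
    """Infinity-norm condition number ||A|| * ||A^-1|| of a 2x2 matrix."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    norm_A = max(abs(a11) + abs(a12), abs(a21) + abs(a22))
    # The inverse is 1/det * [[a22, -a12], [-a21, a11]]
    norm_inv = max(abs(a22) + abs(a12), abs(a21) + abs(a11)) / abs(det)
    return norm_A * norm_inv


# Nearly singular matrix: the two rows are almost identical
A = [[1.0, 1.0], [1.0, 1.0001]]

x_original = solve_2x2(A, [2.0, 2.0001])   # close to [1, 1]
x_perturbed = solve_2x2(A, [2.0, 2.0002])  # close to [0, 2]
condition_number = cond_inf_2x2(A)         # roughly 4e4
```

A change of 0.0001 in one right-hand-side entry moves the solution from about (1, 1) to about (0, 2), and the large condition number is the warning sign for that kind of sensitivity.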
Also, you don't mention which method you're currently using. Gaussian elimination (i.e. "what you learned in math class") is notoriously numerically unstable if you don't use pivoting (see the Wikipedia entry for "Pivoting"); adding that might be enough to improve the quality of the results.
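As a rough illustration of why pivoting matters, here is a plain-Python sketch of Gaussian elimination with optional partial pivoting. The function name solve_gauss and the 2x2 test system are assumptions made for the example, not anything from the codes mentioned in the question.

```python
def solve_gauss(A, b, pivot=True):
    """Solve A x = b by Gaussian elimination, optionally with partial pivoting."""
    n = len(A)
    # Work on an augmented copy [A | b] so the inputs are left untouched
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        if pivot:
            # Partial pivoting: bring the largest remaining entry of this column up
            best = max(range(col, n), key=lambda r: abs(M[r][col]))
            M[col], M[best] = M[best], M[col]
        if M[col][col] == 0.0:
            raise ValueError("zero pivot: matrix is singular or needs pivoting")
        for row in range(col + 1, n):
            factor = M[row][col] / M[col][col]
            for k in range(col, n + 1):
                M[row][k] -= factor * M[col][k]
    # Back substitution on the upper-triangular system
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        s = sum(M[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (M[row][n] - s) / M[row][row]
    return x


# Tiny leading entry: the true solution is very close to x = [1, 1]
A = [[1e-20, 1.0], [1.0, 1.0]]
b = [1.0, 2.0]

x_no_pivot = solve_gauss(A, b, pivot=False)  # first component is lost to rounding
x_pivot = solve_gauss(A, b, pivot=True)      # recovers approximately [1, 1]
```

Without pivoting, dividing by the tiny leading entry creates huge intermediate values that swamp the rest of the data; swapping rows first avoids that.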
Another approach is to use the numpy package in Python. You can create a 2D matrix A and a 1D vector b, then solve Ax=b for x using solve(A, b). It's part of the linalg subpackage of numpy.
