Unable to replicate sample_n results using constant seed across R versions

I'm trying to replicate the sampling results from a script written in early January 2021. At the time, I forgot to record the R and dplyr versions I used to create the sample. I have since installed the newest versions of R (4.1.1) and dplyr (1.0.7), but I can't replicate my sampling results. I know that earlier R versions might use different RNGs, so I've tried my seed with RNGversion() set to every R version, but to no avail. This is not entirely surprising, because I recall having used at least R 3.6.0, after which there shouldn't have been any changes to the default RNG.
rm(list=ls())
library(dplyr)
RNGversion("3.5.0")
set.seed(182508)
Are there any other factors besides the R version that could affect my randomization results? For example, changes in the dplyr function sample_n? I know that sample_n has been superseded by slice_sample, but sample_n is still usable in the newest version of dplyr.
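For reference, here is a minimal base-R sketch of how the RNG settings alone can change what a fixed seed produces. The helper name draw_with is made up for illustration, and the sketch assumes it runs on R 3.6.0 or later; it says nothing about whether sample_n itself changed between dplyr releases.

```r
# Hypothetical helper: draw five values under the RNG defaults of a given R version.
draw_with <- function(rng_version, seed = 182508) {
  # Versions before 3.6.0 select the old "Rounding" sampler, which warns on newer R.
  suppressWarnings(RNGversion(rng_version))
  set.seed(seed)
  list(kind = RNGkind(), values = sample.int(1000, 5))
}

old_style <- draw_with("3.5.0")
new_style <- draw_with("3.6.0")

# Restore the defaults of the running R version.
suppressWarnings(RNGversion(as.character(getRversion())))

old_style$kind[3]  # "Rounding" (sample.kind before R 3.6.0)
new_style$kind[3]  # "Rejection" (sample.kind from R 3.6.0 onwards)

# Compare the two draws made with the same seed.
identical(old_style$values, new_style$values)
```

Saving the output of sessionInfo() next to the sampled data records the R version and the loaded package versions, so the same environment can be rebuilt later.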

Related

package ‘Zelig’ is not available for R version 3.6.3

I want to use the Zelig package to get the ATT (Average treatment effect on treated) and ATE (Average treatment effect) after performing PSM (propensity score matching) using Nearest-neighbor matching method.
After going through all the related issues I found that Zelig has no compatibility with R version 3.6.3.
I would like to ask if there is any possible way to run the above package on macOS Catalina 10.15.4 and R: version 3.6.3.
If not, is there any other way to get the ATT and ATE after performing propensity score matching with MatchIt/Match? I would really appreciate a way out.
Thanks in advance.
Zelig has been removed from CRAN. See
https://cran.r-project.org/web/packages/Zelig/index.html
On the archive page, the last version appears to have been 5.1.6. You can get it using
remotes::install_version("Zelig", "5.1.6")

tidyr / dplyr performance: CRAN R vs. MRAN Microsoft R Open

When I read about Microsoft R Open, I usually read that it is faster at matrix calculations than R from CRAN due to multicore support.
I understand that this can increase performance, e.g. when running regressions. Does it also significantly speed up calculations in tidyr or dplyr? The underlying question is, I guess, whether these packages rely on matrix calculations or not. More generally, do data.frames use matrix calculations under the hood? As far as I know, data.frames are a special kind of list...
Does anyone have an answer to this? Theoretically, and (ideally) with some benchmarks?
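The claim about data.frames being a kind of list can be checked directly in base R. This is only a small sketch of the data structure; it is not a benchmark and says nothing about how dplyr or tidyr are implemented internally.

```r
# A data.frame with columns of different types.
df <- data.frame(id = 1:3, label = c("a", "b", "c"), stringsAsFactors = FALSE)

is.list(df)    # TRUE: a data.frame is a list of equal-length columns
typeof(df)     # "list"
is.matrix(df)  # FALSE

# Each column keeps its own type, which a single matrix cannot do.
col_types <- vapply(df, typeof, character(1))
col_types
```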

Why is sample() different in SparkR to R?

I've some experience in R and am learning Spark 1.6.1 by first exploring the implementation of R in Spark.
I noticed that the syntax for the R sample() command is different in Spark:
base::R: sample(x, size, replace)
Spark R: sample(DataFrame, withReplacement, fraction)
base::sample(x, size, replace) still works, but is masked by the Spark R version.
Does anyone know why this is, when most commands are identical between the two?
Are there use cases that I should use one versus the other?
Has anyone found an authoritative list of differences between Spark R and base:: R?
Thanks!
If you have a SparkR DataFrame, you'll need to use the SparkR API for sampling. If you have an R data.frame, you'll need to use the base R sample() function. SparkR is not R, and the function calls are not identical.
The issue you are having is one of masking.
To address the second part of the question, for the benefit of others who follow, I found that the Spark documentation does in fact list the R functions that are masked:
R Function Name Conflicts
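Masking can be reproduced without Spark. In the sketch below, the locally defined sample is a made-up stand-in for a package function with a different signature; it is not the SparkR implementation. The base:: prefix still reaches the original function.

```r
# Hypothetical stand-in that masks base::sample, as an attached package might.
sample <- function(x, withReplacement, fraction) {
  stop("this version expects a Spark DataFrame")
}

# Calling with the namespace prefix bypasses the masking definition.
set.seed(1)
picked <- base::sample(1:10, size = 3, replace = FALSE)
picked

# Remove the stand-in so that plain sample() resolves to base again.
rm(sample)
```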

Determining the previous version of r that produced an older R data file

I had run some mixed models on an older, unknown version of R. These models converged. However, I recently updated R to 3.1.1, and now they don't converge.
I would like to revert to the previous (yet unknown) version of R and re-run the models, to check that they actually converged on that version.
The specific question I have is:
Is there a way to determine the previous version that produced the results that are saved in a given data file? If so, how?
Alternatively, I do know that the data file was produced on the last version that I had installed, so if I can somehow figure out which version that was that would suffice. I should note that it's unlikely that the last version that I installed corresponds to the last version offered.
Any help would be appreciated.
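One possible approach, assuming the file was written with saveRDS() and that the R installation used to inspect it is 3.5.0 or later: infoRDS() reads the serialization header, which records the R version that wrote the file. The sketch below writes a throwaway file to demonstrate the call; it does not cover .RData files produced by save().

```r
# Write a small object to a temporary .rds file (stand-in for the real data file).
path <- tempfile(fileext = ".rds")
saveRDS(data.frame(x = 1:3), path)

# Read the serialization header without loading the object.
info <- infoRDS(path)
info$writer_version  # version of R that wrote the file, as a string
info$version         # serialization format version

unlink(path)
```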

Bivariate Poisson Regression in R?

I found a package 'bivpois' for R which evaluates a model for two related poisson processes (for example, the number of goals by the home and the away team in a soccer game). However, this package seems to no longer be useable in newer versions of R.
Is there a reasonable way to modify the glm() function to do a similar process, or run this older package on my new version of R? I have found very little literature on these sorts of processes and have found very little in terms of easy implementation in other statistical packages like STATA.
Any suggestions would be much appreciated.
While CRAN does not host a current binary of bivpois, you can build the package from the archived source code (see http://cran.r-project.org/doc/manuals/R-exts.html#Checking-and-building-packages ). Building bivpois 0.50-3.1 from source (available at http://cran.r-project.org/src/contrib/Archive/bivpois/) works for me on R 2.15.0 Windows x64. The zipped Windows binary I built is available here: http://commondatastorage.googleapis.com/jthetzel-public/bivpois_0.50-3.1.zip .
Feel free to refer to "odds modelling and testing inefficiency of sports-bookmakers", where I modified the relevant functions inside the bivpois package.

Resources