Computing Spearman's rho for increasing subsets of rows in for Loop - r

I am trying to fit a for Loop in R in order to run correlations for multiple subsets in a data frame and then store the results in a vector.
What I have in this loop is a data frame with 2 columns, x and y, and 30 rows of different continuous measurement values in each column. The process should be repeated 100 times. The data can be invented.
What I need, is to compute the Spearman's rho for the first five rows (between x and y) and then for increasing subsets (e.g., the sixth first rows, the sevenths first rows etc.). Then, I'd need to store the rho results in a vector that I can further use.
What I had in mind (but does not work):
sortvector <- 1:(30)
for (i in 1:100)
{
sortvector <- sample(sortvector, replace = F)
xtemp <- x[sortvector]
rho <- cor.test(xtemp,y, method="spearman")$estimate
}
The problem is that the code gives me one value of rho for the whole dataframe, but I need it for increments of subsets.
How can I get rho for subsets of increasing values in a for-loop? And how can i store the coefficients in a vector that i can use afterwards?
Any help would be much appreciated, thanks.
Cheers

The easiest approach is to convertfor loop into sapply function, which returns a vector of rho's as a result of your bootstrapping:
sortvector <- 1:(30)
x <- rnorm(30)
y <- rnorm(30)
rho <- sapply(1:100, function(i) {
sortvector <- sample(sortvector, replace = F)
xtemp <- x[sortvector]
cor.test(xtemp, y, method = "spearman")$estimate
})
head(rho)
Output:
rho rho rho rho rho rho
0.014460512 -0.239599555 0.003337041 -0.126585095 0.007341491 0.264516129

Related

How to create correlation matrix after mice multiple imputation

I'm using the mice package to create multiple imputations. I want to create a correlations matrix (and a matrix of p-values for the correlation coefficients. I use miceadds::micombine.cor to do this. But this gives a dataframe with variables in the first to columns, and then a number of columns to contain r, p, t-values, and the like.
I'm looking for a way to turn this dataframe into a "good old" matrix with the correlation coefficient between x and y in position [x,y], and a matrix with p-values Does anyone have an easy way to do this?
Here's some code to reproduce:
data <- mtcars
mt.mis <- prodNA(mtcars, noNA = 0.1)
imputed <-mice(iris.mis, m = 5, maxit = 5, method = "pmm")
correlations<- miceadds::micombine.cor(mi.res=iris.mis, variables = c(1:3))
What I'm looking for is something like the output from cor(mtcars). Who can help?
I ended up writing my own function. Can probably be done much more efficiently, but this is what I made.
cormatrix <- function(r, N){
x <- 1
cormatrix <- matrix(nrow = N, ncol = N) # create empty matrix
for (i in 1:N) {
for (j in i:N) {
if(j>i){
cormatrix[i,j] <- r[x]
cormatrix[j,i] <- r[x]
x <- x + 1
}
}
}
diag(cormatrix) <- 1
cormatrix
}
You can call it with the output of micombine.cor and the number of variables in your model as arguments. So for example cormatrix(correlations$r,ncol(df)).

sampling random values each iteration

I have some simulated data, on top of the data I add some noise to see how the noise affects my data for further analyses. I created the following function
create.noise <- function(n, amount_needed, mean, sd){
set.seed(25)
values <- rnorm(n, mean, sd)
returned.values <- sample(values, size=amount_needed)
}
I call this function in the following loop:
dataframe.noises <- as.data.frame(noises) #i create here a dataframe dim 1x45 containing zeros
for(i in 1:100){
noises <- as.matrix(create.noise(100,45,0,1))
dataframe.noises[,i] <- noises
data_w_noise <- df.data_responses+noises
Estimators <- solve(transposed_schema %*% df.data_schema) %*% (transposed_schema %*% data_w_noise)
df.calculated_estimators[,i] <-Estimators
}
The code above always returns the same values, one solution I tried is sending i as parameter(which i think isn't correct) for each iteration I add i to the set.seed(25+i)
This gives me a unique value for each iteration, butas mentioned I don't think that this is the correct way to go with it.

Generating n new datasets by randomly sampling existing data, and then applying a function to new datasets

For a paper I'm writing I have subsetted a larger dataset into 3 groups, because I thought the strength of correlations between 2 variables in those groups would differ (they did). I want to see if subsetting my data into random groupings would also significantly affect the strength of correlations (i.e., whether what I'm seeing is just an effect of subsetting, or if those groupings are actually significant).
To this end, I am trying to generate n new data frames by randomly sampling 150 rows from an existing dataset, and then want to calculate correlation coefficients for two variables in those n new data frames, saving the correlation coefficient and significance in a new file.
But, HOW?
I can do it manually, e.g., with dplyr, something like
newdata <- sample_n(Random_sample_data, 150)
output <- cor.test(newdata$x, newdata$y, method="kendall")
I'd obviously like to not type this out 1000 or 100000 times, and have been trying things with loops and lapply (see below) but they've not worked (undoubtedly due to something really obvious that I'm missing!).
Here I have tried to assign each row to a different group, with 10 groups in total, and then to do correlations between x and y by those groups:
Random_sample_data<-select(Range_corrected, x, y)
cat <- sample(1:10, 1229, replace=TRUE)
Random_sample_cats<-cbind(Random_sample_data,cat)
correlation <- function(c) {
c <- cor.test(x,y, method="kendall")
return(c)
}
b<- daply(Random_sample_cats, .(cat), correlation)
Error message:
Error in cor.test(x, y, method = "kendall") :
object 'x' not found
Once you have the code for what you want to do once, you can put it in replicate to do it n times. Here's a reproducible example on built-in data
result = replicate(n = 10, expr = {
newdata <- sample_n(mtcars, 10)
output <- cor.test(newdata$wt, newdata$qsec, method="kendall")
})
replicate will save the result of the last line of what you did (output <- ...) for each replication. It will attempt to simplify the result, in this case cor.test returns a list of length 8, so replicate will simplify the results to a matrix with 8 rows and 10 columns (1 column per replication).
You may want to clean up the results a little bit so that, e.g., you only save the p-value. Here, we store only the p-value, so the result is a vector with one p-value per replication, not a matrix:
result = replicate(n = 10, expr = {
newdata <- sample_n(mtcars, 10)
cor.test(newdata$wt, newdata$qsec, method="kendall")$p.value
})

Saving output from for-loop to 3D array in R

I am working in R to save outputs from a 'for' loop in to a 3D matrix. I have been unable to adapt a similar example answered here for my purposes, so I'd like to share a different example.
I have a mostly-completed "for" loop that generates slopes and intercepts from a linear model for N iterations; with each iteration using a new set of y-values with a random t-distribution ('rt').
The desired resulting output is a 3D matrix with two slices, here named "out2". One slice is named "Intercept" and the other is "Slope." Each column in both of the sheets is a result from the model generated with different degrees of dreedom (dfs)
set.seed(14)
x <- sample(0:50, 15) # Generate x-values for simulation
true.a <- 1.5 # Intercept for linear relationship
true.m <- 5 # Slope for linear relationship
dfs <- c(1,2,3,4,6,8,10,15,20,25) # Degrees of freedom
N <- 1000 # Reps in for-loop
out2 <- array(NA, dim=c(N, length(dfs), 2))
dimnames(out2) <- list(NULL, dfs, c("Intercept", "Slope"))
for(j in 1:length(dfs)) {
df.tdist <- dfs[j]
for(i in 1:N) {
y <- true.a + true.m * x + 25*rt(15,df.tdist)
fit <- lm(y ~ x)
out2[ ] <- ?????????????
# The output array 'out2' will consist of two "slices", one with intercepts
and one with slopes. The length of each slice is 1000 rows, and the
width of each slice is 10 columns
}
}
Thanks greatly in advance for your feedback.

Computing linear regressions for every possible permutation of matrix columns

I have a (k x n) matrix. I have initially managed to linearly regress (using the lm function) column 1 with each and every other column and extracted only the coefficients.
fore.choose <- matrix(0, 1, NCOL(assets))
for(i in seq(1, NCOL(assets), 1))
{
abc <- lm(assets[,1]~assets[,i])$coefficients
fore.choose[1,i] <- abc[2:length(abc)]
}
The coefficients are placed in the fore.choose matrix.
What I now need to do is to linearly regress column 2 with each and every other column, and then column 3 and so on and so forth and extract only the coefficients.
The output will be a square matrix of OLS univariate coefficients. Kind of similar to a correlation matrix, but it is the beta coefficients I am interested in.
fore.choose <- matrix(0, 1, NCOL(assets))
will initially need to become
fore.choose <- matrix(0, NCOL(assets), NCOL(assets))
I'd just compute the coefficients directly from the correlation matrix, using beta = cor(x,y)*sd(x)/sd(y), like this:
# set up some sample data
set.seed(1)
d <- matrix(rnorm(50), ncol=5)
# get the coefficients
s <- apply(d, 2, sd)
cor(d)*outer(s, s, "/")
You could also use lsfit to get the coefficients of one term on all the others at once and then only have one loop to do:
sapply(1:ncol(d), function(i) {
coef(lsfit(d[,i], d))[2,]
})
I'm sure there must be a more elegant way than to nested loops.
fore.choose <- matrix(NA, NCOL(assets), NCOL(assets))
abc <- NULL
for(i in seq_len(ncol(assets))){ # loop over "dependant" columns
for(j in seq_len(ncol(assets))){ # loop over "independant" columns
abc <- lm(assets[,i]~assets[,j])$coefficients
fore.choose[i,j] <- abc[-1]
}
}

Resources