Collecting P-values using a Loop in R

I ran a bunch of regressions and am now trying to collect their p-values into a vector.
x=summary(reg2)$coefficients[4,4] #p value from the first regression, p-val is in row 4, col 4
for (i in 3:1000){
currentreg=summary(paste("reg",i,sep=""))
assign(x,c(x,currentreg$coefficients[4,4]))
}
I also tried eval(parse(currentreg)) and eval(parse(summary(paste("reg",i,sep="")))) with no luck. I always have this problem with telling R "Hey don't treat this as a string, treat it as a variable" and vice versa.

While it would be better to store the objects in a list and loop over that, you're asking for get:
currentreg <- summary(get(paste("reg", i, sep="")))
If you had a list of objects, e.g. models <- list(reg2, reg3, reg4, ...), you could loop over it with sapply to get the desired result (looping and collecting the results into a vector):
x <- sapply(models, function(z) { summary(z)$coefficients[4,4] })

You can use
sapply(mget(ls(pattern = "^reg\\d+$")), function(x) summary(x)$coefficients[4,4])
to create a vector with all p-values.
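As a rough, self-contained sketch of that approach: the three models below are hypothetical stand-ins for reg2, reg3, ... (fitted on the built-in mtcars data so that each has an intercept plus three predictors, which makes row 4, column 4 of the coefficient table a p-value). vapply is swapped in for sapply only to make the expected return type explicit.

```r
# Hypothetical stand-ins for the asker's reg2, reg3, reg4
reg2 <- lm(mpg ~ wt + hp + qsec, data = mtcars)
reg3 <- lm(mpg ~ wt + hp + drat, data = mtcars)
reg4 <- lm(mpg ~ wt + hp + disp, data = mtcars)

# Find every object named "reg<digits>" and fetch them all as a named list
model_names <- ls(pattern = "^reg\\d+$")
models <- mget(model_names)

# Pull the p-value in row 4, column 4 of each coefficient table
p_values <- vapply(models,
                   function(m) summary(m)$coefficients[4, 4],
                   numeric(1))
p_values
```

Note that ls() returns names in alphabetical order, so with many models reg10 would come before reg2; the names on the result vector tell you which value belongs to which model.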

Related

Assign dynamic variable names to a list where each variable contains results

Context: I'd like to save the results of a Likelihood ratio test for a multinomial logistic regression in several dynamic variables, but I'm not sure how I could do that. This is what I've been trying:
library(lmtest)
indels = c("C.T","A.G","G.A","G.C","T.C","C.A","G.T","A.C","C.G","A.del","TAT.del","TCTGGTTTT.del","TACATG.del","GATTTC.del")
my_list = list()
for (i in 1:length(indels)) {
assign(paste0("lrtest_results_",indels[i]), my_list[[i]]) = lrtest(multinom_model_completo, indels[i])
}
I was basically trying to save each variable (with the name lrtest_results_ + the dynamic part of the variable name which depends on the vector indels) in a list using the assign method and paste0, but it doesn't seem to be working. Any help is very welcome!

The best way is to lapply the test function to each element of the vector indels and assign the names after.
my_list <- lapply(indels, \(x) lrtest(multinom_model_completo, x))
names(my_list) <- paste0("lrtest_results_", indels)
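A minimal sketch of how the named list is then used, assuming a made-up function fake_test in place of lrtest(multinom_model_completo, x) so that the example runs without the fitted model:

```r
indels <- c("C.T", "A.G", "A.del")

# Hypothetical stand-in for lrtest(multinom_model_completo, x)
fake_test <- function(term) list(term = term, n_chars = nchar(term))

# One result per element of indels, then name the list elements
my_list <- lapply(indels, fake_test)
names(my_list) <- paste0("lrtest_results_", indels)

# Retrieve a single result by its name instead of by a separate variable
one_result <- my_list[["lrtest_results_A.del"]]
one_result$term
```

Keeping the results in one named list means they can still be looped over later, which separate variables created with assign() would not allow as easily.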

How to transfer multiple columns into numeric & find correlation coefficients

I have a dataset "res.sav" that I read in via haven. It contains 20 columns, called "Genes1_Acc4", "Genes2_Acc4" etc. I am trying to find a correlation coefficient between those and another column called "Condition". I want to separately list all coefficients.
I created two functions, cor.condition.cols and cor.func to do that. The first iterates through the filenames and works just fine. The second was supposed to give me my correlations which didn't work at all. I also created a new "cor.condition.Genes" which I would like to fill with the correlations, ideally as a matrix or dataframe.
I have tried to iterate through the columns with two functions. However, when I run it, I get the message "NAs introduced by conversion". This wouldn't be the end of the world (I also tried suppressWarning()). The bigger problem is that my function does not seem to convert the columns into the numeric type I need for cor(): I get the "y must be numeric" error when I try to run it. I tried putting several arguments with and without '' or "", without success.
When I run str(cor.condition.cols) I only get character strings, which makes me think that my function somehow messes up the as.numeric conversion. Any suggestions for how else I could iterate through these columns and convert them?
Thanks guys :)
cor.condition.cols <- lapply(1:20, function(x){paste0("res$Genes", x, "_Acc4")})
#save acc_4 columns as numeric columns and calculate correlations
res <- (as.numeric("cor.condition.cols"))
cor.func <- function(x){
cor(res$Condition, x, use="complete.obs", method="pearson")
}
cor.condition.Genes <- cor.func(cor.condition.cols)

You can do:
cor.condition.cols <- paste0("Genes", 1:20, "_Acc4")
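# convert each selected column to numeric; sapply keeps the result as a matrix
res2 <- sapply(res[cor.condition.cols], as.numeric)
cor.condition.Genes <- cor(res2, res$Condition, use="complete.obs", method="pearson")
Or, as a shorter variant if the columns are already numeric: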
cor.condition.cols <- paste0("Genes", 1:20, "_Acc4")
cor.condition.Genes <- cor(res[cor.condition.cols], res$Condition, use="complete.obs")
Here is an example with other data:
cor(iris[-(4:5)], iris[[4]])
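For columns that arrive as character strings, a small sketch with made-up data (the data frame below is invented for illustration and only mimics the column names from the question) could look like this:

```r
set.seed(1)
# Made-up data: two gene columns stored as character, one numeric Condition
res <- data.frame(
  Condition   = rnorm(30),
  Genes1_Acc4 = as.character(rnorm(30)),
  Genes2_Acc4 = as.character(rnorm(30)),
  stringsAsFactors = FALSE
)

cor.condition.cols <- paste0("Genes", 1:2, "_Acc4")

# Convert each selected column; sapply returns a numeric matrix
res_num <- sapply(res[cor.condition.cols], as.numeric)

# One correlation per gene column, returned as a one-column matrix
cor.condition.Genes <- cor(res_num, res$Condition, use = "complete.obs")
cor.condition.Genes
```

The row names of the result are the gene column names, so each coefficient is listed separately.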

Indexing variables in R

I am normally a maple user currently working with R, and I have a problem with correctly indexing variables.
Say I want to define 2 vectors, v1 and v2, and I want to call the nth element in v1. In maple this is easily done:
v[1]:=some vector,
and the nth element is then called by the command
v[1][n].
How can this be done in R? The actual problem is as follows:
I have a sequence M (say of length 10, indexed by k) of simulated negbin variables. For each of these simulated variables I want to construct a vector X of length M[k] with entries given by some formula. So I should end up with 10 different vectors, each of different length. My incorrect code looks like this
sims<-10
M<-rnegbin(sims, eks_2016_kasko*exp(-2.17173), 840.1746)
for(k in 1:sims){
x[k]<-rep(NA,M[k])
X[k]<-rep(NA,M[k])
for(i in 1:M[k]){x[k][i]<-runif(1,min=0,max=1)
if(x[k][i]>=0 & x[i]<=0.1056379){
X[k][i]<-rlnorm(1, 6.228244, 0.3565041)}
else{
X[k][i]<-rlnorm(1, 8.910837, 1.1890874)
}
}
}
The error appears to be that x[k] is not a valid name for a variable. Any way to make this work?
Thanks a lot :)

I've edited your R script slightly to get it working and make it reproducible. To do this I had to assume that eks_2016_kasko was an integer value of 10.
require(MASS)
sims<-10
# R is not zero-indexed, so add one to make every M[k] at least 1 (1:0 would count down)
M<-rnegbin(sims, 10*exp(-2.17173), 840.1746) + 1
# Create a list
x <- list()
X <- list()
for(k in 1:sims){
x[[k]]<-rep(NA,M[k])
X[[k]]<-rep(NA,M[k])
for(i in 1:M[k]){
x[[k]][i]<-runif(1,min=0,max=1)
if(x[[k]][i]>=0 & x[[k]][i]<=0.1056379){
X[[k]][i]<-rlnorm(1, 6.228244, 0.3565041)}
else{
X[[k]][i]<-rlnorm(1, 8.910837, 1.1890874)
}
} # end of loop over i
} # end of loop over k
This will work, and I think it is what you were trying to do, but it is not great R code. I strongly recommend using the lapply family instead of for loops, and learning data.table and parallelisation if you need things to scale. If you want to read more about indexing and subsetting in R, Hadley Wickham has a comprehensive breakdown.
Hope this helps!

Let me start with a few remarks and then show you how your problem can be solved in R.
In R, there is most of the time no need to use a for loop in order to assign several values to a vector. So, for example, to fill a vector of length 100 with uniformly distributed random variables, you do something like:
set.seed(1234)
x1 <- rep(NA, 100)
for (i in 1:100) {
x1[i] <- runif(1, 0, 1)
}
(set.seed() is used to set the random seed, such that you get the same result each time.) It is much simpler (and also much faster) to do this instead:
set.seed(1234) # reset the seed so both approaches draw the same numbers
x2 <- runif(100, 0, 1)
identical(x1, x2)
## [1] TRUE
As you see, results are identical.
The reason that x[k]<-rep(NA,M[k]) does not work is that x[k] is indeed not a valid variable name in R. [ is used for indexing, so x[k] refers to element k of a vector x. If x does not exist yet, you get an error; if it does, assigning a vector of length larger than 1 to a single element gives a warning and keeps only the first value. What you probably want is a list, which the code further down produces.
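A minimal sketch of the list equivalent of Maple's v[1][n], using made-up values:

```r
# A list can hold vectors of different lengths
x <- list()
x[[1]] <- c(0.2, 0.5, 0.9)
x[[2]] <- c(0.1, 0.4)

# [[k]] extracts the k-th vector, [n] then picks its n-th element
second_of_first <- x[[1]][2]

# lengths() reports the length of each stored vector
vector_lengths <- lengths(x)
```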
So here comes the code that I would use instead of what you proposed in your post. Note that I am not sure that I correctly understood what you intend to do, so I will also describe below what the code does. Let me know if this fits your intentions.
# define M
library(MASS)
eks_2016_kasko <- 486689.1
sims<-10
M<-rnegbin(sims, eks_2016_kasko*exp(-2.17173), 840.1746)
# define the function that calculates X for a single value from M
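calculate_X <- function(m) {
x <- runif(m, min=0,max=1)
# draws at or below the threshold use the first parameter set, as in the question
X <- ifelse(x <= 0.1056379, rlnorm(m, 6.228244, 0.3565041),
rlnorm(m, 8.910837, 1.1890874))
X # return the vector explicitly
}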
# apply that function to each element of M
X <- lapply(M, calculate_X)
As you can see, there are no loops in that solution. I'll start to explain at the end:
lapply is used to apply a function (calculate_X) to each element of a list or vector (here it is the vector M). It returns a list. So, you can get, e.g. the third of the vectors with X[[3]] (note that [[ is used to extract elements from a list). And the contents of X[[3]] will be the result of calculate_X(M[3]).
The function calculate_X() does the following: It creates a vector of m uniformly distributed random values (remember that m runs over the elements of M) and stores that in x. Then it creates a vector X that contains log normally distributed random variables. The parameters of the distribution depend on the value x.
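As a quick sanity check of this behaviour, here is a sketch that assumes Poisson counts from rpois in place of rnegbin (purely so that it runs without the MASS package) and confirms that each vector in the list has the length given by the matching element of M:

```r
set.seed(42)
# Assumption: rpois stands in for rnegbin to avoid the MASS dependency
M <- rpois(10, lambda = 5)

calculate_X <- function(m) {
  x <- runif(m)
  # Same threshold rule as in the question
  ifelse(x <= 0.1056379,
         rlnorm(m, 6.228244, 0.3565041),
         rlnorm(m, 8.910837, 1.1890874))
}

X <- lapply(M, calculate_X)

# Each list element has as many entries as the matching count in M
lengths_match <- identical(lengths(X), M)
third_vector <- X[[3]]
```

A count of zero simply yields an empty vector here, so no special handling is needed.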

Removing quotes in function output in R

I am trying to write a function in R for a simple time series regression (the result of this function is the input for more complicated ones). In the first part I define the variables and create some lags, which are named ar_i depending on the lag used.
However, in the second part I try to combine these lags in a matrix using cbind on the variables defined earlier. As you can see, the output is not the expected matrix but the names of the lags themselves. I tried to solve this with the noquote() and cat() functions, but these don't seem to work.
Do you have any suggestions? Thanks in advance!
PS: The code and the results are below.
trans <- dlpib
ar <- dlpib
linear <- 1:4
for (i in linear){
assign(paste("ar_",i,sep = ""), lag(ar,k=-i))
}
linear_dat <- cbind(paste("ar_",linear, collapse=',', sep = ""))
> linear_dat
[,1]
[1,] "ar_1,ar_2,ar_3,ar_4"

I think you could go about this more efficiently with lapply:
linear <- 1:4
linear_list <- lapply(linear, function(i) lag(ar, k=-i))
linear_dat <- do.call(cbind, linear_list)
colnames(linear_dat) <- paste0("ar_", linear)
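A self-contained sketch of the same steps, assuming a short made-up series in place of dlpib. stats::lag is spelled out because other packages (dplyr, for example) define their own lag:

```r
# Made-up series standing in for dlpib
ar <- ts(c(1.2, 0.8, 1.5, 1.1, 0.9, 1.3))
linear <- 1:2

# One lagged series per element of linear, kept in a list
linear_list <- lapply(linear, function(i) stats::lag(ar, k = -i))

# Bind the lagged series column-wise and name the columns
linear_dat <- do.call(cbind, linear_list)
colnames(linear_dat) <- paste0("ar_", linear)
linear_dat
```

Because the lagged series cover different time windows, cbind aligns them on time and pads the gaps with NA.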

printing objects from a double for loop in R

I have code for nested for loops here. The output I would like to receive is a matrix of the means of the columns of the matrix produced by the nested loop. So, the interior loop should run 1000 simulations of a randomized vector, and run a function each time. This works fine on its own, and spits the output into R. But I want to save the output from the nested loop to an object (a matrix of 1000 rows and 11 columns), and then print only the colMeans of that matrix, to be performed by the outer loop.
I think the problem lies in the step where I assign the results of the inner loop to the obj matrix. I have tried every variation on obj[i,],obj[i],obj[[i]], etc. with no success. R tells me that it is an object of only one dimension.
x=ACexp
obj=matrix(nrow=1000,ncol=11,byrow=T) #create an empty matrix to dump results into
for(i in 1:ncol(x)){ #nested for loops
a=rep(1,times=i) #repeat 1 for 1:# columns in x
b=rep(0,times=(ncol(x)-length(a))) #have the rest of the vector be 0
Inv=append(a,b) #append these two for the Inv vector
for (i in 1:1000){ #run this vector through the simulations
Inv2=sample(Inv,replace=FALSE) #randomize interactions
temp2=rbind(x,Inv2)
obj[i]<-property(temp2) #print results to obj matrix
}
print.table(colMeans(obj)) #get colMeans and print to excel file
}
Any ideas how this can be fixed?

You're repeatedly printing the column means to the screen as the matrix gets modified, but your comment says "print to excel file". I'm guessing you actually want to save your data to a file. Remove the print.table command altogether and, after your loops have completed, use write.table():
write.table(colMeans(obj), 'myNewMatrixFile.csv', quote = FALSE, sep = ',', row.names = FALSE)
(my preferred options... see ?write.table to select the ones you like)

Since your code isn't reproducible, we can't quite tell what you want. However, I guess that property is returning a single number that you want to place in the right row/column place of the obj matrix, which you would refer to as obj[row,col]. But you'll have trouble with that as is, because both your loops are using the same index i. Maybe something like this will work for you.
obj <- matrix(nrow=1000,ncol=11,byrow=T) #create an empty matrix to dump results into
for(i in 1:ncol(x)){ #nested for loops
Inv <- rep(c(1,0), times=c(i, ncol(x)-i)) #repeat 1 for 1:# columns in x, then 0's
for (j in 1:nrow(obj)){ #run this vector through the simulations
Inv2 <- sample(Inv,replace=FALSE) #randomize interactions
temp2 <- rbind(x,Inv2)
obj[j,i] <- property(temp2) #save results in obj matrix
}
}
write.csv(colMeans(obj), 'myFile.csv') #get colMeans and print to csv file
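Since the original code is not reproducible, here is a scaled-down sketch of the same pattern. The property function and the sizes n_sims and n_cols are invented for illustration; the point is only the use of separate indices i and j and the obj[j, i] assignment:

```r
set.seed(1)
n_sims <- 100   # hypothetical number of simulations (rows)
n_cols <- 4     # hypothetical number of columns

# Hypothetical stand-in for property(): returns a single number
property <- function(v) sum(v * runif(length(v)))

obj <- matrix(NA_real_, nrow = n_sims, ncol = n_cols)
for (i in seq_len(n_cols)) {
  # i ones followed by zeros
  Inv <- rep(c(1, 0), times = c(i, n_cols - i))
  for (j in seq_len(n_sims)) {
    # row j holds simulation j, column i holds setting i
    obj[j, i] <- property(sample(Inv))
  }
}

col_means <- colMeans(obj)
col_means
```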
