Sum a list of variables in R - r

I am trying to sum up a list of variables.
qdisgust <- c(2,3,5,8,12,17,18,22,23,25,28,29,31,33)
vqdisgust <- list()
n <- length(qdisgust)
lhs <- paste("mydata$Disgust_", qdisgust, sep="")
eq <- paste("vqdisgust <- c(lhs)")
eval(parse(text=eq))
I sucessfully get all the variables in the list, but then am not able to get the sum of them. I assume there would be a way even simpler to do this.
Thank you for your help!

If we are looking to get the sum of multiple columns in 'mydata', we can use colSums
colSums(mydata)
If qdisgust is the index of columns, either
colSums(mydata[paste0("Disgust_", qdisgust)])
or
colSums(mydata[grep("Disgust_", names(mydata))])

Related

Creating for loops in R using subset data

I recently started programming in R, and am trying to compute slopes for a data set. This is my code:
slopes<- vector()
gdd.values <- length(unique(data.gdd$GDD))
for (i in 1:gdd.values){
subset.data <- data.gdd[which(data.gdd$GDD==i),]
volume <- apply(subset.data[,4,6],1,prod)
species.richness <- apply(subset.data[,7:59],1,sum)
slopes[i] <- lm(log(species.richness) ~ log(volume))$coefficients[2]
}
When I run it the "slopes" value remains empty. All other values are fine (no other empty sets). Let me know if you find any obvious mistakes. Thanks
Currently, you are iterating across the length of unique values and not unique values themselves. So, as #RobJensen comments, adjust the for loop vector and iteration. Hence, why some or all returned values result in missing as subset.data may contain no rows due to imprecise filter.
However, consider a more streamlined approach using the often underused and overlooked by() to subset dataset by needed grouping factor(s) and bind returned list into a vector:
coeff_list <- by(data.gdd, data.gdd$GDD, FUN=function(df) {
volume <- apply(df[,4,6],1,prod)
species.richness <- apply(df[,7:59],1,sum)
lm(log(species.richness) ~ log(volume))$coefficients[2]
})
slopes <- do.call(c, coeff_list)

Saving rows into variables in R

I have a 18-by-48 matrix.
Is there a way to save each of the 18 rows automatically in a separate variable (e.g., from r1 to r18) ?
I'd definitely advise against splitting a data.frame or matrix into its constituent rows. If i absolutely had to split the rows up, I'd put them in a list then operate from there.
If you desperately had to split it up, you could do something like this:
toy <- matrix(1:(18*48),18,48)
variables <- list()
for(i in 1:nrow(toy)){
variables[[paste0("variable", i)]] <- toy[i,]
}
list2env(variables, envir = .GlobalEnv)
I'd be inclined to stop after the for loop and avoid the list2env. But I think this should give you your result.
I believe you can select a row r from your dataframe d by indexing without a column specified:
var <- d[r,]
Thus you can extract all of the rows into a variable by using
var <- d[1:length(d),]
Where var[1] is the first row, var[2] the second. Etc.. not sure if this is exactly what you are looking for. Why would you want 18 different variables for each row?
result <- data.frame(t(mat))
colnames(result) <- paste("r", 1:18, sep="")
attach(result)
your matrix is mat

lapply function with 2 count variables

I am very green in R, so there is probably a very easy solution to this:
I want to calculate the average correlation between the column vectors in a square matrix:
x<-matrix(rnorm(10000),ncol=100)
aux<-matrix(seq(1,10000))
loop<-sapply(aux,function(i,j) cov(x[,i],x[,j])
cor_x<-mean(loop)
When evaluating the sapply line I get the error 'subscript out of bounds'. I know I can do this via a script but is there any way to achieve this in one line of code?
No need for any loops. Just use mean(cov(x)), which does this very efficiently.
The problem is due to aux. The variable auxhas to range from 1 to 100 since you have 100 columns. But your aux is a sequence along the rows of x and hence ranges from 1 to 10000. It will work with the following code:
aux <- seq(1, 100)
loop <- sapply(aux, function(i, j) cov(x[, i], x[, j]))
Afterwards, you can calculate mean covariance with:
cor_x <- mean(loop)
If you want to exclude duplicate fields (e.g., cov(X,Y) is inherently identical to cov(Y,X)), you can use:
cor_x <- mean(loop[upper.tri(loop, diag = TRUE)])
If you also want to exclude cov(X,X), i.e., variance, you can use:
cor_x <- mean(loop[upper.tri(loop)])

R programming - Adding extra column to existing matrix

I am a beginner to R programming and am trying to add one extra column to a matrix having 50 columns. This new column would be the avg of first 10 values in that row.
randomMatrix <- generateMatrix(1,5000,100,50)
randomMatrix51 <- matrix(nrow=100, ncol=1)
for(ctr in 1:ncol(randomMatrix)){
randomMatrix51.mat[1,ctr] <- sum(randomMatrix [ctr, 1:10])/10
}
This gives the below error
Error in randomMatrix51.mat[1, ctr] <- sum(randomMatrix[ctr, 1:10])/10 :incorrect
number of subscripts on matrix
I tried this
cbind(randomMatrix,sum(randomMatrix [ctr, 1:10])/10)
But it only works for one row, if I use this cbind in the loop all the old values are over written.
How do I add the average of first 10 values in the new column. Is there a better way to do this other than looping over rows ?
Bam!
a <- matrix(1:5000, nrow=100)
a <- cbind(a,apply(a[,1:10],1,mean))
On big datasets it is however faster (and arguably simpler) to use:
cbind(a, rowMeans(a[,1:10]) )
Methinks you are over thinking this.
a <- matrix(1:5000, nrow=100)
a <- transform(a, first10ave = colMeans(a[1:10,]))

R: t tests on rows of 2 dataframes

I have two dataframes and I would like to do independent 2-group t-tests on the rows (i.e. t.test(y1, y2) where y1 is a row in dataframe1 and y2 is matching row in dataframe2)
whats best way of accomplishing this?
EDIT:
I just found the format: dataframe1[i,] dataframe2[i,]. This will work in a loop. Is that the best solution?
The approach you outlined is reasonable, just make sure to preallocate your storage vector. I'd double check that you really want to compare the rows instead of the columns. Most datasets I work with have each row as a unit of observation and the columns represent separate responses/columns of interest Regardless, it's your data - so if that's what you need to do, here's an approach:
#Fake data
df1 <- data.frame(matrix(runif(100),10))
df2 <- data.frame(matrix(runif(100),10))
#Preallocate results
testresults <- vector("list", nrow(df1))
#For loop
for (j in seq(nrow(df1))){
testresults[[j]] <- t.test(df1[j,], df2[j,])
}
You now have a list that is as long as you have rows in df1. I would then recommend using lapply and sapply to easily extract things out of the list object.
It would make more sense to have your data stored as columns.
You can transpose a data.frame by
df1_t <- as.data.frame(t(df1))
df2_t <- as.data.frame(t(df2))
Then you can use mapply to cycle through the two data.frames a column at a time
t.test_results <- mapply(t.test, x= df1_t, y = df2_t, SIMPLIFY = F)
Or you could use Map which is a simple wrapper for mapply with SIMPLIFY = F (Thus saving key strokes!)
t.test_results <- Map(t.test, x = df1_t, y = df2_t)

Resources