How can I combine datasets in R?

I think my question is very simple.
dat1<-seq(1:100)
dat2<-seq(1:100)
How can I combine dat1 and dat2 so that the result looks like
dat3<-seq(1:200)
Thanks so much!

How do you want to combine dat1 and dat2? By rows or by columns? I'd take a look at the help pages for rbind() (row bind), cbind() (column bind), or c(), which combines its arguments to form a vector.

Let me start with a comment.
To create a sequence of numbers, one can use the following syntax:
x <- seq(from=, to=, by=)
A shorthand for, e.g., x <- seq(from=1, to=10, by=1) is simply 1:10. So your seq(1:100) is redundant: 1:100 alone gives the same vector.
On the other hand, you can combine two or more vectors using the c() function. Let us say, for example, that a <- c(1, 2) and b <- c(3, 4). Then c <- c(a, b) is the vector (1, 2, 3, 4).
There exist similar functions to combine data sets: rbind() and cbind().
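As a minimal sketch of the three functions mentioned above, applied to the vectors from the question (the names rows, cols, and consecutive are only illustrative): note that c(dat1, dat2) yields 1 to 100 twice, so if the numbers 1 to 200 are really what is wanted, the second vector has to be shifted first.

```r
dat1 <- 1:100
dat2 <- 1:100

# c() concatenates the two vectors into one vector of length 200
dat3 <- c(dat1, dat2)

# rbind() stacks the vectors as rows, cbind() places them side by side as columns
rows <- rbind(dat1, dat2)   # 2 x 100 matrix
cols <- cbind(dat1, dat2)   # 100 x 2 matrix

# Assumption: if the goal is literally 1, 2, ..., 200, shift dat2 before combining
consecutive <- c(dat1, dat2 + 100L)
```

Which of the three to use depends on whether the result should be one long vector or a two-dimensional object.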

Related

Compute 15 rows in parallel (through vectorization) and create df with them

I am creating 15 rows in a dataframe, like this. I cannot show my real code, but the create row function involves complex calculations that can be put in a function. Any ideas on how I can do this using lapply, apply, etc. to create all 15 in parallel and then concatenate all the rows into a dataframe? I think using lapply will work (i.e. put all rows in a list, then unlist and concatenate, but not exactly sure how to do it).
for( i in 1:15 ) {
row <- create_row()
# row is essentially a dataframe with 1 row
rbind(my_df,row)
}
Something like this should work for you,
create_row <- function(){
rnorm(10, 0,1)
}
my_list <- vector(100, mode = "list")
my_list_2 <- lapply(my_list, function(x) create_row())
data.frame(t(sapply(my_list_2,c)))
The create_row function is only there to make the example reproducible. We predefine an empty list, fill it with the results of create_row(), and then convert the resulting list to a data frame.
Alternatively, predefine a data frame and use apply() over the row margin, then use t() (transpose) to get the output in the right orientation:
df <- data.frame(matrix(ncol = 10, nrow = 100))
t(apply(df, 1, function(x) create_row())) # create_row() takes no arguments
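If create_row() really returns a one-row data frame, as the question describes, a common pattern is to collect the rows in a list and bind them once at the end. The sketch below assumes a hypothetical create_row(i) with two made-up columns; the real function would replace it. Note that lapply() is a loop in disguise rather than true parallel execution.

```r
# Hypothetical stand-in for the real create_row(): returns a data frame with 1 row
create_row <- function(i) {
  data.frame(id = i, value = i^2)
}

# Build all 15 rows as a list of one-row data frames
rows <- lapply(1:15, create_row)

# Bind them in a single call instead of growing my_df inside a loop
my_df <- do.call(rbind, rows)
```

Binding once avoids repeatedly copying a growing data frame, which is what rbind() inside a for loop ends up doing.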

How to get the combinations of multiple vectors in R

I have a data frame x. I want to get the pairwise combinations of all rows, like (x[1,], x[2,]), (x[1,], x[3,]), (x[2,], x[3,]). Here I treat each row as a single unit. I tried functions like combn, but it gave me the combinations of all elements in all rows.
I think with combn you are on the right track:
x <- data.frame(a=sample(letters, 10), b=1:10, c=runif(10), stringsAsFactors=FALSE)
ans <- combn(nrow(x), 2, FUN=function(sub) x[sub,], simplify=FALSE)
Now ans is a list of (in this case 45, in general choose(nrow(x), 2)) data.frames with two rows each.
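To see what combn() does with the row indices, here is a small sketch on a four-row data frame (the data and the names idx and pairs are made up for illustration). Each column of idx holds one pair of row numbers, and those numbers are then used to subset x.

```r
x <- data.frame(a = letters[1:4], b = 1:4, stringsAsFactors = FALSE)

# Each column of idx is one pair of row indices: (1,2), (1,3), (1,4), (2,3), ...
idx <- combn(nrow(x), 2)

# Turn every index pair into a two-row data frame
pairs <- lapply(seq_len(ncol(idx)), function(k) x[idx[, k], ])
```

This is equivalent to passing FUN to combn() directly; splitting it into two steps just makes the index pairs visible.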
The crossing() function from the tidyr package may help you. (The link contains a StackOverflow example.)

R apply and for loop calculation

Would someone be able to explain why this apply doesn't work correctly? I wanted to normalise all the values in each row by the sum of the values in that row, so that each row sums to 1. However, when I did this using apply(), the answer is incorrect.
data <- data.frame(Sample=c("A","B","C"),val1=c(1235,34567,234346),val2=c(3445,23446,234235),val3=c(457643,234567,754234))
norm <- function(x){
x/sum(x)}
applymeth <- data
applymeth[,2:4] <- apply(applymeth[,2:4], 1, norm)
rowSums(applymeth[,2:4])
loopmeth <- data
for(i in 1:nrow(data)){
loopmeth[i,2:4] <- norm(loopmeth[i,2:4])
}
rowSums(loopmeth[,2:4])
Thanks.
apply() stores each function result as a column of the output matrix. With MARGIN = 1 each normalised row therefore comes back as a column, so you have to transpose the result:
applymeth <- data
applymeth[,2:4] <- t(apply(applymeth[,2:4], 1, norm))
rowSums(applymeth[,2:4])
Have a look at
apply(matrix(1:12, 3), 1, norm)
The reason for this result of apply() is a convention:
in a matrix or a multidimensional array the index of the first dimension varies fastest, then the second, and so on. Example:
array(1:12, dim=c(2,2,3))
So (without any reorganisation of the data) apply() produces one column after the other. This behaviour does not depend on the MARGIN= argument of apply().
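The orientation is easy to check on a small matrix. The sketch below reuses norm() from the question on a 3 x 4 matrix; the division by rowSums() at the end is an alternative not mentioned in the thread, shown only for comparison.

```r
norm <- function(x) x / sum(x)
m <- matrix(1:12, nrow = 3)   # 3 rows, 4 columns

res <- apply(m, 1, norm)      # one column per input row, so the result is 4 x 3
fixed <- t(res)               # transpose back to 3 x 4; each row now sums to 1

# Alternative: dividing by rowSums() recycles down the columns, row i by rowSums(m)[i]
alt <- m / rowSums(m)
```

For a plain row normalisation the rowSums() form avoids apply() altogether.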

Operating on multiple matrices in a for loop using R

I have 1000 matrices named A1, A2, A3,...A1000.
In a for loop I would like to simply take the colMeans() of each matrix:
for (i in 1:1000){
means[i,]<-colMeans(A1)
}
I would like to do this for each matrix Ax. Is there a way to put Ai instead of A1 in the for loop?
So, one way is:
for (i in 1:1000){
means[i,]<-colMeans(get(paste('A', i, sep = '')))
}
but I think that misses the point of some of the comments, i.e., you probably had to do something like this:
csvs = lapply(list.files('.', pattern = '^A.*\\.csv$'), function(fname) { # pattern is a regular expression, not a glob
read.csv(fname)
})
Then the answer to your question is:
means = lapply(csvs, colMeans)
I don't completely understand, but maybe you have assigned each matrix to a different variable name? That is not the best structure, but you can recover from it:
# Simulate the awful data structure.
matrix.names<-paste0('A',1:1000)
for (name in matrix.names) assign(name,matrix(rnorm(9),ncol=3))
# Pull it into an appropriate list
list.of.matrices<-lapply(matrix.names,get)
# Calculate the column means
column.mean.by.matrix<-sapply(list.of.matrices,colMeans)
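A related sketch, assuming three small example matrices instead of the 1000 real ones: mget() fetches several objects by name in one call and returns a named list, and transposing the sapply() result gives one row of column means per matrix, which matches the means[i,] layout in the question.

```r
A1 <- matrix(1:4, nrow = 2)
A2 <- matrix(5:8, nrow = 2)
A3 <- matrix(9:12, nrow = 2)

# mget() looks up all the names at once and keeps them as list names
mats <- mget(paste0("A", 1:3))

# sapply() returns one column per matrix; t() turns that into one row per matrix
means <- t(sapply(mats, colMeans))
```

Here means["A1", ] holds the column means of A1, and so on for the other matrices.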
Your initial question asks for a 'for loop' solution. However, there is an easy way to get the desired result if we use an 'apply' function.
Perhaps putting the matrices into a list, and then applying a function, would prove worthwhile.
### Create matrices
A1 <- matrix(1:4, nrow = 2, ncol = 2)
A2 <- matrix(5:8, nrow = 2, ncol = 2)
A3 <- matrix(11:14, nrow = 2, ncol = 2)
### Create a vector of names (avoid calling it 'names', which masks the base function)
mat_names <- paste0('A', 1:3)
### Create a list of matrices, and assign names
mat_list <- lapply(mat_names, get)
names(mat_list) <- mat_names
### Apply the function 'colMeans' to every matrix in our list
sapply(mat_list, colMeans)
I hope this was useful!
As others wrote already, using a list is perhaps your best option. First you'll need to place your 1000 matrices in a list, most easily accomplished using a for-loop (see several posts above). Your next step is more important: using another for-loop to calculate the summary statistics (colMeans).
To loop over an R object with a for-loop, you generally have two options:
Loop over indices, for example:
for(i in 1:10){head(mat[i])} #simplistic example
Loop "directly"
for(i in mat){print(i)} #simplistic example
In the case of looping through R lists, the FIRST option will be much easier to set up. Here is the idea adapted to your example:
column_means <- vector("list", length(list_of_matrices)) #empty list to store column means
for (i in seq_along(list_of_matrices)){
mat <- list_of_matrices[[i]] #temporarily store individual matrices
##be sure also to use double brackets!
column_means[[i]] <- colMeans(mat) #store the result in slot i instead of appending
}

R: t tests on rows of 2 dataframes

I have two dataframes and I would like to do independent 2-group t-tests on the rows (i.e. t.test(y1, y2) where y1 is a row in dataframe1 and y2 is matching row in dataframe2)
What's the best way of accomplishing this?
EDIT:
I just found the format: dataframe1[i,] dataframe2[i,]. This will work in a loop. Is that the best solution?
The approach you outlined is reasonable, just make sure to preallocate your storage vector. I'd double check that you really want to compare the rows instead of the columns. Most datasets I work with have each row as a unit of observation, and the columns represent separate responses or variables of interest. Regardless, it's your data, so if that's what you need to do, here's an approach:
#Fake data
df1 <- data.frame(matrix(runif(100),10))
df2 <- data.frame(matrix(runif(100),10))
#Preallocate results
testresults <- vector("list", nrow(df1))
#For loop
for (j in seq(nrow(df1))){
testresults[[j]] <- t.test(df1[j,], df2[j,])
}
You now have a list that is as long as you have rows in df1. I would then recommend using lapply and sapply to easily extract things out of the list object.
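As a sketch of that extraction step: each element of the list is an "htest" object, so its components can be pulled out by name. The example rebuilds the fake data with a seed so it is reproducible; unlist() is used here so that each row is passed to t.test() as a plain numeric vector, which is a choice made for this sketch rather than part of the answer above.

```r
set.seed(1)
df1 <- data.frame(matrix(runif(100), 10))
df2 <- data.frame(matrix(runif(100), 10))

testresults <- vector("list", nrow(df1))
for (j in seq_len(nrow(df1))) {
  # unlist() turns the one-row data frame into a numeric vector
  testresults[[j]] <- t.test(unlist(df1[j, ]), unlist(df2[j, ]))
}

# Pull one component out of every result
p_values <- sapply(testresults, function(res) res$p.value)
t_stats  <- sapply(testresults, function(res) unname(res$statistic))
```

Other components, such as conf.int or estimate, can be extracted the same way.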
It would make more sense to have your data stored as columns.
You can transpose a data.frame by
df1_t <- as.data.frame(t(df1))
df2_t <- as.data.frame(t(df2))
Then you can use mapply to cycle through the two data.frames a column at a time
t.test_results <- mapply(t.test, x = df1_t, y = df2_t, SIMPLIFY = FALSE)
Or you could use Map, which is a simple wrapper for mapply with SIMPLIFY = FALSE (thus saving keystrokes!)
t.test_results <- Map(t.test, x = df1_t, y = df2_t)
