I have a main data frame (mydata) and two secondary ones (df1, df2), as follows:
x <- c(1, 2, 3, 4, 5)
y <- c(5, 4, 3, 2, 1)
mydata <- data.frame(x)
df1 <- data.frame(y)
df2 <- data.frame(y)
df2$y <- y+1 #This way, the columns in the df have the same name but different values.
I want to create new columns in mydata based on a formula with the variables in df1 and df2 like this:
mydata$new1 <- mydata$x*df1$y
mydata$new2 <- mydata$x*df2$y
Is there a way I can do this with a for loop? This is what I had in mind:
for (i in 2) {
mydata$paste0("new", i) <- mydata$x*dfpaste0(i)$y
}
Something along the lines of:
for (i in 1:2) {
mydata[[paste0("new", i)]] <- mydata$x*get(paste0("df", i))$y
}
We could also use mget to collect the data frames in a list and multiply them by the vector in one step:
mydata[paste0("new", 1:2)] <- mydata$x * data.frame(mget(paste0("df", 1:2)))
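If the secondary data frames can be collected in a list up front, the lookup by name via get/mget is not needed at all. A minimal sketch, assuming the same mydata, df1 and df2 as in the question; the list name dfs is hypothetical:

```r
x <- c(1, 2, 3, 4, 5)
y <- c(5, 4, 3, 2, 1)
mydata <- data.frame(x)
df1 <- data.frame(y)
df2 <- data.frame(y = y + 1)

# Hypothetical helper list holding the secondary data frames
dfs <- list(df1, df2)

# Add one new column per element of the list
for (i in seq_along(dfs)) {
  mydata[[paste0("new", i)]] <- mydata$x * dfs[[i]]$y
}
```

Keeping the data frames in a list also means the loop adapts automatically if a third one is added later.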
I have a gene expression data frame in which the column names are the samples and the row names are the genes.
Now I want to know how many genes are left after filtering each column by a threshold.
For example,
sample1_more_than_5 <- df[(df[,1]>5),]
sample1_more_than_10 <- df[(df[,1]>10),]
sample1_more_than_20 <- df[(df[,1]>20),]
sample1_more_than_30 <- df[(df[,1]>30),]
Then,
sample2_more_than_5 <- df[(df[,2]>5),]
sample2_more_than_10 <- df[(df[,2]>10),]
sample2_more_than_20 <- df[(df[,2]>20),]
sample2_more_than_30 <- df[(df[,2]>30),]
But I don't want to repeat this 100 times as I have 100 samples.
Can anyone write a loop for me for this situation? Thank you
Here is a solution using two nested apply-family calls that calculates, for each sample (column), the number of genes (rows) that have a value greater than each entry of the nums vector.
# Create the vector with the numbers used to filter each column
nums <- c(5, 10, 20, 30)
# Loop over each column
resul <- apply(df, 2, function(x) {
  # Count the rows that have a higher value than each nums entry
  sapply(nums, function(y) {
    length(x[x > y])
  })
})
# Transform the result into a data.frame and add the nums vector as the first column
resul <- data.frame(greaterthan = nums,
                    as.data.frame(resul))
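Because comparison is vectorised over a whole data frame, the same counts can also be obtained with colSums. A minimal sketch, assuming df is all-numeric; the small df below is made-up illustration data, not the asker's:

```r
# Made-up example data: 4 genes (rows) x 2 samples (columns)
df <- data.frame(sample1 = c(3, 8, 25, 40),
                 sample2 = c(12, 6, 31, 2))
nums <- c(5, 10, 20, 30)

# For each threshold, count the rows above it in every column;
# t() puts thresholds in rows and samples in columns
counts <- t(sapply(nums, function(n) colSums(df > n)))
resul <- data.frame(greaterthan = nums, counts)
```

Each row of resul then corresponds to one threshold and each remaining column to one sample.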
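We can loop over the columns and group each column's values into bins with cut (note that values of 5 or below, or above 30, fall outside these breaks and are dropped by split):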
lst1 <- lapply(df, function(x) split(x, cut(x, breaks = c(5, 10, 20, 30))))
or findInterval and then split
lst1 <- lapply(df, function(x) split(x, findInterval(x, c(5, 10, 20, 30))))
If we go by the way the objects are created in the OP's post, there would be 100 * 4, i.e. 400, objects (for 100 columns) in the global environment. Instead, everything can be kept in a single list object.
The individual objects can be created, but it is not recommended:
v1 <- c(5, 10, 20, 30)
v2 <- seq_along(df)
for (i in v2) {
  for (j in v1) {
    assign(sprintf('sample%d_more_than_%d', i, j),
           value = df[df[, i] > j, , drop = FALSE])
  }
}
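A sketch of the single-list alternative mentioned above, assuming a numeric df; the example data and the name filtered are made up for illustration:

```r
# Made-up example data: 4 genes (rows) x 2 samples (columns)
df <- data.frame(sample1 = c(3, 8, 25, 40),
                 sample2 = c(12, 6, 31, 2))
v1 <- c(5, 10, 20, 30)

# Outer list: one element per sample; inner list: one element per threshold
filtered <- lapply(df, function(col) {
  out <- lapply(v1, function(j) df[col > j, , drop = FALSE])
  setNames(out, paste0("more_than_", v1))
})

# Number of genes left for sample1 with a cutoff of 10
nrow(filtered$sample1$more_than_10)
```

Every filtered data frame stays reachable by name, e.g. filtered$sample2$more_than_20, without creating hundreds of objects in the global environment.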
I am attempting to write a function to add a suffix to variable names in a data frame.
The code I want to turn into a function: colnames(dataframe) <- paste0(colnames(dataframe), "_suffix")
My function:
rename <- function(x, y) {colnames(x) <- paste0(colnames(x), y)}
When I call the function on the data frame I would like to append with a suffix, the data frame does not change. I am sure I am missing some fundamental understanding of how the function should work to append the data frame column names and preserve the data frame name.
Try this solution:
df <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6))
rename <- function(x, y) {
colnames(x) <- paste0(colnames(x), y)
return(x)
}
You would then overwrite the original data frame like so:
df <- rename(df, "_suffix")
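The original version appeared to do nothing because R functions work on a copy of their arguments, and the value of the last expression (here an assignment) is returned invisibly. A short sketch of the difference; rename_broken and rename_fixed are hypothetical names used only for this comparison:

```r
df <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6))

# Modifies only the local copy; returns the new names invisibly
rename_broken <- function(x, y) {
  colnames(x) <- paste0(colnames(x), y)
}

# Returns the modified copy so the caller can keep it
rename_fixed <- function(x, y) {
  colnames(x) <- paste0(colnames(x), y)
  x
}

rename_broken(df, "_suffix")       # df is unchanged
colnames(df)                       # still "x" "y"

df <- rename_fixed(df, "_suffix")  # overwrite with the returned copy
colnames(df)                       # "x_suffix" "y_suffix"
```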
How to apply a function on every second column of a data frame? That is to say, how to modify df2 <- sapply(df1, fun) such that df2 equals df1 but with fun applied to every second column? Here is what I tried:
a <- c(1,2,3,4,5)
b <- c(6,7,8,9,10)
df1 <- data.frame(a,b)
df2 <- sapply(df1[c(TRUE, FALSE)], function(x) x^2)
isTRUE(dim(df1)==dim(df2)) # FALSE
The problem with this code is that it drops all columns to which fun was not applied (dim(df2) # 5 1).
Assigning variables to slices
You can assign new values to subsets of an object. For example:
x <- c(1,2,3)
x[2] <- 4
Now x will be c(1,4,3). Similarly, you can do this for rows/columns of a matrix or data frame. Here we use the apply function with the second argument 2 for columns (1 for rows). I recommend the seq function to generate a sequence of indices: from=1, by=2 gives odd and from=2, by=2 gives even indices. Specifying it this way generalises to other subsets and makes it straightforward to check that you got it right.
a <- c(1,2,3,4,5)
b <- c(6,7,8,9,10)
df1 <- data.frame(a,b)
df2 <- df1
# drop = FALSE keeps a one-column selection as a data frame, which apply requires
df2[,seq(1, ncol(df2), 2)] <- apply(df2[,seq(1, ncol(df2), 2), drop = FALSE], 2, function(x) x^2)
Loops
Note that you can also do this with a loop:
df2 <- df1
for(col in seq(1, ncol(df2), 2)) df2[,col] <- sapply(df2[,col], function(x) x^2)
Vectorised functions
Since the squared operation is "vectorised" in R, in this case you could also do:
df2 <- df1
for(col in seq(1, ncol(df2), 2)) df2[,col] <- df2[,col]^2
Or use vectorisation completely:
df2 <- df1
df2[,seq(1, ncol(df2), 2)] <- df2[,seq(1, ncol(df2), 2)]^2
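Since a data frame is a list of columns, lapply over a column subset is another option, and it avoids the matrix conversion that apply performs. A minimal sketch, assuming fun is any function that returns a vector of the same length as its input (squaring is used here); odd is a hypothetical helper name:

```r
a <- c(1, 2, 3, 4, 5)
b <- c(6, 7, 8, 9, 10)
df1 <- data.frame(a, b)

fun <- function(x) x^2
odd <- seq(1, ncol(df1), by = 2)  # every second column, starting at the first

df2 <- df1
df2[odd] <- lapply(df2[odd], fun)  # replace only the selected columns

identical(dim(df1), dim(df2))      # dimensions are preserved
```

This form also works for functions that are not vectorised over a whole data frame, because fun receives one column at a time.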
Here is a simple made up data set:
df1 <- data.frame(x = c(1,2,3),
y = c(4,6,8),
z= c(1, 6, 7))
df2 <- data.frame(x = c(3,5,6),
y = c(3,4,9),
z= c(6, 7, 7))
What I want to do is to create a new variable "a" which is just the sum of all three variables (x,y,z)
Instead of doing this separately for each dataframe I thought it would be more efficient to just create a loop. So here is the code I wrote:
my.list<- list(df1, df2)
for (i in 1:2) {
my.list[i]$a<- my.list[i]$x +my.list[i]$y + my.list[i]$z
}
or alternatively
for (i in 1:2) {
my.list[i]<- transform(my.list[i], a= x+ y+ z)
}
In both cases it does not work and the error "number of items to replace is not a multiple of replacement length" is returned.
What would be the best solution to writing a loop code where I can loop through dataframes?
See ?Extract:
Recursive (list-like) objects
Indexing by [ is similar to atomic vectors and selects a list of the
specified element(s).
Both [[ and $ select a single element of the list.
In short, my.list[i] returns a list of length 1, and you are trying to assign it a data.frame, so that doesn't work; whereas my.list[[i]] returns the data.frame #i in your list, which you can replace with a data.frame.
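The difference is easy to check interactively. A small sketch using the two data frames from the question:

```r
df1 <- data.frame(x = c(1, 2, 3), y = c(4, 6, 8), z = c(1, 6, 7))
df2 <- data.frame(x = c(3, 5, 6), y = c(3, 4, 9), z = c(6, 7, 7))
my.list <- list(df1, df2)

class(my.list[1])     # "list": a list of length 1 containing df1
class(my.list[[1]])   # "data.frame": df1 itself

my.list[1]$x          # NULL, the outer list has no element named x
my.list[[1]]$x        # 1 2 3
```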
So you can use either:
for (i in 1:2) {
my.list[[i]]$a<- my.list[[i]]$x +my.list[[i]]$y + my.list[[i]]$z
}
or
for (i in 1:2) {
my.list[[i]]<- transform(my.list[[i]], a= x+ y+ z)
}
But it would be even simpler to use lapply, where you don't need [[:
my.list <- lapply(my.list, function(df) { df$a <- df$x + df$y + df$z; df })
Rather than using an explicit loop to extract the data.frames from the list, just use lapply. It takes a list of data.frames (or any object) and a function, applies the function to every element of the list, and returns a list with the results.
# Sample data
df1 <- data.frame(x = c(1,2,3), y = c(4,6,8), z = c(1, 6, 7))
df2 <- data.frame(x = c(3,5,6), y = c(3,4,9), z = c(6, 7, 7))
# Put them in a list
df_list <- list(df1, df2)
# Use lapply to iterate. FUN takes the function you want, and
# then its arguments (a = x + y + z) are just listed after it.
result_list <- lapply(df_list, FUN = transform, a = x + y + z)
I'm trying to append a row to an existing dataframe in R. The dataframe represents a subject and I want to update this with newly (generated) data. When I run this, the index numbers of the dataframe become strange:
1, 2, 21, 211, 2111, 21111, etc.
These are not practical to read.
How can I get 'normal' index numbers (1, 2, 3, 4, etc.)?
x <- 10
y <- 463
dat <- data.frame(x,y)
for (i in 1:10) {
dat.sub <- dat[nrow(dat),] # select the last row from 'dat'
dat.sub <- within(dat.sub, { # within that selection update the objects
x <- x+1
y <- y+1
})
dat <- rbind(dat, dat.sub, deparse.level = 2) # attach updated row to the 'dat'
}
dat
dat[3,]
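I think the problem is that dat.sub is a data.frame and carries the row name of the row it was copied from, so from the second iteration on every appended row collides with an existing row name and rbind has to make it unique. The easiest fix is to strip the data.frame class from dat.sub so that it carries no row names. One way is: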
dat.sub <- c(within(dat.sub, { # within that selection update the objects
x <- x+1
y <- y+1
}))
That is, wrap the within() call in c() inside your for loop, which turns dat.sub into a plain list without row names.
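Two other ways to end up with sequential row names, sketched under the assumption that only the final dat matters: reset the row names after the loop, or build the columns without a loop at all. The name dat2 is hypothetical.

```r
x <- 10
y <- 463
dat <- data.frame(x, y)

for (i in 1:10) {
  dat.sub <- dat[nrow(dat), ]   # copy the last row
  dat.sub$x <- dat.sub$x + 1    # update its values
  dat.sub$y <- dat.sub$y + 1
  dat <- rbind(dat, dat.sub)    # append; row names may get odd here
}
rownames(dat) <- NULL           # reset row names to 1, 2, 3, ...

# Loop-free equivalent for this particular update rule
dat2 <- data.frame(x = 10 + 0:10, y = 463 + 0:10)
```

Growing a data frame row by row copies it on every iteration, so the loop-free form is usually preferable when the update rule allows it.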