I am extracting specific rows from a list of data frames in R and would like to have those rows assembled into a new data frame. As an example, I will use the iris data:
data(iris)
a.iris <- split(iris, iris$Species)
b.iris <- lapply(a.iris, function(x) with(x, x[3,]))
I want the return from lapply() to be arranged into a single data frame that is in the same structure as the original data frame (e.g., names(iris)). I have been looking at the plyr package but cannot find the right code to make this work. Any assistance would be greatly appreciated!
Brian
You can use do.call() with rbind() and simplify your lapply() call.
a.iris <- split(iris, iris$Species)
b.iris <- do.call(rbind, lapply(a.iris, `[`, 3, ))
b.iris
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## setosa 4.7 3.2 1.3 0.2 setosa
## versicolor 6.9 3.1 4.9 1.5 versicolor
## virginica 7.1 3.0 5.9 2.1 virginica
all.equal(names(iris), names(b.iris))
## [1] TRUE
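To see why this works, it may help to note that do.call(rbind, x) simply calls rbind() with the elements of the list x as its arguments. Here is a minimal sketch on the same iris split; the names third_rows, combined and by_hand are only illustrative.

```r
# Take the third row of each species, as in the question
a.iris <- split(iris, iris$Species)
third_rows <- lapply(a.iris, function(x) x[3, ])

# do.call() passes the list elements to rbind() as (named) arguments ...
combined <- do.call(rbind, third_rows)

# ... so it should match writing the call out by hand
by_hand <- rbind(setosa = third_rows$setosa,
                 versicolor = third_rows$versicolor,
                 virginica = third_rows$virginica)

all.equal(combined, by_hand)
```

Writing the call by hand stops being practical once the list has more than a few elements, which is where do.call() earns its keep.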
Of course, you could also have used tapply() to find the third row per group.
iris[tapply(seq_len(nrow(iris)), iris$Species, `[`, 3), ]
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 3 4.7 3.2 1.3 0.2 setosa
# 53 6.9 3.1 4.9 1.5 versicolor
# 103 7.1 3.0 5.9 2.1 virginica
I would like to use dplyr to divide a subset of variables by their IQR. I am open to approaches other than what I have tried so far, which is a combination of mutate_if and %in%. I want to reference the variable names stored in contin instead of indexing the data frame by position. Thanks for any thoughts!
contin <- c("age", "ct")
data %>%
mutate_if(%in% contin, function(x) x/IQR(x))
You should use:
data %>%
mutate(across(all_of(contin), ~.x/IQR(.x)))
Working example:
library(dplyr)
data <- head(iris)
contin <- c("Sepal.Length", "Sepal.Width")
data %>%
mutate(across(all_of(contin), ~.x/IQR(.x)))
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 15.69231 7.777778 1.4 0.2 setosa
2 15.07692 6.666667 1.4 0.2 setosa
3 14.46154 7.111111 1.3 0.2 setosa
4 14.15385 6.888889 1.5 0.2 setosa
5 15.38462 8.000000 1.4 0.2 setosa
6 16.61538 8.666667 1.7 0.4 setosa
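For comparison, the same rescaling can be sketched in base R without across(). This is only an alternative under the same assumptions as the working example (data is head(iris) and contin holds the column names); it is not part of the dplyr answer.

```r
# Same rescaling without dplyr: replace the selected columns in place
data <- head(iris)
contin <- c("Sepal.Length", "Sepal.Width")

# data[contin] is a list of columns; divide each one by its own IQR
data[contin] <- lapply(data[contin], function(x) x / IQR(x))
data
```

Columns not named in contin are left untouched, as with all_of(contin) in the dplyr version.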
Let me clarify I am not looking at mutate_at or mutate(across(..., ...)) type of syntax here. I just want to know how to create several new columns at once inside tidyverse pipe syntax.
Let us take the iris dataset as an example.
I want to create, say, 10 (or 100 or more) new columns following a rule like this:
the first new column (variable), say V1, is just Petal.Length * 1,
the second new column, say V2, is Petal.Length * 2,
and so on up to, say, V10, which is Petal.Length * 10,
without explicitly writing the name and formula for each of these columns, which would be cumbersome if I wanted to create, say, 100 new columns.
You can use map functions :
library(dplyr)
library(purrr)
df <- iris %>% head
value <- 1:5
bind_cols(df,
map_dfc(value, ~df %>% transmute(!!paste0('col', .x) := Petal.Length * .x)))
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species col1 col2 col3 col4 col5
#1 5.1 3.5 1.4 0.2 setosa 1.4 2.8 4.2 5.6 7.0
#2 4.9 3.0 1.4 0.2 setosa 1.4 2.8 4.2 5.6 7.0
#3 4.7 3.2 1.3 0.2 setosa 1.3 2.6 3.9 5.2 6.5
#4 4.6 3.1 1.5 0.2 setosa 1.5 3.0 4.5 6.0 7.5
#5 5.0 3.6 1.4 0.2 setosa 1.4 2.8 4.2 5.6 7.0
#6 5.4 3.9 1.7 0.4 setosa 1.7 3.4 5.1 6.8 8.5
In base R, this can be done with lapply :
df[paste0('col', value)] <- lapply(value, `*`, df$Petal.Length)
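If you need this more than once, the base R version can be wrapped in a small function. The helper below, add_multiples(), is hypothetical (the name and its arguments are made up for illustration) and simply generalises the lapply() line above to any column, any number of multiples and the V1 ... V10 naming from the question.

```r
# Hypothetical helper: add <prefix>1 ... <prefix>k columns holding multiples of `col`
add_multiples <- function(df, col, k, prefix = "V") {
  multipliers <- seq_len(k)
  # One new column per multiplier, assigned in a single step
  df[paste0(prefix, multipliers)] <- lapply(multipliers, function(m) df[[col]] * m)
  df
}

out <- add_multiples(head(iris), "Petal.Length", 10)
names(out)[6:15]
```

Changing k to 100 creates V1 through V100 without writing any of them out by hand.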
I would like to use the ntile function from dplyr or a similar function on a list of data frames but using a different n for each data frame. My list contains 150 data frames so a manual solution like the one below will not work. How can I rewrite the code below to act on the list of data frames and return the list of data frames with the new column?
library(tidyverse)
iris_list=split(iris,iris$Species)
iris_setosa=iris_list[[1]]
iris_versicolor=iris_list[[2]]
iris_virginica=iris_list[[3]]
iris_setosa$n3=ntile(iris_setosa$Sepal.Length,3)
iris_versicolor$n5=ntile(iris_versicolor$Sepal.Length,5)
iris_virginica$n7=ntile(iris_virginica$Sepal.Length,7)
The final result should be this
final_list=list(iris_setosa,iris_versicolor,iris_virginica)
head(final_list[[1]])
Sepal.Length Sepal.Width Petal.Length Petal.Width Species n3
1 5.1 3.5 1.4 0.2 setosa 2
2 4.9 3.0 1.4 0.2 setosa 1
3 4.7 3.2 1.3 0.2 setosa 1
4 4.6 3.1 1.5 0.2 setosa 1
5 5.0 3.6 1.4 0.2 setosa 2
6 5.4 3.9 1.7 0.4 setosa 3
There are several ways to achieve this, depending on what type of object you want in the end.
One way would be to use base::expand.grid and purrr::pmap like this:
percentiles = list(3,5,7)
iris_list %>%
map("Sepal.Length") %>%
expand.grid(percentiles) %>%
pmap(~ntile(..1,..2))
First, you want only the Sepal.Length variable of all your datasets, so you use purrr::map to get them.
Then, expand.grid creates a dataframe of all combinations of its parameters. Here, with 2 lists of 3 members, it would return a dataframe of 3x3=9 rows: setosa 3, versicolor 3, virginica 3, setosa 5, ...
Finally, pmap iterates over the rows of that dataframe and applies the function ntile, with the first column (the Sepal.Length vectors from iris_list) as the first argument and the second column (percentiles) as the second argument. Unfortunately, purrr handles names poorly here, although that seems to be on purpose.
EDIT:
Your edit is really a separate question, so here is another answer:
iris_list %>%
map(~mutate(.x, n3=ntile(Sepal.Length,3),
n5=ntile(Sepal.Length,5), n7=ntile(Sepal.Length,7)))
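If the goal is one n per data frame (3 for the first, 5 for the second, 7 for the third) rather than every combination, a different technique may fit better: purrr::map2(), which walks the list and a vector of n values in parallel. This is a sketch, not the approach above. It assumes the n values are supplied in the same order as the list, and n_values and binned are illustrative names.

```r
library(dplyr)
library(purrr)

iris_list <- split(iris, iris$Species)
n_values <- c(3, 5, 7)  # assumed: one n per data frame, in list order

# map2() pairs each data frame with its own n; the new column is named n3, n5, n7
binned <- map2(iris_list, n_values, function(df, n) {
  mutate(df, !!paste0("n", n) := ntile(Sepal.Length, n))
})

names(binned$versicolor)
```

The result is again a named list of data frames, so it should scale to 150 data frames as long as n_values has one entry per list element.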
I've found a way that works
n_size=data.frame(Species=c("setosa","versicolor","virginica"),size=c(3,5,7))
iris_bin=iris %>% inner_join(n_size,by="Species") %>%
group_by(Species)%>%
mutate(bin=ntile(Sepal.Length,size[1])) %>%
arrange(Species,Sepal.Length,bin)
New here and not very experienced, and I'm trying to get a project in R shinyapp to work.
I have a list of data frames which have a column labeled 'Gender' containing all/M/F. I want to filter all data frames based on the input, so that if the input is male, only rows containing M or all are kept.
list_tables <- list(adverb,adjective,simplenoun,verber,thingnoun,
personnoun,name_firstpart,name_secondpart)
input$gender <- "male"
if(input$gender == "male"){
for (i in list_tables){
list_tables$i <- i[which((i$Gender=="M")|(i$Gender=="all")),]
}
}
Problem is, if I check the list afterwards, nothing has changed. If I do the same, but instead of using a for loop to cycle through the dataframes, I perform the same actions on only one dataframe, it does work. Theoretically, I could make a line of code for each dataframe separately, but it doesn't seem very neat and I have the feeling that the for loop should work but I'm just missing something. Would love to hear tips if anyone has them!
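i is not a named entry in list_tables, so list_tables$i does not refer to the data frame you are looping over; it creates (or overwrites) an element literally named "i". Inside the loop, i is a copy of the data.frame you're trying to modify, and the original list element is never updated.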
Try either:
for (ind in seq_along(list_tables)) {
i <- list_tables[[ind]] # feels a little sloppy, but it's compact ...
list_tables[[ind]] <- i[which((i$Gender=="M")|(i$Gender=="all")),]
}
or even better
list_tables <- lapply(list_tables, function(i) i[which((i$Gender=="M")|(i$Gender=="all")),])
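To make the difference concrete, here is a small sketch with made-up tables (the data and the names tables, broken and fixed are hypothetical). It shows what the original loop actually does to the list and what the lapply() version returns.

```r
# Toy stand-ins for the real tables (hypothetical data)
tables <- list(
  a = data.frame(Gender = c("M", "F", "all")),
  b = data.frame(Gender = c("F", "F", "M"))
)

# The original pattern: every iteration writes to an element literally named "i"
broken <- tables
for (i in broken) {
  broken$i <- i[i$Gender %in% c("M", "all"), , drop = FALSE]
}
names(broken)   # the list gains an element "i"; a and b are untouched
nrow(broken$a)

# Rebuilding the list with lapply() filters every element
fixed <- lapply(tables, function(i) i[i$Gender %in% c("M", "all"), , drop = FALSE])
sapply(fixed, nrow)
```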
You could use lapply with subset:
example:
list_tables <- replicate(2, iris[c(1,51,101),], simplify = FALSE)
# [[1]]
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1 5.1 3.5 1.4 0.2 setosa
# 51 7.0 3.2 4.7 1.4 versicolor
# 101 6.3 3.3 6.0 2.5 virginica
#
# [[2]]
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1 5.1 3.5 1.4 0.2 setosa
# 51 7.0 3.2 4.7 1.4 versicolor
# 101 6.3 3.3 6.0 2.5 virginica
solution:
lapply(list_tables,subset,Species %in% c("setosa","virginica"))
# [[1]]
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1 5.1 3.5 1.4 0.2 setosa
# 101 6.3 3.3 6.0 2.5 virginica
#
# [[2]]
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1 5.1 3.5 1.4 0.2 setosa
# 101 6.3 3.3 6.0 2.5 virginica
In your case that would be:
lapply(list_tables,subset,Gender %in% c("M","all"))
In my data frame the first column is a factor and I want to delete rows that have a certain value of factorname (when the value is present). I tried:
df <- df[-grep("factorname",df$parameters),]
Which works well when the targeted factor name is present. However if the factorname is absent, this command destroys the data frame, leaving it with 0 rows. So I tried:
df <- df[!apply(df, 1, function(x) {df$parameters == "factorname"}),]
that does not remove the offending lines. How can I test for the presence of factorname and remove the line if factorname is present?
You could use:
df[ which( ! df$parameters %in% "factorname") , ]
(Used %in% since it would generalize better to multiple exclusion criteria.) Also possible:
df[ !grepl("factorname", df$parameters) , ]
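The reason the original -grep() version empties the data frame is that grep() returns integer(0) when nothing matches, and negating an empty index still selects nothing. A small sketch with a made-up data frame (the column values are hypothetical):

```r
# Hypothetical data frame with a factor column called `parameters`
df <- data.frame(parameters = factor(c("a", "b", "c")), value = 1:3)

hits <- grep("factorname", df$parameters)  # no match, so integer(0)
length(hits)

# -integer(0) is still integer(0), which selects no rows at all
nrow(df[-hits, ])

# A logical mask has one entry per row, so nothing is dropped by accident
nrow(df[!grepl("factorname", df$parameters), ])
```

Both suggestions above work because they index with a condition evaluated for every row instead of a possibly empty set of positions.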
l <- sapply(iris, is.factor) # test for the factor variables
l
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
FALSE FALSE FALSE FALSE TRUE
m <- iris[, names(which(l)), drop = FALSE] # gives the data frame of factor variables only
iris[iris$Species != "setosa", ] # generates the data with Species other than setosa
head(iris[iris$Species != "setosa", ])
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
51 7.0 3.2 4.7 1.4 versicolor
52 6.4 3.2 4.5 1.5 versicolor
53 6.9 3.1 4.9 1.5 versicolor
54 5.5 2.3 4.0 1.3 versicolor
55 6.5 2.8 4.6 1.5 versicolor
56 5.7 2.8 4.5 1.3 versicolor