I have a list of data frames, and I'd like to add to each data frame in the list a column that is simply a character vector holding that data frame's name.
data <- list(
d1 = data.frame(animal = sample(c("cat","dog","bird"), 5, replace = T)),
d2 = data.frame(animal = sample(c("cat","dog","bird"), 5, replace = T)),
d3 = data.frame(animal = sample(c("cat","dog","bird"), 5, replace = T))
)
This yields:
> data
$d1
animal
1 cat
2 bird
3 cat
4 cat
5 cat
$d2
animal
1 dog
2 cat
3 cat
4 cat
5 bird
$d3
animal
1 cat
2 dog
3 cat
4 cat
5 cat
What I want to do is create something like the following:
> newdata
$d1
animal newvar
1 cat d1
2 cat d1
3 cat d1
4 dog d1
5 cat d1
$d2
animal newvar
1 bird d2
2 cat d2
3 bird d2
4 cat d2
5 cat d2
$d3
animal newvar
1 bird d3
2 bird d3
3 cat d3
4 cat d3
5 bird d3
But I can't quite figure out how to actually reference the data frame name --in a list of data frames-- and turn it into a character vector appropriately.
Something like the following does not work:
namefunc <- function(x) {
x <- x %>% transform(newvar = as.character(x))
}
newdata <- namefunc(data)
We can use Map to cbind the corresponding list elements of 'data' with the names of 'data'
Map(cbind, data, newvar= names(data))
lapply(names(data), function(d) transform(data[[d]], newvar=d))
or, since lapply over names(data) returns an unnamed list, restore the names afterwards:
L <- lapply(names(data), function(d) transform(data[[d]], newvar=d))
names(L) <- names(data)
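To check that this kind of approach keeps the list names and fills newvar as intended, here is a small self-contained sketch. The two-row data frames are made up for illustration, and wrapping transform in an anonymous function is just one way to write it.

```r
# Toy input (hypothetical values, smaller than the example in the question)
data <- list(
  d1 = data.frame(animal = c("cat", "dog")),
  d2 = data.frame(animal = c("bird", "cat"))
)

# Map() walks over the data frames and their names in parallel;
# the result keeps the names of the first argument (d1, d2).
newdata <- Map(function(df, nm) transform(df, newvar = nm), data, names(data))

names(newdata)
newdata$d2
```

Because Map carries the names over, no separate names(L) <- names(data) step is needed here.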
I am quite new to R, and I do not know how to create variables in a loop. I have a dataset where each observation is uniquely defined by an id and a type. My goal is to create several datasets from the starting one, keeping in each the id, the type, and one specific variable, and renaming the variable type to type_variable. Please see below a reproducible example of my dataset:
dt_type <- data.frame(id = c(1,1,1,1,2,2,2,2),
type= c("b1", "b2","c1", "c2","b1", "b2","c1", "c2"),
a=rnorm(8), b=rnorm(8),c=rnorm(8),d=rnorm(8))
# id type a b c d
# 1 1 b1 -0.74733339 -1.1121249 -0.2005649 1.70320036
# 2 1 b2 -0.87290362 -0.1221949 -2.7723691 1.04158671
# 3 1 c1 -0.00878965 -0.7592988 -0.5108226 2.10755315
# 4 1 c2 0.87295622 -0.5885439 0.2606365 -0.87080649
# 5 2 b1 -0.74536372 0.1377794 -0.1382621 0.01743011
# 6 2 b2 -0.01570109 -0.3058672 -0.3146880 -0.43594081
# 7 2 c1 -0.28966205 -0.2045772 -1.1776759 -2.24223369
# 8 2 c2 -0.63680969 2.3815740 0.4462243 -0.05397941
This is how I have tried to do it, but unfortunately it does not work.
varlist <- list("a", "b", "c", "d")
for (i in 1:4) {
tmp <- dt_type %>% rename(paste("type", varlist[[i]], sep=="_") = type) %>%
arrange(id, varlist[[i]], desc(paste("type", varlist[[i]], sep=="_"))) %>%
distinct(id, varlist[[i]], .keep_all = T)
assign(paste("dt_type_", varlist[[i]]), tmp)
}
I am used to using loops in other programming languages, but if there are better ways to reach the result I want, please let me know.
Sorry for not posting the expected output, here it is:
dt_type_a
# id type value
# 1 1 b1 -1.5023199
# 2 1 b2 -0.3653626
# 3 1 c1 1.2842098
# 4 1 c2 0.2732327
# 5 2 b1 -0.7581897
# 6 2 b2 1.1627059
# 7 2 c1 -1.6644546
# 8 2 c2 1.2916819
dt_type_b
# id type value
# 1 1 b1 -0.19573684
# 2 1 b2 -1.35095843
# 3 1 c1 0.69342205
# 4 1 c2 0.47689611
# 5 2 b1 0.67058845
# 6 2 b2 0.21992074
# 7 2 c1 -0.02046201
# 8 2 c2 0.19686712
Thanks,
Vincenzo
Hum, I would just go from wide to long but since you're asking to create variables dynamically:
library(data.table)
dt_type <- data.frame(id = c(1,1,1,1,2,2,2,2),
type= c("b1", "b2","c1", "c2","b1", "b2","c1", "c2"),
a=rnorm(8), b=rnorm(8),c=rnorm(8),d=rnorm(8))
setDT(dt_type)
dt_long <- melt(dt_type, id.vars = c("id", "type"))
varnames <- unique(dt_long$variable)
for (var in varnames) {
assign(paste0("dt_type_", var), dt_long[variable == var, .(id, type, value)])
}
hope it helps...
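If separate objects in the global environment are not strictly required, a named list is often easier to work with than assign. The following is a minimal base R sketch under that assumption; the name dt_by_var and the reduced four-row data are hypothetical and not part of the original question.

```r
set.seed(1)  # only so the toy numbers are reproducible
dt_type <- data.frame(
  id   = c(1, 1, 2, 2),
  type = c("b1", "b2", "b1", "b2"),
  a    = rnorm(4),
  b    = rnorm(4)
)

vars <- c("a", "b")

# One data frame per variable, stored under names such as "dt_type_a"
dt_by_var <- lapply(setNames(vars, paste0("dt_type_", vars)), function(v) {
  out <- dt_type[, c("id", "type", v)]  # keep id, type and the chosen column
  names(out)[3] <- "value"              # rename the chosen column to "value"
  out
})

names(dt_by_var)
dt_by_var$dt_type_a
```

Each piece is then available as dt_by_var$dt_type_a, dt_by_var$dt_type_b, and so on, and the whole set can be looped over with lapply.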
I have a data frame that looks like this:
> data <- data.frame(foo=c(1, 1, 2, 3, 3, 3), bar=c('a', 'b', 'a', 'b', 'c', 'd'))
> data
foo bar
1 1 a
2 1 b
3 2 a
4 3 b
5 3 c
6 3 d
I would like to create a new column bars_by_foo which is the concatenation of the values of bar by foo. So the new data should look like this:
foo bar bars_by_foo
1 1 a ab
2 1 b ab
3 2 a a
4 3 b bcd
5 3 c bcd
6 3 d bcd
I was hoping that the following would work:
p <- function(v) {
Reduce(f=paste, x = v)
}
data %>%
group_by(foo) %>%
mutate(bars_by_foo=p(bar))
But that code gives me an error
Error: incompatible types, expecting a character vector.
What am I doing wrong?
You could simply do
data %>%
group_by(foo) %>%
mutate(bars_by_foo = paste0(bar, collapse = ""))
Without any helper functions
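For comparison, the same per-group concatenation can be sketched in base R with ave, without dplyr. This is only an illustrative alternative, not part of the answer above; as.character is there so the code behaves the same whether bar is stored as a factor or as a character vector.

```r
data <- data.frame(foo = c(1, 1, 2, 3, 3, 3),
                   bar = c("a", "b", "a", "b", "c", "d"))

# ave() applies FUN within each group of foo and recycles the single
# collapsed string back over every row of that group.
data$bars_by_foo <- ave(as.character(data$bar), data$foo,
                        FUN = function(x) paste0(x, collapse = ""))

data
```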
It looks like there's a bit of an issue with the mutate function here. I've found it's often easier to work with summarise when you're grouping data in dplyr (that's in no way a hard and fast rule, though).
The paste function also inserts a space between elements by default, so either set sep = "" or just use paste0.
Here is my code:
p <- function(v) {
Reduce(f=paste0, x = v)
}
data %>%
group_by(foo) %>%
summarise(bars_by_foo = p(as.character(bar))) %>%
merge(., data, by = 'foo') %>%
select(foo, bar, bars_by_foo)
Resulting in..
foo bar bars_by_foo
1 1 a ab
2 1 b ab
3 2 a a
4 3 b bcd
5 3 c bcd
6 3 d bcd
You can try this:
agg <- aggregate(bar~foo, data = data, paste0, collapse="")
df <- merge(data, agg, by = "foo", all = T)
colnames(df) <- c(colnames(data), "bars_by_foo") # optional
# foo bar bars_by_foo
# 1 1 a ab
# 2 1 b ab
# 3 2 a a
# 4 3 b bcd
# 5 3 c bcd
# 6 3 d bcd
Your function works if you make sure that bar is a character vector and not a factor. Note that paste still separates the values with spaces, as the output below shows.
data <- data.frame(foo=c(1, 1, 2, 3, 3, 3), bar=c('a', 'b', 'a', 'b', 'c', 'd'),
stringsAsFactors = FALSE)
library("dplyr")
p <- function(v) {
Reduce(f=paste, x = v)
}
data %>%
group_by(foo) %>%
mutate(bars_by_foo=p(bar))
Source: local data frame [6 x 3]
Groups: foo [3]
foo bar bars_by_foo
<dbl> <chr> <chr>
1 1 a a b
2 1 b a b
3 2 a a
4 3 b b c d
5 3 c b c d
6 3 d b c d
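The spaces in that output come from the default separator of paste. A short sketch on a plain character vector (the variable names are made up for illustration) shows the difference between the variants discussed in this thread:

```r
bar <- c("b", "c", "d")

spaced    <- Reduce(paste, bar)        # paste() joins pairwise with sep = " "
tight     <- Reduce(paste0, bar)       # paste0() joins pairwise with no separator
collapsed <- paste(bar, collapse = "") # one vectorised call, no Reduce needed

spaced     # "b c d"
tight      # "bcd"
collapsed  # "bcd"
```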
I have five data frames that all have the same number of columns. I want to use rbind to append them, but their variable names differ. Fortunately, the names all follow the same pattern:
date prod1 code1 tot1
date prod2 code2 tot2
...
date prod5 code5 tot5
I want to strip the numeric suffix from the names in all of them at once, so that I can then rbind my data frames. How can I do this?
Thanks in advance.
Since the question was how to change the column names, I will address this problem first:
lapply(dflist, setNames, nm = new_col_name)
df1 <- data.frame(prod1 = 1:5, code1 = 1:5, tot1 = 1:5)
df2 <- data.frame(prod2 = 1:5, code2 = 1:5, tot2 = 1:5)
dflist <- list(df1, df2)
lapply(dflist, setNames, nm = c("prod", "code", "tot"))
[[1]]
prod code tot
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 4
5 5 5 5
[[2]]
prod code tot
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 4
5 5 5 5
As already mentioned it may be better just to ignore column names and use rbindlist from data.table to bind rows.
data.table::rbindlist(dflist, use.names = F)
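If the new names should be derived from the old ones rather than typed out, the trailing digits can be removed with a regular expression. This is a minimal base R sketch; strip_suffix is a hypothetical helper, and it assumes that every column to be renamed ends in digits and that no other column name does.

```r
df1 <- data.frame(prod1 = 1:5,  code1 = 1:5,  tot1 = 1:5)
df2 <- data.frame(prod2 = 6:10, code2 = 6:10, tot2 = 6:10)
dflist <- list(df1, df2)

# Hypothetical helper: drop any run of digits at the end of each column name
strip_suffix <- function(df) setNames(df, sub("[0-9]+$", "", names(df)))

# Rename every data frame, then stack them with rbind
combined <- do.call(rbind, lapply(dflist, strip_suffix))

names(combined)
nrow(combined)
```

A column such as date has no trailing digits, so the pattern leaves it unchanged.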
You can do it using magrittr and dplyr :
d1 <- mtcars
d2 <- d1
d3 <- d1
names(d2) <- paste0(names(d2), "_2")
names(d3) <- paste0(names(d3), "_3")
rbind(d1, d2, d3) # gives an error, ok
#> Error in match.names(clabs, names(xi)): names do not match previous names
library(magrittr, quietly = TRUE, warn.conflicts = FALSE)
library(dplyr, quietly = TRUE, warn.conflicts = FALSE)
df_list <- list(d2, d3)
df_list <- lapply(df_list, magrittr::set_colnames, names(d1))
df_final <- rbind(d1, dplyr::bind_rows(df_list) )
nrow(df_final) == 3* nrow(d1)
#> [1] TRUE
I have a function that queries a database and returns a list of two data frames (df1 and df2). If I lapply iteratively over that function, I return a list of nested lists with the two data frames.
The resultant list is structured as below:
#e.g. sample list of lists of 2 data frames
A1 <- data.frame(Value =c("A","B","C"))
A2 <- data.frame(Value =c("1","2","3"))
B1 <- data.frame(Value =c("D","E","F"))
B2 <- data.frame(Value =c("4","5","6"))
C1 <- data.frame(Value =c("G","H","I"))
C2 <- data.frame(Value =c("7","8","9"))
myList <- list( list(df1 = A1, df2 = A2),
list(df1 = B1, df2 = B2),
list(df1 = C1, df2 = C2))
I then want to combine the data frames into their own separate big data frames - df1_All and df2_All.
How can I extract all of the df1 data frames from the list and combine them into a larger data frame? I am thinking it would involve a do.call(rbind) construct with an apply or map function applied to myList?
Based on Ronak Shah's comment to my question, this is the answer I went with:
dfX1 <- data.frame(do.call("rbind",lapply(myList,"[[","df1")))
dfX2 <- data.frame(do.call("rbind",lapply(myList,"[[","df2")))
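The two lines above differ only in the element name, so the pattern can be wrapped in a small function. This is a sketch under the assumption that every inner list has an element with the requested name; combine_by_name is a made-up helper name, and the sample data is shortened to two inner lists.

```r
A1 <- data.frame(Value = c("A", "B", "C"))
A2 <- data.frame(Value = c("1", "2", "3"))
B1 <- data.frame(Value = c("D", "E", "F"))
B2 <- data.frame(Value = c("4", "5", "6"))

myList <- list(list(df1 = A1, df2 = A2),
               list(df1 = B1, df2 = B2))

# Hypothetical helper: pull element `nm` from each inner list and stack the rows
combine_by_name <- function(nested, nm) {
  do.call(rbind, lapply(nested, `[[`, nm))
}

df1_All <- combine_by_name(myList, "df1")
df2_All <- combine_by_name(myList, "df2")

df1_All
```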
Alternatively, with purrr and dplyr:
library(purrr)
library(dplyr)
myList %>%
pmap(.,bind_rows) %>%
bind_cols()
Value Value1
1 A 1
2 B 2
3 C 3
4 D 4
5 E 5
6 F 6
7 G 7
8 H 8
9 I 9
Edit: the following code does not create the required output (OP clarified the intended output after I drafted this)
Let's create a custom function. Your data frames seem to be in the same position, so let's exploit that regularity:
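library(tibble) # for as_tibble()
getDataFrame <- function(mylist, wantx) {
# pull element `wantx` out of every sub-list of the argument (not the global myList)
df <- sapply(mylist, `[[`, wantx)
names(df) <- paste0("Name", seq_along(mylist))
df <- as_tibble(df)
return(df)
}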
So,
getDataFrame(myList, 1)
returns:
# A tibble: 3 x 3
Name1 Name2 Name3
<fct> <fct> <fct>
1 A D G
2 B E H
3 C F I
And similarly:
> getDataFrame(myList, 2)
# A tibble: 3 x 3
Name1 Name2 Name3
<fct> <fct> <fct>
1 1 4 7
2 2 5 8
3 3 6 9
If you don't want them to be factors, you'll have to convert them afterwards. Hope this helps.