I would like to split a dataframe
df <- data.frame(a = 1:4, b = letters[1:4])
a b
1 1 a
2 2 b
3 3 c
4 4 d
into a list of one-row dataframes
list(
data.frame(a = 1, b = letters[1])
, data.frame(a = 2, b = letters[2])
, data.frame(a = 3, b = letters[3])
, data.frame(a = 4, b = letters[4])
)
[[1]]
a b
1 1 a
[[2]]
a b
1 2 b
[[3]]
a b
1 3 c
[[4]]
a b
1 4 d
Is there an elegant solution to this?
Using dplyr:
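library(dplyr)
library(tibble)  # rowid_to_column() comes from tibble
df %>%
  rowid_to_column() %>%
  group_split(rowid, .keep = FALSE)  # .keep replaces the deprecated keep (dplyr >= 1.0.0)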
[[1]]
# A tibble: 1 x 2
a b
<int> <fct>
1 1 a
[[2]]
# A tibble: 1 x 2
a b
<int> <fct>
1 2 b
[[3]]
# A tibble: 1 x 2
a b
<int> <fct>
1 3 c
[[4]]
# A tibble: 1 x 2
a b
<int> <fct>
1 4 d
Or:
df %>%
  mutate(rowid = 1:n()) %>%
  group_split(rowid, .keep = FALSE)  # .keep replaces the deprecated keep (dplyr >= 1.0.0)
Or a shortened version (provided by @arg0naut91):
group_split(df, row_number(), .keep = FALSE)  # .keep replaces the deprecated keep (dplyr >= 1.0.0)
A simple way is to use split(), which is built into base R:
split(df, 1:length(df$a))
Because the grouping vector is the row index rather than the column values, this works even when df$a contains duplicates.
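A slightly more defensive variant of the split() call above, as a sketch: seq_len(nrow(df)) does not depend on any particular column and gives an empty index for a zero-row data frame (where 1:0 would give c(1, 0)). The name rows is only illustrative.

```r
df <- data.frame(a = 1:4, b = letters[1:4])

# One group per row: the grouping vector is simply the row index.
rows <- split(df, seq_len(nrow(df)))

length(rows)   # one list element per row of df
rows[[2]]      # a one-row data frame holding the second row
```

Note that split() names the list elements after the group labels and keeps the original row names inside each piece.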
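Another option is asplit. Note that asplit first coerces the data frame to a matrix, so with mixed column types the values come back as character: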
lapply(asplit(df, 1), as.data.frame.list)
#[[1]]
# a b
#1 1 a
#[[2]]
# a b
#1 2 b
#[[3]]
# a b
#1 3 c
#[[4]]
# a b
#1 4 d
Or with pmap from purrr:
library(purrr)
library(tibble)  # tibble() is not exported by purrr
pmap(df, tibble)
#[[1]]
# A tibble: 1 x 2
# a b
# <int> <fct>
#1 1 a
#[[2]]
# A tibble: 1 x 2
# a b
# <int> <fct>
#1 2 b
#[[3]]
# A tibble: 1 x 2
# a b
# <int> <fct>
#1 3 c
#[[4]]
# A tibble: 1 x 2
# a b
# <int> <fct>
#1 4 d
I have a dataset:
library(tidyverse)
fac = factor(c("a","b","c"))
x = c(1,2,3)
d = tibble(fac,x);d
that looks like this:
# A tibble: 3 × 2
fac x
<fct> <dbl>
1 a 1
2 b 2
3 c 3
I want to replace the value 2 in column x, in the row where fac is b, with 3.14.
How can I do this in a dplyr pipeline?
One option is ifelse:
library(dplyr)
d %>%
mutate(x = ifelse(fac == "b", 3.14, x))
fac x
<fct> <dbl>
1 a 1
2 b 3.14
3 c 3
Alternatively, use replace. Here %<>% from magrittr assigns the result back to d:
library(dplyr)
library(magrittr)
d %<>%
mutate(x = replace(x, fac == "b", 3.14))
Output:
d
# A tibble: 3 × 2
fac x
<fct> <dbl>
1 a 1
2 b 3.14
3 c 3
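If stricter type checking is wanted, dplyr::if_else can stand in for ifelse. The following is a sketch along the same lines as the answers above, not a replacement for them. It assumes x is a double column, as in the question, so that 3.14 and x share a type; d2 is an illustrative name.

```r
library(dplyr)

d <- tibble(fac = factor(c("a", "b", "c")), x = c(1, 2, 3))

# if_else() requires both branches to have compatible types,
# which catches accidental type changes that ifelse() would allow silently.
d2 <- d %>%
  mutate(x = if_else(fac == "b", 3.14, x))

d2$x
```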
Here is an example of the problem I'm having when applying a function that contains tidyverse code. I want to repeat the same count for different variable names, but I'm not sure how to 'unquote' them.
Example data:
df <- data.frame(grp=c(1,2,1,2,1), one=c(rep('a', 3), rep('b', 2)), two=c(rep('a', 1), rep('d', 4)))
cn <- colnames(df)[2:ncol(df)]
for(i in cn){
i <- enquo(i)
print(df %>% group_by(grp) %>% count(!!i))
}
# A tibble: 2 x 3
# Groups: grp [2]
grp `"one"` n
<dbl> <chr> <int>
1 1 one 3
2 2 one 2
# A tibble: 2 x 3
# Groups: grp [2]
grp `"two"` n
<dbl> <chr> <int>
1 1 two 3
2 2 two 2
Doing it for a single variable, one, typed directly gives the correct output:
df %>% group_by(grp) %>% count(one)
# A tibble: 4 x 3
# Groups: grp [2]
grp one n
<dbl> <fct> <int>
1 1 a 2
2 1 b 1
3 2 a 1
4 2 b 1
You can use map with the .data pronoun. You can also avoid group_by by including grp in count:
library(dplyr)
library(purrr)
map(cn, ~df %>% count(grp, .data[[.x]]))
#[[1]]
# grp one n
#1 1 a 2
#2 1 b 1
#3 2 a 1
#4 2 b 1
#[[2]]
# grp two n
#1 1 a 1
#2 1 d 2
#3 2 d 2
You can also convert each string to a symbol with sym and unquote it with !!:
map(cn, ~df %>% count(grp, !!sym(.x)))
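As a small extension of the map approach above, the result list can be named after the columns so that each table is easy to look up later. This is only a sketch: counts is an illustrative name, and set_names is used here as a convenience that the original answer does not mention.

```r
library(dplyr)
library(purrr)

df <- data.frame(
  grp = c(1, 2, 1, 2, 1),
  one = c(rep("a", 3), rep("b", 2)),
  two = c(rep("a", 1), rep("d", 4))
)
cn <- setdiff(names(df), "grp")

# set_names(cn) names the vector by its own values, so the
# resulting list is named "one" and "two" rather than 1 and 2.
counts <- map(set_names(cn), ~ count(df, grp, .data[[.x]]))

names(counts)
counts$one
```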
I have the following data frame:
library(tidyverse)
dat <- data.frame(foo=c(1, 1, 2, 3, 3, 3), bar=c('a', 'a', 'b', 'b', 'c', 'd'))
dat
#> foo bar
#> 1 1 a
#> 2 1 a
#> 3 2 b
#> 4 3 b
#> 5 3 c
#> 6 3 d
What I want is a new column that tags each value of bar with its sequential count within that value, resulting in:
foo bar new_column
1 a a.sample.1
1 a a.sample.2
2 b b.sample.1
3 b b.sample.2
3 c c.sample.1
3 d d.sample.1
I'm stuck with this code:
> dat %>% group_by(bar) %>% summarise(n=n())
# A tibble: 4 x 2
bar n
<fctr> <int>
1 a 2
2 b 2
3 c 1
4 d 1
You can use group_by %>% mutate:
dat %>% group_by(bar) %>% mutate(new_column = paste(bar, 'sample', 1:n(), sep = "."))
# A tibble: 6 x 3
# Groups: bar [4]
# foo bar new_column
# <dbl> <fctr> <chr>
#1 1 a a.sample.1
#2 1 a a.sample.2
#3 2 b b.sample.1
#4 3 b b.sample.2
#5 3 c c.sample.1
#6 3 d d.sample.1
Or with paste0 and row_number():
dat %>% group_by(bar) %>% mutate(new_column = paste0(bar, ".sample.", row_number()))
# A tibble: 6 x 3
# Groups: bar [4]
foo bar new_column
<dbl> <fctr> <chr>
1 1 a a.sample.1
2 1 a a.sample.2
3 2 b b.sample.1
4 3 b b.sample.2
5 3 c c.sample.1
6 3 d d.sample.1
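For comparison, the same per-group counter can be built without dplyr. This is a base R sketch, not something the answers above propose: it uses ave() to number the rows within each value of bar, and idx is an illustrative name.

```r
dat <- data.frame(
  foo = c(1, 1, 2, 3, 3, 3),
  bar = c("a", "a", "b", "b", "c", "d")
)

# ave() applies seq_along() within each value of bar,
# giving a running count 1, 2, ... per group in the original row order.
idx <- ave(seq_along(dat$bar), dat$bar, FUN = seq_along)
dat$new_column <- paste(dat$bar, "sample", idx, sep = ".")

dat
```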
I want to use purrr::pmap_df on a data.frame I created, to get back another data.frame. However, I want the original data.frame kept and cbind-ed to the new one in a single pipe. Example:
f <- function(a, b, c) {
return(list(d = 1, e = 2, f = 3))
}
tibble(a = 1:2, b = 3:4, c = 5:6) %>%
pmap_df(f)
This would give me:
# A tibble: 2 × 3
d e f
<dbl> <dbl> <dbl>
1 1 2 3
2 1 2 3
But I would like to keep that tibble:
# A tibble: 2 × 6
a b c d e f
<int> <int> <int> <dbl> <dbl> <dbl>
1 1 3 5 1 2 3
2 2 4 6 1 2 3
(Silly example but you get what I mean). Any elegant way of doing this in a single pipe?
If you don't want to redefine the function, the simplest way is to just use bind_cols on the results, using . to place the data.frame where you need:
library(tidyverse)
f <- function(a, b, c) {
return(list(d = 1, e = 2, f = 3))
}
tibble(a = 1:2, b = 3:4, c = 5:6) %>%
bind_cols(pmap_df(., f))
#> # A tibble: 2 x 6
#> a b c d e f
#> <int> <int> <int> <dbl> <dbl> <dbl>
#> 1 1 3 5 1 2 3
#> 2 2 4 6 1 2 3
You can also use ... to represent the inputs into pmap, which lets you do
tibble(a = 1:2, b = 3:4, c = 5:6) %>% pmap_df(~c(..., f(...)))
which returns the same thing.
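To make the role of the . placeholder clearer, the bind_cols answer above can be unrolled into two explicit steps. This is a sketch under the same toy function f; input, extra and result are illustrative names, and bind_rows(pmap(...)) is used in place of pmap_df.

```r
library(dplyr)
library(purrr)

f <- function(a, b, c) {
  list(d = 1, e = 2, f = 3)
}

input <- tibble(a = 1:2, b = 3:4, c = 5:6)

# Step 1: call f() once per row and stack the results into a tibble.
extra <- bind_rows(pmap(input, f))

# Step 2: put the original columns and the new columns side by side.
result <- bind_cols(input, extra)

result
```

In the piped version, . plays the role of input in both steps.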
I have a dataframe df with three columns a,b,c.
df <- data.frame(a = c('a','b','c','d','e','f','g','e','f','g'),
b = c('X','Y','Z','X','Y','Z','X','X','Y','Z'),
c = c('cat','dog','cat','dog','cat','cat','dog','cat','cat','dog'))
df
# output
a b c
1 a X cat
2 b Y dog
3 c Z cat
4 d X dog
5 e Y cat
6 f Z cat
7 g X dog
8 e X cat
9 f Y cat
10 g Z dog
I have to group by column b and then summarise column c, counting how often each of its values occurs.
df %>% group_by(b) %>%
summarise(nCat = sum(c == 'cat'),
nDog = sum(c == 'dog'))
#output
# A tibble: 3 × 3
b nCat nDog
<fctr> <int> <int>
1 X 2 2
2 Y 2 1
3 Z 2 1
However, before doing that, I need to remove the rows for any value of a that occurs with more than one value of b.
df %>% group_by(a) %>% summarise(count = n())
#output
# A tibble: 7 × 2
a count
<fctr> <int>
1 a 1
2 b 1
3 c 1
4 d 1
5 e 2
6 f 2
7 g 2
For example, in this dataframe, all rows with e (b values: Y, X), f (b values: Z, Y) or g (b values: X, Z) in column a should be removed.
# Expected output
# A tibble: 3 × 3
b nCat nDog
<fctr> <int> <int>
1 X 1 1
2 Y 0 1
3 Z 1 0
We can group by 'a' and use filter with n_distinct to keep only the groups that have a single unique value of 'b'. Then, grouped by 'b', we do the summarise:
df %>%
  group_by(a) %>%
  filter(n_distinct(b) == 1) %>%
  group_by(b) %>%
  summarise(nCat = sum(c == 'cat'), nDog = sum(c == 'dog'), Total = n())
# A tibble: 3 × 4
# b nCat nDog Total
# <fctr> <int> <int> <int>
#1 X 1 1 2
#2 Y 0 1 1
#3 Z 1 0 1
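To check what the filter step removes, the intermediate results can be inspected separately. The following is a sketch on the question's data; b_per_a, dropped and kept are illustrative names.

```r
library(dplyr)

df <- data.frame(
  a = c("a", "b", "c", "d", "e", "f", "g", "e", "f", "g"),
  b = c("X", "Y", "Z", "X", "Y", "Z", "X", "X", "Y", "Z"),
  c = c("cat", "dog", "cat", "dog", "cat", "cat", "dog", "cat", "cat", "dog")
)

# For each value of a, count how many distinct b values it appears with.
b_per_a <- df %>%
  group_by(a) %>%
  summarise(n_b = n_distinct(b))

# Values of a that the filter step drops (more than one distinct b).
dropped <- b_per_a %>% filter(n_b > 1) %>% pull(a)

# Rows that survive the filter step.
kept <- df %>%
  group_by(a) %>%
  filter(n_distinct(b) == 1) %>%
  ungroup()

dropped
kept
```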