Change a value in a tibble column in R

I have a dataset:
library(tidyverse)
fac = factor(c("a","b","c"))
x = c(1,2,3)
d = tibble(fac,x);d
that looks like this:
# A tibble: 3 × 2
fac x
<fct> <dbl>
1 a 1
2 b 2
3 c 3
I want to replace the value 2 in column x, in the row where fac is b, with 3.14.
How can I do this within a dplyr pipeline?

One option is ifelse(), which keeps x wherever the condition is FALSE:
library(dplyr)
d %>%
mutate(x = ifelse(fac == "b", 3.14, x))
fac x
<fct> <dbl>
1 a 1
2 b 3.14
3 c 3

We can also use replace(). Here the %<>% operator from magrittr assigns the result back to d:
library(dplyr)
library(magrittr)
d %<>%
mutate(x = replace(x, fac == "b", 3.14))
d
Output:
# A tibble: 3 × 2
fac x
<fct> <dbl>
1 a 1
2 b 3.14
3 c 3
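As a quick check, here is a minimal sketch that runs both approaches on the same data and compares them with a plain base R assignment. The names by_ifelse, by_replace and by_base are only illustrative, and the base R line is an extra alternative that none of the answers above proposed.

```r
library(dplyr)

d <- tibble(fac = factor(c("a", "b", "c")), x = c(1, 2, 3))

# Approach 1: vectorised conditional, keeps x where the test is FALSE
by_ifelse <- d %>% mutate(x = ifelse(fac == "b", 3.14, x))

# Approach 2: overwrite only the elements where the test is TRUE
by_replace <- d %>% mutate(x = replace(x, fac == "b", 3.14))

# Base R alternative (not from the answers): logical subassignment on a copy
by_base <- d
by_base$x[by_base$fac == "b"] <- 3.14

print(by_replace)
```

All three leave fac untouched and change only the second element of x, so the choice is mostly a matter of style.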


Remove sequence of rows conditional on value in single cell in group-first position

In this type of data:
df <- data.frame(
Sequ = c(1,1,2,2,2,3,3,3),
G = c("A", "B", "*", "B", "A", "A", "*", "B")
)
I need to remove every Sequ group whose first G value is *. I can do it like so, but was wondering if there's a more direct and more elegant way in dplyr:
library(dplyr)
df %>%
group_by(Sequ) %>%
mutate(check = ifelse(first(G)=="*", 1, 0)) %>%
filter(check != 1)
# A tibble: 5 × 3
# Groups: Sequ [2]
Sequ G check
<dbl> <chr> <dbl>
1 1 A 0
2 1 B 0
3 3 A 0
4 3 * 0
5 3 B 0
We can try the following base R code using subset + ave
subset(
df,
!ave(G == "*", Sequ, FUN = function(x) head(x, 1))
)
which gives
Sequ G
1 1 A
2 1 B
6 3 A
7 3 *
8 3 B
Here is a direct dplyr way:
library(dplyr)
df %>%
group_by(Sequ) %>%
filter(!first(G == "*"))
Sequ G
<dbl> <chr>
1 1 A
2 1 B
3 3 A
4 3 *
5 3 B
Another base R option with duplicated
subset(df, !Sequ %in% Sequ[G == "*" & !duplicated(Sequ)])
Sequ G
1 1 A
2 1 B
6 3 A
7 3 *
8 3 B
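A small sketch to confirm that the grouped filter and the duplicated() approach keep the same rows. It writes the dplyr condition as first(G) != "*", which should be equivalent to !first(G == "*") for this data; kept_dplyr and kept_base are illustrative names.

```r
library(dplyr)

df <- data.frame(
  Sequ = c(1, 1, 2, 2, 2, 3, 3, 3),
  G = c("A", "B", "*", "B", "A", "A", "*", "B")
)

# dplyr: keep a group only if its first G value is not "*"
kept_dplyr <- df %>%
  group_by(Sequ) %>%
  filter(first(G) != "*") %>%
  ungroup()

# base R: find the Sequ values whose first row is "*", then drop those groups
bad_sequ <- df$Sequ[df$G == "*" & !duplicated(df$Sequ)]
kept_base <- subset(df, !Sequ %in% bad_sequ)

print(kept_dplyr)
```

Note that the "*" in group 3 is kept in both versions, because it is not the first row of its group.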

Function over tidyverse code results in issue with quotes

Here is an example of a problem I'm having when I wrap tidyverse code in a loop. I want to repeat the same count for several column names held as strings, but I'm not sure how to 'unquote' them.
Example data:
df <- data.frame(grp=c(1,2,1,2,1), one=c(rep('a', 3), rep('b', 2)), two=c(rep('a', 1), rep('d', 4)))
cn <- colnames(df)[2:ncol(df)]
for(i in cn){
i <- enquo(i)
print(df %>% group_by(grp) %>% count(!!i))
}
# A tibble: 2 x 3
# Groups: grp [2]
grp `"one"` n
<dbl> <chr> <int>
1 1 one 3
2 2 one 2
# A tibble: 2 x 3
# Groups: grp [2]
grp `"two"` n
<dbl> <chr> <int>
1 1 two 3
2 2 two 2
Doing it for a single variable, one, gives the output I want:
df %>% group_by(grp) %>% count(one)
# A tibble: 4 x 3
# Groups: grp [2]
grp one n
<dbl> <fct> <int>
1 1 a 2
2 1 b 1
3 2 a 1
4 2 b 1
You can use map, and you can avoid group_by by passing grp to count directly. The .data pronoun looks up a column from its name given as a string:
library(dplyr)
library(purrr)
map(cn, ~df %>% count(grp, .data[[.x]]))
#[[1]]
# grp one n
#1 1 a 2
#2 1 b 1
#3 2 a 1
#4 2 b 1
#[[2]]
# grp two n
#1 1 a 1
#2 1 d 2
#3 2 d 2
You can also convert the string to a symbol with sym() and unquote it with !!:
map(cn, ~df %>% count(grp, !!sym(.x)))
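If you would rather keep the for loop from the question, the same .data pronoun works there too. This is only a sketch: the loop variable col and the list counts are names I made up, and cn is built with setdiff() instead of positional indexing.

```r
library(dplyr)

df <- data.frame(
  grp = c(1, 2, 1, 2, 1),
  one = c(rep("a", 3), rep("b", 2)),
  two = c(rep("a", 1), rep("d", 4))
)

# every column except the grouping column
cn <- setdiff(names(df), "grp")

counts <- list()
for (col in cn) {
  # col is a plain string here, so no enquo() is needed;
  # .data[[col]] tells dplyr to look the column up by name
  counts[[col]] <- df %>% count(grp, .data[[col]])
}

print(counts)
```

enquo() is meant for capturing function arguments, which is why applying it to a loop variable that already holds a string produced the literal "one" column in the question.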

Split Dataframe into list of one-row Dataframes

I would like to split a dataframe
df <- data.frame(a = 1:4, b = letters[1:4])
a b
1 1 a
2 2 b
3 3 c
4 4 d
into a list of one-row dataframes
list(
data.frame(a = 1, b = letters[1])
, data.frame(a = 2, b = letters[2])
, data.frame(a = 3, b = letters[3])
, data.frame(a = 4, b = letters[4])
)
[[1]]
a b
1 1 a
[[2]]
a b
1 2 b
[[3]]
a b
1 3 c
[[4]]
a b
1 4 d
Is there an elegant solution to this?
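Using dplyr (rowid_to_column() comes from the tibble package; since dplyr 1.0.0 the argument is .keep rather than keep):
library(dplyr)
library(tibble)
df %>%
rowid_to_column() %>%
group_split(rowid, .keep = FALSE)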
[[1]]
# A tibble: 1 x 2
a b
<int> <fct>
1 1 a
[[2]]
# A tibble: 1 x 2
a b
<int> <fct>
1 2 b
[[3]]
# A tibble: 1 x 2
a b
<int> <fct>
1 3 c
[[4]]
# A tibble: 1 x 2
a b
<int> <fct>
1 4 d
Or:
df %>%
mutate(rowid = 1:n()) %>%
group_split(rowid, .keep = FALSE)
Or a shortened version (provided by @arg0naut91):
group_split(df, row_number(), .keep = FALSE)
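A simple way is the split() function built into R:
split(df, seq_len(nrow(df)))
Because it splits on row position rather than on the values in df$a, it handles duplicates in df$a.
Another option is asplit. Note that asplit converts the data frame to a matrix first, so with mixed column types every value becomes character: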
lapply(asplit(df, 1), as.data.frame.list)
#[[1]]
# a b
#1 1 a
#[[2]]
# a b
#1 2 b
#[[3]]
# a b
#1 3 c
#[[4]]
# a b
#1 4 d
Or with pmap
library(purrr)
pmap(df, tibble)
#[[1]]
# A tibble: 1 x 2
# a b
# <int> <fct>
#1 1 a
#[[2]]
# A tibble: 1 x 2
# a b
# <int> <fct>
#1 2 b
#[[3]]
# A tibble: 1 x 2
# a b
# <int> <fct>
#1 3 c
#[[4]]
# A tibble: 1 x 2
# a b
# <int> <fct>
#1 4 d
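For reference, a minimal base R sketch of the split() approach with a couple of sanity checks. It assumes you want an unnamed list, hence the unname() call; rows is an illustrative name.

```r
df <- data.frame(a = 1:4, b = letters[1:4])

# one group per row position; seq_len() also behaves for a zero-row
# data frame, where 1:length(df$a) would evaluate to c(1, 0)
rows <- split(df, seq_len(nrow(df)))

# split() names the pieces "1", "2", ...; drop the names for a plain list
rows <- unname(rows)

# every piece is a one-row data frame with the original column types
print(vapply(rows, nrow, integer(1)))
print(rows[[3]])
```

Unlike the asplit route, this keeps a as an integer column in each piece.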

dplyr mutate: create column using first occurrence of another column

I was wondering if there's a more elegant way to take a dataframe, group by x to count how many times each x occurs, and then add a column giving the y value at which each x first occurs.
test <- data.frame(x = c("a", "b", "c", "d",
"c", "b", "e", "f", "g"),
y = c(1,1,1,1,2,2,2,2,2))
x y
1 a 1
2 b 1
3 c 1
4 d 1
5 c 2
6 b 2
7 e 2
8 f 2
9 g 2
Current Output
output <- test %>%
group_by(x) %>%
summarise(count = n())
x count
<fct> <int>
1 a 1
2 b 2
3 c 2
4 d 1
5 e 1
6 f 1
7 g 1
Desired Output
x count first_seen
<fct> <int> <dbl>
1 a 1 1
2 b 2 1
3 c 2 1
4 d 1 1
5 e 1 2
6 f 1 2
7 g 1 2
I can filter the test dataframe for the first occurrences then use a left_join but was hoping there's a more elegant solution using mutate?
# filter for first occurrences of y
right <- test %>%
group_by(x) %>%
filter(y == min(y)) %>%
slice(1) %>%
ungroup()
# bind to the output dataframe
left_join(output, right, by = "x")
We can group by 'x', add first(y) as a second grouping variable named first_seen, and then get the count with n(). Since dplyr 1.0.0 the argument is .add rather than add:
library(dplyr)
test %>%
group_by(x) %>%
group_by(first_seen = first(y), .add = TRUE) %>%
summarise(count = n())
# A tibble: 7 x 3
# Groups: x [7]
# x first_seen count
# <fct> <dbl> <int>
#1 a 1 1
#2 b 1 2
#3 c 1 2
#4 d 1 1
#5 e 2 1
#6 f 2 1
#7 g 2 1
Why not keep it simple? For example:
test %>%
group_by(x) %>%
summarise(
count = n(),
first_seen = first(y)
)
#> # A tibble: 7 x 3
#> x count first_seen
#> <chr> <int> <dbl>
#> 1 a 1 1
#> 2 b 2 1
#> 3 c 2 1
#> 4 d 1 1
#> 5 e 1 2
#> 6 f 1 2
#> 7 g 1 2
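One caveat worth sketching: first(y) depends on row order, so it only equals the earliest y if the data is already sorted by y. Assuming "first seen" means the smallest y per x (which is how the question's own filter(y == min(y)) reads), min(y) is order-independent. The reversed copy shuffled is just for illustration.

```r
library(dplyr)

test <- data.frame(
  x = c("a", "b", "c", "d", "c", "b", "e", "f", "g"),
  y = c(1, 1, 1, 1, 2, 2, 2, 2, 2)
)

# reverse the rows so the data is no longer sorted by y
shuffled <- test[rev(seq_len(nrow(test))), ]

# first(y) now picks up the later y for b and c
by_first <- shuffled %>%
  group_by(x) %>%
  summarise(count = n(), first_seen = first(y))

# min(y) gives the same answer regardless of row order
by_min <- shuffled %>%
  group_by(x) %>%
  summarise(count = n(), first_seen = min(y))

print(by_first)
print(by_min)
```

On the original, already sorted data both versions agree, so this only matters if row order is not guaranteed.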

summarise and group_by using two different columns consecutively

I have a dataframe df with three columns a,b,c.
df <- data.frame(a = c('a','b','c','d','e','f','g','e','f','g'),
b = c('X','Y','Z','X','Y','Z','X','X','Y','Z'),
c = c('cat','dog','cat','dog','cat','cat','dog','cat','cat','dog'))
df
# output
a b c
1 a X cat
2 b Y dog
3 c Z cat
4 d X dog
5 e Y cat
6 f Z cat
7 g X dog
8 e X cat
9 f Y cat
10 g Z dog
I need to group by column b and then, for each group, count how many times each value of column c occurs.
df %>% group_by(b) %>%
summarise(nCat = sum(c == 'cat'),
nDog = sum(c == 'dog'))
#output
# A tibble: 3 × 3
b nCat nDog
<fctr> <int> <int>
1 X 2 2
2 Y 2 1
3 Z 2 1
However, before doing that, I need to remove all rows whose value in a is paired with more than one distinct value in b.
df %>% group_by(a) %>% summarise(count = n())
#output
# A tibble: 7 × 2
a count
<fctr> <int>
1 a 1
2 b 1
3 c 1
4 d 1
5 e 2
6 f 2
7 g 2
For example, in this dataframe, all the rows with e (b values: Y, X), f (b values: Z, Y) or g (b values: X, Z) in column a should be removed.
# Expected output
# A tibble: 3 × 3
b nCat nDog
<fctr> <int> <int>
1 X 1 1
2 Y 0 1
3 Z 1 0
We can group by 'a' and use filter with n_distinct to keep only the groups that have a single distinct value of 'b'. Then we regroup by 'b' and summarise:
df %>%
group_by(a) %>%
filter(n_distinct(b)==1) %>%
group_by(b) %>%
summarise(nCat =sum(c=='cat'), nDog = sum(c=='dog'), Total = n())
# A tibble: 3 × 4
# b nCat nDog Total
# <fctr> <int> <int> <int>
#1 X 1 1 2
#2 Y 0 1 1
#3 Z 1 0 1
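To see what the n_distinct filter is doing, here is a small sketch that computes the number of distinct b values per a first and then filters on that list. The intermediate names per_a, keep and kept_rows are made up for illustration; the one-step grouped filter above should give the same rows.

```r
library(dplyr)

df <- data.frame(
  a = c("a", "b", "c", "d", "e", "f", "g", "e", "f", "g"),
  b = c("X", "Y", "Z", "X", "Y", "Z", "X", "X", "Y", "Z"),
  c = c("cat", "dog", "cat", "dog", "cat", "cat", "dog", "cat", "cat", "dog")
)

# how many distinct b values does each a have?
per_a <- df %>%
  group_by(a) %>%
  summarise(n_b = n_distinct(b))

# values of a that map to exactly one b
keep <- as.character(per_a$a[per_a$n_b == 1])

# rows that survive the filter, then the per-b summary
kept_rows <- df %>% filter(a %in% keep)
result <- kept_rows %>%
  group_by(b) %>%
  summarise(nCat = sum(c == "cat"), nDog = sum(c == "dog"))

print(per_a)
print(result)
```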
