How do I overwrite data with dplyr?

df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))
#> a b
#> 1 1 4
#> 2 2 5
#> 3 3 6
I figured out how to overwrite data with base R. Maybe that was day 1 of learning R.
df[2:3, 2] <- c(50, 60)
#> a b
#> 1 1 4
#> 2 2 50
#> 3 3 60
I never found an easy way to do it with dplyr. How do I overwrite data with the pipe %>%?

We can use replace() within mutate(). If the column can be referred to by name, i.e. 'b', pass it as the first argument of replace(), give the row indices as the list argument, and supply the new values as a vector:
library(dplyr)
df %>%
  mutate(b = replace(b, 2:3, c(50, 60)))
# a b
#1 1 4
#2 2 50
#3 3 60
Or select the column by its position with mutate_at:
df %>%
  mutate_at(2, replace, list = 2:3, values = c(50, 60))
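The position-based replacement can also be sketched with across(), which superseded the mutate_at() family in dplyr 1.0.0. As a further option not mentioned in the answer, rows_update() overwrites values by matching on a key column; treating a as the key and the helper data frame patch are assumptions made for this example only.

```r
library(dplyr)

df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))

# Position-based: overwrite rows 2 and 3 of the second column
out_across <- df %>%
  mutate(across(2, ~ replace(.x, 2:3, c(50, 60))))

# Key-based: overwrite b wherever a matches
# (assumes a uniquely identifies each row)
patch <- data.frame(a = c(2, 3), b = c(50, 60))
out_rows <- df %>%
  rows_update(patch, by = "a")
```

The key-based form may be the safer choice when row order is not guaranteed, since it does not depend on the rows sitting at positions 2 and 3.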

Related

Merge two columns and only keep data from second

I am trying to unite two columns, but the data from each column is concatenated. I would like to keep only the data from the second column, B, and discard the data from column A. Is there a way to do this without first deleting the data from the first column manually? Thank you!
df1 <- data.frame(A = c(1,1,1,1,1,1,1,1),
B = c(2,2,2,2,2,2,2,2))
df1 %>% unite("A",A,B,remove = TRUE,na.rm = TRUE)
The output contains 1_2 in the combined new column 'A'.

Do you want to keep only column B where a value is present, otherwise use column A, like this? So you end up with one column (C) that coalesces A and B?
If you want to delete column A, then just add %>% select(-A).
library(tidyverse)
data.frame(
A = c(1, 1, 1, 1, 1, 1, 1, 1),
B = c(2, 2, 2, 2, 2, 2, NA, 2)
) %>%
mutate(C = coalesce(B, A))
#> A B C
#> 1 1 2 2
#> 2 1 2 2
#> 3 1 2 2
#> 4 1 2 2
#> 5 1 2 2
#> 6 1 2 2
#> 7 1 NA 1
#> 8 1 2 2
Created on 2022-05-17 by the reprex package (v2.0.1)
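If the goal is a single column named A that holds B's values, one possible sketch overwrites A in place and then drops B. The three-row data frame below is a shortened, hypothetical version of the example above.

```r
library(dplyr)

df1 <- data.frame(A = c(1, 1, 1), B = c(2, NA, 2))

res <- df1 %>%
  mutate(A = coalesce(B, A)) %>%  # take B where present, else fall back to A
  select(-B)                      # drop the now-redundant column
```

If B never contains missing values, coalesce() is unnecessary and mutate(A = B) would do the same job.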

Cumulative sum across a nested list

I have a large tibble with one nested list column. Each element of the nested list column has 10,000 iterations, and I would like to apply a cumulative sum across these iterations by a grouping variable.
I have created a minimal reproducible example below
tibble(a = list(c(1,2),c(3,4), c(5,6), c(7,8)),
c = c(1,1, 2, 2))
The intended output should be
tibble(a = list(c(1,2),c(4,6), c(5,6), c(12,14)),
c = c(1,1, 2, 2))
I tried the following syntax, but it's clearly wrong:
x <- tibble(a = list(c(1,2),c(3,4), c(5,6), c(7,8)),
c = c(1,1, 2, 2))
x %>%
group_by(c) %>%
mutate(a = map(a,cumsum))
Any help greatly appreciated. I can potentially spread the data and add across the columns but that would be slow

One base R option (with the tibble stored as tbl) could be:
with(tbl, ave(a, c, FUN = function(x) Reduce(`+`, x, accumulate = TRUE)))
[[1]]
[1] 1 2
[[2]]
[1] 4 6
[[3]]
[1] 5 6
[[4]]
[1] 12 14

I think you're looking for the following (here df is your tibble), though it doesn't match your desired output for the last two values. Can you check that these are correct?
library(dplyr)
library(purrr)
library(tidyr)
df %>%
group_by(c) %>%
mutate(x = accumulate(a, `+`)) %>%
unnest(cols = c(a, x))
# A tibble: 8 x 3
# Groups: c [2]
a c x
<dbl> <dbl> <dbl>
1 1 1 1
2 2 1 2
3 3 1 4
4 4 1 6
5 5 2 5
6 6 2 6
7 7 2 12
8 8 2 14
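The attempt in the question, map(a, cumsum), takes a running sum inside each vector, whereas the desired output adds the vectors element-wise from one row to the next within a group. A minimal sketch that keeps the result as a list column, using the example data from the question:

```r
library(dplyr)
library(purrr)
library(tibble)

x <- tibble(a = list(c(1, 2), c(3, 4), c(5, 6), c(7, 8)),
            c = c(1, 1, 2, 2))

res <- x %>%
  group_by(c) %>%
  mutate(a = accumulate(a, `+`)) %>%  # element-wise running sum across rows
  ungroup()
```

Within each group, accumulate() returns the first vector unchanged and then each successive partial sum, so the list column keeps one vector per row.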

repeat dataframe n times whilst adding column

This is my reproducible code:
df <- data.frame(x = c(1, 2), y = c(3, 4))
df1 <- df %>% mutate(z = 1)
df2 <- df %>% mutate(z = 2)
df3 <- df %>% mutate(z = 3)
df <- rbind(df1, df2, df3)
df
I repeat the original data frame df 3 times whilst adding one column whose value indicates the repetition. In my use case, I have to do this more than 3 times. I could use a loop, but is there a neater way? I guess I cannot use expand.grid.

You can also do it with a merge:
dfz <- data.frame(z = 1:3)
merge(df, dfz)
# x y z
# 1 1 3 1
# 2 2 4 1
# 3 1 3 2
# 4 2 4 2
# 5 1 3 3
# 6 2 4 3

We can create a list column and unnest
library(tidyverse)
df %>%
  mutate(z = list(1:3)) %>%
  unnest(cols = z) %>%
  arrange(z)
# x y z
#1 1 3 1
#2 2 4 1
#3 1 3 2
#4 2 4 2
#5 1 3 3
#6 2 4 3

We can also do a cross join with sqldf. This creates a Cartesian Product of df and the reps tables:
library(sqldf)
reps <- data.frame(z = 1:3)
sqldf("select * from df, reps order by z")
or simply with map_dfr from purrr:
library(purrr)
map_dfr(1:3, ~cbind(df, z = .))
Output:
x y z
1 1 3 1
2 2 4 1
3 1 3 2
4 2 4 2
5 1 3 3
6 2 4 3

Yet another option using base R
n <- 3
do.call(rbind,
Map(`[<-`, replicate(n = n,
expr = df,
simplify = FALSE),
"z",
value = seq_len(n)))
# x y z
#1 1 3 1
#2 2 4 1
#3 1 3 2
#4 2 4 2
#5 1 3 3
#6 2 4 3

A few other ways not covered yet:
# setup
df = data.frame(x = c(1, 2), y = c(3, 4))
n = 3
# simple row indexing, add column manually
result = df[rep(1:nrow(df), n), ]
result$id = rep(1:n, each = nrow(df))
# cross join in base
merge(df, data.frame(id = 1:n), by = NULL)
# cross join in tidyr
tidyr::crossing(df, data.frame(id = 1:n))
# dplyr version of the row-index method above
slice(df, rep(1:n(), n)) %>% mutate(id = rep(1:n, each = nrow(df)))
Inspiration drawn heavily from an old question of mine, How can I repeat a data frame?. Basically the same question but without the id column requirement.
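One more variant, assuming dplyr 1.1.0 or later is installed: cross_join() performs the same Cartesian product as the merge and sqldf answers without leaving dplyr. This is a sketch on the question's data, not a benchmarked recommendation.

```r
library(dplyr)

df <- data.frame(x = c(1, 2), y = c(3, 4))
n <- 3

# Pair every row of df with every repetition number, then order by repetition
res <- df %>%
  cross_join(data.frame(z = seq_len(n))) %>%
  arrange(z)
```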

Making a new variable, means by group, conditional on value of another variable

I want to find the most efficient way to create a new variable. Suppose I have this data frame:
set.seed(1234)
df <- data.frame(group = c(rep(1,4), rep(2,4)), X = rep(1:4, 2), G = sample(1:10, 8, replace = TRUE) )
I want to make a new variable that is the mean of G within each group, conditional on X being 1 or 2. In the example df, then, the new variable would have the following values:
df$newvar <- c(rep(4.5, 4), rep(8, 4))
Is there a way to do this without re-sorting the data frame and then filling down? That seems really cumbersome. Thanks!

After grouping by 'group', subset the 'G' elements using the logical condition on 'X' and take the mean of those values to create a new column with mutate:
library(dplyr)
df %>%
group_by(group) %>%
mutate(newvar = mean(G[X %in% 1:2]))
# A tibble: 8 x 4
# Groups: group [2]
# group X G newvar
# <dbl> <int> <int> <dbl>
#1 1 1 2 4.5
#2 1 2 7 4.5
#3 1 3 7 4.5
#4 1 4 7 4.5
#5 2 1 9 8
#6 2 2 7 8
#7 2 3 1 8
#8 2 4 3 8
Or using ave from base R
df$newvar <- with(df, ave(G * NA^(!X %in% 1:2), group,
FUN = function(x) mean(x, na.rm = TRUE)))
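To check both approaches without depending on the random number generator, the sketch below types in the G values from the printed output above. The ifelse() call is a more explicit stand-in for the NA^(...) trick, which works because NA^0 is 1 and NA^1 is NA.

```r
library(dplyr)

# G is copied from the printed output above rather than re-sampled
df <- data.frame(group = rep(1:2, each = 4),
                 X = rep(1:4, 2),
                 G = c(2, 7, 7, 7, 9, 7, 1, 3))

# dplyr: mean of G over the rows where X is 1 or 2, repeated within each group
res <- df %>%
  group_by(group) %>%
  mutate(newvar = mean(G[X %in% 1:2])) %>%
  ungroup()

# Base R: blank out rows that fail the condition, then average per group
base_newvar <- with(df, ave(ifelse(X %in% 1:2, G, NA), group,
                            FUN = function(v) mean(v, na.rm = TRUE)))
```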

Operations between groups with dplyr

I have a data frame as follows, where I would like to group the data by grp and index and use group a as a reference to perform some simple calculations. I would like to subtract the value of group a from the value of each other group at the same index.
df <- data.frame(grp = rep(letters[1:3], each = 2),
index = rep(1:2, times = 3),
value = seq(10, 60, length.out = 6))
df
## grp index value
## 1 a 1 10
## 2 a 2 20
## 3 b 1 30
## 4 b 2 40
## 5 c 1 50
## 6 c 2 60
The desired output would be like:
## grp index value
## 1 b 1 20
## 2 b 2 20
## 3 c 1 40
## 4 c 2 40
My guess is it will be something close to:
group_by(df, grp, index) %>%
mutate(diff = value - value[grp == "a"])
Ideally I would like to do it using dplyr.
Regards, Philippe

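We can filter for 'grp' values that are not 'a' and then take the difference within mutate. Note that this relies on R recycling the two group a values across the remaining rows, so the rows must be ordered by index within each group.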
df %>%
filter(grp!="a") %>%
mutate(value = value- df$value[df$grp=="a"])
Or another option would be join
df %>%
filter(grp!="a") %>%
left_join(., subset(df, grp=="a", select=-1), by = "index") %>%
mutate(value = value.x- value.y) %>%
select(1, 2, 5)
# grp index value
#1 b 1 20
#2 b 2 20
#3 c 1 40
#4 c 2 40
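The guess in the question fails because grouping by both grp and index leaves one row per group, so value[grp == "a"] is empty in every group other than a. A sketch that stays close to that idea groups by index only, so each group contains the reference row. It assumes every index has exactly one row in group a.

```r
library(dplyr)

df <- data.frame(grp = rep(letters[1:3], each = 2),
                 index = rep(1:2, times = 3),
                 value = seq(10, 60, length.out = 6))

res <- df %>%
  group_by(index) %>%
  mutate(value = value - value[grp == "a"]) %>%  # subtract the reference value
  ungroup() %>%
  filter(grp != "a")                             # drop the reference rows
```

Unlike the recycling approach, this does not depend on how the rows are ordered.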
