dplyr row sum on selected rows [duplicate] - r

This question already has an answer here: Row sum using mutate and select [duplicate] (1 answer)
Closed 4 years ago.
I have the following data:
library(dplyr)
library(purrr)
d <- data.frame(
Type= c("d", "e", "d", "e"),
"2000"= c(1, 5, 1, 5),
"2001" = c(2, 5 , 6, 4),
"2002" = c(8, 9, 6, 3))
I would like to use rowsum and mutate to generate a new row which is the sum of 'd' and another row which is the sum of 'e' so that the data looks like this:
d2 <- data.frame(
Type= c("d", "e", "d", "e", "sum_of_d", "sum_of_e"),
"2000"= c(1, 5, 1, 5, 2, 10),
"2001" = c(2, 5 , 6, 4, 8, 9),
"2002" = c(8, 9, 6, 3, 14, 12))
I think the code should look something like this:
d %>%
dplyr::mutate(sum_of_d = rowSums(d[1,3], na.rm = TRUE)) %>%
dplyr::mutate(sum_of_e = rowSums(d[2,4], na.rm = TRUE)) -> d2
However, this does not quite work. Any ideas?
Thanks

You're looking for the sum by Type across all other columns, so group by Type, sum every other column, relabel the groups and append the result to d as new rows:
library(dplyr)
d %>%
group_by(Type) %>%
summarise_all(sum) %>%
mutate(Type = paste0("sum_of_", Type)) %>%
rbind(d, .)
Type X2000 X2001 X2002
1 d 1 2 8
2 e 5 5 9
3 d 1 6 6
4 e 5 4 3
5 sum_of_d 2 8 14
6 sum_of_e 10 9 12

Alternatively, with bind_rows() instead of rbind():
d %>%
group_by(Type) %>%
summarize_all(sum) %>%
mutate(Type=paste0("sum_of_",Type)) %>%
bind_rows(d,.)
Type X2000 X2001 X2002
1 d 1 2 8
2 e 5 5 9
3 d 1 6 6
4 e 5 4 3
5 sum_of_d 2 8 14
6 sum_of_e 10 9 12
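Since the question mentions rowsum: base R's rowsum() sums the rows of a data frame within each group, so the same result can be sketched without dplyr. This is an alternative to the answers above, not what they do; the names sums and sum_rows are only illustrative, and it relies on data.frame() having turned "2000" etc. into the column names X2000 etc., so that rbind() can match columns by name.

```r
d <- data.frame(
  Type = c("d", "e", "d", "e"),
  "2000" = c(1, 5, 1, 5),
  "2001" = c(2, 5, 6, 4),
  "2002" = c(8, 9, 6, 3))

# rowsum() adds up the rows of the numeric columns within each group;
# the result has one row per Type ("d", "e"), with the groups as row names
sums <- rowsum(d[, -1], group = d$Type)

# Turn the group sums into rows with the same columns as d;
# row.names = NULL drops the "d"/"e" row names
sum_rows <- data.frame(Type = paste0("sum_of_", rownames(sums)), sums,
                       row.names = NULL)

# Append them below the original rows (columns are matched by name)
d2 <- rbind(d, sum_rows)
d2
```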

Related

R How to Generate Sequences in a Tibble Given Start and End Points

I can't think how to do this in a tidy fashion.
I have a table as follows:
tibble(
Min = c(1, 5, 12, 13, 19),
Max = c(3, 11, 12, 14, 19),
Value = c("a", "bb", "c", "d", "e" )
)
and I want to generate another table from it, as shown below:
tibble(
Row = c(1:3, 5:11, 12:12, 13:14, 19:19),
Value = c( rep("a", 3), rep("bb", 7), "c", "d", "d", "e" )
)
Grateful for any suggestions folk might have. The only 'solutions' which come to mind are a bit cumbersome.
1) If DF is the input then:
library(dplyr)
DF %>%
group_by(Value) %>%
group_modify(~ tibble(Row = seq(.$Min, .$Max))) %>%
ungroup
giving:
# A tibble: 14 x 2
Value Row
<chr> <int>
1 a 1
2 a 2
3 a 3
4 bb 5
5 bb 6
6 bb 7
7 bb 8
8 bb 9
9 bb 10
10 bb 11
11 c 12
12 d 13
13 d 14
14 e 19
2) This one creates a list column L containing tibbles and then unnests it. Unlike (1), which groups by Value, it also works when Value contains duplicates.
library(dplyr)
library(tidyr)
DF %>%
rowwise %>%
summarize(L = list(tibble(Value, Row = seq(Min, Max)))) %>%
ungroup %>%
unnest(L)
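Another common idiom, not used in either answer above, is to build each Min:Max sequence with purrr::map2() as a list column and unnest it. A sketch, assuming tidyr >= 1.0 (for the unnest(cols) interface); like (2) it does not require Value to be unique:

```r
library(dplyr)
library(purrr)
library(tidyr)

DF <- tibble(
  Min = c(1, 5, 12, 13, 19),
  Max = c(3, 11, 12, 14, 19),
  Value = c("a", "bb", "c", "d", "e"))

# map2() calls seq(Min[i], Max[i]) for every row, giving a list column;
# unnest() then puts each element of those sequences on its own row
out <- DF %>%
  mutate(Row = map2(Min, Max, seq)) %>%
  unnest(Row) %>%
  select(Row, Value)
out
```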

How to add new column and calculate recursive cum using dplyr and shift

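I have a dataset with a group column and an x column (actually I have more than 100 groups),
and I want to use dplyr to create a variable y for each group, with the first value of y set to 1 and each later value computed from the previous row, e.g.
Second y = 1 * first x + 2 * first y
The expected result is given in the dput further below.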
I tried creating a column y with every value set to 1, and then using
df%>% group_by(group)%>% mutate(var=shift(x)+2*shift(y))%>% ungroup()
but then the formula always uses the initial y value of 1 instead of the previously computed y, i.e.
Second y = 1 * first x + 2 * 1
Could someone give me some ideas about this? Thank you!
The dput of my result data is:
structure(list(group = c("a", "a", "a", "a", "a", "b", "b", "b" ), x =
c(1, 2, 3, 4, 5, 6, 7, 8), y = c(1, 3, 8, 19, 42, 1, 8, 23)),
row.names = c(NA, -8L), class = c("tbl_df", "tbl", "data.frame" ))
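To perform such a recursive calculation, we can use accumulate from purrr or Reduce (with accumulate = TRUE) in base R.
Since you are already using dplyr, we can use accumulate. In the formula, .x is the previously accumulated y and .y is the next x; .init = 1 supplies the first y, and x[-n()] drops the last x so that each group gets exactly one y per row: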
library(dplyr)
df %>%
group_by(group) %>%
mutate(y1 = purrr::accumulate(x[-n()], ~.x * 2 + .y, .init = 1))
# group x y y1
# <chr> <dbl> <dbl> <dbl>
#1 a 1 1 1
#2 a 2 3 3
#3 a 3 8 8
#4 a 4 19 19
#5 a 5 42 42
#6 b 6 1 1
#7 b 7 8 8
#8 b 8 23 23
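The base R route mentioned above can be sketched with Reduce(accumulate = TRUE, init = 1) applied per group through ave(). The helper name next_y is only illustrative, and the data is rebuilt from the group and x columns of the dput:

```r
df <- data.frame(group = c("a", "a", "a", "a", "a", "b", "b", "b"),
                 x = c(1, 2, 3, 4, 5, 6, 7, 8))

# One recursion step: y[i] = 2 * y[i - 1] + x[i - 1]
next_y <- function(y_prev, x_prev) 2 * y_prev + x_prev

# With init = 1 and accumulate = TRUE, Reduce() returns 1 followed by every
# intermediate result, so passing all x values except the last one gives
# exactly one y per row; ave() applies this separately within each group
df$y <- ave(df$x, df$group,
            FUN = function(x) Reduce(next_y, x[-length(x)],
                                     accumulate = TRUE, init = 1))
df$y
# [1]  1  3  8 19 42  1  8 23
```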

Dplyr rolling balance

I am trying to compute a balance column.
So, to show an example, I want to go from this:
df <- data.frame(group = c("A", "A", "A", "A", "A"),
start = c(5, 0, 0, 0, 0),
receipt = c(1, 5, 6, 4, 6),
out = c(4, 5, 3, 2, 5))
> df
group start receipt out
1 A 5 1 4
2 A 0 5 5
3 A 0 6 3
4 A 0 4 2
5 A 0 6 5
to a data frame with a new balance column, like the following:
> dfb
group start receipt out balance
1 A 5 1 4 2
2 A 0 5 5 2
3 A 0 6 3 5
4 A 0 4 2 7
5 A 0 6 5 8
I tried the following, but it isn't working:
dfc <- df %>%
group_by(group) %>%
mutate(balance = if_else(row_number() == 1, start + receipt - out, (lag(balance) + receipt) - out)) %>%
ungroup()
Would really appreciate some help with this. Thanks!
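You could use cumsum() (from base R) inside mutate(). Because start is 0 after the first row, the running balance is simply the cumulative sum of start + receipt - out. Note: I had to change your initial df to match the one in your required result, because it had different data in "out".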
df <- data.frame(group = c("A", "A", "A", "A", "A"),
start = c(5, 0, 0, 0, 0),
receipt = c(1, 5, 6, 4, 6),
out = c(4, 5, 3, 2, 5))
library(dplyr)
dfc <- df %>%
group_by(group) %>%
mutate(balance=cumsum(start+receipt-out))
Source: local data frame [5 x 5]
Groups: group [1]
group start receipt out balance
<fctr> <dbl> <dbl> <dbl> <dbl>
1 A 5 1 4 2
2 A 0 5 5 2
3 A 0 6 3 5
4 A 0 4 2 7
5 A 0 6 5 8
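The same running balance can also be written as the group's opening balance plus the cumulative net movement, balance[i] = start[1] + sum(receipt[1:i] - out[1:i]). A sketch, assuming (as in the example data) that start only carries the opening balance in each group's first row; any later non-zero start values would be ignored here:

```r
library(dplyr)

df <- data.frame(group = c("A", "A", "A", "A", "A"),
                 start = c(5, 0, 0, 0, 0),
                 receipt = c(1, 5, 6, 4, 6),
                 out = c(4, 5, 3, 2, 5))

# Opening balance of each group plus the running sum of receipts minus outs
dfb <- df %>%
  group_by(group) %>%
  mutate(balance = first(start) + cumsum(receipt - out)) %>%
  ungroup()
dfb$balance
# [1] 2 2 5 7 8
```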

Unique body count column

I'm trying to add a body count for each unique person. Each person has multiple data points.
df <- data.frame(PERSON = c("A", "A", "A", "B", "B", "C", "C", "C", "C"),
Y = c(2, 5, 4, 1, 2, 5, 3, 7, 1))
This is what I'd like it to look like:
PERSON Y UNIQ_CT
1 A 2 1
2 A 5 0
3 A 4 0
4 B 1 1
5 B 2 0
6 C 5 1
7 C 3 0
8 C 7 0
9 C 1 0
You can use duplicated() and negate it, so that only the first row of each person is flagged:
transform(df, UNIQ_CT = as.integer(!duplicated(PERSON)))
Since the question has a dplyr tag, here is a dplyr option:
library(dplyr)
df %>%
group_by(PERSON) %>%
mutate(UNIQ_CT = ifelse(row_number() == 1, 1, 0))
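The two answers can also be combined: using !duplicated() inside mutate() needs no group_by(), so the result is an ordinary (ungrouped) data frame with an integer flag. A small sketch; the name df2 is only illustrative:

```r
library(dplyr)

df <- data.frame(PERSON = c("A", "A", "A", "B", "B", "C", "C", "C", "C"),
                 Y = c(2, 5, 4, 1, 2, 5, 3, 7, 1))

# !duplicated() is TRUE only the first time each PERSON appears
df2 <- df %>%
  mutate(UNIQ_CT = as.integer(!duplicated(PERSON)))
df2$UNIQ_CT
# [1] 1 0 0 1 0 1 0 0 0
```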

Easiest way to reshape this dataframe in R? [duplicate]

This question already has answers here: Reshaping multiple sets of measurement columns (wide format) into single columns (long format) (8 answers)
Closed 4 years ago.
Say I have the following wide/messy dataframe:
df1 <- data.frame(ID = c(1, 2), Gender = c("M","F"),
Q1 = c(1, 5), Q2 = c(2, 6),
Q3 = c(3, 7), Q4 = c(4, 8))
ID Gender Q1 Q2 Q3 Q4
1 M 1 2 3 4
2 F 5 6 7 8
How can I turn it into the following data frame?
df2 <- data.frame(ID = c(1, 1, 2, 2), Gender = c("M", "M", "F", "F"),
V1 = c(1, 3, 5, 7), V2 = c(2, 4, 6, 8))
ID Gender V1 V2
1 M 1 2
1 M 3 4
2 F 5 6
2 F 7 8
I know there are multiple packages and functions (e.g., tidyr, reshape2, or base R's reshape function) that can accomplish this. Which is the easiest way to do it, and how? I'd really appreciate any help anyone can provide. Thanks!
You could try melt from the devel version of data.table, i.e. v1.9.5. Its measure.vars argument can take a list of column sets, producing one value column per set. Instructions to install the devel version are here.
library(data.table)#v1.9.5+
melt(setDT(df1), measure.vars=list(c(3,5), c(4,6)),
value.name=c('V1', 'V2'))[,variable:=NULL][order(ID)]
# ID Gender V1 V2
#1: 1 M 1 2
#2: 1 M 3 4
#3: 2 F 5 6
#4: 2 F 7 8
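Or use reshape from base R (v.names names the two long columns V1 and V2, and order() puts each ID's rows together, as in df2):
res <- subset(reshape(df1, idvar=c('ID', 'Gender'),
varying=list(c(3,5), c(4,6)), v.names=c('V1', 'V2'),
direction='long'), select=-time)
res <- res[order(res$ID), ]
row.names(res) <- NULL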
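Since the question also mentions tidyr, here is a sketch of the same wide-to-long step with pivot_longer(), assuming tidyr >= 1.0 and that Q1/Q2 and Q3/Q4 are the two measurement pairs. The intermediate names V1_1, V2_1, V1_2, V2_2 and the helper column pair are illustrative choices, not part of the answers above:

```r
library(dplyr)
library(tidyr)

df1 <- data.frame(ID = c(1, 2), Gender = c("M", "F"),
                  Q1 = c(1, 5), Q2 = c(2, 6),
                  Q3 = c(3, 7), Q4 = c(4, 8))

# Rename to <output column>_<pair index> so pivot_longer() can split each
# name into the target column (".value") and the pair it belongs to
long <- df1 %>%
  rename(V1_1 = Q1, V2_1 = Q2, V1_2 = Q3, V2_2 = Q4) %>%
  pivot_longer(-c(ID, Gender),
               names_to = c(".value", "pair"), names_sep = "_") %>%
  select(-pair)
long
```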
Update
If we need to transform 'df2' back to 'df1', dcast from data.table can be used; it accepts multiple value.var columns. We first need to create a sequence column (N) by group ('ID', 'Gender') before calling dcast:
dcast(setDT(df2)[, N:=1:.N, list(ID, Gender)], ID+Gender~N,
value.var=c('V1', 'V2'))
# ID Gender 1_V1 2_V1 1_V2 2_V2
#1: 1 M 1 3 2 4
#2: 2 F 5 7 6 8
Alternatively, we can create the sequence by group with ave and then use reshape from base R:
df2 <- transform(df2, N= ave(seq_along(ID), ID, Gender, FUN=seq_along))
reshape(df2, idvar=c('ID', 'Gender'), timevar='N', direction='wide')
# ID Gender V1.1 V2.1 V1.2 V2.2
#1 1 M 1 2 3 4
#3 2 F 5 6 7 8
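The reverse step can also be sketched with tidyr's pivot_wider() (assuming tidyr >= 1.0). As with dcast and reshape, the rows are first numbered within each (ID, Gender) group, and that index becomes the suffix of the wide columns:

```r
library(dplyr)
library(tidyr)

df2 <- data.frame(ID = c(1, 1, 2, 2), Gender = c("M", "M", "F", "F"),
                  V1 = c(1, 3, 5, 7), V2 = c(2, 4, 6, 8))

# N numbers the rows within each person; pivot_wider() then spreads V1 and V2
# into columns named V1_1, V1_2, V2_1 and V2_2 (default "_" separator)
wide <- df2 %>%
  group_by(ID, Gender) %>%
  mutate(N = row_number()) %>%
  ungroup() %>%
  pivot_wider(names_from = N, values_from = c(V1, V2))
wide
```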
data
df1 <- data.frame(ID = c(1, 2), Gender = c("M","F"), Q1 = c(1, 5),
Q2 = c(2, 6), Q3 = c(3, 7), Q4 = c(4, 8))
df2 <- data.frame(ID = c(1, 1, 2, 2), Gender = c("M", "M", "F", "F"),
V1 = c(1, 3, 5, 7), V2 = c(2, 4, 6, 8))
