Easier way to add rows with totals for groups in dplyr


How can I add rows with the sum of VL_FOB_real for each CO_ANO, niv100, subsector group in an easier way? I couldn't figure out how to use add_row and the like to do so; I only managed it by creating a new data frame and then appending it.
Here is what I have done:
df <- structure(list(CO_ANO = c("1996", "1990", "1993", "1993", "1994",
"1992", "1995", "1995", "1996", "1995",
"1994", "1990", "1989", "1992", "1995"),
CO_UF = c("32", "45", "45", "36", "55", "99", "36",
"34", "14", "25", "53", "41", "41", "41", "16"),
niv100 = c("2210","1530", "210", "3210", "1530", "2610", "2210",
"2630", "1030","1020", "3020", "3020", "410", "2510",
"1520"),
subsector = c("11","8", "1", "7", "8", "13", "11", "13", "4", "5",
"13", "13", "2","13", "8"),
VL_FOB_real = c(1, 2, 3,
1, 4, 5,
5, 6, 7,
6, 8, 9,
10, 11, 11)),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,-15L))
library(dplyr)
df1 <- df %>%
  group_by(CO_ANO, subsector, niv100) %>%
  summarise(VL_FOB_real = sum(VL_FOB_real)) %>%
  mutate(CO_UF = 'Total')
df <- bind_rows(df1, df)

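This groups the rows and then modifies each group using adorn_totals. Note that it groups by CO_ANO, CO_UF and niv100, so the "Total" label lands in the subsector column.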
library(dplyr)
library(janitor)
df %>%
group_by(CO_ANO, CO_UF, niv100) %>%
group_modify(~ adorn_totals(.x, where = "row"))
giving:
# A tibble: 30 x 5
# Groups: CO_ANO, CO_UF, niv100 [15]
CO_ANO CO_UF niv100 subsector VL_FOB_real
<chr> <chr> <chr> <chr> <dbl>
1 1989 41 410 2 10
2 1989 41 410 Total 10
3 1990 41 3020 13 9
4 1990 41 3020 Total 9
5 1990 45 1530 8 2
6 1990 45 1530 Total 2
7 1992 41 2510 13 11
8 1992 41 2510 Total 11
9 1992 99 2610 13 5
10 1992 99 2610 Total 5
# ... with 20 more rows
Another thing to try is the following, which gives somewhat different output. It splits the input into groups and applies adorn_totals separately to each group, giving a c("tabyl", "tbl_df", "tbl", "data.frame") object.
library(dplyr)
library(janitor)
library(purrr)
df %>%
group_split(CO_ANO, subsector, niv100, CO_UF) %>%
map_df(adorn_totals)

Honestly, I would do what you have done to add rows for each group, but to demonstrate a way to use add_row, here's an answer:
library(dplyr)
library(purrr)
df %>%
group_split(CO_ANO, subsector, niv100) %>%
map_df(~add_row(.x, CO_ANO = first(.x$CO_ANO), subsector = first(.x$subsector),
niv100 = first(.x$niv100),VL_FOB_real = sum(.x$VL_FOB_real), CO_UF = 'Total'))
# CO_ANO CO_UF niv100 subsector VL_FOB_real
# <chr> <chr> <chr> <chr> <dbl>
# 1 1989 41 410 2 10
# 2 1989 Total 410 2 10
# 3 1990 41 3020 13 9
# 4 1990 Total 3020 13 9
# 5 1990 45 1530 8 2
# 6 1990 Total 1530 8 2
# 7 1992 41 2510 13 11
# 8 1992 Total 2510 13 11
# 9 1992 99 2610 13 5
#10 1992 Total 2610 13 5
# … with 20 more rows
The only benefit I see in this approach is that you get the "Total" row for each group immediately after the group, unlike with bind_rows, where all the "Total" rows end up together.
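The same ordering can probably also be obtained from the bind_rows approach by sorting on the group keys afterwards. The following is only a sketch under assumptions: the toy data and the names toy, totals and with_totals are hypothetical, and it relies on FALSE sorting before TRUE so that each "Total" row comes last within its group.

```r
library(dplyr)

# Hypothetical toy data: two rows share the 1995 / 11 / 2210 group
toy <- tibble(
  CO_ANO      = c("1995", "1995", "1996"),
  CO_UF       = c("36", "34", "32"),
  niv100      = c("2210", "2210", "2210"),
  subsector   = c("11", "11", "11"),
  VL_FOB_real = c(5, 6, 1)
)

# One "Total" row per group, as in the question
totals <- toy %>%
  group_by(CO_ANO, subsector, niv100) %>%
  summarise(VL_FOB_real = sum(VL_FOB_real), .groups = "drop") %>%
  mutate(CO_UF = "Total")

# Append, then sort by the group keys; FALSE sorts before TRUE,
# so the "Total" row comes last within each group
with_totals <- bind_rows(toy, totals) %>%
  arrange(CO_ANO, subsector, niv100, CO_UF == "Total")

with_totals
```

This keeps the summarise step from the question unchanged and only adds a final arrange, at the cost of re-sorting the whole data frame.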

Related

compute variable over the value of the difference between another variable this year and the previous one R

In the data below I want to compute the following ratio: tr(year) / (op(year) - op(year-1)). I would appreciate an answer with dplyr.
year op tr cp
<chr> <dbl> <dbl> <dbl>
1 1984 10 39.1 38.3
2 1985 55 132. 77.1
3 1986 79 69.3 78.7
4 1987 78 47.7 74.1
5 1988 109 77.0 86.4
This is the expected output:
year2 ratio
1 1985 2.933333
2 1986 2.887500
3 1987 -47.700000
4 1988 2.483871
I haven't managed to get any result so far...

Use lag:
library(dplyr)
df %>%
mutate(year = year,
ratio = tr / (op - lag(op)),
.keep = "none") %>%
tidyr::drop_na()
# year ratio
#2 1985 2.933333
#3 1986 2.887500
#4 1987 -47.700000
#5 1988 2.483871
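To make the mechanics visible, here is a small sketch that keeps the lagged value as its own column and cross-checks the result against diff(). It assumes the rows are already sorted by year with no missing years; the names d, op_prev, ratios and ratio_diff are made up for illustration.

```r
library(dplyr)

# Data from the question (only the columns the ratio needs)
d <- data.frame(
  year = 1984:1988,
  op   = c(10, 55, 79, 78, 109),
  tr   = c(39.1, 132, 69.3, 47.7, 77)
)

ratios <- d %>%
  mutate(
    op_prev = lag(op),            # previous year's op; NA for the first year
    ratio   = tr / (op - op_prev)
  ) %>%
  filter(!is.na(ratio)) %>%       # drop the year that has no predecessor
  select(year, ratio)

# Same numbers via diff(), which returns op[i] - op[i - 1]
ratio_diff <- d$tr[-1] / diff(d$op)

ratios
```

If the data were not sorted, an arrange(year) before the mutate would be needed, since lag() works on row order rather than on the year values.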

We may use reframe with diff (reframe requires dplyr 1.1.0 or later):
library(dplyr)
df1 %>%
reframe(year = year[-1], ratio = tr[-1]/diff(op))
Output:
year ratio
1 1985 2.933333
2 1986 2.887500
3 1987 -47.700000
4 1988 2.483871
Data:
df1 <- structure(list(year = 1984:1988, op = c(10L, 55L, 79L, 78L, 109L
), tr = c(39.1, 132, 69.3, 47.7, 77), cp = c(38.3, 77.1, 78.7,
74.1, 86.4)), class = "data.frame", row.names = c("1", "2", "3",
"4", "5"))

Calculate average real variability for successive multiple measurements in R

I have a long-form dataframe, with a column (B) including the absolute successive differences between values in column (A), for each individual's ID separately.
ID = c("1", "1", "1", "1", "1", "1", "1", "2", "2", "2", "2", "2", "2")
A = c("120", "115", "125", "119", "128", "129", "130", "140", "142", "143", "145", "144", "148")
B = c("NA", "5", "10", "6", "9", "1", "1", "NA", "2", "1", "2", "1", "4")
DF <- data.frame(ID, A, B)
I would like to create a new column (C), that is the sum of the absolute differences before and including each value, divided by (the number of measurements used to calculate it minus 1).
This is what I would like the data to look like:
I hope this makes sense, any help greatly appreciated!

Here's a tidyverse solution. You can first group_by the ID, then divide the cumulative sum (cumsum) of B by the row_number minus one. This only works after dropping the first row of each group (where B is NA) and putting an NA back in its place.
Note also that in your example the 'numeric' columns are actually character vectors, so they have to be coerced to numeric first.
library(tidyverse)
DF %>%
mutate(across(A:B, \(x) suppressWarnings(as.numeric(x)))) %>%
group_by(ID) %>%
mutate(C = c(NA, cumsum(B[-1])/(row_number() - 1)[-1]))
#> # A tibble: 13 x 4
#> # Groups: ID [2]
#> ID A B C
#> <chr> <dbl> <dbl> <dbl>
#> 1 1 120 NA NA
#> 2 1 115 5 5
#> 3 1 125 10 7.5
#> 4 1 119 6 7
#> 5 1 128 9 7.5
#> 6 1 129 1 6.2
#> 7 1 130 1 5.33
#> 8 2 140 NA NA
#> 9 2 142 2 2
#> 10 2 143 1 1.5
#> 11 2 145 2 1.67
#> 12 2 144 1 1.5
#> 13 2 148 4 2
Created on 2022-11-11 with reprex v2.0.2
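As a variation, B can be derived from A inside the same pipeline, so the column of differences does not have to exist beforehand. This is a sketch on a shortened, hypothetical subset of the data with A already numeric; the names toy and res are illustrative, and seq_along() is used in place of row_number() - 1.

```r
library(dplyr)

# Shortened subset of the example data, with A stored as numeric
toy <- data.frame(
  ID = c("1", "1", "1", "1", "2", "2", "2"),
  A  = c(120, 115, 125, 119, 140, 142, 143)
)

res <- toy %>%
  group_by(ID) %>%
  mutate(
    # absolute successive difference; NA for the first row of each ID
    B = abs(A - lag(A)),
    # running mean of the differences seen so far, skipping the leading NA
    C = c(NA, cumsum(B[-1]) / seq_along(B[-1]))
  ) %>%
  ungroup()

res
```

A group with a single measurement simply gets NA in both B and C, because B[-1] is then empty.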

How to replace all variable names with the contents of the first row in a tibble

Is there a quick way to replace variable names with the content of the first row of a tibble?
So turning something like this:
Subject Q1 Q2 Q3
Subject age gender cue
429753 24 1 man
b952x8 23 2 mushroom
264062 19 1 night
53082m 35 1 moon
Into this:
Subject age gender cue
429753 24 1 man
b952x8 23 2 mushroom
264062 19 1 night
53082m 35 1 moon
My dataset has over 100 variables so I'm looking for a way that doesn't involve typing out each old and new variable name.

A possible solution:
df <- structure(list(Subject = c("Subject", "429753", "b952x8", "264062",
"53082m"), Q1 = c("age", "24", "23", "19", "35"), Q2 = c("gender",
"1", "2", "1", "1"), Q3 = c("cue", "man", "mushroom", "night",
"moon")), row.names = c(NA, -5L), class = "data.frame")
names(df) <- df[1,]
df <- df[-1,]
df
#> Subject age gender cue
#> 2 429753 24 1 man
#> 3 b952x8 23 2 mushroom
#> 4 264062 19 1 night
#> 5 53082m 35 1 moon
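After the header row is removed, every column is still character and the row names start at 2. A possible follow-up, sketched here on a shortened copy of the data (the name raw is hypothetical), is to reset the row names and let type.convert() guess the column types:

```r
# Shortened copy of the example data; every column is character
raw <- data.frame(
  Subject = c("Subject", "429753", "b952x8"),
  Q1      = c("age", "24", "23"),
  Q2      = c("gender", "1", "2"),
  Q3      = c("cue", "man", "mushroom")
)

# Use the first row as column names, as a plain character vector
names(raw) <- unlist(raw[1, ], use.names = FALSE)

# Drop the header row and renumber the remaining rows from 1
raw <- raw[-1, ]
rownames(raw) <- NULL

# Convert columns that look numeric; as.is = TRUE keeps the others as character
raw <- type.convert(raw, as.is = TRUE)

str(raw)
```

Subject stays character because some IDs contain letters, while age and gender become integer.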

How to sort the row order according to number not character?

I want to sort the row order of the data frame according to number, not character. My row indices for my data frame are numeric with an order of 1,10,11,12,2,20,21,22, etc. I have used order() trying to sort my row indices to 1,2,3,4,5,6,7,8,9,10, etc, but my row indices just stayed the same.
So my data frame has 1 column with 11 rows:
structure(list(`colSums(fake_with_noise_boundary)` = c(-3405, 2304,
-4096, 474, -2089, -3921, -2590, 1605, 1317, 2804, 2934)),
row.names = c("1", "10", "11", "12", "2", "20", "21", "3", "30", "31" ,
"40"), class = "data.frame")

Row names are always stored as character strings. To sort by their numeric value, convert them to numeric and use order():
df <- df[order(as.numeric(rownames(df))), , drop = FALSE]
df
# colSums(fake_with_noise_boundary)
#1 -3405
#2 -2089
#3 1605
#10 2304
#11 -4096
#12 474
#20 -3921
#21 -2590
#30 1317
#31 2804
#40 2934
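The reason the original order looks odd is that character strings are compared character by character, so "10" sorts before "2". A minimal illustration on a hypothetical vector of IDs:

```r
ids <- c("1", "10", "11", "2", "20", "3")

# Character comparison: "10" and "11" come before "2"
by_text <- sort(ids)

# Numeric comparison: order() on the converted values, applied to the originals
by_number <- ids[order(as.numeric(ids))]

by_text
by_number
```

The same idea is what df[order(as.numeric(rownames(df))), , drop = FALSE] applies to the row names; drop = FALSE keeps the result a data frame even though it has a single column.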

library(tidyverse)
df <-
structure(list(`colSums(fake_with_noise_boundary)` = c(-3405, 2304,
-4096, 474, -2089, -3921, -2590, 1605, 1317, 2804, 2934)),
row.names = c("1", "10", "11", "12", "2", "20", "21", "3", "30", "31" ,
"40"), class = "data.frame")
df %>%
#Create a column with your rowname
rownames_to_column() %>%
#Transform rowname to numeric
mutate(rowname = as.numeric(rowname)) %>%
# Sort row order by rowname
arrange(rowname)
rowname colSums(fake_with_noise_boundary)
1 1 -3405
2 2 -2089
3 3 1605
4 10 2304
5 11 -4096
6 12 474
7 20 -3921
8 21 -2590
9 30 1317
10 31 2804
11 40 2934

How to combine the across () function with mutate () and case_when () to mutate values in multiple columns according to a condition?

I have a demographic data set, which includes the age of people in a household. This is collected via a survey and participants are allowed to refuse to provide their age.
The result is a data set with one household per row (each with a household ID code), and various household characteristics such as age in the columns. Refused responses are coded as "R", and you can re-create a sample using the code below:
df <- list(Household_ID = c("1A", "1B", "1C", "1D", "1E"),
AGE1 = c("25", "47", "39", "50", "R"),
AGE2 = c("66", "23", "71", "R", "16"),
AGE3 = c("28", "17", "R", "R", "80"),
AGE4 = c("81", "22", "48", "59", "R"))
df <- as_tibble(df)
> df
# A tibble: 5 x 5
Household_ID AGE1 AGE2 AGE3 AGE4
<chr> <chr> <chr> <chr> <chr>
1 1A 25 66 28 81
2 1B 47 23 17 22
3 1C 39 71 R 48
4 1D 50 R R 59
5 1E R 16 80 R
For our intents and purposes we re-code the "R" to "-9" so that we can subsequently convert the format of the AGE columns to integer, and carry out analysis. We usually do this in another software and my objective is to replicate this process in R.
I have managed to do this with the following code:
df <- df %>% mutate(AGE1 = case_when(AGE1 == "R" ~ "-9", TRUE ~ as.character(AGE1)))
df <- df %>% mutate(AGE2 = case_when(AGE2 == "R" ~ "-9", TRUE ~ as.character(AGE2)))
df <- df %>% mutate(AGE3 = case_when(AGE3 == "R" ~ "-9", TRUE ~ as.character(AGE3)))
df <- df %>% mutate(AGE4 = case_when(AGE4 == "R" ~ "-9", TRUE ~ as.character(AGE4)))
Given that this feels clumsy, I tried to find a solution using mutate_if etc. but read that these have been superseded by across(). Hence, I tried to replicate this operation using across():
df <- df %>%
  mutate(across(AGE1:AGE4),
         ~ (case_when(. == "R" ~ "-9")))
But I get the following error:
Error: Problem with `mutate()` input `..2`.
x Input `..2` must be a vector, not a `formula` object.
i Input `..2` is `~(case_when(. == "R" ~ "-9"))`.
Been wrestling with this and googling for a while now but can't figure out what I am missing. Would really appreciate some input on how to get this working, please and thank you.
EDIT: Solved!
df <- df %>%
mutate(across(AGE1:AGE4, ~ (case_when(.x == "R" ~ "-9", TRUE ~ as.character(.x)))))
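For what it's worth, the error above arises because the closing parenthesis of across() comes before the formula, so mutate() receives the formula as a second, separate argument. Below is a minimal sketch of the working shape, with the function passed inside across(). The helper name recode_refused and the shortened data are hypothetical, and starts_with("AGE") is an assumed alternative to AGE1:AGE4.

```r
library(dplyr)

# Shortened, hypothetical version of the survey data
households <- tibble(
  Household_ID = c("1A", "1D", "1E"),
  AGE1 = c("25", "50", "R"),
  AGE2 = c("66", "R", "16")
)

# Hypothetical helper: recode refusals to "-9", then convert to integer
recode_refused <- function(x) {
  as.integer(if_else(x == "R", "-9", x))
}

# The function goes inside across(), as its second argument
recoded <- households %>%
  mutate(across(starts_with("AGE"), recode_refused))

recoded
```

Comparing with == matches only cells that are exactly "R", so values that merely contain the letter are left alone.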

Or maybe this one, which is not much different from dear @TarJae's interpretation:
library(dplyr)
library(stringr)
df %>%
mutate(across(AGE1:AGE4, ~ str_replace(., "R", "-9")),
across(AGE1:AGE4, as.integer))
# A tibble: 5 x 5
Household_ID AGE1 AGE2 AGE3 AGE4
<chr> <int> <int> <int> <int>
1 1A 25 66 28 81
2 1B 47 23 17 22
3 1C 39 71 -9 48
4 1D 50 -9 -9 59
5 1E -9 16 80 -9
Data:
df <- list(Household_ID = c("1A", "1B", "1C", "1D", "1E"),
AGE1 = c("25", "47", "39", "50", "R"),
AGE2 = c("66", "23", "71", "R", "16"),
AGE3 = c("28", "17", "R", "R", "80"),
AGE4 = c("81", "22", "48", "59", "R"))
df <- as_tibble(df)

Why not simply?
df[,2:5][df[, 2:5] == 'R'] <- '-9'
# A tibble: 5 x 5
Household_ID AGE1 AGE2 AGE3 AGE4
<chr> <chr> <chr> <chr> <chr>
1 1A 25 66 28 81
2 1B 47 23 17 22
3 1C 39 71 -9 48
4 1D 50 -9 -9 59
5 1E -9 16 80 -9

You could use across with replace:
1. Convert the list to a tibble with as_tibble().
2. Replace "R" with "-9".
3. Convert the AGE columns to integer with type.convert().
df %>%
as_tibble() %>%
mutate(across(everything(), ~replace(., . == "R" , "-9"))) %>%
type.convert(as.is=TRUE)
Output:
Household_ID AGE1 AGE2 AGE3 AGE4
<chr> <int> <int> <int> <int>
1 1A 25 66 28 81
2 1B 47 23 17 22
3 1C 39 71 -9 48
4 1D 50 -9 -9 59
5 1E -9 16 80 -9
