I need to divide columns despesatotal and despesamonetaria by the row named Total:
Let's suppose your data set is df.
# 1) Delete the last row
df <- df[-nrow(df),]
# 2) Build the desired data.frame (combining the CNAE names and the proportion columns)
new.df <- cbind(grup_CNAE = df$grup_CNAE,
100*prop.table(df[,-1],margin = 2))
Finally, rename your columns. Be careful with matrix vs. data.frame formats, because mathematical operations can sometimes behave differently between the two. If you use the dput function to give us a reproducible example, the answer can be more precise.
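For example, a minimal sketch of that renaming step (the *_pct names are only placeholders, not anything your data requires):
# rename the proportion columns to whatever labels you prefer
names(new.df) <- c("grup_CNAE", "despesatotal_pct", "despesamonetaria_pct")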
Here is a way to get it done. This is not the best way, but I think it is very readable.
Suppose this is your data frame:
mydf = structure(list(grup_CNAE = c("A", "B", "C", "D", "E", "Total"
), despesatotal = c(71, 93, 81, 27, 39, 311), despesamonetaria = c(7,
72, 36, 22, 73, 210)), row.names = c(NA, -6L), class = "data.frame")
mydf
# grup_CNAE despesatotal despesamonetaria
#1 A 71 7
#2 B 93 72
#3 C 81 36
#4 D 27 22
#5 E 39 73
#6 Total 311 210
To divide the despesatotal values by their total, you need to use the total value (311 in this example) as the denominator. Note that the total is located in the last row, so you can identify its position by indexing the despesatotal column with nrow() as the index value.
mydf |> mutate(percentage1 = despesatotal / despesatotal[nrow(mydf)],
               percentage2 = despesamonetaria / despesamonetaria[nrow(mydf)])
# grup_CNAE despesatotal despesamonetaria percentage1 percentage2
#1 A 71 7 0.22829582 0.03333333
#2 B 93 72 0.29903537 0.34285714
#3 C 81 36 0.26045016 0.17142857
#4 D 27 22 0.08681672 0.10476190
#5 E 39 73 0.12540193 0.34761905
#6 Total 311 210 1.00000000 1.00000000
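If you prefer percentages and want to drop the Total row afterwards, a small follow-up sketch (assuming dplyr is attached, since the pipeline above uses mutate()):
mydf |>
  mutate(percentage1 = 100 * despesatotal / despesatotal[nrow(mydf)],
         percentage2 = 100 * despesamonetaria / despesamonetaria[nrow(mydf)]) |>
  filter(grup_CNAE != "Total")  # drop the Total row once the shares are computed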
library(tidyverse)
Sample data
# A tibble: 11 x 3
group despesatotal despesamonetaria
<chr> <int> <int>
1 1 198 586
2 2 186 525
3 3 202 563
4 4 300 562
5 5 126 545
6 6 215 529
7 7 183 524
8 8 163 597
9 9 213 592
10 10 175 530
11 Total 1961 5553
df %>%
mutate(percentage_total = despesatotal / last(despesatotal),
percentage_monetaria = despesamonetaria/ last(despesamonetaria)) %>%
slice(-nrow(.))
# A tibble: 10 x 5
group despesatotal despesamonetaria percentage_total percentage_monetaria
<chr> <int> <int> <dbl> <dbl>
1 1 198 586 0.101 0.106
2 2 186 525 0.0948 0.0945
3 3 202 563 0.103 0.101
4 4 300 562 0.153 0.101
5 5 126 545 0.0643 0.0981
6 6 215 529 0.110 0.0953
7 7 183 524 0.0933 0.0944
8 8 163 597 0.0831 0.108
9 9 213 592 0.109 0.107
10 10 175 530 0.0892 0.0954
This is a good place to use dplyr::mutate(across()) to divide all relevant columns by the Total row. Note this is not sensitive to the order of the rows and will apply the manipulation to all numeric columns. You can supply any tidyselect semantics to across() instead if needed in your case.
library(tidyverse)
# make sample data
d <- tibble(grup_CNAE = paste0("Group", 1:12),
despesatotal = sample(1e6:5e7, 12),
despesamonetaria = sample(1e6:5e7, 12)) %>%
add_row(grup_CNAE = "Total", summarize(., across(where(is.numeric), sum)))
# divide numeric columns by value in "Total" row
d %>%
mutate(across(where(is.numeric), ~./.[grup_CNAE == "Total"]))
#> # A tibble: 13 × 3
#> grup_CNAE despesatotal despesamonetaria
#> <chr> <dbl> <dbl>
#> 1 Group1 0.117 0.0204
#> 2 Group2 0.170 0.103
#> 3 Group3 0.0451 0.0837
#> 4 Group4 0.0823 0.114
#> 5 Group5 0.0170 0.0838
#> 6 Group6 0.0174 0.0612
#> 7 Group7 0.163 0.155
#> 8 Group8 0.0352 0.0816
#> 9 Group9 0.0874 0.135
#> 10 Group10 0.113 0.0877
#> 11 Group11 0.0499 0.0495
#> 12 Group12 0.104 0.0251
#> 13 Total 1 1
Created on 2022-11-08 with reprex v2.0.2
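If you only want to touch the despesa* columns rather than every numeric column, a hedged variant of the same idea (using the d built above) swaps where(is.numeric) for a tidyselect helper:
# restrict the division to columns whose names start with "despesa"
d %>%
  mutate(across(starts_with("despesa"), ~ . / .[grup_CNAE == "Total"]))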
I have a dataset, let's say df, which has 100 columns of different classes (character, integer, double, etc.).
How can I use dplyr or other tidyverse functions to change the class of columns? I want to mutate all integer columns to double,
and then keep only the double columns and drop the character columns.
Is there a way to do this in R? Any help is appreciated.
In both cases you can use tidyselect calls to choose the correct columns:
library(dplyr)
df <- tibble(
a = runif(100),
b = sample(1:1000, 100),
c = sample(letters, 100, replace = TRUE),
d = rnorm(100),
e = 101:200
)
df |>
mutate(across(where(is.integer), as.double)) |>
select(where(is.double))
#> # A tibble: 100 × 4
#> a b d e
#> <dbl> <dbl> <dbl> <dbl>
#> 1 0.196 468 -1.35 101
#> 2 0.373 865 0.123 102
#> 3 0.0250 534 0.131 103
#> 4 0.622 388 0.426 104
#> 5 0.354 670 0.625 105
#> 6 0.806 474 -1.15 106
#> 7 0.282 318 -1.27 107
#> 8 0.813 331 1.05 108
#> 9 0.360 165 -0.765 109
#> 10 0.0929 645 -0.0232 110
#> # … with 90 more rows
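Since the question mentions mutate_if: the older scoped verbs (now superseded in favour of across()) would express roughly the same thing:
# superseded but still-working scoped-verb equivalent
df |>
  mutate_if(is.integer, as.double) |>  # convert every integer column to double
  select_if(is.double)                 # keep only the double columns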
I'm aggregating data with variable bin sizes (see my previous question here: R: aggregate every n rows with variable n depending on sum(n) of second column). In addition to calculating sums and means over groups of variable ranges, I need to pull out single-value covariates at the midpoint of each group range. When I try to do this on the fly, I only get a value for the first group and NAs for the remaining groups.
df.summary<-as.data.frame(df %>%
mutate(rn = row_number()) %>%
group_by(grp = (cumsum(d)-1)%/% 100 + 1) %>%
summarise(x=mean(x, na.rm = TRUE), d=sum(d, na.rm=T), i.start=first(rn), i.end=last(rn), y=nth(y, round(first(rn)+(last(rn)-first(rn))/2-1))))
head(df.summary)
grp x d i.start i.end y
1 1 0.07458317 88.99342 1 4 19.78992
2 2 0.07594546 97.62130 5 8 NA
3 3 0.05353308 104.69683 9 12 NA
4 4 0.06498291 106.23468 13 16 NA
5 5 0.08601759 98.24939 17 20 NA
6 6 0.06262427 84.43745 21 23 NA
sample data:
structure(list(x = c(0.10000112377193, 0.110742170350877, 0.0300274304561404,
0.0575619395964912, 0.109060465438596, 0.0595491225614035, 0.0539270264912281,
0.0812452063859649, 0.0341699389122807, 0.0391744879122807, 0.0411787485614035,
0.0996091644385965, 0.0970479474912281, 0.0595715843684211, 0.0483489989122807,
0.0549631194561404, 0.0705080555964912, 0.080437472631579, 0.105883664631579,
0.0872411613684211, 0.103236660631579, 0.0381296894912281, 0.0465064491578947,
0.0936565184561403, 0.0410095752631579, 0.0311180032105263, 0.0257758157894737,
0.0354721928947368, 0.0584999394736842, 0.0241286060175439, 0.112053376666667,
0.0769823868596491, 0.0558137530526316, 0.0374491000701754, 0.0419279142631579,
0.0260257506842105, 0.0544360374561404, 0.107411071842105, 0.103873468,
0.0419322114035088, 0.0483912961052632, 0.0328373653157895, 0.0866868717719298,
0.063990467245614, 0.0799280314035088, 0.123490407070175, 0.145676836280702,
0.0292878782807018, 0.0432093036666667, 0.0203547443684211),
d = c(22.2483512600033, 22.2483529247042, 22.2483545865809,
22.2483562542823, 22.24835791863, 25.1243105415557, 25.1243148759953,
25.1243192107884, 25.1243235416981, 25.1243278750792, 27.2240858553058,
27.2240943134697, 27.2241027638674, 27.224111222031, 27.2241196741942,
24.5623431981188, 24.5623453409221, 24.5623474809012, 24.562349626705,
24.5623517696847, 28.1458125837154, 28.1458157376341, 28.1458188889053,
28.1458220452951, 28.1458251983314, 27.8293318542146, 27.8293366652115,
27.8293414829159, 27.829346292148, 27.8293511094993, 27.5271773325046,
27.5271834011289, 27.5271894694002, 27.5271955369655, 27.5272016048837,
28.0376097925214, 28.0376146410729, 28.0376194959786, 28.0376243427651,
28.0376291969647, 26.8766095768196, 26.8766122563318, 26.8766149309023,
26.8766176123562, 26.8766202925746, 27.8736950101666, 27.8736960528853,
27.8736971017815, 27.8736981446767, 27.8736991932199), y = c(19.79001,
19.789922, 19.789834, 19.789746, 19.789658, 19.78957, 19.789468,
19.789366, 19.789264, 19.789162, 19.78906, 19.78896, 19.78886,
19.78876, 19.78866, 19.78856, 19.788458, 19.788356, 19.788254,
19.788152, 19.78805, 19.787948, 19.787846, 19.787744, 19.787642,
19.78754, 19.787442, 19.787344, 19.787246, 19.787148, 19.78705,
19.786956, 19.786862, 19.786768, 19.786674, 19.78658, 19.786486,
19.786392, 19.786298, 19.786204, 19.78611, 19.786016, 19.785922,
19.785828, 19.785734, 19.78564, 19.785544, 19.785448, 19.785352,
19.785256)), row.names = c(NA, 50L), class = "data.frame")
Let's add variables z and n in the summarise part. They are defined as below.
df %>%
mutate(rn = row_number()) %>%
group_by(grp = (cumsum(d)-1)%/% 100 + 1) %>%
summarise(x=mean(x, na.rm = TRUE),
d=sum(d, na.rm=T), i.start=first(rn),
i.end=last(rn),
z = round(first(rn)+(last(rn)-first(rn))/2-1),
n = n())
grp x d i.start i.end z n
<dbl> <dbl> <dbl> <int> <int> <dbl> <int>
1 1 0.0746 89.0 1 4 2 4
2 2 0.0759 97.6 5 8 6 4
3 3 0.0535 105. 9 12 10 4
4 4 0.0650 106. 13 16 14 4
5 5 0.0860 98.2 17 20 18 4
6 6 0.0626 84.4 21 23 21 3
7 7 0.0479 112. 24 27 24 4
8 8 0.0394 83.5 28 30 28 3
9 9 0.0706 110. 31 34 32 4
10 10 0.0575 112. 35 38 36 4
11 11 0.0647 83.0 39 41 39 3
12 12 0.0659 108. 42 45 42 4
13 13 0.0854 111. 46 49 46 4
14 14 0.0204 27.9 50 50 49 1
In the data frame above, n indicates the sample size of each group defined by grp. However, because the data are grouped by grp, nth(y, z) looks up the z-th value within each group, not within the whole data frame.
That means that for the 5th group, which has only 4 values, you are asking for the 18th value of y, so it prints NA.
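A quick standalone illustration of that behaviour (toy vector, not your data):
dplyr::nth(c(10, 20, 30, 40), 18)  # asking for the 18th element of a length-4 vector
#> [1] NA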
The simplest way to fix this, I think, is to use n().
df %>%
mutate(rn = row_number()) %>%
group_by(grp = (cumsum(d)-1)%/% 100 + 1) %>%
summarise(x=mean(x, na.rm = TRUE),
d=sum(d, na.rm=T), i.start=first(rn),
i.end=last(rn),
y=nth(y, round(n()/2)))
grp x d i.start i.end y
<dbl> <dbl> <dbl> <int> <int> <dbl>
1 1 0.0746 89.0 1 4 19.8
2 2 0.0759 97.6 5 8 19.8
3 3 0.0535 105. 9 12 19.8
4 4 0.0650 106. 13 16 19.8
5 5 0.0860 98.2 17 20 19.8
6 6 0.0626 84.4 21 23 19.8
7 7 0.0479 112. 24 27 19.8
8 8 0.0394 83.5 28 30 19.8
9 9 0.0706 110. 31 34 19.8
10 10 0.0575 112. 35 38 19.8
11 11 0.0647 83.0 39 41 19.8
12 12 0.0659 108. 42 45 19.8
13 13 0.0854 111. 46 49 19.8
14 14 0.0204 27.9 50 50 NA
You'll take the floor(n/2)-th y, i.e. the y located near the middle of each group. Note that you can also try floor(n/2) + 1.
df %>%
mutate(rn = row_number()) %>%
group_by(grp = (cumsum(d)-1)%/% 100 + 1) %>%
summarise(x=mean(x, na.rm = TRUE),
d = sum(d, na.rm=T),
i.start=first(rn),
i.end=last(rn),
y = nth(y, floor(median(rn)) - i.start))
Let's say I have the following stored in p
los tti ID
1 1.002083333 23.516667 84
2 -0.007638889 2.633333 118
3 0.036805556 2.633333 118
4 0.134722222 2.716667 120
5 2.756250000 82.800000 132
6 1.066666667 17.933333 156
7 -2.496250000 12.830948 156
I want to filter out rows with negative values in p$los, but only if p$tti and p$ID are duplicated between the rows. E.g., rows 2 and 3 are duplicated on both p$tti and p$ID, and therefore row 2 should be omitted because of its negative value in p$los.
Rows 6 and 7 are duplicated with regard to p$ID, but not p$tti, and should therefore stay.
I am looking for a solution using dplyr.
p <- structure(list(los = c(1.00208333333333, -0.00763888888888889,
0.0368055555555556, 0.134722222222222, 2.75625, 1.06666666666667,
-0.00763888888888889, 4.84305555555556, 1.79375, 8.55694444444444
), tti = c(23.5166666666667, 2.63333333333333, 2.63333333333333,
2.71666666666667, 82.8, 17.9333333333333, 1.31666666666667, 69.2666666666667,
52.9833333333333, 36.0166666666667), ID = c(84L, 118L, 118L,
120L, 132L, 156L, 179L, 245L, 253L, 334L)), row.names = c(NA,
10L), class = "data.frame")
Depending on your measurement precision, you may want to round your tti column (a numeric decimal) to some tolerance (e.g., 3 decimal places) as part of data processing, since grouping treats tti values as duplicates only when they match exactly.
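A minimal sketch of that optional rounding step, assuming 3 decimal places is an acceptable tolerance (dplyr loaded):
p %>%
  mutate(tti = round(tti, 3))  # harmonise tti before grouping on it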
Using dplyr you could try something like:
p %>%
group_by(tti, ID) %>%
filter(n() == 1 | los >= 0)
This keeps rows that have no duplicates by tti and ID (n() == 1 for the group) and, where duplicates exist, keeps those where los is positive or zero (not negative).
Output
los tti ID
<dbl> <dbl> <int>
1 1.00 23.5 84
2 0.0368 2.63 118
3 0.135 2.72 120
4 2.76 82.8 132
5 1.07 17.9 156
6 -0.00764 1.32 179
7 4.84 69.3 245
8 1.79 53.0 253
9 8.56 36.0 334
If I understood correctly:
library(tidyverse)
df <- read.table(text = " los tti ID
1 1.002083333 23.516667 84
2 -0.007638889 2.633333 118
3 0.036805556 2.633333 118
4 0.134722222 2.716667 120
5 2.756250000 82.800000 132
6 1.066666667 17.933333 156
7 -2.496250000 12.830948 156", header = T)
df %>%
group_by(ID) %>%
filter((sd(tti, na.rm = T) + los) > 0 | is.na(sd(tti, na.rm = T))) %>%
ungroup()
#> # A tibble: 6 x 3
#> los tti ID
#> <dbl> <dbl> <int>
#> 1 1.00 23.5 84
#> 2 0.0368 2.63 118
#> 3 0.135 2.72 120
#> 4 2.76 82.8 132
#> 5 1.07 17.9 156
#> 6 -2.50 12.8 156
Created on 2021-03-15 by the reprex package (v1.0.0)
I have a dataframe with this structure:
df <- read.table(text="
site date v1 v2 v3 v4
a 2019-08-01 0 17 94 150
b 2019-08-01 5 25 83 148
c 2019-08-01 6 39 43 148
d 2019-08-01 10 39 144 165
a 2019-03-31 4 15 106 154
b 2019-03-31 4 21 70 151
c 2019-03-31 8 30 44 148
d 2019-03-31 9 41 144 160
a 2019-01-04 3 10 104 153
b 2019-01-04 2 16 90 150
c 2019-01-04 8 40 62 151
d 2019-01-04 9 43 142 162
a 2019-07-07 3 14 93 152
b 2019-07-07 2 23 74 147
c 2019-07-07 9 31 58 147
d 2019-07-07 9 36 123 170
a 2019-06-17 0 12 91 153
b 2019-06-17 3 25 73 147
c 2019-06-17 7 35 45 146
d 2019-06-17 8 40 134 168
a 2019-01-11 4 14 104 153
b 2019-01-11 5 18 73 151
c 2019-01-11 7 35 65 147
d 2019-01-11 11 44 134 168
a 2019-11-11 4 20 103 152
b 2019-11-11 6 22 79 152
c 2019-11-11 5 38 52 147
d 2019-11-11 10 38 144 163
a 2019-09-06 3 13 102 155
b 2019-09-06 6 17 74 149
c 2019-09-06 9 32 45 146
d 2019-09-06 11 42 138 165
", header=TRUE, stringsAsFactors=FALSE)
Now I would like to calculate the statistics (min, max, mean, median, sd) of the variables (v1-v4) for each of the sites, for the full year, for the summer only, and for the winter only.
First I subsetted the data for the summer and winter using the following code:
df_summer <- selectByDate(df, month = c(4:9))
df_winter <- selectByDate(df, month = c(1,2,3,10,11,12))
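(selectByDate() here presumably comes from the openair package; if it is not available, a base-R sketch of the same month-based split could look like this.)
mo <- as.integer(format(as.Date(df$date), "%m"))  # month number for each row
df_summer <- df[mo %in% 4:9, ]
df_winter <- df[!(mo %in% 4:9), ]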
Then I tried to build a loop over the seasons and then over the variables. For this I created a list of data frames and a vector of column names:
df_list <- list(df, df_summer, df_winter)
col_names <- c("v1", "v2", "v3", "v4")
which I then tried to implement in the loop:
for (i in seq_along(df_list)){
for (j in col_names[,i]){
[j]_[i] <- describeBy([i]$[,j], [i]$site)
[j]_[i] <- data.frame(matrix(unlist([j]_[i]), nrow=length([j]_[i]), byrow=T))
[j]_[i]$site <- c("Frau2", "MW", "Sys1", "Sys4")
[j]_[i]$season <- c([i], [i], [i], [i])
[j]_[i]$type <- c([j], [j], [j], [j])
}
}
But this did not work; I get these messages:
Error: unexpected '[' in:
"for (j in col_names[,i]){
["
Error: unexpected '[' in " ["
Error: unexpected '}' in " }"
I have already used this loop "workflow" to generate the data I wanted, but that was done with copy and paste to get the data quick and dirty. Now I would like to tidy up the code.
Do you have an idea how I could make this work, or what I am doing wrong?
Thank you!
Matthias
UPDATE
So I tried what ekoam suggested - thank you for that! - and the following problems occurred.
Contrary to the comments I wrote below ekoam's answer, the error occurs with both datasets (the example provided here and the actual one I'm using; I'm not sure whether I'm allowed to publish that dataset).
This is the code I used and the error I got:
df <- read_excel("C:/###/###/###/Example_data.xlsx")
df <- subset(data_watersamples, site %in% c("a","b","c", "d"))
my_summary <-
. %>%
group_by(site) %>%
summarise_at(vars(
c(v1, v2, v3, v4),
list(min = min, max = max, mean = mean, median = median, sd = sd)
)) %>%
pivot_longer(-site, names_to = c("type", "stat"), names_sep = "_") %>%
pivot_wider(names_from = "stat")
summer <- as.integer(format.Date(df$date, "%m")) %in% 4:9
df_list <- list(full_year = df, summer = df[summer, ], winter = df[!summer, ])
lapply(df_list, my_summary)
and get this error:
Error: Must subset columns with a valid subscript vector.
x Subscript has the wrong type `list`.
i It must be numeric or character.
Run `rlang::last_error()` to see where the error occurred.
> rlang::last_error()
Error in `*tmp*`[[id - n]] :
attempt to select more than one element in integerOneIndex
Thanks for your help!
Matthias
As you want things to be tidy, how about this tidyverse approach to your problem?
library(dplyr)
library(tidyr)
my_summary <-
. %>%
group_by(site) %>%
summarise(across(
c(v1, v2, v3, v4),
list(min = min, max = max, mean = mean, median = median, sd = sd)
)) %>%
pivot_longer(-site, names_to = c("type", "stat"), names_sep = "_") %>%
pivot_wider(names_from = "stat")
summer <- as.integer(format.Date(df$date, "%m")) %in% 4:9
df_list <- list(full_year = df, summer = df[summer, ], winter = df[!summer, ])
lapply(df_list, my_summary)
Output
`summarise()` ungrouping output (override with `.groups` argument)
`summarise()` ungrouping output (override with `.groups` argument)
`summarise()` ungrouping output (override with `.groups` argument)
$full_year
# A tibble: 16 x 7
site type min max mean median sd
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 a v1 0 4 2.62 3 1.69
2 a v2 10 20 14.4 14 3.07
3 a v3 91 106 99.6 102. 5.93
4 a v4 150 155 153. 153 1.49
5 b v1 2 6 4.12 4.5 1.64
6 b v2 16 25 20.9 21.5 3.52
7 b v3 70 90 77 74 6.63
8 b v4 147 152 149. 150. 1.92
9 c v1 5 9 7.38 7.5 1.41
10 c v2 30 40 35 35 3.78
11 c v3 43 65 51.8 48.5 8.84
12 c v4 146 151 148. 147 1.60
13 d v1 8 11 9.62 9.5 1.06
14 d v2 36 44 40.4 40.5 2.67
15 d v3 123 144 138. 140 7.38
16 d v4 160 170 165. 165 3.40
$summer
# A tibble: 16 x 7
site type min max mean median sd
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 a v1 0 3 1.5 1.5 1.73
2 a v2 12 17 14 13.5 2.16
3 a v3 91 102 95 93.5 4.83
4 a v4 150 155 152. 152. 2.08
5 b v1 2 6 4 4 1.83
6 b v2 17 25 22.5 24 3.79
7 b v3 73 83 76 74 4.69
8 b v4 147 149 148. 148. 0.957
9 c v1 6 9 7.75 8 1.5
10 c v2 31 39 34.2 33.5 3.59
11 c v3 43 58 47.8 45 6.90
12 c v4 146 148 147. 146. 0.957
13 d v1 8 11 9.5 9.5 1.29
14 d v2 36 42 39.2 39.5 2.5
15 d v3 123 144 135. 136 8.85
16 d v4 165 170 167 166. 2.45
$winter
# A tibble: 16 x 7
site type min max mean median sd
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 a v1 3 4 3.75 4 0.5
2 a v2 10 20 14.8 14.5 4.11
3 a v3 103 106 104. 104 1.26
4 a v4 152 154 153 153 0.816
5 b v1 2 6 4.25 4.5 1.71
6 b v2 16 22 19.2 19.5 2.75
7 b v3 70 90 78 76 8.83
8 b v4 150 152 151 151 0.816
9 c v1 5 8 7 7.5 1.41
10 c v2 30 40 35.8 36.5 4.35
11 c v3 44 65 55.8 57 9.60
12 c v4 147 151 148. 148. 1.89
13 d v1 9 11 9.75 9.5 0.957
14 d v2 38 44 41.5 42 2.65
15 d v3 134 144 141 143 4.76
16 d v4 160 168 163. 162. 3.40
I am trying to group_by a variable and then do operations per row per group. I got lost when using ifelse vs case_when. There is something basic I am failing to understand about the usage of the two. I assumed both would give me the same output, but that is not the case here. Using ifelse didn't give the expected output, but case_when did, and I am trying to understand why.
Here is the example df:
structure(list(Pos = c(73L, 146L, 146L, 150L, 150L, 151L, 151L,
152L, 182L, 182L), Percentage = c(81.2, 13.5, 86.4, 66.1, 33.9,
48.1, 51.9, 86.1, 48, 52)), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame")) -> foo
I am grouping by Pos and I want to round Percentage when the group's sum is 100. The following uses ifelse:
library(tidyverse)
foo %>%
group_by(Pos) %>%
mutate(sumn = n()) %>%
mutate(Val = ifelse(sumn == 1,100,
ifelse(sum(Percentage) == 100, unlist(map(Percentage,round)), 0)
# case_when(sum(Percentage) == 100 ~ unlist(map(Percentage,round)),
# TRUE ~ 0
# )
))
the output is
# A tibble: 10 x 4
# Groups: Pos [6]
Pos Percentage sumn Val
<int> <dbl> <int> <dbl>
1 73 81.2 1 100
2 146 13.5 2 0
3 146 86.4 2 0
4 150 66.1 2 66
5 150 33.9 2 66
6 151 48.1 2 48
7 151 51.9 2 48
8 152 86.1 1 100
9 182 48 2 48
10 182 52 2 48
I don't want this; rather, I want the following, which I get using case_when:
foo %>%
group_by(Pos) %>%
mutate(sumn = n()) %>%
mutate(Val = ifelse(sumn == 1,100,
#ifelse(sum(Percentage) == 100, unlist(map(Percentage,round)), 0)
case_when(sum(Percentage) == 100 ~ unlist(map(Percentage,round)),
TRUE ~ 0
)
))
# A tibble: 10 x 4
# Groups: Pos [6]
Pos Percentage sumn Val
<int> <dbl> <int> <dbl>
1 73 81.2 1 100
2 146 13.5 2 0
3 146 86.4 2 0
4 150 66.1 2 66
5 150 33.9 2 34
6 151 48.1 2 48
7 151 51.9 2 52
8 152 86.1 1 100
9 182 48 2 48
10 182 52 2 52
What is ifelse doing differently?
According to ?ifelse
A vector of the same length and attributes (including dimensions and "class") as test and data values from the values of yes or no.
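A quick standalone illustration of that length rule (toy values only):
ifelse(TRUE, c(66, 34), 0)  # test has length 1, so only the first element of `yes` is kept
#> [1] 66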
If we replicate the test so the lengths match, then it works as expected:
foo %>%
group_by(Pos) %>%
mutate(sumn = n()) %>%
mutate(Val = ifelse(sumn == 1,100,
ifelse(rep(sum(Percentage) == 100,
n()), unlist(map(Percentage,round)), 0)
))
# A tibble: 10 x 4
# Groups: Pos [6]
Pos Percentage sumn Val
<int> <dbl> <int> <dbl>
1 73 81.2 1 100
2 146 13.5 2 0
3 146 86.4 2 0
4 150 66.1 2 66
5 150 33.9 2 34
6 151 48.1 2 48
7 151 51.9 2 52
8 152 86.1 1 100
9 182 48 2 48
10 182 52 2 52