Error adding multiple columns to R dataframe

I have a dataframe dept_sales
  Store 2010-02-19 2010-02-26 2010-03-05 2010-03-19 2010-05-14 2010-12-10
1     2       0.78       7.02       0.78       0.78          0       0.00
2     4       0.00       0.00       0.00       0.00          0       1.56
3    18       0.00       0.00       0.00       0.00         28       0.00
I am trying to add multiple columns to this dataframe, all with value 0. I did this:
dept_sales[, dropped_columns] = 0
where dropped_columns is just a vector of dates:
[1] "2010-02-05" "2010-02-12" "2010-03-12" "2010-03-26" "2010-04-02" "2010-04-09" "2010-04-16" "2010-04-23" "2010-04-30" "2010-05-07" "2010-05-21"
[12] "2010-05-28" "2010-06-04" "2010-06-11" "2010-06-18" "2010-06-25" "2010-07-02" "2010-07-09" "2010-07-16" "2010-07-23" "2010-07-30" "2010-08-06"
[23] "2010-08-13" "2010-08-20" "2010-08-27" "2010-09-03" "2010-09-10" "2010-09-17" "2010-09-24" "2010-10-01" "2010-10-08" "2010-10-15" "2010-10-22"
[34] "2010-10-29" "2010-11-05" "2010-11-12" "2010-11-19" "2010-11-26" "2010-12-03" "2010-12-17" "2010-12-24" "2010-12-31" "2011-01-07" "2011-01-14"
[45] "2011-01-21" "2011-01-28" "2011-02-04" "2011-02-11" "2011-02-18" "2011-02-25"
but I get this error:
Error in `[<-.data.frame`(`*tmp*`, , dropped_columns, value = 0) :
new columns would leave holes after existing columns

We can mimic that error by going a bit extreme:
dept_sales[, 10000] <- 0
# Error in `[<-.data.frame`(`*tmp*`, , 10000, value = 0) :
# new columns would leave holes after existing columns
It appears that your dropped_columns may be real Date values instead of strings, so R treats them as numeric column positions rather than column names. Convert them to strings first.
dept_sales[,dropped_columns] <- 0
# Error in `[<-.data.frame`(`*tmp*`, , dropped_columns, value = 0) :
# new columns would leave holes after existing columns
dept_sales[,as.character(dropped_columns)] <- 0
dept_sales[,1:16] # just a subset of columns for demonstration here
# Store 2010-02-19 2010-02-26 2010-03-05 2010-03-19 2010-05-14 2010-12-10 2010-02-05 2010-02-12 2010-03-12 2010-03-26 2010-04-02 2010-04-09 2010-04-16 2010-04-23 2010-04-30
# 1 2 0.78 7.02 0.78 0.78 0 0.00 0 0 0 0 0 0 0 0 0
# 2 4 0.00 0.00 0.00 0.00 0 1.56 0 0 0 0 0 0 0 0 0
# 3 18 0.00 0.00 0.00 0.00 28 0.00 0 0 0 0 0 0 0 0 0
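The likely cause: a Date is stored internally as the number of days since 1970-01-01, so using it as a column index behaves like indexing with a large number rather than a name. A hedged check with the first dropped date (14645 is the internal value of 2010-02-05, as the dput below shows):
as.numeric(as.Date("2010-02-05"))
# [1] 14645
dept_sales[, 14645] <- 0
# same "new columns would leave holes after existing columns" error as above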
Data
dept_sales <- structure(list(Store = c(2L, 4L, 18L), "2010-02-19" = c(0.78, 0, 0), "2010-02-26" = c(7.02, 0, 0), "2010-03-05" = c(0.78, 0, 0), "2010-03-19" = c(0.78, 0, 0), "2010-05-14" = c(0L, 0L, 28L), "2010-12-10" = c(0, 1.56, 0)), class = "data.frame", row.names = c("1", "2", "3"))
dropped_columns <- structure(c(14645, 14652, 14680, 14694, 14701, 14708, 14715, 14722, 14729, 14736, 14750, 14757, 14764, 14771, 14778, 14785, 14792, 14799, 14806, 14813, 14820, 14827, 14834, 14841, 14848, 14855, 14862, 14869, 14876, 14883, 14890, 14897, 14904, 14911, 14918, 14925, 14932, 14939, 14946, 14960, 14967, 14974, 14981, 14988, 14995, 15002, 15009, 15016, 15023, 15030), class = "Date")

Related

format table to have mean (sd) instead of separate columns R

I have a data frame of several water quality measures. For each measure I have a calculated mean and SD, with a value for 6 sites and 4 seasons. Currently my dataframe holds the means in a column such as 'Temp_1' and the standard deviations in a column such as 'Temp_2'. I want to export the file with one column per water quality measure in the format mean (SD).
current output (screenshot in the original post)
This is an example for the first water measure, but I'd like the code to handle the remaining measures as well.
desired output (screenshot in the original post)
Head of dataframe
structure(list(season = structure(c(1L, 1L, 1L, 1L, 1L, 1L), levels = c("Winter",
"Spring", "Summer", "Autumn"), class = "factor"), Site = structure(1:6, levels = c("1",
"2", "3", "4", "5", "6"), class = "factor"), Temp_1 = c(7.2,
7.05, 6.3, 6.25, 6.2, 5.4), Temp_2 = c(1.55563491861041, 1.90918830920368,
1.69705627484771, 2.33345237791561, 2.40416305603426, 2.40416305603426
), pH_1 = c(7.435, 7.38, 7.52, 7.525, 7.38, 7.565), pH_2 = c(0.289913780286484,
0.282842712474619, 0.0989949493661164, 0.120208152801713, 0.0565685424949239,
0.261629509039023), DO_1 = c(9, 9.1, 8.25, 8.85, 9.25, 9), DO_2 = c(0,
0.424264068711928, 0.0707106781186558, 0.494974746830583, 0.636396103067892,
0.42426406871193), EC_1 = c(337.5, 333, 321.5, 322, 309, 300.5
), EC_2 = c(55.8614357137373, 41.0121933088198, 51.618795026618,
32.5269119345812, 25.4558441227157, 30.4055915910215), SS_1 = c(5.945,
3.65, 5.025, 2.535, 10.22, 4.595), SS_2 = c(0.728319984622144,
1.06066017177982, 2.93449314192417, 0.473761543394987, 8.23072293301141,
0.67175144212722), TP_1 = c(73.5, 75, 61.5, 66.5, 83, 87), TP_2 = c(3.53553390593274,
12.7279220613579, 9.19238815542512, 6.36396103067893, 26.8700576850888,
24.0416305603426), SRP_1 = c(19, 19, 10, 14, 13.5, 23.5), SRP_2 = c(2.82842712474619,
1.4142135623731, 2.82842712474619, 0, 0.707106781186548, 3.53553390593274
), PP_1 = c(54.5, 56, 51.5, 52.5, 69.5, 63.5), PP_2 = c(6.36396103067893,
11.3137084989848, 6.36396103067893, 6.36396103067893, 26.1629509039023,
20.5060966544099), DA_1 = c(0.083, 0.0775, 0.0775, 0.044, 0.059,
0.051), DA_2 = c(0.00282842712474619, 0.0120208152801713, 0.00919238815542513,
0.0014142135623731, 0.0127279220613579, 0.00848528137423857),
DNI_1 = c(0.048739437, 0.041015562, 0.0617723365, 0.0337441755,
0.041480944, 0.0143461675), DNI_2 = c(0.0345079125942686,
0.0223312453226695, 0.0187360224120165, 0.0162032493604065,
0.0258169069873252, 0.0202885446465761), DNA_1 = c(20.43507986,
20.438919615, 14.98692746, 19.953408625, 17.03060377, 8.5767502525
), DNA_2 = c(1.80288106961836, 1.2687128010491, 2.28839365291436,
1.03116172040732, 0.396528484042397, 1.72350828181138), DF_1 = c(0.0992379715,
0.0947268395, 0.094323125, 0.098064875, 0.0980304675, 0.085783911
), DF_2 = c(0.00372072305060515, 0.00724914346231915, 0.0142932471712976,
0.0116895470668939, 0.00255671780854136, 0.00830519117656529
), DC_1 = c(12.18685357, 12.73924378, 13.09550326, 13.417557825,
15.140975265, 21.429763715), DC_2 = c(0.57615880774946, 0.0430071960969884,
0.702539578486863, 0.134642528587041, 0.66786605299916, 0.17012889453292
), DS_1 = c(15.834380095, 15.69623116, 14.37636388, 15.444235935,
14.647596185, 11.9877372), DS_2 = c(1.67153135346354, 1.69978765863781,
2.47560570280853, 1.03831263471691, 1.24488755930594, 0.975483163720397
), DOC_1 = c(19.74, 20.08, 21.24, 20.34, 21.88, 24.92), DOC_2 = c(2.7435743110038,
1.69705627484772, 2.60215295476649, 1.04651803615609, 0.226274169979695,
0.452548339959388)), row.names = c(NA, 6L), class = "data.frame")
Using mutate() with across() and a small trick to pair up the _1/_2 columns, we can do it this way. Further adaptation is possible; for example, to keep only the mean_sd columns, use transmute() instead of mutate() (a sketch of that variant follows the output below).
Update:
library(dplyr)
library(stringr)
df %>%
  mutate(across(-c(season, Site), ~ round(., 2))) %>%
  mutate(across(ends_with('_1'),
                ~ paste0(., "(", get(str_replace(cur_column(), "_1$", "_2")), ")"),
                .names = "mean_sd_{.col}")) %>%
  rename_at(vars(starts_with('mean_sd')), ~ str_remove(., "\\_1"))
season Site Temp_1 Temp_2 pH_1 pH_2 DO_1 DO_2 EC_1 EC_2 SS_1 SS_2 TP_1 TP_2 SRP_1 SRP_2 PP_1 PP_2 DA_1 DA_2 DNI_1 DNI_2 DNA_1 DNA_2 DF_1
1 Winter 1 7.20 1.56 7.43 0.29 9.00 0.00 337.5 55.86 5.94 0.73 73.5 3.54 19.0 2.83 54.5 6.36 0.08 0.00 0.05 0.03 20.44 1.80 0.10
2 Winter 2 7.05 1.91 7.38 0.28 9.10 0.42 333.0 41.01 3.65 1.06 75.0 12.73 19.0 1.41 56.0 11.31 0.08 0.01 0.04 0.02 20.44 1.27 0.09
3 Winter 3 6.30 1.70 7.52 0.10 8.25 0.07 321.5 51.62 5.03 2.93 61.5 9.19 10.0 2.83 51.5 6.36 0.08 0.01 0.06 0.02 14.99 2.29 0.09
4 Winter 4 6.25 2.33 7.53 0.12 8.85 0.49 322.0 32.53 2.54 0.47 66.5 6.36 14.0 0.00 52.5 6.36 0.04 0.00 0.03 0.02 19.95 1.03 0.10
5 Winter 5 6.20 2.40 7.38 0.06 9.25 0.64 309.0 25.46 10.22 8.23 83.0 26.87 13.5 0.71 69.5 26.16 0.06 0.01 0.04 0.03 17.03 0.40 0.10
6 Winter 6 5.40 2.40 7.57 0.26 9.00 0.42 300.5 30.41 4.60 0.67 87.0 24.04 23.5 3.54 63.5 20.51 0.05 0.01 0.01 0.02 8.58 1.72 0.09
DF_2 DC_1 DC_2 DS_1 DS_2 DOC_1 DOC_2 mean_sd_Temp mean_sd_pH mean_sd_DO mean_sd_EC mean_sd_SS mean_sd_TP mean_sd_SRP mean_sd_PP mean_sd_DA
1 0.00 12.19 0.58 15.83 1.67 19.74 2.74 7.2(1.56) 7.43(0.29) 9(0) 337.5(55.86) 5.94(0.73) 73.5(3.54) 19(2.83) 54.5(6.36) 0.08(0)
2 0.01 12.74 0.04 15.70 1.70 20.08 1.70 7.05(1.91) 7.38(0.28) 9.1(0.42) 333(41.01) 3.65(1.06) 75(12.73) 19(1.41) 56(11.31) 0.08(0.01)
3 0.01 13.10 0.70 14.38 2.48 21.24 2.60 6.3(1.7) 7.52(0.1) 8.25(0.07) 321.5(51.62) 5.03(2.93) 61.5(9.19) 10(2.83) 51.5(6.36) 0.08(0.01)
4 0.01 13.42 0.13 15.44 1.04 20.34 1.05 6.25(2.33) 7.53(0.12) 8.85(0.49) 322(32.53) 2.54(0.47) 66.5(6.36) 14(0) 52.5(6.36) 0.04(0)
5 0.00 15.14 0.67 14.65 1.24 21.88 0.23 6.2(2.4) 7.38(0.06) 9.25(0.64) 309(25.46) 10.22(8.23) 83(26.87) 13.5(0.71) 69.5(26.16) 0.06(0.01)
6 0.01 21.43 0.17 11.99 0.98 24.92 0.45 5.4(2.4) 7.57(0.26) 9(0.42) 300.5(30.41) 4.6(0.67) 87(24.04) 23.5(3.54) 63.5(20.51) 0.05(0.01)
mean_sd_DNI mean_sd_DNA mean_sd_DF mean_sd_DC mean_sd_DS mean_sd_DOC
1 0.05(0.03) 20.44(1.8) 0.1(0) 12.19(0.58) 15.83(1.67) 19.74(2.74)
2 0.04(0.02) 20.44(1.27) 0.09(0.01) 12.74(0.04) 15.7(1.7) 20.08(1.7)
3 0.06(0.02) 14.99(2.29) 0.09(0.01) 13.1(0.7) 14.38(2.48) 21.24(2.6)
4 0.03(0.02) 19.95(1.03) 0.1(0.01) 13.42(0.13) 15.44(1.04) 20.34(1.05)
5 0.04(0.03) 17.03(0.4) 0.1(0) 15.14(0.67) 14.65(1.24) 21.88(0.23)
6 0.01(0.02) 8.58(1.72) 0.09(0.01) 21.43(0.17) 11.99(0.98) 24.92(0.45)
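As mentioned above, keeping only the combined columns just means swapping mutate() for transmute(); here is a minimal sketch of that variant, assuming the same df and _1/_2 naming as in the dput:
library(dplyr)
library(stringr)
df %>%
  mutate(across(-c(season, Site), ~ round(., 2))) %>%
  transmute(season, Site,
            across(ends_with('_1'),
                   ~ paste0(., " (", get(str_replace(cur_column(), "_1$", "_2")), ")"),
                   .names = "mean_sd_{.col}")) %>%
  rename_with(~ str_remove(., "_1$"), starts_with("mean_sd"))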
First answer:
We could do this like so:
library(dplyr)
df %>% mutate(mean_sd = paste0(Temp_1, " (", round(Temp_2,2), ")"), .before=5)
season Site Temp_1 Temp_2 mean_sd pH_1 pH_2 DO_1 DO_2 EC_1 EC_2 SS_1 SS_2 TP_1 TP_2 SRP_1 SRP_2 PP_1
1 Winter 1 7.20 1.555635 7.2 (1.56) 7.435 0.28991378 9.00 0.00000000 337.5 55.86144 5.945 0.7283200 73.5 3.535534 19.0 2.8284271 54.5
2 Winter 2 7.05 1.909188 7.05 (1.91) 7.380 0.28284271 9.10 0.42426407 333.0 41.01219 3.650 1.0606602 75.0 12.727922 19.0 1.4142136 56.0
3 Winter 3 6.30 1.697056 6.3 (1.7) 7.520 0.09899495 8.25 0.07071068 321.5 51.61880 5.025 2.9344931 61.5 9.192388 10.0 2.8284271 51.5
4 Winter 4 6.25 2.333452 6.25 (2.33) 7.525 0.12020815 8.85 0.49497475 322.0 32.52691 2.535 0.4737615 66.5 6.363961 14.0 0.0000000 52.5
5 Winter 5 6.20 2.404163 6.2 (2.4) 7.380 0.05656854 9.25 0.63639610 309.0 25.45584 10.220 8.2307229 83.0 26.870058 13.5 0.7071068 69.5
6 Winter 6 5.40 2.404163 5.4 (2.4) 7.565 0.26162951 9.00 0.42426407 300.5 30.40559 4.595 0.6717514 87.0 24.041631 23.5 3.5355339 63.5
PP_2 DA_1 DA_2 DNI_1 DNI_2 DNA_1 DNA_2 DF_1 DF_2 DC_1 DC_2 DS_1 DS_2 DOC_1
1 6.363961 0.0830 0.002828427 0.04873944 0.03450791 20.43508 1.8028811 0.09923797 0.003720723 12.18685 0.5761588 15.83438 1.6715314 19.74
2 11.313708 0.0775 0.012020815 0.04101556 0.02233125 20.43892 1.2687128 0.09472684 0.007249143 12.73924 0.0430072 15.69623 1.6997877 20.08
3 6.363961 0.0775 0.009192388 0.06177234 0.01873602 14.98693 2.2883937 0.09432312 0.014293247 13.09550 0.7025396 14.37636 2.4756057 21.24
4 6.363961 0.0440 0.001414214 0.03374418 0.01620325 19.95341 1.0311617 0.09806487 0.011689547 13.41756 0.1346425 15.44424 1.0383126 20.34
5 26.162951 0.0590 0.012727922 0.04148094 0.02581691 17.03060 0.3965285 0.09803047 0.002556718 15.14098 0.6678661 14.64760 1.2448876 21.88
6 20.506097 0.0510 0.008485281 0.01434617 0.02028854 8.57675 1.7235083 0.08578391 0.008305191 21.42976 0.1701289 11.98774 0.9754832 24.92
DOC_2
1 2.7435743
2 1.6970563
3 2.6021530
4 1.0465180
5 0.2262742
6 0.4525483
You can create a new column like this:
df$Temp <- paste0(df$Temp_1, ' (', df$Temp_2, ')')
and then select only the desired output columns:
df[, c('season', 'Site', 'Temp')]
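To extend that base-R idea to every measure at once, here is a hedged sketch; it assumes every measure follows the _1 mean / _2 SD naming from the dput, and measures/out are just illustrative names:
# strip the "_1" suffix to get the measure names
measures <- sub("_1$", "", grep("_1$", names(df), value = TRUE))
out <- df[, c("season", "Site")]
for (m in measures) {
  out[[m]] <- paste0(round(df[[paste0(m, "_1")]], 2),
                     " (", round(df[[paste0(m, "_2")]], 2), ")")
}
head(out)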
library(tidyverse)
df %>%
  pivot_longer(-c(season, Site)) %>%
  mutate(name = name %>% str_remove_all("[^a-zA-Z]")) %>%
  group_by(season, Site, name) %>%
  summarise(value = str_c(round(value, 2), collapse = ", ")) %>%
  pivot_wider(names_from = name,
              values_from = value)
# A tibble: 6 x 17
# Groups: season, Site [6]
season Site DA DC DF DNA DNI DO DOC DS EC pH PP SRP SS Temp TP
<fct> <fct> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Winter 1 0.08, 0 12.19, 0.58 0.1, 0 20.44, 1.8 0.05, 0.03 9, 0 19.7~ 15.8~ 337.~ 7.43~ 54.5~ 19, ~ 5.94~ 7.2,~ 73.5~
2 Winter 2 0.08, 0.01 12.74, 0.04 0.09, 0.01 20.44, 1.27 0.04, 0.02 9.1, 0.~ 20.0~ 15.7~ 333,~ 7.38~ 56, ~ 19, ~ 3.65~ 7.05~ 75, ~
3 Winter 3 0.08, 0.01 13.1, 0.7 0.09, 0.01 14.99, 2.29 0.06, 0.02 8.25, 0~ 21.2~ 14.3~ 321.~ 7.52~ 51.5~ 10, ~ 5.03~ 6.3,~ 61.5~
4 Winter 4 0.04, 0 13.42, 0.13 0.1, 0.01 19.95, 1.03 0.03, 0.02 8.85, 0~ 20.3~ 15.4~ 322,~ 7.53~ 52.5~ 14, 0 2.54~ 6.25~ 66.5~
5 Winter 5 0.06, 0.01 15.14, 0.67 0.1, 0 17.03, 0.4 0.04, 0.03 9.25, 0~ 21.8~ 14.6~ 309,~ 7.38~ 69.5~ 13.5~ 10.2~ 6.2,~ 83, ~
6 Winter 6 0.05, 0.01 21.43, 0.17 0.09, 0.01 8.58, 1.72 0.01, 0.02 9, 0.42 24.9~ 11.9~ 300.~ 7.57~ 63.5~ 23.5~ 4.6,~ 5.4,~ 87, ~
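Note that this last approach prints each pair as "mean, sd" rather than the requested "mean (SD)". Here is a sketch of the same pipeline with the summarise() line adjusted; it assumes pivot_longer() keeps the _1 value ahead of the _2 value within each group, which it does because columns are gathered in their original order:
library(tidyverse)
df %>%
  pivot_longer(-c(season, Site)) %>%
  mutate(name = str_remove_all(name, "[^a-zA-Z]")) %>%
  group_by(season, Site, name) %>%
  summarise(value = str_c(round(value[1], 2), " (", round(value[2], 2), ")"),
            .groups = "drop") %>%
  pivot_wider(names_from = name, values_from = value)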

How can I make a moving sum from a cell in R?

I have a dataframe looking like this:
date         P     >60?
03-31-2020   6.8   0
03-30-2020   5.0   0
03-29-2020   0.0   0
03-28-2020   0.0   0
03-27-2020   2.0   0
03-26-2020   0.0   0
03-25-2020   71.0  1
03-24-2020   2.0   0
03-23-2020   0.0   0
03-22-2020   23.8  0
03-21-2020   0.0   0
03-20-2020   23.8  0
Code to reproduce the dataframe:
df1 <- data.frame(date = c("03-31-2020", "03-30-2020", "03-29-2020", "03-28-2020",
                           "03-27-2020", "03-26-2020", "03-25-2020", "03-24-2020",
                           "03-23-2020", "03-22-2020", "03-21-2020", "03-20-2020"),
                  P = c(6.8, 5.0, 0.0, 0.0, 2.0, 0.0, 71.0, 2.0, 0.0, 23.8, 0.0, 23.8),
                  Sup60 = c(0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0))
I want to sum the P values for the N days before a day where P > 60.
For example, the first value above the barrier (greater than 60) is P = 71 on 03-25-2020; from that, I want to sum the 5 P values before that day:
2.0 + 0.0 + 23.8 + 0.0 + 23.8 = 49.6
It is a kind of moving sum, since the concept is similar to a moving average: instead of the average of the last 5 values, I want the sum of the 5 values before a value greater than 60.
How can I do this?
First we can work out how to calculate a running sum, and then apply an if_else() on that column. As a general rule, always split complex problems into smaller solvable ones.
library(tidyverse)
df_example <- tibble::tribble(
~date, ~P, ~`>60?`,
"03-31-2020", 6.8, 0L,
"03-30-2020", 5, 0L,
"03-29-2020", 0, 0L,
"03-28-2020", 0, 0L,
"03-27-2020", 2, 0L,
"03-26-2020", 0, 0L,
"03-25-2020", 71, 1L,
"03-24-2020", 2, 0L,
"03-23-2020", 0, 0L,
"03-22-2020", 23.8, 0L,
"03-21-2020", 0, 0L,
"03-20-2020", 23.8, 0L
)
# let's start with a simple running sum over the previous 5 values
jjj <- df_example |>
  arrange(date)

jjj |>
  mutate(running_sum = slider::slide_dbl(.x = P, .f = ~ sum(.x), .before = 5, .after = -1)) |>
  mutate(chosen_sum = if_else(P > 60, running_sum, NA_real_))
#> # A tibble: 12 x 5
#> date P `>60?` running_sum chosen_sum
#> <chr> <dbl> <int> <dbl> <dbl>
#> 1 03-20-2020 23.8 0 0 NA
#> 2 03-21-2020 0 0 23.8 NA
#> 3 03-22-2020 23.8 0 23.8 NA
#> 4 03-23-2020 0 0 47.6 NA
#> 5 03-24-2020 2 0 47.6 NA
#> 6 03-25-2020 71 1 49.6 49.6
#> 7 03-26-2020 0 0 96.8 NA
#> 8 03-27-2020 2 0 96.8 NA
#> 9 03-28-2020 0 0 75 NA
#> 10 03-29-2020 0 0 75 NA
#> 11 03-30-2020 5 0 73 NA
#> 12 03-31-2020 6.8 0 7 NA
Created on 2021-10-20 by the reprex package (v2.0.1)
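For completeness, a small base-R sketch of the same computation using df1 from the question (idx is just an illustrative name; the rows are ordered newest first, so the five previous days of row i are rows i+1 to i+5):
idx <- which(df1$P > 60)
sapply(idx, function(i) sum(df1$P[(i + 1):min(i + 5, nrow(df1))]))
# [1] 49.6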

How to convert long form to wide form based on category in R

I have the following data.
name x1 x2 x3 x4
1 V1_3 1 0 999 999
2 V2_3 1.12 0.044 25.4 0
3 V3_3 0.917 0.045 20.4 0
4 V1_15 1 0 999 999
5 V2_15 1.07 0.036 29.8 0
6 V3_15 0.867 0.039 22.5 0
7 V1_25 1 0 999 999
8 V2_25 1.07 0.034 31.1 0
9 V3_25 0.917 0.037 24.6 0
10 V1_35 1 0 999 999
11 V2_35 1.05 0.034 31.2 0
12 V3_35 0.994 0.037 26.6 0
13 V1_47 1 0 999 999
14 V2_47 1.03 0.031 33.6 0
15 V3_47 0.937 0.034 27.4 0
16 V1_57 1 0 999 999
17 V2_57 1.13 0.036 31.9 0
18 V3_57 1.03 0.037 28.1 0
I want to convert this data to the following format. Can someone give me some suggestions, please?
name est_3 est_15 est_25 est_35 est_47 est_57
1 V2 1.12 1.07 1.07 1.05 1.03 1.13
2 V3 0.917 0.867 0.917 0.994 0.937 1.03
Here is one approach for you. Your data is called mydf here. First, you want to choose the necessary columns (i.e., name and x1) using select(). Then, you want to subset rows using filter(), grabbing the rows whose names begin with V2 or V3; grepl() checks whether each string matches the pattern. Then, you want to split the column name into two columns (name and est) using separate(). Finally, you want to convert the data to wide format using pivot_wider().
library(dplyr)
library(tidyr)
select(mydf, name:x1) %>%
  filter(grepl(x = name, pattern = "^V[2|3]")) %>%
  separate(col = name, into = c("name", "est"), sep = "_") %>%
  pivot_wider(names_from = "est", values_from = "x1", names_prefix = "est_")
# name est_3 est_15 est_25 est_35 est_47 est_57
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 V2 1.12 1.07 1.07 1.05 1.03 1.13
#2 V3 0.917 0.867 0.917 0.994 0.937 1.03
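One small aside on the pattern used above: inside a character class, | is a literal character, so "^V[2|3]" only works here because no name contains a literal |; "^V[23]_" states the intent more precisely. A quick check against the mydf defined below:
grepl("^V[23]_", mydf$name)
# TRUE for the V2_* and V3_* rows, FALSE for the V1_* rows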
For your reference, when you ask questions, you want to provide minimal sample data and code. If you do that, SO users can more easily help you out. Please read this question.
DATA
mydf <- structure(list(name = c("V1_3", "V2_3", "V3_3", "V1_15", "V2_15",
"V3_15", "V1_25", "V2_25", "V3_25", "V1_35", "V2_35", "V3_35",
"V1_47", "V2_47", "V3_47", "V1_57", "V2_57", "V3_57"), x1 = c(1,
1.122, 0.917, 1, 1.069, 0.867, 1, 1.066, 0.917, 1, 1.048, 0.994,
1, 1.03, 0.937, 1, 1.133, 1.032), x2 = c(0, 0.044, 0.045, 0,
0.036, 0.039, 0, 0.034, 0.037, 0, 0.034, 0.037, 0, 0.031, 0.034,
0, 0.036, 0.037), x3 = c(999, 25.446, 20.385, 999, 29.751, 22.478,
999, 31.134, 24.565, 999, 31.18, 26.587, 999, 33.637, 27.405,
999, 31.883, 28.081), x4 = c(999, 0, 0, 999, 0, 0, 999, 0, 0,
999, 0, 0, 999, 0, 0, 999, 0, 0)), row.names = c(NA, -18L), class = c("tbl_df",
"tbl", "data.frame"))

transform data.frame matrix-column into columns

When using aggregate() with a compound function, the resulting data.frame contains matrix-columns.
ta <- aggregate(cbind(precision, result, prPo) ~ rstx + qx + laplace, t0,
                function(x) c(x = mean(x), m = min(x), M = max(x)))
ta <- head(ta)
dput(ta)
structure(list(rstx = c(3, 3, 2, 3, 2, 3), qx = c(0.2, 0.25,
0.3, 0.3, 0.33, 0.33), laplace = c(0, 0, 0, 0, 0, 0), precision = structure(c(0.174583333333333,
0.186833333333333, 0.3035, 0.19175, 0.30675, 0.193666666666667,
0.106, 0.117, 0.213, 0.101, 0.22, 0.109, 0.212, 0.235, 0.339,
0.232, 0.344, 0.232), .Dim = c(6L, 3L), .Dimnames = list(NULL,
c("x", "m", "M"))), result = structure(c(-142.333333333333,
-108.316666666667, -69.1, -85.7, -59.1666666666667, -68.5666666666667,
-268.8, -198.2, -164, -151.6, -138.2, -144.8, -30.8, -12.2, -14.2,
-3.8, -12.6, -3.4), .Dim = c(6L, 3L), .Dimnames = list(NULL,
c("x", "m", "M"))), prPo = structure(c(3.68416666666667,
3.045, 2.235, 2.53916666666667, 2.0775, 2.23666666666667, 1.6,
1, 1.02, 0.54, 0.87, 0.31, 5.04, 4.02, 2.77, 3.53, 2.63, 3.25
), .Dim = c(6L, 3L), .Dimnames = list(NULL, c("x", "m", "M")))), .Names = c("rstx",
"qx", "laplace", "precision", "result", "prPo"), row.names = c(NA,
6L), class = "data.frame")
Is there a function that transforms a data.frame's matrix-columns into ordinary columns?
Manually, for each matrix-column, a column bind plus a column delete works:
colnames(ta)
[1] "rstx" "qx" "laplace" "precision" "result" "prPo"
ta[,"precision"] # ta[,4]
x m M
[1,] 0.1745833 0.106 0.212
[2,] 0.1868333 0.117 0.235
[3,] 0.3035000 0.213 0.339
[4,] 0.1917500 0.101 0.232
[5,] 0.3067500 0.220 0.344
[6,] 0.1936667 0.109 0.232
#column bind + column delete
ta=cbind(ta,precision=ta[,4])
ta=ta[,-4]
colnames(ta)
[1] "rstx" "qx" "laplace" "result" "prPo" "precision.x" "precision.m"
[8] "precision.M"
ta
rstx qx laplace result.x result.m result.M prPo.x prPo.m prPo.M precision.x precision.m
1 3 0.20 0 -142.33333 -268.80000 -30.80000 3.684167 1.600000 5.040000 0.1745833 0.106
2 3 0.25 0 -108.31667 -198.20000 -12.20000 3.045000 1.000000 4.020000 0.1868333 0.117
3 2 0.30 0 -69.10000 -164.00000 -14.20000 2.235000 1.020000 2.770000 0.3035000 0.213
4 3 0.30 0 -85.70000 -151.60000 -3.80000 2.539167 0.540000 3.530000 0.1917500 0.101
5 2 0.33 0 -59.16667 -138.20000 -12.60000 2.077500 0.870000 2.630000 0.3067500 0.220
6 3 0.33 0 -68.56667 -144.80000 -3.40000 2.236667 0.310000 3.250000 0.1936667 0.109
precision.M
1 0.212
2 0.235
3 0.339
4 0.232
5 0.344
6 0.232
A matrix can't contain matrix-columns, so as.matrix() breaks them up when converting the data.frame into a matrix; converting back then leaves plain columns.
Here is my idea:
library(tidyverse)
ta2 <- ta %>%
  as.matrix() %>%
  as.data.frame()
Somewhere on Stack Overflow I found a very simple solution:
cbind(ta[-ncol(ta)],ta[[ncol(ta)]])
rstx qx laplace precision.x precision.m precision.M result.x result.m result.M x m
1 3 0.20 0 0.1745833 0.1060000 0.2120000 -142.33333 -268.80000 -30.80000 3.684167 1.60
2 3 0.25 0 0.1868333 0.1170000 0.2350000 -108.31667 -198.20000 -12.20000 3.045000 1.00
3 2 0.30 0 0.3035000 0.2130000 0.3390000 -69.10000 -164.00000 -14.20000 2.235000 1.02
4 3 0.30 0 0.1917500 0.1010000 0.2320000 -85.70000 -151.60000 -3.80000 2.539167 0.54
5 2 0.33 0 0.3067500 0.2200000 0.3440000 -59.16667 -138.20000 -12.60000 2.077500 0.87
6 3 0.33 0 0.1936667 0.1090000 0.2320000 -68.56667 -144.80000 -3.40000 2.236667 0.31
M
1 5.04
2 4.02
3 2.77
4 3.53
5 2.63
6 3.25
Just that!
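Another common idiom worth sketching here (it is not from the original answers; ta_flat is just an illustrative name): rebuilding the data.frame with do.call() splits every matrix-column at once and keeps the prefixed names.
ta_flat <- do.call(data.frame, ta)
names(ta_flat)
# e.g. "rstx" "qx" "laplace" "precision.x" "precision.m" "precision.M" ...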

loops in R, finding the mean for one column depends on another column

So my test data looks like this:
structure(list(day = c(1L, 1L, 2L, 2L, 2L, 3L, 3L, 4L, 4L, 4L
), Left = c(0.25, 0.33, 0, 0, 0.25, 0.33, 0.5, 0.33, 0.5, 0),
Left1 = c(NA, NA, 0, 0.5, 0.25, 0.33, 0.1, 0.33, 0.5, 0),
Middle = c(0, 0, 0.3, 0, 0.25, 0, 0.3, 0.33, 0, 0), Right = c(0.25,
0.33, 0.3, 0.5, 0.25, 0.33, 0.1, 0, 0, 0.25), Right1 = c(0.5,
0.33, 0.3, 0, 0, 0, 0, 0, 0, 0.75), Side = structure(c(2L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L), .Label = c("L", "R"), class = "factor")), .Names = c("day",
"Left", "Left1", "Middle", "Right", "Right1", "Side"), class = "data.frame", row.names = c(NA,
-10L))
or this:
day Left Left1 Middle Right Right1 Side
1 0.25 NA 0.00 0.25 0.50 R
1 0.33 NA 0.00 0.33 0.33 R
2 0.00 0.00 0.30 0.30 0.30 R
2 0.00 0.50 0.00 0.50 0.00 R
2 0.25 0.25 0.25 0.25 0.00 L
3 0.33 0.33 0.00 0.33 0.00 L
I would like to write a loop to find the standard error and average value for each day on the chosen side.
OK, so far I have this code:
td <- read.csv('test data.csv')
IDs <- unique(td$day)
se <- function(x) sqrt(var(x)/length(x))
for (i in 1:length(IDs)) {
  day.i <- which(td$day == IDs[i])
  td.i <- td[day.i, ]
  if (td$Side == 'L') {
    side <- cbind(td.i$Left + td.i$Left1)
  } else {
    side <- cbind(td.i$Right + td.i$Right1)
  }
  mean(side)
  se(side)
  print(mean)
  print(se)
}
But I am getting error messages like this:
Error: unexpected '}' in "}"
Obviously, I am also not getting the printout of means for each day. Does anyone know why?
I'm also working on this here: http://www.talkstats.com/showthread.php/27187-Writing-a-mean-loop..-(literally)
Convert your data into a list and work with that instead:
First, split up your data into a list according to Side, subsetting the relevant columns along the way.
td = split(td, td$Side)
NAMES = names(td)
td = lapply(1:length(td),
            function(x) td[[x]][c(1, grep(NAMES[x], names(td[[x]])))])
names(td) = NAMES
td
# $L
# day Left Left1
# 5 2 0.25 0.25
# 6 3 0.33 0.33
# 7 3 0.50 0.10
# 8 4 0.33 0.33
# 9 4 0.50 0.50
#
# $R
# day Right Right1
# 1 1 0.25 0.50
# 2 1 0.33 0.33
# 3 2 0.30 0.30
# 4 2 0.50 0.00
# 10 4 0.25 0.75
Then, use lapply and aggregate to apply whatever functions you want to your data.
lapply(1:length(td),
       function(x) aggregate(list(td[[x]][-1]),
                             list(day = td[[x]]$day), mean))
# [[1]]
# day Left Left1
# 1 2 0.250 0.250
# 2 3 0.415 0.215
# 3 4 0.415 0.415
#
# [[2]]
# day Right Right1
# 1 1 0.29 0.415
# 2 2 0.40 0.150
# 3 4 0.25 0.750
Still not entirely sure if I understand (that is, whether you want the mean and SE for both Left and Left1, or some sort of combination like their sum). This is how I interpreted your question:
FUN <- function(dat, side = "L") {
  DF <- split(dat, dat$Side)[[side]]
  ind <- if (side == "L") 2:3 else 5:6
  stderr <- function(x) sqrt(var(x)/length(x))
  meanNse <- function(x) c(mean = mean(x), se = stderr(x))
  OUT <- aggregate(DF[, ind], list(DF[, 1]), meanNse)
  names(OUT)[1] <- "day"
  return(OUT)
}
# test it
FUN(td)
FUN(td, "R")
Which yields:
> FUN(td)
day Left.mean Left.se Left1.mean Left1.se
1 2 0.250 NA 0.250 NA
2 3 0.415 0.085 0.215 0.115
3 4 0.415 0.085 0.415 0.085
> FUN(td, "R")
day Right.mean Right.se Right1.mean Right1.se
1 1 0.29 0.04 0.415 0.085
2 2 0.40 0.10 0.150 0.150
3 4 0.25 NA 0.750 NA
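Finally, a minimal sketch of the original loop with its two apparent bugs fixed: if(td$Side == 'L') tested the whole Side column rather than the rows being processed, and print(mean) / print(se) printed the functions themselves instead of the computed values. It assumes the td object from the dput above and keeps the asker's Left + Left1 / Right + Right1 combination:
se <- function(x) sqrt(var(x, na.rm = TRUE) / sum(!is.na(x)))  # SE with NAs dropped (Left1 is NA on day 1)
for (d in unique(td$day)) {
  td.i <- td[td$day == d, ]
  # pick the left or right pair row by row, based on each row's Side value
  vals <- ifelse(td.i$Side == "L",
                 td.i$Left  + td.i$Left1,
                 td.i$Right + td.i$Right1)
  cat("day", d, "mean:", mean(vals, na.rm = TRUE), "se:", se(vals), "\n")
}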
