I am trying to calculate cumulative acetone and acetaldehyde emissions from different soil incubations across three time points. Emissions of the compounds were measured from six soils (of different soil_types) on three days. I wish to calculate the cumulative emission for each soil at each time point.
The end goal is to calculate the average emission from all soils and present a graph similar to this one (except there should be error bars on my graph):
Can anyone spot where I'm going wrong?
Here's the code:
library(tidyverse)
library(plotrix)
df %>%
  group_by(soil, compound, days) %>%
  mutate(cum_emission = cumsum(emission)) %>%
  summarise(mean = mean(cum_emission, na.rm = TRUE),
            sd = sd(cum_emission, na.rm = TRUE),
            se = std.error(cum_emission, na.rm = TRUE))
Here's the data:
df <- structure(list(days = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4,
4, 4, 4, 4, 4, 4, 4, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,
10, 10, 4, 4, 4, 4), soil = c(12, 12, 2, 2, 1, 1, 9, 9, 13, 13,
3, 3, 12, 12, 2, 2, 1, 1, 9, 9, 12, 12, 2, 2, 1, 1, 9, 9, 13,
13, 3, 3, 13, 13, 3, 3), soil_type = c("organic", "organic",
"mineral", "mineral", "mineral", "mineral", "organic", "organic",
"organic", "organic", "mineral", "mineral", "organic", "organic",
"mineral", "mineral", "mineral", "mineral", "organic", "organic",
"organic", "organic", "mineral", "mineral", "mineral", "mineral",
"organic", "organic", "organic", "organic", "mineral", "mineral",
"organic", "organic", "mineral", "mineral"), compound = c("Acetone",
"Acetaldehyde", "Acetone", "Acetaldehyde", "Acetone", "Acetaldehyde",
"Acetone", "Acetaldehyde", "Acetone", "Acetaldehyde", "Acetone",
"Acetaldehyde", "Acetone", "Acetaldehyde", "Acetone", "Acetaldehyde",
"Acetone", "Acetaldehyde", "Acetone", "Acetaldehyde", "Acetone",
"Acetaldehyde", "Acetone", "Acetaldehyde", "Acetone", "Acetaldehyde",
"Acetone", "Acetaldehyde", "Acetone", "Acetaldehyde", "Acetone",
"Acetaldehyde", "Acetone", "Acetaldehyde", "Acetone", "Acetaldehyde"
), emission = c(0.01, 0, 0.03, 0.03, 0.07, 0.06, 0.33, 0.1, 0.02,
0.01, 0.01, 0, 0.02, 0.01, 0.07, 0.08, 0.09, 0.07, 0.32, 0.22,
0.01, 0, 0.06, 0.06, 0.08, 0.06, 0.23, 0.14, 0.4, 0.04, 0.14,
0, 0.05, 0.05, 0.14, 0)), row.names = c(NA, -36L), class = c("tbl_df",
"tbl", "data.frame"))
This only addresses the setup of the data, not the plotting. (Sorry for the partial answer!)
You wrote that you wanted to group by soil, compound, and days; did you mean soil_type, compound, and days? As @maarvd pointed out, grouping by soil makes every row unique.
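A quick added check confirms this: every soil/compound/days combination occurs exactly once, so sd() and std.error() of each one-row group are NA.
df %>%
  count(soil, compound, days) %>%  # 36 groups, one per row of df
  summarise(max_n = max(n))        # max_n is 1, i.e. one row per group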
When I modified the content to
df %>%
  group_by(soil_type, compound, days) %>%
  mutate(cum_emission = cumsum(emission)) %>%
  summarise(mean = mean(cum_emission, na.rm = TRUE),
            sd = sd(cum_emission, na.rm = TRUE),
            se = std.error(cum_emission, na.rm = TRUE))
I was able to render the following results
# A tibble: 12 x 6
# Groups: soil_type, compound [4]
soil_type compound days mean sd se
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 mineral Acetaldehyde 0 0.0700 0.0346 0.02
2 mineral Acetaldehyde 4 0.127 0.0404 0.0233
3 mineral Acetaldehyde 10 0.10 0.0346 0.02
4 mineral Acetone 0 0.08 0.0436 0.0252
5 mineral Acetone 4 0.177 0.116 0.0669
6 mineral Acetone 10 0.16 0.111 0.0643
7 organic Acetaldehyde 0 0.07 0.0608 0.0351
8 organic Acetaldehyde 4 0.173 0.144 0.0829
9 organic Acetaldehyde 10 0.107 0.0945 0.0546
10 organic Acetone 0 0.237 0.197 0.113
11 organic Acetone 4 0.25 0.201 0.116
12 organic Acetone 10 0.297 0.319 0.184
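Since the question also asks for error bars on the final graph, here is a minimal plotting sketch (an added illustration, not part of the original answer) that draws the summarised means with mean +/- 1 SE bars, faceted by compound:
# ggplot2 is attached via library(tidyverse) above
res <- df %>%
  group_by(soil_type, compound, days) %>%
  mutate(cum_emission = cumsum(emission)) %>%
  summarise(mean = mean(cum_emission, na.rm = TRUE),
            se = std.error(cum_emission, na.rm = TRUE),
            .groups = "drop")
ggplot(res, aes(x = days, y = mean, colour = soil_type)) +
  geom_line() +
  geom_point() +
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se), width = 0.3) +  # +/- 1 SE
  facet_wrap(~ compound)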
** changes based on @Tiptop's comment
If you're looking for cumulative moving averages, how about this?
I'm sure I didn't originally write all of this, but wherever it originated, I've repurposed it many times.
You won't need plotrix, but you will need the tidyquant library.
library(tidyverse)
library(tidyquant)
UDF_roll <- function(x, na.rm = TRUE) {
m <- mean(x, na.rm = na.rm) # calculate the average (for the rolling average)
s <- sd(x, na.rm = na.rm) # calculate the sd to find the confidence interval
hi <- m + 2*s # CI HI
lo <- m - 2*s # CI Low
vals <- c(Mean = m, SD = s, HI.95 = hi, LO.95 = lo)
return(vals)
}
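As an added illustration, calling the helper on a small vector returns the four named statistics:
UDF_roll(c(0.1, 0.2, 0.3))
#  Mean    SD HI.95 LO.95
#   0.2   0.1   0.4   0.0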
# loop over each compound (I'm assuming that the data you provided is a sample and you have more)
trends <- vector("list")  # empty list to store the results
cp <- unique(df$compound) # the unique compound names
for (i in seq_along(cp)) {                      # loop through each compound
  trends[[i]] <- df %>% as.data.frame() %>%     # add results to the list
    filter(compound == cp[i]) %>%               # for one compound
    arrange(days) %>%
    # the rolling function requires a time series with a date, so arbitrary dates are added as a controller
    mutate(time = seq(as.Date("2010/1/1"),
                      by = "month",
                      length.out = nrow(.)),
           cum_emission = cumsum(emission)) %>%
    arrange(compound, -days) %>%                # most recent on top for the time series
    tq_mutate(select = cum_emission,            # collect mean, sd, error
              mutate_fun = rollapply,
              width = 2,                        # 2: current & previous reading
              align = "right",
              by.column = FALSE,
              FUN = UDF_roll,                   # calls the helper UDF_roll defined above
              na.rm = TRUE) %>%
    ggplot(aes(x = seq_along(time))) +
    geom_point(aes(y = cum_emission),
               color = "black", alpha = 0.2) +  # cumulative emission
    geom_ribbon(aes(ymin = LO.95, ymax = HI.95),
                fill = "azure3", alpha = 0.4) + # confidence interval
    geom_jitter(aes(y = Mean, color = Mean),
                size = 1, alpha = 0.9) +        # rolling average
    labs(title = paste0(cp[[i]], ": Trends and Volatility\nIncremental Moving Average with 95% CI Bands (+/-2 SD)"),
         x = "", y = "Soil Emissions") +
    scale_color_viridis_c(end = .8) + theme_bw() +
    theme(legend.position = "none")
}
trends[[1]]
trends[[2]]
trends[[1]]$data # you can NULL the time column if you use the data another way
This makes the data a time series; the plots for each compound are in trends[[1]] and trends[[2]].
The data is shown below. If you want to group it differently, you'll have to add the argument .groups = "drop" to the summarise() call, or you won't be able to get it through tq_mutate.
# A tibble: 18 x 11
days soil soil_type compound emission time cum_emission Mean SD HI.95 LO.95
<dbl> <dbl> <chr> <chr> <dbl> <date> <dbl> <dbl> <dbl> <dbl> <dbl>
1 0 12 organic Acetone 0.01 2010-01-01 0.01 NA NA NA NA
2 0 2 mineral Acetone 0.03 2010-02-01 0.04 0.025 0.0212 0.0674 -0.0174
3 0 1 mineral Acetone 0.07 2010-03-01 0.11 0.075 0.0495 0.174 -0.0240
4 0 9 organic Acetone 0.33 2010-04-01 0.44 0.275 0.233 0.742 -0.192
5 0 13 organic Acetone 0.02 2010-05-01 0.46 0.45 0.0141 0.478 0.422
6 0 3 mineral Acetone 0.01 2010-06-01 0.47 0.465 0.00707 0.479 0.451
7 4 12 organic Acetone 0.02 2010-07-01 0.49 0.48 0.0141 0.508 0.452
8 4 2 mineral Acetone 0.07 2010-08-01 0.56 0.525 0.0495 0.624 0.426
9 4 1 mineral Acetone 0.09 2010-09-01 0.65 0.605 0.0636 0.732 0.478
10 4 9 organic Acetone 0.32 2010-10-01 0.97 0.81 0.226 1.26 0.357
11 4 13 organic Acetone 0.05 2010-11-01 1.02 0.995 0.0354 1.07 0.924
12 4 3 mineral Acetone 0.14 2010-12-01 1.16 1.09 0.0990 1.29 0.892
13 10 12 organic Acetone 0.01 2011-01-01 1.17 1.16 0.00707 1.18 1.15
14 10 2 mineral Acetone 0.06 2011-02-01 1.23 1.2 0.0424 1.28 1.12
15 10 1 mineral Acetone 0.08 2011-03-01 1.31 1.27 0.0566 1.38 1.16
16 10 9 organic Acetone 0.23 2011-04-01 1.54 1.42 0.163 1.75 1.10
17 10 13 organic Acetone 0.4 2011-05-01 1.94 1.74 0.283 2.31 1.17
18 10 3 mineral Acetone 0.14 2011-06-01 2.08 2.01 0.0990 2.21 1.81
Based on the data below, how can I remove the rows with duplicate X and Y coordinates? In the example below, you will notice that one of the X coordinates, -1.52, appears twice, but it's not a duplicate since its corresponding Y coordinates are different.
I don't know if it matters, but please note that the original dataset has more than 2 decimal places for the X and Y values.
Sample data:
structure(list(id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), X = c(-1.01,
-1.11, -1.11, -2.13, -2.13, -1.52, -1.52, -1.98, -3.69, -4.79),
Y = c(2.11, 3.33, 3.33, 6.66, 6.66, 7.77, 8.88, 9.99, 1.11,
6.68)), class = "data.frame", row.names = c(NA, -10L))
Desired data:
id X Y
1 -1.01 2.11
2 -1.11 3.33
4 -2.13 6.66
6 -1.52 7.77
7 -1.52 8.88
8 -1.98 9.99
9 -3.69 1.11
10 -4.79 6.68
Use duplicated
subset(df1, !duplicated(df1[-1]))
-output
id X Y
1 1 -1.01 2.11
2 2 -1.11 3.33
4 4 -2.13 6.66
6 6 -1.52 7.77
7 7 -1.52 8.88
8 8 -1.98 9.99
9 9 -3.69 1.11
10 10 -4.79 6.68
Or with distinct
library(dplyr)
df1 %>%
distinct(X, Y, .keep_all = TRUE)
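Since the original data has more than two decimal places, rows that look identical at the displayed precision may differ at full precision. If rows should be deduplicated at two decimals (an assumption about the intent, not part of the original answer), round first; Xr and Yr are hypothetical helper columns:
df1 %>%
  mutate(Xr = round(X, 2), Yr = round(Y, 2)) %>%  # hypothetical rounded keys
  distinct(Xr, Yr, .keep_all = TRUE) %>%
  select(-Xr, -Yr)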
I'm trying to split a data frame from long to wide format by converting selected rows to columns. Here is the current general long-format structure:
data_long <- data.frame(
id = c("kelp","kelp","fish","fish","beach","beach","kelp","kelp","fish","fish","beach","beach"),
desig = c("mpa","reference","mpa","reference","mpa","reference","mpa","reference","mpa","reference","mpa","reference"),
indicator = c("density","density","density","density","density","density","biomass","biomass","biomass","biomass","biomass","biomass"),
n = c(1118,1118,1118,1118,1118,1118,1118,1118,1118,1118,1118,1118),
m = c(0.35, 4.28, 1.16, 106.35, 13.44,0.63,0.35, 4.28, 1.16, 106.35, 13.44,0.63),
sd = c(1.19, 8.48, 4.25, 118, 31.77,2.79,1.19, 8.48, 4.25, 118, 31.77,2.79)
)
data_long
I want to keep id and indicator, split by "desig", and move "n", "m", and "sd" into new columns. The final data frame structure I'm trying to obtain is:
data_wide <- data.frame(
id = c("kelp","fish","beach","kelp","fish","beach"),
indicator = c("density","density","density","biomass","biomass","biomass"),
mpa.n = c(1118,1118,1118,1118,1118,1118),
mpa.m = c(0.35, 4.28, 1.16, 106.35, 13.44,0.63),
mpa.sd = c(1.19, 8.48, 4.25, 118, 31.77,2.79),
reference.n = c(1118,1118,1118,1118,1118,1118),
reference.m = c(0.35, 4.28, 1.16, 106.35, 13.44,0.63),
reference.sd = c(1.19, 8.48, 4.25, 118, 31.77,2.79)
)
data_wide
I can't seem to get this right using reshape2. Any suggestions?
We may use pivot_wider
library(tidyr)
library(dplyr)
pivot_wider(data_long, names_from = desig,
            values_from = c(n, m, sd),
            names_glue = "{desig}.{.value}") %>%
  select(id, indicator, starts_with("mpa"), starts_with("reference"))
-output
# A tibble: 6 × 8
id indicator mpa.n mpa.m mpa.sd reference.n reference.m reference.sd
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 kelp density 1118 0.35 1.19 1118 4.28 8.48
2 fish density 1118 1.16 4.25 1118 106. 118
3 beach density 1118 13.4 31.8 1118 0.63 2.79
4 kelp biomass 1118 0.35 1.19 1118 4.28 8.48
5 fish biomass 1118 1.16 4.25 1118 106. 118
6 beach biomass 1118 13.4 31.8 1118 0.63 2.79
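Since the question mentions reshape2, here is a sketch of the same reshape using melt()/dcast(), added for comparison (pivot_wider is the more current tool). Note that reshape2 joins the new column names with "_" (mpa_n, mpa_m, ...) rather than ".", so a rename would be needed to match the target exactly:
library(reshape2)
# melt n, m and sd into long form, then cast desig x measure into new columns
long <- melt(data_long, id.vars = c("id", "indicator", "desig"))
dcast(long, id + indicator ~ desig + variable, value.var = "value")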
My data set is about forest fires and NDVI values (a value ranging from 0 to 1 that indicates how green the surface is). It has an initial column which says when the forest fire of that row took place, and subsequent columns indicating the NDVI value on different dates, before and after the fire happened. NDVI values before the fire are substantially higher than values after the fire. Something like:
data1989 <- data.frame("date_fire" = c("1987-01-01", "1987-07-03", "1988-01-01"),
"1986-01-01" = c(0.5, 0.589, 0.66),
"1986-06-03" = c(0.56, 0.447, 0.75),
"1986-10-19" = c(0.8, NA, 0.83),
"1987-01-19" = c(0.75, 0.65,0.75),
"1987-06-19" = c(0.1, 0.55,0.811),
"1987-10-19" = c(0.15, 0.12, 0.780),
"1988-01-19" = c(0.2, 0.22,0.32),
"1988-06-19" = c(0.18, 0.21,0.23),
"1988-10-19" = c(0.21, 0.24, 0.250),
stringsAsFactors = FALSE)
> data1989
date_fire X1986.01.01 X1986.06.03 X1986.10.19 X1987.01.19 X1987.06.19 X1987.10.19 X1988.01.19 X1988.06.19 X1988.10.19
1 1987-01-01 0.500 0.560 0.80 0.75 0.100 0.15 0.20 0.18 0.21
2 1987-07-03 0.589 0.447 NA 0.65 0.550 0.12 0.22 0.21 0.24
3 1988-01-01 0.660 0.750 0.83 0.75 0.811 0.78 0.32 0.23 0.25
I would like to compute, in a new column, the average of the NDVI values PRIOR to the forest fire. For row one, that would be the average of columns 2, 3, 4 and 5.
What I need to get is:
date_fire X1986.01.01 X1986.06.03 X1986.10.19 X1987.01.19 X1987.06.19 X1987.10.19 X1988.01.19 X1988.06.19 X1988.10.19 meanPreFire
1 1987-01-01 0.500 0.560 0.80 0.75 0.100 0.15 0.20 0.18 0.21 0.653
2 1987-07-03 0.589 0.447 NA 0.65 0.550 0.12 0.22 0.21 0.24 0.559
3 1988-01-01 0.660 0.750 0.83 0.75 0.811 0.78 0.32 0.23 0.25 0.764
Thanks!
EDIT: SOLUTION
How to adapt the code when there is more than one column to exclude:
data1989 <- data.frame("date_fire" = c("1987-02-01", "1987-07-03", "1988-01-01"),
"type" = c("oak", "pine", "oak"),
"meanRainfall" = c(600, 300, 450),
"1986.01.01" = c(0.5, 0.589, 0.66),
"1986.06.03" = c(0.56, 0.447, 0.75),
"1986.10.19" = c(0.8, NA, 0.83),
"1987.01.19" = c(0.75, 0.65,0.75),
"1987.06.19" = c(0.1, 0.55,0.811),
"1987.10.19" = c(0.15, 0.12, 0.780),
"1988.01.19" = c(0.2, 0.22,0.32),
"1988.06.19" = c(0.18, 0.21,0.23),
"1988.10.19" = c(0.21, 0.24, 0.250),
check.names = FALSE,
stringsAsFactors = FALSE)
Using:
j1 <- findInterval(as.Date(data1989$date_fire), as.Date(names(data1989)[-(1:3)],format="%Y.%m.%d"))
m1 <- cbind(rep(seq_len(nrow(data1989)), j1), sequence(j1))
data1989$meanPreFire <- tapply(data1989[-(1:3)][m1], m1[,1], FUN = mean, na.rm = TRUE)
> data1989
date_fire type meanRainfall 1986.01.01 1986.06.03 1986.10.19 1987.01.19 1987.06.19 1987.10.19 1988.01.19 1988.06.19 1988.10.19 meanPreFire
1 1987-02-01 oak 600 0.500 0.560 0.80 0.75 0.100 0.15 0.20 0.18 0.21 0.6525
2 1987-07-03 pine 300 0.589 0.447 NA 0.65 0.550 0.12 0.22 0.21 0.24 0.5590
3 1988-01-01 oak 450 0.660 0.750 0.83 0.75 0.811 0.78 0.32 0.23 0.25 0.7635
Reshape data to the long form and filter dates prior to the forest fire.
library(tidyverse)
data1989 %>%
  pivot_longer(-date_fire, names_to = "date") %>%
  mutate(date_fire = as.Date(date_fire),
         date = as.Date(date, "X%Y.%m.%d")) %>%
  filter(date < date_fire) %>%
  group_by(date_fire) %>%
  summarise(meanPreFire = mean(value, na.rm = TRUE))
# # A tibble: 3 x 2
# date_fire meanPreFire
# <date> <dbl>
# 1 1987-01-01 0.62
# 2 1987-07-03 0.559
# 3 1988-01-01 0.764
The solution would be much more concise if we kept the data in long(er) form, but this reproduces the desired output:
library(dplyr)
library(tidyr)
data1989 %>%
  pivot_longer(-date_fire, names_to = "date_NDVI", values_to = "value", names_prefix = "^X") %>%
  mutate(date_fire = as.Date(date_fire, "%Y-%m-%d"),
         date_NDVI = as.Date(date_NDVI, "%Y.%m.%d")) %>%
  group_by(date_fire) %>%
  mutate(period = ifelse(date_NDVI < date_fire, "before_fire", "after_fire")) %>%
  group_by(date_fire, period) %>%
  mutate(average_NDVI = mean(value, na.rm = TRUE)) %>%
  pivot_wider(names_from = date_NDVI, names_prefix = "X", values_from = value) %>%
  pivot_wider(names_from = period, values_from = average_NDVI) %>%
  group_by(date_fire) %>%
  summarise_all(~ sum(.x, na.rm = TRUE))
Returns:
# A tibble: 3 x 12
date_fire `X1986-01-01` `X1986-06-03` `X1986-10-19` `X1987-01-19` `X1987-06-19` `X1987-10-19` `X1988-01-19` `X1988-06-19` `X1988-10-19` before_fire after_fire
<date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1987-01-01 0.5 0.56 0.8 0.75 0.1 0.15 0.2 0.18 0.21 0.62 0.265
2 1987-07-03 0.589 0.447 0 0.65 0.55 0.12 0.22 0.21 0.24 0.559 0.198
3 1988-01-01 0.66 0.75 0.83 0.75 0.811 0.78 0.32 0.23 0.25 0.764 0.267
Edit:
If we stop the expression right after calculating the averages, we can use the data in this structure to easily calculate the variance or account for a variable number of observations. I think it's OK to keep date_fire as its own column, but I'd suggest leaving the other dates in a single column (because they correspond to observations), especially if we want to do more analysis with the data using ggplot2 and other tidyverse functions.
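For instance, stopping after the period/average step leaves a long table that plots directly; a minimal sketch (an added illustration, reusing the column names from the pipeline above):
library(ggplot2)
long_ndvi <- data1989 %>%
  pivot_longer(-date_fire, names_to = "date_NDVI", values_to = "value",
               names_prefix = "^X") %>%
  mutate(date_fire = as.Date(date_fire),
         date_NDVI = as.Date(date_NDVI, "%Y.%m.%d"),
         period = ifelse(date_NDVI < date_fire, "before_fire", "after_fire"))
ggplot(long_ndvi, aes(date_NDVI, value, colour = period)) +
  geom_point() +
  facet_wrap(~ date_fire)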
We can use base R by creating a row/column index. The column index can be obtained from findInterval with the column names and the 'date_fire':
j1 <- findInterval(as.Date(data1989$date_fire), as.Date(names(data1989)[-1]))
l1 <- lapply(j1+1, `:`, ncol(data1989)-1)
m1 <- cbind(rep(seq_len(nrow(data1989)), j1), sequence(j1))
m2 <- cbind(rep(seq_len(nrow(data1989)), lengths(l1)), unlist(l1))
data1989$meanPreFire <- tapply(data1989[-1][m1], m1[,1], FUN = mean, na.rm = TRUE)
data1989$meanPostFire <- tapply(data1989[-1][m2], m2[,1], FUN = mean, na.rm = TRUE)
data1989
# date_fire 1986-01-01 1986-06-03 1986-10-19 1987-01-19 1987-06-19 1987-10-19 1988-01-19 1988-06-19 1988-10-19
#1 1987-01-01 0.500 0.560 0.80 0.75 0.100 0.15 0.20 0.18 0.21
#2 1987-07-03 0.589 0.447 NA 0.65 0.550 0.12 0.22 0.21 0.24
#3 1988-01-01 0.660 0.750 0.83 0.75 0.811 0.78 0.32 0.23 0.25
# meanPreFire meanPostFire
#1 0.6200 0.2650000
#2 0.5590 0.1975000
#3 0.7635 0.2666667
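To see what findInterval contributes, it returns, for each fire date, how many observation dates precede it (a small added check, matching j1 above):
findInterval(as.Date(data1989$date_fire), as.Date(names(data1989)[-1]))
# [1] 3 5 6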
Or using melt/dcast from data.table
library(data.table)
dcast(melt(setDT(data1989), id.var = 'date_fire')[,
.(value = mean(value, na.rm = TRUE)),
.(date_fire, grp = c('postFire', 'preFire')[1 + (as.IDate(variable) < as.IDate(date_fire))]) ], date_fire ~ grp)[data1989, on = .(date_fire)]
# date_fire postFire preFire 1986-01-01 1986-06-03 1986-10-19 1987-01-19 1987-06-19 1987-10-19 1988-01-19 1988-06-19
#1: 1987-01-01 0.2650000 0.6200 0.500 0.560 0.80 0.75 0.100 0.15 0.20 0.18
#2: 1987-07-03 0.1975000 0.5590 0.589 0.447 NA 0.65 0.550 0.12 0.22 0.21
#3: 1988-01-01 0.2666667 0.7635 0.660 0.750 0.83 0.75 0.811 0.78 0.32 0.23
# 1988-10-19
#1: 0.21
#2: 0.24
#3: 0.25
data
data1989 <- data.frame("date_fire" = c("1987-01-01", "1987-07-03", "1988-01-01"),
"1986-01-01" = c(0.5, 0.589, 0.66),
"1986-06-03" = c(0.56, 0.447, 0.75),
"1986-10-19" = c(0.8, NA, 0.83),
"1987-01-19" = c(0.75, 0.65,0.75),
"1987-06-19" = c(0.1, 0.55,0.811),
"1987-10-19" = c(0.15, 0.12, 0.780),
"1988-01-19" = c(0.2, 0.22,0.32),
"1988-06-19" = c(0.18, 0.21,0.23),
"1988-10-19" = c(0.21, 0.24, 0.250), check.names = FALSE,
stringsAsFactors = FALSE)
I have a data frame in which some of the columns are not in the correct order (they are dates). See:
data1989 <- data.frame("date_fire" = c("1987-02-01", "1987-07-03", "1988-01-01"),
"Foresttype" = c("oak", "pine", "oak"),
"meanSolarRad" = c(500, 550, 450),
"meanRainfall" = c(600, 300, 450),
"meanTemp" = c(14, 15, 12),
"1988.01.01" = c(0.5, 0.589, 0.66),
"1986.06.03" = c(0.56, 0.447, 0.75),
"1986.10.19" = c(0.8, NA, 0.83),
"1988.01.19" = c(0.75, 0.65,0.75),
"1986.06.19" = c(0.1, 0.55,0.811),
"1987.10.19" = c(0.15, 0.12, 0.780),
"1988.01.19" = c(0.2, 0.22,0.32),
"1986.06.19" = c(0.18, 0.21,0.23),
"1987.10.19" = c(0.21, 0.24, 0.250),
check.names = FALSE,
stringsAsFactors = FALSE)
> data1989
date_fire Foresttype meanSolarRad meanRainfall meanTemp 1988.01.01 1986.06.03 1986.10.19 1988.01.19 1986.06.19 1987.10.19 1988.01.19 1986.06.19 1987.10.19
1 1987-02-01 oak 500 600 14 0.500 0.560 0.80 0.75 0.100 0.15 0.20 0.18 0.21
2 1987-07-03 pine 550 300 15 0.589 0.447 NA 0.65 0.550 0.12 0.22 0.21 0.24
3 1988-01-01 oak 450 450 12 0.660 0.750 0.83 0.75 0.811 0.78 0.32 0.23 0.25
I would like to order the columns by increasing date, and keep the first 5 columns the same. Keep in mind that in my original dataset I have 30 initial columns to be kept the same.
As commented, try to avoid wide-formatted data with columns that contain data elements such as dates, category values, or other indicators. Instead, use long-formatted, tidy data, where ordering is much easier, as are aggregation, merging, plotting, and modeling.
Specifically, consider reshape to melt the dates into one field, here called quarter, paired with a value field. Then the quarter column can be ordered easily:
# RESHAPE WIDE TO LONG
long_data1989 <- reshape(data1989, varying = names(data1989)[6:ncol(data1989)],
times = names(data1989)[6:ncol(data1989)],
v.names = "value", timevar = "quarter", ids = NULL,
new.row.names = 1:1E4, direction = "long")
# ORDER DATES AND RESET row.names
long_data1989 <- `row.names<-`(with(long_data1989, long_data1989[order(date_fire, quarter),]),
NULL)
long_data1989
Online Demo
If you want to use dplyr, here is an alternative. Note that each column name has to be unique; in your df there were some duplicates (commented out below).
library(dplyr)
data1989 <- data.frame("date_fire" = c("1987-02-01", "1987-07-03", "1988-01-01"),
"Foresttype" = c("oak", "pine", "oak"),
"meanSolarRad" = c(500, 550, 450),
"meanRainfall" = c(600, 300, 450),
"meanTemp" = c(14, 15, 12),
"1988.01.01" = c(0.5, 0.589, 0.66),
"1986.06.03" = c(0.56, 0.447, 0.75),
"1986.10.19" = c(0.8, NA, 0.83),
"1988.01.19" = c(0.75, 0.65,0.75),
"1986.06.19" = c(0.1, 0.55,0.811),
"1987.10.19" = c(0.15, 0.12, 0.780),
# "1988.01.19" = c(0.2, 0.22,0.32),
# "1986.06.19" = c(0.18, 0.21,0.23),
# "1987.10.19" = c(0.21, 0.24, 0.250),
check.names = FALSE,
stringsAsFactors = FALSE)
# Sort the date column names (replace 6 with the index of the first date column);
# a plain alphabetical sort works because the names are zero-padded YYYY.MM.DD
sorted_colnames <- sort(names(data1989)[6:ncol(data1989)])
# Reorder the columns (replace 5 with the index of the last non-date column);
# all_of() avoids the deprecation warning for external vectors in select()
data1989 %>%
  select(1:5, all_of(sorted_colnames))
We can convert the column names that are dates to Date class, order them, and then use that as a column index:
i1 <- grep('^\\d{4}\\.\\d{2}\\.\\d{2}$', names(data1989))
data1989[c(seq_len(i1[1]-1), order(as.Date(names(data1989)[i1], "%Y.%m.%d")) + i1[1]-1)]
# date_fire Foresttype meanSolarRad meanRainfall meanTemp 1986.06.03 1986.06.19 1986.06.19.1 1986.10.19 1987.10.19
#1 1987-02-01 oak 500 600 14 0.560 0.100 0.18 0.80 0.15
#2 1987-07-03 pine 550 300 15 0.447 0.550 0.21 NA 0.12
#3 1988-01-01 oak 450 450 12 0.750 0.811 0.23 0.83 0.78
# 1987.10.19.1 1988.01.01 1988.01.19 1988.01.19.1
#1 0.21 0.500 0.75 0.20
#2 0.24 0.589 0.65 0.22
#3 0.25 0.660 0.75 0.32
Base R solution (similar to @Parfaits'):
# Reshape dataframe wide --> long:
df_long <-
reshape(data1989,
direction = "long",
varying = which(!(is.na(as.Date(names(data1989), "%Y.%m.%d")))),
idvar = which(is.na(as.Date(names(data1989), "%Y.%m.%d"))),
v.names = "value",
times = na.omit(as.Date(names(data1989), "%Y.%m.%d")),
timevar = "date_surveyed",
new.row.names = 1:(nrow(data1989)*length(na.omit(as.Date(names(data1989),
"%Y.%m.%d")))))
# Order the data frame and reset the index:
ordered_df_long <- data.frame(df_long[with(df_long, order(date_fire, date_surveyed)),],
row.names = NULL)
When using aggregate with a compound function, the resulting data.frame has matrices inside its columns.
ta <- aggregate(cbind(precision, result, prPo) ~ rstx + qx + laplace, t0,
                function(x) c(x = mean(x), m = min(x), M = max(x)))
ta <- head(ta)
dput(ta)
structure(list(rstx = c(3, 3, 2, 3, 2, 3), qx = c(0.2, 0.25,
0.3, 0.3, 0.33, 0.33), laplace = c(0, 0, 0, 0, 0, 0), precision = structure(c(0.174583333333333,
0.186833333333333, 0.3035, 0.19175, 0.30675, 0.193666666666667,
0.106, 0.117, 0.213, 0.101, 0.22, 0.109, 0.212, 0.235, 0.339,
0.232, 0.344, 0.232), .Dim = c(6L, 3L), .Dimnames = list(NULL,
c("x", "m", "M"))), result = structure(c(-142.333333333333,
-108.316666666667, -69.1, -85.7, -59.1666666666667, -68.5666666666667,
-268.8, -198.2, -164, -151.6, -138.2, -144.8, -30.8, -12.2, -14.2,
-3.8, -12.6, -3.4), .Dim = c(6L, 3L), .Dimnames = list(NULL,
c("x", "m", "M"))), prPo = structure(c(3.68416666666667,
3.045, 2.235, 2.53916666666667, 2.0775, 2.23666666666667, 1.6,
1, 1.02, 0.54, 0.87, 0.31, 5.04, 4.02, 2.77, 3.53, 2.63, 3.25
), .Dim = c(6L, 3L), .Dimnames = list(NULL, c("x", "m", "M")))), .Names = c("rstx",
"qx", "laplace", "precision", "result", "prPo"), row.names = c(NA,
6L), class = "data.frame")
Is there a function that transforms a data.frame's matrix-columns into separate columns?
Manually, for each matrix-column, a column bind plus a column delete works:
colnames(ta)
[1] "rstx" "qx" "laplace" "precision" "result" "prPo"
ta[,"precision"] # ta[,4]
x m M
[1,] 0.1745833 0.106 0.212
[2,] 0.1868333 0.117 0.235
[3,] 0.3035000 0.213 0.339
[4,] 0.1917500 0.101 0.232
[5,] 0.3067500 0.220 0.344
[6,] 0.1936667 0.109 0.232
# column bind + column delete
ta <- cbind(ta, precision = ta[, 4])
ta <- ta[, -4]
colnames(ta)
[1] "rstx" "qx" "laplace" "result" "prPo" "precision.x" "precision.m"
[8] "precision.M"
ta
rstx qx laplace result.x result.m result.M prPo.x prPo.m prPo.M precision.x precision.m
1 3 0.20 0 -142.33333 -268.80000 -30.80000 3.684167 1.600000 5.040000 0.1745833 0.106
2 3 0.25 0 -108.31667 -198.20000 -12.20000 3.045000 1.000000 4.020000 0.1868333 0.117
3 2 0.30 0 -69.10000 -164.00000 -14.20000 2.235000 1.020000 2.770000 0.3035000 0.213
4 3 0.30 0 -85.70000 -151.60000 -3.80000 2.539167 0.540000 3.530000 0.1917500 0.101
5 2 0.33 0 -59.16667 -138.20000 -12.60000 2.077500 0.870000 2.630000 0.3067500 0.220
6 3 0.33 0 -68.56667 -144.80000 -3.40000 2.236667 0.310000 3.250000 0.1936667 0.109
precision.M
1 0.212
2 0.235
3 0.339
4 0.232
5 0.344
6 0.232
A matrix doesn't support matrix-columns, so as.matrix() transforms the data.frame into a plain matrix, breaking up the matrix-columns.
Here is my idea (it works cleanly here because every column is numeric; with mixed types, as.matrix() would coerce everything to character):
library(tidyverse)
ta2 <- ta %>%
  as.matrix() %>%   # breaks the matrix-columns apart
  as.data.frame()
Somewhere on Stack Overflow I found a very simple solution:
cbind(ta[-ncol(ta)], ta[[ncol(ta)]])
rstx qx laplace precision.x precision.m precision.M result.x result.m result.M x m
1 3 0.20 0 0.1745833 0.1060000 0.2120000 -142.33333 -268.80000 -30.80000 3.684167 1.60
2 3 0.25 0 0.1868333 0.1170000 0.2350000 -108.31667 -198.20000 -12.20000 3.045000 1.00
3 2 0.30 0 0.3035000 0.2130000 0.3390000 -69.10000 -164.00000 -14.20000 2.235000 1.02
4 3 0.30 0 0.1917500 0.1010000 0.2320000 -85.70000 -151.60000 -3.80000 2.539167 0.54
5 2 0.33 0 0.3067500 0.2200000 0.3440000 -59.16667 -138.20000 -12.60000 2.077500 0.87
6 3 0.33 0 0.1936667 0.1090000 0.2320000 -68.56667 -144.80000 -3.40000 2.236667 0.31
M
1 5.04
2 4.02
3 2.77
4 3.53
5 2.63
6 3.25
Just that!
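A related base R idiom (an addition, not from the original answers): data.frame() splits matrix arguments into one column per matrix column, so do.call() flattens every matrix-column at once.
ta3 <- do.call(data.frame, ta)
colnames(ta3)
#  [1] "rstx"        "qx"          "laplace"     "precision.x" "precision.m"
#  [6] "precision.M" "result.x"    "result.m"    "result.M"    "prPo.x"
# [11] "prPo.m"      "prPo.M"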