I have a dataframe representing a two-year daily time series of temperature for two rivers. I have identified when the temperature is either above or below the peak temperature. I have also created a run-length ID column for when temperature is either above or below a threshold temperature of 10 degrees.
How can I get the first day of year, for each site and year, that meets each of the following conditions?
maximum run-length & below peak = TRUE
maximum run-length & above peak = TRUE
Example Data:
library(ggplot2)
library(lubridate)
library(dplyr)
library(dataRetrieval)
siteNumber <- c("01432805","01388000") # United States Geological Survey site numbers
parameterCd <- "00010" # temperature
statCd <- "00003" # mean
startDate <- "1996-01-01"
endDate <- "1997-12-31"
dat <- readNWISdv(siteNumber, parameterCd, startDate, endDate, statCd=statCd) # obtains the timeseries from the USGS
dat <- dat[,c(2:4)]
colnames(dat)[3] <- "temperature"
# To view the time series
ggplot(data = dat, aes(x = Date, y = temperature)) +
  geom_point() +
  theme_bw() +
  facet_wrap(~site_no)
To create the columns described above:
dat <- dat %>%
  mutate(year = year(Date),
         doy = yday(Date)) %>% # doy = day of year
  group_by(site_no, year) %>%
  mutate(lt_10 = temperature <= 10,
         peak_doy = doy[which.max(temperature)],
         below_peak = doy < peak_doy,
         after_peak = doy > peak_doy,
         run = data.table::rleid(lt_10))
View(dat)
The ideal output would look as follows:
site_no year doy_below doy_after
1 01388000 1996 111 317
2 01388000 1997 112 312
3 01432805 1996 137 315
4 01432805 1997 130 294
doy_after = the first row for after_peak == TRUE & max(run) when group_by(site_no,year)
doy_below = the first row for below_peak == TRUE & max(run) when group_by(site_no,year)
For site_no = 01388000 in year = 1996, the max(run) when below_peak == TRUE is 4. The first row where run = 4 and below_peak == TRUE corresponds to the date 1996-04-20, which has doy = 111.
Because the data are already grouped by site_no and year, you can go straight to summarise(). For each condition, take the maximum 'run' among the rows where 'below_peak' (or 'after_peak') is TRUE, subset 'doy' to the rows with that run, and keep the first element of 'doy':
library(dplyr)
dat %>%
  summarise(doy_below = first(doy[run == max(run[below_peak])]),
            doy_above = first(doy[run == max(run[after_peak])]), .groups = 'drop')
-output
# A tibble: 4 × 4
site_no year doy_below doy_above
<chr> <dbl> <dbl> <dbl>
1 01388000 1996 111 317
2 01388000 1997 112 312
3 01432805 1996 137 315
4 01432805 1997 130 294
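The same pattern can be checked by hand on a small table. This is only a sketch on made-up data: the toy values and the helper run_id() (a base-R stand-in for data.table::rleid()) are assumptions for illustration, not part of the original data.

```r
library(dplyr)

# Hypothetical helper: run-length IDs in base R (stand-in for data.table::rleid)
run_id <- function(x) {
  r <- rle(x)
  rep(seq_along(r$lengths), r$lengths)
}

# Made-up data: one site, one year, ten days
toy <- tibble(
  site_no = "A",
  year = 2000,
  doy = 1:10,
  temperature = c(8, 9, 11, 12, 15, 13, 11, 9, 8, 7)
)

res <- toy %>%
  group_by(site_no, year) %>%
  mutate(lt_10 = temperature <= 10,
         peak_doy = doy[which.max(temperature)],
         below_peak = doy < peak_doy,
         after_peak = doy > peak_doy,
         run = run_id(lt_10)) %>%
  summarise(doy_below = first(doy[run == max(run[below_peak])]),
            doy_above = first(doy[run == max(run[after_peak])]),
            .groups = "drop")

res
```

Here the peak is on day 5, the last run before the peak starts on day 3 and the last run after the peak starts on day 8, so doy_below is 3 and doy_above is 8.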
I have a dataset that counts the number of posts per month per year. It looks like this:
monthdate year n
<date> <dbl> <int>
1 2020-01-01 2001 133
2 2020-01-01 2002 129
3 2020-01-01 2003 149
4 2020-01-01 2004 96
5 2020-01-01 2005 94
6 2020-01-01 2006 109
7 2020-01-01 2007 158
8 2020-01-01 2008 138
9 2020-01-01 2009 83
(monthdate as a date is needed only for rendering month names in ggplot).
So the resulting plot is generated like this:
posts %>% mutate(monthdate = as.Date(paste("2020", month, '01', sep = "-"))) %>%
  group_by(monthdate, year) %>% summarise(n = n()) %>%
  ggplot(aes(x = monthdate, y = n)) +
  geom_point() +
  geom_smooth(method = "loess") +
  scale_x_date(date_breaks = "1 month", date_labels = "%b")
and looks like this:
I want to add year labels to the topmost and bottommost points, so that for each month it can be seen which year produced the fewest and the most posts. What is an efficient way to do it?
Ok, I found the solution. Pretty simple:
posts %>% mutate(monthdate = as.Date(paste("2020", month, '01', sep = "-"))) %>%
  group_by(monthdate, year) %>% summarise(n = n()) %>%
  group_by(monthdate) %>%
  mutate(lab = case_when(n == max(n) | n == min(n) ~ year)) %>%
  ggplot(aes(x = monthdate, y = n)) +
  geom_point() +
  geom_smooth(method = "loess") +
  scale_x_date(date_breaks = "1 month", date_labels = "%b") +
  xlab('Month') +
  ylab('Number of posts') +
  geom_text(aes(label = lab))
and the resulting plot is:
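The labelling step can be checked without the plot. A minimal sketch on made-up counts (the table below is hypothetical): case_when() with a single condition returns NA for every row that does not match, so only the per-month minimum and maximum keep a label.

```r
library(dplyr)

# Made-up counts: two months, three years each
counts <- tibble(
  month = rep(1:2, each = 3),
  year  = rep(2001:2003, times = 2),
  n     = c(5, 9, 7, 4, 2, 8)
)

labelled <- counts %>%
  group_by(month) %>%
  # keep the year only for the smallest and largest n within each month
  mutate(lab = case_when(n == max(n) | n == min(n) ~ year)) %>%
  ungroup()

labelled
```

If two years tie for the minimum or maximum in a month, both get a label.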
I want to chart the relative number of fatalities by year for each of various event types.
I can do this with facets in ggplot, but I am struggling to calculate the % by Event from Event, Year and number of fatalities.
Event Type Year Fatalities % by Event
(calculated)
----- ---- ---------- ----------
Storm 1980 5 12.5%
Storm 1981 9 22.5%
Storm 1982 15 37.5%
Storm 1983 11 27.5%
Ice 1980 7 70%
Ice 1981 3 30%
I have the following code to calculate it, but it is not working: the percentages come out as if a much larger denominator were being used.
fatalitiesByYearType <- stormDF %>%
group_by(eventType) %>%
mutate(totalEventFatalities = sum(FATALITIES)) %>%
group_by(year, add = TRUE) %>%
mutate(fatalitiesPct = sum(FATALITIES) / totalEventFatalities)
What am I doing wrong?
My charting code is below. I include it because I'm also interested in whether there is a way of showing the data proportionally within ggplot.
p <- ggplot(data = fatalitiesByYearType,
aes(x=factor(year),y=fatalitiesPct))
p + geom_bar(stat="identity") +
facet_wrap(.~eventType, nrow = 5) +
labs(x = "Year",
y = "Fatalities",
title = "Fatalities by Type")
Maybe I do not get your problem, but we can start from here:
library(dplyr)
library(ggplot2)
# here the dplyr part
dats <- fatalitiesByYearType %>%
group_by(eventType) %>%
mutate(totalEventFatalities = sum(FATALITIES)) %>%
  group_by(year, .add = TRUE) %>% # `add` was renamed to `.add` in dplyr 1.0.0
  # here we add the summarise; first() keeps one total per group
  summarise(fatalitiesPct = sum(FATALITIES) / first(totalEventFatalities))
dats
# A tibble: 6 x 3
# Groups: eventType [?]
eventType year fatalitiesPct
<fct> <int> <dbl>
1 Ice 1980 0.7
2 Ice 1981 0.3
3 Storm 1980 0.125
4 Storm 1981 0.225
5 Storm 1982 0.375
6 Storm 1983 0.275
You can of course merge everything into a single dplyr chain. Then plot:
# here the ggplot2 part
p <- ggplot(dats,aes(x=factor(year),y=fatalitiesPct)) +
geom_bar(stat="identity") +
facet_wrap(.~eventType, nrow = 5) +
labs(x = "Year", y = "Fatalities", title = "Fatalities by Type") +
# here we add the % in the plot
scale_y_continuous(labels = scales::percent)
With data:
fatalitiesByYearType <- read.table(text = "eventType year FATALITIES
Storm 1980 5
Storm 1981 9
Storm 1982 15
Storm 1983 11
Ice 1980 7
Ice 1981 3 ",header = T)
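As a cross-check, the shares can also be computed without the intermediate total column. This is a sketch of an alternative route, not the method used above: summarise to one row per event and year first, then divide by the per-event total. It assumes dplyr 1.0.0 or later for the .groups argument.

```r
library(dplyr)

storms <- tibble(
  eventType  = c(rep("Storm", 4), rep("Ice", 2)),
  year       = c(1980:1983, 1980:1981),
  FATALITIES = c(5, 9, 15, 11, 7, 3)
)

pct <- storms %>%
  group_by(eventType, year) %>%
  # one row per event type and year; result stays grouped by eventType
  summarise(fatalities = sum(FATALITIES), .groups = "drop_last") %>%
  # share of each year within its event type
  mutate(fatalitiesPct = fatalities / sum(fatalities)) %>%
  ungroup()

pct
```

Because the summarise step collapses duplicate rows before the division, each event type's shares sum to 1 even when the raw data have several rows per event and year.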
Hello! Is there a way to index a chart to start and end at specific points
(which may be out of numeric order)?
I have data that begins October 1st and ends September 30th the following year. The series repeats over multiple past years, and I want to build a daily seasonality chart. The challenge is that the X axis does not run from low to high; it runs 10-11-12-1-2-3-4-5-6-7-8-9.
Question 1:
Can you order the index by month 10-11-12-1-2-3-4-5-6-7-8-9,
while staying compatible with %m-%d formatting? The real problem is in
daily format, but for the sake of brevity I am only using months.
The result should look something like this (sorry, I had to use Excel):
Question 2:
Can we remove the connected chart lines, or will the solution to question 1 naturally fix
question 2? Examples are in the attempts below.
Question 3:
Can the final formatting of the solution allow taking a moving average, or other
mutations of the initial data? The table in attempt #2 would allow taking the average of each month across years. Since July 2017 is 6 and July 2018 is 12, we would plot a 9 in the chart, etc. for the entire plot.
Question 4:
Is there an xts equivalent to solve this problem?
THANK YOU, THANK YOU, THANK YOU!
library(ggplot2)
library(plotly)
library(tidyr)
library(reshape2)
Date <- seq(as.Date("2016-10-1"), as.Date("2018-09-01"), by="month")
values <- c(2,3,4,3,4,5,6,4,5,6,7,8,9,10,8,9,10,11,12,13,11,12,13,14)
YearEnd <-c(2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,
2018,2018,2018,2018,2018,2018,2018,2018,2018,2018,2018,2018)
df <- data.frame(Date,values,YearEnd)
## PLOT THE TIMESERIES
plot_ly(df, x = ~Date, y = ~values, type = "scatter", mode = "lines")
## PLOT THE DATA BY MONTH: attempt 1
df$Month <- format(df$Date, format="%m")
df2 <- df %>%
select(values, Month, YearEnd)
plot_ly(df2, x = ~Month, y = ~values, type = "scatter", mode = "lines",
connectgaps = FALSE)
## Plot starts on the 10th month, which is good, but the index is
## in standard order, not 10-11-12-1-2-3-4-5-6-7-8-9
## It also still connects the gaps, bad.
## CREATE A PIVOTTABLE: attempt 2
table <- spread(df2,YearEnd, values)
df3 <- melt(table , id.vars = 'Month', variable.name = 'series')
plot_ly(df3, x = ~Month, y = ~values, type = "scatter", mode = "lines",
connectgaps = FALSE)
## now the data are in the right order, but the index is still wrong
## I also do not understand how plotly is ordering it correctly, as 2
## is not the starting point in January.
You just need to set the desired levels for Month inside factor():
library(magrittr)
library(tidyverse)
library(lubridate)
library(plotly)
Date <- seq(as.Date("2016-10-1"), as.Date("2018-09-01"), by = "month")
values <- c(2, 3, 4, 3, 4, 5, 6, 4, 5, 6, 7, 8, 9, 10, 8, 9, 10, 11, 12, 13, 11, 12, 13, 14)
YearEnd <- c(
2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017,
2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018
)
df <- data.frame(Date, values, YearEnd)
# to fiscal year order
df %<>%
mutate(
Month = month(Date),
YearEnd = factor(YearEnd)) %>%
mutate(Month = factor(Month,
levels = c(10:12, 1:9),
labels = c(month.abb[10:12], month.abb[1:9])))
df
#> Date values YearEnd Month
#> 1 2016-10-01 2 2017 Oct
#> 2 2016-11-01 3 2017 Nov
#> 3 2016-12-01 4 2017 Dec
#> 4 2017-01-01 3 2017 Jan
#> 5 2017-02-01 4 2017 Feb
#> 6 2017-03-01 5 2017 Mar
#> 7 2017-04-01 6 2017 Apr
#> 8 2017-05-01 4 2017 May
#> 9 2017-06-01 5 2017 Jun
#> 10 2017-07-01 6 2017 Jul
#> 11 2017-08-01 7 2017 Aug
#> 12 2017-09-01 8 2017 Sep
...
p1 <- ggplot(df, aes(
x = Month, y = values,
color = YearEnd,
group = YearEnd)) +
geom_line() +
theme_classic(base_size = 12)
ggplotly(p1)
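The effect of the custom levels can be seen without any plotting. A small base-R sketch, where start_month is a hypothetical parameter name introduced only for this illustration:

```r
# Hypothetical parameter: first month of the fiscal year
start_month <- 10

# Month numbers in fiscal order: 10, 11, 12, 1, ..., 9
fiscal_levels <- c(start_month:12, seq_len(start_month - 1))

# A few month numbers in arbitrary order
m <- factor(c(1, 10, 5, 12),
            levels = fiscal_levels,
            labels = month.abb[fiscal_levels])

# Sorting a factor follows its level order, not the alphabet or the month number
sorted <- as.character(sort(m))
sorted
```

ggplot2 uses the same level order when it lays out a discrete x-axis, which is why October comes first in the plot above.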
Edit: to plot by Julian day, we use a similar method to the 3rd one from this answer
# Generate random data
set.seed(2018)
date = seq(from = as.Date("2016-10-01"), to = as.Date("2018-09-30"),
by = "days")
values = c(rnorm(length(date)/2, 8, 1.5), rnorm(length(date)/2, 16, 2))
dat <- data.frame(date, values)
df <- dat %>%
  as_tibble() %>% # tbl_df() is deprecated in current dplyr
mutate(jday = factor(yday(date)),
Month = month(date),
Year = year(date),
# only create label for the 1st day of the month
myLabel = case_when(day(date) == 1L ~ format(date, "%b-%d"),
TRUE ~ NA_character_)) %>%
# create fiscal year column
mutate(fcyear = case_when(Month > 9 ~ as.factor(Year + 1),
TRUE ~ as.factor(Year))) %>%
mutate(Month = factor(Month,
levels = c(10:12, 1:9),
labels = c(month.abb[10:12], month.abb[1:9])))
df
#> # A tibble: 730 x 7
#> date values jday Month Year myLabel fcyear
#> <date> <dbl> <fct> <fct> <dbl> <chr> <fct>
#> 1 2016-10-01 7.37 275 Oct 2016 Oct-01 2017
#> 2 2016-10-02 5.68 276 Oct 2016 <NA> 2017
#> 3 2016-10-03 7.90 277 Oct 2016 <NA> 2017
#> 4 2016-10-04 8.41 278 Oct 2016 <NA> 2017
#> 5 2016-10-05 10.6 279 Oct 2016 <NA> 2017
#> 6 2016-10-06 7.60 280 Oct 2016 <NA> 2017
#> 7 2016-10-07 11.1 281 Oct 2016 <NA> 2017
#> 8 2016-10-08 9.30 282 Oct 2016 <NA> 2017
#> 9 2016-10-09 7.08 283 Oct 2016 <NA> 2017
#> 10 2016-10-10 8.96 284 Oct 2016 <NA> 2017
#> # ... with 720 more rows
# Create a row number for plotting to make sure ggplot plot in
# the exact order of a fiscal year
df1 <- df %>%
group_by(fcyear) %>%
mutate(order = row_number()) %>%
ungroup()
df1
#> # A tibble: 730 x 8
#> date values jday Month Year myLabel fcyear order
#> <date> <dbl> <fct> <fct> <dbl> <chr> <fct> <int>
#> 1 2016-10-01 7.37 275 Oct 2016 Oct-01 2017 1
#> 2 2016-10-02 5.68 276 Oct 2016 <NA> 2017 2
#> 3 2016-10-03 7.90 277 Oct 2016 <NA> 2017 3
#> 4 2016-10-04 8.41 278 Oct 2016 <NA> 2017 4
#> 5 2016-10-05 10.6 279 Oct 2016 <NA> 2017 5
#> 6 2016-10-06 7.60 280 Oct 2016 <NA> 2017 6
#> 7 2016-10-07 11.1 281 Oct 2016 <NA> 2017 7
#> 8 2016-10-08 9.30 282 Oct 2016 <NA> 2017 8
#> 9 2016-10-09 7.08 283 Oct 2016 <NA> 2017 9
#> 10 2016-10-10 8.96 284 Oct 2016 <NA> 2017 10
#> # ... with 720 more rows
# plot with `order` as x-axis
p2 <- ggplot(df1,
aes(x = order, y = values,
color = fcyear,
group = fcyear)) +
geom_line() +
theme_classic(base_size = 12) +
xlab(NULL)
# now replace `order` label with `myLabel` created above
x_break <- df1$order[!is.na(df1$myLabel)][1:12]
x_label <- df1$myLabel[x_break]
x_label
#> [1] "Oct-01" "Nov-01" "Dec-01" "Jan-01" "Feb-01" "Mar-01" "Apr-01"
#> [8] "May-01" "Jun-01" "Jul-01" "Aug-01" "Sep-01"
p3 <- p2 +
scale_x_continuous(
breaks = x_break,
labels = x_label) +
theme(axis.text.x = element_text(angle = 90)) +
scale_color_brewer("Fiscal Year", palette = "Dark2") +
xlab(NULL)
p3
ggplotly(p3)
Created on 2018-09-09 by the reprex package (v0.2.0.9000).
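The row-number approach above assumes every fiscal year has exactly one row per day. If some days may be missing, the position within the fiscal year can instead be computed from the date itself. A minimal base-R sketch; the helper name fiscal_info() and its start_month argument are hypothetical and not part of the answer above:

```r
# Hypothetical helper: fiscal year and day-within-fiscal-year for a Date vector
fiscal_info <- function(date, start_month = 10) {
  yr <- as.integer(format(date, "%Y"))
  mo <- as.integer(format(date, "%m"))
  # dates on or after the start month belong to the next fiscal year
  fy <- ifelse(mo >= start_month, yr + 1L, yr)
  # first day of that fiscal year
  fy_start <- as.Date(sprintf("%d-%02d-01", fy - 1L, start_month))
  data.frame(date = date,
             fcyear = fy,
             order = as.integer(date - fy_start) + 1L)
}

info <- fiscal_info(as.Date(c("2016-10-01", "2017-09-30", "2017-10-01")))
info
```

With this version a gap in the data leaves a gap in `order` instead of shifting all later days to the left.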
Consider this an appendix to Tung's excellent answer. Here I've made it explicit how to alter the code for different start and end months of the financial (or production) year, which vary by country (and industry), via the parameter EndMonth. I've also added an annual average, which seems like an obvious thing to want as well (though outside the OP's request).
library(tidyverse)
library(lubridate)
## Generate random data
set.seed(2018)
date = seq(from = as.Date("2016-06-01"), to = as.Date("2016-06-01")+729,
by = "days") # about 2 years, but even number of days
values = c(rnorm(length(date)/2, 8, 1.5), rnorm(length(date)/2, 16, 2))
dat <- data.frame(date, values)
EndMonth <- 5 #i.e. if last month of financial year is May, use 5 for 5th month of calendar year
df <- dat %>%
  as_tibble() %>% # tbl_df() is deprecated in current dplyr
mutate(jday = factor(yday(date)),
Month = month(date),
Year = year(date),
# only create label for the 1st day of the month
myLabel = case_when(day(date) == 1L ~ format(date, "%b%e"),
TRUE ~ NA_character_)) %>%
# create fiscal year column
mutate(fcyear = case_when(Month > EndMonth ~ as.factor(Year + 1),
TRUE ~ as.factor(Year))) %>%
mutate(Month = factor(Month,
levels = c((EndMonth+1):12, 1:(EndMonth)),
labels = c(month.abb[(EndMonth+1):12], month.abb[1:EndMonth])))
df
#make 2 (or n) year average
df_mean <- df %>%
group_by(jday) %>%
mutate(values = mean(values, na.rm=TRUE)) %>%
filter(fcyear %in% c("2017")) %>% #note hard code for first fcyear in dataset
mutate(fcyear = "Average")
#Add average to data frame
df <- bind_rows(df, df_mean)
# Create a row number for plotting to make sure ggplot plot in
# the exact order of a fiscal year
df1 <- df %>%
group_by(fcyear) %>%
mutate(order = row_number()) %>%
ungroup()
df1
# plot with `order` as x-axis
p2 <- ggplot(df1,
aes(x = order, y = values,
color = fcyear,
group = fcyear)) +
geom_line() +
theme_classic(base_size = 12) +
xlab(NULL)
p2
# now replace `order` label with `myLabel` created above
x_break <- df1$order[!is.na(df1$myLabel)][1:12]
x_label <- df1$myLabel[x_break]
x_label
p3 <- p2 +
scale_x_continuous(
breaks = x_break,
labels = x_label) +
theme(axis.text.x = element_text(angle = 90)) +
scale_color_brewer("Fiscal Year", palette = "Dark2") +
xlab(NULL)
p3
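The averaging step is easier to follow on a tiny table. The sketch below uses made-up values and averages by position within the fiscal year (`order`) rather than by jday as in the code above; that substitution is an assumption made to keep the example short.

```r
library(dplyr)

# Made-up data: two fiscal years, three days each
toy <- tibble(
  fcyear = rep(c("2017", "2018"), each = 3),
  order  = rep(1:3, times = 2),
  values = c(2, 4, 6, 4, 8, 10)
)

# Mean across fiscal years for each position in the year
avg <- toy %>%
  group_by(order) %>%
  summarise(values = mean(values, na.rm = TRUE), .groups = "drop") %>%
  mutate(fcyear = "Average")

# Stack the average under the original series so it plots as one more line
combined <- bind_rows(toy, avg)
combined
```

Using summarise() instead of mutate() plus filter() avoids hard-coding which fiscal year to keep.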
I have tried to understand the other results, but I could not.
This is my dataset:
> HIST
# A tibble: 1,071 x 16
Ano Leilao Fonte UF Vend Projeto
<dbl> <chr> <chr> <chr> <chr> <chr>
1 2008 2008 Leilao 1 Bio SP Abengoa UTE São Luiz (Abengoa São Luiz)
2 2013 2013 A-5 1 Bio MS AMANDINA Amandina
3 2017 2017 A-6 Bio MG BEVAP BIOENERGETICA AROEIRA 2
4 2015 2015 A-5 1 Bio BA Bolt BOLTBAH
5 2013 2013 A-5 1 Bio BA Bolt CAMPO GRANDE
6 2013 2013 A-5 1 Bio PI Bolt CANTO DO BURITI
7 2010 2010 LER Bio TO Bunge PEDRO AFONSO
8 2015 2015 LFA Bio SP Clealco CLEALCO QUEIROZ
9 2015 2015 A-3 Bio SP Clealco CLEALCO QUEIROZ
10 2008 2008 Leilao 1 Bio MG CMAA UTE Vale do Tijuco
# ... with 1,061 more rows, and 10 more variables: CODPPA <dttm>, CAPEX <dbl>,
# MW <dbl>, GF <dbl>, FC <dbl>, PPA <dbl>, RMW <dbl>, WACC <dbl>, TIR <dbl>,
# VPL <dbl>
I want to make a graph sorted by the sum(MW), like this:
HIST %>%
group_by(Fonte, UF)%>%
summarise(SUMMW = sum(MW))%>%
arrange(desc(SUMMW))%>%
ggplot(aes(x = UF, y = SUMMW, fill = Fonte))+
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
geom_col()
But the problem is that the chart I get is not ordered by the sum of MW. I would like this graph's columns to be ordered by their height:
thank you, Paulo
I think the easiest way is to reorder UF by SUMMW inside the aesthetics function aes(). Since each bar stacks several Fonte values, pass FUN = sum so that UF is ordered by total bar height (the default for reorder() is the mean):
HIST %>%
  group_by(Fonte, UF) %>%
  summarise(SUMMW = sum(MW)) %>%
  arrange(desc(SUMMW)) %>%
  ggplot(aes(x = reorder(UF, desc(SUMMW), FUN = sum), y = SUMMW, fill = Fonte)) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  geom_col()
You can calculate the height of each bar first and assign UF the appropriate order as a factor. Otherwise ggplot will plot UF's values in alphabetical order on the x-axis.
# create summary data frame from HIST
df <- HIST %>%
group_by(Fonte, UF) %>%
summarise(SUMMW = sum(MW))
# calculate total bar height for each UF value, & sort accordingly.
df2 <- df %>%
group_by(UF) %>%
summarise(bar.heights = sum(SUMMW)) %>%
ungroup() %>%
arrange(desc(bar.heights))
# convert UF in the summary data frame to factor, with levels in the sorted order
df$UF <- factor(df$UF, levels = df2$UF)
rm(df2) # you can remove df2 after this; it's not needed anymore
# plot
ggplot(df,
aes(x = UF, y = SUMMW, fill = Fonte))+
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
geom_col()
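The ordering by total bar height can be verified on a small made-up table before plotting. This sketch uses base R's reorder() with FUN = sum as a compact alternative to building the helper data frame; the toy values are hypothetical.

```r
# Made-up summary: SP has two sources, so its bar height is 10 + 5 = 15
df <- data.frame(
  UF    = c("SP", "SP", "BA", "MG"),
  Fonte = c("Bio", "Wind", "Wind", "Bio"),
  SUMMW = c(10, 5, 30, 12)
)

# Negate SUMMW so that the largest total comes first;
# FUN = sum orders by stacked bar height rather than by the default mean
uf_by_total <- reorder(df$UF, -df$SUMMW, FUN = sum)
levels(uf_by_total)
```

Assigning the result back with `df$UF <- uf_by_total` gives the same factor ordering as the df2 approach above.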