Plotting daily summed values of data against months [duplicate]

This question already has answers here:
How to change x axis from years to months with ggplot2
(2 answers)
Closed 5 years ago.
I am trying to make a ggplot of solar irradiance (from a weather file) on the y-axis and time in months on the x-axis.
My data consists of values collected on an hourly basis for 12 months, so overall there are 8760 rows of data values.
Now, I want to make the plot in such a way that for a single day I get only one point, obtained by summing the values for that complete day (not by taking all the hourly values and plotting them). I believe geom_freqpoly() can plot this type of data, but I have not found enough examples of the kind I want. Perhaps there is some other approach that can get me the plot I want; I am not sure exactly how to add up the values for a day, and writing code for 365 days by hand would be crazy.
I want the following kind of plot (image omitted).
My plot currently shows all the readings for the year and looks like this (image omitted).
My code for this plot is:
library(ggplot2)
library(lubridate)  # for month(), mday()
library(tidyr)      # for unite()

cmsaf_data <- read.csv("C://Users//MEJA03514//Desktop//main folder//Irradiation data//tmy_era_25.796_45.547_2005_2014.csv", skip = 16, header = TRUE)

time <- strptime(cmsaf_data[, 2], format = "%m/%d/%Y %H:%M")
data <- cbind(time, cmsaf_data[5])
#data %>% select(time)
data <- data.frame(data, months = month(time), days = mday(time))
data <- unite(data, date_month, c(months, days), remove = FALSE, sep = "-")
data <- subset(data, data[, 2] > 0)

GHI <- data[, 2]
date_month <- data[, 3]
ggplot(data, aes(date_month, GHI)) + geom_line()
whereas my data looks like this:
head(data)
time Global.horizontal.irradiance..W.m2.
1 2007-01-01 00:00:00 0
2 2007-01-01 01:00:00 0
3 2007-01-01 02:00:00 0
4 2007-01-01 03:00:00 0
5 2007-01-01 04:00:00 0
6 2007-01-01 05:00:00 159
As I want one point per day, how can I perform the sum so that I get the output I require, and show month names on the x-axis? (Maybe there is something in the date and time handling that can do this addition per day and give 365 values for the year in the output.)
I have no idea at all of any such function or approach.
Your help will be appreciated!

Here is a solution using the tidyverse and lubridate packages. As you haven't provided complete sample data, I've generated some random data.
library(tidyverse)
library(lubridate)
data <- tibble(
  time = seq(ymd_hms('2007-01-01 00:00:00'),
             ymd_hms('2007-12-31 23:00:00'),
             by = 'hour'),
  variable = sample(0:400, 8760, replace = TRUE)
)
head(data)
#> # A tibble: 6 x 2
#> time variable
#> <dttm> <int>
#> 1 2007-01-01 00:00:00 220
#> 2 2007-01-01 01:00:00 348
#> 3 2007-01-01 02:00:00 360
#> 4 2007-01-01 03:00:00 10
#> 5 2007-01-01 04:00:00 18
#> 6 2007-01-01 05:00:00 227
summarised <- data %>%
  mutate(date = date(time)) %>%
  group_by(date) %>%
  summarise(total = sum(variable))
head(summarised)
#> # A tibble: 6 x 2
#> date total
#> <date> <int>
#> 1 2007-01-01 5205
#> 2 2007-01-02 3938
#> 3 2007-01-03 5865
#> 4 2007-01-04 5157
#> 5 2007-01-05 4702
#> 6 2007-01-06 4625
summarised %>%
  ggplot(aes(date, total)) +
  geom_line()
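The question also asks for month names on the x-axis; a minimal way to get them (assuming date is a Date column, as above) is to add a formatted date scale to the same plot:
summarised %>%
  ggplot(aes(date, total)) +
  geom_line() +
  scale_x_date(date_breaks = "1 month", date_labels = "%b")  # "%b" gives abbreviated month names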

In order to get a sum for every month of every year, you need to create a column which describes a specific month of a specific year (a "yearmon").
Then you can group over that column and sum within each group, giving you one sum for every month of every year.
Finally you plot it and set the labels of the x-axis to your liking.
library(ggplot2)
library(dplyr)
library(zoo)
library(scales)

# Create dummy data for the time column
time <- seq.POSIXt(from = as.POSIXct("2007-01-01 00:00:00"),
                   to = as.POSIXct("2017-01-01 23:00:00"),
                   by = "hour")

# Create a dummy data.frame
data <- data.frame(Time = time,
                   GHI = rnorm(length(time)))

############################
# Add a Yearmon column to the data.frame
# Group by Yearmon and summarise with sum
# This creates one sum per Yearmon
# ungroup() is often not necessary; however,
# not doing it has caused problems for me in the past
# Change the type of Yearmon to Date for ggplot
#
df <- mutate(data,
             Yearmon = as.yearmon(Time)) %>%
  group_by(Yearmon) %>%
  summarise(GHI_sum = sum(GHI)) %>%
  ungroup() %>%
  mutate(Yearmon = as.Date(Yearmon))

# Plot the chart with special scale labels
ggplot(df, aes(Yearmon, GHI_sum)) +
  geom_line() +
  scale_x_date(labels = date_format("%m/%y"))
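For what it's worth, here is a sketch of the same monthly aggregation without zoo, using lubridate's floor_date() to map each timestamp to the first instant of its month (same dummy data as above):
library(dplyr)
library(lubridate)

df2 <- data %>%
  mutate(Yearmon = floor_date(Time, unit = "month")) %>%  # POSIXct, first instant of each month
  group_by(Yearmon) %>%
  summarise(GHI_sum = sum(GHI))
Since floor_date() returns POSIXct here, the plot would use scale_x_datetime() instead of scale_x_date().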
I hope this helps.

Related

Create interval of dates for my existing data in R

I am trying to aggregate my existing observations into 10-minute intervals in R.
I did this:
data3$date <- ceiling_date(as.POSIXct(data3$betdate), unit = "10 minutes")
data3 %>% group_by(date, prov) %>%
  summarise(cant = n())
But the problem with this code is that if there is no observation for an interval, that interval will not appear in the output data, which makes sense because there are no observations with a date in that interval. So I need to capture information about the intervals that have no observations registered. Any ideas? Thanks to all of you.
See a simplified example of #Limey's comment, using just months and data.table
# set up fake data
set.seed(1000)
library(lubridate)

# create a sequence of months
months <- seq(ymd("2022-01-01"), ymd("2022-06-01"), by = "month")

# create fake data as a data.frame, and remove some rows
dat <- data.frame(month = months, values = sample(100:200, length(months)))
dat <- dat[-sample(1:length(months), 3), ]
dat
# month values
#1 2022-01-01 167
#4 2022-04-01 150
#6 2022-06-01 128
Here we perform the merge and see the NAs representing the missing observations:
library(data.table)
setDT(dat)

months_listed <- data.frame(month = seq(min(dat$month), max(dat$month), by = "month"))
setDT(months_listed)
merge.data.table(months_listed, dat, by = "month", all.x = TRUE)
# month values
#1: 2022-01-01 167
#2: 2022-02-01 NA
#3: 2022-03-01 NA
#4: 2022-04-01 150
#5: 2022-05-01 NA
#6: 2022-06-01 128
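For comparison, the same gap-filling can be sketched without data.table using tidyr::complete(), which expands a column to a full sequence and fills the new rows with NA (assuming the dat data from above):
library(tidyr)
complete(dat, month = seq(min(month), max(month), by = "month"))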

R - Group datetimes using cut at < 1 second intervals

In R, I am able to group data by datetimes using the cut function to divide the datetimes into time-interval groups.
To create datetime data with fractions of seconds, this can be done with epoch timestamps like as.POSIXct(nanotime::nanotime(1112089999201723886)).
Here is some toy data:
times <- c(as.POSIXct(nanotime::nanotime(1112089999201723886)),
           as.POSIXct(nanotime::nanotime(1112089999201724886)),
           as.POSIXct(nanotime::nanotime(1112089999201725886)),
           as.POSIXct(nanotime::nanotime(1112089999201726886)),
           as.POSIXct(nanotime::nanotime(1112089999201727886)),
           as.POSIXct(nanotime::nanotime(1112089999201728886)))
x <- c(5, 6, 7, 8, 9, 10)
y <- c('F', 'A', 'T', 'P', 'O', 'O')
In tabular format:
data
# A tibble: 9,188 x 3
datetime x y
<dttm> <dbl> <chr>
1 2000-12-31 5:00:00 5 F
2 2000-12-31 5:00:00 6 A
3 2000-12-31 5:00:00 7 T
4 2000-12-31 5:00:00 8 P
5 2000-12-31 5:00:00 9 O
6 2000-12-31 5:00:00 10 O
For example this works:
data %>% group_by(time_group=cut(datetime, "1 sec")) %>% summarise(count=n())
However, if I want to group by a time interval smaller than one second, like half a second, one tenth of a second, or 50 ms, I can't do it in the same way.
E.g. these throw errors:
data %>% group_by(time_group=cut(datetime, "0.5 sec")) %>% summarise(count=n())
data %>% group_by(time_group=cut(datetime, "1 ms")) %>% summarise(count=n())
How can I accomplish this?
You can convert your sub-second times to numeric milliseconds and calculate the number of breaks you want like so:
interval_in_secs <- 0.250
interval_in_secs_cut_breaks <- (max(as.numeric(df$timestamp)) - min(as.numeric(df$timestamp))) / interval_in_secs

df %>%
  # as.numeric() keeps the fractional seconds; rounding gives whole milliseconds
  mutate(timestamp_ms = round(as.numeric(timestamp) * 1000)) %>%
  group_by(timestamp = cut(timestamp_ms, interval_in_secs_cut_breaks))
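To mirror the summarise step from the question, a hedged completion of this pipeline that counts rows per bin:
df %>%
  mutate(timestamp_ms = round(as.numeric(timestamp) * 1000)) %>%
  group_by(time_group = cut(timestamp_ms, interval_in_secs_cut_breaks)) %>%
  summarise(count = n())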

Filter a data frame by two time series

Hi, I am new to R and would like to know if there is a simple way to filter data over multiple date ranges.
I have data with dates from 07/03/2003 to 31/12/2016.
I need to split/filter the data by multiple time series, as per below.
Dates required in the new data frame:
07/03/2003 to 06/03/2005
and
01/01/2013 to 31/12/2016
i.e. the new data frame should not include dates from 07/03/2005 to 31/12/2012.
Let's take the following data.frame with dates:
library(dplyr)
library(lubridate)

df <- data.frame(date = c(ymd("2017-02-02"), ymd("2016-02-02"), ymd("2014-02-01"), ymd("2012-01-01")))
date
1 2017-02-02
2 2016-02-02
3 2014-02-01
4 2012-01-01
I can filter this for a range of dates using lubridate::ymd and dplyr::between:
df1 <- filter(df, between(date, ymd("2017-01-01"), ymd("2017-03-01")))
date
1 2017-02-02
Or:
df2 <- filter(df, between(date, ymd("2013-01-01"), ymd("2014-04-01")))
date
1 2014-02-01
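To keep both of the question's ranges in a single data frame, the two between() conditions can be combined with |; a minimal sketch with the question's boundaries (df_keep is just an illustrative name):
df_keep <- filter(df,
  between(date, dmy("07-03-2003"), dmy("06-03-2005")) |
  between(date, dmy("01-01-2013"), dmy("31-12-2016")))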
I would go with lubridate. In particular:
library(data.table)
library(lubridate)

set.seed(555)  # in order to be reproducible
N <- 1000      # number of pseudo-random numbers to be generated

date1 <- dmy("07-03-2003")
date2 <- dmy("06-03-2005")
date3 <- dmy("01-01-2013")
date4 <- dmy("31-12-2016")
Creating a data.table with two columns (dates and numbers):
my_dt <- data.table(date_sample = sample(seq(date1, date4, by = "day"), N),
                    numeric_sample = sample(N, replace = FALSE))
> head(my_dt)
date_sample numeric_sample
1: 2007-04-11 2
2: 2006-04-20 71
3: 2007-12-20 46
4: 2016-05-23 78
5: 2011-10-07 5
6: 2003-09-10 47
Let's impose some cuts:
forbidden_dates <- interval(date2 + 1, date3 - 1)  # create the interval that dates should not fall in
> forbidden_dates
[1] 2005-03-07 UTC--2012-12-31 UTC
test_date1 <- dmy("08-03-2003")  # should not fall in the above range
test_date2 <- dmy("08-03-2005")  # should fall in the above range
Therefore:
test_date1 %within% forbidden_dates
[1] FALSE
test_date2 %within% forbidden_dates
[1] TRUE
A good way of visualizing the cut is to plot the data before and after:
# before
plot(my_dt)
my_dt <- my_dt[!(date_sample %within% forbidden_dates)]  # applying the temporal cut
# after
plot(my_dt)

Calculate mean date across years

I am trying to calculate the mean date independent of year for each level of a factor.
DF <- data.frame(Date = seq(as.Date("2013-2-15"), by = "day", length.out = 730))
DF$ID = rep(c("AAA", "BBB", "CCC"), length.out = 730)
head(DF)
Date ID
1 2013-02-15 AAA
2 2013-02-16 BBB
3 2013-02-17 CCC
4 2013-02-18 AAA
5 2013-02-19 BBB
6 2013-02-20 CCC
With the data above and the code below, I can calculate the mean date for each factor, but this includes the year.
I want a mean month and day across years. The preferred result would be a POSIXct time class formatted as month-day (e.g. 12-31 for Dec 31st) representing the mean month and day across multiple years.
library(dplyr)
DF2 <- DF %>%
  group_by(ID) %>%
  mutate(Col = mean(Date, na.rm = TRUE))
DF2
Addition
I am looking for the mean day of the year, with a month and day component, for each factor level. If the date represents, for example, the date an animal reproduced, I am not interested in the differences between years, but instead want a single mean day.
The end result would look like DF2 but with the new value calculated as previously described (mean day of the year with a month-day component).
Sorry this was not more clear.
If I understand your question correctly, here's how to get a mean date column. I first extract the day of the year with yday from POSIXlt. I then calculate the mean. To get a date back, I have to add those days to an actual year, hence the creation of the Year object. As requested, I put the results in the same format as DF2 in your example.
library(dplyr)

DF2 <- DF %>%
  mutate(Year = format(Date, "%Y"),
         Date_day = as.POSIXlt(Date)$yday) %>%
  group_by(ID) %>%
  mutate(Col = mean(Date_day, na.rm = TRUE),
         Mean_date = format(as.Date(paste0(Year, "-01-01")) + Col, "%m-%d")) %>%
  select(Date, ID, Mean_date)
DF2
> DF2
Source: local data frame [730 x 3]
Groups: ID [3]
Date ID Mean_date
(date) (chr) (chr)
1 2013-02-15 AAA 07-02
2 2013-02-16 BBB 07-02
3 2013-02-17 CCC 07-01
4 2013-02-18 AAA 07-02
5 2013-02-19 BBB 07-02
6 2013-02-20 CCC 07-01
7 2013-02-21 AAA 07-02
8 2013-02-22 BBB 07-02
9 2013-02-23 CCC 07-01
10 2013-02-24 AAA 07-02
.. ... ... ...
You can take the mean of dates by using the mean function. However, note that the mean implementation (and result) will be different depending on the data type. For POSIXct, the mean will be calculated and return the date and time - think of taking the mean of a bunch of integers and you will likely get a float or numeric. For Date, it will essentially 'round' the date to the nearest date.
For example, I recently took a mean of dates. Look at the output when different data types are used.
> mean(as.Date(stationPointDf$knockInDate))
[1] "2018-06-04"
> mean(as.POSIXct(stationPointDf$knockInDate))
[1] "2018-06-03 21:19:21 CDT"
If I am looking for a mean month and day across years, I convert all the dates to have the same year using the lubridate package.
library(lubridate)
year(myVectorOfDates) <- 2018
Then, I compute the mean and drop the year.
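Putting the two steps together, a small sketch (myVectorOfDates is a hypothetical Date vector):
library(lubridate)

myVectorOfDates <- as.Date(c("2013-07-01", "2014-06-28", "2015-07-04"))
year(myVectorOfDates) <- 2018           # collapse all dates onto a single year
format(mean(myVectorOfDates), "%m-%d")  # mean day with the year dropped: "07-01" here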

Find value relative to mean for particular Day of Year in R

I'm working with some meteorology data in R, and conceptually, I'm trying to find out how much a certain day is above/below average. To do this, I want to separate by day of year, find the average for each DOY (e.g. what is the average January 1 temperature?), and then compare every date to it (e.g. was January 1, 2014 anomalously warm, and by how much?).
I can find a 'mean' table for every day of the year using aggregate:
head(data)
x date
1 5.072241 1970-01-01
2 6.517069 1970-01-02
3 4.413654 1970-01-03
4 11.129351 1970-01-04
5 9.331630 1970-01-05
library(lubridate)
temp = aggregate(data$x, list(yday(data$date)), mean)
but I'm stuck on how to use the aggregated table to compare with my original data.frame, to see how x on 1970 Jan 1 relates to the average Jan 1 x.
We can remove the 'year' part with sub, creating 'Monthday'. Use ave if a Mean variable needs to be created, grouped by 'Monthday'.
data$Monthday <- sub('\\d+-', '', data$date)
data$Mean <- with(data, ave(x, Monthday))
Then, we can compare with 'x' variable, for example
data$rel_temp <- with(data, x/Mean)
You could use dplyr as well.
library(dplyr); library(lubridate)
data %>%
  mutate(year_day = paste0(month(date), "_", mday(date))) %>%
  group_by(year_day) %>%
  mutate(relev_temp = x / mean(x)) %>%
  ungroup()
The logic is the following:
Create a new variable year_day which is just the month and day of every date: mutate(year_day = ...).
Then take the temperature x and divide it by the average temp of that year_day: group_by(year_day) %>% mutate(relev_temp = x/mean(x)).
Thanks for the feedback. #akrun's answer works well for me.
As an alternative, I also hacked this together, which produces the same output as #akrun's answer (and is 1/10th of a second slower for 40 yrs of daily data):
averages <- aggregate(data$x, list(DOY = yday(data$date)), mean)
temp <- merge(data.frame(data, DOY = yday(data$date)), averages, by = 'DOY')
head(temp[order(temp$date), ])
DOY x.x date x.y
1 1 -12.0 1970-01-01 -8.306667
70 2 -14.2 1970-01-02 -8.695556
113 3 -16.7 1970-01-03 -8.060000
157 4 -13.6 1970-01-04 -8.233333
200 5 -19.2 1970-01-05 -8.633333
243 6 -15.0 1970-01-06 -8.922222
