ggplot time series: Plotting a full month on the x-axis - r

Ola, I have a question concerning ggplot.
In this code I have 31 days and each day has around 200 measured times with a precipitation sum.
Delay is measured in seconds, Precip in mm (times 100, for plotting measures).
The code below is just so you can get a good view on this.
I'm currently struggling to form this into a ggplot graph.
X ARRIVAL DELAY PRECIP DATDEP
1 08:12 -10 0 01AUG2019
2 11:22 120 19.2222 01AUG2019
3 09:22 22 0.4444 01AUG2019
4 21:22 0 33.2222 01AUG2019
5 08:22 2 744.4444 02AUG2019
etc. etc.
How do I manage to plot this month into a plot with 2 lines (one for DELAY and one for PRECIP)? I can't manage to transform the DATDEP into a nice time frame on the x-axis.
Q2: also for for example 4 months? How would you manage to form that into a nice time frame on the x-axis.

Use package lubridate for your dates. And make your data long. Many many threads here on SO on this topic.
library(tidyverse)
library(lubridate)
#devtools::install_github('alistaire47/read.so')
mydf <- read.so::read.so(' X ARRIVAL DELAY PRECIP DATDEP
1 08:12 -10 0 01AUG2019
2 11:22 120 19.2222 01AUG2019
3 09:22 22 0.4444 01AUG2019
4 21:22 0 33.2222 01AUG2019
5 08:22 2 744.4444 02AUG2019')
mydf <- mydf %>% pivot_longer(names_to = 'key', values_to = 'value', cols = DELAY:PRECIP)
ggplot(mydf, aes(dmy(DATDEP), value)) + geom_line(aes(color = key))
Created on 2020-03-20 by the reprex package (v0.3.0)

Related

How to use another variable values as labels on date x-axis in ggplot?

I have created a ggplot using date x axis but I would like to show their values from another variable instead of dates.
df
library(tidyverse)
library(lubridate)
df <- read_rds("https://github.com/johnsnow09/covid19-df_stack-code/blob/main/vaccine_milestones.rds?raw=true")
df
Updated.On cr_bin days_to_next_10cr_vacc
<date> <fct> <drtn>
1 2021-04-11 10 Cr 85 days
2 2021-05-27 20 Cr 46 days
3 2021-06-24 30 Cr 28 days
4 2021-07-18 40 Cr 24 days
5 2021-08-06 50 Cr 19 days
6 2021-08-25 60 Cr 19 days
7 2021-09-07 70 Cr 13 days
8 2021-09-18 80 Cr 11 days
9 2021-10-02 90 Cr 14 days
df %>%
ggplot(aes(x = Updated.On, y = days_to_next_10cr_vacc)) +
geom_col() +
scale_x_date(aes(labels = cr_bin))
Also tried: scale_x_date(aes(labels = c("10","20","30","40","50","60","70","80","90")))
In the plot on the x axis I would like to have values displayed from cr_bin instead of dates as 10 Cr, 20 cr, 30 Cr ... so on 90 Cr.
I have tried above code but I am not sure what else to use in place of labels to get desired results
You need to set breaks for labels. I'm using unique, just in case there might be duplicate rows.
Also note conversion off difftime to integer.
library(tidyverse)
library(lubridate)
df <- read_rds("https://github.com/johnsnow09/covid19-df_stack-code/blob/main/vaccine_milestones.rds?raw=true")
df %>%
ggplot(aes(x = Updated.On, y = as.integer(days_to_next_10cr_vacc))) +
geom_col() +
scale_x_date(breaks = unique(df$Updated.On), labels = unique(df$cr_bin))
Created on 2021-10-21 by the reprex package (v2.0.1)

Given time column, how can I create time bins in R?

Given a dataframe with say 3 columns:
date time respond
1/1/2018 15:40 1
4/5/2017 08:25 0
3/4/2016 09:00 1
5/4/2017 09:25 1
....
I want to bin my time column say into 24 bins - for each our and if for example I have 50 samples I want all times between hour1 to hour2 (08:00 - 09:00) to represent bin of 08:00 hour etc.
Now when I will achieve this, I want to count how many responders I have within each bin:
bin08:00 = 10 responders
bin09:00 = 134 responders
and to plot it using ggplot2.
Also please guide me how can I create different bin map:
from 08:00 to 12:00 AM - hourly bins.
12:00AM - 15:00 every 15 minutes bins etc.
Please guide me how can I do this.
#akrun
One way to do this is to use strptime to format your time column as POSIX objects, and then use format on those objects to round down to the hour like so:
library(dplyr)
df$hour <- format(strptime(df$time, "%H:%M"), "%H:00")
df %>% group_by(hour) %>% summarize(respond = sum(respond))
# # A tibble: 3 x 2
# hour respond
# <chr> <int>
# 1 08:00 0
# 2 09:00 2
# 3 15:00 1

Plotting daily summed values of data against months [duplicate]

This question already has answers here:
How to change x axis from years to months with ggplot2
(2 answers)
Closed 5 years ago.
I am trying to make a ggplot of solar irradiance (from a weather file) on y-axis and time in months on x-axis.
My data consists of values collected on hour basis for 12 months so overall there are 8760 rows filled with data values.
Now, I want to make plot in such a way that for a single day, I only get a point on plot by adding values for a complete day (Not like taking all the values and plotting them. I believe geom_freqpoly() can plot this type of data. I have looked for this but not finding enough examples in the way I want. (Or if there is some approach that can help me achieve the plot I want as I am not sure what exactly I have to do to add points for a day. Otherwise writing code for 365 days is crazy)
I want the following kind of plot
My plot is showing all the reading for a year and looks like this
My code for this plotting is :
library(ggplot2)
cmsaf_data <- read.csv("C://Users//MEJA03514//Desktop//main folder//Irradiation data//tmy_era_25.796_45.547_2005_2014.csv",skip=16, header=T)
time<- strptime(cmsaf_data[,2], format = "%m/%d/%Y %H:%M")
data <- cbind(time,cmsaf_data[5])
#data %>% select(time)
data <- data.frame(data, months = month(time),days = mday(time))
data <- unite(data, date_month, c(months, days), remove=FALSE, sep="-")
data <- subset(data, data[,2]>0)
GHI <- data[,2]
date_month <- data[,3]
ggplot(data, aes(date_month, GHI))+geom_line()
whereas my data looks like this :
head(data)
time Global.horizontal.irradiance..W.m2.
1 2007-01-01 00:00:00 0
2 2007-01-01 01:00:00 0
3 2007-01-01 02:00:00 0
4 2007-01-01 03:00:00 0
5 2007-01-01 04:00:00 0
6 2007-01-01 05:00:00 159
As I want 1 point for a day, how can I perform sum function so that I can get the output I require and show months names on x-axis (may be using something from time and date that can do this addition for a day and give 365 vales for a year in output)
I have no idea at all of any such function or approach.
Your help will be appreciated!
Here is a solution using the tidyverse and lubridate packages. As you haven't provided complete sample data, I've generated some random data.
library(tidyverse)
library(lubridate)
data <- tibble(
time = seq(ymd_hms('2007-01-01 00:00:00'),
ymd_hms('2007-12-31 23:00:00'),
by='hour'),
variable = sample(0:400, 8760, replace = TRUE)
)
head(data)
#> # A tibble: 6 x 2
#> time variable
#> <dttm> <int>
#> 1 2007-01-01 00:00:00 220
#> 2 2007-01-01 01:00:00 348
#> 3 2007-01-01 02:00:00 360
#> 4 2007-01-01 03:00:00 10
#> 5 2007-01-01 04:00:00 18
#> 6 2007-01-01 05:00:00 227
summarised <- data %>%
mutate(date = date(time)) %>%
group_by(date) %>%
summarise(total = sum(variable))
head(summarised)
#> # A tibble: 6 x 2
#> date total
#> <date> <int>
#> 1 2007-01-01 5205
#> 2 2007-01-02 3938
#> 3 2007-01-03 5865
#> 4 2007-01-04 5157
#> 5 2007-01-05 4702
#> 6 2007-01-06 4625
summarised %>%
ggplot(aes(date, total)) +
geom_line()
In order to get a sum for every month of every year, you need to create a Column which describes a specific month of a specific year (Yearmon).
Then you can group over that Column and sum over that group giving you one sum for every month of every year.
Then you just plot it and set the labels of the x-axis to your liking.
library(ggplot2)
library(dplyr)
library(zoo)
library(scales)
# Create dummy data for time column
time <- seq.POSIXt(from = as.POSIXct("2007-01-01 00:00:00"),
to = as.POSIXct("2017-01-01 23:00:00"),
by = "hour")
# Create dummy data.frame
data <- data.frame(Time = time,
GHI = rnorm(length(time)))
############################
# Add column Yearmon to the data.frame
# Groupy by Yearmon and summarise with sum
# This creates one sum per Yearmon
# ungroup is often not neccessary, however
# not doing this caused problems for me in the past
# Change type of Yearmon to Date for ggplot
#
df <- mutate(data,
Yearmon = as.yearmon(Time)) %>%
group_by(Yearmon) %>%
summarise(GHI_sum = sum(GHI)) %>%
ungroup() %>%
mutate(Yearmon = as.Date(Yearmon))
# Plot the chart with special scale lables
ggplot(df, aes(Yearmon, GHI_sum))+
geom_line()+
scale_x_date(labels = date_format("%m/%y"))
I hope this helps.

Time series graph does not show a fluid line

I am quite new to r and am trying to perform ARIMA time series forecast. The data I am looking into in electricity load per 15 min. My data looks as follows:
day month year PTE periode_van periode_tm gemeten_uitwisseling
1 1 01 2010 1 0 secs 900 secs 2636
2 1 01 2010 2 900 secs 1800 secs 2621
3 1 01 2010 3 1800 secs 2700 secs 2617
4 1 01 2010 4 2700 secs 3600 secs 2600
5 1 01 2010 5 3600 secs 4500 secs 2582
geplande_import geplande_export date weekend
1 719 -284 2010-01-01 00:00:00 0
2 719 -284 2010-01-01 00:15:00 0
3 719 -284 2010-01-01 00:30:00 0
4 719 -284 2010-01-01 00:45:00 0
5 650 -253 2010-01-01 01:00:00 0
weekday Month gu_ma
1 5 01 NA
2 5 01 NA
3 5 01 NA
4 5 01 NA
5 5 01 NA
to create a time series I have used the following code
library("zoo")
ZOO <- zoo(NLData$gemeten_uitwisseling,
order.by=as.POSIXct(NLData$date, format="%Y-%m-%d %H:%M:%S"))
ZOO <- na.approx(ZOO)
tsNLData <- ts(ZOO)
plot(tsNLData)
I have also tried the following
NLDatats <- ts(NLData$gemeten_uitwisseling, frequency = 96)
However when I plot the data I get the following;
How can I solve this?
There doesn't seem to be any problem with your graph, but your data come in 15 minute intervals, and you are plotting 4 years worth of data. So naturally it will look like a dark shaded region because there is no way to show the thousands of data points you have in your series in a single plot.
If you are struggling to handle this much data, you can consider sampling from your data frame before plotting, although this will remove seasonality and autocorrelation from the outcome. That can be helpful if you want to know average values of your outcome over time, but not as helpful to see the seasonal and autocorrelative structure in the data.
See the code below that uses dplyr and ggplot2 to plot some simulated time series that illustrates these issues. It's always best to start with simulated data and then work with your own data.
require(ggplot2)
require(dplyr)
sim_data <- arima.sim(model=list(ar=.88,order=c(1,0,0)),n=10000,sd=.3)
#Too many points
data_frame(y=as.numeric(sim_data),x=1:10000) %>% ggplot(aes(y=y,x=x)) + geom_line() +
theme_minimal() + xlab('Time') + ylab('Y_t')
#Sample from data (random sample)
#However, this will remove autocorrelation/seasonality
data_frame(y=as.numeric(sim_data),x=1:10000) %>% sample_n(500) %>%
ggplot(aes(y=y,x=x)) + geom_line() + theme_minimal() + xlab('Time') + ylab('Y_t')
# Plot a subset, which preserves autocorrelation and seasonality
data_frame(y=as.numeric(sim_data),x=1:10000) %>% slice(1:300) %>%
ggplot(aes(y=y,x=x)) + geom_line() + theme_minimal() + xlab('Time') + ylab('Y_t')

Plotting the frequency of string matches over time in R

I've compiled a corpus of tweets sent over the past few months or so, which looks something like this (the actual corpus has a lot more columns and obviously a lot more rows, but you get the idea)
id when time day month year handle what
UK1.1 Sat Feb 20 2016 12:34:02 20 2 2016 dave Great goal by #lfc
UK1.2 Sat Feb 20 2016 15:12:42 20 2 2016 john Can't wait for the weekend
UK1.3 Sat Mar 01 2016 12:09:21 1 3 2016 smith Generic boring tweet
Now what I'd like to do in R is, using grep for string matching, plot the frequency of certain words/hashtags over time, ideally normalised by the number of tweets from that month/day/hour/whatever. But I have no idea how to do this.
I know how to use grep to create subsets of this dataframe, e.g. for all tweets including the #lfc hashtag, but I don't really know where to go from there.
The other issue is that whatever time scale is on my x-axis (hour/day/month etc.) needs to be numerical, and the 'when' column isn't. I've tried concatenating the 'day' and 'month' columns into something like '2.13' for February 13th, but this leads to the issue of R treating 2.13 as being 'earlier', so to speak, than 2.7 (February 7th) on mathematical grounds.
So basically, I'd like to make plots like these, where frequency of string x is plotted against time
Thanks!
Here's one way to count up tweets by day. I've illustrated with a simplified fake data set:
library(dplyr)
library(lubridate)
# Fake data
set.seed(485)
dat = data.frame(time = seq(as.POSIXct("2016-01-01"),as.POSIXct("2016-12-31"), length.out=10000),
what = sample(LETTERS, 10000, replace=TRUE))
tweet.summary = dat %>% group_by(day = date(time)) %>% # To summarise by month: group_by(month = month(time, label=TRUE))
summarise(total.tweets = n(),
A.tweets = sum(grepl("A", what)),
pct.A = A.tweets/total.tweets,
B.tweets = sum(grepl("B", what)),
pct.B = B.tweets/total.tweets)
tweet.summary
day total.tweets A.tweets pct.A B.tweets pct.B
1 2016-01-01 28 3 0.10714286 0 0.00000000
2 2016-01-02 27 0 0.00000000 1 0.03703704
3 2016-01-03 28 4 0.14285714 1 0.03571429
4 2016-01-04 27 2 0.07407407 2 0.07407407
...
Here's a way to plot the data using ggplot2. I've also summarized the data frame on the fly within ggplot, using the dplyr and reshape2 packages:
library(ggplot2)
library(reshape2)
library(scales)
ggplot(dat %>% group_by(Month = month(time, label=TRUE)) %>%
summarise(A = sum(grepl("A", what))/n(),
B = sum(grepl("B", what))/n()) %>%
melt(id.var="Month"),
aes(Month, value, colour=variable, group=variable)) +
geom_line() +
theme_bw() +
scale_y_continuous(limits=c(0,0.06), labels=percent_format()) +
labs(colour="", y="")
Regarding your date formatting issue, here's how to get numeric dates: You can turn the day month and year columns into a date using as.Date and/or turn the day, month, year, and time columns into a date-time column using as.POSIXct. Both will have underlying numeric values with a date class attached, so that R treats them as dates in plotting functions and other functions. Once you've done this conversion, you can run the code above to count up tweets by day, month, etc.
# Fake time data
dat2 = data.frame(day=sample(1:28, 10), month=sample(1:12,10), year=2016,
time = paste0(sample(c(paste0(0,0:9),10:12),10),":",sample(10:50,10)))
# Create date-time format column from existing day/month/year/time columns
dat2$posix.date = with(dat2, as.POSIXct(paste0(year,"-",
sprintf("%02d",month),"-",
sprintf("%02d", day)," ",
time)))
# Create date format column
dat2$date = with(dat2, as.Date(paste0(year,"-",
sprintf("%02d",month),"-",
sprintf("%02d", day))))
dat2
day month year time posix.date date
1 28 10 2016 01:44 2016-10-28 01:44:00 2016-10-28
2 22 6 2016 12:28 2016-06-22 12:28:00 2016-06-22
3 3 4 2016 11:46 2016-04-03 11:46:00 2016-04-03
4 15 8 2016 10:13 2016-08-15 10:13:00 2016-08-15
5 6 2 2016 06:32 2016-02-06 06:32:00 2016-02-06
6 2 12 2016 02:38 2016-12-02 02:38:00 2016-12-02
7 4 11 2016 00:27 2016-11-04 00:27:00 2016-11-04
8 12 3 2016 07:20 2016-03-12 07:20:00 2016-03-12
9 24 5 2016 08:47 2016-05-24 08:47:00 2016-05-24
10 27 1 2016 04:22 2016-01-27 04:22:00 2016-01-27
You can see that the underlying values of a POSIXct date are numeric (number of seconds elapsed since midnight on Jan 1, 1970), by doing as.numeric(dat2$posix.date). Likewise for a Date object (number of days elapsed since Jan 1, 1970): as.numeric(dat2$date).

Resources