I am trying to plot a time series using ggplot, having the day and time stored in different data frame columns. How can I tell ggplot to take into account both the date and time in the plot?
My data looks like this:
Date Hour_min Tair Tflower Tbud
Day1 8:35 24,73 29,79 31,41
Day1 8:36 24,29 29,99 31,82
... .. .. ... ...
Day2 00:00 23,62 30,37 32,59
One can load a small sample of the dataset with this:
#Tagua <- read.table(file = "TIMESERIE_OTO32.txt", header = TRUE,dec = ",")
Tagua <- structure(
list(
Date = structure(c(1L, 1L, 2L, 2L), .Label = c("Day1", "Day2"), class = "factor"),
Hour_min = structure(c(1L, 2L, 1L, 2L), .Label = c("8:35", "8:36"), class = "factor"),
Tair = c(24.73, 24.29, 23.62, 24.29),
Tflower = c(29.79, 29.99, 30.37, 29.99),
Tbud = c(31.41, 31.82, 32.59, 31.82)
),
.Names = c("Date", "Hour_min", "Tair", "Tflower", "Tbud"),
class = "data.frame",
row.names = c(NA, -4L))
The columns are the day, the time of day, and three temperatures measured on different parts of the flower.
The data cover 1400 minutes over 2 days.
I wrote this script:
library(ggplot2)
ggplot(aes(x = (Hour_min), group=1), data = Tagua) +
geom_line(aes(y = Tair, colour = "var1")) +
geom_line(aes(y = Tbud, colour = "var2")) +
geom_line(aes(y = Tflower, colour = "var3"))
The problem is that R orders the x-axis by time of day only, from 00:00 to 23:00 (of course), without taking the day into account.
How can I solve this problem?
If possible, I would also like the x-axis ticks to fall only on the hour (e.g. 2:00, 3:00, ...).
This may not be the shortest solution, but you can run it step by step and see how it works.
library(lubridate)
library(dplyr)
library(tidyr)
library(ggplot2)
Tagua <- read.table(file = "TIMESERIE_OTO32.txt", header = TRUE, dec = ",")
Tagua_clean <- Tagua %>%
# Separate hours and minutes (convert = TRUE turns the pieces into integers):
separate(Hour_min, into = c("Hour", "Minute"), sep = ":", convert = TRUE) %>%
# Convert Day1 -> 0
# Day2 -> 1
mutate(Day = as.numeric(gsub("Day", "", Date)) - 1) %>%
# Create a Period:
mutate(time_period = period(days = Day, hours = Hour, minutes = Minute)) %>%
# Create a Date, using the beginning of the experiment (if you know it):
mutate(Date = as.POSIXct("2017-01-01") + time_period) %>%
# Option 2: Convert the time period to hours:
mutate(Hours = as.numeric(time_period)/3600) %>%
select(Date, Hours, Tair, Tflower, Tbud)
# Option 1: With real dates:
ggplot(aes(x = Date), data = Tagua_clean) +
geom_line(aes(y = Tair, colour = "var1")) +
geom_line(aes(y = Tbud, colour = "var2"))+
geom_line(aes(y = Tflower, colour = "var3"))
# Option 2: With hours:
ggplot(aes(x = Hours), data = Tagua_clean) +
geom_line(aes(y = Tair, colour = "var1")) +
geom_line(aes(y = Tbud, colour = "var2"))+
geom_line(aes(y = Tflower, colour = "var3"))
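For comparison, the same timestamp can be assembled in base R, without lubridate or tidyr. This is only a minimal sketch on the four-row sample from the question: the start date 2017-01-01 and the UTC time zone are assumptions, and `elapsed_secs`, `DateTime` and `Hours` are names made up for the example.

```r
# Sample rows from the question (times kept as plain character strings).
Tagua <- data.frame(
  Date     = c("Day1", "Day1", "Day2", "Day2"),
  Hour_min = c("8:35", "8:36", "8:35", "8:36"),
  Tair     = c(24.73, 24.29, 23.62, 24.29),
  stringsAsFactors = FALSE
)

# Assumed start of the experiment; replace with the real date if known.
origin <- as.POSIXct("2017-01-01 00:00:00", tz = "UTC")

# "Day1" -> 0, "Day2" -> 1
day_offset <- as.numeric(sub("Day", "", Tagua$Date)) - 1

# "8:35" -> 8 * 3600 + 35 * 60 seconds since midnight
hm <- strsplit(Tagua$Hour_min, ":", fixed = TRUE)
secs_in_day <- vapply(
  hm,
  function(p) as.numeric(p[1]) * 3600 + as.numeric(p[2]) * 60,
  numeric(1)
)

# Seconds since the start of Day1, then as a timestamp and as decimal hours.
elapsed_secs   <- day_offset * 86400 + secs_in_day
Tagua$DateTime <- origin + elapsed_secs
Tagua$Hours    <- elapsed_secs / 3600

print(Tagua)
```

Either `DateTime` or `Hours` can then be mapped to x in ggplot, in the same way as Option 1 and Option 2.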
Update: to restart the hours at 0 every day, keep using the Date column but customize how it is displayed.
scale_x_datetime has the argument date_labels, which can be set to "%H" to show the hour of the day, or to "Day %d \n Hour: %H" for a combination of day and hour. See ?strptime for more format options. Another argument, date_breaks, can be set to "1 hour" if you want a label every hour.
ggplot(aes(x = Date), data = Tagua_clean) +
geom_line(aes(y = Tair, colour = "var1")) +
geom_line(aes(y = Tbud, colour = "var2"))+
geom_line(aes(y = Tflower, colour = "var3")) +
scale_x_datetime(date_labels = "Day %d \n Hour: %H")
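To preview what a given format string will print before putting it on the axis, the same codes can be tried with base R's format(). The sketch below assumes the 2017-01-01 start date used above and a UTC time zone; `x` and `hourly` are illustrative names only.

```r
# Two example timestamps: one on the assumed Day1, one at midnight on Day2.
x <- as.POSIXct(c("2017-01-01 08:35:00", "2017-01-02 00:00:00"), tz = "UTC")

# Hour of the day only.
print(format(x, "%H", tz = "UTC"))

# Hour and minute, e.g. for ticks on the full hour.
print(format(x, "%H:%M", tz = "UTC"))

# Day of the month plus hour. Note that %d is the calendar day of the month,
# so it matches "Day1"/"Day2" only because the assumed start date is the 1st.
print(format(x, "Day %d, %H:%M", tz = "UTC"))

# Positions of hourly ticks between 08:00 and 12:00 on the first day.
hourly <- seq(
  from = as.POSIXct("2017-01-01 08:00:00", tz = "UTC"),
  to   = as.POSIXct("2017-01-01 12:00:00", tz = "UTC"),
  by   = "1 hour"
)
print(format(hourly, "%H:%M", tz = "UTC"))

# In the plot this would correspond to something like:
#   scale_x_datetime(date_breaks = "1 hour", date_labels = "%H:%M")
```

With 1400 one-minute readings over two days, a label every hour may get crowded, so a wider spacing such as "3 hours" could be worth trying.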
Related
I have time series data (a date column and a value column). I am trying to make a daily distribution plot.
The weekly distribution plot (code below) plots the values by day of the week. Similarly, I am trying to make a daily distribution plot where the x-axis is the month, the y-axis is the value, and each plot has 10 lines, one per day of the month (day 1, day 2, day 3, and so on up to day 10). Thirty lines in one subplot would be cluttered, so I want to split the days into three plots: 1-10, 11-20 and 21-31.
Code for weekly distribution for reference:
#dummy data
start_date <- as.Date("2020-01-01")
end_date <- as.Date("2021-12-31")
date_seq <- seq(from = start_date, to = end_date, by = "day")
set.seed(123)
value <- round(runif(length(date_seq), min = 10000, max = 100000000), 0)
df <- data.frame(date = date_seq, value = value)
df$week_number <- as.numeric(format(as.Date(df$date), "%U")) + 1
df$weekday <- weekdays(as.Date(df$date))
df$year <- as.numeric(format(as.Date(df$date), "%Y"))
years <- unique(df$year)
# Create a list of ggplots, one for each year
plots <- lapply(years, function(y) {
year_df <- df[df$year == y, ]
ggplot(year_df, aes(x = week_number, y = value, color = weekday)) +
geom_line() +
scale_color_discrete(limits = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")) +
ggtitle(paste("Weekday Distribution", y)) +
xlab("Week number") +
ylab("Value") +
theme(legend.key.size = unit(0.4, "cm")) +
theme(plot.title = element_text(hjust = 0.5, vjust = 1.5))
})
library(cowplot)
plot_grid(plotlist = plots, ncol = 1)
So in the end there will be three plots (days 1 to 10, 11 to 20 and 21 to 31), and each plot will contain 2 subplots (one per year, since the dates range from 2020 to 2021). Can anyone help me with this?
Below is how I would do this. The lubridate package is your friend. For the grouping, use cut().
The result is (in my opinion) a pretty useless clutter of lines. But this is not the only reason why I do not endorse this visualisation. I feel it somewhat defeats the point of a time series: one purpose is to visualise the auto-correlation of your data. Artificially separating out specific days from each month drastically undermines this particular advantage of (and maybe reason for) using a time series. You are not only losing information, but also making your own analytical life much more complicated.
library(ggplot2)
library(dplyr)
library(lubridate)
df %>%
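mutate(day = mday(date),
# Breaks (0,10], (10,20], (20,31] give the requested groups 1-10, 11-20, 21-31:
day_group = cut(day, c(0, 10, 20, 31), labels = c("1-10", "11-20", "21-31")),
month = month(date, label = TRUE, abbr = TRUE)) %>%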
ggplot(aes(x = month, y = value, color = day, group=interaction(day, day_group))) +
geom_line() +
theme(legend.key.size = unit(0.4, "cm"),
plot.title = element_text(hjust = 0.5, vjust = 1.5),
axis.text.x = element_text(angle = 90)) +
facet_wrap(year~day_group)
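The breaks passed to cut() decide which days share a panel, so it can help to check them on the integers 1 to 31 first. The sketch below compares two break vectors: c(1, 11, 21, 31) with include.lowest = TRUE, and c(0, 10, 20, 31) with the default right-closed intervals. The object names are made up for the example.

```r
days <- 1:31

# Intervals [1,11], (11,21], (21,31]: day 11 falls into the first group.
groups_a <- cut(days, c(1, 11, 21, 31), include.lowest = TRUE)

# Intervals (0,10], (10,20], (20,31]: the 1-10 / 11-20 / 21-31 split.
groups_b <- cut(days, c(0, 10, 20, 31), labels = c("1-10", "11-20", "21-31"))

# Number of days in each group.
print(table(groups_a))
print(table(groups_b))

# Group membership of day 11 under each set of breaks.
print(as.character(groups_a[11]))
print(as.character(groups_b[11]))
```

Only the second set of breaks puts day 11 in the same panel as days 12 to 20.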
I suspect you want to show how the "typical" 1st day of the month compares with the 2nd, and so on. For this, an aggregate visualisation might be more useful (still not a great idea, but at least you get a clearer picture of your data). You can do this by passing stat = "summary" to geom_smooth, whose geometry combines a line (as in geom_line) with a ribbon (as in geom_ribbon).
df %>%
mutate(day = mday(date),
month = month(date, label = T, abbr = T)) %>%
ggplot(aes(x = day, y = value)) +
geom_smooth(stat= "summary", alpha = .5, color = "black") +
facet_grid(~year)
#> No summary function supplied, defaulting to `mean_se()`
#> No summary function supplied, defaulting to `mean_se()`
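The message above means the line is the mean per day of the month and the ribbon is the mean plus or minus one standard error. A rough base R sketch of that computation follows, on dummy data built like the question's; `mean_se_manual` and `summary_by_day` are names invented for this example, not part of ggplot2.

```r
# Dummy data in the same spirit as the question.
set.seed(123)
dates   <- seq(as.Date("2020-01-01"), as.Date("2021-12-31"), by = "day")
df_demo <- data.frame(date = dates, value = runif(length(dates), 1e4, 1e8))
df_demo$day  <- as.integer(format(df_demo$date, "%d"))
df_demo$year <- as.integer(format(df_demo$date, "%Y"))

# Mean and mean +/- one standard error, the quantities mean_se() reports.
mean_se_manual <- function(v) {
  se <- sqrt(var(v) / length(v))
  c(y = mean(v), ymin = mean(v) - se, ymax = mean(v) + se)
}

# One row per day of the month and year; `value` becomes a 3-column matrix.
summary_by_day <- aggregate(value ~ day + year, data = df_demo, FUN = mean_se_manual)
print(head(summary_by_day))
```

Each day of the month has at most 12 observations per year here, so the ribbon is wide and should be read with that in mind.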
Following on from tjebo's answer: if you must use this plot, you can highlight a single line so that something stands out from the clutter. Here is an example that highlights the 11th day of each month against the rest.
Plot
df %>%
mutate(day = mday(date),
day_group = cut(day, c(0, 10, 20, 31), labels = c("1-10", "11-20", "21-31")),
month = month(date, label = T, abbr = T),
highlight = ifelse(day == 11, "Yes", "No")) %>%
ggplot(aes(x = month, y = value, color = highlight, group=interaction(day, day_group))) +
geom_line() +
theme_bw()+
theme(plot.title = element_text(hjust = 1, vjust = 2),
axis.text.x = element_text(angle = 90)) +
scale_color_manual(breaks = c("Yes", "No"),
labels = c("11th Day", "Other"),
values = c("Yes" = "red2", "No" = "grey60")) +
facet_wrap(year~day_group) +
guides(color = guide_legend(order = 1))
I merged 5 data frames with rbind and want to draw a dot plot in ggplot. With the code I have, the data are ordered by the second column in descending order. What I need instead is an ordering based on the second column in which all the positive values come first in descending order, then the negative values in descending order of their absolute value, and finally all the small values. Overall, I want to split the data into three groups: significant positive, significant negative and not significant. In my bubble plot, the significant positive values should be at the top, then the significant negative ones, with the non-significant ones at the bottom. I generated some data to clarify:
df <- data.frame(
Weekday = c("Fri", "Tues", "Mon", "Thurs","Mon", "Tues", "Wed", "Fri","Wed", "Thurs", "Fri"),
Quarter = c(rep("Q1", 3), rep("Q2", 5), rep("Q3",3)),
Delay = runif(11, -2,5),
pval = runif(11, 0,1))
df$Quarter <- factor(df$Quarter, levels = c("Q1", "Q2", "Q3"))
library(dplyr)
library(forcats)
library(ggplot2)
df %>%
mutate(
Weekday = fct_reorder2(
.f = Weekday,
.x = Delay,
.y = Quarter,
.fun = function(x,y){mean(x[y=="Q2"])}
)) %>%
ggplot(aes(x = Quarter, y = Weekday)) +
geom_point(aes(size = -log10(pval), color = Delay), alpha = 0.8) +
scale_size_binned(range = c(-2, 12)) +
scale_color_gradient(low = "mediumblue", high = "red2", space = "Lab") +
theme_bw() +
theme(axis.text.x = element_text(angle = 25, hjust = 1, size = 10)) +
ylab(NULL) + xlab(NULL)
Thank you.
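One possible reading of the ordering described above, as a base R sketch. The 0.05 cutoff for "significant" is an assumption (the question does not state one), the toy values are made up, and `label`, `group` and `df_sorted` are illustrative names. It sorts rows rather than weekdays, so it would still need adapting to the Weekday/Quarter layout.

```r
# Toy data: one row per item, with an effect size and a p-value.
res <- data.frame(
  label = c("a", "b", "c", "d", "e", "f", "g", "h"),
  Delay = c(3.2, -1.5, 0.4, 4.1, -0.2, -1.9, 2.0, 0.1),
  pval  = c(0.01, 0.03, 0.60, 0.001, 0.45, 0.02, 0.04, 0.90),
  stringsAsFactors = FALSE
)

alpha <- 0.05  # assumed significance cutoff

# Three groups, in the order they should appear from top to bottom.
res$group <- ifelse(
  res$pval >= alpha, "not significant",
  ifelse(res$Delay > 0, "significant positive", "significant negative")
)
res$group <- factor(
  res$group,
  levels = c("significant positive", "significant negative", "not significant")
)

# Sort by group first, then by decreasing absolute Delay within each group.
df_sorted <- res[order(res$group, -abs(res$Delay)), ]
print(df_sorted)

# ggplot draws the first factor level at the bottom of a discrete y axis,
# so reverse the order to get the significant positives at the top.
df_sorted$label <- factor(df_sorted$label, levels = rev(df_sorted$label))
```

Mapping `label` to y would then place the groups in the requested top-to-bottom order.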
i've read every relevant aggregate() by month and lubridate question i could find but am still running into an error of aesthetic length. lots didn't work for me bc they grouped data by month but the dataframe only contained data from one year. i don't need the cumulative total of every January across time – i need it to be month- AND year-specific.
my sample data: (df is called "sales")
order_date_create order_sum
2020-05-19 900
2020-08-29 500
2020-08-30 900
2021-02-01 200
2021-02-06 500
aggregating by month-year:
# aggregate by month (i used _moyr short for month year)
sales$bymonth <- aggregate(cbind(order_sum)~month(order_date_create),
data=sales,FUN=sum)
sales$order_moyr <- format(sales$order_date_create, '%m-%Y') # why does this get saved under values instead of data?
here's my ggplot:
# plot
ggplot(sales, aes(order_moyr, order_sum)) +
scale_x_date(limits = c(min, as.Date(now())),
breaks = "1 month",
labels = date_format("%m-%Y")) +
scale_y_continuous(labels = function(x) format(x, big.mark = "'", decimal.mark = ".", scientific = FALSE)) +
labs(x = "Date", y = "Sales Volume", title = "Sales by Month") +
geom_bar(stat="identity")+ theme_economist(base_size = 10, base_family = "sans", horizontal = TRUE, dkpanel = FALSE) + scale_colour_economist()
if i use x = order_date_create and y = order_sum it plots correctly, with month-year axis, but each bar is still daily sum.
if i use x = order_moyr and y = bymonth, i get this error:
Error: Aesthetics must be either length 1 or the same as the data (48839): y
tangentially, if anyone knows how to use both scales::dollar AND format the thousands separator in the same scale_y_continuous call, it would be a great help. i've not found how to do both.
library(scales); library(lubridate); library(dplyr);
library(ggthemes)
sales %>%
count(order_moyr = floor_date(order_date_create, "month"),
wt = order_sum, name = "order_sum") %>%
ggplot(aes(order_moyr, order_sum)) +
scale_x_date(breaks = "1 month",
labels = date_format("%m-%Y")) +
scale_y_continuous(labels = scales::dollar_format(big.mark = "'",
decimal.mark = ".")) +
labs(x = "Date", y = "Sales Volume", title = "Sales by Month") +
geom_bar(stat="identity", width = 25)+
theme_economist(base_size = 10, base_family = "sans",
horizontal = TRUE, dkpanel = FALSE) +
scale_colour_economist()
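The key step is summing within each month of each year before plotting. aggregate() returns one row per group, which is why its result cannot be stored as a column of `sales`. A base R sketch of the same aggregation on the sample rows from the question follows; `monthly` is a name made up for the example, and order_date_create is assumed to be of class Date.

```r
# Sample rows from the question, with the date column as class Date.
sales <- data.frame(
  order_date_create = as.Date(c("2020-05-19", "2020-08-29", "2020-08-30",
                                "2021-02-01", "2021-02-06")),
  order_sum = c(900, 500, 900, 200, 500)
)

# Snap every date to the first day of its month. The year stays in the key,
# so January 2020 and January 2021 remain separate groups.
sales$order_moyr <- as.Date(format(sales$order_date_create, "%Y-%m-01"))

# One row per month-year, stored in its own data frame.
monthly <- aggregate(order_sum ~ order_moyr, data = sales, FUN = sum)
print(monthly)
```

Because `order_moyr` is still a Date, `monthly` can go straight into ggplot with scale_x_date.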
I have the following data. Can anyone please help me plot it? I have tried a lot of different commands, but none has given me a good graph.
year x y
2012 4 5
2014 7 9
2017 4 3
I need to make a plot like this picture.
Based on your comments you might be looking for:
library(tidyverse)
plot1 <- df %>% gather(key = measure, value = value, -year) %>%
ggplot(aes(x = year, y = value, color = measure))+
geom_point()+
geom_line()+
facet_wrap(~measure)
plot1
The biggest points here are gather and facet_wrap. I recommend the following two links:
https://ggplot2.tidyverse.org/reference/facet_grid.html
https://ggplot2.tidyverse.org/reference/facet_wrap.html
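To see what the reshaping step does, here is a base R sketch of the long format that gather() is expected to produce for this data: one row per year and measure. It is built by hand purely for illustration; `dd` and `long` are example names.

```r
# The data from the question.
dd <- data.frame(
  year = c(2012L, 2014L, 2017L),
  x    = c(4L, 7L, 4L),
  y    = c(5L, 9L, 3L)
)

# Stack the x and y columns on top of each other, keeping track of which
# column each value came from.
long <- data.frame(
  year    = rep(dd$year, times = 2),
  measure = rep(c("x", "y"), each = nrow(dd)),
  value   = c(dd$x, dd$y),
  stringsAsFactors = FALSE
)
print(long)
```

With the data in this shape, `measure` can drive both the colour aesthetic and facet_wrap(~measure).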
You need to convert year column type to Date.
This is a tidyverse style solution
library(tidyverse)
mydf %>%
rename("col1" = x, "col2" = y) %>%
mutate(year = paste0(year, "-01-01")) %>%
mutate(year = as.Date(year)) %>%
ggplot() +
geom_line(aes(x = year, y = col1), color = "red", size = 2) +
geom_line(aes(x = year, y = col2), color = "blue", size = 2) +
theme_minimal()
which returns this
Using the data shown reproducibly in the Note below, use matplot. No packages are needed.
matplot(dd[[1]], dd[-1], pch = c("x", "y"), type = "o", xlab = "year", ylab = "value")
Note
dd <- structure(list(year = c(2012L, 2014L, 2017L), x = c(4L, 7L, 4L),
y = c(5L, 9L, 3L)), class = "data.frame", row.names = c(NA, -3L))
I have a heat map showing a value with the year month pairing of the observation in the y axis, and the hour of the observation in the bottom axis. The data is held in a data.table object.
In default ggplot2 the graph looks like this:
ggplot(repeatability, aes(x = iHrMi, y = iYrMo, fill = erraticity)) +
geom_tile() +
facet_grid(. ~ off) +
scale_x_discrete(name = "Time", breaks = c("00:00", "12:00"))
Which is sort of fine, but my goal is to have the labels on the y axis show the abbreviated name of the month, not its number, and retain their order. zoo has extensions of ggplot2 which allow you to chart yearmon objects like so:
ggplot(repeatability, aes(x = iHrMi, y = zoo::as.yearmon(iYrMo), fill = erraticity, group = iYrMo)) +
geom_tile() +
facet_grid(. ~ off) +
zoo::scale_y_yearmon(name = "Year Month", expand = c(0,0)) +
scale_x_discrete(name = "Time", breaks = c("00:00", "12:00"))
This has the right format, but not the right number of labels, I want one for each month. Additionally, if expand is left to default, the y axis expands and includes periods that don't have data there.
If I supply an n argument to get 12 labels, though, this happens:
I don't get twelve rows, but I get an extra odd label below the x axis intercept.
ggplot(repeatability, aes(x = iHrMi, y = zoo::as.yearmon(iYrMo), fill = erraticity, group = iYrMo)) +
geom_tile() +
facet_grid(. ~ off) +
zoo::scale_y_yearmon(name = "Year Month", expand = c(0,0), n = 12) +
scale_x_discrete(name = "Time", breaks = c("00:00", "12:00"))
This is the head of the table used to make it. I'm afraid I'm not sure how to dput the full table in a useful manner. I'll update this if anyone has any useful suggestions as to how to include a fuller version:
structure(list(iYrMo = c("2013-08", "2013-08", "2013-08", "2013-08",
"2013-08", "2013-08"), iHrMi = c("00:00", "00:30", "01:00", "01:30",
"02:00", "02:30"), off = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Weekday",
"Weekend/Holiday"), class = "factor"), fit = c(0.883255368890743,
0.888802101750935, 0.887399903327103, 0.896846543832244, 0.895936947283074,
0.898059799540441), erraticity = structure(c(4L, 4L, 4L, 4L,
4L, 4L), .Label = c("Most", "More", "Less", "Least"), class = "factor")), .Names = c("iYrMo",
"iHrMi", "off", "fit", "erraticity"), class = c("data.table",
"data.frame"), row.names = c(NA, -6L))
Note that yearmon is just a double type, with 0 equaling Jan 0000 and 2013 + 6/12 equaling July 2013
You can use the limits and breaks arguments (same as in scale_y_continuous from ggplot):
library(zoo)
ggplot(repeatability, aes(x = iHrMi, y = as.yearmon(iYrMo), fill = erraticity, group = iYrMo)) +
geom_tile() +
facet_grid(. ~ off) +
scale_y_yearmon(name = "Year Month", limits = c(2013,2015), breaks = seq(2013,2015, by = 1/12)) +
scale_x_discrete(name = "Time", breaks = c("00:00", "12:00"))
Which with your reproducible data gives:
Messy, but the axis is correct.
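The breaks work because of the numeric encoding noted in the question: a year-month is stored as year + (month - 1)/12. The base R sketch below reproduces that arithmetic without zoo, to show which months the breaks seq(2013, 2015, by = 1/12) land on. The helper names are made up, and the labels are built by hand rather than by zoo.

```r
# "YYYY-MM" strings to the numeric encoding: year + (month - 1) / 12.
yrmo   <- c("2013-08", "2014-01")
yr     <- as.integer(substr(yrmo, 1, 4))
mo     <- as.integer(substr(yrmo, 6, 7))
ym_num <- yr + (mo - 1) / 12
print(ym_num)

# One break per month from Jan 2013 to Jan 2015.
brk <- seq(2013, 2015, by = 1 / 12)

# Recover year and month via a whole-number month count, which avoids
# floating-point trouble at year boundaries.
k          <- round(brk * 12)
brk_year   <- k %/% 12
brk_month  <- k %% 12 + 1
brk_labels <- paste(month.abb[brk_month], brk_year)
print(brk_labels)
```

Narrowing `limits` and `breaks` to the range actually present in the data should leave one label per observed month and remove the empty rows.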