I am analyzing day-to-day data to see at what time of day the value tends to be lower. I set each day as a categorical variable so I can tell the days apart. However, I want each day plotted on top of the others rather than as one continuous graph, as shown below.
Data set:
Value Day
2013-01-03 01:55:00 0.35435715 1
2013-01-03 02:00:00 0.33018654 1
2013-01-03 02:05:00 0.38976118 1
2013-01-04 02:10:00 0.45583868 2
2013-01-04 02:15:00 0.29290860 2
My current ggplot code is as follows:
g <- ggplot(data = Data, aes(x = Index, color = Dates)) +
geom_line(y = Data$Value) +
scale_x_datetime(date_breaks = TimeIntervalForGraph, date_labels = "%H") +
xlab("Time") +
ylab("Random value")
I would really appreciate it if anyone could guide me on how to turn my x-axis into a 24-hour time axis, so that I can plot each day on the same graph and see when the value is lower during the 24 hours. Thanks in advance.
Method tried:
I tried creating a third column with the time only, but for some reason the following code didn't work:
time <- format(index(x), format = "%H:%M"))
data <- cbind(data, time)
You need a way of summarising the data for each hour of the day. Here are some approaches you're probably looking for:
library(xts)
library(data.table)
library(ggplot2)
tm <- seq(as.POSIXct("2017-08-08 17:30:00"), by = "5 mins", length.out = 10000)
z <- xts(runif(10000), tm, dimnames = list(NULL, "vals"))
DT <- data.table(time = index(z), coredata(z))
# note the data.table syntax is different:
DT[, hr := hour(time)]
# Plot the average value by hour:
datByHour <- DT[, list(avgval = mean(vals)), by = c("hr")]
# Use line plot if you have one point per hour:
g <- ggplot(data = datByHour, aes(x = hr, y = avgval, colour = avgval)) +
geom_line()
# visualise the distribution by hour:
g2 <- ggplot(data = DT, aes(x = hr, y = vals, group = hr)) +
geom_boxplot()
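Since the question asks when the value is lowest, the hourly averages can also be inspected directly rather than only plotted. The following is a minimal base R sketch under assumptions: the simulated series, the UTC time zone and the names byHour and lowest are illustrative and not part of the original data.

```r
# Simulated data (assumption): one week of 5-minute readings
set.seed(1)
tm <- seq(as.POSIXct("2017-08-08 00:00:00", tz = "UTC"),
          by = "5 min", length.out = 288 * 7)
vals <- runif(length(tm))

# Hour of day (0-23) for every reading
hr <- as.integer(format(tm, "%H", tz = "UTC"))

# Average value per hour of day
byHour <- aggregate(vals, by = list(hr = hr), FUN = mean)
names(byHour)[2] <- "avgval"

# Row holding the hour with the lowest average
lowest <- byHour[which.min(byHour$avgval), ]
print(lowest)
```

With real data, replace tm and vals with the index and values of your own series.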
Please try the following and let me know if it works (here I assume the timestamp column is named tm):
Data$tm = strftime(Data$tm, format="%H:%M:%S")
library(ggplot2)
ggplot(Data, aes(x = tm, y = Value, group = Day, colour = Day)) +
geom_line() +
theme_classic()
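Formatting the time as a string makes the x-axis discrete. A possible alternative, sketched here under assumptions, is to move every timestamp onto the same dummy date so the axis stays a real datetime scale. The column names stamp and clock, the simulated values and the UTC time zone are illustrative only.

```r
library(ggplot2)

# Simulated data (assumption): two full days of 5-minute readings
set.seed(42)
stamp <- seq(as.POSIXct("2013-01-03 00:00:00", tz = "UTC"),
             by = "5 min", length.out = 2 * 288)
Data <- data.frame(stamp = stamp, Value = runif(length(stamp)))

# One factor level per calendar day
Data$Day <- factor(as.Date(Data$stamp, tz = "UTC"))

# Keep only the clock time by pasting it onto a fixed dummy date
Data$clock <- as.POSIXct(
  paste("1970-01-01", format(Data$stamp, "%H:%M:%S", tz = "UTC")),
  tz = "UTC"
)

# One line per day, all sharing the same 24-hour x-axis
g <- ggplot(Data, aes(x = clock, y = Value, colour = Day, group = Day)) +
  geom_line() +
  scale_x_datetime(date_breaks = "2 hours", date_labels = "%H:%M",
                   timezone = "UTC") +
  labs(x = "Time of day", y = "Value")
```

Printing g draws the plot. The dummy date never appears on the axis because the labels show only hours and minutes.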
I have an example dataframe composed of:
[image: example dataframe]
I have used ggplot2 to plot dates on the x-axis with a count on the y-axis:
library(lubridate) # for mdy()
library(ggplot2)
df_ggplot <- read.csv("ggplot_ex.csv", header = T, na.strings = "", fileEncoding = "UTF-8-BOM")
df_ggplot$Date <- mdy(df_ggplot$Date)
df_ggplot$Ccount <- as.numeric(as.character(df_ggplot$Ccount))
ggplot(df_ggplot, aes(x=Date, y = Ccount)) +
geom_line() +
geom_point()
[image: ggplot example output]
I want points that occur less than 4 weeks after the previous point to be red. Can anyone help? In this example, the second point would be red as it occurs about 2 weeks after the previous point.
You probably have to do the calculation in the dataframe before the plot (make sure your Date column is in the correct date format).
One option you can try:
library(dplyr) # for %>%, mutate() and lag()
df_ggplot <- df_ggplot %>%
mutate(time_diff = difftime(time1 = Date, time2 = lag(x = Date, n = 1), units = "weeks"),
is_red = as.factor(time_diff < 4))
This gives you the points that must be flagged:
Date Ccount time_diff is_red
1 2019-08-17 20000 NA weeks <NA>
2 2019-08-30 15000 1.857143 weeks TRUE
3 2019-09-30 25000 4.285714 weeks FALSE
Then you can plot, using the colors you want.
ggplot(df_ggplot, aes(x = Date, y = Ccount)) +
geom_line() +
geom_point(aes(color = is_red)) +
scale_color_manual(values = c("black", "red"), na.value = "black")
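The same flag can be computed without dplyr. This is a minimal base R sketch that reuses the three dates from the output above. The names gap_days and is_red_flag are illustrative, and treating 4 weeks as 28 days is an assumption.

```r
# The three dates from the example output above
dates <- as.Date(c("2019-08-17", "2019-08-30", "2019-09-30"))

# Gap in days to the previous point; the first point has no predecessor
gap_days <- c(NA, diff(as.numeric(dates)))

# TRUE when a point follows the previous one by less than 4 weeks (28 days).
# The first point is treated as not flagged, so no NA reaches the plot.
is_red_flag <- !is.na(gap_days) & gap_days < 28
print(data.frame(dates, gap_days, is_red_flag))
```

Because the flag contains no NA, the na.value argument is not needed when it is mapped to color.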
I created a plot in R using the ggplot library:
library(ggplot2)
ggplot(df, aes(x = yQ, y = value, group =1)) +
geom_line(aes(color = variable), size = 1) +
scale_color_manual(values = c("#00AFBB", "#E7B800"))
I got the plot that I want, but the only problem is that the values of the variable yQ have the format:
1990Q1
1990Q2
1990Q3
1990Q4
......
......
2017Q1
2017Q2
2017Q3
2017Q4
and because there are many years, the x-axis labels cannot all be shown clearly (they overlap).
Therefore, I want the x-axis label to show only Q1 and Q3 for every 5 years.
So I want the x-axis to be something like this:
1990Q1 1990Q3 1995Q1 1995Q3 ...... 2015Q1 2015Q3
I tried to use scale_x_date but my dates are not in date format (e.g. 1990Q1) and therefore this does not work. How can I fix it?
The question does not provide reproducible input but using df from the Note below with the autoplot.zoo method of ggplot's autoplot generic we can write:
library(ggplot2)
library(zoo)
z <- read.zoo(df, index = "yQ", FUN = as.yearqtr)
autoplot(z) + scale_x_yearqtr()
Note
Test input--
df <- data.frame(yQ = c("1990Q1", "1990Q2", "1990Q3", "1990Q4"), value = 1:4)
The zoo::format.yearqtr() function is quite easy to use with ggplot2.
Try
scale_x_date(labels = function(x) zoo::format.yearqtr(x, "%YQ%q"))
Use function zoo::as.yearqtr (zoo package) to work with quarterly dates.
Generate example data:
year <- 1990:2000
quar <- paste0("Q", 1:4)
foo <- as.vector(outer(year, quar, paste0))
data <- data.frame(dateQ = foo, Y = rnorm(length(foo)))
head(data)
dateQ Y
1 1990Q1 -0.09944705
2 1991Q1 0.14493910
3 1992Q1 0.54856787
4 1993Q1 1.12966224
5 1994Q1 -0.93539302
6 1995Q1 0.24772265
Transform quarterly date to "normal" date:
data$dateNorm <- as.Date(zoo::as.yearqtr(data$dateQ))
head(data)
dateQ Y dateNorm
1 1990Q1 -0.09944705 1990-01-01
2 1991Q1 0.14493910 1991-01-01
3 1992Q1 0.54856787 1992-01-01
4 1993Q1 1.12966224 1993-01-01
5 1994Q1 -0.93539302 1994-01-01
6 1995Q1 0.24772265 1995-01-01
It sets Q1/2/3/4 as the first day of January/April/July/October.
data[grep("1991", data$dateQ), ]
dateQ Y dateNorm
2 1991Q1 0.1449391 1991-01-01
13 1991Q2 1.5878678 1991-04-01
24 1991Q3 -0.1071823 1991-07-01
35 1991Q4 2.2905729 1991-10-01
Now you can plot it or perform other calculations as it's in Date format.
library(ggplot2)
ggplot(data, aes(dateNorm, Y)) +
geom_line()
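Once the quarters are real dates, the labels the question asks for (Q1 and Q3 of every fifth year) can be passed to scale_x_date as explicit breaks. This is a sketch under assumptions: the simulated values and the names yq, brk_yq and brk are illustrative only.

```r
library(ggplot2)
library(zoo)

# Simulated quarterly data (assumption) covering 1990Q1 to 2017Q4
set.seed(1)
yq <- as.yearqtr(seq(1990, 2017.75, by = 0.25))
df <- data.frame(date = as.Date(yq), value = runif(length(yq)))

# Q1 (year + 0) and Q3 (year + 0.5) of every fifth year
yrs <- seq(1990, 2015, by = 5)
brk_yq <- as.yearqtr(sort(c(yrs, yrs + 0.5)))
brk <- as.Date(brk_yq)

p <- ggplot(df, aes(x = date, y = value)) +
  geom_line() +
  scale_x_date(breaks = brk, labels = format(brk_yq, "%YQ%q")) +
  theme(axis.text.x = element_text(angle = 90))
```

The labels are built from the yearqtr values, so they keep the original 1990Q1 style.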
You can manipulate the x-axis breaks and labels with scale_x_discrete(breaks = ..., labels = ...) and change the angle of the text with theme(axis.text.x = element_text(angle = ...)).
I generated some data
Combs <- expand.grid(1990:2017, c("Q1", "Q2", "Q3", "Q4"))
df <- data.frame(
yQ = sort(apply(Combs, 1, paste, collapse="")),
value = runif(112)
)
In the first example, I subset yQ values you want with a logical vector - and change the angle of text
library(ggplot2)
pattern <- c(T, F, T, F, rep(F, 16))
ggplot(df, aes(x = yQ, y = value, group =1)) +
geom_line(aes(color = "red"), size = 1) +
scale_x_discrete(breaks = df$yQ[pattern], labels = df$yQ[pattern]) +
theme(axis.text.x = element_text(angle=90))
But notice that tick marks not listed in breaks are not shown, so the alternative is to copy the yQ values into a vector and set the non-relevant ones to "".
xVec <- as.character(df$yQ)
xVec[pattern==F] <- ""
ggplot(df, aes(x = yQ, y = value, group =1)) +
geom_line(aes(color = "red"), size = 1) +
scale_x_discrete(breaks = df$yQ, labels = xVec) +
theme(axis.text.x = element_text(angle=90))
I'm trying to create a histogram from time-series data in R, similar to this question. Each bin should show the total duration for the values falling within the bin. I have non-integer sample times in a zoo object of thousands of rows. The timestamps are irregular, and the data is assumed to be constant between each timestamp (sample-and-hold).
Example data:
library(zoo)
library(ggplot2)
library(scales) # for date_format()
timestamp = as.POSIXct(c("2018-02-21 15:00:00.0", "2018-02-21 15:00:02.5", "2018-02-21 15:00:05.2", "2018-02-21 15:00:07.0", "2018-02-21 15:00:09.3", "2018-02-21 15:00:10.0", "2018-02-21 15:00:12.0"), tz = "GMT")
data = c(0,3,5,1,3,0,2)
z = zoo(data, order.by = timestamp)
x.df <- data.frame(Date = index(z), Value = as.numeric(coredata(z)))
ggplot(x.df, aes(x = Date, y = Value)) + geom_step() + scale_x_datetime(labels = date_format("%H:%M:%OS"))
Please see the times-series plot here. Creating a histogram with hist(z, freq = T) does not care about the timestamps: Plot from hist method.
My desired output is a histogram with duration in seconds on the y-axis, something like this: Histogram with non-integer duration on y-axis.
Edit:
I should point out that the data values are not integers, and that I want to be able to control the bin width(s). I could use diff(timestamp) to create a (non-integer) column showing the duration for each point, and plot a bar graph as suggested by @MKR:
x.df = data.frame(DurationSecs = as.numeric(diff(timestamp)), Value = data[-length(data)])
ggplot(x.df, aes(x = Value, y = DurationSecs)) + geom_bar(stat = "identity")
This gives a histogram with the right bar heights for the example. But this fails when the values are floating point numbers.
Since you want duration (in seconds) on the y-axis, you should add a duration column to x.df. A histogram with stat = "sum" should then fit your needs. The steps are:
library(zoo)
library(dplyr)
library(ggplot2)
timestamp = as.POSIXct(c("2018-02-21 15:00:00.0", "2018-02-21 15:00:02.5",
"2018-02-21 15:00:05.2", "2018-02-21 15:00:07.0", "2018-02-21 15:00:09.3",
"2018-02-21 15:00:10.0", "2018-02-21 15:00:12.0"), tz = "GMT")
data = c(0,3,5,1,3,0,2)
z = zoo(data, order.by = timestamp)
x.df <- data.frame(Date = index(z), Value = as.numeric(coredata(z)))
# DurationSecs is added as numeric. It is the time until the next timestamp (0 for the last row).
x.df <- x.df %>% arrange(Date) %>%
mutate(DurationSecs = ifelse(is.na(lead(Date)), 0, lead(Date) - Date))
# Draw the plot now
ggplot(x.df, aes(x = Value, y = DurationSecs)) + geom_histogram(stat="sum")
#The data
# Date Value DurationSecs
#1 2018-02-21 15:00:00 0 2.5
#2 2018-02-21 15:00:02 3 2.7
#3 2018-02-21 15:00:05 5 1.8
#4 2018-02-21 15:00:07 1 2.3
#5 2018-02-21 15:00:09 3 0.7
#6 2018-02-21 15:00:10 0 2.0
#7 2018-02-21 15:00:12 2 0.0
After some trial and error I found a solution. The answer provided by MKR sort of works, but I could not set the number of bins and it failed for floating-point values.
I came across the wonderful functions cut and xtabs in this question: How to plot an histogram with y as a sum of the x values for every bin in ggplot2. The solution provided there was painfully slow, drawing each data-point duration as stacked bars.
I don't need separate bars for each data-point, I just need the sum of the durations within each bin. This is my solution:
library(dplyr)
library(magrittr)
library(zoo)
library(ggplot2)
timestamp = as.POSIXct(c("2018-02-21 15:00:00.0", "2018-02-21 15:00:02.5",
"2018-02-21 15:00:05.2", "2018-02-21 15:00:07.0", "2018-02-21 15:00:09.3",
"2018-02-21 15:00:10.0", "2018-02-21 15:00:12.0"), tz = "GMT")
data = c(0,3,5,1,3,0,2)
z = zoo(data, order.by = timestamp)
x.df <- data.frame(Date = index(z), Value = as.numeric(coredata(z)))
# DurationSecs is added as numeric. It is the time until the next datapoint (0 for the last row).
x.df <- x.df %>% arrange(Date) %>%
mutate(DurationSecs = ifelse(is.na(lead(Date)), 0, lead(Date) - Date))
# Adding a column of bins to the dataframe:
BinCount <- 7
x.df$bins = cut(x.df$Value, pretty(x.df$Value, n = BinCount), include.lowest = TRUE, right = FALSE)
# Creating a new dataframe containing bins and the sum of DurationSecs for each bin.
y.df = data.frame(xtabs(DurationSecs ~ bins, x.df))
# Ready to plot
ggplot(y.df, aes(x = bins, y = Freq)) +
geom_bar(stat = "identity") +
ylab("Duration") +
xlab("Value") +
scale_x_discrete(drop = F) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.3, hjust = 1)) +
scale_y_continuous(breaks = scales::pretty_breaks(n = 10))
The result is shown here. As a bonus, the labels on the x-axis are really beautiful, and I have the frequency table available for further analysis.
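A possible alternative, sketched here under assumptions, is to let ggplot do the binning and weight each observation by its duration through the weight aesthetic of geom_histogram. The bin width of 1 and the names dur and totals are illustrative only. The sample data is the one from the question.

```r
library(ggplot2)

timestamp <- as.POSIXct(c("2018-02-21 15:00:00.0", "2018-02-21 15:00:02.5",
                          "2018-02-21 15:00:05.2", "2018-02-21 15:00:07.0",
                          "2018-02-21 15:00:09.3", "2018-02-21 15:00:10.0",
                          "2018-02-21 15:00:12.0"), tz = "GMT")
value <- c(0, 3, 5, 1, 3, 0, 2)

# Seconds each value is held until the next timestamp; the last sample gets 0
dur <- c(as.numeric(diff(timestamp), units = "secs"), 0)
x.df <- data.frame(Value = value, DurationSecs = dur)

# Each observation contributes its duration, not a count of 1, to its bin
p <- ggplot(x.df, aes(x = Value, weight = DurationSecs)) +
  geom_histogram(binwidth = 1, boundary = 0, closed = "left") +
  labs(x = "Value", y = "Duration (s)")

# Cross-check: total duration per distinct value
totals <- sapply(split(dur, value), sum)
print(totals)
```

Changing binwidth (or using bins) controls the binning, and non-integer values need no special handling because the binning is done on the continuous scale.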
I have the following dataset:
time tta
08:20:00 1
21:30:00 5
22:00:00 1
22:30:00 1
00:25:00 1
17:00:00 5
I would like to plot a bar chart using ggplot so that the x-axis has a tick every 2 hours (00:00:00, 02:00:00, 04:00:00 and so on) and the y-axis shows the frequency for each level of the factor tta (1 and 5).
x-axis should be 00-01,01-02,... so on
I first approached this using the xts package, but found that it does not offer a way to floor times. I therefore find lubridate more practical here, also because ggplot does not understand xts objects directly. Both packages help you transform time data in many ways.
Use xts::align.time or lubridate::floor_date to shift your times to the next/previous full hour/day/etc.
Either way, you aggregate the data before you pass it to ggplot. You can use sum to sum up tta, or just use length to count the number of occurrences; in the latter case you could also use geom_histogram on the time series alone. You can shift the bars in ggplot with position_nudge so that each bar represents a period rather than sitting centered on a point in time. You should also specify scale_x_time(labels = ..., breaks = ...) in the plot.
Data:
time <- c(
"08:20:00",
"21:30:00",
"22:00:00",
"22:30:00",
"00:25:00",
"17:00:00"
)
time <- as.POSIXct(time, format = "%H:%M:%S")
tta <- c(1, 5, 1, 1, 1, 5)
Using xts:
library(xts)
myxts <- xts(tta, order.by = time)
myxts_aligned <- align.time(myxts, n = 60*60*2) # shifts all times to the next full
# 2 hours
myxts_agg <- period.apply(myxts_aligned,
INDEX = endpoints(myxts, "hours", 2),
FUN = sum) # sums up every two hours
require(ggplot2)
ggplot(mapping = aes(x = index(myxts_agg), y = myxts_agg[, 1])) +
geom_bar(stat = "identity",
width = 60*60*2, # one bar to be 2 hours wide
position = position_nudge(x = -60*60), # shift one hour to the left
# so that the bar represents the actual period
colour = "black") +
scale_x_time(labels = function(x) strftime(x, "%H:%M"),
breaks = index(myxts_agg)) + # add more breaks manually if you like
scale_y_continuous() # to escape the warning of ggplot not knowing
# how to deal with xts object
Using lubridate:
require(lubridate)
require(tidyverse)
mydf <- data.frame(time = time, tta = tta)
mydf_agg <-
mydf %>%
group_by(time = floor_date(time, "2 hours")) %>%
summarise(tta_sum = sum(tta), tta_freq = n())
ggplot(mydf_agg, aes(x = time, y = tta_sum)) +
geom_bar(stat = "identity",
width = 60*60*2, # one bar to be 2 hours wide
position = position_nudge(x = 60*60), # shift one hour to the *right*
# so that the bar represents the actual period
colour = "black") +
scale_x_time(labels = function(x) strftime(x, "%H:%M"),
breaks = mydf_agg$time) # add more breaks manually if you like
In the end, both give almost the same result.
Use the floor_date function from lubridate:
library(tidyverse)
library(lubridate)
your_df %>% group_by(floor_date(time,"2 hours")) %>% count(tta)
and then ggplot with geom_col from there
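As a rough sketch of the whole pipeline without lubridate, the bins can also be built from the hour alone, which gives range-style labels on the x-axis. The 2-hour width, the "00-02" label format and the names start, bin and counts are assumptions for illustration. The data is the sample from the question.

```r
library(ggplot2)

time <- c("08:20:00", "21:30:00", "22:00:00", "22:30:00", "00:25:00", "17:00:00")
tta <- c(1, 5, 1, 1, 1, 5)

# Floor each hour to the start of its 2-hour bin
hour <- as.integer(substr(time, 1, 2))
start <- (hour %/% 2) * 2

# Label every bin as "start-end" and keep empty bins as factor levels
all_bins <- sprintf("%02d-%02d", seq(0, 22, by = 2), seq(2, 24, by = 2))
bin <- factor(sprintf("%02d-%02d", start, start + 2), levels = all_bins)

# Frequency of each tta level per bin
counts <- table(bin, tta)
plot_df <- as.data.frame(counts, responseName = "n")

p <- ggplot(plot_df, aes(x = bin, y = n, fill = tta)) +
  geom_col(position = "dodge") +
  labs(x = "Time of day", y = "Frequency")
```

Because all twelve bins are factor levels, empty periods still get a slot on the x-axis.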
library(lubridate)
library(ggplot2)
library(scales) # for date_breaks() and date_format()
Make sure the class of your timestamp column is POSIXct:
> class(df$timestamp)
[1] "POSIXct" "POSIXt"
Then use the scale_x_datetime function as follows.
gg +
scale_x_datetime(expand = c(0, 0), breaks=date_breaks("1 hour"), labels=date_format("%H:%M"))
In this case, it will space the breaks on the x-axis one hour apart, and the labels will look like 09:00, for example.
I have data in the following format:
Date Year Month Day Flow
1 1953-10-01 1953 10 1 530
2 1953-10-02 1953 10 2 530
3 1953-10-03 1953 10 3 530
I would like to create a graph like this:
Here is my current image and code:
library(ggplot2)
library(plyr)
library(reshape2)
library(scales)
library(lubridate) # for month() and day()
## Read Data
df <- read.csv("Salt River Flow.csv")
## Convert Date column to R-recognized dates
df$Date <- as.Date(df$Date, "%m/%d/%Y")
## Finds Water Years (Oct - Sept)
df$WY <- as.POSIXlt(as.POSIXlt(df$Date)+7948800)$year+1900
## Normalizes Water Years so stats can be applied to just months and days
df$w <- ifelse(month(df$Date) %in% c(10,11,12), 1903, 1904)
##Creates New Date (dat) Column
df$dat <- as.Date(paste(df$w,month(df$Date),day(df$Date), sep = "-"))
## Creates new data frame with summarised data by MonthDay
PlotData <- ddply(df, .(dat), summarise,
  Min = min(Flow),
  Tenth = quantile(Flow, probs = 0.10),
  TwentyFifth = quantile(Flow, probs = 0.25),
  Median = quantile(Flow, probs = 0.50),
  Mean = mean(Flow),
  SeventyFifth = quantile(Flow, probs = 0.75),
  Ninetieth = quantile(Flow, probs = 0.90),
  Max = max(Flow))
## Melts data so it can be plotted with ggplot
m <- melt(PlotData, id="dat")
## Plots
p <- ggplot(m, aes(x = dat)) +
geom_ribbon(aes(ymin = TwentyFifth, ymax = Median), data = PlotData, fill = alpha("black", 0.1), color = NA) +
geom_ribbon(aes(ymin = Median, ymax = SeventyFifth), data = PlotData, fill = alpha("black", 0.5), color = NA) +
scale_x_date(labels = date_format("%b"), breaks = date_breaks("month"), expand = c(0,0)) +
geom_line(data = subset(m, variable == "Mean"), aes(y = value), size = 1.2) +
theme_bw() +
geom_line(data = subset(m, variable %in% c("Min","Max")), aes(y = value, group = variable)) +
geom_line(data = subset(m, variable %in% c("Ninetieth","Tenth")), aes(y = value, group = variable), linetype = 2) +
labs(x = "Water Year", y = "Flow (cfs)")
p
I am very close but there are some issues I'm having. First, if you can see a way to improve my code, please let me know. The main problem I ran into was that I needed two dataframes to make this graph: one melted, and one not. The unmelted dataframe was necessary (I think) to create the ribbons. I tried many ways to use the melted dataframe for the ribbons, but there was always a problem with the aesthetic length.
Second, I know that to have a legend (and I want one) I need to map something in the aesthetics of each line/ribbon, but I am having trouble getting that to work. I think it would involve scale_fill_manual.
Third, and I don't know if this is possible, I would like to have each month label in between the tick marks, not on them (like in the above image).
Any help is greatly appreciated (especially with creating more efficient code).
Thank you.
Something along these lines might get you close with base:
library(lubridate)
library(reshape2)
# simulating data...
Date <- seq(as.Date("1953-10-01"),as.Date("2010-10-01"),by="day")
Year <- year(Date)
Month <- month(Date)
Day <- day(Date)
set.seed(1)
Flow <- rpois(length(Date), 2000)
Data <- data.frame(Date=Date,Year=Year,Month=Month,Day=Day,Flow=Flow)
# use acast to get it in a convenient shape:
PlotData <- acast(Data,Year~Month+Day,value.var="Flow")
# apply for quantiles
Quantiles <- apply(PlotData,2,function(x){
quantile(x,probs=c(1,.9,.75,.5,.25,.1,0),na.rm=TRUE)
})
Mean <- colMeans(PlotData, na.rm=TRUE)
# ugly way to get month tick separators
MonthTicks <- cumsum(table(unlist(lapply(strsplit(names(Mean),split="_"),"[[",1))))
# and finally your question:
plot(1:366,seq(0,max(Flow),length=366),type="n",xlab = "Water Year",ylab="Discharge",axes=FALSE)
polygon(c(1:366,366:1),c(Quantiles["50%",],rev(Quantiles["75%",])),border=NA,col=gray(.6))
polygon(c(1:366,366:1),c(Quantiles["50%",],rev(Quantiles["25%",])),border=NA,col=gray(.4))
lines(1:366,Quantiles["90%",], col = gray(.5), lty=4)
lines(1:366,Quantiles["10%",], col = gray(.5))
lines(1:366,Quantiles["100%",], col = gray(.7))
lines(1:366,Quantiles["0%",], col = gray(.7), lty=4)
lines(1:366,Mean,lwd=3)
axis(1,at=MonthTicks, labels=NA)
text(MonthTicks-15,-100,1:12,pos=1,xpd=TRUE)
axis(2)
The plotting code really isn't that tricky. You'll need to clean up the aesthetics, but polygon() is usually my strategy for shaded regions in plots (confidence bands, whatever).
Perhaps this will get you closer to what you're looking for, using ggplot2 and plyr:
library(ggplot2)
library(plyr)
library(lubridate)
library(scales)
df$MonthDay <- df$Date - years( year(df$Date) + 100 ) #Normalize points to same year
df <- ddply(df, .(Month, Day), mutate, MaxDayFlow = max(Flow) ) #Max flow on day
df <- ddply(df, .(Month, Day), mutate, MinDayFlow = min(Flow) ) #Min flow on day
p <- ggplot(df, aes(x=MonthDay) ) +
geom_smooth(size=2,level=.8,color="black",aes(y=Flow)) + #80% conf. interval
geom_smooth(size=2,level=.5,color="black",aes(y=Flow)) + #50% conf. interval
geom_line( linetype="longdash", aes(y=MaxDayFlow) ) +
geom_line( linetype="longdash", aes(y=MinDayFlow) ) +
labs(x="Month",y="Flow") +
scale_x_date( labels = date_format("%b") ) +
theme_bw()
Edit: Fixed X scale and X scale label
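On the legend part of the question, one common approach is to map a constant label to fill (or linetype) inside aes() and then assign the actual colors with scale_fill_manual. The sketch below works under assumptions: the simulated flows, the legend labels and the names stats, grp and WY are illustrative, and only the quartile ribbons and the mean are drawn.

```r
library(ggplot2)

# Simulated daily flows (assumption): ten water years
set.seed(1)
Date <- seq(as.Date("1953-10-01"), as.Date("1963-09-30"), by = "day")
Flow <- rpois(length(Date), 2000)

mon <- as.integer(format(Date, "%m"))
yr <- as.integer(format(Date, "%Y"))
WY <- yr + (mon >= 10)  # water year: Oct-Dec count toward the next year

# Put every day on one dummy water year (1903/1904, so Feb 29 is valid)
grp <- sprintf("%d-%s", ifelse(mon >= 10, 1903, 1904), format(Date, "%m-%d"))

# Per-day statistics; tapply returns groups in sorted order, matching sort(unique(grp))
stats <- data.frame(
  dat = as.Date(sort(unique(grp))),
  Q25 = as.numeric(tapply(Flow, grp, quantile, probs = 0.25)),
  Median = as.numeric(tapply(Flow, grp, quantile, probs = 0.50)),
  Q75 = as.numeric(tapply(Flow, grp, quantile, probs = 0.75)),
  Mean = as.numeric(tapply(Flow, grp, mean))
)

# Constant strings inside aes() create the legend entries
p <- ggplot(stats, aes(x = dat)) +
  geom_ribbon(aes(ymin = Q25, ymax = Median, fill = "25th to 50th percentile")) +
  geom_ribbon(aes(ymin = Median, ymax = Q75, fill = "50th to 75th percentile")) +
  geom_line(aes(y = Mean, linetype = "Mean")) +
  scale_fill_manual(name = NULL,
                    values = c("25th to 50th percentile" = "grey80",
                               "50th to 75th percentile" = "grey50")) +
  scale_linetype_manual(name = NULL, values = c(Mean = "solid")) +
  scale_x_date(date_breaks = "1 month", date_labels = "%b", expand = c(0, 0)) +
  labs(x = "Water Year", y = "Flow (cfs)") +
  theme_bw()
```

Working from a single wide data frame also avoids the need for a second, melted copy of the data.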
(Partial answer with base plotting functions, not including the min, max, or mean.) I suspect you will need to construct a dataset before passing it to ggplot, since that is typical for that function. I already do something similar and then pass the resulting matrix to matplot. (It doesn't do that cool highlighting, but maybe ggplot can.)
HDL.mon.mat <- aggregate(dfrm$Flow,
list( dfrm$Year + dfrm$Month/12),
quantile, prob=c(0.1,0.25,0.5,0.75, 0.9), na.rm=TRUE)
matplot(HDL.mon.mat[,1], HDL.mon.mat$x, type="pl")