I am using ggplot2 to plot my hourly time series data. My data are organized as follows:
> head(df)
timestamp power
1 2015-08-01 00:00:00 584.4069
2 2015-08-01 01:00:00 577.2829
3 2015-08-01 02:00:00 569.0937
4 2015-08-01 03:00:00 561.6945
5 2015-08-01 04:00:00 557.9449
6 2015-08-01 05:00:00 562.4152
I use the following ggplot2 command to plot the data:
library(ggplot2)
library(scales)  # date_format(), pretty_breaks()
ggplot(df, aes(timestamp, power, group = 1)) + theme_bw() + geom_line() +
  scale_x_datetime(labels = date_format("%d:%m; %H"), breaks = pretty_breaks(n = 30)) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
With this, the plotted graph is:
My questions are:
1. In the plotted graph, why does it show only the labels corresponding to hour 18? And what if I want to display the labels corresponding to hour 12 of each day instead?
2. I am plotting hourly data, hoping to see the fine-grained details, but I cannot see all the hours of the entire month. Can I somehow see a zoomed view of any selected day in the same plot?
Here is a rather long example of scaling dates in ggplot, and also a possible interactive way to zoom in on ranges. First, some sample data:
## Make some sample data
library(zoo) # rollmean
set.seed(0)
n <- 745
x <- rgamma(n,.15)*abs(sin(1:n*pi*24/n))*sin(1:n*pi/n/5)
x <- rollmean(x, 3, fill = 0)  # 3-point moving average, padded with 0 to keep length n
start.date <- as.POSIXct('2015-08-01 00:00:00') # the min from your df
dat <- data.frame(
timestamp=as.POSIXct(seq.POSIXt(start.date, start.date + 60*60*24*31, by="hour")),
power=x * 3000)
For interactive zooming, you could try plotly. When this was written you needed to set it up first (get an API key and username); current versions of the plotly R package render plots locally without an account. Then just do:
library(plotly)
plot_ly(dat, x = ~timestamp, y = ~power, text = ~power, type = 'scatter', mode = 'lines')
and you can select regions of the graph and zoom in on them.
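If you already have a ggplot object, plotly::ggplotly() converts it into an interactive widget with zoom and pan, so the plot does not have to be rebuilt with plot_ly(). A sketch, with a synthetic `dat` standing in for the real data:

```r
library(ggplot2)
library(plotly)

# Synthetic hourly series for August 2015 (stand-in for `dat`)
dat <- data.frame(timestamp = seq(as.POSIXct("2015-08-01 00:00:00", tz = "UTC"),
                                  by = "hour", length.out = 31 * 24))
dat$power <- 500 + 100 * sin(2 * pi * seq_len(nrow(dat)) / 24)

p <- ggplot(dat, aes(timestamp, power)) + geom_line() + theme_bw()

# Interactive version: drag over a range to zoom in, double-click to reset
w <- ggplotly(p)
```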
For changing the breaks in the ggplot graphs, here is a function to make date breaks by various intervals at certain hours.
## Make breaks from a starting date at a given hour, occurring by interval;
## length.out is the number of days
make_breaks <- function(strt, hour, interval = "day", length.out = 31) {
  strt <- as.POSIXlt(strt - 60*60*24)  # start back one day
  # break at the requested hour on that day (note: built in UTC)
  strt <- ISOdatetime(strt$year + 1900L, strt$mon + 1L, strt$mday,
                      hour = hour, min = 0, sec = 0, tz = "UTC")
  seq.POSIXt(strt, strt + (1 + length.out)*60*60*24, by = interval)
}
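To see what make_breaks() actually returns, here is a small usage sketch; the start time and the 3-day range are arbitrary values chosen for illustration, and the function is repeated so the snippet runs on its own:

```r
# make_breaks() as defined above
make_breaks <- function(strt, hour, interval = "day", length.out = 31) {
  strt <- as.POSIXlt(strt - 60*60*24)  # start back one day
  strt <- ISOdatetime(strt$year + 1900L, strt$mon + 1L, strt$mday,
                      hour = hour, min = 0, sec = 0, tz = "UTC")
  seq.POSIXt(strt, strt + (1 + length.out)*60*60*24, by = interval)
}

# Daily breaks at 12:00 UTC, beginning one day before 2015-08-01
b <- make_breaks(as.POSIXct("2015-08-01 00:00:00", tz = "UTC"), hour = 12, length.out = 3)
format(b, "%Y-%m-%d %H:%M", tz = "UTC")
# "2015-07-31 12:00" "2015-08-01 12:00" "2015-08-02 12:00" "2015-08-03 12:00" "2015-08-04 12:00"
```

The first break falls a day before the data starts; ggplot2 normally drops breaks that lie outside the scale limits, so the extra day should be harmless.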
One way to zoom in, non-interactively, is simply to subset the data:
library(scales)
library(ggplot2)
library(gridExtra)
## The whole interval, breaks on hour 18 each day
breaks <- make_breaks(min(dat$timestamp), hour=18, interval="day", length.out=31)
p1 <- ggplot(dat,aes(timestamp,power,group=1))+ theme_bw() + geom_line()+
scale_x_datetime(labels = date_format("%d:%m; %H"), breaks=breaks) +
theme(axis.text.x = element_text(angle=90,hjust=1)) +
ggtitle("Full Range")
## Look at a specific day, breaks by hour
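days <- 20  # day(s) of the month to show, e.g. c(5, 20)
# compare as integers: comparing "%d" strings fails for days 1-9 ("05" vs "5")
samp <- dat[as.integer(format(dat$timestamp, "%d")) %in% days, ]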
breaks <- make_breaks(min(samp$timestamp), hour=0, interval='hour', length.out=length(days))
p2 <- ggplot(samp,aes(timestamp,power,group=1))+ theme_bw() + geom_line()+
scale_x_datetime(labels = date_format("%d:%m; %H"), breaks=breaks) +
theme(axis.text.x = element_text(angle=90,hjust=1)) +
ggtitle(paste("Day:", paste(days, collapse = ", ")))
grid.arrange(p1, p2)
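A variation on the subsetting approach: instead of dropping rows, coord_cartesian(xlim = ...) zooms the view onto one day while keeping all the data, so the line is not cut off abruptly at the window edges. This is a sketch with a synthetic series and an arbitrarily chosen day:

```r
library(ggplot2)

# Synthetic hourly series for August 2015 (stand-in for `dat`)
dat <- data.frame(timestamp = seq(as.POSIXct("2015-08-01 00:00:00", tz = "UTC"),
                                  by = "hour", length.out = 31 * 24))
dat$power <- 500 + 100 * sin(2 * pi * seq_len(nrow(dat)) / 24)

# Zoom window: 2015-08-20, 00:00 to 24:00 UTC
zoom <- as.POSIXct(c("2015-08-20 00:00:00", "2015-08-21 00:00:00"), tz = "UTC")

p_zoom <- ggplot(dat, aes(timestamp, power)) +
  geom_line() +
  scale_x_datetime(date_breaks = "2 hours", date_labels = "%d:%m; %H") +
  coord_cartesian(xlim = zoom) +   # zooms the view; no rows are removed
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
```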
I haven't worked with date-time data much, so my code might look a bit messy. But the solution to question 1 is not to use pretty_breaks(), but to supply concrete breaks (and also limits) within the scale_x_datetime() function.
A rough example might be the following:
ggplot(df, aes(timestamp, power, group = 1)) + theme_bw() + geom_line() +
  scale_x_datetime(labels = date_format("%d:%m; %H"),
                   # one break every 86400 s (1 day), starting 18000 s (5 h) after the origin
                   breaks = as.POSIXct(seq(18000, 3600000, by = 86400),
                                       origin = "2015-10-19 07:00:00"),
                   limits = c(as.POSIXct(3000, origin = "2015-10-19 07:00:00"),
                              as.POSIXct(30000, origin = "2015-10-19 07:00:00"))) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
I am not sure how to write the as.POSIXct() calls more readably, but basically: create the hour-12 break points manually, one per full day, across the range of your data frame.
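A more readable way to build the hour-12 breaks is to let seq() step by "day" from the first noon in the data. This is a sketch on synthetic data, assuming all times are in UTC (adjust tz to match your data):

```r
library(ggplot2)
library(scales)   # date_format()

# Synthetic hourly data standing in for the question's `df`
df <- data.frame(timestamp = seq(as.POSIXct("2015-08-01 00:00:00", tz = "UTC"),
                                 by = "hour", length.out = 31 * 24))
df$power <- 500 + 100 * sin(2 * pi * seq_len(nrow(df)) / 24)

# 12:00 on the first day of data, then one break per day up to the last timestamp
first_noon  <- as.POSIXct(paste(format(min(df$timestamp), "%Y-%m-%d", tz = "UTC"), "12:00:00"),
                          tz = "UTC")
noon_breaks <- seq(first_noon, max(df$timestamp), by = "day")

p <- ggplot(df, aes(timestamp, power)) +
  theme_bw() + geom_line() +
  scale_x_datetime(labels = date_format("%d:%m; %H", tz = "UTC"), breaks = noon_breaks) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
```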
Goal
Use ggplot2 (latest version) to produce a graph that duplicates the x- or y-axis on both sides of the plot, where the scale is not continuous.
Minimal Reprex
# Example data
dat1 <- tibble::tibble(x = c(rep("a", 50), rep("b", 50)),
y = runif(100))
# Standard boxplot
p1 <- ggplot2::ggplot(dat1) +
ggplot2::geom_boxplot(ggplot2::aes(x = x, y = y))
When the scale is continuous, this is easy to do with an identity transformation (clearly one-to-one).
# This works
p1 + ggplot2::scale_y_continuous(sec.axis = ggplot2::sec_axis(~ .))
However, when the scale is not continuous, this doesn't work, as other scale_* functions don't have a sec.axis argument (which makes sense).
# This doesn't work
p1 + ggplot2::scale_x_discrete(sec.axis = ggplot2::sec_axis(~ .))
Error in discrete_scale(c("x", "xmin", "xmax", "xend"), "position_d", :
unused argument (sec.axis = <environment>)
I also tried using the position argument in the scale_* functions, but this doesn't work either.
# This doesn't work either
p1 + ggplot2::scale_x_discrete(position = c("top", "bottom"))
Error in match.arg(position, c("left", "right", "top", "bottom")) :
'arg' must be of length 1
Edit
For clarity, I was hoping to duplicate the x- or y-axis where the scale is anything, not just discrete (a factor variable). I just used a discrete variable in the minimal reprex for simplicity.
For example, this issue arises in a context where the non-continuous scale is datetime or time format.
You can adapt the answer to "Duplicating (and modifying) discrete axis in ggplot2" by just putting the same labels on both sides. As for "you can convert anything non-continuous to a factor, but that's even more inelegant!" from your comment above: that's what a non-continuous axis is, so I'm not sure why that would be a problem for you.
TL;DR: Use as.numeric(...) for your categorical aesthetic, manually supply the labels from the original data, and use scale_*_continuous(..., sec.axis = sec_axis(~ ., ...)).
Edited to update:
I happened to look back through this thread and saw that it was asked for dates and times. That makes the question's wording misleading: dates and times are continuous, not discrete. Discrete scales are factors; dates and times are ordered continuous scales. Under the hood, they're just the number of days (dates) or seconds (date-times) since "1970-01-01".
scale_x_date will indeed throw an error if you try to pass a sec.axis argument (at least in the ggplot2 version current when this was written), even if it's dup_axis(). To work around this, convert your dates/times to numbers, and then fool the scale using labels. This requires a bit of fiddling, but it's not too complicated.
library(lubridate)
library(dplyr)
library(ggplot2)
df <- tibble(tm = ymd("2017-08-01") + 0:10,
y = cumsum(rnorm(length(tm)))) %>%
mutate(tm_num = as.numeric(tm))
df
# A tibble: 11 x 3
tm y tm_num
<date> <dbl> <dbl>
1 2017-08-01 -2.0948146 17379
2 2017-08-02 -2.6020691 17380
3 2017-08-03 -3.8940781 17381
4 2017-08-04 -2.7807154 17382
5 2017-08-05 -2.9451685 17383
6 2017-08-06 -3.3355426 17384
7 2017-08-07 -1.9664428 17385
8 2017-08-08 -0.8501699 17386
9 2017-08-09 -1.7481911 17387
10 2017-08-10 -1.3203246 17388
11 2017-08-11 -2.5487692 17389
I just made a simple vector of 11 days (0 to 10) added to "2017-08-01". If you run as.numeric() on that, you get the number of days since the beginning of the Unix epoch (see ?lubridate::as_date).
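The conversion this relies on can be checked in base R: a Date is stored as days since 1970-01-01 and a POSIXct as seconds since 1970-01-01 00:00 UTC. The values below reproduce tm_num from the tables in this answer (17379 for 2017-08-01, and 1501632000 for 2017-08-02 00:00:00 UTC):

```r
d     <- as.Date("2017-08-01")
d_num <- as.numeric(d)                                               # days since 1970-01-01
t_num <- as.numeric(as.POSIXct("2017-08-02 00:00:00", tz = "UTC"))  # seconds since the epoch

# Going back from numbers (origin given explicitly for older R versions)
d_back <- as.Date(d_num, origin = "1970-01-01")
t_back <- as.POSIXct(t_num, origin = "1970-01-01", tz = "UTC")
```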
df %>%
ggplot(aes(tm_num, y)) + geom_line() +
scale_x_continuous(sec.axis = dup_axis(),
breaks = function(limits) {
seq(floor(limits[1]), ceiling(limits[2]),
by = as.numeric(as_date(days(2))))
},
labels = function(breaks) {as_date(breaks)})
When you plot tm_num against y, it's treated just like normal numbers, and you can use scale_x_continuous(sec.axis = dup_axis(), ...). Then you have to figure out how many breaks you want and how to label them.
The breaks = is a function that takes the limits of the data, and calculates nice looking breaks. First you round the limits, to make sure you get integers (dates don't work well with non-integers). Then you generate a sequence of your desired width (the days(2)). You could use weeks(1) or months(3) or whatever, check out ?lubridate::days. Under the hood, days(x) generates a number of seconds (86400 per day, 604800 per week, etc.), as_date converts that into a number of days since the Unix epoch, and as.numeric converts it back to an integer.
The labels = argument is a function that takes the sequence of integers we just generated and converts it back to displayable dates.
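If the Period arithmetic in breaks = feels opaque, the same idea can be written with plain numbers, since a Date's numeric value already counts days. This is an alternative sketch, not the answer's own code; the helper names date_num_breaks and date_num_labels are made up here:

```r
library(ggplot2)

# Hypothetical helpers for a Date that has been converted with as.numeric()
date_num_breaks <- function(limits, step_days = 2) {
  # whole-day breaks every `step_days` days, covering the limits
  seq(floor(limits[1]), ceiling(limits[2]), by = step_days)
}
date_num_labels <- function(breaks) {
  format(as.Date(breaks, origin = "1970-01-01"), "%b %d")
}

set.seed(1)
df <- data.frame(tm = as.Date("2017-08-01") + 0:10)
df$y <- cumsum(rnorm(nrow(df)))
df$tm_num <- as.numeric(df$tm)

p <- ggplot(df, aes(tm_num, y)) + geom_line() +
  scale_x_continuous(sec.axis = dup_axis(),
                     breaks = date_num_breaks, labels = date_num_labels)
```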
This also works with times instead of dates. While dates are integer days, times are integer seconds (either since the Unix epoch, for datetimes, or since midnight, for times).
Let's say you had some observations that were on the scale of minutes, not days.
The code would be similar, with a few tweaks:
df <- tibble(tm = ymd_hms("2017-08-01 23:58:00") + 60*0:10,
y = cumsum(rnorm(length(tm)))) %>%
mutate(tm_num = as.numeric(tm))
df
# A tibble: 11 x 3
tm y tm_num
<dttm> <dbl> <dbl>
1 2017-08-01 23:58:00 1.375275 1501631880
2 2017-08-01 23:59:00 2.373565 1501631940
3 2017-08-02 00:00:00 3.650167 1501632000
4 2017-08-02 00:01:00 2.578420 1501632060
5 2017-08-02 00:02:00 5.155688 1501632120
6 2017-08-02 00:03:00 4.022228 1501632180
7 2017-08-02 00:04:00 4.776145 1501632240
8 2017-08-02 00:05:00 4.917420 1501632300
9 2017-08-02 00:06:00 4.513710 1501632360
10 2017-08-02 00:07:00 4.134294 1501632420
11 2017-08-02 00:08:00 3.142898 1501632480
df %>%
ggplot(aes(tm_num, y)) + geom_line() +
scale_x_continuous(sec.axis = dup_axis(),
breaks = function(limits) {
seq(floor(limits[1] / 60) * 60, ceiling(limits[2] / 60) * 60,
by = as.numeric(as_datetime(minutes(2))))
},
labels = function(breaks) {
stamp("Jan 1,\n0:00:00", orders = "md hms")(as_datetime(breaks))
})
Here I updated the dummy data to span 11 minutes, from just before midnight to just after midnight. In breaks = I round the limits to whole minutes before creating the breaks, changed as_date to as_datetime, and used minutes(2) to make a break every two minutes. In labels = I used stamp(...)(...): stamp() builds a formatting function from an example string, and that function is then applied to the breaks to produce a nice display format.
Finally, just times (no dates).
df <- tibble(tm = milliseconds(1234567 + 0:10),
y = cumsum(rnorm(length(tm)))) %>%
mutate(tm_num = as.numeric(tm))
df
# A tibble: 11 x 3
tm y tm_num
<S4: Period> <dbl> <dbl>
1 1234.567S 0.2136745 1234.567
2 1234.568S -0.6376908 1234.568
3 1234.569S -1.1080997 1234.569
4 1234.57S -0.4219645 1234.570
5 1234.571S -2.7579118 1234.571
6 1234.572S -1.6626674 1234.572
7 1234.573S -3.2298175 1234.573
8 1234.574S -3.2078864 1234.574
9 1234.575S -3.3982454 1234.575
10 1234.576S -2.1051759 1234.576
11 1234.577S -1.9163266 1234.577
df %>%
ggplot(aes(tm_num, y)) + geom_line() +
scale_x_continuous(sec.axis = dup_axis(),
breaks = function(limits) {
seq(limits[1], limits[2],
by = as.numeric(milliseconds(3)))
},
labels = function(breaks) {format((as_datetime(breaks)),
format = "%H:%M:%OS3")})
Here we've got 11 observations, one every millisecond, starting at t = 20 min 34.567 s. So in breaks = we dispense with any rounding, since we don't want integers now, and put a break every milliseconds(3). Then labels = needs a format that accepts decimal seconds: "%OS3" means 3 decimal digits for the seconds (up to 6 are accepted, see ?strptime).
Is all of this worth it? Probably not, unless you really really want a duplicated time axis. I'll probably post this as an issue on the ggplot2 GitHub, because dup_axis should "just work" with datetimes.
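As far as I know, this was later addressed: from ggplot2 3.1.0 on, scale_x_date() and scale_x_datetime() accept sec.axis directly, so the numeric workaround should no longer be needed. A minimal sketch, assuming such a version and using synthetic data:

```r
library(ggplot2)

set.seed(1)
df <- data.frame(tm = as.Date("2017-08-01") + 0:10)
df$y <- cumsum(rnorm(nrow(df)))

# Assumes ggplot2 >= 3.1.0, where date scales take a sec.axis argument
p <- ggplot(df, aes(tm, y)) +
  geom_line() +
  scale_x_date(date_breaks = "2 days", date_labels = "%b %d",
               sec.axis = dup_axis())
```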
Option 1: This is not very elegant but it works using the cowplot::align_plots function:
library(cowplot)
library(ggplot2)
dat1 <- tibble::tibble(x = c(rep("a", 50), rep("b", 50)),
y = runif(100))
p <- ggplot2::ggplot(dat1) +
ggplot2::geom_boxplot(ggplot2::aes(x = x, y = y))
p <- p + ggplot2::scale_y_continuous(sec.axis = ggplot2::sec_axis(~ .))
p1 <- p + scale_x_discrete(position = c( "bottom"))
p2 <- p + scale_x_discrete(position = c( "top"))
plots <- align_plots(p1, p2, align="hv")
ggdraw() + draw_grob(plots[[1]]) + draw_grob(plots[[2]])
Option 2:
library(forcats)
# Map the categories to numeric positions 1, 2 (the factor codes follow the level order)
dat1$num <- as.numeric(fct_recode(dat1$x, "1" = "a", "2" = "b"))
ggplot2::ggplot(dat1, ggplot2::aes(x = num, y = y, group = num)) +
  ggplot2::geom_boxplot() +
  ggplot2::scale_y_continuous(sec.axis = ggplot2::sec_axis(~ .)) +
  ggplot2::scale_x_continuous(position = "top", breaks = c(1, 2), labels = c("a", "b"),
                              sec.axis = ggplot2::sec_axis(~ ., breaks = c(1, 2), labels = c("a", "b")))
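Option 2 can be generalized so the numeric positions and labels come from the factor levels instead of being typed by hand. A sketch on synthetic data; the variable names lv and num are illustrative:

```r
library(ggplot2)

set.seed(1)
dat1 <- data.frame(x = rep(c("a", "b"), each = 50), y = runif(100))
dat1$x <- factor(dat1$x)
lv <- levels(dat1$x)
dat1$num <- as.integer(dat1$x)   # positions 1..k, in level order

p <- ggplot(dat1, aes(x = num, y = y, group = num)) +
  geom_boxplot() +
  # dup_axis() reuses the primary breaks and labels on the opposite side
  scale_x_continuous(breaks = seq_along(lv), labels = lv, sec.axis = dup_axis())
```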
Note: an answer to a similar problem (Duplicating Discrete Axis in ggplot2) was posted using the cowplot package, but it didn't work for me; the cowplot::switch_axis_position() function has been deprecated.
I have some rainfall data collected continuously from which I have calculated daily totals. Here is some toy data:
Date <- c(seq(as.Date("2016-07-01"), by = "1 day", length.out = 10))
rain_mm <- c(3,6,8,12,0,0,34,23,5,1)
rain_data <- data.frame(Date, rain_mm)
I can plot this data as follows:
ggplot(rain_data, aes(Date, rain_mm)) +
geom_bar(stat = "identity") +
scale_x_date(date_labels = "%d")
Which gives the following:
This seems fine. It is clear how much rainfall there was on a certain day. However, it could also be interpreted that between midday of one day and midday of the next, a certain amount of rain fell, which is wrong. This is especially a problem if the graph is combined with other plots of related continuous variables over the same period.
To get round this issue I could use geom_step as follows:
library(ggplot2)
ggplot(rain_data, aes(Date, rain_mm)) +
geom_step() +
scale_x_date(date_labels = "%d")
Which gives:
This is a better way to display the data, and now scale_x_date appears to be a continuous axis. However, it would be nice to have the area below the steps filled, but I can't seem to find a straightforward way of doing this.
Q1: How can I fill beneath the geom_step? Is it possible?
It may also be useful to convert Date into POSIXct to get identical x-axes in multi-plot figures, as discussed in another SO question.
I can do this as follows:
library(dplyr)
rain_data_POSIX <- rain_data %>% mutate(Date = as.POSIXct(Date))
Date rain_mm
1 2016-07-01 01:00:00 3
2 2016-07-02 01:00:00 6
3 2016-07-03 01:00:00 8
4 2016-07-04 01:00:00 12
5 2016-07-05 01:00:00 0
6 2016-07-06 01:00:00 0
7 2016-07-07 01:00:00 34
8 2016-07-08 01:00:00 23
9 2016-07-09 01:00:00 5
10 2016-07-10 01:00:00 1
However, this gives a time of 01:00 for each date; I would rather have 00:00. Can I change this in the as.POSIXct() call, or do I have to do it afterwards with a separate function? I think it has something to do with tz = "", but I can't figure it out.
How can I convert from class Date to POSIXct so that the time generated is 00:00?
Thanks
For your first question, you can work off this example. First, create a time-lagged version of your data:
rain_tl <- mutate( rain_data, rain_mm = lag( rain_mm ) )
Then combine this time-lagged version with the original data, and re-sort by date:
rain_all <- bind_rows( old = rain_data, new = rain_tl, .id="source" ) %>%
arrange( Date, source )
(Note the newly created source column is used to break ties, correctly interlacing the original data with the time-lagged version):
> head( rain_all )
source Date rain_mm
1 new 2016-07-01 NA
2 old 2016-07-01 3
3 new 2016-07-02 3
4 old 2016-07-02 6
5 new 2016-07-03 6
6 old 2016-07-03 8
You can now use the combined data frame to "fill" your steps:
ggplot(rain_data, aes(Date, rain_mm)) +
  geom_step() +
  geom_ribbon(data = rain_all, aes(ymin = 0, ymax = rain_mm),
              fill = "tomato", alpha = 0.5)
This produces the following plot:
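For your second question: at least in the R versions current when this was written, as.POSIXct() on a Date does not pass additional arguments to the converter, so specifying the tz argument does nothing. The result is midnight UTC, which is printed in your local time zone; that is why you see 01:00 (your time zone is apparently one hour ahead of UTC).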
You basically have two options:
1) Reformat the output to what you want: format(as.POSIXct(Date), "%F 00:00"), which returns a character vector. If you want to keep the object type as POSIXct, you can instead...
2) Convert your Date vector to character before passing it to as.POSIXct: as.POSIXct(as.character(Date)). This gives midnight in your local time zone, and R prints such values without a time component, which may be what you want anyway.
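A variant of option 2 that pins down the time zone: parse the date strings with an explicit tz, so the values are midnight in that zone regardless of the session's local time. A sketch, assuming UTC is acceptable for your data (substitute your own zone name otherwise):

```r
Date <- seq(as.Date("2016-07-01"), by = "1 day", length.out = 10)

# Midnight in an explicit time zone
Date_utc <- as.POSIXct(format(Date), tz = "UTC")

format(Date_utc[1], "%Y-%m-%d %H:%M:%S")   # "2016-07-01 00:00:00"
```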
If you would like to avoid the hack, you can customize the position in the geom_bar expression.
I found good results with:
ggplot(rain_data, aes(Date, rain_mm)) +
geom_bar(stat = "identity", position = position_nudge(x = 0.51), width = 0.99) +
scale_x_date(date_labels = "%d")
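A related option that avoids tuning the nudge and width values is to draw each day's total as a rectangle spanning exactly that day, from Date to Date + 1. A sketch using the question's toy data:

```r
library(ggplot2)

Date <- seq(as.Date("2016-07-01"), by = "1 day", length.out = 10)
rain_mm <- c(3, 6, 8, 12, 0, 0, 34, 23, 5, 1)
rain_data <- data.frame(Date, rain_mm)

p <- ggplot(rain_data) +
  # each rectangle covers [Date, Date + 1), i.e. the whole day the rain was measured
  geom_rect(aes(xmin = Date, xmax = Date + 1, ymin = 0, ymax = rain_mm)) +
  scale_x_date(date_labels = "%d")
```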
With ggplot2, I would like to create a multi-panel plot (facet_grid) where each panel shows the weekly count values for one month.
My data are like this :
day_group count
1 2012-04-29 140
2 2012-05-06 12595
3 2012-05-13 12506
4 2012-05-20 14857
For this dataset I have created two other columns, Month and Week, based on day_group:
day_group count Month Week
1 2012-04-29 140 Apr 17
2 2012-05-06 12595 May 18
3 2012-05-13 12506 May 19
4 2012-05-20 14857 May 2
Now I would like for each Month to create a barplot where I have the sum of the count values aggregated by week. So for example for a year I would have 12 plots with 4 bars (one per week).
Below is what I use to generate the plot :
ggplot(data = count_by_day, aes(x=day_group, y=count)) +
stat_summary(fun.y="sum", geom = "bar") +
scale_x_date(date_breaks = "1 month", date_labels = "%B") +
facet_grid(facets = Month ~ ., scales="free", margins = FALSE)
So far, my plot looks like this
https://dl.dropboxusercontent.com/u/96280295/Rplot.png
As you can see, the x axis is not what I'm looking for: instead of showing only weeks 1, 2, 3 and 4, it displays the whole month.
Do you know what I must change to get what I'm looking for ?
Thanks for your help
Okay, now that I see what you want, I wrote a small program to illustrate it. The key to your month-ordering problem is making month a factor with the levels in the right order:
library(dplyr)
library(ggplot2)
#initialization
set.seed(1234)
sday <- as.Date("2012-01-01")
eday <- as.Date("2012-07-31")
# List of the first day of the months
mfdays <- seq(sday,length.out=12,by="1 month")
# list of months - this is key to keeping the order straight
mlabs <- months(mfdays)
# list of first weeks of the months
mfweek <- trunc((mfdays-sday)/7)
names(mfweek) <- mlabs
# Generate a bunch of event days, then their months and week numbers, in our range
n <- 1000
edf <- data.frame(date = sample(seq(sday, eday, by = 1), n, replace = TRUE))
edf$month <- factor(months(edf$date),levels=mlabs) # use the factor in the right order
edf$week <- 1 + as.integer(((edf$date-sday)/7) - mfweek[edf$month])
# Now summarize with dplyr
ndf <- group_by(edf,month,week) %>% summarize( count = n() )
ggplot(ndf) + geom_bar(aes(x=week,y=count),stat="identity") + facet_wrap(~month,nrow=1)
Yielding:
(As an aside, I am kind of proud I did this without lubridate ...)
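If the weeks should be numbered within each month (days 1 to 7 = week 1, 8 to 14 = week 2, and so on; this numbering is an assumption about what you want), they can be computed directly from the day of the month. A sketch using the four rows from the question; stat_summary(fun = ...) assumes ggplot2 3.3.0 or later (older versions used fun.y):

```r
library(ggplot2)

count_by_day <- data.frame(
  day_group = as.Date(c("2012-04-29", "2012-05-06", "2012-05-13", "2012-05-20")),
  count     = c(140, 12595, 12506, 14857)
)

# Locale-independent month labels, kept in calendar order
count_by_day$Month <- factor(month.abb[as.integer(format(count_by_day$day_group, "%m"))],
                             levels = month.abb)
# Week within the month: days 1-7 -> 1, 8-14 -> 2, ...
count_by_day$Week <- (as.integer(format(count_by_day$day_group, "%d")) - 1) %/% 7 + 1

p <- ggplot(count_by_day, aes(x = factor(Week), y = count)) +
  stat_summary(fun = sum, geom = "bar") +
  facet_wrap(~ Month, scales = "free_x") +
  labs(x = "Week of month")
```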
I think you have to do this but I am not sure I understand your question:
ggplot(data = count_by_day, aes(x=Week, y=count, group= Month, color=Month))
I am initially having the dataset as shown below:
ID A B Type Time Date
1 12 13 R 23:20 1-1-01
1 13 12 F 23:40 1-1-01
1 13 11 F 00:00 2-1-01
1 15 10 R 00:20 2-1-01
1 12 06 W 00:40 2-1-01
1 11 09 F 01:00 2-1-01
1 12 10 R 01:20 2-1-01
and so on...
I made a ggplot of A against B from the above dataset:
ggplot(data=dataframe, aes(x=A, y=B, colour = Type)) +geom_point()+geom_path()
Problem:
How do I add a subsetting variable that selects the first 24 hours after every 'F' point?
For now I have posted a data set that is continuous in time, but my original data set is not. How can I make my data set continuous at 10-minute intervals? I have used the interpolation function xspline() on A and B, but I don't know how to make my data set continuous with respect to time.
The highlighted part shown below is what I am looking for; I want to extract this part of the dataset and then make a new ggplot of it:
From MarkusN's plots, this is what I am looking for:
Taking the first point as the 'F' point and going 24 hours forward from it (since this sample contains less than 24 hours of data, it should look like this):
I've tried the following; maybe you can get an idea from it. I recommend first creating an ordered time variable (in minutes or hours; in this example I've used hours). Let's see if it helps:
library(ggplot2)
# build an example data set
N = 100
set.seed(1)
dataframe = data.frame(A = cumsum(rnorm(N)),
B = cumsum(rnorm(N)),
Type = sample(c('R','F','W'), size = N,
prob = c(5/7,1/7,1/7), replace=T),
time.h = seq(0,240,length.out = N))
# here, a list with dataframes is built with the sequences
l_dfs = lapply(which(dataframe$Type == 'F'), function(i, .data){
transform(subset(.data[i:nrow(.data),], (time.h - time.h[1]) <= 24),
t0 = sprintf('t0=%4.2f', time.h[1]))
}, dataframe)
ggplot(data=do.call('rbind', l_dfs), aes(x=A, y=B, colour=Type)) +
geom_point() + geom_path(colour='black') + facet_wrap(~t0)
First I created sample data. Hope it's similar to your problem:
df = data.frame(id=rep(1:9), A=c(12,13,13,14,12,11,12,11,10),
B=c(13,12,10,12,6,9,10,11,12),
Type=c("F","R","F","R","W","F","R","F","R"),
datetime=as.POSIXct(c("2015-01-01 01:00:00","2015-01-01 22:50:00",
"2015-01-02 08:30:00","2015-01-02 23:00:00",
"2015-01-03 14:10:00","2015-01-05 16:30:00",
"2015-01-05 23:00:00","2015-01-06 17:00:00",
"2015-01-07 23:00:00")),
stringsAsFactors = F)
Your first question is to plot the data, highlighting the first 24h after an F-point. I used dplyr and ggplot2 for this task.
library(dplyr)
library(ggplot2)
df %>%
mutate(nf = cumsum(Type=="F")) %>% # build F-to-F groups
group_by(nf) %>%
mutate(hrs = as.numeric(difftime(datetime, min(datetime), units = "hours")),
       first24h = as.numeric(hrs < 24)) %>% # flag the first 24h of each F-group (explicit units)
mutate(lbl=paste0(row_number(),"-",Type)) %>%
ggplot(aes(x=A, y=B, label=lbl)) +
geom_path(aes(colour=first24h)) + scale_size(range = c(1, 2)) +
geom_text()
The problem here is that the colour only changes at some points. One thing I'm not happy with is the use of different line colours for path sections: if first24h is a discrete variable, geom_path draws two separate paths. That's why I defined the variable as numeric. Maybe someone can improve this?
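One possible improvement: keep first24h discrete but add group = 1, so geom_path treats all points as a single path and colours each segment by the value at its starting point. A sketch with a few made-up points:

```r
library(ggplot2)

# Made-up points; first24h is logical (discrete) here
pts <- data.frame(A = c(12, 13, 13, 14, 12),
                  B = c(13, 12, 10, 12, 6),
                  first24h = c(TRUE, TRUE, FALSE, FALSE, TRUE))

p <- ggplot(pts, aes(x = A, y = B)) +
  # group = 1: one continuous path even though the colour is discrete
  geom_path(aes(colour = first24h, group = 1))
```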
Your second question about an interpolation can easily be solved with the zoo package:
library(zoo)
full.time = seq(df$datetime[1], tail(df$datetime, 1), by = 600)  # new timeline with a point every 10 min
d.zoo = zoo(df[, 2:3], df$datetime)                # A and B as a zoo object indexed by time
d.interp = na.approx(d.zoo, xout = full.time)      # linear interpolation; the result is a zoo object
d.full = data.frame(coredata(d.interp), datetime = index(d.interp))  # back to a data frame
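For comparison, the same linear interpolation can be done in base R with approx() on the numeric time values. This is a sketch on two made-up points 20 minutes apart, so the 10-minute grid gains one midpoint:

```r
t_obs <- as.POSIXct(c("2015-01-01 01:00:00", "2015-01-01 01:20:00"), tz = "UTC")
A_obs <- c(10, 20)

grid   <- seq(t_obs[1], t_obs[length(t_obs)], by = 600)   # every 10 minutes
A_grid <- approx(x = as.numeric(t_obs), y = A_obs, xout = as.numeric(grid))$y

interp <- data.frame(datetime = grid, A = A_grid)
```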
Combining these two data frames gives the solution: every F-to-F section is drawn in a separate panel, and only the points within 24 h after the F-point are shown.
df %>%
  select(Type, datetime) %>%
  right_join(d.full, by = "datetime") %>%
  arrange(datetime) %>%                          # make sure rows are in time order before cumsum()
  mutate(Type = ifelse(is.na(Type), "", Type)) %>%
  mutate(nf = cumsum(Type == "F")) %>%
  group_by(nf) %>%
  mutate(hrs = as.numeric(difftime(datetime, min(datetime), units = "hours"))) %>%
  filter(hrs < 24) %>%                           # keep the first 24h of each F-group
  ggplot(aes(x = A, y = B, label = Type)) +
  geom_path() + geom_text() + facet_wrap(~ nf)
Things I wanted to do:
create a plot of all these days without rain from my list of lists "weekdays", superimposed (one scatterplot of points with a single x and y axis);
plot an averaged curve representing the hydrograph of all weekdays without rain in the observed period.
This is my final solution (the code is much too complicated, I think, but it works):
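library(xts)       # xts(), split(..., "days")
library(reshape2)  # melt()
library(ggplot2)
library(scales)    # date_format()
#create list of data frames split by weekday (abbreviations such as "Mo" depend on the locale)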
daysplit<-split.default(data_regression_PCM4 [,1], format(index(data_regression_PCM4), "%a"))
#create a list for each day of the week
Monday<-list(daysplit$Mo)
#create a list "Monday" split by date
Monday<-as.data.frame(Monday)
Monday [,2]<-row.names(Monday)
Monday[,2]<-as.POSIXct(Monday[,2], format="%Y-%m-%d %H:%M:%S", tz="UTC")
Monday <- xts(Monday[,-2], order.by=Monday[,2])
Monday<-split(Monday, "days")
#Monday without rain (flow rate less than 20 L/s)
cond <- lapply(Monday, function(x) max(x) < 20)
Monday_TW<-Monday[unlist(cond)] #Monday "No rain"
#convert lists of xts to lists of df
Monday_TW <- lapply(Monday_TW, function(x) data.frame(DateTime=index(x),Abfluss=coredata(x)))
#Extract lists to data frames while omitting the DateTime column
mo<-lapply(Monday_TW, function(x) x[!(names(x) %in% c("DateTime"))])
mo <- do.call(cbind,mo)
#combine all weekdays into one df (di, mi, do, fr are built the same way as mo, for Tuesday to Friday)
weekdays_TW<-cbind(mo,di,mi,do,fr)
weekdays_TW [,11]<-(seq(as.POSIXct("2014-12-29 00:00:00",tz="UTC"), as.POSIXct("2014-12-29 23:55:00",tz="UTC"),by="5 min"))#add Time column with random date (only time matters)
names(weekdays_TW)<-c("Abfluss_1","Abfluss_2","Abfluss_3","Abfluss_4","Abfluss_5","Abfluss_6","Abfluss_7","Abfluss_8","Abfluss_9","Abfluss_10","Time")
weekdays_TW<-weekdays_TW[c(11,1,2,3,4,5,6,7,8,9,10)]
#melt df
melted_TW <- melt(weekdays_TW, id.vars = 'Time', value.name = 'series')
names(melted_TW)<-c("Time","values","Abfluss")
##Plot via ggplot2
TW_plot<-ggplot(melted_TW, aes(Time,Abfluss)) + geom_point(aes(color = values))
TW_plot+stat_smooth(method = "lm", formula = y ~ poly(x, 9), size = 1,se=F)+
scale_x_datetime(date_breaks = "2 hours", date_minor_breaks = "1 hour", labels = date_format("%H:%M")) +
xlab("Time [h]")+ ylab("Abfluss Q [L/s]") +
"Montag_2", "Dienstag_1", "Dienstag_2","Mittwoch_1","Mittwoch_2","Donnerstag_1","Donnerstag_2","Freitag_1","Freitag_2"))+ labs(title="Tagesgang PCM4 / Wochentage") +
theme(plot.title = element_text(lineheight=.8, face="bold"))
Resulting plot:
And this is a sample of the data I used:
DateTime Q (L/s)
2015-01-26 00:00:00 12.057333
2015-01-26 00:15:00 13.137333
2015-01-26 00:30:00 12.639000
2015-01-26 00:45:00 12.476333
This is the dataset I used:
http://m.uploadedit.com/ba3d/1432744791634.TXT
If someone happens to find a simpler solution, I would appreciate that!
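A possibly simpler route keeps everything in one data frame: tag each row with its day and time of day, drop days whose maximum flow reaches 20 L/s, keep Monday to Friday, and let stat_summary() draw the mean curve. This is a sketch on a synthetic 15-minute series (with one simulated rainy Wednesday) standing in for data_regression_PCM4, and it assumes ggplot2 3.3.0 or later for stat_summary(fun = ...):

```r
library(ggplot2)

# Synthetic 15-minute flow series starting Monday 2015-01-26 (stand-in for the real data)
set.seed(42)
times <- seq(as.POSIXct("2015-01-26 00:00:00", tz = "UTC"), by = "15 min", length.out = 14 * 96)
hour  <- as.integer(format(times, "%H", tz = "UTC"))
flow  <- data.frame(DateTime = times,
                    Q = 12 + 3 * sin(2 * pi * hour / 24) + rnorm(length(times), sd = 0.5))
flow$day <- as.Date(format(flow$DateTime, "%Y-%m-%d", tz = "UTC"))

# Simulated rain on Wednesday 2015-01-28 (flow above the 20 L/s threshold)
flow$Q[flow$day == as.Date("2015-01-28") & hour == 14] <- 40

# Dry days: daily maximum below 20 L/s; weekdays: Monday (1) to Friday (5)
daily_max <- tapply(flow$Q, flow$day, max)
dry_days  <- as.Date(names(daily_max)[daily_max < 20])
wday      <- as.POSIXlt(flow$DateTime, tz = "UTC")$wday
dry_wd    <- flow[flow$day %in% dry_days & wday %in% 1:5, ]

# Time of day on a common dummy date, so all days share one x axis
dry_wd$tod <- as.POSIXct(paste("2000-01-01", format(dry_wd$DateTime, "%H:%M:%S", tz = "UTC")),
                         tz = "UTC")

p <- ggplot(dry_wd, aes(tod, Q)) +
  geom_point(aes(colour = factor(day)), size = 0.8) +
  stat_summary(fun = mean, geom = "line") +   # mean over all dry weekdays at each time of day
  scale_x_datetime(date_breaks = "2 hours", date_labels = "%H:%M") +
  labs(x = "Time [h]", y = "Abfluss Q [L/s]", colour = "Day")
```

Filtering by a daily maximum replaces the per-weekday lists and the cbind/melt steps, since ggplot2 can group by the day column directly.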