Mixing line chart and dot points in the baseline of a chart in R

I have built a chart in R with two series, and I want to add a coloured bar at the bottom of the chart.
The data to be plotted in the bar is:
2013-01-01 12:35:00 0
2013-01-01 12:45:00 1
2013-01-01 13:00:00 1
....
2013-01-01 13:00:00 2
where 0 is green, 1 is orange and 2 is red. The datetime values are aligned with the x values of the original chart.
This is the code for the chart (without the coloured bar):
library(DBI)       # dbGetQuery(), dbDisconnect()
library(ggplot2)
library(reshape2)  # melt()
# `connection` is an open DBI connection created earlier
datos_tem <- dbGetQuery(connection, "SELECT temp_int,hum_int,datetime FROM datalog_v2 WHERE host_id=41 and datetime>='2014-02-01 00:00:00' and datetime<='2014-02-01 23:59:00';")
dbDisconnect(connection)
datos_tem$datetime <- as.POSIXct(datos_tem$datetime)
datos_tem$temp_int <- as.numeric(datos_tem$temp_int)
datos_tem$hum_int <- as.numeric(datos_tem$hum_int)
#gg <- qplot(datos_tem$datetime, datos_tem$temp_int) + geom_line() # first line
#gg <- gg + geom_line(aes( x=datos_tem$datetime, y=datos_tem$hum_int )) # add the second line!
png(file.path("/tmp", "comp.png"))
Molten <- melt(datos_tem, id.vars = "datetime")
p <- ggplot(Molten, aes(x = datetime, y = value, colour = variable)) +
  geom_line(size = 1.9) +
  scale_y_continuous(limits = c(0, 100)) +
  xlab("Tiempo") +
  ylab("Temperatura --- (ºC) y Humedad (%)") +
  scale_color_manual(values = c("#FF0000", "#0000FF"),
                     name = "Medidas",
                     labels = c("Temperature", "Humidity"))
print(p)
dev.off()
So I want to add something like my example to my code.
Is that possible?
The data for the lines is:
temp_int hum_int datetime
11.6 76.8 2014-02-01 00:00:00
11.4 77.8 2014-02-01 00:15:00
11.3 79.4 2014-02-01 00:30:00
.....
And the data for the bar at the bottom is:
datetime DPV
2013-01-01 12:35:00 0
2013-01-01 12:45:00 1
2013-01-01 13:00:00 1
....
2013-01-01 13:00:00 2
Better: I've changed my data so that dpv holds the colour names directly, and now I have:
datetime,temp_int,hum_int,dpv
"2014-02-15 00:00:00",67.2,13.6,"red"
"2014-02-15 00:15:00",63.4,13.8,"yellow"
"2014-02-15 00:30:00",61.2,14.2,"green"
"2014-02-15 00:45:00",60.4,14.5,"green"
....

It's hard to answer without the actual data, but here are some ideas to start with.
I made some sample data consisting of x values, temp values for the line, and id values used to colour the bar:
library(ggplot2)
set.seed(1)
df<-data.frame(x=1:100,temp=runif(100,10,50),id=sample(1:3,100,replace=TRUE))
One solution is to use geom_tile(), setting y to 0 and mapping id to fill=. The problem with this solution is that the height of the bar depends on the range of your data. You can increase the height by adding several geom_tile() calls with different y values.
ggplot(df,aes(x))+geom_line(aes(y=temp))+
geom_tile(aes(y=0,fill=factor(id)))
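geom_tile() also takes a height parameter (and width), which sets the bar thickness directly instead of stacking several geom_tile() calls. A minimal sketch reusing the sample data above (recreated so the block runs on its own); the centre y = -2, height = 4 and the green/orange/red palette are illustrative assumptions:

```r
library(ggplot2)

# Same sample data as above
set.seed(1)
df <- data.frame(x = 1:100, temp = runif(100, 10, 50),
                 id = sample(1:3, 100, replace = TRUE))

# One tile per x value, centred at y = -2 and 4 units tall,
# so the bar spans y from -4 to 0, just below the line
p <- ggplot(df, aes(x)) +
  geom_line(aes(y = temp)) +
  geom_tile(aes(y = -2, fill = factor(id)), height = 4) +
  scale_fill_manual(values = c("green", "orange", "red"), name = "id")
print(p)
```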
Another possibility is to use geom_bar() with stat="identity" and set y to the bar height you need. The width= argument changes the bar width, so you can make sure there is no space between bars.
ggplot(df,aes(x))+geom_line(aes(y=temp))+
geom_bar(aes(y=4,fill=factor(id)),stat="identity",width=1)
UPDATE - solution with the OP's data
The data provided in the question:
df<-read.table(text="datetime,temp_int,hum_int,dpv
2014-02-15 00:00:00,67.2,13.6,red
2014-02-15 00:15:00,63.4,13.8,yellow
2014-02-15 00:30:00,61.2,14.2,green
2014-02-15 00:45:00,60.4,14.5,green",header=TRUE,sep=",")
Convert the datetime column to POSIXct:
df$datetime <- as.POSIXct(df$datetime)
Melt the data frame to long format:
library(reshape2)
df.melt<-melt(df,id.vars=c("datetime","dpv"))
Now plot the melted data frame. The colour= argument should go inside the aes() of geom_line(), because if it is placed in the ggplot() call it also changes the border colour of the bars. For geom_bar(), use dpv as fill= and add scale_fill_identity(), because dpv contains actual colour names. To make the bars touch each other, use width=900: the observations are 15 minutes apart, which is 900 seconds, and seconds are the unit of a POSIXct axis.
library(ggplot2)
ggplot(df.melt, aes(x = datetime, y = value)) +
geom_line(aes(colour = variable),size=1.9) +
geom_bar(aes(y=4,fill=dpv),stat="identity",width=900)+
scale_fill_identity()
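Rather than hard-coding 900, the bar width can be derived from the timestamps. A minimal sketch, assuming the observations are regularly spaced (it takes the smallest gap between consecutive timestamps) and using the four times from the question:

```r
# Timestamps 15 minutes apart, as in the OP's data
datetime <- as.POSIXct(c("2014-02-15 00:00:00", "2014-02-15 00:15:00",
                         "2014-02-15 00:30:00", "2014-02-15 00:45:00"),
                       tz = "UTC")

# Smallest gap between consecutive observations, converted to seconds
# (the unit of a POSIXct axis)
bar_width <- min(as.numeric(diff(datetime), units = "secs"))
bar_width
```

The result can then be passed as geom_bar(..., width = bar_width).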

Related

Time series as factor with equidistant ticks

I have a data frame (df) containing a time series for a single variable (X):
X time
1 6.905551 14-01-2021 14:53
2 6.852534 27-01-2021 18:24
3 7.030995 23-01-2021 11:11
4 7.083345 23-01-2021 01:19
5 7.003437 28-01-2021 01:07
6 7.040500 14-01-2021 23:34
7 6.940566 14-01-2021 13:42
8 6.989434 22-01-2021 18:37
9 7.032720 22-01-2021 17:50
10 7.001651 23-01-2021 19:05
I am using the time as a factor to create a plot that displays the points equidistantly, which requires converting the original timestamp, e.g. "2021-01-14 12:07:53 CET", to 14-01-2021 12:07.
This is done by factor(format(timestamp, "%d-%m-%Y %H:%M")).
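One detail worth checking with this conversion: factor() sorts its levels as strings, so with a day-first format such as "%d-%m-%Y" the levels are no longer chronological once the data spans more than one month. A minimal sketch with three made-up timestamps (UTC is assumed only to keep the example deterministic), showing the default order and one way to keep time order by taking the levels from the sorted timestamps:

```r
# Hypothetical timestamps spanning a month boundary
timestamp <- as.POSIXct(c("2021-01-14 12:07:53", "2021-01-31 09:00:00",
                          "2021-02-01 08:30:00"), tz = "UTC")

time_labels <- format(timestamp, "%d-%m-%Y %H:%M")

# Default: levels sorted as strings, so "01-02-2021 08:30" comes first
levels(factor(time_labels))

# Chronological levels, taken from the sorted timestamps
time <- factor(time_labels,
               levels = unique(format(sort(timestamp), "%d-%m-%Y %H:%M")))
levels(time)
```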
Now for the plotting I use ggplot2:
ggplot(aes(x = time, y = X, group=1), data=df) +
geom_line(linetype="dotted") + geom_point() + theme_linedraw() +
theme(axis.text.x = element_text(angle = -40)) +
scale_x_discrete(breaks=df$time[seq(1,length(df$time),by=4)], name="Date")
As shown, I want to reduce the tick frequency on the x axis to avoid overlapping labels. Ideally, the ticks would also be spaced equidistantly per day, e.g. 14-01-2021, 22-01-2021 and so on. With scale_x_discrete I can place a tick at every nth factor level, but the result is this (which is to be expected):
I have also tried using the dates directly via as.Date(timestamp) with a date scale, e.g. scale_x_date(date_breaks = "4 days"). This gives correctly spaced ticks, but the plot then stacks values that fall on the same date and contains gaps.
EDIT
@Jon Springs' answer works well if there are no duplicate times caused by multiple observations. However, when there are duplicates and facet_grid is used to split by the variable in question, the result is the following:
In this case df looks like this (with grouper being the variable used for faceting):
X time grouper
1 6.905551 14-01-2021 14:53 red
2 6.905551 14-01-2021 14:53 green
3 6.852534 27-01-2021 18:24 red
4 6.852534 27-01-2021 18:24 green
5 7.030995 23-01-2021 11:11 red
6 7.030995 23-01-2021 11:11 green
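One approach is to keep the time as a factor but supply custom labels that leave every other distinct time blank:
set.seed(0)
library(dplyr)
library(ggplot2)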
my_data <- tibble(X = rnorm(10),
time_delay = runif(10, 1, 1000)) %>%
mutate(time = as.POSIXct("2021-01-14") + cumsum(time_delay)*1E5) %>%
# Label every other NEW time
arrange(time) %>%
mutate(label = if_else(
cumsum(time != lag(time, default = as.POSIXct("2000-01-01"))) %% 2 < 1,
format(time, "%d-%m-%Y\n%H:%M"),
"")
)
my_data
ggplot(my_data, aes(x = time %>% as.factor,
y = X, group = 1)) +
geom_line() +
scale_x_discrete(labels = my_data$label)
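A variation on the same idea, sketched here as an assumption rather than as part of the answer above: scale_x_discrete() also accepts a function for breaks, which receives the factor levels (the distinct times) instead of the rows. Picking every nth level therefore keeps working when the same time appears several times, e.g. once per facet group. The helper every_nth() and the toy data are hypothetical:

```r
library(ggplot2)

# Hypothetical helper: returns a function that keeps every nth level
every_nth <- function(n) function(lv) lv[seq(1, length(lv), by = n)]

# Toy data: each time appears twice, once per grouper value
set.seed(1)
time_levels <- format(as.POSIXct("2021-01-14 00:00", tz = "UTC") + (0:9) * 36000,
                      "%d-%m-%Y %H:%M")
df <- data.frame(X = rnorm(20),
                 time = factor(rep(time_levels, each = 2), levels = time_levels),
                 grouper = rep(c("red", "green"), times = 10))

p <- ggplot(df, aes(x = time, y = X, group = 1)) +
  geom_line(linetype = "dotted") + geom_point() +
  scale_x_discrete(breaks = every_nth(4), name = "Date") +
  facet_grid(grouper ~ .) +
  theme(axis.text.x = element_text(angle = -40))
print(p)
```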

How to limit graph to only show points above the positive x axis?

I currently have a plot of my data that looks like this:
However, because of the negative spike around 2017, the graph shows values both above and below the x axis. How can I make the graph show only values above the x axis?
This is the code I am currently using to produce my graph:
plot(dandpw)
addLegend(lty = 1)
My data:
> head(dandpw)
QLD1.Price NSW1.Price VIC1.Price SA1.Price TAS1.Price
2008-01-07 10:30:00 33.81019 36.52777 49.66935 216.45379 30.88968
2008-01-14 10:30:00 45.09321 37.55887 49.04155 248.33518 51.16057
2008-01-21 10:30:00 27.22551 29.57798 31.28935 31.56158 45.99226
2008-01-28 10:30:00 26.14283 27.32113 30.20470 31.90042 53.48170
2008-02-04 10:30:00 91.86961 36.77000 37.09027 37.57167 56.28464
2008-02-11 10:30:00 62.60607 28.83509 34.95866 35.18217 55.78961
dput(head(dandpw
You can do this in two ways. Since there is no usable dput (only the picture), I assume your data is in a data frame.
1) Remove the negative numbers from your dataset.
2) Limit the y-axis range shown in the chart (using ggplot2).
Method 1 (not recommended as it alters your data):
#remove negatives and replace with NA. Can also replace with 0 if desired
dandpw[dandpw < 0] <- NA
Method 2:
#assume dandpw is data frame
library(tidyverse)
names(dandpw)[1] <- "date" #looks like your date column might not be named
#ggplot prefers long format
dandpw <- dandpw %>% gather(variables, values, -date)
ggplot(data = dandpw, aes(x = date, y = values, color = variables)) +
geom_line() +
coord_cartesian(ylim = c(0, max(dandpw$values, na.rm = TRUE) * 1.1 ))
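Method 2 assumes dandpw is a data frame with the dates in its first column. If it is actually an xts object (the plot() plus addLegend() calls in the question suggest xts plotting, but that is an assumption), the dates live in the index, so renaming column 1 would rename QLD1.Price instead. A minimal sketch with made-up values: option A keeps xts plotting and passes ylim to plot(); option B moves the index into a date column, reshapes with tidyr's pivot_longer() (the successor of gather()), and zooms with coord_cartesian(), which, unlike setting limits in scale_y_continuous(), does not drop the out-of-range values before drawing:

```r
library(xts)      # also attaches zoo (index(), coredata())
library(tidyr)    # pivot_longer()
library(ggplot2)

# Made-up weekly prices with one negative spike, standing in for dandpw
dates <- as.POSIXct("2008-01-07 10:30:00", tz = "UTC") + (0:5) * 7 * 86400
dandpw <- xts(cbind(QLD1.Price = c(33.8, 45.1, -120.0, 26.1, 91.9, 62.6),
                    NSW1.Price = c(36.5, 37.6, 29.6, 27.3, 36.8, 28.8)),
              order.by = dates)

# Option A: stay with xts plotting and fix the y range
plot(dandpw, ylim = c(0, max(dandpw, na.rm = TRUE) * 1.1))
addLegend(lty = 1)

# Option B: move the index into a date column and reshape to long format
dandpw_df <- data.frame(date = index(dandpw), coredata(dandpw))
dandpw_long <- pivot_longer(dandpw_df, -date,
                            names_to = "variables", values_to = "values")

p <- ggplot(dandpw_long, aes(x = date, y = values, color = variables)) +
  geom_line() +
  coord_cartesian(ylim = c(0, max(dandpw_long$values, na.rm = TRUE) * 1.1))
print(p)
```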

multiple graphs of each time series [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
I have the following data frame, where each user has 288 observations:
User5 User8 User10
2015-01-01 00:00:00 12.3 10.3 17.5
2015-01-01 00:30:00 20.1 12.7 20.9
2015-01-01 01:00:00 12.8 9.2 17.8
2015-01-01 01:30:00 11.5 6.9 12.5
2015-01-01 02:00:00 12.2 9.2 7.5
2015-01-01 02:30:00 9.2 14.2 9.0
.................... .... .... ....
2015-01-01 23:30:00 11.2 10.7 16.8
How can I make a graph with multiple graphs of each time series?
Another option is to convert from wide to long format and then plot everything in the same graph. The code below uses the DF posted by @G. Grothendieck:
library(tidyverse)
library(scales)
# Convert Time from character (or factor) to date-time (POSIXct)
DF$Time <- as.POSIXct(DF$Time)
# Convert from wide to long format (`tidyr::gather`)
df_long <- DF %>% gather(key = "user", value = "value", -Time)
# Plot all together, color based on User
# We use pretty_breaks() from scales package for automatic Date/Time labeling
ggplot(df_long, aes(Time, value, group = user, color = user)) +
geom_line() +
scale_x_datetime(breaks = pretty_breaks()) +
theme_bw()
Edit: to plot each user in a separate panel, use facet_grid:
ggplot(df_long, aes(Time, value, group = user, color = user)) +
geom_line() +
scale_x_datetime(breaks = pretty_breaks()) +
theme_bw() +
facet_grid(user ~ .)
Using the data frame DF from the Note at the end, read it into a zoo object and then plot it. Assuming that "multiple graphs" means one panel per user, any of the following 3 options can be used. If "multiple graphs" means one panel with three lines in it, one per user, then add the screen = 1 argument to the first two and facet = NULL to the third.
library(zoo)
z <- read.zoo(DF, tz = "")
# 1
plot(z)
# 2
library(lattice)
xyplot(z)
# 3
library(ggplot2)
autoplot(z)
Note
Lines <- "
Time,User5,User8,User10
2015-01-01 00:00:00,12.3,10.3,17.5
2015-01-01 00:30:00,20.1,12.7,20.9
2015-01-01 01:00:00,12.8,9.2,17.8
2015-01-01 01:30:00,11.5,6.9,12.5
2015-01-01 02:00:00,12.2,9.2,7.5
2015-01-01 02:30:00,9.2,14.2,9
2015-01-01 23:30:00,11.2,10.7,16.8"
DF <- read.csv(text = Lines)
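For the single-panel variant mentioned above, a minimal sketch with the same DF (recreated so the block runs standalone). The screens argument of plot.zoo() (which screen = 1 above partially matches) puts all series on screen 1; the colours and legend placement are illustrative choices:

```r
library(zoo)

Lines <- "
Time,User5,User8,User10
2015-01-01 00:00:00,12.3,10.3,17.5
2015-01-01 00:30:00,20.1,12.7,20.9
2015-01-01 01:00:00,12.8,9.2,17.8
2015-01-01 01:30:00,11.5,6.9,12.5
2015-01-01 02:00:00,12.2,9.2,7.5
2015-01-01 02:30:00,9.2,14.2,9
2015-01-01 23:30:00,11.2,10.7,16.8"
DF <- read.csv(text = Lines)

# tz = "" turns the Time column into a POSIXct index
z <- read.zoo(DF, tz = "")

# All three users in one panel, one colour per series
plot(z, screens = 1, col = 1:3, xlab = "Time", ylab = "Value")
legend("topright", legend = colnames(z), col = 1:3, lty = 1)
```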

plotting daily rainfall data using geom_step

I have some rainfall data collected continuously from which I have calculated daily totals. Here is some toy data:
Date <- c(seq(as.Date("2016-07-01"), by = "1 day", length.out = 10))
rain_mm <- c(3,6,8,12,0,0,34,23,5,1)
rain_data <- data.frame(Date, rain_mm)
I can plot this data as follows:
ggplot(rain_data, aes(Date, rain_mm)) +
geom_bar(stat = "identity") +
scale_x_date(date_labels = "%d")
Which gives the following:
This seems fine: it is clear how much rain fell on each day. However, the plot could also be read as showing that a certain amount of rain fell between midday of one day and midday of the next, which is wrong. This is especially a problem if the graph is combined with other plots of related continuous variables over the same period.
To get round this issue I could use geom_step as follows:
library(ggplot2)
ggplot(rain_data, aes(Date, rain_mm)) +
geom_step() +
scale_x_date(date_labels = "%d")
Which gives:
This is a better way to display the data, and scale_x_date now appears to be a continuous axis. However, it would be nice to fill the area below the steps, but I can't seem to find a straightforward way of doing this.
Q1: How can I fill beneath the geom_step? Is it possible?
It may also be useful to convert Date into POSIXct to facilitate identical x-axis in multi-plot figures as discussed in this SO question here.
I can do this as follows:
library(dplyr)
rain_data_POSIX <- rain_data %>% mutate(Date = as.POSIXct(Date))
Date rain_mm
1 2016-07-01 01:00:00 3
2 2016-07-02 01:00:00 6
3 2016-07-03 01:00:00 8
4 2016-07-04 01:00:00 12
5 2016-07-05 01:00:00 0
6 2016-07-06 01:00:00 0
7 2016-07-07 01:00:00 34
8 2016-07-08 01:00:00 23
9 2016-07-09 01:00:00 5
10 2016-07-10 01:00:00 1
However, this gives a time of 01:00 for each date. I would rather have 00:00. Can I change this in the as.POSIXct function call, or do I have to do it afterwards with a separate function? I think it has something to do with tz = "" but I can't figure it out.
How can I convert from class Date to POSIXct so that the time generated is 00:00?
Thanks
For your first question, you can work off this example. First, create a time-lagged version of your data:
rain_tl <- mutate( rain_data, rain_mm = lag( rain_mm ) )
Then combine this time-lagged version with the original data, and re-sort by date:
rain_all <- bind_rows( old = rain_data, new = rain_tl, .id="source" ) %>%
arrange( Date, source )
(Note the newly created source column is used to break ties, correctly interlacing the original data with the time-lagged version):
> head( rain_all )
source Date rain_mm
1 new 2016-07-01 NA
2 old 2016-07-01 3
3 new 2016-07-02 3
4 old 2016-07-02 6
5 new 2016-07-03 6
6 old 2016-07-03 8
You can now use the combined data frame to "fill" your steps:
ggplot(rain_data, aes(Date, rain_mm)) +
geom_step() +
geom_ribbon( data = rain_all, aes( ymin = 0, ymax = rain_mm ),
fill="tomato", alpha=0.5 )
This produces the following plot:
For your second question, the problem is that as.POSIXct() (at least in the R versions current when this was written) does not pass additional arguments such as tz on to the converter for Date input, so specifying the tz argument does nothing.
You basically have two options:
1) Reformat the output to what you want: format( Date, "%F 00:00" ), which returns a character vector (formatting the Date itself, rather than as.POSIXct( Date ), avoids a shift to the previous day in time zones west of UTC). If you want to keep the POSIXct class, you can instead...
2) Convert your Date vector to character before passing it to as.POSIXct: as.POSIXct( as.character(Date) ). This gives midnight in the local time zone, so the values print without any time at all, which may be what you want anyway.
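A quick check of option 2 with an explicit tz (an assumption: "UTC" is used here so the result does not depend on the machine's time zone; a local zone name works the same way):

```r
Date <- seq(as.Date("2016-07-01"), by = "1 day", length.out = 10)

# "YYYY-MM-DD" strings are parsed as midnight in the zone given by tz
Date_POSIX <- as.POSIXct(as.character(Date), tz = "UTC")

head(format(Date_POSIX, "%Y-%m-%d %H:%M"), 3)
```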
If you would like to avoid the hack, you can customize the position in the geom_bar expression.
I found good results with:
ggplot(rain_data, aes(Date, rain_mm)) +
geom_bar(stat = "identity", position = position_nudge(x = 0.51), width = 0.99) +
scale_x_date(date_labels = "%d")
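Another hack-free option, offered as a sketch rather than as part of either answer: draw each day as a rectangle from the start of that day to the start of the next with geom_rect(), so the filled area covers exactly the day the total belongs to and sits on a continuous date axis. The fill colour and alpha are carried over from the ribbon example above:

```r
library(ggplot2)

Date <- seq(as.Date("2016-07-01"), by = "1 day", length.out = 10)
rain_mm <- c(3, 6, 8, 12, 0, 0, 34, 23, 5, 1)
rain_data <- data.frame(Date, rain_mm)

# Each rectangle covers [Date, Date + 1 day) with height rain_mm
p <- ggplot(rain_data) +
  geom_rect(aes(xmin = Date, xmax = Date + 1, ymin = 0, ymax = rain_mm),
            fill = "tomato", alpha = 0.5) +
  scale_x_date(date_labels = "%d")
print(p)
```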

Plot hourly data using ggplot2

I am using ggplot2 to plot my hourly time series data. The data is organised as follows:
> head(df)
timestamp power
1 2015-08-01 00:00:00 584.4069
2 2015-08-01 01:00:00 577.2829
3 2015-08-01 02:00:00 569.0937
4 2015-08-01 03:00:00 561.6945
5 2015-08-01 04:00:00 557.9449
6 2015-08-01 05:00:00 562.4152
I use the following ggplot2 command to plot the data:
library(ggplot2)
library(scales) # date_format(), pretty_breaks()
ggplot(df,aes(timestamp,power,group=1))+ theme_bw() + geom_line()+
scale_x_datetime(labels = date_format("%d:%m; %H"), breaks=pretty_breaks(n=30)) +
theme(axis.text.x = element_text(angle=90,hjust=1))
With this the plotted graph is:
My questions are:
1) In the plotted graph, why are only the labels for hour 18 shown? And what if I want to display the labels for hour 12 of each day instead?
2) I am plotting hourly data in the hope of seeing fine-grained details, but I cannot see all the hours of an entire month. Can I somehow see a zoomed view of any selected day in the same plot?
Here is a rather long example of scaling dates in ggplot, and also a possible interactive way to zoom in on ranges. First, some sample data:
## Make some sample data
library(zoo) # rollmean
set.seed(0)
n <- 745
x <- rgamma(n,.15)*abs(sin(1:n*pi*24/n))*sin(1:n*pi/n/5)
x <- rollmean(x, 3, 0)
start.date <- as.POSIXct('2015-08-01 00:00:00') # the min from your df
dat <- data.frame(
timestamp=as.POSIXct(seq.POSIXt(start.date, start.date + 60*60*24*31, by="hour")),
power=x * 3000)
For interactive zooming, you could try plotly. Older versions of the plotly R package required an account (API key and username); current versions render plots locally and use formula syntax (x = ~column) to map columns. Then just do:
library(plotly)
plot_ly(dat, x = ~timestamp, y = ~power, text = ~power, type = "scatter", mode = "lines")
and you can select regions of the graph and zoom in on them.
For changing the breaks in the ggplot graphs, here is a function to make date breaks by various intervals at certain hours.
## Make breaks from a starting date at a given hour, occurring by interval,
## length.out is days
make_breaks <- function(strt, hour, interval="day", length.out=31) {
strt <- as.POSIXlt(strt - 60*60*24) # start back one day
strt <- ISOdatetime(strt$year+1900L, strt$mon+1L, strt$mday, hour=hour, min=0, sec=0, tz="UTC")
seq.POSIXt(strt, strt+(1+length.out)*60*60*24, by=interval)
}
One way to zoom in, non-interactively, is to simply subset the data,
library(scales)
library(ggplot2)
library(gridExtra)
## The whole interval, breaks on hour 18 each day
breaks <- make_breaks(min(dat$timestamp), hour=18, interval="day", length.out=31)
p1 <- ggplot(dat,aes(timestamp,power,group=1))+ theme_bw() + geom_line()+
scale_x_datetime(labels = date_format("%d:%m; %H"), breaks=breaks) +
theme(axis.text.x = element_text(angle=90,hjust=1)) +
ggtitle("Full Range")
## Look at a specific day, breaks by hour
days <- 20
samp <- dat[format(dat$timestamp, "%d") %in% as.character(days),]
breaks <- make_breaks(min(samp$timestamp), hour=0, interval='hour', length.out=length(days))
p2 <- ggplot(samp,aes(timestamp,power,group=1))+ theme_bw() + geom_line()+
scale_x_datetime(labels = date_format("%d:%m; %H"), breaks=breaks) +
theme(axis.text.x = element_text(angle=90,hjust=1)) +
ggtitle(paste("Day:", paste(days, collapse = ", ")))
grid.arrange(p1, p2)
I haven't worked with date-time data much, so my code might look a bit messy, but the solution to question 1 is not to use pretty_breaks() and instead to use explicit breaks, and also to set the limits, within the scale_x_datetime() function.
A roughly written example might be the following:
library(ggplot2)
library(scales) # date_format()
ggplot(df,aes(timestamp,power,group=1))+ theme_bw() + geom_line()+
scale_x_datetime(labels = date_format("%d:%m; %H"),
breaks=as.POSIXct(seq(18000, 3600000, 86400),
origin="2015-10-19 7:00:00"),
limits=c(as.POSIXct(3000, origin="2015-10-19 7:00:00"),
as.POSIXct(30000, origin="2015-10-19 7:00:00"))) +
theme(axis.text.x = element_text(angle=90,hjust=1))
I am not sure how to make the as.POSIXct() calls more readable, but basically: create the 12-hour point manually (18000 seconds after the 07:00 origin) and then keep adding a full day (86400 seconds) within the range of your data frame.
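For question 1, breaks at hour 12 of every day can also be generated with seq() on POSIXct values and by = "1 day". A minimal sketch with made-up hourly values standing in for power, assuming UTC timestamps (scales::date_format() formats labels in UTC by default, so breaks and labels then agree):

```r
library(ggplot2)
library(scales)

# Hypothetical hourly data for August 2015
timestamp <- seq(as.POSIXct("2015-08-01 00:00:00", tz = "UTC"),
                 by = "hour", length.out = 31 * 24)
df <- data.frame(timestamp, power = 560 + 20 * sin(seq_along(timestamp) / 4))

# One break at 12:00 on every day covered by the data
breaks_noon <- seq(as.POSIXct("2015-08-01 12:00:00", tz = "UTC"),
                   max(df$timestamp), by = "1 day")

p <- ggplot(df, aes(timestamp, power)) + theme_bw() + geom_line() +
  scale_x_datetime(labels = date_format("%d:%m; %H"), breaks = breaks_noon) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
print(p)
```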
