Plot long time-series data using ggplot - r

I have four years of time-series data. I want to plot the data year-wise and do a comparative analysis. The dummy data is as follows:
library(xts)
library(ggplot2)
timeindex <- seq(as.POSIXct('2016-01-01'),as.POSIXct('2016-12-31 23:59:59'), by = "1 mins")
dataframe <- data.frame(year1=rnorm(length(timeindex),100,10),year2=rnorm(length(timeindex),150,7),
year3=rnorm(length(timeindex),200,3),
year4=rnorm(length(timeindex),350,4))
xts_df <- xts(dataframe,timeindex)
Now, when I use ggplot, it takes too long to plot all the series with the following line:
visualize_dataframe_all_columns(xts_df)
The above function is defined as:
visualize_dataframe_all_columns <- function(xts_data) {
  library(RColorBrewer) # to increase no. of colors
  library(plotly)
  # wide xts -> long data frame with one row per (timestamp, series) pair
  dframe <- data.frame(timeindex = index(xts_data), coredata(xts_data))
  df_long <- reshape2::melt(dframe, id.vars = "timeindex")
  colourCount <- length(unique(df_long$variable))
  # interpolate the palette so there is one colour per series
  getPalette <- colorRampPalette(brewer.pal(8, "Dark2"))(colourCount) # brewer.pal(8, "Dark2") or brewer.pal(9, "Set1")
  g <- ggplot(df_long, aes(timeindex, value, col = variable, group = variable))
  g <- g + geom_line() + scale_colour_manual(values = getPalette)
  ggplotly(g)
}
Problems with above approach are:
It takes a long time to plot. Can I reduce the plot time?
It is very difficult to zoom into the plot using plotly. Is there a better way?
Are there any better approaches to visualize this data?

I faced more or less the same problem with data at a 10-minute frequency. However, the real question is whether it makes sense to plot minute data for a whole year: the human eye cannot resolve that much detail on a single plot.
I would create a daily xts from that data and plot it for the year, and modify the function so that it plots the minute data only for a selected period of time.
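The suggestion above could be sketched roughly as follows. This is only a minimal sketch, not the answerer's own code: it assumes daily means are an acceptable summary, it uses a shortened one-week index with an explicit UTC time zone to keep the example quick, and the names `daily_means` and `one_day` are illustrative.

```r
library(xts)

set.seed(42)
# One week of minute data (shortened from the question's full year; an assumption)
timeindex <- seq(as.POSIXct("2016-01-01 00:00:00", tz = "UTC"),
                 as.POSIXct("2016-01-07 23:59:00", tz = "UTC"),
                 by = "1 min")
xts_df <- xts(cbind(year1 = rnorm(length(timeindex), 100, 10),
                    year2 = rnorm(length(timeindex), 150, 7)),
              order.by = timeindex)

# Collapse each calendar day to its column means: one row per day
daily_means <- apply.daily(xts_df, colMeans)

# Keep full minute resolution only for a short window of interest
one_day <- xts_df["2016-01-03"]
```

Either object can then be passed to visualize_dataframe_all_columns(): the daily summary for the year-long overview, and the short window when minute-level detail is needed.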

Related

Plotting a string column of format hh:mm:ss as histogram to get count of observation between 12 AM and 3 PM in R

I have a dataset with a column in string of format hh:mm:ss. I want to create a histogram based on this column in such a way that I can visualize the number of observations between 12 AM and 3 PM in R.
plot_ly(x = (as.numeric(data$Time) * 1000), type = "histogram") %>%
layout(xaxis=list(type="date", tickformat="%H:%M:%S"))
I tried plotting using Plotly but the x-axis is in a different format than expected. Please give suggestions.
One approach is to use the hms package:
library("hms")
As no data was provided, I generated some sample data to make the example easier to follow. The as_hms() function converts the values to a difftime vector with a custom class.
Count <- c(10,20,30,100,110,110,20,30,50,30)
Time <- c('12:02:01','12:07:38','12:30:42','12:57:21','13:01:09','13:38:36','13:48:43','13:51:33','14:50:22','14:59:59')
Time = as_hms(c(Time))
data = data.frame(Count, Time)
With ggplot you can now easily create a bar chart of the number of observations. If you explicitly need a plotly visualization, you can get one with the ggplotly() function from the plotly package.
library(ggplot2)
library(plotly) # provides ggplotly()
p <- ggplot(data=data, aes(x=Time, y=Count)) +
geom_bar(stat="identity")
ggplotly(p)
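If the data has one row per observation rather than a ready-made Count column, the observations can be binned by hour first. The following is a sketch under that assumption; the one-hour bin width and the names `time_strings`, `hour_bin` and `hourly` are illustrative choices, not part of the original answer.

```r
library(hms)
library(ggplot2)

# Example time strings (same values as in the answer above)
time_strings <- c("12:02:01", "12:07:38", "12:30:42", "12:57:21", "13:01:09",
                  "13:38:36", "13:48:43", "13:51:33", "14:50:22", "14:59:59")

# Seconds since midnight
secs <- as.numeric(as_hms(time_strings))

# Assign each observation to a one-hour bin: [00:00, 01:00), [01:00, 02:00), ...
hour_bin <- cut(secs,
                breaks = seq(0, 24 * 3600, by = 3600),
                right = FALSE,
                labels = sprintf("%02d:00", 0:23))

# Count observations per bin (empty hours are kept with a count of 0)
hourly <- as.data.frame(table(hour_bin), responseName = "n")

p <- ggplot(hourly, aes(x = hour_bin, y = n)) +
  geom_col() +
  labs(x = "Hour starting at", y = "Number of observations")
```

Printing `p` draws the chart, and `plotly::ggplotly(p)` should give an interactive version if one is needed.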

Moving average on several time series using ggplot

Hi, I am trying to plot several time series with a 12-month moving average and cannot get it to work.
Here is an example with two time series of flower and seed densities. (I have many more time series to work on...)
library(lubridate) # provides ymd(), used below
#datasets
taxon <- c(rep("Flower",36),rep("Seeds",36))
density <- c(seq(20, 228, length=36),seq(33, 259, length=36))
year <- rep(c(rep("2000",12),rep("2001",12),rep("2002",12)),2)
ymd <- c(rep(seq(ymd('2000-01-01'),ymd('2002-12-01'), by = 'months'),2))
#dataframe
df <- data.frame(taxon, density, year, ymd)
library(forecast)
#create function that does a Symmetric Weighted Moving Average (2x12) of the monthly log density of flowers and seeds
ma_12 <- function(x) {
  # transform to a time-series object, as required by the ma function
  ts_x <- ts(x, freq = 12, start = c(2000, 1), end = c(2002, 12))
  return(ma(log(ts_x + 1), order = 12, centre = TRUE))
}
#trial of the function
ma_12(df[df$taxon=="Flower",]$density) #works well
library(ggplot2)
#Trying to plot flower and seeds log density as two time series
ggplot(df,aes(x=year,y=density,colour=factor(taxon),group=factor(taxon))) +
stat_summary(fun.y = ma_12, geom = "line") #or geom = "smooth"
#Warning message:
#Computation failed in `stat_summary()`:
#invalid time series parameters specified
The function ma_12 works correctly. The problem comes when I try to plot both time series (Flower and Seeds) using ggplot. I cannot define the two taxa as separate time series and apply a moving average to each of them. It seems to have something to do with "stat_summary"...
Any help would be more than welcome! Thanks in advance.
Note: The following link is quite useful but does not directly help me, as I want to apply a specific function and plot the result according to the levels of a grouping variable. So far I have not found a solution. In any case, thank you for suggesting it.
Multiple time series in one plot
Is this what you need?
f <- ma_12(df[df$taxon=="Flower", ]$density)
s <- ma_12(df[df$taxon=="Seeds", ]$density)
f <- cbind(f,time(f))
s <- cbind(s,time(s))
serie <- data.frame(rbind(f,s),
taxon=c(rep("Flower", dim(f)[1]), rep("Seeds", dim(s)[1])))
serie$density <- exp(serie$f)
library(lubridate)
serie$time <- ymd(format(date_decimal(serie$time), "%Y-%m-%d"))
library(ggplot2)
ggplot() + geom_point(data=df, aes(x=ymd, y=density, color=taxon, group=taxon)) +
geom_line(data=serie, aes(x= time, y=density, color=taxon, group=taxon))
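An alternative that avoids building a separate ts object per taxon is to compute the moving average inside the data frame before plotting. The sketch below is an assumption-laden variant, not the answerer's method: it uses base R's stats::filter with weights c(0.5, 1, ..., 1, 0.5)/12, which should correspond to a centred 2x12 moving average, it plots on the log scale, and `ma_2x12` and `log_ma` are illustrative names.

```r
library(ggplot2)

taxon   <- rep(c("Flower", "Seeds"), each = 36)
density <- c(seq(20, 228, length.out = 36), seq(33, 259, length.out = 36))
date    <- rep(seq(as.Date("2000-01-01"), by = "month", length.out = 36), times = 2)
df <- data.frame(taxon, density, date)

# Centred 2x12 moving average of log(x + 1): half weights on the two end months
ma_2x12 <- function(x) {
  w <- c(0.5, rep(1, 11), 0.5) / 12
  as.numeric(stats::filter(log(x + 1), filter = w, sides = 2))
}

# ave() applies the function within each taxon and returns a vector aligned with df
df$log_ma <- ave(df$density, df$taxon, FUN = ma_2x12)

# Points show the raw log densities, lines show the smoothed series
p <- ggplot(df, aes(x = date, colour = taxon)) +
  geom_point(aes(y = log(density + 1))) +
  geom_line(aes(y = log_ma), na.rm = TRUE)
```

The first and last six months of each series are NA because the centred window does not fit there. Since everything stays in one data frame, the same code works unchanged for any number of taxa.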

R - Building a histogram with data in intervals (from survey)

I'm currently analysing some data I've retrieved from a survey and I want to create a histogram with it.
The problem is that the data comes as pairs of range and absolute frequency, and the ranges have different widths:
Since the intervals are not the same, how can I generate the histogram in R?
Thank you in advance.
I think you want a bar chart instead of a histogram. Here's an article that explains the difference nicely.
For a barchart with the data you provided in the format you've indicated you could do something like this:
my_data <- data.frame(range = c('[0-2]','[2-5]','[5-9]'),
abs_frequency = c(2,10,5))
library(ggplot2)
plot <- ggplot(data = my_data, aes(x = range, y = abs_frequency))
plot +
geom_bar(stat="identity")
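If a true histogram with unequal bin widths is wanted after all, one possible sketch is to draw each interval as a rectangle whose area, rather than height, is proportional to its frequency. The numeric bounds `lower` and `upper` below are an assumed numeric encoding of the range labels used in the answer above, and the column names are illustrative.

```r
library(ggplot2)

# Assumed numeric bounds for the ranges '[0-2]', '[2-5]', '[5-9]'
my_data <- data.frame(lower = c(0, 2, 5),
                      upper = c(2, 5, 9),
                      abs_frequency = c(2, 10, 5))

# Density = relative frequency divided by bin width, so bar areas sum to 1
my_data$width   <- my_data$upper - my_data$lower
my_data$density <- my_data$abs_frequency /
  (sum(my_data$abs_frequency) * my_data$width)

p <- ggplot(my_data) +
  geom_rect(aes(xmin = lower, xmax = upper, ymin = 0, ymax = density),
            colour = "black", fill = "grey70") +
  labs(x = "Value", y = "Density")
```

With this scaling, a wide interval with few observations is drawn low and wide instead of appearing as prominent as a narrow, dense one.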

Differentiate missing values from main data in a plot using R

I create a dummy time-series xts object with data missing on 2015-09-02 as follows:
library(xts)
library(ggplot2)
library(scales)
set.seed(123)
seq <- seq(as.POSIXct("2015-09-01"),as.POSIXct("2015-09-02"), by = "1 hour")
ob1 <- xts(rnorm(length(seq),150,5),seq)
seq2 <- seq(as.POSIXct("2015-09-03"),as.POSIXct("2015-09-05"), by = "1 hour")
ob2 <- xts(rnorm(length(seq2),170,5),seq2)
final_ob <- rbind(ob1,ob2)
plot(final_ob)
# with ggplot
df <- data.frame(time = index(final_ob), val = coredata(final_ob) )
ggplot(df, aes(time, val)) + geom_line()+ scale_x_datetime(labels = date_format("%Y-%m-%d"))
After plotting my data looks like this:
The red rectangle marks the date on which data is missing. How should I show in the main plot that data was missing on this day?
I think I should show the missing data in a different colour, but I don't know how to process the data so that the main plot reflects the missing values.
Thanks for the great reproducible example.
I think you are best off omitting the line in the "missing" portion. A straight line there (even in a different colour) suggests that data was gathered in that interval and just happened to fall on that line. If you omit the line in that interval, it is clear that there is no data there.
The problem is that you want the hourly data to be connected by lines, and then no lines in the "missing data section" - so you need some way to detect that missing data section.
You have not given a criterion for this in your question, so based on your example I will say that each line on the plot should consist of data at hourly intervals; if there's a break of more than an hour then there should be a new line. You will have to adjust this criterion to your specific problem. All we're doing is splitting up your dataframe into pieces that each get plotted as a separate line.
So first create a variable that says which "group" (i.e. line) each data point is in:
df$grp <- factor(c(0, cumsum(as.numeric(diff(df$time), units = "hours") > 1))) # gap measured explicitly in hours
Then you can use the group= aesthetic which geom_line uses to split up lines:
ggplot(df, aes(time, val)) + geom_line(aes(group=grp)) + # <-- only change
scale_x_datetime(labels = date_format("%Y-%m-%d"))
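Another way to get the same gap, sketched here as an alternative rather than as part of the answer above, is to pad the series with explicit NA values on a complete hourly grid; geom_line breaks a line wherever it meets an NA. The sketch assumes the data is meant to be strictly hourly, uses an explicit UTC time zone, and the names `full_index`, `padded` and `df_padded` are illustrative.

```r
library(xts)
library(ggplot2)

set.seed(123)
idx1 <- seq(as.POSIXct("2015-09-01", tz = "UTC"),
            as.POSIXct("2015-09-02", tz = "UTC"), by = "1 hour")
idx2 <- seq(as.POSIXct("2015-09-03", tz = "UTC"),
            as.POSIXct("2015-09-05", tz = "UTC"), by = "1 hour")
obs <- rbind(xts(rnorm(length(idx1), 150, 5), idx1),
             xts(rnorm(length(idx2), 170, 5), idx2))

# Complete hourly grid covering the whole observation period
full_index <- seq(start(obs), end(obs), by = "1 hour")

# Merging with an empty xts on the full grid inserts NA for the missing hours
padded <- merge(obs, xts(order.by = full_index))

df_padded <- data.frame(time = index(padded),
                        val  = as.numeric(coredata(padded)[, 1]))

# The line is interrupted wherever val is NA
p <- ggplot(df_padded, aes(time, val)) + geom_line()
```

Padding also makes the missing hours easy to count or highlight, for example with `sum(is.na(df_padded$val))`.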

How to quickly (and elegantly) iterate between time series objects `ts` and data frames in R for ggplot2 plotting?

I am seeking guidance on how to quickly iterate between time series objects and data frames in R, so that I can plot with ggplot2 while still doing general analysis of the time series as ts().
For example, the following feels very clunky:
library(ggplot2)
library(lubridate)
library(forecast)
AP <- AirPassengers
df <- data.frame(date=as.Date(time(AP)), Y=as.matrix(AP))
ggplot(df, aes(x=factor(month(date)), y=Y)) +
geom_boxplot()
Further, do I lose the ability to use ggplot2::scale_x_date this way?
The essence of the question: how can I quickly produce the plot from the code above with ggplot2, ideally with month labels on the x-axis, while jumping through fewer hoops?
I realize I could use:
boxplot(AP ~ cycle(AP))
But I would like to use ggplot2 for greater flexibility.
Well, this seems to work.
library(xts)
library(ggplot2)
AP <- AirPassengers
df <- data.frame(date=as.Date(time(AP)), Y=as.matrix(AP))
ggplot(df)+geom_boxplot(aes(x=format(date,"%m"),y=Y))+
scale_x_discrete("",labels=unique(format(df$date,"%b")))
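To cut down on the repeated conversion, the ts-to-data-frame step can be wrapped in a small helper. The function below is a hypothetical convenience, not part of any package: it assumes a monthly series and relies only on base R's time() and cycle().

```r
library(ggplot2)

# Hypothetical helper: flatten a monthly ts into a data frame
ts_to_df <- function(x) {
  stopifnot(is.ts(x), frequency(x) == 12)
  data.frame(
    # small offset guards against floating-point error before floor()
    year  = as.integer(floor(as.numeric(time(x)) + 1e-8)),
    # ordered month labels, so the x-axis runs Jan..Dec rather than alphabetically
    month = factor(month.abb[as.integer(cycle(x))], levels = month.abb),
    value = as.numeric(x)
  )
}

ap_df <- ts_to_df(AirPassengers)

p <- ggplot(ap_df, aes(x = month, y = value)) +
  geom_boxplot()
```

The original ts object stays untouched for analysis, and because `month` is a factor with levels in calendar order, no extra scale call is needed for the labels.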
