I'm a graphic designer who is trying to use R to create graphs that are too complicated for Excel. I'm specifically trying to create an irregular time series step chart. I've had no problems creating a regular time series chart, but for some reason, the irregular dates are throwing everything off.
I'm starting with a basic text file with two columns of data:
01-04-1940 4
05-29-1963 35
12-02-2002 24
I've loaded the data using
d <- read.delim("file.txt", header = TRUE)
and I've converted the first column to R's Date class using
d$date <- as.Date(d$date, format = "%m-%d-%Y")
But at this point, I can't find any more information on how to proceed. I've seen the R package "its", but I cannot find any documentation on it beyond technical descriptions of the classes involved.
I'd much appreciate it if someone with some experience in R could point out the few lines of code I need to create this graph. Thanks!
ggplot deals quite nicely with data in date format. Here are some suggestions:
library(ggplot2)
d <- data.frame(
  date = c("01-04-1940", "05-29-1963", "12-02-2002"),
  value = c(4, 35, 24)
)
d$date <- as.Date(d$date, format = "%m-%d-%Y")
# Step chart
ggplot(d, aes(x = date, y = value)) + geom_step(colour = "blue")
# Ordinary line chart, for comparison
ggplot(d, aes(x = date, y = value)) + geom_line(colour = "red")
I would use xts/zoo. They both handle irregular time series easily.
library(zoo)
z <- zoo(d[, 2], d[, 1])  # values indexed by date
plot(z)                   # line chart
plot(z, type = "s")       # step chart
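If you would rather avoid extra packages, the same step chart can be sketched in base R. This is only a minimal sketch: it assumes the file has no header row and is space-separated, as in the sample shown in the question, and the column names date and value are my own choice.

```r
# The three sample rows from the question, read as if from a file
raw <- "01-04-1940 4
05-29-1963 35
12-02-2002 24"

# Assumption: no header row, whitespace-separated columns
d <- read.table(text = raw, header = FALSE,
                col.names = c("date", "value"),
                stringsAsFactors = FALSE)

# Parse month-day-year strings into Date objects
d$date <- as.Date(d$date, format = "%m-%d-%Y")

# type = "s" draws a step chart; the irregular spacing is handled by the Date axis
plot(d$date, d$value, type = "s", xlab = "Date", ylab = "Value")
```

For a real file, replace text = raw with the file path.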
I have a dataset with a column in string of format hh:mm:ss. I want to create a histogram based on this column in such a way that I can visualize the number of observations between 12 AM and 3 PM in R.
plot_ly(x = (as.numeric(data$Time) * 1000), type = "histogram") %>%
layout(xaxis=list(type="date", tickformat="%H:%M:%S"))
I tried plotting using Plotly but the x-axis is in a different format than expected. Please give suggestions.
One approach could be the use of the hms library
library("hms")
As no data was provided, I generated some sample data to make this easier to follow. The as_hms() function converts the values to a difftime vector with a custom class.
Count <- c(10,20,30,100,110,110,20,30,50,30)
Time <- c('12:02:01','12:07:38','12:30:42','12:57:21','13:01:09','13:38:36','13:48:43','13:51:33','14:50:22','14:59:59')
Time = as_hms(c(Time))
data = data.frame(Count, Time)
With ggplot2 you can now easily create a bar chart of the number of observations. If you explicitly need a plotly visualization, you can get one with the ggplotly() function from the plotly package.
library(ggplot2)
library(plotly)
p <- ggplot(data = data, aes(x = Time, y = Count)) +
  geom_bar(stat = "identity")
ggplotly(p)
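If the raw data holds one hh:mm:ss string per observation (rather than pre-computed counts), the binning can also be sketched in base R. This is only an illustration: it reuses the made-up times from above, and the one-hour bin width and the midnight to 3 PM window are assumptions based on the question.

```r
Time <- c("12:02:01", "12:07:38", "12:30:42", "12:57:21", "13:01:09",
          "13:38:36", "13:48:43", "13:51:33", "14:50:22", "14:59:59")

# Convert each hh:mm:ss string to seconds since midnight
parts <- strsplit(Time, ":", fixed = TRUE)
secs <- vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))

# Keep observations from midnight up to (not including) 3 PM
keep <- secs >= 0 & secs < 15 * 3600

# Number of observations per hour of day
hourly <- table(hour = secs[keep] %/% 3600)
print(hourly)

# Histogram with one-hour bins over the same window
hist(secs[keep] / 3600, breaks = 0:15,
     xlab = "Hour of day", main = "Observations per hour")
```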
I created a boxplot with ggplot with the following data.frame:
library(lubridate)
library(ggplot2)
library(plotly)
df <- data.frame(
time = c("00:43:20", "00:44:30","00:45:40"),
sex = c("m","m","m")
)
df$sex <- factor(df$sex)
df$time <- lubridate::hms(df$time)
Now I created my boxplot with ggplot
g <- ggplot(df) +
geom_boxplot(aes(sex, time)) +
scale_y_time()
Everything looks fine, so now I make it interactive with ggplotly():
plotly::ggplotly(g)
But when I hover over the boxplot, I just see seconds, not the lubridate format.
How can I manage to see the data as shown on the y-axis?
The problem is rather complex from what I understand. The main issue seems to be that lubridate stores times as periods. That is why plotly shows seconds: in ggplot the values are seconds as well, they were only converted for display on the axis by scale_y_time().
As far as I understand, the workaround is to convert the time to a numeric number of minutes. Note that the fractional part is then decimal rather than seconds, so 43.5 means 43 min 30 s:
1st option with ggplot:
library(plotly)
library(ggplot2)
library(lubridate)
# calculate time as minutes passed and convert it to numeric
mins <- as.numeric(lubridate::hms(df$time) - hms("00:00:00"))/60
df$sex <- factor(df$sex)
df$time <- mins
g <- ggplot2::ggplot(df) +
ggplot2::geom_boxplot(aes(sex, time))
plotly::ggplotly(g)
2nd option with plotly directly (only tested on the sample data; I'm not sure whether you could add sex as x or whether you would need a second trace, and some cosmetics still need to be done. In any case, ggplot gives practically the same result):
plotly::plot_ly(y = ~mins, type = "box")
Possibly there is a better solution - I just could not figure it out in the last 2 hours ;(
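To make the decimal-minutes point concrete, here is a small base R sketch of the conversion in both directions. The helper names to_minutes and to_label are hypothetical, not part of lubridate or plotly; the second one could be used to build hover labels by hand.

```r
# Hypothetical helper: "hh:mm:ss" strings to decimal minutes
to_minutes <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(60, 1, 1 / 60)), numeric(1))
}

# Hypothetical helper: decimal minutes back to "hh:mm:ss" labels
to_label <- function(mins) {
  secs <- as.integer(round(mins * 60))
  sprintf("%02d:%02d:%02d", secs %/% 3600L, (secs %% 3600L) %/% 60L, secs %% 60L)
}

times <- c("00:43:20", "00:44:30", "00:45:40")
mins <- to_minutes(times)
print(mins)            # 44.5 here means 44 min 30 s, not 44 min 50 s
print(to_label(mins))  # round trip back to the original strings
```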
I am trying to emulate a ggplot of multiple lines which works as follows:
set.seed(45)
df <- data.frame(x=c(1,2,3,4,5,1,2,3,4,5,3,4,5), val=sample(1:100, 13),
variable=rep(paste0("category", 1:3), times=c(5,5,3)))
ggplot(data = df, aes(x=x, y=val)) + geom_line(aes(colour=variable))
I can get this simple example to work, however on a much larger data set I am following the same steps but it is not working.
ncurrencies = 6
dates = c(BTC$Date, BCH$Date, LTC$Date, ETH$Date, XRP$Date, XVG$Date)
opens = c(BTC$Open, BCH$Open, LTC$Open, ETH$Open, XRP$Open, XVG$Open)
categories = rep(paste0("categories", 1:ncurrencies),
                 times=c(nrow(BTC), nrow(BCH), nrow(LTC), nrow(ETH), nrow(XRP), nrow(XVG)))
df = data.frame(dates, opens, categories)
# Plot - Not correct.
ggplot(data=df, aes(x=dates, y=opens)) +
geom_line(aes(colour=categories))
As you can see, the different points are discretised and the y-axis is strange. I am guessing this is a rookie error but I have been going round in circles for a while. Can anyone see it?
P.S. I don't think I can upload the data here as it would be too much code. However, the dataframe is in the same format as the practice example and the categories match up correctly to the x and y data. Therefore I believe it is the way I am defining ggplot - I am relatively new to R.
Thank you Markus and Jan, yes you are correct. df$opens was a factor and changing it to a numeric solved the problem.
opens = as.numeric(c(BTC$Open, BCH$Open, LTC$Open, ETH$Open, XRP$Open, XVG$Open))
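One thing to watch for when converting: calling as.numeric() directly on a factor returns the internal level codes, not the printed numbers. The sketch below uses made-up prices (not the real data) to show the difference. The line above is fine when the source columns are character; if they are already factors, go through as.character() first.

```r
# Made-up prices stored as a factor, as happens when numbers are read as text
opens_factor <- factor(c("7100.5", "250.25", "7100.5", "19.9"))

# Direct conversion returns the level codes, not the prices
codes <- as.numeric(opens_factor)

# Converting to character first recovers the actual values
values <- as.numeric(as.character(opens_factor))

print(codes)
print(values)
```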
I have time-series data of four years. Now I want to plot the same data year-wise and do comparative analysis. The dummy data is as
library(xts)
library(ggplot2)
timeindex <- seq(as.POSIXct('2016-01-01'),as.POSIXct('2016-12-31 23:59:59'), by = "1 mins")
dataframe <- data.frame(year1=rnorm(length(timeindex),100,10),year2=rnorm(length(timeindex),150,7),
year3=rnorm(length(timeindex),200,3),
year4=rnorm(length(timeindex),350,4))
xts_df <- xts(dataframe,timeindex)
Now, when I use ggplot, it takes too long to plot all the series with the following call:
visualize_dataframe_all_columns(xts_df)
The above function is defined as:
visualize_dataframe_all_columns <- function(xts_data) {
library(RColorBrewer)# to increase no. of colors
library(plotly)
dframe <- data.frame(timeindex=index(xts_data),coredata(xts_data))
df_long <- reshape2::melt(dframe,id.vars = "timeindex")
colourCount = length(unique(df_long$variable))
getPalette = colorRampPalette(brewer.pal(8, "Dark2"))(colourCount) # brewer.pal(8, "Dark2") or brewer.pal(9, "Set1")
g <- ggplot(df_long,aes(timeindex,value,col=variable,group=variable))
g <- g + geom_line() + scale_colour_manual(values=getPalette)
ggplotly(g)
}
Problems with the above approach are:
It takes a long time to plot. Can I reduce the plot time?
It is very difficult to zoom into the plot using plotly. Is there a better way?
Are there any better approaches to visualize this data?
I faced more or less the same problem with 10-minute data. However, the real question is whether it makes sense to plot minute data for a whole year. The human eye cannot resolve that much detail.
I would create a daily xts from that data and plot it for the year, and modify the function so that it plots the minute data only for a shorter period of time.
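As a rough sketch of the daily aggregation, here is a base R version using aggregate() instead of xts, so it runs without extra packages. It uses one week of simulated data and two columns to keep it small; the daily mean is my choice of summary, and the names minute_df and daily_df are made up for the example.

```r
set.seed(1)

# One week of minute-level dummy data (a full year works the same way)
timeindex <- seq(as.POSIXct("2016-01-01 00:00:00", tz = "UTC"),
                 as.POSIXct("2016-01-07 23:59:00", tz = "UTC"),
                 by = "1 min")
minute_df <- data.frame(
  timeindex = timeindex,
  year1 = rnorm(length(timeindex), 100, 10),
  year2 = rnorm(length(timeindex), 150, 7)
)

# Collapse each timestamp to its calendar day
minute_df$day <- as.Date(minute_df$timeindex, tz = "UTC")

# One row per day, holding the daily mean of each series
daily_df <- aggregate(cbind(year1, year2) ~ day, data = minute_df, FUN = mean)

print(nrow(minute_df))  # minute-level rows
print(nrow(daily_df))   # daily rows
```

Plotting daily_df instead of minute_df cuts the number of points per series by a factor of 1440, which is what makes the plot fast enough to draw and zoom.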
The data are a series of dates and times.
date time
2010-01-01 09:04:43
2010-01-01 10:53:59
2010-01-01 10:57:18
2010-01-01 10:59:30
2010-01-01 11:00:44
…
My goal was to represent a scatterplot with the date on the horizontal axis (x) and the time on the vertical axis (y). I guess I could also add a color intensity if there are more than one time for the same date.
It was quite easy to create a histogram of dates.
mydata <- read.table("mydata.txt", header=TRUE, sep=" ")
mydatahist <- hist(as.Date(mydata$date), breaks = "weeks", freq=TRUE, plot=FALSE)
barplot(mydatahist$counts, border=NA, col="#ccaaaa")
I haven't figured out yet how to create a scatterplot where the axes are date and/or time.
I would also like to be able to have axes that are not necessarily linear dates (YYYY-MM-DD), but are based on months, such as MM-DD (so different years accumulate), or that even cycle by week.
Any help, RTFM URI slapping or hints are welcome.
The ggplot2 package handles dates and times quite easily.
Create some date and time data:
dates <- as.POSIXct(as.Date("2011/01/01") + sample(0:365, 100, replace=TRUE))
times <- as.POSIXct(runif(100, 0, 24*60*60), origin="2011/01/01")
df <- data.frame(
dates = dates,
times = times
)
Then get some ggplot2 magic. ggplot will automatically deal with dates, but to get the time axis formatted properly use scale_y_datetime():
library(ggplot2)
library(scales)
ggplot(df, aes(x=dates, y=times)) +
geom_point() +
scale_y_datetime(breaks=date_breaks("4 hour"), labels=date_format("%H:%M")) +
theme(axis.text.x=element_text(angle=90))
Regarding the last part of your question, on grouping by week, etc.: to achieve this you may have to pre-summarize the data into the buckets that you want. You could possibly use plyr for this and then pass the resulting data to ggplot.
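A minimal sketch of such pre-summarizing, using base R rather than plyr and random dates like the ones above. The names week_start, weekly and month_day are made up for the example.

```r
set.seed(42)

# Random dates within one year, similar to the example data above
dates <- as.Date("2011-01-01") + sample(0:365, 100, replace = TRUE)

# Bucket by week: cut() labels each date with the first day of its week
week_start <- cut(dates, breaks = "week")
weekly <- as.data.frame(table(week_start), responseName = "n")
weekly$week_start <- as.Date(as.character(weekly$week_start))

# MM-DD key, so that the same calendar day from different years lines up
month_day <- format(dates, "%m-%d")

head(weekly)
head(month_day)
```

The weekly data frame can then be passed to ggplot like any other data, with week_start on the x axis and n as the count.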
I'd start by reading about as.POSIXct, strptime, strftime, and difftime. These and related functions should allow you to extract the desired subsets of your data. The formatting is a little tricky, so play with the examples in the help files.
And, once your dates are converted to a POSIX class, as.numeric() will convert them all to numeric values, hence easy to sort, plot, etc.
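A short sketch of those functions on two of the timestamps from the question. The UTC time zone is an assumption, chosen here so the numbers do not depend on the local machine.

```r
# Two timestamps from the question, parsed into POSIXct (assuming UTC)
stamps <- as.POSIXct(c("2010-01-01 09:04:43", "2010-01-01 10:53:59"),
                     format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Numeric form: seconds since 1970-01-01 00:00:00 UTC
secs <- as.numeric(stamps)

# Pull out only the time-of-day or only the month-day part as text
time_of_day <- format(stamps, "%H:%M:%S")
month_day <- format(stamps, "%m-%d")

# Elapsed time between the two observations, in minutes
gap <- difftime(stamps[2], stamps[1], units = "mins")

print(secs)
print(time_of_day)
print(gap)
```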
Edit: Andre's suggestion to play w/ ggplot to simplify your axis specifications is a good one.