The data are a series of dates and times.
date time
2010-01-01 09:04:43
2010-01-01 10:53:59
2010-01-01 10:57:18
2010-01-01 10:59:30
2010-01-01 11:00:44
…
My goal was to draw a scatterplot with the date on the horizontal axis (x) and the time on the vertical axis (y). I guess I could also add a color intensity if there is more than one time for the same date.
It was quite easy to create a histogram of dates.
mydata <- read.table("mydata.txt", header=TRUE, sep=" ")
mydatahist <- hist(as.Date(mydata$date), breaks = "weeks", freq=TRUE, plot=FALSE)
barplot(mydatahist$counts, border=NA, col="#ccaaaa")
I haven't figured out yet how to create a scatterplot where the axes are date and/or time.
I would also like to be able to have axes that are not necessarily linear dates (YYYY-MM-DD), but are based on months, such as MM-DD (so different years accumulate), or even wrap around on weeks.
Any help, RTFM URI slapping or hints is welcome.
The ggplot2 package handles dates and times quite easily.
Create some date and time data:
dates <- as.POSIXct(as.Date("2011/01/01") + sample(0:365, 100, replace=TRUE))
times <- as.POSIXct(runif(100, 0, 24*60*60), origin="2011/01/01")
df <- data.frame(
dates = dates,
times = times
)
Then get some ggplot2 magic. ggplot will automatically deal with dates, but to get the time axis formatted properly use scale_y_datetime():
library(ggplot2)
library(scales)
ggplot(df, aes(x=dates, y=times)) +
geom_point() +
scale_y_datetime(breaks=date_breaks("4 hour"), labels=date_format("%H:%M")) +
theme(axis.text.x=element_text(angle=90))
Regarding the last part of your question, on grouping by week, etc.: to achieve this you may have to pre-summarize the data into the buckets that you want. You can possibly use plyr for this and then pass the resulting data to ggplot.
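As a rough sketch of that pre-summarizing step, here is one way to do it in base R rather than plyr (a swapped-in technique, not necessarily what the answer had in mind). The `stamps` vector is hypothetical sample data, and UTC is assumed so the result does not depend on the local time zone:

```r
# Hypothetical sample data standing in for the question's date/time columns
stamps <- as.POSIXct(c("2010-01-01 09:04:43", "2010-01-01 10:53:59",
                       "2010-01-04 10:57:18", "2011-01-01 11:00:44"),
                     tz = "UTC")

# Bucket by calendar week: cut() on a Date returns a factor with one
# level per week (weeks start on Monday by default)
week <- cut(as.Date(stamps), breaks = "week")
per_week <- as.data.frame(table(week = week))

# Bucket by month-day so that different years fall into the same bucket
month_day <- format(stamps, "%m-%d", tz = "UTC")
per_month_day <- as.data.frame(table(month_day = month_day))
```

Weeks with no observations still appear in `per_week` with a count of 0, because `cut()` creates a level for every week in the range. Either summary can then be handed to ggplot as an ordinary data frame.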
I'd start by reading about as.POSIXct, strptime, strftime, and difftime. These and related functions should allow you to extract the desired subsets of your data. The formatting is a little tricky, so play with the examples in the help files.
And, once your dates are converted to a POSIX class, as.numeric() will convert them all to numeric values, hence easy to sort, plot, etc.
Edit: Andre's suggestion to play w/ ggplot to simplify your axis specifications is a good one.
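A small sketch of how those functions fit together, using two timestamps from the question. The UTC time zone and the variable names are assumptions made for the example:

```r
# Parse date-time strings into POSIXct (UTC assumed so the result does
# not depend on the local time zone)
t1 <- as.POSIXct("2010-01-01 09:04:43", tz = "UTC")
t2 <- as.POSIXct("2010-01-01 10:53:59", tz = "UTC")

# format() (the workhorse behind strftime) extracts pieces as strings
day_part  <- format(t1, "%Y-%m-%d")
time_part <- format(t1, "%H:%M:%S")

# difftime() gives the elapsed time in the units you ask for
gap_mins <- as.numeric(difftime(t2, t1, units = "mins"))

# as.numeric() on POSIXct is seconds since 1970-01-01 00:00:00 UTC
secs <- as.numeric(t1)

# Seconds since midnight: a plain numeric value usable on a y axis
midnight <- as.POSIXct(day_part, tz = "UTC")
secs_since_midnight <- as.numeric(difftime(t1, midnight, units = "secs"))
```

Plotting `secs_since_midnight` against the date is one way to get time of day on the vertical axis without any extra packages, at the cost of formatting the axis labels yourself.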
I created a boxplot with ggplot with the following data.frame:
library(lubridate)
library(ggplot2)
library(plotly)
df <- data.frame(
time = c("00:43:20", "00:44:30","00:45:40"),
sex = c("m","m","m")
)
df$sex <- factor(df$sex)
df$time <- lubridate::hms(df$time)
Now I created my boxplot with ggplot
g <- ggplot(df) +
geom_boxplot(aes(sex, time)) +
scale_y_time()
Everything looks fine, and now I make it interactive with ggplotly():
plotly::ggplotly(g)
But when I hover over the boxplot, I just see seconds, not the lubridate format.
How can I manage to see the data as shown on the y-axis?
The problem is rather complex, from what I understand. The main issue seems to be that lubridate stores times as periods. You therefore get seconds in plotly because the values are seconds in ggplot as well; they were only converted for display on the axis by scale_y_time().
As far as I understand, the workaround is to convert the time to a numeric number of minutes. This means the part after the decimal point is a decimal fraction of a minute, not seconds (43.5 is 43 minutes 30 seconds):
1st option with ggplot:
library(plotly)
library(ggplot2)
library(lubridate)
# calculate time as minutes passed and get it as numeric
# (assumes df$time still holds the original "HH:MM:SS" strings)
mins <- as.numeric(lubridate::hms(df$time) - hms("00:00:00"))/60
df$sex <- factor(df$sex)
df$time <- mins
g <- ggplot2::ggplot(df) +
ggplot2::geom_boxplot(aes(sex, time))
plotly::ggplotly(g)
2nd option, with plotly directly (only for the data above; I'm not sure whether you could add sex as x or would need a second trace, and some cosmetics still need to be done. Anyhow, ggplot gives practically the same result):
plotly::plot_ly(y = ~mins, type = "box")
Possibly there is a better solution - I just could not figure it out in the last 2 hours ;(
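To make the decimal-minutes point concrete, here is a base R sketch of the conversion and of turning the numbers back into readable labels. `fmt_hms` is a hypothetical helper written for this example, not part of lubridate or plotly, and wiring the labels into a plotly tooltip is not shown:

```r
# Times as "HH:MM:SS" strings, as in the question
time_chr <- c("00:43:20", "00:44:30", "00:45:40")

# Split each string into hours, minutes, seconds and convert to seconds
parts <- do.call(rbind, strsplit(time_chr, ":", fixed = TRUE))
mode(parts) <- "numeric"
secs <- parts[, 1] * 3600 + parts[, 2] * 60 + parts[, 3]

# Decimal minutes: 44.5 means 44 min 30 s, not 44 min 50 s
mins <- secs / 60

# Hypothetical helper: format a number of seconds back as "HH:MM:SS"
fmt_hms <- function(s) {
  s <- round(s)  # guard against floating-point error from the division
  sprintf("%02d:%02d:%02d",
          as.integer(s %/% 3600),
          as.integer((s %% 3600) %/% 60),
          as.integer(s %% 60))
}
labels <- fmt_hms(mins * 60)
```

Here `labels` reproduces the original strings, so the numeric values can be plotted while the text form is kept alongside for display.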
I have a continuous variable y measured on different dates. I need to make boxplots with a box showing the distribution of y for each 5 year interval.
Sample data:
rdob <- as.Date(dob, format= "%m/%d/%y")
ggplot(data = data, aes(x=rdob, y=ageyear)) + geom_boxplot()
#Warning message:
#Continuous x aesthetic -- did you forget aes(group=...)?
This image is the first one I tried. What I want is a box for every five year interval, instead of a box for every year.
Here is a way to pull out the year in base R:
format(as.Date("2008-11-03", format="%Y-%m-%d"), "%Y")
Simply wrap your date vector in a format() and add the "%Y". To get this to be integer, you can use as.integer.
You could also take a look at the year function in the lubridate package which will make this extraction a little bit more straightforward.
One method to get 5 year intervals is to use cut to create a factor variable that creates levels at selected break points. Unless you have dozens of years your best bet would be to set the break points manually:
df$myTimeInterval <- cut(df$years, breaks=c(1995, 2000, 2005, 2010, 2015))
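A short sketch of how `cut` treats the interval edges, which matters when a year sits exactly on a break point. The `years` vector is hypothetical and stands in for `df$years`:

```r
# Hypothetical years standing in for df$years
years <- c(1996, 2000, 2001, 2005, 2012, 2014)

# Default: intervals are open on the left and closed on the right, (a, b],
# so 2000 lands in the first bin together with 1996
default_bins <- cut(years, breaks = c(1995, 2000, 2005, 2010, 2015))

# right = FALSE gives [a, b) instead, so 2000 starts the second bin;
# explicit labels make the boxplot axis easier to read
five_year <- cut(years,
                 breaks = seq(1995, 2015, by = 5),
                 right = FALSE,
                 labels = c("1995-1999", "2000-2004",
                            "2005-2009", "2010-2014"))
```

Values outside the outermost breaks become `NA`, so it is worth checking `range(years)` before choosing the break points.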
Here's an example taking Dave2e's suggestion of using cut on date intervals along with ggplot's group aesthetic mapping:
library(ggplot2)
n <- 1000
## Randomly sample birth dates and dummy up an effect that trends upward with DOB
dobs <- sample(seq(as.Date('1970/01/01'), Sys.Date(), by="day"), n)
effect <- rnorm(n) + as.numeric(as.POSIXct(dobs)) / as.numeric(as.POSIXct(Sys.Date()))
data <- data.frame(dob=dobs, effect=effect)
## boxplot w/ DOB binned to 5 year intervals
ggplot(data=data, aes(x=dob, y=effect)) + geom_boxplot(aes(group=cut(dob, "5 year")))
library(lubridate)
year=year(rdob)
Suppose I have a vector of numbers from 1 to 12 and want to plot them over a period of time ranging from Jan. 2013 to Dec. 2013. I used the following code to generate the data and plot it:
dates<-seq(as.Date("2013/1/1"), by = "month", length.out = 12)
n<-seq(1:12)
df<-cbind(dates,n)
plot(df)
However, this code has some problems. First, I could not find an option in the first seq to generate only months and years without the day. Second, all dates in df become serial numbers, even after adding as.Date before dates in cbind. Finally, as a result of these two problems, the x axis of the plot is not in a date format.
just use
plot(dates,n)
without cbinding it. cbind creates a matrix (see class(df)). Within this process the dates are saved as class numeric.
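The class change can be checked directly. This sketch reuses the question's `dates` and `n`; the names `m`, `df2` and `month_labels` are introduced here only for illustration:

```r
dates <- seq(as.Date("2013/1/1"), by = "month", length.out = 12)
n <- 1:12

# cbind() builds a numeric matrix, so the Date class is dropped and
# only the underlying day counts (days since 1970-01-01) remain
m <- cbind(dates, n)
is.matrix(m)
m[1, "dates"]

# data.frame() keeps each column's class, so the dates stay dates
df2 <- data.frame(dates = dates, n = n)
class(df2$dates)

# A Date always carries a day; drop it only when formatting for display
month_labels <- format(dates, "%Y-%m")
```

This also addresses the first problem in the question: `seq` cannot produce day-less dates, but `format` can hide the day wherever the dates are shown.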
For nicer and easier to customize plots use
require(ggplot2)
qplot(dates,n) + xlab("") + ylab("my y lab")
df<-data.frame(dates=dates,n=n)
plot(df$dates, df$n, axes=FALSE)
axis(1, labels=format(df$dates, "%b %Y"), at=df$dates)
axis(2)
I have the following data.frame:
sample <- data.frame(day=c(1,2,5,10,12,12,14))
sample.table <- as.data.frame(table(sample$day))
Now what I'd like to do is graph the day against the count of days, so something like:
require(ggplot2)
qplot(Var1, Freq, data=sample.table)
I realized though that Var1 really really really wants to be a factor. This works fine for a small number of days, but is terrible when days becomes much larger because the graph becomes unreadable. If I change it to a numeric or integer, then instead of plotting day on the x-axis, it plots the count of day, e.g. 1,2,3,4,5,6,7.
What can I do so that if I have, say 5000 days, it is still visible well?
This is because when you use table you get a vector with names (which are characters), and when you convert to data.frame these get converted to factors with the default settings.
You could avoid this by using your original data and getting ggplot2 to count the data:
qplot(day, ..count.., data=sample, stat="bin", binwidth=1)
or just use a histogram,
qplot(day, data=sample, geom="histogram", binwidth=1)
Note that you can adjust the binwidth argument to count in larger groups.
Figured out a hack for this.
as.integer(as.character(sample.table$Var1))
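For anyone wondering why the detour through `as.character` is needed, this sketch shows the difference on the question's data. The `codes` variable and the new `day` column are added here only for illustration:

```r
sample <- data.frame(day = c(1, 2, 5, 10, 12, 12, 14))
sample.table <- as.data.frame(table(sample$day))

# Var1 is a factor whose levels are the day values as text
is.factor(sample.table$Var1)

# as.integer() alone returns the internal level codes 1..6, not the days
codes <- as.integer(sample.table$Var1)

# Going through character first recovers the original day values
sample.table$day <- as.integer(as.character(sample.table$Var1))
```

With the numeric `day` column in place, plotting `day` against `Freq` gives a continuous x axis that stays readable for thousands of days.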
I'm a graphic designer who is trying to use R to create graphs that are too complicated for Excel. I'm specifically trying to create an irregular time series step chart. I've had no problems creating a regular time series chart, but for some reason, the irregular dates are throwing everything off.
I'm starting with a basic text file with two columns of data:
01-04-1940 4
05-29-1963 35
12-02-2002 24
I've loaded the data using
d <- read.delim("file.txt", header = TRUE)
and I've converted the first column to dates using
d$date <- as.Date(d$date, format = "%m-%d-%Y")
But at this point, I can't find any more information anywhere on how to proceed. I've seen the R package "ITS", but I cannot find any documentation on it beyond technical descriptions of the classes involved.
I'd much appreciate it if someone with some experience in R could point out the few lines of code I need to create this graph. Thanks!
ggplot deals quite nicely with data in date format. Here are some suggestions:
d <- data.frame(
date = c("01-04-1940", "05-29-1963", "12-02-2002"),
value = c(4, 35, 24)
)
d$date <- as.Date(d$date, format = "%m-%d-%Y")
ggplot(d, aes(x=date, y=value)) + geom_step(colour="blue")
ggplot(d, aes(x=date, y=value)) + geom_line(colour="red")
I would use xts/zoo. They both handle irregular time series easily.
library(zoo)
z <- zoo(d[,2], d[,1])
plot(z)
plot(z, type="s")
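If installing extra packages is not an option, a step chart can also be sketched in base R alone. This is an alternative to the answers above rather than their method, and it rebuilds the three sample rows from the question so it runs on its own:

```r
# The three sample rows from the question, with the dates parsed
d <- data.frame(
  date  = as.Date(c("01-04-1940", "05-29-1963", "12-02-2002"),
                  format = "%m-%d-%Y"),
  value = c(4, 35, 24)
)

# Step plots assume the rows are in time order
d <- d[order(d$date), ]

# type = "s" draws stair steps: each value is held until the next date
plot(d$date, d$value, type = "s", xlab = "Date", ylab = "Value")
```

Because the x values are of class Date, the axis is labelled with dates and the irregular spacing between observations is preserved.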