how can I have a data set of only time intervals (no dates) in R, like the following:
TREATMENT_A TREATMENT_B
1:01:12 0:05:00
0:34:56 1:08:09
and compute mean times, etc, and draw boxplots with time intervals in the y-axis?
I am new to R, and I searched for this but found no example in the net.
Thanks
The chron package has a 'times' class that supports arithmetic. You could also do all of that with POSIXct objects and format the date-time output to omit the date. I thought the axis.POSIXct function had a format argument that would give time-only labels. However, it does not seem to get dispatched properly, so I needed to construct the axis "by hand":
dft <- data.frame(x= factor( sample(1:2, 100, replace=TRUE)),
y= Sys.time()+rnorm(100)*4000 )
boxplot(y~x, data=dft, yaxt='n')
axis(2, at=seq(from=range(dft$y)[1], to =range(dft$y)[2], by=3000) ,
labels=format.POSIXct(seq(from=range(dft$y)[1], to =range(dft$y)[2], by=3000),
format ="%H:%M:%S") )
There did turn out to be an appropriate method, Axis.POSIXt (to which I thought boxplot should have been turning for plotting, but it did not seem to recognize the class of the 'y' argument):
boxplot(y~x, data=dft, yaxt='n')
Axis(side=2, x=range(dft$y), format ="%H:%M:%S")
Regarding your request for something "simpler", take a look at this ggplot2-based solution, using the dft data frame defined above with POSIXct times. (I did try with the chron 'times' object but got a message saying ggplot did not support that class):
require(ggplot2); p <- ggplot(dft, aes(x,y))
p + geom_boxplot()
Check out the "lubridate" package, and the "hms" function within it.
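If adding a package is not an option, the H:MM:SS strings from the question can also be handled in base R by converting them to seconds. This is only a sketch: to_seconds() and to_hms() are hypothetical helper names (they are not part of chron or lubridate), and the code assumes every value has exactly three colon-separated fields.

```r
# Durations from the question, stored as "H:MM:SS" strings
times <- data.frame(
  TREATMENT_A = c("1:01:12", "0:34:56"),
  TREATMENT_B = c("0:05:00", "1:08:09")
)

# Hypothetical helper: "H:MM:SS" -> number of seconds
to_seconds <- function(x) {
  parts <- strsplit(as.character(x), ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

# Hypothetical helper: seconds -> "H:MM:SS" (rounded to whole seconds)
to_hms <- function(s) {
  s <- as.integer(round(s))
  sprintf("%d:%02d:%02d", s %/% 3600L, (s %% 3600L) %/% 60L, s %% 60L)
}

# Work on the numeric scale, format only for display
secs <- as.data.frame(lapply(times, to_seconds))
mean_times <- to_hms(colMeans(secs))   # one mean per treatment, as H:MM:SS

# Boxplot on the numeric scale, with the y-axis relabelled as H:MM:SS
boxplot(secs, yaxt = "n", ylab = "Duration")
ticks <- pretty(range(unlist(secs)))
axis(2, at = ticks, labels = to_hms(ticks), las = 1)
```

Keeping the data numeric means mean(), median() and boxplot() all work unchanged; only the labels need converting back.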
I have a dataset which plots unemployment over time, and I want to add in bands highlighting when there is a recession.
The original data frame is called quarterly_data.
recession <- data.frame(date_start= as_date(c("1973-07-01", "1980-01-01", "1990-07-01","2008-04-01")),
date_end = as_date(c("1975-07-01","1981-04-01", "1991-07-01","2009-04-01")))
recession$date_start <- ymd (recession$date_start)
recession$date_end <- ymd (recession$date_end)
ggplot(quarterly_data, aes(x=date, y= Unemployment))+
geom_line()+
geom_rect(data = recession, inherit.aes=FALSE , aes(xmin = date_start, xmax = date_end, ymin = -0.1, ymax = 0.1),
fill = "red", alpha= 0.3)
However, when I run the ggplot, I get this error message:
Error: Invalid input: time_trans works with objects of class POSIXct only
Does anyone know how to fix this?
While you have supplied the data frame recession, you have not supplied quarterly_data, which may be where the error originates. Here are a few things to try, but first a short description of what is probably causing the issue.
First of all, time_trans appears to be from the scales package, but it's not clear why that needs to run based on the code above. Is there anything else that could be using the scales package here?
Now for the error message itself: it requires an object of class POSIXct. That is different from class Date, which is what lubridate's as_date() returns when you build the recession data frame.
You can confirm this yourself by running class(recession$date_start), where you can see the output is a Date class object.
After the ymd() call, you still have an object of class Date. According to the documentation, you can get a POSIXct/POSIXt object instead by supplying a tz= (time zone) argument. You can see this with the following:
> class(ymd(recession$date_start))
[1] "Date"
> class(ymd(recession$date_start, tz='GMT'))
[1] "POSIXct" "POSIXt"
So, that might fix your problem. But, you still have some detective work to do, since we don't have your other data frame and we apparently are not seeing a function that is trying to call time_trans from the scales package. The other possibility here is that ggplot is calling this to adjust an axis based on a POSIXt object... but I don't see a scale_ call or coord_flip() that might cause this error. I would recommend the following sequence:
1. Try the "homerun" approach by running your ymd() functions again, but supplying tz="GMT" to force the output to be a POSIXct object. Not sure if this will be successful.
2. Run the ggplot() line by itself. Do you get the same error? If so, the error lies within the quarterly_data data frame, and not the recession data frame. If it works, then run the ggplot() line and add in the geom_line() layer. If it still works, then your issue is with the geom_rect() call, which likely means the recession data frame.
3. Check the class of the date column in quarterly_data. Is it Date or POSIXct? If Date, try to convert it to POSIXct (maybe just use as.POSIXct()).
4. Is there more code that belongs here from your plot call? If you have coord_flip() or any scale_x_* call or other thematic elements added to your plot code, they can definitely be trying to adjust the time scale and result in that error.
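Whichever step turns out to be the culprit, the goal is that the main data and the recession bands use the same date class. Here is a hedged sketch of the two ways to align them in base R. The quarterly_data frame is a made-up stand-in (the real one was not posted), and the assumption that its date column is POSIXct is mine.

```r
recession <- data.frame(
  date_start = as.Date(c("1973-07-01", "1980-01-01", "1990-07-01", "2008-04-01")),
  date_end   = as.Date(c("1975-07-01", "1981-04-01", "1991-07-01", "2009-04-01"))
)

# Made-up stand-in for quarterly_data; ASSUMES its date column is POSIXct
quarterly_data <- data.frame(
  date = seq(as.POSIXct("1973-01-01", tz = "UTC"), by = "3 months", length.out = 8),
  Unemployment = seq(0.01, 0.08, length.out = 8)   # dummy values
)

class(recession$date_start)   # "Date"
class(quarterly_data$date)    # "POSIXct" "POSIXt"

# Option A: promote the recession dates to POSIXct
recession_ct <- transform(
  recession,
  date_start = as.POSIXct(date_start, tz = "UTC"),
  date_end   = as.POSIXct(date_end, tz = "UTC")
)

# Option B: demote the main data to Date instead
quarterly_as_date <- transform(quarterly_data, date = as.Date(date))
```

With either option, the data passed to ggplot() and the data passed to geom_rect() share one class, so a single x scale can handle both.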
I'm trying to plot some time series data. My plot looks like the following:
I'm uncertain as to why it displays the date as such. I'm using R Markdown in R studio. Below is my code:
agemployment<-read.csv("Employment-Level1.csv", header=TRUE)
Tried to change the class of Date:
as.Date(as.character(agemployment$Date),format="%m%d%Y")
That did nothing. Rest of code here:
attach(agemployment)
View(agemployment)
head(agemployment)
agemployment<-ts(agemployment,frequency=12,start=c(2008, 1))
plot(agemployment, col="black", main="Agriculture Employment Level",
ylab="Total Employment Level (Thousands)", ylim=c(0, 250),lwd=2,
xaxs="i", yaxs="i", lty=1)
This produces the above plot. I'm uncertain what I'm doing wrong. I would appreciate any help. Thank you!
EDIT:
Data here:
I suspect your issues are somehow driven by attach; attaching data frames is generally not good practice. The following super-simple code worked for me:
# small dataset from your example, I use package readr to load it as data frame
df = readr::read_csv("DATE,Employment
1/1/2008,1245
2/1/2008,1280
3/1/2008,1343
4/1/2008,1251
5/1/2008,1236
6/1/2008,1265")
ts <- ts(data = df$Employment, frequency = 12, start = c(2008, 1))
plot(ts)
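Separately, the as.Date() call in the question could not have changed anything, for two reasons: its result was never assigned back, and the format string "%m%d%Y" has no slashes, so it cannot match values like 1/1/2008. A small sketch of the difference, assuming the file is month/day/year as the monthly series suggests:

```r
# Same small sample as above, read with base R
Lines <- "DATE,Employment
1/1/2008,1245
2/1/2008,1280
3/1/2008,1343
4/1/2008,1251
5/1/2008,1236
6/1/2008,1265"
df <- read.csv(text = Lines)

# Format without separators does not match "1/1/2008", so every value is NA
bad <- as.Date(df$DATE, format = "%m%d%Y")

# Format with the slashes parses correctly; note the assignment back into df
df$DATE <- as.Date(df$DATE, format = "%m/%d/%Y")

# Plot against real dates instead of relying on ts() start/frequency
plot(df$DATE, df$Employment, type = "l",
     xlab = "Date", ylab = "Total Employment Level (Thousands)")
```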
Using the file generated reproducibly in the Note at the end, read it into a zoo object, making the index of class "yearmon" (representing year and month without day). Then plot it.
library(zoo)
z <- read.csv.zoo("Employment-Level1.csv", format = "%m/%d/%Y", FUN = as.yearmon)
plot(z)
or
library(ggplot2)
autoplot(z) + scale_x_yearmon()
(continued after plots)
If you wanted to convert z to a ts object or data frame:
tt <- as.ts(z)
DF <- fortify.zoo(z)
Note
Lines <- "DATE,Employment
1/1/2008,1245
2/1/2008,1280
3/1/2008,1343
4/1/2008,1251
5/1/2008,1236
6/1/2008,1265"
cat(Lines, file = "Employment-Level1.csv") # write out file
Realize that providing the data as an image means that everyone who answers must retype it. In the future, please provide the input data in a reproducible form, as we have done here.
I have a partial success with
input = "date,data
1-1-2015,5.5
2-1-2016,1.0
3-1-2016,4.0
4-1-2016,4.0
5-1-2019,3.0"
new = read.csv(text=input)
new$date = as.Date(new$date, "%d-%m-%Y")
new$date = as.numeric(new$date, as.Date("2015-01-01"), units="days") #https://stat.ethz.ch/pipermail/r-help/2008-May/162719.html
plot(density(new$date))
This results in a working graph, but unfortunately the x axis is formatted as integers. How can I produce a graph with the x axis formatted as dates?
I expected
new = read.csv(text=input)
new$date = as.Date(new$date, "%d-%m-%Y")
plot(density(new$date))
to work, unfortunately it crashed with Error in density.default(new$date) : argument 'x' must be numeric.
density() only accepts numeric input, so it wasn't really designed to work with dates. The easiest fix is probably to keep the numeric conversion from your first snippet and just replace the default axis labels with date values. Here's how you can do that:
plot(density(new$date), xaxt="n")
at<-axTicks(1)
axis(1,at, as.Date(at, origin="1970-01-01"))
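For completeness, here is the same idea as one self-contained script. It is a sketch that assumes the plain as.numeric() conversion (days since 1970-01-01) rather than the three-argument call in the question, so the origin passed to as.Date() matches.

```r
input <- "date,data
1-1-2015,5.5
2-1-2016,1.0
3-1-2016,4.0
4-1-2016,4.0
5-1-2019,3.0"
new <- read.csv(text = input)

# Parse day-month-year strings, then convert to days since 1970-01-01
dates <- as.Date(new$date, "%d-%m-%Y")
days  <- as.numeric(dates)

# Estimate the density on the numeric scale and suppress the default x axis
d <- density(days)
plot(d, xaxt = "n", main = "Density of dates", xlab = "Date")

# Put date labels at the tick positions R would have chosen anyway
at <- axTicks(1)
axis(1, at = at, labels = format(as.Date(at, origin = "1970-01-01"), "%Y-%m"))
```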
I have just started to learn R and I have a problem with plotting some values read from a CSV file.
I have managed to load the csv file:
timeseries <- read.csv(file="R/scripts/timeseries.csv",head=FALSE,sep=",")
When checking the content of timeseries, I get the correct results (so far, so good):
1 2016-12-29T19:00:00Z 6
...
17497 2016-12-30T00:00:00Z 3
Now, I am trying to plot the values - the date should be on the x-axis and the values on the y-axis.
I found some SO questions about this topic: How to plot a multicolumn CSV file?. But I am unable to make it work following the instructions.
I tried:
matplot(timeseries[, 1], timeseries[, -1], type="l")
Also, I tried various barplot and matplot modifications but I usually get some exception like this one: Error in plot.window(...) : need finite 'xlim' values
Could someone suggest how to tackle this problem? Sorry for elementary question...
You need to make sure your dates have class Date.
dates <- c("2016-12-29T19:00:00Z", "2016-12-30T00:00:00Z")
values <- c(6,3)
df <- data.frame(dates, values)
df$dates <- as.Date(df$dates)
Then you could use ggplot2
library(ggplot2)
qplot(df$dates, df$values) + geom_line()
or even the default
plot(df$dates, df$values, type = "l")
or with lattice as in the question you referred to
library(lattice)
xyplot(df$values ~ df$dates, type = "l")
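One caveat: as.Date() keeps only the calendar day, so readings taken at different times on the same day collapse onto one x position. If the time of day matters, which seems likely for timestamps such as 19:00:00Z, parsing to POSIXct keeps it. A minimal sketch, assuming every timestamp is UTC and follows the pattern shown in the question:

```r
stamps <- c("2016-12-29T19:00:00Z", "2016-12-30T00:00:00Z")
values <- c(6, 3)

# Parse the ISO 8601 strings; the literal "T" and "Z" are matched by the format
when <- as.POSIXct(stamps, format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")

df <- data.frame(when, values)

# Base plot; the x axis is labelled with times automatically
plot(df$when, df$values, type = "l", xlab = "Time (UTC)", ylab = "Value")
```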
thanks to the reply of #McQueenDon on r-nabble
http://r.789695.n4.nabble.com/boxplot-with-x-axis-time-td4686787.html#a4687746
I managed to produce a base-graphics boxplot of a single variable with the x-axis correctly formatted and spaced for the date of acquisition.
What if I would like to produce it with lattice's bwplot? I need this because I would also like to use a conditioning factor.
Here is a reproducible example (thanks again to #McQueenDon )
data(iris)
pippo= stack(iris[,-5])
pippo$date= rep(c("2013/01/29", "2013/03/01", "2013/11/01",
"2013/12/01", "2014/02/01", "2014/07/02"), 100)
pippo$date= as.Date(pippo$date)
boxplot(pippo$values ~ pippo$date) ## NOT exactly what I want
bx<- boxplot(pippo$values ~ pippo$date, plot= F)
bxp(bx, at=sort(unique(pippo$date))) # this is what I was looking for !
require(lattice)
bwplot(values~date, pippo, horizontal=F) #dates looks not correctly spaced even though they are correctly ordered and formatted
# finally I would like to condition to the 'ind' variable
bwplot(values~date| ind, pippo, horizontal=F, layout= c(2,2))
Thanks
Giuseppe
How about
xyplot(values~date| ind, pippo, horizontal=F, layout= c(2,2),
panel=panel.bwplot, box.width=20)
Here we use xyplot with a custom panel= parameter rather than bwplot because bwplot converts the x to a factor first which renumbers all the levels with sequential integers; xyplot does not do this.
If you wanted to label the exact dates, you could try
dts<-unique(pippo$date)
xyplot(values~date| ind, pippo, horizontal=F, layout= c(2,2),
panel=panel.bwplot, box.width=20,
scales=list(x=list(at=dts)))
but that looks quite crowded in this particular example.