I have a data frame with a series of times in the following format:
08:09:23.079
> class(timer3)
[1] "factor"
I would like to round/convert them to this format:
08:09
The end goal is to use them as x-axis values in a plot, so I assume they need to be converted to some kind of time class (zoo, as.Date, etc.).
Any suggestions?
Suppose we have this input data:
DF <- data.frame(times = c("08:09:23.079", "08:30:13.062"), values = 1:2)
To keep it simple, let's assume that there is at most one time point per minute (afterwards we show a slightly longer alternative without this restriction):
library(zoo)
library(chron)
# this assumes we want to store times to the second
tt <- times(as.character(DF$times))
z <- zoo(DF$values, tt)
plot(z, xaxt = "n")
# custom axis - assumes sufficiently many points to get reasonable graph
# round tick mark locations to the minute and remove the seconds from label
axt <- trunc(times(axTicks(1)), "min")
axis(1, at = axt, labels = sub(":..$", "", axt))
Alternatively, z can be created as shown below. This works whether or not there is more than one point per minute, because it aggregates the points to the minute:
# with this z, times are stored to the minute
z <- read.zoo(DF, FUN = function(x) trunc(times(as.character(x)), "min"),
aggregate = mean)
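If you would rather avoid zoo and chron, a rough base-R sketch of the same aggregate-to-the-minute idea could look like this. It assumes a dummy date is acceptable (as.POSIXct attaches the current date to a bare time), and the third row of the sample data is a hypothetical extra point in the 08:09 minute:

```r
# sample input; the third row is a hypothetical extra point in the 08:09 minute
DF <- data.frame(times = c("08:09:23.079", "08:30:13.062", "08:09:50.500"),
                 values = c(1, 2, 4))

# parse "HH:MM:SS.sss"; as.POSIXct attaches today's date as a dummy date
tt <- as.POSIXct(as.character(DF$times), format = "%H:%M:%OS", tz = "UTC")

# "%H:%M" drops the seconds, i.e. truncates each time to its minute
minute <- format(tt, "%H:%M")

# average the values that fall in the same minute
per_minute <- tapply(DF$values, minute, mean)
per_minute  # 08:09 -> 2.5, 08:30 -> 2
```

Formatting with "%H:%M" simply discards the seconds, so 08:09:50.5 lands in 08:09 rather than being rounded to 08:10.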
EDIT: plotting and truncation.
At the risk of being called a necromancer, I will answer this question, as I think this situation arises quite often.
Here is how to do it if you convert your time series data to xts format. The function to use here is align.time:
> head(GBPJPY)
GBPJPY.Open GBPJPY.High GBPJPY.Low GBPJPY.Close
2009-05-01 00:14:59 146.387 146.882 146.321 146.620
2009-05-01 00:29:54 146.623 146.641 146.434 146.579
2009-05-01 00:44:59 146.579 146.908 146.570 146.810
2009-05-01 00:59:59 146.810 146.842 146.030 146.130
2009-05-01 01:14:59 146.130 146.330 146.100 146.315
2009-05-01 01:29:57 146.315 146.382 146.159 146.201
> head(align.time(GBPJPY, 15*60))
GBPJPY.Open GBPJPY.High GBPJPY.Low GBPJPY.Close
2009-05-01 00:15:00 146.387 146.882 146.321 146.620
2009-05-01 00:30:00 146.623 146.641 146.434 146.579
2009-05-01 00:45:00 146.579 146.908 146.570 146.810
2009-05-01 01:00:00 146.810 146.842 146.030 146.130
2009-05-01 01:15:00 146.130 146.330 146.100 146.315
2009-05-01 01:30:00 146.315 146.382 146.159 146.201
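As the output shows, align.time moves each timestamp up to the next multiple of n seconds (here 15*60 seconds). For illustration only, a base-R sketch of that rounding-up step, with a hypothetical helper named ceiling_time and timestamps assumed to be in UTC, is:

```r
# hypothetical helper (not part of xts): round POSIXct times up to the next
# multiple of n seconds, counted from 1970-01-01 00:00:00 UTC
ceiling_time <- function(x, n) {
  secs <- as.numeric(x)  # seconds since the epoch
  as.POSIXct(ceiling(secs / n) * n, origin = "1970-01-01", tz = "UTC")
}

x <- as.POSIXct(c("2009-05-01 00:14:59", "2009-05-01 00:29:54",
                  "2009-05-01 01:29:57"), tz = "UTC")
aligned <- ceiling_time(x, 15 * 60)
format(aligned, "%H:%M:%S")
# [1] "00:15:00" "00:30:00" "01:30:00"
```

This sketch works on a plain POSIXct vector; a timestamp that already sits exactly on a boundary is left unchanged.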
as.zoo(substring(as.character(timer3), 1, 5))
or perhaps as.xts()?
Maybe looking at a bigger sample of your data would help.
Two steps: 1) factor to character with as.character(); 2) character to date-time with strptime(). Note that strptime() returns POSIXlt; wrap it in as.POSIXct() if you need POSIXct.
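A minimal sketch of those two steps, using the two sample times from the question as a stand-in for timer3 (the real factor is not shown), followed by formatting the result as HH:MM. Keep in mind that strptime() itself returns POSIXlt:

```r
# stand-in for the question's factor column (sample values only)
timer3 <- factor(c("08:09:23.079", "08:30:13.062"))

# 1) factor -> character; 2) character -> date-time (strptime gives POSIXlt)
lt <- strptime(as.character(timer3), format = "%H:%M:%OS", tz = "UTC")
ct <- as.POSIXct(lt)  # POSIXct is usually more convenient for plotting

format(ct, "%H:%M")
# [1] "08:09" "08:30"
```

The POSIXct values can go straight onto a plot's x-axis, while format() gives the HH:MM strings for labels.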
I have 24 hours of data starting at 7:30 today (for example) and running until 7:30 the next day. Because I didn't include the date in the line plot, R sorts the hours starting from 00:00, even though the data start at 7:30. I am a beginner in R and don't know where to begin solving this. Should I add the date to the x-axis as well, or is there a better solution?
My times() call also stopped working; it used to work when I was plotting data in 15-minute increments.
library(chron)
d <- read.csv(file="data.csv", header = T)
t <- times(d$Time)
plot(t,d$MCO2, type="l")
[Graph created from the 24-hour data]
[Graph created from 15-minute data using the same code]
I wanted the x-axis to run from 7:30 to 7:30 the next day, but it now shows decimal numbers from 0.0 to 1.
Here is the link to the data, just in case:
https://www.dropbox.com/s/wsg437gu00e5t08/Data%20210519.csv?dl=0
The question is really about combining a date column and a time column into a single timestamp containing both date and time. I suggest processing everything as if we are in the GMT time zone; you can pick whatever time zone you want, as long as you stick to it.
# use ggplot
library(ggplot2)
# assume everything happens in GMT timezone
Sys.setenv( TZ = "GMT" )
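# replicating the data: a measurement result sampled at 1 sec interval
# (start and end are assumed here: one day starting at the first timestamp shown below)
start <- as.POSIXct("2019-05-22 00:00:00", tz = "GMT")
end <- start + 24*60*60 - 1
t <- seq(start, end, by = "1 sec")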
Time24 <- trimws(strftime(t, format = "%k:%M:%OS", tz="GMT"))
Date <- strftime(t, format = "%d/%m/%Y", tz="GMT")
head(Time24)
head(Date)
d <- data.frame(Date, Time24)
# this is just a random data of temperature
d$temp <- rnorm(length(d$Date),mean=25,sd=5)
head(d)
# the resulting data is as follows
# Date Time24 temp
#1 22/05/2019 0:00:00 22.67185
#2 22/05/2019 0:00:01 19.91123
#3 22/05/2019 0:00:02 19.57393
#4 22/05/2019 0:00:03 15.37280
#5 22/05/2019 0:00:04 31.76683
#6 22/05/2019 0:00:05 26.75153
# this is the answer to the question
# which is combining the date and the time columns of the data
# note we still assume that this happens in GMT
d$timestamp <- as.POSIXct(paste(d$Date, d$Time24, sep = " "),
                          format = "%d/%m/%Y %H:%M:%OS", tz = "GMT")
# print the data into a plot
png(filename = "test.png", width = 800, height = 600, units = "px", pointsize = 22)
p <- ggplot(d, aes(x = timestamp, y = temp)) + geom_line() +
  scale_x_datetime(date_breaks = "3 hour",
                   date_labels = "%H:%M\n%d-%b")
print(p)  # print() is needed when this runs via source() or inside a function
dev.off()
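Combining the date with the time is what fixes the ordering problem from the question: once the date is part of the timestamp, a reading just after midnight comes after one just before it. A tiny check with two hypothetical readings:

```r
# two hypothetical readings either side of midnight, in the question's formats
Date <- c("22/05/2019", "23/05/2019")
Time24 <- c("23:59:59", "0:00:01")

# on time alone, "0:00:01" would come before "23:59:59";
# with the date included, the second reading is correctly later
stamp <- as.POSIXct(paste(Date, Time24), format = "%d/%m/%Y %H:%M:%S", tz = "GMT")
diff(stamp)  # Time difference of 2 secs
```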
The problem is that chron's times() does not store any information about the day, and your data span two days.
You need a data type that can hold the day as well. POSIXct is such a type, and since it is R's go-to date-time class, it is also much easier to plot.
Before plotting, the time column must reflect the correct day for each observation. If you simply convert the column with as.POSIXct, the times from day 2 are read as if they belong to day 1, which is why we add 24 hours to those entries.
After that, it is just a matter of plotting. I added a ggplot2 example since I prefer those plots.
You might notice that as.POSIXct attaches a date (today's date) to your times. Don't worry about this: it is only a dummy date. You don't use the date itself; you only need it to represent the difference in days.
library(ggplot2)
# Read in your data set
d <- read.csv(file="Data 210519.csv", header = TRUE)
# Read column into R date-time object
t <- as.POSIXct(d$Time24, format = "%H:%M:%OS")
# Add 24 hours to the times on day 2.
startOfDayTwo <- as.POSIXct("00:00:00", format = "%H:%M:%OS")
endOfDayTwo <- as.POSIXct("07:35:00", format = "%H:%M:%OS")
t[t >= startOfDayTwo & t <= endOfDayTwo] <- t[t >= startOfDayTwo & t <= endOfDayTwo] + 24*60*60
plot(t,d$MCO2, type="l")
# arguably a nicer plot
ggplot(d,aes(x=t,y=MCO2)) + geom_line() +
scale_x_datetime(date_breaks = "2 hour",
date_labels = "%I:%M %p")
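Hard-coding the end of day two (07:35:00) works for this particular file. A slightly more general sketch, assuming the rows are in recording order and consecutive readings are less than 24 hours apart, adds one day every time the clock time goes backwards. The clock times below are hypothetical:

```r
# hypothetical clock times spanning midnight, in recording order
clock <- c("07:30:00", "12:00:00", "23:59:00", "00:01:00", "07:30:00")

# parse with a fixed dummy date so the result does not depend on today's date
tm <- as.POSIXct(paste("2000-01-01", clock), tz = "GMT")

# each time the clock goes backwards we have crossed midnight: add a day
day_offset <- cumsum(c(0, diff(as.numeric(tm)) < 0))
tm <- tm + day_offset * 24 * 60 * 60

format(tm, "%d %H:%M")
# [1] "01 07:30" "01 12:00" "01 23:59" "02 00:01" "02 07:30"
```

The corrected times can then replace t in either of the plotting calls above.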
I have a set of F1 lap times in minutes and seconds. The seconds include tenths, hundredths and thousandths of a second, and I need to keep that precision because I am trying to graph the data, but R classes the columns as factors. I have tried to convert them with as.numeric(), but that does not seem to work.
Here is a data sample:
DriverId Q1 Q2 Q3
1 1:05.585 1:04.772 1:04.293
20 1:04.655 1:07.354 1:04.123
ggplot does not let me graph the data properly if it is not in the right format.
I have tried using chron and POSIXlt, to no avail; it seems that they do not support the mm:ss.000 format.
Help, please!
You can use strptime, where %OS represents the seconds and their fractional parts. Example:
x<-as.factor(c("1:04.293",'1:04.772','1:04.890'))
options(digits.secs=3)
x<- strptime(x, "%M:%OS")
Output:
"2018-01-13 00:01:04.293 CET" "2018-01-13 00:01:04.772 CET" "2018-01-13 00:01:04.890 CET"
Return only minutes and seconds (as character), for example for tick labels in a plot:
format(x,"%M:%OS3")
[1] "01:04.293" "01:04.772" "01:04.890"
So an example with plotting:
plot(x,c(1,2,3),xaxt='n')
at <- axTicks(1)
axis(1, at=at, labels=format(as.POSIXct(at,origin='1970-1-1'),"%M:%OS3"), las=1)
[Output: scatter plot with mm:ss.sss tick labels]
Hope this helps!
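We can create a times-class object with times() from chron:
library(chron)
library(tidyr)
library(dplyr)
# sample data from the question
df1 <- data.frame(DriverId = c(1, 20),
                  Q1 = c("1:05.585", "1:04.655"),
                  Q2 = c("1:04.772", "1:07.354"),
                  Q3 = c("1:04.293", "1:04.123"))
gather(df1, key, val, -DriverId) %>%
  mutate(val = chron::times(paste0("00:", val)))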
The code above converts the data to "long" format. If we don't need to reshape:
df1 %>%
  mutate(across(-DriverId, ~ chron::times(paste0("00:", .x))))
# DriverId Q1 Q2 Q3
#1 1 00:01:06 00:01:05 00:01:04
#2 20 00:01:05 00:01:07 00:01:04
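The chron output above is printed rounded to the whole second (1:05.585 appears as 00:01:06). If all you need is a numeric axis, a base-R sketch that converts "m:ss.sss" strings to seconds avoids date-time classes entirely; the helper name lap_to_seconds is hypothetical. It also shows why as.numeric() failed in the question: on a factor it returns the internal level codes, not the values, so convert to character first:

```r
# hypothetical helper: "m:ss.sss" lap time -> numeric seconds
lap_to_seconds <- function(x) {
  parts <- strsplit(as.character(x), ":", fixed = TRUE)  # factor -> character first
  vapply(parts, function(p) as.numeric(p[1]) * 60 + as.numeric(p[2]), numeric(1))
}

q1 <- factor(c("1:05.585", "1:04.655"))  # Q1 column from the sample data
as.numeric(q1)      # level codes 2 and 1, not lap times
lap_to_seconds(q1)  # 65.585 and 64.655 seconds
```

The resulting numbers keep the thousandths and can be plotted directly with base graphics or ggplot2.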
I have googled a lot, but I still can't figure this one out. I am trying to plot one column of a data frame against time; however, my dates are "hardcoded" (for lack of a better word) as the index of the data frame rather than stored as a separate DATE variable.
> head(tmp)[1]
This is what I get (the first column is RETURNS):
RETURNS
2010-01-13 00:00:00 0.8291384
2010-01-14 00:00:00 0.2423567
2010-01-15 00:00:00 -1.0882186
2010-01-19 00:00:00 1.2422194
2010-01-20 00:00:00 -1.0654438
2010-01-21 00:00:00 -1.9126605
If I plot it like this:
plot(tmp$RETURNS)
I get a plot of returns against the index from 1 to 1500 (the number of observations), not against time. If I had a separate date column, I would plot it like this and it would be fine:
plot(tmp$DATE, tmp$RETURNS)
However, I don't know how to extract the dates from that "hardcoded" index, if that makes sense. I tried converting the data to other objects (timeSeries, zoo, etc.), but that didn't help. I am sure there is some simple function for this; I just can't find it. Thanks for any help!
EDIT:
Thanks, everyone; your help is very much appreciated. All the answers are great; too bad I can't accept them all ;) Of course, rownames() was what I was looking for.
Reproducing your data (you should really have used dput to make life easier for us):
df <- as.data.frame(c(0.8291384, 0.2423567,-1.0882186, 1.2422194,-1.0654438,-1.9126605))
names(df) <- c("RETURNS")
rownames(df) <- c("2010-01-13 00:00:00", "2010-01-14 00:00:00", "2010-01-15 00:00:00", "2010-01-19 00:00:00","2010-01-20 00:00:00","2010-01-21 00:00:00")
df
RETURNS
2010-01-13 00:00:00 0.8291384
2010-01-14 00:00:00 0.2423567
2010-01-15 00:00:00 -1.0882186
2010-01-19 00:00:00 1.2422194
2010-01-20 00:00:00 -1.0654438
2010-01-21 00:00:00 -1.9126605
Cleaning up:
df$Date <- as.Date(rownames(df))
rownames(df) <- NULL
df
RETURNS Date
1 0.8291384 2010-01-13
2 0.2423567 2010-01-14
3 -1.0882186 2010-01-15
4 1.2422194 2010-01-19
5 -1.0654438 2010-01-20
6 -1.9126605 2010-01-21
Plotting:
plot(df$Date, df$RETURNS)
or
library(ggplot2)
ggplot(df, aes(x=Date, y=RETURNS)) + geom_point() + scale_x_date()
Assuming that the input is as in the Note below, we can use zoo to plot with classic graphics, ggplot2 and lattice as follows. We also show a base R solution and a variation of it at the end. Since the time of day is always 00:00:00, we use the "Date" class for the time index in all cases.
library(zoo)
z <- zoo(df$RETURNS, as.Date(rownames(df)))
plot(z)
library(ggplot2)
autoplot(z)
library(lattice)
xyplot(z)
# this one does not use any packages
df2 <- data.frame(time = as.Date(rownames(df)), RETURNS = df$RETURNS)
plot(RETURNS ~ time, df2)
# this also works using df2 just calculated
plot(df2)
Note: We assume the input is:
df <- data.frame(
RETURNS = c(0.8291384, 0.2423567,-1.0882186, 1.2422194,-1.0654438,-1.9126605),
row.names = c("2010-01-13 00:00:00", "2010-01-14 00:00:00", "2010-01-15 00:00:00",
"2010-01-19 00:00:00","2010-01-20 00:00:00","2010-01-21 00:00:00"))
I have a large file of measurements taken every 3 seconds (here is a small part of it). I created a time series:
library(zoo)
fmt<-"%d.%m.%y %H:%M:%S"
dat <- read.zoo("~/Institut/Monitor/UA_test.csv",header=TRUE,
dec=".",sep='\t',tz='',format=fmt,index=1)
Next, I need to convert it to a time series with a 3-minute interval, where each value is the average over that interval. What is the simplest way to do this?
Use aggregate.zoo:
aggregate(dat, as.POSIXct(cut(index(dat), "3 min")), mean)
Note that dec="." and index=1 are used by default in read.zoo so they could be omitted from the read.zoo line.
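The same cut()-based binning also works on a plain data frame without zoo. A minimal base-R sketch with simulated data (the series and the column name UA are stand-ins for the real file, which is not shown):

```r
set.seed(1)
# simulated 3-second measurements over 9 minutes (stand-in for the real file)
tm <- seq(as.POSIXct("2014-04-13 01:00:00", tz = "UTC"), by = 3, length.out = 180)
dat <- data.frame(time = tm, UA = rnorm(180, mean = 215))

# label each row with the start of its 3-minute bin, then average per bin
dat$bin <- cut(dat$time, "3 min")
res <- aggregate(UA ~ bin, data = dat, FUN = mean)
res
```

Each of the three bins here holds 60 readings; with real data containing gaps, empty bins simply do not appear in the result.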
You can use period.apply() from the xts package:
library(xts)
(x <- period.apply(dat, endpoints(dat, "minutes", 3), colMeans))  # colMeans returns one mean per column (UA, UB, UC)
# UA UB UC
#2014-04-13 00:59:57 209.1605 226.4110 213.7115
#2014-04-13 01:02:57 215.4467 226.7065 211.3325
#2014-04-13 01:05:57 216.6252 225.4948 214.1290
#2014-04-13 01:07:45 218.4633 219.5589 214.1325
The second argument to period.apply is a vector of the row numbers at which each 3-minute period ends; endpoints(dat, "minutes", 3) calculates it for you.
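To see what such an endpoints vector looks like, here is a rough base-R sketch of the same idea (an approximation for illustration, not xts's implementation): assign each row to a 3-minute clock block, record the index of the last row in each block with a leading 0, then apply a function to each run of rows. The timestamps and values are hypothetical:

```r
# hypothetical timestamps (3-second sampling with gaps), in time order
tm <- as.POSIXct("2014-04-13 00:58:00", tz = "UTC") + c(0, 3, 6, 117, 120, 300, 303)
v <- c(1, 2, 3, 4, 10, 20, 30)  # hypothetical measurements

# index of each row's 3-minute block, counted from the epoch (UTC)
block <- floor(as.numeric(tm) / 180)

# last row of each block, preceded by 0
ep <- c(0, which(diff(block) != 0), length(block))
ep  # 0 4 5 7

# apply mean to the rows between consecutive endpoints
block_means <- sapply(seq_len(length(ep) - 1), function(i) mean(v[(ep[i] + 1):ep[i + 1]]))
block_means  # 2.5 10 25
```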
If you want the timestamps to be "rounded" instead of being the last timestamp of each period, you can use align.time, which moves each timestamp up to the next multiple of n seconds. align.time requires that the object is an xts, so you'd have to convert to xts first.
xx <- as.xts(x)
align.time(xx, n=60*3)
# UA UB UC
#2014-04-13 01:00:00 209.1605 226.4110 213.7115
#2014-04-13 01:03:00 215.4467 226.7065 211.3325
#2014-04-13 01:06:00 216.6252 225.4948 214.1290
#2014-04-13 01:09:00 218.4633 219.5589 214.1325
I have a data file that needs to be averaged.
data<-data.frame(
Data=seq(
from=as.POSIXct("2014-04-01 00:00:00"),
to=as.POSIXct("2014-04-03 00:00:00"),
by ="5 min"
),
value=rnorm(577,0,1)
)
I need to find the average of "value" from 05:00:00 to 17:00:00 and then 17:00:00 to 05:00:00 (of the following day).
e.g. from 2014-04-01 05:00:00 to 2014-04-01 17:00:00 and from 2014-04-01 17:00:00 to 2014-04-02 05:00:00
The real data is not continuous and is missing several intervals. I can do it for the same day, but I don't know how to include the time from the following day.
Here's one strategy: use cut.POSIXt with breaks built by seq.POSIXt to create an interval factor, then use that factor to take the mean of each interval.
intervals<-cut(
data$Data,
breaks=seq(
as.POSIXct("2014-03-31 17:00:00"),
as.POSIXct("2014-04-03 5:00:00"),
by="12 hours"
)
)
means<-tapply(data$value, intervals, mean)
as.data.frame(means)
Here is a way:
hour <- as.numeric(strftime(data$Data, "%H"))
# use >= so that readings in the 05:xx and 17:xx hours are not dropped
day <- data[hour >= 5 & hour < 17, ]
night <- data[hour < 5 | hour >= 17, ]
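strftime returns a character vector, which is why it is nested inside as.numeric here. From there it is just indexing, e.g. mean(day$value) and mean(night$value). Note that this pools all days together (and all nights together) rather than giving one mean per 12-hour interval.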
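If you need a separate mean for each 12-hour interval, including nights that cross midnight, one option is to shift every timestamp back by 5 hours so that each 05:00 to 17:00 and 17:00 to 05:00 window falls on a single calendar date, then group by that date and the shift. A minimal sketch with a small hypothetical data set (timestamps assumed to be in UTC):

```r
# hypothetical readings around one day/night boundary
data <- data.frame(
  Data = as.POSIXct(c("2014-04-01 06:00", "2014-04-01 16:55",
                      "2014-04-01 18:00", "2014-04-02 04:55",
                      "2014-04-02 05:00"), tz = "UTC"),
  value = c(1, 3, 10, 20, 7)
)

# shift back 5 h: 05:00-17:00 becomes 00:00-12:00 and 17:00-05:00 becomes
# 12:00-24:00 of the same date, so the night keeps the date it started on
shifted <- data$Data - 5 * 60 * 60
hour <- as.numeric(format(shifted, "%H"))
data$period <- paste(format(shifted, "%Y-%m-%d"), ifelse(hour < 12, "day", "night"))

shift_means <- aggregate(value ~ period, data = data, FUN = mean)
shift_means  # 2014-04-01 day: 2, 2014-04-01 night: 15, 2014-04-02 day: 7
```

Because grouping only uses the timestamps that exist, missing intervals in the real data do not cause problems.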