R: plotting data.frame against time with "hardcoded" date column

I have googled a lot and still can't figure this one out. I am trying to plot one column of a data frame against time; however, my date column is "hardcoded" (for lack of a better word) as the index of the data frame rather than being a DATE column, a variable in its own right.
> head(tmp)[1]
this is what I get, the 1st column is Returns:
RETURNS
2010-01-13 00:00:00 0.8291384
2010-01-14 00:00:00 0.2423567
2010-01-15 00:00:00 -1.0882186
2010-01-19 00:00:00 1.2422194
2010-01-20 00:00:00 -1.0654438
2010-01-21 00:00:00 -1.9126605
If I plot it like:
plot(tmp$RETURNS)
I get a plot of returns against the index from 1 to 1500 (the number of observations), not against time. If I had a distinct time column I would plot it like this and it would be fine:
plot(tmp$DATE, tmp$RETURNS)
However, I don't know how to extract the date from that "hardcoded" date column, if that makes sense. I tried converting it to other objects (timeSeries, zoo, etc.), but that didn't help. I am sure there is some simple function for this; I just can't find it. Thanks for any help, guys.
EDIT:
Thanks guys, your help is very much appreciated. All the answers are great; too bad I can't accept them all ;) Of course, rownames was what I was looking for.

Reproducing your data (you should really have used dput to make life easier for us):
df <- as.data.frame(c(0.8291384, 0.2423567,-1.0882186, 1.2422194,-1.0654438,-1.9126605))
names(df) <- c("RETURNS")
rownames(df) <- c("2010-01-13 00:00:00", "2010-01-14 00:00:00", "2010-01-15 00:00:00", "2010-01-19 00:00:00","2010-01-20 00:00:00","2010-01-21 00:00:00")
df
RETURNS
2010-01-13 00:00:00 0.8291384
2010-01-14 00:00:00 0.2423567
2010-01-15 00:00:00 -1.0882186
2010-01-19 00:00:00 1.2422194
2010-01-20 00:00:00 -1.0654438
2010-01-21 00:00:00 -1.9126605
Cleaning up:
df$Date <- as.Date(rownames(df))
rownames(df) <- NULL
df
RETURNS Date
1 0.8291384 2010-01-13
2 0.2423567 2010-01-14
3 -1.0882186 2010-01-15
4 1.2422194 2010-01-19
5 -1.0654438 2010-01-20
6 -1.9126605 2010-01-21
Plotting:
plot(df$Date, df$RETURNS)
or
library(ggplot2)
ggplot(df, aes(x=Date, y=RETURNS)) + geom_point() + scale_x_date()
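If the time of day in the row names matters (in the sample it is always midnight), the same idea can be sketched with as.POSIXct instead of as.Date. This is only an illustration: the column name time and the UTC time zone are assumptions, not something stated in the question.

```r
# Rebuild the sample data with date-times stored as row names
df <- data.frame(
  RETURNS = c(0.8291384, 0.2423567, -1.0882186, 1.2422194, -1.0654438, -1.9126605),
  row.names = c("2010-01-13 00:00:00", "2010-01-14 00:00:00", "2010-01-15 00:00:00",
                "2010-01-19 00:00:00", "2010-01-20 00:00:00", "2010-01-21 00:00:00"))

# Parse the row names into a proper date-time column (tz = "UTC" is an assumption)
df$time <- as.POSIXct(rownames(df), format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Plot returns against the parsed time axis
plot(df$time, df$RETURNS, type = "l", xlab = "Time", ylab = "Returns")
```

Keeping the parsed values in an ordinary column means any plotting function that accepts an x vector can use them directly.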

Assuming that the input is as in the Note below then using zoo we can plot using classic graphics, ggplot2 and lattice as follows. We also show a base R solution at the end and a variation. Note that since the time is always 0 we used "Date" class for the time index in all cases.
library(zoo)
z <- zoo(df$RETURNS, as.Date(rownames(df)))
plot(z)
library(ggplot2)
autoplot(z)
library(lattice)
xyplot(z)
# this one does not use any packages
df2 <- data.frame(time = as.Date(rownames(df)), RETURNS = df$RETURNS)
plot(RETURNS ~ time, df2)
# this also works using df2 just calculated
plot(df2)
Note: We assume the input is:
df <- data.frame(
RETURNS = c(0.8291384, 0.2423567,-1.0882186, 1.2422194,-1.0654438,-1.9126605),
row.names = c("2010-01-13 00:00:00", "2010-01-14 00:00:00", "2010-01-15 00:00:00",
"2010-01-19 00:00:00","2010-01-20 00:00:00","2010-01-21 00:00:00"))

Related

Converting character to dates with hours and minutes

I'm having trouble converting character values into date (hour + minutes), I have the following codes:
start <- c("2022-01-10 9:35PM","2022-01-10 10:35PM")
end <- c("2022-01-11 7:00AM","2022-01-11 8:00AM")
dat <- data.frame(start,end)
These are all in character form. I would like to:
Convert all the datetimes into date format and into 24hr format like: "2022-01-10 9:35PM" into "2022-01-10 21:35",
and "2022-01-11 7:00AM" into "2022-01-11 7:00" because I would like to calculate the difference between the dates in hrs.
Also I would like to add an ID column with a specific ID, the desired data would like this:
ID <- c(101,101)
start <- c("2022-01-10 21:35","2022-01-10 22:35")
end <- c("2022-01-11 7:00","2022-01-11 8:00")
diff <- c(9,10) # I'm not sure how the calculations would turn out to be
dat <- data.frame(ID,start,end,diff)
I would appreciate all the help there is! Thanks!!!
You can use lubridate::ymd_hm. Don't use floor if you want the exact value.
library(dplyr)
library(lubridate)
dat %>%
mutate(ID = 101,
across(c(start, end), ymd_hm),
diff = floor(end - start))
start end ID diff
1 2022-01-10 21:35:00 2022-01-11 07:00:00 101 9 hours
2 2022-01-10 22:35:00 2022-01-11 08:00:00 101 9 hours
The base R approach with strptime is (note %I rather than %H, because %p is only honoured together with the 12-hour field):
strptime(dat$start, "%Y-%m-%d %I:%M%p")
[1] "2022-01-10 21:35:00 CET" "2022-01-10 22:35:00 CET"
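For the whole task in base R, a minimal sketch could look like the following. It assumes an English or C locale (so that AM/PM is recognised by %p) and uses UTC as an arbitrary time zone; the result name out is made up for illustration.

```r
start <- c("2022-01-10 9:35PM", "2022-01-10 10:35PM")
end   <- c("2022-01-11 7:00AM", "2022-01-11 8:00AM")
dat <- data.frame(start, end)

# %I is the 12-hour field that pairs with the AM/PM marker %p
fmt <- "%Y-%m-%d %I:%M%p"
start_t <- as.POSIXct(dat$start, format = fmt, tz = "UTC")
end_t   <- as.POSIXct(dat$end,   format = fmt, tz = "UTC")

out <- data.frame(
  ID    = 101,
  start = format(start_t, "%Y-%m-%d %H:%M"),   # 24-hour text form
  end   = format(end_t,   "%Y-%m-%d %H:%M"),
  # exact difference in hours as a plain number (no flooring)
  diff  = as.numeric(difftime(end_t, start_t, units = "hours"))
)
out
```

Passing units = "hours" to difftime fixes the unit explicitly, so the result does not depend on the unit R would otherwise pick automatically.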

Filling not observed observations

I want to make a time series with the frequency a date and time is observed. The raw data looked something like this:
dd-mm-yyyy hh:mm
28-2-2018 0:12
28-2-2018 11:16
28-2-2018 12:12
28-2-2018 13:22
28-2-2018 14:23
28-2-2018 14:14
28-2-2018 16:24
The date-time format is not one that R parses by default, so I had to adjust it:
extracted_times <- as.POSIXct(bedrijf.CSV$viewed_at, format = "%d-%m-%Y %H:%M")
I ordered the data with frequency in a table using the following code:
timeserieswithoutzeros <- table(extracted_times)
The data looks something like this now:
2018-02-28 00:11:00 2018-02-28 01:52:00 2018-02-28 03:38:00
1 2 5
2018-02-28 04:10:00 2018-02-28 04:40:00 2018-02-28 04:45:00
2 1 1
As you may see there are a lot of unobserved dates and times.
I want to add these unobserved dates and times with the frequency of 0.
I tried the complete function, but the error says it can't be used because I use as.POSIXct().
Any ideas?
As already mentioned in the comments by @eric-lecoutre, you can combine your observations with a sequence running from the earliest to the latest date (built with seq) and then subtract 1 from every count in the frequency table.
timeseriesWithzeros <- table(c(extracted_times, seq(min(extracted_times), max(extracted_times), "1 min")))-1
Maybe the following is what you want.
First, coerce the data to class "POSIXct" and create the sequence of all date/times between min and max in steps of 1 minute.
bedrijf.CSV$viewed_at <- as.POSIXct(bedrijf.CSV$viewed_at, format = "%d-%m-%Y %H:%M")
new <- seq(min(bedrijf.CSV$viewed_at),
max(bedrijf.CSV$viewed_at),
by = "1 mins")
tmp <- data.frame(viewed_at = new)
Now see if these values are in the original data.
tmp$viewed <- tmp$viewed_at %in% bedrijf.CSV$viewed_at
tbl <- xtabs(viewed ~ viewed_at, tmp)
sum(tbl != 0)
#[1] 7
Final clean up.
rm(new, tmp)
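Because %in% yields a logical, the table above records whether a minute was observed, not how many times. If the actual frequencies are needed, one possible sketch is to tabulate a factor whose levels are all minutes in the range. The sample vector viewed_at below is made up for illustration, and UTC is an assumed time zone.

```r
# Hypothetical sample: one minute observed twice, several minutes missing
viewed_at <- c("28-2-2018 0:12", "28-2-2018 0:12", "28-2-2018 0:14", "28-2-2018 0:17")
times <- as.POSIXct(viewed_at, format = "%d-%m-%Y %H:%M", tz = "UTC")

# Every minute between the first and last observation
all_minutes <- seq(min(times), max(times), by = "1 min")

# Use one explicit text format for both observations and levels
out_fmt <- "%Y-%m-%d %H:%M"
counts <- table(factor(format(times, out_fmt),
                       levels = format(all_minutes, out_fmt)))
counts
```

Levels with no observations are kept by table, so unobserved minutes appear with a count of 0.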

convert lap times from factor to Time with decimals in R

I have a set of F1 lap times in minutes and seconds. The seconds include tenths, hundredths and thousandths of a second, and I need to keep that precision because I am trying to graph this data, but R classes the column as a factor. I have tried converting it to numeric (as.numeric), but that does not seem to work.
here is a data sample:
DriverId Q1 Q2 Q3
1 1:05.585 1:04.772 1:04.293
20 1:04.655 1:07.354 1:04.123
ggplot does not let me graph the data properly if it is not in the right format.
I have tried using chron and POSIXlt to no avail, as it seems that they do not support the mm:ss.000 format.
Help please!
You can use strptime, where %OS represents the seconds and their fractional parts. Example:
x<-as.factor(c("1:04.293",'1:04.772','1:04.890'))
options(digits.secs=3)
x<- strptime(x, "%M:%OS")
Output:
"2018-01-13 00:01:04.293 CET" "2018-01-13 00:01:04.772 CET" "2018-01-13 00:01:04.890 CET"
Return only minutes and seconds (as character), for example for tick labels in a plot:
format(x,"%M:%OS3")
[1] "01:04.293" "01:04.772" "01:04.890"
So an example with plotting:
plot(x,c(1,2,3),xaxt='n')
at <- axTicks(1)
axis(1, at=at, labels=format(as.POSIXct(at,origin='1970-1-1'),"%M:%OS3"), las=1)
Hope this helps!
We can create a times class object with times from chron
library(chron)
library(tidyr)
library(dplyr)
gather(df1, key, val, -DriverId) %>%
mutate(val = chron::times(paste0("00:", val)))
In the above, we converted it to 'long' format. If we don't need to reshape, then
df1 %>%
mutate_at(vars(-DriverId), list(~ chron::times(paste0("00:", .)))) # list(~ ...) replaces the deprecated funs()
# DriverId Q1 Q2 Q3
#1 1 00:01:06 00:01:05 00:01:04
#2 20 00:01:05 00:01:07 00:01:04
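If only a numeric axis is needed for plotting, another option is to convert each lap time to seconds. The helper lap_to_seconds below is a hypothetical name introduced for this sketch, and it assumes every value has exactly the m:ss.sss shape shown in the question.

```r
# Hypothetical helper: "m:ss.sss" (character or factor) -> seconds as numeric
lap_to_seconds <- function(x) {
  parts <- strsplit(as.character(x), ":", fixed = TRUE)
  vapply(parts,
         function(p) as.numeric(p[1]) * 60 + as.numeric(p[2]),
         numeric(1))
}

q1 <- factor(c("1:05.585", "1:04.655"))
secs <- lap_to_seconds(q1)
secs
```

The as.character call matters: applying as.numeric directly to a factor returns its internal level codes rather than the values, which is why the asker's attempt did not work.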

Adding missing rows

The format of my excel data file is:
day value
01-01-2000 00:00:00 4
01-01-2000 00:01:00 3
01-01-2000 00:02:00 1
01-01-2000 00:04:00 1
I open my file with this:
ts = read.csv(file=pathfile, header=TRUE, sep=",")
How can I add additional rows with zero number in column “value” into the data frame. Output example:
day value
01-01-2000 00:00:00 4
01-01-2000 00:01:00 3
01-01-2000 00:02:00 1
01-01-2000 00:03:00 0
01-01-2000 00:04:00 1
This is now completely automated in the padr package. Takes only one line of code.
original <- data.frame(
day = as.POSIXct(c("01-01-2000 00:00:00",
"01-01-2000 00:01:00",
"01-01-2000 00:02:00",
"01-01-2000 00:04:00"), format="%m-%d-%Y %H:%M:%S"),
value = c(4, 3, 1, 1))
library(padr)
library(dplyr) # for the pipe operator
original %>% pad %>% fill_by_value(value)
See vignette("padr") or this blog post for its working.
I think this is a more general solution, which relies on creating a sequence of all timestamps, using that as the basis for a new data frame, and then filling in your original values in that df where applicable.
# convert original `day` to POSIX
ts$day <- as.POSIXct(ts$day, format="%m-%d-%Y %H:%M:%S", tz="GMT")
# generate a sequence of all minutes in a day
minAsNumeric <- 946684800 + seq(0, 60*60*24 - 60, by=60) # all minutes of your first day, starting at 2000-01-01 00:00:00 GMT
minAsPOSIX <- as.POSIXct(minAsNumeric, origin="1970-01-01", tz="GMT") # convert those minutes to POSIX
# build complete dataframe
newdata <- as.data.frame(minAsPOSIX)
newdata$value <- ts$value[match(newdata$minAsPOSIX, ts$day)] # exact match on timestamps; fill in original `value`s where present
newdata$value[is.na(newdata$value)] <- 0 # replace NAs with 0
Try:
ts = read.csv(file=pathfile, header=TRUE, sep=",", stringsAsFactors=F)
ts.tmp = rbind(ts,list("01-01-2000 00:03:00",0))
ts.out = ts.tmp[order(ts.tmp$day),]
Notice that you need to load the strings in the first column as character rather than factor, otherwise you will have issues with the rbind. To turn the day column back into a factor after that, just do:
ts.out$day = as.factor(ts.out$day)
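tidyr offers the handy complete function, which generates rows for implicitly missing data. In a second step I use replace_na to turn the NA values into 0. This assumes day has already been converted to POSIXct (for example with as.POSIXct), since seq.POSIXt needs date-time endpoints.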
ts%>%
tidyr::complete(day=seq.POSIXt(min(day), max(day), by="min"))%>%
dplyr::mutate(value=tidyr::replace_na(value,0))
Notice that I set the granularity of the dates to minutes since your dataset expects a row every minute.
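The same fill-the-gaps idea can also be sketched in base R with merge. The data frame name obs, the day-month-year reading of the dates, and the UTC time zone are all assumptions made for this example.

```r
obs <- data.frame(
  day = c("01-01-2000 00:00:00", "01-01-2000 00:01:00",
          "01-01-2000 00:02:00", "01-01-2000 00:04:00"),
  value = c(4, 3, 1, 1),
  stringsAsFactors = FALSE)

# Parse the text timestamps (format and time zone are assumptions)
obs$day <- as.POSIXct(obs$day, format = "%d-%m-%Y %H:%M:%S", tz = "UTC")

# One row per minute between the first and last observation
full <- data.frame(day = seq(min(obs$day), max(obs$day), by = "min"))

# Left join: minutes without an observation get NA, then 0
out <- merge(full, obs, by = "day", all.x = TRUE)
out$value[is.na(out$value)] <- 0
out
```

The all.x = TRUE argument keeps every generated minute, which is what turns the merge into a left join on the complete sequence.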

R Rounding a Time

I have a data frame with a series of times in the following format:
08:09:23.079
> class(timer3)
[1] "factor"
I would like to round/convert them to this format:
08:09
The end goal is to use them as values for the x-axis of a plot so I assume they would need to go to some type of time format (zoo, as.Date, etc.).
Any suggestions?
Suppose we have this input data:
DF <- data.frame(times = c("08:09:23.079", "08:30:13.062"), values = 1:2)
To keep it simple, let's assume that there is at most one time point per minute (afterwards we show a slightly longer alternative without this restriction):
library(zoo)
library(chron)
# this assumes we want to store times to the second
tt <- times(as.character(DF$times))
z <- zoo(DF$values, tt)
plot(z, xaxt = "n")
# custom axis - assumes sufficiently many points to get reasonable graph
# round tick mark locations to the minute and remove the seconds from label
axt <- trunc(times(axTicks(1)), "min")
axis(1, at = axt, lab = sub(":..$", "", axt))
The above method of creating z could alternately be replaced with this. It works whether or not there is more than one point per minute as it aggregates them to the minute:
# with this z we will store times to the minute
z <- read.zoo(DF, FUN = function(x) trunc(times(as.character(x)), "min"),
aggregate = mean)
EDIT: plotting and truncation.
At risk of being called necromancer, I will answer this question as I think this situation arises quite often.
Here is how to do it if you convert your timeseries data in xts format. The function to be used here is align.time
> head(GBPJPY)
GBPJPY.Open GBPJPY.High GBPJPY.Low GBPJPY.Close
2009-05-01 00:14:59 146.387 146.882 146.321 146.620
2009-05-01 00:29:54 146.623 146.641 146.434 146.579
2009-05-01 00:44:59 146.579 146.908 146.570 146.810
2009-05-01 00:59:59 146.810 146.842 146.030 146.130
2009-05-01 01:14:59 146.130 146.330 146.100 146.315
2009-05-01 01:29:57 146.315 146.382 146.159 146.201
> head(align.time(GBPJPY, 15*60))
GBPJPY.Open GBPJPY.High GBPJPY.Low GBPJPY.Close
2009-05-01 00:15:00 146.387 146.882 146.321 146.620
2009-05-01 00:30:00 146.623 146.641 146.434 146.579
2009-05-01 00:45:00 146.579 146.908 146.570 146.810
2009-05-01 01:00:00 146.810 146.842 146.030 146.130
2009-05-01 01:15:00 146.130 146.330 146.100 146.315
2009-05-01 01:30:00 146.315 146.382 146.159 146.201
as.zoo(sapply(timer3,substring,1,5))
or as.xts?
Maybe looking at a bigger sample of your data would help.
Two steps: 1) factor to character: as.character() 2) character to date-time: strptime() (this returns POSIXlt; wrap it in as.POSIXct() if you need POSIXct)
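Those two steps might be sketched in base R as follows. The sample values stand in for timer3, UTC is an assumed time zone, and because the strings carry no date, R fills in the current date.

```r
# Stand-in for the factor column from the question
timer3 <- factor(c("08:09:23.079", "08:30:13.062"))

# Step 1 and 2: factor -> character -> date-time (%OS reads fractional seconds)
tt <- as.POSIXct(as.character(timer3), format = "%H:%M:%OS", tz = "UTC")

# Drop the seconds by truncating to the minute, then format as HH:MM labels
tt_min <- trunc(tt, units = "mins")
labels <- format(tt_min, "%H:%M")
labels
```

Note that trunc always cuts down to the start of the minute; round(tt, "mins") could be used instead if times past the half minute should go up to the next minute.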
