I'm trying to bin time data into 1 min intervals throughout the day. Based on the question "How to bin times from different days into time bins", I've managed to generate the breaks using the following:
breaks <- format(seq.POSIXt(from = strptime("00:00:00", format = "%H:%M:%S"), to = strptime("23:59:59", format = "%H:%M:%S"), by="1 min"), format = "%H:%M:%S")
However, once I try to use cut, NAs are introduced by coercion instead:
data$bins <- cut(data$TIME, breaks = breaks)
Warning message:
In sort.int(as.double(breaks)) : NAs introduced by coercion
I've used the chron package to convert my data (hh:mm:ss format) into a time object and other operations seem to function properly.
Thanks!
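A likely cause of the warning, sketched under assumptions: format() turns the breaks into character strings, and cut() then tries to coerce those strings with as.double(), which yields NA. The sketch below keeps both the times and the breaks as POSIXct on a single reference day. The sample `times` vector and the 1970-01-01 reference day are illustrative assumptions, not part of the original data; chron times would first need converting to "hh:mm:ss" strings.

```r
# Hypothetical sample times in "hh:mm:ss" form (assumption, not the OP's data)
times <- c("00:00:30", "00:01:15", "12:34:56", "23:59:59")

# Put every time on the same arbitrary reference day so only the clock time matters
ref_day <- "1970-01-01"
tm <- as.POSIXct(paste(ref_day, times), format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# 1441 break points: 00:00:00, 00:01:00, ..., 24:00:00 (the last one closes the final bin)
brks <- seq(as.POSIXct(paste(ref_day, "00:00:00"), tz = "UTC"),
            by = "1 min", length.out = 1441)

# Breaks stay POSIXct; only the labels are formatted as strings
bins <- cut(tm, breaks = brks,
            labels = format(brks[-1441], "%H:%M:%S", tz = "UTC"),
            right = FALSE)

as.character(bins)
```

Each bin is labelled by the start of its minute, and intervals are closed on the left, so 23:59:59 falls in the "23:59:00" bin.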
I have a table of maximum trip lengths by month which I am trying to graph in R.
When I graph it, the x-axis is not ordered by month; instead it is ordered alphabetically.
I'm just getting started in R, and I used the following code from one of the videos I watched, adjusted for my table names:
max_trips <- read.csv("max_and_min_trips.csv")
ggplot(data=max_trips)+
geom_point(mapping = aes(x=month,y=max_trip_duration))+
scale_x_month(month_labels = "%Y-%m")
The simple answer is that the data for your "month" column is stored as a vector of strings, not as a date. In R, this data type is called a "character" (or chr). You can confirm this by typing class(max_trips$month). The result is certainly "character" in your console. Therefore, your solution would be to (1) convert the data type to a date and (2) adjust the formatting of the date on the x axis using scale_x_date and/or related functions.
I'll demonstrate the process with a simple example dataset and plot. Here's the basic data frame and plot. You'll see, the plot is again arranged "alphabetically" instead of as expected if the mydf$dates values were stored as dates in "month/year" format.
library(ggplot2)
library(lubridate)
mydf <- data.frame(
dates = c("1/21", "2/20", "12/21", "3/19", "10/19", "9/19"),
yvals = c(13, 31, 14, 10, 20, 18))
ggplot(mydf, aes(x = dates, y = yvals)) + geom_point()
Convert to Date
To convert to a date, you can use a few different functions, but I find the lubridate package particularly useful here. The as_date() function will be used for the conversion; however, we cannot just apply as_date() directly to mydf$dates or we will get NA values and the following warning in the console:
> as_date(mydf$dates)
[1] NA NA NA NA NA NA
Warning message:
All formats failed to parse. No formats found.
Since there are so many ways to format data that correspond to dates, date-times, etc., we need to specify that our data is in "month/year" format. The other key here is that data set up as a date must specify year, month and day. Our data here specifies only month and year, so we first need to add an arbitrary "day" to each date before converting. Here's something that works:
mydf$dates <- as_date(
paste0("1/", mydf$dates), # need to add a "day" to correctly format for date
format = "%d/%m/%y" # nomenclature from strptime()
)
The paste0(...) function serves to add "1/" before each value in mydf$dates and then the format = argument specifies the character values should be read as "day/month/year". For more information on the nomenclature for formats of dates, see the help for the strptime() function here.
Now our column is in date format:
> mydf$dates
[1] "2021-01-01" "2020-02-01" "2021-12-01" "2019-03-01" "2019-10-01" "2019-09-01"
> class(mydf$dates)
[1] "Date"
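For comparison, the same conversion can be sketched in base R with as.Date(), which also takes a strptime()-style format. This is a minimal sketch that rebuilds the example strings in a separate vector named `raw_dates` (a name chosen here for illustration) so it runs on its own.

```r
# Same "month/year" strings as in the example data frame
raw_dates <- c("1/21", "2/20", "12/21", "3/19", "10/19", "9/19")

# Prepend an arbitrary day, then parse as day/month/two-digit year
parsed <- as.Date(paste0("1/", raw_dates), format = "%d/%m/%y")

class(parsed)         # "Date"
format(sort(parsed))  # chronological order, not alphabetical
```

Once the values are of class Date, sorting (and any date scale built on them) follows calendar order.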
Changing Date Scale
When plotting now, the data is organized in the proper order along a date scale x axis.
p <- ggplot(mydf, aes(x = dates, y = yvals)) + geom_point()
p
If the labeling isn't quite what you are looking for, you may check the documentation for the scale_x_date() function for some suggestions. The basic idea is to set up the breaks= argument in your scale and control how the breaks are labeled with date_labels=.
p + scale_x_date(breaks="4 months", date_labels = "%m/%y")
In the OP's case, I would suggest the following code should work:
library(ggplot2)
library(lubridate)
max_trips <- read.csv("max_and_min_trips.csv")
max_trips$month <- as_date(
paste0("1/", max_trips$month),
format = "%d/%m/%y")
ggplot(data=max_trips)+
geom_point(mapping = aes(x=month,y=max_trip_duration))+
scale_x_date(breaks = "1 month", date_labels = "%Y-%m")
I'm calculating the difference in seconds between two consecutive rows with the following code:
set.seed(79)
library(outbreaks)
library(lubridate)
# Import data
disease_df <- measles_hagelloch_1861[, 3, drop = FALSE]
# Generate a random time for each day
disease_df$time <- sample(1:86400, nrow(disease_df), replace = TRUE)
disease_df$time <- hms::as.hms(disease_df$time)
# Combine date and time
disease_df$time1 <- with(disease_df, ymd(date_of_prodrome) + hms(time))
# Sort data
disease_df <- disease_df[order(disease_df$time1), ]
# Difference in days between two consecutive rows
disease_df$diff <- as.numeric(difftime(disease_df$date_of_prodrome,
dplyr::lag(disease_df$date_of_prodrome, 1), units = 'days'))
# Difference in seconds between two consecutive rows
disease_df$diff1 <- as.numeric(difftime(disease_df$time1,
dplyr::lag(disease_df$time1, 1), units = 'secs'))
Here is the resulting data frame and the error message: longer object length is not a multiple of shorter object length.
Could you please explain why difftime works fine for days but results in error for seconds? Thank you so much!
time1 column is of type "POSIXlt". I am not really sure why difftime with units = 'secs' doesn't work but if you convert it to POSIXct, it works without any error.
disease_df$time1 <- as.POSIXct(disease_df$time1)
disease_df$diff1 <- as.numeric(difftime(disease_df$time1,
dplyr::lag(disease_df$time1, 1), units = 'secs'))
Apparently dplyr was not happy with the line dplyr::lag(disease_df$time1, 1) because of the class of disease_df$time1.
Converting it to POSIXct works, so just update this part of your code:
# Combine date and time and convert to POSIXct
disease_df$time1 <- as.POSIXct(with(disease_df, ymd(date_of_prodrome) + hms(time)))
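To illustrate the difference between the two classes, here is a small base R sketch. A POSIXlt value is stored as a list of components (seconds, minutes, hours, and so on), while POSIXct is a plain numeric vector of seconds since the epoch, which is why vector helpers tend to behave more predictably on POSIXct. The three timestamps are made up for illustration, and diff() is used here as one alternative to dplyr::lag(), not as the method from the question.

```r
# Hypothetical timestamps (assumption, not taken from the measles data)
t_lt <- as.POSIXlt(c("1861-11-01 08:00:00",
                     "1861-11-01 08:00:30",
                     "1861-11-02 08:00:30"), tz = "UTC")
t_ct <- as.POSIXct(t_lt)

is.list(unclass(t_lt))     # POSIXlt is a list of date-time components
is.numeric(unclass(t_ct))  # POSIXct is seconds since 1970-01-01 UTC

# Consecutive differences in seconds; the first row has no predecessor
diff_secs <- c(NA, as.numeric(diff(t_ct), units = "secs"))
diff_secs
```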
I would like to use ggplot to graph portions of time series data. For example, say I only wanted to graph the last five dates of this data. Is there a way to specify this in ggplot without subsetting the data ahead of time? I tried using xlim, but it didn't work.
date <- c("2016-03-24","2016-03-25","2016-03-26","2016-03-27","2016-03-28",
"2016-03-29","2016-03-30","2016-03-31","2016-04-01","2016-04-02")
Temp <- c(35,34,92,42,21,47,37,42,63,12)
df <- data.frame(date,Temp)
My attempt:
ggplot(df) + geom_line(aes(x=date,y=Temp)) + xlim("2016-03-29","2016-04-02")
My dates are formatted as POSIXct.
You have to pass the xlim values as Date or POSIXct objects, e.g. by wrapping them in as.Date() or as.POSIXct(). Is this what you want?
df$date <- as.Date(df$date, format= "%Y-%m-%d", tz = "UTC")
ggplot(df) + geom_line(aes(x=date,y=Temp)) +
xlim(as.Date(c("2016-03-29", "2016-04-02"), tz = "UTC", format = "%Y-%m-%d") )
PS: Be aware that you will get the following warning:
Warning message:
Removed 5 rows containing missing values (geom_path)
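The warning appears because xlim() drops observations outside the limits before drawing. A base R sketch of that bookkeeping, using the question's data and limits of 2016-03-29 to 2016-04-02 (the last five dates), might look like this; the names `lims` and `inside` are introduced here for illustration.

```r
# Dates from the question, stored as Date rather than character
date <- as.Date(c("2016-03-24", "2016-03-25", "2016-03-26", "2016-03-27", "2016-03-28",
                  "2016-03-29", "2016-03-30", "2016-03-31", "2016-04-01", "2016-04-02"))

# Limits covering the last five dates
lims <- as.Date(c("2016-03-29", "2016-04-02"))

# Rows inside the limits are kept; the rest are what the warning counts
inside <- date >= lims[1] & date <= lims[2]
sum(inside)   # rows that remain on the plot
sum(!inside)  # rows removed
```

Comparing Date values with Date limits works element-wise, which is the same reason the limits given to xlim() need to be Date (or POSIXct) objects rather than strings.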
I have a vector of asset returns with no dates attached to the rows.
Is there a method similar to chart.CumReturns from the PerformanceAnalytics package that does not require a time-based object (vector, data frame, etc.)? I do not have dates in the rows.
If you want to keep all the functionality of chart.CumReturns and appearance of plots generated by the function, you may create fake dates, convert the vector to a format that chart.CumReturns accepts (e.g. xts or zoo), and then plot using chart.CumReturns with the fake x axis removed. It seems that chart.CumReturns does not handle order.by = index(x), thus you need a 'real' date.
library(PerformanceAnalytics)
library(xts)
# an example vector (the edhec data set ships with PerformanceAnalytics)
data(edhec)
vec <- coredata(edhec)[ , "Funds of Funds"]
# create fake dates, e.g.:
date <- seq(Sys.Date(), by = "1 month", length.out = length(vec))
# convert to xts (or zoo) object
xt <- xts(x = vec, order.by = date)
# plot without fake x axis
chart.CumReturns(xt, main = "Cumulative Returns", xaxis = FALSE)
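If the look of chart.CumReturns is not essential, a plain base R sketch avoids dates altogether by compounding the returns and plotting against the observation index. This assumes geometric chaining of simple returns; the `returns` vector is made up for illustration.

```r
# Hypothetical simple returns per period (assumption, not real data)
returns <- c(0.01, -0.02, 0.03, 0.005)

# Geometric cumulative return: compound (1 + r) and subtract the starting value of 1
cum_returns <- cumprod(1 + returns) - 1

# Plot against the observation index instead of dates
plot(seq_along(cum_returns), cum_returns, type = "l",
     xlab = "Observation", ylab = "Cumulative return",
     main = "Cumulative Returns")
```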
I have data that looks like this:
"date", "sunrise"
2009-01-01, 05:31
2009-01-02, 05:31
2009-01-03, 05:33
2009-01-05, 05:34
....
2009-12-31, 05:29
and I want to plot this in R, with "date" as the x-axis, and "sunrise" as the y-axis.
You need to work a bit harder to get R to draw a suitable plot (i.e. get suitable axes). Say I have data similar to yours (here in a csv file for convenience):
"date","sunrise"
2009-01-01,05:31
2009-01-02,05:31
2009-01-03,05:33
2009-01-05,05:34
2009-01-06,05:35
2009-01-07,05:36
2009-01-08,05:37
2009-01-09,05:38
2009-01-10,05:39
2009-01-11,05:40
2009-01-12,05:40
2009-01-13,05:41
We can read the data in and format it appropriately so R knows the special nature of the data. The read.csv() call includes argument colClasses so R doesn't convert the dates/times into factors.
dat <- read.csv("foo.txt", colClasses = "character")
## Now convert the imported data to appropriate types
dat <- within(dat, {
date <- as.Date(date) ## no need for 'format' argument as data in correct format
sunrise <- as.POSIXct(sunrise, format = "%H:%M")
})
str(dat)
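One detail worth knowing: when only "%H:%M" is supplied, as.POSIXct() fills in the current date, so the stored values depend on the day the code is run. A minimal sketch that pins the times to a fixed reference day instead, and also derives minutes after midnight as a plain numeric alternative, could look like this. The 1970-01-01 reference day and the variable names are assumptions made for illustration.

```r
# A few sunrise times in the same "HH:MM" form as the data
sunrise <- c("05:31", "05:33", "05:40")

# Attach an arbitrary fixed day so the result does not depend on today's date
sunrise_ct <- as.POSIXct(paste("1970-01-01", sunrise),
                         format = "%Y-%m-%d %H:%M", tz = "UTC")

# With 1970-01-01 UTC as the reference day, seconds since the epoch are
# seconds after midnight, so dividing by 60 gives minutes after midnight
minutes_after_midnight <- as.numeric(sunrise_ct) / 60
minutes_after_midnight
```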
Now comes the slightly tricky bit as R gets the axes wrong (or perhaps better to say they aren't what we want) if you just do
plot(sunrise ~ date, data = dat)
## or
with(dat, plot(date, sunrise))
The first version gets both axes wrong, and the second can dispatch correctly on the dates so gets the x-axis correct, but the y-axis labels are not right.
So, suppress the plotting of the axes, and then add them yourself using axis.FOO functions where FOO is Date or POSIXct:
plot(sunrise ~ date, data = dat, axes = FALSE)
with(dat, axis.POSIXct(x = sunrise, side = 2, format = "%H:%M"))
with(dat, axis.Date(x = date, side = 1))
box() ## complete the plot frame
HTH
I think you can use the as.Date and as.POSIXct functions to convert the two columns to the proper types (the format parameter of as.POSIXct should be set to "%H:%M").
The standard plot function should then be able to deal with times and dates by itself.