How, in R, to plot time of day versus calendar date

I regret that I come to post here after being burned-out on hours of Internet searching regarding this simplistic question.
I have several data sets to plot in R, each consisting of two columns of data: time, date. I am using R 2.11.0 on a Windows computer, via the Rgui.
Time is "time of day" that an event is observed. As an example, it is recognized as:
Factor w/ 87 levels "5:53","5:54",..: 84 85 85 85 86 ...
Date is calendar date, recognized as:
Class 'Date' num [1:730] 13879 13880 13881 13882 13883 ...
The time values are recorded in the format of a 24-hr clock, h:mm or hh:mm. The date values are displayed yyyy-mm-dd.
I want to plot time (y-axis) vs. date (x-axis).
Using
plot(date,time)
gives an accurate-looking plot, but the y-axis is labeled as the numeric factor values (about 0 to 90), rather than the desired, temporally-ordered levels of the factor variable. The x-axis is labeled in the desired, human-readable format.
How can I correct this? Is there a "time of day" format in R that I can convert my "time" variable into? I would subsequently like to do arithmetic on the time values as well, and would not mind having to carry one column of values to use in plotting and one column of values for maths.
I ran across several examples online of manipulation of (date + time) variables in R, and converting those to different formats. I do not believe this is my problem, as I have separate fields for time and date and want to plot one against the other.
My thanks to you in advance for your suggestions, or your directions to a web-accessible resource (no appropriate libraries or bookstores at my location).

There may be an easier way to do this, but you can always label the y-axis yourself. Adjust the ticksAt vector below to find something that looks suitable for your data.
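# stringsAsFactors=TRUE keeps time a factor (as in the question) on R >= 4.0
Data <- data.frame(date=Sys.Date()+1:10,time=paste(5,41:50,sep=":"),stringsAsFactors=TRUE)
with(Data, plot(date,time,yaxt="n"))
ticksAt <- c(1,3,5,7,9)
# the plotted y values are factor codes, so label them with the matching levels
axis(2, at=ticksAt, labels=levels(Data$time)[ticksAt])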
?plot.zoo has some good examples of how to create pretty axis annotations, though some of them may be zoo-specific. ?par is also a good resource.
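If arithmetic on the times is also needed, one option is to carry a numeric column next to the factor. The following is a minimal sketch rather than a definitive solution: it assumes the times are always "h:mm" or "hh:mm" strings, and the sample data and the column name minutes are made up for illustration.

```r
# Hypothetical sample data standing in for the question's two columns
Data <- data.frame(
  date = as.Date("2008-01-01") + 0:9,
  time = factor(paste(5, 41:50, sep = ":"))
)

# Split "h:mm" into hour and minute parts and convert to minutes since midnight
parts <- strsplit(as.character(Data$time), ":", fixed = TRUE)
Data$minutes <- sapply(parts, function(p) 60 * as.numeric(p[1]) + as.numeric(p[2]))

# Plot the numeric column, then label the y-axis ticks as h:mm
plot(Data$date, Data$minutes, yaxt = "n", xlab = "date", ylab = "time of day")
ticks <- as.integer(round(pretty(Data$minutes)))
axis(2, at = ticks, labels = sprintf("%d:%02d", ticks %/% 60L, ticks %% 60L))
```

The factor column stays available for display, while minutes can be used for differences, means and other maths.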

ts and timeSeries are two good choices.
Take a look at Related

Let's assume you have two vectors, one of Date class named "dt" and the other a factor named "tm":
x <- paste(as.character(dt[1:2]), as.character(tm[1:2]))
strptime(x, "%Y-%m-%d %H:%M")
## [1] "2008-01-01 05:53:00" "2008-01-02 05:54:00"
class(strptime(x, "%Y-%m-%d %H:%M"))
## [1] "POSIXt" "POSIXlt"
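Since the question asks for a time-of-day value that supports arithmetic, a variation on the idea above is to attach every time to the same dummy date. This is only a sketch under assumptions: the dummy date "1970-01-01", the UTC time zone and the sample values of tm are illustrative choices, not part of the original data.

```r
# Hypothetical time-of-day factor in the question's h:mm format
tm <- factor(c("5:53", "5:54", "18:07"))

# Attach every time to the same dummy date so only the time of day varies
tod <- as.POSIXct(paste("1970-01-01", as.character(tm)),
                  format = "%Y-%m-%d %H:%M", tz = "UTC")

# Arithmetic now works on the time values
gap <- difftime(tod[3], tod[1], units = "mins")  # elapsed minutes between two times
avg <- format(mean(tod), "%H:%M")                # mean time of day as text
```

Because tod is a POSIXct vector, it can also be plotted against the Date column, with the y-axis labelled via axis.POSIXct if the default labels are not suitable.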

Related

Comparing dates in a dataframe and appending info based on comparison result in R

so I am lost with the following problem:
I have a dataframe in which one column (STARTED) contains the starting time of a survey, and several others contain information on the survey schedule of that survey participant (D5 to D10: only the planned survey dates, D17 to D50: planned send-out times of measurement per day). I'd like to create two columns that indicate which survey day (1-6) and which measurement per day (1-6) this survey corresponds to.
First problem is the format (!)...
STARTED has the format %Y-%m-%d %H:%M:%S, D5 to D10 %d.%m.%Y and D17 to D50 %d.%m.%Y %H:%M.
I tried dmy_hms() from lubridate, parse_date_time(), and simply as.POSIXct(), but I always fail to get STARTED and the D17 to D50 section into a comparable format. Any solutions on this one?
After just separating STARTED into date & time columns, I was able to compare using ifelse() with D5 to D10 and to create the column of day running from 1 to 6.
This might be already more elegant with something like which(), but I was not able to create a vectorized version of this, as which(<<D5:D10>> == STARTED) would need to compare that per row. Does anyone have a solution for this?
And lastly, how on earth can I set up the second column indicating the measurement time? The first and last survey of the day are easy, as they are also uniquely labelled, but for the other four I would need to compare per day whether the starting time is before the planned survey time of the following survey. I could imagine just checking whether STARTED falls in between two adjacent planned survey times; with POSIXct objects that might work, if I can parse the different formats.
Help is greatly appreciated, thanks!
A screenshot from the beginning of the data:
Screenshot from R data using View()
For these first few rows, the intended variable day would need to be c(1,2,1,1,1,2,2) and measurement c(3,2,4,2,1,2,3).
Your other columns are not formatted with %d.%m.%Y, but instead with either %d.%m.%y (date only) or %d.%m.%y %H:%M. Note the change from %Y to %y.
Try:
as.Date("20.05.22", format = "%d.%m.%y")
# [1] "2022-05-20"
as.POSIXct("20.05.22 06:00", format = "%d.%m.%y %H:%M")
# [1] "2022-05-20 06:00:00 EDT"
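Once everything is parsed, the day and measurement lookups from the question can be sketched with match() and findInterval(). This is a hedged illustration for a single participant: the variable names and values below are hypothetical, and the rule "count the planned send-out times at or before STARTED" is an assumption about what the measurement number should mean.

```r
# Hypothetical values in the three formats described above
started       <- as.POSIXct("2022-05-20 10:15:00",
                            format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
planned_days  <- as.Date(c("20.05.22", "21.05.22"), format = "%d.%m.%y")
planned_times <- as.POSIXct(c("20.05.22 06:00", "20.05.22 09:00", "20.05.22 12:00"),
                            format = "%d.%m.%y %H:%M", tz = "UTC")

# Survey day: position of the start date among the planned dates
day <- match(as.Date(started, tz = "UTC"), planned_days)

# Measurement: number of planned send-out times at or before the start time
# (findInterval requires planned_times to be sorted in increasing order)
measurement <- findInterval(as.numeric(started), as.numeric(planned_times))
```

For a whole data frame the same two calls would have to be applied row by row, since each participant has their own planned dates and times.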

R date variable operation

I have a string/character variable that contains a calendar date, e.g.,
x <- "2018-10-31"
I also have a variable y that contains a length of time, say 200 days.
y <- 200
How do I find out the calendar date for x + y?
I am not familiar with date type in R and struggle with how to approach this.
An add-on question: would this calculation be different if y = 4.3 months? Of course I can convert this into days, though I wonder if there is a more direct way to handle months without converting.
You could utilise the lubridate package, which is specifically designed for handling date time data.
library(lubridate)
x <- ymd("2018-10-31")
x + days(200)
# [1] "2019-05-19"
lubridate works with 'period' objects, which require integers, so you would need to convert "4.3" months into something interpretable beforehand. "4.3" doesn't mean anything concrete in terms of date-time calculation anyways.
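For comparison, the same day arithmetic can be sketched in base R, where adding a number to a Date adds that many days. The fractional-month part rests on a loudly stated assumption: a month is treated as 30.4375 days on average (365.25 / 12), which is one possible convention and not something R or lubridate defines.

```r
x <- as.Date("2018-10-31")   # character string to Date
y <- 200

x + y                        # Date arithmetic is in days

# Assumption: treat a month as 30.4375 days on average (365.25 / 12)
approx_days <- round(4.3 * 30.4375)
x + approx_days
```

Whether that convention is acceptable depends on what "4.3 months" is meant to represent in the data.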

How Can I Create a Distribution Visualization for an Average Day?

In R, my dataframe ("sampledata") looks like this:
The timestamp column is POSIXct, format: "2018-10-01 00:03:23"
The state column is Factor w/ 3 levels "AVAILABLE", "MUST_NOT_RUN", "MUST_RUN"
There are 6 unique device_id. The timestamps for each device are not the same, meaning data was not always collected at the same minute for each device. In some cases, there are multiple records per minute for the same device.
I want to transform the data into a visualization that shows distribution of "state" across a "typical" day. Ideally, something like this:
I've tried to count each occurrence of "state" grouped by timestamp minutes but failed (Error: can't sum factors). I've been trying to use ggplot and geom_area for the visualization, but believe I need to restructure my data before it will work. Very new to R (obviously). Happy to read any tutorials or links provided as background and appreciate any help you can provide. Thanks!
Other information that may/may not be helpful:
There are a handful of columns in the dataframe not shown.
223,446 entries between 10/2/18 - 11/8/18.
You can take the hours from the timestamps and then compute proportions of your states by hour:
library(ggplot2)
library(plyr)
# get hours from timestamp
obj$hour <- as.POSIXlt(obj$timestamp)$hour
# get average state proportions per hour
plot_obj <- ddply(obj, .(hour),  # take data.frame "obj" and group by "hour"
                  function(x) with(x,
                    data.frame(100*table(state)/length(state))))
ggplot(plot_obj, aes(x=hour, y=Freq, fill=state)) +
  geom_area()
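The same per-hour proportions can also be sketched without plyr, using table() and prop.table() from base R. The data below is randomly generated to stand in for the question's sampledata, so the numbers themselves mean nothing; only the column names timestamp and state come from the question.

```r
# Hypothetical stand-in for the question's data: 500 random records over 3 days
set.seed(1)
sampledata <- data.frame(
  timestamp = as.POSIXct("2018-10-02 00:00:00", tz = "UTC") +
    sort(sample(0:(3 * 86400 - 1), 500)),
  state = factor(sample(c("AVAILABLE", "MUST_NOT_RUN", "MUST_RUN"),
                        500, replace = TRUE))
)

# Hour of day for each record
sampledata$hour <- as.POSIXlt(sampledata$timestamp)$hour

# Count states per hour, then convert each hour's counts to percentages
share <- prop.table(table(hour = sampledata$hour, state = sampledata$state),
                    margin = 1) * 100

# Long format (hour, state, Freq), the shape geom_area expects
plot_df <- as.data.frame(share)
plot_df$hour <- as.numeric(as.character(plot_df$hour))
```

Counting with table() sidesteps the "can't sum factors" error, because factors are tabulated rather than summed.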

R - Datetimes with ggplot

What is the correct way to deal with datetimes in ggplot ?
I have data at several different dates and I would like to facet each date by the same time of day, e.g. between 1:30PM and 1:35PM, and plot the points between this time frame, how can I achieve this?
My data looks like:
datetime col1
2015-01-02 00:00:01 20
... ...
2015-01-02 11:59:59 34
2015-02-19 00:00:03 12
... ...
2015-02-19 11:59:58 27
I find myself often wanting to ggplot time series using datetime objects as the x-axis but I don't know how to use times only when dates aren't of interest.
The lubridate package will do the trick. There are commands you could use, specifically floor_date or ceiling_date to transform your datetime array.
I always use the chron package for times. Its times class disregards dates and stores a time numerically as a fraction of a day (e.g. 1:30PM is stored as 0.5625, because 13.5 of the day's 24 hours have passed). That allows you to perform math on times, which is great for a lot of reasons, including calculating average time, the time between two points, etc.
For specific help with your plot you'll need to share a sample data frame in an easily copy-able format, and show the code you've tried so far.
This is a question I'd asked previously regarding the chron package, and it also gives an idea of how to share your data/ask a question that's easier for folks to reproduce and therefore answer:
Clear labeling of times class data on horizontal barplot/geom_segment
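As a rough illustration of using times only, the time of day can be separated from the date in base R and then used for filtering. This is a sketch with made-up rows; the column names date and secs are hypothetical, and the ggplot2 call is left as a comment because it assumes that package is installed.

```r
# Hypothetical readings on two different dates
dat <- data.frame(
  datetime = as.POSIXct(c("2015-01-02 13:29:59", "2015-01-02 13:31:10",
                          "2015-02-19 13:34:00", "2015-02-19 13:36:00"),
                        tz = "UTC"),
  col1 = c(20, 34, 12, 27)
)

# Split each datetime into a calendar date and seconds since midnight
lt <- as.POSIXlt(dat$datetime)
dat$date <- as.Date(dat$datetime, tz = "UTC")
dat$secs <- lt$hour * 3600 + lt$min * 60 + lt$sec

# Keep only rows between 13:30:00 and 13:35:00, whatever the date
in_window <- dat[dat$secs >= 13.5 * 3600 & dat$secs <= 13 * 3600 + 35 * 60, ]

# With ggplot2 installed, one panel per date could then be drawn with:
# ggplot(in_window, aes(secs, col1)) + geom_point() + facet_wrap(~ date)
```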

How to produce a scatter plot of dates vs magnitudes in R?

This is what I have done so far, but it's wrong.
earthquakes<- c(6.6,6.8,8.4)
dates <- (13/02/2001 ,28/02/2001,23/06/2001)
plot(earthquakes,dates)
I have only started learning R. Please help.
earthquakes<- c(6.6,6.8,8.4)
dates <- as.Date(c("13/02/2001", "28/02/2001", "23/06/2001"), format="%d/%m/%Y")
plot(dates, earthquakes)
You had a few issues:
Dates should be in quotes; otherwise R will think you're trying to do arithmetic (i.e. 13 divided by 02 divided by 2001).
To convert dates to actual date objects, use as.Date, pass a vector of dates (this is the c(...) part), and then specify the format that they are in so that R knows what to do with the strings.
You had x and y swapped.
Note, the as.Date step is not strictly necessary, but if you don't do that, then the x axis of the plot will plot every item equidistant, irrespective of how far apart the dates actually are in time.
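A quick way to see why real Date objects matter is to look at the gaps between them, and to check that the format string actually matched. A small sketch using the same three dates (the names gaps and bad are only for illustration):

```r
dates <- as.Date(c("13/02/2001", "28/02/2001", "23/06/2001"), format = "%d/%m/%Y")

# Days between consecutive earthquakes: the spacing the x-axis will reflect
gaps <- diff(dates)

# A format that does not match the strings yields NA instead of an error,
# so it is worth checking for NA after converting
bad <- as.Date("13/02/2001", format = "%m/%d/%Y")
anyNA(c(dates, bad))
```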
