I have a vector that contains time data, but there's a problem: some of the entries are listed as dates (e.g., 10/11/2017), while other entries are listed as dates with time (e.g., 12/15/2016 09:07:17). This is problematic for myself, since as.Date() can't recognize the time portion and enters dates in an odd format (0012-01-20), while seemingly adding dates with time entries as NA's. Furthermore, using as.POSIXct() doesn't work, since not all entries are a combination of date with time.
I suspect that, since these entries are entered in a consistent format, I could hypothetically use an if function to change the entries in the vector to a consistent format, such as using an if statement to remove time entirely, but I don't know enough about it to get it to work.
use
library(lubridate)
Name of the data frame or table-> x
the column that has date->Date
use the ymd function
x$newdate<-ydm(x$Date)
Related
Lets say I have 2 files, each with a date column, e.g. 2022-11-14 and a timestamp, e.g. 18:36 column. No prior information is given on timezone this information was taken from. In my code, I create a new column, date_time_X corresponding to the file # where I concatenate date and time paste(as.character(date), timestamp).
I've discovered that base R's POSIXct default timezone is system-specific while lubridate's ymd_hds timezone defaults to UTC, so now I'm defaulting to applying ymd_hds(paste(as.character(date), timestamp)) across both files to keep consistent timezones, since I will be subtracting date_time_1 from date_time_2 to get a time difference diff_time = date_time_2 - date_time_1 .
My question is, what is the best approach to handle datetime variables from these different files when timezone is unknown? Should data be read in as UTC? Is there a way to completely remove the timezone component? I don't want to assume the data was collected at my local time, but I'm also not sure if defaulting to UTC is an acceptable approach. Ideally, removing the timezone would be the best option, but I'm not sure if this is possible without leaving the variables as character columns.
I am trying to gather a couple columns of dates so that its easier for it to be choices in shiny. However, when I gather dates, it turns into for example, 2020/12/14 to 128284 format. I have tried as.Date, as.character, I have tried lubridating but it doesn't work. (I have been gathering in a separate script besides shiny). Please see my code when gathering.
Here is my data
before gather
df<-df%>%gather(key="date.type", value="dates",
date.1, date.2, date.3, date.4)
This turns it to something like this;
after gather
This becomes a problem when I am trying to find difference between two dates in Shiny(I have been using difftime).
The error I get in shiny is:
x character string is not in a standard unambiguous format
I am also thinking of not gathering at all, but allowing the user to choose the from date column and to date column in the UI, but I am not sure how to then find the difference in days between the from and to dates in the server.
mutate(theduration=difftime(input$to,input$from,units="days")
This doesn't work.
OK, so I had this problem when using gather to make a dataset. When you get those 5 digit time character blocks, try this:
mutate(time=as.Date(as.numeric(time),origin="1899-12-30"))
Apparently, that that 5 digit number is days since the origin date. It's a MS thing. Good Luck!
I have time series data that I'm trying to analyse in R. It was provided as a CSV from excel, which I subsequently read as a data.frame all. Let's say it has two columns: all$date and all$people, representing the count of people on a particular date. The frequency is hence daily.
Being from Excel, the dates are integers representing the number of days since 1900-01-01.
I could read the data as people = ts(all$people, start=c(all$date[1], 1), frequency=365); but that gives a silly start value of almost 40000 because the data starts in 2006. The start parameter doesn't take a date object, according to ?ts, so I can't just use as.Date():
ts - ...
start: the time of the first observation. Either a single number
or a vector of two integers, which specify a natural time unit and
a (1-based) number of samples into the time unit. See the examples
for the use of the second form.
I could of course set start=1, but it's a bit painful to figure out what season we're in when the plot tells me interesting things are happening around day 2100. (To be clear, setting frequency=365 does tell me what year we're in, but isn't useful more precise dates). Is there a useful way of expressing the date in ts in a human-readable form so that I don't have to keep calling as.Date() to understand when the interesting features are happening?
I have a vector of times in R, all_symbols$Time and I am trying to find out how to get JUST the times (or convert the times to strings without losing information). I use
strptime(all_symbol$Time[j], format="%H:%M:%S")
which for some reason assumes the date is today and returns
[1] "2013-10-18 09:34:16"
Date and time formatting in R is quite annoying. I am trying to get the time only without adding too many packages (really any--I am on a school computer where I cannot install libraries).
Once you use strptime you will of necessity get a date-time object and the default behavior for no date in the format string is to assume today's date. If you don't like that you will need to prepend a string that is the date of your choice.
#James' suggestion is equivalent to what I was going to suggest:
format(all_symbol$Time[j], format="%H:%M:%S")
The only package I know of that has time classes (i.e time of day with no associated date value) is package:chron. However I find that using format as a way to output character values from POSIXt objects lends itself well to functions that require factor input.
In the decade since this was written there is now a package named “hms” that has some sort of facility for hours, minutes, and seconds.
hms: Pretty Time of Day
Implements an S3 class for storing and formatting time-of-day values, based on the 'difftime' class.
Came across the same problem recently and found this and other posts R: How to handle times without dates? inspiring. I'd like to contribute a little for whoever has similar questions.
If you only want to you base R, take advantage of as.Date(..., format = ("...")) to transform your date into a standard format. Then, you can use substr to extract the time. e.g. substr("2013-10-01 01:23:45 UTC", 12, 16) gives you 01:23.
If you can use package lubridate, functions like mdy_hms will make life much easier. And substr works most of the time.
If you want to compare the time, it should work if they are in Date or POSIXt objects. If you only want the time part, maybe force it into numeric (you may need to transform it back later). e.g. as.numeric(hm("00:01")) gives 60, which means it's 60 seconds after 00:00:00. as.numeric(hm("23:59")) will give 86340.
Basically I want to know why as.Date(200322,format="%Y%W") gives me NA. While we are at it, I would appreciate any advice on a data structure for repeated cross-section (aka pseudo-panel) in R.
I did get aggregate() to (sort of) work, but it is not flexible enough - it misses data on columns when I omit the missed values, for example.
Specifically, I have a survey that is repeated weekly for a couple of years with a bunch of similar questions answers to which I would like to combine, average, condition and plot in both dimensions. Getting the date conversion right should presumably help me towards my goal with zoo package or something similar.
Any input is appreciated.
Update: thanks for string suggestion, but as you can see in your own example, %W part doesn't work - it only identifies the year while setting the current day while I need to set a specific week (and leave the day blank).
Use a string as first argument in as.Date() and select a specific weekday (format %w, value 0-6). There are seven possible dates in each week, therefore strptime needs more information to select a unique date. Otherwise the current day and month are returned.
> as.Date(paste("200947", "0", sep="-"), format="%Y%W-%w")
[1] "2009-11-22"