Removing date from %d/%m/%Y %H:%M in R

The R code I am working on is supposed to use data collected at five-minute intervals.
The data is saved in CSV format. However, because of inconsistencies in how the data was collected, the time column sometimes contains a full timestamp instead of just the time (dd/mm/yyyy HH:MM instead of HH:MM).
This causes an error in my system, because it reads the data as having several different values for the same time. Therefore, I would like to strip the date from the timestamp so that the code reads only the time value.
My failed attempt was:
as.Date(data[[1]],"%H:%M")
which gave me all NA values for the time column.
I have searched for similar questions on SO, but I did not manage to find a clear answer. Can anyone suggest some functions to use?
Any help is appreciated.

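You could just strip the date portion of the text and then use as.POSIXct to convert them all to a %H:%M timestamp. Because the strings carry no date, as.POSIXct typically fills in the current date, which is where 2014-06-02 in the output comes from. For example: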
x <- c("10:25","01/01/2014 10:30")
x <- gsub("^.+(\\d{2}:\\d{2})$","\\1",x)
as.POSIXct(x,format="%H:%M",tz="UTC")
#[1] "2014-06-02 10:25:00 UTC" "2014-06-02 10:30:00 UTC"
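If the goal is only to make identical times compare equal, you may not need a date-time object at all. A minimal sketch, assuming the column has been read as character: keep just the trailing HH:MM and, if a number is handier, convert it to minutes since midnight. The sample values and the variable names are illustrative, not taken from the question's data.

```r
# Illustrative values: some entries are "HH:MM", others "dd/mm/yyyy HH:MM".
x <- c("10:25", "01/01/2014 10:30", "02/01/2014 10:35")

# Drop everything up to the last whitespace; entries without a date have
# no space, so they do not match and are returned unchanged.
time_only <- sub("^.*\\s(\\d{2}:\\d{2})$", "\\1", x)

# Optional: minutes since midnight, a plain integer with no date attached.
hours <- as.integer(sub(":.*$", "", time_only))
mins  <- as.integer(sub("^.*:", "", time_only))
minutes_since_midnight <- hours * 60L + mins

print(time_only)
# [1] "10:25" "10:30" "10:35"
print(minutes_since_midnight)
# [1] 625 630 635
```

Keeping times as text or as minutes sidesteps the arbitrary date that as.POSIXct attaches.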

Related

How do I stop implicit date conversion when using ifelse with date time data? [duplicate]

This question already has answers here: How to prevent ifelse() from turning Date objects into numeric objects (7 answers).
I have a data frame with one column that is a series of dates, collected via a Google form. The date and time were collected separately: the date was entered by selecting a day from a calendar, and the time was entered manually. It should have been a 24-hour clock, but the field appears to have only checked that the hour and minute were in the correct range.
I read the file in from a .csv. I converted the date-time character field (as read from the .csv) to date-time format in a new variable using as.POSIXct(foo$When, tz="NZ", format="%Y-%m-%d %H:%M"). The dates and times were correctly constructed.
Except: I have some incorrect date/time entries in the original data. These have all been set to NA in the new field, as you would expect. For those that do include a time, I have been trying to fix them while still retaining the POSIXct format.
I have been unsuccessful.
Here is an example of the data I have, and what I have tried to do:
TestDataForHelp <- data.frame(OldDateTime =
c("2013-12-04 21:10", "2013-12-15 09:07", "2014-01-01 06:27",
"2014-11-02 21:15", "2014-11-07 23:00", "2015-01-04 21:42",
"201508-11-02 20:15", "201508-11-02 20:15", "2017-11-02"))
TestDataForHelp$ActualDateTime <-
as.POSIXct(TestDataForHelp$OldDateTime, tz="NZ", format="%Y-%m-%d %H:%M")
TestDataForHelp$FixedDateTime <-
ifelse(TestDataForHelp$OldDateTime=="201508-11-02 20:15",
as.POSIXct("2015-11-02 20:15", tz="NZ", format="%Y-%m-%d %H:%M"),
TestDataForHelp$ActualDateTime)
The new variable, FixedDateTime, does not have a POSIXct type. It has been implicitly converted to a numeric type. How can I retain the POSIXct format from ActualDateTime and not have the implicit type conversion?
Ideally, I would not create FixedDateTime at all, but rather put the corrected data into ActualDateTime. The ifelse() seems to be the part of the code causing the format to shift from POSIXct to numeric. If I do:
TestDataForHelp$CopiedDateTime <- TestDataForHelp$ActualDateTime
The new variable, that is simply a copy of the original, retains the POSIXct type.
The previous question linked in the comments relates to date values only, not date-time values. The data manipulation becomes more complicated when dealing with date-time values, given that mine also do not include seconds. The other difference is that the original variable contains a mix of date, date-time, and incorrect date-time values, whereas the values in that previous question were all of the same kind. It was unclear whether the non-uniform content of the variable was causing the problem.
Edit: I fixed the problem by fixing the strings before I converted them to dates. This removed the need to try to loop through the dates.
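The approach in the edit, repairing the strings before converting, can be sketched like this. It assumes, purely for illustration, that every malformed entry has exactly two stray digits after a four-digit year, as in "201508-11-02 20:15"; real data may need a different pattern.

```r
old <- c("2013-12-04 21:10", "201508-11-02 20:15", "2017-11-02")

# Remove two extra digits that follow the 4-digit year (assumed error pattern).
# Well-formed entries have "-" right after the year, so they do not match.
repaired <- sub("^(\\d{4})\\d{2}-", "\\1-", old)

# Convert once, after the strings are fixed, so no ifelse() is needed.
parsed <- as.POSIXct(repaired, tz = "NZ", format = "%Y-%m-%d %H:%M")

print(repaired)
# [1] "2013-12-04 21:10" "2015-11-02 20:15" "2017-11-02"
class(parsed)  # still POSIXct; the date-only entry parses to NA
```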
I can replicate the numeric result, but I can't fully explain why ifelse() returns a numeric. It is, however, calculating the values correctly for you: they are the seconds since the origin. Converting from numeric back to a date-time is easy enough if you know the origin, which is 1970-01-01 00:00 UTC. So I believe the following does the trick:
(Note, the first block is just what you already have)
TestDataForHelp$FixedDateTime <- ifelse(TestDataForHelp$OldDateTime=="201508-11-02 20:15",
as.POSIXct("2015-11-02 20:15", tz="NZ", format="%Y-%m-%d %H:%M"),
TestDataForHelp$ActualDateTime)
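# The numbers are seconds since 1970-01-01 00:00 UTC. A character origin is
# interpreted as UTC; tz = "NZ" only controls how the result is displayed.
TestDataForHelp$FixedDateTime <- as.POSIXct(TestDataForHelp$FixedDateTime,
                                            origin = "1970-01-01", tz = "NZ")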
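For context, ?ifelse documents that the result takes its attributes from test (here a plain logical vector), so the POSIXct class on yes and no is dropped and only the underlying seconds survive. An alternative that avoids the round trip, and writes the fix straight into ActualDateTime as the question wanted, is indexed assignment, which keeps the class. A minimal sketch on a subset of the question's data:

```r
TestDataForHelp <- data.frame(
  OldDateTime = c("2013-12-04 21:10", "201508-11-02 20:15", "2017-11-02"),
  stringsAsFactors = FALSE
)
TestDataForHelp$ActualDateTime <-
  as.POSIXct(TestDataForHelp$OldDateTime, tz = "NZ", format = "%Y-%m-%d %H:%M")

# Logical index of the rows to correct.
bad <- TestDataForHelp$OldDateTime == "201508-11-02 20:15"

# Subassignment on a POSIXct vector keeps its class and time zone.
TestDataForHelp$ActualDateTime[bad] <-
  as.POSIXct("2015-11-02 20:15", tz = "NZ", format = "%Y-%m-%d %H:%M")

class(TestDataForHelp$ActualDateTime)
# [1] "POSIXct" "POSIXt"
```

If you prefer a vectorised conditional, dplyr::if_else() and data.table::fifelse() keep the class of yes/no, but they are outside base R.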

Import separate date and time (hh:mm) excel columns, to use for time elapsed calculation

Newbie here, first post (please be gentle). I have been trying to resolve this for several hours, so I finally decided it was time to ask for advice.
I have a large spreadsheet that I am importing with readxl. It contains one column with a date (format dd/mm/yyyy) and several time columns in hh:mm format (screenshot of the Excel sheet not reproduced here).
Essentially I want to be able to import both time and date columns and combine them, so that I can then do some other calculations, like time elapsed.
If I let readxl guess the column types, it converts the times to POSIXct, but these then have a date in 1899 attached to them (screenshot of the R output not reproduced here).
If I force readxl to read the time column as numeric, I get a decimal (e.g. 0.315972222 for 07:35), which I then tried to convert using syntax similar to
format(as.POSIXct(Sys.Date() + 0.315972222), "%Y-%m-%d %H:%M:%S", tz="UTC")
i.e.
df$datetime <- format(as.POSIXct(df$date + df$time), "%Y-%m-%d %H:%M", tz="UTC")
which results in the correct date, but with a time of 00:00 rather than the time that was passed in.
I have searched here and found posts that are not quite the same question (e.g. Combining date and time columns into dd/mm/yyyy hh:mm), and I have read widely, including about lubridate, but as I'm only 6 months into R, I am finding some explanations a bit cryptic.
Suggestions or signposting appreciated (if there are solutions I haven't found).
If you subtract the number of days between the Excel origin (usually 1899-12-30) and 1970-01-01, which is 25569, and then multiply that shifted Excel numeric value by 86400 (seconds per day), you should come close to the number of seconds since the start of 1970. You could then convert to POSIXct with as.POSIXct(x, origin = "1970-01-01", tz = "UTC"). That does seem to be "the hard way", however.
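The same seconds-per-day arithmetic likely explains the 00:00 result in the question: readxl returns date cells as POSIXct, and adding a bare number to a POSIXct adds seconds, so 0.316 shifts the time by a fraction of a second. Multiplying the day fraction by 86400 first fixes that. A minimal sketch with a made-up data frame standing in for the readxl import (the column names date and time follow the question's code; the values are invented):

```r
# Stand-in for the readxl import: a POSIXct date at midnight UTC and the
# time column forced to numeric (fraction of a day).
df <- data.frame(
  date = as.POSIXct(c("2019-03-01", "2019-03-02"), tz = "UTC"),
  time = c(0.315972222, 0.5)
)

# Adding a number to POSIXct adds seconds, so convert the day fraction
# first; rounding to whole minutes absorbs floating-point error.
df$datetime <- df$date + round(df$time * 86400 / 60) * 60

format(df$datetime, "%Y-%m-%d %H:%M")
# [1] "2019-03-01 07:35" "2019-03-02 12:00"
```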
It would be far easier, and probably more accurate, to convert the date-times to YYYY-MM-DD HH:MM:SS format and then export as CSV to be imported into R as text. read.csv accepts "POSIXct" in its colClasses argument, although that doesn't handle separate date and time columns. For those, you would be advised to import them as character values and then paste the dates and times together. Then watch your format strings for as.POSIXct: the dd/mm/yyyy "format" would be specified by "%d/%m/%Y".
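The paste-then-parse route might look like this, assuming both columns have been read as character text; the values are invented. Once the combined column is POSIXct, elapsed time is a difftime in whatever units you request.

```r
dates <- c("01/03/2019", "02/03/2019")  # dd/mm/yyyy, as in the question
times <- c("07:35", "12:00")            # hh:mm

# Paste date and time into one string, then parse with a matching format.
start_end <- as.POSIXct(paste(dates, times),
                        format = "%d/%m/%Y %H:%M", tz = "UTC")

# Elapsed time between the two readings, in hours.
elapsed <- difftime(start_end[2], start_end[1], units = "hours")
as.numeric(elapsed)
# [1] 28.41667
```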

Converting from fctr to date format.

I am attempting to convert a column in my data set from fctr to date format. The current column has data formatted as follows: "01/01/14. 01:00 Am." Ideally, I would like to create one column for the day and another for the time. There are periods following the date and the time, which is another issue I am facing. So far I have attempted to use lubridate to create a new column, but I get the error "All formats failed to parse. No formats found." Any help would be greatly appreciated, thank you.
test <- fourteen %>%
mutate(When = mdy_hms(V3))
View(test)
If your date factor literally has levels that look like "01/01/14. 01:00 Am.", with two periods, a space between the first period and the hour digits, and a space between the minutes and the am/pm designation, and all the dates are in this format, then the following should work:
... mutate(When = as.POSIXct(V3, format="%m/%d/%y. %I:%M %p.")) ...
In particular, the following standalone testcase works fine:
as.POSIXct(factor("01/01/14. 01:00 Am."), format="%m/%d/%y. %I:%M %p.")
For more information on the format argument being used here, see the R help page for the function strftime. Note that the documentation says %p is meant to be used with %I (12-hour clock), not with %H.
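Since the question also asks for separate day and time columns, here is a sketch that parses once and then splits. It upper-cases the text so "Am"/"Pm" become "AM"/"PM", and it assumes an English (or C) locale, because %p is locale-dependent. The second sample value is invented to show a PM time.

```r
v3 <- factor(c("01/01/14. 01:00 Am.", "01/02/14. 11:30 Pm."))

# %I (12-hour clock) together with %p; literal periods match the input.
when <- as.POSIXct(toupper(as.character(v3)),
                   format = "%m/%d/%y. %I:%M %p.", tz = "UTC")

day         <- as.Date(when)          # calendar date
time_of_day <- format(when, "%H:%M")  # 24-hour time as text

data.frame(day, time_of_day)
```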

Using R for a Date format of 07-JUL-16 06.05.54.000000 AM

I have two date variables in a .csv file with values like "07-JUL-16 06.05.54.000000 AM". I want to use these in a regression model. Should I read them into a data frame as factors or as characters? How can I take the difference of the two dates in each case?
Read them in as characters (e.g. stringsAsFactors=FALSE or tidyverse functions), then use as.POSIXct, e.g.
as.POSIXct("07-JUL-16 06.05.54.000000 AM",format="%d-%b-%y %I.%M.%OS %p")
## [1] "2016-07-07 06:05:54 EDT"
(I'm assuming that you intend a day-month-year format rather than a year-month-day format, but I don't actually have any evidence to support that!)
Once you've done this, subtracting the values should just work (giving a difftime object), but be careful with units when converting to numeric: state them explicitly, e.g. difftime(..., units = "mins") or as.numeric(x, units = "mins").
For what it's worth, lubridate::ymd_hms thinks it can guess the format, but guesses wrong (assuming I guessed right above; with a two-digit year and no year values greater than 31, there's really nothing to distinguish years from days).
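A small sketch of the subtraction with explicit units, reusing the answer's format string. The second timestamp is invented, and %b and %p assume an English locale; tz = "UTC" is chosen only to avoid daylight-saving effects.

```r
fmt <- "%d-%b-%y %I.%M.%OS %p"

start <- as.POSIXct("07-JUL-16 06.05.54.000000 AM", format = fmt, tz = "UTC")
end   <- as.POSIXct("07-JUL-16 07.35.54.000000 PM", format = fmt, tz = "UTC")

# difftime() with explicit units avoids the automatic unit choice of `-`.
gap <- difftime(end, start, units = "mins")
as.numeric(gap)
# [1] 810
```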

How do I get my dates in standard unambiguous formats if they can't be recognized due to their ambiguous format?

I want to split up a timestamp from Excel into year and Julian day. I know, duplicate question, but combining everything I have found from other questions is not helping me.
The timestamp is formatted like 1/13/2011 13:55, so I wanted R to recognize it as a date-time. I have hours and minutes, so I tried as.POSIXct and as.POSIXlt, but these didn't work. I also tried adding strptime:
as.POSIXct(strptime(df$TIMESTAMP, "%d/%m/%Y %H:%M%S"))
I just got NAs.
Once I got R to recognize it as a date, I was going to use lubridate like day(df$Date).
It seems as though you have month and day reversed: 1/13/2011 13:55 parsed with "%d/%m/%Y %H:%M%S" corresponds to the 1st day of the 13th month, which is probably why you're getting NAs. (The format also asks for seconds, %S, which your timestamps don't have.) This seems to work for me:
a <- "01/13/2011 13:55"
t <- strptime(a, "%m/%d/%Y %H:%M")
t
# "2011-01-13 13:55:00"
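For the question's actual goal, year and Julian day, the parsed value can be formatted with %Y and %j (day of the year, 001 to 366). A sketch; the second timestamp is invented, and tz = "UTC" is chosen only to avoid daylight-saving surprises.

```r
stamps <- c("1/13/2011 13:55", "12/31/2011 08:00")

# Month first, then day; no seconds in the input.
parsed <- strptime(stamps, format = "%m/%d/%Y %H:%M", tz = "UTC")

year       <- as.integer(format(parsed, "%Y"))
julian_day <- as.integer(format(parsed, "%j"))

data.frame(stamps, year, julian_day)
```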
