Read csv file using read.csv() without losing milliseconds - r

I have a CSV file with a timestamp column. The timestamps are in the format %Y-%m-%d %H:%M:%OS4, that is, the seconds also carry a fractional (milliseconds) part of 4 digits. When I read this CSV using read.csv(), I do not get the milliseconds, only the time down to the second, in character format. How can I read the milliseconds as well?
Edit to add the required data and code:
mtc_data = read.csv("path/to/csv")
Notepad.pw link to data

After reading the file with read.csv (where you may want to set stringsAsFactors=FALSE), convert the column with as.POSIXct, using %OS for the seconds in the format string; on input, %OS also reads the fractional part. The milliseconds are stored internally even though they are not printed by default. strftime can display them, but its result is a "character" vector, no longer "POSIXct". It is also safer to apply trimws first to strip stray whitespace left over from reading the file.
dat <- read.csv("V:/R/_data/yourData.csv", stringsAsFactors=FALSE)
(x <- as.POSIXct(trimws(dat$timestamp), format="%Y-%m-%d %H:%M:%OS"))
# [1] "2018-11-20 00:00:00 CET" "2018-11-20 00:00:05 CET" "2018-11-20 00:00:07 CET"
x2 <- strftime(x, format="%Y-%m-%d %H:%M:%OS6")
x2
# [1] "2018-11-20 00:00:00.000000" "2018-11-20 00:00:05.058399" "2018-11-20 00:00:07.540699"
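As a self-contained check of the approach above, the following sketch parses two made-up timestamp strings (stand-ins for the real column, which is not shown here) and confirms that the fractional seconds survive parsing. The sample values and tz = "UTC" are assumptions, chosen only to keep the result independent of the local time zone.

```r
# Hypothetical sample values; the second has stray whitespace on purpose
raw <- c("2018-11-20 00:00:00.0000", " 2018-11-20 00:00:05.0584 ")

# On input, %OS reads the seconds including any fractional part
x <- as.POSIXct(trimws(raw), format = "%Y-%m-%d %H:%M:%OS", tz = "UTC")

# The fraction is kept in the underlying number even if printing hides it
frac <- as.numeric(x) %% 1

# Formatting with %OS4 yields a character vector with 4 decimals;
# the last digit may be truncated rather than rounded
shown <- format(x, "%Y-%m-%d %H:%M:%OS4")
```

Keeping x as POSIXct for calculations and formatting only for display avoids losing the time class too early.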

Related

R common character to date converter for multiple formats

I am working with an input file that contains date strings in different month, day, year orders.
Example input:
input <- c("2014-08-31 23:59:38" , "9/1/2014 00:00:25","2014-08-31 13:39:23", "12/1/2014 20:03:28")
How can I use a single function to convert the various date formats quickly? I am processing millions of lines.
So far I have written this function:
convert_date <- function(x){
  if (is.na(mdy_hms(x))){
    return(ymd_hms(x))
  }
  return(mdy_hms(x))
}
However, it is extremely slow. I am looking for a faster and more convenient method.
Thank you so much for your time.
If you can construct a vector of the possible formats the date could be in, you could use clock. For each date-time string, it stops on the first format that succeeds.
Note that this only works if your formats are unambiguous, i.e. it would probably give you faulty results if you had both %m/%d/%Y and %d/%m/%Y in the same vector, because those are ambiguous.
library(clock)
input <- c(
"2014-08-31 23:59:38" , "9/1/2014 00:00:25",
"2014-08-31 13:39:23", "12/1/2014 20:03:28"
)
format <- c("%Y-%m-%d %H:%M:%S", "%m/%d/%Y %H:%M:%S")
date_time_parse(input, zone = "UTC", format = format)
#> [1] "2014-08-31 23:59:38 UTC" "2014-09-01 00:00:25 UTC"
#> [3] "2014-08-31 13:39:23 UTC" "2014-12-01 20:03:28 UTC"
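For comparison, the same "first format that succeeds" idea can be sketched in base R. This is not how clock works internally; it is a plain loop over formats that fills in only the elements still unparsed. The helper name parse_first_match is hypothetical.

```r
# Hypothetical helper: try each format in turn, keep the first successful parse
parse_first_match <- function(x, formats, tz = "UTC") {
  # Start with an all-NA POSIXct vector of the right length
  out <- as.POSIXct(rep(NA_real_, length(x)), origin = "1970-01-01", tz = tz)
  for (f in formats) {
    todo <- is.na(out)
    if (!any(todo)) break
    # Strings that do not match the format come back as NA and stay pending
    out[todo] <- as.POSIXct(x[todo], format = f, tz = tz)
  }
  out
}

input <- c(
  "2014-08-31 23:59:38", "9/1/2014 00:00:25",
  "2014-08-31 13:39:23", "12/1/2014 20:03:28"
)
formats <- c("%Y-%m-%d %H:%M:%S", "%m/%d/%Y %H:%M:%S")
res <- parse_first_match(input, formats)
format(res, "%Y-%m-%d %H:%M:%S")
# [1] "2014-08-31 23:59:38" "2014-09-01 00:00:25"
# [3] "2014-08-31 13:39:23" "2014-12-01 20:03:28"
```

Each format is applied to a whole subvector at once, so the cost grows with the number of formats rather than with per-element function calls. The same caveat about ambiguous formats applies.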

Adding milliseconds to a timestamp in R, even though the original character does not have milliseconds?

I am doing some animal movement analysis and I want to submit data to an organisation called Movebank for annotation, but they require the timestamp to have milliseconds included with 3 decimal places.
I have a column in my data frame (dat) with my timestamps as characters (without milliseconds), for example "2017-07-19 16:30:24"
To convert them to time and date format with milliseconds I am using the code:
options(digits.secs = 3)
dat$timestamp <- as.POSIXct(dat$timestamp, format = "%Y-%m-%d %H:%M:%OS", tz = "UTC")
Which works fine at converting my timestamp column to POSIXct which I can use to make tracks etc., but it does not add .000 milliseconds to the end of each timestamp which I was hoping it would.
I have also tried:
dat$timestamp <- as.POSIXct(dat$timestamp, format = "%Y-%m-%d %H:%M:%OS3", tz = "UTC")
(Note: I changed %OS to %OS3.)
But this returns NA for my timestamps.
Can anybody shed some light on this? I essentially need to add .000 to the end of each of my timestamps so that, using the example given above, I would have the format "2017-07-19 16:30:24.000"
The milliseconds will be dropped if there are no times with effective milliseconds.
options(digits.secs=4)
x1 <- as.POSIXct("2017-07-19 16:30:25")
as.POSIXct(paste0(x1, ".000"), format="%Y-%m-%d %H:%M:%OS")
# [1] "2017-07-19 16:30:25 UTC"
However, they will be added automatically if there are.
x2 <- as.POSIXct("2017-07-19 16:30:25.002")
c(x1, x2)
# [1] "2017-07-19 18:30:25.000 CEST" "2017-07-19 18:30:25.002 CEST"
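If the goal is a text column that always ends in three decimals, one option is to keep the POSIXct values for analysis and build the export strings with format() and %OS3. A minimal sketch, assuming the timestamps are in UTC; the variable names are illustrative. According to ?strptime, %OSn is an output specification, which is consistent with %OS3 producing NA when used for parsing in the question.

```r
# Timestamp without fractional seconds, as in the question
ts_chr <- "2017-07-19 16:30:24"

# Parse to POSIXct for analysis (plain %OS on input)
ts <- as.POSIXct(ts_chr, format = "%Y-%m-%d %H:%M:%OS", tz = "UTC")

# Build the export string with exactly three decimals
ts_out <- format(ts, "%Y-%m-%d %H:%M:%OS3")
ts_out
# [1] "2017-07-19 16:30:24.000"
```

The result is a character vector, so it is best created only at the point of writing the file for submission.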

R - Formatting dates in dataframe - mix of decimal and character values

I have a date column in a dataframe. I have read this df into R using openxlsx. The column is 'seen' as a character vector when I use typeof(df$date).
The column contains date information in several formats and I am looking to get this into the one format.
#Example
date <- c("43469.494444444441", "12/31/2019 1:41 PM", "12/01/2019 16:00:00")
#What I want -updated
fixed <- c("2019-04-01", "2019-12-31", "2019-12-01")
I have tried many workarounds, including openxlsx::convertToDate, lubridate::parse_date_time and lubridate::date_decimal.
openxlsx::convertToDate works best so far, but it only handles one format and coerces the others to NA.
Update:
I realized I actually had one of the above output dates wrong.
Value 43469.494444444441 should convert to 2019-04-01.
Here is one way to do this in two steps: convert the Excel serial dates separately from all other dates. If you have more date formats, they can be added to the orders passed to parse_date_time.
temp <- lubridate::parse_date_time(date, c('mdY IMp', 'mdY HMS'))
temp[is.na(temp)] <- as.Date(as.numeric(date[is.na(temp)]), origin = "1899-12-30")
temp
#[1] "2019-01-04 11:51:59 UTC" "2019-12-31 13:41:00 UTC" "2019-12-01 16:00:00 UTC"
as.Date(temp)
#[1] "2019-01-04" "2019-12-31" "2019-12-01"
You could use a helper function to normalize the dates, which might be slightly faster than lubridate.
There are weird origins in MS Excel that depend on the platform. So if the data are imported from different platforms, you may want to work with dummy variables.
normDate <- Vectorize(function(x) {
  if (!is.na(suppressWarnings(as.numeric(x))))  # Win excel
    as.Date(as.numeric(x), origin="1899-12-30")
  else if (grepl("A|P", x))
    as.Date(x, format="%m/%d/%Y %I:%M %p")
  else
    as.Date(x, format="%m/%d/%Y %R")
})
For additional date formats just add another else if. Format specifications can be found with ?strptime.
Then just use as.Date() with usual origin.
res <- as.Date(normDate(date), origin="1970-01-01")
# 43469.494444444441 12/31/2019 1:41 PM 12/01/2019 16:00:00
# "2019-01-04" "2019-12-31" "2019-12-01"
class(res)
# [1] "Date"
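The same two-branch logic can also be written without Vectorize, working on the whole vector at once. This is a sketch under two assumptions: numeric strings are Windows-Excel serial dates (origin 1899-12-30), and every other string starts with a month/day/year date. Only the date part is needed here, so the trailing time text is left unparsed.

```r
date <- c("43469.494444444441", "12/31/2019 1:41 PM", "12/01/2019 16:00:00")

# Numeric strings become numbers, everything else NA
num <- suppressWarnings(as.numeric(date))

# Excel serial days -> Date; floor() drops the time-of-day fraction
res <- as.Date(floor(num), origin = "1899-12-30")

# Remaining entries: parse the leading month/day/year part
is_text <- is.na(num)
res[is_text] <- as.Date(date[is_text], format = "%m/%d/%Y")

format(res)
# [1] "2019-01-04" "2019-12-31" "2019-12-01"
```

Because res stays a Date vector throughout, no second as.Date() call with an origin is needed afterwards.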
Edit: To achieve a specific output format, use format, e.g.
format(res, "%Y-%d-%m")
# 43469.494444444441 12/31/2019 1:41 PM 12/01/2019 16:00:00
# "2019-04-01" "2019-31-12" "2019-01-12"
format(res, "%Y/%d/%m")
# 43469.494444444441 12/31/2019 1:41 PM 12/01/2019 16:00:00
# "2019/04/01" "2019/31/12" "2019/01/12"
To lookup the codes type ?strptime.

formatting time in R error

I have a Time column in my df with value 1.01.2016 0:00:05. I want it without the seconds and therefore used df$Time <- as.POSIXct(df$Time, format = "%d.%m.%Y :%H:%M", tz = "Asia/Kolkata"). But I get NA value. What is the problem here?
I suspect there are two things working here: the storage of a time object (POSIXt), and the representation of that object.
The string you present is (I believe) not a proper POSIXt (whether POSIXct or POSIXlt) object for R, which means it is just a character string. In that case, you can remove the seconds with:
gsub(':[^:]*$', '', '1.01.2016 0:00:05')
# [1] "1.01.2016 0:00"
However, that is still just a string, not a date or time object. If you parse it into a time-object that R knows about:
as.POSIXct("1.01.2016 0:00:05", format = "%d.%m.%Y %H:%M:%S", tz = "Asia/Kolkata")
# [1] "2016-01-01 00:00:05 IST"
then you now have a time object that R knows something about ... and it defaults to representing it (printing it on the console) with seconds-precision. Typically, all that is available to change for the console-printing is the precision of the seconds, as in
options("digits.secs")
# $digits.secs
# NULL
Sys.time()
# [1] "2018-06-26 18:21:06 PDT"
options("digits.secs"=3)
Sys.time()
# [1] "2018-06-26 18:21:10.090 PDT"
then you can get more. But alas, I do not think there is an R option to say "always print my POSIXt objects in this way". So your only choice is (at the point where you no longer need it to be a time-like object) to change it back into a string with no time-like value:
x <- as.POSIXct("1.01.2016 0:00:05", format = "%d.%m.%Y %H:%M:%S", tz = "Asia/Kolkata")
x
# [1] "2016-01-01 00:00:05 IST"
?strptime
# see that day-of-month can either be "%d" for 01-31 or "%e" for 1-31
format(x, format="%e.%m.%Y %H:%M")
# [1] " 1.01.2016 00:00"
(This works equally well for a vector.)
Part of me suggests convert to POSIXt and back to string as opposed to my gsub example because using as.POSIXct will tell you when the string does not match the date-time-like object you are expecting, whereas gsub will happily do something wrong or nothing.
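To make the failure concrete, here is a small sketch contrasting the format from the question with one that matches the string. The likely culprit is the stray ":" before %H, which has no counterpart in the input; the variable names are illustrative.

```r
s <- "1.01.2016 0:00:05"

# Format from the question: the literal ":" before %H does not match, so NA
bad <- as.POSIXct(s, format = "%d.%m.%Y :%H:%M", tz = "Asia/Kolkata")

# Format matching the actual layout of the string
good <- as.POSIXct(s, format = "%d.%m.%Y %H:%M:%S", tz = "Asia/Kolkata")

# Drop the seconds only when turning the value back into text
no_secs <- format(good, "%d.%m.%Y %H:%M")
no_secs
# [1] "01.01.2016 00:00"
```

The NA from the first call is exactly the kind of signal gsub would never give.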
Try as.POSIXlt:
> test <- "1.01.2016 0:00:05"
> as.POSIXlt(test, "%d.%m.%Y %H:%M:%S", tz="Asia/Kolkata")
[1] "2016-01-01 00:00:05 IST"

date conversion from any format [duplicate]

I am reading a date value from a CSV file, so the format will vary according to the date format of the CSV. How can I convert any date string to dd-mm-yyyy HH:mm:ss?
EDIT :
The input format are :
dd/mm/yyyy HH:mm:ss
dd/mm/yyyy
dd-mm-yyyy HH:mm:ss
dd-mm-yyyy
mm-dd-yyyy HH:mm:ss
mm-dd-yyyy
mm/dd/yyyy
yyyy-mm-dd HH:mm:ss
yyyy-mm-dd
I need to convert all these formats to dd-mm-yyyy HH:mm:ss
See the anytime package whose anytime function does just that -- and without requiring a format string:
> inputs <- c("12/07/2017 10:11:12", "12/07/2017", "12-07-2017 10:11:12",
+ "07-12-2017", "2017-12-07 10:11:12", "2017-12-07")
> library(anytime)
> anytime(inputs)
[1] "2017-12-07 10:11:12 CST" "2017-12-07 00:00:00 CST"
[3] "2017-12-07 10:11:12 CST" "2017-07-12 00:00:00 CDT"
[5] "2017-12-07 10:11:12 CST" "2017-12-07 00:00:00 CST"
>
However, your requirement of accepting both d-m-y and m-d-y is not satisfiable. So you need to make a choice and supply an explicit format here.
In general, I highly recommend avoiding the ambiguity and sticking to y-m-d ISO formats. As a convenience to stubborn North American habits, anytime and anydate also accept m-d-y ordering, but it is dangerous.
Again, only you can tell if 3-4-5 is April 3rd or March 4th, and you need to specify that.
as.Date() converts a string into a Date object. You will need to adapt its format parameter to the specific format your csv has.
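A short base-R sketch of why the explicit format matters: the same string parses to two different instants depending on whether day or month is assumed to come first, and only the caller can know which is right. The target layout dd-mm-yyyy HH:mm:ss is then produced with format(). The use of tz = "UTC" is an assumption made to keep the example reproducible.

```r
x <- "12/07/2017 10:11:12"

# Reading the string as day/month/year ...
as_dmy <- as.POSIXct(x, format = "%d/%m/%Y %H:%M:%S", tz = "UTC")
# ... versus month/day/year
as_mdy <- as.POSIXct(x, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# Both parse without complaint, but they are different dates
out_dmy <- format(as_dmy, "%d-%m-%Y %H:%M:%S")
out_mdy <- format(as_mdy, "%d-%m-%Y %H:%M:%S")
out_dmy
# [1] "12-07-2017 10:11:12"
out_mdy
# [1] "07-12-2017 10:11:12"
```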
Try the lubridate package. Here the CSV file has both formats.
Data <- read.csv("Al_1.csv")  # import data
str(Data)
library(lubridate)
a <- NULL  # will hold the rows that have a date and a time
a$Date <- mdy_hm(Data$Date)  # parse month/day/year hour:minute; other formats are expected to become NA
a$Price <- Data$Price  # carry over the matching values (could be any other column)
b <- NULL  # will hold the date-only rows
b$Date <- mdy(Data$Date)  # parse month/day/year; other formats are expected to become NA
b$Price <- Data$Price
a <- as.data.frame(a)
b <- as.data.frame(b)
a <- a[!is.na(a$Date), ]  # drop the rows that did not match this format
b <- b[!is.na(b$Date), ]
b$Date <- as.POSIXct(b$Date)  # convert Date to date-time (UTC) so both parts share one class
x <- rbind(a, b)
str(x)
x <- ts(x)
