I have a table like so:
library(data.table)
library(lubridate)
dtab <- data.table(Arr.Dep = c("A", "D"),
                   time = c("2017-05-01 04:50:00", "2017-05-01 04:55:00"))
dtab[, time := parse_date_time(time, "%Y-%m-%d %H:%M:%S")]
Operations on the date and time column seem successful:
dtab[, time2 := time + 2]
But if I try to run an ifelse statement, the POSIXct format goes back to numeric and I seem to be unable to bring it back to date and time.
dtab[, time3 := ifelse(Arr.Dep == "A", time + 2, "hello")]
I saw the issue has already been raised:
R- date time variable loses format after ifelse
Unfortunately it's not of great help to me: when I try to follow the example (adding 2 seconds rather than replacing with NA as in the OP), I hit an error anyway.
Any help?
Use library(lubridate) and add the time with dtab$time2 <- dtab$time2 + seconds(2). With this method, the format does not change.
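A note beyond the original answer (my own suggestion, so treat it as a sketch): if the goal is a vectorised conditional that keeps the POSIXct class, data.table offers class-preserving options, for example:
# Option 1: assign by reference only where the condition holds; the remaining
# rows of the new column are filled with NA and the POSIXct class is kept.
dtab[Arr.Dep == "A", time3 := time + seconds(2)]
# Option 2: fifelse() (data.table >= 1.12.4) preserves the class of its
# branches, provided both branches have compatible types.
dtab[, time3 := fifelse(Arr.Dep == "A", time + seconds(2), time)]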
I am working through some code on a sightings dataset and am trying to bin the dates into monthly (4-week) groups.
The command I was given to use is:
breaks <- seq(min(min(survey.c$DATE), min(sightings.d$date)) - opt$window_length*7,
              max(max(survey.c$DATE), max(sightings.d$date)) + opt$window_length*7,
              by = paste(opt$window_length, 'week'))
This consistently gives back the error:
Error in min(min(survey.c$DATE), min(sightings.d$date)) - opt$window_length * :
  non-numeric argument to binary operator
Separately, the parts min(min(survey.c$DATE), min(sightings.d$date)) and max(max(survey.c$DATE), max(sightings.d$date)) work on their own, so the issue comes from the last part, opt$window_length.
The window_length is "2"; however, even if I change this last part to simply say "14", I run into the same issue. To be honest, I don't completely understand what opt$window_length is trying to do.
I have tried converting the columns survey.c$DATE and sightings.d$date into Date format, and this reads back as TRUE with no NAs. I can also convert them into characters, but then the data reads back as NAs.
I think I need to change the final part of the command into a proper date format OR change the original date column into numeric form?
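No answer appears in this thread, but here is a hedged sketch of one likely fix, assuming the two date columns are still stored as character and opt$window_length came in as the character "2" (either would make the subtraction or multiplication fail with "non-numeric argument to binary operator"):
# Assumed fix: coerce the dates to Date and the window length to numeric
# before building the breaks.
survey.c$DATE     <- as.Date(survey.c$DATE)            # adjust format= if needed
sightings.d$date  <- as.Date(sightings.d$date)
opt$window_length <- as.numeric(opt$window_length)     # "2" -> 2
breaks <- seq(min(min(survey.c$DATE), min(sightings.d$date)) - opt$window_length * 7,
              max(max(survey.c$DATE), max(sightings.d$date)) + opt$window_length * 7,
              by = paste(opt$window_length, 'week'))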
I load a dataset from Excel:
library(readxl)
df<-read_excel("excel_file.XLSX")
In the file there is a separate date column that is read in as POSIXct:
str(df$datecol)
I also have a time column that gets loaded into R as a date-time. To bring it back to just a time, I do:
df$Timecol<-format(df$Timecol,"%H:%M:%S")
However, it turns into a character. This is where I think the problem lies:
str(STOP_DATA$`Stop Frisk Time`)
I would think this part resolves the situation:
df$merge_date_time<-as.POSIXct(paste(df$Datecol, df$TimeCol), format="%Y-%m-%d %H:%M:%S")
The date and time are then combined. What I want to do now is reference a timestamp column that is a POSIXct data type:
str(df$Timestamp)
I would like to then find the time difference between them
df$TIME_SINCE <- difftime(df$Timestamp, df$merge_date_time, tz="UTC", units = "mins" )
but I end up with weird numbers that don't make sense. My guess is it's the character data type for the time. Does anyone know how to solve this?
I ended up finding out that this works:
df$date_time<-paste(df$date, format(as.POSIXct(df$time), '%T'))
I removed the portion below from the script, as it changed the column into a character.
df$Timecol<-format(df$Timecol,"%H:%M:%S")
I accepted the obscure POSIXct default with the proper time and odd dates (1899-12-31), and what the script did was replace 1899-12-31 with the proper corresponding df$date column.
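For reference, a hedged end-to-end sketch of the same idea (the column names datecol, Timecol and Timestamp are taken from the question, but their exact casing in the real file is an assumption):
library(readxl)
df <- read_excel("excel_file.XLSX")
# Keep the time of day from the Excel time column (loaded as a date-time with
# a placeholder date such as 1899-12-31) and attach it to the real date column.
df$merge_date_time <- as.POSIXct(
  paste(format(df$datecol, "%Y-%m-%d"),
        format(as.POSIXct(df$Timecol), "%H:%M:%S")),
  format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
# Difference in minutes against the POSIXct timestamp column.
df$TIME_SINCE <- difftime(df$Timestamp, df$merge_date_time, units = "mins")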
I am trying to convert AM/PM formatted date and time with as.POSIXct, but for every 00:00:00 I am getting NA. Please guide me on this.
The code I tried, with a for loop:
for (i in 1:nrow(clean_df)) {
  if (is.na(clean_df$Local_time[i])) {
    # midnight rows parse to NA; rebuild them from the previous row's date
    clean_df$Local_time[i] <- paste(as.Date(clean_df$Local_time[i - 1]), "00:00:00")
  }
  print(nrow(clean_df) - i)
}
But the above code takes a long time to execute, which is not acceptable. I would appreciate any faster solution.
Given that some of your raw data may lack a time component when you expect one to be present for the conversion to POSIXct, I don't see any way around scrubbing your data. But you may try doing the scrubbing in a vectorized way, which should perform much better than a row-by-row loop:
clean_df$Local_time <- ifelse(nchar(clean_df$Local_time) == 10,
                              paste(clean_df$Local_time, "00:00:00"),
                              clean_df$Local_time)
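As a possible follow-up (my own addition; the exact raw layout is an assumption since it is only shown in the question's image, and parsed/midnight are just local helper names): once every row carries a time component, the AM/PM strings can be parsed in one vectorized pass, with a second pass for the padded midnight rows:
# Assumed raw layout, e.g. "12/31/2019 5:30:00 PM" plus the "00:00:00"-padded rows.
parsed <- as.POSIXct(clean_df$Local_time,
                     format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")
# Rows padded with a 24-hour "00:00:00" have no AM/PM marker, so parse them
# separately with %H.
midnight <- is.na(parsed)
parsed[midnight] <- as.POSIXct(clean_df$Local_time[midnight],
                               format = "%m/%d/%Y %H:%M:%S", tz = "UTC")
clean_df$Local_time <- parsed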
I'm working with some GTFS data from Berlin and I am hitting a wall right now.
There is a stop_times.txt file for all bus stops in Berlin, with 5 million rows.
Two columns (Arrival_time and Departure_time) contain anomalies, such as
Arrival_time: 112:30:0 instead of the regular format 11:20:30.
I don't really know how to extract those specific lines and erase them from the dataset. I can't come up with an algorithm that detects them. I tried going by string length (00:00:00 = 8 characters), but the malformed ones are also 8 characters long.
Do you know a simple way to make sure that the format is always xx:xx:xx and delete all others?
Thanks...
Edit:
So, after trying the solution suggested below, it didn't work for me because it only tells me how many rows were malformed, not where they are or how I can delete them.
My idea now is basically:
Find every timestamp which does not correspond to this exact format:
'00:00:00', where it has to be length 8 with two-digit groups separated by ':'. Is there a way to detect anomalies within this pattern and then delete them? I really don't know how to fix this issue anymore.
Thanks
lubridate is such a useful package that I can't remember how I managed without it.
require(lubridate)
times <- c("112:30:0", "11:20:30")
datetimes <- paste("01.01.2018", times)
parsed.datetimes <- lubridate::dmy_hms(datetimes)
#[1] NA "2018-01-01 11:20:30 UTC"
#Warning message:
# 1 failed to parse.
This function will automatically tell you when parsing failed. The only catch is that it takes a date-time rather than just a time as input, but you can easily get around that as shown above.
In order to know exactly which ones have failed to parse, you can then apply:
failed.list <- which(is.na(parsed.datetimes))
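To address the edit (finding and deleting the offending rows rather than only counting them), a hedged sketch follows; stop_times is an assumed name for the data frame read from stop_times.txt, ok and stop_times.clean are helper names, and failed.list is assumed to have been computed on the corresponding column of the real data:
# Keep only rows where both columns match the strict two-digit hh:mm:ss pattern.
ok <- grepl("^\\d{2}:\\d{2}:\\d{2}$", stop_times$Arrival_time) &
      grepl("^\\d{2}:\\d{2}:\\d{2}$", stop_times$Departure_time)
stop_times.clean <- stop_times[ok, ]
# Equivalently, drop the rows whose lubridate parse failed
# (guard against an empty index vector):
if (length(failed.list) > 0) stop_times.clean <- stop_times[-failed.list, ]
Note that GTFS feeds may legitimately contain hours of 24 or more (trips running past midnight); those still match the two-digit pattern, so they are kept.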
I have a dataframe (df3) with some values.
One of these values is the deadline.
The data for this value looks something like the following:
deadline
1419397140
1418994978
1419984000
1418702400
They are dates and I want to convert them using this:
df3$deadline <- as.POSIXct(df3$deadline, origin="1970-01-01")
Generally this has worked for me with other dataframes from other files.
However, with this one it gives me back this error:
Error in as.POSIXlt.character(as.character(x), ...) :
character string is not in a standard unambiguous format
How can I fix it?
It might be that you have a character or factor column, while conversion from Unix time expects a numeric vector:
as.POSIXct(as.numeric(as.character(df3$deadline)),origin="1970-01-01")
As a suggestion for future debugging, you can check your parameter type by using
class(df3$deadline)
and making sure you are passing the correct type to as.POSIXlt().
From the help page for as.POSIX*():
Character input is first converted to class '"POSIXlt"' by
'strptime': numeric input is first converted to '"POSIXct"'. Any
conversion that needs to go between the two date-time classes
requires a time zone: conversion from '"POSIXlt"' to '"POSIXct"'
will validate times in the selected time zone.
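For illustration, a quick check with one of the values from the question (the printed timestamp assumes UTC; your local time zone may shift it):
# A character (or factor) deadline reproduces the error; converting to
# numeric first gives a proper POSIXct value.
as.POSIXct(as.numeric("1419397140"), origin = "1970-01-01", tz = "UTC")
# [1] "2014-12-24 04:59:00 UTC"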