Converting datetime from character to POSIXct object - r

I have an instrument that exports data in an unruly time format. I need to combine the date and time vectors into a new datetime vector in the following POSIXct format: %Y-%m-%d %H:%M:%S. Out of curiosity, I attempted to do this in three different ways, using as.POSIXct(), strftime(), and strptime(). When using my example data below, only the as.POSIXct() and strftime() functions work, but I am curious as to why strptime() is producing NAs? Also, I cannot convert the strftime() output into a POSIXct object using as.POSIXct()...
When trying these same functions on my real data (of which I've only provided the first four rows), I run into an entirely different problem. Only the strftime() function works. For some reason the as.POSIXct() function also produces NAs, and it is the only command I actually need for converting my datetime into a POSIXct object...
It seems like there are subtle differences between these functions, and I want to know how to use them more effectively. Thanks!
Reproducible Example:
## Creating dataframe:
date <- c("2017-04-14", "2017-04-14","2017-04-14","2017-04-14")
time <- c("14:24:24.992000","14:24:25.491000","14:24:26.005000","14:24:26.511000")
value <- c("4.106e-06","4.106e-06","4.106e-06","4.106e-06")
data <- data.frame(date, time)
data <- data.frame(data, value) ## I'm sure there is a better way to combine three vectors...
head(data)
## Creating 3 different datetime vectors:
## This works in my example code, but not with my real data...
data$datetime1 <- as.POSIXct(paste(data$date, data$time), format = "%Y-%m-%d %H:%M:%S",tz="UTC")
class(data$datetime1)
## This is producing NAs, and I'm not sure why:
data$datetime2 <- strptime(paste(data$date, data$time), format = "%Y-%m-%d %H:%M%:%S", tz = "UTC")
class(data$datetime2)
## This is working just fine
data$datetime3 <- strftime(paste(data$date, data$time), format = "%Y-%m-%d %H:%M%:%S", tz = "UTC")
class(data$datetime3)
head(data)
## Since I cannot get the as.POSIXct() function to work with my real data, I tried this workaround. Unfortunately I am running into trouble...
data$datetime4 <- as.POSIXct(data$datetime3, format = "%Y-%m-%d %H:%M%:%S", tz = "UTC")
Link to real data:
here
Example using real_data.txt:
## Reading in the file:
fpath <- "~/real_data.txt"
x <- read.csv(fpath, skip = 1, header = FALSE, sep = "", stringsAsFactors = FALSE)
names(x) <- c("date","time","bscat","scat_coef","pressure_mbar","temp_K","CH1","CH2") ## This is data from a Radiance Research Integrating Nephelometer Model M903 for anyone who is interested!
## If anyone could get this to work that would be awesome!
x$datetime1 <- as.POSIXct(paste(x$date, x$time), format = "%Y-%m-%d %H:%M%:%S", tz = "UTC")
## This still doesn't work...
x$datetime2 <- strptime(paste(x$date, x$time), format = "%Y-%m-%d %H:%M%:%S", tz = "UTC")
## This works:
x$datetime3 <- strftime(paste(x$date, x$time), format = "%Y-%m-%d %H:%M%:%S", tz = "UTC")
## But I cannot convert from strftime character to POSIXct object, so it doesn't help me at all...
x$datetime4 <- as.POSIXct(x$datetime3, format = "%Y-%m-%d %H:%M%:%S", tz = "UTC")
head(x)
Solution:
I was not providing the as.POSIXct() function with the correct format string. Once I changed %Y-%m-%d %H:%M%:%S to %Y-%m-%d %H:%M:%S, the data$datetime2, data$datetime4, x$datetime1 and x$datetime2 were working properly! Big thanks to PhilC for debugging!

For your real data issue, replace the %M% with %M:
## Reading in the file:
fpath <- "c:/r/data/real_data.txt"
x <- read.csv(fpath, skip = 1, header = FALSE, sep = "", stringsAsFactors = FALSE)
names(x) <- c("date","time","bscat","scat_coef","pressure_mbar","temp_K","CH1","CH2") ## This is data from a Radiance Research Integrating Nephelometer Model M903 for anyone who is interested!
## issue was the %M% - fixed
x$datetime1 <- as.POSIXct(paste(x$date, x$time), format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
## Here too - fixed
x$datetime2 <- strptime(paste(x$date, x$time), format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
head(x)

There was a format string error causing the NAs; try this:
## This is no longer producing NAs:
data$datetime2 <- strptime(paste(data$date, data$time), format = "%Y-%m-%d %H:%M:%S",tz="UTC")
class(data$datetime2)

The format "%Y-%m-%d %H:%M:%OS" parses seconds generically, including fractional seconds. To display fractional seconds to a specific number of decimal places, set the digits.secs option, e.g.:
options(digits.secs=6) # display fractional seconds with up to 6 decimal places
data$datetime1 <- lubridate::parse_date_time(paste(data$date, data$time), "%Y-%m-%d %H:%M:%OS")
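The effect of the stray % and the difference between %S and %OS can be checked on a single timestamp in base R. This is a minimal sketch rather than the asker's exact workflow; the variable names (stamp, bad, good, frac) are made up for illustration.

```r
# One timestamp in the same layout as the instrument data
stamp <- "2017-04-14 14:24:24.992000"

# "%M%:" contains an invalid conversion ("%:"), so parsing fails with NA
bad <- as.POSIXct(stamp, format = "%Y-%m-%d %H:%M%:%S", tz = "UTC")

# "%S" reads whole seconds and ignores the trailing fractional part
good <- as.POSIXct(stamp, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# "%OS" also reads the fractional seconds
frac <- as.POSIXct(stamp, format = "%Y-%m-%d %H:%M:%OS", tz = "UTC")

is.na(bad)                            # TRUE
format(good, "%Y-%m-%d %H:%M:%S")     # "2017-04-14 14:24:24"
as.numeric(frac) - as.numeric(good)   # roughly 0.992

# On output, "%OSn" truncates rather than rounds, so the last printed
# digit can differ from the input because of floating-point representation
format(frac, "%Y-%m-%d %H:%M:%OS3")
```

Note that strftime() is for output (date-time to character), while strptime() and as.POSIXct() with a format argument are for input (character to date-time), which is why strftime() appeared to "work" even with the broken format string.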

Related

read.zoo is not returning needed date format

My initial data is in %y-%m-%d format...
using the code
returnsgamma <- read.zoo(returns, header = TRUE, sep = ",", FUN = as.chron)
the zoo object returns values in the order %m/%d/%y.
Is there any way to use read.zoo and have the order of dates stay as %y/%m/%d or %d/%m/%y?
Assuming the input shown in the Note at the end we can use the default Date class whose output when rendering defaults to yyyy-mm-dd or use chron with chron(..., out.format="y-m-d") which produces yy-mm-dd.
library(zoo)
read.csv.zoo(text = Lines, format = "%y-%m-%d")
## 2022-12-01
## 34
library(chron)
toChron <- function(x) as.chron(x, out.format = "y-m-d")
read.csv.zoo(text = Lines, FUN = toChron)
## 22-12-01
## 34
Note
Lines <- "date,value
22-12-01,34"

Writing functions to reference specific columns

I have to pull different data sets from the same API regularly but for different reasons, so I have to write out the code for many different pulls. I'd like to create some functions to help with this, but I need some help.
I haven't been able to figure out how to set up the function so that I can change the data set but still pull from the same column each time. In this example, I have 3 columns with timestamps that mean different things (made up in this data). I need to change the timezone here to my local time zone. The column name will remain the same in all of my datasets, but the name of the dataset will change. I have a few places in my code where I need to do this, and I haven't been able to figure it out, so any suggestions would be much appreciated!
The second section of this example code is not included in the actual code, but it is there to set the data up correctly. The data comes out of the API in the format shown as GMT.
df <- data.frame(col_1 = c(1, 2, 3, 4),
time_1 = c("2021-01-20 23:58:21", "2021-01-20 21:21:00", "2021-01-20 17:14:04", "2021-01-20 01:05:18"),
time_2 = c("2021-01-19 23:58:21", "2021-01-19 21:21:00", "2021-01-19 17:14:04", "2021-01-19 01:05:18"),
time_3 = c("2021-01-18 23:46:21", "2021-01-18 36:21:00", "2021-01-18 15:14:04", "2021-01-18 01:05:18"),
time_4 = c("2021-01-17 23:58:21", "2021-01-17 20:21:00", "2021-01-17 18:14:04", "2021-01-17 02:05:18"))
# Not part of actual code
df$time_1 <- as.POSIXlt(df$time_1, tz = "GMT")
df$time_2 <- as.POSIXlt(df$time_2, tz = "GMT")
df$time_3 <- as.POSIXlt(df$time_3, tz = "GMT")
df$time_4 <- as.POSIXlt(df$time_4, tz = "GMT")
# What I want it to do
# df$time_1 <- lubridate::with_tz(df$time_1, tz = "America/Los_Angeles")
# df$time_2 <- lubridate::with_tz(df$time_2, tz = "America/Los_Angeles")
# df$time_3 <- lubridate::with_tz(df$time_3, tz = "America/Los_Angeles")
# df$time_4 <- lubridate::with_tz(df$time_4, tz = "America/Los_Angeles")
# Attempted function
timezone_cleanup <- function(my_df){
my_df$time_1 <- lubridate::with_tz(my_df$time_1, tz = "America/Los_Angeles")
my_df$time_2 <- lubridate::with_tz(my_df$time_2, tz = "America/Los_Angeles")
my_df$time_3 <- lubridate::with_tz(my_df$time_3, tz = "America/Los_Angeles")
my_df$time_4 <- lubridate::with_tz(my_df$time_4, tz = "America/Los_Angeles")
}
# how I'd like to use this function. Not working now. Even if I wrap it with data.frame(), it's not what I wanted.
new_df <- timezone_cleanup(df)
I think you need to return my_df in your function to get the changed dataframe back. However, you can use lapply or across to apply the same function to multiple columns.
library(dplyr)
timezone_cleanup <- function(my_df){
my_df %>%
mutate(across(starts_with('time'),
lubridate::with_tz, tz = "America/Los_Angeles"))
}
new_df <- timezone_cleanup(df)
By the way, I do receive a warning message while using this: Unrecognized time zone 'America/Los_Angeles'. Are you sure you are using the correct tz value?
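The lapply route mentioned in the answer could look roughly like the following base R sketch. The function name timezone_cleanup_base and its tz argument are hypothetical, and the sketch assumes the time columns hold GMT timestamps, either as character strings or as date-time objects.

```r
# Convert every column whose name starts with "time" to another time zone
timezone_cleanup_base <- function(my_df, tz = "America/Los_Angeles") {
  time_cols <- grep("^time", names(my_df), value = TRUE)
  my_df[time_cols] <- lapply(my_df[time_cols], function(col) {
    col <- as.POSIXct(col, tz = "GMT")  # interpret the values as GMT
    attr(col, "tzone") <- tz            # same instant, different display zone
    col
  })
  my_df  # return the modified data frame explicitly
}

# Small made-up data set with the same column naming scheme
df <- data.frame(col_1 = 1:2,
                 time_1 = c("2021-01-20 23:58:21", "2021-01-20 21:21:00"),
                 time_2 = c("2021-01-19 23:58:21", "2021-01-19 21:21:00"),
                 stringsAsFactors = FALSE)
new_df <- timezone_cleanup_base(df)
format(new_df$time_1[1], "%Y-%m-%d %H:%M:%S")  # "2021-01-20 15:58:21"
```

Because the function only looks at column names, it can be reused on any data set that follows the same naming scheme.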

Strptime fails when working with a dataframe

Strptime seems to be missing something in this scenario:
aDateInPOSIXct <- strptime("2018-12-31", format = "%Y-%m-%d")
someText <- "asdf"
df <- data.frame(aDateInPOSIXct, someText, stringsAsFactors = FALSE)
bDateInPOSIXct <- strptime("2019-01-01", format = "%Y-%m-%d")
df[1,1] <- bDateInPOSIXct
Assignment of bDate to the dataframe fails with:
Error in as.POSIXct.numeric(value) : 'origin' must be supplied
And a warning:
provided 11 variables to replace 1 variables
I want to use both POSIXct dates and POSIXct date-times to compare this and that. It's way less work than manipulating character strings -- and POSIX takes care of the time zone issues. Unfortunately, I'm missing something.
You only need to cast your calls to strptime to POSIXct explicitly:
aDateInPOSIXct <- as.POSIXct(strptime("2018-12-31", format = "%Y-%m-%d"))
someText <- "asdf"
df <- data.frame(aDateInPOSIXct, someText, stringsAsFactors = FALSE)
bDateInPOSIXct <- as.POSIXct(strptime("2019-01-01", format = "%Y-%m-%d"))
df[1,1] <- bDateInPOSIXct
Check the R documentation which says:
Character input is first converted to class "POSIXlt" by strptime: numeric input is first converted to "POSIXct".
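Why the cast matters becomes visible when inspecting the classes involved. A minimal sketch, with illustrative variable names (lt, ct, when, label) that are not from the question:

```r
# strptime() returns POSIXlt: a list of calendar fields (sec, min, hour, ...)
lt <- strptime("2018-12-31", format = "%Y-%m-%d", tz = "UTC")
class(lt)             # "POSIXlt" "POSIXt"
is.list(unclass(lt))  # TRUE: several named components, not a single number

# as.POSIXct() collapses it to one number of seconds since 1970-01-01
ct <- as.POSIXct(lt)
class(ct)             # "POSIXct" "POSIXt"

# A POSIXct column accepts element-wise replacement with another POSIXct
df <- data.frame(when = ct, label = "asdf", stringsAsFactors = FALSE)
df[1, 1] <- as.POSIXct(strptime("2019-01-01", format = "%Y-%m-%d", tz = "UTC"))
format(df$when[1], "%Y-%m-%d")  # "2019-01-01"
```

Assigning a POSIXlt value into a single cell fails because R sees the list of components as multiple replacement values, which is what the "provided 11 variables to replace 1 variables" warning is pointing at.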

R as.POSIXct try two input formats

I am reading in a .csv of dates and gps positions. I need to convert the date column to a date class.
I am using:
data = data.frame(rbind(c('2016/07/19 17:52:00',3674.64416424279,354.266660979476),
c('2016/07/19 17:54:00',3674.65121597935,354.246972537617),
c('2016/07/19 17:55:00',3674.65474186293,354.237128326737),
c('2016/07/19 17:56:00',3674.65826775671,354.227284122559)))
colnames(data) = (c('GMT_DateTime','northing','easting'))
data$GMT_DateTime<-as.POSIXct(data$GMT_DateTime, tz="GMT", format = "%Y/%m/%d %H:%M:%S")
Sometimes the date in the .csv to be read is formatted as "%Y/%m/%d %H:%M:%S" and sometimes as "%m/%d/%Y %H:%M"
Is there a way to feed in two possible formats to as.POSIXct() to try both possible formats? I imagine something like this:
data$GMT_DateTime<-as.POSIXct(data$GMT_DateTime, tz="GMT", format = "%m/%d/%Y %H:%M" or "%Y/%m/%d %H:%M:%S")
Thank you!
In what follows I will use package lubridate.
I have added two extra rows to the example dataset, with date/time values in the "%m/%d/%Y %H:%M" format. Note that the column is of class character; if it were of class factor, the code would probably throw an error.
As for the warnings, don't worry, they are just lubridate telling you that it found several formats and cannot process them all in one go.
library(lubridate)
tmp <- data$GMT_DateTime # work on a copy
na <- is.na(ymd_hms(tmp))
data$GMT_DateTime[!na] <- ymd_hms(tmp)[!na]
data$GMT_DateTime[na] <- mdy_hm(tmp)[na]
data$GMT_DateTime <- as.POSIXct(as.numeric(data$GMT_DateTime),
format = "%Y-%m-%d",
origin = "1970-01-01", tz = "GMT")
rm(tmp) # final clean up
Data in dput() format.
data <-
structure(list(GMT_DateTime = c("2016/07/19 17:52:00", "2016/07/19 17:54:00",
"2016/07/19 17:55:00", "2016/07/19 17:56:00", "07/22/2016 17:02",
"07/23/2016 17:15"), northing = c(3674.64416424279, 3674.65121597935,
3674.65474186293, 3674.65826775671, 3674.662, 3674.665), easting = c(354.266660979476,
354.246972537617, 354.237128326737, 354.227284122559, 354.2702,
354.3123)), row.names = c(NA, -6L), class = "data.frame")
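The same two-pass idea also works without lubridate: parse with the first format, then retry only the elements that came back NA with the second format. This is a sketch under the assumption that only the two formats from the question occur; the helper name parse_two_formats is hypothetical.

```r
# Try "%Y/%m/%d %H:%M:%S" first, then "%m/%d/%Y %H:%M" for the leftovers
parse_two_formats <- function(x, tz = "GMT") {
  out <- as.POSIXct(x, tz = tz, format = "%Y/%m/%d %H:%M:%S")
  failed <- is.na(out)
  out[failed] <- as.POSIXct(x[failed], tz = tz, format = "%m/%d/%Y %H:%M")
  out
}

mixed <- c("2016/07/19 17:52:00", "07/22/2016 17:02")
parsed <- parse_two_formats(mixed)
format(parsed, "%Y-%m-%d %H:%M:%S")
# "2016-07-19 17:52:00" "2016-07-22 17:02:00"
```

Values that match neither format stay NA, so a final check such as anyNA(parsed) can flag rows that need attention.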

R POSIXlt timestamp conversion do not know how to convert 'df1$timestamp' to class “POSIXlt”

Hello all, I am facing an issue while converting a timestamp to POSIXlt. Later, from this timestamp, I need to extract the year, month, day of month, hour, min and sec.
2015-12-01 00:04:39 is my timestamp
and here is my attempt:
getwd()
rm(list=ls())
library(ggplot2)
library(plyr)
library(reshape)
library(scales)
library(gridExtra)
library(SparkR)
Sys.setenv(SPARK_HOME="/usr/local/spark")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
sc <- sparkR.init(master="local","RwordCount")
args <- commandArgs(trailing = TRUE)
sqlContext <- sparkRSQL.init(sc)
df1 <- read.df(sqlContext, "hdfs://master:9000/test.csv", header='true', source = "com.databricks.spark.csv", inferSchema='true', stringsAsFactors = F)
if("timestamp" %in% colnames(df1)){
df1$pTime <- as.POSIXlt(df1$timestamp, format= "%Y-%m-%d %H:%M:%S")
}else {
df1$pTime <- as.POSIXlt(df1$Timestamp, format= "%Y-%m-%d %H:%M:%S")
}
but I get this error: do not know how to convert 'df1$timestamp' to class “POSIXlt”
Later I need to find the year, month, day of month, hours, min and sec; for that I have this snippet:
df1$Year <- df1$pTime$year-100 #Year
df1$Month <- df1$pTime$mon+1 #Month 1-12
df1$Day <- df1$pTime$mday #day of month
df1$Hour <- df1$pTime$hour # 0-23: hours
df1$Min <- df1$pTime$min
df1$Sec <- df1$pTime$sec
df1$WeekOfYear <- strftime(df1$pTime, format="%W")
and I am executing the above script using the following command:
bin/spark-submit --packages com.databricks:spark-csv_2.11:1.3.0 /home/script/analysis.R
**Error in as.POSIXlt.default(df1$timestamp, format = "%Y-%m-%d %H:%M:%S") :
do not know how to convert 'df1$timestamp' to class “POSIXlt”
Calls: as.POSIXlt -> as.POSIXlt.default
Execution halted**
How can I get rid of the error? Any help will be appreciated.
Thanks
You can convert your timestamp by using as.POSIXct
x <- as.POSIXct("2015-12-01 00:04:39")
and then using lubridate package, you can extract all the information
library(lubridate)
year(x)
#[1] 2015
month(x)
#[1] 12
day(x)
#[1] 1
hour(x)
#[1] 0
minute(x)
#[1] 4
second(x)
#[1] 39
You can extract parts of datetime values by
x <- Sys.time()
format(x, format="%Y")
for example. See
?strptime
for all options.
I can not reconstruct the first part of your question. What is the error message you get?
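For a local (non-Spark) timestamp, both approaches can be sketched in base R. The variable names below are illustrative. Note that POSIXlt stores the year as years since 1900 and the month as 0-11, which is easy to get wrong.

```r
x <- as.POSIXct("2015-12-01 00:04:39", tz = "UTC")

# format() returns character strings; wrap in as.integer() if numbers are needed
year_chr <- format(x, "%Y")   # "2015"
week_chr <- format(x, "%W")   # week of the year, Monday as first day of week

# POSIXlt exposes the calendar fields directly
lt <- as.POSIXlt(x)
parts <- c(year  = lt$year + 1900,  # stored as years since 1900
           month = lt$mon + 1,      # stored as 0-11
           day   = lt$mday,
           hour  = lt$hour,
           min   = lt$min,
           sec   = lt$sec)
parts
# year 2015, month 12, day 1, hour 0, min 4, sec 39
```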
First, you can index your data.frame without using the if/else case:
df1[colnames(df1) %in% "timestamp"]
To convert your entire column of values in the format 2015-12-01 00:04:39:
as.POSIXlt(strptime(as.character(df1[colnames(df1) %in% "timestamp"]),
format = "%Y-%m-%d %H:%M:%S"),
format = "%Y-%m-%d %H:%M:%S")
I had a 'trans_dtime' column of type string in my DataFrame. I converted the 'trans_dtime' column to timestamp type using SparkR:
printSchema(df)
root
|-- col1: string (nullable = true)
|-- trans_dtime: string (nullable = true)
df$trans_dtime <- from_utc_timestamp(date_format(df$trans_dtime, "yyyy-MM-dd HH:mm:ss"), "GMT")
printSchema(df)
root
|-- col1: string (nullable = true)
|-- trans_dtime: timestamp (nullable = true)
Hope it will help you. :)

Resources