I am reading in a .csv of dates and GPS positions. I need to convert the date column to a date class.
I am using:
data = data.frame(rbind(c('2016/07/19 17:52:00',3674.64416424279,354.266660979476),
c('2016/07/19 17:54:00',3674.65121597935,354.246972537617),
c('2016/07/19 17:55:00',3674.65474186293,354.237128326737),
c('2016/07/19 17:56:00',3674.65826775671,354.227284122559)))
colnames(data) = (c('GMT_DateTime','northing','easting'))
data$GMT_DateTime<-as.POSIXct(data$GMT_DateTime, tz="GMT", format = "%Y/%m/%d %H:%M:%S")
Sometimes the date in the .csv to be read is formatted as "%Y/%m/%d %H:%M:%S" and sometimes as "%m/%d/%Y %H:%M"
Is there a way to feed in two possible formats to as.POSIXct() to try both possible formats? I imagine something like this:
data$GMT_DateTime<-as.POSIXct(data$GMT_DateTime, tz="GMT", format = "%m/%d/%Y %H:%M" or "%Y/%m/%d %H:%M:%S")
Thank you!
In what follows I will use package lubridate.
I have added two extra rows to the example dataset, with date/time values in the "%m/%d/%Y %H:%M" format. Note that the column must be of class character; if it is a factor, convert it with as.character() first, or the code below will probably throw an error.
As for the warnings ("... failed to parse"), don't worry: each parser returns NA for the rows written in the other format, and those NA values are exactly the ones the code discards.
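library(lubridate)

tmp <- data$GMT_DateTime # work on a copy (character, not factor)
na <- is.na(ymd_hms(tmp)) # TRUE for rows not in "%Y/%m/%d %H:%M:%S" format
# Assigning into the character column stores the underlying seconds as text
data$GMT_DateTime[!na] <- ymd_hms(tmp)[!na]
data$GMT_DateTime[na] <- mdy_hm(tmp)[na] # rows in "%m/%d/%Y %H:%M" format
# Turn the seconds since 1970-01-01 back into date-times
data$GMT_DateTime <- as.POSIXct(as.numeric(data$GMT_DateTime),
                                origin = "1970-01-01", tz = "GMT")
rm(tmp, na) # final clean up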
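If you prefer base R, the same idea (parse with the first format, then re-parse only the failures with the second) can be written as a small helper. This is a sketch: parse_two_formats() is a hypothetical name, and it assumes every value matches one of the two formats from the question; anything else stays NA.

```r
# Hypothetical helper: parse a vector that mixes "%Y/%m/%d %H:%M:%S"
# and "%m/%d/%Y %H:%M" date-times, using base R only.
parse_two_formats <- function(x, tz = "GMT") {
  x <- as.character(x)  # guard against factor input
  out <- as.POSIXct(x, tz = tz, format = "%Y/%m/%d %H:%M:%S")
  miss <- is.na(out)    # rows that did not match the first format
  out[miss] <- as.POSIXct(x[miss], tz = tz, format = "%m/%d/%Y %H:%M")
  out
}

res <- parse_two_formats(c("2016/07/19 17:52:00", "07/22/2016 17:02"))
res
```

On the example data below this would be data$GMT_DateTime <- parse_two_formats(data$GMT_DateTime). Because the result is a POSIXct vector from the start, no round trip through numbers and an origin is needed.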
Data in dput() format.
data <-
structure(list(GMT_DateTime = c("2016/07/19 17:52:00", "2016/07/19 17:54:00",
"2016/07/19 17:55:00", "2016/07/19 17:56:00", "07/22/2016 17:02",
"07/23/2016 17:15"), northing = c(3674.64416424279, 3674.65121597935,
3674.65474186293, 3674.65826775671, 3674.662, 3674.665), easting = c(354.266660979476,
354.246972537617, 354.237128326737, 354.227284122559, 354.2702,
354.3123)), row.names = c(NA, -6L), class = "data.frame")
Related
I'm playing around with functions in R and want to create a function that takes a character variable and converts it to a POSIXct.
The time variable currently looks like this:
"2020-01-01T05:00:00.283236Z"
I've successfully converted the time variable in my janviews dataset with the following code:
janviews$time <- gsub('T',' ',janviews$time)
janviews$time <- as.POSIXct(janviews$time, format = "%Y-%m-%d %H:%M:%S", tz = Sys.timezone())
Since I have to perform this on multiple datasets, I want to create a function that will perform this. I created the following function but it doesn't seem to be working and I'm not sure why:
set.time <- function(dat, variable.name){
dat$variable.name <- gsub('T', ' ', dat$variable.name)
dat$variable.name <- as.POSIXct(dat$variable.name, format = "%Y-%m-%d %H:%M:%S", tz = Sys.timezone())
}
Here's the first four rows of the janviews dataset:
structure(list(customer_id = c("S4PpjV8AgTBx", "p5bpA9itlILN",
"nujcp24ULuxD", "cFV46KwexXoE"), product_id = c("kq4dNGB9NzwbwmiE",
"FQjLaJ4B76h0l1dM", "pCl1B4XF0iRBUuGt", "e5DN2VOdpiH1Cqg3"),
time = c("2020-01-01T05:00:00.283236Z", "2020-01-01T05:00:00.895876Z",
"2020-01-01T05:00:01.362329Z", "2020-01-01T05:00:01.873054Z"
)), row.names = c(NA, -4L), class = c("data.table", "data.frame"))
Also, if there is a better way to convert my time variable, I am open to changing my method!
I would use the lubridate package and the as_datetime() function.
lubridate::as_datetime("2020-01-01T05:00:00.283236Z")
Returns
"2020-01-01 05:00:00 UTC"
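As for the function itself, two things stop set.time() from working: dat$variable.name looks for a column literally named "variable.name", and the function never returns dat. A base-R sketch that fixes both, assuming the column name is passed as a string and that the trailing "Z" means the stamps are in UTC (set_time() and the views data are hypothetical):

```r
# Convert an ISO 8601 text column such as "2020-01-01T05:00:00.283236Z"
# to POSIXct. dat[[variable_name]] looks the column up by the *value*
# of variable_name, unlike dat$variable.name.
set_time <- function(dat, variable_name, tz = "UTC") {
  dat[[variable_name]] <- as.POSIXct(dat[[variable_name]],
                                     format = "%Y-%m-%dT%H:%M:%OS",  # %OS keeps fractional seconds
                                     tz = tz)  # the trailing "Z" is ignored by the parser
  dat  # return the whole modified data frame to the caller
}

views <- data.frame(time = c("2020-01-01T05:00:00.283236Z",
                             "2020-01-01T05:00:01.873054Z"),
                    stringsAsFactors = FALSE)
views <- set_time(views, "time")
class(views$time)
```

With the question's data this would be janviews <- set_time(janviews, "time"). Parsing with tz = "UTC" rather than Sys.timezone() keeps the clock times tied to the zone the "Z" suffix names.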
I am working with data in R and would like to change the time zone of some POSIXct data, but only for certain rows within the columns (Survey_Start and Survey_End). Some of the data is already in the proper time zone, so converting the entire column is a problem. My code to change the time zone is:
herps2021 <- herps2021 %>%
mutate(Survey_Start = as.POSIXct(Survey_Start, format = "%H:%M:%S",
tz = "UTC"),
Survey_End = as.POSIXct(Survey_End, format = "%H:%M:%S", tz =
"UTC"),
#Change to proper time zone
Survey_Start = with_tz(Survey_Start, tzone = "America/Los_Angeles"),
Survey_End = with_tz(Survey_End , tzone = "America/Los_Angeles")
)
Is there a way to specify which rows for the columns Survey_Start and Survey_End I want to convert, so that the data already in the correct time zone is unaffected?
Thanks!
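One caveat: a POSIXct column carries a single tzone attribute, so its rows cannot each keep a different time zone. What can differ per row is how the clock times are interpreted when parsing. A base-R sketch of that, assuming full date-time strings (the question parses only "%H:%M:%S") and a hypothetical logical column start_is_utc marking the rows recorded in UTC:

```r
# Hypothetical example: full date-time strings plus a flag for the rows
# that were recorded in UTC (the other rows are already Pacific time).
survey <- data.frame(
  Survey_Start = c("2021-06-01 19:00:00", "2021-06-01 12:30:00"),
  start_is_utc = c(TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Parse every row as Pacific clock time, then re-parse only the flagged
# rows as UTC. Assigning into a POSIXct subset keeps the vector's own
# tzone, so the UTC instants are simply displayed in Pacific time.
to_pacific <- function(clock, is_utc, tz = "America/Los_Angeles") {
  fmt <- "%Y-%m-%d %H:%M:%S"
  out <- as.POSIXct(clock, format = fmt, tz = tz)
  out[is_utc] <- as.POSIXct(clock[is_utc], format = fmt, tz = "UTC")
  out
}

survey$Survey_Start <- to_pacific(survey$Survey_Start, survey$start_is_utc)
format(survey$Survey_Start, "%Y-%m-%d %H:%M:%S %Z")
```

The same call would be repeated for Survey_End with its own flag.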
You could try parse_date_time(), which can parse dates and times written in several formats within the same column.
It looks something like this:
library(lubridate)
parse_date_time(c("2016", "2016-04"), orders = c("Y", "Ym"))
#> [1] "2016-01-01 UTC" "2016-04-01 UTC"
Here is the link to the documentation: https://lubridate.tidyverse.org/reference/parse_date_time.html
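Applied to the mixed-format column from the first question, a sketch could pass one order per layout (this assumes those two layouts are the only ones present):

```r
library(lubridate)

# Both layouts from the first question in one character vector
x <- c("2016/07/19 17:52:00", "2016/07/19 17:54:00",
       "07/22/2016 17:02", "07/23/2016 17:15")

# One order per layout: year-month-day with seconds, month-day-year without
parsed <- parse_date_time(x, orders = c("Ymd HMS", "mdY HM"), tz = "GMT")
parsed
```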
I have to pull different data sets from the same API regularly but for different reasons, so I have to write out the code for many different pulls. I'd like to create some functions to help with this, but I need some help.
I haven't been able to figure out how to set up the function so that I can change the data set but still pull from the same column each time. In this example, I have 4 columns with timestamps that mean different things (made up in this data). I need to change the timezone here to my local time zone. The column name will remain the same in all of my datasets, but the name of the dataset will change. I have a few places in my code where I need to do this, and I haven't been able to figure it out, so any suggestions would be much appreciated!
The second section of this example code is not included in the actual code, but it is there to set the data up correctly. The data comes out of the API in the format shown as GMT.
df <- data.frame(col_1 = c(1, 2, 3, 4),
time_1 = c("2021-01-20 23:58:21", "2021-01-20 21:21:00", "2021-01-20 17:14:04", "2021-01-20 01:05:18"),
time_2 = c("2021-01-19 23:58:21", "2021-01-19 21:21:00", "2021-01-19 17:14:04", "2021-01-19 01:05:18"),
time_3 = c("2021-01-18 23:46:21", "2021-01-18 36:21:00", "2021-01-18 15:14:04", "2021-01-18 01:05:18"),
time_4 = c("2021-01-17 23:58:21", "2021-01-17 20:21:00", "2021-01-17 18:14:04", "2021-01-17 02:05:18"))
# Not part of actual code
df$time_1 <- as.POSIXlt(df$time_1, tz = "GMT")
df$time_2 <- as.POSIXlt(df$time_2, tz = "GMT")
df$time_3 <- as.POSIXlt(df$time_3, tz = "GMT")
df$time_4 <- as.POSIXlt(df$time_4, tz = "GMT")
# What I want it to do
# df$time_1 <- lubridate::with_tz(df$time_1, tz = "America/Los_Angeles")
# df$time_2 <- lubridate::with_tz(df$time_2, tz = "America/Los_Angeles")
# df$time_3 <- lubridate::with_tz(df$time_3, tz = "America/Los_Angeles")
# df$time_4 <- lubridate::with_tz(df$time_4, tz = "America/Los_Angeles")
# Attempted function
timezone_cleanup <- function(my_df){
my_df$time_1 <- lubridate::with_tz(my_df$time_1, tz = "America/Los_Angeles")
my_df$time_2 <- lubridate::with_tz(my_df$time_2, tz = "America/Los_Angeles")
my_df$time_3 <- lubridate::with_tz(my_df$time_3, tz = "America/Los_Angeles")
my_df$time_4 <- lubridate::with_tz(my_df$time_4, tz = "America/Los_Angeles")
}
# how I'd like to use this function. Not working now. Even if I wrap it with data.frame(), it's not what I wanted.
new_df <- timezone_cleanup(df)
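An R function returns the value of its last expression. In your function that is the assignment to my_df$time_4, so new_df ends up holding only the converted time_4 values instead of the data frame; add my_df as the last line to return the whole data frame. Alternatively, you can use lapply or across to apply the same function to multiple columns:
library(dplyr)

timezone_cleanup <- function(my_df){
  my_df %>%
    mutate(across(starts_with("time"),
                  function(x) lubridate::with_tz(x, tzone = "America/Los_Angeles")))
}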
new_df <- timezone_cleanup(df)
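The lapply route mentioned above can be sketched in base R without dplyr or lubridate. For a POSIXct vector, changing the tzone attribute changes only how the same instants are displayed, which is what with_tz() does. timezone_cleanup_base() is a hypothetical name, and the test data is a shortened POSIXct version of the question's df:

```r
# Base-R sketch: give every column whose name starts with "time" a new
# display time zone and return the whole data frame.
timezone_cleanup_base <- function(my_df, tz = "America/Los_Angeles") {
  cols <- grep("^time", names(my_df))  # positions of time_1, time_2, ...
  my_df[cols] <- lapply(my_df[cols], function(x) {
    x <- as.POSIXct(x)                 # also turns POSIXlt into POSIXct
    attr(x, "tzone") <- tz             # same instants, new display zone
    x
  })
  my_df                                # last expression = return value
}

df <- data.frame(col_1 = c(1, 2),
                 time_1 = as.POSIXct(c("2021-01-20 23:58:21", "2021-01-20 21:21:00"), tz = "GMT"),
                 time_2 = as.POSIXct(c("2021-01-19 23:58:21", "2021-01-19 21:21:00"), tz = "GMT"))
new_df <- timezone_cleanup_base(df)
new_df$time_1
```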
By the way, I do receive a warning while running this: Unrecognized time zone 'America/Los_Angeles'. Are you sure you are using the correct tz value?
I have an instrument that exports data in an unruly time format. I need to combine the date and time vectors into a new datetime vector in the following POSIXct format: %Y-%m-%d %H:%M:%S. Out of curiosity, I attempted to do this in three different ways, using as.POSIXct(), strftime(), and strptime(). When using my example data below, only the as.POSIXct() and strftime() functions work, but I am curious as to why strptime() is producing NAs? Also, I cannot convert the strftime() output into a POSIXct object using as.POSIXct()...
When trying these same functions on my real data (of which I've only provided you with the first four rows), I am running into an entirely different problem. Only the strftime() function is working. For some reason the as.POSIXct() function is also producing NAs, and it is the only command I actually need for converting my datetime into a POSIXct object...
It seems like there are subtle differences between these functions, and I want to know how to use them more effectively. Thanks!
Reproducible Example:
## Creating dataframe:
date <- c("2017-04-14", "2017-04-14","2017-04-14","2017-04-14")
time <- c("14:24:24.992000","14:24:25.491000","14:24:26.005000","14:24:26.511000")
value <- c("4.106e-06","4.106e-06","4.106e-06","4.106e-06")
data <- data.frame(date, time, value) ## data.frame() can combine all three vectors at once
head(data)
## Creating 3 different datetime vectors:
## This works in my example code, but not with my real data...
data$datetime1 <- as.POSIXct(paste(data$date, data$time), format = "%Y-%m-%d %H:%M:%S",tz="UTC")
class(data$datetime1)
## This is producing NAs, and I'm not sure why:
data$datetime2 <- strptime(paste(data$date, data$time), format = "%Y-%m-%d %H:%M%:%S", tz = "UTC")
class(data$datetime2)
## This is working just fine
data$datetime3 <- strftime(paste(data$date, data$time), format = "%Y-%m-%d %H:%M%:%S", tz = "UTC")
class(data$datetime3)
head(data)
## Since I cannot get the as.POSIXct() function to work with my real data, I tried this workaround. Unfortunately I am running into trouble...
data$datetime4 <- as.POSIXct(data$datetime3, format = "%Y-%m-%d %H:%M%:%S", tz = "UTC")
Example using real_data.txt:
## Reading in the file:
fpath <- "~/real_data.txt"
x <- read.csv(fpath, skip = 1, header = FALSE, sep = "", stringsAsFactors = FALSE)
names(x) <- c("date","time","bscat","scat_coef","pressure_mbar","temp_K","CH1","CH2") ## This is data from a Radiance Research Integrating Nephelometer Model M903 for anyone who is interested!
## If anyone could get this to work that would be awesome!
x$datetime1 <- as.POSIXct(paste(x$date, x$time), format = "%Y-%m-%d %H:%M%:%S", tz = "UTC")
## This still doesn't work...
x$datetime2 <- strptime(paste(x$date, x$time), format = "%Y-%m-%d %H:%M%:%S", tz = "UTC")
## This works:
x$datetime3 <- strftime(paste(x$date, x$time), format = "%Y-%m-%d %H:%M%:%S", tz = "UTC")
## But I cannot convert from strftime character to POSIXct object, so it doesn't help me at all...
x$datetime4 <- as.POSIXct(x$datetime3, format = "%Y-%m-%d %H:%M%:%S", tz = "UTC")
head(x)
Solution:
I was not providing the as.POSIXct() function with the correct format string. Once I changed %Y-%m-%d %H:%M%:%S to %Y-%m-%d %H:%M:%S, the data$datetime2, data$datetime4, x$datetime1 and x$datetime2 were working properly! Big thanks to PhilC for debugging!
For your real data issue, replace the %M% with %M:
## Reading in the file:
fpath <- "c:/r/data/real_data.txt"
x <- read.csv(fpath, skip = 1, header = FALSE, sep = "", stringsAsFactors = FALSE)
names(x) <- c("date","time","bscat","scat_coef","pressure_mbar","temp_K","CH1","CH2") ## This is data from a Radiance Research Integrating Nephelometer Model M903 for anyone who is interested!
## issue was the %M% - fixed
x$datetime1 <- as.POSIXct(paste(x$date, x$time), format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
## Here too - fixed
x$datetime2 <- strptime(paste(x$date, x$time), format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
head(x)
There was a format string error causing the NAs; try this:
## This is no longer producing NAs:
data$datetime2 <- strptime(paste(data$date, data$time), format = "%Y-%m-%d %H:%M:%S",tz="UTC")
class(data$datetime2)
The "%OS" conversion in "%Y-%m-%d %H:%M:%OS" reads the fractional seconds too, but by default they are not printed. To display them with a specific number of decimal places, set the digits.secs option, e.g.:
options(digits.secs=6) # print seconds with up to 6 decimal places
data$datetime1 <- lubridate::parse_date_time(paste(data$date, data$time), "%Y-%m-%d %H:%M:%OS")
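To see the differences the question asks about, here is a small base-R sketch using two of the example timestamps with the corrected format string: as.POSIXct() returns a POSIXct (a number of seconds), strptime() returns a POSIXlt (a list of date-time fields), and strftime() returns plain character text. Using %OS instead of %S keeps the fractional seconds when parsing:

```r
# Two of the example timestamps from the question
stamp <- paste(c("2017-04-14", "2017-04-14"),
               c("14:24:24.992000", "14:24:25.491000"))

# Note the format: "%H:%M:%OS", not "%H:%M%:%S"
ct <- as.POSIXct(stamp, format = "%Y-%m-%d %H:%M:%OS", tz = "UTC")  # seconds since 1970
lt <- strptime(stamp, format = "%Y-%m-%d %H:%M:%OS", tz = "UTC")    # list of fields
ch <- strftime(ct, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")        # plain text

class(ct)  # "POSIXct" "POSIXt"
class(lt)  # "POSIXlt" "POSIXt"
class(ch)  # "character"

# The text parses back to POSIXct, but "%S" in strftime() dropped the fraction
back <- as.POSIXct(ch, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
as.numeric(difftime(ct, back, units = "secs"))
```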
I have data in YYMMDDHH format but am trying to get the weekday so I need to go to a date format but can't figure it out.
Here's a dput of the relevant data:
structure(list(id = c(7927751403363142656, 18236986451472797696,
5654946373641778176, 14195690822403907584, 1693303484298446848,
1.1362181921561e+19, 11694645532962195456, 1221431312630614784,
1987127670789791488, 379819848497418688), hour = c(14102118L,
14102217L, 14102812L, 14102912L, 14102820L, 14102401L, 14102117L,
14102312L, 14102301L, 14102414L)), .Names = c("id", "hour"), row.names = c(3620479L,
8510796L, 29632625L, 34450879L, 31874113L, 13420799L, 3332671L,
11543560L, 9602012L, 15574701L), class = "data.frame")
When I use:
dat2$dow <- as.Date(substr(as.character(dat2$hour), 1,6), format = '%Y%m%d')
I just get NAs. Any suggestions?
"%Y" is for 4-digit years; "%y" is for 2-digit years. And you don't need to use substr. as.Date will ignore anything after the end of the specified format.
dat2$dow <- as.Date(as.character(dat2$hour), format='%y%m%d')
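Since the goal is the weekday, here is a sketch that goes one step further, using three of the hour values from the dput() above: weekdays() gives locale-dependent day names, while the wday field of as.POSIXlt() gives a number from 0 (Sunday) to 6 (Saturday). Parsing with "%y%m%d%H" instead keeps the hour as well:

```r
hour <- c(14102118L, 14102217L, 14102812L)  # YYMMDDHH values from the question

day <- as.Date(as.character(hour), format = "%y%m%d")  # trailing HH is ignored
weekdays(day)            # day names, depend on the locale
as.POSIXlt(day)$wday     # 0 = Sunday ... 6 = Saturday

# Keep the hour too by parsing all four fields into a date-time
stamp <- as.POSIXct(as.character(hour), format = "%y%m%d%H", tz = "UTC")
stamp
```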