I'm playing around with functions in R and want to create a function that takes a character variable and converts it to a POSIXct.
The time variable currently looks like this:
"2020-01-01T05:00:00.283236Z"
I've successfully converted the time variable in my janviews dataset with the following code:
janviews$time <- gsub('T',' ',janviews$time)
janviews$time <- as.POSIXct(janviews$time, format = "%Y-%m-%d %H:%M:%S", tz = Sys.timezone())
Since I have to perform this on multiple datasets, I want to create a function that will perform this. I created the following function but it doesn't seem to be working and I'm not sure why:
set.time <- function(dat, variable.name){
dat$variable.name <- gsub('T', ' ', dat$variable.name)
dat$variable.name <- as.POSIXct(dat$variable.name, format = "%Y-%m-%d %H:%M:%S", tz = Sys.timezone())
}
Here's the first four rows of the janviews dataset:
structure(list(customer_id = c("S4PpjV8AgTBx", "p5bpA9itlILN",
"nujcp24ULuxD", "cFV46KwexXoE"), product_id = c("kq4dNGB9NzwbwmiE",
"FQjLaJ4B76h0l1dM", "pCl1B4XF0iRBUuGt", "e5DN2VOdpiH1Cqg3"),
time = c("2020-01-01T05:00:00.283236Z", "2020-01-01T05:00:00.895876Z",
"2020-01-01T05:00:01.362329Z", "2020-01-01T05:00:01.873054Z"
)), row.names = c(NA, -4L), class = c("data.table", "data.frame"
), .internal.selfref = <pointer: 0x1488180e0>)
Also, if there is a better way to convert my time variable, I am open to changing my method!
I would use the lubridate package and the as_datetime() function.
lubridate::as_datetime("2020-01-01T05:00:00.283236Z")
Returns
"2020-01-01 05:00:00 UTC"
See the lubridate documentation for more information.
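As for why the function in the question has no effect: dat$variable.name looks for a column literally named "variable.name" rather than the column whose name is stored in the argument, and the function never returns the modified data. Below is a minimal base R sketch of one way around both issues. The name set_time, the tz argument and its "UTC" default are assumptions made for illustration (the trailing Z in the strings denotes UTC), and the two-row data frame just reuses values from the question.

```r
# Hypothetical helper; the name `set_time` and the `tz` default are illustrative.
set_time <- function(dat, variable.name, tz = "UTC") {
  # [[ ]] looks up the column whose name is held in variable.name;
  # dat$variable.name would look for a column literally called "variable.name".
  dat[[variable.name]] <- as.POSIXct(dat[[variable.name]],
                                     format = "%Y-%m-%dT%H:%M:%OS",
                                     tz = tz)
  dat  # return the modified copy so the caller can assign it
}

# Two rows taken from the question's sample data
janviews <- data.frame(
  customer_id = c("S4PpjV8AgTBx", "p5bpA9itlILN"),
  time = c("2020-01-01T05:00:00.283236Z", "2020-01-01T05:00:00.895876Z"),
  stringsAsFactors = FALSE
)

janviews <- set_time(janviews, "time")
class(janviews$time)
```

Because the function returns a copy, the result has to be assigned back (janviews <- set_time(janviews, "time")); calling it without the assignment leaves the original untouched.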
I'm fairly new to R and am trying to plot some expenditure data. I read the data in from Excel and then do some manipulation on the dates:
data <- read.csv("Spending2019.csv", header = T)
#converts time so R can use the dates
strdate <- strptime(data$DATE,"%m/%d/%Y")
newdate <- cbind(data,strdate)
finaldata <- newdate[order(strdate),]
This probably isn't the most efficient, but it gets me there :)
Here's the relevant columns of the first four lines of my finaldata dataframe
dput(droplevels(finaldata[1:4,c(5,7)]))
structure(list(AMOUNT = c(25.13, 14.96, 43.22, 18.43), strdate = structure(c(1546578000,
1546750800, 1547010000, 1547010000), class = c("POSIXct", "POSIXt"
), tzone = "")), row.names = c(NA, 4L), class = "data.frame")
The full data set has 146 rows and the dates range from 1/4/2019 to 12/30/2019
I then plot the data
plot(finaldata$strdate,finaldata$AMOUNT, xlab = "Month", ylab = "Amount Spent")
and I get this plot
This is fine for me getting started, EXCEPT why is JAN repeated at the far right end? I have tried various forms of xlim and can't seem to get it to go away.
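A likely explanation, offered as an assumption rather than a confirmed diagnosis: the data run to 30 December, the plot region extends slightly past the last point, and the default tick placement then lands on 1 January 2020, which is labelled with the month name only. One way to control this is to suppress the default axis and draw the ticks yourself. The sketch below uses made-up stand-in data (dates and amount are illustrative, not the real spending figures) so that it runs on its own.

```r
# Stand-in data for illustration only; not the real spending data
dates  <- seq(as.POSIXct("2019-01-04", tz = "UTC"),
              as.POSIXct("2019-12-30", tz = "UTC"),
              length.out = 146)
amount <- seq(10, 50, length.out = 146)

# One tick at the start of each month of 2019, and none for January 2020
ticks <- seq(as.POSIXct("2019-01-01", tz = "UTC"), by = "month", length.out = 12)

# xaxt = "n" suppresses the default x axis so a custom one can be drawn
plot(dates, amount, xaxt = "n", xlab = "Month", ylab = "Amount Spent")
axis.POSIXct(1, at = ticks, format = "%b")
```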
Using the lubridate package I want to add seconds (for the purpose of the example) to a "POSIXct", "POSIXt" field in a tibble.
b <- structure(list(`"a"` = c("a", "a", "a", "a", "a"), Date_time = structure(c(1506694322,
1506694270, 1506693970, 1506693897, 1506693849), class = c("POSIXct",
"POSIXt"), tzone = "")), .Names = c("\"a\"", "Date_time"), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -5L))
b %>%
mutate(tol_lower = Date_time - second(2),
tol_upper = Date_time + second(30))
I get the error:
Error in mutate_impl(.data, dots) : 'origin' must be supplied
Why is this? I appreciate I can calculate hours, but I'd like to know what I'm doing wrong.
Additional points:
- I've tried as.Date, which gives the same error.
- I can add seconds directly without issue: tol_lower = Date_time - 2
Why not use this?
b %>% mutate(tol_lower = Date_time - 2,
tol_upper = Date_time + 30)
If you want to add hours to a given date-time, simply use Date_time + 2*60*60 (i.e. 2 hours added to Date_time).
Also, ?second clearly says that x in second(x) is a "date-time object", but in your case you are passing a plain number.
Hope it helps!
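To make the arithmetic above concrete, here is a small base R sketch using two of the timestamps from the question. The tz = "UTC" setting and the plus_2h column name are assumptions added for illustration. In lubridate, the function that builds a time span is seconds() (plural), whereas second() extracts the seconds component from a date-time, which would explain the error.

```r
# Two of the timestamps from the question, rebuilt as POSIXct (UTC assumed)
b <- data.frame(
  Date_time = as.POSIXct(c(1506694322, 1506694270),
                         origin = "1970-01-01", tz = "UTC")
)

# POSIXct values are stored as seconds, so plain numbers are treated as seconds
b$tol_lower <- b$Date_time - 2
b$tol_upper <- b$Date_time + 30

# For other units, a difftime avoids hand-written 2*60*60 arithmetic
b$plus_2h <- b$Date_time + as.difftime(2, units = "hours")

b
```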
I'm using R to generate a CZML file from a database.
The database has dirty data.
I need a way to make sure times are in the format "%H:%M:%S".
The data can already be in the correct %H:%M:%S format, or it can be missing the leading zero of the hour, e.g. 8:30:00, which is not valid ISO 8601 and throws off the CZML parsing entirely.
It always needs to look like 08:30:00 or 07:09:00, in 24h format.
I get errors because some values look like 8:30:00 or 7:09:00 (still in 24h format). I haven't checked whether the minutes or seconds are malformed too; for the moment I assume they are correct and only the hours are a problem.
For example, I have a csv file like this:
"Date","Time","TZ","Jul.Time","BirdID","Species","Sex","Age","SiteID","Latitude","Longitude"
"4-Mar-13","08:30:00","America/Costa_Rica",2456356.187500,"test2","GREH","M","AHY","56scr25",8.71191178,-82.96866316
"4-Mar-13","8:30:00","America/Costa_Rica",2456356.187500,"test2","GREH","M","AHY","56scr25",8.71191178,-82.96866316
I need to generate a CZML like so:
"point": {
"color": {
"rgba": [
"2013-03-04T08:30:00Z",225,50,50,196,"2013-03-04T08:30:01Z",50,50,225,196,"2013-03-04T13:30:00Z",225,50,50,196,"2013-03-04T13:30:01Z",50,50,225,196,"2013-03-04T16:00:00Z",225,50,50,196,"2013-03-04T16:00:01Z",50,50,225,196
]
},
"pixelSize": { "number": 10 }
}
My code is like so:
j=1
numVisits=nrow(visitedTimes)
while(j<=numVisits){
date=as.Date(visitedTimes$Date[j], format="%d-%b-%y")
time=format(visitedTimes$Time[j], format="%H:%M:%S")
timeOfPassage=paste0(date,"T",time,"Z")
timeAfter=as.POSIXlt(timeOfPassage, format="%Y-%m-%dT%H:%M:%SZ")
timeAfter$sec=timeAfter$sec+1
timeAfter=format(timeAfter, format="%Y-%m-%dT%H:%M:%SZ")
cat(paste0("\"",timeOfPassage,"\","))
cat("225,50,50,196,")
cat(paste0("\"",timeAfter,"\","))
cat("50,50,225,196")
if(j<numVisits){
cat(",")
}
j=j+1
}
But it doesn't produce the desired output because of the dirty data.
Any ideas?
We can use times from chron
library(chron)
times(v1)
#[1] 08:30:00 08:30:00 07:09:00 07:09:00
Or using base R
format(strptime(v2, '%H:%M:%S'), '%H:%M:%S')
#[1] "08:30:00" "08:30:00" "07:09:00" "07:09:00" "07:09:05" "11:10:00"
Using the OP's updated dataset
df1$Time <- times(df1$Time)
df1$Time
#[1] 08:30:00 08:30:00
Or using regex
sub('^(.:)', '0\\1', df1$Time)
gsub('[^:]{2}(*SKIP)(*F)|(\\d)', '0\\1', v2, perl=TRUE)
#[1] "08:30:00" "08:30:00" "07:09:00" "07:09:00" "07:09:05" "11:10:00"
data
v1 <- c('8:30:00', '08:30:00', '7:09:00', '7:9:00')
v2 <- c(v1, '7:9:5', '11:10:0')
df1 <- structure(list(Date = c("4-Mar-13", "4-Mar-13"), Time = c("08:30:00",
"8:30:00"), TZ = c("America/Costa_Rica", "America/Costa_Rica"
), Jul.Time = c(2456356.1875, 2456356.1875), BirdID = c("test2",
"test2"), Species = c("GREH", "GREH"), Sex = c("M", "M"), Age = c("AHY",
"AHY"), SiteID = c("56scr25", "56scr25"), Latitude = c(8.71191178,
8.71191178), Longitude = c(-82.96866316, -82.96866316)), .Names = c("Date",
"Time", "TZ", "Jul.Time", "BirdID", "Species", "Sex", "Age",
"SiteID", "Latitude", "Longitude"), class = "data.frame", row.names = c(NA,
-2L))
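Building on the base R option above, the whole timestamp can be parsed in one step, since strptime-style parsing accepts a one-digit hour and format() then writes it back zero-padded. Calling format() on a character Time column, as the question's loop does, does not reparse the string, which is likely why the missing zero survives. The sketch below is one possible vectorised replacement for the loop, not a drop-in for the OP's full script. It assumes an English locale for the %b month abbreviation and treats the times as UTC to match the Z suffix; the names stamp and rgba are illustrative.

```r
# Two rows from the question's CSV, one with the leading zero missing
visitedTimes <- data.frame(
  Date = c("4-Mar-13", "4-Mar-13"),
  Time = c("08:30:00", "8:30:00"),
  stringsAsFactors = FALSE
)

# Parse date and time together; %H accepts "8" as well as "08".
# %b depends on the locale (an English locale is assumed here).
stamp <- as.POSIXct(paste(visitedTimes$Date, visitedTimes$Time),
                    format = "%d-%b-%y %H:%M:%S", tz = "UTC")

# format() always writes zero-padded fields
timeOfPassage <- format(stamp, "%Y-%m-%dT%H:%M:%SZ")
timeAfter     <- format(stamp + 1, "%Y-%m-%dT%H:%M:%SZ")  # one second later

# Interleave timestamps and colours, comma-separated, without a while loop
rgba <- paste0('"', timeOfPassage, '",225,50,50,196,',
               '"', timeAfter, '",50,50,225,196',
               collapse = ",")
cat(rgba, "\n")
```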
I have data in YYMMDDHH format and need the weekday, so I'm trying to convert it to a date first, but I can't figure out how.
Here's a dput of the relevant data:
structure(list(id = c(7927751403363142656, 18236986451472797696,
5654946373641778176, 14195690822403907584, 1693303484298446848,
1.1362181921561e+19, 11694645532962195456, 1221431312630614784,
1987127670789791488, 379819848497418688), hour = c(14102118L,
14102217L, 14102812L, 14102912L, 14102820L, 14102401L, 14102117L,
14102312L, 14102301L, 14102414L)), .Names = c("id", "hour"), row.names = c(3620479L,
8510796L, 29632625L, 34450879L, 31874113L, 13420799L, 3332671L,
11543560L, 9602012L, 15574701L), class = "data.frame")
When I use:
dat2$dow <- as.Date(substr(as.character(dat2$hour), 1,6), format = '%Y%m%d')
I just get NAs. Any suggestions?
"%Y" is for 4-digit years; "%y" is for 2-digit years. And you don't need to use substr. as.Date will ignore anything after the end of the specified format.
dat2$dow <- as.Date(as.character(dat2$hour), format='%y%m%d')
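Since the goal was the weekday, here is a short sketch that carries the conversion one step further, using the first two hour values from the question. The variable names d and wday are illustrative. weekdays() returns names in the current locale, while the wday field of a POSIXlt is a locale-independent number (0 = Sunday).

```r
# First two values of the hour column from the question
hour <- c(14102118L, 14102217L)

# %y reads the two-digit year; the trailing HH digits are ignored
d <- as.Date(as.character(hour), format = "%y%m%d")

weekdays(d)                  # weekday names, in the current locale
wday <- as.POSIXlt(d)$wday   # numeric weekday, 0 = Sunday ... 6 = Saturday
wday
```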