How to Zero Pad Month and Day in R

I've read in an Excel file with dates formatted as m/d/yyyy, which R is reading as a factor. Trying the code below returns all NA:
strptime(tvMid$calendar_date, format = "%Y-%m-%d")
It appears I need to zero-pad the month and/or day conditionally, based on whether it is a single digit. What is the best approach to add these zeroes?

data.frame(
calendar_date = "5/13/2018"
) -> tvMid
strptime(tvMid$calendar_date, format = "%m/%d/%Y")
## [1] "2018-05-13 EDT"
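No zero padding is needed here: on input, %m and %d accept one- or two-digit values, so the original call returned NA only because its format string ("%Y-%m-%d") did not match the m/d/yyyy text. A minimal sketch for a factor column like the one in the question, assuming a Date (rather than POSIXlt) result is wanted; the second date and the factor() call are made up to mimic the imported data:

```r
# Mimic the imported column: m/d/yyyy text stored as a factor
tvMid <- data.frame(calendar_date = factor(c("5/13/2018", "11/3/2018")))

# as.character() makes the factor-to-text step explicit; the format string
# must describe the text as it is, and %m / %d read single digits as-is
tvMid$calendar_date <- as.Date(as.character(tvMid$calendar_date),
                               format = "%m/%d/%Y")

# Zero-padded text is only a display concern, and format() provides it
padded <- format(tvMid$calendar_date, "%m/%d/%Y")
padded
# [1] "05/13/2018" "11/03/2018"
```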

Related

How to calculate time difference in R using a data frame

I have a large data frame with 2 POSIXct columns and need to calculate the length of each ride.
Dates are formatted as follows:
format: "2020-10-31 19:39:43"
I can use the difftime function, correct?
Thanks
Given that your data is already in POSIXct format, you can simply subtract the two date-times to get the difference. No additional functions are needed.
date1 <- as.POSIXct(strptime("2020-10-31 19:39:43", format = "%Y-%m-%d %H:%M:%OS"))
date2 <- as.POSIXct(strptime("2020-10-31 19:20:43", format = "%Y-%m-%d %H:%M:%OS"))
date1 - date2
Output: Time difference of 19 mins
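Plain subtraction lets R pick the unit automatically, so the unit can change with the size of the differences. When a fixed unit is needed, difftime() with an explicit units argument is one option. A small sketch, assuming a hypothetical data frame rides with POSIXct columns start_time and end_time (these names and the second row are illustrative, not from the question):

```r
# Hypothetical ride data; UTC is set explicitly so the result does not
# depend on the local time zone
rides <- data.frame(
  start_time = as.POSIXct(c("2020-10-31 19:20:43", "2020-10-31 08:00:00"), tz = "UTC"),
  end_time   = as.POSIXct(c("2020-10-31 19:39:43", "2020-10-31 09:30:00"), tz = "UTC")
)

# difftime() is vectorised over the columns; as.numeric() drops the
# difftime class and leaves plain minutes
rides$ride_mins <- as.numeric(difftime(rides$end_time, rides$start_time, units = "mins"))
rides$ride_mins
# [1] 19 90
```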
It depends on what output format you want.
For example, if you want the difference in months between two dates, you can use the interval() function from the lubridate package:
library(lubridate)
interval(as.Date(df$date1), as.Date(df$date2)) %/% months(1)
It also works with years, weeks, days, and hours.

Counting Observations Within Date Range R

This probably has a really simple solution. I have two data sets. One is a vector of POSIXct tweet timestamps and the second is a vector of POSIXct ADL HEAT Map timestamps.
I'm looking to build a function that lets me take the dates from the tweets vector and for each one count the number of timestamps in the ADL HEAT Map vector that fall within a specified range from the tweet.
My aim is to build the function such that I can put in the tweets vector, the ADL vector, the number of days from the tweets vector to start counting, and the number of days from the tweets vector to stop counting, and return a vector of counts the same length as the tweets data.
I already tried the solution here, and it didn't work: Count number of occurences in date range in R
Here's an example of what I'm trying to do. Here's a smaller version of the data sets I'm using:
tweets <- c("2016-12-12 14:34:00 GMT", "2016-12-5 17:20:06 GMT")
ADLData <- c("2016-12-11 16:30:00 GMT", "2016-12-7 18:00:00 GMT", "2016-12-2 09:10:00 GMT")
I want to create a function, let's call it countingfunction, that lets me input the first data set and the second one, and specify a number of days to look back. In this example, I chose 7 days:
countingfunction(tweets, ADLData, 7)
Ideally this would return a vector of the length of tweets or in this case 2 with counts for each of how many events in ADLData occurred within the past 7 days from the date in tweets. In this case, c(2,1).
So, if I have understood you correctly you have that kind of data:
tweets <- c(as.POSIXct("2020-08-16", tz = ""), as.POSIXct("2020-08-15", tz = ""), as.POSIXct("2020-08-14", tz = ""), as.POSIXct("2020-08-13", tz = ""))
ADL <- c(as.POSIXct("2020-08-15", tz = ""), as.POSIXct("2020-08-14", tz = ""))
What you want to do is check whether each tweet timestamp appears in the ADL vector. Note that %in% tests for exact matches only, not for a range. That could be accomplished like this:
ifelse(tweets %in% ADL, "it's in", "it's not")
You can easily assign the result to another vector, which then states whether each tweet is in or not.
You can write countingfunction with the help of outer and calculate the difference in time between every value of two vectors using difftime.
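countingfunction <- function(x1, x2, n) {
  # rows = elements of x1, columns = elements of x2; entries are x1 - x2 in days
  mat <- outer(x1, x2, difftime, units = 'days')
  # count the x2 events that fall within the n days before each x1 element
  rowSums(mat > 0 & mat <= n)
}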
Assuming you have vectors of class POSIXct like these :
tweets <- as.POSIXct(c("2016-12-12 14:34:00", "2016-12-5 17:20:06"), tz = 'GMT')
ADLData <- as.POSIXct(c("2016-12-11 16:30:00","2016-12-7 18:00:00",
"2016-12-2 09:10:00"), tz = 'GMT')
n <- 7
You can pass them as :
countingfunction(tweets, ADLData, n)
#[1] 2 1
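The question also asks for separate "start counting" and "stop counting" offsets. One possible variation is sketched below; the name count_in_window and the parameters from_days and to_days are hypothetical. It loops over the tweets with vapply() instead of building the full outer() matrix, which may be preferable for long vectors but is otherwise the same idea:

```r
# Hypothetical helper: for each element of x1, count the elements of x2 that
# occurred more than from_days and at most to_days before it
count_in_window <- function(x1, x2, from_days, to_days) {
  vapply(seq_along(x1), function(i) {
    # days elapsed from each x2 event to the i-th x1 timestamp
    lag_days <- as.numeric(difftime(x1[i], x2, units = "days"))
    sum(lag_days > from_days & lag_days <= to_days)
  }, integer(1))
}

tweets  <- as.POSIXct(c("2016-12-12 14:34:00", "2016-12-5 17:20:06"), tz = "GMT")
ADLData <- as.POSIXct(c("2016-12-11 16:30:00", "2016-12-7 18:00:00",
                        "2016-12-2 09:10:00"), tz = "GMT")

count_in_window(tweets, ADLData, 0, 7)  # same window as countingfunction(..., 7)
# [1] 2 1
count_in_window(tweets, ADLData, 1, 7)  # skip events from the most recent day
# [1] 1 1
```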

Mixed Date formats in R data frame

How do you work with a column of mixed date formats, for example 8/2/2020 and 2/7/2020, where all of the dates are in February?
I have tried zoo::as.Date(mixeddatescolumn, "%d/%m/%Y"). The first one is right but the second is wrong.
I have tried the solutions here too:
Fixing mixed date formats in data frame? But that question seems different from what I am handling.
It is really tricky, even for a human, to know whether a date like '8/2/2020' is 8th February or 2nd August. However, we can leverage the fact that all these dates are known to be in February: remove the "2/" part of the date that represents the month, rearrange what remains into one standard format, and then convert it to an actual Date object.
x <- c('8/2/2020','2/7/2020')
lubridate::mdy(paste0('2/', sub('2/', '', x, fixed = TRUE)))
#[1] "2020-02-08" "2020-02-07"
Or same in base R :
as.Date(paste0('2/', sub('2/', '', x, fixed = TRUE)), "%m/%d/%Y")
Since we know that every date is in February, search for /2/ or /02/. If it is found, the middle number is the month; otherwise, the first number is the month. In either case, set the format accordingly and use as.Date. No packages are used.
dates <- c("8/2/2020", "2/7/2020", "2/28/2000", "28/2/2000") # test data
as.Date(dates, ifelse(grepl("/0?2/", dates), "%d/%m/%Y", "%m/%d/%Y"))
## [1] "2020-02-08" "2020-02-07" "2000-02-28" "2000-02-28"
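Because both approaches rest on the assumption that every date falls in February, it may be worth checking the result against that assumption. A minimal base R sketch, reusing the test data above (the variable names fmt and parsed are illustrative):

```r
dates <- c("8/2/2020", "2/7/2020", "2/28/2000", "28/2/2000")  # test data

# Pick a format per element: month in the middle if "/2/" or "/02/" is present
fmt <- ifelse(grepl("/0?2/", dates), "%d/%m/%Y", "%m/%d/%Y")
parsed <- as.Date(dates, fmt)

# Sanity check: nothing failed to parse, and every month really is February
stopifnot(!anyNA(parsed), all(format(parsed, "%m") == "02"))
```

If the check fails, the input contains a value that breaks the February assumption and should be inspected by hand.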

How to convert Hour Minutes character format into a POSIXlt format?

Current situation:
mydate <- "14:45"
class(mydate)
The current class of this value is character. I would like to convert it to POSIXlt.
I tried the strptime() function, but it unfortunately adds the full date to my hours when I actually only need Hours:Minutes:
mydate <- strptime(mydate, format = "%H:%M")
What can I do to get a POSIXlt object containing only hours and minutes?
Thanks in advance for your replies!
POSIXlt and POSIXct always contain both a date and a time. You can use the times class from the chron package to represent times of day less than 24:00:00.
library(chron)
tt <- times(paste(mydate, "00", sep = ":"))
tt
## [1] 14:45:00
times class objects are represented internally as a fraction of a day so, for example, adding 1/24 will add an hour.
tt + 1/24 # add one hour
## [1] 15:45:00
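If adding a package is not an option, another possibility is to keep the POSIXlt object, ignore its date part, and show only hours and minutes when printing. A base R sketch under that assumption (the variable names mytime and hm are illustrative, and tz = "UTC" is chosen only to keep the example independent of the local time zone):

```r
# strptime() fills in the current date, which is simply ignored here
mytime <- strptime("14:45", format = "%H:%M", tz = "UTC")

# The POSIXlt components give direct access to the time of day
mytime$hour  # 14
mytime$min   # 45

# format() controls what is displayed; the result is a character string
hm <- format(mytime, "%H:%M")
hm
# [1] "14:45"
```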
For me it works like this:
test <- "2016-04-10T12:21:25.4278624"
z <- as.POSIXct(test,format="%Y-%m-%dT%H:%M:%OS")
#output:
z
"2016-04-10 12:21:25 CEST"
The code is from here: R: convert date from character to datetime

How to convert character to Date format in R?

How do I convert the character values below to Date format?
YYYY.MM
I am facing an issue dealing with zeroes after decimal points for month 10.
Say
2012.10
appears in my input source data as
2012.1
with the trailing zero after the decimal point missing. How do I convert this to Date format?
Since you have only year and month, you need to assign some value for day before you convert to date. In the example below, the day has arbitrarily been chosen as 15.
IF THE INPUT IS CHARACTER
dates = c("2012.10", "2012.01")
lubridate::ymd(paste0(year_month = dates, day = "15"))
#[1] "2012-10-15" "2012-01-15"
IF THE INPUT IS NUMERIC
dates = c(2012.10, 2012.01)
do.call(c, lapply(strsplit(as.character(dates), "\\."), function(d){
    if(nchar(d[2]) == 1){
        d[2] = paste0(d[2],"0")
    }
    lubridate::ymd(paste0(year = d[1], month = d[2], day = "15"))
}))
#[1] "2012-10-15" "2012-01-15"
The zoo package has a "yearmon" class for representing year and month without day. Internally it stores them as year + fraction, where fraction = 0 for Jan, 1/12 for Feb, 2/12 for Mar and so on, but it prints in a nicer format and sorts as expected. Assuming that your input, x, is numeric, convert it to character with a 2-digit month and then apply as.yearmon with the appropriate format.
library(zoo)
x <- c(2012.1, 2012.01) # test data
as.yearmon(sprintf("%.2f", x), "%Y.%m")
## [1] "Oct 2012" "Jan 2012"
as.Date can be applied to convert a "yearmon" object to "Date" class if desired but normally that is not necessary.
as.Date(as.yearmon(sprintf("%.2f", x), "%Y.%m"))
## [1] "2012-10-01" "2012-01-01"
The code below uses the ymd() function from the lubridate package and sprintf() to coerce dates given in a numeric format
dates <- c(2012.1, 2012.01)
as well as dates given as a character string
dates <- c("2012.1", "2012.01")
where the part left of the decimal point specifies the year and the fractional part denotes the month.
lubridate::ymd(sprintf("%.2f", as.numeric(dates)), truncated = 1L)
[1] "2012-10-01" "2012-01-01"
The format specification %.2f tells sprintf() to use 2 decimal places.
The parameter truncated = 1L indicates that one date element is missing (day) and should be completed by the default value (the first day of the month). Alternatively, the day of the month can be directly specified in the format specification to sprintf():
lubridate::ymd(sprintf("%.2f-15", as.numeric(dates)))
[1] "2012-10-15" "2012-01-15"
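The same sprintf() idea also works without any packages. A base R sketch, assuming the first day of the month is an acceptable placeholder day (the variable names padded and result are illustrative):

```r
dates <- c("2012.1", "2012.01")  # character input; numeric input works too

# sprintf("%.2f") restores the dropped trailing zero: "2012.1" -> "2012.10"
padded <- sprintf("%.2f", as.numeric(dates))

# Append a day so that as.Date() has a complete date to parse
result <- as.Date(paste0(padded, ".01"), format = "%Y.%m.%d")
result
# [1] "2012-10-01" "2012-01-01"
```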
