Getting a handle on date conversion in R

I created a column of dates by hand.
I need to convert the entire Date column to the form xx/xx/xxxx (01/dd/2012).
I know how to use substr to extract the month, day and year from a standard date format like 01JAN12:
DateM=substr(Date,START,FINISH)
DateM
and then put the pieces together as xx/xx/xx.
Here is my code:
Date=c("January 23,2012","January 24,2012","January 25,2012","January 26,2012","January 27,2012")
# WANT: 23JAN12 24JAN12 25JAN12 26JAN12 27JAN12
Date3=format(Date,"%B %d %Y")
# Error in format.default(Date, "%B %d %Y") : invalid 'trim' argument
# but I still get the dates ok when I print out
# WANT: 01/23/2012 01/24/2012 01/25/2012 01/26/2012 01/27/2012 from Date3
Date4=format(Date3,"%B %d %Y")
# invalid 'trim' argument
# WANT: 2012-01-23 2012-01-24 2012-01-25 2012-01-26 2012-01-27
Date5=format(Date3,"%Y %B %d")
# invalid 'trim' argument
I want to use the basic functions of R as opposed to lubridate.
Can someone direct me how to finish this, please?

You should use as.Date first, not format:
as.Date(Date, "%B %d, %Y")
#[1] "2012-01-23" "2012-01-24" "2012-01-25" "2012-01-26" "2012-01-27"
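as.Date converts dates from their various character representations to the "Date" class. Calling format directly on a character vector dispatches to format.default, whose second argument is trim, which is why the format string triggers the "invalid 'trim' argument" error.
Once the data is in the "Date" class, we can use format to represent it in whatever way we want: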
format(as.Date(Date, "%B %d, %Y"), "%m/%d/%Y")
#[1] "01/23/2012" "01/24/2012" "01/25/2012" "01/26/2012" "01/27/2012"
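
The question lists three target layouts; the same two-step pattern (parse once with as.Date, then format) covers all of them. The following is a minimal sketch, not part of the original answer. It assumes English month names, so it pins LC_TIME to the "C" locale, and it assumes toupper is an acceptable way to get the upper-case month abbreviation.

```r
# Assumption: month names are English; pin the time locale so %B and %b behave predictably.
invisible(Sys.setlocale("LC_TIME", "C"))

Date <- c("January 23,2012", "January 24,2012", "January 25,2012",
          "January 26,2012", "January 27,2012")

# Step 1: parse the character strings into the "Date" class.
d <- as.Date(Date, format = "%B %d, %Y")

# Step 2: format the Date objects in each layout the question asks for.
compact <- toupper(format(d, "%d%b%y"))  # 23JAN12 style
us      <- format(d, "%m/%d/%Y")         # 01/23/2012 style
iso     <- format(d, "%Y-%m-%d")         # 2012-01-23 style

print(compact)
print(us)
print(iso)
```

Note that every format call returns character strings; keep d itself if you need to sort or do date arithmetic later.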

Related

Sort column with date and time in R dataframe

I merged several other dataframes together. However, now the dates are no longer in chronological order (see photo). How do I order the dataframe based on the values of the 'Date' column?
[Image: R dataframe output which I want to change]
I first tried to set the 'Date' column as the index, but since its values are not all unique, I can't.
Whenever I do:
new_df <- new_df[order(new_df$Date),]
it only sorts the dates based on their first value.
In addition, sometimes several rows have exactly the same value in the 'Date' column. How can I give those rows the same index?
The ordering should be based on the column after it has been converted to the Date class:
new_df$Date1 <- as.Date(new_df$Date, "%A, %d %b %Y, %H:%M")
If we want to keep the time part as well in ordering, use as.POSIXct
new_df$Date1 <- as.POSIXct(new_df$Date,format = "%A, %d %b %Y, %H:%M")
and then do
new_df <- new_df[order(new_df$Date1),]
If we want to create a time series object, use xts
library(xts)
xts(new_df["Income"], order.by = new_df$Date1)
As a reproducible example
> str1 <- "Saturday, 12 Apr 2014, 18:00"
> as.Date(str1, "%A, %d %b %Y, %H:%M")
[1] "2014-04-12"
> as.POSIXct(str1, format = "%A, %d %b %Y, %H:%M")
[1] "2014-04-12 18:00:00 EDT"
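
To tie the steps together, here is a small sketch on made-up data. The rows of new_df, the Income values, and the id column are all hypothetical. It assumes English day and month names (LC_TIME pinned to "C") and uses tz = "UTC" only to make the result independent of the local machine. The id column is one possible reading of "the same index for identical dates", not something the answer above prescribes.

```r
# Assumption: English day/month names; pin the time locale for %A and %b.
invisible(Sys.setlocale("LC_TIME", "C"))

# Hypothetical data in the same text format as the question's 'Date' column.
new_df <- data.frame(
  Date = c("Saturday, 12 Apr 2014, 18:00",
           "Friday, 11 Apr 2014, 09:30",
           "Saturday, 12 Apr 2014, 18:00",
           "Monday, 14 Apr 2014, 07:15"),
  Income = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# Parse to POSIXct so that ordering uses the actual date-time, not the text.
new_df$Date1 <- as.POSIXct(new_df$Date,
                           format = "%A, %d %b %Y, %H:%M", tz = "UTC")

# Sort chronologically.
sorted_df <- new_df[order(new_df$Date1), ]

# Rows with an identical timestamp share one group id (data is already sorted).
sorted_df$id <- cumsum(!duplicated(sorted_df$Date1))

print(sorted_df)
```

Sorting the raw strings would instead order the rows alphabetically by weekday name, which is why the conversion has to come first.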

Converting string dates to numeric dates

I want to convert dates like Apr 09, 2019 to dates like 2019-04-09.
I wrote:
as.Date(Data$date, format = "%B %d, %Y")
format(as.Date(Data$date, format = "%B %d, %Y"), "%d-%m-%Y")
That code ran without error; however, when I View(Data) I see that the column has not been converted.
Why? Any idea?
The reason is that the column was never updated: as.Date and format return new values and leave Data untouched. We need to assign (<-) the result back to the original column or to a new column:
Data$date <- format(as.Date(Data$date, format = "%B %d, %Y"), "%d-%m-%Y")
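
The following sketch illustrates the point on a hypothetical two-row Data frame. The column names date_parsed and date_text are made up for the illustration, and LC_TIME is pinned to "C" on the assumption that month names are English.

```r
# Assumption: English month names; pin the time locale for %B.
invisible(Sys.setlocale("LC_TIME", "C"))

# Hypothetical data standing in for the question's Data.
Data <- data.frame(date = c("Apr 09, 2019", "Apr 10, 2019"),
                   stringsAsFactors = FALSE)

# Calling the function without assignment only prints the result ...
as.Date(Data$date, format = "%B %d, %Y")
# ... and the column is still the original text.
unchanged <- Data$date

# Assigning the result is what actually stores it.
Data$date_parsed <- as.Date(Data$date, format = "%B %d, %Y")   # "Date" class
Data$date_text   <- format(Data$date_parsed, "%d-%m-%Y")       # character

str(Data)
```

Keep in mind that format always returns character strings, so date_text will sort as text; keep the "Date" column if the values need to be ordered or compared.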

Confusion regarding DateTime conversion in R

I am trying to convert a character string into a dateTime object in R
Sample data:
Following is the code that I am using for conversion
sample$Tweet_Timestamp <- lapply(sample$Tweet_Timestamp, function(x) as.POSIXct(strptime(x, "%a %b %d %H:%M:%S %z %Y")))
sample<-sample%>%unnest(Tweet_Timestamp)
The result I am getting is as follows:
In the result we can see that the date has changed from 18th Feb to 19th Feb. I cannot understand why I am getting this result. Can someone help me decipher this?
When no time zone is given, the parsed date-time is expressed in the time zone of your local system, so a late-evening UTC timestamp can land on the next calendar day. If you wish to retain the original time zone, add tz = "UTC" (Coordinated Universal Time).
For instance, the following code (using 1st row of your sample data):
as.POSIXct(strptime("Tue Feb 18 23:09:57 +0000 2014", "%a %b %d %H:%M:%S %z %Y", tz = "UTC"))
will produce the following output (without altering the time zone):
[1] "2014-02-18 23:09:57 UTC"
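
As a further sketch, the conversion can be done on the whole column at once, without lapply and unnest, because as.POSIXct is vectorized. The second timestamp below is made up for the illustration, the "Asia/Kolkata" zone is only an example of a zone ahead of UTC (it assumes the system's time zone database is available), and LC_TIME is pinned to "C" on the assumption that day and month names are English.

```r
# Assumption: English day/month abbreviations; pin the time locale for %a and %b.
invisible(Sys.setlocale("LC_TIME", "C"))

# First value is from the answer above; the second is hypothetical.
stamps <- c("Tue Feb 18 23:09:57 +0000 2014",
            "Tue Feb 18 23:10:05 +0000 2014")

# Vectorized parse, keeping the result in UTC.
parsed <- as.POSIXct(stamps, format = "%a %b %d %H:%M:%S %z %Y", tz = "UTC")

# Same instants, shown in UTC and in a zone 5.5 hours ahead of UTC.
utc_text   <- format(parsed, "%Y-%m-%d %H:%M:%S")
local_text <- format(parsed, "%Y-%m-%d %H:%M:%S", tz = "Asia/Kolkata")

print(utc_text)    # first element: 2014-02-18 23:09:57
print(local_text)  # first element: 2014-02-19 04:39:57
```

The underlying instant is identical in both cases; only the displayed calendar date differs, which is the effect described in the question.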

as.Date returning NA while converting it from character

I am converting character values in the following format to dates:
January 2016
I want to convert them to the following format:
201601
I am using the following code:
df$date <- as.Date(df$date,"%B %Y")
But it returns NA values. I have even set the locale as follows:
lct<- Sys.getlocale("LC_TIME")
Sys.setlocale("LC_TIME",lct)
But it still gives me NA values. How can I fix it?
We can do this easily with as.yearmon from the zoo package and format:
library(zoo)
format(as.yearmon(str1), "%Y%m")
#[1] "201601"
If we go the as.Date route, a 'Date' requires a day as well, so paste a day onto the string and then use format after converting to 'Date':
format(as.Date(paste(str1, '01'), "%B %Y %d") , "%Y%m")
data
str1 <- "January 2016"
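
Applied to a data frame column, the base R route might look like the sketch below. The df contents and the yyyymm and yyyymm_num column names are hypothetical, and LC_TIME is pinned to "C" on the assumption that the month names are English.

```r
# Assumption: English month names; pin the time locale for %B.
invisible(Sys.setlocale("LC_TIME", "C"))

# Hypothetical data in the question's "Month Year" format.
df <- data.frame(date = c("January 2016", "February 2016"),
                 stringsAsFactors = FALSE)

# Append a dummy day so the string describes a complete date, then parse.
full_date <- as.Date(paste(df$date, "01"), format = "%B %Y %d")

# Format as YYYYMM text; convert to integer only if a number is really needed.
df$yyyymm     <- format(full_date, "%Y%m")
df$yyyymm_num <- as.integer(df$yyyymm)

print(df)
```

The dummy day is discarded again by the "%Y%m" output format, so its value does not matter as long as it is valid for every month.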

Convert 12 hour to 24 hour format in R

I am not getting the right conversion when I try to convert 12-hour times to 24-hour times. My script (with sample data) is below.
library(lubridate)
library(data.table)
# Read Sample Data
df <- data.frame(c("April 22 2016 10:49:15 AM","April 22 2016 10:01:21 AM","April 22 2016 09:06:40 AM","April 21 2016 09:50:49 PM","April 21 2016 06:07:18 PM"))
colnames(df) <- c("Date") # Set Column name
dt <- setDT(df) # Convert to data.table
ff <- function(x) as.POSIXlt(strptime(x,"%B %d %Y %H:%M:%S %p"))
dt[,dates := as.Date(ff(Date))]
When I try creating a new variable called TOD, I get the output in H:M:S format without it being converted to 24-hour format. What I mean is that for the 4th row, instead of getting 21:50:49 I get 09:50:49. I tried two different ways to do this: one using as.ITime from data.table and the other using strptime. The code I use to calculate TOD is below.
dt[,TOD1 := as.ITime(ff(Date))]
dt$TOD2 <- format(strptime(dt$Date, "%B %d %Y %H:%M:%S %p"), format="%I:%M:%S")
I also tried it with a data.frame instead of a data.table, to rule out any issue with using strptime inside data.table, and still got the same answer.
df$TOD <- format(strptime(df$Date, "%B %d %Y %H:%M:%S %p"), format="%I:%M:%S") # Using dataframe instead of data.table
Any insights on how to get the right answer?
As @lmo commented, you need to use the %I specifier instead of %H when parsing, because %p (the AM/PM indicator) is only honoured together with %I on input. From ?strptime:
%H
Hours as decimal number (00–23). As a special exception strings
such as 24:00:00 are accepted for input, since ISO 8601 allows these.
%I
Hours as decimal number (01–12).
strptime("April 21 2016 09:50:49 PM", "%B %d %Y %I:%M:%S %p")
# [1] "2016-04-21 21:50:49 EDT"
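
Putting this together for the question's TOD column: parse with %I and %p, then format the output with %H so the time is printed on the 24-hour clock. This is a minimal base R sketch using the question's sample values; tz = "UTC" and the "C" time locale are assumptions made here so that the result does not depend on the local machine.

```r
# Assumption: English month names and AM/PM markers; pin the time locale.
invisible(Sys.setlocale("LC_TIME", "C"))

Date <- c("April 22 2016 10:49:15 AM", "April 22 2016 10:01:21 AM",
          "April 22 2016 09:06:40 AM", "April 21 2016 09:50:49 PM",
          "April 21 2016 06:07:18 PM")

# Input: %I (01-12) together with %p (AM/PM).
parsed <- strptime(Date, "%B %d %Y %I:%M:%S %p", tz = "UTC")

# Output: %H (00-23) gives the 24-hour time of day.
TOD <- format(parsed, "%H:%M:%S")

print(TOD)
```

Using %I in the output format, as in the question's TOD2 line, would print the 12-hour value again even when the parse itself is correct.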
Alternatively, with lubridate:
library(lubridate)
df$Date <- mdy_hms(df$Date)
Note that while mdy_hms is extremely convenient and takes care of the 12 / 24 hour time for you, it will automatically assign UTC as a time zone. You can specify a different one if you need. You can then convert df to a data.table if you like.
