Extracting the hour from a date - r

I have a date in a semi-odd format. It's written like:
Tue Oct 11 20:56:33 +0000 2016
The date is a string. It records the day and time at which a tweet in my data set was sent. I want to find the hour at which each tweet was sent. What is the best way to work with this? Is there an easy way to convert it to a date type, or can I get what I want from the string directly? Obviously I'd like to do something like:
format(strptime(testTweets$Date, format = "%m %d %H:%M%S %Y", "%H"))
But I'm not sure how to work with the day of the week or the +0000. Thanks for the help!

The above comment beat me to it, but try the following code:
x <- "Tue Oct 11 20:56:33 +0000 2016"
strptime(x, format = "%a %b %d %H:%M:%S %z %Y")
[1] "2016-10-11 20:56:33"
You only need to check the documentation for strptime to figure out the correct format mask to use:
https://www.rdocumentation.org/packages/base/versions/3.4.3/topics/strptime
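To get the hour itself, one option is to format the parsed value with "%H", or to read the hour component of the POSIXlt object that strptime returns. This is a minimal sketch, assuming an English or C locale for %a and %b; the variable names and tz = "UTC" are illustrative choices, not part of the original answer.

```r
# Example timestamp in the format from the question
x <- "Tue Oct 11 20:56:33 +0000 2016"

# Parse into POSIXlt; tz = "UTC" keeps the result independent of the session time zone
parsed <- strptime(x, format = "%a %b %d %H:%M:%S %z %Y", tz = "UTC")

# Hour as a zero-padded string
hour_chr <- format(parsed, "%H")   # "20"

# Hour as an integer, taken from the POSIXlt component
hour_int <- parsed$hour            # 20
```

Both calls are vectorised, so passing testTweets$Date instead of x should give one hour per tweet.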

Related

How to change a column of characters into date form?

I currently have data as such:
[screenshot of the data; not available as text]
I wish to convert the 'date' column to Date type (it is currently character).
I have tried the code below but it gives me 'NA' as the result.
as.Date(data$Date, format = "%a %d-%m-%Y %I:%M %p")
Am I making a mistake in the way I am formatting the date to suit the format in my data?
Unfortunately, we don't have your data (we have a picture of your data, but the only way to use that is to transcribe it manually - best to include data as text in your questions).
Anyway, using a brief example in the same format:
dates <- c("Wed 25-Apr-2018 3:20 PM", "Thu 10-Mar-2022 10:53 AM")
We can get the dates by doing:
as.Date(dates, "%a %d-%b-%Y %I:%M %p")
#> [1] "2018-04-25" "2022-03-10"
Note though that this does not preserve the time; to keep it you probably want one of R's built-in date-time classes, POSIXct or POSIXlt. strptime returns a POSIXlt object (wrap it in as.POSIXct if you need POSIXct):
strptime(dates, "%a %d-%b-%Y %I:%M %p")
#> [1] "2018-04-25 15:20:00 BST" "2022-03-10 10:53:00 GMT"
I think the main problem was that you were using %m for the month, but this only parses months in decimal number format. You need %b for text-abbreviations of months.
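If the time needs to be kept in a data frame column, as.POSIXct with the same format string is one way to do it. This is a sketch under the assumption of an English or C locale (for %a, %b and %p); tz = "UTC" is an arbitrary choice made here so the result does not depend on the session time zone.

```r
dates <- c("Wed 25-Apr-2018 3:20 PM", "Thu 10-Mar-2022 10:53 AM")

# Parse to POSIXct, which stores date and time as a single number per element
dt <- as.POSIXct(dates, format = "%a %d-%b-%Y %I:%M %p", tz = "UTC")

# Show the parsed values in 24-hour form
format(dt, "%Y-%m-%d %H:%M")   # "2018-04-25 15:20" "2022-03-10 10:53"

# The calendar date can still be recovered when needed
as.Date(dt)                    # "2018-04-25" "2022-03-10"
```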

How can as.Date() convert fully written dates into ISO 8601? [duplicate]

I currently have a vector of dates that are in the following format:
a <- c("Wednesday 26th May 2021","Thursday 27th May 2021")
I've tried to get it into ISO 8601 using the following:
as.Date(a, "%I %d%S %F %Y")
But I'm not 100% certain about the syntax of writing dates.
Any thoughts are appreciated!
You can remove the date suffixes and use as.Date -
#Added extra dates whose suffix is not "th".
a <- c("Wednesday 26th May 2021","Thursday 27th May 2021",
'Tuesday 1st June 2021', 'Monday 31st May 2021')
as.Date(sub('(?<=\\d)(th|rd|st|nd)', '', a, perl = TRUE), '%A %d %b %Y')
#[1] "2021-05-26" "2021-05-27" "2021-06-01" "2021-05-31"
Read ?strptime for different format specification.
If you are open to using packages, lubridate::dmy works directly on the original strings.
lubridate::dmy(a)
#[1] "2021-05-26" "2021-05-27" "2021-06-01" "2021-05-31"

Format Date from Offset to Greenwich to standard time in R

I am currently looking at some data I uploaded from the EPA website but the date is formatted a little funny: 20170101T2300-0500
I tried reformatting the date and time at once which did not work so I split the column by "T" and successfully reformatted the date but when I entered
Df$time<-strptime(as.character(Df$time),"%I%M%z")
Df$time<- format(Df$time, "%I:%M:%S")
The time column turned into NAs. I read that "%z" is the "offset to Greenwich" specifier but that it is for output only, so I'm unsure whether I used it in the correct context.
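You can parse the whole string directly, without splitting it first (R's strptime accepts %z on input as well):
strptime("20170101T2300-0500", "%Y%m%dT%H%M%z")
[1] "2017-01-01 22:00:00"
The result is printed in the session's local time zone (apparently UTC-6 here): 23:00 at -0500 is the same instant as 04:00 UTC on 2017-01-02. The format codes used are:
%Y - Year (4 digits)
%m - Month (number)
%d - Day of the month
T - literal character "T", matched as-is
%H - Hours (24-hour clock)
%M - Minutes
%z - Signed offset from UTC in hours and minutes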
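Because the printed value depends on the session time zone, it can help to fix the zone explicitly. The following is a rough sketch; "America/New_York" is only an assumed target zone (the question does not say where the measurements were taken), and the variable names are illustrative.

```r
stamp <- "20170101T2300-0500"

# Parse once, storing the instant in UTC so the result is session-independent
utc <- as.POSIXct(stamp, format = "%Y%m%dT%H%M%z", tz = "UTC")

# The same instant expressed in UTC
format(utc, "%Y-%m-%d %H:%M:%S")                   # "2017-01-02 04:00:00"

# Display in a chosen zone; US Eastern is UTC-5 in January
format(utc, "%H:%M:%S", tz = "America/New_York")   # "23:00:00"
```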

Dealing with twitter timestamps in R

I've got a dataset with tweets and the information Twitter provides about them. I need to transform my dates from the format given into one I can understand properly, using R (I'm just starting to learn the language). Preferably this would be done with a function that lets me choose the output format, since I might need to select tweets by day of the week, time of day, or something similar.
the format I've got the dates in is:
1420121295000
1420121298000
I've researched a bit before asking and tried functions like as.POSIXct, as.POSIXlt and others; they all gave me this error:
Error in as.POSIXct.default(date, format = "%a %b %d %H:%M:%S %z %Y", :
do not know how to convert 'date' to class "POSIXct"
The values above are epoch timestamps. Assuming they are milliseconds since the Unix epoch (you would have to double-check with the Twitter API), you can convert them with the anytime function from the anytime package, as shown below. The first value corresponds to 2015-01-01 14:08:15 UTC.
library(anytime)
anytime(1420121295000*0.001) #times 0.001 to convert to seconds
format(anytime(1420121295000*0.001), tz = "America/New_York", usetz=TRUE) #display the same instant in US Eastern time
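If you would rather not add a package, base R can do the same conversion with as.POSIXct and an explicit origin. This is a sketch under the same assumption that the values are milliseconds since 1970-01-01 UTC; the variable names are made up for illustration.

```r
# Millisecond timestamps from the question
ms <- c(1420121295000, 1420121298000)

# Divide by 1000 to get seconds, then convert relative to the Unix epoch
tweets_utc <- as.POSIXct(ms / 1000, origin = "1970-01-01", tz = "UTC")

format(tweets_utc, "%Y-%m-%d %H:%M:%S")
# "2015-01-01 14:08:15" "2015-01-01 14:08:18"

# Hour of day as an integer, for filtering by time of day
hour_sent <- as.integer(format(tweets_utc, "%H"))   # 14 14

# Day of week as a number (0 = Sunday), independent of the locale
wday_sent <- as.POSIXlt(tweets_utc)$wday            # 4 4, i.e. Thursday
```

Using $wday rather than format(x, "%A") avoids depending on the locale for day names.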

R: convert character vector to POSIXlt

I'm wondering how to convert a column in a data frame that contains character vectors like this one "Mon Aug 19 05:00:07 +0000 2013" into the POSIXlt format.
df$created_at<-as.POSIXlt(df$created_at, format= "%a %b %d %H:%M:%S %z %Y",tz="")
gives me NA's
I followed http://stat.ethz.ch/R-manual/R-devel/library/base/html/strptime.html
Thank you!
The problem seems to be locale-related. %a and %b match the abbreviated day and month names in the current locale, so if your current locale is not English, the Mon and Aug in your example won't be recognised as day and month names.
One workaround is to set your time locale to English or to C (also known as the POSIX locale). This can be done with:
Sys.setlocale("LC_TIME", "C")
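If you do not want to change the locale for the rest of the session, one possible pattern is to save the current setting, parse, and then restore it. This is only a sketch; old_locale and x are illustrative names, and tz = "UTC" is chosen here instead of tz = "" so that the result does not depend on the session time zone.

```r
# Remember the current time locale so it can be restored afterwards
old_locale <- Sys.getlocale("LC_TIME")

# Switch to the C locale, where %a and %b match English abbreviations
Sys.setlocale("LC_TIME", "C")

x <- as.POSIXlt("Mon Aug 19 05:00:07 +0000 2013",
                format = "%a %b %d %H:%M:%S %z %Y", tz = "UTC")

# Restore the previous locale
Sys.setlocale("LC_TIME", old_locale)

format(x, "%Y-%m-%d %H:%M:%S")   # "2013-08-19 05:00:07"
```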
