R: convert the following format to a date class

I have a data frame consisting of two columns, Date and Text. The date format is somewhat unusual, e.g. Jan 09 05:44:30 +0000 2015. I want to convert it to 01/09/2015 05:44:30 or Jan/09/2015 05:44:30. My attempt worked on a single date but failed on the whole date column. Please help.
I tried it this way:
p <- "Jan 09 05:44:30 +0000 2015"
p <- sub("Jan","01",p)
p1 <- strsplit(p," ")
p2 <- unlist(p1)
append(p2,p2[5], after=2)
I have a data frame that looks like this:
Text Date
"...some text ....." Jan 09 05:44:30 +0000 2015
"...some text ....." Jan 09 05:44:30 +0000 2015
"...some text ....." Jan 09 05:44:30 +0000 2015
"...some text ....." Jan 09 05:44:30 +0000 2015
and I want it to look like this:
Text Date
"...some text ....." 01/09/2015 05:44:30
"...some text ....." 01/09/2015 05:44:30
"...some text ....." 01/09/2015 05:44:30
"...some text ....." 01/09/2015 05:44:30

Study help("strptime") to learn how to create the format string.
p <- "Jan 09 05:44:30 +0000 2015"
as.POSIXct(p, format="%b %d %H:%M:%S %z %Y", tz="GMT")
#[1] "2015-01-09 05:44:30 GMT"
This gives you a datetime (POSIXct) object, and because as.POSIXct is vectorized it works on a whole column just as well as on a single string. If you need output strings in another layout, use format() on the result.
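As a minimal sketch of applying this to the whole column, assuming a data frame `df` with character columns `Text` and `Date` (the name `df`, the second sample row and the new column names are hypothetical) and English month names, which the C time locale provides:

```r
# Month names are parsed in the current LC_TIME locale;
# the C locale uses English abbreviations such as "Jan".
invisible(Sys.setlocale("LC_TIME", "C"))

# Hypothetical data frame standing in for the one in the question
df <- data.frame(
  Text = c("...some text .....", "...some text ....."),
  Date = c("Jan 09 05:44:30 +0000 2015", "Feb 10 17:01:02 +0000 2015"),
  stringsAsFactors = FALSE
)

# as.POSIXct() is vectorized, so the whole column is converted in one call
df$DateTime <- as.POSIXct(df$Date, format = "%b %d %H:%M:%S %z %Y", tz = "GMT")

# format() turns the datetimes back into strings in whichever layout you need
df$Date_num <- format(df$DateTime, "%m/%d/%Y %H:%M:%S")  # e.g. 01/09/2015 05:44:30
df$Date_abb <- format(df$DateTime, "%b/%d/%Y %H:%M:%S")  # e.g. Jan/09/2015 05:44:30
print(df[, c("Text", "Date_num", "Date_abb")])
```

Keeping the POSIXct column for sorting and arithmetic, and using the formatted strings only for display or export, avoids re-parsing later.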

Related

Extract time stamps from string and convert to R POSIXct object

Currently, my dataset has a time variable (factor) in the following format:
weekday month day hour min seconds +0000 year
I don't know what the "+0000" field is but all observations have this. For example:
"Tues Feb 02 11:05:21 +0000 2018"
"Mon Jun 12 06:21:50 +0000 2017"
"Wed Aug 01 11:24:08 +0000 2018"
I want to convert these values to POSIXlt or POSIXct objects (year-month-day hour:min:sec) and make them numeric. Currently, using as.numeric(as.character(time-variable)) outputs incorrect values.
Thank you for the great responses! I really appreciate it.
Not sure how to reproduce the transition from factor to character, but starting from a character string this code should work (it drops the +0000 field and reads the time as UTC):
parts <- unlist(strsplit(as.character("Tues Feb 02 11:05:21 +0000 2018"), " "))
strptime(paste(parts[6], parts[2], parts[3], parts[4]), format = '%Y %b %d %H:%M:%S', tz = "UTC")
PS: More on date formats and conversion: https://www.stat.berkeley.edu/~s133/dates.html
For this problem you can get by without using lubridate. First, to extract individual dates we can use regmatches and gregexpr:
date_char <- 'Tue Feb 02 11:05:21 +0000 2018 Mon Jun 12 06:21:50 +0000 2017'
ptrn <- '([[:alpha:]]{3} [[:alpha:]]{3} [[:digit:]]{2} [[:digit:]]{2}\\:[[:digit:]]{2}\\:[[:digit:]]{2} \\+[[:digit:]]{4} [[:digit:]]{4})'
date_vec <- unlist( regmatches(date_char, gregexpr(ptrn, date_char)))
> date_vec
[1] "Tue Feb 02 11:05:21 +0000 2018" "Mon Jun 12 06:21:50 +0000 2017"
You can learn more about regular expressions in R's ?regex help page.
In the above example, the +0000 field is the UTC offset in hours and minutes (±hhmm); for example, it would be -0500 for the EST time zone. To convert to an R date-time object:
> as.POSIXct(date_vec, format = '%a %b %d %H:%M:%S %z %Y', tz = 'UTC')
[1] "2018-02-02 11:05:21 UTC" "2017-06-12 06:21:50 UTC"
which is the desired output. The format codes are documented in ?strptime, or you can use lubridate::guess_formats(). If you don't specify tz, the output is shown in your system's time zone (e.g. for me that would be EST). Because the offset is part of the format (%z), R converts the time correctly either way.
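To see what the offset actually does, here is a small sketch with two made-up strings (not from the question) that share the same wall-clock time but carry different offsets. It assumes English day and month names, set via the C time locale, and the last line assumes the "America/New_York" zone is available in your time zone database:

```r
invisible(Sys.setlocale("LC_TIME", "C"))

fmt <- "%a %b %d %H:%M:%S %z %Y"
# Same wall-clock time, recorded with two different UTC offsets
x <- c("Fri Jan 09 05:44:30 +0000 2015", "Fri Jan 09 05:44:30 -0500 2015")

# %z applies the offset, so the -0500 string lands five hours later in UTC
utc <- as.POSIXct(x, format = fmt, tz = "UTC")
print(utc)

# Showing the same instants in another zone only changes the printout
print(format(utc, tz = "America/New_York", usetz = TRUE))
```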
To get numeric values, the following works:
> as.numeric(as.POSIXct(date_vec, format = '%a %b %d %H:%M:%S %z %Y', tz = 'UTC'))
[1] 1517569521 1497248510
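Since the question's column is a factor, it may help to see why as.numeric() alone gives wrong numbers. A sketch with a hypothetical factor `time_fac` built from the example strings (using Tue rather than Tues), assuming English names via the C time locale:

```r
invisible(Sys.setlocale("LC_TIME", "C"))

# A factor like the one in the question (hypothetical values)
time_fac <- factor(c("Tue Feb 02 11:05:21 +0000 2018", "Mon Jun 12 06:21:50 +0000 2017"))

# as.numeric() on a factor returns the internal level codes, not times
as.numeric(time_fac)
#[1] 2 1

# Convert to character first, then parse, then take seconds since 1970-01-01 UTC
secs <- as.numeric(as.POSIXct(as.character(time_fac),
                              format = "%a %b %d %H:%M:%S %z %Y", tz = "UTC"))
secs
#[1] 1517569521 1497248510

# The numbers convert back to datetimes with the same origin
as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
```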
Note: this relies on a uniform string structure. The question has Tues instead of Tue, which %a will not match. The example above assumes the standard three-letter weekday abbreviation.
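One way around the Tues problem, offered as an alternative rather than as part of the answer above, is to drop the weekday before parsing, since month, day and year already fix the date. A sketch assuming every string starts with a weekday word followed by a space, with English month names via the C time locale:

```r
invisible(Sys.setlocale("LC_TIME", "C"))

x <- c("Tues Feb 02 11:05:21 +0000 2018", "Mon Jun 12 06:21:50 +0000 2017",
       "Wed Aug 01 11:24:08 +0000 2018")

# Strip the leading word (whatever its length) and the space after it
no_wday <- sub("^[[:alpha:]]+ ", "", x)

# Parse the rest without %a
parsed <- as.POSIXct(no_wday, format = "%b %d %H:%M:%S %z %Y", tz = "UTC")
parsed
```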
If, however, your data is a mix of different formats, you'd have to extract the individual time strings (with customized regexes, of course), use lubridate::guess_formats() to get the formats, and then use those to carry out the conversion.
Hope this is helpful!

Difficult Date Time Conversion in R

I'm trying to separate this date/time string in R but have not been successful.
Here is an example of the strings:
"Thu Sep 28 02:11:51 +0000 2017"
"Mon Oct 02 19:22:35 +0000 2017"
What is the best way to make this tidy? I've realized this is far beyond my skills.
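Try something like this:
as.POSIXct(gsub("\\+0000", '', "Thu Sep 28 02:11:51 +0000 2017"), format = "%a %b %d %H:%M:%S %Y", tz = "UTC")
which gives "2017-09-28 02:11:51 UTC". Without tz = "UTC" the time is read in your local time zone (e.g. "2017-09-28 02:11:51 EDT"), even though +0000 means it was recorded in UTC.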
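Since the question asks to separate and tidy the string, here is a possible sketch that keeps the offset via %z (instead of deleting it) and then splits the result into a Date column and a time-of-day column. The column names `date` and `time` are hypothetical, and English names are assumed via the C time locale:

```r
invisible(Sys.setlocale("LC_TIME", "C"))

x <- c("Thu Sep 28 02:11:51 +0000 2017", "Mon Oct 02 19:22:35 +0000 2017")

# Parse once, letting %z read the +0000 offset
dt <- as.POSIXct(x, format = "%a %b %d %H:%M:%S %z %Y", tz = "UTC")

# Then split the datetime into tidy columns
tidy <- data.frame(
  date = as.Date(dt),               # class Date
  time = format(dt, "%H:%M:%S"),    # character
  stringsAsFactors = FALSE
)
tidy
```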

How to convert the Tue Jan 31 17:38:10 +0000 2017 format to a date format in R?

I tried ISOdatetime() but it's not working; I get:
Error: argument "min" is missing, with no default
For example: Tue Jan 31 17:38:10 +0000 2017 -> 31/01/2017 or 31-01-2017
We can use strptime:
str1 <- "Tue Jan 31 17:38:10 +0000 2017"
format(strptime(str1, format = "%a %b %d %H:%M:%S %z %Y", tz = "UTC"), "%d/%m/%Y")
#[1] "31/01/2017"
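The question also asks for the 31-01-2017 form. A sketch reusing the same parse (with tz = "UTC" so the date follows the +0000 timestamp rather than your local zone, and English names via the C time locale) shows both outputs, plus as.Date() for when a Date object is wanted instead of a string:

```r
invisible(Sys.setlocale("LC_TIME", "C"))

str1 <- "Tue Jan 31 17:38:10 +0000 2017"
dt <- strptime(str1, format = "%a %b %d %H:%M:%S %z %Y", tz = "UTC")

format(dt, "%d/%m/%Y")   # "31/01/2017"
format(dt, "%d-%m-%Y")   # "31-01-2017"

# If you need a Date object rather than a string, convert instead of formatting
as.Date(dt)
```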

Having trouble converting df index to datetime object

So this is my dataframe
                Ticker             Owner
SEC Form 4
Nov 09 02:19 PM    HSY     HERSHEY TRUST
Nov 09 02:05 PM    HSY  HERSHEY TRUST CO
Nov 09 02:03 PM   WDFC  PITTARD DANIEL E
Nov 09 01:34 PM   IMGN     Enyedy Mark J
Nov 09 01:25 PM    ORI     ZUCARO ALDO C
I'm trying to convert the index (SEC Form 4) into a datetime object so I can use that object's methods. However, I like the current display format of the date (Nov 09 02:19 PM) and don't want to replace it with something like 2016-11-09 14:19.
pd.to_datetime(df.index, format = '%b %d')
pd.to_datetime(df.index, format = '%b %d %I:%M %p' )
I played around with some of these format parameters, but they seem to change the display style of the date to something like 2016-11-09 14:19:00, which is not what I want.
I even looked for a datetime dtype I could convert to without changing the display, but I had no luck finding one.
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dtypes.html
Thank you.
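You may be missing the year, which is not present in the data. Here is a possible solution. Note that a datetime index always prints in ISO form (2016-11-09 14:19:00); to display it as Nov 09 02:19 PM again, convert it back to strings with df.index.strftime('%b %d %I:%M %p').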
import io
import pandas as pd

zz = """"SEC Form 4" Ticker Owner
"Nov 09 02:19 PM" "HSY" "HERSHEY TRUST"
"Nov 09 02:05 PM" HSY "HERSHEY TRUST CO"
"Nov 09 02:03 PM" WDFC "PITTARD DANIEL E"
"Nov 09 01:34 PM" IMGN "Enyedy Mark J"
"Nov 09 01:25 PM" ORI "ZUCARO ALDO C"
"""
# sep=r'\s+' splits on runs of whitespace; quoted fields stay together
df = pd.read_table(io.StringIO(zz), sep=r'\s+')
df.set_index('SEC Form 4', inplace=True)
# Adding the missing year
df.index = '2016 ' + df.index
# Spelling out the format avoids relying on pandas' format inference
df.index = pd.to_datetime(df.index, format='%Y %b %d %I:%M %p')
print(df.index.dtype)
print(df)
# datetime64[ns]
# Ticker Owner
# 2016-11-09 14:19:00 HSY HERSHEY TRUST
# 2016-11-09 14:05:00 HSY HERSHEY TRUST CO
# 2016-11-09 14:03:00 WDFC PITTARD DANIEL E
# 2016-11-09 13:34:00 IMGN Enyedy Mark J
# 2016-11-09 13:25:00 ORI ZUCARO ALDO C

Reading tweet time in R

I have downloaded tweets in JSON format. Now I want to show the tweet creation rate over time. The 'created_at' variable records when each tweet was created, in this format:
Thu Apr 09 15:43:18 +0000 2015
I was able to read everything else, but I have no idea how to handle the +0000. Previously I read this format in R successfully:
Thu Apr 09 15:43:18 2015
To read it, I used the following code:
earlier <-strptime("Thu Apr 09 15:43:18 2015","%a %b %d %H:%M:%S %Y")
How can I read the first format in R?
You can try
as.POSIXct('Thu Apr 09 15:43:18 +0000 2015',
format='%a %b %d %H:%M:%S %z %Y', tz='GMT')
#[1] "2015-04-09 15:43:18 GMT"
According to ?strptime
‘%z’ Signed offset in hours and minutes from UTC, so ‘-0800’ is 8
hours behind UTC. Values up to ‘+1400’ are accepted as from R
3.1.1: previous versions only accepted up to ‘+1200’.
(Standard only for output.)
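The question's end goal is the tweet creation rate over time. Once created_at is parsed, one way to get it (a sketch, not part of the answer above) is to bin the times with cut() and count tweets per bin with table(). The `created_at` values below are hypothetical, and English names are assumed via the C time locale:

```r
invisible(Sys.setlocale("LC_TIME", "C"))

# Hypothetical created_at values from the downloaded tweets
created_at <- c("Thu Apr 09 15:43:18 +0000 2015", "Thu Apr 09 15:59:02 +0000 2015",
                "Thu Apr 09 16:05:44 +0000 2015")

times <- as.POSIXct(created_at, format = "%a %b %d %H:%M:%S %z %Y", tz = "GMT")

# cut() bins datetimes into hourly intervals; table() counts tweets per bin
per_hour <- table(cut(times, breaks = "hour"))
per_hour
```

breaks also accepts other intervals such as "min", "day" or "15 mins", and plot(per_hour) gives a quick plot of the counts.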
