I have a data frame consisting of two columns, Date and Text. The dates are in a somewhat unusual format, e.g. Jan 09 05:44:30 +0000 2015. I want to convert them to a format such as 01/09/2015 05:44:30 or Jan/09/2015 05:44:30. My attempt worked fine on a single date, but the same approach failed on the whole date column. Please help.
This is what I tried:
p <- "Jan 09 05:44:30 +0000 2015"
p <- sub("Jan","01",p)
p1 <- strsplit(p," ")
p2 <- unlist(p1)
append(p2,p2[5], after=2)
My data frame looks like this:
Text Date
"...some text ....." Jan 09 05:44:30 +0000 2015
"...some text ....." Jan 09 05:44:30 +0000 2015
"...some text ....." Jan 09 05:44:30 +0000 2015
"...some text ....." Jan 09 05:44:30 +0000 2015
and I want it as:
Text Date
"...some text ....." 01/09/2015 05:44:30
"...some text ....." 01/09/2015 05:44:30
"...some text ....." 01/09/2015 05:44:30
"...some text ....." 01/09/2015 05:44:30
Study help("strptime") to learn how to create the format string.
p <- "Jan 09 05:44:30 +0000 2015"
as.POSIXct(p, format="%b %d %H:%M:%S %z %Y", tz="GMT")
#[1] "2015-01-09 05:44:30 GMT"
This gives you a datetime object (and is of course vectorized). Use the format function as necessary for creating output strings with other formats if you must.
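As a minimal sketch of that last step applied to a whole column: the data frame `df` below is a made-up stand-in for the one in the question, and the month abbreviation is assumed to parse under an English locale. The parsed datetimes are written back in the requested layout with format:

```r
# Hypothetical stand-in for the data frame described in the question
df <- data.frame(
  Text = c("...some text .....", "...some text ....."),
  Date = c("Jan 09 05:44:30 +0000 2015", "Jan 09 05:44:30 +0000 2015"),
  stringsAsFactors = FALSE
)

# Parse the whole column at once (as.POSIXct is vectorized)
parsed <- as.POSIXct(df$Date, format = "%b %d %H:%M:%S %z %Y", tz = "GMT")

# Turn the datetimes back into strings in the month/day/year layout
df$Date <- format(parsed, "%m/%d/%Y %H:%M:%S")
df$Date
#[1] "01/09/2015 05:44:30" "01/09/2015 05:44:30"
```

Keeping the `parsed` column as well is usually worthwhile, since the formatted strings can no longer be sorted or subtracted as dates.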
Related
Currently, my dataset has a time variable (factor) in the following format:
weekday month day hour min seconds +0000 year
I don't know what the "+0000" field is but all observations have this. For example:
"Tues Feb 02 11:05:21 +0000 2018"
"Mon Jun 12 06:21:50 +0000 2017"
"Wed Aug 01 11:24:08 +0000 2018"
I want to convert these values to POSIXlt or POSIXct objects (year-month-day hour:min:sec) and make them numeric. Currently, using as.numeric(as.character(time-variable)) outputs incorrect values.
Thank you for the great responses! I really appreciate them.
Not sure how to reproduce the transition from factor to char, but starting from that this code should work:
t <- unlist(strsplit(as.character("Tues Feb 02 11:05:21 +0000 2018")," "))
strptime(paste(t[6],t[2],t[3], t[4]),format='%Y %b %d %H:%M:%S')
PS: More on date formats and conversion: https://www.stat.berkeley.edu/~s133/dates.html
For this problem you can get by without using lubridate. First, to extract individual dates we can use regmatches and gregexpr:
date_char <- 'Tue Feb 02 11:05:21 +0000 2018 Mon Jun 12 06:21:50 +0000 2017'
ptrn <- '([[:alpha:]]{3} [[:alpha:]]{3} [[:digit:]]{2} [[:digit:]]{2}\\:[[:digit:]]{2}\\:[[:digit:]]{2} \\+[[:digit:]]{4} [[:digit:]]{4})'
date_vec <- unlist( regmatches(date_char, gregexpr(ptrn, date_char)))
> date_vec
[1] "Tue Feb 02 11:05:21 +0000 2018" "Mon Jun 12 06:21:50 +0000 2017"
You can learn more about regular expressions here.
In the above example the +0000 field is the UTC offset in hours and minutes, e.g. it would be -0500 for the EST time zone. To convert to an R date-time object:
> as.POSIXct(date_vec, format = '%a %b %d %H:%M:%S %z %Y', tz = 'UTC')
[1] "2018-02-02 11:05:21 UTC" "2017-06-12 06:21:50 UTC"
which is the desired output. The formats can be found here or you can use lubridate::guess_formats(). If you don't specify the tz, you'll get the output in your system's time zone (e.g. for me that would be EST). Since the offset is specified in the format, R correctly carries out the conversion.
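To illustrate the point about time zones, here is a small sketch. It assumes the time zone name "America/New_York" is available in the system's time zone database and that the day and month abbreviations parse under an English locale. The stored instant stays the same; only the displayed clock time changes:

```r
x <- as.POSIXct("Tue Feb 02 11:05:21 +0000 2018",
                format = "%a %b %d %H:%M:%S %z %Y", tz = "UTC")

# Same instant, displayed in two different time zones
utc_view <- format(x, "%Y-%m-%d %H:%M:%S", tz = "UTC")
ny_view  <- format(x, "%Y-%m-%d %H:%M:%S", tz = "America/New_York")
utc_view
#[1] "2018-02-02 11:05:21"
ny_view
#[1] "2018-02-02 06:05:21"

# The underlying number of seconds since the epoch does not depend on the display zone
as.numeric(x)
#[1] 1517569521
```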
To get numeric values, the following works:
> as.numeric(as.POSIXct(date_vec, format = '%a %b %d %H:%M:%S %z %Y', tz = 'UTC'))
[1] 1517569521 1497248510
Note: this is based on uniform string structure. In the OP there was Tues instead of Tue which wouldn't work. The above example is based on the three-letter abbreviation which is the standard reporting format.
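One possible way around the non-standard Tues is to drop the weekday before parsing, since it is redundant once day, month and year are known. This is only a sketch: `time_variable` is a hypothetical factor standing in for the column in the question, and an English locale is assumed for the month names.

```r
# Hypothetical factor column like the one described in the question
time_variable <- factor(c("Tues Feb 02 11:05:21 +0000 2018",
                          "Mon Jun 12 06:21:50 +0000 2017"))

# Convert the factor to character, then strip the leading weekday token
no_weekday <- sub("^[[:alpha:]]+ ", "", as.character(time_variable))

# Parse without %a, so "Tues" vs "Tue" no longer matters
parsed <- as.POSIXct(no_weekday, format = "%b %d %H:%M:%S %z %Y", tz = "UTC")
as.numeric(parsed)
#[1] 1517569521 1497248510
```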
If however, your data is a mix of different formats, you'd have to extract individual time strings (customized regexes, of course), then use lubridate::guess_formats() to get the formats and then use those to carry out the conversion.
Hope this is helpful!!
I'm trying to separate this date/time string in R but have not been successful.
Here is an example of the strings:
"Thu Sep 28 02:11:51 +0000 2017"
"Mon Oct 02 19:22:35 +0000 2017"
What is the best way to make this tidy? I've realized this is far beyond my skills.
Try something like this:
as.POSIXct(gsub("\\+0000", '', "Thu Sep 28 02:11:51 +0000 2017"), format = "%a %b %d %H:%M:%S %Y")
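which gives "2017-09-28 02:11:51 EDT". The time zone shown is that of the local system, because no tz is given; with the +0000 offset stripped, the clock time is interpreted as local time rather than UTC.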
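An alternative sketch that keeps the offset and then splits the result into separate columns is shown below. The column names are made up for illustration, and an English locale is assumed for the day and month abbreviations.

```r
x <- c("Thu Sep 28 02:11:51 +0000 2017",
       "Mon Oct 02 19:22:35 +0000 2017")

# Parse with %z so the +0000 offset is honoured, and display in UTC
dt <- as.POSIXct(x, format = "%a %b %d %H:%M:%S %z %Y", tz = "UTC")

# One row per timestamp, with the pieces in their own columns
tidy <- data.frame(
  datetime = dt,
  date     = as.Date(dt, tz = "UTC"),
  time     = format(dt, "%H:%M:%S"),
  hour     = as.integer(format(dt, "%H")),
  stringsAsFactors = FALSE
)
tidy
```

Keeping the full `datetime` column alongside the pieces means nothing is lost if a different breakdown is needed later.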
I tried ISOdatetime() but it's not working.
error: argument "min" is missing, with no default
for example: Tue Jan 31 17:38:10 +0000 2017 -> 31/01/2017 or 31-01-2017
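We can use strptime (tz = "UTC" keeps the date from shifting when the local time zone is far from UTC):
str1 <- "Tue Jan 31 17:38:10 +0000 2017"
format(strptime(str1, format = "%a %b %d %H:%M:%S %z %Y", tz = "UTC"), "%d/%m/%Y")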
#[1] "31/01/2017"
So this is my dataframe
Ticker Owner \
SEC Form 4
Nov 09 02:19 PM HSY HERSHEY TRUST
Nov 09 02:05 PM HSY HERSHEY TRUST CO
Nov 09 02:03 PM WDFC PITTARD DANIEL E
Nov 09 01:34 PM IMGN Enyedy Mark J
Nov 09 01:25 PM ORI ZUCARO ALDO C
I'm trying to convert the index(SEC Form 4) into a datetime object, so I can use that object's methods. However, I like the current format style of the date (Nov 09 02:19 PM) and don't want to replace it with something like (2016-11-09 14:19).
pd.to_datetime(df.index, format = '%b %d')
pd.to_datetime(df.index, format = '%b %d %I:%M %p' )
I played around with some of these format parameters, but they change the display style of the date into something like (2016-11-09 14:19:00), which is not the format that I want.
I even tried to see if there was a dtype datetime that I can just convert to (so I won't have to change the display look) but I had no luck finding such a dtype.
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dtypes.html
Thank you.
Perhaps you forgot to add the year, since it is not specified in the data. Here is a possible solution.
import io
import pandas as pd

zz = """"SEC Form 4" Ticker Owner
"Nov 09 02:19 PM" "HSY" "HERSHEY TRUST"
"Nov 09 02:05 PM" HSY "HERSHEY TRUST CO"
"Nov 09 02:03 PM" WDFC "PITTARD DANIEL E"
"Nov 09 01:34 PM" IMGN "Enyedy Mark J"
"Nov 09 01:25 PM" ORI "ZUCARO ALDO C"
"""
# Split on runs of whitespace; quoted fields keep their internal spaces
df = pd.read_table(io.StringIO(zz), sep=r"\s+")
df.set_index('SEC Form 4', inplace=True)
# Adding the missing year
df.index = '2016 ' + df.index
# There is no need to detail the expected format
df.index = pd.to_datetime(df.index)
print(df.index.dtype)
print(df)
# datetime64[ns]
# Ticker Owner
# 2016-11-09 14:19:00 HSY HERSHEY TRUST
# 2016-11-09 14:05:00 HSY HERSHEY TRUST CO
# 2016-11-09 14:03:00 WDFC PITTARD DANIEL E
# 2016-11-09 13:34:00 IMGN Enyedy Mark J
# 2016-11-09 13:25:00 ORI ZUCARO ALDO C
I have downloaded tweets in JSON format. Now I want to plot the tweet creation rate over time. There is a variable named 'created at' which records when each tweet was created. It is in this format:
Thu Apr 09 15:43:18 +0000 2015
I can read all the other fields, but I have no idea how to read the +0000. Previously I managed to read this format in R:
Thu Apr 09 15:43:18 2015
To read that string I used the following code:
earlier <- strptime("Thu Apr 09 15:43:18 2015","%a %b %d %H:%M:%S %Y")
Please help me read the first format in R.
You can try
as.POSIXct('Thu Apr 09 15:43:18 +0000 2015',
format='%a %b %d %H:%M:%S %z %Y', tz='GMT')
#[1] "2015-04-09 15:43:18 GMT"
According to ?strptime
‘%z’ Signed offset in hours and minutes from UTC, so ‘-0800’ is 8
hours behind UTC. Values up to ‘+1400’ are accepted as from R
3.1.1: previous versions only accepted up to ‘+1200’.
(Standard only for output.)
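A short sketch of what that offset does in practice, using a made-up second timestamp with -0800 for comparison (an English locale is assumed for the day and month abbreviations):

```r
# Same clock time, two different UTC offsets
same_clock <- c("Thu Apr 09 15:43:18 +0000 2015",
                "Thu Apr 09 15:43:18 -0800 2015")

parsed <- as.POSIXct(same_clock, format = "%a %b %d %H:%M:%S %z %Y", tz = "GMT")

# Both values are shown in GMT; the -0800 one lands 8 hours later
shown <- format(parsed, "%Y-%m-%d %H:%M:%S")
shown
#[1] "2015-04-09 15:43:18" "2015-04-09 23:43:18"

gap_hours <- as.numeric(difftime(parsed[2], parsed[1], units = "hours"))
gap_hours
#[1] 8
```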