I have downloaded tweets in JSON format. Now I want to represent the tweet creation rate with respect to time. There is a variable named 'created at' which represents when the tweet was created. I have this variable in this format:
Thu Apr 09 15:43:18 +0000 2015
I was able to read all the other fields, but I have no idea how to read this +0000. Previously I tried reading the following in R, which was a success:
Thu Apr 09 15:43:18 2015
To read the string above I used the following code:
earlier <- strptime("Thu Apr 09 15:43:18 2015", "%a %b %d %H:%M:%S %Y")
How can I read the first format (the one with +0000) in R?
You can try
as.POSIXct('Thu Apr 09 15:43:18 +0000 2015',
format='%a %b %d %H:%M:%S %z %Y', tz='GMT')
#[1] "2015-04-09 15:43:18 GMT"
According to ?strptime
‘%z’ Signed offset in hours and minutes from UTC, so ‘-0800’ is 8
hours behind UTC. Values up to ‘+1400’ are accepted as from R
3.1.1: previous versions only accepted up to ‘+1200’.
(Standard only for output.)
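To tie this back to the original goal (tweet creation rate over time), here is a minimal sketch that parses a few timestamps and counts tweets per hour. The created_at vector is made-up sample data, not the asker's, and the code assumes an English or C LC_TIME locale so that %a and %b match the day and month names:

```r
# Hypothetical sample of 'created at' values, in Twitter's format
created_at <- c("Thu Apr 09 15:43:18 +0000 2015",
                "Thu Apr 09 15:51:02 +0000 2015",
                "Thu Apr 09 16:05:40 +0000 2015",
                "Thu Apr 09 17:59:59 +0000 2015")

# %z consumes the +0000 offset; tz = "GMT" controls how the result is displayed
when <- as.POSIXct(created_at, format = "%a %b %d %H:%M:%S %z %Y", tz = "GMT")

# Truncate each timestamp to its hour and count tweets per hour
per_hour <- table(format(when, "%Y-%m-%d %H:00"))
per_hour
```

Changing the format string in the last step (for example to "%Y-%m-%d") would give counts per day instead.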
I have a dataframe where time is given in the format: Tue Apr 03 18:00:06 +0000 2012
And timezone offset is given in minutes in the format: 240
How do I add 240 minutes to the time above so I can get local time from UTC time?
We could convert to POSIXct and then add the minutes
library(lubridate)
# Parse as UTC (%z reads the +0000 offset), then add the offset in minutes
out <- as.POSIXct(str1, format = '%a %b %d %H:%M:%S %z %Y', tz = 'UTC') + minutes(240)
out
#[1] "2012-04-03 22:00:06 UTC"
data
str1 <- 'Tue Apr 03 18:00:06 +0000 2012'
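Since the question is about a data frame with one offset per row, the same idea can be applied column-wise in base R. This is only a sketch: the data frame df and the column names time, offset_min and local are made up for illustration, and an English or C LC_TIME locale is assumed:

```r
# Hypothetical data: UTC timestamps plus a per-row offset in minutes
df <- data.frame(
  time = c("Tue Apr 03 18:00:06 +0000 2012",
           "Tue Apr 03 23:30:00 +0000 2012"),
  offset_min = c(240, -300),
  stringsAsFactors = FALSE
)

# Parse once as UTC
utc <- as.POSIXct(df$time, format = "%a %b %d %H:%M:%S %z %Y", tz = "UTC")

# POSIXct arithmetic is in seconds, so convert minutes to seconds
df$local <- utc + df$offset_min * 60
format(df$local, "%Y-%m-%d %H:%M:%S")
```

Note that the result is still labelled UTC even though the clock time is now local, so it is worth keeping the offset column alongside it.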
Currently, my dataset has a time variable (factor) in the following format:
weekday month day hour min seconds +0000 year
I don't know what the "+0000" field is but all observations have this. For example:
"Tues Feb 02 11:05:21 +0000 2018"
"Mon Jun 12 06:21:50 +0000 2017"
"Wed Aug 01 11:24:08 +0000 2018"
I want to convert these values to POSIXlt or POSIXct objects (year-month-day hour:min:sec) and make them numeric. Currently, using as.numeric(as.character(time-variable)) outputs incorrect values.
Thank you for the great responses! I really appreciate it.
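I'm not sure how to reproduce the transition from factor to character, but starting from a character string this code should work:
# Split on spaces into: weekday, month, day, time, offset, year
parts <- unlist(strsplit("Tues Feb 02 11:05:21 +0000 2018", " "))
# Rebuild as "year month day time", skipping the weekday and the +0000 offset;
# tz = 'UTC' because the dropped offset says the times are in UTC
strptime(paste(parts[6], parts[2], parts[3], parts[4]), format = '%Y %b %d %H:%M:%S', tz = 'UTC')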
PS: More on date formats and conversion: https://www.stat.berkeley.edu/~s133/dates.html
For this problem you can get by without using lubridate. First, to extract individual dates we can use regmatches and gregexpr:
date_char <- 'Tue Feb 02 11:05:21 +0000 2018 Mon Jun 12 06:21:50 +0000 2017'
ptrn <- '([[:alpha:]]{3} [[:alpha:]]{3} [[:digit:]]{2} [[:digit:]]{2}\\:[[:digit:]]{2}\\:[[:digit:]]{2} \\+[[:digit:]]{4} [[:digit:]]{4})'
date_vec <- unlist( regmatches(date_char, gregexpr(ptrn, date_char)))
> date_vec
[1] "Tue Feb 02 11:05:21 +0000 2018" "Mon Jun 12 06:21:50 +0000 2017"
You can learn more about regular expressions here.
In the above example the +0000 field is the signed UTC offset in hours and minutes (hhmm), e.g. it would be -0500 for the EST time zone. To convert to an R date-time object:
> as.POSIXct(date_vec, format = '%a %b %d %H:%M:%S %z %Y', tz = 'UTC')
[1] "2018-02-02 11:05:21 UTC" "2017-06-12 06:21:50 UTC"
which is the desired output. The formats can be found here or you can use lubridate::guess_formats(). If you don't specify the tz, you'll get the output in your system's time zone (e.g. for me that would be EST). Since the offset is specified in the format, R correctly carries out the conversion.
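As a quick check of the offset handling, here is a small sketch with two made-up strings that describe the same instant, one written at UTC and one at UTC-5. It assumes an English or C LC_TIME locale:

```r
# The same instant written with two different offsets (illustrative values)
same_instant <- c("Fri Feb 02 11:05:21 +0000 2018",
                  "Fri Feb 02 06:05:21 -0500 2018")

parsed <- as.POSIXct(same_instant,
                     format = "%a %b %d %H:%M:%S %z %Y", tz = "UTC")

# Both elements should map to the same number of seconds since the epoch
as.numeric(parsed)
```

If %z were replaced by a literal +0000 in the format, the second string would fail to match and come back as NA, which is one reason to prefer %z.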
To get numeric values, the following works:
> as.numeric(as.POSIXct(date_vec, format = '%a %b %d %H:%M:%S %z %Y', tz = 'UTC'))
[1] 1517569521 1497248510
Note: this relies on a uniform string structure. The question has Tues instead of Tue, which %a will not parse. The above example is based on the three-letter abbreviation, which is the standard reporting format.
If, however, your data is a mix of different formats, you'd have to extract the individual time strings (with customized regexes, of course), then use lubridate::guess_formats() to get the formats and use those to carry out the conversion.
Hope this is helpful!!
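If the data really does contain non-standard weekday names such as Tues, one possible workaround is to drop the weekday before parsing, since it is redundant given the full date. The sketch below also starts from a factor, as in the question; the variable names are made up, and an English or C LC_TIME locale is assumed for %b:

```r
# Start from a factor, as described in the question
f <- factor(c("Tues Feb 02 11:05:21 +0000 2018",
              "Mon Jun 12 06:21:50 +0000 2017"))

# as.character() recovers the labels (as.numeric() alone would return level codes)
x <- as.character(f)

# Strip the leading weekday word, whatever its spelling
no_wday <- sub("^[[:alpha:]]+ ", "", x)

parsed <- as.POSIXct(no_wday, format = "%b %d %H:%M:%S %z %Y", tz = "UTC")
as.numeric(parsed)
```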
I have a set of date-times in my DB, generated by JavaScript's new Date(). In R, I read them into data frames as character, and then I need to convert them to POSIXct date-times, but every combination of format I try returns NA. Any idea how to fix it? Thanks.
As an example, here is some of data I have:
> datetimes = c("Thu Dec 01 2016 14:53:38 GMT+0100 (CET)", "Thu Dec 01 2016 14:54:38 GMT+0100 (CET)", "Thu Dec 01 2016 14:55:38 GMT+0100 (CET)")
> class(datetimes)
[1] "character"
> c_datetimes = strptime(datetimes, format = '%a %b %d %Y %H:%M:%S')
> c_datetimes
[1] NA NA NA
First, the database should store actual datetime values, not strings. If this can't be fixed, the code that generates the data should be modified to return ISO 8601 strings. Just call toJSON() or the equivalent toISOString() on the Date object to get a string in the ISO 8601 form: 2017-02-14T12:55:58.376Z.
As the name implies, JSON dates are conventionally in this format, and most REST APIs expect it too. Anything else simply covers up the problem.
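For completeness, a string in that ISO 8601 form can be read in R without any locale concerns, because it contains no day or month names. A minimal sketch, using the sample value from above:

```r
iso <- "2017-02-14T12:55:58.376Z"

# T and Z are matched literally; %OS accepts seconds with a fractional part.
# The trailing Z means UTC, so tz = "UTC" is the matching time zone.
parsed <- as.POSIXct(iso, format = "%Y-%m-%dT%H:%M:%OSZ", tz = "UTC")
format(parsed, "%Y-%m-%d %H:%M:%S")
```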
The reason you can't parse the current text is probably that you are in a non-English locale, so %a and %b don't match the English day and month names. You can disable localized parsing by setting LC_TIME to C.
Once you do that, you can parse the text with the '%a %b %d %Y %H:%M:%S GMT%z' format string. Note GMT and %z. The literal GMT in the format matches the GMT text in the string so that it is skipped, and %z then parses the offset.
The snippet:
datetimes = c("Thu Dec 01 2016 14:53:38 GMT+0100 (CET)",
"Thu Dec 01 2016 14:54:38 GMT+0100 (CET)",
"Thu Dec 01 2016 14:55:38 GMT+0100 (CET)")
Sys.setlocale("LC_TIME", "C")
strptime(datetimes, format = '%a %b %d %Y %H:%M:%S GMT%z')
This returns:
[1] "2016-12-01 15:53:38" "2016-12-01 15:54:38" "2016-12-01 15:55:38"
You'll note that the offset was taken into account to generate the correct local time for my machine, which is at +2:00 during winter.
UPDATE
Both toJSON() and toISOString() return UTC time. If you want to preserve the offset information and the data was generated using JavaScript, you may have to use moment.js to generate strings with the offset, as shown here:
var m = moment(); // get "now" as a moment
var s = m.format(); // the ISO format is the default so no parameters are needed
// sample output: 2013-07-01T17:55:13-07:00
Probably your locale settings are not English:
datetimes = c("Thu Dec 01 2016 14:53:38 GMT+0100 (CET)", "Thu Dec 01 2016 14:54:38 GMT+0100 (CET)", "Thu Dec 01 2016 14:55:38 GMT+0100 (CET)")
Sys.setlocale("LC_TIME", "C")
strptime(datetimes, format = '%a %b %d %Y %H:%M:%S GMT%z', tz = "GMT") # choose whichever time zone you like here
[1] "2016-12-01 13:53:38 GMT" "2016-12-01 13:54:38 GMT" "2016-12-01 13:55:38 GMT"
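Changing LC_TIME affects the rest of the session, so it may be worth restoring the previous setting afterwards. A minimal sketch of that pattern (the old_locale and parsed names are just for illustration):

```r
datetimes <- c("Thu Dec 01 2016 14:53:38 GMT+0100 (CET)",
               "Thu Dec 01 2016 14:54:38 GMT+0100 (CET)")

# Remember the current setting, switch to C for parsing, then switch back
old_locale <- Sys.getlocale("LC_TIME")
Sys.setlocale("LC_TIME", "C")
parsed <- as.POSIXct(strptime(datetimes,
                              format = "%a %b %d %Y %H:%M:%S GMT%z",
                              tz = "UTC"))
Sys.setlocale("LC_TIME", old_locale)

# The trailing " (CET)" is ignored because strptime stops at the end of the format
format(parsed, "%Y-%m-%d %H:%M:%S")
```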
In a data.frame, I have a date time stamp in the form:
head(x$time)
[1] "Thu Oct 11 22:18:02 2012" "Thu Oct 11 22:50:15 2012" "Thu Oct 11 22:54:17 2012"
[4] "Thu Oct 11 22:43:13 2012" "Thu Oct 11 22:41:18 2012" "Thu Oct 11 22:15:19 2012"
Every time I try to convert it with as.Date, lubridate, or zoo I get NAs or errors.
How can I convert this time to a usable date-time form?
I've tried:
Time<-strptime(x$time,format="&m/%d/%Y %H:$M")
x$minute<-parse_date_time(x$time)
x$minute<-mdy(x$time)
x$minute<-as.Date(x$time,"%m/%d/%Y %H:%M:%S")
x$minute<-as.time(x$time)
x$minute<-as.POSIXct(x$time,format="%H:%M")
x$minute<-minute(x$time)
What you really want is strptime(). Try something like:
strptime(x$time, "%a %b %d %H:%M:%S %Y")
As an example of the interesting things you can do with strptime(), consider the following:
thedate <- "I came to your house at 11:45 on January 21, 2012."
strptime(thedate, "I came to your house at %H:%M on %B %d, %Y.")
# [1] "2012-01-21 11:45:00"
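Applied to the data frame in the question, a sketch could look like the following. The sample rows are copied from the question, the column names parsed and minute are illustrative, and tz = "UTC" is an assumption since the strings carry no time zone. as.POSIXct() is used rather than strptime() because POSIXct is usually the more convenient class to store in a data frame column:

```r
x <- data.frame(time = c("Thu Oct 11 22:18:02 2012",
                         "Thu Oct 11 22:50:15 2012"),
                stringsAsFactors = FALSE)

# Assumes an English or C LC_TIME locale so that %a and %b match
x$parsed <- as.POSIXct(x$time, format = "%a %b %d %H:%M:%S %Y", tz = "UTC")

# Extract the minute as an integer, which is what the x$minute attempts were after
x$minute <- as.integer(format(x$parsed, "%M"))
x
```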
Another option is to use lubridate::parse_date_time():
library(lubridate)
parse_date_time(x$time, "%a %b %d %H:%M:%S %Y")
Or more simply:
parse_date_time(x$time, "abdHMSY")
From the docs:
It differs from base::strptime() in two respects. First, it allows specification of the order in which the formats occur without the need to include separators and the % prefix. Such a formatting argument is referred to as "order". Second, it allows the user to specify several format-orders to handle heterogeneous date-time character representations.
The docs contain all the formats (the "abdHMSY" etc.) recognized by lubridate.
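The second point from the docs (several orders at once) could be used like this. It is a sketch with made-up mixed input, and it requires the lubridate package to be installed:

```r
library(lubridate)

# Hypothetical column mixing two different layouts
mixed <- c("Thu Oct 11 22:18:02 2012",
           "2012-10-11 22:50:15")

# Each element is matched against the candidate orders
res <- parse_date_time(mixed, orders = c("abdHMSY", "Ymd HMS"))
res
```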
I have a spreadsheet full of data with dates that look like this:
Mon Jul 16 15:20:22 +0000 2012
Is there a way to convert these to R dates (preferably PST) without using regular expression or is there no other way? I'd appreciate ideas on doing this conversion efficiently.
Sure, just use strptime() to parse time from strings:
R> strptime("Mon Jul 16 15:20:22 +0000 2012",
+ format="%a %b %d %H:%M:%S %z %Y")
[1] "2012-07-16 10:20:22 CDT"
R>
which uses my local timezone (CDT). If yours is Pacific, you can set it explicitly as in
R> strptime("Mon Jul 16 15:20:22 +0000 2012",
+ format="%a %b %d %H:%M:%S %z %Y", tz="America/Los_Angeles")
[1] "2012-07-16 08:20:22 PDT"
R>
which looks right with a 7 hour delta to UTC.
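For a whole column it may be more convenient to parse once into POSIXct and choose the display time zone afterwards. A minimal sketch, assuming an English or C LC_TIME locale and that the America/Los_Angeles zone is available in the system's time zone database:

```r
stamps <- c("Mon Jul 16 15:20:22 +0000 2012",
            "Mon Jul 16 23:05:00 +0000 2012")

# Parse as UTC; %z reads the +0000 offset
p <- as.POSIXct(stamps, format = "%a %b %d %H:%M:%S %z %Y", tz = "UTC")

# The underlying instants do not change; only the display zone does
format(p, "%Y-%m-%d %H:%M:%S", tz = "America/Los_Angeles", usetz = TRUE)
```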
There's nearly a verbatim example of how to do this in the Examples section of ?strptime:
# ?strptime example:
## An RFC 822 header (Eastern Canada, during DST)
strptime("Tue, 23 Mar 2010 14:36:38 -0400", "%a, %d %b %Y %H:%M:%S %z")
# your data...
strptime("Mon Jul 16 15:20:22 +0000 2012", "%a %b %d %H:%M:%S %z %Y")
This can also be done with the lubridate package from the tidyverse:
library(lubridate)
parse_date_time("Mon Jul 16 15:20:22 +0000 2012", orders = "amdHMSzY")
which is what I prefer.