Dealing with twitter timestamps in R - r

I've got a dataset of tweets along with the information Twitter provides about them. Using R, I need to transform my dates from the format given into one I can understand properly, preferably with a function that lets me choose the output format, since I might need to select tweets by day of the week, time of day, or anything like that. I'm just starting to learn the language.
The format I've got the dates in is:
1420121295000
1420121298000
I've researched a bit before asking and tried functions like as.POSIXct, as.POSIXlt and others; they all gave me this error:
Error in as.POSIXct.default(date, format = "%a %b %d %H:%M:%S %z %Y", :
do not know how to convert 'date' to class "POSIXct"

The values above are epoch timestamps. Assuming they are in milliseconds since the epoch (you would have to double-check with the Twitter API), you can convert them to a date-time with the anytime function from the anytime package, as shown below. The first value corresponds to "2015-01-01 14:08:15 UTC".
library(anytime)
anytime(1420121295000 * 0.001) # multiply by 0.001 to convert milliseconds to seconds
format(anytime(1420121295000 * 0.001), tz = "America/New_York", usetz = TRUE) # display the same instant in US Eastern time
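The same conversion can also be done in base R, which then makes it straightforward to select tweets by weekday or hour. The following is a minimal sketch, assuming the values really are milliseconds since 1970-01-01 UTC; the variable names are made up for illustration.

```r
# Assumption: the values are milliseconds since 1970-01-01 00:00:00 UTC.
ms <- c(1420121295000, 1420121298000)

# as.POSIXct expects seconds, so divide by 1000 first.
tweets_utc <- as.POSIXct(ms / 1000, origin = "1970-01-01", tz = "UTC")
format(tweets_utc, "%Y-%m-%d %H:%M:%S")

# POSIXlt exposes the individual components for filtering.
lt <- as.POSIXlt(tweets_utc)
weekday_index <- lt$wday   # 0 = Sunday, ..., 6 = Saturday
hour_of_day <- lt$hour     # 0 to 23

# Example filter: keep tweets sent between 12:00 and 17:59 UTC.
afternoon <- tweets_utc[hour_of_day >= 12 & hour_of_day < 18]
```

Using the numeric components of POSIXlt rather than format(x, "%A") avoids depending on the locale for weekday names.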

Related

SAS date and time to Normal date and time format (Y-M-D H:m:S)

I have obtained data from a remote tagging experiment and the dates are in SAS format (43255.940972). The finest level of detail I want to extract is seconds. I was trying this:
as.Date(data$Date, origin = "1960-01-01 00:00:00", "%Y-%m-%d %H:%M:%S")
But all I get in return is Year-Month-Day with no time details. How do I specify, using this command, that I also want the time to be returned?
Thanks
Based on the answer provided by @Onyambu:
as.POSIXct(a * 86400, origin = "1899-12-30") # a is the numeric date value; 86400 is the number of seconds in a day
Some documentation you can read on those functions and date time in general.
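as.Date only ever stores whole days, which is why the time of day is dropped; a POSIXct value keeps it. Here is a small sketch using the example value from the question. The origin "1899-12-30" follows the answer above and is an assumption about the data source; if the values actually count days from 1960-01-01, swap in that origin instead.

```r
serial <- 43255.940972   # example value from the question (days, with a day fraction)

# Convert days to seconds and round to absorb floating-point noise.
secs <- round(serial * 86400)

# Assumption: days are counted from 1899-12-30, as in the answer above.
stamp <- as.POSIXct(secs, origin = "1899-12-30", tz = "UTC")
format(stamp, "%Y-%m-%d %H:%M:%S")
```

Setting tz explicitly keeps the printed time from shifting with the local time zone of the machine.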

How to format time zone offset in lubridate

I want to format the date in ISO 8601 format using lubridate. At the moment the code I have parses the date almost the way I want. The only thing I want to change is to have a colon in the time zone offset. My code at the moment:
dateTime <- str_match(fileName, dateTimeRegex)[2] %>% ymd_hms() %>% strftime(format = "%y-%m-%dT%H:%M:%S%z", tz = "UTC")
Sample output:
"19-09-26T10:45:00+0000"
Expected output:
"19-09-26T10:45:00+00:00"
Is there a simple way to do it without parsing this manually? %z produces +0000, but I need a colon in there.
From Wikipedia (emphasis mine):
The UTC offset is the difference in hours and minutes from Coordinated Universal Time (UTC, or GMT) for a particular place and date. It is generally shown in the format ±[hh]:[mm], ±[hh][mm], or ±[hh]. So if the time being described is one hour ahead of UTC (such as the time in Berlin during the winter), the UTC offset would be "+01:00", "+0100", or simply "+01".
±hh:mm is just one way to format UTC offsets, the others being ±hhmm and ±hh, so the offset in your output already conforms to ISO 8601.
We can use a regex to achieve your desired output, using sub:
x <- "19-09-26T10:45:00+0000"
sub("(.*\\+)(\\d{2})(\\d{2})", "\\1\\2:\\3", x)
#[1] "19-09-26T10:45:00+00:00"
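The pattern above only matches offsets that start with a plus sign. A slightly more general sketch is to anchor the pattern at the end of the string and accept either sign; add_offset_colon is a hypothetical helper name, not a lubridate function.

```r
# Hypothetical helper: insert a colon into a trailing +hhmm or -hhmm offset.
add_offset_colon <- function(x) {
  sub("([+-]\\d{2})(\\d{2})$", "\\1:\\2", x)
}

add_offset_colon("19-09-26T10:45:00+0000")
add_offset_colon("2019-09-26T05:15:00-0530")

# Combined with format() and %z on a POSIXct value:
stamp <- as.POSIXct("2019-09-26 10:45:00", tz = "UTC")
add_offset_colon(format(stamp, "%Y-%m-%dT%H:%M:%S%z"))
```

Anchoring with $ means the hyphens inside the date part are never touched, since only the final four digits and their sign can match.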

Import separate date and time (hh:mm) excel columns, to use for time elapsed calculation

Newbie here, first post (please be gentle). I have been trying to resolve this for several hours, so finally decided time to ask advice.
I have a large spreadsheet which I am importing with readxl. It contains one column with date (format dd/mm/yyyy) and several time columns in format hh:mm as can be seen: excel
Essentially I want to be able to import both time and date columns and combine them, so that I can then do some other calculations, like time elapsed.
If I import letting R guess the col-types, it converts the times to POSIXct, but these then have a date on 1899 attached to them: R_POSIXct
If I force readxl to assign the time column to numeric, I get a decimal (e.g. 0.315972222 for 07:35), which I then tried converting using syntax similar to
format(as.POSIXct(Sys.Date() + 0.315972222), "%Y-%m-%d %H:%M:%S", tz="UTC")
i.e.
df$datetime <- format(as.POSIXct(df$date + df$time), "%Y-%m-%d %H:%M", tz="UTC")
which results in the correct date, but with a time of 00:00, not the time it is passed.
I have tried searching here and found posts that are not quite the same question (e.g. Combining date and time columns into dd/mm/yyyy hh:mm), and have read widely, including about lubridate, but as I'm only 6 months into R, I am finding some explanations a bit cryptic.
Suggestions or signposting appreciated (if there are solutions I haven't found).
If you subtract the number of days between 1899-12-30 (the origin usually used for Excel serial dates on Windows) and 1970-01-01, and then multiply that (shifted) Excel numeric value by 86400 (the number of seconds in a day), you should come close to the number of seconds since the start of 1970. You could then convert to POSIXct with as.POSIXct(x, origin = "1970-01-01"). That does seem to be "the hard way", however.
It would be far easier and probably more accurate to convert the date-times to YYYY-MM-DD H:M:S format in Excel and then export as csv to be imported into R as text. There is a "POSIXct" colClasses option for read.csv, although it doesn't handle separate date and time columns. For that, you would be advised to import as character values and then paste the dates and times together. Then watch your format strings for as.POSIXct: the dd/mm/yyyy format is specified by "%d/%m/%Y".
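As a rough illustration of the paste-then-parse approach, and of handling a time that arrives as an Excel day fraction, here is a short sketch. The sample dates and times are invented for the example (only 07:35 and its fraction 0.315972222 come from the question), and UTC is assumed so that daylight saving changes do not affect the elapsed time.

```r
# Invented sample data in the formats described in the question.
dates <- c("15/03/2020", "15/03/2020")
times <- c("07:35", "09:05")

# Paste date and time together, then parse with a matching format string.
stamps <- as.POSIXct(paste(dates, times), format = "%d/%m/%Y %H:%M", tz = "UTC")

# Time elapsed between the two readings.
elapsed <- difftime(stamps[2], stamps[1], units = "mins")

# If the time was imported as a day fraction instead, convert it to seconds
# and add it to midnight of the date.
frac <- 0.315972222   # 07:35 as a fraction of a day
midnight <- as.POSIXct("15/03/2020", format = "%d/%m/%Y", tz = "UTC")
from_fraction <- midnight + round(frac * 86400)
format(from_fraction, "%Y-%m-%d %H:%M")
```

Rounding the seconds guards against the fraction landing a hair below the intended minute.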

How to convert from CEST to UTC?

I want to convert number of days to date with time:
> 15525.1+as.Date("1970-01-01")
[1] "2012-07-04" ## correct but no time
I tried this:
> apollo.fmt <- "%B %d, %Y, %H:%M:%S"
> as.POSIXct((15525.1+as.Date("1970-01-01")), format=apollo.fmt, tz="UTC")
[1] "2012-07-04 04:24:00 CEST"
but as you can see, the result is given in CEST, and I need it in UTC.
Any hints on this?
For the original conversion, refer to this question: Converting numeric time to datetime POSIXct format in R, and these pages: Date-times in R, Date-time conversions and Converting excel dates (number) to R date-time object. Basically, it depends on your data source, the time origin for that data source (Excel, Apache etc.) and the units. For example, you may have the total time elapsed in seconds, minutes, hours or days since the time origin for your data source, which will be different for Excel or Apache. Once you have this information, you can use strptime or the origin argument to convert to R date-time objects.
If you are only concerned with changing the timezone, you can use attr:
> u <- Sys.time()
> u
[1] "2017-12-21 09:01:35 EST"
> attr(u, "tzone") <- "UTC"
> u
[1] "2017-12-21 14:01:35 UTC"
You may want to check up on the valid timezones for your machine though. A good way to get a time-zone that works with your machine would be googleway::google_timezone. To get the coordinates for your location (or the location from where you're importing data), you can either look those up online or use ggmap::geocode() - useful if converting time stamps in data from different time zones.
I think the problem is as.POSIXct doesn't change anything if the time is already POSIXct, so the tz option has no effect.
Use attr as explained here
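For the numeric value in the question, one way to sidestep the problem is to build the POSIXct directly from seconds with tz = "UTC", instead of going through as.Date first. The following is a sketch under the assumption that 15525.1 counts days since 1970-01-01.

```r
days <- 15525.1   # value from the question, assumed to be days since 1970-01-01

# Convert days to seconds (rounded to absorb floating-point noise) and
# request UTC when the object is created.
stamp <- as.POSIXct(round(days * 86400), origin = "1970-01-01", tz = "UTC")
format(stamp, "%Y-%m-%d %H:%M:%S", usetz = TRUE)

# Changing the "tzone" attribute only changes how the instant is displayed;
# the underlying number of seconds stays the same.
shown <- stamp
attr(shown, "tzone") <- "Europe/Berlin"
as.numeric(shown) == as.numeric(stamp)
```

This agrees with the output in the question: 04:24 CEST is the same instant as 02:24 UTC.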

R: Posix (Unix) Time Crazy Conversion

Unix time is 1435617000.
as.Date(1435617000,origin="01-01-1970")
[1] "3930586-11-23"
Which is wrong. I'm trying to get the correct date, which, per an epoch converter, is Mon, 29 Jun 2015 22:30:00 GMT.
How do I get R to tell me the month, day, year, hour, minute & second? Thank you.
I think this happens because as.Date converts its argument to a Date object and interprets a numeric input as a number of days since the origin, whereas your value is a number of seconds. What you need here is a POSIXct object, since your input x also carries time-of-day information that as.Date cannot handle. Another problem that can appear even with the right function is the time zone: unless your local time zone happens to match that of the original data, specify it with the tz argument.
The following code does the job.
x <- 1435617000
as.POSIXct(x, origin = "1970-01-01", tz ="GMT")
[1] "2015-06-29 22:30:00 GMT"
Use as.Date
If you want only the date but have a complete Unix time like x, just divide by 86400 (the number of seconds in a day) to get the right date.
as.Date(x/86400L, origin = "1970-01-01")
[1] "2015-06-29"
Another important detail
The origin argument has to be supplied as YYYY-MM-DD, not DD-MM-YYYY as you did. I am not sure, but I think the former (or YYYY/MM/DD) is the only form accepted by default.
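To get the month, day, year, hour, minute and second as separate values, as asked in the question, one option is to convert to POSIXlt and read its components. This is a minimal sketch; note that POSIXlt stores years as offsets from 1900 and months starting at 0.

```r
x <- 1435617000
ct <- as.POSIXct(x, origin = "1970-01-01", tz = "GMT")

# POSIXlt is a list-like representation with one field per component.
lt <- as.POSIXlt(ct, tz = "GMT")

year   <- lt$year + 1900   # years are stored relative to 1900
month  <- lt$mon + 1       # months are stored as 0 to 11
day    <- lt$mday
hour   <- lt$hour
minute <- lt$min
second <- lt$sec

c(year = year, month = month, day = day,
  hour = hour, minute = minute, second = second)
```

format(ct, "%m"), format(ct, "%H") and similar calls give the same pieces as character strings if that is more convenient.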
