Timezone offset in datetime format with read_csv in R

I have a long csv file with multiple columns. The first column is datetime with a timezone offset:
2015-05-29 02:05:00+02:00
I am trying to read it with read_csv in R and I would like it to be converted to:
2015-05-29 04:05:00
But instead, read_csv gives me:
2015-05-29 02:05:00
Is there a way to parse and format the datetime so that it correctly adds the extra timezone hour? Note that the timezone changes in my file at different times of the year (Daylight Saving Time), so it can be +02:00 or +01:00, for instance.

Using %z in the format of as.POSIXct will help. tz = "" represents the local time zone; change tz as required.
as.POSIXct("2015-05-29 02:05:00 +0200", format = "%F %T %z", tz = "")

Related

Converting timestamp in seconds to a date format in R

I have a table (tags) with a column for timestamp (ts), which is formatted as seconds since 1 Jan 1970 GMT. I'm trying to create a date column that converts the timestamp from seconds to date and time in EST.
The suggested code for R was:
tags$date<-strptime(tags$ts, "%Y-%m-%d")
tags$date<-as.POSIXct(tags$date)
But when I do this, tags$date comes up as NA. Any suggestions for what I might be doing wrong? Thanks.
You should use the as.POSIXct function instead:
tags$date <- as.POSIXct(tags$ts, origin = "1970-01-01", tz = "America/New_York")
strptime converts between character representations and date-time objects, not between numeric timestamps and dates.
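To see why the original attempt produced NA: strptime() coerces its input to character and tries to match it against the format, so a numeric epoch (here taken from the example that follows) cannot match "%Y-%m-%d":
strptime(1507119276, "%Y-%m-%d")
# [1] NA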
Here's a lubridate version. When we use as_datetime we don't need to explicitly specify an origin as it defaults to the desired origin.
lubridate::as_datetime(1507119276, tz='EST')
# [1] "2017-10-04 07:14:36 EST"

Dealing with twitter timestamps in R

I've got a dataset with tweets and the information Twitter provides about them. I need to transform the dates from the given format into one I can understand properly (preferably using a function where I can choose the format, since I might need to select tweets by day of the week, time of day, or anything like that) using R; I'm just starting to learn the language.
The format I've got the dates in is:
1420121295000
1420121298000
I researched a bit before asking and tried to use functions like as.POSIXct, as.POSIXlt, and others; they all gave me this error:
Error in as.POSIXct.default(date, format = "%a %b %d %H:%M:%S %z %Y", :
do not know how to convert 'date' to class "POSIXct"
The values above are epoch timestamps. Assuming they are in milliseconds since the epoch (you would have to double-check with the Twitter API), you can convert from epoch to UTC time using the anytime function from the anytime package as shown below, which returns "2015-01-01 14:08:15 UTC".
library(anytime)
anytime(1420121295000 * 0.001)  # times 0.001 to convert milliseconds to seconds
format(anytime(1420121295000 * 0.001), tz = "America/New_York", usetz = TRUE)  # render in the Eastern timezone
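A base-R equivalent needs no extra packages; this sketch likewise assumes the values really are milliseconds since the epoch:
ms <- 1420121295000
as.POSIXct(ms / 1000, origin = "1970-01-01", tz = "UTC")
# [1] "2015-01-01 14:08:15 UTC"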

How do I parse a concatenated date and time in R?

The date July 1, 2016, 1:15 pm and 43 seconds is given to me as the string 160701131543.
I have an entire column of such date-times in my data frame. How should I go about parsing this column into usable data?
You can use the as.POSIXct function and specify the format; in your case the format is year, month, day, hour, minute, second. Read more about formatting date and time data on the ?strptime help page.
as.POSIXct("160701131543", format = "%y%m%d%H%M%S")
[1] "2016-07-01 13:15:43 EDT"
The timezone can be changed with the 'tz' parameter.
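For example, to interpret the same clock time as UTC rather than the session's local zone:
as.POSIXct("160701131543", format = "%y%m%d%H%M%S", tz = "UTC")
# [1] "2016-07-01 13:15:43 UTC"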
Here is another option with lubridate. The default tz is "UTC"; it can be changed by specifying tz.
library(lubridate)
ymd_hms("160701131543")
#[1] "2016-07-01 13:15:43 UTC"

R: Posix (Unix) Time Crazy Conversion

Unix time is 1435617000.
as.Date(1435617000,origin="01-01-1970")
[1] "3930586-11-23"
Which is wrong. I'm trying to get the correct date, which, per an epoch converter, is Mon, 29 Jun 2015 22:30:00 GMT.
How do I get R to tell me the month, day, year, hour, minute & second? Thank you.
I think the reason this happens is that as.Date converts its argument to a Date object. In this case you do not want a Date but a POSIXct object, because your input x carries time-of-day information that as.Date cannot represent. Another problem that can appear even with the right function is not specifying the correct time zone with the tz argument (except when your time zone happens to be the same as the original one).
The following code does the job.
x <- 1435617000
as.POSIXct(x, origin = "1970-01-01", tz ="GMT")
[1] "2015-06-29 22:30:00 GMT"
Use as.Date
In case you only want the date but have a full Unix time like x, just divide by 86400 (the number of seconds in a day) to get the right date.
as.Date(x/86400L, origin = "1970-01-01")
[1] "2015-06-29"
Another important detail
The origin argument has to be supplied as YYYY-MM-DD, not DD-MM-YYYY as you did; as far as I know, the ISO format is the only accepted and correct way.

BigQuery converting to a different timezone

I am storing data as Unix timestamps in Google BigQuery. However, when a user asks for a report, she will need the data filtered and grouped by her local timezone.
The data is stored in GMT. The user may wish to see the data in EST. The report may ask for the data to be grouped by date.
I don't see a timezone conversion function in the documentation.
Does anyone know how I can do this in BigQuery? I.e., how do I group by after converting the timestamp to a different timezone?
Standard SQL in BigQuery has built-in functions:
DATE(timestamp_expression, timezone)
TIME(timestamp, timezone)
DATETIME(timestamp_expression, timezone)
Example:
SELECT
  original,
  DATETIME(original, "America/Los_Angeles") AS adjusted
FROM sometable;
+---------------------+---------------------+
| original            | adjusted            |
+---------------------+---------------------+
| 2008-12-25 05:30:00 | 2008-12-24 21:30:00 |
+---------------------+---------------------+
You can use standard IANA timezone names or offsets.
As of September 2016, BigQuery supports standard SQL, and you can now just use the DATE(timestamp, timezone) function to adjust for a timezone. You can reference the docs here:
BigQuery DATE docs
To those that stumble here:
How to convert a timestamp to another timezone?
Given that TIMESTAMP values, once constructed, are stored as UTC, and that there is no TIMESTAMP(timestamp, timezone) constructor, you can convert a timestamp to another time zone by transforming it first to a DATETIME in the target timezone and then constructing a new TIMESTAMP from that DATETIME:
SELECT TIMESTAMP(DATETIME(timestamp_field, '{timezone}'))
Example:
SELECT
  input_tz,
  input,
  'America/Montreal' AS output_tz,
  TIMESTAMP(DATETIME(input, 'America/Montreal')) AS output
FROM (
  SELECT 'US/Pacific' AS input_tz, TIMESTAMP(DATETIME(DATE(2021, 1, 1), TIME(16, 0, 0)), 'US/Pacific') AS input
  UNION ALL
  SELECT 'UTC' AS input_tz, TIMESTAMP(DATETIME(DATE(2021, 1, 1), TIME(16, 0, 0)), 'UTC') AS input
  UNION ALL
  SELECT 'Europe/Berlin' AS input_tz, TIMESTAMP(DATETIME(DATE(2021, 1, 1), TIME(16, 0, 0)), 'Europe/Berlin') AS input
) t
results in:
+-----+---------------+-------------------------+------------------+-------------------------+
| Row | input_tz      | input                   | output_tz        | output                  |
+-----+---------------+-------------------------+------------------+-------------------------+
| 1   | US/Pacific    | 2021-01-02 00:00:00 UTC | America/Montreal | 2021-01-01 19:00:00 UTC |
| 2   | UTC           | 2021-01-01 16:00:00 UTC | America/Montreal | 2021-01-01 11:00:00 UTC |
| 3   | Europe/Berlin | 2021-01-01 15:00:00 UTC | America/Montreal | 2021-01-01 10:00:00 UTC |
+-----+---------------+-------------------------+------------------+-------------------------+
How to strip time zone info from a DATETIME value?
DATETIME values in BigQuery are time zone naive: they do not contain timezone info. That said, if you have business knowledge that tells you the timezone of a DATETIME, you can strip that timezone offset by converting it to a TIMESTAMP with the known timezone:
SELECT TIMESTAMP(datetime_value, '{timezone}')
Given that the TIMESTAMP stores the value in UTC, you can then re-convert to DATETIME if that's your preferred method of storage, but now you'll know that your DATETIME is in UTC :)
Hopefully this can be helpful! :)
Your premise is right. If you group like this, then users who want EST or EDT will get incorrect date grouping:
GROUP BY UTC_USEC_TO_DAY(ts_field)
But as long as you figure out the offset that your user wants, you can still do the full calculation on the server. For example, if EST is 5 hours behind UTC then query like this:
GROUP BY UTC_USEC_TO_DAY(ts_field - (5*60*60*1000*1000000) )
Just parameterize the "5" to be the offset in hours, and you're all set. Here's a sample based on one of the sample data sets:
SELECT
  COUNT(*) AS the_count,
  UTC_USEC_TO_DAY(timestamp * 1000000 - (5*60*60*1000*1000000)) AS the_day
FROM
  [publicdata:samples.wikipedia]
WHERE
  comment CONTAINS 'disaster'
  AND timestamp >= 1104537600
GROUP BY
  the_day
ORDER BY
  the_day
You can remove the offset to see how some edits move to different days.
To convert a datetime string in any timezone to UTC, one can use PARSE_TIMESTAMP with the TIMESTAMP formats supported in BigQuery.
For example, to convert an IST (Indian Standard Time) string to UTC, use the following:
SAFE.PARSE_TIMESTAMP("%a %b %d %T IST %Y", timeStamp_vendor, "Asia/Kolkata")
Here PARSE_TIMESTAMP parses the IST string to a UTC TIMESTAMP (not string). Adding SAFE as prefix takes care of errors/nulls etc.
To convert this to a readable string format in BigQuery, use FORMAT_TIMESTAMP as follows:
FORMAT_TIMESTAMP("%d-%b-%Y %T %Z", SAFE.PARSE_TIMESTAMP("%a %b %d %T IST %Y", timeStamp_vendor, "Asia/Kolkata"))
This example would take an IST string of the format Fri May 12 09:45:12 IST 2019 and convert it to 12-May-2019 04:15:12 UTC.
Replace IST with the required timezone abbreviation and Asia/Kolkata with the relevant timezone name to achieve the conversion for your timezone.
2016 update: see the other answers; BigQuery now provides timestamp and timezone functions.
You are right - BigQuery doesn't provide any timestamp conversion methods.
In this case, I suggest that you run your GROUP BY based on dimensions of the GMT/UTC timestamp field, and then convert and display the result in the local timezone in your code.
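If the client code is R, as elsewhere on this page, that display step might look like the sketch below; the result frame res and its column ts are hypothetical stand-ins for whatever your query returns:
# Hypothetical result frame: ts holds the UTC POSIXct values from the query
res <- data.frame(ts = as.POSIXct(c(1435617000, 1420121295), origin = "1970-01-01", tz = "UTC"))
res$ts_local <- format(res$ts, tz = "America/New_York", usetz = TRUE)
res$ts_local
# [1] "2015-06-29 18:30:00 EDT" "2015-01-01 09:08:15 EST"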
For me, the TIMESTAMP_SUB and TIMESTAMP_ADD functions did the job. When I needed to convert a timestamp from UTC to PST, I used:
TIMESTAMP_SUB(`timestamp`, INTERVAL 8 HOUR)
Note that a fixed interval like this ignores daylight saving; for DST-aware conversions, prefer the DATETIME(timestamp, timezone) approach from the other answers.
