I'm writing a geotagging app and running into headaches with timezones. Basically, my app has the following data:
Images with local timestamps (i.e. relative to a timezone)
GPS track files consisting of entries using UTC timestamps
My problem: I need a way to get all data that belongs to a given day, based on the timezone where the data was acquired. For the images, that is easy (I ask the user for the timezone upon import and save it in the EXIF data), but I'm not sure how to do it for the GPS tracks (there are usually multiple tracks per day, and assigning them timezones is not easy for the user when importing data that spans several days and timezones). I can think of two possible solutions:
Use a heuristic based on the fact that the tracks are recorded at the same time and place as the images - but there can be tracks before a day's first image or after its last one that still need to be included, and I'm not sure how to reliably handle such edge cases
Determine the timezone from the GPS coordinates - this would be an ideal solution, but is there an open source library that does this (ideally one that works offline)?
I don't think the heuristic method will work well.
First, always store times as UTC together with the timezone of origin; otherwise the time is less meaningful.
After some thought, I think it would be sufficient to resolve down to the country code and from that look up the timezone.
Depending on how much detail you want, GeoTog may help you locate a city, and therefore a country, from a lat/long (although it will need changing to work the other way).
If that doesn't fit, Gisgraphy works with the larger GeoNames database. You could use the web service or extract the data.
If none of these are good enough, then I think you'll need to get some GIS data, possibly boundaries from VMAP0, and process it into polygons or something searchable.
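For the coordinates-to-timezone route specifically, there are offline options today. A minimal sketch, assuming Python 3 and the third-party timezonefinder package (it bundles the timezone boundary polygons, so no network access is needed):

from timezonefinder import TimezoneFinder

tf = TimezoneFinder()  # loads the bundled boundary data

# Resolve an IANA zone identifier from GPS coordinates, fully offline.
zone = tf.timezone_at(lng=144.9631, lat=-37.8136)
print(zone)  # "Australia/Melbourne"

Once you have the zone identifier, any standard date/time library can convert a track's UTC timestamps into local dates.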
Option two: you could start by checking this site: http://www.twinsun.com/tz/tz-link.htm
Option one (less complicated, but I am not sure I accurately understand your need...)
So you have as input:
A target day defined in a known timezone TZ, starting at t0 and ending at t1 (exclusive)
Images with timestamps ti in the same timezone TZ (is this hypothesis true?)
GPS tracks with UTC timestamps tg which can span over several time zones
We also know that there is at least one GPS track for each image.
Here's something that should work (a sketch in code follows the steps):
Convert your target day into UTC. You get the values t0/UTC and t1/UTC.
Convert the images' timestamps into UTC (you get ti/UTC from the known ti/TZ).
Process an image if t0/UTC <= ti/UTC < t1/UTC, i.e. it was taken during your target day.
Find a GPS track including ti/UTC (no problem, since the tracks are timestamped in UTC), then find the closest timestamp within the list of points in this track. This point is the most likely position of your image.
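A rough sketch of those steps in Python (3.9+ for zoneinfo; the image and track objects with timestamp_utc / start_utc / end_utc / points attributes are hypothetical stand-ins for your own data model):

from datetime import date, datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo

def day_bounds_utc(day: date, tz_name: str):
    """Convert a local calendar day into a [t0, t1) interval in UTC."""
    tz = ZoneInfo(tz_name)
    t0 = datetime.combine(day, time.min, tzinfo=tz)
    t1 = datetime.combine(day + timedelta(days=1), time.min, tzinfo=tz)
    return t0.astimezone(timezone.utc), t1.astimezone(timezone.utc)

def locate_images(images, tracks, day: date, tz_name: str):
    """Yield (image, closest track point) pairs for one target day."""
    t0, t1 = day_bounds_utc(day, tz_name)
    for image in images:
        ti = image.timestamp_utc              # ti/TZ already converted to UTC
        if not (t0 <= ti < t1):
            continue                          # not taken on the target day
        for track in tracks:
            if track.start_utc <= ti <= track.end_utc:
                point = min(track.points,
                            key=lambda p: abs(p.timestamp_utc - ti))
                yield image, point
                break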
I have a Redshift data table where all time values are stored in CST and I convert the time values to the respective timezone based on the zip code (location).
While I do that, I assume that all time values are in standard time, and hence my function usage is
CASE WHEN **** convert_timezone('CST', 'EST', time_column)
WHEN **** convert_timezone('CST', 'MST', time_column)
....
END
This will no longer be correct once we enter Daylight Saving Time. How can I handle this so that I do not have to modify the SQL query again in March 2018 and every year after?
Don't use time zone abbreviations. They are ambiguous, and each one refers to only one aspect of a time zone (CST is only the standard-time half of US Central time, for example). Instead, use a full IANA time zone identifier, such as America/Chicago for US Central time.
This is explained well in the Redshift docs:
Using a Time Zone Name
If you specify a time zone using a time zone name, CONVERT_TIMEZONE automatically adjusts for Daylight Saving Time (DST), or any other local seasonal protocol, such as Summer Time, Standard Time, or Winter Time, that is in force for that time zone during the date and time specified by 'timestamp'. For example, 'Europe/London' represents UTC in the winter and UTC+1 in the summer.
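The same behavior is easy to verify in any runtime that ships the IANA database. For example, in Python (3.9+, zoneinfo), the single identifier America/Chicago yields the correct offset on both sides of the DST transition:

from datetime import datetime
from zoneinfo import ZoneInfo

chicago = ZoneInfo("America/Chicago")

# January: Central Standard Time, UTC-6
print(datetime(2018, 1, 15, 12, 0, tzinfo=chicago).strftime("%Z %z"))  # CST -0600
# July: Central Daylight Time, UTC-5 -- no query changes needed in March
print(datetime(2018, 7, 15, 12, 0, tzinfo=chicago).strftime("%Z %z"))  # CDT -0500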
As far as the "...based on the zip code" part of your question, understand that not every ZIP code is locality-based. There are also technical assignments, overseas APO/FPO addresses, US territories, and other edge cases. Additionally, some ZIP codes may straddle more than one time zone.
A better approach, when possible, is to:
Get an approximation of latitude/longitude coordinates - using a variety of techniques depending on your source data. For example, geocoding APIs can take a street address and give a lat/lon.
Then determine the time zone identifier for that location, using one of the techniques listed here.
We have a database of cities with their geo coordinates, which we filled once with the corresponding time zones using tz_world. A user sets a location including a city, and the city has a time zone - that is how we know the user's timezone (we need to render dates and times on the server). But time zones keep changing: new ones appear, old ones are removed.
Are there any best practices or tools to handle that kind of change?
I.e. there is a city Foo with time zone Foo/Bar. One day tzdata was changed, and Foo/Bar was split into Foo/Old_Bar and Foo/New_Bar time zones with the same UTC offsets. We still have Foo/Bar in our db. Strictly speaking, that's a BC break, but it's OK since, say, we can handle those BC breaks. But then tzdata was changed again, and now Foo/New_Bar has a different offset. And here comes trouble: some users from the city Foo have seen the wrong local time since that moment.
Just to be sure you understand me right: it's not about DST, it's about the fact that time zones (their names) are being changed.
As far as I can see, we need a kind of machine-readable tzdata diff. Like:
split: Foo/Bar Foo/Old_Bar,Foo/New_Bar
move: Foo/New_Bar -05:00
This issue makes me feel that storing time zones is a bad idea. Is there a better one?
With specific regard to the IANA/Olson TZ database, the location identifiers do not change once established. The history of each identifier is always consistent for that location.
However, if you are using tz_world or some other map source to determine the time zone for some other location - one that doesn't necessarily have its own identifier - then yes, it's possible that a zone split will cause the zone to change. Though, when it does, the new zone should be consistent with the old zone up to the point of the change.
As a real world example, consider America/Fort_Nelson, which was added in tzdb 2015g for Fort Nelson, British Columbia, Canada, and the surrounding region of the Northern Rockies Regional Municipality. Previously, this area would have been resolved to America/Vancouver, but the zone was split due to their March 2015 time change. The tz_world maps were updated on November 7, 2015 to account for this change.
If you had previously resolved a user in Fort Nelson to America/Vancouver, then they will have incorrect times from November 1st, 2015 forward, as that's when Vancouver switched back to UTC-8, while Fort Nelson remained at UTC-7.
If you update to the latest tzdb and tz_world, you can use the original information to re-determine the time zone - which would now be America/Fort_Nelson.
The new time zone will accurately reflect all of the same information as Vancouver before the split, and the correct information for Fort Nelson after the split.
All of this should just work, assuming you update time zones after each update of tz_world, and recalculate future events after updating the tzdb.
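If you kept the original coordinates, the re-determination can be automated after each data update. A sketch, assuming Python with the third-party timezonefinder package and a hypothetical users collection with lat/lon/zone attributes:

from timezonefinder import TimezoneFinder

tf = TimezoneFinder()  # built from the current boundary data

def refresh_zones(users):
    """Re-resolve each user's zone from coordinates; yield changed users."""
    for user in users:
        new_zone = tf.timezone_at(lng=user.lon, lat=user.lat)
        if new_zone and new_zone != user.zone:
            # e.g. America/Vancouver -> America/Fort_Nelson after 2015g
            user.zone = new_zone
            yield user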
The question remains, how do you know which zones have split and changed so you don't have to recalculate everything? For a small amount of data, you might as well recalculate everything. But for larger datasets, this might be impractical. Unfortunately, there's no machine-readable standardized format for the differences. I believe this has been talked about before in the tz discussion list, but I can't find it at the moment. You can ask there if you like.
Currently the only way is to manually read the release notes of each update. You can find them in the tz-announce list archives (or subscribe to the list for future updates). You can also find them in the NEWS file of any given release. You'll also want to review the history of the tz_world shapefile, which is on that web site.
Also, recognize that time zone IDs will never be removed from the tzdb. A split may create a new zone (Foo/New_Bar), but the original zone will remain (Foo/Bar, not Foo/Old_Bar). If a zone is determined unnecessary, its Zone entry might be replaced with a Link entry, but it will never be removed entirely.
I imagined there would be more literature on this, but I'm having trouble finding any. I have a lot of non-algebraically-aggregatable time series data (that is to say, data for which no function exists that I could use to aggregate it to a higher granularity - things like unique active users, unique contributors, etc., where knowing the count for every minute of some hour does not tell me the total for the hour). Currently, I'm just storing and presenting all of this data in UTC. The problem is that many of my clients find this confusing - understandably so. Because the data is non-algebraically-aggregatable, there's no way to get from UTC data for one day, midnight to midnight, to, say, PST data from midnight to midnight. Recalculation would need to be done from the raw data.
So:
Recalculation from raw data is prohibitively expensive for some complicated analytics graphs
We could store all data for all time zones, but this would increase the amount of data we store 24-fold.
All of that said, how do other people deal with this issue? Here's how Google Analytics does it, but this seems insufficient for my use case because I know that if I open the multiple-timezone can of worms, clients will ask for more than one. It would also take a lot of work that doesn't seem worth the effort, as just adding timezone support won't be very noticeable or a huge win. What I'm really hoping for is some clever design solution that presents the UTC data in a way intuitive enough that it's no longer confusing for people in other timezones. Has anyone dealt with similar problems and come upon a solution I'm missing?
First of all, you should recognize that there are a lot more than 24 time zones. To accurately take into account how people actually use time worldwide, you should be using IANA time zones, of which there are over 500. See also Wikipedia and the timezone tag wiki.
If you are dealing with individual points (discrete timestamps), then you can certainly convert from UTC to any time zone you wish, on the fly, as you render your graph. Just keep in mind that the range of data you query will also need to be translated to that time zone.
But if you are talking about aggregating data by the "day" of a specific time zone, then there is no magic bullet. You will need to decide ahead of time which time zones you want to support and calculate each one separately. When you do this, recognize that it's not just the view that's changing. Since the day boundaries are different for each time zone, then the data for each time zone could potentially have very different daily totals.
You should also be aware that not every day has 24 hours. If the day happens to be the date of a daylight saving time transition, it could have 23, 23.5, 24.5, or 25 hours. This could potentially affect how you draw your graph.
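For example, computing one day's UTC boundaries across a DST transition in Python (3.9+, zoneinfo; the zone and date are arbitrary illustrations):

from datetime import datetime, timezone
from zoneinfo import ZoneInfo

tz = ZoneInfo("America/New_York")

# March 12, 2017: US DST begins, so this "day" is only 23 hours long.
start = datetime(2017, 3, 12, tzinfo=tz).astimezone(timezone.utc)
end = datetime(2017, 3, 13, tzinfo=tz).astimezone(timezone.utc)
print(end - start)  # 23:00:00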
One approach you might consider is to be time zone ignorant in your aggregations, rather than using UTC or any specific time zone. Of course this depends heavily on the context of your data, but it is appropriate in certain circumstances. For example, on an invoice, you might care less about the specific timestamps, and more about which calendar date the invoice was assigned to. In that case, once a date is assigned, you would just aggregate on that date. Even if the company operates over multiple time zones, you wouldn't care about that in aggregate.
As far as some clever design that abstracts this from the user, I'm afraid I haven't seen much. The only two choices you really have are timezone-adjusted aggregations (UTC or otherwise), and time zone ignorant aggregations for calendar-date contexts.
We had similar issues rolling up generation data for renewables. We went with three options: User / Farm / UTC.
If the user selects USER, then all the data is based on their browser time zone, and "yesterday" means the 24 hours up to the last midnight in the user's local time.
Similarly, if FARM is selected, we take the farm's local time zone and derive the same ranges.
UTC is the standard behavior, similar to what you have implemented.
Here's a screenshot of my Twitter feed (as of right now, while writing this question).
Notice how the times are relative to me, right now? (Those time differences are correct, btw.)
How do they calculate that?
The reason I ask is that right now I'm in Melbourne, Australia. When I Remote Desktop to a server in the States and log in to Twitter (using the SAME account).. I get the same results!
At first, I thought they were calculating this based upon my account's Time Zone setting (which, btw, is set to GMT+10).
But if that was the case, when I remote desktop to the server (which is in San Fran, CA) it should be showing different results in that RD terminal, right?
So how could they have coded this, please?
Twitter more than likely stores the date a tweet was posted in UTC, and it knows the time now in UTC (both on your machine and on the server).
Given that those date times are translated into the same timezone (UTC), it's a simple matter of taking the difference between the two times.
It's the same thing the Stack Exchange sites do to stamp the times for all the activities that you see.
As long as you're able to convert any representation of a date and time to UTC (which pretty much every API in existence can), this value can be computed: Twitter pushes the UTC time down to the clients, which then do the math (or it does the math on the server and passes the differences down). The UTC offset you see in your settings only matters when absolute times are displayed to you and you want them adjusted to your timezone.
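A minimal sketch of that math in Python (the helper name and the display thresholds are invented for illustration):

from datetime import datetime, timezone

def relative_age(posted_utc: datetime) -> str:
    """Render a UTC timestamp as '5s' / '3m' / '2h' style relative text."""
    seconds = (datetime.now(timezone.utc) - posted_utc).total_seconds()
    if seconds < 60:
        return f"{int(seconds)}s"
    if seconds < 3600:
        return f"{int(seconds // 60)}m"
    if seconds < 86400:
        return f"{int(seconds // 3600)}h"
    return f"{int(seconds // 86400)}d"

Note that no local time zone appears anywhere in the calculation, which is why the feed looks identical from Melbourne and from the San Francisco server.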
We have a website that has a large number of events that have dates and times created by admins. Admins choose a time zone for each date time entered, and they are stored in UTC time. We are trying to support a global audience, and be completely localized in terms of dates.
We have a search page, that allows dates to be entered as search criteria.
So users could say, show me all events between "12:01 AM July-1-2011" and "11:59 PM July 10-2011".
I'm trying to figure out what the best approach is to determining what time zone to consider the date filter criteria in.
1. Force end users to select a time zone when creating date filters. This is cumbersome, and our designers are pushing back. It is what I would prefer.
2. Assume the entered dates are in the user's "preferred" time zone, which is set upon logging in.
3. Store times in local time, without converting to UTC. This way end users are searching in the admin-created date. I hate this idea; I need help explaining why this is bad.
Please help!
The second option is a possible solution to your problem, and it is probably the best.
Possibly you could get the current time zone offset from the web browser (with JavaScript), but the problem is that there are certain time zones that currently have the same offset yet switch Daylight Saving Time on different dates, so search results would be inaccurate. By having the user choose his/her preferred time zone and storing that information in the profile, you can always present correct dates and times, as well as use this information for searching. However, I would add a note near the search box so that the end user knows which time zone it refers to (with JavaScript that would be obvious: the current one; with the profile setting, the user might have forgotten).
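To illustrate the "same offset now, different DST rules" trap (the two zones are just an example, shown with Python's zoneinfo):

from datetime import datetime
from zoneinfo import ZoneInfo

# Both zones are UTC-7 in January, but only Denver observes DST:
for zone in ("America/Denver", "America/Phoenix"):
    winter = datetime(2017, 1, 1, 12, 0, tzinfo=ZoneInfo(zone))
    summer = datetime(2017, 7, 1, 12, 0, tzinfo=ZoneInfo(zone))
    print(zone, winter.strftime("%z"), summer.strftime("%z"))
# America/Denver  -0700 -0600
# America/Phoenix -0700 -0700

A raw browser offset of -07:00 captured in January cannot tell these two apart, which is exactly why the stored preferred zone is more reliable.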
BTW, time zone information is best shown as "UTC+02:00 (Warsaw, Zagreb, Skopje)" instead of "Central European Time"...
As for other options:
1. Too much clicking, plus "don't make me think, I want it in my local time zone, isn't that obvious?".
3. Local times will not be comparable against each other. You will soon end up with two different dates referring to the same point in time (at least in terms of the numbers). Really bad idea.
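A small demonstration of why (arbitrary zones, Python's zoneinfo):

from datetime import datetime, timezone
from zoneinfo import ZoneInfo

instant = datetime(2011, 7, 1, 12, 0, tzinfo=timezone.utc)
ny = instant.astimezone(ZoneInfo("America/New_York"))
warsaw = instant.astimezone(ZoneInfo("Europe/Warsaw"))

# Stored naively (offsets dropped), one instant becomes two values
# that no longer compare as equal:
print(ny.replace(tzinfo=None))      # 2011-07-01 08:00:00
print(warsaw.replace(tzinfo=None))  # 2011-07-01 14:00:00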