Wrong data points reported in Bosun (issue with downsampling) - opentsdb

I have a metric that I check over a 2-hour period using the max aggregator (for example, 9:30 AM till 11:30 AM PST). The OpenTSDB UI shows that there was only one data point, at 10:16 AM PST, and if I choose 2h max downsampling, the UI shows one data point at 10:00 AM, which is correct. When I check the same metric in the Bosun UI for the same time period, the data point is registered at 10:52 AM PST, and if I choose 2h max downsampling, I receive "No Results".
I have tried other metrics as well: any time I use the same interval for the downsampling as for the overall query, I receive "No Results", while with a shorter downsampling interval or no downsampling at all there are data points.
I would greatly appreciate it if someone could explain Bosun's behavior in this example.

Bosun is currently hardcoded to use the UTC timezone for everything. We recommend changing OpenTSDB/HBase to use UTC as well. There are GitHub issues for making this more apparent in the UI and for changing to a local timezone. There was an attempt to add a setting for changing the timezone, but it could not be merged.
Stack Overflow uses UTC for all our systems, so we don't have a need to implement support for other timezones. UTC is highly recommended due to issues with daylight saving time and servers in multiple locations, but if someone wants to add local timezone support, we are happy to discuss the implementation on GitHub or in the Slack chat room.
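For illustration only (this is not Bosun's or OpenTSDB's code), here is a minimal Python sketch of how a UTC/local mismatch can make a point disappear from a query window: the same "9:30 AM to 11:30 AM" range covers different instants depending on which zone it is interpreted in, so a point visible under one interpretation can return "No Results" under the other. The date below is an arbitrary placeholder.

from datetime import datetime, timezone
from zoneinfo import ZoneInfo

pacific = ZoneInfo("US/Pacific")

# The single raw data point, written at 10:16 AM Pacific (the date is arbitrary).
point = datetime(2017, 3, 15, 10, 16, tzinfo=pacific)

# The same "9:30 AM to 11:30 AM" query window, interpreted two ways.
window_local = (datetime(2017, 3, 15, 9, 30, tzinfo=pacific),
                datetime(2017, 3, 15, 11, 30, tzinfo=pacific))
window_utc = (datetime(2017, 3, 15, 9, 30, tzinfo=timezone.utc),
              datetime(2017, 3, 15, 11, 30, tzinfo=timezone.utc))

print(window_local[0] <= point <= window_local[1])  # True: the point is inside the local window
print(window_utc[0] <= point <= window_utc[1])      # False: 10:16 Pacific is 17:16 UTC, past the UTC window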

Related

Firebase BigQuery server offset time

Background:
I have Firebase Analytics data exported to BigQuery, and I'm using cron jobs to crunch the data in BigQuery to get insights.
Problem:
To be able to crunch only delta data, i.e. the data that has arrived since the last time I ran my cron job, I need a way to figure out when the data arrived at the server, since event_timestamp is generated on the client and can be cached on the client before being sent.
Insights:
I have experimented with event_server_timestamp_offset (offset), which I thought I could use together with event_timestamp. I was expecting the offset to only be positive, but it can also be negative. And when I look at the MAX and MIN of the offset in the entire exported Firebase Analytics dataset and convert it from microseconds, I get an offset of more than 18 years.
Query:
SELECT
MAX(event_server_timestamp_offset)/(1000000*60*60*24) max_days,
MIN(event_server_timestamp_offset)/(1000000*60*60*24) min_days
FROM
`analytics_<project_id>.events_*`
Result: max_days=6784.485790436655,
min_days=-106.95833052104166
Question:
How can I figure out the server arrival time for my Firebase exported BigQuery data so I can run cron jobs crunching only delta data?
Can I use event_server_timestamp_offset together with event_timestamp? If so, how?
Best regards,
Daniel
Surprisingly enough, since this question has not had a clear answer for almost 2 years, I am leaving here the answers I got from the Firebase support team. The format is: question asked, followed by the answer from the support staff.
Q1. event_date - The date on which the event was logged (YYYYMMDD format in the registered timezone of your app). Does it mean that the event occurred on that date, or that it was actually collected on that date?
A1. Per documentation, event_date refers to the date on which the event is logged/occurred. Note that event_date is based on the Analytics timezone setting of your Firebase Project.
Q2. event_timestamp - The time (in microseconds, UTC) at which the event was logged on the client. Is it safe to assume that this is the exact timestamp the event occurred on client side (in the app timezone of course)?
A2. Yes, this is based on the device timezone setting. However, event_timestamp may be skewed if the device time is incorrect.
Q3. event_server_timestamp_offset - Timestamp offset between collection time and upload time in micros. This is the main field that causes all the misunderstandings - in our BigQuery table for the year 2020 this field takes values in a range between 5 days and -2 days. I mean, how can the collection time be 2 days ahead?
A3. The event_server_timestamp_offset field in the export schema is the time difference between when the event took place and the app uploaded it to our server. In other words, this is the estimated difference between the client's local time and the actual time, according to our servers. The values of this field are usually positive, but can be negative as well if the device time setting is incorrect.
Q4. One last question is very important - can we ignore the event_server_timestamp_offset field and just rely on event_timestamp as the exact date and time the event occurred on the client side (not collected, not uploaded, etc.)? If not, please explain how we can get the exact datetime of the event occurring on the client side. But if yes, please let me know why we need the event_server_timestamp_offset field at all.
A4. Yes, you may actually ignore it and use event_timestamp alone. However, as mentioned earlier, event_timestamp could be off if the device time setting is incorrect, but it shouldn't really affect the bigger picture of your analytics data, as cases like this are usually one-off.
We use the event_date as the indicator and load the data once a day.
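For reference, here is a minimal sketch of that once-a-day approach, assuming the standard date-sharded Firebase export tables (events_YYYYMMDD) and the google-cloud-bigquery Python client; the dataset name is the same placeholder used in the question.

from datetime import date, timedelta
from google.cloud import bigquery  # assumes the google-cloud-bigquery package is installed

client = bigquery.Client()

# Firebase exports one date-sharded table per event_date, e.g. events_20240131.
yesterday = (date.today() - timedelta(days=1)).strftime("%Y%m%d")

sql = f"""
SELECT event_name, COUNT(*) AS n
FROM `analytics_<project_id>.events_{yesterday}`
GROUP BY event_name
"""

for row in client.query(sql).result():
    print(row.event_name, row.n)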

Is timezone info redundant provided that UTC timestamp is available?

I have a simple mobile app that schedules future events between people at a specified location. These events may be physical or virtual, so the time specified for the event may or may not be in the same timezone as the 'location' of the event. For example, a physical meeting may be scheduled for two people in London at local time 10am on a specified date. Alternatively, a Skype call may be scheduled for two people in different timezones at 4pm (in one person's timezone) on a specified date though the 'location' of the event is simply 'office' which means two different places in different timezones.
I wonder whether the following design is going to work for this application:
On the client, the app asks the user to input the local date and time and specify the timezone local to the event.
On the server, it converts the local date and time with the provided timezone into a UTC timestamp, and stores only this timestamp.
When a client retrieves these details, it receives the UTC timestamp only and converts it into local time in the same timezone as the client's current timezone. The client's current timezone is determined by the current system timezone setting, which I think is automatically adjusted based on the client's location (of course, assuming the client is connected to a mobile network).
My main motivations for this design are:
UTC is an absolute and universal time standard, and you can convert between it and any timezone.
Users only care about the local date and time in the timezone they are currently in.
Is this a feasible design? If not, what specific scenarios would break the application or severely affect user experience? Critiques welcome.
For a single event, knowing the UTC instant on which it occurs is usually enough, so long as you have the right UTC instant (see later).
For repeated events, you need to know the time zone in which it repeats... not just the UTC offset, but the actual time zone. For example, if I schedule a weekly meeting at 5pm in Europe/London with colleagues in America/Los_Angeles, then for most of the year it will occur at 9am for them... but for a couple of weeks in the year it will occur at 8am and for a couple of weeks in the year it will occur at 10am, due to differences in when DST is observed.
Even for a single event, you might want to consider what happens if time zone rules change. Suppose I schedule a meeting for 4pm on March 20th 2018, in the Europe/London time zone. Currently that will occur with a UTC offset of 0... but suppose between now and the meeting, the time zone rules change to bring British Summer Time in one hour earlier. If I've written it in my diary as 4pm, I probably don't want the software to think that it's actually at 5pm because that's the UTC instant we originally predicted.
We don't know your exact application requirements, but the above situations at least provide an argument for potentially storing the local time and time zone instead of the UTC instant... but you'll also need to work out what to do if the local time ends up being skipped or being ambiguous due to DST changes. (When the clocks fall back, some local times occur twice. When the clocks skip forward, some local times are skipped. A time that was unambiguous may become invalid or ambiguous if the rules change between the original planning time and the actual event. You should probably account for this in your design.)
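Here is a small sketch of the recurring-meeting example above, using Python's zoneinfo module: a weekly 5pm Europe/London slot drifts between 09:00 and 10:00 in America/Los_Angeles because the two zones change their offsets on different dates (the start date below is arbitrary).

from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

london = ZoneInfo("Europe/London")
los_angeles = ZoneInfo("America/Los_Angeles")

# A weekly meeting at 5pm London wall-clock time, starting on an arbitrary Wednesday in March 2018.
first = datetime(2018, 3, 7, 17, 0, tzinfo=london)
for week in range(4):
    occurrence = first + timedelta(weeks=week)  # the wall clock stays at 17:00 London time
    print(occurrence.date(), "->", occurrence.astimezone(los_angeles).strftime("%H:%M"))

# Prints 09:00, 10:00, 10:00, 09:00: the US springs forward on 11 March 2018,
# the UK not until 25 March, so the Los Angeles wall-clock time drifts.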
To keep it simple, my answers are:
Timezone info is redundant if you want to define a single moment. A UTC/Unix timestamp completely defines a moment.
Your design seems feasible, but on point 2 I would convert to the UTC/Unix timestamp on the client side and give this timestamp in its final form to the server. Reason: the client side already has the info necessary to convert (see this time-keeping client-server-db architecture example - it works based exactly on the principles you describe). A short sketch of that client-side conversion follows these answers.
One possible problem (as described by Jon Skeet in his answer) is recurring events, but this should be reflected in the way you model time. The difference between recurring events and fixed events is that the latter completely define a moment (like a UTC/Unix timestamp), while the former are only a 'function' which can be applied to the current time to get the next trigger time of the recurring event. This might be an entirely different problem from what you ask, but in any case, somehow distinguishing between recurring events (if you need them) and fixed events in your model is a good idea.
One decision to make is: PULL or PUSH? Or both? Do you want the server to be able to send emails, for example, when an event comes to pass? Or do you want client-side alerts only when your client-side app is running? The answers to these questions will help you arrive at a design that suits you.
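As a rough sketch of the client-side conversion mentioned in the second point above, assuming Python's zoneinfo module (the date and zone are placeholders):

from datetime import datetime
from zoneinfo import ZoneInfo

# Client side: the user picks a wall-clock time and the zone local to the event.
local_wall_time = datetime(2024, 6, 1, 16, 0, tzinfo=ZoneInfo("Europe/London"))

# Convert once, on the client, and send only this number to the server.
unix_timestamp = int(local_wall_time.timestamp())
print(unix_timestamp)  # seconds since the Unix epoch; completely defines the moment

The server then stores and compares nothing but this integer; the timezone matters only on input and on display.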

How to reliably determine time zone for a user from the Twitter API?

When getting information from Twitter's API for a user, they provide two fields related to the user's time zone:
utc_offset: -14400,
time_zone: "Indiana (East)"
Unfortunately, this doesn't tell the full story because I don't know if that UTC offset was calculated during standard time or daylight savings time. After dividing by 3600 seconds, I get -4 hours, which is valid during the summer months, but in the winter the correct value would be -5 hours.
If the value were ALWAYS determined during daylight saving time, I could write an algorithm for that; however, after some searching on the subject I've seen several pasted outputs that contradict that assumption (as a quick example, this question shows an offset of -21600 and the poster says they are on Central time, which, if calculated during daylight saving time, would be -18000).
It would make sense to me that the value would be calculated as of Jan 1, and the several pasted outputs I've found online fall into that category, but my own Twitter account shows the values listed above, for which this assumption is invalid. My next thought was that maybe it was calculated at the time I created my account, but that seems erroneous as well because I can change my time zone at any later point (and even so, I created my account in November, when I would have been on standard time, not daylight time!).
My last thought was that maybe the value is calculated as of the date of the API request. This makes a lot of sense, and the Twitter accounts I own all seem to validate it. BUT the SO question I linked to earlier was answered on June 2nd, which is during daylight saving time, and that person's value of -21600 reflects standard time for the Central time zone.
Anyone out there solve this problem? Thanks so much!
Twitter's front end uses Ruby on Rails. If you go to your own Twitter account settings and look at the possible options for time zones (view source on the dropdown list), you will find that they match up with those provided by ActiveSupport::TimeZone, shown in this documentation. Although there appear to be some zones understood by Rails that Twitter has omitted, all of the Twitter zone key names are in that list.
I have asked Twitter to use standard time zone names in the future, in this developer request.
Why does Rails limit this list and use their own key values? Who knows. I have asked before, and gotten very little response. Read here.
But you can certainly use their mapping dictionary to turn the time_zone value into a standard IANA time zone identifier. For example:
"Indiana (East)" => "America/Indiana/Indianapolis"
"Central Time (US & Canada)" => "America/Chicago"
This can be found in the Rails documentation, and in the source code. (Scroll down to MAPPING.)
Then you can use any standard IANA/Olson/TZDB implementation you wish. They exist for just about every language and platform. For further details, see the timezone tag wiki. If you need help with a specific implementation, you'll need to expand your question to tell us what language you are using and what you have tried so far. (Or consider asking a new question about just that part of it.)
Regarding the utc_offset field, Twitter does not make it clear what basis they use to calculate it. My guess is that it is the user's current offset, based on the time at which you call the API.
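As a hedged sketch of that lookup, here is the idea in Python; the three mapping entries are the ones mentioned in this answer and its updates, and the full dictionary would come from the Rails MAPPING table linked above.

from datetime import datetime
from zoneinfo import ZoneInfo

# A few entries from Rails' ActiveSupport::TimeZone MAPPING table; extend as needed from the full list.
RAILS_TO_IANA = {
    "Indiana (East)": "America/Indiana/Indianapolis",
    "Central Time (US & Canada)": "America/Chicago",
    "Pacific Time (US & Canada)": "America/Los_Angeles",
}

tz = ZoneInfo(RAILS_TO_IANA["Indiana (East)"])

# The offset right now, in seconds; for this zone it moves between -18000 and -14400 with DST.
print(datetime.now(tz).utcoffset().total_seconds())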
Update 1
I have added support for converting Rails time zone names to both IANA and Windows standard time zone identifiers in my TimeZoneConverter library for .NET. If you are using .NET, you can use this library to simplify your conversions and stay on top of updates more easily.
Update 2
Twitter's API now returns the time zone in this format:
"time_zone": {
"name": "Pacific Time (US & Canada)",
"tzinfo_name": "America/Los_Angeles",
"utc_offset": -28800
},
Use the tzinfo_name field. Done. :)

How does Twitter's website calculate how long ago a tweet was posted?

Here's a screenshot of my twitter feed (as of right now while me writing this Question).
Notice how the time is relative to me, right now? (Those time differences are correct, btw.)
How do they calculate that?
The reason I ask is that right now, I'm in Melbourne, Australia. When I Remote Desktop to a server in the States and log in to Twitter (using the SAME account), I get the same results!
At first, I thought they were calculating this based upon my account settings for Time Zone (which btw is set at +10 GMT)
But if that was the case, when I remote desktop to the server (which is in San Fran, CA) it should be showing different results in that RD terminal, right?
So how could they have coded this, please?
Twitter more than likely stores the date a tweet was posted in UTC; it also knows the time now in UTC (both on your machine and on the server).
Given that those date times are translated into the same timezone (UTC), it's a simple matter of taking the difference between the two times.
It's the same thing the Stack Exchange sites do to stamp the times for all the activities that you see.
As long as you're able to convert any representation of a date-time to UTC (which pretty much every API in existence can do), this value can be computed: Twitter pushes the UTC time down to the clients, which then do the math (or the math is done on the server and the differences are passed down). The UTC offset setting you see is used when absolute times are displayed to you and you want them shown relative to your timezone.
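A minimal sketch of that difference-based rendering, assuming the post time is stored as an aware UTC datetime (Twitter's and Stack Exchange's actual rendering code is their own, and the cutoffs below are arbitrary):

from datetime import datetime, timezone

def time_ago(posted_utc, now_utc=None):
    """Render a UTC post time as a relative string, independent of any local timezone."""
    now_utc = now_utc or datetime.now(timezone.utc)
    seconds = (now_utc - posted_utc).total_seconds()
    if seconds < 60:
        return f"{int(seconds)}s"
    if seconds < 3600:
        return f"{int(seconds // 60)}m"
    if seconds < 86400:
        return f"{int(seconds // 3600)}h"
    return f"{int(seconds // 86400)}d"

posted = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
now = datetime(2024, 1, 1, 12, 42, tzinfo=timezone.utc)
print(time_ago(posted, now))  # "42m"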

How to implement timezone in a web application?

I want to implement timezone support in my web application. I researched and saw that most web apps use a GMT dropdown; here is a link to such a dropdown: http://www.attackwork.com/BlogEntry/6/Time-Zone-Dropdown-Select-List/Default.aspx
Then I saw this article suggesting UTC is the way to go when it comes to implementing timezones: https://web.archive.org/web/20210513223048/http://aspnet.4guysfromrolla.com/articles/081507-1.aspx Basically, it says don't use DateTime.Now; use DateTime.UtcNow instead.
My questions are,
Is there a dropdown of the timezones in UTC, like the GMT one in the first link I showed?
Should I really use UTC or GMT?
.NET 3.5 provides the TimeZoneInfo class which should make it relatively simple for you to populate a dropdown with time zones. GMT came before UTC and UTC was officially instituted on January 1, 1972. See this link for more information. For today's purposes, the two are pretty much synonymous, though they have different historical origins. Use whichever looks and functions better for your purposes.
I'm not sure if this is what you intended to ask, but in your database you should always store timestamps in UTC/GMT (as noted by others they mean essentially the same thing). For each user of your web app, store the time zone preference.
Then whenever you display the timestamp for something to a user, convert the UTC time in the database to the user's timezone.
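Illustrated here in Python for brevity (the same idea maps directly onto DateTime.UtcNow and TimeZoneInfo in .NET); the zone preference value is a placeholder:

from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Stored in the database: always UTC.
created_at_utc = datetime.now(timezone.utc)

# Stored on the user's profile: an IANA timezone preference (placeholder value).
user_timezone = "Europe/Berlin"

# Done only at display time.
local_for_user = created_at_utc.astimezone(ZoneInfo(user_timezone))
print(local_for_user.strftime("%Y-%m-%d %H:%M %Z"))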
GMT (Greenwich Mean Time) is the same as UTC (Coordinated Universal Time). This isn't an either/or choice - use it :)
Use Localization settings, functions and features anywhere possible.
If you aren't running against SQL Server 2008 or don't want to abstract timezone management to the database, you should store all times as UTC/GMT and apply the timezone difference based on the user's profile setting, so that users from all around the world can see timestamps on events in their local time.
The distinction between UTC and GMT is probably too fine to bother with in your code. However, it's probably a good idea to always save and process times internally with zero timezone offset, and deal with it as a presentation concern.
It's also possible to use JavaScript to determine the user's probable timezone: examine the timezone offsets for some pair of Date objects reasonably close to the solstices (even January 1 and July 1 make a suitable approximation) to obtain a coarse timezone identification. Feel free to use this information to determine a default timezone, but do allow it to be changed by the user: JavaScript doesn't provide sufficient detail to select the exact timezone with national and regional historical shifts, and it may not be enabled by the user anyway.

Resources