How to effectively deal with late events in Azure Data Explorer (Kusto)? - azure-data-explorer

We have an event stream coming from our partners' IoT devices.
Some of those partners send directly from the device, some centralize collection and send to us in batches.
Some partners send once a day in a burst with all events from previous 24h.
Some partners resume sending after outages, and we get events from 3-7 days ago.
The stream has an average of 450 events per second, at 170 bytes per event. The peak is 1000 events per second.
The events have a timestamp field and most queries will filter events with timestamp in last 30 days.
My understanding is that Azure Data Explorer creates extents as data arrives. Because of the very late events, I suppose my extents are not optimized for timestamp queries, since data from a single day will be spread unevenly across several extents, mixed together with other days' events.
How to deal effectively with very late events? Is there a way to tell ADX that I plan to query based on a timestamp and get it to self-organize around that for late events?

For out-of-order ingestion, you can consider setting up a datetime partition key for the table.
See: Data partitioning policy
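
As a rough sketch of what such a policy could look like, the Python snippet below only builds the text of a management command; it does not connect to a cluster. The table name `Events`, the column name `timestamp`, and the one-day range size are assumptions for illustration, and the property names should be checked against the Data partitioning policy documentation before use.

```python
import json


def build_partitioning_command(table: str, column: str,
                               range_size: str = "1.00:00:00",
                               override_creation_time: bool = False) -> str:
    """Return the text of an `.alter table ... policy partitioning` command.

    The table and column names are supplied by the caller; nothing here
    talks to a cluster.
    """
    policy = {
        "PartitionKeys": [
            {
                "ColumnName": column,
                # Uniform range partitioning is the kind used for datetime keys.
                "Kind": "UniformRange",
                "Properties": {
                    "Reference": "1970-01-01T00:00:00",
                    # Assumed bucket size of one day (timespan format d.hh:mm:ss).
                    "RangeSize": range_size,
                    # Whether extent creation times should follow the key values;
                    # check the documentation before enabling this.
                    "OverrideCreationTime": override_creation_time,
                },
            }
        ]
    }
    fence = "`" * 3  # Kusto multi-line string literal delimiter
    body = json.dumps(policy, indent=2)
    return f".alter table {table} policy partitioning {fence}\n{body}\n{fence}"


if __name__ == "__main__":
    print(build_partitioning_command("Events", "timestamp"))
```

The printed command can then be reviewed and run by hand. Whether one-day ranges suit a stream of this size is a judgment call that depends on the actual query patterns.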

Related

Google Analytics show real time goal hits but not on the conversions report

I'm trying to report conversions to Google Analytics from the server side of an app after a payment is successfully processed. I'm using the Measurement Protocol from the devguides. https://developers.google.com/analytics/devguides/collection/protocol/v1/
The problem is that it successfully shows the goal hits in the real-time conversions report, but these are not shown in the normal conversions report as goal completions.
Is there any difference between 'goal hit' and 'goal completion' that I'm missing? Or is there a delay before the data makes it into the regular conversions report?
There is a delay. Per documentation it's 24-48 hours (4 hours on a 360 account), although usually the data shows up somewhat faster.
Documentation:
Processing latency is 24-48 hours. Standard accounts that send more
than 200,000 sessions per day to Analytics will result in the reports
being refreshed only once a day. This can delay updates to reports and
metrics for up to two days. To restore intra-day processing, reduce
the number of sessions your account sends to < 200,000 per day. For
Analytics 360 accounts, this limit is extended to 2 billion hits per
month.
I used to think there were long delays in data showing up in GA reports as well, until I discovered a small bug in the GA system with regard to time zones. The system automatically selects the date for you on the reports, but if you live in a time zone like Australia or the Philippines, these can be out of sync, and therefore the most recent data doesn't show up.
I now always set the date to "Today" or to the last few days, and I find all data comes through within minutes, not hours.

How to best debug Google Analytics events?

I'm currently integrating support for Google Analytics in my C++ project. I'm still learning how to use the analytics interface, but I can foresee a few potential issues that I may have with debugging.
I'm currently only able to see the "Event Category" and "Event Action" fields for any events in real time. Is there a way to see "Event Labels" and "Event Values"?
I've only been using the analytics interface for a few hours. How long does it take for events to transfer from Real Time to archived events that can be found in the "Behavior" panel? Currently, I'm not seeing any events in the "Behavior" panel, but there are events in the "Real-Time" panel.
If you click an entry in the event category column in the real-time view, it will give you a breakdown by action and label for that category.
Processing latency is documented here:
Data processing latency Processing latency is 24-48 hours. Standard
accounts that send more than 200,000 sessions per day to Analytics
will result in the reports being refreshed only once a day. This can
delay updates to reports and metrics for up to two days. To restore
intra-day processing, reduce the number of sessions your account sends
to < 200,000 per day. For Analytics 360 accounts, this limit is
extended to 2 billion hits per month.
Most of the time the data will show up a lot quicker. In some of my accounts the data turns up within the hour; anecdotally, I'd say it depends to some extent on the account size and number of hits. For a Premium/360 account, the guaranteed processing latency is 4 hours. But if you need to rely on it for any business-critical purpose, you'd better go with the documented number.
As for your title question of how to "best" debug, I'd probably start by installing some kind of proxy that lets you inspect the actual requests. This will allow you to better track down the cause of the error, if any.
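
A different technique from the proxy suggested above, and only applicable if the project sends hits through the Measurement Protocol (an assumption, since the question does not say), is to point the same request at the protocol's validation endpoint at `/debug/collect`, which reports how the hit was parsed instead of recording it. The sketch below only builds the URL; the tracking ID, client ID and event values are placeholders.

```python
from urllib.parse import urlencode

# Validation endpoint of the Measurement Protocol (v1); hits sent here are
# checked and described in the response rather than recorded.
DEBUG_ENDPOINT = "https://www.google-analytics.com/debug/collect"


def build_debug_url(tracking_id, client_id, category, action,
                    label=None, value=None):
    """Build a validation URL for a single event hit."""
    params = {
        "v": "1",            # protocol version
        "tid": tracking_id,  # property ID (placeholder in the example below)
        "cid": client_id,    # anonymous client ID
        "t": "event",        # hit type
        "ec": category,      # event category
        "ea": action,        # event action
    }
    if label is not None:
        params["el"] = label            # event label
    if value is not None:
        params["ev"] = str(int(value))  # event value (integer)
    return f"{DEBUG_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_debug_url("UA-XXXXX-Y", "555", "video", "play",
                          label="intro", value=3))
```

Opening the printed URL in a browser, or requesting it from the application, makes it possible to confirm that the label and value fields are being sent as intended.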

How to migrate old statistics to Google Analytics?

In our project we have stored all user event data in our database for over a year, but it's not indexed.
We are now going to use Google Analytics to store our analytics and analyze the reports in the Google Analytics dashboard.
Before we start using Google Analytics, I would like to migrate all the old statistics (about 2 million events) to it.
For this I should use the Measurement Protocol, and its limits allow me to transfer 2 million hits with no problem.
But I could not find out how to set the time of the event. The Measurement Protocol has Queue Time, but Google says:
Values greater than four hours may lead to hits not being processed.
How is it possible to transfer 2 million events to Google Analytics with their event times?
Thanks
You are correct: you can use the Measurement Protocol to send event data directly to Google Analytics. I don't see any problem with sending 2 million events. However, it's not possible to set the event time to more than four hours ago.
Queue Time is used to set the time at which the event occurred. As you can see, it can't be more than four hours ago, and I have found that if you do set it to four hours ago, it's a bit fuzzy whether the data is correct or not. This feature is probably most useful on mobile devices, which may go offline for a short time: you can store the data and then send it all once the device is online again.
So the dates will be the date on which you sent the events to Google Analytics; you can't backdate the data by more than four hours. I am not sure how much use the data will be to you once it is all inserted.
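
To make the four-hour limit concrete, here is a minimal sketch of how a sender might compute the Queue Time (`qt`, in milliseconds) for a stored event and refuse hits that fall outside the window. The function names and the choice to raise an error are illustrative assumptions; the payload is only built, not sent.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

# Limit quoted in the question: values above four hours may not be processed.
MAX_QUEUE_TIME = timedelta(hours=4)


def queue_time_ms(event_time, now):
    """Milliseconds between the event and the send time, or None if the
    event lies in the future or is older than the four-hour window."""
    delta = now - event_time
    if delta < timedelta(0) or delta > MAX_QUEUE_TIME:
        return None
    return int(delta.total_seconds() * 1000)


def build_event_payload(tracking_id, client_id, category, action,
                        event_time, now):
    """Build a Measurement Protocol (v1) event payload with a `qt` value."""
    qt = queue_time_ms(event_time, now)
    if qt is None:
        raise ValueError("event is outside the four-hour queue time window")
    return urlencode({
        "v": "1", "tid": tracking_id, "cid": client_id,
        "t": "event", "ec": category, "ea": action,
        "qt": qt,  # how long ago the event happened, in milliseconds
    })


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(build_event_payload("UA-XXXXX-Y", "555", "video", "play",
                              now - timedelta(minutes=30), now))
```

For a year-old backlog, every event would fail this check, which is the practical meaning of the limit described above.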
There is no way to do this, but you can make it easier on yourself.
Unfortunately, there is no way to add, remove, or otherwise edit Google Analytics hit data retrospectively, except to delete all of it. You also cannot copy, or move it between accounts, or download it all.
You are not the first to have to come to terms with this.
In this situation, we recommend to our clients that they run their new and old systems in parallel for a testing period (usually 6 months or a year), before switching off one of them.
Yes, it's difficult to let go of old data, but sometimes it has to be done.

Why are Google Analytics Dashboard statistics changing?

Background:
I have a Google Analytics account that I use to track user activity for a web and mobile app. After logging into your account and choosing the web property and the corresponding view, you generally see a dashboard with quick stats like Pageviews, Users, Sessions, Pages/Session, Avg. Session Duration, Bounce Rate and percentage of new sessions. You can change the time period (from the top right area of the Dashboard) to get the same stats for that period.
Problem:
Last week, I was interested in the three main stats: Page views, Users and Sessions for a particular day - say, day A. The dashboard showed the following stats:
Pageviews - 1,660,137
Users - 496,068
Sessions - 983,549
This report was based on 100% of sessions.
I go back to the dashboard TODAY and check the same stats for the same day A. Here's what I saw:
Pageviews - 1,660,137
Users - 511,071
Sessions - 1,005,517
This report is also based on 100% of sessions.
Nothing was changed in the tracking code for the web and mobile app. Could someone explain why I have this difference in the stats? Is this normal?
They need some time to update the system; otherwise it would be overwhelmed.
When you first create a profile, it can take up to 48-72 hours for it to start showing data.
After that, data will appear instantly in the Real-time reports.
Standard reports take longer to finish processing. You need to remember the amount of data that is being processed. Some of the data may appear in the standard reports after a few hours, but the numbers will not have finished processing for at least 24 hours, so anything you look at before then will not be accurate.
When checking Google Analytics, never look at today's or yesterday's numbers in the standard reports if you want accurate information. Things get even more confusing when you consider time zones: when exactly is it yesterday? I have noticed numbers changing as far back as 48 hours, but Google says 24 hours in their documentation. I am looking for the link in the documentation and will post it when I find it.
Found it: Data Limits
Data processing latency
Processing latency is 24-48 hours. Standard accounts that send more
than 200,000 sessions per day to Google Analytics will result in the
reports being refreshed only once a day. This can delay updates to
reports and metrics for up to two days. To restore intra-day
processing, reduce the number of sessions you send to < 200,000 per
day. For Premium accounts, this limit is extended to 2 billion hits
per month.
So try doing the same thing again today, but use Monday as the last day in your range. When you check again next week, the numbers should be correct.

Last “end date” with data in Analytics

I'm using the Google Analytics Reporting API and I can’t find information about what the last “end date” with data in Analytics is.
For example, let's suppose you want to retrieve the last month’s data.
When do you have to perform the query?
The first day of the current month?
...or the second one?
...or maybe the third one?
One more question: is the returned data grouped into days in Pacific Time?
Google Analytics API is supposed to have access to the same data you have in the interface.
Google says that data can take up to 24h to process. The time it takes to really update the data depends on the type and size of the account. Small accounts are updated multiple times a day and can have data available in just a few hours. Once you reach 1M hits a month you are moved to a different mode where the data on your account is updated only once a day. Google Analytics Premium customers get updates more often, even for large amounts of traffic.
There's no way to tell through the API what is exactly the time of the last hit processed. You can query the data for today by the hour and see for yourself though.
Usually you don't care and just want to make sure that the data you're querying has been fully processed for that day.
So if you query data for yesterday, there's a chance it has not been completely updated. For example, if it's midnight, the data for yesterday is just a couple of minutes old and probably hasn't been completely processed yet. The safest bet in this case is to query data for 2 days ago.
So if today is 2012-06-15 and you want to get 1 month of data, a safe approach is to query with start-date=2012-05-13 and end-date=2012-06-13. Most of the time this will give you data for days that have been fully processed, but it's not 100% safe either. Google Analytics has had outages in the past where data took longer than that to process, though these are not usual. When you get the data out, it's really hard to tell just from the API whether the data for those days has been fully processed; using the 2-days-ago idea just makes it more likely that it has.
Days are aggregated according to the timezone configured on the Google Analytics profile.
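
The two-days-ago approach described above can be sketched as a small date calculation. The function name and the `lag_days` parameter are illustrative, and "one month" is taken to mean the same day of the previous month, clamped to that month's length.

```python
import calendar
from datetime import date, timedelta


def safe_month_range(today, lag_days=2):
    """Return (start-date, end-date) strings for one month of data ending
    `lag_days` before `today`, to favor fully processed days."""
    end = today - timedelta(days=lag_days)
    # Step back one calendar month, wrapping January to December.
    if end.month > 1:
        year, month = end.year, end.month - 1
    else:
        year, month = end.year - 1, 12
    # Clamp the day so that e.g. March 30 maps to the last day of February.
    day = min(end.day, calendar.monthrange(year, month)[1])
    start = date(year, month, day)
    return start.isoformat(), end.isoformat()


if __name__ == "__main__":
    # Matches the example in the answer: today is 2012-06-15.
    print(safe_month_range(date(2012, 6, 15)))
```

The two returned strings can be passed as the start-date and end-date of the query. Note that `today` should be evaluated in the profile's configured timezone for the lag to mean what the answer intends.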
