Performing operations on metrics using Grafana, Bucky and OpenTSDB

We are currently trying out Grafana with OpenTSDB as the data source, and the metrics are fed into OpenTSDB by Bucky (JavaScript). Unfortunately, Bucky feeds in the raw metrics like .navigationStart or .domComplete, but what we need are computed, more meaningful metrics like "Total Page Load Time", as in SiteSpeed.io.
For example:
Total Page Load Time = performance.timing.loadEventEnd - performance.timing.navigationStart
To address this I was thinking of computing these values myself; however, Grafana's metric input for OpenTSDB doesn't seem to support operations like that (unlike its Graphite support, which has diffSeries to do subtraction).
Any suggestions? Thanks in advance.
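One workaround is to compute the derived values in the browser and feed Bucky only the finished metrics, so OpenTSDB never has to do the subtraction. Here is a sketch only (the buckySend helper is hypothetical - substitute whatever send call your Bucky client version actually exposes):

// TypeScript sketch: compute derived page-timing metrics client-side so
// OpenTSDB stores finished values instead of raw marks like navigationStart.
declare function buckySend(metric: string, valueMs: number): void; // hypothetical

window.addEventListener('load', () => {
  // loadEventEnd is only populated after the load event finishes, so defer a tick.
  setTimeout(() => {
    const t = performance.timing;
    const derived: Record<string, number> = {
      totalPageLoad: t.loadEventEnd - t.navigationStart, // Total Page Load Time
      domComplete: t.domComplete - t.navigationStart,
      backend: t.responseEnd - t.navigationStart,  // network + server share
      frontend: t.loadEventEnd - t.responseEnd,    // client rendering share
    };
    for (const [name, value] of Object.entries(derived)) {
      if (value >= 0) buckySend('page.' + name, value);
    }
  }, 0);
});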

Related

Google Analytics 4 API reports outdated data

My goal is to get current data from a GA4 report in a backend app, the same way it is shown in the dashboard. The problem is that the report contains only part of the data. I want to get active users by city for the last half hour. I do it like this:
BetaAnalyticsDataClient.create().use { analyticsData ->
    val request: RunReportRequest = RunReportRequest.newBuilder()
        .setProperty("properties/$propertyId")
        .addDimensions(Dimension.newBuilder().setName("city"))
        .addMetrics(Metric.newBuilder().setName("activeUsers"))
        .addDateRanges(DateRange.newBuilder().setStartDate("today").setEndDate("today"))
        .build()
    val response = analyticsData.runReport(request)
}
I get a result that is missing cities currently shown on the dashboard map. I looked for a data refresh interval in the GA4 settings, but could not find one. What is the reason? Thanks!
The method properties.runReport is the standard GA4 report and is limited to data that has already been processed. Data processing takes between 24 and 48 hours; it is not meant for real time.
You need to use the realtime report (see "Creating a Realtime Report" in the docs) if you want real-time data, and you are limited to the dimensions and metrics available in real time; these reports are very limited.
I guess what I am saying is that you are running the wrong report, and you should not expect to get all the data you want from the Realtime report either.
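For completeness, here is a sketch of the realtime counterpart to the question's runReport call, written against the Node client (@google-analytics/data); the Kotlin client used in the question exposes an equivalent runRealtimeReport method:

import { BetaAnalyticsDataClient } from '@google-analytics/data';

const client = new BetaAnalyticsDataClient();

// Realtime reports only accept the small realtime dimension/metric set,
// but 'city' and 'activeUsers' are both in it.
async function activeUsersByCity(propertyId: string) {
  const [response] = await client.runRealtimeReport({
    property: `properties/${propertyId}`,
    dimensions: [{ name: 'city' }],
    metrics: [{ name: 'activeUsers' }],
  });
  for (const row of response.rows ?? []) {
    console.log(row.dimensionValues?.[0]?.value, row.metricValues?.[0]?.value);
  }
}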

Ingesting Google Analytics data into S3 or Redshift

I am looking for options to ingest Google Analytics data (historical data as well) into Redshift. Any suggestions regarding tools or APIs are welcome. I searched online and found Stitch as one of the ETL tools; help me understand this option better, and other options if you have them.
Google Analytics has an API (Core Reporting API). This is good for getting the occasional KPIs, but due to API limits it's not great for exporting large amounts of historical data.
For big data dumps it's better to use the Link to BigQuery ("Link" because I want to avoid the word "integration" which implies a larger level of control than you actually have).
Setting up the link to BigQuery is fairly easy - you create a project in the Google Cloud Console, enable billing (BigQuery comes with a fee, it's not part of the GA360 contract), add your email address as BigQuery Owner in the "IAM & Admin" section, go to your GA account and enter the BigQuery project ID in the GA Admin section, "Property Settings/Product Linking/All Products/BigQuery Link". The process is described here: https://support.google.com/analytics/answer/3416092
You can select between standard updates and streaming updates - the latter comes with an extra fee, but gives you near-realtime data. The former updates the data in BigQuery three times a day, every eight hours.
The exported data is not raw data; it is already sessionized (i.e. while you will get one row per hit, things like the traffic attribution for that hit will be session-based).
You will pay three different kinds of fees - one for the export to BigQuery, one for storage, and one for the actual querying. Pricing is documented here: https://cloud.google.com/bigquery/pricing.
Pricing depends on region, among other things. The region where the data is stored might also be important when it comes to legal matters - e.g. if you have to comply with the GDPR your data should be stored in the EU. Make sure you get the region right, because moving data between regions is cumbersome (you need to export the tables to Google Cloud Storage and re-import them in the proper region) and kind of expensive.
You cannot just delete data and do a new export - on your first export BigQuery will backfill the data for the last 13 months; however, it will do this only once per view. So if you need historical data, better get this right, because if you delete data in BQ you won't get it back.
I don't actually know much about Redshift, but as per your comment you want to display data in Tableau, and Tableau directly connects to BigQuery.
We use custom SQL queries to get the data into Tableau (Google Analytics data is stored in daily tables, and custom SQL seems the easiest way to query data over many tables). BigQuery has a user-based cache that lasts 24 hours as long as the query does not change, so you won't pay for the query every time the report is opened. It is still a good idea to keep an eye on the cost - cost is not based on the result size, but on the amount of data that has to be scanned to produce the desired result, so if you query over a long timeframe and maybe do a few joins, a single query can run into the dozens of euros (multiplied by the number of users who run the query).
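To illustrate the daily-table point (a sketch with placeholder project and dataset names): the GA360 export lands in one ga_sessions_YYYYMMDD table per day, so queries over a date range typically use a table wildcard plus _TABLE_SUFFIX, here via the Node BigQuery client:

import { BigQuery } from '@google-cloud/bigquery';

async function sessionsPerDay() {
  const bigquery = new BigQuery();
  // 'my-project.my_dataset' is a placeholder for your export destination.
  const query = `
    SELECT date, SUM(totals.visits) AS sessions
    FROM \`my-project.my_dataset.ga_sessions_*\`
    WHERE _TABLE_SUFFIX BETWEEN '20190101' AND '20190131'
    GROUP BY date
    ORDER BY date`;
  // Each uncached run scans all bytes in the matched tables - that is what you pay for.
  const [rows] = await bigquery.query({ query });
  for (const row of rows) console.log(row.date, row.sessions);
}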
scitylana.com has a service that can deliver Google Analytics Free data to S3.
You can get 3 years or more.
The extraction is done through the API. The schema is hit level and has 100+ dimensions/metrics.
Depending on the amount of data in your view, I think this could be done with GA360 too.
Another option is to use Stitch's own specification, singer.io, and related open source packages:
https://github.com/singer-io/tap-google-analytics
https://github.com/transferwise/pipelinewise-target-redshift
The way you'd use them is by piping data from one into the other:
tap-google-analytics -c ga.json | target-redshift -c redshift.json
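For reference, a minimal ga.json might look roughly like this (an assumption based on the tap's documentation - check the README of the exact tap version you install for the authoritative config schema):

{
  "key_file_location": "client_secrets.json",
  "view_id": "123456789",
  "start_date": "2018-01-01T00:00:00Z"
}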
I like the Skyvia tool: https://skyvia.com/data-integration/integrate-google-analytics-redshift. It doesn't require coding. With Skyvia, I can create a copy of Google Analytics report data in Amazon Redshift and keep it up-to-date with little to no configuration effort. I don't even need to prepare the schema - Skyvia can automatically create a table for the report data. You can load 10,000 records per month for free - this is enough for me.

Google Analytics: Is it possible to filter the raw data to create more valuable indicators?

We would like to build indicators that provide more useful information than "averages", e.g. instead of having to rely on "average time on page", we would like to create an indicator like "unique users that spent more than [threshold] time on page".
In order to do this, we need to know whether Google Analytics stores information on "user session" in connection with "time on page" in its raw data. And if it does, whether this raw data is accessible and can be filtered.
Another situation where the mentioned storing and filtering might come in handy, is the following: if different activities (e.g. post comment, click like, ...) are all tracked with regard to user session, we could build an indicator like: "unique users that performed any of the following: comment, like, ...".
Any reply, remark or comment is highly appreciated.
Raw data is not accessible in Google Analytics.
The closest you will get, if you have a GA360 account, is the BigQuery export, but even that is not "raw" in any meaningful sense (although it is more detailed). You could create a custom sendHitTask to send raw data to your own database.
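That sendHitTask approach looks roughly like this (a sketch using the documented analytics.js tasks API; the /collect-copy endpoint is hypothetical):

// Duplicate every analytics.js hit to your own endpoint via 'sendHitTask'.
declare function ga(callback: (tracker: any) => void): void; // analytics.js global

ga(function (tracker: any) {
  const originalSendHitTask = tracker.get('sendHitTask');
  tracker.set('sendHitTask', function (model: any) {
    originalSendHitTask(model); // still deliver the hit to Google
    navigator.sendBeacon('/collect-copy', model.get('hitPayload'));
  });
});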
But raw data would not be useful to you, since GA does not send session data with the raw data. Since 2012 and the introduction of Universal Analytics, sessions are entirely calculated on the Google servers - the aforementioned BigQuery export would actually be more useful, since the data there is already sessionized, but this requires the paid-for version of GA.
Usually there are workarounds for most use cases - e.g. "more than time x" can be treated as categorical data instead of as a metric: if you send a timestamp in seconds (starting with 0 for the first page view) with each hit to a session-scoped custom dimension, GA will only retain the last value per session. Then you can filter for all users where that dimension is bigger than a given value (you need to use a regex, since you cannot compare dimensions as numbers, and I recommend creating "buckets" instead of having too many discrete values).
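A sketch of that workaround with analytics.js (assumptions: a session-scoped custom dimension in slot 1, and sessionStorage as a rough stand-in for "seconds since the first page view"):

declare function ga(...args: unknown[]): void; // analytics.js global

// Approximate seconds since the first page view of this browser session.
const firstHit = Number(sessionStorage.getItem('firstHit') ?? Date.now());
sessionStorage.setItem('firstHit', String(firstHit));
const seconds = Math.round((Date.now() - firstHit) / 1000);

// Bucket the value - dimensions are compared as strings, so keep the
// cardinality low instead of storing every discrete second.
const bucket = seconds < 10 ? '00-09s' : seconds < 60 ? '10-59s' : '60s+';

// For session-scoped dimensions GA keeps only the last value per session.
ga('set', 'dimension1', bucket);
ga('send', 'pageview');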
But to answer the explicit question, there is no access to raw data (unless you store it yourself) and it would not contain session data in any case.

R google analytics segment definition

I've just started learning a bit of R in order to pull and analyse data from Google Analytics. What I'm struggling with is querying the management API to pull certain account data.
My desire is to be able to pull the definition of a segment or the definition of all my segments. That is to be able to find out how the segment is built e.g. page url contains 'questions'.
I'm using the RGoogleAnalytics package. I've found this source code (https://r-google-analytics.googlecode.com/svn-history/r32/trunk/src/RGoogleAnalytics/R/Configuration.R) but I'm not quite sure how to turn that into a script. I've tried experimenting but without success - I get a variety of errors about not being able to find functions such as Validatetoken or RetrieveConfigurationData.
Any help as to how I need to format my query to get this would be really appreciated. Thanks
Fox, I am personally using the RGA package and it works seamlessly.
See the segments documentation on the Google Analytics Developers site; the code is as follows:
ga$getData(ids, batch = TRUE, walk = TRUE,
           start.date, end.date,
           metrics = "ga:visits,ga:transactions",
           dimensions = "ga:keyword",
           filter = "ga:country==Denmark;ga:medium==organic",
           segment = "dynamic::ga:medium%3D%3Dreferral")
As a rule of thumb, for complex segments I would advise creating them in the GA interface and then simply referring to them by their segment ID.
Also be aware that you cannot reference more than one segment at a time (unlike in the GA interface), so for every segment you need to analyze, you have to place a separate API request.

Mixable metrics and dimensions in Google Analytics

I'm doing some complex reports for Google Analytics and would like to ask you if the following is possible. The client wants to have just organic data for a bunch of metrics, like pageviews, visitBounceRate, etc. The query I ended up with is the following:
https://www.googleapis.com/analytics/v3/data/ga?dimensions=ga:source,ga:medium,ga:keyword,ga:day,ga:month,ga:year&end-date=2013-11-20&fields=columnHeaders/name,rows,totalResults,totalsForAllResults&filters=ga:medium==organic&ids=ga:79067749&metrics=ga:pageviews,ga:pageviewsPerVisit,ga:visitors,ga:avgTimeOnSite,ga:newVisits,ga:visitBounceRate&start-date=2013-10-20
However the response is as follows:
'{"totalResults":0,"columnHeaders":[{"name":"ga:source"},{"name":"ga:medium"},{"name":"ga:keyword"},{"name":"ga:day"},{"name":"ga:month"},{"name":"ga:year"},{"name":"ga:pageviews"},{"name":"ga:pageviewsPerVisit"},{"name":"ga:visitors"},{"name":"ga:avgTimeOnSite"},{"name":"ga:newVisits"},{"name":"ga:visitBounceRate"}],"totalsForAllResults":{"ga:pageviews":"0","ga:pageviewsPerVisit":"0.0","ga:visitors":"0","ga:avgTimeOnSite":"0.0","ga:newVisits":"0","ga:visitBounceRate":"0.0"}}'
Can the dimensions ga:source, ga:medium and ga:keyword be mixed with the above metrics? It seems they can't, since if I omit them the API returns an array of values, one per day within the specified range.
Where can I find more information about this and about which categories are mixable? https://developers.google.com/analytics/devguides/reporting/core/dimsmets just shows all the available metrics but does not explain how they can be combined and which combinations would make valid requests. I'm new to the Analytics API, so any kind of help or guidance would be great.
Thanks a lot
Google Analytics Query Explorer is your friend for playing around with analytics dimensions/metrics/filters ;-)
Try http://ga-dev-tools.appspot.com/explorer/?dimensions=ga:source,ga:medium,ga:keyword,ga:day,ga:month,ga:year&metrics=ga:pageviews,ga:pageviewsPerVisit,ga:visitors,ga:avgTimeOnSite,ga:newVisits,ga:visitBounceRate&filters=ga:medium%253D%253Dorganic&start-date=2013-10-20&end-date=2013-11-20&max-results=100
Some thoughts:
Those dimensions & metrics should work -- maybe there was no organic data recorded during that time range?
Try removing the ga:medium==organic filter and see what your data looks like.
Does the profile you're using (ga:79067749) have any filters on it? If so, maybe try a different profile that has unfiltered data. (Analytics best practices -- make sure you have a profile with no filters applied that captures all data.)
As Mike said, there is no problem with the combination of metrics and dimensions you are using.
If you are entering the URL query directly in the browser, the problem might be a lack of URL encoding in your query string. For example, instead of ga:medium==organic you need ga:medium%253D%253Dorganic (== becomes %253D%253D).
If you build your query in the Google Analytics Query Explorer as Mike suggests, you can grab the direct link to your report by clicking the link symbol in the upper left:
