How can I fetch Google Analytics data into Cassandra using NiFi?

I want to fetch my website's Google Analytics data into Cassandra using NiFi. Is there any way to do it?

You should be able to use InvokeHTTP to query the Google Analytics data (perhaps with a POST to batchGet), or you could use ExecuteScript and include the Java API driver/JAR in your Module Directory path. Once you get the response(s), there are many processors you can use to transform the data, including ReplaceText for generating a CQL statement to insert the data. Then you can send the statement(s) to PutCassandraQL for ingest into Cassandra.
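For comparison, here is a minimal sketch of the same flow outside NiFi, in Python; the access token, view ID, keyspace, and pageviews table are placeholders of my own, and the CQL insert mirrors the kind of statement ReplaceText would generate for PutCassandraQL:

    import requests
    from cassandra.cluster import Cluster

    ACCESS_TOKEN = "ya29.placeholder-oauth-token"  # placeholder: obtained via your own OAuth flow
    VIEW_ID = "123456789"                          # placeholder GA view ID

    # Query the Reporting API v4 batchGet endpoint, as the answer suggests for InvokeHTTP.
    report = requests.post(
        "https://analyticsreporting.googleapis.com/v4/reports:batchGet",
        headers={"Authorization": "Bearer " + ACCESS_TOKEN},
        json={"reportRequests": [{
            "viewId": VIEW_ID,
            "dateRanges": [{"startDate": "7daysAgo", "endDate": "today"}],
            "dimensions": [{"name": "ga:pagePath"}],
            "metrics": [{"expression": "ga:pageviews"}]}]},
    ).json()

    # Insert each row into Cassandra, the equivalent of ReplaceText + PutCassandraQL.
    session = Cluster(["127.0.0.1"]).connect("analytics")  # placeholder keyspace
    for row in report["reports"][0]["data"].get("rows", []):
        session.execute(
            "INSERT INTO pageviews (page_path, views) VALUES (%s, %s)",
            (row["dimensions"][0], int(row["metrics"][0]["values"][0])))

Inside NiFi, InvokeHTTP would return the same JSON response and ReplaceText would turn each record into the equivalent INSERT statement for PutCassandraQL.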

Related

Reference external data source from AI/Kusto query?

tl;dr: I want to reference an external data source from a Kusto query in Application Insights.
My application is writing logs to Application Insights, and we're querying it using Kusto in the Azure portal. To give an example of what I'm trying to do:
We're currently looking at these logs to find an action that is triggered when a visitor views a blog post on our site. This works well at the per-blog-post level, but now we want to group this data by the category these blog posts are in, or by the tags they have, and that's not information I have within the logs.
The information we log contains unique details about each blog post (unique URL, our internal ID, etc.) that I could use to look up the categories and tags in another data source (e.g. our SQL DB where this relation is stored), but I have no idea if or how this is possible. So that's the question: is this possible? Can I query a SQL DB, or get data in JSON via a URL, or something similar?
Alternative solutions would be to move the reporting elsewhere (e.g. PowerBI) and just use AI as a data source, or to actually log all the category/tag info, but I really don't want to go down that route.
Kusto supports accessing external data (blobs, Azure SQL, Cosmos DB); however,
Application Insights / Azure Monitor and other multi-tenant services block this functionality due to security and resource-governance concerns.
You could try setting up your own Azure Data Explorer (Kusto) cluster, where this functionality is available, and then access your Application Insights data using a cross-cluster query (a sketch follows the links below), or by exporting the data from Application Insights and hooking up Event Grid ingestion into your Kusto cluster.
Relevant links:
Kusto supporting external data:
https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/schema-entities/externaltables
Querying data inside Application Insights:
https://learn.microsoft.com/en-us/azure/data-explorer/query-monitor-data
Continuous export data from Application Insights:
https://learn.microsoft.com/en-us/azure/azure-monitor/app/export-telemetry
Data ingestion into Kusto from EventGrid:
https://learn.microsoft.com/en-us/azure/data-explorer/ingest-data-event-grid
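If you do set up your own Azure Data Explorer cluster, a cross-cluster query against Application Insights can be issued from Python with the azure-kusto-data client; everything below is a placeholder of mine (cluster URI, database, subscription, resource group, app name, and a hypothetical BlogPostCategories table assumed to be ingested into your own cluster), and the exact proxy URL format is the one described in the "Querying data inside Application Insights" link above:

    from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

    # Connect to your own ADX (Kusto) cluster, where cross-cluster access is allowed.
    kcsb = KustoConnectionStringBuilder.with_az_cli_authentication(
        "https://mycluster.westeurope.kusto.windows.net")  # placeholder cluster URI
    client = KustoClient(kcsb)

    # Cross-cluster query: read Application Insights requests through the ADX proxy and
    # join them to a (hypothetical) BlogPostCategories table in your own cluster.
    query = """
    cluster('https://ade.applicationinsights.io/subscriptions/<sub-id>/resourcegroups/<rg>/providers/microsoft.insights/components/<app-name>')
        .database('<app-name>').requests
    | join kind=inner (BlogPostCategories) on $left.url == $right.url
    | summarize views = count() by category
    """
    response = client.execute("MyDatabase", query)  # placeholder database name
    for row in response.primary_results[0]:
        print(row["category"], row["views"])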

Upload offline data in Google Analytics

I am looking for a way to quickly upload offline data into Google Analytics. This is possible using Data Import, which is a feature provided by Google Analytics itself, but doing this on a daily basis is a hectic task. Is there any other functionality I can use to automatically upload data on a daily basis and view the report?
You can automate data imports by using the Management API. Data Import is documented here.
To follow the examples, you first need to install the Google API client for the programming language of your choice. Then you create the custom data source (same as for the manual upload) and send data to it via the uploadData method. Run this on a schedule (e.g. via cron) and the task stops being hectic.
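As a hedged example, with the Google API client for Python the daily job can be a short script like the one below; the service-account file, account, property, and data-source IDs, and the CSV file name are my own placeholders, not values from the question:

    from google.oauth2 import service_account
    from googleapiclient.discovery import build
    from googleapiclient.http import MediaFileUpload

    # Service account with edit rights on the GA property (file name is a placeholder).
    creds = service_account.Credentials.from_service_account_file(
        "service-account.json",
        scopes=["https://www.googleapis.com/auth/analytics.edit"])
    analytics = build("analytics", "v3", credentials=creds)

    # Upload today's offline data file to the custom data source (Data Import).
    media = MediaFileUpload("offline_data.csv", mimetype="application/octet-stream")
    analytics.management().uploads().uploadData(
        accountId="12345678",            # placeholder account ID
        webPropertyId="UA-12345678-1",   # placeholder property ID
        customDataSourceId="abc123XYZ",  # placeholder custom data source ID
        media_body=media).execute()

Run that script from cron (or any scheduler) and the daily upload happens automatically.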

How to copy Google Analytics data into SQL Server tables

I just started working on Google Analytics and I'm pretty new to this. I have now been granted access to the GA account of my organization's marketing website for several European countries (single login).
My requirement is to copy different European countries' GA data into a single table structure in SQL Server. I'm wondering if any of you have done this before? Any suggestions are highly appreciated.
As already written earlier, there are several ways of doing this. I prefer to integrate Google Analytics and SQL Server with no coding, using the Skyvia tool: Google Analytics and SQL Server Integration. It allows me to create a copy of Google Analytics report data in SQL Server and keep it up to date with little to no configuration effort. I don't even need to prepare the schema; Skyvia can automatically create a table for the report data. You can load 10,000 records per month for free, which is enough for me.
There are a number of ways of doing this. Google Analytics does have the ability to export data as CSV, but it's going to be hard to match up the data properly.
If you are up for a bit of programming, start with the Google Analytics API; it will allow you to extract data from Google Analytics and insert it wherever you like. You can use any programming language that is capable of performing an HTTP POST and HTTP GET. However, I recommend looking into one of Google's client libraries (a sketch follows below).
If you have the ability to use SSIS, you can use Targit Google Analytics SSIS. It's a custom connection manager and data reader for extracting data from Google Analytics, and it is free to use. Note: full disclosure, I am the lead developer on that project.
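If you go the API route, a rough Python sketch of pulling one report per country and landing it in a single SQL Server table might look like this; the service-account file, per-country view IDs, connection string, and dbo.GaSessions table are all placeholders I am assuming, not an existing schema:

    from google.oauth2 import service_account
    from googleapiclient.discovery import build
    import pyodbc

    # Placeholder credentials: one GA view per country, one shared SQL Server table.
    creds = service_account.Credentials.from_service_account_file(
        "service-account.json",
        scopes=["https://www.googleapis.com/auth/analytics.readonly"])
    reporting = build("analyticsreporting", "v4", credentials=creds)

    VIEW_IDS = {"DE": "111111", "FR": "222222"}   # placeholder per-country view IDs
    conn = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};"
                          "SERVER=myserver;DATABASE=Marketing;Trusted_Connection=yes;")
    cursor = conn.cursor()

    for country, view_id in VIEW_IDS.items():
        report = reporting.reports().batchGet(body={"reportRequests": [{
            "viewId": view_id,
            "dateRanges": [{"startDate": "yesterday", "endDate": "yesterday"}],
            "dimensions": [{"name": "ga:date"}],
            "metrics": [{"expression": "ga:sessions"}]}]}).execute()
        # Flatten every row into the shared table, tagged with its country code.
        for row in report["reports"][0]["data"].get("rows", []):
            cursor.execute(
                "INSERT INTO dbo.GaSessions (Country, GaDate, Sessions) VALUES (?, ?, ?)",
                country, row["dimensions"][0], int(row["metrics"][0]["values"][0]))

    conn.commit()

Scheduling something like this daily gives you the single combined table across countries that the question asks for.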

BigRquery - RUN_QUERY_JOB

I've installed "bigrquery" like this:
devtools::install_github("hadley/bigrquery")
library(bigrquery)
And I get this error when trying to extract data:
Error: Access Denied: Job triple-xxx-xxx:job_zu6P-qSxxx7DBVICij6_QyDv0: RUN_QUERY_JOB
I've looked here and on the web, and everyone says that you just need two things to extract data from Google BigQuery:
1. Have a project for it (BigQuery enabled).
2. Set up a billing address for BigQuery.
I've done that, but I still have the problem.
IMPORTANT:
For other packages that interact with Google products (e.g. RGA for Google Analytics), you need to create a client ID (OAuth). Do I need to do this with "bigrquery"?
Can someone update the method to get the data?
PS: I can get the data in the browser (with the web interface provided by Google), but not in R from "bigrquery". I'm using the version hosted on CRAN.
PS2: I don't want the authentication to be stored in the cache; is there a way to make "bigrquery" ask for authentication every time it tries to connect to BigQuery?
I found this issue in this post, but the solution is out of date:
Google App Engine authorization for Google BigQuery
This error means that the user that was running the query was not authorized to run jobs in the project (triple-xxx-xxx). You'd need to add the user that is running the query to the project via the developers console (https://console.developers.google.com/project).
To answer some of your other questions:
You don't need to create a client ID to use BigQuery.
I'm not sure if there is a way to force bigrquery to re-authorize every time. That said, looking at the source code (https://github.com/hadley/bigrquery/blob/master/R/auth.r), you may be able to call set_access_cred with NULL to clear the cached authentication.

Does BigQuery encrypt data on disk?

I have tried finding the answer to this but have only found anecdotal references. Does Google encrypt the data that is stored on BigQuery? If so, what encryption mechanisms are used?
All new data added to BigQuery tables is encrypted using strong, industry-standard encryption methods. All old data will be encrypted over time, though currently there is no specific timeline. If you'd like more detail on security across the Google Cloud Platform, you might want to check out this blog post:
http://googlecloudplatform.blogspot.com/2014/08/googles-cloud-is-secure-but-you-dont.html
BigQuery is part of Google's Cloud Platform offering. To use BigQuery, you first need to load data into it.
You can load data in two ways (both are sketched after the links below). Load jobs support two data sources:
Objects in Google Cloud Storage
Data sent with the job or streaming insert
Data sourced from Google Cloud Storage is always encrypted, as per the link below. Streaming data into BigQuery probably depends on how you stream it (I haven't found any definitive information on this point).
Loading data into BigQuery
Description of Compute Engine Disks
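For reference, a minimal sketch of those two load paths with the google-cloud-bigquery Python client; the project, bucket, dataset, and table names are placeholders I am assuming:

    from google.cloud import bigquery

    client = bigquery.Client(project="my-project")   # placeholder project

    # 1. Load job from an object in Google Cloud Storage.
    load_job = client.load_table_from_uri(
        "gs://my-bucket/events.csv",                  # placeholder GCS object
        "my-project.my_dataset.events",
        job_config=bigquery.LoadJobConfig(
            source_format=bigquery.SourceFormat.CSV,
            autodetect=True))
    load_job.result()   # wait for the load job to finish

    # 2. Streaming insert: data sent with the request.
    errors = client.insert_rows_json(
        "my-project.my_dataset.events",
        [{"event_id": "123", "event_type": "pageview"}])
    if errors:
        print("Streaming insert errors:", errors)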
Yes, data stored in BigQuery is encrypted.
Google Cloud Platform encrypts data stored in BigQuery, without any action required from the user, using one or more encryption mechanisms. Data stored in Google Cloud Platform is encrypted at the storage level using either AES256 or AES128.
There are more details in this whitepaper:
https://cloud.google.com/security/encryption-at-rest/
