BigQuery Query for GA Segment - Sequence - google-analytics

I'm trying to write a BigQuery SQL query that tells me how many users completed the following sequence (identical to a GA segment):
I am not using legacy SQL.
I have tried a few approaches, but to no avail.

Related

Effective way to query BMC Helix Remedy database other than Smart Reporting

Currently we connect to the AR System through the Oracle database for this purpose. Is there an alternative way to access or query the Remedy database effectively? Is there a built-in API we could use that would make this work more efficient?
You could use the REST API, which lets you query the forms directly.
See the documentation:
REST API Doc
It returns a JSON object containing all the data.
To get access to all forms, you need to create a "service" user with a fixed license and permissions on the forms you want to read through the API.
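As a rough illustration of what such a call might look like, here is a minimal Python sketch that only builds the request and does not send it. It assumes the commonly documented `/api/arsys/v1/entry/{form}` endpoint with a `q` qualification parameter and an `AR-JWT` authorization header; the host, form name, qualification and token are placeholders, so check them against the REST API documentation for your version.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_entry_request(base_url, form_name, qualification, token):
    """Build (but do not send) a GET request for the entries of one form."""
    # The qualification goes into the "q" query parameter, percent-encoded.
    query = urlencode({"q": qualification}, quote_via=quote)
    # Form names may contain ':' and spaces, so encode them as a path segment.
    url = f"{base_url}/api/arsys/v1/entry/{quote(form_name, safe='')}?{query}"
    # Assumption: the token was obtained beforehand by logging in as the service user.
    return Request(url, headers={"Authorization": f"AR-JWT {token}"})


# Placeholder values, for illustration only.
req = build_entry_request(
    "https://remedy.example.com:8443",
    "HPD:Help Desk",
    "'Status' = \"Assigned\"",
    "TOKEN",
)
print(req.full_url)
```

Passing the resulting request to `urllib.request.urlopen` would perform the call; the response body is the JSON object mentioned above.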
You can query the Oracle back-end directly, with a few caveats. It should only be for reading data, not writing or modifying data. Otherwise, you could break data integrity as well as bypass workflow that should be fired. Also, this direct access does not enforce any permissions, nor does it translate any of the data. For example, selection fields come back as a number instead of their string value, dates are in epoch format, etc.
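To make the translation caveat concrete, here is a small Python sketch of the post-processing you would have to do yourself on rows read straight from the back-end. The column names and the status mapping are made up for illustration; the real selection values depend on how each form is defined.

```python
from datetime import datetime, timezone

# Hypothetical mapping for one selection field; the real values come from the form definition.
STATUS_LABELS = {0: "New", 1: "Assigned", 2: "In Progress"}


def translate_row(row):
    """Convert a raw back-end row into human-readable values."""
    return {
        "id": row["id"],
        # Selection fields are stored as integers, so map them back to labels.
        "status": STATUS_LABELS.get(row["status"], f"unknown({row['status']})"),
        # Dates are stored as epoch seconds, so convert them to a UTC datetime.
        "submitted": datetime.fromtimestamp(row["submit_date"], tz=timezone.utc),
    }


raw = {"id": "INC0001", "status": 1, "submit_date": 1577836800}
translated = translate_row(raw)
print(translated["status"], translated["submitted"].isoformat())
```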
There is a Remedy ODBC driver, which isn't being updated, nor does it support joins. However, you can open multiple connections with it and join them manually. Plus, it does handle permissions and translations for you.
https://docs.bmc.com/docs/ars1911/odbc-database-access-introduction-896318914.html
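Joining "manually" here means fetching each form's rows over its own connection and combining them in your own code. A minimal Python sketch of that client-side join, using plain dictionaries in place of real ODBC result sets (the field names are invented for the example):

```python
def hash_join(left_rows, right_rows, left_key, right_key):
    """Inner-join two lists of dict rows in memory on the given key fields."""
    # Index the right-hand rows by their join key.
    index = {}
    for right in right_rows:
        index.setdefault(right[right_key], []).append(right)
    # Probe the index once per left-hand row.
    joined = []
    for left in left_rows:
        for right in index.get(left[left_key], []):
            joined.append({**left, **right})
    return joined


# Stand-ins for rows fetched over two separate connections.
incidents = [
    {"incident_id": "INC1", "assignee_id": "U1"},
    {"incident_id": "INC2", "assignee_id": "U9"},
]
people = [{"person_id": "U1", "full_name": "Ann Example"}]

result = hash_join(incidents, people, "assignee_id", "person_id")
print(result)
```

This works for modest result sets; for large forms the join forms described below push the work into the database instead.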
If you know in advance which joins you will need, you should set up join forms within Remedy. That way the joins are done efficiently in the database. Otherwise, you are stuck with either of the above solutions or with one of the APIs, which don't support ad-hoc joins.

BigQuery vs Cloud SQL autoscaling?

I should say up front that I am a beginner with Google Cloud Platform.
I am developing a web application in React using Firebase, so all data is saved in Firestore.
Now I need a relational database, and I am very confused about which is better, Cloud SQL or BigQuery.
My idea was to keep one part of the data in Cloud SQL and the other part in Firestore.
When an event happens, the data from Cloud SQL and Firestore is merged and uploaded to BigQuery for analysis.
Example:
On Firestore I have a product that has an array field where IDs are
stored. These IDs are related to the Database saved on Cloud SQL. When
an order is placed it is added to a collection on Firestore and
appended to the database on BigQuery.
My problem is that, from what I have read, Cloud SQL cannot autoscale, while BigQuery can.
So my question is: can you autoscale Cloud SQL?
If that can't be done, is it correct to use BigQuery exclusively?
Is there another solution on GCP that gives you a relational database with autoscaling?
Edit 1
This is a very simplified model of part of the database on Cloud SQL / BigQuery
I'll use a query with two or three inner joins to get all the values I need.
I don't know how to make it non-relational, and so be able to use Firestore, without a large duplication of data. I am open to any kind of advice.
I'm not sure I understood correctly, but I reckon you would like to get some data from one data source, combine or process it with data from a Firestore collection, and load or stream the result into BigQuery. All of that happens operationally, at run time. The question is about the choice of that data source: either Cloud SQL or BigQuery.
Am I right that, from your point of view, the main drawback of Cloud SQL is its lack of autoscaling, and that you would like to consider BigQuery instead of Cloud SQL because it does autoscale?
It is not clear what rate of requests or queries you expect, or where the data is located (are there any requirements for global access?), so it may be difficult to discuss the situation. Anyway...
As for BigQuery: in my opinion it is a great "database" (the best, from my point of view), but mainly for analytical purposes. Each query has some initial latency (a query job won't complete faster than some threshold) that cannot be significantly reduced, and BigQuery tables have no binary indexes. That means your query will take a few seconds (let's assume 3 or more) every time you run it, unless the result comes from the cache. If the number of requests is significant, it may become expensive both in BigQuery and in the component that processes the task (e.g. a Cloud Function triggered by some event), since the latter has to wait and do nothing while the query runs.
In addition, BigQuery is very good at loading or streaming data in, but not very good at regular updates to data already inside it; there are plenty of limitations. So, depending on your context, it may not be a very good idea to keep operational data in BigQuery.
If I rule out BigQuery, these questions remain:
Can we sacrifice autoscaling and stay with Cloud SQL?
Can we use a Firestore collection instead of Cloud SQL (and sacrifice the relational property)?
Can we use Cloud SQL and manage the amount of data in the tables used for querying, so that there are no delays?
Not sure if I managed to help, but at least I provided some thoughts about the problem.
'Now I need to have a relational database, and I am very confused as to which is the best between Cloud SQL and BigQuery.'
Please be aware that BigQuery cannot be used as a substitute for a relational database. It is geared toward running analytical queries, not simple CRUD operations and queries (as in Cloud SQL). That doesn’t mean BigQuery can’t handle normalized data and joins. It absolutely can. It just performs better on denormalized data because BigQuery is essentially an OLAP engine. So, denormalize whenever possible (please read here).
You can use read replicas to scale Cloud SQL. Read replica instances allow data from the master instance to be replicated to one or more replicas. This setup can provide increased read throughput. Please see this.

Querying the Cosmos DB Change Feed using SQL queries

I need to access Cosmos DB data through a middleware API that gives access to SQL queries but not the change feed (i.e. DocumentClient.CreateDocumentQuery() but not DocumentClient.CreateDocumentChangeFeedQuery()). Is it possible to query the change feed using regular SQL queries?
I was thinking about filtering documents on recent _ts but I am not sure timestamps are guaranteed to be monotonically increasing across entire collections due to potential clock drift across the VMs Cosmos DB runs on.
You cannot query the Change Feed using a SQL query. The Change Feed contains documents that have been inserted / updated, and any filtering needs to be done client-side after receiving such changes.
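If you still want to approximate this through the SQL-only API, the `_ts` idea from the question could be sketched roughly as below. This is a polling workaround and not a change feed: it assumes `_ts` has one-second resolution, re-reads a small overlap window to tolerate late or skewed timestamps, de-duplicates client-side on `id` and `_etag`, and will not see deletes. The function names and the overlap value are illustrative only.

```python
def build_poll_query(last_seen_ts, overlap_seconds=5):
    """Build a parameterized query spec for documents modified since the last poll."""
    # Re-read a short window before the last seen timestamp to tolerate clock skew.
    since = max(0, last_seen_ts - overlap_seconds)
    return {
        "query": "SELECT * FROM c WHERE c._ts >= @since",
        "parameters": [{"name": "@since", "value": since}],
    }


def new_changes(docs, seen):
    """Drop document versions already processed in an earlier poll."""
    fresh = []
    for doc in docs:
        key = (doc["id"], doc["_etag"])  # _etag changes on every update
        if key not in seen:
            seen.add(key)
            fresh.append(doc)
    return fresh


spec = build_poll_query(last_seen_ts=1000)
seen = set()
first = new_changes([{"id": "a", "_etag": "1", "_ts": 998}], seen)
second = new_changes(
    [{"id": "a", "_etag": "1", "_ts": 998}, {"id": "a", "_etag": "2", "_ts": 1001}],
    seen,
)
print(spec["parameters"], len(first), len(second))
```

The query spec would be handed to whatever SQL query call the middleware exposes; the set of seen keys has to be persisted or pruned by the caller.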

Searchable data in CosmosDb with Graph api

My team uses CosmosDb to store data.
For our use case some of this data needs to be searchable.
Currently there are some filters in the Gremlin support that has been implemented in CosmosDb so far, but not enough to suit our needs, which are mainly text search.
We would use this for a fuzzy search for a vertex, say a person, where name, email and company name would all be included in the searched text.
In https://github.com/Azure/azure-documentdb-dotnet/issues/413 there was some talk of string filters, but there have been no updates for a while.
My question is would it be better to use Azure Search for this use case?
We could add a step in the pipeline that would synchronize our data to an Azure Search service upon doing CRUD, but this would mean slower CRUD as well as data duplication, and the consumer of our api would have to use a search endpoint to get an id, and then do an additional lookup afterwards to get any related data.
If you can expose the data you want to make searchable to Azure Search using a "vanilla" (non-Gremlin) SQL API query, consider using Azure Search indexers for Cosmos DB. However, for simple string-matching searches Azure Search may be overkill. Use it if you need more sophisticated searches (natural-language-aware in many languages, custom tokenization, custom scoring, etc.).
If you need a tighter integration between Cosmos DB Graph API and Azure Search, vote for this UserVoice suggestion.

How can I fetch Google Analytics data into Cassandra using NiFi?

I want to fetch my website's Google Analytics data into Cassandra using NiFi. Is there any way to do it?
You should be able to use InvokeHttp to query the Google Analytics data (perhaps with a POST to batchGet), or you could use ExecuteScript and include the Java API driver/JAR in your Module Directory path. Once you get the response(s), there are many processors you can use for any transformation of the data, including ReplaceText for generating a CQL statement to insert the data. Then you can send the statement(s) to PutCassandraQL for ingest into Cassandra.
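To illustrate the transformation step outside NiFi, here is a small Python sketch that turns one report from a batchGet-style response into CQL INSERT statements, which is roughly the work the ReplaceText step would do. The table name is hypothetical, all columns are assumed to be text, and the sample report is hand-written for the example.

```python
def cql_literal(value):
    """Render a value as a CQL text literal, doubling embedded single quotes."""
    return "'" + str(value).replace("'", "''") + "'"


def report_to_cql(report, table):
    """Turn one report into a list of CQL INSERT statements, one per row."""
    header = report["columnHeader"]
    dims = header.get("dimensions", [])
    mets = [m["name"] for m in header["metricHeader"]["metricHeaderEntries"]]
    # Strip the "ga:" prefix so the names are usable as column names.
    columns = [name.split(":", 1)[-1] for name in dims + mets]
    statements = []
    for row in report["data"].get("rows", []):
        # Only the first date range's metric values are used in this sketch.
        values = row.get("dimensions", []) + row["metrics"][0]["values"]
        literals = ", ".join(cql_literal(v) for v in values)
        statements.append(
            f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({literals});"
        )
    return statements


# Hand-written sample in the shape of a single report.
sample_report = {
    "columnHeader": {
        "dimensions": ["ga:date"],
        "metricHeader": {"metricHeaderEntries": [{"name": "ga:sessions", "type": "INTEGER"}]},
    },
    "data": {"rows": [{"dimensions": ["20240101"], "metrics": [{"values": ["42"]}]}]},
}

statements = report_to_cql(sample_report, "analytics.ga_sessions")
print(statements[0])
```

Each resulting statement is the kind of flow file content that could then be sent to PutCassandraQL.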
