CosmosDB: DISTINCT for only one Column - azure-cosmosdb

I have the following query:
SELECT DISTINCT c.deviceId, c._ts FROM c
ORDER BY c._ts DESC
I would like to receive only one pair (c.deviceId, c._ts) per deviceId, but because the c._ts value is distinct for all entries, I am getting all the value-pairs for all deviceIds, with other words my whole DB.
I have tried to use Question: Distinct for only one value as a guide, but I see that CosmosDB does not support GROUP BY.
Is there a way to do this in cosmosDB?

Though it's a common requirement i think,i can't implement it on my side as same as you. The distinct keyword can't work on one single column cross whole query result.
Group by feature is currently in active development for a long period,based on the latest comment in this voice,it is coming soon.
If your need is urgent ,as workaround, you could follow this case to use documentdb-lumenize package which supports Aggregate Functions.

Related

Custom Definitions in Bigquery

I'm pretty new to Bigquery/Firebase/GA even SQL. (btw, if you have some good experience or recommendations where I can start learning, that would be great!)
But I have main issue with Bigquery that needs solving right now. I'm kinda trying all sources I can get some info/tips from. I hope this community will be one of them.
So my issue is with Custom Definitions. we have them defined in Google Analytics . We want to divide users with this definition and analyze them separately:
My question is: where/how can I find these custom definitions in bigquery to filter my Data? I have normal fields, like user ID, Timestamps etc. but can't find these custom definitions.
I have been doing some research but still don't have a clear answer, if someone can give me some tips or mby a solution I would be forever in debt ! xD
I got one solution from the other community which looks like this, but I couldn't make it work, my bigquery doesn't recognize customDimensions as it says in the error.
select cd.* from table, unnest(customDimensions) cd
You can create your own custom function, Stored Procedure on Bigquery as per your requirements.
To apply formal Filter over filed like user ID, & Timestamps, you can simply apply standard SQL filter as given below:-
SELECT * FROM DATA WHERE USER_ID = 'User1' OR Timestamps = 'YYY-MM-DDTHH:MM'
Moreover, unnest is used to split data on fields, do you have data which need to be spited ?
I could help you more if you share what are you expecting from your SQL.
Your custom dimensions sit in arrays called customDimensions. These arrays are basically a list of structs - where each struct has 2 fields: key and value. So they basically look like this example: [ {key:1, value:'banana'}, {key:4, value:'yellow'}, {key:8, value:'premium'} ] where key is the index of the custom dimension you've set up in Google Analytics.
There are 3 customDimensions arrays! Two of them are nested within other arrays. If you want to work with those you really need to get proficient in working with arrays. E.g. the function unnest() turns arrays into table format on which you can run SQL.
customDimensions[]
hits[]
customDimensions[]
product[]
customDimensions[]
Example 1 with subquery on session-scoped custom dimension 3:
select
fullvisitorid,
visitStartTime,
(select value from unnest(customDimensions) where key=3) cd3
from
ga_sessions_20210202
Example 2 with a lateral cross join - you're enlargening the table here - not ideal:
select
fullvisitorid,
visitStartTime,
cd.*
from ga_session_20210202 cross join unnest(customDimensions) as cd
All names are case-sensitive - in one of your screenshots you used a wrong name with a "c" being uppercase.
This page can help you up your array game: https://cloud.google.com/bigquery/docs/reference/standard-sql/arrays - just work through all the examples and play around in the query editor

Cosmos DB Querying documents that must have two values for the same field

Hello a litte bit stuck for now.
Let's say I have a simple collection (maybe not so well built...) :
- id
- user_id
- application_id
- version
- [other fields]
I would like to get all the user_id for a specific application_id that must have used two specific versions of the application.
If only request a specific version, this is quite easy
select c.user_id from c where c.application_id='123456' and c.version='1.0'
But if I simply try to add another version (with a AND operator) the result is empty, and that's normal.
I tried with JOIN operator but didn't succeed to make it work.
Let also say that my SQL background is old and poor and I don't speak about the Cosmos DB SQL specificities
But I want to be able to detect user_id that have used two versions of
a specific application
Maybe i got your point.You mean your data like below:
A userid has v1 and v2,B userid has v1. Now you just want to select A because A crosses v1 and v2. I'm afraid this requirement could not be implemented with single query sql. You could refer to below sql:
SELECT count(c.version) as count,c.userid FROM c
where (c.version='v1' or c.version='v2')
group by c.userid
So then you could know which userid has crossed v1 and v2 based on the count =2. Just loop the result by yourself.

How to group by and order by in cosmos db?

As this doc said, "You currently cannot use GROUP BY with an ORDER BY clause but this is planned". But we do need to group by one field and order by another field. Is there a way to do it?
According to my research, i'm afraid that there is no such direct official way to use GROUP BY with ORDER BY since the statement you mentioned in your question:
The GROUP BY clause must be after the SELECT, FROM, and WHERE clause
and before the OFFSET LIMIT clause. You currently cannot use GROUP BY
with an ORDER BY clause but this is planned.
You could submit your feedback to push the progress of this feature.
If your need is urgent,i would suggest you :
a:sort the date after group by.Such as ARRAY.SORT() method in
.net code.
b:Or you could group by the data with this package(which is
built on Stored Procedure in cosmos db) after order by.

BigQuery error: Cannot query the cross product of repeated fields

I am running the following query on Google BigQuery web interface, for data provided by Google Analytics:
SELECT *
FROM [dataset.table]
WHERE
  hits.page.pagePath CONTAINS "my-fun-path"
I would like to save the results into a new table, however I am obtaining the following error message when using Flatten Results = False:
Error: Cannot query the cross product of repeated fields
customDimensions.value and hits.page.pagePath.
This answer implies that this should be possible: Is there a way to select nested records into a table?
Is there a workaround for the issue found?
Depending on what kind of filtering is acceptable to you, you may be able to work around this by switching to OMIT IF from WHERE. It will give different results, but, again, perhaps such different results are acceptable.
The following will remove entire hit record if (some) page inside of it meets criteria. Note two things here:
it uses OMIT hits IF, instead of more commonly used OMIT RECORD IF).
The condition is inverted, because OMIT IF is opposite of WHERE
The query is:
SELECT *
FROM [dataset.table]
OMIT hits IF EVERY(NOT hits.page.pagePath CONTAINS "my-fun-path")
Update: see the related thread, I am afraid this is no longer possible.
It would be possible to use NEST function and grouping by a field, but that's a long shot.
Using flatten call on the query:
SELECT *
FROM flatten([google.com:analytics-bigquery:LondonCycleHelmet.ga_sessions_20130910],customDimensions)
WHERE
  hits.page.pagePath CONTAINS "m"
Thus in the web ui:
setting a destination table
allowing large results
and NO flatten results
does the job correctly and the produced table matches the original schema.
I know - it is old ask.
But now it can be achieved by just using standard SQL dialect instead of Legacy
#standardSQL
SELECT t.*
FROM `dataset.table` t, UNNEST(hits.page) as page
WHERE
  page.pagePath CONTAINS "my-fun-path"

Selecting the most recent date from a table in PeopleSoft using Peoplesoft Query (Max() doesn't work)

I am building a query in people soft using Peoplesoft query manager.
I am trying to pull the most recent date from the date column. I have tried using max() as an expression, however, the query doesn't pull any records.
I have checked with another co-worker and they have never been able to pull records using max().
Is there any other way or workaround to pull the most recent record?
So I figured out why no results were returned when using Max in a subquery. It was more from a lack of understanding PeopleSoft and SQL since I am relatively new to it. When I was setting the date column in the subquery as max for the aggregate to be used as criteria to compare to the date column in the main query I didn't make any criteria in the subquery. This meant that the subquery would go through all dates for all employees except for the employee that I was specifying in a prompt and returning a value that didn't match any of the dates for the employee in the main query and returning no one. This was fixed by setting a criteria in the subquery that the employee ID that had to be searched in the subquery matched the one that was typed into the prompt in the main query
Use effective date for doing such searches while using PSQuery.
Use Effective date in order to get the most recent date, max may not work properly in PeopleSoft. Query should be effective dated
PS Query has built in filters for EFFDT tables. When you add a criteria on the EFFDT field, there are some additional drop down choices on the "condition type" field like 'Eff Date <' and 'Eff Date <=', etc. Usually, when you create a query for an Effective dated table, PS Query will automatically add the subquery based on the 'Eff Date <=' condition type.

Resources