There are three tables mentioned below. I ultimately want to bring a field from Table3 into Table1, but the only way to join those two tables is via a common field present in Table2.
Table 1: Application Insights, 30 days of data (~4,000,000 rows)
Table 2: Kusto-based table (1,080,153 rows)
Table 3: Kusto-based table (38,815,878 rows)
I was not able to join the tables directly, so I applied various filter conditions and distinct operators, split the month of data into 4 weeks, and then used union to combine all 3 tables into a resultant table.
However, I am now unable to perform any operations on the resultant table (even | count fails).
I get the following error:
Query execution has exceeded the allowed limits (80DA0003):
Any help in handling such cases would be appreciated.
Please check this article on how to control the query limits:
https://learn.microsoft.com/en-us/azure/data-explorer/kusto/concepts/querylimits#limit-on-result-set-size-result-truncation
When using the join operator, make sure that the table with fewer rows is the first one (left-most in query).
See the Kusto query best practices documentation for more.
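A minimal sketch of how the joins could be restructured along those lines, assuming hypothetical column names (CommonKey shared by Table1 and Table2, Table3Key shared by Table2 and Table3, WantedField in Table3):

```kusto
// Smallest table (Table2, ~1M rows) goes left-most, and unneeded columns
// are projected away before each join to keep intermediate results small.
Table2
| project CommonKey, Table3Key
| join kind=inner (
    Table3
    | project Table3Key, WantedField   // only the field we actually need
) on Table3Key
| project CommonKey, WantedField
| join kind=inner (Table1) on CommonKey
```

Reducing each side to just the join key and the wanted field before joining is usually what keeps such queries under the result-set and memory limits.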
I’ve been struggling with this one for a while and would appreciate some advice. A bit of context about the source and destination tables involved in the update/merge query I'm using:
SRC Table
Format_Code ACR
----------------------------
BAD 5
SAD 7
MAD 2
SRC is created via a select statement joining on two tables; select statement looks something like this:
Select distinct a.Format_Code, b.ACR from Formats a
Inner join Codes b on lower(a.Format_Name) = lower(b.Format_Name)
I’m attempting to use SRC to update a destination table (DES), joining/matching on Format_Code as follows:
Merge Into Inventory DES
Using
(
Select distinct a.Format_Code, b.ACR from Formats a
Inner join Codes b on lower(a.Format_Name) = lower(b.Format_Name)
) SRC
On DES.Format_Code = SRC.Format_Code
When Matched Then Update set DES.ACR = SRC.ACR
I get the following error (because of the duplicates in the destination table, I think), but I'm not sure how to ignore/bypass them. SRC contains no duplicates, but DES does have duplicate Format_Code values. During the update I’d like to either update just one instance of the duplicate rows or ignore the duplicates entirely (there's a small number of duplicates, so I can update them manually if necessary).
SQL Error: ORA-30926: unable to get a stable set of rows in the source tables
30926. 00000 - "unable to get a stable set of rows in the source tables"
*Cause: A stable set of rows could not be got because of large dml
activity or a non-deterministic where clause.
*Action: Remove any non-deterministic where clauses and reissue the dml.
It’s my first time posting, so apologies if I’ve made any rookie mistakes
... because of large dml activity ...
Try to COMMIT, and then run your MERGE. Any improvement?
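If COMMIT alone doesn't help: ORA-30926 is raised when the USING clause returns more than one source row per target row. DISTINCT on the pair (Format_Code, ACR) still allows two rows with the same Format_Code but different ACR values. A sketch that forces exactly one row per code, assuming MAX is an acceptable tie-breaker:

```sql
MERGE INTO Inventory DES
USING (
    -- GROUP BY guarantees exactly one ACR per Format_Code;
    -- MAX() is an arbitrary but deterministic tie-breaker.
    SELECT a.Format_Code, MAX(b.ACR) AS ACR
    FROM Formats a
    INNER JOIN Codes b ON LOWER(a.Format_Name) = LOWER(b.Format_Name)
    GROUP BY a.Format_Code
) SRC
ON (DES.Format_Code = SRC.Format_Code)
WHEN MATCHED THEN UPDATE SET DES.ACR = SRC.ACR
```

Note that duplicates in DES are fine for MERGE (each duplicate target row simply gets the same update); it's only multiple SRC rows per Format_Code that trigger the error.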
I am running the following query on Google BigQuery web interface, for data provided by Google Analytics:
SELECT *
FROM [dataset.table]
WHERE
hits.page.pagePath CONTAINS "my-fun-path"
I would like to save the results into a new table, however I am obtaining the following error message when using Flatten Results = False:
Error: Cannot query the cross product of repeated fields
customDimensions.value and hits.page.pagePath.
This answer implies that this should be possible: Is there a way to select nested records into a table?
Is there a workaround for the issue found?
Depending on what kind of filtering is acceptable to you, you may be able to work around this by switching from WHERE to OMIT IF. It will give different results, but perhaps such different results are acceptable.
The following will remove an entire hit record if some page inside it meets the criteria. Note two things here:
it uses OMIT hits IF, instead of the more commonly used OMIT RECORD IF
the condition is inverted, because OMIT IF is the opposite of WHERE
The query is:
SELECT *
FROM [dataset.table]
OMIT hits IF EVERY(NOT hits.page.pagePath CONTAINS "my-fun-path")
Update: per the related thread, I am afraid this is no longer possible.
It would be possible to use NEST function and grouping by a field, but that's a long shot.
Using a FLATTEN call in the query:
SELECT *
FROM flatten([google.com:analytics-bigquery:LondonCycleHelmet.ga_sessions_20130910],customDimensions)
WHERE
hits.page.pagePath CONTAINS "m"
Thus in the web ui:
setting a destination table
allowing large results
and NO flatten results
does the job correctly and the produced table matches the original schema.
I know this is an old question, but it can now be solved by using the standard SQL dialect instead of legacy SQL:
#standardSQL
SELECT t.*
FROM `dataset.table` t, UNNEST(hits) AS hit
WHERE
  hit.page.pagePath LIKE '%my-fun-path%'
I have a Google Analytics premium account set up to monitor the user activity of a website and mobile application.
Raw data from GA is being stored in BigQuery tables.
However, I noticed that the statistics I see in a GA report are quite different from the statistics I see when querying the BigQuery tables.
I understand that GA reports show aggregated, and possibly sampled, data, while the raw data in the BigQuery tables is session/hit-level data.
But I am still not sure I understand why the statistics could be so different.
Would really appreciate it if someone clarified this for me.
Thanks in advance.
UPDATE 1:
I exported the raw data from BigQuery into my Hadoop cluster. The data is stored in a Hive table; I flattened all the nested and repeated fields before exporting.
Here is the hive query that I ran on the raw data in the Hive table:
SELECT
date as VisitDate,
count(distinct fullvisitorid) as CountVisitors,
SUM(totals_visits) as SumVisits,
SUM(totals_pageviews) AS PVs
FROM
bigquerydata
WHERE
fullvisitorid IS NOT NULL
GROUP BY
date
ORDER BY
VisitDate DESC
A) Taking February 9th as the VisitDate, I get the following results from this query:
i) CountVisitors= 1,074,323
ii) SumVisits= 48,990,198
iii) PVs= 1,122,841,424
Vs
B) Taking the same VisitDate and obtaining the same statistics from the GA report:
i) Users count = 1,549,757
ii) Number of pageviews = 11,604,449 (Huge difference when compared to A(iii))
In the Hive query above, am I using any wrong fields or processing the fields the wrong way? I'm just trying to figure out why I have this difference in numbers.
UPDATE 2 (following Felipe Hoffa's suggestion):
This is how I am flattening the tables in my Python code before exporting the result to GCS and then to Hadoop cluster:
queryString = 'SELECT * FROM flatten(flatten(flatten(flatten(flatten(flatten([' + TABLE_NAME + '],hits),hits.product),hits.promotion),hits.customVariables), hits.customDimensions), hits.customMetrics)'
I understand what you are saying about flattening causing repeated pageviews, with each repetition getting into the final, wrong sum.
I tried the same query (from Update 1) on the BigQuery table instead of my Hive table, and the numbers matched those on the Google Analytics dashboard.
However, assuming that the Hive table is all I have, and it has those repeated fields due to flattening: is there still any way I can fix my Hive query to match the stats from the Google Analytics dashboard?
Logically speaking, if the repeated fields came from flattening, can't I reverse that in my Hive table? If you think I can, do you have any suggestion as to how I should proceed?
Thank you so much in advance!
Can you run the same query in BigQuery, instead of on the data exported to Hive?
My guess: "The data is stored in a hive table. I flattened all the nested and repeated fields before exporting." When flattening - are you repeating pageviews several times, with each repetition getting into the final wrong addition?
Note how data can get duplicated when flattening rows:
SELECT col, x FROM (
SELECT "wrong" col, SUM(totals.pageviews) x
FROM (FLATTEN ([google.com:analytics-bigquery:LondonCycleHelmet.ga_sessions_20130910], hits))
), (
SELECT "correct" col, SUM(totals.pageviews) x
FROM [google.com:analytics-bigquery:LondonCycleHelmet.ga_sessions_20130910]
)
col x
wrong 2262
correct 249
Update given "update 2" to the question:
Since BigQuery is working correctly, and this is a Hive problem, you should add that tag to get relevant answers.
Nevertheless, this is how I would correctly de-duplicate previously duplicated rows with BigQuery:
SELECT SUM(pv)
FROM (
SELECT visitId, MAX(totals.pageviews) pv
FROM (FLATTEN ([google.com:analytics-bigquery:LondonCycleHelmet.ga_sessions_20130910], hits))
GROUP EACH BY 1
)
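On the Hive side, a sketch of the same de-duplication idea, assuming the flattened export kept a visitid column (so that date + fullvisitorid + visitid identifies a session, and totals_* is identical across a session's duplicated rows):

```sql
-- Collapse the flattening duplicates back to one row per session,
-- then aggregate per day as before.
SELECT
  VisitDate,
  COUNT(DISTINCT fullvisitorid) AS CountVisitors,
  SUM(visits)    AS SumVisits,
  SUM(pageviews) AS PVs
FROM (
  SELECT
    `date` AS VisitDate,
    fullvisitorid,
    visitid,
    MAX(totals_visits)    AS visits,     -- identical across duplicates
    MAX(totals_pageviews) AS pageviews
  FROM bigquerydata
  WHERE fullvisitorid IS NOT NULL
  GROUP BY `date`, fullvisitorid, visitid
) per_session
GROUP BY VisitDate
ORDER BY VisitDate DESC;
```

If visitid was not exported, there is no reliable way to undo the flattening, since nothing else distinguishes a genuine second session from a flattening duplicate.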
I have Oracle 11g.
I have an employee table and an employee time-data table.
The time table has id, employee_no, employee_in_time, ...
In one day I get multiple records for multiple employee ids, e.g. 1, 100, 17-JUN-14 04.57.19 PM, ... and 2, 100, 17-JUN-14 05.57 PM, ... etc.
How do I get the most recently recorded row for each employee_no?
I have already tried a join of both tables to get the employee name and employee_in_time.
Please save my day.
It would really help to know the table structures and relevant columns in each. In addition it helps to know sample data and expected results and what you've tried to date.
Based on the statement "recently get recorded with using employee_no," I take it you want the most recent employee_in_time for each employee.
SELECT MAX(employee_in_time), employee_no, TRUNC(employee_in_time) AS CalDay
FROM employee_time
GROUP BY employee_no, TRUNC(employee_in_time)
This would return the most recent time entry for each employee per day; if you need other data from the employee table, a join should suffice. But without knowing your expected results, I'm not sure exactly what you're after.
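A sketch of that join, using an analytic function to keep only the latest punch-in per employee (table and column names are assumed from the question; the employee name column is hypothetical):

```sql
-- Latest employee_in_time per employee_no, joined back to the
-- employee table for the name.
SELECT e.employee_name, t.employee_no, t.employee_in_time
FROM employee e
JOIN (
  SELECT employee_no, employee_in_time,
         ROW_NUMBER() OVER (PARTITION BY employee_no
                            ORDER BY employee_in_time DESC) AS rn
  FROM employee_time
) t ON t.employee_no = e.employee_no
WHERE t.rn = 1;
```

Drop the PARTITION BY/WHERE filter down to per-day granularity (partition by employee_no and TRUNC(employee_in_time)) if you want the latest entry per employee per day instead.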
I need to manipulate some data in SQLite; it should be simple, but figuring out how to do exactly this has frustrated me.
It's just a join. One table, "routes", has a "stop_id" column. Another table, "stops", also has a "stop_id" column, and wherever they match I need to add all the additional columns from "stops" ("stop_name", "stop_lat", "stop_lon", and "master_station") to the "routes" table. "stop_id" is the primary key in the stops table. I need to materialize the join rather than keep the tables relational, because afterwards I will be changing the rows by hand with new information. I am using Firefox SQLite Manager, if that matters.
A join can be done with JOIN:
SELECT * FROM routes JOIN stops USING (stop_id)
However, the result of a join cannot be changed directly; the UPDATE statement works only on actual tables.
To change values that come from the routes or stops tables, you have to update those tables by using their respective primary keys to look up the records.
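Since the goal is a table that can then be edited by hand, one option is to freeze the join into a new, ordinary table with CREATE TABLE ... AS SELECT. A small self-contained sketch using Python's sqlite3 (the schemas and sample rows are invented for illustration; only the table/column names come from the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE routes (route_id INTEGER, stop_id INTEGER);
CREATE TABLE stops (stop_id INTEGER PRIMARY KEY, stop_name TEXT,
                    stop_lat REAL, stop_lon REAL, master_station TEXT);
INSERT INTO routes VALUES (1, 10), (2, 10), (1, 20);
INSERT INTO stops VALUES (10, 'Main St', 40.0, -75.0, 'A'),
                         (20, 'Elm St', 41.0, -76.0, 'B');
""")

# Freeze the join into a real table; unlike a SELECT result or a view,
# this table can be UPDATEd row by row afterwards.
conn.execute("""
CREATE TABLE routes_full AS
SELECT r.*, s.stop_name, s.stop_lat, s.stop_lon, s.master_station
FROM routes r JOIN stops s USING (stop_id)
""")

# Hand-editing now works on the materialized table.
conn.execute("UPDATE routes_full SET stop_name = 'Main Street' WHERE stop_id = 10")

rows = conn.execute(
    "SELECT route_id, stop_id, stop_name FROM routes_full "
    "ORDER BY route_id, stop_id"
).fetchall()
print(rows)
```

The same CREATE TABLE ... AS SELECT statement can be run directly in Firefox SQLite Manager's Execute SQL tab; the resulting table matches the routes columns plus the four stops columns.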