I am using the following query, where the referenced table has 50 million+ records. Would creating a history table help improve CPU performance? Is there any other option apart from partitioning, or is tweaking the query plan the only option?
SELECT MIN(minbkt),
maxbkt,
SUBSTRB(DUMP(MIN(val), 16, 0, 32), 1, 120) minval,
SUBSTRB(DUMP(MAX(val), 16, 0, 32), 1, 120) maxval,
SUM(rep) sumrep,
SUM(repsq) sumrepsq,
MAX(rep) maxrep,
COUNT(*) bktndv,
SUM(CASE
WHEN rep = 1 THEN
1
ELSE
0
END) unqrep
FROM (SELECT val,
MIN(bkt) minbkt,
MAX(bkt) maxbkt,
COUNT(val) rep,
COUNT(val) * COUNT(val) repsq
FROM (SELECT
/*+ no_parallel(t) no_parallel_index(t) dbms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring */
"VERSION_LABEL" val,
NTILE(75) OVER(ORDER BY NLSSORT("VERSION_LABEL", 'NLS_SORT = binary')) bkt
FROM "User"."AUDITTRAIL" t
WHERE "VERSION_LABEL" IS NOT NULL)
GROUP BY val)
GROUP BY maxbkt
ORDER BY maxbkt
It looks like this is a query associated with gathering a histogram on the version_label column of an auditing table.
I would expect that you almost certainly do not need such a histogram, and you can change the statistics gathering to collect only basic statistics on this table, i.e. no histograms. The best way to do that depends on your version and on how statistics gathering is being triggered. If you need help with that, either expand the question to include those details or start a new question.
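Much of this query's cost likely comes from NTILE(75) OVER (ORDER BY ...), which has to sort every non-null VERSION_LABEL value before the two GROUP BY levels summarize each bucket. The sketch below runs a simplified copy of the same query in SQLite through Python's sqlite3 module, only to show what it computes. It is an illustration under assumptions: the table name audit is hypothetical, the Oracle-only NLSSORT and DUMP calls are dropped, 3 buckets are used instead of 75, and SQLite 3.25 or later is needed for NTILE.

```python
import sqlite3

# Simplified SQLite version of the DBMS_STATS histogram query above.
# Assumptions: hypothetical table audit(version_label TEXT); NLSSORT/DUMP are
# Oracle-only, so raw values are compared and returned directly.
BUCKET_SQL = """
SELECT MIN(minbkt),
       maxbkt,
       MIN(val)  AS minval,
       MAX(val)  AS maxval,
       SUM(rep)  AS sumrep,
       SUM(repsq) AS sumrepsq,
       MAX(rep)  AS maxrep,
       COUNT(*)  AS bktndv,
       SUM(CASE WHEN rep = 1 THEN 1 ELSE 0 END) AS unqrep
FROM (SELECT val,
             MIN(bkt) AS minbkt,
             MAX(bkt) AS maxbkt,
             COUNT(val) AS rep,
             COUNT(val) * COUNT(val) AS repsq
      FROM (SELECT version_label AS val,
                   -- every non-null value is sorted and dealt into equal-sized buckets
                   NTILE({buckets}) OVER (ORDER BY version_label) AS bkt
            FROM audit
            WHERE version_label IS NOT NULL)
      GROUP BY val)
GROUP BY maxbkt
ORDER BY maxbkt
"""


def histogram_buckets(values, buckets):
    """Return one summary row per bucket, like the Oracle query does."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE audit (version_label TEXT)")
    con.executemany("INSERT INTO audit VALUES (?)", [(v,) for v in values])
    rows = con.execute(BUCKET_SQL.format(buckets=int(buckets))).fetchall()
    con.close()
    return rows
```

With the made-up values a, a, a, b, b, c, d, d, e (plus one NULL, which is filtered out), bucket 2 covers b and c: two distinct values, one of which occurs only once. Gathering statistics without histograms avoids running this sort-heavy query at all.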
SQLITE3
Task: get a data set that contains the following data (see the comments beside each column):
SELECT DISTINCT DateTime(Rounded, 'unixepoch') AS RoundedDate, -- Rounded DateTime to the floor hour
Count() AS Count, -- Count of items that registered within the above time
CAST (avg(Speed) AS INT) AS AverageSpeed, -- Average table.Speed column data within the defined datetime
Count() AS SpeederCount -- ?? [pseudo-constraint: if Speed > SpeedLimit then +1]
FROM RawSpeedLane AS sl
INNER JOIN
SpeedLaneSearchData AS slsd ON slsd.ParentId = sl.Id
INNER JOIN
Projects AS p ON p.ProjectId = sl.ProjectId
WHERE sl.ProjectId = 72
GROUP BY RoundedDate;
The SQL above currently gives me all the data I need, EXCEPT for the last column.
This last column is supposed to be the count of records that pass specific criteria. The only way I have found to do this successfully is to build a subquery... Cool? Okay, but the problem is that the subquery takes 4 minutes to run because, well... I suck at SQL :P No matter how many different ways I've tried to write it, it still takes forever.
Here is the long, but working version.
SELECT DISTINCT RoundedDate,
Count() AS Count,
CAST (avg(Speed) AS INT) AS AverageSpeed,
(
SELECT count()
FROM RawSpeedLane AS slr
WHERE slr.ProjectId = 72 AND
datetime( ( (strftime('%s', Start) - (strftime('%M', Start) * 60 + strftime('%S', Start) ) ) ), 'unixepoch') = sl.RoundedDate AND
Speed > p.SpeedLimit
)
AS SpeederCount
FROM SpeedLaneReportDataView AS sl
INNER JOIN
Projects AS p ON p.ProjectId = sl.ProjectId
WHERE sl.ProjectId = 72
GROUP BY RoundedDate;
I just tried this for the last column:
(select Count() where sl.Speed > p.SpeedLimit)
but, as expected, I got 1s and 0s. I'm not really sure what to do here. Any hints or help that lead me in the right direction would be very much appreciated.
SQLite versions before 3.32.0 have no IIF, but CASE works in every version.
This is a response to Backs' answer, but I can't comment yet.
SELECT DISTINCT DateTime(Rounded, 'unixepoch') AS RoundedDate, -- Rounded DateTime to the floor hour
Count() AS Count, -- Count of items that registered within the above time
CAST (avg(Speed) AS INT) AS AverageSpeed, -- Average table.Speed column data within the defined datetime
SUM(CASE WHEN Speed > SpeedLimit THEN 1 ELSE 0 END) AS SpeederCount
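FROM RawSpeedLane AS sl
INNER JOIN SpeedLaneSearchData AS slsd ON slsd.ParentId = sl.Id
INNER JOIN Projects AS p ON p.ProjectId = sl.ProjectId
WHERE sl.ProjectId = 72
GROUP BY RoundedDate;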
With SUM and IIF (requires SQLite 3.32.0 or later):
SELECT DISTINCT DateTime(Rounded, 'unixepoch') AS RoundedDate, -- Rounded DateTime to the floor hour
Count() AS Count, -- Count of items that registered within the above time
CAST (avg(Speed) AS INT) AS AverageSpeed, -- Average table.Speed column data within the defined datetime
SUM(IIF(Speed > SpeedLimit, 1, 0)) AS SpeederCount
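FROM RawSpeedLane AS sl
INNER JOIN SpeedLaneSearchData AS slsd ON slsd.ParentId = sl.Id
INNER JOIN Projects AS p ON p.ProjectId = sl.ProjectId
WHERE sl.ProjectId = 72
GROUP BY RoundedDate;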
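The SUM(CASE ...) pattern in these answers is conditional aggregation: inside each hourly group, every row adds 1 when it is speeding and 0 otherwise, so the speeder count comes out of the same pass as Count and AverageSpeed instead of re-scanning RawSpeedLane for every group as the correlated subquery does. The sketch below checks that with Python's sqlite3 module. It assumes a hypothetical, simplified schema: the SpeedLaneSearchData join is left out, Rounded holds the hour-floored Unix timestamp, and the sample rows are made up.

```python
import sqlite3

# Hypothetical simplified schema (the SpeedLaneSearchData join is omitted).
SETUP = """
CREATE TABLE Projects (ProjectId INTEGER PRIMARY KEY, SpeedLimit INTEGER);
CREATE TABLE RawSpeedLane (Id INTEGER PRIMARY KEY, ProjectId INTEGER,
                           Rounded INTEGER, Speed INTEGER);
"""

REPORT = """
SELECT datetime(sl.Rounded, 'unixepoch') AS RoundedDate,
       count(*) AS Count,
       CAST(avg(sl.Speed) AS INT) AS AverageSpeed,
       -- conditional aggregation: a row adds 1 only when it is over the limit
       SUM(CASE WHEN sl.Speed > p.SpeedLimit THEN 1 ELSE 0 END) AS SpeederCount
FROM RawSpeedLane AS sl
INNER JOIN Projects AS p ON p.ProjectId = sl.ProjectId
WHERE sl.ProjectId = ?
GROUP BY RoundedDate
ORDER BY RoundedDate
"""


def speed_report(project_id):
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    con.execute("INSERT INTO Projects VALUES (72, 50)")  # made-up speed limit
    con.executemany(
        "INSERT INTO RawSpeedLane (ProjectId, Rounded, Speed) VALUES (?, ?, ?)",
        # made-up readings: three in hour 0, two in hour 1
        [(72, 0, 40), (72, 0, 55), (72, 0, 60), (72, 3600, 45), (72, 3600, 51)],
    )
    rows = con.execute(REPORT, (project_id,)).fetchall()
    con.close()
    return rows
```

For project 72 this yields one row per hour; in the first hour, two of the three readings exceed the limit of 50.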
Does anybody know how to write a GROUP BY query that groups every n records?
For example, if I have a table with x*n records, I would like to aggregate the first 3, then the next 3, and so on,
where x and n are positive integers :)
Thanks
This does what you want:
SELECT int(((T.Rank - 1) / 3)) AS GroupID, SUM(T.field_to_agregate)
FROM
(
SELECT (SELECT COUNT(*) FROM your_table AS T2 WHERE T1.ID>T2.ID) + 1 AS Rank , ID, field_to_agregate
FROM your_table AS T1
) T
GROUP BY int(((T.Rank - 1) / 3))
However, since you did not post any sample data or table structure (mistake!), I had to assume that your table has a unique ID field; if it does not, you will have to adapt the query. If you don't succeed, add more information about your data and I will adapt my query to match your table structure.
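Here is a runnable sketch of the same ranking-then-integer-division idea in SQLite via Python's sqlite3 module, assuming a hypothetical your_table(ID, field_to_agregate) with unique IDs (gaps are fine). int() is not a function every DBMS provides, so the sketch relies on SQLite's integer division of two integers instead; MySQL, for example, would need DIV or FLOOR because its / returns a decimal. The correlated COUNT(*) that computes each row's rank can get slow on large tables; where window functions exist, ROW_NUMBER() OVER (ORDER BY ID) is a common replacement.

```python
import sqlite3

# Rank each row by counting smaller IDs, then bucket ranks 1..n, n+1..2n, ...
GROUP_QUERY = """
SELECT (T.rn - 1) / :n AS GroupID,          -- integer / integer truncates in SQLite
       SUM(T.field_to_agregate) AS total
FROM (SELECT (SELECT COUNT(*) FROM your_table AS T2 WHERE T1.ID > T2.ID) + 1 AS rn,
             T1.ID,
             T1.field_to_agregate
      FROM your_table AS T1) AS T
GROUP BY (T.rn - 1) / :n
ORDER BY GroupID
"""


def group_sums(rows, n):
    """rows: (ID, value) pairs with unique IDs; n: group size."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE your_table (ID INTEGER, field_to_agregate INTEGER)")
    con.executemany("INSERT INTO your_table VALUES (?, ?)", rows)
    result = con.execute(GROUP_QUERY, {"n": n}).fetchall()
    con.close()
    return result
```

With made-up IDs 10, 20, ..., 70 holding the values 1 to 7 and n = 3, the groups are {1, 2, 3}, {4, 5, 6} and {7}; the last group is simply shorter when the row count is not a multiple of n.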
I'm using BigQuery on exported GA data (see schema here)
Looking at the documentation, I see that when I selected a field that is inside a record it will automatically flatten that record and duplicate the surrounding columns.
So I tried to create a denormalized table that I could query with a more SQL-like mindset:
SELECT
CONCAT( date, " ", if (hits.hour < 10,
CONCAT("0", STRING(hits.hour)),
STRING(hits.hour)), ":", IF(hits.minute < 10, CONCAT("0", STRING(hits.minute)), STRING(hits.minute)) ) AS hits.date__STRING,
CONCAT(fullVisitorId, STRING(visitId)) AS session_id__STRING,
fullVisitorId AS google_identity__STRING,
MAX(IF(hits.customDimensions.index=7, hits.customDimensions.value,NULL)) WITHIN RECORD AS customer_id__LONG,
hits.hitNumber AS hit_number__INT,
hits.type AS hit_type__STRING,
hits.isInteraction AS hit_is_interaction__BOOLEAN,
hits.isEntrance AS hit_is_entrance__BOOLEAN,
hits.isExit AS hit_is_exit__BOOLEAN,
hits.promotion.promoId AS promotion_id__STRING,
hits.promotion.promoName AS promotion_name__STRING,
hits.promotion.promoCreative AS promotion_creative__STRING,
hits.promotion.promoPosition AS promotion_position__STRING,
hits.eventInfo.eventCategory AS event_category__STRING,
hits.eventInfo.eventAction AS event_action__STRING,
hits.eventInfo.eventLabel AS event_label__STRING,
hits.eventInfo.eventValue AS event_value__INT,
device.language AS device_language__STRING,
device.screenResolution AS device_resolution__STRING,
device.deviceCategory AS device_category__STRING,
device.operatingSystem AS device_os__STRING,
geoNetwork.country AS geo_country__STRING,
geoNetwork.region AS geo_region__STRING,
hits.page.searchKeyword AS hit_search_keyword__STRING,
hits.page.searchCategory AS hits_search_category__STRING,
hits.page.pageTitle AS hits_page_title__STRING,
hits.page.pagePath AS page_path__STRING,
hits.page.hostname AS page_hostname__STRING,
hits.eCommerceAction.action_type AS commerce_action_type__INT,
hits.eCommerceAction.step AS commerce_action_step__INT,
hits.eCommerceAction.option AS commerce_action_option__STRING,
hits.product.productSKU AS product_sku__STRING,
hits.product.v2ProductName AS product_name__STRING,
hits.product.productRevenue AS product_revenue__INT,
hits.product.productPrice AS product_price__INT,
hits.product.productQuantity AS product_quantity__INT,
hits.product.productRefundAmount AS hits.product.product_refund_amount__INT,
hits.product.v2ProductCategory AS product_category__STRING,
hits.transaction.transactionId AS transaction_id__STRING,
hits.transaction.transactionCoupon AS transaction_coupon__STRING,
hits.transaction.transactionRevenue AS transaction_revenue__INT,
hits.transaction.transactionTax AS transaction_tax__INT,
hits.transaction.transactionShipping AS transaction_shipping__INT,
hits.transaction.affiliation AS transaction_affiliation__STRING,
hits.appInfo.screenName AS app_current_name__STRING,
hits.appInfo.screenDepth AS app_screen_depth__INT,
hits.appInfo.landingScreenName AS app_landing_screen__STRING,
hits.appInfo.exitScreenName AS app_exit_screen__STRING,
hits.exceptionInfo.description AS exception_description__STRING,
hits.exceptionInfo.isFatal AS exception_is_fatal__BOOLEAN
FROM
[98513938.ga_sessions_20151112]
HAVING
customer_id__LONG IS NOT NULL
AND customer_id__LONG != 'NA'
AND customer_id__LONG != ''
I wrote the result of this query into another table, denorm (with Flatten Results and Allow Large Results enabled).
I get different results when I query denorm with the clause
WHERE session_id_STRING = "100001897901013346771447300813"
versus when I wrap the above query like this (which yields the desired results):
SELECT * FROM (_above query_) as foo where session_id_STRING = "100001897901013346771447300813"
I'm sure this is by design, but could someone explain the difference between these two methods?
I believe you are saying that you did check the box "Flatten Results" when you created the output table? And I assume from your question that session_id_STRING is a repeated field?
If those are correct assumptions, then what you are seeing is exactly the behavior you referenced from the documentation above. You asked BigQuery to "flatten results" so it turned your repeated field into an un-repeated field and duplicated all the fields around it so that you have a flat (i.e., no repeated data) table.
If the desired behavior is the one you see when querying over the subquery, then you should uncheck that box when creating your table.
"Looking at the documentation, I see that when I selected a field that is inside a record it will automatically flatten that record and duplicate the surrounding columns."
This is not correct. By the way, could you please point to that documentation? It needs to be improved.
Selecting a field does not flatten that record. So if you have a table T with a single record {a = 1, b = (2, 2, 3)} and run
SELECT * FROM T WHERE b = 2
You still get a single record {a = 1, b = (2, 2)}. SELECT COUNT(a) from this subquery would return 1.
But once you write results of this query with flatten=on, you get two records: {a = 1, b = 2}, {a = 1, b = 2}. SELECT COUNT(a) from the flattened table would return 2.
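The difference can be modelled in a few lines of plain Python. This is a toy model, not BigQuery's implementation: a record is a dict, a repeated field is a list, a WHERE on a repeated field keeps only the matching elements inside the record, and (an assumption of this model only) a record with no matching element is dropped. The helper names are made up for illustration.

```python
# Toy model of repeated-field semantics; NOT BigQuery's actual implementation.

def filter_within_record(records, field, value):
    """WHERE on a repeated field: keep matching elements, still one row per record."""
    out = []
    for rec in records:
        kept = [x for x in rec[field] if x == value]
        if kept:  # assumption of this model: records with no match are dropped
            out.append({**rec, field: kept})
    return out


def flatten(records, field):
    """Flatten Results: one output row per element, other fields duplicated."""
    return [{**rec, field: x} for rec in records for x in rec[field]]


def count_non_null(rows, field):
    """Rough analogue of COUNT(field) over top-level rows."""
    return sum(1 for r in rows if r.get(field) is not None)


T = [{"a": 1, "b": [2, 2, 3]}]
filtered = filter_within_record(T, "b", 2)  # [{'a': 1, 'b': [2, 2]}]
flat = flatten(filtered, "b")               # [{'a': 1, 'b': 2}, {'a': 1, 'b': 2}]
```

COUNT(a) is 1 over filtered but 2 over flat, which mirrors why querying the flattened denorm table can give different results than querying the nested subquery.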
Using PL/SQL, I need to find the record with the latest INVC_LN_ITEM_STAT_START_DT value within each group of records that share the same SHPMNT_LN_ITEM_KEY and RPT_PER_KEY values.
How else might this be done? Are there analytical functions for this type of query?
SELECT
m1.*
FROM
HD_INVC_LN_ITEM_STAT m1
LEFT OUTER JOIN HD_INVC_LN_ITEM_STAT m2
ON (
m1.SHPMNT_LN_ITEM_KEY = m2.SHPMNT_LN_ITEM_KEY
AND m1.RPT_PER_KEY = m2.RPT_PER_KEY
AND m1.INVC_LN_ITEM_STAT_START_DT < m2.INVC_LN_ITEM_STAT_START_DT)
WHERE
m2.SHPMNT_LN_ITEM_KEY IS NULL
ORDER BY
m1.SHPMNT_LN_ITEM_KEY
,m1.RPT_PER_KEY
,m1.INVC_LN_ITEM_STAT_CD
,m1.INVC_LN_ITEM_STAT_START_DT
How about this?
SELECT
HD_INVC_LN_ITEM_STAT1.*
FROM
HD_INVC_LN_ITEM_STAT HD_INVC_LN_ITEM_STAT1
INNER JOIN
(
SELECT
HD_INVC_LN_ITEM_STAT2.SHPMNT_LN_ITEM_KEY
,HD_INVC_LN_ITEM_STAT2.RPT_PER_KEY
,MAX(HD_INVC_LN_ITEM_STAT2.INVC_LN_ITEM_STAT_START_DT) AS MAX_INVC_LN_ITEM_STAT_START_DT
FROM
HD_INVC_LN_ITEM_STAT HD_INVC_LN_ITEM_STAT2
GROUP BY
HD_INVC_LN_ITEM_STAT2.SHPMNT_LN_ITEM_KEY
,HD_INVC_LN_ITEM_STAT2.RPT_PER_KEY
) HD_INVC_LN_ITEM_STAT2
ON
HD_INVC_LN_ITEM_STAT1.SHPMNT_LN_ITEM_KEY = HD_INVC_LN_ITEM_STAT2.SHPMNT_LN_ITEM_KEY
AND HD_INVC_LN_ITEM_STAT1.RPT_PER_KEY
= HD_INVC_LN_ITEM_STAT2.RPT_PER_KEY
AND HD_INVC_LN_ITEM_STAT1.INVC_LN_ITEM_STAT_START_DT = HD_INVC_LN_ITEM_STAT2.MAX_INVC_LN_ITEM_STAT_START_DT
ORDER BY
HD_INVC_LN_ITEM_STAT1.SHPMNT_LN_ITEM_KEY
,HD_INVC_LN_ITEM_STAT1.RPT_PER_KEY
,HD_INVC_LN_ITEM_STAT1.INVC_LN_ITEM_STAT_CD
,HD_INVC_LN_ITEM_STAT1.INVC_LN_ITEM_STAT_START_DT;
It's longer but arguably more intuitive. I would like other people's opinions on which is more efficient.
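On the analytic-function question: a common pattern ranks the rows inside each (SHPMNT_LN_ITEM_KEY, RPT_PER_KEY) partition by start date descending and keeps rank 1, which reads the table once instead of joining it to itself. The sketch below runs that query in SQLite through Python's sqlite3 module (SQLite 3.25+ for window functions); the same RANK() OVER (PARTITION BY ... ORDER BY ... DESC) syntax is standard and is also supported by Oracle. The sample rows are made up, and dates are ISO-8601 strings so they sort correctly in SQLite; in Oracle the column would be a DATE.

```python
import sqlite3

# RANK() keeps ties (two rows with the same latest date both get rank 1),
# matching the behaviour of both self-join queries above; ROW_NUMBER() would
# instead keep exactly one arbitrary row per group.
LATEST_QUERY = """
SELECT SHPMNT_LN_ITEM_KEY, RPT_PER_KEY, INVC_LN_ITEM_STAT_CD, INVC_LN_ITEM_STAT_START_DT
FROM (SELECT s.*,
             RANK() OVER (PARTITION BY SHPMNT_LN_ITEM_KEY, RPT_PER_KEY
                          ORDER BY INVC_LN_ITEM_STAT_START_DT DESC) AS rnk
      FROM HD_INVC_LN_ITEM_STAT s)
WHERE rnk = 1
ORDER BY SHPMNT_LN_ITEM_KEY, RPT_PER_KEY, INVC_LN_ITEM_STAT_CD
"""


def latest_per_group(rows):
    """rows: (shipment key, period key, status code, start date) tuples."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE HD_INVC_LN_ITEM_STAT (SHPMNT_LN_ITEM_KEY INTEGER, "
        "RPT_PER_KEY INTEGER, INVC_LN_ITEM_STAT_CD TEXT, "
        "INVC_LN_ITEM_STAT_START_DT TEXT)"
    )
    con.executemany("INSERT INTO HD_INVC_LN_ITEM_STAT VALUES (?, ?, ?, ?)", rows)
    result = con.execute(LATEST_QUERY).fetchall()
    con.close()
    return result
```

Which of the three forms is fastest depends on the optimizer, indexes and data distribution, so comparing their execution plans on the real table is the reliable way to decide.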
My English may not be good enough to explain my problem, but I will give it a try.
Let's say we have a table with the columns (photoId, _bandId, desc)
and the following records:
1, 1000, TestDesc1
2, 1000, TestDesc2
3, 1010, TestDesc3
4, 1900, TestDesc4
5, 1000, TestDesc5
I'm trying to write a SELECT statement that gets all the [desc] values for a specific [_bandId],
BUT I want a specific photoId as the FIRST result.
What I use:
SELECT * FROM [myTable]
WHERE _bandId = (SELECT _bandId from [myTable] where photoId=2)
This, of course, gives me as output:
TestDesc1
TestDesc2
TestDesc5
BUT what I really need as output is this:
TestDesc2
TestDesc1
TestDesc5
Kind Regards
Konstantinos
Add an ORDER BY with a CASE expression as the first sort key, producing 0 for the record you want first and 1 for all other records. After that you can add more sort keys for ordering the remaining records, for example on the description:
SELECT
*
FROM
[myTable]
WHERE
_bandId = (SELECT _bandId from [myTable] where photoId=2)
ORDER BY
CASE photoId WHEN 2 THEN 0 ELSE 1 END,
[desc]
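To check the ORDER BY CASE trick, here is a small sketch that loads the sample rows from the question into an in-memory SQLite database via Python's sqlite3 module. The function name is hypothetical; SQLite accepts the [desc] bracket quoting used in the answer, and binding the photo id as a named parameter keeps the WHERE subquery and the CASE in sync.

```python
import sqlite3


def band_descs_with_first(rows, photo_id):
    """Return [desc] values of photo_id's band, with photo_id's own row first."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE myTable (photoId INTEGER, _bandId INTEGER, [desc] TEXT)")
    con.executemany("INSERT INTO myTable VALUES (?, ?, ?)", rows)
    result = con.execute(
        """
        SELECT [desc]
        FROM myTable
        WHERE _bandId = (SELECT _bandId FROM myTable WHERE photoId = :pid)
        ORDER BY CASE photoId WHEN :pid THEN 0 ELSE 1 END,  -- chosen photo sorts first
                 [desc]                                      -- the rest by description
        """,
        {"pid": photo_id},
    ).fetchall()
    con.close()
    return [r[0] for r in result]


SAMPLE = [
    (1, 1000, "TestDesc1"),
    (2, 1000, "TestDesc2"),
    (3, 1010, "TestDesc3"),
    (4, 1900, "TestDesc4"),
    (5, 1000, "TestDesc5"),
]
```

Calling band_descs_with_first(SAMPLE, 2) reproduces the desired output from the question: TestDesc2, then TestDesc1 and TestDesc5.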