Unnecessary IO and Product Join Indicators - teradata

What are the Unnecessary IO and Product Join Indicators in Teradata? How are these metrics determined for a query?

The Product Join Indicator (PJI) is the ratio of CPU seconds to IO for a query. Generally, when the PJI is greater than 3 the query should be reviewed; when it is greater than 6 you may find the query is performing an unnecessary product join. To calculate the PJI using DBQL metrics: (AMPCPUTime * 1000) / TotalIOCount
The Unnecessary IO Indicator (UII) is the ratio of IO to CPU seconds. If the UII is greater than 3, the query should be reviewed to eliminate full-table scans and possibly redistribution steps. The UII is a reasonable indicator for identifying queries that may benefit from additional statistics or indexing improvements. To calculate the UII using DBQL metrics: TotalIOCount / (AMPCPUTime * 1000)
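For reference, here is a minimal sketch of how both indicators could be computed from DBQL, assuming access to DBC.DBQLogTbl with its standard AMPCPUTime and TotalIOCount columns (the date filter and the NULLIF guards against division by zero are illustrative):
-- PJI and UII per query logged today (sketch)
SELECT QueryID,
       UserName,
       AMPCPUTime,
       TotalIOCount,
       (AMPCPUTime * 1000) / NULLIF(TotalIOCount, 0)  AS PJI,
       TotalIOCount / NULLIF(AMPCPUTime * 1000, 0)    AS UII
FROM DBC.DBQLogTbl
WHERE CAST(StartTime AS DATE) = CURRENT_DATE
ORDER BY PJI DESC;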

Related

SQLite optimization: simultaneous search for lower and upper bounds

For a long-running algorithm based on SQLite3, I have a simple but huge table defined like this:
CREATE TABLE T(ID INTEGER PRIMARY KEY, V INTEGER);
The inner loop of the algorithm will need to find, given some integer N, the biggest ID that is less than or equal to N, the value V associated with it, as well as the smallest ID that is strictly bigger than N.
The following pair of queries does work:
SELECT ID, V FROM T WHERE ID <= ? ORDER BY ID DESC LIMIT 1;
SELECT ID FROM T WHERE ID > ? LIMIT 1;
But I feel that it should be possible to merge those two queries into a single one. When SQLite has consulted the primary key index to find the largest ID that is at most N (first query), the next entry in the B-tree index is already the answer to the second query.
To give an order of magnitude, the table T has more than one billion rows, and the inner queries will need to be executed more than 100 billion times, so each microsecond counts. Of course I will use a fast SSD on a server with plenty of RAM. PostgreSQL could also be an option if it is quicker for this usage without taking more disk space.
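For reference, the two statements can at least be merged syntactically into one compound query; this is only a sketch, and it still performs two separate index seeks, so it is not necessarily faster (V is not needed by the second branch, so NULL is returned there):
-- Largest ID <= N with its V, plus smallest ID > N, in one statement (sketch)
SELECT * FROM (SELECT ID, V FROM T WHERE ID <= ?1 ORDER BY ID DESC LIMIT 1)
UNION ALL
SELECT * FROM (SELECT ID, NULL AS V FROM T WHERE ID > ?1 ORDER BY ID ASC LIMIT 1);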
This is an answer to my own question. While I haven't yet found a better SQL query than the pair posted in the question, I made some preliminary speed measurements that shifted the perspective slightly. Here are my observations:
There is a big difference in search and insert performance depending on whether the ID values arrive in sequential or random order. In my application, IDs will be mostly sequential, but with plenty of exceptions.
Executing the pair of SQL queries back to back takes less time than the sum of each query run separately. This is most visible with random order. It means that when the second query runs, the B-tree pages needed to reach the next ID are already in cache memory, so walking the index is faster the second time.
The search and insertion times per query increase with the number of rows. In sequential order the difference is small, but in random order the increase is substantial. Searching a B-tree is inherently O(log N), and in addition the OS cache becomes less effective as the file size grows.
Here are my measurements on a fast server with SSD:
        | Insertion (µs) | Search (µs)
# rows  | sequ.  random  | sequ.  random
10^5    | 0.60   0.9     | 1.1    1.3
10^6    | 0.64   3.1     | 1.2    2.5
10^7    | 0.66   4.3     | 1.2    3.0
10^8    | 0.70   5.6     | 1.3    4.2
10^9    | 0.73           | 1.3    4.6
My conclusion is that SQLite's internal logic doesn't seem to be the bottleneck for the planned algorithm; for huge tables the bottleneck is disk access, even on a fast SSD. I don't expect better performance from another database engine, nor from a custom-made B-tree.

How to find overall CPU usage in a multi-tenant environment?

I have a single-server/multiple-workers architecture to run distributed queries. Once a query finishes, it reports the total time taken for completion back to the server, and I maintain a running counter of the total CPU time (totalCpuTime) for completion of ALL the queries. With this counter I want to expose the overall CPU usage of the cluster on the server.
I was thinking of polling for the totalCpuTime every 3 minutes (sampling rate). Let's say I have two data points d1 and d2 which are sampling_rate = 1800s apart. I computed cpuUsage = (d2 - d1)/total_number_of_workers * sampling_rate, but this gives me a number greater than 100 and I don't know how to make sense of it.
Any ideas or other approaches?
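For reference, a common way to turn such a counter into a utilization percentage is to divide the CPU seconds consumed by the CPU seconds available in the interval (a sketch, assuming single-core workers, so each worker can contribute at most sampling_rate CPU seconds per interval):
cpuUsagePercent = (d2 - d1) / (total_number_of_workers * sampling_rate) * 100
For example, with 10 single-core workers and samples 1800 s apart, at most 18 000 CPU seconds can be consumed between the two samples, so the ratio stays between 0 and 100.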

DocumentDB read latency when queried across partitions

I created 2 empty DocumentDB collections: 1) single-partition and 2) multi-partition. Next I inserted a single document into each collection and ran a scan (select * from c). I found that the single-partition scan took ~2 RUs whereas the multi-partition scan took about ~50 RUs. It's not just the RUs: the read latency was also about 20x higher with the multi-partition collection. So does a multi-partition collection always have high read latency when queried across partitions?
You can get the same latency for multi-partition collections as for single-partition collections. Let's take the example of scans:
If you have non-empty collections, the performance will be the same, because data is read from one partition at a time: it is read from the first partition, then paginated across partitions in order.
If you use the MaxDegreeOfParallelism option, you'll get the same low latencies. Note that query execution is serial by default, in order to optimize for queries with larger datasets; with the parallelism option, the query will have the same low latency.
If you scan with a filter on partition key = value, then you'll get the same performance even without the parallelism option.
It is true that there is a small RU overhead for each partition touched during a query (~2 RU per partition for query parsing). Note that this does not grow with the result size: even if your query returned, say, 1000 documents, the query would cost roughly 1000 + 2*P RUs for a partitioned collection (where P is the number of partitions touched) instead of 1000 RUs. You can of course eliminate this overhead by including a filter on the partition key.
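As a minimal sketch of that partition-scoped form, assuming the collection is partitioned on a hypothetical property named deviceId:
SELECT * FROM c WHERE c.deviceId = 'device-42'
Because the filter pins the query to a single partition, it avoids the per-partition parsing overhead and gets low latency even without the parallelism option.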

graphite: how to get per-second metrics from batch metrics?

I'm trying to measure an online mini-batch processing system with per-second metrics (total queries per second). For every batch, a metric (e.g. "stats.gauges.<host>.query.count") is sent to Graphite. Batches are processed on several different hosts in parallel, and a batch of data takes about 5 seconds to process.
I've tried:
simply sum the series: sumSeries(stats.gauges.*.query.count); the resulting metric is many times greater than the actual value;
scale to 1 second: scaleToSeconds(sumSeries(stats.gauges.*.query.count), 1); the resulting metric is much less than the actual value;
integral then derivative: nonNegativeDerivative(sumSeries(integral(stats.gauges.*.query.count))); same as the first case ...
send the gauges with the delta=True param, then derivative; the result is about 20% greater than the actual value.
So, how do I get per-second metrics from batch metrics? What is the best practice?
You should use the carbon-aggregator service to add several metrics together as they come in. There is an example which fits your case at http://graphite.readthedocs.io/en/latest/config-carbon.html#aggregation-rules-conf
As your batch takes 5 seconds to process, the frequency should be 5 to buffer all the metrics. After five seconds, the aggregator will sum them up and write the result to Graphite.
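A sketch of what such a rule might look like in aggregation-rules.conf, assuming the metric naming from the question and an output series arbitrarily named stats.gauges.all.query.count:
stats.gauges.all.query.count (5) = sum stats.gauges.*.query.count
Here 5 is the buffering frequency in seconds and sum is the aggregation method: carbon-aggregator collects the per-host counts for 5 seconds, then writes one summed data point.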

The fastest way to SELECT MAX( ) from table

What would be the most efficient and elegant way to select the MAX value from a transparent table?
For example, let's take a table as simple as T100 (Messages), and imagine we need to select the maximum message number within some application area.
We can do this in multiple ways. The most obvious is using the MAX aggregate function:
select MAX(msgnr)
from t100
where arbgb = '/ASU/GENERAL'
And the other is using the UP TO 1 ROWS approach (ROWNUM in native Oracle SQL):
select msgnr
from t100
where arbgb = '/ASU/GENERAL'
and ROWNUM = 1
order by msgnr DESC
Above, all SQL statements are given in native Oracle SQL, as I was running the tests in DBACOCKPIT, where this is mandatory.
The built-in cockpit tool showed almost identical results for both queries:
the estimated cost is 3 for both of them; however, the second query has only 2 steps in its plan whereas the first has 3;
the estimated CPU cost for the first query is 21.564 per step, whereas the second has a total cost of 21.964.
So we can conclude that the second variant is more efficient here. However, would this hold in the general case? Would the result differ on different DBs, or can we treat it as a rule of thumb?
Performance measurements always vary depending on a rather large set of parameters, the DBMS being not the least of them. Some DBMS might, for example, be able to use an index to efficiently answer a MAX(...) while others might have to resort to a more inefficient approach. Also, don't trust the optimizer's predictions - measure the real performance. The optimizer might be wrong for a number of reasons (outdated statistics and broken indexes being the most common ones as far as I have seen), and it's of little help if the optimizer estimates a total cost of 12.345 when the actual cost turns out to be 987.654. Hence, my recommendations would be:
select only what you need (avoid SELECT *, use WHERE sensibly)
measure, measure, measure
be wary of most of the unproven and alleged performance rules-of-thumb out there.
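To illustrate the point about index use above, a sketch in generic SQL: with a (hypothetical) composite index on (ARBGB, MSGNR), many optimizers can answer the MAX with a single probe of the index instead of scanning the table - whether yours actually does still has to be verified against the real execution plan:
-- Hypothetical secondary index supporting the MAX lookup (sketch)
CREATE INDEX t100_arbgb_msgnr ON t100 (arbgb, msgnr);
SELECT MAX(msgnr)
FROM t100
WHERE arbgb = '/ASU/GENERAL';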
