Teradata - maximum load-factor

We run a Teradata data warehouse and, to keep it operating fast and stable, want to know to what degree we should or can load it with data.
Obviously it should not be 100% full. But then: how much space should be left unused? Does this depend on the kind of operations we typically run (not many joins, mainly just filtering and I/O)?
Thanks for any advice.

Yes, there is guidance on that:
• Reserve 25% to 35% of total space for spool space
• Allow an extra 5% of PERM space in user DBC.
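To turn that guidance into a number you can monitor, here is a minimal sketch. Assumptions: the Python teradatasql driver, the DBC.DiskSpaceV view, placeholder host/credentials, and the 30%/5% figures taken from the bullets above; treat it as an approximation, not an official formula.
import teradatasql

SPOOL_RESERVE = 0.30        # middle of the 25%-35% spool recommendation
DBC_RESERVE = 0.05          # extra PERM headroom kept in user DBC
TARGET_FILL = 1.0 - SPOOL_RESERVE - DBC_RESERVE   # roughly 65% of total space

with teradatasql.connect(host="tdprod", user="dbadmin", password="***") as con:
    with con.cursor() as cur:
        # Summed over all AMPs and databases: MaxPerm approximates the system's
        # total perm capacity, CurrentPerm is the space actually in use.
        cur.execute("SELECT SUM(CurrentPerm), SUM(MaxPerm) FROM DBC.DiskSpaceV")
        current_perm, max_perm = cur.fetchone()

fill = float(current_perm) / float(max_perm)
print(f"System is {fill:.1%} full; recommended ceiling is about {TARGET_FILL:.0%}")
if fill > TARGET_FILL:
    print("Consider purging/archiving data or adding capacity.")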

Related

How to decrease the minimum throughput

I need some support on Cosmos DB. I've scaled a database up above 10,000 RU/s, and now I can no longer come back down to anything below 10,000. It looks like I've been locked into a higher tier of some sort, which is very frustrating. I need to go back to what it was before (1,800), but the minimum throughput is now 10,000 and I can't change it. Please help.
Minimum throughput is stuck at 10,000 RU/s
When you scale out Cosmos DB, it creates physical partitions that cannot be deallocated. The result is a minimum RU/s of about 10% of the maximum throughput ever provisioned.
The only way to deal with this now is to delete the container. If the container holds data you need to keep, you will have to migrate it to another container first. This used to require writing code, but there is now an easier option using the Live Data Migrator. After you copy the data, you can delete the original container.
Update: the minimum is now 1% of the maximum throughput ever provisioned.
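To see where the 10,000 RU/s floor comes from, here is a rough arithmetic sketch. Assumptions: a 400 RU/s absolute floor for manually provisioned throughput, and a maximum-ever-provisioned value of 100,000 RU/s, which is what a 10,000 minimum implies under the old 10% rule.
# Rough sketch of the minimum-throughput rule described above. Assumptions:
# 400 RU/s is the usual absolute floor for manual throughput, and the fraction
# is 10% under the old rule, 1% after the update.
def min_throughput(max_ever_provisioned_rus, fraction=0.01):
    return max(400, int(max_ever_provisioned_rus * fraction))

print(min_throughput(100_000, fraction=0.10))   # 10000 -> cannot scale back to 1,800
print(min_throughput(100_000, fraction=0.01))   # 1000  -> after the update
print(min_throughput(10_000, fraction=0.01))    # 400   -> the absolute floor applies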

MariaDB performance according to CPU Cache

I want to build a MariaDB server.
Before purchasing the server, I am researching how DB performance depends on the CPU specification.
Is there any benefit to be gained as the CPU cache goes from 8 MB to 12 MB?
Is a large CPU cache a good option to purchase?
No.
Don't pay extra for more cores. MariaDB essentially uses part of only one core for each active connection.
Don't pay extra for a faster CPU or anything else to do with the CPU. If you do become CPU-bound, then come back here for help with optimizing your query; it is usually as simple as adding a composite index or reformulating a query (sketched at the end of this answer).
Throwing hardware at something is a one-time partial fix and of low benefit -- a few percent: like 2% or, rarely, 20%.
Reformulation or indexing can (depending on the situation) give you 2x or 20x or even 200x.
More RAM up to a point is beneficial. Make a crude estimate of dataset size. Let us know that value, plus some clue of what type of app.
SSD instead of HDD is rather standard now, and it may give a noticeable benefit.
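To illustrate the indexing and sizing points, here is a minimal sketch. Assumptions: the PyMySQL driver, a hypothetical orders table filtered on (customer_id, created_at), and placeholder credentials; adjust the names to your schema.
# Sketch of the two actions suggested above: make a crude estimate of dataset
# size (useful for sizing RAM / the buffer pool) and add a composite index
# matching a common filter. Assumptions: PyMySQL, a hypothetical `orders` table.
import pymysql

conn = pymysql.connect(host="localhost", user="app", password="***", database="appdb")
try:
    with conn.cursor() as cur:
        # Crude dataset-size estimate: data plus indexes for the current schema.
        cur.execute(
            "SELECT ROUND(SUM(data_length + index_length) / 1024 / 1024 / 1024, 1) "
            "FROM information_schema.tables WHERE table_schema = DATABASE()"
        )
        print("Approximate dataset size (GB):", cur.fetchone()[0])

        # Composite index matching a typical filter; this is the kind of change
        # that can give 2x-200x, far more than a faster CPU.
        cur.execute(
            "ALTER TABLE orders ADD INDEX idx_customer_created (customer_id, created_at)"
        )
        # Verify the optimizer now uses the index.
        cur.execute(
            "EXPLAIN SELECT * FROM orders "
            "WHERE customer_id = 42 AND created_at >= '2024-01-01'"
        )
        for row in cur.fetchall():
            print(row)
    conn.commit()
finally:
    conn.close()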

Cloudera swap space reached max threshold

I'm getting this alarm on Cloudera. Is there any way to increase the swap space capacity?
While you ask how to increase the swap space capacity, I think it is safe to assume that what you are really looking for is a way to solve the problem of full swap space.
Increasing the swap space is only one way of dealing with the issue; the other is simply to use less swap space. Cloudera recommends using minimal to no swap space, because using swap degrades performance substantially. The way to control this is to set 'swappiness' to 1 instead of the default of 60. See the documentation for instructions and further rationale.
If swappiness is already set to 1, then you can try clearing the swap by toggling swap off, then on:
swapoff -a
swapon -a
Before toggling swap you should make sure that (both checks are sketched below):
• the amount of swap space in use is less than the amount of free memory (as the contents of swap may be shifted to memory);
• currently running processes are not using swap (running vmstat produces output with columns labeled 'si' and 'so', telling you the amount of memory swapped in and out per second; if these are both 0, then you should be safe).
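Here is a minimal sketch of those two checks, assuming a Linux host where /proc/meminfo and the vmstat utility are available; the thresholds are exactly the ones described above.
# Sketch of the two pre-checks before running swapoff/swapon.
# Assumptions: Linux /proc/meminfo (with MemAvailable) and vmstat are available.
import subprocess

def meminfo_kb():
    info = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, value = line.split(":", 1)
            info[key] = int(value.split()[0])   # values are reported in kB
    return info

mem = meminfo_kb()
swap_used_kb = mem["SwapTotal"] - mem["SwapFree"]
free_kb = mem["MemAvailable"]

# Check 1: swap in use must fit into available memory.
print(f"swap in use: {swap_used_kb} kB, memory available: {free_kb} kB")
check1 = swap_used_kb < free_kb

# Check 2: nothing should be actively swapping right now.
# Take a 2-sample vmstat run and look at the si/so columns of the last sample.
out = subprocess.run(["vmstat", "1", "2"], capture_output=True, text=True).stdout
lines = out.strip().splitlines()
header = lines[1].split()
last = lines[-1].split()
si, so = int(last[header.index("si")]), int(last[header.index("so")])
print(f"si={si} so={so}")
check2 = si == 0 and so == 0

print("Safe to toggle swap" if (check1 and check2) else "Do NOT toggle swap yet")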

Amazon DynamoDB throttling

I've enabled auto-scaling for our DynamoDB table. It has a target utilization of 30%, but it keeps throttling.
See this example screenshot where throttling is happening.
As you can see, it's scaling up exactly as you want it to. But I don't understand why it's still throttling. It's almost always below the provisioned throughput.
Can anyone explain what's going wrong and why it's still throttling?
Thanks,
Hendrik
Very hard to tell from the graph, and there is limited information.
Some thoughts:
AutoScaling can take 5 - 10 minutes to kick in. This is not fast enough if there is a sudden increase in usage. Perhaps you are seeing throttling in that 5 - 10 minute window before it scales up.
If you set CloudWatch metrics to a 1-minute interval, you might see what's going on in a bit more detail.
As mkobit mentioned, you might be hitting the throughput limit on a partition, depending on how your data is structured.
Your capacity units are evenly distributed across your partitions. So, depending on how many partitions you have and which records you are trying to access, you may hit the capacity limit of a single partition without ever exceeding your table throughput.
This also depends on the amount of data you have stored, the number of partitions, etc.
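To make the per-partition point concrete, here is a rough sketch based on the partition sizing rule of thumb from the older DynamoDB documentation (roughly 3,000 RCU, 1,000 WCU, and 10 GB per partition); treat these numbers as an approximation, not guaranteed current behaviour.
# Rough sketch of why per-partition throttling can happen below the table's
# provisioned throughput. The per-partition limits are the older rule of thumb
# (~3000 RCU, ~1000 WCU, ~10 GB per partition), used here only as an estimate.
import math

def estimate_partitions(rcu, wcu, storage_gb):
    by_throughput = math.ceil(rcu / 3000 + wcu / 1000)
    by_size = math.ceil(storage_gb / 10)
    return max(by_throughput, by_size, 1)

rcu, wcu, storage_gb = 1000, 1000, 60
parts = estimate_partitions(rcu, wcu, storage_gb)
print(f"~{parts} partitions")
print(f"each partition gets only ~{rcu // parts} RCU and ~{wcu // parts} WCU")
# A hot key that concentrates traffic on one partition can therefore be
# throttled while the table-level graphs still look well under capacity.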
HTH

AWS DynamoDB: read/write units estimation issue

I am creating an online crowd-driven game. I expect the read/write requests to fluctuate (like 50, 50, 50, 1500, 50, 50, 50) every second, and I need to process 100% of the requests with strong consistency.
I am planning to move from the GAE datastore to AWS's DynamoDB for its strong consistency. I have the doubts below, for which I could not get clear answers in other discussions.
1. If the item size for a write action is just 4 B, will that be rounded up to 1 KB and consume a full write unit?
2. Financially it is not wise to set the provisioned throughput capacity around the expected peak value. Alarms can warn us, but in the case of a sudden rise, requests could already be throttled by the time we receive the alarm. Is DynamoDB really designed to handle highly fluctuating read/write loads?
3. I read about Dynamic DynamoDB, which updates the read/write throughput capacity for us. When we add some read/write units, how long does it take to allocate them? If it takes too long, what's the use of raising the bar after the tide hits?
Google App Engine bills just for the number of requests that happen in that month. If I can make AWS work like, "Whatever the request count may be, I will expand and contract myself and charge you only for the read/write units used", I will go with AWS.
Please advise. Don't hesitate to ask if I am not being clear in parts.
Thanks,
Karthick.
Yes. Item sizes are rounded up and the corresponding throughput is consumed. From the Provisioned Throughput in Amazon DynamoDB documentation:
The total number of read operations necessary is the item size, rounded up to the next multiple of 4 KB, divided by 4 KB.
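Here is a minimal sketch of that rounding for both reads and writes, assuming the standard rules: a strongly consistent read rounds up to 4 KB units, a write rounds up to 1 KB units, and an eventually consistent read costs half of a strongly consistent one.
# Sketch of the capacity-unit rounding described above.
import math

def read_units(item_bytes, strongly_consistent=True):
    units = math.ceil(item_bytes / 4096)          # reads round up to 4 KB
    return units if strongly_consistent else units / 2

def write_units(item_bytes):
    return math.ceil(item_bytes / 1024)           # writes round up to 1 KB

print(write_units(4))       # 1  -> yes, a 4 B item still consumes a full write unit
print(read_units(4))        # 1
print(read_units(6000))     # 2  -> 6 KB rounds up to two 4 KB read units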
It can handle some bursting, but it is generally intended to be used for uniform workloads. Here is a section from the Guidelines for Working with Tables documentation and some other helpful links about the best practices:
A temporary non-uniformity in a workload can generally be absorbed by the bursting allowance, as described in Use Burst Capacity Sparingly. However, if your application must accommodate non-uniform workloads on a regular basis, you should design your table with DynamoDB's partitioning behavior in mind (see Understand Partition Behavior), and be mindful when increasing and decreasing provisioned throughput on that table.
Query and Scan guidelines for avoiding bursts of read activity
The Table Best Practices section
Use Burst Capacity Sparingly
This one is going to depend on how much data your table has, because DynamoDB will have to repartition the data if you are scaling up. See the Consider Workload Uniformity When Adjusting Provisioned Throughput documentation for more information about the partitioning.
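Regarding how long an increase takes to apply (question 3), you can raise the provisioned throughput and poll the table status until it returns to ACTIVE. A minimal boto3 sketch follows; the table name and capacity values are placeholders, and larger tables can take longer because of the repartitioning mentioned above.
# Sketch: raise provisioned throughput with boto3 and wait for the table to
# become ACTIVE again. Table name and capacity values are placeholders.
import time
import boto3

dynamodb = boto3.client("dynamodb")

dynamodb.update_table(
    TableName="game-events",
    ProvisionedThroughput={"ReadCapacityUnits": 1500, "WriteCapacityUnits": 1500},
)

while True:
    status = dynamodb.describe_table(TableName="game-events")["Table"]["TableStatus"]
    print("TableStatus:", status)
    if status == "ACTIVE":
        break
    time.sleep(10)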
