Low memory condition, cluster under pressure - azure-data-explorer

I am constantly getting errors like 'Low memory condition, cluster under pressure' and 'bad allocation', even for simple metadata commands like .show table extents or .show operations. Nothing significant is running on the cluster. Is it possible that memory is still held by some operation that finished earlier? If so, is there a command to flush or free this reserved memory? The cluster is using the V3 engine.

This is not expected; please open a support ticket.

Related

When I execute a MATCH statement, NebulaGraph Database reports an error

NebulaGraph version: master
Deployment: stand-alone
Installation: Docker
Disk: SSD
CPU/RAM: M1, 16 GB
On this deployment of the NebulaGraph master branch, executing the statement match(m)-[:follow]->() return m limit 10 reports the following error:
-1005:Scan vertices or edges need to specify a limit number, or limit number can not push down.
I checked several times but found nothing wrong. Can anybody help me revise the query?
This limit cannot be pushed down; you have to put some other condition on m. For example:
match(m)-[:follow]->() where id(m)=="xxx" return m limit 10
match(m)-[:follow]->() where m.name=="xxx" return m limit 10

Azure Cosmos DB Emulator slow (100 ms / request)

I am trying to set up the Azure Cosmos DB Emulator to work locally with integration tests, but I found that it is very slow.
I am reading a ~1 KB JSON document with the container.ReadItemAsync<T> method and awaiting the result. I am calling this method in a loop, 100 times.
The execution time is consistently around 9.5-10 seconds, so one request takes around 100 milliseconds, which is very slow given that this service is running locally.
Why is this so slow and how can I make it faster?
I expect at most 1 ms / request considering it is all disk I/O.
I tried the following but they didn't work:
turning Rate Limiting on/off
creating the database/collection with various provisioning settings; it has zero effect on performance (even 100k RU)
creating the db and collection manually vs. with the client SDK
using the "Reset Data" option in the emulator tray menu
Further information:
The emulator version is 2.14.6.0 (68d4ca59)
I start the emulator from the Start menu, but starting it from the command line doesn't change anything
I am using the Microsoft.Azure.Cosmos NuGet package, version 3.22.1
my CPU is an i7-8565U, but it isn't even fully used while the test is running
my system has 16 GB RAM
my system is running on a fast enough SSD ("NVMe SK hynix BC501 H"), but while running the test the SSD usage is between 0 and 2%.
the performance is the same if I increase the document size to 100 KB or even 1 MB.
Creating your CosmosClientOptions with the AllowBulkExecution = true setting can cause this.
The SDK will construct batches and group operations; when a batch is full, it will get dispatched, but if the batch doesn't fill up, there is a timer that will dispatch it to make sure the operations complete. This timer is currently 100 milliseconds. So if the batch does not get filled up (for example, you are just sending 50 concurrent operations), then the overall latency might be affected.
Source: Introducing Bulk support in the .NET SDK
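To see why a single request can pay the full 100 ms, here is a minimal, purely illustrative Python sketch of the batch-plus-timer dispatch pattern described above; the names, batch size, and queue contents are assumptions for illustration, not the SDK's internals:

import asyncio
import time

MAX_BATCH_SIZE = 100   # illustrative batch capacity, not the SDK's real value
DISPATCH_TIMER = 0.1   # 100 ms flush timer, as described in the quote above

async def batcher(queue: asyncio.Queue) -> None:
    # Collect operations into a batch; dispatch when full or when the timer fires.
    while True:
        batch = [await queue.get()]                 # wait for the first operation
        deadline = time.monotonic() + DISPATCH_TIMER
        while len(batch) < MAX_BATCH_SIZE:
            timeout = deadline - time.monotonic()
            if timeout <= 0:
                break
            try:
                batch.append(await asyncio.wait_for(queue.get(), timeout))
            except asyncio.TimeoutError:
                break                               # timer fired with a partial batch
        print(f"dispatching {len(batch)} operation(s)")

async def main() -> None:
    queue: asyncio.Queue = asyncio.Queue()
    worker = asyncio.create_task(batcher(queue))
    await queue.put("ReadItem #1")   # a lone operation still waits ~100 ms before dispatch
    await asyncio.sleep(0.3)
    worker.cancel()

asyncio.run(main())

With AllowBulkExecution left at its default of false, point reads are dispatched individually rather than batched, which is why turning it off should restore low per-request latency for this kind of sequential test.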

How to resolve celery.backends.rpc.BacklogLimitExceeded error

I am using Celery with Flask. After working for a good long while, my Celery setup is now showing a celery.backends.rpc.BacklogLimitExceeded error.
My config values are below:
CELERY_BROKER_URL = 'amqp://'
CELERY_TRACK_STARTED = True
CELERY_RESULT_BACKEND = 'rpc'
CELERY_RESULT_PERSISTENT = False
Can anyone explain why the error is appearing and how to resolve it?
I have checked the docs here, which don't provide any resolution for the issue.
Possibly because the process consuming the results is not keeping up with the process that is producing them? This can result in a large number of unprocessed results building up - this is the "backlog". When the size of the backlog exceeds an arbitrary limit, BacklogLimitExceeded is raised by Celery.
You could try adding more consumers to process the results, or setting a shorter value for the result_expires setting.
The discussion on this closed celery issue may help:
Seems like the database backends would be a much better fit for this purpose.
The amqp/RPC result backends need to send one message per state update, while for the database-based backends (redis, sqla, django, mongodb, cache, etc.) every new state update overwrites the old one.
The "amqp" result backend is not recommended at all since it creates one queue per task, which is required to mimic the database based backends where multiple processes can retrieve the result.
The RPC result backend is preferred for RPC-style calls where only the process that initiated the task can retrieve the result.
But if you want persistent, multi-consumer results, you should store them in a database.
Using rabbitmq as a broker and redis for results is a great combination, but using an SQL database for results works well too.
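Putting those suggestions together, a minimal sketch using the same old-style, Flask-level settings as in the question might look like the following; the Redis URL and the expiry value are placeholder assumptions, not values taken from the question:

CELERY_BROKER_URL = 'amqp://'                        # keep RabbitMQ as the broker
CELERY_RESULT_BACKEND = 'redis://localhost:6379/0'   # assumed local Redis for results
CELERY_TASK_RESULT_EXPIRES = 3600                    # old-style name for result_expires; drop stored results after an hour
CELERY_TRACK_STARTED = True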

Google Cloud Bigtable: repeated grpc error code 13, then suddenly success

In short, we are sometimes seeing that a small number of Cloud Bigtable queries fail repeatedly (for 10s or even 100s of times in a row) with the error rpc error: code = 13 desc = "server closed the stream without sending trailers" until (usually) the query finally works.
In detail, our setup is as follows:
We are running a collection (< 10) of Go services on Google Compute Engine. Each service leases tasks from a pair of PULL task queues. Each task contains an ID of a bigtable row. The task handler executes the following query:
row, err := tbl.ReadRow(ctx, <my-row-id>,
    bigtable.RowFilter(bigtable.ChainFilters(
        bigtable.FamilyFilter(<my-column-family>),
        bigtable.LatestNFilter(1))))
If the query fails, the task handler simply returns. Since we lease tasks with a lease time between 10 and 15 minutes, a little while later the lease will expire on that task, it will be leased again, and we'll retry. The tasks have a max retry count of 1000, so they can be retried many times over a long period. In a small number of cases, a particular task will fail with the grpc error above. The task will typically fail with this same error every time it runs, for hours or days on end, before (seemingly out of the blue) eventually succeeding (or the task runs out of retries and dies).
Since this often takes so long, it seems unrelated to server load. For example right now on a Sunday morning, these servers are very lightly loaded, and yet I see plenty of these errors when I tail the logs. From this answer, I had originally thought that this might be due to trying to query for a large amount of data, perhaps near the max limit that cloud bigtable will support. However I now see that this is not the case; I can find many examples where tasks that have failed many times finally succeed and report only a small amount of data (e.g. <1 MB) was retrieved.
What else should I be looking at here?
edit: From further testing I now know that this is completely machine (client) independent. If I tail the log on one of the task leasing machines, wait for a "server closed the stream without sending trailers" error, and then try a one-off ReadRow query to the same rowId from another, unrelated, totally unused machine, I get the same error repeatedly.
This error is typically caused by having more than 256MB of data in your reply.
However, there is currently a bug in our server side error handling code that allows some invalid characters in HTTP/2 trailers which is not allowed by the spec. This means that some error messages that have invalid characters will be seen as this kind of error. This should be fixed early next year.

Session error in Teradata Fastload Script

My FastLoad script is scheduled to run every week, and every week it fails at startup because of an insufficient number of sessions. But when I restart the script manually, it executes with no session error.
I don't know what causes it to fail every week for the same insufficient-sessions reason. Can anyone let me know what the possible causes might be?
Check for:
1. The scheduled job's connection string, if it points to a single Teradata node (IP) address. Depending on the number of concurrent sessions, you can exceed the PE session limit (120 sessions). Try using a DNS/VIP entry to achieve better load balancing.
2. The number of utilities running on the system at the scheduled time. If you exceed the threshold limit, use SLEEP and TENACITY to place your job in a queue instead of letting it fail.
3. Limit the FastLoad session count using SESSIONS.
Thanks!!
