Wikidata - request limit for SPARQL queries

Is there a limit for queries on Wikidata (SPARQL queries only, not editing)? I couldn't find any official documentation about this. I wonder how strictly queries are limited per minute/hour (and per IP address).

Yes, there are limits.
A single query is currently limited to 1 minute of runtime. The docs are here:
https://www.mediawiki.org/wiki/Wikidata_query_service/User_Manual
Also, each IP is currently limited to 5 concurrent requests. There's no limit on how many sequential requests can be processed.
These limits may be adjusted depending on capacity and traffic.
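For illustration, here's a minimal Python sketch (using the requests library) that queries the WDQS endpoint with those limits in mind; the User-Agent string, the retry handling, and the example query are my own assumptions, not an official client:

```python
# Minimal sketch: querying the Wikidata Query Service with the documented
# limits in mind (~60 s per query, 5 concurrent requests per IP).
# The retry/backoff handling is an assumption about typical throttling
# behaviour, not an official client.
import time
import requests

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

QUERY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .            # instances of "house cat"
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 10
"""

def run_query(query: str, max_retries: int = 3) -> dict:
    headers = {
        # WDQS asks clients to identify themselves with a descriptive User-Agent.
        "User-Agent": "example-bot/0.1 (https://example.org; contact@example.org)",
        "Accept": "application/sparql-results+json",
    }
    for _ in range(max_retries):
        resp = requests.get(
            WDQS_ENDPOINT,
            params={"query": query},
            headers=headers,
            timeout=65,  # slightly above the server-side 60 s query limit
        )
        if resp.status_code == 429:  # throttled: back off and retry
            wait = int(resp.headers.get("Retry-After", "5"))
            time.sleep(wait)
            continue
        resp.raise_for_status()
        return resp.json()
    raise RuntimeError("Query kept getting throttled; giving up.")

if __name__ == "__main__":
    data = run_query(QUERY)
    for row in data["results"]["bindings"]:
        print(row["itemLabel"]["value"])
```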

Related

Has Firestore removed the soft limit of 1 write per second to a single document?

Firestore has always had a soft limit of 1 write per second to a single document. That meant that for doing things like a counter that updates more frequently than once per second, the recommended solution was sharded counters.
Looking at the Firestore Limits documentation, this limit appears to have disappeared. The Firebase Summit presentation mentioned that Firestore is now more scalable, but only mentioned hard limits being removed.
Can anyone confirm whether this limit has indeed been removed, and we can remove all our sharded counters in favor of writing to a single count document tens or hundreds of times per second?
firebaser here
This was indeed removed from the documentation. It was always a soft limit that was not enforced in the code, but rather an estimate of the physical limitation of how long it takes to synchronize changes to the indexes and to multiple data centers.
We've significantly improved the infrastructure used for these write operations, and now provide tools such as the Key Visualizer to better analyze performance and look for hot spots in the read and write behavior of your app. While there's still some physical limit, we recommend using these tools rather than depending on a single documented value to analyze your app's database performance.
For most use cases I'd recommend using the new COUNT() operator nowadays. But if you want to continue using write-time aggregation counters, it is still recommended to use a sharded counter for high-volume count operations; we've just stopped giving a hard number for when to use one.
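For illustration, here's a minimal sketch of a write-time sharded counter using the google-cloud-firestore Python client; the collection, document, and field names are made up for the example, and the shard count is arbitrary:

```python
# Minimal sketch of a write-time sharded counter, assuming the
# google-cloud-firestore Python client. The "counters"/"shards"/"count"
# names and NUM_SHARDS value are illustrative only.
import random
from google.cloud import firestore

NUM_SHARDS = 10
db = firestore.Client()
shards = db.collection("counters").document("page_views").collection("shards")

def init_counter() -> None:
    # Create the shard documents once, each starting at zero.
    for i in range(NUM_SHARDS):
        shards.document(str(i)).set({"count": 0})

def increment() -> None:
    # Spread writes across shards so no single document has to absorb
    # more than roughly one sustained write per second.
    shard_id = str(random.randint(0, NUM_SHARDS - 1))
    shards.document(shard_id).update({"count": firestore.Increment(1)})

def total() -> int:
    # Read all shards and sum them to get the current value.
    return sum(doc.to_dict()["count"] for doc in shards.stream())
```

Reading the total still costs one read per shard, which is one reason the server-side COUNT() aggregation is attractive when you don't need the value maintained at write time.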

query with pagination causing high throughput usage

I use an Azure Cosmos DB API for MongoDB account, version 3.6. In queries using skip and limit I noticed higher throughput (RU) usage: the higher the skip, the more costly the query is.
db.MyCollection.find({Property:"testtest"}).skip(12000).limit(10)
The above query costs around 3000 RU. The property in the find clause is my partition key. I have read that Cosmos DB is capable of queries with offset and limit, but officially the OFFSET LIMIT clause exists only in the SQL API for Cosmos DB. Is something similar possible with the MongoDB API, or should I live with costly skip queries?
The SQL API will yield the same result with OFFSET LIMIT: you'll see an almost linear increase in RU charge as you increase the offset, because each query loops over all of the skipped documents.
If possible, you should try to use a continuation token in your context. You could also adjust your filter criteria on an indexed property to page through your data.
The RU charge of a query with OFFSET LIMIT will increase as the number of terms being offset increases. For queries that have multiple pages of results, we typically recommend using continuation tokens. Continuation tokens are a "bookmark" for the place where the query can later resume. If you use OFFSET LIMIT, there is no "bookmark". If you wanted to return the query's next page, you would have to start from the beginning.
Source
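For illustration, here's a minimal pymongo sketch of range-based paging against the Cosmos DB API for MongoDB instead of skip()/limit(); the connection string, database/collection names, and the use of _id as the paging field are assumptions for the example:

```python
# Minimal sketch of range-based paging with pymongo against the
# Cosmos DB API for MongoDB, instead of skip()/limit(). The connection
# string, collection name, and field names are placeholders.
from pymongo import MongoClient, ASCENDING

client = MongoClient("<your-cosmos-mongodb-connection-string>")
coll = client["mydb"]["MyCollection"]

PAGE_SIZE = 10

def fetch_page(last_id=None):
    # Filter on the partition key, then page by an indexed, ordered
    # field (_id here) so the query doesn't walk all skipped documents.
    query = {"Property": "testtest"}
    if last_id is not None:
        query["_id"] = {"$gt": last_id}
    docs = list(coll.find(query).sort("_id", ASCENDING).limit(PAGE_SIZE))
    next_cursor = docs[-1]["_id"] if docs else None
    return docs, next_cursor

# Usage: walk through the result set page by page, carrying the last _id.
docs, cursor = fetch_page()
while docs:
    for doc in docs:
        print(doc["_id"])
    docs, cursor = fetch_page(cursor)
```

Because each page filters on the partition key and seeks past the last _id with an indexed range predicate, the RU charge stays roughly constant per page instead of growing with the offset.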

Are CosmosDB attachments (still) a good way to store payloads larger than the document limit of 2MB?

I need to save CosmosDB documents that contain a large list - larger than the document limit of 2 MB.
I'd like to split that list into an 'attachment' associated to the "main document".
But this documentation page briefly mentions that
Attachment feature is being depreciated [sic]
What's the deprecation plan? Will newly created collections (in the future) stop supporting attachments?
The same page of documentation mentions a limit of 2GB for "Maximum attachment size per Account".
What does that mean? 2GB per attachment? 2 GB total for all attachments?
I recommend not taking a dependency on attachments. We are still planning on deprecating them but have not started in earnest on this.
Depending on your access patterns for this data you may want to break this up as individual documents or modeled in some other way. CRUD operations on large documents can be very costly and will experience high latency because of the large payload in each request.
If you have an unbounded array these should definitely be stored as individual documents or modeled such that increasing size does not cause eventual performance issues. If your data is updated frequently it should be modeled such that the frequently updated portions are separate from properties that are static.
This article here describes scenarios and considerations when modeling data in Cosmos and may help you come up with a more efficient model.
https://learn.microsoft.com/en-us/azure/cosmos-db/modeling-data
Hope this is helpful.
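To make that concrete, here's a minimal sketch with the azure-cosmos Python SDK that stores the large list as individual child documents sharing the parent's partition key; the endpoint, key, container name, and the pk/type/parentId fields are all placeholders for the example:

```python
# Minimal sketch of splitting a large list into individual documents that
# share the parent's partition key, assuming the azure-cosmos Python SDK
# and a container partitioned on /pk. All names here are illustrative.
import uuid
from azure.cosmos import CosmosClient

client = CosmosClient("<account-endpoint>", credential="<account-key>")
container = client.get_database_client("mydb").get_container_client("items")

def save_order(order_id, header, line_items):
    # The "main" document keeps only small, frequently read properties.
    container.create_item({
        "id": order_id,
        "pk": order_id,          # shared partition key keeps the family co-located
        "type": "order",
        **header,
    })
    # Each entry of the large/unbounded list becomes its own small document.
    for item in line_items:
        container.create_item({
            "id": str(uuid.uuid4()),
            "pk": order_id,
            "type": "orderLine",
            "parentId": order_id,
            **item,
        })

def load_lines(order_id):
    # Reading the children is a single-partition query on the shared key.
    return list(container.query_items(
        query="SELECT * FROM c WHERE c.type = 'orderLine'",
        partition_key=order_id,
    ))
```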

Is there a limitation for querying a collection in Firebase Firestore?

I have watched videos about Firestore on YouTube. It is said that there is a limit on documents, where the maximum size is 1 MB, and also a maximum of 1 write per second to a single document.
How about queries on a collection? Is there a limitation for this? I will heavily rely on parent collections to perform different queries for a lot of users, which is why I need to know the worst-case scenario and whether there are any limitations.
I mean something like a maximum number of queries per second, maximum concurrent queries, or a maximum amount of data that can be read from a collection per second. Do such limitations exist for querying a collection?
I have tried to read the documentation here and it seems there is no limit for queries on a collection. I need to make sure; maybe there is documentation that I have not read yet?
There is no documented limit to the number of queries you can execute against Firestore. While there is probably a physical limit, you're extremely unlikely to hit it before running into one of the documented limits (such as the 1 million concurrent users).
In other words: it is quite unlikely you'll need to worry about the read scalability or limitations of Firestore for your application. It is made to scale very well on read operations, which is precisely the reason why it supports a more limited set of functionality, and why it has a write throughput limit on individual documents.
Firestore scales massively for read operations. When using the Blaze payment plan, there are no fundamental read limits like there are for write limits. You just need to be willing to pay for all those documents reads, and the bandwidth required for all that data. Please read the pricing page about billing.
There is a limit on your reads and writes. As described in the documentation, the free plan gives you a limited quota of reads and writes. Each document read by a query is counted, and the same applies to writing documents.

Elastic Cache vs DynamoDb DAX

I have a use case where I write data to DynamoDB in two tables, say t1 and t2, in a transaction. My app needs to read data from these tables a lot (1 write, at least 4 reads). I am considering DAX vs ElastiCache. Does anyone have any suggestions?
Thanks in advance
K
ElastiCache is not intended for use with DynamoDB.
DAX is good for read-heavy apps, like yours. But be aware that DAX is only good for eventually consistent reads, so don't use it with banking apps, etc. where the info always needs to be perfectly up to date. Without further info it's hard to tell more, these are just two general points to consider.
Amazon DynamoDB Accelerator (DAX) is a fully managed, highly available, in-memory cache that can reduce Amazon DynamoDB response times from milliseconds to microseconds, even at millions of requests per second. While DynamoDB offers consistent single-digit millisecond latency, DynamoDB with DAX takes performance to the next level with response times in microseconds for millions of requests per second for read-heavy workloads. With DAX, your applications remain fast and responsive, even when a popular event or news story drives unprecedented request volumes your way. No tuning required. https://aws.amazon.com/dynamodb/dax/
AWS recommends that you use DAX as the solution for this requirement.
ElastiCache is the older approach and is also used to store session state in addition to cached data.
DAX is used extensively for read-intensive workloads with eventually consistent reads and for latency-sensitive applications. DAX maintains two caches:
Item cache - populated with items based on GetItem results.
Query cache - keyed on the parameters used in Query or Scan requests.
Cheers!
I'd recommend using DAX with DynamoDB, provided most of your reads go through the item-level API (and NOT the query-level API), such as GetItem.
Why? DAX has one odd behavior, as follows. From AWS:
"Every write to DAX alters the state of the item cache. However, writes to the item cache don't affect the query cache. (The DAX item cache and query cache serve different purposes, and operate independently from one another.)"
To elaborate: if a query operation is cached, and you then perform a write that affects the result of that previously cached query, the cached query result will be outdated until it expires.
This out-of-sync issue is also discussed here.
I find DAX useful only for cached queries, PutItem, and GetItem. In general it is very difficult to find a use case for it.
DAX separates queries and scans from CRUD operations on individual items. That means if you update an item and then do a query/scan, the result will not reflect the change.
You can't invalidate the cache; entries are only evicted when the TTL is reached or when the node's memory is full and old items are dropped.
Takeaways:
Doing puts/updates and then queries - two separate caches, so they go out of sync.
Looking for a single item - you are left with only the primary key, the default index, and a GetItem request (no query with limit 1). You can't use any indexes for gets/updates/deletes.
Using the ConsistentRead option on a query to get the latest data works, but only for the primary index.
Writing through DAX is slower than writing directly to DynamoDB, since you have an extra hop in the middle.
X-Ray does not work with DAX.
Use case:
You have queries where you don't really care that they are not up to date.
You are doing few PutItem/UpdateItem calls and a lot of GetItem calls.
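To illustrate the item-cache/query-cache split described above, here's a minimal sketch assuming the amazon-dax-client Python package; the cluster endpoint, table name, and key schema are placeholders:

```python
# Minimal sketch contrasting writes/GetItem (item cache) with Query
# (query cache), assuming the amazon-dax-client package. The endpoint,
# table "t1", and "pk" key schema are placeholders for the example.
from amazondax import AmazonDaxClient

# The DAX client mirrors the low-level boto3 DynamoDB client API.
dax = AmazonDaxClient(
    endpoint_url="dax://my-cluster.abc123.dax-clusters.us-east-1.amazonaws.com"
)

TABLE = "t1"

# Write through DAX: this updates the item cache...
dax.put_item(
    TableName=TABLE,
    Item={"pk": {"S": "user#1"}, "score": {"N": "42"}},
)

# ...so a GetItem for the same key returns the fresh value.
fresh = dax.get_item(TableName=TABLE, Key={"pk": {"S": "user#1"}})

# But a Query cached before the write may still serve the old result
# until its TTL expires, because the query cache is independent.
maybe_stale = dax.query(
    TableName=TABLE,
    KeyConditionExpression="pk = :p",
    ExpressionAttributeValues={":p": {"S": "user#1"}},
)
```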
