Let's assume I am working on an online shop with high traffic. The items on sale are in high demand but also very limited. I need to make sure they won't be oversold.
Currently I have something like this:
$order->addProduct($product);
$em->persist($order);

if ($productManager->isAvailable($product)) {
    $em->flush();
}
However, I suppose this still allows for overselling a product if two orders come in within a very short period of time. What other possibilities are there to make sure the product will definitely never be oversold?
You need to use a pessimistic lock inside a transaction.
Let's say your Product entity has a count field containing the number of items left. After a user purchases an item, you decrease that field.
In this case you need a pessimistic write lock. Basically, it prevents a row from being read and/or updated by other processes that also try to acquire a pessimistic lock on it. Those processes stay blocked until the transaction that locked the row ends, either by committing or rolling back, or until a timeout expires.
So, you start a transaction, acquire a lock, check whether there are enough items left, add them to the order, decrease the number of items and commit the transaction:
// assumes: use Doctrine\DBAL\LockMode;
$em->beginTransaction();

try {
    $em->lock($product, LockMode::PESSIMISTIC_WRITE);

    if ($product->getCount() < $numberOfItemsBeingPurchased) {
        throw new NotEnoughItemsLeftInStock();
    }

    $order->addItem($product, $numberOfItemsBeingPurchased);
    $product->decreaseCount($numberOfItemsBeingPurchased);

    $em->flush(); // write the changes before committing
    $em->commit();
} catch (Exception $e) {
    $em->rollback();

    throw $e;
}
I'm suggesting throwing an exception here because two users buying the last item at the same time is an exceptional situation. Of course, you should run some sort of item count check — validation constraints or something else — before you run this code. So if a user has made it past that check, but another user bought the last item between that check and the moment the current user actually completes the purchase, it really is an exceptional situation.
Also note that you should start and end the transaction within a single HTTP request. That is, do not lock a row in one HTTP request, wait for the user to complete the purchase and release the lock only after that. If you want users to be able to keep items in their carts for some time — like real-world carts — use other means for that, such as reserving a product for the user by decreasing the count of items left in stock right away and releasing the reservation after a timeout by adding that number of items back.
There's a complete chapter in the Doctrine 2 documentation about concurrency, which is exactly what you need.
You need to write a transactional custom query and lock down your table for the duration of the query. It's all explained here: Transactions and Concurrency.
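If you end up writing that query yourself, the flow at the SQL level looks roughly like the sketch below. This is only an illustration in plain Java/JDBC, not Doctrine code; the table and column names (product, stock_count) are made up. On databases like MySQL and PostgreSQL, SELECT ... FOR UPDATE is what a pessimistic write lock boils down to.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.sql.DataSource;

// Illustrative only: lock the product row, check the stock, decrement it, commit.
void purchase(DataSource ds, long productId, int quantity) throws SQLException {
    try (Connection conn = ds.getConnection()) {
        conn.setAutoCommit(false);
        try {
            int left;
            // FOR UPDATE blocks other transactions that try to lock the same row
            try (PreparedStatement lock = conn.prepareStatement(
                    "SELECT stock_count FROM product WHERE id = ? FOR UPDATE")) {
                lock.setLong(1, productId);
                try (ResultSet rs = lock.executeQuery()) {
                    if (!rs.next()) {
                        throw new SQLException("Unknown product " + productId);
                    }
                    left = rs.getInt("stock_count");
                }
            }
            if (left < quantity) {
                throw new SQLException("Not enough items left in stock");
            }
            try (PreparedStatement update = conn.prepareStatement(
                    "UPDATE product SET stock_count = stock_count - ? WHERE id = ?")) {
                update.setInt(1, quantity);
                update.setLong(2, productId);
                update.executeUpdate();
            }
            // ... insert the order rows here ...
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        }
    }
}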
Related
I have about 780K items stored in DynamoDB (DDB).
I'm calling the DynamoDBMapper.query(...) method to get all of them.
The result is correct, because I do get all of the items, but it takes about 3 minutes.
From the log, I can see that DynamoDBMapper.query(...) fetches the items page by page, and each page requires an individual query call to DDB that takes about 0.7s.
The items came back in 292 pages, so the total duration is about 0.7 * 292 ≈ 200s, which is unacceptable.
My code is basically like below:
// setup query condition, after filter the items count would be about 780K
DynamoDBQueryExpression<VendorAsinItem> expression = buildFilterExpression(filters, expression);
List<VendorAsinItem> results = new ArrayList<>();

try {
    log.info("yrena:Start query");
    DynamoDBMapperConfig config = getTableNameConfig();
    results = getDynamoDBMapper().query( // get DynamoDBMapper instance and call query method
            VendorAsinItem.class,
            expression,
            config);
} catch (Exception e) {
    log.error("yrena:Error ", e);
}

log.info("yrena:End query. Size:" + results.size());
So how can I get all the items at once, without pagination?
My final goal is to reduce the query duration.
EDIT: Just re-read the title of the question and realized that perhaps I didn't address the question head on: there is no way to retrieve 780,000 items without some pagination, because of a hard limit of 1MB per page.
Long form answer
780,000 items retrieved, in 3 minutes, using 292 pages: that's about 1.62 pages per second.
Take a moment and let that sink in.
Dynamo can return at most 1MB of data per page, so you're presumably transferring around 1.6MB of data per second (enough to saturate a 10 Mbit pipe).
Without further details about (a) the actual size of the items retrieved, (b) the bandwidth of your internet connection, (c) the number of items that get filtered out of the query results, and (d) the provisioned read capacity on the table, I would start by looking at:
what is the network bandwidth between your client and Dynamo/AWS -- if you are not maxing that out, then move on to next;
how much read capacity is provisioned on the table (if you see any throttling on the requests, you may be able to increase RCU on the table to get a speed improvement at a monetary expense)
the efficiency of your query:
if you are applying filters, know that those are applied after the query results are generated, so the query is consuming RCU for items that get filtered out, which also means the query is inefficient
think about whether there are ways you can optimize your queries to access less data
Finally 780,000 items is A LOT for a query -- what percentage of items in the database is that?
Could you create a secondary index that would essentially contain most, or all of that data that you could then simply scan instead of querying?
Unlike a query, a scan can be parallelized, so if your network bandwidth, memory and local compute are large enough, and you're willing to provision enough capacity on the database, you could read 780,000 items significantly faster with a scan than with a query.
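As a sketch of what that could look like with the same DynamoDBMapper the question uses (this reuses the VendorAsinItem class and the getTableNameConfig() helper from the question; the segment count is just an assumption you'd tune):

import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBScanExpression;
import com.amazonaws.services.dynamodbv2.datamodeling.PaginatedParallelScanList;

// Scan the table (or a secondary index) with several segments in parallel.
DynamoDBScanExpression scanExpression = new DynamoDBScanExpression();
int totalSegments = 8; // tune to your bandwidth, memory and provisioned RCU

PaginatedParallelScanList<VendorAsinItem> items = getDynamoDBMapper().parallelScan(
        VendorAsinItem.class,
        scanExpression,
        totalSegments,
        getTableNameConfig()); // same config helper as in the question

// The returned list is lazily loaded: iterating it pulls the remaining pages,
// and the segments are fetched concurrently rather than one page at a time.
log.info("Scanned items: " + items.size());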
A Lambda function gets triggered by SQS messages. The reserved concurrency is set to the maximum, which means I can have many concurrent Lambda executions. Each Lambda reads an SQS message and needs to update a DynamoDB table that holds the sum of message lengths; it's a numeric value that only increases.
Although I have implemented optimistic locking, I still see that the final value doesn't match the actual correct sum. Any thoughts?
here is the code that does the update:
public async Task Update(T item)
{
    using (IDynamoDBContext dbContext = _dataContextFactory.Create())
    {
        T savedItem = await dbContext.LoadAsync(item);

        if (savedItem == null)
        {
            throw new AmazonDynamoDBException("DynamoService.Update: The item does not exist in the Table");
        }

        await dbContext.SaveAsync(item);
    }
}
Best to use DynamoDB Streams here, plus batch writes. Otherwise you will unavoidably get transaction conflicts; there are probably a bunch of such errors sitting in some logs already. You can also check the TransactionConflict CloudWatch metric for your table.
DynamoDB Streams
To perform aggregation, you will need a table with a stream enabled on it. Set MaximumBatchingWindowInSeconds and BatchSize to values which suit your requirements. That is, say you need the table to be accurate within 10 seconds: you would set MaximumBatchingWindowInSeconds to no more than 10. And if you don't want more than 100 items waiting to be aggregated, set BatchSize=100. You will create a Lambda function which will process the items coming into your table, which arrive in the form of:
"TransactItems": [{
"Put": {
"TableName": "protect-your-table",
"Item": {
"id": "123",
"length": 4,
....
You would then iterate over these records, sum up the length attribute, and apply an UpdateItem with an ADD expression to a summary item in another table, which holds statistics calculated from the stream. Note that you may receive duplicate records, which may cause errors in the total; you can handle this in Dynamo by making sure you don't write an item if it exists already, or by using a message deduplication ID.
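A rough sketch of such a stream handler in Java follows. The summary table name, its key and the "totalLength" attribute are assumptions for illustration, not a drop-in implementation:

import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.UpdateItemRequest;
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.DynamodbEvent;
import com.amazonaws.services.lambda.runtime.events.DynamodbEvent.DynamodbStreamRecord;
import java.util.HashMap;
import java.util.Map;

// Sums the "length" attribute of each batch of inserts coming off the stream
// and applies a single atomic ADD to a summary item in a statistics table.
public class LengthAggregator implements RequestHandler<DynamodbEvent, Void> {

    private final AmazonDynamoDB dynamo = AmazonDynamoDBClientBuilder.defaultClient();

    @Override
    public Void handleRequest(DynamodbEvent event, Context context) {
        long batchTotal = 0;
        for (DynamodbStreamRecord record : event.getRecords()) {
            if (!"INSERT".equals(record.getEventName())) {
                continue; // only count newly written items
            }
            // "length" is the numeric attribute from the item shown above
            batchTotal += Long.parseLong(
                    record.getDynamodb().getNewImage().get("length").getN());
        }
        if (batchTotal > 0) {
            Map<String, AttributeValue> key = new HashMap<>();
            key.put("id", new AttributeValue().withS("messageLengthSum")); // assumed summary key
            Map<String, AttributeValue> values = new HashMap<>();
            values.put(":inc", new AttributeValue().withN(Long.toString(batchTotal)));

            // ADD is atomic, so concurrent invocations cannot lose updates
            dynamo.updateItem(new UpdateItemRequest()
                    .withTableName("protect-your-table-stats") // assumed statistics table
                    .withKey(key)
                    .withUpdateExpression("ADD totalLength :inc")
                    .withExpressionAttributeValues(values));
        }
        return null;
    }
}

Because the stream has already batched the work, the summary item receives one write per batch instead of one per message.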
Batching
Make sure you are not processing many tiny messages one at a time; instead batch them together, e.g. configure the Lambda function that reads from SQS so that it can read up to 100 messages at a time and do a batch write. Also set a low concurrency limit on it, so that messages can bank up a little over a couple of seconds.
The reason you want to do this is that you can't actually increment a value in DynamoDB many times a second; it will give you errors and actually slow your processing down. You'll find that your system as a whole performs at a fraction of the cost, is more accurate, and the near-real-time accuracy should be close enough to what you need.
In my Firebase database I have posts, and authenticated users can "like" posts. How can I efficiently get the number of likes a post has received? I know that with MongoDB I can add/remove the user's ID to a list and then use a MongoDB function to get its length very quickly and set that equal to the like count, but I'm not sure how to do that with Firebase. I could also add/remove users to the list and increment a likeCount variable, but that seems like it would cause concurrency issues unless Firebase has a function for that. What functions can I call to best handle this and scale well? Thanks in advance!
You can do both things:
1) Create a votes node with the UID as key and a value to sum up all the votes.
post: {
    // all the data
    likes: {
        $user_1: 1,
        $user_2: -1
    }
}
And then just attach a single-value event (or a value event, depending on whether you want to keep track of changes) and sum up all the children.
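For example, with the Firebase Android SDK in Java, option 1 could look roughly like this (the posts/<postId>/likes path and the postId variable are assumptions matching the layout above):

import com.google.firebase.database.DataSnapshot;
import com.google.firebase.database.DatabaseError;
import com.google.firebase.database.DatabaseReference;
import com.google.firebase.database.FirebaseDatabase;
import com.google.firebase.database.ValueEventListener;

String postId = "..."; // id of the post being displayed
DatabaseReference likesRef = FirebaseDatabase.getInstance()
        .getReference("posts/" + postId + "/likes");

// Read the likes node once and sum the 1 / -1 values of its children.
likesRef.addListenerForSingleValueEvent(new ValueEventListener() {
    @Override
    public void onDataChange(DataSnapshot snapshot) {
        long total = 0;
        for (DataSnapshot child : snapshot.getChildren()) {
            Long vote = child.getValue(Long.class);
            if (vote != null) {
                total += vote; // each child is 1 or -1 per user
            }
        }
        // update the UI with `total`
    }

    @Override
    public void onCancelled(DatabaseError error) {
        // handle the error
    }
});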
2) You can use a transaction block and just save a single value, increasing or decreasing it depending on the votes.
(here is a link where you can find transactions for android,iOS or java)
https://firebase.google.com/docs/database/web/save-data#save_data_as_transactions
post: {
    // all the data
    likes: 2
}
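And a rough sketch of option 2 with the same Android SDK, incrementing the counter in a transaction so concurrent likes don't clobber each other (again, the path and postId are assumptions):

import com.google.firebase.database.DataSnapshot;
import com.google.firebase.database.DatabaseError;
import com.google.firebase.database.DatabaseReference;
import com.google.firebase.database.FirebaseDatabase;
import com.google.firebase.database.MutableData;
import com.google.firebase.database.Transaction;

String postId = "..."; // id of the post being liked
DatabaseReference likeCountRef = FirebaseDatabase.getInstance()
        .getReference("posts/" + postId + "/likes"); // here "likes" is a single number

likeCountRef.runTransaction(new Transaction.Handler() {
    @Override
    public Transaction.Result doTransaction(MutableData currentData) {
        Integer current = currentData.getValue(Integer.class);
        currentData.setValue(current == null ? 1 : current + 1);
        return Transaction.success(currentData); // retried automatically on conflict
    }

    @Override
    public void onComplete(DatabaseError error, boolean committed, DataSnapshot snapshot) {
        // committed is false if the transaction was aborted
    }
});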
It really depends on how much information you want to store and what the user can do once he/she has already voted for a post.
I would recommend using both: keep the per-user likes node so a user can unlike something (as on Facebook), and use the transaction on the counter to keep it scalable, so that if a post gets 1,000,000 likes you don't have to count 1,000,000 children every time someone loads the post.
I have an ASP.NET page that calls a DLL that starts a long-running process to update product information. While the process is running, I want to give the user constant updates on which product the process is on and the status of each product. I've been having trouble getting this to work. I append information to a log text file, and I was thinking I'd redirect to a new page and have that page use JavaScript to read from the text file every few seconds. My question is: has anyone else tried this, and does it work fairly well?
I would use Ajax and poll that text file for updates.
Make the long process so that it updates some state about the progress - status, last product processed, etc.
Once started, you might want to update the progress only on, say, every 50th item processed, to spare resources on the ASP.NET side (you might not; it depends on what you want).
If the processing is associated with the current session, you might want to put the state in the session. If it is not - e.g. global - put it in some global state.
Then poll the state once in a while through Ajax from JavaScript, using e.g. JSON, and update the UI accordingly.
Make sure you use proper locking when accessing the shared state.
Also, try to keep the state small (store only what is absolutely required to get the data for the js gui).
Rather than outputting the status to a text-based log file (or in addition to the log file; you can use both), you could write the status to a database table. For example, structure the table like so:
Queue:
    id (Primary Key)
    user_id (Foreign Key)
    product_id (Foreign Key)  // if available
    batch_size       // total number of products in the batch this message was generated from
    batch_remaining  // number remaining in this batch
    message
So now you have a queue. Now when a user goes on a particular page, you can do one of two things:
Grab some relevant data from the Queue and display it in the header. This will be updated every time the page is refreshed.
Create an AJAX handler that will poll a web service to grab the data at intervals. This way you can have the status update every 15 or 20 seconds, even if the user is on the same page without refreshing.
You can delete the rows from the Queue after they've been sent, or if you want to retain that data, add another column called sent and set it to true once the user has seen that data.
I have a very large (millions of rows) SQL table which represents name-value pairs (one column for the name of a property, the other for its value). In my ASP.NET web application I have to populate a control with the distinct values available in the name column. This set of values is usually not bigger than 100, most likely around 20. Running the query
SELECT DISTINCT name FROM nameValueTable
can take a significant time on this large table (even with the proper indexing etc.). I especially don't want to pay this penalty every time I load this web control.
So caching this set of names should be the right answer. My question is: how do I promptly update the cached set when a new name appears in the table? I looked into the SQL Server 2005 Query Notification feature, but the table gets updated frequently, and only very seldom does an update introduce an actual new distinct name. The notifications would flow in all the time, and the web server would probably waste more time handling them than it saves by caching.
I would like to find a way to balance the time used to query the data, with the delay until the name set is updated.
Any ideas on how to efficiently manage this cache?
A little normalization might help. Break the property names out into a new table with an int ID, and have the original table reference it by FK. You can then query the new table to get the complete list, which will be really fast.
Figuring out your pattern of usage will help you come up with the right balance.
How often are new values added? Are new values always unique? Does the workload consist mostly of updates? Do deletes occur?
One approach may be to have a SQL Server insert trigger that checks the cache table to see whether its key is already there and, if not, adds it.
Add a unique increasing sequence MySeq to your table. You may want to try and cluster on MySeq instead of your current primary key so that the DB can build a small set then sort it.
SELECT DISTINCT name FROM nameValueTable Where MySeq >= ?;
Set ? to the highest MySeq value your cache saw when it last updated.
You will always have a lag between your cache and the DB, so if this is a problem, you need to rethink the flow of the application. You could try making all requests flow through your cache/application if you manage the data:
requests --> cache --> db
If you're not allowed to change the actual structure of this huge table (for example, due to huge numbers of reports relying on it), you could create a holding table of these 20 values and query against that. Then, on the huge table, have a trigger that fires on an INSERT or UPDATE, checks to see if the new NAME value is in the holding table, and if not, adds it.
I don't know the specifics of .NET, but I would pass all the update requests through the cache. Are all the update requests done by your ASP.NET web application? Then you could make a Proxy object for your database and have all the requests directed to it. Taking into consideration that your database only has key-value pairs, it is easy to use a Map as a cache in the Proxy.
Specifically, in pseudocode, all the requests would be as following:
// the client invokes cache.get(key)
if (cacheMap.has(key)) {
    return cacheMap.get(key);
} else {
    cacheMap.put(key, database.retrieve(key));
    return cacheMap.get(key);
}

// the client invokes cache.put(key, value)
cacheMap.put(key, value);
if (writeThrough) {
    database.put(key, value);
}
Also, in the background you could have an evictor thread which ensures that the cache does not grow too big. In your scenario, where you have a set of values that is frequently accessed, I would use an eviction strategy based on time-to-idle: if an item is idle for more than a set amount of time, it is evicted. This ensures that frequently accessed values remain in the cache. Also, if your cache is not write-through, the evictor needs to write entries back to the database on eviction.
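If you roll this yourself rather than using a caching library (Guava's caches and Ehcache both offer time-to-idle expiry out of the box), a minimal Java sketch of the read-through map plus an idle evictor could look like this; the names and the timeout are illustrative:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Minimal read-through cache with a time-to-idle evictor.
// `loadFromDb` stands in for the database lookup.
public class ReadThroughCache<K, V> {

    private static final long TIME_TO_IDLE_MS = 5 * 60 * 1000; // 5 minutes, illustrative

    private final Map<K, Entry<V>> map = new ConcurrentHashMap<>();
    private final Function<K, V> loadFromDb;

    private static final class Entry<V> {
        final V value;
        volatile long lastAccess = System.currentTimeMillis();
        Entry(V value) { this.value = value; }
    }

    public ReadThroughCache(Function<K, V> loadFromDb) {
        this.loadFromDb = loadFromDb;
    }

    public V get(K key) {
        // load from the database only on a cache miss
        Entry<V> e = map.computeIfAbsent(key, k -> new Entry<>(loadFromDb.apply(k)));
        e.lastAccess = System.currentTimeMillis();
        return e.value;
    }

    // Call this periodically (e.g. from a ScheduledExecutorService);
    // it plays the role of the evictor thread described above.
    public void evictIdle() {
        long cutoff = System.currentTimeMillis() - TIME_TO_IDLE_MS;
        map.entrySet().removeIf(en -> en.getValue().lastAccess < cutoff);
    }
}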
Hope it helps :)
-- Flaviu Cipcigan