The number of AnyOfferChanged notifications varies over time, and there is no specific way to read multiple notifications together.
So I am reading them one by one and saving some of the information to a SQL Server database. This takes so long that I can never finish reading all the notifications.
What is the best possible way to achieve this?
Here's what I did: I started by clearing out the queue. Then I started my Windows service, which polled the queue every few seconds. I think I pulled back 10 messages at a time. I would get a total count of messages and then spin up a number of threads that could handle the number of messages I had waiting. One by one, I read each message, added it to my SQL database, then deleted the message from SQS.
Over time, I got a better feel for how many threads to spin up and how often to poll the queue. As long as my service was running, I would maintain just a handful of SQS messages in the queue at a time and would quickly read and process them. Occasionally, due to bad programming (yeah, it happens), my service would crash and I wouldn't know about it. Tens of thousands of messages would queue up, and I would put my service into "crisis" mode, which polled at an increased rate and essentially maxed out the number of calls I could make to SQS. Usually my service would catch up within a few hours, and then I would increase the polling interval again. Sometimes, though, I would just dump the queue and start over, as I could have potentially hundreds of price changes on a single SKU and didn't want to waste the processing time going through them. But most of the time, things ran smoothly.
Why can't you read more than one notification together? Like I said, I believe I read 10 at a time on each thread. Once I got the 10 messages, I processed them in a loop and dumped them into a SQL database. Once the 10 were processed, I sent a request to SQS to delete them.
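For illustration, the receive-10/process/batch-delete loop might look like the sketch below with the AWS SDK for Java v2. The queue URL and the saveToDatabase helper are placeholders, not the code I actually ran.

```java
import java.util.List;
import java.util.stream.Collectors;

import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.DeleteMessageBatchRequest;
import software.amazon.awssdk.services.sqs.model.DeleteMessageBatchRequestEntry;
import software.amazon.awssdk.services.sqs.model.Message;
import software.amazon.awssdk.services.sqs.model.ReceiveMessageRequest;

public class NotificationReader {
    public static void drainOnce(SqsClient sqs, String queueUrl) {
        // SQS returns at most 10 messages per receive call.
        List<Message> messages = sqs.receiveMessage(ReceiveMessageRequest.builder()
                .queueUrl(queueUrl)
                .maxNumberOfMessages(10)
                .waitTimeSeconds(20)          // long polling keeps empty polls cheap
                .build())
            .messages();

        for (Message m : messages) {
            saveToDatabase(m.body());         // placeholder: parse and insert into SQL
        }

        if (!messages.isEmpty()) {
            // Delete the whole batch in a single call instead of one delete per message.
            List<DeleteMessageBatchRequestEntry> entries = messages.stream()
                .map(m -> DeleteMessageBatchRequestEntry.builder()
                        .id(m.messageId())
                        .receiptHandle(m.receiptHandle())
                        .build())
                .collect(Collectors.toList());
            sqs.deleteMessageBatch(DeleteMessageBatchRequest.builder()
                .queueUrl(queueUrl)
                .entries(entries)
                .build());
        }
    }

    private static void saveToDatabase(String notificationJson) {
        // placeholder for the SQL Server insert
    }
}
```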
I ran this for several years on an account with over 10,000 SKUs. We had up-to-the-minute price change notifications on all our products and could instantly reprice and update Amazon if needed.
How to avoid latency in Event Hub consumer data?
My architecture (data flow): IoT Hub -> Event Hub -> Blob Storage (no deviation between the IoT Hub packet and the Blob Storage JSON packet)
Deviation occurs only on the consumer application side (the listener receives with a delay of 30-50 seconds)
Azure configuration: 4 partitions with a standard S2 tier subscription.
Publisher: 3000 packets per minute.
My question: Blob Storage has the correct data without deviation, so why does the listener receive it with latency, and how can I overcome this?
I tried EventProcessorClient with the respective handlers, as suggested in the GitHub sample code. It works fine without errors, but with huge latency. I tried EventHubProducerClient as well; still the same latency issue.
I can't speak to how IoT Hub manages data internally or what its expected latency is between IoT data being received and IoT Hub itself publishing to Event Hubs.
With respect to Event Hubs, you should expect to see individual events with varying degrees of end-to-end latency. Event Hubs is optimized for throughput (the number of events that flow through the system) and not for the latency of an individual event (the amount of time it takes for it to flow from publisher to consumer).
What I'd suggest monitoring is the backlog of events available to be read in a partition. If there are ample events already available in the partition and you’re not seeing them flow consistently through the processor as fast as you’re able to process them, that’s something we should look to tune.
Additional Event Hubs context
When an event is published - by IoT Hub or another producer - the operation completes when the service acknowledges receipt of the event. At this point, the service has not yet committed the event to a partition, and it is not available to be read. The time that it takes for an event to become available for reading varies and has no SLA associated with it. Most often it's milliseconds, but it can be several seconds in some scenarios - for example, if a partition is moving between nodes.
Another thing to keep in mind is that networks are inherently unreliable. The Event Hubs consumer types, including EventProcessorClient, are resilient to intermittent failures and will retry or recover, which will sometimes entail creating a new connection, opening a new link, performing authorization, and positioning the reader. This is also the case when scaling up/down and partition ownership is moving around. That process may take a bit of time and varies depending on the environment.
Finally, it's important to note that overall throughput is also limited by the time that it takes for you to process events. For a given partition, your handler is invoked and the processor will wait for it to complete before it sends any more events for that partition. If it takes 30 seconds for your application to process an event, that partition will only see 2 events per minute flow through.
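To make that concrete, here is a minimal Java sketch of an EventProcessorClient whose handler does as little work as possible, so slow per-event processing doesn't gate the partition. It assumes the azure-messaging-eventhubs and azure-messaging-eventhubs-checkpointstore-blob packages; the connection strings, names, and checkpoint cadence are placeholders.

```java
import com.azure.messaging.eventhubs.EventHubClientBuilder;
import com.azure.messaging.eventhubs.EventProcessorClient;
import com.azure.messaging.eventhubs.EventProcessorClientBuilder;
import com.azure.messaging.eventhubs.checkpointstore.blob.BlobCheckpointStore;
import com.azure.storage.blob.BlobContainerAsyncClient;
import com.azure.storage.blob.BlobContainerClientBuilder;

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class FastHandlerProcessor {
    public static void main(String[] args) {
        // Checkpoint store backed by Blob Storage (placeholder connection string/container).
        BlobContainerAsyncClient container = new BlobContainerClientBuilder()
                .connectionString("<storage-connection-string>")
                .containerName("checkpoints")
                .buildAsyncClient();

        // Bounded hand-off queue: slow downstream work happens on separate threads,
        // so the processor's event handler returns quickly.
        BlockingQueue<String> work = new LinkedBlockingQueue<>(10_000);

        EventProcessorClient processor = new EventProcessorClientBuilder()
                .connectionString("<event-hubs-connection-string>", "<event-hub-name>")
                .consumerGroup(EventHubClientBuilder.DEFAULT_CONSUMER_GROUP_NAME)
                .checkpointStore(new BlobCheckpointStore(container))
                .processEvent(context -> {
                    // Keep this cheap: enqueue only. Blocking here delays the next
                    // event for this partition. offer() drops if the buffer is full.
                    work.offer(context.getEventData().getBodyAsString());
                    // Checkpoint occasionally, not on every event. Note: checkpointing
                    // before downstream work finishes trades replay safety for throughput.
                    if (context.getEventData().getSequenceNumber() % 100 == 0) {
                        context.updateCheckpoint();
                    }
                })
                .processError(err -> System.err.println(
                        "Error on partition " + err.getPartitionContext().getPartitionId()
                                + ": " + err.getThrowable()))
                .buildEventProcessorClient();

        processor.start();
        // ... worker threads drain 'work' and do the slow processing ...
    }
}
```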
We have a bus reservation system running in GKE in which we handle the creation of reservations with different threads. Because of that, CRUD Java methods can sometimes run simultaneously against the same bus, and only the LAST of the simultaneous updates is saved in our DB (the other simultaneous updates are lost).
Even though the probability is low (the simultaneous updates need to be really close, within 1-2 seconds), we need to avoid this. My question is about how to approach the solution:
Lock the bus object and return an error to the other simultaneous requests (see the optimistic-locking sketch after this list)
An in-memory map or a Redis cache to track the bus requests
Use GCP Pub/Sub, Kafka or RabbitMQ as a queue system.
Try to focus the efforts on reducing the simultaneous time window (reduce from 1-2 seconds down to milliseconds)
Others?
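For reference, option 1 is typically implemented as optimistic locking rather than an explicit lock. A minimal JPA sketch, with invented entity and field names and assuming the standard javax.persistence API, might look like this:

```java
import javax.persistence.Entity;
import javax.persistence.EntityManager;
import javax.persistence.Id;
import javax.persistence.OptimisticLockException;
import javax.persistence.Version;

@Entity
public class Bus {
    @Id
    private Long id;

    @Version            // JPA increments this on every successful update
    private long version;

    private int seatsAvailable;

    public void reserveSeat() {
        if (seatsAvailable <= 0) {
            throw new IllegalStateException("No seats left");
        }
        seatsAvailable--;
    }
}

// In the service layer, inside a transaction:
//
//   try {
//       Bus bus = entityManager.find(Bus.class, busId);
//       bus.reserveSeat();
//       entityManager.flush();          // fails if another transaction updated the row
//   } catch (OptimisticLockException e) {
//       // the losing simultaneous request gets an error (or a retry)
//   }
```

The second of two near-simultaneous updates fails because the row's version no longer matches the one it read, so only one update is silently lost no more.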
Also, we are worried that GKE request-handling scalability may become an issue in the future. If we manage a relatively large number of buses, would we need to implement a queue system between the client and the server, or will the GKE load balancer and Ambassador already manage it for us? If we do need a queue system in the future, could it also be used for the collision problem we are facing now?
Finally, the reservation requests from the client often take a while. Therefore, we are changing the requests to be handled asynchronously, with long polling from the client to check the task status. Could we link this solution to the current problem, for example by using the Redis cache or the queue system to track the task status? Or should we try to keep the requests synchronous and focus on reducing the processing time (which may be quite difficult)?
I'm running BizTalk 2006, and I have an orchestration that receives a series of messages (orders) that are correlated on BTS.MessageType. On my delay shape, I check the time until midnight, which is the batch cutoff. I'm getting occasional instances where I receive a message after the loop ends, and this creates zombie messages. I still need these messages to be processed, but in a new instance of the orchestration. I need some ideas on how to handle this gracefully.
One option would be to correlate on the date (in addition to BTS.MessageType)
You would have to create a pipeline component that promotes the date without the time. But there could be a time window where messages would go "randomly" either to the old or the new instance (for example, if you have multiple BizTalk servers with slightly different times, or if the system clock is resynchronized with an NTP source). To be safe, wait a few minutes before ending the previous day's instance.
If that window of overlap between the old and new instances is a problem, you should instead correlate on another value that changes only once a day, such as a Guid stored in a database and promoted by a pipeline component.
Otherwise, I've successfully used your "hackish" solution in past projects, as long as you can tolerate a small window every day where messages are queued and not processed immediately for a few minutes. In my case it was fine because messages were produced by American users during their work day and sent by FTP or MSMQ. However, if you have international users who send messages via web services, there may be no time of day when you are unlikely to receive anything, and the web services won't be able to queue the messages for later processing.
I need to log to the database every call to my Web API.
Now of course I don't want to go to my database on every call.
So let's say I have a dictionary or hash table object in my cache, and every 10,000 records I go to the database.
I still don't want every 10,000th user to have to wait for this operation.
And I can't start a different thread for long operations, since the application pool can be recycled at basically any time.
What is the best solution for this scenario?
Thanks
I would argue that your view of durability is rather inconsistent. Your cache of 10,000 objects could also be lost at any time due to an app pool recycle or server crash.
But to the original question of how to perform a large operation without causing the user to wait:
Put constraints on app pool recycling and deal with the potential data loss.
Periodically dump the cached messages to a Windows service for further processing. This is still not 100% guaranteed to preserve data, e.g. the service/server could crash.
Use a message queue (MSMQ), possibly with WCF. A message queue can persist to disk, so this can be considered reasonably reliable.
Message Queuing (MSMQ) technology enables applications running at different times to communicate across heterogeneous networks and systems that may be temporarily offline. Applications send messages to queues and read messages from queues.
Message Queuing provides guaranteed message delivery, efficient routing, security, and priority-based messaging. It can be used to implement solutions to both asynchronous and synchronous scenarios requiring high performance.
Taking this a step further...
Depending on your requirements and/or environment, you could probably eliminate your cache, and write all messages immediately (and rapidly) to a message queue and not worry about performance loss or a large write operation.
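The exact API depends on your stack; the suggestion above is MSMQ/WCF. Just to illustrate the shape of "enqueue the log record instead of writing to the database", here is a sketch using JMS with ActiveMQ, where the broker URL, queue name, and payload are placeholders and not part of the original suggestion:

```java
import javax.jms.Connection;
import javax.jms.DeliveryMode;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;
import org.apache.activemq.ActiveMQConnectionFactory;

public class ApiCallLogger implements AutoCloseable {
    private final Connection connection;
    private final Session session;
    private final MessageProducer producer;

    public ApiCallLogger(String brokerUrl) throws Exception {
        ActiveMQConnectionFactory factory = new ActiveMQConnectionFactory(brokerUrl);
        connection = factory.createConnection();
        connection.start();
        session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        Queue queue = session.createQueue("api.call.log");
        producer = session.createProducer(queue);
        // Persist messages to disk so a recycle/crash does not lose them.
        producer.setDeliveryMode(DeliveryMode.PERSISTENT);
    }

    // Called once per API request; enqueuing is fast, and a separate consumer
    // drains the queue into the database in batches at its own pace.
    // Note: a JMS Session is single-threaded; a real service would use one
    // per worker thread or a pooled producer.
    public void log(String jsonPayload) throws Exception {
        TextMessage message = session.createTextMessage(jsonPayload);
        producer.send(message);
    }

    @Override
    public void close() throws Exception {
        connection.close();
    }
}
```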
Say I have a multi-step, asynchronous process with these restrictions:
Individual steps can be performed by any worker
Steps must be performed in-order
The approach I'm considering:
Insert a db row that represents the entire process, with a "Steps completed" column to keep track of the progress.
Subscribe to a queue that will receive a message when the entire process is done.
Upon completion of each step, update the db row and queue the next step in the process.
After the last step is completed, queue the "process is complete" message.
Delete the db row.
Thoughts? Pitfalls? Smarter ways to do it?
I've built a system very similar to what you've described in a large, task-intensive document processing system, and have had to live with both the pros and the cons for the last 7 years now. Your approach is solid and workable, but I see some drawbacks:
Potentially vulnerable to state change (i.e., if the process inputs change before all steps are queued, the later steps could have inputs that are inconsistent with the earlier steps)
More infrastructural than you'd like, involving both a DB and a queue = more points of failure, harder to set up, more documentation required = doesn't quite feel right
How do you keep multiple workers from acting on the same step concurrently? In other words, if the DB row says 4 steps are completed, how does a worker process know whether it can take #5 or not? Doesn't it need to know whether another process is already working on it? One way or another (DB or MQ), you need to include additional state for locking (see the sketch after this list).
Your example is robust to failure, but doesn't address concurrency. When you add state to address concurrency, then failure handling becomes a serious problem. For example, a process takes step 5, and then puts the DB row into "Working" state. Then when that process fails, step 5 is stuck in "Working" state.
Your orchestrator is a bit heavy, as it is doing a lot of synchronous DB operations, and I would worry that it might not scale as well as the rest of the architecture, since there can be only one of it. This depends on how long-running your steps are compared to a database transaction; it would probably only become an issue at very massive scale.
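To make the locking point concrete, claiming step #5 usually boils down to a conditional update that only one worker can win. A rough JDBC sketch, with invented table and column names:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;

public class StepClaimer {
    // Tries to claim step 'nextStep' of a process. Only one worker can win:
    // the UPDATE matches only if the row still shows the previous step completed
    // and nobody else is marked as working on it.
    public static boolean tryClaimStep(Connection db, long processId, int nextStep) throws Exception {
        String sql =
            "UPDATE process_state " +
            "   SET state = 'WORKING', current_step = ? " +
            " WHERE id = ? AND steps_completed = ? AND state = 'IDLE'";
        try (PreparedStatement stmt = db.prepareStatement(sql)) {
            stmt.setInt(1, nextStep);
            stmt.setLong(2, processId);
            stmt.setInt(3, nextStep - 1);
            return stmt.executeUpdate() == 1;   // 1 row updated = we won the claim
        }
    }
}
```

Which immediately raises the failure question from the previous point: if the winner then crashes, the row is stuck in 'WORKING' until something times it out.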
If I had it to do over again, I would definitely push even more of the orchestration onto the worker processes. So, the orchestration code is common and could be called by any worker process, but I would keep the central, controlling process as light as possible. I would also use only message queues and not any database to keep the architecture simple and less synchronous.
I would create an exchange with 2 queues: IN and WIP (work in progress). A rough worker sketch follows the numbered steps below.
The central process is responsible for subscribing to process requests, and checking the WIP queue for timed out steps.
1) When the central process receives a request for a given process (X), it invokes the orchestration code and loads the first task (X1) into the IN queue
2) The first available worker process (P1) transactionally dequeues X1, and enqueues it into the WIP queue, with a conservative time-to-live (TTL) timeout value. This dequeueing is atomic, and there are no other X tasks in IN, so no second process can work on an X task.
3) If P1 terminates suddenly, no architecture on earth can save this process except for a timeout. At the end of the timeout period, the central process will find the timed out X1 in WIP, and will transactionally dequeue X1 from WIP and enqueue it back into IN, providing the appropriate notifications.
4) If P1 terminates abnormally but gracefully, then the worker process will transactionally dequeue X1 from WIP and enqueue it back into IN, providing the appropriate notifications. Depending on the exception, the worker process could also choose to reset the TTL and retry the step.
5) If P1 hangs indefinitely, or exceeds its TTL, same result as #3. The central process handles it, and presumably the worker process will at some point be recycled--or the rule could be to recycle the worker process anytime there's a timeout.
6) If P1 succeeds, then the worker process will determine the next step, either X2 or X-done. If the next step is X2, then the worker process will transactionally dequeue X1 from WIP and enqueue X2 into IN. If the next step is X-done, then the processing is complete, and the appropriate action can be taken; perhaps this would be enqueueing X-done into IN for subsequent processing by the orchestrator.
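As a rough illustration only: with the RabbitMQ Java client, the worker side can be approximated by leaning on manual acknowledgements instead of an explicit WIP queue, since an unacknowledged message is redelivered if the worker dies, which stands in for the WIP timeout in the steps above. The queue name and the doStep helper are placeholders.

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.DeliverCallback;
import com.rabbitmq.client.MessageProperties;

import java.nio.charset.StandardCharsets;

public class StepWorker {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");
        Connection connection = factory.newConnection();
        Channel channel = connection.createChannel();

        channel.queueDeclare("IN", /*durable*/ true, false, false, null);
        channel.basicQos(1);   // take one task at a time

        DeliverCallback onTask = (consumerTag, delivery) -> {
            String task = new String(delivery.getBody(), StandardCharsets.UTF_8); // e.g. "X1"
            String next = doStep(task);   // placeholder: runs the step, returns "X2" or "X-done"

            // Queue the next step (or the completion marker) before acknowledging.
            channel.basicPublish("", "IN", MessageProperties.PERSISTENT_TEXT_PLAIN,
                    next.getBytes(StandardCharsets.UTF_8));

            // Only now remove the current task. If the worker had died before this
            // point, RabbitMQ would redeliver it (the stand-in for the WIP timeout).
            channel.basicAck(delivery.getEnvelope().getDeliveryTag(), false);
        };

        channel.basicConsume("IN", /*autoAck*/ false, onTask, consumerTag -> { });
    }

    private static String doStep(String task) {
        // placeholder for the real step logic
        return task.equals("X1") ? "X2" : "X-done";
    }
}
```

Note that this is weaker than the transactional dequeue/enqueue described in the steps: a crash between publishing the next step and acknowledging the current one can produce a duplicate task.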
The benefits of my suggested approach are:
Contention between worker processes is eliminated (only one worker can hold a given task at a time)
All possible failure scenarios (crash, exception, hang, and success) are handled
Simple architecture can be completely implemented with RabbitMQ and no database, which makes it more scalable
Since workers handle determining and enqueueing the next step, there is a more lightweight orchestrator, leading to a more scalable system
The only real drawback is that it is potentially vulnerable to state change, but often this is not a cause for concern. Only you can know whether this would be an issue in your system.
My final thought on this is: you should have a good reason for this orchestration. After all, if process P1 finishes task X1 and now it is time for some process to work on next task X2, it seems P1 would be a very good candidate, as it just finished X1 and is now available. By that logic, a process should just gun through all the steps until completion--why mix and match processes if the tasks need to be done serially? The only async boundary really would be between the client and the worker process. But I will assume that you have a good reason to do this, for example, the processes can run on different and/or resource-specialized machines.