When I connect to the console of OpenStack instances over noVNC, the session is automatically disconnected after a period without any interaction. Is there any way to disable this behavior or increase the timeout?
We have a bus reservation system running in GKE in which we handle the creation of reservations on different threads. Because of that, CRUD Java methods can sometimes run simultaneously against the same bus, and only the LAST of the simultaneous updates ends up saved in our DB (the other simultaneous updates are lost).
Even if the probability is low (the simultaneous updates need to be really close, within 1-2 seconds), we need to avoid this. My question is about how to address the problem:
Lock the bus object and return an error to the other simultaneous requests (see the sketch after this list)
An in-memory map or a Redis cache to track the bus requests
Use GCP Pub/Sub, Kafka or RabbitMQ as a queue system.
Try to focus the efforts on reducing the simultaneous time window (from 1-2 seconds down to milliseconds)
Others?
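For illustration, option 1 could look roughly like the sketch below, which uses JPA optimistic locking to detect the collision and reject the losing request. This assumes a Spring Data JPA setup; Bus, BusRepository and ReservationService are simplified, hypothetical names.

    import javax.persistence.Entity;
    import javax.persistence.Id;
    import javax.persistence.Version;

    import org.springframework.data.jpa.repository.JpaRepository;
    import org.springframework.stereotype.Service;
    import org.springframework.transaction.annotation.Transactional;

    @Entity
    public class Bus {
        @Id
        private Long id;

        private int seatsAvailable;

        // JPA increments this column on every successful update. A concurrent
        // transaction that read the old version fails on commit instead of
        // silently overwriting the first update.
        @Version
        private long version;

        public void reserveSeat() {
            if (seatsAvailable <= 0) {
                throw new IllegalStateException("No seats left on bus " + id);
            }
            seatsAvailable--;
        }
    }

    interface BusRepository extends JpaRepository<Bus, Long> {
    }

    @Service
    class ReservationService {
        private final BusRepository busRepository;

        ReservationService(BusRepository busRepository) {
            this.busRepository = busRepository;
        }

        @Transactional
        public void reserve(Long busId) {
            Bus bus = busRepository.findById(busId).orElseThrow();
            bus.reserveSeat();
            // On commit, the provider issues UPDATE ... WHERE id = ? AND version = ?.
            // If another request committed first, this throws an
            // OptimisticLockingFailureException, which can be mapped to HTTP 409
            // so the losing client retries or shows an error.
        }
    }

A pessimistic variant (@Lock(LockModeType.PESSIMISTIC_WRITE) on the repository query) would instead block the second request until the first commits, at the cost of holding a row lock.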
Also, we are worried that the scalability of request handling in GKE may become an issue in the future. If we manage a relatively higher number of buses, would we need to implement a queue system between the client and the server, or will the GKE load balancer and Ambassador already manage that for us? If we do need a queue system in the future, could it also be used for the collision problem we are facing now?
Last, the reservation requests from the client often take a while, so we are changing the requests to be handled asynchronously, with long polling from the client to learn the task status. Could we link that solution to the current problem, for example by using the Redis cache or the queue system to track the task status? Or should we try to keep the requests synchronous and focus on reducing the processing time (which may be quite difficult)?
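If we go the asynchronous route, the same Redis instance could also hold the task status that the long-polling endpoint reads. A rough sketch, assuming Spring Data Redis; the key layout and the TaskStatusStore name are just illustrative.

    import java.time.Duration;

    import org.springframework.data.redis.core.StringRedisTemplate;
    import org.springframework.stereotype.Component;

    @Component
    public class TaskStatusStore {
        private final StringRedisTemplate redis;

        public TaskStatusStore(StringRedisTemplate redis) {
            this.redis = redis;
        }

        // Called by the worker as the reservation task progresses,
        // e.g. PENDING -> PROCESSING -> DONE or FAILED.
        public void setStatus(String taskId, String status) {
            redis.opsForValue().set("task:" + taskId, status, Duration.ofHours(1));
        }

        // Called by the long-polling endpoint; null means an unknown or expired task.
        public String getStatus(String taskId) {
            return redis.opsForValue().get("task:" + taskId);
        }
    }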
We are using Cosmos DB SDK version 2.9.2 to perform document CRUD operations. Usually the end-to-end P95 latency is 20 ms, but sometimes the latency exceeds 1000 ms. The high-latency periods last from 10 hours to a day. The collection is not throttling.
We have gathered some background information from:
https://icm.ad.msft.net/imp/v3/incidents/details/171243015/home
https://icm.ad.msft.net/imp/v3/incidents/details/168242283/home
There are some diagnostics strings in the tickets.
We know that the client maintains a cache of the mapping between logical partitions and physical replica addresses. This mapping may be outdated because of replica movement or outages, so the client tries to read from the second/third replica. However, this retry has a significant impact on end-to-end latency. We also observe that the high latency/timeouts can last for several hours, even days. I expect there is some mechanism for refreshing the mapping cache in the client, but it seems the client only stops visiting more than one replica after we redeploy our service.
Here are my questions:
How can the client tell whether it is unable to connect to a certain replica? Will the client wait until a timeout, or will the server tell the client that the replica is unavailable?
Under which conditions will the mapping cache be refreshed? We are using Session consistency and TCP mode.
Will restarting our service force the cache to be refreshed, or does refreshing only happen when the machine restarts?
When we find there is a replica outage, is there any way to mitigate it quickly?
What operations are performed (Document CRUD or query)?
And what are the observed latencies and frequencies? Also, please check whether the collection is throttling (with a custom throttling policy).
The client does manage some metadata and handles its staleness efficiently, within SLA bounds.
Can you please create a support ticket with the account details and the 'RequestDiagnostics', and we shall look into it.
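If it helps while preparing the ticket, the per-request diagnostics can be captured roughly as below, assuming the 2.9.2 in question is the Java async SDK (com.microsoft.azure:azure-cosmosdb); the accessor name is from memory, so please verify it against your SDK version.

    import com.microsoft.azure.cosmosdb.Document;
    import com.microsoft.azure.cosmosdb.ResourceResponse;
    import com.microsoft.azure.cosmosdb.rx.AsyncDocumentClient;

    public class DiagnosticsLogger {

        // Read a document and log the diagnostics string for slow requests,
        // so it can be attached to the support ticket.
        public static void readWithDiagnostics(AsyncDocumentClient client, String docLink) {
            long start = System.currentTimeMillis();
            ResourceResponse<Document> response =
                    client.readDocument(docLink, null).toBlocking().single();
            long elapsedMs = System.currentTimeMillis() - start;
            if (elapsedMs > 1000) {
                // The diagnostics string records which replica addresses were
                // contacted, retries, and per-stage timings for this request.
                System.out.println(response.getRequestDiagnosticsString());
            }
        }
    }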
We have a replicated cluster cache setup with two instances, and everything runs well when both instances are online. We are using Community Edition 4.8.
When we take an instance offline, cache management becomes very slow; even stopping and starting the cache from the NCache Manager GUI takes a very long time and then shows a message stating that there is an instance that is unreachable.
Also, when trying to fetch data from the cache or add data to it, we get an operation timeout exception, and there is no response from the single instance that is still running.
From what I understand, this scenario should be handled by the cache service itself since it is replicated, and it should handle the failure of an instance going offline.
Thank you,
I would like to explain the cause of the slowness in your application when one of the server nodes is removed from the cache cluster.
Whenever a node is removed from the cache cluster, the surviving node(s) go into a recovery process and try to re-establish the connection with the downed server node. By default the connection retry value is set to "2", which means the surviving nodes try to reconnect with the downed node twice; after the reconnection has failed, the cluster considers the downed server offline and starts handling requests as before. Each reconnection attempt can take up to 90 seconds, which is the default TCP/IP timeout interval, so with the connection retry set to "2" the recovery process can take up to around 200 seconds. Your application (or NCache Manager calls) can experience slowness or request timeouts during this 2-3 minute window while the cluster is in recovery mode, but once the recovery process is finished, the application should start working without any issues. If the slowness or request timeouts last more than a few minutes, something other than this recovery process is likely the cause and should be investigated separately.
The connection retry value can be changed in the NCache "Config.ncconf" file. Increasing the number of connection retries means the cluster spends more time in the recovery process. The purpose of this feature is that if there is a network glitch in the environment and the server nodes lose their connection with each other, the servers are reconnected automatically by this recovery process. This is why it is recommended to keep the connection retry value set to at least 1.
I'm running BizTalk 2006, and I have an orchestration that receives a series of messages (orders) correlated on BTS.MessageType. In my delay shape, I check the time until midnight, which is the batch cutoff. I occasionally receive a message after the loop ends, and this creates zombie messages. I still need these messages to be processed, but in a new instance of the orchestration. I need some ideas on how to handle this gracefully.
One option would be to correlate on the date (in addition to BTS.MessageType)
You would have to create a pipeline component that promotes the date without the time. There could still be a time window in which messages go "randomly" either to the old or the new instance (for example if you have multiple BizTalk servers with slightly different times, or if the system clock is resynchronized with an NTP source). To be safe, wait a few minutes before ending the previous day's instance.
If that window of overlap between the old and new instances is a problem, you should instead correlate on another value that changes only once a day, such as a Guid stored in a database and promoted by a pipeline component.
Otherwise, I've successfully used your "hackish" solution in past projects, as long as you can tolerate a small window every day where messages are queued and not processed immediately for a few minutes. In my case that was fine because the messages were produced by American users during their work day and sent by FTP or MSMQ. However, if you have international users that send messages via web services, there may be no time of day when you are unlikely to receive anything, and the web services won't be able to queue the messages for later processing.
I have an EAR app running on WAS, connecting to an Oracle DB to run a stored procedure.
The stored procedure takes a long time to run, so the firewall between WAS and the Oracle server closes the connection after 30 minutes. Is there an Oracle configuration that will keep the connection open through the firewall? Increasing the firewall timeout is not an option here.
If the firewall is closing the connection because of inactivity, you can set the sqlnet.ora parameter sqlnet.expire_time on the server to ping the client every N minutes. Normally, this is used for dead connection detection so that the server can determine that a client application died with an open connection. But that may work to prevent the firewall from deciding that the connection has been inactive too long. If, on the other hand, the firewall simply disallows connections that last longer than 30 minutes regardless of inactivity, this setting won't have any impact.
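For example, a ten-minute keep-alive would be a single line in sqlnet.ora on the database server (the value is in minutes):

    # sqlnet.ora on the database server; sends a probe to the client every 10 minutes
    SQLNET.EXPIRE_TIME = 10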
Architecturally, do you really want your application making a 30 minute stored procedure call, though? It would seem more appropriate for the application to make a call that submits a job that runs the stored procedure asynchronously. The web application can then periodically poll the database to see whether the job is still running if you want to display some sort of progress bar to the user.
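A rough sketch of that approach with plain JDBC and DBMS_SCHEDULER is shown below; the job name, schema and procedure name are placeholders, and the application user needs the CREATE JOB privilege.

    import java.sql.CallableStatement;
    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    public class LongRunningProc {

        // Submit the stored procedure as a one-off scheduler job and return immediately,
        // so no connection has to stay open for the 30+ minutes the procedure runs.
        public static void submit(Connection conn, String jobName) throws SQLException {
            String plsql =
                "BEGIN " +
                "  DBMS_SCHEDULER.CREATE_JOB(" +
                "    job_name   => ?, " +
                "    job_type   => 'STORED_PROCEDURE', " +
                "    job_action => 'MY_SCHEMA.MY_LONG_PROC', " + // placeholder procedure
                "    enabled    => TRUE); " +
                "END;";
            try (CallableStatement cs = conn.prepareCall(plsql)) {
                cs.setString(1, jobName);
                cs.execute();
            }
        }

        // Poll from the web app. With the default AUTO_DROP => TRUE, the row
        // disappears from USER_SCHEDULER_JOBS once the job has finished.
        public static boolean isStillRunning(Connection conn, String jobName) throws SQLException {
            String sql = "SELECT COUNT(*) FROM user_scheduler_jobs WHERE job_name = ?";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setString(1, jobName.toUpperCase());
                try (ResultSet rs = ps.executeQuery()) {
                    rs.next();
                    return rs.getInt(1) > 0;
                }
            }
        }
    }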