I sort of know the answer to this, but cannot really grasp the underlying concept. I know you are always instructed to use connection pooling now. But imagine this scenario.
I need to read data from one database, and one table, multiple times.
Connection pooling is going to inject microseconds of overhead, but why not eliminate that by using a single connection for everything and locking around that?
Since it is one database with one table, isn't it pretty unlikely that we would get any performance boost from multithreaded connection pools?
Just hoping for some clarity here, and maybe some simple resources that explain WHY connection pooling is ALWAYS better.
Thanks. I know this is not the greatest question, and I appreciate your time. I am specifically in the .NET environment, but this is a basic concept across programming, correct?

With one global connection you need to be prepared to handle spurious connection failures. Those can always happen (network hiccup, ...).
You absolutely do get concurrency when using multiple concurrent statements against a single table. SQL Server does not usually lock tables exclusively (exceedingly rare).
You will forget to use the synchronization protocol somewhere (lock everywhere). You will get it wrong eventually and have to fight races.
A slow runaway query would block the entire app. It will appear "hung" to browsers.
You serialize all HTTP requests on the global lock. You only use one CPU. You won't scale at all. Your app will not handle bursts well.
Having a single global connection is really a bad idea. Why not just use pooling? That saves you the development work of using synchronization. It is even less work.
Of course, pooling is not always better. You can construct pathological cases where it isn't. I never encountered a case where I needed to keep a connection open for longer than the current HTTP request, though.
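To make the contrast with a single locked connection concrete, here is a minimal sketch of what a pool does, written in Python for brevity rather than .NET. FakeConnection, SimplePool and handle_request are hypothetical names invented for this illustration; in .NET the provider's built-in pooling does this for you when you open and close a connection per request.

```python
import queue
import threading
from contextlib import contextmanager


class FakeConnection:
    """Stand-in for a real database connection (hypothetical)."""

    def __init__(self, ident):
        self.ident = ident

    def query(self, sql):
        return f"conn{self.ident}: {sql}"


class SimplePool:
    """Fixed-size pool: borrow a connection, use it, hand it back."""

    def __init__(self, size):
        self._idle = queue.Queue()
        for i in range(size):
            self._idle.put(FakeConnection(i))

    @contextmanager
    def connection(self, timeout=5.0):
        # Blocks only while every connection is in use.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Always returned, even if the caller raised.
            self._idle.put(conn)

    def idle_count(self):
        return self._idle.qsize()


def handle_request(pool, results, n):
    # Open late, close early: hold the connection only for the query.
    with pool.connection() as conn:
        results.append(conn.query(f"SELECT {n}"))


def run_requests(pool, count):
    results = []
    threads = [
        threading.Thread(target=handle_request, args=(pool, results, n))
        for n in range(count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With a pool of three connections, up to three requests run their queries at the same time, and no caller has to remember a lock: the synchronization lives in one place inside the pool.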


The asynchronous connection pool implementation in Rust

I have a Tokio TCP back-end application, which, briefly, after receiving a request, reads something from Redis, writes something to PostgreSQL, uploads something via HTTP, sends something to RabbitMQ etc. Processing each request takes a lot of time, so a separate task for each request is created. As sharing connections is impossible in asynchronous models, some connection pooling is required. For now, new connections are established on each request, and it is extremely excessive.
I have been looking for an asynchronous connection pool implementation in Rust, but have not found one that is up to date.
I would like to hear some advice on how to implement it myself.
The only idea I have come up with is:
Implement a Stream/Sink object with an inner collection of connections. It does not matter whether it is LIFO or FIFO, since the connections are identical. On the application startup, N connections are allocated.
Now I am not sure if it is possible to share such a pool among tasks, but if it were possible, tasks would poll the stream for a connection instance (instead of establishing their own), use it, and then put it back.
If there were no connections available, the stream might establish more of them or ask the task to hang on (depending on its configuration).
If a connection fails, it gets dropped and the pool now contains N-1 connections, so it may decide to allocate a new one on the next request.
So I have two problems for which I cannot find proper answers anywhere:
Must/can/should I share the stream/sink-pool among tasks in some way? Anyway, I see some Shared futures in the futures crate.
There are some gloomy points in the tokio/futures tutorial. For example, it does not explain how I notify the uppermost task, that is, how I implement the mythical innermost future, which does not poll anything itself but still has to notify the upper futures.
Or is my approach completely wrong? I could start playing with it by myself, but I have a strong suspicion that I have missed something, e.g. a one-click solution.
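The pool idea described above (a bounded set of identical connections, tasks waiting when none is free, failed connections dropped and replaced on a later request) can be sketched independently of Tokio. The following is a language-neutral illustration in Python's asyncio, not a Rust implementation; AsyncPool and the connect factory are hypothetical names, and unlike the description it creates connections lazily instead of allocating all N at startup.

```python
import asyncio
from contextlib import asynccontextmanager


class AsyncPool:
    """Bounded pool shared by many tasks on one event loop.

    Create it from inside a running coroutine so the semaphore
    is bound to the right event loop on older Python versions.
    """

    def __init__(self, connect, size):
        self._connect = connect                # async factory supplied by the caller
        self._slots = asyncio.Semaphore(size)  # caps connections in use at once
        self._idle = []                        # healthy idle connections (LIFO)

    def idle_count(self):
        return len(self._idle)

    @asynccontextmanager
    async def acquire(self):
        # Tasks suspend here when all slots are taken; the event loop
        # wakes one of them when another task releases its slot.
        async with self._slots:
            if self._idle:
                conn = self._idle.pop()
            else:
                conn = await self._connect()   # lazily (re)create
            try:
                yield conn
            except Exception:
                # Assumption: any error means the connection is broken.
                # It is dropped and replaced on a later acquire.
                raise
            else:
                self._idle.append(conn)
```

A task would use it as "async with pool.acquire() as conn: ...". The waiting and waking is delegated to the runtime's semaphore, which is the part the question calls the "innermost future". Since new connections are created only when no idle one exists, the total never exceeds size.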

Program not closing connections to db2400 / as/400

I've been programming for just a few years, and we have a default dll used for data access. It seems like there has been some data-mining or site scraping going on here lately, and although there are no issues with our SQL database connections, many of the programs that access the as/400 are keeping connections open and idle for long periods of time. I looked through our default data access dll and added code to close the connection after each function, but that didn't help. I have little experience with db2 / as/400 ... how do I close all of these open / idle connections from the code?
If you're using connection pools, that's working as designed.
Are you sure the connection is actually open? How are you determining that?
If you're just seeing locks held by the QZDASOINIT job on the IBM i, then that's also by design. The system will hard close tables (cursors) after the first use. When they are used again by the same job, the system will only pseudo-close them, in order to provide a faster response when they are reused.
If an operation needing exclusive access is attempted, the system will hard close the pseudo closed cursor.

Distributed eventual consistency Key Value Store

I find it difficult to convince myself of the advantage of using a complex design like DynamoDB over a simple duplication strategy.
Let's say we want to build a distributed key/value data store over 5 servers. (Each server holds exactly the same replica.)
Eventually consistent systems, like DynamoDB, typically use complicated conflict reconciliation, vector timestamps, etc. to achieve eventual consistency.
But instead, why couldn't we simply do the following:
For writes, the client issues the write command to all the servers, so all servers execute the clients' write commands in the same order. The servers reply to clients before they commit the write.
For reads, the client just does a round robin: only one server at a time handles a read command. (Other servers won't see the read command.)
Yes, clients may experience temporarily stale data, but eventually all replicas will have the same dataset, which is the same semantics as DynamoDB.
What's the disadvantage of this simple design vs Complicated DynamoDB?
Your strategy has a few disadvantages, but their exact nature depends on details you haven't covered.
One obvious example is dealing with network segmentation. That is, when one part of your network becomes segmented (disconnected) from another part.
In this case, you have a couple of choices about how to react when you try to write some data to the server, and that fails. You might just assume that it worked, and continue as if everything was fine. If you do that, and the server later comes back on line, a read may return stale data.
To prevent that, you might treat a failed write as a true failure, and refuse to accept the write until/unless all servers confirm the write. This, unfortunately, makes the system as a whole quite fragile--in fact, much more fragile (at least with respect to writing) than if you didn't replicate at all (because if any of the servers go off-line, you can't write any more). It also has one more problem: it limits write throughput to the (current) speed of the slowest server, so even if they're all working, unless they're perfectly balanced (unlikely to happen) you're wasting capacity.
To prevent those problems, many systems (including Paxos, if memory serves) use some sort of "voting" based system. That is, you attempt to write to all the servers. You consider the write complete if and only if the majority of servers confirm that they've received the write. Likewise on a read, you attempt to read from all the servers, and you consider a value properly read if and only if the majority of servers agree on the value.
This way, any minority of the servers (two of the five in your example) can be off-line at any given time, and you can still read and write data. Likewise, if you have a few servers that react a little more slowly than the rest, that doesn't slow down operations overall.
Of course, you need to fill in quite a few details to create a working system--but the fact remains that the basic concept is pretty simple, as outlined above.
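The voting scheme outlined above can be sketched in a few lines of Python. Replica and QuorumStore are hypothetical names for this illustration, the replicas are in-memory stand-ins for servers, and details a real system needs (versioning, rolling back a write that reached only a minority, repairing stale replicas) are deliberately left out.

```python
from collections import Counter


class Replica:
    """In-memory stand-in for one server (hypothetical)."""

    def __init__(self):
        self.data = {}
        self.online = True

    def write(self, key, value):
        if not self.online:
            raise ConnectionError("replica unreachable")
        self.data[key] = value

    def read(self, key):
        if not self.online:
            raise ConnectionError("replica unreachable")
        return self.data.get(key)


class QuorumStore:
    def __init__(self, replicas):
        self.replicas = replicas
        self.majority = len(replicas) // 2 + 1

    def write(self, key, value):
        """Try every replica; succeed only if a majority confirms."""
        acks = 0
        for replica in self.replicas:
            try:
                replica.write(key, value)
                acks += 1
            except ConnectionError:
                pass  # an unreachable replica simply does not vote
        return acks >= self.majority

    def read(self, key):
        """Return the value a majority agrees on (values must be hashable)."""
        votes = Counter()
        for replica in self.replicas:
            try:
                votes[replica.read(key)] += 1
            except ConnectionError:
                pass
        if votes:
            value, count = votes.most_common(1)[0]
            if count >= self.majority:
                return value
        raise RuntimeError("no majority agreement")
```

With five replicas the majority is three, so two can be off-line and both operations still succeed. When those two come back holding stale data, they are outvoted on reads.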

Web server tolerance to high client poll rate: Cowboy vs. Yaws web servers

I have been building a real-time notification system. It’s part of a web application, but events have to be seen as soon as they occur. Long polling was not an option because it would be expensive for the web server to hold on to connections when no events are available, so I had to go for short-lived polls.
Each client hits the web server every, say, 2 seconds (this is a fairly high rate). When events are available, they are sent as JSON to the JavaScript client. Now, this requires a server set-up to handle a high number of short-lived connections. I have implemented one such system using the Yaws web server. However, because Yaws starts quite a number of other services, it feels heavy, and connections begin to get either refused or aborted when they go beyond 30,000 (maybe because I am running some ETS tables in the same Erlang VM that Yaws is running on [separating these may require rpc:call/4, which, I fear, will increase latency]). I know that there are operating-system-specific tweaks to do, and those have been done.
This would not be a problem if it was easy to cluster up several Yaws instances. In Yaws, I am using a few appmods, and I am doing things RESTfully. I was thinking that the Cowboy web server might enhance things a bit here. I have not used Cowboy before, but I have used Misultin. Looking at Cowboy, it is a full-fledged OTP application and it seems to be easy to cluster, and being lightweight, it may perhaps increase the number of clients the overall system can handle. Storage is on Mnesia, which I can distribute easily to add more nodes (maybe by replication), so that there is a Cowboy instance in front of every Mnesia instance.
My questions are:
Is my speculation correct, that if I switched from Yaws to Cowboy, I might increase the performance significantly?
Yaws has a clean API via Appmods and the #arg{} record. Does Cowboy have an equivalent of these two things (illustrate please)?
Can Cowboy handle file uploads? If so, which server (Yaws or Cowboy), in your opinion would be better to use in the case of frequent file uploads? Illustrate how file uploads are done with Cowboy.
It is possible to run several Yaws instances on the same machine. Do you think that creating many Yaws instances per server (physical box) and having the client-load distributed across these would help? What do I need to know about doing this?
When I set the yaws.conf parameter max_connections = nolimit, how would I specify the same in Cowboy?
Now, I followed the interview with the Cowboy author, in which he discusses the reasons why Cowboy is more lightweight than Yaws. He says:
The biggest difference is the use of binaries instead of lists. The generic acceptor pool is another. I could list a lot of other small differences but I figure these aren’t the most interesting.
So, because Cowboy uses the listener-pool library Ranch, and binaries rather than lists, it somehow ends up being able to handle more connections.
Another quote from the same interview:
Since we use one process per connection instead of two, and we use binaries instead of lists, we end up using a lot less memory than other projects without user intervention. Cowboy is also lazy, it doesn’t do anything unless required. So we don’t have much in memory until the user starts calling functions.
I wonder how Yaws handles this case. My problem domain needs lightweight HTTP handling, and it is true that Yaws consumes more memory than, say, Mochiweb, Misultin, or Cowboy. My greatest concern is that Yaws has the best and cleanest API: it gives us the #arg{} record containing everything we need, so that we can extract values ourselves, whereas the others expose numerous functions for extracting them. The documentation matters too: the Yaws docs are pretty good and straightforward. Perhaps I need to look at more Cowboy code for things like file uploading and simple GET and POST request handling.
Otherwise, the questions I asked earlier remain pressing concerns. Yaws is pretty good, but it seems to be overkill for this fast, lightweight, short-lived, high-rate poll situation. What do you think?
Your 30000 refusal limit sounds an awful lot like a 32k limit somewhere. It could be either the default process limit of the Erlang VM, which is 32k, or some system limit such as the number of file descriptors. You should not rule out the possibility that the limitation is on the kernel side of things. I've seen systems come to their limits quite easily due to kernel configurations which can be really hard to handle.
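One quick way to rule the file descriptor limit in or out is to ask the operating system what the current process is allowed to open. The snippet below is a small Python sketch for Unix-like systems (the resource module is not available on Windows); fd_limits and describe are hypothetical helper names. It reports the limit of the process that runs it, so the Erlang VM's own limits have to be checked in the environment that starts the VM.

```python
import resource


def fd_limits():
    """Return the (soft, hard) limit on open file descriptors."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def describe(limit):
    """Render a limit value, treating RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


soft, hard = fd_limits()
print(f"open files: soft={describe(soft)} hard={describe(hard)}")
```

If the soft limit is lower than the number of simultaneous connections you expect, the server will start failing to accept connections at that point no matter which web server is in front.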

Generic Architecture for a Network Server/Client using a State Machine

So, I made up a simple protocol that I want to use for a client to talk to a server. It's the typical (I think) three-phase layout:
Connection Establishment (will eventually include capability negotiation)
Actual Data Exchange - packets are happily travelling to and fro', get interpreted by the respective receiver which acts on them accordingly
Connection Teardown - one side says "don't wanna no more", other side says "so be it" (will eventually allow the other side to send some data until it is done instead of simply closing the conversation)
The framework is a simple setup: The server does java.net.ServerSocket.accept() and starts a thread to handle the incoming connection by a client, which creates a java.net.Socket() to the host/port where the server is waiting. Both sides use the java.io.InputStream and java.io.OutputStream and spew data at each other, assembling outgoing and parsing incoming messages. Fine, so far.
So far, the protocol is hard-coded. Connection Establishment and Teardown are pretty much ok, while the Data Exchange part - which I want to be full-duplex - is pretty much a mess.
So, thinks me, let's do this the good way and set up a state machine using, surprise, the design pattern of the same name. I'm pretty clear about what the states should be for the server and the client, respectively, and what kinds of events should happen for a transition to take place, and what actions should be undertaken when a transition does happen. That looks good - on paper, that is. In practice, I've stumbled over a couple of questions that I can't solve on paper.
In particular, the inputs of the state machine are ... a little diverse. How could I possibly be able to write data, read data and check the connection (it might have closed or may be broken) at the same time? Also, the 1st and 3rd phase should get timers to avoid potentially infinite waiting times for answers.
So, I'd be grateful for any help that bridges my gap between the theory state machine and the code state machine.
BTW, I can read C/C++/C# too - no need to translate to Java (which is what I'm using).
The state for your machine needs to be stored per "Connection".
Each client connecting might be in a different state. So if you had an object tracking your state, you would have an instance of that object for every connection.
I actually wrote a little library that abstracts out just about everything from the state machine if you're interested. There is some test code in there as well that should show you how to work it. State Machine Code
It does some stuff you might forget, like ensuring that state transitions that are not "valid" are actually an error rather than maybe being missed, and logging state transitions is free.
ps. (Anyone) If you look at it and don't like it--please let me know why. I'd like to make it usable for anyone.
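As an illustration of the points above (one state object per connection, invalid transitions treated as errors, transitions logged), here is a minimal sketch in Python. It is not the library mentioned in this answer; the state and event names are assumptions made up to match the three phases in the question.

```python
import logging
from enum import Enum, auto

log = logging.getLogger("connection")


class State(Enum):
    CONNECTING = auto()
    ESTABLISHED = auto()
    CLOSING = auto()
    CLOSED = auto()


class Event(Enum):
    HELLO_ACKED = auto()
    DATA = auto()
    CLOSE_REQUESTED = auto()
    CLOSE_ACKED = auto()
    TIMEOUT = auto()
    IO_ERROR = auto()


# Every legal (state, event) pair; anything else is an error.
TRANSITIONS = {
    (State.CONNECTING, Event.HELLO_ACKED): State.ESTABLISHED,
    (State.CONNECTING, Event.TIMEOUT): State.CLOSED,
    (State.ESTABLISHED, Event.DATA): State.ESTABLISHED,
    (State.ESTABLISHED, Event.CLOSE_REQUESTED): State.CLOSING,
    (State.CLOSING, Event.CLOSE_ACKED): State.CLOSED,
    (State.CLOSING, Event.TIMEOUT): State.CLOSED,
}


class InvalidTransition(Exception):
    pass


class ConnectionStateMachine:
    """One instance per connection, so each client has its own state."""

    def __init__(self, name):
        self.name = name
        self.state = State.CONNECTING

    def fire(self, event):
        if event is Event.IO_ERROR:
            # A broken connection ends the conversation from any state.
            new_state = State.CLOSED
        else:
            try:
                new_state = TRANSITIONS[(self.state, event)]
            except KeyError:
                raise InvalidTransition(
                    f"{self.name}: {event.name} not allowed in {self.state.name}"
                ) from None
        log.info("%s: %s --%s--> %s", self.name,
                 self.state.name, event.name, new_state.name)
        self.state = new_state
        return new_state
```

To deal with the diverse inputs, one option is to have the reader thread, the timers and the application code each turn what they observe into an Event and hand it to the connection's fire method from a single place, for example via a queue drained by one thread, so the machine never sees two inputs at once.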
