I want to preface this by saying, I have a feeling that this idea will not work the way I'm imagining, but I'm not sure why. Its likely I'm making some sort of false assumption about the way the internet works.
Lets say server A has a file of size 1024 kb. This file is split up into 1024 packets to be sent over a network. Server A sends these 1024 packets to server B via TCP. As soon as B receives a packet it sends it back to A and vice versa. If any 3rd party C makes a request to A for the data, it would make a copy of each packet it receives from B and send to to C and B.
In this scheme, server A could delete its file once it has sent all the packets to B, freeing up disk space. Servers A and B only need to store a single packet at any given time and the rest of the packets would be "juggled" in the network.
Is it really possible to store data by "juggling" it through a network? Am I underestimating the overhead in receiving and sending packets?
EDIT: This is assuming 100% network reliability, which is probably a completely unrealistic assumption.
A network does have some sort of "storage" capacity, and it is usually measured by the "Bandwidth-delay product". (Think network as a pipe, then its storage-capacity, the amount of data(water) that the pipe can hold is the volume of that pipe, the area-length product.) If you use unreliable protocols such as UDP to do the "cyclic sending", the extra data that exceeds the capacity will simply lose. If you use TCP to do it, after filling up the "pipe", the sending will then fill up the internal sending buffer on the OS, and then block.
Related
When a server has only 1 UDP socket, and many clients are sending UDP packets to it, what would be the best approach to handle all of the incoming packets?
I think this can also be a problem with TCP packets, since there's a limited thread count, which cannot cover all client TCP socket receive events.
But things are better in this situation because there's 1 TCP socket per client, and even if the network buffer is full, packet receiving is blocked until the queue has space (let me know if I'm wrong).
UDP packets, however, are discarded when the buffer is full, and there's only 1 socket, so the chances of that happening are higher.
How can I solve this problem? I've searched for a while, but I couldn't get a clear answer. Should I implement my own queueing system? Or just maximize the network buffer size?
There is no way to guarantee you won't drop UDP messages. No matter what you do, if the rate of packets being sent is too large, you will drop some, either on the receiving host or somewhere in the network.
Some things that can help include:
Implementing an internal queue for messages in your Java app, and handing them over to a thread pool to process.
Increasing the kernel's message buffering.
But neither of these can deal with the case where the average message arrival rate is higher that the receiver's ability to process them or the network capacity. This will inevitably lead to lost messages (requests).
I've searched for a while, but I couldn't get a clear answer.
That is because there isn't one! Some problems are fundamentally unsolvable. For others, the best answer depends on factors that are too hard to measure or predict.
(If you want certainty ... don't use networking!)
In the TCP case, what you should do is use a (long-term) socket for each client. Depending on the number of sockets you need to support, you could either:
Dedicate a server-side thread to each socket (and client).
Use java.nio.channels.Selector and a thread pool.
You will still get problems if the rate of requests exceeds your server's ability to process them. However, the TCP connections will ensure that requests are not lost, and that the clients get some "back pressure".
Why is UDP usually used for DNS requests instead of TCP?
I know that we could use TCP, but why UDP is the default protocol? Are there any reasons for that, or it is just for design purposes?
UDP is default protocol because in most cases, and when DNS was designed, an exchange is a single question/response, each part fitting into a small 512 bytes packet, so there is no need to establish a long running connection, where TCP needs first a 3-way handshake before exchanging any data.
Hence in most cases UDP gives better performances and DNS is time sensitive.
But then of course UDP is easier to spoof than TCP and bigger packets can be a problem.
First of all, it is important to note that TCP can also be used for DNS. In practice, most DNS servers support both UDP and TCP, though TCP is rarely used for simple DNS queries and is reserved mainly for operations like zone transfers.
The biggest advantage to using UDP is the performance boost. There are several reasons why TCP DNS queries are slower:
TCP requires a connection to be established before each request, then subsequently torn down. So if it takes 20ms for a message to travel from your computer to the server and back (a time known as RTT - Round Trip Time), then a TCP query would require 3xRTT (60ms) to be fully processed - 20ms for opening the connection, 20 more ms for the query, and another 20ms to tear it down. UDP would only require one RTT, so 20ms.
Due to TCP's connection-oriented nature, more resource are needed per-connection to store and manage TCP's state. TCP requires both the client and server to have a separate socket for each and every connection.
UDP makes it easy to deploy anycast DNS servers. In anycast, several servers (possibly around the world) share a single IP address - e.g. 1.1.1.1. When you send a query to 1.1.1.1, one of these servers (probably one of the closest ones geographically) gets it. Since TCP involves multiple packets sent back and forth, reliable anycast is harder to achieve since you need to make sure that the packets always reach the same exact server. Otherwise, they might end up reaching different servers which won't know what to do with them.
Lower data overhead - a UDP header is tiny compared to the header TCP sends for every segment. Using UDP means sending fewer bytes.
Simplicity - UDP is a lot simpler than TCP. TCP is optimized for long data transfers and has a bunch of complex mechanisms such as flow control and congestion control for optimizing the rate of data flow. DNS doesn't need any of these mechanisms for simple queries since the typical amount of sent data is tiny.
Web games are forced to use tcp.
But with real time constraints tcp head of line blocking behavior is absurd when you don't care about old packets.
While I'm aware that there's definitely nothing that we can do on the client side, I'm wondering if there is a solution on the server side.
Indeed, on the server you get packets in order and miserably wait if misbehaving packet t+42 has been lost even though packets t+43, t+44 can already be nicely waiting in your receive buffer.
Since we are talking about local data, technically it should be possible to retrieve it..
So does anyone have an idea on how to perform that feat?
How to save this precious data from these pesky kernel space daemons?
TCP guarantees that the data arrives in order and re-transmits lost packets. TCP Man Page
Given this, there is only one way to achieve the results you want given your stated constraints, and that is to hack the TCP protocol at the server side (assuming you cannot control the Client WebSocket behavior). The simplest, relative term, would be to open a raw socket, implement your own simple TCP handshake (Syn-Ack when client Syns), then read and write from the socket managing your own TCP headers. Your custom implementation would need to keep track of received sequence numbers and acknowledge all of those you want the client to forget about.
You might be able to reduce effort by making this program a proxy to your original.
Example of TCP raw socket here.
I am designing a simple protocol on top of UDP and now I've realized that someone else can send a packet to the port I'm listening on. Such packet will obviusly be incorrect for my application (I'm not concerned about security now)
Is there the way of filtering these invalid messages? I was thinking about adding some fixed magic number at the beginning of each packet, but how large should it be? Is 16 bits enough?
I believe the typical solution is to require a handshake (that could include a "long enough" magic number) in the beginning of the session. Of course the "session" is something your protocol needs to keep track of, UDP does not have the concept. Keeping a list ip, port and last packet receive time for all current sessions should do it: then you can drop all packets from a peers that have not done the handshake beforehand. This not only prevents random unknown app traffic from breaking your application but will also prevent multiple legitimate peers from screwing up each others traffic.
Additionally you could add either a session id or an increasing packet sequence number (with allowances for missing packets) to packets if you need to ensure that the peer agrees with you on which session this is.
Probably. Java uses 32 for .class files (0xcafebabe). But you only have 534 bytes in practical UDP so you might need to save a couple.
I have noticed that viewing images or websites that are hosted on US servers (Im in europe) is considerably slower. The main reason would be the latency because of the distance.
But if 1 packet takes n milliseconds to be received, can't this be alleviated by sending more packets simultaneously?
Does this actually happen or are the packets sent one by one? And if yes what determines how many packets can be send simultaneously (something to do with the cable i guess)?
But if 1 packet takes n milliseconds
to be received, can't this be
alleviated by sending more packets
simultaneously?
Not in a boundless way, by TCP/IP standards, because there algorithms that determine how much can be in flight and not yet acknowledged to avoid overloading the whole network.
Does this actually happen or are the
packets sent one by one?
TCP can and does keep up to a certain amount of packets and data "in flight".
And if yes what determines how many
packets can be send simultaneously
(something to do with the cable i
guess)?
What cable? The same standards apply whether you're on cabled, wireless, or mixed sequences of connections (remember your packet goes through many routers on its way to the destination, and the sequence of router can change among packets).
You can start your study of TCP e.g. wikipedia. Your specific questions deal with congestion control algorithms and standard, Wikipedia will give you pointers to all relevant algorithms and RFCs, but the whole picture won't do you much good if you try to start studying at that spot without a lot of other understanding of TCP (e.g., its flow control concepts).
Wikipedia and similar encyclopedia/tutorial sites can only give you a summary of the summary, while RFCs are not studied to be readable, or understandable to non-experts. If you care about TCP, I'd recommend starting your study with Stevens' immortal trilogy of books (though there are many other valid ones, Stevens' are by far my personal favorites).
The issue is parallelism.
Latency does not directly affect your pipe's throughput. For instance, a dump truck across the country has terrible latency, but wonderful throughput if you stuff it full of 2TB tapes.
The problem is that your web browser can't start asking for things until it knows what to ask for. So, when you load a web page with ten images, you have to wait until the img tags arrive before you can send the request for them. So everything is perceptibly slower, not because your connection is saturated but because there's down time between making one request and the next.
A prefetcher helps alleviate this issue.
As far as "multiple packets at a time" are concerned, a single TCP connection will have many packets in transit at once, as specified by the window scaling algorithm the ends are using. But that only helps with one connection at a time...
TCP uses what's called a sliding window. Basically the amount of buffer space, X, the receiver has to re-assemble out of order packets. The sender can send X bytes past the last acknowledged byte, sequence number N, say. This way you can fill the pipe between sender and receiver with X unacknowledged bytes under the assumption that the packets will likely get there and if not the receiver will let you know by not acknowledging the missing packets. On each response packet the receiver sends a cumulative acknowledgment, saying "I've got all the bytes up to byte X." This lets it ack multiple packets at once.
Imagine a client sending 3 packets, X, Y, and Z, starting at sequence number N. Due to routing Y arrives first, then Z, and then X. Y and Z will be buffered at the destination's stack and when X arrives the receiver will ack N+ (the cumulative lengths of X,Y, and Z). This will bump the start of the sliding window allowing the client to send additional packets.
It's possible with selective acknowledgement to ack portions of the sliding window and ask the sender to retransmit just the lost portions. In the classic scheme is Y was lost the sender would have to resend Y and Z. Selective acknowledgement means the sender can just resend Y. Take a look at the wikipedia page.
Regarding speed, one thing that may slow you down is DNS. That adds an additional round-trip, if the IP isn't cached, before you can even request the image in question. If it's not a common site this may be the case.
TCP Illustrated volume 1, by Richard Stevens is tremendous if you want to learn more. The title sounds funny but the packets diagrams and annotated arrows from one host to the other really make this stuff easier to understand. It's one of those books that you can learn from and then end up keeping as a reference. It's one of my 3 go-to books on networking projects.
The TCP protocol will try to send more and more packets at a time, up to a certain amount (I think), until it starts noticing that they're dropping (the packets die in router/switch land when their Time To Live expires) and then it throttles back. This way it determines the size of the window (bandwidth) that it can use. If the sending host is noticing from its end a lot of dropped packets, then the receiving host is just going to see it as a slow connection. It could very well be blasting you with data, you're just not seeing it.
I guess parallel packets transmission is also possible(ya.. there can be limiit on No. of packets to be send at a time)..U will get more information about the packet transmission from topics::> message switching,packet switching ,circuit switching & virtual circuit packet switching...