How to write a high-performance Netty TCP client

I want an extremely efficient TCP client to send Google protocol buffer messages. I have been using the Netty library to develop a server/client.
In tests the server seems to be able to handle up to 500k transactions per second without too many problems, but the client tends to peak around 180k transactions per second.
I have based my client on the examples provided in the Netty documentation, but the difference is that I just want to send the message and forget; I don't want a response (which most of the examples wait for). Is there any way to optimize my client so that I can achieve a higher TPS?
Should my client maintain multiple channels, or should I be able to achieve a higher throughput than this with a single channel?

1) If the client is only interested in sending, not in receiving, you can always disable reading from the channel like below:
channel.setReadable(false);
2) You can increase the throughput very easily by having multiple client channels per client, and it scales well too.
3) You can also apply the following tweaks to improve performance in general (for reads/writes).
It's better to have a SEDA-like pipeline by adding an ExecutionHandler with an OrderedMemoryAwareThreadPoolExecutor (with min/max channel memory set to optimal values):
bootstrap.setPipelineFactory(new ChannelPipelineFactory() {
    @Override
    public ChannelPipeline getPipeline() throws Exception {
        return Channels.pipeline(
                executionHandler1, // sharable
                new MessageDecoderHandler(),
                new MessageEncoderHandler(),
                executionHandler2, // sharable
                new BusinessLogicHandler1(),
                new BusinessLogicHandler2());
    }
});
Setting the writeBufferHighWaterMark of the channel to an optimal value (make sure that setting too big a value does not create congestion):
bootstrap.setOption("writeBufferHighWaterMark", 10 * 64 * 1024);
Setting the SO_SNDBUF and SO_RCVBUF socket buffer sizes:
bootstrap.setOption("sendBufferSize", 1048576);
bootstrap.setOption("receiveBufferSize", 1048576);
Enabling TCP no delay (disabling Nagle's algorithm):
bootstrap.setOption("tcpNoDelay", true);

I am not sure that "tcpNoDelay" helps to improve throughput. The delay (Nagle's algorithm) is there to improve performance by coalescing small writes. Nonetheless, I tried it and saw that the throughput actually fell by more than 90%.

Related

Need to write multiple commands on device 1 by 1

It's taking too long to write a single command to a characteristic. I am using the code below for a single command and looping over it.
getConnObservable()
        .first()
        .flatMap(rxBleConnection -> rxBleConnection.writeCharacteristic(characteristics, command))
        .observeOn(AndroidSchedulers.mainThread())
        .subscribe(
                bytes -> onWriteSuccess(),
                this::onWriteFailure
        );
It's taking almost 600 ms per write to the device, and I need to write around 100 commands one by one.
Can anyone please explain the best way to do that batch operation?
The best way to get the highest performance possible over BLE is to use the same RxBleConnection to carry out all the writes, which mitigates the overhead of RxJava, i.e.:
getConnObservable()
        .first()
        .flatMapCompletable(rxBleConnection -> Completable.merge(
                rxBleConnection.writeCharacteristic(characteristics, command0).toCompletable(),
                rxBleConnection.writeCharacteristic(characteristics, command1).toCompletable(),
                (...)
                rxBleConnection.writeCharacteristic(characteristics, command99).toCompletable()
        ))
        .observeOn(AndroidSchedulers.mainThread())
        .subscribe(
                this::onWriteSuccess,
                this::onWriteFailure
        );
Additionally, one could try to negotiate the shortest possible connection interval (CI) by subscribing to rxBleConnection.requestConnectionPriority(BluetoothGatt.CONNECTION_PRIORITY_HIGH, delay, timeUnit).
Further speedup can be achieved by setting bluetoothGattCharacteristic.setWriteType(BluetoothGattCharacteristic.WRITE_TYPE_NO_RESPONSE) if the peripheral/characteristic supports this write type.*
*Be aware that the internal buffer for writes without response is limited and, depending on the API level, behaves a bit differently. It should not matter for ~100 writes, though.
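A minimal sketch of those two hints, assuming RxAndroidBle 1.x with RxJava 1; the 500 ms delay value and the bare subscribe() are illustrative choices, not code from the question:
// Request a shorter connection interval for the batch (the delay/timeUnit pair is
// how long the elevated priority is kept before the library reverts to balanced).
rxBleConnection
        .requestConnectionPriority(BluetoothGatt.CONNECTION_PRIORITY_HIGH, 500, TimeUnit.MILLISECONDS)
        .subscribe(); // fire the request; add an error handler in real code

// Switch to write-without-response if the peripheral/characteristic supports it.
characteristics.setWriteType(BluetoothGattCharacteristic.WRITE_TYPE_NO_RESPONSE);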
In regard to this conversation:
RxAndroidBle is a Bluetooth Low Energy library, and comparing it to Blue2Serial (which uses standard Bluetooth) in terms of performance is not the best thing to do. They have different use cases, much like choosing between WiFi and an Ethernet cable to get access to the Internet.
Best regards

Find out latency in a reliable way

Background: I am developing a small game and use the player's latency to do lag compensation. The game is open source, so at the moment it is a very easy task to reverse engineer the system and delay one's response time to artificially inflate one's reported delay, resulting in possibly unfair advantages.
My current strategy for latency retrieval is:
Every fixed interval I send a message labeled as "ping" to a player. (This has nothing to do with ICMP.)
This ping message consists of a special "ping" opcode and a payload with a sequence number.
Once the client receives said message, it sends back one with a "pong" opcode and a payload containing the same sequence number.
When the server receives the message labeled as "pong", it calculates how much time passed between sending and receiving. This is the round-trip time.
The latency is rtt / 2.
In pseudocode:
Server:
function now() {
    return current UTC time in millis
}

i = 0
function nextSequence() {
    return i++
}

sendingTimestamps = []

function onPingEvent() {
    id = nextSequence()
    sendingTimestamps[id] = now()
    sendPingMessage(id)
}

function onPongReceived(id) {
    received = now()
    sent = sendingTimestamps[id]
    rtt = received - sent
    latency = rtt / 2
}

Client:
function onPingReceived(id) {
    sendPongMessage(id)
}
As you can see, it's very easy for the client to just add a delay in its code to inflate its reported latency.
Is there a better way to get a client's latency that leaves less room for cheating?
The answer below is a summary of topics discussed in the comments, gathered in one place.
Lag compensation should rely on the precise timestamp of an event rather than on average packet delay
Transit time may vary drastically even for two successive packets. The suggested approach of measuring average latency and assuming that each received packet was sent "latency" ms ago is far too inaccurate for lag compensation. The following scheme should be applied instead:
The server starts emulating the world on its side and sends a START command to all clients. Clients begin emulating the world and count ticks from its creation. Whenever an event occurs on the client side, the client sends it to the server with a timestamp, like "user pressed fire at tick #183". The server's emulation of the game is far ahead due to packet transit time, but the server can "go back in time" to handle the user's order and resolve the consequences, as sketched below.
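A minimal sketch of that rewind idea, assuming a fixed-rate server tick loop; RewindingServer, WorldSnapshot, ClientEvent and the history depth are hypothetical names and values, not part of the question:
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch: the server keeps a short history of world states so that an event stamped
// with a client tick can be applied to the state it actually belongs to.
final class RewindingServer {
    record ClientEvent(long tick, String action) {}            // hypothetical event type
    record WorldSnapshot(long tick /* , world state ... */) {} // hypothetical snapshot type

    private static final int HISTORY_TICKS = 128; // how far back we allow rewinding
    private final Deque<WorldSnapshot> history = new ArrayDeque<>();
    private long currentTick = 0;

    void onTick() {
        currentTick++;
        history.addLast(new WorldSnapshot(currentTick));
        if (history.size() > HISTORY_TICKS) {
            history.removeFirst();
        }
    }

    void onClientEvent(ClientEvent event) {
        // Find the snapshot for the tick the client claims the event happened at,
        // apply the event there, then re-simulate forward to currentTick.
        history.stream()
                .filter(snapshot -> snapshot.tick() == event.tick())
                .findFirst()
                .ifPresent(snapshot -> applyAndResimulate(snapshot, event));
    }

    private void applyAndResimulate(WorldSnapshot snapshot, ClientEvent event) {
        // game-specific: resolve the event against 'snapshot', then replay ticks up to currentTick
    }
}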
Timestamps and events can still be faked
AFAIU, the problem of verifying client input is generally unsolvable. Any algorithm implemented in the client can be recreated to fake events/timestamps/packets. Closed code can be reversed, so that is not an answer either. Even worldwide-popular games like Counter-Strike or Overwatch have cheaters, even though they are developed by large companies which, I bet, have separate departments focused solely on game security. Some companies develop antivirus-like modules that check game file integrity or hashes of parts of a RAM snapshot, but even that can be bypassed.
The question is the amount of effort required to fake the algorithm: the more effort needed, the fewer fakers there will be. Trivial timestamp verification looks like the following (see the sketch after the list):
If you receive event #2 in the TCP stream after event #1, but its timestamp is before event #1's, then it's faked.
If a timestamp is far behind the server's time, warn and kick the player for an enormously bad delay. If it's a real player, the game is unplayable for them anyway; otherwise you've kicked a hacker. CS servers do this, if I'm not mistaken.
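A minimal sketch of those two checks, assuming events arrive over an ordered TCP stream and time is counted in ticks; TimestampValidator, ClientEvent and the kick callback are hypothetical names, not from the question:
import java.util.function.IntConsumer;

// Sketch only: reject events whose tick ordering or lag is implausible.
final class TimestampValidator {
    record ClientEvent(int playerId, long tick) {} // hypothetical event type

    private final long maxLagTicks;   // tolerated distance behind the server's current tick
    private final IntConsumer kick;   // hypothetical "kick player" callback
    private long lastAcceptedTick = Long.MIN_VALUE;

    TimestampValidator(long maxLagTicks, IntConsumer kick) {
        this.maxLagTicks = maxLagTicks;
        this.kick = kick;
    }

    boolean accept(ClientEvent event, long serverTick) {
        if (event.tick() < lastAcceptedTick) {
            return false;                          // later TCP event carries an earlier tick: faked
        }
        if (serverTick - event.tick() > maxLagTicks) {
            kick.accept(event.playerId());         // unplayably slow for a real player, or a cheater
            return false;
        }
        lastAcceptedTick = event.tick();
        return true;
    }
}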

Firebase concurrent read/write requests limits

How many concurrent read and write requests can Firebase support? Can it support 50k concurrent writes and 50k concurrent reads? What will be the average response time for reads and writes at this load? Assume that the data size is not huge, say only 50k records, and that each request reads or writes a single record.
There is a physical limit to how much data can be written to a disk in a certain time interval (you fail to mention any time interval in your question). Firebase's limits are based on the physical limits combined with logic to keep the shared service responsive for everyone. These numbers change regularly, so we don't disclose them.
Once you reach the limit, writes will be queued and processed in turn. So any peak in write throughput will be buffered. You can detect when your writes are being buffered by using a completion listener. For buffered writes you'll see an increase in the time between when you call set() and when the completion listener fires.
For this specific case, I'd recommend setting up a small jsbin and simply testing it. A loop that writes 50k nodes should be simple enough:
var ref = firebase.database().ref();
for (var i = 0; i < 50000; i++) {
    ref.child(String(i)).set(true).then(function() { // child paths must be strings
        // this is when the write has completed
    });
}
You'll want to add some performance measurement logic in there, probably using performance.now().

Pcap Dropping Packets

// Open the ethernet adapter (snapshot length 65536, promiscuous mode, no read timeout)
handle = pcap_open_live("eth0", 65536, 1, 0, errbuf);
// Make sure it opens correctly
if(handle == NULL)
{
printf("Couldn't open device : %s\n", errbuf);
exit(1);
}
// Compile filter
if(pcap_compile(handle, &bpf, "udp", 0, PCAP_NETMASK_UNKNOWN))
{
printf("pcap_compile(): %s\n", pcap_geterr(handle));
exit(1);
}
// Set Filter
if(pcap_setfilter(handle, &bpf) < 0)
{
printf("pcap_setfilter(): %s\n", pcap_geterr(handle));
exit(1);
}
// Set signals
signal(SIGINT, bailout);
signal(SIGTERM, bailout);
signal(SIGQUIT, bailout);
// Setup callback to process the packet
pcap_loop(handle, -1, process_packet, NULL);
The process_packet function strips the header and does a bit of processing on the data. However, when it takes too long, I think packets are being dropped.
How can I use pcap to listen for UDP packets and do some processing on the data without losing packets?
Well, you don't have infinite storage, so if you continuously run slower than the packets arrive, you will lose data at some point.
Of course, if you have a decent amount of storage and, on average, you don't fall behind (for example, you may run slow during bursts but there are quiet times where you can catch up), that would alleviate the problem.
Some network sniffers do this, simply writing the raw data to a file for later analysis.
It's a trick you too can use though not necessarily with a file. It's possible to use a massive in-memory structure like a circular buffer where one thread (the capture thread) writes raw data and another thread (analysis) reads and interprets. And, because each thread only handles one end of the buffer, you can even architect it without locks (or with very short locks).
That also makes it easy to detect if you've run out of buffer and raise an error of some sort rather than just losing data at your application level.
Of course, this all hinges on your "simple and quick as possible" capture thread being able to keep up with the traffic.
Clarifying what I mean, modify your process_packet function so that it does nothing but write the raw packet to a massive circular buffer (detecting overflow and acting accordingly). That should make it as fast as possible, avoiding pcap itself dropping packets.
Then, have an analysis thread that takes stuff off the queue and does the work formerly done in process_packet (the "gets rid of header and does a bit of processing on the data" bit).
Another possible solution is to bump up the pcap internal buffer size. As per the man page:
Packets that arrive for a capture are stored in a buffer, so that they do not have to be read by the application as soon as they arrive.
On some platforms, the buffer's size can be set; a size that's too small could mean that, if too many packets are being captured and the snapshot length doesn't limit the amount of data that's buffered, packets could be dropped if the buffer fills up before the application can read packets from it, while a size that's too large could use more non-pageable operating system memory than is necessary to prevent packets from being dropped.
The buffer size is set with pcap_set_buffer_size(). Note that this function only works on a handle created with pcap_create() and not yet activated with pcap_activate(), so the pcap_open_live() call above would need to be replaced by that sequence.
The only other possibility that springs to mind is to ensure that the processing you do on each packet is as optimised as it can be.
Splitting the processing into collection and analysis should alleviate the problem of not keeping up, but it still relies on quiet time to catch up. If your network traffic is consistently more than your analysis can handle, all you're doing is delaying the problem. Optimising the analysis may be the only way to guarantee you'll never lose data.

Compensating for jitter

I have a voice-chat service which is experiencing variations in the delay between packets. I was wondering what the proper response to this is, and how to compensate for it?
For example, should I adjust my audio buffers in some way?
Thanks
You don't say if this is an application you are developing yourself or one which you are simply using - you will obviously have more control over the former so that may be important.
Either way, it may be that your network is simply not good enough to support VoIP, in which case you really need to concentrate on improving the network or using a different one.
VoIP typically requires an end-to-end delay of less than 200 ms (milliseconds) before users perceive an issue.
Jitter is also important - in simple terms it is the variance in end-to-end packet delay. For example, the delay between packet 1 and packet 2 may be 20 ms but the delay between packet 2 and packet 3 may be 30 ms. Having a jitter buffer of 40 ms would mean your application would wait up to 40 ms between packets, so it would not 'lose' any of these packets.
Any packet not received within the jitter buffer window is usually ignored, and hence there is a relationship between jitter and the effective packet loss value for your connection. Packet loss typically impacts users' perception of VoIP quality too - different codecs have different tolerances - a common target might be that it should be lower than 1%-5%. Packet loss concealment techniques can help if it is just an intermittent problem.
Jitter buffers will be either static or dynamic (adaptive) - in either case, the bigger they get, the greater the chance they will introduce delay into the call, and you get back to the delay issue above. A typical jitter buffer might be between 20 and 50 ms, either set statically or adapting automatically based on network conditions.
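To illustrate how a receiver can track jitter in order to size such a buffer, here is a minimal sketch of the smoothed interarrival-jitter estimate defined by RTP (RFC 3550); the class and method names are assumptions for illustration, not something from the question:
// Sketch: running jitter estimate as in RTP (RFC 3550): J = J + (|D| - J) / 16,
// where D is the difference in transit time between two consecutive packets.
// A dynamic jitter buffer could aim its playout delay at a small multiple of this
// estimate, clamped to roughly the 20-50 ms range mentioned above.
final class JitterEstimator {
    private double jitterMs = 0;
    private long prevSendMs = -1; // sender timestamp of the previous packet
    private long prevRecvMs = -1; // local arrival time of the previous packet

    void onPacket(long sendTimestampMs, long receiveTimestampMs) {
        if (prevSendMs >= 0) {
            long d = (receiveTimestampMs - prevRecvMs) - (sendTimestampMs - prevSendMs);
            jitterMs += (Math.abs(d) - jitterMs) / 16.0;
        }
        prevSendMs = sendTimestampMs;
        prevRecvMs = receiveTimestampMs;
    }

    double currentJitterMs() {
        return jitterMs;
    }
}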
Good references for further information are:
- http://www.voiptroubleshooter.com/indepth/jittersources.html
- http://www.cisco.com/en/US/tech/tk652/tk698/technologies_tech_note09186a00800945df.shtml
It is also worth trying some of the common online internet connection speed tests, as many have a specific VoIP test that will give you an idea whether your local connection is good enough for VoIP (although bear in mind that these tests only indicate the conditions at the exact time you run them).
