Since Socket is used in the Hadoop source, I guess it uses TCP connections to send/receive messages and files, right?
How does the JVM translate these Socket instances into Linux system calls: socket/send, or select/poll?
If it is all select/poll, can I still get the IP/port through the related socket system calls?
When I collected all the system calls during a terasort run (1 master, 3 slaves), I saw only a few connect/accept/socket calls, and none of them had a LAN IP in the sockaddr struct (either 0 or strange IPv4 addresses). There were lots of select/poll calls; is that reasonable?
You could use "netstat --tcp -n" to check the current TCP connections. I would guess Hadoop uses TCP.
You may need to start your Hadoop JVM under strace. strace prints the system calls made by the running application. A typical application uses sys_poll to check a connection FD's status and uses read/write or sendto/recvfrom system calls to receive and transmit packets.
Right, those system calls are only made once, during connection setup through the sys_socket system call; after that the application does many poll, transmit and receive operations on that socket.
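To illustrate the pattern (my own sketch, not Hadoop's actual code): the connection-related calls show up once at setup, and the rest of the trace is dominated by poll() plus read()/write() on the already-connected descriptor.
/* Sketch of what a strace of a simple TCP client tends to boil down to. */
#include <poll.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

int talk(const char *ip, unsigned short port)
{
    /* setup: socket() and connect() appear once per connection */
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    inet_pton(AF_INET, ip, &addr.sin_addr);
    if (fd < 0 || connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        return -1;

    /* steady state: the trace is dominated by poll() and read()/write() */
    char buf[4096];
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    while (poll(&pfd, 1, 5000) > 0 && (pfd.revents & POLLIN)) {
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n <= 0)
            break;
        /* ... process n bytes ... */
    }
    close(fd);
    return 0;
}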
ROB (client) and BOB (server) have an established TCP connection. After some time, the ROB Linux machine starts frequently sending TCP [SYN] packets to BOB. These SYN packets are initiated automatically and are not triggered by any service on ROB. Because of this, BOB is dropping the TCP connection.
We enabled a TCP dump on the ROB machine and identified this behaviour.
How can we identify what is sending the unnecessary SYN packets from ROB to BOB?
You could consider using auditctl, the kernel-level auditing framework that has been built into Linux for a while. It won't be pretty, but it will create audit records that you can then work through and tie back to users. For example:
sudo auditctl -a exit,always -F arch=b64 -S socket -F success=1
Once that is done (you may need to install auditctl if it isn't there already; check your package manager), you can review the audit logs using ausearch:
sudo ausearch -sc connect
The records will include the PID, PPID and UID. You can dig further, but decoding the arguments to the call will take some effort, since they are logged as raw chunks of data rather than as the structures they actually represent.
Also, note that depending on your Linux version, you may find that you need to monitor a different system call. Obviously, arch=b64 needs to match your architecture as well.
Given file descriptors of two connected AF_UNIX sockets is it possible to ensure that they are indeed connected to the same peer?
A function like getpeername would do but it doesn't seem to support this family.
By definition, an AF_UNIX socket is a means of communication between processes on the local machine.
So, to find the peer(s) of a socket file, you just need to get the list of local processes that have that file open.
You can do that with the fuser or lsof commands. Both will give you the list of processes using the socket, and you can compare the results to answer your question.
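If the peers are on Linux, a programmatic alternative (my addition, not part of the original answer) is the SO_PEERCRED socket option, which reports the PID/UID/GID of the process on the other end of a connected AF_UNIX socket; comparing the credentials returned for both descriptors is a strong hint that they talk to the same process.
/* Linux-specific sketch: compare the peer credentials of two connected AF_UNIX sockets. */
#define _GNU_SOURCE
#include <sys/socket.h>
#include <sys/types.h>

/* Returns 1 if both descriptors report the same peer PID, 0 if not, -1 on error. */
int same_peer_pid(int fd1, int fd2)
{
    struct ucred c1, c2;
    socklen_t len = sizeof(struct ucred);

    if (getsockopt(fd1, SOL_SOCKET, SO_PEERCRED, &c1, &len) < 0)
        return -1;
    len = sizeof(struct ucred);
    if (getsockopt(fd2, SOL_SOCKET, SO_PEERCRED, &c2, &len) < 0)
        return -1;

    return c1.pid == c2.pid;
}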
Should a server application that listens on a port be able to detect and log any connection attempt made via SYN scanning?
Test Scenario
I have written a Windows program which I simply call "simpleServer.exe".
This program is just a simulation of a very basic server application.
It listens on a port and waits for incoming messages.
The listening socket is defined as a TCP stream socket.
That's all this program does.
I deployed this exact same program on 2 different machines, both running Windows 7 Professional 64-bit.
Each machine acts as a host, and they are on the same network.
Then, acting as a client from another machine on the same network, I used the program "nmap" with the "-sS" parameter to do a SYN scan against the IP and port of the listening simpleServer on each host (one attempt at a time).
(Note that both hosts already had "wireshark" running, monitoring TCP packets from the client's IP to the listening port.)
In the "wireshark" entry, on both machine, i saw the expected tcp packet for Syn Scan:
client ----(SYN)----> host
client <--(SYN/ACK)-- host
client ----(RST)----> host
The above packet exchange shows that the connection was never established.
But of the two "simpleServer.exe" instances, only one printed "new incoming connection" in its logs; the other was never alerted of any incoming connection and logged nothing at all.
Code Snippets
// socket(), bind() and listen() were done above this loop
while (TRUE)
{
    sClient = accept(sListen, (SOCKADDR*)&remoteAddr, &nAddrLen);
    if (sClient == INVALID_SOCKET)
    {
        printf("Failed accept()\n");
        continue;
    }
    dwSockOpt(sListen);   // helper defined elsewhere in the program
    printf("recv a connection: %s\n", inet_ntoa(remoteAddr.sin_addr));
    closesocket(sClient);
}
Side note:
Yes, since it is just a simple program, the flow might look a little odd (for example, there is no break in the while loop), so please don't mind the simple and flawed design.
Further Investigation
I also added a getsockopt() call in "simpleServer" right after it entered the listening state, to compare the listening socket's SOL_SOCKET options on the two hosts.
One notable difference I found between the two hosts is SO_MAX_MSG_SIZE.
The host that detects the incoming connection has the hex value 0x3FFFFFFF (1073741823), while the one that logs nothing has 0xFFFFFFFF (-1). I am not sure whether this is related; I am just listing whatever differences I could find in my test environment. The other SOL_SOCKET values are more or less the same.
Side note: I also tested on some other machines, covering another Windows 7 Professional, Windows Server 2008 R2, and Windows Server 2003. I am not sure if it is a coincidence, but every machine with SO_MAX_MSG_SIZE == -1 failed to detect the connection from the SYN scan. Maybe it is just a coincidence; I have nothing to prove it either way.
Help That I Need
Why do two instances of the same application behave differently on different machines running the same OS?
What determines the value of SO_MAX_MSG_SIZE, considering that two machines with the same OS have two different values?
If a connection is never established, accept() will never return. That disposes of 90% of your question.
The only explanation for the 'new incoming connection' (or 'recv a connection' or whatever it is) message is that something else connected.
SO_MAX_MSG_SIZE has no meaning for a TCP socket, let alone a listening TCP socket. So whatever variation you experienced is meaningless.
I'm studying Unix socket programming. I made a time server that sends raw time data and a client that receives that data and converts it to local time.
When I run the server, connect a client to it (which causes both of them to do their job and shut down) and then rerun the server, I get errno = 98 from the bind() call. I have to change the port in the server's source code and recompile to get rid of that error. When I then run the server and connect to it again, it's fine; after another rerun the situation repeats, but at that point I can switch back to the previous port. So I'm jumping between ports 1025 and 1026 on every debug run (and the runs are very frequent, so this gets a little annoying).
It works like this: the server opens the listener socket, binds it, listens on it, accepts a connection into a data socket, writes a time_t to it, closes the data socket and then closes the listener socket. The client opens a socket, connects to the server, reads the data and closes its socket.
What's the problem?
Thanks in advance.
Sockets have a lingering period after they close. They may keep the port taken for a little while after your application exits so that any unsent data can still be delivered. If you wait long enough, the port is released and can be taken by another socket.
For more info on socket lingering, check out:
http://www.developerweb.net/forum/archive/index.php/t-2982.html
errno 98 is "Address already in use". Look into setsockopt() and SO_REUSEADDR; Beej's Guide to Network Programming covers it.
More details on what causes this:
http://www.serverframework.com/asynchronousevents/2011/01/time-wait-and-its-design-implications-for-protocols-and-scalable-servers.html
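A minimal sketch of the usual fix (my example, assuming a plain IPv4 TCP listener): set SO_REUSEADDR on the listening socket before bind(), so a port left in TIME_WAIT by the previous run does not block the rebind.
/* Sketch: set SO_REUSEADDR before bind() so a quick restart does not hit "Address already in use". */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

int make_listener(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return -1; }

    int yes = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes)) < 0)
        perror("setsockopt");            /* not fatal, but the rebind may then fail */

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) { perror("bind"); close(fd); return -1; }
    if (listen(fd, 5) < 0) { perror("listen"); close(fd); return -1; }
    return fd;
}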
Why are TCP servers mostly designed so that a new process (or thread) is created to handle each accepted connection, while UDP servers mostly use a single process to handle all client requests?
The main difference between TCP and UDP is, as stated before, that UDP is connectionless.
A program using UDP has only one socket on which it receives messages, so there's no problem with simply blocking and waiting for a message.
With TCP you get one socket for every client that connects. You then can't just block and wait for ONE socket to receive something, because the other sockets must be serviced at the same time.
So you have two options: either use non-blocking methods or use threads. Code is usually much simpler when you don't have a single loop that has to handle every client, so threading is often preferred. You can also save some CPU time by using blocking methods.
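As a rough illustration (my sketch, not part of the answer above, with the sockets assumed to be already created and bound), the two typical shapes look like this: a UDP server services every client from one blocking recvfrom() loop, while a TCP server hands each accepted connection to its own process (a thread pool would look similar).
/* Sketch of the two server shapes (assumes the sockets were already created and bound). */
#include <unistd.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>

/* UDP: one blocking loop serves every client in turn. */
static void udp_echo_loop(int udp_fd)
{
    char buf[1500];
    for (;;) {
        struct sockaddr_in peer;
        socklen_t plen = sizeof(peer);
        ssize_t n = recvfrom(udp_fd, buf, sizeof(buf), 0,
                             (struct sockaddr *)&peer, &plen);
        if (n > 0)    /* reply straight back to whoever sent the datagram */
            sendto(udp_fd, buf, (size_t)n, 0, (struct sockaddr *)&peer, plen);
    }
}

/* TCP: fork a child per accepted connection so the parent can keep accepting. */
static void tcp_fork_loop(int listen_fd)
{
    for (;;) {
        int conn = accept(listen_fd, NULL, NULL);
        if (conn < 0)
            continue;
        if (fork() == 0) {
            /* child: serve this one client (echo here), then exit */
            char buf[1500];
            ssize_t n;
            while ((n = read(conn, buf, sizeof(buf))) > 0)
                write(conn, buf, (size_t)n);
            close(conn);
            _exit(0);
        }
        close(conn);  /* parent closes its copy and goes back to accept() */
    }
}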
When you talk to a client over a TCP connection you maintain a TCP session, so when a new connection is established you need a separate process (or thread; it doesn't matter how it is implemented or which OS is used) to carry on that conversation. With UDP you simply receive datagrams (and you are told the sender's IP and port); there is no per-client connection to maintain.
First of all, the classic Unix server paradigm is filter based. For example, various network services can be configured in /etc/services, and a program like inetd listens on all of the TCP and UDP sockets for incoming connections and datagrams. When a connection or datagram arrives, it forks, redirects stdin, stdout and stderr to the socket using the dup2 system call, and then execs the server process. You can take any program which reads from stdin and writes to stdout, such as grep, and turn it into a network service.
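A rough sketch of that inetd-style handoff (my illustration; real inetd does considerably more, and the program name here is just an example):
/* Sketch: accept a connection, make it the child's stdin/stdout/stderr, then exec a filter. */
#include <unistd.h>
#include <sys/socket.h>

static void serve_with_filter(int listen_fd, const char *prog)
{
    int conn = accept(listen_fd, NULL, NULL);
    if (conn < 0)
        return;
    if (fork() == 0) {
        dup2(conn, 0);                     /* stdin  <- socket */
        dup2(conn, 1);                     /* stdout -> socket */
        dup2(conn, 2);                     /* stderr -> socket */
        close(conn);
        execlp(prog, prog, (char *)NULL);  /* any stdin/stdout filter will do */
        _exit(1);                          /* only reached if exec fails */
    }
    close(conn);                           /* parent: hand the fd fully to the child */
}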
According to Stevens in "Unix Network Programming", there are five kinds of server I/O models (p. 154):
blocking
non-blocking
multiplexing (select and poll)
signal driven
asynchronous (POSIX aio_ functions)
In addition, servers can be either iterative or concurrent.
You ask why TCP servers are typically concurrent, while UDP servers are typically iterative.
The UDP side is easier to answer. Typically UDP apps follow a simple request-response model, where a client sends a short request and gets a reply, with each pair constituting a standalone transaction. UDP servers are the only ones that use signal-driven I/O, and even then only rarely.
TCP is a bit more complicated. Iterative servers can use any of the I/O models above except #4. The fastest servers on a single processor are actually iterative servers using non-blocking I/O. However, these are considered relatively complex to implement, and that, plus the Unix filter idiom, were traditionally the primary reasons for using the concurrent model with blocking I/O, whether multiprocess or multithreaded. Now, with the advent of common multicore systems, the concurrent model also has a performance advantage.
Your generalization is too general. This is a pattern you might see with a Unix-based server, where process creation is inexpensive. A .NET-based service will use a new thread from the thread pool instead of creating a new process.
Programs that can continue to do useful work while they are waiting for I/O will often be multithreaded. Programs that do lots of computation which can be neatly divided into separate sections can benefit from multithreading if there are multiple processors. Programs that service lots of network requests can sometimes benefit from having a pool of available threads to service requests. GUI programs that also need to perform computation can benefit from multithreading, because it allows the main thread to continue to service GUI events.
That's why we use TCP as an Internet protocol.