I've been using Munin for a few days and I find the information very interesting, but I don't understand some of the graphs, or how to read them to get information I can use to improve the system.
The ones I don't understand are:
Disk
Disk throughput per device
Inode usage in percent
IOstat
Firewall Throughput
Processes
Fork rate
Number of threads
VMstat
System
Available entropy
File table usage
Individual interrupts
Inode table usage
Interrupts and context switches
Ty!
Munin creates graphs that let you see trends. This is very useful for checking that a change you made doesn't negatively impact the performance of the system.
Disk
Disk throughput per device & IOstat
The amount of data written to or read from a disk device. Disks are always slow compared to memory. A lot of disk reads could, for example, indicate that your database server doesn't have enough RAM.
Inode usage in percent
Every filesystem has an index where information about the files is stored, such as their names, permissions and locations on disk. With many small files, the space available to this index can run out. If that happens, no new files can be saved to that filesystem, even if there is free space left on the device.
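For reference, a minimal sketch of how that percentage can be computed with Python's os.statvfs (the mount point below is just an example path):

```python
import os

# Inode usage for one filesystem: f_files = total inodes, f_ffree = free inodes.
st = os.statvfs("/")                      # example mount point, adjust as needed
used = st.f_files - st.f_ffree
print(f"inode usage: {used}/{st.f_files} ({100.0 * used / st.f_files:.1f}%)")
```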
Firewall Throughput
Just like it says, the number of packets going through the iptables firewall. Often this firewall is active on all interfaces on the system. This is only really interesting if you run Munin on a router/firewall/gateway system.
Processes
Fork rate
Processes are created by forking an existing process into two processes. This is the rate at which new processes are created.
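If you want to see where that number comes from, here is a minimal sketch (my own, not Munin's plugin) that samples the "processes" counter in /proc/stat on Linux, which counts forks since boot:

```python
import time

def forks_since_boot() -> int:
    # The "processes" line in /proc/stat is a monotonically increasing fork counter.
    with open("/proc/stat") as f:
        for line in f:
            if line.startswith("processes"):
                return int(line.split()[1])
    raise RuntimeError("no 'processes' line found in /proc/stat")

INTERVAL = 5  # seconds between samples (arbitrary choice)
before = forks_since_boot()
time.sleep(INTERVAL)
after = forks_since_boot()
print(f"fork rate: {(after - before) / INTERVAL:.1f} forks/second")
```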
Number of threads
The total number of threads across all processes in the system.
VMstat
Usage of CPU time.
running: time spent running non-kernel code
I/O sleep: time spent waiting for I/O
System
Available entropy: Entropy is a measure of the random data available from the kernel's randomness pool (exposed as /dev/random). These random numbers are needed, for example, to create SSL connections. If you create a large number of SSL connections this randomness pool can run out of real random numbers.
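On Linux the number behind this graph is exposed in /proc; a minimal sketch of reading it:

```python
# Available entropy estimate for the kernel's randomness pool, in bits.
with open("/proc/sys/kernel/random/entropy_avail") as f:
    print("available entropy (bits):", int(f.read().strip()))
```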
File table usage
The total number of files open in the system. If this number suddenly goes up, there might be a program that is not releasing its file handles properly.
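The corresponding kernel counters can be read directly if you want to cross-check the graph; a minimal sketch for Linux:

```python
# /proc/sys/fs/file-nr: allocated file handles, allocated-but-unused handles, system-wide maximum.
with open("/proc/sys/fs/file-nr") as f:
    allocated, unused, maximum = (int(x) for x in f.read().split())
print(f"file handles in use: {allocated - unused} (allocated {allocated}, limit {maximum})")
```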
Related
I have a problem that is memory bandwidth limited -- I need to read a lot (many GB) of data sequentially from RAM, do some quick processing and write it sequentially to a different location in RAM. Memory latency is not a concern.
Is there any benefit from dividing the work between two or more cores in different NUMA zones? Equivalently, does working across zones reduce the available bandwidth?
For bandwidth-limited, multi-threaded code, the behavior in a NUMA system will primarily depend on how "local" each thread's data accesses are, and secondarily on the details of the remote accesses.
In a typical 2-socket server system, the local memory bandwidth available to two NUMA nodes is twice that available to a single node. (But remember that it may take many threads running on many cores to reach asymptotic bandwidth for each socket.)
The STREAM Benchmark, for example, is typically run in a configuration that allows almost all accesses from every thread to be "local". This is implemented by assuming "first touch" NUMA placement -- when allocated memory is first written, the OS has to create mappings from the process virtual address space to physical addresses, and (by default) the OS chooses physical addresses that are in the same NUMA node as the core that executed the store instruction.
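To make the "first touch" idea concrete, here is a minimal sketch (not the STREAM benchmark itself): each worker process pins itself to a core and then allocates and touches its own buffer, so that under the default policy the pages land on that worker's local NUMA node. The core IDs and buffer size are assumptions; map cores to nodes on your own system (e.g. with lscpu) before relying on this.

```python
import os
from multiprocessing import Process

BUF_BYTES = 256 * 1024 * 1024   # 256 MiB per worker (arbitrary example size)
PAGE = 4096

def worker(core_id: int) -> None:
    os.sched_setaffinity(0, {core_id})     # pin this process before allocating
    buf = bytearray(BUF_BYTES)             # allocate ...
    for i in range(0, BUF_BYTES, PAGE):    # ... and first-touch every page locally
        buf[i] = 1
    # From here on, reads/writes to buf from this core are "local" accesses.

if __name__ == "__main__":
    # Example: core 0 assumed to be on node 0, core 1 on node 1 (check your topology).
    procs = [Process(target=worker, args=(core,)) for core in (0, 1)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```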
"Local" bandwidth (to DRAM) in most systems is approximately symmetric (for reads and writes) and relatively easy to understand. "Remote" bandwidth is much more asymmetric for reads and writes, and there is usually significant contention between the read/write commands going between the chips and the data moving between the chips. The overall ratio of local to remote bandwidth also varies significantly across processor generations. For some processors (e.g., Xeon E5 v3 and probably v4), the interconnect is relatively fast, so jobs with poor locality can often be run with all of the memory interleaved between the two sockets.
Local bandwidths have increased significantly since then, with more recent processors generally strongly favoring local access.
Example from the Intel Xeon Platinum 8160 (2 UPI links between chips):
Local Bandwidth for Reads (each socket) ~112 GB/s
Remote Bandwidth for Reads (one-direction at a time) ~34 GB/s
Local bandwidth scales perfectly in two-socket systems, and remote bandwidth also scales very well when using both sockets (each socket reading data from the other socket).
It gets more complicated with combined read and write traffic between sockets, because the read traffic from node 0 to node 1 competes with the write traffic from node 1 to node 0, etc.
Local Bandwidth for 1R:1W (each socket) ~101 GB/s (reduced due to read/write scheduling overhead)
Remote Bandwidth for 1R:1W (one socket running at a time) ~50 GB/s -- more bandwidth is available because both directions are being used, but this also means that if both sockets are doing the same thing, there will be conflicts. I see less than 60 GB/s aggregate when both sockets are running 1R:1W remote at the same time.
Of course different ratios of local to remote accesses will change the scaling. Timing can also be an issue -- if the threads are doing local accesses at the same time, then remote accesses at the same time, there will be more contention in the remote access portion (compared to a case in which the threads are doing their remote accesses at different times).
I just want to understand how we should plan the capacity of a NiFi instance.
We have a NiFi instance with around 500 flows, so the total number of processors enabled on the NiFi canvas is around 4000. We run 2-5 flows simultaneously, none of which takes more than half an hour, i.e. we process data in the MB range.
It was working fine until now, but we are seeing OutOfMemory errors very often. So we increased the Xms and Xmx parameters from 4g to 8g, which has resolved the problem for now. But going forward we will have more flows, and we may face OutOfMemory issues again.
So, can anyone help with a capacity-planning matrix, or any suggestions to avoid such issues before they happen? E.g.: if we have 3000 processors enabled, with or without any processing, then X GB of memory is required.
Any input on NiFi capacity planning would be appreciated.
Thanks in Advance.
OOM errors can occur due to specific memory-consuming processors. For example, SplitXML loads your whole record into memory, so it could load a 1 GiB file in one go.
Each processor documents which resource considerations should be taken into account. All of the Apache processors (as far as I can tell) are documented in that manner, so you can rely on that.
In our example, by the way, SplitXML can be replaced with SplitRecord, which doesn't load the whole record into memory.
So even if you use 1000 processors simultaneously, they might not consume as much memory as one processor that loads your whole FlowFile's content into memory.
Check which processors you are using and make sure you don't use one like that (there are more like this one that load the whole document into memory).
I use Moodle on CentOS 7 with PHP, MariaDB and Nginx. A huge number of users use this Moodle. If the load grows to more than 300 users per second, Moodle responds with delays and seems to hang!
I read about:
Galera (multi-master clustering with 3 nodes)
master-slave replication (separate reads and writes)
MaxScale
increasing RAM and CPU (I have up to 288 GB RAM, a 24-core CPU and an SSD drive)
What is the best practice to serve a huge number of requests without delay? How can I scale my database (it is the bottleneck)? I want to scale it to serve a huge number of requests (most of them are reads from the database).
MariaDB (and MySQL) can scale 'infinitely' for reads by using Replication and sending read requests to Slave servers.
500 connections per second is very high. (But I don't know what the practical limit is.)
There are several extra tools that can do "connection pooling". Search for this; it may let you go well past 500 logical connections on a single server.
In the case of Galera, you could have 3 read-write nodes, plus any number of Slaves hanging off each of the 3.
For simple Master-Slave, there can be any number of Slaves hanging off the one Master.
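At the application layer, "sending read requests to Slave servers" can be as simple as using two connections; here is a minimal sketch using the pymysql driver, with placeholder hostnames and credentials (in practice a proxy such as MaxScale usually does this routing for you):

```python
import pymysql

# Placeholder connection details: one writable master, one read-only slave.
MASTER = dict(host="db-master", user="moodle", password="secret", database="moodle")
SLAVE = dict(host="db-slave1", user="moodle", password="secret", database="moodle")

def run_read(sql, args=None):
    conn = pymysql.connect(**SLAVE)        # reads go to a slave
    try:
        with conn.cursor() as cur:
            cur.execute(sql, args)
            return cur.fetchall()
    finally:
        conn.close()

def run_write(sql, args=None):
    conn = pymysql.connect(**MASTER)       # writes must go to the master
    try:
        with conn.cursor() as cur:
            cur.execute(sql, args)
        conn.commit()
    finally:
        conn.close()
```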
Obviously you can do generic MySQL/MariaDB tuning first, and use a recent version of Moodle (3.7 is current right now).
After that, one thing you can check is how you have sessions implemented.
https://docs.moodle.org/37/en/Session_handling
This page also has many more tips:
https://docs.moodle.org/37/en/Performance_recommendations
I am trying to make a simple, general-purpose, multi-threaded async downloader in Python. How many parallel connections can generally be made to a server with minimal risk of being banned or rate limited?
I am aware that the network will be a limiting factor in some cases, but let's assume for the sake of discussion that the network isn't an issue here. I/O is also done asynchronously.
According to Browserscope, browsers make a maximum of 17 connections at a time.
However, according to my research, most download managers download files in multiple parts and make 8+ connections per file.
1. How many files can be downloaded at a time?
2. How many chunks of a single file can be downloaded at one time?
3. What should be the minimum size of those chunks to make the overhead of creating parallel connections worthwhile?
It depends.
While some servers tolerate a high number of connections, others don't. General web servers might be more on the high side (low two digits), while file hosters might be more sensitive.
There's little more to say unless you can check the server's configuration, or you just try it and remember the limit for next time, once your ban has timed out.
You should however watch your bandwidth. Once you max out your access line there's no gain in further increasing the connections.
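To make that concrete, here is a minimal sketch of a capped multi-part download using the requests library and HTTP Range headers; the URL, the 4-connection cap and the chunk size are assumptions (not recommendations), and the server has to honour range requests for this to work:

```python
import requests
from concurrent.futures import ThreadPoolExecutor

URL = "https://example.com/big.file"   # placeholder URL
MAX_CONNECTIONS = 4                    # assumed cap; stay well below any server limit
CHUNK = 8 * 1024 * 1024                # 8 MiB parts (assumption)

def fetch_range(start: int, end: int) -> bytes:
    # One byte range per connection; the end offset is inclusive in the Range header.
    r = requests.get(URL, headers={"Range": f"bytes={start}-{end}"}, timeout=60)
    r.raise_for_status()
    return r.content

def download() -> bytes:
    size = int(requests.head(URL, timeout=60).headers["Content-Length"])
    ranges = [(s, min(s + CHUNK, size) - 1) for s in range(0, size, CHUNK)]
    with ThreadPoolExecutor(max_workers=MAX_CONNECTIONS) as pool:
        parts = list(pool.map(lambda r: fetch_range(*r), ranges))
    return b"".join(parts)
```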
I'm using IT Guru's Opnet to simulate different networks. I've run the basic HomeLAN scenario, and by default it uses an Ethernet connection running at a data rate of 20 Kbps. Throughout the scenarios this is changed from 20K to 40K, then to 512K, and then to a T1 line running at 1.544 Mbps. My question is: does increasing the data rate of the line increase the throughput?
I have this graph output from the program to display my results:
Please note it's the image in the foreground which is of interest.
In general, the signaling capacity of a data path is only one factor in the net throughput.
For example, TCP is known to be sensitive to latency. For any particular TCP implementation and path latency, there will be a maximum speed beyond which TCP cannot go regardless of the path's signaling capacity.
Also consider the source and destination of the traffic: changing the network capacity won't change the speed if the source is not sending the data any faster or if the destination cannot receive it any faster.
In the case of network emulators, also be aware that buffer sizes can affect throughput. The size of the network buffer must be at least as large as the signal rate multiplied by the latency (the bandwidth-delay product). I am not familiar with the particulars of Opnet, but I have seen other emulators where it is possible to set a buffer size too small to support the selected rate and latency.
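As a rough sanity check, the bandwidth-delay product is easy to compute by hand; for example, for the T1 scenario with an assumed 50 ms round-trip latency:

```python
# Bandwidth-delay product: the minimum buffering needed to keep the pipe full.
rate_bps = 1_544_000        # T1 line rate in bits per second
rtt_s = 0.050               # assumed 50 ms round-trip time
bdp_bytes = rate_bps * rtt_s / 8
print(f"BDP ~ {bdp_bytes:.0f} bytes")   # roughly 9,650 bytes
```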
I have written a couple of articles related to these topics which may be helpful:
This one discusses common network bottlenecks: Common Network Performance Problems
This one discusses emulator configuration issues: Network Emulators