Understanding Hadoop YARN memory on data nodes vs. Unix memory

We have 20 data nodes and 3 management nodes. Each data node has 45 GB of RAM.
Data node RAM capacity
45 GB x 20 = 900 GB total RAM
Management node RAM capacity
100 GB x 3 = 300 GB total RAM
In the Hadoop ResourceManager UI I can see that memory is almost completely occupied (about 890 GB of the 900 GB), and newly submitted jobs sit in a waiting state.
I have therefore raised a request to increase the memory capacity, to avoid usage climbing to 890 GB out of 900 GB.
Now the Unix team says that on each data node about 80% of the 45 GB RAM is free according to the free -g command (counting cache/buffers as free). On the Hadoop side, however, the ResourceManager UI says memory is completely occupied and a few jobs are on hold because of it. I would like to know how Hadoop calculates memory in the ResourceManager, and whether upgrading the memory makes sense, since it fills up every time users submit Hive jobs.
Who is right here: the Hadoop ResourceManager output or the Unix free command?

The Unix free command is correct about physical usage; the ResourceManager shows reserved memory, not memory actually used.
If I submit a MapReduce job with one map task requesting 10 GB of memory, but the map task only uses 2 GB, the operating system will show only about 2 GB used. The RM will still show 10 GB used, because it has to reserve the full requested amount for the container even if the task never uses all of it.
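As a rough sketch of where the RM's figure comes from (the property name is standard YARN; the values are purely illustrative for this 20 x 45 GB cluster): each NodeManager advertises a fixed amount of memory for containers, and the RM sums those advertised amounts regardless of what the OS reports.

# per data node: memory offered to YARN containers (yarn-site.xml)
yarn.nodemanager.resource.memory-mb = 46080    # 45 GB x 1024, illustrative

# cluster total the RM displays: 20 nodes x 45 GB = 900 GB
# compare with the OS view on any single node:
free -g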

Related

"Cannot allocate memory" when starting new Flink job

We are running Flink on a 3-VM cluster. Each VM has about 40 GB of RAM. Each day we stop some jobs and start new ones. After some days, starting a new job is rejected with a "Cannot allocate memory" error:
OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x0000000340000000, 12884901888, 0) failed; error='Cannot allocate memory' (errno=12)
Investigations show that the task manager's RAM keeps growing, to the point that it exceeds the allowed 40 GB, even though the jobs are cancelled.
I don't have access (yet) to the cluster, so I tried some tests on a standalone cluster on my laptop and monitored the task manager's RAM:
With jvisualvm I can see everything working as intended: I load the job's memory, then clean it and wait (a few minutes) for the GC to kick in. The heap is released.
With top, however, memory is high and stays high.
At the moment we restart the cluster every morning to work around this memory issue, but we can't afford that anymore, as we will need jobs running 24/7.
I'm pretty sure it's not a Flink issue, but can someone point me in the right direction about what we're doing wrong here?
In standalone mode, Flink may not release resources as you would wish, for example resources held by static members of a class.
It is highly recommended to use YARN or Kubernetes as the runtime environment instead.
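One way to see whether the growth is on the JVM heap or off-heap at the OS level (the pid below is a placeholder for the TaskManager process):

jps                      # find the TaskManager pid
jstat -gc <pid> 5000     # JVM view: heap and GC stats sampled every 5 s
ps -o rss= -p <pid>      # OS view: resident set size in KB

If jstat shows the heap shrinking after GC while the RSS reported by ps keeps growing, the memory is being retained outside the heap (native buffers, metaspace, etc.), which matches what top shows.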

How can I configure a YARN cluster for parallel execution of applications?

When I run Spark jobs on a YARN cluster, the applications run one after another in a queue. How can I run a number of applications in parallel?
I suppose your YARN scheduler is set to FIFO. Please change it to the Fair Scheduler or the Capacity Scheduler. The Fair Scheduler attempts to allocate resources so that all running applications get the same share of resources.
The Capacity Scheduler allows sharing of a Hadoop cluster along
organizational lines, whereby each organization is allocated a certain
capacity of the overall cluster. Each organization is set up with a
dedicated queue that is configured to use a given fraction of the
cluster capacity. Queues may be further divided in hierarchical
fashion, allowing each organization to share its cluster allowance
between different groups of users within the organization. Within a
queue, applications are scheduled using FIFO scheduling.
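To switch schedulers, the usual knob is the yarn.resourcemanager.scheduler.class property in yarn-site.xml (value shown here for the Fair Scheduler; the ResourceManager needs a restart afterwards):

yarn.resourcemanager.scheduler.class = org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler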
If you are using the Capacity Scheduler:
In spark-submit, mention your queue with --queue queueName.
Also try changing this Capacity Scheduler property:
yarn.scheduler.capacity.maximum-applications = <some number>
It decides how many applications can be active (running or pending) at the same time; see the sketch below.
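A sketch of what that could look like (the queue name, class, jar and limit are placeholders):

# capacity-scheduler.xml style property, then refresh the queues
yarn.scheduler.capacity.maximum-applications = 10000
yarn rmadmin -refreshQueues

# submit the application to a specific queue
spark-submit --master yarn --queue myqueue --class com.example.MyApp myapp.jar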
By default, Spark will acquire all available resources when it launches a job.
You can limit the amount of resources consumed for each job via the spark-submit command.
Add the option "--conf spark.cores.max=1" to spark-submit. You can change the number of cores to suit your environment. For example, if you have 100 total cores, you might limit a single job to 25 cores or 5 cores, etc.
You can also limit the amount of memory consumed: --conf spark.executor.memory=4g
You can change settings via spark-submit or in the file conf/spark-defaults.conf. Here is a link with documentation:
Spark Configuration
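For example, a full submit line combining both limits (the class and jar names are placeholders). Note that spark.cores.max applies to standalone and Mesos masters; on YARN the equivalent cap is usually set per executor with --executor-cores and --num-executors:

spark-submit \
  --conf spark.cores.max=25 \
  --conf spark.executor.memory=4g \
  --class com.example.MyApp myapp.jar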

Corda OutOfMemory Issue And ActiveMQ large message size issue

I tried sending almost 5000 output states in a single transaction and ran out of memory. I am trying to figure out how to increase the memory. I tried increasing it in the runnodes.bat file by tweaking the command:
java -Xmx1g -jar runnodes.jar %*
But this doesn't seem to increase the heap size, so I tried running the following command for each node manually, with the memory option -Xmx1g:
bash -c 'cd "/build/nodes/Notary" ; "/Library/Java/JavaVirtualMachines/jdk1.8.0_152.jdk/Contents/Home/jre/bin/java" "-Dname=Notary-corda.jar" "-Dcapsule.jvm.args=-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005 -javaagent:drivers/jolokia-jvm-1.3.7-agent.jar=port=7005,logHandlerClass=net.corda.node.JolokiaSlf4Adapter" "-Xmx1g" "-jar" "corda.jar" && exit'
This solved the out-of-memory issue, but now I am seeing an ActiveMQ large-message-size issue:
E 10:57:31-0600 [Thread-1 (ActiveMQ-IO-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$4#2cfd9b0a)] impl.JournalImpl.run - appendAddRecord::java.lang.IllegalArgumentException: Record is too large to store 22545951 {}
java.lang.IllegalArgumentException: Record is too large to store 22545951
at org.apache.activemq.artemis.core.journal.impl.JournalImpl.switchFileIfNecessary(JournalImpl.java:2915) ~[artemis-journal-2.2.0.jar:2.2.0]
Any idea?
This is because you are trying to send a transaction that is almost 20 MB in size. In Corda 3 and earlier, the limit on transaction size is 10 MB, and this amount is not configurable.
In Corda 4, the limit on transaction size can be configured by the network operator as one of the network parameters (see https://docs.corda.net/head/network-map.html#network-parameters). The rationale for imposing a limit is that otherwise larger nodes could bully smaller nodes off the network by sending extremely large transactions that would be infeasible for the smaller nodes to process.

How can I increase the amount of memory available to a Corda node?

We observe the H2 database running out of memory when running a job that issues/modifies more than 6k states sequentially.
How can I provide more memory to the node?
You can provide more memory to the node using the -Xmx command-line flag. For example, to run the node with 2 GB of heap memory, you would run:
java -Xmx2048m -jar corda.jar
See more information here: https://docs.corda.net/running-a-node.html#starting-an-individual-corda-node.
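If the node is started via the standard Capsule-wrapped corda.jar, the heap flag can also be passed through Capsule's JVM-args property rather than directly to java (the same 2 GB value is used here for illustration):

java -Dcapsule.jvm.args="-Xmx2048m" -jar corda.jar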

Munin Graphs meaning

I've been using Munin for some days and I find the information very interesting, but I don't understand some of the graphs and how they can be read to get information for improving the system.
The ones I don't understand are:
Disk
Disk throughput per device
Inode usage in percent
IOstat
Firewall Throughput
Processes
Fork rate
Number of threads
VMstat
System
Available entropy
File table usage
Individual interrupts
Inode table usage
Interrupts and context switches
Ty!
Munin creates graphs that enable you to see trends. This is very useful to see if a change you made doesn't negatively impact the performance of the system.
Disk
Disk throughput per device & IOstat
The amount of data written or read from a disk device. Disks are always slow compared to memory. A lot of disk reads could for example indicate that your database server doesn't have enough RAM.
Inode usage in percent
Every filesystem has an index where information about the files is stored, such as the name, permissions and location on the disk. With many small files, the space available to this index can run out. If that happens, no new files can be saved to that filesystem, even if there is enough space on the device.
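To check this outside Munin, the per-filesystem inode usage can be listed with:

df -i    # inodes used, free and use% per mounted filesystem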
Firewall Throughput
Just like it says, the number of packets going through the iptables firewall. Often this firewall is active on all interfaces on the system. This is only really interesting if you run Munin on a router/firewall/gateway system.
Processes
Fork rate
Processes are created by forking an existing process into two processes. This is the rate at which new processes are created.
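The raw counter behind this graph is the total number of forks since boot, which the kernel exposes in /proc/stat:

grep ^processes /proc/stat    # total forks since boot; the graph plots its rate of change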
Number of threads
The total number of threads across all processes in the system.
VMstat
Usage of CPU time.
running: time spent running non-kernel code
I/O sleep: time spent waiting for I/O
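These figures come from the vmstat tool, which you can also run directly to see the live numbers (the interval is arbitrary):

vmstat 5    # print CPU, memory, swap and I/O statistics every 5 seconds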
System
Available entropy: The entropy is the measure of the random data available in the kernel's randomness pool (the source behind /dev/random and /dev/urandom). These random numbers are needed, for example, to create SSL connections. If you create a large number of SSL connections, this randomness pool could possibly run out of real random numbers.
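The same number that Munin graphs can be read directly from the kernel:

cat /proc/sys/kernel/random/entropy_avail    # current size of the kernel entropy pool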
File table usage
The total number of files open in the system. If this number suddenly goes up, there might be a program that is not releasing its file handles properly.
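The kernel's file-handle counters can be checked directly with:

cat /proc/sys/fs/file-nr    # allocated handles, unused handles, system-wide maximum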
