Flume channel capacity is full, giving an exception - hadoop-streaming

I have started a Flume Twitter agent to fetch Twitter data into HDFS. After a few minutes the data could no longer be written to HDFS, and the terminal kept printing the error below.
Error
org.apache.flume.ChannelException: Take list for MemoryTransaction, capacity 100 full, consider committing more frequently, increasing capacity, or increasing thread count
        at org.apache.flume.channel.MemoryChannel$MemoryTransaction.doTake(MemoryChannel.java:101)
        at org.apache.flume.channel.BasicTransactionSemantics.take(BasicTransactionSemantics.java:113)
        at org.apache.flume.channel.BasicChannelSemantics.take(BasicChannelSemantics.java:95)
        at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:391)
        at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
        at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
        at java.lang.Thread.run(Thread.java:853)
2015-10-16 15:14:18,126 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:160)] Unable to deliver event. Exception follows.
org.apache.flume.EventDeliveryException: org.apache.flume.ChannelException: Take list for MemoryTransaction, capacity 100 full, consider committing more frequently, increasing capacity, or increasing thread count
I guess we need to increase the channel capacity, but I am not sure and need some help.
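For reference, the "take list" capacity in the error corresponds to the memory channel's transactionCapacity (default 100), which should be at least as large as the batch the HDFS sink takes per transaction (hdfs.batchSize). A minimal sketch of the relevant settings, assuming the usual TwitterAgent/MemChannel/HDFS names from the standard Twitter-agent example (adjust to your own agent, channel and sink names):

# flume.conf (sketch; names and values are examples, not the questioner's actual config)
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 1000
# keep the sink's batch size no larger than the channel's transactionCapacity
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000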

Related

Kusto ingestion limit and command throttling because of capacity policy

I use the Kusto ingest client's kustoClient.IngestFromDataReader to ingest data, and it throws the exception: An error occurred for source: 'DataReader'. Error: 'Failed to ingest: State='Throttled', Status='The control command was aborted due to throttling. Retrying after some backoff might succeed. CommandType: 'DataIngestPull', Capacity: 18, Origin: 'CapacityPolicy/Ingestion'.'. I read the document at https://learn.microsoft.com/en-us/azure/data-explorer/kusto/management/capacitypolicy#ingestion-capacity, and I guess it may be because too many requests run concurrently while the cluster capacity is limited. Am I right?
I am still a bit confused about the document. What does the final number (Minimum(ClusterMaximumConcurrentOperations, Number of nodes in cluster * Maximum(1, Core count per node * CoreUtilizationCoefficient))) mean? Does it mean the total number of concurrent operations? And specifically, does one Kusto ingest client or one ingest command count as just one concurrent operation, or is that configurable?
Thanks a lot!
Effectively the document means that ingest capacity (in terms of concurrent ingest operations) is 3/4 times the overall number of cores in the cluster, but not higher than 512.
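As a worked example (using the defaults implied above, i.e. CoreUtilizationCoefficient = 0.75 and ClusterMaximumConcurrentOperations = 512), a hypothetical cluster of 3 nodes with 8 cores each would get:

Minimum(512, 3 * Maximum(1, 8 * 0.75)) = Minimum(512, 18) = 18 concurrent ingest operations

which would line up with the "Capacity: 18" reported in the error message above.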
You can view your cluster capacity and its utilization by running the '.show cluster capacity' command.
If you do not want to handle the throttling yourself, you should use the KustoQueuedIngestClient class, and pass it the ingestion service endpoint (https://ingest-<cluster name>.<region>.kusto.windows.net).
The ingestion service will take care of managing the load on your cluster.
See the Ingestion Overview article for more details.

NiFi memory management

I just want to understand how we should plan for the capacity of a NiFi instance.
We have a NiFi instance with around 500 flows, so the total number of processors enabled on the NiFi canvas is around 4000. We run 2-5 flows simultaneously, and they take no more than half an hour, i.e. we process data in MBs.
It was working fine until now, but we started seeing OutOfMemory errors very often. So we increased the Xms and Xmx parameters from 4g to 8g, which has resolved the problem for now. But going forward we will have more flows and may face the OutOfMemory issue again.
So, can anyone help with a capacity-planning matrix, or any suggestions for avoiding such issues before they happen? E.g.: if we have 3000 processors enabled, with or without active processing, then X GB of memory is required.
Any input on NiFi capacity planning would be appreciated.
Thanks in Advance.
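For reference, the Xms/Xmx change described in the question is normally made in NiFi's conf/bootstrap.conf; a minimal sketch, assuming the stock java.arg.2/java.arg.3 entries (the argument indices may differ in your installation):

# conf/bootstrap.conf (sketch)
java.arg.2=-Xms8g
java.arg.3=-Xmx8g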
OOM errors can occur due to specific memory-consuming processors. For example, SplitXML loads your whole record into memory, so it could load a 1 GiB file, for instance.
Each processor can document what resource considerations should be taken into account. All of the Apache processors (as far as I can tell) are documented in that manner, so you can rely on them.
In our example, by the way, SplitXML can be replaced with SplitRecord, which doesn't load the whole record into memory.
So even if you use 1000 processors simultaneously, they might not consume as much memory as one processor that loads your whole FlowFile's content into memory.
Check which processors you are using and make sure you don't use one like that (there are more like this one that load the whole document into memory).

Corda OutOfMemory Issue And ActiveMQ large message size issue

I tried sending almost 5000 output states in a single transaction and ran out of memory. I am trying to figure out how to increase the memory. I tried increasing it in the runnodes.bat file by tweaking the command:
java -Xmx1g -jar runnodes.jar %*
But this doesn't seem to increase the heap size, so I tried running the following command for each node manually, with the memory option -Xmx1g:
bash -c 'cd "/build/nodes/Notary" ; "/Library/Java/JavaVirtualMachines/jdk1.8.0_152.jdk/Contents/Home/jre/bin/java" "-Dname=Notary-corda.jar" "-Dcapsule.jvm.args=-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005 -javaagent:drivers/jolokia-jvm-1.3.7-agent.jar=port=7005,logHandlerClass=net.corda.node.JolokiaSlf4Adapter" "-Xmx1g" "-jar" "corda.jar" && exit'
This solved the out-of-memory issue, but now I started seeing an ActiveMQ large-message-size issue:
E 10:57:31-0600 [Thread-1 (ActiveMQ-IO-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$4#2cfd9b0a)] impl.JournalImpl.run - appendAddRecord::java.lang.IllegalArgumentException: Record is too large to store 22545951 {}
java.lang.IllegalArgumentException: Record is too large to store 22545951
at org.apache.activemq.artemis.core.journal.impl.JournalImpl.switchFileIfNecessary(JournalImpl.java:2915) ~[artemis-journal-2.2.0.jar:2.2.0]
Any idea?
This is because you are trying to send a transaction that is over 20MB in size (the log above shows a 22545951-byte record). In Corda 3 and earlier, the limit on transaction size is 10MB, and this amount is not configurable.
In Corda 4, the limit on transaction size can be configured by the network operator as one of the network parameters (see https://docs.corda.net/head/network-map.html#network-parameters). The logic for allowing a configurable limit is that otherwise, larger nodes could bully smaller nodes off the network by sending extremely large transactions that it would be infeasible for the smaller nodes to process.
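If the use case allows it, one way to stay under the limit is simply to split the output states across several smaller transactions rather than one huge one. Below is a rough Java sketch of a Corda 3 flow doing that; MyState, MyContract, its CONTRACT_ID/Commands.Issue and the batch size of 500 are all hypothetical placeholders, not anything from the question:

import java.util.List;
import co.paralleluniverse.fibers.Suspendable;
import net.corda.core.flows.FinalityFlow;
import net.corda.core.flows.FlowException;
import net.corda.core.flows.FlowLogic;
import net.corda.core.flows.InitiatingFlow;
import net.corda.core.flows.StartableByRPC;
import net.corda.core.identity.Party;
import net.corda.core.transactions.SignedTransaction;
import net.corda.core.transactions.TransactionBuilder;

@InitiatingFlow
@StartableByRPC
public class BatchedIssueFlow extends FlowLogic<Void> {
    private final List<MyState> outputs;   // MyState stands in for your own ContractState type

    public BatchedIssueFlow(List<MyState> outputs) {
        this.outputs = outputs;
    }

    @Suspendable
    @Override
    public Void call() throws FlowException {
        Party notary = getServiceHub().getNetworkMapCache().getNotaryIdentities().get(0);
        int batchSize = 500;   // chosen so each transaction stays well under the size limit
        for (int i = 0; i < outputs.size(); i += batchSize) {
            List<MyState> batch = outputs.subList(i, Math.min(i + batchSize, outputs.size()));
            TransactionBuilder builder = new TransactionBuilder(notary);
            for (MyState state : batch) {
                builder.addOutputState(state, MyContract.CONTRACT_ID);   // placeholder contract ID
            }
            builder.addCommand(new MyContract.Commands.Issue(), getOurIdentity().getOwningKey());
            SignedTransaction stx = getServiceHub().signInitialTransaction(builder);
            subFlow(new FinalityFlow(stx));   // notarise and record each smaller transaction
        }
        return null;
    }
}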

DynamoDB: High SuccessfulRequestLatency

We had a period of latency in our application that was directly correlated with latency in DynamoDB and we are trying to figure out what caused that latency.
During that time, the consumed reads and consumed writes for the table were normal (much below the provisioned capacity) and the number of throttled requests was also 0 or 1. The only thing that increased was the SuccessfulRequestLatency.
The high latency occurred during a period where we were doing a lot of automatic writes. In our use case, writing to dynamo also includes some reading (to get any existing records). However, we often write the same quantity of data in the same period of time without causing any increased latency.
Is there any way to understand what contributes to an increase in SuccessfulRequestLatency when it seems that we have provisioned enough read capacity? Is there any way to diagnose the latency caused by this set of writes to DynamoDB?
You can dig deeper by checking the Get Latency and Put Latency in CloudWatch.
As you have already mentioned, there was no throttling, your writes involve some reading as well, and the same writes at other periods of time don't cause any latency, so you should check what exactly in the read operations is causing this.
Check the SuccessfulRequestLatency metric while including the Operation dimension as well. Start with GetItem and BatchGetItem; if that doesn't help, include Scan and Query as well.
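A sketch of pulling that per-operation breakdown with the AWS SDK for Java (v1); the table name, time window, period and statistics are placeholders to adapt to your own case:

import com.amazonaws.services.cloudwatch.AmazonCloudWatch;
import com.amazonaws.services.cloudwatch.AmazonCloudWatchClientBuilder;
import com.amazonaws.services.cloudwatch.model.Dimension;
import com.amazonaws.services.cloudwatch.model.GetMetricStatisticsRequest;
import com.amazonaws.services.cloudwatch.model.GetMetricStatisticsResult;
import java.util.Date;

public class DynamoLatencyByOperation {
    public static void main(String[] args) {
        AmazonCloudWatch cloudWatch = AmazonCloudWatchClientBuilder.defaultClient();
        // SuccessfulRequestLatency broken down by Operation, here GetItem on a placeholder table.
        GetMetricStatisticsRequest request = new GetMetricStatisticsRequest()
                .withNamespace("AWS/DynamoDB")
                .withMetricName("SuccessfulRequestLatency")
                .withDimensions(
                        new Dimension().withName("TableName").withValue("my-table"),    // placeholder table name
                        new Dimension().withName("Operation").withValue("GetItem"))     // also try BatchGetItem, Query, Scan
                .withStartTime(new Date(System.currentTimeMillis() - 3600_000L))        // last hour
                .withEndTime(new Date())
                .withPeriod(60)                                                         // 1-minute granularity
                .withStatistics("Average", "Maximum");
        GetMetricStatisticsResult result = cloudWatch.getMetricStatistics(request);
        result.getDatapoints().forEach(dp ->
                System.out.println(dp.getTimestamp() + " avg=" + dp.getAverage() + "ms max=" + dp.getMaximum() + "ms"));
    }
}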
High request latency can sometimes happen when DynamoDB is doing an internal failover of one of its storage nodes.
Internally within Dynamo each storage partition has to be replicated across multiple nodes to provide a high level of fault tolerance. Occasionally one of those nodes will fail and a replacement node has to be introduced, and this can result in elevated latency for a subset of affected requests.
The advice I've had from AWS is to use a short timeout and a fast retry (e.g. 100ms) if your use-case is latency-sensitive. It's my understanding that only requests that hit the affected node experience increased latency, so within one or two retries you'll hit a different node and get a successful response, with minimal impact on your overall latency. Obviously it's hard to verify this, because it's not a scenario you can reproduce!
If you've got a support contract with AWS, it's well worth submitting a support ticket from the AWS console when events like this happen. They are usually able to provide an insight into what actually happened.
Note: If you're doing retries, remember to use exponential backoff to reduce the risk of throttling.
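For illustration, here is what the short-timeout-plus-retry advice might look like with the AWS SDK for Java (v1); the 100 ms figure comes from the answer above and should be tuned to your own latency profile:

import com.amazonaws.ClientConfiguration;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;

public class LowLatencyDynamoClient {
    public static AmazonDynamoDB build() {
        ClientConfiguration config = new ClientConfiguration()
                .withRequestTimeout(100)   // abort any single HTTP request after 100 ms so a retry can hit another node
                .withMaxErrorRetry(3);     // the SDK's default retry policy already applies exponential backoff
        return AmazonDynamoDBClientBuilder.standard()
                .withClientConfiguration(config)
                .build();
    }
}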

Munin Graphs meaning

I've been using Munin for some days and I think the information is very interesting, but I don't understand some of the graphs and how they can be used/read to get information for improving the system.
The ones I don't understand are:
Disk
Disk throughput per device
Inode usage in percent
IOstat
Firewall Throughput
Processes
Fork rate
Number of threads
VMstat
System
Available entropy
File table usage
Individual interrupts
Inode table usage
Interrupts and context switches
Ty!
Munin creates graphs that enable you to see trends. This is very useful for checking that a change you made doesn't negatively impact the performance of the system.
Disk
Disk throughput per device & IOstat
The amount of data written to or read from a disk device. Disks are always slow compared to memory. A lot of disk reads could, for example, indicate that your database server doesn't have enough RAM.
Inode usage in percent
Every filesystem has an index where information about the files is stored, such as name, permissions and location on the disk. With many small files, the space available to this index could run out. If that happens, no new files can be saved to that filesystem, even if there is enough space on the device.
Firewall Throughput
Just like it says, the amount of packets going through the iptables firewall. Often this firewall is active on all interfaces on the system. This is only really interesting if you run Munin on a router/firewall/gateway system.
Processes
Fork rate
Processes are created by forking an existing process into two processes. This is the rate at which new processes are created.
Number of threads
The total number of threads running on the system.
VMstat
Usage of CPU time.
running: time spent running non-kernel code.
I/O sleep: time spent waiting for I/O.
System
Available entropy: a measure of the random data available in the kernel's entropy pool (which /dev/random draws from). These random numbers are needed, for example, to create SSL connections. If you create a large number of SSL connections, this randomness pool could possibly run out of real random numbers.
File table usage
The total number of files open in the system. If this number suddenly goes up there might be a program that is not releasing its file handles properly.
