I am running my MapReduce program on a multi-node cluster with 3 nodes. The job fails with "Status:failed Too many fetch failures" after the map phase reaches 100% and the reduce phase reaches 16%. Why am I getting this error?
How can I solve it?
The usual causes of this error are DNS misconfiguration, too few HTTP threads on the mapper side for serving map output, or a JVM bug that makes the connections fetching map output fail. Please look at this presentation: http://archive.apachecon.com/na2013/presentations/27-Wednesday/Big_Data/14:45-7_Deadly_Hadoop_Misconfigurations-Kathleen_Ting/HadoopTroubleshootingApacheCon.pdf. You can find a solution on slide 24.
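For example, the HTTP-thread side of this is controlled on Hadoop 1.x by tasktracker.http.threads, which sets how many threads each TaskTracker uses to serve map output to reducers. A sketch of the mapred-site.xml entry (the value here is only an illustration, not a recommendation):

    <!-- mapred-site.xml: raise the number of TaskTracker threads that serve
         map output to reducers (Hadoop 1.x property name; value is illustrative). -->
    <property>
      <name>tasktracker.http.threads</name>
      <value>80</value>
    </property>

For the DNS side, make sure every node resolves every other node's hostname consistently in both directions.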
Hope this helps
I am still learning Python and Nornir for network automation as a network engineer.
I am writing a script that checks for BGP issues across multiple devices with Nornir, using TextFSM.
The Python/Nornir script works correctly without TextFSM.
I get an error message when TextFSM is used for parsing.
I have searched and spent a lot of time trying to find the cause of this issue.
Half of the network devices I manage currently have this issue with TextFSM.
Could you please advise where I should start in order to fix this?
Thank you.
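For reference, here is a minimal standalone netmiko version of what I am doing with TextFSM (the device details are placeholders, not my real inventory):

    from netmiko import ConnectHandler

    # Placeholder device details.
    device = {
        "device_type": "cisco_ios",
        "host": "10.0.0.1",
        "username": "admin",
        "password": "secret",
    }

    net_connect = ConnectHandler(**device)

    # With use_textfsm=True, netmiko looks up an ntc-templates template for this
    # platform/command pair (it needs to find the templates directory, e.g. via
    # the NET_TEXTFSM environment variable). If a template matches, a list of
    # dicts comes back; if not, netmiko typically falls back to the raw string.
    parsed = net_connect.send_command("show ip bgp summary", use_textfsm=True)

    if isinstance(parsed, list):
        for entry in parsed:
            print(entry)
    else:
        print("TextFSM did not parse this output:")
        print(repr(parsed))

    net_connect.disconnect()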
I get the output with netmiko like this:
output = net_connect.send_command('show ip bgp summary')
Here is the output of print(repr(output)):
We are running into a possible latency issue with the Corda Token SDK and encumbrances. The encumbrance is not immediately recognized by DatabaseTokenSelection, but after a sufficient wait it is, and this happens sporadically.
This is a little hard to diagnose, but if you're able to pinpoint what's causing the latency, please do open an issue here: https://github.com/corda/token-sdk/issues?q=is%3Aissue+is%3Aopen+sort%3Aupdated-desc
If you don't hear back on the issue, message me (#david) on slack.corda.net and I can help kick the can along for you.
Some tips for how to debug this:
Test the latency between the nodes with ping.
Make sure any CNAMEs resolve with nslookup from some of your Corda nodes.
Check whether any flows are ending up in the flow hospital along the way: https://docs.corda.net/docs/corda-os/4.7/node-flow-hospital.html
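For the first two checks, something along these lines run from each Corda node (the host names are placeholders for your own endpoints):

    # Placeholders: substitute your own node hosts / CNAMEs.
    ping -c 5 counterparty-node.example.com    # rough node-to-node latency
    nslookup tokens-issuer.example.com         # does the CNAME resolve from this box?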
I am doing a simulation of a data center network. I use the INET TCPBasicClientApp module for TCP traffic. In my simulation, I feed a set of TCP flows from a file. It works for most of the tests, but in some tests I get the error:
Module Error: Ephemeral port range 1024..5000 exhausted
I tried to figure it out, but it seems out of my control, because the error comes from the INET TCP module.
After digging through Google, I got the idea that the error is related to two parameters:
thinkTime
idleInterval
People said that as soon as they increased these, the problem was solved.
I was setting both to 1s; then I tried increasing both parameters to 2s and then 5s, but the error still remains.
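For reference, this is roughly how I set those parameters in omnetpp.ini (the exact module path depends on my network, so treat it as an example):

    # Example module path; adjust to your own network.
    **.client[*].tcpApp[0].thinkTime = 2s
    **.client[*].tcpApp[0].idleInterval = 2s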
Has anyone faced this error and gotten rid of it, or does anyone have any idea how to fix it? Please share it with me; I very much appreciate any help.
Best,
Danh
Hi all, let me share my solution to this problem in case someone needs to refer to it. Instead of using an ephemeral local port, which is limited to a range (1024..5000), I use a normal port number by setting the localPort parameter, with an initial value of 2000 that increases by 1 for every new socket. With that change I don't see the error anymore, so I guess my simulation can now use as many ports as the maximum port number supported in OMNeT++, e.g. 65535 ports. Thanks.
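A sketch of the relevant omnetpp.ini line (the module path is only an example, and in my case the per-socket increment from this starting value is handled in the app code):

    # Start local ports at 2000 instead of the ephemeral range.
    **.client[*].tcpApp[0].localPort = 2000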
Best,
Danh
I am trying to implement a retry strategy for an HTTP outbound endpoint. After Googling, I found that until-successful has good retry capability, but the maximum number of threads available for it is 32. Hence messages may be lost once the thread count reaches 32, and it might cause performance issues. Could someone clarify whether this issue has been fixed in Mule.
What other alternative strategies are available? Any suggestions, links, or sample/pseudo code are really appreciated.
There is no limit to the number of retries you can attempt with until-successful.
I have no idea where you got this "32" from...
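For reference, a minimal sketch of until-successful wrapping an HTTP request, assuming Mule 4 syntax (the retry values, config name and path are placeholders, not recommendations):

    <!-- Assumes Mule 4; the names and values below are placeholders. -->
    <until-successful maxRetries="5" millisBetweenRetries="3000">
        <http:request method="GET" config-ref="HTTP_Request_configuration" path="/orders"/>
    </until-successful>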
I have an assignment to implement simple fault tolerance in an OpenMPI application. The problem we are having is that, despite setting the MPI error handler to MPI_ERRORS_RETURN, when one of our nodes is unplugged from the cluster we get the following error on the next MPI call, after a lengthy hang:
[btl_tcp_endpoint.c:655:mca_btl_tcp_endpoint_complete_connect] connect() failed: Connection timed out (110)
My take from this is that it is not possible to continue processing on all other nodes when one node drops from the network with OpenMPI. Can anyone confirm this for me, or point me in a direction for preventing the btl_tcp_endpoint error?
We are using OpenMPI version 1.6.5.
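For reference, the error-handler setup we are using looks roughly like this, trimmed down to a minimal sketch (the broadcast and the recovery stub are just illustrative):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        /* Ask MPI to return error codes instead of aborting the job. */
        MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        int value = rank;
        int rc = MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);
        if (rc != MPI_SUCCESS) {
            char msg[MPI_MAX_ERROR_STRING];
            int len;
            MPI_Error_string(rc, msg, &len);
            fprintf(stderr, "rank %d: MPI_Bcast failed: %s\n", rank, msg);
            /* Application-level recovery would go here. */
        }

        MPI_Finalize();
        return 0;
    }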
The MPI_ERRORS_RETURN code paths are not well tested (and probably not well implemented) in Open MPI. They simply haven't been a priority, so we've never really done much work in this area.
Sorry.