Getting error while running map reduce jobs in R

I have just started integrating RHadoop. RStudio Server is set up to work with Hadoop, but I get an error when running MapReduce jobs. The error occurs when I run the following lines of code:
library(rmr2)
a <- to.dfs(seq(from=1, to=500, by=3), output="/user/hduser/num")
b <- mapreduce(input=a, map=function(k,v){keyval(v,v*v)})
StackTrace:
15/03/24 21:13:47 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
packageJobJar: [] [/usr/lib/hadoop-mapreduce/hadoop-streaming-2.5.0-cdh5.2.0.jar] /tmp/streamjob4788227373090541042.jar tmpDir=null
15/03/24 21:13:48 INFO client.RMProxy: Connecting to ResourceManager at tungsten10/192.168.0.123:8032
15/03/24 21:13:48 INFO client.RMProxy: Connecting to ResourceManager at tungsten10/192.168.0.123:8032
15/03/24 21:13:49 INFO mapred.FileInputFormat: Total input paths to process : 1
15/03/24 21:13:50 INFO mapreduce.JobSubmitter: number of splits:2
15/03/24 21:13:50 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1427104115974_0009
15/03/24 21:13:50 INFO impl.YarnClientImpl: Submitted application application_1427104115974_0009
15/03/24 21:13:50 INFO mapreduce.Job: The url to track the job: http://XXX.XXX.XXX.XXX:8088/proxy/application_1427104115974_0009/
15/03/24 21:13:50 INFO mapreduce.Job: Running job: job_1427104115974_0009
15/03/24 21:14:02 INFO mapreduce.Job: Job job_1427104115974_0009 running in uber mode : false
15/03/24 21:14:03 INFO mapreduce.Job: map 0% reduce 0%
15/03/24 21:14:07 INFO mapreduce.Job: Task Id : attempt_1427104115974_0009_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
15/03/24 21:14:08 INFO mapreduce.Job: Task Id : attempt_1427104115974_0009_m_000001_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
15/03/24 21:14:15 INFO mapreduce.Job: Task Id : attempt_1427104115974_0009_m_000001_1, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
15/03/24 21:14:16 INFO mapreduce.Job: Task Id : attempt_1427104115974_0009_m_000000_1, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
15/03/24 21:14:20 INFO mapreduce.Job: Task Id : attempt_1427104115974_0009_m_000001_2, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
15/03/24 21:14:21 INFO mapreduce.Job: Task Id : attempt_1427104115974_0009_m_000000_2, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
15/03/24 21:14:25 INFO mapreduce.Job: map 100% reduce 0%
15/03/24 21:14:26 INFO mapreduce.Job: Job job_1427104115974_0009 failed with state FAILED due to: Task failed task_1427104115974_0009_m_000001
Job failed as tasks failed. failedMaps:1 failedReduces:0
15/03/24 21:14:26 INFO mapreduce.Job: Counters: 13
Job Counters
Failed map tasks=7
Killed map tasks=1
Launched map tasks=8
Other local map tasks=6
Data-local map tasks=2
Total time spent by all maps in occupied slots (ms)=27095
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=27095
Total vcore-seconds taken by all map tasks=27095
Total megabyte-seconds taken by all map tasks=27745280
Map-Reduce Framework
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
15/03/24 21:14:26 ERROR streaming.StreamJob: Job not Successful!
Streaming Command Failed!
Error in mr(map = map, reduce = reduce, combine = combine, vectorized.reduce, :
hadoop streaming failed with error code 1
15/03/24 21:14:30 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 1440 minutes, Emptier interval = 0 minutes.
Moved: 'hdfs://XXX.XXX.XXX.XXX:8020/tmp/file10076f272b9a' to trash at: hdfs://XXX.XXX.XXX.XXX:8020/user/hduser/.Trash/Current
I have searched extensively for a solution to this problem but have not found one yet.
As I am new to RHadoop, I am stuck on this problem.
Can anyone please help me resolve it? I would be very thankful.

The error occurs because the HADOOP_STREAMING environment variable is not set in your code. You should set it to the full path of the streaming jar, including the jar file name. The R code below works fine for me.
R code (I'm using Hadoop 2.4.0 on Ubuntu):
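# Point rmr2 and rhdfs at the hadoop executable and the streaming jar (full path, including the file name)
Sys.setenv("HADOOP_CMD"="/usr/local/hadoop/bin/hadoop")
Sys.setenv("HADOOP_STREAMING"="/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.4.0.jar")
library(rJava)
library(rhdfs)
# Initialise rhdfs
hdfs.init()
library(rmr2)
# Write the sequence to HDFS, then emit each value with its square in the map phase
a <- to.dfs(seq(from=1, to=500, by=3), output="/user/hduser/num")
b <- mapreduce(input=a, map=function(k,v){keyval(v,v*v)})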
Hope this helps.
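Before setting these variables in R, it can help to confirm from a shell that both paths actually exist on the machine. The sketch below is one way to do that. The function names find_streaming_jar and check_rhadoop_paths are hypothetical helpers, not part of Hadoop or rmr2, and the directory layout is an assumption that varies by distribution (the log in the question, for example, shows the jar under /usr/lib/hadoop-mapreduce).

```shell
# Print the first hadoop-streaming jar found under a Hadoop install directory.
# The directory argument is an assumption; adjust it to your installation.
find_streaming_jar() {
  local hadoop_home="$1"
  # Matches names such as hadoop-streaming-2.4.0.jar; sort gives a stable result.
  find "$hadoop_home" -type f -name 'hadoop-streaming-*.jar' 2>/dev/null | sort | head -n 1
}

# Return 0 only if the hadoop command is executable and the streaming jar exists,
# i.e. the two values are usable for HADOOP_CMD and HADOOP_STREAMING.
check_rhadoop_paths() {
  local hadoop_cmd="$1"
  local streaming_jar="$2"
  [ -x "$hadoop_cmd" ] || { echo "HADOOP_CMD not executable: $hadoop_cmd" >&2; return 1; }
  [ -f "$streaming_jar" ] || { echo "HADOOP_STREAMING jar not found: $streaming_jar" >&2; return 1; }
}
```

For example, find_streaming_jar /usr/local/hadoop prints a candidate value for HADOOP_STREAMING, which can then be passed to Sys.setenv() as in the R code above.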

Related

API Manager 4.1: cannot force restart from the Management Console

I have a problem: I cannot force a restart from the management console when logged in as admin. Can anyone help me with this? I get a log like this:
[2022-10-01 00:45:04,581] INFO - CarbonCoreActivator Starting WSO2 Carbon...
[2022-10-01 00:45:04,581] INFO - CarbonCoreActivator Operating System : Linux 4.18.0-305.el8.x86_64, amd64
[2022-10-01 00:45:04,581] INFO - CarbonCoreActivator Java Home : /usr/java/jdk1.8.0_341-amd64/jre
[2022-10-01 00:45:04,581] INFO - CarbonCoreActivator Java Version : 1.8.0_341
[2022-10-01 00:45:04,581] INFO - CarbonCoreActivator Java VM : Java HotSpot(TM) 64-Bit Server VM 25.341-b10,Oracle Corporation
[2022-10-01 00:45:04,582] INFO - CarbonCoreActivator Carbon Home : /opt/source/wso2_binary/wso2am-4.1.0
[2022-10-01 00:45:04,582] INFO - CarbonCoreActivator Java Temp Dir : /opt/source/wso2_binary/wso2am-4.1.0/tmp
[2022-10-01 00:45:04,582] INFO - CarbonCoreActivator User : abhimata, en-US, Asia/Jakarta
[2022-10-01 00:45:04,786] INFO - DefaultCryptoProviderComponent 'CryptoService.Secret' property has not been set. 'org.wso2.carbon.crypto.provider.SymmetricKeyInternalCryptoProvider' won't be registered as an internal crypto provider. Please set the secret if the provider needs to be registered.
[2022-10-01 00:45:05,125] INFO - KafkaEventAdapterServiceDS Successfully deployed the Kafka output event adaptor service
[2022-10-01 00:45:05,279] INFO - TemplateDeployerServiceTrackerDS Successfully deployed the execution manager tracker service
[2022-10-01 00:45:06,716] INFO - ServiceComponent Eventing Hub ServiceComponent is activated
[2022-10-01 00:45:07,431] WARN - Digester Match [Server/Service/Engine/Host/Valve] failed to set property [maxDays] to []
[2022-10-01 00:45:08,095] ERROR - DefaultRealm nullType class java.lang.reflect.InvocationTargetException
org.wso2.carbon.user.core.UserStoreException: nullType class java.lang.reflect.InvocationTargetException
at org.wso2.carbon.user.core.common.DefaultRealm.createObjectWithOptions(DefaultRealm.java:404) ~[org.wso2.carbon.user.core_4.6.3.jar:?]
at org.wso2.carbon.user.core.common.DefaultRealm.initializeObjects(DefaultRealm.java:231) ~[org.wso2.carbon.user.core_4.6.3.jar:?]
at org.wso2.carbon.user.core.common.DefaultRealm.init(DefaultRealm.java:136) ~[org.wso2.carbon.user.core_4.6.3.jar:?]
at org.wso2.carbon.user.core.common.DefaultRealmService.initializeRealm(DefaultRealmService.java:276) ~[org.wso2.carbon.user.core_4.6.3.jar:?]
at org.wso2.carbon.user.core.common.DefaultRealmService.<init>(DefaultRealmService.java:102) ~[org.wso2.carbon.user.core_4.6.3.jar:?]
at org.wso2.carbon.user.core.common.DefaultRealmService.<init>(DefaultRealmService.java:115) ~[org.wso2.carbon.user.core_4.6.3.jar:?]
at org.wso2.carbon.user.core.internal.Activator.startDeploy(Activator.java:72) ~[?:?]
at org.wso2.carbon.user.core.internal.BundleCheckActivator.start(BundleCheckActivator.java:61) ~[?:?]
at org.eclipse.osgi.internal.framework.BundleContextImpl$3.run(BundleContextImpl.java:842) ~[org.eclipse.osgi_3.14.0.v20190517-1309.jar:?]
at org.eclipse.osgi.internal.framework.BundleContextImpl$3.run(BundleContextImpl.java:1) ~[org.eclipse.osgi_3.14.0.v20190517-1309.jar:?]
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_341]
at org.eclipse.osgi.internal.framework.BundleContextImpl.startActivator(BundleContextImpl.java:834) ~[org.eclipse.osgi_3.14.0.v20190517-1309.jar:?]
at org.eclipse.osgi.internal.framework.BundleContextImpl.start(BundleContextImpl.java:791) ~[org.eclipse.osgi_3.14.0.v20190517-1309.jar:?]
at org.eclipse.osgi.internal.framework.EquinoxBundle.startWorker0(EquinoxBundle.java:1013) ~[org.eclipse.osgi_3.14.0.v20190517-1309.jar:?]
at org.eclipse.osgi.internal.framework.EquinoxBundle$EquinoxModule.startWorker(EquinoxBundle.java:365) ~[org.eclipse.osgi_3.14.0.v20190517-1309.jar:?]
at org.eclipse.osgi.container.Module.doStart(Module.java:598) ~[org.eclipse.osgi_3.14.0.v20190517-1309.jar:?]
at org.eclipse.osgi.container.Module.start(Module.java:462) ~[org.eclipse.osgi_3.14.0.v20190517-1309.jar:?]
at org.eclipse.osgi.container.ModuleContainer$ContainerStartLevel$1.run(ModuleContainer.java:1820) ~[org.eclipse.osgi_3.14.0.v20190517-1309.jar:?]
at org.eclipse.osgi.internal.framework.EquinoxContainerAdaptor$2$1.execute(EquinoxContainerAdaptor.java:150) ~[org.eclipse.osgi_3.14.0.v20190517-1309.jar:?]
at org.eclipse.osgi.container.ModuleContainer$ContainerStartLevel.incStartLevel(ModuleContainer.java:1813) ~[org.eclipse.osgi_3.14.0.v20190517-1309.jar:?]
at org.eclipse.osgi.container.ModuleContainer$ContainerStartLevel.incStartLevel(ModuleContainer.java:1770) ~[org.eclipse.osgi_3.14.0.v20190517-1309.jar:?]
at org.eclipse.osgi.container.ModuleContainer$ContainerStartLevel.doContainerStartLevel(ModuleContainer.java:1735) ~[org.eclipse.osgi_3.14.0.v20190517-1309.jar:?]
at org.eclipse.osgi.container.ModuleContainer$ContainerStartLevel.dispatchEvent(ModuleContainer.java:1661) ~[org.eclipse.osgi_3.14.0.v20190517-1309.jar:?]
at org.eclipse.osgi.container.ModuleContainer$ContainerStartLevel.dispatchEvent(ModuleContainer.java:1) ~[org.eclipse.osgi_3.14.0.v20190517-1309.jar:?]
at org.eclipse.osgi.framework.eventmgr.EventManager.dispatchEvent(EventManager.java:234) ~[org.eclipse.osgi_3.14.0.v20190517-1309.jar:?]
at org.eclipse.osgi.framework.eventmgr.EventManager$EventThread.run(EventManager.java:345) ~[org.eclipse.osgi_3.14.0.v20190517-1309.jar:?]
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_341]
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_341]
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_341]
at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_341]
at org.wso2.carbon.user.core.common.DefaultRealm.createObjectWithOptions(DefaultRealm.java:358) ~[org.wso2.carbon.user.core_4.6.3.jar:?]
... 25 more
Caused by: org.wso2.carbon.user.core.UserStoreException: DB error occurred while persisting domain : PRIMARY & tenant id : -1234
at org.wso2.carbon.user.core.util.UserCoreUtil.persistDomain(UserCoreUtil.java:931) ~[org.wso2.carbon.user.core_4.6.3.jar:?]
at org.wso2.carbon.user.core.common.AbstractUserStoreManager.persistDomain(AbstractUserStoreManager.java:9083) ~[org.wso2.carbon.user.core_4.6.3.jar:?]
at org.wso2.carbon.user.core.jdbc.JDBCUserStoreManager.<init>(JDBCUserStoreManager.java:320) ~[org.wso2.carbon.user.core_4.6.3.jar:?]
at org.wso2.carbon.user.core.jdbc.JDBCUserStoreManager.<init>(JDBCUserStoreManager.java:262) ~[org.wso2.carbon.user.core_4.6.3.jar:?]
at org.wso2.carbon.user.core.jdbc.UniqueIDJDBCUserStoreManager.<init>(UniqueIDJDBCUserStoreManager.java:129) ~[org.wso2.carbon.user.core_4.6.3.jar:?]
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_341]
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_341]
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_341]
at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_341]
at org.wso2.carbon.user.core.common.DefaultRealm.createObjectWithOptions(DefaultRealm.java:358) ~[org.wso2.carbon.user.core_4.6.3.jar:?]
... 25 more
I expect to be able to force a restart from the management console, or to restart from the Linux command line. Please help me with this. Thanks.

Usually you can stop the WSO2 server gracefully with ./wso2server.sh --stop (./api-manager.sh --stop for the latest versions).
The management console (https://localhost:9443/carbon) also has a shutdown button that you can use.
If you need to kill the process right now, you can use built-in Linux commands such as ps to locate the process and kill to stop it.
Enter the command ps ax | grep wso2
This lists the running processes whose command line contains the text wso2. Unless you started several APIM instances, there will be one matching server process. The first number on its line is the process ID.
Enter the following command to shut down APIM gracefully.
kill <process-id>
If necessary, you can force the shutdown by sending the KILL signal (-9) as follows.
kill -9 <process-id>
If ps still shows a java process that sets the Java home to <APIM_HOME>, you can additionally kill it by its process ID (kill expects a process ID, not a process name):
kill -9 <java-process-id>
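The terminate-then-force sequence above can be sketched as a small shell function. This is only an illustration: stop_process is a hypothetical helper name, and the default 30-second wait is an assumed value, not a WSO2 recommendation.

```shell
# Send SIGTERM to a process, wait up to a timeout for it to exit,
# and fall back to SIGKILL only if it is still running.
# $1 = process ID, $2 = timeout in seconds (assumed default: 30)
stop_process() {
  local pid="$1"
  local timeout="${2:-30}"
  local waited=0
  # Plain kill sends SIGTERM; return if the signal cannot be delivered
  # (for example because no such process exists).
  kill "$pid" 2>/dev/null || return 0
  # kill -0 sends no signal; it only checks whether the process still exists.
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "PID $pid still running after ${timeout}s, sending SIGKILL" >&2
      kill -9 "$pid" 2>/dev/null
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
}
```

For example, stop_process <process-id> 60 sends the default TERM signal, waits up to 60 seconds, and uses kill -9 only if the process is still running.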

STORE relation problem using pig -x local, failed to read data using pig -x mapreduce

1st approach: Using pig -x mapreduce
The HBase table was created via the hbase shell:
hbase(main):003:0> list
TABLE
clientes
1 row(s)
Took 0.0047 seconds
=> ["clientes"]
I used this code to load data from clientes.txt into dados (pig -x mapreduce):
grunt> dados = LOAD 'file:///mnt/win/GD/DS/1Formacao/3EngenhariaDeDadosHadoop/07/Arquivos/1-HBase/clientes.txt' USING PigStorage(',') AS (
id:chararray,
nome:chararray,
sobrenome:chararray,
idade:int,
funcao:chararray
);
I checked dados with dump dados, and it failed:
2021-03-07 19:00:32,390 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1615152557282_0002
2021-03-07 19:00:32,390 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases dados
2021-03-07 19:00:32,390 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: dados[1,8],dados[-1,-1] C: R:
2021-03-07 19:00:32,395 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2021-03-07 19:00:37,406 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2021-03-07 19:00:37,406 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_1615152557282_0002 has failed! Stop running all dependent jobs
2021-03-07 19:00:37,406 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2021-03-07 19:00:37,410 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2021-03-07 19:00:37,492 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Could not get Job info from RM for job job_1615152557282_0002. Redirecting to job history server.
2021-03-07 19:00:37,595 [main] ERROR org.apache.pig.tools.pigstats.PigStats - ERROR 0: java.lang.IllegalStateException: Job in state DEFINE instead of RUNNING
2021-03-07 19:00:37,595 [main] ERROR org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil - 1 map reduce job(s) failed!
2021-03-07 19:00:37,597 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
3.2.2 0.17.0 hadoop 2021-03-07 19:00:31 2021-03-07 19:00:37 UNKNOWN
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_1615152557282_0002 dados MAP_ONLY Message: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Wrong FS: hdfs://localhost:9000/user/hadoop, expected: file:///
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:294)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:310)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:327)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:200)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1565)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1562)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1562)
at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:336)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.pig.backend.hadoop.PigJobControl.submit(PigJobControl.java:128)
at org.apache.pig.backend.hadoop.PigJobControl.run(PigJobControl.java:205)
at java.lang.Thread.run(Thread.java:748)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:301)
Caused by: java.lang.IllegalArgumentException: Wrong FS: hdfs://localhost:9000/user/hadoop, expected: file:///
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:737)
at org.apache.hadoop.fs.RawLocalFileSystem.setWorkingDirectory(RawLocalFileSystem.java:604)
at org.apache.hadoop.fs.FilterFileSystem.setWorkingDirectory(FilterFileSystem.java:307)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:250)
... 18 more
hdfs://localhost:9000/tmp/temp-1169299097/tmp-2103156722,
Input(s):
Failed to read data from "file:///mnt/win/GD/DS/1Formacao/3EngenhariaDeDadosHadoop/07/Arquivos/1-HBase/clientes.txt"
Output(s):
Failed to produce result in "hdfs://localhost:9000/tmp/temp-1169299097/tmp-2103156722"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_1615152557282_0002
2021-03-07 19:00:37,597 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2021-03-07 19:00:37,601 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias dados. Backend error : java.lang.IllegalStateException: Job in state DEFINE instead of RUNNING
Details at logfile: /mnt/win/GD/DS/1Formacao/3EngenhariaDeDadosHadoop/07/Arquivos/1-HBase/pig_1615154395936.log
2nd approach: Using pig -x local (dump dados works)
grunt> dados = LOAD 'file:///mnt/win/GD/DS/1Formacao/3EngenhariaDeDadosHadoop/07/Arquivos/1-HBase/clientes.txt' USING PigStorage(',') AS (
>> id:chararray,
>> nome:chararray,
>> sobrenome:chararray,
>> idade:int,
>> funcao:chararray
>> );
2021-03-07 19:02:17,219 [LocalJobRunner Map Task Executor #0] INFO org.apache.pig.builtin.PigStorage - Using PigTextInputFormat
2021-03-07 19:02:17,222 [LocalJobRunner Map Task Executor #0] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader - Current split being processed file:/mnt/win/GD/DS/1Formacao/3EngenhariaDeDadosHadoop/07/Arquivos/1-HBase/clientes.txt:0+794
2021-03-07 19:02:17,226 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - File Output Committer Algorithm version is 2
2021-03-07 19:02:17,226 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2021-03-07 19:02:17,241 [LocalJobRunner Map Task Executor #0] INFO org.apache.pig.impl.util.SpillableMemoryManager - Selected heap (PS Old Gen) of size 699400192 to monitor. collectionUsageThreshold = 489580128, usageThreshold = 489580128
2021-03-07 19:02:17,243 [LocalJobRunner Map Task Executor #0] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2021-03-07 19:02:17,253 [LocalJobRunner Map Task Executor #0] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: dados[1,8],dados[-1,-1] C: R:
2021-03-07 19:02:17,266 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.LocalJobRunner -
2021-03-07 19:02:17,274 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.Task - Task:attempt_local116575577_0001_m_000000_0 is done. And is in the process of committing
2021-03-07 19:02:17,280 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.LocalJobRunner -
2021-03-07 19:02:17,280 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.Task - Task attempt_local116575577_0001_m_000000_0 is allowed to commit now
2021-03-07 19:02:17,285 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of task 'attempt_local116575577_0001_m_000000_0' to file:/tmp/temp2133275539/tmp1539690224
2021-03-07 19:02:17,286 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.LocalJobRunner - map
2021-03-07 19:02:17,286 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.Task - Task 'attempt_local116575577_0001_m_000000_0' done.
2021-03-07 19:02:17,291 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.Task - Final Counters for attempt_local116575577_0001_m_000000_0: Counters: 16
File System Counters
FILE: Number of bytes read=1264
FILE: Number of bytes written=530456
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
Map-Reduce Framework
Map input records=20
Map output records=20
Input split bytes=414
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=0
Total committed heap usage (bytes)=311427072
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
org.apache.pig.PigWarning
FIELD_DISCARDED_TYPE_CONVERSION_FAILED=1
2021-03-07 19:02:17,291 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.LocalJobRunner - Finishing task: attempt_local116575577_0001_m_000000_0
2021-03-07 19:02:17,291 [Thread-7] INFO org.apache.hadoop.mapred.LocalJobRunner - map task executor complete.
2021-03-07 19:02:17,485 [main] WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl - JobTracker metrics system already initialized!
2021-03-07 19:02:17,492 [main] WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl - JobTracker metrics system already initialized!
2021-03-07 19:02:17,492 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
2021-03-07 19:02:17,492 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
2021-03-07 19:02:17,493 [main] WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl - JobTracker metrics system already initialized!
2021-03-07 19:02:17,536 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2021-03-07 19:02:17,540 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
3.2.2 0.17.0 hadoop 2021-03-07 19:02:16 2021-03-07 19:02:17 UNKNOWN
Success!
Job Stats (time in seconds):
JobId Maps Reduces MaxMapTime MinMapTime AvgMapTime MedianMapTime MaxReduceTime MinReduceTime AvgReduceTime MedianReducetime Alias Feature Outputs
job_local116575577_0001 1 0 n/a n/a n/a n/a 0 0 0 0 dados MAP_ONLY file:/tmp/temp2133275539/tmp1539690224,
Input(s):
Successfully read 20 records from: "file:///mnt/win/GD/DS/1Formacao/3EngenhariaDeDadosHadoop/07/Arquivos/1-HBase/clientes.txt"
Output(s):
Successfully stored 20 records in: "file:/tmp/temp2133275539/tmp1539690224"
Counters:
Total records written : 20
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_local116575577_0001
2021-03-07 19:02:17,542 [main] WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl - JobTracker metrics system already initialized!
2021-03-07 19:02:17,544 [main] WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl - JobTracker metrics system already initialized!
2021-03-07 19:02:17,551 [main] WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl - JobTracker metrics system already initialized!
2021-03-07 19:02:17,558 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Encountered Warning FIELD_DISCARDED_TYPE_CONVERSION_FAILED 1 time(s).
2021-03-07 19:02:17,558 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2021-03-07 19:02:17,563 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2021-03-07 19:02:17,563 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2021-03-07 19:02:17,570 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input files to process : 1
2021-03-07 19:02:17,570 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(id,nome,sobrenome,,funcao)
(c001,Josias,Silva,55,Analista de Mercado)
(1100002,Pedro,Malan,74,Professor)
(1100003,Maria,Maciel,34,Bombeiro)
(1100004,Suzana,Bustamante,66,Analista de TI)
(1100005,Karen,Moreira,74,Advogado)
(1100006,Patricio,Teixeira,42,Veterinario)
(1100007,Elisa,Haniero,43,Piloto)
(1100008,Mauro,Bender,63,Marceneiro)
(1100009,Mauricio,Wagner,39,Artista)
(1100010,Douglas,Macedo,60,Escritor)
(1100011,Francisco,McNamara,47,Cientista de Dados)
(1100012,Sidney,Raynor,26,Escritor)
(1100013,Maria,Moon,41,Gerente de Projetos)
(1100014,Bete,Balanaira,65,Musico)
(1100015,Julia,Peixoto,49,Especialista em TI)
(1100016,Jeronimo,Wallace,52,Engenheiro de Dados)
(1100017,Noeli,Laura,72,Cientista de Dados)
(1100018,Jean,Junior,45,Desenvolvedor RPA)
(1100019,Cristina,Garbim,63,Engenheiro Blockchain)
But STORE dados INTO 'hbase://clientes' or STORE dados INTO 'file:///home/hadoop/hadloop/pig_output' fails:
grunt> STORE dados INTO 'hbase://clientes' USING
>> org.apache.pig.backend.hadoop.hbase.HBaseStorage(
>> 'dados_clientes:nome
>> dados_clientes:sobrenome
>> dados_clientes:idade
>> dados_clientes:funcao'
>> );
2021-03-07 19:03:51,347 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local1289080477_0002
2021-03-07 19:03:51,347 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases dados
2021-03-07 19:03:51,347 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: dados[1,8],dados[-1,-1] C: R:
2021-03-07 19:03:51,349 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2021-03-07 19:03:51,349 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_local1289080477_0002]
2021-03-07 19:03:51,835 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.hbase.mapreduce.TableOutputFormat - Created table instance for clientes
2021-03-07 19:03:51,839 [LocalJobRunner Map Task Executor #0] INFO org.apache.pig.impl.util.SpillableMemoryManager - Selected heap (PS Old Gen) of size 699400192 to monitor. collectionUsageThreshold = 489580128, usageThreshold = 489580128
2021-03-07 19:03:51,839 [LocalJobRunner Map Task Executor #0] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2021-03-07 19:03:51,843 [LocalJobRunner Map Task Executor #0] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: dados[1,8],dados[-1,-1] C: R:
2021-03-07 19:03:51,860 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation - Closing zookeeper sessionid=0x1780e985b4d000f
2021-03-07 19:03:51,866 [LocalJobRunner Map Task Executor #0] INFO org.apache.zookeeper.ZooKeeper - Session: 0x1780e985b4d000f closed
2021-03-07 19:03:51,866 [LocalJobRunner Map Task Executor #0-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x1780e985b4d000f
2021-03-07 19:03:51,867 [Thread-10] INFO org.apache.hadoop.mapred.LocalJobRunner - map task executor complete.
2021-03-07 19:03:51,870 [Thread-10] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local1289080477_0002
java.lang.Exception: java.io.IOException: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:492)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:552)
Caused by: java.io.IOException: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.StoreFuncDecorator.putNext(StoreFuncDecorator.java:83)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:144)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:97)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:670)
at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:282)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:275)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:65)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:799)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:271)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
at java.util.ArrayList.rangeCheck(ArrayList.java:657)
at java.util.ArrayList.get(ArrayList.java:433)
at org.apache.pig.backend.hadoop.hbase.HBaseStorage.putNext(HBaseStorage.java:992)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.StoreFuncDecorator.putNext(StoreFuncDecorator.java:75)
... 18 more
2021-03-07 19:03:52,055 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2021-03-07 19:03:52,055 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local1289080477_0002 has failed! Stop running all dependent jobs
2021-03-07 19:03:52,055 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2021-03-07 19:03:52,056 [main] WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl - JobTracker metrics system already initialized!
2021-03-07 19:03:52,057 [main] WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl - JobTracker metrics system already initialized!
2021-03-07 19:03:52,057 [main] ERROR org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil - 1 map reduce job(s) failed!
2021-03-07 19:03:52,058 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
3.2.2 0.17.0 hadoop 2021-03-07 19:03:50 2021-03-07 19:03:52 UNKNOWN
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_local1289080477_0002 dados MAP_ONLY Message: Job failed! hbase://clientes,
Input(s):
Failed to read data from "file:///mnt/win/GD/DS/1Formacao/3EngenhariaDeDadosHadoop/07/Arquivos/1-HBase/clientes.txt"
Output(s):
Failed to produce result in "hbase://clientes"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_local1289080477_0002
2021-03-07 19:03:52,058 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
grunt> STORE dados INTO 'file:///home/hadoop/hadloop/pig_output' USING
>> org.apache.pig.backend.hadoop.hbase.HBaseStorage(
>> 'dados_clientes:nome
>> dados_clientes:sobrenome
>> dados_clientes:idade
>> dados_clientes:funcao'
>> );
java.lang.Exception: java.lang.IllegalArgumentException: Illegal character code:47, </> at 0. User-space table qualifiers can only contain 'alphanumeric characters': i.e. [a-zA-Z_0-9-.]: ///home/hadoop/hadloop/pig_output
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:492)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:552)
Caused by: java.lang.IllegalArgumentException: Illegal character code:47, </> at 0. User-space table qualifiers can only contain 'alphanumeric characters': i.e. [a-zA-Z_0-9-.]: ///home/hadoop/hadloop/pig_output
at org.apache.hadoop.hbase.TableName.isLegalTableQualifierName(TableName.java:196)
at org.apache.hadoop.hbase.TableName.isLegalTableQualifierName(TableName.java:149)
at org.apache.hadoop.hbase.TableName.<init>(TableName.java:322)
at org.apache.hadoop.hbase.TableName.createTableNameIfNecessary(TableName.java:358)
at org.apache.hadoop.hbase.TableName.valueOf(TableName.java:449)
at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.<init>(TableOutputFormat.java:107)
at org.apache.hadoop.hbase.mapreduce.TableOutputFormat.getRecordWriter(TableOutputFormat.java:153)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.getRecordWriter(PigOutputFormat.java:83)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:659)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:779)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:271)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2021-03-07 19:05:10,476 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local1458581109_0003
2021-03-07 19:05:10,476 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases dados
2021-03-07 19:05:10,476 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: dados[1,8],dados[-1,-1] C: R:
2021-03-07 19:05:10,477 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2021-03-07 19:05:10,477 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2021-03-07 19:05:10,477 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local1458581109_0003 has failed! Stop running all dependent jobs
2021-03-07 19:05:10,478 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2021-03-07 19:05:10,478 [main] WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl - JobTracker metrics system already initialized!
2021-03-07 19:05:10,479 [main] WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl - JobTracker metrics system already initialized!
2021-03-07 19:05:10,480 [main] ERROR org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil - 1 map reduce job(s) failed!
2021-03-07 19:05:10,480 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
3.2.2 0.17.0 hadoop 2021-03-07 19:05:10 2021-03-07 19:05:10 UNKNOWN
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_local1458581109_0003 dados MAP_ONLY Message: Job failed! file:///home/hadoop/hadloop/pig_output,
Input(s):
Failed to read data from "file:///mnt/win/GD/DS/1Formacao/3EngenhariaDeDadosHadoop/07/Arquivos/1-HBase/clientes.txt"
Output(s):
Failed to produce result in "file:///home/hadoop/hadloop/pig_output"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_local1458581109_0003
2021-03-07 19:05:10,480 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
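The second failure can be read from its own message: the output location is validated as an HBase table name, and the part after the first colon starts with '/', which is outside the character class the message quotes. The Python sketch below re-creates only that one check, using the character class copied from the log. The function name first_illegal_char and the split on the first ':' are illustrative assumptions, not HBase's actual implementation.

```python
import re

# Character class quoted in the error message: [a-zA-Z_0-9-.]
LEGAL_CHAR = re.compile(r"[a-zA-Z_0-9\-.]")


def first_illegal_char(qualifier):
    """Return (index, char) of the first character outside the class, or None."""
    for index, char in enumerate(qualifier):
        if not LEGAL_CHAR.fullmatch(char):
            return index, char
    return None


# Assumption: the text after the first ':' is what gets checked, which matches
# the "///home/hadoop/hadloop/pig_output" shown at the end of the log message.
location = "file:///home/hadoop/hadloop/pig_output"
qualifier = location.split(":", 1)[1]

print(first_illegal_char(qualifier))   # (0, '/'), and ord('/') == 47
print(first_illegal_char("clientes"))  # None
```

A plain table name such as clientes passes this check, which is consistent with HBaseStorage expecting a table name, not a filesystem path, as the STORE location.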
Services running:
(base) [hadoop@dataserver 1-HBase]$ jps
4160 SecondaryNameNode
11666 Main
5413 HQuorumPeer
5766 HRegionServer
6966 JobHistoryServer
4631 NodeManager
4457 ResourceManager
5578 HMaster
3835 DataNode
12382 Jps
3615 NameNode
Hadoop version:
SUBCOMMAND may print help when invoked w/o parameters or with -h.
(base) [hadoop@dataserver 1-HBase]$ hadoop version
Hadoop 3.2.2
Source code repository Unknown -r 7a3bc90b05f257c8ace2f76d74264906f0f7a932
Compiled by hexiaoqiao on 2021-01-03T09:26Z
Compiled with protoc 2.5.0
From source with checksum 5a8f564f46624254b27f6a33126ff4
This command was run using /opt/hadoop/share/hadoop/common/hadoop-common-3.2.2.jar
HBase version:
(base) [hadoop@dataserver 1-HBase]$ hbase version
/opt/hadoop/libexec/hadoop-functions.sh: line 2366: HADOOP_ORG.APACHE.HADOOP.HBASE.UTIL.GETJAVAPROPERTY_USER: bad substitution
/opt/hadoop/libexec/hadoop-functions.sh: line 2461: HADOOP_ORG.APACHE.HADOOP.HBASE.UTIL.GETJAVAPROPERTY_OPTS: bad substitution
Error: Could not find or load main class org.apache.hadoop.hbase.util.GetJavaProperty
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hbase/lib/client-facing-thirdparty/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase 2.2.0
Source code repository file:///opt/hbase-rm/output/hbase-2.2.0-bin revision=Unknown
Compiled by hbase-rm on Tue Jun 11 04:30:30 UTC 2019
From source with checksum 63a465554927aeea3f1f0bcae63decff
Pig Version:
(base) [hadoop@dataserver 1-HBase]$ pig version
2021-03-07 19:08:50,197 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
2021-03-07 19:08:50,199 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
2021-03-07 19:08:50,199 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2021-03-07 19:08:50,263 [main] INFO org.apache.pig.Main - Apache Pig version 0.17.0 (r1797386) compiled Jun 02 2017, 15:41:58
2021-03-07 19:08:50,263 [main] INFO org.apache.pig.Main - Logging error messages to: /mnt/win/GD/DS/1Formacao/3EngenhariaDeDadosHadoop/07/Arquivos/1-HBase/pig_1615154930258.log
2021-03-07 19:08:50,536 [main] ERROR org.apache.pig.Main - ERROR 2997: Encountered IOException. File version does not exist
Details at logfile: /mnt/win/GD/DS/1Formacao/3EngenhariaDeDadosHadoop/07/Arquivos/1-HBase/pig_1615154930258.log
2021-03-07 19:08:50,557 [main] INFO org.apache.pig.Main - Pig script completed in 400 milliseconds (400 ms)

To solve this issue, you need to start the Job History Server service.
Run the following command:
mr-jobhistory-daemon.sh start historyserver
Then use the jps command to check that the service is running:
13153 HQuorumPeer
13314 HMaster
**20242 JobHistoryServer**
5043 NameNode
6003 NodeManager
30163 Jps
5845 ResourceManager
5514 SecondaryNameNode
5227 DataNode
28510 RunJar
13519 HRegionServer
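
The jps check can also be scripted. The following Python sketch parses jps-style output and reports whether JobHistoryServer is listed. The helper name running_services is hypothetical, and the sample text is copied from the listing above (shortened).

```python
def running_services(jps_output):
    """Return the set of JVM process names found in `jps` output."""
    names = set()
    for line in jps_output.splitlines():
        # Drop surrounding whitespace and any '*' emphasis markers.
        parts = line.strip().strip("*").split(None, 1)
        # A jps line looks like "<pid> <name>"; skip anything else.
        if len(parts) == 2 and parts[0].isdigit():
            names.add(parts[1].strip())
    return names


sample = """13153 HQuorumPeer
13314 HMaster
**20242 JobHistoryServer**
5043 NameNode
30163 Jps
"""

services = running_services(sample)
print("JobHistoryServer" in services)  # True
```

To use it against a live machine, pass in the text captured from running jps as the same user that started the Hadoop daemons.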

Connect sparklyr 0.8.4 to remote spark 2.2.1 connection

I'm trying to connect from R to a remote Spark cluster.
The Spark cluster is built on Debian Jessie, and the latest R version I can install on it is 3.3, but I need 3.4 to be able to run FactoMineR. So I installed R on another machine and am trying to connect to the cluster using sparklyr 0.8.4:
> sc <- spark_connect(master = "spark://spark-cluster-m:7077", spark_home="/usr/lib/spark/", version="2.2.1")
Error in start_shell(master = master, spark_home = spark_home, spark_version = version, :
SPARK_HOME directory '/usr/lib/spark/' not found
Spark isn't installed on the local machine, but it is installed on spark-cluster-m:
jc@spark-cluster-m:/usr/lib/spark$ ls
bin conf data examples external jars LICENSE licenses NOTICE python R README.md RELEASE sbin work yarn
Have I missed something?
The Spark cluster is on Google Cloud (test account), and so is the VM with R. How do I verify which port Spark can be reached on?
Thanks for any clues.
@user16... You're right, this particular problem seems to be solved, but I'm not there yet.
I installed the same Spark version (2.2.1 with Hadoop > 2.7).
Here is my new error message:
Error in force(code) :
Failed during initialize_connection: java.lang.IllegalArgumentException: requirement failed: Can only call getServletHandlers on a running MetricsSystem
at scala.Predef$.require(Predef.scala:224)
at org.apache.spark.metrics.MetricsSystem.getServletHandlers(MetricsSystem.scala:91)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:524)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2516)
at org.apache.spark.SparkContext.getOrCreate(SparkContext.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at sparklyr.Invoke.invoke(invoke.scala:137)
at sparklyr.StreamHandler.handleMethodCall(stream.scala:123)
at sparklyr.StreamHandler.read(stream.scala:66)
at sparklyr.BackendHandler.channelRead0(handler.scala:51)
at sparklyr.BackendHandler.channelRead0(handler.scala:4)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:293)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:267)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1294)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:911)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:643)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:566)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:480)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:442)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
at java.lang.Thread.run(Thread.java:748)
Log: /tmp/RtmpTUh0z6/file5d231368db0_spark.log
---- Output Log ----
at io.netty.channel.nio.NioEventLoop.processS
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:480)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:442)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
... 1 more
18/07/21 18:24:59 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://spark-cluster-m:7077...
18/07/21 18:24:59 WARN StandaloneAppClient$ClientEndpoint: Failed to connect to master spark-cluster-m:7077
org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:100)
at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:108)
at org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint$$anonfun$tryRegisterAllMasters$1$$anon$1.run(StandaloneAppClient.scala:106)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Failed to connect to spark-cluster-m/10.142.0.3:7077
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:232)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:182)
at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:197)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:194)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:190)
... 4 more
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: spark-cluster-m/10.142.0.3:7077
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:257)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:291)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:631)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:566)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:480)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:442)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
... 1 more
18/07/21 18:25:19 ERROR StandaloneSchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
18/07/21 18:25:19 WARN StandaloneSchedulerBackend: Application ID is not initialized yet.
18/07/21 18:25:19 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 46811.
18/07/21 18:25:19 INFO NettyBlockTransferService: Server created on 10.142.0.5:46811
18/07/21 18:25:19 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
18/07/21 18:25:19 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.142.0.5, 46811, None)
18/07/21 18:25:19 INFO BlockManagerMasterEndpoint: Registering block manager 10.142.0.5:46811 with 366.3 MB RAM, BlockManagerId(driver, 10.142.0.5, 46811, None)
18/07/21 18:25:19 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.142.0.5, 46811, None)
18/07/21 18:25:19 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 10.142.0.5, 46811, None)
18/07/21 18:25:19 INFO SparkUI: Stopped Spark web UI at http://10.142.0.5:4040
18/07/21 18:25:19 INFO StandaloneSchedulerBackend: Shutting down all executors
18/07/21 18:25:19 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asking each executor to shut down
18/07/21 18:25:19 WARN StandaloneAppClient$ClientEndpoint: Drop Unregist
I can see that it resolves the name (=> 10.142.0.3).
Also, it seems to be the right port, because if I use port 7000 I get this error instead:
18/07/21 18:32:54 ERROR TransportResponseHandler: Still have 1 requests outstanding when connection from spark-cluster-m/10.142.0.3:7000 is closed
18/07/21 18:32:54 WARN StandaloneAppClient$ClientEndpoint: Could not connect to spark-cluster-m:7000: java.io.IOException: Connection reset by peer
18/07/21 18:32:54 WARN StandaloneAppClient$ClientEndpoint: Failed to connect to master spark-cluster-m:7000
But I can't figure out what this means.
You say my configuration is "particular". If there is a better (and simpler) approach, I would be glad to use it.
Here is how I proceeded in my tests:
I created a Google Dataproc cluster with Spark (2.2.1).
I added Cassandra on each node.
At this stage, everything works OK.
Then I need to install FactoMineR, as I'd like to try HMFA. It is said to run with R > 3.0.0, so it seems to be OK, but it depends on nlme, which can't be installed on R < 3.4.0 (and the version in the Debian Jessie backports is 3.3).
So, what can I do?
I must admit that I'm not very enthusiastic about reinstalling a full Spark / Cassandra cluster from scratch...
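
On the question of how to verify that the port can be reached: one option is a plain TCP connection test from the machine running R. The Python sketch below is a minimal version of that idea; the function name port_is_open is hypothetical, and the host and port in the usage comment are the ones from the question. A "Connection refused" result, as in the log above, generally means the host answered but nothing accepted the connection on that port (or a firewall rejected it).

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


# Example usage (run from the VM that hosts R):
#   port_is_open("spark-cluster-m", 7077)
```

This only shows whether something accepts TCP connections on the port; it does not confirm that the listener is a Spark master or that the Spark versions on both sides are compatible.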

R + Hadoop with RHadoop job fails on Single Machine Cluster

Apologies in advance for being a newbie and perhaps asking stupid questions.
I have installed Hadoop on a Single Machine Cluster (Ubuntu 14.04) and successfully tested the very basic program specified in the Apache installation guide. Subsequently I installed R, RStudio, and the packages rhdfs, rmr2 and all dependencies.
Then I tried to run the following program:
Sys.setenv(HADOOP_CMD="/usr/local/hadoop/bin/hadoop")
Sys.setenv(HADOOP_STREAMING="/usr/local/hadoop/contrib/streaming/hadoop-streaming-1.2.1.jar")
library('rhdfs')
library('rmr2')
hdfs.init()
small.ints = to.dfs(1:10)
mapreduce(
  input = small.ints,
  map = function(k, v) {
    lapply(seq_along(v), function(r) {
      x <- runif(v[[r]])
      keyval(r, c(max(x), min(x)))
    })
  })
The job fails, and the output on the console is as follows:
packageJobJar: [/tmp/RtmprPBBS1/rmr-local-env242520fb4125, /tmp/RtmprPBBS1/rmr-global-env24252518202b, /tmp/RtmprPBBS1/rmr-streaming-map24255b97931e, /tmp/hadoop-hduser/hadoop-unjar4430970496737933525/] [] /tmp/streamjob6651310557292596411.jar tmpDir=null
14/05/05 09:16:08 INFO mapred.FileInputFormat: Total input paths to process : 1
14/05/05 09:16:08 INFO streaming.StreamJob: getLocalDirs(): [/tmp/hadoop-hduser/mapred/local]
14/05/05 09:16:08 INFO streaming.StreamJob: Running job: job_201405050557_0013
14/05/05 09:16:08 INFO streaming.StreamJob: To kill this job, run:
14/05/05 09:16:08 INFO streaming.StreamJob: /usr/local/hadoop/libexec/../bin/hadoop job -Dmapred.job.tracker=localhost:54311 -kill job_201405050557_0013
14/05/05 09:16:08 INFO streaming.StreamJob: Tracking URL: http://localhost:50030/jobdetails.jsp?jobid=job_201405050557_0013
14/05/05 09:16:09 INFO streaming.StreamJob: map 0% reduce 0%
14/05/05 09:16:41 INFO streaming.StreamJob: map 100% reduce 100%
14/05/05 09:16:41 INFO streaming.StreamJob: To kill this job, run:
14/05/05 09:16:41 INFO streaming.StreamJob: /usr/local/hadoop/libexec/../bin/hadoop job -Dmapred.job.tracker=localhost:54311 -kill job_201405050557_0013
14/05/05 09:16:41 INFO streaming.StreamJob: Tracking URL: http://localhost:50030/jobdetails.jsp?jobid=job_201405050557_0013
14/05/05 09:16:41 ERROR streaming.StreamJob: Job not successful. Error: # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201405050557_0013_m_000001
14/05/05 09:16:41 INFO streaming.StreamJob: killJob...
Streaming Command Failed!
Error in mr(map = map, reduce = reduce, combine = combine, vectorized.reduce, :
hadoop streaming failed with error code 1
The stderr log is as follows:
Error in library(functional) : there is no package called ‘functional’
No traceback available
Error during wrapup:
Execution halted
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:576)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:135)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
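
The first line of the stderr log names the package that the R process started by the map task could not load. When several task logs need checking, that message can be extracted mechanically. The Python sketch below does this; the function name missing_r_packages is hypothetical and the sample line is copied from the log above. Why the package is not visible to the task is not established here; one possibility (an assumption, not something this log confirms) is that the task's R process uses a different library path than the interactive session.

```python
import re

# Matches R's message, whether it uses curly or straight single quotes.
MISSING_PKG = re.compile("there is no package called [‘'](?P<name>[^’']+)[’']")


def missing_r_packages(stderr_text):
    """Return the sorted, de-duplicated package names R failed to load."""
    return sorted({m.group("name") for m in MISSING_PKG.finditer(stderr_text)})


sample = "Error in library(functional) : there is no package called ‘functional’"
print(missing_r_packages(sample))  # ['functional']
```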
I have tried a few other simple demo programs and the result is the same, so it seems that the problem lies with my configuration.
The 'functional' package was already installed and was being loaded automatically. Even loading it manually does not help, so that is most probably not the problem.
Any help or suggestions would be gratefully accepted.
I am running Hadoop 1.2.1, R 3.0.5 and RStudio 0.98.507 on Ubuntu 14.04 in single-node cluster mode. Java is Oracle Java 7, version 1.7.0_55.
The Hadoop installation seems to be OK, since my regular wordcount program works fine.
I am getting identical results with even the simplest RHadoop demo.
Could this be a problem with my machine's capacity? I am running on a slightly high-end laptop: 2.8 GiB memory and an Intel® Core™ i3-2310M CPU @ 2.10GHz × 4 processor.
I have now moved to Hadoop 2.2.0 and managed to install it using this tutorial. The demo program for calculating pi executed without errors.
Then I executed this very simple MR program:
Sys.setenv(HADOOP_CMD="/usr/local/hadoop220/bin/hadoop")
Sys.setenv(HADOOP_STREAMING="/usr/local/hadoop220/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar")
library('rhdfs')
library('rmr2')
library('functional')
hdfs.init()
small.ints = to.dfs(1:10)
mapreduce(
input = small.ints,
map = function(k, v) cbind(v, v^2))
The program executed up to line 7 but failed in the all-important MR step with the following error (only the last part of the error is shown):
14/05/06 13:53:36 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/05/06 13:53:36 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/05/06 13:53:37 INFO mapred.FileInputFormat: Total input paths to process : 1
14/05/06 13:53:37 INFO mapreduce.JobSubmitter: number of splits:2
14/05/06 13:53:37 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
14/05/06 13:53:37 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
14/05/06 13:53:37 INFO Configuration.deprecation: mapred.cache.files.filesizes is deprecated. Instead, use mapreduce.job.cache.files.filesizes
14/05/06 13:53:37 INFO Configuration.deprecation: mapred.cache.files is deprecated. Instead, use mapreduce.job.cache.files
14/05/06 13:53:37 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
14/05/06 13:53:37 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
14/05/06 13:53:37 INFO Configuration.deprecation: mapred.mapoutput.value.class is deprecated. Instead, use mapreduce.map.output.value.class
14/05/06 13:53:37 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
14/05/06 13:53:37 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
14/05/06 13:53:37 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
14/05/06 13:53:37 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
14/05/06 13:53:37 INFO Configuration.deprecation: mapred.cache.files.timestamps is deprecated. Instead, use mapreduce.job.cache.files.timestamps
14/05/06 13:53:37 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
14/05/06 13:53:37 INFO Configuration.deprecation: mapred.mapoutput.key.class is deprecated. Instead, use mapreduce.map.output.key.class
14/05/06 13:53:37 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
14/05/06 13:53:38 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1399363749415_0002
14/05/06 13:53:38 INFO impl.YarnClientImpl: Submitted application application_1399363749415_0002 to ResourceManager at /0.0.0.0:8032
14/05/06 13:53:38 INFO mapreduce.Job: The url to track the job: http://yantrajaal:8088/proxy/application_1399363749415_0002/
14/05/06 13:53:38 INFO mapreduce.Job: Running job: job_1399363749415_0002
14/05/06 13:53:45 INFO mapreduce.Job: Job job_1399363749415_0002 running in uber mode : false
14/05/06 13:53:45 INFO mapreduce.Job: map 0% reduce 0%
14/05/06 13:53:57 INFO mapreduce.Job: map 100% reduce 0%
14/05/06 13:53:57 INFO mapreduce.Job: Task Id : attempt_1399363749415_0002_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
14/05/06 13:54:31 INFO mapreduce.Job: map 100% reduce 0%
14/05/06 13:54:32 INFO mapreduce.Job: Job job_1399363749415_0002 failed with state FAILED due to: Task failed task_1399363749415_0002_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
14/05/06 13:54:32 INFO mapreduce.Job: Counters: 10
Job Counters
Failed map tasks=7
Killed map tasks=1
Launched map tasks=8
Other local map tasks=6
Data-local map tasks=2
Total time spent by all maps in occupied slots (ms)=72476
Total time spent by all reduces in occupied slots (ms)=0
Map-Reduce Framework
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
14/05/06 13:54:32 ERROR streaming.StreamJob: Job not Successful!
Streaming Command Failed!
Error in mr(map = map, reduce = reduce, combine = combine, vectorized.reduce, :
hadoop streaming failed with error code 1
I am really at my wits' end about what to do next!
Any suggestions on the way forward would be gratefully received and acknowledged. My suspicion is that RHadoop is perhaps not yet comfortable with Ubuntu 14.04, but that is a guess.

Start your terminal and log in as superuser (root) using
sudo su root
Then start R in the terminal and install the packages RHadoop depends on using the following commands:
install.packages(c("codetools", "R", "Rcpp", "RJSONIO", "bitops",
"digest", "functional", "stringr", "plyr", "reshape2", "rJava"))
install.packages(c("dplyr","R.methodsS3"))
install.packages(c("Hmisc"))
install.packages(c("caTools"))
Sys.setenv(HADOOP_HOME="/usr/local/hadoop")
Sys.setenv(HADOOP_CMD="/usr/local/hadoop/bin/hadoop")
Sys.setenv(HADOOP_STREAMING="/usr/local/hadoop/share/hadoop/tools/lib/hadoopversiomentionhere.jar")
After that, download the rmr2 and rhdfs source packages and install the downloaded source files using this command:
install.packages(path_to_file, repos = NULL, type="source")
After installing, quit R and close the terminal, then open RStudio and run the R code again. The streaming error should be resolved, because the steps above install the R libraries in the global (system-wide) folders.
Optionally, you can also install R itself as superuser to be on the safe side. Hope this helps.
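Before rerunning the job, it may be worth confirming that the two environment variables set above actually point at files that exist on this machine. The following is only a minimal sketch: check.hadoop.env is a hypothetical helper name, not part of rmr2 or rhdfs, and it checks nothing beyond the existence of the files.

```r
# Hypothetical helper (not part of rmr2/rhdfs): report, for each variable,
# whether it is set and points at an existing file.
check.hadoop.env <- function(vars = c("HADOOP_CMD", "HADOOP_STREAMING")) {
  paths <- Sys.getenv(vars, unset = NA)   # NA marks a variable that is not set
  ok <- !is.na(paths) & file.exists(paths)
  names(ok) <- vars
  ok
}

# Prints a named logical vector; any FALSE entry needs fixing before
# calling mapreduce().
print(check.hadoop.env())
```

A FALSE for HADOOP_STREAMING usually just means the jar file name in the path does not match the Hadoop version installed.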

It seems to be an error with the R setup on your single machine cluster.
Is the R package functional installed on the cluster?

I solved a problem similar to yours with the method below.
Have a look at your R libraries:
.libPaths()
Check which library the package functional was installed to with the command below:
system.file(package="functional")
If it is installed in a personal library instead of a library common to all users, jobs will fail with an error saying the package cannot be loaded.
Hope this will help.
Cheers
Yanchang Zhao
RDataMining.com
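The same check can be run over every package that rmr2 attaches, rather than functional alone. This is a rough sketch under a few assumptions: where.installed is a hypothetical helper, the dependency list is copied from the "Loading required package" lines in the question, and "system-wide" is taken to mean the directories R reports in .Library and .Library.site.

```r
# Packages rmr2 attaches on load, taken from the console output in the question.
rmr2.deps <- c("Rcpp", "RJSONIO", "bitops", "digest", "functional",
               "reshape2", "stringr", "plyr", "caTools")

# Hypothetical helper: report where each package is installed and whether
# that location is one of R's system-wide library directories.
where.installed <- function(pkgs) {
  system.libs <- normalizePath(c(.Library, .Library.site), mustWork = FALSE)
  do.call(rbind, lapply(pkgs, function(p) {
    path <- find.package(p, quiet = TRUE)   # character(0) if not installed
    found <- length(path) > 0
    lib <- if (found) normalizePath(dirname(path[1])) else NA_character_
    data.frame(package = p,
               library = lib,
               system.wide = found && lib %in% system.libs,
               stringsAsFactors = FALSE)
  }))
}

print(where.installed(c(rmr2.deps, "rmr2", "rhdfs")))
```

Any row with system.wide equal to FALSE is a candidate for reinstalling as root, since the streaming subprocess may not see a per-user library.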

The problem arises because packages installed as a non-root user end up in a private, per-user directory. This is the cause of all the trouble. The solution is to log in as root (superuser) and then install the packages so that they end up in the system-wide R library, which in my case is /usr/lib64/R/library. After this, there is no longer any problem. Programs will work!
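One way to make sure packages do not quietly land in a private directory is to pass the target library explicitly and stop if it is not writable. The sketch below assumes .Library is the system-wide library on your machine (/usr/lib64/R/library in my case); install.system.wide is a hypothetical wrapper, not an existing function.

```r
# Hypothetical wrapper: install into the system-wide library only, and stop
# with an error instead of falling back to a per-user library.
install.system.wide <- function(pkgs, lib = .Library, ...) {
  if (file.access(lib, mode = 2) != 0) {   # mode = 2 tests write permission
    stop("no write permission on ", lib, "; start R as root and retry")
  }
  install.packages(pkgs, lib = lib, ...)
}

# Example calls, to be run from an R session started as root:
# install.system.wide(c("functional", "caTools"))
# install.system.wide(path_to_file, repos = NULL, type = "source")
```

The extra arguments are passed straight through to install.packages, so the same wrapper covers both CRAN packages and the downloaded rmr2 and rhdfs source files.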

RHadoop Job failing on Single Node Ubuntu cluster

I am posting a similar question a second time because I believe I now have a far more precise view of the problem.
Environment : Hadoop 2.2.0 running as a Single Node Cluster on an Ubuntu 14.04 laptop machine. RStudio version 0.98.507, R version 3.0.2 (2013-09-25), Java Version 1.7.0_55
Any R (or Python) program works perfectly with the Hadoop Streaming utility located at /usr/local/hadoop220/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar
Errors appear when we use the package "rmr" (part of RHadoop) and call mapreduce() from inside an R program being run in RStudio.
To simplify this post, I am showing a very simple program that fails (other, bigger programs fail with identical error messages):
Sys.setenv(HADOOP_CMD="/usr/local/hadoop220/bin/hadoop")
Sys.setenv(HADOOP_STREAMING="/usr/local/hadoop220/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar")
library('rhdfs')
library('rmr2')
hdfs.init()
hdfs.ls("/user/hduser")
small.ints = to.dfs(1:1000)
mapreduce(
input = small.ints,
map = function(k, v) cbind(v, v^2))
The errors that show up on the RStudio console are:
> Sys.setenv(HADOOP_CMD="/usr/local/hadoop220/bin/hadoop")
> Sys.setenv(HADOOP_STREAMING="/usr/local/hadoop220/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar")
> library('rhdfs')
Loading required package: rJava
HADOOP_CMD=/usr/local/hadoop220/bin/hadoop
Be sure to run hdfs.init()
> library('rmr2')
Loading required package: Rcpp
Loading required package: RJSONIO
Loading required package: bitops
Loading required package: digest
Loading required package: functional
Loading required package: reshape2
Loading required package: stringr
Loading required package: plyr
Loading required package: caTools
> hdfs.init()
14/05/10 14:20:39 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> hdfs.ls("/user/hduser")
permission owner group size modtime file
1 drwxr-xr-x hduser supergroup 0 2014-05-07 17:44 /user/hduser/BT
2 drwxr-xr-x hduser supergroup 0 2014-05-09 07:14 /user/hduser/BT-out
3 drwxr-xr-x hduser supergroup 0 2014-05-09 20:30 /user/hduser/BTR-out
4 drwxr-xr-x hduser supergroup 0 2014-05-07 17:44 /user/hduser/BTj-in
> small.ints = to.dfs(1:1000)
Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /usr/local/hadoop220/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
14/05/10 14:20:50 WARN util.NativeCodeLoader: ... using builtin-java classes where applicable
[ these two messages repeat multiple times ]
> mapreduce(
+ input = small.ints,
+ map = function(k, v) cbind(v, v^2))
Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /usr/local/hadoop220/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
14/05/10 14:21:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/05/10 14:21:20 WARN streaming.StreamJob: -file option is deprecated, please use generic option -files instead.
packageJobJar: [/tmp/RtmpYCerEW/rmr-local-env282d4c7a3b53, /tmp/RtmpYCerEW/rmr-global-env282d77c9da92, /tmp/RtmpYCerEW/rmr-streaming-map282d4225651a, /tmp/hadoop-hduser/hadoop-unjar678942474363050554/] [] /tmp/streamjob8073315154972274831.jar tmpDir=null
14/05/10 14:21:21 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/05/10 14:21:21 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/05/10 14:21:22 INFO mapred.FileInputFormat: Total input paths to process : 1
14/05/10 14:21:22 INFO mapreduce.JobSubmitter: number of splits:2
14/05/10 14:21:22 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
14/05/10 14:21:22 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
14/05/10 14:21:22 INFO Configuration.deprecation: mapred.cache.files.filesizes is deprecated. Instead, use mapreduce.job.cache.files.filesizes
14/05/10 14:21:22 INFO Configuration.deprecation: mapred.cache.files is deprecated. Instead, use mapreduce.job.cache.files
14/05/10 14:21:22 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
14/05/10 14:21:22 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
14/05/10 14:21:22 INFO Configuration.deprecation: mapred.mapoutput.value.class is deprecated. Instead, use mapreduce.map.output.value.class
14/05/10 14:21:22 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
14/05/10 14:21:22 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
14/05/10 14:21:22 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
14/05/10 14:21:22 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
14/05/10 14:21:22 INFO Configuration.deprecation: mapred.cache.files.timestamps is deprecated. Instead, use mapreduce.job.cache.files.timestamps
14/05/10 14:21:22 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
14/05/10 14:21:22 INFO Configuration.deprecation: mapred.mapoutput.key.class is deprecated. Instead, use mapreduce.map.output.key.class
14/05/10 14:21:22 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
14/05/10 14:21:23 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1399709731242_0003
14/05/10 14:21:23 INFO impl.YarnClientImpl: Submitted application application_1399709731242_0003 to ResourceManager at /0.0.0.0:8032
14/05/10 14:21:23 INFO mapreduce.Job: The url to track the job: http://yantrajaal:8088/proxy/application_1399709731242_0003/
14/05/10 14:21:23 INFO mapreduce.Job: Running job: job_1399709731242_0003
14/05/10 14:21:30 INFO mapreduce.Job: Job job_1399709731242_0003 running in uber mode : false
14/05/10 14:21:30 INFO mapreduce.Job: map 0% reduce 0%
14/05/10 14:21:43 INFO mapreduce.Job: Task Id : attempt_1399709731242_0003_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
14/05/10 14:21:44 INFO mapreduce.Job: Task Id : attempt_1399709731242_0003_m_000001_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
14/05/10 14:22:04 INFO mapreduce.Job: Task Id : attempt_1399709731242_0003_m_000000_1, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
14/05/10 14:22:04 INFO mapreduce.Job: Task Id : attempt_1399709731242_0003_m_000001_1, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
14/05/10 14:22:17 INFO mapreduce.Job: Task Id : attempt_1399709731242_0003_m_000001_2, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
14/05/10 14:22:17 INFO mapreduce.Job: Task Id : attempt_1399709731242_0003_m_000000_2, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
14/05/10 14:22:26 INFO mapreduce.Job: map 100% reduce 0%
14/05/10 14:22:26 INFO mapreduce.Job: Job job_1399709731242_0003 failed with state FAILED due to: Task failed task_1399709731242_0003_m_000001
Job failed as tasks failed. failedMaps:1 failedReduces:0
14/05/10 14:22:26 INFO mapreduce.Job: Counters: 10
Job Counters
Failed map tasks=7
Killed map tasks=1
Launched map tasks=8
Other local map tasks=6
Data-local map tasks=2
Total time spent by all maps in occupied slots (ms)=91997
Total time spent by all reduces in occupied slots (ms)=0
Map-Reduce Framework
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
14/05/10 14:22:26 ERROR streaming.StreamJob: Job not Successful!
Streaming Command Failed!
Error in mr(map = map, reduce = reduce, combine = combine, vectorized.reduce, :
hadoop streaming failed with error code 1
>
I have Googled the two irritating warnings:
(a) "disabled stack guard": I discovered from this link that there is "nothing to worry about"; it is just a warning.
(b) "Unable to load native-hadoop library for your platform... using builtin-java classes where applicable": as per this link, this is also just a warning and nothing to worry about.
After discounting these two warnings as not being the cause, the main error that I find is here:
14/05/10 14:21:43 INFO mapreduce.Job: Task Id : attempt_1399709731242_0003_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
I have reinstalled the RHadoop packages rmr and rhdfs, and have also reinstalled rJava. I have in the past tried with Hadoop 1.3 as well, but the errors are the same.
I would be really grateful if someone could suggest some way forward on this.

I resolved the problem by changing the installation directory of the rmr2, rhdfs, ... packages.
Basically, you need to install all the packages in a system folder instead of a custom folder.
There seems to be a problem with the installation location. Initially I installed the packages in a custom folder:
/home/user/R/3.1
By reinstalling the packages at:
/usr/lib/R/library
I got the code working.
I hope this will be of some help.
