Nexus runs in Kubernetes, and earlier upgrades went through without problems. At the moment I get the following error no matter which version I upgrade to, 3.41.1 or 3.42.0.
Log output from version 3.41.1:
-------------------------------------------------
Started Sonatype Nexus OSS 3.41.1-01
-------------------------------------------------
2022-10-19 11:34:30,682+0000 INFO [jetty-main-1] *SYSTEM org.eclipse.jetty.server.AbstractConnector - Started ServerConnector#31451f5e{HTTP/1.1, (http/1.1)}{0.0.0.0:8086}
2022-10-19 11:34:30,686+0000 INFO [jetty-main-1] *SYSTEM org.eclipse.jetty.server.AbstractConnector - Started ServerConnector#2105be54{HTTP/1.1, (http/1.1)}{0.0.0.0:8080}
2022-10-19 11:34:30,689+0000 INFO [jetty-main-1] *SYSTEM org.eclipse.jetty.server.AbstractConnector - Started ServerConnector#5f41ab0d{HTTP/1.1, (http/1.1)}{0.0.0.0:8085}
2022-10-19 11:34:31,834+0000 WARN [Timer-0] *SYSTEM java.util.prefs - Could not lock User prefs. Unix error code 2.
2022-10-19 11:34:31,835+0000 WARN [Timer-0] *SYSTEM java.util.prefs - Couldn't flush user prefs: java.util.prefs.BackingStoreException: Couldn't get file lock.
2022-10-19 11:34:48,228+0000 INFO [qtp491323871-560] *UNKNOWN org.apache.shiro.session.mgt.AbstractValidatingSessionManager - Enabling session validation scheduler...
2022-10-19 11:34:48,238+0000 INFO [qtp491323871-556] *UNKNOWN org.sonatype.nexus.internal.security.anonymous.AnonymousManagerImpl - Loaded configuration: OrientAnonymousConfiguration{enabled=true, userId='anonymous', realmName='NexusAuthorizingRealm'}
2022-10-19 11:35:01,488+0000 WARN [Timer-0] *SYSTEM java.util.prefs - Could not lock User prefs. Unix error code 2.
2022-10-19 11:35:01,489+0000 WARN [Timer-0] *SYSTEM java.util.prefs - Couldn't flush user prefs: java.util.prefs.BackingStoreException: Couldn't get file lock.
2022-10-19 11:35:31,489+0000 WARN [Timer-0] *SYSTEM java.util.prefs - Could not lock User prefs. Unix error code 2.
2022-10-19 11:35:31,489+0000 WARN [Timer-0] *SYSTEM java.util.prefs - Couldn't flush user prefs: java.util.prefs.BackingStoreException: Couldn't get file lock.
2022-10-19 11:36:01,489+0000 WARN [Timer-0] *SYSTEM java.util.prefs - Could not lock User prefs. Unix error code 2.
2022-10-19 11:36:01,490+0000 WARN [Timer-0] *SYSTEM java.util.prefs - Couldn't flush user prefs: java.util.prefs.BackingStoreException: Couldn't get file lock.
2022-10-19 11:36:18,170+0000 WARN [SIGTERM handler] *SYSTEM com.orientechnologies.orient.core.OSignalHandler - Received signal: SIGTERM
I get a similar error when updating to 3.42.0.
What could be the problem?
1st approach: Using pig -x mapreduce
The HBase table was created via the hbase shell.
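The exact create statement wasn't captured; it would have been along these lines, with the dados_clientes column family that HBaseStorage writes to further below:
create 'clientes', 'dados_clientes'
Listing shows the table: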
hbase(main):003:0> list
TABLE
clientes
1 row(s)
Took 0.0047 seconds
=> ["clientes"]
This code was used to load data from clientes.txt into dados (pig -x mapreduce):
grunt> dados = LOAD 'file:///mnt/win/GD/DS/1Formacao/3EngenhariaDeDadosHadoop/07/Arquivos/1-HBase/clientes.txt' USING PigStorage(',') AS (
id:chararray,
nome:chararray,
sobrenome:chararray,
idade:int,
funcao:chararray
);
Checking dados with dump dados then failed:
2021-03-07 19:00:32,390 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1615152557282_0002
2021-03-07 19:00:32,390 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases dados
2021-03-07 19:00:32,390 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: dados[1,8],dados[-1,-1] C: R:
2021-03-07 19:00:32,395 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2021-03-07 19:00:37,406 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2021-03-07 19:00:37,406 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_1615152557282_0002 has failed! Stop running all dependent jobs
2021-03-07 19:00:37,406 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2021-03-07 19:00:37,410 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2021-03-07 19:00:37,492 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Could not get Job info from RM for job job_1615152557282_0002. Redirecting to job history server.
2021-03-07 19:00:37,595 [main] ERROR org.apache.pig.tools.pigstats.PigStats - ERROR 0: java.lang.IllegalStateException: Job in state DEFINE instead of RUNNING
2021-03-07 19:00:37,595 [main] ERROR org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil - 1 map reduce job(s) failed!
2021-03-07 19:00:37,597 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
3.2.2 0.17.0 hadoop 2021-03-07 19:00:31 2021-03-07 19:00:37 UNKNOWN
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_1615152557282_0002 dados MAP_ONLY Message: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Wrong FS: hdfs://localhost:9000/user/hadoop, expected: file:///
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:294)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:310)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:327)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:200)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1565)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1562)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1562)
at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:336)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.pig.backend.hadoop.PigJobControl.submit(PigJobControl.java:128)
at org.apache.pig.backend.hadoop.PigJobControl.run(PigJobControl.java:205)
at java.lang.Thread.run(Thread.java:748)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:301)
Caused by: java.lang.IllegalArgumentException: Wrong FS: hdfs://localhost:9000/user/hadoop, expected: file:///
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:737)
at org.apache.hadoop.fs.RawLocalFileSystem.setWorkingDirectory(RawLocalFileSystem.java:604)
at org.apache.hadoop.fs.FilterFileSystem.setWorkingDirectory(FilterFileSystem.java:307)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:250)
... 18 more
hdfs://localhost:9000/tmp/temp-1169299097/tmp-2103156722,
Input(s):
Failed to read data from "file:///mnt/win/GD/DS/1Formacao/3EngenhariaDeDadosHadoop/07/Arquivos/1-HBase/clientes.txt"
Output(s):
Failed to produce result in "hdfs://localhost:9000/tmp/temp-1169299097/tmp-2103156722"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_1615152557282_0002
2021-03-07 19:00:37,597 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2021-03-07 19:00:37,601 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias dados. Backend error : java.lang.IllegalStateException: Job in state DEFINE instead of RUNNING
Details at logfile: /mnt/win/GD/DS/1Formacao/3EngenhariaDeDadosHadoop/07/Arquivos/1-HBase/pig_1615154395936.log
2nd approach: Using pig -x local (dump dados works)
grunt> dados = LOAD 'file:///mnt/win/GD/DS/1Formacao/3EngenhariaDeDadosHadoop/07/Arquivos/1-HBase/clientes.txt' USING PigStorage(',') AS (
>> id:chararray,
>> nome:chararray,
>> sobrenome:chararray,
>> idade:int,
>> funcao:chararray
>> );
2021-03-07 19:02:17,219 [LocalJobRunner Map Task Executor #0] INFO org.apache.pig.builtin.PigStorage - Using PigTextInputFormat
2021-03-07 19:02:17,222 [LocalJobRunner Map Task Executor #0] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader - Current split being processed file:/mnt/win/GD/DS/1Formacao/3EngenhariaDeDadosHadoop/07/Arquivos/1-HBase/clientes.txt:0+794
2021-03-07 19:02:17,226 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - File Output Committer Algorithm version is 2
2021-03-07 19:02:17,226 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2021-03-07 19:02:17,241 [LocalJobRunner Map Task Executor #0] INFO org.apache.pig.impl.util.SpillableMemoryManager - Selected heap (PS Old Gen) of size 699400192 to monitor. collectionUsageThreshold = 489580128, usageThreshold = 489580128
2021-03-07 19:02:17,243 [LocalJobRunner Map Task Executor #0] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2021-03-07 19:02:17,253 [LocalJobRunner Map Task Executor #0] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: dados[1,8],dados[-1,-1] C: R:
2021-03-07 19:02:17,266 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.LocalJobRunner -
2021-03-07 19:02:17,274 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.Task - Task:attempt_local116575577_0001_m_000000_0 is done. And is in the process of committing
2021-03-07 19:02:17,280 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.LocalJobRunner -
2021-03-07 19:02:17,280 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.Task - Task attempt_local116575577_0001_m_000000_0 is allowed to commit now
2021-03-07 19:02:17,285 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of task 'attempt_local116575577_0001_m_000000_0' to file:/tmp/temp2133275539/tmp1539690224
2021-03-07 19:02:17,286 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.LocalJobRunner - map
2021-03-07 19:02:17,286 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.Task - Task 'attempt_local116575577_0001_m_000000_0' done.
2021-03-07 19:02:17,291 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.Task - Final Counters for attempt_local116575577_0001_m_000000_0: Counters: 16
File System Counters
FILE: Number of bytes read=1264
FILE: Number of bytes written=530456
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
Map-Reduce Framework
Map input records=20
Map output records=20
Input split bytes=414
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=0
Total committed heap usage (bytes)=311427072
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
org.apache.pig.PigWarning
FIELD_DISCARDED_TYPE_CONVERSION_FAILED=1
2021-03-07 19:02:17,291 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.LocalJobRunner - Finishing task: attempt_local116575577_0001_m_000000_0
2021-03-07 19:02:17,291 [Thread-7] INFO org.apache.hadoop.mapred.LocalJobRunner - map task executor complete.
2021-03-07 19:02:17,485 [main] WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl - JobTracker metrics system already initialized!
2021-03-07 19:02:17,492 [main] WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl - JobTracker metrics system already initialized!
2021-03-07 19:02:17,492 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
2021-03-07 19:02:17,492 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
2021-03-07 19:02:17,493 [main] WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl - JobTracker metrics system already initialized!
2021-03-07 19:02:17,536 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2021-03-07 19:02:17,540 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
3.2.2 0.17.0 hadoop 2021-03-07 19:02:16 2021-03-07 19:02:17 UNKNOWN
Success!
Job Stats (time in seconds):
JobId Maps Reduces MaxMapTime MinMapTime AvgMapTime MedianMapTime MaxReduceTime MinReduceTime AvgReduceTime MedianReducetime Alias Feature Outputs
job_local116575577_0001 1 0 n/a n/a n/a n/a 0 0 0 0 dados MAP_ONLY file:/tmp/temp2133275539/tmp1539690224,
Input(s):
Successfully read 20 records from: "file:///mnt/win/GD/DS/1Formacao/3EngenhariaDeDadosHadoop/07/Arquivos/1-HBase/clientes.txt"
Output(s):
Successfully stored 20 records in: "file:/tmp/temp2133275539/tmp1539690224"
Counters:
Total records written : 20
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_local116575577_0001
2021-03-07 19:02:17,542 [main] WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl - JobTracker metrics system already initialized!
2021-03-07 19:02:17,544 [main] WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl - JobTracker metrics system already initialized!
2021-03-07 19:02:17,551 [main] WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl - JobTracker metrics system already initialized!
2021-03-07 19:02:17,558 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Encountered Warning FIELD_DISCARDED_TYPE_CONVERSION_FAILED 1 time(s).
2021-03-07 19:02:17,558 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2021-03-07 19:02:17,563 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2021-03-07 19:02:17,563 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2021-03-07 19:02:17,570 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input files to process : 1
2021-03-07 19:02:17,570 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(id,nome,sobrenome,,funcao)
(c001,Josias,Silva,55,Analista de Mercado)
(1100002,Pedro,Malan,74,Professor)
(1100003,Maria,Maciel,34,Bombeiro)
(1100004,Suzana,Bustamante,66,Analista de TI)
(1100005,Karen,Moreira,74,Advogado)
(1100006,Patricio,Teixeira,42,Veterinario)
(1100007,Elisa,Haniero,43,Piloto)
(1100008,Mauro,Bender,63,Marceneiro)
(1100009,Mauricio,Wagner,39,Artista)
(1100010,Douglas,Macedo,60,Escritor)
(1100011,Francisco,McNamara,47,Cientista de Dados)
(1100012,Sidney,Raynor,26,Escritor)
(1100013,Maria,Moon,41,Gerente de Projetos)
(1100014,Bete,Balanaira,65,Musico)
(1100015,Julia,Peixoto,49,Especialista em TI)
(1100016,Jeronimo,Wallace,52,Engenheiro de Dados)
(1100017,Noeli,Laura,72,Cientista de Dados)
(1100018,Jean,Junior,45,Desenvolvedor RPA)
(1100019,Cristina,Garbim,63,Engenheiro Blockchain)
But both STORE dados INTO 'hbase://clientes' and STORE dados INTO 'file:///home/hadoop/hadloop/pig_output' fail:
grunt> STORE dados INTO 'hbase://clientes' USING
>> org.apache.pig.backend.hadoop.hbase.HBaseStorage(
>> 'dados_clientes:nome
>> dados_clientes:sobrenome
>> dados_clientes:idade
>> dados_clientes:funcao'
>> );
2021-03-07 19:03:51,347 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local1289080477_0002
2021-03-07 19:03:51,347 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases dados
2021-03-07 19:03:51,347 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: dados[1,8],dados[-1,-1] C: R:
2021-03-07 19:03:51,349 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2021-03-07 19:03:51,349 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_local1289080477_0002]
2021-03-07 19:03:51,835 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.hbase.mapreduce.TableOutputFormat - Created table instance for clientes
2021-03-07 19:03:51,839 [LocalJobRunner Map Task Executor #0] INFO org.apache.pig.impl.util.SpillableMemoryManager - Selected heap (PS Old Gen) of size 699400192 to monitor. collectionUsageThreshold = 489580128, usageThreshold = 489580128
2021-03-07 19:03:51,839 [LocalJobRunner Map Task Executor #0] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2021-03-07 19:03:51,843 [LocalJobRunner Map Task Executor #0] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: dados[1,8],dados[-1,-1] C: R:
2021-03-07 19:03:51,860 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation - Closing zookeeper sessionid=0x1780e985b4d000f
2021-03-07 19:03:51,866 [LocalJobRunner Map Task Executor #0] INFO org.apache.zookeeper.ZooKeeper - Session: 0x1780e985b4d000f closed
2021-03-07 19:03:51,866 [LocalJobRunner Map Task Executor #0-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x1780e985b4d000f
2021-03-07 19:03:51,867 [Thread-10] INFO org.apache.hadoop.mapred.LocalJobRunner - map task executor complete.
2021-03-07 19:03:51,870 [Thread-10] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local1289080477_0002
java.lang.Exception: java.io.IOException: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:492)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:552)
Caused by: java.io.IOException: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.StoreFuncDecorator.putNext(StoreFuncDecorator.java:83)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:144)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:97)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:670)
at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:282)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:275)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:65)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:799)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:271)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
at java.util.ArrayList.rangeCheck(ArrayList.java:657)
at java.util.ArrayList.get(ArrayList.java:433)
at org.apache.pig.backend.hadoop.hbase.HBaseStorage.putNext(HBaseStorage.java:992)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.StoreFuncDecorator.putNext(StoreFuncDecorator.java:75)
... 18 more
2021-03-07 19:03:52,055 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2021-03-07 19:03:52,055 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local1289080477_0002 has failed! Stop running all dependent jobs
2021-03-07 19:03:52,055 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2021-03-07 19:03:52,056 [main] WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl - JobTracker metrics system already initialized!
2021-03-07 19:03:52,057 [main] WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl - JobTracker metrics system already initialized!
2021-03-07 19:03:52,057 [main] ERROR org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil - 1 map reduce job(s) failed!
2021-03-07 19:03:52,058 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
3.2.2 0.17.0 hadoop 2021-03-07 19:03:50 2021-03-07 19:03:52 UNKNOWN
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_local1289080477_0002 dados MAP_ONLY Message: Job failed! hbase://clientes,
Input(s):
Failed to read data from "file:///mnt/win/GD/DS/1Formacao/3EngenhariaDeDadosHadoop/07/Arquivos/1-HBase/clientes.txt"
Output(s):
Failed to produce result in "hbase://clientes"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_local1289080477_0002
2021-03-07 19:03:52,058 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
grunt> STORE dados INTO 'file:///home/hadoop/hadloop/pig_output' USING
>> org.apache.pig.backend.hadoop.hbase.HBaseStorage(
>> 'dados_clientes:nome
>> dados_clientes:sobrenome
>> dados_clientes:idade
>> dados_clientes:funcao'
>> );
java.lang.Exception: java.lang.IllegalArgumentException: Illegal character code:47, </> at 0. User-space table qualifiers can only contain 'alphanumeric characters': i.e. [a-zA-Z_0-9-.]: ///home/hadoop/hadloop/pig_output
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:492)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:552)
Caused by: java.lang.IllegalArgumentException: Illegal character code:47, </> at 0. User-space table qualifiers can only contain 'alphanumeric characters': i.e. [a-zA-Z_0-9-.]: ///home/hadoop/hadloop/pig_output
at org.apache.hadoop.hbase.TableName.isLegalTableQualifierName(TableName.java:196)
at org.apache.hadoop.hbase.TableName.isLegalTableQualifierName(TableName.java:149)
at org.apache.hadoop.hbase.TableName.<init>(TableName.java:322)
at org.apache.hadoop.hbase.TableName.createTableNameIfNecessary(TableName.java:358)
at org.apache.hadoop.hbase.TableName.valueOf(TableName.java:449)
at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.<init>(TableOutputFormat.java:107)
at org.apache.hadoop.hbase.mapreduce.TableOutputFormat.getRecordWriter(TableOutputFormat.java:153)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.getRecordWriter(PigOutputFormat.java:83)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:659)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:779)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:271)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2021-03-07 19:05:10,476 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local1458581109_0003
2021-03-07 19:05:10,476 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases dados
2021-03-07 19:05:10,476 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: dados[1,8],dados[-1,-1] C: R:
2021-03-07 19:05:10,477 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2021-03-07 19:05:10,477 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2021-03-07 19:05:10,477 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local1458581109_0003 has failed! Stop running all dependent jobs
2021-03-07 19:05:10,478 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2021-03-07 19:05:10,478 [main] WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl - JobTracker metrics system already initialized!
2021-03-07 19:05:10,479 [main] WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl - JobTracker metrics system already initialized!
2021-03-07 19:05:10,480 [main] ERROR org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil - 1 map reduce job(s) failed!
2021-03-07 19:05:10,480 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
3.2.2 0.17.0 hadoop 2021-03-07 19:05:10 2021-03-07 19:05:10 UNKNOWN
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_local1458581109_0003 dados MAP_ONLY Message: Job failed! file:///home/hadoop/hadloop/pig_output,
Input(s):
Failed to read data from "file:///mnt/win/GD/DS/1Formacao/3EngenhariaDeDadosHadoop/07/Arquivos/1-HBase/clientes.txt"
Output(s):
Failed to produce result in "file:///home/hadoop/hadloop/pig_output"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_local1458581109_0003
2021-03-07 19:05:10,480 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
Services running:
(base) [hadoop@dataserver 1-HBase]$ jps
4160 SecondaryNameNode
11666 Main
5413 HQuorumPeer
5766 HRegionServer
6966 JobHistoryServer
4631 NodeManager
4457 ResourceManager
5578 HMaster
3835 DataNode
12382 Jps
3615 NameNode
Hadoop version:
(base) [hadoop@dataserver 1-HBase]$ hadoop version
Hadoop 3.2.2
Source code repository Unknown -r 7a3bc90b05f257c8ace2f76d74264906f0f7a932
Compiled by hexiaoqiao on 2021-01-03T09:26Z
Compiled with protoc 2.5.0
From source with checksum 5a8f564f46624254b27f6a33126ff4
This command was run using /opt/hadoop/share/hadoop/common/hadoop-common-3.2.2.jar
HBase version:
(base) [hadoop@dataserver 1-HBase]$ hbase version
/opt/hadoop/libexec/hadoop-functions.sh: line 2366: HADOOP_ORG.APACHE.HADOOP.HBASE.UTIL.GETJAVAPROPERTY_USER: bad substitution
/opt/hadoop/libexec/hadoop-functions.sh: line 2461: HADOOP_ORG.APACHE.HADOOP.HBASE.UTIL.GETJAVAPROPERTY_OPTS: bad substitution
Error: Could not find or load main class org.apache.hadoop.hbase.util.GetJavaProperty
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hbase/lib/client-facing-thirdparty/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase 2.2.0
Source code repository file:///opt/hbase-rm/output/hbase-2.2.0-bin revision=Unknown
Compiled by hbase-rm on Tue Jun 11 04:30:30 UTC 2019
From source with checksum 63a465554927aeea3f1f0bcae63decff
Pig version:
(base) [hadoop@dataserver 1-HBase]$ pig version
2021-03-07 19:08:50,197 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
2021-03-07 19:08:50,199 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
2021-03-07 19:08:50,199 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2021-03-07 19:08:50,263 [main] INFO org.apache.pig.Main - Apache Pig version 0.17.0 (r1797386) compiled Jun 02 2017, 15:41:58
2021-03-07 19:08:50,263 [main] INFO org.apache.pig.Main - Logging error messages to: /mnt/win/GD/DS/1Formacao/3EngenhariaDeDadosHadoop/07/Arquivos/1-HBase/pig_1615154930258.log
2021-03-07 19:08:50,536 [main] ERROR org.apache.pig.Main - ERROR 2997: Encountered IOException. File version does not exist
Details at logfile: /mnt/win/GD/DS/1Formacao/3EngenhariaDeDadosHadoop/07/Arquivos/1-HBase/pig_1615154930258.log
2021-03-07 19:08:50,557 [main] INFO org.apache.pig.Main - Pig script completed in 400 milliseconds (400 ms)
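Note: pig version is not a valid invocation; Pig treats version as the name of a script to run, which is why it reports "File version does not exist" above. The flag form should print the version instead:
pig -version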
To solve this issue you need to start a YARN service called the Job History Server.
Run the following command:
mr-jobhistory-daemon.sh start historyserver
and check that the service is running via the jps command:
13153 HQuorumPeer
13314 HMaster
**20242 JobHistoryServer**
5043 NameNode
6003 NodeManager
30163 Jps
5845 ResourceManager
5514 SecondaryNameNode
5227 DataNode
28510 RunJar
13519 HRegionServer
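Note that on Hadoop 3.x the per-daemon scripts are deprecated (mr-jobhistory-daemon.sh prints a deprecation warning); the equivalent command is:
mapred --daemon start historyserver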
I've set the 'execution_timeout': timedelta(seconds=300) parameter on many tasks. When the execution timeout is set on the task downloading data from Google Analytics, it works properly: after ~300 seconds the task is set to failed. That task downloads some data from an API (Python), then does some transformations (Python) and loads the data into PostgreSQL.
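For reference, the timeout is attached to the task roughly like this (a minimal sketch reconstructed from the logs below; the schedule and start date are placeholders):

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.postgres_operator import PostgresOperator

dag = DAG(
    dag_id="bulk_replication_p2p_realtime",
    start_date=datetime(2020, 7, 1),   # placeholder
    schedule_interval="5 0 * * *",     # placeholder
)

t1 = PostgresOperator(
    task_id="t1",
    postgres_conn_id="postgres_warehouse",
    sql=(
        "CALL dw_system.bulk_replicate("
        "p_graph_name=>'replication_p2p_realtime', p_group_size=>4, p_group=>1, "
        "p_dag_id=>'bulk_replication_p2p_realtime', p_task_id=>'t1')"
    ),
    execution_timeout=timedelta(seconds=300),  # the timeout that is not enforced
    dag=dag,
)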
Then I have a task which executes only one PostgreSQL function; its execution sometimes takes more than 300 seconds, but instead of failing I get this (the task is marked as finished successfully):
*** Reading local file: /home/airflow/airflow/logs/bulk_replication_p2p_realtime/t1/2020-07-20T00:05:00+00:00/1.log
[2020-07-20 05:05:35,040] {__init__.py:1139} INFO - Dependencies all met for <TaskInstance: bulk_replication_p2p_realtime.t1 2020-07-20T00:05:00+00:00 [queued]>
[2020-07-20 05:05:35,051] {__init__.py:1139} INFO - Dependencies all met for <TaskInstance: bulk_replication_p2p_realtime.t1 2020-07-20T00:05:00+00:00 [queued]>
[2020-07-20 05:05:35,051] {__init__.py:1353} INFO -
--------------------------------------------------------------------------------
[2020-07-20 05:05:35,051] {__init__.py:1354} INFO - Starting attempt 1 of 1
[2020-07-20 05:05:35,051] {__init__.py:1355} INFO -
--------------------------------------------------------------------------------
[2020-07-20 05:05:35,098] {__init__.py:1374} INFO - Executing <Task(PostgresOperator): t1> on 2020-07-20T00:05:00+00:00
[2020-07-20 05:05:35,099] {base_task_runner.py:119} INFO - Running: ['airflow', 'run', 'bulk_replication_p2p_realtime', 't1', '2020-07-20T00:05:00+00:00', '--job_id', '958216', '--raw', '-sd', 'DAGS_FOLDER/bulk_replication_p2p_realtime.py', '--cfg_path', '/tmp/tmph11tn6fe']
[2020-07-20 05:05:37,348] {base_task_runner.py:101} INFO - Job 958216: Subtask t1 [2020-07-20 05:05:37,347] {settings.py:182} INFO - settings.configure_orm(): Using pool settings. pool_size=10, pool_recycle=1800, pid=26244
[2020-07-20 05:05:39,503] {base_task_runner.py:101} INFO - Job 958216: Subtask t1 [2020-07-20 05:05:39,501] {__init__.py:51} INFO - Using executor LocalExecutor
[2020-07-20 05:05:39,857] {base_task_runner.py:101} INFO - Job 958216: Subtask t1 [2020-07-20 05:05:39,856] {__init__.py:305} INFO - Filling up the DagBag from /home/airflow/airflow/dags/bulk_replication_p2p_realtime.py
[2020-07-20 05:05:39,894] {base_task_runner.py:101} INFO - Job 958216: Subtask t1 [2020-07-20 05:05:39,894] {cli.py:517} INFO - Running <TaskInstance: bulk_replication_p2p_realtime.t1 2020-07-20T00:05:00+00:00 [running]> on host dwh2-airflow-dev
[2020-07-20 05:05:39,938] {postgres_operator.py:62} INFO - Executing: CALL dw_system.bulk_replicate(p_graph_name=>'replication_p2p_realtime',p_group_size=>4 , p_group=>1, p_dag_id=>'bulk_replication_p2p_realtime', p_task_id=>'t1')
[2020-07-20 05:05:39,960] {logging_mixin.py:95} INFO - [2020-07-20 05:05:39,953] {base_hook.py:83} INFO - Using connection to: id: postgres_warehouse. Host: XXX Port: 5432, Schema: XXXX Login: XXX Password: XXXXXXXX, extra: {}
[2020-07-20 05:05:39,973] {logging_mixin.py:95} INFO - [2020-07-20 05:05:39,972] {dbapi_hook.py:171} INFO - CALL dw_system.bulk_replicate(p_graph_name=>'replication_p2p_realtime',p_group_size=>4 , p_group=>1, p_dag_id=>'bulk_replication_p2p_realtime', p_task_id=>'t1')
[2020-07-20 05:23:21,450] {logging_mixin.py:95} INFO - [2020-07-20 05:23:21,449] {timeout.py:42} ERROR - Process timed out, PID: 26244
[2020-07-20 05:23:36,453] {logging_mixin.py:95} INFO - [2020-07-20 05:23:36,452] {jobs.py:2562} INFO - Task exited with return code 0
Does anyone know how to enforce the execution timeout for such long-running functions? It seems that the execution timeout is only evaluated once the PG function finishes.
Airflow uses the signal module from the standard library to effect a timeout. Airflow's timeout context manager hooks into these system signals and requests that the calling process be notified in N seconds; should the process still be inside the context (see the __enter__ and __exit__ methods on the class), it raises an AirflowTaskTimeout exception.
Unfortunately for this situation, there are certain classes of system operations that cannot be interrupted. This is actually called out in the signal documentation:
A long-running calculation implemented purely in C (such as regular expression matching on a large body of text) may run uninterrupted for an arbitrary amount of time, regardless of any signals received. **The Python signal handlers will be called when the calculation finishes.**
To which we say "But I'm not doing a long-running calculation in C!" -- yeah, for Airflow this is almost always due to uninterruptible I/O operations.
The highlighted sentence above (emphasis mine) nicely explains why the handler is still triggered even after the task is allowed to (frustratingly!) finish, well beyond your requested timeout.
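You can see the mechanism outside Airflow with a minimal sketch of the same standard-library pattern (an illustration only, Unix-only; not Airflow's actual code):

import re
import signal
import time

class Timeout(Exception):
    pass

def handler(signum, frame):
    # The Python-level handler only runs once the interpreter is back
    # executing Python bytecode, never in the middle of a single C call.
    raise Timeout()

signal.signal(signal.SIGALRM, handler)

signal.alarm(2)      # ask the OS to deliver SIGALRM in 2 seconds
try:
    time.sleep(60)   # interruptible: Timeout is raised after ~2 seconds
except Timeout:
    print("sleep was interrupted on time")

signal.alarm(2)
try:
    # One long call into C (the catastrophic regex backtracking example
    # from the docs). It runs far past the 2-second alarm; the handler,
    # and therefore the Timeout, fires only after the match returns --
    # just as the CALL in the log above only "times out" on completion.
    re.match(r"(a+)+$", "a" * 32 + "b")
except Timeout:
    print("regex only 'timed out' after it finished")
finally:
    signal.alarm(0)  # cancel any pending alarm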
We delete old Docker images, keeping the last 10 of them. We tried the Compact blob store task to delete them physically, but under Administration / Repository / Blob Stores the blob store still shows the same size after the images were deleted.
This is the compact blob store log:
2018-06-28 14:18:40,709+0200 INFO [quartz-6-thread-20] *SYSTEM org.sonatype.nexus.blobstore.compact.internal.CompactBlobStoreTask - Task information:
2018-06-28 14:18:40,712+0200 INFO [quartz-6-thread-20] *SYSTEM org.sonatype.nexus.blobstore.compact.internal.CompactBlobStoreTask - ID: 2bf9a574-f3e6-4f8e-8351-d98e4abc5103
2018-06-28 14:18:40,712+0200 INFO [quartz-6-thread-20] *SYSTEM org.sonatype.nexus.blobstore.compact.internal.CompactBlobStoreTask - Type: blobstore.compact
2018-06-28 14:18:40,712+0200 INFO [quartz-6-thread-20] *SYSTEM org.sonatype.nexus.blobstore.compact.internal.CompactBlobStoreTask - Name: cbs
2018-06-28 14:18:40,712+0200 INFO [quartz-6-thread-20] *SYSTEM org.sonatype.nexus.blobstore.compact.internal.CompactBlobStoreTask - Description: Compacting default blob store
2018-06-28 14:18:40,713+0200 INFO [quartz-6-thread-20] *SYSTEM org.sonatype.nexus.blobstore.file.FileBlobStore - Deletions index file rebuild not required
2018-06-28 14:18:40,713+0200 INFO [quartz-6-thread-20] *SYSTEM org.sonatype.nexus.blobstore.file.FileBlobStore - Begin deleted blobs processing
2018-06-28 14:18:41,551+0200 INFO [quartz-6-thread-20] *SYSTEM org.sonatype.nexus.blobstore.file.FileBlobStore - Elapsed time: 837.6 ms, processed: 45/45
2018-06-28 14:18:41,551+0200 INFO [quartz-6-thread-20] *SYSTEM org.sonatype.nexus.blobstore.compact.internal.CompactBlobStoreTask - Task complete
Docker layers can be shared across many different images, so the layers associated with an image are not deleted automatically when you delete the image. First run a "Docker - Delete unused manifests and images" task, then run the Compact blob store task again.
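If you want to script the two steps, recent Nexus 3 releases expose scheduled tasks through the REST API (older releases used /service/rest/beta/ instead of /v1/; host, credentials and task id below are placeholders):

# find the ids of the Docker cleanup task and the compact task
curl -u admin:admin123 -X GET 'http://nexus.example.com:8081/service/rest/v1/tasks'

# trigger them in order: cleanup first, then compact
curl -u admin:admin123 -X POST 'http://nexus.example.com:8081/service/rest/v1/tasks/<taskId>/run'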