how to cluster wso2am-analytics-2.0.0 - wso2-api-manager

We are building wso2am in cluster mode.
Is there any document about building wso2am-analytics cluster ?
I have tried to use wso2das, reference as below.
https://docs.wso2.com/display/DAS310/Working+with+Product+Specific+Analytics+Profiles
But get the error as below
TID: [-1234] [] [2016-12-09 15:00:00,101] ERROR {org.wso2.carbon.analytics.spark.core.AnalyticsTask} - Error while executing the scheduled task for the script: APIM_LATENCY_BREAKDOWN_STATS {org.wso2.carbon.analytics.spark.core.AnalyticsTask}
org.wso2.carbon.analytics.spark.core.exception.AnalyticsExecutionException: Exception in executing query CREATE TEMPORARY TABLE APIMGT_PERHOUR_EXECUTION_TIME USING CarbonAnalytics OPTIONS(tableName "ORG_WSO2_APIMGT_STATISTICS_PERHOUREXECUTIONTIMES", schema " year INT -i, month INT -i, day INT -i, hour INT -i, context STRING, api_version STRING, api STRING, tenantDomain STRING, apiPublisher STRING, apiResponseTime DOUBLE, securityLatency DOUBLE, throttlingLatency DOUBLE, requestMediationLatency DOUBLE, responseMediationLatency DOUBLE, backendLatency DOUBLE, otherLatency DOUBLE, firstEventTime LONG, _timestamp LONG -i", primaryKeys "year, month, day, hour, context, api_version, tenantDomain, apiPublisher", incrementalProcessing "APIMGT_PERHOUR_EXECUTION_TIME, DAY", mergeSchema "false")
at org.wso2.carbon.analytics.spark.core.internal.SparkAnalyticsExecutor.executeQueryLocal(SparkAnalyticsExecutor.java:764)
at org.wso2.carbon.analytics.spark.core.internal.SparkAnalyticsExecutor.executeQuery(SparkAnalyticsExecutor.java:721)
at org.wso2.carbon.analytics.spark.core.CarbonAnalyticsProcessorService.executeQuery(CarbonAnalyticsProcessorService.java:201)
at org.wso2.carbon.analytics.spark.core.CarbonAnalyticsProcessorService.executeScript(CarbonAnalyticsProcessorService.java:151)
at org.wso2.carbon.analytics.spark.core.AnalyticsTask.execute(AnalyticsTask.java:60)
at org.wso2.carbon.ntask.core.impl.TaskQuartzJobAdapter.execute(TaskQuartzJobAdapter.java:67)
at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Unknown options : incrementalprocessing
at org.wso2.carbon.analytics.spark.core.sources.AnalyticsRelationProvider.checkParameters(AnalyticsRelationProvider.java:123)
at org.wso2.carbon.analytics.spark.core.sources.AnalyticsRelationProvider.setParameters(AnalyticsRelationProvider.java:113)
at org.wso2.carbon.analytics.spark.core.sources.AnalyticsRelationProvider.createRelation(AnalyticsRelationProvider.java:75)
at org.wso2.carbon.analytics.spark.core.sources.AnalyticsRelationProvider.createRelation(AnalyticsRelationProvider.java:45)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:158)
at org.apache.spark.sql.execution.datasources.CreateTempTableUsing.run(ddl.scala:92)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:52)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:817)
at org.wso2.carbon.analytics.spark.core.internal.SparkAnalyticsExecutor.executeQueryLocal(SparkAnalyticsExecutor.java:760)
... 11 more
======================================================
Any suggestion will be appreciated !!

Just to be clear of what you are asking. Are you using DAS or the wso2am-analytics that's shipped along the wso2am-2.0.0?
If you are using wso2am-2.0.0 along with the wso2das-3.1.0 then there is an issue with the spark analytics scripts that comes with the DAS.
This is due to the use of incremental processing. IncrementalProcessing should be changed to IncrementalParams.
You can see this has been fixed by wso2 here, but has not been released yet.
You can update the scripts from the DAS carbon console under main/Batch Analytics/scripts

Tks for ur response!! We need keeping summarized data in statdb,so which table should be purged ? I have found a link as below http://www.rukspot.com/Publishing_APIM_1100_Runtime_Statistics_to_DAS.html
It mentioned that just purge below table
ORG_WSO2_APIMGT_STATISTICS_DESTINATION
ORG_WSO2_APIMGT_STATISTICS_FAULT
ORG_WSO2_APIMGT_STATISTICS_REQUEST
ORG_WSO2_APIMGT_STATISTICS_RESPONSE
ORG_WSO2_APIMGT_STATISTICS_WORKFLOW
ORG_WSO2_APIMGT_STATISTICS_THROTTLE

Related

Contstraint stream breaks when switching from OptaPlanner 7.46.0 to 8.0.0

I have a constraint that crashes in the latest OptaPlanner 8.0.0, but used to work fine on 7.46.0.
As expected, IntelliJ's code inspection (and the debugger) shows that after the first join, the stream is a TriConstraintStream. The runtime class makes more sense to me than the class OptaPlanner is trying to cast to.
When leaving out the last groupBy the error goes away, so that clause seems to cause the issue.
Did something change in the way join and groupby worked?
It seems that the underlying OptaPlanner code was refactored for 8.0.0, so I have trouble seeing what exactly changed in OptaPlanner.
Should I add something to ensure that a TriJoin is used instead of a BiJoin?
I could not find any relevant notes in the migration documentation.
protected Constraint preventProductionShortage(ConstraintFactory factory) {
return factory.from(Demand.class)
.groupBy(Demand::getSKU,
Demand::getWeekNumber
)//BiConstraintStream
.join(Demand.class,
equal((sku, weekNumber)-> sku, Demand::getSKU),
greaterThanOrEqual((sku, weekNumber)-> weekNumber, Demand::getWeekNumber)//TriConstraintStream
)
.groupBy((sku, weekNumber, totalDemand) -> sku,
(sku, weekNumber, totalDemand) -> weekNumber,
sum((sku, weekNumber, totalDemand) -> totalDemand.getOrderQuantity())
)//TriConstraintStream
.penalize("Penalty", HardMediumSoftScore.ONE_MEDIUM,
(sku_weekNumber, demandQty, productionQty) -> 1);
}
Stack trace:
java.lang.ClassCastException: class org.optaplanner.core.impl.score.stream.tri.CompositeTriJoiner cannot be cast to class org.optaplanner.core.impl.score.stream.bi.AbstractBiJoiner (org.optaplanner.core.impl.score.stream.tri.CompositeTriJoiner and org.optaplanner.core.impl.score.stream.bi.AbstractBiJoiner are in unnamed module of loader 'app')
at org.optaplanner.core.impl.score.stream.drools.common.rules.BiJoinMutator.<init>(BiJoinMutator.java:40)
at org.optaplanner.core.impl.score.stream.drools.common.rules.UniRuleAssembler.join(UniRuleAssembler.java:70)
at org.optaplanner.core.impl.score.stream.drools.common.rules.AbstractRuleAssembler.join(AbstractRuleAssembler.java:179)
at org.optaplanner.core.impl.score.stream.drools.common.ConstraintSubTree.getRuleAssembler(ConstraintSubTree.java:94)
at org.optaplanner.core.impl.score.stream.drools.common.ConstraintSubTree.getRuleAssembler(ConstraintSubTree.java:89)
at org.optaplanner.core.impl.score.stream.drools.common.ConstraintGraph.generateRule(ConstraintGraph.java:431)
at org.optaplanner.core.impl.score.stream.drools.common.ConstraintGraph.lambda$generateRule$57(ConstraintGraph.java:423)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
at java.base/java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)
at org.optaplanner.core.impl.score.stream.drools.common.ConstraintGraph.generateRule(ConstraintGraph.java:424)
at org.optaplanner.core.impl.score.stream.drools.DroolsConstraintFactory.buildSessionFactory(DroolsConstraintFactory.java:101)
at org.optaplanner.core.impl.score.director.stream.ConstraintStreamScoreDirectorFactory.<init>(ConstraintStreamScoreDirectorFactory.java:77)
at org.optaplanner.test.impl.score.stream.DefaultConstraintVerifier.verifyThat(DefaultConstraintVerifier.java:63)
at org.optaplanner.test.impl.score.stream.DefaultConstraintVerifier.verifyThat(DefaultConstraintVerifier.java:32)
at com.ohly.planner.constraints.ConstraintsTest.weekShortageSingleSKU(ConstraintsTest.java:61)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:564)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:69)
at com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33)
at com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:220)
at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:53)
Process finished with exit code -1
[edit] For completeness, the new function as suggested by Lukáš Petrovický
protected Constraint preventProductionShortage(ConstraintFactory factory) {
return factory.from(Demand.class)
.join(Demand.class,
equal(Demand::getSKU),
greaterThanOrEqual(demand -> demand.getWeekNumber()))
.groupBy((d, d2) -> d.getSKU(),
(d, d2) -> d.getWeekNumber(),
sum((d,d2) -> d2.getOrderQuantity())
)
...
[/edit]
This was an unfortunate bug not caught by the existing test coverage.
The fix is aimed at OptaPlanner 8.0.1, incl. test coverage improvement.
That said, I would argue that the constraint is not very efficient. Unless I'm missing some key implications, the following is semantically very similar, yet much faster:
protected Constraint preventProductionShortage(ConstraintFactory factory) {
return factory.from(Demand.class)
.join(Demand.class,
equal(Demand::getSKU),
greaterThanOrEqual(demand -> demand.getWeekNumber()))
.groupBy(..., ..., sum((demand, demand2) -> ...))
.penalize("Penalty", HardMediumSoftScore.ONE_MEDIUM);
}
Note how I eliminated the first use of groupBy(). There may be some difference though in how many tuples are penalized this way, which may or may not be what you want. Feel free to open another question on that.

Resource 7bed8adc-9ed9-49dc-b15e-6660e2fc3285 transitioned to failure state ERROR when use openstacksdk to create_server

When I create the openstack server, I get bellow Exception:
Resource 7bed8adc-9ed9-49dc-b15e-6660e2fc3285 transitioned to failure state ERROR
My code is bellow:
server_args = {
"name":server_name,
"image_id":image_id,
"flavor_id":flavor_id,
"networks":[{"uuid":network.id}],
"admin_password": admin_password,
}
try:
server = user_conn.conn.compute.create_server(**server_args)
server = user_conn.conn.compute.wait_for_server(server)
except Exception as e: # there I except the Exception
raise e
When create_server, my server_args data is bellow:
{'flavor_id': 'd4424892-4165-494e-bedc-71dc97a73202', 'networks': [{'uuid': 'da4e3433-2b21-42bb-befa-6e1e26808a99'}], 'admin_password': '123456', 'name': '133456', 'image_id': '60f4005e-5daf-4aef-a018-4c6b2ff06b40'}
My openstacksdk version is 0.9.18.
In the end, I find the flavor data is too big for openstack compute node, so I changed it to a small flavor, so I create success.

Spark using map in cluster mode

I have a immutable map in my class. When I run my code in local mode, there is no problem and I can reach every key in the map. However, when I run my code in cluster mode, nodes throw error about not finding the key in the map.
What I've tried up to now are these;
-Broadcast the immutable map over cluster.
broadcast = sc.broadcast(my_immutable_map)
-Parallelize the map as pair RDD
my_map_rdd = sc.parallelize( my_immutable_map.toSeq)
When i examine the logs, I see key not found exception.
My error stacktrace is as follows:
Driver stacktrace:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 15.0 failed 4 times, most recent failure: Lost task 1.3 in stage 15.0 (TID 25, datanode1.big.com): java.util.NoSuchElementException: key not found: 905053199731
at scala.collection.MapLike$class.default(MapLike.scala:228)
at scala.collection.AbstractMap.default(Map.scala:58)
at scala.collection.MapLike$class.apply(MapLike.scala:141)
at scala.collection.AbstractMap.apply(Map.scala:58)
at havelsan.CDRGenerator$.generate_random_target(CDRGenerator.scala:95)
at havelsan.CDRGenerator$$anonfun$main$2$$anonfun$6.apply(CDRGenerator.scala:167)
at havelsan.CDRGenerator$$anonfun$main$2$$anonfun$6.apply(CDRGenerator.scala:165)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:389)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$6.apply$mcV$sp(PairRDDFunctions.scala:1197)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$6.apply(PairRDDFunctions.scala:1197)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$6.apply(PairRDDFunctions.scala:1197)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1251)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1205)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1185)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Can you explain how spark distribute maps and how it is possible that some nodes can't find some keys in this map, please? Btw my spark version is 1.6.0
What am I missing?
UPDATE
This part is for initializing the map on driver.
...
var pd = sc.textFile( "hdfs://...")
my_immutable_map = pd.map( line => line.split(":") ).map{ line => (line(0), line(1).split(","))}.collectAsMap
...
broadcast = sc.broadcast(my_immutable_map)
my_map_rdd = sc.parallelize( my_immutable_map.toSeq)
And this is the part where i got the error.
def my_func(key:String):String={
...
my_value = broadcast.value(key)
...
}
my_func is called inside a map as;
my_another_rdd.map{ line =>
val key = line.split(",")(0)
my_func(key)
}
The solution that i found is to pass the broadcast value to the function as a parameter. Still, I couldn't find a solution for parallelize method.
https://stackoverflow.com/a/34912887/4668959

SQLITE_ERROR: Connection is closed when connecting from Spark via JDBC to SQLite database

I am using Apache Spark 1.5.1 and trying to connect to a local SQLite database named clinton.db. Creating a data frame from a table of the database works fine but when I do some operations on the created object, I get the error below which says "SQL error or missing database (Connection is closed)". Funny thing is that I get the result of the operation nevertheless. Any idea what I can do to solve the problem, i.e., avoid the error?
Start command for spark-shell:
../spark/bin/spark-shell --master local[8] --jars ../libraries/sqlite-jdbc-3.8.11.1.jar --classpath ../libraries/sqlite-jdbc-3.8.11.1.jar
Reading from the database:
val emails = sqlContext.read.format("jdbc").options(Map("url" -> "jdbc:sqlite:../data/clinton.sqlite", "dbtable" -> "Emails")).load()
Simple count (fails):
emails.count
Error:
15/09/30 09:06:39 WARN JDBCRDD: Exception closing statement
java.sql.SQLException: [SQLITE_ERROR] SQL error or missing database (Connection is closed)
at org.sqlite.core.DB.newSQLException(DB.java:890)
at org.sqlite.core.CoreStatement.internalClose(CoreStatement.java:109)
at org.sqlite.jdbc3.JDBC3Statement.close(JDBC3Statement.java:35)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1.org$apache$spark$sql$execution$datasources$jdbc$JDBCRDD$$anon$$close(JDBCRDD.scala:454)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1$$anonfun$8.apply(JDBCRDD.scala:358)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1$$anonfun$8.apply(JDBCRDD.scala:358)
at org.apache.spark.TaskContextImpl$$anon$1.onTaskCompletion(TaskContextImpl.scala:60)
at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:79)
at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:77)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:77)
at org.apache.spark.scheduler.Task.run(Task.scala:90)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
res1: Long = 7945
I got the same error today, and the important line is just before the exception:
15/11/30 12:13:02 INFO jdbc.JDBCRDD: closed connection
15/11/30 12:13:02 WARN jdbc.JDBCRDD: Exception closing statement
java.sql.SQLException: [SQLITE_ERROR] SQL error or missing database (Connection is closed)
at org.sqlite.core.DB.newSQLException(DB.java:890)
at org.sqlite.core.CoreStatement.internalClose(CoreStatement.java:109)
at org.sqlite.jdbc3.JDBC3Statement.close(JDBC3Statement.java:35)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1.org$apache$spark$sql$execution$datasources$jdbc$JDBCRDD$$anon$$close(JDBCRDD.scala:454)
So Spark succeeded to close the JDBC connection, and then it fails to close the JDBC statement
Looking at the source, close() is called twice:
Line 358 (org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD, Spark 1.5.1)
context.addTaskCompletionListener{ context => close() }
Line 469
override def hasNext: Boolean = {
if (!finished) {
if (!gotNext) {
nextValue = getNext()
if (finished) {
close()
}
gotNext = true
}
}
!finished
}
If you look at the close() method (line 443)
def close() {
if (closed) return
you can see that it checks the variable closed, but that value is never set to true.
If I see it correctly, this bug is still in the master. I have filed a bug report.
Source: JDBCRDD.scala (lines numbers differ slightly)

Spark - "too many open files" in shuffle

Using Spark 1.1
I have 2 datasets. One is very large and the other was reduced (using some 1:100 filtering) to much smaller scale. I need to reduce the large dataset to the same scale, by joining only those items from the smaller list with their corresponding counterparts in the larger list (those lists contain elements that have a mutual join field).
I am doing that using the following code:
The "if(joinKeys != null)" part is the relevant part
Smaller list is "joinKeys", larger list is "keyedEvents"
private static JavaRDD<ObjectNode> createOutputType(JavaRDD jsonsList, final String type, String outputPath,JavaPairRDD<String,String> joinKeys) {
outputPath = outputPath + "/" + type;
JavaRDD events = jsonsList.filter(new TypeFilter(type));
// This is in case we need to narrow the list to match some other list of ids... Recommendation List, for example... :)
if(joinKeys != null) {
JavaPairRDD<String,ObjectNode> keyedEvents = events.mapToPair(new KeyAdder("requestId"));
JavaRDD < ObjectNode > joinedEvents = joinKeys.join(keyedEvents).values().map(new PairToSecond());
events = joinedEvents;
}
JavaPairRDD<String,Iterable<ObjectNode>> groupedEvents = events.mapToPair(new KeyAdder("sliceKey")).groupByKey();
// Add convert jsons to strings and add "\n" at the end of each
JavaPairRDD<String, String> groupedStrings = groupedEvents.mapToPair(new JsonsToStrings());
groupedStrings.saveAsHadoopFile(outputPath, String.class, String.class, KeyBasedMultipleTextOutputFormat.class);
return events;
}
Thing is when running this job, I always get the same error:
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:40)
at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 2757 in stage 13.0 failed 4 times, most recent failure: Lost task 2757.3 in stage 13.0 (TID 47681, hadoop-w-175.c.taboola-qa-01.internal): java.io.FileNotFoundException: /hadoop/spark/tmp/spark-local-20141201184944-ba09/36/shuffle_6_2757_2762 (Too many open files)
java.io.FileOutputStream.open(Native Method)
java.io.FileOutputStream.<init>(FileOutputStream.java:221)
org.apache.spark.storage.DiskBlockObjectWriter.open(BlockObjectWriter.scala:123)
org.apache.spark.storage.DiskBlockObjectWriter.write(BlockObjectWriter.scala:192)
org.apache.spark.shuffle.hash.HashShuffleWriter$$anonfun$write$1.apply(HashShuffleWriter.scala:67)
org.apache.spark.shuffle.hash.HashShuffleWriter$$anonfun$write$1.apply(HashShuffleWriter.scala:65)
scala.collection.Iterator$class.foreach(Iterator.scala:727)
scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65)
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
org.apache.spark.scheduler.Task.run(Task.scala:54)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:745)
I already increased my ulimits, by doing the following on all cluster machines:
echo "* soft nofile 900000" >> /etc/security/limits.conf
echo "root soft nofile 900000" >> /etc/security/limits.conf
echo "* hard nofile 990000" >> /etc/security/limits.conf
echo "root hard nofile 990000" >> /etc/security/limits.conf
echo "session required pam_limits.so" >> /etc/pam.d/common-session
echo "session required pam_limits.so" >> /etc/pam.d/common-session-noninteractive
But doesn't fix my problem...
The bdutil framework works in a way the user "hadoop" is the one running the job. The script that deploys the cluster, created a file /etc/security/limits.d/hadoop.conf that overrided the ulimit settings for "hadoop" user, which I wasn't aware of. By deleting this file, or alternatively setting the desired ulimits there, I was able to resolve the problem.

Resources