DSS boxcarring skipping an operation - wso2-data-services-server

I'm using DSS 3.5.0 with PostgreSQL, and a set of operations in a request box fails in a very peculiar way. I've successfully used request boxes containing thousands of operations in this same project, including operations very similar to the ones that fail. One of these large request boxes failed, and after spending some time hunting for the operations that caused the problem, we were able to reduce it to a set of five operations.
The problem
Looking at the PostgreSQL logs, the query issued by one of the operations is never executed: it never even reaches the database.
I'll call the operations O1, O2, O3, O4 and O5 and their queries Q1, Q2, Q3, Q4 and Q5. Playing with the request and checking the resulting database log, we ended up with:
Request box contains O1-O2-O3-O4-O5: database executes Q1-Q2-Q3-Q5
Request box contains O1-O2-O4-O5: database executes Q1-Q2-Q4-Q5
Request box contains O1-O2-O3-O4: database executes Q1-Q2-Q3-Q4
Request box contains O1-O2-O3-O4-O4-O5: database executes Q1-Q2-Q3-Q5
So the behavior looks erratic and doesn't seem to follow any clearly discernible pattern.
All operations perform correctly if sent separately to the DSS, or in two different request boxes. The exact nature of the operations doesn't seem to be directly linked to the problem because the same operations are used countless times in other scenarios. The queries are not especially long or complex.
Operation 1: updates a record in table A
Operation 2: deletes a record from table B
Operation 3: inserts a record in table B
Operation 4: inserts a record in table A
Operation 5: inserts a record in table B (same as operation 3)
Errors and logs
The actual error message issued by PostgreSQL for operation 5 is
ERROR: null value in column "element_id" violates not-null constraint
This is expected because operation 4 (the one that disappears) inserts a value that is later used to resolve element_id for operation 5.
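This is consistent with ordinary scalar-subquery behavior: when the subselect matches no row, it yields NULL, and the NOT NULL constraint rejects the insert. A minimal, purely illustrative reproduction in psql (table and values are mine, not from the service):

-- hypothetical table, for illustration only
CREATE TABLE t (id integer NOT NULL);
-- the scalar subquery finds no row, so it evaluates to NULL:
INSERT INTO t VALUES ((SELECT 1 WHERE false));
-- ERROR: null value in column "id" violates not-null constraint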
The PostgreSQL log reports:
LOG: execute <unnamed>: BEGIN
LOG: execute <unnamed>: UPDATE public.project_element SET element_uuid=$1,location_id=$2,from_revit=$3,name=$4,type=$5,model=NULLIF($6,0),parent_element=(SELECT PE.ELEMENT_ID FROM PROJECT_ELEMENT PE WHERE PE.PROJECT_ID = $7 AND (PE.ELEMENT_ID = $8 OR (PE.ELEMENT_UUID = $9 AND PE.ELEMENT_UUID IS NOT NULL))) ,left_border=$10,right_border=$11 WHERE element_id=$12
DETAIL: parameters: $1 = '(element-uuid)', $2 = '85', $3 = '1', $4 = '(some-text)', $5 = '3', $6 = '0', $7 = '22', $8 = NULL, $9 = '(parent-uuid)', $10 = NULL, $11 = NULL, $12 = '9983'
LOG: execute <unnamed>: DELETE FROM ELEMENT_PROPERTY WHERE ELEMENT_ID = (SELECT PE.ELEMENT_ID FROM PROJECT_ELEMENT PE WHERE PE.ELEMENT_ID = $1 AND PE.PROJECT_ID = $2) AND NAME = $3
DETAIL: parameters: $1 = '9983', $2 = '22', $3 = 'num_ports'
LOG: execute <unnamed>: INSERT INTO public.element_property(name,value,type,element_id) VALUES($1,$2,$3,(SELECT PE.ELEMENT_ID FROM PROJECT_ELEMENT PE WHERE PE.PROJECT_ID = $4 AND (PE.ELEMENT_ID = $5 OR (PE.ELEMENT_UUID = $6 AND PE.ELEMENT_UUID IS NOT NULL))))
DETAIL: parameters: $1 = 'num_ports', $2 = '48', $3 = '0', $4 = '22', $5 = NULL, $6 = '(element-uuid)'
LOG: execute <unnamed>: INSERT INTO public.element_property(name,value,type,element_id) VALUES($1,$2,$3,(SELECT PE.ELEMENT_ID FROM PROJECT_ELEMENT PE WHERE PE.PROJECT_ID = $4 AND (PE.ELEMENT_ID = $5 OR (PE.ELEMENT_UUID = $6 AND PE.ELEMENT_UUID IS NOT NULL))))
DETAIL: parameters: $1 = 'port_num', $2 = '6', $3 = '0', $4 = '22', $5 = NULL, $6 = '(other-uuid)'
ERROR: null value in column "element_id" violates not-null constraint
DETAIL: Failing row contains (port_num, 6, 0, null).
STATEMENT: INSERT INTO public.element_property(name,value,type,element_id) VALUES($1,$2,$3,(SELECT PE.ELEMENT_ID FROM PROJECT_ELEMENT PE WHERE PE.PROJECT_ID = $4 AND (PE.ELEMENT_ID = $5 OR (PE.ELEMENT_UUID = $6 AND PE.ELEMENT_UUID IS NOT NULL))))
LOG: execute S_2: BEGIN
LOG: execute S_1: ROLLBACK
The DSS log starts with an exception, but I'm not sure it's really related to this problem. The following excerpt runs from the start of the request box to the first complaint about the error returned by PostgreSQL; DSS complains several more times after that.
DEBUG - {org.apache.axis2.transport.http.AxisServlet}
java.lang.NullPointerException
at javax.servlet.GenericServlet.getServletContext(GenericServlet.java:123)
at org.apache.axis2.transport.http.AxisServlet.createMessageContext(AxisServlet.java:715)
at org.apache.axis2.transport.http.AxisServlet$RestRequestProcessor.<init>(AxisServlet.java:819)
at org.apache.axis2.transport.http.AxisServlet.doPost(AxisServlet.java:227)
at org.wso2.carbon.core.transports.CarbonServlet.doPost(CarbonServlet.java:231)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:646)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
at org.eclipse.equinox.http.servlet.internal.ServletRegistration.service(ServletRegistration.java:61)
at org.eclipse.equinox.http.servlet.internal.ProxyServlet.processAlias(ProxyServlet.java:128)
at org.eclipse.equinox.http.servlet.internal.ProxyServlet.service(ProxyServlet.java:68)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
at org.wso2.carbon.tomcat.ext.servlet.DelegationServlet.service(DelegationServlet.java:68)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.wso2.carbon.ui.filters.CSRFPreventionFilter.doFilter(CSRFPreventionFilter.java:88)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.wso2.carbon.ui.filters.CRLFPreventionFilter.doFilter(CRLFPreventionFilter.java:59)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.wso2.carbon.tomcat.ext.filter.CharacterSetFilter.doFilter(CharacterSetFilter.java:61)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:504)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
at org.wso2.carbon.tomcat.ext.valves.CompositeValve.continueInvocation(CompositeValve.java:99)
at org.wso2.carbon.tomcat.ext.valves.CarbonTomcatValve$1.invoke(CarbonTomcatValve.java:47)
at org.wso2.carbon.webapp.mgt.TenantLazyLoaderValve.invoke(TenantLazyLoaderValve.java:57)
at org.wso2.carbon.tomcat.ext.valves.TomcatValveContainer.invokeValves(TomcatValveContainer.java:47)
at org.wso2.carbon.tomcat.ext.valves.CompositeValve.invoke(CompositeValve.java:62)
at org.wso2.carbon.tomcat.ext.valves.CarbonStuckThreadDetectionValve.invoke(CarbonStuckThreadDetectionValve.java:159)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
at org.wso2.carbon.tomcat.ext.valves.CarbonContextCreatorValve.invoke(CarbonContextCreatorValve.java:57)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:421)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1074)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:611)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1739)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1698)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:744)
DEBUG - Input contentType (application/json) {org.apache.axis2.builder.BuilderUtil}
DEBUG - CharSetEncoding defaulted (UTF-8) {org.apache.axis2.builder.BuilderUtil}
DEBUG - [MessageContext: logID=f9462531f982d008b3e2aacd88bfd07f4a7e4905c354170e] Checking for Service using target endpoint address : /services/iims {org.apache.axis2.dispatchers.RequestURIBasedServiceDispatcher}
DEBUG - org.apache.axis2.i18n.resource::handleGetObject(servicefound) {org.apache.axis2.i18n.ProjectResourceBundle}
DEBUG - [MessageContext: logID=f9462531f982d008b3e2aacd88bfd07f4a7e4905c354170e] Found AxisService : iims {org.apache.axis2.engine.AbstractDispatcher}
DEBUG - Attempt to check for Operation using HTTP Location failed {org.apache.axis2.dispatchers.HTTPLocationBasedDispatcher}
DEBUG - [MessageContext: logID=f9462531f982d008b3e2aacd88bfd07f4a7e4905c354170e] Attempted to check for Operation using target endpoint URI, but the operation fragment was missing {org.apache.axis2.dispatchers.RequestURIBasedOperationDispatcher}
DEBUG - getAction (null) from org.apache.axis2.client.Options#279e70a {org.apache.axis2.client.Options}
DEBUG - SoapAction is (null) {org.apache.axis2.context.MessageContext}
DEBUG - createSOAPEnvelope using Builder (class org.apache.axis2.json.JSONOMBuilder) selected from type (application/json) {org.apache.axis2.transport.TransportUtils}
DEBUG - getAction (null) from org.apache.axis2.client.Options#279e70a {org.apache.axis2.client.Options}
DEBUG - SoapAction is (null) {org.apache.axis2.context.MessageContext}
DEBUG - [MessageContext: logID=f9462531f982d008b3e2aacd88bfd07f4a7e4905c354170e] Checking for Operation using Action : null {org.apache.axis2.dispatchers.ActionBasedOperationDispatcher}
DEBUG - [MessageContext: logID=f9462531f982d008b3e2aacd88bfd07f4a7e4905c354170e] Attempted to check for Operation using target endpoint URI, but the operation fragment was missing {org.apache.axis2.dispatchers.RequestURIBasedOperationDispatcher}
DEBUG - Axis operation is null {org.apache.axis2.json.gson.JSONMessageHandler}
DEBUG - No headers present corresponding to http://www.w3.org/2005/08/addressing {org.apache.axis2.handlers.addressing.AddressingInHandler}
DEBUG - No headers present corresponding to http://schemas.xmlsoap.org/ws/2004/08/addressing {org.apache.axis2.handlers.addressing.AddressingInHandler}
DEBUG - getAction (null) from org.apache.axis2.client.Options#279e70a {org.apache.axis2.client.Options}
DEBUG - SoapAction is (null) {org.apache.axis2.context.MessageContext}
DEBUG - [MessageContext: logID=f9462531f982d008b3e2aacd88bfd07f4a7e4905c354170e] Checking for Operation using Action : null {org.apache.axis2.dispatchers.ActionBasedOperationDispatcher}
DEBUG - getAction (null) from org.apache.axis2.client.Options#279e70a {org.apache.axis2.client.Options}
DEBUG - SoapAction is (null) {org.apache.axis2.context.MessageContext}
DEBUG - [MessageContext: logID=f9462531f982d008b3e2aacd88bfd07f4a7e4905c354170e] Checking for Operation using Action : null {org.apache.axis2.dispatchers.ActionBasedOperationDispatcher}
DEBUG - Get operation for request_box {org.apache.axis2.description.AxisService}
DEBUG - Found axis operation: org.apache.axis2.description.InOutAxisOperation#682d0c2c {org.apache.axis2.description.AxisService}
DEBUG - org.apache.axis2.i18n.resource::handleGetObject(operationfound) {org.apache.axis2.i18n.ProjectResourceBundle}
DEBUG - [MessageContext: logID=f9462531f982d008b3e2aacd88bfd07f4a7e4905c354170e] Found AxisOperation : request_box {org.apache.axis2.engine.AbstractDispatcher}
DEBUG - getAddressingRequirementParemeterValue: value: 'null' {org.apache.axis2.addressing.AddressingHelper}
DEBUG - [MessageContext: logID=f9462531f982d008b3e2aacd88bfd07f4a7e4905c354170e] isReplyRedirected: ReplyTo is null. Returning false {org.apache.axis2.addressing.AddressingHelper}
DEBUG - getAction (null) from org.apache.axis2.client.Options#112f42cb {org.apache.axis2.client.Options}
DEBUG - Old WSAAction is (null) {org.apache.axis2.context.MessageContext}
DEBUG - New WSAAction is (urn:request_boxResponse) {org.apache.axis2.context.MessageContext}
DEBUG - setAction Old action is (null) {org.apache.axis2.client.Options}
DEBUG - setAction New action is (urn:request_boxResponse) {org.apache.axis2.client.Options}
DEBUG - messageID is null. {org.apache.axis2.context.ConfigurationContext}
DEBUG - forceExpand: changing prefix from to {org.apache.axiom.om.impl.llom.OMSourcedElementImpl}
DEBUG - DXXATransactionManager.begin() {org.wso2.carbon.dataservices.core.description.xa.DSSXATransactionManager}
DEBUG - Creating data source connection {org.wso2.carbon.dataservices.core.description.config.SQLConfig}
ERROR - ERROR: null value in column "element_id" violates not-null constraint_ Detalhe: Failing row contains (port_num, 6, 0, null). (Sanitized) {org.wso2.carbon.dataservices.core.description.query.SQLQuery}
org.postgresql.util.PSQLException: ERROR: null value in column "element_id" violates not-null constraint
The implementation
This is the actual request box that fails (some field contents replaced to reduce noise):
{
  "request_box": {
    "update_project_element_operation": {
      "name": "(some-text)",
      "element_id": 9983,
      "element_uuid": "(element-uuid)",
      "from_revit": 1,
      "project_id": 22,
      "parent_element_uuid": "(parent-uuid)",
      "type": 3,
      "location_id": 85,
      "model": 0
    },
    "delete_element_property_operation": {
      "name": "num_ports",
      "element_id": 9983,
      "project_id": 22
    },
    "insert_element_property_operation": {
      "project_id": 22,
      "element_uuid": "(element-uuid)",
      "name": "num_ports",
      "value": "48"
    },
    "insert_project_element_operation": {
      "name": "(this operation disappears)",
      "element_id": 0,
      "element_uuid": "(other-uuid)",
      "from_revit": 1,
      "project_id": 22,
      "parent_element_uuid": "(element-uuid)",
      "type": 10,
      "location_id": 85,
      "model": 0
    },
    "insert_element_property_operation": {
      "project_id": 22,
      "element_uuid": "(other-uuid)",
      "name": "port_num",
      "value": "6"
    }
  }
}
I can provide detailed table, query and operation definitions if necessary. All of these operations have been used before, and each one works if issued separately or split across two request boxes. It seems to be an issue directly linked to DSS boxcarring.
Any ideas?

After a few weeks of investigation, including direct contact with WSO2 support, we concluded that this unusual problem was caused by the JSON-to-XML conversion inside DSS. It may be related to the fact that the request box representation in JSON can contain non-unique names (and, according to RFC 7159, the behavior in that case is unpredictable and implementation-defined). Note that we also used a request box with thousands of repetitions of the same name without any visible problem, so the failure isn't a straightforward consequence of every non-unique name being mishandled.
When we tried the same request box in XML, all operations were executed correctly. To avoid changing the application, we followed WSO2's advice and had the ESB convert the application-generated JSON to XML. Preliminary tests showed that in this case the XML was generated correctly; however, we decided to slightly adjust the JSON generator to emit an array of operation objects instead of an object whose members share names, to steer clear of the undefined behavior and of new, unpredictable JSON parsing problems (see the sketch below).
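For reference, the array form looks roughly like this (a sketch showing only two of the operations above; whether the receiving side accepts exactly this layout depends on how your JSON-to-XML mapping is configured):

{
  "request_box": [
    { "delete_element_property_operation": { "name": "num_ports", "element_id": 9983, "project_id": 22 } },
    { "insert_element_property_operation": { "project_id": 22, "element_uuid": "(element-uuid)", "name": "num_ports", "value": "48" } }
  ]
}

Each array element is an object with a single member naming the operation, so no object carries duplicate names and the RFC 7159 ambiguity disappears.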
WSO2 is aware of this problem, and it may or may not be fixed in an upcoming DSS release. Until then, the safer way to avoid request box surprises seems to be to send transactions to DSS as XML rather than JSON when using request boxes.

Related

SQLITE_ERROR: Connection is closed when connecting from Spark via JDBC to SQLite database

I am using Apache Spark 1.5.1 and trying to connect to a local SQLite database named clinton.db. Creating a data frame from a table of the database works fine, but when I perform some operations on the created object, I get the error below, which says "SQL error or missing database (Connection is closed)". The funny thing is that I get the result of the operation nevertheless. Any idea what I can do to solve the problem, i.e., avoid the error?
Start command for spark-shell:
../spark/bin/spark-shell --master local[8] --jars ../libraries/sqlite-jdbc-3.8.11.1.jar --classpath ../libraries/sqlite-jdbc-3.8.11.1.jar
Reading from the database:
val emails = sqlContext.read.format("jdbc").options(Map("url" -> "jdbc:sqlite:../data/clinton.sqlite", "dbtable" -> "Emails")).load()
Simple count (fails):
emails.count
Error:
15/09/30 09:06:39 WARN JDBCRDD: Exception closing statement
java.sql.SQLException: [SQLITE_ERROR] SQL error or missing database (Connection is closed)
at org.sqlite.core.DB.newSQLException(DB.java:890)
at org.sqlite.core.CoreStatement.internalClose(CoreStatement.java:109)
at org.sqlite.jdbc3.JDBC3Statement.close(JDBC3Statement.java:35)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1.org$apache$spark$sql$execution$datasources$jdbc$JDBCRDD$$anon$$close(JDBCRDD.scala:454)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1$$anonfun$8.apply(JDBCRDD.scala:358)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1$$anonfun$8.apply(JDBCRDD.scala:358)
at org.apache.spark.TaskContextImpl$$anon$1.onTaskCompletion(TaskContextImpl.scala:60)
at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:79)
at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:77)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:77)
at org.apache.spark.scheduler.Task.run(Task.scala:90)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
res1: Long = 7945
I got the same error today, and the important line is just before the exception:
15/11/30 12:13:02 INFO jdbc.JDBCRDD: closed connection
15/11/30 12:13:02 WARN jdbc.JDBCRDD: Exception closing statement
java.sql.SQLException: [SQLITE_ERROR] SQL error or missing database (Connection is closed)
at org.sqlite.core.DB.newSQLException(DB.java:890)
at org.sqlite.core.CoreStatement.internalClose(CoreStatement.java:109)
at org.sqlite.jdbc3.JDBC3Statement.close(JDBC3Statement.java:35)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1.org$apache$spark$sql$execution$datasources$jdbc$JDBCRDD$$anon$$close(JDBCRDD.scala:454)
So Spark succeeds in closing the JDBC connection, and then fails to close the JDBC statement.
Looking at the source, close() is called twice:
Line 358 (org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD, Spark 1.5.1)
context.addTaskCompletionListener{ context => close() }
Line 469
override def hasNext: Boolean = {
  if (!finished) {
    if (!gotNext) {
      nextValue = getNext()
      if (finished) {
        close()
      }
      gotNext = true
    }
  }
  !finished
}
If you look at the close() method (line 443)
def close() {
  if (closed) return
you can see that it checks the variable closed, but that value is never set to true.
As far as I can see, this bug is still present in master. I have filed a bug report.
Source: JDBCRDD.scala (line numbers differ slightly)
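For illustration, here is a minimal sketch of the idempotent-close pattern the method is evidently meant to implement. The names are mine, not Spark's; the missing piece in 1.5.1 is the assignment to closed:

import java.sql.{Connection, Statement}

// Hypothetical wrapper, not Spark code: close() may be called twice,
// so the flag must be set before the resources are released.
class OnceClosable(stmt: Statement, conn: Connection) {
  private var closed = false

  def close(): Unit = {
    if (closed) return // the second call should be a no-op...
    closed = true      // ...but only if the flag is actually set
    try {
      if (stmt != null) stmt.close() // in the report, this runs again after conn is gone
    } finally {
      if (conn != null) conn.close()
    }
  }
}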

inputKey basic sample doesn't work

My goal is to have a task or setting that can take some parameters.
After carefully reading the docs, I've written this basic snippet in build.sbt, and it compiles fine:
import complete.DefaultParsers._ // brings token and literal into scope

val servers = token(
  literal("desarrollo") |
  literal("parametrizacion")
)

val deploy = inputKey[Unit]("Deploy to server")

deploy := {
  val serv = servers.parsed
  println(s"Deploying to $serv")
}
I'm experiencing these problems from the sbt command line:
> deploy desarrollo
[error] Expected ID character
[error] Not a valid command: deploy
[error] Expected project ID
[error] Expected configuration
[error] Expected ':' (if selecting a configuration)
[error] Expected key
[error] Expected '::'
[error] Expected end of input.
[error] Expected 'desarrollo'
[error] Expected 'parametrizacion'
[error] deploy desarrollo
[error] ^
Tab completion for the argument doesn't work either.
My goal is to accept a parameter whose value can be either desarrollo or parametrizacion.
The initial space must be explicitly parsed and discarded:
val servers = token(' ' ~> (
  literal("desarrollo") |
  literal("parametrizacion")
))

val deploy = inputKey[Unit]("Deploy to server")

deploy := {
  val serv = servers.parsed
  println(s"Deploying to $serv")
}
Now both deploy desarrollo and deploy parametrizacion work.
In fact, having to specify the initial space yourself provides more flexibility. :-)
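As a side note, sbt's default parsers also ship a Space parser, so a sketch equivalent to the above (same build.sbt context, assuming import complete.DefaultParsers._) would be:

// Space matches one or more whitespace characters; ~> discards them
val servers = token(Space ~> (
  literal("desarrollo") |
  literal("parametrizacion")
))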

Spark - "too many open files" in shuffle

Using Spark 1.1
I have 2 datasets. One is very large and the other was reduced (with roughly 1:100 filtering) to a much smaller scale. I need to reduce the large dataset to the same scale, by joining only those items from the smaller list with their corresponding counterparts in the larger list (the two lists contain elements that share a join field).
I am doing that using the following code:
The "if(joinKeys != null)" part is the relevant part
Smaller list is "joinKeys", larger list is "keyedEvents"
private static JavaRDD<ObjectNode> createOutputType(JavaRDD<ObjectNode> jsonsList, final String type, String outputPath, JavaPairRDD<String, String> joinKeys) {
    outputPath = outputPath + "/" + type;
    JavaRDD<ObjectNode> events = jsonsList.filter(new TypeFilter(type));

    // This is in case we need to narrow the list to match some other list of ids...
    // Recommendation List, for example... :)
    if (joinKeys != null) {
        JavaPairRDD<String, ObjectNode> keyedEvents = events.mapToPair(new KeyAdder("requestId"));
        JavaRDD<ObjectNode> joinedEvents = joinKeys.join(keyedEvents).values().map(new PairToSecond());
        events = joinedEvents;
    }

    JavaPairRDD<String, Iterable<ObjectNode>> groupedEvents = events.mapToPair(new KeyAdder("sliceKey")).groupByKey();
    // Convert jsons to strings and add "\n" at the end of each
    JavaPairRDD<String, String> groupedStrings = groupedEvents.mapToPair(new JsonsToStrings());
    groupedStrings.saveAsHadoopFile(outputPath, String.class, String.class, KeyBasedMultipleTextOutputFormat.class);
    return events;
}
Thing is, when running this job I always get the same error:
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:40)
at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 2757 in stage 13.0 failed 4 times, most recent failure: Lost task 2757.3 in stage 13.0 (TID 47681, hadoop-w-175.c.taboola-qa-01.internal): java.io.FileNotFoundException: /hadoop/spark/tmp/spark-local-20141201184944-ba09/36/shuffle_6_2757_2762 (Too many open files)
java.io.FileOutputStream.open(Native Method)
java.io.FileOutputStream.<init>(FileOutputStream.java:221)
org.apache.spark.storage.DiskBlockObjectWriter.open(BlockObjectWriter.scala:123)
org.apache.spark.storage.DiskBlockObjectWriter.write(BlockObjectWriter.scala:192)
org.apache.spark.shuffle.hash.HashShuffleWriter$$anonfun$write$1.apply(HashShuffleWriter.scala:67)
org.apache.spark.shuffle.hash.HashShuffleWriter$$anonfun$write$1.apply(HashShuffleWriter.scala:65)
scala.collection.Iterator$class.foreach(Iterator.scala:727)
scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65)
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
org.apache.spark.scheduler.Task.run(Task.scala:54)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:745)
I already increased my ulimits, by doing the following on all cluster machines:
echo "* soft nofile 900000" >> /etc/security/limits.conf
echo "root soft nofile 900000" >> /etc/security/limits.conf
echo "* hard nofile 990000" >> /etc/security/limits.conf
echo "root hard nofile 990000" >> /etc/security/limits.conf
echo "session required pam_limits.so" >> /etc/pam.d/common-session
echo "session required pam_limits.so" >> /etc/pam.d/common-session-noninteractive
But that doesn't fix my problem...
The bdutil framework works in such a way that the user "hadoop" is the one running the job. The script that deploys the cluster created a file /etc/security/limits.d/hadoop.conf that overrode the ulimit settings for the "hadoop" user, which I wasn't aware of. By deleting this file, or alternatively setting the desired ulimits there, I was able to resolve the problem.
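For anyone debugging something similar, two quick checks (assuming the job runs as user hadoop on a Linux box with PAM limits):

# the open-files limit the job actually runs with
su - hadoop -c 'ulimit -n'
# per-user overrides that silently trump /etc/security/limits.conf
ls /etc/security/limits.d/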

How to understand this Riak stacktrace?

Can anyone help me solve this problem? I have the stacktrace, but I can't understand what the trace actually means.
The error occurs when I try to retrieve all data from a bucket in a Riak database, using the java-riak-client library as the ORM. I can figure out that it's a MapReduce problem, but not much beyond that.
Below is the actual stacktrace. I could not figure out what error it is pointing to, and I tried to track down the record it displays in the error.
Update: yes, the record is there when I fetch it with cURL.
com.basho.riak.client.RiakException: java.io.IOException: <html><head><title>500 Internal Server Error</title></head><body><h1>Internal Server Error</h1>The server encountered an error while processing this request:<br><pre>{error,
{error,
{case_clause,
{error,
{0,
[{module,riak_kv_mrc_map},
{partition,913438523331814323877303020447676887284957839360},
{details,
[{fitting,
{fitting,<0.21083.23>,#Ref<0.0.31.39954>,follow,1}},
{name,0},
{module,riak_kv_mrc_map},
{arg,{{jsfun,<<"Riak.mapValuesJson">>},none}},
{output,
{fitting,<0.21081.23>,#Ref<0.0.31.39954>,sink,
undefined}},
{options,
[{log,sink},
{trace,[error]},
{sink,
{fitting,<0.21081.23>,#Ref<0.0.31.39954>,
sink,undefined}},
{sink_type,{fsm,10,infinity}}]},
{q_limit,64}]},
{type,forward_preflist},
{error,[preflist_exhausted]},
{input,
{ok,{r_object,<<"xxxx-users">>,
<<"xxxx#hotmail.com-userpass">>,
[{r_content,
{dict,7,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],[],[],[],
[],[],[],[]},
{{[],[],
[[<<"Links">>]],
[],[],[],[],[],[],[],
[[<<"content-type">>,97,112,112,108,
105,99,97,116,105,111,110,47,106,
115,111,110],
[<<"X-Riak-VTag">>,54,98,119,73,73,
84,107,120,66,70,107,86,102,67,103,
71,73,116,120,121,85,53]],
[[<<"index">>]],
[],
[[<<"X-Riak-Last-Modified">>|
{1407,514685,380030}]],
[],
[[<<"charset">>,117,116,102,45,56],
[<<"X-Riak-Meta">>]]}}},
<<"{\"identityId\":{\"userId\":\"xxxx#hotmail.com\",\"providerId\":\"userpass\"},\"firstName\":\"xx\",\"lastName\":\"xx\",\"fullName\":\"xx xx\",\"email\":\"xxxx#hotmail.com\",\"authMethod\":{\"method\":\"userPassword\"},\"passwordInfo\":{\"hasher\":\"bcrypt\",\"password\":\"$2a$10$Gm1VVCM09iyI7TQY7r8B7.Baa.YrtHHgREkQpTIH9ThyW4WzuUeJ.\"}}">>}],
[{<<35,9,254,249,83,228,76,146>>,
{1,63574733885}}],
{dict,1,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],[],[],[],[],[],
[],[]},
{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],
[[clean|true]],
[]}}},
undefined},
undefined}},
{modstate,
{state,
913438523331814323877303020447676887284957839360,
{fitting_details,
{fitting,<0.21083.23>,#Ref<0.0.31.39954>,
follow,1},
0,riak_kv_mrc_map,
{{jsfun,<<"Riak.mapValuesJson">>},none},
{fitting,<0.21081.23>,#Ref<0.0.31.39954>,sink,
undefined},
[{log,sink},
{trace,[error]},
{sink,
{fitting,<0.21081.23>,#Ref<0.0.31.39954>,
sink,undefined}},
{sink_type,{fsm,10,infinity}}],
64},
{jsfun,<<"Riak.mapValuesJson">>},
none}},
{stack,[]}]}}},
[{riak_kv_wm_mapred,pipe_mapred_nonchunked,3,
[{file,"src/riak_kv_wm_mapred.erl"},{line,180}]},
{webmachine_resource,resource_call,3,
[{file,"src/webmachine_resource.erl"},{line,183}]},
{webmachine_resource,do,3,
[{file,"src/webmachine_resource.erl"},{line,141}]},
{webmachine_decision_core,resource_call,1,
[{file,"src/webmachine_decision_core.erl"},{line,48}]},
{webmachine_decision_core,decision,1,
[{file,"src/webmachine_decision_core.erl"},{line,481}]},
{webmachine_decision_core,handle_request,2,
[{file,"src/webmachine_decision_core.erl"},{line,33}]},
{webmachine_mochiweb,loop,1,
[{file,"src/webmachine_mochiweb.erl"},{line,97}]},
{mochiweb_http,parse_headers,5,
[{file,"src/mochiweb_http.erl"},{line,180}]}]}}</pre><P><HR><ADDRESS>mochiweb+webmachine web server</ADDRESS></body></html>
at com.basho.riak.client.query.MapReduce.execute(MapReduce.java:81)
at models.UserRecordsModel$.getAllUsers(UserRecordsModel.scala:131)
at controllers.DataRetrieval$$anonfun$getRegisteredUserData$1.apply(DataRetrieval.scala:42)
at controllers.DataRetrieval$$anonfun$getRegisteredUserData$1.apply(DataRetrieval.scala:38)
at play.api.mvc.ActionBuilder$$anonfun$apply$10.apply(Action.scala:221)
at play.api.mvc.ActionBuilder$$anonfun$apply$10.apply(Action.scala:220)
at securesocial.core.SecureSocial$SecuredActionBuilder$$anonfun$2$$anonfun$apply$1.apply(SecureSocial.scala:117)
at securesocial.core.SecureSocial$SecuredActionBuilder$$anonfun$2$$anonfun$apply$1.apply(SecureSocial.scala:113)
at scala.Option.map(Option.scala:145)
at securesocial.core.SecureSocial$SecuredActionBuilder$$anonfun$2.apply(SecureSocial.scala:113)
at securesocial.core.SecureSocial$SecuredActionBuilder$$anonfun$2.apply(SecureSocial.scala:112)
at scala.Option.flatMap(Option.scala:170)
at securesocial.core.SecureSocial$SecuredActionBuilder.invokeSecuredBlock(SecureSocial.scala:112)
at securesocial.core.SecureSocial$SecuredActionBuilder.invokeBlock(SecureSocial.scala:146)
at play.api.mvc.ActionBuilder$$anon$1.apply(Action.scala:309)
at play.api.mvc.Action$$anonfun$apply$1$$anonfun$apply$4$$anonfun$apply$5.apply(Action.scala:109)
at play.api.mvc.Action$$anonfun$apply$1$$anonfun$apply$4$$anonfun$apply$5.apply(Action.scala:109)
at play.utils.Threads$.withContextClassLoader(Threads.scala:18)
at play.api.mvc.Action$$anonfun$apply$1$$anonfun$apply$4.apply(Action.scala:108)
at play.api.mvc.Action$$anonfun$apply$1$$anonfun$apply$4.apply(Action.scala:107)
at scala.Option.map(Option.scala:145)
at play.api.mvc.Action$$anonfun$apply$1.apply(Action.scala:107)
at play.api.mvc.Action$$anonfun$apply$1.apply(Action.scala:100)
at play.api.libs.iteratee.Iteratee$$anonfun$mapM$1.apply(Iteratee.scala:481)
at play.api.libs.iteratee.Iteratee$$anonfun$mapM$1.apply(Iteratee.scala:481)
at play.api.libs.iteratee.Iteratee$$anonfun$flatMapM$1.apply(Iteratee.scala:517)
at play.api.libs.iteratee.Iteratee$$anonfun$flatMapM$1.apply(Iteratee.scala:517)
at play.api.libs.iteratee.Iteratee$$anonfun$flatMap$1$$anonfun$apply$13.apply(Iteratee.scala:493)
at play.api.libs.iteratee.Iteratee$$anonfun$flatMap$1$$anonfun$apply$13.apply(Iteratee.scala:493)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:42)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.io.IOException: [same "500 Internal Server Error" HTML body, with the identical Erlang error term, as quoted above]
at com.basho.riak.client.raw.http.ConversionUtil.convert(ConversionUtil.java:589)
at com.basho.riak.client.raw.http.HTTPClientAdapter.mapReduce(HTTPClientAdapter.java:386)
at com.basho.riak.client.query.MapReduce.execute(MapReduce.java:79)
... 36 more
The stack trace is telling us that there was a case clause exception at line 180 of the file riak_kv_wm_mapred.erl.
The clause at that line handles the responses from the pipe processing the map phase; the pipe appears to be returning the error preflist_exhausted, which the case statement does not explicitly handle.
That error usually indicates that one or more vnodes were overloaded or otherwise unavailable, and fallbacks had not yet started to take over their workload.
The affected partition was 913438523331814323877303020447676887284957839360; the console.log and error.log may have further details about what happened.

Titan graph DB, Java exception during a query using external index ( Elasticsearch )

I have a tiny graph DB: 400k nodes and 150k edges.
I was following the "Type Definition Overview" and "Indexing Backend Overview" directions to create keys and external indexes and to query them. I've done:
g.makeKey('State').dataType(String.class).indexed('dev-titan', Vertex.class).make();
'dev-titan' is the Elasticsearch index name.
And I can find the state values in Elasticsearch, in the titan index, under a field named "4O".
When I run this query, after 20 minutes or more I get this:
rexster[groovy]> g=rexster.getGraph('graph')
==>titangraph[cassandra:xx.xx.x.xxx]
rexster[groovy]> g.query().has("State",EQUAL,"TN").vertices()
Mar 13, 2014 5:10:58 PM org.glassfish.grizzly.filterchain.DefaultFilterChain execute
WARNING: Exception during FilterChain execution
java.lang.ClassCastException: com.tinkerpop.rexster.protocol.msg.ErrorResponseMessage cannot be cast to org.glassfish.grizzly.asyncqueue.WritableMessage
at org.glassfish.grizzly.nio.transport.TCPNIOTransportFilter.handleWrite(TCPNIOTransportFilter.java:111)
at org.glassfish.grizzly.filterchain.TransportFilter.handleWrite(TransportFilter.java:191)
at org.glassfish.grizzly.filterchain.ExecutorResolver$8.execute(ExecutorResolver.java:111)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeFilter(DefaultFilterChain.java:265)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeChainPart(DefaultFilterChain.java:200)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.execute(DefaultFilterChain.java:134)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.process(DefaultFilterChain.java:112)
at org.glassfish.grizzly.ProcessorExecutor.execute(ProcessorExecutor.java:78)
at org.glassfish.grizzly.filterchain.FilterChainContext.write(FilterChainContext.java:652)
at org.glassfish.grizzly.filterchain.FilterChainContext.write(FilterChainContext.java:533)
at com.tinkerpop.rexster.client.RexProClientFilter.handleRead(RexProClientFilter.java:155)
at org.glassfish.grizzly.filterchain.ExecutorResolver$9.execute(ExecutorResolver.java:119)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeFilter(DefaultFilterChain.java:265)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeChainPart(DefaultFilterChain.java:200)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.execute(DefaultFilterChain.java:134)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.process(DefaultFilterChain.java:112)
at org.glassfish.grizzly.ProcessorExecutor.execute(ProcessorExecutor.java:78)
at org.glassfish.grizzly.nio.transport.TCPNIOTransport.fireIOEvent(TCPNIOTransport.java:815)
at org.glassfish.grizzly.strategies.AbstractIOStrategy.fireIOEvent(AbstractIOStrategy.java:112)
at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy.run0(WorkerThreadIOStrategy.java:115)
at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy.access$100(WorkerThreadIOStrategy.java:55)
at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy$WorkerThreadRunnable.run(WorkerThreadIOStrategy.java:135)
at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:567)
at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.run(AbstractThreadPool.java:547)
at java.lang.Thread.run(Thread.java:744)
Standard indexes work fine.
What can I be doing wrong?
I appreciate any help.
