Azure Databricks Auto Loader Spark streaming unable to read input file

I have set up a streaming job using the Auto Loader feature; the input is located in Azure ADLS Gen2 in Parquet format. Below is the code.
df = spark.readStream.format("cloudFiles") \
    .options(**cloudfile) \
    .schema(schema) \
    .load(staging_path)
df.writeStream \
    .trigger(processingTime="10 minutes") \
    .outputMode("append") \
    .option("checkpointLocation", checkpoint_path) \
    .foreachBatch(writeBatchToADXandDelta) \
    .start()
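For reference, the cloudfile variable holds the Auto Loader options and is defined roughly like this (the exact values shown here are illustrative):
cloudfile = {
    "cloudFiles.format": "parquet",         # source files are Parquet
    "cloudFiles.useNotifications": "false", # directory-listing mode
}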
This code throws the following error:
py4j.Py4JException: An exception was raised by the Python Proxy. Return Message: Traceback (most recent call last):
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 5 in stage 11.0 failed 4 times, most recent failure: Lost task 5.3 in stage 11.0 (TID 115) (172.20.58.133 executor 1): com.databricks.sql.io.FileReadException: Error while reading file /mnt/adl2/environment=production/data_lake=main/tier=ingress/area=transient/domain=iotdata/entity=screens/topic=sensor/vendor=abc/source_system=iot_hub/parent=external/dataset=screens/kind=data/evolution=2/file_format=parquet/source=kevitsa/ingestion_date=2022/08/03/13/-136567710_c96a862c2aaf43cfbd62025cd3db4a48_1.parquet.
..
Caused by: java.lang.AssertionError: assertion failed
at scala.Predef$.assert(Predef.scala:208)
at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$2.apply(ParquetFileFormat.scala:397)
at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$2.apply(ParquetFileFormat.scala:373)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1$$anon$2.getNext(FileScanRDD.scala:333)
... 18 more
What could be the reason for this?
Thanks in advance!!

From the error message it looks like you have a broken file in your location. You can use the ignoreCorruptFiles option (doc) to skip broken files instead of failing the stream.
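A minimal sketch, reusing the reader from the question; the same behaviour can also be enabled session-wide with spark.conf.set("spark.sql.files.ignoreCorruptFiles", "true"):
df = (spark.readStream.format("cloudFiles")
      .options(**cloudfile)
      # skip unreadable/corrupt Parquet files instead of failing the whole stream
      .option("ignoreCorruptFiles", "true")
      .schema(schema)
      .load(staging_path))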

Related

SFTP On New or Update throws error Pipe closed

I'm using the "On New or Update" source of the Mule 4 SFTP Connector to process files from an SFTP server directory. The process works fine; however, while reading the last file the SFTP connector throws the error shown below, and the file remains in the directory waiting for the next scheduled run to be picked up. The same thing happens for the last file of each new set of files.
Any thoughts on how to fix this issue?
ERROR:
11:20:45.315 05/04/2022 Worker-0 [MuleRuntime].uber.27: [sftp-demo-app].prcsACKFiles-Error-SuccessFlow.CPU_INTENSIVE #1648077b ERROR
event:c458bc90-cbbd-11ec-85e2-06a565d43154
********************************************************************************
Message : "org.mule.weave.v2.module.reader.ReaderParsingException: org.mule.runtime.api.exception.MuleRuntimeException - Exception was found trying to retrieve the contents of file /home/messages/file_8ddb7674.json
org.mule.runtime.api.exception.MuleRuntimeException: Exception was found trying to retrieve the contents of file /home/messages/file_8ddb7674.json
at org.mule.extension.sftp.internal.connection.SftpClient.exception(SftpClient.java:427)
at org.mule.extension.sftp.internal.connection.SftpClient.exception(SftpClient.java:423)
at org.mule.extension.sftp.internal.connection.SftpClient.getFileContent(SftpClient.java:349)
at org.mule.extension.sftp.internal.connection.SftpFileSystem.retrieveFileContent(SftpFileSystem.java:117)
at org.mule.extension.sftp.internal.SftpInputStream$SftpFileInputStreamSupplier.getContentInputStream(SftpInputStream.java:111)
at org.mule.extension.sftp.internal.SftpInputStream$SftpFileInputStreamSupplier.getContentInputStream(SftpInputStream.java:93)
at org.mule.extension.file.common.api.AbstractConnectedFileInputStreamSupplier.getContentInputStream(AbstractConnectedFileInputStreamSupplier.java:81)
at org.mule.extension.file.common.api.AbstractFileInputStreamSupplier.get(AbstractFileInputStreamSupplier.java:65)
at org.mule.extension.file.common.api.AbstractFileInputStreamSupplier.get(AbstractFileInputStreamSupplier.java:33)
at org.mule.extension.file.common.api.stream.LazyStreamSupplier.lambda$new$1(LazyStreamSupplier.java:29)
at org.mule.extension.file.common.api.stream.LazyStreamSupplier.get(LazyStreamSupplier.java:42)
at org.mule.extension.file.common.api.stream.AbstractNonFinalizableFileInputStream.lambda$createLazyStream$0(AbstractNonFinalizableFileInputStream.java:48)
at $java.io.InputStream$$EnhancerByCGLIB$$55e4687e.read(<generated>)
at org.apache.commons.io.input.ProxyInputStream.read(ProxyInputStream.java:102)
at org.mule.runtime.core.internal.streaming.bytes.AbstractInputStreamBuffer.consumeStream(AbstractInputStreamBuffer.java:111)
at com.mulesoft.mule.runtime.core.internal.streaming.bytes.FileStoreInputStreamBuffer.consumeForwardData(FileStoreInputStreamBuffer.java:239)
at com.mulesoft.mule.runtime.core.internal.streaming.bytes.FileStoreInputStreamBuffer.consumeForwardData(FileStoreInputStreamBuffer.java:202)
at com.mulesoft.mule.runtime.core.internal.streaming.bytes.FileStoreInputStreamBuffer.doGet(FileStoreInputStreamBuffer.java:125)
at org.mule.runtime.core.internal.streaming.bytes.AbstractInputStreamBuffer.get(AbstractInputStreamBuffer.java:93)
at org.mule.runtime.core.internal.streaming.bytes.BufferedCursorStream.assureDataInLocalBuffer(BufferedCursorStream.java:126)
at org.mule.runtime.core.internal.streaming.bytes.BufferedCursorStream.doRead(BufferedCursorStream.java:101)
at org.mule.runtime.core.internal.streaming.bytes.AbstractCursorStream.read(AbstractCursorStream.java:124)
at org.mule.runtime.core.internal.streaming.bytes.BufferedCursorStream.read(BufferedCursorStream.java:26)
at java.io.InputStream.read(InputStream.java:101)
at org.mule.runtime.core.internal.streaming.bytes.ManagedCursorStreamDecorator.read(ManagedCursorStreamDecorator.java:96)
at org.mule.weave.v2.el.SeekableCursorStream.read(MuleTypedValue.scala:306)
at org.mule.weave.v2.module.reader.UTF8StreamSourceReader.handleBOM(SeekableStreamSourceReader.scala:179)
at org.mule.weave.v2.module.reader.UTF8StreamSourceReader.readAscii(SeekableStreamSourceReader.scala:163)
at org.mule.weave.v2.module.json.reader.JsonTokenizer.$init$(JsonTokenizer.scala:21)
at org.mule.weave.v2.module.json.reader.indexed.IndexedJsonTokenizer.<init>(IndexedJsonTokenizer.scala:15)
at org.mule.weave.v2.module.json.reader.indexed.IndexedJsonParser.parser(IndexedJsonParser.scala:17)
at org.mule.weave.v2.module.json.reader.JsonReader.readValue(JsonReader.scala:40)
at org.mule.weave.v2.module.json.reader.JsonReader.doRead(JsonReader.scala:30)
at org.mule.weave.v2.module.reader.Reader.read(Reader.scala:35)
at org.mule.weave.v2.module.reader.Reader.read$(Reader.scala:33)
at org.mule.weave.v2.module.json.reader.JsonReader.read(JsonReader.scala:20)
at org.mule.weave.v2.el.MuleTypedValue.value(MuleTypedValue.scala:147)
at org.mule.weave.v2.model.values.wrappers.DelegateValue.valueType(DelegateValue.scala:17)
at org.mule.weave.v2.model.values.wrappers.DelegateValue.valueType$(DelegateValue.scala:16)
at org.mule.weave.v2.el.MuleTypedValue.valueType(MuleTypedValue.scala:177)
at org.mule.weave.v2.model.types.ObjectType$.accepts(Type.scala:1068)
Caused by: org.mule.extension.sftp.api.SftpConnectionException: Error occurred while trying to connect to host
... 112 more
Caused by: org.mule.runtime.api.connection.ConnectionException:
at org.mule.extension.sftp.api.SftpConnectionException.<init>(SftpConnectionException.java:38)
... 112 more
Caused by: org.mule.runtime.api.connection.ConnectionException:
... 112 more
Caused by: 4:
at com.jcraft.jsch.ChannelSftp.get(ChannelSftp.java:1540)
at com.jcraft.jsch.ChannelSftp.get(ChannelSftp.java:1290)
at org.mule.extension.sftp.internal.connection.SftpClient.getFileContent(SftpClient.java:347)
... 110 more
Caused by: java.io.IOException: Pipe closed
The error Pipe closed in SFTP indicates a communication error that the SFTP connector cannot resolve, so the operation fails. I don't believe there is anything you can do about that. You might try a newer version of the connector if you are using an older one, just in case.

Why am I getting a connection reset error in Sqoop?

I am using Sqoop 1.4.6v and hadoop-2.7.1v.
I am importing data from Oracle DB and using ojdbc6.jar.
It works fine, but sometimes I get the following error:
19/03/15 16:27:23 INFO mapreduce.Job: Task Id : attempt_1552649108375_0013_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: java.lang.RuntimeException: java.sql.SQLRecoverableException: IO Error: Connection reset
How do I resolve this issue?
Any help regarding this would be appreciated.
I found something for you; let me know if it helps.
This problem occurs primarily due to the lack of a fast random number generation device on the hosts where the map tasks execute.
Please refer to the Sqoop user guide for a detailed explanation:
https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_oracle_connection_reset_errors

Failed to apply plugin [id 'net.corda.plugins.cordformation']

I tried to migrate yo-cordapp from version 2.0 to 3.0 but got this error.
FAILURE: Build failed with an exception.
Where:
Build file '/home/atul/Documents/mg/IdeaProjects/yo-cordapp/build.gradle' line: 36
What went wrong:
A problem occurred evaluating root project 'yo'.
Failed to apply plugin [id 'net.corda.plugins.cordformation']
Could not create plugin of type 'Cordformation'.
Could not initialize class net.corda.plugins.Cordformation
Try:
Run with --stacktrace option to get the stack trace. Run with --debug option to get more log output.
BUILD FAILED
Total time: 0.513 secs
Stopped 0 worker daemon(s).
Received result Failure[value=org.gradle.initialization.ReportedException: org.gradle.internal.exceptions.LocationAwareException: Build file '/home/atul/Documents/mg/IdeaProjects/yo-cordapp/build.gradle' line: 36
A problem occurred evaluating root project 'yo'.] from daemon DaemonInfo{pid=1439, address=[1bb69a7c-e166-4da4-be23-025402c62d96 port:36544, addresses:[/0:0:0:0:0:0:0:1, /127.0.0.1]], state=Idle, lastBusy=1527564107106, context=DefaultDaemonContext[uid=dbe9d9f3-b86b-448f-8d35-648c4aad50fd,javaHome=/usr/lib/jvm/java-8-oracle,daemonRegistryDir=/root/.gradle/daemon,pid=1439,idleTimeout=10800000,daemonOpts=-XX:MaxPermSize=256m,-XX:+HeapDumpOnOutOfMemoryError,-Xmx1024m,-Dfile.encoding=UTF-8,-Duser.country=IN,-Duser.language=en,-Duser.variant]} (build should be done).
Why am I getting this error?
[PS: I know this migration has already been done, but I am getting this error when I tried it.]
You need to apply the new cordapp plugin. See https://github.com/corda/cordapp-example/blob/release-V3/kotlin-source/build.gradle#L11.

Using R script with Spark

I want to run my existing R script from Spark.
I have set up R and Spark on my machine and am trying to execute the code, but I am getting an exception that is not very helpful.
Spark Code-
String file = "/home/MSA2.R";
SparkConf sparkConf = new SparkConf().setAppName("First App")
        .setMaster("local[1]");
@SuppressWarnings("resource")
JavaSparkContext sparkContext = new JavaSparkContext(sparkConf);
JavaRDD<String> rdd = sparkContext.textFile("/home/test.csv")
        .pipe(file);
R code -
f1 <- read.csv("/home/testing.csv")
Exception -
Exception in thread "main" org.apache.spark.SparkException: Job
aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most
recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost):
java.lang.IllegalStateException: Subprocess exited with status 2.
Command ran: /home/MSA2.R
java.util.NoSuchElementException: key not found: 1
org.apache.spark.rpc.RpcTimeoutException: Cannot receive any reply in 120 seconds. This timeout is controlled by spark.rpc.askTimeout at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
There is not much in exception to debug the issue.
Can anyone suggest whether the approach is correct? If yes, can anyone help with the issue? If not, please suggest an alternative approach.
Note: I don't want to use SparkR.
Reference for the above code: https://www.linkedin.com/pulse/executing-existing-r-scripts-from-spark-rutger-de-graaf
The actual error is:
java.lang.IllegalStateException: Subprocess exited with status 2.
Command ran: /home/MSA2.R
Make sure MSA2.R exists at the given location and on the same cluster where you are running Spark jobs.
Generally, exit status 2 occurs when the script is not able to access the device.
I have fixed the issue. I have added
#!/usr/bin/Rscript
on the first line of the RScript and it worked.

WebSphere 8.5: why does AdminApp return "exception information: websphere.management.application.client.AppDeploymentException"?

I have a very simple Jython script on Unix. It was working perfectly under WebSphere 7 and now, after we upgraded to WAS 8.5, it isn't working anymore. Obviously, I changed the path to point to WAS 8.5. I spent the whole day struggling to find the reason for this failure and I am completely stuck. The exception description doesn't help much.
From a JCL JOB I call the Jython script.
/WebSphere/was85/dtl85cel/ledm85nd/DeploymentManager/profiles/default/bin/wsadmin.sh -lang jython -f /WebSphereDevelopment/scripts/dtl/WAS85/Install.jy
The Jython script is really simple.
Basically, I call AdminApp.install("myEAR path", ...with the options below:
-nopreCompileJSPs -installed.ear.destination /WebSphereDevelopment/MYAPP/dtl/curr/deployment/ -distributeApp -nouseMetaDataFromBinary -nodeployejb -appname DVL-MYAPP -createMBeansForResources -noreloadEnabled -nodeployws -validateinstall warn -processEmbeddedConfig -filepermission .*.dll=755#.*.so=755#.*.a=755#.*.sl=755 -noallowDispatchRemoteInclude -noallowServiceRemoteInclude -asyncRequestDispatchType DISABLED -nouseAutoLink -contextroot / -MapModulesToServers [[ MyApp MyApp.war,WEB-INF/web.xml WebSphere:cell=dtl85cel,node=wleMyAppa,server=WLEMYAPP]]
)
The error log is:
WASX7017E: Exception received while running file "/WebSphereDevelopment/scripts/dtl/MYAPP/MYAPP_DTL_DEPLOY.jy"; exception information: com.ibm.websphere.management.application.client.AppDeploymentException: com.ibm.websphere.management.application.client.AppDeploymentException: [Root exception is java.lang.RuntimeException: Deploying /WebSphere/was85/dtl85cel/ledm85nd/DeploymentManager/profiles/d java.lang.RuntimeException: java.lang.RuntimeException: Deploying /WebSphere/was85/dtl85cel/ledm85nd/DeploymentManager/profiles/default/temp/app69105293327198772690.ear failed.
Turn on tracing under wsadmin.properties:
com.ibm.ws.scripting.traceString
