External Table for Parquet or Avro data format throws error - azure-data-explorer

I created an external table to fetch Parquet data from ADLS, which throws the error below.
Query execution has resulted in error (0x80131500): Partial query failure: 0x80131500 (message: 'Input parquet file is ill-formed and cannot be processed: 'not a Parquet file (size too small)'.: ', details: 'Source: Kusto.Common.Svc [0]Kusto.Common.Svc.Exceptions.IngestionSourceParquetReaderException: Input parquet file is ill-formed and cannot be processed: 'not a Parquet file (size too small)'. Timestamp=2020-05-07T11:22:42.0340199Z
Folder structure in ADLS:
logs / {AppId}/ 2020 / 05 / 07
External table definition:
.create external table ExTParquet (AppId:string,UserId:string,Email:string,RoleName:string,Operation:string,EntityId:string,EntityType:string,EntityName:string,TargetTitle:string,Params:string,EventProcessedUtcTime:datetime,PartitionId:string,EventEnqueuedUtcTime:datetime)
kind=blob
partition by
AppId,
bin(EventProcessedUtcTime,1d)
dataformat=parquet
(
h@'https://streamoutalds2.blob.core.windows.net/stream-api-raw-parquet/logs;secret_key'
)
with
(
folder = "ExternalTables"
)
Note: if I provide the full file path and remove the source directory partitioning from the external table definition, it works well.
But I need to read data from all the files within the directory, not just one.
Any help is much appreciated.

This is a known issue that is being worked on. You can open a support ticket (Azure Data Explorer). The team will also post an update here when the issue is resolved.
[EDIT] The issue should now be resolved.
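For reference, once the fix is rolled out, the same table can also be declared with the later documented partition/pathformat syntax. The following is only a sketch, assuming the logs/{AppId}/yyyy/MM/dd layout from the question; the connection string and secret are the unchanged placeholders from above, and the exact syntax should be checked against the current Kusto documentation:
.create external table ExTParquet (AppId:string, UserId:string, Email:string, RoleName:string, Operation:string, EntityId:string, EntityType:string, EntityName:string, TargetTitle:string, Params:string, EventProcessedUtcTime:datetime, PartitionId:string, EventEnqueuedUtcTime:datetime)
kind=storage
partition by (AppId:string, Date:datetime = bin(EventProcessedUtcTime, 1d))
pathformat = (AppId "/" datetime_pattern("yyyy/MM/dd", Date))
dataformat=parquet
(
    h@'https://streamoutalds2.blob.core.windows.net/stream-api-raw-parquet/logs;secret_key'
)
with
(
    folder = "ExternalTables"
)
Once created, all partitions can be queried together with external_table("ExTParquet") | take 10.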

Related

Liquibase SQL Changeset Cannot Load CSV File : FileNotFoundException

Using a SQL-style approach to Liquibase changesets (which is our code style; we don't use XML), I am trying to load a CSV file using the following SQL changeset:
-- changeset user:insert-prices-data-temp-table
LOAD DATA LOCAL INFILE 'foo/src/main/resources/liquibase/changelogs/2021/prices.csv'
INTO TABLE prices_temp
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
When deploying the WAR file to WildFly, the following exception is thrown:
java.io.FileNotFoundException: foo/src/resources/liquibase/changelogs/2021/prices.csv (No such file or directory)
    at java.io.FileInputStream.open0(Native Method)
    at java.io.FileInputStream.open(FileInputStream.java:195)
    at java.io.FileInputStream.<init>(FileInputStream.java:138)
    at java.io.FileInputStream.<init>(FileInputStream.java:93)
    at com.mysql.jdbc.MysqlIO.sendFileToServer(MysqlIO.java:3772)
The master changelog file references the 2021 directory, and the SQL and CSV files exist in the same directory:
<includeAll path="liquibase/changelogs/2021/" filter="xml" errorIfMissingOrEmpty="true" />
I have tried the following other paths, but they all still yield a FileNotFoundException:
prices.csv
liquibase/changelogs/2021/prices.csv
WEB-INF/classes/liquibase/changelogs/2021/CCC-220-marking-time-prices.csv
(Absolute path) /home/xxxx/Work/foo-service/foo/src/main/resources/liquibase/changelogs/2021/prices.csv
Liquibase version: 3.5.1
WildFly (JBoss) version: 21
I have checked that the CSV file is present in the WAR file.
Any ideas on how to fix this?
The problem is that you are trying to use the MySQL statement LOAD DATA LOCAL INFILE, which knows nothing about your classpath. It looks at your filesystem, and that path doesn't exist there. Even if you provide something like yourapp.jar!liquibase/changelogs/2021/prices.csv, it won't be able to read the file. You will need to pull prices.csv out of your application onto the filesystem and point the MySQL statement at that location.
Or you can use Liquibase's loadData change type, if that helps; see the sketch below.
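For example, a minimal loadData changeset might look like the following. Note this is an assumption on my part: loadData is not available in the SQL changelog format, so it would need to live in an XML (or YAML/JSON) changelog, and the id/author are just taken from your SQL comment:
<changeSet id="insert-prices-data-temp-table" author="user">
    <!-- CSV is read via Liquibase's resource accessor, so a classpath path works inside the WAR -->
    <loadData tableName="prices_temp"
              file="liquibase/changelogs/2021/prices.csv"
              separator=","
              quotchar="&quot;"/>
</changeSet>
Because the file is opened through Liquibase's resource accessor (classpath/search path) rather than the raw filesystem, a path like the one above, or relativeToChangelogFile="true" with just prices.csv, can be read from inside the WAR.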

What is the filepath that a "Read CSV" operator needs to read a file from RapidMiner Server?

I have a RM Server running on a VM (Ubuntu) on top of my Win10 machine.
I have a process to read a .csv file and write its contents on a MySQL database on a MySQL Server which also runs on the same VM.
The problem is that the read file operator does not seem to be able to find the file.
Scenario 1: when I use ../data/myFile.csv as the location name in the Read CSV operator and run the process on the Server, I get:
Failed to execute initialization process: Error executing process /apps/myApp/process/task_read_csv_to_db: The file 'java.io.FileNotFoundException: /root/../data/myFile.csv (No such file or directory)' does not exist.
Scenario 2: when I use /apps/myApp/data/myFile.csv as the location name in the Read CSV operator and run the process on the Server, I get:
Failed to execute initialization process: Error executing process /apps/myApp/process/task_read_csv_to_db: The file 'java.io.FileNotFoundException: /apps/myApp/data/myFile.csv (No such file or directory)' does not exist.
What is the right filepath that I should give to the Read CSV operator?
Just to update with the answer: after David's suggestion, I ended up storing the .csv file outside of /rapidminer-server-home/data/repository, since every remote repository seems to be represented by an integer instead of its original name, which makes the actual full path of the file unusable.
I would say the issue is that the relative path may vary depending on the location of the Job Agent that is executing your process.
Is /apps/myApp/data/myFile.csv the correct path to the file? If not, I would suggest using the absolute path to the file. Hope this helps.
Best,
David

SQLite import CSV error: CREATE TABLE data;(...) failed: near ";": syntax error

Brand new to SQLite, running on a Mac. I'm trying to import a CSV file from the SQLite tutorial:
http://www.sqlitetutorial.net/sqlite-import-csv/
The 'cities' data I'm trying to import for the tutorial is here:
http://www.sqlitetutorial.net/wp-content/uploads/2016/05/city.csv
I try to run the following code from Terminal to import the data into a database named 'data' and get the following error:
sqlite3
.mode csv
.import cities.csv data;
CREATE TABLE data;(...) failed: near ";": syntax error
A possible explanation may be the way I'm downloading the data: I copied the data from the webpage into TextWrangler and saved it as a .txt file, then manually changed the extension to .csv. This doesn't seem very elegant, but that was the advice I found online for creating the .csv file: https://discussions.apple.com/thread/7857007
If this is the issue, how can I resolve it? If not, where am I going wrong?
Another potentially useful point: when I executed the code yesterday there was no problem; it created a database with the data. However, running the same code today produces the error.
sqlite3 dot commands such as .import are not SQL and don't take a semicolon at the end. Replace
.import cities.csv data;
with
.import cities.csv data
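A complete session might then look like the sketch below (assuming you want the database persisted to a file named cities.db; running sqlite3 with no filename only gives you a transient in-memory database):
sqlite3 cities.db
.mode csv
.import cities.csv data
.schema data
SELECT COUNT(*) FROM data;
The last two commands just verify that the table was created and the rows were loaded.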

csync/sqlite error when running ownCloud command

I am running owncloudcmd to sync files from a local* path to an ownCloud/Nextcloud server, all running Debian 8. However, it fails with the error:
[5] csync_statedb_query sqlite3_compile error: disk I/O error - on query PRAGMA quick_check;
[6] csync_statedb_load ERR: sqlite3 integrity check failed - bail out: disk I/O error.
#### ERROR during csync_update : "CSync failed to load the journal file. The journal file is corrupted."
I am not very familiar with csync or sqlite, so I am a bit in the dark, and although I can find talk of this issue through googling, I can't find a fix. The data in this case can be dumped to start over, so I'm happy to flush any database or anything else. I've tried removing the created csync and journal files, assuming one of them was corrupted, but it doesn't seem to change anything.
I have read about changing PRAGMA settings to ignore the error (or the check), but I can't see how to implement that either.
Is anyone able to show me how to clear out the corruption?
*The local path is a mounted AWS S3 bucket, but I think this is irrelevant because it works fine on other systems.

fast export unexplained failure

I have roughly 14 million records that I am attempting to export from a Teradata table to a file using a FastExport connection object.
There is no size limit for fast export files on our Linux system, and there is 1.2 TB of available space in the target directory.
The session fails, and gives the following errors:
READER_2_1_1 FEXP_87011 Process [16022] exited with status [12]
SDKS_38200 Partition-level [SOURCE_TABLE_NAME]: Plug-in #305400 failed in deinit()
I googled the error message and found this post:
Here
I followed the recommendations in the post: delete the .out file in the temp directory, delete the partially written files in the target directory, drop the error table, and delete the log file. This did not fix the issue, and the session still fails with the same error messages.
Try to use the TPT Export plug-in instead. You can also try executing this FastExport using scripts directly in your Unix environment; see the standalone sketch below.
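If you do try the standalone route, a rough sketch of a FastExport script (normally run with the fexp utility; every database, table, path, and logon value below is a placeholder) could look like:
.LOGTABLE mydb.fexp_log;
.LOGON tdpid/myuser,mypassword;
.BEGIN EXPORT SESSIONS 8;
.EXPORT OUTFILE /data/export/source_table.out
    MODE RECORD FORMAT TEXT;
SELECT * FROM mydb.source_table;
.END EXPORT;
.LOGOFF;
Run it with something like fexp < export_script.fx | tee export_script.log so the utility output is captured for troubleshooting.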
