Has anyone used a custom number of mappers in a Sqoop export from a Hive table to a SQL database before?
I used the following Sqoop command with 33 mappers to perform the export:
sqoop export -Dmapred.job.queue.name=projectname -Dsqoop.export.records.per.statement=1000 --connect "jdbc:sqlserver://svrname;database=dbname" --username 'usrname' --password 'pwd' --hcatalog-database hive_schema_name --hcatalog-table hive_obj_name --table 'SQL_DB_OBJ_NAME' -- --schema SQL_DB_SCHEMA_NAME --fields-terminated-by $'\x01' -m 33 -batch
However, the Application Master shows only 4 mappers in use, and the job takes a long time to complete because of the huge amount of data. Can anyone confirm whether a custom mapper count can be used in a Sqoop export?
Sqoop export does support the number-of-mappers argument, but it is ignored in your command: everything after -- is passed to the connector as extra arguments, so -m 33 (and the options next to it) never reaches Sqoop. You have to move -- --schema <schema-name> to the end of the command, since the Sqoop CLI has the following structure:
sqoop TOOL PROPERTY_ARGS SQOOP_ARGS [-- EXTRA_ARGS]
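For example, reordering the command from the question so that every Sqoop argument (including --fields-terminated-by, -m 33 and --batch) comes before the -- separator, and only the connector-specific --schema follows it:
sqoop export -Dmapred.job.queue.name=projectname -Dsqoop.export.records.per.statement=1000 --connect "jdbc:sqlserver://svrname;database=dbname" --username 'usrname' --password 'pwd' --hcatalog-database hive_schema_name --hcatalog-table hive_obj_name --table 'SQL_DB_OBJ_NAME' --fields-terminated-by $'\x01' -m 33 --batch -- --schema SQL_DB_SCHEMA_NAME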
I am trying to create a self-contained command for my build pipeline which inserts data and quits.
So far I have created my data files
things-to-import-001.sql, 002, etc., which contain all the INSERT statements I'd like to run, with one file per table.
I have created a command file to run them
-- import-all.sql
.read ./things-to-import-001.sql
.read ./things-to-import-002.sql
.quit
However when I run my command
sqlite3 -init ./import-all.sql ./database.sqlite
...the data is inserted, but the program keeps running and shows the sqlite> prompt, despite the .quit command. I have also tried using .exit 0.
From the sqlite3 --help
-init FILENAME read/process named file
Docs: https://www.sqlite.org/cli.html#reading_sql_from_a_file
How can I tell sqlite to exit once my inserts have finished?
I have managed to find a dirty workaround for this issue.
I have updated my import file to include a bad command, and executed it using -bail to quit on the first error.
-- import-all.sql
.read ./things-to-import-001.sql
.read ./things-to-import-002.sql
.fakeErrorToQuitWithBail
Then you can execute with
sqlite3 -bail -init ./import-all.sql ./database.sqlite
and it should quit with
Error: unknown command or invalid arguments: "fakeErrorToQuitWithBail". Enter ".help" for help
Try using ".exit" at the place of ".quit". For some reason SQLite dont doccumented this commands.
https://www.tutorialspoint.com/sqlite/sqlite_commands.htm
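Another option, assuming the same file and database names as in the question, is to avoid -init entirely: when sqlite3 reads the script from standard input (or as a command argument) it runs non-interactively and exits once the input is exhausted, so no fake command or .quit is needed.
# run the import script on stdin; sqlite3 exits at end of input
sqlite3 ./database.sqlite < ./import-all.sql
# or run the wrapper script as a one-off command
sqlite3 ./database.sqlite ".read ./import-all.sql"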
I need to run DAGs in parallel but do not need significant scaling, so LocalExecutor can do the job just fine. I looked through the Airflow docs and first created a MySQL database:
CREATE DATABASE airflow_db CHARACTER SET utf8;
CREATE USER <user> IDENTIFIED BY <pass>;
GRANT ALL PRIVILEGES ON airflow_db.* TO <user>;
I then modified the following parameters in the airflow.cfg file:
executor = LocalExecutor
sql_alchemy_conn = mysql+mysqlconnector://<user>:<pass>@localhost:3306/airflow_db
When I run airflow db init, I run into the following error message:
AttributeError: 'MySQLConverter' object has no attribute '_dagruntype_to_mysql'
During handling of the above exception, another exception occurred:
TypeError: Python 'dagruntype' cannot be converted to a MySQL type
Please note that nothing else in the airflow.cfg file was altered, and that using the default SequentialExecutor with SQLite lets everything run just fine. Also note that I am using Airflow version 2.2.0.
I found the solution to my own question. Instead of using the mysqlconnector, I used the pymysql driver:
pip install PyMySQL
The airflow.cfg parameters can then be adjusted as follows:
sql_alchemy_conn = mysql+pymysql://<user>:<pass>@localhost:3306/airflow_db
All else can stay the same.
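For reference, a minimal sketch of the combined setup, with <user> and <pass> as placeholders (on this Airflow version both settings live in the [core] section of airflow.cfg):
# install the MySQL driver
pip install PyMySQL
# airflow.cfg
[core]
executor = LocalExecutor
sql_alchemy_conn = mysql+pymysql://<user>:<pass>@localhost:3306/airflow_db
# then re-initialise the metadata database
airflow db init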
I am a newbie to Sqoop 1.4.5. I have gone through the Sqoop documentation and have successfully imported/exported records with simple datatypes to and from HDFS.
Next I tried LOB data, for example CLOB.
I have a simple table with a CLOB column; its create statement is as follows:
CREATE TABLE "SCOTT"."LARGEDATA" ("ID" VARCHAR2(20 BYTE), "IMG" CLOB ) SEGMENT CREATION DEFERRED PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING TABLESPACE "USERS" LOB ("IMG") STORE AS BASICFILE (TABLESPACE "USERS" ENABLE STORAGE IN ROW CHUNK 8192 RETENTION NOCACHE LOGGING );
I can successfully import the data to HDFS:
sqoop import --connect jdbc:oracle:thin:@:1522: --username --password --table 'LARGEDATA' -m 1 --target-dir /home/mydata/tej/LARGEDATA2 --fields-terminated-by , --escaped-by \\ --enclosed-by '\"'
But when I tried to export this data back to Oracle using the following command:
sqoop export --connect jdbc:oracle:thin:@:1522: --username --password --table 'LARGEDATA' -m 1 --export-dir /home/mydata/tej/LARGEDATA2 --fields-terminated-by , --escaped-by \\ --enclosed-by '\"'
I got the following exception:
java.lang.CloneNotSupportedException: com.cloudera.sqoop.lib.ClobRef at java.lang.Object.clone(Native Method)
java.io.IOException: Could not buffer record at org.apache.sqoop.mapreduce.AsyncSqlRecordWriter.write(AsyncSqlRecordWriter.java:218)
and the error mentioned in this link: https://stackoverflow.com/questions/30778340/sqoop-export-4000-characters-column-data-into-oracle-clob
I googled it and found the following links, which mention that Sqoop does not support export of BLOB and CLOB data. Some of them are posts from July 2015, and a JIRA issue shows it is still open. The forum links are as follows:
https://issues.apache.org/jira/browse/SQOOP-991
Can sqoop export blob type from HDFS to Mysql?
http://sofb.developer-works.com/article/19310921/Can+sqoop+export+blob+type+from+HDFS+to+Mysql%3F
http://grokbase.com/t/sqoop/user/148te4tghg/sqoop-import-export-clob-datatype
Exporting sequence file to Oracle by Sqoop
Can anyone please let me know whether Sqoop supports export of LOB data? If yes, please guide me on how I can do this.
Try creating a staging table in Oracle and use --staging-table with --clear-staging-table. Keep the staging table column as VARCHAR2(10000).
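A rough sketch of that suggestion, reusing the table and the (redacted) connection details from the question; LARGEDATA_STG is a hypothetical staging table name:
-- staging table mirroring LARGEDATA, with the LOB column held as VARCHAR2(10000)
CREATE TABLE "SCOTT"."LARGEDATA_STG" ("ID" VARCHAR2(20 BYTE), "IMG" VARCHAR2(10000));
sqoop export --connect jdbc:oracle:thin:@:1522: --username --password --table 'LARGEDATA' --staging-table 'LARGEDATA_STG' --clear-staging-table -m 1 --export-dir /home/mydata/tej/LARGEDATA2 --fields-terminated-by , --escaped-by \\ --enclosed-by '\"'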
I am trying to import an Oracle dump into Oracle 11g XE using the command below:
imp system/manager@localhost file=/home/madhu/test_data/oracle/schema_only.sql full=y
I am getting the following:
IMP-00037: Character set marker unknown
IMP-00000: Import terminated unsuccessfully
Can anyone please help me?
You received the IMP-00037 error because of a problem with the export file: I'd suspect either your dump file is corrupted or it was not created by the exp utility (the .sql extension in your command suggests the latter).
If the issue occurred because of a corrupted dump file, there is no choice other than obtaining an uncorrupted dump file. Use the impdp utility to import if you used the expdp utility to prepare the dump file.
The following links may be helpful for trying other options:
https://community.oracle.com/thread/870104?start=0&tstart=0
https://community.oracle.com/message/734478
If you are not sure which command (exp/expdp) was used, you can check the log file that was created during the dump export. It contains the exact command that was executed to prepare the dump file.
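If the dump turns out to be a Data Pump export, the import would look roughly like this (DATA_PUMP_DIR and the .dmp file name are assumptions; the directory object must point at the folder that holds the dump file):
impdp system/manager@localhost directory=DATA_PUMP_DIR dumpfile=schema_only.dmp full=y logfile=import_schema.log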
We are migrating from CDH3 to CDH4 and, as part of this migration, we are moving all the jobs we have on CDH3. We have noticed one critical issue: a workflow executed through Oozie runs a Python script, which internally invokes a Hive query (hive -e {query}). In this Hive query we add a custom jar using add jar {LOCAL PATH FOR JAR} and create a temporary function for the custom UDF. Everything looks fine up to this point, but when the query starts executing with the custom UDF function, it fails with a distributed cache FileNotFoundException: it looks for the jar at an HDFS path instead of the local path.
I am not sure if I am missing some configuration here.
Exception trace:
WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated.
Please use org.apache.hadoop.log.metrics.EventCounter in all the
log4j.properties files. Execution log at:
/tmp/yarn/yarn_20131107020505_79b41443-b9f4-4d36-a0eb-4f0d79cd3ce9.log
java.io.FileNotFoundException: File does not exist:
hdfs://aa.bb.com:8020/opt/nfsmount/mypath/custom.jar
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:824)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
..... .....
Any help on this is highly appreciated.
Regards,
GHK.
There are a few options. All the required jars should be in the classpath before you run the Hive query.
Option 1: add your custom jar via <file>/hdfs/path/to/your/jar</file> in the Oozie workflow action.
Option 2: use the --auxpath /local/path/to/your/jar option when calling Hive from your Python script, e.g. hive --auxpath /local/path/to/your.jar -e {query} (see the sketch below).
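A rough sketch of option 2 as it would run from inside the wrapper script; the UDF name, class, and query are placeholders, not from the original post:
# make the local jar visible to the Hive CLI via --auxpath, then register and use the UDF
hive --auxpath /local/path/to/your.jar -e "
  CREATE TEMPORARY FUNCTION my_udf AS 'com.example.MyUDF';
  SELECT my_udf(some_column) FROM some_table LIMIT 10;
"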