Flink HBase connector: write data to an HBase sink table: Unable to create a sink for writing table

I want to write data into an HBase sink table. I have HBase version 2.2.0, which is compatible with Flink version 1.14.4.
I defined the HBase sink table as follows:
sink_ddl = """
    CREATE TABLE hTable (
        datemin STRING,
        family2 ROW<datemax STRING>,
        family3 ROW<channel_title STRING, channel_id STRING>,
        PRIMARY KEY (datemin) NOT ENFORCED
    ) WITH (
        'connector' = 'hbase-2.2',
        'table-name' = 'test',
        'zookeeper.quorum' = '127.0.0.1:2181'
    )
"""
And I write data into it with:
table_env.execute_sql("""
    INSERT INTO hTable
    SELECT
        datemin,
        ROW(datemax),
        ROW(channel_title, channel_id)
    FROM table_api_table
""")
but I got this error:
py4j.protocol.Py4JJavaError: An error occurred while calling o1.executeSql.
: org.apache.flink.table.api.ValidationException: Unable to create a sink for writing table 'default_catalog.default_database.hTable'.
Table options are:
'connector'='hbase-2.2'
'table-name'='test'
'zookeeper.quorum'='127.0.0.1:2181'
Caused by: java.lang.NoSuchMethodError: org.apache.flink.table.factories.DynamicTableFactory$Context.getPhysicalRowDataType()Lorg/apache/flink/table/types/DataType;
at org.apache.flink.connector.hbase2.HBase2DynamicTableFactory.createDynamicTableSink(HBase2DynamicTableFactory.java:95)
at org.apache.flink.table.factories.FactoryUtil.createTableSink(FactoryUtil.java:181)
... 28 more
By the way, I added the connector jar.
Please, any help?
What is the cause of this error?
How can I connect Flink with HBase?

Finally, it's working!
I fixed this issue by doing the following:
I edited hbase-env.sh:
# Extra Java CLASSPATH elements. Optional.
export HBASE_CLASSPATH=/home/hadoop/hbase/conf
I edited hbase-site.xml and added the following property:
<property>
    <name>hbase.defaults.for.version.skip</name>
    <value>true</value>
</property>
Then I edited the connector jar itself: I unpacked the jar and edited the hbase-default.xml inside it:
<property>
    <name>hbase.defaults.for.version.skip</name>
    <value>true</value>
    <description>Set to true to skip the 'hbase.defaults.for.version' check.
    Setting this to true can be useful in contexts other than
    the other side of a maven generation; i.e. running in an
    IDE. You'll want to set this boolean to true to avoid
    seeing the RuntimeException complaint: "hbase-default.xml file
    seems to be for and old version of HBase (${hbase.version}), this
    version is X.X.X-SNAPSHOT"</description>
</property>
And finally, I moved the jar into the Flink lib folder (which is better than:
table_env.get_config().get_configuration().set_string("pipeline.jars","file:///home/hadoop/hbase/conf/flink-sql-connector-hbase-2.2_2.11-1.14.4.jar")
)
These articles helped me a lot:
https://www.cnblogs.com/panfeng412/archive/2012/07/22/hbase-exception-hbase-default-xml-file-seems-to-be-for-and-old-version-of-hbase.html
https://blog.csdn.net/bokzmm/article/details/119882885

Related

How to migrate data to a new server? (Sequence generator issue)

What is the right way to backup and restore a MariaDB database that has sequence generation enabled (i.e. NOT autoincrement)? (This includes migrating to a new server.)
Is it possible to instruct the sequence generator to resume ID generation at a specific value? How?
Steps I take to create my issue
I wish to transfer an application to a new server:
Backup data on source server:
mysqldump --skip-opt --no-create-db --no-create-info --hex-blob [database-name] [...list of tables...] > data-backup.sql
On target server, create new empty database (same name)
Build/run JHipster Spring application on target server: java -jar myapp.jar (Running this application recreates/configures a new instance of the database on the target server.)
Restore data:
mysql [database-name] < data-backup.sql
All the above steps produce no errors (so far).
Problem
When I follow these steps, the database is restored (apparently perfectly). I can log in to the application and access all information. BUT when I attempt to create new entities (i.e. save something to the database), I get an ID 'Duplicate entry' error in the server logs:
2022-03-24 12:54:43.775 ERROR 11277 --- [ XNIO-1 task-1] o.h.e.jdbc.batch.internal.BatchingBatch : HHH000315: Exception executing batch [java.sql.BatchUpdateException: (conn=33) Duplicate entry '1001' for key 'PRIMARY'], SQL: insert into product (name, id) values (?, ?)
2022-03-24 12:54:43.776 WARN 11277 --- [ XNIO-1 task-1] o.h.engine.jdbc.spi.SqlExceptionHelper : SQL Error: 1062, SQLState: 23000
2022-03-24 12:54:43.776 ERROR 11277 --- [ XNIO-1 task-1] o.h.engine.jdbc.spi.SqlExceptionHelper : (conn=33) Duplicate entry '1001' for key 'PRIMARY'
2022-03-24 12:54:43.779 ERROR 11277 --- [ XNIO-1 task-1] o.z.problem.spring.common.AdviceTraits : Internal Server Error
org.springframework.dao.DataIntegrityViolationException: could not execute batch; SQL [insert into product (name, id) values (?, ?)]; constraint [PRIMARY]; nested exception is org.hibernate.exception.ConstraintViolationException: could not execute batch
at org.springframework.orm.jpa.vendor.HibernateJpaDialect.convertHibernateAccessException(HibernateJpaDialect.java:276)
at org.springframework.orm.jpa.vendor.HibernateJpaDialect.translateExceptionIfPossible(HibernateJpaDialect.java:233)
at org.springframework.orm.jpa.JpaTransactionManager.doCommit(JpaTransactionManager.java:566)
at org.springframework.transaction.support.AbstractPlatformTransactionManager.processCommit(AbstractPlatformTransactionManager.java:743)
at org.springframework.transaction.support.AbstractPlatformTransactionManager.commit(AbstractPlatformTransactionManager.java:711)
at org.springframework.transaction.interceptor.TransactionAspectSupport.commitTransactionAfterReturning(TransactionAspectSupport.java:654)
at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:407)
at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:119)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:753)
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:698)
at com.mycompany.app.web.rest.ProductResource$$EnhancerBySpringCGLIB$$84c14d6d.createProduct(<generated>)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
...
Clearly my backup/restore process is not accounting properly for the sequence generator, which generates ID values that conflict with the existing data.
What am I doing wrong? What is the right process for backing up and restoring such a database?
Environment: JHipster 7.7.0 (Angular, monolithic), MariaDB 10.4, OpenJDK 16.0.2_7, OS Windows 10 Pro and openSUSE 15.2, Firefox 98.0.2 and Chrome 99.0.4844.84.
PS: I previously reported this issue here, aimed at the JHipster community, but got limited response. I think I need a MySQL/MariaDB expert opinion on this.
(Apologies in advance: I'm not a database expert. The technique I outline above has served me well for years, but previously I was dealing with AUTO_INCREMENT. This sequence generator has me baffled.)
OK! I have solutions.
[For the sake of these notes, let's call the database mydata. Also, in JHipster, the MariaDB sequence generator is called sequence_generator.]
Let's consider two situations:
(1) Simple migration
If you are merely migrating the application to a new server, the process is straightforward:
Step 1: On the original server, back up and secure your database: mysqldump -u root -p mydata > mydata.sql
Step 2: Transfer the SQL file to the new server, along with the JHipster JAR file
Step 3: On the new server, create an empty database with the same name, and restore the data: mysql -u root -p mydata < mydata.sql
Step 4: Now launch your JHipster application, and everything should work
(2) Model modification
The assumption is that you have modified your model in some way (e.g. added properties to one or more entities). This solution is fiddly, but it works (for me).
Step 1: Back up your database, and secure it (in case something goes wrong): mysqldump -u root -p mydata > mydata.sql
Step 2: Back up and secure the original JHipster JAR that works with the original database
Step 3: Duplicate your database (schema and data) as a new database: mydata_bk
Step 4: Drop your original database, and create a new empty database
Step 5: Launch your new JHipster JAR, and give it time to create the new database schema, then stop the application
Step 6: Use a tool (DataGrip, SQLyog, etc.) to compare the old schema (mydata_bk) with the new schema (mydata), and modify the old schema to match the new schema
Step 7: Restore/copy all data from mydata_bk to mydata, EXCEPT for the tables DATABASECHANGELOG, DATABASECHANGELOGLOCK and the special sequence_generator table
Step 8: Open the mydata.sql SQL file, and at the top, after initial comments, one of the first instructions will read:
--
-- Sequence structure for `sequence_generator`
--
DROP SEQUENCE IF EXISTS `sequence_generator`;
CREATE SEQUENCE `sequence_generator` start with 2000 minvalue 1 maxvalue 9223372036854775806 increment by 50 cache 1000 nocycle ENGINE=InnoDB;
SELECT SETVAL(`sequence_generator`, 201050, 0);
The specific numbers may vary, but the broad details will be similar. In a MariaDB SQL console type/execute each of those SQL statements: DROP SEQUENCE ...;, CREATE SEQUENCE ...;, and SELECT SETVAL(...);
Step 9: Launch your JHipster application.
Hope this helps others that run into similar issues. Let me know if you have a better approach!
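A possibly simpler alternative for step 8 is to inspect and restart the sequence directly instead of re-running the dump's DROP/CREATE/SETVAL statements. A minimal sketch (MariaDB 10.3+; the sequence name, the product table and the number 201050 are taken from the dump and logs above, so adjust them to your own data):
-- See where the sequence currently stands (a MariaDB sequence can be read like a table)
SELECT next_not_cached_value FROM sequence_generator;
-- Find the highest ID already in use, e.g. in the product table
SELECT MAX(id) FROM product;
-- Restart the sequence comfortably above that value (remember JHipster allocates IDs in blocks of 50)
ALTER SEQUENCE sequence_generator RESTART WITH 201050;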

Liquibase SQL Changeset Cannot Load CSV File : FileNotFoundException

Using a SQL-style approach to Liquibase changesets (which is our code style; we don't use XML), I am trying to load a CSV file using the following SQL changeset:
-- changeset user:insert-prices-data-temp-table
LOAD DATA LOCAL INFILE 'foo/src/main/resources/liquibase/changelogs/2021/prices.csv'
INTO TABLE prices_temp
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
When deploying the WAR file to WildFly, the following exception is thrown:
java.io.FileNotFoundException:
foo/src/resources/liquibase/changelogs/2021/prices.csv (No such file
or directory) 2021-09-22T20:52:18.581085123Z at
java.io.FileInputStream.open0(Native Method)
2021-09-22T20:52:18.581087214Z at
java.io.FileInputStream.open(FileInputStream.java:195)
2021-09-22T20:52:18.581089768Z at
java.io.FileInputStream.(FileInputStream.java:138)
2021-09-22T20:52:18.581092029Z at
java.io.FileInputStream.(FileInputStream.java:93)
2021-09-22T20:52:18.581094150Z at
com.mysql.jdbc.MysqlIO.sendFileToServer(MysqlIO.java:3772)
The master changelog file references the 2021 directory; the SQL and CSV files exist in the same directory.
<includeAll path="liquibase/changelogs/2021/" filter="xml" errorIfMissingOrEmpty="true" />
I have tried the following other paths, but they all still yield a FileNotFoundException:
prices.csv
liquibase/changelogs/2021/prices.csv
WEB-INF/classes/liquibase/changelogs/2021/CCC-220-marking-time-prices.csv
(Absolute path) /home/xxxx/Work/foo-service/foo/src/main/resources/liquibase/changelogs/2021/prices.csv
Liquibase version: 3.5.1
WildFly (JBoss) version: 21
I have checked that the CSV file is present in the WAR file.
Any ideas on how to fix this?
The problem is that you are using the MySQL statement LOAD DATA LOCAL INFILE, which knows nothing about your classpath. It looks at the filesystem, and that path doesn't exist there. Even if you provide something like yourapp.jar!liquibase/changelogs/2021/prices.csv, it won't be able to read the file. You will need to pull prices.csv out of your application onto the filesystem and point the MySQL statement at that location.
Or you can use Liquibase's loadData change type, if that helps.
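If you want to stay with SQL-formatted changesets, a minimal sketch of that first approach could look like this. The /opt/wildfly/imports path is only an example: copy prices.csv there (on the host running WildFly, since LOCAL INFILE is read by the JDBC driver and streamed to the server) as part of your deployment, and depending on the driver version you may also need to enable allowLoadLocalInfile on the JDBC URL.
-- changeset user:insert-prices-data-temp-table
-- prices.csv has been copied out of the WAR to a plain filesystem path (path is illustrative)
LOAD DATA LOCAL INFILE '/opt/wildfly/imports/prices.csv'
INTO TABLE prices_temp
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;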

External Table for Parquet or Avro data format throws error

I created an external table to fetch Parquet data from ADLS, which throws the error below.
Query execution has resulted in error (0x80131500): Partial query failure: 0x80131500 (message: 'Input parquet file is ill-formed and cannot be processed: 'not a Parquet file (size too small)'.: ', details: 'Source: Kusto.Common.Svc [0]Kusto.Common.Svc.Exceptions.IngestionSourceParquetReaderException: Input parquet file is ill-formed and cannot be processed: 'not a Parquet file (size too small)'. Timestamp=2020-05-07T11:22:42.0340199Z
Folder structure in ADLS:
logs / {AppId}/ 2020 / 05 / 07
External table definition:
.create external table ExTParquet (AppId:string,UserId:string,Email:string,RoleName:string,Operation:string,EntityId:string,EntityType:string,EntityName:string,TargetTitle:string,Params:string,EventProcessedUtcTime:datetime,PartitionId:string,EventEnqueuedUtcTime:datetime)
kind=blob
partition by
AppId,
bin(EventProcessedUtcTime,1d)
dataformat=parquet
(
h#'https://streamoutalds2.blob.core.windows.net/stream-api-raw-parquet/logs;secret_key'
)
with
(
folder = "ExternalTables"
)
Note: if I provide the full file path and remove source directory partitioning from the external table definition, it works well.
But I need to read data from all the files within the directory, not just one.
Any help is much appreciated.
This is a known issue that is being worked on. You can open a support ticket (Azure Data Explorer). The team will also post an update here when the issue is resolved.
[EDIT] The issue should now be resolved.

When using Clojure's Korma sqlite3 helpers, what's the default path for the sqlite3 database?

When using korma.db, defdb can take a sqlite3 helper to establish a connection to a SQLite3 database. However, I've tried placing the database in the root of the project directory (alongside project.clj) and in the resources directory, but when I try to use the db I get:
Failure to execute query with SQL:
SELECT "examples".* FROM "examples" :: []
SQLException:
Message: [SQLITE_ERROR] SQL error or missing database (no such table: examples)
Needless to say, my SQLite database contains an examples table. When I try this, a zero-byte sqlite.db file is created in the project root directory.
I'm doing this from lein repl within the project, by the way.
Edit: This is what I do when it fails:
(use 'korma.db)
(defdb db (sqlite3 {:db "filename.db"}))
(use 'korma.core)
(defentity examples)
(select examples)
Just in case anybody is wondering or runs into this...
I am using [korma "0.4.2"] and [org.xerial/sqlite-jdbc "3.7.15-M1"] in my project.clj.
My project structure looks like:
root/project.clj
root/db/dev.sqlite3
root/src/...
and this is how I use korma to access the db:
(use 'korma.db)
(defdb mydb {:classname   "org.sqlite.JDBC"
             :subprotocol "sqlite"
             :subname     "db/dev.sqlite3"})
Basically, with :subname the path is resolved relative to the root of the Leiningen project. I added db/ to the subname to match my directory structure above.
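As a quick sanity check (just a suggestion, not something the fix above depends on), you can ask SQLite which tables the file you are opening actually contains, for example in the sqlite3 command-line shell against db/dev.sqlite3; an empty result means you have hit a freshly created, empty file:
-- list the tables in the database file; nothing back means you opened the wrong (new, empty) file
SELECT name FROM sqlite_master WHERE type = 'table';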

Hive query execution for a custom UDF expects an HDFS jar path instead of a local path in CDH4 with an Oozie flow

We are migrating from CDH3 to CDH4, and as part of this migration we are moving all the jobs that we have on CDH3. We have noticed one critical issue: when a workflow is executed through Oozie to run a Python script that internally invokes a Hive query (hive -e {query}), the query adds a custom jar using add jar {LOCAL PATH FOR JAR} and creates a temporary function for the custom UDF. Everything looks OK up to this point. But when the query starts executing with the custom UDF function, it fails with a Distributed Cache FileNotFoundException, looking for the jar at an HDFS path instead of the local path.
I am not sure if I am missing some configuration here.
Exception trace:
WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated.
Please use org.apache.hadoop.log.metrics.EventCounter in all the
log4j.properties files. Execution log at:
/tmp/yarn/yarn_20131107020505_79b41443-b9f4-4d36-a0eb-4f0d79cd3ce9.log
java.io.FileNotFoundException: File does not exist:
hdfs://aa.bb.com:8020/opt/nfsmount/mypath/custom.jar
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:824)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
..... .....
Any help on this is highly appreciated.
Regards,
GHK.
There are a few options. All the required jars should be on the classpath before you run the Hive query.
Option 1: Add your custom jar with a <file>/hdfs/path/to/your/jar</file> element in the Oozie workflow (see the sketch below).
Option 2: Use the --auxpath option with /local/path/to/your/jar when calling your Hive script from Python, e.g.: hive --auxpath /local/path/to/your.jar -e {query}
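For option 1, a rough sketch of the Hive side (the function and class names below are only illustrative, not from the original post): once the workflow's <file> element has shipped custom.jar into the action's working directory via the distributed cache, the query can add it by its local name.
-- the jar was placed in the action's working directory by the Oozie <file> element
ADD JAR custom.jar;
CREATE TEMPORARY FUNCTION my_custom_udf AS 'com.example.udf.MyCustomUdf';
SELECT my_custom_udf(some_column) FROM some_table LIMIT 10;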
