Loading Hive Data into Oracle DB - oracle11g

I need to map a Hive table into an Oracle DB using ODI 11.1.1.6.0.
I created the physical and logical schemas for the two technologies (the connection tests are OK).
I have a physical and logical ODI agent that uses HTTP on port 20910 (the connection test is OK).
I used the RKM for reverse-engineering the two tables (the Hive table and the corresponding Oracle table, with the same fields).
After that, I created a project with an interface to test the mapping.
I dragged and dropped the source Hive table and the target Oracle table.
After that, I dragged and dropped each field of the Hive table onto the corresponding Oracle field.
The dimension/type of each field is the same in both tables.
I checked the Flow of the interface, and it uses only the IKM File-Hive to Oracle (OLH).
When I start the interface, the session starts but fails with this error:
ODI-1226: Step Hive_to_Oracle_test fails after 1 attempt(s).
ODI-1240: Flow Hive_to_Oracle_test fails while performing a Integration operation. This flow loads target table TEST_TABLE.
Caused By: com.sunopsis.dwg.function.SnpsFunctionBaseException: ODI-30038: OS command returned 4.
at com.sunopsis.dwg.tools.OSCommand.actionExecute(OSCommand.java:294)
at com.sunopsis.dwg.function.SnpsFunctionBase.execute(SnpsFunctionBase.java:276)
at com.sunopsis.dwg.dbobj.SnpSessTaskSql.execIntegratedFunction(SnpSessTaskSql.java:3437)
at com.sunopsis.dwg.dbobj.SnpSessTaskSql.executeOdiCommand(SnpSessTaskSql.java:1509)
at oracle.odi.runtime.agent.execution.cmd.OdiCommandExecutor.execute(OdiCommandExecutor.java:44)
at oracle.odi.runtime.agent.execution.cmd.OdiCommandExecutor.execute(OdiCommandExecutor.java:1)
at oracle.odi.runtime.agent.execution.TaskExecutionHandler.handleTask(TaskExecutionHandler.java:50)
at com.sunopsis.dwg.dbobj.SnpSessTaskSql.processTask(SnpSessTaskSql.java:2913)
at com.sunopsis.dwg.dbobj.SnpSessTaskSql.treatTask(SnpSessTaskSql.java:2625)
at com.sunopsis.dwg.dbobj.SnpSessStep.treatAttachedTasks(SnpSessStep.java:558)
at com.sunopsis.dwg.dbobj.SnpSessStep.treatSessStep(SnpSessStep.java:464)
at com.sunopsis.dwg.dbobj.SnpSession.treatSession(SnpSession.java:2093)
at oracle.odi.runtime.agent.processor.impl.StartSessRequestProcessor$2.doAction(StartSessRequestProcessor.java:366)
at oracle.odi.core.persistence.dwgobject.DwgObjectTemplate.execute(DwgObjectTemplate.java:216)
at oracle.odi.runtime.agent.processor.impl.StartSessRequestProcessor.doProcessStartSessTask(StartSessRequestProcessor.java:300)
at oracle.odi.runtime.agent.processor.impl.StartSessRequestProcessor.access$0(StartSessRequestProcessor.java:292)
at oracle.odi.runtime.agent.processor.impl.StartSessRequestProcessor$StartSessTask.doExecute(StartSessRequestProcessor.java:855)
at oracle.odi.runtime.agent.processor.task.AgentTask.execute(AgentTask.java:126)
at oracle.odi.runtime.agent.support.DefaultAgentTaskExecutor$2.run(DefaultAgentTaskExecutor.java:82)
at java.lang.Thread.run(Thread.java:662)
hive_to_oracle_test is my interface, and TEST_TABLE is my Oracle table.
Any idea?

I think you can use Sqoop; check https://ccp.cloudera.com/display/con/Quest+Data+Connectors. You can also look into OraOop; check http://archive.cloudera.com/cdh/3/adapters/oraoopuserguide.pdf for more details. Both of these can be used to transfer data from Hive to an Oracle database.
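For illustration, here is a minimal sketch of a Sqoop export from the HDFS directory behind a Hive table into Oracle; the JDBC URL, credentials, export directory, and field delimiter are all placeholder assumptions you would replace with your own values:

# Hypothetical example: export the files behind a Hive-managed table to Oracle.
# Connection details, paths, and the delimiter below are placeholders.
sqoop export \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username scott \
  --password tiger \
  --table TEST_TABLE \
  --export-dir /user/hive/warehouse/test_table \
  --input-fields-terminated-by '\001'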

Related

How to access on-premises Teradata from Azure Databricks

We need to connect to on-premises Teradata from Azure Databricks.
Is that possible at all?
If yes, please let me know how.
I was looking for this information as well, and I was recently able to access our Teradata instance from Databricks. Here is how I did it.
Step 1. Check your cloud connectivity.
%sh nc -vz 'jdbcHostname' 'jdbcPort'
- 'jdbcHostname' is your Teradata server.
- 'jdbcPort' is your Teradata server's listening port. By default, Teradata listens on TCP port 1025.
Also check out Databricks' best practices on connecting to other infrastructure.
Step 2. Install Teradata JDBC driver.
Teradata Downloads page provides JDBC drivers by version and archive type. You can also check the Teradata JDBC Driver Supported Platforms page to make sure you pick the right version of the driver.
Databricks offers multiple ways to install a JDBC library JAR for databases whose drivers are not available in Databricks. Please refer to the Databricks Libraries to learn more and pick the one that is right for you.
Once installed, you should see it listed in the Cluster details page under the Libraries tab.
Terajdbc4.jar dbfs:/workspace/libs/terajdbc4.jar
Step 3. Connect to Teradata from Databricks.
You can define some variables to programmatically create these connections. Since my instance required LDAP, I added LOGMECH=LDAP to the URL. Without LOGMECH=LDAP, it returned a "username or password invalid" error message.
(Replace the italicized placeholders with the values from your environment.)
driver = "com.teradata.jdbc.TeraDriver"
url = "jdbc:teradata://Teradata_database_server/Database=Teradata_database_name,LOGMECH=LDAP"
table = "Teradata_schema.Teradata_tablename_or_viewname"
user = "your_username"
password = "your_password"
Now that the connection variables are specified, you can create a DataFrame. You can also explicitly set this to a particular schema if you have one already. Please refer to Spark SQL Guide for more information.
Now, let’s create a DataFrame in Python.
My_remote_table = spark.read.format("jdbc")\
  .option("driver", driver)\
  .option("url", url)\
  .option("dbtable", table)\
  .option("user", user)\
  .option("password", password)\
  .load()
Now that the DataFrame is created, it can be queried. For instance, you can select particular columns and display them within Databricks.
display(My_remote_table.select("EXAMPLE_COLUMN"))
Step 4. Create a temporary view or a permanent table.
My_remote_table.createOrReplaceTempView("YOUR_TEMP_VIEW_NAME")
or
My_remote_table.write.format("parquet").saveAsTable("MY_PERMANENT_TABLE_NAME")
Steps 3 and 4 can also be combined if the intention is simply to create a table in Databricks from Teradata; see the sketch below. Check out the Databricks documentation SQL Databases Using JDBC for other options.
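If you go that route, a minimal sketch reusing the connection variables defined above might look like this (the target table name is a placeholder):

# Hypothetical sketch: read from Teradata over JDBC and persist directly
# as a Parquet-backed table in Databricks.
spark.read.format("jdbc") \
  .option("driver", driver) \
  .option("url", url) \
  .option("dbtable", table) \
  .option("user", user) \
  .option("password", password) \
  .load() \
  .write.format("parquet").saveAsTable("MY_PERMANENT_TABLE_NAME")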
Here is a link to the write-up I published on this topic.
Accessing Teradata from Databricks for Rapid Experimentation in Data Science and Analytics Projects
If you create a virtual network that can connect to on-premises resources, you can deploy your Databricks instance into that VNet. See https://docs.azuredatabricks.net/administration-guide/cloud-configurations/azure/vnet-inject.html.
I assume that there is a Spark connector for Teradata. I haven't used it myself, but I'm sure one exists.
You can't. If you run Azure Databricks, all the data needs to be stored in Azure. But you can fetch the data from Teradata using a REST API and then save it in Azure.

Spark newbie (ODBC/SparkSQL)

I have a Spark cluster set up and have tried both native Scala and Spark SQL on my dataset, and the setup seems to work for the most part. I have the following questions.
For ODBC/external connectivity to the cluster, what should I expect?
- Does the admin/developer shape the data and persist/cache a few RDDs that will be exposed? (Thinking along the lines of Hive tables.)
- What would be the equivalent of connecting to a "Hive metastore" in Spark/Spark SQL?
Is thinking along the lines of Hive flawed?
My other questions are:
- When I issue Hive queries (and, say, create tables and such), do they use the same Hive metastore as Hadoop/Hive?
- Where do the tables get created when I issue SQL queries using sqlContext?
- If I persist a table, is it the same concept as persisting an RDD?
Appreciate your answers,
Nithya
(This is written with Spark 1.1 in mind; be aware that new features tend to be added quickly, and some of the limitations mentioned below might very well disappear at some point in the future.)
You can use Spark SQL with Hive syntax and connect to the Hive metastore, which results in your Spark SQL Hive commands being executed on the same data space as if they were executed through Hive directly.
To do that you simply need to instantiate a HiveContext, as explained here, and provide a hive-site.xml configuration file that specifies, among other things, where to find the Hive metastore.
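As an illustration, a minimal Spark 1.1-era PySpark sketch might look like the following; the app name and query are placeholder assumptions:

# Hypothetical sketch: a HiveContext picks up hive-site.xml from the classpath
# and runs HiveQL against the shared Hive metastore.
from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="hive-example")
hive_ctx = HiveContext(sc)
rows = hive_ctx.sql("SELECT * FROM some_table LIMIT 10").collect()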
The result of a SELECT statement is a SchemaRDD, which is an RDD of Row objects with an associated schema. You can use it just like any other RDD, including cache and persist, and the effect is the same (the fact that the data comes from Hive has no influence here).
If your Hive command creates data, e.g. "CREATE TABLE ...", the corresponding table gets created in exactly the same place as with regular Hive, i.e. /var/lib/hive/warehouse by default.
Executing Hive SQL through Spark provides you with all the caching benefits of Spark: a second SQL query on the same data set within the same Spark context will typically be much faster than the first query.
Since Spark 1.1, it is possible to start the Thrift JDBC server, which is essentially an equivalent of HiveServer2 and thus allows you to execute Spark SQL commands through a JDBC connection.
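Assuming a standard Spark distribution layout, starting that server and connecting to it might look like this sketch; the master URL, host, and port are placeholders:

# Hypothetical sketch: start the Thrift JDBC/ODBC server bundled with Spark,
# then connect over JDBC using the bundled beeline client.
./sbin/start-thriftserver.sh --master spark://master:7077
./bin/beeline -u jdbc:hive2://localhost:10000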
Note that not all Hive features are available (yet?), see details here.
Finally, you can also discard the Hive syntax and metastore and execute SQL queries directly on CSV and Parquet files. My best guess is that this will become the preferred approach in the future, although at the moment the set of SQL features available this way is smaller than when using the Hive syntax.
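A minimal Spark 1.1-era sketch of that metastore-free approach, assuming an existing SparkContext sc and a placeholder Parquet path:

# Hypothetical sketch: query a Parquet file directly; no Hive metastore involved.
from pyspark.sql import SQLContext

sql_ctx = SQLContext(sc)
people = sql_ctx.parquetFile("hdfs:///data/people.parquet")
people.registerTempTable("people")
teens = sql_ctx.sql("SELECT name FROM people WHERE age BETWEEN 13 AND 19")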

Invalid column definition error when using four part name to access Oracle DB as SQL Server linked server

I have set up a linked server in SQL Server 2008 R2 in order to access an Oracle 11g database. The MSDASQL provider is used to connect to the linked server through the Oracle Instant Client ODBC driver. The connection works well when using OPENQUERY with the following syntax:
SELECT *
FROM OPENQUERY(LINKED_SERVER, 'SELECT * FROM SCHEMA.TABLE')
However, when I try to use a four-part name with the following syntax:
SELECT *
FROM LINKED_SERVER..SCHEMA.TABLE
I receive the following error:
Msg 7318, Level 16, State 1, Line 1
The OLE DB provider "MSDASQL" for linked server "LINKED_SERVER" returned an invalid column definition for table ""SCHEMA"."TABLE"".
Does anyone have any insight into what may be causing the four-part name query to fail while the OPENQUERY one works without any problems?
The correct path to follow is to use the OPENQUERY function, because your linked server is Oracle: the four-part name syntax will work fine for MSSQL servers, essentially because they understand T-SQL.
With very simple queries, a four-part name can accidentally work, but not often in a real scenario. In your case, SELECT * is returning all the columns, and one of the column definitions is not compatible with SQL Server. Try another table, or try selecting a single simple column (e.g. a CHAR or a NUMBER); maybe it will work without problems.
In any case, using distributed queries can be tricky sometimes. The database itself does some optimizations before executing commands, so it is important for the database to know what it can and cannot do. If the DB thinks the linked server is MSSQL, it may take actions that do not work with Oracle.
When using four-part name syntax with a linked server other than MSSQL, you will have other problems as well, for example with database built-in functions (e.g. the Oracle to_date() function will not work, because MSSQL would want to use its own convert() function, and so on).
So again, if the linked server is not MSSQL, the right choice is to use OPENQUERY, passing it a query written in the linked server's SQL dialect, as in the sketch below.
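For example, here is a hedged sketch that pushes Oracle-dialect SQL through OPENQUERY so that Oracle, not SQL Server, parses it; the column, schema, and table names are placeholders:

-- Hypothetical example: Oracle parses the inner query, so Oracle built-ins work.
SELECT *
FROM OPENQUERY(LINKED_SERVER,
  'SELECT col1, TO_CHAR(date_col, ''YYYY-MM-DD'') AS date_str
   FROM schema_name.table_name');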
If you use the OLE DB provider for Oracle, you can query without using OPENQUERY.

Specify default schema for a database in DB2 client

Do we have any way to specify a default schema for cataloged DBs in the DB2 client on AIX?
The problem is that when it connects to the DB, it takes the user ID as the default schema, and that's where it fails.
We have too many scripts that run transactions against the DB without specifying a schema in their DB2 SQL statements, so it's not feasible to change the scripts.
Also, we can't create users to match the schema.
You can try issuing SET SCHEMA = <your schema>; before executing your queries.
NOTE: I'm not sure this works (I don't have a DB2 database at hand at the moment, but it seems to), and it may depend on your DB2 version.
You can create a stored procedure that just changes the current schema, and then set that SP as the connect procedure. You can test some conditions before making the schema change, for example whether the stored procedure is being executed directly from the AIX server with a given user.
You configure the database to use this SP each time a connection is established by modifying connect_proc; a sketch follows the links below.
http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.admin.config.doc/doc/r0057371.html
http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.admin.dbobj.doc/doc/c0057372.html
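A hedged sketch of such a connect procedure; the procedure, schema, and database names are placeholders, and any conditional logic should be adapted to your environment:

-- Hypothetical example: a no-argument procedure that fixes the default schema.
CREATE OR REPLACE PROCEDURE DB2INST1.CONNECTPROC()
LANGUAGE SQL
BEGIN
  SET SCHEMA MYSCHEMA;
END
-- Register it as the connect procedure (DB2 CLP command):
-- db2 update db cfg for MYDB using connect_proc DB2INST1.CONNECTPROC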
You can create aliases in the new user's schema that point to the tables in the other schema; a sketch follows the links below.
http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.sql.ref.doc/doc/r0000910.html
http://bytes.com/topic/db2/answers/181247-do-you-have-always-specify-schema-when-using-db2-clp
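For example (the user, alias, and table names are placeholders):

-- Hypothetical example: the connecting user MYUSER resolves ORDERS to the
-- real table in APPSCHEMA, so existing scripts need no changes.
CREATE ALIAS MYUSER.ORDERS FOR APPSCHEMA.ORDERS;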

How to create a small and simple database using Oracle 11g and SQL Developer?

How to create a small and simple database using Oracle 11g and SQL Developer?
I am seeing too many errors, and I cannot find any way to make a simple database.
For example:
create database company;
Caused the following error:
Error starting at line 1 in command:
create database company
Error at Command Line:1 Column:0
Error report:
SQL Error: ORA-01501: CREATE DATABASE failed
ORA-01100: database already mounted
01501. 00000 - "CREATE DATABASE failed"
*Cause: An error occurred during create database
*Action: See accompanying errors.
EDIT:
This is completely different from the MySQL and MS SQL that I am familiar with.
It's not as intuitive as I was expecting.
First off, what Oracle calls a "database" is generally different from what most other database products call a "database". A "database" in MySQL or SQL Server is much closer to what Oracle calls a "schema": the set of objects owned by a particular user. In Oracle, you would generally have only one database per server (a large server might have a handful of databases on it), where each database has many different schemas. If you are using the Express Edition of Oracle, you are only allowed one database per server. If you are connected to Oracle via SQL Developer, that indicates that the Oracle database has already been created.
Assuming that you really want to create a schema, not a database (in Oracle terminology), you would create the user:
CREATE USER company
IDENTIFIED BY <<password>>
DEFAULT TABLESPACE <<tablespace to use for objects by default>>
TEMPORARY TABLESPACE <<temporary tablespace to use>>
You would then grant the user whatever privileges you want:
GRANT CREATE SESSION TO company;
GRANT CREATE TABLE TO company;
GRANT CREATE VIEW TO company;
...
Once that is done, you can connect to the (existing) database as COMPANY and create objects in the COMPANY schema, for example:
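A hedged illustration using SQL*Plus-style commands; the password, connect string, and table definition are placeholders:

-- Hypothetical example: connect as the new user and create a table in its schema.
CONNECT company/your_password@//localhost:1521/ORCL

CREATE TABLE employees (
  id   NUMBER PRIMARY KEY,
  name VARCHAR2(100)
);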
Actually, the answer from Justin above could not be more incorrect. SQL Server and MySQL are for smallish databases; Oracle is for large enterprise databases, thus the difference in its structure. And it is common to have more than one Oracle database on a server, provided that the server is robust enough to handle the load. If you received the error posted above, then you are obviously trying to create a new Oracle database, and if you are doing that then you probably already understand the structure of an Oracle database.
The likely scenario is that you attempted to create a database using dbca, it initially failed, but the binaries were created. You then adjusted your initial parameters and retried creating the database using dbca. However, the utility sees the binaries and folder structure for the database you are creating, so it thinks that the database already exists but is not mounted. Dropping the database and removing the binaries and folders, as well as any other cleanup of the initial attempt, should be done first; then try again.
From your question description, I think you meant to create a database schema, not a database instance. In Oracle terminology, a database is a set of files in the file system, more like the data files in MySQL, whereas a database in MySQL is roughly equivalent to a schema in Oracle.
To create a schema in Oracle: https://docs.oracle.com/cd/B19306_01/server.102/b14200/statements_6014.htm
To create a database instance in Oracle (I personally prefer DBCA):
https://docs.oracle.com/cd/E11882_01/server.112/e25494/create.htm#ADMIN11068
Note that Oracle Express Edition does not support mounting more than one database instance at a time.
