We "need" to feed our SSAS Tabular models with Snowflake data (through ODBC because we have no Snowflake connector).
We tried with SQL Server 2016 and SQL Server 2017 and get atrocious performance (few rows a second...).
Under PowerBI, there is a Snowflake connector and it's fast.
I came across a thread from someone having an apparently similar speed problem when trying to feed Snowflake data to SAS.
He seems to have solved his problem by specifying some (ODBC?) parameters:
dbcommit = 10000
autocommit = no
readbuff = 200
insertbuff = 200
Are these parameters specific to Snowflake? Or just ODBC?
Thanks
Eric
These are SAS options, not ODBC or Snowflake.
For the load issue, ODBC row by row inserts is typically a slow approach.
Is there an option for you to export the data to S3 or Blob and then bulk load in SSAS before running your models - or alternatively could you push down the queries to Snowflake itself?
Instructions for SSAS Tabular models in Azure Analysis Services are here. Everything except the On Premises Data Gateway should apply to you.
Make sure you check whether tracing=0 in your ODBC data source as tracing=6 killed performance for me.
Related
I want to build a Kafka Connector in order to retrieve records from a database at near real time. My database is the Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 and the tables have millions of records. First of all, I would like to add the minimum load to my database using CDC. Secondly, I would like to retrieve records based on a LastUpdate field which has value after a certain date.
Searching at the site of confluent, the only open source connector that I found was the “Kafka Connect JDBC”. I think that this connector doesn’t have CDC mechanism and it isn’t possible to retrieve millions of records when the connector starts for the first time. The alternative solution that I thought is Debezium, but there is no Debezium Oracle Connector at the site of Confluent and I believe that it is at a beta version.
Which solution would you suggest? Is something wrong to my assumptions of Kafka Connect JDBC or Debezium Connector? Is there any other solution?
For query-based CDC which is less efficient, you can use the JDBC source connector.
For log-based CDC I am aware of a couple of options however, some of them require license:
1) Attunity Replicate that allows users to use a graphical interface to create real-time data pipelines from producer systems into Apache Kafka, without having to do any manual coding or scripting. I have been using Attunity Replicate for Oracle -> Kafka for a couple of years and was very satisfied.
2) Oracle GoldenGate that requires a license
3) Oracle Log Miner that does not require any license and is used by both Attunity and kafka-connect-oracle which is is a Kafka source connector for capturing all row based DML changes from an Oracle and streaming these changes to Kafka.Change data capture logic is based on Oracle LogMiner solution.
We have numerous customers using IBM's IIDR (info sphere Data Replication) product to replicate data from Oracle databases, (as well as Z mainframe, I-series, SQL Server, etc.) into Kafka.
Regardless of which of the sources used, data can be normalized into one of many formats in Kafka. An example of an included, selectable format is...
https://www.ibm.com/support/knowledgecenter/en/SSTRGZ_11.4.0/com.ibm.cdcdoc.cdckafka.doc/tasks/kcopauditavrosinglerow.html
The solution is highly scalable and has been measured to replicate changes into the 100,000's of rows per second.
We also have a proprietary ability to reconstitute data written in parallel to Kafka back into its original source order. So, despite data having been written to numerous partitions and topics , the original total order can be known. This functionality is known as the TCC (transactionally consistent consumer).
See the video and slides here...
https://kafka-summit.org/sessions/exactly-once-replication-database-kafka-cloud/
I'm working on a process improvement that will use SQL in r to work with large datasets. Currently the source data is stored in several different MS Access databases. My initial approach was to use RODBC to read all of the source data into r, and then use sqldf() to summarize the data as needed. I'm running out of RAM before I can even begin use sqldf() though.
Is there a more efficient way for me to complete this task using r? I've been looking for a way to run a SQL query that joins the separate databases before reading them into r, but so far I haven't found any packages that support this functionality.
Should your data be in a database dplyr (a part of the tidyverse) would be the tool you are looking for.
You can use it to connect to a local / remote database, push your joins / filters / whatever there and collect() the result as a data frame. You will find the process neatly summarized on http://db.rstudio.com/dplyr/
What I am not quite certain of - but it is not a R issue but rather an MS Access issue - is the means for accessing data across multiple MS Access databases.
You may need to write custom SQL code for that & pass it to one of the databases via DBI::dbGetQuery() and have MS Access handle the database link.
The link you posted looks promising. If it doesn't yield the intended results, consider linking one Access DB to all the others. Links take almost no memory. Union the links and fetch the data from there.
# Load RODBC package
library(RODBC)
# Connect to Access db
channel <- odbcConnectAccess("C:/Documents/Name_Of_My_Access_Database")
# Get data
data <- sqlQuery(channel , paste ("select * from Name_of_table_in_my_database"))
These URLs may help as well.
https://www.r-bloggers.com/getting-access-data-into-r/
How to connect R with Access database in 64-bit Window?
My source data is in Oracle and target data is in Teradata.Can you please provide me the easy and quick way to validate data .There are 900 tables.If possible can you provide syntax too
There is a product available known as the Teradata Gateway that works with Oracle and allows you to access Teradata in a "heterogeneous" manner. This may not be the most effective way to compare the data.
Ultimately what your requirements sound more process driven and to be done effectively would require the source data to be compared/validated as stage tables on the Teradata environment after your ETL/ELT process has completed.
How to Synchronize local database(datas) value to Server(Hosting) database values in SQL server 2008 R2? ex:from our client having a pc wen they are enterting entry it will insert kay ah...but in our concern keeping backup from server Hosting i mean IBM server...suppose client
connecting internet connection means clicking single button event want to transfer OUR own server database also..can u got it
Is there such any option there? Please let me know.
Thank you in advance!
I suggest that use Redgate Data compare tools in order to synchronize data from one database to another database.
You can also use some query such as following query in order to determine deferent record in two database and synch them
USE Datbase1
INSERT INTO Schema1.Table1 (columns)
SELECT t1.Columns
FROM Datbase2.Schema2.Table2 t1
LEFT JOIN Schema1.Table1 t2 ON t1.keyColumn = t2.Keycolumn
WHERE t2.keycolumn IS NULL
The RedGate will be costly for you. Use can use Open Source Database compare and sync tool as
Open DBDiff
I have used it for my Live Databases compare with Local Database. Still today I have not found any issue.
Hope It will help you.
Open DBDiff is an open source database schema comparison tool for SQL Server 2005/2008.
It reports differences between two database schemas and provides a synchronization script to upgrade a database from one to the other.
I have an ASP.NET app that uses a SQL Server database. I now need to pull data from Sybase ASE into that SQL Server database for my app to consume, and I'm not having any success with my ideas.
Has anyone done this? Any ideas/suggestions/tips?
You can configure a linked server from SQL Server to Sybase. It should be fairly vanilla using the Sybase provider on the MS side.
Okay, I've finally (through lame trial and error) found out how to link my Sybase ASE (12.5) server to my SQL Server (2008) which will allow the integration I want. Here's roughly how I did it:
Logged in to Sybase ASE OLE DB Configuration Manager (this is like the Sybase version of Windows' ODBC Data Sources) and added an OLE DB data source. I believe you must be an admin on the PC to do this.
In SQL Server 2008 Management Studio, went to Server Objects > Linked Servers. Right click and select "New Linked Server".
In the Linked Server Properties, I set the following properties:
General:
--Linked server: the name of your linked server as you want it to appear in your linked server list
--Provider: Select Sybase ASE OLE DB Provider from the dropdown list.
--Product name: The exact name of the OLD DB data source you just created in Sybase ASE OLE DB Configuration Manager.
--Data source: Same as Product name.
--Provider string: I left this blank
--Location: I left this blank
--Catalog: The default database (master or whatever) to log on to.
Security:
--You need to map a valid SQL Server logon to a valid Sybase logon. I did not use impersonation (which does a credentials pass-thru).
--I chose my connection Be made without using a security context.
Server Options:
--All the defaults worked for me.
Throughout, the standard SQL Server help worked fairly well as a guide. Though not always true, F1 was my friend here.
I can now do distributed queries, DTS or SSIS packages, and use SSRS. This takes a lot of the suck out of Sybase ASE.
Of course the above can be done via the command line using sp_linkserver, but the GUI is more comfortable for a lowly dev like me.
Use Management Studio or Enterprise Manager to import the data using the data importation wizard. That should be it, just make sure you pick the right data provider in the wizard and you should be good to go.
If you want this to be a live feed create a small windows service to manage the exchange of information. It should be relatively simple to do, just a little bit of leg work on your end. If you are adverse to that there are plenty of off the shelf solutions that can do this for you.
The question is a little vague on specifics:
Is this a one time conversion or part of a repeated process.
Is the source machine "reachable" from your destination machine (can you connect the two or do you need to read in files)
With most conversions there are two parts:
Physically getting data from the source into the destination.
Mapping data from the source to the destination tables.
It is hard to make any recommendations without more info. What would be fine for a one time conversion would not work if you need to read in data all day every day. Also, if the source database can not be connected to and you have to pass files, they methods change.