ClickHouse compatibility between different versions for Merge tables

I have a question regarding compatibility between different ClickHouse server versions. I have a ReplicatedMergeTree Table Engine, let's call it Table A, on ClickHouse server version 20.3. I have another ReplicatedMergeTree Table Engine, let's call it Table B, on ClickHouse server version 21.8. I have a Merge Table Engine, let's call it Table C, running on ClickHouse server version 20.3, which merges data from Table A and Table B. Since the tables are on different versions of ClickHouse, I wanted to know whether this would cause any issues.
In short
Node 1 (version 20.3) - Local Table A.
Node 2 (version 21.8) - Local Table B.
Node 3 (version 20.3) - Distributed Table A, Distributed Table B, Merge Table C (of Distributed Table A and Distributed Table B)
Is it supported by ClickHouse or not?

Is it supported by ClickHouse or not?
Not in general. Distributed queries across servers running different versions may produce incorrect results; I know of two issues that affect the 20.3 + 21.8 combination specifically.
Also, I don't understand how you are going to use Engine=Merge here; you need a Distributed table to query a remote server. And I don't think you need Engine=Merge at all: this is a simple two-shard schema for a Distributed table.
Anyway, I don't recommend doing this. It is possible and will work, but only for limited use cases.
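For illustration, here is a minimal sketch of the two-shard alternative. It assumes Table A and Table B share the same structure and are both named table_local on their respective nodes; the cluster name two_shards and all table/column names are hypothetical and must match your own remote_servers configuration and schema.

-- On Node 3: one Distributed table over both shards, instead of Engine=Merge.
-- 'two_shards' must be defined in remote_servers with Node 1 and Node 2 as its shards.
CREATE TABLE default.table_all
(
    id UInt64,
    value String
)
ENGINE = Distributed('two_shards', 'default', 'table_local', rand());

-- Queries fan out to Table A (Node 1) and Table B (Node 2) and merge the results:
SELECT count() FROM default.table_all;

Even with this schema, the version-mismatch caveat above still applies: the shards should ideally run the same ClickHouse version.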

Related

How to export a large table from Teradata

What would be the best way to export a large table (e.g. over 10 billion rows) from Teradata, in terms of speed and resource consumption?
I know of the FastExport tool, which uses the fastexport mode, but it still requires the results to be put in the spool before being sent to the client. It used to be possible to avoid the spool by forcing nospoolonly mode, but this seems to be broken in recent releases. So if I use it with a select * from table query, the whole table will be copied, which would require a massive spool file allowance.
I also came across the JDBC driver and the PT API; however, they seem to use the same underlying mechanisms.
Is there a better way?

Queries in RPostgreSQL are very slow

Currently I'm building a Shiny app that runs several queries against a PostgreSQL database (mainly SELECT and INSERT statements). The application works, but I'm trying to make it faster. When I compare the execution time of the same query between the RPostgreSQL package and a db client like Postico, it takes about 8 times longer with the RPostgreSQL package.
Any ideas on how to boost performance, or on a better way of connecting to a PostgreSQL database from R?
Thanks
Have you ever heard of the package dbplyr (with the b)?
I would recommend it because it enables dplyr (with no b) to be used with SQL databases.
There are many advantages, since the way you interact with your databases will shift: instead of importing whole tables into R and working on them locally, you explore and transform the data inside the database and only collect the results you need.
This shift is illustrated with before/after diagrams in a great article entitled "Databases using R" by Edgar Ruiz (2017). You should take a look at it for more details (links below).
The main advantages presented by Mr. Ruiz are, and I quote:
"
1) Run data exploration over all of the data - Instead of coming up with a plan to decide what data to import, we can focus on analyzing the data inside the database, which in turn should yield faster insights.
2) Use the SQL Engine to run the data transformations - We are, in effect, pushing the computation to the database because dplyr is sending SQL queries to the database.
3) Collect a targeted dataset - After becoming familiar with the data and choosing the data points that will either be shared or modeled, a final query can then be used to bring back only that data into memory in R.
4) All your code is in R! - Because we are using dplyr to communicate with the database, there is no need to change language, or tools, to perform the data exploration. "
So, you will probably gain the speed you are looking for with dbplyr/dplyr.
You should give it a try.
You can find more information about it and how to establish the connection with your PostgreSQL Server using the DBI package at:
https://cran.r-project.org/web/packages/dbplyr/vignettes/dbplyr.html
and
https://rviews.rstudio.com/2017/05/17/databases-using-r/
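To make "pushing the computation to the database" concrete: a dplyr pipeline such as tbl(con, "orders") %>% filter(status == "shipped") %>% count(customer_id) is translated by dbplyr into ordinary SQL roughly like the sketch below (the orders table and its columns are hypothetical), and only the small summarised result is pulled back into R when you call collect().

-- Rough shape of the SQL dbplyr would send to PostgreSQL for the pipeline above
SELECT customer_id, COUNT(*) AS n
FROM orders
WHERE status = 'shipped'
GROUP BY customer_id;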

Simulate records in database without entering any

I've nearly finished the development of a project and would like to test its performance, especially the database query calls. I'm using LINQ to SQL to search by username, but I've only got around 10 'users' in my database, so I can't really get a meaningful speed reading. How can I simulate thousands/millions of users in the database without actually creating new records? I've read about Selenium, but it seems to be geared towards repeated actions (simulating concurrent users?). Are there any other tools I should look into, or are there any options in VS 2008 (Professional Edition)?
Thanks
You can "trick" SQL Server into thinking there are more records than there actually are in a table using the approach outlined in this article. See the section on False SQL Server Statistics
e.g.
UPDATE STATISTICS TableName WITH ROWCOUNT=100000
will create statistics for the table as if it had 100,000 rows in it. You can then see what effect this has on the execution plan. But note that this is undocumented functionality, so it may give quirky behaviour.
You could also just populate your table with sample data. There are various tools available to help with that, like Red Gate's SQL Data Generator. I prefer actually having large data volumes, as I think the results will be more accurate.
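If you go the sample-data route without a third-party tool, a rough T-SQL sketch for bulk-inserting fake users looks like this. The dbo.Users table and its columns are hypothetical, and it assumes UserId is not an IDENTITY column; adjust the names and the TOP count to your schema.

-- Generate a million numbered rows by cross-joining system views,
-- then insert them as fake users.
;WITH Numbers AS (
    SELECT TOP (1000000)
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM sys.all_objects AS a
    CROSS JOIN sys.all_objects AS b
)
INSERT INTO dbo.Users (UserId, Username)
SELECT n, 'user_' + CAST(n AS VARCHAR(10))
FROM Numbers;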

How to query data from an AspenTech IP21 Historian?

Old subject, combined with new tools: What would be the best/appropriate way to query data for a web application from an AspenTech IP21 (InfoPlus.21) data historian?
In the past, I've used some pretty awful queries via the Aspen SqlPlus ODBC driver, but that doesn't seem like the right approach, as it doesn't seem to install on Win 7 at all.
Anyone here have experience with that?
1) Make sure you have an appropriate version of the Aspen tools; later ones (7.1, 7.2) will run on Windows 7 with no problems.
2) I have worked with Aspen IP21 for over 15 years and have never had issues with SQL performance compared to other databases like Oracle or SQL Server, as long as IP21 is on an appropriate server and the query is written appropriately for the structure of the database. Doing a join against a timestamp is going to produce a slow query. Depending on what you want to accomplish, there are multiple other ways to get data: through the HISTORY pseudo table, the AGGREGATES table, or other query techniques that are specific to IP21 (a sketch of a HISTORY query follows this list).
3) ODBC is still the most standard, easiest, and in my experience best-performing way to get data out of IP21 from any client: ASP, .NET, web pages, other databases, VB programs, Excel VBA, etc. It may just need some optimization, probably in how the SQL is written.
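As a rough illustration of the HISTORY pseudo table mentioned in point 2, a typical Aspen SQLplus query looks something like the sketch below. This is from memory: 'ATCAI' is the standard demo tag, and the exact REQUEST and PERIOD semantics should be checked against your SQLplus documentation before relying on it.

-- Interpolated values for one tag over a one-hour window.
SELECT ts, value
FROM history
WHERE name = 'ATCAI'
  AND ts BETWEEN '10-AUG-09 09:00' AND '10-AUG-09 10:00'
  AND request = 2      -- 2 = interpolated samples (check your docs)
  AND period = 60*10;  -- sample interval; PERIOD is expressed in tenths of a second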
I've had extensive experience using the normal SQLplus drivers in C#/ASP.NET, and performance has never been an issue. While the ODBC drivers work, I have encountered certain limitations, such as not always returning the results of SELECTs.
As for how to check 'out of spec':
If this is for real-time values and not for ranges of time, I would suggest using record references to simply select the current value. That way the entire query stays in memory.
For time ranges you will have to select the ranges and iterate over them, which is more costly.

Inner join across multiple Access DBs

I am re-designing an application for an ASP.NET CMS that I really don't like. I have made some improvements in performance, only to discover that not only does this CMS use MS SQL, but some users "simply" use an MS Access database.
The problem is that some of the tables I inner join are, in the MS Access version, stored in two different files. I am not allowed to simply move the tables into the other mdb file.
I am now trying to figure out a good way to "inner join" across multiple Access db files.
It would really be a pity if I had to fetch all the data and then do the join programmatically!
Thanks
You don't need linked tables at all. There are two approaches to using data from different MDBs that work without a linked table. The first is to use "IN 'c:\MyDBs\Access.mdb'" in the FROM clause of your SQL. One of your saved queries would look like:
SELECT MyTable.*
FROM MyTable IN 'c:\MyDBs\Access.mdb'
and the other saved query would be:
SELECT OtherTable.*
FROM OtherTable IN 'c:\MyDBs\Other.mdb'
You could then use those two saved queries to join the tables.
Alternatively, you can manage it all in a single SQL statement by specifying the path to the source MDB for each table in the FROM clause thus:
SELECT MyTable.ID, OtherTable.OtherField
FROM [c:\MyDBs\Access.mdb].MyTable
INNER JOIN [c:\MyDBs\Other.mdb].OtherTable ON MyTable.ID = OtherTable.ID
Keep one thing in mind, though:
The Jet query optimizer won't necessarily be able to use the indexes from these tables for the join (whether it will use them for criteria on individual fields is another question), so this could be extremely slow (in my tests, it's not, but I'm not using big datasets to test). But that performance issue applies to linked tables, too.
If you have access to the MDBs, and are able to change them, you might consider using Linked Tables. Access provides the ability to link to external data (in other MDBs, in Excel files, even in SQL Server or Oracle), and then you can perform your joins against the links.
I'd strongly encourage performance testing such an option. If it's feasible to migrate users of the Access databases to another system (even SQL Express), that would also be preferable -- last I checked, there are no 64-bit JET drivers for ODBC anymore, so if the app is ever hosted in a 64-bit environment, these users will be hosed.
Inside one Access DB you can create "linked tables" that point to the other DB. You should (I think) be able to query the tables as if they both existed in the same DB.
It does mean you have to change one of the DBs to create the virtual table, but at least you're not actually moving the data, just creating a pointer to it.
Within Access, you can add remote tables through the "Linked Table Manager". You could add the links to one Access file or the other, or you could create a new Access file that references the tables in both files. After this is done, the inner-join queries are no different than doing them in a single database.
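For comparison with the path-qualified query above: once OtherTable has been linked into the file that already holds MyTable, the same join needs no file paths at all (table and column names follow the earlier examples).

SELECT MyTable.ID, OtherTable.OtherField
FROM MyTable
INNER JOIN OtherTable ON MyTable.ID = OtherTable.ID;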
