I have a few questions which I was asked recently. Please help me with the answers.
1) In TD 14, can FastLoad load in chunks greater than 64K? Are there any other options for it?
2) Can we do compression on VARCHAR columns in TD 14?
3) How do we force the optimizer to use a JOIN INDEX we created?
Regards,
Amit
Q1: No, 64K is both the default and the maximum size.
Q2: Of course, COMPRESS on VARCHAR has been supported since TD 13.10.
Q3: There's no way to force it, but you can create a view with exactly the same SELECT as the join index, so you can use the JI like a materialized view in the FROM clause.
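For example, a minimal sketch (with hypothetical table and column names, not taken from the thread) would define the join index and a view with exactly the same SELECT, so that queries against the view can be covered by the JI:
-- Hypothetical aggregate join index pre-joining and summarizing two tables
CREATE JOIN INDEX sales_ji AS
SELECT st.region, s.sales_date, SUM(s.amount) AS total_amount
FROM sales s
JOIN store st ON s.store_id = st.store_id
GROUP BY st.region, s.sales_date;
-- A view with exactly the same SELECT; queries against the view can be
-- covered by the join index, so it behaves like a materialized view
CREATE VIEW sales_v AS
SELECT st.region, s.sales_date, SUM(s.amount) AS total_amount
FROM sales s
JOIN store st ON s.store_id = st.store_id
GROUP BY st.region, s.sales_date;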
In OpenEdge ABL / Progress 4GL, a field can be defined with a FORMAT, but that is only the default format for it to be displayed. Thus, a CHARACTER field with FORMAT 'X(10)' could store thousands of characters past the first ten.
The database I'm using contains millions of rows in some of the tables I'm concerned with. Is there any system table or Progress-internal program I can use to determine the longest length of a given field? I'm looking for anything more efficient than full-table scans. I'm on Progress OpenEdge 11.5.
"dbtool" will scan the db and find fields whose width exceeds the "sql width". By default that is 2x the format that was defined for character fields.
https://knowledgebase.progress.com/articles/Article/P24496/
Of course it has to scan the table to do that, so it may not meet your "more efficient than table scans" criterion. FWIW, dbtool is reasonably efficient.
If the fields you are concerned about are problematic because of potential SQL access, you might also want to look into "authorized data truncation" via the -SQLTruncateTooLarge parameter, which will truncate the data on the fly.
Another option would be -SQLWidthUpdate, which automatically adjusts the SQL width on the fly. That requires an upgrade to at least 11.6.
Both of these might solve your problem without periodic table scans.
If it's actually the character format you want to adjust to match the data, I suppose what you could do is to use dbtool to adjust the SQL width of all the fields, and then set the character format to be half the SQL width.
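For reference, the brute-force check that the question is trying to avoid would be a full-table scan through the SQL engine, something like the sketch below (assuming a hypothetical customer table with a cust_name field in the default PUB schema, and the SQL engine's LENGTH function):
-- Full-table scan to find the longest stored value of one character field
SELECT MAX(LENGTH(cust_name)) AS max_len
FROM PUB.customer;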
I have implemented wildcard search using the Oracle Coherence API. When I execute the search on four string fields using
1) LikeFilter with fIgnoreCase set to true,
2) search text consisting of % patterns (e.g. "%test%"), and
3) the individual filters accumulated with AnyFilter,
and 4) the volume of data in the cache is huge, the searches become very slow.
Applying a standard index has no effect on performance, as it appears that such an index works only for exact matches and comparisons.
Is there any special type of index in Coherence for wildcard searches (similar to the new indexes in Oracle TEXT)? If not, is there any other way to improve wildcard query performance on Coherence, with large data sets in the cache?
Please provide a code snippet so we can understand the current solution. Also, I hope the following practices have already been applied:
Use the explain plan to check the query performance
Leverage data-grid-wide execution for parallel processing, given the volume of data
Also, we need information on the volume of data (in GB) along with the Coherence setup in place (number of nodes, size of each node) to understand the sizing of the cluster.
I am using WebSQL to store data in a PhoneGap application. One of the tables has a lot of data, say from 2,000 to 10,000 rows. When I read from this table with a simple SELECT statement, it is very slow. I debugged and found that as the size of the table increases, the performance degrades sharply. I read somewhere that to get performance you have to divide the table into smaller chunks; is that possible, and how?
One idea is to look for something to group the rows by, and consider breaking the data into separate tables based on some common category instead of one shared table for everything.
I would also consider fine tuning the queries to make sure they are optimal for the given table.
Make sure you're not just running a simple Select query without a where clause to limit the result set.
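As a rough sketch (hypothetical table and column names, in the SQLite dialect that WebSQL is built on), indexing the column you filter on and limiting the rows you pull back usually helps far more than splitting the table:
-- Index the column used in the WHERE clause (create once)
CREATE INDEX IF NOT EXISTS idx_items_category ON items (category);
-- Read only the rows you actually need instead of the whole table
SELECT id, name, category
FROM items
WHERE category = 'books'
ORDER BY name
LIMIT 50;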
I just started to use KNIME, and it is supposed to manage a huge amount of data, but it isn't: it's slow and often unresponsive. I'll need to manage even more data than I'm using now. What am I doing wrong?
I set this in my configuration file, knime.ini:
-XX:MaxPermSize=1024m
-Xmx2048m
I also read data from a database node (millions of rows), but I can't limit it with SQL (not that I really mind, since I need this data):
SELECT * FROM foo LIMIT 1000
error:
WARN Database Reader com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'LIMIT 0' at line 1
I had the same issue and was able to solve it really simply. KNIME has a knime.ini file, which holds the parameters KNIME uses when it runs.
The real issue is that the JDBC driver is set to a fetch size of 10. By default, when Oracle JDBC runs a query, it retrieves the result set 10 rows at a time from the database cursor. This is the default Oracle row fetch size value, so whenever you are reading from the database you will spend a long time waiting to retrieve all the rows.
The fix is simple: go to the folder where KNIME is installed, look for the file knime.ini, open it, and add the following lines to the bottom. This overrides the default JDBC fetch size, and then you will get the data in seconds.
-Dknime.database.fetchsize=50000
-Dknime.url.timeout=9000
Hope this helps!
see http://tech.knime.org/forum/knime-users/knime-performance-reading-from-a-database for the rest of this discussion and solutions...
I'm not sure if your question is about the performance problem or the SQL problem.
For the former, I had the same issue and only found a solution when I started searching for Eclipse performance fixes rather than KNIME performance fixes. It's true that increasing the Java heap size is a good thing to do, but my performance problem (and perhaps yours) was caused by something bad going on in the saved workspace metadata. Solution: Delete contents of the knime/workspace/.metadata directory.
As for the latter, not sure why you're getting that error; maybe try adding a semicolon at the end of the SQL statement.
I have a large number of rows in a SQL Server table. I need to select all of those rows and run an update query for each of them. I need to know which of the following is the best option to do it:
Run a SELECT query to get a DataTable and use the following code in the application:
foreach(DataRow item in DataTable.Rows)
{
//perform update
}
At the database level, use a stored procedure: select the set of data and use a SQL Server cursor to perform the update.
Option 1 vs. option 2 means working with a disconnected DataSet vs. a connected data reader.
As discussed other times here on SO, in practice this means more memory needed at once on the client versus a connection kept open longer with smaller chunks of data transmitted more often while looping over the results. Since your main focus is updating the data, I think both options are probably similar; if you have many records, I would probably go for the second one, using a data reader, so you don't have to load all those records at once into a DataSet.
As others have already pointed out, the best performance would be achieved with a set-based update stored procedure to which you pass certain parameters and all records are updated atomically at once. Also have a look at SQL bulk update approaches.
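To make the set-based suggestion concrete, here is a minimal sketch (with hypothetical table and column names such as dbo.Orders and dbo.StagedChanges, not the OP's schema) that replaces the row-by-row loop with a single UPDATE joined to the source of the new values:
-- One set-based statement instead of a loop that updates row by row
UPDATE o
SET o.status = s.new_status
FROM dbo.Orders AS o
JOIN dbo.StagedChanges AS s
    ON s.OrderId = o.OrderId
WHERE o.status <> s.new_status;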
I also have one suggestion: instead of
foreach(DataRow item in DataTable.Rows) {
//perform update
}
you can use LINQ, which can give a faster response.
Like Marc already commented, go for option 3 and do a set-based update if at all possible (and usually it is). If you think it isn't, maybe you could ask a separate question on how to do that. The more specifics you give, the better the proposed solutions will fit your situation.