Knime too slow - performance - bigdata

I just started to use KNIME and it suppose managed a huge mount of data, but isn't, it's slow and often not response. I'll manage more data than that I'm using now, What am I doing wrong?.
I set in my configuration file "knime.ini":
-XX:MaxPermSize=1024m
-Xmx2048m
I also read data from a database node (millions of rows) but I can't limit it by SQL (I don't really mind, I need this data).
SELECT * FROM foo LIMIT 1000
error:
WARN Database Reader com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'LIMIT 0' at line 1

I had the same issue... and was able to solve it really simply, KNIME has a KNIME.ini file, this one is like the paramethers KNIME uses to execute...
The real issue is that JBDC driver is set for 10 Fetch Size. By default, when Oracle JDBC runs a query, it retrieves a result set of 10 rows  at a time from the database cursor. This is the default Oracle row fetch size value... so whenever you are reading database you will have a big pain waiting to retrieve all the lines.
The fix is simply, go to the folder where KNIME is installed, look for the file KNIME.ini, open it and then add the following sentences to the bottom, it will override the defauld JBDC fetching, and then you will get the data in literally seconds.
-Dknime.database.fetchsize=50000
-Dknime.url.timeout=9000
Hope this helps :slight_smile:

see http://tech.knime.org/forum/knime-users/knime-performance-reading-from-a-database for the rest of this discussion and solutions...

I'm not sure if your question is about the performance problem or the SQL problem.
For the former, I had the same issue and only found a solution when I started searching for Eclipse performance fixes rather than KNIME performance fixes. It's true that increasing the Java heap size is a good thing to do, but my performance problem (and perhaps yours) was caused by something bad going on in the saved workspace metadata. Solution: Delete contents of the knime/workspace/.metadata directory.
As for the latter, not sure why you're getting that error; maybe try adding a semicolon at the end of the SQL statement.

Related

How to view partial results in spatialite-gui?

I'm running a query to create a table using the spatialite gui on my Windows 7 machine. It has been going for days and I would like to cancel it and try something different. Is there a way for me to view the results of the query so far? The .sqlite file has trippled in size and I'm curious about what is happening.
In SQLite, transactions are atomic and isolated. (And if you do not use explicit transactions, every command gets an automatic transaction.)
So there is no easy way to see partial results; the database goes to great efforts to ensure that the transaction either succeeds completely, or is rolled back.
If possible, try the command with a smaller data set, or write the query so that only a part of the data is processed.

Can you get a log file of 'reads' on specific RECID(Tablename) in Progress-4GL/Openedge at RunTime without access to Source Code?

I want to know which tables are being read by a query.
for each Customer where CustomerID = 12345.
Eventually this customer will be found in the following example, but progress must 'read' many tables before getting to customer 12345.
How do I know exactly which tables are read (By CustomerID), prior to getting to customer 12345?
*NOTE: I do not have access to modify the code being run for this selection. Ideally I would run a separate set of code that is executed at the same time as the customer query above to track the reads.
EDIT: More clearly - Can you track reads from a given program (.p) OR ProcessID and output either a RECID or the PrimaryKey to a file?
I understand the information is being read off the Disk and probably stored in a database buffer. So how would I get at the information in the database buffer?
You seem to be mixing up a few different things.
In a situation like your example where you FIND a specific record in one, and only one table then there is just a single record read. Progress will find that record by first scanning a relevant index. That might be 2 or 3 "logical reads" of the b-tree to get to the proper node. The record block and index blocks may, or may not be read from disk - that depends on what has happened previously.
There are "Virtual System Tables" available that can tell you how many READ operations take place against a particular table or index. But they do not trace the specific ROWID or other identifying data. _TableStat and _IndexStat are aggregates for all users on the system, _UserTableStat and _UserIndexStat are specific to a particular user's activity. You do need to set the -tablerangesize and -indexrangesize parameters adequately to take advantage of these.
If you have enabled the table and index statistics then you can use a tool like ProTop - http://protop.wss.com to get insight into this activity. Or you can write your own code.
OpenEdge Auditing does not track reads. That would be prohibitively expensive.
It's probably not really a good idea but, in theory, you could write FIND triggers for the tables you are interested in. That doesn't require access to the application source but you would need a development license. It will probably kill performance to do this though - so unless this is a non-production test environment that you just want to fiddle with I wouldn't really do that.
You mention wanting to know how you got to that point. That sounds more like you might need to have a "4gl trace". One easy way to get the stack trace of a running process is to execute:
$DLC/bin/proGetStack PID (UNIX)
or
%DLC%\bin\proGetStack PID (Windows)
This command will generate a "protrace.pid" file containing a 4gl stack trace and other interesting information.
There are also more complicated ways to get that info like using PROMON and the "client statement cache" or setting various log entry types at session startup. But proGetStack is pretty convenient and requires no code or scripting changes.
Some great options from Tom above. And all of them may be relevant to you. The option he only skirts around is the logging options. I feel obliged to expand on this because I'm giving a talk on it in a couple of weeks!
Assuming you are running a modern version of Progress, or even 10.2B08, then you have client logging available to you. Start your session with these additional options:
-clientlog "\somefolder\somefile.txt"
-logentrytypes "QryInfo:3"
This will log all the info of all the queries in your session to the file you specified above. If you navigate to the point in the system where you want to analyse your query and empty the logfile and save it, you can then run the offending query and see all the detail you need.
The output tells you all sorts of useful info, including the number of reads on each table, compared with the number returned to the user. You also get the index selected.
Using Tom's advice and/or this will get you what you need.

Create table Failed: [100015] Total size of all parcels is greater than the max message size

Could someone explain what does the above error message mean? How can it be fixed?
Thanks
There appears to be two main causes of this error:
Bugs in the client software
The query is too large
Bugs:
Make sure that you have the latest tools installed.
I have seen this error when incompatible versions of different TTU
software components are installed, especially CLI.
Please install (or-reinstall) the latest and greatest patches of CLI.
-- SteveF
Steve Fineholtz Taradata Employee
The other reference is from the comments to the original post:
Could be the driver. I had a similar issue with JDBC drivers, which went away
when I simply switched to a different version. – access_granted
Query is too large:
This is the root of the problem, even if it is caused by the above bugs.
Check your actual SQL query size sent to the server. Usually OBDC logs or debug files will let you examine the actual SQL generated.
Some SQL generators include charsets and collations to each field, increasing the query length.
You may want to create your own SQL Query from scratch.
Avoid the following, since they can be added by using additional queries.
Indexes
Default Values
Constraints
Non-ASCII characters as Column Names.
Also, remove all whitespace except a single space.
Do not attempt to add data while creating a table; Unless, the total size of the SQL statement is less than 1 MB.
From the first reference, the maximum query size is 1MB.
On the extreme side, you can name all of your fields a single letter(or double letters...). You can rename them with Alter Table queries later.
The same goes for type; you can set the type for all of the columns as CHAR, and modify it later(before any data is added to the table).

AX 2012R2: Lookup query takes too long, lookup never opens

I have a AX2012R2 CU6 (build&client 6.2.1000.1437, kernel 6.2.1000.5268) with the following problem:
On AP>Journals>Invoices>Invoice Journal>lines (form LedgerJournalTransVendInvoice), when I select Vendor as Account type and then activate the lookup on the Account field, AX freezes for a couple minutes and when it recovers, the lookup is closed/never opened. This happens every time when account type vendor, other account types work just fine.
I debugged this to LedgerJournalEngine.accountNumLookup() --> VendTable.lookupVendor line
formSegmentedEntryControl.performFormLookup(formRun);
The above process takes up the time.
Any ideas before I hire an exorcist?
There is a known KB for this for R3, look for it on Lifecycle services
KB 3086961 Performance issue of VendorLookup on the volume data,
during the GFM Bugbash 6/11 took over 30 minutes
Even though the fix is for R3 it should be easy to backport as the changes are described as
The root cause seemed to be the DirPartyLookupGridView, which had
around 14 joins on views and tables. This view is used in many places
and hence seemed to have grown quite a lot over time.
The changes in the hotfix remove the view and add only the required
datasources - dirpartytable and logisticsaddress to the
VendTableLookup form.
The custtableLookup is not using the view and using custom datasource
joins instead, so no changes there.
Try implementing that change and see what happens.
I'm not sure this will fix your issue as in your execution plan the only operation that seems really expensive is the sort operator which needs to spill to tempdb (you might need more memory to solve that) but the changes in the datasource could have the effect of removing the sort operator from the execution plan as the data may be sorted by an index.
Probably the SQL Server chose the wrong query plan.
First check that you have not disabled any indexes on the involved tables, then do a synchronize on them.
If still a problem, then to run a STATISTICS UPDATE on the involved tables (including the tables in the view).

Does SQLite checksum its data?

Harddrive bit-rot does happen. I'm using SQLite for a project with fairly critical data. Obviously, I'll be taking regular backups of the database, but does SQLite checksum its data?
I've read about the PRAGMA integrity_check, but can't really say whether it does integrity check on the actual data. The page "How To Corrupt An SQLite Database File" doesn't really mention the fact about bit rot on a harddrive, which is the reason why I'm asking.
Also, the database I am dealing with will be an indexable append-only log. One option would be for me to rotate the database regularly and create an MD5 sum of each rotated file. But maybe that's too much work...
Any input appreciated.
From reading the integrity_check documentation, I would say it would not be guaranteed to detect corruption that only affects user data (due to undetected bit errors on media).
Since your data is an append-only log, you've got it pretty easy. One way would be to write a text file log on a separate hard drive that contains hashes (MD5 or whatever) of every row of your data. Then you can use that hash log to verify the contents of the real database. Obviously backups will be an integral part of your plan.
Just stumbled upon this; I could be using the fzec Python package to recover broken data. Each row would have multiple "fzec block columns" to recover from corruption. Seems pretty neat.

Resources