How to determine the largest length of Progress OpenEdge ABL fields

In OpenEdge ABL / Progress 4GL, a field can be defined with a FORMAT, but that is only the default format used to display it. Thus, a CHARACTER field with FORMAT 'X(10)' could store thousands of characters past the first ten.
The database I'm using contains millions of rows in some of the tables I'm concerned with. Is there any system table or Progress-internal program I can use to determine the longest length of a given field? I'm looking for anything more efficient than full-table scans. I'm on Progress OpenEdge 11.5.

"dbtool" will scan the db and find fields whose width exceeds the "sql width". By default that is 2x the format that was defined for character fields.
https://knowledgebase.progress.com/articles/Article/P24496/
Of course it has to scan the table to do that, so it may not meet your "more efficient than table scans" criterion. FWIW, dbtool is reasonably efficient.
If the fields that you are concerned about are problematic because of potential SQL access, you might also want to look into "authorized data truncation" via the -SQLTruncateTooLarge parameter, which will truncate the data on the fly.
Another option would be -SQLWidthUpdate which automatically adjusts the SQL width on the fly. That requires an upgrade to at least 11.6.
Both of these might solve your problem without periodic table scans.

If it's actually the character FORMAT you want to adjust to match the data, I suppose you could use dbtool to adjust the SQL width of all the fields and then set the character format to half the SQL width.
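For completeness, a minimal sketch of what that could look like through the OpenEdge SQL engine, assuming the metaschema is exposed as pub."_Field" and that LENGTH() is available; "customer" and "name" are hypothetical table/field names, and note that the second query is still a full-table scan:

-- Declared SQL width and display format (metaschema lookup, no data scan).
-- Without a join to pub."_File" this returns every field named 'name' in the schema.
SELECT "_Field-Name", "_Width", "_Format"
FROM pub."_Field"
WHERE "_Field-Name" = 'name';

-- Actual longest stored value for the field; this does scan the whole table.
SELECT MAX(LENGTH("name")) AS max_len
FROM pub."customer";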

Related

How to overcome Row size too large (> 8126) error on Google-Cloud MySQL5.7 Second Generation

Google Cloud MySQL Engine supports the InnoDB storage engine only.
I am getting the following error when creating a table with 300 columns.
[Err] 1118 - Row size too large (> 8126).
Changing some columns to TEXT or BLOB may help. In the current row format, the BLOB prefix of 0 bytes is stored inline.
I tried creating the table with some columns as TEXT types and others as BLOB types as well, but it did not work.
Even modifying innodb_log_file_size is not possible, as it is not allowed on the Google Cloud-SQL Platform.
"Vertical Partitioning"
A table with lots of columns is pushing several limits; you hit one of them. There are several reasonable workarounds, Vertical Partitioning may be the best, especially if many are TEXT/BLOB.
Instead of a single table, have multiple tables with the same PRIMARY KEY, except that one may be AUTO_INCREMENT. JOIN them together as needed to collect the columns. You could even have VIEWs to hide the fact that you split up the table. I recommend grouping the columns by some logical grouping based on the application and which columns are needed 'together'.
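A minimal sketch of that idea, with hypothetical table and column names; the split and the view are illustrative, not a prescription for your schema:

-- Narrow "core" table: short, frequently used columns.
CREATE TABLE order_core (
  id INT AUTO_INCREMENT PRIMARY KEY,
  customer_id INT NOT NULL,
  status VARCHAR(20) NOT NULL
  -- ... other short, frequently used columns ...
) ENGINE=InnoDB;

-- Companion table: bulky, rarely used columns, same PRIMARY KEY value.
CREATE TABLE order_details (
  id INT PRIMARY KEY,            -- same value as order_core.id, not AUTO_INCREMENT
  long_notes TEXT,
  raw_payload BLOB,
  -- ... other bulky, rarely used columns ...
  CONSTRAINT fk_order_details FOREIGN KEY (id) REFERENCES order_core(id)
) ENGINE=InnoDB;

-- Optional view to hide the split from application code.
CREATE VIEW orders_full AS
SELECT c.*, d.long_notes, d.raw_payload
FROM order_core c
LEFT JOIN order_details d ON d.id = c.id;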
Do not splay an array of things across columns; instead, have another table with multiple rows to handle the repetition. Example of what to avoid: address1, state1, country1, address2, state2, country2.
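For example (hypothetical names), the repeated address group becomes its own child table:

CREATE TABLE customer_address (
  customer_id INT NOT NULL,
  seq TINYINT NOT NULL,                        -- 1, 2, 3 ... per customer
  address VARCHAR(255),
  state VARCHAR(100),
  country_code CHAR(2) CHARACTER SET ascii,    -- truly fixed-length, ascii is enough
  PRIMARY KEY (customer_id, seq)
) ENGINE=InnoDB;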
Do not use CHAR or BINARY except for truly fixed-length columns. Most such columns are very short. Also, most CHAR columns should be CHARACTER SET ascii, not utf8. (Think country_code, zipcode, md5.)
innodb_log_file_size is only indirectly related to your question. What is its value?
Directly related is innodb_page_size, which defaults to 16K, and virtually no one ever changes. I would expect Cloud Engines to prohibit changing it.
(I'm with Bill on desiring more info about your schema -- so we can be more specific about how to help you.)
You don't have many options here. InnoDB's default page size is 16KB, and you must design your tables so at least two rows fit in a page. That's where the limit of 8126 bytes per row comes from.
Variable-length columns like VARCHAR, VARBINARY, BLOB, and TEXT can be longer, because data exceeding the row size limit can be stored on extra pages. To take advantage of this, you must enable the Barracuda table format, and choose ROW_FORMAT=DYNAMIC.
In config:
[mysqld]
innodb_file_per_table = ON
innodb_file_format = Barracuda
innodb_default_row_format = DYNAMIC
I don't know if these settings are already enabled in Google Cloud SQL, or if they allow you to change these settings.
Read https://dev.mysql.com/doc/refman/5.7/en/innodb-row-format.html for more information
Again, the advantage of DYNAMIC row format only applies to variable-length data types. If you have 300 columns that are fixed-length, like CHAR, then it doesn't help.
By the way, innodb_log_file_size has nothing to do with this error about row size.
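If those settings are (or can be) in effect, a sketch of the table definition and a quick way to confirm the row format actually in use; the table name is hypothetical:

CREATE TABLE wide_table (
  id INT AUTO_INCREMENT PRIMARY KEY,
  col1 TEXT,
  col2 TEXT
  -- ... remaining variable-length columns ...
) ENGINE=InnoDB ROW_FORMAT=DYNAMIC;

-- Verify which row format the table actually got.
SELECT table_name, row_format
FROM information_schema.tables
WHERE table_name = 'wide_table' AND table_schema = DATABASE();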
In order to do what you want to do on a Cloud SQL instance, first off run this to set the innodb_strict_mode variable:
SET innodb_strict_mode = 0;
After that you should be able to create your table.

Should I be worried about the settings table getting huge?

I have a pretty fat settings table in SQL Server 2012, now with over 100 columns. As the name suggests, this table keeps track of all kinds of setting values within our website. It used to have fewer than 50 columns, but its size has since doubled.
The reason I store setting values in the database is that users need to be able to change these settings via the UI.
Should I really be worried about this table getting bigger and bigger over time? Or will I have to find some other way to store the settings data, e.g. save it to files?
First, you don't need to store settings in a database in order to let users update them at runtime. You can simply store them in a settings file that gets updated whenever the user makes changes. An XML config file works well for this.
If, however, the application is network based, and you want the settings to follow the user from machine to machine, it makes more sense to put it in a database.
Second, yes... 100 columns is huge. Instead of storing each setting in a separate column, you might consider storing each setting in a separate row, and then have a common row format which is ID, SettingName, SettingValue, (maybe) DefaultValue. Then your table can grow as large as you like.
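A rough sketch of that shape in T-SQL, with hypothetical names and types; adjust the value columns to whatever your settings actually need:

CREATE TABLE Settings (
  ID INT IDENTITY(1,1) PRIMARY KEY,
  SettingName NVARCHAR(100) NOT NULL UNIQUE,
  SettingValue NVARCHAR(MAX) NULL,
  DefaultValue NVARCHAR(MAX) NULL
);

-- Reading one setting:
SELECT SettingValue FROM Settings WHERE SettingName = 'HomePageTitle';

-- Adding a new setting is an INSERT, not a schema change:
INSERT INTO Settings (SettingName, SettingValue, DefaultValue)
VALUES ('ItemsPerPage', '25', '10');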
We are using JSON to store user settings. The table contains only two columns: the user ID and the settings string. This string is quite long, but that doesn't matter. You could also use XML to store this data.
This is a worse solution if you ever need to modify the data by hand, but it is faster to fetch from the DB and process on the client or on the ASP.NET server.
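For illustration, the two-column shape might look like this (hypothetical names; on SQL Server 2012 the JSON itself would be parsed in application code, since built-in JSON functions only arrived in SQL Server 2016):

CREATE TABLE UserSettings (
  UserId       INT PRIMARY KEY,
  SettingsJson NVARCHAR(MAX) NOT NULL   -- e.g. '{"theme":"dark","itemsPerPage":25}'
);

-- One round trip fetches every setting for a user; the client deserializes the JSON.
SELECT SettingsJson FROM UserSettings WHERE UserId = 42;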
I imagine you are concerned about performance on huge tables?
One question is how many rows are in this table. 100 columns with 10,000 rows is not a real problem. 100 columns over 10 million rows is a slightly different ballgame. Not worse or better, just different.
The same considerations apply for small and large tables:
1. Are you indexing properly?
2. Is your I/O fine?
3. Is your space fine?
4. Are you querying efficiently?
There is no right answer for this; it depends on why you have big column counts and whether that is hurting your overall performance.
We run thousands of tables with more than 150 columns with no problems, even with millions of rows across them, and I can't complain about performance.
And this is relatively de-normalized data, so lots of text.

When to include an index (automated heuristic)

I have a piece of software which takes in a database and uses it to produce graphs based on what the user wants (primarily queries of the form SELECT AVG(<input1>) AS x, AVG(<input2>) AS y FROM <input3> WHERE <key> IN (<vals..>) AND ...). This works nicely.
I have a simple script that is passed an (often large) number of files, each describing a row:
name=foo
x=12
y=23.4
....... etc.......
The script goes through each file, saving the variable names and an INSERT query for each. It then loads the variable names, sort | uniq's them, and makes a CREATE TABLE statement out of them (SQLite, amusingly enough, is OK with having all columns be NUMERIC, even if they actually end up containing text data). Once this is done, it executes the INSERTs (in a single transaction, otherwise it would take ages).
To improve performance, I added a basic index on each column. However, this increases the database size significantly and only provides a moderate improvement.
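Roughly, the generated SQL looks something like this (table and column names are made up; the single transaction is what keeps the bulk load fast):

CREATE TABLE results (name NUMERIC, x NUMERIC, y NUMERIC /* ... one column per variable ... */);

BEGIN TRANSACTION;
INSERT INTO results (name, x, y) VALUES ('foo', 12, 23.4);
-- ... one INSERT per input file ...
COMMIT;

-- "a basic index on each column", as described above
CREATE INDEX idx_results_x ON results(x);
CREATE INDEX idx_results_y ON results(y);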
Data comes in three basic types:
single value, indicating things like program version, etc.
a few values (<10), indicating things like input parameters used
many values (>1000), primarily output data.
The first type obviously shouldn't need an index, since it will never be sorted upon.
The second type should have an index, because it will commonly be filtered by.
The third type probably shouldn't need an index, because it will be used in output.
It would be annoying to determine which type a particular value is before it is put in the database, but it is possible.
My question is twofold:
Is there some hidden cost to extraneous indexes, beyond the size increase that I have seen?
Is there a better way to index for filtration queries of the form WHERE foo IN (5) AND bar IN (12,14,15)? Note that I don't know which columns the user will pick, beyond the fact that it will be a type 2 column.
Read the relevant documentation:
Query Planning;
Query Optimizer Overview;
EXPLAIN QUERY PLAN.
The most important thing for optimizing queries is avoiding I/O, so tables with fewer than ten rows should not be indexed: all the data fits into a single page anyway, so an index would just force SQLite to read another page for the index.
Indexes are important when you are looking up records in a big table.
Extraneous indexes make table updates slower, because each index needs to be updated as well.
SQLite can use at most one index per table in a query.
This particular query could be optimized best by having a single index on the two columns foo and bar.
However, creating such indexes for all possible combinations of lookup columns is most likely not worth the effort.
If the queries are generated dynamically, the best idea probably is to create one index for each column that has good selectivity, and rely on SQLite to pick the best one.
And don't forget to run ANALYZE.
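Continuing with the hypothetical table and column names from the question and the earlier sketch, that advice looks like this:

-- Best case for the example query: one composite index covering both filter columns.
CREATE INDEX idx_results_foo_bar ON results(foo, bar);

-- When the filtered columns are not known in advance, index each selective
-- "type 2" column on its own and let the query planner pick one.
CREATE INDEX idx_results_foo ON results(foo);
CREATE INDEX idx_results_bar ON results(bar);

ANALYZE;   -- gather statistics so the planner can choose the most selective index

EXPLAIN QUERY PLAN
SELECT AVG(x) AS x, AVG(y) AS y
FROM results
WHERE foo IN (5) AND bar IN (12, 14, 15);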

How to determine position of specific character/string in SQLite string column value?

I have values in a SQLite table* that contain a number of strings, of different lengths, joined by periods, something like this:
SomeApp.SomeNameSpace.InterestingString.NotInteresting
SomeApp.OtherNameSpace.WantThisOne.ReallyQuiteDull
SomeApp.OtherNameSpace.WantThisOne.AlsoDull
SomeApp.DifferentNameSpace.AlwaysWorthALook.LittleValue
I'd like to extract (in this case) the third period-delimited substring so I could write something like
SELECT interesting_string, COUNT(*)
FROM ( SELECT third_part_of_period_delimited_string(name) interesting_string )
GROUP BY interesting_string;
Obviously I can do this any number of ways programmatically; I'm wondering if there's any way to achieve this in a SQLite SELECT query?
* It's a SharpDevelop Profiler database, if anyone's curious
No.
You can, as you mention, work with the strings after you have selected them from the database. Or you can split them up into separate columns when they are stored.
If you do not have access to the code that is storing the data, you might want to consider reading the data in its entirety, splitting the strings and storing the split out tokens in separate columns in a new table. If the data is not too large, you might look at storing this table in a new memory database to give excellent performance.
Whether this is worthwhile depends on whether the single pass to split the data strings can be reused many times. If the data is constantly changing, then this scheme would probably not work well.
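A sketch of that one-pass approach, with hypothetical names; the actual splitting of each name into tokens would be done by whatever code loads this table:

-- Populated once by external code that splits each name on '.'
CREATE TABLE name_parts (
  original_name TEXT,
  part1 TEXT,
  part2 TEXT,
  part3 TEXT,
  part4 TEXT
);

-- The aggregation from the question then becomes plain SQL:
SELECT part3 AS interesting_string, COUNT(*)
FROM name_parts
GROUP BY part3;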

Is there a downside to choosing ntext as the datatype for all text columns? [duplicate]

Possible Duplicate:
Are there any disadvantages to always using nvarchar(MAX)?
Is there a general downside to choosing 'ntext' as the column type instead of a character type with a limited maximum size, like 'char' or 'varchar'?
I'm not sure whether a limited column size is applicable to all my columns. Therefore I would use 'ntext' for all columns containing text. Might this lead to problems in the future?
(I'm using LINQ to SQL in an ASP.NET WebForms application.)
NTEXT is being deprecated for a start, so you should use NVARCHAR(MAX) instead.
You should always try to use the smallest datatype possible for a column. If you do need to support more than 4000 characters in a field, then you'll need to use NVARCHAR(MAX). If you don't need to support more than 4000 characters, then use NVARCHAR(n).
I believe NTEXT would always be stored out of row, incurring an overhead when querying. NVARCHAR(MAX) can be stored in row if possible. If it can't fit in row, then SQL Server will push it off row. See this MSDN article.
Edit:
For NVARCHAR, the maximum supported explicit size is 4000. After that, you need to use MAX which takes you up to 2^31-1 bytes.
For VARCHAR, the maximum supported explicit size is 8000 before you need to switch to MAX.
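If you do end up migrating an existing NTEXT column, a hedged sketch of the usual two-step conversion (the table and column names here are hypothetical):

-- Change the type; existing values stay out of row until they are rewritten.
ALTER TABLE Articles ALTER COLUMN Body NVARCHAR(MAX) NULL;

-- Optional: rewrite the values so that short ones can move back in row.
UPDATE Articles SET Body = Body;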
In addition to what AdaTheDev said, most of the standard T-SQL string functions do not work with NTEXT data types. You are much better off using VARCHAR(MAX) or NVARCHAR(MAX).
Furthermore, NVARCHAR is for wide characters, e.g. non-Latin letters.
I had a stored procedure which ran with NVARCHAR parameters; when I changed it to use VARCHAR instead, I more than doubled the performance.
So if you know you won't need wide characters in your columns, you're best off using VARCHAR.
And as the other answers say, don't use TEXT/NTEXT at all; they're deprecated.
You can never have an index on a large text column, because an index key is limited to 900 bytes.
And ntext can't be indexed anyway, but there are still limitations on newer BLOB types too.
Do you plan on having only non-unique text columns? Or never plan to search them?
