I'm not sure if anyone has faced this issue.
Basically I have a text document with about 100,000 lines and I am trying to import it into an SQLite table with a single column.
After doing so, when I ran a generic query SELECT * FROM table WHERE field LIKE "%something%", I realised that irrelevant results were turning up. Digging further, I found that some of the lines in the original text file had been concatenated into giant row entries, which gives the impression of a wrong result (the giant rows simply happened to contain a match). Instead of 100,000 records, I had only 50,000-odd, plus 2 records with LENGTH(field) > 1,000,000 characters.
The first thing that came to my mind was the possibility of special characters messing things up, so I ran strings FILE in Bash. The problem persisted.
So, long story short, does anyone know the reason for this (and how to solve the issue)? Considering that the table has a single field, I don't think delimiters have anything to do with this, right?
I've traced the issue to unbalanced double quotes, which are reserved for quoting strings in CSV. So if I have an open quote on one line, it will only count as a record when the next quote is found - which can be many lines down.
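If the goal is simply one row per line of the text file, one way to sidestep the CSV quote handling entirely is to insert the lines yourself instead of relying on the shell's .import. A minimal sketch in Python (the file, database and table names are assumptions):

    import sqlite3

    # Assumed names: lines.txt is the source file, t is the single-column table.
    conn = sqlite3.connect("data.db")
    conn.execute("CREATE TABLE IF NOT EXISTS t (field TEXT)")

    with open("lines.txt", "r", encoding="utf-8", errors="replace") as f:
        # Each physical line becomes exactly one row; quote characters are
        # never interpreted, only stored as ordinary data.
        conn.executemany(
            "INSERT INTO t (field) VALUES (?)",
            ((line.rstrip("\n"),) for line in f),
        )

    conn.commit()
    conn.close()

Because the quotes stay as plain data, the LIKE "%something%" query then only matches rows whose own line actually contains the text.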
I have a CSV file with multiple lists. See picture. What I want to do is query every single value so it tells me which lists the value is found in.
E.g. I query number 898774 and it tells me 898774 - prim6 in set 1, set 2 and set 4.
I did find a quick workaround by making one big list in Excel, removing duplicates and then manually searching all of them for each number. Doable for a small amount, but not that good for '000s of sets.
I created a vector for each column and started a search with which(sapply), but then remembered I needed the names. Just a little bit beyond my knowledge.
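The attempt above is in R, but purely to illustrate the idea, here is a minimal sketch in Python that builds a value-to-list-names lookup from the CSV (the file name, and the assumption that each column is one list named in the header row, are mine):

    import csv
    from collections import defaultdict

    # Assumptions: sets.csv has one column per list, the list's name is in the
    # header row, and blank cells are ignored.
    membership = defaultdict(set)   # value -> names of the lists containing it

    with open("sets.csv", newline="") as f:
        for row in csv.DictReader(f):
            for list_name, value in row.items():
                if value and value.strip():
                    membership[value.strip()].add(list_name)

    # Example lookup for the 898774 value from the question:
    query = "898774"
    print(query, "found in:", sorted(membership.get(query, [])))

The same shape works in R: if df came from read.csv, something like names(which(sapply(df, function(col) 898774 %in% col))) returns the names of the lists containing the value.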
SHORT VERSION: If all else fails, add a value (even zero) to an additional Number column in the SQLite table to stop the #DELETED demon.
LONGER VERSION: There have been many posts about this frustrating and inconsistent problem over the years. Lord knows, I have read each one at least a half-dozen times and tried every remedy proposed - and usually one of the incantations will finally solve the problem. Yet I found myself recently in a new quandary and spent the last two days racking my brain to work out why none of the tricks worked on it.
It was classic: Access 2019 front end linked to a SQLite back end table (via the Devart SQLite ODBC driver, which I believe is inconsequential). The table had the recommended col_ID of Number format, Auto-incrementing as the Primary Key. It had col_DateUpdated, Text format with default value of current_timestamp. There was col_OperatorID which was Text format and the only column in a Unique Index. Finally, there were two other non-indexed Number format columns.
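For reference, a rough sketch of the SQLite side of that layout (the table name and the names of the two extra Number columns are my assumptions), created here from Python:

    import sqlite3

    # Hypothetical names; the structure mirrors the description above:
    # auto-incrementing PK, text timestamp default, a unique text column,
    # and two non-indexed numeric columns.
    conn = sqlite3.connect("backend.db")
    conn.executescript("""
    CREATE TABLE IF NOT EXISTS tbl_Operators (
        col_ID          INTEGER PRIMARY KEY AUTOINCREMENT,
        col_DateUpdated TEXT DEFAULT CURRENT_TIMESTAMP,
        col_OperatorID  TEXT,
        col_Number1     NUMERIC,
        col_Number2     NUMERIC
    );
    CREATE UNIQUE INDEX IF NOT EXISTS idx_Operators_OperatorID
        ON tbl_Operators (col_OperatorID);
    """)
    conn.commit()
    conn.close()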
The table worked fine in SQLite. I could Add, Delete, Update no problem. In Access, when I opened the linked table and added a record it did not immediately show the auto incrementing value in col_ID, nor the date/time stamp. When I clicked off the row, it immediately filled all the columns of the new row with #DELETED. If I closed the table and reopened it, the new row would display just fine.
The SQLite Pragmas were all set to Default including Auto Index, Normal Locking Mode and Full Synch. I tried every combination of changing the table structure, column formats, indexes, default values, etc. The problem persisted regardless of whether there was any other data in the table or not.
I've been coding in Access for over 30 years and SQLite for three and have never seen anything like it.
I was stumped until, for the heck of it, I added a value into one of the other Number columns. Amazingly, it worked great!
I can create a new row, put values in col_OperatorID AND the two non-indexed Number columns, click off the row and it takes it fine. It updates the autonumber primary key col_ID and col_DateUpdated with the current date/time just fine with no #DELETED nonsense.
It beats me why it works. Maybe Access can finally accept it as a really, really unique record (even though the additional data is not in any index), or maybe putting the numeric value in the other, seemingly unimportant, columns forces an update across the link; I don't know. But I thought I would pass this along because I KNOW that, unless Microsoft or the SQLite folks come up with a cure for this, there will probably forevermore be people who need this additional gimmick to get out of #DELETED hell.
Good luck and Happy Trails.
Obviously, editing any column value will change the checksum of the database file.
But saving the original value back will not return the file to the original checksum.
I ran VACUUM before and after, so it isn't due to buffer size.
I don't have any indexes referencing the column, and rows are not added or removed, so the primary-key index shouldn't need to change either.
I tried turning off the rollback journal, but that is a separate file, so I'm not surprised it had no effect.
I'm not aware of an internal log or modified dates that would explain why the same content does not produce the same file bytes.
I'm looking for insight on what is happening inside the file to explain this, and whether there is a way to make it behave (I don't see a relevant PRAGMA).
Granted, https://sqlite.org/dbhash.html exists to work around this problem, but I don't see any of the conditions it mentions being triggered, and "... and so forth" is a pretty vague cause.
Database files contain (the equivalent of) a timestamp of the last modification so that other processes can detect that the data has changed.
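In SQLite specifically, this is the 4-byte file change counter at byte offset 24 of the database header (the first 100 bytes of the file); it is incremented whenever the database file is unlocked after having been modified, so an edit-and-revert alone changes the bytes. A small sketch to read it (the file name is an assumption):

    import struct

    def change_counter(path):
        # Read the 100-byte SQLite database header and return the file change
        # counter stored as a 4-byte big-endian integer at offset 24.
        with open(path, "rb") as f:
            header = f.read(100)
        return struct.unpack(">I", header[24:28])[0]

    print(change_counter("test.db"))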
There are many other things that can change in a database file (e.g., the order of pages, the B-tree structure, random data in unused parts) without a difference in the data as seen at the SQL level.
If you want to compare databases at the SQL level, you have to compare a canonical SQL representation of that data, such as the .dump output, or use a specialized tool such as dbhash.
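For example, a minimal sketch that compares two databases by hashing their canonical SQL dump instead of the raw file bytes (the Python equivalent of diffing .dump output; dbhash remains the more robust tool):

    import hashlib
    import sqlite3

    def logical_hash(path):
        # Hash the canonical SQL dump of the database rather than its bytes,
        # so page order, the change counter, freelist contents, etc. do not
        # affect the result.
        conn = sqlite3.connect(path)
        digest = hashlib.sha256()
        for line in conn.iterdump():
            digest.update(line.encode("utf-8"))
        conn.close()
        return digest.hexdigest()

    print(logical_hash("a.db") == logical_hash("b.db"))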
We want to read multiple generated CSV files in one go, dynamically, through Oracle PL/SQL or Oracle Proc (for one of our requirements), and we are looking for some pseudo-code snippets or logic for building this.
We searched for the same but had no luck. This requirement has to be done purely through Oracle; no Java is involved here.
I dealt with this problem in the past, and what I did was write a (quite easy) parsing function similar to split. It accepts two parameters, a string and a separator, and returns an array of strings.
You then load the whole file into a text variable (declared big enough to hold the whole file) and invoke the split function (with EOL as the separator) to split the buffer into lines.
Then, for each line, invoke the parser again using a comma as the separator.
Though the parser is simple, you need to take into account various conditions (e.g. bypassing blanks that are not part of a string, handling single/double quotes, etc.).
Unfortunately, I left the company at which the parser was developed; otherwise I would have posted the source here.
Hope this helps you.
UPDATE: Added some PSEUDO-CODE
For the Parser:
This mechanism is based on a state-machine concept.
Define a variable that will reflect the state of the parsing; possible values being: BEFORE_VALUE, AFTER_VALUE, IN_STRING, IN_SEPARATOR, IN_BLANK; initially, you will be in state BEFORE_VALUE;
Examine each character of the received string and, based on the character and the current state, decide which action to take and which state to move to;
It is up to you to decide what to do with blanks, as in aaa,bbb, ccc,ddd - the one before ccc (in my case, I ignored them);
Whenever you start or go through a value, you append the character to a temporary variable;
Once you finished a value, you add the collected sub-string (stored in the temporary variable) to the array of strings.
The state machine mechanism is needed to properly handle situations such as a comma appearing inside a string value (hence it is not possible to simply search for commas and chop the whole string according to them).
Another point to take into account is empty values, which would be represented by two consecutive commas (i.e. in your state machine if you find a comma when your state is IN_SEPARATOR, it means that you just passed an empty value).
Note that exactly the same mechanism can be used for splitting the initial buffer into lines and then each line into fields (the only difference is the input string, the separator and the delimiter); see the sketch just below.
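I no longer have the PL/SQL source, but purely as an illustration of the state machine described above (with a slightly reduced state set), here is a minimal sketch in Python; the same loop translates to a PL/SQL function that walks the string with SUBSTR:

    def split_values(text, separator=",", quote='"'):
        """Split one record into fields using a small state machine.

        States: BEFORE_VALUE (skipping leading blanks), IN_VALUE (unquoted
        value), IN_STRING (inside quotes), AFTER_VALUE (past a closing quote).
        """
        BEFORE_VALUE, IN_VALUE, IN_STRING, AFTER_VALUE = range(4)
        state = BEFORE_VALUE
        fields, current = [], []

        for ch in text:
            if state == BEFORE_VALUE:
                if ch == separator:          # two separators in a row: empty value
                    fields.append("")
                elif ch == quote:            # opening quote starts a string value
                    state = IN_STRING
                elif not ch.isspace():       # blanks before a value are ignored
                    current.append(ch)
                    state = IN_VALUE
            elif state == IN_VALUE:
                if ch == separator:
                    fields.append("".join(current))
                    current = []
                    state = BEFORE_VALUE
                else:
                    current.append(ch)
            elif state == IN_STRING:
                if ch == quote:              # closing quote ends the string
                    state = AFTER_VALUE
                else:
                    current.append(ch)       # separators inside quotes are data
            elif state == AFTER_VALUE:
                if ch == separator:
                    fields.append("".join(current))
                    current = []
                    state = BEFORE_VALUE
                # anything else after a closing quote is ignored in this sketch

        fields.append("".join(current))      # flush the last field
        return fields

Escaped quotes (a doubled "" inside a quoted value) are not handled here; that would need one more state.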
For the File handling process:
Load the file into a local buffer (big enough, preferable CLOB),
Split the file into records (using the above function) and then loop through the received records,
For each record, invoke the parser with the correct parameters (i.e. the record string, the delimiter, and ',' as separator),
The parser will return the fields contained in the record, with which you can proceed and do whatever you need to do (see the sketch below).
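Continuing the sketch above, the file-handling side, again only to illustrate the flow (in PL/SQL the file would first be read into a suitably large buffer such as a CLOB):

    # Illustration only (the file name is assumed): load the whole file,
    # split it into records, then split each record into fields using the
    # split_values function sketched earlier.
    with open("data.csv", "r", encoding="utf-8") as f:
        buffer = f.read()

    # Note: if string values may contain embedded newlines, the record split
    # needs the same quote-aware treatment as the field split.
    for record in buffer.splitlines():
        if not record.strip():
            continue                          # skip blank lines
        fields = split_values(record, separator=",")
        print(fields)                         # ...proceed with the fields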
Well, I hope this helps you implement the needed code (it looks complex, but it is not; just code it slowly, taking into account the conditions you may encounter when running the state machine).
I am trying to modify an existing InfoPath form that contains a repeating table container. The user enters the details in the table, and the form is used by another program for processing.
The current requirement is that the user should be able to copy data from somewhere and paste it directly into the table. The user may copy data containing multiple rows and paste it. This source could be anywhere, and the user can still enter data manually row by row, so a data connection is not feasible.
But the data gets truncated, with only the first row being entered into the form, as there is only one row present in the form. I found out after googling that this is the expected behaviour. Is there a workaround for this, like overriding the paste function using code?
Thanks for any help.
I cannot comment, so I will have to leave this as an answer; please do not downvote, since I am trying to help you out.
I'm not sure this is possible, but try creating a field that allows the user to enter a number of rows. That way they could look at the number of rows in their Excel spreadsheet and see that it's 32 (or whatever number). Then you could take that number and have the form create a repeating table with the number of rows specified, so they can paste in exactly the rows they need.
This would not be ideal, of course, but attempting something like this might be easier than overriding a paste operation.