database disk image is malformed - sqlite

I got an error when trying to query an SQLite file in R: Error in rsqlite_fetch(res@ptr, n = n) : database disk image is malformed, which suggests the sqlite3 file is somehow corrupt. Yet running PRAGMA integrity_check returns ok.
The file size is 76 GB. There is one table, main, containing 172 columns, with an index.
The database was meant to provide an easier way to access a series of unstructured files, so it is intended to be read-only. I have tried rebuilding it several times, but it always ends up malformed.
Not sure what else to look at or to do. Any help would be appreciated!
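For anyone debugging a similar mismatch, one way to localize the failure is to re-run the check and then force a full scan of the table from outside R. Here is a sketch using Python's built-in sqlite3 module (the file name is a stand-in for the real database):

```python
import sqlite3

# Hypothetical path; substitute the real 76 GB file.
conn = sqlite3.connect("main.sqlite")

# integrity_check walks the database structure; "ok" means no errors found.
print(conn.execute("PRAGMA integrity_check").fetchone())

# As the question shows, a fetch can still fail even when the check
# reports ok, so scanning the table row by row helps localize the error.
try:
    for _ in conn.execute("SELECT * FROM main"):
        pass
except sqlite3.DatabaseError as e:
    print("scan failed:", e)
conn.close()
```

If the scan fails partway through, the offset of the failing row narrows down which pages are affected.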

Related

Is zero-byte file a valid sqlite database?

When I call sqlite3_open_v2() on a zero-byte file with the flag SQLITE_OPEN_READWRITE, the function returns zero. Then the "PRAGMA quick_check;" statement returns a row containing the string "ok". So a zero-byte file is considered a valid database? That seems a little counter-intuitive.
Well, I don't think it's "valid". If it's zero bytes, it's nothing but a name. Yes, you can open it, but that no longer works once the file contains one or more bytes of non-SQLite data.
So as long as the file is 0 bytes, you can open it; once you start making changes, it will become an SQLite file.
If you open a non-SQLite file and try to make changes, it will give you an error message:
Error: file is encrypted or is not a database
The (only) way to create a new, empty database is to attempt to open a non-existing file.
A zero-sized file is considered the same; it's just an empty database.
There are some settings (such as the page size) that must be configured on an open database connection but that affect the structure of the database file.
Therefore, SQLite delays actually writing the database structure until it is actually needed.
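The lazy initialization described above can be observed directly. Here is a sketch using Python's built-in sqlite3 module (file name and directory are arbitrary):

```python
import os
import sqlite3
import tempfile

# Create a zero-byte file, as in the question.
path = os.path.join(tempfile.mkdtemp(), "empty.db")
open(path, "wb").close()
assert os.path.getsize(path) == 0

conn = sqlite3.connect(path)
# The empty file opens fine and passes quick_check...
assert conn.execute("PRAGMA quick_check").fetchone()[0] == "ok"
# ...and is still zero bytes: nothing has been written yet.
assert os.path.getsize(path) == 0

# The header and first page appear only once something is written.
conn.execute("CREATE TABLE t (x)")
conn.commit()
assert os.path.getsize(path) > 0
conn.close()
```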

What Exactly Are Anonymous Files

A passage in the file documentation caught my eye:
## We can do the same thing with an anonymous file.
Tfile <- file()
cat("abc\ndef\n", file = Tfile)
readLines(Tfile)
close(Tfile)
What exactly is this anonymous file? Does it exist on disk, or only in memory? I'm interested because I'm contemplating a program that will potentially need to create/delete thousands of temporary files, and if this happens only in memory, it seems like it would have a much smaller impact on system resources.
This Linux SO question suggests the file could be a real disk file, but I'm not sure how relevant that is to this particular example. Additionally, this bigmemory doc seems to hint at real disk-based storage (though I'm assuming the file-based anonymous file is being used):
It should also be noted that a user can create an “anonymous” file-backed big.matrix by specifying "" as the filebacking argument. In this case, the backing resides in the temporary directory and a descriptor file is not created. These should be used with caution since even anonymous backings use disk space which could eventually fill the hard drive. Anonymous backings are removed either manually, by a user, or automatically, when the operating system deems it appropriate.
Alternatively, if textConnection is appropriate for use for this type of application (opened/closed hundreds/thousands of times) and is memory only that would satisfy my needs. I was planning on doing this until I read the note in that function's documentation:
As output text connections keep the character vector up to date line-by-line, they are relatively expensive to use, and it is often better to use an anonymous file() connection to collect output.
My C is very rusty, so hopefully more experienced people can correct me, but I think the answer to your question "What exactly is this anonymous file? Does it exist on disk, or only in memory?" is "It exists on disk".
Here is what happens at C level (I'm looking at the source code at http://cran.r-project.org/src/base/R-3/R-3.0.2.tar.gz):
A. Function file_open, defined in src/main/connections.c:554, has the following logic related to anonymous file (with an empty description), lines 565-568:
if (strlen(con->description) == 0) {
    temp = TRUE;
    name = R_tmpnam("Rf", R_TempDir);
} else name = R_ExpandFileName(con->description);
So a new temporary filename is generated if no file name was supplied to file.
B. If the name of the file is not equal to stdin, the call R_fopen(name, con->mode) happens at line 585 (there are some subtleties with Win32 and UTF-8 names, but we can ignore them for now).
C. Finally, the file name is unlinked at line 607. The documentation for unlink says:
The unlink() function removes the link named by path from its directory and decrements the link count of the file which was referenced by the link. If that decrement reduces the link count of the file to zero, and no process has the file open, then all resources associated with the file are reclaimed. If one or more processes have the file open when the last link is removed, the link is removed, but the removal of the file is delayed until all references to it have been closed.
So, in effect, the directory entry is removed, but the file exists as long as it is kept open by the R process.
D. Finally, R_fopen is defined in src/main/sysutils.c:135 and just calls fopen internally.
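The effect of step C, where the file survives unlink() while a process holds it open, can be sketched from Python on a POSIX system (the temporary path is whatever mkstemp picks):

```python
import os
import tempfile

fd, path = tempfile.mkstemp()           # open file + directory entry
os.write(fd, b"abc\ndef\n")

os.unlink(path)                         # remove the directory entry...
assert not os.path.exists(path)         # ...the name is gone,

os.lseek(fd, 0, os.SEEK_SET)
assert os.read(fd, 8) == b"abc\ndef\n"  # ...but the open fd still reads the data

os.close(fd)                            # last reference closed: space is reclaimed
```

This is exactly the lifetime an anonymous file() connection gets: no visible name, disk-backed while open, cleaned up automatically on close.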

How to read invariant csv files using c#

I am working on a Windows application in C#. I want to read a CSV file from a directory and import it into a SQL Server database table. I can read and import the CSV data successfully when the file content is uniform, but I am unable to insert the data when a field varies in form. My CSV delimiter is tab ('\t'), and after splitting out the individual fields, I have a field that contains data like:
Name
----
xxx
xxx yyy
xx yy zz
and I retrieved the data as xxx,yyy and xx,yy,zz, so the insertion becomes a problem.
How can I insert the data into a database table uniformly?
It's pretty easy.
Just read the file line by line. There's an example on MSDN here:
How to: Read Text from a File
For each line, use the String.Split method with tab as the delimiter. Method documentation and a sample are here:
String.Split Method (Char[], StringSplitOptions)
Then insert your data.
If a CSV (or TSV) value contains a delimiter inside of it, then it should be surrounded by quotes. See the spec for more details: https://www.rfc-editor.org/rfc/rfc4180#page-3
So your input file is incorrectly formatted. If you can convince the input provider to fix this issue, that will be the best way to fix the problem. If not, other solutions may include:
visually inspecting and editing the file to fix errors, or
writing your parser program to have enough knowledge of your data expectations that it can correctly "guess" where the real delimiters are.
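As a sketch of why the quoting matters: a quote-aware parser keeps a quoted value intact even where a naive split would break it apart. Here it is with Python's csv module and a tab delimiter (the sample values mirror the question's Name column):

```python
import csv
import io

# Properly quoted TSV per RFC 4180: each multi-word value is one field.
data = 'Name\tCode\n"xxx yyy"\t1\n"xx yy zz"\t2\n'

rows = list(csv.reader(io.StringIO(data), delimiter="\t"))
# A quote-aware parser keeps each quoted value whole:
# [['Name', 'Code'], ['xxx yyy', '1'], ['xx yy zz', '2']]
```

The same behavior is available in C# via a quote-aware reader such as TextFieldParser; a plain String.Split cannot honor quotes.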
If I'm understanding you correctly, the problem is that your code is splitting on spaces instead of tabs. Given that you have read in the lines from the file, all you need to do is:
string[] fileLines; // lines read from the file
foreach (string line in fileLines)
{
    string[] lineParts = line.Split(new char[] { '\t' });
}
and then do whatever you want with each lineParts array. The \t is the tab character.
If you're also asking about writing the lines to a database, you can just import tab-delimited files with the Import/Export Wizard (assuming you're using SQL Server Management Studio, but I'm sure there are comparable ways to import with other database management software).

building excel file gives out of memory exception

I need to export a huge amount of data from an ADO.NET DataTable (which I get from a DB query) to Excel.
I tried the following approach:
1. Create an Excel object with a workbook/worksheet on the server side, and use a memory stream to write the whole document to the client side.
But this gave me an "out of memory" exception, because my memory stream was so huge.
So I replaced this with a new approach:
Writing each row from the DataTable to the client side as a comma-separated string.
That way each row is written to the client as soon as we get it, so no memory accumulates.
But this way we can only write a CSV file, not an Excel file.
Does anybody know how to handle this situation?
Can I use Silverlight to get the data row by row from the server, pass it to the client side, and build the Excel file on the client?
Try SpreadsheetGear or SmartXLS.
I'd keep the CSV approach but write to a file rather than the memory stream. After you've created the file, I'd use TransmitFile to get it to the browser. You can read more about using TransmitFile here.
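Whatever the language, the fix is the same shape: stream one row at a time to a file instead of buffering the whole document in memory. A minimal sketch in Python (the row source and output path are stand-ins for the DataTable and the server-side file):

```python
import csv

def stream_rows_to_csv(rows, path):
    """Write rows one at a time; memory use stays flat regardless of row count."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for row in rows:  # only one row is held in memory at a time
            writer.writerow(row)

# A generator stands in for the DataTable: rows are produced lazily.
stream_rows_to_csv(((i, i * i) for i in range(5)), "export.csv")
```

The finished file is then handed to the web server to send (TransmitFile in ASP.NET), so the application process never holds the whole export.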

excel export Memory exception

I need to export a huge amount of data from an ADO.NET DataTable (which I get from a DB query) to Excel.
I tried the following approach: 1. Create an Excel object with a workbook/worksheet on the server side, and use a memory stream to write the whole document to the client side.
But this gave me an "out of memory" exception, because my memory stream was so huge.
So I replaced this with a new approach:
Writing each row from the DataTable to the client side as a comma-separated string. That way each row is written to the client as soon as we get it, so no memory accumulates.
But this way we can only write a CSV file, not an Excel file.
Does anybody know how to handle this situation?
Can I use Silverlight to get the data row by row from the server, pass it to the client side, and build the Excel file on the client?
You should create it on the server, then copy it to the client in chunks.
For an example, see this answer.
If this is for Excel 2007, then the workbook is basically in the Open XML file format.
If you can format the data in your DataTable to conform to Open XML, you can save the file and then just download the entire file.
Read up on Open XML at http://msdn.microsoft.com/en-us/library/aa338205.aspx