May I use a comma as the decimal separator in SQLite?

I imported TSV files using the Firefox SQLite Manager, but the decimal separator is a comma, and the math functions are ignoring the decimal part of the values. Could you help me?
Thank you

I haven't used the Firefox SQLite Manager, but I have used SQLiteman, and with that GUI it was a problem to work with a comma as the decimal mark (if your database is small, perhaps you can replace those commas using a spreadsheet or something like that...). In my case, as my database was really large, the solution was to use the raw SQLite shell: I imported a CSV file (separated by ;) with commas as the decimal mark perfectly. Additionally, the importing process was faster. Cheers.
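For reference, a minimal sketch of that shell-based route; the file, table and column names here are made up for illustration. The values arrive as text, so the comma can be swapped for a dot and the column cast to a number afterwards:
.mode csv
.separator ";"
.import data.csv measurements
UPDATE measurements SET value = CAST(REPLACE(value, ',', '.') AS REAL);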

Related

readcsv fails to read # character in Julia

I've been using asd=readcsv(filename) to read a csv file in Julia.
The first row of the csv file contains strings which describe the column contents; the rest of the data is a mix of integers and floats. readcsv reads the numbers just fine, but only reads the first 4+1/2 string entries.
After that, it renders "". If I ask the REPL to display asd[1,:], it tells me it is 1x65 Array{Any,2}.
The fifth column in the first row of the csv file (this seems to be the entry it chokes on) is APP #1 bias voltage [V]; but asd[1,5] is just APP . So it looks to me as though readcsv has choked on the "#" character.
I tried using the quotes=false keyword in readcsv, but it didn't help.
I used to use xlsread in Matlab and it worked fine.
Has anybody out there seen this sort of thing before?
The comment character in Julia is #, and it also applies when reading delimited text files.
But luckily, the readcsv() and readdlm() functions have an optional argument to help in these situations.
You should try readcsv(filename; comment_char = '/').
Of course, the example above assumes that you don't have any / characters in your first line. If you do, then you'll have to change that / above to something else.
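In current Julia versions readcsv has been absorbed into the DelimitedFiles standard library, where comment handling can simply be switched off instead of remapped. A minimal sketch, assuming the same kind of file (the file name is illustrative):
using DelimitedFiles
# comments=false tells readdlm not to treat '#' as the start of a comment
asd = readdlm("data.csv", ','; comments=false)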

CSV file is not recognized

I'm exporting an Excel file to a .csv file (because I want to import it into R), but R doesn't recognize it.
I think this is because when I open it in notepad I get:
Item;Description
1;ja
2;ne
While a file which does not have any import issues is structured like this in notepad:
"Item","Description"
"1","ja"
"2","ne"
Does anybody know how I can either export it from Excel in the right format or import a csv file with a ";" separator into R?
It's easy to deal with semicolon-delimited files; you can use read.csv2() instead of read.csv() (although be aware this will also use comma as the decimal separator character!), or specify sep=";".
Sorry to ask, but did you try reading ?read.csv ? The relevant information is in there, although it might admittedly be a little overwhelming/hard to sort out if you're new to R:
sep: the field separator character. Values on each line of the file are separated by this character. If ‘sep = ""’ (the default for ‘read.table’) the separator is ‘white space’, that is one or more spaces, tabs, newlines or carriage returns.
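A short sketch of both routes mentioned above (the file name is illustrative); note the dec argument, which matters if the numbers in the file also use a comma as the decimal mark:
# read.csv2 assumes sep = ";" and dec = ","
d1 <- read.csv2("export.csv")
# read.csv with an explicit separator keeps dec = "." unless told otherwise
d2 <- read.csv("export.csv", sep = ";", dec = ".")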

PHPExcel converts dot to comma after writing

I generate an Excel file by importing a CSV file. The CSV contains the following numbers
4.0238484
5.3833888
dot separated
But when I write the Excel file, the column shows me the numbers in the following format
4,0238484
5,3833888
I want the dot instead of the comma.
How can I achieve that?
PHPExcel Version 1.7.7
In PhpSpreadsheet (the next generation of PHPExcel) you can avoid it like this:
$sheet->setCellValueExplicitByColumnAndRow(__COL_INDEX__, __ROW_INDEX__, __SOME_DATA__, DataType::TYPE_STRING);
setCellValueExplicitByColumnAndRow forces your data to be stored as a string.
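For context, a minimal self-contained sketch of that call (the column index, row and value are illustrative):
<?php
require 'vendor/autoload.php';

use PhpOffice\PhpSpreadsheet\Spreadsheet;
use PhpOffice\PhpSpreadsheet\Cell\DataType;

$spreadsheet = new Spreadsheet();
$sheet = $spreadsheet->getActiveSheet();
// store the raw CSV value as text so Excel's locale rules cannot reformat it
$sheet->setCellValueExplicitByColumnAndRow(1, 1, '4.0238484', DataType::TYPE_STRING);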
Check the locale settings for the version of MS Excel that you are using to view the generated file: if it's set to a locale that uses a decimal comma rather than a decimal point, then this is what you will see. Floating point numbers in PHPExcel are managed with a decimal point (PHP doesn't offer any alternative for numbers), but MS Excel has its own formatting rules based on locale.

repair data in csv file

I have a huge csv file, separated by commas, and I want to do an analysis with glm in R.
In one column there is data with an embedded comma, something like: bla,blabla
When reading the file into R with read.csv.sql I get an error message:
RS-DBI driver: (RS_sqlite_import: ./agp.csv line 47612 expected 37 columns of data but found 38)
This is due to the 'extra' comma in some of the data; not every row of that column has the extra comma.
How can I fix this? I want to remove this extra superfluous comma.
Thanks for the reaction,
André
The CSV format is very simple and can easily be hand edited. In order to include a comma in a value, you must surround the value with quotes. Try this: "bla,blabla". If that data happens to contain any quotes, e.g. blah,"thequotedblah",blah, those quotes need to be escaped with another quote, like this: "blah,""thequotedblah"",blah".
Although there is no official standard around it, there isn't much to the CSV format. Wikipedia has a great CSV reference that I have personally used to implement CSV support in applications. Spend 5-10 minutes reading it and you'll know everything you ever need to know to manually create/read/repair CSV data.
Is it just this one line that contains a non-quoted comma - or are there several such lines? Editing the .csv with an editor that can handle large files (e.g. Ultraedit) to sanitize that one record would certainly help. Asaph's suggestion of quoting is also a good 'un.
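If you would rather locate the offending lines programmatically before editing, here is a small sketch in base R (the expected column count is taken from the error message above):
# count the fields on every line; rows with an unquoted extra comma will not have 37
n <- count.fields("agp.csv", sep = ",", quote = "")
which(n != 37)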

Fixing Unicode Byte Sequences

Sometimes when copying stuff into PostgreSQL I get errors that there are invalid byte sequences.
Is there an easy way, using either vim or other utilities, to detect the byte sequences that cause errors such as invalid byte sequence for encoding "UTF8": 0xde70 and whatnot, and possibly an easy way to do a conversion?
Edit:
What my workflow is:
Dumped sqlite3 database (from trac)
Trying to replay it in postgresql
Perhaps there's an easier way?
More Edit:
Also tried these:
Running enca to detect encoding of the file
Told me it was ASCII
Tried iconv to convert from ASCII to UTF8. Got an error
What did work was deleting the couple of erroneous lines that it complained about, but that didn't really solve the real problem.
Based on one short sentence, it sounds like you have text in one encoding (e.g. ANSI/ASCII) and you are telling PostgreSQL that it's actually in another encoding (Unicode UTF8). All the different tools you would be using: PostgreSQL, Bash, some programming language, another programming language, other data from somewhere else, the text editor, the IDE, etc., all have default encodings which may be different, and some step of the way, the proper conversions are not being done. I would check the flow of data where it crosses these kinds of boundaries, to ensure that either the encodings line up, or the encodings are properly detected and the text is properly converted.
If you know the encoding of the dump file, you can convert it to utf-8 by using recode. For example, if it is encoded in latin-1:
recode latin-1..utf-8 < dump_file > new_dump_file
If you are not sure about the encoding, you should see how sqlite was configured, or maybe try some trial-and-error.
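If the encoding really is unknown, one generic way to at least locate or strip the bad bytes is to run the dump through iconv (a sketch, not specific to the trac dump):
# stops at the first invalid sequence and reports where it is
iconv -f UTF-8 -t UTF-8 dump_file -o /dev/null
# or drop anything that cannot be represented (-c discards invalid characters)
iconv -f UTF-8 -t UTF-8 -c dump_file > cleaned_dump_file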
I figured it out. It wasn't really an encoding issue.
SQLite's output escaped strings differently than Postgres expects. There were some cases where 'asdf\xd\foo' was output. I believe the '\x' was causing Postgres to expect the following characters to be a Unicode escape.
The solution was to dump each table individually in CSV mode in sqlite3.
First
sqlite3 db/trac.db .schema | psql
Now, this does the trick, for the most part, to copy the data back in:
# extract the table names from the schema, then copy each table over as CSV
for table in $(sqlite3 db/trac.db .schema | grep TABLE | sed 's/.*TABLE \(.*\) (/\1/')
do
  # the dot-command and the query must reach sqlite3 on separate lines, hence printf rather than a plain echo
  printf '.mode csv\nselect * from %s;\n' "$table" | sqlite3 db/trac.db | psql -c "copy $table from stdin with csv"
done
Yeah, kind of a hack, but it works.
