Using Aspose.Cells, I need a CSV file with my terms wrapped in double quotes. For some reason, the terms after the first column are surrounded by triple double quotes when I save my CSV file, like: """term"""
This is my code:
WorkbookDesigner wd = new WorkbookDesigner();
wd.Workbook.Initialize();
Worksheet sht = wd.Workbook.Worksheets[0];
string value = "term";
value = string.Format("\"{0}\"", value); // wraps the value in literal quote characters; these get escaped again on save
sht.Cells[row + rowOffset, column].PutValue(value);
wd.Save(fileLoc, FileFormatType.CSV);
I'm using Aspose.Cells version 4.8.0.0
If you want to get a CSV file with your term as
"term"
then you should save it in your CSV file as
"""term"""
Please type
"""term"""
in Notepad and save it as a .csv file. Now open it in Microsoft Excel and check the value; you will find it is displayed as
"term"
Note: I work as a Developer Evangelist at Aspose.
In CSV, the escape sequence for the quote character is 2 quote characters.
You are writing the value "term" with the quotes included. Because the value contains quote characters, the library encloses it in quotes and doubles the embedded ones, so you end up with """term""".
To make this less confusing and easier to see, try writing the value some "term": blah. It should be written to the file as:
"some ""term"": blah"
Take this CSV file:
ID,NAME,VALUE
1,Blah,100
2,"Has space",200
3,"Ends with quotes"",300
4,""Surrounded with quotes"",300
It loads just fine in most statistical programs (R, SAS, etc.), but in Excel the third row is misinterpreted because the field ends with two quotation marks. Escaping the last quote as \" does not work in Excel either. The only way I have found so far is to replace the single double quote with two double quotes:
ID,NAME,VALUE
1,Blah,100
2,"Has space",200
3,"Ends with quotes""",300
4,"""Surrounded with quotes""",300
But that would render the file useless for all the other programs (R, SAS, etc.).
Is there a way to format the CSV file where strings can begin or end with the same characters as that used to surround them, such that it would work in Excel as well as commonly used statistical software?
Your second representation is the normal way to generate a CSV file, so it should be easy to work with in any software. See the RFC 4180 specification: https://www.ietf.org/rfc/rfc4180.txt
So your second example represents this data:
Obs  id  name                      value
  1   1  Blah                        100
  2   2  Has space                   200
  3   3  Ends with quotes"           300
  4   4  "Surrounded with quotes"    300
If you want to represent it as a delimited file where none of the values are allowed to contain the delimiter (in other words, NOT as a standard CSV file), then it would look like:
id,name,value
1,Blah,100
2,Has space,200
3,Ends with quotes",300
4,"Surrounded with quotes",300
But if you want to allow the values to contain the delimiter, then you need some way to distinguish embedded delimiters from real delimiters. So the standard forces values that contain the delimiter to be quoted. But once you do that, you also need to add quotes around fields that contain the quote character itself (and double the embedded quotes) to avoid producing an ambiguous file. For example, the quotes in the 4th observation of your first file look like optional quotes around the value rather than part of the value.
Many programs try to handle such ambiguous situations. For example, SAS does not allow values to contain embedded line breaks, so you will always get four observations with your first example file.
But Excel allows end-of-line characters to be embedded inside quoted values. So in your original file, the value of the second field of the third observation looks like the beginning of a quoted value that continues onto the following lines:
Ends with quotes",300
4,"Surrounded with quotes",300
So instead of four complete observations with three field values each, Excel sees only three observations, and the last one has only two field values.
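For what it's worth, the RFC 4180 form is not actually useless for R: base read.csv undoubles the quotes as expected. A quick check (the file name rfc4180.csv is just for the sketch):
# The second (RFC 4180) representation reads correctly in base R
writeLines(c('ID,NAME,VALUE',
             '1,Blah,100',
             '2,"Has space",200',
             '3,"Ends with quotes""",300',
             '4,"""Surrounded with quotes""",300'), "rfc4180.csv")
read.csv("rfc4180.csv")
#   ID                     NAME VALUE
# 1  1                     Blah   100
# 2  2                Has space   200
# 3  3        Ends with quotes"   300
# 4  4 "Surrounded with quotes"   300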
This is caused by the fact that the escape sequence for " in Excel is "": see Escaping quotes and delimiters in CSV files with Excel.
A quick and simple workaround that comes to mind in R is to first read the content of the CSV with readLines, then replace the doubled (escaped) double quotes with a single double quote, and then use read.table:
read.table(
  text = gsub(pattern = "\"\"", replacement = "\"", readLines("data.csv")),
  sep = ",",
  header = TRUE
)
I have a .csv file similar to this one:
id,text,value
1,'foo, bar',5
2,'foo',5
3,'foo"bar',4
It uses single quotes for text.
I would like to read it in with fread(). I tried fread('text.csv', quote = "\'"), but I get an 'unused argument' error. Is it possible to read the file in somehow? Can I change the default quoting character?
(Using a command-line tool such as sed to change the single quotes to double quotes also does not work, as some fields contain double quotes as well.)
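One possible workaround, sketched under the assumption that the fields never contain an embedded single quote: base read.table accepts an arbitrary quoting character (and newer data.table releases add a quote argument to fread as well).
# Base R: treat the single quote as the quoting character
df <- read.table("text.csv", sep = ",", quote = "'",
                 header = TRUE, stringsAsFactors = FALSE)
df$text
# [1] "foo, bar" "foo"      "foo\"bar"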
Suppose I have a CSV file that looks like this:
Type,ID,NAME,CONTENT,RESPONSE,GRADE,SOURCE
A,3,"","I have comma, ha!",I have open double quotes",A,""
desired output should be:
df <- data.frame(Type='A', ID=3, NAME=NA, CONTENT='I have comma, ha!',
                 RESPONSE='I have open double quotes"', GRADE='A', SOURCE=NA)
df
Type ID NAME CONTENT RESPONSE GRADE SOURCE
1 A 3 NA I have comma, ha! I have open double quotes" A NA
I tried read.csv, since the data provider uses quotes to escape the commas in strings, but they forgot to escape the double quotes in strings with no commas, so whether or not I disable quoting in read.csv I can't get the desired output.
How can I do this in R? Other package solutions are also welcome.
fread from data.table handles this just fine:
library(data.table)
fread('Type,ID,NAME,CONTENT,RESPONSE,GRADE,SOURCE
A,3,"","I have comma, ha!",I have open double quotes",A,""')
# Type ID NAME CONTENT RESPONSE GRADE SOURCE
#1: A 3 I have comma, ha! I have open double quotes" A
I'm not too sure about the structure of CSV files, but you said the data provider had escaped the comma in the text under CONTENT.
This works to read the text as is, with the " at the end:
read.csv2("Test.csv", header = T,sep = ",", quote="")
This is not valid CSV, so you'll have to do your own parsing. But, assuming the convention is as follows, you can toggle scan's quote handling per field and still take advantage of most of its abilities:
If the field starts with a quote, it is quoted.
If the field does not start with a quote, it is raw
next_field <- function(stream) {
  p <- seek(stream)                # remember the current position
  d <- readChar(stream, 1)         # peek at the field's first character
  seek(stream, p)                  # rewind to the start of the field
  if (d == "\"")                   # quoted field: let scan handle the quotes
    field <- scan(stream, "", 1, sep = ",", quote = "\"", blank.lines.skip = FALSE)
  else                             # raw field: disable quote handling
    field <- scan(stream, "", 1, sep = ",", quote = "", blank.lines.skip = FALSE)
  return(field)
}
Assuming the above convention, this is sufficient to parse the file as follows:
s <- file("example.csv", open = "rt")
header <- readLines(s, 1)
header <- scan(what = "", text = header, sep = ",")
line <- replicate(length(header), next_field(s))
setNames(as.data.frame(lapply(line, type.convert)), header)
Type ID NAME CONTENT RESPONSE GRADE SOURCE
1 A 3 NA I have comma, ha! I have open double quotes" A NA
However, in practice you might want to first write back the fields, quoting each, to another file, so you can just read.csv on the corrected format.
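A minimal sketch of that idea, reusing the header and line objects built above: write.csv quotes every character field and doubles any embedded quotes, so the rewritten file is standard CSV (the file name example_fixed.csv is illustrative).
# Re-export a standard, fully quoted copy of the parsed data
df <- setNames(as.data.frame(lapply(line, type.convert)), header)
write.csv(df, "example_fixed.csv", row.names = FALSE, na = "")
read.csv("example_fixed.csv")  # now parses with the default reader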
I have a large dataset in a DBF file and would like to export it to a CSV file.
Thanks to SO, I already managed to do that smoothly.
However, when I try to import it into R (the environment I work in), some characters get combined, making some rows much longer than they should be and consequently breaking the whole database. In the end, whenever I import the exported CSV file I get only half of the database.
I think the main problem is with quotes inside the character fields, but specifying quote="" in R didn't help (and it usually does).
I've searched for questions on how to deal with quotes when exporting from Visual FoxPro, but couldn't find an answer. I wanted to test this, but my computer throws an error stating that I don't have enough memory to complete the operation (probably due to the large database).
Any help will be highly appreciated. I've been stuck on this problem of getting from the DBF into R for long enough; I've searched everything I could and am desperately looking for a simple way to import a large DBF into my R environment without bugs.
(In R: I checked whether the imported file has problems, and indeed most columns contain far more characters than they should, while the number of rows has halved. Reading the file with read.csv("file.csv", quote="") didn't help. Reading with data.table::fread() returns the error
Expected sep (',') but '0' ends field 88 on line 77980:
But according to verbose=TRUE, this function sees the right number of rows (read.csv imports only about 1.5 million rows):
Count of eol after first data row: 2811729 Subtracted 1 for last eol
and any trailing empty lines, leaving 2811728 data rows
When exporting with TYPE DELIMITED, you have some control on the VFP side over how the export formats the output file.
To change the character that surrounds fields from quotes to, say, a pipe character, you can do:
copy to myfile.csv type delimited with "|"
so that will produce something like:
|A001|,|Company 1 Ltd.|,|"Moorfields"|
You can also change the separator from a comma to another character:
copy to myfile.csv type delimited with "|" with character "#"
giving
|A001|#|Company 1 Ltd.|#|"Moorfields"|
That may help in parsing on the R side.
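On the R side, such a file can then be read by telling read.table that the pipe is the quoting character. A sketch, assuming the data contains no embedded pipe characters (myfile.csv as in the example above):
# Comma-separated fields, each surrounded by pipes
df <- read.table("myfile.csv", sep = ",", quote = "|",
                 header = FALSE, stringsAsFactors = FALSE)
df
#     V1             V2           V3
# 1 A001 Company 1 Ltd. "Moorfields"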
There are three ways to delimit a string in VFP: the normal single and double quote characters, plus square brackets. So to strip quotes out of character fields myfield1 and myfield2 in your DBF file, you could do this in the Command Window:
close all
use myfile
copy to mybackupfile
select myfile
replace all myfield1 with chrtran(myfield1,["'],"")
replace all myfield2 with chrtran(myfield2,["'],"")
and repeat for other fields and tables.
You might have to write code to do the export, rather than simply using the COPY TO ... DELIMITED command.
SELECT thedbf
mfld_cnt = AFIELDS(mflds)
fh = FOPEN(m.filename, 1)  && 1 = write-only; use FCREATE(m.filename) first if the file does not yet exist
SCAN
    FOR aa = 1 TO mfld_cnt
        mcurfld = 'thedbf.' + mflds[aa, 1]
        mvalue = &mcurfld
        ** Or you can use:
        mvalue = EVAL(mcurfld)
        ** Manipulate the contents of mvalue, possibly based on the field type
        DO CASE
        CASE mflds[aa, 2] = 'D'
            mvalue = DTOC(mvalue)
        CASE mflds[aa, 2] $ 'CM'
            ** Replace characters that are giving you problems in R
            mvalue = STRTRAN(mvalue, ["], '')
        OTHERWISE
            ** Etc.
        ENDCASE
        = FWRITE(fh, mvalue)
        IF aa # mfld_cnt
            = FWRITE(fh, [,])
        ENDIF
    ENDFOR
    = FWRITE(fh, CHR(13) + CHR(10))
ENDSCAN
= FCLOSE(fh)
Note that I'm using [ ] characters to delimit strings that include commas and quotation marks. That helps readability.
* Create a comma-delimited file with no quotes around the character fields
* (WITH "" is two double quote characters, i.e. an empty field delimiter).
COPY TO myfile.csv TYPE DELIMITED WITH ""
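If the quotes are stripped on the VFP side as above, the R import no longer needs any quote handling. A sketch, assuming the character fields contain no embedded commas (COPY TO ... TYPE DELIMITED writes no header row):
# Read the quote-free export with R's quote processing disabled
df <- read.csv("myfile.csv", header = FALSE, quote = "",
               stringsAsFactors = FALSE)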
When I try to read a CSV file using data.table::fread(fn, sep='\t', header=T), it gives an 'Unbalanced " observed on this line' error. The data has 3 integer variables and 1 string variable. The strings in the CSV file are not enclosed in ", and yes, there are some lines where the string variable contains " characters that are not in pairs.
I am wondering: is it possible to have fread just ignore the unpaired " in this variable and continue reading the data? Thanks.
Here is the sample data (just one record):
N_ID VISIT_DATE REQ_URL REQType
175931 2013-3-8 23:40:30 http://aaa.com/rest/api2.do?api=getSetMobileSession&data={"imei":"60893ZTE-CN13cd","appkey":"android_client","content":"Z0JiRA0qPFtWM3BYVltmcx5MWF9ZS0YLdW1ydXoqPycuJS8idXdlY3R0TGBtU 1
UPDATE: Now implemented in v1.8.11
From NEWS:
fread now accepts quotes (both ' and ") in the middle of fields,
whether the field starts with " or not, rather than the 'unbalanced
quotes' error, #2694. Thanks to baidao for reporting. It was known and
documented at the top of ?fread (text now removed). If a field starts
with " it must end with " (necessary if the field separator itself is in the
field contents). Embedded quotes can be in column names too. Newlines (\n)
still can't be in quoted fields or quoted column names, yet.
Yes, as @agstudy said, embedded quotes are a known, documented problem, not yet handled since fread is new. Strictly speaking, I suppose these ones aren't embedded, because the string in your example doesn't start with a quote.
Anyway, I've filed this as a bug report so it doesn't get forgotten. To be done in the next release. Thanks for highlighting.
#2694: Strings including quotes but not starting with quote in fread
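A quick way to confirm the fixed behaviour (a sketch requiring data.table 1.8.11 or later; the sample line is simplified from the question):
library(data.table)
# A stray " inside an unquoted field no longer triggers the error
fread('N_ID\tREQ_URL\tREQType\n175931\thttp://aaa.com/api?data={"imei\t1', sep = "\t")
#      N_ID                        REQ_URL REQType
# 1: 175931 http://aaa.com/api?data={"imei       1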