I need to set a custom thousands separator in PHPExcel for all cells. Following this site https://exceljet.net/custom-number-formats I wrote the pattern # ##0.00 € and it works for thousands, BUT the millions are not separated correctly. For example, if the input is 1 000 000 000 the result is 1000000 000.00 €. For now I have made the pattern # ### ### ### ##0.00 € but it is not clean. Is there a way to set the separator for thousands, millions, and so on?
Take this CSV file:
ID,NAME,VALUE
1,Blah,100
2,"Has space",200
3,"Ends with quotes"",300
4,""Surrounded with quotes"",300
It loads just fine in most statistical programs (R, SAS, etc.), but in Excel the third row is misinterpreted because it has two quotation marks. Escaping the last quote as \" will not work in Excel either. The only way I have found so far is to replace the single double quote with two double quotes:
ID,NAME,VALUE
1,Blah,100
2,"Has space",200
3,"Ends with quotes""",300
4,"""Surrounded with quotes""",300
But that would render the file completely useless for all other programs (R, SAS, etc.)
Is there a way to format the CSV file so that strings can begin or end with the same character as the one used to surround them, such that it works in Excel as well as in commonly used statistical software?
Your second representation is the normal way to generate a CSV file and so should be easy to work with in any software. See the RFC 4180 specifications. https://www.ietf.org/rfc/rfc4180.txt
So your second example represents this data:
Obs id name value
1 1 Blah 100
2 2 Has space 200
3 3 Ends with quotes" 300
4 4 "Surrounded with quotes" 300
If you want to represent it as a delimited file where none of the values are allowed to contain the delimiter (in other words, NOT as a standard CSV file), then it would look like:
id,name,value
1,Blah,100
2,Has space,200
3,Ends with quotes",300
4,"Surrounded with quotes",300
But if you want to allow the values to contain the delimiter, then you need some way to distinguish embedded delimiters from real delimiters. So the standard forces values that contain the delimiter to be quoted. But once you do that, you also need to add quotes around fields that contain the quote character itself (and double the embedded quotes) to avoid producing an ambiguous file. For example, the quotes in the 4th observation in your first file look like optional quotes around a value rather than part of the value.
Many programs try to handle ambiguous situations. For example, SAS does not allow values to contain embedded line breaks, so you will always get four observations from your first example file.
But Excel allows the end-of-line character(s) to be embedded inside quoted values. So in your original file, the value of the second field in the third observation looks like the start of what you would get if you added quotes around this value:
Ends with quotes",300
4,"Surrounded with quotes",300
So instead of four complete observations with three field values each, there are only three observations, and the last observation has only two field values.
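For what it's worth, R's own reader already understands the doubled-quote convention, so the second (RFC 4180) file needs no preprocessing at all. A quick check, using read.csv's text argument to keep the example self-contained:
# R parses doubled quotes inside quoted fields natively
read.csv(text = 'ID,NAME,VALUE
1,Blah,100
2,"Has space",200
3,"Ends with quotes""",300
4,"""Surrounded with quotes""",300')
##   ID                     NAME VALUE
## 1  1                     Blah   100
## 2  2                Has space   200
## 3  3        Ends with quotes"   300
## 4  4 "Surrounded with quotes"   300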
The behavior you are seeing is caused by the fact that the escape character for " in Excel is "": see Escaping quotes and delimiters in CSV files with Excel
A quick and simple workaround that comes to mind in R is to first read the content of the CSV with readLines, then replace the doubled (escaped) double quotes with a single double quote, and then use read.table:
read.table(
  # collapse each doubled quote into a single one before parsing
  text = gsub(pattern = "\"\"", replacement = "\"", readLines("data.csv")),
  sep = ",",
  header = TRUE
)
I have some .ttl files with doubles and floats that use . (point) as the decimal separator.
Is it possible to change the decimal separator to a , (comma) when loading into OpenLink Virtuoso v07.20.3213?
Turtle relies on XML Schema Datatypes, in which the only valid decimal separator is the dot.
Subsequent (re)presentation of these values may vary based on locale (which may change the decimal separator to comma and/or add a thousands separator), but that seems like a different question...
(Note that v07.20.3213 is rather elderly, as of this writing; updating to current v7.20.3217 or later is recommended for all users, whether Open Source or Commercial Edition.)
(ObDisclaimer: I work for OpenLink Software, producer of Virtuoso.)
If a CSV file's structure differs from the default CSV settings, the loader will look for a configuration file with the same name as the CSV file and a .cfg filename extension. This file should contain parameters similar to those below, indicating the CSV file's structure:
[csv]
csv-delimiter=<delimiter char>
csv-quote=<quote char>
header=<zero based header offset>
offset=<zero based data offset>
Invisible "tab" and "space" delimiters should be specified by those names, without the quotation marks.
Other delimiter characters (comma, period, etc.) should simply be typed in.
"Smart" quotation marks which differ at start and end (including but not limited to « », ‹ ›, “ ”, and ‘ ’) are not currently supported.
Example
Consider loading a gzipped CSV file, csv-example.csv.gz, with the non-default CSV structure below:
'Southern North Island wood availability forecast for the period 2008-2040'
'Table 14: Wood availability and average clearfell age for other species in Eastern Southern North Island'
'Year ending'	'Recoverable volume'	'Average age'
'December'	'(000 m3 i.b.)'	'(years)'
2006	0	0
2007	0	0
2008	48	49
2009	45	46
...
In this example:
the header is on the third line (#2, zero-based);
the data starts on the fifth line (#4, zero-based);
the delimiter is a tab;
the quote char is the single quote, or apostrophe.
Loading this file requires the creation of a configuration file, csv-example.cfg, containing the entries:
[csv]
csv-delimiter=tab
csv-quote='
header=2
offset=4
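The same offsets can be sanity-checked outside Virtuoso; here is a minimal sketch in R, assuming the file has first been gunzipped to csv-example.csv:
# header row is zero-based line 2 (skip the two title lines),
# data starts at zero-based line 4 (skip titles, header, and units rows)
hdr <- scan("csv-example.csv", what = character(), sep = "\t",
            quote = "'", skip = 2, nlines = 1)
dat <- read.table("csv-example.csv", sep = "\t", quote = "'",
                  skip = 4, col.names = hdr)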
I have a data.frame that looks like this:
a=data.frame(c("MARCH3","SEPT9","XYZ","ABC","NNN"),c(1,2,3,4,5))
> a
c..MARCH3....SEPT9....XYZ....ABC....NNN.. c.1..2..3..4..5.
1 MARCH3 1
2 SEPT9 2
3 XYZ 3
4 ABC 4
5 NNN 5
Write into csv: write.csv(a,"test.csv")
I want everything to stay the way it is, but MARCH3 and SEPT9 become 3-Mar and 9-Sep. I have tried everything in Excel: formatting as date, as text, custom... none works. 3-Mar gets converted to 42066 and 9-Sep to 42256. In reality, a is a fairly large table, so this can't even be done manually. Is there a way to coerce a[,1] so that Excel leaves its format alone?
The best way to prevent Excel from autoformatting would probably be to store the data as an Excel file:
library(xlsx)
write.xlsx(a, "test.xlsx")
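Because xlsx writes typed cells instead of text that Excel has to re-parse, the gene names survive the round trip. A quick check (a sketch using the same package):
library(xlsx)
write.xlsx(a, "test.xlsx", row.names = FALSE)
b <- read.xlsx("test.xlsx", sheetIndex = 1)
b[1:2, 1]   # still "MARCH3" and "SEPT9", not dates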
Your best bet is probably to change the file extension (e.g. make it ".txt" or ".dat" or something like that). When you open such a file in Excel the text import wizard will open. Specify that the file is delimited with commas, then make sure to change the appropriate column from "General" to "Text".
As an example: looking at the data in the question it appears that your CSV file might look like
,,,,MARCH3,,,,1
,,,,SEPT9,,,,2
,,,,XYZ,,,,3
,,,,ABC,,,,4
,,,,NNN,,,,5
If I save this file with a ".csv" extension and open it in Excel I get:
3-Mar 1
9-Sep 2
XYZ 3
ABC 4
NNN 5
with the date values changed as you noted. When I change the file extension to ".dat", making no other changes to the file, and open it in Excel, I'm presented with the Text Import Wizard. I tell Excel that the file is "Delimited", choose "Comma" as the delimiter, and in the column with the "MARCH3" and "SEPT9" values I change the Column Data Type to "Text" (instead of "General"). After I click the Finish button on the wizard I get the following data in the spreadsheet:
MARCH3 1
SEPT9 2
XYZ 3
ABC 4
NNN 5
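If you're generating the file from R as in the question, writing it under the alternate extension is a one-liner (a sketch; the file name is arbitrary):
write.csv(a, "test.dat", row.names = FALSE)   # same CSV content, .dat extension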
I tried putting the MARCH3 and SEPT9 values in double-quotes to see if that would convince Excel to treat these values as text but Excel still converted these cells to dates.
Share and enjoy.
My solution was to append a semicolon to all the gene names. The added character convinces Excel that the column is text, not a date. You can find and replace the semicolon later if you want, but most programs (like Perseus) will allow you to ignore everything after the semicolon, so it's not always a problem...
df$Gene.name <- paste(df$Gene.name, ";", sep="")
I would be interested if anyone has a trick for doing this to just the Sept and March gene names, though...
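One possibility is to target only the date-like symbols. A minimal sketch, where the regex is an assumption to be extended to whatever month-like gene families occur in your data:
# append ";" only where Excel would read the gene name as a date
datelike <- grepl("^(MARCH|SEPT|SEP|OCT|DEC)[0-9]+$", df$Gene.name, ignore.case = TRUE)
df$Gene.name[datelike] <- paste0(df$Gene.name[datelike], ";")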
I have a data set (about 500,000 rows * 20 cols). Most of the rows are space-separated, but some outlier rows are not (bad records, I guess). I am trying to load the data into R using fread(), but it always throws an error at me because of the rows that are not separated properly -
error message -
Expected sep (' ') but '
' ends field 1 on line 247172 when detecting types: 1128=99=55035=d49=CME34=410252=2014121417033281615=USD22=848=120255=HI107=LAXX9-MIAX9200=201911202=0207=XCME461=FMAXSX462=2555=2600=[N/A]602=354603=8623=1624=1600=[N/A]602=222603=8623=1624=2562=1731=1762=IS827=2864=2865=5866=201411241145=223000000865=7866=201911251145=200000000870=5871=24872=1871=24872=3871=24872=4871=24872=11871=24872=14947=USD969=20996=CTRCT1140=9991141=21022=GBX264=51022=GBI264=21142=T1143=4001144=31146=01147=01150=37801151=LAX1180=131300=705796=201412129787=0.019850=010=101
Is there a way to skip these records?
Thanks.
That's not a particularly huge file. Try something along these lines:
table( count.fields("path/to/file/filename.txt", quote="", sep=" ") )
This will tabulate the number of fields seen on each line, which exposes the "deformed lines". If they are not too frequent, then you should just edit the file with an ordinary text editor. If a pure R solution is needed, then use readLines to bring the file into the R workspace and gregexpr to count the spaces in each line.
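A sketch of that pure-R route, assuming well-formed rows have exactly 20 space-separated fields (19 separators):
lines <- readLines("path/to/file/filename.txt")
# count the separators on each line (gregexpr returns -1 when there is no match)
nsep <- vapply(gregexpr(" ", lines, fixed = TRUE),
               function(m) sum(m > 0), integer(1))
good <- lines[nsep == 19]                       # keep only the well-formed rows
dat <- data.table::fread(paste(good, collapse = "\n"), sep = " ")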
I'm reading a CSV file in R that includes a conversion ID column. The issue I'm running into is that my conversion ID is being rounded and shown as an exponential number. Below is a snapshot of the CSV file (opened in Excel) that I'm reading into R. As you can see, the conversion ID is in exponential format, but the value is: 383305820480.
When I read the data into R using the following lines, I get the following output, which looks like it's rounding the conversion IDs.
x<-read.csv("./Test2.csv")
options("scipen"=100, "digits"=15)
x
When I export the file as CSV, using the code
write.csv(x,"./Test3.csv")
I get the following output. As you can see, I no longer have a unique identifier, because it rounds the number.
I also tried reading the column in as character, using the code below, but I get the same output with the numbers rounded. I need the Conversion.ID to be a unique identifier.
x<-read.csv("./Test2.csv", colClasses="character")
The only way I can get the Conversion ID column to stay as a unique identifier is to open the CSV file and write a ' in front of each conversion ID. That is not scalable because I have hundreds of files.
I can't replicate your experience.
(Update: OP reports that the problem is actually with Excel converting/rounding the data on import [!!!])
I created a file on disk with full precision (I don't know the least-significant digits of your data, you didn't show them except for the first element, but I put a non-zero value in the units place for illustration):
writeLines(c(
"Conversion ID",
" 383305820480",
" 39634500000002",
" 213905000000002",
"1016890000000002",
"1220910000000002"),
con="Test2.csv")
Read the file and print it with full precision (use check.names=FALSE for perfect "round trip" capability -- not something you want to do on a regular basis):
x <- read.csv("Test2.csv",check.names=FALSE)
options(scipen=100)
print(x,digits=20)
## Conversion ID
## 1 383305820480
## 2 39634500000002
## 3 213905000000002
## 4 1016890000000002
## 5 1220910000000002
Looks OK.
Now write output (use row.names=FALSE to avoid adding row names/allow a clean round-trip):
write.csv(x,"Test3.csv",row.names=FALSE,quote=FALSE)
The least-mediated way to examine a file on disk from within R is file.show():
file.show("Test3.csv")
## Conversion ID
## 383305820480
## 39634500000002
## 213905000000002
## 1016890000000002
## 1220910000000002
x3 <- read.csv("Test3.csv",check.names=FALSE)
all.equal(x,x3) ## TRUE
Use system tools to check that the files are the same (except for white space differences -- the original file was right-justified):
system("diff -w Test2.csv Test3.csv") ## no difference
If you have even longer ID strings you will need to read them as character to avoid loss of precision:
read.csv("Test2.csv",colClasses="character")
## Conversion.ID
## 1 383305820480
## 2 39634500000002
## 3 213905000000002
## 4 1016890000000002
## 5 1220910000000002
You could probably round-trip through Excel more safely (if you still think that's a good idea) by importing as character and exporting with quotation marks to protect the values.
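That safer variant is only a small change to the code above (a sketch; whether Excel honors the quoting on re-import still depends on the Excel version):
x <- read.csv("Test2.csv", colClasses = "character", check.names = FALSE)
write.csv(x, "Test3.csv", row.names = FALSE, quote = TRUE)   # quote the IDs on the way out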
I just figured out the issue. It looks like my version of Excel is converting the data, causing it to lose digits. If I avoid opening the file in Excel after downloading it, it retains all the digits. I'm not sure if this is a known issue with newer versions. I'm using Excel Office Professional Plus 2013.