R: accented characters in data frame

I'm confused about why certain characters (e.g. "Ě", "Č", and "ŝ") lose their diacritical marks in a data frame, while others (e.g. "Š" and "š") do not. My OS is Windows 10, by the way. In my sample code below, a vector czechvec holds 11 single-character strings, all Slavic accented characters, and R displays them properly. Then a data frame mydf is created with czechvec as the second column (the function I() is used so it won't be converted to a factor). But when R displays mydf, or any row of mydf, it converts most of these characters to their plain-ASCII equivalents; e.g. mydf[3,] shows the character as "E", not "Ě". Yet when subscripting with both row and column, e.g. mydf[3,2], R properly shows the accented character ("Ě"). Why should it make a difference whether R displays a whole row or just one cell? And why are some characters, like "Š", completely unaffected? Also, when I write this data frame to a file, it loses the accents entirely, even though I specify fileEncoding="UTF-8".
> charvals <- c(193, 269, 282, 268, 262, 263, 348, 349, 350, 352, 353)
> hexvals <- as.hexmode(charvals)
> czechvec <- unlist(strsplit(intToUtf8(charvals), ""))
> czechvec
[1] "Á" "č" "Ě" "Č" "Ć" "ć" "Ŝ" "ŝ" "Ş" "Š" "š"
>
> mydf = data.frame(dec=charvals, char=I(czechvec), hex=I(format(hexvals, width=4, upper.case=TRUE)))
> mydf
dec char hex
1 193 Á 00C1
2 269 c 010D
3 282 E 011A
4 268 C 010C
5 262 C 0106
6 263 c 0107
7 348 S 015C
8 349 s 015D
9 350 S 015E
10 352 Š 0160
11 353 š 0161
> mydf[3,2]
[1] "Ě"
> mydf[3,]
dec char hex
3 282 E 011A
>
> write.table(mydf, file="myfile.txt", fileEncoding="UTF-8")
>
> df2 <- read.table("myfile.txt", stringsAsFactors=FALSE, fileEncoding="UTF-8")
> df2[3,2]
[1] "E"
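One way to probe what is happening (my own diagnostic sketch, not part of the original transcript) is to check the declared encodings and ask which characters survive conversion to the native code page:

```r
# The strings are marked UTF-8, but printing a data frame goes through
# the native locale; characters the Windows code page lacks get
# transliterated to their plain-ASCII lookalikes.
Encoding(czechvec)                              # "UTF-8" for each element
czechvec[is.na(iconv(czechvec, "UTF-8", ""))]   # characters the locale cannot represent
```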
Edited to add: Per Ernest A's answer, this behaviour is not reproducible in Linux. It must be a Windows issue. (I'm using R 3.4.1 for Windows.)

I cannot reproduce this behaviour, using R version 3.3.3 (Linux).
> data.frame(dec=charvals, char=I(czechvec), hex=I(format(hexvals, width=4, upper.case=TRUE)))
dec char hex
1 193 Á 00C1
2 269 č 010D
3 282 Ě 011A
4 268 Č 010C
5 262 Ć 0106
6 263 ć 0107
7 348 Ŝ 015C
8 349 ŝ 015D
9 350 Ş 015E
10 352 Š 0160
11 353 š 0161

Thanks to Ernest A's answer confirming that the weird behaviour I observed does not occur in Linux, I Googled R WINDOWS UTF-8 BUG, which led me to this article by Ista Zahn: Escaping from character encoding hell in R on Windows
The article confirms there is a bug in the data.frame print method on Windows, and gives some workarounds. (However, the article doesn't note the issue with write.table in Windows, for data frames with foreign-language text.)
One workaround suggested by Zahn is to change the locale to suit the particular language we are working with:
Sys.setlocale(category = "LC_CTYPE", locale = "czech")
charvals <- c(193, 269, 282, 268, 262, 263, 348, 349, 350, 352, 353)
hexvals <- format(as.hexmode(charvals), width=4, upper.case=TRUE)
df1 <- data.frame(dec=charvals, char=I(unlist(strsplit(intToUtf8(charvals), ""))), hex=I(hexvals))
print.listof(df1)
dec :
[1] 193 269 282 268 262 263 348 349 350 352 353
char :
[1] "Á" "č" "Ě" "Č" "Ć" "ć" "Ŝ" "ŝ" "Ş" "Š" "š"
hex :
[1] "00C1" "010D" "011A" "010C" "0106" "0107" "015C" "015D" "015E" "0160"
[11] "0161"
df1
dec char hex
1 193 Á 00C1
2 269 č 010D
3 282 Ě 011A
4 268 Č 010C
5 262 Ć 0106
6 263 ć 0107
7 348 S 015C
8 349 s 015D
9 350 Ş 015E
10 352 Š 0160
11 353 š 0161
Notice that the Czech characters are now displayed correctly but not "Ŝ" and "ŝ", Unicode U+015C and U+015D, which apparently are used in Esperanto. But with the print.listof command, all the characters are displayed correctly. (By the way, dput(df1) lists the Esperanto characters incorrectly, as "S" and "s".)
write.table(df1, file="special characters example.txt", fileEncoding="UTF-8")
df2 <- read.table("special characters example.txt", stringsAsFactors=FALSE, fileEncoding="UTF-8")
print.listof(df2)
dec :
[1] 193 269 282 268 262 263 348 349 350 352 353
char :
[1] "Á" "č" "Ě" "Č" "Ć" "ć" "S" "s" "Ş" "Š" "š"
hex :
[1] "00C1" "010D" "011A" "010C" "0106" "0107" "015C" "015D" "015E" "0160"
[11] "0161"
When I write.table df1 and then read.table it back as df2, the "Ŝ" and "ŝ" characters have lost their circumflex. This must be a problem with the write.table command, as confirmed when I open the file with a different application such as OpenOffice Writer. The Czech characters are all there correctly, but the "Ŝ" and "ŝ" have been changed to "S" and "s".
For the time being, the best workaround for my purposes is, instead of putting the actual character in my data frame, to record its Unicode value, write that with write.table, and then use the UNICHAR function in OpenOffice Calc to add the character itself to the file. But this is inconvenient.
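That interim workaround looks roughly like this (a sketch; UNICHAR is a spreadsheet function, not R):

```r
# Store code points instead of characters; the spreadsheet rebuilds them.
charvals <- c(193, 269, 282, 268, 262, 263, 348, 349, 350, 352, 353)
df_codes <- data.frame(dec = charvals,
                       hex = format(as.hexmode(charvals), width = 4, upper.case = TRUE))
write.table(df_codes, "codes.txt", fileEncoding = "UTF-8")
# In OpenOffice Calc, =UNICHAR(A2) then turns each decimal value back
# into its character.
```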
I believe this same bug is relevant to this question: how to read data in utf-8 format in R?
Edited to add: Other similar questions I've now found on Stack Overflow:
Why do some Unicode characters display in matrices, but not data frames in R?
UTF-8 file output in R
Write UTF-8 files from R
And I found a workaround for the display issue by Peter Meissner here:
http://r.789695.n4.nabble.com/Unicode-display-problem-with-data-frames-under-Windows-tp4707639p4707667.html
It involves defining your own class unicode_df and print function print.unicode_df.
This still does not solve the issue I have with using write.table to write my data frame (which contains some columns with text in a variety of European languages) to a file that can be imported to a spreadsheet or any arbitrary application. But perhaps Meissner's solution can be adapted to work with write.table.

Here's a function write.unicode.csv that uses paste and writeLines (with useBytes=TRUE) to export a data frame containing foreign-language characters (encoded in UTF-8) to a csv file. All cells in the data frame will be enclosed in quote marks in the csv file.
#function that will create a CSV file for a data frame containing Unicode text
#this can be used instead of write.csv in R for Windows
#source: https://stackoverflow.com/questions/46137078/r-accented-characters-in-data-frame
#this is not elegant, and probably not robust
write.unicode.csv <- function(mydf, filename = "") { # mydf can be a data frame or a matrix
  linestowrite <- character(length = 1 + nrow(mydf))
  # first line will have the column names
  linestowrite[1] <- paste('"","', paste(colnames(mydf), collapse = '","'), '"', sep = "")
  # a bit of error checking
  if (nrow(mydf) < 1 | ncol(mydf) < 1) print("This is not going to work.")
  for (k1 in 1:nrow(mydf)) {
    # each row will begin with the row number in quotes
    r <- paste('"', k1, '"', sep = "")
    for (k2 in 1:ncol(mydf)) {
      r <- paste(r, paste('"', mydf[k1, k2], '"', sep = ""), sep = ",")
    }
    linestowrite[1 + k1] <- r
  }
  writeLines(linestowrite, con = filename, useBytes = TRUE)
} # end of function
Sys.setlocale(category = "LC_CTYPE", locale = "usa")
charvals <- c(193, 269, 282, 268, 262, 263, 348, 349, 350, 352, 353)
hexvals <- format(as.hexmode(charvals), width=4, upper.case=TRUE)
df1 <- data.frame(dec=charvals, char=I(unlist(strsplit(intToUtf8(charvals), ""))), hex=I(hexvals))
print.listof(df1)
write.csv(df1, file="test1.csv")
write.csv(df1, file="test2.csv", fileEncoding="UTF-8")
write.unicode.csv(df1, filename="test3.csv")
dftest1 <- read.csv(file="test1.csv", encoding="UTF-8", colClasses="character")
dftest2 <- read.csv(file="test2.csv", encoding="UTF-8", colClasses="character")
dftest3 <- read.csv(file="test3.csv", encoding="UTF-8", colClasses="character")
print("CSV file written using write.csv with no fileEncoding parameter:")
print.listof(dftest1)
print('CSV file written using write.csv with fileEncoding="UTF-8":')
print.listof(dftest2)
print("CSV file written using write.unicode.csv:")
print.listof(dftest3)
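As an aside (my addition, not part of the original workaround), the readr package sidesteps R's locale-dependent connection layer entirely; its write_excel_csv function writes UTF-8 with a byte-order mark, which spreadsheet programs use to detect the encoding automatically. Using the df1 from above:

```r
# Hedged alternative: requires the readr package to be installed.
library(readr)
write_excel_csv(df1, "test4.csv")  # UTF-8 with BOM; opens correctly in Calc/Excel
```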

Related

Character conversion from raw in R giving unwanted result

I have a web response being returned in raw format which I'm unable to properly encode. It contains the following values:
ef bc 86
The character is meant to be a Fullwidth Ampersand (to illustrate below):
> as.character("\uFF06")
[1] "＆"
> charToRaw("\uFF02")
[1] ef bc 82
However, no matter what I've tried it gets converted to ï¼‚. To illustrate:
> rawToChar(charToRaw("\uFF02"))
[1] "ï¼‚"
Because of the equivalence of the raw values, I don't think there's anything I can do in my web call to influence the problem I'm having (happy to be corrected). I believe I need to work out how to properly do the character encoding.
I also took an extreme approach of trying all other encodings as follows but none converted to the fullwidth ampersand:
> x_raw <- charToRaw("\uFF02")
> x_raw
[1] ef bc 82
> sapply(
+ stringi::stri_enc_list()
+ ,function(encoding) stringi::stri_encode(str = x_raw, encoding)
+ ) |> # R's new native pipe
+ tibble::enframe(name = "encoding")
# A tibble: 1,203 x 2
encoding value
<chr> <chr>
1 037 "Õ¯b"
2 273 "Õ¯b"
3 277 "Õ¯b"
4 278 "Õ¯b"
5 280 "Õ¯b"
6 284 "Õ¯b"
7 285 "Õ~b"
8 297 "Õ¯b"
9 420 "\u001a\u001ab"
10 424 "\u001a\u001ab"
# ... with 1,193 more rows
My work around at the moment is to replace the strings after the encoding, but this character is just one example of many, and hard-coding every instance doesn't seem practical.
> rawToChar(x_raw)
[1] "ï¼‚"
> stringr::str_replace_all(rawToChar(x_raw), c("ï¼‚" = "\uFF06"))
[1] "＆"
The substitution workaround is further complicated by the fact that I've also got characters like the HYPHEN (not HYPHEN-MINUS) somehow getting converted, where the last two raw values are getting converted to a string with what appears to be octal values:
> as.character("\u2010") # HYPHEN
[1] "‐"
> as.character("\u2010") |> charToRaw() # As raw
[1] e2 80 90
> as.character("\u2010") |> charToRaw() |> rawToChar() # Converted back to string
[1] "â€\u0090"
> charToRaw("â\200\220") # string with equivalent raw
[1] e2 80 90
Any help appreciated.
I'm not totally clear on exactly what you are trying to do, but the problem with getting back your original character is that R cannot determine the encoding automatically from the raw bytes. I assume you are on Windows. If you do
val <- rawToChar(charToRaw("\uFF06"))
val
# [1] "ï¼†"
Encoding(val)
# [1] "unknown"
Encoding(val) <- "UTF-8"
val
# [1] "＆"
Just make sure to set the encoding properly.
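That pattern can be wrapped in a small helper (my own sketch, not part of the original answer):

```r
# Decode raw bytes that are known to be UTF-8 and mark the result as such,
# so it prints correctly even on Windows.
utf8_from_raw <- function(bytes) {
  s <- rawToChar(bytes)
  Encoding(s) <- "UTF-8"
  s
}
utf8_from_raw(charToRaw("\uFF06"))  # "＆"
```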

How to create specific columns out of text in R

Here is just an example I hope you can help me with, given that the input is a line from a txt file, I want to transform it into a table (see output) and save it as a csv or tsv file.
I have tried with separate functions but could not get it right.
Input
"PR7 - Autres produits d'exploitation 6.9 371 667 1 389"
Desired output

Variable                              note  2020  2019  2018
PR7 - Autres produits d'exploitation  6.9   371   667   1389
I'm assuming that this badly delimited data-set is the only place where you can read your data.
I created for the purpose of this answer an example file (that I called PR.txt) that contains only the two following lines.
PR6 - Blabla 10 156 3920 245
PR7 - Autres produits d'exploitation 6.9 371 667 1389
First I create a function to parse each line of this data-set. I'm assuming here that the original file does not contain the names of the columns. In reality this is probably not the case, so the function could easily be adapted to take a first "header" line into account.
readBadlyDelimitedData <- function(x) {
  # Read the data
  dat <- read.table(text = x)
  # Get the type of each column
  whatIsIt <- sapply(dat, typeof)
  # Combine the columns that are of type "character"
  variable <- paste(dat[whatIsIt == "character"], collapse = " ")
  # Put everything in a data frame
  res <- data.frame(
    variable = variable,
    dat[, whatIsIt != "character"])
  # Change the names to match the years in the question
  names(res)[-1] <- c("note", "Year2020", "Year2019", "Year2018")
  return(res)
}
Note that I do not give the columns holding the yearly figures purely "numeric" names, because giving rows or columns purely numerical names is not good practice in R.
Once I have this function, I can (l)apply it to each line of the data by combining it with readLines, and collapse all the lines with an rbind.
out <- do.call("rbind", lapply(readLines("tests/PR.txt"), readBadlyDelimitedData))
out
                              variable note Year2020 Year2019 Year2018
1                         PR6 - Blabla 10.0      156     3920      245
2 PR7 - Autres produits d'exploitation  6.9      371      667     1389
Finally, I save the result with write.csv:
write.csv(out, file = "correctlyDelimitedFile.csv")
If you can get your hands on the Excel file, a simple gdata::read.xls or openxlsx::read.xlsx would be enough to read the data.
I wish I knew how to make the script simpler... maybe a tidyr magic person would have a more elegant solution?
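For what it's worth, here is one regex-based sketch in that direction (my own addition; it assumes the last four fields of every line are numeric, and uses tidyr::extract to do the splitting):

```r
library(tidyr)

# Read the badly delimited lines and split each into five columns:
# everything up to the last four numeric fields is the variable name.
lines <- readLines("PR.txt")
df <- extract(data.frame(raw = lines), raw,
              into = c("variable", "note", "Year2020", "Year2019", "Year2018"),
              regex = "^(.*?)\\s+([0-9.]+)\\s+(\\d+)\\s+(\\d+)\\s+(\\d+)$",
              convert = TRUE)
```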

Encoding Issues when reading R Object

I am reading an R object with readRDS. It should have two columns, a year and a character string. For most rows the character string is OK, but some have a strange white blob, others seem to have a character vector with escaped special characters, and some have special characters like â.
I think it's an encoding issue with the original data (which is not mine), but I am unsure what the blobs are or what causes the character vectors / escaping. I realise it's probably the original data, but I am trying to understand a little more of what I am seeing so I can investigate.
I'm using macOS 10.14.6.
Any ideas welcome.
The original data is here and I used the following to pull out some of the rows with strange characters.
data <- readRDS("all_speech.rds") %>%
select(year, speech) %>%
filter(str_detect(speech, "â"))
str(hansardOrig)
'data.frame': 2286324 obs. of 2 variables:
$ year : num 1979 1979 1979 1979 1979 ...
$ speech: chr "Mr. Speaker ...
Added
sample <- data %>% mutate(speech = substr(speech, 1, 200))
dput(head(sample))
structure(list(year = c(1982, 1982, 1982, 1984, 1986, 1986),
speech = c("With this it will be convenient to take amendment No. 112, in title, line 10, leave out 'section 163 1) of’.\n",
"I am not so much surprised as astonished by the amendment. It would create tremendous problems. Police officers have a vital role in visiting places of entertainment—without a warrant—particularly in ",
"I note the hon. Gentleman's desire to retire there.\nMy right hon. Friend mentioned that we are setting up a pilot scheme with three experimental homes. They will be in adapted, domestic-style, buildin",
"The British forces in the Lebanon had their headquarters at Haddâsse. From that position they would have been totally unable to help British nationals in west Beirut. They are better able to help, thr",
"We know that soon more cars will be manufactured in the United Kingdom, as the hon. Member for Edinburgh, Central Mr. Fletcher) wishes.\nhirdly, the decision will have a domino effect—that American phr",
"I beg to move,\nThat leave be given to bring in a Bill to make illegal the display of pictures of naked or partially naked women in sexually provocative poses in newspapers.\nThis is a simple but import"
)), row.names = c(NA, 6L), class = "data.frame")
You've got a difficult problem ahead of you. The sample you show has inconsistent encodings, so fixups will be hard to do.
The first entry in sample$speech displays like this on my Mac:
> sample$speech[1]
[1] "With this it will be convenient to take amendment No. 112, in title,
line 10, leave out 'section 163 1) ofâ€™.\n"
This looks okay up to the end, where the â€™ characters look like the UTF-8 encoding of a directional quote "’" interpreted in the WINDOWS-1252 encoding. I can fix that with this code:
> iconv(sample$speech[1], from="utf-8", to="WINDOWS-1252")
[1] "With this it will be convenient to take amendment No. 112, in title,
line 10, leave out 'section 163 1) of’.\n"
However, this messes up the second entry, because it has em-dashes correctly encoded, so the translation converts them to hex 97 characters, not legal in the native UTF-8 encoding on the Mac:
> sample$speech[2]
[1] "I am not so much surprised as astonished by the amendment. It would
create tremendous problems. Police officers have a vital role in visiting
places of entertainment—without a warrant—particularly in "
> iconv(sample$speech[2], from="utf-8", to="WINDOWS-1252")
[1] "I am not so much surprised as astonished by the amendment. It would
create tremendous problems. Police officers have a vital role in visiting
places of entertainment\x97without a warrant\x97particularly in "
There are functions in various packages to guess encodings and to fix them, e.g. rvest::repair_encoding, stringi::stri_enc_detect, but I couldn't get them to work on your data. I wrote one myself, based on these ideas: use utf8ToInt to convert each string to its Unicode code points, then look for which ones contain multiple high values in sequence. sample$speech[1] looks like this:
> utf8ToInt(sample$speech[1])
[1] 87 105 116 104 32 116 104 105 115 32 105 116 32 119 105 108 108
[18] 32 98 101 32 99 111 110 118 101 110 105 101 110 116 32 116 111
[35] 32 116 97 107 101 32 97 109 101 110 100 109 101 110 116 32 78
[52] 111 46 32 49 49 50 44 32 105 110 32 116 105 116 108 101 44
[69] 32 108 105 110 101 32 49 48 44 32 108 101 97 118 101 32 111
[86] 117 116 32 39 115 101 99 116 105 111 110 32 49 54 51 32 49
[103] 41 32 111 102 226 8364 8482 46 10
and that sequence near the end 226 8364 8482 is typical for a misinterpreted UTF-8 character. (The Wikipedia page describes the encoding in detail. Two byte chars start with 192 to 223, three byte chars start with 224 to 239, and four byte chars start with 240 to 247. Chars after the first are all in the range 128 to 191. The tricky part is figuring out how these high order chars will be displayed, because that depends on the wrongly assumed encoding.) Here's a quick and dirty function that tries every encoding known to iconv() and reports on what it does:
fixEncoding <- function(s, guess = iconvlist()) {
  firstbytes <- list(as.raw(192:223),
                     as.raw(224:239),
                     as.raw(240:247))
  nextbytes <- as.raw(128:191)
  for (i in seq_along(s)) {
    str <- utf8ToInt(s[i])
    if (any(str > 127)) {
      fixes <- c()
      encs <- c()
      for (g in guess) {
        high <- which(str > 127)
        firsts <- lapply(firstbytes,
                         function(s) utf8ToInt(iconv(rawToChar(s), from = g, to = "UTF-8", sub = "")))
        nexts <- utf8ToInt(iconv(rawToChar(nextbytes), from = g, to = "UTF-8", sub = ""))
        for (try in 1:3) {
          starts <- high[str[high] %in% firsts[[try]]]
          starts <- starts[starts <= length(str) - try]
          for (hit in starts) {
            if (str[hit+1] %in% nexts &&
                (try < 2 || str[hit+2] %in% nexts) &&
                (try < 3 || str[hit+3] %in% nexts))
              high <- setdiff(high, c(hit, hit + 1,
                                      if (try > 1) hit + 2,
                                      if (try > 2) hit + 3))
          }
        }
        if (!length(high)) {
          fixes <- c(fixes, iconv(s[i], from = "UTF-8", to = g, mark = FALSE))
          encs <- c(encs, g)
        }
      }
      if (length(fixes)) {
        if (length(unique(fixes)) == 1) {
          s[i] <- fixes[1]
          message("Fixed s[", i, "] using one of ", paste(encs, collapse = ","), "\n", sep = "")
        } else {
          warning("s[", i, "] has multiple possible fixes.")
          message("It could be")
          uniq <- unique(fixes)
          for (u in seq_along(uniq))
            message(paste(encs[fixes == uniq[u]], collapse = ","), "\n")
          message("Not fixed!\n")
        }
      }
    }
  }
  s
}
When I try it on your sample, I see this:
> fixed <- fixEncoding(sample$speech)
Fixed s[1] using one of CP1250,CP1252,CP1254,CP1256,CP1258,MS-ANSI,MS-ARAB,MS-EE,MS-TURK,WINDOWS-1250,WINDOWS-1252,WINDOWS-1254,WINDOWS-1256,WINDOWS-1258
You can make it less verbose by calling it as
fixed <- suppressMessages(fixEncoding(sample$speech))
The other issue you had in your original post was that some strings were being displayed as single characters. I think that's an RStudio bug. If I put too many characters in a single element in a dataframe, the RStudio viewer can't display it. For me the limit is around 10240 chars. This dataframe won't display properly:
d <- data.frame(x = paste(rep("a", 10241), collapse=""))
but any smaller number works. This isn't an R issue; it can display that dataframe in the console with no problem. It's only View(d) that is bad, and only in RStudio.

Remove blank lines in txt output from R

I am trying to create a specifically formatted file to use as an input file in another software. I have been able, with the help of people here, to create a file that is almost there. Now I just need to remove some empty lines in my *.txt output file. I have tried several different approaches with gsub() but can't figure out a way. Below is an example that produces a file that shows where I'm stuck.
matsplitter <- function(M, r, c) {
  rg <- (row(M) - 1) %/% r + 1
  cg <- (col(M) - 1) %/% c + 1
  rci <- (rg - 1) * max(cg) + cg
  N <- prod(dim(M)) / r / c
  cv <- unlist(lapply(1:N, function(x) M[rci == x]))
  dim(cv) <- c(r, c, N)
  cv
}
B <- matrix(c(1:1380),ncol=5)
capture.output(matsplitter(B,3,5), file='output.txt')
write.table(gsub('\\[.*\\]', '',
readLines('output.txt')),
file='output.txt', row.names=FALSE, quote=FALSE)
What I need to further remove are the two blank lines between the ", , 1", ", , 2" etc. string and the matrix of numbers.
x
, , 1
1 277 553 829 1105
2 278 554 830 1106
3 279 555 831 1107
, , 2
4 280 556 832 1108
5 281 557 833 1109
6 282 558 834 1110
, , 3
7 283 559 835 1111
8 284 560 836 1112
9 285 561 837 1113
A possible solution if you are willing to go beyond gsub. I have taken the liberty of breaking the answer up into pieces for clarity (hopefully).
#read in file created by "capture.out"
out = gsub('\\[.*\\]', '', readLines('output.txt'))
If you look at this object out you will see that the old column-header lines have been reduced to runs of five spaces, and that the first of the two blank lines you want to get rid of is an empty string "". We get rid of the runs of spaces by means of:
out = gsub("\\s{5}","",out)
Now, after the header, there are two empty strings in front of every block and one empty string after every block. As we only want to exclude the blank lines in front of blocks, we use the function rle to find repeated elements and exclude them.
#get indicator vector
exclvec = rep(rle(out)$lengths,rle(out)$lengths)
#remove values as indicated by exclvec
out = out[ifelse(out=="" & exclvec==2,F,T)]
As I interpret your question, writing this object provides the desired result.
write.table(out,file='output.txt', row.names=FALSE, quote=FALSE)
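The rle step above can also be written a little more compactly (same behaviour, my own sketch): blank lines that occur in runs of exactly two are dropped, single blanks are kept. Using the same out as above:

```r
# Run-length encode the "is this line blank?" vector, then keep every
# run except blank runs of length exactly two.
r <- rle(out == "")
keep <- rep(!(r$values & r$lengths == 2), r$lengths)
out <- out[keep]
```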

R correct use of read.csv

I must be misunderstanding how read.csv works in R. I have read the help file, but still do not understand how a csv file containing:
40900,-,-,-,241.75,0
40905,244,245.79,241.25,244,22114
40906,244,246.79,243.6,245.5,18024
40907,246,248.5,246,247,60859
read into R using: euk<-data.matrix(read.csv("path\to\csv.csv"))
produces this as a result (using tail):
Date Open High Low Close Volume
[2713,] 15329 490 404 369 240.75 62763
[2714,] 15330 495 409 378 242.50 127534
[2715,] 15331 1 1 1 241.75 0
[2716,] 15336 504 425 385 244.00 22114
[2717,] 15337 504 432 396 245.50 18024
[2718,] 15338 512 442 405 247.00 60859
It must be something obvious that I do not understand. Please be kind in your responses, I am trying to learn.
Thanks!
The issue is not with read.csv, but with data.matrix. read.csv imports any column with characters in it as a factor. The '-' values in the first row of your dataset are characters, so those columns are converted to factors. You then pass the result of read.csv into data.matrix, and as the help states, it replaces the levels of the factor with its internal codes.
Basically, you need to ensure that the columns of your data are numeric before you pass the data.frame into data.matrix.
This should work in your case (assuming the only characters are '-'):
euk <- data.matrix(read.csv("path/to/csv.csv", na.strings = "-", colClasses = 'numeric'))
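A quick way to confirm the diagnosis (my own sketch, not part of the original answer) is to inspect the column classes before calling data.matrix:

```r
# Without na.strings, the '-' entries force those columns to be read as
# factors (or characters), which data.matrix then replaces with level codes.
euk_raw <- read.csv("path/to/csv.csv")
sapply(euk_raw, class)   # any non-numeric class here will not survive data.matrix
```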
I'm no R expert, but you may consider using scan() instead, eg:
> data = scan("foo.csv", what = list(x = numeric(), y = numeric()), sep = ",")
Where foo.csv has two columns, x and y, and is comma delimited. I hope that helps.
I took a cut/paste of your data, put it in a file, and I get this using R:
> c<-data.matrix(read.csv("c:/DOCUME~1/Philip/LOCALS~1/Temp/x.csv",header=F))
> c
V1 V2 V3 V4 V5 V6
[1,] 40900 1 1 1 241.75 0
[2,] 40905 2 2 2 244.00 22114
[3,] 40906 2 3 3 245.50 18024
[4,] 40907 3 4 4 247.00 60859
>
There must be more in your data file; for one thing, data for the header line. And the output you show seems to start with row 2713. I would check:
The format of the header line, or get rid of it and add it manually later.
That each row has exactly 6 values.
That the filename uses forward slashes and has no embedded spaces
(use the 8.3 representation as shown in my filename).
Also, if you generated your csv file from MS Excel, the internal representation for a date is a number.
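If that is the case, the serial numbers convert directly in R (a sketch; Excel on Windows counts days from an origin of 1899-12-30, which is the origin R's as.Date expects for Excel serials):

```r
# Excel stores dates as day-serial numbers; convert them with the Excel origin.
as.Date(c(40900, 40905, 40906, 40907), origin = "1899-12-30")
# yields dates in late December 2011
```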