Is it possible to use single quote as quote character in fread? - r

I have a .csv file similar to this one:
id,text,value
1,'foo, bar',5
2,'foo',5
3,'foo"bar',4
It uses single quotes for text.
I would like to read it in with fread(). I tried fread('text.csv', quote = "\'"), but I get 'unused argument' error. Is it possible to read the file in somehow? Can I somehow change the default quoting character?
(Using command line tool as sed to change the single quotes to double quotes also does not work as some field contains double quotes as well.)

Related

When do we use double quotes in R?

So,I am a very beginner in R,I wanted to ask this-
inside read.csv ,we are using "",but while writting summary/mean/sd etc. we don't use "". Why is this the case?
Double quotes " demark strings of characters, text inside code. Code itself is not double-quoted.
The following is valid code
name.of.file <- "C:\\Users\\bernhard\\valued_data.csv"
read.csv(name.of.file)
So you see, there is nothing special about read.csv, it takes the file name either as a string in "s or as a variable containing that string; no " in the latter case.

Are there 2 types of double-quotes in R? My double quotes look slanted and are giving an error message

I am using the statement in R:
setwd("C:\\Users\\carl\\Documents\\research")
to set the working directory. It worked fine when I pasted the statement from someone else's R script but I received an error message:
Error: unexpected input in "setwd("".
when I entered the command directly or when I copied it from my script in a Word file.
It seems to be related to the fact that the double-quotes that I typed (that don't work) look a little slanted while the double-quotes in the pasted text (that work fine) look like they're straight up and down. Is there something I can do to type plain looking double-quotes instead of slanted double-quotes?
Word automatically replaces your double quotes with so-called smart quotes or curly quotes.
You need to use the regular/straight double quotes (") in r.
This support article explains how you can disable the automatic smart quote replacement in Word. In fairness though, Word is probably not the... um... ideal code editor.

Nesting more than two types of quotes in R

I would like to know how to accommodate more than two types of quotes in a same row in R. Let´s say that I want to print:
'first-quote-type1 "first-quote-type2 "second-quote-type2
'sencond-quote-type1
Using one quote in the beginning and one in the end we have:
print("'first-quote-type1 "first-quote-type2 "second-quote-type2 'sencond-quote-type1")
Error: unexpected symbol in "print("'first-quote-type1 "first"
I tried to include triple quotes as required in Python in this cases:
print(''''first-quote-type1 "first-quote-type2 "second-quote-type2 'sencond-quote-type1''')
print("""'first-quote-type1 "first-quote-type2 "second-quote-type2 'sencond-quote-type1""")
However, I also got a similar error. Some idea how to make this syntax work in R?
To use a quote within a quote you can escape the quote character with a backslash
print("the man said \"hello\"")
However, the print function in R will always escape character.
To not show the escaped character use cat() instead
so...
cat("the man said \"hello\"") will return
the man said "hello"

How to remove the extra commas from a csv file?

I was trying to use a csv file in R in read.transactions() command from arules package.
The csv file when opened in Notepad++ shows extra commas for every non-existing values. So, I'm having to manually delete those extra commas before using the csv in read.transactions(). For example, the actual csv file when opened in Notepad++ looks like:
D115,DX06,Slz,,,,
HC,,,,,,
DX06,,,,,,
DX17,PG,,,,,
DX06,RT,Dty,Dtcr,,
I want it to appear like below while sending it into read.transactions():
D115,DX06,Slz
HC
DX06
DX17,PG
DX06,RT,Dty,Dtcr
Is there any way I can make that change in read.transactions() itself, or any other way? But even before that, we don't get to see those extra commas in R(that output I showed was from Notepad++)..
So how can we even remove them in R when we can't see it?
A simple way to create a new file without the trailing commas is:
file_lines <- readLines("input.txt")
writeLines(gsub(",+$", "", file_lines),
"without_commas.txt")
In the gsub command, ",+$" matches one or more (+) commas (,) at the end of a line ($).
Since you're using Notepad++, you could just do the substitution in that program: Search > Replace, replace ,+$ with nothing, Search Mode=Regular Expression.

Reading a csv file with embedded quotes into R

I have to work with a .csv file that comes like this:
"IDEA ID,""IDEA TITLE"",""VOTE VALUE"""
"56144,""Net Present Value PLUS (NPV+)"",1"
"56144,""Net Present Value PLUS (NPV+)"",1"
If I use read.csv, I obtain a data frame with one variable. What I need is a data frame with three columns, where columns are separated by commas. How can I handle the quotes at the beginning of the line and the end of the line?
I don't think there's going to be an easy way to do this without stripping the initial and terminal quotation marks first. If you have sed on your system (Unix [Linux/MacOS] or Windows+Cygwin?) then
read.csv(pipe("sed -e 's/^\"//' -e 's/\"$//' qtest.csv"))
should work. Otherwise
read.csv(text=gsub("(^\"|\"$)","",readLines("qtest.csv")))
is a little less efficient for big files (you have to read in the whole thing before processing it), but should work anywhere.
(There may be a way to do the regular expression for sed in the same, more-compact form using parentheses that the second example uses, but I got tired of trying to sort out where all the backslashes belonged.)
I suggest both removing the initial/terminal quotes and turning the back-to-back double quotes into single double quotes. The latter is crucial in case some of the strings contain commas themselves, as in
"1,""A mostly harmless string"",11"
"2,""Another mostly harmless string"",12"
"3,""These, commas, cause, trouble"",13"
Removing only the initial/terminal quotes while keeping the back-to-back quote leads the read.csv() function to produce 6 variables, as it interprets all commas in the last row as value separators. So the complete code might look like this:
data.text <- readLines("fullofquotes.csv") # Reads data from file into a character vector.
data.text <- gsub("^\"|\"$", "", data.text) # Removes initial/terminal quotes.
data.text <- gsub("\"\"", "\"", data.text) # Replaces "" by ".
data <- read.csv(text=data.text, header=FALSE)
Or, of course, all in a single line
data <- read.csv(text=gsub("\"\"", "\"", gsub("^\"|\"$", "", readLines("fullofquotes.csv", header=FALSE))))

Resources