This question already has answers here:
convert HTML Character Entity Encoding in R
(5 answers)
Convert HTML Entity to proper character R
(1 answer)
Closed 4 years ago.
Is there a standard way in R to convert HTML character codes to the characters they represent? For example, the HTML code &#39; is an apostrophe, the same character as a typed '. I'd like to change the following text
text = "Met with Mark&#39;s boss today to discuss performance"
to be
"Met with Mark's boss today to discuss performance"
I tried using iconv as shown below, but the HTML code consists only of valid ASCII characters, so nothing changes.
iconv(text, from="ASCII", to="UTF-8//TRANSLIT")
I could build a lookup table and do it that way, but I thought I'd check whether there's an existing method first.
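A minimal sketch of the lookup idea using only base R, assuming the text contains only numeric character references: decimal (&#39;) or hexadecimal (&#x27;). The function name decode_numeric_entities is hypothetical. Named entities such as &amp; are not handled and would still need a lookup table or an HTML parser.

```r
# Hypothetical helper: decode numeric HTML character references such as
# "&#39;" (decimal) and "&#x27;" (hexadecimal). Named entities are NOT handled.
decode_numeric_entities <- function(x) {
  pattern <- "&#([0-9]+|[xX][0-9a-fA-F]+);"
  m <- gregexpr(pattern, x)
  regmatches(x, m) <- lapply(regmatches(x, m), function(ents) {
    # Drop the leading "&#" and the trailing ";" to keep only the number
    codes <- substr(ents, 3, nchar(ents) - 1)
    is_hex <- grepl("^[xX]", codes)
    values <- ifelse(is_hex,
                     strtoi(substring(codes, 2), base = 16L),
                     strtoi(codes, base = 10L))
    # Turn each code point into the corresponding character
    vapply(values, intToUtf8, character(1))
  })
  x
}

text <- "Met with Mark&#39;s boss today to discuss performance"
decode_numeric_entities(text)
```

Strings without any references pass through unchanged, because the replacement list is empty for them.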
This question already has answers here:
Escaping backslash (\) in string or paths in R
(4 answers)
Closed 1 year ago.
How can I change the working directory more easily?
Currently, when using setwd, we have to double every '\', which gets tedious.
Is there an easier way, like Python's 'r' prefix for raw strings?
setwd('C:\Users\Administrator\Desktop\myfolder') # fails: '\U' used without hex digits
setwd('C:\\Users\\Administrator\\Desktop\\myfolder') # works, but every '\' must be doubled
Since R 4.0.0 you can use a raw string: prefix it with r and wrap the contents in parentheses:
> r"(C:\Users\Administrator\Desktop\myfolder)"
[1] "C:\\Users\\Administrator\\Desktop\\myfolder"
>
And now:
setwd(r"(C:\Users\Administrator\Desktop\myfolder)")
Or, on Windows, read the path from the clipboard; the copied backslashes are kept as-is, so no extra escaping is needed:
setwd(readClipboard())
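A short sketch showing that the raw string is just another way of writing the same value (requires R 4.0.0 or later). The forward-slash spelling is an additional alternative, since R on Windows also accepts / as a path separator. The path is the one from the question and is never passed to setwd(), so the folder need not exist.

```r
# Raw string literal (R >= 4.0.0): backslashes are taken literally
raw_path     <- r"(C:\Users\Administrator\Desktop\myfolder)"
# Ordinary literal: every backslash must be escaped
escaped_path <- "C:\\Users\\Administrator\\Desktop\\myfolder"

# Both literals produce exactly the same string
identical(raw_path, escaped_path)

# Forward slashes need no escaping and also work in Windows paths
fwd_path <- "C:/Users/Administrator/Desktop/myfolder"
# chartr() swaps each "/" for a single "\" to show the spellings match
identical(chartr("/", "\\", fwd_path), raw_path)
```

If the path itself contains ")", other delimiters such as r"[...]" or r"{...}" can be used instead.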
This question already has answers here:
Convert non-ASCII to character representations beginning with backslash u (\u) in R?
(1 answer)
How to escape backslashes in R string
(3 answers)
Closed 2 years ago.
I have a character vector that includes escape sequences such as \n.
I want to save this vector to a .txt file in which \n appears literally.
The code below creates a .txt file, but when I open it (e.g. in Notepad), each \n is shown as a line break. I don't want line breaks; I want to see \n.
library(tidyverse)
my_text <- "hello\n, nice to meet you.\n\n. how are you?"
my_text
readr::write_file(my_text, "my_text.txt")
This is what Notepad shows:
hello
, nice to meet you.
. how are you?
But I want to see:
"hello\n, nice to meet you.\n\n. how are you?"
I thought there was some kind of 'raw' option, but it seems I was wrong. Thanks!
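One possible approach, sketched with base R instead of readr: encodeString() converts control characters such as the newline back into escape sequences before the text is written. The file goes to tempfile() rather than my_text.txt so the example is self-contained. With quote = '"' the output also includes the surrounding double quotes shown in the question.

```r
my_text <- "hello\n, nice to meet you.\n\n. how are you?"

# encodeString() re-escapes control characters ("\n" becomes backslash + n)
# and, with quote = '"', wraps the result in double quotes
escaped <- encodeString(my_text, quote = '"')

# Write the escaped text; the file now holds literal backslash-n sequences
out_file <- tempfile(fileext = ".txt")
writeLines(escaped, out_file)

# Prints: "hello\n, nice to meet you.\n\n. how are you?"
cat(readLines(out_file), sep = "\n")
```

Note that encodeString() also escapes other characters, such as tabs, backslashes, and the quote character itself, which is usually what you want when the goal is to see the string as it would be typed.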
This question already has answers here:
When I import text file into R, I get a special character appended to the first value of the first column
(4 answers)
Closed 5 years ago.
I have exported data from a result grid in SQL Server Management Studio to a csv file.
The csv file looks correct.
But when I read the data into an R data frame using read.csv, the first column name is prefixed with "ï..". How do I get rid of this junk text?
Example:
str(trainData)
'data.frame': 64169 obs. of 20 variables:
$ ï..Column1 : int 3232...
$ Column2 : int 4242...
The data looks something like this (nothing special):
Column1,Column2
100116577,100116577
100116698,100116702
You've got a Unicode UTF-8 BOM at the start of the file:
http://en.wikipedia.org/wiki/Byte_order_mark
A text editor or web browser interpreting the text as ISO-8859-1 or CP1252 will display the characters  for this.
R keeps the ï and converts the other two characters into dots, because they are not valid in syntactic column names (read.csv cleans the names with make.names() by default).
Here:
http://r.789695.n4.nabble.com/Writing-Unicode-Text-into-Text-File-from-R-in-Windows-td4684693.html
Duncan Murdoch suggests:
You can declare a file to be in encoding "UTF-8-BOM" if you want to ignore a BOM on input.
So try your read.csv with fileEncoding="UTF-8-BOM" or persuade your SQL wotsit to not output a BOM.
Otherwise, you may as well test whether the first name starts with ï.. and strip it with substr() (as long as you know no column will ever genuinely start like that...)
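Both suggestions in one self-contained sketch: a CSV with an explicit BOM (bytes EF BB BF) is written to a temporary file to mimic the SSMS export, then read with fileEncoding = "UTF-8-BOM". The helper strip_bom_prefix is hypothetical and implements the substr() fallback. Whether the mangled name actually starts with ï.. depends on the locale R runs in, so treat the fallback as a last resort.

```r
# Write a small CSV that starts with a UTF-8 BOM (EF BB BF)
csv_file <- tempfile(fileext = ".csv")
con <- file(csv_file, open = "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)
writeBin(charToRaw("Column1,Column2\n100116577,100116577\n100116698,100116702\n"), con)
close(con)

# Declaring the encoding as "UTF-8-BOM" makes R skip the BOM on input
trainData <- read.csv(csv_file, fileEncoding = "UTF-8-BOM")
names(trainData)  # expected: "Column1" "Column2"

# Hypothetical fallback: remove a leading "ï.." from the first column name
strip_bom_prefix <- function(df) {
  first <- names(df)[1]
  if (startsWith(first, "\u00ef..")) {
    # "ï.." is three characters, so keep everything from the 4th onward
    names(df)[1] <- substr(first, 4, nchar(first))
  }
  df
}
```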
This question already has answers here:
How to escape backslashes in R string
(3 answers)
Closed 5 years ago.
I have a character vector:
t <- c("IMCR01","IMFA02","IMFA03")
I want to make it look like this:
"\'IMCR01\'","\'IMFA02\'","\'IMFA03\'"
I tried different ways like:
paste0("\'",t,"\'")
paste0("\\'",t,"\\'")
paste0("\\\\'",t,"\\\\'")
But none of them works. Other functions are fine as well.
Actually your second attempt is correct:
paste0("\\'",t,"\\'")
If you want paste0 to insert a literal backslash, you need to escape it once in the string literal (not twice, as you would within a regex pattern). This prints the following to the R console:
[1] "\\'IMCR01\\'" "\\'IMFA02\\'" "\\'IMFA03\\'"
The trick here is that R escapes the backslash again in the console output. If you instead write the result to a text file, you will see only a single backslash, as you wanted:
write(paste0("\\'", t, "\\'"), file = "/path/to/your/file.txt")
But why does R escape the backslash when printing to its own console? print() shows a string the way you would type it as an R literal, so a lone backslash is displayed as \\ to keep it distinct from escape sequences: a literal \n would otherwise be indistinguishable from a real newline. Hence the need for escaping is still there.
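A short sketch confirming that the second attempt already produces single backslashes, even though print() shows them doubled: writeLines() outputs the characters themselves, and nchar() counts only one backslash before each quote.

```r
t <- c("IMCR01", "IMFA02", "IMFA03")
quoted <- paste0("\\'", t, "\\'")

# print() displays R string literals, so each backslash appears doubled
print(quoted)

# writeLines() shows the actual characters, one element per line:
# \'IMCR01\'
# \'IMFA02\'
# \'IMFA03\'
writeLines(quoted)

# 6 letters/digits + 2 quotes + 2 backslashes = 10 characters each
nchar(quoted)
```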