R - intToUtf8 function - r

Using the gsub function, I am replacing certain strings of text that contain the Unicode character 'START OF GUARDED AREA' (U+0096).
gsub("A string with the following character –","A string without the character")
My code works, but if I close my script and reopen it, that character in my code is replaced by a normal dash.
To work around this problem, I was thinking of replacing the actual character with a function call. I came across the function intToUtf8(), which I thought would return the character in question if I used it as follows:
intToUtf8(150)
However, when I type this in my console, it returns "\u0096".
Question 1: why is the character replaced in my script?
Question 2: why isn't my console returning the character '–'?
Many thanks in advance for your precious help!
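A possible explanation, offered as an assumption since the thread itself does not confirm it: the dash shown in the gsub call is an en dash. In the Windows-1252 encoding the en dash is byte 0x96, but in Unicode it is code point U+2013, while U+0096 is a non-printing control character. The minimal sketch below shows the difference and writes the dash as an escape so that it no longer depends on how the script file is saved. The variable names are illustrative.

```r
# U+0096 is a C1 control character with no visible glyph,
# which is why the console shows it as an escape sequence.
ctrl <- intToUtf8(150)
utf8ToInt(ctrl)                # 150

# The en dash is U+2013 (decimal 8211).
en_dash <- intToUtf8(8211)
identical(en_dash, "\u2013")   # TRUE

# Writing the dash as "\u2013" keeps the script pure ASCII.
x <- paste0("A string with the following character ", en_dash)
result <- gsub("\u2013", "-", x, fixed = TRUE)
result
```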

Related

How to remove "\" from paste function output with quotation marks?

I'm working with the following code:
Y_Columns <- c("Y.1.1")
paste('{"ImportId":"', Y_Columns, '"}', sep = "")
The paste function produces the following output:
"{\"ImportId\":\"Y.1.1\"}"
How do I get the paste function to omit the \, so that the output is:
"{"ImportId":"Y.1.1"}"
Thank you for your help.
Note: I did do a search on SO to see if there were any Q's that asked "what is an escape character in R". But I didn't review all the 160 answers, only the first 20.
This is one way of demonstrating what I wrote in my comment:
out <- paste('{"ImportId":"', Y_Columns, '"}', sep = "")
out
#[1] "{\"ImportId\":\"Y.1.1\"}"
?print
print(out,quote=FALSE)
#[1] {"ImportId":"Y.1.1"}
Both R and regex patterns use escape characters to allow special characters to be displayed in print output or input. (And sometimes regex patterns need to have doubled escapes.) R has a few characters that need to be "escaped" in certain situations. You illustrated one such situation: including a double-quote character inside a result that will be printed with surrounding double quotes. If you intended to include any single quotes inside a character value that was delimited by single quotes at the time of creation, they would need to be escaped as well.
out2 <- '\'quoted\''
nchar(out2)
#[1] 8 ... note that neither the delimiting single quotes nor the backslashes get counted
out2
#[1] "'quoted'" ... and the default output quote character is a double quote.
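A short sketch to confirm that the backslashes belong only to the printed representation and are not stored in the string. It reuses the names from the question; cat() and writeLines() are shown as alternatives to print(quote = FALSE).

```r
Y_Columns <- c("Y.1.1")
out <- paste0('{"ImportId":"', Y_Columns, '"}')

# The string holds 20 characters; no backslash is among them.
n_chars <- nchar(out)
has_backslash <- grepl("\\", out, fixed = TRUE)
n_chars          # 20
has_backslash    # FALSE

# cat() and writeLines() write the raw contents without escapes or quotes.
cat(out, "\n")
writeLines(out)
```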
Here's a good Q&A to review: How to replace '+' using gsub() function in R
It has two answers, both useful: one shows how to double escape a special character and the other shows how to use the fixed argument to get around that requirement.
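The two approaches mentioned there can be sketched as follows; the input string is made up for illustration.

```r
x <- "1+2+3"

# Double escape: "\\+" in R source is the regex \+, a literal plus sign.
escaped <- gsub("\\+", "-", x)

# fixed = TRUE treats the pattern as plain text, so no escaping is needed.
literal <- gsub("+", "-", x, fixed = TRUE)

escaped   # "1-2-3"
literal   # "1-2-3"
```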
And another potentially useful Q&A on the topic of handling Windows paths:
File path issues in R using Windows ("Hex digits in character string" error)
And some further useful reading suggestions: look at the series of help pages whose names start with capital letters. Since I can never remember which one has which nugget of essential information, I tried ?Syntax first; it has a "See Also" list of essential reading: Arithmetic, Comparison, Control, Extract, Logic, NumericConstants, Paren, Quotes, Reserved. I then realized that what I wanted to refer you to was most likely ?Quotes, where all the R-specific escape sequences should be listed.

grepping special characters in R

I have a variable named full.path.
I am checking whether the string it contains has any of certain special characters.
In the code below, I am trying to grep for those special characters. Even though none of them are present, the output I get is TRUE.
Could someone explain what is going on? Thanks in advance.
full.path <- "/home/xyz"
#This returns TRUE :(
grepl("[?.,;:'-_+=()!@#$%^&*|~`{}]", full.path)
By plugging this regex into https://regexr.com/ I was able to spot the issue: if you have - between two characters in a character class, you create a range. The range from ' to _ spans ASCII codes 39 to 95, which includes the digits, the uppercase letters, and the / in your path, so you get spurious matches.
To avoid this behaviour, you can put - first in the character class, which is how you signal you want to actually match - and not a range:
> grepl("[-?.,;:'_+=()!@#$%^&*|~`{}]", full.path)
[1] FALSE
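To see which characters the accidental range picks up, the class can be reduced to just the range. This is a minimal sketch that assumes code-point ordering for ranges, which is what the result in the question suggests; the variable names are illustrative.

```r
full.path <- "/home/xyz"

# Code points of the range endpoints and of the slash in between.
utf8ToInt("'/_")   # 39 47 95

# "'-_" inside a class is a range covering code points 39 to 95.
range_hit <- grepl("['-_]", full.path)
matched <- regmatches(full.path, gregexpr("['-_]", full.path))[[1]]
range_hit   # TRUE
matched     # "/" "/"

# With - placed first, it is a literal and no range is created.
literal_hit <- grepl("[-'_]", full.path)
literal_hit # FALSE
```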

Function argument converted to date

I have a function with an argument that is a link to a file. My problem is that even though I specify that I want a string here, part of it seems to be recognized as a date. This results in part of my string being replaced by "t-". How do I prevent this from happening?
smfunc <- function(link=as.character("T:\11-10-2017 - Folder\filename.csv"))
{
link
}
smfunc()
[1] "T:\t-10-2017 - Folder\filename.csv"
How do I prevent this from happening?
Easy: this does not happen (that would be terrible). The problem is different: you forgot to escape the backslashes:
smfunc = function (link = "T:\\11-10-2017 - Folder\\filename.csv") {
link
}
Without the escaped backslashes, '\11' is interpreted as an octal character code (octal 11 = decimal 9, which is the tab character '\t').
'\f', by pure chance, is a valid escape sequence equivalent to the “form feed” character. This is not the same as '\\f', i.e. a literal backslash followed by an “f”, and which is what you need.
Using as.character, incidentally, is redundant here: your value is already a character vector.
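A small sketch of the escape behaviour described above. The raw-string form r"(...)" is an addition not mentioned in the answer and assumes R 4.0.0 or later; the variable names are illustrative.

```r
# "\11" is one character: octal 11, i.e. the tab character.
identical("\11", "\t")   # TRUE

# Doubling each backslash stores a single literal backslash.
path <- "T:\\11-10-2017 - Folder\\filename.csv"
writeLines(path)

# Raw strings (R >= 4.0.0) take backslashes literally, with no doubling.
raw_path <- r"(T:\11-10-2017 - Folder\filename.csv)"
identical(path, raw_path)   # TRUE
```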

Use of quotes within get function (get())

I hope to get some help on the use of quotation marks within a string for get().
Say, I want to retrieve an element from a list
some_list <- list(element1=11,element2=22,element3=33)
naturally, I can simply reference this element through
some_list[['element1']]
However, once I use this as a string within get(), R throws this error message
get("some_list[['element1']]")
> Error in get("some_list[['element1']]") :
object 'some_list[['element1']]' not found
I cannot figure out why this is the case. get() works fine when used with strings that do not have quotation marks within them, e.g.
get("some_list")
I also tried escaping the quotation marks within the string (although I don't think I would need to since they are single quotation marks), but it does not work either.
some_list[["\'"element1"\'"]]
What am I missing?
get won't do that.
some_list[['element1']] isn't the name of an object in an R environment (in a technical sense). When you type some_list[['element1']] at the console, R parses the expression, looks up the symbol some_list and then calls the function [[. get is intended just for the symbol lookup piece of that.
(Technically, my sequence of events there probably isn't right, but I listed them that way to help make the issue clear. Really, R is just parsing the expression, and then calling [[ with arguments some_list and 'element1', and those symbols are subsequently looked up.)
The quotes have nothing to do with it. Run:
get("some_list")[['element1']]
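A brief sketch of the split between symbol lookup and extraction, for the case where both the list name and the element name arrive as strings. The names list_name and elem_name are hypothetical.

```r
some_list <- list(element1 = 11, element2 = 22, element3 = 33)

# Hypothetical inputs: both names are only known as strings.
list_name <- "some_list"
elem_name <- "element1"

# get() resolves the symbol; [[ then extracts the element.
value <- get(list_name)[[elem_name]]
value   # 11

# [[ is an ordinary function and can be called as one.
same_value <- `[[`(some_list, elem_name)
same_value   # 11
```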

read.fwf and the number sign

I am trying to read this file (3.8mb) using its fixed-width structure as described in the following link.
This command:
a <- read.fwf('~/ccsl.txt',c(2,30,6,2,30,8,10,11,6,8))
Produces an error:
line 37 did not have 10 elements
After replicating the issue with different values of the skip option, I figured that the lines causing the problem all contain the "#" symbol.
Is there any way to get around it?
As @jverzani already commented, this problem is probably caused by the fact that the # sign is often used as the character that signals a comment. Setting the comment.char argument of read.fwf to something other than # could fix the problem. I'll leave my answer below as a more general approach that you can use for any character that causes problems (e.g. the 's in the Dutch city name 's Gravenhage).
I've had this problem occur with other symbols. The approach I took was to simply replace the # by either nothing, or by a character which does not generate the error. In my case it was no problem to simply replace the character, but this might not be possible in your case.
So my approach would be to delete the symbol that generates the error, or replace it with another character. This can be done in a text editor (find and replace), in an R script, or with Linux command-line tools such as grep and sed. If you want to do this in an R script, use scan or readLines to read the lines. Once the text is in memory, you can use sub to replace the character.
If you cannot permanently replace the character, I would try the following approach: replace the character with a placeholder that does not generate an error, read the file into R using read.fwf, and finally replace the placeholder with the # character again.
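The placeholder approach might look like the sketch below. The two-line sample file, the column widths, and the choice of "@" as placeholder are all assumptions made for illustration; the placeholder must not occur anywhere in the real data.

```r
# Hypothetical sample file with one line containing "#".
src <- tempfile(fileext = ".txt")
writeLines(c("AB#123", "CD4567"), src)

# Swap "#" for a placeholder before parsing.
cleaned <- gsub("#", "@", readLines(src), fixed = TRUE)
tmp <- tempfile(fileext = ".txt")
writeLines(cleaned, tmp)

# Parse the fixed-width columns, then restore the original character.
df <- read.fwf(tmp, widths = c(2, 4), stringsAsFactors = FALSE)
df$V2 <- gsub("@", "#", df$V2, fixed = TRUE)
df
```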
Following up on the answer above: to get all characters to be read as literals, use both comment.char="" and quote="" (the latter takes care of @PaulHiemstra's problem with single quotes in Dutch proper nouns) in the call to read.fwf (this is documented in ?read.table).
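A minimal sketch of the comment.char part of that advice, using a made-up two-line file and made-up widths. The sample data contains no quote characters, so only comment.char is set here.

```r
# Hypothetical sample file with one line containing "#".
src <- tempfile(fileext = ".txt")
writeLines(c("AB#123", "CD4567"), src)

# comment.char = "" is passed on to read.table and turns off comment handling,
# so "#" is read as ordinary data.
fwf <- read.fwf(src, widths = c(2, 4), comment.char = "",
                stringsAsFactors = FALSE)
fwf
```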
