Why are names(x)<-y and "names<-"(x,y) not equivalent? - r

Consider the following:
y<-c("A","B","C")
x<-z<-c(1,2,3)
names(x)<-y
"names<-"(z,y)
If you run this code, you will discover that names(x)<-y is not identical to "names<-"(z,y). In particular, one sees that names(x)<-y actually changes the names of x whereas "names<-"(z,y) returns z with its names changed.
Why is this? I was under the impression that the difference between writing a function normally and writing it as an infix operator was only one of syntax, rather than something that actually changes the output. Where in the documentation is this difference discussed?

Short answer: names(x)<-y is actually sugar for x<-"names<-"(x,y) and not just "names<-"(x,y). See the the R-lang manual, pages 18-19 (pages 23-24 of the PDF), which comes to basically the same example.
For example, names(x) <- c("a","b") is equivalent to:
`*tmp*`<-x
x <- "names<-"(`*tmp*`, value=c("a","b"))
rm(`*tmp*`)
If more familiar with getter/setter, one can think that if somefunction is a getter function, somefunction<- is the corresponding setter. In R, where each object is immutable, it's more correct to call the setter a replacement function, because the function actually creates a new object identical to the old one, but with an attribute added/modified/removed and replaces with this new object the old one.
In the case example for instance, the names attribute are not just added to x; rather a new object with the same values of x but with the names is created and linked to the x symbol.
Since there are still some doubts about why the issue is discussed in the language doc instead directly on ?names, here is a small recap of this property of the R language.
You can define a function with the name you wish (there are some restrictions of course) and the name does not impact in any way if the function is called "normally".
However, if you name a function with the <- suffix, it becomes a replacement function and allows the parser to apply the function with the mechanism described at the beginning of this answer if called by the syntax foo(x)<-value. See here that you don't call explicitely foo<-, but with a slightly different syntax you obtain an object replacement (since the name).
Although there are not formal restrictions, it's common to define getter/setter in R with the same name (for instance names and names<-). In this case, the <- suffix function is the replacement function of the corresponding version without suffix.
As stated at the beginning, this behaviour is general and a property of the language, so it doesn't need to be discussed in any replacement function doc.

In particular, one sees that names(x)<-y actually changes the names of x whereas "names<-"(z,y) returns z with its names changed.
That’s because `names<-`1 is a regular function, albeit with an odd name2. It performs no assignment, it returns a new object with the names attribute set. In fact `names<-` is a primitive function in R but it could be implemented as follows (there are shorter, better ways of writing this in R, but I want the separate steps to be explicit):
`names<-` = function (x, value) {
new = x
attr(new, 'names') = value
new
}
That is, it
… creates a new object that’s a copy of x,
… sets the names attribute on that newly created object, and
… returns the new object.
Since virtually all objects in R are immutable, this fits naturally into R’s semantics. In fact, a better name for this exact function would be with_names3. But the creators of R found it convenient to be able to write such an assignment without repeating the name of the object. So instead of writing
x = with_names(x, c('foo', 'bar'))
or
x = `names<-`(x, c('foo', 'bar'))
R allows us to write
names(x) = c('foo', 'bar')
R handles this syntax specially by internally converting it to another expression, documented in the Subset assignment section of the R language definition, as explained in the answer by Nicola.
But the gist is that names(x) = y and `names<-`(x, y) are different because … they just are. The former is a special syntactic form that gets recognised and transformed by the R parser. The latter is a regular function call, and the weird function name is a red herring: it doesn’t affect the execution whatsoever. It does the same as if the function was named differently, and you can confirm this by assigning it a different name:
with_names = `names<-`
`another weird(!) name` = `names<-`
# These are all identical:
`names<-`(x, y)
with_names(x, y)
`another weird(!) name`(x, y)
1 I strongly encourage using backtick quotes (`) instead of straight quotes (' or ") to quote R variable names. While both are allowed in some circumstances, the latter invites confusion with strings, and is conceptually bonkers. These are not strings. Consider:
"a" = "b"
"c" = "a"
Rather than copy the value of a into c, what this code actually does is set c to literal "a", because quotes now mean different things on the left- and right-hand side of assignment.
The R documentation confirms that
The preferred quote [for variable names] is the backtick (`)
2 Regular variable names (aka “identifiers” or just “names”) in R can only contain letters, digits, underscore and the dot, must start with a letter, or with a dot not followed by a digit, and can’t be reserved words. But R allows using pretty much arbitrary characters — including punctuation and even spaces! — in variable names, provided the name is backtick-quoted.
3 In fact, R has an almost-alias for this function, called setNames — which isn’t a great name, since set… implies mutating the object, but of course it doesn’t do that.

Related

Rename a column with R

I'm trying to rename a specific column in my R script using the colnames function but with no sucess so far.
I'm kinda new around programming so it may be something simple to solve.
Basically, I'm trying to rename a column called Reviewer Overall Notes and name it Nota Final in a data frame called notas with the codes:
colnames(notas$`Reviewer Overall Notes`) <- `Nota Final`
and it returns to me:
> colnames(notas$`Reviewer Overall Notes`) <- `Nota Final`
Error: object 'Nota Final' not found
I also found in [this post][1] a code that goes:
colnames(notas) [13] <- `Nota Final`
But it also return the same message.
What I'm doing wrong?
Ps:. Sorry for any misspeling, English is not my primary language.
You probably want
colnames(notas)[colnames(notas) == "Reviewer Overall Notes"] <- "Nota Final"
(#Whatif's answer shows how you can do this with the numeric index, but probably better practice to do it this way; working with strings rather than column indices makes your code both easier to read [you can see what you're renaming] and more robust [in case the order of columns changes in the future])
Alternatively,
notas <- notas %>% dplyr::rename(`Nota Final` = `Reviewer Overall Notes`)
Here you do use back-ticks, because tidyverse (of which dplyr is a part) prefers its arguments to be passed as symbols rather than strings.
Why using backtick? Use the normal quotation mark.
colnames(notas)[13] <- 'Nota Final'
This seems to matter:
df <- data.frame(a = 1:4)
colnames(df)[1] <- `b`
Error: object 'b' not found
You should not use single or double quotes in naming:
I have learned that we should not use space in names. If there are spaces in names (it works and is called a non-syntactic name: And according to Wickham Hadley's description in Advanced R book this is due to historical reasons:
"You can also create non-syntactic bindings using single or double quotes (e.g. "_abc" <- 1) instead of backticks, but you shouldn’t, because you’ll have to use a different syntax to retrieve the values. The ability to use strings on the left hand side of the assignment arrow is an historical artefact, used before R supported backticks."
To get an overview what syntactic names are use ?make.names:
make.names("Nota Final")
[1] "Nota.Final"

Why the code only works with numbers and not letters?

I have to use the code bellow but I don't completely understand how it works. Why it won't work if I change du.4 by du.f and then use the f when calling the function? For some reason it only works with numbers and I do not undarstand why.
This is the error that it is giving in the case of du.f
Error in paste("Meth1=", nr, ".ps", sep = "") : object 'f' not found
du.4 <- function(u,v,a){(exp(a)*(-1+exp(a*v)))/(-exp(a)+exp(a+a*u)-exp(a*(u+v))+exp(a+a*v))}
plotmeth1 <- function(data1,data2,alpha,nr) {
psfile <-paste("Meth1=",nr,".ps",sep="")
diffmethod <-paste("du.",nr,sep="")
title=paste("Family",nr)
alphavalue <-paste("alpha=",round(alpha,digits=3),sep="")
#message=c("no message")
postscript(psfile)
data3<-sort(eval(call(diffmethod,data1,data2,alpha)))
diffdata <-data3[!is.na(data3)]
#if(length(data3)>length(diffdata))
#{message=paste("Family ",nr,"contains NA!")}
tq <-((1:length(diffdata))/(length(diffdata)+1))
plot(diffdata,tq,main=title,xlab="C1[F(x),G(y)]",ylab="U(0,1)",type="l")
legend(0.6,0.3,c(alphavalue))
abline(0,1)
#dev.off()
}
In R, a dot is used as just another character in identifiers. It is often used for clarity but doesn't have a formal function in defining the part after the dot as being in a name-space given by the part of the identifier before the dot. In something like du.f you can't refer to the function by f alone, even if your computation is inside of an environment named du. You can of course define a function named du.4 and then use 4 all by itself, but when you do so you are using the number 4 as just a number and not as a reference to the function. For example, if
du.4 <- function(u,v,a){(exp(a)*(-1+exp(a*v)))/(-exp(a)+exp(a+a*u)-exp(a*(u+v))+exp(a+a*v))}
Then du.4(1,2,3) evaluates to 21.08554 but attempting to use 4(1,2,3) throws the error
Error: attempt to apply non-function
In the case of your code, you are using paste to assemble the function name as a string to be passed to eval. It makes sense to paste the literal number 4 onto the string 'du.' (since the paste will convert 4 to the string '4') but it doesn't make sense to paste an undefined f onto 'du.'. It does, however, make sense to paste the literal string 'f' onto 'du.', so that the function call plotmeth1 (data1, data2, alpha, 'f') will work even though plotmeth1 (data1, data2, alpha, f) will fail.
See this question for more about the use of the dot in R identifiers.

Why does substitute change noquote text to a string in R?

I wanted to answer a question regarding plotmath but I failed to get my desired substitute output.
My desired output:paste("Hi", paste(italic(yes),"why not?"))
and what I get: paste("Hi", "paste(italic(yes),\"why not?\")")
text<-'paste(italic(yes),"why not?")'
text
[1] "paste(italic(yes),\"why not?\")"
noqoute_text<-noquote(text)
noqoute_text
[1] paste(italic(yes),"why not?")
sub<-substitute(paste("Hi",noqoute_text),
env=list(noqoute_text=noqoute_text))
sub
paste("Hi", "paste(italic(yes),\"why not?\")")
You're using the wrong function, use parse instead of noquote :
text<-'paste(italic(yes),"why not?")'
noquote_text <- parse(text=text)[[1]]
sub<- substitute(paste("Hi",noquote_text),env=list(noquote_text= noquote_text))
# paste("Hi", paste(italic(yes), "why not?"))
noquote just applies a class to an object of type character, with a specific print method not to show the quotes.
str(noquote("a"))
Class 'noquote' chr "a"
unclass(noquote("a"))
[1] "a"
Would you please elaborate on your answer?
In R you ought to be careful about the difference between what's in an object, and what is printed.
What noquote does is :
add "noquote" to the class attribute of the object
That's it
The code is :
function (obj)
{
if (!inherits(obj, "noquote"))
class(obj) <- c(attr(obj, "class"), "noquote")
obj
}
Then when you print it, the methods print.noquote :
Removes the class "noquote" from the object if it's there
calls print with the argument quote = FALSE
that's it
You can actually call print.noquote on a string too :
print.noquote("a")
[1] a
It does print in a similar fashion as quote(a) or substitute(a) would but it's a totally different beast.
In the code you tried, you've been substituting a string instead of a call.
For solving the question I think Moody_Mudskipperss answer works fine, but as you asked for some elaboration...
You need to be careful about different ways similar-looking things are actually stored in R, which means they behave differently.
Especially with the way plotmath handles labels, as they try to emulate the way character-strings are normally handled, but then applies its own rules. The 3 things you are mixing I think:
character() is the most familiar: just a string. Printing can be confusing when quotes etc. are escaped. The function noquote basically tells R to mark it's argument, so that quotes are not escaped.
calls are "unevaluated function-calls": it's an instruction as to what R should do, but it's not yet executed. Any errors in this call don't come up yet, and you can inspect it.
Note that a call does not have its own evironment given with it, which means a call can give different results if evaluated e.g. from within a function.
Expressions are like calls, but applied more generally, i.e. not always a function that needs to be executed. An expression can be a variable-name, but also a simple value such as "why not?". Also, expressions can consist of multiple units, like you would have with {
Different functions can convert between these classes, but sometimes functions (such as paste!) also convert unexpectedly:
noquote does not do that much useful, as Moody_Mudskipper already pointed out: it only changes the printing. But the object basically remains a character
substitute not only substitutes variables, but also converts its first argument into (most often) a call. Here, the print bites you, for when printing a call, there is no provision for special classes of its members. Try it: sub[[3]] from the question gives[1] paste(italic(yes),"why not?")
without any backslashes! Only when printing the full call the noquote-part is lost.
parse is used to transform a character to an expression. Nothing is evaluated yet, but some structure is introduced, so that you could manipulate the expression.
paste is often behaving annoyingly (although as documented), as it can only paste together character-strings. Therefore, if you feed it anything but a character, it firs calls as.character. So if you give it a call, you just get a text-line again. So in your question, even if you'd use parse, as soon as you start pasting thing together, you get the quotes again.
Finally, your problem is harder because it's using plotmaths internal logic.
That means that as soon as you try to evaluate your text, you'll probably get an error "could not find function italic" (or a more confusing error if there is a function italic defined elsewhere). When providing it in plotmath, it works because the call is only evaluated by plotmath, which will give it a nice environment, where italic works as expected.
This all means you need to treat it all as an expression or call. As long as evaluation cannot be done (as long as it's you that handles the expression, instead of plotmath) it all needs to remain an expression or call. Giving substitute a call works, but you can also emulate more closely what happens in R, with
call('paste', 'Hi', parse(text=text)[[1]])

Passing arguments to functions, and variable scopes in R

I'm writing a simple function that takes two arguments (state, outcome). State is used to subset a dataframe later.
Having said that, part of the requirement is that state be a length 2 character vector. I need to write more code to ensure that the state that is passed conforms to this requirement.
So I wrote the following:
best <- function(state, outcome) {
outcome <- read.csv("outcome-of-care-measures.csv", colClasses = "character")
state <- vector(mode = "character", length = 2)
st.checkTbl <- outcome[8]
state
}
However, when I call the function and pass the arguments:
best("AXA") or best("FOO") or even best("TX") or best(AL)
All I get back is: "" ""
If I comment out the #state <- ... then it passes the argument just fine and it prints "FOO" or "AXA" or "TX", etc.
How can I ensure that the argument passed to the function is stored as a variable (state) in the function? Or, am I way overthinking this? Really I just wanted to test that what I am passing to the state argument can be printed for test purposes.
. Sorry for the 101 lesson.
You would generally read your data outside of any function, like so:
outcome.data <- read.csv("outcome-of-care-measures.csv", colClasses = "character")
Otherwise, since a function has its own namespace, all the variables defined inside of it will vanish upon its return, unless they themselves are returned by the function with return(...). Several objects can be returned by putting them in a list: return(list(item1=var1, item2=var2)).
Some functions, such as assign, have the envir parameter that can be set to .GlobalEnv to change this behavior. Altering an object can also be done inside a function using the <<- operator instead of <-, although this practice is generally recommended against.
As a side note, when using a function, you need to define clearly:
What are its inputs
What does it do
What does it return
It's not useful, for instance, to use outcome as a function parameter and then read into a variable named income the content of a csv file. Your argument is then useless as it will be written over. That's why you had to comment out the line defining your state variable inside the function to actually be able to use state as it was received by the function.
This surely won't answer all your questions, but hopefully it can help you clarify certain things. For the rest there are plenty of good tutorials to learn further on how to program in R and how/when to use functions. Best of luck and happy learning!

R data table issue

I'm having trouble working with a data table in R. This is probably something really simple but I can't find the solution anywhere.
Here is what I have:
Let's say t is the data table
colNames <- names(t)
for (col in colNames) {
print (t$col)
}
When I do this, it prints NULL. However, if I do it manually, it works fine -- say a column name is "sample". If I type t$"sample" into the R prompt, it works fine. What am I doing wrong here?
You need t[[col]]; t$col does an odd form of evaluation.
edit: incorporating #joran's explanation:
t$col tries to find an element literally named 'col' in list t, not what you happen to have stored as a value in a variable named col.
$ is convenient for interactive use, because it is shorter and one can skip quotation marks (i.e. t$foo vs. t[["foo"]]. It also does partial matching, which is very convenient but can under unusual circumstances be dangerous or confusing: i.e. if a list contains an element foolicious, then t$foo will retrieve it. For this reason it is not generally recommended for programming.
[[ can take either a literal string ("foo") or a string stored in a variable (col), and does not do partial matching. It is generally recommended for programming (although there's no harm in using it interactively).

Resources