I'm maintaining some code where a previous author has used statements like:
x <- 1. * a + b
or:
if (y < 1.e-3)
or:
z[z < 0.] <- 0
or:
f <- aa + bb / 2.
The dots in these statements aren't appearing within parameter or function names, and they're not appearing within formulas, so I'm having trouble figuring out whether they have any significance. As far as I can tell, similar statements evaluated with constants substituted for the variables don't evaluate any differently.
I thought that perhaps the periods were inserted to coerce the result to a float, but the statements don't seem to be ambiguous in this regard, and I wasn't under the impression that R needed any help with different numeric types. The only other explanation I can come up with is that the values were originally floats and the previous author got lazy about removing the decimal points when they were changed to integers.
Is there any other possible use for the dot that could be relevant, or am I safe cleaning up these statements?
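A quick check in base R supports the observation above. This is a minimal sketch; the integer `a` below is a stand-in, not a variable from the original code. An unsuffixed numeric literal is already a double, so the trailing dot changes nothing, and an integer literal needs an explicit `L` suffix.

```r
# An unsuffixed literal is already a double; the trailing dot does not change that
typeof(1)                # "double"
typeof(1.)               # "double"
identical(1., 1)         # TRUE
identical(1.e-3, 1e-3)   # TRUE

# Integer literals need an explicit L suffix
typeof(1L)               # "integer"

# Hypothetical integer input: multiplying by 1 or by 1. gives a double either way
a <- 5L
typeof(1. * a)           # "double"
typeof(1 * a)            # "double"
```

So removing the dots should be safe. Removing a whole `1. *` factor is a different change: if `a` is an integer vector, the result would then stay an integer instead of becoming a double.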
I was wondering if there is a way for R to detect the presence or absence of the * sign as used in the following objects.
In other words, can R understand that a has a * sign but b doesn't?
a = 3*4
b = 12
If you keep the expressions unevaluated, R can understand their internal complexity. Under normal circumstances, though, R evaluates expressions immediately, so there is no way to tell the difference between a <- 3*4 and b <- 12 once the assignments have been made. That means that the answer to your specific question is No.
Dealing with unevaluated expressions can get a bit complex, but quote() is one simple way to keep e.g. 3*4 from being evaluated:
> length(quote(3*4))
[1] 3
> length(quote(12))
[1] 1
If you're working inside a function, you can use substitute to retrieve the unevaluated form of the function arguments:
> f <- function(a) {
+ length(substitute(a))
+ }
> f(12)
[1] 1
> f(3*4)
[1] 3
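For the original question (does the expression contain a `*`?), one option is to combine `substitute()` with base R's `all.names()`, which lists every function name and symbol appearing anywhere in an expression tree. A minimal sketch; `has_star` is a hypothetical helper name:

```r
# Hypothetical helper: does the unevaluated argument contain a call to `*`?
has_star <- function(x) {
  expr <- substitute(x)      # the argument as typed, before it is evaluated
  "*" %in% all.names(expr)   # all.names() walks the whole expression tree
}

has_star(3*4)                  # TRUE
has_star(12)                   # FALSE
has_star(sqrt(2*3+(7*19)^2))   # TRUE, even though the top-level call is sqrt()
```

It only sees what was typed at the call site, so it cannot recover a `*` from a value that has already been computed.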
In case you're pursuing this further, you should be aware that measuring complexity might not be as easy as you think:
> f(sqrt(2*3+(7*19)^2))
[1] 2
What's going on is that R stores expressions as a tree; the top level here consists of just two elements, sqrt and <the rest of the expression>, so its length is 2. If you want to measure complexity, you'll need to do some kind of collapsing or counting down the branches of the tree ...
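One way to do that counting is to recurse through the tree. This is a minimal sketch: `count_leaves` is a hypothetical helper that counts every symbol and constant, including function names and the `(` call that R keeps for parentheses, so the numbers it reports depend on that choice of what to count.

```r
# Hypothetical helper: count the leaves (symbols and constants) of an expression tree
count_leaves <- function(expr) {
  if (is.call(expr)) {
    # a call's elements are the function name followed by its arguments; recurse into each
    sum(vapply(as.list(expr), count_leaves, numeric(1)))
  } else {
    1  # a symbol or a constant is a single leaf
  }
}

count_leaves(quote(12))                    # 1
count_leaves(quote(3*4))                   # 3: `*`, 3, 4
count_leaves(quote(sqrt(2*3+(7*19)^2)))    # 11
```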
Furthermore, if you first assign a <- 3*4 and then call f(a) you get 1, not 3, because substitute() gives you back just the symbol a, which has length 1 ... the information about the difference between "12" and "3*4" gets lost as soon as the expression is evaluated, which happens when the value is assigned to the symbol a. The bottom line is that you have to be very careful in controlling when expressions get evaluated, and it's not easy.
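The point about evaluation can be seen directly. A short sketch reusing the `f` defined above: once `3*4` has been assigned, only the value is left, whereas `quote()` keeps the expression around until you choose to evaluate it.

```r
f <- function(a) length(substitute(a))

a <- 3*4
f(a)              # 1: substitute() only sees the symbol `a`, not 3*4

e <- quote(3*4)   # store the unevaluated expression instead of its value
length(e)         # 3: `*`, 3 and 4
eval(e)           # 12: evaluated only when we ask for it
```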
Hadley Wickham's chapter on expressions might be a good place to read more.
I am new to R and trying to understand the effect of the following code.
> x <- c(1, 2)
> x[0]
numeric(0)
> x[FALSE]
numeric(0)
> x[c(FALSE, TRUE)]
[1] 2
Specifically, having an extensive background in C and C++, I am interested in knowing what R does internally when accessing an element at index 0. I know that R has 1-based array indexing. But in this specific case, does it access the vector and then remove the result (numeric(0)), or does it remove 0 from the vector and show the results?
So, what is the definitive way to find out about this? What should I type in R as part of a ? or help command?
Based on comments from Roland and G. Grothendieck, I did a quick read of the R Language Definition. The answer is right there in §3.4.1:
A special case is the zero index, which has null effects: x[0] is an empty vector and otherwise including zeros among positive or negative indices has the same effect as if they were omitted.
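A short illustration of that rule, plus the logical-index recycling behind the x[c(FALSE, TRUE)] result. As for the help-page part of the question, ?`[` (or help("[")) opens the Extract help page, which documents these indexing rules. The vector `x` here is a hypothetical length-3 example, so the recycling is visible.

```r
x <- c(10, 20, 30)

x[0]                # numeric(0): a zero index on its own selects nothing
x[c(0, 2)]          # 20: the 0 is simply dropped
x[c(-1, 0)]         # 20 30: same as x[-1]
x[c(FALSE, TRUE)]   # 20: the logical index is recycled to FALSE TRUE FALSE
```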
I have a function f(v,u) and I defined function
solutionf(u) := fsolve(f(v,u)=v);
I need to plot solutionf(u) as a function of u, but simply calling
plot(solutionf(u), u = 0 .. 0.4e-1)
gives me the error
Error, (in fsolve) number of equations, 1, does not match number of variables, 2
However, I can always evaluate solutionf(x) at any numeric x.
Is there a simple way to plot this? Or do I have to write my own loop over u, evaluate the function at every point, and plot the interpolated values?
This is one of the most-often-asked Maple questions. Your error is caused by what is known as premature evaluation, the expression solutionf(u) being evaluated before u has been given a numeric value.
There are several ways to avoid premature evaluation. The simplest is probably to use forward single quotes:
plot('solutionf(u)', u= 0..0.4e-1);
sqr = seq(1, 100, by=2)
sqr.squared = NULL
for (n in 1:50)
{
sqr.squared[n] = sqr[n]^2
}
I came across the loop above; for a beginner it was simple enough. To understand R better: what exactly is the purpose of the second line? From my research I gather it has something to do with resetting the vector. If someone could elaborate, it'd be much appreciated.
sqr.squared <- NULL
is one of many ways to initialize the empty vector sqr.squared before filling it in a loop. In general, when the length of the resulting vector is known, it is much better practice to allocate the vector at its full length up front. So here,
sqr.squared <- vector("numeric", 50)
would be much better practice, and faster too, because you are not growing the vector inside the loop. (A numeric rather than an integer vector fits here: ^ always returns doubles, so an integer vector would just be converted on the first assignment.) But since ^ is vectorized, you could also simply do
sqr[1:50] ^ 2
and ditch the loop altogether.
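The three approaches can be compared side by side. A minimal sketch using the `sqr` vector from the question; the result names `grown`, `prealloc` and `vec` are made up for illustration. All three produce the same vector; they differ in how the result is built.

```r
sqr <- seq(1, 100, by = 2)   # 50 odd numbers: 1, 3, ..., 99

# 1. Growing from NULL (the original loop): works, but R may have to reallocate as it grows
grown <- NULL
for (n in 1:50) grown[n] <- sqr[n]^2

# 2. Preallocating the full length before the loop
prealloc <- numeric(50)
for (n in 1:50) prealloc[n] <- sqr[n]^2

# 3. Vectorized: ^ is applied elementwise to the whole vector
vec <- sqr^2

identical(grown, prealloc)   # TRUE
identical(prealloc, vec)     # TRUE
```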
Another way to think about it is to remember that everything in R is a function call, and functions (usually) need input.
Say you calculated y and want to store that value somewhere. You can do x <- y without initializing an x object (R does this for you, unlike some other languages such as C), but say you want to store it in a specific place in x.
So note that <- (or = in your example) is a function:
y <- 1
x[2] <- y
# Error in x[2] <- y : object 'x' not found
This is a different function than <-. Since you want to put y at x[2], you need the function [<-
`[<-`(x, 2, y)
# Error: object 'x' not found
But this still doesn't work because we need the object x to use this function, so initialize x to something.
(x <- numeric(5))
# [1] 0 0 0 0 0
# and now use the function
`[<-`(x, 2, y)
# [1] 0 1 0 0 0
This prefix notation is easier for computers to parse (e.g., + 1 1) but harder for humans (me at least), so we prefer infix notation (e.g., 1 + 1). In the same way, R lets you write x[2] <- y instead of calling `[<-` directly as I did above.
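To make the connection explicit, here is a sketch based on the usual semantics of replacement functions: x[2] <- y behaves roughly like calling `[<-` and assigning the result back to x (the R Language Definition describes this with a temporary `*tmp*` variable). The name `x2` is just for comparison.

```r
y <- 1

# The usual infix form
x <- numeric(5)
x[2] <- y

# Roughly what happens behind the scenes: call `[<-` and assign the result back
x2 <- numeric(5)
x2 <- `[<-`(x2, 2, y)

identical(x, x2)   # TRUE
`+`(1, 1)          # 2: the infix 1 + 1 written as a prefix call
```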
The first answer is correct: assigning NULL to a variable here is a way to initialize an (empty) vector. In many cases, when you are checking numbers or working with different types of variables, you will need to initialize arrays, matrices, etc. to NULL.
For example, if you want to create some kind of object, in some cases you need to put something inside it first; that is the purpose of using NULL. In addition, sometimes you will need NA instead of NULL.
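The practical difference between the two placeholders is easy to see. A small sketch; `v` and `w` are hypothetical names. NULL is an empty object that disappears when combined, while NA is a real (missing) element that stays.

```r
length(NULL)   # 0: NULL is an empty object
length(NA)     # 1: NA is one missing value

v <- NULL
v <- c(v, 5)   # c() drops the NULL, so v is just 5

w <- NA
w <- c(w, 5)   # the NA stays: w is NA 5
```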
I'm confused about when a value is treated as a variable and when as a string in R. In Ruby and Python, I'm used to a string always having to be quoted, and an unquoted string always being treated as a variable, i.e.:
a["hello"] => a["hello"]
b = "hi"
a[b] => a["hi"]
But in R, this is not the case, for example
a$b <- c(1,2,3)
b here is the value/name of the column, not the variable b.
c <- "b"
a$c => column not found (it's looking for column c, not b, which is the value of the variable c)
(I know that in this specific case I can use a[c], but there are many other cases, such as ggplot(a, aes(x=c)): I want to plot the column whose name is the value of c, not a column named c.)
In other StackOverflow questions, I've seen things like quote, substitute etc mentioned.
My question is: Is there a general way of "expanding" a variable and making sure the value of the variable is used, instead of the name of the variable? Or is that just not how things are done in R?
In your example, a$b is syntactic sugar for a[["b"]]. That's a special feature of the $ symbol when used with lists. The second form does what you expect: a[[b]] will return the element of a whose name equals the value of the variable b, rather than the element whose name is "b".
Data frames are similar. For a data frame a, the $ operator refers to the column names. So a$b is the same as a[ , "b"]. In this case, to refer to the column of a indicated by the value of b, use a[, b].
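A small sketch of the difference on a data frame. The data frame is made up for illustration, and the question's variable `c` is renamed to `col` here to avoid masking the base function `c()`.

```r
a <- data.frame(b = c(1, 2, 3), d = c(4, 5, 6))
col <- "b"      # variable holding a column name (the question's `c`)

a$b             # 1 2 3: literal column name after $
a[[col]]        # 1 2 3: [[ evaluates col, so this picks column "b"
a[, col]        # 1 2 3: matrix-style indexing also evaluates col
a$col           # NULL: $ looks for a column literally named "col"
```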
The reason that what you posted with respect to the $ operator doesn't work is quite subtle, and quite different from most other situations in R, where you can just use a function like get, which was designed for that purpose. However, calling a$b is equivalent to calling
`$`(a , b)
This reminds us that in R, everything is an object: $ is a function, and it takes two arguments. If we check the source code, we can see that calling a$c and expecting R to evaluate c to "b" will never work, because the source code states:
/* The $ subset operator.
We need to be sure to only evaluate the first argument.
The second will be a symbol that needs to be matched, not evaluated.
*/
It achieves this using the following:
if(isSymbol(nlist) )
    SET_STRING_ELT(input, 0, PRINTNAME(nlist));
else if(isString(nlist) )
    SET_STRING_ELT(input, 0, STRING_ELT(nlist, 0));
else {
    errorcall(call,_("invalid subscript type '%s'"),
              type2char(TYPEOF(nlist)));
}
nlist is the argument passed to do_subset3 (the C function that $ maps to), in this case c. Because c is a symbol, its name is converted to a string, but c itself is never evaluated. If the argument is already a string, it is used as-is.
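The same behaviour is visible from R when `$` is called as an ordinary function. A sketch with a made-up list `a` and a variable `col` (standing in for the question's `c`):

```r
a <- list(b = 1:3)
col <- "b"

`$`(a, b)     # 1 2 3: the symbol b is turned into the string "b"
`$`(a, "b")   # 1 2 3: a string literal is used as-is
`$`(a, col)   # NULL: col is matched as the name "col" and never evaluated
```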
Here are some links to help you understand the 'why's and 'when's of evaluation in R. They may be enlightening, they may even help, if nothing else they will let you know that you are not alone:
http://developer.r-project.org/nonstandard-eval.pdf
http://journal.r-project.org/2009-1/RJournal_2009-1_Chambers.pdf
http://www.burns-stat.com/documents/presentations/inferno-ish-r/
In that last one, the most important piece is bullet point 2, but read through the whole set of slides. I would probably start with the third link, then the first two.
These are less in the spirit of how to make a specific case work (as the other answers have done) and more in the spirit of what has led to this state of affairs and why in some cases it makes sense to have standard nonstandard ways of accessing variables. Hopefully understanding the why and when will help with the overall what to do.
If you want the value of a variable whose name is stored in a string, use the get function: get("b") returns the value of the variable named b, and so does get(c) when c <- "b".
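A minimal sketch of get() with a name held in another variable; `b` and `name` are illustrative. Note that get() looks up variables in an environment, so for data frame columns the [[ form is still the tool to use.

```r
b <- 42
name <- "b"   # a string holding the name of another variable

get("b")      # 42
get(name)     # 42: name is evaluated to "b", then the variable b is looked up
```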
If you want to play around with expressions, you need to use quote(), substitute(), bquote(), and friends like you mentioned.
For example:
x <- quote(list(a = 1))
names(x) # [1] "" "a"
a <- "foo" # a variable holding the new name
names(x) <- c("", a)
x # list(foo = 1)
And:
c <- "foo"
bquote(ggplot(a, aes(x=.(c)))) # ggplot(a, aes(x = "foo"))
substitute(ggplot(a, aes(x=c)), list(c = "foo")) # ggplot(a, aes(x = "foo"))
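One caveat for the ggplot case: splicing in the string gives aes(x = "foo"), which, as far as I know, ggplot2 treats as a constant value rather than a column reference. Splicing in a symbol built with as.name() gives aes(x = foo) instead. A sketch that only builds and deparses the calls, so ggplot2 does not need to be installed; `c_name` is a renamed stand-in for the answer's `c`.

```r
c_name <- "foo"   # renamed from `c` to avoid masking base::c

# Splicing the string gives a string constant inside aes()
deparse(bquote(ggplot(a, aes(x = .(c_name)))))
# "ggplot(a, aes(x = \"foo\"))"

# Splicing a symbol gives a column reference
deparse(bquote(ggplot(a, aes(x = .(as.name(c_name))))))
# "ggplot(a, aes(x = foo))"

# substitute() can make the same substitution
deparse(substitute(ggplot(a, aes(x = col)), list(col = as.name(c_name))))
# "ggplot(a, aes(x = foo))"
```

To actually draw the plot you would eval() the resulting call with ggplot2 loaded.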