Are integer vectors numeric vectors in R?

I have an integer vector that I expected I could treat as a numeric vector:
> class(pf$age)
[1] "integer"
> is.numeric(pf$age)
[1] TRUE
However, when I try to use it to calculate a correlation, I get an error:
> cor.test(x = "age", y = "friend_count", data = pf)
Error in cor.test.default(x = "age", y = "friend_count", data = pf) :
'x' must be a numeric vector
None of my best guesses at alternate syntax work either: http://pastie.org/9595290
What's going on?
Edit:
The following syntax works:
> x = pf$age
> y = pf$friend_count
> cor.test(x, y, data = pf, method="pearson", alternative="greater")
However, I don't understand why I can't specify x and y in the function (as you can with other R functions like ggplot). What is the difference between ggplot and cor.test?

You don't refer to variables using character strings like that in a function call. The x and y arguments expect numeric vectors, but you passed length-1 character vectors:
> is.numeric("age")
[1] FALSE
> is.character("age")
[1] TRUE
Hence you were asking cor.test() to compute the correlation between the strings "age" and "friend_count".
You also mixed up the formula method of cor.test() with the default one. You supply a formula and a data object or you supply arguments x and y. You can't mix and match.
Two solutions are:
with(pf, cor.test(x = age, y = friend_count))
cor.test( ~ age + friend_count, data = pf)
The first uses the default method, but we allow ourselves to refer to the variables in pf directly by using with(). The second uses the formula method.
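
As a rough check that the two calling styles agree, here is a minimal sketch. The data frame below is made up for illustration (the real pf is not shown in the question); only the column names age and friend_count come from the original post.

```r
# Hypothetical stand-in for the asker's data frame `pf`.
pf <- data.frame(
  age          = c(18L, 25L, 31L, 40L, 52L, 60L),
  friend_count = c(300L, 250L, 210L, 180L, 90L, 60L)
)

# Default method: x and y are numeric vectors, looked up inside pf by with().
res_default <- with(pf, cor.test(x = age, y = friend_count))

# Formula method: a one-sided formula plus a data argument.
res_formula <- cor.test(~ age + friend_count, data = pf)

# Both routes should report the same correlation estimate.
all.equal(unname(res_default$estimate), unname(res_formula$estimate))

# Passing the column names as strings fails, as in the question.
bad <- try(cor.test(x = "age", y = "friend_count"), silent = TRUE)
inherits(bad, "try-error")
```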
As to your question in the title; yes, integer vectors are considered numeric in R:
> int <- c(1L, 2L)
> is.integer(int)
[1] TRUE
> is.numeric(int)
[1] TRUE
Do note @Joshua Ulrich's point in the comment below. Technically integers are slightly different to numerics in R, as Joshua shows. However, this difference need not concern users most of the time, as R can convert/use these as needed. It does matter in some places, such as .C() calls, for example.
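
A small sketch of where the integer/double distinction is visible. This is not necessarily the example from the comment; it only uses base R functions.

```r
int <- c(1L, 2L)
dbl <- c(1, 2)

typeof(int)          # "integer"
typeof(dbl)          # "double"

# Both count as numeric ...
is.numeric(int)      # TRUE
is.numeric(dbl)      # TRUE

# ... and compare equal element-wise, because R converts as needed,
all(int == dbl)      # TRUE

# but they are not the same object in storage terms.
identical(int, dbl)  # FALSE

# Arithmetic can silently promote integers to doubles.
typeof(int / 2L)     # "double"
```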

You can use 'get' with strings to get data:
age = pf$age
friend_count = pf$friend_count
or:
attach(pf)
then the following should work:
cor.test(x = get("age"), y = get("friend_count"))
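
If the column names are only available as strings, a possible alternative that avoids attach() is to index the data frame with [[ ]]. The sketch below assumes a made-up pf; x_name and y_name are hypothetical helper variables.

```r
# Hypothetical stand-in for the asker's data frame `pf`.
pf <- data.frame(
  age          = c(18L, 25L, 31L, 40L, 52L, 60L),
  friend_count = c(300L, 250L, 210L, 180L, 90L, 60L)
)

x_name <- "age"
y_name <- "friend_count"

# pf[[x_name]] returns the column itself (an integer vector here),
# so cor.test() receives numeric vectors rather than strings.
res <- cor.test(x = pf[[x_name]], y = pf[[y_name]])
res$estimate
```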

Related

How to split a "formula" in R

I'm trying to make a small R package with my limited knowledge in R programming. I am trying to use the following argument:
formula=~a+b*X
where X is vector, 'a' and 'b' are constants in a function call.
What I'm wondering is once I input the formula, I want to extract (a,b) and X separately and use them for other data manipulations inside the function call. Is there a way to do it in R?
I would really appreciate any guidance.
Note: Edited my question for clarity
I'm looking for something similar to model.matrix() output. The above mentioned formula can be more generalized to accommodate 'n' number of variables, say,
~2+3*X +4*Y+...+2*Z
In the output, I need the coefficients (2 3 4 ...2) as a vector and [1 X Y ... Z] as a covariate matrix.
The question is not completely clear so we will assume that the question is, given a formula using standard formula syntax, how do we parse out the variables names (or in the second answer the variable names and constants) giving as output a character vector containing them.
1) all.vars Try this:
fo <- ~ a + b * X # input
all.vars(fo)
giving:
[1] "a" "b" "X"
2) strapplyc Also we could do it with string manipulation. In this case it also parses out the constants.
library(gsubfn)
fo <- ~ 25 + 35 * X # input
strapplyc(gsub(" ", "", format(fo)), "-?[0-9.]+|[a-zA-Z0-9._]+", simplify = unlist)
giving:
[1] "25" "35" "X"
Note: If all you are trying to do is to evaluate the RHS of the formula as an R expression then it is just:
X <- 1:3
fo <- ~ 1 + 2 * X
eval(fo[[2]])
giving:
[1] 3 5 7
Update: Fixed and added second solution and Note.
A call is a list of symbols and/or other calls and its elements can be accessed through normal indexing operations, e.g.
f <- ~a+bX
f[[1]]
#`~`
f[[2]]
#a + bX
f[[2]][[1]]
#`+`
f[[2]][[2]]
#a
However notice that in your formula bX is one symbol, you probably meant b * X instead.
f <- ~a + b * X
Then a and b typically would be stored in an unevaluated list.
vars <- call('list', f[[2]][[2]], f[[2]][[3]][[2]])
vars
#list(a, b)
and vars would be passed to eval at some point.
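
For the generalized form in the question (~2+3*X+4*Y+...), the same indexing idea can be applied recursively. The following is only a sketch under a strong assumption: every term is either a bare numeric constant or of the form constant * variable. split_terms is a hypothetical helper, not an existing R function.

```r
# Recursively split an expression on binary `+` into a list of terms.
split_terms <- function(e) {
  if (is.call(e) && identical(e[[1]], as.name("+")) && length(e) == 3) {
    c(split_terms(e[[2]]), split_terms(e[[3]]))
  } else {
    list(e)
  }
}

f <- ~ 2 + 3 * X + 4 * Y
terms <- split_terms(f[[2]])   # f[[2]] is the right-hand side

# For `coef * var` terms take the first operand; a bare constant is its own coefficient.
coefs <- vapply(terms, function(tm) if (is.call(tm)) tm[[2]] else tm, numeric(1))

# Variable names; a bare constant is labelled "1" (the intercept column).
vars <- vapply(terms, function(tm) if (is.call(tm)) deparse(tm[[3]]) else "1",
               character(1))

coefs  # 2 3 4
vars   # "1" "X" "Y"
```

Given data for X and Y, the covariate matrix could then be assembled with cbind(1, X, Y); terms that do not match the assumed shape would need extra handling.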

Is it possible to write if-statement for errors in r?

For example, I want to fill empty lists with many data frames (obtained by web scraping).
Most of the web data behaves nicely, but a small part does not (for example, missing data on an IMDb ratings page).
Is it possible to create an if-statement, such that:
if(Error "bla-bla-bla" exists){
then create data.frame(that consists of NA's)
and add data.frame to list
It would be nice to see a pattern or something like that.
Many thanks!
The simplest way is to use try and examine the class of the object:
x = "5"
y = try(x+x, silent=TRUE)
if(inherits(y, "try-error")) {
## Do something
}
If you examine y,
R> y
[1] "Error in x + x : non-numeric argument to binary operator\n"
attr(,"class")
[1] "try-error"
attr(,"condition")
<simpleError in x + x: non-numeric argument to binary operator>
You can extract the error condition with attr(y, "condition"), and its message with conditionMessage(). Also look at tryCatch.
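
A sketch of the pattern the question asks for, using tryCatch. Here scrape_one and na_row are hypothetical stand-ins: the first simulates a scraping step that fails for some inputs, the second builds the NA placeholder.

```r
# Hypothetical scraping step: fails for even ids, succeeds otherwise.
scrape_one <- function(id) {
  if (id %% 2 == 0) stop("no rating data for id ", id)
  data.frame(id = id, rating = id / 10)
}

# Placeholder data frame with the same columns, filled with NA.
na_row <- function(id) data.frame(id = id, rating = NA_real_)

# On error, the handler's value becomes the result for that element.
results <- lapply(1:4, function(id) {
  tryCatch(scrape_one(id), error = function(e) na_row(id))
})

do.call(rbind, results)
```

The handler receives the condition object e, so conditionMessage(e) could be logged or inspected if different errors need different treatment.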

Function name in single quotation marks in R

It may be a silly question but I have been bothered for quite a while. I've seen people use single quotation marks to surround the function name when they are defining a function. I keep wondering the benefit of doing so. Below is a naive example
'row.mean' <- function(mat){
return(apply(mat, 1, mean))
}
Thanks in advance!
Going off Richard's assumption, back ticks allow you to use symbols in names which are normally not allowed. See:
`add+5` <- function(x) {return(x+5)}
defines a function, but
add+5 <- function(x) {return(x+5)}
returns
Error in add + 5 <- function(x) { : object 'add' not found
To refer to the function, you need to explicitly use the back ticks as well.
> `add+5`(3)
[1] 8
To see the code for this function, simply call it without its arguments:
> `add+5`
function(x) {return(x+5)}
See also this comment which deals with the difference between the backtick and quotes in name assignment: https://stat.ethz.ch/pipermail/r-help/2006-December/121608.html
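
To connect this back to the single quotes in the question: on the left-hand side of an assignment, a quoted string is treated as the name to assign to. A minimal sketch follows (the matrix m is made up for illustration).

```r
# Quoted name on the left of `<-`: defines an ordinary function row.mean.
'row.mean' <- function(mat) {
  apply(mat, 1, mean)
}

m <- matrix(1:6, nrow = 2)   # rows are (1, 3, 5) and (2, 4, 6)
row.mean(m)                  # 3 4

# The same works for a non-syntactic name, but calling it needs back ticks.
"add+5" <- function(x) x + 5
`add+5`(3)                   # 8
```

For a syntactic name such as row.mean the quotes make no difference to the result, so in that case they appear to be a matter of style.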
Note, the usage of back ticks is much more general. For example, in a data frame you can have columns named with integers (maybe from using reshape::cast on integer factors).
For example:
test = data.frame(a = "a", b = "b")
names(test) <- c(1,2)
and to retrieve these columns you can use the backtick in conjunction with the $ operator, e.g.:
> test$1
Error: unexpected numeric constant in "test$1"
but
> test$`1`
[1] a
Levels: a
Funnily, you can't use back ticks to set such column names in the data.frame() call itself; with the default check.names = TRUE the following yields columns X1 and X2 rather than 1 and 2:
test = data.frame(`1` = "a", `2` = "b")
And responding to statechular's comments, here are two more use cases.
Infix functions
Using the % symbol we can naively define the dot product between vectors x and y:
`%.%` <- function(x,y){
sum(x * y)
}
which gives
> c(1,2) %.% c(1,2)
[1] 5
for more, see: http://dennisphdblog.wordpress.com/2010/09/16/infix-functions-in-r/
Replacement functions
Here is a great answer demonstrating what these are: What are Replacement Functions in R?
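
As a brief illustration of why back ticks matter here: a replacement function has a name ending in <-, which is not a syntactic name, so it has to be quoted when defined. The function `second<-` below is a made-up example.

```r
# A replacement function: its last argument must be called `value`,
# and it returns the modified object.
`second<-` <- function(x, value) {
  x[2] <- value
  x
}

v <- 1:5
second(v) <- 10L   # R rewrites this as v <- `second<-`(v, value = 10L)
v                  # 1 10 3 4 5
```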

Converting from a Formula object to a list

In R, I would like to iterate over a formula object. R automatically converts a formula to a parse tree, so I see no reason why I shouldn't be able to iterate.
For example, f <- ~x + y has elements f[[1]] = ~ and f[[2]] = x + y. However, for(v in f) print(toString(v)) does not output
[1] "~"
[1] "+, x, y"
as I would expect it to. Instead, it gives the error invalid for() loop sequence.
If I need to do it manually, I could always use for(i in 1:length(f)) print(toString(f[[i]])) which does produce the correct output. However, I would like to know why the first method does not work.
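
A likely explanation, offered as a sketch rather than a definitive answer: for() only accepts vectors (including lists), and a formula is a language object, not a vector, even though [[ ]] indexing works on it. Converting it with as.list() first gives for() something it can loop over.

```r
f <- ~ x + y

typeof(f)      # "language"
is.vector(f)   # FALSE, so for() rejects it as a loop sequence

# as.list() turns the call into a list of its components.
out <- character(0)
for (v in as.list(f)) {
  out <- c(out, toString(v))
}
out            # "~"  "+, x, y"
```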

How to pass vector to integrate function

I want to integrate a function fun_integrate that has a vector vec as an input parameter:
fun_integrate <- function(x, vec) {
y <- sum(x > vec)
dnorm(x) + y
}
#Works like a charm
fun_integrate(0, rnorm(100))
integrate(fun_integrate, upper = 3, lower = -3, vec = rnorm(100))
300.9973 with absolute error < 9.3e-07
Warning message:
In x > vec :
longer object length is not a multiple of shorter object length
As far as I can see, the problem is the following: integrate calls fun_integrate for a vector of x that it computes based on upper and lower. This vectorized call seems not to work with another vector being passed as an additional argument. What I want is that integrate calls fun_integrate for each x that it computes internally and compares that single x to the vector vec and I'm pretty sure my above code doesn't do that.
I know that I could implement an integration routine myself, i.e. compute nodes between lower and upper and evaluate the function on each node separately. But that wouldn't be my preferred solution.
Also note that I checked Vectorize, but this seems to apply to a different problem, namely that the function doesn't accept a vector for x. My problem is that I want an additional vector as an argument.
integrate(Vectorize(fun_integrate,vectorize.args='x'), upper = 3, lower = -3, vec = rnorm(100),subdivisions=10000)
304.2768 with absolute error < 0.013
#testing with an easier function
test<-function(x,y) {
sum(x-y)
}
test(1,c(0,0))
[1] 2
test(1:5,c(0,0))
[1] 15
Warning message:
In x - y :
longer object length is not a multiple of shorter object length
Vectorize(test,vectorize.args='x')(1:5,c(0,0))
[1] 2 4 6 8 10
#with y=c(0,0) this is f(x)=2x and the integral easy to solve
integrate(Vectorize(test,vectorize.args='x'),1,2,y=c(0,0))
3 with absolute error < 3.3e-14 #which is correct
Roland's answer looks good. Just wanted to point out that it's the comparison inside sum(), not integrate(), that is throwing the warning message.
Rgames> xf <- 1:10
Rgames> vf <- 4:20
Rgames> sum(xf>vf)
[1] 0
Warning message:
In xf > vf :
longer object length is not a multiple of shorter object length
The fact that the answer you got is not the correct value is what suggests that integrate is not sending the x-vector you expected to your function.
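
To see what integrate() actually sends to the integrand, one can record the length of x on each call. The sketch below does that with a hypothetical probe function, and then shows a hand-vectorized variant of the integrand (per_point, also hypothetical) that compares each x with the whole of vec, which is the effect Vectorize achieves in the answer above.

```r
# Record how many x values integrate() passes per call.
lengths_seen <- integer(0)
probe <- function(x) {
  lengths_seen <<- c(lengths_seen, length(x))
  dnorm(x)
}
invisible(integrate(probe, lower = -3, upper = 3))
all(lengths_seen > 1)   # TRUE: x arrives as a vector, not one point at a time

# Compare each element of x with all of vec, one element at a time.
per_point <- function(x, vec) {
  counts <- vapply(x, function(xi) sum(xi > vec), numeric(1))
  dnorm(x) + counts
}

# With vec = c(-1, 1) the counts for x = -2, 0, 2 are 0, 1, 2.
per_point(c(-2, 0, 2), vec = c(-1, 1))
```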
