Question regarding using function c() in R coding - r

I am studying data analytics, and my teacher gave the class this question: "using one-sigma, find any outlier in vector D". He gave his answer as below, but I do not understand why he called Out=c() before the for loop and then used Out again in c(Out,o). Could you help me answer this question? Thank you!
D=c(4,6,1,2,8,11)
xbar=mean(D)
std=sd(D)
L=xbar-std
U=xbar+std
Out=c()
for(j in 1:length(D)){
if(D[j]<L | D[j]>U) {o=D[j]} else{o=NULL}
Out=c(Out,o)}

Out=c() creates your output object. It is just empty (NULL) in the beginning, not a data frame. The for loop iterates j over the positions 1 to length(D). For each observation j it evaluates the conditional statement if(D[j]<L | D[j]>U) {o=D[j]} else{o=NULL} and then appends the result to Out with c(Out,o); appending NULL leaves Out unchanged. Hope this helps.
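To make the role of Out=c() concrete, here is a minimal sketch built on the data from the question. Out_vec is an illustrative name that does not appear in the original code, and the vectorized line is offered as an alternative, not as the teacher's method.

```r
D <- c(4, 6, 1, 2, 8, 11)
L <- mean(D) - sd(D)   # lower one-sigma bound
U <- mean(D) + sd(D)   # upper one-sigma bound

Out <- c()             # c() with no arguments returns NULL
is.null(Out)           # TRUE
length(Out)            # 0

# Appending to NULL works, and appending NULL adds nothing
for (j in seq_along(D)) {
  o <- if (D[j] < L | D[j] > U) D[j] else NULL
  Out <- c(Out, o)
}
Out                    # the values outside [L, U]

# Vectorized alternative (Out_vec is a hypothetical name): no loop needed
Out_vec <- D[D < L | D > U]
identical(Out, Out_vec)
```

If the Out=c() line is dropped, the first call to c(Out,o) has nothing to read on its right-hand side, which is the failure the question is asking about.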

The need for an object named Out to exist before entering the for loop comes from how the c and [<- functions are designed. They need the names they are given to exist in the table of objects that the R interpreter maintains. You used "=", but in that context it is really the <- function, the assignment operator, that is being used. The code in the question doesn't appear to use that operator, but it is actually being called when the "=" sign in Out=c(Out,o) is used. You cannot assign a value to the Out on the LHS of the assignment by appending to it unless the Out on the RHS already has a value (even a value of length 0) within the R data objects list when the c function tries to access it.
The <- operator is really a function disguised as an infix operator. You can demonstrate this with:
`<-`(my.out , 4)
> my.out
[1] 4
It also has an indexed assignment version, [<-, which requires that the named object on the LHS exist. This is another source of errors for for-loop users. If the named LHS object given to [<- doesn't exist at the time the loop is run, then the first time through the loop you will get an error:
rm(my.out2) #make sure it doesn't exist
for (i in 1:10) { my.out2[i] <- 4 } # LHS doesn't exist, but RHS value exists
#Error: object 'my.out2' not found
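A short sketch of the usual remedy, assuming the goal is a numeric vector of known length: create the object before the loop so that [<- has something to modify. The tryCatch wrapper is only there so the failing case can be shown without stopping the script.

```r
if (exists("my.out2", inherits = FALSE)) rm(my.out2)

# Without an existing object, indexed assignment fails on the first iteration
res <- tryCatch({
  for (i in 1:10) my.out2[i] <- 4
  "ok"
}, error = function(e) "failed")
res

# Creating the object first (here pre-sized with numeric(10)) makes the loop work
my.out2 <- numeric(10)
for (i in 1:10) my.out2[i] <- 4
my.out2
```

Pre-sizing with numeric(10) also avoids growing the vector on every iteration, although an empty c() would be enough to make the loop run.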

Related

R if statement in a function with a fomula argument

testing<-function(formula=NULL,data=NULL){
if(with(data,formula)==T){
print('YESSSS')
}
}
A<-matrix(1:16,4,4)
colnames(A)<-c('x','y','z','gg')
A<-as.data.frame(A)
testing(data=A,formula=(2*x+y==Z))
Error in eval(expr, envir, enclos) : object 'x' not found
##or I can put formula=(x=1)
##reason that I use formula is because my dataset had different location and I would want
##to 'subset' my data into different set
This is the main flow of my code. I have done some searching, and it seems either that no one has asked this kind of stupid question or that it is not possible to pass a formula to an if statement. Thank you in advance.
If you just want a subset of your data.frame, create a character object representing the formula like this:
formula="2*x+y==z"
testing<-function(data,formula){with(data = data,expr = eval(parse(text = formula)))}
subset(A,testing(A,formula=formula))
#x y z gg
#2 2 6 10 14
You can change the formula as per your need.
If we need to evaluate it, one option is eval(parse()):
testing<-function(formula=NULL,data=NULL){
data <- deparse(substitute(data))
if(any(eval(parse(text=paste("with(", data, ",",
deparse(substitute(formula)), ")")))))
print("YESSS")
}
testing(data=A,formula=(2*x+y==z))
#[1] "YESSS"
When you call a function in R, each argument is evaluated in the environment of the caller, not inside the function (R does this lazily, the first time the function needs the value).
For example, in prod(2+2, 3) the argument 2+2 is evaluated to 4 in the calling environment, so prod() effectively computes prod(4, 3).
Thus, in your code, R tries to evaluate (2*x+y==Z) in the place where testing() was called. It fails because there is no x object outside of the data frame, so the with() inside testing() never gets the chance to look x up in data.
To use your function correctly you should make it clear to R that it is not supposed to calculate (2*x+y==Z) there. Instead it should pass this information as is. You could do that using the functions expression() and eval().
testing<-function(formula=NULL,data=NULL){
if(with(data,eval(formula))==T){
print('YESSSS')
}
}
A<-matrix(1:16,4,4)
colnames(A)<-c('x','y','z','gg')
A<-as.data.frame(A)
testing(data=A,formula=expression(2*x+y==Z))
However, you will notice that there are other problems with your code.
First, Z is different from z. Notice that in colnames you use z and in the formula Z.
Second, if() only works with a single TRUE or FALSE value. In your case, you will have one value for each row in A. When this happens, older versions of R only check whether the first row fits the criteria (with a warning), and R 4.2.0 and later raise an error.
If your purpose is subsetting, it is much easier to do:
A.subset <- subset(A, 2*A$x+A$y == A$z)
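As a further sketch (not taken from any of the answers here), the condition can also be captured unevaluated with substitute() and then evaluated against the data frame, which lets the caller write the condition without quotes or expression(). The parameter name cond and the variables cond_expr, rows and hits are assumptions made for this illustration.

```r
A <- as.data.frame(matrix(1:16, 4, 4))
colnames(A) <- c("x", "y", "z", "gg")

testing <- function(data, cond) {
  cond_expr <- substitute(cond)                  # capture 2 * x + y == z unevaluated
  rows <- eval(cond_expr, data, parent.frame())  # evaluate with the columns of data in scope
  if (any(rows)) print("YESSSS")                 # any() reduces to a single TRUE/FALSE
  invisible(rows)
}

hits <- testing(A, 2 * x + y == z)
A[hits, ]
```

Returning the logical vector makes the same function usable both for the yes/no check and for subsetting.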
After a discussion with my colleague,
here is a kind of solution
testing<-function(cx,cy,px,py,z,data=NULL){
list<-NULL
for(m in 1:nrow(data)){
if(cx*data$x[m]^px+cy*data$y[m]^py+data$z[m]==0){
print(m)}
}
}
But this can deal with polynomials only, and it needs a lot of arguments in the function. I am thinking of a way to reduce it to a general equation, or maybe this is already the easiest way.

'=' vs. '<-' as a function argument in R

I am a beginner so I'd appreciate any thoughts, and I understand that this question might be too basic for some of you.
Also, this question is not about the difference between <- and =, but about the way they get evaluated when they are part of the function argument. I read this thread, Assignment operators in R: '=' and '<-' and several others, but I couldn't understand the difference.
Here's the first line of code:
My objective is to get rid of variables in the environment. From reading the above thread, I would believe that <- would exist in the user workspace, so there shouldn't be any issue with deleting all variables.
Here is my code and two questions:
Question 1
First off, this code doesn't work.
rm(ls()) #throws an error
I believe this happens because ls() returns a character vector, and rm() expects an object name. Am I correct? If so, I would appreciate it if someone could show me how to get object names from a character vector.
Question 2
I googled this topic and found that this code below deletes all variables.
rm(list = ls())
While this does help me, I am unsure why = is used instead of <-. If I run the following code, I get an error Error in rm(list <- ls()) : ... must contain names or character strings
rm(list <- ls())
Why is this? Can someone please guide me? I'd appreciate any help/guidance.
I read this thread, Assignment operators in R: '=' and '<-' and several others, but I couldn't understand the difference.
No wonder, since the answers there are actually quite confusing, and some are outright wrong. Since that’s the case, let’s first establish the difference between them before diving into your actual question (which, it turns out, is mostly unrelated):
<- is an assignment operator
In R, <- is an operator that performs assignment from right to left, in the current scope. That’s it.
= is either an assignment operator or a distinct syntactic token
=, by contrast, has several meanings: its semantics change depending on the syntactic context it is used in:
If = is used inside a parameter list, immediately to the right of a parameter name, then its meaning is: “associate the value on the right with the parameter name on the left”.
Otherwise (i.e. in all other situations), = is also an operator, and by default has the same meaning as <-: i.e. it performs assignment in the current scope.
As a consequence of this, the operators <- and = can be used interchangeably¹. However, = has an additional syntactic role in an argument list of a function definition or a function call. In this context it’s not an operator and cannot be replaced by <-.
So all these statements are equivalent:
x <- 1
x = 1
x[5] <- 1
x[5] = 1
(x <- 1)
(x = 1)
f((x <- 5))
f((x = 5))
Note the extra parentheses in the last example: if we omitted these, then f(x = 5) would be interpreted as a parameter association rather than an assignment.
With that out of the way, let’s turn to your first question:
When calling rm(ls()), you are passing ls() to rm as the ... parameter. Ronak’s answer explains this in more detail.
Your second question should be answered by my explanation above: <- and = behave differently in this context because the syntactic usage dictates that rm(list = ls()) associates ls() with the named parameter list, whereas <- is (as always) an assignment operator. The result of that assignment is then once again passed as the ... parameter.
¹ Unless somebody changed their meaning: operators, like all other functions in R, can be overwritten with new definitions.
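The distinction between parameter association and assignment can be checked directly. In this sketch, f and val are illustrative names chosen so that they do not clash with anything in the thread.

```r
f <- function(val) val * 2   # f and val are illustrative names

if (exists("val", inherits = FALSE)) rm(val)

r1 <- f(val = 5)                  # "=" inside the call: binds 5 to the parameter val
exists("val", inherits = FALSE)   # FALSE: no variable was created in the caller

r2 <- f(val <- 5)                 # "<-" is an assignment; its value is passed positionally
exists("val", inherits = FALSE)   # TRUE: val now exists in the calling environment

r3 <- f((val = 7))                # extra parentheses turn "=" back into an assignment
val                               # 7
```

All three calls hand a number to f, but only the last two leave a variable behind in the calling environment.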
To expand on my comment slightly, consider this example:
> foo <- function(a,b) b+1
> foo(1,b <- 2) # Works
[1] 3
> ls()
[1] "b" "foo"
> foo(b <- 3) # Doesn't work
Error in foo(b <- 3) : argument "b" is missing, with no default
The ... argument has some special stuff going on that restricts things a little further in the OP's case, but this illustrates the issue with how R is parsing the function arguments.
Specifically, when R looks for named arguments, it looks for arg = val, with an equals sign. Otherwise, it parses the arguments positionally. So when you omit the first argument, a, and just do b <- 3, it thinks the expression b <- 3 is what you are passing for the argument a.
If you check ?rm
rm(..., list = character(),pos = -1,envir = as.environment(pos), inherits = FALSE)
where
... - the objects to be removed, as names (unquoted) or character strings (quoted).
and
list - a character vector naming objects to be removed.
So, if you do
a <- 5
and then
rm(a)
it will remove the a from the global environment.
Further, if there are multiple objects you want to remove,
a <- 5
b <- 10
rm(a, b)
This can also be written as
rm(... = a, b)
where we are specifying that the ... part in syntax takes the arguments a and b
Similarly, when we want to specify the list part of the syntax, it has to be given by
rm(list = ls())
Doing list <- ls() instead will store the names returned by ls() in a variable named list:
list <- ls()
list
#[1] "a" "b"
I hope this is helpful.
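To try rm(list = ...) without wiping a real workspace, the same call can be pointed at a scratch environment. This is only a sketch; sandbox and objs are hypothetical names, and envir is the documented argument of ls() and rm() that selects the environment.

```r
sandbox <- new.env()            # hypothetical scratch environment
assign("a", 5, envir = sandbox)
assign("b", 10, envir = sandbox)

objs <- ls(envir = sandbox)     # a character vector of names
objs

# Character vectors go to the named argument list, not to ...
rm(list = objs, envir = sandbox)
length(ls(envir = sandbox))     # 0
```

This also answers the first question: a character vector of names does not need to be converted back into object names, because the list argument accepts it as it is.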

Subsetting of Lists in R

I had a few questions about subsetting a named list in R using the [] operator:
For example, consider the list formals <- list(x = DOUBLE, y = DOUBLE, z = NULL). In this example, DOUBLE is treated as a symbol in R.
1) How should I retrieve all elements that are not equal to NULL? I tried formals[formals != NULL] but this only returns an object of type list with no members.
2) How should I retrieve elements whose names satisfy a condition? For example, how would I get all elements whose names are not z? I could use names(formals) but this is cumbersome and I was hoping for a quick solution using [].
Another option for the first question:
Filter(Negate(is.null), formals)
For the second case, you'll have to use names. Here's one way:
formals[names(formals) != 'z']
formals is actually a function in R. It's best to avoid names of functions when naming your variables.
This will work for your first question:
formals[!unlist(lapply(formals, is.null))]
I don't think you can avoid using names for the second question.
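The suggestions above can be compared side by side. This sketch uses fm instead of formals to avoid masking the base function, and wraps DOUBLE in quote() on the assumption that it is meant to be a bare symbol; keep is an illustrative name.

```r
fm <- list(x = quote(DOUBLE), y = quote(DOUBLE), z = NULL)

# Question 1: drop the NULL elements
keep <- !vapply(fm, is.null, logical(1))
names(fm[keep])
names(Filter(Negate(is.null), fm))

# Question 2: select by a condition on the names
names(fm[names(fm) != "z"])
names(fm[setdiff(names(fm), "z")])
```

vapply() is used here in place of unlist(lapply()) only because it guarantees one logical value per element; the result is the same for this list.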

Convert character vector to numeric vector in R for value assignment?

I have:
z = data.frame(x1=a, x2=b, x3=c, etc)
I am trying to do:
for (i in 1:10)
{
paste(c('N'),i,sep="") -> paste(c('z$x'),i,sep="")
}
Problems:
paste(c('z$x'),i,sep="") yields the strings "z$x1", "z$x2" instead of the actual values. I need the expression to be evaluated. I tried as.numeric and eval. Neither seemed to work.
paste(c('N'),i,sep="") yields "N1", "N2". I need the expression to be merely used as name. If I try to assign it a value such as paste(c('N'),5,sep="") -> 5, ie "N5" -> 5 instead of N5 -> 5, I get target of assignment expands to non-language object.
This task is pretty trivial since I can simply do:
N1 = x1...
N2 = x2...
etc, but I want to learn something new
I'd suggest using something like for( i in 1:10 ) z[,i] <- N[,i]...
BUT, since you said you want to learn something new, you can play around with parse and substitute.
NOTE: these little tools are funny, but experienced users (not me) avoid them.
This is called "computing on the language". It's very interesting, and it helps understanding the way R works. Let me try to give an intro:
The basic language construct is a constant, like a numeric or character vector. It is trivial because it is not different from its "unevaluated" version, but it is one of the building blocks for more complicated expressions.
The (officially) basic language object is the symbol, also known as a name. It's nothing but a pointer to another object, i.e., a token that identifies another object which may or may not exist. For instance, if you run x <- 10, then x is a symbol that refers to the value 10. In other words, evaluating the symbol x yields the numeric vector 10. Evaluating a nonexistent symbol yields an error.
A symbol looks like a character string, but it is not. You can turn a string into a symbol with as.symbol("x").
The next language object is the call. This is a recursive object, implemented as a list whose elements are either constants, symbols, or other calls. The first element must not be a constant, because it must evaluate to the real function that will be called. The other elements are the arguments to this function.
If the first element does not evaluate to an existing function, R will throw either Error: attempt to apply non-function (if it evaluates to something that is not a function) or Error: could not find function "x" (if it is a symbol that is undefined or points to something other than a function).
Example: the code line f(x, y+z, 2) will be parsed as a list of 4 elements, the first being f (as a symbol), the second being x (another symbol), the third another call, and the fourth a numeric constant. The third element, y+z, is just a call to a function with two arguments, so it parses as a list of three names: '+', y and z.
Finally, there is also the expression object, that is a list of calls/symbols/constants, that are meant to be evaluated one by one.
You'll find lots of information here:
https://github.com/hadley/devtools/wiki/Computing-on-the-language
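The structure described for f(x, y+z, 2) can be inspected directly with quote(), which returns the call without evaluating it. This is a small sketch; cl and sym are illustrative names.

```r
cl <- quote(f(x, y + z, 2))   # an unevaluated call

length(cl)        # 4 elements: f, x, y + z, 2
is.name(cl[[1]])  # TRUE: the function position holds the symbol f
is.call(cl[[3]])  # TRUE: y + z is itself a call
as.list(cl[[3]])  # the symbols `+`, y and z
cl[[4]]           # the numeric constant 2

# A string becomes a symbol with as.symbol(); evaluating it looks up the value
x <- 10
sym <- as.symbol("x")
eval(sym)         # 10
```

Nothing here requires f, y or z to exist, because the call is only taken apart and never evaluated.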
OK, now let's get back to your question :-)
What you have tried does not work because the output of paste is a character string, and the assignment function expects as its first argument something that evaluates to a symbol, to be either created or modified. Alternatively, the first argument can also evaluate to a call associated with a replacement function. These are a little trickier, but they are handled by the assignment function itself, not by the parser.
The error message you see, target of assignment expands to non-language object, is triggered by the assignment function, precisely because your target evaluates to a string.
We can fix that building up a call that has the symbols you want in the right places. The most "brute force" method is to put everything inside a string and use parse:
parse(text=paste('N',i," -> ",'z$x',i,sep=""))
Another way to get there is to use substitute:
substitute(x -> y, list(x=as.symbol(paste("N",i,sep="")), y=substitute(z$w, list(w=paste("x",i,sep="")))))
The inner substitute creates the calls z$x1, z$x2 etc. The outer substitute puts this call as the target of the assignment, and the symbols N1, N2 etc. as the values.
parse results in an expression, and substitute in a call. Both can be passed to eval to get the same result.
Just one final note: I repeat that all this is intended as a didactic example, to help understanding the inner workings of the language, but it is far from good programming practice to use parse and substitute, except when there is really no alternative.
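Put together, the parse route can be run end to end as follows. The data frame z and the vectors N1 and N2 are made-up stand-ins for the question's objects, and the second loop shows an alternative that avoids computing on the language by indexing with [[ ]] and get(); z2 is an illustrative name.

```r
z  <- data.frame(x1 = 1:3, x2 = 4:6)
N1 <- c(10, 20, 30)
N2 <- c(40, 50, 60)

# Brute force: build the text "N1 -> z$x1", parse it, then evaluate it
for (i in 1:2) {
  e <- parse(text = paste("N", i, " -> ", "z$x", i, sep = ""))
  eval(e)
}
z

# Same effect without computing on the language: index by name with [[ ]] and get()
z2 <- data.frame(x1 = 1:3, x2 = 4:6)
for (i in 1:2) {
  z2[[paste0("x", i)]] <- get(paste0("N", i))
}
all.equal(z, z2)
```

The second loop is usually easier to read and debug, which is in line with the warning above about reserving parse and substitute for cases with no alternative.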
A data.frame is a named list. It is usually good practice, and idiomatically R-ish, not to have lots of objects in the global environment, but to keep related (or similar) objects in lists and to use lapply etc.
You could use list2env to multiassign the named elements of your list (the columns in your data.frame) to the global environment
DD <- data.frame(x = 1:3, y = letters[1:3], z = 3:1)
list2env(DD, envir = parent.frame())
## <environment: R_GlobalEnv>
## ta da, x, y and z now exist within the global environment
x
## [1] 1 2 3
y
## [1] a b c
## Levels: a b c
## (y prints as a factor in R < 4.0.0; newer versions keep it as a character vector)
z
## [1] 3 2 1
I am not exactly sure what you are trying to accomplish. But here is a guess:
### Create a data.frame using the alphabet
data <- data.frame(x = 'a', y = 'b', z = 'c')
### Create a numerical index corresponding to the letter position in the alphabet
index <- which(tolower(letters[1:26]) == data[1, ])
### Use an 'lapply' to apply a function to every element in 'index'; creates a list
val <- lapply(index, function(x) {
paste('N', x, sep = '')
})
### Assign names to our list
names(val) <- names(data)
### Observe the result
val$x

lapply fail, but function works fine for each individual input arguments

Many thanks in advance for any advice or hints.
I'm working with data frames. The simplified coding is as follows:
f<-function(name){
x<-tapply(name$a,list(name$b,name$c),sum)
1) y<-dataset[[deparse(substitute(name))]]
#where dataset is an already existed list object with names the same as the
#function argument. I would like to avoid inputting two arguments.
z<-vector("list",n) #where n is also defined already
2) for (i in 1:n){z[[i]]<-x[y[[i]],i]}
...
}
lapply(list_names,f)
The warning message is:
In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'
and the output is incorrect. I tried debugging and found the conflict may lie in line 1) and 2). However, when I try f(name) it is perfectly fine and the output is correct. I guess the problem is in lapply and I searched for a while but could not get to the point. Any ideas? Many thanks!
The structure of the data
Thanks Joran. Checking again I found the problem might not lie in what I had described. I produce the full code as follows and you can copy-paste to see the error.
n<-4
name1<-data.frame(a=rep(0.1,20),b=rep(1:10,each=2),c=rep(1:n,each=5),
d=rep(c("a1","a2","a3","a4","a5","a6","a7","a8","a9","a91"),each=2))
name2<-data.frame(a=rep(0.2,20),b=rep(1:10,each=2),c=rep(1:n,each=5),
d=rep(c("a1","a2","a3","a4","a5","a6","a7","a8","a9","a91"),each=2))
name3<-data.frame(a=rep(0.3,20),b=rep(1:10,each=2),c=rep(1:n,each=5),
d=rep(c("a1","a2","a3","a4","a5","a6","a7","a8","a9","a91"),each=2))
#d is the name for the observations. d corresponds to b.
dataset<-vector("list",3)
names(dataset)<-c("name1","name2","name3")
dataset[[1]]<-list(c(1,2),c(1,2,3,4),c(1,2,3,4,5,10),c(4,5,8))
dataset[[2]]<-list(c(1,2,3,5),c(1,2),c(1,2,10),c(2,3,4,5,8,10))
dataset[[3]]<-list(c(3,5,8,10),c(1,2,5,7),c(1,2,3,4,5),c(2,3,4,6,9))
f<-function(name){
x<-tapply(name$a,list(name$b,name$c),sum)
rownames(x)<-sort(unique(name$d)) #the row names for
y<-dataset[[deparse(substitute(name))]]
z<-vector("list",n)
for (i in 1:n){
z[[i]]<-x[y[[i]],i]}
nn<-length(unique(unlist(sapply(z,names)))) # the number of names appeared
names_<-sort(unique(unlist(sapply(z,names)))) # the names appeared add to the matrix
# below
m<-matrix(,nrow=nn,ncol=n);rownames(m)<-names_
index<-vector("list",n)
for (i in 1:n){
index[[i]]<-match(names(z[[i]]),names_)
m[index[[i]],i]<-z[[i]]
}
return(m)
}
list_names<-vector("list",3)
list_names[[1]]<-name1;list_names[[2]]<-name2;list_names[[3]]<-name3
names(list_names)<-c("name1","name2","name3")
lapply(list_names,f)
f(name1)
lapply(list_names,f) fails, but f(name1) produces exactly the matrix I want. Thanks again.
Why it doesn't work
The issue is the calling stack doesn't look the same in both cases. In lapply, it looks like
[[1]]
lapply(list_names, f) # lapply(X = list_names, FUN = f)
[[2]]
FUN(X[[1L]], ...)
In the expression being evaluated, f is called FUN and the argument supplied for name is the expression X[[1L]].
When you call f directly, the stack is simply
[[1]]
f(name1) # f(name = name1)
Usually this doesn't matter, but with substitute it does, because substitute returns the expression that was supplied for the function argument, not its value. When you get to
y<-dataset[[deparse(substitute(name))]]
inside lapply it's looking for the element in dataset named X[[1L]], and there isn't one, so y is bound to NULL.
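The difference is easy to reproduce in isolation. In this sketch show_arg is a hypothetical helper that does nothing except report what substitute sees; the exact text produced under lapply depends on the R version, so it is only compared against "name1".

```r
show_arg <- function(name) deparse(substitute(name))  # hypothetical helper

name1 <- data.frame(a = 1)

direct <- show_arg(name1)
direct                          # "name1"

via_lapply <- lapply(list(name1 = name1), show_arg)[[1]]
via_lapply                      # an expression built by lapply, not "name1"
```

Because the string seen under lapply is not a name in dataset, the lookup dataset[[...]] returns NULL there.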
A way to get it to work
The simplest way to deal with this is probably to just have f operate on character strings and pass names(list_names) to lapply. This can be accomplished fairly easily by changing the beginning of f to
f<-function(name){
passed.name <- name
name <- list_names[[name]]
x<-tapply(name$a,list(name$b,name$c),sum)
rownames(x)<-sort(unique(name$d)) #the row names for
y<-dataset[[passed.name]]
# the rest of f...
and changing lapply(list_names, f) to lapply(names(list_names),f). This should give you what you want with nearly minimal modification, but you might also consider renaming some of your variables so the word name isn't used for so many different things: the function names, the argument of f, and all the various variables containing name.
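The pattern of passing names instead of objects can be sketched on toy data. The contents of list_names and dataset below are made up for illustration, f is reduced to a stub that only shows the two lookups, and sapply(..., simplify = FALSE) is used in place of lapply so that the result keeps the names.

```r
# Toy stand-ins for the question's objects (contents are made up for illustration)
list_names <- list(name1 = data.frame(a = c(0.1, 0.1)),
                   name2 = data.frame(a = c(0.2, 0.2)))
dataset <- list(name1 = c(1, 2), name2 = c(1, 2, 3, 5))

# f receives the name as a string and looks up both objects by that name
f <- function(nm) {
  df <- list_names[[nm]]
  y  <- dataset[[nm]]
  list(total = sum(df$a), n_selected = length(y))
}

# sapply(..., simplify = FALSE) behaves like lapply but keeps the names
res <- sapply(names(list_names), f, simplify = FALSE)
res$name2$n_selected
```

Since the string is now an ordinary value, the function behaves the same whether it is called directly or through lapply.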

Resources