My apologies if this is kind of a vague question; I am new to R. While experimenting with R I found some weird behavior. When I create a function like:
myfunction <- function(a,b){
print(a,b)
}
and call it like:
myfunction(b = 10, a = 20)
it prints 20, but if I skip the function and assign the values directly to variables:
a <- 20
b <- 10
print(a, b)
I get an error:
Error in print.default(a, b) : invalid 'digits' argument
Furthermore, I have read that multiple variables can be printed on the same line via:
sprintf("%i %i",a, b)
So is it a bug that, inside the function call, print shows only the first argument?
It might be revealing some underlying differences in how arguments are handled in different scenarios, but I don't think it's a bug.
If your intention is to print both values, consider changing:
print(a,b)
To something like:
print(paste(a,b))
From ?print.default:
# S3 method for default
print(x, digits = NULL, quote = TRUE,
na.print = NULL, print.gap = NULL, right = FALSE,
max = NULL, useSource = TRUE, …)
x: the object to be printed.
digits: a non-null value for digits specifies the minimum number of
significant digits to be printed in values. The default, NULL, uses
getOption("digits"). (For the interpretation for complex numbers see
signif.) Non-integer values will be rounded down, and only values
greater than or equal to 1 and no greater than 22 are accepted.
...
So R expects everything that you actually want to print to be contained in the first argument (x).
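Based on your results and some of the comments, your second value is being matched to digits. A value such as 10 lies within the accepted range (1 to 22), so print(20, 10) simply prints 20, which is what happened inside your function. The invalid 'digits' error appears only when the second value is not a valid digits value (for example, outside that range), so if you saw it at top level, b most likely held a different value at the time.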
While this is a little odd, the more important point is that print(a,b) is not the right way to print multiple values: it is syntactically valid, but b is silently used as the digits argument.
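A minimal sketch of the argument matching described above, reusing a and b from the question. cat() and print(c(...)) are extra options beyond the answer's print(paste(a, b)); the printed forms in the comments assume R's default options.

```r
a <- 20
b <- 10

# The second positional argument of print() is matched to `digits`,
# so this is print(x = a, digits = b) and only `a` is shown: [1] 20
print(a, b)

# Naming the argument makes the matching explicit: [1] 3.14
print(pi, digits = 3)

# Ways to actually show both values on one line
print(paste(a, b))      # [1] "20 10"
cat(a, b, "\n")         # 20 10
print(c(a = a, b = b))  # named numeric vector
```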
I am exploring using R optim() or optimx() for a (very) nonlinear optimization. Essentially I wrote a function that takes as its inputs:
1) a data.frame with specific column names/types
2) a numeric vector of length 1
3) a numeric vector of length > 1
The function then takes the inputs, performs some calculations and logic tests, then either returns a very negative value if the logic tests are FALSE, or the value of input #2 if the logic tests are TRUE. The goal is to maximize #2 without tripping the logic tests to FALSE.
I tried using optimx() with the following code (the par values correspond to the inputs I refer to above):
optimout <-
optimx:::optimx(
par = c(inputDF, 5000, rep(99,20)),
fn = MyFunction,
maximize = TRUE)
I received the following error message:
Error in optimx.check(par, optcfg$ufn, optcfg$ugr, optcfg$uhess, lower, :
Cannot evaluate function at initial parameters
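If I understand your problem correctly and you only want to maximize the function based on the second input parameter (a numeric vector of length 1), you need to call optimx differently, which would make sense given that the data.frame is probably fixed input data. (Note that c(inputDF, 5000, rep(99,20)) is not a numeric parameter vector: combining a data.frame with numbers produces a list, so the function cannot be evaluated at those initial parameters.)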
So, try to do the following:
optimout <- optimx(par = c(5000), fn = MyFunction, par1=inputDF,
par2=rep(99,20), maximize = TRUE)
where par1 and par2 are the names of the corresponding arguments of your function. Essentially, you are passing fixed values for par1 and par2 through optimx's ... argument; they are forwarded to your function unchanged and are not optimized. For this to work, the parameter being optimized must be the first argument of MyFunction. Thus, the maximum is searched for by changing only the value of your second input (a numeric vector of length 1), which you chose to start at 5000.
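The same idea can be sketched with base R's optim(), which also forwards extra named arguments through ... to fn. This is a swapped-in illustration, not the asker's setup: myObjective, dat and vec are hypothetical names, and a smooth objective replaces the real penalty-based one. For a single parameter, optim() recommends method = "Brent", which needs finite lower and upper bounds and, as far as I know, ignores the starting value.

```r
# Hypothetical fixed inputs (stand-ins for inputDF and the length-20 vector)
inputDF  <- data.frame(value = c(1, 2, 3))
inputVec <- rep(0.5, 20)

# Hypothetical objective: the parameter being optimized must be the FIRST
# argument; the fixed inputs follow and are supplied through optim()'s `...`.
myObjective <- function(p, dat, vec) {
  target <- sum(dat$value) + sum(vec)  # 6 + 10 = 16
  -(p - target)^2                      # largest when p equals target
}

fit <- optim(par = 5000, fn = myObjective, dat = inputDF, vec = inputVec,
             method = "Brent", lower = 0, upper = 10000,
             control = list(fnscale = -1))  # fnscale = -1: maximize instead of minimize
fit$par  # close to 16
```

The same pattern (parameter first, fixed data as extra named arguments) is what the optimx call above relies on.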
I was playing with R loops when I wrote this function:
printLoop<-function(size){
for(index in seq(size))
cat("\n Index is at:",index)
}
I started programming in R just a few days ago and made this simple print function. I gave it different inputs, but with input 0 I got the following output:
Index is at: 1
Index is at: 0
Why is that?
So I thought that maybe something is wrong with seq(); passing 0 to it directly, I again got 1 and 0. My question is: why?
In addition to my comment above, see these relevant points from ?seq:
Typical usages are
seq(from, to)
seq(from, to, by= )
seq(from, to, length.out= )
seq(along.with= )
seq(from)
seq(length.out= )
The first form generates
the sequence from, from+/-1, ..., to (identical to from:to).
The second form generates from, from+by, ..., up to the sequence value
less than or equal to to. Specifying to - from and by of opposite
signs is an error. Note that the computed final value can go just
beyond to to allow for rounding error, but is truncated to to. (‘Just
beyond’ is by up to 1e-10 times abs(from - to).)
The third generates a sequence of length.out equally spaced values
from from to to. (length.out is usually abbreviated to length or len,
and seq_len is much faster.)
The fourth form generates the integer sequence 1, 2, ...,
length(along.with). (along.with is usually abbreviated to along, and
seq_along is much faster.)
The fifth form generates the sequence 1, 2, ..., length(from) (as if
argument along.with had been specified), unless the argument is
numeric of length 1 when it is interpreted as 1:from (even for seq(0)
for compatibility with S). Using either seq_along or seq_len is much
preferred (unless strict S compatibility is essential).
The final form generates the integer sequence 1, 2, ..., length.out
unless length.out = 0, when it generates integer(0).
So one way to specify your function to get what seems like your desired output is:
printLoop<-function(size){
for(index in seq(to=size,by=1L))
cat("\n Index is at:",index)
}
> printLoop(0L)
Error in seq.default(to = size, by = 1L) : wrong sign in 'by' argument
(Note that if you don't want an error, you could simply use seq_len(size), which returns an empty sequence for size = 0, so the loop body never runs.)
Which is simply obeying the other admonishment of ?seq, namely:
The interpretation of the unnamed arguments of seq and seq.int is not standard, and it is recommended always to name the arguments when programming.
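A short comparison of the forms quoted above. printLoop2 is a hypothetical variant of the question's function that uses seq_len(), so that size = 0 runs the loop zero times instead of iterating over 1, 0.

```r
# seq() with a single length-1 number is treated as 1:from, so it counts down for 0
seq(0)       # [1] 1 0
seq_len(0)   # integer(0): an empty sequence
seq_len(3)   # [1] 1 2 3

# Hypothetical variant of printLoop: the body never runs when size is 0
printLoop2 <- function(size) {
  for (index in seq_len(size)) {
    cat("\n Index is at:", index)
  }
  invisible(NULL)
}

printLoop2(0)  # prints nothing
printLoop2(2)
```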
I have two lists of lists, humanSplit and ratSplit. Each element of humanSplit has the form:
> humanSplit[1]
$Fetal_Brain_408_AGTCAA_L001_R1_report.txt
humanGene humanReplicate alignment RNAtype
66 DGKI Fetal_Brain_408_AGTCAA_L001_R1_report.txt 6 reg
68 ARFGEF2 Fetal_Brain_408_AGTCAA_L001_R1_report.txt 5 reg
Typing humanSplit[[1]] gives the same data without the name $Fetal_Brain_408_AGTCAA_L001_R1_report.txt.
ratSplit is essentially similar to humanSplit, with a different column order. I want to apply Fisher's test to every possible pairing of replicates from humanSplit and ratSplit. I defined the following empty vectors to store the information from my Fisher's tests:
humanReplicate <- vector(mode = 'character', length = 0)
ratReplicate <- vector(mode = 'character', length = 0)
pvalue <- vector(mode = 'numeric', length = 0)
For Fisher's test between two replicates of humanSplit and ratSplit, I defined the following function. It uses geneList, a data.frame made by reading a file, which has the form:
> head(geneList)
human rat
1 5S_rRNA 5S_rRNA
2 5S_rRNA 5S_rRNA
Here is the main function, which uses getGenetype, a function I already defined in another part of the code. x and y are integers:
fishertest <-function(x,y) {
ratReplicateName <- names(ratSplit[x])
humanReplicateName <- names(humanSplit[y])
## merging above two based on the one-to-one gene mapping as in geneList
## defined above.
mergedHumanData <-merge(geneList,humanSplit[[y]], by.x = "human", by.y = "humanGene")
mergedRatData <- merge(geneList, ratSplit[[x]], by.x = "rat", by.y = "ratGene")
## [here i do other manipulation with using already defined function
## getGenetype that is defined outside of this function and make things
## necessary to define following contingency table]
contingencyTable <- matrix(c(HnRn,HnRy,HyRn,HyRy), nrow = 2)
fisherTest <- fisher.test(contingencyTable)
humanReplicate <- c(humanReplicate,humanReplicateName )
ratReplicate <- c(ratReplicate,ratReplicateName )
pvalue <- c(pvalue , fisherTest$p)
}
After all this, I make the matrix eg to use in apply. Here I am basically trying to do something similar to a double for loop, calling fishertest for each pair:
eg <- expand.grid(i = 1:length(ratSplit),j = 1:length(humanSplit))
junk = apply(eg, 1, fishertest(eg$i,eg$j))
The problem is that when I run it, I get the following error as soon as apply tries to use fishertest:
Error in humanSplit[[y]] : recursive indexing failed at level 3
RStudio points to the problem in the following line:
mergedHumanData <-merge(geneList,humanSplit[[y]], by.x = "human", by.y = "humanGene")
Ultimately, I want to do the following:
result <- data.frame(humanReplicate,ratReplicate, pvalue ,alternative, Conf.int1, Conf.int2, oddratio)
I am struggling with these questions:
When defining the fishertest function, how should I pass in ratSplit, humanSplit and the already-defined function getGenetype?
And how should I use apply here?
Any help would be much appreciated.
Up front: read ?apply. Additionally, the first three hits on Google when searching for "R apply tutorial" are helpful snippets: one, two, and three.
Errors in fishertest()
The error message itself has nothing to do with apply. The call got as far as it did because the arguments you provided actually resolved. Try eg$i by itself and you'll see that it returns a vector: the corresponding column of the eg data.frame. You are passing this whole vector as the index in the x argument (and eg$j as y). Your function fails because double-bracket indexing ([[) is meant for a single index; given a vector longer than 1, it indexes recursively, one level per element, and here that recursion fails at level 3. This is a great example of where production/deployed functions would need type-checking to ensure that each argument is a numeric of length 1; that is often unnecessary for quick code, but it would have caught this mistake. Had it not been for the [[ limit, your function might have returned incorrect results. (I've been bitten by that many times!)
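A small sketch of both points, on toy data rather than the real humanSplit: eg$i is a whole column, and [[ given a vector index recurses one level per element, which is plausibly why the error above mentions level 3. The exact error wording may differ between R versions, so the sketch only checks that an error occurs.

```r
eg <- expand.grid(i = 1:2, j = 1:3)
eg$i              # the whole column: 1 2 1 2 1 2, not a single index

lst <- list(1:3, 4:6)
lst[[1]]          # 1 2 3: a single index selects one element
lst[[c(1, 2)]]    # 2: a vector index recurses, i.e. lst[[1]][[2]]

# There is no third level to descend into, so recursive indexing fails
res <- try(lst[[c(1, 2, 3)]], silent = TRUE)
inherits(res, "try-error")   # TRUE
```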
BTW: your code is also incorrect in its scoped access to pvalue, et al. If you make your function return just the numbers you need and then aggregate them outside of the function, your life will be simpler. (Inside the function, pvalue <- c(pvalue, ...) finds the pvalue defined outside the function, but the assignment creates a new local variable, so the outer pvalue is never updated.) You are defeating one purpose of writing this into a function. When writing this function, try to answer only this question: "how do I compare a single rat record with a single human record?" Only after that works correctly and simply, without having to overwrite variables in the parent environment, should you try to answer the question "how do I apply this function to all pairs and aggregate the results?" Try very hard to have your function not change anything outside of its own environment.
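A minimal illustration of the scoping point, independent of the real data. appendLocal and onePair are hypothetical names; onePair stands in for a function that runs one Fisher test on a made-up table and returns only its p-value.

```r
pvalue <- numeric(0)

# Hypothetical function mimicking the question's pattern
appendLocal <- function(p) {
  pvalue <- c(pvalue, p)  # reads the outer pvalue, but assigns a LOCAL copy
  pvalue
}

res <- appendLocal(0.05)
length(res)     # 1: the local copy inside the function
length(pvalue)  # 0: the outer vector was never modified

# Preferred: return one result per call and collect the results outside
onePair <- function(k) fisher.test(matrix(c(k, 5, 5, k), nrow = 2))$p.value
allP <- vapply(c(1, 10), onePair, numeric(1))
```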
Errors in apply()
Had your function worked properly despite these errors, you would have received the following error from apply:
apply(eg, 1, fishertest(eg$i, eg$j))
## Error in match.fun(FUN) :
## 'fishertest(eg$i, eg$j)' is not a function, character or symbol
When you call apply this way, R evaluates the third argument before apply can use it as a function. Since it is simply a call to fishertest(eg$i, eg$j), which is intended to return a data.frame row (inferred from your previous question), it resolves to that value, and apply then sees something akin to:
apply(eg, 1, data.frame(...))
Now you can see that apply is being handed a data.frame, not a function.
The third argument (FUN) needs to be a function itself that takes as its first argument a vector containing the elements of a row (MARGIN = 1) or column (MARGIN = 2) of the matrix/data.frame. Consider the following contrived example:
eg <- data.frame(aa = 1:5, bb = 11:15)
apply(eg, 1, mean)
## [1] 6 7 8 9 10
# similar to your use, will not work: mean() is evaluated immediately with no
# arguments, before apply ever receives a function
apply(eg, 1, mean())
## Error in mean.default() : argument "x" is missing, with no default
Realize that mean is a function itself, not the return value from a function (there is more to it, but this definition works). Because we're iterating over the rows of eg (because of the 1), the first iteration takes the first row and calls mean(c(1, 11)), which returns 6. The equivalent of your code here is mean()(c(1, 11)), which fails for two reasons: (1) mean requires an argument and is not getting one, and (2) even if it were, it does not return a function (returning functions is a "functional programming" pattern that is easy in R but uncommon for most programmers).
In the example here, mean accepts a single argument, typically a numeric vector. Your function fishertest, however, requires two arguments (templated by my previous answer to your question), while apply passes each row as a single vector, so it does not work as-is. You have two options here:
1. Change your fishertest function to accept a single vector as an argument and parse the index numbers from it. Both of the following versions do this:
fishertest <- function(v) {
x <- v[1]
y <- v[2]
ratReplicateName <- names(ratSplit[x])
## ...
}
or
fishertest <- function(x, y) {
if (missing(y)) {
y <- x[2]
x <- x[1]
}
ratReplicateName <- names(ratSplit[x])
## ...
}
The second version allows you to continue using the manual form fishertest(1, 57) while also allowing you to use apply(eg, 1, fishertest) verbatim. Very readable, IMHO. (Better error checking and reporting could be added; I'm just providing a MWE.)
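A toy sketch of the second version, assuming plain name vectors instead of the real ratSplit and humanSplit. ratNames, humanNames and pairLabel are hypothetical; pairLabel only builds a label for each pair, which is enough to show that the same function works both when called directly and when apply() hands it each row as one vector.

```r
# Hypothetical lookup vectors standing in for ratSplit / humanSplit
ratNames   <- c("ratA", "ratB")
humanNames <- c("humA", "humB", "humC")

pairLabel <- function(x, y) {
  # Called with a single vector (as apply() does): unpack it
  if (missing(y)) {
    y <- x[2]
    x <- x[1]
  }
  paste(ratNames[x], humanNames[y], sep = "-")
}

pairLabel(1, 3)                  # "ratA-humC"
eg <- expand.grid(i = 1:2, j = 1:3)
labels <- apply(eg, 1, pairLabel)  # one label per row of eg
```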
2. Write an anonymous function to take the vector and split it up appropriately. This anonymous function could look something like function(ii) fishertest(ii[1], ii[2]). This is typically how it is done for functions that either do not transform as easily as in #1 above, or that you cannot or do not want to modify. You can either assign this intermediary function to a variable (which makes it no longer anonymous, go figure) and pass that to apply, or pass it directly to apply, à la:
.func <- function(ii) fishertest(ii[1], ii[2])
apply(eg, 1, .func)
## equivalently
apply(eg, 1, function(ii) fishertest(ii[1], ii[2]))
There are two reasons why many people opt to name the function: (1) if the function is used multiple times, better to define once and reuse; (2) it makes the apply line easier to read than if it contained a complex multi-line function definition.
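Putting the pieces together, here is a hedged end-to-end sketch on made-up data. It assumes each element of ratSplit and humanSplit is already a 2x2 count matrix and builds the contingency table by simply adding the two, a placeholder for the real merging and getGenetype logic. The data are passed in as arguments; a helper such as getGenetype that is defined at top level would be found through ordinary scoping without being passed. Because each call returns a one-row data.frame (itself a list), apply() returns a list of data.frames here, which do.call(rbind, ...) stacks outside the function.

```r
# Hypothetical stand-ins: each element is already a 2x2 count matrix
ratSplit   <- list(ratA = matrix(c(12, 5, 7, 20), 2),
                   ratB = matrix(c(8, 9, 10, 11), 2))
humanSplit <- list(humA = matrix(c(3, 14, 15, 2), 2),
                   humB = matrix(c(6, 6, 6, 6), 2))

# Compare ONE rat replicate with ONE human replicate; modifies nothing outside
fishertest <- function(x, y, rat, human) {
  ct <- rat[[x]] + human[[y]]  # placeholder for the real table-building logic
  ft <- fisher.test(ct)
  data.frame(ratReplicate   = names(rat)[x],
             humanReplicate = names(human)[y],
             pvalue         = ft$p.value,
             oddsratio      = unname(ft$estimate),
             conf.low       = ft$conf.int[1],
             conf.high      = ft$conf.int[2],
             stringsAsFactors = FALSE)
}

eg <- expand.grid(i = seq_along(ratSplit), j = seq_along(humanSplit))

# apply() hands each row over as one vector; the anonymous function unpacks it
rows <- apply(eg, 1, function(ii) fishertest(ii[1], ii[2], ratSplit, humanSplit))

# Aggregate outside the function: stack the one-row data.frames
result <- do.call(rbind, rows)
result
```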
As a side note, there are some gotchas with using apply and its family that will be confusing if you don't understand them. Not least: when your function returns vectors, the matrix returned from apply will need to be transposed (with t()), after which you'll still need to rbind or otherwise aggregate.
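A small illustration of the transposition gotcha on a contrived data.frame: when FUN returns a length-2 vector, apply() places each result in a column, so t() is needed to get one row per input row.

```r
eg <- data.frame(aa = 1:3, bb = 11:13)

# FUN returns a length-2 vector per row, so apply() stores each result as a COLUMN
m <- apply(eg, 1, function(r) c(total = sum(r), diff = r[["bb"]] - r[["aa"]]))
dim(m)  # [1] 2 3: one column per input row

# Transpose to get one row per input row, then convert if a data.frame is wanted
res <- as.data.frame(t(m))
res
```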
This is one area where using ddply (from the plyr package) may provide a more readable solution. There are several tutorials showing it off. For a quick intro, read this; for a more in-depth discussion of the bigger picture in which ddply plays a part, read Hadley Wickham's paper "The Split-Apply-Combine Strategy for Data Analysis" in JSS.
calld=data.frame(matrix(rnorm(100*50,0,1),1000,50))
for (x in names(calld)) {
assign(paste("calld$",x,sep=""),pnorm(get(paste("calld$",x,sep="")),0,1,lower.tail=T,log.p=F))
}
Error in get(paste("calld$", x, sep = "")) : object 'calld$X1' not found
Am I using the get function correctly? I am trying to build each column name of the data set with paste inside a loop, and replace that column's existing values by passing them through pnorm (the cumulative normal distribution function). But I keep getting an error. The function works when I refer to the variables in the "calld" data frame directly; the problem is the name-concatenation step inside the loop. Where am I going wrong? I appreciate your help.
Update:
I took your advice and re-edited the loop to:
for (n in names(calld)) {
get("calld")[[n]]=pnorm(get("calld")[[n]],0,1,lower.tail=T,log.p=F)
}
Error in get("calld")[[n]] = pnorm(get("calld")[[n]], 0, 1, lower.tail = T, :
target of assignment expands to non-language object
But now I am getting this new error. When I tested the right-hand side of the assignment on its own, it works; the error arises only when I assign the result back, replacing the prior values.
Have mercy on kittens!
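You can't use assign or get this way. get("calld$X1") looks for an object literally named calld$X1, which does not exist, and an assignment such as get("calld")[[n]] <- ... fails because the target of an assignment must be a variable, not the result of a function call like get(). Work on the columns directly instead: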
calld[] <- lapply(calld, pnorm, mean = 0, sd = 1)
Explanation: calld[] <- replaces all existing columns of calld (while retaining its structure as a data.frame) with the results of lapply(calld, pnorm, mean = 0, sd = 1), which cycles through all columns of calld, applying pnorm to each one.
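For comparison, a sketch of the loop the question was aiming for, using [[ with the column name stored in a variable (as in the fortune below), next to the lapply() version. It uses 1000 * 50 draws so the matrix is filled without recycling (the question's rnorm(100*50, ...) is silently recycled to fill 1000 x 50 cells); the seed is arbitrary.

```r
set.seed(42)
# 1000 rows x 50 columns of standard normal draws
calld <- data.frame(matrix(rnorm(1000 * 50), nrow = 1000, ncol = 50))
orig  <- calld

# Option 1: one column at a time, indexing with [[ and the name held in `n`
for (n in names(calld)) {
  calld[[n]] <- pnorm(calld[[n]], mean = 0, sd = 1)
}

# Option 2 (the answer's approach): transform all columns at once
calld2 <- orig
calld2[] <- lapply(calld2, pnorm, mean = 0, sd = 1)
```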
library(fortunes)
fortune(312)
The problem here is that the $ notation is a magical shortcut and like any other magic if used incorrectly is likely to do the programmatic equivalent of turning yourself into a toad.
-- Greg Snow (in response to a user that wanted to access a column whose name is stored in y via x$y rather than x[[y]])
R-help (February 2012)
I have defined a couple of functions of arity 1, say func1(-) and func2(-). I have tested them and seen that they actually do what they are supposed to.
I wish to define a third function, say func3(-), that outputs the difference of func1(-) and func2(-). This is what I did:
func3(k) = {j=func1(k)-func2(k); print(j)}
Nevertheless, it doesn't return what it ought to. Suppose func1(5) outputs 10 and func2(5) outputs 2. Then func3(5) ought to output 8, right? Instead, it returns the output of func1(5) on one row, the output of func2(5) on another row, and then a zero (even though the difference of the corresponding outputs is not 0).
Do you know what's wrong with the definition of func3(-)?
A GP user function returns the last evaluated value. Here, it's the result of
the 'print(j)' command, which prints j (a side effect) and returns 'void',
which is typecast to 0 when a value is required, as here.
f1(x) = 10
f2(x) = 2
f3(x) = f1(x) - f2(x)
correctly returns 8. You didn't give the code for your func1 / func2
functions, but I expect you included a 'print' statement, maybe expecting it
to return a value. That's why you get outputs on different rows, before the 0.
If you don't like this 'return-last-evaluation-result' behaviour, you can use
explicit 'return (result)' statements.