String Loop to Form a Variable Name in R

calld=data.frame(matrix(rnorm(100*50,0,1),1000,50))
for (x in names(calld)) {
assign(paste("calld$",x,sep=""),pnorm(get(paste("calld$",x,sep="")),0,1,lower.tail=T,log.p=F))
}
Error in get(paste("calld$", x, sep = "")) : object 'calld$X1' not found
Am I using the get function correctly? I am trying to build each column name of the data set with paste inside a loop and overwrite the existing values with the result of passing them through pnorm (the cumulative normal distribution function), but I keep getting an error. pnorm works when I refer to the columns of the "calld" data frame directly; the problem is the concatenation step inside the loop. Where am I going wrong? I appreciate your help.
Update:
I took your advice and rewrote the loop as follows:
for (n in names(calld)) {
get("calld")[[n]]=pnorm(get("calld")[[n]],0,1,lower.tail=T,log.p=F)
}
Error in get("calld")[[n]] = pnorm(get("calld")[[n]], 0, 1, lower.tail = T, :
target of assignment expands to non-language object
But now I am getting this new error. Everything on the right-hand side of the assignment works when I test it on its own. The error arises when I assign the result back to the same column, replacing the prior values.

Have mercy on kittens!
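You can't use assign this way, nor get. Both look up a single object by name, and "calld$X1" is not the name of an object; it is an expression that extracts a column from the object calld.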
calld[] <- lapply(calld, pnorm, mean = 0, sd = 1)
Explanation: calld[] <- replaces all existing columns of calld (while retaining its structure as a data.frame) with the results of lapply(calld, pnorm, mean = 0, sd = 1), which cycles through all columns of calld, applying pnorm to each one.
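As a minimal sketch of the two working alternatives, the block below applies pnorm to every column once with calld[] <- lapply(...) and once with an explicit loop that indexes columns by name using [[. The small 20-by-3 data frame and the names orig and looped are made up for illustration.

```r
set.seed(1)
# Small illustrative data frame (size chosen arbitrarily for the sketch)
calld <- data.frame(matrix(rnorm(20 * 3, 0, 1), 20, 3))
orig <- calld

# Vectorised replacement: keeps calld a data.frame, replaces every column
calld[] <- lapply(calld, pnorm, mean = 0, sd = 1)

# Equivalent explicit loop: index the column by its name with [[ ]]
looped <- orig
for (n in names(looped)) {
  looped[[n]] <- pnorm(looped[[n]], mean = 0, sd = 1)
}
```

Both versions avoid get and assign entirely; the column name is used only as an index into the data frame.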
library(fortunes)
fortune(312)
The problem here is that the $ notation is a magical shortcut and like any other magic if used incorrectly is likely to do the programmatic equivalent of turning yourself into a toad.
-- Greg Snow (in response to a user that wanted to access a column whose name is stored in y via x$y rather than x[[y]])
R-help (February 2012)

Related

Writing/applying "subtract the mean"-function to standardize regression parameters

I was trying to write and apply a seemingly easy function that would standardize my continuous regression parameters/ predictors. The reason is that I want to deal with multicollinearity.
So instead of writing x-mean(x,na.rm=T) each time, I'm looking for something handier that does the job for me, not least because I want to practice writing functions in R. ;)
So here is what I tried:
fun <- function(data.frame, x){
data.frame$x - mean(data.frame$x, na.rm=T)
}
Apparently this is not too wrong. At least defining it doesn't return an error message.
However, applying fun to, say, the built-in mtcars dataset and the variable disp yields this warning:
#Loading the data:
data("mtcars")
fun(mtcars,x=disp) #I tried several ways, e.g. w and w/o "mtcars" in front
Warning message:
In mean.default(mtcars$x, na.rm = T) :
argument is not numeric or logical: returning NA
My guess is that it is about how I applied the function, because when I do manually what the function is supposed to do, it works perfectly.
Also, I was looking for similar questions on writing and applying such a function (also beyond the Stack Exchange universe), but I didn't find anything helpful.
Hope I didn't make a blunder due to my novice R-skills.
There is already a function in R which does what you want to do: scale().
You can just write scale(mtcars$hp, center = TRUE, scale = FALSE) which then subtracts the mean of the vector from the vector itself.
In combination with apply this is powerful: you can, for example, center every column of your data frame by writing:
apply(dataframe, MARGIN = 2, FUN = scale, center = TRUE, scale = FALSE)
Before you do that, make sure the operation is valid for every column. You cannot scale factors or characters, for example.
Regarding your question: your function should look like this:
fun <- function(data.frame, x){
  data.frame[[x]] - mean(data.frame[[x]], na.rm=TRUE)
}
and then when calling the function you would have to write fun(mtcars, "hp"), with the variable name in quotation marks. This is because of the special way the $ operator works: it takes the name after it literally, so data.frame$x looks for a column called "x" rather than the column whose name is stored in x.
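A short sketch of the corrected function, checked against scale(). The function name center_col is made up for this example, and disp is used because it is the column from the question.

```r
# Hypothetical helper name; x must be a column name given as a string
center_col <- function(df, x) {
  df[[x]] - mean(df[[x]], na.rm = TRUE)
}

centered <- center_col(mtcars, "disp")

# scale() returns a one-column matrix, so flatten it before comparing
via_scale <- as.numeric(scale(mtcars$disp, center = TRUE, scale = FALSE))
```

The two results agree, so either form can be used; the hand-written version mainly serves as practice for indexing with [[.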

simulation a while loop

There might be some threads on while loops already, but I am struggling with them. It would be great if someone could help an R beginner out.
So I am trying to run 10000 simulations from an out-of-sample regression forecast using the forecast parameters mean and sd. Thankfully, my data is normal.
This is what I have
N<-10000
i<-1:N
k<-vector(,N)
while(i<N+1){k(,i)=vector(,rnorm(N,mean=.004546,sd=.00464163))}
...and I get this error
Error in vector(, rnorm(5000, mean = 0.004546, sd = 0.00464163)) :
invalid 'length' argument
In addition: Warning message:
In while (i < N + 1) { : the condition has length > 1 and only the first element will be used
I can't seem to get my head around it.
No reason to create a loop here. If you want to put 10000 samples, normally distributed with mean = 0.004546 and sd = 0.00464163, into vector k, just do:
k <- rnorm(10000,mean = 0.004546, sd = 0.00464163)
Try this:
N <- 10
i <- 1
k <- matrix(0, 1, N)
while (i < N + 1) {
  k[i] = rnorm(1, mean = .004546, sd = .00464163)
  i = i + 1
}
print(k)
To solve your problem, use @Esben Friis' answer. You are taking a hard approach to an easy problem.
To address the questions you had about the error messages, however:
Error in vector(, rnorm(5000, mean = 0.004546, sd = 0.00464163)) :
invalid 'length' argument
This is the wrong way to go: the second argument of vector() is a length, so vector() produces a vector of a given length rather than a vector holding a given set of values. You are probably thinking of the as.vector() function:
as.vector(rnorm(5000, mean = 0.004546, sd = 0.00464163))
This is not needed, however: rnorm() already returns a vector of type double, so wrapping it in as.vector() changes nothing.
It is best to simply use:
rnorm(5000, mean=0.004546, sd=0.00464163)
Further:
In addition: Warning message:
In while(i<N+1){: the condition has length>1 and only the first element will be used
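This warning stems from i being the vector 1:N, whose length is greater than 1. The warning states that only the first element of i is used in the condition (on every pass through the loop), which is the same as writing i[1]. Note that R 4.2.0 and later treat a condition of length greater than one as an error rather than a warning.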
while(i<N+1){ }
#is the same as
while(i[1]<N+1){ }
Instead you want to increment a single counter value up to N. Furthermore, you can use the <= (less than or equal to) operator instead of writing <N+1.
while(newVal<=N){ }
This method will bring up new problems, which could be solved by using a for() loop instead, but that is outside the scope of the question and really not the right approach to your problem, as stated at the beginning. Hope you learned something, and good luck!
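For reference, here is a sketch of the three forms discussed in these answers: a while loop with a scalar counter, a for loop, and the vectorised call. N is kept small and the names k_while, k_for and k_vec are made up for the example.

```r
N <- 10
mu <- 0.004546
s <- 0.00464163

# while loop with a scalar counter and <=
k_while <- numeric(N)
i <- 1
while (i <= N) {
  k_while[i] <- rnorm(1, mean = mu, sd = s)
  i <- i + 1
}

# for loop: the counter is managed for you
k_for <- numeric(N)
for (j in seq_len(N)) {
  k_for[j] <- rnorm(1, mean = mu, sd = s)
}

# vectorised: no loop at all, the recommended form
k_vec <- rnorm(N, mean = mu, sd = s)
```

All three produce a numeric vector of length N; the vectorised call is the shortest and the fastest.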

R: passing by parameter to function and using apply instead of nested loop and recursive indexing failed

I have two lists of lists, humanSplit and ratSplit. Each element of humanSplit has the form:
> humanSplit[1]
$Fetal_Brain_408_AGTCAA_L001_R1_report.txt
   humanGene                            humanReplicate alignment RNAtype
66      DGKI Fetal_Brain_408_AGTCAA_L001_R1_report.txt         6     reg
68   ARFGEF2 Fetal_Brain_408_AGTCAA_L001_R1_report.txt         5     reg
If you type humanSplit[[1]], it gives the data without the name $Fetal_Brain_408_AGTCAA_L001_R1_report.txt.
ratSplit is essentially similar to humanSplit, differing only in column order. I want to apply Fisher's test to every possible pairing of replicates from humanSplit and ratSplit. I defined the following empty vectors, which I will use to store the results of the Fisher's tests:
humanReplicate <- vector(mode = 'character', length = 0)
ratReplicate <- vector(mode = 'character', length = 0)
pvalue <- vector(mode = 'numeric', length = 0)
For the Fisher's test between a replicate of humanSplit and a replicate of ratSplit, I define the following function. In the function I use geneList, which is a data.frame made by reading a file and has the form:
> head(geneList)
    human     rat
1 5S_rRNA 5S_rRNA
2 5S_rRNA 5S_rRNA
Now here is the main function, in which I use a function getGenetype that I already defined in another part of the code. x and y are integers:
fishertest <-function(x,y) {
ratReplicateName <- names(ratSplit[x])
humanReplicateName <- names(humanSplit[y])
## merging above two based on the one-to-one gene mapping as in geneList
## defined above.
mergedHumanData <-merge(geneList,humanSplit[[y]], by.x = "human", by.y = "humanGene")
mergedRatData <- merge(geneList, ratSplit[[x]], by.x = "rat", by.y = "ratGene")
## [here i do other manipulation with using already defined function
## getGenetype that is defined outside of this function and make things
## necessary to define following contingency table]
contingencyTable <- matrix(c(HnRn,HnRy,HyRn,HyRy), nrow = 2)
fisherTest <- fisher.test(contingencyTable)
humanReplicate <- c(humanReplicate,humanReplicateName )
ratReplicate <- c(ratReplicate,ratReplicateName )
pvalue <- c(pvalue , fisherTest$p)
}
After doing all this, I build the matrix eg to use in apply. Here I am basically trying to do something similar to a double for loop, calling fishertest for each pair:
eg <- expand.grid(i = 1:length(ratSplit),j = 1:length(humanSplit))
junk = apply(eg, 1, fishertest(eg$i,eg$j))
Now the problem is that when I run this, the call to fishertest inside apply gives the following error:
Error in humanSplit[[y]] : recursive indexing failed at level 3
RStudio points to the following line:
mergedHumanData <-merge(geneList,humanSplit[[y]], by.x = "human", by.y = "humanGene")
Ultimately, I want to do the following:
result <- data.frame(humanReplicate,ratReplicate, pvalue ,alternative, Conf.int1, Conf.int2, oddratio)
I am struggling with these questions:
When defining the fishertest function, how should I pass ratSplit, humanSplit, and the already defined function getGenetype?
And how should I use apply here?
Any help would be much appreciated.
Up front: read ?apply. Additionally, the first three hits on google when searching for "R apply tutorial" are helpful snippets: one, two, and three.
Errors in fishertest()
The error message itself has nothing to do with apply. The call got as far as it did because the arguments you provided actually resolved. Try evaluating eg$i by itself, and you'll see that it returns a vector: the corresponding column of the eg data.frame. You are passing this whole vector as an index argument (and likewise eg$j). The primary reason your function failed is that double-bracket indexing ([[) expects a single index; given a longer vector, it attempts recursive indexing into nested lists, hence the message. This is a great example of where production/deployed functions need type-checking to ensure that each argument is a numeric of length 1; it is often skipped in quick code but would have caught this mistake. Had it not been for the [[ limit, your function might have silently returned incorrect results. (I've been bitten by that many times!)
BTW: your code is also incorrect in its scoped access to pvalue, et al. If you make your function return just the numbers you need and then aggregate them outside of the function, your life will be simpler. (pvalue <- c(pvalue, ...) will find the pvalue assigned outside the function but will not update it as you want; the assignment creates a local copy that is discarded when the function returns.) You are defeating one purpose of writing this as a function. When thinking about writing this function, try to answer only this question: "how do I compare a single rat record with a single human record?" Only after that works correctly and simply, without having to overwrite variables in the parent environment, should you try to answer the question "how do I apply this function to all pairs and aggregate it?" Try very hard to have your function not change anything outside of its own environment.
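The advice above can be sketched as follows: a function that handles exactly one pair and returns one result row, with the aggregation done outside it. compare_pair and its made-up 2x2 table are hypothetical stand-ins for fishertest and the real contingency counts, which are not shown in the question.

```r
# Hypothetical stand-in for fishertest(): handles ONE pair, returns ONE row,
# and modifies nothing outside its own environment.
compare_pair <- function(x, y) {
  stopifnot(length(x) == 1, length(y) == 1)      # the type check suggested above
  tab <- matrix(c(x, y, y, x) + 1, nrow = 2)     # made-up counts, illustration only
  ft <- fisher.test(tab)
  data.frame(i = x, j = y, pvalue = ft$p.value)
}

eg <- expand.grid(i = 1:2, j = 1:3)

# Call once per pair, then aggregate the rows outside the function
rows <- Map(compare_pair, eg$i, eg$j)
result <- do.call(rbind, rows)
```

Because each call returns its own row, no vector in the parent environment has to be grown from inside the function.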
Errors in apply()
Had your function worked properly despite these errors, you would have received the following error from apply:
apply(eg, 1, fishertest(eg$i, eg$j))
## Error in match.fun(FUN) :
## 'fishertest(eg$i, eg$j)' is not a function, character or symbol
When you call apply this way, R evaluates the third argument before apply ever uses it. Since it is simply a call to fishertest(eg$i, eg$j), which is intended to return a data.frame row (inferred from your previous question), it resolves to such, and apply then sees something akin to:
apply(eg, 1, data.frame(...))
Now you can see that apply is being handed a data.frame and not a function.
The third argument (FUN) needs to be a function itself that takes as its first argument a vector containing the elements of the row (1) or column (2) of the matrix/data.frame. As an example, consider the following contrived example:
eg <- data.frame(aa = 1:5, bb = 11:15)
apply(eg, 1, mean)
## [1] 6 7 8 9 10
# similar to your use, will not work; this error comes from mean not getting
# any arguments, whereas your error above came from inside fishertest itself
apply(eg, 1, mean())
## Error in mean.default() : argument "x" is missing, with no default
Realize that mean is a function itself, not the return value from a function (there is more to it, but this definition works). Because we're iterating over the rows of eg (because of the 1), the first iteration takes the first row and calls mean(c(1, 11)), which returns 6. The equivalent of your code here, mean()(c(1, 11)), will fail for a couple of reasons: (1) mean requires an argument and is not getting one, and (2) regardless, it does not return a function itself (returning functions belongs to the "functional programming" paradigm, easy in R but uncommon for most programmers).
In the example here, mean accepts a single argument, which is typically a numeric vector. In your case, your function fishertest requires two arguments (templated by my previous answer to your question), which does not work with apply. You have two options here:
1. Change your fishertest function to accept a single vector as an argument and parse the index numbers from it. Both of the following versions do this:
fishertest <- function(v) {
  x <- v[1]
  y <- v[2]
  ratReplicateName <- names(ratSplit[x])
  ## ...
}
or
fishertest <- function(x, y) {
  if (missing(y)) {
    y <- x[2]
    x <- x[1]
  }
  ratReplicateName <- names(ratSplit[x])
  ## ...
}
The second version allows you to continue using the manual form of fishertest(1, 57) while also allowing you to do apply(eg, 1, fishertest) verbatim. Very readable, IMHO. (Better error checking and reporting can be used here, I'm just providing a MWE.)
2. Write an anonymous function to take the vector and split it up appropriately. This anonymous function could look something like function(ii) fishertest(ii[1], ii[2]). This is typically how it is done for functions that either do not transform as easily as in option 1 above, or for functions you cannot or do not want to modify. You can either assign this intermediary function to a variable (which makes it no longer anonymous, figure that) and pass that intermediary to apply, or just pass it directly to apply, à la:
.func <- function(ii) fishertest(ii[1], ii[2])
apply(eg, 1, .func)
## equivalently
apply(eg, 1, function(ii) fishertest(ii[1], ii[2]))
There are two reasons why many people opt to name the function: (1) if the function is used multiple times, better to define once and reuse; (2) it makes the apply line easier to read than if it contained a complex multi-line function definition.
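A runnable sketch of the wrapper approach, using a trivial two-argument function in place of fishertest. pair_sum is a made-up stand-in, and mapply is shown as a base R alternative that passes the two columns as separate arguments without any wrapper.

```r
# Made-up two-argument function standing in for fishertest(x, y)
pair_sum <- function(x, y) x + y

eg <- expand.grid(i = 1:2, j = 1:3)

# apply hands each row to FUN as ONE vector, so the wrapper splits it up
via_apply <- apply(eg, 1, function(ii) pair_sum(ii[1], ii[2]))

# mapply walks the two columns in parallel and passes them as x and y
via_mapply <- mapply(pair_sum, eg$i, eg$j)
```

Both calls visit the same six (i, j) pairs in the same order, so the results match element by element.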
As a side note, there are some gotchas with using apply and family that will be confusing if you don't understand them. Not the least of which is that when your function returns vectors, the matrix returned from apply will need to be transposed (with t()), after which you'll still need to rbind or otherwise aggregate.
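The transposition gotcha is easy to see on a small matrix. In this sketch (the matrix m and the min/max function are arbitrary choices), the function returns a length-2 vector for each of 3 rows, and apply stores each result as a column.

```r
m <- matrix(1:6, nrow = 3)   # 3 rows, 2 columns

# FUN returns a vector of length 2 for every row
out <- apply(m, 1, function(r) c(min = min(r), max = max(r)))

dim(out)       # 2 x 3: one COLUMN per input row
fixed <- t(out)
dim(fixed)     # 3 x 2: one ROW per input row, as usually wanted
```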
This is one area where using ddply may provide a more readable solution. There are several tutorials showing it off. For a quick intro, read this; for a more in depth discussion on the bigger picture in which ddply plays a part, read Hadley's Split, Apply, Combine Strategy for Data Analysis paper from JSS.

Using bnlearn Function "cpquery" Within a Loop

I'm attempting to use the bnlearn package to calculate conditional probabilities, and I'm running into a problem when the "cpquery" function is used within a loop. I've created an example, shown below, using data included with the package. When using the cpquery function in a loop, a variable created in the loop ("evi" in the example) is not recognized by the function. I receive the error:
Error in parse(text = evi) : object 'evi' not found
The creation steps of "evi" are based on examples provided by the author.
Any help you could provide would be great. I'm desperate to find a way that I can apply the cpquery function for a large number of observations.
library(bnlearn)
data(learning.test)
fitted = bn.fit(hc(learning.test), learning.test)
bn.function <- function(network, evidence_data) {
a <- NULL
b <- nrow(evidence_data)
for (i in 1:b) {
evi <- paste("(", names(evidence_data), "=='",
sapply(evidence_data[i,], as.character), "')",
sep = "", collapse = " & ")
a[i] <- cpquery(network, (C=='c'), eval(parse(text=evi)))
}
return(a)
}
test <- bn.function(fitted, learning.test)
Thanks in advance!
I don't know if this is due to a bugfix or just because I tried another approach - anyways, looping works if you iteratively build up the evidence list outside of the cpquery-function.
An example that iterates through a list called evidenceData, with all-positive evidence:
for(i in names(evidenceData)){
  loopEvidenceList <- list()
  loopEvidenceList[[i]] <- "TRUE"
  a = cpquery(fitted = bayesNet, event = queryNode == "TRUE",
              evidence = loopEvidenceList, method = "lw", n = 100000)
  print(a)
}
Depending on how your evidence is available, you might need more sophisticated preparation of the "loopEvidenceList", but once you have that prepared, it works fine.
To avoid the scoping problem, you can postpone the call to eval and do it inside the cpquery function. If you directly pass evi (the character variable) to cpquery and then parse it inside the definition, the chain of environments gets shifted and cpquery will have access to evi.
You can use m.cpquery <- edit(cpquery) to fork your own version of the function and insert the following line at its beginning:
evidence = parse(text = evidence)
and then save your new function.
So the heading of m.cpquery will look like:
> m.cpquery
function (fitted, event, evidence, cluster = NULL, method = "ls",
..., debug = FALSE)
{
evidence = parse(text = evidence)
check.fit(fitted)
check.logical(debug)
...
Now you can use m.cpquery in your own function like before, except we'll pass the plain character variable to it:
a[i] <- m.cpquery(network, (C=='c'), evi)
Note that in the first line of m.cpquery, we only parsed the evidence character variable and didn't call eval on it. cpquery is a front-end to conditional.probability.query (see here) and we're relying on conditional.probability.query's subsequent call to eval.
I should say that this is a rather ugly workaround, and it only works if you are using logic sampling (method='ls'). If you want to use likelihood weighting, the check.mutilated.evidence function will raise an error. I haven't checked whether injecting an eval expression before it gets called would result in a mayhem of subsequent errors leading to hell.
I feel like the problem is that you are using the same variable in the evidence as well as in the event. learning.test contains the values of the "C" variable, and we are then trying to predict C as the event. Maybe using a subset of the original dataset that excludes C will do the trick.
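The scoping problem described in this answer can be reproduced in base R without bnlearn. The sketch below is a toy model only: toy_query is a made-up function that captures its argument unevaluated and evaluates it later against the data, which is an assumption about the general mechanism and not a copy of how cpquery is implemented. It shows why a variable local to the calling function is not found, and one way around it by building the call with bquote so the parsed expression is spliced in before the call is made.

```r
# Toy function (hypothetical): captures 'evidence' unevaluated and evaluates
# it later with only the data columns and base R visible.
toy_query <- function(data, evidence) {
  expr <- substitute(evidence)
  eval(expr, data, baseenv())
}

caller_bad <- function(data) {
  evi <- "A == 'a'"
  # 'evi' lives in this function's frame, which toy_query never looks at
  toy_query(data, eval(parse(text = evi)))
}

caller_good <- function(data) {
  evi <- "A == 'a'"
  # Splice the parsed expression into the call, so that toy_query
  # receives  A == "a"  rather than a reference to 'evi'
  eval(bquote(toy_query(data, .(parse(text = evi)[[1]]))))
}

d <- data.frame(A = c("a", "b", "a"))
msg <- tryCatch(caller_bad(d), error = function(e) conditionMessage(e))
ok <- caller_good(d)
```

Here msg holds the "object 'evi' not found" message, while ok is the logical vector obtained by evaluating the condition against the column A.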

Why the parameter I am trying to estimate is "not found"?

I am trying to optimise my likelihood function of R_j and R_m using optim to estimate al_j, au_j, b_j and sigma_j. This is what I did.
a = read.table("D:/ff.txt",header=T)
attach(a)
a
     R_j         R_m
1  2e-03 0.026567295
2  3e-03 0.009798475
3  5e-02 0.008497274
4 -1e-02 0.012464578
5 -9e-04 0.002896023
6  9e-02 0.000879473
7  1e-02 0.003194435
8  6e-04 0.010281122
The parameters al_j, au_j, b_j and sigma_j need to be estimated.
llik=function(R_j,R_m)
if(R_j< 0)
{
sum[log(1/(2*pi*(sigma_j^2)))-(1/(2*(sigma_j^2))*(R_j+al_j-b_j*R_m))^2]
}else if(R_j>0)
{
sum[log(1/(2*pi*(sigma_j^2)))-(1/(2*(sigma_j^2))*(R_j+au_j-b_j*R_m))^2]
}else if(R_j==0)
{
sum(log(pnorm(au_j,mean=b_j*R_m,sd=sigma_j)-pnorm(al_j,mean=b_j*R_m,sd=sigma_j)))
}
start.par=c(al_j=0,au_j=0,sigma_j=0.01,b_j=1)
out1=optim(llik,par=start.par,method="Nelder-Mead")
Error in pnorm(au_j, mean = b_j * R_m, sd = sigma_j) :
object 'au_j' not found
It is difficult to tell where to start on this.
As @mac said, your code is difficult to read. It also contains errors.
For example, if you try sum[c(1,2)] you will get an error: you should use sum(c(1,2)). In any case, you seem to be taking the sum in the wrong place. You cannot use if and else if on vectors, and need to use ifelse. You have nothing to stop the standard deviation going negative. There is more.
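Two of the points above, in a small sketch with made-up values: sum is called with parentheses, and a per-element three-way choice is written with nested ifelse rather than if / else if.

```r
x <- c(-1, 0, 2)

# Function calls use parentheses; sum[...] would try to index the function
total <- sum(c(1, 2))

# if() looks at a single condition; ifelse() works element by element
label <- ifelse(x < 0, "neg",
                ifelse(x > 0, "pos", "zero"))
```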
The following code runs without errors or warnings. You will still have to decide whether it does what you want.
a <- data.frame(R_j = c(0.002,0.003,0.05,-0.01,-0.0009,0.09,0.01,0.0006),
                R_m = c(0.026567295,0.009798475,0.008497274,0.012464578,
                        0.002896023,0.000879473,0.003194435,0.010281122))
llik = function(x)
{
  al_j=x[1]; au_j=x[2]; sigma_j=x[3]; b_j=x[4]
  sum(
    ifelse(a$R_j< 0, log(1/(2*pi*(sigma_j^2)))-
                     (1/(2*(sigma_j^2))*(a$R_j+al_j-b_j*a$R_m))^2,
      ifelse(a$R_j>0 , log(1/(2*pi*(sigma_j^2)))-
                       (1/(2*(sigma_j^2))*(a$R_j+au_j-b_j*a$R_m))^2,
             log(pnorm(au_j,mean=b_j*a$R_m,sd=sqrt(sigma_j^2))-
                 pnorm(au_j,mean=b_j*a$R_m,sd=sqrt(sigma_j^2)))))
  )
}
start.par = c(0, 0, 0.01, 1)
out1 = optim(llik, par=start.par, method="Nelder-Mead")
Let's start with the error message:
Error in pnorm(au_j, mean = b_j * R_m, sd = sigma_j) :
object 'au_j' not found
So R is telling you that when it got to the pnorm call, it couldn't find anything called 'au_j' to use in that call. Your next step should be to look at your function, llik, and try to identify how you expect the variable 'au_j' to be defined within that function.
At this point, the answer should be fairly clear (maybe!). Nowhere in llik is the variable 'au_j' assigned a value. So it won't be 'created' inside the function. R's scoping rules will then cause it to look outside the function in the global environment for something called 'au_j'.
And you might say that here is where things should work, since you assigned 'au_j' a value within start.par. But start.par is a single vector with named elements, not a set of separate variables, and R can't find an object called 'au_j' inside a vector like that.
So the solution here is most likely to rework your function llik so that it takes as arguments everything that it will use, so you're going to add everything in start.par to the arguments of llik. Something like:
llik <- function(par=c(al_j,au_j,sigma_j,b_j),R_j,R_m){...}
and then within llik you'll refer to al_j using par[1] and so forth. Then the optim call should look something like:
optim(start.par,llik,R_j=a$R_j,R_m=a$R_m)
Since you've attached your data, in a, you probably don't have to explicitly pass the arguments R_j and R_m in the optim call, but it's probably good practice to do so.
I think I've reconstructed what you're trying to accomplish here (modulo the math, which I haven't even glanced at), but I confess that your code is a bit hard to parse. I would suggest spending some time with the examples in ?optim to make sure you understand how that function is called.
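To make the calling convention concrete, here is a sketch with a deliberately simple objective. It is not the asker's likelihood: nll is a made-up negative Gaussian log-likelihood for R_j = alpha + b * R_m + noise, chosen only to show the parameter vector as the first argument, the data passed through optim's extra arguments, and a log-scale parameter that keeps the standard deviation positive. optim minimises by default, which is why the log-likelihood is negated.

```r
a <- data.frame(R_j = c(0.002, 0.003, 0.05, -0.01, -0.0009, 0.09, 0.01, 0.0006),
                R_m = c(0.026567295, 0.009798475, 0.008497274, 0.012464578,
                        0.002896023, 0.000879473, 0.003194435, 0.010281122))

# Toy objective (illustration only): all parameters arrive in 'par',
# the data arrive as explicit arguments
nll <- function(par, R_j, R_m) {
  alpha <- par[[1]]
  b     <- par[[2]]
  sigma <- exp(par[[3]])          # optimise log(sigma) so sigma stays > 0
  -sum(dnorm(R_j, mean = alpha + b * R_m, sd = sigma, log = TRUE))
}

start <- c(alpha = 0, b = 1, log_sigma = log(0.01))

# Extra named arguments (R_j, R_m) are forwarded to nll on every call
fit <- optim(start, nll, R_j = a$R_j, R_m = a$R_m, method = "Nelder-Mead")
```

The estimates are in fit$par, in the same order as start, and fit$value is the minimised objective.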
