I am trying to set up a Gibbs sampler in R where I update my value at each step.
I have a function in R that I want to maximise for 2 values; my previous value and a new one.
So I know the maximum outcome from the function applied to both values. But then how do I select the best input without doing it manually? (I need to do a lot of iterations). Here is an idea of the code and the variables:
g0<-function(k){sample(0:1,k,replace=T)}
this is a k dimensional vector with entries 1 or 0 uniformly. Initial starting point for my chain. If i=1 then include the i'th variable in the design matrix.
X1 design matrix
Xg<-function(g){
Xg<-cbind(X1[,1]*g[1],X1[,2]*g[2],X1[,3]*g[3],X1[,4]*g[4],X1[,5]*g[5],X1[,6]*g[6],X1[,7]*g[7])
return(Xg[,which(!apply(Xg,2,FUN = function(x){all(x == 0)}))])
}
Xg0<-Xg(g0)
reduced design matrix for g0
c<-1:100000
mp<-function(g){
mp<-sum((1/(c*(c+1)^-((q+1)/2)))*
(t(Y)%*%Y-(c/(c+1))*t(Y)%*%Xg(g)%*%solve(t(Xg(g))%*%Xg(g))%*%t(Xg(g))%*%Y)^(-27/2))
return(mp)
}
this is my function.
Therefore if I have mp(g) and mp(g*), for 2 inputs g and g*, such that the max is mp(g*) how can I return g*?
Thanks for any help and if you have any queries just ask. sorry about the messy code as well; I have not used this site before.
Like this:
inputs <- list(g, g2)
outputs <- sapply(inputs, mp)
best.input <- inputs[which.max(outputs)]
Related
I am trying to replace all values of r for which r<=10 with the value of the 1st observation in x (which is 1). This is just a very simplified example of what I am trying to do, so please do not question why I'm trying to do this in a complicated way because the full code is more complicated. The only thing I need help with is figuring out how to use the vector I created (p1) to replace r[p1] or equivalently r[c(1,2,3,4)] with x[ 1 ] (which is equal to 1). I can not write p1 explicitly because it will be generated in a loop (not shown in code).
x=c(1,2,3)
r=c(1,3,7,10,15)
assign(paste0("p", x[1]), which(r<=10))
p1
r[paste0("p", x[1])]=x[1]
In the code above, I tried using r[paste0("p", x[1])]=x[1] but this is the output I end up with
When instead I would like to see this output
Basically, I need to figure out a way to call p1 in this code r[??]=x[1] without explicitly typing p1.
I have included the full code I am attempting below in case context is needed.
##Creates a function to generate discrete random values from a specified pmf
##n is the number of random values you wish to generate
##x is a vector of discrete values (e.g. c(1,2,3))
##pmf is the associated pmf for the discrete values (e.g. c(.3,.2,.5))
r.dscrt.pmf=function(n,x,pmf){
set.seed(1)
##Generate n uniform random values from 0 to 1
r=runif(n)
high=0
low=0
for (i in 1:length(x)){
##High will establish the appropriate upper bound to consider
high=high+pmf[i]
if (i==1){
##Creates the variable p1 which contains the positions of all
##r less than or equal to the first value of pmf
assign(paste0("p", x[i]), which(r<=pmf[i]))
} else {
##Creates the variable p2,p3,p4,etc. which contains the positions of all
##r between the appropriate interval of high and low
assign(paste0("p", x[i]), which(r>low & r<=high))
}
##Low will establish the appropriate lower bound to consider
low=low+pmf[i]
}
for (i in 1:length(x)){
##Will loops to replace the values of r at the positions specified at
##p1,p2,p3,etc. with x[1],x[2],x[3],etc. respectively.
r[paste0("p", x[i])]=x[i]
}
##Returns the new r
r
}
##Example call of the function
r.dscrt.pmf(10,c(0,1,3),c(.3,.2,.5))
get is like assign, in that it lets you refer to variables by string instead of name.
r[get(paste0("p", x[1]))]=x[1]
But get is one of those "flags" of something that could be written in a much clearer and safer way.
Would this suit your needs?
ifelse(r<11, x[1], r)
[1] 1 1 1 1 15
I've tried a couple ways of doing this problem but am having trouble with how to write it. I think I did the first three steps correctly, but now I have to fill the vector z with numbers from y that are divisible by four, not divisible by three, and have an odd number of digits. I know that I'm using the print function in the wrong way, I'm just at a loss on what else to use ...
This is different from that other question because I'm not using a while loop.
#Step 1: Generate 1,000,000 random, uniformly distributed numbers between 0
#and 1,000,000,000, and name as a vector x. With a seed of 1.
set.seed(1)
x=runif(1000000, min=0, max=1000000000)
#Step 2: Generate a rounded version of x with the name y
y=round(x,digits=0)
#Step 3: Empty vector named z
z=vector("numeric",length=0)
#Step 4: Create for loop that populates z vector with the numbers from y that are divisible by
#4, not divisible by 3, with an odd number of digits.
for(i in y) {
if(i%%4==0 && i%%3!=0 && nchar(i,type="chars",allowNA=FALSE,keepNA=NA)%%2!=0){
print(z,i)
}
}
NOTE: As per #BenBolker's comment, a loop is an inefficient way to solve your problem here. Generally, in R, try to avoid loops where possible to maximise the efficiency of your code. #SymbolixAU has provided an example of doing so here in the comments. Having said that, in aid of helping you learn the ins-and-outs of loops and vectors, here's a solution which only requires a change to one line of your code:
You've got the vector created before the loop, that's a good start. Now, inside your loop, you need to populate that vector. To do so, you've currently got print(z,i), which won't really do too much. What you need to to change the vector itself:
z <- c( z, i )
Should work for you (just replace that print line in your loop).
What's happening here is that we're taking the existing z vector, binding i to the end of it, and making that new vector z again. So every time a value is added, the vector gets a little longer, such that you'll end up with a complete vector.
where you have print put this instead:
z <- append(z, i)
I am normally a maple user currently working with R, and I have a problem with correctly indexing variables.
Say I want to define 2 vectors, v1 and v2, and I want to call the nth element in v1. In maple this is easily done:
v[1]:=some vector,
and the nth element is then called by the command
v[1][n].
How can this be done in R? The actual problem is as follows:
I have a sequence M (say of length 10, indexed by k) of simulated negbin variables. For each of these simulated variables I want to construct a vector X of length M[k] with entries given by some formula. So I should end up with 10 different vectors, each of different length. My incorrect code looks like this
sims<-10
M<-rnegbin(sims, eks_2016_kasko*exp(-2.17173), 840.1746)
for(k in 1:sims){
x[k]<-rep(NA,M[k])
X[k]<-rep(NA,M[k])
for(i in 1:M[k]){x[k][i]<-runif(1,min=0,max=1)
if(x[k][i]>=0 & x[i]<=0.1056379){
X[k][i]<-rlnorm(1, 6.228244, 0.3565041)}
else{
X[k][i]<-rlnorm(1, 8.910837, 1.1890874)
}
}
}
The error appears to be that x[k] is not a valid name for a variable. Any way to make this work?
Thanks a lot :)
I've edited your R script slightly to get it working and make it reproducible. To do this I had to assume that eks_2016_kasko was an integer value of 10.
require(MASS)
sims<-10
# Because you R is not zero indexed add one
M<-rnegbin(sims, 10*exp(-2.17173), 840.1746) + 1
# Create a list
x <- list()
X <- list()
for(k in 1:sims){
x[[k]]<-rep(NA,M[k])
X[[k]]<-rep(NA,M[k])
for(i in 1:M[k]){
x[[k]][i]<-runif(1,min=0,max=1)
if(x[[k]][i]>=0 & x[[k]][i]<=0.1056379){
X[[k]][i]<-rlnorm(1, 6.228244, 0.3565041)}
else{
X[[k]][i]<-rlnorm(1, 8.910837, 1.1890874)
}
}
This will work and I think is what you were trying to do, BUT is not great R code. I strongly recommend using the lapply family instead of for loops, learning to use data.table and parallelisation if you need to get things to scale. Additionally if you want to read more about indexing in R and subsetting Hadley Wickham has a comprehensive break down here.
Hope this helps!
Let me start with a few remarks and then show you, how your problem can be solved using R.
In R, there is most of the time no need to use a for loop in order to assign several values to a vector. So, for example, to fill a vector of length 100 with uniformly distributed random variables, you do something like:
set.seed(1234)
x1 <- rep(NA, 100)
for (i in 1:100) {
x1[i] <- runif(1, 0, 1)
}
(set.seed() is used to set the random seed, such that you get the same result each time.) It is much simpler (and also much faster) to do this instead:
x2 <- runif(100, 0, 1)
identical(x1, x2)
## [1] TRUE
As you see, results are identical.
The reason that x[k]<-rep(NA,M[k]) does not work is that indeed x[k] is not a valid variable name in R. [ is used for indexing, so x[k] extracts the element k from a vector x. Since you try to assign a vector of length larger than 1 to a single element, you get an error. What you probably want to use is a list, as you will see in the example below.
So here comes the code that I would use instead of what you proposed in your post. Note that I am not sure that I correctly understood what you intend to do, so I will also describe below what the code does. Let me know if this fits your intentions.
# define M
library(MASS)
eks_2016_kasko <- 486689.1
sims<-10
M<-rnegbin(sims, eks_2016_kasko*exp(-2.17173), 840.1746)
# define the function that calculates X for a single value from M
calculate_X <- function(m) {
x <- runif(m, min=0,max=1)
X <- ifelse(x > 0.1056379, rlnorm(m, 6.228244, 0.3565041),
rlnorm(m, 8.910837, 1.1890874))
}
# apply that function to each element of M
X <- lapply(M, calculate_X)
As you can see, there are no loops in that solution. I'll start to explain at the end:
lapply is used to apply a function (calculate_X) to each element of a list or vector (here it is the vector M). It returns a list. So, you can get, e.g. the third of the vectors with X[[3]] (note that [[ is used to extract elements from a list). And the contents of X[[3]] will be the result of calculate_X(M[3]).
The function calculate_X() does the following: It creates a vector of m uniformly distributed random values (remember that m runs over the elements of M) and stores that in x. Then it creates a vector X that contains log normally distributed random variables. The parameters of the distribution depend on the value x.
I'm attempting to create sigma/summation function with the variables in my dataset that looks like this:
paste0("(choose(",zipdistrib$Leads[1],",",zipdistrib$Starts[1],")*beta(a+",zipdistrib$Starts[1],",b+",zipdistrib$Leads[1],"-",zipdistrib$Starts[1],")/beta(a,b))")
When I enter that code, I get
[1] "(choose(9,6)*beta(a+6,b+9-6)/beta(a,b))"
I want to create a sigma/summation function where a and b are unknown free-floating variables and the values of Leads[i] and Starts[i] are determined by the values for Leads and Starts for observation i in my dataset. I have tried using a sum function in conjunction with mapply and sapply to no avail. Currently, I am taking the tack of creating the function as a string using a for loop in conjunction with a paste0 command so that the only things that change are the values of the variables Leads and Starts. Then, I try coercing the result into a function. To my surprise, I can actually enter this code without creating a syntax error, but when I try optimize the function for variables a and b, I'm not having success.
Here's my attempt to create the function out of a string.
betafcn <- function (a,b) {
abfcnstring <-
for (i in 1:length(zipdistrib$Zip5))
toString(
paste0(" (choose(",zipdistrib$Leads[i],",",zipdistrib$Starts[i],")*beta(a+",zipdistrib$Starts[i],",b+",zipdistrib$Leads[i],"-",zipdistrib$Starts[i],")/beta(a,b))+")
)
as.function(
as.list(
substr(abfcnstring, 1, nchar(abfcnstring)-1)
)
)
}
Then when I try to optimize the function for a and b, I get the following:
optim(c(a=.03, b=100), betafcn(a,b))
## Error in as.function.default(x, envir) :
argument must have length at least 1
Is there a better way for me to compile a sigma from i=1 to length of dataset with mapply or lapply or some other *apply function? Or am I stuck using a dreaded for loop? And then once I create the function, how do I make sure that I can optimize for a and b?
Update
This is what my dataset would look like:
leads <-c(7,4,2)
sales <-c(3,1,0)
zipcodes <-factor(c("11111", "22222", "33333"))
zipleads <-data.frame(ZipCode=zipcodes, Leads=leads, Sales=sales)
zipleads
## ZipCode Leads Sales
# 1 11111 7 3
# 2 22222 4 1
# 3 33333 2 0
My goal is to create a function that would look something like this:
betafcn <-function (a,b) {
(choose(7,3)*beta(a+3,b+7-3)/beta(a,b))+
(choose(4,1)*beta(a+4,b+4-1)/beta(a,b))+
(choose(2,0)*beta(a+0,b+2-0)/beta(a,b))
}
The difference is that I would ideally like to replace the dataset values with any other possible vectors for Leads and Sales.
Since R vectorizes most of its operations by default, you can write an expression in terms of single values of a and b (which will automatically be recycled to the length of the data) and vectors of x and y (i.e., Leads and Sales); if you compute on the log scale, then you can use sum() (rather than prod()) to combine the results. Thus I think you're looking for something like:
betafcn <- function(a,b,x,y,log=FALSE) {
r <- lchoose(x,y)+lbeta(a+x,b+x-y)-lbeta(a,b)
if (log) r else exp(r)
}
Note that (1) optim() minimizes by default (2) if you're trying to optimize a likelihood you're better off optimizing the log-likelihood instead ...
Since all of the internal functions (+, lchoose, lbeta) are vectorized, you should be able to apply this across the whole data set via:
zipleads <- data.frame(Leads=c(7,4,2),Sales=c(3,1,0))
objfun <- function(p) { ## negative log-likelihood
-sum(betafcn(p[1],p[2],zipleads$Leads,zipleads$Sales,
log=TRUE))
}
objfun(c(1,1))
optim(fn=objfun,par=c(1,1))
I got crazy answers for this example (extremely large values of both shape parameters), but I think that's because it's awfully hard to fit a two-parameter model to three data points!
Since the shape parameters of the beta-binomial (which is what this appears to be) have to be positive, you might run into trouble with unconstrained optimization. You can use method="L-BFGS-B", lower=c(0,0) or optimize the parameters on the log scale ...
I thought your example was hopelessly complex. If you are going to attemp making a function by pasting character values, you first need to understand how to make a function body with an unevaluated expression, and after that basic task is understood, then you can elaborate ... if in fact it is necessary, noting BenBolker's suggestions.
choosefcn <- function (a,b) {}
txtxpr <- paste0("choose(",9,",",6,")" )
body(choosefcn) <- parse(text= txtxpr)
#----------
> betafcn
function (a, b)
choose(9, 6)
val1 <- "a"
val2 <- "b"
txtxpr <- paste0("choose(", val1, ",", val2, ")" )
body(choosefcn) <- parse(text= txtxpr)
#
choosefcn
#function (a, b)
#choose(a, b)
It also possible to configure the formal arguments separately with the formals<- function. See each of these help pages:
?formals
?body
?'function' # needs to be quoted
Let me describe my problem by a simplified example
I have a contingency table
datatable<-array(c(1,2,3,4,5,6,dim=c(3,2))
datatable<-cbind(datatable,rowSums(datatable))
datatable<-rbind(datatable,colSums(datatable))
here, m=3,n=2 This is a (m+1)x(n+1)=4x3 table. Now, i want a new mxn array, where the ij'th entrance is a confidence interval (a list of size 2 if you will) for the estimate datatable[i,j]. Below is a function that computes an interval for the estimate datatable[i,j] from the arguments a<-datatable[i,j] and b<-datatable[i,n+1]
CIfunction<-function(a,b) c(-1,1)+a*b
I hope it is not too messy, i couldn't think of an easier example.
How do i create such a table in a elegant way? In the real example, this is a function that should take arbitrary contingency tables and return the corresponding CI table.
I already found a way, using double loop over i resp. j, but it really smells like something that could be done in an elegant way using mapply/outer or thelike.
The code below involves some trickery because it relies on how R recycles vectors (under what circumstances does R recycle?). array and matrix objects are just folded vectors (stored column-wise). It's not hard to come up with tricks like this once you understand the behavior
datatable <- array (1:6, dim=c(3,2))
datatable <-cbind(datatable,rowSums(datatable))
datatable <-rbind(datatable,colSums(datatable))
# last column recycled as necessary
lower <- -datatable[1:3, 1:2] * datatable[1:3, 3]
upper <- -lower
CIlist <- list(lower, upper)
I do suggest, however, that you store datatable without the row and column totals and compute them only on print.
Thank for your comments and answers. I had abit of trouble generalizing the (elegant) method provided by ilir to arbitrary CI functions. I ended up just doing the double loop mentioned above. My solution to the above problem would be something like
CItable<-function(datatable,CIfunction)
{
m<-dim(datatable)[1]-1
n<-dim(datatable)[2]-1
CItable<-array(NA,dim=c(m,2*n))
for(i in 1:m)
{
for(j in 1:n)
{
tempint<-CIfunction(datatable[i,j],datatable[i,n+1])
CItable[i,2*j]<-tempint[2]
CItable[i,2*j-1]<-tempint[1]
}
}
return(CItable)
}