Similar function to R "rep" in jags to create array? - r

Is there a function in JAGS similar to the R function rep? I want to create an array using code similar to the following:
n ~ dmulti(pi, N) # pi is a 3 dimensional probability vector, N is fixed
# the dimension of n is hard coded in this line:
a <- c(rep(0, n[1]), rep(1, n[2]), rep(2, n[3]))
I read through the manual and wasn't able to find a way to achieve this. I understand that Stan would probably allow this but I couldn't use Stan because I need to do inference on discrete parameters. I really appreciate your help!
This question is also posted on the JAGS help forum.

I have added a rep function to the development version (the future JAGS 4.0.0). As Matt and John have alluded to, it requires the second argument to be fixed so that the length of the resulting vector can be determined at compile time.

The short answer is no, I'm afraid not. One of the stipulations of the JAGS/BUGS language is that variables must have fixed dimensions (with every element defined exactly once) - in your example a will change dimension size depending on the vector n. There may be other ways to get the result you are looking for, but not using this approach.
Incidentally, you use n twice in that bit of code (on the LHS and RHS of the multinomial distribution), which is not allowed - although that may just be a typo :)
Matt

You could populate your vector with some loops:
library(R2jags)
M <- function() {
  for (i in 1:n[1]) {
    a[i] <- 0
  }
  for (i in 1:n[2]) {
    a[i + n[1]] <- 1
  }
  for (i in 1:n[3]) {
    a[i + sum(n[1:2])] <- 2
  }
}
j <- jags(list(n=3:5), NULL, 'a', M, DIC=FALSE)
j$BUGSoutput$mean$a
## [1] 0 0 0 1 1 1 1 2 2 2 2 2
However, as @MattDenwood alluded to, if the sum of the elements of n is variable this will throw an error - a must be of constant length throughout the simulation.
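If n is actually known before fitting (i.e. it is data rather than a sampled quantity), a simpler workaround is to build the vector with R's own rep() and pass it to JAGS as data. A minimal sketch, with the object name dat invented for illustration:
n <- 3:5
a <- c(rep(0, n[1]), rep(1, n[2]), rep(2, n[3]))
a
## [1] 0 0 0 1 1 1 1 2 2 2 2 2
dat <- list(n = n, a = a)  # then pass 'dat' as the data argument to jags()/jags.model()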

Related

R Precision for Double - Why does code return negative values when a positive outcome is expected?

I am testing 2 ways of calculating prod(b-a), where a and b are vectors of length n: prod(b-a) = (b1-a1)(b2-a2)(b3-a3)*...*(bn-an), where b_i > a_i > 0 for all i = 1, 2, ..., n. For some special cases, another way (Method 2) of calculating prod(b-a) is more efficient. It expands the product and sums the resulting terms over all subsets S of {1, ..., n}: prod(b-a) = sum over S of (-1)^|S| * (product of b_i for i not in S) * (product of a_i for i in S).
Here is my question: when a_i is very close to b_i, the true outcome can be very, very close to 0, something like 10^(-16). Method 1 (subtract and multiply) always returns a positive output. Method 2, using the formula, sometimes returns a negative output (about 7~8% of the time in my experiment). Mathematically, these 2 methods should return exactly the same output, but in floating-point arithmetic they apparently produce different outputs.
Here is my code to run the test. When I run the testing code 10000 times, about 7~8% of my runs for Method 2 return negative output. According to the official documentation, the R double has a precision of "2.225074e-308", as indicated by the R parameter .Machine$double.xmin. Why is it getting negative values when the differences are between 10^(-16) and 10^(-18)? Any help that sheds light on this will be appreciated. I would also love some suggestions on how to practically increase the precision to a higher level, as indicated in the R documentation.
########## Testing code 1.
ftest1case <- function(a, b) {
  n <- length(a)
  if (length(b) != n) stop("--------- length a and b are not right.")
  if (any(b < a)) stop("---------- b has to be greater than a all the time.")
  out1 <- prod(b - a)
  out2 <- 0
  N <- 2^n
  for (i in 1:N) {
    tidx <- rev(as.integer(intToBits(x = i - 1))[1:n])
    tsign <- ifelse((sum(tidx) %% 2) == 0, 1.0, -1.0)
    out2 <- out2 + tsign * prod(b[tidx == 0]) * prod(a[tidx == 1])
  }
  c(out1, out2)
}
########## Testing code 2.
ftestManyCases <- function(N, printFreq = 1000, smallNum = 10^(-20)) {
  tt <- matrix(0, nrow = N, ncol = 2)
  n <- 12
  for (i in 1:N) {
    a <- runif(n, 0, 1)
    b <- a + runif(n, 0, 1) * 0.1
    tt[i, ] <- ftest1case(a = a, b = b)
    if ((i %% printFreq) == 0) cat("----- i = ", i, "\n")
    if (tt[i, 2] < smallNum) cat("------ i = ", i, " ---- Negative summation found.\n")
  }
  tout <- apply(tt, 2, FUN = function(x) { round(sum(x < smallNum) / N, 6) })
  names(tout) <- c("PerLess0_Method1", "PerLess0_Method2")
  list(summary = tout, data = tt)
}
######## Step 1. Test for 1 case.
n<-12
a<-runif(n,0,1)
b<-a+runif(n,0,1)*0.1
ftest1case(a=a,b=b)
######## Step 2 Test Code 2 for multiple cases.
N<-300
tt<-ftestManyCases(N=N,printFreq = 100)
tt[[1]]
It's hard for me to imagine when an algorithm that consists of generating 2^n permutations and adding them up is going to be more efficient than a straightforward product of differences, but I'll take your word for it that there are some special cases where it is.
As suggested in comments, the root of your problem is the accumulation of floating-point errors when adding values of different magnitudes; see here for an R-specific question about floating point and here for the generic explanation.
First, a simplified example:
n <- 12
set.seed(1001)
a <- runif(n, 0, 1)
b <- a + 0.01
prod(a-b) ## 1e-24
out2 <- 0
N <- 2^n
out2v <- numeric(N)
for (i in 1:N) {
  tidx <- rev(as.integer(intToBits(x = i - 1))[1:n])
  tsign <- ifelse((sum(tidx) %% 2) == 0, 1.0, -1.0)
  j <- as.logical(tidx)
  out2v[i] <- tsign * prod(b[!j]) * prod(a[j])
}
sum(out2v) ## -2.011703e-21
Using extended precision (with 1000 bits of precision) to check that the simple/brute force calculation is more reliable:
library(Rmpfr)
a_m <- mpfr(a, 1000)
b_m <- mpfr(b, 1000)
prod(a_m-b_m)
## 1.00000000000000857647286522936696473705868726043995807429578968484409120647055193862325070279593735821154440625984047036486664599510856317884962563644275433171621778761377125514191564456600405460403870124263023336542598111475858881830547350667868450934867675523340703947491662460873009229537576817962228e-24
This proves the point in this case, but in general doing extended-precision arithmetic will probably kill any performance gains you would get.
Redoing the permutation-based calculation with mpfr values (using out2 <- mpfr(0, 1000), and going back to the out2 <- out2 + ... running summation rather than accumulating the values in a vector and calling sum()) gives an accurate answer (at least to the first 20 or so digits, I didn't check farther), but takes 6.5 seconds on my machine (instead of 0.03 seconds when using regular floating-point).
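For reference, a sketch of what that extended-precision rerun might look like (assuming n, N, a_m and b_m from the chunks above, and guarding the empty subsets so their product defaults to 1):
out2_m <- mpfr(0, 1000)
for (i in 1:N) {
  tidx  <- rev(as.integer(intToBits(x = i - 1))[1:n])
  tsign <- if (sum(tidx) %% 2 == 0) 1 else -1
  j     <- as.logical(tidx)
  pb <- if (any(!j)) prod(b_m[!j]) else 1   # product over indices not in the subset
  pa <- if (any(j))  prod(a_m[j])  else 1   # product over indices in the subset
  out2_m <- out2_m + tsign * pb * pa        # running sum kept in 1000-bit precision
}
out2_m  # agrees with the extended-precision product above to many digits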
Why is this calculation problematic? First, note the difference between .Machine$double.xmin (approx 2e-308), which is the smallest floating-point value that the system can store, and .Machine$double.eps (approx 2e-16), which is the smallest value x such that 1 + x > 1, i.e. the smallest relative value that can be added without catastrophic cancellation (values a little bit bigger than this magnitude will experience severe, but not catastrophic, cancellation).
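You can inspect both constants directly, and a quick check illustrates the eps definition (the printed values are platform-typical for 64-bit doubles):
.Machine$double.xmin             # ~2.2e-308, smallest positive normalised double
.Machine$double.eps              # ~2.2e-16, relative spacing of doubles near 1
1 + .Machine$double.eps > 1      # TRUE
1 + .Machine$double.eps / 2 > 1  # FALSE: the addition is rounded away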
Now look at the distribution of out2v, the series of values being summed:
hist(out2v)
There are clusters of negative and positive numbers of similar magnitude. If our summation happens to add a bunch of values that almost cancel (so that the result is very close to 0), then add that to another value that is not nearly zero, we'll get bad cancellation.
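To see the mechanism in miniature (toy numbers, not the original data): a contribution of about 1e-20 is wiped out as soon as it passes through a running total of about 1e-3, because it is below the relative precision of a double at that magnitude.
big   <- 1e-3
small <- 1e-20
(big + small) - big  # 0, not 1e-20: the small term has been lost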
It's entirely possible that there's a way to rearrange this calculation so that bad cancellation doesn't happen, but I couldn't think of one easily.

modelling an infinite series in R

I'm trying to write code in R to approximate the following infinite Taylor series from the Theis hydrogeological equation: W(u) = -0.5772 - ln(u) + u - u^2/(2*2!) + u^3/(3*3!) - u^4/(4*4!) + ...
I'm pretty new to functional programming, so this was a challenge! This is my attempt:
Wu <- function(u, repeats = 100) {
  result <- numeric(repeats)
  for (i in seq_along(result)) {
    result[i] <- -((-u)^i) / (i * factorial(i))
  }
  return(sum(result) - log(u) - 0.5772)
}
I've compared the results with values from a data table available here: https://pubs.usgs.gov/wsp/wsp1536-E/pdf/wsp_1536-E_b.pdf - see below (excuse verbose code - should have made a csv, with hindsight):
Wu_QC <- data.frame(u = c(1.0*10^-15, 4.1*10^-14,9.9*10^-13, 7.0*10^-12, 3.7*10^-11,
2.3*10^-10, 6.8*10^-9, 5.7*10^-8, 8.4*10^-7, 6.3*10^-6,
3.1*10^-5, 7.4*10^-4, 5.1*10^-3, 2.9*10^-2,8.7*10^-1,
4.6,9.90),
Wu_table = c(33.9616, 30.2480, 27.0639, 25.1079, 23.4429,
21.6157, 18.2291, 16.1030, 13.4126, 11.3978,
9.8043,6.6324, 4.7064,2.9920,0.2742,
0.001841,0.000004637))
Wu_QC$rep_100 <- Wu(Wu_QC$u,100)
The good news is the formula gives identical results for repeats = 50, 100, 150 and 170 (so I've just given you the 100 version above). The bad news is that, while the function performs well for u < ~10^-3, it goes off the rails and gives negative outputs for numbers within an order of magnitude or so of 1. This doesn't happen when I just call the function on an individual number, e.g.:
> Wu(4.6)
[1] 0.001856671
Which is the correct answer to 2sf.
Can anyone spot what I've done wrong and/or suggest a better way to code this equation? I think the problem is something to do with my for loop and/or an issue with the factorials generating infinite numbers as u gets larger, but I'm not at all certain.
Thanks!
As it says on page 93 of your reference, W is also known as the exponential integral. See also here.
Then, e.g., the package expint provides a function to compute W(u):
library(expint)
expint(10^(-8))
# [1] 17.84347
expint(4.6)
# [1] 0.001841006
where the results agree exactly with the table you referenced.
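Since expint() accepts a vector as its first argument, you can check the whole table from the question in one call (this assumes the Wu_QC data frame built above; the column name Wu_expint is just illustrative):
library(expint)
Wu_QC$Wu_expint <- expint(Wu_QC$u)
Wu_QC[, c("u", "Wu_table", "Wu_expint")]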
You can write a function that takes in a value together with the number of terms and outputs the required value:
w <- function(u, l) {
  a <- 2:l
  -0.5772 - log(u) + u + sum(u^a * rep(c(-1, 1), length = l - 1) / a / factorial(a))
}
transform(Wu_QC,new=Vectorize(w)(u,170))
u Wu_table new
1 1.0e-15 3.39616e+01 3.396158e+01
2 4.1e-14 3.02480e+01 3.024800e+01
3 9.9e-13 2.70639e+01 2.706387e+01
4 7.0e-12 2.51079e+01 2.510791e+01
5 3.7e-11 2.34429e+01 2.344290e+01
6 2.3e-10 2.16157e+01 2.161574e+01
7 6.8e-09 1.82291e+01 1.822914e+01
8 5.7e-08 1.61030e+01 1.610301e+01
9 8.4e-07 1.34126e+01 1.341266e+01
10 6.3e-06 1.13978e+01 1.139777e+01
11 3.1e-05 9.80430e+00 9.804354e+00
12 7.4e-04 6.63240e+00 6.632400e+00
13 5.1e-03 4.70640e+00 4.706408e+00
14 2.9e-02 2.99200e+00 2.992051e+00
15 8.7e-01 2.74200e-01 2.741930e-01
16 4.6e+00 1.84100e-03 1.856671e-03
17 9.9e+00 4.63700e-06 2.030179e-05
As the numbers become large the estimate is not very good, so we would have to go beyond 170 terms - but R cannot do that, since factorial(171) overflows to Inf in double precision. Maybe you can try another platform, e.g. Python.
I think I may have solved this myself (though borrowing heavily from Onyambo's answer!) Here's my code:
well_func2 <- function(u, l = 100) {
  result <- numeric(length(u))
  a <- 2:l
  for (i in seq_along(u)) {
    result[i] <- -0.5772 - log(u[i]) + u[i] + sum(u[i]^a * rep(c(-1, 1), length = l - 1) / a / factorial(a))
  }
  return(result)
}
As far as I can tell so far, this matches the tabulated results well for u < 5 (as did Onyambo's code), and it also gives the same result for vector and single-value inputs.
Still needs a bit more testing, and there's probably a tidier way to code it using map() or similar instead of the for loop, but I'm happy enough for now. Thought I'd share in case anyone else has the same problem.

Function in R does not return vector

I am sure this is a really dumb question with a simple answer, but I have been banging my head against the desk for an hour now. The goal is to write a simple function that returns a vector of length n consisting of integers spaced as evenly as possible from 1 to k. So:
place_in_groups <- function(n, k) {
  rate <- (n - 1) / (k - 1)
  vect <- round(seq(from = 1, to = n, by = rate), 0)
  return(vect)
}
When I run the lines from inside the function on their own, outside the function, they do what I want: they create a vector with the appropriate values. But when I run them as a function, I get the actual values, not the vector:
place_in_groups(4,5)
[1] 1 2 2 3 4
As I said, I'm sure it is something obvious I'm doing wrong, but it is also something I'm obviously in need of learning.
I'm not sure I understand the question correctly. Try str() on your results. What you are getting is a vector.
vect <- place_in_groups(4,5)
str(vect)
num [1:5] 1 2 2 3 4
What do you want to do with the vector, or what is your challenge?
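If the confusion is just that the values appear at the console, that is only R auto-printing an unassigned result; assigning the return value shows it behaves like any other vector, for example:
vect <- place_in_groups(4, 5)  # assign instead of letting it auto-print
is.vector(vect)  # TRUE
length(vect)     # 5
vect[2]          # 2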

i not showing up as number in loop

so I have a loop that finds the position in the matrix where there is the largest difference in consecutive elements. For example, if thematrix[8] and thematrix[9] have the largest difference between any two consecutive elements, the number given should be 8.
I made the loop in a way that it will ignore comparisons where one of the elements is NaN (because I have some of those in my data). The loop I made looks like this.
thenumber = 0 #will store the difference
for (i in 1:nrow(thematrix) - 1) {
  if (!is.na(thematrix[i]) & !is.na(thematrix[i + 1])) {
    if (abs(thematrix[i] - thematrix[i + 1]) > thenumber) {
      thenumber = i
    }
  }
}
This looks like it should work, but whenever I run it I get:
Error in if (!is.na(thematrix[i]) & !is.na(thematrix[i + 1])) { :
argument is of length zero
I tried the same thing with a literal number in the brackets instead of i and it works. For some reason it only fails when I use the i specified at the beginning of the for loop. It doesn't recognize that i represents a number. Why doesn't R recognize i?
Also, if there's a better way to do this task I'd appreciate it greatly if you could explain it to me
You are pretty close, but when you write i in 1:nrow(thematrix) - 1, R evaluates it as (1:nrow(thematrix)) - 1, so the loop starts at i = 0, which is what causes this issue. I would suggest writing either i in 1:nrow(thematrix) or i in 2:nrow(thematrix) - 1 so that your loop starts at i = 1. I think your approach is generally pretty intuitive, but one suggestion would be to use the print() function frequently to see how i changes over the course of your loop.
The issue is that the : operator has higher precedence than -; you just need to use parentheses around (nrow(thematrix)-1). For example,
thematrix <- matrix(1:10, nrow = 5)
##
wrong <- 1:nrow(thematrix) - 1
right <- 1:(nrow(thematrix) - 1)
##
R> wrong
#[1] 0 1 2 3 4
R> right
#[1] 1 2 3 4
The error message comes from trying to access the zeroth element of thematrix:
R> thematrix[0]
integer(0)
The other two answers address your question directly, but I must say this is about the worst possible way to solve this problem in R.
set.seed(1) # for reproducible example
x <- sample(1:10,10) # numbers 1:10 in random order
x
# [1] 3 4 5 7 2 8 9 6 10 1
which.max(abs(diff(x)))
# [1] 9
The diff(...) function calculates sequential differences, and which.max(...) identifies the element number of the maximum value in a vector.
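Regarding the NaN handling the question asked for: any difference that touches a missing value becomes NA, and which.max() discards NAs, so the one-liner already skips those comparisons. A small check with made-up data:
y <- c(1, NA, 5, 20, 19)
diff(y)                  # NA NA 15 -1: differences touching the NA are NA
which.max(abs(diff(y)))  # 3: the NA comparisons are simply ignored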

Dynamic variables in base R

How can I create dependent variables in R, please?
For example
a <- 1
b <- a*2
a <- 2
b
# [1] 2
But I expect the result 4. How can R maintain the relationship automatically?
Thank you very much
Explanation - I'm trying to create something like an Excel spreadsheet, with relationships (formulas or functions) between cells. The input for R would be, for example, a csv (some values, some functions or formulas) and the output would be only values.
It sounds like you're looking for makeActiveBinding
a <- 1
makeActiveBinding('b', function() a * 2, .GlobalEnv)
b
# [1] 2
a <- 2
b
# [1] 4
The syntax is simpler if you want to use Hadley's nifty pryr package:
library(pryr)
b %<a-% (a * 2)
Most people don't expect variables to behave like this, however. So if you're writing code that others will be reading, I don't recommend using this feature of R. Explicitly update b when a changes or make b a function of a.
Warning: This isn't a good idea and task callbacks really should only be used if you know what you're doing.
You can do something like this but it's tedious and there are better ways to achieve your goal. You can make a function that will be called after every top level evaluation that basically does the reassignment for you.
modified <- function(expr, value, ok, visible) {
  if (exists("a")) {
    assign("b", a * 2, envir = .GlobalEnv)
  }
  return(TRUE)
}
addTaskCallback(modified)
After running that you should be able to get this...
> a
Error: object 'a' not found
> b
Error: object 'b' not found
> a <- 2
> a
[1] 2
> b
[1] 4
> a <- 3
> a
[1] 3
> b
[1] 6
Note that if you want to emulate a spreadsheet it would probably just be better to define a function to take your input and do all the necessary calculations to get your desired output. R isn't Excel and it would be best if you don't treat it like Excel.
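A minimal sketch of that idea (the names recompute and cells are invented for the example): keep the inputs in one place and rebuild every derived value from them on demand.
recompute <- function(a) {
  list(a = a,
       b = a * 2)  # b is always derived from the current a
}
cells <- recompute(2)
cells$b  # 4
cells <- recompute(3)
cells$b  # 6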
R doesn't work like that. Variables only change when assigned new values. This is a good thing, because it means things don't change magically. Suppose 20 lines later you want to know the value of b. When did it change? What does it depend on?
R is not a spreadsheet.
Just to spell it out a bit more.
sales = 100
costs = 90
profit = sales - costs
now profit has the value 10.
sales = 120
Only sales has changed.
profit = sales - costs
That changes profits to 30.
If you have a complex calculation you would normally write a function:
computeProfit = function(sales, costs){return(sales - costs)}
and then do:
profit = computeProfit(sales, costs)
whenever you want to compute the profits from the sales and the costs.
Although what you want to do is not completely possible in R, with a simple modification of b into a function and thanks to lexical scoping, you actually can have a "dependent variable" (sort of).
Define a:
a <- 1
Define b like this:
b <- function() {
  a * 2
}
Then, instead of using b to get the value of b, use b()
b() ##gives 2
a <- 4
b() ##gives 8
