How can I create dependent variables in R?
For example
a <- 1
b <- a*2
a <- 2
b
# [1] 2
But I expected the result to be 4. How can R maintain such relationships automatically?
Thank you very much
Explanation: I'm trying to create something like an Excel spreadsheet, with relationships (formulas or functions) between cells. The input for R is, for example, a CSV (some values, some functions or formulas) and the output is only values.
It sounds like you're looking for makeActiveBinding
a <- 1
makeActiveBinding('b', function() a * 2, .GlobalEnv)
b
# [1] 2
a <- 2
b
# [1] 4
The syntax is simpler if you want to use Hadley's nifty pryr package:
library(pryr)
b %<a-% (a * 2)
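With that binding in place, b tracks a just as before:
a <- 5
b
# [1] 10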
Most people don't expect variables to behave like this, however. So if you're writing code that others will be reading, I don't recommend using this feature of R. Explicitly update b when a changes or make b a function of a.
Warning: This isn't a good idea and task callbacks really should only be used if you know what you're doing.
You can do something like this, but it's tedious and there are better ways to achieve your goal. You can register a function that is called after every top-level evaluation and does the reassignment for you.
modified <- function(expr, value, ok, visible) {
  if (exists("a")) {
    assign("b", a * 2, envir = .GlobalEnv)
  }
  return(TRUE)
}
addTaskCallback(modified)
After running that you should be able to get this...
> a
Error: object 'a' not found
> b
Error: object 'b' not found
> a <- 2
> a
[1] 2
> b
[1] 4
> a <- 3
> a
[1] 3
> b
[1] 6
Note that if you want to emulate a spreadsheet it would probably just be better to define a function to take your input and do all the necessary calculations to get your desired output. R isn't Excel and it would be best if you don't treat it like Excel.
R doesn't work like that. Variables only change when assigned new values. This is a good thing, because it means things don't change magically. Suppose 20 lines later you want to know the value of b: when did it change? What does it depend on?
R is not a spreadsheet.
Just to spell it out a bit more.
sales = 100
costs = 90
profit = sales - costs
Now profit has the value 10.
sales = 120
Only sales has changed.
profit = sales - costs
That changes profit to 30.
If you have a complex calculation you would normally write a function:
computeProfit = function(sales, costs){return(sales - costs)}
and then do:
profit = computeProfit(sales, costs)
whenever you want to compute the profits from the sales and the costs.
Although what you want is not directly possible in R, by turning b into a function and relying on lexical scoping you can actually have a "dependent variable" (sort of).
Define a:
a <- 1
Define b like this:
b <- function() {
  a * 2
}
Then, instead of using b to get the value of b, use b():
b() ##gives 2
a <- 4
b() ##gives 8
I'm trying to write code in R to approximate the infinite Taylor series for the well function W(u) from the Theis hydrogeological equation:
W(u) = -0.5772 - ln(u) + u - u^2/(2*2!) + u^3/(3*3!) - u^4/(4*4!) + ...
I'm pretty new to functional programming, so this was a challenge! This is my attempt:
Wu <- function(u, repeats = 100) {
  result <- numeric(repeats)
  for (i in seq_along(result)) {
    result[i] <- -((-u)^i) / (i * factorial(i))
  }
  return(sum(result) - log(u) - 0.5772)
}
I've compared the results with values from a data table available here: https://pubs.usgs.gov/wsp/wsp1536-E/pdf/wsp_1536-E_b.pdf - see below (excuse the verbose code - with hindsight, I should have made a CSV):
Wu_QC <- data.frame(u = c(1.0*10^-15, 4.1*10^-14, 9.9*10^-13, 7.0*10^-12, 3.7*10^-11,
                          2.3*10^-10, 6.8*10^-9, 5.7*10^-8, 8.4*10^-7, 6.3*10^-6,
                          3.1*10^-5, 7.4*10^-4, 5.1*10^-3, 2.9*10^-2, 8.7*10^-1,
                          4.6, 9.90),
                    Wu_table = c(33.9616, 30.2480, 27.0639, 25.1079, 23.4429,
                                 21.6157, 18.2291, 16.1030, 13.4126, 11.3978,
                                 9.8043, 6.6324, 4.7064, 2.9920, 0.2742,
                                 0.001841, 0.000004637))
Wu_QC$rep_100 <- Wu(Wu_QC$u,100)
The good news is the formula gives identical results for repeats = 50, 100, 150 and 170 (so I've just given you the 100 version above). The bad news is that, while the function performs well for u < ~10^-3, it goes off the rails and gives negative outputs for numbers within an order of magnitude or so of 1. This doesn't happen when I just call the function on an individual number, for example:
> Wu(4.6)
[1] 0.001856671
That's the correct answer to 2 significant figures.
Can anyone spot what I've done wrong and/or suggest a better way to code this equation? I think the problem is something to do with my for loop and/or an issue with the factorials generating infinite numbers as u gets larger, but I'm not at all certain.
Thanks!
As it says on page 93 of your reference, W is also known as the exponential integral.
Then, e.g., the package expint provides a function to compute W(u):
library(expint)
expint(10^(-8))
# [1] 17.84347
expint(4.6)
# [1] 0.001841006
where the results are exactly as in your referred table.
You can write a function that takes a value together with the number of series terms and outputs the required value:
w <- function(u, l) {
  a <- 2:l
  -0.5772 - log(u) + u + sum(u^a * rep(c(-1, 1), length.out = l - 1) / a / factorial(a))
}
transform(Wu_QC, new = Vectorize(w)(u, 170))
u Wu_table new
1 1.0e-15 3.39616e+01 3.396158e+01
2 4.1e-14 3.02480e+01 3.024800e+01
3 9.9e-13 2.70639e+01 2.706387e+01
4 7.0e-12 2.51079e+01 2.510791e+01
5 3.7e-11 2.34429e+01 2.344290e+01
6 2.3e-10 2.16157e+01 2.161574e+01
7 6.8e-09 1.82291e+01 1.822914e+01
8 5.7e-08 1.61030e+01 1.610301e+01
9 8.4e-07 1.34126e+01 1.341266e+01
10 6.3e-06 1.13978e+01 1.139777e+01
11 3.1e-05 9.80430e+00 9.804354e+00
12 7.4e-04 6.63240e+00 6.632400e+00
13 5.1e-03 4.70640e+00 4.706408e+00
14 2.9e-02 2.99200e+00 2.992051e+00
15 8.7e-01 2.74200e-01 2.741930e-01
16 4.6e+00 1.84100e-03 1.856671e-03
17 9.9e+00 4.63700e-06 2.030179e-05
As u becomes larger the estimate degrades, so we would have to go beyond 170 terms; but factorial(170) is the largest factorial that fits in R's double precision (factorial(171) overflows to Inf), so R cannot go further. Maybe you can try a platform with arbitrary-precision arithmetic, e.g. Python.
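You can see that limit directly in the R console:
factorial(170)
# [1] 7.257416e+306
factorial(171)
# [1] Inf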
I think I may have solved this myself (though borrowing heavily from Onyambo's answer!) Here's my code:
well_func2 <- function(u, l = 100) {
  result <- numeric(length(u))
  a <- 2:l
  for (i in seq_along(u)) {
    result[i] <- -0.5772 - log(u[i]) + u[i] + sum(u[i]^a * rep(c(-1, 1), length.out = l - 1) / a / factorial(a))
  }
  return(result)
}
As far as I can tell so far, this matches the tabulated results well for u < 5 (as did Onyambo's code), and it also gives the same result for vector and single-value inputs.
Still needs a bit more testing, and there's probably a tidier way to code it using map() or similar instead of the for loop (see the sketch below), but I'm happy enough for now. Thought I'd share in case anyone else has the same problem.
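For what it's worth, here is a loop-free sketch using base R's vapply (my addition, assuming the same series truncation). As far as I can tell, the original Wu failed on vector input because result[i] <- assigns a whole vector to a single slot, silently keeping only its first element; vapply sidesteps that by applying the scalar computation to each element of u:
well_func3 <- function(u, l = 100) {
  a <- 2:l
  signs <- rep(c(-1, 1), length.out = l - 1) # alternating signs for n = 2, 3, ...
  vapply(u, function(ui) {
    -0.5772 - log(ui) + ui + sum(ui^a * signs / (a * factorial(a)))
  }, numeric(1))
}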
So I have a loop that finds the position in a matrix where the largest difference between consecutive elements occurs. For example, if thematrix[8] and thematrix[9] have the largest difference between any two consecutive elements, the number given should be 8.
I made the loop in a way that it will ignore comparisons where one of the elements is NaN (because I have some of those in my data). The loop I made looks like this.
thenumber = 0 #will store the difference
for (i in 1:nrow(thematrix) - 1) {
  if (!is.na(thematrix[i]) & !is.na(thematrix[i + 1])) {
    if (abs(thematrix[i] - thematrix[i + 1]) > thenumber) {
      thenumber = i
    }
  }
}
This looks like it should work, but whenever I run it I get
Error in if (!is.na(thematrix[i]) & !is.na(thematrix[i + 1])) { :
argument is of length zero
I tried the same thing but with a literal number in the brackets instead of i, and it works. For some reason it only fails when I use the i specified at the beginning of the for loop. It doesn't recognize that i represents a number. Why doesn't R recognize i?
Also, if there's a better way to do this task, I'd greatly appreciate it if you could explain it to me.
You are pretty close, but in i in 1:nrow(thematrix) - 1, R evaluates 1:nrow(thematrix) first and then subtracts 1, so the sequence starts at i = 0, which is what causes this issue. I would suggest either i in 1:nrow(thematrix) or i in 2:nrow(thematrix) - 1 to start your loop at i = 1. I think your approach is generally pretty intuitive, but one suggestion would be to use print() frequently to see how i changes over the course of your loop.
The issue is that the : operator has higher precedence than -; you just need to use parentheses around (nrow(thematrix)-1). For example,
thematrix <- matrix(1:10, nrow = 5)
##
wrong <- 1:nrow(thematrix) - 1
right <- 1:(nrow(thematrix) - 1)
##
R> wrong
#[1] 0 1 2 3 4
R> right
#[1] 1 2 3 4
The error message comes from trying to access the zeroth element of thematrix:
R> thematrix[0]
integer(0)
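To complete the chain (my addition): is.na() of that zero-length vector is a zero-length logical, and if() on a zero-length condition is exactly the error above:
R> is.na(thematrix[0])
logical(0)
R> if (logical(0)) TRUE
Error in if (logical(0)) TRUE : argument is of length zero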
The other two answers address your question directly, but I must say this is about the worst possible way to solve this problem in R.
set.seed(1) # for reproducible example
x <- sample(1:10,10) # numbers 1:10 in random order
x
# [1] 3 4 5 7 2 8 9 6 10 1
which.max(abs(diff(x)))
# [1] 9
The diff(...) function calculates sequential differences, and which.max(...) identifies the element number of the maximum value in a vector.
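A quick check (my addition) that this also satisfies the NA requirement from the question - differences touching an NA become NA, and which.max() skips NAs:
x_na <- c(3, NA, 5, 7, 2, 8)
abs(diff(x_na))
# [1] NA NA  2  5  6
which.max(abs(diff(x_na)))
# [1] 5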
Is there a function in JAGS similar to R's rep? I want to create an array using code similar to the following:
n ~ dmulti(pi, N) # pi is a 3 dimensional probability vector, N is fixed
# the dimension of n is hard coded in this line:
a <- c(rep(0, n[1]), rep(1, n[2]), rep(2, n[3]))
I read through the manual and wasn't able to find a way to achieve this. I understand that Stan would probably allow this, but I can't use Stan because I need to do inference on discrete parameters. I really appreciate your help!
This question is also posted on the JAGS help forum.
I have added a rep function to the development version (the future JAGS 4.0.0). As Matt and John have alluded to, this requires the second argument to be fixed so that the length of the resulting vector can be determined at compile time.
The short answer is no, I'm afraid not. One of the stipulations of the JAGS/BUGS language is that variables must have fixed dimensions (with every element defined exactly once) - in your example a will change dimension size depending on the vector n. There may be other ways to get the result you are looking for, but not using this approach.
Incidentally, you use n twice in that bit of code (on the LHS and RHS of the multinomial distribution), which is not allowed - although that may just be a typo :)
Matt
You could populate your vector with some loops:
library(R2jags)
M <- function() {
  for (i in 1:n[1]) {
    a[i] <- 0
  }
  for (i in 1:n[2]) {
    a[i + n[1]] <- 1
  }
  for (i in 1:n[3]) {
    a[i + sum(n[1:2])] <- 2
  }
}
j <- jags(list(n=3:5), NULL, 'a', M, DIC=FALSE)
j$BUGSoutput$mean$a
## [1] 0 0 0 1 1 1 1 2 2 2 2 2
However, as @MattDenwood alluded to, if the sum of the elements of n is variable this will throw an error - a must be of constant length throughout the simulation.
In Matlab, there is a 1-D filter function: http://www.mathworks.com/help/matlab/ref/filter.html
In R's signal package, the description of its filter function states: Generic filtering function. The default is to filter with an ARMA filter of given coefficients. The default filtering operation follows Matlab/Octave conventions.
However, the answers don't match if I give the same specification.
In MATLAB (correct answer):
x=[4 3 5 2 7 3]
filter(2/3,[1 -1/3],x,x(1)*1/3)
ans =
4.0000 3.3333 4.4444 2.8148 5.6049 3.8683
In R, if I follow Matlab/Octave's convention (incorrect answer):
library(signal)
x<-c(4,3,5,2,7,3)
filter(2/3,c(1,-1/3),x,x[1]*1/3)
Time Series:
Start = 1
End = 6
Frequency = 1
[1] 3.111111 3.037037 4.345679 2.781893 5.593964 3.864655
I tried a lot of other examples too. R's signal package's filter function doesn't appear to follow the Matlab/Octave conventions even though the documentation states that it does. Perhaps I'm using the filter function incorrectly in R. Can someone help me?
I believe the answer is in the documentation (shock!!!!)
matlab:
The filter is a "Direct Form II Transposed"
implementation of the standard difference equation:
a(1)*y(n) = b(1)*x(n) + b(2)*x(n-1) + ... + b(nb+1)*x(n-nb)
- a(2)*y(n-1) - ... - a(na+1)*y(n-na)
If a(1) is not equal to 1, filter normalizes the filter coefficients by a(1).
[emphasis mine]
R:
a[1]*y[n] + a[2]*y[n-1] + … + a[n]*y[1] = b[1]*x[n] + b[2]*x[n-1] + … + b[m]*x[n-m+1]
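To see concretely what Matlab's fourth argument (the initial delay state zi) does, here is a hand-rolled sketch of the first-order Direct Form II transposed recursion (my addition, not from the answer); it reproduces the Matlab output above exactly:
b <- 2/3
a2 <- -1/3
x <- c(4, 3, 5, 2, 7, 3)
y <- numeric(length(x))
w <- x[1] * 1/3 # Matlab's 4th argument: initial state of the delay register
for (n in seq_along(x)) {
  y[n] <- b * x[n] + w # output: feed-forward term plus delay state
  w <- -a2 * y[n]      # update delay state (only the feedback term here)
}
y
# [1] 4.000000 3.333333 4.444444 2.814815 5.604938 3.868313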
Thanks for raising this issue a couple of years back... I bumped into it as well and think I have an answer. Essentially, I think R and Matlab handle the initial conditions differently.
If no initial value is provided (the default is zero in both R and Matlab), the results agree.
R
library(signal)
x<-c(4,3,5,2,7,3)
filter(2/3,cbind(1,-1/3),x, 0.00)
2.666667 2.888889 4.296296 2.765432 5.588477 3.862826
Matlab
x=[4 3 5 2 7 3]
filter(2/3,[1 -1/3],x,0.00)
2.6667 2.8889 4.2963 2.7654 5.5885 3.8628
Now, if we supply a nonzero initial value, the results diverge.
R
library(signal)
x<-c(4,3,5,2,7,3)
filter(2/3,cbind(1,-1/3),x, 0.05)
2.683333 2.894444 4.298148 2.766049 5.588683 3.862894
Matlab
x=[4 3 5 2 7 3]
filter(2/3,[1 -1/3],x,0.05)
2.7167 2.9056 4.3019 2.7673 5.5891 3.8630
Hope it helps!
I'm working on a dataset that consists of ~10^6 values clustered into a variable number of bins. In the course of my analysis, I am trying to randomize my clustering while keeping the bin sizes constant. As a toy example (in pseudocode), this would look something like this:
data <- list(c(1,5,6,3), c(2,4,7,8), c(9), c(10,11,15), c(12,13,14));
sizes <- lapply(data, length);
for (rand in 1:no.of.randomizations) {
  rand.data <- partition.sample(seq(1, 15), partitions = sizes, replace = F)
}
So, I am looking for a function like "partition.sample" that will take a vector (like seq(1,15)) and randomly sample from it, returning a list with the data partitioned into the right bin sizes given already by "sizes".
I've been trying to write such a function myself, since the task doesn't seem that hard. However, partitioning a vector into given bin sizes looks like it would be a lot faster and more efficient if done "under the hood", meaning probably not in native R. So I wonder whether I have just missed the name of the appropriate function, or whether someone could point me to a smart solution that is already around :-)
Your help & time are very much appreciated! :-)
Best,
Lymond
UPDATE:
By "no.of.randomizations" I mean the actual number of times I run through the whole "randomization loop". This will, later on, obviously include more steps than just the actual sampling.
Moreover, I would also be interested in a trick to do the above for sampling without replacement.
Thanks in advance, your help is very much appreciated!
Revised: This should be fairly efficient. Its complexity should be primarily in the permutation step:
# A single step:
x <- sample( unlist(data))
list(one = x[1:4], two = x[5:8], three = x[9], four = x[10:12], five = x[13:15])
As mentioned above, "no.of.randomizations" may be the number of repeated applications of this process, in which case you may want to wrap replicate around it:
replic <- replicate(n = 4, {
  x <- sample(unlist(data))
  list(x[1:4], x[5:8], x[9], x[10:12], x[13:15])
})
After some more thinking and googling, I have come up with a feasible solution. However, I am still not convinced that this is the fastest and most efficient way to go.
In principle, I can generate one long vector holding a random permutation of "data" and then split it into a list of vectors of lengths "sizes" via a factor argument supplied to split. For this, I need an additional ID scheme for my different groups of "data", which I happen to have in my case.
It becomes clearer when viewed as code:
data <- list(c(1,5,6,3), c(2,4,7,8), c(9), c(10,11,15), c(12,13,14));
sizes <- lapply(data, length);
So far, everything is as above.
names <- c("set1", "set2", "set3", "set4", "set5");
In my case, I am lucky enough to have "names" already provided from the data. Otherwise, I would have to obtain them as (e.g.)
names <- seq(1, length(data));
This "names" vector can then be expanded by "sizes" using rep:
cut.by <- rep(names, times = sizes);
[1] 1 1 1 1 2 2 2 2 3 4 4 4 5
[14] 5 5
This new vector "cut.by" can then be provided as the argument to split():
rand.data <- split(sample(1:15, 15), cut.by)
$`1`
[1] 8 9 14 4
$`2`
[1] 10 2 15 13
$`3`
[1] 12
$`4`
[1] 11 3 5
$`5`
[1] 7 6 1
This does the job I was looking for alright. It samples from the background "1:15" and splits the result into vectors of lengths "sizes" through the vector "cut.by".
However, I am still not happy about having to go via an additional (possibly long) vector to indicate the split positions, such as "cut.by" in the code above. This definitely works, but for very long data vectors it could become quite slow, I guess.
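One way to avoid materializing that long vector (a base-R sketch of my own) is to split by cumulative offsets instead:
sizes.v <- lengths(data)     # integer vector of bin sizes
ends <- cumsum(sizes.v)      # last index of each bin within the permutation
starts <- ends - sizes.v + 1 # first index of each bin
perm <- sample(unlist(data)) # one random permutation of all values
rand.data <- Map(function(s, e) perm[s:e], starts, ends)
The starts/ends vectors have only one element per bin, so this stays cheap even for very long data vectors.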
Thank you anyway for the answers and pointers provided! Your help is very much appreciated :-)