Why would an R function not write to the environment? - r

I'm trying to write a relatively simple AR(1) representation in R. I cannot find any glaring issues with this code, and furthermore I am returning not errors, it simple isn't writing to the environment, or recognizing areone2 as a function. Any suggestions would be much appreciated.
areone2<-function(y,N,p,d){
yvec<-c(rep(y, times = N))
for(i in 1:N){
yvec[i+1]<-
((1+p*(yvec[i]-d))
+ d)
}
plot(yvec, type='l', xlab="N", ylab="yeild")
}
areone2(.3,10,.9,.2)

It doesn't trigger an error or warning because you broke the line in the middle of a binary operation, but that binary operation wasn't recognized by the parser. It's perfectly legal to begin a line with + 3 It's just 3, which is not what you want.
For example, 2 + 3 is 5, we would expect. But +3 on a new line does not add it to the previous line
> 2 ## break the line here and R returns 2
[1] 2
> +3 ## adding three next is not recognized as a continuation of a call
[1] 3
But you can still break the line if you wrap the call in parentheses (not brackets)
(2
+ 3)
# [1] 5 ## correct
{2
+ 3}
# [1] 3 ## incorrect
Bringing your yvec[]<- assignment call up onto one line is the cleaner and safer way to go.
yvec[i+1] <- ((1+p*(yvec[i]-d)) + d)

Related

modelling an infinite series in R

I'm trying to write a code to approximate the following infinite Taylor series from the Theis hydrogeological equation in R.
I'm pretty new to functional programming, so this was a challenge! This is my attempt:
Wu <- function(u, repeats = 100) {
result <- numeric(repeats)
for (i in seq_along(result)){
result[i] <- -((-u)^i)/(i * factorial(i))
}
return(sum(result) - log(u)-0.5772)
}
I've compared the results with values from a data table available here: https://pubs.usgs.gov/wsp/wsp1536-E/pdf/wsp_1536-E_b.pdf - see below (excuse verbose code - should have made a csv, with hindsight):
Wu_QC <- data.frame(u = c(1.0*10^-15, 4.1*10^-14,9.9*10^-13, 7.0*10^-12, 3.7*10^-11,
2.3*10^-10, 6.8*10^-9, 5.7*10^-8, 8.4*10^-7, 6.3*10^-6,
3.1*10^-5, 7.4*10^-4, 5.1*10^-3, 2.9*10^-2,8.7*10^-1,
4.6,9.90),
Wu_table = c(33.9616, 30.2480, 27.0639, 25.1079, 23.4429,
21.6157, 18.2291, 16.1030, 13.4126, 11.3978,
9.8043,6.6324, 4.7064,2.9920,0.2742,
0.001841,0.000004637))
Wu_QC$rep_100 <- Wu(Wu_QC$u,100)
The good news is the formula gives identical results for repeats = 50, 100, 150 and 170 (so I've just given you the 100 version above). The bad news is that, while the function performs well for u < ~10^-3, it goes off the rails and gives negative outputs for numbers within an order of magnitude or so of 1. This doesn't happen when I just call the function on an individual number. i.e:
> Wu(4.6)
[1] 0.001856671
Which is the correct answer to 2sf.
Can anyone spot what I've done wrong and/or suggest a better way to code this equation? I think the problem is something to do with my for loop and/or an issue with the factorials generating infinite numbers as u gets larger, but I'm not at all certain.
Thanks!
As it says on page 93 of your reference, W is also known as the exponential integral. See also here.
Then, e.g., the package expint provides a function to compute W(u):
library(expint)
expint(10^(-8))
# [1] 17.84347
expint(4.6)
# [1] 0.001841006
where the results are exactly as in your referred table.
You can write a function that takes in a value together with the repetition times and outputs the required value:
w=function(u,l){
a=2:l
-0.5772-log(u)+u+sum(u^(a)*rep(c(-1,1),length=l-1)/(a)/factorial(a))
}
transform(Wu_QC,new=Vectorize(w)(u,170))
u Wu_table new
1 1.0e-15 3.39616e+01 3.396158e+01
2 4.1e-14 3.02480e+01 3.024800e+01
3 9.9e-13 2.70639e+01 2.706387e+01
4 7.0e-12 2.51079e+01 2.510791e+01
5 3.7e-11 2.34429e+01 2.344290e+01
6 2.3e-10 2.16157e+01 2.161574e+01
7 6.8e-09 1.82291e+01 1.822914e+01
8 5.7e-08 1.61030e+01 1.610301e+01
9 8.4e-07 1.34126e+01 1.341266e+01
10 6.3e-06 1.13978e+01 1.139777e+01
11 3.1e-05 9.80430e+00 9.804354e+00
12 7.4e-04 6.63240e+00 6.632400e+00
13 5.1e-03 4.70640e+00 4.706408e+00
14 2.9e-02 2.99200e+00 2.992051e+00
15 8.7e-01 2.74200e-01 2.741930e-01
16 4.6e+00 1.84100e-03 1.856671e-03
17 9.9e+00 4.63700e-06 2.030179e-05
As the numbers become large the estimation is not quite good, so we should have to go further than 170! but R cannot do that. Maybe you can try other platforms. ie Python
I think I may have solved this myself (though borrowing heavily from Onyambo's answer!) Here's my code:
well_func2 <- function (u, l = 100) {
result <- numeric(length(u))
a <- 2:l
for(i in seq_along(u)){
result[i] <- -0.5772-log(u[i])+u[i]+sum(u[i]^(a)*rep(c(-1,1),length=l-1)/(a)/factorial(a))
}
return(result)
}
As far as I can tell so far, this matches the tabulated results well for u <5 (as did Onyambo's code), and it also gives the same result for vector vs single-value inputs.
Still needs a bit more testing, and there's probably a tidier way to code it using map() or similar instead of the for loop, but I'm happy enough for now. Thought I'd share in case anyone else has the same problem.

Function in R does not return vector

I am sure this is a really dumb question with a simple answer, but I have been banging my head against the desk for an hour now. The goal is to write a simple function that returns a vector of n length consisting of integers spaces as evenly as possible, from 1 to k. So:
place_in_groups <- function(n, k){
rate = (n - 1) / (k - 1)
vect <- round(seq(from = 1, to = n, by = rate), 0)
return(vect)
}
When I run the lines inside the function on the outside of the function, it does what I want it to do: creates a vector with the appropriate values. But when I run it inside the vector, I get the actual values, not the vector:
place_in_groups(4,5)
[1] 1 2 2 3 4
As I said, I'm sure it is something obvious I'm doing wrong, but it is also something I'm obviously in need of learning.
I'm not shure I undestand the question correctly. Try str() on your results. What you are getting is a vector.
vect <- place_in_groups(4,5)
str(vect)
num [1:5] 1 2 2 3 4
What do you want to do with the vector, or what is your challenge?

i not showing up as number in loop

so I have a loop that finds the position in the matrix where there is the largest difference in consecutive elements. For example, if thematrix[8] and thematrix[9] have the largest difference between any two consecutive elements, the number given should be 8.
I made the loop in a way that it will ignore comparisons where one of the elements is NaN (because I have some of those in my data). The loop I made looks like this.
thenumber = 0 #will store the difference
for (i in 1:nrow(thematrix) - 1) {
if (!is.na(thematrix[i]) & !is.na(thematrix[i + 1])) {
if (abs(thematrix[i] - thematrix[i + 1]) > thenumber) {
thenumber = i
}
}
}
This looks like it should work but whenever I run it
Error in if (!is.na(thematrix[i]) & !is.na(thematrix[i + 1])) { :
argument is of length zero
I tried this thing but with a random number in the brackets instead of i and it works. For some reason it only doesn't work when I use the i specified in the beginning of the for-loop. It doesn't recognize that i represents a number. Why doesn't R recognize i?
Also, if there's a better way to do this task I'd appreciate it greatly if you could explain it to me
You are pretty close but when you call i in 1:nrow(thematrix) - 1 R evaluates this to make i = 0 which is what causes this issue. I would suggest either calling i in 1:nrow(thematrix) or i in 2:nrow(thematrix) - 1 to start your loop at i = 1. I think your approach is generally pretty intuitive but one suggestion would be to frequently use the print() function to evaluate how i changes over the course of your function.
The issue is that the : operator has higher precedence than -; you just need to use parentheses around (nrow(thematrix)-1). For example,
thematrix <- matrix(1:10, nrow = 5)
##
wrong <- 1:nrow(thematrix) - 1
right <- 1:(nrow(thematrix) - 1)
##
R> wrong
#[1] 0 1 2 3 4
R> right
#[1] 1 2 3 4
Where the error message is coming from trying to access the zero-th element of thematrix:
R> thematrix[0]
integer(0)
The other two answers address your question directly, but I must say this is about the worst possible way to solve this problem in R.
set.seed(1) # for reproducible example
x <- sample(1:10,10) # numbers 1:10 in random order
x
# [1] 3 4 5 7 2 8 9 6 10 1
which.max(abs(diff(x)))
# [1] 9
The diff(...) function calculates sequential differences, and which.max(...) identifies the element number of the maximum value in a vector.

R in simple terms - why do I have to feel like such an idiot? [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 7 years ago.
Improve this question
my question is simple... every reference I find in books and on the internet for learning R programming is presented in a very linear way with no context. When I try and learn things like functions, I see the code and my brain just freezes because it's looking for something to relate these R terms to and I have no frame of reference. I have a PhD and did a lot of statistics for my dissertation but that was years ago when we were using different programming languages and when it comes to R, I don't know why I can't get this into my head. Is there someone who can explain in plain english an example of this simple code? So for example:
above <- function(x, n){
use <- x > n
x[use]
}
x <- 1:20
above(x, 12)
## [1] 13 14 15 16 17 18 19 20
I'm trying to understand what's going on in this code but simply don't. As a result, I could never just write this code on my own because I don't have the language in my head that explains what is happening with this. I get stuck at the first line:
above <- function(x, n) {
Can someone just explain this code sample in plain English so I have some kind of context for understanding what I'm looking at and why I'm doing what I'm doing in this code? And what I mean by plain English is, walking through the code, step by step and not just repeating the official terms from R like vector and function and array and all these other things, but telling me, in a common sense way, what this means.
Since your background ( phd in statsitics) the best way to understand this
is in mathematics words.
Mathematically speaking , you are defining a parametric function named above that extracts all element from a vector x above a certain value n. You are just filtering the set or the vector x.
In sets notation you can write something like :
above:{x,n} --> {y in x ; y>n}
Now, Going through the code and paraphrasing it (in the left the Math side , in the right its equivalent in R):
Math R
---------------- ---------------------
above: (x,n) <---> above <- function(x, n)
{y in x ; y>n} <---> x[x > n]
So to wrap all the statments together within a function you should respect a syntax :
function_name <- function(arg1,arg2) { statements}
Applying the above to this example (we have one statement here) :
above <- function(x,n) { x[x>n]}
Finally calling this function is exactly the same thing as calling a mathematical function.
above(x,2)
ok I will try, if this is too detailed let me know, but I tried to go really slowly:
above <- function(x, n)
this defines a function, which is just some procedure which produces some output given some input, the <- means assign what is on the right hand side to what is on the left hand side, or in other words put everything on the right into the object on the left, so for example container <- 1 puts 1 into the container, in this case we put a function inside the object above,
function(x, n) everything in the paranthesis specifys what inputs the function takes, so this one takes two variables x and n,
now we come to the body of the function which defines what it does with the inputs x and n, the body of the function is everything inside the curley braces:
{
use <- x > n
x[use]
}
so let's explain that piece by piece:
use <- x > n
this part again puts whats on the right side into the object on the left, and what is happening on the right hand side? a comparison returning TRUE if x is bigger than n and FALSE if x is equal to or smaller then n, so if x is 5 and n is 3 the result will be TRUE, and this value will get stored inside use, so use contains TRUE now, now if we have more than one value inside x than every value inside x will get compared to n, so for example if x = [1, 2, 3] and n = 2
than we have
1 > 2 FALSE
2 > 2 FALSE
3 > 2 TRUE
, so use will contain FALSE, FALSE, TRUE
x[use]
now we are taking a part of x, the square brackets specify which parts of x we want, so in my example case x has 3 elements and use has 3 elements if we combine them we have:
x use
1 FALSE
2 FALSE
3 TRUE
so now we say I dont want 1,2 but i want 3 and the result is 3
so now we have defined the function, now we call it, or in normal words we use it:
x <- 1:20
above(x, 12)
first we assign the numbers 1 through 20 to x, and then we tell the function above to execute (do everything inside its curley braces with the inputs x = 1:20 and n = 12, so in other words we do the following:
above(x, 12)
execute the function above with the inputs x = 1:20 and n = 12
use <- 1:20 > 12
compare 12 to every number from 1:20 and return for each comparison TRUE if the number is in fact bigger than 12 and FALSE if otherwise, than store all the results inside use
x[use]
now give me the corresponding elements of x for which the vector use contains TRUE
so:
x use
1 FALSE
2 FALSE
3 FALSE
4 FALSE
5 FALSE
6 FALSE
7 FALSE
8 FALSE
9 FALSE
10 FALSE
11 FALSE
12 FALSE
13 TRUE
14 TRUE
15 TRUE
16 TRUE
17 TRUE
18 TRUE
19 TRUE
20 TRUE
so we get the numbers 13:20 back as a result
I'll give it a crack too. A few basic points that should get you going in the right direction.
1) The idea of a function. Basically, a function is reusable code. Say I know that in my analysis for some bizarre reason I will often want to add two numbers, multiply them by a third, and divide them by a fourth. (Just suspend disbelief here.) So one way I could do that would just be to write the operation over and over, as follows:
(75 + 93)*4/18
(847 + 3)*3.1415/2.7182
(999 + 380302)*-6901834529/2.5
But that's tedious and error-prone. (What happens if I forget a parenthesis?) Alternatively, I can just define a function that takes whatever numbers I feed into it and carries out the operation. In R:
stupidMath <- function(a, b, c, d){
result <- (a + b)*c/d
}
That code says "I'd like to store this series of commands and attach them to the name "stupidMath." That's called defining a function, and when you define a function, the series of commands is just stored in memory---it doesn't actually do anything until you "call" it. "Calling" it is just ordering it to run, and when you do so, you give it "arguments" ---the stuff in the parentheses in the first line are the arguments it expects, i.e., in my example, it wants four distinct pieces of data, which will be called 'a', 'b', 'c', and 'd'.
Then it'll do the things it's supposed to do with whatever you give it. "The things it's supposed to do" is the stuff in the curly brackets {} --- that's the "body" of the function, which describes what to do with the arguments you give it. So now, whenever you want to carry that mathematical operation you can just "call" the function. To do the first computation, for example, you'd just write stupidMath(75, 93, 4, 18) Then the function gets executed, treating 75 as 'a', 83 as 'b', and so forth.
In your example, the function is named "above" and it takes two arguments, denoted 'x' and 'n'.
2) The "assignment operator": R is unique among major programming languages in using <- -- that's equivalent to = in most other languages, i.e., it says "the name on the left has the value on the right." Conceptually, it's just like how a variable in algebra works.
3) so the "body" of the function (the stuff in the curly brackets) first assigns the name "use" to the expression x > n. What's going on there. Well, an expression is something that the computer evaluates to get data. So remember that when you call the function, you give it values for x and n. The first thing this function does is figures out whether x is greater than n or less than n. If it's greater than n, it evaluates the expression x > n as TRUE. Otherwise, FALSE.
So if you were to define the function in your example and then call it with above(10, 5), then the first line of the body would set the local variable (don't worry right now about what a 'local' variable is) 'use' to be 'TRUE'. This is a boolean value.
Then the next line of the function is a "filter." Filtering is a long topic in R, but basically, R things of everything as a "vector," that is, a bunch of pieces of data in a row. A vector in R can be like a vector in linear algebra, i.e., (1, 2, 3, 4, 5, 99) is a vector, but it can also be of stuff other than numbers. For now let's just focus on numbers.
The wacky thing about R (one of the many wacky things about R) is that it treats a single number (a "scalar" in linear algebra terms) just as a vector with only one item in it.
Ok, so why did I just go into that? Because in lots of places in R, a vector and a scalar are interchangable.
So in your example code, instead of giving a scalar for the first argument, when we call the function we've given 'above' a vector for its first argument. R likes vectors. R really likes vectors. (Just talk to R people for a while. They're all obsessed with doing every goddmamn thing in terms of a vector.) So it's no problem to pass a vector for the first argument. But what that means is that the variable 'use' is going to be a vector too. Specifically, 'use' is going to be a vector of booleans, i.e., of TRUE or FALSE for each individual value of X.
To take a simpler version: suppose you said:
mynums <- c(5, 10)
myresult <- above(mynums, 7)
when the code runs, the first thing it's going to do is define that 'use' variable. But x is a vector now, not a scalar (the c(5,10) code said "make a vector with two elements, and fill them with the numbers '5' and '10'), so R's going to go ahead and carry out the comparison for each element of x. Since 5 is less than 7 and 10 is greater than 7, use becomes the two item-vector of boolean values (FALSE, TRUE)
Ok, now we can talk about filtering. So a vector of boolean values is called a 'logical vector.' And the code x[use] says "filter x by the stuff in the variable use." When you tell R to filter something by a logical vector, it spits back out the elements of the thing being filtered which correspond to the values of 'TRUE'
So in the example just given:
mynums <- c(5, 10)
myresult <- above(mynums, 7)
the value of myresult will just be 10. Why? Because the function filtered 'x' by the logical vector 'use,' 'x' was (5, 10), and 'use' was (FALSE, TRUE); since the second element of the logical was the only true, you only got the second element of x.
And that gets assigned to the variable myresult because myresult <- above(mynums, 7) means "assign the name myresult to the value of above(mynums, 7)"
voila.

R's signal package's filter not matching with Matlab's filter function

In Matlab, there is a 1-D filter function http://www.mathworks.com/help/matlab/ref/filter.html .
In R's signal package, the description of its filter function states: Generic filtering function. The default is to filter with an ARMA filter of given coefficients. The default filtering operation follows Matlab/Octave conventions.
However, the answers don't match if I give the same specification.
In MATLAB (correct answer):
x=[4 3 5 2 7 3]
filter(2/3,[1 -1/3],x,x(1)*1/3)
ans =
4.0000 3.3333 4.4444 2.8148 5.6049 3.8683
In R, if I follow Matlab/Octave's convention (incorrect answer):
library(signal)
x<-c(4,3,5,2,7,3)
filter(2/3,c(1,-1/3),x,x[1]*1/3)
Time Series:
Start = 1
End = 6
Frequency = 1
[1] 3.111111 3.037037 4.345679 2.781893 5.593964 3.864655
I tried a lot of other examples too. R's signal package's filter function doesn't appear to follow the Matlab/Octave conventions even though the document states it so. Perhaps, I'm using the filter function incorrectly in R. Can someone help me?
I believe the answer is in the documentation (shock!!!!)
matlab:
The filter is a "Direct Form II Transposed"
implementation of the standard difference equation:
a(1)*y(n) = b(1)*x(n) + b(2)*x(n-1) + ... + b(nb+1)*x(n-nb)
- a(2)*y(n-1) - ... - a(na+1)*y(n-na)
If a(1) is not equal to 1, filter normalizes the filter coefficients by a(1).
[emphasis mine]
R:
a[1]*y[n] + a[2]*y[n-1] + … + a[n]*y[1] = b[1]*x[n] + b[2]*x[m-1] + … + b[m]*x[1]
Thanks for lifting this issue a couple of years back... I bumped into it as well and think I got an answer. Essentially I think the optimization algos are different for R and Matlab.
If no guess is provided (that is, set the initial values to default which is zero for both R and Matlab), the results are very similar.
R
library(signal)
x<-c(4,3,5,2,7,3)
filter(2/3,cbind(1,-1/3),x, 0.00)
2.666667 2.888889 4.296296 2.765432 5.588477 3.862826
Matlab
x=[4 3 5 2 7 3]
filter(2/3,[1 -1/3],x,0.00)
2.6667 2.8889 4.2963 2.7654 5.5885 3.8628
Now, if we start tweaking the initial guess of the parameters, then the results will diverge.
R
library(signal)
x<-c(4,3,5,2,7,3)
filter(2/3,cbind(1,-1/3),x, 0.05)
2.683333 2.894444 4.298148 2.766049 5.588683 3.862894
Matlab
x=[4 3 5 2 7 3]
filter(2/3,[1 -1/3],x,0.05)
2.7167 2.9056 4.3019 2.7673 5.5891 3.8630
Hope it helps!

Resources