How to create these functions in R?

I have two mathematical formulae that I need to create as R functions and then run on my data.
First of all, here is the head of my data, which I named "data_1":
sex age seca1 chad1 DL alog1 dig1 scifirst1 crimetech1
1 F 20 1754 1750 175 95 95 432 429
2 F 19 1594 1596 158 56 55 420 417
3 F 20 1556 1558 156 74 72 435 437
4 F 18 1648 1640 167 67 65 431 434
5 F 19 1780 1780 178 99 67 433 431
6 F 19 1610 1620 165 56 54 423 425
Now let's look at the formulae and my attempts to create the functions in R:
1) The first formula (f1):
The formula itself can be accessed through this link:
This is what I did to create the first formula:
f1 <- function(x, y) {sqrt(sum((x - y) ^ 2) / 2 / length(x))}
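The linked image of the formula is not reproduced here, so this is the expression the f1 code above computes, read directly off the code. It has the form of the usual technical error of measurement (TEM) for n pairs of repeated measurements x_i and y_i; that this is what the link shows is an assumption:

```latex
\mathrm{TEM} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - y_i)^2}{2n}}, \qquad n = \texttt{length(x)}
```

Because the numerator sums over all n pairs, the result is a single number for the whole pair of columns.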
Since I need to run f1 on data_1$alog1 vs data_1$dig1, here's what I did:
f1(data_1$alog1, data_1$dig1)
which gives: 4.3
Now the questions: have I created the function correctly, and is it supposed to give me only one value?
2) The second formula (f2):
The second formula substitutes the first function into a new expression. Here is the formula (follow this link):
The "f1" that appears in red in the second formula is the first function created above (from the first formula).
Next, here is what I did to implement the second formula:
f2 <- function(x, y){(f1 / ((x + y) / 2)) * 100}
But when I run it on data_1$alog1 vs data_1$dig1 to calculate the coefficient of variation of the error for these data, I get:
> f2(data_1$alog1, data_1$dig1)
Error in f1/((x + y)/2) : non-numeric argument to binary operator
Could anyone please comment on the steps I took to create the functions and on the way I run them on "alog1 vs dig1"?

If x is a vector, then sum(x) is a scalar, so your f1 returns a single value. If you want cumulative sums instead, use cumsum(x):
f1 <- function(x, y) {sqrt(cumsum((x - y) ^ 2) / 2 / length(x))}
f1(data_1$alog1, data_1$dig1)
which returns
[1] 0.0000000 0.2886751 0.6454972 0.8660254 9.2781104 9.2960565
instead of only the final value (the last element, 9.2960565, is what the sum() version returns on these six rows).
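In the second function definition, you forgot that f1 is a function of (x, y) and has to be called with those arguments; dividing the function object f1 itself is what triggers the "non-numeric argument" error: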
f2 <- function(x, y) {f1(x, y) / ((x + y) / 2) * 100}
f2(data_1$alog1, data_1$dig1)
[1] 0.0000000 0.5201354 0.8842428 1.3121597 11.1784463 16.9019209
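If the goal is one technical error for the whole pair of columns (what the sum() version of f1 returns) and one coefficient of variation, a common choice is to divide by the overall mean of both measurement series. This is a sketch that assumes the linked formula's denominator is that overall mean; if the formula really divides row by row, the vectorized f2 above is the one to use. The six rows are the ones shown in the question.

```r
# Technical error of measurement: one value for the whole paired series
f1 <- function(x, y) {
  stopifnot(length(x) == length(y))  # measurements must come in pairs
  sqrt(sum((x - y)^2) / (2 * length(x)))
}

# Coefficient of variation of the error, in percent.
# Assumption: the denominator is the mean of all measurements in x and y.
f2 <- function(x, y) {
  f1(x, y) / mean(c(x, y)) * 100
}

# The six rows shown in the question
data_1 <- data.frame(
  alog1 = c(95, 56, 74, 67, 99, 56),
  dig1  = c(95, 55, 72, 65, 67, 54)
)

f1(data_1$alog1, data_1$dig1)
f2(data_1$alog1, data_1$dig1)
```

On these six rows f1 gives about 9.296 and f2 about 13.05; the 4.3 in the question comes from the full data set, which isn't shown.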

Related

Error in application of the Rescorla-Wagner model in R

I hope I don't have a big gap in education.
I need to estimate the best-fitting alpha (the learning rate of the model), but I can't get the function right.
I have data that looks something like this:
ID Turn_no p_mean t_mean
1 1 170 99
1 2 176 93
1 3 138 92
1 4 172 118
1 5 163 96
1 6 170 105
1 7 146 99
1 8 172 94
and so on...
I want to use the equation:
$$p_{\text{turn}} = p_{\text{turn}-1} + \alpha\,\bigl(p_{\text{turn}-1} - t_{\text{turn}-1}\bigr)$$
I'm pretty stuck on making a function and log-likelihood based on the Rescorla-Wagner model.
This is the function so far:
RWmodel = function(data, par) {
ll <- NA
alpha <- par[1]
ID <- data$ID
Turn_no <- data$Turn_no
p_mean<- data$p_mean
t_mean<- data$t_mean
num_reps <- length(data$Turn_no)
for (i in 2:num_reps) {
#calculate prediction error
PE <- p_mean[i-1] - t_mean[i-1]
#update p's value
p_mean[i] <- p_mean[i-1] + alpha*PE
}
#minus maximum log likelihood, use sum and log functions
ll <- -sum(log(??))
#return ll
ll
}
I know I'm missing an important step in the function; I just can't figure out how to compute the log-likelihood correctly in this situation.
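A sketch of one common way to fill in the missing likelihood step, under assumptions that are not in the question: the observed p_mean on each turn is treated as the model's prediction plus Gaussian noise with unknown standard deviation sigma, the prediction starts at the first observed value, and alpha and log(sigma) are estimated together with optim(). The update keeps the sign from the question's equation; the textbook Rescorla-Wagner rule writes the error as t - p, which only flips the sign of alpha. The names rw_nll and rw_nll_all are illustrative.

```r
# Negative log-likelihood for the update rule in the question,
#   p[turn] = p[turn - 1] + alpha * (p[turn - 1] - t[turn - 1]),
# assuming observed p_mean = prediction + Gaussian noise (sd = sigma).
rw_nll <- function(par, data) {
  alpha <- par[1]
  sigma <- exp(par[2])              # fit log(sigma) so sigma stays positive
  p_obs <- data$p_mean
  t_obs <- data$t_mean
  n <- length(p_obs)
  pred <- numeric(n)                # predictions kept separate from the data
  pred[1] <- p_obs[1]               # assumption: start at the first observed value
  for (i in seq_len(n)[-1]) {
    pe <- pred[i - 1] - t_obs[i - 1]      # prediction error, sign as in the question
    pred[i] <- pred[i - 1] + alpha * pe
  }
  # Turn 1 is fixed to the data, so only turns 2..n enter the likelihood
  -sum(dnorm(p_obs[-1], mean = pred[-1], sd = sigma, log = TRUE))
}

# Sum over participants; each ID is sorted by turn and modelled separately
rw_nll_all <- function(par, data) {
  per_id <- split(data, data$ID)
  sum(vapply(per_id, function(d) rw_nll(par, d[order(d$Turn_no), ]), numeric(1)))
}

# The first eight turns shown in the question
turns <- data.frame(
  ID = 1, Turn_no = 1:8,
  p_mean = c(170, 176, 138, 172, 163, 170, 146, 172),
  t_mean = c(99, 93, 92, 118, 96, 105, 99, 94)
)

fit <- optim(par = c(0, log(10)), fn = rw_nll_all, data = turns,
             method = "L-BFGS-B", lower = c(-1, log(0.1)), upper = c(1, log(1000)))
fit$par[1]        # best-fitting alpha
exp(fit$par[2])   # fitted noise sd
```

The bounds on alpha and sigma are arbitrary choices for this sketch; with real data they should reflect what values are plausible.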

R function generating incorrect results

I am trying to get better at writing functions in R, and I was working on a function to pull out every odd value from 100 to 500 that is divisible by 3. I got close with the function below: it returns all of the correct values, but it also includes the first number in the sequence (101), which it should not. Any help would be greatly appreciated. The code I wrote is as follows:
Test=function(n){
if(n>100){
s=seq(from=101,to=n,by=2)
p=c()
for(i in seq(from=101,to=n,by=2)){
if(any(s==i)){
p=c(p,i)
s=c(s[(s%%3)==0],i)
}}
return (p)}else{
stop
}}
Test(500)
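Two things go wrong in the loop above. On the first pass, s still holds every odd number, so 101 passes the any(s == i) check and is appended to p before s is filtered down to multiples of 3. Also, a bare `stop` (without parentheses) does not raise an error; it just returns the stop function. A minimal sketch that keeps the loop structure but tests divisibility directly:

```r
# Loop version: keep odd numbers from 101 to n that are divisible by 3
Test <- function(n) {
  if (n <= 100) stop("n must be greater than 100")  # stop() needs parentheses
  p <- c()
  for (i in seq(from = 101, to = n, by = 2)) {  # odd numbers only
    if (i %% 3 == 0) {
      p <- c(p, i)  # append only numbers that pass the divisibility test
    }
  }
  p
}

Test(500)
```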
Here is a function that gets all odd multiples of 3. It's fully vectorized, with no loops at all.
Check that n is within the range [100, 500].
Create an integer vector N from 100 to n.
Create a logical index i of the elements of N that are divisible by 3 but not by 2.
Extract the elements of N selected by the index i.
The main work is done in 3 code lines.
Test <- function(n){
stopifnot(n >= 100)
stopifnot(n <= 500)
N <- seq_len(n)[-(1:99)]
i <- ((N %% 3) == 0) & ((N %% 2) != 0)
N[i]
}
Test(500)
Here is a vectorised one-liner which optionally allows you to change the lower bound from a default of 100 to anything you like. If the bounds are wrong, it returns an empty vector rather than throwing an error.
It works by creating a vector of 1:500 (or more generally, 1:n), then testing whether each element is greater than 100 (or whichever lower bound m you set), AND whether each element is odd, AND whether each element is divisible by 3. It uses the which function to return the indices of the elements that pass all the tests; because the vector is 1:n, those indices are the values themselves.
Test <- function(n, m = 100) which(1:n > m & 1:n %% 2 != 0 & 1:n %% 3 == 0)
So you can use it as specified in your question:
Test(500)
# [1] 105 111 117 123 129 135 141 147 153 159 165 171 177 183 189 195 201 207 213 219
# [21] 225 231 237 243 249 255 261 267 273 279 285 291 297 303 309 315 321 327 333 339
# [41] 345 351 357 363 369 375 381 387 393 399 405 411 417 423 429 435 441 447 453 459
# [61] 465 471 477 483 489 495
Or play around with upper and lower bounds:
Test(100, 50)
# [1] 51 57 63 69 75 81 87 93 99
Here is an example function for your objective:
Test <- function(n) {
if(n<100 | n> 500) stop("out of range")
v <- seq(101,n,by = 2)
na.omit(ifelse(v%%2==1 & v%%3==0,v,NA))
}
stop() is called when n is outside the range [100, 500].
ifelse() outputs the desired odd values, with NA in place of the rest.
na.omit() filters out the NAs and produces the final result.
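Another option, different from the filtering approaches above: an odd multiple of 3 is exactly a number that leaves remainder 3 when divided by 6, so the values can be generated directly with seq() in steps of 6. This is a sketch; the name odd_mult3 is made up for illustration, and like the which() version it takes an optional lower bound m, keeps only values strictly greater than m, and returns an empty vector when nothing qualifies.

```r
# Odd multiples of 3 are the integers with remainder 3 modulo 6,
# so generate them with seq() instead of filtering a longer vector.
odd_mult3 <- function(n, m = 100) {
  first <- (m + 1) + (3 - (m + 1)) %% 6   # smallest value > m with remainder 3 mod 6
  if (first > n) return(numeric(0))       # empty result instead of a seq() error
  seq(first, n, by = 6)
}

odd_mult3(500)      # 105, 111, ..., 495 (66 values)
odd_mult3(100, 50)  # 51 57 63 69 75 81 87 93 99
```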

Regression (log normal) in R: Finding x value (predictor) for a particular y value (outcome)

This is a similar question to the one posted in Regression (logistic) in R: Finding x value (predictor) for a particular y value (outcome). I am trying to find the x value for a known y value (in this case 0.000001) on a log-normal curve fitted, using a genetic algorithm, to sapling densities at distances from parent trees. This algorithm gives me the a and b parameters of the best-fit log-normal curve.
I have obtained the value of x for y = 0.000001 for other curves, such as the negative exponential, with uniroot, using this code (which works well for these curves):
##calculate x value at y=0.000001 (predicted near-maximum recruitment distance)
aparam=a
bparam=b
testfn <- function (y, aparam, bparam) {
## find value of x that satisfies y = a * exp(-b * x)
fn <- function(x) (a * exp(-b * x)) - y
uniroot(fn, lower=0, upper= 100000000)$root
}
testfn(0.000001)
Unfortunately, the same code using a log-normal formula does not work. I have tried uniroot with the lower boundary set above zero, but I get this error:
Error in uniroot(fn, lower = 1e-16, upper = 1e+18) :
f() values at end points not of opposite sign
My code and data (given below the code) are:
file="TR maire 1mbin.txt"
xydata <- read.table(file,header=TRUE,col.names=c('x','y'))
####assign best parameter values
a = 1.35577
b = 0.8941521
#####Plot model against data
par(mar=c(5,5,2,2))
xvals=seq(1,max(xydata$x),1)
plot(jitter(xydata$x), jitter(xydata$y),pch=1,xlab="distance from NCA (m)",
ylab=quote(recruit ~ density ~ (individuals ~ m^{2~~~ -1})))
col2="light grey"
plotmodel <- a* exp(-(b) * xvals)
lines(xvals,plotmodel,col=col2)
####ATTEMPT 1
##calculate x value at y=0.000001 (predicted near-maximum recruitment distance)
aparam=a
bparam=b
testfn <- function (y, aparam, bparam) {
fn <- function(x) ((exp(-(((log(x/b)) * (log(x/b)))/(2*a*a))))/(a * x * sqrt(2*pi))) - y
uniroot(fn, lower=0.0000000000000001, upper= 1000000000000000000)$root
}
testfn(0.000001)
The data are:
xydata
1 1 0.318309886
2 2 0.106103295
3 2 0.106103295
4 2 0.106103295
5 3 0.063661977
6 4 0.045472841
7 5 0.035367765
8 5 0.035367765
9 7 0.048970752
10 8 0.021220659
11 8 0.021220659
12 8 0.042441318
13 9 0.018724111
14 10 0.016753152
15 10 0.016753152
16 12 0.013839560
17 13 0.025464791
18 16 0.010268061
19 17 0.009645754
20 24 0.013545102
21 25 0.032480601
22 26 0.043689592
23 27 0.006005847
24 28 0.011574905
25 31 0.062618338
26 32 0.005052538
27 42 0.003835059
28 42 0.003835059
29 44 0.003658734
30 46 0.003497911
31 48 0.006701261
32 50 0.003215251
33 50 0.006430503
34 51 0.006303166
35 58 0.002767912
36 79 0.002027452
37 129 0.003715680
38 131 0.001219578
39 132 0.001210304
40 133 0.001201169
41 144 0.001109094
42 181 0.000881745
43 279 0.001142944
44 326 0.000488955
Or is there another way of approaching this?
I'm an ecologist and sometimes R just does not make sense!
It seems there were some errors in my R code, but the main problem was that my lower limit was too low and the log-normal curve does not extend to that value (my interpretation). The solution that works for me is:
### define the formula parameter values
a = 1.35577
b = 0.8941521
### define your formula (in this instance a log normal) in the {}
fn <- function(x,a,b,y) { ((exp(-(((log(x/b)) * (log(x/b)))/(2*a*a))))/(a * x * sqrt(2*pi))) - y}
###then use uniroot()$root calling the known parameter values and defining the value of y that is of interest (in this case 0.000001)
uniroot(fn,c(1,200000),a=a,b=b,y=0.000001)$root
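The formula used here is the standard log-normal density with meanlog = log(b) and sdlog = a, so base R's dlnorm() can stand in for the hand-written expression. Unlike the negative exponential, this curve rises from zero to a peak at b * exp(-a^2) (about 0.14 with these parameters) and then falls, so it is below 0.000001 on both sides of the peak. With lower = 1e-16, both end points give negative values of fn, which is exactly what the "not of opposite sign" error reports. A sketch that starts the bracket at the peak and checks the sign change before calling uniroot(); the names curve_y and find_x are illustrative:

```r
a <- 1.35577
b <- 0.8941521

# Same curve as fn above, written with base R's log-normal density:
# exp(-(log(x/b))^2 / (2*a^2)) / (a * x * sqrt(2*pi)) == dlnorm(x, log(b), a)
curve_y <- function(x) dlnorm(x, meanlog = log(b), sdlog = a)

find_x <- function(y, lower, upper) {
  f_lo <- curve_y(lower) - y
  f_hi <- curve_y(upper) - y
  # uniroot() needs a sign change between the two end points
  if (sign(f_lo) == sign(f_hi)) {
    stop("no sign change in [", lower, ", ", upper, "]; move a bound")
  }
  uniroot(function(x) curve_y(x) - y, lower = lower, upper = upper)$root
}

# Right of the peak the curve only decreases, so there is a single root there
peak <- b * exp(-a^2)
find_x(0.000001, lower = peak, upper = 200000)
```

A second solution exists to the left of the peak, very close to zero; bracketing on one side of the peak picks which one uniroot() returns.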

10 fold cross validation using logspline in R

I would like to do 10-fold cross-validation and then use MSE for model selection in R. I can divide the data into 10 groups, but I get the following error. How can I fix it?
crossvalind <- function(N, kfold) {
len.seg <- ceiling(N/kfold)
incomplete <- kfold*len.seg - N
complete <- kfold - incomplete
ind <- matrix(c(sample(1:N), rep(NA, incomplete)), nrow = len.seg, byrow = TRUE)
cvi <- lapply(as.data.frame(ind), function(x) c(na.omit(x))) # a list
return(cvi)
}
I am using logspline package for estimation of a density function.
library(logspline)
x = rnorm(300, 0, 1)
kfold <- 10
cvi <- crossvalind(N = 300, kfold = 10)
for (i in 1:length(cvi)) {
xc <- x[cvi[-i]] # x in training set
xt <- x[cvi[i]] # x in test set
fit <- logspline(xc)
f.pred <- dlogspline(xt, fit)
f.true <- dnorm(xt, 0, 1)
mse[i] <- mean((f.true - f.pred)^2)
}
Error in x[cvi[-i]] : invalid subscript type 'list'
cvi is a list, so cvi[-i] and cvi[i] are also lists, and x[cvi[-i]] then tries to subscript x with a list. That doesn't make sense, because a list can hold arbitrary objects: numbers, characters, dates and other lists.
Subscripting a list with single square brackets always returns a list. Use double square brackets to get the constituents, which in this case are vectors.
> cvi[1] # this is a list with one element
$V1
[1] 101 78 231 82 211 239 20 201 294 276 181 168 207 240 61 72 267 75 218
[20] 177 127 228 29 159 185 118 296 67 41 187
> cvi[[1]] # a length 30 vector:
[1] 101 78 231 82 211 239 20 201 294 276 181 168 207 240 61 72 267 75 218
[20] 177 127 228 29 159 185 118 296 67 41 187
so you can then get those elements of x:
> x[cvi[[1]]]
[1] 0.32751014 -1.13362827 -0.13286966 0.47774044 -0.63942372 0.37453378
[7] -1.09954301 -0.52806368 -0.27923480 -0.43530831 1.09462984 0.38454106
[13] -0.68283862 -1.23407793 1.60511404 0.93178122 0.47314510 -0.68034783
[19] 2.13496564 1.20117869 -0.44558321 -0.94099782 -0.19366673 0.26640705
[25] -0.96841548 -1.03443796 1.24849113 0.09258465 -0.32922472 0.83169736
this doesn't work with negative indexes:
> cvi[[-1]]
Error in cvi[[-1]] : attempt to select more than one element
So instead of subscripting x with the list elements you don't want, subscript it with the negative of the indexes you do want (since you are partitioning here):
> x[-cvi[[1]]]
returns the other 270 elements. I've used 1 here for the first pass through your loop; replace it with i in your code.
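Putting the [[ ]] indexing into the loop also needs mse to exist before the loop assigns mse[i], which the question's code never does (it fails with "object 'mse' not found"). A sketch under that reading, with the density fitter passed in as a function so the loop can run without logspline installed. The helpers cv_density_mse and normal_fit are hypothetical, and normal_fit (a normal density using the training mean and sd) is only a stand-in, not logspline; crossvalind is the question's function minus its unused `complete` line.

```r
# Fold assignment adapted from the question: a list of index vectors
crossvalind <- function(N, kfold) {
  len.seg <- ceiling(N / kfold)
  incomplete <- kfold * len.seg - N
  ind <- matrix(c(sample(1:N), rep(NA, incomplete)), nrow = len.seg, byrow = TRUE)
  lapply(as.data.frame(ind), function(x) c(na.omit(x)))
}

# K-fold MSE between an estimated density and the true density.
# fit_and_predict(train, test) returns density estimates at `test`.
cv_density_mse <- function(x, cvi, fit_and_predict, true_density) {
  mse <- numeric(length(cvi))            # pre-allocate before the loop
  for (i in seq_along(cvi)) {
    xt <- x[cvi[[i]]]                    # test fold: [[ ]] extracts the index vector
    xc <- x[-cvi[[i]]]                   # training set: all other observations
    f.pred <- fit_and_predict(xc, xt)
    mse[i] <- mean((true_density(xt) - f.pred)^2)
  }
  mse
}

set.seed(1)
x <- rnorm(300, 0, 1)
cvi <- crossvalind(N = 300, kfold = 10)

# Stand-in fitter (base R only), not the logspline estimator
normal_fit <- function(train, test) dnorm(test, mean(train), sd(train))
mse <- cv_density_mse(x, cvi, normal_fit, function(q) dnorm(q, 0, 1))
mean(mse)
```

With logspline installed, passing function(train, test) logspline::dlogspline(test, logspline::logspline(train)) as fit_and_predict gives the estimator from the question.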

creating functions to calculate the technical error and the coefficient of variation of the error

I have this equation (which can be accessed through this link):
I would like to create two functions using R. The first one uses the first equation provided.
The second function is to create a mathematical formula in which the first function is substituted. Here is the formula:
(http://i43.tinypic.com/b6vq5j.jpg)
This is the head of my data (data_1):
sex age seca1 chad1 DL alog1 dig1 scifirst1 crimetech1
1 F 20 1754 1750 175 95 95 432 429
2 F 19 1594 1596 158 56 55 420 417
3 F 20 1556 1558 156 74 72 435 437
4 F 18 1648 1640 167 67 65 431 434
5 F 19 1780 1780 178 99 67 433 431
6 F 19 1610 1620 165 56 54 423 425
After doing this as #janos suggested:
f1 <- function(x, y) {sqrt(sum((x - y) ^ 2) / 2 / length(x))}
Now, since I need to run f1 on data_1$alog1 vs data_1$dig1, here's what I did:
f1(data_1$alog1, data_1$dig1)
which gives: 4.3
Next, I tried to implement the 2nd formula like this:
f2 <- function(x, y){(f1 / ((x + y) / 2)) * 100}
But when I run it on data_1$alog1 vs data_1$dig1 to calculate the coefficient of variation of the error for these data, I get:
> f2(data_1$alog1, data_1$dig1)
Error in f1/((x + y)/2) : non-numeric argument to binary operator
Could anyone please comment on the steps I took to create the first and second functions, and on the way I run them on "alog1 vs dig1"?
Thanks all!!
If I understood correctly, here you go:
f1 <- function(x, y) {
sqrt(sum((x - y) ^ 2) / 2 / length(x))
}
f1(1:3, 4:6)
This will output:
[1] 2.12132
The function assumes that x and y are both vectors of the same length.
You can do the same for the 2nd function, with some simplification:
f2 <- function(x, y) {
200 * f1(x, y) / (x + y)
}
f2(1:3, 3:5)
which outputs:
[1] 70.71068 47.14045 35.35534
To check that two vectors have the same length, you can use the length() function. It can also be useful to halt execution if this assumption fails, like this:
stopifnot(length(x) == length(y))
