Making custom functions in R involving summation - r

I am a novice in R and have been asked to compute a descriptive statistic called dominance (D, expressed as a percentage). D is defined as the mean abundance (MA) of a vector x divided by the sum of the MA values of all vectors being compared (x to i). MA, in turn, is the sum of all values in a vector divided by the length of that vector. Here is how I normally approach it:
#Example data
x <- c(1, 2, 3)
y <- c(4, 5, 6)
z <- c(7, 8, 9)
#Mean abundance function
mean.abundance <- function(x){
  N_sum <- sum(x)
  N_count <- length(x)
  N_sum/N_count
}
#Percent dominance function (workaround)
percent.dominance <- function(x, ...){
  MA_a <- (x)
  sum_MA_i <- sum(x, ...)
  (MA_a/sum_MA_i)*100
}
MA_x <- mean.abundance(x)
MA_y <- mean.abundance(y)
MA_z <- mean.abundance(z)
MA <- c(MA_x, MA_y, MA_z)
MA
D_x <- percent.dominance(MA_x, MA_y, MA_z)
D_y <- percent.dominance(MA_y, MA_x, MA_z)
D_z <- percent.dominance(MA_z, MA_x, MA_y)
D <- c(D_x, D_y, D_z)
D
That approach already gives me the %D values I am looking for. My problem is that my (perfectionist) PI wants me to compute the %D values directly from the vectors x, y, and z, rather than stepwise (first calculating the MA values, then using MA_x, MA_y, and MA_z to calculate %D). I am stumped on writing a custom %D function that takes the raw data vectors; here is a failed attempt at revising the function, just to give a general idea:
#Percent dominance function (incorrect)
percent.dominance <- function(x, ...){
  MA_a <- sum(x)/length(x)
  sum_MA_i <- sum(x, ...)/length(x, ...)
  (MA_a/sum_MA_i)*100
}

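You can capture the optional data passed with list(...) and make the following changes to the function. The mean of each vector is taken separately, so the result stays correct when the vectors differ in length or when a different number of vectors is passed -
percent.dominance <- function(x, ...){
  data <- list(...)
  MA_a <- sum(x)/length(x)
  # sum of the mean abundances of x and of every vector passed via ...
  sum_MA_i <- MA_a + sum(vapply(data, mean, numeric(1)))
  (MA_a/sum_MA_i)*100
}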
percent.dominance(x, y, z)
#[1] 13.33333
percent.dominance(y, x, z)
#[1] 33.33333
percent.dominance(z, x, y)
#[1] 53.33333
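To get all the %D values in one call, a minimal sketch is to take the mean of each vector separately and express each mean as a percentage of their sum. The name percent.dominance.all is hypothetical and not part of the answer above; averaging each vector on its own also covers vectors of unequal length.

```r
# Hypothetical helper: percent dominance for every vector passed in.
percent.dominance.all <- function(...) {
  # Mean abundance of each vector, computed independently so that
  # vectors of different lengths are handled correctly.
  MA <- vapply(list(...), mean, numeric(1))
  # Each MA as a percentage of the sum of all MA values.
  MA / sum(MA) * 100
}

x <- c(1, 2, 3)
y <- c(4, 5, 6)
z <- c(7, 8, 9)
percent.dominance.all(x, y, z)
#[1] 13.33333 33.33333 53.33333
```

The values come back in the order the vectors were passed, so the first element is D for x, the second for y, and so on.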

Related

Is it possible to use vector math in R for a summation involving intervals?

Title's a little rough, open to suggestions to improve.
I'm trying to calculate time-average covariances for a 500 length vector.
This is the equation we're using
The result I'm hoping for is a vector with an entry for k from 0 to 500 (0 would just be the variance of the whole set).
I've started with something like this, but I know I'll need to reference the gap (i) in the first mean comparison as well:
x <- rnorm(500)
xMean <-mean(x)
i <- seq(1, 500)
dfGam <- data.frame(i)
dfGam$gamma <- (1/(500-dfGam$i))*(sum((x-xMean)*(x[-dfGam$i]-xMean)))
Is it possible to do this using vector math or will I need to use some sort of for loop?
Here's the for loop that I've come up with for the solution:
gamma_func <- function(input_vec) {
  output_vec <- c()
  input_mean <- mean(input_vec)
  iter <- seq(1, length(input_vec)-1)
  for(val in iter){
    iter2 <- seq((val+1), length(input_vec))
    gamma_sum <- 0
    for(val2 in iter2){
      gamma_sum <- gamma_sum + (input_vec[val2]-input_mean)*(input_vec[val2-val]-input_mean)
    }
    output_vec[val] <- (1/length(iter2))*gamma_sum
  }
  return(output_vec)
}
Thanks
Using data.table, mostly for the shift function to make x_{t - k}, you can do this:
library(data.table)
gammabar <- function(k, x){
  xbar <- mean(x)
  n <- length(x)
  df <- data.table(xt = x, xtk = shift(x, k))[!is.na(xtk)]
  df[, sum((xt - xbar)*(xtk - xbar))/n]
}
gammabar(k = 10, x)
# [1] -0.1553118
The filter [!is.na(xtk)] starts the sum at t = k + 1, because xtk will be NA for the first k indices due to being shifted by k.
Reproducible x
x <- c(0.376972124936433, 0.301548373935665, -1.0980231706536, -1.13040590360378,
-2.79653431987176, 0.720573498411587, 0.93912102300901, -0.229377746707471,
1.75913134696347, 0.117366786802848, -0.853122822287008, 0.909259181618213,
1.19637295955276, -0.371583903741348, -0.123260233287436, 1.80004311672545,
1.70399587729432, -3.03876460529759, -2.28897494991878, 0.0583034949929225,
2.17436525195634, 1.09818265352131, 0.318220322390854, -0.0731475581637693,
0.834268741278827, 0.198750636733429, 1.29784138432631, 0.936718306241348,
-0.147433193833294, 0.110431994640128, -0.812504663900505, -0.743702167768748,
1.09534507180741, 2.43537370755095, 0.38811846676708, 0.290627670295127,
-0.285598287083935, 0.0760147178373681, -0.560298603759627, 0.447188372143361,
0.908501134499943, -0.505059597708343, -0.301004012157305, -0.726035976548133,
-1.18007702699501, 0.253074712637114, -0.370711296884049, 0.0221795637601637,
0.660044122429767, 0.48879363533552)
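The same lagged product can also be written in base R by indexing the two shifted slices directly. This is only a sketch: gamma_lag and gamma_all are hypothetical names, and it assumes the divisor is the number of pairs (n - k), as in the loop from the question, whereas the data.table answer above divides by n.

```r
# Hypothetical helper: lag-k time-averaged covariance using plain indexing.
gamma_lag <- function(x, k) {
  n <- length(x)
  xbar <- mean(x)
  # x_t for t = k+1..n, paired element by element with x_{t-k}
  x_t  <- x[(k + 1):n]
  x_tk <- x[1:(n - k)]
  # Assumption: divide by the number of pairs (n - k), as the question's loop does.
  sum((x_t - xbar) * (x_tk - xbar)) / (n - k)
}

# All lags 0..(n - 1); lag n has no pairs, so it is left out.
gamma_all <- function(x) {
  vapply(0:(length(x) - 1), gamma_lag, numeric(1), x = x)
}
```

Each lag is computed with vectorized arithmetic, so only the outer pass over k remains, and lag 0 reduces to the variance with divisor n.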

How to condition a computation and then add all computations done in R?

I am experimenting with R and I can't find a way to do the following:
1- If x == 3, multiply x by the "y" value of the same row.
2- Add up all the products computed in step 1.
x <- 3426278722533992028364647392927338
y <- 7479550949037487987438746984798374
x <- as.numeric(strsplit(as.character(x), "")[[1]])
y <- as.numeric(strsplit(as.character(y), "")[[1]])
Table <- table(x,y)
Table <- data.frame(Table)
Table$Freq <- NULL
So I tried creating a function:
Calculation <- function (x,y) {
  z <- if(x == 3){ x * y }
  w <- sum(z)
}
x and y are the columns of the data.frame.
This prints an error which I struggle to solve...
Thanks for your time,
Kylian Pattje
2 things here:
1. Use ifelse in your function,
Calculation <- function (x,y) {
  z <- ifelse(x == 3, x * y, NA)
  w <- sum(z, na.rm = TRUE)
  return(w)
}
2. Make sure your variables are NOT factors,
Table[] <- lapply(Table, function(i) as.numeric(as.character(i)))
Calculation(Table$x, Table$y)
#[1] 84
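An alternative to ifelse is to select the matching rows first and sum only those products. The sketch below uses a hypothetical function name (conditional_sum) and made-up example vectors, not the data from the question.

```r
# Hypothetical helper: sum of x * y over the positions where x equals `value`.
conditional_sum <- function(x, y, value = 3) {
  # which() drops NA comparisons, so missing values in x are skipped
  keep <- which(x == value)
  sum(x[keep] * y[keep])
}

# Made-up example data, for illustration only
x <- c(3, 1, 3, 2, 3)
y <- c(4, 9, 0, 7, 5)
conditional_sum(x, y)
#[1] 27
```

As with the ifelse version, x and y need to be numeric rather than factors for the comparison and the product to behave as expected.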

Variable number of Inputs for the Function - R

I have the following function, which takes the vectors A and B as inputs. My question is how I can make it accept a different number of vectors. In my function, n is fixed and gives the number of samples. For example, I need my function to work for this case as well: sample_sum(A, B, C, D, E, n, ...).
Also, for the part where I draw samples from vectors A and B (i.e. the line sample_A <- qss(A, n=n, ...)), how can I modify it to handle a different number of vectors without a for loop, since a for loop is not fast enough?
Thanks
sample_sum <- function(A, B, n, ...)
{
  qss <- function(X, n, ...)
  {
    r_X <- range(X)
    dens_X <- density(X, ...)
    sample(dens_X$x, size=n, prob=dens_X$y, replace=TRUE)
  }
  sample_A <- qss(A, n=n, ...)
  sample_B <- qss(B, n=n, ...)
  sample_A + sample_B
}
I think what you need is the following:
new_sample_sum <- function(my_vector_list, n, ...)
{
  qss <- function(X, n, ...)
  {
    r_X <- range(X)
    dens_X <- density(X, ...)
    sample(dens_X$x, size=n, prob=dens_X$y, replace=TRUE)
  }
  # forward ... so extra arguments still reach density(), as in the original
  rowSums(sapply(my_vector_list, qss, n = n, ...))
}
You can put any number of vectors in the list and then run the calculation over the list without worrying about how many there are. A brief note on the calculation: sapply loops over the vectors in the list and returns a matrix with the results of applying qss to each one (each vector yields one column of that matrix). Since the final step of your function adds the samples together, I take rowSums of the matrix.
And to check that the two versions agree (the seed is reset before each call so both draw the same random numbers):
set.seed(1)
x <- c(1,2,3,4)
y <- c(6,7,9,0)
z <- c(2,2,2,2)
> sample_sum(x, y, n=2)
[1] 6.545129 13.474390
set.seed(1)
> new_sample_sum(list(x, y), n = 2)
[1] 6.545129 13.474390
Or with more vectors:
sample_sum <- function(A, B, C, n, ...)
{
  qss <- function(X, n, ...)
  {
    r_X <- range(X)
    dens_X <- density(X, ...)
    sample(dens_X$x, size=n, prob=dens_X$y, replace=TRUE)
  }
  sample_A <- qss(A, n=n, ...)
  sample_B <- qss(B, n=n, ...)
  sample_C <- qss(C, n=n, ...)
  sample_A + sample_B + sample_C
}
set.seed(1)
> sample_sum(x, y, z, n = 2)
[1] 6.102482 15.450364
set.seed(1)
> new_sample_sum(list(x, y, z), n = 2)
[1] 6.102482 15.450364
Just use n=length(A) to dynamically determine the size of the vector arguments.
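If you would rather pass the vectors directly instead of wrapping them in a list, one possible sketch collects them through `...`. The name sample_sum_dots is hypothetical; in this variant n must be passed by name, and extra arguments for density are not forwarded.

```r
# Draw n values from a kernel density estimate of X (same idea as qss above).
qss <- function(X, n) {
  dens_X <- density(X)
  sample(dens_X$x, size = n, prob = dens_X$y, replace = TRUE)
}

# Hypothetical variant: any number of vectors arrive through `...`.
sample_sum_dots <- function(..., n) {
  vectors <- list(...)
  # One column of n draws per input vector
  draws <- vapply(vectors, qss, numeric(n), n = n)
  # matrix() guards the n = 1 case, where vapply returns a plain vector
  rowSums(matrix(draws, nrow = n))
}

set.seed(1)
sample_sum_dots(c(1, 2, 3, 4), c(6, 7, 9, 0), c(2, 2, 2, 2), n = 2)
```

Because n comes after `...`, writing n = 2 explicitly is required; an unnamed trailing number would be treated as one more input vector.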

using events in deSolve to prevent negative state variables, R

I am modeling population change in a food web of species, using ODEs and deSolve in R. Obviously the populations should not be less than zero, so I have added an event function and run it as below. Although the answers differ from those I got without the event function, it still produces negative values. What is wrong?
#using events in a function to distinguish and address the negative abundances
eventfun <- function(t, y, parms){
  y[which(y<0)] <- 0
  return(y)
}
# =============================== main code
max.time = 100
start.time = 50
initials <- c(N, R)
#parms <- list(webs=webs, a=a, b=b, h=h, m=m, basals=basals, mu=mu, Y=Y, K=K, no.species=no.species, flow=flow,S=S, neighs=neighs$neighs.per, dispers.maps=dispers.maps)
temp.abund <- ode(y=initials, func=solve.model, times=0:max.time, parms=parms, events = list(func = eventfun, time = 0:max.time))
and here is the ODE function(if it helps in finding the problem):
solve.model <- function(t, y, parms){
  y <- ifelse(y<1e-6, 0, y)
  with(parms,{
    # return from vector form into matrix form for calculations
    (R <- as.matrix(y[(max(no.species)*length(no.species)+1):length(y)]))
    (N <- matrix(y[1:(max(no.species)*length(no.species))], ncol=length(no.species)))
    dy1 <- matrix(nrow=max(no.species), ncol=length(no.species))
    dy2 <- matrix(nrow=length(no.species), ncol=1)
    no.webs <- length(no.species)
    for (i in 1:no.webs){
      species <- no.species[i]
      (abundance <- N[1:species,i])
      adj <- as.matrix(webs[[i]])
      a.temp <- a[1:species, 1:species]*adj
      b.temp <- b[1:species, 1:species]*adj
      h.temp <- h[1:species, 1:species]*adj
      (sum.over.preys <- abundance%*%(a.temp*h.temp))
      (sum.over.predators <- (a.temp*h.temp)%*%abundance)
      #Calculating growth of basal
      (basal.growth <- basals[,i]*N[,i]*(mu*R[i]/(K+R[i])-m))
      # Calculating growth for non-basal species D
      no.basal <- rep(1,len=species)-basals[1:species]
      predator.growth<- rep(0, max(no.species))
      (predator.growth[1:species] <- ((abundance%*%(a.temp*b.temp))/(1+sum.over.preys)-m*no.basal)*abundance)
      predation <- rep(0, max(no.species))
      (predation[1:species] <- (((a.temp*b.temp)%*%abundance)/t(1+sum.over.preys))*abundance)
      (pop <- basal.growth + predator.growth - predation)
      dy1[,i] <- pop
      dy2[i] <- 0.0005 #to consider a nearly constant value for the resource
    }
    #Calculating dispersals .they can be easily replaced
    # by adjacency maps of connections between food webs arbitrarily!
    disp.left <- dy1*d*dispers.maps$left.immig
    disp.left <- disp.left[,neighs[,2]]
    disp.right <- dy1*d*dispers.maps$right.immig
    disp.right <- disp.right[,neighs[,3]]
    emig <- dy1*d*dispers.maps$emigration
    mortality <- m*dy1
    dy1 <- dy1+disp.left+disp.right-emig
    return(list(c(dy1, dy2)))
  })
}
thank you so much for your help
I have had success using a similar event function defined like this:
eventfun <- function(t, y, parms){
  with(as.list(y), {
    y[y < 1e-6] <- 0
    return(y)
  })
}
I am using an event function similar to the one posted by jjborrelli. I wanted to note that, for me, the output of ode still shows negative values. However, when ode calculates the next step it uses 0 rather than the negative value shown for the current step, so you can ignore the negative values and replace them with zeros at the end of the simulation.
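The clean-up at the end of the simulation can be sketched in base R as below. The matrix here is a made-up stand-in for solver output, and both the function name clamp_states and the assumption that the first column is named "time" are illustrative rather than taken from the posts above.

```r
# Made-up stand-in for solver output: a time column plus state variables.
out <- cbind(time = 0:3,
             N1 = c(1.0, 0.4, -1e-8, 0.0),
             R1 = c(2.0, 1.5, 1.1, -3e-7))

# Hypothetical helper: replace negative state values with zero,
# leaving the time column untouched.
clamp_states <- function(out, time_col = "time") {
  state_cols <- colnames(out) != time_col
  out[, state_cols] <- pmax(out[, state_cols], 0)
  out
}

clean <- clamp_states(out)
```

This only tidies the reported values; it does not change what the solver used internally at each step.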

Function for normalizing one data frame to be applied on a second data frame in R

This is homework.
I am new to R.
I have two data frames, each containing two columns of data. I need a function that normalizes the first data frame to a mean of 0 and a variance of 1 for both columns. Then I want to apply that same function to the second data frame.
I have tried this:
my_scale_test <- function(x,y) {
  apply(y, 2, function(x,y) {
    (y - mean(x ))/sd(x)
  })
}
where x is the first data frame and y is the data frame to be normalized.
Can someone help me?
Edit:
I have now tried this as well, but it is not working either:
scale_func <- function(x,y) {
  xmean <- mean(x)
  xstd <- sd(x)
  yout <- y
  for (i in 1:length(x[1,]))
    yout[,i] <- yout[,i] - xmean[i]
  for (i in 1:length(x[1,]))
    yout[,i] <- yout[,i]/xsd[i]
  invisible(yout)
}
Edit 2:
I found this working function for MATLAB (which I tried to translate in edit 1):
function [ Xout ] = scale( Xbase, Xin )
  Xmean = mean(Xbase);
  Xstd = std(Xbase);
  Xout = Xin;
  for i=1:length(Xbase(1,:))
    Xout(:,i) = Xout(:,i) - Xmean(i);
  end
  for i=1:length(Xbase(1,:))
    Xout(:,i) = Xout(:,i)/Xstd(i);
  end
end
Can someone help me translate it?
Since you are new to R, let's try something really basic.
my_scale_test <- function(x, y) {
  y.nrow <- nrow(y)
  x.mean <- data.frame(t(apply(x, 2, mean)))
  x.sd <- data.frame(t(apply(x, 2, sd)))
  # To let x.mean and x.sd have the same dimension as y, let's repeat the rows.
  x.mean <- x.mean[rep(1, y.nrow), ]
  x.sd <- x.sd[rep(1, y.nrow), ]
  (y - x.mean)/x.sd
}
To test, try
set.seed(1)
x <- data.frame(matrix(rnorm(10), nrow = 5))
y <- x
result <- my_scale_test(x, y)
apply(result, 2, mean)
apply(result, 2, sd)
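A shorter route to the same result, sketched here with the hypothetical name scale_with_reference, is to hand the column means and standard deviations of x to base R's scale(), which accepts per-column center and scale vectors. It assumes x and y have the same columns in the same order.

```r
# Hypothetical helper: standardize y using the column means and sds of x.
scale_with_reference <- function(x, y) {
  centers <- colMeans(x)       # per-column means of the reference data
  spreads <- apply(x, 2, sd)   # per-column standard deviations of the reference data
  scaled <- scale(y, center = centers, scale = spreads)
  as.data.frame(scaled)
}

set.seed(1)
x <- data.frame(matrix(rnorm(10), nrow = 5))
y <- x
result <- scale_with_reference(x, y)
apply(result, 2, mean)
apply(result, 2, sd)
```

When y is the same data as x, the column means should come out at (numerically) zero and the standard deviations at one, which is the same check as in the test above.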

Resources