How to solve a matrix equation in R

My friend and I (both non-R experts) are trying to solve a matrix equation in R. We have a matrix y defined by:
y=matrix(c(0.003,0.977,0,0,0,0,0,0,0,0,0,0.02,0,0.0117,0.957,0,0,0,0,0,0,0,0,0.03,0,0,0.0067,0.917,0,0,0,0,0,0,0,0.055,0,0,0,0.045,0.901,0,0,0,0,0,0,0.063,0,0,0,0,0.0533,0.913,0,0,0,0,0,0.035,0,0,0,0,0,0.05,0,0,0,0,0,0.922,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.01,0,0,0,0,0,0,0,0,0,0,0,0,0.023,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0),
nrow=12, ncol=12, byrow=TRUE)
This matrix simulates the way students in our school move up to the next year. By multiplying this matrix with a vector containing the number of students in each year, we get the number of students in each year one year later.
With the function:
sumfun <- function(x, start, end) {
  return(sum(x[start:end]))
}
We add up the number of students in each year to get the total number of students in the school. We want to fill the vector (which we multiply with our matrix) with the number of students currently in the school, and leave the number of new students (the first element of the vector) as our variable x.
For example:
sumfun(colSums(y*c(x,200,178,180,201,172,0,0,200,194,0,0)),2,6)
We want to set this expression equal to 1000, the maximum number of students our school building can house. That way we can calculate how many new students our school can accept. We have no idea how to do this; we expect x to be somewhere between 100 and 300. We would be very grateful if somebody could help us with this!

I'm not familiar with R, but I can guide you through the main process of solving this matrix equation. Assuming that your matrix is called P, let the current student vector be called s0:
s0 = {x, 200, 178, 180, 201, 172, 0, 0, 200, 194, 0, 0};
Note that we leave x undefined because we want to solve for it later. Even though x is unknown, we can still multiply s0 by P. We call the resulting vector s1.
s1 = s0.P = {0.003*x, 2.34 + 0.977*x, 192.593, 173.326, 177.355, 192.113, 0, 0, 0, 0, 0, 192.749 + 0.02*x}
We can verify that this is correct: of the student years 2-6, only year 2 is affected by the number of new students x. So if we now sum over years 2-6 as in your example, we find:
sum(s1[2:6]) = 737.727 + 0.977*x
All that is left is solving the simple linear equation sum(s1[2:6]) == 1000:
737.727 + 0.977*x == 1000
x = 268.447
Let me know if this is correct! This was all done in Mathematica.
The following code shows how to do this in R:
y=matrix(c(0.003,0.977,0,0,0,0,0,0,0,0,0,0.02,0,0.0117,0.957,0,0,0,0,0,0,0,0,0.03,0,0,0.0067,0.917,0,0,0,0,0,0,0,0.055,0,0,0,0.045,0.901,0,0,0,0,0,0,0.063,0,0,0,0,0.0533,0.913,0,0,0,0,0,0.035,0,0,0,0,0,0.05,0,0,0,0,0,0.922,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.01,0,0,0,0,0,0,0,0,0,0,0,0,0.023,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0),
nrow=12, ncol=12, byrow=TRUE)
sumfun <- function(x, start, end) {
  return(sum(x[start:end]))
}
students <- function(x) {
  total <- sumfun(colSums(y * c(x, 200, 178, 180, 201, 172, 0, 0, 200, 194, 0, 0)), 2, 6)
  return(total - 1000)
}
uniroot(students, lower = 100, upper = 300)$root
The function uniroot finds where a function is zero. So if you define a function that returns the number of students for a value of x, minus 1000, uniroot will find the x for which the number of students is exactly 1000.
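As a quick check (a sketch; x_root is just a name introduced here), the root should match the analytic answer of roughly 268.447 derived above:
x_root <- uniroot(students, lower = 100, upper = 300)$root
x_root                   # about 268.447
students(x_root) + 1000  # recovers the capacity of 1000 (up to tolerance)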
Note: this only describes the short-term behaviour of the total number of students. To keep the number of students at 1000 in the long term, other equations must be solved.
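If you want to explore that longer-term behaviour, a minimal sketch is to iterate the projection, assuming (this is an assumption, not part of the question) that the same intake x is re-applied as the first element every year:
x <- uniroot(students, lower = 100, upper = 300)$root
s <- c(x, 200, 178, 180, 201, 172, 0, 0, 200, 194, 0, 0)
for (year in 1:10) {
  s <- colSums(y * s)     # project one year ahead
  s[1] <- x               # overwrite with a fresh intake (assumed constant)
  print(sumfun(s, 2, 6))  # total students in years 2-6
}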

I would suggest probing various x values and looking at the resulting totals. From that you can see the trend and use it to figure out the answer. Here is an example:
# Sample data
y=matrix(c(0.003,0.977,0,0,0,0,0,0,0,0,0,0.02,0,0.0117,0.957,0,0,0,0,0,0,0,0,0.03,0,0,0.0067,0.917,0,0,0,0,0,0,0,0.055,0,0,0,0.045,0.901,0,0,0,0,0,0,0.063,0,0,0,0,0.0533,0.913,0,0,0,0,0,0.035,0,0,0,0,0,0.05,0,0,0,0,0,0.922,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.01,0,0,0,0,0,0,0,0,0,0,0,0,0.023,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0),
nrow=12, ncol=12, byrow=TRUE)
# function f returns the total number of students (years 2-6) in the school for a given 'x'
f <- function(x) {
  z <- c(x, 200, 178, 180, 201, 172, 0, 0, 200, 194, 0, 0)
  sum(t(y[, 2:6]) %*% z)
}
# Let's see the plot
px <- 1:1000
py <- sapply(px,f) # will calculate the total number of students for each x from 1 to 1000
plot(px,py,type='l',lty=2)
# Reproduce the linear trend: f is linear in x because only the first row of y multiplies x, so the slope is sum(y[1,2:6])
lines(px,f(0)+sum(y[1,2:6])*px,col='red',lty=4)
# obtain the answer using the linear trend
Xstudents <- (1000-f(0))/sum(y[1,2:6])
floor(Xstudents)
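As a sanity check, plugging the answer back into f should reproduce the building capacity:
f(Xstudents)  # should be (approximately) 1000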

Related

lpsolve with constraints in r

I would like to use R to solve an optimization problem using the lpSolve package, which can perform processes similar to the Solver add-in in Excel. Below is a simple case where I would like to maximize NPV, specifically using lpSolve.
df<-structure(list(id = c(1, 2, 3, 4, 5, 6, 7, 8), Revenue = c(109,
111, 122, 139, 156, 140, 137, 167)), row.names = c(NA, 8L), class = "data.frame")
dcf <- function(x, r, t0 = FALSE) {
  # calculates discounted cash flows (DCF) given cash flows and a discount rate
  #
  # x  - cash flows vector
  # r  - vector of discount rates, in decimals; a single value will be recycled
  # t0 - if TRUE, cash flows start in year 0, i.e. the discount rate in the first period is zero; default is FALSE
  if (length(r) == 1) {
    r <- rep(r, length(x))
    if (t0) r[1] <- 0
  }
  x / cumprod(1 + r)
}
npv <- function(x, r, t0 = FALSE) {
  # calculates net present value (NPV) given cash flows and a discount rate
  #
  # x  - cash flows vector
  # r  - discount rate, in decimals
  # t0 - cash flows start in year 0; default is FALSE
  sum(dcf(x, r, t0))
}
npv(df$Revenue, .2)
# Non-optimized NPV yields a value of 492.
# How can I use lpSolve to optimize my table? Said another way, how can I rearrange the rows to maximize NPV using lpSolve?
A more complicated version of the problem involves a penalty column with the following rule: Ids represent projects. If a project is not in the starting period (row 1), check whether the previous Id is within a delta of 2 (the absolute value of the difference between the current row's Id and the previous row's Id). If true, penalize Revenue by 20%. I think this problem still involves solving for the correct order. How can I optimize this function?
# Randomize the order to give a base NPV. Now I need to optimize the order to find the max value.
library(dplyr)  # for mutate(), if_else() and lag()
df <- df %>% mutate(random_sort = sample(nrow(df)))
x <- function(i) {
  df_fcn <- i %>%
    mutate(Penalty = if_else(abs(random_sort - lag(random_sort)) > 2, 1, .8)) %>%
    mutate(Penalty = ifelse(is.na(Penalty), 1, Penalty)) %>%
    mutate(Revenue_Penalized = Revenue * Penalty)
  npv(df_fcn$Revenue_Penalized, .2)
}
Best I've come up with is to randomly rearrange the data and find the maximum value.
schedule_function <- function(i) {
  df_fcn <- i %>%
    mutate(random_sort = sample(random_sort)) %>%
    mutate(Penalty = if_else(abs(random_sort - lag(random_sort)) > 2, 1, .8)) %>%
    mutate(Penalty = ifelse(is.na(Penalty), 1, Penalty)) %>%
    mutate(Revenue_Penalized = Revenue * Penalty)
  npv(df_fcn$Revenue_Penalized, .2)
}
n <- 10000
MAX <- -Inf  # initialize maximum
for (i in 1:n) {
  x <- schedule_function(df)
  if (x > MAX) MAX <- x
}
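Since there are only 8 projects, an exhaustive search over all 8! = 40,320 orderings is feasible, so random sampling is not strictly necessary. Below is a minimal base-R sketch; it assumes the npv() function defined above, and it interprets the penalty rule as comparing consecutive Ids in the schedule (my reading of the rule, so treat it as an assumption):
# enumerate all orderings and keep the best penalized NPV
perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in perms(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  }
  out
}
penalized_npv <- function(ord, revenue, r = 0.2) {
  # 20% penalty whenever the previous Id is within a delta of 2 (assumption)
  penalty <- c(1, ifelse(abs(diff(ord)) <= 2, 0.8, 1))
  npv(revenue[ord] * penalty, r)
}
all_orders <- perms(1:8)
scores <- vapply(all_orders, penalized_npv, numeric(1), revenue = df$Revenue)
all_orders[[which.max(scores)]]  # best schedule
max(scores)                      # its NPV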

How can I repeat these two lines of code 100+ times?

I'm still new to the programming world and looking for some guidance on a model I am building for individual animal growth over time.
The goal for the code I'm working with is to
i) Generate random starting sizes of animals from a given distribution
ii) Give each of these individuals a starting growth rate from a given distribution
iii) Calculate new size of individual after 1 year
iv) Assign a new growth rate from above distribution
v) Calculate the new size of individual after another year.
So far I have the code below, and what I want to do is repeat the last two lines of code x number of times without having to physically run the code over and over.
# Generate starting lengths
lengths <- seq(from = 4.4, to = 5.4, by = 0.1)
# Generate starting ks (growth rates)
ks <- seq(from = 0.0358, to = 0.0437, by = 0.0001)
# Create individuals
create.inds <- function(id = NaN, length0 = NaN, k1 = NaN) {
  data.frame(id = id, length0 = length0, k1 = k1)
}
# Generate individuals (n.initial was not defined in the snippet; 100 matches the sample() calls)
n.initial <- 100
inds <- create.inds(id = 1:n.initial,
                    length0 = sample(lengths, 100, replace = TRUE),
                    k1 = sample(ks, 100, replace = TRUE))
# Calculate new lengths based on the last and 2nd-last columns and insert into the next column
inds[, ncol(inds) + 1] <- 326 * (1 - exp(-inds[, ncol(inds)])) +
  inds[, ncol(inds) - 1] * exp(-inds[, ncol(inds)])
# Calculate new ks and insert into the last column
inds[, ncol(inds) + 1] <- sample(ks, 100, replace = TRUE)
Any and all assistance would be appreciated; also, if you think there is a better way to write this, please let me know.
I think what you are asking for is a simple loop:
for (i in 1:100) {  # replace 100 with the number of times you want this to execute
  # Calculate new lengths and insert into the next column
  inds[, ncol(inds) + 1] <- 326 * (1 - exp(-inds[, ncol(inds)])) +
    inds[, ncol(inds) - 1] * exp(-inds[, ncol(inds)])
  # Calculate new ks and insert into the last column
  inds[, ncol(inds) + 1] <- sample(ks, 100, replace = TRUE)
}

Data perturbation - How to perform it?

I am doing some projects related to statistics simulation in R, based on "Introduction to Scientific Programming and Simulation Using R". In the student projects section (chapter 24) I am doing "The pipe spiders of Brunswick" problem, but I am stuck on one part of an evolutionary algorithm, where you need to perform some data perturbation according to the sentence below:
"With probability 0.5 each element of the vector is perturbed, independently
of the others, by an amount normally distributed with mean 0 and standard
deviation 0.1"
What does being "perturbed" really mean here? I don't know which operation I should apply to my vector to make this perturbation happen, and I'm not finding any answers to this problem.
Thanks in advance!
# using the most important features, we create an ML model:
m1 <- lm(PREDICTED_VALUE ~ PREDICTER_1 + PREDICTER_2 + PREDICTER_N)
# summary(m1)
# anova(m1)
# after creating the model, we perturb as follows:
# install.packages("perturb")  # install the package
library(perturb)
set.seed(1234)  # for the same results each time you run the code
# you can change the number of iterations to any value n; the total number of iterations will be n + 1
p1_new <- perturb(m1, pvars = c("PREDICTER_1", "PREDICTER_N"), prange = c(1, 1), niter = 200)
p1_new  # check the values of p1_new
summary(p1_new)
Perturbing just means adding a small, noisy shift to a number. Your code might look something like this:
x <- sample(10, 10)
ind <- rbinom(length(x), 1, 0.5) == 1
x[ind] <- x[ind] + rnorm(sum(ind), 0, 0.1)
rbinom selects the elements to be modified (each with probability 0.5) and rnorm adds the perturbation.
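An equivalent vectorized sketch of the same idea, with no indexing:
# each element gets Gaussian noise (sd 0.1) with probability 0.5; multiplying
# by the 0/1 indicator zeroes out the unperturbed elements
x <- x + rnorm(length(x), 0, 0.1) * rbinom(length(x), 1, 0.5)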

how to create a random loss sample in r using if function

I am currently working on generating some random data for a school project.
I have created a variable in R using a binomial distribution to determine whether an observation had a loss (yes = 1) or not (0).
Afterwards I am trying to generate the loss amount, using a random distribution, for all observations which had a loss (= 1).
As my loss amount is a percentage, it can be anywhere between 0 and 1 (see "What Is The Intuition Behind Beta Distribution" on stats.stackexchange).
In a third step I am looking for an if statement which combines my two variables.
Please find below my code (which only works for the Loss_Y_N variable):
Loss_Y_N = rbinom(1000000,1,0.01)
Loss_Amount = dbeta(x, 10, 990, ncp = 0, log = FALSE)
ideally I can combine the two into something like
if(Loss_Y_N=1 then Loss_Amount=dbeta(...) #... is meant to be a random variable with mean=0.15 and should be 0<x=<1
else Loss_Amount=0)
Any input highly appreciated!
Create a vector for your loss proportions. Fill the elements corresponding to losses with draws from the beta distribution. Tweak the beta parameters until you get the desired result.
N <- 100000
loss_indicator <- rbinom(N, 1, 0.1)
loss_prop <- numeric(N)
loss_prop[loss_indicator > 0] <- rbeta(sum(loss_indicator), 10, 990)
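Note that rbeta draws random values, whereas the dbeta in your attempt evaluates the density function. To hit your target mean of 0.15, recall that a Beta(a, b) variable has mean a / (a + b); the shape values below are just one choice satisfying that (a sketch, not the only option):
# Beta(3, 17) has mean 3/(3+17) = 0.15 and support (0, 1)
loss_prop[loss_indicator > 0] <- rbeta(sum(loss_indicator), 3, 17)
mean(loss_prop[loss_indicator > 0])  # should be close to 0.15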

Root mean square deviation on binned GAM results using R

Background
A PostgreSQL database uses PL/R to call R functions. An R call to calculate Spearman's correlation looks as follows:
cor( rank(x), rank(y) )
Also in R, a naïve calculation of a fitted generalized additive model (GAM):
data.frame( x, fitted( gam( y ~ s(x) ) ) )
Here x represents the years from 1900 to 2009 and y is the average measurement (e.g., minimum temperature) for that year.
Problem
The fitted trend line (from the GAM) is reasonably accurate. The problem is that the correlations (displayed in the bottom-left corner of the plot, which is not reproduced here) do not accurately reflect how closely the model fits the data.
Possible Solution
One way to improve the accuracy of the correlation is to use a root mean square error (RMSE) calculation on binned data.
Questions
Q.1. How would you implement the RMSE calculation on the binned data to get a correlation (between 0 and 1) of GAM's fit to the measurements, in the R language?
Q.2. Is there a better way to find the accuracy of GAM's fit to the data, and if so, what is it (e.g., root mean square deviation)?
Attempted Solution 1
Call the PL/R function using the observed amounts and the model (GAM) amounts:
correlation_rmse := climate.plr_corr_rmse( v_amount, v_model );
Define plr_corr_rmse as follows (where o and m represent the observed and modelled data):
CREATE OR REPLACE FUNCTION climate.plr_corr_rmse(
  o double precision[], m double precision[])
RETURNS double precision AS
$BODY$
  sqrt( mean( o - m ) ^ 2 )
$BODY$
LANGUAGE 'plr' VOLATILE STRICT
COST 100;
The o - m is wrong. I'd like to bin both data sets by calculating the mean of every 5 data points (there will be at most 110 data points). For example:
omean <- c( mean(o[1:5]), mean(o[6:10]), ... )
mmean <- c( mean(m[1:5]), mean(m[6:10]), ... )
Then correct the RMSE calculation as:
sqrt( mean( (omean - mmean) ^ 2 ) )
How do you calculate c( mean(o[1:5]), mean(o[6:10]), ... ) for a vector of arbitrary length, using an appropriate number of bins (a bin size of 5, for example, might not be ideal for only 67 measurements)?
I don't think hist is suitable here, is it?
Attempted Solution 2
The following code solves the problem; however, it drops data points from the end of the list (to make the list's length divisible by 5). The solution isn't ideal, as the number 5 is rather magical.
while (length(o) %% 5 != 0) {
  o <- o[-length(o)]
}
omean <- apply(matrix(o, 5), 2, mean)
What other options are available?
Thanks in advance.
You say that:
The problem is that the correlations (shown in the bottom left) do not accurately reflect how closely the model fits the data.
You could calculate the correlation between the fitted values and the measured values:
cor(y,fitted(gam(y ~ s(x))))
I don't see why you want to bin your data, but you could do it as follows:
mean.binned <- function(y, n = 5) {
  apply(matrix(c(y, rep(NA, (n - (length(y) %% n)) %% n)), n),
        2,
        function(x) mean(x, na.rm = TRUE))
}
It looks a bit ugly, but it should handle vectors whose length is not a multiple of the binning length (i.e. 5 in your example).
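For example, a quick check of the padding behaviour:
mean.binned(1:12)  # bins 1:5, 6:10, 11:12 -> 3.0, 8.0, 11.5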
You also say that:
One way to improve the accuracy of the correlation is to use a root mean square error (RMSE) calculation on binned data.
I don't understand what you mean by this. The correlation is a factor in determining the mean squared error - for example, see equation 10 of Murphy (1988, Monthly Weather Review, v. 116, pp. 2417-2424). But please explain what you mean.
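If you do want the binned RMSE from your Q.1, a minimal sketch combining mean.binned with the (correctly parenthesized) RMSE formula would be:
binned.rmse <- function(o, m, n = 5) {
  # square each binned difference before averaging, then take the root
  sqrt(mean((mean.binned(o, n) - mean.binned(m, n))^2))
}
Note that an RMSE is on the scale of the data rather than between 0 and 1; to normalize it you would have to divide by something such as the range or the standard deviation of the observations.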
