How to calculate CAPM variables in Julia?

In Python, using the stats module of the SciPy package, the CAPM variables beta, alpha, r, p, and std_err can be calculated as follows:
beta, alpha, r_value, pvalue, std_err = stats.linregress(stock_rtn_arr, mkt_rtn_arr)
Please guide me in calculating the above variables in Julia.

I'm assuming you are looking to run a simple OLS model, which in Julia can be fit using the GLM package:
julia> using GLM, DataFrames
julia> mkt_rtn_arr = randn(500); stock_rtn_arr = 0.5*mkt_rtn_arr .+ rand();
julia> df = DataFrame(mkt_rtn = mkt_rtn_arr, stock_rtn = stock_rtn_arr);
julia> linear_model = lm(@formula(stock_rtn ~ mkt_rtn), df)
StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}
stock_rtn ~ 1 + mkt_rtn
Coefficients:
──────────────────────────────────────────────────────────────────────────────
                Estimate   Std. Error     t value  Pr(>|t|)  Lower 95%  Upper 95%
──────────────────────────────────────────────────────────────────────────────
(Intercept)     0.616791  7.80308e-18  7.90446e16    <1e-99   0.616791   0.616791
mkt_rtn         0.5       7.78767e-18  6.42041e16    <1e-99   0.5        0.5
──────────────────────────────────────────────────────────────────────────────
You can then extract the parameters of interest from the linear_model:
julia> β = coef(linear_model)[2]
0.4999999999999999
julia> α = coef(linear_model)[1]
0.6167912017573035
julia> r_value = r2(linear_model)
1.0
julia> pvalues = coeftable(linear_model).cols[4]
2-element Array{Float64,1}:
0.0
0.0
julia> stderror(linear_model)
2-element Array{Float64,1}:
7.803081577574428e-18
7.787667394841443e-18
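One caveat when mapping these onto scipy's five return values (my note, not part of the original answer): linregress returns the correlation coefficient r rather than R², and a single standard error for the slope rather than one per coefficient. A minimal sketch of the scipy-equivalent quantities, reusing linear_model from above (the degenerate values occur because the simulated data has no noise term):
julia> r_value = sign(coef(linear_model)[2]) * sqrt(r2(linear_model))  # r, not R²
1.0
julia> std_err = stderror(linear_model)[2]  # scipy's std_err is the slope's standard error
7.787667394841443e-18
julia> p_value = coeftable(linear_model).cols[4][2]  # p-value for the slope
0.0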
Note that I have used the @formula API to run the regression, which requires putting your data into a DataFrame. In my opinion this is the preferred way of estimating a linear model in GLM, as it allows much more flexibility in specifying the model. Alternatively, you could call lm(X, y) directly on arrays for your X and y variables:
julia> lm([ones(length(mkt_rtn_arr)) mkt_rtn_arr], stock_rtn_arr)
LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}}:
Coefficients:
─────────────────────────────────────────────────────────────────────
      Estimate   Std. Error     t value  Pr(>|t|)  Lower 95%  Upper 95%
─────────────────────────────────────────────────────────────────────
x1    0.616791  7.80308e-18  7.90446e16    <1e-99   0.616791   0.616791
x2    0.5       7.78767e-18  6.42041e16    <1e-99   0.5        0.5
─────────────────────────────────────────────────────────────────────
Note that here I have appended a column of ones to the market return array to estimate the model with an intercept, which the @formula macro does automatically (similar to the way it's done in R).
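Putting the pieces together, here is a sketch of a scipy-style helper; linregress here is my own name for illustration, not a GLM function:
using GLM

function linregress(x, y)
    m = lm([ones(length(x)) x], y)   # design matrix with an intercept column
    β, α = coef(m)[2], coef(m)[1]
    r = sign(β) * sqrt(r2(m))        # correlation coefficient, as scipy returns
    p = coeftable(m).cols[4][2]      # p-value of the slope
    se = stderror(m)[2]              # standard error of the slope
    return β, α, r, p, se
end

beta, alpha, r_value, pvalue, std_err = linregress(mkt_rtn_arr, stock_rtn_arr)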

Related

how to add an intercept for linear regression when using matrix as an input in GLM Julia

I am trying to use linear regression in GLM from Julia, with a matrix as inputs rather than a DataFrame.
The inputs are:
julia> x
4×2 Matrix{Int64}:
 1  1
 2  2
 3  3
 4  4
julia> y
4-element Vector{Int64}:
0
2
4
6
But when I tried to fit it using the lm function, I found that the intercept is not included by default:
julia> lr = lm(x, y)
LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}:
Coefficients:
───────────────────────────────────────────────────────────────
       Coef.  Std. Error     t  Pr(>|t|)   Lower 95%  Upper 95%
───────────────────────────────────────────────────────────────
x1  0.666667   1.11848e7  0.00    1.0000  -4.81244e7  4.81244e7
x2  0.666667   1.11848e7  0.00    1.0000  -4.81244e7  4.81244e7
───────────────────────────────────────────────────────────────
I checked the official docs of GLM, but they only explain the usage of DataFrames as input. Is there a way of adding an intercept to the model when using matrices as input, without altering the input (such as adding a column of 1s to x)?
If you are using the X, y method, you are responsible for constructing the design matrix yourself. If you do not want to do that, use the formula method. This requires a bit of intermediate setup with your example, as the data needs to be in tabular form, but you can just create a named tuple:
data = @views (; y, x1 = x[:, 1], x2 = x[:, 2])
lm(@formula(y ~ 1 + x1 + x2), data)
If you have a dataframe or similar at hand, you can (probably) directly use it.
(IIRC, you could also just write @formula(y ~ x1 + x2), and it will add the intercept automatically, as in R. But I prefer the explicit specification.)
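For completeness, here is a minimal sketch of the design-matrix route mentioned above (it does alter the input by prepending a column of ones, which the question wanted to avoid, but it is the standard way to get an intercept with the X, y method):
using GLM

x = [1 1; 2 2; 3 3; 4 4]
y = [0.0, 2.0, 4.0, 6.0]

# prepend a column of ones so the first fitted coefficient is the intercept
X = [ones(size(x, 1)) x]
lm(X, y)
Note that the two columns of x in this example are identical, so the design matrix is rank-deficient; lm handles this via a pivoted Cholesky factorization (visible in the CholeskyPivoted type in the output above), which is why the fit succeeds despite the collinearity.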

Simple Linear Regression Error no matching methods

I am attempting to mess around with a simple autoregressive model and need to perform a simple linear regression in Julia, but I am running into an issue that says
ERROR: MethodError: no method matching fit(::Type{LinearModel}, ::Matrix{Float64}, ::Matrix{Float64}, ::Nothing)
My code is
using XLSX, DataFrames, GLM, StatsBase
Quakes = DataFrame(XLSX.readtable("/Users/jjtan/Downloads/QUAKE.xlsx", "quakes"))
for i in 1:98
    Quakes[!, Symbol("t", i)] = [zeros(i); Quakes[1:end-i, :x]]
end
ols = lm(@formula(x ~ t1), Quakes)
Note that
typeof(Quakes)
returns a DataFrame (99×99).
I have additionally tried the regression from the documentation to ensure that everything is working properly, i.e.
using DataFrames, GLM
data = DataFrame(X=[1,2,3], Y=[2,4,7])
ols = lm(@formula(Y ~ X), data)
which produced the expected output:
StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, Matrix{Float64}}
Y ~ 1 + X
Coefficients:
─────────────────────────────────────────────────────────────────────────
                 Coef.  Std. Error      t  Pr(>|t|)  Lower 95%  Upper 95%
─────────────────────────────────────────────────────────────────────────
(Intercept)  -0.666667    0.62361   -1.07    0.4788   -8.59038    7.25704
X             2.5         0.288675   8.66    0.0732   -1.16797    6.16797
─────────────────────────────────────────────────────────────────────────
I have tried changing the variables to use df.var notation as well as :var notation, to no avail. I am quite confused as to why, with pretty much identical lines of code, one runs into an error and the other does not.

Julia: How to find best fit curve / equation when I have a plot?

I have a plot which I made with map, but I need to find a quadratic equation that fits the underlying data.
As said in comments, having a plot is not really relevant here; only the data itself is. You can use packages such as GLM to build (Generalized) Linear Models of your data, and possibly plot them or use them to predict new outcomes.
Here is a simple example. Let's first create sample data:
using Plots
using DataFrames
df = DataFrame(x = sort(rand(100)))
df.y = 1 .+ 2*df.x .+ 3*df.x.^2 .+ 0.1*randn(100) # y = 1 + 2x + 3x² + noise
scatter(df.x, df.y, label="data")
and build a 2nd order linear model out of it:
using GLM
model = lm(@formula(y ~ 1 + x + x^2), df) # Note how the formula looks exactly like the model you want to build
plot!(df.x, predict(model, df), label="model")
you should get something like the following:
julia> model
StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}
y ~ 1 + x + :(x ^ 2)
Coefficients:
────────────────────────────────────────────────────────────────────────
                Coef.  Std. Error       t  Pr(>|t|)  Lower 95%  Upper 95%
────────────────────────────────────────────────────────────────────────
(Intercept)  1.04201   0.0073252  142.25    <1e-99    1.02747    1.05655
x            2.04349   0.0332272   61.50    <1e-78    1.97754    2.10944
x ^ 2        2.95854   0.0321212   92.11    <1e-95    2.89478    3.02229
────────────────────────────────────────────────────────────────────────
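To draw the fitted curve on a finer grid than the training points, you can predict on new data; predict accepts any table with the model's input column, and the x^2 term in the formula is recomputed automatically (a small sketch, with the grid chosen arbitrarily):
julia> newdf = DataFrame(x = 0:0.01:1);
julia> plot!(newdf.x, predict(model, newdf), label="model (fine grid)")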
The Polynomials package is a bit less intimidating than GLM. You can use the fit function in that package to obtain a Polynomial of best fit for any provided order (degree). Given some arbitrary (x,y) data, you can create and plot the polynomial of best fit as below.
julia> using Polynomials
julia> x=1:10;
julia> y=rand(10);
julia> quadfit=fit(x,y,2)
Polynomial(-0.06970526100724156 + 0.1638766946706202*x - 0.008058423435867207*x^2)
julia> using Plots
julia> plot(x,y,label="Data")
julia> plot!(quadfit,x[1],x[end],label="Quadratic Fit")
julia> savefig("Data and Curve Fit.png")
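Once you have the fit, Polynomials also makes it easy to inspect and use (a short sketch reusing quadfit from above):
julia> coeffs(quadfit)       # fitted coefficients, constant term first
julia> quadfit(5.5)          # evaluate the fitted polynomial at x = 5.5
julia> derivative(quadfit)   # the derivative, itself a Polynomial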

Calculate α and β in Probit Model in R

I am facing the following issue: I want to calculate α and β from the following probit model in R, which is defined as:
Probability = F(α + β · sprd)
where sprd denotes the explanatory variable, α and β are constants, and F is the cumulative normal distribution function.
I can calculate probabilities for the entire dataset, the coefficients (see code below), etc., but I do not know how to get the constants α and β.
The purpose is to determine the spread in Excel that corresponds to a certain probability, e.g. which spread corresponds to 50%?
Thank you in advance!
Probit model coefficients
probit <- glm(Y ~ X, family = binomial(link = "probit"))
summary(probit)
Call:
glm(formula = Y ~ X, family = binomial(link = "probit"))
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.4614  -0.6470  -0.3915  -0.2168   2.5730
Coefficients:
              Estimate   Std. Error  z value  Pr(>|z|)
(Intercept)  -0.3566755   0.0883634   -4.036  5.43e-05 ***
X            -0.0058377   0.0007064   -8.264   < 2e-16 ***
From the help("glm") page you can see that the object returns a value named coefficients.
An object of class "glm" is a list containing at least the following components:
coefficients: a named vector of coefficients
So after you call glm() that object will be a list, and you can access each element using $name_element.
Reproducible example (not a probit model, but it works the same way):
counts <- c(18,17,15,20,10,20,25,13,12)
outcome <- gl(3,1,9)
treatment <- gl(3,3)
d.AD <- data.frame(treatment, outcome, counts)
# fit model
glm.D93 <- glm(counts ~ outcome + treatment, family = poisson())
Now glm.D93$coefficients will print the vector with all the coefficients:
glm.D93$coefficients
# (Intercept) outcome2 outcome3 treatment2 treatment3
#3.044522e+00 -4.542553e-01 -2.929871e-01 1.337909e-15 1.421085e-15
You can assign that and access each individually:
coef <- glm.D93$coefficients
coef[1] # your alpha
#(Intercept)
# 3.044522
coef[2] # your beta
# outcome2
#-0.4542553
I've seen in your deleted post that you are not convinced by @RLave's answer. Here are some simulations to convince you:
# (large) sample size
n <- 10000
# covariate
x <- (1:n)/n
# parameters
alpha <- -1
beta <- 1
# simulated data
set.seed(666)
y <- rbinom(n, 1, prob = pnorm(alpha + beta*x))
# fit the probit model
probit <- glm(y ~ x, family = binomial(link="probit"))
# get estimated parameters - very close to the true parameters -1 and 1
coef(probit)
# (Intercept) x
# -1.004236 1.029523
The estimated parameters are given by coef(probit), or probit$coefficients.
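Neither answer spells out the final step the question asked for, so here it is (my addition; it follows directly from the model definition): since Probability = F(α + β · sprd), the spread corresponding to a target probability p is

sprd = (F⁻¹(p) − α) / β

where F⁻¹ is the standard normal quantile function. For p = 0.5, F⁻¹(0.5) = 0, so the spread is simply −α/β.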

obtain derivative by spline interpolation

There is a series of x and y values that I have (but not the function itself). I would like to get the derivative of the unknown function by spline interpolation of the x and y values (getting the derivative...).
My example
x<-c(1,2,3,4,5,6,7,8,9,10)
y<-c(0.1,0.3,0.8,0.9,0.91,0.93,0.95,0.98,0.99,0.999)
Is it possible in R to interpolate and to get the functional form of the derivative?
My problem is that I only have x and y values of a CDF but need to obtain the probability density function, so I want to get the derivative by spline interpolation.
The reason for the question is that I need to obtain the PDF of that CDF, so I am trying to spline-interpolate the (x, y) values of the CDF. Please note that this is a simple example and not a real CDF.
I haven't found the functional form of restricted cubic splines to be particularly difficult to grasp after reading the explanation by Frank Harrell in his book: "Regression Modeling Strategies".
require(rms)
df <- data.frame(x = c(1,2,3,4,5,6,7,8,9,10),
                 y = c(12,2,-3,5,6,9,8,10,11,10.5))
ols( y ~ rcs(x, 3), df)
#--------------
Linear Regression Model

ols(formula = y ~ rcs(x, 3), data = df)

                 Model Likelihood    Discrimination
                    Ratio Test          Indexes
Obs       10    LR chi2      3.61    R2       0.303
sigma 4.4318    d.f.            2    R2 adj   0.104
d.f.       7    Pr(> chi2) 0.1646    g        2.811

Residuals
    Min      1Q  Median      3Q     Max
-8.1333 -1.1625  0.5333  0.9833  6.9000

           Coef    S.E.    t     Pr(>|t|)
Intercept  5.0833  4.2431  1.20  0.2699
x          0.0167  1.1046  0.02  0.9884
x'         1.0000  1.3213  0.76  0.4738
#----------
The rms package has an odd system for storing summary information that needs to be set up before some of its special functions will work:
dd <- datadist(df)
options(datadist="dd")
mymod <- ols( y ~ rcs(x, 3), df)
# cannot imagine that more than 3 knots would make sense in such a small example
Function(mymod)
# --- reformatted to allow inspection of separate terms
function(x = 5.5) {5.0833333+0.016666667* x +
1*pmax(x-5, 0)^3 -
2*pmax(x-5.5, 0)^3 +
1*pmax(x-6, 0)^3 }
<environment: 0x1304ad940>
The zeros in the pmax functions basically suppress any contribution to the total from a term when the x value is less than its knot (5, 5.5, and 6 in this case).
Compare three versus four knots (and if you wanted smooth curves then include a finer grained ...-data argument to Predict):
png()
plot(df$x,df$y )
mymod <- ols( y ~ rcs(x, 3), df)
lines(df$x, predict(mymod) ,col="blue")
mymod <- ols( y ~ rcs(x, 4), df)
lines(df$x, predict(mymod) ,col="red")
dev.off()
Take a look at monotone cubic splines, which are nondecreasing by construction. A web search for "monotone cubic spline R" turns up some hits. I haven't used any of the packages mentioned.
