I was trying to fit a linear regression in Julia.
I have a data frame with 10 columns. The first 9 columns are the predictors, which I call X, and the last column is the response variable, which I call Y.
I typed linreg(X, Y), but I get an error message saying that linreg has no method matching a DataFrame and a DataArray of Float64.
I was wondering how I could fix the issue. I was thinking of converting X to an Array, so I tried convert(X, Array), but that threw an error as well:
'convert has no method matching convert'
Does anyone have any suggestions?
If you already have your data in a DataFrame, you should take a look at the GLM.jl package.
Specifically, the lm function should do what you want and will feel very familiar if you are an R user.
If you post more code (for example, which columns in your DataFrame store X and Y), we could help you further.
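For illustration, here is a minimal sketch of what that could look like; the data frame df and the column names x1, x2 and y are assumptions for the example, not taken from your post:
using DataFrames, GLM
# Hypothetical data frame standing in for yours: predictor columns plus a response column
df = DataFrame(x1 = randn(100), x2 = randn(100), y = randn(100))
# Ordinary least squares via the R-style formula interface
model = lm(@formula(y ~ x1 + x2), df)
coef(model)   # fitted intercept and slopes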
Update: in Julia 1.0 you have to use the dot operator when performing scalar addition on arrays, i.e. y = m*x .+ b.
You can also do linear regression using simple linear algebra.
Here is an example:
# Linear algebra style
# For simple linear regression: y = m*x .+ b
m = 3.3; b = 2; x = rand(100, 1)
y = m * x .+ b
# add noise
yn = y + randn(size(y)) * 0.5
# regression: solve the least-squares problem with `\`
X = zeros(100, 2); X[:, 1] = x; X[:, 2] .= 1.0
coeff_pred = X \ yn
slope = round(coeff_pred[1], digits=2)
intercept = round(coeff_pred[2], digits=2)
println("The real slope is $m, and the predicted slope is $slope")
println("The real intercept is $b, and the predicted intercept is $intercept")
You are just using convert wrong. The correct syntax is convert(T, x), which reads: convert x to a value of type T.
So basically you need to do:
linreg(convert(Array,X),convert(Array,Y))
and it should work.
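As a side note (an assumption about your setup, since it depends on your DataFrames.jl version): on current DataFrames.jl releases the same conversion is usually written with the Matrix constructor, e.g.
using DataFrames
X = DataFrame(a = 1:3, b = 4:6)   # stand-in for your predictor columns
Xmat = Matrix(X)                  # 3×2 Matrix{Int64}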
I have a linear equation in two variables, z = w*x1 - (1-w)*x2, where w varies from 0 to 1 in steps of 0.1. I want to plot this equation with R.
The constraints are x1 <= 19266669.5 and x2 <= 52575341.065.
I tried the code below, but it didn't work.
w=seq(0,1,0.1)
x1<=19266669.5
x2<=52575341.065
z = w*x1 - (1-w)*x2
plot(w,z,type="l",lwd=2,col="red",main="z = w*x1 - (1-w)*x2")
How should I improve the code? Thank you in advance!
You are assigning values, so this is an assignment error: based on your snippet, you should use the assignment operator <- where you give x1 and x2 their values. Use <= (or whatever relational operator you want) only where you actually mean a comparison.
Assignment operators in R:
x <- value
x <<- value
value -> x
value ->> x
x = value
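Putting that together, a minimal corrected sketch of the snippet from the question could look like this (treating the two bounds as the fixed values of x1 and x2 is an assumption about what was intended):
# assign the values with <- instead of the relational operator <=
x1 <- 19266669.5
x2 <- 52575341.065
w <- seq(0, 1, 0.1)
z <- w * x1 - (1 - w) * x2
plot(w, z, type = "l", lwd = 2, col = "red", main = "z = w*x1 - (1-w)*x2")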
I would like to solve a differential equation in R (with deSolve?) for which I do not have the initial condition, but only the final condition of the state variable. How can this be done?
The typical code is: ode(times, y, parameters, function ...) where y is the initial condition and function defines the differential equation.
Are your equations time reversible, that is, can you change your differential equations so they run backward in time? Most typically this will just mean reversing the sign of the gradient. For example, for a simple exponential growth model with rate r (gradient of x = r*x) then flipping the sign makes the gradient -r*x and generates exponential decay rather than exponential growth.
If so, all you have to do is use your final condition(s) as your initial condition(s), change the signs of the gradients, and you're done.
As suggested by @LutzLehmann, there's an even easier answer: ode can handle negative time steps, so just enter your time vector as (t_end, 0). Here's an example, using x'(t) = r*x (i.e. exponential growth). If x(1) = 3, r = 1, and we want the value at t = 0, analytically we would say:
x(T) = x(0) * exp(r*T)
x(0) = x(T) * exp(-r*T)
= 3 * exp(-1*1)
= 1.103638
Now let's try it in R:
library(deSolve)
g <- function(t, y, parms) { list(parms*y) }
res <- ode(3, times = c(1, 0), func = g, parms = 1)
print(res)
## time 1
## 1 1 3.000000
## 2 0 1.103639
I initially misread your question as stating that you knew both the initial and final conditions. This type of problem is called a boundary value problem and requires a separate class of numerical algorithms from standard (more elementary) initial-value problems.
library(sos)
findFn("{boundary value problem}")
tells us that there are several R packages on CRAN (bvpSolve looks the most promising) for solving these kinds of problems.
Given a differential equation
y'(t) = F(t,y(t))
over the interval [t0, tf], where the final condition y(tf) = yf is given, one can transform this into the standard initial-value form by considering
x(s) = y(tf - s)
==> x'(s) = - y'(tf-s) = - F( tf-s, y(tf-s) )
x'(s) = - F( tf-s, x(s) )
now with
x(0) = x0 = yf.
This should be easy to code using wrapper functions and in the end some list reversal to get from x to y.
Some ODE solvers also allow negative step sizes, so that one can simply give the times for the construction of y in the descending order tf to t0 without using some intermediary x.
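For concreteness, here is a minimal deSolve sketch of that substitution, reusing the exponential-growth example from above (the names f_rhs, g_rev, tf and yf are just illustrative):
library(deSolve)
f_rhs <- function(t, y, parms) list(parms * y)   # original gradient: y'(t) = r*y
tf <- 1; t0 <- 0; yf <- 3; r <- 1
# wrapper implementing x(s) = y(tf - s), so x'(s) = -F(tf - s, x(s))
g_rev <- function(s, x, parms) list(-f_rhs(tf - s, x, parms)[[1]])
s <- seq(0, tf - t0, by = 0.1)
res <- ode(y = yf, times = s, func = g_rev, parms = r)
# the row at s corresponds to y(tf - s); reverse the rows to read y from t0 up to tf
y_path <- res[nrow(res):1, ]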
I am currently using Measurements.jl for error propagation and LsqFit.jl for fitting functions to data. Is there a simple way to fit a function to data with errors? It would be no problem to use an other package if that makes things easier.
Thanks in advance for your help.
While in principle it should be possible to make these packages work together, the implementation of LsqFit.jl does not seem to play nicely with the Measurement type. However, if one writes a simple least-squares linear regression directly
# Generate test data, with noise
x = 1:10
y = 2x .+ 3
using Measurements
x_observed = (x .+ randn.()) .± 1
y_observed = (y .+ randn.()) .± 1
# Simple least-squares linear regression
# for an equation of the form y = a + bx
# using `\` for matrix division
linreg(x, y) = hcat(fill!(similar(x), 1), x) \ y
(a, b) = linreg(x_observed, y_observed)
then
julia> (a, b) = linreg(x_observed, y_observed)
2-element Vector{Measurement{Float64}}:
3.9 ± 1.4
1.84 ± 0.23
This ought to be able to work with either x uncertainties, y uncertainties, or both.
If you need a nonlinear least-squares fit, it should also be possible to extend the above approach to nonlinear least squares -- though for the latter it may be easier to just find where the incompatibility is in LsqFit.jl and make a PR.
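One small aside that is not from the original answer: a full nonlinear solver is only needed when the model is nonlinear in its parameters. A polynomial, say, is still linear in its coefficients, so the same backslash trick applies; a hypothetical quadratic sketch reusing x_observed and y_observed from above:
# fit y = a + b*x + c*x^2, still ordinary linear least squares in (a, b, c)
quadreg(x, y) = hcat(fill!(similar(x), 1), x, x .^ 2) \ y
(a, b, c) = quadreg(x_observed, y_observed)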
I want to calculate the differential response of y to x (continuous) depending on the categorical variable z.
In the standard lm setup:
lm(y ~ x:z)
I want to do this while allowing for Impulse Indicator Saturation (IIS) in the 'gets' package, but the following syntax produces an error:
isat(y, mxreg=x:z, iis=TRUE)
The error message is of the form:
Error in solve.qr(out, tol = tol, LAPACK = LAPACK) :
  singular matrix 'a' in 'solve'
1: In x:z : numerical expression has 96 elements: only the first used
2: In x:z : numerical expression has 96 elements: only the first used
How should I modify the syntax?
Thank you!
At the moment, alas, isat doesn't provide the same functionality as lm on categorical/character variables, nor on using * and :. We hope to address that in a future release.
In the meantime you'll have to create distinct variables in your dataset representing the interaction. I guess something like the following...
library(gets)
# simulated example: four groups with different slopes on x
N <- 100
x <- rnorm(N)
z <- c(rep("A", N/4), rep("B", N/4), rep("C", N/4), rep("D", N/4))
e <- rnorm(N)
y <- 0.5*x*as.numeric(z=="A") + 1.5*x*as.numeric(z=="B") - 0.75*x*as.numeric(z=="C") + 5*x*as.numeric(z=="D") + e
# lm handles the interaction through the formula interface
lm.reg <- lm(y ~ x:z)
# arx/isat do not: outside a formula, x:z is the sequence operator, which is what
# produces the "only the first element used" warnings in your question
# arx.reg.0 <- arx(y, mxreg = x:z)
# so build one interaction dummy per level of z by hand
data <- data.frame(y, x, z, stringsAsFactors = FALSE)
for (i in z[duplicated(z) == FALSE]) {
  data[[paste("Zx", i, sep = ".")]] <- data$x * as.numeric(data$z == i)
}
# include x plus three of the four dummies (using all four alongside x would be perfectly collinear)
arx.reg.1 <- arx(data$y, mxreg = data[, c("x", "Zx.A", "Zx.B", "Zx.C")])
isat.1 <- isat(data$y, mc = TRUE, mxreg = data[, c("x", "Zx.A", "Zx.B", "Zx.C")], max.block.size = 20)
Note that as you'll be creating dummies for each category, there's a chance those dummies will cause singularity of your matrix of explanatory variables (if, as in my example, isat automatically uses 4 blocks). Using the argument max.block.size enables you to avoid this problem.
Let me know if I haven't addressed your particular point.