Non-negative linear regression with xgboost in R

This is my first question here, so I'm sorry if it's not properly asked.
I'm playing around with the xgboost function in R, and I was wondering if there is a simple parameter I could change so that my linear regression (objective=reg:linear) is restricted to non-negative coefficients. I know I can use nnls for non-negative least squares regression, but I would prefer a stepwise solution like the one xgboost offers.
If there is no easy way but a complicated one, I would be happy to hear that too. I read there is an option to build custom objective functions, so maybe the reg:linear objective could be changed at some point to enforce non-negativity?
Thank you very much in advance for your advice!
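For reference, the custom-objective hook mentioned above looks roughly like the sketch below, assuming the classic xgb.train(obj = ...) interface of the R package. It is only a minimal reproduction of plain squared error (i.e. what reg:linear does), not a non-negativity constraint: the objective controls the loss on predictions, so by itself it cannot restrict the sign of gblinear's coefficients. The toy data (mtcars) is an illustrative assumption.

library(xgboost)

X <- as.matrix(mtcars[, c("wt", "hp", "disp")])
y <- mtcars$mpg
dtrain <- xgb.DMatrix(data = X, label = y)

# squared-error objective: gradient and hessian of 0.5 * (pred - label)^2
sq_err_obj <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  list(grad = preds - labels, hess = rep(1, length(preds)))
}

fit <- xgb.train(params = list(booster = "gblinear"),
                 data = dtrain, nrounds = 50, obj = sq_err_obj)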

Related

How do I find the exact equation for a Caret model in R?

I used caret to create a regression model of a dataset in R, and I wish to find the model's equation for use elsewhere (e.g. Desmos). I am unable to find info anywhere on how to do this, so if anyone has answers, that would be much appreciated! :D
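For a linear method this usually amounts to pulling the coefficients out of the final model; a minimal sketch, assuming method = "lm" and the built-in mtcars data purely for illustration:

library(caret)

# only models with a closed-form equation (e.g. linear regression) have one to extract
model <- train(mpg ~ wt + hp, data = mtcars, method = "lm")
coef(model$finalModel)  # intercept and slopes: mpg ~ b0 + b1*wt + b2*hp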

R: Evaluate Gradient Boosting Machines (GBM) for Regression

Which metrics are best for evaluating the fit of a GBM regression algorithm in R (metrics, graphs, ratios), and how should they be interpreted?
I think maybe you are overthinking this one! Take a step back and think about what matters: the error. You have forecasted values and you have observed values; the difference tells you most of what you need to know when comparing across models. Basic measures like MSE, MPE, etc. should do fine. If you are looking to refine within a given model, I would recommend taking a look at the gbm documentation. For example, you can pass your gbm model object to summary() to get the relative influence of each of your variables. You can find a lot of information in the documentation, so if you haven't taken a look, I would recommend doing so! I have posted the link at the bottom.
-Carmine
gbm_documentation
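To make that concrete, here is a minimal sketch of the error-plus-relative-influence workflow; mtcars and the tuning values are illustrative assumptions, not recommendations.

library(gbm)

set.seed(1)
fit <- gbm(mpg ~ ., data = mtcars, distribution = "gaussian",
           n.trees = 500, interaction.depth = 2,
           shrinkage = 0.05, n.minobsinnode = 5)
pred <- predict(fit, mtcars, n.trees = 500)
mean((mtcars$mpg - pred)^2)   # MSE: forecasted vs. observed values
summary(fit, plotit = FALSE)  # relative influence of each variable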

Initial parameters in nonlinear regression in R

I want to learn how to do nonlinear regression in R. I managed to learn the basics of the nls function, but, as we know, good initial parameters are crucial in nonlinear regression. I tried to figure out how the selfStart and getInitial functions work, but failed. The documentation is very scarce and not very useful. I wanted to learn these functions via simple simulated data. I simulated data from a logistic model:
n <- 100    # number of observations
d <- 10000  # our parameters
b <- -2
e <- 50
set.seed(n)
X <- rnorm(n, -e/b, 2)  # concentrates observations near the point where the logistic function grows fastest
Y <- d / (1 + exp(b*X + e)) + rnorm(n, 0, 200)  # simulated noisy responses
Now I want to do regression with the function f(x) = d/(1 + exp(b*x + e)), but I don't know how to use selfStart or getInitial. Could you help me? But please, don't tell me about SSlogis. I'm aware it's a function designed to find initial parameters in logistic regression, but it seems to work only in regression with one explanatory variable, and I'd like to learn how to do logistic regression with more than one explanatory variable, and even general nonlinear regression with a function that I defined myself.
I will be very grateful for your help.
I don't know why the computation of good initial parameters fails in R. The aim of my answer is to provide a method to find good enough initial parameters.
Note that a non-iterative method exists which doesn't require initial parameters. The principle is explained in this paper, pp. 37-46: https://fr.scribd.com/doc/14674814/Regressions-et-equations-integrales
A simplified version is shown below.
If the results are not sufficient, they can be used as initial parameters in a usual nonlinear regression tool such as those available in R.
A numerical example is shown below. Usually the number of points is much higher; here it is deliberately low to make it easier to check the code when editing it.
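As an illustration of the general idea (rough estimates first, a standard fit afterwards), here is a hedged sketch that uses a simple linearization of the asker's model to get starting values, rather than the integral-equation method from the paper; it continues with the X and Y simulated in the question.

d0 <- 1.05 * max(Y)        # d must exceed every observed Y
ok <- Y > 0 & Y < d0       # the log transform below needs 0 < Y < d0
z  <- log(d0 / Y[ok] - 1)  # if Y = d/(1+exp(b*X+e)), then log(d/Y - 1) = b*X + e
lin <- lm(z ~ X[ok])       # slope estimates b, intercept estimates e
fit <- nls(Y ~ d / (1 + exp(b * X + e)),
           start = list(d = d0, b = coef(lin)[[2]], e = coef(lin)[[1]]))
summary(fit)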

Standard error of the ARIMA constant

I am trying to manually calculate the standard error of the constant in an ARIMA model, when one is included. I have referred to the Box and Jenkins (1994) text, specifically Section 7.2, but my understanding is that the methods mentioned there calculate the variance-covariance matrix for the ARIMA parameters only, not the constant. I tried searching on the Internet but couldn't find any theory. Software like Minitab, R, etc. calculates this, so I was wondering how it is done. Can someone provide any pointer(s) on this topic?
Thanks.
arima() will fit a regression model with ARMA errors. The constant is treated as the coefficient of a regression variable consisting only of 1s, so you need the covariance matrix of the regression coefficients, which is usually calculated separately from the covariance matrix of the ARMA coefficients. Look at Section 8.3 of Hamilton's "Time Series Analysis".
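A hedged sketch of reading those standard errors straight out of R's arima() output (the simulated series and model order are assumptions):

set.seed(1)
x <- arima.sim(list(ar = 0.5), n = 200) + 10  # AR(1) around a non-zero level
fit <- arima(x, order = c(1, 0, 0))           # include.mean = TRUE is the default
sqrt(diag(fit$var.coef))  # standard errors; the "intercept" row is the estimated mean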
One of the nicest things about R is that you can access a lot of R's own source code from within the environment. If you simply type arima at the command prompt, you get the high-level source code for the arima() function; I got several pages of code when I tried it.
You do miss out on anything implemented in native code inside the R executable, but often the high-level code tells you everything you want to know.
Perhaps a shift of perspective can solve this problem.
Rather than seeing the constant as something special, just consider the problem without a constant but with an extra regressor that is a vector of ones, as sketched below.
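A hedged sketch of that perspective with R's arima() (the simulated series is chosen purely for illustration):

set.seed(1)
x <- arima.sim(list(ar = 0.5), n = 200) + 10
fit <- arima(x, order = c(1, 0, 0), include.mean = FALSE,
             xreg = rep(1, length(x)))
fit$coef                  # AR coefficient plus the coefficient of the ones regressor
sqrt(diag(fit$var.coef))  # its standard error comes out of the same matrix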

How can I estimate the logarithmic form of data points using R?

I have data points that represent a logarithmic function.
Is there an approach where I can just estimate the function that describes this data using R?
Thanks.
I assume that you mean that you have vectors y and x and you are trying to fit a function y(x) = A log(x).
First of all, fitting the log directly is a bad idea because it doesn't behave well numerically. Luckily we have the inverse x(y) = exp(y/A), so we can fit an exponential function instead, which is much more convenient. We can do it using nonlinear least squares:
nls(x ~ exp(y/A), start = list(A = 1), algorithm = "port")
where start is an initial guess for A. This approach is a numerical optimization, so it may fail.
The more stable way is to transform the relation to a linear one, log(x(y)) = y/A, and fit a straight line using lm:
lm(log(x) ~ y)
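A self-contained sketch of both routes on simulated data (the true A = 3 and the noise level are arbitrary assumptions):

set.seed(42)
x <- seq(1, 50, length.out = 100)
y <- 3 * log(x) + rnorm(100, sd = 0.1)

# nonlinear fit of the inverse relation; may fail from a poor start, as noted above
fit_nls <- nls(x ~ exp(y / A), start = list(A = 1),
               algorithm = "port", lower = 0.1)
coef(fit_nls)

# linearized fit: log(x) = y/A, so the slope of log(x) ~ y estimates 1/A
fit_lm <- lm(log(x) ~ y)
1 / coef(fit_lm)[["y"]]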
If I understand correctly, you want to estimate a function given some (x, y) values of it. If so, check the following links.
Read about this:
http://en.wikipedia.org/wiki/Spline_%28mathematics%29
http://en.wikipedia.org/wiki/Polynomial_interpolation
http://en.wikipedia.org/wiki/Newton_polynomial
http://en.wikipedia.org/wiki/Lagrange_polynomial
Googled it:
http://www.stat.wisc.edu/~xie/smooth_spline_tutorial.html
http://stat.ethz.ch/R-manual/R-patched/library/stats/html/smooth.spline.html
http://www.image.ucar.edu/GSP/Software/Fields/Help/splint.html
I have never used R, so I am not sure whether that works, but if you have Matlab I can explain more.
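For completeness, a minimal sketch of the smooth.spline route from the R links above, on toy data:

set.seed(1)
x <- seq(1, 50, length.out = 100)
y <- 3 * log(x) + rnorm(100, sd = 0.1)
sm <- smooth.spline(x, y)        # cross-validated smoothing spline
plot(x, y)
lines(predict(sm), col = "red")  # fitted smooth curve through the points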
