I need to run a non-linear least squares regression on an entire data set, and then repeat the regression on several subsets of that data set. I can do this for a single subset; for example (where y is a generic logistic equation, and x is a vector from 1 to 20):
example = nls(x ~ y, subset = c(2:20))
but I want to do this for 3:20, 4:20, 5:20, etc. I tried a for loop:
datasubsets <- sapply(2:19, seq, to = 20)
for (i in 1:19){
example[i] = nls(x ~ y, subset = datasubsets[i])
}
but I receive "Error in xj[i] : invalid subscript type 'list'". I would very much like to avoid having to copy and paste nls() 20 times. Any help is much appreciated.
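This does the job: lapply(2:19, function(jj) nls(x ~ y, subset = jj:20)). I use lapply rather than sapply so that each fit stays intact as one element of a list. Your loop fails because datasubsets is a list, so datasubsets[i] is itself a one-element list; datasubsets[[i]] would give the index vector.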
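To make the pattern concrete, here is a minimal self-contained sketch. The data frame d, the simulated logistic data, the starting values, and the choice of eight subsets are all made up for illustration and are not taken from the question:

```r
# Simulated logistic data (hypothetical stand-in for the real x and y)
set.seed(1)
d <- data.frame(x = 1:20)
d$y <- 10 / (1 + exp(-(d$x - 10) / 2)) + rnorm(20, sd = 0.1)

# First row of each subset: rows 1:20, 2:20, ..., 8:20
first_rows <- 1:8

# lapply keeps every fitted nls object intact as one list element
fits <- lapply(first_rows, function(j) {
  nls(y ~ Asym / (1 + exp(-(x - xmid) / scal)),
      data = d,
      start = list(Asym = 10, xmid = 10, scal = 2),
      subset = j:20)
})

# One row of coefficients per subset
coefs <- t(sapply(fits, coef))
```

Each element of fits can then be used like any single nls result, for example summary(fits[[3]]).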
Related
I have a csv file (single column, numeric values) called "y" that consists of zeros and ones where the rows with the value 1 indicate the target variable for logistic regression, and another file called "x" with the same number of rows and with columns of numeric predictor values. How do I load these so that I can then use cv.glmnet, i.e.
x <- read.csv('x',header=FALSE,sep=",")
y <- read.csv('y',header=FALSE )
is throwing an error
Error in y %*% rep(1, nc) :
requires numeric/complex matrix/vector arguments
when I call
cvfit = cv.glmnet(x, y, family = "binomial")
I know that "y" should be loaded as a "factor," but how do I do this? My online searches have found all sorts of approaches that have just confused me. What is the simple one-liner to just load this data ready for glmnet?
cv.glmnet requires the predictors as a matrix and the response as a vector (or factor), not as data frames. You can use the following code
xmat = as.matrix(x)
yvec = y[[1]]  # the single column of y; as.vector(y) does not turn a data frame into an atomic vector
Then use
cvfit = cv.glmnet(xmat, yvec, family = "binomial")
If you can provide your data in dput() format, I can give it a try.
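As a rough end-to-end sketch of the loading step: the temporary files and the random contents below are invented so the example runs on its own, and the cv.glmnet call is only attempted if the glmnet package happens to be installed.

```r
# Write two small CSV files (hypothetical stand-ins for the files "x" and "y")
set.seed(42)
x_file <- tempfile(fileext = ".csv")
y_file <- tempfile(fileext = ".csv")
write.table(matrix(rnorm(200 * 5), ncol = 5), x_file,
            sep = ",", row.names = FALSE, col.names = FALSE)
write.table(rbinom(200, 1, 0.5), y_file,
            sep = ",", row.names = FALSE, col.names = FALSE)

# read.csv always returns data frames
x <- read.csv(x_file, header = FALSE)
y <- read.csv(y_file, header = FALSE)

# Convert to the shapes cv.glmnet expects
xmat <- as.matrix(x)  # numeric matrix of predictors
yvec <- y[[1]]        # plain 0/1 vector taken from the single column

# Only run the fit when glmnet is available
if (requireNamespace("glmnet", quietly = TRUE)) {
  cvfit <- glmnet::cv.glmnet(xmat, yvec, family = "binomial")
}
```

If a factor response is preferred, factor(yvec) should work in place of yvec for family = "binomial".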
I am new to R and trying to get something done, so apologies in advance if this is a stupid way of doing it.
I am trying to get the coefficients and the relevance of the x-values to the y-values. The values in X are the criteria against which relevance is being tested.
I need to find the positive or negative relevance/confidence for the results represented in myList. Rather than putting one column in Y manually, I just want to iterate through the list and get the result for each column.
library(rms)
parameters <- read.csv(file="C:/Users/manjaria/Documents/Lek papers/validation_csv.csv", header=TRUE)
#attach(parameters)
myList <- c("name1","name2","name3","name4","name5")
for (cnt in seq(length(myList))) {
Y<- cbind(myList[cnt])
X<- cbind(age,female,income,employed,traveldays,modesafety,prPoolsize)
XVar <-c("age","female","income","employed","traveldays","modesafety","prPoolsize")
summary (Y)
summary (X)
table(Y)
ddist<- datadist(XVar)
options(datadist = 'ddist')
ologit<- lrm(Y ~ X, data = parameters)
print(ologit)
fitted<- predict(ologit, newdata=parameters, type = "fitted.ind")
colMeans(fitted)
}
I encounter:
Error in model.frame.default(formula = Y ~ X, data = parameters, na.action = function (frame) :
variable lengths differ (found for 'X')
If I don't use the for loop and instead use a static name for Y, like
Y<- cbind(name1)
it works well.
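One possible way around this, offered only as a sketch: myList[cnt] is a character string, not the column it names, so the formula can be built from the strings with reformulate(). The data frame below is simulated, uses only three of the predictors, and uses glm() instead of lrm() so that it runs without the rms package. Whether the same formula object drops straight into lrm(f, data = parameters) is an assumption I have not tested here.

```r
# Hypothetical stand-in for the CSV: two binary outcomes, three predictors
set.seed(1)
parameters <- data.frame(
  name1  = rbinom(100, 1, 0.5),
  name2  = rbinom(100, 1, 0.5),
  age    = rnorm(100, 40, 10),
  female = rbinom(100, 1, 0.5),
  income = rnorm(100, 50, 15)
)
myList <- c("name1", "name2")
XVar   <- c("age", "female", "income")

fits <- list()
for (resp in myList) {
  # Builds e.g. name1 ~ age + female + income from the strings
  f <- reformulate(XVar, response = resp)
  fits[[resp]] <- glm(f, data = parameters, family = binomial)
}
```

Inside a loop, results such as summary(fits[[resp]]) need an explicit print() to show up.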
I'm fitting a GARCH model to the residuals of an ARIMA model, and trying to apply ARCH(p) for p from 1 to 10 to compare the fits. Here is my code. The for loop returns errors, but I cannot figure out why. Could anyone give some tips?
For the single value p=1 the code is as below, and there is no problem.
fitone<- garchFit(~garch(1,0),data=logprice)
coef(fitone)
summary(fitone)
And for the for loop my code goes like
for (n in 1:10) {
fit [[n]]<- garchFit(~garch(n,0),data=logprice)
coef(fit[[n]])
summary(fit[[n]])
}
Error in .garchArgsParser(formula = formula, data = data, trace = FALSE) :
Formula and data units do not match.
I have never written a loop before. Can someone help me with the code?
The problem is that, in general, all the variables in a formula are evaluated in the context of the data= parameter, but your n variable isn't coming from logprice, it's coming from the global environment. You will need to create the formula dynamically. Here's one way to run all the models, with lapply rather than a for loop:
library(fGarch)
#sample data
x.vec = as.vector(garchSim(garchSpec(rseed = 1985), n = 200)[,1])
fits <- lapply(1:10, function(n) {
garchFit(bquote(~garch(.(n),0)), data = x.vec, trace = FALSE)
})
and then we can get the coefs with
lapply(fits, coef)
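If bquote feels opaque, another common way to create such formulas is to paste the text together and convert it with as.formula. The sketch below only builds the ten formulas and does not call garchFit, so it runs without fGarch. The name arch_formulas is made up for illustration.

```r
# Build ~garch(1, 0), ~garch(2, 0), ..., ~garch(10, 0) from text.
# The order n is written into the formula as a literal number,
# so nothing has to be looked up later.
arch_formulas <- lapply(1:10, function(n) {
  as.formula(paste0("~ garch(", n, ", 0)"))
})

# Inspect one of them
print(arch_formulas[[3]])
```

Each element could then be passed as the formula argument of garchFit in place of the bquote expression.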
I am working on a classification problem where for my training data I have a data frame X and a factor variable Y, and I would like to predict my variable Y from X.
The function cforest from the party package has the following interface
cforest(formula, data = list(), ...)
Where:
formula: a symbolic description of the model to be fit. Note that
symbols like ':' and '-' will not work and the tree will make
use of all variables listed on the rhs of 'formula'.
data: a data frame containing the variables in the model.
However, when I try:
# Build a random set of training vectors X
X <- data.frame(replicate(5, rnorm(2000)))
# Build Y from X
Y <- runif(1)*X[,1]*X[,2]^2+runif(1)*X[,3]/X[,4]
cforest(Y, data = X, ...)
I get an error:
..
10: ParseFormula(formula, data = data)
...
5: cforest(Y, data = X, ...) at ..
From the traceback it looks like I am not using the interface to cforest correctly. I have read about R formulas (?formula and this tutorial, which was very helpful), and I understand the concept abstractly, but I don't know how to convert my prediction problem (which I would write Y ~ X) to the formula syntax.
How can I convert my call to cforest using a formula?
The answer is to use the following syntax:
cf.model = cforest(Y ~ ., data=X, ...)
which basically says "use all variables in the data frame X when trying to predict Y".
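To see what the dot expands to, here is a small sketch with simulated data. Putting Y into the same data frame as the predictors (called train here, a made-up name) is my own choice for tidiness. The cforest call is only attempted if the party package is installed.

```r
# Simulated predictors and response, following the question's setup
set.seed(1)
X <- data.frame(replicate(5, rnorm(200)))
Y <- runif(1) * X[, 1] * X[, 2]^2 + runif(1) * X[, 3] / X[, 4]

# Keep the response and the predictors in one data frame
train <- cbind(X, Y = Y)

# terms() shows how "." is expanded: every column except the response
rhs <- attr(terms(Y ~ ., data = train), "term.labels")
print(rhs)

# Fit the forest only when the party package is available
if (requireNamespace("party", quietly = TRUE)) {
  cf.model <- party::cforest(Y ~ ., data = train)
}
```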
In R I use nls to do a nonlinear least-squares fit. How then do I plot the model function using the values of the coefficients that the fit provided?
(Yes, this is a very naive question from an R relative newbie.)
Using the first example from ?nls and following the example I pointed you to line by line achieves the following:
#This is just our data frame
DNase1 <- subset(DNase, Run == 1)
DNase1$lconc <- log(DNase1$conc)
#Fit the model
fm1DNase1 <- nls(density ~ SSlogis(lconc, Asym, xmid, scal), DNase1)
#Plot the original points
# first argument is the x values, second is the y values
plot(DNase1$lconc,DNase1$density)
#This adds to the already created plot a line
# once again, first argument is x values, second is y values
lines(DNase1$lconc,predict(fm1DNase1))
The predict method for an nls object automatically returns the fitted y values. Alternatively, you can add a step and do
yFitted <- predict(fm1DNase1)
and pass yFitted as the second argument to lines instead. The result is the scatter plot of the data with the fitted values joined by a line.
Or if you want a "smooth" curve, what you do is to simply repeat this but evaluate the function at more points:
r <- range(DNase1$lconc)
xNew <- seq(r[1],r[2],length.out = 200)
yNew <- predict(fm1DNase1,list(lconc = xNew))
plot(DNase1$lconc,DNase1$density)
lines(xNew,yNew)
coef(x) returns the coefficients of the fitted model x. For example:
model <- nls(y ~ a + b*x^k, my.data, start = list(a = 0, b = 1, k = 1))
plot(y ~ x, my.data)
a <- coef(model)[1]
b <- coef(model)[2]
k <- coef(model)[3]
# evaluate the fitted function on a grid and add it to the plot
xs <- 1:10
lines(xs, a + b*xs^k, col = 'red')
I know what you want (I'm a Scientist). This isn't it, but at least shows how to use 'curve' to plot your fitting function over any range, and the curve will be smooth. Using the same data set as above:
nonlinFit <- nls(density ~ a - b*exp(-c*conc), data = DNase1, start = list(a=1, b=1, c=1) )
fitFnc <- function(x) predict(nonlinFit, list(conc=x))
curve(fitFnc, from=.5, to=10)
or,
curve(fitFnc, from=8.2, to=8.4)
or,
curve(fitFnc, from=.1, to=50) # well outside the data range
or whatever (without setting up a sequence of evaluation points first).
I'm a rudimentary R programmer, so I don't know how to implement (elegantly) something like ReplaceAll ( /. ) in Mathematica that one would use to replace occurrences of the symbolic parameters in the model, with the fitted parameters. This first step works although it looks horrible:
myModel <- "a - b*exp(-c*conc)"
nonlinFit <- nls(as.formula(paste("density ~", myModel)), data = DNase1, start = list(a=1, b=1, c=1) )
It leaves you with a separate 'model' (as a character string) that you might be able to use with the fitted parameters. A clean approach (NOT digging out a, b, c by hand) would simply use nonlinFit, though I'm not sure how.
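One way to get something like ReplaceAll, sketched here rather than offered as the definitive approach: evaluate the model string in a list that holds the fitted coefficients plus the x values. To keep the example safely runnable I use the logistic model and starting values from the ?nls examples instead of the exponential model above. The names fittedPars, xs and ys are made up.

```r
DNase1 <- subset(DNase, Run == 1)

# Model kept as a character string, as in the text above
myModel <- "Asym / (1 + exp((xmid - log(conc)) / scal))"
nonlinFit <- nls(as.formula(paste("density ~", myModel)),
                 data = DNase1,
                 start = list(Asym = 3, xmid = 0, scal = 1))

# The fitted parameters as a named list: Asym, xmid, scal
fittedPars <- as.list(coef(nonlinFit))

# "Substitute" the parameters by evaluating the string in that list,
# together with the conc values at which the curve is wanted
xs <- seq(0.05, 12.5, length.out = 200)
ys <- eval(parse(text = myModel), c(fittedPars, list(conc = xs)))
```

After plot(density ~ conc, DNase1), calling lines(xs, ys) should draw the same curve that predict(nonlinFit, list(conc = xs)) gives.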
The function "curve" will plot functions for you.