Error with pls function in MixOmics package - r

I am trying to use the pls function in the mixOmics package.
The code I have is the following:
a = rnorm(100)
X = cbind(1, a, a^2, a^3)
Y = rnorm(100)
pls(X,Y)
When I run it, I get the following warning message:
In pls(X, Y) : Zero- or near-zero variance predictors.
Reset predictors matrix to not near-zero variance predictors.
See $nzv for problematic predictors.
But I don't understand where the problem is!

The warning tells you that one of the input variables (columns) in X has zero or very little variance.
Here, the problem is simply that the X in pls(X,Y) contains a column of constant values (the 1 in cbind(1, a, a^2, a^3)), so the variance of this variable is exactly zero.
If you remove this column from your data, pls will work ;)
X = X[,-1]
pls(X,Y)
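If you don't know in advance which column is the culprit, a minimal base-R sketch like the one below finds and drops the constant columns before calling pls(). It only checks for exactly zero variance; mixOmics applies its own, looser near-zero-variance rule, which this does not reproduce. The names col_var and X_clean are just illustrative.

```r
# Sketch: find and drop constant (zero-variance) columns of X before pls().
# Base R only; mixOmics's own near-zero-variance check is looser than this.
set.seed(1)
a <- rnorm(100)
X <- cbind(1, a, a^2, a^3)   # first column is the constant 1
Y <- rnorm(100)

col_var <- apply(X, 2, var)  # variance of each column
print(col_var)

# Keep columns with non-zero variance. A logical index is used because
# X[, -which(col_var == 0)] would drop *every* column if none were constant.
X_clean <- X[, col_var > 0, drop = FALSE]
dim(X_clean)

# With mixOmics installed, pls(X_clean, Y) should then run without the warning.
```

Using a logical index instead of a negative `which()` index is the safer design choice here, since it also behaves correctly when no column needs to be removed.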

Related

How to load a csv file into R as a factor for use with glmnet and logistic regression

I have a csv file (single column, numeric values) called "y" that consists of zeros and ones where the rows with the value 1 indicate the target variable for logistic regression, and another file called "x" with the same number of rows and with columns of numeric predictor values. How do I load these so that I can then use cv.glmnet, i.e.
x <- read.csv('x',header=FALSE,sep=",")
y <- read.csv('y',header=FALSE )
is throwing an error
Error in y %*% rep(1, nc) :
requires numeric/complex matrix/vector arguments
when I call
cvfit = cv.glmnet(x, y, family = "binomial")
I know that "y" should be loaded as a "factor," but how do I do this? My online searches have found all sorts of approaches that have just confused me. What is the simple one-liner to just load this data ready for glmnet?
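cv.glmnet() requires x to be a matrix and y to be a vector (or factor), not data frames. You can use the following code
xmat = as.matrix(x)
yvec = y[, 1]   # the single column of the data frame as a plain vector; as.vector(y) does not give you this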
Then use
cvfit = cv.glmnet(xmat, yvec, family = "binomial")
If you can provide your data in dput() format, I can give it a try.
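A self-contained sketch of the whole loading step is below. It writes two small, made-up headerless CSV files to temporary paths (tempfile() is only there so the example runs on its own; substitute your own 'x' and 'y' files), reads them back, and converts them to a numeric matrix and a two-level factor. The cv.glmnet() call is left commented out because it needs the glmnet package.

```r
# Sketch: read headerless CSV files into the shapes cv.glmnet() expects.
# The files are simulated and written to temporary paths only so the example
# is self-contained; replace x_file / y_file with your own 'x' and 'y'.
x_file <- tempfile(fileext = ".csv")
y_file <- tempfile(fileext = ".csv")
set.seed(1)
write.table(matrix(rnorm(50 * 3), ncol = 3), x_file,
            sep = ",", row.names = FALSE, col.names = FALSE)
write.table(rbinom(50, 1, 0.5), y_file,
            sep = ",", row.names = FALSE, col.names = FALSE)

x <- read.csv(x_file, header = FALSE)  # data frame with columns V1..V3
y <- read.csv(y_file, header = FALSE)  # data frame with a single column V1

xmat <- as.matrix(x)     # numeric matrix of predictors
yvec <- factor(y[[1]])   # pull out the single column, then make it a factor

str(xmat)
str(yvec)

# With glmnet installed:
# library(glmnet)
# cvfit <- cv.glmnet(xmat, yvec, family = "binomial")
```

So the "one-liner" for the response would be something like `yvec <- factor(read.csv('y', header = FALSE)[[1]])`. For family = "binomial", a numeric 0/1 vector should also be accepted; the key point is extracting the column rather than passing the data frame itself.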

R non-linear model fitting using fitModel function

I want to fit a non-linear model to real data.
The data consist of two known numeric vectors: thickness as x and fh as y.
thickness=seq(0.15,2.00,by=0.05)
fh = c(5.17641, 4.20461, 3.31091, 2.60899, 2.23541, 1.97771, 1.88141, 1.62821, 1.50138, 1.51075, 1.40850, 1.26222, 1.09432, 1.13202, 1.12918, 1.10355, 1.11867, 1.09740, 1.08324, 1.05687, 1.19422, 1.22984, 1.34516, 1.19713, 1.25398, 1.29885, 1.33658, 1.31166, 1.40332, 1.39550, 1.37855, 1.41491, 1.59549, 1.56027, 1.63925, 1.72440, 1.74192, 1.82049)
plot(thickness,fh)
This is clearly non-linear, so I am trying to fit it with the non-linear model
y = x*2/3 + (2+2*a)/(3*x)
where a is an unknown constant. I am trying to find the value of a that minimizes the sum of squared errors between the fitted curve and the data.
I first used a function fitModel that I found on a YouTube video, Fitting Functions to Data in R.
library(TIMP)
f=fitModel(fh~thickness^2/3+(2+2*A)/(3*thickness)) #it finds the coefficient 'A'
coef(f) # to represent just the coefficient
However, there's an error
Error in modelspec[[datasetind[i]]] : subscript out of bounds
As an alternative, I want to plot 'a' against the sum of squared errors, but I am having a hard time finding 'a' and plotting this graph. By trial and error, I figured out that 'a' is somewhere near 0.2, but this is not a precise value.
It would be helpful if someone could explain either:
Why the fitModel function didn't work or
How to find the value a and plot the graph.
You could try this instead:
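# model: y = (2/3)*x + (2 + 2a)/(3x)
yf <- function(a, xv) xv*(2/3) + (2 + 2*a)/(3*xv)
yf(2, thickness)  # predictions for a = 2, just as a check
# sum of squared errors for a given value of a
f <- function(a, y, xv) sum((y - yf(a, xv))^2)
f(2, fh, thickness)
# one-dimensional search for the a that minimizes the SSE on [0, 10]
xmin <- optimize(f, c(0, 10), tol = 0.0001, y = fh, xv = thickness)
xmin
plot(thickness, fh)
lines(thickness, yf(xmin$minimum, thickness), col = 3)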
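As for why fitModel() failed: the "subscript out of bounds" error in modelspec most likely comes from the fact that TIMP::fitModel() is an unrelated function (it fits TIMP model specifications for spectroscopy data, not a formula); the formula-style fitModel() used in many tutorials is, as far as I know, the one from the mosaic package. The sketch below stays in base R instead: it fits a with nls(), cross-checks it against a closed form (the model is linear in a, since y - 2x/3 - 2/(3x) = a*2/(3x)), and plots the SSE against a, which is the graph the question asks for. The grid range and the start value are arbitrary choices.

```r
# Sketch: fit y = (2/3) x + (2 + 2a) / (3x) with base R's nls(), check it
# against the closed-form least-squares a, and plot SSE as a function of a.
thickness <- seq(0.15, 2.00, by = 0.05)
fh <- c(5.17641, 4.20461, 3.31091, 2.60899, 2.23541, 1.97771, 1.88141,
        1.62821, 1.50138, 1.51075, 1.40850, 1.26222, 1.09432, 1.13202,
        1.12918, 1.10355, 1.11867, 1.09740, 1.08324, 1.05687, 1.19422,
        1.22984, 1.34516, 1.19713, 1.25398, 1.29885, 1.33658, 1.31166,
        1.40332, 1.39550, 1.37855, 1.41491, 1.59549, 1.56027, 1.63925,
        1.72440, 1.74192, 1.82049)
dat <- data.frame(thickness, fh)

# Nonlinear least squares; start = list(a = 0) is an arbitrary initial guess
fit <- nls(fh ~ thickness * 2/3 + (2 + 2 * a) / (3 * thickness),
           data = dat, start = list(a = 0))
a_nls <- unname(coef(fit)["a"])

# The model is linear in a, so least squares also has a closed form:
# regress r = y - 2x/3 - 2/(3x) on g = 2/(3x) through the origin.
g <- 2 / (3 * thickness)
r <- fh - thickness * 2/3 - g
a_closed <- sum(r * g) / sum(g^2)

# Sum of squared errors for any candidate a, evaluated over a grid
sse <- function(a) sum((fh - (thickness * 2/3 + (2 + 2 * a) / (3 * thickness)))^2)
a_grid <- seq(-0.5, 0.5, length.out = 201)
sse_grid <- sapply(a_grid, sse)

plot(a_grid, sse_grid, type = "l", xlab = "a", ylab = "Sum of squared errors")
abline(v = a_nls, lty = 2)  # dashed line at the nls estimate

c(nls = a_nls, closed_form = a_closed)
```

All three approaches (optimize(), nls() and the closed form) minimize the same SSE, so they should agree up to the optimizer's tolerance.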

How do I ensure that my x and y lengths don't differ when plotting a glm using the predict() function in R?

I am running the following code:
c.model<-glm(cars$speed~cars$dist, family=gaussian)
summary(c.model)
c.x<-seq(0,1,0.01)
c.x
c.y<-predict.glm(c.model,as.data.frame(c.x), type="response")
c.y
plot(cars$dist)
lines(c.x,c.y)
And getting the error, "Error in xy.coords(x, y) : 'x' and 'y' lengths differ". I'm not quite sure what is causing this error.
You need to be more careful in matching up the variable names used in the model with those used during prediction. You get the error because the names in the data.frame passed to predict do not match the names of the terms in your model, so you're not actually predicting new values: predict returns one value per row of cars (50), while c.x has 101 values, hence "'x' and 'y' lengths differ". The problem is that predict is essentially getting the data from
model.frame(~cars$dist, data.frame(dist=c.x))
and because you explicitly have cars$dist in your formula, there are no "free" symbols that can be taken from your newdata parameter. Compare that to the result of
model.frame(~dist, data.frame(dist=c.x))
This time, dist isn't tied to the cars data frame and can be "resolved" in the context of the newdata data.frame.
Additionally, you want to make sure you're keeping your dist values on the same scale. For example:
c.model <- glm(speed~dist, data=cars, family=gaussian)
summary(c.model)
c.x <- seq(min(cars$dist),max(cars$dist),length.out=101)
c.y <- predict.glm(c.model,data.frame(dist=c.x), type="response")
plot(speed~dist, cars)
lines(c.x,c.y)
Here we predict over the range of observed values rather than 0-1 because no distance value is actually less than 1.
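The difference is easy to see by fitting the model both ways on the built-in cars data and counting how many predictions come back. This is a small sketch; bad_model and good_model are just illustrative names.

```r
# Sketch: predict() with a formula written as cars$speed ~ cars$dist versus
# speed ~ dist with a data argument. Uses the built-in cars data set.
c.x <- seq(min(cars$dist), max(cars$dist), length.out = 101)
newdat <- data.frame(dist = c.x)

bad_model  <- glm(cars$speed ~ cars$dist, family = gaussian)
good_model <- glm(speed ~ dist, data = cars, family = gaussian)

# For bad_model, newdata is effectively ignored: cars$dist is re-evaluated,
# so predict() returns one value per row of cars (with a warning about the
# row mismatch, suppressed here) instead of one value per row of newdat.
bad_pred  <- suppressWarnings(predict(bad_model, newdat, type = "response"))
good_pred <- predict(good_model, newdat, type = "response")

length(bad_pred)   # one per row of cars
length(good_pred)  # one per value of c.x
```

Only good_pred lines up with c.x, so only lines(c.x, good_pred) draws the fitted line.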

User-specified Z matrix in lme

I have been searching forever for how to do this in R and cannot find anything! Basically, I want to shrink predictors using an LMM. So I have a set of fixed effects, X, and a set of predictors, Z, that I want to put a random effect on, so the model is
Y=X*beta+Z*u+e
where u~N(0,sigma_u^2 * I) and e ~ N(0,sigma_e^2 * I). I thought I could do this in lme with
fit <- lme(Y~X,random=pdIdent(~-1+Z))
but I only get the error:
Error in getGroups.data.frame(dataMix, groups) :
invalid formula for groups
Any help on this issue is much appreciated.
Have you tried:
library(nlme)
N <- length(Y)        # sample size
group <- rep(1, N)    # put every observation in one single group
fit <- lme(Y~X, random=list(group=pdIdent(~-1+Z)))
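Here is a self-contained sketch of the same single-group trick on simulated data, assuming nlme is installed (it ships with standard R distributions). The sample size, the number of Z columns and the true parameter values are made up for illustration. Keeping X and Z as matrix columns of a data frame lets you pass data= explicitly instead of relying on global variables.

```r
# Sketch: Y = X beta + Z u + e with u ~ N(0, sigma_u^2 I), fitted by nlme::lme()
# with every observation in one dummy group. All data here are simulated.
library(nlme)

set.seed(42)
n <- 200
X <- matrix(rnorm(n * 2), n, 2)     # fixed-effect predictors
Z <- matrix(rnorm(n * 10), n, 10)   # predictors whose coefficients are shrunk
u <- rnorm(10, sd = 0.7)            # true random coefficients
Y <- drop(1 + X %*% c(1, -0.5) + Z %*% u + rnorm(n))

dat <- data.frame(Y = Y, group = factor(rep(1, n)))  # a single group
dat$X <- X   # matrix columns are allowed inside a data frame
dat$Z <- Z

# pdIdent(~ Z - 1): random coefficients on the Z columns, covariance sigma_u^2 * I
fit <- lme(Y ~ X, random = list(group = pdIdent(~ Z - 1)), data = dat)

fixef(fit)     # estimates of beta (intercept and two slopes)
ranef(fit)     # shrunken predictions of u, one per column of Z
VarCorr(fit)   # estimated sigma_u^2 and sigma_e^2
```

Because there is only one group, the "random effect" is really a ridge-like penalty on the Z coefficients, with the amount of shrinkage chosen by REML through sigma_u^2.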

R: How to make column of predictions for logistic regression model?

So I have a data set called x. The contents are simple enough to just write out so I'll just outline it here:
the dependent variable, Report, in the first column is binary yes/no (0 = no, 1 = yes)
the subsequent 3 columns are all categorical variables (race.f, sex.f, gender.f) that have all been converted to factors, and they're designated by numbers (e.g. 1 = white, 2 = black, etc.)
I have run a logistic regression on x as follows:
glm <- glm(Report ~ race.f + sex.f + gender.f, data=x,
family = binomial(link="logit"))
And I can check the fitted probabilities by looking at summary(glm$fitted).
My question: How do I create a fifth column on the right side of this data set x that will include the predictions (i.e. fitted probabilities) for Report? Of course, I could just insert glm$fitted as a column, but I'd like to write code that predicts it from whatever is in the race, sex and gender columns, for more general use.
Right now I have the following code, which I hope will create a predicted column as well as lower and upper bounds for the confidence interval.
xnew <- cbind(xnew, predict(glm5, newdata = xnew, type = "link", se = TRUE))
xnew <- within(xnew, {
PredictedProb <- plogis(fit)
LL <- plogis(fit - (1.96 * se.fit))
UL <- plogis(fit + (1.96 * se.fit))
})
Unfortunately I get the error:
Error in eval(expr, envir, enclos) : object 'race.f' not found
after the cbind code.
Anyone have any idea?
There appear to be a few typos in your code. First, the line that builds xnew calls glm5, but as far as I can see your model is called glm (by the way, using glm as the name of your fitted model is probably not a good idea, since it masks the glm() function). Second, make sure the variable race.f is actually in the data set you want to predict from (xnew). My guess is that R can't find that variable, hence the error.
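A sketch of the whole workflow, on simulated data that only mimics the layout described in the question (Report plus three factor columns; the effect sizes and the model name report_model are made up), is below. The main point is that the new data passed to predict() must contain every predictor, with the same factor levels as the data the model was fitted on.

```r
# Sketch: add a predicted-probability column to x, then predict with a 95% CI
# for every combination of the factors. Data are simulated for illustration.
set.seed(1)
n <- 300
race.f   <- factor(sample(1:3, n, replace = TRUE))
sex.f    <- factor(sample(1:2, n, replace = TRUE))
gender.f <- factor(sample(1:2, n, replace = TRUE))
Report   <- rbinom(n, 1, plogis(-0.5 + 0.4 * (race.f == "2") + 0.3 * (sex.f == "2")))
x <- data.frame(Report, race.f, sex.f, gender.f)

# Not named 'glm', so the glm() function is not masked
report_model <- glm(Report ~ race.f + sex.f + gender.f, data = x,
                    family = binomial(link = "logit"))

# Fifth column: fitted probabilities computed from the predictor columns
x$PredictedProb <- predict(report_model, newdata = x, type = "response")

# New data must contain all predictors, with the model's factor levels
xnew <- expand.grid(race.f   = levels(x$race.f),
                    sex.f    = levels(x$sex.f),
                    gender.f = levels(x$gender.f))

# Predict on the link (log-odds) scale, build the CI there, then transform
pr <- predict(report_model, newdata = xnew, type = "link", se.fit = TRUE)
xnew$fit    <- pr$fit
xnew$se.fit <- pr$se.fit
xnew <- within(xnew, {
  PredictedProb <- plogis(fit)
  LL <- plogis(fit - 1.96 * se.fit)
  UL <- plogis(fit + 1.96 * se.fit)
})
head(xnew)
```

Building the interval on the link scale and mapping it through plogis() keeps the bounds inside (0, 1), which a symmetric interval on the probability scale would not guarantee.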
