Predict using multiple variables in R

I have a slight problem with my R coursework.
I have made the following dataset:
Now I'm going to plot the values based on this dataset using the following command:
plot(x ~ Group.1, data = jarelmaks_vaikelaen23mean,
xlab = "Vanus", ylab = "PD", main = "Järelmaks ja väikelaen")
After that, I'm creating a glm model using the following command. The difference is that now I'm using the original dataset (the dependent variable takes the values 1/0).
GLM command:
jarelmaks_vaikelaen23_mudel <- glm(Default ~ Vanus.aastates + Toode,
family = binomial(link = 'logit'), data = jarelmaks_vaikelaen_23)
Now, I'm trying to predict the values using my model.
predict(jarelmaks_vaikelaen23_mudel,data.frame(Vanus.aastates=x),type = "resp")
Unfortunately, I get the following error message:
Error in data.frame(Vanus.aastates = x) : object 'x' not found
Can you give me some ideas on how to solve this problem, or explain how this predict() command works?

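When you pass a data frame to the newdata argument of predict(), its column names must match the variables used as independent variables when you fitted the model. Your model uses both Vanus.aastates and Toode, so newdata needs both columns. (The immediate error comes from x itself: x is a column of jarelmaks_vaikelaen23mean, not a standalone object, so R cannot find it.) That is, your predict call should look like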
predict(
jarelmaks_vaikelaen23_mudel,
newdata = data.frame(
Vanus.aastates = SOMETHING,
Toode = SOMETHING_ELSE
),
type = "response"
)
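A minimal, self-contained sketch of the whole workflow. It assumes a simulated stand-in for jarelmaks_vaikelaen_23 (the real data aren't shown), so laenud, keskmised, uued and the product levels "jarelmaks"/"vaikelaen" are hypothetical names. It also assumes the plotted means were produced with aggregate(), which is where the column names Group.1 and x usually come from:

```r
# Hypothetical stand-in data: simulated ages, two product types and a 0/1
# default flag. None of these values come from the asker's dataset.
set.seed(42)
n <- 500
laenud <- data.frame(
  Vanus.aastates = sample(18:70, n, replace = TRUE),
  Toode = factor(sample(c("jarelmaks", "vaikelaen"), n, replace = TRUE))
)
eta <- -3 + 0.04 * laenud$Vanus.aastates + 0.5 * (laenud$Toode == "vaikelaen")
laenud$Default <- rbinom(n, 1, plogis(eta))

# aggregate() with an unnamed `by` list names its result columns Group.1 and x,
# so `x` exists only inside keskmised, not as an object in the workspace.
keskmised <- aggregate(laenud$Default, by = list(laenud$Vanus.aastates), FUN = mean)

mudel <- glm(Default ~ Vanus.aastates + Toode,
             family = binomial(link = "logit"), data = laenud)

# newdata needs one column per predictor, named exactly as in the formula;
# the factor keeps the training levels so predict() accepts it.
uued <- data.frame(
  Vanus.aastates = keskmised$Group.1,
  Toode = factor("vaikelaen", levels = levels(laenud$Toode))
)
uued$PD <- predict(mudel, newdata = uued, type = "response")

# Leaving Toode out fails, because predict() cannot find that predictor.
puudu <- tryCatch(
  predict(mudel, newdata = data.frame(Vanus.aastates = 30), type = "response"),
  error = function(e) conditionMessage(e)
)
head(uued)
puudu
```

To overlay the model on the original plot, one could call plot(x ~ Group.1, data = keskmised) and then lines(uued$Vanus.aastates, uued$PD); one such curve is needed per value of Toode.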

Related

R: how to make predictions using gamboost

library(mboost)
### a simple two-dimensional example: cars data
cars.gb <- gamboost(dist ~ speed, data = cars, dfbase = 4,
control = boost_control(mstop = 50))
set.seed(1)
cars_new <- cars + rnorm(nrow(cars))
> predict(cars.gb, newdata = cars_new$speed)
Error in check_newdata(newdata, blg, mf) :
‘newdata’ must contain all predictor variables, which were used to specify the model.
I fit a model using the example on the help(gamboost) page. I want to use this model to predict on a new dataset, cars_new, but encountered the above error. How can I fix this?
The predict function looks for a variable called speed, but when you extract the column with the $ operator you get a bare vector that no longer carries that name.
So this variant of prediction works:
predict(cars.gb, newdata = data.frame(speed = cars_new$speed))
Or keep the original name by subsetting with single brackets, which returns a one-column data frame:
predict(cars.gb, newdata = cars_new['speed'])
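The naming issue is not specific to mboost, so this sketch uses a plain lm() on the same cars data as a stand-in for gamboost() (an assumption made so the example runs without extra packages). It shows what $ and [ actually return, and that both data-frame forms give the same predictions:

```r
# Stand-in model: lm() instead of gamboost(), same formula and data.
cars.lm <- lm(dist ~ speed, data = cars)

set.seed(1)
cars_new <- cars + rnorm(nrow(cars))

v  <- cars_new$speed                      # `$`: bare numeric vector, no column name
d1 <- cars_new["speed"]                   # `[`: one-column data frame named speed
d2 <- data.frame(speed = cars_new$speed)  # rebuilt frame with the name restored

str(v)
names(d1)

# Both data frames supply a column called speed, so predict() can use them.
p1 <- predict(cars.lm, newdata = d1)
p2 <- predict(cars.lm, newdata = d2)
```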

Unable to plot p values when using facet.by from ggsurvplot. Error message: "variable lengths differ"

I have a problem that I don't know how to solve, and it seems to be related to my data set (or is it?). I am able to plot different p values with facet.by when I use your example from issue #205 with the "colon" data set. However, it does not work with my data set, which is available on my GitHub profile here (https://github.com/CroixJeremy2/Data-frame-for-stack-overflow.git).
Expected behavior
I would like to be able to plot different p values as in issue #205 with my data set.
Actual behavior
I can only plot the curves via facet.by; I can't plot the p values that should be calculated automatically from a log-rank test. Instead, my R console returns this error message:
"Error in model.frame.default(formula = Survival ~ Sex, data = list(ID = c(147L, :"
"les longueurs des variables diffèrent (trouvé pour 'Sex')"
# the last line translated from French to English means:
"variable lengths differ (found for Sex)"
Steps to reproduce the problem
library(survival)
Survival = Surv(time = D$Age, event = D$outcome)
library(survminer)
fit = survfit(data = D, formula = Survival ~ Sex + Genotype)
ggsurvplot(fit = fit, data = D, pval = TRUE, facet.by = 'Genotype') #error message
ggsurvplot(fit = fit, data = D, facet.by = 'Genotype') #curves can be plotted
Remarks
Note that survdiff() works perfectly on my data set for calculating p values from log-rank tests. Therefore, I do not know whether I am doing something wrong in ggsurvplot() (most likely) or whether there is something wrong in the ggsurvplot() function itself (unlikely).
survdiff(data = D, subset = D$Ctrl, formula = Survival ~ Sex)
survdiff(data = D, subset = D$nKO, formula = Survival ~ Sex)
survdiff(data = D, subset = D$CRE_Ctrl, formula = Survival ~ Sex)
#works fine, p value returned, no message/warning/error returned.
Moreover, the variable lengths seem to be equal, and I don't have any NA values in my data frame, so I really don't understand why I get this error message.
sapply(D,function(x) length(x))
# length = 298 for all my variables...
Thanks in advance for your response and help.
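The thread gives no answer here. One plausible cause, offered as an assumption rather than a confirmed diagnosis: Survival is created as a free-standing object from the full D (298 rows), while ggsurvplot(..., pval = TRUE, facet.by = ...) presumably recomputes the log-rank test on each Genotype subset, so its formula combines a 298-element response with a shorter Sex column. The survdiff(subset = ...) calls work because the subset is applied after the full-length model frame is built. This sketch reproduces the same base-R error with lm() on simulated data (all values and the object names Response and facet_nKO are hypothetical):

```r
# Hypothetical data shaped like D: 298 rows, Sex and Genotype factors.
set.seed(2)
D <- data.frame(
  Age = rexp(298, 0.05),
  Sex = factor(sample(c("F", "M"), 298, replace = TRUE)),
  Genotype = factor(sample(c("Ctrl", "nKO", "CRE_Ctrl"), 298, replace = TRUE))
)

# Response built once from the full data, outside any formula (like Survival).
Response <- log(D$Age)

# One "facet": only the rows of a single genotype.
facet_nKO <- D[D$Genotype == "nKO", ]

# Fails: Response is found outside `data` and still has 298 values,
# while Sex comes from the subset.
err <- tryCatch(lm(Response ~ Sex, data = facet_nKO),
                error = function(e) conditionMessage(e))
print(err)

# Works: the response is computed from the columns of `data` itself.
fit <- lm(log(Age) ~ Sex, data = facet_nKO)
coef(fit)
```

By analogy, the survival version would put the response inside the formula so it is evaluated within data, e.g. survfit(Surv(Age, outcome) ~ Sex + Genotype, data = D); whether this alone fixes ggsurvplot() is untested here.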

How to use covariates in rddtools rdd_reg_lm function?

I am trying to run a parametric RD regression using the rddtools R package. However, the package documentation is not very clear to me.
First: the function to define an RD object is:
rdd_data(y, x, covar, cutpoint, z, labels, data)
where covar, in the help file, means only "Exogeneous variables" . But what type? A data frame? A list?
Second: the function rdd_reg_lm asks for covariates in this way:
rdd_reg_lm(rdd_object, covariates = NULL, order = 1, bw = NULL,
slope = c("separate", "same"), covar.opt = list(strategy = c("include",
"residual"), slope = c("same", "separate"), bw = NULL),
covar.strat = c("include", "residual"), weights)
Here, according to the help file, the covariates argument simply means "Formula to include covariates". Again, it is not clear to me how exactly these covariates should be specified.
Moreover, is it possible to include multiple covariates in rdd_data() and rdd_reg_lm()?
I'd appreciate some help here. I have read the help and vignette files again and again and searched many blogs, but still found nothing.
I have already checked the following question:
How to include a linear trend in a regression discontinuity design using rddtools
which showed me the following example:
rd.medic <- rdd_data(y = er ,x = ageyrs, covar = ageyrs, cutpoint=65, data = medicare)
rd.reg <- rdd_reg_lm(rdd_object=rd.medic, covariates = 'ageyrs', slope =
("same"), covar.opt = list("include"))
Even so, the syntax is still not clear to me: I am trying to add multiple covariates, without success.
Thanks!
You can create a data frame with your covariates and then include it in rdd_data.
covariates <- data.frame(z1 = medicare$ageyrs, z2 = medicare$ageyrs2)  # columns come from medicare; ageyrs2 stands for your second covariate
rd.medic <- rdd_data(y = er ,x = ageyrs, covar = covariates, cutpoint=65, data = medicare)
rd.reg <- rdd_reg_lm(rdd_object=rd.medic, covariates =TRUE, slope =("same"))
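To make concrete what a parametric RD regression with several covariates estimates, here is a base-R sketch of a linear sharp RD on simulated data with a cutpoint of 65 and two covariates entered additively. It does not call rddtools; reading slope = "same" and slope = "separate" with the "include" strategy as the two lm() formulas below is an assumption about what those options mean, not a description of the package internals. The names rd, x_c, z1 and z2 are hypothetical:

```r
# Simulated sharp RD: running variable x, cutpoint 65, true jump of 2.
set.seed(3)
n  <- 2000
x  <- runif(n, 55, 75)
z1 <- rnorm(n)
z2 <- rbinom(n, 1, 0.4)
D  <- as.numeric(x >= 65)                      # treatment indicator
y  <- 1 + 2 * D + 0.1 * (x - 65) + 0.5 * z1 - 0.3 * z2 + rnorm(n, sd = 0.5)
rd <- data.frame(y, x_c = x - 65, D, z1, z2)   # running variable centred at 65

# Same slope on both sides of the cutpoint, covariates added as extra terms.
fit_same <- lm(y ~ D + x_c + z1 + z2, data = rd)

# Separate slopes on each side (D * x_c adds the D:x_c interaction).
fit_sep <- lm(y ~ D * x_c + z1 + z2, data = rd)

coef(fit_same)["D"]
coef(fit_sep)["D"]
```

Further covariates are simply further additive terms; in the rddtools answer above they would be further columns of the covar data frame.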

R predict() returns only fitted values for nls() model

First of all, I would like to mention that I am just a beginner in R.
I have encountered a problem when trying to predict data from a model generated by nls(). I fitted an exponential decay function to my data and everything seems to be fine, e.g. I got a decent regression line. However, when I use predict() on a new data set, it returns only the fitted values for the four original time points.
My code is:
df = data.frame(Time = c(0,5,15,30), Value = c(1, 0.38484677,0.18679383, 0.06732328))
model <- nls(Value~a*exp(-b*Time), start=list(a=1, b=0.15), data = df)
plot(Value~Time, data = df)
lines(df$Time, predict(model))
newtime <- data.frame(Time = seq(1,20, by = 1))
pr = predict(model, newdata = newtime$Time)
pr
[1] 0.979457389 0.450112312 0.095058637 0.009225664
Could someone please explain what I am doing wrong? I know there are already some answers to this problem here, but none of them helped me.
Thank you in advance for your help!
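The newdata parameter should be a data.frame whose column names match the predictor names in your input data. When you use newdata = newtime$Time, you are actually passing in newtime$Time, which is no longer a data.frame because it has 'dropped' down to an unnamed vector; predict() then finds no variable called Time in it and falls back to the Time values of the original data. You can just pass in newtime like so: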
pr = predict(model, newdata = newtime)
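A runnable version of the question's code with the fix applied, using the asker's own data. It also predicts on a dense grid so the fitted curve is smooth instead of being drawn through only the four observed points (curve_df and curve_pr are names introduced here):

```r
# The asker's data and model, unchanged.
df <- data.frame(Time = c(0, 5, 15, 30),
                 Value = c(1, 0.38484677, 0.18679383, 0.06732328))
model <- nls(Value ~ a * exp(-b * Time), start = list(a = 1, b = 0.15), data = df)

# Pass the whole data frame: its column is called Time, like the predictor.
newtime <- data.frame(Time = seq(1, 20, by = 1))
pr <- predict(model, newdata = newtime)   # one value per row of newtime

# A dense grid of times gives a smooth fitted curve.
curve_df <- data.frame(Time = seq(0, 30, length.out = 200))
curve_pr <- predict(model, newdata = curve_df)
pr
```

After plot(Value ~ Time, data = df), calling lines(curve_df$Time, curve_pr) draws the smooth curve.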

library(e1071), tune Variable lengths differ

I have been working with the iris dataset, and although I've gotten svm() from the e1071 library to work, I keep getting a 'variable lengths differ' error when I try to make tune() work:
library(e1071)
data <- data.frame(iris$Sepal.Width,iris$Petal.Length,iris$Species)
svm_tr <- data[sample(nrow(data), 100), ] #sample 100 random rows
tuned <- tune(svm, svm_tr$iris.Species~.,
data = svm_tr[1:2],
kernel = "linear",
ranges = list(cost=c(.001,.01,.1,1,10,100)))
I have checked the lengths of each of the columns in svm_tr[1:2] and they are the same length. I know the function doesn't take a dataframe directly but maybe I'm missing something?
I can get it to work with:
tune(svm, iris.Species ~ ., data = svm_tr[1:3],
kernel = "linear", ranges = list(cost=c(.001,.01,.1,1,10,100)))
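With the formula interface you shouldn't refer to a variable with $, because all the required variables are taken from the object given in the data= argument. tune() refits the model on cross-validation subsets of data=, whereas svm_tr$iris.Species is evaluated outside data= and always has all 100 values, hence 'variable lengths differ'. Note that I've also changed data=svm_tr[1:2] to data=svm_tr[1:3] so that the iris.Species column is included.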
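Why the $ version fails only inside tune(): resampling refits the model on subsets of data=, while svm_tr$iris.Species always has 100 values. This sketch imitates that with a hand-written 10-fold loop and lm() standing in for svm() (an assumption so it runs without e1071; it predicts iris.Petal.Length rather than classifying species). The names fold, cv_mse and err are introduced here:

```r
# Same construction as in the question (column names become iris.Sepal.Width etc.).
data <- data.frame(iris$Sepal.Width, iris$Petal.Length, iris$Species)
set.seed(4)
svm_tr <- data[sample(nrow(data), 100), ]

# Hand-written 10-fold cross-validation, roughly what tune() does internally.
k <- 10
fold <- sample(rep(1:k, length.out = nrow(svm_tr)))
cv_mse <- numeric(k)
for (i in 1:k) {
  train <- svm_tr[fold != i, ]
  test  <- svm_tr[fold == i, ]
  # Every variable comes from `train`, so the lengths always agree.
  fit <- lm(iris.Petal.Length ~ iris.Sepal.Width + iris.Species, data = train)
  cv_mse[i] <- mean((test$iris.Petal.Length - predict(fit, newdata = test))^2)
}
cv_mse

# The `$` form breaks as soon as the model is refitted on one fold:
# the response has 100 values, the predictor only 90.
train <- svm_tr[fold != 1, ]
err <- tryCatch(lm(svm_tr$iris.Petal.Length ~ iris.Sepal.Width, data = train),
                error = function(e) conditionMessage(e))
err
```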
