Error in .local(x, ...): x and y don't match

I'm new to R, and trying to fit a model using kernlab with some data that I just loaded in. However, when I try and load it in I get the error message in the subject line. I assume this means the data type of X and y are not compatible.
Here's some sample code:
data = read.delim("my-sample-file.txt")
model = ksvm(data[, 1:10], data[, 11])
When I call data[, 11] I just get the raw values in the column returned to me, and I notice the typeof function returns the value integer, which I found strange. I am not using any additional packages, just trying to get something basic to work.
Thank you.

The Usage section of the help page for ksvm says that passing x and y as the input parameters requires a matrix for x, so this should be more successful (assuming that the data object has all numeric columns; you really should be looking at your data carefully before reaching for analysis tools):
model = ksvm(x = data.matrix(data[, 1:10]), y = data[, 11])
Note that you can get exactly the same error with the iris data.frame:
ksvm(x=iris[-5], y=iris$Species)
Error in .local(x, ...) : x and y don't match.
Whereas converting to matrix results in success:
ksvm(x=data.matrix(iris[-5]), y=iris$Species)
Support Vector Machine object of class "ksvm"
SV type: C-svc (classification)
parameter : cost C = 1
Gaussian Radial Basis kernel function.
Hyperparameter : sigma = 0.484488222038106
Number of Support Vectors : 57
Objective Function Value : -3.7021 -3.8304 -21.7405
Training error : 0.026667
Morals of the story: Pay attention to the 'Usage' section to give guidance on the different forms that generic functions may take. And always assume that the authors of the help page are excruciatingly correct in their description of the arguments in the 'Arguments' sections. If they say matrix, don't assume they mean anything sort of like a matrix. (But if you mutter under your breath that this seems like something that should have been anticipated and a more informative error message emitted, I would not disagree.)
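A quick way to act on that advice is to inspect the predictors before handing them to a modelling function. The sketch below uses only base R and the built-in iris data; the names x_df and x_mat are illustrative and not part of kernlab.

```r
# Inspect the predictors before modelling (base R only).
x_df <- iris[-5]                      # data.frame holding the four numeric columns
class(x_df)                           # "data.frame", not the matrix the help page asks for
all(sapply(x_df, is.numeric))         # TRUE: safe to convert to a numeric matrix

x_mat <- data.matrix(x_df)            # the form the x/y interface expects
is.matrix(x_mat)                      # TRUE
nrow(x_mat) == length(iris$Species)   # rows of x must match the length of y
```

If the all(sapply(...)) check returns FALSE, data.matrix will still run, but factor columns are replaced by their integer codes, which is rarely what you want as model input.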

Related

Specifying custom weights for the nonparametric estimate of spatially-varying relative risk in spatstat

Is there a way to specify weights in relrisk.ppp function in spatstat (version 1.63-3)?
The relrisk.ppp function calls the density.ppp function, which does allow users to specify their own weights.
For example, let us build upon the provided spatstat.data::urkiola data where, instead of individual trees, the locations are tree stands and we have a second numeric mark for the frequency of trees at each point-location:
urkiola_new <- spatstat.data::urkiola
urkiola_new$marks <- data.frame("type" = urkiola_new$marks, "freq" = rpois(urkiola_new$n, 3))
f1 <- spatstat::relrisk(urkiola_new, weights = urkiola_new$marks$freq)
When using the urkiola_new in a call of relrisk, urkiola_new is caught by stopifnot(is.multitype(X)) in relrisk.ppp. I next tried specifying the weights separately as a vector while using the original urkiola data,
f2 <- spatstat::relrisk(urkiola, weights = urkiola_new$marks$freq)
but was stopped by an error from the pixellate.ppp function within the internal density.ppp function:
Error in pixellate.ppp(x, ..., padzero = TRUE) : length(weights) == npoints(x) || length(weights) == 1 is not TRUE
The same error occurs when I convert the weights into a list
urkiola_weights <- split(urkiola_new$marks$freq, urkiola_new$marks$type)
f3 <- spatstat::relrisk(urkiola, weights = urkiola_weights)
I suspect there is a way to specify the weights cleverly, but it yet escapes me. Any suggestions or guidance would be helpful, thank you!
The function relrisk.ppp is not currently designed to handle weights. The help entry for relrisk.ppp does not mention weights.
The example above does not work because relrisk.ppp applies density.ppp separately to the sub-patterns of points of each type, and the extra argument weights is the wrong length for these sub-patterns.
I will take this question as a feature request, to add this capability to relrisk.ppp. It should be done soon.
Update: this is now implemented in the development version, spatstat 1.64-0.018, available at the spatstat GitHub repository.
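The length mismatch described in this answer can be illustrated without spatstat. The sketch below assumes a toy pattern of 20 points (8 birch, 12 oak) with made-up frequencies; weights_ok is a hypothetical helper that only mirrors the condition quoted in the error message and is not spatstat's own code.

```r
# Toy stand-in for a two-type point pattern: 8 birch and 12 oak points.
type <- factor(rep(c("birch", "oak"), times = c(8, 12)))
freq <- rep(1:4, length.out = 20)     # one made-up weight per point

# Hypothetical helper mirroring the check in the error message:
# length(weights) == npoints(x) || length(weights) == 1
weights_ok <- function(weights, n) length(weights) == n || length(weights) == 1

n_by_type <- lengths(split(type, type))   # points in each sub-pattern

# Passing the full-length vector to every sub-pattern fails the check.
full_ok <- sapply(n_by_type, function(n) weights_ok(freq, n))

# Weights split by type have the right length for each sub-pattern.
w_by_type <- split(freq, type)
split_ok <- mapply(weights_ok, w_by_type, n_by_type)

full_ok    # FALSE for both types
split_ok   # TRUE for both types
```

This only shows why a single full-length weights vector cannot be reused for each sub-pattern. It does not make the released relrisk.ppp accept weights; per the answer, that needs the development version.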

Error related to randomisation test within lapply() function in R

I have 30 datasets that are combined in a data list. I wanted to analyze the spatial point pattern with the L function along with a randomisation test. The code follows.
The first code works well for a single dataset (data1) but once it is applied to a list of dataset with lapply() function as shown in 2nd code, it gives me a very long error like so,
"Error in Kcross(X, i, j, ...) : No points have mark i = Acoraceae
Error in envelopeEngine(X = X, fun = fun, simul = simrecipe, nsim =
nsim, : Exceeded maximum number of errors"
Can anybody tell me what is wrong with 2nd code?
grp <- factor(data1$species)
window <- ripras(data1$utmX, data1$utmY)
pp.grp <- ppp(data1$utmX, data1$utmY, window=window, marks=grp)
L.grp <- alltypes(pp.grp, Lest, correlation = "Ripley")
LE.grp <- alltypes(pp.grp, Lcross, nsim = 100, envelope = TRUE)
plot(L.grp)
plot(LE.grp)
L.LE.sp <- lapply(data.list, function(x) {
  grp <- factor(x$species)
  window <- ripras(x$utmX, x$utmY)
  pp.grp <- ppp(x$utmX, x$utmY, window = window, marks = grp)
  L.grp <- alltypes(pp.grp, Lest, correlation = "Ripley")
  LE.grp <- alltypes(pp.grp, Lcross, envelope = TRUE)
  result <- list(L.grp = L.grp, LE.grp = LE.grp)
  return(result)
})
plot(L.LE.sp$LE.grp[1])
This question is about the R package spatstat.
It would help if you could add a minimal working example including data which demonstrate this problem.
If that is not available, please generate the error on your computer, then type traceback() and capture the output and post it here. This will trace the location of the error.
Without this information, my best guess is the following:
The error message says No points have mark i=Acoraceae. That means that the code is expecting a point pattern to include points of type Acoraceae but found that there were none. This can happen because in alltypes(... envelope=TRUE) the code generates random point patterns according to complete spatial randomness. In the simulated patterns, the number of points of type Acoraceae (say) will be random according to a Poisson distribution with a mean equal to the number of points of type Acoraceae in the observed data. If the number of Acoraceae in the actual data is small then there is a reasonable chance that the simulated pattern will contain no Acoraceae at all. This is probably what is causing the error message No points have mark i=Acoraceae.
If this interpretation is correct then you should be able to suppress the error by including the argument fix.marks=TRUE, that is,
alltypes(pp.grp, Lcross, envelope=TRUE, fix.marks=TRUE, nsim=99)
I'm not suggesting this is necessarily appropriate for your application, but this should remove the error message if my guess is correct.
In the latest development version of spatstat, available on GitHub, the code for envelope has been tweaked to detect this error.
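To get a feel for how likely the empty-type situation is, here is a small worked calculation in base R. It assumes, as described above, that the simulated count of a type is Poisson with mean equal to the observed count, and that the simulations are independent; the observed counts in n_obs are hypothetical.

```r
# Chance that a simulated pattern contains no points of a given type,
# when the simulated count is Poisson with mean equal to the observed count.
n_obs <- c(1, 2, 5, 10)               # hypothetical observed counts of a rare type
p_empty <- dpois(0, lambda = n_obs)   # equals exp(-n_obs)

# Chance that at least one of nsim independent simulations has no such points.
nsim <- 99
p_any_empty <- 1 - (1 - p_empty)^nsim

round(data.frame(n_obs, p_empty, p_any_empty), 4)
```

Checking table(x$species) for each dataset in the list before calling alltypes would show which datasets contain species rare enough for this to matter.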

R implementation of kohonen SOMs: prediction error due to data type.

I have been trying to run an example code for supervised kohonen SOMs from https://clarkdatalabs.github.io/soms/SOM_NBA . When I tried to predict test set data I got the following error:
pos.prediction <- predict(NBA.SOM3, newdata = NBA.testing)
Error in FUN(X[[i]], ...) :
Data type not allowed: should be a matrix or a factor
I tried newdata = as.matrix(NBA.testing) but it did not help. Neither did as.factor().
Why does it happen? And how can I fix that?
You should pass one more argument to the predict function, namely "whatmap", and set its value to 1.
The code would be like:
pos.prediction <- predict(NBA.SOM3, newdata = NBA.testing, whatmap = 1)
To verify the prediction result, you can check using:
table(NBA$Pos[-training_indices], pos.prediction$predictions[[2]], useNA = 'always')
The result may differ from that of the tutorial, since the tutorial did not call the set.seed() function.
I suggest calling set.seed() with an arbitrary number somewhere before the training phase.
For simplicity, put it once at the very top of your script, e.g.
set.seed(12345)
This will guarantee a reproducible result of your model next time you re-run your script.
Hope that will help.
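A minimal illustration of what set.seed() buys you, using only base R (the seed value and the variable names are arbitrary):

```r
# The same seed gives the same sequence of random draws.
set.seed(12345)
first_run <- sample(1:100, 5)

set.seed(12345)
second_run <- sample(1:100, 5)

identical(first_run, second_run)   # TRUE
```

SOM training starts from randomly chosen initial values, so the same principle applies there: set the seed before the training call, not only before the prediction.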

External validation of a Cox model using rcorr.cens() and val.surv

I have two independent datasets one with 5421 and the other 1000 subjects. What I would like to do is validate the Cox model obtained from the main dataset (main_dat, n=5421) using the external dataset (test_dat, n=1000). However, I get an error message using both rcorr.cens() in the Hmisc package and val.surv in rms. Here is what I have been doing:
library(rms)
surv.obj=with(main_dat,Surv(survival,surv_cens)) ## to use with rcorr.cens
phmodel=cph(surv.obj~sex+age+treatment, x=TRUE, y=TRUE, surv=T, time.inc=10, data=main_dat, se.fit=T)
estimates=survest(phmodel, newdata=test_dat, times=10)
rcorr.cens(x=estimates, S=surv.obj)
Error in rcorr.cens(x = estimates, S = surv.obj) :
y must have same length as x
w=val.surv(phmodel ,newdata=test_dat, u=10)
Error in val.surv(phmodel, newdata = test_dat, u = 10) :
dims [product 1000] do not match the length of object [5421]
In addition: Warning message:
In est.surv + S[, 1] :
longer object length is not a multiple of shorter object length
Am I doing something wrong, or must the two datasets have the same number of observations?
Any help will be greatly appreciated.
I don't see where test_dat has surv.obj defined. You'll either need to add that to test_dat or have a free-standing object surv.obj that is used in the calls.
Note that your sample sizes are not large enough for split-sample validation, i.e., if you re-split the sample multiple times you will get disagreements in the result. Rigorous bootstrap internal validation (using the rms package validate and calibrate functions) is usually more precise.
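The key point is that the predictions and the survival outcome must both come from the validation data, so that they have the same length. The sketch below illustrates this with simulated data and, as a deliberate substitution, uses coxph and concordance from the survival package rather than cph, rcorr.cens, or val.surv. The column names follow the question; make_data, the sample sizes, and all coefficients are invented for illustration.

```r
library(survival)

# Simulated stand-ins for main_dat and test_dat (all values are made up).
make_data <- function(n) {
  age <- rnorm(n, mean = 60, sd = 10)
  sex <- rbinom(n, 1, 0.5)
  time <- rexp(n, rate = 0.05 * exp(0.03 * (age - 60) + 0.3 * sex))
  data.frame(survival = time, surv_cens = rbinom(n, 1, 0.8),
             age = age, sex = sex)
}
set.seed(1)
main_dat <- make_data(500)
test_dat <- make_data(100)

# Fit on the main data only.
fit <- coxph(Surv(survival, surv_cens) ~ sex + age, data = main_dat)

# Predictions AND outcome are both taken from the external data.
lp <- predict(fit, newdata = test_dat, type = "lp")
test_surv <- with(test_dat, Surv(survival, surv_cens))
length(lp) == nrow(test_surv)   # TRUE: one prediction per external subject

# Concordance on the external data; reverse = TRUE because a higher
# linear predictor means higher risk, i.e. shorter survival.
cidx <- concordance(test_surv ~ lp, reverse = TRUE)
cidx$concordance
```

In the question's code the analogous fix would be to build the Surv object from test_dat rather than main_dat before passing it alongside the test-set estimates.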

R: finding the source code that produces the output for S4 slot?

G'day Everyone,
When the 'lmer' function in 'lme4' runs, it produces an S4 object with a lot of slots. I am interested in one of these slots, namely model@X, and how this 'X' slot output is produced. I want to try to reproduce this output for a different model function (glmmPQL) I am using, which does not automatically produce this 'X' output (FYI 'lmer' produces an object of class 'mer', and slot 'X' is a model matrix for the fixed effects). The code below shows what I am talking about.
What I want to figure out is how they produced this 'X' data. I looked at the code for 'lmer' by typing it in the terminal without '()' but I couldn't find anything there. I also tried showMethods('lmer') but it says function 'lmer': .
Just wondering if there is a way to get the source code for what the 'X' slot is doing in particular (or any slot in an S4 object)? Or does anyone know how to reproduce this? Thanks a lot for your help and time.
library(lme4)
# here is a quick example of what I am looking at using the cake dataset in the 'lme4' package
m <- lmer(angle ~ temp + recipe + (1 | replicate), family = gaussian, data = cake)
slotNames(m)
head(m@X)
You started off okay by printing lmer. That won't show you where m@X is set, but you can see which methods are called by lmer.
The methods within lmer can be accessed using lme4:::methodName.
If you look inside lme4:::lmer_finalize, you'll see (paraphrasing):
ans <- new(Class = "mer", ..., X = fr$X, ...)
So that's where the @X slot is being populated. Back up in lmer you'll see that fr comes from lme4:::lmerFrames, and specifically fr$X is calculated by:
X <- if (!is.empty.model(mt))
model.matrix(mt, mf, contrasts)
else matrix(, NROW(Y), 0)
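Since the slot is just the result of model.matrix applied to the fixed-effects terms, the same matrix can be built by hand for any model. Here is a minimal sketch using base R and the built-in warpbreaks data; the formula is an arbitrary example, and the names mf and mt simply mirror the snippet above.

```r
# Build a fixed-effects design matrix by hand (base R only).
mf <- model.frame(breaks ~ wool + tension, data = warpbreaks)  # model frame
mt <- attr(mf, "terms")                                        # terms object
X <- model.matrix(mt, mf)                                      # design matrix

head(X)
dim(X)   # one row per observation, one column per fixed-effect coefficient
```

For a glmmPQL fit you would pass only the fixed-effects part of the formula here, since the random-effects terms are specified separately and do not belong in this matrix.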
