R error using DBSCAN on a data frame

Error in data - x : non-numeric argument to binary operator
My code is as follows:
library(fpc)  # assumption: dbscan() here is fpc::dbscan, judging by the MinPts/showplot arguments
x <- as.factor(c(2, 2, 8, 5, 7, 6, 1, 4))
y <- as.factor(c(10, 5, 4, 8, 5, 4, 2, 9))
coordinates <- data.frame(x, y)
colnames(coordinates) <- c("x_coordinate", "y_coordinate")
print(coordinates)
point_clusters <- dbscan(coordinates, 2, MinPts = 2, scale = FALSE,
                         method = c("hybrid", "raw", "dist"), seeds = TRUE,
                         showplot = 1, countmode = NULL)
point_clusters
But I'm getting the following error when executing the above code:
> point_clusters <- dbscan(coordinates, 2, MinPts = 2, scale = FALSE, method = c("hybrid", "r ..." ... [TRUNCATED]
Error in data - x : non-numeric argument to binary operator
I don't know what the problem is with the above code.

I solved the problem for my purposes. I had read somewhere that the data needs to be a numeric matrix, although I'm not sure about that. So, here is what I did:
x <- c(2, 2, 8, 5, 7, 6, 1, 4)
y <- c(10, 5, 4, 8, 5, 4, 2, 9)
coordinates <- matrix(c(x, y), nrow = 8, byrow = FALSE)
The remaining code is the same as above. Now it works fine for me.
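A likely cause, offered as a sketch rather than a definitive diagnosis: as.factor() turns both columns into factors, and arithmetic on non-numeric data fails. The snippet below checks the column classes and converts the factors back to numbers by going through as.character() (calling as.numeric() directly on a factor returns the level codes, not the original values). The names column_classes and coordinates_num are illustrative only.

```r
# Same data as in the question, with the as.factor() calls left in
x <- as.factor(c(2, 2, 8, 5, 7, 6, 1, 4))
y <- as.factor(c(10, 5, 4, 8, 5, 4, 2, 9))
coordinates <- data.frame(x_coordinate = x, y_coordinate = y)

# Both columns are factors, so they are not numeric
column_classes <- sapply(coordinates, class)
print(column_classes)

# Convert each factor to numeric via character to keep the original values,
# then build a numeric matrix
coordinates_num <- as.matrix(
  data.frame(lapply(coordinates, function(col) as.numeric(as.character(col))))
)
print(is.numeric(coordinates_num))
```

A numeric matrix built this way can be passed on in place of the factor-based data frame.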

Related

R tidyverse warning: The `i` argument of ``[`()` can't be a matrix as of tibble 3.0.0

I get a warning when selecting rows based on the median of one of the variables in a tibble. See the details and the warning below. I wonder if there is a more tidyverse-style solution to this.
Example data:
library(tibble)
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
z <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
xyz <- tibble(x, y, z)
group1 <- xyz[xyz[2] < stats::median(purrr::as_vector(xyz$y), na.rm = TRUE), ]
Warning message:
The `i` argument of ``[`()` can't be a matrix as of tibble 3.0.0.
Convert to a vector.
Thanks in advance
library(dplyr)
xyz %>%
  filter(y < stats::median(y))
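The warning appears to come from the shape of the index: xyz[2] is a one-column table, so comparing it yields a logical matrix, whereas xyz[[2]] (or xyz$y) is a plain vector. The sketch below shows the distinction with a base data.frame so that it runs without extra packages; the same `[` versus `[[` difference applies to tibbles. The names cutoff, mask_matrix and mask_vector are illustrative.

```r
xyz <- data.frame(x = 1:10, y = 1:10, z = 1:10)
cutoff <- stats::median(xyz$y, na.rm = TRUE)

# Single-bracket selection keeps the table structure: the comparison is a matrix
mask_matrix <- xyz[2] < cutoff
# Double-bracket selection extracts the column: the comparison is a vector
mask_vector <- xyz[[2]] < cutoff

print(is.matrix(mask_matrix))
print(is.matrix(mask_vector))

# Row selection with the logical vector
group1 <- xyz[mask_vector, ]
print(nrow(group1))
```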

What does "Error: Must use a vector in `[`, not an object of class matrix." mean when running a PCA?

I am quite new to R and I am trying to run a PCA for an incomplete data set with the code:
res.comp <- imputePCA(questionaire_results_PCA, ncp = nb$ncp)
but R tells me:
Error: Must use a vector in [, not an object of class matrix.
Run rlang::last_error() to see where the error occurred.
So I run:
rlang::last_error()
R says:
1. missMDA::imputePCA(questionaire_results_PCA, ncp = nb$ncp)
4. tibble:::`[.tbl_df`(X, !is.na(X))
5. tibble:::check_names_df(i, x)
Run `rlang::last_trace()` to see the full context
So I run:
rlang::last_trace()
And R Says:
Must use a vector in `[`, not an object of class matrix.
Backtrace:
█
1. └─missMDA::imputePCA(questionaire_results_PCA, ncp = nb$ncp)
2. ├─base::mean((res.impute$fittedX[!is.na(X)] - X[!is.na(X)])^2)
3. ├─X[!is.na(X)]
4. └─tibble:::`[.tbl_df`(X, !is.na(X))
5. └─tibble:::check_names_df(i, x)
Does anyone know what this means and how I could get it to work?
I have run:
dput(head(questionaire_results_PCA))
and I got:
structure(list(Active = c(6, 6, 5, 7, 5, 6), `Aggressive to people` = c(NA,
4, NA, 2, NA, 1), Anxious = c(NA, 4, NA, 3, NA, 2), Calm = c(NA,
5, NA, 5, NA, 6), Cooperative = c(7, 6, 7, 6, 6, 6), Curious = c(7,
2, 7, 7, 7, 6), Depressed = c(1, 3, 1, 1, 1, 1), Eccentric = c(1,
3, 1, 4, 1, 4), Excitable = c(5, 2, 5, 5, 4, 4), `Fearful of people` = c(1,
2, 1, 2, 1, 1), `friendly of people` = c(5, 6, 7, 7, 7, 7), Insecure = c(2,
5, 2, 3, 2, 2), Playful = c(4, 6, 2, 5, 6, 6), `Self assured` = c(7,
6, 7, 5, 6, 6), Smart = c(6, 2, 7, 5, 7, 3), Solitary = c(4,
4, 3, 4, 3, 2), Tense = c(1, 2, 1, 3, 1, 2), Timid = c(2, 2,
2, 2, 2, 2), Trusting = c(6, 6, 6, 6, 6, 6), Vigilant = c(7,
6, 5, 3, 5, 3), Vocal = c(2, 7, 1, 6, 1, 7)), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
I then ran the code:
dput(nb$ncp)
and got:
3L
Here's the answer in case anyone comes across the same issue. Using the data provided by OP:
class(questionaire_results_PCA)
[1] "tbl_df" "tbl" "data.frame"
imputePCA expects a data.frame or a matrix as input, and it does not work with a tibble. So we need to convert the tibble to a plain data.frame or a matrix:
library(missMDA)
res.comp <- imputePCA(data.frame(questionaire_results_PCA), ncp = 2)
Error in eigen(crossprod(t(X), t(X)), symmetric = TRUE) :
infinite or missing values in 'x'
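I get this error because this is only a subset of the data and some of the columns have zero standard deviation, so we drop those columns first:
# na.rm = TRUE keeps the columns that contain NAs; without it sd() returns NA
# for them and which() silently drops the very columns that need imputing
sel <- which(apply(questionaire_results_PCA, 2, sd, na.rm = TRUE) != 0)
# returns you a data.frame
res1 <- imputePCA(as.data.frame(questionaire_results_PCA[, sel]), ncp = 2)
# returns you a matrix
res2 <- imputePCA(as.matrix(questionaire_results_PCA[, sel]), ncp = 2)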
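One thing to watch when filtering columns by standard deviation on data with missing values: sd() returns NA for any column containing an NA unless na.rm = TRUE is set. The sketch below uses three columns copied from the dput() output in the question; the names questionnaire_subset, sd_default, sd_na_rm and keep are illustrative.

```r
# Three columns taken from the question's dput() output
questionnaire_subset <- data.frame(
  Active  = c(6, 6, 5, 7, 5, 6),
  Anxious = c(NA, 4, NA, 3, NA, 2),
  Timid   = c(2, 2, 2, 2, 2, 2)
)

# Default sd(): NA for the column with missing values
sd_default <- sapply(questionnaire_subset, sd)
# With na.rm = TRUE the incomplete column gets a usable value
sd_na_rm <- sapply(questionnaire_subset, sd, na.rm = TRUE)

# Keep only columns with non-zero spread (drops the constant column Timid)
keep <- names(sd_na_rm)[sd_na_rm > 0]
print(keep)
```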

How to update a list in a for loop(cannot store ggplot object into the list) [duplicate]

My problem is similar to this one; when I generate plot objects (in this case histograms) in a loop, it seems that all of them are overwritten by the most recent plot.
To debug, within the loop, I am printing the index and the generated plot, both of which appear correctly. But when I look at the plots stored in the list, they are all identical except for the label.
(I'm using multiplot to make a composite image, but you get the same outcome if you print(myplots[[1]]) through print(myplots[[4]]) one at a time.)
Because I already have an attached dataframe (unlike the poster of the similar problem), I am not sure how to solve the problem.
(btw, column classes are factor in the original dataset I am approximating here, but same problem occurs if they are integer)
Here is a reproducible example:
library(ggplot2)
source("http://peterhaschke.com/Code/multiplot.R") #load multiplot function
#make sample data
col1 <- c(2, 4, 1, 2, 5, 1, 2, 0, 1, 4, 4, 3, 5, 2, 4, 3, 3, 6, 5, 3, 6, 4, 3, 4, 4, 3, 4,
2, 4, 3, 3, 5, 3, 5, 5, 0, 0, 3, 3, 6, 5, 4, 4, 1, 3, 3, 2, 0, 5, 3, 6, 6, 2, 3,
3, 1, 5, 3, 4, 6)
col2 <- c(2, 4, 4, 0, 4, 4, 4, 4, 1, 4, 4, 3, 5, 0, 4, 5, 3, 6, 5, 3, 6, 4, 4, 2, 4, 4, 4,
1, 1, 2, 2, 3, 3, 5, 0, 3, 4, 2, 4, 5, 5, 4, 4, 2, 3, 5, 2, 6, 5, 2, 4, 6, 3, 3,
3, 1, 4, 3, 5, 4)
col3 <- c(2, 5, 4, 1, 4, 2, 3, 0, 1, 3, 4, 2, 5, 1, 4, 3, 4, 6, 3, 4, 6, 4, 1, 3, 5, 4, 3,
2, 1, 3, 2, 2, 2, 4, 0, 1, 4, 4, 3, 5, 3, 2, 5, 2, 3, 3, 4, 2, 4, 2, 4, 5, 1, 3,
3, 3, 4, 3, 5, 4)
col4 <- c(2, 5, 2, 1, 4, 1, 3, 4, 1, 3, 5, 2, 4, 3, 5, 3, 4, 6, 3, 4, 6, 4, 3, 2, 5, 5, 4,
2, 3, 2, 2, 3, 3, 4, 0, 1, 4, 3, 3, 5, 4, 4, 4, 3, 3, 5, 4, 3, 5, 3, 6, 6, 4, 2,
3, 3, 4, 4, 4, 6)
data2 <- data.frame(col1,col2,col3,col4)
data2[,1:4] <- lapply(data2[,1:4], as.factor)
colnames(data2)<- c("A","B","C", "D")
#generate plots
myplots <- list() # new empty list
for (i in 1:4) {
  p1 <- ggplot(data=data.frame(data2),aes(x=data2[ ,i]))+
    geom_histogram(fill="lightgreen") +
    xlab(colnames(data2)[ i])
  print(i)
  print(p1)
  myplots[[i]] <- p1 # add each plot into plot list
}
multiplot(plotlist = myplots, cols = 4)
When I look at a summary of a plot object in the plot list, this is what I see
> summary(myplots[[1]])
data: A, B, C, D [60x4]
mapping: x = data2[, i]
faceting: facet_null()
-----------------------------------
geom_histogram: fill = lightgreen
stat_bin:
position_stack: (width = NULL, height = NULL)
I think that mapping: x = data2[, i] is the problem, but I am stumped! I can't post images, so you'll need to run my example and look at the graphs if my explanation of the problem is confusing.
Thanks!
In addition to the other excellent answer, here’s a solution that uses “normal”-looking evaluation rather than eval. Since for loops have no separate variable scope (i.e. they run in the current environment), we need to wrap the loop body in local; in addition, we need to make i a local variable, which we can do by re-assigning it to its own name¹:
myplots <- vector('list', ncol(data2))
for (i in seq_along(data2)) {
  message(i)
  myplots[[i]] <- local({
    i <- i
    p1 <- ggplot(data2, aes(x = data2[[i]])) +
      geom_histogram(fill = "lightgreen") +
      xlab(colnames(data2)[i])
    print(p1)
  })
}
However, an altogether cleaner way is to forgo the for loop entirely and use list functions to build the result. This works in several possible ways. The following is the easiest in my opinion:
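plot_data_column = function (data, column) {
  # .data[[column]] is the current replacement for aes_string(x = column),
  # which is deprecated in recent ggplot2 releases
  ggplot(data, aes(x = .data[[column]])) +
    geom_histogram(fill = "lightgreen") +
    xlab(column)
}
myplots <- lapply(colnames(data2), plot_data_column, data = data2)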
This has several advantages: it’s simpler, and it won’t clutter the environment (with the loop variable i).
¹ This might seem confusing: why does i <- i have any effect at all? Because by performing the assignment we create a new, local variable with the same name as the variable in the outer scope. We could equally have used a different name, e.g. local_i <- i.
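The capture problem is not specific to ggplot2. As a minimal base-R sketch (closures stand in here for the lazily evaluated aes() mapping, and the names late and fixed are illustrative), functions created in a loop all see the final value of i unless each one gets its own copy via local:

```r
# Closures created directly in the loop share the loop variable i
late <- list()
for (i in 1:3) {
  late[[i]] <- function() i * 10
}
# Called after the loop, every closure sees the final value of i
late_values <- vapply(late, function(f) f(), numeric(1))

# local() plus i <- i gives each closure its own private copy of i
fixed <- list()
for (i in 1:3) {
  fixed[[i]] <- local({
    i <- i
    function() i * 10
  })
}
fixed_values <- vapply(fixed, function(f) f(), numeric(1))

print(late_values)   # 30 30 30
print(fixed_values)  # 10 20 30
```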
Because of all the quoting of expressions that get passed around, the i that is evaluated at the end of the loop is whatever i happens to be at that time, which is its final value. You can get around this with eval(substitute(...)), substituting in the right value during each iteration.
myplots <- list() # new empty list
for (i in 1:4) {
  p1 <- eval(substitute(
    ggplot(data=data.frame(data2),aes(x=data2[ ,i]))+
      geom_histogram(fill="lightgreen") +
      xlab(colnames(data2)[ i])
    ,list(i = i)))
  print(i)
  print(p1)
  myplots[[i]] <- p1 # add each plot into plot list
}
multiplot(plotlist = myplots, cols = 4)
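To see what substitute() does in isolation, here is a small base-R sketch (the names exprs and values are illustrative, and the expression i * 10 merely stands in for the ggplot call). Each stored expression has the current value of i written into it, so nothing depends on i once the loop has finished:

```r
exprs <- list()
for (i in 1:3) {
  # Replace the symbol i inside the expression with its current value
  exprs[[i]] <- substitute(i * 10, list(i = i))
}

# The stored expressions no longer mention i at all
print(exprs[[2]])

# Evaluating them later gives one distinct result per iteration
values <- vapply(exprs, function(e) eval(e), numeric(1))
print(values)  # 10 20 30
```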
Using lapply works too, as x exists within the anonymous function environment (using mtcars as data):
library(ggplot2)
library(ggthemes)  # provides theme_wsj() and scale_colour_wsj()
plot <- lapply(seq_len(ncol(mtcars)), FUN = function(x) {
  ggplot(data = mtcars) +
    geom_line(aes(x = mpg, y = mtcars[ , x]), size = 1.4, color = "midnightblue", inherit.aes = FALSE) +
    labs(x="Date", y="Value", title = "Revisions 1M", subtitle = colnames(mtcars)[x]) +
    theme_wsj() +
    scale_colour_wsj("colors6")
})
I have run the code in the question and in the answer, changing geom_histogram to geom_bar to avoid the error: Error: StatBin requires a continuous x variable.
Here is the code with the visualizations:
Question
#generate plots
myplots <- list() # new empty list
for (i in 1:4) {
  p1 <- ggplot(data=data.frame(data2),aes(x=data2[ ,i]))+
    geom_bar(fill="lightgreen") +
    xlab(colnames(data2)[ i])
  print(i)
  print(p1)
  myplots[[i]] <- p1 # add each plot into plot list
}
multiplot(plotlist = myplots, cols = 4)
#> Loading required package: grid
Answer
myplots <- vector('list', ncol(data2))
for (i in seq_along(data2)) {
  message(i)
  myplots[[i]] <- local({
    i <- i
    p1 <- ggplot(data2, aes(x = data2[[i]])) +
      geom_bar(fill = "lightgreen") +
      xlab(colnames(data2)[i])
    print(p1)
  })
}
multiplot(plotlist = myplots, cols = 4)
Same result using lapply:
plot_data_column = function (data, column) {
  ggplot(data, aes_string(x = column)) +
    geom_bar(fill = "lightgreen") +
    xlab(column)
}
myplots <- lapply(colnames(data2), plot_data_column, data = data2)
multiplot(plotlist = myplots, cols = 4)
#> Loading required package: grid
Created on 2021-04-09 by the reprex package (v0.3.0)

How does one calculate LD50 from a glmer?

I am analyzing a data set where ~10 individuals are exposed to a set treatment (Time) and mortality is recorded (Alive, Dead). glmer was used to model the data because Treatments were blocked (Trial).
From the following model I want to predict the Time at which 50% of individuals die.
library(lme4)  # provides glmer()
Trial <- c(1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3)
Time <- c(2, 6, 9, 12, 15, 18, 21, 24, 1, 2, 3, 4, 5, 6, 1.5, 3, 4.5, 6, 39)
Alive <- c(10, 0, 0, 0, 0, 0, 0, 0, 6, 2, 8, 1, 0, 0, 4, 6, 1, 2, 0)
Dead <- c(0, 10, 6, 10, 10, 10, 7, 10, 0, 8, 1, 9, 10, 10, 5, 0, 8, 6, 10)
ostrinaA.glmm <- glmer(cbind(Alive, Dead) ~ Time + (1 | Trial), family = binomial(link = "logit"))
summary(ostrinaA.glmm)
If I were simply modelling with glm, the dose.p function from MASS could be used. On a different forum I found generalized code for a dose.p.glmm function by Bill Pikounis. It is as follows:
dose.p.glmm <- function(obj, cf = 1:2, p = 0.5) {
  eta <- obj$family$linkfun(p)
  b <- fixef(obj)[cf]
  x.p <- (eta - b[1L])/b[2L]
  names(x.p) <- paste("p = ", format(p), ":", sep = "")
  pd <- -cbind(1, x.p)/b[2L]
  SE <- sqrt(((pd %*% vcov(obj)[cf, cf]) * pd) %*% c(1, 1))
  res <- structure(x.p, SE = SE, p = p)
  class(res) <- "glm.dose"
  res
}
I'm new to coding and need help adjusting this code for my model. My attempt is as follows:
dose.p.glmm <- function(ostrinaA.glmm, cf = 1:2, p = 0.5) {
  eta <- ostrinaA.glmm$family$linkfun(p)
  b <- fixef(ostrinaA.glmm)[cf]
  x.p <- (eta - b[1L])/b[2L]
  names(x.p) <- paste("p = ", format(p), ":", sep = "")
  pd <- -cbind(1, x.p)/b[2L]
  SE <- sqrt(((pd %*% vcov(obj)[cf, cf]) * pd) %*% c(1, 1))
  res <- structure(x.p, SE = SE, p = p)
  class(res) <- "glm.dose"
  res
}
dose.p.glmm(ostrinaA.glmm, cf=1:2, p=0.5)
Error in ostrinaA.glmm$family : $ operator not defined for this S4 class
Any assistance adjusting this code for my model would be greatly appreciated.
At a quick glance I would think replacing
eta <- obj$family$linkfun(p)
with
f <- family(obj)
eta <- f$linkfun(p)
should do the trick.
You also need to replace the res <- ... line with
res <- structure(x.p, SE = matrix(SE), p = p)
This is rather obscure, but it is necessary because the print.glm.dose method (from the MASS package) automatically tries to cbind() some stuff together. This fails if SE is a fancy matrix from the Matrix package rather than a vanilla matrix from base R: matrix() does the conversion.
If you are very new to coding, you might not realize that you don't have to rename the obj variable in the code you've copied to ostrinaA.glmm. In other words, Pikounis's code should work perfectly well with only the two modifications I suggested above.
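For readers who want to see the arithmetic behind dose.p.glmm, here is a base-R sketch of the same delta-method calculation. dose_from_coefs is a hypothetical helper, not part of lme4 or MASS, and the coefficients and covariance matrix are made-up illustrative numbers, not estimates from the model in the question. With a fitted glmer model one would presumably pass fixef(fit) and as.matrix(vcov(fit)) instead.

```r
# Hypothetical helper: dose at which the linear predictor reaches linkfun(p),
# with a delta-method standard error
dose_from_coefs <- function(b, V, p = 0.5, linkfun = stats::qlogis) {
  eta <- linkfun(p)                   # target on the link scale (0 for p = 0.5, logit)
  x_p <- unname((eta - b[1]) / b[2])  # solve b0 + b1 * x = eta for x
  grad <- -c(1, x_p) / unname(b[2])   # gradient of x_p with respect to (b0, b1)
  se <- sqrt(drop(t(grad) %*% V %*% grad))
  c(dose = x_p, SE = se)
}

# Made-up intercept/slope and covariance matrix, for illustration only
b_example <- c(2, -0.5)
V_example <- matrix(c(0.04, -0.005,
                      -0.005, 0.001), nrow = 2, byrow = TRUE)

ld50 <- dose_from_coefs(b_example, V_example, p = 0.5)
print(ld50)
```

The gradient line corresponds to pd in the original function, and the quadratic form is an equivalent way of writing the SE line there.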

how to plot the results of a LDA

There are quite a few answers to this question, not only on Stack Overflow but across the internet. However, none of them solved my problem. I have two problems.
I simulated some data for you:
df <- structure(list(Group = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2,
2, 2, 2), var1 = c(2, 3, 1, 2, 3, 2, 3, 3, 5, 6, 7, 6, 8, 5,
5), var2 = c(9, 9, 9, 8, 7, 8, 9, 3, 2, 2, 1, 1, 2, 3, 3), var3 = c(6,
7, 6, 6, 5, 6, 7, 1, 2, 1, 2, 3, 1, 1, 2)), .Names = c("Group",
"var1", "var2", "var3"), row.names = c(NA, -15L), class = "data.frame")
then I do as follows:
library(MASS)  # provides lda()
fit <- lda(Group~., data=df)
plot(fit)
I end up with the groups appearing in two different plots.
How can I plot my results in one figure, as in e.g. "Linear discriminant analysis plot" or "Linear discriminant analysis plot using ggplot2", or any other beautiful plot?
The plot() function actually calls plot.lda(), the source code of which you can check by running getAnywhere("plot.lda"). This plot() function does quite a lot of processing of the LDA object that you pass in before plotting. As a result, if you want to customize how your plots look, you will probably have to write your own function that extracts information from the lda object and then passes it to a plot function. Here is an example (I don't know much about LDA, so I just trimmed the source code of the default plot.lda and used the ggplot2 package, which is very flexible, to create a bunch of plots).
#If you don't have ggplot2 package, here is the code to install it and load it
install.packages("ggplot2")
library("ggplot2")
library("MASS")
#this is your code. The only thing I've changed here is the Group labels because you want a character vector instead of numeric labels
df <- structure(list(Group = c("a", "a", "a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "b", "b", "b"),
var1 = c(2, 3, 1, 2, 3, 2, 3, 3, 5, 6, 7, 6, 8, 5, 5),
var2 = c(9, 9, 9, 8, 7, 8, 9, 3, 2, 2, 1, 1, 2, 3, 3),
var3 = c(6, 7, 6, 6, 5, 6, 7, 1, 2, 1, 2, 3, 1, 1, 2)),
.Names = c("Group","var1", "var2", "var3"),
row.names = c(NA, -15L), class = "data.frame")
fit <- lda(Group~., data=df)
#here is the custom function I made that extracts the proper information from the LDA object. You might want to write your own version of this to make sure it works with all cases (all I did here was trim the original plot.lda() function, but I might've deleted some code that might be relevant for other examples)
ggplotLDAPrep <- function(x){
  if (!is.null(Terms <- x$terms)) {
    data <- model.frame(x)
    X <- model.matrix(delete.response(Terms), data)
    g <- model.response(data)
    xint <- match("(Intercept)", colnames(X), nomatch = 0L)
    if (xint > 0L)
      X <- X[, -xint, drop = FALSE]
  }
  means <- colMeans(x$means)
  X <- scale(X, center = means, scale = FALSE) %*% x$scaling
  # one row per observation: discriminant scores plus the group label
  rtrn <- data.frame(X,labels=as.character(g))
  return(rtrn)
}
fitGraph <- ggplotLDAPrep(fit)
#Here are some examples of using ggplot to display your results. If you like what you see, I suggest to learn more about ggplot2 and then you can easily customize your plots
#this is similar to the result you get when you ran plot(fit)
ggplot(fitGraph, aes(LD1))+geom_histogram()+facet_wrap(~labels, ncol=1)
#Same as previous, but all the groups are on the same graph
ggplot(fitGraph, aes(LD1,fill=labels))+geom_histogram()
The following example won't work with your example because you don't have LD2, but this is equivalent to the scatter plot in the external example you provided. I've loaded that example here as a demo
ldaobject <- lda(Species~., data=iris)
fitGraph <- ggplotLDAPrep(ldaobject)
ggplot(fitGraph, aes(LD1,LD2, color=labels))+geom_point()
I didn't customize ggplot settings much, but you can make your graphs look like anything you want if you play around with it. Hope this helps!
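As an alternative sketch (not the approach used above), predict() on an lda fit from MASS also returns discriminant scores in its x component, which can be put into a data frame for plotting. These scores may be centered differently from the ones ggplotLDAPrep computes, so treat the two as similar rather than identical. The names fit_iris and scores are illustrative.

```r
library(MASS)

# Fit on iris, which has three groups and therefore two discriminants
fit_iris <- lda(Species ~ ., data = iris)

# predict() returns a list; the x component holds the LD scores per observation
scores <- data.frame(predict(fit_iris)$x, labels = iris$Species)
print(head(scores))
```

The resulting data frame has the same LD1, LD2 and labels columns that the ggplot calls above expect.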
