When running the code of this example I'm getting the following error in the last line:
Error in matrix(mean(range), ncol = ncol(x), nrow = nrow(x), dimnames
= dimnames(x)) : non-numeric matrix extent
However, I remember seeing other cases some months ago where the arulesViz library worked with categorical data.
landing.data=read.csv2("http://archive.ics.uci.edu/ml/machine-learning-databases/shuttle-landing-control/shuttle-landing-control.data",
sep=",", header=F, dec=".")
landing.data=as.data.frame(sapply(landing.data,gsub,pattern="\\*",replacement=10))
library(arules)
landing.system <- as(landing.data, "transactions")
rules <- apriori(landing.system, parameter=list(support=0.01, confidence=0.6))
rulesLandingManual <- subset(rules, subset=rhs %in% "V1=1" & lift>1.2)
library(arulesViz)
plot(head(sort(rulesLandingManual, by="confidence"), n=3),
method="graph",control=list(type="items"))
Doing a traceback() after running your code gives this:
6: matrix(mean(range), ncol = ncol(x), nrow = nrow(x), dimnames = dimnames(x))
5: map(m, c(5, 20))
4: graph_arules(x, measure = measure, shading = shading, control,
...)
3: plot.rules(head(sort(rulesLandingManual, by = "confidence"),
n = 3), method = "graph", control = list(type = "items"))
2: plot(head(sort(rulesLandingManual, by = "confidence"), n = 3),
method = "graph", control = list(type = "items"))
1: plot(head(sort(rulesLandingManual, by = "confidence"), n = 3),
method = "graph", control = list(type = "items"))
So, basically, the error comes from frame 6:. It means that one of the extent arguments passed to matrix() (ncol or nrow) is not numeric. To illustrate this:
> matrix(1:4, ncol=2)
# [,1] [,2]
# [1,] 1 3
# [2,] 2 4
> matrix(1:4, ncol="x")
# Error in matrix(1:4, ncol = "x") : non-numeric matrix extent
You see the error? I don't think there's much YOU can do here, as the package extends graph, map and matrix to objects of class rules. So this probably has to be fixed on the developer side. If that is indeed the case, it is probably worth contacting the developers.
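One common way to end up with this error, sketched here under the assumption (not checked against the arulesViz source) that the object reaching matrix() has lost its matrix dimensions: ncol() and nrow() of a plain vector are NULL, and matrix() rejects NULL just as it rejects "x". The variable x below is a stand-in, not the package's actual object.

```r
# Assumption: x stands in for whatever object reaches matrix() inside the
# plotting code. Here it is a plain vector, so it has no dim attribute.
x <- c(5, 20, 12)

print(ncol(x))  # NULL, because a vector is not a matrix
print(nrow(x))  # NULL

# Capture the error message instead of stopping the script.
msg <- tryCatch(
  {
    matrix(0, ncol = ncol(x), nrow = nrow(x))
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

If that is what happens here, checking dim() on the intermediate object (for example via options(error = recover)) would confirm it.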
I had exactly the same problem with some data I was mining rules for, and after some tests I found that the error comes from using sort() and head() when more rules are tied on the quality measure than the number requested.
For instance, in your code you ask to plot the top 3 rules by confidence in rulesLandingManual, but if you inspect(rulesLandingManual) you find that there are 216 rules with confidence 1 (the maximum). So when you ask for the top n (with n less than 217), the matrix generated for the new rules object gets messed up, at least for the graph method of the plot function.
To test this, change n in your code to anything from 217 to 224 (224 is the number of rules in rulesLandingManual) and it will draw the graph, while n = 216 or less will cause the error above.
I don't know whether this is intended behaviour or a bug. I am trying to figure it out at the moment, so an explanation would come in really handy.
range is a function. Did you mean mean(range(x)), ...?
Mean mean. Heh.
I'm trying to run HERGM and MLERGM models on network data, but my code always returns the same error:
Error in rep(" ", max_char - num_chars[i]) : invalid 'times' argument
In addition: Warning message:
In max(num_chars) : no non-missing arguments to max; returning -Inf
I'm using a much larger network dataset, consisting of 5969 nodes and explanatory variables, but I've made a smaller, reproducible example below of some basic edges-only models.
library(mlergm)
library(hergm)
#HERGM
my_sociomatrix <- matrix(round(runif(20*20)), # edge values
nrow = 20, #nrow must be same as ncol
ncol = 20)
test.network <-
network(x = my_sociomatrix,
directed = F, matrix.type = "adjacency")
test.model <-
hergm(test.network ~
edges_ij,
max_iter = 4,
method = "ml")
#MLERGM
my_sociomatrix <- matrix(round(runif(30*30)),
nrow = 30,
ncol = 30)
node_memb <- c(rep(1, 10), rep(2, 10), rep(3, 10))
mlnet <- mlnet(network = my_sociomatrix,
node_memb = node_memb)
model_est <- mlergm(mlnet ~ edges)
My question is, why am I encountering such an error? And what can I do to solve it?
Did you try to include a gwesp term in your last line of code to mlergm:
model_est <- mlergm(mlnet ~ edges + gwesp)?
This worked for me, although I'm not sure why.
I don't have knowledge about hergm, but maybe something similar works there.
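The error message itself gives a hint about the mechanism, even if it does not explain why adding gwesp helps. A minimal base R sketch, assuming num_chars (the name taken from the error message) was an empty vector at the point of failure:

```r
# Assumption: num_chars stands in for the vector that the failing line in the
# package works with. Here it is empty.
num_chars <- integer(0)

# max() of an empty vector returns -Inf (with the warning from the question).
max_char <- suppressWarnings(max(num_chars))
print(max_char)

# Indexing an empty vector gives NA, so the requested number of repeats is NA,
# and rep() refuses a missing, negative or infinite 'times'.
msg <- tryCatch(
  {
    rep(" ", max_char - num_chars[1])
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

So whatever the package is measuring the width of appears to be empty for an edges-only model, which would be consistent with the extra term making the error go away.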
I am trying to learn about the "kohonen" package in R. In particular, there is a function called "supersom()" (https://www.rdocumentation.org/packages/kohonen/versions/3.0.10/topics/supersom , corresponding to the SOM (Self Organizing Maps) algorithm used in unsupervised machine learning) that I am trying to apply on some data.
Below, (from a previous question: R error: "Error in check.data : Argument Should be Numeric") I learned how to apply the "supersom()" function on some artificially created data with both "factor" and "numeric" variables.
#the following code works
#load libraries
library(kohonen)
library(dplyr)
#create and format data
a =rnorm(1000,10,10)
b = rnorm(1000,10,5)
c = rnorm(1000,5,5)
d = rnorm(1000,5,10)
e <- sample( LETTERS[1:4], 100 , replace=TRUE, prob=c(0.25, 0.25, 0.25, 0.25) )
f <- sample( LETTERS[1:5], 100 , replace=TRUE, prob=c(0.2, 0.2, 0.2, 0.2, 0.2) )
g <- sample( LETTERS[1:2], 100 , replace=TRUE, prob=c(0.5, 0.5) )
data = data.frame(a,b,c,d,e,f,g)
data$e = as.factor(data$e)
data$f = as.factor(data$f)
data$g = as.factor(data$g)
cols <- 1:4
data[cols] <- scale(data[cols])
#som model
som <- supersom(data= as.list(data), grid = somgrid(10,10, "hexagonal"),
dist.fct = "euclidean", keep.data = TRUE)
Everything works well. The problem is that when I try to apply the "supersom()" function to more realistic and bigger data, I get the following error:
"Error: Non-informative layers present : mean distances between objects zero"
When I look at the source code for this function (https://rdrr.io/cran/kohonen/src/R/supersom.R), I notice a reference for the same error:
if (any(sapply(meanDistances, mean) < .Machine$double.eps))
stop("Non-informative layers present: mean distance between objects zero")
Can someone please show me how I might be able to resolve this error, i.e. make the "supersom()" function work with factor and numeric data?
I thought that perhaps removing duplicate rows and NA's might fix this problem:
data <- na.omit(data)
data <- unique(data)
However the same error ("Non-informative layers present : mean distances between objects zero") is still there.
Can someone please help me figure out what might be causing this error? Note: when I remove the "factor" variables, everything works fine.
Sources:
https://cran.r-project.org/web/packages/kohonen/kohonen.pdf
https://www.rdocumentation.org/packages/kohonen/versions/2.0.5/topics/supersom
https://rdrr.io/cran/kohonen/src/R/supersom.R
The error happens when a layer carries no variation, for example a numeric column in which every value is the same, so that the mean distance between objects in that layer is zero. You can reproduce the error by setting any one column to a constant such as 0.
data$a <- 0
som <- supersom(data= as.list(data), grid = somgrid(10,10, "hexagonal"),
dist.fct = "euclidean", keep.data = TRUE)
Error in supersom(data = as.list(data), grid = somgrid(10, 10, "hexagonal"), :
Non-informative layers present: mean distance between objects zero
Maybe you can investigate why those columns are constant, or remove the constant columns from the data.
library(kohonen)
library(dplyr)
# keep non-numeric columns, and numeric columns whose standard deviation is positive
data <- data %>% select(where(~ !is.numeric(.x) || sd(.x, na.rm = TRUE) > 0))
#som model
som <- supersom(data= as.list(data), grid = somgrid(10,10, "hexagonal"),
dist.fct = "euclidean", keep.data = TRUE)
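Since the question says the error disappears once the factor variables are removed, it may be worth checking the factors for the same lack of variation. The following is a base R sketch on made-up data; constant_columns() is a hypothetical helper, not part of kohonen, and whether a single-level factor triggers exactly this error in supersom() is an assumption based on the "mean distance between objects zero" check quoted above.

```r
# Hypothetical helper (not part of kohonen): names of columns in which every
# non-missing value is the same, for numeric columns and factors alike.
constant_columns <- function(df) {
  is_constant <- vapply(df, function(col) {
    length(unique(col[!is.na(col)])) <= 1
  }, logical(1))
  names(df)[is_constant]
}

# Toy data, assumed for illustration only.
toy <- data.frame(
  a = c(1.2, 3.4, 5.6, 7.8),
  b = 0,                                          # constant numeric column
  e = factor(c("A", "B", "A", "B")),
  g = factor(rep("A", 4), levels = c("A", "B"))   # only one level observed
)

print(constant_columns(toy))

# Drop the constant columns before building the list of layers for supersom().
toy_clean <- toy[, setdiff(names(toy), constant_columns(toy)), drop = FALSE]
print(names(toy_clean))
```

Running constant_columns() on the real data should show quickly whether any layer is non-informative in this sense.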
I have looked on Google and on StackOverflow to find a solution to my problem. I have tried a few things now, and nothing seems to be working.
I am trying to create a correlation boxplot of linguistic features. For each feature (36 in total), there is a 1 in Excel for when a speaker used it, and a 0 for when a speaker did not.
There are 41 speakers, none of whom used all 36 features, though the lowest score is 8. I want to analyse my data to see which features correlate, and therefore find out which features predict the use of other features.
I have been using corrplot in R. Here is the command I have been using:
cor_mat <- df_analysis %>%
replace(., is.na(.), 0) %>%
cor(method = "spearman")
cor_residuals <- cor.mtest(cor_mat, conf.level = .95)
But, I get an error saying:
Error in cor.test.default(x = mat[, i], y = mat[, j], ...) : not enough finite observations
Does anybody know why and how I can rectify it? In fact, all I really need to know is what the problem is, and I can probably figure it out on my own from there. Though I would be hugely grateful if you also have the solution!
Many thanks!
You have several columns in your data set that have no variation; thus the correlations for these variables are all NA, which screws things up downstream.
which(apply(df_analysis,2,sd)==0)
## [1] a' c[h]lach bheag [3] a' c[h]loich bhig [14] a' b[h]ord bheag
## 1 3 14
## [26] nan su[ ]l [27] nan sul[ ]
## 26 27
I figured this out by setting options(error=recover) and running to see where the error occurred (this setting drops you into browser/debug mode when an error occurs). More directly, I should have done corrplot(cor_mat), which helpfully puts question marks for NA values ...
image(), or heatmap(as.matrix(df_analysis),Rowv=NA,Colv=NA, scale="none", margins=c(10,8)), would be good for looking at your raw data.
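To make the zero-variation point concrete, here is a small base R sketch on a made-up 0/1 feature table (the column names f1 to f3 are invented for illustration). It shows the NA correlations and one way of dropping the constant columns before computing the correlation matrix.

```r
# Toy 0/1 feature table, assumed for illustration. f3 is used by every speaker.
features <- data.frame(
  f1 = c(1, 0, 1, 1, 0, 1),
  f2 = c(0, 0, 1, 1, 0, 1),
  f3 = c(1, 1, 1, 1, 1, 1)
)

# cor() warns that the standard deviation is zero and returns NA for f3.
cor_all <- suppressWarnings(cor(features, method = "spearman"))
print(cor_all)

# Keep only the columns that vary, then recompute.
varies <- vapply(features, function(col) sd(col) > 0, logical(1))
cor_ok <- cor(features[, varies, drop = FALSE], method = "spearman")
print(cor_ok)
```

The same varies filter can be applied to df_analysis before any correlation test is run on it.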
A possible corrplot call looks like this:
df_cor <- cor(df_analysis)
corrplot(df_cor, type = "full", order = "hclust",
outline.color = "white", hc.method = "ward",
pch.cex = .5, show.diag = TRUE,
p.mat = cor_residuals$p, insig = "blank", sig.level = .01,
addrect = 20, tl.srt = 36, tl.cex = .8, tl.col = "black",
col = rev(lacroix_palette("PassionFruit", 8, "continuous")))
I have price data for an asset. I want to fit a Markov switching model (with 2 states). The code I have run is below. Price is stored as numeric and date as a Date. I'm not sure where I'm going wrong.
library(MSwM)
# Loading required package: parallel
library(ggplot2)
nstates <- 6
olsPrice <- lm(PriceUSD~date, Priced)
msmPrice <- msmFit(olsPrice, k = nstates, sw = c(FALSE, TRUE, TRUE))
The error message I get is:
Error in w * matrix(resid(modaux), ncol = k, byrow = T)^2 :
non-conformable arrays
I know that the smoothing parameter (lambda) is quite important for fitting a smoothing spline, but I did not see any post here about how to select a reasonable lambda (spar=?). I was told that spar normally ranges from 0 to 1. Could anyone share their experience with smooth.spline()? Thanks.
smooth.spline(x, y = NULL, w = NULL, df, spar = NULL,
cv = FALSE, all.knots = FALSE, nknots = NULL,
keep.data = TRUE, df.offset = 0, penalty = 1,
control.spar = list(), tol = 1e-6 * IQR(x))
agstudy provides a visual way to choose spar. What I remember from my linear models class (though not exactly) is to use cross-validation to pick the "best" spar. Here's a toy example borrowed from agstudy:
x <- 1:18
y <- c(1:3, 5, 4, 7:3, 2*(2:5), rep(10, 4))
splineres <- function(spar){
  res <- rep(0, length(x))
  for (i in seq_along(x)){
    mod <- smooth.spline(x[-i], y[-i], spar = spar)
    res[i] <- predict(mod, x[i])$y - y[i]
  }
  return(sum(res^2))
}
spars <- seq(0, 1.5, by = 0.001)
ss <- rep(0, length(spars))
for (i in seq_along(spars)){
  ss[i] <- splineres(spars[i])
}
plot(spars, ss, 'l', xlab = 'spar', ylab = 'Cross Validation Residual Sum of Squares' , main = 'CV RSS vs Spar')
spars[which.min(ss)]
R > spars[which.min(ss)]
[1] 0.381
The code is not the neatest, but it should be easy to follow. Also, if you specify cv=T in smooth.spline:
R > xyspline <- smooth.spline(x, y, cv=T)
R > xyspline$spar
[1] 0.3881
From the help of smooth.spline you have the following:
The computational λ used (as a function of spar) is λ = r * 256^(3*spar - 1)
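As a quick check of that relationship, here is a sketch on the toy data used in this thread, with two arbitrarily chosen spar values. Since r depends only on the x values and weights, it cancels when the fitted lambdas of two fits to the same data are divided, so the ratio should equal 256^(3 * (spar2 - spar1)).

```r
x <- 1:18
y <- c(1:3, 5, 4, 7:3, 2 * (2:5), rep(10, 4))

# Two fits to the same data that differ only in spar.
fit_a <- smooth.spline(x, y, spar = 0.4)
fit_b <- smooth.spline(x, y, spar = 0.6)

# r is the same for both fits, so it cancels in the ratio of the two lambdas.
ratio_fitted  <- fit_b$lambda / fit_a$lambda
ratio_formula <- 256^(3 * (0.6 - 0.4))

print(c(fitted = ratio_fitted, formula = ratio_formula))
```

This also shows why small changes in spar matter: each step of 0.2 multiplies lambda by a factor of 256^0.6.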
spar can be greater than 1 (but I guess not by too much). I think you can vary this parameter and choose it graphically by plotting the fitted values for different values of spar. For example:
library(lattice) ## xyplot(), panel.lines() and panel.xyplot() come from lattice
spars <- seq(0.2,2,length.out=10) ## I will choose between 10 values
dat <- data.frame(
  spar= as.factor(rep(spars,each=18)), ## spar to group data(to get different colors)
  x = seq(1:18), ## recycling here to repeat x and y
  y = c(1:3,5,4,7:3,2*(2:5),rep(10,4)))
xyplot(y~x|spar,data =dat, type=c('p'), pch=19,groups=spar,
       panel =function(x,y,groups,...)
       {
         s2 <- smooth.spline(y,spar=spars[panel.number()])
         panel.lines(s2)
         panel.xyplot(x,y,groups,...)
       })
Here, for example, I get the best results for spar = 0.4.
If you don't have duplicated points at the same x value, then try Generalized Cross Validation (GCV), which is a clever way of picking a pretty good value for lambda (spar). In smooth.spline(), GCV is what is used when cv=FALSE (the default) and no smoothing parameter is specified. One neat detail about GCV is that it doesn't actually have to do the calculations for every single leave-one-out set of points, as highlighted in Simon Wood's book. For lots of detail on this, have a look at the notes on Simon Wood's web page on mgcv.
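A short sketch of the two criteria in smooth.spline(), using the toy data from earlier in this thread. It only shows how to request each criterion and read off the chosen spar; no particular values are claimed.

```r
x <- 1:18
y <- c(1:3, 5, 4, 7:3, 2 * (2:5), rep(10, 4))

# cv = FALSE (the default) picks the smoothing parameter by GCV.
# cv = TRUE uses ordinary leave-one-out cross-validation instead.
fit_gcv <- smooth.spline(x, y, cv = FALSE)
fit_loo <- smooth.spline(x, y, cv = TRUE)

# Compare the chosen smoothing parameter and the equivalent degrees of freedom.
print(c(gcv = fit_gcv$spar, loo = fit_loo$spar))
print(c(gcv = fit_gcv$df,   loo = fit_loo$df))
```

Plotting both fits over the data, for example with lines(fit_gcv) and lines(fit_loo), is an easy way to see whether the two criteria disagree in a way that matters.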
Adrian Bowman's sm R package has a function h.select(), which is intended specifically for doing the grunt work of choosing a value of lambda (though I'm not 100% sure that it is compatible with the smooth.spline() function in the base package).