Mclust : NAs in foreign function call (arg 13) - r

While trying to determine the optimal number of clusters for a kmeans, I tried to use the package mclust with the following code :
d_clust <- Mclust(df,
                  G = 1:10,
                  modelNames = mclust.options("emModelNames"))
d_clust$BIC
df is a data frame of 132656 obs. of 19 variables. The data is scaled, and there are no missing values (no NA/NaN/Inf values; I checked with is.na and is.finite). Also, all my variables are numeric, thanks to as.numeric.
However after using the code, the screen displays "fitting" with a loading bar, goes up to 11%, and then after a moment I get the error message :
NAs in foreign function call (arg 13)
Does anyone know why I have this type of error ?
EDIT
Output of str(df) (I changed the variable names for confidentiality reasons)
'data.frame': 132656 obs. of 19 variables:
$ X1: num 0.5 1 1 1 0.5 1 1 1 1 1 ...
$ X2: num 0.714 0.286 1 0.857 0.286 ...
$ X3: num 0.667 1 0.667 0.667 0.667 ...
$ X4: num 0.714 0.429 1 0.714 0.429 ...
$ X5: num 0.667 0.333 1 0.667 0.333 ...
$ X6: num 0.5 0.25 1 0.5 0.25 0.25 0 0.5 0.5 0.25 ...
$ X7: num 0.667 0.667 0.667 0.667 0.667 ...
$ X8: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
$ X9: num 0.667 0 0.667 0.333 0 ...
$ X10: num 1 0.833 1 1 1 ...
$ X11: num 1 0.75 1 1 1 1 1 1 1 1 ...
$ X12: num 1 1 1 0.8 1 1 1 1 1 1 ...
$ X13: num 0.5 0.75 0.75 0.5 0.75 0.25 0.75 0.5 0.5 0.5 ...
$ X14: num 0.75 0.75 0.75 1 0.75 0.75 0.75 1 0.75 0.75 ...
$ X15: num 1 0 0.5 1 1 1 0.75 1 0.5 1 ...
$ X16: num 1 0.333 0.667 0.833 0.833 ...
$ X17: num 1 1 1 1 1 1 1 1 1 1 ...
$ X18: num 0.00157 0.000438 0.001059 0.000879 0.004919 ...
$ X19: num 0.5 0.125 1 0.625 0.125 0.125 0.125 1 0.5 0.25 ...
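The checks described in the question can be repeated in a compact form. This is a minimal sketch on made-up data: the toy `df` and the names `all_finite` and `constant_cols` are illustrative, not from the question. `is.finite()` does not accept a data frame directly, so the check is run on a matrix. Whether a constant column has anything to do with the error above is only an assumption worth ruling out, not a confirmed cause.

```r
# Toy stand-in for the real data: two varying columns and one constant column.
set.seed(1)
df <- data.frame(X1 = runif(20), X2 = runif(20), X3 = rep(1, 20))

# is.finite() works on a numeric matrix, so convert the data frame first.
m <- as.matrix(df)
all_finite <- all(is.finite(m))   # FALSE if any NA, NaN, Inf or -Inf is present

# Columns with zero standard deviation carry no information for clustering.
col_sd <- apply(m, 2, sd)
constant_cols <- names(col_sd)[col_sd == 0]

all_finite
constant_cols
```

On the real data, the same lines can be run on the full data frame before calling Mclust.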

Related

Error message using glmer function "Error in pwrssUpdate"

I'm trying to create a linear mixed model to explain the presence / absence of a species according to 30 fixed environmental variables and 2 random variables ("Location" and "Season"). My data looks like this:
str(glmm_data)
'data.frame': 209 obs. of 40 variables:
$ CODE : Factor w/ 209 levels "VAL1_1","VAL1_2",..: 1 72 142 170 176 183 190 197 203 8 ...
$ Location : Factor w/ 32 levels "ALMENARA","ARES 1",..: 10 11 12 15 17 2 3 4 21 18 ...
$ Season : Factor w/ 7 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
$ PO4 : num -1.301 -1.301 -1.301 0.437 -1.301 ...
$ NO2 : num -1.129 -1.629 -0.781 -1.699 -1.654 ...
$ NO3 : num 1.044 0.115 1.918 1.457 1.467 ...
$ NH4 : num 0.0123 -0.014 -1.301 -0.2772 -1.301 ...
$ ChlA : num 0.341 0.117 0.87 -0.699 1.53 ...
$ Secchi : num 29 23 10 17 20 9 22 25 25 24 ...
$ Temp_w : num 5.4 3.2 10.3 10.5 4.7 7.2 8 9.2 4.6 6.9 ...
$ Conductivity : num 2.74 2.52 2.76 2.36 2.66 ...
$ Oxi_conc : num 11.6 9.2 7.04 9.99 7 ...
$ Hydroperiod : int 0 0 0 0 1 0 1 0 0 0 ...
$ Rain : int 1 1 1 1 1 1 1 1 1 1 ...
$ RainFre : int 0 0 0 0 0 0 0 0 0 0 ...
$ Veg_flo : num 0 0 0 0 0 0 0 0 0 0 ...
$ Veg_emg : num 0.735 0.524 0.226 0.685 0.226 ...
$ Depth_max : num 1.64 1.57 1.18 1.11 1.85 ...
$ Agricultural : num 0 0 0 0 0 ...
$ LowGrass : num 0 0.41 0.766 0 0.856 ...
$ Forest : num 1.097 1.161 0.44 1.05 0.502 ...
$ Buildings : num 0 0 0 0 0 ...
$ Heterogeneity : num 0.512 0.437 1.028 0.559 0.98 ...
$ Morphology : num 0.04519 -0.00115 0.01556 0.00771 0.12125 ...
$ Fish : int 0 0 0 0 0 0 0 0 0 0 ...
$ TempRange : num 1.4 1.4 1.4 1.4 1.4 ...
$ Tavg : num 1.03 1 1.03 1.03 1 ...
$ Precipitation : num 2.8 2.82 2.8 2.81 2.8 ...
$ MatOrg : num 0.264 0.257 0.236 0.251 0.313 ...
$ CO3 : num 0.14 0.163 0.222 0.335 0.306 ...
$ PC1 : num -0.132 -0.186 -0.074 0.127 -0.175 ...
$ PC2 : num -0.0729 0.0568 -0.0428 -0.0688 -0.0464 ...
$ PC3 : num -0.00638 0.01857 0.02817 -0.00918 0.02056 ...
$ Alytes_obstetricans : int 0 0 0 0 0 0 1 0 0 0 ...
$ Bufo_spinosus : int 0 0 0 0 0 0 0 0 0 0 ...
$ Epidalea_calamita : int 0 0 0 0 0 0 0 0 0 0 ...
$ Pelobates_cultripes : int 0 0 0 0 0 0 0 0 0 0 ...
$ Pelodytes_hespericus: int 1 0 0 0 0 0 0 0 0 0 ...
$ Pelophylax_perezi : int 0 0 0 0 1 0 1 0 0 0 ...
$ Pleurodeles_waltl : int 0 0 0 0 0 0 0 0 0 0 ...
PS: if anyone knows a better way to show my data please explain, I'm a noob at this.
The last 7 columns are the response variables, namely presence (1) or absence (0) of said species so my response variables are binomial. I'm using the glmer function from the lme4 package.
I'm trying to create a model for each species. So the first one looks like this:
Aly_Obs_GLMM <- glmer(Alytes_obstetricans ~ PO4 + NO2 + NO3 + NH4 + ChlA +
Secchi + Temp_w + Conductivity + Oxi_conc + Hydroperiod + Rain + RainFre +
Veg_flo + Veg_emg + Depth_max + Agricultural + LowGrass + Forest + Buildings +
Heterogeneity + Morphology + Fish + TempRange + Tavg + Precipitation +
MatOrg + CO3 + PC1 + PC2 + PC3 + (1|Location) + (1|Season), family = binomial,
data = glmm_data
)
However, when running the code, I get the following error message:
Error in pwrssUpdate(pp, resp, tol = tolPwrss, GQmat = GHrule(0L),
compDev = compDev, : Downdated VtV is not positive definite
and the model is not created.
Any ideas on what I may be doing wrong? Thanks
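Since the same model is to be fitted for each species, the formulas could be generated rather than typed by hand. The following is only a sketch using base R's `reformulate()`: the shortened predictor list and the names `fixed_terms`, `random_terms`, `species` and `formulas` are my own, and it does not address the error above.

```r
# Shortened predictor list for illustration; the real model uses 30 fixed effects.
fixed_terms  <- c("PO4", "NO2", "NO3")
random_terms <- c("(1 | Location)", "(1 | Season)")
species      <- c("Alytes_obstetricans", "Bufo_spinosus")

# Build one formula per species: response ~ fixed effects + random intercepts.
formulas <- lapply(species, function(sp) {
  reformulate(c(fixed_terms, random_terms), response = sp)
})
names(formulas) <- species

formulas$Alytes_obstetricans
```

Each element could then be passed as the formula argument of glmer, together with family = binomial and data = glmm_data.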

UCI Machine Learning Repository datasets

I am new to the UCI Machine Learning Repository datasets.
I have tried to load the data into R, but I cannot do it.
Could someone please help with this?
Note: I am using a MacBook Pro.
This is the data I want to use
You need to look at the data first to understand its arrangement and whether there is any metadata like a header. Your browser should be sufficient for this. The first two lines of the ionosphere.data file are:
1,0,0.99539,-0.05889,0.85243,0.02306,0.83398,-0.37708,1,0.03760,0.85243,-0.17755,0.59755,-0.44945,0.60536,-0.38223,0.84356,-0.38542,0.58212,-0.32192,0.56971,-0.29674,0.36946,-0.47357,0.56811,-0.51171,0.41078,-0.46168,0.21266,-0.34090,0.42267,-0.54487,0.18641,-0.45300,g
1,0,1,-0.18829,0.93035,-0.36156,-0.10868,-0.93597,1,-0.04549,0.50874,-0.67743,0.34432,-0.69707,-0.51685,-0.97515,0.05499,-0.62237,0.33109,-1,-0.13151,-0.45300,-0.18056,-0.35734,-0.20332,-0.26569,-0.20468,-0.18401,-0.19040,-0.11593,-0.16626,-0.06288,-0.13738,-0.02447,b
So, no header, but it is a CSV file. You can use either read.table with sep="," or read.csv with header=FALSE. You might (incorrectly, as I did) assume the column names are in the other file, but this is a machine learning dataset that comes without column labels, so the read.* functions will assign generic names to the columns of the data frame created.
Copy the link address of the data file in your browser, then paste it into read.table in quotes and add the separator argument (since read.table's default separator, whitespace, does not include commas):
ionosphere <- read.table( "https://archive.ics.uci.edu/ml/machine-learning-databases/ionosphere/ionosphere.data",
sep=",") # header=FALSE is default for read.table
> str(ionosphere)
'data.frame': 351 obs. of 35 variables:
$ V1 : int 1 1 1 1 1 1 1 0 1 1 ...
$ V2 : int 0 0 0 0 0 0 0 0 0 0 ...
$ V3 : num 0.995 1 1 1 1 ...
$ V4 : num -0.0589 -0.1883 -0.0336 -0.4516 -0.024 ...
$ V5 : num 0.852 0.93 1 1 0.941 ...
$ V6 : num 0.02306 -0.36156 0.00485 1 0.06531 ...
$ V7 : num 0.834 -0.109 1 0.712 0.921 ...
$ V8 : num -0.377 -0.936 -0.121 -1 -0.233 ...
$ V9 : num 1 1 0.89 0 0.772 ...
$ V10: num 0.0376 -0.0455 0.012 0 -0.164 ...
$ V11: num 0.852 0.509 0.731 0 0.528 ...
$ V12: num -0.1776 -0.6774 0.0535 0 -0.2028 ...
$ V13: num 0.598 0.344 0.854 0 0.564 ...
$ V14: num -0.44945 -0.69707 0.00827 0 -0.00712 ...
$ V15: num 0.605 -0.517 0.546 -1 0.344 ...
$ V16: num -0.38223 -0.97515 0.00299 0.14516 -0.27457 ...
$ V17: num 0.844 0.055 0.838 0.541 0.529 ...
$ V18: num -0.385 -0.622 -0.136 -0.393 -0.218 ...
$ V19: num 0.582 0.331 0.755 -1 0.451 ...
$ V20: num -0.3219 -1 -0.0854 -0.5447 -0.1781 ...
$ V21: num 0.5697 -0.1315 0.7089 -0.6997 0.0598 ...
$ V22: num -0.297 -0.453 -0.275 1 -0.356 ...
$ V23: num 0.3695 -0.1806 0.4339 0 0.0231 ...
$ V24: num -0.474 -0.357 -0.121 0 -0.529 ...
$ V25: num 0.5681 -0.2033 0.5753 1 0.0329 ...
$ V26: num -0.512 -0.266 -0.402 0.907 -0.652 ...
$ V27: num 0.411 -0.205 0.59 0.516 0.133 ...
$ V28: num -0.462 -0.184 -0.221 1 -0.532 ...
$ V29: num 0.2127 -0.1904 0.431 1 0.0243 ...
$ V30: num -0.341 -0.116 -0.174 -0.201 -0.622 ...
$ V31: num 0.4227 -0.1663 0.6044 0.2568 -0.0571 ...
$ V32: num -0.5449 -0.0629 -0.2418 1 -0.5957 ...
$ V33: num 0.1864 -0.1374 0.5605 -0.3238 -0.0461 ...
$ V34: num -0.453 -0.0245 -0.3824 1 -0.657 ...
$ V35: Factor w/ 2 levels "b","g": 2 1 2 1 2 1 2 1 2 1 ...
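The read.csv alternative mentioned above can be tried without a network connection. This is a small sketch on a shortened sample: the two records below keep only the first four values and the class label of the real lines, and the names `sample_text` and `ionosphere_small` are my own. In R 4.0 and later, strings are no longer converted to factors by default, so `stringsAsFactors = TRUE` is needed to get a factor column like V35 in the output above.

```r
# Shortened, illustrative sample: first four values plus the class label of two records.
sample_text <- "1,0,0.99539,-0.05889,g
1,0,1,-0.18829,b"

# header = FALSE because the file has no header line; columns get generic names V1, V2, ...
ionosphere_small <- read.csv(text = sample_text, header = FALSE,
                             stringsAsFactors = TRUE)

str(ionosphere_small)
```

For the full file, the same call should work with the URL as the first argument in place of `text = sample_text`.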

R glarma error: "requires numeric/complex matrix/vector arguments"

This is my data:
'data.frame': 72 obs. of 7 variables:
$ X1 : chr "2011M1" "2011M2" "2011M3" "2011M4" ...
$ KPR : int 0 0 0 0 0 0 0 0 0 0 ...
$ LTV : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
$ sukubunga: num 6.5 6.5 6.5 6.5 6.5 6.5 6.5 6.5 6.5 6.5 ...
$ inflasi : num 0.89 0.13 -0.32 -0.31 0.12 0.55 0.67 0.93 0.27 -0.12 ...
$ npl : num 2.31 2.39 2.22 2.2 2.12 ...
$ sbkredit : num 11.4 11.4 11.3 11.4 11.3 ...
I use the package glarma, and these are my steps:
library(readr)
b <- read_csv("E:/b.csv")
dataku<-as.data.frame(b)
dataku$LTV<-as.factor(dataku$LTV)
dataku$LTV<-relevel(dataku$LTV,ref="0")
glmmo<-glm(KPR~LTV+sbkredit+inflasi+npl,data=dataku,family=binomial(link=logit),na.action=na.omit,x=TRUE)
summary(glmmo)
X<-glmmo$x
X<-as.matrix(X)
y1<-dataku$KPR
n1<-rep(1,length(dataku$X))
Y<-cbind(y1,n1-y1)
Y<-as.matrix(Y)
library(glarma)
glarmamo<-glarma(Y,X,phiLags=c(1),phiInit=c(0.6),type="Bin",method="FS",residuals="Pearson",maxit=100,grad=1e-6)
But I get this error:
Error in GL$cov %*% GL$ll.d : requires numeric/complex matrix/vector arguments
When I multiply GL$cov %*% GL$ll.d for
So, what should I do?
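The error text says that one operand of `%*%` is not a numeric (or complex) matrix or vector. As a minimal sketch on made-up data, the response and design matrices can be checked for type and shape before they are handed to a fitting function. The toy data frame and its values are hypothetical, the sketch does not call glarma, and whether such a check reveals the cause in this case is only an assumption.

```r
# Hypothetical toy data with the same kinds of columns as in the question.
toy <- data.frame(
  KPR      = rep(c(0, 1), 6),
  LTV      = factor(rep(c("0", "1", "2"), 4)),
  sbkredit = seq(11, by = 0.1, length.out = 12)
)

# Design matrix with dummy-coded factor levels, and a two-column response
# (successes, failures) as used for binomial data.
X <- model.matrix(~ LTV + sbkredit, data = toy)
Y <- cbind(toy$KPR, 1 - toy$KPR)

# Both should be numeric matrices with the same number of rows and no missing values.
checks <- c(
  X_numeric = is.matrix(X) && is.numeric(X),
  Y_numeric = is.matrix(Y) && is.numeric(Y),
  same_rows = nrow(X) == nrow(Y),
  no_NA     = !anyNA(X) && !anyNA(Y)
)
checks
```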

R: Extract one value out of a list and paste it into a data frame

I have a list of data frames (list9).
> str(list9)
List of 2
$ :'data.frame': 64 obs. of 11 variables:
..$ list$Stimulus : Factor w/ 7 levels "108.wav","42.wav",..: 1 1 1 1 1 1 1 1 2 2 ...
..$ list$IndicationStandard: num [1:64] 1 0 1 0 1 0 0 0 0 0 ...
..$ list$P42 : num [1:64] 0 0 0 0 0 0 0 0 0 0 ...
..$ list$P53 : num [1:64] 0 0 0 0 0 0 0 0 0 0 ...
..$ list$P64 : num [1:64] 0.375 0.375 0.375 0.375 0.375 0.375 0.375 0.375 0.375 0.375 ...
..$ list$P75 : num [1:64] 0.812 0.812 0.812 0.812 0.812 ...
..$ list$P86 : num [1:64] 0.75 0.75 0.75 0.75 0.75 0.75 0.75 0.75 0.75 0.75 ...
..$ list$P97 : num [1:64] 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 ...
..$ list$P108 : num [1:64] 0.375 0.375 0.375 0.375 0.375 0.375 0.375 0.375 0.375 0.375 ...
..$ list$TGdispInd1 : num [1:64] 0.317 0.317 0.317 0.317 0.317 ...
..$ list$TGdispInd2 : num [1:64] 0.756 0.756 0.756 0.756 0.756 ...
$ :'data.frame': 64 obs. of 11 variables:
..$ list$Stimulus : Factor w/ 7 levels "108.wav","42.wav",..: 1 1 1 1 1 1 1 1 2 2 ...
..$ list$IndicationStandard: num [1:64] 0 0 1 0 1 0 0 0 0 0 ...
..$ list$P42 : num [1:64] 0 0 0 0 0 0 0 0 0 0 ...
..$ list$P53 : num [1:64] 0 0 0 0 0 0 0 0 0 0 ...
..$ list$P64 : num [1:64] 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 ...
..$ list$P75 : num [1:64] 0.812 0.812 0.812 0.812 0.812 ...
..$ list$P86 : num [1:64] 0.75 0.75 0.75 0.75 0.75 0.75 0.75 0.75 0.75 0.75 ...
..$ list$P97 : num [1:64] 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 ...
..$ list$P108 : num [1:64] 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 ...
..$ list$TGdispInd1 : num [1:64] 0.351 0.351 0.351 0.351 0.351 ...
..$ list$TGdispInd2 : num [1:64] 0.784 0.784 0.784 0.784 0.784 ...
I created a target data frame (result)
> str(result)
'data.frame': 2 obs. of 3 variables:
$ TGdispInd1: num 0 0
$ TGdispInd2: num 0 0
$ subject : chr "TG_75ms_Step11_V1-998-1.txt" "TG_75ms_Step11_V1-999-1.txt"
I would like to paste the first value of list$TGdispInd1 and list$TGdispInd2 of each data frame in the list into the data frame "result" (it could also be the mean of list$TGdispInd1 and list$TGdispInd2, since all 64 values are equal).
This is how the resulting data frame should look:
> result
TGdispInd1 TGdispInd2 subject
1 .317 .756 TG_75ms_Step11_v1-998-1.txt
2 .351 .784 TG_75ms_Step11_v1-999-1.txt
Does anybody know how to do this?
Try
result[1:2] <- do.call(rbind, lapply(list9, function(x)
x[1, c('list$TGdispInd1', 'list$TGdispInd2')]))
If you are interested in the mean value
result[1:2] <- do.call(rbind, lapply(list9, function(x)
colMeans(x[c('list$TGdispInd1', 'list$TGdispInd2')])))
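To try the idea without the original files, here is a self-contained sketch on a made-up list. The helper `make_df`, the four-row data frames and the names `cols` and `first_vals` are my own; the values are taken from the desired output in the question. It uses `vapply()` rather than `do.call(rbind, ...)`, which is a variation on the answer, not the same code.

```r
# Build small stand-ins for the data frames in list9; check.names = FALSE keeps
# the unusual column names that contain "$".
make_df <- function(a, b) {
  data.frame(`list$TGdispInd1` = rep(a, 4),
             `list$TGdispInd2` = rep(b, 4),
             check.names = FALSE)
}
list9 <- list(make_df(0.317, 0.756), make_df(0.351, 0.784))

result <- data.frame(TGdispInd1 = c(0, 0),
                     TGdispInd2 = c(0, 0),
                     subject = c("TG_75ms_Step11_V1-998-1.txt",
                                 "TG_75ms_Step11_V1-999-1.txt"),
                     stringsAsFactors = FALSE)

cols <- c("list$TGdispInd1", "list$TGdispInd2")

# One row per list element: take the first row of the two columns of interest.
first_vals <- t(vapply(list9, function(x) unlist(x[1, cols]), numeric(2)))

# Overwrite the first two columns of result; its column names stay unchanged.
result[1:2] <- first_vals
result
```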

R Plyr Rename multiple columns in list of dataframes

Only just discovered Plyr and it has saved me a tonne of lines combining multiple data frames which is great. BUT I have another renaming problem I cannot fathom.
I have a list, which contains a number of data frames (this is a subset as there are actually 108 in the real list).
> str(mydata)
List of 4
$ C11:'data.frame': 8 obs. of 3 variables:
..$ X : Factor w/ 8 levels "n >= 1","n >= 2",..: 1 2 3 4 5 6 7 8
..$ n.ENSEMBLE.COVERAGE: num [1:8] 1 1 1 1 0.96 0.91 0.74 0.5
..$ n.ENSEMBLE.RECALL : num [1:8] 0.88 0.88 0.88 0.88 0.9 0.91 0.94 0.95
$ C12:'data.frame': 8 obs. of 3 variables:
..$ X : Factor w/ 8 levels "n >= 1","n >= 2",..: 1 2 3 4 5 6 7 8
..$ n.ENSEMBLE.COVERAGE: num [1:8] 1 1 1 1 0.96 0.89 0.86 0.72
..$ n.ENSEMBLE.RECALL : num [1:8] 0.91 0.91 0.91 0.91 0.93 0.96 0.97 0.98
$ C13:'data.frame': 8 obs. of 3 variables:
..$ X : Factor w/ 8 levels "n >= 1","n >= 2",..: 1 2 3 4 5 6 7 8
..$ n.ENSEMBLE.COVERAGE: num [1:8] 1 1 1 1 0.94 0.79 0.65 0.46
..$ n.ENSEMBLE.RECALL : num [1:8] 0.85 0.85 0.85 0.85 0.88 0.9 0.92 0.91
$ C14:'data.frame': 8 obs. of 3 variables:
..$ X : Factor w/ 8 levels "n >= 1","n >= 2",..: 1 2 3 4 5 6 7 8
..$ n.ENSEMBLE.COVERAGE: num [1:8] 1 1 1 1 0.98 0.95 0.88 0.74
..$ n.ENSEMBLE.RECALL : num [1:8] 0.91 0.91 0.91 0.91 0.92 0.94 0.95 0.98
What I really want to achieve is for each data frame to have the columns prepended with the title of the dataframe. So in the example the columns would be:
C11.X, C11.n.ENSEMBLE.COVERAGE & C11.n.ENSEMBLE.RECALL
C12.X, C12.n.ENSEMBLE.COVERAGE & C12.n.ENSEMBLE.RECALL
C13.X, C13.n.ENSEMBLE.COVERAGE & C13.n.ENSEMBLE.RECALL
C14.X, C14.n.ENSEMBLE.COVERAGE & C14.n.ENSEMBLE.RECALL
Can anyone suggest an elegant approach to renaming columns like this?
Here's a reproducible example using the iris data set:
# produce a named list of data.frames as sample data:
dflist <- split(iris, iris$Species)
# store the list element names:
n <- names(dflist)
# rename the elements:
Map(function(df, vec) setNames(df, paste(vec, names(df), sep = ".")), dflist, n)
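Note that Map() returns a new list and leaves dflist unchanged, so the result has to be assigned to keep it. A short sketch of that step (the name `renamed` is my own):

```r
# Named list of data frames, as in the example above.
dflist <- split(iris, iris$Species)

# Prefix every column name with the name of its list element and keep the result.
renamed <- Map(function(df, nm) setNames(df, paste(nm, names(df), sep = ".")),
               dflist, names(dflist))

# Inspect the new column names of one element.
names(renamed$setosa)
```

For the list in the question, the same call with mydata in place of dflist should give columns such as C11.X and C11.n.ENSEMBLE.COVERAGE.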
