I am a beginner in R. I have a row-standardized matrix (1542x1542) that I created in Excel and saved as a .csv file. I am trying to use the matrix in R to calculate Moran's I, using the following commands:
# Weights Matrix Based on Connectivity
sw <- read.csv(file = "20210929_Weights_Matrix.csv")
sw.2.mat <- as.matrix(sw)
## mat to listw
mat2listw(sw.2.mat)
dnn.2.listw = nb2listw(sw.2.mat, zero.policy=T)
However, when I run the commands I get the following errors:
sw <- read.csv(file = "20210929_Weights_Matrix.csv")
sw.2.mat <- as.matrix(sw)
mat2listw(sw.2.mat)
Error in mat2listw(sw.2.mat) : x must be a square matrix
dnn.2.listw = nb2listw(sw.2.mat, zero.policy=T)
Error in nb2listw(sw.2.mat, zero.policy = T) : Not a neighbours list
When I try to add an additional row in Excel, I get the following error in R:
sw <- read.csv(file = "20210929_Weights_Matrix.csv")
sw.2.mat <- as.matrix(sw)
## mat to listw
mat2listw(sw.2.mat)
Error in if (any(x < 0)) stop("values in x cannot be negative") :
missing value where TRUE/FALSE needed
dnn.2.listw = nb2listw(sw.2.mat, zero.policy=T)
Error in nb2listw(sw.2.mat, zero.policy = T) : Not a neighbours list
Could someone please help? Is there a way I can share my Excel file?
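One possible cause, offered as a guess rather than a diagnosis: `read.csv()` treats the first line of the file as column names by default, so a 1542x1542 sheet without a header row would come back as 1541x1542. The sketch below uses a small made-up 3x3 matrix written to a temporary file (not the poster's data) to show the effect of `header = FALSE` and a few checks worth running before calling `mat2listw()`.

```r
# Toy row-standardized 3x3 weights matrix (hypothetical data).
w <- matrix(c(0,   0.5, 0.5,
              0.5, 0,   0.5,
              0.5, 0.5, 0), nrow = 3, byrow = TRUE)
tmp <- tempfile(fileext = ".csv")
write.table(w, tmp, sep = ",", row.names = FALSE, col.names = FALSE)

# Default header = TRUE uses the first data row as column names.
with_header <- as.matrix(read.csv(tmp))
dim(with_header)   # one row short of square

# header = FALSE keeps every row as data.
no_header <- as.matrix(read.csv(tmp, header = FALSE))
dimnames(no_header) <- NULL
dim(no_header)

# Checks worth running before building the weights list.
is_square <- nrow(no_header) == ncol(no_header)
has_na    <- anyNA(no_header)
row_sums  <- rowSums(no_header)   # should all be 1 if row-standardized
```

If the matrix is square and free of `NA`s, it should be in a shape that `mat2listw()` accepts. `nb2listw()` expects a neighbours (`nb`) object rather than a matrix, which is what its error message is saying.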
I've recently been attempting to evaluate output from k-modes (a cluster label) against a so-called true cluster label (labelled 'class' below).
In other words, I've been attempting to externally validate the clustering output. However, when I tried the external validation measures from the 'fpc' package, I was unsuccessful (error message posted below the script).
I've attached my code for the mushroom dataset. I would appreciate it if anyone could show me how to successfully execute these external validation measures in the context of categorical data.
Any help appreciated.
# LIBRARIES
install.packages('klaR')
install.packages('fpc')
library(klaR)
library(fpc)
#MUSHROOM DATA
mushrooms <- read.csv(file = "https://raw.githubusercontent.com/miachen410/Mushrooms/master/mushrooms.csv", header = FALSE)
names(mushrooms) <- c("edibility", "cap-shape", "cap-surface", "cap-color",
"bruises", "odor", "gill-attachment", "gill-spacing",
"gill-size", "gill-color", "stalk-shape", "stalk-root",
"stalk-surface-above-ring", "stalk-surface-below-ring",
"stalk-color-above-ring", "stalk-color-below-ring", "veil-type",
"veil-color", "ring-number", "ring-type", "spore-print-color",
"population", "habitat")
names(mushrooms)[names(mushrooms)=="edibility"] <- "class"
indexes <- apply(mushrooms, 2, function(x) any(is.na(x) | is.infinite(x)))
colnames(mushrooms)[indexes]
table(mushrooms$class)
str(mushrooms)
#REMOVING CLASS VARIABLE
mushroom.df <- subset(mushrooms, select = -c(class))
#KMODES ANALYSIS
result.kmode <- kmodes(mushroom.df, 2, iter.max = 50, weighted = FALSE)
#EXTERNAL VALIDATION ATTEMPT
mushrooms$class <- as.factor(mushrooms$class)
class <- as.numeric(mushrooms$class)
clust_stats <- cluster.stats(d = dist(mushroom.df),
class, result.kmode$cluster)
#ERROR TERM
Error in silhouette.default(clustering, dmatrix = dmat) :
NA/NaN/Inf in foreign function call (arg 1)
In addition: Warning message:
In dist(mushroom.df) : NAs introduced by coercion
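The warning suggests that `dist()` tried to coerce the categorical columns to numbers and produced `NA`s, which then reach the silhouette computation. Below is a minimal sketch of one workaround, assuming a simple matching dissimilarity (the share of attributes on which two rows differ) is acceptable. The helper `simple_matching_dist` and the toy data are hypothetical, and the double loop is only practical for small inputs.

```r
# Hypothetical helper: simple matching dissimilarity for categorical data.
simple_matching_dist <- function(df) {
  m <- as.matrix(df)            # character matrix, one row per observation
  n <- nrow(m)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- mean(m[i, ] != m[j, ])  # share of mismatching attributes
    }
  }
  as.dist(d)
}

# Toy categorical data (made up for illustration).
toy <- data.frame(cap  = c("x", "x", "b"),
                  odor = c("n", "a", "a"),
                  stringsAsFactors = TRUE)
d_toy <- simple_matching_dist(toy)
d_mat <- as.matrix(d_toy)
```

An object built this way could then be passed as `d` to `cluster.stats()` in place of `dist(mushroom.df)`.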
I am using the PC algorithm function from pcalg, for which the conditional independence test is one of the arguments. I get an error with the following code. Note that 'data' here is the data that I have been using, and 1, 6, 2 in gaussCItest are the node positions in my adjacency matrix x and y of the data.
code:
library(pcalg)
suffstat <- list(C = cor(data), n = nrow(data))
pc.data <- pc(suffstat,
indepTest=gaussCItest(1,6,2,suffstat),
p=ncol(data),alpha=0.01)
Error:
Error in indepTest(x, y, nbrs[S], suffStat) :
could not find function "indepTest"
Below is the code that worked. I removed the arguments from gaussCItest, since it is a function and can be passed directly.
library(pcalg)
suffstat <- list(C = cor(data), n = nrow(data))
pc.data <- pc(suffstat,indepTest=gaussCItest, p=ncol(data),alpha=0.01)
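The difference between passing a function and passing the result of calling it can be shown in plain R. In this sketch `apply_test` and `toy_test` are hypothetical stand-ins for `pc()` and `gaussCItest`, not part of pcalg.

```r
# Hypothetical stand-in for a routine that calls a user-supplied test.
apply_test <- function(indepTest, x, y) {
  indepTest(x, y)   # requires indepTest to be a function
}

# Hypothetical stand-in for a conditional independence test.
toy_test <- function(x, y) abs(x - y)

# Passing the function itself works: apply_test calls it internally.
ok <- apply_test(toy_test, 1, 6)

# Passing the result of a call hands over a number, so the internal
# call indepTest(x, y) fails because no function of that name is found.
failed <- tryCatch(apply_test(toy_test(1, 6), 1, 6),
                   error = function(e) conditionMessage(e))
```

Here `ok` holds the value returned by the test, while `failed` holds the error message raised when a number is supplied where a function is expected.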
I'm studying R and working on a project about a movie recommendation algorithm.
I used the MovieLens 100k data with the recommenderlab library, and followed these tutorials:
https://mitxpro.mit.edu/asset-v1%3AMITProfessionalX+DSx+2017_T1+type#asset+block#Module4_CS1_Movies.pdf
https://cran.r-project.org/web/packages/recommenderlab/vignettes/recommenderlab.pdf
I've now calculated the sparsity and split the data into train and test sets.
Now I want to write popularity-based recommendation code. My code is here:
install.packages("SnowballC")
install.packages("class")
install.packages("dbscan")
install.packages("proxy")
install.packages("recommenderlab")
install.packages("dplyr")
install.packages("tm")
install.packages("reshape2")
library(recommenderlab)
library(dplyr)
library(tm)
library(SnowballC)
library(class)
library(dbscan)
library(proxy)
library(reshape2)
#read data
data<- read.table('C:/Users/ginny/OneDrive/Documents/2018_1/dataanalytics/실습3/ml-100k/u.data')
#####raw data to matrix#####
data.frame2matrix = function(data, rowtitle, coltitle, datatitle,
rowdecreasing = FALSE, coldecreasing = FALSE,
default_value = NA) {
# check, whether titles exist as columns names in the data.frame data
if ( (!(rowtitle%in%names(data)))
|| (!(coltitle%in%names(data)))
|| (!(datatitle%in%names(data))) ) {
stop('data.frame2matrix: bad row-, col-, or datatitle.')
}
# get number of rows in data
ndata = dim(data)[1]
# extract rownames and colnames for the matrix from the data.frame
rownames = sort(unique(data[[rowtitle]]), decreasing = rowdecreasing)
nrows = length(rownames)
colnames = sort(unique(data[[coltitle]]), decreasing = coldecreasing)
ncols = length(colnames)
# initialize the matrix
out_matrix = matrix(NA,
nrow = nrows, ncol = ncols,
dimnames=list(rownames, colnames))
# iterate rows of data
for (i1 in 1:ndata) {
# get matrix-row and matrix-column indices for the current data-row
iR = which(rownames==data[[rowtitle]][i1])
iC = which(colnames==data[[coltitle]][i1])
# throw an error if the matrix entry (iR,iC) is already filled.
if (!is.na(out_matrix[iR, iC])) stop('data.frame2matrix: double entry in data.frame')
out_matrix[iR, iC] = data[[datatitle]][i1]
}
# set empty matrix entries to the default value
out_matrix[is.na(out_matrix)] = default_value
# return matrix
return(out_matrix)
}
# name each column of data (the names inside '' can be changed as needed)
colnames(data)<-c('user_id','item_id','rating','timestamp')
# convert the raw data to a matrix
pre_data = data.frame2matrix(data, 'user_id', 'item_id', 'rating')
# convert the matrix to a realRatingMatrix
target_data<- as(as(pre_data, "matrix"), "realRatingMatrix")
data=data[,-which(names(data) %in% c("timestamp"))]
data
str(data)
summary(data)
hist(data$rating)
write.csv(data,"C:/Users/ginny/OneDrive/Documents/2018_1/dataanalytics/실습3/u.csv")
Number_Ratings=nrow(data)
Number_Ratings
Number_Movies=length(unique(data$item_id))
Number_Movies
Number_Users=length(unique(data$user_id))
Number_Users
data1=data[data$user_id %in% names(table(data$user_id))
[table(data$user_id)>50],]
Number_Ratings1=nrow(data1)
Number_Movies1=length(unique(data1$item_id))
Number_Users1=length(unique(data1$user_id))
sparsity=((Number_Ratings1)*3*5*100)/((Number_Movies1)*(Number_Users1))
sparsity
install.packages("caTools")
library(caTools)
set.seed(10)
sample=sample.split(data1$rating, SplitRatio=0.75)
train=subset(data1, sample==TRUE)
test=subset(data1, sample==FALSE)
data2<-as.data.frame(data1)
data2
#matrix to realratingmatrix
target_data2<- as(as(pre_data2, "matrix"), "realRatingMatrix")
recommender_models<-recommenderRegistry$get_entry(dataType =
"realRatingMatrix")
recomm_model <- Recommender(data2$rating, method = "POPULAR")
I tried to use data2 as a realRatingMatrix, but when I run the last line, this error occurs:
Error in (function (classes, fdef, mtable) : unable to find an
inherited method for function ‘Recommender’ for signature ‘"integer"’
Can anybody tell me what's wrong with it?
I'm trying to use cor.ci to obtain polychoric correlations with significance tests, but it keeps giving me an error message. Here is the code:
install.packages("Hmisc")
library(Hmisc)
mydata <- spss.get("S-IAT for R.sav", use.value.labels=TRUE)
install.packages('psych')
library(psych)
poly.example <- cor.ci(mydata(nvar = 10,n = 100)$items,n.iter = 10,poly = TRUE)
poly.example
print(corr.test(poly.example$rho), short=FALSE)
Here is the error message it gives:
> library(psych)
> poly.example <- cor.ci(mydata(nvar = 10,n = 100)$items,n.iter = 10,poly = TRUE)
Error in cor.ci(mydata(nvar = 10, n = 100)$items, n.iter = 10, poly = TRUE) :
could not find function "mydata"
> poly.example
Error: object 'poly.example' not found
> print(corr.test(poly.example$rho), short=FALSE)
Error in is.data.frame(x) : object 'poly.example' not found
How can I make it recognize mydata and/or select certain variables from this dataset for the analysis? I got the above code from here:
Polychoric correlation matrix with significance in R
Thanks!
You have several problems.
1) As previously commented upon, you are treating mydata as a function, but you need to treat it as a data.frame. Thus the call should be
poly.example <- cor.ci(mydata,n.iter = 10,poly = TRUE)
If you are trying to just get the first 100 cases and the first 10 variables, then
poly.example <- cor.ci(mydata[1:100,1:10],n.iter = 10,poly = TRUE)
2) Then, you do not want to run corr.test on the resulting correlation matrix. corr.test should be run on the data.
print(corr.test(mydata[1:100,1:10]),short=FALSE)
Note that corr.test is testing the Pearson correlation.
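For reference, a data frame is indexed as `[rows, columns]`, so the first 100 cases and first 10 variables are selected with `[1:100, 1:10]`. A small sketch with made-up data (`toy` is hypothetical, not the poster's SPSS file):

```r
set.seed(1)
# Hypothetical data: 200 cases, 15 ordinal items scored 1 to 5.
toy <- as.data.frame(matrix(sample(1:5, 200 * 15, replace = TRUE), nrow = 200))

first_block <- toy[1:100, 1:10]   # first 100 cases, first 10 variables
dim(first_block)

by_name <- toy[, c("V1", "V3")]   # select variables by column name
names(by_name)
```

Selecting by column name, as in the last line, is one way to pick specific variables for the analysis.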
I am trying to normalize the data frame before prediction, but I get this error:
Error in seq_len(nrows)[i] :
only 0's may be mixed with negative subscripts
Called from: top level
Here is my code:
library('caret')
load(file = "some dataset path here")
DummyDataSet = data
attach(DummyDataSet)
foldCount = 10
classifyLabels = DummyDataSet$ClassLabel
folds = createFolds(classifyLabels,k=foldCount)
for (foldIndex in 1:foldCount){
cat("----- Start Fold -----\n")
#holding out samples of one fold in each iteration
testFold = DummyDataSet[folds[[foldIndex]],]
testLabels = classifyLabels[folds[[foldIndex]]]
trainFolds = DummyDataSet[-folds[[foldIndex]],]
trainLabels = classifyLabels[-folds[[foldIndex]]]
#Zero mean unit variance normalization to ONLY numerical data
for (k in 1:ncol(trainFolds)){
if (!is.integer(trainFolds[,k])){
params = meanStdCalculator(trainFolds[,k])
trainFolds[,k] = sapply(trainFolds[,k], function(x) (x - params[1])/params[2])
testFold[,k] = sapply(testFold[,k], function(x) (x - params[1])/params[2])
}
}
meanStdCalculator = function(data){
Avg = mean(data)
stdDeviation = sqrt(var(data))
return(c(Avg,stdDeviation))
}
cat("----- Start Fold -----\n")
}
where trainFolds is a fold created by the caret package and its type is data.frame.
I have already read these links:
R Debugging
Subset
Negative Subscripts
but I couldn't figure out what is wrong with the indexes.
Can anybody help me?
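For context, R raises this error whenever a single index vector contains both positive and negative values. The sketch below reproduces it on a made-up vector and shows a quick check that could be applied to `folds[[foldIndex]]` before negating it. Whether this is what happens in the code above is an assumption, since the dataset is not shown; `safe_to_negate` is a hypothetical helper.

```r
x <- c(10, 20, 30, 40)

dropped <- x[-c(1, 2)]   # all negative: drops elements 1 and 2
kept    <- x[c(1, 2)]    # all positive: keeps elements 1 and 2

# Mixing signs in one index vector is an error.
mixed <- tryCatch(x[c(-1, 2)], error = function(e) conditionMessage(e))

# Hypothetical helper: TRUE when an index vector can be negated safely,
# i.e. it is numeric, has no NA, and every entry is strictly positive.
safe_to_negate <- function(idx) {
  is.numeric(idx) && !anyNA(idx) && all(idx > 0)
}
safe_to_negate(c(1, 2))
safe_to_negate(c(-1, 2))
```

Printing `str(folds[[foldIndex]])` inside the loop is a simple way to see whether the fold indices are the positive integers the subsetting expects.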