CSV Matrix in R - r

I am a beginner in R. I have a row-standardized matrix (1542x1542) that I created in Excel and saved as a .csv file. I am trying to use the matrix in R to calculate Moran's I, using the following commands:
# Weights Matrix Based on Connectivity
sw <- read.csv(file = "20210929_Weights_Matrix.csv")
sw.2.mat <- as.matrix(sw)
## mat to listw
mat2listw(sw.2.mat)
dnn.2.listw = nb2listw(sw.2.mat, zero.policy=T)
However, when I run these commands I get the following errors:
sw <- read.csv(file = "20210929_Weights_Matrix.csv")
sw.2.mat <- as.matrix(sw)
mat2listw(sw.2.mat)
Error in mat2listw(sw.2.mat) : x must be a square matrix
dnn.2.listw = nb2listw(sw.2.mat, zero.policy=T)
Error in nb2listw(sw.2.mat, zero.policy = T) : Not a neighbours list
When I add an additional row in Excel, I get the following error in R instead:
sw <- read.csv(file = "20210929_Weights_Matrix.csv")
sw.2.mat <- as.matrix(sw)
## mat to listw
mat2listw(sw.2.mat)
Error in if (any(x < 0)) stop("values in x cannot be negative") :
missing value where TRUE/FALSE needed
dnn.2.listw = nb2listw(sw.2.mat, zero.policy=T)
Error in nb2listw(sw.2.mat, zero.policy = T) : Not a neighbours list
Could someone please help? Is there a way I can share my Excel file?
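One common cause of the "x must be a square matrix" error (an assumption here, since the file itself is not shown) is that read.csv() defaults to header = TRUE and turns the first data row into column names, so an n x n file arrives as (n - 1) x n. A minimal sketch with a hypothetical 3 x 3 stand-in for the weights file:

```r
# Hypothetical 3 x 3 row-standardized matrix standing in for the 1542 x 1542 file
w <- matrix(c(0,   0.5, 0.5,
              0.5, 0,   0.5,
              0.5, 0.5, 0), nrow = 3, byrow = TRUE)
tmp <- tempfile(fileext = ".csv")
write.table(w, tmp, sep = ",", row.names = FALSE, col.names = FALSE)

# Default header = TRUE: the first data row is consumed as column names
bad <- as.matrix(read.csv(tmp))
dim(bad)

# header = FALSE keeps every row, so the matrix stays square
good <- as.matrix(read.csv(tmp, header = FALSE))
dimnames(good) <- NULL
dim(good)

# mat2listw() already returns a listw object; nb2listw() expects an nb
# (neighbours) object, not a matrix, so it is not needed afterwards.
# Guarded because spdep may not be installed.
if (requireNamespace("spdep", quietly = TRUE)) {
  lw <- spdep::mat2listw(good, style = "W")
}
```

If the file also has a column of row labels, passing row.names = 1 to read.csv() would be the analogous fix. The "missing value where TRUE/FALSE needed" error suggests the matrix contains NA, which an extra, partly empty row added in Excel could produce; anyNA(good) is a quick check.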

Related

External Cluster Validation - Categorical Data R

I've recently been attempting to evaluate the output of k-modes (a cluster label) against a so-called true cluster label (labelled 'class' below).
In other words, I've been attempting to externally validate the clustering output. However, when I tried the external validation measures from the 'fpc' package, I was unsuccessful (error message posted below the script).
I've attached my code for the mushroom dataset. I would appreciate it if anyone could show me how to successfully run these external validation measures on categorical data.
Any help appreciated.
# LIBRARIES
install.packages('klaR')
install.packages('fpc')
library(klaR)
library(fpc)
#MUSHROOM DATA
mushrooms <- read.csv(file = "https://raw.githubusercontent.com/miachen410/Mushrooms/master/mushrooms.csv", header = FALSE)
names(mushrooms) <- c("edibility", "cap-shape", "cap-surface", "cap-color",
"bruises", "odor", "gill-attachment", "gill-spacing",
"gill-size", "gill-color", "stalk-shape", "stalk-root",
"stalk-surface-above-ring", "stalk-surface-below-ring",
"stalk-color-above-ring", "stalk-color-below-ring", "veil-type",
"veil-color", "ring-number", "ring-type", "spore-print-color",
"population", "habitat")
names(mushrooms)[names(mushrooms)=="edibility"] <- "class"
indexes <- apply(mushrooms, 2, function(x) any(is.na(x) | is.infinite(x)))
colnames(mushrooms)[indexes]
table(mushrooms$class)
str(mushrooms)
#REMOVING CLASS VARIABLE
mushroom.df <- subset(mushrooms, select = -c(class))
#KMODES ANALYSIS
result.kmode <- kmodes(mushroom.df, 2, iter.max = 50, weighted = FALSE)
#EXTERNAL VALIDATION ATTEMPT
mushrooms$class <- as.factor(mushrooms$class)
class <- as.numeric(mushrooms$class)
clust_stats <- cluster.stats(d = dist(mushroom.df),
class, result.kmode$cluster)
#ERROR TERM
Error in silhouette.default(clustering, dmatrix = dmat) :
NA/NaN/Inf in foreign function call (arg 1)
In addition: Warning message:
In dist(mushroom.df) : NAs introduced by coercion
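The warning points at the likely cause: dist() expects numeric input, so the categorical columns are coerced to NA and the NA distances then break the silhouette step inside cluster.stats(). A minimal sketch, assuming a simple matching dissimilarity (the share of attributes on which two rows differ) is acceptable; the toy data frame and the label vectors are hypothetical stand-ins for mushroom.df, the class labels, and the k-modes clusters:

```r
# Hypothetical categorical data standing in for mushroom.df
df <- data.frame(
  cap  = c("x", "x", "b", "b", "x", "b"),
  odor = c("a", "a", "n", "n", "a", "n"),
  gill = c("k", "k", "w", "w", "k", "n"),
  stringsAsFactors = TRUE
)
truth   <- c(1, 1, 2, 2, 1, 2)   # stand-in for the known class labels
cluster <- c(1, 1, 2, 2, 1, 2)   # stand-in for result.kmode$cluster

# Simple matching dissimilarity: proportion of columns where two rows differ
m <- as.matrix(df)               # character matrix, one row per observation
n <- nrow(m)
d <- matrix(0, n, n)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    d[i, j] <- mean(m[i, ] != m[j, ])
  }
}
d <- as.dist(d)
anyNA(d)                         # no NAs, unlike dist() on factors

# Guarded because fpc may not be installed
if (requireNamespace("fpc", quietly = TRUE)) {
  stats <- fpc::cluster.stats(d, clustering = cluster, alt.clustering = truth)
  stats$corrected.rand           # adjusted Rand index against the true labels
}
```

The double loop is only practical for small data. For the full mushroom table, cluster::daisy(df, metric = "gower") on factor columns is a common way to get a comparable dissimilarity object.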

Error in 'indepTest' in PC algorithm for conditional Independence Test

I am using the pc() function for the PC algorithm, which takes a conditional independence test as one of its arguments, and I get an error with the following code. Note that 'data' here is the data that I have been using, and 1, 6, 2 in gaussCItest are the node positions (x and y) in the adjacency matrix of the data.
Code:
library(pcalg)
suffstat <- list(C = cor(data), n = nrow(data))
pc.data <- pc(suffstat,
indepTest=gaussCItest(1,6,2,suffstat),
p=ncol(data),alpha=0.01)
Error:
Error in indepTest(x, y, nbrs[S], suffStat) :
could not find function "indepTest"
Below is the code that worked. I removed the parameters for gaussCItest: it is a function, so it can be passed to pc() directly.
library(pcalg)
suffstat <- list(C = cor(data), n = nrow(data))
pc.data <- pc(suffstat,indepTest=gaussCItest, p=ncol(data),alpha=0.01)
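The difference between passing a function and passing the result of calling it can be reproduced in base R. The sketch below uses two hypothetical stand-ins, run_tests() for pc() and fake_ci_test() for gaussCItest(); neither is part of pcalg.

```r
# Hypothetical stand-in for pc(): it calls indepTest itself
run_tests <- function(suffStat, indepTest) {
  indepTest(1, 2, integer(0), suffStat)
}

# Hypothetical stand-in for gaussCItest(): returns a fixed "p-value"
fake_ci_test <- function(x, y, S, suffStat) 0.5

# Passing the function object works: run_tests() can call it
ok <- run_tests(list(n = 10), indepTest = fake_ci_test)

# Passing the result of a call hands over a number, not a function,
# so the later indepTest(...) call cannot find a function of that name
err <- tryCatch(
  run_tests(list(n = 10),
            indepTest = fake_ci_test(1, 2, integer(0), list(n = 10))),
  error = function(e) conditionMessage(e)
)
err
```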

movielense popularity recommender code with R

I'm currently learning R and working on a project about movie recommendation algorithms.
I used the MovieLens 100k data with the recommenderlab library, following these tutorials:
https://mitxpro.mit.edu/asset-v1%3AMITProfessionalX+DSx+2017_T1+type#asset+block#Module4_CS1_Movies.pdf
https://cran.r-project.org/web/packages/recommenderlab/vignettes/recommenderlab.pdf
So far I've calculated the sparsity and split the data into train and test sets.
Now I want to build a popularity-based recommender. My code is here:
install.packages("SnowballC")
install.packages("class")
install.packages("dbscan")
install.packages("proxy")
install.packages("recommenderlab")
install.packages("dplyr")
install.packages("tm")
install.packages("reshape2")
library(recommenderlab)
library(dplyr)
library(tm)
library(SnowballC)
library(class)
library(dbscan)
library(proxy)
library(reshape2)
#read data
data<- read.table('C:/Users/ginny/OneDrive/Documents/2018_1/dataanalytics/실습3/ml-100k/u.data')
#####raw data to matrix#####
data.frame2matrix = function(data, rowtitle, coltitle, datatitle,
rowdecreasing = FALSE, coldecreasing = FALSE,
default_value = NA) {
# check, whether titles exist as columns names in the data.frame data
if ( (!(rowtitle%in%names(data)))
|| (!(coltitle%in%names(data)))
|| (!(datatitle%in%names(data))) ) {
stop('data.frame2matrix: bad row-, col-, or datatitle.')
}
# get number of rows in data
ndata = dim(data)[1]
# extract rownames and colnames for the matrix from the data.frame
rownames = sort(unique(data[[rowtitle]]), decreasing = rowdecreasing)
nrows = length(rownames)
colnames = sort(unique(data[[coltitle]]), decreasing = coldecreasing)
ncols = length(colnames)
# initialize the matrix
out_matrix = matrix(NA,
nrow = nrows, ncol = ncols,
dimnames=list(rownames, colnames))
# iterate rows of data
for (i1 in 1:ndata) {
# get matrix-row and matrix-column indices for the current data-row
iR = which(rownames==data[[rowtitle]][i1])
iC = which(colnames==data[[coltitle]][i1])
# throw an error if the matrix entry (iR,iC) is already filled.
if (!is.na(out_matrix[iR, iC])) stop('data.frame2matrix: double entry in data.frame')
out_matrix[iR, iC] = data[[datatitle]][i1]
}
# set empty matrix entries to the default value
out_matrix[is.na(out_matrix)] = default_value
# return matrix
return(out_matrix)
}
# assign a name to each column of data (the names inside '' can be changed as needed)
colnames(data)<-c('user_id','item_id','rating','timestamp')
# convert the raw data to a matrix
pre_data = data.frame2matrix(data, 'user_id', 'item_id', 'rating')
# convert the matrix to a realRatingMatrix
target_data<- as(as(pre_data, "matrix"), "realRatingMatrix")
data=data[,-which(names(data) %in% c("timestamp"))]
data
str(data)
summary(data)
hist(data$rating)
write.csv(data,"C:/Users/ginny/OneDrive/Documents/2018_1/dataanalytics/실습3/u.csv")
Number_Ratings=nrow(data)
Number_Ratings
Number_Movies=length(unique(data$item_id))
Number_Movies
Number_Users=length(unique(data$user_id))
Number_Users
data1=data[data$user_id %in% names(table(data$user_id))
[table(data$user_id)>50],]
Number_Ratings1=nrow(data1)
Number_Movies1=length(unique(data1$item_id))
Number_Users1=length(unique(data1$user_id))
sparsity=((Number_Ratings1)*3*5*100)/((Number_Movies1)*(Number_Users1))
sparsity
install.packages("caTools")
library(caTools)
set.seed(10)
sample=sample.split(data1$rating, SplitRatio=0.75)
train=subset(data1, sample==TRUE)
test=subset(data1, sample==FALSE)
data2<-as.data.frame(data1)
data2
#matrix to realratingmatrix
target_data2<- as(as(pre_data2, "matrix"), "realRatingMatrix")
recommender_models<-recommenderRegistry$get_entry(dataType =
"realRatingMatrix")
recomm_model <- Recommender(data2$rating, method = "POPULAR")
I tried to use data2 as a realRatingMatrix, but when I run the last line I get this error:
Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘Recommender’ for signature ‘"integer"’
Can anybody tell me what's wrong with it?
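The error message says that Recommender() received an integer vector (data2$rating) where it has methods only for rating-matrix objects. A minimal sketch of the intended flow, using a hypothetical toy ratings table in place of the filtered MovieLens data: reshape the user/item/rating triplets into a users x items matrix, coerce that to a realRatingMatrix, and pass the whole object to Recommender().

```r
# Hypothetical toy ratings standing in for data1 (user_id, item_id, rating)
ratings <- data.frame(
  user_id = c(1, 1, 2, 2, 3, 3, 3),
  item_id = c(10, 20, 10, 30, 10, 20, 30),
  rating  = c(5, 3, 4, 2, 5, 4, 1)
)

# Users x items matrix; unrated combinations stay NA
m <- with(ratings, tapply(rating, list(user_id, item_id), mean))
m

# Guarded because recommenderlab may not be installed
if (requireNamespace("recommenderlab", quietly = TRUE)) {
  library(recommenderlab)
  rrm <- as(m, "realRatingMatrix")             # the object Recommender() expects
  rec <- Recommender(rrm, method = "POPULAR")
  top <- predict(rec, rrm, n = 1)              # top-1 recommendation per user
  as(top, "list")
}
```

In the question's script, the same idea would mean building the matrix from the training rows (for example with data.frame2matrix) instead of referring to pre_data2, which is never created.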

Error: object not found - cor.ci

I'm trying to use cor.ci to obtain polychoric correlations with significance tests, but it keeps giving me an error message. Here is the code:
install.packages("Hmisc")
library(Hmisc)
mydata <- spss.get("S-IAT for R.sav", use.value.labels=TRUE)
install.packages('psych')
library(psych)
poly.example <- cor.ci(mydata(nvar = 10,n = 100)$items,n.iter = 10,poly = TRUE)
poly.example
print(corr.test(poly.example$rho), short=FALSE)
Here is the error message it gives:
> library(psych)
> poly.example <- cor.ci(mydata(nvar = 10,n = 100)$items,n.iter = 10,poly = TRUE)
Error in cor.ci(mydata(nvar = 10, n = 100)$items, n.iter = 10, poly = TRUE) :
could not find function "mydata"
> poly.example
Error: object 'poly.example' not found
> print(corr.test(poly.example$rho), short=FALSE)
Error in is.data.frame(x) : object 'poly.example' not found
How can I make it recognize mydata and/or select certain variables from this dataset for the analysis? I got the above code from here:
Polychoric correlation matrix with significance in R
Thanks!
You have several problems.
1) As previously commented upon, you are treating mydata as a function, but you need to treat it as a data.frame. Thus the call should be
poly.example <- cor.ci(mydata, n.iter = 10, poly = TRUE)
If you are trying to just get the first 100 cases and the first 10 variables, then
poly.example <- cor.ci(mydata[1:100, 1:10], n.iter = 10, poly = TRUE)
2) Then, you do not want to run corr.test on the resulting correlation matrix. corr.test should be run on the data.
print(corr.test(mydata[1:100, 1:10]), short = FALSE)
Note that corr.test is testing the Pearson correlation.
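Data frames are indexed as [rows, columns], so "the first 100 cases and the first 10 variables" is written [1:100, 1:10]. A small base R check on a hypothetical data frame (toy stands in for mydata):

```r
# Hypothetical data frame: 200 cases (rows) and 15 variables (columns)
toy <- as.data.frame(matrix(seq_len(200 * 15), nrow = 200, ncol = 15))

sub <- toy[1:100, 1:10]   # rows first, then columns
dim(sub)
```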

how to solve negative subscript error in R?

I am trying to normalize the data frame before prediction, but I get this error:
Error in seq_len(nrows)[i] :
only 0's may be mixed with negative subscripts
Called from: top level
Here is my code :
library('caret')
load(file = "some dataset path here")
DummyDataSet = data
attach(DummyDataSet)
foldCount = 10
classifyLabels = DummyDataSet$ClassLabel
folds = createFolds(classifyLabels,k=foldCount)
for (foldIndex in 1:foldCount){
cat("----- Start Fold -----\n")
#holding out samples of one fold in each iteration
testFold = DummyDataSet[folds[[foldIndex]],]
testLabels = classifyLabels[folds[[foldIndex]]]
trainFolds = DummyDataSet[-folds[[foldIndex]],]
trainLabels = classifyLabels[-folds[[foldIndex]]]
#Zero mean unit variance normalization to ONLY numerical data
for (k in 1:ncol(trainFolds)){
if (!is.integer(trainFolds[,k])){
params = meanStdCalculator(trainFolds[,k])
trainFolds[,k] = sapply(trainFolds[,k], function(x) (x - params[1])/params[2])
testFold[,k] = sapply(testFold[,k], function(x) (x - params[1])/params[2])
}
}
meanStdCalculator = function(data){
Avg = mean(data)
stdDeviation = sqrt(var(data))
return(c(Avg,stdDeviation))
}
cat("----- Start Fold -----\n")
}
where trainFolds is built from the folds created by the caret package and its type is data.frame.
I have already read these links:
R Debugging
Subset
Negative Subscripts
but I couldn't find out what is wrong with the indexes.
Can anybody help me?
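The post does not show which index vector triggers the error, so the following is only a sketch of the general mechanism: R raises a subscript error when an index vector mixes negative entries with positive ones (or with NA). Building the training rows from positive indices with setdiff() avoids negative subscripts altogether. The data frame and fold indices are hypothetical.

```r
# Mixing a negative and a positive subscript is an error
x <- c(10, 20, 30, 40)
mixed <- tryCatch(x[c(-1, 2)], error = function(e) "error")
mixed

# Hypothetical data and held-out fold
df <- data.frame(a = c(1, 2, 3, 4, 5, 6), label = c(0, 1, 0, 1, 0, 1))
test_idx  <- c(2, 5)
train_idx <- setdiff(seq_len(nrow(df)), test_idx)   # positive indices only

train <- df[train_idx, ]
test  <- df[test_idx, ]

# Zero-mean, unit-variance scaling with parameters estimated on the training rows
mu <- mean(train$a)
s  <- sd(train$a)
train$a <- (train$a - mu) / s
test$a  <- (test$a - mu) / s
```

Printing folds[[foldIndex]] inside the loop and checking it with anyNA() would show whether the fold indices themselves are the problem.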
