KNN: "no missing values are allow" -> I do not have missing values - r

I am in a group project for a class and one of the people in my group ran the normalization, as well as creating the test/train sets so that we all have the same sets to work from (we're all utilizing different algorithms). I am assigned with running the KNN algorithm.
We had multiple columns with NA's so those columns were omitted (<-NULL). When attempting to run the KNN I keep getting the error of
Error in knn(train = trainsetne, test = testsetne, cl = ne_train_target, :
no missing values are allowed
I ran which(is.na(dataset$col)) and found:
which(is.na(testsetne$median_days_on_market))
# [1] 8038 8097 8098 8100 8293 8304
When I look through the dataset those cells do not have missing data.
I am wondering if I may get some help with how to either find and fix the "No missing values" or to find a work around (if any).
I am sorry if I am missing something simple. Any help is appreciated.
I have listed the code that we have below:
ne$pending_ratio_yy <- ne$total_listing_count_yy <- ne$average_listing_price_yy <- ne$median_square_feet_yy <- ne$median_listing_price_per_square_feet_yy <- ne$pending_listing_count_yy <- ne$price_reduced_count_yy <- ne$median_days_on_market_yy <- ne$new_listing_count_yy <- ne$price_increased_count_yy <- ne$active_listing_count_yy <- ne$median_listing_price_yy <- ne$flag <- NULL
ne$pending_ratio_mm <- ne$total_listing_count_mm <- ne$average_listing_price_mm <- ne$median_square_feet_mm <- ne$median_listing_price_per_square_feet_mm <- ne$pending_listing_count_mm <- ne$price_reduced_count_mm <- ne$price_increased_count_mm <- ne$new_listing_count_mm <- ne$median_days_on_market_mm <- ne$active_listing_count_mm <- ne$median_listing_price_mm <- NULL
ne$factor_month_date <- as.factor(ne$month_date_yyyymm)
ne$factor_median_days_on_market <- as.factor(ne$median_days_on_market)
train20ne= sample(1:20893, 4179)
trainsetne=ne[train20ne,1:10]
testsetne=ne[-train20ne,1:10]
#This is where I start to come in
ne_train_target <- ne[train20ne, 3]
ne_test_target <- ne[-train20ne, 3]
predict_1 <- knn(train = trainsetne, test = testsetne, cl=ne_train_target, k=145)
# Error in knn(train = trainsetne, test = testsetne, cl = ne_train_target, :
# no missing values are allowed

Related

R Model returning the error: Too many open devices

I've been working on the creation of a training model in R for MS Azure. When I initially set up the model it all worked fine. Now it's continuously returning the below:
{"error":{"code":"LibraryExecutionError","message":"Module execution encountered an internal library error.","details":[{"code":"FailedToEvaluateRScript","target":"Score Model (RPackage)","message":"The following error occurred during evaluation of R script: R_tryEval: return error: Error in png(file = \"3e25ea05d5bc49d683f4471ff40780bcrViz%03d.png\", bg = \"transparent\") : \n too many open devices\n"}]}}
I haven't changed anything, and have looked around online only to find references to other issues. My code is as follows:
Trainer R Script
# Modify Datatype, factor Level, Replace NA to 0
x <- dataset
for (i in seq_along(x)) {
if (class(x[[i]]) == "character") {
#Convert Type
x[[i]] <- type.convert(x[[i]])
#Apply Levels
# levels(x[[i]]) <- levels(cols_modeled[, names(x)[i]]) # linked with levels in model
}
if (is.numeric(x[[i]]) && is.na(x[[i]]) ){
#print("*** Updating NA to 0")
x[[i]] <- 0
}
}
df1 <- x
rm(x)
set.seed(1234)
model <- svm(Paid ~ ., data= df1, type= "C")
Scorer R Script
library(e1071)
scores <- data.frame( predicted_result = predict(model, dataset))
Has anyone come across this before?

R: Package topicmodels: LDA: Error: invalid argument

I have a question regarding LDA in topicmodels in R.
I created a matrix with documents as rows, terms as columns, and the number of terms in a document as respective values from a data frame. While I wanted to start LDA, I got an Error Message stating "Error in !all.equal(x$v, as.integer(x$v)) : invalid argument type" . The data contains 1675 documents of 368 terms. What can I do to make the code work?
library("tm")
library("topicmodels")
data_matrix <- data %>%
group_by(documents, terms) %>%
tally %>%
spread(terms, n, fill=0)
doctermmatrix <- as.DocumentTermMatrix(data_matrix, weightTf("data_matrix"))
lda_head <- topicmodels::LDA(doctermmatrix, 10, method="Gibbs")
Help is much appreciated!
edit
# Toy Data
documentstoy <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16)
meta1toy <- c(3,4,1,12,1,2,3,5,1,4,2,1,1,1,1,1)
meta2toy <- c(10,0,10,1,1,0,1,1,3,3,0,0,18,1,10,10)
termstoy <- c("cus","cus","bill","bill","tube","tube","coa","coa","un","arc","arc","yib","yib","yib","dar","dar")
toydata <- data.frame(documentstoy,meta1toy,meta2toy,termstoy)
So I looked inside the code and apparently the lda() function only accepts integers as the input so you have to convert your categorical variables as below:
library('tm')
library('topicmodels')
documentstoy <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16)
meta1toy <- c(3,4,1,12,1,2,3,5,1,4,2,1,1,1,1,1)
meta2toy <- c(10,0,10,1,1,0,1,1,3,3,0,0,18,1,10,10)
toydata <- data.frame(documentstoy,meta1toy,meta2toy)
termstoy <- c("cus","cus","bill","bill","tube","tube","coa","coa","un","arc","arc","yib","yib","yib","dar","dar")
toy_unique = unique(termstoy)
for (i in 1:length(toy_unique)){
A = as.integer(termstoy == toy_unique[i])
toydata[toy_unique[i]] = A
}
lda_head <- topicmodels::LDA(toydata, 10, method="Gibbs")

R: error in t test

https://filebin.net/3et86d1gh8cer9mu this is example subset of my data
I try to apply a code that was already working on similar data, now I can't tract where its wrong. The code goes like this:
url <- 'https://filebin.net/3et86d1gh8cer9mu/TCA_subset_GnoG_melt.csv'
TCA_subset_GnoG_melt <- read.csv(url)
L <- data.frame()
IDs <- unique(TCA_subset_GnoG_melt$X1)
for (i in 1 : length(IDs)){
temp<-TCA_subset_GnoG_melt[(TCA_subset_GnoG_melt$X1)==IDs[i],]
temp<- na.omit(temp)
t_test_CTROL_ABC.7<- t.test(temp$value[temp$X1.1=="CTROL"], temp$value[temp$X1.1=="ABC.7"])
t_test_CTROL_ABC.8<- t.test(temp$value[temp$X1.1=="CTROL"], temp$value[temp$X1.1=="ABC.8"])
t_test_CTROL_ABC.7.8<- t.test(temp$value[temp$X1.1=="CTROL"], temp$value[temp$X1.1=="ABC7.8"])
t_test_ABC.7_ABC.8<- t.test(temp$value[temp$X1.1=="ABC.7"], temp$value[temp$X1.1=="ABC.8"])
t_test_ABC.7_ABC.7.8<- t.test(temp$value[temp$X1.1=="ABC.7"], temp$value[temp$X1.1=="ABC7.8"])
t_test_ABC.8_ABC.7.8<- t.test(temp$value[temp$X1.1=="ABC.8"], temp$value[temp$X1.1=="ABC7.8"])
LLc <- cbind(as.character(unique(IDs[i])), t_test_CTROL_ABC.7,t_test_CTROL_ABC.8,t_test_CTROL_ABC.7.8, t_test_ABC.7_ABC.8,t_test_ABC.7_ABC.7.8, t_test_ABC.8_ABC.7.8)
L<-rbind(L,LLc)
}
AA<-rownames(L)
L$names <- AA
p_value_TCA <-L[grep("p.value",L$names), ]
df <- apply(p_value_TCA ,2,as.character)
df = as.matrix(df)
the error i get is:
Error in t.test.default(temp$value[temp$X1.1 == "CTROL"], temp$value[temp$X1.1 == :
not enough 'y' observations
I dpm't understand it, when i check the code line by line it goes until the LLc creation and than the df "L" is empty. it makes no sense to me. help!

Object 'sef' not found in R corr.test

I am attempting to run the corr.test equation in R, with code that my professor submitted and tested on his system. Unfortunately, when I run it I am getting an error that "object sef not found".
This is confounding both my professor and I, and having done a thorough search, we're not sure how to address this.
I really appreciate any help you can provide.
Edit: Here is the code I am using:
trendan1 <- read.table("trendan1.for.R.dat", header=TRUE, na.strings=".")
head(trendan1)
tail(trendan1)
attributes(trendan1)
is.matrix(trendan1)
id <- trendan1$id
famenv1 <- trendan1$famenv1
famenv2 <- trendan1$famenv2
famenv3 <- trendan1$famenv3
conf1 <- trendan1$conf1
conf2 <- trendan1$conf2
conf3 <- trendan1$conf3
trendan1dataset1 <- cbind(id,famenv1,famenv2,famenv3,conf1,conf2,conf3)
attributes(trendan1dataset1)
is.matrix(trendan1dataset1)
is.data.frame(trendan1dataset1)
require("psych")
describe(trendan1dataset1[,2:7])
print(describe(trendan1dataset1[,2:7]), digits=6)
famave <- (1*famenv1 + 1*famenv2 + 1*famenv3)/3
famlin <- -1*famenv1 + 0*famenv2 + 1*famenv3
famquad <- 1*famenv1 - 2*famenv2 + 1*famenv3;
trendandataset2 <- cbind(famenv1,famenv2,famenv3,famave,famlin,famquad)
print(describe(trendandataset2), digits=6)
hist(famenv1)
boxplot(famenv1)
abline(h=mean(famenv1))
qqnorm(famenv1,ylab="famenv1")
qqline(famenv1)
shapiro.test(famenv1)
hist(famenv2)
boxplot(famenv2)
abline(h=mean(famenv2)) # add mean to the boxplot
qqnorm(famenv1,ylab="famenv2")
qqline(famenv2)
shapiro.test(famenv2)
corvars1 <- cbind(famenv1,famenv2,famenv3)
cor(corvars1,use = "everything", method = "pearson")
cov(corvars1,use = "everything")
sscp1 <- t(corvars1)%*%(corvars1) #Matrix multiplcation
sscp1
rc1 <- corr.test(corvars1,
use="pairwise",method="pearson",adjust="holm",alpha=.05, ci=FALSE)
attributes(rc1)
print(rc1$p, digits=6)
This is a bug that sometimes happens when you do not evaluate confidence interval. It should be fixed if u change the option to ci=TRUE, or simply delete this option as the default is ci=TRUE.

How to estimate static yield curve with 'termstrc' package in R?

I am trying to estimate the static yield curve for Brazil using termstrc package in R. I am using the function estim_nss.couponbonds and putting 0% coupon-rates and $0 cash-flows, except for the last one which is $1000 (the face-value at maturity) -- as far as I know this is the function to do this, because the estim_nss.zeroyields only calculates the dynamic curve. The problem is that I receive the following error message:
"Error in (pos_cf[i] + 1):pos_cf[i + 1] : NA/NaN argument In addition: Warning message: In max(n_of_cf) : no non-missing arguments to max; returning -Inf "
I've tried to trace the problem using trace(estim_nss.couponbons, edit=T) but I cannot find where pos_cf[i]+1 is calculated. Based on the name I figured it could come from the postpro_bondfunction and used trace(postpro_bond, edit=T), but I couldn't find the calculation again. I believe "cf" comes from cashflow, so there could be some problem in the calculation of the cashflows somehow. I used create_cashflows_matrix to test this theory, but it works well, so I am not sure the problem is in the cashflows.
The code is:
#Creating the 'couponbond' class
ISIN <- as.character(c('ltn_2017','ltn_2018', 'ltn_2019', 'ltn_2021','ltn_2023')) #Bond's identification
MATURITYDATE <- as.Date(c(42736, 43101, 43466, 44197, 44927), origin='1899-12-30') #Dates are in system's format
ISSUEDATE <- as.Date(c(41288,41666,42395, 42073, 42395), origin='1899-12-30') #Dates are in system's format
COUPONRATE <- rep(0,5) #Coupon rates are 0 because these are zero-coupon bonds
PRICE <- c(969.32, 867.77, 782.48, 628.43, 501.95) #Prices seen 'TODAY'
ACCRUED <- rep(0.1,5) #There is no accrued interest in the brazilian bond's market
#Creating the cashflows sublist
CFISIN <- as.character(c('ltn_2017','ltn_2018', 'ltn_2019', 'ltn_2021', 'ltn_2023')) #Bond's identification
CF <- c(1000,1000,1000,1000,1000)# The face-values
DATE <- as.Date(c(42736, 43101, 43466, 44197, 44927), origin='1899-12-30') #Dates are in system's format
CASHFLOWS <- list(CFISIN,CF,DATE)
names(CASHFLOWS) <- c("ISIN","CF","DATE")
TODAY <- as.Date(42646, origin='1899-12-30')
brasil <- list(ISIN,MATURITYDATE,ISSUEDATE,
COUPONRATE,PRICE,ACCRUED,CASHFLOWS,TODAY)
names(brasil) <- c("ISIN","MATURITYDATE","ISSUEDATE","COUPONRATE",
"PRICE","ACCRUED","CASHFLOWS","TODAY")
mybonds <- list(brasil)
class(mybonds) <- "couponbonds"
#Estimating the zero-yield curve
ns_res <-estim_nss.couponbonds(mybonds, 'brasil' ,method = "ns")
#Testing the hypothesis that the error comes from the cashflow matrix
cf_p <- create_cashflows_matrix(mybonds[[1]], include_price = T)
m_p <- create_maturities_matrix(mybonds[[1]], include_price = T)
b <- bond_yields(cf_p,m_p)
Note that I am aware of this question which reports the same problem. However, it is for the dynamic curve. Besides that, there is no useful answer.
Your code has two problems. (1) doesn't name the 1st list (this is the direct reason of the error. But if modifiy it, another error happens). (2) In the cashflows sublist, at least one level of ISIN needs more than 1 data.
# ...
CFISIN <- as.character(c('ltn_2017','ltn_2018', 'ltn_2019',
'ltn_2021', 'ltn_2023', 'ltn_2023')) # added a 6th element
CF <- c(1000,1000,1000,1000,1000, 1000) # added a 6th
DATE <- as.Date(c(42736,43101,43466,44197,44927, 44928), origin='1899-12-30') # added a 6th
CASHFLOWS <- list(CFISIN,CF,DATE)
names(CASHFLOWS) <- c("ISIN","CF","DATE")
TODAY <- as.Date(42646, origin='1899-12-30')
brasil <- list(ISIN,MATURITYDATE,ISSUEDATE,
COUPONRATE,PRICE,ACCRUED,CASHFLOWS,TODAY)
names(brasil) <- c("ISIN","MATURITYDATE","ISSUEDATE","COUPONRATE",
"PRICE","ACCRUED","CASHFLOWS","TODAY")
mybonds <- list(brasil = brasil) # named the list
class(mybonds) <- "couponbonds"
ns_res <-estim_nss.couponbonds(mybonds, 'brasil', method = "ns")
Note: the error came from these lines
bonddata <- bonddata[group] # prepro_bond()'s 1st line (the direct reason).
# cf <- lapply(bonddata, create_cashflows_matrix) # the additional error
create_cashflows_matrix(mybonds[[1]], include_price = F) # don't run

Resources