K nearest neighbors with Gower distance measure in R

I hope everyone is well. I have a question that may look like a dumb one, but I really need someone to explain it to me. I also thought it would be useful for others, since it has been asked before with no satisfactory answer.
Since I have a matrix of mixed data types, I was looking for a k-nearest neighbors algorithm that works with Gower distance in R. I found the function knngow in the package dprep, which claims to do this.
http://finzi.psych.upenn.edu/library/dprep/html/knngow.html
The function takes three arguments, knngow(Training_Set, Testing_set, K_number), and returns the predicted class.
I was playing around with it and was wondering how the function can recognize which column is my target vector. Put differently, how does it return the predicted class without me telling it in advance which column is the target?
Please find the source code below (I retrieved it using the function edit):
function (train, test, k)
{
    p = dim(train)[2]
    ntest = dim(test)[1]
    ntrain = dim(train)[1]
    classes = rep(0, ntest)
    if (ntest == ntrain) {
        for (i in 1:ntest) {
            tempo = order(gower.dist(test[i, -p], train[-i, -p]))[1:k]
            classes[i] = moda(train[tempo, p])[1]
        }
    }
    else {
        for (i in 1:ntest) {
            tempo = order(StatMatch::gower.dist(test[i, -p], train[, -p]))[1:k]
            classes[i] = moda(train[tempo, p])[1]
        }
    }
    classes
}
Can someone please explain the code to me?
I hope I have posted the question in the correct form; please let me know if I should move it somewhere else.
Thank you very much for your time.

The knngow function takes the last column of train as the target attribute. p = dim(train)[2] is the number of columns in train, so column p is the last column.
Column p is not used for calculating the Gower distance: the code passes test[i, -p] and train[, -p], which drop it. It is only used when predicting the class label of each test sample, where moda(train[tempo, p])[1] picks a class from the labels of the k nearest training rows.
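To make the column convention concrete, here is a minimal base-R sketch of the indexing that knngow relies on. The data frame, its column names, and the target name "class" are made up for illustration; the only thing taken from the source above is that the last column is treated as the label. If your target is not already last, reordering the columns first should be enough.

```r
# Hypothetical mixed-type training data with the target NOT in the last position.
train <- data.frame(
  class  = factor(c("a", "b", "a", "b")),
  age    = c(23, 35, 41, 29),
  smoker = factor(c("yes", "no", "no", "yes"))
)

# Move the target column to the end, since knngow assumes it is the last one.
target <- "class"
train  <- train[, c(setdiff(names(train), target), target)]

# The same indexing the knngow source uses.
p        <- dim(train)[2]   # number of columns, i.e. index of the last column
features <- train[, -p]     # everything except the target: used for the distance
labels   <- train[, p]      # the target: used only for the class vote

names(features)
labels
```

The test set should have the same column layout, because the function also drops column p from test before computing distances.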

Related

R function to check each element and its related children elements to add a result to a list

Suppose we are given a data frame a in R. An entry such as 0--7 means the integer values from 0 to 7, i.e. 0,1,2,3,4,5,6,7.
I am interested in writing a function such that,
if a[1,1]>alpha, it goes on to check its children, i.e. 0--7 consists of a[1,2] and a[2,2].
So,
{a[2,1]>alpha
{a[4,1]>alpha
{a[5,1]>alpha
ps=list.append(0)
else ps=list.append(1)
}}}
Here, alpha is a threshold. ps is appended with values from 0 to 15 based on this criterion.
My code is
{for (i in 1:2)
{ if (a[j,i]>alpha)
{if (i%%2==1}
{j=j*2
if (a[j,i]>alpha
###here i want to go recursively i think and where and how should i add append values to the list
if a[j,i+1]>alpha}
if{i%%2==0}
{}
}}
I am stuck and confused at the same time. Any help or advice would be greatly appreciated.
Thanks

combine three functions using r

Hello, I created the following functions that test reliability. However, I want to combine them into one function, like reliability<-function(x), so that it gives me a matrix showing the answer from each function within "reliability", because so far each one has only been giving me its answer on its own. Any ideas would help.
splithalf1<- function( data ) {
  n<-ncol(data)
  # odd-numbered and even-numbered items
  single<-data[ , seq(1,n , 2)]
  double<-data[ , seq(2 ,n , 2)]
  top_single<-rowSums(single)
  top_double<-rowSums(double)
  kor<-cor(top_single,top_double)
  r<-2*kor / (1+kor)
  return(r)
}
cr.alpha2<- function(x) {
  n<-ncol(x)
  kov<-cov(x)
  kov1<-as.vector(kov)
  kov2<-unique (kov1)
  kov3<- kov2[-1]
  kov4<-sum(kov3)/length(kov3)
  pay<- n*kov4
  payd<- (1 + (n-1)*kov4)
  alpha<-pay/payd
  return(alpha)
}
kr20<-function(x) {
  n<-ncol(x)
  pq<-function(x) {
    p<-mean(x)
    q<-1-p
    res<-p*q
    return (res)
  }
  pay<- sum(apply(x,2,pq))
  top<-rowSums(x)
  payda<-var(top)
  result<- n /(n-1)* (1-(pay/payda))
  return(result)
}
Stack Overflow is not a coding service! As a teaching service to you, however, I will suggest several things, which is probably what your teacher intended in the first place:
Study up on fundamental R. There are innumerable (well, numerable, but innumerable for all practical individual purposes) free resources on the net. One good (IMO), free, intro-to-journeyman-level book is R for Data Science, which can be accessed here: https://r4ds.had.co.nz/
Check out assigning function return values to variables.
Check out the c() and matrix() functions.
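As a rough illustration of those last two hints, the sketch below stores each function's return value, collects the three values with c(), and reshapes them with matrix(). The three helper functions are compact copies of the ones in the question (same arithmetic, assumed unchanged); the wrapper name reliability comes from the question, while the column names and the simulated 0/1 item matrix are only assumptions for the demo.

```r
# Compact copies of the three functions from the question.
splithalf1 <- function(data) {
  n    <- ncol(data)
  odd  <- rowSums(data[, seq(1, n, 2), drop = FALSE])
  even <- rowSums(data[, seq(2, n, 2), drop = FALSE])
  kor  <- cor(odd, even)
  2 * kor / (1 + kor)
}
cr.alpha2 <- function(x) {
  n    <- ncol(x)
  kov3 <- unique(as.vector(cov(x)))[-1]
  kov4 <- sum(kov3) / length(kov3)
  n * kov4 / (1 + (n - 1) * kov4)
}
kr20 <- function(x) {
  n  <- ncol(x)
  pq <- function(col) mean(col) * (1 - mean(col))
  n / (n - 1) * (1 - sum(apply(x, 2, pq)) / var(rowSums(x)))
}

# Wrapper: call each function once, keep its return value, combine with c(),
# then shape the result as a one-row matrix with one named column per method.
reliability <- function(x) {
  res <- c(splithalf = splithalf1(x), alpha = cr.alpha2(x), kr20 = kr20(x))
  matrix(res, nrow = 1, dimnames = list(NULL, names(res)))
}

# Simulated 0/1 answers: 50 respondents, 4 items (made-up data).
set.seed(1)
items <- matrix(rbinom(200, 1, 0.6), nrow = 50, ncol = 4)
out <- reliability(items)
out
```

If a plain named vector is enough, the matrix() call can be dropped and res returned directly.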

Combine into function and iterate

I am attempting to combine a series of loops/functions into one all-encompassing function to then be able to see the result for different input values. While the steps work properly when standalone (and when given just one input), I am having trouble getting the overall function to work. The answer I am getting back is a vector of 1s, which is incorrect.
The goal is to count the number of occurrences of consecutive zeroes in the randomly generated results, and then to see how the probability of consecutive zeroes occurring changes as I change the initial percentage input provided.
Does anyone have a tip for what I'm doing wrong? I have stared at this at several separate points now but cannot figure out where I'm going wrong. Thanks for your help.
### Example
pctgs_seq=seq(0.8,1,.01)
occurs=20
iterations=10
iterate_pctgs=function(x) {
  probs=rep(0,length(pctgs_seq))
  for (i in 1:length(pctgs_seq)) {
    all_sims=lapply(1:iterations, function (x) ifelse(runif(occurs) <= i, 1, 0))
    totals=sapply(all_sims,sum)
    consec_zeroes=function (x) {
      g=0
      for (i in 1:(length(x)-1))
      { g= g+ifelse(x[i]+x[i+1]==0,1,0) }
      return (g) }
    consec_zeroes_sim=sapply(all_sims,consec_zeroes)
    no_consec_prob=sum(consec_zeroes_sim==0)/length(consec_zeroes_sim)
    probs[i]=no_consec_prob }
  return (probs)
}
answer=iterate_pctgs(pctgs_seq)
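One thing that stands out in the code above: runif(occurs) <= i compares the random draws with the loop index i (1, 2, ..., 21), not with the percentage pctgs_seq[i]. Since runif() returns values below 1, every draw becomes a 1, no zeroes ever occur, and the probability of "no consecutive zeroes" comes out as 1 every time. The function also ignores its argument x and reads the global pctgs_seq instead. Below is a sketch of one way to restructure it, assuming the intent is "a 1 with probability pct, otherwise 0"; the argument names and the larger iteration count in the usage lines are my own choices.

```r
# Count adjacent pairs of zeroes in a 0/1 vector.
consec_zeroes <- function(x) {
  sum(x[-length(x)] + x[-1] == 0)
}

# For each percentage, simulate `iterations` runs of `occurs` draws and
# return the share of runs that contain no pair of consecutive zeroes.
iterate_pctgs <- function(pctgs, occurs = 20, iterations = 10) {
  sapply(pctgs, function(pct) {
    sims <- lapply(seq_len(iterations),
                   function(k) ifelse(runif(occurs) <= pct, 1, 0))
    counts <- sapply(sims, consec_zeroes)
    mean(counts == 0)
  })
}

set.seed(123)
pctgs_seq <- seq(0.8, 1, 0.01)
answer <- iterate_pctgs(pctgs_seq, occurs = 20, iterations = 200)
answer
```

With only 10 iterations per percentage the estimates will be quite noisy, so a larger iteration count is probably worth it when comparing nearby percentages.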

Find min value (parameters estmation) based on recurrence equations

Sorry for the trivial question, but I'm not a programmer. Did I translate the following task into an R function correctly?
I have recurrence equations, e.g. (p1_par,...,p4_par are the parameters to find):
z1[i+1]= z1[i]+p1_par*p2_par
z12[i+1]= z12[i]+(p1_par*z1[i]-p3_par*z1z2[i]-p4_par)*p2_par
z1z2[i+1]=z1z2[i]+(-p3_par*z12[i]-p4_par*z1z2[i])*p2_par
i=1,...,5
with the initial conditions for i=0:
z1_0=1.23
z12_0=1
z1z2_0=0
and t=6, y=c(0.1,0.06,0.08,0.04,0.05,0.01)
I want to find parameters based on min value of function e.g. like this:
(-2*p1_par*z1[i]-z12[i]+y[i+1]^2+2*p3_par*z1z2[i]+2*p4_par*z1z3[i])^2
I tried to build the function in R like this:
function1=function(p1_par,p2_par,p3_par,p4_par,y,t){
  ep=1
  summa=0
  result=rep(1,t)
  for(i in 1:t){
    z1_0=1.23
    z12_0=1
    z1z2_0=0
    z1[1]=z1_0+p1_par*p2_par
    z12[1]=z12_0+(p1_par*z1_0-p3_par*z1z2_0-p4_par)*p2_par
    z1z2[1]=z1z2_0+(-p3_par*z12_0-p4_par*z1z2_0)*p2_par
    z1[i+1]= z1[i]+p1_par*p2_par
    z12[i+1]= z12[i]+(p1_par*z1[i]-p3_par*z1z2[i]-p4_par)*p2_par
    z1z2[i+1]=z1z2[i]+(-p3_par*z12[i]-p4_par*z1z2[i])*p2_par
    if(i==1) {
      result[ep]=(-2*p1_par*z1_0-z12_0+y[i+1]^2+2*p3_par*z1z2_0+2*p4_par*z1z3_0)^2
    } else {
      result[ep]=(-2*p1_par*z1[i]-z12[i]+y[i+1]^2+2*p3_par*z1z2[i]+2*p4_par*z1z3[i])^2
    }
    summa<<-summa+result[ep]
    ep=ep+1
  }
  return(result)
}
Did I translate the task into the R function correctly? Results from other software (like Math) differ. Thanks in advance for your help.
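A few things in the function above look like they could explain the mismatch: the vectors z1, z12 and z1z2 are never created inside the function, the initial conditions are re-assigned on every pass through the loop, z1z3 is not defined anywhere, and summa<<- writes to a global variable while the function returns result. As a starting point, here is a sketch of just the three recurrences, with the state vectors preallocated and the initial conditions set once before the loop. The function name simulate_moments and the shortened argument names are mine; element 1 of each vector holds the i=0 value. The objective is left out on purpose because it depends on z1z3, which the question does not define.

```r
# Iterate the three recurrences from the question for `steps` steps.
# Row 1 of the result is the initial condition (i = 0).
simulate_moments <- function(p1, p2, p3, p4, steps = 5,
                             z1_0 = 1.23, z12_0 = 1, z1z2_0 = 0) {
  z1   <- numeric(steps + 1)
  z12  <- numeric(steps + 1)
  z1z2 <- numeric(steps + 1)
  z1[1]   <- z1_0
  z12[1]  <- z12_0
  z1z2[1] <- z1z2_0
  for (i in seq_len(steps)) {
    z1[i + 1]   <- z1[i] + p1 * p2
    z12[i + 1]  <- z12[i] + (p1 * z1[i] - p3 * z1z2[i] - p4) * p2
    z1z2[i + 1] <- z1z2[i] + (-p3 * z12[i] - p4 * z1z2[i]) * p2
  }
  data.frame(i = 0:steps, z1 = z1, z12 = z12, z1z2 = z1z2)
}

# Arbitrary parameter values, chosen only so the output is easy to check by hand.
traj <- simulate_moments(p1 = 1, p2 = 0.1, p3 = 0, p4 = 0)
traj
```

Once the trajectory matches the other software for some fixed parameters, the sum of squared terms can be computed from these columns and handed to an optimizer such as optim().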
PPS

how to subtract two vectors in OpenBUGS

I am having a very hard time trying to subtract two vectors in my OpenBUGS model. The last line of the code below keeps giving the error "expected right parenthesis error":
model {
  for (i in 1:N) {
    for (j in 1:q) {
      vv[i,j] ~ dnorm(vz[i,j], tau.eta[j])
    }
    vz[i,1:q] ~ dmnorm(media.z[i,], K.delta[,])
    for (j in 1:q) {
      mean.z[i,j] <- inprod(K[i,], vbeta[j,])
    }
    K[i,1] <- 1.0
    for (j in 1:N) {
      K[i,j+1] <- sum(ve[,i] - ve[,j])
    }
  }
If I change that line to K[i,j+1] <- sum(ve[,i]) - sum(ve[,j]), then the model works fine, but that is not what I want to do. I would like to subtract element-wise.
I searched SO for OpenBUGS, but there are only a few unrelated topics:
OpenBUGS - Variable is not defined
OpenBUGS: missing value in Bernoulli distribution
On Stats Stack Exchange there is this post, which is close, but I still could not work out how to implement this in my model:
https://stats.stackexchange.com/questions/20653/vector-multiplication-in-bugs-and-jags/20739#20739
I understand I have to write a for loop, but this thing is sure giving me a big headache. :)
I tried changing that line to:
for(k in 1:p) { temp [k] <- ve[k,i] - ve[k,j] }
K[i,j+1] <- sum(temp[])
where 'p' is the number of rows in each 've'. Now I keep getting the error "multiple definitions of node temp[1]".
I could definitely use some help. It will be much appreciated.
Best regards to all and thanks in advance!
PS: I wanted to add the tag "OpenBUGS" to this question but unfortunately I couldn't because it would be a new tag and I do not have enough reputation. I added "winbugs" instead.
The "multiple definitions" error is because temp[k] is redefined over and over again within a loop over i and another loop over j - you can only define it once. To get around that, use i and j subscripts like
for(k in 1:p) { temp[k,i,j] <- ve[k,i] - ve[k,j] }
K[i,j+1] <- sum(temp[,i,j])
Though if that compiles and runs, I'd check the results to make sure that's exactly what you want mathematically.
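To do that check outside of BUGS, a small R sketch like the one below can help. It mimics the temp[k,i,j] construction with an array and compares the result with the sum(ve[,i]) - sum(ve[,j]) version from the question. The dimensions and the random ve are made up. Note that the sum of element-wise differences is algebraically the same as the difference of the sums, so the two versions should agree up to rounding; if a different quantity is intended (for example a sum of squared or absolute differences), the expression inside the loop would need to change.

```r
set.seed(42)
p  <- 4                                   # rows in ve (hypothetical size)
N  <- 3                                   # columns in ve (hypothetical size)
ve <- matrix(rnorm(p * N), nrow = p, ncol = N)

# Same structure as the BUGS fix: one temp node per (k, i, j).
temp <- array(NA_real_, dim = c(p, N, N))
K    <- matrix(NA_real_, nrow = N, ncol = N + 1)
K[, 1] <- 1
for (i in 1:N) {
  for (j in 1:N) {
    for (k in 1:p) {
      temp[k, i, j] <- ve[k, i] - ve[k, j]
    }
    K[i, j + 1] <- sum(temp[, i, j])
  }
}

# The version that already worked in the question: difference of column sums.
K_alt <- outer(colSums(ve), colSums(ve), "-")

all.equal(K[, -1], K_alt)
```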
