Deduplication in R Studio - r

This is my first R code. It is a very simple deduplication, but it runs so slowly I can't believe it! My question is: is it normal for it to run this slowly, or is my code just bad?
Here it is:
file1=c(read.delim("file.txt", header=TRUE))
dedupes<-0
i<-1
n<-1
while (i<=100) {
  while (n<=100) {
    if (file1$email[i]==file1$email[n] && i!=n) {
      #Remember amount of dedupes
      dedupes=dedupes+1
      #Show dedupes
      print(file1$email[i])
    }
    n<-n+1
  }
  n<-1
  i<-i+1
}
#Show amount of dedupes
cat("There are ", dedupes/2, " dedupes")
Many thanks in advance,
Saitam

Nested loops are well known to be slow in R. You need to vectorize your computation or use existing optimized functions, as BondedDust suggested.
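BondedDust's suggestion is not reproduced in this thread, so the following is only a minimal sketch of one vectorized approach, assuming base R's duplicated() is acceptable. The emails vector is hypothetical sample data standing in for file1$email.

```r
# Hypothetical sample data standing in for file1$email
emails <- c("a@x.com", "b@x.com", "a@x.com", "c@x.com", "b@x.com", "a@x.com")

# duplicated() flags every occurrence after the first one, in a single pass
dup_mask <- duplicated(emails)

# Number of redundant rows (occurrences beyond the first)
n_dupes <- sum(dup_mask)

# The distinct values that occur more than once
dup_values <- unique(emails[dup_mask])

# The deduplicated vector
deduped <- emails[!dup_mask]

cat("There are", n_dupes, "dupes\n")
print(dup_values)
```

Note that this counts redundant occurrences, whereas the nested loops in the question count matching pairs, so the two numbers differ as soon as a value appears three or more times.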

Related

Create functions in R

I'm fairly new to R and I have not been working with functions in R before.
I want to write a program/algorithm (using R) that calculates the square root of a given positive number.
Would anyone mind taking the time to give me an example of how this can be achieved?
Thanks a lot in advance!
UPDATE
posNum_to_squaRtNum <- function(posNum) {
if (posNum <= 0)
print("Due to mathmatical principles you have to input a positive number")
else
squaRtNum <- sqrt(posNum)
return(squaRtNum)
}
When I pass a negative number to the function, the output is my printed message PLUS the error: "Error in posNum_to_squaRtNum(-1) : object 'squaRtNum' not found." It should not go on to the else branch if the if condition is fulfilled, right?
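Without braces, only the first statement after else belongs to the else branch, so return(squaRtNum) always runs, even when squaRtNum was never assigned. You should wrap the bodies of your if and else branches in braces:
posNum_to_squaRtNum <- function(posNum) {
  if (posNum <= 0) {
    print("Due to mathematical principles you have to input a positive number")
  } else {
    squaRtNum <- sqrt(posNum)
    return(squaRtNum)
  }
}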
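As a variation on the braced version, here is a minimal sketch that signals the bad input with stop() instead of print(), so the caller can detect the failure. This swaps in a different technique from the answer. The name safe_sqrt and the exact error message are hypothetical.

```r
# safe_sqrt is a hypothetical name; it raises an error instead of printing
safe_sqrt <- function(posNum) {
  if (!is.numeric(posNum) || length(posNum) != 1 || posNum <= 0) {
    stop("input must be a single positive number")
  }
  sqrt(posNum)
}

ok <- safe_sqrt(16)

# tryCatch() lets the caller handle the error and inspect its message
msg <- tryCatch(safe_sqrt(-1), error = function(e) conditionMessage(e))

print(ok)
print(msg)
```

With print(), the function would still return a value (the printed string) for invalid input. With stop(), nothing is returned and the caller has to handle the error explicitly.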

combine three functions using r

Hello, I created the following functions that test reliability. However, I want to combine them into one function, like reliability<-function(x), so that it gives me a 1-0 matrix showing the answer of each function within "reliability", because so far each one has been giving me its answer on its own. Any ideas would help.
splithalf1<- function( data ) {
  n<-ncol(data)
  # tek: odd-numbered items, cift: even-numbered items
  tek<-data[ , seq(1,n , 2)]
  cift<-data[ , seq(2 ,n , 2)]
  top_single<-rowSums(tek)
  top_double<-rowSums(cift)
  kor<-cor(top_single,top_double)
  r<-2*kor / (1+kor)
  return(r)
}
cr.alpha2<- function(x) {
  n<-ncol(x)
  kov<-cov(x)
  kov1<-as.vector(kov)
  kov2<-unique (kov1)
  kov3<- kov2[-1]
  kov4<-sum(kov3)/length(kov3)
  pay<- n*kov4
  payd<- (1 + (n-1)*kov4)
  alpha<-pay/payd
  return(alpha)
}
kr20<-function(x) {
  n<-ncol(x)
  pq<-function(x) {
    p<-mean(x)
    q<-1-p
    res<-p*q
    return (res)
  }
  pay<- sum(apply(x,2,pq))
  top<-rowSums(x)
  payda<-var(top)
  result<- n /(n-1)* (1-(pay/payda))
  return(result)
}
Stack is not a coding service! As a teaching service to you, however, I will suggest several things, which is probably what your teacher intended in the first place!
Study up on fundamental R. There are innumerable (well, numerable, but innumerable for all practical individual purposes) free resources on the net. One good (IMO), free, intro-to-journeyman-level book is R for Data Science, which can be accessed here: https://r4ds.had.co.nz/
Check out assigning function return values to variables.
Check out the c() and matrix() functions.
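To illustrate the last two hints, here is a minimal sketch of one way to collect several return values into a one-row matrix. It is not the answerer's solution. The wrapper signature reliability(x, methods), the sample matrix x, and the compact rewrites of splithalf1 and kr20 are assumptions made for illustration; cr.alpha2 could be added to the list in the same way.

```r
# Split-half reliability with Spearman-Brown correction, adapted from the question
splithalf1 <- function(data) {
  n <- ncol(data)
  top_odd  <- rowSums(data[, seq(1, n, 2), drop = FALSE])
  top_even <- rowSums(data[, seq(2, n, 2), drop = FALSE])
  kor <- cor(top_odd, top_even)
  2 * kor / (1 + kor)
}

# KR-20, adapted from the question
kr20 <- function(x) {
  n <- ncol(x)
  pq <- function(item) mean(item) * (1 - mean(item))
  pay <- sum(apply(x, 2, pq))
  payda <- var(rowSums(x))
  n / (n - 1) * (1 - pay / payda)
}

# Hypothetical wrapper: apply each function to x, store each return value,
# then arrange the values in a one-row matrix with named columns
reliability <- function(x, methods) {
  vals <- vapply(methods, function(f) f(x), numeric(1))
  matrix(vals, nrow = 1, dimnames = list(NULL, names(vals)))
}

# Hypothetical 0/1 answer matrix: 6 respondents, 4 items
x <- matrix(c(1, 1, 1, 1,
              1, 1, 1, 0,
              1, 1, 0, 0,
              1, 0, 0, 0,
              0, 0, 0, 0,
              1, 0, 1, 0), nrow = 6, byrow = TRUE)

res <- reliability(x, list(splithalf = splithalf1, kr20 = kr20))
print(res)
```

Passing the functions as a named list keeps the wrapper short and lets the column names follow the list names.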

Find min value (parameter estimation) based on recurrence equations

Sorry for the trivial question, but I'm not a programmer. Did I translate the following task into an R function correctly?
I have recurrence equations, e.g. (p1_par, ..., p4_par are the parameters to find):
z1[i+1]= z1[i]+p1_par*p2_par
z12[i+1]= z12[i]+(p1_par*z1[i]-p3_par*z1z2[i]-p4_par)*p2_par
z1z2[i+1]=z1z2[i]+(-p3_par*z12[i]-p4_par*z1z2[i])*p2_par
i=1,...,5
with the initial conditions for i=0:
z1_0=1.23
z12_0=1
z1z2_0=0
and t=6, y=c(0.1,0.06,0.08,0.04,0.05,0.01)
I want to find the parameters that minimize a function, e.g. like this:
(-2*p1_par*z1[i]-z12[i]+y[i+1]^2+2*p3_par*z1z2[i]+2*p4_par*z1z3[i])^2
I tried to build the function in R like this:
function1=function(p1_par,p2_par,p3_par,p4_par,y,t){
  ep=1
  summa=0
  result=rep(1,t)
  # Allocate the state vectors before indexing into them
  z1=numeric(t+1)
  z12=numeric(t+1)
  z1z2=numeric(t+1)
  for(i in 1:t){
    z1_0=1.23
    z12_0=1
    z1z2_0=0
    z1[1]=z1_0+p1_par*p2_par
    z12[1]=z12_0+(p1_par*z1_0-p3_par*z1z2_0-p4_par)*p2_par
    z1z2[1]=z1z2_0+(-p3_par*z12_0-p4_par*z1z2_0)*p2_par
    z1[i+1]= z1[i]+p1_par*p2_par
    z12[i+1]= z12[i]+(p1_par*z1[i]-p3_par*z1z2[i]-p4_par)*p2_par
    z1z2[i+1]=z1z2[i]+(-p3_par*z12[i]-p4_par*z1z2[i])*p2_par
    # NOTE: z1z3 and z1z3_0 are not defined anywhere in the question
    if(i==1) {
      result[ep]=(-2*p1_par*z1_0-z12_0+y[i+1]^2+2*p3_par*z1z2_0+2*p4_par*z1z3_0)^2
    } else {
      result[ep]=(-2*p1_par*z1[i]-z12[i]+y[i+1]^2+2*p3_par*z1z2[i]+2*p4_par*z1z3[i])^2
    }
    summa<<-summa+result[ep]
    ep=ep+1
  }
  return(result)
}
Did I translate the task into an R function correctly? Results from other software (like Math) differ. Thanks in advance for your help.
PPS
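As a sanity check on the recurrences alone, here is a minimal sketch that iterates the three equations from the stated initial conditions. It deliberately leaves out the objective function, because z1z3 is not defined in the question. The function name simulate_recurrence, its argument names, and the test parameter values are hypothetical.

```r
# Iterate the three recurrences from the question.
# Element 1 of each vector holds the initial condition (step 0).
simulate_recurrence <- function(p1, p2, p3, p4, n_steps = 5,
                                z1_0 = 1.23, z12_0 = 1, z1z2_0 = 0) {
  z1 <- z12 <- z1z2 <- numeric(n_steps + 1)
  z1[1] <- z1_0
  z12[1] <- z12_0
  z1z2[1] <- z1z2_0
  for (i in seq_len(n_steps)) {
    z1[i + 1]   <- z1[i] + p1 * p2
    z12[i + 1]  <- z12[i] + (p1 * z1[i] - p3 * z1z2[i] - p4) * p2
    z1z2[i + 1] <- z1z2[i] + (-p3 * z12[i] - p4 * z1z2[i]) * p2
  }
  data.frame(step = 0:n_steps, z1 = z1, z12 = z12, z1z2 = z1z2)
}

# Hypothetical parameter values chosen so the result is easy to check by hand
traj <- simulate_recurrence(p1 = 1, p2 = 0.1, p3 = 0, p4 = 0)
print(traj)
```

Comparing a trajectory like this against the other software for one fixed parameter set can help separate errors in the recurrence from errors in the objective. Once the objective is fully specified, its sum over steps could be passed to a minimizer such as optim().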

removing elements of the environment using a loop

I have 16 elements in the environment called Factor1 to Factor16. I would like to remove them automatically. I wrote this and I cannot understand why it's not working...
for(i in 1:16) {
rm(paste0('Factor',i))
}
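Sorry for this basic question, I am a beginner!
rm() treats its unquoted arguments as object names, so it does not evaluate paste0(). Pass the names as a character vector through the list argument instead:
for(i in 1:16) {
  rm(list=paste0('Factor',i))
}
although rm(list=paste0('Factor',1:16)) or rm(list=ls(pattern="Factor")) would be more appropriate...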
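The following minimal sketch shows both loop-free variants. It works in a scratch environment (env, a hypothetical name) so that running it does not touch the global workspace, and it anchors the pattern so that only names of the form Factor followed by digits match.

```r
# Scratch environment so the example does not modify the global workspace
env <- new.env()
for (i in 1:16) assign(paste0("Factor", i), i, envir = env)
assign("FactorLabels", "unrelated object", envir = env)

# Variant 1: build all 16 names at once and remove them in one call
rm(list = paste0("Factor", 1:16), envir = env)
after_names <- ls(envir = env)

# Recreate the objects, then use variant 2: match names by pattern.
# The anchored regex avoids removing FactorLabels, which a bare "Factor" would match.
for (i in 1:16) assign(paste0("Factor", i), i, envir = env)
rm(list = ls(envir = env, pattern = "^Factor[0-9]+$"), envir = env)
after_pattern <- ls(envir = env)

print(after_names)
print(after_pattern)
```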

how to subtract two vectors in OpenBUGS

I am having a very hard time trying to subtract two vectors in my OpenBUGS model. The last line of the code below keeps giving the error "expected right parenthesis error":
model {
for ( i in 1:N) {
for(j in 1:q) {
vv[i,j] ~ dnorm(vz[i,j],tau.eta[j])
}
vz[i,1:q] ~ dmnorm(media.z[i,], K.delta[,])
for(j in 1:q) {
mean.z[i,j] <- inprod(K[i,] , vbeta[j,])
}
K[i,1] <- 1.0
for(j in 1:N) {
K[i,j+1] <- sum(ve[,i] - ve[,j])
}
}
If I change that line to K[i,j+1] <- sum(ve[,i]) - sum(ve[,j]), then the model works fine, but that is not what I want to do. I would like to subtract element-wise.
I searched SO for OpenBUGS, but there are only a few unrelated topics:
OpenBUGS - Variable is not defined
OpenBUGS: missing value in Bernoulli distribution
On Stats Stack Exchange there is this post, which is close, but I still could not work out how to implement this in my model:
https://stats.stackexchange.com/questions/20653/vector-multiplication-in-bugs-and-jags/20739#20739
I understand I have to write a for loop, but this thing is sure giving me a big headache. :)
I tried changing that line to:
for(k in 1:p) { temp [k] <- ve[k,i] - ve[k,j] }
K[i,j+1] <- sum(temp[])
where 'p' is the number of rows in each 've'. Now I keep getting the error "multiple definitions of node temp[1]".
I could definitely use some help. It will be much appreciated.
Best regards to all and thanks in advance!
PS: I wanted to add the tag "OpenBUGS" to this question but unfortunately I couldn't because it would be a new tag and I do not have enough reputation. I added "winbugs" instead.
The "multiple definitions" error occurs because temp[k] is redefined over and over again within the loop over i and the loop over j, and BUGS lets you define each node only once. To get around that, give temp its own i and j subscripts:
for(k in 1:p) { temp[k,i,j] <- ve[k,i] - ve[k,j] }
K[i,j+1] <- sum(temp[,i,j])
Though if that compiles and runs, I'd check the results to make sure that's exactly what you want mathematically.
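One thing worth checking along those lines: summing element-wise differences gives the same number as subtracting the two column sums, so the variant that already compiled computes the same quantity. The sketch below verifies this in R rather than BUGS, using a hypothetical 4 x 3 matrix as a stand-in for ve.

```r
# Hypothetical stand-in for ve: 4 rows (p = 4), 3 columns
ve <- matrix(1:12 / 2, nrow = 4)
i <- 1
j <- 3

# Sum of element-wise differences, as in the temp[k,i,j] version
elementwise <- sum(ve[, i] - ve[, j])

# Difference of column sums, as in the version that already worked
by_sums <- sum(ve[, i]) - sum(ve[, j])

print(c(elementwise = elementwise, by_sums = by_sums))
```

If the intent was a distance between the two columns, something like a sum of squared or absolute differences would be needed instead. That is an assumption about the model, not something stated in the question.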
