Correlation Function - r

I wrote a function that takes a directory of data files and a threshold for complete cases, and calculates the correlation between two variables (sulfate and nitrate, in the second and third columns of the data set) for monitor locations where the number of completely observed cases is greater than the threshold.
My function returns a warning message:
In if(n>0){ :
the condition has length > 1 and only the first element will be used
I have been trying to fix this, to no avail. I am very new to programming and would appreciate any helpful solution.
corr <- function(directory, threshold = 0) {
  files <- list.files(directory, full.names = TRUE)
  p <- data.frame()
  for (i in 1:332) {
    p <- rbind(p, read.csv(files[i]))
    r <- c(p[, 2])
    s <- c(p[, 3])
    n <- p[!is.na(p[, ])]
    if (n > threshold) {
      cor(r, s)
    } else {
      return(0)
    }
  }
}
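The warning comes from n <- p[!is.na(p[, ])]: that expression is a vector of every non-missing value in p, so n > threshold is a logical vector, while if() needs a single TRUE or FALSE (since R 4.2.0 this is an error rather than a warning). Two further problems: rbind() keeps accumulating rows from every file read so far, so nothing is computed per monitor, and the value of cor(r, s) is never returned. Below is a minimal sketch of one possible fix, assuming (as the question states) one monitor per file with sulfate in column 2 and nitrate in column 3, and assuming the data files end in .csv. It counts complete cases with sum(complete.cases(...)), which is always a single number.

```r
# Sketch: one correlation per monitor file, assuming column 2 is sulfate
# and column 3 is nitrate, as described in the question.
corr <- function(directory, threshold = 0) {
  # Assumption: the data files use the .csv extension
  files <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)
  result <- numeric(0)
  for (f in files) {
    dat <- read.csv(f)
    ok <- complete.cases(dat)   # logical vector, one entry per row
    n_complete <- sum(ok)       # a single number, safe to use in if()
    if (n_complete > threshold) {
      result <- c(result, cor(dat[ok, 2], dat[ok, 3]))
    }
  }
  result  # empty numeric vector if no file passes the threshold
}
```

Returning an empty vector instead of 0 when no monitor qualifies is a design choice: it keeps "no data" distinguishable from a genuine correlation of zero.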


User defined function within for-loops

I am working on a project in which I am simulating 8 classroom social networks over 6 weeks, so 30 iterations. Students will nominate each other based on a number of factors, and I plan to simulate a number of conditions in which I remove or add some of these factors to the simulation. In other words, I'm going to be repeating a lot of code, so I'd rather use functions than copy and paste wherever possible.
Right now, I'm trying to create a function that adjusts the probability of one student selecting another based on the similarity of their emotions. When I include it in a set of nested for loops, this works just fine:
num_students <- 5
names_students <- letters[1:num_students]
student_emotion <- sample(round(runif(5, min = -5, max = 5), digits = 1))
student_emotion_df <- cbind.data.frame(names_students, student_emotion)
probs <- rep(1/num_students, 5)
row_prob <- vector(length = 5)
for (i in 1:num_students) {
  for (q in 1:num_students) {
    if (abs(student_emotion[i] - student_emotion[q]) >= 0 &
        abs(student_emotion[i] - student_emotion[q]) <= .5) {
      row_prob[q] <- 1 * probs[q]
    } else if (abs(student_emotion[i] - student_emotion[q]) > .5 &
               abs(student_emotion[i] - student_emotion[q]) <= 1) {
      row_prob[q] <- .75 * probs[q]
    } else {
      row_prob[q] <- .5 * probs[q]
    }
  }
}
The row_prob object is a vector of the probabilities that student i (the column) will select each student q (the rows).
I've created a user-defined function based on the same code, and that works:
emotion_difference_fun <- function(probs) {
  for (q in 1:num_students) {
    if (abs(student_emotion[i] - student_emotion[q]) >= 0 &
        abs(student_emotion[i] - student_emotion[q]) <= .5) {
      row_prob[q] <- 1 * probs[q]
    } else if (abs(student_emotion[i] - student_emotion[q]) > .5 &
               abs(student_emotion[i] - student_emotion[q]) <= 1) {
      row_prob[q] <- .75 * probs[q]
    } else {
      row_prob[q] <- .5 * probs[q]
    }
  }
  return(row_prob)
}
emotion_difference_fun(probs)
But when I try to embed that function within the for loop iterating through the columns, row_prob returns as an empty vector:
for (i in 1:num_students) {
  emotion_difference_fun(probs)
}
Any thoughts on how I can get this to work?
Thanks for any help you're able to offer.
If I understood your question properly, then you need to assign the results in your last 'for' loop:
for (i in 1:num_students) {
  if (i == 1) out <- NULL
  out <- c(out, emotion_difference_fun(probs))
}
out
Is that what you are looking for?
What I am unclear about, though, is why your second code section does not build a 5 x 5 matrix. When that code runs, it doesn't matter that you looped over i for all 5 students, because row_prob only keeps the result of your last iteration (student 5).
You can use replicate() to repeat the function emotion_difference_fun num_students times:
result <- replicate(num_students, emotion_difference_fun(probs))
You can also set simplify = FALSE to get the output as a list:
result <- replicate(num_students, emotion_difference_fun(probs),simplify = FALSE)
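One caveat with replicate(): it evaluates exactly the same expression each time, and emotion_difference_fun() reads i from the global environment, so i does not change between repetitions and every column comes out identical. A minimal sketch that avoids the hidden global is to pass i as an argument and let sapply() build the full matrix, one column per choosing student. The vectorised ifelse() reproduces the question's cut-offs (difference up to 0.5 keeps the probability, up to 1 scales it by 0.75, otherwise by 0.5); the emotion values below are hypothetical, fixed so the result is reproducible.

```r
# Sketch: i is an explicit argument instead of a global variable,
# and the whole row of probabilities is computed at once.
emotion_difference_fun <- function(i, student_emotion, probs) {
  diff <- abs(student_emotion[i] - student_emotion)
  # Same cut-offs as the loop: <= 0.5 -> 1, (0.5, 1] -> 0.75, > 1 -> 0.5
  weight <- ifelse(diff <= 0.5, 1, ifelse(diff <= 1, 0.75, 0.5))
  weight * probs
}

num_students <- 5
student_emotion <- c(-2.0, -1.8, 0.5, 1.3, 4.0)  # hypothetical values
probs <- rep(1 / num_students, num_students)

# Column i holds the probabilities that student i selects each student q (rows)
prob_matrix <- sapply(seq_len(num_students), emotion_difference_fun,
                      student_emotion = student_emotion, probs = probs)
dimnames(prob_matrix) <- list(letters[1:num_students], letters[1:num_students])
prob_matrix
```

Because the function no longer depends on i or row_prob existing in the global environment, it can be reused unchanged inside the outer loops over classrooms and weeks.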

Receiving error: argument of length 0 when reading in separate tables from multiple csv files in for loop

Error in 1:nrow(csvs) : argument of length 0 when reading in separate tables from multiple csv files
I am trying to perform the following for loop on separate tables read in from multiple csv files in the working directory, but I keep getting the error 'Error in 1:nrow(csvs) : argument of length 0'. This is what I have tried:
# Read all csvs from files path.
csv_files <- list.files(pattern="*.csv")
csvs <- lapply(csv_files, read.table)
count <- 0
for (x in 1:nrow(csvs) - 1) {
  for (y in 1:ncol(csvs)) {
    if ((isTRUE(csvs[x, y] == 1)) && (isTRUE(csvs[x + 1, y + 1] == 0))) {
      count <- count + 1
    }
  }
}
count
This outputs nothing and shows the error: argument of length 0. Any suggestions?
I have also included a reproducible example using a single matrix below:
set.seed(99)
mat <- matrix(sample(c(0,1), 2500, prob=c(0.8,0.2), replace=TRUE), nrow=50)
count <- 0
for (x in 1:nrow(mat) - 1) {
  for (y in 1:ncol(mat)) {
    if (isTRUE(mat[x, y] == 1)) {
      count <- count + 1
    }
  }
}
count
[1] 91
I can't figure out why this works for one matrix and not multiple. Can anyone help please?
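You're calling nrow() on a list, which won't work: nrow() returns NULL for a plain list, so 1:nrow(csvs) fails with "argument of length 0". Since your loop works on one matrix and you're trying to use it on a list, you could turn your loop into a function and then apply that function to each element of your list:
yourfunction <- function(mat) {
  counter <- 0  # the counter must be initialised inside the function
  # seq_len(nrow(mat) - 1) gives 1, ..., n - 1 (1:nrow(mat) - 1 would give 0, ..., n - 1)
  for (x in seq_len(nrow(mat) - 1)) {
    for (y in seq_len(ncol(mat))) {
      if (isTRUE(mat[x, y] == 1) && isTRUE(mat[x + 1, y] == 1)) {
        counter <- counter + 1
      }
    }
  }
  counter
}
lapply(csvs, yourfunction)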
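The question's first loop looked for a 1 at (x, y) followed by a 0 at the diagonal neighbour (x + 1, y + 1), which is a different pattern from the one in the answer above. Assuming the question's pattern is the intended one, a loop-free sketch compares two shifted copies of each table: dropping the last row and column gives every cell (x, y), and dropping the first row and column gives the matching (x + 1, y + 1). count_pairs is a hypothetical name; the function also accepts the data frames returned by read.table() by converting them with as.matrix().

```r
# Count cells where mat[x, y] == 1 and the diagonal neighbour
# mat[x + 1, y + 1] == 0, without explicit loops.
count_pairs <- function(mat) {
  mat <- as.matrix(mat)  # data frames from read.table() become matrices
  n <- nrow(mat)
  m <- ncol(mat)
  if (n < 2 || m < 2) return(0L)  # no diagonal neighbours to compare
  upper <- mat[-n, -m, drop = FALSE]  # cells (x, y) with x < n, y < m
  lower <- mat[-1, -1, drop = FALSE]  # the matching cells (x + 1, y + 1)
  sum(upper == 1 & lower == 0, na.rm = TRUE)
}

# Hypothetical usage on the list read from the CSV files:
# per_file <- sapply(csvs, count_pairs)
# total <- sum(per_file)
```

Shifting the whole matrix also sidesteps both indexing pitfalls in the original loop: the out-of-range column y + 1 on the last column, and the row index 0 produced by 1:nrow(mat) - 1.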

Do I need to loop through vector elements in R to calculate correlation?

I have created two vectors and "cleaned" the data. I am looking to loop through the elements of both vectors as pairs, in order to calculate the correlation of each pair and store it in a third vector.
I can see and print each element; however, the cor() function returns NA.
The code is below; any advice is appreciated.
corr <- function(directory, threshold = 0) {
  files_list <- list.files(directory, full.names = TRUE)
  dat <- data.frame()
  cleandat <- data.frame()
  correlation <- c()
  for (count in 1:length(files_list)) {
    dat <- rbind(dat, read.csv(files_list[count]))
    cleandat <- dat[complete.cases(dat), ]
  }
  if (nrow(cleandat) <= threshold) {
    print("Nope.")
  } else {
    sulfate_data <- cleandat$sulfate
    nitrate_data <- cleandat$nitrate
    for (element in 1:length(sulfate_data)) {
      print("Sulfate: ")
      print(sulfate_data[element])
      print("Nitrate: ")
      print(nitrate_data[element])
      print(cor(sulfate_data[element], y = nitrate_data[element]))
      correlation <- cor(sulfate_data[element], y = nitrate_data[element])
    }
  }
  print(correlation)
} #end corr()
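cor() returns NA here because each call receives one sulfate value and one nitrate value: a correlation describes how two variables vary together across several observations, and a single pair has no variation at all. No loop over elements is needed; passing the whole vectors gives one coefficient for all pairs. If one correlation per monitor is wanted instead, one option is split() plus sapply(), assuming the combined data has an ID column identifying each monitor (the question does not show the column names). All values below are hypothetical.

```r
# Hypothetical cleaned vectors standing in for cleandat$sulfate and cleandat$nitrate
sulfate <- c(1.2, 2.5, 3.1, 4.8, 5.0)
nitrate <- c(0.9, 1.7, 2.6, 3.9, 4.4)

# A single pair has no variation, so cor() returns NA
single_pair <- suppressWarnings(cor(sulfate[1], nitrate[1]))

# The whole vectors give one coefficient describing all pairs
overall <- cor(sulfate, nitrate)

# Per-monitor correlations, assuming an ID column marks each monitor
cleandat <- data.frame(
  ID      = c(1, 1, 1, 2, 2, 2),
  sulfate = c(1, 2, 3, 1, 2, 3),
  nitrate = c(2, 4, 7, 3, 2, 1)
)
by_monitor <- sapply(split(cleandat, cleandat$ID),
                     function(d) cor(d$sulfate, d$nitrate))
by_monitor
```

The names of by_monitor are the monitor IDs, so each coefficient stays attached to the monitor it came from.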

Node similarity in R

I am trying to compute the similarity between a known node and all the other nodes of a graph in R. At each step, if the similarity exceeds a certain threshold, I put the node in a vector, so that I can store all the nodes in a data frame (for each node, I will list its similar nodes).
But this code gives only the last node and its last similar node.
v <- DC2$node[order(-DC2$'Centrality')]
Nei1 <- neighbors(g1, as.character(v[1]), 1)
vec <- numeric()
if (length(Nei1) > 0) {
  for (i in 1:length(V(g1))) {
    Nei2 <- neighbors(g, as.character(V(g1)[i]), 1)
    k1 <- as.numeric(degree(g1, as.character(v[1])))
    k2 <- as.numeric(degree(g1, as.character(V(g1)[i])))
    Simhpi <- (length(intersect(Nei1, Nei2)) / min(k1, k2))
    if (Simhpi >= 0.5) {
      for (j in 1:length(V(g1))) {
        vv <- V(g1)[j]
        vec[j] <- c(vec, vv$name)
      }
    }
  }
}
nn <- data.frame(node = as.character(v[1]), Nei = vec)
Thanks for your help.
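Two lines in the loop explain the symptom: vec[j] <- c(vec, vv$name) tries to store a whole vector in a single element, and the inner loop over j walks over every vertex rather than the one that passed the test. Note also that Nei2 is computed from g rather than g1, which may be a typo. Below is a minimal sketch using igraph that appends only the vertex that passes the hub-promoted-index test from the question (shared neighbours divided by the smaller degree). It assumes a named, undirected graph; the toy graph built with graph_from_literal() stands in for g1, "A" stands in for v[1], and skipping the target node itself is an added assumption.

```r
library(igraph)

# Hypothetical toy graph standing in for g1
g1 <- graph_from_literal(A - B, A - C, A - D, B - C, B - D, C - E, E - F)

target <- "A"  # stands in for v[1], the most central node
nei_target <- neighbors(g1, target)$name
k_target <- degree(g1, target)

similar <- character(0)
for (node in V(g1)$name) {
  if (node == target) next  # assumption: do not compare the node with itself
  nei_node <- neighbors(g1, node)$name
  k_node <- degree(g1, node)
  # Hub-promoted index: shared neighbours / smaller degree
  sim <- length(intersect(nei_target, nei_node)) / min(k_target, k_node)
  if (sim >= 0.5) {
    similar <- c(similar, node)  # append this node only
  }
}

# rep() keeps data.frame() valid even when no node passes the threshold
nn <- data.frame(node = rep(target, length(similar)), Nei = similar)
nn
```

To collect results for several target nodes, the loop can be wrapped in a function of target and the resulting data frames combined with do.call(rbind, ...).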

R: Complex numbers not compatible with boot() function

I found that the boot() function of the boot package does not work with complex numbers. I am trying to bootstrap the data by taking the eigenvalues of the bivariate coefficient matrix. The problem is that eigen() often returns complex numbers, and boot then gives an error. Is there a way to avoid complex numbers?
Here is my code:
Data <- read.table('http://ubuntuone.com/6n1igcHXq4EnOm4x2zeqFb', header = FALSE)
Mat <- cbind(Data[["V1"]],Data[["V2"]])
Data.ts <- as.ts(Mat)
Below are some helper functions that are needed:
library(mvtnorm)
var1.sim <- function(T, n.start = 100, phi1 = matrix(c(0.7, 0.2, 0.2, 0.7), nr = 2),
                     err.mu = c(0, 0), err.sigma2 = matrix(c(1, 0.5, 0.5, 1), nr = 2),
                     errors = NULL) {
  e <- rmvnorm(n.start + T, err.mu, err.sigma2)  # (n.start+T) x 2 matrix
  y <- matrix(0, nrow = n.start + T, ncol = 2)
  if (!is.null(errors) && is.matrix(errors) && ncol(errors) == 2) {
    rows <- nrow(errors)
    if (rows < n.start + T) {
      # replace last nrow(errors) errors
      e[seq.int(n.start + T - rows + 1, n.start + T), ] <- errors
    } else {
      # use only the last n.start + T rows of errors (the comma keeps e a matrix)
      e <- errors[seq.int(rows - (n.start + T) + 1, rows), ]
    }
  }
  for (t in seq.int(2, n.start + T)) {
    y[t, ] <- phi1 %*% y[t - 1, ] + e[t, ]
  }
  return(ts(y[seq.int(n.start + 1, n.start + T), ]))
}
########
coef.var1 <- function(var.fit) {
  k <- coef(var.fit)
  rbind(k[[1]][, "Estimate"], k[[2]][, "Estimate"])
}
And here is the main code:
library(vars)
library(boot)
y.var <- VAR(Data.ts, p = 1, type = "none")
y.resid <- resid(y.var)
rm(y.boot)
y.boot <- boot(y.resid, R = 100, statistic = function(x, i) {
  resid.boot <- x[i, ]
  y.boot1 <- var1.sim(T = nrow(x), errors = resid.boot)
  min(eigen(coef.var1(VAR(y.boot1, p = 1, type = "none")))$values)
}, stype = "i")
y.boot$t
y.ci <- boot.ci(y.boot, type = "norm", conf = 0.95)$normal[2:3]
list(t = y.boot$t, ci = y.ci)
The problem occurs in the y.boot object, specifically in this line:
min(eigen(coef.var1(VAR(y.boot1, p=1, type="none")))$values)
When the minimum eigenvalue obtained is complex, boot returns this error:
Error in min(eigen(coef.var1(VAR(y.boot1, p = 1, type = "none")))$values) :
invalid 'type' (complex) of argument
Otherwise, there is no problem. Running these 100 bootstrap replicates once might be safe, but I actually plan to repeat the whole procedure about 100 times, so there is a good chance that complex values will occur somewhere in those loops, and the error above will appear again.
Is there a way to avoid these complex values?
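The error message shows that it is raised by min(), not by boot(): min() is not defined for complex vectors. eigen() returns complex values whenever a non-symmetric matrix, such as a VAR(1) coefficient matrix, has a complex-conjugate pair of eigenvalues, so they cannot simply be switched off. One option is to reduce each eigenvalue to a real number before taking the minimum. Which reduction is appropriate depends on the quantity of interest: Mod() gives the modulus, which is what governs VAR(1) stability (all moduli below 1), while Re() keeps only the real part. The 2x2 matrix below is hypothetical, chosen so that its eigenvalues are 0.5 +/- 0.6i.

```r
# Hypothetical 2x2 coefficient matrix whose eigenvalues are 0.5 +/- 0.6i
A <- matrix(c(0.5, -0.6, 0.6, 0.5), nrow = 2)

ev <- eigen(A, only.values = TRUE)$values
print(is.complex(ev))

# min(ev) would fail with: invalid 'type' (complex) of argument
# Reduce each eigenvalue to a real number first:
min_mod <- min(Mod(ev))  # smallest modulus (relevant for VAR(1) stability)
min_re  <- min(Re(ev))   # smallest real part

# Inside the boot() statistic, the last line could then read, for example:
# min(Mod(eigen(coef.var1(VAR(y.boot1, p = 1, type = "none")))$values))
```

When the eigenvalues happen to be real, Mod() returns their absolute values, so min(Mod(...)) and the original min(...) can differ if an eigenvalue is negative; Re() preserves the sign in that case.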
