How to create a loop with the ppcor function in R?

I am trying to create a loop that performs a correlation (and, in future, a partial correlation) with the ppcor package's pcor() function on variables stored in a data frame. The first variable (A) stays the same for all correlations, while the second variable (B) moves on to the next column of my data frame at each iteration. I have around 1000 variables.
I use the mtcars dataset below as an example because it has the same layout as my data.
I've been able to do this successfully by hand: I use cbind to bind the 2 columns (the 2 variables of interest) into a matrix ("tmp_df") and then run pcor on it. I have also been able to bind the outputs of the correlations ("mpg_cycl", "mpg_disp") into a single object. However, I can't get any of this to work in a loop. Any ideas, please?
library("MASS")
install.packages("ppcor")
library("ppcor")
mtcars_df <- as.data.frame(mtcars)
tmp_df = cbind(mtcars_df$mpg, mtcars_df$cycl)
mpg_cycl <- pcor(as.matrix(tmp_df), method = 'spearman')
tmp_df1= cbind(mtcars_df$mpg, mtcars_df$disp)
mpg_disp <- pcor(as.matrix(tmp_df1), method = 'spearman')
combined_table <- do.call(cbind, lapply(list("mpg_cycl" = mpg_cycl,
mpg_disp" = mpg_disp), as.data.frame, USE.NAMES = TRUE))
My attempt to loop the above operation (amended after the last reviewer's comments):
for (i in mtcars_df[2:7]){
tmp_df = (cbind(i, mtcars_df$mpg)
i <- pcor(as.matrix(tmp_df), method = 'spearman')
write.csv(i, file = paste0("MyDataOutput",i[1],".csv")
}
I expected the loop to write the correlation results to MyDataOutput .csv files, but it generates the error message below. I thought i was in the correct place?
Error: unexpected symbol in:
" tmp_df = (cbind(i, mtcars_df$mpg)
i"
Even adding a curly bracket at the end does not resolve the issue, so I have left it out; it only introduces another error message about '}'.

I have reworked some of your code and fixed the missing ), } and ". The for loop now writes one file per variable, with the variable name appended to the file name. Hope this helps.
library("MASS")
#install.packages("ppcor")
library("ppcor")
mtcars_df <- as.data.frame(mtcars)
# the cylinder column in mtcars is "cyl" (mtcars_df$cycl would be NULL)
tmp_df = cbind(mtcars_df$mpg, mtcars_df$cyl)
mpg_cycl <- pcor(as.matrix(tmp_df), method = 'spearman')
tmp_df1 = cbind(mtcars_df$mpg, mtcars_df$disp)
mpg_disp <- pcor(as.matrix(tmp_df1), method = 'spearman')
# lapply() keeps the list names; USE.NAMES is an sapply() argument and is not needed here
combined_table <- do.call(cbind, lapply(list("mpg_cycl" = mpg_cycl,
                                             "mpg_disp" = mpg_disp), as.data.frame))
# loop over the names of columns 2 to 7 and write one file per variable
for(i in colnames(mtcars_df[2:7])){
  tmp_df = mtcars_df[c(i, "mpg")]
  i_result <- pcor(as.matrix(tmp_df), method = 'spearman')
  write.csv(i_result, file = paste0("MyDataOutput_", i, ".csv"))
}
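To merge the results into one table before saving:
dta <- c()
for(i in colnames(mtcars_df[2:7])){
  tmp_df = mtcars_df[c(i, "mpg")]
  i_result <- pcor(as.matrix(tmp_df), method = 'spearman')
  # one row per variable: its name followed by every pcor() output, flattened
  dta <- rbind(dta, c(i, unlist(i_result)))
}
write.csv(dta, file = "MyDataOutput_all.csv", row.names = FALSE)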
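With no control variables, a Spearman partial correlation between two variables reduces to the ordinary Spearman correlation, so for this pairwise step a loop may not be needed at all. The sketch below swaps in base R's cor() for ppcor::pcor() (so it returns estimates only, no p-values) and collects one row per variable into a single data frame. For the later partial-correlation step you would still loop and call a ppcor function; ppcor's pcor.test(x, y, z, method = "spearman") is the pairwise variant worth checking in its documentation.

```r
# Pairwise Spearman correlations of mpg against other columns,
# collected into one data frame (base R only; estimates, no p-values).
data(mtcars)

target <- "mpg"
others <- setdiff(colnames(mtcars), target)[1:6]  # cyl, disp, hp, drat, wt, qsec

# One call computes the full rank-correlation matrix; keep the mpg row.
rho_all <- cor(mtcars, method = "spearman")[target, others]

# Equivalent explicit loop, one variable at a time.
rho_loop <- vapply(others, function(v) {
  cor(mtcars[[target]], mtcars[[v]], method = "spearman")
}, numeric(1))

result <- data.frame(variable = others, spearman_rho = unname(rho_loop))
print(result)

# A single output file instead of one file per variable.
write.csv(result, file.path(tempdir(), "MyDataOutput_all.csv"), row.names = FALSE)
```

With around 1000 columns, the single cor() call is usually simpler than a loop, because it computes every pair at once.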

Related

Getting result of function using common columns in R

This is an extension of my question "Function and looping in training and testing set using r".
How can I write the result of the function (func1) given below to an external folder, with the common columns first and then each model's own additional output column? Moreover, how can I write the output for each unique data_by_plot group to the external folder? I used write.table(func1, "c:\\Document\\project\\result"), but I couldn't get the output in the form I want. I also tried different ways using cbind and rbind, but they don't give me what I want.
My code is:
result<- c()
data$groups <- paste(data$Plot, data$Species, sep = "_")
data_by_plot <- split(data, data$groups)
func1<- do.call(rbind, lapply(data_by_plot, function(df){
Training<-df[1:20,]
Testing<-df[21:30,]
Model1<-lm(Count~1, data = Training)
Pred1<-Testing$Count[i]- Model1$coefficients
Model2<-lm(Diff~1, data = Training)
Pred2<-Testing$Count[i]- Model2$coefficients
Model3<-lm(Diff~1+LogCount, data = Training)
Pred3<-Testing$Count[i]- Model3$coefficients
Model4<-lm(Diff~1+Count, data = Training)
Pred4<-Testing$Count[i]- Model4$coefficients
result <- Reduce(merge, list(Pred1, Pred2, Pred3, Pred4))
return(result)
}))
OK, with the clarification from your comment above, there are two ways you could do this: incorporate the export into the function itself, or take the result of the function and pass it to an export function afterwards.
I think the easiest way is the former. First, create an export function:
export.function <- function(result){
  path <- "//folder/" #whatever your path is to the folder
  result <- as.data.frame(result) #turn to data.frame
  write.csv(result, file = paste0(path, "result.csv"), row.names = FALSE) #write to <path>result.csv
}
This writes the result, converted to a data frame, as a CSV file in the designated path (named "result.csv").
Then call it inside the function:
result<- c()
data$groups <- paste(data$Plot, data$Species, sep = "_")
data_by_plot <- split(data, data$groups)
func1<- do.call(rbind, lapply(data_by_plot, function(df){
  Training<-df[1:20,]
  Testing<-df[21:30,]
  Model1<-lm(Count~1, data = Training)
  Pred1<-Testing$Count[i]- Model1$coefficients
  Model2<-lm(Diff~1, data = Training)
  Pred2<-Testing$Count[i]- Model2$coefficients
  Model3<-lm(Diff~1+LogCount, data = Training)
  Pred3<-Testing$Count[i]- Model3$coefficients
  Model4<-lm(Diff~1+Count, data = Training)
  Pred4<-Testing$Count[i]- Model4$coefficients
  result <- Reduce(merge, list(Pred1, Pred2, Pred3, Pred4))
  export.function(result)  # write this group's result to disk
  result                   # still return it so do.call(rbind, ...) has rows to bind
}))
The other way is to export the combined result after the call:
results <- func1  # func1 already holds the combined (rbind-ed) result, it is not a function
export.function(results)
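To get one file per group, named after the group, plus one combined file with a common group column, the export helper can take a file name and be called once per element of the split list. This is a sketch under assumptions: toy is a hypothetical stand-in for your data, export_group() is a hypothetical helper, and out_dir uses tempdir() in place of your real folder. It uses file.path() so the path separators are handled for you.

```r
# Hypothetical stand-in for the question's `data`.
toy <- data.frame(
  Plot    = rep(c("P1", "P2"), each = 6),
  Species = rep(c("A", "B"), times = 6),
  Count   = 1:12
)
toy$groups <- paste(toy$Plot, toy$Species, sep = "_")

# Split the whole data frame (not one column) so each piece keeps all columns.
data_by_plot <- split(toy, toy$groups)

out_dir <- tempdir()  # replace with your own folder

# Hypothetical helper: write one data frame to <out_dir>/<name>.csv.
export_group <- function(result, name, path = out_dir) {
  file <- file.path(path, paste0(name, ".csv"))
  write.csv(as.data.frame(result), file = file, row.names = FALSE)
  invisible(file)
}

# One file per group, named after the group.
written <- vapply(names(data_by_plot), function(g) {
  export_group(data_by_plot[[g]], g)
}, character(1))

# One combined file, with the group name as a common first column.
combined <- do.call(rbind, Map(function(df, g) cbind(group = g, df),
                               data_by_plot, names(data_by_plot)))
export_group(combined, "all_groups")
```

In the real code, the per-group data frame passed to export_group() would be the model result computed inside the lapply() call rather than the raw group.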

Is there a way to simplify this code using a loop?

VariableList <- c(v0,v1,v2, ... etc)
National_DF <- df[,VariableList]
AL_DF <- AL[,VariableList]
AR_DF <- AR[,VariableList]
AZ_DF <- AZ[,VariableList]
... etc
I want each end result to be a data frame, since it will be used later in the model. Each state, such as 'AL', 'AR', 'AZ', etc., is a data frame. Each v{#} stands for a variable from the RAW data frame that is currently out of place. The aim is to reorder the fields, dropping some of them, in preparation for model use.
Continuing from the answer to your previous question, we can select and reorder the columns in the same lapply call, before creating the train/test data frames.
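VariableList <- c('v0','v1','v2')
# for every object whose name ends in "_DF": select/reorder the columns, then split 70/30
data <- unlist(lapply(mget(ls(pattern = '_DF$')), function(df) {
  index <- sample(1:nrow(df), 0.7*nrow(df))  # row indices for the training set
  df <- df[, VariableList]                   # keep only the listed columns, in this order
  list(train = df[index,], test = df[-index,])
}), recursive = FALSE)
Then copy the data into the global environment: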
list2env(data, .GlobalEnv)
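If the state data frames are kept in a named list instead of as separate objects, the column selection for every state becomes a single lapply() call, and list2env() can still create AL_DF, AR_DF, AZ_DF afterwards if you need them. The frames below are hypothetical toy data with an extra column and columns out of order; v0, v1, v2 stand in for your real variable names.

```r
# Hypothetical toy state data frames: an extra column, and columns out of order.
AL <- data.frame(v2 = 1:3, extra = 0, v0 = 4:6, v1 = 7:9)
AR <- data.frame(v1 = 1:2, v0 = 3:4, extra = 0, v2 = 5:6)
AZ <- data.frame(extra = 0, v0 = 1, v2 = 2, v1 = 3)

VariableList <- c("v0", "v1", "v2")

# Keep the states together in a named list.
states <- list(AL = AL, AR = AR, AZ = AZ)

# Select and reorder the same columns in every state; drop = FALSE keeps a data frame.
states_df <- lapply(states, function(df) df[, VariableList, drop = FALSE])
names(states_df) <- paste0(names(states_df), "_DF")

# Optional: create AL_DF, AR_DF, AZ_DF as separate objects.
list2env(states_df, envir = .GlobalEnv)
```

Keeping them in the list (states_df) is often easier for later modelling, since the same lapply() pattern can then fit one model per state.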

New data frame after function is empty

I wrote a function that builds a temporary data frame, but when I apply this function to my original data frame, the temporary data frame is empty. How can I solve this?
I tried this code:
data_a <- as.data.frame(cbind(pop=c("a1","b2","c3","d4","d5"),
PA1=c(1,40,430,4330,43330),
PA2=c(2,50,530,5330,53330)))
perm_all <- function(dat,vname,loc1, loc2){
popu <- dat["vname"]
locci_1 <- sample(dat["loc1"], replace = F)
locci_2 <- sample(dat["loc2"], replace = F)
data_a_1 <- as.data.frame(cbind(popu, locci_1, locci_2))
return(data_a_1)
}
data_3 <- perm_all(dat= "data_a",vname="pop",loc1="PA1",loc2="PA2")
I've also tried converting data_a with
data_a <- as.matrix(data_a)
and
popu <- sample(dat[,1], replace = F)
but neither of them worked.
Thanks :)
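There may be multiple issues. First, be aware that in R versions before 4.0.0 the data.frame function family treats strings as factors by default, which may not be what you want. Also, cbind() on a mix of character and numeric vectors produces a character matrix, so PA1 and PA2 end up stored as text; data.frame(pop = ..., PA1 = ..., PA2 = ...) keeps them numeric.
Then @NURAIMIAZIMAH is right: your function needs the data frame itself (not its name as a string) to work properly, so: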
data_3 <- perm_all(dat= data_a,vname="pop",loc1="PA1",loc2="PA2")
is a good start.
Moreover, you pass values in vname, loc1 and loc2, but inside your function you use the literal strings "vname", "loc1" and "loc2" instead, because of the quotation marks. Remove them:
perm_all <- function(dat,vname,loc1, loc2){
popu <- dat[vname]
locci_1 <- sample(dat[loc1], replace = F)
locci_2 <- sample(dat[loc2], replace = F)
data_a_1 <- as.data.frame(cbind(popu, locci_1, locci_2))
return(data_a_1)
}
Now your function should work, but maybe not in the way you would like, because there won't be any permutation in your data_3 table. If you look carefully, dat[loc1] returns a one-column data frame, and sample() on it just returns that single column unchanged. You want a vector to permute your data, so subset your data frame like this: dat[,loc1].
The code below should do what you expect.
data_a <- data.frame(pop = c("a1","b2","c3","d4","d5"),
                     PA1 = c(1,40,430,4330,43330),
                     PA2 = c(2,50,530,5330,53330))
perm_all <- function(dat,vname,loc1, loc2){
  popu <- dat[vname]                              # keep the names column as is
  locci_1 <- sample(dat[,loc1], replace = FALSE)  # dat[, loc1] is a vector, so it gets permuted
  locci_2 <- sample(dat[,loc2], replace = FALSE)
  data_a_1 <- as.data.frame(cbind(popu, locci_1, locci_2))
  return(data_a_1)
}
data_3 <- perm_all(dat= data_a,vname="pop",loc1="PA1",loc2="PA2")
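The two pitfalls discussed above can be checked directly. This small sketch shows that cbind() turns the numbers into non-numeric columns while data.frame() keeps them numeric, and that sample() leaves dat["PA1"] unchanged but permutes dat[, "PA1"]. The set.seed() call is only there to make the permutation reproducible.

```r
set.seed(1)  # only so the permutation is reproducible

# cbind() of mixed vectors builds a character matrix, so numbers become text.
via_cbind <- as.data.frame(cbind(pop = c("a1", "b2"), PA1 = c(1, 40)))
is.numeric(via_cbind$PA1)  # not numeric

# data.frame() keeps each column's own type.
data_a <- data.frame(pop = c("a1", "b2", "c3", "d4", "d5"),
                     PA1 = c(1, 40, 430, 4330, 43330),
                     PA2 = c(2, 50, 530, 5330, 53330))
is.numeric(data_a$PA1)     # numeric

# dat["PA1"] is a one-column data frame: sample() treats it as a list of
# length 1, so the single column comes back unchanged.
one_col <- sample(data_a["PA1"])

# dat[, "PA1"] is a plain vector, so sample() permutes its elements.
permuted <- sample(data_a[, "PA1"])
```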
See you.

tryCatch in a for loop: continue to the next site (R, dataRetrieval)

I have a list containing the following site id numbers:
sitelist <- c("02074500", "02077200", "208111310", "02081500", "02082950")
I want to use the dataRetrieval package to collect additional information about these sites and save it into individual .csv files. Site number "208111310" does not exist, so it returns an error and stops the code.
I want the code to ignore site numbers that do not return data and continue to the next number in sitelist.
I've tried tryCatch in several ways but can't get the syntax right. Here is my for loop without tryCatch:
for (i in sitelist){
test_gage <- readNWISdv(siteNumbers = i,
parameterCd = pCode)
df = test_gage
df = subset(df, select= c(site_no, Date, X_00060_00003))
names(df)[3] <- c("flow in m3/s")
df$Year <- as.character(year(df$Date))
write.csv(df, paste0("./gage_flow/",i,".csv"), row.names = F)
rm(list=setdiff(ls(),c("sitelist", "pCode")))
}
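You can use the error argument of tryCatch() to specify what happens when an error occurs. Here the error handler returns NULL, and the loop then skips to the next site with next. There is no need for <<-, because tryCatch() returns the value of the expression (or of the handler).
library(dataRetrieval)  # provides readNWISdv()
for (i in sitelist){
  test_gage <- tryCatch(
    readNWISdv(siteNumbers = i, parameterCd = pCode),
    error = function(e) {
      message("Skipping site ", i, ": ", conditionMessage(e))
      NULL
    })
  # skip sites that failed or returned no rows
  if (is.null(test_gage) || nrow(test_gage) == 0) next
  df = test_gage
  df = subset(df, select = c(site_no, Date, X_00060_00003))
  # note: USGS parameter 00060 is discharge in cubic feet per second
  names(df)[3] <- c("flow in m3/s")
  df$Year <- format(df$Date, "%Y")  # same result as as.character(year(df$Date)), base R only
  write.csv(df, paste0("./gage_flow/", i, ".csv"), row.names = F)
  rm(list=setdiff(ls(),c("sitelist", "pCode")))
}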
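If you also want to catch warnings, give tryCatch() a warning handler as well. Note that a warning caught this way stops the expression, just like an error does:
test_gage <- tryCatch(readNWISdv(siteNumbers = i, parameterCd = pCode),
                      error = function(e) NULL,
                      warning = function(w) NULL)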
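The skip-on-error pattern can be tried without network access. In this sketch fetch_site() is a hypothetical stand-in for readNWISdv() that fails for the non-existent site number; the output folder is a temporary directory rather than ./gage_flow.

```r
sitelist <- c("02074500", "02077200", "208111310", "02081500", "02082950")

# Hypothetical stand-in for readNWISdv(): errors for the bad site number.
fetch_site <- function(site) {
  if (site == "208111310") stop("site not found: ", site)
  data.frame(site_no = site,
             Date = as.Date("2020-01-01") + 0:2,
             flow = c(1, 2, 3))
}

out_dir <- file.path(tempdir(), "gage_flow")
dir.create(out_dir, showWarnings = FALSE)
skipped <- character(0)

for (i in sitelist) {
  # tryCatch() returns the expression's value, or the handler's value on error.
  df <- tryCatch(fetch_site(i), error = function(e) {
    message("Skipping ", i, ": ", conditionMessage(e))
    NULL
  })
  if (is.null(df)) {
    skipped <- c(skipped, i)
    next  # go straight to the next site
  }
  df$Year <- format(df$Date, "%Y")
  write.csv(df, file.path(out_dir, paste0(i, ".csv")), row.names = FALSE)
}
```

Keeping a skipped vector makes it easy to report afterwards which site numbers returned nothing.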

R, creating variables on the fly in a list using assign statement

I want to create variable names on the fly inside a list and assign them values in R, but I am unable to get the desired result. Here is the logic of my code:
Upon the function call dat_in <- readf(1,2), an input file is read based on a product and a site. After reading, a particular column (the 13th here) is assigned to a variable aot500. I want this variable returned from the function for each combination of product and site; for example, I need variable names in the list such as aot500.AF, aot500.CM and aot500.RB. I am having trouble with the return statement: there is no error, but there is nothing in dat_in. I expect it to contain dat_in$aot500.AF etc. Please tell me what is wrong with the return statement. Furthermore, I want to read the files for all combinations in a single call to the function, say using a for loop, and I wonder how the return statement would handle a list of more variables.
prod <- c('inv','tot')
site <- c('AF','CM','RB')
readf <- function(pp, kk) {
fname.dsa <- paste("../data/site_data_",prod[pp],"/daily_",site[kk],".dat",sep="")
inp.aod <- read.csv(fname.dsa,skip=4,sep=",",stringsAsFactors=F,na.strings="N/A")
aot500 <- inp.aod[,13]
return(list(assign(paste("aot500",siteabbr[kk],sep="."),aot500)))
}
There is almost never a need for assign(). We can solve the problem in two steps: read the files into a list, then name the list elements.
(Not tested as we don't have your files)
prod <- c('inv', 'tot')
site <- c('AF', 'CM', 'RB')
# get combo of site and prod
prod_site <- expand.grid(prod, site)
colnames(prod_site) <- c("prod", "site")
# Step 1: read the files into a list
res <- lapply(1:nrow(prod_site), function(i){
fname.dsa <- paste0("../data/site_data_",
prod_site[i, "prod"],
"/daily_",
prod_site[i, "site"],
".dat")
inp.aod <- read.csv(fname.dsa,
skip = 4,
stringsAsFactors = FALSE,
na.strings = "N/A")
inp.aod[, 13]
})
# Step 2: assign names to a list
names(res) <- paste("aot500", prod_site$prod, prod_site$site, sep = ".")
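Because the real files are not available, this sketch first writes small hypothetical .dat files into a temporary folder with the same naming pattern (site_data_<prod>/daily_<site>.dat, 4 header lines, then a CSV table), and uses a column named aot500 instead of column 13. It then applies the same two steps: read into a list, then name the list.

```r
prod <- c("inv", "tot")
site <- c("AF", "CM", "RB")
root <- file.path(tempdir(), "data")  # hypothetical stand-in for "../data"

prod_site <- expand.grid(prod = prod, site = site, stringsAsFactors = FALSE)

# Hypothetical fake input files: 4 header lines, then a small CSV table.
for (i in seq_len(nrow(prod_site))) {
  d <- file.path(root, paste0("site_data_", prod_site$prod[i]))
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
  f <- file.path(d, paste0("daily_", prod_site$site[i], ".dat"))
  writeLines(c(paste("header", 1:4), "date,aot500",
               "2020-01-01,0.1", "2020-01-02,N/A"), f)
}

# Step 1: read every file into a list (no assign() needed).
res <- lapply(seq_len(nrow(prod_site)), function(i) {
  f <- file.path(root, paste0("site_data_", prod_site$prod[i]),
                 paste0("daily_", prod_site$site[i], ".dat"))
  read.csv(f, skip = 4, na.strings = "N/A")[["aot500"]]
})

# Step 2: name the list elements, e.g. "aot500.inv.AF".
names(res) <- paste("aot500", prod_site$prod, prod_site$site, sep = ".")
res[["aot500.inv.AF"]]
```

Each vector is then reached by name, as in res$aot500.inv.AF, which is what the assign() call in the question was trying to achieve.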
I propose two answers, one based on dplyr and one based on base R.
You'll probably have to adapt the filename in the readAOT_500 function to your particular case.
Base R answer
#' Function that reads AOT_500 from the given product and site file
#' @param prodsite character vector containing 2 elements:
#'   name of a product and name of a site
readAOT_500 <- function(prodsite,
selectedcolumn = c("AOT_500"),
path = tempdir()){
cat(path, prodsite)
filename <- paste0(path, prodsite[1],
prodsite[2], ".csv")
dtf <- read.csv(filename, stringsAsFactors = FALSE)
dtf <- dtf[selectedcolumn]
dtf$prod <- prodsite[1]
dtf$site <- prodsite[2]
return(dtf)
}
# Load one file for example
readAOT_500(c("inv", "AF"))
listofsites <- list(c("inv","AF"),
c("tot","AF"),
c("inv", "CM"),
c( "tot", "CM"),
c("inv", "RB"),
c("tot", "RB"))
# Load all files in a list of data frames
prodsitedata <- lapply(listofsites, readAOT_500)
# Combine all data frames together
prodsitedata <- Reduce(rbind,prodsitedata)
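The list of (product, site) pairs does not have to be typed by hand: expand.grid() generates the same six pairs in the same order. This sketch uses fake_read(), a hypothetical stand-in for readAOT_500() so it runs without files, and combines the results with do.call(rbind, ...), which binds all data frames in one call instead of pairwise as Reduce(rbind, ...) does.

```r
# Build the (prod, site) pairs instead of typing them by hand.
grid <- expand.grid(prod = c("inv", "tot"), site = c("AF", "CM", "RB"),
                    stringsAsFactors = FALSE)
listofsites <- unname(Map(c, grid$prod, grid$site))

# Hypothetical reader standing in for readAOT_500().
fake_read <- function(prodsite) {
  data.frame(AOT_500 = c(0.1, 0.2),
             prod = prodsite[1], site = prodsite[2],
             stringsAsFactors = FALSE)
}

# Read everything into a list, then bind it in a single call.
prodsitedata <- do.call(rbind, lapply(listofsites, fake_read))
```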
dplyr answer
I use Hadley Wickham's packages to clean data.
library(dplyr)
library(tidyr)
daily_CM <- read.csv("~/downloads/daily_CM.dat",skip=4,sep=",",stringsAsFactors=F,na.strings="N/A")
# Generate all combinations of product and site.
prodsite <- expand.grid(prod = c('inv','tot'),
site = c('AF','CM','RB')) %>%
# Group variables to use do() later on
group_by(prod, site)
Create 6 fake files by sampling from the data you provided. You can skip this section when you have real data. I used various sample lengths so that the number of observations differs for each site.
prodsite$samplelength <- sample(1:495,nrow(prodsite))
prodsite %>%
do(stuff = write.csv(sample_n(daily_CM,.$samplelength),
paste0(tempdir(),.$prod,.$site,".csv")))
Read many files using dplyr::do()
prodsitedata <- prodsite %>%
do(read.csv(paste0(tempdir(),.$prod,.$site,".csv"),
stringsAsFactors = FALSE))
# Select only the columns you are interested in
prodsitedata2 <- prodsitedata %>%
select(prod, site, AOT_500)
