New data frame after function is empty - r

I wrote a function that should return a temporary data frame, but when I apply it to my existing data frame, the temporary data frame is empty. How can I solve this?
I tried this code:
data_a <- as.data.frame(cbind(pop=c("a1","b2","c3","d4","d5"),
PA1=c(1,40,430,4330,43330),
PA2=c(2,50,530,5330,53330)))
perm_all <- function(dat,vname,loc1, loc2){
popu <- dat["vname"]
locci_1 <- sample(dat["loc1"], replace = F)
locci_2 <- sample(dat["loc2"], replace = F)
data_a_1 <- as.data.frame(cbind(popu, locci_1, locci_2))
return(data_a_1)
}
data_3 <- perm_all(dat= "data_a",vname="pop",loc1="PA1",loc2="PA2")
I've tried to convert the data_a with
data_a <- as.matrix(data_a)
and
popu <- sample(dat[,1], replace = F)
but they didn't work either.
Thanks :)

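There may be several issues. First, be aware that in R versions before 4.0.0 the data.frame family of functions converts strings to factors by default. On top of that, cbind() coerces all its inputs to one common type, so PA1 and PA2 end up stored as text rather than numbers. That may not be what you want.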
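As a small illustration of this first point (a sketch on a two-row toy table, not part of the original code): cbind() builds a character matrix as soon as one input is a string, while calling data.frame() directly keeps one type per column.

```r
# cbind() coerces everything to one common type (character here)
via_cbind <- as.data.frame(cbind(pop = c("a1", "b2"), PA1 = c(1, 40)))

# data.frame() keeps a separate type for each column
direct <- data.frame(pop = c("a1", "b2"), PA1 = c(1, 40))

is.numeric(via_cbind$PA1)  # FALSE: stored as character (or factor before R 4.0.0)
is.numeric(direct$PA1)     # TRUE
```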
Then @NURAIMIAZIMAH is right: your function needs a data frame to work properly, not the name of one as a string, so:
data_3 <- perm_all(dat= data_a,vname="pop",loc1="PA1",loc2="PA2")
is a good start.
Moreover, you pass values to the arguments vname, loc1 and loc2, but inside the function you use the literal strings "vname", "loc1" and "loc2" instead of those values, because you forgot to remove the quotation marks.
perm_all <- function(dat,vname,loc1, loc2){
popu <- dat[vname]
locci_1 <- sample(dat[loc1], replace = F)
locci_2 <- sample(dat[loc2], replace = F)
data_a_1 <- as.data.frame(cbind(popu, locci_1, locci_2))
return(data_a_1)
}
Now your function should run, but maybe not in the way you would like, because there won't be any permutation in your data_3 table. If you look carefully, dat[loc1] returns a data frame with one column, and sample() treats that as a list of length one, so there is nothing to shuffle. You want a vector to permute, so subset your data frame like this: dat[,loc1].
The code below should do what you expect.
data_a <- as.data.frame(cbind(pop=c("a1","b2","c3","d4","d5"),
PA1=c(1,40,430,4330,43330),
PA2=c(2,50,530,5330,53330)))
perm_all <- function(dat,vname,loc1, loc2){
popu <- dat[vname]
locci_1 <- sample(dat[,loc1], replace = F)
locci_2 <- sample(dat[,loc2], replace = F)
data_a_1 <- as.data.frame(cbind(popu, locci_1, locci_2))
return(data_a_1)
}
data_3 <- perm_all(dat= data_a,vname="pop",loc1="PA1",loc2="PA2")
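A variant of the same idea, as a sketch rather than a definitive version: it builds data_a with data.frame() so PA1 and PA2 stay numeric, and uses [[ ]] to pull each column out as a vector. The set.seed() call is only there to make the shuffle reproducible.

```r
data_a <- data.frame(pop = c("a1", "b2", "c3", "d4", "d5"),
                     PA1 = c(1, 40, 430, 4330, 43330),
                     PA2 = c(2, 50, 530, 5330, 53330))

perm_all <- function(dat, vname, loc1, loc2) {
  # dat[[name]] returns the column as a vector, which sample() can permute
  data.frame(popu    = dat[[vname]],
             locci_1 = sample(dat[[loc1]], replace = FALSE),
             locci_2 = sample(dat[[loc2]], replace = FALSE))
}

set.seed(1)
data_3 <- perm_all(dat = data_a, vname = "pop", loc1 = "PA1", loc2 = "PA2")
data_3
```

Each locus column is shuffled independently, while the pop column keeps its original order.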
See you.

Related

Save a dataframe name and then reference that object in subsequent code

Would like to reference a dataframe name stored in an object, such as:
dfName <- 'mydf1'
dfName <- data.frame(c(x = 5)) #want dfName to resolve to 'mydf1', not create a dataframe named 'dfName'
mydf1
Instead, I get: Error: object 'mydf1' not found
CORRECTED SCENARIO:
olddf <- data.frame(c(y = 8))
mydf1 <- data.frame(c(x = 5))
assign('dfName', mydf1)
dfName <- olddf #why isnt this the same as doing "mydf1 <- olddf"?
I don't want to reference an actual dataframe named "dfName", rather "mydf1".
UPDATE
I have found a clunky workaround for what I wanted to do. The code is:
olddf <- data.frame(x = 8)
olddfName <- 'olddf'
newdfName <- 'mydf1'
statement <- paste(newdfName, "<-", olddfName, sep = " ")
writeLines(statement, "mycode.R")
source("mycode.R")
Anyone have a more elegant way, especially without resorting to a write/source?
I am guessing you want to store multiple data frames in a loop or similar. In that case it is much more efficient, and cleaner, to store them in a named list. However, you can achieve your goal with assign:
assign('mydf1', data.frame(x = 5))
mydf1
  x
1 5
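The named-list approach mentioned above could look like the following sketch. The list name dfs is a placeholder, not something from the question; the last two lines show the assign()/get() pair for comparison.

```r
olddf <- data.frame(x = 8)
newdfName <- "mydf1"

# Store data frames in a list, indexed by the name held in a variable
dfs <- list()
dfs[[newdfName]] <- olddf
dfs[["mydf1"]]

# assign() and get() do the same with objects in the current environment
assign(newdfName, olddf)
identical(get(newdfName), olddf)
```

With the list, no object names have to be constructed at run time, and lapply() can later walk over all stored data frames.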

Getting result of function using common columns in R

This is an extension of the question "Function and looping in training and testing set using r".
How can I write the result of the function (func1) given below to an external folder, with the common columns first and then each model's own additional output column? Moreover, how can I write the output for each unique data_by_plot group to that folder? I used write.table(func1, "c:\\Document\\project\\result"), but I couldn't get the output the way I want. I tried different ways using cbind and rbind, but they don't give me what I want either.
My code is:
result<- c()
data$groups <- paste(data$Plot, data$Species, sep = "_")
data_by_plot <- split (data$Count, data$groups)
func1<- do.call(rbind, lapply(data_by_plot, function(df){
Training<-df[1:20,]
Testing<-df[21:30,]
Model1<-lm(count~1, data = Training)
Pred1<-Testing$Count[i]- Model1$coefficients
Model2<-lm(Diff~1, data = Training)
Pred2<-Testing$Count[i]- Model2$coefficients
Model3<-lm(Diff~1+LogCount, data = Training)
Pred3<-Testing$Count[i]- Model3$coefficients
Model4<-lm(Diff~1+Count, data = Training)
Pred4<-Testing$Count[i]- Model4$coefficients
result <- Reduce(merge, list(Pred1, Pred2, Pred3, Pred4))
return(result)
})
OK, with the clarification from your comment above, there are two ways you could do this: incorporate the export into the function itself, or take the result of the function and pass it to an export function.
I think the easiest way would be the former, so create an export function:
export.function <- function(result){
path <- "//folder/" #whatever your path is to the folder
result <- as.data.frame(result) #turn to data.frame
write.csv(result, file = paste0(path, "result.csv")) #write the data frame to result.csv in that folder
}
This will write the result, as a data frame, as a csv in the path designated. (It will name it "result.csv").
Then add it:
result<- c()
data$groups <- paste(data$Plot, data$Species, sep = "_")
data_by_plot <- split(data, data$groups) #split the whole data frame so each piece keeps its columns
func1<- do.call(rbind, lapply(data_by_plot, function(df){
Training<-df[1:20,]
Testing<-df[21:30,]
Model1<-lm(Count~1, data = Training)
Pred1<-Testing$Count[i]- Model1$coefficients #i is not defined in this snippet and must be set beforehand
Model2<-lm(Diff~1, data = Training)
Pred2<-Testing$Count[i]- Model2$coefficients
Model3<-lm(Diff~1+LogCount, data = Training)
Pred3<-Testing$Count[i]- Model3$coefficients
Model4<-lm(Diff~1+Count, data = Training)
Pred4<-Testing$Count[i]- Model4$coefficients
result <- Reduce(merge, list(Pred1, Pred2, Pred3, Pred4))
export.function(result) #every group writes to the same file name, so later groups overwrite earlier ones
return(result) #still return the result so rbind has something to combine
}))
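The other way would be to leave the function as it was and export the combined result afterwards (func1 already holds the output of do.call, it is not a function):
results <- func1
export.function(results)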
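If each group should end up in its own file, one option is to loop over the names of the split list and build the file name from the group name. This is only a sketch: the toy data, the out_dir folder (a temporary directory here) and the per-group calculation are stand-ins for the real data and models.

```r
# Toy data standing in for the real table
data <- data.frame(Plot = rep(c("P1", "P2"), each = 4),
                   Species = "sp",
                   Count = 1:8)
data$groups <- paste(data$Plot, data$Species, sep = "_")
data_by_plot <- split(data, data$groups)

out_dir <- tempdir()  # replace with the real output folder

for (g in names(data_by_plot)) {
  df <- data_by_plot[[g]]
  # placeholder result: deviation of each count from the group mean
  result <- data.frame(group = g, Pred1 = df$Count - mean(df$Count))
  write.csv(result, file = file.path(out_dir, paste0(g, ".csv")),
            row.names = FALSE)
}
```

Because the file name contains the group name, no group overwrites another group's output.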

Parsing colnames text string as expression in R

I am trying to create a large number of data frames in a for loop using the "assign" function in R. I want to use the colnames function to set the column names in the data frame. The code I am trying to emulate is the following:
county_tmax_min_df <- data.frame(array(NA,c(length(days),67)))
colnames(county_tmax_min_df) <- c('Date',sd_counties$NAME)
county_tmax_min_df$Date <- days
The code I have so far in the loop looks like this:
file_vars = c('file1','file2')
days <- seq(as.Date("1979-01-01"), as.Date("1979-01-02"), "days")
f = 1
for (f in 1:2){
assign(paste0('county_',file_vars[f]),data.frame(array(NA,c(length(days),67))))
}
I need to be able to set the column names similar to how I did in the above statement. How do I do this? I think it needs to be something like this, but I am unsure what goes in the text portion. The end result I need is just a bunch of data frames. Any help would be wonderful. Thank you.
expression(parse(text = ))
You can set the names within assign, like this:
file_vars = c('file1', 'file2')
days <- seq.Date(from = as.Date("1979-01-01"), to = as.Date("1979-01-02"), by = "days")
for (f in seq_along(file_vars)) {
assign(x = paste0('county_', file_vars[f]),
value = {
df <- data.frame(array(NA, c(length(days), 67)))
colnames(df) <- paste0("fancy_column_",
sample(LETTERS, size = ncol(df), replace = TRUE))
df
})
}
Inside the {} you can use colnames(df) or setNames to assign column names in any manner desired. Your first piece of code refers to an sd_counties object that is not available here, but the general idea should work for you.
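An alternative that avoids assign() altogether is to keep the data frames in a named list. A sketch, where county_names is a made-up stand-in for c('Date', sd_counties$NAME) and make_df is a hypothetical helper:

```r
file_vars <- c("file1", "file2")
days <- seq(as.Date("1979-01-01"), as.Date("1979-01-02"), by = "day")
county_names <- c("Date", paste0("county_", 1:66))  # hypothetical column names

# Build one empty data frame with the desired column names and dates
make_df <- function() {
  df <- data.frame(array(NA, c(length(days), length(county_names))))
  colnames(df) <- county_names
  df$Date <- days
  df
}

# One data frame per entry of file_vars, named county_file1, county_file2
county_dfs <- setNames(lapply(file_vars, function(f) make_df()),
                       paste0("county_", file_vars))
names(county_dfs)
```

Individual tables are then reached with county_dfs[["county_file1"]], or county_dfs$county_file1.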

Is there a way to simplify this code using a loop?

Is there a way to simplify this code using a loop?
VariableList <- c(v0,v1,v2, ... etc)
National_DF <- df[,VariableList]
AL_DF <- AL[,VariableList]
AR_DF <- AR[,VariableList]
AZ_DF <- AZ[,VariableList]
... etc
I want the end result to have each as a data frame since it will be used later in the model. Each state such as 'AL', 'AR', 'AZ', etc are data frames. The v{#} represents an out of place variable from the RAW data frame. This is meant to restructure the fields, while eliminating some fields, for preparation for model use.
Continuing the answer from your previous question, we can arrange the data in the same lapply call before creating dataframes.
VariableList <- c('v0','v1','v2')
data <- unlist(lapply(mget(ls(pattern = '_DF$')), function(df) {
index <- sample(1:nrow(df), 0.7*nrow(df))
df <- df[, VariableList]
list(train = df[index,], test = df[-index,])
}), recursive = FALSE)
Then put the data into the global environment:
list2env(data, .GlobalEnv)
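To see the column-selection part in isolation, here is a sketch with two tiny made-up data frames; the train/test split is left out. It collects the objects by name with mget(), subsets each one with the same column vector, and returns a named list.

```r
# Made-up stand-ins for the state data frames
AL <- data.frame(v0 = 1:3, v1 = 4:6, v2 = 7:9, extra = 0)
AR <- data.frame(v0 = 3:1, v1 = 6:4, v2 = 9:7, extra = 1)

VariableList <- c("v2", "v0", "v1")  # desired column order

# Collect the data frames by name, then keep only the wanted columns
states <- mget(c("AL", "AR"))
subsets <- lapply(states, function(df) df[, VariableList])
names(subsets) <- paste0(names(subsets), "_DF")

# list2env(subsets, .GlobalEnv) would create AL_DF and AR_DF as objects
```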

How to create a loop of ppcor function?

I am trying to create a loop to go through and perform a correlation (and in future a partial correlation) using ppcor function on variables stored within a data frame. The first variable (A) will remain the same for all correlations, whilst the second variable (B) will be the next variable along in the next column within my data frame. I have around 1000 variables.
I show the mtcars dataset below as an example, as it is in the same layout as my data.
I've been able to complete the operation successfully when performed manually using cbind to bind 2 columns (the 2 variables of interest) prior to running ppcor on the array ("tmp_df"). I have then been able to bind the output from correlation operation ("mpg_cycl"), ("mpg_disp") into a single object. However I can't get any of this operation to work in a loop. Any ideas please?
library("MASS")
install.packages("ppcor")
library("ppcor")
mtcars_df <- as.data.frame(mtcars)
tmp_df = cbind(mtcars_df$mpg, mtcars_df$cycl)
mpg_cycl <- pcor(as.matrix(tmp_df), method = 'spearman')
tmp_df1= cbind(mtcars_df$mpg, mtcars_df$disp)
mpg_disp <- pcor(as.matrix(tmp_df1), method = 'spearman')
combined_table <- do.call(cbind, lapply(list("mpg_cycl" = mpg_cycl,
mpg_disp" = mpg_disp), as.data.frame, USE.NAMES = TRUE))
Attempting to loop the above operation (amended after the last reviewer's comments):
for (i in mtcars_df[2:7]){
tmp_df = (cbind(i, mtcars_df$mpg)
i <- pcor(as.matrix(tmp_df), method = 'spearman')
write.csv(i, file = paste0("MyDataOutput",i[1],".csv")
}
I expected the loop to write the correlation results to MyDataOutput csv files, but it generates an error message. I thought i was in the correct place?
Error: unexpected symbol in:
" tmp_df = (cbind(i, mtcars_df$mpg)
i"
Even adding a curly bracket at the end does not resolve the issue, so I have left it out, as it introduces another error message about '}'.
I have redone some of your code and fixed the missing ), } and ". The for loop now writes one file per variable, named with a common prefix plus the name of the variable. Hope this helps.
library("MASS")
#install.packages("ppcor")
library("ppcor")
mtcars_df <- as.data.frame(mtcars)
tmp_df = cbind(mtcars_df$mpg, mtcars_df$cyl) # the mtcars column is named cyl
mpg_cycl <- pcor(as.matrix(tmp_df), method = 'spearman')
tmp_df1= cbind(mtcars_df$mpg, mtcars_df$disp)
mpg_disp <- pcor(as.matrix(tmp_df1), method = 'spearman')
combined_table <- do.call(cbind, lapply(list("mpg_cycl" = mpg_cycl,
"mpg_disp" = mpg_disp), as.data.frame, USE.NAMES = TRUE))
for(i in colnames(mtcars_df[2:7])){
tmp_df = mtcars_df[c(i,"mpg")]
i_result <- pcor(as.matrix(tmp_df), method = 'spearman')
write.csv(i_result, file = paste0("MyDataOutput_",i,".csv"))
}
To merge the results before saving:
dta <- c()
for(i in colnames(mtcars_df[2:7])){
tmp_df = mtcars_df[c(i,"mpg")]
i_result <- pcor(as.matrix(tmp_df), method = 'spearman')
dta <- rbind(dta,c(i,(unlist( i_result))))
}
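If only the coefficient and p-value are needed for each pair, the results can be gathered into one tidy data frame. The sketch below swaps in base R's cor.test() for pcor() so it runs without extra packages; with only two variables and nothing to condition on, both should report the ordinary Spearman correlation, but treat that equivalence as an assumption to check against your pcor() output. The output file name is made up.

```r
vars <- colnames(mtcars)[2:7]

res <- do.call(rbind, lapply(vars, function(v) {
  # exact = FALSE avoids the exact-p-value warning when there are ties
  ct <- cor.test(mtcars[[v]], mtcars$mpg, method = "spearman", exact = FALSE)
  data.frame(variable = v,
             estimate = unname(ct$estimate),
             p.value  = ct$p.value)
}))

# One row per variable, written to a single file
write.csv(res, file = file.path(tempdir(), "MyDataOutput_all.csv"),
          row.names = FALSE)
```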
