I'm trying to use several data frames within a single for loop, i.e.:
# after loading the 5 data frames
for (i in 1:5) {
  dframe <- dataframe[i]
  print(sprintf("This is data frame %s", dframe))
}
However, this only passes the variable name, not the data frame itself.
To obtain the data frame itself, pass its name to the get function:
dframe <- get(dataframe[i])
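A minimal sketch of the full loop, assuming dataframe is a character vector holding the names of the data frames (called df_names here; df_a and df_b are hypothetical stand-ins). mget() is shown as well, since it fetches every named object at once as a named list:

```r
# Hypothetical data frames standing in for the 5 loaded ones
df_a <- data.frame(x = 1:3)
df_b <- data.frame(x = 4:6)
df_names <- c("df_a", "df_b")  # plays the role of `dataframe`

for (i in seq_along(df_names)) {
  dframe <- get(df_names[i])  # look up the object by its name
  print(sprintf("Data frame %s has %d rows", df_names[i], nrow(dframe)))
}

# mget() returns all of the objects as a list, named by df_names
all_dfs <- mget(df_names)
```

Iterating over all_dfs with lapply() avoids repeated get() calls entirely.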
I have a for loop like this:
for(i in c("a","b","c","d"))
{
as.name(paste("df",i,sep=""))= mydataframe
}
mydataframe is a data frame, and I want to use this loop to create the data frames dfa, dfb, dfc and dfd.
The as.name(paste("df",i,sep="")) does not work here. I do not want to create a list that has the 4 data frames.
Can I directly create 4 data frames from this loop?
You can do this with assign(), although in general you are better off using lists.
Using your example:
for(i in letters[1:4]){
assign(paste0("df", i), mydataframe)
}
Note that this will simply create the same object 4 times, unless you change what mydataframe is inside the loop.
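For comparison, a sketch of the list-based approach, assuming a hypothetical one-column mydataframe. The copies live in one named list, and list2env() can still turn them into separate objects dfa to dfd if those are really needed:

```r
mydataframe <- data.frame(x = 1:3)  # hypothetical stand-in

suffixes <- letters[1:4]
dfs <- lapply(suffixes, function(s) {
  # change mydataframe here if each copy should differ
  mydataframe
})
names(dfs) <- paste0("df", suffixes)  # "dfa", "dfb", "dfc", "dfd"

# Optional: copy each list element into the global environment
list2env(dfs, envir = globalenv())
```

Keeping dfs as the working object makes later steps (looping, binding, summarising) a single lapply() call.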
This is the code I currently use to collect data from multiple data frames into a time-ordered vector, which I then analyze and graph:
TotalLoans <- c(
sum(as.numeric(HCD2001$loans_all)), sum(as.numeric(HCD2002$loans_all)),
sum(as.numeric(HCD2003$loans_all)), sum(as.numeric(HCD2004$loans_all)),
sum(as.numeric(HCD2005$loans_all)), sum(as.numeric(HCD2006$loans_all)),
sum(as.numeric(HCD2007$loans_all)), sum(as.numeric(HCD2008$loans_all)),
sum(as.numeric(HCD2009$loans_all)), sum(as.numeric(HCD2010$loans_all)),
sum(as.numeric(HCD2011$loans_all)), sum(as.numeric(HCD2012$loans_all)),
sum(as.numeric(HCD2013$loans_all)), sum(as.numeric(HCD2014$loans_all)),
sum(as.numeric(HCD2015$loans_all)), sum(as.numeric(HCD2016$loans_all))
)
I do this four more times with other sets of data frames that are named the same way, i.e. Varname followed by the year (like HCD2001).
Is there a way to loop through these 16 data frames, select an individual column, perform a function on it, and put it into a vector? This is what I have tried so far:
AllList <- list(HCD2001, HCD2002, HCD2003, HCD2004, HCD2005, HCD2006, HCD2007, HCD2008, HCD2009, HCD2010, HCD2011, HCD2012, HCD2013, HCD2014, HCD2015, HCD2016)
TotalLoans <- lapply(AllList,
function(df){
sum(as.numeric(df$loans_all))
return(df)
}
)
However, it returns a large list containing every column of every data frame. The other posts I found on this topic were about modifying data frames, not about building a new vector from values computed on them.
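The attempt returns the data frames because the anonymous function ends with return(df), which discards the sum. A sketch of a fix, assuming the frames follow the HCD&lt;year&gt; naming pattern: fetch them with mget() and return only the sum, using vapply() so the result is a numeric vector rather than a list. The helper total_by_year and its column argument are hypothetical, and the toy data covers only two years:

```r
# Hypothetical toy data standing in for HCD2001 ... HCD2016
HCD2001 <- data.frame(loans_all = c(10, 20))
HCD2002 <- data.frame(loans_all = c(5, 5, 5))

# Sum one column across data frames named <prefix><year>
total_by_year <- function(prefix, years, column, envir = parent.frame()) {
  dfs <- mget(paste0(prefix, years), envir = envir)
  vapply(dfs, function(df) sum(as.numeric(df[[column]])), numeric(1))
}

TotalLoans <- total_by_year("HCD", 2001:2002, column = "loans_all")
# use 2001:2016 for the real data; reuse with other prefixes/columns
```

The result is named by data frame, which keeps the time ordering visible when graphing.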
I cannot figure out where the simple error is in my for loop. I want to run the same analysis over multiple data frames and save each iteration's result as a new data frame, named after the input data frame plus an extra string.
john and jane are two of the many data frames I want to loop over and compare against bcm to find duplicated rows. Here is my code:
x <- list(john,jane)
for (i in x) {
test <- rbind(bcm,i)
test$dups <- duplicated(test$Full.Name,fromLast=T)
test$dups2 <- duplicated(test$Full.Name)
test <- test[which(test$dups==T | test$dups2==T),]
newname <- paste("dupl",i,sep=".")
assign(newname, test)
}
So far, I can either get the naming to work without using the data in x, or get the loop to complete without naming the new data frames correctly.
Intended Result: I am hoping to create new data frames dupl.john and dupl.jane to show which rows are duplicated in comparison to bcm.
I understand that lapply() might be better to use and am very open to that form of solution. I could not figure out how to use it to solve my problem, so I turned to the more familiar for loop.
EDIT:
Sorry if I wasn't clear. I have about 13 data frames in total, and I want to run the same analysis over each of them to find the duplicate rows in $Full.Name. I could run the first four lines of my loop followed by dupl.john <- test 13 times (once per data frame), but I am deliberately trying to write a for loop or lapply() call, both to learn more R and because I'm sure it is more efficient.
If I understand your intended result correctly, match_df() from plyr could be an option.
library(plyr)
dupl.john <- match_df(john, bcm)
dupl.jane <- match_df(jane, bcm)
dupl.john and dupl.jane will both be data frames, containing the rows of john and jane, respectively, that also appear in bcm. Is this what you are trying to achieve?
EDITED after the first comment
library(plyr)
l <- list(john, jane)
res <- lapply(l, function(x) {match_df(x, bcm, on = "Full.Name")} )
dupl.john <- res[[1]]
dupl.jane <- res[[2]]
res is now a list of data frames holding the rows matched on the column "Full.Name"; [[ ]] extracts each data frame directly.
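A base-R sketch of the same idea without plyr or assign(), assuming the goal is the rows of each data frame whose Full.Name also occurs in bcm (roughly what match_df(x, bcm, on = "Full.Name") selects). Naming the input list is what fixes the naming problem in the question's loop, where paste() received a data frame instead of a name. The toy bcm, john and jane below are hypothetical:

```r
# Hypothetical toy data
bcm  <- data.frame(Full.Name = c("Ann Lee", "Bo Diaz", "Cy Park"))
john <- data.frame(Full.Name = c("Bo Diaz", "Di Wu"))
jane <- data.frame(Full.Name = c("Ann Lee", "Cy Park", "Ed Fox"))

# A named list keeps each data frame's name attached to it
dfs <- list(john = john, jane = jane)

# Keep the rows whose Full.Name also occurs in bcm
dupl <- lapply(dfs, function(d) d[d$Full.Name %in% bcm$Full.Name, , drop = FALSE])
names(dupl) <- paste("dupl", names(dupl), sep = ".")

# Optional: create dupl.john, dupl.jane, ... as separate objects
list2env(dupl, envir = globalenv())
```

With all 13 data frames in dfs, the same two lines handle every one of them.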
I want a function that I can call several times throughout a data analysis script, each time appending a new data frame to an existing list.
myList <- list()
The function creates a new data frame after subsetting an existing data frame, and then appends this new data frame to my list (in theory).
appendList = function(){
df = mydf[mydf$myData < 0.5, ]
myList[[(length(myList)+1)]] <- df
}
In my real-world problem I have several different code chunks, each with a different set of data in the column 'myData'.
I thought I could just use my function above like this:
mydf <- data.frame(myData = runif(10))
appendList()
mydf <- data.frame(myData = rnorm(10))
appendList()
But my list remains unchanged:
length(myList)
[1] 0
Is it an environment issue?
My goal is for 'myList' to contain all of these different data frames.
Bonus: Is there a better way to do this kind of task?
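It is an environment issue: the assignment inside appendList() creates a local copy of myList that is discarded when the function returns. A minimal sketch of the usual fix, assuming the function's signature can change: pass the list and the data frame in, and return the updated list for reassignment. (Writing myList[[length(myList) + 1]] <<- df inside the original function would also work, since <<- assigns into an enclosing environment, but it is generally discouraged.)

```r
# Subset mydf and return lst with the subset appended
appendList <- function(lst, mydf) {
  subset_df <- mydf[mydf$myData < 0.5, , drop = FALSE]
  lst[[length(lst) + 1]] <- subset_df
  lst
}

set.seed(1)
myList <- list()
mydf <- data.frame(myData = runif(10))
myList <- appendList(myList, mydf)
mydf <- data.frame(myData = rnorm(10))
myList <- appendList(myList, mydf)
length(myList)  # 2
```

Because each call's result is reassigned explicitly, it is clear where myList changes.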
I have a number of R scripts that create data frames of the same length and I am trying to aggregate all the data frames into one.
I used a for loop to run those R scripts:
for(i in sample){
source(i)
}
This does create all the data frames I need. But is there a good way to include a function that binds those data frames together within that for loop?
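Assuming the last expression in each script evaluates to a data frame, you can combine all of them with something like the following. Note that source() returns a list with elements value and visible, so the data frame has to be taken from $value:
do.call(rbind, lapply(sample, function(f) source(f)$value))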
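source() returns list(value = ..., visible = ...) rather than the data frame itself, so each data frame is pulled from $value before binding. The self-contained sketch below writes two hypothetical scripts to temporary files, assumes each ends with a data frame expression, and shows both the lapply() form and the for-loop form the question asked about:

```r
# Two hypothetical scripts, each ending with a data frame expression
script_files <- c(tempfile(fileext = ".R"), tempfile(fileext = ".R"))
writeLines("data.frame(id = 1:2, score = c(0.5, 0.7))", script_files[1])
writeLines("data.frame(id = 3:4, score = c(0.1, 0.9))", script_files[2])

# lapply() form: take $value from each source() result, then bind
combined <- do.call(rbind, lapply(script_files, function(f) source(f)$value))

# for-loop form: collect results in a preallocated list, bind once at the end
results <- vector("list", length(script_files))
for (k in seq_along(script_files)) {
  results[[k]] <- source(script_files[k])$value
}
combined_loop <- do.call(rbind, results)
```

Binding once after the loop is generally preferable to calling rbind() on every iteration, which copies the growing data frame each time.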