I have written a function to strip down a data frame to contain only the columns I want to plot. I now want to iterate a list of data frames through that function, so that each individual data frame only contains the info relevant to my plot.
Here is the function:
clean_data <- function(show_df){
  show_data <- show_df[,c(1:2,7)]
  colnames(show_data) <- c("Week", "WeeklyGross", "AvgTicketPrice")
  #turns WeeklyGross into Numeric values
  show_data$WeeklyGross <- gsub('[^a-zA-Z0-9.]', '', show_data$WeeklyGross)
  show_data$WeeklyGross <- as.numeric(show_data$WeeklyGross)
  #turns AvgTicketPrice into Numeric values
  show_data$AvgTicketPrice <- gsub('[^a-zA-Z0-9.]', '', show_data$AvgTicketPrice)
  show_data$AvgTicketPrice <- as.numeric(show_data$AvgTicketPrice)
  show_data
}
And here is my code when I attempt to iterate the list of my data frames through the function:
df.list <- list(atw_df, cly_df, gent_df, kin_df,
mo_df,on_df, van_df, war_df)
new_list <- list()
for (i in seq(df.list)){
new_list <- clean_data(i)
}
I know that my loop is missing something, but I cannot figure out what. I want to store each data frame from that list in its revised format as a variable so that I can use them to plot the information.
EDIT: made some code changes, I am now receiving an incorrect number of dimensions error in show_df[, c(1:2, 7)]
EDIT2: more changes made to the for loop, still receiving same error message.
Once you have your function, and your list, simply do
new_list <- lapply(df.list, clean_data)
This calls clean_data once for each data frame in df.list and returns a list of the newly cleaned data frames.
Thus your entire "loop" becomes
df.list <- list(atw_df, cly_df, gent_df, kin_df,
mo_df,on_df, van_df, war_df)
new_list <- lapply(df.list, clean_data)
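For what it's worth, the "incorrect number of dimensions" error is what you would expect if the loop passes the index i (a single number) to clean_data instead of the data frame df.list[[i]]. Below is a minimal sketch of both a corrected loop and the lapply call. The two toy data frames, the helper make_show, and the seven-column layout are assumptions made up for illustration, not the real show data:

```r
# Hypothetical helper: builds a toy 7-column data frame shaped like the show data.
make_show <- function(gross, price) {
  data.frame(
    week = c("2019-01-06", "2019-01-13"),
    gross = gross,
    c3 = NA, c4 = NA, c5 = NA, c6 = NA,   # filler columns 3 to 6
    price = price,
    stringsAsFactors = FALSE
  )
}
atw_df <- make_show(c("$1,234,567", "$987,654"), c("$101.50", "$99.25"))
cly_df <- make_show(c("$555,000", "$610,200"), c("$80.00", "$82.75"))

# Same cleaning steps as the function in the question.
clean_data <- function(show_df) {
  show_data <- show_df[, c(1:2, 7)]
  colnames(show_data) <- c("Week", "WeeklyGross", "AvgTicketPrice")
  show_data$WeeklyGross <- as.numeric(gsub('[^a-zA-Z0-9.]', '', show_data$WeeklyGross))
  show_data$AvgTicketPrice <- as.numeric(gsub('[^a-zA-Z0-9.]', '', show_data$AvgTicketPrice))
  show_data
}

df.list <- list(atw = atw_df, cly = cly_df)

# Loop version: pass df.list[[i]] (the data frame), store into new_list[[i]].
new_list <- vector("list", length(df.list))
for (i in seq_along(df.list)) {
  new_list[[i]] <- clean_data(df.list[[i]])
}
names(new_list) <- names(df.list)

# lapply version: same result without the bookkeeping.
new_list2 <- lapply(df.list, clean_data)
```

Naming the list elements (atw, cly) is optional, but it lets you pull out a single cleaned data frame later with new_list$atw when plotting.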
Related
I am trying to take my data frame that has a list of player id numbers and find their name, using this function. Right now my code will simply print separate tibbles of each result, but I want it to combine those results into a data frame. I tried using rbind, but it doesn't work.
for(x in dataframe...)
print(function I am using to find name)
Use sapply, which is more concise than an explicit loop:
results <- data.frame(name = sapply(dataframe[,'playerid'], FUN = function(id) baseballr::playername_lookup(id)))
You can initialise a results data frame like this
results <- data.frame()
You can then add the results in each iteration using rbind, combining the previous version with the new results. In the first iteration of the loop you add your first results to the empty data frame. Combined:
results <- data.frame()
for(x in dataframe$playerid){
results <- rbind(results, baseballr::playername_lookup(x))
}
The problem in your code was that you simply printed the results without saving them anywhere.
As mentioned in the comment below, the better way to do this, once your data set becomes very large, is to create a list and later combine it into a data.frame.
results <- list()
for(i in seq_len(nrow(dataframe))){
results[[i]] <- baseballr::playername_lookup(dataframe$playerid[i])
}
final_results <- do.call(rbind, results)
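The same list-then-combine pattern can also be written with lapply. This is only a sketch: lookup_name is a hypothetical stand-in for baseballr::playername_lookup (which needs network access), and the player ids are made up. It assumes each lookup returns a one-row data frame with the same columns:

```r
# Hypothetical stand-in for the real lookup; returns one row per id.
lookup_name <- function(id) {
  data.frame(playerid = id, name = paste0("player_", id), stringsAsFactors = FALSE)
}

dataframe <- data.frame(playerid = c(101, 202, 303))  # made-up ids

# One data frame per id, collected in a list ...
results <- lapply(dataframe$playerid, lookup_name)

# ... then bound together row-wise in a single call.
final_results <- do.call(rbind, results)
print(final_results)
```

Binding once at the end avoids re-copying the growing data frame on every iteration, which is what rbind inside the loop does.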
I have a list of data frames allData. Each data frame has a column called idCode. How do I change the type of idCode to character with lapply (or some other function if possible)?
I've tried this but it only returns me a list of all the "idCode" columns. Nothing changed in the original allData list.
lapply(allData, function(x) x$idCode <- as.character(x$idCode))
I've also tried this:
lapply(allData, function(x) {x$idCode <- as.character(x$idCode) x})
With the hope that it will return all the data frames with idCode converted, so I may "stitch" them together again in a new list. However, it gives me an error: unexpected symbol in "lapply(allData, function(x) {x$idCode <- as.character(x$idCode) x".
Is it possible to do this with lapply()? Or some other functions are also OK.
You have several options here:
You can just use a for loop and convert the column in each data frame with as.character():
for(i in seq_along(allData)){
  allData[[i]]$idCode <- as.character(allData[[i]]$idCode)
}
Or you can use the global assignment operator '<<-':
lapply(X = seq_along(allData), FUN = function(x){
  allData[[x]]$idCode <<- as.character(allData[[x]]$idCode)
  return(NULL)
})
To change the type of a column in a data frame you can also use the function class():
lapply(X = seq_along(allData), FUN = function(x){
  class(allData[[x]]$idCode) <<- "character"
  return(NULL)
})
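The second attempt in the question is also close to working. The "unexpected symbol" error comes from putting two expressions on one line inside the braces without a separator. A small sketch with made-up data (the value column is an assumption), where the two expressions go on separate lines and the result of lapply is assigned back to allData:

```r
# Made-up list of data frames, each with an idCode column.
allData <- list(
  data.frame(idCode = 1:3, value = c(0.1, 0.2, 0.3)),
  data.frame(idCode = 4:5, value = c(0.4, 0.5))
)

allData <- lapply(allData, function(x) {
  x$idCode <- as.character(x$idCode)
  x  # return the whole data frame, not just the converted column
})

# Check the column type in every data frame.
id_classes <- sapply(allData, function(x) class(x$idCode))
print(id_classes)
```

A semicolon between the two expressions would work as well, and returning x is what makes lapply hand back full data frames rather than only the idCode columns.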
I am curious why the following code doesn't work for adding column data to a data frame.
a <- c(1:3)
b <- c(4:6)
df <- data.frame(a,b) # create a data frame example
add <- function(df, vector){
df[[3]] <- vector
} # create a function to add column data to a data frame
d <- c(7:9) # a new vector to be added to the data frame
add(df,d) # execute the function
If you run the code in R, the new vector is not added to the data frame, and no error is raised either.
R passes arguments to functions by value, not by reference. That means that inside the function you work on a copy of the data.frame df; when the function returns, the modified copy is discarded and the original data.frame outside the function is still unchanged.
This is why @RichScriven proposed storing the return value of your function in the data.frame df again.
Credits go to @RichScriven please...
PS: You should use cbind ("column bind") to extend your data.frame independently of how many columns already exist, and ensure unique column names:
add <- function(df, vector){
  res <- cbind(df, vector)
  names(res) <- make.names(names(res), unique = TRUE)
  res  # return value
}
PS2: You could use a data.table instead of a data.frame; a data.table can be modified by reference (not by value).
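To make the copy behaviour concrete, here is a minimal sketch based on the code in the question. The only changes are that the function returns the modified copy and the caller stores it back into df; the variable cols_before is introduced purely for illustration:

```r
df <- data.frame(a = 1:3, b = 4:6)
d <- 7:9

add <- function(df, vector) {
  df[[3]] <- vector   # modifies the local copy only
  df                  # return the modified copy
}

cols_before <- ncol(df)   # still 2: nothing has been added yet
df <- add(df, d)          # store the returned copy back into df
cols_after <- ncol(df)    # now 3
print(c(cols_before, cols_after))
```

Calling add(df, d) without the assignment would leave df with its original two columns.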
My problem is the following. Suppose I have 1000 dataframes in R with the names eq1.1, eq1.2, ..., eq1.1000. I would like a single dataframe containing my 1000 dataframes. Normally, if I have only two dataframes, say eq1.1 and eq1.2 then I could define
df <- data.frame(eq1.1,eq1.2)
and I'm good. However, I can't follow this procedure because I have 1000 dataframes.
I was able to define a list containing the names of my 1000 dataframes using the code
names <- c()
for (i in 1:1000){names[i]<- paste0("eq1.",i)}
However, the elements of my list are recognized as strings and not as the dataframes that I previously defined.
Any help is appreciated!
How about
df.names <- ls(pattern = "^eq1\\.\\d")
eq1.dat <- do.call(cbind, lapply(df.names, get))
rm(list = df.names)
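One thing to keep in mind is that ls() returns names in alphabetical order ("eq1.1", "eq1.10", "eq1.100", ...), so the columns will not be in numeric order. A possible variation, sketched here with three tiny made-up data frames instead of 1000, builds the names with paste0 and fetches them with mget:

```r
# Three small stand-in data frames (the question has 1000).
for (i in 1:3) {
  assign(paste0("eq1.", i), data.frame(x = i, y = i * 10))
}

df.names <- paste0("eq1.", 1:3)        # names in numeric order
df.list  <- mget(df.names)             # named list of the data frames
eq1.dat  <- do.call(cbind, df.list)    # bind all columns side by side
print(dim(eq1.dat))
```

This assumes all the data frames have the same number of rows, as cbind requires.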
library(stringi)
library(dplyr)
# recreate dummy data
lapply(1:1000, function(i){
  assign(sprintf("eq1.%s", i),
         as.data.frame(matrix(ncol = 12, nrow = 13, sample(1:15))),
         envir = .GlobalEnv)
})
# Now have 1000 data frames in my working environment named eq1.[1:1000]
> str(ls(pattern = "eq1.\\d+"))
> chr [1:1000] "eq1.1" "eq1.10" "eq1.100" "eq1.1000" "eq1.101" "eq1.102" "eq1.103" ...
1) Create a holding data frame from the eq1.1 data frame; it will be appended to on each iteration of the loop below.
empty_df <- eq1.1
2) Search for all the data frames named by this convention and create a data frame from the returned character strings. These strings represent our data frame objects but are nothing more than names.
3) Mutate that data frame to add an indexing column, so that the data frames can be ordered properly from 1 to 1000; the character names from the previous step are not in numeric order.
4) Drop the indexing column once the names are in the proper sequence, extract the dfs column as a character sequence, and slice off the first value, since that data frame is already stored in empty_df.
5) Loop through that sequence and, on each iteration, bind the next data frame onto empty_df with a global assignment. For example, after iteration 1 empty_df is the same as data.frame(eq1.1, eq1.2), and after iteration 2 it is the same as data.frame(eq1.1, eq1.2, eq1.3).
NOTE: the get function takes the character name and returns the object it refers to. See ?get for details.
lapply(
  data.frame(dfs = ls(pattern = 'eq1\\.\\d+')) %>%
    mutate(nth = as.numeric(stri_extract_last_regex(dfs, '\\d+'))) %>%
    arrange(nth) %>% select(-nth) %>% slice(-1) %>% .$dfs,
  function(i){
    empty_df <<- data.frame(empty_df, get(i))
  }
)
All done: all the data frames are bound to empty_df. To check:
> dim(empty_df)
[1] 13 12000
I have code similar to that shown below:
# initialize an empty list of dataframes here....
for (i in 1:10) {
# create a new data frame here....
# append this newly created dataframe to the list here....
}
How can I create an empty list of dataframes at the start of the loop and then go on adding a newly created dataframe in each iteration of the for loop?
If the sole purpose is to merge the data frames, it may be easier to use the merge_all from the reshape package:
reshape::merge_all(your_list_with_dfs, ...)
Or alternatively, you may try:
do.call("rbind", your_list_with_dfs)
in order to append the rows.
The way I did it was as follows (as suggested by @akrun):
list_of_frames <- replicate(10, data.frame())
for (i in 1:10) {
  # create a new data frame called new_dataframe
  list_of_frames[[i]] <- new_dataframe
}
# Create an empty list. Its length depends on how many data frames you want to append.
myList <- vector(mode = "list", length = 10)
for (i in seq(1,10)) {
  myList[[i]] <- new_dataframe  # append your new data frame
}
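Putting the pieces together, a self-contained sketch might look like the following. The contents of new_dataframe (the iteration and value columns) are placeholder data invented for illustration, and the final row-bind is only needed if you want one combined data frame:

```r
# Pre-allocate a list with one slot per iteration.
myList <- vector(mode = "list", length = 10)

for (i in seq_len(10)) {
  # Placeholder: stands in for whatever data frame each iteration produces.
  new_dataframe <- data.frame(iteration = i, value = i^2)
  myList[[i]] <- new_dataframe
}

# Optional: stack all data frames into one, row by row.
combined <- do.call(rbind, myList)
print(dim(combined))
```

Pre-allocating with vector() and filling by index keeps each data frame in its own slot, so individual results stay accessible as myList[[i]] even after combining.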