How to cbind many data frames? - r

I have 247 data frames which are sequentially named (y1, y2, y3, ..., y247). They result from the following code:
for (i in (1:247)) {
nam <- paste("y", i, sep = "")
assign(nam, dairy[dairy$FARM==i,"YIT"])
}
I wish to cbind all of them to have:
df <- cbind(y1,y2,...,y247)
Can I do this with a loop without typing all 247 data frames?
Thanks

If you really want to do this, it is possible:
df <- y1
for (i in 2:247) {
df <- cbind(df, get(paste0("y", i)))  # get() looks up each object by its name
}
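If the y1 ... y247 objects already exist, the loop can be avoided entirely. A minimal sketch, assuming the objects are equal-length vectors (as `dairy[dairy$FARM==i,"YIT"]` returns for a plain data frame): `mget()` fetches several objects by name into a named list, and `do.call(cbind, ...)` binds them in one call. The toy y1..y3 values are invented for illustration.

```r
# Toy stand-ins for y1, y2, y3 (hypothetical data, for illustration only)
y1 <- c(10, 11, 12)
y2 <- c(20, 21, 22)
y3 <- c(30, 31, 32)

n_frames  <- 3                        # would be 247 in the question
obj_names <- paste0("y", seq_len(n_frames))

# mget() returns a named list: list(y1 = ..., y2 = ..., y3 = ...)
ys <- mget(obj_names)

# Bind every element as a column in a single call; the list names become column names
df <- as.data.frame(do.call(cbind, ys))
```

Because `do.call()` passes the list elements as named arguments, `cbind()` uses y1, y2, y3 as the column names.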

Creating many variables in a loop as you do is not a good idea. You should use a list instead:
ys <- split(dairy$YIT, dairy$FARM)
names(ys) <- paste0("y", names(ys))
The first line creates a list ys that contains your y1 as its first element (ys[[1]]), your y2 as its second element (ys[[2]]), and so on. The second line names the list elements the same way you named your variables (y1, y2, etc.), since those names will eventually be used as the column names of the data frame.
There is a function in the dplyr package, bind_cols(), that takes a list of data frames (or a named list of equal-length vectors, as here) and binds them all together as columns:
library(dplyr)
df <- bind_cols(ys)
Note, by the way, that this will only work if each value appears exactly the same number of times in the column FARM, since all columns of a data frame must have the same length.
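A base-R route that needs no dplyr, sketched under the assumption that dairy has the columns FARM and YIT used in the question (the toy data below is invented): split the YIT values by farm, name the pieces, check that they all have the same length, and convert the named list with `as.data.frame()`.

```r
# Hypothetical stand-in for the 'dairy' data: 3 farms, 4 records each
dairy <- data.frame(
  FARM = rep(1:3, each = 4),
  YIT  = c(5.1, 5.3, 5.0, 5.2,
           6.0, 6.1, 5.9, 6.2,
           4.8, 4.9, 5.0, 4.7)
)

# One list element per farm, holding that farm's YIT values
ys <- split(dairy$YIT, dairy$FARM)
names(ys) <- paste0("y", names(ys))

# Every farm must contribute the same number of values to form columns
stopifnot(length(unique(lengths(ys))) == 1)

# A named list of equal-length vectors converts directly to a data frame
df <- as.data.frame(ys)
```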

Related

How to put data frame index inside a for loop in r?

I have code that I want to repeat three times. Each time, it outputs data frames df1, df2, ..., and inside this loop I have another loop that binds these data frames by row. My problem is how to put an index on "e <- bind_rows(listdf)" (I should end up with three "e"s) so that at the end I can bind the three "e"s and have one data frame including df1, df2, ... for all three repeats of index i.
Thanks in advance; I really appreciate your response.
for (i in 1:3){
# (there is some code here which uses i as an index and gives:)
df1=...
df2=...
listdf<-list()
for (j in 1:20){
z <- j
sdf <- paste("df", z, sep="")
ddf <- get(paste("df", z, sep=""))
listdf[[sdf]] <-ddf
}
e <- bind_rows(listdf)
}
You can use the assign function to save each e under a name with, let's say, a number appended. It will be something like this:
assign(paste0("e",i),bind_rows(listdf))
You will end up with e1, e2 and e3
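An alternative to assign() that avoids creating e1, e2 and e3 as separate variables is to collect each e in a list and bind once at the end. In this sketch, make_frames() is a hypothetical helper standing in for the question's elided inner code, and base rbind is used so the example has no package dependencies.

```r
# Hypothetical stand-in for the question's inner code: each repeat i
# produces a few data frames (df1, df2, ...); here they are toy frames.
make_frames <- function(i) {
  list(df1 = data.frame(rep = i, x = 1:2),
       df2 = data.frame(rep = i, x = 3:4))
}

results <- vector("list", 3)               # one slot per repeat
for (i in 1:3) {
  listdf <- make_frames(i)                 # plays the role of df1, df2, ... for this i
  results[[i]] <- do.call(rbind, listdf)   # the "e" for this repeat
}

# Bind the three e's into one data frame
final <- do.call(rbind, results)
rownames(final) <- NULL                    # drop the auto-generated row names
```

With dplyr loaded, `bind_rows(results, .id = "rep")` would do the last step and also record which repeat each row came from.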

How to do a complex edit of columns of all data frames in a list?

I have a list of 185 data frames called WaFramesNumeric. Each dataframe has several hundred columns and thousands of rows. I want to edit every data frame, so that it leaves all numeric columns as well as any non-numeric columns that I specify.
Using:
for(i in seq_along(WaFramesNumeric)) {
WaFramesNumeric[[i]] <- WaFramesNumeric[[i]][,sapply(WaFramesNumeric[[i]],is.numeric)]
}
successfully makes each dataframe contain only its numeric columns.
I've tried to amend this with lines to add specific columns. I have tried:
for (i in seq_along(WaFramesNumeric)) {
a <- WaFramesNumeric[[i]]$Device_Name
WaFramesNumeric[[i]] <- WaFramesNumeric[[i]][,sapply(WaFramesNumeric[[i]],is.numeric)]
cbind(WaFramesNumeric[[i]],a)
}
and in an attempt to call the column numbers of all integer columns as well as the specific ones and then combine based on that:
for (i in seq_along(WaFramesNumeric)) {
f <- which(sapply(WaFramesNumeric[[i]],is.numeric))
m <- match("Cost_Center",colnames(WaFramesNumeric[[i]]))
n <- match("Device_Name",colnames(WaFramesNumeric[[i]]))
combine <- c(f,m,n)
WaFramesNumeric[[i]][,i,combine]
}
These all return errors and I am stumped as to how I could do this. WaFramesNumeric is a copy of another list of dataframes (WaFramesNumeric <- WaFramesAll) and so I also tried adding the specific columns from the WaFramesAll but this was not successful.
I appreciate any advice you can give and I apologize if any of this is unclear.
You are mistakenly assuming that the value of the last command in a for loop is kept. It is not: a for loop returns NULL (invisibly), so the results of your cbind and of your indexing of WaFramesNumeric[[i]], which you never assign anywhere, are silently discarded.
Additionally, you are over-indexing your data.frame in the third code block. First, it's using i within the data.frame, even though i is an index within the list of data.frames, not the frame itself. Second (perhaps caused by this), you are trying to index three dimensions of a 2D frame. Just change the last indexing from [,i,combine] to either [,combine] or [combine].
A third problem (though perhaps not seen yet) is that match returns NA if nothing is found, and indexing a frame with an NA returns an error (try mtcars[,NA] to see). I suggest replacing match with grep: it returns integer(0) when nothing is found, which is what you want in this case. Note that grep matches a pattern, so "Device_Name" would also match a column such as "Device_Name_2"; use "^Device_Name$" if you need an exact match.
for (i in seq_along(WaFramesNumeric)) {
f <- which(sapply(WaFramesNumeric[[i]], is.numeric))
m <- grep("Cost_Center", colnames(WaFramesNumeric[[i]]))
n <- grep("Device_Name", colnames(WaFramesNumeric[[i]]))
combine <- unique(c(f,m,n))  # unique() avoids duplicating a named column that is also numeric
WaFramesNumeric[[i]] <- WaFramesNumeric[[i]][combine]
}
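A quick check of the difference between match and grep on the built-in mtcars data, which has no Cost_Center column. It also shows `which(colnames == name)`, an exact-match alternative to grep that is not part of the original answer.

```r
cols <- colnames(mtcars)

# match() signals "not found" with NA, which breaks column indexing
m <- match("Cost_Center", cols)
is.na(m)                      # TRUE

# grep() returns a zero-length integer vector instead, which selects nothing
g <- grep("Cost_Center", cols)
length(g)                     # 0

# Exact-name alternative that also returns integer(0) when the column is absent
w <- which(cols == "Cost_Center")

# Combining a zero-length index with real ones is harmless
sel <- mtcars[c(1L, 2L, g)]
ncol(sel)                     # 2
```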
I'm not sure what you mean by "an attempt to call the column numbers of all integer columns...", but if you want to go through a list of data frames, select columns using some predicate function, and also keep columns given by name, you can do it like this:
df <- data.frame(a=rnorm(20), b=rnorm(20), c=letters[1:20], d=letters[1:20], stringsAsFactors = FALSE)
WaFramesNumeric <- rep(list(df), 2)
Selector <- function(data, select_func, select_names) {
select_func <- match.fun(select_func)
idx_names <- match(select_names, colnames(data))
idx_names <- idx_names[!is.na(idx_names)]
idx_func <- which(sapply(data, select_func))
idx <- unique(c(idx_func, idx_names))
return(data[, idx, drop = FALSE])  # drop = FALSE keeps a data frame even if only one column is selected
}
res <- lapply(X = WaFramesNumeric, FUN = Selector, select_names=c("c"), select_func = is.numeric)

R repeatedly adding similar column from different data frames into one data frame

I have 10 data sets (prediction1.csv, prediction2.csv, ...), and they all have the same columns, e.g. a, b, c, ...
I want to add the "a" column from each of the data sets to a combined data frame "evaluating" and rename the columns accordingly: a1, a2, a3, ...
What I have tried so far is:
I read in the data sets; this part works fine:
for(i in 1:10){
assign(paste("pred.", i, sep = ""), read_csv(paste0("prediction", i, ".csv")))
}
I then tried to assign new columns to the "evaluating" data frame, but this does not work: it creates separate variables named like evaluating[a1] instead of adding a variable to the data frame:
for(i in 1:10){
assign(paste("evaluating[a.", i,"],"), paste0("pred.",i,"$a" ))
}
If you want just the a column of every data.frame, you can try:
files = paste0("prediction",1:10,".csv")
data = lapply(seq_along(files),function(x) {
dat = read.csv(paste0("path/to/file/",files[x]))  # read.csv for comma-separated files; read.csv2 expects ";"
dat = data.frame(dat$a) ; colnames(dat) = paste0("a",x)
return(dat)
})
evaluating = do.call(cbind,data)  # cbind puts the a1, a2, ... columns side by side
For this approach, all prediction data frames need to have the same number of rows as well as a column named a.
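A self-contained sketch of the column-wise version that writes three toy files to tempdir() so it can run anywhere (the file contents are invented). Each file contributes its a column under a new name, and the pieces are combined with cbind, since the goal is one column per file.

```r
# Write three small hypothetical prediction files to a temporary directory
dir <- tempdir()
for (i in 1:3) {
  write.csv(data.frame(a = i * (1:4), b = 0),
            file.path(dir, paste0("prediction", i, ".csv")),
            row.names = FALSE)
}

files <- file.path(dir, paste0("prediction", 1:3, ".csv"))

# Read each file and keep only column 'a', renamed to a1, a2, ...
cols <- lapply(seq_along(files), function(k) {
  dat <- read.csv(files[k])
  setNames(data.frame(dat$a), paste0("a", k))
})

# Bind side by side: one column per file
evaluating <- do.call(cbind, cols)
```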

Splitting large data frame by column into smaller data frames (not lists) using loops

I have many large data frames. Using one of the smaller ones as an example:
dim(ch29)
476 4283
I need to split it into smaller pieces (i.e. subset into 241 columns at the most). My problems come afterwards when I want to analyze these smaller subsets.
I do not know how to subset the large data frame into smaller data frames rather than simply a list.
I also want to do all of this in a loop and give the newly created smaller data frames unique names in the loop.
chunk=241
df<-ch29
n<-ceiling(ncol(df)/chunk)
for (i in 1:n) {
xname <- paste("ch29", i, sep="_")
cat("_", xname)
assign(xname, split(df, rep(1:n, each=chunk, length.out=ncol(df))))
}
I'm not exactly sure what you're trying to do or how you want to choose the columns that go in each data frame, but here's an example of one option:
# Fake data
set.seed(100)
ch29 = as.data.frame(replicate(4283, rnorm(476)))
# Number of columns we want in each split data frame
ncols = floor(ncol(ch29)/20)
# Start column for each split data frame
start = seq(1,ncol(ch29),ncols)
# Split ch29 into a bunch of separate data frames
# pmin() keeps the end of the last range within ncol(ch29)
df.list = lapply(setNames(start, paste0("ch29_", start, "_", pmin(start+ncols-1, ncol(ch29)))),
function(i) ch29[ , i:min(i+ncols-1,ncol(ch29))])
You now have a list, df.list, where each list element is a data frame with ncols columns from ch29, except for the last element of the list, which will have between 1 and ncols columns. Also, the name of each list element is the name of the parent data frame (ch29) and the column range from which the subset data frame is drawn.
Try
for (i in 1:n) { # n = ceiling(ncol(df)/chunk), as computed in the question
xname = paste("ch29", i, sep = "_")
col.min = (i - 1) * chunk + 1
col.max = min(i * chunk, ncol(df))
assign(xname, df[,col.min:col.max, drop = FALSE])
}
In other words, use the notation df[,a:b], where a <= b, to get the subset of the data frame df consisting only of columns a to b (drop = FALSE keeps a single-column subset as a data frame).
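Another way to cover both requests, sketched on a small made-up data frame: split the column indices into groups of at most chunk columns, build the pieces in a list, and, if separate named data frames are really wanted, copy them out with `list2env()`. The names wide and wide_1, wide_2, ... are hypothetical.

```r
# Hypothetical data: a wide data frame with 10 columns
set.seed(1)
wide <- as.data.frame(matrix(rnorm(5 * 10), nrow = 5))

chunk   <- 4                              # at most 4 columns per piece
col_ids <- seq_len(ncol(wide))

# Group column indices: 1-4 -> group 1, 5-8 -> group 2, 9-10 -> group 3
groups <- split(col_ids, ceiling(col_ids / chunk))

# One data frame per group; drop = FALSE keeps one-column pieces as data frames
pieces <- lapply(groups, function(idx) wide[, idx, drop = FALSE])
names(pieces) <- paste0("wide_", seq_along(pieces))

# If separate variables are really wanted, copy the list into the current environment
invisible(list2env(pieces, envir = environment()))
```

Keeping the pieces in the list is usually easier to analyse with lapply(); the list2env() step is only there for the "not lists" requirement.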

Using lapply to apply a function over list of data frames and saving output to files with different names

I have a list of data frames and have given each element in the list (i.e. each data frame) a name:
e.g.
df1 <- data.frame(x = c(1:5), y = c(11:15))
df2 <- data.frame(x = c(1:5), y = c(11:15))
mylist <- list(A = df1, B = df2)
I have a function that I want to apply to each data frame. In this function, I want to include a line that writes the results to a file (eventually I want to do more complicated things, like saving plots of the correlation between two variables for each data frame, but I thought I'd start simple).
e.g.
NewVar <- function(mydata, whichVar, i) {
mydata$newVar <- mydata[, whichVar] + 1
write.csv(mydata, file = i)
}
I want to use lapply() to apply this function to each data frame in my list
something like:
hh<-lapply(mylist, NewVar, whichVar = "y")
I can't figure out how to assign the "i" within the context of lapply so that i iterates over the names in the list of data frames, saving multiple files with different names (in this case, two files named A and B) that correspond with the modified data frames.
It will work with the following lapply call:
lapply(names(mylist), function(x) NewVar(mylist[[x]], "y", x))
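`Map()` is a compact alternative when a function needs both the list element and its name, since it walks several vectors in parallel. In this sketch the outdir argument, the .csv extension, and returning the modified data are additions that are not in the question's NewVar().

```r
# Hypothetical variant of NewVar() that writes into a chosen directory
NewVar <- function(mydata, whichVar, name, outdir = tempdir()) {
  mydata$newVar <- mydata[, whichVar] + 1
  write.csv(mydata, file = file.path(outdir, paste0(name, ".csv")),
            row.names = FALSE)
  mydata                              # return the modified data frame as well
}

df1 <- data.frame(x = 1:5, y = 11:15)
df2 <- data.frame(x = 1:5, y = 11:15)
mylist <- list(A = df1, B = df2)

# Map() pairs each data frame with its name; "y" is recycled for every call
hh <- Map(NewVar, mylist, "y", names(mylist))
```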
There are many options. For example:
lapply(names(mylist),
function(x) write.csv(mylist[[x]],  # [[ ]] extracts the data frame itself
file = paste0(x, '.csv')))
or using indexes:
lapply(seq_along(mylist),
function(i) write.csv(mylist[[i]],
file = paste0(names(mylist)[i], '.csv')))
