cbind dataframe in R with placeholders

cbind dataframe in R with placeholders - r

Imagine I have three dataframes:
data.frame1 <- data.frame(x=c(1:10))
data.frame2 <- data.frame(x=c(11:20))
data.frame3 <- data.frame(x=c(21:30))
I could bind them together by explicitely naming each of them:
res.data.frame <- cbind(data.frame1, data.frame2, data.frame3)
However, I am looking for more dynamic ways to do so, e.g. with placeholders.
This saves somehow the three dataframes in a new dataframe, but not in a usable format:
res.data.frame1 <- as.data.frame(mapply(get, grep("^data.frame.$", ls(), value=T)))
This command would only save the three names:
res.data.frame2 <- grep(pattern = "^data.frame.$", ls(), value=T)
This one only gives an error message:
res.data.frame3 <- do.call(cbind, lapply(ls(pattern = "^data.frame.$")), get)
Does anyone know the right way to do this?

Something like this maybe?
Assuming ls()
# [1] "data.frame1" "data.frame2" "data.frame3"
as.data.frame(Reduce("cbind", sapply(ls(), function(i) get(i))))
Based on #akrun's comment, this can be simplified to
as.data.frame(Reduce("cbind", mget(ls())))

Related

how to repeat order for dataframe in r

I want to do same order as removing the first rows for several dataframes.
lab1 <- lab1[-c(1),]
lab2 <- lab2[-c(1),]
lab3 <- lab3[-c(1),]
lab4 <- lab4[-c(1),]
lab5 <- lab5[-c(1),]
lab6 <- lab6[-c(1),]
lab7 <- lab7[-c(1),]
lab8 <- lab8[-c(1),]
lab9 <- lab9[-c(1),]
lab10 <- lab10[-c(1),]
...
I want to use repeated phase like this.
for(i in 2:19){ labi <- labi[-c(1),]}
However, labi is recognized as a dataframe name.
I need to do such orders for many dataframes. Can someone help?

Maybe you can try list2env like below
list2env(
lapply(mget(ls(pattern = "lab\\d+")), function(x) x[-1, ]),
envir = .GlobalEnv
)

How do you create the dataframes? Is it possible instead to make a list of lab[[i]]?
Else, you can write the expression as a string then evaluate it, but it's a bit of a hacky way, it would be better to avoid it:
cmd <- paste0("lab", 1, "[c(-1),]")
eval(str2expression(cmd))

relisting resulted data.frame respect to input list

I feed inputList to my custom function, after several workflows(few simple filtration), I end up with data.frame resultDF, which needed to be relisted. I used relist to make resultDF has the same structure of inputList, but I got an error. Is there any simplest way of relisting resultDF? Can anyone point me out how to make this happen? Any idea? sorry for this simple question.
Here is input data.frame within the list:
inputList <- list(
bar=data.frame(from=c(8,18,33,53),
to=c(14,21,39,61), val=c(48,7,10,8)),
cat=data.frame(from=c(6,15,20,44),
to=c(10,17,34,51), val=c(54,21,14,12)),
foo=data.frame(from=c(11,43), to=c(36,49), val=c(49,13)))
After several workflows, I end up with this data.frame:
resultDF <- data.frame(
from=c(53,8,6,15,11,44,43,44,43),
to=c(61,14,10,17,36,51,49,51,49),
val=c(8,48,54,21,49,12,13,12,13)
)
I need to relist resultDF with the same structure of inputList. I used relit method, but I got an error.
This is my desired list:
desiredList <- list(
bar=data.frame(from=c(8,53), to=c(14,61), val=c(48,8)),
cat=data.frame(from=c(6,15,44,44), to=c(10,17,51,51), val=c(54,21,12,12)),
foo=data.frame(from=c(11,43,43), to=c(36,49,49), val=c(49,13,13))
)
How can I achieve desiredList ? Thanks in advance :)

We can loop through the 'inputList' and check whether the pasted row elements in 'resultDF' are %in% list elements and use that index to subset the 'resultDF'
lapply(inputList, function(x) resultDF[do.call(paste, resultDF) %in% do.call(paste, x),])
Another option is a join and then split. We rbind the 'inputList' to a data.table with an additional column 'grp' specifying the list names, join with the 'resultDF' on the column names of 'resultDF', and finally split the dataset using the 'grp' column
library(data.table)
dt <- rbindlist(inputList, idcol = "grp")[resultDF, on = names(resultDF)]
split(dt[,-1, with = FALSE], dt$grp)

globbing dataframes or other objects in R

This should be a simple one, i hope. I have several dataframes loaded into workspace, labelled df01 to df100, not all numbers represented. I'd like to plot a specific column across all datasets, for example in a box plot. How do I refer all objects starting with df, using globbing, ie:
boxplot(df00$col1, df02$col1, df04$col1)
=
boxplot(df*$col1)

The idomatic approach is to work with lists, or to use a separate environment.
You can create this list using ls and pattern
df.names <- ls(pattern = '^df')
# note
# ls(pattern ='^df[[:digit:]]{2,}')
# may be safer if there are objects starting with df you don't want
df.list <- mget(df.names)
# note if you are using a version of R prior to R 3.0.0
# you will need `envir = parent.frame()`
# mget(ls(pattern = 'df'), envir = parent.frame())
# use `lapply` to extract the relevant columns
df.col1 <- lapply(df.list, '[[', 'col1')
# call boxplot
boxplot(df.col1)

Try this:
nums <- sprintf("%02d", 0:100)
dfs.names <- Filter(exists, paste0("df", nums))
dfs.obj <- lapply(dfs.names, get)
dfs.col1 <- lapply(dfs.obj, `[[`, "col1")
do.call(boxplot, dfs.col1)

Split the dataframe into subset dataframes and naming them on-the-fly (for loop)

I have 9880 records in a data frame, I am trying to split it into 9 groups of 1000 each and the last group will have 880 records and also name them accordingly. I used for-loop for 1-9 groups but manually for the last 880 records, but i am sure there are better ways to achieve this,
library(sqldf)
for (i in 0:8)
{
assign(paste("test",i,sep="_"),as.data.frame(final_9880[((1000*i)+1):(1000*(i+1)), (1:53)]))
}
test_9<- num_final_9880[9001:9880,1:53]
also am unable to append all the parts in one for-loop!
#append all parts
all_9880<-rbind(test_0,test_1,test_2,test_3,test_4,test_5,test_6,test_7,test_8,test_9)
Any help is appreciated, thanks!

A small variation on this solution
ls <- split(final_9880, rep(0:9, each = 1000, length.out = 9880)) # edited to Roman's suggestion
for(i in 1:10) assign(paste("test",i,sep="_"), ls[[i]])
Your command for binding should work.
Edit
If you have many dataframes you can use a parse-eval combo. I use the package gsubfn for readability.
library(gsubfn)
nms <- paste("test", 1:10, sep="_", collapse=",")
eval(fn$parse(text='do.call(rbind, list($nms))'))
How does this work? First I create a string containing the comma-separated list of the dataframes
> paste("test", 1:10, sep="_", collapse=",")
[1] "test_1,test_2,test_3,test_4,test_5,test_6,test_7,test_8,test_9,test_10"
Then I use this string to construct the list
list(test_1,test_2,test_3,test_4,test_5,test_6,test_7,test_8,test_9,test_10)
using parse and eval with string interpolation.
eval(fn$parse(text='list($nms)'))
String interpolation is implemented via the fn$ prefix of parse, its effect is to intercept and substitute $nms with the string contained in the variable nms. Parsing and evaluating the string "list($mns)" creates the list needed. In the solution the rbind is included in the parse-eval combo.
EDIT 2
You can collect all variables with a certain pattern, put them in a list and bind them by rows.
do.call("rbind", sapply(ls(pattern = "test_"), get, simplify = FALSE))
ls finds all variables with a pattern "test_"
sapply retrieves all those variables and stores them in a list
do.call flattens the list row-wise.

No for loop required -- use split
data <- data.frame(a = 1:9880, b = sample(letters, 9880, replace = TRUE))
splitter <- (data$a-1) %/% 1000
.list <- split(data, splitter)
lapply(0:9, function(i){
assign(paste('test',i,sep='_'), .list[[(i+1)]], envir = .GlobalEnv)
return(invisible())
})
all_9880<-rbind(test_0,test_1,test_2,test_3,test_4,test_5,test_6,test_7,test_8,test_9)
identical(all_9880,data)
## [1] TRUE

Rename columns of a data frame by searching column name

I am writing a wrapper to ggplot to produce multiple graphs based on various datasets. As I am passing the column names to the function, I need to rename the column names so that ggplot can understand the reference.
However, I am struggling with renaming of the columns of a data frame
here's a data frame:
df <- data.frame(col1=1:3,col2=3:5,col3=6:8)
here are my column names for search:
col1_search <- "col1"
col2_search <- "col2"
col3_search <- "col3"
and here are column names to replace:
col1_replace <- "new_col1"
col2_replace <- "new_col2"
col3_replace <- "new_col3"
when I search for column names, R sorts the column indexes and disregards the search location.
for example, when I run the following code, I expected the new headers to be new_col1, new_col2, and new_col3, instead the new column names are: new_col3, new_col2, and new_col1
colnames(df)[names(df) %in% c(col3_search,col2_search,col1_search)] <- c(col3_replace,col2_replace,col1_replace)
Does anyone have a solution where I can search for column names and replace them in that order?

require(plyr)
df <- data.frame(col2=1:3,col1=3:5,col3=6:8)
df <- rename(df, c("col1"="new_col1", "col2"="new_col2", "col3"="new_col3"))
df
And you can be creative in making that second argument to rename so that it is not so manual.

> names(df)[grep("^col", names(df))] <-
paste("new", names(df)[grep("^col", names(df))], sep="_")
> names(df)
[1] "new_col1" "new_col2" "new_col3"
If you want to replace an ordered set of column names with an arbitrary character vector, then this should work:
names(df)[sapply(oldNames, grep, names(df) )] <- newNames
The sapply()-ed grep will give you the proper locations for the 'newNames' vector. I suppose you might want to make sure there are a complete set of matches if you were building this into a function.

hmm, this might be way to complicated, but the first that come into my mind:
lookup <- data.frame(search = c(col3_search,col2_search,col1_search),
replace = c(col3_replace,col2_replace,col1_replace))
colnames(df) <- lookup$replace[match(lookup$search, colnames(df))]

I second #justin's aes_string suggestion. But for future renaming you can try.
require(stringr)
df <- data.frame(col1=1:3,col2=3:5,col3=6:8)
oldNames <- c("col1", "col2", "col3")
newNames <- c("new_col1", "new_col2", "new_col3")
names(df) <- str_replace(string=names(df), pattern=oldNames, replacement=newNames)

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

cbind dataframe in R with placeholders - r

Something like this maybe? Assuming ls() # [1] "data.frame1" "data.frame2" "data.frame3" as.data.frame(Reduce("cbind", sapply(ls(), function(i) get(i)))) Based on #akrun's comment, this can be simplified to as.data.frame(Reduce("cbind", mget(ls())))

Related

how to repeat order for dataframe in r

relisting resulted data.frame respect to input list

globbing dataframes or other objects in R

Split the dataframe into subset dataframes and naming them on-the-fly (for loop)

Rename columns of a data frame by searching column name

Categories

Resources