how to repeat order for dataframe in r - r

I want to do same order as removing the first rows for several dataframes.
lab1 <- lab1[-c(1),]
lab2 <- lab2[-c(1),]
lab3 <- lab3[-c(1),]
lab4 <- lab4[-c(1),]
lab5 <- lab5[-c(1),]
lab6 <- lab6[-c(1),]
lab7 <- lab7[-c(1),]
lab8 <- lab8[-c(1),]
lab9 <- lab9[-c(1),]
lab10 <- lab10[-c(1),]
...
I want to use repeated phase like this.
for(i in 2:19){ labi <- labi[-c(1),]}
However, labi is recognized as a dataframe name.
I need to do such orders for many dataframes. Can someone help?

Maybe you can try list2env like below
list2env(
lapply(mget(ls(pattern = "lab\\d+")), function(x) x[-1, ]),
envir = .GlobalEnv
)

How do you create the dataframes? Is it possible instead to make a list of lab[[i]]?
Else, you can write the expression as a string then evaluate it, but it's a bit of a hacky way, it would be better to avoid it:
cmd <- paste0("lab", 1, "[c(-1),]")
eval(str2expression(cmd))

Related

Save a dataframe name and then reference that object in subsequent code

Would like to reference a dataframe name stored in an object, such as:
dfName <- 'mydf1'
dfName <- data.frame(c(x = 5)) #want dfName to resolve to 'mydf1', not create a dataframe named 'dfName'
mydf1
Instead, I get: Error: object 'mydf1' not found
CORRECTED SCENARIO:
olddf <- data.frame(c(y = 8))
mydf1 <- data.frame(c(x = 5))
assign('dfName', mydf1)
dfName <- olddf #why isnt this the same as doing "mydf1 <- olddf"?
I don't want to reference an actual dataframe named "dfName", rather "mydf1".
UPDATE
I have found a clunky workaround for what I wanted to do. The code is:
olddf <- data.frame(x = 8)
olddfName <- 'olddf'
newdfName <- 'mydf1'
statement <- paste(newdfName, "<-", olddfName, sep = " ")
writeLines(statement, "mycode.R")
source("mycode.R")
Anyone have a more elegant way, especially without resorting to a write/source?
I am guessing you want to store multiple data.frames in a loop or similar. In that case it is much more efficient and better to store them in a named list. However, you can achieve your goal with assign
assign('mydf1', data.frame(x = 5))
mydf1
x
1 5

cbind dataframe in R with placeholders

Imagine I have three dataframes:
data.frame1 <- data.frame(x=c(1:10))
data.frame2 <- data.frame(x=c(11:20))
data.frame3 <- data.frame(x=c(21:30))
I could bind them together by explicitely naming each of them:
res.data.frame <- cbind(data.frame1, data.frame2, data.frame3)
However, I am looking for more dynamic ways to do so, e.g. with placeholders.
This saves somehow the three dataframes in a new dataframe, but not in a usable format:
res.data.frame1 <- as.data.frame(mapply(get, grep("^data.frame.$", ls(), value=T)))
This command would only save the three names:
res.data.frame2 <- grep(pattern = "^data.frame.$", ls(), value=T)
This one only gives an error message:
res.data.frame3 <- do.call(cbind, lapply(ls(pattern = "^data.frame.$")), get)
Does anyone know the right way to do this?
Something like this maybe?
Assuming ls()
# [1] "data.frame1" "data.frame2" "data.frame3"
as.data.frame(Reduce("cbind", sapply(ls(), function(i) get(i))))
Based on #akrun's comment, this can be simplified to
as.data.frame(Reduce("cbind", mget(ls())))

I'm trying to split a data set into two random fragments, and my code isn't working

I'm trying to do this simple task, all the variables are initialized properly, but for some reason this isn't working. What am I doing wrong?
for(i in 1:117)
{x = runif(1,0,1)
if(x<0.5)
testframe = rbind(utilities[i,])
else
trainframe = rbind(utilities[i,])}
In your loop, you overwrite both testframe and trainframe in each run of the loop. You could use testframe <- rbind(testframe, utilities[i, ]), but this would be quite inefficient.
Here's another approach without loops:
x <- sample(c(TRUE, FALSE), 117, replace = TRUE)
testframe <- utilities[x, ]
trainframe <- utilities[!x, ]
You can also create a list including the two subsets (based on vector x):
split(utilities, x)
If you insist to use a for loop, you can always save your outcome as an empty list and add items to that list:
Here's an untested example (as I do not have the "utilities" data):
testframe <- list()
trainframe <- list()
for(i in 1:117)
{x = runif(1,0,1)
if(x<0.5)
testframe[i] <- utilities[i,] ##whatever you want to save here
else
trainframe = utilities[i,]
}
Hope this helps

globbing dataframes or other objects in R

This should be a simple one, i hope. I have several dataframes loaded into workspace, labelled df01 to df100, not all numbers represented. I'd like to plot a specific column across all datasets, for example in a box plot. How do I refer all objects starting with df, using globbing, ie:
boxplot(df00$col1, df02$col1, df04$col1)
=
boxplot(df*$col1)
The idomatic approach is to work with lists, or to use a separate environment.
You can create this list using ls and pattern
df.names <- ls(pattern = '^df')
# note
# ls(pattern ='^df[[:digit:]]{2,}')
# may be safer if there are objects starting with df you don't want
df.list <- mget(df.names)
# note if you are using a version of R prior to R 3.0.0
# you will need `envir = parent.frame()`
# mget(ls(pattern = 'df'), envir = parent.frame())
# use `lapply` to extract the relevant columns
df.col1 <- lapply(df.list, '[[', 'col1')
# call boxplot
boxplot(df.col1)
Try this:
nums <- sprintf("%02d", 0:100)
dfs.names <- Filter(exists, paste0("df", nums))
dfs.obj <- lapply(dfs.names, get)
dfs.col1 <- lapply(dfs.obj, `[[`, "col1")
do.call(boxplot, dfs.col1)

Split the dataframe into subset dataframes and naming them on-the-fly (for loop)

I have 9880 records in a data frame, I am trying to split it into 9 groups of 1000 each and the last group will have 880 records and also name them accordingly. I used for-loop for 1-9 groups but manually for the last 880 records, but i am sure there are better ways to achieve this,
library(sqldf)
for (i in 0:8)
{
assign(paste("test",i,sep="_"),as.data.frame(final_9880[((1000*i)+1):(1000*(i+1)), (1:53)]))
}
test_9<- num_final_9880[9001:9880,1:53]
also am unable to append all the parts in one for-loop!
#append all parts
all_9880<-rbind(test_0,test_1,test_2,test_3,test_4,test_5,test_6,test_7,test_8,test_9)
Any help is appreciated, thanks!
A small variation on this solution
ls <- split(final_9880, rep(0:9, each = 1000, length.out = 9880)) # edited to Roman's suggestion
for(i in 1:10) assign(paste("test",i,sep="_"), ls[[i]])
Your command for binding should work.
Edit
If you have many dataframes you can use a parse-eval combo. I use the package gsubfn for readability.
library(gsubfn)
nms <- paste("test", 1:10, sep="_", collapse=",")
eval(fn$parse(text='do.call(rbind, list($nms))'))
How does this work? First I create a string containing the comma-separated list of the dataframes
> paste("test", 1:10, sep="_", collapse=",")
[1] "test_1,test_2,test_3,test_4,test_5,test_6,test_7,test_8,test_9,test_10"
Then I use this string to construct the list
list(test_1,test_2,test_3,test_4,test_5,test_6,test_7,test_8,test_9,test_10)
using parse and eval with string interpolation.
eval(fn$parse(text='list($nms)'))
String interpolation is implemented via the fn$ prefix of parse, its effect is to intercept and substitute $nms with the string contained in the variable nms. Parsing and evaluating the string "list($mns)" creates the list needed. In the solution the rbind is included in the parse-eval combo.
EDIT 2
You can collect all variables with a certain pattern, put them in a list and bind them by rows.
do.call("rbind", sapply(ls(pattern = "test_"), get, simplify = FALSE))
ls finds all variables with a pattern "test_"
sapply retrieves all those variables and stores them in a list
do.call flattens the list row-wise.
No for loop required -- use split
data <- data.frame(a = 1:9880, b = sample(letters, 9880, replace = TRUE))
splitter <- (data$a-1) %/% 1000
.list <- split(data, splitter)
lapply(0:9, function(i){
assign(paste('test',i,sep='_'), .list[[(i+1)]], envir = .GlobalEnv)
return(invisible())
})
all_9880<-rbind(test_0,test_1,test_2,test_3,test_4,test_5,test_6,test_7,test_8,test_9)
identical(all_9880,data)
## [1] TRUE

Resources