Reading variable name of dataframe into another dataframe using loops - r

Using a for loop, I was able to break my 1.1-million-row dataset in R into 110 tables of approximately 10,000 rows each, in hopes of getting R to handle the data better. I now want to run another for loop that assigns the values in each of these tables to a different dataframe name.
My table names are:
Pom_1
Pom_2
Pom_3
...
Pom_110
What I want to do is create a for loop like the following:
for (i in 1:110)
{
Pom <- read.table(paste("Pom",i,sep = "_"))
for (j in 1:nrows(Pom))
{do something}
}
So I want to loop through the tables and assign the values of each Pom table to "Pom" so that I can then run a for loop on each subsection of Pom. The problem is that the read.table function does not seem to be the right one. Any ideas?

Can you give a more specific example of what you want to do within each dataframe? You should avoid using the inner loop when possible, and if you really need one, have a look at ?apply
nrow instead of nrows

This is a generic solution using an example data.frame. The function you're looking for is assign; check its help page:
Pom = data.frame(x = rnorm(30)) # original data.frame
n.tables = 3 # number of new data.frames you want to create
Pom.names = paste("Pom", 1:n.tables, sep = "") # names of all new data.frames
breaks = nrow(Pom) / n.tables * 0:n.tables # breaks of the original data.frame
for (i in 1:n.tables) {
  rows = (breaks[i] + 1):breaks[i + 1] # rows of Pom that go into the new data.frame
  # drop = FALSE keeps the result a data.frame even though Pom has a single column
  assign(Pom.names[i], Pom[rows, , drop = FALSE]) # create new data.frame
}
ls()
[1] "breaks" "i" "n.tables" "Pom" "Pom.names" "Pom1"
[7] "Pom2" "Pom3" "rows"
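A list-based variant of the same idea avoids creating one global variable per piece. This is only a sketch under the same assumptions as the example data above; `chunk_id`, `Pom_list` and `chunk_rows` are names invented here for illustration.

```r
# Sketch: split a data frame into equal-sized chunks held in one named list.
# `chunk_id`, `Pom_list` and `chunk_rows` are illustrative names.
Pom <- data.frame(x = rnorm(30))
n.tables <- 3

# Assign each row to a chunk: rows 1-10 -> 1, rows 11-20 -> 2, rows 21-30 -> 3
chunk_id <- ceiling(seq_len(nrow(Pom)) / (nrow(Pom) / n.tables))

# split() on a data frame returns a list of data frames, one per chunk
Pom_list <- split(Pom, chunk_id)
names(Pom_list) <- paste0("Pom", seq_along(Pom_list))

# Work on every chunk without separate global variables
chunk_rows <- vapply(Pom_list, nrow, integer(1))
```

Each piece is then available as Pom_list[["Pom1"]], Pom_list[["Pom2"]], and so on, and can be processed with lapply or a for loop over the list.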

I'm willing to bet the problem with your table call is that you aren't specifying the file extension (assuming Pom_1 - Pom_110 are files in your working directory, which I think they are since you're using read.table).
You can fix it as follows:
fileExtension <- ".xls" # specify your extension, I assume xls
for (i in 1:110) {
  tablename <- paste("Pom", i, sep = "_")
  Pom <- read.table(paste(tablename, fileExtension, sep = ""))
  for (j in seq_len(nrow(Pom))) { # nrow(), not nrows()
    # do something with row j
  }
}
Of course that's assuming a couple things about how everything in your problem is set up, but it's my best guess based on your description and code

Related

Change all R columns names using a reference file

I am trying to rename columns in a dataframe in R. However, the renaming has circular references, and these cannot be avoided, so I would like a solution that copes with them. One idea was to rename a column and move it to a new dataframe, thereby avoiding the circular referencing. However, I have been unable to do so.
The renaming reference is as follows:
The current function I am using is as follows:
library(data.table) # provides setnames()
# current_name and standard_name are the names of the mapping.col columns
# holding the existing and the new column names
standard_mapping <- function(mapping.col, current_name, standard_name, data) {
  for (i in seq_len(nrow(mapping.col))) {
    print(i)
    std.name <- mapping.col[i, standard_name]
    data.name <- mapping.col[i, current_name]
    if (data.name %in% colnames(data)) {
      setnames(data, old = data.name, new = std.name)
    }
  }
  return(data)
}
mapping.col is the renaming reference table shown in the image.
You can rename multiple columns at the same time, and there's no need to move the data itself that's stored in your data.frame. If you know the right order, you can just use
names(data) <- mapping.col$new_name
If the order is different, you can use match to first match them to the right positions:
names(data) <- mapping.col$new_name[match(names(data), mapping.col$old_name)]
By the way, setting names and other attributes is always done by some sort of assignment. setNames returns a renamed copy, which still needs to be assigned.
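To see why the match approach copes with circular renames, here is a small sketch. The mapping table and data are made up for illustration, and the ifelse step (keeping names that have no entry in the mapping) is an addition not present in the answer above.

```r
# Sketch: rename columns via a lookup table, including a "circular" swap (a <-> b).
# The mapping and data below are invented for illustration.
mapping.col <- data.frame(
  old_name = c("a", "b", "c"),
  new_name = c("b", "a", "z"),
  stringsAsFactors = FALSE
)
data <- data.frame(b = 1, c = 2, a = 3, untouched = 4)

# Look up every current name at once, so earlier renames cannot affect later ones
idx <- match(names(data), mapping.col$old_name)

# Keep the existing name wherever the mapping has no entry (idx is NA)
names(data) <- ifelse(is.na(idx), names(data), mapping.col$new_name[idx])

names(data)
# [1] "a"         "z"         "b"         "untouched"
```

Because all new names are computed from the original names in a single step, the a/b swap does not trip over itself the way a one-column-at-a-time loop can.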

Assign name to a substring in a loop importing raster files

I'm importing some raster files from a PostgreSQL connection into R in a loop. I want to assign my newly gained rasters automatically to a variable whose name is derived from the input variable like this: substring(crop, 12)
crop <- "efsa_capri_barley"
ras <- readGDAL(sprintf("PG:dbname='' host='' port='' user='' schema='' table='%s' mode=2", crop))
paste0(substring(crop, 12)) <- raster(ras, 1)
Which function do I have to use so that R treats the result of substring() as a character string to be used as the variable name, and not as a function call? I was thinking about paste() but it doesn't work.
Probably this question has already been asked but I couldn't find a proper answer.
Based on your description, assign is technically correct, but recommending it is bad advice.
If you are pulling in multiple rasters in a loop, best practice in R is to initialize a list to hold all the resulting rasters and name each list element accordingly. You can do this one at a time:
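# n is the number of rasters
# start from an empty list: assigning by name to a list preallocated with
# vector("list", n) would append new elements after the n empty ones
raster_list <- list()
for (i in seq_len(n)) {
  ...
  # crop[i] is the ith crop name
  raster_list[[substring(crop[i], 12)]] <- raster(...)
}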
You can also set the names of each element of the list all at once via setNames. But you should try to avoid using assign pretty much at all costs.
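As a rough illustration of the setNames route, the sketch below builds the whole named list in one go. `load_layer` is a hypothetical stand-in for the readGDAL()/raster() calls and just returns a small matrix so the example runs without a database; the second crop name is also made up.

```r
# Sketch: collect per-crop results in one named list instead of separate variables.
# `load_layer` is a hypothetical placeholder for the readGDAL()/raster() calls.
crop <- c("efsa_capri_barley", "efsa_capri_maize")

load_layer <- function(table_name) {
  # Placeholder: returns a small matrix rather than a real raster
  matrix(nchar(table_name), nrow = 2, ncol = 2)
}

# lapply() returns one element per crop; setNames() names them all at once
raster_list <- setNames(lapply(crop, load_layer), substring(crop, 12))

names(raster_list)
# [1] "barley" "maize"
```

Individual results are then retrieved with raster_list[["barley"]] or raster_list$maize.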
If I understand your question correctly, you are looking for something like assign. For example you can try this:
assign(substring(crop, 12), raster(ras, 1))
To understand how assign works, you can check this code:
x <- 2
# x is now 2
var_to_assign <- "x"
assign(var_to_assign, 3)
# x is now set to 3
x
# 3
Does that give you what you want?

How to change name referenced in a for loop -- R

I have a linear program in my R code. I am passing data frames through it that have similar names "FP_2013_01", "FP_2013_02", "FP_2014_01", etc. I would like the for loop to adjust the "2013" and "01" values dynamically, so I don't have to repeat the process over and over again. Here is the linear program:
num_constraints <- 5
dec_var <-length(FP_2013_01$PLAYER)
test <- make.lp(num_constraints,dec_var)
set.type(test,{1:dec_var},"binary")
set.objfn(test,c(FP_2013_01$avg_FD_PTS))
set.row(test,1,c(FP_2013_01$Wk1))
set.row(test,2,c(POS_FP_2013_01$QB))
set.constr.type(test,c(3,3,3,3),{2:5})
set.row(test,3,c(POS_FP_2013_01$RB))
set.row(test,4,c(POS_FP_2013_01$WR))
set.row(test,5,c(POS_FP_2013_01$TE))
set.rhs(test,c(50000,1,2,3,1))
lp.control(test,sense='max')
write.lp(test,'model.lp',type='lp')
solve(test)
get.objective(test)
I would like to add something like: for (i in 2013:2014) { for (j in 1:10)...}
Thoughts?
Renaming in R can be done by assigning the object to a new name with = or <- and then removing the original, for example:
newFP = FP_2013_01
remove(FP_2013_01)
I use remove() regularly to clean up my workspace from big data.frames and data.tables.
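For the loop the question asks about, one possible sketch is to build each data frame name with sprintf and fetch the object with get. The two toy data frames below stand in for the real FP_* tables, and `results` and `df_name` are names invented here; the linear program itself is left out.

```r
# Sketch: build the data frame names from year and week, then fetch each object.
# The two toy data frames stand in for the real FP_* tables.
FP_2013_01 <- data.frame(PLAYER = c("A", "B"), avg_FD_PTS = c(10, 12))
FP_2014_01 <- data.frame(PLAYER = c("C", "D", "E"), avg_FD_PTS = c(8, 9, 11))

results <- list()
for (i in 2013:2014) {
  for (j in 1:2) {
    df_name <- sprintf("FP_%d_%02d", i, j)  # e.g. "FP_2013_01"
    if (!exists(df_name)) next               # skip combinations that were never created
    FP <- get(df_name)                       # look the data frame up by its name
    # The linear program from the question would go here, using FP
    # in place of FP_2013_01; this sketch only records the number of players.
    results[[df_name]] <- length(FP$PLAYER)
  }
}
```

Storing the tables in a named list from the start would remove the need for get and exists altogether.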

Cleaning up the global environment after sourcing: How to remove objects of a certain type in R

I read in a public-use dataset that created dozens of temporary vectors in the process of building a final dataframe. Since this dataframe will be analyzed as part of a larger process, I plan on sourcing the R script that creates the dataframe, but do not want to leave myself or future users with a cluttered global environment.
I know that I can use ls to list the current objects in my global environment and use rm to remove certain objects, but I'm unsure of how to use those two functions in concert to remove all objects except the dataframe created by a certain script.
To clarify, here is a reproducible example:
Script 1, named "script1.R"
setwd("C:/R/project")
set.seed(12345)
var <- letters
for (i in var) {
assign(i, runif(1))
}
df <- data.frame(x1 = a, x2 = b, x3 = c)
Script 2
source("script1.R")
It would be easy enough to remove all vectors from the sourced script by some combination of rm, ls with pattern = letters or something like that, but what I want to do is create a general function that removes ALL vectors created by a certain script and only retain the dataframe (in this example, df).
(NOTE: There are similar questions as this here and here, but I feel mine is different in that it is more specific to sourcing and cleaning in the context of a multi-script project).
Update
While looking around, the following link gave me a nice work around:
How can I neatly clean my R workspace while preserving certain objects?
Specifically, user @Fojtasek suggested:
I would approach this by making a separate environment in which to store all the junk variables, making your data frame using with(), then copying the ones you want to keep into the main environment. This has the advantage of being tidy, but also keeping all your objects around in case you want to look at them again.
So I could just append the source code that creates the dataframe as follows...
temp <- new.env()
with(temp, {
  var <- letters
  for (i in var) {
    assign(i, runif(1))
  }
  df <- data.frame(x1 = a, x2 = b, x3 = c)
})
... and then just extract the desired dataframe (df) to my global environment, but I'm curious if there are other elegant solutions, or if I'm thinking about this incorrectly.
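A related sketch, assuming the script can be left unmodified: run it inside a scratch environment with sys.source and copy out only the data frame. A temporary file stands in for script1.R here so the example is self-contained, and its contents are a shortened version of the script above.

```r
# Sketch: run a script in its own environment and copy out only the data frame.
# A temporary file stands in for script1.R so the example is self-contained.
script_path <- tempfile(fileext = ".R")
writeLines(c(
  "a <- runif(1)",
  "b <- runif(1)",
  "c <- runif(1)",
  "df <- data.frame(x1 = a, x2 = b, x3 = c)"
), script_path)

temp <- new.env()
sys.source(script_path, envir = temp)  # objects the script creates land in temp

df <- temp$df                          # copy only the data frame out of temp
rm(temp)                               # discard the scratch environment
```

The temporary vectors never reach the global environment, so there is nothing to clean up afterwards apart from the scratch environment itself.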
Thanks.
As an alternative approach (similar to @Ken's suggestion from the comments), the following code allows you to delete all objects created after a certain point, except one (or more) that you specify:
freeze <- ls() # all objects created after here will be deleted
var <- letters
for (i in var) {
assign(i, runif(1))
}
df <- data.frame(x1 = a, x2 = b, x3 = c)
rm(list = setdiff(ls(), c(freeze, "df"))) # delete the new objects except df
The workhorse here is setdiff(), which returns the items that appear in the first vector but not in the second: in this case, all objects created after freeze except df. As an added bonus, freeze itself is deleted as well.
This should work.
source(file="script1.R")
rm(list=ls()[!sapply(mget(ls(),.GlobalEnv), is.data.frame)])
Breaking it down:
mget(ls()) gets all the objects in the global environment
!sapply(..., is.data.frame) determines which objects are not data.frames
rm(list=ls()[...]) removes only the objects that are not data.frames
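The same idea can be wrapped in a small function that works on any environment, which makes it easy to try on a scratch environment first. This is a sketch; `drop_non_data_frames` is a hypothetical helper name, not an existing R function.

```r
# Sketch: remove every object that is not a data frame from a given environment.
# `drop_non_data_frames` is a hypothetical helper; `env` defaults to the
# global environment, matching the one-liner.
drop_non_data_frames <- function(env = globalenv()) {
  objs <- ls(envir = env)
  is_df <- vapply(objs, function(nm) is.data.frame(get(nm, envir = env)), logical(1))
  rm(list = objs[!is_df], envir = env)
  invisible(objs[is_df])  # names of the objects that were kept
}

# Try it on a scratch environment rather than the real workspace
scratch <- new.env()
assign("a", 1, envir = scratch)
assign("b", letters, envir = scratch)
assign("df", data.frame(x1 = 1), envir = scratch)
kept <- drop_non_data_frames(scratch)
ls(envir = scratch)
# [1] "df"
```

Note that this keeps every data frame in the environment, not only the one created by the sourced script.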
I have scripts like this one save their result as an RDS file, and then I load the result in a new session (or, alternatively, after clearing everything). That is,
a <- 1
saveRDS(a, file="a.RDS")
rm(list=ls())
a <- readRDS("a.RDS")
a
## [1] 1

using a for loop to add columns to a data frame

I am new to R and it seems like this shouldn't be a difficult task but I cannot seem to find the answer I am looking for. I am trying to add multiple vectors to a data frame using a for loop. This is what I have so far and it works as far as adding the correct columns but the variable names are not right. I was able to fix them by using rename.vars but was wondering if there was a way without doing that.
for (i in 1:5) {
if (i==1) {
alldata<-data.frame(IA, rand1) }
else {
alldata<-data.frame(alldata, rand[[i]]) }
}
Instead of the variable names being rand2, rand3, rand4, rand5, they show up as rand..i.., rand..i...1, rand..i...2, and rand..i...3.
Any Suggestions?
You can set variable names using the colnames function. Therefore, your code would look something like:
newdat <- cbind(IA, rand1, rand[2:5])
colnames(newdat) <- c(colnames(IA), paste0("rand", 1:5))
If you're creating your variables in a loop, you can assign the names during the loop
alldata <- data.frame(IA)
for (i in 1:5) {alldata[, paste0('rand', i)] <- rand[[i]]}
However, loops that grow a data frame one column at a time can be slow in R, so if you are trying to do this with tens of thousands of columns, the cbind and rename approach will likely be much faster.
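The two approaches can be compared side by side in a small sketch. IA and rand are toy stand-ins here (a one-column data frame and a list of five numeric vectors), since the original objects are not shown.

```r
# Sketch: two ways to add named columns; IA and rand are toy stand-ins.
IA <- data.frame(id = 1:3)
rand <- lapply(1:5, function(i) i * c(0.1, 0.2, 0.3))  # list of 5 numeric vectors

# Loop version: the column name is built explicitly, so no "rand..i.." names appear
alldata <- IA
for (i in seq_along(rand)) {
  alldata[[paste0("rand", i)]] <- rand[[i]]
}

# One-call version: name the list once, then bind everything together
alldata2 <- cbind(IA, setNames(rand, paste0("rand", seq_along(rand))))

names(alldata)
# [1] "id"    "rand1" "rand2" "rand3" "rand4" "rand5"
```

Both versions end up with the same column names; the second avoids the loop entirely.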
Just do cbind(IA, rand1, rand[2:5]).
