Losing row names during for loop - r

I have the following code:
supply = vector(length = 64, mode = 'list')
for (i in 1:64) {
supply[[i]] = df3[rownames(df6),]*df6[,i]
}
names(supply) <- sheetnames
Both df3 and df6 have row names, which are used to match the rows for the 64 new tables. In these new tables the row names disappear (the column names are still there). How do I keep the row names in my results? I need to export the tables to Excel, including the row names that are matched in the for loop.
Edit:
I tried the following:
supply = vector(length = 64, mode = 'list')
for (i in 1:64) {
supply[[i]] = df3[rownames(df6),]*df6[,i]
row.names(supply[[i]]) = row.names(df6)}
But it does not work.

You can try this. It should return exactly what you're looking for, except that the row names are stored in a column named rowname.
# get initial columnnames
colnames3 <- names(df3)
colnames6 <- names(df6)
# set rownames as a column names "rowname"
df3 <- tibble::rownames_to_column(df3)
df6 <- tibble::rownames_to_column(df6)
# join by rowname
df3 <- dplyr::inner_join(df3, df6, by = "rowname")
# define the columns you need
out <- df3[c("rowname", colnames3)]
# your loop!
supply <- lapply(colnames6, function(col){
out[colnames3] <- df3[colnames3] * df3[[col]] # [[ ]] gives a plain vector, even if df3 is a tibble
out
})
Without a reproducible example, it is difficult to help further.
lapply returns a list, so you don't need to initialize supply beforehand, and you don't need a for loop either.
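For comparison, here is a base R sketch that keeps the row names without converting them to a column. It assumes df3 and df6 are plain data.frames (not tibbles, which do not keep row names) and that every row name of df6 exists in df3; the toy data below is hypothetical. Multiplying a data frame by a vector recycles the vector down each column, so each row of df3 is scaled by its matching entry in df6[[col]].

```r
# Hypothetical toy data: plain data.frames (not tibbles) with row names
df3 <- data.frame(x = c(1, 2, 3), y = c(10, 20, 30),
                  row.names = c("r1", "r2", "r3"))
df6 <- data.frame(s1 = c(2, 0.5), s2 = c(1, 3),
                  row.names = c("r3", "r1"))

supply <- lapply(names(df6), function(col) {
  # Reorder df3's rows to match df6, then scale each row by df6[[col]]
  out <- df3[rownames(df6), , drop = FALSE] * df6[[col]]
  # Set the row names explicitly instead of relying on them being carried through
  rownames(out) <- rownames(df6)
  out
})
names(supply) <- names(df6)  # the question uses sheetnames here
```

Many Excel writers ignore row names, so converting them to a column just before export (as the answer above does with tibble::rownames_to_column) is still a reasonable final step.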

Related

How can lapply work with addressing columns as unknown variables?

So, I have a list of strings named control_for. I have a data frame, sampleTable, in which some of the column names match strings in control_for. And I have a third object, dge_obj (a DGEList object), to which I want to append those columns. What I want to do is use lapply to loop through control_for and, for each string, find the column in sampleTable with the same name and add that column (as a factor) to the DGEList object. For example, doing it manually for just one string looks like this, and it works:
group <- as.factor(sampleTable[,3])
dge_obj$samples$group <- group
And I tried something like this:
lapply(control_for, function(x) {
x <- as.factor(sampleTable[, x])
dge_obj$samples$x <- x
})
Which doesn't work. I guess the problem is that R can't recognize addressing columns like this. Can someone help?
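Two separate things go wrong in that attempt, and a small base R demonstration (with made-up objects df and obj) shows both: `$` uses the name after it literally, so `dge_obj$samples$x` creates a column called "x" rather than one named after the value of x; and an assignment inside the function passed to lapply() only changes a local copy, so the original object is left untouched.

```r
df <- data.frame(a = 1:2)
x <- "b"
df$x <- 3:4      # `$` takes the name literally: adds a column called "x"
df[[x]] <- 5:6   # `[[` evaluates x: adds a column called "b"
print(names(df)) # "a" "x" "b"

# Assignments inside the function passed to lapply() modify a local copy
obj <- list(samples = data.frame(a = 1:2))
invisible(lapply("b", function(nm) obj$samples[[nm]] <- 3:4))
print(names(obj$samples)) # still just "a"
```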
Here are two base R ways of doing it. The data set is the example from help("DGEList") plus a mock-up data.frame, sampleTable.
Define a vector common_vars of the names in control_for that are also column names of sampleTable. Then create the new columns.
library(edgeR)
sampleTable <- data.frame(a = 1:4, b = 5:8, no = letters[21:24])
control_for <- c("a", "b")
common_vars <- intersect(control_for, names(sampleTable))
1. for loop
for(x in common_vars){
y <- sampleTable[[x]]
dge_obj$samples[[x]] <- factor(y)
}
2. *apply loop.
tmp <- lapply(sampleTable[common_vars], factor) # lapply keeps each column a factor; sapply would simplify them into a character matrix
dge_obj$samples <- cbind(dge_obj$samples, tmp)
This code can be rewritten as a one-liner.
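A possible one-line version, sketched below, assigns the list of factors straight into the samples data frame by name. To keep the sketch runnable without edgeR, dge_obj here is a hypothetical stand-in: a plain list whose samples element mimics the data frame a real DGEList carries.

```r
# Stand-in for an edgeR DGEList: only the $samples data frame matters here
dge_obj <- list(samples = data.frame(group = factor(rep(1:2, each = 2)),
                                     lib.size = c(100, 120, 90, 110),
                                     norm.factors = 1))
sampleTable <- data.frame(a = 1:4, b = 5:8, no = letters[21:24])
control_for <- c("a", "b")
common_vars <- intersect(control_for, names(sampleTable))

# One line: convert the selected columns to factors and add them by name
dge_obj$samples[common_vars] <- lapply(sampleTable[common_vars], factor)
```

This assumes the rows of sampleTable are in the same sample order as dge_obj$samples, which the approaches above assume as well.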
Data
set.seed(2021)
y <- matrix(rnbinom(10000,mu=5,size=2),ncol=4)
dge_obj <- DGEList(counts=y, group=rep(1:2,each=2))

for loops nested in R

I have a data frame, dt, that stores a list of data set names. I need to use them to create new data sets by selecting some variables, then use the data set I just created to repeat the same process, and so on.
The first and second rows are data sets that are already available.
The third row uses an available data set to create a new one.
The fourth row uses the data set just created to create another new one.
The final output should be a list of data sets.
I'd appreciate any help or suggestions.
dt <- data.frame(name = c("mtcars","iris", "mtcars_new","mtcars_new_1"),
data_source = c("mtcars","iris", "mtcars","mtcars_new"),
variable = c("","","mpg,cyl,am,hp","mpg,cyl"), stringsAsFactors = FALSE)
> dt
name data_source variable
1 mtcars mtcars
2 iris iris
3 mtcars_new mtcars mpg,cyl,am,hp
4 mtcars_new_1 mtcars_new mpg,cyl
dt_list <- list(mtcars, iris)
names(dt_list ) <- c("mtcars","iris")
# The final list of datasets
final_dt <- list(mtcars, iris, mtcars_new, mtcars_new_1)
So far, with a loop like the one below, I only get the mtcars_new data set, and I don't know how to go back to the list and keep looping to get mtcars_new_1 and so on. I have many data sets, and I don't know in advance how many levels of nesting there are.
mtcars_new <- data.frame()
for(i in 1:nrow(dt)){
if(dt$data_source[[i]] %in% names(dt_list) && !dt$name[[i]] %in% names(dt_list)){
check <- eval(parse(text = dt$data_source[[i]]))
var <- c(unlist(strsplit(dt$variable[[i]],",")))
mtcars_new <- check[, colnames(check) %in% var]
}
}
This will produce the desired output shown. Since the fourth iteration uses the data set created in the third, you need a way to append the result of each iteration to a growing list of available data sets. Then, within each iteration, find the right starting data set in the available list.
dt <- data.frame(name = c("mtcars","iris", "mtcars_new","mtcars_new_1"),
data_source = c("mtcars","iris", "mtcars","mtcars_new"),
variable = c("","","mpg,cyl,am,hp","mpg,cyl"), stringsAsFactors = FALSE)
input_data_sets <- list(mtcars, iris)
names(input_data_sets) <- c("mtcars","iris")
final_data_sets <- list()
for(i in 1:nrow(dt)) {
available_data_sets <- c(input_data_sets, final_data_sets) #Grows a list of all available data sets
num_to_use <- which(dt$data_source[[i]] == names(available_data_sets)) #finds the right list member to use
temp <- available_data_sets[num_to_use][[1]]
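var <- unlist(strsplit(dt$variable[[i]],","))
var <- var[nzchar(var)] #an empty "variable" entry yields no column names
if (length(var) > 0) temp <- subset(temp, select = var) #keep only the desired variables; with none listed, keep all columns
temp <- list(temp)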
names(temp) <- dt$name[i] #assign the name provided
final_data_sets <- c(final_data_sets, temp) #add to list of final data sets which will be the output. Anything listed here will become part of the available list in the next loop
}
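If the rows of dt are not guaranteed to appear in dependency order, one alternative (a swapped-in technique, recursion, rather than the loop above) is to build each data set by first building its data_source. The helper name build_set is hypothetical; this sketch does not detect circular definitions and rebuilds shared sources each time they are needed, which is fine for a handful of data sets.

```r
dt <- data.frame(name = c("mtcars", "iris", "mtcars_new", "mtcars_new_1"),
                 data_source = c("mtcars", "iris", "mtcars", "mtcars_new"),
                 variable = c("", "", "mpg,cyl,am,hp", "mpg,cyl"),
                 stringsAsFactors = FALSE)
base_sets <- list(mtcars = mtcars, iris = iris)

# Build the data set called `nm`, recursing through its data_source chain
build_set <- function(nm) {
  if (nm %in% names(base_sets)) return(base_sets[[nm]])  # already available
  row <- dt[dt$name == nm, , drop = FALSE]
  if (nrow(row) != 1) stop("no unique recipe for data set: ", nm)
  src <- build_set(row$data_source)  # build the source first (may recurse further)
  vars <- trimws(strsplit(row$variable, ",")[[1]])
  vars <- vars[nzchar(vars)]
  if (length(vars) == 0) src else src[vars]  # no variables listed: keep all columns
}

final_data_sets <- setNames(lapply(dt$name, build_set), dt$name)
```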

Rename the same column in a list of identical data frames in r

I'm fairly new to R and I was wondering if someone could help me?
I have a list of identical data frames (df1, df2, ..., df9) and I'm trying to rename one of the columns, 'value', in all the data frames to be 'value_dataframename'- the renamed column in all 9 data frames should be value_df1 in df1, value_df2 in df2, ..., value_df9 in df9.
Any help would be much appreciated!
The code below, using an example list (auto.list), does what you want. Run it to check.
To use it for your own list:
skip the code up to the your.list <- ... line,
assign your list to the your.list object,
assign your column name ("value") to term.
auto.list <- list()
for (i in seq_len(10)) {
auto.list[[i]] <- data.frame("a" = 1:i, "value" = sample(letters, i))
names(auto.list)[i] <- paste0("df", i)
}
your.list <- auto.list # assign to your.list your own list
term <- "value" # assign your own "value"
for (i in seq_along(your.list)) {
colnames(your.list[[i]])[colnames(your.list[[i]]) == term] <- paste0(term, "_", names(your.list)[i])
}
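If df1, ..., df9 are separate objects rather than a list, a possible base R sketch is to collect them with mget(), rename with Map() (which passes each data frame together with its name), and optionally write them back with list2env(). The two small data frames are hypothetical stand-ins for the question's data.

```r
df1 <- data.frame(a = 1:2, value = 3:4)
df2 <- data.frame(a = 5:6, value = 7:8)
term <- "value"

your.list <- mget(c("df1", "df2"))  # named list: names are "df1", "df2"
your.list <- Map(function(df, nm) {
  # Rename the matching column using this data frame's own name
  names(df)[names(df) == term] <- paste0(term, "_", nm)
  df
}, your.list, names(your.list))

# Optional: overwrite df1, df2 with the renamed versions
invisible(list2env(your.list, envir = environment()))
```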
Try this out:
## these two are my sample data frames for this example
df_1 <- data.frame(first = rbinom(10,size = 2,prob = 0.3), second = rnorm(10))
df_2 <- data.frame(first = rbinom(10,size = 2,prob = 0.3), second = rnorm(10))
# R stores data frames as lists, so you can retrieve the names of all your data frames like this:
all_df_names = ls.str(mode = "list")
# to check: all_df_names[1] - the first element - will give you "df_1", which is the name of the first data frame
# be careful though - 'ls.str(mode = "list")' will pick ALL the lists currently in your environment
# if you don't want to use this ls method, it might be wiser to manually create a variable 'all_df_names' and put all your data frame names there yourself.
# rename
for(i in 1:length(all_df_names)) {
# get the actual content via its variable name, and store it in a temporary variable 'x'
x = get(all_df_names[i])
# rename the column you want
names(x)[2] = paste0(names(x)[2], "_", i) # renames the second column to its previous name plus '_' and the current iteration number
# resave that dataframe, with the new content
assign(all_df_names[i], x)
}
# to remove variables we no longer need when done:
# rm(x, i)
# confirm
# names(df_1) = "first" "second_1"
# names(df_2) = "first" "second_2"
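To avoid the caveat that ls.str(mode = "list") picks up every list, one option (an alternative, not the answer's method) is to keep only objects for which is.data.frame() is TRUE, and to use each data frame's own name in the new column name, as the question asks. This sketch assumes it runs in a session where every data frame in scope has at least two columns; not_a_df is a made-up object showing what gets filtered out.

```r
df_1 <- data.frame(first = rbinom(10, size = 2, prob = 0.3), second = rnorm(10))
df_2 <- data.frame(first = rbinom(10, size = 2, prob = 0.3), second = rnorm(10))
not_a_df <- list(a = 1)  # ls.str(mode = "list") would pick this up too

# Keep only the objects that really are data frames, as a named list
dfs <- Filter(is.data.frame, mget(ls()))

# Rename the second column using each data frame's own name
for (nm in names(dfs)) {
  names(dfs[[nm]])[2] <- paste0(names(dfs[[nm]])[2], "_", nm)
}
invisible(list2env(dfs, envir = environment()))  # write the renamed copies back
```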

Using a loop to select a column names from a list

I've been struggling with column selection in lists in R. I've loaded a bunch of CSVs (all with different column names and different numbers of columns) with the goal of extracting the columns they share (just phone_number, Subregion, and PhoneType) and combining them into a single data frame.
I can get the columns I want out of one list element with this:
var<-data[[1]] %>% select("phone_number","Subregion", "PhoneType")
But I cannot select the columns from all the elements in the list this way, just one at a time.
I then tried a for loop that looks like this:
new.function <- function(a) {
for(i in 1:a) {
tst<-datas[[i]] %>% select("phone_number","Subregion", "PhoneType")
}
print(tst)
}
But when I try:
new.function(5)
I'll only get the columns from the 5th element.
I know this might seem like a noob question to most, but I am struggling to learn lists and loops in R. I'm sure I'm missing something simple to make this work. Thank you for your help.
Another way you could do this is to make a function that extracts your columns and apply it to all data.frames in your list with lapply:
library(dplyr)
extractColumns = function(x){
select(x,"phone_number","Subregion", "PhoneType")
#or x[,c("phone_number","Subregion","PhoneType")]
}
final_df = lapply(data,extractColumns) %>% bind_rows()
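The same idea also works without dplyr. In this base R sketch, each data frame is subset to the shared columns and the pieces are stacked with rbind(); the two one-row data frames in datas are hypothetical examples, and every data frame is assumed to contain all three columns.

```r
cols <- c("phone_number", "Subregion", "PhoneType")
datas <- list(
  data.frame(not_selected = 0, phone_number = "1-800",
             Subregion = "earth", PhoneType = "flip"),
  data.frame(also_not_selected = 0, phone_number = "8675309",
             Subregion = "mars", PhoneType = "razr")
)

# Subset every data frame to the shared columns, then stack them row-wise
final_df <- do.call(rbind, lapply(datas, function(x) x[, cols, drop = FALSE]))
```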
The way you have your loop set up currently is only saving the last iteration of the loop because tst is not set up to store more than a single value and is overwritten with each step of the loop.
You can establish tst as a list first with:
tst <- list()
Then, in your code, be explicit that each step is saved as a separate element of the list by adding brackets and an index to tst. Here is a full example, done the way you were doing it.
library(dplyr) # provides select() and %>%
#Example data.frame that could be in datas
df_1 <- data.frame("not_selected" = rep(0, 5),
"phone_number" = rep("1-800", 5),
"Subregion" = rep("earth", 5),
"PhoneType" = rep("flip", 5))
# Another bare data.frame that could be in datas
df_2 <- data.frame("also_not_selected" = rep(0, 5),
"phone_number" = rep("8675309", 5),
"Subregion" = rep("mars", 5),
"PhoneType" = rep("razr", 5))
# Datas is a list of data.frames, we want to pull only specific columns from all of them
datas <- list(df_1, df_2)
#Function for looping through the first 'a' elements of datas
new.function <- function(a) {
#create the list inside the function so each call starts empty and the result can be returned
tst <- list()
for(i in 1:a) {
tst[[i]] <- datas[[i]] %>% select("phone_number","Subregion", "PhoneType")
}
tst
}
#Proof of concept for 2 elements
new.function(2)

Access variable dataframe in R loop

If I am working with dataframes in a loop, how can I use a variable data frame name (and additionally, variable column names) to access data frame contents?
dfnames <- c("df1","df2")
df1 <- df2 <- data.frame(X = sample(1:10),Y = sample(c("yes", "no"), 10, replace = TRUE))
for (i in seq_along(dfnames)){
curr.dfname <- dfnames[i]
#how can I do this:
curr.dfname$X <- 42:52
#...this
dfnames[i]$X <- 42:52
#or even this doubly variable call
for (j in seq_along(colnames(curr.dfname))){
curr.dfname$[colnames(temp[j])] <- 42:52
}
}
You can use get() to return the value of a variable given its name as a string:
> x <- 1:10
> get("x")
[1] 1 2 3 4 5 6 7 8 9 10
So, yes, you could iterate through dfnames like:
dfnames <- c("df1","df2")
df1 <- df2 <- data.frame(X = sample(1:10), Y = sample(c("yes", "no"), 10, replace = TRUE))
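for (cur.dfname in dfnames)
{
cur.df <- get(cur.dfname)
# for a fixed column name (the data frames have 10 rows, so assign 10 values)
cur.df$X <- 42:51
# iterating through column names as well
for (j in colnames(cur.df))
{
cur.df[, j] <- 42:51
}
# get() returns a copy, so write the modified data frame back under its original name
assign(cur.dfname, cur.df)
}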
I really think that this is gonna be a painful approach, though. As the commenters say, if you can get the data frames into a list and then iterate through that, it'll probably perform better and be more readable. get() itself isn't vectorised, so if you only have a character vector of data frame names, you'll have to iterate through it to get a data frame list (base R's mget() does this in a single call):
# build data frame list
df.list <- list()
for (i in seq_along(dfnames))
{
df.list[[i]] <- get(dfnames[i])
}
# iterate through data frames by index, so the changes are stored in the list
for (i in seq_along(df.list))
{
df.list[[i]]$X <- 42:51
}
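A compact base R sketch of the list-based approach, assuming the data frames live in the current environment: mget() builds a named list from the character vector in one call, a column name can be held in a variable and used with [[ ]], and list2env() optionally writes the results back to df1 and df2.

```r
dfnames <- c("df1", "df2")
df1 <- df2 <- data.frame(X = sample(1:10),
                         Y = sample(c("yes", "no"), 10, replace = TRUE))

df.list <- mget(dfnames)  # named list: df.list$df1, df.list$df2
col <- "X"                # a column name held in a variable
for (nm in names(df.list)) {
  # Index the list element directly so the change is kept
  df.list[[nm]][[col]] <- 42:51
}

# Optional: overwrite df1 and df2 with the modified versions
invisible(list2env(df.list, envir = environment()))
```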
Hope that helps!
2018 Update: I probably wouldn't do something like this anymore. Instead, I'd put the data frames in a list and then use purrr::map(), or its base equivalent, lapply():
library(tidyverse)
stuff_to_do = function(mydata) {
mydata$somecol = 42:51 # one value per row (the example data frames have 10 rows)
# … anything else I want to do to the current data frame
mydata # return it
}
df_list = list(df1, df2)
map(df_list, stuff_to_do)
This brings back a list of modified data frames (although you can use the map() variants map_dfr() and map_dfc() to automatically bind the processed data frames row-wise or column-wise, respectively). The former matches columns by name rather than by position, and it can also add an ID column using the .id argument and the names of the input list. So it comes with some nice added functionality over lapply()!
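Without purrr, a rough base R counterpart of row-binding with an ID column is lapply() followed by rbind(), tagging each piece with its list name first. This is a sketch of similar output, not a reimplementation of map_dfr(); like bind_rows(), rbind() on data frames matches columns by name, but unlike bind_rows() it errors if the column sets differ.

```r
df1 <- df2 <- data.frame(X = sample(1:10),
                         Y = sample(c("yes", "no"), 10, replace = TRUE))
stuff_to_do <- function(mydata) {
  mydata$somecol <- 42:51
  mydata
}

df_list <- list(df1 = df1, df2 = df2)
processed <- lapply(df_list, stuff_to_do)  # list of modified data frames

# Prepend an id column holding each element's name, then stack row-wise
combined <- do.call(rbind, Map(function(d, id) cbind(id = id, d),
                               processed, names(processed)))
```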
