Replace list names if they exist - r

I have example data as follows:
# list of data frames:
l = list(a=mtcars, b=mtcars, c=mtcars)
I would like to replace the list names that appear in the vector list_names_available_for_name_change with the corresponding entries of new_list_names.
list_names_available_for_name_change <- c("a", "c")
new_list_names <- c("android", "circus")
I thought of doing something like:
names(l)[names(l) == "a"] <- "android"
But I would like to do this for the entire list. Something like:
names(l)[names(l) == list_names_available_for_name_change ] <- new_list_names
How should I write the syntax to achieve this?
Desired output:
# list of data frames:
l = list(android=mtcars, b=mtcars, circus=mtcars)

In base R, use match to find the positions of the list's names within the subset of names to change, use those positions to pick the corresponding entries of new_list_names, and assign the result to the matching names of the list:
nm1 <- new_list_names[match(names(l), list_names_available_for_name_change)]
i1 <- !is.na(nm1)
names(l)[i1] <- nm1[i1]
-output
names(l)
[1] "android" "b" "circus"
Or with mapvalues
names(l) <- plyr::mapvalues(names(l),
list_names_available_for_name_change, new_list_names)
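The match approach above can also be wrapped in a small reusable function. This is only a sketch: rename_if_present is a hypothetical helper name, not part of base R or plyr, and the extra "zzz"/"unused" pair is added purely to show that names absent from the list are ignored.

```r
# Hypothetical helper: rename the elements of `x` whose names appear in `old`
rename_if_present <- function(x, old, new) {
  stopifnot(length(old) == length(new))
  idx <- match(names(x), old)      # NA where a name is not listed in `old`
  hit <- !is.na(idx)
  names(x)[hit] <- new[idx[hit]]   # replace only the matched names
  x
}

l <- list(a = mtcars, b = mtcars, c = mtcars)
l <- rename_if_present(l, c("a", "c", "zzz"), c("android", "circus", "unused"))
names(l)
# [1] "android" "b"       "circus"
```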

Related

Obtaining a vector with sapply and use it to remove rows from dataframes in a list with lapply

I have a list with dataframes:
df1 <- data.frame(id = seq(1:10), name = LETTERS[1:10])
df2 <- data.frame(id = seq(11:20), name = LETTERS[11:20])
mylist <- list(df1, df2)
I want to remove rows from each dataframe in the list based on a condition (in this case, the value stored in column id). I create an empty vector where I will store the ids:
ids_to_remove <- c()
Then I apply my function:
sapply(mylist, function(df) {
rows_above_th <- df[(df$id > 8),] # select the rows from each df above a threshold
a <- rows_above_th$id # obtain the ids of the rows above the threshold
ids_to_remove <- append(ids_to_remove, a) # append each id to the vector
},
simplify = T
)
However, with or without simplify = T, this returns a matrix, while my desired output (ids_to_remove) would be a vector containing the ids, like this:
ids_to_remove <- c(9,10,9,10)
Because lastly I would use it in this way on single dataframes:
for(i in 1:length(ids_to_remove)){
mylist[[1]] <- mylist[[1]] %>%
filter(!id == ids_to_remove[i])
}
And like this on the whole list (which is not working and I don't get why):
i = 1
lapply(mylist,
function(df) {
for(i in 1:length(ids_to_remove)){
df <- df %>%
filter(!id == ids_to_remove[i])
i = i + 1
}
} )
I guess the errors may be in the append part of the sapply and maybe in the indexing of the lapply. I played around a bit but still couldn't find the errors (or a better way to do this).
EDIT: original data has 70 dataframes (in a list) for a total of 2 million rows
If you are using sapply/lapply, avoid trying to change the values of global variables from inside the function; the assignment only affects a local copy. Instead, return the values you want. For example, generate the IDs to remove for each item in the list, collected as a list:
ids_to_remove <- lapply(mylist, function(df) {
rows_above_th <- df[(df$id > 8),] # select the rows from each df above a threshold
rows_above_th$id # obtain the ids of the rows above the threshold
})
And then you can use that list with your data list and mapply to iterate the two lists together
mapply(function(data, ids) {
data %>% dplyr::filter(!id %in% ids)
}, mylist, ids_to_remove, SIMPLIFY=FALSE)
Using base R
Map(\(x, y) subset(x, !id %in% y), mylist, ids_to_remove)
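For reference, here is a self-contained base R sketch of the two steps on the question's data. Note that seq(11:20) in the question evaluates to 1:10, so both data frames have ids 1 to 10, which is why the desired vector is c(9, 10, 9, 10). The name cleaned is introduced here only for illustration.

```r
df1 <- data.frame(id = 1:10, name = LETTERS[1:10])
df2 <- data.frame(id = 1:10, name = LETTERS[11:20])
mylist <- list(df1, df2)

# Step 1: return the ids above the threshold, one vector per data frame
ids_to_remove <- lapply(mylist, function(df) df$id[df$id > 8])

# Step 2: walk both lists in parallel and drop the matching rows
cleaned <- Map(function(df, ids) df[!df$id %in% ids, , drop = FALSE],
               mylist, ids_to_remove)

unlist(ids_to_remove)   # flat vector of ids: 9 10 9 10
sapply(cleaned, nrow)   # rows left in each data frame: 8 8
```

If the ids are only needed for filtering, the two steps could presumably be collapsed into a single lapply that keeps the rows with id <= 8.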

How can lapply work with addressing columns as unknown variables?

So, I have a list of strings named control_for. I have a data frame sampleTable with some of the columns named as strings from control_for list. And I have a third object dge_obj (DGElist object) where I want to append those columns. What I wanted to do - use lapply to loop through control_for list, and for each string, find a column in sampleTable with the same name, and then add that column (as a factor) to a DGElist object. For example, for doing it manually with just one string, it looks like this, and it works:
group <- as.factor(sampleTable[,3])
dge_obj$samples$group <- group
And I tried something like this:
lapply(control_for, function(x) {
x <- as.factor(sampleTable[, x])
dge_obj$samples$x <- x
}
Which doesn't work. I guess the problem is that R can't recognize addressing columns like this. Can someone help?
Here are two base R ways of doing it. The data set is taken from the example in help("DGEList"), together with a mock-up data.frame sampleTable.
Define a vector common_vars holding the names in control_for that are also columns of sampleTable. Then create the new columns.
library(edgeR)
sampleTable <- data.frame(a = 1:4, b = 5:8, no = letters[21:24])
control_for <- c("a", "b")
common_vars <- intersect(control_for, names(sampleTable))
1. for loop
for(x in common_vars){
y <- sampleTable[[x]]
dge_obj$samples[[x]] <- factor(y)
}
2. *apply loop.
tmp <- lapply(sampleTable[common_vars], factor) # lapply keeps each column a factor; sapply would simplify to a character matrix
dge_obj$samples <- cbind(dge_obj$samples, tmp)
This code can be rewritten as a one-liner.
Data
set.seed(2021)
y <- matrix(rnbinom(10000,mu=5,size=2),ncol=4)
dge_obj <- DGEList(counts=y, group=rep(1:2,each=2))
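As a sketch of why the attempt in the question does not behave as hoped: `$` does not evaluate its argument, so dge_obj$samples$x always refers to a column literally named "x", whereas `[[` uses the string stored in x. (In addition, an assignment made inside the function passed to lapply only changes a local copy of the object.) The example below uses a plain list named obj as a stand-in for a DGEList, which is an assumption made so that it runs without edgeR installed.

```r
# Plain list standing in for a DGEList (assumption: only `samples` matters here)
obj <- list(samples = data.frame(lib = c(10, 20, 30, 40)))
sampleTable <- data.frame(a = 1:4, b = 5:8)

x <- "a"
obj$samples$x   <- factor(sampleTable[[x]])  # creates a column literally named "x"
obj$samples[[x]] <- factor(sampleTable[[x]]) # creates a column named "a"

names(obj$samples)
# [1] "lib" "x"   "a"
```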

Add different suffix to column names on multiple data frames in R

I'm trying to add different suffixes to the column names of my data frames so that I can distinguish them after I've merged them. I have my data frames in a list and created a vector for the suffixes, but so far I have not been successful.
data2016 is the list containing my 7 data frames
new_names <- c("june2016", "july2016", "aug2016", "sep2016", "oct2016", "nov2016", "dec2016")
data2016v2 <- lapply(data2016, paste(colnames(data2016)), new_names)
Your query is not quite clear, so here are two solutions.
The beginning is the same for either solution. Suppose you have these four dataframes:
df1x <- data.frame(v1 = rnorm(50),
v2 = runif(50))
df2x <- data.frame(v3 = rnorm(60),
v4 = runif(60))
df3x <- data.frame(v1 = rnorm(50),
v2 = runif(50))
df4x <- data.frame(v3 = rnorm(60),
v4 = runif(60))
Suppose further you assemble them in a list, something akin to your data2016, using mget and ls and describing a pattern to match them:
my_list <- mget(ls(pattern = "^df\\d+x$"))
The names of the dataframes in this list are the following:
names(my_list)
[1] "df1x" "df2x" "df3x" "df4x"
Solution 1:
Suppose you want to change the names of the dataframes thus:
new_names <- c("june2016", "july2016","aug2016", "sep2016")
Then you can simply assign new_names to names(my_list):
names(my_list) <- new_names
And the result is:
names(my_list)
[1] "june2016" "july2016" "aug2016" "sep2016"
Solution 2:
You want to add the new_names literally as suffixes to the 'old' names, in which case you would use paste or paste0 thus:
names(my_list) <- paste0(names(my_list), "_", new_names)
And the result is:
names(my_list)
[1] "df1x_june2016" "df2x_july2016" "df3x_aug2016" "df4x_sep2016"
You could use an index number within lapply to reference both the list and your vector of suffixes. Because there are a couple steps, I'll wrap the process in a function(). (Called an anonymous function because we aren't assigning a name to it.)
data2016v2 <- lapply(1:7, function(i) {
this_data <- data2016[[i]] # Double brackets for a list
names(this_data) <- paste0(names(this_data), new_names[i]) # Single bracket for vector
this_data # The renamed data frame to be placed into data2016v2
})
Notice in the paste0() line we are recycling the term in new_names[i], so for example if new_names[i] is "june2016" and your first data.frame has columns "A", "B", and "C" then it would give you this:
> paste0(c("A", "B", "C"), "june2016")
[1] "Ajune2016" "Bjune2016" "Cjune2016"
(You may want to add an underscore in there?)
As an aside, it sounds like you might be better served by adding the "june2016" as a column in your data (like say a variable named month with "june2016" as the value in each row) and combining your data using something like bind_rows() from the dplyr package, running it "long" instead of "wide".
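An alternative to indexing with 1:7 is to let Map walk the list and the suffix vector in parallel. The sketch below uses two small made-up data frames in place of the real data2016 (an assumption, since the original data is not shown) and inserts an underscore before the suffix.

```r
# Toy stand-in for the real list of seven data frames
data2016 <- list(
  data.frame(A = 1:2, B = 3:4),
  data.frame(A = 5:6, B = 7:8)
)
new_names <- c("june2016", "july2016")

# Pair each data frame with its suffix and rename its columns
data2016v2 <- Map(
  function(df, suffix) setNames(df, paste0(names(df), "_", suffix)),
  data2016, new_names
)

names(data2016v2[[1]])
# [1] "A_june2016" "B_june2016"
names(data2016v2[[2]])
# [1] "A_july2016" "B_july2016"
```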

How to use an if...else statement in R?

I'm trying to create a variable using an if statement. I want to check whether the variable "st" exists in each data frame in the list of data frames "dflist", and if it doesn't exist I want to create it. I tried to do it like this (however, it doesn't work):
#making list of dataframes, and reading them into r
mylist = list.files(pattern="*.dta")
dflist <- lapply(mylist, read.dta13)
# if "st" exists in every dataframe in dflist, return "yes", else if it doesn't exist in a particular dataframe, create variable "st" in those dataframes
if(exists(st, dflist)){
"yes"
} else{
st <- c("total")
dflist$st <- st
}
We can use lapply to loop over the list and create a column in the 'data.frame' if 'st' is not there.
dflist1 <- lapply(dflist, function(x) if(!exists("st", x))
transform(x, st = "total") else x)
data
dflist <- list(data.frame(v1 = 1:5), data.frame(st = 1:6))
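The same check can also be written by testing the column names directly, which some readers may find easier to follow than exists. This is a sketch on the sample data above, not a replacement for the answer's code.

```r
dflist <- list(data.frame(v1 = 1:5), data.frame(st = 1:6))

dflist1 <- lapply(dflist, function(df) {
  # Add the column only when it is missing; otherwise leave the data frame alone
  if (!"st" %in% names(df)) df$st <- "total"
  df
})

lapply(dflist1, names)  # first data frame gains "st", second is unchanged
```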

data.frame from lists in list, weird column names

I'm trying to make a data.frame from a "list in list"
l <- list(c("sam1", "GSM6683", "GSM6684", "GSM6687", "GSM6688"), c("sam2",
"GSM6681", "GSM6682", "GSM6685", "GSM6686"))
df <- data.frame(l)
1) I get a data.frame with weird column names, how can I avoid it?
2) I'd like to get the column names from the first element of the inner list in list
like so:
column names: sam1, sam2
row1 GSM6683 GSM6681
row2 GSM6684 GSM6682
row3 GSM6687 GSM6685
row4 GSM6688 GSM6686
You were almost there. Since you want sam1 and sam2 to be column names, you don't need to make them part of your list; just specify them as column names afterwards.
>l <- list(c("GSM6683", "GSM6684", "GSM6687", "GSM6688"), c(
"GSM6681", "GSM6682", "GSM6685", "GSM6686"))
>df <- data.frame(l)
>colnames(df)<-c("sam1", "sam2")
If you're starting with the data structure in your example, do this:
df <- data.frame(lapply(l, function(x) x[-1]))
names(df) <- sapply(l, function(x) x[1])
If you have a choice on how to construct the data structure, do what R_Newbie says in his answer.
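Another way to avoid the auto-generated column names, sketched here under the assumption that every inner vector starts with its column name, is to name the list before converting it. The intermediate name cols is introduced only for illustration.

```r
l <- list(
  c("sam1", "GSM6683", "GSM6684", "GSM6687", "GSM6688"),
  c("sam2", "GSM6681", "GSM6682", "GSM6685", "GSM6686")
)

cols <- lapply(l, function(x) x[-1])                       # drop the leading label
names(cols) <- vapply(l, function(x) x[1], character(1))   # use the label as the name
df <- as.data.frame(cols)                                  # named list becomes named columns

names(df)
# [1] "sam1" "sam2"
```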

Resources