For loop to plyr function - r

I have a character array that holds the column names and values for a row in a data frame. Unfortunately, if the value of a specific entry is zero, the column name and value are not listed in the array. I create my desired data frame with this information, but I rely on a "for loop".
I want to utilize plyr to avoid the for loop in the working code below.
types <- c("one", "two", "three") # My data
entry <- c("one(1)", "three(2)") # My data
values <- function(entry, types)
{
  frame <- setNames(as.data.frame(matrix(0, ncol = length(types), nrow = 1)), types)
  for(s1 in 1:length(entry))
  {
    name <- gsub("\\(\\w*\\)", "", entry[s1]) # get name
    quantity <- as.numeric(unlist(strsplit(entry[s1], "[()]"))[2]) # get value
    frame[1, which(colnames(frame)==name)] <- quantity # store
  }
  return(frame)
}
values(entry, types) # This is how I want the output to look
I have tried the following to split the array, but I can't figure out how to get adply to return a single row.
types <- c("one", "two", "three") # data
entry <- c("one(1)", "three(2)") # data
frame<- setNames(as.data.frame(matrix(0, ncol = length(types), nrow = 1)), types)
array_split <- function(entry, frame){
name <- gsub("\\(\\w*\\)", "", entry) # get name
quantity <- as.numeric(unlist(strsplit(entry, "[()]"))[2]) # get value
frame[1, which(colnames(frame)==name)] <- quantity # store
return(frame)
}
adply(entry, 1, array_split, frame)
Is there something like cumsum I should be considering? I want to complete the operation quickly.

I'm not sure why you aren't just doing something more like this:
frame <- setNames(rep(0,length(types)),types)
a <- as.numeric(sapply(strsplit(entry,"[()]"),`[[`,2))
names(a) <- gsub("\\(\\w*\\)", "", entry)
frame[names(a)] <- a
Both gsub and strsplit are already vectorized, so there's no real need for an explicit loop anywhere. You only need the sapply to extract the second element of each strsplit result. The rest is just regular indexing by name.
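Note that the snippet above leaves frame as a named numeric vector rather than a data frame. If a one-row data frame is needed, as in the question, something like the following should work. This is only a sketch; the name result is mine, and as.data.frame(as.list(...)) is just one of several ways to do the conversion.

```r
types <- c("one", "two", "three")
entry <- c("one(1)", "three(2)")

# Start with a named vector of zeros, one slot per type
frame <- setNames(rep(0, length(types)), types)

# Extract the number between the parentheses and the name before them
a <- as.numeric(sapply(strsplit(entry, "[()]"), `[[`, 2))
names(a) <- gsub("\\(\\w*\\)", "", entry)

# Fill in the slots that actually appear in entry
frame[names(a)] <- a

# Convert the named vector to a one-row data frame (assumed desired shape)
result <- as.data.frame(as.list(frame))
result
```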

Related

How can lapply work with addressing columns as unknown variables?

So, I have a list of strings named control_for. I have a data frame sampleTable with some of the columns named as strings from control_for list. And I have a third object dge_obj (DGElist object) where I want to append those columns. What I wanted to do - use lapply to loop through control_for list, and for each string, find a column in sampleTable with the same name, and then add that column (as a factor) to a DGElist object. For example, for doing it manually with just one string, it looks like this, and it works:
group <- as.factor(sampleTable[,3])
dge_obj$samples$group <- group
And I tried something like this:
lapply(control_for, function(x) {
x <- as.factor(sampleTable[, x])
dge_obj$samples$x <- x
})
Which doesn't work. I guess the problem is that R can't recognize addressing columns like this. Can someone help?
Here are two base R ways of doing it. The data set is taken from the example in help("DGEList"), together with a mock-up data.frame sampleTable.
Define a vector common_vars holding the names in control_for that are also column names of the table. Then create the new columns.
library(edgeR)
sampleTable <- data.frame(a = 1:4, b = 5:8, no = letters[21:24])
control_for <- c("a", "b")
common_vars <- intersect(control_for, names(sampleTable))
1. for loop
for(x in common_vars){
y <- sampleTable[[x]]
dge_obj$samples[[x]] <- factor(y)
}
2. *apply loop.
tmp <- lapply(sampleTable[common_vars], factor) # lapply keeps the factors; sapply would simplify them to a character matrix
dge_obj$samples <- cbind(dge_obj$samples, tmp)
This code can be rewritten as a one-liner.
Data
set.seed(2021)
y <- matrix(rnbinom(10000,mu=5,size=2),ncol=4)
dge_obj <- DGEList(counts=y, group=rep(1:2,each=2))
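As for why the attempt in the question fails: `$x` always refers to a column literally named "x", whereas `[[x]]` uses the value stored in x. In addition, an assignment made inside the function passed to lapply only changes a local copy of the object. Here is a small sketch of the `[[x]]` approach. To keep it runnable without edgeR, obj is a plain list with a samples data frame standing in for the DGEList; that stand-in is my assumption, not the real class.

```r
# Stand-in for a DGEList: a plain list holding a `samples` data frame
# (assumption, so the sketch runs without edgeR installed)
obj <- list(samples = data.frame(lib.size = c(10, 20, 30, 40)))

sampleTable <- data.frame(a = 1:4, b = 5:8, no = letters[21:24])
control_for <- c("a", "b")

# `[[x]]` looks up the column whose name is the value of x;
# `$x` would create or read a column literally called "x"
for (x in intersect(control_for, names(sampleTable))) {
  obj$samples[[x]] <- factor(sampleTable[[x]])
}

str(obj$samples)
```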

Iteratively adding a row containing characters and numbers to a dataframe

I have a list containing named elements. I am iterating over the list names, performing the computation for each corresponding element, "encapsulating" the results and the name in a vector and finally adding the vector to a table. The row or vector after each iteration contains a mix of characters and numbers.
The first row is getting added but from the second row onwards there is a problem.
In this example, there is supposed to be one column (first) containing alphanumeric names. All rows after the first one contain NAs.
x <- list(a_1=c(1,2,3), b_2=c(3,4,5), c_3=c(5,1,9))
df <- data.frame()
for(name in names(x))
{
tmp <- x[[name]]
m <- mean(tmp)
s <- sum(tmp)
df <- rbind(df, c(name,m,s))
}
df <- as.data.frame(df)
I know there are possibly more efficient ways, but for the moment this is more intuitive for me, as it ensures that each computation is associated with a particular name. There can be several columns and rows, and the names are extremely helpful to join tables, query, compare etc. They make it easier to trace back results to a particular element in my original list.
Additionally, I would be glad to know other ways in which the element names are always retained while transforming.
Thank you!
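You have to set stringsAsFactors = FALSE in rbind. With stringsAsFactors = TRUE (the default before R 4.0.0), the first iteration of the loop converts the string values into factors (with the factor levels being the values of that first row), so values in later rows that are not among those levels become NA.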
x <- list(a_1=c(1,2,3), b_2=c(3,4,5), c_3=c(5,1,9))
df <- data.frame()
for(name in names(x))
{
tmp <- x[[name]]
m <- mean(tmp)
s <- sum(tmp)
df <- rbind(df, c(name,m,s), stringsAsFactors = FALSE)
}
An easier solution would be to utilize sapply().
x <- list(a_1=c(1,2,3), b_2=c(3,4,5), c_3=c(5,1,9))
df <- data.frame(name = names(x), m = sapply(x, mean), s = sapply(x, sum))
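If you prefer to keep the "one name, one row" logic of the loop, another option is to build each row as a one-row data frame and bind the rows at the end. Unlike c(name, m, s), which coerces everything to character, this keeps m and s numeric. This is just a sketch of that idea; the intermediate name rows is mine.

```r
x <- list(a_1 = c(1, 2, 3), b_2 = c(3, 4, 5), c_3 = c(5, 1, 9))

# One single-row data frame per list element; the name travels with its results
rows <- lapply(names(x), function(name) {
  tmp <- x[[name]]
  data.frame(name = name, m = mean(tmp), s = sum(tmp),
             stringsAsFactors = FALSE)
})

# Stack the rows once, instead of growing df inside the loop
df <- do.call(rbind, rows)
df
```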

Replace value from dataframe column with value from keyvalue lookup

I want to replace certain values in a data frame column with values from a lookup table. I have the values in a list, stuff.kv, and many values are stored in the list (but some may not be).
stuff.kv <- list()
stuff.kv[["one"]] <- "thing"
stuff.kv[["two"]] <- "another"
#etc
I have a dataframe, df, which has multiple columns (say 20), with assorted names. I want to replace the contents of the column named 'stuff' with values from 'lookup'.
I have tried building various apply methods, but nothing has worked.
I built a function that processes a list of items and returns the mutated list:
stuff.lookup <- function(x) {
  for( n in 1:length(x) ) {
    if( !is.null( stuff.kv[[x[n]]] ) ) x[n] <- stuff.kv[[x[n]]]
  }
  return( x )
}
unlist(lapply(df$stuff, stuff.lookup))
The apply syntax is bedeviling me.
Since you made such a nice lookup table, you can just use it to change the values. No loops or apply needed.
## Sample Data
set.seed(1234)
DF = data.frame(stuff = sample(c("one", "two"), 8, replace=TRUE))
## Make the change
DF$stuff = unlist(stuff.kv[DF$stuff])
DF
stuff
1 thing
2 another
3 another
4 another
5 another
6 another
7 thing
8 thing
Below is a more general solution building on @G5W's answer, which doesn't cover the case where your original data frame has values that don't exist in the lookup table (that case would result in a length mismatch error):
library(dplyr)
stuff.kv <- list(one = "another", two = "thing")
df <- tibble( # tibble() replaces the deprecated data_frame()
  stuff = rep(c("one", "two", "three"), each = 3)
)
df <- df %>%
  mutate(stuff = paste(stuff.kv[stuff]))
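One thing to be aware of: with paste(), a value that has no entry in the lookup table ends up as the string "NULL". If you would rather leave such values untouched, a base R variant along these lines may be closer to what the question asks for. It is a sketch under my own assumptions: the lookup values are the ones from the question, and lookup and matched are names I made up.

```r
stuff.kv <- list(one = "thing", two = "another")
df <- data.frame(stuff = rep(c("one", "two", "three"), each = 3),
                 stringsAsFactors = FALSE)

# Flatten the list into a named character vector for vectorized lookup
lookup <- unlist(stuff.kv)

# Indexing by name gives NA for keys that are not in the lookup table
matched <- unname(lookup[df$stuff])

# Keep the original value wherever no replacement was found
df$stuff <- ifelse(is.na(matched), df$stuff, matched)
df
```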

Saving dataframes and variables' names within for loop

I am trying to use a for loop to save dataframes and variable names on the way.
I have a data frame called regionmap. One of its variables (Var3) can take thousands of different values, among which there are 15 of this form:
"RegionMap *" where * is one of the values of the vector regions:
regions <- c("A", "B"........"Z")
I need to run a loop which selects the rows in which each of these values appear, save those rows as a new data frame, transform the relative frequency in a dummy and then merge the new data frame with a bigger one aimed at collecting all of these.
The following code works. I just wanted to know whether it is possible to run it 15 times, substituting every "A" (both as strings to select and as names of data frames and variables) with the other elements of regions, as in a for loop. (A standard for loop does not work.)
A <- regionmap[grep("RegionMap A", regionmap$Var3), ]
A$Freq[A$Freq > 1] <- 1
A$Var3 <- NULL
colnames(A) <- c( "name", "date", "RegionMap A")
access_panel <- merge(access_panel, A,by=c("name", "date"))
You don't need to name the variables differently if you are merging everything together anyway - just the column names. Something like this should do the trick...
regions <- c("A", "B"........"Z")
for(x in regions){
  mapname <- paste("RegionMap",x,sep=" ") #this is all that needs to change each time
  A <- regionmap[grep(mapname, regionmap$Var3), ]
  A$Freq[A$Freq > 1] <- 1
  A$Var3 <- NULL
  colnames(A) <- c( "name", "date", mapname)
  if(x=="A") {
    access_panel <- A #first one has nothing to merge into
  } else {
    access_panel <- merge(access_panel, A ,by=c("name", "date"))
  }
}
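If you want to avoid the special case for the first region, the same idea can be written with lapply and Reduce. The sketch below runs on a tiny made-up regionmap (the rows, names and dates are hypothetical and only there to make it runnable), with regions shortened to two values. The helper make_dummy is my own name, and it matches Var3 exactly with == instead of grep.

```r
# Hypothetical toy data standing in for the real regionmap (assumption)
regionmap <- data.frame(
  name = c("n1", "n2", "n1", "n2"),
  date = c("d1", "d1", "d1", "d1"),
  Var3 = c("RegionMap A", "RegionMap A", "RegionMap B", "RegionMap B"),
  Freq = c(3, 0, 1, 2),
  stringsAsFactors = FALSE
)
regions <- c("A", "B")  # shortened for the sketch

# Build the dummy-coded data frame for one region
make_dummy <- function(x) {
  mapname <- paste("RegionMap", x)
  out <- regionmap[regionmap$Var3 == mapname, c("name", "date", "Freq")]
  out$Freq <- pmin(out$Freq, 1)             # cap frequencies at 1
  names(out)[names(out) == "Freq"] <- mapname
  out
}

# Merge all per-region data frames pairwise on name and date
access_panel <- Reduce(function(a, b) merge(a, b, by = c("name", "date")),
                       lapply(regions, make_dummy))
access_panel
```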

Loop through rows in list of dataframes and extract data. (Nested "apply" functions)

I am new to R and trying to do things the "R" way, which means no for loops. I would like to loop through a list of dataframes, loop through each row in each dataframe, extract data based on criteria, and store it in a master dataframe.
One issue I am having is with accessing the "global" dataframe. I am unsure of the best approach (global variable, pass by reference).
I have created an abstract example to try to show what needs to be done:
rm(list=ls())## CLEAR WORKSPACE
assign("last.warning", NULL, envir = baseenv())## CLEAR WARNINGS
# Generate a descriptive name with name and size
generateDescriptiveName <- function(animal.row, animalList.vector){
name <- animal.row["animal"]
size <- animal.row["size"]
# if in list of interest prepare name for master dataframe
if (any(grepl(name, animalList.vector))){
return (paste0(name, "Sz-", size))
}
}
# Animals of interest
animalList.vector <- c("parrot", "cheetah", "elephant", "deer", "lizard")
jungleAnimals <- c("ants", "parrot", "cheetah")
jungleSizes <- c(0.1, 1, 50)
jungle.df <- data.frame(jungleAnimals, jungleSizes)
fieldAnimals <- c("elephant", "lion", "hyena")
fieldSizes <- c(1000, 100, 80)
field.df <- data.frame(fieldAnimals, fieldSizes)
forestAnimals <- c("squirrel", "deer", "lizard")
forestSizes <- c(1, 40, 0.2)
forest.df <- data.frame(forestAnimals, forestSizes)
ecosystems.list <- list(jungle.df, field.df, forest.df)
# Final master list
descriptiveAnimal.df <- data.frame(name = character(), descriptive.name = character())
# apply to all dataframes in list
lapply(ecosystems.list, function(ecosystem.df){
names(ecosystem.df) <- c("animal", "size")
# apply to each row in dataframe
output <- apply(ecosystem.df, 1, function(row){generateDescriptiveName(row, animalList.vector)})
if(!is.null(output)){
# Add generated names to unique master list (no duplicates)
}
})
The end result would be:
name descriptive.name
1 "parrot" "parrot Sz-1"
2 "cheetah" "cheetah Sz-50"
3 "elephant" "elephant Sz-1000"
4 "deer" "deer Sz-40"
5 "lizard" "lizard Sz-0.2"
I did not use your function generateDescriptiveName() because I think it is a bit too laborious. I also do not see a reason to use apply() within lapply(). Here is my attempt to generate the desired output. It is not perfect but I hope it helps.
df_list <- lapply(ecosystems.list, function(ecosystem.df){
  names(ecosystem.df) <- c("animal", "size")
  temp <- ecosystem.df[ecosystem.df$animal %in% animalList.vector, ]
  if(nrow(temp) > 0){
    data.frame(name = temp$animal, descriptive.name = paste0(temp$animal, " Sz-", temp$size))
  }
})
do.call("rbind",df_list)
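Since every data frame gets the same two column names anyway, you could also stack them first and filter once, which additionally makes it easy to drop duplicate animals. Here is a self-contained sketch of that variant. For brevity, the data frames are created with the animal and size column names directly, and all.df and keep are names I introduced.

```r
animalList.vector <- c("parrot", "cheetah", "elephant", "deer", "lizard")

ecosystems.list <- list(
  data.frame(animal = c("ants", "parrot", "cheetah"), size = c(0.1, 1, 50)),
  data.frame(animal = c("elephant", "lion", "hyena"), size = c(1000, 100, 80)),
  data.frame(animal = c("squirrel", "deer", "lizard"), size = c(1, 40, 0.2))
)

# Stack all ecosystems into one data frame
all.df <- do.call(rbind, ecosystems.list)

# Keep only the animals of interest, first occurrence of each
keep <- all.df[all.df$animal %in% animalList.vector, ]
keep <- keep[!duplicated(keep$animal), ]

descriptiveAnimal.df <- data.frame(
  name = as.character(keep$animal),
  descriptive.name = paste0(keep$animal, " Sz-", keep$size),
  stringsAsFactors = FALSE
)
descriptiveAnimal.df
```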
