I have a series of repeated IDs that I would like to assign to groups of fixed size. The subject IDs repeat with different frequencies, for example:
# Example Data
ID = c(101,102,103,104)
Repeats = c(2,3,1,3)
Data = data.frame(ID,Repeats)
> head(Data)
ID Repeats
1 101 2
2 102 3
3 103 1
4 104 3
I would like the same repeated ID to stay within the same group. However, each group has a fixed capacity (say 3 only). For example, in my desired output matrix each group can only accommodate 3 IDs:
# Create empty data frame for group annotation
# Add 3 rows in order to have more space for IDs
# Some groups will have NAs due to keeping IDs together (I'm OK with that)
Target = data.frame(matrix(NA,nrow=(sum(Data$Repeats)+3),
ncol=dim(Data)[2]))
names(Target)<-c("ID","Group")
Target$Group<-rep(1:3)
Target$Group<-sort(Target$Group)
> head(Target)
ID Group
1 NA 1
2 NA 1
3 NA 1
4 NA 1
5 NA 2
6 NA 2
I can loop each ID to my Target data frame but this does not guarantee that repeated IDs will stay together in the same group:
# Loop repeated IDs the groups
IDs.repeat = rep(Data$ID, times=Data$Repeats)
# loop IDs to Targets to assign IDs to groups
for (i in 1:length(IDs.repeat))
{
Target$ID[i]<-IDs.repeat[i]
}
With the loop above, the same ID (102) ends up in two different groups (1 and 2), which is what I would like to avoid:
> head(Target)
ID Group
1 101 1
2 101 1
3 102 1
4 102 1
5 102 2
6 103 2
Instead, I want the code to put NA when there is no space left for that ID in the group, so that the output looks like this:
> head(Target)
ID Group
1 101 1
2 101 1
3 NA 1
4 NA 1
5 102 2
6 102 2
Does anyone have a solution that keeps each ID within a single group, moving it to the next group when there is not sufficient space left after assigning ID i?
I think I need a loop that counts the NAs within a group and checks whether that count is >= the number of repeats of that ID. However, I don't know how to implement both at once. Maybe by nesting another loop over group j?
Any help with the loop will be appreciated immensely!
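The idea described above (count the free slots in the current group and move on when the next ID does not fit) can be sketched as a single loop. This is only a minimal sketch, not a definitive implementation: assign_groups and its capacity argument are hypothetical names, and a capacity of 4 is assumed because the Target data frame above ends up with 4 rows per group.

```r
# Hypothetical helper: pack repeated IDs into fixed-capacity groups,
# padding a group with NA when the next ID does not fit in the space left.
assign_groups <- function(ids, repeats, capacity) {
  stopifnot(length(ids) == length(repeats),
            all(repeats >= 1), all(repeats <= capacity))
  out_id <- c()
  out_group <- c()
  group <- 1
  free <- capacity                       # free slots left in the current group
  for (k in seq_along(ids)) {
    if (repeats[k] > free) {
      # Not enough room: fill the rest of this group with NA, open a new group
      out_id <- c(out_id, rep(NA, free))
      out_group <- c(out_group, rep(group, free))
      group <- group + 1
      free <- capacity
    }
    out_id <- c(out_id, rep(ids[k], repeats[k]))
    out_group <- c(out_group, rep(group, repeats[k]))
    free <- free - repeats[k]
  }
  # Pad the last group so that every group has exactly `capacity` rows
  out_id <- c(out_id, rep(NA, free))
  out_group <- c(out_group, rep(group, free))
  data.frame(ID = out_id, Group = out_group)
}

Data <- data.frame(ID = c(101, 102, 103, 104), Repeats = c(2, 3, 1, 3))
Target <- assign_groups(Data$ID, Data$Repeats, capacity = 4)
head(Target)
```

The IDs are processed in the order given, so this is a simple first-fit pass rather than an optimal packing; earlier groups are never revisited to fill leftover NA slots.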
Here's one solution:
## This is the data.frame I'll try to match
target <- data.frame(
ID = c(
rep(101, 2),
rep(102, 3),
rep(103, 1),
rep(104, 3)),
Group = c(
rep(1L, 6), # "L", like 1L makes it an int type rather than numeric
rep(2L, 3)
)
)
print(target)
## Your example data
ID = c(101,102,103,104)
Repeats = c(2,3,1,3)
Data = data.frame(ID,Repeats)
head(Data)
ids_to_group <- 3 # the number of ids per group is specified here.
Data$Group <- sort(
rep(1:ceiling(length(Data$ID) / ids_to_group),
ids_to_group))[1:length(Data$ID)]
# The do.call(rbind, lapply(X = a series, FUN = function(x) { }))
# pattern is a really useful way to stack data.frames.
# lapply is basically a fancy for-loop; check it out by sending
# ?lapply to the console (to view the help page).
output <- do.call(
rbind,
lapply(unique(Data$ID), FUN = function(ids) {
print(paste(ids, "done.")) # I like to put print statements to follow along
obs <- Data[Data$ID == ids, ]
data.frame(ID = rep(obs$ID, obs$Repeats))
})
)
output <- merge(output, Data[,c("ID", "Group")], by = "ID")
identical(target, output) # returns true if they're equivalent
# For example inspect each with:
str(target)
str(output)
Problem
In some health datasets, a column may categorize various disease manifestations of interest for individual cases. In some summaries it is beneficial to tabulate various combinations of these manifestations, including counting if a given case had 'greater than' or 'less than' a selection of key manifestations.
In SAS, a column can be assigned a multilabel format, which can allow various overlapping categories to be summarized at the same time during procedure steps. I have struggled to find a satisfactory solution in R that replicates this feature from SAS. I am aware that a combination of dplyr or base functions chained together can tabulate and append different combinations, effectively creating a dataset that duplicates rows needed for representing all overlapping levels.
Aim
To create a function that allows for easy creation of a dataset that considers various overlapping levels of a target category. This would allow for the transformation of the example data provided below into a new dataset that appends the correct rows, and can provide checks within groups to see if a certain grouping matches all the desired levels to be considered part of a new grouping.
library(tibble)
# Example data (Repeat groups)
exampleData <- tibble(group = c(1, 1, 1, 2, 3, 3),
condition = factor(c('A', 'B', 'C', 'A', 'B', 'Q'), ordered = F))
# Initial output
# A tibble: 6 x 2
group condition
<dbl> <fct>
1 1 A
2 1 B
3 1 C
4 2 A
5 3 B
6 3 Q
# Function to add new level combinations, based upon the levels within each group.
create_multilevelFactor(exampleData , target_col = 'condition', group_col = 'group', new_levels = list('AB' = c('A', 'B'), 'QB' = c('Q', 'B')))
# Desired output
# A tibble: 8 x 3
group condition track_col
<dbl> <chr> <dbl>
1 1 A 1
2 1 B 1
3 1 C 1
4 2 A 1
5 3 B 1
6 3 Q 1
7 1 AB 2
8 3 QB 3
You will note that the original factor levels persist, and the groups that contained the correct levels in the named list will form a new row if the combination exists. In more realistic examples, the grouping for AB could be considered as group 1 having 'at least A or B disease manifestations'.
Challenge
I suspect that others may have a similar need for this function and like me, are either ignorant of a simpler approach or have not come across an existing solution that is easy to use. During my thought process for this question, I have created a function (trying to use base R primarily) that, albeit inelegantly, creates the aforementioned desired output.
It is my hope that others can provide a more ideal solution using an alternative approach or increase robustness and wider applicability of the function.
The following function provides a working, albeit inelegant, solution to the problem. I tend to overthink processes, which is likely reflected in the answer here.
This function takes the initial dataset and, depending on whether a grouping column is provided, creates a new dataset with additional rows for the various combinations of aggregated factor levels, if those levels exist within the groupings. Several new levels can be provided as a list, and an additional column makes it easy to see which new levels were appended to the original rows.
#-----------------------------------------------------------#
# Create function for multilevel labelling of factor groups #
#-----------------------------------------------------------#
# target_col is a character string for the column of interest to be adjusted
# group_col is a character string for the column to check levels that exist within groupings
# new_levels is a list that uses name and value pairs to determine how new levels should be aggregated
# collapse will ensure that only unique combinations of the new level is appended
# track will add a flag to ensure one can easily see the new combinations that were appended
create_multilevelFactor <- function(data, target_col, new_levels , group_col, collapse = T, track = T) {
#
# Do some basic checks on inputs
#
# Check if new_levels is provided as a list
if(!is.list(new_levels)) stop('The provided set of levels is not in a list format, please provide as a list')
# Check if target_col is a factor
if(!is.factor(data[[target_col]])) stop('The target column for multiple levels is not a factor, convert to a factor before proceeding.')
# Check if levels are in list
for(i in 1:length(new_levels)) {
if(length(setdiff(levels(factor(new_levels[[i]])),
levels(factor(data[[target_col]])))) > 0) { # If levels in provided list contain a level not in the column, then throw error
stop('Levels in list do not match the levels in the target column')
}
}
# State if grouping col was provided and its purpose
if(!missing(group_col)) { message(paste0('The following column is used as a grouping variable for summarizing multilevel factoring: ',
group_col, '. If you do not want labels determined by those within groupings, leave argument blank.'))
}
#
# Main
#
# Set new column for tracking if desired
if(track == T) {data$track_col <- 1; trackColIndex <- 1} # Original rows get track value 1
OutData <- as.data.frame(NULL) # Empty data frame to fill and append later
# Loop for all new levels of interest to add
for(i in 1:length(new_levels)){
tempData <- data # Look at fresh data every pass
levelIndex <- which(levels(tempData[[target_col]]) %in% new_levels [[i]]) # Index of matches
# If grouping provided, do necessary splits and rbinds
if(!missing(group_col)) {
tempData <- split(tempData, tempData[[group_col]]) # Split if there are groupings
tempData <- lapply(tempData, function(x) {
if(!(length(setdiff(levels(factor(new_levels [[i]])), levels(factor(x[[target_col]])))) > 0)) { # If the grouping does not have all the levels for the new grouping, then do nothing
levels(x[[target_col]])[levelIndex] <- names(new_levels )[i]
x
}
})
tempData <- do.call(rbind, tempData) # Groups that did not match the necessary conditions come back empty
rownames(tempData) <- NULL # Correct row names for tibble
} else { # If not grouping
levels(tempData[[target_col]])[levelIndex] <- names(new_levels )[i]
}
tempData <- tempData[tempData[[target_col]] %in% names(new_levels )[i],] # Only keep new factor levels (could be empty if no group matches)
if(collapse == T) tempData <- unique(tempData[(tempData[[target_col]] %in% names(new_levels )[i]),]) # Collapse to unique combinations if desired
if(track == T){track_col <- rep(NA, nrow(tempData)); tempData$track_col <- trackColIndex+1; trackColIndex <- trackColIndex+1;} # Add track column to the new rows
OutData <- suppressWarnings(dplyr::bind_rows(OutData, tempData)) # Append all the new rows
}
# Append new rows to the original rows
OutData <- suppressWarnings(dplyr::bind_rows(data, OutData)) #
return(OutData)
}
Using the example data initially provided, this can produce the following outputs:
#Original data
library(tibble)
# Example data (Repeat groups)
exampleData <- tibble(group = c(1, 1, 1, 2, 3, 3),
condition = factor(c('A', 'B', 'C', 'A', 'B', 'Q'), ordered = F))
# Original data
# A tibble: 6 x 2
group condition
<dbl> <fct>
1 1 A
2 1 B
3 1 C
4 2 A
5 3 B
6 3 Q
##################
newData <- create_multilevelFactor(exampleData,
target_col = 'condition',
group_col = 'group',
new_levels = list('AB' = c('A', 'B'), 'QB' = c('Q', 'B')),
collapse = T, track = T)
newData
# Data with grouping argument
# A tibble: 8 x 3
group condition track_col
<dbl> <chr> <dbl>
1 1 A 1
2 1 B 1
3 1 C 1
4 2 A 1
5 3 B 1
6 3 Q 1
7 1 AB 2
8 3 QB 3
addmargins(table(newData$group,newData$condition))
A AB B C Q QB Sum
1 1 1 1 1 0 0 4
2 1 0 0 0 0 0 1
3 0 0 1 0 1 1 3
Sum 2 1 2 1 1 1 8
newData <- create_multilevelFactor(exampleData,
target_col = 'condition',
new_levels = list('AB' = c('A', 'B'), 'QB' = c('Q', 'B')),
collapse = T, track = T)
newData
# Without grouping argument
# A tibble: 11 x 3
group condition track_col
<dbl> <chr> <dbl>
1 1 A 1
2 1 B 1
3 1 C 1
4 2 A 1
5 3 B 1
6 3 Q 1
7 1 AB 2
8 2 AB 2
9 3 AB 2
10 1 QB 3
11 3 QB 3
newData <- create_multilevelFactor(exampleData,
target_col = 'condition',
new_levels = list('AB' = c('A', 'B'), 'QB' = c('Q', 'B')),
collapse = F, track = T)
newData
# Without collapse and grouping argument
# A tibble: 13 x 3
group condition track_col
<dbl> <chr> <dbl>
1 1 A 1
2 1 B 1
3 1 C 1
4 2 A 1
5 3 B 1
6 3 Q 1
7 1 AB 2
8 1 AB 2
9 2 AB 2
10 3 AB 2
11 1 QB 3
12 3 QB 3
13 3 QB 3
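For comparison, the grouped case can also be sketched without relabelling factor levels at all: for each combined label, check which groups contain every member level and append one row per such group. This is a minimal sketch under assumptions, not a replacement for the function above: add_combined_levels is a hypothetical name, it only handles the grouped and collapsed case, and it keeps just the group, target and track columns.

```r
# Hypothetical helper (not from any package): append one row per group for each
# combined label whose member levels are ALL present in that group.
add_combined_levels <- function(data, target_col, group_col, new_levels) {
  cond   <- as.character(data[[target_col]])
  grp    <- data[[group_col]]
  groups <- unique(grp)
  base <- data.frame(grp, cond, 1, stringsAsFactors = FALSE)
  names(base) <- c(group_col, target_col, "track_col")
  pieces <- list(base)                      # original rows carry track value 1
  for (i in seq_along(new_levels)) {
    # TRUE for every group that contains all member levels of combination i
    has_all <- vapply(groups,
                      function(g) all(new_levels[[i]] %in% cond[grp == g]),
                      logical(1))
    if (any(has_all)) {
      extra <- data.frame(groups[has_all], names(new_levels)[i], i + 1,
                          stringsAsFactors = FALSE)
      names(extra) <- names(base)
      pieces[[length(pieces) + 1]] <- extra
    }
  }
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

exampleData <- data.frame(group = c(1, 1, 1, 2, 3, 3),
                          condition = factor(c('A', 'B', 'C', 'A', 'B', 'Q')))
res <- add_combined_levels(exampleData, target_col = 'condition', group_col = 'group',
                           new_levels = list('AB' = c('A', 'B'), 'QB' = c('Q', 'B')))
res
```

On the example data this appends an AB row for group 1 and a QB row for group 3, the same eight rows as the grouped output shown above.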
Thanks for the feedback. Below is a reproducible example with my desired output:
# Example Data where I would like my output
N=24
school.assignment = matrix(NA, ncol = 3, nrow = N)
school.assignment = as.data.frame(school.assignment)
colnames(school.assignment) <- c("ID","Group","Assignment")
# Number of groups and assignments per group
groups = 6
Assignment = 4
school.assignment$Group<-rep(1:groups,Assignment)
school.assignment$Group<- sort(school.assignment$Group)
school.assignment$Assignment<-rep(1:Assignment)
# IDs with number of repeats (i.e repeated students)
Data = matrix(0, ncol = 2, nrow = N/2) # ~half with repeated samples
Data = as.data.frame(Data)
colnames(Data) <- c("ID","Repeats")
Data$ID <-1:(N/2)
length(unique(Data$ID)) # unique IDS
ID=rep(seq(1:8),3)
# Generate random repeats for each ID
Data$Repeats<-{set.seed(55)
sapply(1:(N/2),
function(x) sample(1:5,1))
}
Data=Data[-1,] # take out first row to match N=24
sum(Data$Repeats) # 24 total IDs for all assignments
# List of IDs at random to use
IDs <- vector("list",dim(Data)[1]) #
for (i in 1:dim(Data)[1])
{
IDs[[i]]<-rep(Data$ID[i], times=Data$Repeats[i])
}
head(IDs)
# Object with number of repeated IDs
sample.per.ID <- vector("list",length(IDs)[1])
for (i in 1:length(IDs))
{
sample.per.ID[[i]]<-sum(length((IDs)[[i]]))
}
total <- sum(unlist(sample.per.ID)); total # 24 total IDs (including repeats); avoids masking sum()
## Unlist vector with random sequence of samples
SRS.ID.order = unlist(IDs) #order of IDs with repeats
for (i in 1:length(SRS.ID.order ))
{
school.assignment$ID[i]<-SRS.ID.order [i]
}
My last loop is where I attempt to assign IDs to school.assignment$ID. However, some IDs end up spread across different groups. I want each ID from SRS.ID.order to stay within a single group (i.e. a constant school.assignment$Group). Below you can see that this is not the case: for example, ID 4 is in groups 1 and 2.
> head(school.assignment)
ID Group Assignment
1 2 1 1
2 2 1 2
3 3 1 3
4 4 1 4
5 4 2 1
6 4 2 2
I would like the loop to leave a slot unassigned (i.e. NA) when the next ID has more repeats than there are spaces left in that group:
ID Group Assignment
1 2 1 1
2 2 1 2
3 3 1 3
4 NA 1 4
5 4 2 1
6 4 2 2
I was thinking that I need some type of indicator for the J group like this code below:
########################################
for (i in 1:length(school.assignment$ID))
{
for (j in 1:length(unique(school.assignment$Group)))
{
school.assignment$ID[i]<-ifelse(sum(is.na(school.assignment$ID[i,j]))>=sample.per.ID[i],SRS.ID.order[i],NA)
}
}
Error in school.assignment$ID[i, j] : incorrect number of dimensions
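The error arises because school.assignment$ID is a plain vector, so it takes a single index rather than [i, j]. One way to get the number of free (NA) slots per group, which is the quantity the condition needs, is tapply. The sketch below uses a small stand-in for school.assignment with three slots in group 1 already filled; the object name free.slots is only illustrative.

```r
# Toy stand-in for the question's school.assignment (6 groups of 4 slots)
school.assignment <- data.frame(ID = NA,
                                Group = rep(1:6, each = 4),
                                Assignment = rep(1:4, times = 6))
school.assignment$ID[1:3] <- c(2, 2, 3)   # pretend three slots in group 1 are taken

# Number of free (NA) slots in each group, named by group
free.slots <- tapply(is.na(school.assignment$ID), school.assignment$Group, sum)
free.slots

# Would an ID with 3 repeats still fit into group 1?
fits <- as.vector(free.slots)[1] >= 3
fits
```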
Any help is very much appreciated!
Thanks
OLD POST
I'm currently trying to write a loop in R with a condition. My data structure is below:
> head(school.assignment)
ID Group Assignment
1 NA 1 1
2 NA 1 2
3 NA 1 3
4 NA 1 4
5 NA 2 1
6 NA 2 2
I would like to assign an ID of the same length as school.assignment to the ID variable shown below:
head(IDs)
[1] 519 519 519 343 251 251...
Not all IDs repeat the same number of times: some appear 1, 2 or even 3 times, as shown above. I have an object with the number of repeats per ID, for example:
> head(repeats)
[1] 3 1 2...
This indicates that ID=519 repeats 3 times, ID=343 only once and ID=251 2 times, etc.
There is one condition that I would like to meet:
1) I would like every single ID to be in the same group whenever possible, i.e. if there is only one spot (NA) left for ID in the matrix object "school.assignment" for group 1, then assign the ID to the next group where there will be enough spaces (i.e. where the number of NAs in school.assignment$ID is >= the repeats for that ID).
My idea was to do a loop but the code below is not working:
########################################
for (i in 1:length(school.assignment$ID))
{
for (j in 1:length(unique(school.assignment$Group)))
{
school.assignment$ID[i]<-ifelse(sum(is.na(school.assignment$ID[i,j]))>=repeats[i],ID[i],NA)
}
}
Is there a way to do this loop while respecting my condition to assign IDs to only one group?
Thank you!
Consider using merge() to assign random group IDs to the data frame. No need for nested for loops. The code below creates a data frame of unique groups, assigns a random number to each, and then merges it with school.assignment:
# CREATE UNIQUE GROUP DATA FRAME
Group <- unique(school.assignment$Group)
grp.ids <- as.data.frame(Group)
# CREATE RANDOM ID FIELD (THREE DIGITS BETWEEN 100 AND 999)
grp.ids$RandomID <- sample(100:999, size = nrow(grp.ids), replace = TRUE)
# MERGE DATA FRAMES
school.assignment <- merge(school.assignment, grp.ids, by="Group", all=TRUE)
# ASSIGN ID COLUMN
school.assignment$ID <- school.assignment$RandomID
# RESTRUCTURE FINAL DATA FRAME
school.assignment <- school.assignment[c("ID", "Group", "Assignment")]
OUTPUT
ID Group Assignment
977 1 1
977 1 2
977 1 3
977 1 4
368 2 1
368 2 2
Given
index = c(1,2,3,4,5)
codes = c("c1","c1,c2","","c3,c1","c2")
df=data.frame(index,codes)
df
index codes
1 1 c1
2 2 c1,c2
3 3
4 4 c3,c1
5 5 c2
How can I create a new df that looks like
df1
index codes
1 1 c1
2 2 c1
3 2 c2
4 3
5 4 c3
6 4 c1
7 5 c2
so that I can perform aggregates on the codes? The "index" of the actual data set are a series of timestamps, so I'll want to aggregate by day or hour.
Roland's method is quite good, provided the variable index has unique keys. You can gain some speed by working with the lists directly. Take into account that:
In your original data frame, codes is a factor. There is no point in that; you want it to be character.
In your original data frame, "" is used instead of NA. As the length of that one is 0, you can get into all kinds of trouble later on. I'd use NA there. " " is an actual value, "" is no value at all, but you want a missing value. Hence NA.
So my idea would be:
The data:
index = c(1,2,3,4,5)
codes = c("c1","c1,c2",NA,"c3,c1","c2")
df=data.frame(index,codes,stringsAsFactors=FALSE)
Then :
X <- strsplit(df$codes,",")
data.frame(
index = rep(df$index,sapply(X,length)),
codes = unlist(X)
)
Or, if you insist on using "" instead of NA:
X <- strsplit(df$codes,",")
ll <- sapply(X,length)
X[ll==0] <- NA
data.frame(
index = rep(df$index,pmax(1,ll)),
codes = unlist(X)
)
Neither method assumes a unique key in index. Both work perfectly well with non-unique timestamps.
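Once the data is in this long form, aggregating on the codes is a one-liner. As a small sketch using the NA variant of the example data (the names long and code.counts are only illustrative, and lengths() is used as a shorthand for sapply(X, length)):

```r
index <- c(1, 2, 3, 4, 5)
codes <- c("c1", "c1,c2", NA, "c3,c1", "c2")
df <- data.frame(index, codes, stringsAsFactors = FALSE)

# One row per (index, code) pair
X <- strsplit(df$codes, ",")
long <- data.frame(index = rep(df$index, lengths(X)),
                   codes = unlist(X),
                   stringsAsFactors = FALSE)

# Count how often each code occurs (table() drops the NA row by default)
code.counts <- table(long$codes)
code.counts
```

With timestamps in index, the same table() call can take a second argument derived from the timestamp (a day or hour label, for instance) to count codes per period.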
You need to split the string (using strsplit) and then combine the resulting list with the data.frame.
The following relies on the assumption that codes are unique in each row. If you have many codes in some rows and only few in others, this might waste a lot of RAM and it might be better to loop.
#to avoid character(0), which would be omitted in rbind
levels(df$codes)[levels(df$codes)==""] <- " "
#rbind fills each row by propagating the values to the "empty" columns for each row
df2 <- cbind(df, do.call(rbind,strsplit(as.character(df$codes),",")))[,-2]
library(reshape2)
df2 <- melt(df2, id="index")[-2]
#here the assumption is needed
df2 <- df2[!duplicated(df2),]
df2[order(df2[,1], df2[,2]),]
# index value
#1 1 c1
#2 2 c1
#7 2 c2
#3 3
#9 4 c1
#4 4 c3
#5 5 c2
Here's another alternative using "data.table". The sample data includes NA instead of a blank space and includes duplicated index values:
index = c(1,2,3,2,4,5)
codes = c("c1","c1,c2",NA,"c3,c1","c2","c3")
df = data.frame(index,codes,stringsAsFactors=FALSE)
library(data.table)
## We could create the data.table directly, but I'm
## assuming you already have a data.frame ready to work with
DT <- data.table(df)
DT[, list(codes = unlist(strsplit(codes, ","))), by = "index"]
# index codes
# 1: 1 c1
# 2: 2 c1
# 3: 2 c2
# 4: 2 c3
# 5: 2 c1
# 6: 3 NA
# 7: 4 c2
# 8: 5 c3
I have an aggregation problem which I cannot figure out how to perform efficiently in R.
Say I have the following data:
group1 <- c("a","b","a","a","b","c","c","c","c",
"c","a","a","a","b","b","b","b")
group2 <- c(1,2,3,4,1,3,5,6,5,4,1,2,3,4,3,2,1)
value <- c("apple","pear","orange","apple",
"banana","durian","lemon","lime",
"raspberry","durian","peach","nectarine",
"banana","lemon","guava","blackberry","grape")
df <- data.frame(group1,group2,value)
I am interested in sampling from the data frame df such that I randomly pick only a single row from each combination of factors group1 and group2.
As you can see, the results of table(df$group1,df$group2)
1 2 3 4 5 6
a 2 1 2 1 0 0
b 2 2 1 1 0 0
c 0 0 1 1 2 1
shows that some combinations are seen more than once, while others are never seen. For those that are seen more than once (e.g., group1="a" and group2=3), I want to randomly pick only one of the corresponding rows and return a new data frame that has only that subset of rows. That way, each possible combination of the grouping factors is represented by only a single row in the data frame.
One important aspect here is that my actual data sets can contain anywhere from 500,000 rows to >2,000,000 rows, so it is important to be mindful of performance.
I am relatively new at R, so I have been having trouble figuring out how to generate this structure correctly. One attempt looked like this (using the plyr package):
library(plyr)
choice <- function(x,label) {
cbind(x[sample(1:nrow(x),1),],data.frame(state=label))
}
df <- ddply(df[,c("group1","group2","value")],
.(group1,group2),
choice,
label="test")
Note that in this case I am also adding an extra column called "state" to the data frame; its value comes from the label argument, which ddply passes through to the function. However, I killed this after about 20 min.
In other cases, I have tried using aggregate or by or tapply, but I never know exactly what the specified function is getting, what it should return, or what to do with the result (especially for by).
I am trying to switch from Python to R for exploratory data analysis, but this type of aggregation is crucial for me. In Python, I can perform these operations very rapidly, but it is inconvenient as I have to generate a separate script/data structure for each different type of aggregation I want to perform.
I want to love R, so please help! Thanks!
Uri
Here is the plyr solution
library(plyr)
set.seed(1234)
ddply(df, .(group1, group2), summarize,
value = value[sample(length(value), 1)])
This gives us
group1 group2 value
1 a 1 apple
2 a 2 nectarine
3 a 3 banana
4 a 4 apple
5 b 1 grape
6 b 2 blackberry
7 b 3 guava
8 b 4 lemon
9 c 3 durian
10 c 4 durian
11 c 5 raspberry
12 c 6 lime
EDIT. With a data frame that big, you are better off using data.table
library(data.table)
dt = data.table(df)
dt[, list(value = value[sample(length(value), 1)]), by = c("group1", "group2")]
EDIT 2: Performance Comparison: Data Table is ~ 15 X faster
group1 = sample(letters, 1000000, replace = T)
group2 = sample(LETTERS, 1000000, replace = T)
value = runif(1000000, 0, 1)
df = data.frame(group1, group2, value)
dt = data.table(df)
f1_dtab = function() {
dt[, list(value = value[sample(length(value), 1)]), by = c("group1", "group2")]
}
f2_plyr = function() {ddply(df, .(group1, group2), summarize, value =
value[sample(length(value), 1)])
}
f3_by = function() {do.call(rbind,by(df,list(grp1 = df$group1,grp2 = df$group2),
FUN = function(x){x[sample(nrow(x),1),]}))
}
library(rbenchmark)
benchmark(f1_dtab(), f2_plyr(), f3_by(), replications = 10)
test replications elapsed relative
f1_dtab() 10 4.764 1.00000
f2_plyr() 10 68.261 14.32851
f3_by() 10 67.369 14.14127
One more way:
with(df, tapply(value, list( group1, group2), length))
1 2 3 4 5 6
a 2 1 2 1 NA NA
b 2 2 1 1 NA NA
c NA NA 1 1 2 1
# Now use tapply to sample within groups
# `resample` fn is from the sample help page:
# Avoids an error with sample when only one value in a group.
resample <- function(x, ...) x[sample.int(length(x), ...)]
#Create a row index
df$idx <- 1:NROW(df)
rowidxs <- with(df, unique( c( # the `c` function will make a matrix into a vector
tapply(idx, list( group1, group2),
function (x) resample(x, 1) ))))
rowidxs
# [1] 1 5 NA 12 16 NA 3 15 6 4 14 10 NA NA 7 NA NA 8
df[rowidxs[!is.na(rowidxs)] , ]
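A further base R option, offered only as a sketch and not benchmarked against the approaches above: shuffle the rows once, then keep the first row of each group1/group2 combination with duplicated(). After a uniform shuffle, every row of a combination is equally likely to come first, so this picks one row at random per combination. The names shuffled and picked are only illustrative.

```r
group1 <- c("a","b","a","a","b","c","c","c","c",
            "c","a","a","a","b","b","b","b")
group2 <- c(1,2,3,4,1,3,5,6,5,4,1,2,3,4,3,2,1)
value <- c("apple","pear","orange","apple",
           "banana","durian","lemon","lime",
           "raspberry","durian","peach","nectarine",
           "banana","lemon","guava","blackberry","grape")
df <- data.frame(group1, group2, value, stringsAsFactors = FALSE)

set.seed(1234)                         # only to make the sketch reproducible
shuffled <- df[sample(nrow(df)), ]     # rows in random order
# Keep the first occurrence of each (group1, group2) combination
picked <- shuffled[!duplicated(shuffled[c("group1", "group2")]), ]
picked <- picked[order(picked$group1, picked$group2), ]
picked
```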
I need to get the maximum of a variable in a nested list. For a certain station number "s" and a certain member "m", mylist[[s]][[m]] are of the form:
station date.time member bias
6019 2011-08-06 12:00 mbr003 86
6019 2011-08-06 13:00 mbr003 34
For each station, I need to get the maximum of bias of all members. For s = 3, I managed to do it through:
library(plyr)
var1 <- mylist[[3]]
var2 <- lapply(var1, `[`, 4)
var3 <- laply(var2, .fun = max)
max.value <- max(var3)
Is there a way of avoiding the column number "4" in the second line and using the variable name $bias in lapply or a better way of doing it?
You can use [ with the names of columns of data frames as well as their index. So foo[4] will have the same result as foo["bias"] (assuming that bias is the name of the fourth column).
$bias isn't really the name of that column. $ is just another function in R, like [, that is used for accessing columns of data frames (among other things).
But now I'm going to go out on a limb and offer some advice on your data structure. If each element of your nested list contains the data for a unique combination of station and member, here is a simplified toy version of your data:
dat <- expand.grid(station = rep(1:3,each = 2),member = rep(1:3,each = 2))
dat$bias <- sample(50:100,36,replace = TRUE)
tmp <- split(dat,dat$station)
tmp <- lapply(tmp,function(x){split(x,x$member)})
> tmp
$`1`
$`1`$`1`
station member bias
1 1 1 87
2 1 1 82
7 1 1 51
8 1 1 60
$`1`$`2`
station member bias
13 1 2 64
14 1 2 100
19 1 2 68
20 1 2 74
etc.
tmp is a list of length three, where each element is itself a list of length three. Each element is a data frame as shown above.
It's really much easier to record this kind of data as a single data frame. You'll notice I constructed it that way first (dat) and then split it twice. In this case you can rbind it all together again using code like this:
newDat <- do.call(rbind,lapply(tmp,function(x){do.call(rbind,x)}))
rownames(newDat) <- NULL
In this form, these sorts of calculations are much easier:
library(plyr)
#Find the max bias for each unique station+member
ddply(newDat,.(station,member),summarise, mx = max(bias))
station member mx
1 1 1 87
2 1 2 100
3 1 3 91
4 2 1 94
5 2 2 88
6 2 3 89
7 3 1 74
8 3 2 88
9 3 3 99
#Or maybe the max bias for each station across all members
ddply(newDat,.(station),summarise, mx = max(bias))
station mx
1 1 100
2 2 94
3 3 99
Here is another solution using repeated lapply.
lapply(tmp, function(x) lapply(lapply(x, '[[', 'bias'), max))
You may need to use [[ instead of [, but it should work fine with a string (don't use the $). Try:
var2 <- lapply( var1, `[`, 'bias' )
or
var2 <- lapply( var1, `[[`, 'bias' )
depending on whether you want a one-column data frame ([) or the bare bias vector ([[) from each element of var1.
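Putting the pieces together, the maximum bias per station can be computed in one pass while selecting the column by name. This is a minimal sketch on made-up data: the toy mylist below only stands in for the real nested list (stations, then members, then data frames with a bias column), and station.max is an illustrative name.

```r
# Toy nested list standing in for mylist: stations -> members -> data frames
set.seed(1)
dat <- expand.grid(station = 1:3, member = 1:3, obs = 1:4)
dat$bias <- round(runif(nrow(dat), 50, 100))
mylist <- lapply(split(dat, dat$station), function(x) split(x, x$member))

# Inner vapply: max bias of each member; outer vapply: max over members
station.max <- vapply(mylist, function(members) {
  max(vapply(members, function(d) max(d$bias), numeric(1)))
}, numeric(1))
station.max
```

vapply is used instead of sapply so that each step is checked to return exactly one number.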