Finding longest length out of 3 different vectors in R - r

I do not know if there is a function for this, but I have 3 data frames with different lengths. I was wondering if there is a way to find which one has the greatest length and store it in a variable. For example:
x <- c(1:10)
y <- c(1:20)
z <- c(1:40)
I would want to use z as my variable because it has the longest length. Is there a function that can search through these 3 variables (x,y,z) and give me back the one with the longest length?
Thanks

We can place the vectors in a list, use lengths to get the length of each element, and use which.max to get the index of the longest one, which we then use to extract that element from the list.
lst <- list(x, y, z)
lst[which.max(lengths(lst))]
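As a quick check with the vectors from the question, here is a minimal sketch. The element names on the list are an addition for illustration, not part of the answer above. Note that single brackets return a one-element list, while double brackets return the vector itself, and that which.max picks the first element when lengths are tied.

```r
x <- 1:10
y <- 1:20
z <- 1:40

# Naming the list elements (an assumption for illustration) makes it
# possible to see which vector turned out to be the longest.
lst <- list(x = x, y = y, z = z)

idx <- which.max(lengths(lst))  # index of the first longest element
longest <- lst[[idx]]           # the vector itself, not a one-element list

names(idx)       # "z"
length(longest)  # 40
```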

If you have data frames rather than vectors, compare the number of rows instead:
lst <- list(df1, df2, df3)
lst[which.max(sapply(lst, nrow))]
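The reason for switching to nrow is that lengths() applied to a list of data frames counts columns, not rows. A small sketch, assuming three made-up data frames (df1, df2 and df3 here are hypothetical stand-ins for the real data), with vapply used as a type-checked alternative to sapply:

```r
# Hypothetical data frames, made up for illustration
df1 <- data.frame(a = 1:3)
df2 <- data.frame(a = 1:7)
df3 <- data.frame(a = 1:5)
lst <- list(df1 = df1, df2 = df2, df3 = df3)

lengths(lst)                             # 1 1 1: column counts, not row counts
n_rows <- vapply(lst, nrow, integer(1))  # 3 7 5
largest <- lst[[which.max(n_rows)]]      # the data frame with the most rows

nrow(largest)  # 7
```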

Related

R: Creating data frame from list of values and list of variable names

I have two lists, A and B:
List A contains K character vectors of length W. Each vector contains the same W string values, but the order of the strings may differ. In practice, we can think of this list as containing vectors of variable names, where each vector contains the same variable names but in potentially differing orders.
List B contains K character vectors of length W. Each vector can contain W arbitrary values. We can think of this list in practice as containing vectors with the corresponding values of the variables contained in each vector of List A.
I am trying to generate a data frame that is K rows long and W columns wide, where the column names are the W unique values in each vector in List A and the values for each row are drawn from the vector found at that row's index in List B.
I've been able to do this (minimal working example below) but it seems very hackish because it basically involves turning the two lists into data frames and then assigning values from one as column names for the other in a loop.
Is there a way to skip the steps of turning each list into a data frame before then using a loop to combine them? Looping through the lists seems inefficient, as does generating the two data frames rather than a single data frame that draws on contents of both lists.
library(dplyr)  # provides %>% and bind_rows()
# Declare number of rows and columns
K <- 10
W <- 5
colnames_set <- sample(LETTERS, W)
# Generate example data
# List A: column names
list_a <- vector(mode = "list", length = K)
list_a <- lapply(list_a, function(x) x <- sample(colnames_set, W))
# List B: values
list_b <- vector(mode = "list", length = K)
list_b <- lapply(list_b, function(x) x <- rnorm(n = W))
# Define function to take a vector and turn it into a
# data frame where each element of the vector is
# assigned to its own column
vec2df <- function(x) {
  x %>%
    as.data.frame(., stringsAsFactors = FALSE) %>%
    t() %>%
    as.data.frame(., stringsAsFactors = FALSE)
}
# Convert vectors to data frames
vars <- lapply(list_a, vec2df)
vals <- lapply(list_b, vec2df)
# Combine the data frames into one
# (note the looping)
for (i in 1:K) {
  colnames(vals[[i]]) <- vars[[i]][1, ]
}
# Combine rows into a single data frame
out <- vals %>%
dplyr::bind_rows()
rownames(out) <- NULL
# Show output
out
Arrange the data in list_b so that the variables are aligned. We can use Map/mapply to do this, then convert the output to a data frame and name the columns.
setNames(data.frame(t(mapply(function(x, y) y[order(x)], list_a, list_b))),
sort(colnames_set))
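To see why this works, here is a small sketch on hand-made inputs (hypothetical values, chosen so the result is easy to verify by eye, not the random data from the question). order(x) gives the positions that sort the names, and indexing the values with it puts them in the same sorted order.

```r
# Small hand-made inputs (hypothetical, for illustration only)
list_a <- list(c("B", "A", "C"), c("C", "B", "A"))
list_b <- list(c(2, 1, 3), c(30, 20, 10))

# For each pair, reorder the values so they follow the sorted names
aligned <- mapply(function(x, y) y[order(x)], list_a, list_b)

# mapply returns one column per list element, so transpose to get rows
out <- setNames(data.frame(t(aligned)), sort(list_a[[1]]))
out
#    A  B  C
# 1  1  2  3
# 2 10 20 30
```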

How to use stopifnot with a list in R

Suppose I have a list of vectors and would like to impose a condition on their lengths. That is, I would like my function to return an error if the lengths of these vectors are not equal.
For example,
x <- c(1:4)
y <- c(1:5)
z <- c(1:4)
k <- list(x, y, z)
I would like to check that their lengths are equal.
stopifnot(length(k[[1]]) == length (k[[2]]) == length(k[[3]]))
How could I generalize this code and make it work for an arbitrary number of list elements?
We can use lengths with unique
stopifnot(length(unique(lengths(k)))==1)
Error: length(unique(lengths(k))) == 1 is not TRUE
lengths returns the length of each vector in the list as an integer vector, and unique reduces these to the distinct values. If there is more than one distinct length, the condition is not TRUE and stopifnot raises an error.
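If a more descriptive error message is wanted, the same check can be wrapped in a small function. This is only a sketch: check_equal_lengths is a hypothetical helper name, and it uses stop() rather than stopifnot() so that the message can list the lengths found. Unlike the one-liner above, it lets an empty list pass.

```r
# Hypothetical helper: raises an error unless every element of `lst`
# has the same length. An empty list passes.
check_equal_lengths <- function(lst) {
  n <- lengths(lst)
  if (length(unique(n)) > 1) {
    stop("elements have differing lengths: ", paste(n, collapse = ", "))
  }
  invisible(TRUE)
}

check_equal_lengths(list(1:4, 1:4, 1:4))  # passes silently

# try() captures the error so the script can continue
res <- try(check_equal_lengths(list(1:4, 1:5, 1:4)), silent = TRUE)
inherits(res, "try-error")                # TRUE
```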

why nrow(dataframe) and length(dataframe) in r give different results?

I have an ordered data frame and want to know the number of the last row.
data_ranking <- reduced_data[order(reduced_data$outcome,reduced_data$hospital,na.last=NA),]
nobs <- nrow(data_ranking)
gives me a different result from
data_ranking <- reduced_data[order(reduced_data$outcome,reduced_data$hospital,na.last=NA),]
nobs <- length(data_ranking)
I would like to understand why that is. It seems that nrow gives me the answer I'm looking for, but I don't understand why.
Data frames are essentially lists where each element has the same length.
Each element of the list is a column, hence length gives you the length of the list, usually the number of columns.
nrow will give you the number of rows, ncol (or length) the number of columns.
The obvious equivalence of columns and list lengths gets messy once we have nonstandard structures within the data.frame (e.g. matrices):
x <- data.frame(y=1:5, z = matrix(1:10,ncol=2))
ncol(x)
# 3
length(x)
# 3
x1 <- data.frame(y=1:5, z = I(matrix(1:10,ncol=2)))
ncol(x1)
# 2
length(x1)
# 2
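For an ordinary data frame, the relationship can be checked directly. A minimal sketch, using a made-up data frame (df and its columns are hypothetical):

```r
# Hypothetical data frame with 5 rows and 2 columns
df <- data.frame(id = 1:5, score = c(10, 20, 30, 40, 50))

length(df)  # 2: a data frame is a list of columns
ncol(df)    # 2
nrow(df)    # 5
dim(df)     # 5 2: rows first, then columns
```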

Returning head and tail means from list of vectors

I need to calculate the mean (or other summary functions) of the top x and bottom x elements of each vector in a list of vectors of varying lengths.
Here is a list of 3 vectors of different lengths similar in format with what I am working with:
t <- list(a = exp(-4:3), b = exp(-2:12), c = exp(-5:3))
Ideally, I would like a single vector of numbers for each type of mean (I manually ran mean(head(t$a),2) and mean(tail(t$a),2) for each vector):
Ideal output yielding a nameless vector of means of the first two elements from each vector:
[1] 0.2516074 1.859141 0.09256118
Second vector of means for last two entries in each vector:
[1] 1.859141 15064.77 1.859141
Looking for a clever lapply-type construct to get a vector of numbers for each means without the attached names (in this case a,b,c). Thanks!
What about
n = 2
v = lapply(t, function(i) mean(head(i, n)))
The variable v is a list. So to get a vector, just use unlist
v = unlist(v)
To extract the numbers use as.vector
as.vector(v)
For the tail, just use
lapply(t, function(i) mean(tail(i, n)))
Using sapply, you can compute both means in a single call:
sapply(t, function(x, length = 2)
  c(mean(head(x, length)), mean(tail(x, length))))
# Rounded to four decimals; row 1 is the head mean, row 2 the tail mean
#            a           b       c
# [1,]  0.0341      0.2516  0.0125
# [2,] 13.7373 111314.4666 13.7373
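To get plain unnamed vectors, as the question asks, one option is vapply combined with unname. This is a sketch rather than the answerers' code. One detail worth checking when comparing results: in mean(head(v), 2) the 2 is passed to mean as its trim argument, not to head, so the count has to go inside the head or tail call.

```r
t <- list(a = exp(-4:3), b = exp(-2:12), c = exp(-5:3))
n <- 2

# vapply checks that each result is a single number;
# unname drops the a/b/c names from the result.
head_means <- unname(vapply(t, function(v) mean(head(v, n)), numeric(1)))
tail_means <- unname(vapply(t, function(v) mean(tail(v, n)), numeric(1)))

head_means  # first-two means, no names
tail_means  # last-two means, no names
```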

Subsetting a list of different data types

I have a list of different data types (factors, data.frames, and vectors, all the same length or number of rows). What I would like to do is subset each element of the list by a vector (let's call it rows) that represents row names.
If it was a data.frame() I would:
x <- x[rows,]
If it was a vector() or factor() I would:
x <- x[rows]
So, I've been playing around with this:
x <- lapply(my_list, function(x) ifelse(is.data.frame(x), x[rows,], x[rows]))
So, how do I accomplish my goal of getting a list of subsetted data?
I think this is YAIEP (Yet Another If Else Problem). From ?ifelse:
ifelse returns a value with the same shape as test which is filled
with elements selected from either yes or no depending on whether the
element of test is TRUE or FALSE.
See the trouble? Same shape as test.
So just do this:
l <- list(a = data.frame(x=1:10,y=1:10),b = 1:10, c = factor(letters[1:20]))
rows <- 1:3
fun <- function(x) {
  if (is.data.frame(x)) {
    x[rows, ]
  } else {
    x[rows]
  }
}
lapply(l,fun)
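The difference is easy to see on a single vector. A minimal sketch (v is a made-up example vector): because the test has length one, ifelse returns only one element, while if/else returns the whole branch.

```r
v <- 1:10
rows <- 1:3

# The test has length 1, so ifelse keeps only the first element of `yes`
ifelse(is.numeric(v), v[rows], NA)  # 1

# if/else evaluates one branch and returns its full value
if (is.numeric(v)) v[rows] else NA  # 1 2 3
```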
