Suppose you have a list of data.frames like this:
dfs <- list(
a = data.frame(x = c(1:4, 7:10), a = runif(8)),
b = data.frame(x = 1:10, b = runif(10)),
c = data.frame(x = 1:10, c = runif(10))
)
I would now like to extract the longest data.frame (or data.frames, if several are tied) from this list. How can I do that?
I am stuck at this point:
library(plyr)
lengths <- sapply(dfs, nrow)  # sapply() gives a numeric vector; max() cannot handle the list lapply() returns
longest <- max(lengths)
In my opinion, two built-in R functions can solve your question:
which.max() returns the index of the first element that equals the maximum:
> which.max(lengths)
[1] 2
which() returns the indices of all elements that are TRUE. Here:
> which(lengths==longest)
[1] 2 3
Then you can subset your list to the desired elements:
dfs[which(lengths==longest)]
will return b and c in your example.
cnt <- sapply(dfs, nrow)
dfs[cnt == max(cnt)]
Or if you only need the first occurrence of the maximum length:
dfs[which.max(cnt)]
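A minimal base R sketch of the same idea, assuming the example dfs from the question (set.seed() is added only so runif() is reproducible). It uses vapply() so the row counts are guaranteed to be a plain integer vector, and stores them in n_rows, a name chosen here for illustration, rather than lengths, which would mask base R's lengths() function:

```r
# Example data from the question
set.seed(1)
dfs <- list(
  a = data.frame(x = c(1:4, 7:10), a = runif(8)),
  b = data.frame(x = 1:10, b = runif(10)),
  c = data.frame(x = 1:10, c = runif(10))
)

# vapply() checks that every nrow() result is a single integer,
# so n_rows is always a named integer vector
n_rows <- vapply(dfs, nrow, integer(1))

# keep every data frame whose row count equals the maximum (ties included)
longest_dfs <- dfs[n_rows == max(n_rows)]
names(longest_dfs)
# → [1] "b" "c"
```

Use dfs[which.max(n_rows)] instead if only the first longest data frame is needed.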
I want to add a column, computed by a function, to every list inside a list.
list1 <- list(A = 1:10, B = rnorm(10), C = rnorm(10), D = rnorm(10))
list2 <- list(A = 1:10, B = rnorm(10), C = rnorm(10), D = rnorm(10))
both_lists <- list(list1,list2)
both_lists <- lapply(both_lists, function(x) ... )
For a single data frame (not in a list) I normally use:
library(dplyr)
df1 <- data.frame(A = 1:10, B = rnorm(10), C = rnorm(10), D = rnorm(10))
df2 <- data.frame(A = 1:10, B = rnorm(10), C = rnorm(10), D = rnorm(10))
df1 %>% mutate(max = do.call(pmax, c(select(., c(2:4)))))
But how do I do this for the lists inside the list? I want to do two things to every list in my list:
find the maximum of columns 2-4
add that maximum as a separate column
Also, could anyone tell me how to change the names of the lists inside the list? For example, renaming list1 to a value taken from the data itself, e.g. setting its name to df1[[1]][1], and repeating that with lapply for every list in the list?
With lapply you can do it as follows:
lapply(both_lists, function(x){x[['max']] <- do.call(pmax, x[2:4]); x})
The output looks like this:
[[1]]
[[1]]$A
[1] 1 2 3 4 5 6 7 8 9 10
[[1]]$B
[1] 1.325128799 0.341702207 0.341139152 -0.630065889 0.799934566 0.427531770
[7] -1.492861023 2.643621022 0.008158055 -0.187956774
[[1]]$C
[1] -0.8535937 -0.1753520 1.1008905 -0.0385363 -1.6739434 0.2179597 -0.1300490 0.4177869
[9] 1.3066992 0.2369493
[[1]]$D
[1] 0.98472409 0.66930725 0.52449977 0.08553770 -1.81759549 -0.07564249 -0.63611958
[8] -1.19293507 -1.61571223 1.29777033
[[1]]$max
[1] 1.3251288 0.6693073 1.1008905 0.0855377 0.7999346 0.4275318 -0.1300490 2.6436210
[9] 1.3066992 1.2977703
[[2]]
...
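A self-contained sketch of the lapply() approach, assuming example data similar to the question's (list2$A is changed to 11:20 here, an assumption made only so the two inner lists end up with different names). It also covers the renaming part of the question with a hypothetical rule: each inner list is named after the first value of its element A:

```r
set.seed(1)
list1 <- list(A = 1:10, B = rnorm(10), C = rnorm(10), D = rnorm(10))
list2 <- list(A = 11:20, B = rnorm(10), C = rnorm(10), D = rnorm(10))
both_lists <- list(list1, list2)

both_lists <- lapply(both_lists, function(x) {
  # x[2:4] is a list of the vectors B, C and D; do.call() passes them to pmax()
  x[["max"]] <- do.call(pmax, x[2:4])
  x  # return the modified list, not the value of the assignment
})

# hypothetical naming rule: use the first value of element A as the name
names(both_lists) <- vapply(both_lists,
                            function(x) as.character(x[["A"]][1]),
                            character(1))
names(both_lists)
# → [1] "1"  "11"
```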
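Putting your data.frames df1 and df2 from the question into a list named dfl:
library(dplyr)
library(magrittr)
dfl <- list(df1, df2)
dfl <- lapply(dfl, function(x){
  # %<>% pipes x into mutate() and assigns the result back to x
  x %<>% mutate(max = do.call(pmax, c(select(., c(2:4)))))
  x  # return the modified data frame explicitly
})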
And if you want to set the names of the list elements as some value from the data.frames within, maybe something like this?
names(dfl) <- sapply(dfl, function(x){
  x[2, 2]
})
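For comparison, a base R sketch that avoids dplyr and magrittr, assuming df1 and df2 are built like the question's, with A = 11:20 in df2 as an illustrative assumption so the names differ. It names each data frame after x[[1]][1], which is closer to the df1[[1]][1] idea in the question than x[2, 2]:

```r
set.seed(2)
df1 <- data.frame(A = 1:10, B = rnorm(10), C = rnorm(10), D = rnorm(10))
df2 <- data.frame(A = 11:20, B = rnorm(10), C = rnorm(10), D = rnorm(10))
dfl <- list(df1, df2)

# a data frame is a list of columns, so do.call(pmax, x[2:4])
# computes the row-wise maximum of columns 2 to 4
dfl <- lapply(dfl, function(x) {
  x$max <- do.call(pmax, x[2:4])
  x
})

# name each data frame after the first value in its first column
dfl <- setNames(dfl, vapply(dfl, function(x) as.character(x[[1]][1]),
                            character(1)))
names(dfl)
# → [1] "1"  "11"
```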
I hope this is what you actually meant because your question was a bit unclear to me. (Apologies if I am wrong.)
I need to carry values forward from one column to the next, so that each NA is filled with the value from the previous column. Example data:
df <- data.frame(a = c(1,2,NA,NA,NA,NA,NA,NA,NA,NA),
b =c(NA,NA,3,4,NA,NA,NA,NA,NA,NA),
c = c(NA,NA,NA,NA,5,6,NA,NA,NA,NA),
d = c(NA,NA,NA,NA,NA,NA,7,8,NA,NA),
e = c(NA,NA,NA,NA,NA,NA,NA,NA,9,10))
I have tried a loop with the na.locf function from zoo, but this only carries forward the values of the immediately preceding column:
columns <- seq(2,ncol(df))
output <- list()
for (i in columns){
output[[i]] <- t(zoo::na.locf(t(df[,(i-1):i])))[,2]
}
The expected output is:
expected_output <- data.frame(a = c(1,2,NA,NA,NA,NA,NA,NA,NA,NA),
b = c(1,2,3,4,NA,NA,NA,NA,NA,NA),
c = c(1,2,3,4,5,6,NA,NA,NA,NA),
d = c(1,2,3,4,5,6,7,8,NA,NA),
e = c(1,2,3,4,5,6,7,8,9,10))
Transpose df, apply na.locf, transpose back, and use replace() to put the result into df so that it stays a data frame with the correct names.
library(zoo)
out <- replace(df, TRUE, t(na.locf(t(df), fill = NA)))
identical(out, expected_output)
## [1] TRUE
This also works; it is similar, except that it applies na.locf0 to each row instead of applying na.locf to the transpose.
out <- replace(df, TRUE, t(apply(df, 1, na.locf0)))
identical(out, expected_output)
## [1] TRUE
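If zoo is not available, the same fill can be done in base R with a loop over the columns. This is a different technique from na.locf, sketched here under the assumption that values only ever flow from left to right across columns:

```r
df <- data.frame(a = c(1,2,NA,NA,NA,NA,NA,NA,NA,NA),
                 b = c(NA,NA,3,4,NA,NA,NA,NA,NA,NA),
                 c = c(NA,NA,NA,NA,5,6,NA,NA,NA,NA),
                 d = c(NA,NA,NA,NA,NA,NA,7,8,NA,NA),
                 e = c(NA,NA,NA,NA,NA,NA,NA,NA,9,10))

out <- df
# columns are processed left to right, so column j - 1 has already been
# filled by the time column j copies from it
for (j in seq_along(out)[-1]) {
  na_idx <- is.na(out[[j]])
  out[[j]][na_idx] <- out[[j - 1]][na_idx]
}
out
```

Because each column is filled from an already filled neighbour, a value can travel across any number of columns, which matches the expected output in the question.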
I would like to add a new column D, containing the first part of column B, to each data.frame in a list. But I'm not sure how to address columns within the list elements.
# create some data
df1 <- data.frame(A = "hey", B = "wass.7", C = "up")
df2 <- data.frame(A = "how", B = "are.1", C = "you")
dfList <- list(df1,df2)
Desired output:
# a new column D with the last part of column B removed
[[1]]
A B C D
1 hey wass.7 up wass
[[2]]
A B C D
1 how are.1 you are
For a single data frame I did this, which worked:
df1$D <- sub('\\..*', '', df1$B)
In a function I tried the following, which is probably not addressing the columns correctly and returns "unexpected symbol...":
dfList <- lapply(rapply(dfList, function(x)
x$D<-sub('\\..*', '', x$B) how = "list"),
as.data.frame)
The lapply(rapply(...)) part is copied from the question "Using gsub in list of dataframes with R".
Use lapply with a function that adds the column and then returns the modified data frame:
lapply(dfList, function(x){
  x$D <- sub('\\..*', '', x$B)
  x
})
[[1]]
A B C D
1 hey wass.7 up wass
[[2]]
A B C D
1 how are.1 you are
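The same lapply() pattern can also be written with base R's transform(), sketched below with the question's data (using stringsAsFactors = FALSE so B is character in any R version). Note that transform() evaluates its arguments inside each data frame, and its documentation recommends it for interactive use rather than for programming:

```r
df1 <- data.frame(A = "hey", B = "wass.7", C = "up", stringsAsFactors = FALSE)
df2 <- data.frame(A = "how", B = "are.1", C = "you", stringsAsFactors = FALSE)
dfList <- list(df1, df2)

# transform() evaluates D = ... inside each data frame, so B can be
# referenced directly; the regex drops everything from the first "." on
dfList <- lapply(dfList, function(x) transform(x, D = sub("\\..*", "", B)))
dfList
```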
The rapply solution does work. However, you needed a comma before the how argument to resolve the error. Additionally, you will NOT be able to add a new column, only replace existing ones. Since rapply is recursive, it runs sub() on every element of the nested list, i.e. across ALL columns of ALL data frames.
Otherwise, use a simple lapply as in @JilberUrbina's answer.
df1 <- data.frame(A = "hey", B = "wass.7", C = "up", stringsAsFactors = FALSE)
df2 <- data.frame(A = "how", B = "are.1", C = "you", stringsAsFactors = FALSE)
dfList <- list(df1,df2)
dfList <- lapply(rapply(dfList, function(x)
sub('\\..*', '', x), how = "list"),
as.data.frame)
dfList
# [[1]]
# A B C
# 1 hey wass up
# [[2]]
# A B C
# 1 how are you
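If the goal is to rewrite existing columns (not add a new one), rapply() with how = "replace" keeps the attributes of each list element, so the results remain data frames without the as.data.frame step. A sketch, assuming only character columns should be changed (classes = "character"); dfList2 is a name introduced for illustration:

```r
df1 <- data.frame(A = "hey", B = "wass.7", C = "up", stringsAsFactors = FALSE)
df2 <- data.frame(A = "how", B = "are.1", C = "you", stringsAsFactors = FALSE)
dfList <- list(df1, df2)

# how = "replace" keeps the data frame structure of each element;
# classes = "character" restricts the function to character columns
dfList2 <- rapply(dfList, function(x) sub("\\..*", "", x),
                  classes = "character", how = "replace")
dfList2
```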
I am facing the following challenge:
I have a list of dataframes in R and I'd like to extract some specific information from it. Here is an example:
df_1 <- data.frame(A = c(1,2), B = c(3,4), D = c(5,6))
df_2 <- data.frame(A = c(7,8), B = c(9,10), D = c(11,12))
df_3 <- data.frame(A = c(0,1), B = c(2,3), D = c(4,5))
L <- list(df_1, df_2, df_3)
What I'd like to extract are the values at position (1,1) in each of these dataframes. In the above case this would be: 1, 7, 0.
Is there a way to extract this information easily, ideally with one line of code?
As Ronak suggested, you can use a function like lapply and wrap it in unlist to get the desired output.
unlist(lapply(L,function(x) x[1,1]))
In addition to the *apply methods shown above, you can also do this in a vectorized manner. Since all the data frames in your list have the same column names and you want the first element of the first column (named 'A1' after unlisting), you can simply unlist the list (which creates a named vector) and keep the values named A1.
v1 <- unlist(L)
v1[names(v1) == 'A1']
#A1 A1 A1
# 1 7 0
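A slightly stricter variant of the lapply()/unlist() answer, sketched with the question's data: vapply() returns the numeric vector directly and errors if any x[1, 1] is not a single number. sapply() with the "[" function is a shorter alternative:

```r
df_1 <- data.frame(A = c(1,2), B = c(3,4), D = c(5,6))
df_2 <- data.frame(A = c(7,8), B = c(9,10), D = c(11,12))
df_3 <- data.frame(A = c(0,1), B = c(2,3), D = c(4,5))
L <- list(df_1, df_2, df_3)

# one value per data frame, checked to be a single number
first_vals <- vapply(L, function(x) x[1, 1], numeric(1))
first_vals
# → [1] 1 7 0

# shorter: calls `[`(x, 1, 1) on each data frame
first_vals2 <- sapply(L, "[", 1, 1)
```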
I have a list of data frames and I need to convert a certain variable in each of them to a factor.
For example:
myList <- list(df1 = data.frame(A = sample(10), B = rep(1:2, 10)),
df2 = data.frame(A = sample(10), B = rep(1:2, 10))
)
Let's say that variable B needs to be a factor in each data frame. I've tried this:
TMP <- setNames(lapply(seq_along(myList), function(x) apply(myList[[x]][c("B")], 2, factor)), names(myList))
but it only returns the transformed variable, not the whole data frame as I need. I know how to do this with a for loop, but I don't want to resort to that.
Per a comment from David Arenburg, this solution should work:
TMP <- lapply(myList, function(x) {
  x[, "B"] <- factor(x[, "B"])
  x
})
str(TMP)
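A sketch of a reusable variant, assuming you may want to convert more than one column: to_factor() is a hypothetical helper that converts the named columns of one data frame and returns the whole data frame, so lapply() keeps the full data frames and their names. set.seed() is added only for reproducibility:

```r
set.seed(1)
myList <- list(df1 = data.frame(A = sample(10), B = rep(1:2, 10)),
               df2 = data.frame(A = sample(10), B = rep(1:2, 10)))

# hypothetical helper: convert the columns listed in cols to factors
to_factor <- function(df, cols) {
  df[cols] <- lapply(df[cols], factor)
  df  # return the whole data frame, not just the converted columns
}

TMP <- lapply(myList, to_factor, cols = "B")
str(TMP)
```

Passing cols = c("A", "B") would convert both columns in the same call.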