R apply function to nested list elements using "[[" - r

Given a nested list of numeric vectors like
l = list( a = list(1:2, 3:5), b = list(6:10, 11:16))
If I want to apply a function, say length, to the first numeric vector of each sub-list, I can do it using the subset function [[:
> sapply(lapply(l, "[[", 1), length)
a b
2 5
I can't figure out how to supply arbitrary indices to [[ in order to get the length of (in this example) both vectors in every sub-list (a naive try: sapply(lapply(l, "[[", 1:2), length)).

[[ can only extract a single element; given a vector such as 1:2 it indexes recursively, so x[[1:2]] means x[[1]][[2]]. To select more than one element, use [ instead and then apply lengths:
sapply(lapply(l, "[", 1:2), lengths)
# a b
#[1,] 2 5
#[2,] 3 6
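The same idea extends to any set of indices and any function by nesting two sapply calls: the outer one over the sub-lists, the inner one over the selected elements. This is a minimal sketch; apply_at is a hypothetical helper name (not a base R function), and it assumes every sub-list has at least max(idx) elements.

```r
l <- list(a = list(1:2, 3:5), b = list(6:10, 11:16))

# Hypothetical helper (not part of base R): for every sub-list of `lst`,
# select the elements at positions `idx` with `[` and apply `FUN` to each one.
apply_at <- function(lst, idx, FUN) {
  sapply(lst, function(sub) sapply(sub[idx], FUN))
}

apply_at(l, 1:2, length)  # one column per sub-list, one row per index
#      a b
# [1,] 2 5
# [2,] 3 6

apply_at(l, 2, sum)       # a single index simplifies to a named vector (a: 12, b: 81)
```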

Not using base, but purrr is a great package for lists.
library(purrr)
map_dfc(l, ~lengths(.[1:2]))
# A tibble: 2 x 2
a b
<int> <int>
1 2 5
2 3 6

Maybe the code below can help...
> sapply(l, function(x) sapply(x, length))
a b
[1,] 2 5
[2,] 3 6

Related

changing column names of a data frame by changing values - R

Let's say I have the data frame below.
df.open<-c(1,4,5)
df.close<-c(2,8,3)
df<-data.frame(df.open, df.close)
> df
df.open df.close
1 1 2
2 4 8
3 5 3
I want to replace column names that include "open" with "a" and column names that include "close" with "b". Namely, I want to obtain the data frame below:
a b
1 1 2
2 4 8
3 5 3
I have a lot of such data frames. The prefixes (here "df.") vary, but "open" and "close" are fixed.
Thanks a lot.
We can create a function for reuse
f1 <- function(dat) {
names(dat)[grep('open$', names(dat))] <- 'a'
names(dat)[grep('close$', names(dat))] <- 'b'
dat
}
and apply it to the data
df <- f1(df)
-output
df
a b
1 1 2
2 4 8
3 5 3
If these datasets are in a list:
lst1 <- list(df, df)
lst1 <- lapply(lst1, f1)
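f1 needs one grep line per suffix. If more suffixes have to be mapped, a named lookup vector keeps the rules in one place. This sketch assumes the part to match is everything after the last "." in each column name; rename_by_suffix and its map argument are hypothetical names chosen for illustration.

```r
# Hypothetical helper: rename columns using a named lookup vector keyed by the
# suffix after the last "." (e.g. "df.open" -> suffix "open" -> "a").
rename_by_suffix <- function(dat, map = c(open = "a", close = "b")) {
  suffix <- sub(".*\\.", "", names(dat))  # strip everything up to the last "."
  hit <- suffix %in% names(map)           # only rename columns with a known suffix
  names(dat)[hit] <- unname(map[suffix[hit]])
  dat
}

df <- data.frame(df.open = c(1, 4, 5), df.close = c(2, 8, 3))
rename_by_suffix(df)

# Also works on a list of data frames, whatever the prefix or column order
lst1 <- list(df, data.frame(x.close = 1:2, x.open = 3:4))
lst1 <- lapply(lst1, rename_by_suffix)
```

Columns whose suffix is not in map keep their original names.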
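Thanks to dear @akrun's insightful suggestion, as always, we can do it in one go. We pass character vectors as the pattern and replacement arguments of str_replace so that both operations are carried out at once. Each argument can be a character vector of length one or more; in the latter case the lengths of the two vectors should match. Because str_replace is vectorised over string, pattern and replacement together, the i-th pattern is applied only to the i-th column name, so this relies on the open column coming before the close column; str_replace_all with a named vector such as c(".*\\.open$" = "a", ".*\\.close$" = "b") does not depend on column order. More to the point, as the documentation says: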
References of the form \1, \2, etc will be replaced with the contents
of the respective matched group (created by ())
library(dplyr)
library(stringr)
df %>%
rename_with(~ str_replace(., c(".*\\.open", ".*\\.close"), c("a", "b")))
a b
1 1 2
2 4 8
3 5 3
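Another base R option using gsub + match + setNames. The capture group keeps only the trailing "open" or "close"; a character class such as [^open|close] would instead keep every o, p, e, n, c, l, s or | in the name, which breaks for prefixes containing those letters.
setNames(
  df,
  c("a", "b")[match(
    gsub(".*(open|close)$", "\\1", names(df)),
    c("open", "close")
  )]
)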
gives
a b
1 1 2
2 4 8
3 5 3

Duplicating R dataframe vector values using another vector as a guide

I have the following R data frame: df = data.frame(value=c(5,4,3,2,1), a=c(2,0,1,6,9), b=c(7,0,0,3,4)). I would like to repeat each value of a and b as many times as the value at the corresponding position in value. For example, expanding b would give b_ex = c(7,7,7,7,7,3,3,4). The repeat counts four and three (value[2] and value[3]) do not contribute because b[2] and b[3] are zero, and zeros should be dropped. The expanded vectors should be named and stand alone.
Thanks!
Maybe you are looking for:
result <- lapply(df[-1], function(x) rep(x[x != 0], df$value[x != 0]))
#$a
#[1] 2 2 2 2 2 1 1 1 6 6 9
#$b
#[1] 7 7 7 7 7 3 3 4
To have them as separate vectors in the global environment, use list2env:
list2env(result, .GlobalEnv)
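Since rep() accepts a repeat count of zero, the zeros can also be removed through the counts rather than by subsetting. This sketch also adds the _ex suffix before exporting; that naming is an assumption based on b_ex in the question.

```r
df <- data.frame(value = c(5, 4, 3, 2, 1), a = c(2, 0, 1, 6, 9), b = c(7, 0, 0, 3, 4))

# A zero in a or b turns its repeat count into zero, so that entry disappears
result <- lapply(df[-1], function(x) rep(x, times = df$value * (x != 0)))
names(result) <- paste0(names(result), "_ex")  # a_ex, b_ex as in the question

result$b_ex
# [1] 7 7 7 7 7 3 3 4

# list2env(result, envir = .GlobalEnv) would then create a_ex and b_ex
```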

Get the mean across list of dataframes by rows

I have a list of data frames and I want to calculate the mean of all first rows, the mean of all second rows, etc.
I think this is possible by creating a common factor as a row index, putting the data frames together with rbind and then calculating the mean with aggregate(val ~ row.index, data = large.df, FUN = mean). However, I guess there is a more straightforward way?
Here is my example:
df1 = data.frame(val = c(4,1,0))
df2 = data.frame(val = c(5,2,1))
df3 = data.frame(val = c(6,3,2))
myLs=list(df1, df2, df3)
[[1]]
val
1 4
2 1
3 0
[[2]]
val
1 5
2 2
3 1
[[3]]
val
1 6
2 3
3 2
And my expected data frame output, as row-wise means:
df.means
mean
1 5
2 2
3 1
My first steps, not working as expected yet:
# Calculate the mean of list by rows
lapply(myLs, function(x) mean(x[1,]))
A simple way is to cbind the list and calculate the mean of each row with rowMeans:
rowMeans(do.call(cbind, myLs))
#[1] 5 2 1
We can also use bind_cols from dplyr to combine all the dataframes.
rowMeans(dplyr::bind_cols(myLs))
Here is another base R solution using unlist + data.frame + rowMeans, i.e.,
rowMeans(data.frame(unlist(myLs, recursive = FALSE)))
# [1] 5 2 1
Using a double loop (over row indices, then over the data frames):
sapply(seq_len(nrow(myLs[[1]])), function(i) mean(sapply(myLs, function(j) j[i, ])))
# [1] 5 2 1
Another base R possibility could be:
Reduce("+", myLs)/length(myLs)
val
1 5
2 2
3 1
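The Reduce() approach also keeps columns apart when each data frame has more than one column, whereas rowMeans(do.call(cbind, myLs)) would average all columns of a row together. The extra column w below is made-up data for illustration only.

```r
df1 <- data.frame(val = c(4, 1, 0), w = c(1, 2, 3))
df2 <- data.frame(val = c(5, 2, 1), w = c(3, 4, 5))
df3 <- data.frame(val = c(6, 3, 2), w = c(5, 6, 7))
myLs <- list(df1, df2, df3)

# Element-wise sum of equally shaped data frames, divided by their count:
# each cell becomes the mean of that row/column position across the list
df.means <- Reduce(`+`, myLs) / length(myLs)
df.means
#   val w
# 1   5 3
# 2   2 4
# 3   1 5
```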

R: converting fractions into decimals in a data frame

I am trying to convert a data frame of numbers stored as characters in a fraction form to be stored as numbers in decimal form. (There are also some integers, also stored as char.) I want to keep the current structure of the data frame, i.e. I do not want a list as a result.
Example data frame (note: the real data frame has all elements as character, here it is a factor but I couldn't figure out how to replicate a data frame with characters):
a <- c("1","1/2","2")
b <- c("5/2","3","7/2")
c <- c("4","9/2","5")
df <- data.frame(a,b,c)
I tried df[] <- apply(df,1, function(x) eval(parse(text=x))). This calculates the numbers correctly, but only for the last column, populating the data frame with that.
Result:
a b c
1 4 4.5 5
2 4 4.5 5
3 4 4.5 5
I also tried df[] <- lapply(df, function(x) eval(parse(text=x))), which had the following result (and I have no idea why):
a b c
1 3 3 2
2 3 3 2
3 3 3 2
Desired result:
a b c
1 1 2.5 4
2 0.5 3 4.5
3 2 3.5 5
Thanks a lot!
You are probably looking for:
df[] <- apply(df, c(1, 2), function(x) eval(parse(text = x)))
df
a b c
1 1.0 2.5 4.0
2 0.5 3.0 4.5
3 2.0 3.5 5.0
eval(parse(text = x)) only returns the value of the last expression when x contains several, so you need to evaluate cell by cell.
EDIT: if some data frame elements cannot be evaluated, you can account for that by adding an if/else statement inside the function:
df[] <- apply(df, c(1, 2), function(x) if (x %in% skip) {NA} else {eval(parse(text = x))})
where skip is a vector of elements that should not be evaluated.
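Firstly, you should prevent your characters from turning into factors in data.frame() (the default before R 4.0.0):
df <- data.frame(a, b, c, stringsAsFactors = FALSE)
Then you can nest one sapply inside another (the outer one over columns, the inner one over cells) to achieve what you want. The result is a matrix; assign it to df[] to keep the data frame structure.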
sapply(X = df, FUN = function(v) {
sapply(X = v,
FUN = function(w) eval(parse(text=w)))
}
)
Side Notes
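If you pass eval an expression vector with several elements, such as expression(1, 1/2, 2), it returns only the value of the last one. This explains the 4 4.5 5 output: apply(df, 1, ...) works row by row, and the last elements of the three rows are 4, 9/2 and 5. A single expression, expression(c(1, 1/2, 2)), evaluates to the expected vector.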
The code lapply(df, function(x) eval(parse(text=x))) returns 3 3 2 because sapply(data.frame(a,b,c), as.numeric) returns:
a b c
[1,] 1 2 1
[2,] 2 1 3
[3,] 3 3 2
These numbers are the integer codes (positions in levels()) of the factors through which you were storing your fractions; the last code of each column, 3, 3 and 2, is what eval returned.
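To those looking for a one-liner: you can use parse_ratio from the DOSE package to coerce character fractions to numeric. Be aware that, as the output below shows, the whole number "3" comes back as 1.0 rather than 3, so it only suits vectors where every element is a fraction.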
library(DOSE)
b <- c("5/2","3","7/2")
parse_ratio(b)
[1] 2.5 1.0 3.5
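eval(parse()) will run any R code that happens to be in the data. A minimal base R sketch that only understands plain numbers and single a/b fractions, and therefore also handles whole numbers such as "3", could look like this; frac_to_num is a hypothetical helper name.

```r
# Hypothetical helper: convert strings such as "5/2" or "3" to numbers without
# eval(parse()). Assumes each element is a plain number or a single "a/b".
frac_to_num <- function(x) {
  parts <- strsplit(as.character(x), "/", fixed = TRUE)
  vapply(parts, function(p) {
    n <- as.numeric(p)
    if (length(n) == 1) n
    else if (length(n) == 2) n[1] / n[2]
    else NA_real_  # anything else, e.g. more than one "/", becomes NA
  }, numeric(1))
}

df <- data.frame(a = c("1", "1/2", "2"),
                 b = c("5/2", "3", "7/2"),
                 c = c("4", "9/2", "5"),
                 stringsAsFactors = FALSE)
df[] <- lapply(df, frac_to_num)  # df[] keeps the data frame structure
df
#     a   b   c
# 1 1.0 2.5 4.0
# 2 0.5 3.0 4.5
# 3 2.0 3.5 5.0
```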

Using apply or plyr when the return value has a variable number of columns

I'm wondering if there is a way to directly return a data frame from an apply or plyr call when the return from the function can have a variable number of columns (but will always have the same number of rows). For example:
df <- data.frame(A = 1:3, B = c("a","b", "c"))
my_fun <- function(x){
if(is.numeric(unlist(x))){
return(x)
} else {
return(cbind(x, x))
}
}
The closest I've been able to get is by returning a list and converting it into a data frame:
library(plyr)
data.frame(alply(df, 2, my_fun))
## A X2.B X2.B.1
## 1 1 a a
## 2 2 b b
## 3 3 c c
It feels like there should be a way to do this without the extra conversion, is there?
I use lapply() a lot in this way when I want to apply a function to several columns of a data frame. In base R you can treat a data frame as a list where each column is one element. If you use lapply() as usual, it returns a list, which isn't what we want.
> lapply(df, my_fun)
$A
[1] 1 2 3
$B
x x
[1,] 1 1
[2,] 2 2
[3,] 3 3
But if you assign the result to df[] it will signal to R that you want a subset of your original data frame (the full subset, which isn't a subset at all), thus preserving the data frame object type.
> df[] <- lapply(df, my_fun)
> df
A B.x B.x
1 1 1 1
2 2 2 2
3 3 3 3
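One thing to check with df[] <-: the two-column result for B is stored as a single matrix column, so ncol(df) stays at 2 (the print-out above just labels its sub-columns B.x B.x). If truly separate columns are needed, passing the list to data.frame() splits the matrix out, similar to the alply result in the question. This sketch assumes a character B; with factors, as in the output above, you would see the integer codes instead. my_fun is restated in a shorter, equivalent form so the block runs on its own.

```r
df <- data.frame(A = 1:3, B = c("a", "b", "c"), stringsAsFactors = FALSE)
my_fun <- function(x) {
  if (is.numeric(unlist(x))) x else cbind(x, x)
}

res <- lapply(df, my_fun)

kept <- df
kept[] <- res             # B becomes a 3 x 2 matrix held in one column
ncol(kept)                # 2

wide <- data.frame(res)   # the matrix is split into two separate columns
ncol(wide)                # 3
```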
