How to replicate a function for a nested list - r

I have a nested list (datalist) that contains several data frames (e.g., A-F), and I'd like to apply the same steps to each of them.
I've done the following for the nested data frame "A" and would like to run it for the other data frames (B-F):
dat_A_dat<-datalist["A"]
dat_A <- dat_A_dat$"A"[,c(1:4,7)] #note: I have to use $ to access this
dat_A.v <-dat_A[,c(1,2)]
dat_A.b <-dat_A[,3]
dat_A.c <-dat_A[,4]
dat_A.r <-dat_A[,5] #note: column 7 of the original data is the 5th column after subsetting
Is there a simpler way of doing this?
Your help would be greatly appreciated. Thank you all.

It's not clear how your data is structured or what exactly you're trying to achieve, but if you're just asking how to write a function that can be applied to each element of the list, it would look something like the following.
Note: you might have to change this depending on the structure of your data and what you are trying to achieve. In future, try to include a reproducible example.
my_extraction_function <- function(d) {
  # d is a single data frame from the list, e.g. datalist[["A"]]
  dat <- d[, c(1:4, 7)]    # keep original columns 1-4 and 7
  dat.v <- dat[, c(1, 2)]
  dat.b <- dat[, 3]
  dat.c <- dat[, 4]
  dat.r <- dat[, 5]        # original column 7 is now the 5th column
  # Return them in whatever format you want
  list(v = dat.v,
       b = dat.b,
       c = dat.c,
       r = dat.r)
}
You can then do:
my_extraction_function(datalist[["A"]])
my_extraction_function(datalist[["B"]])
... etc.
or
lapply(datalist, my_extraction_function)
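For a version you can run end to end, here is a minimal sketch. The toy data (`make_df`, three data frames with seven columns each) and the name `extract_parts` are made up for illustration, since the real structure of `datalist` isn't shown in the question.

```r
# Hypothetical toy data: each data frame has 7 columns, like the question implies
make_df <- function() {
  as.data.frame(matrix(seq_len(21), nrow = 3,
                       dimnames = list(NULL, paste0("col", 1:7))))
}
datalist <- list(A = make_df(), B = make_df(), C = make_df())

# Takes ONE data frame and returns the pieces as a named list
extract_parts <- function(d) {
  dat <- d[, c(1:4, 7)]      # original columns 1-4 and 7
  list(v = dat[, 1:2],
       b = dat[, 3],
       c = dat[, 4],
       r = dat[, 5])         # original column 7 is the 5th column of dat
}

# lapply keeps the list names, so results$B$r is the "r" piece for B
results <- lapply(datalist, extract_parts)
results$B$r
```

Because the result is itself a named list, you don't need separate dat_A.v, dat_B.v, ... objects; results$A$v, results$B$v and so on take their place.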

Related

Convert R list to Pythonic list and output as a txt file

I'm trying to convert these lists so that they look like Python lists. I've used this code:
library(GenomicRanges)
library(data.table)
library(Repitools)
pcs_by_tile<-lapply(as.list(1:length(tiled_chr)) , function(x){
obj<-tileSplit[[as.character(x)]]
if(is.null(obj)){
return(0)
} else {
runs<-filtered_identical_seqs.gr[obj]
df <- annoGR2DF(runs)
score = split(df[,c("start","end")], 1:nrow(df[,c("start","end")]))
#print(score)
return(score)
}
})
dt_text <- unlist(lapply(tiled_chr$score, paste, collapse=","))
writeLines(tiled_chr, paste0("x.txt"))
The following line of code iterates through each row of the DataFrame (only 2 columns) and splits them into the list. However, its output is different from what I desired.
score = split(df[,c("start","end")], 1:nrow(df[,c("start","end")]))
But I wanted the following kinda output:
[20350, 20355], [20357, 20359], [20361, 20362], ........
If I understand your question correctly, using as.tuple from the package 'sets' might help. Here's what the code might look like:
library(sets)
score = split(df[,c("start","end")], 1:nrow(df[,c("start","end")]))
....
df_text = unlist(lapply(score, as.tuple),recursive = F)
This will return a list of tuples (and zeroes) that looks more like what you are after. You can filter out the zeroes by checking the type of each element in the resulting list and dropping the ones that are plain numbers. Note that the test has to return a logical vector, so use sapply rather than lapply here (negating a list with ! is an error). For example:
df_text_trimmed <- df_text[!sapply(df_text, is.double)]
This gets rid of all your zeroes.
Edit: Now that I think about it, you probably don't even need to convert your dataframes to tuples if you don't want to. You just need to make sure to include the 'recursive = F' option when you unlist things to get a list of 0s and dataframes containing the numbers you want.
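If all you need is a text file in the bracketed format shown in the question, a base-R sketch without 'sets' could look like this. The data frame here is a stand-in built from the three pairs in the desired output; in the real code it would come from annoGR2DF(runs), and the output file name is an assumption.

```r
# Stand-in for df[, c("start", "end")] from the question
df <- data.frame(start = c(20350, 20357, 20361),
                 end   = c(20355, 20359, 20362))

# One "[start, end]" string per row, built in a vectorised way
pairs <- sprintf("[%d, %d]", as.integer(df$start), as.integer(df$end))

# Join the pairs into a single comma-separated line
line <- paste(pairs, collapse = ", ")

# Write to a file (a temporary file here; use "x.txt" or similar in practice)
out_file <- tempfile(fileext = ".txt")
writeLines(line, out_file)
line
```

With the three rows above, line is "[20350, 20355], [20357, 20359], [20361, 20362]".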

How to constrain duplicate removal for specific data.frame in the list more elegantly?

I have data.frame objects in a list, as the output of a custom function, and I want to remove duplicates only from the first data.frame while leaving the others unaffected. I tried to handle this constraint inside lapply, but I got a subscript error instead. I know it is easy to do this separately, but that is not what I want. Can anyone point out how to do this more easily in a functional style? Does anyone know a useful trick for applying an operation only to specific objects in a list?
mini example:
myList <- list(
bar = data.frame(v1=c(12,21,37,21,37), v2=c(14,29,45,29,45)),
cat = data.frame(v1=c(18,42,18,42,81), v2=c(27,46,27,46,114)),
foo = data.frame(v1=c(3,3,33,3,33,91), v2=c(26,26,42,26,42,107))
)
it is easy to do like this:
.first <- unique(myList[[1L]])
res <- c(list(.first), myList[- 1L])
but I need duplicate removal to affect only the first data.frame, while the others keep their duplicates, and I'd like to implement this in a function in a more elegant way.
desired output:
myOutput <- list(
bar = data.frame(v1=c(12,21,37),v2=c(14,29,45)),
cat = data.frame(v1=c(18,42,18,42,81), v2=c(27,46,27,46,114)),
foo = data.frame(v1=c(3,3,33,3,33,91), v2=c(26,26,42,26,42,107))
)
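If we need to use lapply, then we can loop through the sequence of the list and modify the list elements with if/else. Looping over seq_along drops the list names, so we restore them with setNames
setNames(lapply(seq_along(myList), function(i) if(i==1) unique(myList[[i]]) else myList[[i]]), names(myList))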
Or else we can assign the modified list element
myList[[1]] <- unique(myList[[1]])
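If you want this wrapped in a reusable function, one possible sketch is below. The name `dedupe_at` and its `which` argument are made up for illustration; the idea is simply to apply unique to a chosen subset of the list and leave everything else as it is.

```r
myList <- list(
  bar = data.frame(v1 = c(12, 21, 37, 21, 37), v2 = c(14, 29, 45, 29, 45)),
  cat = data.frame(v1 = c(18, 42, 18, 42, 81), v2 = c(27, 46, 27, 46, 114)),
  foo = data.frame(v1 = c(3, 3, 33, 3, 33, 91), v2 = c(26, 26, 42, 26, 42, 107))
)

# Hypothetical helper: remove duplicate rows only from the elements selected
# by `which` (positions or names); all other elements are returned unchanged.
dedupe_at <- function(lst, which = 1L) {
  lst[which] <- lapply(lst[which], unique)
  lst
}

res <- dedupe_at(myList)            # only the first data.frame is de-duplicated
res2 <- dedupe_at(myList, "foo")    # selecting by name works too
```

Because the assignment goes through lst[which], the list names and the untouched elements are preserved without any if/else.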

Problems with an if statement

I have a complex matrix with several rows per individual. I wrote a script that summarizes different variables per individual. To do that, I first create a list with the new summarized variables in it. To get some of these variables I need to introduce if clauses like the following:
this_iids_roh <- dat[class,]
my_list<-c("Froh"=(sum(this_iids_roh$KB)/2881033),
"chr1"= if (this_iids_roh$CHR==1) {(sum(this_iids_roh$KB)/247249.719)*100},
"chr2"= if (this_iids_roh$CHR==2) {(sum(this_iids_roh$KB)/242193.529)*100},
"chr3"= if (this_iids_roh$CHR==3) {(sum(this_iids_roh$KB)/198295.559)*100})
return(my_list)
However, when I run this script (this is just a small part) I only get the "Froh" and "chr1" variables. I tried several things but I can't get the other variables after "chr1".
I hope you can help me!
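The if statements are the problem: if expects a single TRUE/FALSE, but this_iids_roh$CHR==1 is a vector with one value per row (older versions of R silently use only the first element; R 4.2.0 and later raise an error), and an if without else returns NULL when the condition is false, which c() then drops. Instead of wrapping each term in if, you can use the condition directly to subset the data. Some example data: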
this_iids_roh <- NULL
this_iids_roh$CHR = rep(c(1,2,3),10)
this_iids_roh$KB = runif(30)*100000
this_iids_roh = as.data.frame(this_iids_roh)
The way to do this is
my_list<-c("Froh"=(sum(this_iids_roh$KB)/2881033),
"chr1"= {(sum(this_iids_roh$KB[this_iids_roh$CHR==1])/247249.719)*100},
"chr2"= {(sum(this_iids_roh$KB[this_iids_roh$CHR==2])/242193.529)*100},
"chr3"= {(sum(this_iids_roh$KB[this_iids_roh$CHR==3])/198295.559)*100})
> my_list
Froh chr1 chr2 chr3
0.60958 203.99334 251.06703 324.65984
Hope this solves the problem. Note that the conditions are written inside the square brackets above.
Alternatively, fold the factor of 100 into the divisor:
my_list<-c(Froh= sum(this_iids_roh$KB)/2881033,
chr1= sum(this_iids_roh$KB[this_iids_roh$CHR==1])/2472.49719,
chr2= sum(this_iids_roh$KB[this_iids_roh$CHR==2])/2421.93529,
chr3= sum(this_iids_roh$KB[this_iids_roh$CHR==3])/1982.95559)
my_list
This also works with with(), which saves repeating the data frame name:
my_list <- with(this_iids_roh, c(Froh= sum(KB)/2881033,
chr1= sum(KB[CHR==1])/2472.49719,
chr2= sum(KB[CHR==2])/2421.93529,
chr3= sum(KB[CHR==3])/1982.95559))
my_list
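With more than a handful of chromosomes, writing one line per chromosome gets tedious. A possible sketch is to keep the chromosome lengths in a named vector and loop over it. The names `roh` and `chr_len_kb` are made up for illustration; the lengths are the three values from the question, and the data is random example data as above.

```r
set.seed(1)  # only so the example data is reproducible
roh <- data.frame(CHR = rep(c(1, 2, 3), 10),
                  KB  = runif(30) * 100000)

# Chromosome lengths (same units as in the question), named by chromosome
chr_len_kb <- c("1" = 247249.719, "2" = 242193.529, "3" = 198295.559)

# Percentage of each chromosome covered: subset KB by chromosome, then scale
pct <- vapply(names(chr_len_kb), function(chr) {
  sum(roh$KB[roh$CHR == as.numeric(chr)]) / chr_len_kb[[chr]] * 100
}, numeric(1))
names(pct) <- paste0("chr", names(chr_len_kb))

my_list <- c(Froh = sum(roh$KB) / 2881033, pct)
my_list
```

Adding another chromosome then only means adding one entry to chr_len_kb.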

Apply custom function to any dataset with common name

I have a custom function that I want to apply to any dataset that shares a common name.
common_funct=function(rank_p=5){
df = ANY_DATAFRAME_HERE[ANY_DATAFRAME_HERE$rank <rank_p,]
return(df)
}
I know with common functions I could do something like below to get the value of each.
apply(mtcars,1,mean)
But what if I wanted to do :
apply(any_dataset, 1, common_funct(anyvalue))
How would I pass that along?
library(dplyr)
mtcars$rank = dense_rank(mtcars$mpg)
iris$rank = dense_rank(iris$Sepal.Length)
Now how would I go about applying my same function to both values?
If I understand your question, I would suggest putting your data frames into a list and applying the function over it. So
## Your example function
common_funct=function(df, rank_p=5){
df[df$rank <rank_p,]
}
## Sanity check
common_funct(mtcars)
common_funct(iris)
Next create a list of the data frames
l = list(mtcars, iris)
and use lapply
lapply(l, common_funct)
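To pass a different rank_p along, add it as an extra argument to lapply, which forwards it to the function. A minimal sketch follows; to keep it free of dependencies it uses a small base-R stand-in for dplyr's dense_rank (the helper `dense_rank_base` is hypothetical, as are the names `cars`, `flowers` and `dfs`).

```r
# Base-R stand-in for dplyr::dense_rank: rank of each value among the
# sorted unique values, with no gaps between ranks
dense_rank_base <- function(x) as.integer(factor(x))

cars <- mtcars
cars$rank <- dense_rank_base(cars$mpg)
flowers <- iris
flowers$rank <- dense_rank_base(flowers$Sepal.Length)

common_funct <- function(df, rank_p = 5) {
  df[df$rank < rank_p, ]
}

# A named list gives named results
dfs <- list(mtcars = cars, iris = flowers)

# Extra arguments after the function are passed on to every call
res <- lapply(dfs, common_funct, rank_p = 3)
```

Here res$mtcars and res$iris each contain only the rows with rank below 3.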

A loop to create a list

I would like to build a list using a loop.
I have a set of variables called:
var1, var2, ... varN
And I would like an easy way to create a list of length N, named listvar, such that:
unlist(listvar[i])=vari (with i in 1:N)
Does anyone have an idea?
The code makes me wonder why the variables var1 … varN exist in the first place: they shouldn’t. Instead, generate the list directly.
That said, you can easily retrieve the value of a variable given its name using get, or the values of several variables at once using mget. This doesn’t even require a loop, because mget is vectorised over the names.
varnames = paste0('var', 1 : N)
listvar = mget(varnames)
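A small runnable sketch of both routes, with made-up values for var1 to var3 and N = 3 (the second half, `squares`, is just an illustrative way of generating a list directly):

```r
# Made-up variables standing in for var1 ... varN
var1 <- 10
var2 <- "b"
var3 <- c(TRUE, FALSE)
N <- 3

# Collect existing variables by name; envir is given explicitly so the
# lookup happens in the environment where the variables were defined
varnames <- paste0("var", seq_len(N))
listvar <- mget(varnames, envir = environment())

# Preferred when possible: build the list directly, with no var1 ... varN
squares <- lapply(seq_len(N), function(i) i^2)
```

listvar is named by the variable names, so listvar$var2 and listvar[["var2"]] both return "b".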
