simulate lapply / loop over a list of parameters

I would like to simulate the frequency and severity over a list of parameters.
Here is the case for the first item in the list:
data <- data.frame(
lamda = c(5, 2, 3),
meanlog = c(9, 10, 11),
sdlog = c(2, 2.1, 2.2))
s <- 2  # number of simulations; s was undefined here, value taken from the answer
freq <- rpois(s, data$lamda[1])
freqsev <- lapply(freq, function(k) rlnorm(k, data$meanlog[1], sdlog = data$sdlog[1]))
freq
freqsev
How do I set up a loop or an lapply statement to iterate over all the items in data (not just the first)?
Thanks.

We can use map (from the purrr package, part of the tidyverse package) as follows to create list columns. The contents are now stored in the freq and freqsev columns.
library(tidyverse)
set.seed(123)
s <- 2
data2 <- data %>%
mutate(freq = map(lamda, ~rpois(s, .x)),
freqsev = map(freq, ~map(.x, function(k) rlnorm(k, meanlog, sdlog))))
data2$freq
# [[1]]
# [1] 4 7
#
# [[2]]
# [1] 2 4
#
# [[3]]
# [1] 6 0
data2$freqsev
# [[1]]
# [[1]][[1]]
# [1] 9330.247 28897.323 2605520.369 20370.283
#
# [[1]][[2]]
# [1] 645.4047 5206.2183 22461.1778 93729.0634 46892.3129 144595.7492 10110.8606
#
#
# [[2]]
# [[2]][[1]]
# [1] 2665.955 938950.074
#
# [[2]][[2]]
# [1] 21931.9763 354.2858 280122.6952 3147.6681
#
#
# [[3]]
# [[3]][[1]]
# [1] 957.5257 13936.3063 6265.3530 1886.0077 5927.8540 1464.5081
#
# [[3]][[2]]
# numeric(0)
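One thing to watch in the snippet above: inside mutate(), the names meanlog and sdlog in the inner function refer to the whole columns, so rlnorm() recycles all three parameter values across the draws instead of using each row's own pair. A minimal sketch of a row-wise variant, assuming purrr's pmap() and a hypothetical helper name sim_sev; it is not the original answer's code and will not reproduce the numbers printed above.

```r
library(dplyr)
library(purrr)

set.seed(123)
s <- 2  # number of simulated periods per parameter set

data <- data.frame(
  lamda   = c(5, 2, 3),
  meanlog = c(9, 10, 11),
  sdlog   = c(2, 2.1, 2.2)
)

# hypothetical helper: one severity vector per simulated claim count,
# all drawn with a single (meanlog, sdlog) pair
sim_sev <- function(counts, meanlog, sdlog) {
  map(counts, function(k) rlnorm(k, meanlog = meanlog, sdlog = sdlog))
}

data_rowwise <- data %>%
  mutate(
    freq    = map(lamda, function(l) rpois(s, l)),
    # pmap() walks the three columns in parallel, so each row uses its own parameters
    freqsev = pmap(list(freq, meanlog, sdlog), sim_sev)
  )
```

Each element of data_rowwise$freqsev is a list of s severity vectors whose lengths match the corresponding simulated counts in data_rowwise$freq.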
Update
Here is a way to cap values greater than or equal to 500 at 500.
data3 <- data2 %>%
mutate(capat500 = map(freqsev, ~map(.x, function(y) ifelse(y >= 500, 500, y))))
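The same cap can also be expressed with base R's pmin(), which takes the element-wise minimum of a vector and a limit. A small sketch on made-up severities (sev and capped are illustrative names, not from the original):

```r
# illustrative severities, including an empty draw
sev <- list(c(120, 730.5, 499.9), numeric(0), c(500, 10))

# cap every value at 500; values below the cap are left unchanged
capped <- lapply(sev, pmin, 500)
capped[[1]]
# [1] 120.0 500.0 499.9
```

For numeric input this gives the same result as ifelse(y >= 500, 500, y), with less to type.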

Related

How do I search if the numbers in one vector are within a range of two other vectors in R?

I have two vectors. One is start and one is stop for a range of Nucleotides in a protein. Ex. one range is 1374742-1375555.
domainStart = c(1374742,1374760,1374769,1375822,1376182,1376320,1376350)
domainStop = c(1375555, 1375726,1375516, 1378129, 1376638, 1376638, 1377382)
Next I have a long list of nucleotide mutation positions.
db = c(37788, 40303, 138445, 161587, 165946,172979,177605, 200118, 244427, 251156, 258459, 265170, 344062)
I want to know if any of the mutation positions (db) are in the ranges of the domain (1374742-1375555) and return TRUE /FALSE as a vector for each position. Thanks!

You could use map2() from the purrr package:
domainStart = c(1374742,1374760,1374769,1375822,1376182,1376320,1376350)
domainStop = c(1375555, 1375726,1375516, 1378129, 1376638, 1376638, 1377382)
db = c(37788, 40303, 138445, 161587, 165946,172979,177605, 200118, 244427, 251156, 258459, 265170, 344062)
purrr::map2(domainStart, domainStop, ~which(db > .x & db < .y))
# [[1]]
# integer(0)
#
# [[2]]
# integer(0)
#
# [[3]]
# integer(0)
#
# [[4]]
# integer(0)
#
# [[5]]
# integer(0)
#
# [[6]]
# integer(0)
#
# [[7]]
# integer(0)
Each element of the list gives the positions in db that fall inside the corresponding start/stop pair. Here it is with some values that actually fall inside the ranges:
db <- c(1374750, 1374761, 1374770)
purrr::map2(domainStart, domainStop, ~which(db > .x & db < .y))
# [[1]]
# [1] 1 2 3
#
# [[2]]
# [1] 2 3
#
# [[3]]
# [1] 3
#
# [[4]]
# integer(0)
#
# [[5]]
# integer(0)
#
# [[6]]
# integer(0)
#
# [[7]]
# integer(0)
Update: Fixed to address comment
db <- c(1374750, 1374761, 1374770)
purrr::map2(domainStart, domainStop, function(.x,.y){
mx <- db[which(db > .x & db < .y)]
if(length(mx) == 0){
mx <- NA
}
data.frame(domainStart = .x, domainStop = .y, db = mx)
})
# [[1]]
# domainStart domainStop db
# 1 1374742 1375555 1374750
# 2 1374742 1375555 1374761
# 3 1374742 1375555 1374770
#
# [[2]]
# domainStart domainStop db
# 1 1374760 1375726 1374761
# 2 1374760 1375726 1374770
#
# [[3]]
# domainStart domainStop db
# 1 1374769 1375516 1374770
#
# [[4]]
# domainStart domainStop db
# 1 1375822 1378129 NA
#
# [[5]]
# domainStart domainStop db
# 1 1376182 1376638 NA
#
# [[6]]
# domainStart domainStop db
# 1 1376320 1376638 NA
#
# [[7]]
# domainStart domainStop db
# 1 1376350 1377382 NA

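Perhaps we can try the code below. For each position in db, it returns the rows of df whose range contains that position.
df <- data.frame(Start = domainStart, Stop = domainStop)
apply(
  # rows = positions in db, columns = domains; TRUE where Start <= position <= Stop
  outer(db, domainStart, `>=`) & outer(db, domainStop, `<=`),
  1,
  function(v) {
    df[which(v), ]
  }
)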
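The question asks for one TRUE/FALSE value per mutation position. A minimal sketch of that, assuming inclusive range boundaries; in_any_domain is a hypothetical name and the three test positions are made up for illustration:

```r
domainStart <- c(1374742, 1374760, 1374769, 1375822, 1376182, 1376320, 1376350)
domainStop  <- c(1375555, 1375726, 1375516, 1378129, 1376638, 1376638, 1377382)

# illustrative positions (not from the original data)
db <- c(1374750, 37788, 1375800)

# for each position: is it inside at least one [start, stop] range?
in_any_domain <- vapply(
  db,
  function(p) any(p >= domainStart & p <= domainStop),
  logical(1)
)
in_any_domain
# [1]  TRUE FALSE FALSE
```

vapply() is used instead of sapply() so the result is guaranteed to be a logical vector of the same length as db.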

Selection of only existing combination before the normalization operation

I'd like to normalize a variable only for the combinations of var1 and var2 that actually exist, using a for loop. In my example:
# Create my variables
var1<-c(rep(6,25),rep(7,5))
var2<-c(1,1,1,1,1,2,2,2,2,2,5,5,5,5,5,10,10,10,10,10,11,11,11,11,11,5,5,5,5,5)
var3<-rnorm(30)
# Create a data frame
mydf<-data.frame(var1,var2,var3)
str(mydf)
# Inspection by var1 and var2
table(mydf$var1,mydf$var2)
#     1 2 5 10 11
#   6 5 5 5  5  5
#   7 0 0 5  0  0
# I'd like to skip the "0" combinations!
# My idea is to create a subset only for combinations that have values, but if I do:
var1ID <- unique(mydf$var1)
var2ID <- unique(mydf$var2)
for(a in 1:length(var1ID)){
for(b in 1:length(var2ID)){
mydf_sub <- mydf[mydf$var1 == var1ID[a] & mydf$var2 ==var2ID[b],]
print(var1ID[a])
print(var2ID[b])
# Normalize function
normalizevar <- function(x, na.rm = TRUE) {
return((x- min(x))/(max(x)-min(x)))
}
print(normalizevar(mydf_sub$var3))
}}
# [1] 6
# [1] 1
# [1] 0.0000000 0.1235632 0.1541684 1.0000000 0.3910381
# [1] 6
# [1] 2
# [1] 0.7911505 0.0000000 0.6296866 1.0000000 0.1904835
# [1] 6
# [1] 5
# [1] 0.6571259 1.0000000 0.1402675 0.0000000 0.4068031
# [1] 6
# [1] 10
# [1] 0.7060784 0.0000000 1.0000000 0.4842629 0.9560127
# [1] 6
# [1] 11
# [1] 0.4096362 0.4831099 1.0000000 0.0000000 0.5492811
# [1] 7
# [1] 1
# numeric(0)
# [1] 7
# [1] 2
# numeric(0)
# [1] 7
# [1] 5
# [1] 0.6208451 0.3219927 1.0000000 0.4012007 0.0000000
# [1] 7
# [1] 10
# numeric(0)
# [1] 7
# [1] 11
# numeric(0)
Here I have a problem: I only want output for the combinations that exist, not the numeric(0) results. Any help with this, or a dplyr approach to solving it?

Note that in the question, the normalizing function was not removing NA's, if any.
# define the function at the beginning of the script,
# never in a loop
normalizevar <- function(x, na.rm = TRUE) {
(x- min(x, na.rm = na.rm))/(max(x, na.rm = na.rm)-min(x, na.rm = na.rm))
}
# make the results reproducible
set.seed(2021)
# Create my variables
var1 <- c(rep(6,25),rep(7,5))
var2 <- c(1,1,1,1,1,2,2,2,2,2,5,5,5,5,5,10,10,10,10,10,11,11,11,11,11,5,5,5,5,5)
var3 <- rnorm(30)
mydf <- data.frame(var1,var2,var3)
Base R solution
There is no need for nested loops; two (unnested) *apply loops will do it, in just three lines of code.
# create the groups of var1, var2
sp <- split(mydf, mydf[1:2])
# keep the sub-data.frames with more than zero rows
sp <- sp[sapply(sp, nrow) > 0]
# and normalize var3
lapply(sp, function(X) normalizevar(X$var3))
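As a possible shortcut for the split-then-filter step, split() accepts drop = TRUE, which discards the empty var1/var2 combinations up front. A sketch, assuming every group has at least two distinct values (otherwise the denominator is zero); sp2 and normalized are illustrative names:

```r
normalizevar <- function(x, na.rm = TRUE) {
  (x - min(x, na.rm = na.rm)) / (max(x, na.rm = na.rm) - min(x, na.rm = na.rm))
}

set.seed(2021)
var1 <- c(rep(6, 25), rep(7, 5))
var2 <- c(rep(c(1, 2, 5, 10, 11), each = 5), rep(5, 5))
var3 <- rnorm(30)
mydf <- data.frame(var1, var2, var3)

# drop = TRUE keeps only the var1/var2 combinations that occur in the data
sp2 <- split(mydf, mydf[1:2], drop = TRUE)

# normalize var3 within each remaining group
normalized <- lapply(sp2, function(X) normalizevar(X$var3))
length(normalized)
# [1] 6
```

The list names have the form "var1.var2" (for example "6.1" or "7.5"), which identifies the group each normalized vector belongs to.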
dplyr solution
A dplyr solution could be the following. Grouping by var1 and var2 only ever creates the groups that exist in the data.
library(dplyr)
mydf %>%
  group_by(var1, var2) %>%
  mutate(new_var3 = normalizevar(var3))

All combinations of summing a vector of numbers together

Given a vector of dice rolls and a target number:
target.mountain <- 10
Roll_dice <- sample(1:6, 4, replace=TRUE)
With Roll_dice producing, as an example,
[1] 6, 5, 3, 2
how can I produce a list of all the numbers in Roll_dice together with all the sums obtained by adding 2, 3 or 4 of those values together?
For example [1] 2, 3, 5, 5, 6, 7, 11, ....

I suggest checking out the RcppAlgos package, which has some awesome (and fast!) functions for operations on combinations/permutations with constraints.
update
library(RcppAlgos)
library(vecsets)
library(data.table)
target.mountain <- 10
Roll_dice <- c(5, 5, 3, 2)
L <- lapply( 2:4, function(x) {
as.data.table(comboGeneral( Roll_dice,
x,
constraintFun = "sum",
comparisonFun = "==",
limitConstraints = target.mountain ),
keep.rownames = TRUE )
})
# [[1]]
# V1 V2
# 1: 5 5
#
# [[2]]
# V1 V2 V3
# 1: 2 3 5
# so 5-5 or 2-3-5 can be chosen to get to 10
# remaining dice
DT <- data.table::rbindlist( L, fill = TRUE )
remains <- lapply( transpose(DT), function(x) {
v <- as.vector(x)
v <- v[ !is.na(v) ]
sum( vecsets::vsetdiff( Roll_dice, v) )
})
remains
# with leftovers:
# $V1
# [1] 5
#
# $V2
# [1] 5
old answer
library(RcppAlgos)
target.mountain <- 10
Roll_dice <- c(6, 4, 5, 5)
sapply( 2:4, function(x) {
comboGeneral( Roll_dice,
x,
constraintFun = "sum",
comparisonFun = "==",
limitConstraints = target.mountain )
})
# [[1]]
# [,1] [,2]
# [1,] 4 6
# [2,] 5 5
#
# [[2]]
# [,1] [,2] [,3]
#
# [[3]]
# [,1] [,2] [,3] [,4]

Something like this?
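Roll_dice <- c(6, 5, 3, 2)
target.mountain <- 10
# sums of all combinations of 2, 3 and 4 dice
sapply(2:4, function(k) combn(Roll_dice, k, sum))
# [[1]]
# [1] 11  9  8  8  7  5
#
# [[2]]
# [1] 14 13 11 10
#
# [[3]]
# [1] 16
Or do you need this?
# for each combination size: can the target be reached?
lapply(
  setNames(2:4, 2:4),
  function(k) target.mountain %in% combn(Roll_dice, k, sum)
)
# $`2`
# [1] FALSE
#
# $`3`
# [1] TRUE
#
# $`4`
# [1] FALSE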
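If the goal is the single sorted vector of sums that the question sketches, the per-size results can be flattened. A minimal sketch, assuming the example roll 6, 5, 3, 2 from the question; all_sums is an illustrative name:

```r
target.mountain <- 10
Roll_dice <- c(6, 5, 3, 2)

# sums of every combination of 2, 3 and 4 dice, flattened and sorted
all_sums <- sort(unlist(lapply(2:4, function(k) combn(Roll_dice, k, sum))))
all_sums
# [1]  5  7  8  8  9 10 11 11 13 14 16

# how many combinations hit the target exactly
sum(all_sums == target.mountain)
# [1] 1
```

To include the single dice as well, as the question's example output suggests, the range 2:4 can be widened to 1:4.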

How to find if the numbers are continuous in R?

I have a range of values
c(1,2,3,4,5,8,9,10,13,14,15)
And I want to find the ranges where the numbers become discontinuous. All I want is this as output:
(1,5)
(8,10)
(13,15)
I need to find break points.
I need to do it in R.

Something like this?
x <- c(1:5, 8:10, 13:15) # example data
unname(tapply(x, cumsum(c(1, diff(x)) != 1), range))
# [[1]]
# [1] 1 5
#
# [[2]]
# [1] 8 10
#
# [[3]]
# [1] 13 15
Another example:
x <- c(1, 5, 10, 11:14, 20:21, 23)
unname(tapply(x, cumsum(c(1, diff(x)) != 1), range))
# [[1]]
# [1] 1 1
#
# [[2]]
# [1] 5 5
#
# [[3]]
# [1] 10 14
#
# [[4]]
# [1] 20 21
#
# [[5]]
# [1] 23 23

x <- c(1:5, 8:10, 13:15)
rr <- rle(x - seq_along(x))
rr$values <- seq_along(rr$values)
s <- split(x, inverse.rle(rr))
s
# $`1`
# [1] 1 2 3 4 5
#
# $`2`
# [1] 8 9 10
#
# $`3`
# [1] 13 14 15
## And then to get *literally* what you asked for:
cat(paste0("(", gsub(":", ",", sapply(s, deparse)), ")"), sep="\n")
# (1,5)
# (8,10)
# (13,15)

I published seqle which will do this for you in one line. You can load the package cgwtools or search SO for the code, as it's been posted a couple times.

Assuming that you don't care about the exact output and are looking for the min and max of each range, you can use diff/cumsum/range as follows:
x <- c(1:5, 8:10, 13:15)
x. <- c(0, cumsum( diff(x)-1 ) )
lapply( split(x, x.), range )
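The diff-based answers all rely on the same idea: a new run starts wherever the step from the previous value is not 1. A minimal sketch that returns the break points as a data frame, assuming x is sorted and has no duplicates; find_runs is a hypothetical helper name:

```r
# sketch: assumes x is sorted, without duplicates, and that a step of 1 means "continuous"
find_runs <- function(x) {
  is_start <- c(TRUE, diff(x) != 1)   # TRUE where a new run begins
  is_end   <- c(is_start[-1], TRUE)   # a run ends just before the next start
  data.frame(start = x[is_start], end = x[is_end])
}

runs <- find_runs(c(1:5, 8:10, 13:15))
runs
#   start end
# 1     1   5
# 2     8  10
# 3    13  15
```

Isolated values come out as runs with start equal to end, matching the second tapply example above.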

Combining sequences with similar gene IDs

I have a list of gene IDs along with their sequences in R.
$2435
[1]"ATGCGGGCGGGGGTCGTCGA"
$2435
[1]"ATGCGGCGCGCGCGCTATATACGC"
$2435
[1]"ATGCGGCGCCTCTCATCGCGGGGG"
I want to combine the sequences with the same gene IDs in that list in R.
$2435
[1]"ATGCGGGCGGGGGTCGTCGAATGCGGCGCGCGCGCTATATACGCATGCGGCGCCTCTCATCGCGGGGG"

Use lapply after matching the names with unique. Here's some sample data:
A <- list("12" = "AAAABBBBCCCCDDDD",
"34" = "GGGG",
"12" = "XXXXXXXXXXXXXXXXXXXXXXX",
"10" = "FFFFGGGG",
"10" = "HHHHIIII")
A
# $`12`
# [1] "AAAABBBBCCCCDDDD"
#
# $`34`
# [1] "GGGG"
#
# $`12`
# [1] "XXXXXXXXXXXXXXXXXXXXXXX"
#
# $`10`
# [1] "FFFFGGGG"
#
# $`10`
# [1] "HHHHIIII"
Subset the related names and paste them together.
lapply(unique(names(A)), function(x) paste(A[names(A) %in% x], collapse = ""))
# [[1]]
# [1] "AAAABBBBCCCCDDDDXXXXXXXXXXXXXXXXXXXXXXX"
#
# [[2]]
# [1] "GGGG"
#
# [[3]]
# [1] "FFFFGGGGHHHHIIII"
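Note that the lapply() result above loses the gene IDs, because it iterates over an unnamed character vector. A small sketch that keeps them, using setNames(); ids and combined are illustrative names:

```r
A <- list("12" = "AAAABBBBCCCCDDDD",
          "34" = "GGGG",
          "12" = "XXXXXXXXXXXXXXXXXXXXXXX",
          "10" = "FFFFGGGG",
          "10" = "HHHHIIII")

ids <- unique(names(A))

# paste together all sequences sharing an ID, then label each result with that ID
combined <- setNames(
  lapply(ids, function(id) paste(unlist(A[names(A) == id]), collapse = "")),
  ids
)
combined[["10"]]
# [1] "FFFFGGGGHHHHIIII"
```

The IDs appear in order of first occurrence ("12", "34", "10"), and each sequence can then be looked up by ID with combined[["12"]] and so on.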

l <- list("A" = "ABC", "B" = "XYX", "A" = "DEF", "C" = "YZY", "A" = "GHI")
tapply(l, names(l), paste, collapse = "", simplify = FALSE)
# $A
# [1] "ABCDEFGHI"
#
# $B
# [1] "XYX"
#
# $C
# [1] "YZY"

Bonus:
For a dataframe output, use this:
aggregate(unlist(A), by=list(id=names(A)), paste, collapse="")
where A is your list.
Using @Ananda's A, I get this:
#   id                                       x
# 1 10                        FFFFGGGGHHHHIIII
# 2 12 AAAABBBBCCCCDDDDXXXXXXXXXXXXXXXXXXXXXXX
# 3 34                                    GGGG
