R: lapply multinomial test to a list of data frames

I have a data frame A, which I split into a list of 100 data frames, each having 3 rows (In my real data each data frame has 500 rows). Here I show A with 2 elements of the list (row1-row3; row4-row6):
A <- data.frame(n = c(0, 1, 2, 0, 1, 2),
prob = c(0.4, 0.5, 0.1, 0.4, 0.5, 0.1),
count = c(24878, 33605, 12100 , 25899, 34777, 13765))
# This is the list:
nest <- split(A, rep(1:2, each = 3))
I want to apply the multinomial test to each of these data frames and extract the p-value of each test. So far I have done this:
library(EMT)
fun <- function(x){
  multinomial.test(x$count,
                   prob = x$prob,
                   useChisq = FALSE, MonteCarlo = TRUE,
                   ntrial = 100, # n of withdrawals accomplished
                   atOnce = 100)
}
lapply(nest, fun)
However, I get:
"Error in multinomial.test(x$counts_set, prob = x$norm_genome, useChisq = F, :
Observations have to be stored in a vector, e.g. 'observed <- c(5,2,1)'"
Does anyone have a smarter way of doing this?

The results of split are created with names 1, 2 and so on. That's why x$count in fun cannot access it. To make it simpler, you can combine your split elements using the list function and then use lapply:
library(EMT) # provides multinomial.test()
n <- c(0, 1, 2, 0, 1, 2)
prob <- c(0.4, 0.5, 0.1, 0.4, 0.5, 0.1)
count <- c(24878, 33605, 12100, 25899, 34777, 13765)
A <- cbind.data.frame(n, prob, count)
nest <- split(A, rep(1:2, each = 3))
fun <- function(x){
  multinomial.test(x$count,
                   prob = x$prob,
                   useChisq = FALSE, MonteCarlo = TRUE,
                   ntrial = 100, # n of withdrawals accomplished
                   atOnce = 100)
}
# Create a list of the split elements
new_list <- list(nest$`1`, nest$`2`)
lapply(new_list, fun)
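The question also asks for the p-value of each test. As a minimal sketch of that extraction step, the code below uses base R's chisq.test() as a stand-in for EMT's multinomial.test(), purely so the example runs without extra packages; this swap is an assumption for illustration, not the test the question asks for. The pattern is the same: return the p.value component from each result and collect the values with vapply.

```r
# Example data from the question: two groups of three categories each.
A <- data.frame(n = c(0, 1, 2, 0, 1, 2),
                prob = c(0.4, 0.5, 0.1, 0.4, 0.5, 0.1),
                count = c(24878, 33605, 12100, 25899, 34777, 13765))
nest <- split(A, rep(1:2, each = 3))

# Stand-in test (assumption): chisq.test() instead of EMT::multinomial.test().
# Each list element is a data frame; pull one p-value out of each test result.
p_values <- vapply(nest,
                   function(x) chisq.test(x$count, p = x$prob)$p.value,
                   numeric(1))
p_values
```

The result is a numeric vector with one p-value per group, named after the group labels that split() assigned.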

A solution with dplyr:
library(EMT) # provides multinomial.test()
library(dplyr)
A <- data.frame(n = c(0, 1, 2, 0, 1, 2),
                prob = c(0.4, 0.5, 0.1, 0.4, 0.5, 0.1),
                count = c(43, 42, 9, 74, 82, 9))
nest <- A %>%
  mutate(pattern = rep(1:2, each = 3)) %>%
  group_by(pattern) %>%
  dplyr::summarize(mn_pvals = multinomial.test(count, prob)$p.value)
nest

Related

Applying function to data.frame

I have a function that looks like this:
function(nsim = 10, maxN = 10000, mu = 0, sigma = 0.1, S0 = 100, endT = 1, K = 100){
  nsim + maxN + mu + sigma + S0 + endT + K
}
(The function here is simplified; the actual function is a simple Black-Scholes pricing model.)
Now, I have a data.frame:
df <- expand.grid(nsim = 10,
maxN = 10000,
mu = c(0.05, 0.10, 0.15),
sigma = c(0.2, 0.4, 0.6),
S0 = seq(80,120, by = 1),
endT = c(0.25, 0.50, 0.75),
K = 100,
sim = sprintf("Sim.%s", 1:10)
)
This is just a collection of multiple values. Now the question is: how do I apply the previous function to the data set to calculate a new column, using the column values from each row as input?
You can add a column with mutate:
library(dplyr)
my_function <- function(nsim = 10, maxN = 10000, mu = 0, sigma = 0.1, S0 = 100, endT = 1, K = 100){
  nsim + maxN + mu + sigma + S0 + endT + K
}
df %>%
  mutate(new_c = my_function(nsim, maxN, mu, sigma, S0, endT, K))
You can use mapply:
apply_fun <- function(nsim = 10, maxN = 10000, mu = 0, sigma = 0.1, S0 = 100, endT = 1, K = 100){
  nsim + maxN + mu + sigma + S0 + endT + K
}
df$price <- mapply(apply_fun, df$nsim, df$maxN, df$mu, df$sigma, df$S0, df$endT, df$K)
If you don't want to write each argument separately you can also use apply with do.call.
df$price <- apply(df[-ncol(df)], 1, function(x) do.call(apply_fun, as.list(x)))
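To check that these approaches give the same numbers, here is a minimal base R sketch on a much smaller grid. The names toy_fun, small_df and args_df are made up for illustration, and the function body is the same placeholder sum as in the question, not a real pricing model.

```r
# Placeholder function (assumption): the same toy sum as in the question.
toy_fun <- function(nsim = 10, maxN = 10000, mu = 0, sigma = 0.1,
                    S0 = 100, endT = 1, K = 100) {
  nsim + maxN + mu + sigma + S0 + endT + K
}

# Small hypothetical grid; `sim` is a label column, not a function argument.
small_df <- expand.grid(nsim = 10, maxN = 10000,
                        mu = c(0.05, 0.10), sigma = c(0.2, 0.4),
                        S0 = 100, endT = 1, K = 100,
                        sim = c("Sim.1", "Sim.2"))
args_df <- small_df[setdiff(names(small_df), "sim")]

# 1) Direct vectorised call: works because `+` is vectorised.
v1 <- with(args_df, toy_fun(nsim, maxN, mu, sigma, S0, endT, K))

# 2) Row by row with mapply; the columns are passed as named arguments.
v2 <- do.call(mapply, c(list(FUN = toy_fun), args_df))

# 3) Row by row with apply + do.call on the numeric columns only.
v3 <- apply(args_df, 1, function(x) do.call(toy_fun, as.list(x)))

small_df$price <- v1
```

The direct call is only valid when the function body is vectorised. If the real function contains scalar-only logic (for example an if on one of its arguments), the mapply or apply forms are the safer choice.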

How to generate n MarkovChain sequences of 25 transitions each

I have a transition matrix "T" and would like to produce 20 different sequences of 25 states each.
I have the markovchain package and have tried the following:
lapply(1:20,markovchainSequence(n = 25, markovchain = T, t0 = "In"))
but it says that markovchainSequence is not a function. Is there a way around this, please?
A reproducible example would really help here, but I think this does the job. You may just need a bigger transition matrix.
library(markovchain)
set.seed(123)
statesNames <- c("a", "b", "c") # easier with three states
t <- new("markovchain", states = statesNames,
         transitionMatrix = matrix(c(0.2, 0.5, 0.3, 0, 0.2, 0.8, 0.1, 0.8, 0.1),
                                   nrow = 3, byrow = TRUE, dimnames = list(statesNames, statesNames)))
# Wrap the call in a function so that lapply has something to call
mchain <- function(n){
  markovchainSequence(n = n, markovchain = t, t0 = "a")
}
lapply(rep(25, each = 20), mchain) # you may change 25 to the desired number of transitions
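The call in the question fails because lapply expects a function as its second argument, whereas the question passes the result of already calling the function. The sketch below shows the same "pass a function, repeat it 20 times" pattern in base R only. simulate_chain is a hypothetical helper written for this example, not part of the markovchain package, and its sampling details may differ from markovchainSequence.

```r
set.seed(123)

# Hypothetical 3-state transition matrix; each row sums to 1.
states <- c("a", "b", "c")
P <- matrix(c(0.2, 0.5, 0.3,
              0.0, 0.2, 0.8,
              0.1, 0.8, 0.1),
            nrow = 3, byrow = TRUE, dimnames = list(states, states))

# Hypothetical helper: simulate n transitions starting from state t0.
simulate_chain <- function(n, P, t0) {
  out <- character(n)
  current <- t0
  for (i in seq_len(n)) {
    # The row of P for the current state gives the next-state probabilities.
    current <- sample(colnames(P), size = 1, prob = P[current, ])
    out[i] <- current
  }
  out
}

# Pass a function to lapply; the index i only serves to repeat the call.
sequences <- lapply(1:20, function(i) simulate_chain(25, P, t0 = "a"))
```

replicate(20, simulate_chain(25, P, t0 = "a"), simplify = FALSE) would be an equivalent way to repeat the call.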

Sampling iteratively without a for loop in R

Even though I think the issue I have may be simple, I nevertheless can't figure it out. Here's the thing:
I have the following list and vector. The list is used to fill up the vector:
probabilities = list(c(0.2, 0.3, 0.5), c(0.1, 0.1, 0.8), c(0.3,0.4,0.3))
nextState = c()
for(counter in 1:3){
nextState[counter] = sample(1:3, size = 1, prob = probabilities[[counter]])
}
The code works fine. However, when expanding to larger lists (>10,000 elements), the loop becomes aggravatingly slow. Since the loop above is used multiple times in the larger code, the time consumed is way too much. Would there be a way to achieve the same result without looping?
Additional question:
Thanks guys, you've been a big help. One additional question: how would you approach the same issue if the probabilities and the nextState were interdependent? In other words, how could I avoid the for loop in that case? Perhaps some code to clarify:
M <- list(matrix(c(0.1, 0.2, 0.7, 0.2, 0.2, 0.6, 0.3, 0.3, 0.4), nrow = 3, ncol = 3),
matrix(c(0.3, 0.3, 0.4, 0.5, 0.5, 0, 0.1, 0.1, 0.8), nrow = 3, ncol = 3))
probabilities <- list()
nextState <- c(2, NA, NA)
for(i in 1:2){
probabilities[[i]] <- M[[i]][nextState[i], ]
nextState[i + 1] <- sample(1:3, size = 1, prob = probabilities[[i]])
}
If you've got any idea, then you truly are miracle workers!!
Try sapply:
nextState <- sapply(probabilities, function(x) {sample(1:3, size = 1, prob = x)})
Benchmarks:
# Unit: microseconds
# expr min lq mean median uq max neval
# for 2115.170 2223.475 2436.0797 2283.2755 2371.546 10048.64 100
# sapply 24.704 29.524 164.0261 37.3565 41.123 12763.03 100
microbenchmark::microbenchmark(
  `for` = {
    nextState = c()
    for(counter in 1:3){
      nextState[counter] = sample(1:3, size = 1, prob = probabilities[[counter]])
    }
  },
  sapply = sapply(probabilities, function(x) {sample(1:3, size = 1, prob = x)}),
  times = 100)
Another possibility with the purrr package:
library(purrr)
nextState <- map_int(probabilities, function(x) {sample(1:3, size = 1, prob = x)})
Data:
probabilities = list(c(0.2, 0.3, 0.5), c(0.1, 0.1, 0.8), c(0.3,0.4,0.3))
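For the follow-up question, where each draw depends on the previous state, one option is Reduce with accumulate = TRUE. This is only a sketch: it expresses the recursion without an explicit for loop, but the steps still run one after another, so it should not be expected to be faster. The helper name next_state_step is made up for this example.

```r
set.seed(1)

# Transition matrices from the follow-up question (matrix() fills column-wise).
M <- list(matrix(c(0.1, 0.2, 0.7, 0.2, 0.2, 0.6, 0.3, 0.3, 0.4), nrow = 3, ncol = 3),
          matrix(c(0.3, 0.3, 0.4, 0.5, 0.5, 0, 0.1, 0.1, 0.8), nrow = 3, ncol = 3))

# One step: draw the next state using the row of m selected by the current state.
next_state_step <- function(state, m) sample(1:3, size = 1, prob = m[state, ])

# init = 2 is the starting state; accumulate = TRUE keeps every intermediate state.
nextState <- Reduce(next_state_step, M, init = 2, accumulate = TRUE)
nextState
```

The result has one more element than M: the starting state followed by one sampled state per matrix.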

Specifying x values when converting approx() to data frame

I am trying to get a data frame from the output of approx(t,y, n=120) below. My intent is for the input values returned to be in increments of 0.25; for instance, 0, 0.25, 0.5, 0.75, ... so I've set n = 120.
However, the data frame I get doesn't return those input values.
t <- c(0, 0.5, 2, 5, 10, 30)
z <- c(1, 0.9869, .9478, 0.8668, .7438, .3945)
data.frame(approx(t, z, n = 120))
I appreciate any assistance in this matter.
There are 121, not 120, points from 0 to 30 inclusive in steps of 0.25:
length(seq(0, 30, 0.25))
## [1] 121
so use this:
approx(t, z, n = 121)
Another approach is:
approx(t, z, xout = seq(min(t), max(t), 0.25))
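As a quick check that both calls land on the same grid, the sketch below builds the data frame each way and compares them. The variable names t_obs, z_obs and grid are chosen for this example (the question uses t and z).

```r
t_obs <- c(0, 0.5, 2, 5, 10, 30)
z_obs <- c(1, 0.9869, 0.9478, 0.8668, 0.7438, 0.3945)

# Target x values: 0, 0.25, 0.5, ..., 30
grid <- seq(min(t_obs), max(t_obs), by = 0.25)

# Option 1: ask for exactly as many points as the grid has.
by_n <- data.frame(approx(t_obs, z_obs, n = length(grid)))

# Option 2: pass the grid itself, which does not depend on counting points.
by_xout <- data.frame(approx(t_obs, z_obs, xout = grid))

head(by_xout, 4)
isTRUE(all.equal(by_n, by_xout))
```

Passing xout is the more robust of the two, since the spacing is stated directly instead of being implied by the number of points.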

Iteratively define user-defined discrete distributions

I am writing a script that uses the distr package to define some discrete distributions based on the following objects:
margins <- c("discrete1", "discrete2")
vec1 <- list(support=c(0,1,2), probabilities=c(0.2, 0.2, 0.6))
vec2 <- list(support=c(12,14,20), probabilities=c(0.1, 0.15, 0.75))
Here is the code that works as expected: it creates the two distributions.
library("distr")
discrete1 <- DiscreteDistribution (supp = vec1[[1]], prob = vec1[[2]])
ddiscrete1 <- d(discrete1) # Density function
pdiscrete1 <- p(discrete1) # Distribution function
qdiscrete1 <- q(discrete1) # Quantile function
rdiscrete1 <- r(discrete1)
discrete2 <- DiscreteDistribution (supp = vec2[[1]], prob = vec2[[2]])
ddiscrete2 <- d(discrete2)
pdiscrete2 <- p(discrete2)
qdiscrete2 <- q(discrete2)
rdiscrete2 <- r(discrete2)
Once the two (or possibly more) distributions are defined, my final goal is to sample random numbers from them:
rdiscrete1(100)
rdiscrete2(100)
The problem with this code is that the number of distributions can be very high. I wonder how I could automate the creation of the functions in a more elegant manner.
Also, I need the distributions to be of class DiscreteDistribution and not nested in lists (see is(discrete1) in my example).
l <- list(list(support = c(0, 1, 2), probabilities = c(0.2, 0.2, 0.6)),
          list(support = c(12, 14, 20), probabilities = c(0.1, 0.15, 0.75)))
distrs <- lapply(seq_along(l), function(n) {
  d <- DiscreteDistribution(supp = l[[n]][[1]], prob = l[[n]][[2]])
  list(d = d, dd = d(d), pd = p(d), qd = q(d), rd = r(d))
})
# First object of class DiscreteDistribution
is(distrs[[1]][[1]])
# [1] "DiscreteDistribution" "UnivariateDistribution" "AcDcLcDistribution"
# [4] "Distribution" "UnivDistrListOrDistribution"
# Random numbers
dim(sapply(distrs, function(x) x[[5]](100)))
# [1] 100 2
