I want to build a data frame like this:
in the value column, a number n;
in the factorial column, factorial(n), computed by a recursive function;
in the sum column, the sum of the previous factorials.
I wrote a recursive function that successfully generates the value and factorial columns, but I'm still struggling with the sum column.
Thanks.
Below is my R code:
fact <- function(n) {
  if (n <= 1) {
    return(n)
  } else {
    n * fact(n - 1)
  }
}
recurDf <- function(n, df) {
  if (n <= 1) {
    df <- rbind(df, data.frame("value" = paste('Value', n), "factorial" = n, "previous.sum" = 1))
    return(df)
  } else {
    if (is.null(df)) {
      # df <- data.frame(matrix(ncol = 3, nrow = 0))
      # colnames(df) <- c("value", "factorial", "previous.sum")
      df <- data.frame("value" = 'va', "factorial" = 0, "previous.sum" = 0)
    }
    rbind(recurDf(n - 1, df),
          data.frame("value" = paste('Value', n), "factorial" = fact(n),
                     "previous.sum" = sum(recurDf(n - 1, df)$factorial)))
  }
}
recurDf(4, NULL)
The following returns the factorial of n in its first component and the cumulative sum of all factorials up to n in its second component.
fact2 <- function(n) {
  if (n <= 1) c(1, 1)
  else {
    prev <- Recall(n - 1)
    n * prev[1] + c(0, prev[2])
  }
}
fact2(1)
## [1] 1 1
fact2(2)
## [1] 2 3
fact2(3)
## [1] 6 9
fact2(4)
## [1] 24 33
cbind(1:4, t(sapply(1:4, fact2)))
## [,1] [,2] [,3]
## [1,] 1 1 1
## [2,] 2 2 3
## [3,] 3 6 9
## [4,] 4 24 33
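To shape that output into the data frame layout from the question, a minimal wrapper could look like this (the column names are taken from the original post):
n <- 4
res <- t(sapply(1:n, fact2))
data.frame(value = paste('Value', 1:n),
           factorial = res[, 1],
           previous.sum = res[, 2])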
Is there a reason you need to do this recursively?
There are much simpler ways to get to your answer.
recurDf <- function(n) {
  df <- data.frame("value" = paste('Value', 1:n), "factorial" = 1:n)
  df$factorial <- factorial(df$factorial)
  df$previous.sum <- cumsum(df$factorial)
  return(df)
}
recurDf(4)
This returns
value factorial previous.sum
1 Value 1 1 1
2 Value 2 2 3
3 Value 3 6 9
4 Value 4 24 33
I have the following four equations (a, b, c, d) with several variables (x, t, v, w, n, f). My goal is to find all sets of variable values that make all four equations (a, b, c, d) positive (and non-zero). A regular loop would go through each number of the generated sequences and systematically check whether it produces a positive value. Instead, I want R to pick random numbers from each sequence and test them against the others.
For example, (x = 8, t = 2.1, v = 13, w = 1, n = 10, f = 1) is one possible combination.
Please do not suggest solving these analytically and then finding the values. These are just representations of the equations I'm dealing with; the real ones are quite complex, with more than 15 variables.
#Equations
a <- x * t - 2*x
b <- v - x^2
c <- x - w*t - t*t
d <- (n - f)/t
x <- seq(from = 0.0001, to = 1000, by = 0.1)
t <- seq(from = 0.0001, to = 1000, by = 0.1)
v <- seq(from = 0.0001, to = 1000, by = 0.1)
w <- seq(from = 0.0001, to = 1000, by = 0.1)
n <- seq(from = 0.0001, to = 1000, by = 0.1)
f <- seq(from = 0.0001, to = 1000, by = 0.1)
For a start, it might be better to organize your equations and your probe values into lists:
set.seed(1222)
values <- list(x = x, t = t, v = v, w = w, n = n, f = f)
eqs <- list(
  a = expression(x * t - 2 * x),
  b = expression(v - x^2),
  c = expression(x - w*t - t*t),
  d = expression((n - f)/t)
)
Then we can define a number of samples to take randomly from each probe vector:
samples <- 3
values.sampled <- lapply(values, sample, samples)
$x
[1] 642.3001 563.1001 221.3001
$t
[1] 583.9001 279.0001 749.1001
$v
[1] 446.6001 106.7001 0.7001
$w
[1] 636.0001 208.8001 525.5001
$n
[1] 559.8001 28.4001 239.0001
$f
[1] 640.4001 612.5001 790.1001
We can then iterate over each stored equation, evaluating the equation within the "sampled" environment:
results <- sapply(eqs, eval, envir = values.sampled)
a b c d
[1,] 373754.5 -412102.82 -711657.5 -0.1380373
[2,] 155978.8 -316975.02 -135533.2 -2.0935476
[3,] 165333.3 -48973.03 -954581.8 -0.7356827
From there you can remove any value that is 0 or less:
results[results <= 0] <- NA
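If the goal is to keep only the sampled combinations for which every equation is strictly positive, a minimal sketch building on the results matrix above would be:
## keep only the rows (sample draws) where all four equations are positive
ok <- apply(results, 1, function(r) all(r > 0))
results[ok, , drop = FALSE]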
If every independent value can take on the same value (e.g. seq(from = 0.0001, to = 1000, by = 0.1)), we can approach this with much greater rigor and avoid the possibility of generating duplicates. First we create a masterFun that is essentially a wrapper for all of the functions you want to define:
masterFun <- function(y) {
  ## y is a vector with 6 values
  ## y[1] -->> x
  ## y[2] -->> t
  ## y[3] -->> v
  ## y[4] -->> w
  ## y[5] -->> n
  ## y[6] -->> f
  fA <- function(x, t) {x * t - 2*x}
  fB <- function(v, x) {v - x^2}
  fC <- function(x, w, t) {x - w*t - t*t}
  fD <- function(n, f, t) {(n - f)/t}
  ## one can easily filter out negative
  ## results as @jdobres has done.
  c(a = fA(y[1], y[2]), b = fB(y[3], y[1]),
    c = fC(y[1], y[4], y[2]), d = fD(y[5], y[6], y[2]))
}
Now we can use permuteSample from RcppAlgos (I am the author), which generates random permutations of a vector and can apply any given user-defined function to each permutation:
## Not technically the domain, but this variable name
## is concise and very descriptive
domain <- seq(from = 0.0001, to = 1000, by = 0.1)

library(RcppAlgos)

## number of variables ... x, t, v, w, n, f
##                  ||
##                  \/
permuteSample(domain, m = 6, repetition = TRUE,
              n = 3, seed = 123, FUN = masterFun)
[[1]]
a b c d
218830.316100 -608541.146040 -310624.596670 -1.415869
[[2]]
a b c d
371023.322880 -482662.278860 -731052.643620 1.132836
[[3]]
a b c d
18512.60761001 -12521.71284001 -39722.27696002 -0.09118721
In short, the underlying algorithm is capable of generating the nth lexicographical result, which allows us to apply a mapping from 1 to "# of total permutations" to the permutations themselves. For example, given the permutations of the vector 1:3:
permuteGeneral(3, 3)
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 1 3 2
[3,] 2 1 3
[4,] 2 3 1
[5,] 3 1 2
[6,] 3 2 1
We can easily generate the 2nd and the 5th permutation above without generating the first permutation or the first four permutations:
permuteSample(3, 3, sampleVec = c(2, 5))
[,1] [,2] [,3]
[1,] 1 3 2
[2,] 3 1 2
This allows us to have a more controlled and tangible grasp of our random samples as we can now think of them in a more familiar way (i.e. a random sample of numbers).
If you actually want to see which variables were used in the above calculation, we simply drop the FUN argument:
permuteSample(domain, m = 6, repetition = TRUE, n = 3, seed = 123)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 780.7001 282.3001 951.5001 820.8001 289.1001 688.8001
[2,] 694.8001 536.0001 84.9001 829.2001 757.3001 150.1001
[3,] 114.7001 163.4001 634.4001 80.4001 327.2001 342.1001
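From here, one possible way (a sketch, reusing domain and masterFun from above) to scan many random draws and keep only the all-positive ones:
## evaluate many random permutations and keep the draws where
## all four equations come out strictly positive
res <- permuteSample(domain, m = 6, repetition = TRUE,
                     n = 1000, seed = 123, FUN = masterFun)
keep <- Filter(function(v) all(v > 0), res)
length(keep)  # how many of the 1000 draws satisfy the constraint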
I'm a newer R user. I understand how to make my code work, but I suspect there is a dplyr or purrr function that does this more efficiently and with a lot less code; if there is, I haven't found it yet. My PI wants a summary of our race data, separated by single race, and, for respondents who answered more than one race, a breakdown of those combinations. I took a subset of the data with just those columns, then added the columns pairwise in each row and wrote the sums into a new 7x7 matrix.
This is my code. My question is: is there a much more efficient way of doing this?
# sum races to create totaled matrix of all races
subset <- subset(dataset[,11:17])
test <- matrix(,nrow=7, ncol=7)
colnames(test) <- c("African_American", "Asian", "Hawaiian_Pacific", "Native_Alaskan", "White_Euro", "Hispanic_Latino", "No-Answer")
rownames(test) <- c("African_American", "Asian", "Hawaiian_Pacific", "Native_Alaskan", "White_Euro", "Hispanic_Latino", "No-Answer")
# basic design: if == 1 then strictly one race; if > 1, count in the appropriate multi-race cell
test[1,1] <- sum(subset$African_American==1, na.rm=TRUE)
test[1,2] <- sum(subset$African_American+subset$Asian>1, na.rm=TRUE)
test[1,3] <- sum(subset$African_American+subset$Hawaiian_Pacific>1, na.rm=TRUE)
test[1,4] <- sum(subset$African_American+subset$Native_Alaskan>1, na.rm=TRUE)
test[1,5] <- sum(subset$African_American+subset$White_Euro>1, na.rm=TRUE)
test[1,6] <- sum(subset$African_American+subset$Hispanic_Latino>1, na.rm=TRUE)
test[1,7] <- sum(subset$African_American+subset$`No-Answer`>1, na.rm=TRUE)
test[2,1] <- sum(subset$Asian+subset$African_American>1, na.rm=TRUE)
test[2,2] <- sum(subset$Asian==1, na.rm=TRUE)...
There are seven columns to add to each other, so the code moves all the way through the matrix and outputs something similar to this, where the diagonal holds counts of a single race and the other cells hold multiple-race occurrences:
[screenshot of the resulting 7x7 matrix]
I found a way using not dplyr but the base R function apply.
data = data.frame(set1 = round(runif(n = 10, min = 0, max = 1)),
                  set2 = round(runif(n = 10, min = 0, max = 1)),
                  set3 = round(runif(n = 10, min = 0, max = 1)),
                  set4 = round(runif(n = 10, min = 0, max = 1)),
                  set5 = round(runif(n = 10, min = 0, max = 1)),
                  set6 = round(runif(n = 10, min = 0, max = 1)),
                  set7 = round(runif(n = 10, min = 0, max = 1)))
res = apply(combn(1:ncol(data), 2), 2, function(x) sum(data[, x[1]] & data[, x[2]]))
test <- matrix(0,nrow=7, ncol=7)
test[upper.tri(test)] = res
> test
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 5 3 2 2 4 2 2
[2,] 0 5 5 3 4 5 4
[3,] 0 0 6 3 1 0 5
[4,] 0 0 0 8 3 3 1
[5,] 0 0 0 0 2 2 2
[6,] 0 0 0 0 0 6 3
[7,] 0 0 0 0 0 0 6
The first part produces some test data.
combn(1:ncol(data), 2) tells apply to run the function for each combination of 2 columns. The & function returns TRUE for all entries of data[, x[1]] and data[, x[2]] (the 2 selected columns) where both values are 1, and the summation counts these, returning the desired values. The following two lines construct the matrix you wanted.
Please note that with the addition of
res2 = apply(combn(1:ncol(data), 1), 2, function(x) sum(data[, x[1]]))
test[cbind(1:7,1:7)] <- res2
you can also set the diagonal to the correct counts. Note this only works for respondents who answered 1 in exactly 2 columns; it won't find those who are, say, Asian, Hispanic and American. But you can compute that with a slight change, using combinations of 3 columns:
apply(combn(1:ncol(data), 3), 2, function(x) sum(data[, x[1]] & data[, x[2]] & data[, x[3]]))
Please also note that my random data may not be realistic.
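For 0/1 indicator columns like these, the whole co-occurrence matrix (diagonal included) can also be computed in one step with crossprod; a minimal sketch on the same test data:
## t(M) %*% M counts, for each pair of columns, the rows where both
## are 1; the diagonal holds the per-column totals (matching res2)
M <- as.matrix(data)
crossprod(M)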
I want to create random mock data that looks like this:
ID | Amount
 1 |     20
 1 |     14
 1 |      9
 1 |      3
 2 |     11
 2 |      5
 2 |      2
Starting from a random number, the second number within the same ID should be smaller than the first one, the third smaller than the second, and so on. The maximum starting number should be 20.
You can just create the data first and then sort it as you need, using the tidyverse:
library(dplyr)

set.seed(0)
df <- data.frame(id = rep(1:3, 10), amt = sample(1:20, 30, replace = TRUE))
df %>%
  group_by(id) %>%
  arrange(id, desc(amt))
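For comparison, a base R equivalent of the same group-wise sort (no extra packages needed):
## order by id, then by amt in decreasing order within id
df[order(df$id, -df$amt), ]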
This is a tricky one. If you want the Amount column to be truly random values, you can use a recursive call that applies sample recursively:
## Recursively sampling from a uniform distribution
recursive.sample <- function(start, end, length, results = NA, counter = 0) {
  ## To enter the recursion, counter must be smaller than the length out
  ## and the last result must be larger than the starting point (except the first time)
  if (counter < length && ifelse(counter != 0, results[counter] > start, TRUE)) {
    ## Increment the counter
    counter <- counter + 1
    ## Sample between start and the last result, or between start and the end of the range
    results[counter] <- ifelse(counter != 1, sample(start:results[counter - 1], 1), sample(start:end, 1))
    ## Recursive call
    return(recursive.sample(start = start, end = end, length = length, results = results, counter = counter))
  } else {
    ## Exit the recursion
    return(results)
  }
}
## Example
set.seed(0)
recursive.sample(start = 1, end = 20, length = 3, results = NA, counter = 0)
#[1] 18 5 2
Alternatively (and way easier) you can use sort(sample()):
set.seed(0)
sort(sample(1:20, 3), decreasing = TRUE)
#[1] 18 7 6
Note that the results differ due to the lower probability of sampling higher values in the recursive function.
You can then easily create your table with your chosen function as follow:
set.seed(123)
## The ID column
ID <- c(rep(1, 4), rep(2,3))
## The Amount column
Amount <- c(recursive.sample(1, 20, 4, NA, 0), recursive.sample(1, 11, 3, NA, 0))
## The table
cbind(ID, Amount)
# ID Amount
#[1,] 1 18
#[2,] 1 5
#[3,] 1 2
#[4,] 1 2
#[5,] 2 10
#[6,] 2 3
#[7,] 2 3
Or, again, with the simple sort(sample()) function for a higher probability of picking larger numbers.
Two methods: one using dplyr and one using only base R functions. These are slightly different from the two previous solutions.
I used a sorted ID column, but this is not necessary.
Method 1
rm(list = ls())
library(dplyr)
set.seed(1)
df <- data.frame(ID = rep(1:3, each = 5))
df %>% group_by(ID) %>%
  mutate(Amount = sort(sample(1:20, n(), replace = TRUE), decreasing = TRUE))
Method 2
rm(list = ls())
set.seed(1)
df <- data.frame(ID = rep(1:3, each = 5))
df$Amount <- NA
uniq_ID <- unique(df$ID)
index_lst <- lapply(uniq_ID, function(x) which(df$ID == x))
res <- lapply(index_lst, function(x) sort(sample(1:20, length(x)),
                                          decreasing = TRUE))
df$Amount[unlist(index_lst)] <- unlist(res)
Method 2.5
This is more convoluted than the 2nd method.
rm(list = ls())
set.seed(1)
df <- data.frame(ID = rep(1:3, each = 5))
df$Amount <- NA
tab <- as.data.frame(table(df$ID))
lapply(1:nrow(tab), function(x) df$Amount[which(df$ID == tab$Var1[x])] <<-
         sort(sample(1:20, tab$Freq[x]), decreasing = TRUE))
So I've estimated a multidimensional IRT model using the TAM package, based on a dataset that I have.
Now that I have the TAM fit object, is there any way to use it to simulate a new dataset that "abides by the rules" of the model I estimated?
Here is something similar, but about lme fit objects:
https://stats.stackexchange.com/questions/11233/how-to-simulate-data-based-on-a-linear-mixed-model-fit-object-in-r
Thanks in advance,
KH
Edit
Since TAM version 1.10-0, this is possible using the function IRT.simulate (see the respective help file). Thanks again for the request.
library(TAM)
data(data.gpcm)
psych::describe(data.gpcm)
resp <- data.gpcm
# define three dimensions and different loadings of item categories
# on these dimensions in B loading matrix
I <- 3 # 3 items
D <- 3 # 3 dimensions
# define loading matrix B
# 4 categories for each item (0, 1, 2, 3)
B <- array(0, dim = c(I, 4, D))
for (ii in 1:I){
  B[ii, 1:4, 1] <- 0:3
  B[ii, 1, 2] <- 1
  B[ii, 4, 3] <- 1
}
dimnames(B)[[1]] <- colnames(resp)
B[1, , ]
## > B[1,,]
## [,1] [,2] [,3]
## [1,] 0 1 0
## [2,] 1 0 0
## [3,] 2 0 0
## [4,] 3 0 1
#-- test run
mod1 <- tam.mml(resp, B = B, control = list(snodes = 1000, maxiter = 5))
sim.dat <- IRT.simulate(mod1, nobs = 2000)
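To get a rough feel for whether the simulated data resemble the original responses, you could compare marginal summaries (just a quick check, not part of the official workflow):
## compare marginal summaries of original and simulated responses
psych::describe(data.gpcm)
psych::describe(sim.dat)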
Old Solution
I wouldn't say it is impossible. However, for the time being, it is not easy, since it involves handling TAM internal functions and attributes of the estimation object. That is, there is no method yet that lets you extract the response probability function at prespecified trait points.
However, thanks to your request, we are working on exactly this very valuable feature and I'll give an update to this answer as soon as the method is on CRAN.
For now, let's extend the example from the request "Implement ConQuest score command in TAM", which Alex also included on the manual page of the tam function as EXAMPLE 20.
data(data.gpcm)
psych::describe(data.gpcm)
resp <- data.gpcm
# define three dimensions and different loadings of item categories
# on these dimensions in B loading matrix
I <- 3 # 3 items
D <- 3 # 3 dimensions
# define loading matrix B
# 4 categories for each item (0, 1, 2, 3)
B <- array(0, dim = c(I, 4, D))
for (ii in 1:I){
  B[ii, 1:4, 1] <- 0:3
  B[ii, 1, 2] <- 1
  B[ii, 4, 3] <- 1
}
dimnames(B)[[1]] <- colnames(resp)
B[1, , ]
## > B[1,,]
## [,1] [,2] [,3]
## [1,] 0 1 0
## [2,] 1 0 0
## [3,] 2 0 0
## [4,] 3 0 1
#-- test run
mod1 <- tam.mml(resp, B = B, control = list(snodes = 1000, maxiter = 5))
Now for the part where we extract the attributes that are necessary for the computation of the response probabilities and generate new testees.
# Extract necessary item attributes
xsi <- mod1$xsi$xsi
A <- mod1$A
B <- mod1$B
maxK <- mod1$maxK
nI <- dim(A)[1]
iIndex <- 1:nI
AXsi <- matrix(0, nrow = nI, ncol = maxK)

# Simulate new testees (mvrnorm comes from the MASS package)
library(MASS)
nnodes <- 2000
theta <- mvrnorm(n = nnodes, mod1$beta, mod1$variance)
The response probabilities can be obtained from a call to an internal function.
# Calculate response probablities and simulate
p <- TAM:::calc_prob.v5(iIndex, A, AXsi, B, xsi, theta, nnodes, maxK, recalc = TRUE)$rprobs
p[,,1] # response probability of testee 1 to each category 0, 1, 2, 3 for all three items
# [,1] [,2] [,3] [,4]
# [1,] 0.06738066 0.8111365 0.1043441 0.0171387
# [2,] 0.02545206 0.4895568 0.3182046 0.1667866
# [3,] 0.04503185 0.5105446 0.3429603 0.1014633
With this, simulate the success cut and compare that to the response probabilities.
sim.data <- matrix(runif(nnodes * nI), nrow = nnodes, ncol = nI)
for (pp in 1:nnodes) {
  cat.success.pp <- (sim.data[pp, ] > t(apply(p[, , pp], 1, cumsum)))
  sim.data[pp, ] <- c(cat.success.pp %*% rep(1, maxK))
}
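As a quick sanity check (my own suggestion, not part of the original workflow), you could compare the category counts of the simulated responses with those of the original data:
## observed vs. simulated category counts per item
apply(resp, 2, table)
apply(sim.data, 2, table)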
Best,
Tom