If I have 37 different characters and need to create all possible words with a maximum length of 15 from them, what will be the total number of words?
Suppose I have X=2 characters and a maximum length of Y=2; then the possible outcomes are:
A
AA
B
BB
AB
BA
So the total number of outcomes is Z=6.
Now if X=37 and Y=15, what will be the value of Z?
It's sum(37^i) for i=1..15. For your example: 2^1 + 2^2 = 6.
In addition to Bartosz' answer, a simple closed-form formula for the sum is
n = ((1-X^(Y+1)) / (1-X)) - 1
Another simple formula for the summation is
x * (x^y - 1)
-------------
x - 1
http://www.wolframalpha.com/input/?i=sum+x%5Ei+for+i+1+to+y+
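As a quick numerical check of the formulas above, here is a minimal R sketch. The function name count_words is made up for this illustration, and because R doubles carry only about 15 to 16 significant digits, the value for X=37, Y=15 is approximate.

```r
# Hypothetical helper: number of words of length 1..y over an alphabet of
# x characters, using the closed form x * (x^y - 1) / (x - 1). Assumes x > 1.
count_words <- function(x, y) {
  x * (x^y - 1) / (x - 1)
}

count_words(2, 2)         # the small example from the question: 6
z <- count_words(37, 15)  # roughly 3.4e23 (approximate in double precision)
print(format(z, digits = 4))
```

If an exact integer is needed, an arbitrary-precision integer type would be required instead of doubles.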
I have a dataframe with participants and I want to randomly assign them to a group (0,1). Each group should have approximately the same number of participants.
My problem: I will keep adding participants. So, when I draw a new random number for a participant, it should take into account the distribution of the random numbers I already have.
This is my code:
groupData <- data.frame(participant = c(1), Group = floor(runif(1, min=0, max=2)))
groupData[nrow(groupData) + 1,] = c(2,floor(runif(1, min=0, max=2))) # with this I will be adding participants
I think what you're saying is that when iteratively adding participants to groupData, you want to randomly assign them to a group such that over time, the groups will be evenly distributed.
N.B., iteratively adding rows to a frame scales horribly, so if you're doing this with a lot of data, it will slow down a lot. See "Growing Objects" in The R Inferno.
We can weight the groups in inverse proportion to their current size, so that a new participant has a slightly higher likelihood of being assigned to the under-populated group.
For instance, if we already have 100 participants with unbalanced groups:
set.seed(42)
groupData <- data.frame(participant = 1:100, Group = sample(c(rep(0, 70), rep(1, 30))))
head(groupData)
# participant Group
# 1 1 0
# 2 2 0
# 3 3 0
# 4 4 1
# 5 5 0
# 6 6 1
table(groupData$Group)
# 0 1
# 70 30
then we can prioritize the under-filled group using
100 / (table(c(0:1, groupData$Group))-1)
# 0 1
# 1.428571 3.333333
which can be used with sample as in
sample(0:1, size = 1, prob = 100 / (table(c(0:1, groupData$Group)) - 1) )
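I use table(c(0:1, ..)) - 1 so that both groups always appear in the table: by concatenating 0:1 to the data, I ensure each group is counted at least once, and the "minus one" compensates for this artificiality, trying to keep the ratios unbiased. Note that a group with no participants yet then has a count of 0 and therefore an infinite weight, which sample() rejects, so this form assumes both groups are already non-empty.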
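A minimal sketch of the same idea wrapped in a function is shown here. The name assign_group is hypothetical, and the weighting is a small variation rather than the exact formula above: each group is weighted by the size of the other group plus one, which keeps the weights finite even when a group is still empty.

```r
# Hypothetical helper: choose group 0 or 1 for the next participant.
# Each group's weight is the current size of the *other* group plus one,
# so the smaller group is more likely and empty groups cause no error.
assign_group <- function(existing_groups) {
  counts <- c(sum(existing_groups == 0), sum(existing_groups == 1))
  weights <- rev(counts) + 1
  sample(0:1, size = 1, prob = weights)
}

set.seed(42)
assign_group(integer(0))          # works with no participants yet
assign_group(c(0, 0, 0, 0, 1))    # group 1 is favoured here
```

For two groups, weights proportional to the other group's size are equivalent to inverse-size weights, apart from the added one.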
To "prove" that this eventually rounds out ...
for (pa in 101:400) {
  newgroup <- sample(0:1, size = 1, prob = 100 / (table(c(0:1, groupData$Group))-1))
  groupData <- rbind(groupData, data.frame(participant=pa, Group=newgroup))
}
library(ggplot2)
# GroupDiff > 0 means more participants in group 0 than in group 1
transform(groupData, GroupDiff = cumsum(Group == 0) - cumsum(Group == 1)) |>
  ggplot(aes(participant, y = GroupDiff)) +
  geom_point() +
  geom_hline(yintercept=0) +
  geom_vline(xintercept = 100) +
  geom_text(data=data.frame(participant=101, GroupDiff=c(-Inf, -1, 1), vjust=c(-0.5, 0.5, -0.5), label=c("Start of group-balancing", "Group1-heavy", "Group0-heavy")), hjust=0, aes(label=label, vjust=vjust))
It is possible (even likely) that the balance will sway from side-to-side, but in general (asymptotically) it should stay balanced.
It occurs to me that the simplest method is just to assign people in pairs. Draw a random number (0 or 1), assign person N to the group associated with that value, and assign person N+1 to the other group. That guarantees random assignment as well as equal group sizes (they differ by at most one while a pair is incomplete).
Whether this properly simulates the situation you want to analyze is a separate issue.
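The pair-wise scheme could be sketched as follows. The function name assign_pairs is made up for this illustration, and it assumes participants are numbered in arrival order.

```r
# Hypothetical helper: assign participants in consecutive pairs.
# The first member of each pair gets a random group, the second the other one.
assign_pairs <- function(n_participants) {
  n_pairs <- ceiling(n_participants / 2)
  first <- sample(0:1, n_pairs, replace = TRUE)
  # interleave: first[1], 1 - first[1], first[2], 1 - first[2], ...
  groups <- as.vector(rbind(first, 1 - first))
  groups[seq_len(n_participants)]
}

set.seed(42)
g <- assign_pairs(100)
table(g)   # both groups have 50 members
```

With an odd number of participants the last pair is incomplete, so the group sizes differ by one.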
How do I calculate a number if its number of factors (N) and its number of prime factors (K) are given?
Example: if N = 4 and K = 2 are given, then the only possible value will be 6.
Explanation: 6 has 4 factors (1, 2, 3, 6), of which 2 are prime (2, 3). So the only possible value is 6.
Any integer number can be represented as product of powers of primes:
I = 2^p1 * 3^p2 * 5^p3 * 7^p4 * 11^p5 * ....
The total number of its factors is
N = (p1+1) * (p2+1) * (p3+1) * ....
where exactly K of the multipliers are larger than 1 (one for each prime that actually divides I).
So you need to represent the value N as a product of K factors larger than 1.
Factorize N into primes, group these primes into K groups.
Imagine you have N=420, which has 5 prime factors (2, 2, 3, 5, 7), and K=3. Make the groups 2*2, 3*5 and 7 (or any other combination); each exponent is the group's product minus one, so the corresponding powers used to build I are 3, 14 and 6.
For example, with N = 12 and K=3, you can write 12 = 2 * 2 * 3 and use the product of any two primes with the square of a third prime as the number I. The smallest such value is 60 (2^2 * 3 * 5); others include 84 (2^2 * 3 * 7) and 90 (2 * 3^2 * 5), and 3 * 7 * 11^2 is also a solution.
For the case N = 12 and K=2, you can write 12 = 3 * 4 and get results of the form p^2*q^3, or 12 = 2 * 6 and get results of the form p*q^5, where p and q are distinct primes.
For the case N = 12 and K=4, you cannot write 12 as a product of four integers larger than 1, so no number exists for these arguments.
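The procedure can be sketched in R as below. The function names are invented for this illustration, the list of primes is hard-coded (so K is assumed to be at most 10), and results are doubles, so very large answers are only approximate. For each way of writing N as a product of K factors, the largest exponent is given to the smallest prime, which yields the smallest number for that grouping.

```r
# All ways to write n as a product of k integers >= 2, in non-increasing order.
factorizations <- function(n, k, max_factor = n) {
  if (n < 2) return(list())
  if (k == 1) {
    if (n <= max_factor) return(list(n)) else return(list())
  }
  out <- list()
  for (f in 2:max_factor) {
    if (n %% f == 0) {
      for (rest in factorizations(n / f, k - 1, f)) {
        out[[length(out) + 1]] <- c(f, rest)
      }
    }
  }
  out
}

# Smallest number with n factors and k distinct prime factors, or NA if none exists.
smallest_number <- function(n, k) {
  primes <- c(2, 3, 5, 7, 11, 13, 17, 19, 23, 29)  # assumption: k <= 10
  stopifnot(k >= 1, k <= length(primes))
  fs <- factorizations(n, k)
  if (length(fs) == 0) return(NA)
  # exponents are (factor - 1); largest exponent goes to the smallest prime
  min(sapply(fs, function(f) prod(primes[seq_len(k)]^(f - 1))))
}

smallest_number(4, 2)    # 6
smallest_number(12, 3)   # 60
smallest_number(12, 4)   # NA: 12 is not a product of four integers > 1
```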
I have the row totals and column totals of a table; in the example they are 31, 92, 59 and 64. Each cell also has a maximum value (e.g. at most 20 for cell 1, and so on), as indicated in the example.
Example :
How can I code this in R? I tried with a repeat loop, but without success.
Your table looks like this:
a    b   | sab
c    d   | scd
--------------
sac  sbd | S
You have 4 unknowns with 4 constraints (forget about the maximum constraints on a, b, c and d for a moment):
a+b=sab
a+c=sac
c+d=scd
b+d=sbd
The 4 constraints are not independent (otherwise you would have only one possible solution!), and a
bit of algebra shows that the matrix of this linear system has rank 3. So you have one degree of
freedom to play with. Pick a, for example, and vary it from 0 to its maximum value. For each value
of a, compute b, c and d using the row and column sum constraints, and check that they satisfy the
non-negativity and maximum constraints.
The R code for your example is as follows:
sab <- 59
scd <- 64
sac <- 31
sbd <- sab + scd - sac ### this is always true
amax <- 20
bmax <- 40
cmax <- 12
dmax <- 70
### let us vary a, our only degree of freedom
for (a in 0:amax){
  ### let us compute b, c and d by satisfying row and column sum constraints
  b <- sab - a
  c <- sac - a
  d <- sbd - b
  ### let us check inequality constraints
  if (b <= bmax && b >= 0 && c <= cmax && c >= 0 && d <= dmax && d >= 0){
    cat("\nSolution:\n")
    print(m <- rbind(c(a,b),c(c,d)))
    cat("\nrowSums:", rowSums(m))
    cat("\ncolsums:", colSums(m))
    cat("\n---------------\n")
    if (! identical(rowSums(m), c(sab,scd)))
      stop("\nrow sum is not right!\n")
    if (! identical(colSums(m), c(sac,sbd)))
      stop("\ncolumns sum is not right!\n")
  }
}
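Since every cell is a linear function of a, the feasible range of a can also be computed directly instead of testing each candidate. The following is only a sketch using the same example numbers; the names a_lo, a_hi and solutions are introduced here for illustration.

```r
sab <- 59; scd <- 64; sac <- 31
amax <- 20; bmax <- 40; cmax <- 12; dmax <- 70

# With b = sab - a, c = sac - a and d = scd - sac + a,
# each inequality constraint bounds a from below or from above.
a_lo <- max(0, sab - bmax, sac - cmax, sac - scd)
a_hi <- min(amax, sab, sac, dmax + sac - scd)

if (a_lo <= a_hi) {
  a <- a_lo:a_hi
  solutions <- data.frame(a = a, b = sab - a, c = sac - a, d = scd - sac + a)
  print(solutions)
} else {
  solutions <- NULL
  cat("No feasible table\n")
}
```

For these numbers the feasible values are a = 19 and a = 20, the same tables the loop finds.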
This is what I did: given h = 5, I get the value ARL=2.5.
C=[1,5,8,9,7,8,1,2,5,5]
h=5
n=10
ooc=C>h
lenOOC = sum(ooc)
lenOOC
>4
p=lenOOC/n
p
>0.4
ARL=1/p
ARL
>2.5
But now I need the reverse: given the value of ARL, compute the value of h.
For example, with the same data set C:
ARL=1.428571
n=10
lenOOC=n/ARL
lenOOC
>7...
How do I proceed from here to get the value of h, given ooc=C>h?
I hope it is clear what I am trying to ask.
Thanks for helping.
Since lenOOC=n/ARL, you want h such that lenOOC = sum(C>h). That is, you want a number h such that exactly lenOOC of the values are strictly bigger than h. This is the same as saying that n - lenOOC of the values are less than or equal to h, so you want the (n - lenOOC)-th smallest value. Just sort the numbers in C and pick the element at position n - lenOOC.
C=c(1,5,8,9,7,8,1,2,5,5)
ARL=1.428571
n=10
lenOOC=n/ARL
sort(C)[n - round(lenOOC)]
[1] 2
So h = 2.
But notice that this will not work for all values of ARL. What if we tried ARL = 10/9 = 1.111111?
This would give
ARL=1.111111
lenOOC=n/ARL
sort(C)[n - round(lenOOC)]
[1] 1
But h=1 would actually give ARL = 1.25. No value of h gives ARL=1.111111 for this data set. The above method only works for ARL values that are actually attainable.
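One way to guard against unattainable targets is to report the ARL that the chosen h actually produces. The function below is a sketch; the name h_for_arl and its return format are assumptions made for this illustration.

```r
# Hypothetical helper: find h for a target ARL and report the ARL actually achieved.
h_for_arl <- function(C, ARL) {
  n <- length(C)
  k <- round(n / ARL)        # number of values that should exceed h
  if (k >= n) {
    h <- min(C) - 1          # everything must exceed h
  } else {
    h <- sort(C)[n - k]      # the (n - k)-th smallest value
  }
  list(h = h, achieved_ARL = n / sum(C > h))
}

C <- c(1, 5, 8, 9, 7, 8, 1, 2, 5, 5)
h_for_arl(C, 1.428571)   # h = 2, achieved ARL = 10/7
h_for_arl(C, 1.111111)   # h = 1, but the achieved ARL is 1.25
```

Comparing achieved_ARL with the target shows immediately whether the requested value was attainable.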
The purpose of my code is to find the number of people for which the probability that at least 2 of them have the same birthday is 50%.
source('colMatches.r')
all_npeople = 1:300
days = 1:365
ntrials = 1000
sizematch = 2
N = length(all_npeople)
counter = 1
pmean = rep(0,N)
while (pmean[counter] <= 0.5)
{
  npeople = all_npeople[counter]
  x = matrix(sample(days, npeople*ntrials, replace=TRUE),nrow=npeople,
             ncol=ntrials)
  w = colMatches(x, sizematch)
  pmean[counter] = mean(w)
  counter = counter + 1
}
s3 = toString(pmean[counter])
s2 = toString(counter)
s1 = "The smallest value of n for which the probability of a match is at least 0.5 is equal to "
s4 = " (the test p value is "
s5 = "). This means when you have "
s6 = " people in a room the probability that two of them have the same birthday is 50%."
paste(s1, s2, s4, s3, s5, s2, s6, sep="")
When I run that code I get "The smallest value of n for which the probability of a match is at least 0.5 is equal to 301 (the test p value is NA). This means when you have 301 people in a room the probability that two of them have the same birthday is 50%." So the while statement isn't working properly for some reason. It's cycling all the way through all_npeople even though it should stop when pmean[counter] is no longer less than or equal to 0.5.
I know that pmean is updating correctly though because when I test it afterwards pmean[50] = 0.971. So that list is indeed correct but the while loop still won't end.
*colmatches is a function that determines if a column has a certain number of matches based on sizematch. So in this case it's looking at the matrix defined in x and listing 1 for every column that has at least 2 similar values and 0 for every column with no matches.
I admire your attempt to program this question, but the beauty of R is that most of this work is already done for you:
qbirthday(prob = 0.5, classes = 365, coincident = 2)
#answer is 23 people.
You may also be interested in:
pbirthday(n, classes = 365, coincident = 2)
If the purpose of the code is only to find the number of people at which the probability that at least two of them share a birthday rises above 0.5, it can be written in a much simpler way:
# note that probability below is probability of NOT having same birthday
probability <- 1
people <- 1
days <- 365
while(probability >= 0.5){
  people <- people + 1
  probability <- probability * (days + 1 - people) / days
}
print(people)
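As for why the simulation loop in the question never stops: counter is incremented before the while condition is evaluated again, so the test always looks at a pmean[counter] entry that has not been filled in yet and is still 0. A minimal sketch of one way to restructure the loop follows. Here col_matches is a hypothetical stand-in for the question's colMatches (its behaviour is assumed from the description), and the seed is arbitrary.

```r
# Stand-in for colMatches(): 1 if a column holds at least `sizematch`
# equal values, 0 otherwise (assumed behaviour, values must be in 1..365).
col_matches <- function(x, sizematch) {
  apply(x, 2, function(col) as.integer(max(tabulate(col, nbins = 365)) >= sizematch))
}

set.seed(1)
days <- 1:365
ntrials <- 1000
npeople <- 0
p <- 0   # latest estimate of the match probability

# Test the estimate that was just computed, not a slot that is still empty
while (p < 0.5) {
  npeople <- npeople + 1
  x <- matrix(sample(days, npeople * ntrials, replace = TRUE),
              nrow = npeople, ncol = ntrials)
  p <- mean(col_matches(x, 2))
}
print(npeople)
```

Because this is a Monte Carlo estimate with 1000 trials per step, the result should land near the exact answer of 23 but can vary by a person or so between runs.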