How can I calculate values between 0 and 1 from values between 0 and n? E.g. I have items with a "click count" and want to derive an "importance" score (a float between 0 and 1) from it.
My attempt: importance = 1-1/count
gives bad results, since the values don't distribute well…
I'm not sure what you mean by "don't distribute well". If you want to normalize a value between 0 and n to between 0 and 1, just divide by n.
Also not sure what you mean...
If you are looking for a linear mapping onto 0 to 1, you need to know the maximum count. That value will be transformed to 1:
importance = thisCount / maxCount;
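For instance, in R (clicks is a hypothetical vector of click counts):

clicks <- c(0, 3, 12, 7, 25)        # hypothetical click counts
importance <- clicks / max(clicks)  # linear map onto [0, 1]
importance                          # 0.00 0.12 0.48 0.28 1.00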
How about count/n?
There's this string:
X1_X2_X3_X4_X5_X6
It is known that each variable X* can take values from 0 to 100. The sum of all X* variables is always equal to 100. How many possible string variants can be created?
Suppose F(n,s) is the number of strings with n variables, and the variables sum to s, where each variable is between 0 and 100, and suppose s<=100. You want F(6,100).
Clearly
F(1,s) = 1
If the first variable is t, then it can be followed by strings of n-1 variables that sum to s-t. Thus
F(n,s) = Sum{ 0<=t<=s | F(n-1, s-t) }
So it's easy to write a wee function to compute the answer.
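For instance, a memoized version of the recursion in R (the name nstrings is mine):

# Count length-n strings of integers in [0, 100] that sum to s,
# via F(n, s) = Sum over t of F(n-1, s-t), with F(1, s) = 1.
nstrings <- function(n, s) {
  memo <- matrix(NA_real_, nrow = n, ncol = s + 1)
  rec <- function(n, s) {
    if (n == 1) return(1)
    if (!is.na(memo[n, s + 1])) return(memo[n, s + 1])
    total <- sum(vapply(0:s, function(t) rec(n - 1, s - t), numeric(1)))
    memo[n, s + 1] <<- total
    total
  }
  rec(n, s)
}
nstrings(6, 100)  # 96560646, which matches the closed form choose(105, 5)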
In R I am trying to raise some values to the power -0.5. Some of my values are 0, and they return Inf when I do this. Why is that the case? Unless I'm having some obvious lapse in math knowledge, 0 to the power of anything is still 0, which is what I'm expecting.
My values are in the 1000s range, and they return proper results (e.g. 1000^(-0.5) ≈ 0.0316).
Am I doing something obviously wrong? Is there something special I need to do in R? Here's a sample of my code:
DF[grep(".SUFFIX", names(henn))] <- DF[grep(".SUFFIX", names(DF))]^(-0.5)
The code works fine btw for other functions, like if I did + 10
Thanks in advance!
You're getting Inf because raising 0 to a negative power results in dividing by 0:
0 ^ (-0.5) =
1 / (0 ^ 0.5) =
1 / 0 = Inf
(R follows IEEE 754 floating-point arithmetic, so dividing a positive number by zero yields Inf rather than an error.)
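You can see it on a plain vector, and guard against it if zeros should stay 0 (that replacement value is just an assumption about what you want):

x <- c(0, 4, 1000)
x^(-0.5)                     # Inf 0.5000 0.0316
ifelse(x == 0, 0, x^(-0.5))  # 0.0000 0.5000 0.0316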
I figured out the answer. 0 to any positive power is 0; 0 to any negative power is undefined (dividing by 0), which R reports as Inf. It makes sense.
I am looking to create multiple lists based on the sequence of numbers (i.e. 0 0 1 0 1 0 1...) until there are 10 successful trials with the negative binomial distribution. I am obviously not understanding the list functions too well, as my current code isn't retrieving anything worthwhile:
z = as.list(sapply(1:10, function(x) rnbinom(inf, 10, 1/x)))
The probabilities need to vary as per the sequence 1/n with n=1,2,...,10, with the "experiment" continuing until 10 successes occur, then the results ("1 1 1 1 1 1 1 1 1 1" would be the first for instance, since Pr=1) need to be listed.
You'll want to understand that what rnbinom actually returns is the simulated number of failures, not the sequence of trials. So, to build a sequence from a simulated number of failures, you'll need to rep that many failures (0), shuffle in 9 successes (1), and end with a success (for a total of 10 successes).
For example, with prob=0.5
c(sample(c(rep(1, 10 - 1), rep(0, times = rnbinom(1, 10, 0.5)))), 1)  # shuffle 9 successes among the failures, then append the final success
To apply this over a list of inverse probabilities:
lapply(1:10, function(x) c(sample(c(rep(1, 10 - 1), rep(0, times = rnbinom(1, 10, 1/x)))), 1))
An alternative would be to simulate a sufficient number of the trials themselves and then truncate to the number of desired successes.
x <- rbinom(100, 1, 0.5)
x[seq_len(match(10, cumsum(x)))]  # keep trials up to and including the 10th success
A proper implementation would handle the extreme case of having not generated enough successes by simulating additional trials to concatenate.
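For instance, a sketch (the helper name sim_trials is mine) that keeps drawing batches until the 10th success appears:

# Draw Bernoulli trials in batches until `successes` successes have occurred,
# then truncate at the final success.
sim_trials <- function(successes = 10, prob = 0.5, batch = 100) {
  x <- integer(0)
  while (sum(x) < successes) x <- c(x, rbinom(batch, 1, prob))
  x[seq_len(match(successes, cumsum(x)))]
}
sim_trials(10, 0.5)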
I am working in R with a database that has two variables, Camouflage and Detection. The values are binary: 0 for conspicuous and 1 for camouflaged, and 1 for detected and 0 for undetected. However, during my analysis I added values called Unknown to the Detection variable. I would like to replace the Unknowns with 1s and 0s and see whether each such assignment affects the significance of the glm I am using. The assignment may set all Unknowns to 0, or all to 1, or some to 1 and others to 0 at random. It may be simple; it's just that I am not really fluent in R.
Try this:
camouf = c(1, NA, 0, 0, 1, 0, NA, NA, NA, 0, 1)

perm <- function(vec, chance = 0.5){
  unknown <- which(is.na(vec))  # positions where the original is NA ("Unknown")
  vec[unknown] <- sample(0:1, size = length(unknown),
                         prob = c(1 - chance, chance), replace = TRUE)
  return(vec)
}
perm(camouf) # do it once
replicate(50, perm(camouf)) # do it many times
It defines a function perm that does what you call a permutation on a vector of 0s and 1s: it puts a random 0 or 1 at each place where the original had an NA. The probability of drawing a 1 can be set via the chance argument.
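Setting chance = 1 (or chance = 0) gives the all-ones (all-zeros) replacement. To see whether each fill changes significance, you could refit the glm per replicate; a sketch, where the data frame df, the column names, and the binomial family are assumptions about your setup:

# Collect the p-value of the Camouflage coefficient over 50 random fills.
pvals <- replicate(50, {
  d <- df                           # df is assumed to hold your data
  d$Detection <- perm(d$Detection)  # fill the NAs ("Unknowns") at random
  fit <- glm(Detection ~ Camouflage, family = binomial, data = d)
  coef(summary(fit))[2, 4]          # Pr(>|z|) for Camouflage
})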
I'd like to split a sequence into k parts, and optimize the homogeneity of these sub-parts.
Example : 0 0 0 0 0 1 1 2 3 3 3 2 2 3 2 1 0 0 0
Result : 0 0 0 0 0 | 1 1 2 | 3 3 3 2 2 3 2 | 1 0 0 0 when you ask for 4 parts (k = 4)
Here, the algorithm did not try to split in fixed-length parts, but instead tried to make sure elements in the same parts are as homogeneous as possible.
What algorithm should I use? Is there an implementation of it in R?
Maybe you can use the Expectation-Maximization algorithm. Your points would be (value, position) pairs; in your example that's (0, 1), (0, 2), ..., (0, 5), (1, 6), (1, 7), (2, 8), (3, 9), and so on.
Running E-M on those points (worked by hand), the clusters match your desired output, so you can consider using this approach and check whether it really works in all your scenarios. One note: you must specify the number of clusters in advance, but I think that's not a problem for you, given how you set out your question.
Let me know if this worked ;)
Edit:
With k-means you would have to control the delta value, that is, how much the position is scaled, to put it on the same scale as the value. But with E-M this doesn't matter.
Edit 2:
OK, I was not correct: you do need to control the delta value. It is not the same if you scale the position by 1 or by 3; asking for the same two clusters gives different groupings.
Thus, as you said, this algorithm could decide to cluster points that are not neighbours if their positions are far apart but their values are close. You need to guarantee this does not happen by using a large delta. I think a delta of 2 * (max - min) of the values in your sequence would prevent it.
Now, your points would have the form (value, delta * position).
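A quick way to try this in base R is kmeans on the scaled points (E-M proper would need an extra package such as mclust; delta = 2 * (max - min) follows the suggestion above):

# Cluster (value, delta * position) points and read off the sub-parts.
x <- c(0,0,0,0,0,1,1,2,3,3,3,2,2,3,2,1,0,0,0)
delta <- 2 * (max(x) - min(x))
pts <- cbind(value = x, pos = delta * seq_along(x))
cl <- kmeans(pts, centers = 4, nstart = 25)$cluster
split(x, cl)  # the four sub-parts (cluster labels are in arbitrary order)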