I have the data below in a variable X. The data is in the form of pairs of numbers {a, b}.
a represents the actual value while b represents its frequency in the data set.
X = {{20, 30}, {21, 40}, {22, 50}}
I want to calculate the expected value of this data set.
How can I extract all the values of a into a separate data set?
The expected value is (in non-Mma notation) sum(x[i]*p[i], i, 1, n), where x[i] is the i-th distinct value (i.e., the first value in each pair), p[i] is the proportion of that value (i.e., the second value in each pair divided by the total of all the second values), and n is the number of distinct values of x (i.e., the number of pairs). I think this is enough to help you solve it now.
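Applied to the X in the question: the total of the second values is 30 + 40 + 50 = 120, so the expected value is 20*(30/120) + 21*(40/120) + 22*(50/120) = 2540/120 ≈ 21.17.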
The nearZeroVar() function from the mixOmics R package is simply called as follows:
nearZeroVar(x, freqCut=95/5, uniqueCut=15) # default values shown
Here is the description of what this function does, straight from the source.
For example, an example of near zero variance predictor is one that,
for 1000 samples, has two distinct values and 999 of them are a single
value.
To be flagged, first the frequency of the most prevalent value over
the second most frequent value (called the “frequency ratio”) must be
above freqCut. Secondly, the “percent of unique values,” the number of
unique values divided by the total number of samples (times 100), must
also be below uniqueCut.
In the above example, the frequency ratio is 999 and the unique value
percentage is 0.0001.
I understand that the frequency ratio would be 999/1 (because there are 999 samples of one value and 1 of the other), so it would be 999. But shouldn't the unique value percentage be 2/1000*100 = 0.2, since it would be 2 unique values over the number of samples? How does one obtain 0.0001 as the answer?
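For what it's worth, the two quantities can be computed by hand for the quoted example by following those definitions literally. A small R sketch of my own (not code from the package):

x <- c(rep(0, 999), 1)                             # 1000 samples: 999 of one value, 1 of another
tab <- sort(table(x), decreasing = TRUE)           # counts of each distinct value: 999 and 1
freq_ratio <- as.numeric(tab[1] / tab[2])          # most prevalent count over second most prevalent count
unique_pct <- length(unique(x)) / length(x) * 100  # number of unique values / number of samples * 100
freq_ratio   # 999
unique_pct   # 0.2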
I am supposed to find the mean and standard deviation at each given sample size (N), using a "FOR LOOP". I started writing the code below; I am required to save all the means into the vector "p". How do I save all the means into one vector?
sample.sizes = c(3, 10, 50, 100, 500, 1000)
mean.sds = numeric(0)
for (N in sample.sizes) {
  x <- rnorm(3, mean = 0, sd = 1)
  mean.sds[i]
}
mean(x)
Actually, you are doing several things wrong.
You declare the variable N in the for loop, but you never use it anywhere.
for (N in some_vector) means that N takes the values of that vector one by one. So N in sample.sizes will first be 3, then 10, then 50, and so on.
Now, where does i come into the picture?
You are calculating x in each iteration, but N is not used anywhere in the loop.
rnorm(3, ...) returns 3 values, so x holds three values. On the next line you try to store these three values in the i-th element of mean.sds, but i is undefined, and storing three values into a single element, as it is, is not logically possible.
Do you want this?
sample.sizes = c(3, 10, 50, 100, 500, 1000)
mean.sds = numeric(0)
for (i in seq_along(sample.sizes)) {
  x <- rnorm(sample.sizes[i], mean = 0, sd = 1)
  mean.sds[i] <- mean(x)
}
mean.sds
[1] 0.6085489531 -0.1547286299 0.0052106559 -0.0452804986 -0.0374094936 0.0005667246
I replaced N in sample.sizes with i in seq_along(sample.sizes), which gives one iteration per element of that vector: six in this example.
I passed the i-th element of sample.sizes as the first argument of rnorm to generate that many random values.
Those random values are stored in the vector x; I calculated their mean (a single value) and stored it in the i-th element of your initially empty vector mean.sds.
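Since the question also asks for the standard deviation at each sample size, the same loop can fill a second vector. A minimal sketch along the same lines (my own addition; sd.sds is just a made-up name):

sample.sizes <- c(3, 10, 50, 100, 500, 1000)
mean.sds <- numeric(length(sample.sizes))   # one mean per sample size
sd.sds   <- numeric(length(sample.sizes))   # one standard deviation per sample size
for (i in seq_along(sample.sizes)) {
  x <- rnorm(sample.sizes[i], mean = 0, sd = 1)
  mean.sds[i] <- mean(x)
  sd.sds[i]   <- sd(x)
}
mean.sds
sd.sds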
There's this line.
X1_X2_X3_X4_X5_X6
It is known that each variable X* can take values from 0 to 100. The sum of all X* variables is always equal to 100. How many possible string variants can be created?
Suppose F(n,s) is the number of strings with n variables, and the variables sum to s, where each variable is between 0 and 100, and suppose s<=100. You want F(6,100).
Clearly
F(1,s) = 1
If the first variable is t, then it can be followed by strings of n-1 variables that sum to s-t. Thus
F(n,s) = Sum{ 0<=t<=s | F(n-1, s-t) }
So it's easy to write a wee function to compute the answer.
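A minimal sketch of that recursion in R (my own code, filled in bottom-up so the double loop stays fast; the name F is just for illustration):

# F(n, s): number of strings of n variables, each 0..100, summing to s (s <= 100)
F <- function(n, s) {
  tab <- matrix(0, nrow = n, ncol = s + 1)   # tab[k, j + 1] holds F(k, j)
  tab[1, ] <- 1                              # F(1, j) = 1 for every j
  if (n >= 2) {
    for (k in 2:n) {
      for (j in 0:s) {
        tab[k, j + 1] <- sum(tab[k - 1, 1:(j + 1)])   # F(k, j) = sum over t of F(k - 1, j - t)
      }
    }
  }
  tab[n, s + 1]
}
F(6, 100)   # 96560646, matching choose(100 + 6 - 1, 6 - 1) from the stars-and-bars formula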
Say, I have two vectors of the same length
A = mtcars$mpg
B = mtcars$cyl
I can calculate the correlation between the whole vectors
cor (A, B)
and get one single value (-0.852162).
What I need is to calculate the correlation between the two vectors with a sampling rate of 10. That means I start at the first data point in A and B, take the 5 values to the right of it (there are no values on the left), calculate a correlation coefficient and write it into a vector C. Then I take the next value in A and B, take 5 values on the right and 1 on the left, and write the coefficient into the vector; then I shift to the next value, and so forth. The resulting vector C must contain the same number of values as A or B (N = 32), and each value in C represents a correlation between A and B with a sampling rate of 10 (5 values on the left and 5 on the right of that data point, if available).
Is there any elegant and simple way to do it in R?
P.S.: The ease of coding is more important than the time needed for calculations.
The TTR package may provide what you are looking for.
It should be as simple as:
TTR::runCor(A, B)
There is a whole blog post about rolling correlation here.
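If you specifically need the centered window described in the question (up to 5 values on each side, so C keeps the full length of 32), a plain base-R sketch could also work (my own code; roll_cor is a made-up name):

# For each index i, correlate the values within 5 positions on either side
# (fewer near the ends), so the result has the same length as A and B.
roll_cor <- function(A, B, half = 5) {
  sapply(seq_along(A), function(i) {
    idx <- max(1, i - half):min(length(A), i + half)
    cor(A[idx], B[idx])
  })
}
C <- roll_cor(mtcars$mpg, mtcars$cyl)
length(C)   # 32, one value per data point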
I'm trying to iterate through a matrix called XY with 50 rows and 100 columns (divided into 50 pairs of X and Y values descending alongside each other) with a for loop:
for (i in 1:50) {
  slope = atan(((XY[i + 1, 2] - XY[i, 2]) / (XY[i + 1, 1] - XY[i, 1])) * 100)
}
So as you can see at the top, with XY[i + 1, 2] - XY[i, 2], I'm trying to take the i-th y value and subtract it from the next one, iterating through the entire list for each consecutive pair of descending values, and then divide that by the corresponding x increment to get the slope and convert it into an angle using atan((...)*100).
Unfortunately it keeps telling me that XY[i+1,2] is "out of bounds" and I'm pretty sure I have equal brackets on each side of the equation.
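For context, a sketch of what I assume is intended (my own guess, assuming the x values are in column 1 and the y values in column 2 of XY): stopping the loop one row early keeps XY[i + 1, ] in bounds, and indexing slope keeps every result instead of overwriting a single value.

# Loop only to nrow(XY) - 1 so XY[i + 1, ] never reads past the last row,
# and store one angle per consecutive pair of rows.
slope <- numeric(nrow(XY) - 1)
for (i in 1:(nrow(XY) - 1)) {
  slope[i] <- atan(((XY[i + 1, 2] - XY[i, 2]) / (XY[i + 1, 1] - XY[i, 1])) * 100)
}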