I'm struggling with what seems like it should be a simple calculation. I'm trying to produce a weighted average of a list, where the most recent elements are weighted more heavily than the earlier ones. The weight list needs to be based on the length of the input list and add up to 1.
Let's say I have a given list:
l = [123, 456, 789]
Next step is to make a weight list, something like this (I just made these numbers up, but they show the general idea of the weight profile I'm after):
w = [0.15, 0.25, 0.6]
This is where I'm stuck: generating w based on the length of l.
Any ideas?
I'm guessing the weights you're looking for are something like:
[a, 2a, 3a, 4a, ..., n*a]
Sum of weights: (a*n*(n+1))/2 = 1
Solve for a, which gives a = 2/(n*(n+1)), and fill your weight array.
When n is 3, a would be 1/6:
w = [1/6, 2/6, 3/6]
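The linear scheme above can be sketched in a few lines of Python. This is only an illustration: the helper names linear_weights and weighted_average are made up for this example and do not come from any library.

```python
def linear_weights(n):
    """Return weights a, 2a, ..., n*a with a = 2 / (n * (n + 1)), so they sum to 1."""
    if n < 1:
        raise ValueError("n must be at least 1")
    a = 2.0 / (n * (n + 1))
    return [a * (i + 1) for i in range(n)]


def weighted_average(values, weights):
    """Weighted average, assuming the weights already sum to 1."""
    return sum(v * w for v, w in zip(values, weights))


l = [123, 456, 789]
w = linear_weights(len(l))   # [1/6, 2/6, 3/6]
avg = weighted_average(l, w)
```

Because the weights are derived from len(l), the same two calls work for a list of any length.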
You can follow some sort of monotonically increasing function (e.g. e^x) and just normalize the first n values, where n is the length of the list, so that they sum to 1.
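As a rough sketch of that idea in Python, using e^x as the increasing function. The function name exponential_weights and the rate parameter are assumptions added for illustration; a larger rate puts more of the weight on the most recent elements.

```python
import math


def exponential_weights(n, rate=1.0):
    """Evaluate e^(rate * i) for i = 0..n-1 and normalize so the weights sum to 1."""
    raw = [math.exp(rate * i) for i in range(n)]
    total = sum(raw)
    return [r / total for r in raw]


l = [123, 456, 789]
w_exp = exponential_weights(len(l))
avg_exp = sum(v * w for v, w in zip(l, w_exp))
```

Any other increasing function could be swapped in for math.exp; the normalization step stays the same.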
I have a vector epsilon of length N. I am applying the function bw.CDF.pi(x, pilot="UCV") from the sROC package to compute bandwidths for cdf Kernel estimation.
My goal is to repeat this bandwidth computation for every subvector of epsilon that starts at the first value. In other words, I would like to apply the function to the first value in epsilon, then to the first two values, then to the first three values, continuing until it is applied to the whole vector epsilon. In the end I want to have N bandwidth values.
How can I accomplish this?
Apparently the function bw.CDF.pi needs a vector of at least 2 elements to run. If you want to run it for the first 2 elements of a vector, then the first 3, etc., you can do the following. Note that the example data is the one from the help page for the function.
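library(sROC)  # provides bw.CDF.pi
n <- 200
x <- c(rnorm(n/2, mean=-2, sd=1), rnorm(n/2, mean=3, sd=0.8))
# bandwidth for the first 2 values, the first 3 values, ..., all of x
lapply(seq_along(x)[-1], function(m) bw.CDF.pi(x[seq_len(m)], pilot="UCV"))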
Once again I have been set a programming task, most of which I have done. A quick run-through: I had to take n samples from a multivariate normal distribution of dimension p (call it X) and put them into a matrix (Matx). For each row, the first two values were summed along with a value drawn at random from the standard normal distribution (call the resulting vector Y). Then we had to sort Y numerically and split it into H groups. I then had to find the mean of each row of the matrix, and now have to order them according to the Y group they are associated with. I've struggled a fair bit and have now hit a brick wall. I understand this is quite confusing; if anyone could help it would be greatly appreciated!
Task: Return the p x H matrix which has in the first column the mean of the observations in the first group and in the Hth column the mean of the observations in the Hth group.
Matx<-matrix(c(x), ncol=6, byrow=TRUE)
I was recently helped with a function that generates a random binary matrix, with the condition that the diagonal is all 0s.
fun <- function(n){
  # one random 0/1 value per cell of a triangle
  vals <- sample(0:1, n*(n-1)/2, replace = TRUE)
  mat <- matrix(0, n, n)
  mat[upper.tri(mat)] <- vals
  mat[lower.tri(mat)] <- vals
  mat
}
Here I am entering values from the 'sample' to the upper and lower triangle separately. I would like to keep this in any updated function because sometimes I may wish to enter transpositions of each triangle into the other.
What I would like assistance with is how to change the frequency of 1s in the random matrix. This already varies around, I believe, a normal distribution. e.g. in a 9x9 matrix, there are 81-9=72 cells to fill, and the average number of 1s used is 36.
However, if I wanted to create matrices where each cell has a probability of e.g. p=0.9, or e.g. p=0.2, of being a 1, how is this done?
I tried some ways of changing the sample(0:1,) part of the code by adding in probability functions but I only got errors.
You should look at the help page of the sample function.
?sample shows:
sample(x, size, replace = FALSE, prob = NULL)
prob: A vector of probability weights for obtaining the elements of
the vector being sampled.
and further below in Details you will see
The optional prob argument can be used to give a vector of weights for
obtaining the elements of the vector being sampled. They need not sum
to one, but they should be non-negative and not all zero.
So, to answer your question (apart from "read the manual"): use prob=c(0.1,0.9) if you want a probability of 0.1 for the first element of x and 0.9 for the second. With x = 0:1, that gives a probability of 0.9 of drawing a 1.
I have the number of samples per unit and need to calculate statistics with R.
The table is like this (all rows and columns are actually filled with values, I only write a few here for easier visibility, and there are many more columns):
Hour   1   2   3   4
H1    72  11  98  65
H2    19  27
I.e. in the first hour (H1) there were 72 samples of value 1, 11 samples of value 2, etc. In the second hour (H2) there were 19 samples of value 1, 27 samples of value 2, etc.
I need to calculate the mean and standard deviation per hour (i.e. per row). As there are many thousands of rows I need a fast method.
Example: The manual mean-calculation for hour 1 (H1) would be:
(72x1 + 11x2 + 98x3 + 65x4)/(72+11+98+65) = 648/246 ≈ 2.63
I suppose there are R-methods or packages that can do this, but I fail to find where. Your support is highly appreciated.
You want to calculate a weighted mean, so you need weighted.mean. For the first row:
values <- c(1, 2, 3, 4)
weights <- c(72, 11, 98, 65)
weighted.mean(values, weights)
The weighted standard deviation is not well-defined. You could use a hand-rolled weighted RMS as an estimator (but this assumes that your input sample is really from a single Gaussian, i.e. there are no outliers -- not sure if that's the case for your example).
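# same values and weights as above; one possible weighted RMS deviation about the weighted mean
sqrt(sum(weights * (values - weighted.mean(values, weights))^2) / sum(weights))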
You should read your data into a table and iterate over every row. Also, "many thousands of rows" is not necessarily a large number for such a simple calculation. This is very basic stuff, maybe checking out a tutorial would also be beneficial.
You are much better off (i.e. faster calculations) using matrix operations instead of applying something by row. For example, assuming X is the matrix containing your data (one row per hour, with column j holding the number of samples of value j), you can get the weighted means the following way:
v <- seq_len(ncol(X))             # the sampled values 1, 2, ..., one per column
wmeans <- (X %*% v) / rowSums(X)  # per row: sum(count * value) / sum(count)
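Assuming your table is a matrix called dataset of n * 20000 with the counts for one hour in each row, and you have the sampled values in a values vector (one entry per column), you just need to do:
# The 1 as 2nd parameter indicates to apply the function on the rows;
# each row of counts is used as the weights for the fixed values
w.means <- apply(dataset, 1, function(counts) weighted.mean(values, w=counts))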
I am afraid the question is a bit technical, but I hope someone might have stumbled into a similar subject, or give me a pointer of some kind.
If G is a group (in the sense of algebraic structure), and if g1, ..., gn are elements of G, is there an algorithm (or a function in some dedicated program, like GAP) to determine whether there is a subgroup of G such that those elements form a set of representatives for the cosets of the subgroup? (We may assume that G is a permutation group, and probably even the full symmetric group.)
(There are of course several algorithms to find the cosets of a given subgroup, like the Todd-Coxeter algorithm; this is a kind of inverse question.)
The only solution I can come up with is naive. Basically if you have elements x1,...,xn, you would use GAP's LowIndexSubgroupsFpGroup to enumerate all subgroups with index n (discarding those with index < n). Then you would go through each such group, generate the cosets, and check that each coset contains one of the elements.
This is all I could think of. I would be very interested if you came up with a better approach.
What you're trying to determine is whether there is a subgroup H of G such that {g1, ..., gn} is a transversal of the cosets of H, i.e. a set of representatives of the partitioning of G by the cosets of H.
First, by Lagrange's theorem, |G| = |G:H| * |H|, where |G:H| = |G|/|H| is the index of the subgroup H of G. If {g1, ..., gn} is indeed a transversal, then |G:H| = n, so the first test in your algorithm should be whether n divides |G|.
Moreover, since gi and gj are in the same right coset if and only if gi * gj^(-1) is in H, you can then check subgroups with index n to see if they avoid gi * gj^(-1) for all i != j. Also, note that (gi * gj^(-1)) * (gj * gk^(-1)) = gi * gk^(-1), so you can choose any pairing of the gi's.
This should be sufficient if n is small compared to |G|.
Another approach is to start with H being the trivial group and add elements of the set H* = {h in G : h^k != gi * gj^(-1), for all i, j, k; i != j} to the generators of H until you can't add any more (i.e. until it's no longer a subgroup). H is then a maximal subgroup of G such that H is a subset of H*. If you can get all such H (and have them be large enough) then the subgroup you're looking for must be one of them.
This approach would work better for larger n.
Either way a non-exponential-time approach isn't obvious.
EDIT: I've just found a discussion of this very topic here: http://en.wikipedia.org/wiki/Wikipedia:Reference_desk/Archives/Mathematics/2009_April_18#Is_a_given_set_of_group_elements_a_set_of_coset_representatives.3F
A slightly less brute-force approach would be to enumerate all subgroups of index n, as Il-Bhima suggested, and then for each subgroup, check each xi * xj^(-1) to see if it is contained in the subgroup.
The elements x1, ..., xn will be representatives for a subgroup of index n if and only if EVERY product
xi * xj^(-1) where (i != j)
is NOT in the subgroup.
This type of check seems both simpler than generating all cosets, and computationally faster.
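For a single candidate subgroup, the pairwise test can be sketched as follows. This is an illustrative Python sketch rather than GAP code, and it makes several assumptions: permutations are represented as tuples mapping i to p[i], the subgroup is passed in as an explicit collection of its elements, and the helper names compose, inverse and is_transversal are invented for this example.

```python
def compose(p, q):
    """Return the product p*q as a tuple, meaning 'apply q first, then p'."""
    return tuple(p[q[i]] for i in range(len(p)))


def inverse(p):
    """Return the inverse permutation of p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def is_transversal(elements, subgroup, group_order):
    """Check whether `elements` hits every right coset of `subgroup` exactly once."""
    subgroup = set(subgroup)
    n = len(elements)
    # Lagrange: the index |G|/|H| must equal the number of representatives.
    if n * len(subgroup) != group_order:
        return False
    # No two distinct representatives may lie in the same right coset.
    for i in range(n):
        for j in range(n):
            if i != j and compose(elements[i], inverse(elements[j])) in subgroup:
                return False
    return True


# Example in S3 (order 6) with H generated by the transposition swapping 0 and 1.
H = [(0, 1, 2), (1, 0, 2)]
good = [(0, 1, 2), (1, 2, 0), (2, 0, 1)]
bad = [(0, 1, 2), (1, 0, 2), (1, 2, 0)]
result_good = is_transversal(good, H, 6)
result_bad = is_transversal(bad, H, 6)
```

The check costs on the order of n^2 products and membership tests per subgroup, without ever building the cosets; enumerating the candidate subgroups of index n is still the expensive part.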