Creating random binary matrices with different distributions in R

I was recently helped to write a function that creates a random binary matrix with 0s on the diagonal.
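fun <- function(n){
  # one random 0/1 value per upper-triangle cell
  vals <- sample(0:1, n*(n-1)/2, replace = TRUE)
  mat <- matrix(0, n, n)          # diagonal stays 0
  mat[upper.tri(mat)] <- vals
  mat[lower.tri(mat)] <- vals     # same values, in the lower triangle's column-major order (not a mirror image)
  mat
}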
Here I enter the values from sample() into the upper and lower triangles separately. I would like to keep this in any updated function, because sometimes I may want to enter the transpose of one triangle into the other.
What I would like help with is changing the frequency of 1s in the random matrix. At the moment the number of 1s varies around, I believe, a normal distribution: e.g. a 9x9 matrix has 81 - 9 = 72 off-diagonal cells to fill, and on average 36 of them are 1s.
However, how would I create matrices in which each cell is 1 with probability, e.g., p = 0.9, or p = 0.2?
I tried changing the sample(0:1, ...) part of the code by adding probability arguments, but I only got errors.
Thanks

You should look at the help page of the sample function.
?sample shows:
Usage
sample(x, size, replace = FALSE, prob = NULL)
where
prob
A vector of probability weights for obtaining the elements of
the vector being sampled.
and further below in Details you will see
The optional prob argument can be used to give a vector of weights for
obtaining the elements of the vector being sampled. They need not sum
to one, but they should be non-negative and not all zero.
So, to answer your question, apart from reading the manual: use prob = c(0.1, 0.9) if you want probability 0.1 for the first element of x (here 0) and 0.9 for the second (here 1).
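Putting that together, here is a sketch of the question's fun with an added p argument (the argument name and its default are assumptions for illustration). Each off-diagonal cell is 1 with probability p, the diagonal stays 0, and the upper and lower triangles are still filled separately, as the question wanted.

```r
# Random binary matrix with a zero diagonal.
# p (added for illustration) is the probability that an off-diagonal cell is 1.
fun <- function(n, p = 0.5) {
  stopifnot(p >= 0, p <= 1)
  # prob gives weight 1 - p to the value 0 and weight p to the value 1
  vals <- sample(0:1, n * (n - 1) / 2, replace = TRUE, prob = c(1 - p, p))
  mat <- matrix(0, n, n)
  mat[upper.tri(mat)] <- vals
  mat[lower.tri(mat)] <- vals
  mat
}

set.seed(1)
m <- fun(9, p = 0.9)
sum(m)   # number of 1s among the 72 off-diagonal cells; 72 * 0.9 on average
```

rbinom(n * (n - 1) / 2, 1, p) produces the same kind of 0/1 draws and would work equally well in place of the sample() call.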

Related

Number from sample to be drawn from a Poisson distribution with upper/lower bounds

Working in R, I need to create a vector of length n with the values randomly drawn from a Poisson distribution with lambda=1, but with a lower bound of 2 and upper bound of 6 (i.e. all numbers will be either 2,3,4,5, or 6).
I am unsure how to do this. I tried creating a for loop that would replace any values outside that range with values inside the range:
set.seed(123)
n<-25 #example length
example<-rpois(n,1)
test<-example #redundant - only duplicating to compare with original *example* values
for (i in 1:length(n)){
if (test[i]<2||test[i]>6){
test[i]<-rpois(1,1)
}
}
But this didn't seem to work (still getting 0's and 1, etc, in test). Any ideas would be greatly appreciated!
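Two things stop the loop above from working. First, n is a single number, so length(n) is 1 and 1:length(n) visits only the first element; seq_along(test) visits all of them. Second, one redraw with rpois(1, 1) can itself land outside the range, so it has to be repeated until it does not. A minimal sketch of the corrected loop (rejection sampling, which yields Poisson(1) values conditioned on lying in 2..6):

```r
set.seed(123)
n <- 25                 # example length
example <- rpois(n, 1)
test <- example         # copy, to compare with the original values

for (i in seq_along(test)) {
  # keep redrawing until the value falls inside [2, 6]
  while (test[i] < 2 || test[i] > 6) {
    test[i] <- rpois(1, 1)
  }
}
```

With lambda = 1 only about a quarter of draws fall in 2..6, so each out-of-range value needs a few redraws on average.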
Here is one way: generate n numbers from a Poisson distribution, then replace every number outside the range with a random number inside it.
n<-25 #example length
example<-rpois(n,1)
inds <- example < 2 | example > 6
example[inds] <- sample(2:6, sum(inds), replace = TRUE)
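Note that this answer replaces out-of-range values with draws that are uniform over 2:6, so the result is no longer Poisson-shaped inside the range (for the replaced values, a 6 becomes as likely as a 2). If the goal is a Poisson(1) distribution truncated to 2..6, one alternative (not from the original answer) is to sample 2:6 directly, weighting each value by its Poisson probability from dpois(); sample() rescales the weights, so they need not sum to one.

```r
set.seed(123)
n <- 25                          # example length
vals <- 2:6                      # allowed range
w <- dpois(vals, lambda = 1)     # Poisson(1) probabilities of 2..6
truncated <- sample(vals, n, replace = TRUE, prob = w)
table(truncated)
```

This draws from the same distribution as the rejection loop, without any redrawing.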

How does distances weighting work in KNN?

I'm writing a KNN classifier in R and want to add a weighting scheme, e.g. inverse distances 1/d. As it is, on the Iris dataset I get an almost perfect 66% accuracy (no matter which metric I use), because class no. 3 ("virginica") almost never shows up, and I want to improve this with weighting. My question is: what exactly do I weight, and how? I've read that I should weight the classes of the K nearest neighbours by their distances.
I've tried creating vectors of the classes of, and distances to, the K nearest neighbours and then taking their weighted mean:
inverted <- function(vals, distances)
{
inv_distances <- 1 / distances
# eliminate division-by-zero errors
inv_distances <- ifelse((inv_distances < 0.01), 0.01, inv_distances)
weighted.mean(vals, inv_distances)
}
My results are weird: for correct vectors vals (classes) and distances I sometimes get NaN (Not a Number) or NA values. Also, my weights don't sum to 1, and... they probably should? I'm not sure. I just need someone to clarify this weighting scheme for me.
EDIT:
I've debugged the code above: it multiplied by the weight too late, so it did not eliminate zero distances, which caused the NaNs. I've also switched to harmonic-series weights that ignore distance (the first neighbour has weight 1, the second 1/2, the third 1/3, etc.). I still don't know exactly how this works or what other weights are possible.
inverted <- function(vals)
{
weights <- 1 / seq(length(vals))
res <- weighted.mean(vals, weights)
res
}
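For classification, a common way to use the weights is a weighted vote rather than a weighted mean: weighted.mean() of class codes can average, say, classes 1 and 3 into 2, a class neither neighbour belongs to. In a weighted vote each of the K neighbours adds its weight to its own class, and the class with the largest total wins. Because only the largest total matters, the weights do not need to sum to 1. A minimal sketch, assuming classes holds the labels of the K nearest neighbours and distances their distances to the query point; the function name and the small eps offset are illustrative choices, not from the question:

```r
# Weighted k-NN vote (sketch): each neighbour votes for its own class with weight 1/d.
# classes:   labels of the K nearest neighbours (character or factor)
# distances: their distances to the query point
weighted_vote <- function(classes, distances, eps = 1e-8) {
  w <- 1 / (distances + eps)            # eps keeps a zero distance from giving Inf
  scores <- tapply(w, classes, sum)     # total weight per class
  names(scores)[which.max(scores)]      # class with the largest total weight
}

weighted_vote(c("setosa", "virginica", "virginica"), c(0.2, 1.0, 1.5))
# "setosa": weight 5 beats 1 + 1/1.5
```

The harmonic weights from the edit fit the same scheme: replace the first line of the function with w <- 1 / seq_along(classes), with the neighbours sorted by distance.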

Is there an R function that populates the lower (or upper) diagonal of a matrix (with diagonal included) from a numeric object?

I have a total of 55 numeric values. I want to create a 10x10 matrix in which only the lower (or upper) triangle, including the diagonal, is populated. I know I can use lower.tri() to create a lower triangular matrix; however, when I use this function, the data do not seem to be filled by row. If I use matrix(v, nrow = 10, ncol = 10, byrow = TRUE), the full matrix is populated instead of just the lower triangle. I have seen solutions to a similar problem (Fill lower matrix with vector by row, not column), but that example uses only 6 variables, whereas I have 10, and the solution gets distorted for me.
v <- 1:55
m <- diag(10)
A simple trick works. To populate a lower triangular matrix by row, first populate an upper triangular matrix by column, then transpose it (and vice versa for an upper triangle by row). Because R fills matrices by column by default, the column-wise fill of the upper triangle becomes a row-wise fill of the lower triangle after t().
Code example
m = diag(10)
upperm = upper.tri(m, diag = T)
m[upperm] = v; t(m)
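The same trick can be wrapped in a small helper (the name fill_lower_by_row is hypothetical) that works out the matrix size from the length of the vector via n(n + 1)/2 = length(v), so 55 values give a 10x10 matrix:

```r
# Fill the lower triangle (diagonal included) of a square matrix by row.
fill_lower_by_row <- function(v) {
  n <- (sqrt(8 * length(v) + 1) - 1) / 2   # solve n(n + 1)/2 = length(v)
  stopifnot(n == round(n))                 # length(v) must be a triangular number
  m <- matrix(0, n, n)
  m[upper.tri(m, diag = TRUE)] <- v        # column-wise fill of the upper triangle...
  t(m)                                     # ...is a row-wise fill of the lower one
}

res <- fill_lower_by_row(1:55)
res[1:4, 1:4]
```

For the upper triangle by row, fill m[lower.tri(m, diag = TRUE)] instead and transpose in the same way.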

R Repeat function from first subvector of vector until total vector length reached

I have a vector epsilon of length N. I am applying the function bw.CDF.pi(x, pilot="UCV") from the sROC package to compute bandwidths for cdf Kernel estimation.
My goal is to repeat this bandwidth function for every leading subvector of epsilon. In other words, I would like to apply the function to the first value of epsilon, then to the first two values, then to the first three, continuing until it is applied to the whole vector epsilon. In the end I want N bandwidth values.
How can I accomplish this?
Apparently bw.CDF.pi needs a vector of at least 2 elements to run. If you want to run it on the first 2 elements of a vector, then the first 3, etc., you can do the following. Note that the example data are those from the function's help page.
library(sROC)
set.seed(100)
n <- 200
x <- c(rnorm(n/2, mean=-2, sd=1), rnorm(n/2, mean=3, sd=0.8))
lapply(seq_along(x)[-1], function(m) bw.CDF.pi(x[seq_len(m)], pilot="UCV"))
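lapply() returns a list of N - 1 bandwidths, since the one-element prefix is skipped. If you prefer a numeric vector of length N, vapply() plus a leading NA does that. The sketch below uses stats::bw.nrd0 as a stand-in bandwidth function so that it runs without sROC (it likewise needs at least 2 data points); substituting bw.CDF.pi(x[seq_len(m)], pilot = "UCV") assumes that function returns a single number.

```r
set.seed(100)
n <- 200
x <- c(rnorm(n/2, mean = -2, sd = 1), rnorm(n/2, mean = 3, sd = 0.8))

# One bandwidth per leading subvector x[1:2], x[1:3], ..., x[1:n].
# bw.nrd0 is only a stand-in for bw.CDF.pi here.
bws <- vapply(seq_along(x)[-1],
              function(m) bw.nrd0(x[seq_len(m)]),
              numeric(1))
bws <- c(NA, bws)   # NA for the one-element prefix, so length(bws) == length(x)
```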

Random sampling based on vector of probability weights

I have the vector d<-1:100
I want to sample k = 3 times from this vector without replacement. I would like elements at a distance of length(d)/k from the first sampled element to have a higher probability of being sampled. I am not yet sure how much higher. I know that sample has a prob= argument; however, I can't find a way to have the prob= vector recalculated from the location of the initial sample.
Any ideas?
Example:
d <- 1:100. Let's say the first trial samples d[30] = 30. Then the elements of d near 0, 60 and 90 should have a higher probability of being sampled, so after the initial sample the sampling probabilities of the remaining elements of d should peak around those positions.
I think:
samp <- sample(1:100,1)
prob <- rep(1,100)
prob[samp]=0
MORE EDIT: I'm an idiot today. Now this will make the probability shape you asked for.
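peke <- c(2, 5, 7, 10, 7, 5, 2)  # your 'triangle' probability
for (jj in c(0, 2, 3)) {
  # peak of 7 cells starting just after samp * jj, wrapped into 1..100
  idx <- ((1:7) + samp * jj - 1) %% 100 + 1
  prob[idx] <- peke
}
prob[samp] <- 0  # re-zero the first draw in case a peak covered it
newsamp <- sample(1:100, 1, prob = prob)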
You may want to add a slight offset if that doesn't place the probability peaks where you wanted them.
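A more general sketch of what the question asks, under these assumptions: the favoured positions lie every length(d)/k steps from the first draw, wrapping around the ends of d; elements within width positions of those points get weight boost instead of 1; and the remaining k - 1 elements are drawn without replacement using those weights. The function name and the width and boost parameters are hypothetical, chosen for illustration.

```r
# Draw k elements of d without replacement. After the first draw, elements close to
# the positions first + j * length(d)/k (wrapping around) get a higher weight.
sample_spaced <- function(d, k = 3, width = 3, boost = 10) {
  n <- length(d)
  step <- n / k
  first <- sample(n, 1)                       # position of the first draw
  offset <- (seq_len(n) - first) %% step      # position relative to the grid
  dist <- pmin(offset, step - offset)         # distance to the nearest grid point
  prob <- ifelse(dist <= width, boost, 1)     # favour elements near the grid
  prob[first] <- 0                            # the first draw cannot repeat
  rest <- sample(n, k - 1, prob = prob)       # replace = FALSE by default
  d[c(first, rest)]
}

set.seed(1)
sample_spaced(1:100)
```

The weights are computed once, after the first draw; sample() then handles the without-replacement bookkeeping for the remaining draws.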
