I'm not aware of any direct commands to do this in R. Any input?
To make a 3x3 matrix, do this:
matrix(something, nrow=3, ncol=3)
But you need to replace something with however you want to generate the "arbitrary" numbers. Use runif(9) for nine random (uniformly distributed) real numbers between 0 and 1. Use sample(1:100, 9, replace = TRUE) to draw nine numbers from the integers 1 through 100 with replacement. Use rnorm(9) to draw nine numbers from a standard normal distribution. And so on.
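For example, a minimal sketch (m1 and m2 are arbitrary names; any of the generators above can be swapped in):
# 3x3 matrix of uniform random numbers on [0, 1]
m1 <- matrix(runif(9), nrow = 3, ncol = 3)
# 3x3 matrix of integers drawn from 1:100 with replacement
m2 <- matrix(sample(1:100, 9, replace = TRUE), nrow = 3, ncol = 3)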
I am simulating some draws using random numbers. Unexpectedly, the generated numbers are not as "random" as I would like: I find that there are some linear combinations among them.
In detail, I have the following starting data:
start_vector = c(1,10,30,40,50,100) # length equal to 6
residual_of_model = 5
n = 1000 # Number of simulations
I simulate n observations from a normal distribution for each element of start_vector, treating each draw as "random noise" to add to the original value (the one in start_vector):
out_vec <- matrix(NA, nrow = n, ncol = length(start_vector))
for (h_aux in seq_along(start_vector)) {
  # add normal noise (sd = residual_of_model) to each starting value
  random_noise <- rnorm(n, 0, residual_of_model)
  out_vec[, h_aux] <- start_vector[h_aux] + random_noise
}
At this point, I obtain a matrix of size 1000 x 6. In theory, I would expect all the columns and all the rows of the matrix to be linearly independent of each other.
If I try to check this using the findLinearCombos() function from the caret package, I obtain that all the columns are independent:
caret::findLinearCombos(out_vec)
If I try to evaluate the independence among the rows using the following code:
caret::findLinearCombos(t(out_vec))
I obtain that all the rows from 7 to 1000 are linear combinations of the first 6 (the length of start_vector).
This seems really strange to me; I would expect to observe no dependencies at all, since the rows are generated by adding random noise from rnorm.
What am I missing? Is there some bug? Thanks in advance!
Working in R, I need to create a vector of length n with values randomly drawn from a Poisson distribution with lambda = 1, but with a lower bound of 2 and an upper bound of 6 (i.e. all numbers will be 2, 3, 4, 5, or 6).
I am unsure how to do this. I tried creating a for loop that would replace any values outside that range with values inside the range:
set.seed(123)
n<-25 #example length
example<-rpois(n,1)
test<-example #redundant - only duplicating to compare with original *example* values
for (i in 1:length(n)){
if (test[i]<2||test[i]>6){
test[i]<-rpois(1,1)
}
}
But this didn't seem to work (still getting 0s and 1s, etc., in test). Any ideas would be greatly appreciated!
Here is one way to generate n numbers from a Poisson distribution and replace all the numbers that fall outside the range with random numbers inside the range.
n<-25 #example length
example<-rpois(n,1)
inds <- example < 2 | example > 6
example[inds] <- sample(2:6, sum(inds), replace = TRUE)
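Note that the replacement values here are uniform on 2:6. If the kept values should instead follow the truncated Poisson distribution, a rejection-sampling sketch is one alternative (assuming it is acceptable to redraw until everything lands in range):
x <- rpois(n, 1)
# redraw any out-of-range values until all fall within [2, 6]
while (any(bad <- x < 2 | x > 6)) {
  x[bad] <- rpois(sum(bad), 1)
}
With lambda = 1 only about 26% of draws land in [2, 6], so this may loop a few times, but it terminates with probability 1.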
Many R functions for simulating from probability distributions are vectorised. ?rmultinom says that dmultinom is not vectorized, hence I assume that rmultinom is not either. What is the most efficient way to execute rmultinom repeatedly across a set of probabilities?
For example:
p <- matrix(c(0.1, 0.2, 0.3, 0.4, 0.2, 0.3, 0.4, 0.1, 0.3, 0.4, 0.2, 0.1), nrow = 3, ncol = 4, byrow = TRUE)
p is a 3 x 4 matrix of probabilities that sum to one within each row. The goal is now to create n samples of size size for each row. For simplicity, use n = 1, size = 1, i.e. the categorical distribution.
rmultinom(1, 1, p) gives a 12 x 1 matrix, because p is flattened into a single probability vector. The desired result is a 4 x 3 matrix, with exactly one element equal to 1 in each column.
A for loop is possible but seems inefficient. Is there a better way to achieve this (for large matrices p)?
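For reference, a minimal sketch of the row-wise baseline the question mentions (n = 1, size = 1 as above); apply is not expected to be faster than an explicit loop, only more compact:
# one draw per row of p; the result is 4 x 3 with exactly one 1 per column
res <- apply(p, 1, function(prob) rmultinom(1, size = 1, prob = prob))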
I want to generate a sample of integer numbers in R with a specified mean.
I used mu + sd*scale(rnorm(n)) to generate a sample of n values that has exactly mean = mu, but this generates floating-point values; I would like to generate integer values instead. For example, I would like to generate a sample with mean = 4: with sample size n = 5, an example of generated values would be {2, 6, 4, 3, 5}.
Any ideas on how to do this in R while satisfying the constraint of a specific value of the mean?
Picking n values with a mean of m is equivalent to picking n values that sum to m*n. (I'm assuming you're going to stick to positive integers -- otherwise things get much harder!) Here is a solution based on sampling partitions (sets of values that add up to the desired total) uniformly. I'm not sure it's what you want, since it samples uniformly over partitions rather than over values; perhaps someone else can do better, or figure out how to reweight the samples.
This brute-force solution will also probably fail for cases much larger than your example (there are 627 partitions for a total of 20, 5604 for a total of 30, 37338 for a total of 40 ...)
m <- 4
n <- 5
library("partitions")
pp <- parts(m*n) ## all sets of integers that sum to m*n (=20 here)
## restrict to partitions with exactly n (=5) non-zero values.
pp5 <- pp[1:n, colSums(pp > 0) == n]
set.seed(101) ## for reproducibility
## sample uniformly from this set
pp5[,sample(ncol(pp5),size=1)] ## 9, 5, 4, 1, 1
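By construction, every column of pp5 sums to m*n, so any sampled partition has exactly the requested mean:
x <- pp5[, sample(ncol(pp5), size = 1)]
mean(x)  ## exactly m = 4, since the sampled values sum to m*n = 20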
I have the number of samples per unit and need to calculate statistics with R.
The table is like this (all rows and columns are actually filled with values; I only write a few here for easier visibility, and there are many more columns):
Hour      1   2   3   4
H1       72  11  98  65
H2       19  27
H3
H4
H5
:
H200000
I.e. in the first hour (H1) there were 72 samples of value 1, 11 samples of value 2, etc. In the second hour (H2) there were 19 samples of value 1, 27 samples of value 2, etc.
I need to calculate the mean and standard deviation per hour (i.e. per row). As there are many thousands of rows I need a fast method.
Example: The manual mean-calculation for hour 1 (H1) would be:
(72x1 + 11x2 + 98x3 + 65x4)/(72+11+98+65) = 2.6
I suppose there are R methods or packages that can do this, but I have failed to find them. Your support is highly appreciated.
Thanks,
Chris
You want to calculate a weighted mean, so you need weighted.mean. For the first row:
values <- c(1, 2, 3, 4)
weights <- c(72, 11, 98, 65)
weighted.mean(values, weights)
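## returns 2.634146 -- the same value as the manual calculation above (648/246)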
The weighted standard deviation is not well-defined. You could use a hand-rolled weighted RMS as an estimator (but this assumes that your input sample really comes from a single Gaussian, i.e. there are no outliers -- not sure whether that's the case for your example).
# same values and weights as above
sqrt(sum(weights * values^2) / sum(weights))
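If the weights are instead interpreted as counts of repeated observations (as in this question), a frequency-weighted standard deviation is a common alternative; a minimal sketch:
wmean <- weighted.mean(values, weights)
# treat each weight as the number of times its value was observed
sqrt(sum(weights * (values - wmean)^2) / (sum(weights) - 1))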
You should read your data into a table and iterate over every row. Also, "many thousands of rows" is not necessarily a large number for such a simple calculation. This is very basic stuff; maybe checking out a tutorial would also be beneficial.
You are much better off (i.e. faster calculations) using matrix operations instead of applying something by row. For example, assuming X is the matrix containing your data, you can get the weighted means the following way:
v <- 1:ncol(X)                    # the values 1, 2, ..., ncol(X)
wmeans <- (X %*% v) / rowSums(X)  # divide by each row's total count
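A quick check, using the H1 row from the question as a one-row X:
X <- rbind(c(72, 11, 98, 65))
(X %*% (1:4)) / rowSums(X)  ## 2.634146, as computed by hand above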
Assuming your table is a matrix called dataset of size 200000 x n, you just need to supply the column values and use each row of counts as the weights:
values <- 1:ncol(dataset)
# the 1 as 2nd argument applies the function over rows; each row of
# counts serves as the weights for the fixed vector of values
w.means <- apply(dataset, 1, function(counts) weighted.mean(values, counts))