How to use dim. argument in rowVars on an array in R

This question is quite basic: I'm very confused by the documentation for rowVars in the matrixStats package in R.
I have an array of dimensions (12, 12, 10000), i.e. 10000 12x12 matrices. rowMeans very easily gives me the mean of each row of each matrix in the form of a list with 10000 items. I want to do the same with rowVars to get variances.
This works fine for a single matrix, but for anything with more dimensions it gives an error message saying to use the dim. argument, and I don't understand how it works. The package documentation says that dim. is "An integer vector of length two specifying the dimension of x, also when not a matrix." (where x is the object the function is to be used on). However, I don't understand what this means, and I haven't been able to find any helpful examples of it in use. What does 'specifying the dimension' mean: specifying how many dimensions, or specifying the size of each, e.g. (12, 12, 10000)? If the latter, how can it be of length 2?
Thank you!

The documentation for rowVars states that the input x should be a "numeric N x K matrix," so it sounds like this function only supports two-dimensional matrices. If you want the variances of each row for your 10000 matrices, you could instead do something like this:
library(matrixStats)
# 10000 matrices of size 12 x 12, filled with standard normal draws
mat <- array(rnorm(12*12*10000, 0, 1), dim = c(12, 12, 10000))
rvs <- sapply(1:10000, function(x) rowVars(mat[,,x]))
The resulting object rvs will be a matrix where the nth column is the row variances for the nth 12 x 12 matrix.
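As an alternative to looping over the third index, base R's apply() can produce the same kind of result in one call. The following is a minimal sketch on a small made-up array (4 x 3 x 5 rather than 12 x 12 x 10000); the names arr and rvs_base are only for illustration, and it uses var() from base R rather than rowVars, so it does not rely on the dim. argument at all.

```r
set.seed(1)
# small illustrative array: 5 matrices of size 4 x 3
arr <- array(rnorm(4 * 3 * 5), dim = c(4, 3, 5))

# keep dimensions 1 (row) and 3 (matrix index); take the variance across columns
rvs_base <- apply(arr, c(1, 3), var)

# one row per matrix row, one column per matrix
dim(rvs_base)

# entry [r, n] is the variance of row r of the nth matrix
all.equal(rvs_base[2, 5], var(arr[2, , 5]))
```

The layout matches the sapply result: column n holds the row variances of the nth matrix.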

Related

dim(x) must have a positive length, yet I can't see the problem

I'm creating a data frame of a deck of cards (1,2,3,3,4,4,5,6,7,8). I want to plot it with ggplot, but when I apply tt=sapply(t,card_2), R gives me an error saying dim(X) must have a positive length. Can anyone help me with this? Thank you

This is failing because of the following lines:
sum_a=apply(a,2,sum)
min_a=apply(a,2,min)
sum_b=apply(a,2,sum)
min_b=apply(b,2,min)
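The sum and min functions are aggregating functions: they return a single value for a whole vector (or matrix). apply() expects a matrix or array and iterates over one of its dimensions, so calling it on a plain vector, which has no dim attribute, produces this error. Computing a sum or minimum of each individual value would not make sense anyway. Just do: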
sum_a=sum(a)
min_a=min(a)
sum_b=sum(b)
min_b=min(b)
Also, you need to make sure a and b are numeric first.
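A short sketch of the failure and the fix, assuming a is a plain numeric vector holding the card values from the question (the original data frame is not shown, so this is an illustration only):

```r
a <- c(1, 2, 3, 3, 4, 4, 5, 6, 7, 8)  # a plain vector, so dim(a) is NULL

# apply() needs a dim attribute; capture the error message instead of stopping
msg <- tryCatch(apply(a, 2, sum), error = function(e) conditionMessage(e))
msg

# the aggregating functions work on the vector directly
sum_a <- sum(a)
min_a <- min(a)
```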

Indexing variables in R

I am normally a Maple user currently working with R, and I have a problem with correctly indexing variables.
Say I want to define 2 vectors, v1 and v2, and I want to call the nth element in v1. In Maple this is easily done:
v[1]:=some vector,
and the nth element is then called by the command
v[1][n].
How can this be done in R? The actual problem is as follows:
I have a sequence M (say of length 10, indexed by k) of simulated negbin variables. For each of these simulated variables I want to construct a vector X of length M[k] with entries given by some formula. So I should end up with 10 different vectors, each of different length. My incorrect code looks like this
sims<-10
M<-rnegbin(sims, eks_2016_kasko*exp(-2.17173), 840.1746)
for(k in 1:sims){
x[k]<-rep(NA,M[k])
X[k]<-rep(NA,M[k])
for(i in 1:M[k]){x[k][i]<-runif(1,min=0,max=1)
if(x[k][i]>=0 & x[i]<=0.1056379){
X[k][i]<-rlnorm(1, 6.228244, 0.3565041)}
else{
X[k][i]<-rlnorm(1, 8.910837, 1.1890874)
}
}
}
The error appears to be that x[k] is not a valid name for a variable. Any way to make this work?
Thanks a lot :)

I've edited your R script slightly to get it working and make it reproducible. To do this I had to assume that eks_2016_kasko was an integer value of 10.
library(MASS)  # provides rnegbin()
sims <- 10
# R is not zero-indexed, so add one to keep every M[k] at least 1 (1:0 would loop over c(1, 0))
M <- rnegbin(sims, 10*exp(-2.17173), 840.1746) + 1
# Create two lists; element k of each list is a vector of length M[k]
x <- list()
X <- list()
for(k in 1:sims){
  x[[k]] <- rep(NA, M[k])
  X[[k]] <- rep(NA, M[k])
  for(i in 1:M[k]){
    x[[k]][i] <- runif(1, min=0, max=1)
    if(x[[k]][i] >= 0 && x[[k]][i] <= 0.1056379){
      X[[k]][i] <- rlnorm(1, 6.228244, 0.3565041)
    } else {
      X[[k]][i] <- rlnorm(1, 8.910837, 1.1890874)
    }
  }
}
This will work and I think it is what you were trying to do, BUT it is not great R code. I strongly recommend using the lapply family instead of for loops, and learning to use data.table and parallelisation if you need things to scale. Additionally, if you want to read more about indexing and subsetting in R, Hadley Wickham has a comprehensive breakdown here.
Hope this helps!

Let me start with a few remarks and then show you how your problem can be solved using R.
In R, there is most of the time no need to use a for loop in order to assign several values to a vector. For example, to fill a vector of length 100 with uniformly distributed random variables, you could write a loop like this:
set.seed(1234)
x1 <- rep(NA, 100)
for (i in 1:100) {
x1[i] <- runif(1, 0, 1)
}
(set.seed() is used to set the random seed, such that you get the same result each time.) It is much simpler (and also much faster) to do this instead, resetting the seed first so that both versions draw the same random numbers:
set.seed(1234)
x2 <- runif(100, 0, 1)
identical(x1, x2)
## [1] TRUE
As you see, results are identical.
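The reason that x[k]<-rep(NA,M[k]) does not work is that indeed x[k] is not a valid variable name in R. [ is used for indexing, so x[k] extracts the element k from a vector x. Since you try to assign a vector of length larger than 1 to a single element, the assignment does not do what you want: R warns that the number of items to replace is not a multiple of the replacement length and keeps only the first value (or stops with an error if x does not exist yet). What you probably want to use is a list, as you will see in the example below.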
So here comes the code that I would use instead of what you proposed in your post. Note that I am not sure that I correctly understood what you intend to do, so I will also describe below what the code does. Let me know if this fits your intentions.
# define M
library(MASS)
eks_2016_kasko <- 486689.1
sims <- 10
M <- rnegbin(sims, eks_2016_kasko*exp(-2.17173), 840.1746)
# define the function that calculates X for a single value from M
calculate_X <- function(m) {
  x <- runif(m, min=0, max=1)
  # as in the question: x <= 0.1056379 uses the first log-normal, otherwise the second
  X <- ifelse(x <= 0.1056379, rlnorm(m, 6.228244, 0.3565041),
              rlnorm(m, 8.910837, 1.1890874))
  X  # return the vector explicitly
}
# apply that function to each element of M
X <- lapply(M, calculate_X)
As you can see, there are no loops in that solution. I'll start to explain at the end:
lapply is used to apply a function (calculate_X) to each element of a list or vector (here it is the vector M). It returns a list. So, you can get, e.g. the third of the vectors with X[[3]] (note that [[ is used to extract elements from a list). And the contents of X[[3]] will be the result of calculate_X(M[3]).
The function calculate_X() does the following: It creates a vector of m uniformly distributed random values (remember that m runs over the elements of M) and stores that in x. Then it creates a vector X that contains log-normally distributed random variables. The parameters of the distribution for each entry depend on the corresponding value in x.
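To connect this back to the Maple notation in the question, here is a minimal sketch of list indexing with made-up values; the list v and its contents are purely illustrative. Where Maple writes v[1][n], R uses [[ to pick the vector out of the list and [ to pick the element out of that vector.

```r
# a list can hold vectors of different lengths
v <- list(c(10, 20, 30), c(1, 2))

v[[1]]       # the whole first vector
v[[1]][2]    # its second element (Maple: v[1][2])

# assign to a single element of the second vector
v[[2]][1] <- 99

# length of each vector stored in the list
lengths(v)
```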

Connect dots function

I am learning the cross-validation method.
In the lines below, input and query are both data frames.
my.knn <- get.knnx(input,query,k=2)
nn.index <- my.knn$nn.index
What does the second line mean? What will nn.index be?

my.knn is a list of variables. So nn.index is taking that value out of the list so you can work on it as a single variable.
EXAMPLE OF GETTING ELEMENTS OUT OF A LIST
stats <- list("mean" = 10, "data" = c(0, 10 ,20))
#just get the average out
my.average <- stats$mean
So a list can have different kind of results from your testing, and can have a mix of variable types (integers, strings, vectors). The $ syntax is taking one of the variables out of the list into a single variable.
If you type my.knn at the prompt you will see its contents with sections marked with $. This will help see what is in your list.
In the example:
> stats
$mean
[1] 10
$data
[1] 0 10 20
SPECIFICS ON FUNCTION
I looked at get.knnx function notes, assuming you are using FNN package, here http://www.inside-r.org/packages/cran/fnn/docs/get.knn:
Output a list contains:
nn.index
an n x k matrix for the nearest neighbor indice(s).
nn.dist
an n x k matrix for the nearest neighbor Euclidean distances.
So you can see your function output list has these two variables - an index of the nearest neighbour, and the second is the distances.
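To make the shape of nn.index concrete, here is a rough base R sketch that recomputes the two nearest neighbours by brute force on made-up points. It does not call FNN; the matrices input and query and the helper variable d are assumptions for illustration only, and get.knnx itself may use a faster search algorithm.

```r
# three reference points (rows of input) and two query points
input <- matrix(c(0, 0,
                  1, 0,
                  5, 5), ncol = 2, byrow = TRUE)
query <- matrix(c(0.9, 0.1,
                  4.0, 4.0), ncol = 2, byrow = TRUE)
k <- 2

# Euclidean distances: rows = query points, columns = input points
d <- apply(input, 1, function(p) sqrt(rowSums(sweep(query, 2, p)^2)))

# for each query point, the row numbers in input of its k closest points
nn.index <- t(apply(d, 1, function(r) order(r)[1:k]))
# and the matching distances, sorted from nearest to farthest
nn.dist <- t(apply(d, 1, function(r) sort(r)[1:k]))

nn.index
```

Each row of nn.index belongs to one query point, and each entry is a row number in input, so nn.index[1, 1] is the row of input closest to the first query point.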
Trust this helps.

Preallocate sparse matrix with max nonzeros in R

I'm looking to preallocate a sparse matrix in R (using simple_triplet_matrix) by providing the dimensions of the matrix, m x n, and also the number of non-zero elements I expect to have. Matlab has the function "spalloc" (see below), but I have not been able to find an equivalent in R. Any suggestions?
S = spalloc(m,n,nzmax) creates an all zero sparse matrix S of size m-by-n with room to hold nzmax nonzeros.

Whereas it may make sense to preallocate a traditional dense matrix in R (in the same way it is much more efficient to preallocate a regular (atomic) vector rather than increasing its size one by one), I'm pretty sure it will not pay to preallocate sparse matrices in R, in most situations.
Why?
For dense matrices, you allocate and then assign "piece by piece", e.g.,
m[i,j] <- value
For sparse matrices, however, that is very different. If you do something like
S[i,j] <- value
the internal code has to check if [i,j] is an existing entry (typically non-zero) or not. If it is, it can change the value, but otherwise, one way or the other, the triplet (i, j, value) needs to be stored, and that means extending the current structure etc. If you do this piece by piece, it is inefficient, largely irrespective of whether you have done some preallocation or not.
If, on the other hand, you already know in advance all the [i,j] combinations which will contain non-zeroes, you could "pre-allocate", but in this case, just store the vectors i and j of length nnzero, say. And then use your underlying "algorithm" to also construct a vector x of the same length which contains all the corresponding values, i.e., entries.
Now, indeed, as @Pafnucy suggested, use spMatrix() or sparseMatrix(), two slightly different versions of the same functionality: Constructing a sparse matrix, given its contents.
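A minimal sketch of that approach with sparseMatrix() from the Matrix package, using made-up index and value vectors (i, j, x and the 4 x 5 dimensions are illustrative assumptions, not taken from the question):

```r
library(Matrix)

# positions and values of the non-zero entries, collected up front
i <- c(1, 3, 4)      # row indices
j <- c(2, 2, 5)      # column indices
x <- c(10, 20, 30)   # corresponding values

# build the 4 x 5 sparse matrix in one step from the triplets
S <- sparseMatrix(i = i, j = j, x = x, dims = c(4, 5))

S[3, 2]     # look up one stored entry
nnzero(S)   # number of non-zero entries
```

Building the matrix once from complete i, j and x vectors avoids the repeated structure extension that element-by-element assignment would cause.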
I am happy to help further, as I am the maintainer of the Matrix package.

Bandwidth selection using NP package

New to R and having a problem with a very simple task! I have read a few columns of .csv data into R; the variables take values in the natural numbers plus zero and have missing values. After trying to use the nonparametric np package, I have two problems: first, if I use the simple command bw=npregbw(ydat=y, xdat=x, na.omit), where x and y are column vectors, I get the error that "number of regression data and response data do not match". Why do I get this, as I have the same number of elements in each vector?
Second, I would like to call the data ordered and tell npregbw this, using the command bw=npregbw(ydat=y, xdat=ordered(x)). When I do that, I get the error that x must be atomic for sort.list. But how is x not atomic, it is just a vector with natural numbers and NA's?
Any clarifications would be greatly appreciated!

1) You probably have a different number of NA's in y and x.
2) Can't be sure about this, since there is no example. If it is of the following type:
x <- c(3,4,NA,2)
Then ordered(x) should work fine. Please provide an example of your case.
EDIT: You of course tried bw=npregbw(ydat=y, xdat=x)? ordered() makes your vector an ordered factor (see ?ordered), which is not an atomic vector (see 2.1.1 link and ?factor)
EDIT2: So the problem was the way of subsetting data. Note the difference in various ways of subsetting. data$x and data[,i] (where i = column number of column x) give you vectors, while data[c("x")] and data[i] give a data frame. Functions expect vectors, unless they call for data = (your data). In that case they work with column names.
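The subsetting difference can be checked directly. This is a small sketch with a made-up data frame (the column values are illustrative, not the asker's data); it also shows one way to keep only rows where both x and y are observed, which addresses the mismatched number of NA's from point 1.

```r
data <- data.frame(x = c(3, 4, NA, 2), y = c(1, NA, 0, 2))

# these two give a plain numeric vector
is.numeric(data$x)
is.numeric(data[, 1])

# these two give a one-column data frame
is.data.frame(data["x"])
is.data.frame(data[1])

# ordered() works on the vector form
x_ord <- ordered(data$x)

# keep only rows where neither x nor y is missing
keep <- complete.cases(data)
x_clean <- data$x[keep]
y_clean <- data$y[keep]
```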
