I want to create k blocks with identical values in a n*n matrix (k can be divided exactly by the row number times the columns number as n*n ):
for example, when n = 4 and k = 4, (k can be divided exactly by 4*4=16), a matrix is create like this:
1 1 2 2
1 1 2 2
3 3 4 4
3 3 4 4
How can I do this without a for loop?
There's a fantastically useful mathematical operator called a Kronecker product:
m1 <- matrix(1:4,nrow=2,byrow=TRUE)
m2 <- matrix(1,nrow=2,ncol=2)
kronecker(m1,m2)
The Matrix package has methods for Kronecker products of sparse matrices (?"kronecker-methods"), so that you can easily build huge sparse patterned matrices as long as you can find a way to express the pattern in terms of Kronecker products.
Related
If I have an input file with 7 columns or an arbitrary number of columns, and each column has a number of values in it, and I want to run correlations for all unique pairs(where AB and BA are the same) of columns without having to go in and do cor.test(column$A,
Column$B) for every possible pair how do I do this in R?
Example data:
A B C D
1 2 2 3
3 2 2 1
5 2 4 3
5 2 3 3
In this case A, B, C,D are different columns and I would want to do all possible correlations for unique pairs where AB and BA count as the same pair, just as DA and AD would be the same pair.
You can try:
cor(df, use="complete.obs", method="kendall") #or whichever method fits you
or:
#this gives significance levels also
library(Hmisc)
rcorr(df, type="pearson") # type can be pearson or spearman
#or as matrix
rcorr(as.matrix(df))
If this doesn't work, try creating a list of column vectors and loop through everything (I will try to provide an example in edit)
Hope this helps
Let's say I have a vector a = (1,3,4).
I want to create new vector with integer numbers in range [1,length(a)]. But the i-th number should appear a[i] times.
For the vector a I want to get:
(1,2,2,2,3,3,3,3)
Would you explain me how to implement this operation without several messy concatenations?
You can try rep
rep(seq_along(a), a)
#[1] 1 2 2 2 3 3 3 3
data
a <- c(1,3,4)
Let's say I have a vector of integers 1:6
w=1:6
I am attempting to obtain a matrix of 90 rows and 6 columns that contains the multinomial combinations from these 6 integers taken as 3 groups of size 2.
6!/(2!*2!*2!)=90
So, columns 1 and 2 of the matrix would represent group 1, columns 3 and 4 would represent group 2 and columns 5 and 6 would represent group 3. Something like:
1 2 3 4 5 6
1 2 3 5 4 6
1 2 3 6 4 5
1 2 4 5 3 6
1 2 4 6 3 5
...
Ultimately, I would want to expand this to other multinomial combinations of limited size (because the numbers get large rather quickly) but I am having trouble getting things to work. I've found several functions that do binomial combinations (only 2 groups) but I could not locate any functions that do this when the number of groups is greater than 2.
I've tried two approaches to this:
Building up the matrix from nothing using for loops and attempting things with the reshape package (thinking that might be something there for this with melt() )
working backwards from the permutation matrix (720 rows) by attempting to retain unique rows within groups and or removing duplicated rows within groups
Neither worked for me.
The permutation matrix can be obtained with
library(gtools)
dat=permutations(6, 6, set=TRUE, repeats.allowed=FALSE)
I think working backwards from the full permutation matrix is a bit excessive but I'm tring anything at this point.
Is there a package with a prebuilt function for this? Anyone have any ideas how I shoud proceed?
Here is how you can implement your "working backwards" approach:
gps <- list(1:2, 3:4, 5:6)
get.col <- function(x, j) x[, j]
is.ordered <- function(x) !colSums(diff(t(x)) < 0)
is.valid <- Reduce(`&`, Map(is.ordered, Map(get.col, list(dat), gps)))
dat <- dat[is.valid, ]
nrow(dat)
# [1] 90
I'm using the SVD package with R and I'm able to reduce the dimensionality of my matrix by replacing the lowest singular values by 0. But when I recompose my matrix I still have the same number of features, I could not find how to effectively delete the most useless features of the source matrix in order to reduce it's number of columns.
For example what I'm doing for the moment:
This is my source matrix A:
A B C D
1 7 6 1 6
2 4 8 2 4
3 2 3 2 3
4 2 3 1 3
If I do:
s = svd(A)
s$d[3:4] = 0 # Replacement of the 2 smallest singular values by 0
A' = s$u %*% diag(s$d) %*% t(s$v)
I get A' which has the same dimensions (4x4), was reconstruct with only 2 "components" and is an approximation of A (containing a little bit less information, maybe less noise, etc.):
[,1] [,2] [,3] [,4]
1 6.871009 5.887558 1.1791440 6.215131
2 3.799792 7.779251 2.3862880 4.357163
3 2.289294 3.512959 0.9876354 2.386322
4 2.408818 3.181448 0.8417837 2.406172
What I want is a sub matrix with less columns but reproducing the distances between the different rows, something like this (obtained using PCA, let's call it A''):
PC1 PC2
1 -3.588727 1.7125360
2 -2.065012 -2.2465708
3 2.838545 0.1377343 # The similarity between rows 3
4 2.815194 0.3963005 # and 4 in A is conserved in A''
Here is the code to get A'' with PCA:
p = prcomp(A)
A'' = p$x[,1:2]
The final goal is to reduce the number of columns in order to speed up clustering algorithms on huge datasets.
Thank you in advance if someone can guide me :)
I would check out this chapter on dimensionality reduction or this cross-validated question. The idea is that the entire data set can be reconstructed using less information. It's not like PCA in the sense that you might only choose to keep 2 out of 10 principal components.
When you do the kind of trimming you did above, you're really just taking out some of the "noise" of your data. The data still as the same dimension.
This question already has answers here:
Create a Data Frame of Unequal Lengths
(6 answers)
Closed 9 years ago.
I have the following elementary issue in R.
I have a for (k in 1:x){...} cycle which produces numerical vectors whose length depends on k.
For each value of k I produce a single numerical vector.
I would like to collect them as rows of a data frame in R, if possible. In other words, I would like to introduce a data frame data s.t.
for (k in 1:x) {
data[k,] <- ...
}
where the dots represent the command producing the vector with length depending on k.
Unfortunately, as far as I know, the length of the rows of a dataframe in R is constant, as it is a list of vectors of equal length. I have already tried to complete each row with a suitable number of zeroes to arrive at a constant length (in this case equal to x). I would like to work "dynamically", instead.
I do not think that this issue is equivalent to merge vectors of different lengths in a dataframe; due to the if cycle, only 1 vector is known at each step.
Edit
A very easy example of what I mean. For each k, I would like to write the vector whose components are 1,2,...,k and store it as kth row of the dataframe data. In the above setting, I would write
for (k in 1:x) {
data[k,] <- seq(1,k,1)
}
As the length of seq(1,k,1) depends on k the code does not work.
You could consider using ldply from plyr here.
set.seed(123)
#k is the length of each result
k <- sample( 5 , 3 , repl = TRUE )
#[1] 2 4 3
# Make a list of vectors, each a sequence from 1:k
ll <- lapply( k , function(x) seq_len(x) )
#[[1]]
#[1] 1 2
#[[2]]
#[1] 1 2 3 4
#[[3]]
#[1] 1 2 3
# take our list and rbind it into a data.frame, filling in missing values with NA
ldply( ll , rbind)
# 1 2 3 4
#1 1 2 NA NA
#2 1 2 3 4
#3 1 2 3 NA