I have a table (t1) as follows:
t1 <- array(1:20, dim=c(10,10))
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 1 11 1 11 1 11 1 11 1 11
[2,] 2 12 2 12 2 12 2 12 2 12
[3,] 3 13 3 13 3 13 3 13 3 13
[4,] 4 14 4 14 4 14 4 14 4 14
[5,] 5 15 5 15 5 15 5 15 5 15
[6,] 6 16 6 16 6 16 6 16 6 16
[7,] 7 17 7 17 7 17 7 17 7 17
[8,] 8 18 8 18 8 18 8 18 8 18
[9,] 9 19 9 19 9 19 9 19 9 19
[10,] 10 20 10 20 10 20 10 20 10 20
I want to transform this table into 1s and 0s. If a cell's value is > 5, it becomes 1; if it is <= 5, it becomes 0. After the transformation, t1 should look like this:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 0 1 0 1 0 1 0 1 0 1
[2,] 0 1 0 1 0 1 0 1 0 1
[3,] 0 1 0 1 0 1 0 1 0 1
[4,] 0 1 0 1 0 1 0 1 0 1
[5,] 0 1 0 1 0 1 0 1 0 1
[6,] 1 1 1 1 1 1 1 1 1 1
[7,] 1 1 1 1 1 1 1 1 1 1
[8,] 1 1 1 1 1 1 1 1 1 1
[9,] 1 1 1 1 1 1 1 1 1 1
[10,] 1 1 1 1 1 1 1 1 1 1
Which commands should I use?
You can just use an ifelse() statement:
ifelse(t1 > 5, 1, 0)
Since an expression like t1 > 5 yields a logical matrix of TRUE and FALSE, and since R coerces TRUE to 1 and FALSE to 0 in arithmetic, for this particular transformation you could also do:
(t1 > 5)+0
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 0 1 0 1 0 1 0 1 0 1
[2,] 0 1 0 1 0 1 0 1 0 1
[3,] 0 1 0 1 0 1 0 1 0 1
[4,] 0 1 0 1 0 1 0 1 0 1
[5,] 0 1 0 1 0 1 0 1 0 1
[6,] 1 1 1 1 1 1 1 1 1 1
[7,] 1 1 1 1 1 1 1 1 1 1
[8,] 1 1 1 1 1 1 1 1 1 1
[9,] 1 1 1 1 1 1 1 1 1 1
[10,] 1 1 1 1 1 1 1 1 1 1
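As a quick check that the two approaches agree, here is a minimal sketch that rebuilds t1 from the question and compares the results. The names a, b and d are only illustrative, and the `* 1L` variant is an extra option that is not part of the answer above; it keeps integer storage.

```r
# Rebuild the matrix from the question
t1 <- array(1:20, dim = c(10, 10))

# Three ways to map "greater than 5" to 1 and everything else to 0
a <- ifelse(t1 > 5, 1, 0)   # explicit yes/no values
b <- (t1 > 5) + 0           # logical coerced to numeric by arithmetic
d <- (t1 > 5) * 1L          # logical coerced to integer storage

# All three hold the same values; only the storage type of d differs
all(a == b)
all(b == d)
typeof(d)
```

The arithmetic forms only work because the target values happen to be 1 and 0; for any other pair of replacement values, ifelse() is the more general tool.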
I haven't timed it, but an alternative to ifelse is just to use the comparison itself, which returns a logical (TRUE/FALSE) matrix:
foo <- bar > 5
Oh, dang, Julius went and beat me to it by 10 seconds.
I have to insert new vectors (in this example, constant columns) into an existing matrix. The problem is that I have an iterative process in which both the positions and the number of vectors to insert change. I have not been able to come up with a function that a) works and b) is efficient enough for huge amounts of data.
A non-dynamic approach using simply cbind() is
old <- matrix(1,10,10) #original matrix
vec <- matrix(5,10,1) #vector 1 to insert
vec2 <- matrix(8,10,1) #vector 2 to insert
old
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 1 1 1 1 1 1 1 1 1 1
[2,] 1 1 1 1 1 1 1 1 1 1
[3,] 1 1 1 1 1 1 1 1 1 1
[4,] 1 1 1 1 1 1 1 1 1 1
[5,] 1 1 1 1 1 1 1 1 1 1
[6,] 1 1 1 1 1 1 1 1 1 1
[7,] 1 1 1 1 1 1 1 1 1 1
[8,] 1 1 1 1 1 1 1 1 1 1
[9,] 1 1 1 1 1 1 1 1 1 1
[10,] 1 1 1 1 1 1 1 1 1 1
#assume that the positions to insert are 4 and 8
goal <- cbind(old[,c(1:3)],
vec,
old[,4:6], #attention, now old column 6 is new column 7
vec2,
old[,7:ncol(old)])
goal
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,] 1 1 1 5 1 1 1 8 1 1 1 1
[2,] 1 1 1 5 1 1 1 8 1 1 1 1
[3,] 1 1 1 5 1 1 1 8 1 1 1 1
[4,] 1 1 1 5 1 1 1 8 1 1 1 1
[5,] 1 1 1 5 1 1 1 8 1 1 1 1
[6,] 1 1 1 5 1 1 1 8 1 1 1 1
[7,] 1 1 1 5 1 1 1 8 1 1 1 1
[8,] 1 1 1 5 1 1 1 8 1 1 1 1
[9,] 1 1 1 5 1 1 1 8 1 1 1 1
[10,] 1 1 1 5 1 1 1 8 1 1 1 1
However, I could not think of something that works with both changing positions and number of vectors to insert.
Any help is greatly appreciated, thank you very much.
cbind the vectors onto old and then reorder the columns. If no is known to be sorted already, sort(no) can be replaced with no.
no <- c(4, 8)
vecs <- cbind(vec, vec2)
cbind(old, vecs)[, order(c(1:ncol(old), sort(no) - seq_along(no))) ]
Extending G. Grothendieck's approach and solving the ordering problem:
pos<-c(4,8)
pos<-pos-c(1:length(pos))
cbind(old, vec, vec2)[, order(c(1:ncol(old), c(pos)))]
Edit: Sorry, didn't see the edit of the answer above :)
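The cbind-then-reorder idea can be wrapped in a small helper so that the positions and the number of inserted columns may change between iterations. This is only a sketch: insert_cols, new_cols and pos are hypothetical names, and it assumes pos gives the column positions in the final matrix, one per column of new_cols.

```r
# Hypothetical helper: insert the columns of new_cols into old so that
# they end up at the positions pos of the resulting matrix.
insert_cols <- function(old, new_cols, pos) {
  new_cols <- as.matrix(new_cols)
  stopifnot(ncol(new_cols) == length(pos), nrow(new_cols) == nrow(old))
  # Sort the positions and keep the new columns aligned with them
  o <- order(pos)
  pos <- pos[o]
  new_cols <- new_cols[, o, drop = FALSE]
  # Each new column gets the same key as the old column it follows;
  # order() keeps ties in their original order, so old columns come first
  key <- c(seq_len(ncol(old)), pos - seq_along(pos))
  cbind(old, new_cols)[, order(key), drop = FALSE]
}

old  <- matrix(1, 10, 10)
vec  <- matrix(5, 10, 1)
vec2 <- matrix(8, 10, 1)
goal <- insert_cols(old, cbind(vec, vec2), c(4, 8))
goal[1, ]
```

Because the helper sorts pos itself, the positions can be passed in any order as long as they line up with the columns of new_cols.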
How to construct a two-circulant matrix?
For example, the following matrix A is two-circulant, i.e. every column (except for the first one) is obtained from the previous one by moving its last two elements to the front. Note that the first column is the generator of the matrix.
N=12
k=6
x=c(0,0,0,0,1,1,1,1,2,2,2,2)
A=matrix(0,N,k)
A[,1]=x
for( j in 2:ncol(A) )
{
A[,j]=c(A[11:12,j-1],A[1:10,j-1])
}
> A
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0 2 2 1 1 0
[2,] 0 2 2 1 1 0
[3,] 0 0 2 2 1 1
[4,] 0 0 2 2 1 1
[5,] 1 0 0 2 2 1
[6,] 1 0 0 2 2 1
[7,] 1 1 0 0 2 2
[8,] 1 1 0 0 2 2
[9,] 2 1 1 0 0 2
[10,] 2 1 1 0 0 2
[11,] 2 2 1 1 0 0
[12,] 2 2 1 1 0 0
Is there any other way to construct the matrix A? For example, by using a function.
You could use the following:
circular_matrix <- function(x, ncol) {
coll <- list(x)
for (i in seq_len(ncol - 1)) {
current <- coll[[length(coll)]]
coll[[length(coll) + 1]] <- c(tail(current, 2), current[1:(length(current) - 2)])
}
do.call(cbind, coll)
}
circular_matrix(1:10, 5)
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 9 7 5 3
# [2,] 2 10 8 6 4
# [3,] 3 1 9 7 5
# [4,] 4 2 10 8 6
# [5,] 5 3 1 9 7
# [6,] 6 4 2 10 8
# [7,] 7 5 3 1 9
# [8,] 8 6 4 2 10
# [9,] 9 7 5 3 1
#[10,] 10 8 6 4 2
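A loop-free alternative is to compute directly which element of the generator lands in each cell. The sketch below assumes the same convention as the question (each column is the previous one shifted down by two, with wrap-around); two_circulant and its shift argument are hypothetical names, not part of the answers above.

```r
# Hypothetical helper: cell (i, j) holds x[((i - 1) - (j - 1) * shift) mod n + 1]
two_circulant <- function(x, k, shift = 2) {
  n <- length(x)
  # Zero-based row index minus the accumulated shift of each column
  idx <- outer(seq_len(n) - 1, (seq_len(k) - 1) * shift, "-") %% n + 1
  matrix(x[as.vector(idx)], nrow = n, ncol = k)
}

x <- c(0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2)
A <- two_circulant(x, 6)
A[1, ]
# [1] 0 2 2 1 1 0
```

The shift argument is there only to show that the same index arithmetic covers other step sizes; shift = 1 gives an ordinary circulant matrix.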
I just was wondering if there was an easy way to compute the maximal number of identical elements between any two columns of a matrix in R.
For example, I have a matrix
test <- replicate(10, sample((0:3), 10, replace = TRUE))
test
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 3 0 1 0 2 2 1 0 2 0
[2,] 1 1 3 2 0 2 3 0 2 2
[3,] 2 3 0 0 1 2 0 3 0 2
[4,] 2 2 1 1 2 0 0 1 1 0
[5,] 2 0 1 2 0 1 1 1 0 0
[6,] 1 0 1 3 2 3 3 1 3 2
[7,] 0 1 3 2 1 0 1 2 1 1
[8,] 0 3 1 3 0 2 3 1 1 1
[9,] 2 3 1 3 0 1 0 1 3 2
[10,] 3 2 1 0 2 1 3 2 3 1
To compare column 1 and 2 I use
table(test[,1] == test[,2])
FALSE TRUE
8 2
So there are two identical elements between these two columns.
I could now repeat this for all pairs of columns using two nested for loops and then find the maximum number of TRUE calls but this does not look nice. Can anyone think of a better way?
Cheers,
Maik
It is always interesting to see a reasonable answer being voted down. Though I don't like this minus score, I would keep my answer. Voter, what do you think?
Let's first get some reproducible toy data:
set.seed(0); x <- replicate(10, sample((0:3), 10, replace = TRUE))
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] 3 0 3 1 1 2 1 3 3 0
# [2,] 1 0 3 1 3 1 3 1 1 0
# [3,] 1 0 0 2 2 3 1 3 2 0
# [4,] 2 2 2 1 3 1 1 1 1 2
# [5,] 3 1 0 0 2 0 1 1 1 3
# [6,] 0 3 1 3 2 0 2 1 3 3
# [7,] 3 1 1 2 3 0 1 3 0 3
# [8,] 3 2 0 3 0 1 1 3 2 1
# [9,] 2 3 1 0 1 2 3 1 0 1
#[10,] 2 1 3 2 2 2 0 3 0 3
For any input matrix x, you can use:
y <- unlist(lapply(seq_len(ncol(x)-1L),
function(i) colSums(x[, (i+1):ncol(x), drop = FALSE] == x[, i])))
# [1] 1 2 3 2 4 1 4 2 3 3 1 0 0 3 1 3 5 1 3 1 2 4 1 4 3 4 2 3 5 1 1 3 2 1 2 2 3 3
#[39] 1 2 3 1 4 3 1
max(y)
# [1] 5
The comment by @David does essentially the same thing, but much more slowly:
y <- combn(ncol(x), 2, FUN = function(u) sum(x[, u[1]] == x[, u[2]]))
# [1] 1 2 3 2 4 1 4 2 3 3 1 0 0 3 1 3 5 1 3 1 2 4 1 4 3 4 2 3 5 1 1 3 2 1 2 2 3 3
#[39] 1 2 3 1 4 3 1
max(y)
# [1] 5
Benchmarking
We generate a 10 x 1000 matrix for the experiment:
set.seed(0); x <- replicate(1e+3, sample((0:3), 10, replace = TRUE))
system.time(unlist(lapply(seq_len(ncol(x)-1L), function(i) colSums(x[, (i+1):ncol(x), drop = FALSE] == x[, i]))))
# user system elapsed
# 0.176 0.032 0.207
system.time(combn(ncol(x), 2, FUN = function(u) sum(x[, u[1]] == x[, u[2]])))
# user system elapsed
# 4.692 0.008 4.708
Something like a distance matrix?
With this idea, you could also generate a "distance" matrix holding the number of non-equal elements between all pairs of columns (just replace the == with !=):
y <- unlist(lapply(seq_len(ncol(x)-1L),
function(i) colSums(x[, (i+1):ncol(x), drop = FALSE] != x[, i])))
z <- matrix(0L, ncol(x), ncol(x))
z[lower.tri(z)] <- y
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] 0 0 0 0 0 0 0 0 0 0
# [2,] 9 0 0 0 0 0 0 0 0 0
# [3,] 8 7 0 0 0 0 0 0 0 0
# [4,] 7 9 9 0 0 0 0 0 0 0
# [5,] 8 10 7 7 0 0 0 0 0 0
# [6,] 6 10 9 6 9 0 0 0 0 0
# [7,] 9 7 8 8 7 8 0 0 0 0
# [8,] 6 9 6 7 8 7 8 0 0 0
# [9,] 8 7 9 5 9 7 7 6 0 0
#[10,] 7 5 6 9 8 9 9 7 9 0
Note that only the lower triangle is computed, because the matrix is symmetric. The diagonal entries are all zero (of course).
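For comparison, the full symmetric matrix of match counts can also be built with matrix products. This is a different technique from the lapply/colSums approach above: it sums crossprod() of one 0/1 indicator matrix per distinct value. It is a sketch that assumes x contains no NA values; match_counts is a hypothetical name.

```r
set.seed(0)
x <- replicate(10, sample(0:3, 10, replace = TRUE))

# Hypothetical helper: entry (i, j) is the number of rows in which
# columns i and j of x hold the same value (assumes no NA in x)
match_counts <- function(x) {
  vals <- unique(as.vector(x))
  Reduce(`+`, lapply(vals, function(v) crossprod((x == v) + 0)))
}

m <- match_counts(x)
# Largest number of matches between two different columns
max(m[upper.tri(m)])
```

Whether this beats the lapply version depends on the number of distinct values and on the matrix shape, so it is worth timing on the real data.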
Try:
max(combn(split(test, col(test)), 2, function(x) sum(x[[1]] == x[[2]])))
If you want to know which pair has the greatest number of equal elements it's a little more complicated.
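One way to recover the pair itself is to keep the column indices from combn() next to the counts. The sketch below uses illustrative names (pair_idx, counts, best) and random toy data, so no particular output is implied.

```r
set.seed(0)
test <- replicate(10, sample(0:3, 10, replace = TRUE))

# Each column of pair_idx is one pair of column indices (i, j) with i < j
pair_idx <- combn(ncol(test), 2)

# Number of identical elements for every pair
counts <- apply(pair_idx, 2, function(p) sum(test[, p[1]] == test[, p[2]]))

# All pairs that reach the maximum (there may be ties)
best <- pair_idx[, counts == max(counts), drop = FALSE]
best
```

Each column of best is one winning pair; drop = FALSE keeps it a matrix even when only one pair reaches the maximum.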
I have a variable that is a list.
[[1]]
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 1 1 1 1 1 1 1 4
[2,] 1 1 1 1 1 1 1 4
[3,] 1 1 1 1 1 1 1 4
[4,] 1 1 1 1 1 1 1 4
[5,] 1 1 1 1 1 1 1 4
[6,] 1 1 1 1 1 1 1 4
[7,] 1 1 1 1 1 1 1 4
[8,] 1 1 1 1 1 1 1 4
[9,] 1 1 1 1 1 1 1 4
[10,] 1 1 1 1 1 1 1 4
[11,] 1 1 1 1 1 1 1 4
[[2]]
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 1 1 1 1 1 1 1 3
[2,] 1 1 1 1 1 1 1 4
[3,] 1 1 1 1 1 1 1 4
[4,] 1 1 1 1 1 1 1 3
[5,] 1 1 1 1 1 1 1 4
[6,] 1 1 1 1 1 1 1 4
[7,] 1 1 1 1 1 1 1 4
[8,] 1 1 1 1 1 1 1 4
[9,] 1 1 1 1 1 1 1 4
[10,] 1 1 1 1 1 1 1 4
[11,] 1 1 1 1 1 1 1 4
I need an empty variable with the same format as this one.
My function will do some computation, and its results will be stored in the same positions as in this variable. However, the new variable will not have the 8th column.
Replicate your data
df = list(matrix(rep(1, 88), ncol = 8), matrix(rep(1, 88), ncol = 8))
Remove the 8th column from every sublist
new_df = lapply(df, function(x) x[,-8])
Remove the 8th column from every sublist and replace all the 1's with NA
new_df = lapply(df, function(x) replace(x[,-8], x[,-8] == 1, NA))
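If the goal is simply an empty container of the right shape, the NA matrices can also be created directly from the dimensions, without depending on which values the originals hold. A minimal sketch, assuming every list element is a matrix and that "empty" means filled with NA; empty_df is a hypothetical name.

```r
# Stand-in for the list from the question: two 11 x 8 matrices
df <- list(matrix(rep(1, 88), ncol = 8), matrix(rep(1, 88), ncol = 8))

# Same number of rows, one column fewer, every cell NA
empty_df <- lapply(df, function(m) matrix(NA_real_, nrow(m), ncol(m) - 1))

dim(empty_df[[1]])
# [1] 11  7
```

Results can then be written into place with ordinary indexing, e.g. empty_df[[1]][2, 3] <- 0.5.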
I work with a large data set (1200*10000). Some columns have the same value everywhere except at one or two points. I need to detect and delete these columns. For example, column 1846:
> x[317:400,1846]
[1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[81] 2 2 **1** 2
All other rows (1:317 and 400:1200) have the value 2.
How can I solve this?
For example, here is part of my file (1200*10000):
x
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,] 1 1 0 1 2 0 1 0 1 2 2 1
[2,] 1 1 0 1 2 0 1 0 1 2 1 1
[3,] 2 1 0 1 2 0 1 0 1 2 2 1
[4,] 1 2 0 1 2 0 1 0 1 2 2 2
[5,] 0 1 0 1 2 0 1 0 1 2 1 1
[6,] 2 0 0 1 2 0 1 2 0 2 1 2
[7,] 1 1 0 1 2 1 1 0 1 2 0 2
[8,] 0 1 0 1 2 0 1 0 1 2 0 0
[9,] 0 1 0 1 2 0 1 0 1 1 2 1
[10,] 1 1 0 1 2 0 1 0 1 2 1 1
I want to remove columns like 3 to 10 from my original data set.
Continuing from my answer to your first post:
detect.col <- function(
x,
n.diff=3 # the minimal number of unique values required per column
)
{
ret <- which(apply(x,2,function(e){length(unique(e))}) >= n.diff)
ret
}
x[,detect.col(x)]
I guess this is what you actually mean?
mm<-read.table(text=" [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,] 1 1 0 1 2 0 1 0 1 2 2 1
[2,] 1 1 0 1 2 0 1 0 1 2 1 1
[3,] 2 1 0 1 2 0 1 0 1 2 2 1
[4,] 1 2 0 1 2 0 1 0 1 2 2 2
[5,] 0 1 0 1 2 0 1 0 1 2 1 1
[6,] 2 0 0 1 2 0 1 2 0 2 1 2
[7,] 1 1 0 1 2 1 1 0 1 2 0 2
[8,] 0 1 0 1 2 0 1 0 1 2 0 0
[9,] 0 1 0 1 2 0 1 0 1 1 2 1
[10,] 1 1 0 1 2 0 1 0 1 2 1 1", row.names=1, header=T)
now,
mm[,which(apply(mm,2,function (x) {length(unique(x))})==3)]
output
X..1. X..2. X..11. X..12.
[1,] 1 1 2 1
[2,] 1 1 1 1
[3,] 2 1 2 1
[4,] 1 2 2 2
[5,] 0 1 1 1
[6,] 2 0 1 2
[7,] 1 1 0 2
[8,] 0 1 0 0
[9,] 0 1 2 1
[10,] 1 1 1 1
I am not certain, but I think you want to delete any columns that contain a single value in n-1 or n-2 rows where n is the number of rows. If so, then you would want to delete:
column x2 in my.data because it contains 9 '1's and one '0' and
column x5 in my.data because it contains 8 '2's and two '1's.
The code below does that. Sorry if this is not what you are trying to do. I am not sure whether this code would perform well with a huge data frame.
my.data <- read.table(text='
x1 x2 x3 x4 x5 x6
1 1 2 2 2 1
1 1 2 1 1 2
1 1 2 2 2 3
1 1 2 2 2 4
1 1 2 1 2 5
1 1 2 2 2 6
1 0 2 2 2 7
1 1 2 1 2 8
1 1 2 2 1 9
1 1 2 2 2 10
', header = TRUE)
my.data
my.summary <- as.data.frame.matrix(table( rep(colnames(my.data),
each=nrow(my.data)), unlist(my.data)))
my.summary
delete.these <- which(my.summary == (nrow(my.data)-2) |
my.summary == (nrow(my.data)-1), arr.ind = TRUE)[,1]
my.data[,-delete.these]
x1 x3 x4 x6
1 1 2 2 1
2 1 2 1 2
3 1 2 2 3
4 1 2 2 4
5 1 2 1 5
6 1 2 2 6
7 1 2 2 7
8 1 2 1 8
9 1 2 2 9
10 1 2 2 10
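Another way to express "the same value except in one or two points" is to count, for each column, how many cells differ from that column's most frequent value. The sketch below is an assumption about what the question intends, not the code above: drop_near_constant and max_exceptions are hypothetical names, and fully constant columns are dropped as well.

```r
# Hypothetical helper: drop columns where at most max_exceptions cells
# differ from the column's most frequent value
drop_near_constant <- function(x, max_exceptions = 1) {
  n_exceptions <- apply(x, 2, function(col) length(col) - max(table(col)))
  x[, n_exceptions > max_exceptions, drop = FALSE]
}

# Toy matrix: 0, 1, 9 and 2 exceptions per column, respectively
m <- cbind(const   = rep(2, 10),
           one_off = c(rep(2, 9), 1),
           varied  = 1:10,
           two_off = c(rep(0, 8), 1, 1))

colnames(drop_near_constant(m, max_exceptions = 1))
# [1] "varied"  "two_off"
colnames(drop_near_constant(m, max_exceptions = 2))
# [1] "varied"
```

On a 1200 x 10000 matrix, apply() with table() may be slow; the threshold is the part worth tuning first, since it decides which columns count as nearly constant.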
This will keep only columns with one distinct value, assuming your data.frame is named x:
keepIndex <- apply(
x,
2,
FUN = function(column) {
return(length(unique(column)) == 1)
})
x <- x[, keepIndex]
This should work:
m<-matrix(2,nrow=100, ncol=100) #making dummy matrix m
m[sample(1:100,10), sample(1:100,10)]<-1 #replacing some random rows and cols with 1
m[,-which(colSums(m==1)>0)] #getting rid of cols with 1
A solution based on Boolean indexing.
> x<-cbind(c(1,1,1,1),c(1,1,1,2),c(1,1,1,1))
> x
[,1] [,2] [,3]
[1,] 1 1 1
[2,] 1 1 1
[3,] 1 1 1
[4,] 1 2 1
> x[,colSums(sweep(x,2,x[1,],"!="))==0]
[,1] [,2]
[1,] 1 1
[2,] 1 1
[3,] 1 1
[4,] 1 1
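When the columns hold different constants, the comparison has to pair each column with its own first element. A minimal sketch using sweep(), which applies the first row across the columns explicitly instead of relying on the order in which R recycles a vector over a matrix; is_const is an illustrative name.

```r
# Columns 1 and 3 are constant (but with different values), column 2 is not
x <- cbind(c(1, 1, 1, 1), c(2, 2, 2, 3), c(3, 3, 3, 3))

# TRUE wherever a cell differs from the first row of its own column
differs <- sweep(x, 2, x[1, ], "!=")

# Keep only the columns with no differing cell
is_const <- colSums(differs) == 0
x[, is_const, drop = FALSE]
```

Here is_const comes out as TRUE, FALSE, TRUE, so the first and third columns are kept.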
If your data is stored in a data frame named df:
df[ ,sapply(df, function(x) all(x[1] == x[-1]))]
Either search the whole data or a subset of it:
detect.col <- function(
x,row.from=1,row.to=nrow(x),col.from=1,col.to=ncol(x),
n.diff=3 # the minimal number of unique values required per column
)
{
tmp.x <- x[row.from:row.to,col.from:col.to]
ret <- which(apply(tmp.x,2,function(e){length(unique(e))}) < n.diff )
if(length(ret)){
ret <- ret+col.from-1
}
ret
}
## search the whole
detect.col(x) # columns to remove
## Or only search within a range, like in your case
row.from <- 317
row.to <- 400
col.from <- 1000
col.to <- 2000
col.to.remove <- detect.col(x,row.from,row.to,col.from,col.to)
x[,-col.to.remove] # print those to keep
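One caveat with negative indexing: if no column qualifies for removal, the index vector is empty and x[, -col.to.remove] selects zero columns instead of all of them. A small sketch of a guard, using a toy matrix and an empty col.to.remove as assumptions:

```r
x <- matrix(1:12, nrow = 4)
col.to.remove <- integer(0)   # pretend nothing was detected

# Negative indexing with an empty vector drops every column
ncol(x[, -col.to.remove, drop = FALSE])
# [1] 0

# Selecting the complement explicitly keeps all columns in that case
keep <- setdiff(seq_len(ncol(x)), col.to.remove)
ncol(x[, keep, drop = FALSE])
# [1] 3
```

The same guard applies to any of the which()-based removals in this thread.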