Creating a matrix from operations done on another dataset - r

I have a large dataset, X, with 58140 columns, each filled with either 1 or 0.
I would like to create a 58139 x 58139 matrix from the information in the last 58139 columns of the dataset (i.e. every column except the first).
For each element Aij of the matrix, I would like the number of rows in which both column i+1 and column j+1 of X contain the value 1.
I figured I can do this with sum(X[[2]]+X[[3]] == 2) for the A12 element of the matrix.
What I still need is a way to build the whole matrix.

You can use mapply to compute the count for every pair of columns. It returns a numeric vector, which you can wrap in a call to matrix; then drop the first row and column, since they belong to the first column of X.
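# sample data: 10 rows x 20 columns of 0/1
set.seed(123)
X <- data.frame(matrix(rbinom(200, 1, .5), nrow = 10))
n <- ncol(X)
# for every pair of columns (i, j), count the rows where both are 1;
# i varies fastest, so the vector fills the matrix column by column
A <- matrix(mapply(function(i, j) sum(rowSums(X[, c(i, j)]) == 2),
                   i = rep(1:n, n),
                   j = rep(1:n, each = n)),
            ncol = n)[-1, -1]  # drop the row/column for X's first column
A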
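For 0/1 data there is a much faster route: the (i, j) entry of t(X) %*% X is the sum over rows of X[k, i] * X[k, j], which is exactly the number of rows where both columns are 1, and base R's crossprod() computes that product directly. The sketch below assumes every column of X is numeric 0/1 and cross-checks the result against the mapply version on the same sample data. At the size in the question, the mapply route means about 3.4 billion function calls, and the full 58139 x 58139 result needs roughly 27 GB as a dense double matrix, so memory may be the real limit either way.

```r
set.seed(123)
X <- data.frame(matrix(rbinom(200, 1, .5), nrow = 10))

# crossprod(M) is t(M) %*% M; for 0/1 columns its (i, j) entry counts the
# rows where column i and column j are both 1
M <- as.matrix(X[, -1])   # drop X's first column, as in the question
A_fast <- crossprod(M)    # 19 x 19 here; 58139 x 58139 in the question

# Cross-check against the mapply answer on the same sample data
n <- ncol(X)
A_slow <- matrix(mapply(function(i, j) sum(rowSums(X[, c(i, j)]) == 2),
                        i = rep(1:n, n),
                        j = rep(1:n, each = n)),
                 ncol = n)[-1, -1]
all(A_slow == A_fast)     # TRUE
```

The diagonal of crossprod(M) is simply the number of 1s in each column, which matches what the mapply version computes for i == j.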

Related

Applying a specific formula to a large matrix/data frame in R, in order to normalize my data

I have a large "distance matrix" (actually a 170x170 data frame in R), for example:
A B C
A 0.198395022 0.314012433 0.32704998
B 0.314012433 0.262514533 0.318539233
C 0.32704998 0.318539233 0.211224133
I am trying to apply a specific formula (which I already have) to rescale these values to the 0-1 range, as required for my statistical modeling. I expect to obtain something like this across the whole data frame (expected output after applying the formula):
A B C
A 1 0.846050953 0.825897603
B 0.846050953 1 0.822548469
C 0.825897603 0.822548469 1
So, I need to recalculate each off-diagonal cell relative to the corresponding values by applying this formula in R:
[formula image not preserved]
where B is the matrix of normalized values, H is my matrix/data frame, and i and j index the rows and columns of my matrix/data frame, respectively. The normalization should always set the diagonal (i = j) to 1.
Thanks!
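You can loop over the cells and replace each value according to your formula. The loop below uses B[i,j] = H[i,j] / ((H[i,i] + H[j,j]) / 2). Keep an untouched copy of the values: if you update df in place, its diagonal becomes 1 partway through the loop and every later cell is computed from the modified values.
df <- data.frame(rnorm(3, 20, 15), rnorm(3, 10, 5), rnorm(3, 200, 100))
df  # check the input
H <- as.matrix(df)  # original values, never modified inside the loop
for (i in 1:nrow(H)) {
  for (j in 1:ncol(H)) {
    df[i, j] <- H[i, j] / ((H[i, i] + H[j, j]) / 2)
  }
}
df  # done: the diagonal is now 1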
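A vectorised sketch of the same calculation, assuming (as the loop does) that the formula is B[i, j] = H[i, j] / ((H[i, i] + H[j, j]) / 2); if your formula differs, only the line computing B needs to change. outer() builds the whole matrix of diagonal sums in one step, so nothing is overwritten mid-calculation. The input is the 3 x 3 example from the question. Note that with this formula the off-diagonal values of that example come out above 1, so it may not be the exact formula the question refers to.

```r
H <- matrix(c(0.198395022, 0.314012433, 0.32704998,
              0.314012433, 0.262514533, 0.318539233,
              0.32704998,  0.318539233, 0.211224133),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

d <- diag(H)
# outer(d, d, "+")[i, j] is H[i, i] + H[j, j]
B <- H / (outer(d, d, "+") / 2)
B
```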

How to Delete Every Row and Column That Contains a Negative Value

I have a data frame called lexico with dimensions 11293 x 512.
I'd like to remove every row and every column that contains at least one negative value.
How could I do this?
Below is the code I tried, but it takes too long to run because of its nested loop structure.
(My plan was to first collect the number of every column that holds a negative value.)
colneg <- c()
for (i in 1:11293) {
  for (j in 1:512) {
    # as.numeric(as.character(...)) converts factor/character cells to numbers
    if (as.numeric(as.character(lexico[i, j])) < 0)
      colneg <- c(colneg, j)
  }
}
I'd appreciate any advice, however harsh; I'm a novice.
A possible solution:
# logical index: TRUE for columns without negative values
col_index <- !colSums(d < 0)
# logical index: TRUE for rows without negative values
row_index <- !rowSums(d < 0)
# subset the data frame with the two indexes
d2 <- d[row_index, col_index]
What this does:
colSums(d < 0) gives a numeric vector with the number of negative values in each column.
Negating it with ! turns it into a logical vector: columns with no negative values (a count of 0) get TRUE, all others get FALSE.
It works the same way for rows.
Subsetting the data frame with row_index and col_index returns a data frame from which every row and every column that contained a negative value has been removed.
Reproducible example data:
set.seed(171228)
d <- data.frame(matrix(rnorm(1e4, mean = 3), ncol = 20))
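A small hand-made example makes it easy to see which rows and columns are dropped. The as.numeric(as.character(...)) call in the question suggests that some columns of lexico may be stored as factors or character; d < 0 does not do a numeric comparison for those (factors give NA with a warning, character columns are compared as strings), so this sketch converts such columns first. That conversion is an assumption about the data; skip it if the columns are already numeric. drop = FALSE keeps a data frame even if only one column survives.

```r
# Negative values at (row 2, column B) and (row 4, column C)
d <- data.frame(A = c(1, 2, 3, 4),
                B = c(5, -1, 7, 8),
                C = c(9, 10, 11, -2),
                D = c(0, 1, 2, 3))

# Convert factor/character columns to numeric; numeric columns are left alone
d[] <- lapply(d, function(x)
  if (is.factor(x) || is.character(x)) as.numeric(as.character(x)) else x)

keep_col <- !colSums(d < 0)   # TRUE for columns without negatives
keep_row <- !rowSums(d < 0)   # TRUE for rows without negatives
d2 <- d[keep_row, keep_col, drop = FALSE]
d2                            # rows 1 and 3, columns A and D
```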

How to sum a matrix cell value into another by row/column names?

Imagine I have an overall list of authors
Authors <- c("Abel","Babel","Cain","Devil","Esau")
with it I build an overall adjacency matrix, initialized with zeroes
allAuthors <- matrix(0L,nrow=length(Authors),ncol=length(Authors),dimnames=list(Authors,Authors))
Now I come across a paper co-authored by these three:
paperAuthors <- c("Babel","Cain","Devil")
and build another adjacency matrix of their collaboration, initialized with all 1s
coAuth <- matrix(1L,nrow=length(paperAuthors),ncol=length(paperAuthors),dimnames=list(paperAuthors,paperAuthors))
Question:
How do I sum the coAuth matrix cell values into the corresponding allAuthors
matrix cells, using the row and column names as indices?
In other words, I'd like the allAuthors matrix to have 1s at the intersections of the paperAuthors, while all other cells remain 0.
Thank you very much
First, get the (row, column) indexes of the 1s in the coAuth matrix.
ind <- which(coAuth == 1, arr.ind = TRUE)
Now we have to find the corresponding indexes in the allAuthors matrix.
ind.allAuthors <- cbind(
match(rownames(coAuth), rownames(allAuthors))[ind[, 'row']],
match(colnames(coAuth), colnames(allAuthors))[ind[, 'col']])
And now we can add the coAuth values into the matching allAuthors cells:
allAuthors[ind.allAuthors] <- allAuthors[ind.allAuthors] + coAuth[ind]
Would subsetting work for your needs, or do you need to define a new matrix summation operation? Indexing by name lines coAuth up with the matching rows and columns of allAuthors:
allAuthors[paperAuthors, paperAuthors] <-
  allAuthors[paperAuthors, paperAuthors] + coAuth
allAuthors
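When papers arrive one at a time, the name-based subsetting can simply run in a loop, adding each paper's coAuth matrix into allAuthors so that repeated collaborations accumulate. A minimal sketch; the papers list is hypothetical example data:

```r
Authors <- c("Abel", "Babel", "Cain", "Devil", "Esau")
allAuthors <- matrix(0L, nrow = length(Authors), ncol = length(Authors),
                     dimnames = list(Authors, Authors))

# Hypothetical papers, each a character vector of author names
papers <- list(c("Babel", "Cain", "Devil"),
               c("Abel", "Cain"))

for (paperAuthors in papers) {
  coAuth <- matrix(1L, nrow = length(paperAuthors), ncol = length(paperAuthors),
                   dimnames = list(paperAuthors, paperAuthors))
  # indexing by name aligns coAuth with the matching rows/columns
  allAuthors[paperAuthors, paperAuthors] <-
    allAuthors[paperAuthors, paperAuthors] + coAuth
}
allAuthors  # Cain appears on both papers, so allAuthors["Cain", "Cain"] is 2
```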

Forcing rbind with uneven columns in R

I am trying to force some list objects (e.g. 4 tables of frequency counts) into a matrix with rbind. However, they have uneven columns (e.g. some range from 2 to 5, while others range from 1 to 5). What I want is that if a table has no column for a value (say 1), that row shows NA in the corresponding column of the rbind result. I tried the approach below, but the values are recycled along the row instead of showing NA where a value does not exist.
I considered rbind.fill, but it requires data frames rather than tables. I could write some loops, but in the spirit of R, I wonder if there is another approach I could use?
# Example
a <- sample(0:5,100, replace=TRUE)
b <- sample(2:5,100, replace=TRUE)
c <- sample(1:4,100, replace=TRUE)
d <- sample(1:3,100, replace=TRUE)
list <- list(a,b,c,d)
table(list[4])
count(list[1])
matrix <- matrix(ncol=5)
lapply(list,(table))
do.call("rbind",(lapply(list,table)))
When I have a similar problem, I append one copy of every value I want to appear to the vector, tabulate it, and then subtract one from each count:
table(c(1:5, a)) - 1
This could be made into a function
table2 <- function(x, values, ...){
table(c(x, values), ...) - 1
}
Of course, this gives zeros rather than NA for values that do not occur.
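A sketch of how table2 plugs into the original do.call("rbind", ...) call. One detail from the question's data: a is sampled from 0:5, so values has to be 0:5 rather than 1:5, otherwise a's table has one more column than the others. The same matrix can also be built by giving each vector fixed factor levels, a common base-R alternative to the +1/-1 trick, and replacing the zeros with NA afterwards gives the layout the question asked for.

```r
table2 <- function(x, values, ...) {
  # add one copy of each wanted value, tabulate, then subtract that copy again
  table(c(x, values), ...) - 1
}

set.seed(42)
vals <- list(a = sample(0:5, 100, replace = TRUE),
             b = sample(2:5, 100, replace = TRUE),
             c = sample(1:4, 100, replace = TRUE),
             d = sample(1:3, 100, replace = TRUE))

# values = 0:5 because a can contain 0, so every table gets the same 6 columns
m1 <- do.call(rbind, lapply(vals, table2, values = 0:5))

# Alternative: fixed factor levels give a zero count for every absent value
counts <- do.call(rbind, lapply(vals, function(x) table(factor(x, levels = 0:5))))

# Either way, absent values can then be shown as NA instead of 0
m2 <- counts
m2[m2 == 0] <- NA
m2
```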

Removing the 3 biggest values of each column in a matrix in R

I have a matrix like
mat <- matrix(sample(100,100,replace=TRUE),nr=10)
I would now like to remove the 3 biggest values of each column so I would then have a new matrix with 7 rows.
I tried to make vectors of each column and then remove the 3 biggest values there with
x1 = x[x!=max(x)]
x2 = x1[x1!=max(x1)]
x3 = x2[x2!=max(x2)]
and then put the vectors into a new matrix, but as my matrices sometimes have a lot of columns, I'd like to find an easier way.
Thanks for your help
We could loop through the columns using apply with MARGIN=2, sort each column, and remove the three highest values with head (a negative n drops that many elements from the end):
apply(mat, 2, FUN=function(x) head(sort(x),-3))
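Or, if we want to keep the original order, use rank to get the rank of each value, build a logical index by checking whether the rank is in 1:3, negate it (!) and subset each column. ties.method='first' gives tied values distinct ranks, so exactly three values are removed.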
apply(mat, 2, FUN=function(x) x[!rank(-x, ties.method='first') %in% 1:3])
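A small check of both approaches on a matrix with a tie. It also shows why the repeated x[x != max(x)] idea from the question can go wrong: each step drops every copy of the current maximum, so tied values make it remove more than three entries.

```r
# Small matrix with a tie (two 9s) in the first column
mat <- matrix(c(5, 9, 1, 9, 3, 7,
                2, 8, 6, 4, 10, 0), ncol = 2)

# Approach 1: sort each column, drop the last three (largest) values
sorted_keep <- apply(mat, 2, FUN = function(x) head(sort(x), -3))

# Approach 2: keep the original order; ties.method = "first" gives
# distinct ranks, so exactly three values are dropped per column
ordered_keep <- apply(mat, 2, FUN = function(x)
  x[!rank(-x, ties.method = "first") %in% 1:3])

# The question's approach removes both 9s in the first step
x <- mat[, 1]
x1 <- x[x != max(x)]
x2 <- x1[x1 != max(x1)]
x3 <- x2[x2 != max(x2)]

sorted_keep   # columns: 1 3 5 and 0 2 4
ordered_keep  # columns: 5 1 3 and 2 4 0
length(x3)    # 2, not 3
```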
