I have a vector with two columns, one column containing numerical values and one column containing names. I'm a novice to R, but essentially I want to take the vector and create a matrix from it in which the values are added together. For example, where A has a value of 1 and B has a value of 1, I want the cell at the intersection of A and B in the matrix to be their sum, 2.
I've tried to use a for loop but I'm having trouble with the arguments to put within the loop. Any help would be greatly appreciated and I'd be glad to clarify stuff if it doesn't make sense.
Essentially what I want is to take this:
A 1
B 0
C 0
D 1
And turn it into this:
  A B C D
A   1 1 2
B 1   0 1
C 1 0   1
D 2 1 1
Thanks!
R > x <- c(1,0,0,1)
R > outer(x, x, "+")
     [,1] [,2] [,3] [,4]
[1,]    2    1    1    2
[2,]    1    0    0    1
[3,]    1    0    0    1
[4,]    2    1    1    2
The next thing is to ignore the diagonal. Updated by Vincent:
names(x) <- c("A","B","C","D")
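The two remaining steps (naming the vector and ignoring the diagonal) can be combined in a short sketch. Setting the diagonal to NA is an assumption about what "ignore" means here, and the name m is only illustrative.

```r
# Named vector, as in the question
x <- c(A = 1, B = 0, C = 0, D = 1)

# outer() carries the names of x over to the row and column names
m <- outer(x, x, "+")

# Assumption: "ignoring" the diagonal means blanking it out with NA
diag(m) <- NA
m
```

The off-diagonal cells then match the matrix asked for in the question, for example m["A", "D"] is 2 and m["B", "C"] is 0.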
I was wondering if there is a way to avoid using nested ifs inside a loop to this problem:
I have a 100 x 14 matrix with values. Column 1 is the group to which each row belongs (there are a total of 11 groups). Let's call it MatrixA.
Matrix B, has the average value for each column for each of the 11 different groups (size 11x14, where the first column has the group number).
The columns are in the same order in both matrices.
I want to change the values of MatrixA to 1 if the value on the cell is greater than the average value from Matrix B, otherwise, put 0. If cell[1,2] in MatrixA belongs to group 5, then check the mean of group 5 column 2 in MatrixB and put 1 or 0, and so on.
MatrixA = cbind(sample(1:11,100,replace = T),matrix(data = rnorm(100*13),nrow = 100,ncol = 13 ))
Let's assume these are the averages of each column from MatrixA
MatrixB = cbind(1:11,matrix(data = rnorm(11*13),nrow = 11,ncol = 13 ))
I hope I phrased my question correctly. Thank you.
Try
(MatrixA>MatrixB[match(MatrixA[,1],MatrixB[,1]),])*1L
[,1] [,2] [,3] [,4] [,5]
[1,] 0 0 1 1 1
[2,] 0 1 1 0 1
[3,] 0 1 1 1 1
[4,] 0 1 1 0 1
[5,] 0 1 1 1 0
...
Note: the first column (the groups) will be all 0s because it is compared with itself; you can copy the group numbers back in afterwards.
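As a sketch of how the group column could be kept, the comparison can be restricted to the value columns. The seed and the names rows and result are assumptions added for illustration; the data is generated as in the question.

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible

MatrixA <- cbind(sample(1:11, 100, replace = TRUE),
                 matrix(rnorm(100 * 13), nrow = 100, ncol = 13))
MatrixB <- cbind(1:11, matrix(rnorm(11 * 13), nrow = 11, ncol = 13))

# For each row of MatrixA, find the row of MatrixB holding its group's averages
rows <- match(MatrixA[, 1], MatrixB[, 1])

# Compare only the value columns, leaving the group column untouched
result <- MatrixA
result[, -1] <- (MatrixA[, -1] > MatrixB[rows, -1]) * 1L
```

Here result keeps the group numbers in column 1 and holds 0/1 indicators in the remaining 13 columns.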
I need to create an adjacency matrix from a dataframe using tcrossprod, but the resulting matrix needs to obey a restriction that I will explain below. Consider the following dataframe:
z <- data.frame(Person = c("a","b","c","d"), Man_United = c(1,0,1,0))
z
Person Man_United
1 a 1
2 b 0
3 c 1
4 d 0
I make an adjacency matrix from z using tcrossprod.
x <- tcrossprod(table(z))
diag(x) <- 0
x
Person
Person a b c d
a 0 0 1 0
b 0 0 0 1
c 1 0 0 0
d 0 1 0 0
I need the resulting adjacency matrix to indicate a tie (here signaled with the number 1), only when both persons have value 1 in the original dataframe (i.e. are fans of Manchester United, in this example). For example, persons "a" and "c" of dataframe z are fans, so in the resulting adjacency matrix I want their intersecting cell to be valued 1. That works fine here. However, persons "b" and "d" are not fans, and the fact that both have value 0 in the original dataframe does not mean that they are connected in any meaningful way. tcrossprod, however, produces a matrix that suggests that they are in fact connected.
How can I use tcrossprod so that it captures only the positive values of the dataframe when producing the adjacency matrix?
We can restrict attention to the "1" column of the table with
tcrossprod(table(z)[, "1"])
#      [,1] [,2] [,3] [,4]
# [1,]    1    0    1    0
# [2,]    0    0    0    0
# [3,]    1    0    1    0
# [4,]    0    0    0    0
or, if you want to preserve the names,
tcrossprod(table(z)[, "1", drop = FALSE])
# Person
# Person a b c d
# a 1 0 1 0
# b 0 0 0 0
# c 1 0 1 0
# d 0 0 0 0
If there can be other nonzero values, you can replace "1" with -1 to drop the first column of the table (the one for zeros) and keep all the others.
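A minimal sketch of that variant, assuming the "0" column comes first in table(z), as it does for the question's data. The names fans and adj are illustrative, and the diagonal is zeroed as in the question.

```r
z <- data.frame(Person = c("a", "b", "c", "d"), Man_United = c(1, 0, 1, 0))

# Drop the first column (the zeros); drop = FALSE keeps the person names
fans <- table(z)[, -1, drop = FALSE]

adj <- tcrossprod(fans)
diag(adj) <- 0  # a person is not tied to themselves
adj
```

With this data only the pair a and c ends up connected; b and d, who both have 0, are not.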
I'm cleaning up some survey data in R; assigning variables 1,0 based on the responses to a question. Say I had a question with 3 options; a,b,c; and I had a data frame with the responses and logical variables:
df <- data.frame(a = rep(0,3), b = rep(0,3), c = rep(0,3), response = I(list(c(1),c(1,2),c(2,3))))
So I want to change the 0's to 1's if the response matches the column index (ie 1=a, 2=b, 3=c).
This is fairly easy to do with a loop:
for (i in seq_len(nrow(df))) df[i, df[i, "response"][[1]]] <- 1
Is there any way to do this with an apply/lapply/sapply/etc? Something like:
df <- sapply(df,function(x) x[x["response"][[1]]] <- 1)
Or should I stick with a loop?
You can use matrix indexing, from ?[:
A third form of indexing is via a numeric matrix with the one column
for each dimension: each row of the index matrix then selects a single
element of the array, and the result is a vector. Negative indices are
not allowed in the index matrix. NA and zero values are allowed: rows
of an index matrix containing a zero are ignored, whereas rows
containing an NA produce an NA in the result.
# construct a matrix representing the index where the value should be one
idx <- with(df, cbind(rep(seq_along(response), lengths(response)), unlist(response)))
idx
# [,1] [,2]
#[1,] 1 1
#[2,] 2 1
#[3,] 2 2
#[4,] 3 2
#[5,] 3 3
# do the assignment
df[idx] <- 1
df
# a b c response
#1 1 0 0 1
#2 1 1 0 1, 2
#3 0 1 1 2, 3
Or you can try this:
library(tidyr)
library(dplyr)
df1 <- df %>% mutate(Id = row_number()) %>% unnest(response)
df[, 1:3] <- table(df1$Id, df1$response)
a b c response
1 1 0 0 1
2 1 1 0 1, 2
3 0 1 1 2, 3
Perhaps this helps
df[1:3] <- t(sapply(df$response, function(x) as.integer(names(df)[1:3] %in% names(df)[x])))
df
# a b c response
#1 1 0 0 1
#2 1 1 0 1, 2
#3 0 1 1 2, 3
Or a compact option is
library(qdapTools)
df[1:3] <- mtabulate(df$response)
Hi, I am using R and have a cluster assignment matrix that comes out of my clustering function (I am applying a clustering algorithm to Gaussian mixture data). I want to create a data matrix of clusters. Here is a toy example of what I want to do.
#simulate data
dat=Z<-c(rnorm(2,0,1),rnorm(2,2,3),rnorm(3,0,1),rnorm(3,2,3))
dat
[1] -0.5350681 1.0444655 2.9229136 8.2528266 -0.7561170 -1.0240702 -1.0012780
[8] -0.1322981 7.8525855 2.2278264
# Making up a cluster assignment matrix (actually this one comes out of my
# clustering function)
amat<-matrix(c(1,1,0,0,1,1,1,0,0,0,0,0,1,1,0,0,0,1,1,1), ncol=2, nrow=10)
amat
[,1] [,2]
[1,] 1 0
[2,] 1 0
[3,] 0 1
[4,] 0 1
[5,] 1 0
[6,] 1 0
[7,] 1 0
[8,] 0 1
[9,] 0 1
[10,] 0 1
I want to create a data frame or vector called (say) "clust" that contains cluster labels derived from the assignment matrix given above. Basically, it should use the first and second columns of the assignment matrix to assign label 1 to data coming from the normal distribution N(0,1) and label 2 to data coming from the normal distribution N(2,3). Any help is appreciated. Thanks in advance.
# clust should look like this (I have no idea how to create this using amat and dat)
clust
[1] 1 1 2 2 1 1 1 2 2 2
The assignment matrix is already binary, so we can add 1L to its second column:
clust <- amat[,2] + 1L
[1] 1 1 2 2 1 1 1 2 2 2
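(The suffix L makes 1 an integer literal. Because amat as built above is stored as double, the result is still double; wrap it in as.integer() if an integer vector is needed.)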
Isn't this essentially
1 * column1 + 2 * column2 + 3 * column3 and so on?
That should be straightforward to write as a matrix multiplication with the vector [1, 2, 3, 4, ...], which does the weighting and the sum in one step.
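A minimal sketch of that idea, assuming each row of the assignment matrix contains exactly one 1. The names weights, clust and clust_alt are illustrative; max.col() is shown as an alternative that returns the position of the largest entry in each row.

```r
amat <- matrix(c(1, 1, 0, 0, 1, 1, 1, 0, 0, 0,
                 0, 0, 1, 1, 0, 0, 0, 1, 1, 1), ncol = 2, nrow = 10)

# Weight column k by k, then sum across columns via matrix multiplication
weights <- seq_len(ncol(amat))
clust <- as.vector(amat %*% weights)
clust

# Alternative: index of the column holding the 1 in each row
clust_alt <- max.col(amat, ties.method = "first")
```

Both approaches extend to any number of clusters without changing the code.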
I'm trying to do something that seems relatively straightforward to do with something apply-esque, but I can only get it to work using a for loop.
The general idea is I have two vectors, with one vector corresponding to a row in the matrix and another vector corresponding to the column, both the same length. I start with a 0 matrix, and increment [row,column] based on the pair of values in the two vectors. For example:
vectorCols <- c(1,2,3,1,3)
vectorRows <- c(2,1,2,3,2)
countMat <- matrix(rep(0,9),ncol=3)
And at the end, countMat is:
[,1] [,2] [,3]
[1,] 0 1 0
[2,] 1 0 2
[3,] 1 0 0
This is pretty manageable with a for loop:
for (i in 1:length(vectorCols)){
countMat[vectorRows[i],vectorCols[i]] <- countMat[vectorRows[i],vectorCols[i]] + 1
}
But I can't help thinking there is a better way to do this in R. I've tried using the apply family of functions, but these don't cooperate well when you want to assign something. I know I could use mapply and build each element of countMat one value at a time, but this seems inefficient: vectorRows and vectorCols are very long, and it seems wasteful to traverse them in full once for each cell in countMat. Other than a loop and mapply, I can't think of how to do this. I've considered using assign with one of the apply family, but there's a caveat: my matrix actually has names for the columns and rows, with the names stored in vectorCols and vectorRows, and it seems assign doesn't play well with something like countMat["rowName"]["columnName"] (not to mention that apply will still want to return a value for each step in the iteration).
Any suggestions? I'd also be curious if there is an ideal way to do this if I don't have names for the vector columns and rows. If that's the case then maybe I can convert vectorCols and vectorRows to numbers, then build the matrix, then rename everything.
Thanks all.
Here are some solutions. No packages are needed.
1) table
table(vectorRows, vectorCols)
giving:
          vectorCols
vectorRows 1 2 3
         1 0 1 0
         2 1 0 2
         3 1 0 0
Note that if there is any row or column with no entries then it will not appear.
2) aggregate
ag <- aggregate( Freq ~ ., data.frame(Freq = 1, vectorRows, vectorCols), sum)
countMat[as.matrix(ag[-3])] <- ag[[3]]
giving:
> countMat
[,1] [,2] [,3]
[1,] 0 1 0
[2,] 1 0 2
[3,] 1 0 0
3) xtabs
xtabs(~ vectorRows + vectorCols)
giving:
          vectorCols
vectorRows 1 2 3
         1 0 1 0
         2 1 0 2
         3 1 0 0
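Regarding the note under solution 1 that rows or columns with no entries do not appear: one way around this, sketched here under the assumption that the full set of indices is known in advance, is to convert both vectors to factors with explicit levels. The size n = 4 and the names counts and countMat4 are hypothetical, chosen so that index 4 has no entries.

```r
vectorCols <- c(1, 2, 3, 1, 3)
vectorRows <- c(2, 1, 2, 3, 2)

n <- 4  # hypothetical matrix size; index 4 never occurs in the data

# Explicit levels force table() to keep empty rows and columns
counts <- table(factor(vectorRows, levels = 1:n),
                factor(vectorCols, levels = 1:n))

# Strip the "table" class to get a plain integer matrix with dimnames
countMat4 <- unclass(counts)
countMat4
```

The same approach works when the vectors hold row and column names instead of numbers: pass the full set of names as levels.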