How to create a custom Matrix?

I have 2 matrices (shown in yellow in the original image):
matrix 1 (size 4x1) and matrix 2 (size 1x6).
From these 2 matrices I am allowed to use the identity (unit) matrix, the matrix inverse, the transpose, square matrices, and the usual operations (multiplication, addition, subtraction, ...).
Edit: the 1s in the 4x6 matrix do not have to equal 1, as long as they are nonzero.
My question is: how do I create the 4x6 matrix shown in the image?
It looks quite similar to a diagonal matrix, but I'm stuck because I can't see how to get there from the two matrices.
Please give me a solution. Thanks very much!
Here are other examples:

If the input vectors are X and Y, the output matrix Z appears to be:
Z(i,j) = 1 when X(i) = Y(j), and 0 otherwise.
If you have an element-wise comparison function, you can build it this way. First multiply each vector by a vector of ones, producing matrices in which the rows or columns are repeated:
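$$
\begin{pmatrix}1\\1\\2\\2\end{pmatrix}
\begin{pmatrix}1&1&1&1&1&1\end{pmatrix}
=
\begin{pmatrix}1&1&1&1&1&1\\1&1&1&1&1&1\\2&2&2&2&2&2\\2&2&2&2&2&2\end{pmatrix}
$$
$$
\begin{pmatrix}1\\1\\1\\1\end{pmatrix}
\begin{pmatrix}1&1&1&2&2&2\end{pmatrix}
=
\begin{pmatrix}1&1&1&2&2&2\\1&1&1&2&2&2\\1&1&1&2&2&2\\1&1&1&2&2&2\end{pmatrix}
$$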
An element-wise comparison of these two gives you the result you want.
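$$
\begin{pmatrix}1&1&1&1&1&1\\1&1&1&1&1&1\\2&2&2&2&2&2\\2&2&2&2&2&2\end{pmatrix}
\mathrel{==}
\begin{pmatrix}1&1&1&2&2&2\\1&1&1&2&2&2\\1&1&1&2&2&2\\1&1&1&2&2&2\end{pmatrix}
=
\begin{pmatrix}1&1&1&0&0&0\\1&1&1&0&0&0\\0&0&0&1&1&1\\0&0&0&1&1&1\end{pmatrix}
$$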
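In R, `==` already compares matrices element-wise, so the construction above can be sketched directly. The values X = (1, 1, 2, 2) and Y = (1, 1, 1, 2, 2, 2) are assumptions read off the worked example, since the original image is not available. `outer()` is shown as a shortcut that builds the same comparison in one call.

```r
# Assumed inputs, read off the worked example (the original image is missing)
X <- matrix(c(1, 1, 2, 2), ncol = 1)        # matrix 1: 4x1
Y <- matrix(c(1, 1, 1, 2, 2, 2), nrow = 1)  # matrix 2: 1x6

# Multiply by vectors of ones to repeat rows / columns (both results are 4x6)
A <- X %*% matrix(1, nrow = 1, ncol = ncol(Y))  # row i is X(i) repeated
B <- matrix(1, nrow = nrow(X), ncol = 1) %*% Y  # every row is a copy of Y

# Element-wise comparison; multiplying by 1 turns TRUE/FALSE into 1/0
Z <- (A == B) * 1
print(Z)

# Shortcut: outer() applies "==" to every pair (X(i), Y(j))
Z2 <- outer(as.vector(X), as.vector(Y), "==") * 1
```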


How to find the streaks of a particular value in R?

The rle() function returns a list with values and lengths. I have not found a way to subset the output to isolate the streaks of a particular value that does not involve calling rle() twice, or saving the output into an object to later subset (an added step).
For instance, for runs of heads (1's) in a series of fair coin tosses:
s <- sample(c(0, 1), 100, replace = TRUE)
rle(s)
Run Length Encoding
lengths: int [1:55] 1 2 1 2 1 2 1 2 2 1 ...
values : num [1:55] 0 1 0 1 0 1 0 1 0 1 ...
# Double call:
rle(s)$lengths[rle(s)$values == 1]
[1] 2 2 2 2 1 1 1 1 6 1 1 1 2 2 1 1 2 2 2 2 2 3 1 1 4 1 2
# Adding an intermediate step:
r <- rle(s)
r$lengths[r$values == 1]
[1] 2 2 2 2 1 1 1 1 6 1 1 1 2 2 1 1 2 2 2 2 2 3 1 1 4 1 2
I see that a very easy way to get the streak lengths for 1 only is to tweak the rle() source code (as suggested in an answer), but there may be an even simpler way.
In base R (the output below comes from a different random draw of s than the one above):
with(rle(s), lengths[values == 1])
[1] 1 3 2 2 1 1 1 3 2 1 1 3 1 1 1 1 1 2 3 1 2 1 3 3 1 2 1 1 2
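For a sequence of outcomes s, when you are interested only in the lengths of the streaks of outcome oc:
sk <- function(s, oc) {
  n <- length(s)
  y <- s[-1L] != s[-n]         # TRUE wherever the next value differs
  i <- c(which(y), n)          # index of the last element of each run
  diff(c(0L, i))[s[i] == oc]   # run lengths, kept only for runs of oc
}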
So to get the lengths for 1:
sk(s,1)
[1] 2 2 2 2 1 1 1 1 6 1 1 1 2 2 1 1 2 2 2 2 2 3 1 1 4 1 2
and likewise for 0:
sk(s,0)
[1] 1 1 1 1 2 2 2 2 4 1 1 2 1 1 1 1 1 1 3 1 1 2 6 2 1 1 4 4
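A quick way to check that `sk()` agrees with the `rle()`-based one-liner is to run both on a short, fixed sequence instead of a random one. The sequence below is a hypothetical example chosen so the runs are easy to count by hand (1 1 | 0 | 1 1 1 | 0 0 | 1).

```r
# sk() as defined above, repeated so this block runs on its own
sk <- function(s, oc) {
  n <- length(s)
  y <- s[-1L] != s[-n]        # TRUE where the next value differs
  i <- c(which(y), n)         # last index of each run
  diff(c(0L, i))[s[i] == oc]  # run lengths, filtered to runs of oc
}

s <- c(1, 1, 0, 1, 1, 1, 0, 0, 1)  # hypothetical fixed sequence

sk(s, 1)                            # 2 3 1
sk(s, 0)                            # 1 2
with(rle(s), lengths[values == 1])  # 2 3 1, same as sk(s, 1)
```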

number of occurrences by lines R

I have this array:
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1
[38] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[75] 1 1 2 1 2 2 1 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2
[112] 2 1 1 2 2 2 2 2 2 1 2 1 1 2 1 1 2 1 1 2 1 1 2 2 1 2 2 2 2 1 2 2 2 1 2 2 2
I want to count the occurrences of 1 and 2, separately for positions [1] to [70] and for [71] to the end.
I tried:
sum(x==1)
but this counts over the whole vector. How can I select a range of positions?
The base function sum() returns the sum of all the values in its arguments, and a logical vector is summed as 0s and 1s (TRUE counts as 1), so you can build the argument as follows:
with x[a:b] you set the boundaries (for example, a = 1 and b = 10 selects positions [1] to [10]);
with the operator == you check which elements between those boundaries equal a specific value c, e.g. x[a:b] == c.
If you want to look for more than one value (for example c and d, where c == 1 and d == 2), you can add up the individual results:
sum(x[a:b] == c) + sum(x[a:b] == d)
where a and b are your boundaries and c and d are the values you want to count. For separate counts, use sum(x[a:b] == c) and sum(x[a:b] == d) on their own.
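If you want the counts for each value separately rather than their total, a small helper can return one count per value. The function name `counts_in` and the vector `x` below are hypothetical: `x` is a stand-in with 65 ones and 5 twos in positions 1 to 70, followed by 10 ones and 20 twos, not the data from the question.

```r
# Hypothetical helper: count each of `values` within positions a..b of x
counts_in <- function(x, a, b, values = c(1, 2)) {
  seg <- x[a:b]
  # sum() of a logical vector counts the TRUEs
  counts <- vapply(values, function(v) sum(seg == v), integer(1))
  setNames(counts, values)
}

# Hypothetical stand-in data (not the vector from the question)
x <- c(rep(1, 65), rep(2, 5), rep(1, 10), rep(2, 20))

counts_in(x, 1, 70)          # 1: 65, 2: 5
counts_in(x, 71, length(x))  # 1: 10, 2: 20
```

`table(factor(x[a:b], levels = c(1, 2)))` gives the same counts in table form, including a 0 for a value that never occurs in the range.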

How can I count occurrences across a few variables in R

I have some example data.frame:
x<- data.frame(c(0,1,2,1,2,1,2),c(0,1,2,1,2,2,1),c(0,1,2,1,2,1,2),c(0,1,2,1,2,2,1))
colnames(x) <- c('PV','LA','Wiz','LAg')
I want to count how often each whole row occurs. The result should look like:
PV LA Wiz LAg Replace
0 0 0 0 1
1 1 1 1 2
2 2 2 2 2
1 2 1 2 1
2 1 2 1 1
The row 0 0 0 0 appears once, the row 1 1 1 1 appears twice, and so on.
Do you have any idea, how can I do it ?
Maybe you want this?
as.data.frame(table(do.call(paste, x)))
# Var1 Freq
#1 0 0 0 0 1
#2 1 1 1 1 2
#3 1 2 1 2 1
#4 2 1 2 1 1
#5 2 2 2 2 2
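If you want the result in the layout from the question, with the original columns kept and the count in its own column, one way is to count the pasted row keys and then look the counts up for the unique rows. This is a sketch; the column name `Replace` is taken from the question's desired output.

```r
x <- data.frame(PV  = c(0, 1, 2, 1, 2, 1, 2), LA  = c(0, 1, 2, 1, 2, 2, 1),
                Wiz = c(0, 1, 2, 1, 2, 1, 2), LAg = c(0, 1, 2, 1, 2, 2, 1))

key    <- do.call(paste, x)  # one string per row, e.g. "1 2 1 2"
counts <- table(key)         # how often each row pattern occurs

res <- unique(x)             # first occurrence of each distinct row, in original order
res$Replace <- as.vector(counts[do.call(paste, res)])
res
```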

What does this R expression do?

sp_full_in is a matrix:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
1 0 1 1 1 1 2 2 2 1 1 1 1 1 2 1 1 1 1 1 1 2
2 1 0 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1
3 2 2 0 2 2 2 2 2 2 1 1 2 2 2 1 2 1 1 1 2 1
4 1 2 1 0 2 2 2 1 2 1 1 1 2 2 1 2 1 1 2 2 1
5 2 2 2 2 0 2 2 2 2 1 1 2 1 2 1 2 1 1 1 2 2
6 2 1 1 1 1 0 1 1 1 2 2 2 2 2 1 2 1 2 2 1 1
7 2 1 1 2 1 1 0 1 1 2 1 1 2 1 1 2 1 1 1 2 1
8 1 2 1 1 1 2 2 0 1 1 1 2 2 2 1 2 1 1 2 1 1
9 2 2 1 2 1 1 2 2 0 1 1 2 1 2 1 2 1 1 2 2 2
10 2 2 1 1 1 2 2 1 1 0 2 2 2 2 1 1 1 1 1 2 2
11 2 2 1 1 1 2 1 1 1 1 0 2 1 2 1 2 1 1 1 1 2
12 1 2 1 1 2 1 1 2 1 1 1 0 2 2 1 2 1 2 1 1 1
13 2 2 2 2 1 3 2 2 2 1 1 3 0 2 1 2 2 1 2 2 2
14 2 2 1 2 1 2 1 2 1 2 2 2 1 0 1 2 1 1 1 1 1
15 2 2 2 2 2 2 2 2 2 1 1 2 2 1 0 2 1 1 1 1 2
16 1 2 2 1 1 2 2 2 1 1 2 2 2 2 1 0 1 1 2 1 2
17 2 2 1 1 1 1 1 2 1 1 1 1 2 2 1 2 0 2 2 1 1
18 1 1 1 1 1 2 1 1 1 1 1 2 1 1 1 1 2 0 1 1 1
19 2 2 1 2 1 2 2 2 2 1 1 2 2 2 1 2 1 1 0 2 2
20 2 2 1 1 1 2 2 2 2 1 2 2 2 2 1 2 1 1 1 0 1
21 1 1 1 1 1 1 1 1 1 2 2 1 2 1 1 2 1 1 2 1 0
mean(sp_full_in[which(sp_full_in != Inf)])
produces the result [1] 1.38322
I'm not quite sure I understand what this does, but the way I read it is: for every cell in sp_full_in, check whether it is not infinite; if so, return 1; then average all the outputs. Is that correct? If not, how should it be read?
which(sp_full_in != Inf) returns a vector of integer indices, one per cell that is not Inf (so only one of them is the number 1). That vector is then passed to "[" as indices into sp_full_in, which returns the corresponding values (here all of them, since no cell is Inf) as a vector, and that vector is passed to mean().
It is a good idea to learn to read R expressions from the inside out. Find the innermost function call and mentally evaluate it, in this case sp_full_in != Inf. That returns a logical matrix of all TRUEs, which is passed to which(); since there is no arr.ind argument, which() returns an atomic vector of indices.
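The difference between the default behaviour of which() and arr.ind = TRUE can be seen on a small matrix. The 2x2 matrix `m` below is a hypothetical example containing one Inf.

```r
m <- matrix(c(1, Inf, 3, 4), nrow = 2)  # column-major: m[2, 1] is Inf

m != Inf                         # logical matrix, FALSE only at [2, 1]
which(m != Inf)                  # 1 3 4: linear (column-major) indices
which(m != Inf, arr.ind = TRUE)  # the same cells as (row, col) pairs
mean(m[which(m != Inf)])         # mean of 1, 3 and 4
```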
The other answers explain well why you get the mean of all the entries that are not Inf, but it's worth noting that here the which() call is redundant: indexing with the logical matrix directly gives the same result. I used to have the bad habit of over-using which() as well.
> a <- matrix(rnorm(4), nrow = 2)
> a
[,1] [,2]
[1,] 0.5049551 -0.7844590
[2,] -1.7170087 -0.8509076
> a[which(a != Inf)]
[1] 0.5049551 -1.7170087 -0.7844590 -0.8509076
> a[a != Inf]
[1] 0.5049551 -1.7170087 -0.7844590 -0.8509076
> a[1] <- Inf
> a
[,1] [,2]
[1,] Inf -0.7844590
[2,] -1.717009 -0.8509076
> a[which(a != Inf)]
[1] -1.7170087 -0.7844590 -0.8509076
## Logical indexing gives the same result when an Inf is present
> a[a != Inf]
[1] -1.7170087 -0.7844590 -0.8509076
And, while we're at it, we should also mention is.finite(), which is often preferable to != Inf: is.finite() returns FALSE for Inf, -Inf, NA and NaN, whereas != Inf keeps -Inf and gives NA for NA and NaN.
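The difference between `!= Inf` and `is.finite()` only shows up when the matrix contains `-Inf`, `NA` or `NaN`. The small matrix below is a hypothetical example built to contain all of them.

```r
m <- matrix(c(1, Inf, NA, -Inf, 2, NaN), nrow = 2)

m[which(m != Inf)]        # 1 -Inf 2: which() drops the NA comparisons, but -Inf stays
m[m != Inf]               # 1 NA -Inf 2 NA: an NA in a logical index yields NA
m[is.finite(m)]           # 1 2
mean(m[which(m != Inf)])  # -Inf
mean(m[is.finite(m)])     # 1.5
```

So which() is not always redundant: with NA or NaN present it silently drops those cells, while plain logical indexing propagates them as NA.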
No, but you are close. The comparison sp_full_in != Inf checks every cell of the matrix against the condition (here, not Inf), and which() returns the indices of all cells that satisfy it. Your code then extracts the values of the cells at those indices, and mean() averages them.

Conditional counting in R

I have a question I hope some of you can help me with. I am writing a thesis on pharmaceuticals and the effect of parallel imports, and I am working in R with a panel dataset.
I need a variable that counts, for a given original product, how many parallel importers there are in a given time period.
Product_ID PI t
1 0 1
1 1 1
1 1 1
1 0 2
1 1 2
1 1 2
1 1 2
1 1 2
2 0 1
2 1 1
2 0 2
2 1 2
2 0 3
2 1 3
2 1 3
2 1 3
Ideally, I want a new column with the number of PI products (PI = 1) for each original (PI = 0) at time t. The output would look like this:
Product_ID PI t nPIcomp
1 0 1 2
1 1 1
1 1 1
1 0 2 4
1 1 2
1 1 2
1 1 2
1 1 2
2 0 1 1
2 1 1
2 0 2 1
2 1 2
2 0 3 3
2 1 3
2 1 3
2 1 3
I hope I have made my issue clear :)
Thanks in advance,
Henrik
Something like this?
x <- read.table(text = "Product_ID PI t
1 0 1
1 1 1
1 1 1
1 0 2
1 1 2
1 1 2
1 1 2
1 1 2
2 0 1
2 1 1
2 0 2
2 1 2
2 0 3
2 1 3
2 1 3
2 1 3", header = TRUE)
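find.count <- rle(x$PI)                                # runs of 0s and 1s in PI
count <- find.count$lengths[find.count$values == 1]    # length of each run of 1s
x[x$PI == 0, "nPIcomp"] <- count                       # assumes each original is directly followed by its own PI == 1 rows
x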
Product_ID PI t nPIcomp
1 1 0 1 2
2 1 1 1 NA
3 1 1 1 NA
4 1 0 2 4
5 1 1 2 NA
6 1 1 2 NA
7 1 1 2 NA
8 1 1 2 NA
9 2 0 1 1
10 2 1 1 NA
11 2 0 2 1
12 2 1 2 NA
13 2 0 3 3
14 2 1 3 NA
15 2 1 3 NA
16 2 1 3 NA
I would use ave and your two columns Product_ID and t as grouping variables. Then, within each group, apply a function that returns the sum of PI followed by the appropriate number of NAs:
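dat <- transform(dat, nPIcomp = ave(PI, Product_ID, t,
  FUN = function(z) {
    n <- sum(z)          # number of PI == 1 rows in this (Product_ID, t) group
    c(n, rep(NA, n))     # count on the first row (the original), NA on the n import rows
  }))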
The same idea can be used with the data.table package if your data is large and speed is a concern.
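The ave() answer writes the count on the first row of each group, which assumes the original (PI = 0) row comes first and that each group has exactly one of them. Below is a sketch of a variant that writes the count on the PI = 0 row wherever it sits, run on the example data from the question.

```r
# Example data from the question
x <- data.frame(
  Product_ID = c(1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2),
  PI         = c(0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1),
  t          = c(1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 2, 2, 3, 3, 3, 3)
)

# Within each (Product_ID, t) group: NA everywhere, except the PI == 0 row(s),
# which get the number of PI == 1 rows in that group
x$nPIcomp <- ave(x$PI, x$Product_ID, x$t, FUN = function(z) {
  out <- rep(NA_real_, length(z))
  out[z == 0] <- sum(z == 1)
  out
})
x
```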
Roman's answer gives exactly what you want. If you instead want to summarise the data, the plyr package is handy (df is what I have called your data.frame):
library(plyr)
ddply(df, .(Product_ID, t), summarise, nPIcomp = sum(PI))
# Product_ID t nPIcomp
#1 1 1 2
#2 1 2 4
#3 2 1 1
#4 2 2 1
#5 2 3 3
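The same summary can be sketched in base R with aggregate(), without loading plyr. Note that aggregate() orders groups with the first grouping variable varying fastest, so the rows are re-sorted here to match the ddply() output above.

```r
# Example data from the question
x <- data.frame(
  Product_ID = c(1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2),
  PI         = c(0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1),
  t          = c(1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 2, 2, 3, 3, 3, 3)
)

# Sum PI within each (Product_ID, t) group; PI == 0 rows add nothing
agg <- aggregate(PI ~ Product_ID + t, data = x, FUN = sum)
names(agg)[names(agg) == "PI"] <- "nPIcomp"

res <- agg[order(agg$Product_ID, agg$t), ]  # same row order as ddply()
res
```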
