Can someone explain to me the problem with my sum in R

For a given matrix F I want to calculate the sum of the 2-norms of its rows (each raised to the fourth power here), so I use the function sum(), but it doesn't work as I expect it to. Here is an example:
# The matrix F
> F <- matrix(c(9,1,1,1,4,1),nrow=3)
# index of the sum i
> i=1:NROW(F)
#And here is the result
> sum(norm(F[i,], type = "2")^4)
[1] 7376.60160040254
# and if i calculate each element of the sum i get
> norm(F[1,], type = "2")^4
[1] 6724
> norm(F[2,], type = "2")^4
[1] 289
> norm(F[3,], type = "2")^4
[1] 4

I think you're looking for the apply function. It applies a function along the dimensions of a matrix.
sum(apply(F,MARGIN = 1,function(x){norm(x,type = "2")^4}))
#[1] 7017
The reason yours doesn't work is that you assigned c(1,2,3) to i, so when you subset F with it you just get the whole matrix back, and norm(F, type = "2") is then the spectral norm of the whole matrix rather than a row norm.
i=1:NROW(F)
i
#[1] 1 2 3
norm(F,type="2")^4
#[1] 7376.602
norm(F[1:3,],type="2")^4
#[1] 7376.602
norm(F[i,],type="2")^4
#[1] 7376.602
Disclaimer: I have not assessed the mathematical validity of this approach, only programmatically recreated the OP's desired behavior.
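If you'd rather avoid apply() altogether, the same quantity can be computed from row sums, since the squared 2-norm of a row is just its sum of squares. A vectorised sketch:
# sum over rows of (2-norm)^4 == sum over rows of (sum of squares)^2
sum(rowSums(F^2)^2)
#[1] 7017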

Related

function to subset data supplying subset argument as text string

m <- matrix(1:4, ncol=2)
l <- list(a=1:3, b='c')
d <- data.frame(a=1:3, b=3:1)
I was wondering if it is possible to make a function that takes a base R object (matrix, vector, list or data.frame, ...) as well as a text that specifies the subset of the object.
f1 <- function(object, subset) {
# object'subset'
}
For instance
f1(m, '[1,1]') #to evaluate m[1,1]
f1(l, '[[1]][2:3]') #l[[1]][2:3]
f1(d, '$a') #d$a
would give us (respectively):
[1] 1
[1] 2 3
[1] 1 2 3
I guess the function needs to somehow glue the two arguments together before evaluating. I guess one could make a kind of interpreter for each bit of the subset text and then (for the matrix example) do something like:
`[`(m, 1, 1)
This would be possible, but I thought there would be an easier, more direct way (my 'glue' above).
Well, one way to go is to use the eval(parse()) methodology, i.e.
f1 <- function(x, text){
  eval(parse(text = paste0(x, text)))
}
f1('d', '$a')
#[1] 1 2 3
f1('m', '[1,1]')
#[1] 1
f1('l', '[[1]][2:3]')
#[1] 2 3
f1 <- function(object, subset){
  return(eval(parse(text = paste0(substitute(object), subset))))
}
> m=matrix(4,2,2)
> l=list(c(1,2,3),c(2,3,4))
> f1(m,'[1,1]')
[1] 4
> f1(l,'[[1]][1:2]')
[1] 1 2
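For completeness, the `[`(i, j) idea from the question can also be done without parsing any text, by calling the subsetting functions directly with do.call. A small sketch using the same m and d as above:
do.call(`[`, list(m, 1, 1))   # same as m[1, 1]
do.call(`[[`, list(d, "a"))   # same as d[["a"]], i.e. d$a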

R Matrix Multiplication Error with Proper Dimensions

I am trying to multiply matrices in R, but I am unable to do so without an error. The dimensions seem right for the multiplication, but I am not sure what the problem could be. Here is some background on my data and my loop. Thanks for the help.
t
# [1] 6848
dim(A)
# [1] 2 2
dim(backward)
# [1] 6848 2
dim(B)
# [1] 6848 2
is.matrix(A)
# [1] TRUE
is.matrix(backward)
# [1] TRUE
is.matrix(B)
# [1] TRUE
for (i in (t-1):1){ #FIXXXXX
backward[i,] = t(A%*%(t(backward[i+1,])))*B[i+1,]
}
Error in A %*% (t(backward[i + 1, ])) : non-conformable arguments
By default, selecting a single row or column from a matrix results in a vector. Add drop=FALSE to your subsetting expression to keep this from happening.
t(A %*% t(backward[i+1, , drop=FALSE])) * B[i+1, , drop=FALSE]
And by the way, it would probably be a good idea to rename your t variable to something else, as t is also the transpose function.
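A minimal illustration of the drop behaviour with toy matrices (not the OP's actual data):
A <- matrix(1:4, nrow = 2)            # 2 x 2
backward <- matrix(1:6, nrow = 3)     # 3 x 2
dim(backward[1, ])                    # NULL: the row was dropped to a plain vector
dim(backward[1, , drop = FALSE])      # 1 2: still a one-row matrix
A %*% t(backward[1, , drop = FALSE])  # conformable: (2 x 2) %*% (2 x 1)
# A %*% t(backward[1, ]) would reproduce the non-conformable error, since t() of the dropped vector is 1 x 2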

R - min, max and mean of off-diagonal elements in a matrix

I have a matrix A in R and I want to get:
Max of off-diagonal elements
Min of off-diagonal elements
Mean of off-diagonal elements
For the diagonal I used max(diag(A)), min(diag(A)) and mean(diag(A)) and they worked just fine
But for off-diagonal I tried
dataD <- subset(A, V1!=V2)
Error in subset.matrix(A, V1 != V2) : object 'V1' not found
to use:
colMeans(dataD) # get the mean for columns
but I cannot get dataD b/c it says object 'V1' not found
Thanks!
Here the row() and col() helper functions are useful. Using James' A (A <- matrix(1:16, 4), as in his answer below), we can get the upper off-diagonal using this little trick:
> A[row(A) == (col(A) - 1)]
[1] 5 10 15
and the lower off diagonal via this:
> A[row(A) == (col(A) + 1)]
[1] 2 7 12
These can be generalised to give whatever diagonals you want:
> A[row(A) == (col(A) - 2)]
[1] 9 14
and don't require any subsetting.
Then it is a simple matter of calling whatever function you want on these values. E.g.:
> mean(A[row(A) == (col(A) - 1)])
[1] 10
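If you need arbitrary diagonals often, the trick generalises into a tiny helper (a sketch, assuming the same A <- matrix(1:16, 4) as in James' answer):
kth_diag <- function(A, k) A[row(A) == col(A) - k]  # k > 0 above the diagonal, k < 0 below, k = 0 the diagonal itself
kth_diag(A, 1)    # 5 10 15
kth_diag(A, -1)   # 2 7 12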
If as per my comment you mean everything but the diagonal, then use
> diag(A) <- NA
> mean(A, na.rm = TRUE)
[1] 8.5
> max(A, na.rm = TRUE)
[1] 15
> # etc. using sum(A, na.rm = TRUE), min(A, na.rm = TRUE), etc..
So this doesn't get lost, Ben Bolker suggests (in the comments) that the above code block can be done more neatly using the row() and col() functions I mentioned above:
mean(A[row(A)!=col(A)])
min(A[row(A)!=col(A)])
max(A[row(A)!=col(A)])
sum(A[row(A)!=col(A)])
which is a nicer solution all round.
In one simple line of code:
For a matrix A, if you wish to find the minimum, 1st quartile, median, mean, 3rd quartile and maximum of the upper and lower off-diagonals:
summary(c(A[upper.tri(A)], A[lower.tri(A)]))
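For a square matrix this selects the same elements (just in a different order) as the row()/col() subsetting above, so the following should be equivalent:
summary(A[row(A) != col(A)])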
The diag of a suitably subsetted matrix will give you the off-diagonals. For example:
A <- matrix(1:16,4)
#upper off-diagonal
diag(A[-4,-1])
[1] 5 10 15
#lower off-diagonal
diag(A[-1,-4])
[1] 2 7 12
To get a vector holding the max of the off-diagonal elements of each col or row of a matrix requires a few more steps. I was directed here when searching for help on that. Perhaps others will do the same, so I offer this solution, which I found using what I learned here.
The trick is to create a matrix of only the off-diagonal elements. Consider:
> A <- matrix(c(10,2,3, 4,10,6, 7,8,10), ncol=3)
> A
[,1] [,2] [,3]
[1,] 10 4 7
[2,] 2 10 8
[3,] 3 6 10
> apply(A, 2, max)
[1] 10 10 10
Subsetting using the suggested indexing, A[row(A)!=col(A)] produces a vector of off-diagonal elements, in column-order:
> v <- A[row(A)!=col(A)]
> v
[1] 2 3 4 6 7 8
Returning this to a matrix allows the use of apply() to apply a function of choice to a margin of only off-diagonal elements. Using the max function as an example:
> A.off <- matrix(v, ncol=3)
> A.off
[,1] [,2] [,3]
[1,] 2 4 7
[2,] 3 6 8
> v <- apply(A.off, 2, max)
> v
[1] 3 6 8
The whole operation can be coded compactly, if rather cryptically, in one line:
> v <- apply(matrix(A[row(A)!=col(A)], ncol=ncol(A)), 2, max)
> v
[1] 3 6 8
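The same idea should work row-wise by transposing first; the inner matrix() refills by column, so t(A) puts each row's off-diagonal elements into one column (a sketch using the same A):
> v <- apply(matrix(t(A)[row(A) != col(A)], ncol = nrow(A)), 2, max)
> v
[1] 7 8 6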
Just multiply the matrix A element-wise by 1 - diag(n), where n is the number of rows.
For example, if A is a 4x4 matrix, then
A * (1 - diag(4))   # or, in general, A * (1 - diag(nrow(A)))
This is faster when you need to run the same line of code multiple times, but note that the diagonal entries are zeroed rather than removed, so it works for sum() while mean() would still divide by all n^2 elements.
In addition to James' answer, I want to add that you can use the diag function to locate the diagonal positions and exclude them with negative indexing (note that A[-diag(A)] itself would drop elements by value, not by position). For example, consider:
summary(A[-which(diag(nrow(A)) == 1)])

changing each vector in a list

I'm having a fundamental problem with how the lapply function works. I want to classify each member of each vector in a list.
My list:
s <- list(
  a = c(1, 20, 300),
  b = c(1.1, 20.1, 300.1),
  c = c(1.2, 20.2, 300.3)
)
My classification function:
classify <- function(n, peaks){
  which(abs(peaks-n) == min(abs(peaks-n)))
}
My peaks:
peaks <- c(1.27350, 20.32662, 300.02650)
If I classify s$c by itself, I get the result I expect:
> sapply(s$c,classify,peaks)
[1] 1 2 3
But when I try to classify all the vectors at once, I get this:
> lapply(s,classify,peaks)
$a
[1] 3 //should be 1,2,3
$b
[1] 3 //should be 1,2,3
$c
[1] 1 //should be 1,2,3
Why am I getting the result that I do? And how do I get the result that I want?
How about
> lapply(s,sapply,classify,peaks)
$a
[1] 1 2 3
$b
[1] 1 2 3
$c
[1] 1 2 3
First, a style point: use which.min for finding the location of a minimum.
classify <- function(n, peaks){
  which.min(abs(peaks-n))
}
Second, break your code down a bit to see what is happening.
abs(peaks - s$a) #3rd value is smallest
abs(peaks - s$b) #3rd value is smallest
abs(peaks - s$c) #1st value is smallest
These indices are what get returned from the call to lapply.
Based on your comment, I guess your problem is that lapply acts on each element of a vector, when you really want to just call it once on everything, since classify is already vectorised. Try this:
if(is.list(s)) lapply(s, classify, peaks = peaks) else classify(s, peaks)
Note that lapply(s, classify, peaks) passes each element of s (i.e. each whole vector) as n to classify, with peaks as the second argument; it never iterates over the individual numbers inside each vector, which is why the inner sapply (or an anonymous function) is needed to get per-element results.
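If you prefer a type-checked version of the per-element call, a sketch using vapply (assuming the which.min form of classify above, which always returns a single integer):
lapply(s, function(x) vapply(x, classify, integer(1), peaks = peaks))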

How to count TRUE values in a logical vector

In R, what is the most efficient/idiomatic way to count the number of TRUE values in a logical vector? I can think of two ways:
z <- sample(c(TRUE, FALSE), 1000, rep = TRUE)
sum(z)
# [1] 498
table(z)["TRUE"]
# TRUE
# 498
Which do you prefer? Is there anything even better?
The safest way is to use sum with na.rm = TRUE:
sum(z, na.rm = TRUE) # best way to count TRUE values
which gives 1 for the NA example below.
There are some problems with the other solutions when the logical vector contains NA values.
See for example:
z <- c(TRUE, FALSE, NA)
sum(z) # gives you NA
table(z)["TRUE"] # gives you 1
length(z[z == TRUE]) # f3lix answer, gives you 2 (because NA indexing returns values)
Additionally, the table solution is less efficient (look at the code of the table function).
Also, you should be careful with the "table" solution, in case there are no TRUE values in the logical vector. See for example:
z <- c(FALSE, FALSE)
table(z)["TRUE"] # gives you `NA`
or
z <- c(NA, FALSE)
table(z)["TRUE"] # gives you `NA`
Another option which hasn't been mentioned is to use which:
length(which(z))
Just to actually provide some context on the "which is faster question", it's always easiest just to test yourself. I made the vector much larger for comparison:
z <- sample(c(TRUE,FALSE),1000000,rep=TRUE)
system.time(sum(z))
user system elapsed
0.03 0.00 0.03
system.time(length(z[z==TRUE]))
user system elapsed
0.75 0.07 0.83
system.time(length(which(z)))
user system elapsed
1.34 0.28 1.64
system.time(table(z)["TRUE"])
user system elapsed
10.62 0.52 11.19
So clearly using sum is the best approach in this case. You may also want to check for NA values as Marek suggested.
Just to add a note regarding NA values and the which function:
> which(c(T, F, NA, NULL, T, F))
[1] 1 4
> which(!c(T, F, NA, NULL, T, F))
[1] 2 5
Note that which only returns the positions of elements that are TRUE, so NA values are skipped (and the NULL is dropped by c() before which ever sees it).
Another way is
> length(z[z==TRUE])
[1] 498
While sum(z) is nice and short, for me length(z[z==TRUE]) is more self-explanatory. Though, I think with a simple task like this it does not really make a difference...
If it is a large vector, you probably should go with the fastest solution, which is sum(z). length(z[z==TRUE]) is about 10x slower and table(z)[TRUE] is about 200x slower than sum(z).
Summing up, sum(z) is the fastest to type and to execute.
Another option is to use the summary function. It gives a summary of the TRUEs, FALSEs and NAs.
> summary(hival)
Mode FALSE TRUE NA's
logical 4367 53 2076
which() is a good alternative, especially when you operate on matrices (check ?which and notice the arr.ind argument). But I suggest that you stick with sum, because of the na.rm argument that can handle NA's in a logical vector.
For instance:
# create dummy variable
set.seed(100)
x <- round(runif(100, 0, 1))
x <- x == 1
# create NA's
x[seq(1, length(x), 7)] <- NA
If you type sum(x) you'll get NA as a result, but if you pass na.rm = TRUE to the sum function, you'll get the result that you want.
> sum(x)
[1] NA
> sum(x, na.rm=TRUE)
[1] 43
Is your question strictly theoretical, or do you have some practical problem concerning logical vectors?
There's also a package called bit that is specifically designed for fast boolean operations. It's especially useful if you have large vectors or need to do many boolean operations.
z <- sample(c(TRUE, FALSE), 1e8, rep = TRUE)
zb <- bit::as.bit(z)   # one-time conversion to a compact bit vector

system.time(sum(z))    # ~0.17s
system.time(sum(zb))   # ~0.02s, roughly a 10x improvement in speed
I did something similar a few weeks ago. Here's a possible solution; it's written from scratch, so it's a kind of beta release or something like that. I'll try to improve it by removing the loops from the code...
The main idea is to write a function that takes 2 (or 3) arguments. The first one is a data.frame which holds the data gathered from the questionnaire, and the second one is a numeric vector with the correct answers (this is only applicable to a single-choice questionnaire). Alternatively, you can add a third argument to return either a numeric vector with the final score, or a data.frame with the score embedded.
fscore <- function(x, sol, output = 'numeric') {
  if (ncol(x) != length(sol)) {
    stop('Number of items differs from length of correct answers!')
  } else {
    inc <- matrix(ncol = ncol(x), nrow = nrow(x))
    for (i in 1:ncol(x)) {
      inc[, i] <- x[, i] == sol[i]
    }
    if (output == 'numeric') {
      res <- rowSums(inc)
    } else if (output == 'data.frame') {
      res <- data.frame(x, result = rowSums(inc))
    } else {
      stop('Type not supported!')
    }
  }
  return(res)
}
I'll try to do this in a more elegant manner with some *ply function (a sketch follows at the end of this answer). Notice that I didn't put in an na.rm argument... will do that later.
# create dummy data frame - values from 1 to 5
set.seed(100)
d <- as.data.frame(matrix(round(runif(200,1,5)), 10))
# create solution vector
sol <- round(runif(20, 1, 5))
Now apply a function:
> fscore(d, sol)
[1] 6 4 2 4 4 3 3 6 2 6
If you pass output = 'data.frame', it will return the modified data.frame.
I'll try to fix this one... Hope it helps!
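In the spirit of that *ply idea, a possible vectorised sketch (fscore2 is just an illustrative name; mapply compares each column of x with the matching element of sol, and na.rm covers missing answers):
fscore2 <- function(x, sol, na.rm = TRUE) {
  inc <- mapply(`==`, x, sol)   # logical matrix: was each item answered correctly?
  rowSums(inc, na.rm = na.rm)   # per-respondent score
}
fscore2(d, sol)   # should match fscore(d, sol) above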
I just had a particular problem where I had to count the number of TRUE values in a logical vector, and this worked best for me...
length(grep(TRUE, (gene.rep.matrix[i,1:6] > 1))) > 5
This takes a subset of the gene.rep.matrix object and applies a logical test, returning a logical vector. That vector is passed to grep, which returns the locations of any TRUE entries. length() then counts how many entries grep finds, giving the number of TRUE entries.
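Given the earlier answers in this thread, the same count can presumably be written more directly with sum(), which also copes with NAs:
sum(gene.rep.matrix[i, 1:6] > 1, na.rm = TRUE) > 5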
