rowsum for matrix over specified number of columns in R - r

I'm trying to get the sum of columns in a matrix in R for a certain row. However, I don't want the whole row to be summed but only a specified number of columns, i.e. in this case all columns above the diagonal. I have tried the sum and rowSums functions, but they give me either strange results or an error message. To illustrate, please see the example code for an 8x8 matrix below. For the first row I need the sum of the row except item [1,1], for the second row the sum except items [2,1] and [2,2], etc.
m1 <- matrix(c(0.2834803,0.6398198,0.0766999,0.0000000,0.0000000,0.0000000,0.0000000,0.0000000,
0.0000000,0.1101746,0.6354086,0.2544168,0.0000000,0.0000000,0.0000000,0.0000000,
0.0000000,0.0000000,0.0548145,0.9451855,0.0000000,0.0000000,0.0000000,0.0000000,
0.0000000,0.0000000,0.0000000,0.3614786,0.6385214,0.0000000,0.0000000,0.0000000,
0.0000000,0.0000000,0.0000000,0.0000000,0.5594658,0.4405342,0.0000000,0.0000000,
0.0000000,0.0000000,0.0000000,0.0000000,0.0000000,0.7490395,0.2509605,0.0000000,
0.0000000,0.0000000,0.0000000,0.0000000,0.0000000,0.0000000,0.5834363,0.4165637,
0.0000000,0.0000000,0.0000000,0.0000000,0.0000000,0.0000000,0.0000000,1.0000000),
8, 8, byrow = TRUE,
dimnames = list(c("iAAA", "iAA", "iA", "iBBB", "iBB", "iB", "iCCC", "iD"),
c("iAAA_p", "iAA_p", "iA_p", "iBBB_p", "iBB_p", "iB_p", "iCCC_p", "iD_p")))
I have tried the following:
rowSums(m1[1, 2:8]) --> Error in rowSums(m1[1, 2:8]) :
'x' must be an array of at least two dimensions
Alternatively:
sum(m1[1,2]:m1[1,8]) --> wrong result of 0.6398198 (which is item [1,2])
As I understand rowSums needs an array rather than a vector (although not sure why). But I don't understand why the second way using sum doesn't work. Ideally, there is some way to only sum all columns in a row that lie above the diagonal.
Thanks a lot!

The problem is you are not passing an array to rowSums:
class(m1[1,2:8])
# [1] "numeric"
This is a numeric vector. Use more than a single row and it will work just fine:
class(m1[1:2,2:8])
# [1] "matrix" "array"   (R >= 4.0.0; older versions print only "matrix")
rowSums(m1[1:2,2:8])
# iAAA iAA
#0.7165197 1.0000000
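For a single row there are two simple options, sketched below on a small stand-in matrix (the values are made up for illustration and are not the m1 from the question): call sum() on the extracted vector, or pass drop = FALSE so that the subset stays a matrix and rowSums() accepts it. The last assignment also suggests why sum(m1[1,2]:m1[1,8]) misbehaves: the : operator builds a sequence between the two values instead of selecting a range of columns.

```r
# Hypothetical 3x3 matrix used only for illustration
m <- matrix(c(0.2, 0.5, 0.3,
              0.0, 0.4, 0.6,
              0.0, 0.0, 1.0), nrow = 3, byrow = TRUE)

# A single row is simplified to a plain vector, so sum() is the right tool
row1_sum <- sum(m[1, 2:3])

# drop = FALSE keeps the 1 x 2 matrix shape, so rowSums() works as well
row1_rowsums <- rowSums(m[1, 2:3, drop = FALSE])

# m[1, 2]:m[1, 3] is the sequence from 0.5 towards 0.3 in steps of -1,
# which contains only its starting value
seq_result <- m[1, 2]:m[1, 3]
```

In the question the same thing happens with 0.6398198:0, which is why the "sum" came out as item [1,2].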
If you want to sum all the columns that lie above the diagonal, you can use lower.tri to set all elements below the diagonal to 0 (or perhaps NA) and then use rowSums. To exclude the diagonal elements themselves as well, set diag = TRUE (thanks to @Fabio for pointing this out):
m1[lower.tri(m1 , diag = TRUE)] <- 0
rowSums(m1)
# iAAA iAA iA iBBB iBB iB iCCC iD
#0.7165197 0.8898254 0.9451855 0.6385214 0.4405342 0.2509605 0.4165637 0.0000000
# With 'NA'
m1[lower.tri(m1, diag = TRUE)] <- NA
rowSums(m1, na.rm = TRUE)
# iAAA iAA iA iBBB iBB iB iCCC iD
#0.7165197 0.8898254 0.9451855 0.6385214 0.4405342 0.2509605 0.4165637 0.0000000
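If overwriting the matrix is undesirable, one possible alternative is to multiply by the logical mask from upper.tri(), which leaves the original untouched. This is only a sketch on a hypothetical 3x3 matrix; it assumes the matrix has no NA or Inf values below the diagonal, since those would not be zeroed out by the multiplication.

```r
# Hypothetical 3x3 matrix used only for illustration
m <- matrix(c(0.2, 0.5, 0.3,
              0.0, 0.4, 0.6,
              0.0, 0.0, 1.0), nrow = 3, byrow = TRUE)

# upper.tri(m) is TRUE strictly above the diagonal; TRUE/FALSE act as 1/0
above_diag <- rowSums(m * upper.tri(m))
```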

Related

How to create a vector from 2 merged matrices in R and fill in any dimension gaps with NA's?

Suppose I have a 4x4 matrix generated as shown in the image below for mat1. (Image is from a snapshot of R Studio Console). Suppose I have another matrix generated of any size for example (1x2 in the below image for mat2). (In my code, the matrix elements and dimensions vary for both of the above matrices based on user inputs; though columns are only generated in multiples of 2, so I can have a [ ]x2 matrix, [ ]x4, [ ]x6, etc., and number of rows can be any integer).
In my code these 2 matrices are merged into a vector (a vector is required for processing via R package shinyMatrix).
Since these 2 matrix dimensions are variable, there are instances when merging the 2 matrices into a vector where you don't end up with the correct number of elements for converting back to a "dimensionable" matrix. As shown in the below image for mat3, where there are 18 elements after combining the 2 matrices mat1 and mat2 and certain elements must be dropped and NA's inserted to end up with a 4x4, 16 element matrix in this example.
There are instances where mat2 can have more elements than mat1. In this example I show mat1 with 16 elements and mat2 with 2 elements.
Importantly, in the code only the first 2 columns of mat1 can ever be replaced in this way with mat2 values. Columns 3+ of mat1 are never changed by mat2.
Is there an efficient way to code this? Where you fit 2 matrices together into the dimensions of the larger matrix, and fill in any missing elements with NA?
I'm going down the path of measuring the length of mat1 columns 1 and 2, measuring the length of mat2 columns 1 and 2, merging, replacing elements, etc., but with my limited experience it's looking very cumbersome and not working quite right. I bet R has a slick way to accomplish this sort of thing. One of the apply functions, like mapply? I read somewhere that mapply is great for subsetting matrices. Tibbles?
Here are my matrix inputs for the below image:
mat1 <- matrix(c(1,2,NA,NA, 3,4,NA,NA, 5:12), ncol = 4)
mat2 <- matrix(c(10,30), ncol = 2)
mat3 <- c(mat2,mat1)
Based on the expected output, we may select the columns of 'mat1' that contain no NA and concatenate (c) them with the values of 'mat2', each padded with NA at the end to the length nrow(mat1):
c(sapply(mat2, `length<-`, nrow(mat1)), mat1[,!apply(mat1, 2, anyNA)])
[1] 10 NA NA NA 30 NA NA NA 5 6 7 8 9 10 11 12
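A more explicit variant of the same idea is sketched below. It assumes (this is not stated in the answer) that mat2 always has exactly 2 columns and no more rows than mat1, and it uses the rule from the question that only the first two columns of mat1 are ever replaced. The names pad, merged and merged_vec are made up for this example.

```r
mat1 <- matrix(c(1, 2, NA, NA, 3, 4, NA, NA, 5:12), ncol = 4)
mat2 <- matrix(c(10, 30), ncol = 2)

# NA-filled block with as many rows as mat1 and the two replaceable columns
pad <- matrix(NA_real_, nrow = nrow(mat1), ncol = 2)
pad[seq_len(nrow(mat2)), ] <- mat2

# Columns 3+ of mat1 are kept unchanged
merged <- cbind(pad, mat1[, -(1:2), drop = FALSE])

# Flatten column by column into the vector form
merged_vec <- as.vector(merged)
```

Keeping the intermediate matrix merged around makes it easy to check the dimensions before flattening.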

Combining row vectors for data frame after using quantile function

Novice problem. I ran following command:
CI_95_outcomes_male <- data.frame(do.call(cbind,lapply(1:ncol(outcomes_male_dt), function(r) quantile(outcomes_male_dt[,r],c(.95)))))
and end up with this output:
CI_95_outcomes_male
X1 X2 X3 X4
95% 9629902039 0 2.968924e+15 2.968924e+15
I would like to combine this vector with following vector to end up with 2X4 matrix:
#
mean_outcomes_male
ylg_smoking_simS deaths_averted total_cig total_tax_
9.62990 0.0000 2.78248 2.782480
I tried:
CI_95_outcomes_male<-colnames(mean_outcomes_male)
data.frame(mean_outcomes_male,CI_95_outcomes_male)
Error in data.frame(mean_outcomes_male, CI_95_outcomes_male) :
arguments imply differing number of rows: 4, 0
Any guidance appreciated, thanks!
CI_95_outcomes_male<-colnames(mean_outcomes_male)
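I think you meant to wrap the left-hand side in colnames(), i.e. to assign to colnames(CI_95_outcomes_male) rather than overwrite CI_95_outcomes_male itself. But there's another problem here. I'm assuming that mean_outcomes_male is a vector, in which case colnames(mean_outcomes_male) is NULL (a named vector has names(), not colnames()).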
data.frame(mean_outcomes_male,CI_95_outcomes_male)
Even if CI_95_outcomes_male were correct, the above command would result in a 4x5 data frame, with the first column being the mean_outcomes_male vector, the second column being the CI_95_outcomes_male value for your first variable (repeated for each row), ..., and the fifth column being the CI_95_outcomes_male value for your fourth variable (repeated for each row).
You need to do something like this:
set.seed(42)
# Generate a random dataset for outcomes_male_dt with 4 variables and n rows
n <- 100
outcomes_male_dt <- data.frame(x1=runif(n),x2=runif(n),x3=runif(n),x4=runif(n))
# I'm assuming you want the 95th percentile of each variable in outcomes_male_dt and store them in CI_95_outcomes_male
ptl <- .95 # if you want to add other percentiles you can replace this with something like "ptl <- c(.10,.50,.90,.95)"
CI_95_outcomes_male <- apply(outcomes_male_dt,2,quantile,probs=ptl)
# I'm going to assume that mean_outcomes_male is a vector of means for all the variables in outcomes_male_dt
mean_outcomes_male <- colMeans(outcomes_male_dt)
# You want to end up with a 2x4 matrix - I'm assuming you meant row 1 will be the means, and row 2 will be the 95th percentiles, and the columns will be the variables
want <- rbind(mean_outcomes_male, CI_95_outcomes_male)
colnames(want) <- colnames(outcomes_male_dt)
row.names(want) <- c('Mean',paste0("p",ptl*100)) # paste0("p",ptl*100) is equivalent to paste("p",ptl*100,sep="")
want # Resulting matrix
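A more compact route to the same kind of 2x4 matrix might be to compute both statistics per column inside one sapply() call. This is a sketch on simulated data; dt and summary_mat are names invented for the example.

```r
set.seed(42)
# Simulated stand-in for outcomes_male_dt
dt <- data.frame(x1 = runif(100), x2 = runif(100),
                 x3 = runif(100), x4 = runif(100))

# Each column yields a named length-2 vector; sapply() binds them into a
# matrix with one row per statistic and one column per variable
summary_mat <- sapply(dt, function(x) {
  c(Mean = mean(x), p95 = unname(quantile(x, 0.95)))
})
```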

How to select one plot from coplot(), or how to plot a specific subset

I am trying to use coplot like this:
coplot(lat ~ long | depth, data = quakes)
Which produces this:
However, this will give a plot for each level of conditioning variable. I would like to get a whole scatter plot without showing the levels of variables. For example similar to this:
How can I customize this? Any help, please?
It appears that what you really want is to plot a subset. How you subset data varies a bit depending on what form it is in (vector, matrix, dataframe, list; numeric, logical, character). What follows will work with a numeric dataframe like quakes.
First of all you'll need to understand the function of []. If you search the help for Extract you'll find info on it and its siblings. [] is used to extract elements from vectors, matrices, dataframes and arrays of any dimension, based on indices of each dimension, with each dimension separated by a comma.
Say you have a vector vec <- c(1, 3, 2, 6, 4). A vector only has one dimension, so you don't have to worry about which dimension you are addressing. vec[3] will return 2, the value at place, or index, 3 in the vector, the third element from the left. vec[5] will return the fifth element: 4, while vec[c(2, 1, 4)] will return the second, first and fourth element, in that order.
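The vector examples above can be run directly; the variable names third, fifth and picked are only there to hold the results.

```r
vec <- c(1, 3, 2, 6, 4)

third  <- vec[3]           # single index: the third element
fifth  <- vec[5]           # the fifth element
picked <- vec[c(2, 1, 4)]  # several indices, returned in the order given
```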
When dealing with matrices and dataframes we are dealing with two dimensions, commonly referred to as rows and columns. When using [] the dimensions will always be separated like this: [rows, columns]. Lets create a matrix.
mat <- matrix(c(9, 8, 7,
                6, 5, 4,
                3, 2, 1), nrow=3, byrow=TRUE)
mat
#      [,1] [,2] [,3]
# [1,]    9    8    7
# [2,]    6    5    4
# [3,]    3    2    1
Along the top and the left side there are []'s with numbers in them referring to the index of each dimension. You'll also notice that along the side the number is in front of a comma, signifying row, while along the top the number stands after the comma, signifying column.
So what happens if I call mat[3, ] or mat[, 2]? Well I get every element in that respective row or column.
What about if we want to extract the 4 in the above matrix? Well its row is nr 2, and its column is nr 3, so could it simply be mat[2, 3]? Why yes, it is!
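As a quick check of the three cases just described (whole row, whole column, single element), here is the same matrix again with the results stored in illustrative variables:

```r
mat <- matrix(c(9, 8, 7,
                6, 5, 4,
                3, 2, 1), nrow = 3, byrow = TRUE)

row3 <- mat[3, ]   # every element of row 3
col2 <- mat[, 2]   # every element of column 2
four <- mat[2, 3]  # row 2, column 3
```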
We're nearly there now, only other thing to show is that we can reference rows and columns by name, not just by index. The above matrix doesn't have any dimension names, so we'll have to add them.
rownames(mat) <- c("r1", "r2", "r3")
colnames(mat) <- c("c1", "c2", "c3")
mat
#    c1 c2 c3
# r1  9  8  7
# r2  6  5  4
# r3  3  2  1
Now mat[, "c2"] will return the column called c2, and mat["r1", ] the row called r1.
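Name-based indexing follows the same [rows, columns] pattern, so the comma still matters. The sketch below also shows what happens when the comma is left out: a matrix has no element names, so a lone name matches nothing and the result is NA.

```r
mat <- matrix(c(9, 8, 7,
                6, 5, 4,
                3, 2, 1), nrow = 3, byrow = TRUE)
rownames(mat) <- c("r1", "r2", "r3")
colnames(mat) <- c("c1", "c2", "c3")

col_c2   <- mat[, "c2"]  # column called c2
row_r1   <- mat["r1", ]  # row called r1
no_comma <- mat["r1"]    # no comma: treated as an element name, gives NA
```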
Onto quakes.
Say we want to plot all earthquakes with a focus deeper than 400 km.
One straightforward way of extracting that subset could be
quakes.sub <- quakes[quakes[,"depth"] > 400, c("long", "lat")]
The only new thing here is >, which in this instance asks: "are the values in quakes[,"depth"] larger than 400?". For every yes a TRUE is returned, and for every no a FALSE. Only rows for which TRUE is returned will be included in the subset.
To plot the subset you only have to
plot(quakes.sub)
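To see what the comparison does on its own, it can help to store the logical vector first and inspect it before subsetting. This sketch uses the built-in quakes dataset; deep is just an illustrative name.

```r
# One TRUE/FALSE per row of quakes
deep <- quakes[, "depth"] > 400

# Rows where deep is TRUE are kept; only the two coordinate columns are returned
quakes.sub <- quakes[deep, c("long", "lat")]

# Number of earthquakes that passed the filter
n_deep <- sum(deep)
```

Because TRUE counts as 1, sum(deep) gives the number of selected rows, which is a cheap sanity check before plotting.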
This is only the start of course. You can use with() to avoid having to type quakes several times, using which() will deal with NA's more gracefully, and conditions (<, >, ==) can be stacked using logical operators (not: !, and: &, or: |), allowing for more advanced subsetting rules.
rows <- with(quakes, which(depth < 400 &
depth > 100 &
mag > 4.3 &
stations > 20))
quakes.sub <- quakes[rows, c("long", "lat")]
plot(quakes.sub)
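The remark about which() and NA's can be illustrated on a tiny made-up vector (quakes itself contains no missing values, so x here is purely hypothetical):

```r
x <- c(150, NA, 500, 320)

# A logical index with NA in it yields an NA element in the result
with_na <- x[x > 300]

# which() returns only the positions that are TRUE, so the NA is dropped
without_na <- x[which(x > 300)]
```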

Can R create a matrix and at the same time process the created columns to form a new one?

I am trying to create a new matrix from an existing one. Specifically, I want to subtract column A2 from column A1, A4 from A3 and A6 from A5 as in the example, but at the same time I want to create a new column that is the row mean of these results:
A <- matrix(c(2,3,-2,1,2,2,1,2,3,4,5,6,7,8,9,10,11,12),3,6)
dimnames(A) <- list(c("a","b","c"), c("A1","A2","A3","A4","A5","A6"))
B <- data.frame(minus12=A[, c("A1")]-A[, c("A2")],
minus34=A[, c("A3")]-A[, c("A4")],
minus56=A[, c("A5")]-A[, c("A6")],
mean=rowMeans(B[c("minus12","minus34","minus56")]))
I tried the above and it worked with such a small matrix. But with my actual data, which is a much bigger matrix A and matrix B has other operations calculated while it is created (means of rows from matrix A), this doesn't work. Instead, I get this message:
Error in is.data.frame(x) : object 'B' not found
I checked and didn't find anything different with the above, working, code. What could be the problem? How do I make this happen with a bigger, more complicated matrix?
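The error occurs because B is referenced inside the very data.frame() call that creates it; the small example only ran because an object called B already existed in the workspace. Instead, we can use recycling of a logical index to subtract the even columns from the odd ones, then cbind the result with its rowMeans.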
val <- A[,c(TRUE, FALSE)]-A[, c(FALSE, TRUE)]
colnames(val) <- paste0('minus', seq(1, ncol(A), by=2),
seq(2,ncol(A), by =2))
Anew <- cbind(val, Mean=rowMeans(val))
Anew
# minus12 minus34 minus56 Mean
#a 1 -3 -3 -1.666667
#b 1 -3 -3 -1.666667
#c -4 -3 -3 -3.333333
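If the data.frame layout from the question is preferred, a two-step version is one way to avoid the "object 'B' not found" error: create B first, then add the mean column once B exists. The last line is a small check that the recycled logical index c(TRUE, FALSE) picks the odd-numbered columns. This is a sketch, not the answer's own method.

```r
A <- matrix(c(2, 3, -2, 1, 2, 2, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12), 3, 6)
dimnames(A) <- list(c("a", "b", "c"), c("A1", "A2", "A3", "A4", "A5", "A6"))

# Step 1: create B with the differences only
B <- data.frame(minus12 = A[, "A1"] - A[, "A2"],
                minus34 = A[, "A3"] - A[, "A4"],
                minus56 = A[, "A5"] - A[, "A6"])

# Step 2: B now exists, so it can be used to compute the mean column
B$mean <- rowMeans(B[c("minus12", "minus34", "minus56")])

# c(TRUE, FALSE) is recycled across all six columns: keep 1, 3 and 5
odd_cols_match <- identical(A[, c(TRUE, FALSE)], A[, c(1, 3, 5)])
```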

Data Manipulation, Looping to add columns

I have asked this question a couple times without any help. I have since improved the code so I am hoping somebody has some ideas! I have a dataset full of 0's and 1's. I simply want to add the 10 columns together resulting in 1 column with 3835 rows. This is my code thus far:
# select for valid IDs
data = history[history$studyid %in% valid$studyid,]
sibling = data[,c('b16aa','b16ba','b16ca','b16da','b16ea','b16fa','b16ga','b16ha','b16ia','b16ja')]
# replace all NA values by 0
sibling[is.na(sibling)] <- 0
# loop over all columns and count the number of 174
apply(sibling, 2, function(x) sum(x==174))
The problem is this code adds together all the rows, I want to add together all the columns so I would result with 1 column. This is the answer I am now getting which is wrong:
b16aa b16ba b16ca b16da b16ea b16fa b16ga b16ha b16ia b16ja
68 36 22 18 9 5 6 5 4 1
In apply() you have the MARGIN set to 2, which is columns. Set the MARGIN argument to 1, so that your function, sum, will be applied across rows. This was mentioned by @sgibb.
If that doesn't work (I can't reproduce your example), you could try first converting the elements of the matrix to logicals with X2 <- apply(sibling, c(1,2), function(x) x==174), and then use rowSums to count the TRUE values in each row: Xsum <- rowSums(X2, na.rm=TRUE). With this setup you do not need to first change the NA's to 0's, as the na.rm argument of rowSums() handles them.
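Both suggestions can be compared on a small made-up data frame (the column names and values below are hypothetical, not the poster's data). The comparison sibling == 174 can also be applied to the whole data frame at once, which gives the same logical matrix as the apply() call over c(1,2).

```r
# Hypothetical stand-in for the 'sibling' data
sibling <- data.frame(b16aa = c(174, 0, NA),
                      b16ba = c(174, 174, 1))

# MARGIN = 1: one count per row
per_row_apply <- apply(sibling, 1, function(x) sum(x == 174, na.rm = TRUE))

# Vectorised equivalent: logical matrix, then row-wise count of TRUE
per_row_rowsums <- rowSums(sibling == 174, na.rm = TRUE)
```

Either result has one entry per row, so it can be attached to the data as a single new column.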
