Unused arguments when trying to create a matrix in R

I want to create a matrix like this:
dat <- matrix(
"an_no" = c(14, 17),
"an_yes" = c(3, 1),
row.names = c("TL-MCT-t", "ops"),
stringsAsFactors = FALSE
)
but I get an "unused arguments" error.
What did I do wrong, and how can I create a matrix with these arguments correctly?
as.matrix didn't help.
Thanks for your help.

You are using the arguments that you would use to build a data frame. If you want a matrix using this syntax you can do:
dat <- as.matrix(data.frame(
an_no = c(14, 17),
an_yes = c(3, 1),
row.names = c("TL-MCT-t", "ops")))
dat
#>          an_no an_yes
#> TL-MCT-t    14      3
#> ops         17      1
You don't need stringsAsFactors = FALSE, because none of your data elements are strings; in any case, stringsAsFactors has defaulted to FALSE since R 4.0.0, so it only matters on older versions of R. You also don't need quotation marks around an_no and an_yes, because both are syntactically valid names in R.
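For comparison, a minimal sketch that skips the data frame altogether, assuming the same values as in the question: cbind() accepts named vectors and turns the names into column names, and the row names are then set in a separate step.

```r
# Named arguments to cbind() become the column names
dat <- cbind(an_no = c(14, 17), an_yes = c(3, 1))

# Row names have to be added afterwards
rownames(dat) <- c("TL-MCT-t", "ops")

dat
dat["ops", "an_no"]
```

The result is an ordinary numeric matrix, so no as.matrix() conversion is needed.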

The matrix function has this structure:
matrix(data = NA,
nrow = 1,
ncol = 1,
byrow = FALSE,
dimnames = NULL)
It appears you're trying to create a data.frame:
data.frame(row.names = c("TL-MCT-t", "ops"),
an_no = c(14,17),
an_yes = c(3,1)
)
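If a matrix is what you actually need, here is a sketch that follows the matrix() signature directly, assuming the question's values: pass all four numbers as data (matrix() fills column by column because byrow = FALSE by default) and give both sets of labels through dimnames, which must be a list of row names followed by column names.

```r
# Filled column-wise: 14, 17 go into column 1 and 3, 1 into column 2
dat <- matrix(
  c(14, 17, 3, 1),
  nrow = 2,
  ncol = 2,
  dimnames = list(c("TL-MCT-t", "ops"), c("an_no", "an_yes"))
)
dat
```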

Related

Efficiently populating rows given possible values for each variable in R

I have a dataframe with 42 variables, each of which has its own set of possible values. I am aiming to create a much larger dataframe that contains one row for every possible combination of the variables' values.
This will be millions of rows long and too large to hold in RAM. I have therefore been trying to write a script that appends each combination to an existing file. The following code works, but it is too slow to be practical (and includes only 5 variables), taking just under 5 minutes to run on my machine.
V1 <- c(seq(0, 30, 1), NA)
V2 <- c(seq(20, 55, 1), NA)
V3 <- c(0, 1, NA)
V4 <- c(seq(1, 16, 1), NA)
V5 <- c(seq(15, 170, 1), NA)
df_empty <- data.frame(V1 = NA, V2 = NA, V3 = NA, V4 = NA)
write.csv(df_empty, "table_out.csv", row.names = FALSE)
start <- Sys.time()
for(v1 in 1:length(V1)){
  V1_val <- V1[v1]
  for(v2 in 1:length(V2)){
    V2_val <- V2[v2]
    for(v3 in 1:length(V3)){
      V3_val <- V3[v3]
      for(v4 in 1:length(V4)){
        V4_val <- V4[v4]
        row <- cbind(V1_val, V2_val, V3_val, V4_val)
        write.table(as.matrix(row), file = "table_out.csv", sep = ",", append = TRUE, quote = FALSE, col.names = FALSE, row.names = FALSE)
      }
    }
  }
}
print(abs(Sys.time() - start)) # 4.8 minutes
print(paste(nrow(read.csv("table_out.csv")), "rows in file"))
I have tried data.table::fwrite(), but it was no faster than write.table(as.matrix(x)).
I'm sure the issue I have is with using so many for loops but am unsure how to translate this into a more efficient approach.
Thanks
You can try the following code to generate all combinations in one step:
# "$" anchors the pattern so that loop helpers such as V1_val are not picked up
M <- as.matrix(do.call(expand.grid, mget(x = ls(pattern = "^V\\d+$"))))
and then save M to your designated file, e.g.,
write.table(M, file = "table_out.csv", sep = ",", append = TRUE, quote = FALSE,col.names = FALSE, row.names = FALSE)
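To see what expand.grid() produces, and one way to cope with the memory limit mentioned in the question, here is a sketch on small hypothetical stand-ins for V1..V5. The number of rows is the product of the vector lengths, which is why 42 variables quickly becomes unmanageable. The helper write_grid_in_chunks() is hypothetical (not part of any package): it writes one block per value of the first variable, so only that block plus the grid of the remaining variables is in memory at once. It assumes at least two variables and that the grid of the remaining ones fits in RAM; with 42 variables you would need to split over more than one variable.

```r
# Hypothetical small stand-ins for the question's V1..V5 vectors
vars <- list(V1 = c(0, 1, NA), V2 = c(20, 21), V3 = c(0, 1, NA))

# expand.grid() also accepts a single named list; the first column varies fastest
grid <- expand.grid(vars, KEEP.OUT.ATTRS = FALSE)
nrow(grid) == prod(lengths(vars))  # one row per combination: 3 * 2 * 3 = 18

# Hypothetical helper: write the grid in blocks, one per value of the first variable
write_grid_in_chunks <- function(vars, file) {
  # Grid of all variables except the first (assumed to fit in memory)
  rest <- expand.grid(vars[-1], KEEP.OUT.ATTRS = FALSE)
  # Header line with the variable names
  writeLines(paste(names(vars), collapse = ","), file)
  for (v in vars[[1]]) {
    # The single value v is recycled over every row of the remaining grid
    block <- data.frame(v, rest)
    names(block) <- names(vars)
    write.table(block, file = file, sep = ",", append = TRUE, quote = FALSE,
                col.names = FALSE, row.names = FALSE)
  }
  invisible(file)
}

out <- tempfile(fileext = ".csv")
write_grid_in_chunks(vars, out)
nrow(read.csv(out))  # 18 rows, but ordered with V1 varying slowest
```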

Cannot assign names when converting matrix using as.data.frame

I want to generate a two-column data frame that has a given correlation. The two columns should be named "x" and "y". There are tons of ways to do this, sampling from a multivariate normal distribution being one. So, for 10 rows with correlation r = 0.56, this works:
library(MASS)  # mvrnorm() comes from the MASS package
myFrame <- as.data.frame(mvrnorm(10, mu = c(0,0), Sigma = matrix(c(1,0.56,0.56,1), ncol = 2), empirical = TRUE))
myFrame
...but you'll notice that the column names are V1 and V2. I've read what I think is relevant in the docs, but I can't get the names to change.
I've tried using col.names = c("x", "y") in various places. It seems it would go between the final two closing parentheses, but I've tried other places too. Even though I didn't think it correct, I tried names = c("x, y") as well, to no avail.
I understand I could use a second step to change the names, but since as.data.frame() accepts a vector to name the columns, I shouldn't have to resort to that.
As pointed out in comments, neither data.frame() nor the matrix method for as.data.frame have an argument to let you set column names.
The standard way, as you say, would be to set the names of the object in a second line of code. If that is abhorrent to you, you can still get it done in a single line. Here are two options:
myFrame1 = as.data.frame("colnames<-"(mvrnorm(10, mu = c(0,0), Sigma = matrix(c(1, 0.56, 0.56, 1), ncol = 2), empirical = TRUE), c("x", "y")))
myFrame2 = setNames(as.data.frame(mvrnorm(10, mu = c(0,0), Sigma = matrix(c(1, 0.56, 0.56, 1), ncol = 2), empirical = TRUE)), c("x", "y"))
# I would prefer using two lines, much clearer:
myFrame3 = as.data.frame(mvrnorm(10, mu = c(0,0), Sigma = matrix(c(1, 0.56, 0.56, 1), ncol = 2), empirical = TRUE))
names(myFrame3) = c("x", "y")
# Or, if you're a fan of pipes:
library(magrittr)
myFrame4 = mvrnorm(
10,
mu = c(0,0),
Sigma = matrix(c(1, 0.56, 0.56, 1), ncol = 2),
empirical = TRUE
) %>%
as.data.frame %>%
setNames(c("x", "y"))
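Another single-step option, assuming MASS::mvrnorm() behaves as in current MASS releases: it appears to label its output columns with names(mu) (falling back to the dimnames of Sigma), so naming the means may be enough to get x and y without any renaming. The seed is only there to make the sketch reproducible.

```r
library(MASS)

set.seed(1)
# Naming the elements of mu gives the output matrix column names x and y
myFrame5 <- as.data.frame(mvrnorm(
  10,
  mu = c(x = 0, y = 0),
  Sigma = matrix(c(1, 0.56, 0.56, 1), ncol = 2),
  empirical = TRUE
))

names(myFrame5)
# With empirical = TRUE the sample correlation equals 0.56 up to rounding error
cor(myFrame5$x, myFrame5$y)
```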
When looking at ?as.data.frame, these are the methods described:
## S3 method for class 'character'
as.data.frame(x, ...,
stringsAsFactors = default.stringsAsFactors())
## S3 method for class 'list'
as.data.frame(x, row.names = NULL, optional = FALSE, ...,
cut.names = FALSE, col.names = names(x), fix.empty.names = TRUE,
stringsAsFactors = default.stringsAsFactors())
## S3 method for class 'matrix'
as.data.frame(x, row.names = NULL, optional = FALSE,
make.names = TRUE, ...,
stringsAsFactors = default.stringsAsFactors())
Notice that the matrix method does not have a col.names argument. Only the list method does. So in converting a list to a data.frame, you can use col.names, but not converting a matrix.
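A sketch of what that difference means in practice, using a small made-up matrix in place of the mvrnorm() output: if the matrix is first split into a plain list of its columns, as.data.frame() dispatches to the list method, where col.names is a real argument.

```r
# A small numeric matrix standing in for the mvrnorm() output
m <- matrix(c(0.1, -0.3, 1.2, 0.8), ncol = 2)

# Split the matrix into an unnamed list of its columns ...
cols <- lapply(seq_len(ncol(m)), function(j) m[, j])

# ... so the list method is used, and col.names names the columns
df <- as.data.frame(cols, col.names = c("x", "y"))
names(df)
```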

R: fill an empty matrix with lists and keep row and column names

I have an empty matrix of the following form:
Empty_Matrix = matrix( NA ,nrow = 3, ncol = 2, byrow = TRUE, dimnames = list(c("a","b","c"),c("aa","bb")) )
  aa bb
a NA NA
b NA NA
c NA NA
I would like to fill each element of this matrix with another matrix, e.g.:
Empty_Matrix[,] = list(matrix(0,nrow = 4, ncol=1))
This actually works, although I lose the row and column name structure, as shown in the following console screenshot:
By contrast, if I use the following lines of code
Empty_Matrix = matrix( list(matrix(0,nrow = 4, ncol=2)) ,nrow = 3, ncol = 2, byrow = TRUE, dimnames = list(c("a","b","c"),c("aa","bb")))
I get the desired output:
My question is whether it is possible to use a similar line of code, such as
Empty_Matrix[,] = list(matrix(0,nrow = 4, ncol=1))
(where Empty_Matrix has already been created with NA elements), and get the console output of the second image.
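One approach that should keep the labels, sketched under the assumption that the NA matrix only serves as a template for the shape and names: rebuild it as a list matrix with the same dim and dimnames, then assign. Because the target is already of mode list, the [,] assignment simply fills each cell with its own copy of the 4 x 1 matrix.

```r
Empty_Matrix <- matrix(NA, nrow = 3, ncol = 2, byrow = TRUE,
                       dimnames = list(c("a", "b", "c"), c("aa", "bb")))

# Rebuild the NA matrix as a list matrix with the same shape and labels
Empty_Matrix <- matrix(vector("list", length(Empty_Matrix)),
                       nrow = nrow(Empty_Matrix), ncol = ncol(Empty_Matrix),
                       dimnames = dimnames(Empty_Matrix))

# The length-1 list is recycled, so every cell receives a 4 x 1 zero matrix
Empty_Matrix[, ] <- list(matrix(0, nrow = 4, ncol = 1))

Empty_Matrix
Empty_Matrix[["b", "bb"]]
```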

Heatmap returning error: 'x' must be a numeric matrix, but x is a numeric matrix

I am trying to create a heatmap of species abundances across six sites.
I have a matrix of sites vs species, of numeric abundance data.
However when I run my code, R returns an error that my matrix is non-numeric.
Can anyone figure this one out? I am stumped.
Exported dataframe link: log_mean_wide
Working:
lrc <- rainbow(nrow(log_mean_wide), start = 0, end = .3)
lcc <- rainbow(ncol(log_mean_wide), start = 0, end = .3)
logmap <- heatmap(log_mean_wide, col = cm.colors(256), scale = "column",
RowSideColors = lrc, ColSideColors = lcc, margins = c(5, 10),
xlab = "species", ylab = "Site",
main = "heatmap(<Auckland Council MCI data 1999, habitat:bank>, ..., scale = \"column\")")
error message: Error in heatmap(log_mean_wide, Rowv = NA, Colv = NA, col = cm.colors(256), : 'x' must be a numeric matrix
log_heatmap <- heatmap(log_mean_wide, Rowv=NA, Colv=NA, col = cm.colors(256), scale="column", margins=c(5,10)) #same error
is.numeric(log_mean_wide) #[1] FALSE
is.character(log_mean_wide) #[1] FALSE
is.factor(log_mean_wide) #[1] FALSE
is.logical(log_mean_wide) #[1] FALSE
is.integer(log_mean_wide) #[1] FALSE
?!?!
dims <- dim(log_mean_wide)
log_mean_matrix <- as.numeric(log_mean_wide)
dim(log_mean_matrix) <- dims
Error: (list) object cannot be coerced to type 'double'
str(log_mean_wide) shows the species columns as numeric and Site as character, so why does this not work?
storage.mode(log_mean_wide) <- "numeric"
Error in storage.mode(log_mean_wide) <- "numeric" : (list) object cannot be coerced to type 'double'
There are two issues:
1. The first column, log_mean_wide$Site, is non-numeric.
2. heatmap only accepts a matrix as input data (not a data.frame).
To address these issues, you can do the following (mind you, there is a lot of clutter in the heatmap):
# Store Site information as rownames
# (as.data.frame() also turns a tibble into a plain data.frame)
df <- as.data.frame(log_mean_wide)
rownames(df) <- df[[1]]
# Remove non-numeric column
df <- df[, -1]
# Use as.matrix to convert data.frame to matrix
logmap <- heatmap(
as.matrix(df),
col = cm.colors(256),
scale = "column",
margins = c(5, 10),
xlab = "species", ylab = "Site",
main = "heatmap(<Auckland Council MCI data 1999, habitat:bank>, ..., scale = \"column\")")
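Since log_mean_wide itself is not available here, the following sketch uses a hypothetical toy data frame with the same shape (a character Site column followed by numeric species columns; the values are made up). It also shows why all the is.*() checks in the question returned FALSE: a data.frame is a list, so is.numeric() is FALSE regardless of its column types.

```r
# Hypothetical toy stand-in for log_mean_wide
log_mean_wide <- data.frame(
  Site = c("S1", "S2", "S3"),
  sp_a = c(1.2, 0.4, 2.1),
  sp_b = c(0.0, 1.5, 0.7),
  sp_c = c(0.9, 0.3, 1.8),
  stringsAsFactors = FALSE
)

# A data.frame is a list, so this is FALSE whatever the column types are
is.numeric(log_mean_wide)

# Converting every column would give a character matrix because of Site
is.character(as.matrix(log_mean_wide))

# Keep only the numeric columns and move Site into the row names
m <- as.matrix(log_mean_wide[, -1])
rownames(m) <- log_mean_wide$Site
is.numeric(m)

heatmap(m, col = cm.colors(256), scale = "column", margins = c(5, 10),
        xlab = "species", ylab = "Site")
```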
This is an old question but since I spent some time figuring out what the issue was, I will add an answer here. Drawing a heatmap, specifically adding an annotation column may fail if the annotation data is a tibble.
Reproducible example:
test = matrix(rnorm(200), 20, 10)
test[1:10, seq(1, 10, 2)] = test[1:10, seq(1, 10, 2)] + 3
test[11:20, seq(2, 10, 2)] = test[11:20, seq(2, 10, 2)] + 2
test[15:20, seq(2, 10, 2)] = test[15:20, seq(2, 10, 2)] + 4
colnames(test) = paste("Test", 1:10, sep = "")
rownames(test) = paste("Gene", 1:20, sep = "")
annotation_col = data.frame(
CellType = factor(rep(c("CT1", "CT2"), 5)),
Time = 1:5
)
rownames(annotation_col) = paste("Test", 1:10, sep = "")
pheatmap::pheatmap(test, annotation_col =
annotation_col)
The above works. However, if you instead were using a tibble, you would get an error
pheatmap::pheatmap(test, annotation_col =
dplyr::as_tibble(annotation_col))
Error in cut.default(a, breaks = 100) : 'x' must be numeric
NOTE
I think it would have been better if this error stated that a data.frame is required.
See this issue.

Why is my following matrix statement failing

matrix_1 = matrix(rep(c("p","r"),6), c(rep("control",6), rep("concussion",6)),
nrow = 12, ncol = 2)
It says "invalid 'byrow' argument" (I want the matrix filled by column, and byrow is FALSE by default). I want the first column to have p and r repeated 6 times, for a total of 12 rows, and the second column to have control in the first 6 rows and concussion in the next 6.
The usage of matrix is
matrix(data = NA, nrow = 1, ncol = 1, byrow = FALSE,
dimnames = NULL)
So, when the data are split across two arguments as in the OP's post, the second vector ends up matched to an argument that was not named. In the OP's code, the first unnamed input is matched to data:
data = rep(c("p","r"),6)
nrow and ncol are given by name, so the remaining unnamed input is matched positionally to the next unmatched argument in the signature, byrow, i.e.
c(rep("control",6), rep("concussion",6))
is taken as the byrow argument. Since byrow must be a single logical value, this causes the error.
matrix_1 = matrix(rep(c("p","r"),6), c(rep("control",6), rep("concussion",6)),
nrow = 12, ncol = 2)
Error in matrix(rep(c("p", "r"), 6), c(rep("control", 6),
rep("concussion", : invalid 'byrow' argument
If we also specify byrow = FALSE, the second vector is matched to the next unmatched argument, dimnames, and the error changes accordingly:
matrix_1 = matrix(rep(c("p","r"),6), c(rep("control",6), rep("concussion",6)),
nrow = 12, ncol = 2, byrow = FALSE)
Error in matrix(rep(c("p", "r"), 6), c(rep("control", 6),
rep("concussion", : 'dimnames' must be a list
As matrix() takes a single data argument, we need to concatenate both vectors into it:
matrix_1 <- matrix(data = c(rep(c("p","r"),6),c(rep("control",6),rep("concussion",6))),
nrow=12,ncol=2)
Now, as byrow = FALSE by default, the matrix is filled column by column: the first 12 values form column 1 and the last 12 form column 2.
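An equivalent sketch, offered as an alternative rather than a correction, builds the two columns separately with cbind(), which avoids counting positions in a single concatenated vector. rep(..., each = 6) repeats each label six times in a row; matrix_2 is just an illustrative name.

```r
# Column 1: p, r repeated; column 2: six "control" then six "concussion"
matrix_2 <- cbind(rep(c("p", "r"), 6), rep(c("control", "concussion"), each = 6))

# The matrix() version from the answer, for comparison
matrix_1 <- matrix(data = c(rep(c("p", "r"), 6), c(rep("control", 6), rep("concussion", 6))),
                   nrow = 12, ncol = 2)

identical(matrix_1, matrix_2)
```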
