Cannot assign names when converting matrix using as.data.frame - r

I want to generate a two-column data frame whose columns have a given correlation, with the columns named "x" and "y". There are many ways to do this, sampling from a multivariate normal distribution being one. So, for 10 rows with correlation r = 0.56, this works:
library(MASS)  # provides mvrnorm()
myFrame <- as.data.frame(mvrnorm(10, mu = c(0,0), Sigma = matrix(c(1,0.56,0.56,1), ncol = 2), empirical = TRUE))
myFrame
...but you'll notice that the column names are V1 and V2. I've read what I think is relevant in the docs, but I can't get the names to change.
I've tried using col.names = c("x", "y") in various places. It seems it would go between the final two closed parentheses, but I've tried other places. Even though I didn't think it correct, I tried names = c("x, y") as well, to no avail.
I understand I could use a second step to change the names, but since as.data.frame() accepts a vector to name the columns, I shouldn't have to resort to that.

As pointed out in the comments, neither data.frame() nor the matrix method for as.data.frame has an argument that lets you set column names.
The standard way, as you say, would be to set the names of the object in a second line of code. If that is abhorrent to you, you can still get it done in a single line. Here are two options:
myFrame1 = as.data.frame("colnames<-"(mvrnorm(10, mu = c(0,0), Sigma = matrix(c(1, 0.56, 0.56, 1), ncol = 2), empirical = TRUE), c("x", "y")))
myFrame2 = setNames(as.data.frame(mvrnorm(10, mu = c(0,0), Sigma = matrix(c(1, 0.56, 0.56, 1), ncol = 2), empirical = TRUE)), c("x", "y"))
# I would prefer using two lines, much clearer:
myFrame3 = as.data.frame(mvrnorm(10, mu = c(0,0), Sigma = matrix(c(1, 0.56, 0.56, 1), ncol = 2), empirical = TRUE))
names(myFrame3) = c("x", "y")
# Or, if you're a fan of pipes:
library(magrittr)
myFrame4 = mvrnorm(
10,
mu = c(0,0),
Sigma = matrix(c(1, 0.56, 0.56, 1), ncol = 2),
empirical = TRUE
) %>%
as.data.frame %>%
setNames(c("x", "y"))
When looking at ?as.data.frame, these are the methods described:
## S3 method for class 'character'
as.data.frame(x, ...,
stringsAsFactors = default.stringsAsFactors())
## S3 method for class 'list'
as.data.frame(x, row.names = NULL, optional = FALSE, ...,
cut.names = FALSE, col.names = names(x), fix.empty.names = TRUE,
stringsAsFactors = default.stringsAsFactors())
## S3 method for class 'matrix'
as.data.frame(x, row.names = NULL, optional = FALSE,
make.names = TRUE, ...,
stringsAsFactors = default.stringsAsFactors())
Notice that the matrix method does not have a col.names argument; only the list method does. So you can use col.names when converting a list to a data.frame, but not when converting a matrix.
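To make the difference between the two methods concrete, here is a minimal base-R sketch. The small matrix `m` is made up for illustration and stands in for the `mvrnorm()` output; everything else uses only the arguments listed above.

```r
# Hypothetical 3 x 2 matrix standing in for the mvrnorm() result
m <- matrix(c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6), ncol = 2)

# List method: col.names is a documented argument, so it is honoured
from_list <- as.data.frame(list(m[, 1], m[, 2]), col.names = c("x", "y"))

# Matrix method: no col.names argument, so name the matrix columns first
named <- m
colnames(named) <- c("x", "y")
from_matrix <- as.data.frame(named)

names(from_list)
names(from_matrix)
```

Both data frames should end up with the columns named x and y; only the route differs.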

Related

unused arguments when trying to create a matrix in R

I want to create a matrix like this:
dat <- matrix(
"an_no" = c(14, 17),
"an_yes" = c(3, 1),
row.names = c("TL-MCT-t", "ops"),
stringsAsFactors = FALSE
)
but I get an "unused arguments" error.
What did I do wrong, and how do I create a matrix with these arguments?
as.matrix didn't help.
Thanks for your help.
You are using the arguments that you would use to build a data frame. If you want a matrix using this syntax you can do:
dat <- as.matrix(data.frame(
an_no = c(14, 17),
an_yes = c(3, 1),
row.names = c("TL-MCT-t", "ops")))
dat
#> an_no an_yes
#> TL-MCT-t 14 3
#> ops 17 1
You don't need the stringsAsFactors = FALSE because none of your data elements are strings, and in any case, stringsAsFactors is FALSE by default unless you are using an old version of R. You also don't need quotation marks around an_no and an_yes because these are both legal variable names in R.
The matrix function's structure is this:
matrix(data = NA,
nrow = 1,
ncol = 1,
byrow = FALSE,
dimnames = NULL)
It appears you're trying to create a data.frame:
data.frame(row_names = c("TL-MCT-t", "ops"),
an_no = c(14,17),
an_yes = c(3,1)
)
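For completeness, the dimnames argument in the matrix() signature is what lets matrix() itself carry row and column labels. A minimal sketch using the values from the question, assuming the data are filled column by column (the default, byrow = FALSE):

```r
# Values are listed column by column: an_no first, then an_yes
dat <- matrix(
  c(14, 17, 3, 1),
  nrow = 2,
  dimnames = list(c("TL-MCT-t", "ops"), c("an_no", "an_yes"))
)
dat
```

This avoids the detour through data.frame() when all values are numeric anyway.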

Perform an operation with complete cases without changing the original vectors

I would like to calculate a rank-biserial correlation, but the package (the only one, it seems) can't handle missing values well. It has no built-in "na.omit = TRUE" option. I could remove the missing values from the data frame, but that would be a hassle with many different calculations.
n <- 500
df <- data.frame(id = seq (1:n),
ord = sample(c(0:3), n, rep = TRUE),
sex = sample(c("m", "f"), n, rep = TRUE, prob = c(0.55, 0.45))
)
df <- as.data.frame(apply (df, 2, function(x) {x[sample( c(1:n), floor(n/10))] <- NA; x} ))
library(rcompanion)
wilcoxonRG(x = df$ord, g = df$sex, verbose = T)
I imagine something stupidly easy like "complete.cases(wilcoxonRG(x = df$ord, g = df$sex, verbose = T))". It's probably not that hard, but I could only find complete data frame manipulations. Thanks in advance!
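One way to approach this, sketched in base R: rather than wrapping the result in complete.cases(), compute the complete cases first and pass only those observations to the function. The helper name `with_complete` and the stand-in statistic `group_means` are hypothetical, and the commented call to wilcoxonRG() at the end is untested here.

```r
# Hypothetical helper: call f(x = ..., g = ...) on the pairwise-complete
# observations only; the caller's vectors are not modified.
with_complete <- function(f, x, g, ...) {
  ok <- complete.cases(x, g)
  f(x = x[ok], g = g[ok], ...)
}

# Small made-up data with missing values in both vectors
ord <- c(0, 3, NA, 2, 1, 2)
sex <- c("m", "f", "f", NA, "m", "f")

# Stand-in for the real statistic: group means of x by g
group_means <- function(x, g) tapply(x, g, mean)

res <- with_complete(group_means, ord, sex)
res

# Untested assumption: the same wrapper should apply to the call in the question:
# with_complete(rcompanion::wilcoxonRG, df$ord, df$sex, verbose = TRUE)
```

Because the subsetting happens inside the helper, the original vectors keep their NA values and can be reused for other calculations.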

Efficiently populating rows given possible values for each variable in R

I have a dataframe with 42 variables, each of which has different possible values. I am aiming to create a much larger dataframe that contains a row for each possible combination of values of the variables.
This will be millions of rows long and too large to hold in RAM. I have therefore been trying to make a script which appends each possible value to an existing file. The following code works but does so too slowly to be practical (also includes only 5 variables), taking just under 5 minutes to run on my machine.
V1 <- c(seq(0, 30, 1), NA)
V2 <- c(seq(20, 55, 1), NA)
V3 <- c(0, 1, NA)
V4 <- c(seq(1, 16, 1), NA)
V5 <- c(seq(15, 170, 1), NA)
df_empty <- data.frame(V1 = NA, V2 = NA, V3 = NA, V4 = NA)
write.csv(df_empty, "table_out.csv", row.names = FALSE)
start <- Sys.time()
for(v1 in 1:length(V1)){
V1_val <- V1[v1]
for(v2 in 1:length(V2)){
V2_val <- V2[v2]
for(v3 in 1:length(V3)){
V3_val <- V3[v3]
for(v4 in 1:length(V4)){
V4_val <- V4[v4]
row <- cbind(V1_val, V2_val, V3_val, V4_val)
write.table(as.matrix(row), file = "table_out.csv", sep = ",", append = TRUE, quote = FALSE,col.names = FALSE, row.names = FALSE)
}
}
}
}
print(abs(Sys.time() - start)) # 4.8 minutes
print(paste(nrow(read.csv("table_out.csv")), "rows in file"))
I have tested using data.table::fwrite() but this failed to be any faster than write.table(as.matrix(x))
I'm sure the issue I have is with using so many for loops but am unsure how to translate this into a more efficient approach.
Thanks
I guess you can try the following code to generate all combinations
M <- as.matrix(do.call(expand.grid,mget(x = ls(pattern = "^V\\d+"))))
and then you can save M to your designated file, e.g.,
write.table(M, file = "table_out.csv", sep = ",", append = TRUE, quote = FALSE,col.names = FALSE, row.names = FALSE)
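If the full grid really is too large for RAM, a middle ground is to keep one outer loop and let expand.grid() build the remaining combinations in chunks. A minimal sketch, assuming shortened stand-in vectors and a temporary output file (both hypothetical, not from the question):

```r
# Shortened stand-ins for the question's variables
V1 <- c(0:2, NA)
V2 <- c(20:22, NA)
V3 <- c(0, 1, NA)

out_file <- tempfile(fileext = ".csv")

# Write the header once
writeLines("V1,V2,V3", out_file)

# One chunk per value of V1; only that chunk is held in memory at a time
for (v1 in V1) {
  chunk <- expand.grid(V1 = v1, V2 = V2, V3 = V3)
  write.table(chunk, file = out_file, sep = ",", append = TRUE,
              quote = FALSE, col.names = FALSE, row.names = FALSE)
}

n_rows <- nrow(read.csv(out_file))
n_rows
```

Each write.table() call now writes a whole block of rows instead of a single row, which should remove most of the per-row file overhead of the nested loops.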

Heatmap returning error: 'x' must be a numeric matrix, but x is a numeric matrix

I am trying to create a heatmap of species abundances across six sites.
I have a matrix of sites vs species, of numeric abundance data.
However when I run my code, R returns an error that my matrix is non-numeric.
Can anyone figure this one out? I am stumped.
Exported dataframe link: log_mean_wide
Working:
lrc <- rainbow(nrow(log_mean_wide), start = 0, end = .3)
lcc <- rainbow(ncol(log_mean_wide), start = 0, end = .3)
logmap <- heatmap(log_mean_wide, col = cm.colors(256), scale = "column",
RowSideColors = lrc, ColSideColors = lcc, margins = c(5, 10),
xlab = "species", ylab = "Site",
main = "heatmap(<Auckland Council MCI data 1999, habitat:bank>, ..., scale = \"column\")")
error message: Error in heatmap(log_mean_wide, Rowv = NA, Colv = NA, col = cm.colors(256), : 'x' must be a numeric matrix
log_heatmap <- heatmap(log_mean_wide, Rowv=NA, Colv=NA, col = cm.colors(256), scale="column", margins=c(5,10)) #same error
is.numeric(log_mean_wide) #[1] FALSE
is.character(log_mean_wide) #[1] FALSE
is.factor(log_mean_wide) #[1] FALSE
is.logical(log_mean_wide) #[1] FALSE
is.integer(log_mean_wide) #[1] FALSE
?!?!
dims <- dim(log_mean_wide)
log_mean_matrix <- as.numeric(log_mean_wide)
dim(log_mean_matrix) <- dims
Error: (list) object cannot be coerced to type 'double'
str(log_mean_wide) shows species as numeric and site as character. Why does this not work, then?
storage.mode(log_mean_wide) <- "numeric"
Error in storage.mode(log_mean_wide) <- "numeric" : (list) object cannot be coerced to type 'double'
There are two issues:
1. The first column, log_mean_wide$Site, is non-numeric.
2. heatmap only accepts a matrix as input data (not a data.frame).
To address these issues, you can do the following (mind you, there is a lot of clutter in the heatmap):
# Store Site information as rownames
df <- log_mean_wide
rownames(df) <- log_mean_wide[, 1]
# Remove non-numeric column
df <- df[, -1]
# Use as.matrix to convert data.frame to matrix
logmap <- heatmap(
as.matrix(df),
col = cm.colors(256),
scale = "column",
margins = c(5, 10),
xlab = "species", ylab = "Site",
main = "heatmap(<Auckland Council MCI data 1999, habitat:bank>, ..., scale = \"column\")")
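The checks in the question all return FALSE because a data frame is a list of columns, not an atomic vector. A minimal sketch of how to inspect and convert such an object, using a small made-up data frame in place of log_mean_wide (the column names are hypothetical):

```r
# Made-up stand-in: one character column plus numeric abundance columns
wide <- data.frame(
  Site = c("A", "B", "C"),
  sp1 = c(0.1, 1.2, 0.7),
  sp2 = c(2.0, 0.0, 1.5)
)

is.numeric(wide)          # FALSE: a data frame is a list, not a numeric vector
sapply(wide, is.numeric)  # per-column check reveals the character column

# Keep only the numeric columns and carry the site labels over as row names
num_cols <- sapply(wide, is.numeric)
m <- as.matrix(wide[, num_cols, drop = FALSE])
rownames(m) <- wide$Site

is.matrix(m) && is.numeric(m)
```

Selecting columns by type rather than by position means the conversion does not depend on the site column being first.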
This is an old question, but since I spent some time figuring out what the issue was, I will add an answer here. Drawing a heatmap, specifically adding an annotation column, may fail if the annotation data is a tibble.
Reproducible example:
test = matrix(rnorm(200), 20, 10)
test[1:10, seq(1, 10, 2)] = test[1:10, seq(1, 10, 2)] + 3
test[11:20, seq(2, 10, 2)] = test[11:20, seq(2, 10, 2)] + 2
test[15:20, seq(2, 10, 2)] = test[15:20, seq(2, 10, 2)] + 4
colnames(test) = paste("Test", 1:10, sep = "")
rownames(test) = paste("Gene", 1:20, sep = "")
annotation_col = data.frame(
CellType = factor(rep(c("CT1", "CT2"), 5)),
Time = 1:5
)
rownames(annotation_col) = paste("Test", 1:10, sep = "")
pheatmap::pheatmap(test, annotation_col =
annotation_col)
The above works. However, if you were using a tibble instead, you would get an error:
pheatmap::pheatmap(test, annotation_col =
dplyr::as_tibble(annotation_col))
Error in cut.default(a, breaks = 100) : 'x' must be numeric
NOTE
I think it would have been better for this error to specify that we needed a data.frame instead of something else. That might have been more specific.
See this issue.

R using lapply, sapply and vapply to get named list of dataframes

I want to use a functional programming approach to solve this problem:
prob.designs= list()
prob.designs$UF1 = data.table(sigma = c(0, 0.01, 0.1, 1))
prob.designs$UF2 = data.table(sigma = c(0, 0.01, 0.1, 1))
...
prob.designs$UF10 = data.table(sigma = c(0, 0.01, 0.1, 1))
I tried the following code, which generates a list of data frames, but the list's elements are not named UF1, UF2, ...
prob.designs = lapply(paste("UF", 1:10, sep = ""), FUN = function(colname){
prob.designs[[eval(colname)]]=data.table(sigma = c(0, 0.01, 0.1 ,1))}
)
sapply works, but in the wrong way, since the names of the list become UF1.sigma rather than UF1:
prob.designs = sapply(paste("UF", 1:10, sep = ""), FUN = function(colname){
prob.designs[[eval(colname)]]=data.table(sigma = c(0, 0.01, 0.1 ,1))}
)
Finally, I tried vapply, but this is even stranger:
prob.designs = vapply(paste("UF", 1:10, sep = ""), FUN = function(colname){
prob.designs[[eval(colname)]]=data.table(sigma = c(0, 0.01, 0.1 ,1))}
, FUN.VAL = data.table(sigma = c(0, 0.01, 0.1 ,1))
)
Now the names of the list are right (UF1, UF2, ..., UF10),
but the elements of the list are no longer data frames!
Could someone help explain what happened in the three cases and, most importantly, what the correct way to do this is?
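One possible way to get the named list, sketched with base data.frame() in place of data.table() so the example is self-contained (that swap is an assumption; the same pattern should carry over). sapply() uses a character input as names, and simplify = FALSE stops it from flattening the result. The variable name `design_names` is introduced here for illustration.

```r
design_names <- paste0("UF", 1:10)

# simplify = FALSE keeps each element as a data frame;
# USE.NAMES = TRUE (the default) names the list after the character input
prob.designs <- sapply(
  design_names,
  function(nm) data.frame(sigma = c(0, 0.01, 0.1, 1)),
  simplify = FALSE
)

# Equivalent: lapply() does not name the result after an unnamed character
# input, so the names are added back explicitly
prob.designs2 <- setNames(
  lapply(design_names, function(nm) data.frame(sigma = c(0, 0.01, 0.1, 1))),
  design_names
)

names(prob.designs)
```

Note that the function only needs to return the table; assigning into prob.designs inside the anonymous function, as in the attempts above, modifies a local copy and has no effect on the outer list.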
