I would like to plot a heatmap. I converted the data frame to a matrix. The first column of the matrix contains 51 state names as character strings, so when I call heatmap an error appears ('x' must be numeric). If I convert the matrix to numeric, the state names are replaced by the numbers 1 to 51. I would like help converting the character column to numeric without changing the values in the column.
I get the following error:
> heatmap.2(matrix)
Error in heatmap.2(matrix) : `x' must be a numeric matrix
dput(matrix[1:20,1:5])
structure(c("AK", "AL", "AR", "AZ", "CA", "CO", "CT", "DC", "DE",
"FL", "GA", "HI", "IA", "ID", "IL", "IN", "KS", "KY", "LA", "MA",
" 156023.01", " 934292.20", " 565543.16", " 859246.77", "1802826.03",
" 236048.04", " 277419.16", " 44170.06", " 364245.19", "3059883.80",
"1032052.28", " 49148.00", " 484355.76", " 103032.97", "1501399.16",
"1098716.37", " 536964.81", " 714912.96", " 930454.92", "1006184.61",
NA, " 647281.97", " 243467.03", " 222016.05", "1955376.54", " 284157.80",
" 546510.14", " 310209.01", " 238855.76", "3055374.94", " 620487.04",
" 52286.08", " 183689.95", " 101198.95", "2299302.42", " 682522.43",
" 203429.06", " 566182.29", " 434137.97", "1269701.60", " 279984.88",
" 1785117.72", " 1210217.08", " 1738388.11", "12313826.52", " 1033786.31",
" 1905870.34", " 1589936.20", " 1177198.27", " 7379680.11", " 3182089.09",
" 539865.15", " 907408.47", " 706547.91", " 5616722.28", " 2793763.32",
" 751262.24", " 2620593.80", " 3327343.31", " 3423941.61", " 277346.4",
" 3231424.9", " 1784411.7", " 2539940.3", "13107647.6", " 1623508.4",
" 2475804.7", " 1382151.2", " 1362240.3", "10431341.9", " 4514651.7",
" 1081821.1", " 1653629.7", " 594605.5", " 9147134.3", " 4121661.9",
" 1292330.2", " 3252592.8", " 3360762.2", " 4269284.1"), .Dim = c(20L,
5L), .Dimnames = list(NULL, c("Provider.State", "039 ", "057 ",
"064 ", "065 ")))
(I named it m so that I don't mask the matrix function.)
First, your first column is an identifier. I'll assume the identifiers are meaningful, so I'll keep them as row names, but that doesn't change the outcome.
head(m)
# Provider.State 039 057 064 065
# [1,] "AK" " 156023.01" NA " 279984.88" " 277346.4"
# [2,] "AL" " 934292.20" " 647281.97" " 1785117.72" " 3231424.9"
# [3,] "AR" " 565543.16" " 243467.03" " 1210217.08" " 1784411.7"
# [4,] "AZ" " 859246.77" " 222016.05" " 1738388.11" " 2539940.3"
# [5,] "CA" "1802826.03" "1955376.54" "12313826.52" "13107647.6"
# [6,] "CO" " 236048.04" " 284157.80" " 1033786.31" " 1623508.4"
rn <- m[,1]
m <- m[,-1]
rn
# [1] "AK" "AL" "AR" "AZ" "CA" "CO" "CT" "DC" "DE" "FL" "GA" "HI" "IA" "ID" "IL" "IN" "KS" "KY" "LA" "MA"
head(m)
# 039 057 064 065
# [1,] " 156023.01" NA " 279984.88" " 277346.4"
# [2,] " 934292.20" " 647281.97" " 1785117.72" " 3231424.9"
# [3,] " 565543.16" " 243467.03" " 1210217.08" " 1784411.7"
# [4,] " 859246.77" " 222016.05" " 1738388.11" " 2539940.3"
# [5,] "1802826.03" "1955376.54" "12313826.52" "13107647.6"
# [6,] " 236048.04" " 284157.80" " 1033786.31" " 1623508.4"
(We'll use rn in a minute.) Now we need to convert everything to numbers.
m <- apply(m, 2, as.numeric)
rownames(m) <- rn
head(m)
# 039 057 064 065
# AK 156023.0 NA 279984.9 277346.4
# AL 934292.2 647282.0 1785117.7 3231424.9
# AR 565543.2 243467.0 1210217.1 1784411.7
# AZ 859246.8 222016.0 1738388.1 2539940.3
# CA 1802826.0 1955376.5 12313826.5 13107647.6
# CO 236048.0 284157.8 1033786.3 1623508.4
And now the heatmap works.
heatmap(m)
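The same conversion can also be done on the data frame before it becomes a matrix. The sketch below assumes a small hypothetical data frame df shaped like the one in the question (an ID column Provider.State plus value columns stored as space-padded text); it moves the IDs into the row names and converts every remaining column with as.numeric(), which ignores leading spaces.

```r
# Hypothetical data shaped like the question's: an ID column plus
# numeric values stored as space-padded character strings.
df <- data.frame(
  Provider.State = c("AK", "AL", "AR"),
  `039` = c(" 156023.01", " 934292.20", " 565543.16"),
  `057` = c(NA, " 647281.97", " 243467.03"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Move the state codes into the row names and drop the ID column.
rownames(df) <- df$Provider.State
df$Provider.State <- NULL

# Convert each remaining column; df[] keeps the data frame structure.
df[] <- lapply(df, as.numeric)

# A data frame with only numeric columns becomes a numeric matrix,
# and the row names carry over.
m <- as.matrix(df)
```

The resulting m can then be passed to heatmap(m). If the value columns are factors rather than character, convert them with as.character() first, otherwise as.numeric() returns the factor codes.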
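It can be done with the purrr package. Try the following:
library(purrr)
# modify_if() keeps df a data frame, whereas map_if() returns a plain list
# that as.matrix() would turn into a one-column list matrix
df <- df %>%
  modify_if(is.factor, as.character) %>%
  as.matrix()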
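Factors need this extra step because as.numeric() on a factor returns its internal level codes, not the printed labels, which is likely how the state names in the question turned into 1 to 51. A small illustration with a hypothetical factor f:

```r
# A hypothetical factor whose labels look like numbers.
f <- factor(c("10.5", "2.25", "10.5"))

# Level codes: levels are sorted as strings ("10.5" before "2.25"),
# so this gives 1 2 1 rather than the labels.
codes <- as.numeric(f)

# Going through the character labels recovers the values.
values <- as.numeric(as.character(f))

# Equivalent: convert each level once, then index by the factor.
values2 <- as.numeric(levels(f))[f]
```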
I have a problem with the mutate(across()) function.
In the tibble you can see below, I want to delete the "letter + underscores" (e.g. "p__", "c__" etc) in the columns.
A tibble: 2,477 x 4
Phylum Class Order Family
<chr> <chr> <chr> <chr>
1 " p__Proteobacteria" " c__Gammaproteobacter~ " o__Aeromonadales" " f__Aeromonadaceae"
2 " p__Bacteroidota" " c__Bacteroidia" " o__Bacteroidales" " f__Williamwhitmaniac~
3 " p__Fusobacteriota" " c__Fusobacteriia" " o__Fusobacterial~ " f__Leptotrichiaceae"
4 " p__Firmicutes" " c__Clostridia" " o__Clostridiales" " f__Clostridiaceae"
5 " p__Proteobacteria" " c__Gammaproteobacter~ " o__Enterobactera~ " f__Enterobacteriacea~
6 " p__Bacteroidota" " c__Bacteroidia" " o__Bacteroidales" " f__Williamwhitmaniac~
7 " p__Firmicutes" " c__Clostridia" " o__Lachnospirale~ " f__Lachnospiraceae"
8 " p__Bacteroidota" " c__Bacteroidia" " o__Cytophagales" " f__Spirosomaceae"
9 " p__Proteobacteria" " c__Gammaproteobacter~ " o__Burkholderial~ " f__Comamonadaceae"
10 " p__Actinobacteriot~ " c__Actinobacteria" " o__Frankiales" " f__Sporichthyaceae"
# ... with 2,467 more rows
A year ago I used the command
table <- table %>%
mutate_at(vars(Phylum, Class, Order, Family),funs(sub(pattern = "^([a-z])(_{2})", replacement = "", .)))
Now I get a message that funs() is no longer supported, and the command does not work anymore.
Do you have any suggestions?
I thought about:
taxon <- c("Phylum", "Class", "Order", "Family")
table <- table %>%
mutate(across(taxon), gsub(pattern = "^([a-z])(_{2})", replacement = "", .))
But here I get the error:
Error: Invalid index: out of bounds
Thanks a lot :)
Kathrin
You can do:
library(dplyr)
taxon <- c("Phylum", "Class", "Order", "Family")
# all_of() selects the columns named in the character vector taxon
table <- table %>% mutate(across(all_of(taxon),
  ~gsub(pattern = "^([a-z])(_{2})", replacement = "", .)))
I can't confirm this without your data, but there seems to be a leading space in each string. Because the pattern is anchored with ^, that space has to be removed first, e.g. with trimws():
table <- table %>% mutate(across(all_of(taxon),
  ~gsub(pattern = "^([a-z])(_{2})", replacement = "", trimws(.))))
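For comparison, a base R sketch of the same cleanup without dplyr (a swapped-in technique, not the answer's code). It uses a hypothetical two-column data frame tax with the leading spaces seen in the question, named tax rather than table to avoid masking base R's table() function, and writes the pattern as ^[a-z]__.

```r
# Hypothetical two-row example with the leading spaces from the question.
tax <- data.frame(
  Phylum = c(" p__Proteobacteria", " p__Bacteroidota"),
  Class  = c(" c__Gammaproteobacteria", " c__Bacteroidia"),
  stringsAsFactors = FALSE
)
taxon <- c("Phylum", "Class")

# Without trimming, the ^ anchor sees the space and nothing is replaced.
untrimmed <- sub("^[a-z]__", "", " p__Proteobacteria")

# Trim first, then drop one lowercase letter followed by two underscores.
tax[taxon] <- lapply(tax[taxon], function(x) sub("^[a-z]__", "", trimws(x)))
```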
I have been searching for a few hours now and have not been able to remove the empty (" ") strings from the character vector below.
c("Final", "A", "7.43", "8.50", "15.93", "2.00",
"1.00", "0.30", "0.37", " 7.43", " 8.50", "0.50", "0.67", " ",
" ", " ", " ", " ", " ", " ", "B", "7.00", "3.77", "10.77",
" 7.00", "1.67", "3.77", " ", " ", " ", " ", " ", " ", " ", " ")
I have many more of these empty values in this dataset and just want to get rid of them before organizing the data as a data frame like this:
Final
A B
7.43 7.43
8.50 8.50
15.93 0.50
2.00 0.67
1.00
0.30
Thanks,
You can use base R's grep() with value = TRUE. It searches the character vector for a given regex pattern and returns every element in which the pattern is found.
You can think about the logic of your pattern in a couple of ways. One is to keep values that contain a "word" character, i.e. a letter, digit, or underscore.
x <- c("Final", "A", "7.43", "8.50", "15.93", "2.00", "1.00", "0.30", "0.37", " 7.43", " 8.50", "0.50", "0.67", " ", " ", " ", " ", " ", " ", " ", "B", "7.00", "3.77", "10.77", " 7.00", "1.67", "3.77", " ", " ", " ", " ", " ", " ", " ", " ")
grep("\\w", x, value = TRUE)
#> [1] "Final" "A" "7.43" "8.50" "15.93" "2.00" "1.00" "0.30"
#> [9] "0.37" " 7.43" " 8.50" "0.50" "0.67" "B" "7.00" "3.77"
#> [17] "10.77" " 7.00" "1.67" "3.77"
Another way is to keep values that contain a character that isn't whitespace (\\S is the negation of \\s):
grep("\\S", x, value = TRUE)
#> [1] "Final" "A" "7.43" "8.50" "15.93" "2.00" "1.00" "0.30"
#> [9] "0.37" " 7.43" " 8.50" "0.50" "0.67" "B" "7.00" "3.77"
#> [17] "10.77" " 7.00" "1.67" "3.77"
Created on 2018-12-10 by the reprex package (v0.2.1)
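A regex-free alternative, sketched on a shortened hypothetical version of x: trimws() reduces whitespace-only strings to "", and nzchar() is FALSE exactly for empty strings, so together they give a logical index.

```r
x <- c("Final", "A", "7.43", " 7.43", " ", "  ", "", "B", "7.00")

# TRUE for elements that still contain something after trimming whitespace
has_text <- nzchar(trimws(x))
kept <- x[has_text]

# Optionally also drop the leading spaces from the kept values
kept_trimmed <- trimws(x)[has_text]
```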
I've tried:
i <- as.numeric(as.character(Impress))
i <- as.numeric(as.character(levels(Impress)))
i <- as.numeric(paste(Impress))
I always get:
Warning message:
NAs introduced by coercion
> i
[1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
This is the data I want to be numeric:
> Impress
[1] 24,085,563.00 35,962,587.00 31,714,513.00 28,206,422.00 40,161,010.00 36,292,929.00 31,545,482.00
[8] 28,213,878.00 35,799,224.00 32,400,885.00 28,496,459.00 37,456,344.00 38,108,667.00 33,407,771.00
[15] 32,540,479.00 30,692,707.00 22,873,000.00 21,329,146.00 28,921,953.00 30,471,519.00 28,601,289.00
[22] 27,450,630.00 26,708,790.00 19,825,041.00 18,844,169.00 29,592,039.00 31,012,594.00 28,792,531.00
[29] 28,578,028.00 24,913,985.00
30 Levels: 18,844,169.00 19,825,041.00 21,329,146.00 22,873,000.00 24,085,563.00 24,913,985.00 ... 40,161,010.00
> paste(Impress)
[1] " 24,085,563.00 " " 35,962,587.00 " " 31,714,513.00 " " 28,206,422.00 " " 40,161,010.00 " " 36,292,929.00 " " 31,545,482.00 "
[8] " 28,213,878.00 " " 35,799,224.00 " " 32,400,885.00 " " 28,496,459.00 " " 37,456,344.00 " " 38,108,667.00 " " 33,407,771.00 "
[15] " 32,540,479.00 " " 30,692,707.00 " " 22,873,000.00 " " 21,329,146.00 " " 28,921,953.00 " " 30,471,519.00 " " 28,601,289.00 "
[22] " 27,450,630.00 " " 26,708,790.00 " " 19,825,041.00 " " 18,844,169.00 " " 29,592,039.00 " " 31,012,594.00 " " 28,792,531.00 "
[29] " 28,578,028.00 " " 24,913,985.00 "
And when I run i <- as.numeric(Impress), it returns the wrong values.
Thanks!
As far as R is concerned, a comma is not part of a number, so any string containing one cannot be parsed as numeric, even though to a human these look like perfectly acceptable numbers.
Remove the commas and it will work, e.g. using gsub():
i <- as.numeric(gsub(",", "", as.character(Impress)))
E.g.
Impress <- c("24,085,563.00", "35,962,587.00", "31,714,513.00", "28,206,422.00")
gsub(",", "", as.character(Impress))
i <- as.numeric(gsub(",", "", as.character(Impress)))
i
R> gsub(",", "", as.character(Impress))
[1] "24085563.00" "35962587.00" "31714513.00" "28206422.00"
R> i
[1] 24085563 35962587 31714513 28206422
R> is.numeric(i)
[1] TRUE
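Since Impress is a factor, the commas can also be stripped from its levels only, converting each distinct value once and then indexing by the factor (the as.numeric(levels(f))[f] idiom recommended in R's ?factor help page). A sketch with a hypothetical three-element factor shaped like the question's data:

```r
# Hypothetical factor with comma-formatted, space-padded labels.
Impress <- factor(c(" 24,085,563.00 ", " 35,962,587.00 ", " 18,844,169.00 "))

# Clean and convert each level once; indexing by the factor uses its
# integer codes, so every element picks up its own level's value.
# as.numeric() tolerates the surrounding spaces.
i <- as.numeric(gsub(",", "", levels(Impress)))[Impress]
```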
Because the data contains commas, R cannot convert it to numeric. You have to remove the commas with gsub() first (sub() would only remove the first comma) and then convert:
i <- as.numeric(gsub(",", "", as.character(Impress)))
I am plotting a graph with barplot(), and every attempt to use the beside = TRUE parameter returns the error Error in -0.01 * height : non-numeric argument to binary operator
The following is the code for the graph:
combi <- as.matrix(combine)
barplot(combi, main="Top 5 hospitals in California",
ylab="Mortality/Admission Rates", col = heat.colors(5), las=1)
In the resulting graph the bars are stacked on top of each other instead of being placed beside each other.
I cannot reproduce the issue when combine is a data.frame with numeric columns:
combine <- data.frame(
HeartAttack = c(13.4,12.3,16,13,15.2),
HeartFailure = c(11.1,7.3,10.7,8.9,10.8),
Pneumonia = c(11.8,6.8,10,9.9,9.5),
HeartAttack2 = c(18.3,19.3,21.8,21.6,17.3),
HeartFailure2 = c(24,23.3,24.2,23.8,24.6),
Pneumonia2 = c(17.4,19,17,18.4,18.2)
)
combi <- as.matrix(combine)
barplot(combi, main="Top 5 hospitals in California",
ylab="Mortality/Admission Rates", col = heat.colors(5), las=1, beside = TRUE)
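A likely cause of this error is that at least one column of combine is not numeric (for example, read in as text): as.matrix() then returns a character matrix, and barplot() cannot compute bar heights from it. A sketch of finding and converting such columns, using a hypothetical cut-down combine in which HeartFailure is stored as text:

```r
# Hypothetical data: one column was read in as character.
combine <- data.frame(
  HeartAttack  = c(13.4, 12.3, 16),
  HeartFailure = c("11.1", "7.3", "10.7"),
  stringsAsFactors = FALSE
)

# Any non-numeric column makes as.matrix() return a character matrix.
bad <- !vapply(combine, is.numeric, logical(1))

# Assumes the offending columns are character; factor columns would
# need as.character() first to avoid getting level codes.
combine[bad] <- lapply(combine[bad], as.numeric)
combi <- as.matrix(combine)
```

With combi numeric, the barplot() call above with beside = TRUE should draw grouped bars.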
I had the same issue earlier (with a different dataset, though) and resolved it by calling as.numeric() on my data after converting it to a matrix with as.matrix(). Leaving out as.numeric() leads to "Error in -0.01 * height : non-numeric argument to binary operator".
¯\_(ツ)_/¯
My data, called tmp:
> tmp
125 1245 1252 1254 1525 1545 12125 12425 12525 12545 125245 125425
Freq.x.2d "14" " 1" " 1" " 1" " 3" " 2" " 1" " 1" " 9" " 4" " 1" " 5"
Freq.x.3d "13" " 0" " 1" " 0" " 4" " 0" " 0" " 0" "14" " 4" " 1" " 2"
> dim(tmp)
[1] 2 28
> is(tmp)
[1] "matrix" "array" "structure" "vector"
> tmp <- as.matrix(tmp)
> dim(tmp)
[1] 2 28
> is(tmp)
[1] "matrix" "array" "structure" "vector"
> tmp <- as.numeric(tmp)
> dim(tmp)
NULL
> is(tmp)
[1] "numeric" "vector"
barplot(tmp, las=2, beside=TRUE, col=c("grey40","grey80"))
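Note from the transcript that as.numeric() drops the dimensions (dim(tmp) becomes NULL), so barplot() receives a plain vector and beside = TRUE has no groups to arrange. Changing the storage mode instead converts the values while keeping the matrix shape and dimnames. A sketch with a hypothetical 2 x 3 excerpt of tmp:

```r
# Hypothetical excerpt of tmp: a character matrix with padded counts.
tmp <- matrix(c("14", "13", " 1", " 0", " 1", " 1"), nrow = 2,
              dimnames = list(c("Freq.x.2d", "Freq.x.3d"),
                              c("125", "1245", "1252")))

# as.numeric() returns a plain vector: dim and dimnames are lost.
flat <- as.numeric(tmp)

# storage.mode<- converts the values in place and keeps the 2 x 3 shape.
storage.mode(tmp) <- "numeric"
```

barplot(tmp, las = 2, beside = TRUE, col = c("grey40", "grey80")) then draws the two rows side by side within each column group.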