I am plotting a graph with barplot(), and every attempt to use the beside=TRUE parameter returns the error "Error in -0.01 * height : non-numeric argument to binary operator".
The following is the code for the graph:
combi <- as.matrix(combine)
barplot(combi, main="Top 5 hospitals in California",
ylab="Mortality/Admission Rates", col = heat.colors(5), las=1)
Without beside=TRUE, the bars are stacked on top of each other instead of being drawn beside each other.
The issue is not reproducible when combine is a data.frame such as:
combine <- data.frame(
HeartAttack = c(13.4,12.3,16,13,15.2),
HeartFailure = c(11.1,7.3,10.7,8.9,10.8),
Pneumonia = c(11.8,6.8,10,9.9,9.5),
HeartAttack2 = c(18.3,19.3,21.8,21.6,17.3),
HeartFailure2 = c(24,23.3,24.2,23.8,24.6),
Pneumonia2 = c(17.4,19,17,18.4,18.2)
)
combi <- as.matrix(combine)
barplot(combi, main="Top 5 hospitals in California",
ylab="Mortality/Admission Rates", col = heat.colors(5), las=1, beside = TRUE)
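A likely cause, offered as an assumption because the original combine isn't shown: if even one column of combine is character (or factor), as.matrix() returns a character matrix, and barplot(beside = TRUE) then fails at the -0.01 * height step. The sketch below uses a hypothetical two-column combine with one text column and converts every column to numeric before building the matrix.

```r
# Hypothetical version of `combine` where one column was read in as text
combine <- data.frame(
  HeartAttack  = c("13.4", "12.3", "16", "13", "15.2"),  # character by mistake
  HeartFailure = c(11.1, 7.3, 10.7, 8.9, 10.8),
  stringsAsFactors = FALSE
)

# A single character column makes as.matrix() return a character matrix
is.character(as.matrix(combine))   # TRUE

# Convert every column to numeric first, then build the matrix
combine[] <- lapply(combine, as.numeric)
combi <- as.matrix(combine)
is.numeric(combi)                  # TRUE

barplot(combi, main = "Top 5 hospitals in California",
        ylab = "Mortality/Admission Rates", col = heat.colors(5), las = 1,
        beside = TRUE)
```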
I had the same issue earlier (different dataset, though) and resolved it by calling as.numeric() on my data frame after converting it to a matrix with as.matrix(). Leaving out as.numeric() leads to "Error in -0.01 * height : non-numeric argument to binary operator".
¯\_(ツ)_/¯
My df called tmp:
> tmp
125 1245 1252 1254 1525 1545 12125 12425 12525 12545 125245 125425
Freq.x.2d "14" " 1" " 1" " 1" " 3" " 2" " 1" " 1" " 9" " 4" " 1" " 5"
Freq.x.3d "13" " 0" " 1" " 0" " 4" " 0" " 0" " 0" "14" " 4" " 1" " 2"
> dim(tmp)
[1] 2 28
> is(tmp)
[1] "matrix" "array" "structure" "vector"
> tmp <- as.matrix(tmp)
> dim(tmp)
[1] 2 28
> is(tmp)
[1] "matrix" "array" "structure" "vector"
> tmp <- as.numeric(tmp)
> dim(tmp)
NULL
> is(tmp)
[1] "numeric" "vector"
barplot(tmp, las=2, beside=TRUE, col=c("grey40","grey80"))
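One side effect of as.numeric() is that it drops the dim attribute, so tmp becomes a plain vector, the 2 x 28 grouping is lost, and every value is drawn as its own bar. A sketch of an alternative that keeps the matrix shape (a different technique from the one above): change the storage mode in place. The 2 x 3 matrix is a hypothetical excerpt of tmp.

```r
# Hypothetical 2 x 3 character matrix shaped like `tmp` above
tmp <- matrix(c("14", "13", " 1", " 0", " 1", " 1"), nrow = 2,
              dimnames = list(c("Freq.x.2d", "Freq.x.3d"),
                              c("125", "1245", "1252")))

# as.numeric() would return a plain vector without dim/dimnames;
# changing the storage mode converts the values and keeps both
storage.mode(tmp) <- "numeric"

# beside = TRUE now draws the two rows as side-by-side bars per column
barplot(tmp, las = 2, beside = TRUE, col = c("grey40", "grey80"))
```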
I need to insert a space after every third character of each string, counting from the end. Elements that contain a percentage sign (%) should be left unchanged.
string <- c('186527500', '3875055', '23043', '10.8%', '9.8%')
The desired output is:
186 527 500, 3 875 055, 23 043, 10.8%, 9.8%
You could do:
ifelse(grepl('%', string), string, scales::comma(as.numeric(string), big.mark = ' '))
#> [1] "186 527 500" "3 875 055" "23 043" "10.8%" "9.8%"
Using format and ifelse:
ifelse(!grepl("\\D", string), format(as.numeric(string), big.mark = " ", trim = TRUE), string)
#[1] "186 527 500" "3 875 055" "23 043" "10.8%" "9.8%"
Here is a base R solution with prettyNum. The trick is to set big.mark to one space character.
I use a variant of my answer to another post, but instead of returning the indices of the elements that cannot be converted to numeric, the function below returns the indices of those that can. This avoids inserting spaces into the percentage values.
check_num <- function(x){
  # Indices of the elements that can be converted to numeric
  # (always returned, so an all-numeric input is formatted too)
  y <- suppressWarnings(as.numeric(x))
  which(!is.na(y))
}
string <- c('186527500', '3875055', '23043', '10.8%', '9.8%')
i <- check_num(string)
prettyNum(string[i], big.mark = " ", preserve.width = "none")
#> [1] "186 527 500" "3 875 055" "23 043"
Created on 2022-05-16 by the reprex package (v2.0.1)
You can then assign the result back to the original string.
string[i] <- prettyNum(string[i], big.mark = " ", preserve.width = "none")
One idea is to reverse each string, add a space after every three characters, and reverse it back:
new_str <- string[!grepl('%', string)]
stringi::stri_reverse(sub("\\s+$", "", gsub('(.{3})', '\\1 ',stringi::stri_reverse(new_str))))
#[1] "186 527 500" "3 875 055" "23 043"
Another way is via formatC:
sapply(new_str, function(i)formatC(as.numeric(i), big.mark = " ", big.interval = 3, format = "d", flag = "0", width = nchar(i)))
# 186527500 3875055 23043
# "186 527 500" "3 875 055" "23 043"
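A different technique, not used in the answers above: a Perl-style lookahead regex can insert the spaces counting from the end without reversing the strings. This is a sketch; the pattern assumes the non-percentage elements contain digits only.

```r
string <- c('186527500', '3875055', '23043', '10.8%', '9.8%')

# Leave the percentage elements alone
is_pct <- grepl("%", string, fixed = TRUE)

# A digit followed by one or more complete groups of three digits up to the
# end of the string gets a space after it; (?=...) is a lookahead, so
# perl = TRUE is required
string[!is_pct] <- gsub("(\\d)(?=(\\d{3})+$)", "\\1 ", string[!is_pct], perl = TRUE)
string
# expected values: "186 527 500" "3 875 055" "23 043" "10.8%" "9.8%"
```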
I would like to draw a heatmap. I converted the data frame to a matrix. The first column of the matrix contains the 51 state names as character values, so when I call heatmap an error pops up ('x' must be numeric). If I convert the matrix to numeric, the states are turned into the numbers 1 to 51, i.e. the state names are replaced by numbers. I would like help converting the character data to numeric without changing any values.
I get the following error:
> heatmap.2(matrix)
Error in heatmap.2(matrix) : `x' must be a numeric matrix
dput(matrix[1:20,1:5])
structure(c("AK", "AL", "AR", "AZ", "CA", "CO", "CT", "DC", "DE",
"FL", "GA", "HI", "IA", "ID", "IL", "IN", "KS", "KY", "LA", "MA",
" 156023.01", " 934292.20", " 565543.16", " 859246.77", "1802826.03",
" 236048.04", " 277419.16", " 44170.06", " 364245.19", "3059883.80",
"1032052.28", " 49148.00", " 484355.76", " 103032.97", "1501399.16",
"1098716.37", " 536964.81", " 714912.96", " 930454.92", "1006184.61",
NA, " 647281.97", " 243467.03", " 222016.05", "1955376.54", " 284157.80",
" 546510.14", " 310209.01", " 238855.76", "3055374.94", " 620487.04",
" 52286.08", " 183689.95", " 101198.95", "2299302.42", " 682522.43",
" 203429.06", " 566182.29", " 434137.97", "1269701.60", " 279984.88",
" 1785117.72", " 1210217.08", " 1738388.11", "12313826.52", " 1033786.31",
" 1905870.34", " 1589936.20", " 1177198.27", " 7379680.11", " 3182089.09",
" 539865.15", " 907408.47", " 706547.91", " 5616722.28", " 2793763.32",
" 751262.24", " 2620593.80", " 3327343.31", " 3423941.61", " 277346.4",
" 3231424.9", " 1784411.7", " 2539940.3", "13107647.6", " 1623508.4",
" 2475804.7", " 1382151.2", " 1362240.3", "10431341.9", " 4514651.7",
" 1081821.1", " 1653629.7", " 594605.5", " 9147134.3", " 4121661.9",
" 1292330.2", " 3252592.8", " 3360762.2", " 4269284.1"), .Dim = c(20L,
5L), .Dimnames = list(NULL, c("Provider.State", "039 ", "057 ",
"064 ", "065 ")))
(I named it m rather than matrix so that it doesn't mask the matrix() function.)
First, your first column is an identifier rather than data. I'll assume the state codes are meaningful and keep them as row names; that doesn't change the outcome.
head(m)
# Provider.State 039 057 064 065
# [1,] "AK" " 156023.01" NA " 279984.88" " 277346.4"
# [2,] "AL" " 934292.20" " 647281.97" " 1785117.72" " 3231424.9"
# [3,] "AR" " 565543.16" " 243467.03" " 1210217.08" " 1784411.7"
# [4,] "AZ" " 859246.77" " 222016.05" " 1738388.11" " 2539940.3"
# [5,] "CA" "1802826.03" "1955376.54" "12313826.52" "13107647.6"
# [6,] "CO" " 236048.04" " 284157.80" " 1033786.31" " 1623508.4"
rn <- m[,1]
m <- m[,-1]
rn
# [1] "AK" "AL" "AR" "AZ" "CA" "CO" "CT" "DC" "DE" "FL" "GA" "HI" "IA" "ID" "IL" "IN" "KS" "KY" "LA" "MA"
head(m)
# 039 057 064 065
# [1,] " 156023.01" NA " 279984.88" " 277346.4"
# [2,] " 934292.20" " 647281.97" " 1785117.72" " 3231424.9"
# [3,] " 565543.16" " 243467.03" " 1210217.08" " 1784411.7"
# [4,] " 859246.77" " 222016.05" " 1738388.11" " 2539940.3"
# [5,] "1802826.03" "1955376.54" "12313826.52" "13107647.6"
# [6,] " 236048.04" " 284157.80" " 1033786.31" " 1623508.4"
(We'll use rn in a minute.) Now we need to convert everything to numbers.
m <- apply(m, 2, as.numeric)
rownames(m) <- rn
head(m)
# 039 057 064 065
# AK 156023.0 NA 279984.9 277346.4
# AL 934292.2 647282.0 1785117.7 3231424.9
# AR 565543.2 243467.0 1210217.1 1784411.7
# AZ 859246.8 222016.0 1738388.1 2539940.3
# CA 1802826.0 1955376.5 12313826.5 13107647.6
# CO 236048.0 284157.8 1033786.3 1623508.4
And now the heatmap works.
heatmap(m)
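If the data is still in its original data frame, the same result can be reached without the intermediate character matrix. A sketch under assumptions: df is a hypothetical data frame with a Provider.State column and value columns read as text, filled here with four rows from the dput() output above.

```r
# Hypothetical data frame shaped like the question's data (4 states, 3 columns)
df <- data.frame(
  Provider.State = c("AL", "AR", "AZ", "CA"),
  `039` = c(" 934292.20", " 565543.16", " 859246.77", "1802826.03"),
  `057` = c(" 647281.97", " 243467.03", " 222016.05", "1955376.54"),
  `064` = c(" 1785117.72", " 1210217.08", " 1738388.11", "12313826.52"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Convert only the value columns; the state codes become row names
m <- sapply(df[-1], as.numeric)
rownames(m) <- df$Provider.State

heatmap(m)
```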
It can also be done with the purrr package.
Try the following:
library(purrr)
df<-df %>%
map_if(is.factor,as.character) %>%
as.matrix
I am seeing some unexpected behavior (to me anyway) when print() is included as a side effect in a function wrapped in mapply().
For example, this works as expected (and yes I know it's not how we add vectors):
mapply(function(i,j) i+j, i=1:3, j=4:6) # returns [1] 5 7 9
And so does this:
mapply(function(i,j) paste(i, "plus", j, "equals", i+j), i=1:3, j=4:6)
# returns [1] "1 plus 4 equals 5" "2 plus 5 equals 7" "3 plus 6 equals 9"
But this doesn't:
mapply(function(i,j) print(paste(i, "plus", j, "equals", i+j)), i=1:3, j=4:6)
# returns:
# [1] "1 plus 4 equals 5"
# [1] "2 plus 5 equals 7"
# [1] "3 plus 6 equals 9"
# [1] "1 plus 4 equals 5" "2 plus 5 equals 7" "3 plus 6 equals 9"
What's going on here? I haven't used mapply() in a while, so maybe this is a no-brainer... I'm using R version 3.4.0.
print() both prints its argument and returns it (invisibly).
p <- print("abc")
# [1] "abc"
p
# [1] "abc"
So each element is printed inside the function, and then mapply() returns the vector of results, which the console auto-prints. Try e.g. invisible(mapply(...)) or m <- mapply(...) for comparison.
FWIW, cat() returns NULL (invisibly) rather than its argument.
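To see the two effects separately, capture.output() can count the lines each variant produces; like the console, it prints a result only when it is visible. A small sketch, where f is a hypothetical helper wrapping the question's function:

```r
f <- function(i, j) print(paste(i, "plus", j, "equals", i + j))

# print() returns its argument, so mapply() collects the three strings and
# the returned vector is auto-printed after the three side-effect prints
out_visible <- capture.output(mapply(f, i = 1:3, j = 4:6))

# invisible() switches off only the auto-print of the returned vector
out_invisible <- capture.output(invisible(mapply(f, i = 1:3, j = 4:6)))

length(out_visible)    # 4: three print() calls plus the auto-printed result
length(out_invisible)  # 3: only the print() side effects
```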
I have a data frame of values called "games" with several numeric columns. The original CSV file had some missing values, which became NAs when I read it in. I'm trying to replace these NAs with the row median (already stored as a column of the data frame), but I can't get the replaced values to come out as numeric instead of character.
I first found the indices of the missing values.
ng <- which(is.na(games), arr.ind = TRUE)
Then I tried replacing the NAs with a value from the column "linemedian".
games[ng] <- games[ng[,1], "linemedian"]
games[ng]
[1] " -3.25" " 9.98" " -9.1" " -9.1" " 14.0" " -3.25" " 9.98" " -3.25" " 9.98" " 2.30" " 13.75" "-24.00" " 3.71" " 15.94" " 14.25" " -9.83" " 13.75" " -4.88"
Replacing the NAs with just any number also did not work.
games[is.na(games)] <- 0
[1] " 0.0" " 0.0" " 0" " 0" " 0" " 0.0" " 0.0" " 0.0" " 0.0" " 0.00" " 0.00" " 0.00" " 0" " 0" " 0.00" " 0.00" " 0.00" " 0.00"
I thought that removing the whitespace might change the outcome but it did not.
games[ng] <- as.numeric(trimws(games[ng[,1], "linemedian"]))
[1] "-3.25" "9.98" "-9.1" "-9.1" "14" "-3.25" "9.98" "-3.25" "9.98" "2.3" "13.75" "-24" "3.71" "15.94" "14.25" "-9.83" "13.75" "-4.88"
Other attempts that did not work:
games[ng] <- type.convert(games[ng]) # using type.convert()
games[, -c(1,2)] <- as.numeric(games[, -c(1,2)]) # first two columns are metadata
Error: (list) object cannot be coerced to type 'double'
games[, -c(1,2)] <- as.numeric(unlist(games[, -c(1,2)]))
games[ng] <- as.numeric(as.character(trimws(games[ng[,1], "linemedian"])))
# New Addition from Answer
games[, sapply(games, is.numeric)][ng] <- games[, sapply(games, is.numeric)][ng[,1], "linemedian"]
I know for sure that the value I'm assigning to games[ng] is a numeric.
games[ng[,1], "linemedian"]
[1] -3.25 9.98 -9.10 -9.10 14.00 -3.25 9.98 -3.25 9.98 2.30 13.75 -24.00 3.71 15.94 14.25 -9.83 13.75 -4.88
typeof(games[ng[,1], "linemedian"])
[1] "double"
Everywhere I look on Stack Overflow, the answer is games[is.na(games)] <- VALUE, but that isn't working. Does anybody have an idea?
Here's the full code if you want to replicate:
## Download Raw Files
download.file("http://www.thepredictiontracker.com/ncaa2016.csv",
"data/ncaa2016.csv")
download.file("http://www.thepredictiontracker.com/ncaapredictions.csv",
"data/ncaapredictions.csv")
## Create Training and Prediction Data Sets
games <- read.csv("data/ncaa2016.csv", header = TRUE, stringsAsFactors = FALSE,
colClasses=c(rep("character",2),rep("numeric",72)))
preds <- read.csv("data/ncaapredictions.csv", header = TRUE, stringsAsFactors = TRUE)
colnames(preds)[colnames(preds) == "linebillings"] <- "linebill"
colnames(preds)[colnames(preds) == "linebillings2"] <- "linebill2"
colnames(preds)[colnames(preds) == "home"] <- "Home"
colnames(preds)[colnames(preds) == "road"] <- "Road"
## Remove Columns with too many missing values
rm <- unique(c(names(games[, sapply(games, function(z) sum(is.na(z))) > 50]), # Games and predictions
names(preds[, sapply(preds, function(z) sum(is.na(z))) > 10]))) # with missing data
games <- games[, !(names(games) %in% rm)] # Remove games with no prediction data
preds <- preds[, !(names(preds) %in% rm)] # Remove predictions with no game data
## Replace NAs with Prediction Median
ng <- which(is.na(games), arr.ind = TRUE)
games[ng] <- games[ng[,1], "linemedian"]
Also, I can't post the entire dput() output, but here's a bit of the data set to show its structure.
dput(head(games[1:6]))
structure(list(Home = c("Alabama", "Arizona", "Arkansas", "Arkansas St.",
"Auburn", "Boston College"), Road = c("USC", "BYU", "Louisiana Tech",
"Toledo", "Clemson", "Georgia Tech"), line = c("12", "-2", "24.5",
"4", "-8.5", "-3"), linesag = c(12.19, 0.97, 24.26, -2.07, -4.78,
-2.74), linepayne = c(12, -0.81, 12.53, -0.86, -10.72, -3.87),
linemassey = c(19.15, -2.1, 21.07, -8.68, -5.45, -6.76)), .Names = c("Home",
"Road", "line", "linesag", "linepayne", "linemassey"), row.names = c(NA,
6L), class = "data.frame")
Lastly, I'm running R Version 3.2.1 on x86_64-w64-mingw32.
Without a test case this is untested. It appears the replacement is applied globally, and because some of your columns are character, the values (including the 0s) get coerced to character. I would try restricting the process to just the numeric columns:
games[ , sapply(games, is.numeric) ][ ng ] <-
games[ , sapply(games, is.numeric)][ng[,1], "linemedian"]
After modifying your almost-reproducible code, I've concluded that your original code was successful and that the way you checked the output was the problem:
str( games[ , sapply(games, is.numeric)][ng[,1], "linemedian"] )
#num [1:23] -3.25 9.98 -9.1 -9.1 14 -3.25 9.98 -3.25 9.98 2.3 ...
games[ ng ] <-
games[ , sapply(games, is.numeric)][ng[,1], "linemedian"]
games[ ng[1:2,] ]
[1] " -3.25" " 9.98"
> ng[1:2,]
row col
[1,] 619 3
[2,] 678 3
> str(games)
'data.frame': 720 obs. of 58 variables:
$ Home : chr "Alabama" "Arizona" "Arkansas" "Arkansas St." ...
$ Road : chr "USC" "BYU" "Louisiana Tech" "Toledo" ...
$ line : num 12 -2 24.5 4 -8.5 -3 8.5 37 -10.5 5 ...
$ linesag : num 12.19 0.97 24.26 -2.07 -4.78 ...
$ linepayne : num 12 -0.81 12.53 -0.86 -10.72 ...
... (remaining columns deleted)
> games[ c(619,678) , 3]
#[1] -3.25 9.98
> games[ matrix(c(619,678,3,3), ncol=2)]
[1] " -3.25" " 9.98"
So the third column remained numeric after the assignment, but for reasons I don't understand, printing the matrix-indexed extract made it look like character when it was in fact numeric.
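A likely explanation, based on how base R's data frame extraction works rather than on a test with the original data: extracting from a data frame with a two-column matrix index goes through as.matrix(), and as.matrix() on a data frame that mixes character and numeric columns formats every numeric column into padded strings. That matches the leading spaces in " -3.25" " 9.98". The sketch uses a small hypothetical games data frame.

```r
# Small hypothetical data frame: two character columns plus numeric ones
games <- data.frame(
  Home = c("Alabama", "Arizona", "Arkansas"),
  Road = c("USC", "BYU", "Louisiana Tech"),
  line = c(12, NA, 24.5),
  linemedian = c(12.1, 9.98, 24.3),
  stringsAsFactors = FALSE
)

ng <- which(is.na(games), arr.ind = TRUE)   # row 2, column 3
games[ng] <- games[ng[, 1], "linemedian"]   # fills the NA with 9.98

# The column itself is still numeric after the assignment ...
is.numeric(games$line)    # TRUE
# ... but a matrix-indexed extract is taken from as.matrix(games), which is a
# character matrix because Home and Road are character
games[ng]                 # " 9.98" (character, padded by format())
# Row/column indexing returns the stored numeric value
games[ng[, 1], ng[, 2]]   # 9.98
```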
I generated an adjacency table, mytable, using cosine similarity; m1 is a document-term matrix (DTM):
cosineSim <- function(x){
as.dist(x%*%t(x)/(sqrt(rowSums(x^2) %*% t(rowSums(x^2)))))
}
cs <- cosineSim(m1)
mytable
"";"1";"2";"3";"4";"5";"6";"7";"8"
"1";0;0;0;0;0;0;0;0
"2";0;0;0;0;0;0;0;0
"3";0;0;0;0.259;0;0;0;0
"4";0;0;0;0;0;0;0;0.324
"5";0;0;0;0;0;0;0;0
"6";0;0;0;0;0;0;0;0
"7";0;0;0;0;0;0;0;0
"8";0;0;0;0;0;0;0;0
When I open it with Gephi, the nodes include all the numbers in the table:
Id label
" "
1" 1"
2" 2"
3" 3"
4" 4"
5" 5"
6" 6"
7" 7"
8 8
0 0
0.259 0.259
0.324 0.324
8" 8"
I expected the nodes to include only 1-8 as IDs, not "", 0, and the other numbers. Is there something wrong with my adjacency table?
Remove the double quotes and try to reimport. Since you are using R, I would propose automating your pipeline with igraph, in your case graph_from_adjacency_matrix(). Then export the graph as GraphML, which Gephi can read easily.
Here is some example code for the sake of completeness:
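library(igraph)
# Adjacency table as text, without double quotes; the first header cell is empty
txt <- ';1;2;3;4;5;6;7;8
1;0;0;0;0;0;0;0;0
2;0;0;0;0;0;0;0;0
3;0;0;0;0.259;0;0;0;0
4;0;0;0;0;0;0;0;0.324
5;0;0;0;0;0;0;0;0
6;0;0;0;0;0;0;0;0
7;0;0;0;0;0;0;0;0
8;0;0;0;0;0;0;0;0'
f <- read.csv(text = txt, sep = ";", header = TRUE, row.names = 1)
m <- as.matrix(f, rownames.force = TRUE)
# read.csv() turns the numeric headers into X1..X8, so use 1..8 as node ids again
colnames(m) <- seq_len(nrow(f))
rownames(m) <- seq_len(nrow(f))
# Each nonzero cell becomes a directed edge weighted by the similarity
graph <- graph_from_adjacency_matrix(m, mode = "directed", weighted = TRUE)
# GraphML can be imported directly by Gephi
write_graph(graph, "mygraph.graphml", format = "graphml")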
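For the first suggestion (removing the quotes), the table can also be written without quotes directly from R. A sketch, assuming m1 is a plain numeric matrix (a tm DocumentTermMatrix may need as.matrix() first); the toy m1 and the temporary output file are hypothetical. The empty top-left cell follows the matrix layout shown in the question.

```r
# cosineSim() as defined in the question
cosineSim <- function(x) {
  as.dist(x %*% t(x) / (sqrt(rowSums(x^2) %*% t(rowSums(x^2)))))
}

# Hypothetical 3-document x 3-term DTM as a plain numeric matrix
m1 <- rbind(c(1, 0, 1),
            c(0, 1, 1),
            c(1, 0, 0))
cs <- cosineSim(m1)   # pairs (1,2), (1,3), (2,3): 0.5, 1/sqrt(2), 0

# as.matrix() on a dist object fills the diagonal with 0 and labels rows 1..n
mytable <- as.matrix(cs)

# quote = FALSE drops the double quotes; col.names = NA keeps an empty
# top-left cell so the header lines up with the row labels
out_file <- tempfile(fileext = ".csv")
write.table(round(mytable, 3), out_file, sep = ";", quote = FALSE, col.names = NA)
lines <- readLines(out_file)
lines
```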