import hvplot.pandas
from bokeh.sampledata.autompg import autompg_clean
autompg_clean['origin']=autompg_clean.origin.map({'North America': 'North America '*5,
'Asia': 'Asia '*5,
'Europe': 'Europe '*5,
})
Here is the corresponding annotated output. I have tried using p = hv.render() to get the Bokeh figure object back, but setting something like p.yaxis.major_label_text_align = 'left' has no visible effect, even if I inject newline (\n) characters into the long label string.
Multiline labels are supported for categorical factors via the newline character \n.
I was not able to reproduce your example, but I think the solution is to set your y-axis to a FactorRange and set the factors to a list of the strings you want, which can include \n.
See the example below, which is adapted from here.
from bokeh.io import output_file, show
from bokeh.models import ColumnDataSource
from bokeh.palettes import GnBu3, OrRd3
from bokeh.plotting import figure
output_file("stacked_split.html")
fruits = [f'{item}\n{item}' for item in ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']]
years = ["2015", "2016", "2017"]
exports = {'fruits' : fruits,
'2015' : [2, 1, 4, 3, 2, 4],
'2016' : [5, 3, 4, 2, 4, 6],
'2017' : [3, 2, 4, 4, 5, 3]}
imports = {'fruits' : fruits,
'2015' : [-1, 0, -1, -3, -2, -1],
'2016' : [-2, -1, -3, -1, -2, -2],
'2017' : [-1, -2, -1, 0, -2, -2]}
p = figure(y_range=fruits, height=250, x_range=(-16, 16), title="Fruit import/export, by year",
toolbar_location=None)
p.hbar_stack(years, y='fruits', height=0.9, color=GnBu3, source=ColumnDataSource(exports),
legend_label=["%s exports" % x for x in years])
p.hbar_stack(years, y='fruits', height=0.9, color=OrRd3, source=ColumnDataSource(imports),
legend_label=["%s imports" % x for x in years])
p.y_range.range_padding = 0.1
p.ygrid.grid_line_color = None
p.legend.location = "top_left"
p.axis.minor_tick_line_color = None
p.outline_line_color = None
show(p)
Output
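If the long labels already exist as single strings, the line breaks can be inserted programmatically before the strings are used as factors. The snippet below is a minimal sketch using only the standard library; the helper name wrap_label and the width of 28 characters are assumptions chosen for illustration, not part of Bokeh or hvPlot.

```python
import textwrap

def wrap_label(label, width=28):
    """Strip surrounding whitespace and break the label at word boundaries."""
    return textwrap.fill(label.strip(), width=width)

# The long labels from the question
long_labels = ['North America ' * 5, 'Asia ' * 5, 'Europe ' * 5]

# Each wrapped label contains "\n" wherever a line would exceed `width`
wrapped = [wrap_label(s) for s in long_labels]

for label in wrapped:
    print(label)
    print('---')
```

The same wrapped strings would have to be used both in the list of factors and in the data column, since a categorical range matches factors by exact string.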
I'm trying the R cartography package. I had to work to find a US state shapefile that would work with the cartography functions; many seemed too big, etc. Everything seemed to be going well, but the state of Colorado misplots.
library(cartography)
library(sf)
library(RColorBrewer)
library(maps)
library(ggplot2)
rm(list = ls())
# USA shape file
states <- st_as_sf(map("state", plot = F, fill = TRUE))
#seems to plot correctly here
#ggplot(states) + geom_sf(aes(fill = ID))
usa <- st_transform(states,
CRS("+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=37.5 +lon_0=-96"))
# still seems to plot fine
#ggplot(usa) + geom_sf(aes(fill = ID))
usa <- st_buffer(usa, dist=0)
datamap <- usa
datamap$randoVar <- sample(1:3, length(datamap$ID), replace = T)
datamap_pencil <- getPencilLayer(
x = datamap,
buffer = 500,
size = 400,
lefthanded = F
)
plot(st_geometry(usa), col = "white", border = "black", bg = "lightblue1")
typoLayer(
x = datamap_pencil,
var="randoVar",
col = c("aquamarine4", "yellow3","#3c5cb0"),
lwd = .7,
legend.values.order = 1:3,
legend.pos = "bottomleft",
legend.title.txt = "",
add = TRUE
)
labelLayer(x = datamap, txt = "ID",
cex = 0.9, halo = TRUE, r = 0.15)
I first noticed because when I tried to merge in a data file and fill by that feature, Colorado came up as "No Data". Likewise, the code above seems to indicate that the state geometry or ID is off. I don't know enough GIS to understand why. I did have to change the CRS projection so that I could buffer the map file (getPencilLayer kept throwing a self-intersection error, which seems to be common with R mapping).
Any ideas on what to do?
Well, I ended up fixing it by using a shapefile from the US Census:
https://www2.census.gov/geo/tiger/TIGER2017/STATE/
states <- st_read("#mypath#/tl_2017_us_state/tl_2017_us_state.shp")
states <- states[!(states$NAME %in%
c("Commonwealth of the Northern Mariana Islands", "United States Virgin Islands",
"Puerto Rico", "American Samoa", "Hawaii", "Guam", "Alaska")
), ]
usa <- st_transform(states,
CRS("+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=37.5 +lon_0=-96"))
usa <- st_buffer(usa, dist=0)
Not sure how to adjust for getting the geodata from map("state"...), but this worked for me.
I used these buffer and size settings (later, after merging in data):
datamap_pencil <- getPencilLayer(
x = datamap,
buffer = 500,
size = 400,
lefthanded = F
)
Already answered here:
https://gis.stackexchange.com/a/351910/142200
The problem is that the initial map object is not valid: st_is_valid() returns FALSE for several states, and st_make_valid() repairs them.
library(cartography)
library(sf)
library(RColorBrewer)
library(maps)
library(ggplot2)
rm(list = ls())
# USA shape file
states <- st_as_sf(map("state", plot = F, fill = TRUE))
usa <- st_transform(states,
"+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=37.5 +lon_0=-96")
datamap <- usa
# Check validity----
st_is_valid(datamap)
#> [1] TRUE TRUE TRUE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#> [13] TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#> [25] TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE
#> [37] TRUE TRUE TRUE FALSE FALSE FALSE TRUE TRUE FALSE TRUE TRUE TRUE
#> [49] TRUE
#Make valid
library(lwgeom)
datamap<-st_make_valid(datamap)
st_is_valid(datamap)
#> [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#> [16] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#> [31] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#> [46] TRUE TRUE TRUE TRUE
# Start cartography
datamap$randoVar <- sample(1:3, length(datamap$ID), replace = T)
datamap_pencil <- getPencilLayer(
x = datamap,
buffer = 500,
size = 400,
lefthanded = F
)
plot(st_geometry(usa), col = "white", border = "black", bg = "lightblue1")
typoLayer(
x = datamap_pencil,
var="randoVar",
col = c("aquamarine4", "yellow3","#3c5cb0"),
lwd = .7,
legend.values.order = 1:3,
legend.pos = "bottomleft",
legend.title.txt = "",
add = TRUE
)
Created on 2020-02-25 by the reprex package (v0.3.0)
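For intuition about what an invalid geometry can look like, the sketch below tests a polygon ring for one kind of invalidity, a self-intersection (the error mentioned in the question). It is an illustration in plain Python, not how sf or lwgeom implement their validity checks, and it only detects edges that properly cross; the function names are hypothetical.

```python
def _orient(a, b, c):
    """Sign of the cross product (b - a) x (c - a): 1, 0 or -1."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)

def segments_cross(p1, p2, q1, q2):
    """True if segments p1-p2 and q1-q2 cross at an interior point of both."""
    return (_orient(p1, p2, q1) * _orient(p1, p2, q2) < 0
            and _orient(q1, q2, p1) * _orient(q1, q2, p2) < 0)

def ring_self_intersects(ring):
    """ring: list of (x, y) vertices, with the first vertex not repeated."""
    n = len(ring)
    edges = [(ring[i], ring[(i + 1) % n]) for i in range(n)]
    for i in range(n):
        for j in range(i + 2, n):
            if i == 0 and j == n - 1:
                continue  # first and last edge are neighbours
            if segments_cross(*edges[i], *edges[j]):
                return True
    return False

square = [(0, 0), (1, 0), (1, 1), (0, 1)]   # simple ring
bowtie = [(0, 0), (1, 1), (1, 0), (0, 1)]   # two edges cross in the middle

print(ring_self_intersects(square))
print(ring_self_intersects(bowtie))
```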
I'm using heatmap.2 to create a plot; however, the initial plot that is saved to my source folder is missing the key and title.
When I then run dev.off(), the key and the title overwrite the original graph.
For instance, I will produce a plot like this:
Which is far from perfect. But then when I run the dev.off() to close the device (otherwise a host of other errors ensue):
What you are looking at above is a very distorted Key and my 'XYZ' title.
Why on earth is it creating two files, firstly the one with my matrix, and then overwriting this with a second file containing my flipping key and my title? I cannot follow the logic.
I've updated my OS, my version of R, RStudio, all my packages and uninstalled RStudio. Nothing seems to help.
If you'd like to try and replicate my error here is the example matrix:
structure(c(1, 4, 5, 3, 3, 4, 6, 1, 7, 5, 5, 4, 4, 8, 1, 3, 9,
2, 2, 9, 3, 1, 3, 4, 4, 5, 5, 5, 1, 4, 4, 3, 3, 3, 9, 1), .Dim = c(6L,
6L))
And this is the script I'm using to plot my example data. You'll need to provide a SourceDir and make sure you assign the matrix to the name "Matrix".
if (!require("gplots")) {
install.packages("gplots", dependencies = TRUE)
library(gplots)
}
if (!require("RColorBrewer")) {
install.packages("RColorBrewer", dependencies = TRUE)
library(RColorBrewer)
}
my_palette <- colorRampPalette(c("snow", "yellow", "darkorange", "red"))(n = 399)
# (optional) define the colour breaks manually for a skewed colour transition
col_breaks = c(seq(0,1,length=100), #white 'snow'
seq(2,4,length=100), # for yellow
seq(5,7,length=100), # for orange 'darkorange'
seq(8,9,length=100)) # for red
png(paste(SourceDir, "Heatmap_Test.png"),
width = 5*1000,
height = 5*1000,
res = 300,
pointsize =15)
heatmap.2(Matrix,
main = paste("XYZ"),
notecol="black",
key = "true" ,
colsep = c(3, 6, 9),
rowsep = c(3, 6, 9),
labCol = NULL,
labRow = NULL,
sepcolor="white",
sepwidth=c(0.08,0.08),
density.info="none",
trace="none",
margins=c(1,1),
col=my_palette,
breaks=col_breaks,
dendrogram="none",
RowSideColors = c(rep("blue", 3), rep("orange", 3)),
ColSideColors = c(rep("blue", 3), rep("orange", 3)),
srtCol = 0 ,
asp = 1 ,
adjCol = c(NA, 0) ,
adjRow = c(0, NA) ,
#keysize = 2 ,
Colv = FALSE ,
Rowv = FALSE ,
key.xlab = paste("Correlation") ,
cexRow = (1.8) ,
cexCol = (1.8) ,
notecex = (1.5) ,
lmat = rbind(c(0,0,0,0), c(0,0,2,0),c(0,1,3,0),c(0,0,0,0)) ,
#par(ColSideColors = c(2,2)),
lhei = c(1, 1, 3, 1) ,
lwid = c(1, 1, 3, 1))
dev.off()
I'd really appreciate any insight into this problem.
I believe this happened because I had more than just elements 1 to 4: the coloured row and column side bars I had added count as additional elements that have to be placed in the layout matrix.
As such, the following no longer cut it:
lmat = rbind(c(0,0,0,0), c(0,0,2,0),c(0,1,3,0),c(0,0,0,0)) ,
lhei = c(1, 1, 3, 1) ,
lwid = c(1, 1, 3, 1))
After much ado, I finally managed to get the following layout to work (on my actual data, not my example data).
lmat = rbind(c(0,4,5,0), c(0,0,2,0),c(0,1,3,0),c(0,0,6,0)) ,
lhei = c(0.4, 0.16, 3, 0.4) , # Alter dimensions of display array cell heights
lwid = c(0.4, 0.16, 3, 0.4),
Notice the inclusion of elements 5 and 6.
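The difference between the two layouts can be checked mechanically. The sketch below is a plain-Python illustration, assuming (as found above) that six plot elements need a cell once both RowSideColors and ColSideColors are set; the function name missing_elements is hypothetical and is not part of gplots.

```python
def missing_elements(lmat, n_elements):
    """Return the element numbers in 1..n_elements that lmat never places.

    lmat is a list of rows; 0 marks an empty cell, as in layout().
    """
    placed = {cell for row in lmat for cell in row if cell != 0}
    return sorted(set(range(1, n_elements + 1)) - placed)

# Layout from the question: only elements 1 to 3 get a cell
old_lmat = [[0, 0, 0, 0], [0, 0, 2, 0], [0, 1, 3, 0], [0, 0, 0, 0]]
# Layout that worked: elements 4, 5 and 6 are placed as well
new_lmat = [[0, 4, 5, 0], [0, 0, 2, 0], [0, 1, 3, 0], [0, 0, 6, 0]]

print(missing_elements(old_lmat, 6))
print(missing_elements(new_lmat, 6))
```

An element without a cell in the layout has nowhere to be drawn on the current page, which would be consistent with the key and title ending up on a separate one.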
So my final command looks like this (note that there will be many other changes but the real progress happened once I added in 5 and 6):
png(paste(SourceDir, "XYZ.png"),
width = 5*1500,
height = 5*1500,
res = 300, # 300 pixels per inch
pointsize =30)
heatmap.2(CombinedMtx,
main = paste("XYZ"), # heat map title
notecol="black", # change font color of cell labels to black
key = "true" ,
colsep = c(6, 12, 18),
labCol = c(" "," "," ", "XX"," "," "," "," "," ", "YY"," "," "," "," "," ", "ZZ"," "," "," "," "," ", "QQ"),
rowsep = c(6, 12, 18),
labRow = c(" "," "," ", "XX"," "," "," "," "," ", "YY"," "," "," "," "," ", "ZZ"," "," "," "," "," ", "QQ"),
sepcolor="white",
sepwidth=c(0.08,0.08),
density.info="none",
trace="none",
margins=c(1,1),
col=my_palette,
breaks=col_breaks,
dendrogram="none",
RowSideColors = c(rep("#deebf7", 6), rep("#1c9099", 6), rep("#addd8e", 6), rep("#fee391", 6)),
ColSideColors = c(rep("#deebf7", 6), rep("#1c9099", 6), rep("#addd8e", 6), rep("#fee391", 6)),
srtCol = 0 ,
asp = 1 ,
adjCol = c(1.5, -61.5) ,
adjRow = c(0, -1.38),
offsetRow = (-59.5),
keysize = 2 ,
Colv = FALSE ,
Rowv = FALSE ,
key.xlab = NA ,
key.ylab = NULL ,
key.title = NA ,
cexRow = (1.6) ,
cexCol = (1.6) ,
notecex = (1.5) ,
cex.main = (20),
lmat = rbind(c(0,4,5,0), c(0,0,2,0),c(0,1,3,0),c(0,0,6,0)) ,
#par(ColSideColors = c(2,2)),
lhei = c(0.4, 0.16, 3, 0.4) , # Alter dimensions of display array cell heights
lwid = c(0.4, 0.16, 3, 0.4),
symkey = any(0.5 < 0, na.rm=FALSE) || col_breaks,
key.par=list(mar=c(3.5,0, 1.8,0) ) #tweak specific key parameters
)
dev.off()
Also, if you don't start each time by creating the PNG and end each time by calling dev.off(), it won't work. I believe this might also have contributed to my confusion: potentially, after drawing the heatmap, some elements were only being drawn once dev.off() was run, causing the heatmap to be overwritten.
This (with my matrix) creates this image.
What I have done is a really gammy way of labelling my blocks, but until I can work out how to get ComplexHeatmap working properly I'll be stuck using hacks like this with heatmap.2.
Goal
From the data.frame d, I am trying to make a histogram of the column cMPerSite weighted by bpInPiece. In other words, bpInPiece is the number of observations at each cMPerSite value.
The Y-axis should represent densities and the X-axis should be on a log scale.
Attempts
I could do something like this (which could be improved by pre-allocating memory for x):
x = c()
for (row in 1:nrow(d))
{
x = c(x, rep(d$cMPerSite[row],d$bpInPiece[row]))
}
hist(x,breaks=100, freq=FALSE)
but this becomes completely impractical when there is too much data (I have about 10 million rows in my full data set) because x becomes too large to fit in RAM. Also, putting the X-axis on a log scale is, I think, necessarily a bit of a mess.
Alternatively, I would have thought I could do
ggplot(d) + geom_histogram(aes(x = cMPerSite, y=bpInPiece), stat="identity") + scale_x_log10() + theme_classic(25)
Warning: Ignoring unknown parameters: binwidth, bins, pad
but, for some reason I do not understand, nothing gets displayed. Also, I am not sure how to put the Y-axis in density rather than count.
I suppose the bin size should vary logarithmically along the X-axis, but that confuses me, as it would result in bins gathering an "artificially" high number of observations. I am not sure how histograms are typically displayed with a log-scale X-axis. Note that ggplot(d) + geom_histogram(aes(x = cMPerSite, y=bpInPiece), stat="identity") does not display anything either, so the problem is not only a question of the log scale on the X-axis.
Can you help me to make this histogram?
Subset of my data
structure(list(chrom = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), end = c(241608,
612298, 715797, 956634, 983330, 1190613, 1236417, 1330208, 1391915,
1464000, 1911436, 1913462, 2092038, 2169783, 2354812, 2363639,
2544241, 2551672, 2575287, 2589721, 2659117, 2884565, 3037319,
3100967, 3152276, 4319658, 4335072, 6301896, 6550219, 6596684,
7132319, 7435267, 7469158, 7604030, 7937619, 8131876, 9359659,
9598491, 9945959, 10262757, 10392172, 10646861, 10816847, 11094415,
11360199, 11964985, 12220179, 12222166, 12389943), cMInPiece = c(0,
1e-07, 1e-07, 0.7118558, 9.99999999473644e-08, 0.9540829, 9.99999998363421e-08,
0.4967211, 1.244988, 0.2137991, 8.808171, 0.500545200000001,
1.5721302, 1.6856566, 2.2552469, 1.0000000116861e-07, 2.6973586,
0.355113100000001, 0.355233800000001, 1.0000000116861e-07, 1.4903822,
2.8174978, 1.0000000116861e-07, 0.355231, 1.0000000116861e-07,
8.2735924, 0.425817699999996, 6.4568106, 0.372779399999999, 0.363684999999997,
0.181640399999999, 0.177473599999999, 1.0000000116861e-07, 0.177463800000005,
0.355294099999995, 1.0000000116861e-07, 1.6101482, 1.0000000116861e-07,
0.533477099999999, 0.355287800000006, 9.99999940631824e-08, 1.0000000116861e-07,
1.0000000116861e-07, 1.0000000116861e-07, 1.0000000116861e-07,
1.0000000116861e-07, 9.99999940631824e-08, 1.0000000116861e-07,
1.0000000116861e-07), bpInPiece = c(241608, 370690, 103499, 240837,
26696, 207283, 45804, 93791, 61707, 72085, 447436, 2026, 178576,
77745, 185029, 8827, 180602, 7431, 23615, 14434, 69396, 225448,
152754, 63648, 51309, 1167382, 15414, 1966824, 248323, 46465,
535635, 302948, 33891, 134872, 333589, 194257, 1227783, 238832,
347468, 316798, 129415, 254689, 169986, 277568, 265784, 604786,
255194, 1987, 167777), cMPerSite = c(1e-16, 2.69767190914241e-13,
9.66192910076426e-13, 2.95575762860358e-06, 3.74587953054257e-12,
4.60280341369046e-06, 2.18321543612659e-12, 5.29604226418313e-06,
2.01757985317711e-05, 2.96593049871679e-06, 1.96858790977928e-05,
0.000247060809476802, 8.80370374518411e-06, 2.16818650717088e-05,
1.21886131363192e-05, 1.13288774406491e-11, 1.49353750235324e-05,
4.77880635176962e-05, 1.50427186110523e-05, 6.92808654348135e-12,
2.14764856764078e-05, 1.24973288740641e-05, 6.54647349127419e-13,
5.58118086978381e-06, 1.94897583598608e-12, 7.08730509807415e-06,
2.76253860127155e-05, 3.28286140498591e-06, 1.50118756619403e-06,
7.82707414182711e-06, 3.39112268615754e-07, 5.85821989252278e-07,
2.95063589650969e-12, 1.31579423453352e-06, 1.06506539484214e-06,
5.14781970114898e-13, 1.31142734506016e-06, 4.18704366117646e-13,
1.53532728193675e-06, 1.1214963478305e-06, 7.72707909154135e-13,
3.92635728942395e-13, 5.88283747888707e-13, 3.60272081683082e-13,
3.76245376578762e-13, 1.65347744770232e-13, 3.91858719496471e-13,
5.03271269092148e-11, 5.96029260080999e-13)), .Names = c("chrom",
"end", "cMInPiece", "bpInPiece", "cMPerSite"), row.names = c(NA,
-49L), class = "data.frame")
This might get you started
Assuming your data is too large to process in one step, the idea is to build the histogram manually; a histogram is essentially the number of observations per bin.
1) Split your data.frame into pieces of a size that fits in memory (N can be any number):
N <- 10
L <- split(df, cut(seq_len(nrow(df)), breaks=N))
2) For each split, sum bpInPiece within each group: { i %>% group_by(G = floor(-log10(cMPerSite))) %>% summarise(sum=sum(bpInPiece)) }
3) Aggregate all splits: %>% group_by(G) %>% summarise(sum = sum(sum))
4) Plot: ggplot(...)
library(tidyverse)
counts <- map_df(L, function(i) { i %>% group_by(G = floor(-log10(cMPerSite))) %>% summarise(sum=sum(bpInPiece)) }) %>%
group_by(G) %>% summarise(sum = sum(sum)) %>%
ggplot(., aes(G, sum)) + geom_col()
counts
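The same idea can be written as a streaming computation that never expands the data: each row adds its weight to the bin its value falls into, and the totals are then rescaled into densities. The sketch below is plain Python under a few assumptions: bins are one decade wide on the log10 axis, the bin index is floor(log10(value)) rather than the floor(-log10(value)) used above, and the function names are hypothetical.

```python
import math
from collections import defaultdict

def weighted_log_hist(rows, bins_per_decade=1):
    """Sum the weights per log10 bin.

    rows: iterable of (value, weight) pairs with value > 0; it can be a
    generator, so the full data set never has to sit in memory.
    """
    totals = defaultdict(float)
    for value, weight in rows:
        bin_index = math.floor(math.log10(value) * bins_per_decade)
        totals[bin_index] += weight
    return dict(totals)

def to_density(totals, bins_per_decade=1):
    """Rescale so the bars integrate to 1 over the log10(value) axis."""
    total = sum(totals.values())
    width = 1.0 / bins_per_decade
    return {b: w / (total * width) for b, w in totals.items()}

# Three (cMPerSite, bpInPiece) rows taken from the data subset in the question
rows = [
    (2.95575762860358e-06, 240837),
    (4.60280341369046e-06, 207283),
    (2.01757985317711e-05, 61707),
]
totals = weighted_log_hist(rows)
density = to_density(totals)
print(totals)
print(density)
```

Because the density is defined on the log10 axis, wide bins at large values are not inflated relative to narrow bins at small values, which is one common convention for log-scale histograms.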
I have a data.frame in R :
p=structure(list(WSbin01 = c(214.98151752527, -46.9493685420515,
154.726947679253), WSbin02 = c(1093.46050365665, 420.318207941967,
927.97317496775), WSbin03 = c(2855.24990411661, 2035.57575481323,
2662.2595957214), WSbin04 = c(5863.91399544626, 4881.81544665127,
5625.17650575444), WSbin05 = c(9891.70254019722, 8845.32506336827,
9666.14583347469), WSbin06 = c(14562.1527820802, 13401.1727730953,
14321.601249974), WSbin07 = c(19091.1307681137, 18003.2115315665,
18903.0179613827), WSbin08 = c(24422.7094972645, 23694.5453703207,
24357.8071162775), WSbin09 = c(30215.4088114124, 30214.3195264298,
30310.242671113), WSbin10 = c(36958.2122031382, 37964.9044838778,
37239.6908819524), WSbin11 = c(41844.810779792, 43701.2643596447,
42343.7442683171), WSbin12 = c(37616.8187087318, 39348.3188777835,
38178.9009247311), WSbin13 = c(20953.0973658833, 21720.1930292221,
21251.8654076726), WSbin14 = c(7155.3786781173, 7262.61983182254,
7233.60584469268), WSbin15 = c(2171.61052809769, 2120.97045661101,
2173.49396732091), WSbin16 = c(779.72276608943, 745.52198490267,
767.81436310063)), .Names = c("WSbin01", "WSbin02", "WSbin03",
"WSbin04", "WSbin05", "WSbin06", "WSbin07", "WSbin08", "WSbin09",
"WSbin10", "WSbin11", "WSbin12", "WSbin13", "WSbin14", "WSbin15",
"WSbin16"), class = "data.frame", row.names = c(NA, -3L))
I would like to set a background color for the maximum value of each column.
DT::datatable returns the table, but I don't know how to set the formatStyle parameters so that the maximum value in each column is shown in a different color.
Furthermore, I have a vector z= c(1, 1, 1, 1, 1, 1, 1, 1, 3, 2, 2, 2, 2, 2, 3, 1) . I want the background color of each column to depend on it: if z[i]=1, column i should be green; if z[i]=2, column i should be red; and if z[i]=3, column i should be blue.
Combining parts of the DT guide (https://rstudio.github.io/DT/010-style.html) and this question (Datatable: apply different formatStyle to each column), I get this:
library(DT)
# one distinct colour per column
colors <- apply(col2rgb(rainbow(n=ncol(p))),2,function(x)paste0("rgb(",paste(x,collapse=","),")"))
data <- datatable(p)
sapply(c(1:ncol(p)),function(x){
data <<- data %>% formatStyle(colnames(p)[[x]],backgroundColor = styleEqual(max(p[[x]]), colors[x]))
})
data
The answer to your second question is similar:
z= c(1, 1, 1, 1, 1, 1, 1, 1, 3, 2, 2, 2, 2, 2, 3, 1)
colors <- c("green", "red", "blue") # z = 1 -> green, 2 -> red, 3 -> blue, as requested in the question
data <- datatable(p)
sapply(c(1:ncol(p)),function(x){
data <<- data %>% formatStyle(
colnames(p)[[x]],
backgroundColor = colors[z[x]]
)
})
data
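Both answers rest on the same per-column logic: find each column's maximum, and look up a colour from the column's z code. The sketch below spells that logic out in plain Python, independent of DT; the function name column_styles and the dictionary layout are assumptions made for illustration, and only two of the sixteen columns are included.

```python
def column_styles(table, z, palette):
    """Return {column: (max_value, colour)} for each column of `table`.

    table   : dict mapping column name -> list of numbers
    z       : one group code per column, in column order
    palette : dict mapping group code -> CSS colour name
    """
    styles = {}
    for name, code in zip(table, z):
        styles[name] = (max(table[name]), palette[code])
    return styles

# Two columns from the question's data.frame p
p_subset = {
    "WSbin01": [214.98151752527, -46.9493685420515, 154.726947679253],
    "WSbin10": [36958.2122031382, 37964.9044838778, 37239.6908819524],
}
z_subset = [1, 2]  # the z entries for WSbin01 and WSbin10
palette = {1: "green", 2: "red", 3: "blue"}

styles = column_styles(p_subset, z_subset, palette)
for name, (max_value, colour) in styles.items():
    print(name, max_value, colour)
```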