Highlighting the maximum value of each column of a data frame in R

I have a data.frame in R:
p=structure(list(WSbin01 = c(214.98151752527, -46.9493685420515,
154.726947679253), WSbin02 = c(1093.46050365665, 420.318207941967,
927.97317496775), WSbin03 = c(2855.24990411661, 2035.57575481323,
2662.2595957214), WSbin04 = c(5863.91399544626, 4881.81544665127,
5625.17650575444), WSbin05 = c(9891.70254019722, 8845.32506336827,
9666.14583347469), WSbin06 = c(14562.1527820802, 13401.1727730953,
14321.601249974), WSbin07 = c(19091.1307681137, 18003.2115315665,
18903.0179613827), WSbin08 = c(24422.7094972645, 23694.5453703207,
24357.8071162775), WSbin09 = c(30215.4088114124, 30214.3195264298,
30310.242671113), WSbin10 = c(36958.2122031382, 37964.9044838778,
37239.6908819524), WSbin11 = c(41844.810779792, 43701.2643596447,
42343.7442683171), WSbin12 = c(37616.8187087318, 39348.3188777835,
38178.9009247311), WSbin13 = c(20953.0973658833, 21720.1930292221,
21251.8654076726), WSbin14 = c(7155.3786781173, 7262.61983182254,
7233.60584469268), WSbin15 = c(2171.61052809769, 2120.97045661101,
2173.49396732091), WSbin16 = c(779.72276608943, 745.52198490267,
767.81436310063)), .Names = c("WSbin01", "WSbin02", "WSbin03",
"WSbin04", "WSbin05", "WSbin06", "WSbin07", "WSbin08", "WSbin09",
"WSbin10", "WSbin11", "WSbin12", "WSbin13", "WSbin14", "WSbin15",
"WSbin16"), class = "data.frame", row.names = c(NA, -3L))
I would like to set a background color for the maximum value of each column.
Using DT::datatable I can display the table, but I don't know how to set the formatStyle parameters so that the maximum value in each column gets a different color.
Furthermore, I have a vector z = c(1, 1, 1, 1, 1, 1, 1, 1, 3, 2, 2, 2, 2, 2, 3, 1). I want each column's background color to depend on z: if z[i] = 1, column i should be green; if z[i] = 2, red; and if z[i] = 3, blue.

Combining parts of the DT guide (https://rstudio.github.io/DT/010-style.html) and this question (Datatable: apply different formatStyle to each column), I get this:
library(DT)
# one distinct colour per column, converted to CSS "rgb(r,g,b)" strings
colors <- apply(col2rgb(rainbow(n = ncol(p))), 2,
                function(x) paste0("rgb(", paste(x, collapse = ","), ")"))
data <- datatable(p)
# style only the cell(s) equal to each column's maximum
for (x in seq_len(ncol(p))) {
  data <- formatStyle(data, colnames(p)[x],
                      backgroundColor = styleEqual(max(p[[x]]), colors[x]))
}
data
The answer to your second question is similar; list the three colours in the order of z's codes (1 = green, 2 = red, 3 = blue):
library(DT)
z <- c(1, 1, 1, 1, 1, 1, 1, 1, 3, 2, 2, 2, 2, 2, 3, 1)
# colors[k] is used for every column i with z[i] == k
colors <- c("green", "red", "blue")
data <- datatable(p)
for (x in seq_len(ncol(p))) {
  # a constant value styles every cell in the column
  data <- formatStyle(data, colnames(p)[x], backgroundColor = colors[z[x]])
}
data
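For the p in the question, z coincides with the row index of each column's maximum (rows 1, 2 or 3), so it could also be computed instead of typed in. A minimal base-R sketch, assuming that is the intended meaning of z; the helper column_colors() and the small demo data frame are hypothetical:

```r
# Hypothetical helper: one colour per column, chosen by the row that holds
# that column's maximum (which.max() returns the first such row on ties).
column_colors <- function(df, palette = c("green", "red", "blue")) {
  max_row <- vapply(df, which.max, integer(1))
  setNames(palette[max_row], names(df))
}

# Small hypothetical data frame with three rows
demo <- data.frame(a = c(5, 1, 2), b = c(0, 9, 3), c = c(1, 2, 7))

z_demo <- vapply(demo, which.max, integer(1))  # row of each column's maximum
print(z_demo)
print(column_colors(demo))
```

Applied to p, vapply(p, which.max, integer(1)) gives the same values as z, and the named vector returned by column_colors(p) can be indexed by column name inside the formatStyle() loop.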

Related

gt R package: Giving a different color to a table's cells according to numerical threshold(s)

Aim
Giving a different color to a table's cells according to numerical threshold(s).
R Package
gt
Reproducible example
mydata <- structure(list(none = c(4, 4, 25, 18, 10), light = c(2, 3, 10,
24, 6), medium = c(3, 7, 12, 33, 7), heavy = c(2, 4, 4, 13, 2
)), row.names = c("SM", "JM", "SE", "JE", "SC"), class = "data.frame")
Using the above dataset, I can produce a table (however crude), using the following code:
mytable <- gt::gt(mydata)
Where I got stuck
It must be really easy, but I can't wrap my head around how to assign (say) red to the cells whose value is (say) larger than 20 AND blue to the cells whose value is (say) smaller than 10. I have spent days searching Google (example HERE) without finding a solution. My best guess is the tab_style() function, but I can't work out how to tune its parameters to get what I am after.
This isn't ideal if you have an arbitrarily large data frame, but for an example of your size it's certainly manageable, imo. I generalized the tests as separate functions to reduce additional code duplication and make it easier to adjust your conditional parameters.
A more general solution is to loop over a vector of column names, as described here.
library(gt)
isHigh <- function(x) {
x > 20
}
isLow <- function(x) {
x < 10
}
mydata %>%
gt() %>%
tab_style(
style = list(
cell_fill(color = 'red'),
cell_text(weight = 'bold')
),
locations =
list(
cells_body(
columns = none,
rows = isHigh(none)
),
cells_body(
columns = light,
rows = isHigh(light)
),
cells_body(
columns = medium,
rows = isHigh(medium)
),
cells_body(
columns = heavy,
rows = isHigh(heavy)
)
)
) %>%
tab_style(
style = list(
cell_fill(color = 'lightblue'),
cell_text(weight = 'bold')
),
locations =
list(
cells_body(
columns = none,
rows = isLow(none)
),
cells_body(
columns = light,
rows = isLow(light)
),
cells_body(
columns = medium,
rows = isLow(medium)
),
cells_body(
columns = heavy,
rows = isLow(heavy)
)
)
)
On the basis of the comment I got, and after having read the earlier post here on SO, I came up with the following:
Create a dataset to work with:
mydata <- structure(list(none = c(4, 4, 25, 18, 10), light = c(2, 3, 10,
24, 6), medium = c(3, 7, 12, 33, 7), heavy = c(2, 4, 4, 13, 2
)), row.names = c("SM", "JM", "SE", "JE", "SC"), class = "data.frame")
Create a 'gt' table:
mytable <- gt::gt(mydata)
Create a vector of column names to be used later inside the 'for' loops:
col.names.vect <- colnames(mydata)
Create two 'for' loops, one for each threshold at which we want values to get a different color (say, red text for values > 20 and blue text for values < 5):
for(i in seq_along(col.names.vect)) {
  mytable <- gt::tab_style(mytable,
    style = gt::cell_text(color = "red"),
    locations = gt::cells_body(
      columns = col.names.vect[i],
      # logical vector from the source data: one TRUE/FALSE per row
      rows = mydata[[col.names.vect[i]]] > 20))
}
for(i in seq_along(col.names.vect)) {
  mytable <- gt::tab_style(mytable,
    style = gt::cell_text(color = "blue"),
    locations = gt::cells_body(
      columns = col.names.vect[i],
      rows = mydata[[col.names.vect[i]]] < 5))
}
This seems to achieve the goal I had in mind.
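Before styling, it can help to check which cells each threshold actually selects. A minimal base-R sketch (the helper flag_cells() is hypothetical) that lists the row and column of every matching cell in mydata:

```r
mydata <- structure(list(none = c(4, 4, 25, 18, 10), light = c(2, 3, 10, 24, 6),
                         medium = c(3, 7, 12, 33, 7), heavy = c(2, 4, 4, 13, 2)),
                    row.names = c("SM", "JM", "SE", "JE", "SC"),
                    class = "data.frame")

# Hypothetical helper: row and column names of the cells where test() is TRUE
flag_cells <- function(df, test) {
  hits <- which(test(as.matrix(df)), arr.ind = TRUE)
  data.frame(row = rownames(df)[hits[, "row"]],
             column = colnames(df)[hits[, "col"]],
             stringsAsFactors = FALSE)
}

high <- flag_cells(mydata, function(x) x > 20)  # cells to colour red
low  <- flag_cells(mydata, function(x) x < 5)   # cells to colour blue
print(high)
print(low)
```

The same comparisons, applied one column at a time, are what the rows argument of cells_body() receives inside the loops, so these lists should match the cells that end up coloured.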

Change Size of Additional Flextable Header

I have been using add_header_lines() to add a header to my flextable. I would like this header to have a different text format than the table as a whole. Unfortunately, when I use fontsize() it formats text for both the header and the column names. Is there a way to only change the text format for the header produced with add_header_lines()?
Here is some sample code:
df <- data.frame(crime = c("assault", "homicide", "burglary"),
Jan = c(5, 2, 7),
Feb = c(2, 4, 0),
Mar = c(1, 2, 1))
flex <- flextable(df)
flex <- add_header_lines(flex, values = "This is a header")
flex <- fontsize(flex, size = 15, part = "header")
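As you can see, fontsize() affects both the added header line and the column names. Any idea how to change the text format of just the added header?
You have to use the row selector i: the line added by add_header_lines() becomes the first row of the header part, so i = 1 targets it alone.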
library(flextable)
df <- data.frame(crime = c("assault", "homicide", "burglary"),
Jan = c(5, 2, 7),
Feb = c(2, 4, 0),
Mar = c(1, 2, 1))
flex <- flextable(df)
flex <- add_header_lines(flex, values = "This is a header")
flex <- fontsize(flex, size = 15, part = "header", i = 1)
flex <- color(flex, color = "red", part = "header", i = 1)
flex

Parts Missing From Plot, That then Reappear and Overwrite The Entire Plot When Saved? (R, Heatmap.2)

I'm using heatmap.2 to create a plot; however, the initial plot that is saved to my source folder is missing a key and a title.
When I then run dev.off(), the key and the title overwrite the original graph.
For instance, I will produce a plot like this:
Which is far from perfect. But then, when I run dev.off() to close the device (otherwise a host of other errors ensue):
What you are looking at above is a very distorted key and my 'XYZ' title.
Why on earth is it creating two files, first the one with my matrix, and then overwriting it with a second file containing my flipping key and my title? I cannot follow the logic.
I've updated my OS, my version of R, RStudio and all my packages, and uninstalled RStudio. Nothing seems to help.
If you'd like to try and replicate my error here is the example matrix:
structure(c(1, 4, 5, 3, 3, 4, 6, 1, 7, 5, 5, 4, 4, 8, 1, 3, 9,
2, 2, 9, 3, 1, 3, 4, 4, 5, 5, 5, 1, 4, 4, 3, 3, 3, 9, 1), .Dim = c(6L,
6L))
And this is the script I'm using to plot my example data. You'll need to provide a SourceDir and make sure you assign the matrix to the name "Matrix".
if (!require("gplots")) {
install.packages("gplots", dependencies = TRUE)
library(gplots)
}
if (!require("RColorBrewer")) {
install.packages("RColorBrewer", dependencies = TRUE)
library(RColorBrewer)
}
# 399 colours for the transition: heatmap.2 needs one colour fewer than breaks (400)
my_palette <- colorRampPalette(c("snow", "yellow", "darkorange", "red"))(n = 399)
col_breaks = c(seq(0,1,length=100), #white 'snow'
seq(2,4,length=100), # for yellow
seq(5,7,length=100), # for orange 'darkorange'
seq(8,9,length=100)) # for red
png(file.path(SourceDir, "Heatmap_Test.png"),  # paste() would insert a space before the file name
width = 5*1000,
height = 5*1000,
res = 300,
pointsize =15)
heatmap.2(Matrix,
main = paste("XYZ"),
notecol="black",
key = TRUE ,
colsep = c(3, 6, 9),
rowsep = c(3, 6, 9),
labCol = NULL,
labRow = NULL,
sepcolor="white",
sepwidth=c(0.08,0.08),
density.info="none",
trace="none",
margins=c(1,1),
col=my_palette,
breaks=col_breaks,
dendrogram="none",
RowSideColors = c(rep("blue", 3), rep("orange", 3)),
ColSideColors = c(rep("blue", 3), rep("orange", 3)),
srtCol = 0 ,
asp = 1 ,
adjCol = c(NA, 0) ,
adjRow = c(0, NA) ,
#keysize = 2 ,
Colv = FALSE ,
Rowv = FALSE ,
key.xlab = paste("Correlation") ,
cexRow = (1.8) ,
cexCol = (1.8) ,
notecex = (1.5) ,
lmat = rbind(c(0,0,0,0), c(0,0,2,0),c(0,1,3,0),c(0,0,0,0)) ,
#par(ColSideColors = c(2,2)),
lhei = c(1, 1, 3, 1) ,
lwid = c(1, 1, 3, 1))
dev.off()
I'd really appreciate any insight into this problem.
I believe this resulted from the fact that I had more than just elements 1 to 4: the coloured side rows I had added counted as additional elements that had to be arranged in the layout matrix.
As such:
lmat = rbind(c(0,0,0,0), c(0,0,2,0),c(0,1,3,0),c(0,0,0,0)) ,
lhei = c(1, 1, 3, 1) ,
lwid = c(1, 1, 3, 1)
no longer cut it. After much ado, I finally managed to get the following layout to work (on my actual data, not my example data).
lmat = rbind(c(0,4,5,0), c(0,0,2,0),c(0,1,3,0),c(0,0,6,0)) ,
lhei = c(0.4, 0.16, 3, 0.4) , # Alter dimensions of display array cell heights
lwid = c(0.4, 0.16, 3, 0.4),
Notice the inclusion of elements 5 and 6.
So my final command looks like this (note that there will be many other changes but the real progress happened once I added in 5 and 6):
png(file.path(SourceDir, "XYZ.png"),  # paste() would insert a space before the file name
width = 5*1500,
height = 5*1500,
res = 300, # 300 pixels per inch
pointsize =30)
heatmap.2(CombinedMtx,
main = paste("XYZ"), # heat map title
notecol="black", # change font color of cell labels to black
key = TRUE ,
colsep = c(6, 12, 18),
labCol = c(" "," "," ", "XX"," "," "," "," "," ", "YY"," "," "," "," "," ", "ZZ"," "," "," "," "," ", "QQ"),
rowsep = c(6, 12, 18),
labRow = c(" "," "," ", "XX"," "," "," "," "," ", "YY"," "," "," "," "," ", "ZZ"," "," "," "," "," ", "QQ"),
sepcolor="white",
sepwidth=c(0.08,0.08),
density.info="none",
trace="none",
margins=c(1,1),
col=my_palette,
breaks=col_breaks,
dendrogram="none",
RowSideColors = c(rep("#deebf7", 6), rep("#1c9099", 6), rep("#addd8e", 6), rep("#fee391", 6)),
ColSideColors = c(rep("#deebf7", 6), rep("#1c9099", 6), rep("#addd8e", 6), rep("#fee391", 6)),
srtCol = 0 ,
asp = 1 ,
adjCol = c(1.5, -61.5) ,
adjRow = c(0, -1.38),
offsetRow = (-59.5),
keysize = 2 ,
Colv = FALSE ,
Rowv = FALSE ,
key.xlab = NA ,
key.ylab = NULL ,
key.title = NA ,
cexRow = (1.6) ,
cexCol = (1.6) ,
notecex = (1.5) ,
cex.main = (20),
lmat = rbind(c(0,4,5,0), c(0,0,2,0),c(0,1,3,0),c(0,0,6,0)) ,
#par(ColSideColors = c(2,2)),
lhei = c(0.4, 0.16, 3, 0.4) , # Alter dimensions of display array cell heights
lwid = c(0.4, 0.16, 3, 0.4),
symkey = FALSE, # colour key not forced to be symmetric about 0
key.par=list(mar=c(3.5,0, 1.8,0) ) #tweak specific key parameters
)
dev.off()
Also, if you don't start each time by creating the PNG and end each time with dev.off(), it won't work. I believe this might also have contributed to my confusion: potentially, after drawing the heatmap, some elements were being drawn once dev.off() was run, causing the heatmap to be overwritten.
This (with my matrix) creates this image.
What I have done is a really gammy way of labelling my blocks, but until I can work out how to get ComplexHeatmap working properly I'll be stuck using hacks like this with heatmap.2.
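A likely mechanism behind the overwriting: when a layout() matrix has fewer panels than the number of plots drawn, the next plot starts a new page, and a png() device given a single fixed file name (no %d page counter) writes every page to that same file, so only the last page survives. A minimal base-R sketch, using plain plot() calls instead of heatmap.2 and a temporary directory (the function name draw_four_plots() is hypothetical):

```r
out_dir <- tempfile("layout_demo_")
dir.create(out_dir)

# Hypothetical demo: a layout with only 3 panels, but 4 plots drawn
draw_four_plots <- function() {
  layout(matrix(c(0, 2, 1, 3), nrow = 2))  # panels 1-3 only
  for (k in 1:4) plot(k, main = paste("plot", k))  # plot 4 starts page 2
}

# 1) Fixed file name: page 2 is written to the same file as page 1
png(file.path(out_dir, "fixed.png"))
draw_four_plots()
invisible(dev.off())

# 2) File name with a page counter: each page gets its own file
png(file.path(out_dir, "page%02d.png"))
draw_four_plots()
invisible(dev.off())

print(list.files(out_dir))
```

With heatmap.2 the extra panels are the side-colour bars, dendrograms and key, which is why giving every element a cell in lmat (as in the final command above) keeps everything on one page.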

Radar chart is not displaying the values accurately on the axis

I am trying to create a radar chart but the axis stops at 4 when the maximum I have given is 10. Any help is much appreciated.
I have taken a subset of the dataset from a .csv file. Below is the subset:
datasample = structure(list(Score = 7.522, Whisker.high = 7.581728, Whisker.low = 7.462272,
GDP = 1.482383, Family = 1.551122, Health = 0.7925655, Freedom = 0.6260067,
Generosity = 0.3552805, TrustInGovernmentAndCorruption. = 0.4007701,
Dystopia.Residual = 2.3137), .Names = c("Score", "Whisker.high",
"Whisker.low", "GDP", "Family", "Health", "Freedom", "Generosity",
"TrustInGovernmentAndCorruption.",
"Dystopia.Residual"), class = "data.frame", row.names = c(NA,
-1L))
The subset contains 10 columns and 1 row.
Code:
library(fmsb)
#data <- read.csv("~/Desktop/App-1/2017.csv")
#datasample <- data[2, 3:12]
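# fmsb::radarchart() expects the axis maximum in row 1 and the minimum in row 2
datasample <- rbind(rep(10, ncol(datasample)), rep(0, ncol(datasample)), datasample)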
radarchart(datasample, axistype = 1,
pcol = rgb(0.2,0.5,0.5,0.9), pfcol = rgb(0.2,0.5,0.5, 0.5), plwd = 4,
cglcol = "grey", cglty = 1, axislabcol = "grey",caxislabels = seq(0,20,1), cglwd = 0.8,
vlcex = 0.8)
Here is the chart created:
The highest value in the subset (datasample) is 7.581728 but the chart does not display that value. How do I fix this?
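The radar plot you are generating has an axis expressed as percentages of the range between the minimum and maximum rows you supplied (0 and 10); you can see this when you remove caxislabels from your plot, which gives 0%, 25%, 50%, 75% and 100% grid lines. With the default seg = 4, only the first five entries of caxislabels (0, 1, 2, 3, 4) are used, one per grid line, which is why the axis appears to stop at 4. The values you are seeing around 3 are actually the ~7.5 values (on the 75% grid line).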
One way to fix this so that the axis is as expected:
radarchart(datasample, axistype = 1,
pcol = rgb(0.2,0.5,0.5,0.9), pfcol = rgb(0.2,0.5,0.5, 0.5), plwd = 4,
cglcol = "grey", cglty = 1, axislabcol = "grey",
caxislabels = seq(min(datasample),max(datasample),1), seg=length(seq(min(datasample),max(datasample),1))-1,
cglwd = 0.8, vlcex = 0.8)
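The arithmetic behind the mismatch can be checked without plotting. A base-R sketch, assuming (as with fmsb's axistype = 1) that grid line i of seg is labelled with caxislabels[i + 1] and sits at min + i/seg * (max - min):

```r
axis_min <- 0   # second row of datasample
axis_max <- 10  # first row of datasample
seg <- 4        # radarchart()'s default number of segments

# Data value that each grid line (centre to outer ring) actually represents
grid_values <- axis_min + (0:seg) / seg * (axis_max - axis_min)

# Labels drawn when caxislabels = seq(0, 20, 1): only the first seg + 1
labels_used <- seq(0, 20, 1)[1:(seg + 1)]

# Labels that agree with the data: one per grid line, spanning min to max
labels_ok <- seq(axis_min, axis_max, length.out = seg + 1)

print(data.frame(grid_values, labels_used, labels_ok))
```

So passing caxislabels = seq(0, 10, 2.5) with the default seg = 4 should also label the grid correctly; the fix above instead keeps unit steps and raises seg to match.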

R: Smoothing Time Series Data by Item

I have a data series that displays purchases over time by item ID. Purchasing habits are irregular, so I would like to smooth this data out over time and by item ID.
If items were ordered more regularly (i.e. every day), we could better plot/evaluate our ordering and set stocking levels. However, some people purchase an excess of an item so they don't have to restock often. This skews our par-level data, since a one-day total could really be a week's worth of product if the item is only ordered once per week.
Reproducible Example:
POData <- structure(list(a = structure(c(1499918400, 1499918400, 1499918400,
1499918400, 1499918400, 1499918400, 1496376000, 1497412800, 1497412800,
1497412800, 1497412800, 1497412800, 1497240000, 1497412800, 1497412800,
1497412800, 1501214400, 1496376000, 1496376000, 1496376000, 1496289600,
1496289600, 1496289600, 1496289600, 1496289600, 1496289600, 1501214400,
1501214400, 1501214400, 1501214400), class = c("POSIXct", "POSIXt"
), tzone = ""), b = c(446032L, 101612L, 37740L, 482207L, 152360L,
4483L, 482207L, 141729L, 81192L, 482207L, 85273L, 142955L, 460003L,
142955L, 17752L, 29763L, 309189L, 361905L, 17396L, 410762L, 437420L,
17752L, 18002L, 150698L, 163342L, 433332L, 150587L, 44159L, 433332L,
446032L), c = c(4, 1, 25, 1, 1, 1, 3, 12, 12, 1, 1, 1, 300, 1,
1, 2, 6, 6, 2, 1, 1, 1, 1, 1, 1, 1, 40, 2, 1, 2)), .Names = c("PO Date",
"PS Item ID", "PO Qty"), row.names = c(NA, 30L), class = "data.frame")
This is probably a simple question, but I hope someone has a simple way to do this.
You could use something like this:
library(zoo)
library(dplyr)
df2 <- POData %>%
  arrange(`PS Item ID`, `PO Date`) %>%
  group_by(`PS Item ID`) %>%
  # trailing moving average of the current and previous order of each item;
  # partial = TRUE averages whatever is available for an item's first order
  # (and for items that were ordered only once)
  mutate(qty_smoothed = rollapplyr(`PO Qty`, width = 2, FUN = mean,
                                   partial = TRUE)) %>%
  ungroup()
It essentially arranges by PS Item ID and PO Date and groups by PS Item ID. The width argument of rollapplyr() specifies how many orders back the moving average reaches; it is set to 2 here because your data does not have many orders per item ID.
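If you would rather not depend on zoo and dplyr, the same grouped trailing mean can be sketched in base R with ave(). The helper trailing_mean() and the small demo data are hypothetical; on POData you would first sort by `PS Item ID` and `PO Date`, then pass the `PO Qty` and `PS Item ID` columns to ave() the same way:

```r
# Hypothetical helper: mean of the current value and up to k - 1 previous ones
trailing_mean <- function(x, k) {
  vapply(seq_along(x), function(i) mean(x[max(1, i - k + 1):i]), numeric(1))
}

# Hypothetical demo data, already sorted by item and date
orders <- data.frame(item = c(1, 1, 1, 2, 2),
                     qty  = c(4, 2, 6, 40, 2))

# ave() applies the function within each item and keeps the original row order
orders$qty_smoothed <- ave(orders$qty, orders$item,
                           FUN = function(x) trailing_mean(x, k = 2))
print(orders)
```

The first order of each item is averaged over itself only, which matches the partial = TRUE behaviour of rollapplyr().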
