So I have a bunch of data that I am looking through. In the past, I have used the openxlsx package to highlight entire rows. I want to step it up a bit and highlight specific cells. Here is a sample of the format of the data I am working with:
df <- structure(list(Name = c("ENSCAFG00000000019","ENSCAFG00000000052", "ENSCAFG00000000094","ENSCAFG00000000210"), baseMean = c(692.430970065448, 391.533849079888, 1223.74083601928, 280.477417588943), log2FoldChange = c("0.0819834415495699",
"-2.6249568393179099", "6.15181461329998", "0.23483770613468"
), lfcSE = c("0.247177913269579", "0.65059275393549898", "0.33371763683349598", "0.353449339778654"), stat = c("4.3773467751931898", "-4.0347157625707997",
"3.4514646101088902", "3.4936766522410099"), pvalue = c("1.20132758621478E-5", "5.4668435006169397E-5", "5.5755287106466398E-4", "4.7641767052765697E-4"), padj = c("9.8372077245438908E-4", "0.00004", "0.000006", "1.47480018315951E-2"), symbol = c("ZNF516", "CDH19", "LMAN1", "NA"), entrez = c("483930", "483948", "476186", "NA")), .Names = c("Names", "baseMean", "log2FoldChange", "lfcSE", "stat", "pvalue", "padj", "symbol", "entrez"), row.names = c(NA, -4L), class = c("tbl_df", "tbl", "data.frame"))
So what I want to do is highlight cells in log2FoldChange that are either <= -1 or >= 1, and highlight cells in padj that are <= 0.05. Is this something that can be done? I have read a lot about highlighting rows, but not about highlighting specific cells based on a condition.
This is sort of what I am hoping I can get the data to look like. The highlighted log2FoldChange and padj cells don't need to match the example exactly.
Thanks in advance
Here is one example. Note, however, that all cells in column padj have values below 0.05.
library(openxlsx)
# note that some columns of df look numeric, but are character
df <- data.frame(
Name = c("ENSCAFG00000000019","ENSCAFG00000000052", "ENSCAFG00000000094","ENSCAFG00000000210"),
baseMean = c(692.430970065448, 391.533849079888, 1223.74083601928, 280.477417588943),
log2FoldChange = c(0.0819834415495699, -2.6249568393179099, 6.15181461329998, 0.23483770613468),
lfcSE = c(0.247177913269579, 0.65059275393549898, 0.33371763683349598, 0.353449339778654),
stat = c(4.3773467751931898, -4.0347157625707997, 3.4514646101088902, 3.4936766522410099),
pvalue = c(1.20132758621478E-5, 5.4668435006169397E-5, 5.5755287106466398E-4, 4.7641767052765697E-4),
padj = c(9.8372077245438908E-4, 0.00004, 0.000006, 1.47480018315951E-2),
symbol = c("ZNF516", "CDH19", "LMAN1", "NA"), entrez = c("483930", "483948", "476186", "NA"),
stringsAsFactors=FALSE
)
# write dataset
wb <- createWorkbook()
addWorksheet(wb, sheetName="df")
writeData(wb, sheet="df", x=df)
# define style
yellow_style <- createStyle(fgFill="#FFFF00")
# log2FoldChange
y <- which(colnames(df)=="log2FoldChange")
x <- which(abs(df$log2FoldChange)>=1)
addStyle(wb, sheet="df", style=yellow_style, rows=x+1, cols=y, gridExpand=TRUE) # +1 for header line
# padj
y <- which(colnames(df)=="padj")
x <- which(abs(df$padj)<=0.05)
addStyle(wb, sheet="df", style=yellow_style, rows=x+1, cols=y, gridExpand=TRUE) # +1 for header line
# write result
saveWorkbook(wb, "yellow.xlsx", overwrite=TRUE)
You may also want to have a look at BERT.
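As the comment in the code above notes, several columns in the original df look numeric but are stored as character. A minimal sketch of converting them before testing the thresholds, using only base R; the object names (df_chr, num_cols, rows_lfc, rows_padj) are made up for illustration:

```r
# Two of the character columns from the question
df_chr <- data.frame(
  log2FoldChange = c("0.0819834415495699", "-2.6249568393179099",
                     "6.15181461329998", "0.23483770613468"),
  padj = c("9.8372077245438908E-4", "0.00004", "0.000006", "1.47480018315951E-2"),
  stringsAsFactors = FALSE
)

# Convert the character columns to numeric so comparisons behave numerically
num_cols <- c("log2FoldChange", "padj")
df_chr[num_cols] <- lapply(df_chr[num_cols], as.numeric)

# Data rows that meet each condition
rows_lfc  <- which(abs(df_chr$log2FoldChange) >= 1)
rows_padj <- which(df_chr$padj <= 0.05)

# Add 1 to skip the header row when passing these to addStyle()
rows_lfc + 1
rows_padj + 1
```

Without the conversion, a comparison such as "6.15" >= 1 is done on strings rather than numbers, which can give misleading results.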
I’m wondering if there is a way to do the following in R:
Produce separate .xlsx workbooks from a single dataset based on a column value
Apply conditional formatting to rows in each .xlsx file based on a column value
I can do each of these separately, but efforts to combine them haven't been successful and I can't find an exact use-case match online. Any help would be greatly appreciated.
I can't share my specific data, but here is a sample that replicates the data I have.
df <- data.frame (
assign = c("YES", "NO", "NO", "YES", "NO", "YES", "YES", "NO"),
dept = c("HIST","HIST", "PSYC", "PSYC", "PSYC", "ENGL", "ENGL", "ENGL"),
class = c(1009, 1330, 1001, 1015, 2190, 1001, 3001, 4390))
I can successfully create separate workbooks by generating a list of the dept variable and then using lapply(), but attempts to incorporate conditional formatting are unsuccessful:
# create a list of dept values to split into separate workbooks
li <- split(df, with(df, df$dept), drop = FALSE)
# using lapply to generate .xlsx docs
lapply(names(li), function(x){write.xlsx(li[[x]], "report", file = paste0("report_", x, ".xlsx"), row.names = FALSE)})
With the following code, I can generate a .xlsx file with conditional formatting, but can only produce a single file with all rows rather than multiple files:
# create style for classes that haven’t finished the assignment
noadmin <- createStyle(fontColour = "#FF0000", fontSize = 10)
# create style for top row
Heading <- createStyle(textDecoration = "bold", fgFill = "#FFFFCC", border = "TopBottomLeftRight")
# workbook call begins here
assign_all <- createWorkbook()
addWorksheet(assign_all, 1, gridLines = TRUE)
writeData(assign_all, 1, df, withFilter = TRUE)
# identify which rows didn’t complete (e.g., need to be formatted)
noRows = data.frame(which(df$assign == "NO", arr.ind=FALSE))
# freeze top row
freezePane(assign_all, 1, firstActiveRow = 2, firstActiveCol = 1)
# add style to header
addStyle(assign_all, 1, cols = 1:ncol(df), rows = 1, style = Heading)
# add style to "NO" rows
addStyle(assign_all, 1, cols = 1:ncol(df), rows = noRows[,1]+1, style = noadmin, gridExpand = TRUE)
saveWorkbook(assign_all, paste0("report.xlsx"), overwrite = TRUE)
This produces the output I want, but with all rows in one file:
Thanks in advance for any guidance you can provide. I've been working on this problem for a few weeks and have run out of ideas.
You could put your code to create the workbook inside a function, then loop over the list of split data frames to create your xlsx files. Instead of lapply I use mapply, so that I can loop over both the list and its names:
li <- split(df, df$dept)
library(openxlsx)
# create style for classes that haven’t finished the assignment
noadmin <- createStyle(fontColour = "#FF0000", fontSize = 10)
# create style for top row
Heading <- createStyle(textDecoration = "bold", fgFill = "#FFFFCC", border = "TopBottomLeftRight")
make_xl <- function(x, y) {
assign_all <- createWorkbook()
addWorksheet(assign_all, 1, gridLines = TRUE)
writeData(assign_all, 1, x, withFilter = TRUE)
# identify which rows didn’t complete (e.g., need to be formatted)
noRows = data.frame(which(x$assign == "NO", arr.ind=FALSE))
# freeze top row
freezePane(assign_all, 1, firstActiveRow = 2, firstActiveCol = 1)
# add style to header
addStyle(assign_all, 1, cols = 1:ncol(x), rows = 1, style = Heading)
# add style to "NO" rows
addStyle(assign_all, 1, cols = 1:ncol(x), rows = noRows[,1]+1, style = noadmin, gridExpand = TRUE)
saveWorkbook(assign_all, paste0("report_", y, ".xlsx"), overwrite = TRUE)
}
mapply(make_xl, li, names(li))
#> ENGL HIST PSYC
#> 1 1 1
list.files(pattern = "^report")
#> [1] "report_ENGL.xlsx" "report_HIST.xlsx" "report_PSYC.xlsx"
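To check which rows each workbook will style before writing any files, the same split can be inspected with base R. A small sketch using the sample data from the question; no_rows is a hypothetical name:

```r
df <- data.frame(
  assign = c("YES", "NO", "NO", "YES", "NO", "YES", "YES", "NO"),
  dept   = c("HIST", "HIST", "PSYC", "PSYC", "PSYC", "ENGL", "ENGL", "ENGL"),
  class  = c(1009, 1330, 1001, 1015, 2190, 1001, 3001, 4390)
)

# One data frame per department, named by department
li <- split(df, df$dept)

# For each department, the worksheet rows (header = row 1) where assign == "NO"
no_rows <- lapply(li, function(x) which(x$assign == "NO") + 1)
no_rows
```

Because the row positions are computed within each split piece, they line up with the rows written to that department's own file rather than with the rows of the full data set.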
How do I use openxlsx::conditionalFormatting on a list of column indexes that are not necessarily consecutive? In the documentation, ?conditionalFormatting, all the examples fill the cols argument using :, like cols = 1:5, meaning 1,2,3,4,5. However, I want my columns to be color coded according to whether their index is in a list. The column indexes aren't necessarily consecutive like 1:5. They could be 1,2,4,6,8 or something like that.
As an example:
library(tidyverse)
# install.packages("openxlsx")
library(openxlsx)
library(writexl)
library(glue)
data_format <- data.frame(vals = c(5,6,2,12,5,12,5,4.5,12,13,3,15,17,30,7,19),
vals1 = c(2,6,2,12,13,12,5,4.5,12,13,3,15,19,30,7,9),
vals2 = c(2,7,2,7,13,12,5,4.5,12,1,3,15,20,30,7,6),
vals3 = c(1,20,2,8,12,1,1,9,4.2,16,11,3,14,10,28,5),
vals4 = c(5,13,2,12,13,12,1,4.5,12,10,3,15,20,29,7,9),
vals5 = c(5,15,2,10,18,11,3,4.5,12,13,2,15,86,90,9,11),
thresh1 = c(4,11,9,13.5,12,12,6,4.8,10,14,3,17,22,80,8,13),
thresh2 = c(6,12,1,13,16,11,5,3,16,12,1,13,19,20,6,10))
data_format <-
data_format %>%
relocate(thresh1, .before=vals)
data_format <-
data_format %>%
relocate(thresh2, .after=thresh1)
wb <- createWorkbook()
addWorksheet(wb, "data_format")
writeData(wb, "data_format",data_format)
Colpink1 <- c(4,5,6,8) # I would expect these columns to be pink when they are less than column A
Colpurple2 <- c(3,7) # I would expect these columns to be purple when they are less than column B
pinkStyle <- createStyle(fontColour = "#FA977C")
purpleStyle <- createStyle(fontColour = "#9B7CFA")
conditionalFormatting(wb, "data_format",
cols = Colpink1,
rows = 2:nrow(data_format), rule = "<$A2", style = pinkStyle
)
conditionalFormatting(wb, "data_format",
cols = Colpurple2,
rows = 2:nrow(data_format), rule = "<$B2", style = purpleStyle
)
filepath <-
glue("PATH/format_coloring.xlsx")
saveWorkbook(wb, file = filepath)
Column C is purple as expected but G has pink and purple values. I would want G to just be color coded according to purple. The other columns have a mixture of pink and purple where I would expect pink. Does anyone have an idea on how to conditionally format according to index not in order?
If anyone has ideas that would be appreciated.
The cols argument looks like it takes ranges: even though you specify c(4,5,6,8), that is interpreted as columns 4-8 (including column 7). You could instead loop through Colpink1 and Colpurple2, applying the conditional formatting to one column at a time:
wb <- createWorkbook()
addWorksheet(wb, "data_format")
writeData(wb, "data_format",data_format)
Colpink1 <- c(4,5,6,8)
Colpurple2 <- c("C","G") #just testing that column letters worked too
pinkStyle <- createStyle(fontColour = "#FA977C")
purpleStyle <- createStyle(fontColour = "#9B7CFA")
for (i in Colpink1) {
  conditionalFormatting(wb, "data_format",
                        cols = i,
                        # data occupy sheet rows 2 to nrow + 1 (row 1 is the header)
                        rows = 2:(nrow(data_format) + 1), rule = "<$A2", style = pinkStyle)
}
for (i in Colpurple2) {
  conditionalFormatting(wb, "data_format",
                        cols = i,
                        rows = 2:(nrow(data_format) + 1), rule = "<$B2", style = purpleStyle)
}
openXL(wb)
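If there are many columns, calling conditionalFormatting() once per column can be reduced to once per block of consecutive columns. A sketch in base R, assuming (as the answer above suggests) that cols is treated as a range from its smallest to its largest value; contiguous_runs is a hypothetical helper, not part of openxlsx:

```r
# Split a vector of column indexes into runs of consecutive numbers
contiguous_runs <- function(cols) {
  cols <- sort(unique(cols))
  # a new run starts wherever the gap to the previous index is not 1
  unname(split(cols, cumsum(c(1, diff(cols) != 1))))
}

runs <- contiguous_runs(c(4, 5, 6, 8))
runs

# Each run could then be passed to conditionalFormatting() in one call, e.g.
# for (r in runs) {
#   conditionalFormatting(wb, "data_format", cols = r,
#                         rows = 2:(nrow(data_format) + 1),
#                         rule = "<$A2", style = pinkStyle)
# }
```

For c(4, 5, 6, 8) this gives two runs, one covering columns 4 to 6 and one covering column 8, so column 7 is never touched.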
I am trying to recreate this plot, but ggplot seems to object to the negative numbers in the data frame, judging by the error message: Error: colours encodes as numbers must be positive. Does anyone know what the issue is? These are very large data frames, but I wouldn't have thought that would be a problem.
## Load packages
library(tidyverse)
require(data.table)
## Read in data frames
m1<-fread("m1.csv", header = F)
m2<-fread("m2.csv", header = F)
L<-fread("l.csv", header = F)
LP<-fread("LP.csv", header = F)
## Get rate by taking m1 from m2
rate<-m1[1,]-m2[1,] ### subtract p1 rate from p2
## Transpose the data frame
t_rate <- transpose(rate)
## Create row ID's to merge data frames
L$row_num <- seq.int(nrow(L))
t_rate$row_num <- seq.int(nrow(t_rate))
all<-merge(L, t_rate, by = "row_num") ## merge the dataframes based on their ID
## Get rid of ID now we don't need it
all$row_num=NULL
## Plot the graph
ggplot(all,x=all$V1.x,y=all$V2,col=all$V1.y)+
geom_point(data=all,x=all$V1.x,y=all$V2,col=all$V1.y,size=0.1)+
geom_point(data=LP,x=LP$V1,y=LP$V2,size=1)
### Data (all)
structure(list(V1.x = c(163.75, 164.25, 164.75, 165.25, 165.75,
166.25), V2 = c(-75.25, -75.25, -75.25, -75.25, -75.25, -75.25
), V1.y = c(1.55995, 1.56093, 1.56237, 1.56545, 1.56764, 1.56827
)), class = c("data.table", "data.frame"), row.names = c(NA,
-6L))
## Data (LP)
structure(list(V1 = c(169.7, 147.93, 150.01, 146.71, 147.31,
-63.26), V2 = c(-46.47, -42.344, -36.59, -38.64, -43.3, 44.739
)), row.names = c(NA, -6L), class = c("data.table", "data.frame"
))
The issue is that you passed vectors directly as arguments instead of mapping columns to aesthetics with aes(). When you set color as a plain argument, its values have to be color names, color codes, or positive numbers.
To fix your issue, you could simply map the columns to aesthetics like so:
library(ggplot2)
ggplot(all, aes(x = V1.x, y = V2)) +
geom_point(aes(color = V1.y), size = 0.1) +
geom_point(data = LP, aes(x = V1, y = V2), size = 1)
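The difference between setting and mapping can be seen on the sample rows from the question. A minimal sketch, assuming ggplot2 is installed; pts, p_set, and p_map are made-up names:

```r
library(ggplot2)

pts <- data.frame(
  V1.x = c(163.75, 164.25, 164.75, 165.25, 165.75, 166.25),
  V2   = rep(-75.25, 6),
  V1.y = c(1.55995, 1.56093, 1.56237, 1.56545, 1.56764, 1.56827)
)

# Setting: a constant colour given outside aes(); it must be a colour name,
# a colour code, or a positive number
p_set <- ggplot(pts, aes(x = V1.x, y = V2)) +
  geom_point(color = "steelblue")

# Mapping: the column is given inside aes(), so ggplot2 builds a continuous
# colour scale from its values; negative values are fine here
p_map <- ggplot(pts, aes(x = V1.x, y = V2)) +
  geom_point(aes(color = V1.y))
```

Printing p_map draws the points with a colour bar legend for V1.y, while p_set draws every point in the same colour and has no legend.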
I need to color the cell if the value of the cell is greater than 80. For example, given this data frame called df:
dput(df)
structure(list(Server = structure(1:2, .Label = c("Server1",
"Server2"), class = "factor"), CPU = c(79.17, 93), UsedMemPercent = c(16.66,
18.95)), .Names = c("Server", "CPU", "UsedMemPercent"), row.names = c(NA,
-2L), class = "data.frame")
df[2,2] should be in red color. I was able to change the color of the text by something like this using xtable:
df[, 2] = ifelse(df[, 2] > 80, paste("\\color{red}{", round(df[, 2], 2), "}"), round(df[, 2], 2))
If I do this and print out the table with kable, it won't print. Any ideas how I can color the cell in the kable output table?
In fact, you don't even need DT or kableExtra if all you need is the color of that cell. However, as the author of kableExtra, I do recommend that package though :P
# What you have now
df <- structure(list(Server = structure(1:2, .Label = c("Server1", "Server2"), class = "factor"), CPU = c(79.17, 93), UsedMemPercent = c(16.66, 18.95)), .Names = c("Server", "CPU", "UsedMemPercent"), row.names = c(NA, -2L), class = "data.frame")
df[, 2] <- ifelse(df[, 2] > 80, paste("\\color{red}{", round(df[, 2], 2), "}"), round(df[, 2], 2))
# What you need: escape = FALSE passes the LaTeX markup through unescaped
library(knitr)
kable(df, "latex", escape = FALSE)
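The ifelse() step can be wrapped in a small helper so that the threshold and colour are not hard-coded. A sketch in base R; color_above is a hypothetical name, and it emits the same \color{...}{...} markup as the code above:

```r
# Wrap values above `threshold` in LaTeX colour markup; others stay plain text
color_above <- function(x, threshold = 80, color = "red") {
  txt <- as.character(round(x, 2))
  ifelse(x > threshold, paste0("\\color{", color, "}{", txt, "}"), txt)
}

cpu <- c(79.17, 93)
color_above(cpu)
```

The result is a character vector, so it still needs escape = FALSE in kable() for the backslashes and braces to reach the LaTeX output unchanged.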
Not a knitr solution...
You can modify specific cells with formatStyle from the DT package, applied to a DT::datatable. By default the table has more display options; I'm using list(dom = "t") to turn them off and ordering = FALSE to remove the sorting controls from the top of the table.
library(magrittr)
library(DT)
df %>%
datatable(options = list(dom = "t", ordering = FALSE),
rownames = FALSE,
width = 10) %>%
formatStyle("CPU", backgroundColor = styleInterval(80, c("white", "red"))) # red when CPU > 80
If you prefer the kable way, you should try kableExtra. It has an option to change the background of specified rows.
Another solution using my huxtable package:
library(huxtable)
ht <- as_hux(df)
ht <- set_background_color(ht, where(ht > 80), "red")
ht
I would like to insert a blank column in between "Delta = delta" and "Card = vars" in the dataframe below. I would also like to sort the output by the column "Model_Avg_Error" in the dataframe as well.
df = data.frame(Card = vars, Model_Avg_Error = model_error, Forecast = forecasts, Delta = delta, ,Card = vars, Model_Avg_Error = model_error,
Forecast = forecasts, Delta = delta)
# save
write.csv(df, file = file.path(proj_path, "output.csv"), row.names = F)
This was the error received from above:
Error in data.frame(Card = vars, Model_Avg_Error = model_error, Forecast = forecasts, :
argument is missing, with no default
You can add your blank column, re-order, and sort using the code below. Each result has to be assigned back to df, otherwise it is only printed:
df$blankVar <- NA #blank column
df <- df[c("Card", "blankVar", "Model_Avg_Error", "Forecast", "Delta")] #re-ordering columns by name
df <- df[order(df$Model_Avg_Error),] #sorting by Model_Avg_Error
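Put together, the steps look like this on a toy data frame. A sketch with made-up values standing in for vars, model_error, forecasts, and delta, which are not shown in the question; here the blank column is placed after Delta, as the question asks:

```r
# Hypothetical stand-ins for the vectors used in the question
df <- data.frame(
  Card            = c("A", "B", "C"),
  Model_Avg_Error = c(0.30, 0.10, 0.20),
  Forecast        = c(10, 20, 30),
  Delta           = c(1, -2, 3)
)

df$blankVar <- NA                                  # blank column
df <- df[c("Card", "Model_Avg_Error", "Forecast",  # re-order by name, with the
           "Delta", "blankVar")]                   # blank column after Delta
df <- df[order(df$Model_Avg_Error), ]              # sort by Model_Avg_Error

# na = "" writes the NA column as empty cells instead of the text "NA"
write.csv(df, file = tempfile(fileext = ".csv"), row.names = FALSE, na = "")
```

The file is written to a temporary path here only to keep the sketch self-contained; in the question's setup the path would be file.path(proj_path, "output.csv").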
Here's a general way to add a new, blank column
library(tibble)
# Adds after the second column
iris %>% add_column(new_col = NA, .after = 2)
# Adds after a specific column (in this case, after Sepal.Width)
iris %>% add_column(new_col = NA, .after = "Sepal.Width")