I am trying to create and save multiple formatted Excel files, each of which is a subset of a data frame split by a factor.
This is an example of what I have tried so far:
# Create data
df <- data.frame(category = rep(c("a","b","c","d"),times = 20),
values = rnorm(20,5,2))
# Create workbooks named after specific level of factor
l1 <- sapply(levels(df$category), assign, value = createWorkbook())
# Create styles
hs <- createStyle(fgFill = "#808080", border = "bottom", textDecoration = "bold")
lt8 <- createStyle(bgFill = "#ff0000")
gt30 <- createStyle(bgFill = "#00b0f0")
grn <- createStyle(bgFill = "#00b000")
# For loop
for (i in l1) {
addWorksheet(i, names(i))
writeData(i, names(i), df[df$category == names(i),], headerStyle = hs)
conditionalFormatting(i, names(i), cols = 1:2, rows = 2:nrow(df[df$category == names(i),]), rule = "$B2<2", type = "expression", style = lt8)
conditionalFormatting(i, names(i), cols = 1:2, rows = 2:nrow(df[df$category == names(i),]), rule = "$B2>=7", type = "expression", style = gt30)
conditionalFormatting(i, names(i), cols = 1:2, rows = 2:nrow(df[df$category == names(i),]), rule = "AND($B2>=4, $B2<5.5)", style = grn)
setColWidths(i, names(i), cols=1:2, widths = "auto")
saveWorkbook(paste(i, ".wb", sep = ""), file = paste(i, " Report ", ".xlsx", sep = ""))
}
Each time, I run into this error:
Error in if (tolower(sheetName) %in% tolower(wb$sheet_names)) stop("A worksheet by that name already exists! Sheet names must be unique case-insensitive.")
This is the first time I've attempted to assign any sheets so I'm not exactly sure why I keep getting this error.
Ultimately, I would like to save the subsetted and formatted excel workbooks through a repetitive process because my real data would produce many more workbooks. The workbooks must be separate and placing these subsets in sheets won't work.
Any and all advice on how to achieve this would be greatly appreciated.
Your error is coming from this line:
addWorksheet(i, names(i))
because names(i) is empty:
> names(l1[['a']])
character(0)
You might be better off looping over the names of l1, so that you have the categories you want, and using each name to pull the matching workbook from the list. Something like:
for (i in names(l1)) {
  wb <- l1[[i]]
  addWorksheet(wb, i)
  category_data <- df[df$category == i, ]
  # Row 1 holds the header, so the data occupy rows 2 to nrow + 1
  data_rows <- 2:(nrow(category_data) + 1)
  writeData(wb, i, category_data, headerStyle = hs)
  conditionalFormatting(wb, i, cols = 1:2, rows = data_rows, rule = "$B2<2", type = "expression", style = lt8)
  conditionalFormatting(wb, i, cols = 1:2, rows = data_rows, rule = "$B2>=7", type = "expression", style = gt30)
  conditionalFormatting(wb, i, cols = 1:2, rows = data_rows, rule = "AND($B2>=4, $B2<5.5)", style = grn)
  setColWidths(wb, i, cols = 1:2, widths = "auto")
  saveWorkbook(wb, file = paste(i, " Report ", ".xlsx", sep = ""))
}
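The underlying reason is that a for loop over a list hands you each element on its own, without the name it had in the list. A minimal base R sketch of the difference, using plain numeric vectors as stand-ins for the workbooks (the list contents are made up for illustration):

```r
# Stand-in list: the elements are placeholders, not real workbooks
l1 <- list(a = c(1, 2), b = c(3, 4))

# Looping over the elements: the list names are not carried along
element_names_missing <- logical(0)
for (i in l1) {
  element_names_missing <- c(element_names_missing, is.null(names(i)))
}

# Looping over the names: the name is available, and so is the element
seen_by_name <- character(0)
for (i in names(l1)) {
  element <- l1[[i]]
  seen_by_name <- c(seen_by_name, i)
}
```

After running this, element_names_missing is TRUE for every element, while seen_by_name holds the list names in order.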
There's still one subtle error here:
l1 <- sapply(levels(df$category), assign, value = createWorkbook())
createWorkbook() is only being called once, so you have 4 copies of the same workbook. That means the final save will have all 4 tabs. Compare:
> identical(l1$a, l1$b)
[1] TRUE
with 2 separate calls to createWorkbook():
> identical(createWorkbook(), createWorkbook())
[1] FALSE
It might be worth just looping over the distinct categories and creating the workbook inside the loop. That is:
library(openxlsx)
# Create data
df <- data.frame(category = rep(c("a","b","c","d"),times = 20),
values = rnorm(20,5,2))
# Create styles
hs <- createStyle(fgFill = "#808080", border = "bottom", textDecoration = "bold")
lt8 <- createStyle(bgFill = "#ff0000")
gt30 <- createStyle(bgFill = "#00b0f0")
grn <- createStyle(bgFill = "#00b000")
# For loop: one new workbook per category
# unique() is used instead of levels() because category is a character
# column (not a factor) by default in R >= 4.0
for (i in unique(as.character(df$category))) {
  wb <- createWorkbook()
  addWorksheet(wb, i)
  category_data <- df[df$category == i, ]
  # Row 1 holds the header, so the data occupy rows 2 to nrow + 1
  data_rows <- 2:(nrow(category_data) + 1)
  writeData(wb, i, category_data, headerStyle = hs)
  conditionalFormatting(wb, i, cols = 1:2, rows = data_rows, rule = "$B2<2", type = "expression", style = lt8)
  conditionalFormatting(wb, i, cols = 1:2, rows = data_rows, rule = "$B2>=7", type = "expression", style = gt30)
  conditionalFormatting(wb, i, cols = 1:2, rows = data_rows, rule = "AND($B2>=4, $B2<5.5)", style = grn)
  setColWidths(wb, i, cols = 1:2, widths = "auto")
  saveWorkbook(wb, file = paste(i, " Report ", ".xlsx", sep = ""))
}
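The shared-workbook pitfall described above can be reproduced without openxlsx. The sketch below is only an analogy: it assumes that a workbook behaves like any other mutable reference object, and uses new.env() as a stand-in for createWorkbook() (make_book is a hypothetical name).

```r
# Stand-in for createWorkbook(): any function returning a fresh mutable object
make_book <- function() new.env()
keys <- c("a", "b", "c", "d")

# The extra argument is evaluated once, so every entry is the same object
shared <- sapply(keys, function(k, value) value, value = make_book(),
                 simplify = FALSE)

# Calling the constructor inside the function gives one object per key
separate <- sapply(keys, function(k) make_book(), simplify = FALSE)

# A change made through one shared entry is visible through the others
assign("sheet", "a", envir = shared$a)
shared_sees_change <- exists("sheet", envir = shared$b, inherits = FALSE)

# With separate objects the change stays local
assign("sheet", "a", envir = separate$a)
separate_sees_change <- exists("sheet", envir = separate$b, inherits = FALSE)
```

Here shared_sees_change is TRUE and separate_sees_change is FALSE, which mirrors why the last saved file ends up with every tab when one workbook is reused.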
How do I use openxlsx::conditionalFormatting on a list of column indexes that are not necessarily consecutive? In the documentation (?conditionalFormatting), all the examples fill the cols argument with a : range such as cols = 1:5, meaning 1,2,3,4,5. However, I want my columns to be color coded according to whether their index is in a list. The column indexes aren't necessarily in a consecutive range like 1:5; they could be 1,2,4,6,8 or something like that.
As an example:
library(tidyverse)
# install.packages("openxlsx")
library(openxlsx)
library(writexl)
library(glue)
data_format <- data.frame(vals = c(5,6,2,12,5,12,5,4.5,12,13,3,15,17,30,7,19),
vals1 = c(2,6,2,12,13,12,5,4.5,12,13,3,15,19,30,7,9),
vals2 = c(2,7,2,7,13,12,5,4.5,12,1,3,15,20,30,7,6),
vals3 = c(1,20,2,8,12,1,1,9,4.2,16,11,3,14,10,28,5),
vals4 = c(5,13,2,12,13,12,1,4.5,12,10,3,15,20,29,7,9),
vals5 = c(5,15,2,10,18,11,3,4.5,12,13,2,15,86,90,9,11),
thresh1 = c(4,11,9,13.5,12,12,6,4.8,10,14,3,17,22,80,8,13),
thresh2 = c(6,12,1,13,16,11,5,3,16,12,1,13,19,20,6,10))
data_format <-
data_format %>%
relocate(thresh1, .before=vals)
data_format <-
data_format %>%
relocate(thresh2, .after=thresh1)
wb <- createWorkbook()
addWorksheet(wb, "data_format")
writeData(wb, "data_format",data_format)
Colpink1 <- c(4,5,6,8) # I would expect these columns to be pink when they are less than column A
Colpurple2 <- c(3,7) # I would expect these columns to be purple when they are less than column B
pinkStyle <- createStyle(fontColour = "#FA977C")
purpleStyle <- createStyle(fontColour = "#9B7CFA")
conditionalFormatting(wb, "data_format",
cols = Colpink1,
rows = 2:nrow(data_format), rule = "<$A2", style = pinkStyle
)
conditionalFormatting(wb, "data_format",
cols = Colpurple2,
rows = 2:nrow(data_format), rule = "<$B2", style = purpleStyle
)
filepath <-
glue("PATH/format_coloring.xlsx")
saveWorkbook(wb, file = filepath)
Column C is purple as expected but G has pink and purple values. I would want G to just be color coded according to purple. The other columns have a mixture of pink and purple where I would expect pink. Does anyone have an idea on how to conditionally format according to index not in order?
If anyone has ideas that would be appreciated.
The cols argument looks like it takes ranges: even though you specify c(4,5,6,8), it is interpreted as columns 4-8 (including column 7). You could loop through Colpink1 and Colpurple2, applying the conditional formatting to one column at a time:
wb <- createWorkbook()
addWorksheet(wb, "data_format")
writeData(wb, "data_format",data_format)
Colpink1 <- c(4,5,6,8)
Colpurple2 <- c("C","G") #just testing that column letters worked too
pinkStyle <- createStyle(fontColour = "#FA977C")
purpleStyle <- createStyle(fontColour = "#9B7CFA")
# Row 1 holds the header, so the data occupy rows 2 to nrow + 1
data_rows <- 2:(nrow(data_format) + 1)
for (i in Colpink1) {
  conditionalFormatting(wb, "data_format",
                        cols = i,
                        rows = data_rows, rule = "<$A2", style = pinkStyle)
}
for (i in Colpurple2) {
  conditionalFormatting(wb, "data_format",
                        cols = i,
                        rows = data_rows, rule = "<$B2", style = purpleStyle)
}
openXL(wb)
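If cols is indeed treated as a range, as it appears to be above, an alternative to one call per column is one call per run of consecutive columns. The helper below is a hypothetical base R sketch (contiguous_runs is not part of openxlsx) that only does the grouping; each run could then be passed as cols in place of the single index i.

```r
# Split a vector of column indexes into runs of consecutive values
# (hypothetical helper, base R only)
contiguous_runs <- function(cols) {
  if (length(cols) == 0) {
    return(list())
  }
  cols <- sort(unique(cols))
  # A new run starts wherever the gap to the previous index is not 1
  run_id <- cumsum(c(TRUE, diff(cols) != 1))
  unname(split(cols, run_id))
}

pink_runs <- contiguous_runs(c(4, 5, 6, 8))    # one run 4:6 and one run 8
purple_runs <- contiguous_runs(c(3, 7))        # two single-column runs
```

For c(4, 5, 6, 8) this gives two runs, so two formatting calls would cover the pink columns instead of four, and column 7 is never touched.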
I am using the following script for an eQTL analysis. The gene expression data are in input.xlsx, with genes as rows, samples as columns, and sample names as column headings. I also have a VCF file, sample.vcf, containing the same samples as the gene expression data.
library(readxl)
library(vcfR)
library(tidyverse)
library(MatrixEQTL)
# Read the expression table (first column: gene, remaining columns: samples)
ge = read_xlsx('input.xlsx')
ge[, 2:ncol(ge)] = lapply(ge[, 2:ncol(ge)], as.numeric)
# Drop genes whose expression sums to zero across all samples
zeroes = apply(ge[, 2:ncol(ge)], 1, function(x) {sum(x, na.rm = T) == 0})
ge = ge[!zeroes, ]
genes = ge$gene
ge = as.matrix(ge[, -1])
rownames(ge) = genes
write.table(ge, 'ge.txt', sep = '\t', row.names = T)
rm(ge)
data.lines = length(count.fields('sample.vcf', comment.char = '#'))
chunk.n = 1e5
start.seq = seq(0, data.lines-1, by = chunk.n)
for (n in start.seq){
first = ifelse(n == 0, T, F)
lines = ifelse(n == max(start.seq), (data.lines %% chunk.n) - 1, chunk.n)
tmp = read.vcfR('sample.vcf', nrows = lines, skip = n)
gt = extract.gt(tmp, as.numeric = T)
write.table(gt, 'snp.txt', sep = '\t',
row.names = T, col.names = first,
append = !first)
}
vcf = bedr::read.vcf('sample.vcf')
snp_pos = bedr::vcf2bed(vcf)
snp_pos = snp_pos %>%
separate(col = 'chr', into = c('snpid', 'chr'), sep = 'ch', convert = T) %>%
mutate(snpid = 1:nrow(snp_pos)) %>%
rename(pos = start) %>%
select(!end)
# snpid was already set in the pipeline above; keep the columns in the order snpid, chr, pos
snp_pos = snp_pos[, c('snpid', 'chr', 'pos')]
ge = SlicedData$new()
ge$LoadFile('ge.txt', delimiter = '\t')
snps = SlicedData$new()
snps$LoadFile('newFile', delimiter = '\t', skipColumns = 186)
colnames(snps) = colnames(ge)
Matrix_eQTL_main(snps = snps, gene = ge,
snpspos = snp_pos, genepos = gene_pos,
useModel = modelANOVA,
output_file_name = 'eQTL_results.txt',
output_file_name.cis = 'eQTL_cis_results.txt',
pvOutputThreshold.cis = 1e-3,
pvOutputThreshold = 1e-3)
But it fails with an out-of-memory error:
CONVERT VCF TO BED
slurmstepd: error: Detected 1 oom-kill event(s) in step. Some of your processes may have been killed by the cgroup out-of-memory handler.
I think this error is from this script line:
vcf = bedr::read.vcf('sample.vcf')
It makes sense as the vcf file is large in size and the script line is trying to read the entire file all together. I was wondering if there is a way to add another loop for this step so that it can read vcf file in small chunks for the conversion step. Thank you for the help!
The following code works on the example VCF file here: https://www.internationalgenome.org/wiki/Analysis/vcf4.0/
The vcfR object is an S4 object, meaning you can access its components using slot and slot<-.
data.lines = length(count.fields('example.vcf', comment.char = '#'))
chunk.n = 2
start.seq = seq(0, data.lines-1, by = chunk.n)
sample_vcf = read.vcfR('example.vcf', nrows = 2)
slotNames(sample_vcf)
# [1] "meta" "fix" "gt"
vcf = new("vcfR")
slot(object = vcf, name = "meta") = slot(object = sample_vcf, name = "meta")
for (n in start.seq) {
first = n == 0
lines = ifelse(n == max(start.seq), (data.lines %% chunk.n), chunk.n)
# each chunk
ind = which(n == start.seq)
sample_vcf = vcfR::read.vcfR('example.vcf',
nrows = lines, skip = n)
# partial matrices of the whole file
if (first) {
slot(object = vcf, name = "fix") = slot(object = sample_vcf, name = "fix")
slot(object = vcf, name = "gt") = slot(object = sample_vcf, name = "gt")
} else {
slot(object = vcf, name = "fix") = rbind(
slot(object = vcf, name = "fix"),
slot(object = sample_vcf, name = "fix"))
slot(object = vcf, name = "gt") = rbind(
slot(object = vcf, name = "gt"),
slot(object = sample_vcf, name = "gt"))
}
}
# check that worked okay
stopifnot(validObject(vcf))
nrow(slot(object = vcf, name = "fix"))
# [1] 5
As you import each chunk of the file, bind it into the object. The file I used had the elements meta, fix and gt. It would be worth checking that these are the only elements present before running the loop on the whole file.
I think your calculation of lines is incorrect: skip = n skips the first n rows rather than indexing from 0, so the last chunk contains data.lines %% chunk.n rows, not data.lines %% chunk.n - 1.
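One edge case is worth checking: when data.lines is an exact multiple of chunk.n, data.lines %% chunk.n is 0, so the last chunk would be asked for zero rows. A hedged base R sketch of a size calculation that avoids this is shown here; chunk_plan is a hypothetical helper, not part of vcfR, and it only computes the skip and nrows values that would be passed to the reader.

```r
# chunk_plan() is a hypothetical helper, not part of vcfR
chunk_plan <- function(n_rows, chunk_size) {
  stopifnot(n_rows >= 1, chunk_size >= 1)
  skip <- seq(0, n_rows - 1, by = chunk_size)
  # Each chunk is full-sized unless fewer rows remain after the skip
  nrows <- pmin(chunk_size, n_rows - skip)
  data.frame(skip = skip, nrows = nrows)
}

plan_uneven <- chunk_plan(5, 2)  # last chunk is short
plan_even <- chunk_plan(6, 2)    # last chunk is full
```

In both plans the nrows column sums to the total number of rows, which is a cheap sanity check before reading a large file.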
The next step you have uses bedr::read.vcf. Unfortunately this creates a pseudo-class list object with elements header and vcf and attribute vcf. It seems that we can convert the vcfR object into a vcf object which can then be processed. The header contains parsed meta information, but it does not appear to be needed by your workflow.
#' @title Convert VCFR to VCF
#' @param x a VCFR object
#' @return a VCF object
as.vcf.vcfR <- function(x) {
vcf <- list(
# TODO implement parsing if needed downstream
header = list(meta = slot(object = x, name = "meta")),
vcf = cbind(
data.table::data.table(slot(object = x, name = "fix")),
data.table::data.table(slot(object = x, name = "gt"))))
attr(vcf, which = "vcf") <- TRUE
vcf$vcf$POS <- as.numeric(vcf$vcf$POS)
vcf
}
vcf2 = as.vcf.vcfR(vcf)
snp_pos = bedr::vcf2bed(vcf2)
# CONVERT VCF TO BED
# Warning messages:
# 1: In bedr::vcf2bed(vcf2) :
# ALT contains a comma and the variant length was decided based on the first element of ALT.
# 2: In bedr::vcf2bed(vcf2) :
# ALT contains a comma and the variant length was decided based on the first element of ALT.
snp_pos
# chr start end
# 1 20 14369 14370
# 2 20 17329 17330
# 3 20 1110695 1110696
# 4 20 1230236 NA
# 5 20 1234566 1234567
I have written a function that creates Excel files based on different taxa. Each taxon has to get its own sheet named after it, e.g. a "Phyla" sheet from the Phylum_count_1.1 data.
excel_taxa <- function(taxa_var, taxa, file_name) {
if (file.exists(file_name)) {
file.remove(file_name)
write.xlsx(taxa_var, file_name, sheetName = taxa, row.names = FALSE)
if(file.exists(file_name)) {
write.xlsx(taxa_var, file_name, sheetName = taxa, append = TRUE, row.names = FALSE)
}
} else {
write.xlsx(taxa_var, file_name, sheetName = taxa, row.names = FALSE)
if(file.exists(file_name)) {
write.xlsx(taxa_var, file_name, sheetName = taxa, append = TRUE, row.names = FALSE)
}
}
}
The function works fine; however, I have a problem when I try to add each data set with its associated taxon name. So Domain_count_1.1 should have a sheet named "Domain". Here is my attempt using for loops:
var_list1 = list(Domain_count_1.1, Phylum_count_1.1)
var_list2 = list("Domain", "Phyla")
for (i in var_list1) {
for (j in var_list2) {
excel_taxa(i, j, "test.xlsx")
}
}
Any suggestions?
I have used the following loop to create my Excel files. It may help you:
library(xlsx)
file_name = "test.xlsx"
#data
Domain_count_1.1 <- as.data.frame(matrix(1:100, nrow = 10))
Phylum_count_1.1 <- as.data.frame(matrix(101:200, nrow = 10))
#lists
taxa_var = list(Domain_count_1.1, Phylum_count_1.1)
taxa = list("Domain", "Phyla")
#create a workbook, and if the file already exists, delete it
if (file.exists(file_name)) {
file.remove(file_name)
}
wb <- createWorkbook(type = "xlsx")
#loop over the data frames and their sheet names in parallel
for (i in seq_along(taxa_var)) {
  sheet <- createSheet(wb, taxa[[i]])
  addDataFrame(
    taxa_var[[i]],
    sheet,
    startRow = 3,
    startColumn = 1,
    col.names = TRUE,
    row.names = TRUE
  )
}
#save once, after all sheets have been added
saveWorkbook(wb, file_name)
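The reason an index loop helps is that two nested for loops visit every combination of data frame and sheet name, whereas a single index pairs them up one to one. A small base R sketch of the difference, with placeholder data frames and without writing any files (the list names and contents are made up for illustration):

```r
# Placeholder inputs: two small tables and their intended sheet names
tables <- list(Domain_count = data.frame(x = 1:2),
               Phylum_count = data.frame(x = 3:4))
sheet_names <- list("Domain", "Phyla")

# Nested loops: every table is combined with every sheet name
nested <- character(0)
for (i in seq_along(tables)) {
  for (j in seq_along(sheet_names)) {
    nested <- c(nested, paste(names(tables)[i], "->", sheet_names[[j]]))
  }
}

# One shared index: each table is paired with its own sheet name
paired <- character(0)
for (k in seq_along(tables)) {
  paired <- c(paired, paste(names(tables)[k], "->", sheet_names[[k]]))
}
```

The nested version produces four combinations, including the unwanted ones such as the domain table under "Phyla", while the paired version produces exactly two.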
After running this code:
library(XLConnect)
template <- loadWorkbook ( filename = "template.xlsx" , create = T )
createSheet ( template , c("sheet1","sheet2") )
# setStyleAction(template,XLC$"STYLE_ACTION.NONE")
Data <- data.frame(
a = 1:10,
b = 11:20
)
setDataFormatForType(template, type = XLC$DATA_TYPE.NUMERIC, format = "0.00" )
# list22$`Brand Equity` <- as.numeric(list22$`Brand Equity`)
# list22$`Purchase Intent` <- as.numeric(list22$`Purchase Intent`)
csHeader <- createCellStyle(template, name = "header10")
setFillPattern(csHeader, fill = XLC$BORDER.DOUBLE)
setFillForegroundColor(csHeader, color = XLC$COLOR.DARK_RED)
# setCellFormula(object = template, sheet = (paste0("sheet",i)), row = c(2:4),col = c(1:3), formula = )
setCellStyle(template, sheet = "sheet1", row = 1,
col = c(1:2), cellstyle = csHeader)
setCellStyle(template, sheet = "sheet2", row = 1,
col = c(1:2), cellstyle = csHeader)
for (i in 1:2)
{
setColumnWidth(template, sheet = (paste0("sheet",i)), column = c(1:3), width = 15800)
writeWorksheet ( template , data = Data, sheet = (paste0("sheet",i)), startRow = 1 , startCol = 1 ,
header = TRUE )
}
saveWorkbook ( template )
I obtain the following two results (screenshots not shown).
It does not seem to apply my argument for the color of the cell. Any ideas? Moreover, is there a way to transform the numbers into percentages? So 1, for instance, would be 100%, 2 would be 200%, etc.
For converting the numbers into percentage, you can write a function similar to this one:
addformatperc<-function(num,roundlevel){
betternum<-paste(prettyNum(round(num*100,roundlevel),big.mark = ","),"%",sep="")
return(betternum)
}
#Output
addformatperc(1,0)
[1] "100%"
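One thing to keep in mind with this approach is that the function returns character strings, not numbers, so the values would most likely end up in the sheet as text rather than as numeric cells. A short sketch applying the same function to a vector (the input values are made up for illustration):

```r
# Same helper as above, repeated so the example is self-contained
addformatperc <- function(num, roundlevel) {
  paste(prettyNum(round(num * 100, roundlevel), big.mark = ","), "%", sep = "")
}

# The function is vectorised, so a whole column can be converted at once
labels <- addformatperc(c(1, 2, 0.256), 1)

# The result is text, which matters if the cells need to stay numeric
is_text <- is.character(labels)
```

If the cells have to remain numeric, a percentage number format applied to the cell style would be the alternative to look into.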
I export tables with large values to .xlsx using XLConnect.
Is there a way to write the results into an Excel cell with thousands separators activated?
library(XLConnect)
#example for a large value
a <- 10000000000
wb <- loadWorkbook("sof_q.xlsx", create = TRUE)
cs <- createCellStyle(wb)
setDataFormat(cs, format = "0.00")
createSheet(wb, name = "a")
writeWorksheet(wb,a,"a",startRow = 1, startCol = 1, header = TRUE)
rc = expand.grid(row = 1:2, col = 1:2)
setCellStyle(wb, sheet = "a", row = rc$row, col = rc$col, cellstyle = cs)
setColumnWidth(wb, sheet = "a", column = 1:5, width = -1)
saveWorkbook(wb)
In Excel, a should look like this
10.000.000.000
Using
setDataFormat(cs, format = "0,000,000.00")
might work, but for shorter values I get results like
0,032,666.29
Use # as the digit placeholder; unlike 0, it does not pad the value with leading zeros:
setDataFormat(cs, format = "###,###.00")
or
setDataFormat(cs, format = "###,##0")