Extract Links from Excel Sheet using R-package 'openxlsx' - r

I'm working with an Excel sheet in which some columns contain hyperlinks that are represented as text that is completely different from the actual address the hyperlinks point to. I want to use some R code to modify and subset the Excel sheet but keep the hyperlinks. I think I can do this by extracting those hyperlinks as an indexed character vector then re-introducing them into a new Excel document using the makeHyperlinkString() and writeFormula() functions. But I cannot figure out how to get a vector of the links themselves.
In case it matters, my intention is to do all the modifying and subsetting on a data.frame version of the Excel sheet rather than a workbook object.

Oh, now I think I got your problem. I thought there were only normal hyperlinks not Excel-Hyperlinks.
I think this may help you to get a vector of the hyperlinks, although its a bit messy.
library(openxlsx)
pathtofile = "path to .xlsx file"
df1 <- read.xlsx(xlsxFile = pathtofile,
sheet = 1, skipEmptyRows = FALSE,
colNames = F, rowNames = F,
startRow = 1)
## Sheet or Tabelle
Sheet = "Sheet" ## Or "Tabelle"
## Get Names of rows from Hyperlink column
rowIndex <- sub(x = df1[,1], pattern = paste0("(#'",Sheet,"\\d'!)"), replacement = "")
## Get the Sheet, where Hyperlinks are saved
SheetName <- regmatches(df1[,1], regexpr(text = df1[,1], pattern = paste0("(",Sheet,"\\d)")))
## Extract only the Sheet number
SheetIndex <- as.numeric(sub(x = SheetName, pattern = Sheet, replacement = ""))
## Get the row Indexes as numeric
RowIndexNum <- as.numeric(regmatches(rowIndex, regexpr(text = rowIndex, pattern = "\\d")))
## Get the column name as character
RowIndexName <- sub(x = rowIndex, pattern = "\\d", "")
## Create uppercase Letters
myLetters <- toupper(letters[1:26])
## Convert Row Name (character) to numeric (based on alphabetical order)
RowIndexNameNum <- match(RowIndexName, myLetters)
## If Hyperlinks only in 1 Sheet or several sheets
if (length(unique(SheetIndex)) == 1) {
dfLinks <- read.xlsx(xlsxFile = pathtofile,
sheet = unique(SheetIndex),
skipEmptyRows = FALSE,
colNames = F, rowNames = F,
rows = RowIndexNum[1]:tail(RowIndexNum,1),
cols = unique(RowIndexNameNum),
startRow = 1
);
} else {
dfLinks <- data.frame()
for (i in unique(SheetIndex)){
dfTmp <- read.xlsx(xlsxFile = pathtofile,
sheet = i,
skipEmptyRows = FALSE,
colNames = F, rowNames = F,
rows = RowIndexNum[1]:tail(RowIndexNum,1),
cols = unique(RowIndexNameNum),
startRow = 1)
dfLinks <- rbind(dfLinks, dfTmp)
}
}
dfLinks
This is how my Excel File looks like:

Related

Formatting numbers with openxlsx package in R

I am trying to format certain columns using openxlsx to write an r data frame to an excel file.
Here is a snippet of the R data frame:
The square bracket part in the "seed" column is used to superscript the excel output.
Here is the code I used to write the file:
openxlsx::addWorksheet(wb, sheetName = 'data') # add sheet
openxlsx::writeData(wb, sheet ='data',
x=df, xy=c(1, 1),withFilter = T) # write data on workbook
# make quality codes superscript
for(i in grep("\\_\\[([A-z0-9\\s]*)\\]", wb$sharedStrings)){
# if empty string in superscript notation, then just remove the superscript notation
if(grepl("\\_\\[\\]", wb$sharedStrings[[i]])){
wb$sharedStrings[[i]] <- gsub("\\_\\[\\]", "", wb$sharedStrings[[i]])
next # skip to next iteration
}
# insert additional formatting in shared string
wb$sharedStrings[[i]] <- gsub("<si>", "<si><r>", gsub("</si>", "</r></si>", wb$sharedStrings[[i]]))
# find the "_[...]" pattern, remove brackets and udnerline and enclose the text with superscript format
wb$sharedStrings[[i]] <- gsub("\\_\\[([A-z0-9\\s]*)\\]",
"</t></r><r><rPr><vertAlign val=\"superscript\"/></rPr><t xml:space=\"preserve\">\\1</t></r><r><t xml:space=\"preserve\">",
wb$sharedStrings[[i]])
}
openxlsx::modifyBaseFont(wb, fontSize = 10, fontName = 'Arial')
# right-justify data
openxlsx::addStyle(wb, sheet = 'data',
style = openxlsx::createStyle(halign = "right"), rows = 1:nrow(df)+1, cols = 3:12, gridExpand = TRUE)
#apply to rows with "All" in column B
openxlsx::conditionalFormatting(wb,sheet = 'data',
cols = 1:ncol(df),
rows = 1:nrow(df)+1,
rule = 'LEFT($B2,3)="ALL"',
style = openxlsx::createStyle(textDecoration = 'bold', bgFill = '#dad9d9'))
# format numbers
openxlsx::addStyle(wb = wb, sheet = 'data',
style = openxlsx::createStyle(numFmt = "#,###.0"),
rows = 1:nrow(df)+1, cols = c(5:7,10:12),gridExpand = T)
# write excel file
openxlsx::saveWorkbook(wb, file="file.xlsx",
overwrite = TRUE)
The output looks like this:
I am trying to conditionally format the numbers in the seed/harv/yield/prod columns to be numbers, but the "F" values are creating a mixed-class vector at best. (I need these Fs!)
Any ideas?
Thanks!

Issue with combining multiple separate data points in R

I have X number of spreadsheets with information spreading out over two tabs.
I am looking to combine these into one data frame.
The files have 3 distinct cells on tab 1 (D6, D9, D12) and tab 2 has a grid (D4:G6) that i want to pull out of each spreadsheet into a row.
So far i have made the data frame, and pulled a list of the files. I have managed to get a for-loop working that pulls out the data from sheet1 D6, i plan to copy this code for the rest of the cells I need.
file.list <-
list.files(
path = "filepath",
pattern = "*.xlsx",
full.names = TRUE,
recursive = FALSE
)
colnames <- c( "A","B","C","etc",)
output <- matrix(NA,nrow = length(file.list), ncol = length(colnames), byrow = FALSE)
colnames(output) <- c(colnames)
rownames(output) <- c(file.list)
for (i in 1:length(file.list)) {
filename=file.list[i]
data = read.xlsx(file = filename, sheetIndex = 1, colIndex = 7, rowIndex = 6)
assign(x = filename, value = data)
}
The issue i have is that R then pulls out X number of single data points, and I am unable to bring this out as one list of multiple data points to insert in to the dataframe.

exporting a list from R to excel in a good format

I'm trying to use the library(xlsx) to write some data from R into excel in a readable format.
My dataset is formatted as:
tbl <- list("some_name"=head(mtcars),"some_name2"=head(iris))
I would like to write this table to excel, with each item in the list being identified and the data being next to the item. E.g. the excel file should look like
"some_name" in cell A1
paste the dataframe head(mtcars) in cell A3
"some_name2" in cell A11
paste the dataframe head(iris) in cell A13
or something similar, e.g. pasting each item into a new worksheet.
Using
write.xlsx(tbl,"output.xlsx")
will output it correctly however it is not formatted in a readable way.
Any help would be great
The following codes create a xlsx file with multiple sheets, each of which holds a list name as the sheet name and a title, and a data frame below the title. You can modify it as you like.
require(xlsx)
ls2xlsx <- function(x, wb){
for(i in 1:length(x)){
sh <- createSheet(wb, names(x[i]))
cl_title <- createCell(createRow(sh, 1), 1)
addDataFrame(x[i], sh, startRow = 2, startColumn = 1)
setCellValue(cl_title[[1, 1]], names(x[i]))
}
}
tbl <- list("some_name" = head(mtcars),"some_name2"=head(iris))
wb <- createWorkbook()
ls2xlsx(tbl, wb)
saveWorkbook(wb, 'test.xlsx')
The following function writes a list of dataframes to an .xlsx file.
It has two modes, given by argument beside.
beside = TRUE is the default. It writes just one sheet, with the dataframe name on the first row, then an empty cell, then the dataframe. And repeats this for all dataframes, written side by side.
beside = FALSE writes one dataframe per sheet. The sheets' names are the dataframes names. If the list members do not have a name, the name is given by argument sheetNamePrefix.
The .xlsx file is written in the directory given by argument file.
writeList_xlsx <- function(x, file, beside = TRUE, sheetNamePrefix = "Sheet"){
xnames <- names(x)
shNames <- paste0(sheetNamePrefix, seq_along(x))
if(is.null(xnames)) xnames <- shNames
if(any(xnames == "")){
xnames[xnames == ""] <- shNames[xnames == ""]
}
wb <- createWorkbook(type = "xlsx")
if(beside){
sheet <- createSheet(wb, sheetName = shNames[1])
row <- createRow(sheet, rowIndex = 1)
col <- 0
for(i in seq_along(x)){
col <- col + 1
cell <- createCell(row, colIndex = col)
setCellValue(cell[[1, 1]], xnames[i])
col <- col + 2
addDataFrame(x[[i]], sheet,
startRow = 1, startColumn = col,
row.names = FALSE)
col <- col + ncol(x[[i]])
}
}else{
for(i in seq_along(x)){
sheet <- createSheet(wb, sheetName = xnames[i])
addDataFrame(x[[i]], sheet, row.names = FALSE)
}
}
if(!grepl("\\.xls", file)) file <- paste0(file, ".xlsx")
saveWorkbook(wb, file = file)
}
writeList_xlsx(tbl, file = "test.xlsx")
writeList_xlsx(tbl, file = "test2.xlsx", beside = FALSE)

Colouring cells in Excel output from R [duplicate]

I want to export data frames to Excel and highlight cells according to certain rules. I don't think this answer to a similar question is correct. I think it is possible, and I think I get close using the CellStyle functions of the xlsx package.
Below I outline what I've tried. Most of the ideas come from the package help files. I get all the way to the end and get an error when I try to apply the style I created to the cells that meet the criteria. I get the error: Error in .jcall(cell, "V", "setCellStyle", cellStyle$ref) : RcallMethod: invalid object parameter.
library(xlsx)
# create data
cols <- sample(c(1:5), 1) # number of columns to vary to mimic this unknown
label <- rep(paste0("label ", seq(from=1, to=10)))
mydata <- data.frame(label)
for (i in 1:cols) {
mydata[,i+1] <- sample(c(1:10), 10)
}
# exporting data.frame to excel is easy with xlsx package
sheetname <- "mysheet"
write.xlsx(mydata, "mydata.xlsx", sheetName=sheetname)
file <- "mydata.xlsx"
# but we want to highlight cells if value greater than or equal to 5
wb <- loadWorkbook(file) # load workbook
fo <- Fill(backgroundColor="yellow") # create fill object
cs <- CellStyle(wb, fill=fo) # create cell style
sheets <- getSheets(wb) # get all sheets
sheet <- sheets[[sheetname]] # get specific sheet
rows <- getRows(sheet) # get rows
cells <- getCells(rows) # get cells
values <- lapply(cells, getCellValue) # extract the values
# find cells meeting conditional criteria
highlight <- "test"
for (i in names(values)) {
x <- as.numeric(values[i])
if (x>=5 & !is.na(x)) {
highlight <- c(highlight, i)
}
}
highlight <- highlight[-1]
# apply style to cells that meet criteria
if (length(highlight)>0) { # proceed if any cells meet criteria
setCellStyle(cells[highlight], cs) # DOES NOT WORK
}
# save
saveWorkbook(wb, file)
Update:
I've also tried:
if (length(highlight)>0) { # proceed if any cells meet criteria
for (h in 1:length(highlight)) {
setCellStyle(cells[highlight[h]], cs) # DOES NOT WORK
}
}
But I get the error: Error in .jcall(cell, "V", "setCellStyle", cellStyle$ref) : RcallMethod: cannot determine object class
Try this out. I changed a few things, including the a slight change to the call to Fill and limiting the cells included for consideration to those with numeric data. I used lapply to apply the conditional formatting.
cols <- sample(c(1:5), 1) # number of columns to vary to mimic this unknown
label <- rep(paste0("label ", seq(from=1, to=10)))
mydata <- data.frame(label)
for (i in 1:cols) {
mydata[,i+1] <- sample(c(1:10), 10)
}
# exporting data.frame to excel is easy with xlsx package
sheetname <- "mysheet"
write.xlsx(mydata, "mydata.xlsx", sheetName=sheetname)
file <- "mydata.xlsx"
# but we want to highlight cells if value greater than or equal to 5
wb <- loadWorkbook(file) # load workbook
fo <- Fill(foregroundColor="yellow") # create fill object
cs <- CellStyle(wb, fill=fo) # create cell style
sheets <- getSheets(wb) # get all sheets
sheet <- sheets[[sheetname]] # get specific sheet
rows <- getRows(sheet, rowIndex=2:(nrow(mydata)+1) # get rows
# 1st row is headers
cells <- getCells(rows, colIndex = 3:(cols+3)) # get cells
# in the wb I import with loadWorkbook, numeric data starts in column 3
# and the first two columns are row number and label number
values <- lapply(cells, getCellValue) # extract the values
# find cells meeting conditional criteria
highlight <- "test"
for (i in names(values)) {
x <- as.numeric(values[i])
if (x>=5 & !is.na(x)) {
highlight <- c(highlight, i)
}
}
highlight <- highlight[-1]
lapply(names(cells[highlight]),
function(ii)setCellStyle(cells[[ii]],cs))
saveWorkbook(wb, file)
Old question, but for people that still research this topic:
In the package openxlsx, there is a function that makes this much easier- conditionalFormatting()
Below is an example:
#Load needed package
if (!require("pacman")
) install.packages("pacman")
pacman::p_load(
#add list of libraries here
openxlsx
)
##Create workbook and write in sample data
wb <- createWorkbook()
addWorksheet(wb, "Moving Row")
writeData(wb, "Moving Row", -5:5)
writeData(wb, "Moving Row", LETTERS[1:11], startCol = 2)
##Define how you want the cells to be formatted
negStyle <- createStyle(fontColour = "#9C0006", bgFill = "#FFC7CE")
posStyle <- createStyle(fontColour = "#006100", bgFill = "#C6EFCE")
## highlight row dependent on first cell in row
conditionalFormatting(wb, "Moving Row",
cols = 1:2,
rows = 1:11, rule = "$A1<0", style = negStyle
)
conditionalFormatting(wb, "Moving Row",
cols = 1:2,
rows = 1:11, rule = "$A1>0", style = posStyle
)
##Save workbook in default location
saveWorkbook(wb, "conditionalFormattingExample.xlsx", TRUE)
you can read about it here and see many other types of conditional highlighting it can do:
https://www.rdocumentation.org/packages/openxlsx/versions/4.2.5/topics/conditionalFormatting
It has been a while since I used this feature. Yes it should be possible to save conditional formatting. My (old) code is given below. Hope it helps you.
file.name <- paste('loadings.',state$data,'.xls', sep = "")
wb <- loadWorkbook(file.name, create = TRUE)
createSheet(wb, name = 'loadings')
clearSheet(wb, sheet = 'loadings')
Variables <- rownames(df)
df.loadings <- cbind(Variables,df)
df.loadings[,'Communality'] <- NULL
writeWorksheet(wb,df.loadings[,-1], sheet = 'loadings', rownames = 'Variables', startRow = 1, startCol = 1)
max.loading <- createCellStyle(wb)
setFillPattern(max.loading, fill = XLC$"FILL.SOLID_FOREGROUND")
setFillForegroundColor(max.loading, color = XLC$"COLOR.SKY_BLUE")
maxVal <- apply(abs(df.loadings[,-1]),1,max)
maxValIndex <- which(abs(df.loadings[,-1]) == maxVal, arr.ind = TRUE)
setCellStyle(wb, sheet = "loadings", row = maxValIndex[,'row']+1, col = maxValIndex[,'col']+1, cellstyle = max.loading)
df.corr <- data.frame(cor(f.data))
df.corr <- cbind(Variables,df.corr)
createSheet(wb, name = 'correlations')
clearSheet(wb, sheet = 'correlations')
writeWorksheet(wb, df.corr, sheet = 'correlations', startRow = 1, startCol = 1)
corr <- createCellStyle(wb)
setFillPattern(corr, fill = XLC$"FILL.SOLID_FOREGROUND")
setFillForegroundColor(corr, color = XLC$"COLOR.SKY_BLUE")
corrIndex <- which(abs(df.corr[,-1]) > .3 & df.corr[,-1] != 1 , arr.ind = TRUE)
setCellStyle(wb, sheet = "correlations", row = corrIndex[,'row']+1, col = corrIndex[,'col']+1, cellstyle = corr)
saveWorkbook(wb)
if(.Platform$OS.type == "unix") {
execute(paste("browseURL(\"",getwd(),'/',file.name,"\", browser = '/usr/bin/open')",sep=''))
} else {
execute(paste("browseURL(\"",getwd(),'/',file.name,"\", browser = NULL)",sep=''))
}

Color and Style Format Data.Frames in R

I am trying to format a data.frame I have created in R and can not seem to find a solution. By format I mean that I would like to bold/italicize the headers(i.e. names) of the data.frame and also color/highlight certain rows within the data.frame.
I tried researching on google/stackoverflow, but I could not find something applicable or directly useful. I also tried using CellStyle, but that would not allow me to bold/italicize the headers of the data.frame.
Any help is appreated!
You can go a long way with XLConnect package to format a data frame in Excel.
Here is code I have been using, which would be a good starting point for you learning it and exploring changes to suit you. Usage: save.xls(your data frame, the name you want to give the file with the .xlsx suffix, ....
save.xls <- function(df, filename, sheetname="Sheet", create=TRUE, rownames=NULL, startRow=1, zebra=F) {
require(XLConnect)
require(stringr)
if (is.matrix(df)) df <- as.data.frame(df)
if (!str_detect(filename, "\\.xlsx$")) filename <- str_c(filename, ".xlsx")
wb <- loadWorkbook(filename, create=create)
if (existsSheet(wb, sheetname))
warning(sprintf("Sheet %s already existed and was overwritten", sheetname))
createSheet(wb, name=sheetname)
if (!is.null(rownames)) df <- transform(df, rownames = row.names(df))
writeWorksheet(wb, df, startRow=startRow, sheet=sheetname, rownames=rownames)
if (zebra) {
color <- createCellStyle(wb)
setFillForegroundColor(color, color = XLC$"COLOR.LIGHT_CORNFLOWER_BLUE")
setFillPattern(color, fill = XLC$FILL.SOLID_FOREGROUND)
for (i in 1:ncol(df)) {
setCellStyle(wb, sheet = sheetname, row = seq(startRow+1, nrow(df)+2, 2), col = i,
cellstyle = color)
}
#prcntg <- createCellStyle(wb) see my script of XLConnect.R for how it worked
#dollar <- createCellStyle(wb)
#setDataFormat(prcntg, format = "0.0")
#setDataFormat(dollar, format = "$ 0.00")
border <- createCellStyle(wb)
setBorder(border, side = c("bottom","top"), type = XLC$"BORDER.THICK", color = XLC$"COLOR.RED")
setCellStyle(wb, sheet = "Sheet", row = startRow, col = 1:ncol(df), cellstyle = border)
setColumnWidth(wb, sheet = "Sheet", column = 1:ncol(df), width = -1) # this autosizes each column
}
saveWorkbook(wb)
}

Resources