Dynamic Reporting in R

I am looking for help generating an RTF report from R (a data frame).
I am trying to output data with many columns into an RTF file using the following code:
library(rtf)
inp.data <- cbind(ChickWeight,ChickWeight,ChickWeight)
outputFileName = "test.out"
rtf <- RTF(paste0(".../", outputFileName, ".rtf"), width=11, height=8.5, font.size=10, omi=c(.5,.5,.5,.5))
addTable(rtf,inp.data,row.names=F,NA.string="-",col.widths=rep(1,12),header.col.justify=rep("C",12))
done(rtf)
The problem I face is that some of the columns get hidden (as you can see, the last two columns are cut off). I expect these columns to print on the next page (without reducing the column widths).
Can anyone suggest packages/techniques for this scenario?
Thanks

Six years later, there is finally a package that can do exactly what you wanted. It is called reporter (small "r", no "s"). It will wrap columns to the next page if they exceed the available content width.
library(reporter)
library(magrittr)
# Prepare sample data
inp.data <- cbind(ChickWeight,ChickWeight,ChickWeight)
# Make unique column names
nm <- c("weight", "Time", "Chick", "Diet")
nms <- paste0(nm, c(rep(1, 4), rep(2, 4), rep(3, 4)))
names(inp.data) <- nms
# Create table
tbl <- create_table(inp.data) %>%
  column_defaults(width = 1, align = "center")
# Create report and add table to report
rpt <- create_report("test.rtf", output_type = "RTF", missing = "-") %>%
  set_margins(left = .5, right = .5) %>%
  add_content(tbl)
# Write the report
write_report(rpt)
The only thing is that you need unique column names, so I added a bit of code to do that.
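As a more generic alternative to building the names by hand, base R's make.unique() does the same job:
names(inp.data) <- make.unique(names(inp.data))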

If the docx format can replace the rtf format, use the package ReporteRs.
library( ReporteRs )
inp.data <- cbind(ChickWeight,ChickWeight,ChickWeight)
doc = docx( )
# uncomment addSection blocks if you want to change page
# orientation to landscape
# doc = addSection(doc, landscape = TRUE )
doc = addFlexTable( doc, vanilla.table( inp.data ) )
# doc = addSection(doc, landscape = FALSE )
writeDoc( doc, file = "inp.data.docx" )
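Note that ReporteRs has since been removed from CRAN; its successor is the officer package, usually paired with flextable. A minimal sketch of the same document with those packages (treat it as a starting point, not a tested answer):
library(officer)
library(flextable)
inp.data <- cbind(ChickWeight, ChickWeight, ChickWeight)
# flextable requires unique column names
names(inp.data) <- make.unique(names(inp.data))
doc <- read_docx()
doc <- body_add_flextable(doc, qflextable(inp.data))
print(doc, target = "inp.data.docx")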

Related

How to display period class from lubridate in datatable from DT?

I have runtime data for various devices that can be widely different, ranging from a few minutes to several months, which I would like to display in a datatable. So I thought the seconds_to_period function from lubridate would provide a neat format for printing this data. However, I seem unable to display it within a datatable from DT, which is what I want to do (within a Shiny app).
Some example data:
library(lubridate)
library(DT)
names <- c("A","B","C","D","E","F")
timevec <- c(225,2250,22500,225000,2250000,22500000)
timevec <- seconds_to_period(timevec)
Writing this into a datatable without any formatting does not work as it only displays the seconds without considering the minutes/hours etc.:
##### This cuts off at the seconds -> useless
table <- data.frame(name = names, time = timevec)
my_table <- datatable(table)
Formatting the time column with formatDate also doesn't work since it is not a date or POSIXct object. I can print the desired format by typecasting it as a string, but then the sorting of the column doesn't work as it is sorted alphabetically:
##### This prints the period format, but sorting does not work
table <- data.frame(name = names, time = as.character(timevec))
my_table <- datatable(table)
and of course I could just print the total time in seconds, but as I said I find this very unintuitive to read:
##### This prints the seconds -> unintuitive to read
table <- data.frame(name = names, time = as.duration(timevec))
my_table <- datatable(table)
Any ideas on how to achieve this, or alternative suggestions for how to intuitively display duration data?
A solution is to program DT to sort the displayed character column by a hidden numeric column via columnDefs. Note that the column targets are zero-based, and the default row-names column counts as column 0, so target 2 is the hidden numeric column and target 3 is the character column:
library(tidyverse)
library(lubridate)
library(DT)
names <- c("A", "B", "C", "D", "E", "F")
timevec_raw <- c(225, 2250, 22500, 225000, 2250000, 22500000)
timevec_period <- seconds_to_period(timevec_raw)
(table <- tibble(
  name = names,
  timenum = timevec_raw,
  timechar = as.character(timevec_period)
))
my_table <- datatable(table,
  options = list(
    columnDefs = list(
      # hide column 2, the numeric one
      list(visible = FALSE, targets = 2),
      # the ordering of column 3 comes from hidden column 2
      list(orderData = c(2), targets = c(3))
    )
  )
)
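An alternative that avoids the hidden column entirely is to build fixed-width, zero-padded labels, so that alphabetical order coincides with numeric order. A sketch (pad_period is a made-up helper, not part of any package):
# zero-padded "days hh:mm:ss" labels sort correctly even as plain text
pad_period <- function(secs) {
  secs <- as.integer(secs)
  sprintf("%03dd %02d:%02d:%02d",
          secs %/% 86400L, (secs %% 86400L) %/% 3600L,
          (secs %% 3600L) %/% 60L, secs %% 60L)
}
my_table <- datatable(tibble(name = names, time = pad_period(timevec_raw)))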

R: Conditional Formatting across excel files

I am trying to highlight rows of an excel file based on a match from the columns in a separate excel file. Pretty much, I want to highlight a row in file1 if a cell in that row matches a cell in file2.
I saw that the "conditionalFormatting" function in the R package openxlsx has some of this functionality, but I cannot figure out how to use it.
The pseudo-code, I think, would look something like this:
file1 <- read_excel("file1")
file2 <- read_excel("file2")
conditionalFormatting(file1, sheet = 1, cols = 1:end, rows = 1:22,
rule = "number in file1 is found in a specific column of file 2")
Please let me know if this makes sense or if I need to clarify something.
Thanks!
The conditionalFormatting() function embeds active conditional formatting into the excel document but is likely more complicated than you need for a one-time highlight. I'd suggest loading each file into a dataframe, determining which rows contain a matching cell, creating a highlight style (yellow background), loading the file as a workbook object, setting the appropriate rows to the highlight style, and saving the updated workbook object.
The following function is used to determine which rows have a match. The magrittr package provides the %>% pipes and the data.table package provides the transpose() function.
find_matched_rows <- function(df1, df2) {
  require(magrittr)
  require(data.table)
  # the dataframe object treats each column as a list, making it much easier and
  # faster to search by column than by row. Transpose the original file1 dataframe
  # to treat the rows as columns.
  df1_transposed <- data.table::transpose(df1)
  # assuming that the location of the match in the second file is irrelevant,
  # unlist the file2 dataframe so that each value in file1 can be searched in a
  # vector
  df2_as_vector <- unlist(df2)
  # determine which columns contain a match. If one or more matches are found,
  # mark the row as TRUE in the output vector, to be used to subset the
  # row numbers
  match_map <- lapply(df1_transposed, FUN = `%in%`, df2_as_vector) %>%
    as.data.frame(stringsAsFactors = FALSE) %>%
    sapply(function(x) sum(x) > 0)
  # make a vector of row numbers using the logical match_map vector to subset
  matched_rows <- seq_len(nrow(df1))[match_map]
  return(matched_rows)
}
The following code loads the data, finds the matched rows, applies the highlight, and saves over the original file1.xlsx. The second tst_df1 and tst_df2 assignments provide an easy way to test the find_matched_rows() function: as expected, it finds that the 1st and 3rd rows of the first dataframe contain a cell that matches a cell in the second dataframe.
# used to ensure that the correct rows are highlighted. the dataframe does not
# include the header as an independent row, unlike excel.
file1_header_row <- 1
file2_header_row <- 1
tst_df1 <- openxlsx::read.xlsx("./file1.xlsx", startRow = file1_header_row)
tst_df2 <- openxlsx::read.xlsx("./file2.xlsx", startRow = file2_header_row)
# example data for testing
tst_df1 <- data.frame(fname = c("John", "Bob", "Bill"),
                      lname = c("Smith", "Johnson", "Samson"),
                      wage = c(10, 15.23, 137.38),
                      stringsAsFactors = FALSE)
tst_df2 <- data.frame(a = c(10, 34, 284.2),
                      b = c("Billy", "Bill", "Billy-Bob"),
                      c = c("Samson", "Johansson", NA),
                      stringsAsFactors = FALSE)
df_matched_rows <- find_matched_rows(tst_df1, tst_df2)
# any color found in colours() can be used here, or a hex color beginning with "#"
highlight_style <- openxlsx::createStyle(fgFill = "yellow")
file1_wb <- openxlsx::loadWorkbook(file = "./file1.xlsx")
openxlsx::addStyle(wb = file1_wb,
                   sheet = 1,
                   style = highlight_style,
                   rows = file1_header_row + df_matched_rows,
                   cols = 1:ncol(tst_df1),
                   stack = TRUE,
                   gridExpand = TRUE)
openxlsx::saveWorkbook(wb = file1_wb,
                       file = "./file1.xlsx",
                       overwrite = TRUE)
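For completeness, here is a sketch of the conditionalFormatting() route the question asked about. It embeds a live rule in the workbook instead of a static style; the helper column Z holding the file2 values is an assumption about the sheet layout, and note that conditional-formatting styles must use bgFill rather than fgFill:
library(openxlsx)
wb <- loadWorkbook("./file1.xlsx")
# conditional formatting styles must use bgFill
yellow <- createStyle(bgFill = "yellow")
# highlight cells in A2:C4 whenever that row's column-A value appears in column Z
conditionalFormatting(wb, sheet = 1, cols = 1:3, rows = 2:4,
                      rule = "COUNTIF($Z:$Z,$A2)>0", style = yellow)
saveWorkbook(wb, "./file1.xlsx", overwrite = TRUE)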

Merge columns every other row using Sweave/R/LaTeX

I am writing a conference abstract booklet using R/Sweave. I have already made the program booklet for printing that contains only the ID, author, and title.
Now I want to modify it to include the abstract (not for print). But abstracts are lengthy. My thought is to take the cell with the abstract info and have it display below the row with the author info, expanded across the full width of the page.
ID--author--------title--------------------------------
abstract-----------------------------------------------
So every other row has only one column spanning the width of the entire table.
Is there a way to add \multicolumn{x} to every other row?
If a solution can't be figured out, advice for how to print full abstracts in a nice way would be welcome. (Something other than "just use landscape" or "adjust column widths")
Also, it doesn't have to be PDF. I could switch to markdown/HTML and make it look closer to real conference program schedules that have full abstracts on them. Again, once I figure out how to print a table where every other row has only one column that is the width of the entire table.
If you want to try, here is a complete MWE of what I have working now. Note that it uses the R package lipsum, which has to be installed via devtools/github.
\documentclass{article}
\usepackage{booktabs, multicol, array}
\usepackage[margin=0.75in]{geometry}
%%%%%%%%%%% Let tables to span entire page
\newcolumntype{L}[1]{>{\raggedright\let\newline\\\arraybackslash\hspace{0pt}}m{#1}}
<<echo=FALSE, warning=FALSE, message=FALSE>>=
# devtools::install_github("coolbutuseless/lipsum")
library(lipsum)
library(xtable)
knitr::opts_chunk$set(echo = FALSE, warning=FALSE, message=FALSE)
options(xtable.comment = FALSE)
tblalign <- "lL{0.5cm}|L{4cm}L{6cm}L{8cm}"
# fake data setup
dat <- data.frame(ID = c(1:3),
                  author = substr(lipsum[1:3], 1, 40),
                  title = substr(lipsum[4:6], 1, 100),
                  abstract = lipsum[7:9])
names(dat) <- c("\\multicolumn{1}{c}{\\textbf{\\large{ID}}}",
                "\\multicolumn{1}{c}{\\textbf{\\large{Author List}}}",
                "\\multicolumn{1}{c}{\\textbf{\\large{Title}}}",
                "\\multicolumn{1}{c}{\\textbf{\\large{Abstract}}}")
@
\begin{document}
<<results='asis'>>=
print(
  xtable(x = dat, align = tblalign),
  table.placement = "H",
  sanitize.colnames.function = function(x) x,
  include.rownames = FALSE,
  include.colnames = TRUE,
  size = "small",
  floating = FALSE,
  hline.after = c(0, 1:nrow(dat))
)
@
\end{document}
Split the data from the abstract manually:
out <- dat[,-4]
ab.list <- dat$abstract
then use add.to.row:
, add.to.row = list(pos = as.list(1:nrow(out)),
                    command = paste0("\\multicolumn{3}{L{15cm}}{\\textbf{Abstract: }",
                                     ab.list, "} \\\\"))
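Put together, the print call might look like this (a sketch; the align string drops the abstract column's width spec since out has one column fewer):
print(
  xtable(x = out, align = "lL{0.5cm}|L{4cm}L{6cm}"),
  table.placement = "H",
  sanitize.colnames.function = function(x) x,
  include.rownames = FALSE,
  size = "small",
  floating = FALSE,
  hline.after = c(0, 1:nrow(out)),
  add.to.row = list(pos = as.list(1:nrow(out)),
                    command = paste0("\\multicolumn{3}{L{15cm}}{\\textbf{Abstract: }",
                                     ab.list, "} \\\\"))
)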
Here is one approach using my package huxtable. I couldn't install lipsum for some reason, so I just hacked it. This is in a .Rmd file with output pdf_document.
```{r, results = 'asis'}
lipsum <- rep(do.call(paste, list(rep('blah ', 100), collapse = '')), 10)
dat <- data.frame(ID = c(1:3),
                  author = substr(lipsum[1:3], 1, 40),
                  title = substr(lipsum[4:6], 1, 100),
                  abstract = lipsum[7:9],
                  stringsAsFactors = FALSE)
library(huxtable)
# shape data
datmat <- matrix(NA_character_, nrow(dat) * 2, 3)
datmat[seq(1, nrow(datmat), 2), ] <- as.matrix(dat[, c('ID', 'author', 'title')])
datmat[seq(2, nrow(datmat), 2), 1] <- dat$abstract
# print as PDF
ht <- as_huxtable(datmat)
colspan(ht)[seq(2, nrow(ht), 2), 1] <- 3
wrap(ht) <- TRUE
col_width(ht) <- c(.2, .2, .6)
number_format(ht) <- 0
ht
```

How to retrieve information about journals from ISI Web of Knowledge?

I am working on predicting citation counts for articles. The problem I have is that I need information about journals from ISI Web of Knowledge. They gather this information (journal impact factor, eigenfactor, ...) year by year, but there is no way to download all of one year's journal information at once. There is just an option to "mark all", which always marks the first 500 journals in the list (this list can then be downloaded). I am programming this project in R. So my question is: how can I retrieve this information at once, or in an efficient and tidy way? Thank you for any idea.
I used RSelenium to scrape WOS to get citation data and make a plot similar to this one by Kieran Healy (but mine was for archaeology journals, so my code is tailored to that).
Here's my code (from a slightly bigger project on github):
# set up browser and selenium
library(devtools)
install_github("ropensci/rselenium")
library(RSelenium)
checkForServer()
startServer()
remDr <- remoteDriver()
remDr$open()
# go to http://apps.webofknowledge.com/
# refine search by journal... perhaps arch?eolog* in 'topic'
# then: 'Research Areas' -> archaeology -> refine
# then: 'Document types' -> article -> refine
# then: 'Source title' -> choose your favourite journals -> refine
# must have <10k results to enable citation data
# click 'create citation report' tab at the top
# do the first page manually to set the 'save file' and 'do this automatically',
# then let loop do the work after that
# before running the loop, get URL of first page that we already saved,
# and paste in next line, the URL will be different for each run
remDr$navigate("http://apps.webofknowledge.com/CitationReport.do?product=UA&search_mode=CitationReport&SID=4CvyYFKm3SC44hNsA2w&page=1&cr_pqid=7&viewType=summary")
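Note: in current RSelenium releases, checkForServer() and startServer() have been removed. The equivalent setup today would be something like rsDriver(), which starts a Selenium server and browser together:
library(RSelenium)
rD <- rsDriver(browser = "firefox")
remDr <- rD$client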
Here's the loop to automate collecting data from the next several hundred pages of WOS results...
# Loop to get citation data for each page of results. Each iteration will save a txt file. I used selectorgadget to check the css ids; they might be different for you.
for (i in 1:1000) {
  # click on 'save to text file'
  result <- try(
    webElem <- remDr$findElement(using = 'id', value = "select2-chosen-1")
  ); if (class(result) == "try-error") next;
  webElem$clickElement()
  # click on 'send' on pop-up window
  result <- try(
    webElem <- remDr$findElement(using = "css", "span.quickoutput-action")
  ); if (class(result) == "try-error") next;
  webElem$clickElement()
  # refresh the page to get rid of the pop-up
  remDr$refresh()
  # advance to the next page of results
  result <- try(
    webElem <- remDr$findElement(using = 'xpath', value = "(//form[@id='summary_navigation']/table/tbody/tr/td[3]/a/i)[2]")
  ); if (class(result) == "try-error") next;
  webElem$clickElement()
  print(i)
}
# there are many duplicates, but the code below will remove them
# copy the folder to your hard drive, and edit the setwd line below
# to match the location of your folder containing the hundreds of text files.
Read all text files into R...
# move them manually into a folder of their own
setwd("/home/two/Downloads/WoS")
# get text file names
my_files <- list.files(pattern = ".txt")
# make list object to store all text files in R
my_list <- vector(mode = "list", length = length(my_files))
# loop over file names and read each file into the list
my_list <- lapply(seq(my_files), function(i) read.csv(my_files[i],
                                                      skip = 4,
                                                      header = TRUE,
                                                      comment.char = " "))
# check to see it worked
my_list[1:5]
Combine list of dataframes from the scrape into one big dataframe
# use data.table for speed
install_github("rdatatable/data.table")
library(data.table)
my_df <- rbindlist(my_list)
setkey(my_df)
# filter only a few columns to simplify
my_cols <- c('Title', 'Publication.Year', 'Total.Citations', 'Source.Title')
my_df <- my_df[,my_cols, with=FALSE]
# remove duplicates
my_df <- unique(my_df)
# what journals do we have?
unique(my_df$Source.Title)
Make abbreviations for journal names, make article titles all upper case ready for plotting...
# get names
long_titles <- as.character(unique(my_df$Source.Title))
# get abbreviations automatically, perhaps not the obvious ones, but it's fast
short_titles <- unname(sapply(long_titles, function(i) {
  theletters <- strsplit(i, '')[[1]]
  wh <- c(1, which(theletters == ' ') + 1)
  paste(theletters[wh], collapse = '')
}))
# manually disambiguate the journals that now only have 'A' as the short name
short_titles[short_titles == "A"] <- c("AMTRY", "ANTQ", "ARCH")
# remove 'NA' so it's not confused with an actual journal
short_titles[short_titles == "NA"] <- ""
# add abbreviations to big table
journals <- data.table(Source.Title = long_titles,
                       short_title = short_titles)
setkey(journals) # need a key to merge
my_df <- merge(my_df, journals, by = 'Source.Title')
# make article titles all upper case, easier to read
my_df$Title <- toupper(my_df$Title)
## create new column that is 'decade'
# first make a lookup table to get a decade for each individual year
year1 <- 1900:2050
my_seq <- seq(year1[1], year1[length(year1)], by = 10)
indx <- findInterval(year1, my_seq)
ind <- seq(1, length(my_seq), by = 1)
labl1 <- paste(my_seq[ind], my_seq[ind + 1], sep = "-")[-42]
dat1 <- data.table(data.frame(Publication.Year = year1,
                              decade = labl1[indx],
                              stringsAsFactors = FALSE))
setkey(dat1, 'Publication.Year')
# merge the decade column onto my_df
my_df <- merge(my_df, dat1, by = 'Publication.Year')
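As an aside, the decade column can also be computed directly, without the lookup table; a sketch (the labels come out as "1970-1979" rather than "1970-1980"):
dec_start <- my_df$Publication.Year - my_df$Publication.Year %% 10
my_df$decade <- paste(dec_start, dec_start + 9, sep = "-")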
Find the ten most-cited papers by decade of publication...
df_top <- my_df[ave(-my_df$Total.Citations, my_df$decade, FUN = rank) <= 10, ]
# inspecting this df_top table is quite interesting.
Draw the plot in a similar style to Kieran's. This code comes from Jonathan Goodwin, who also reproduced the plot for his field (1, 2):
######## plotting code from from Jonathan Goodwin ##########
######## http://jgoodwin.net/ ########
# format of data: Title, Total.Citations, decade, Source.Title
# THE WRITERS AUDIENCE IS ALWAYS A FICTION,205,1974-1979,PMLA
library(ggplot2)
ws <- df_top
ws <- ws[order(ws$decade,-ws$Total.Citations),]
ws$Title <- factor(ws$Title, levels = unique(ws$Title)) #to preserve order in plot, maybe there's another way to do this
g <- ggplot(ws, aes(x = Total.Citations,
                    y = Title,
                    label = short_title,
                    group = decade,
                    colour = short_title))
g <- g + geom_text(size = 4) +
  facet_grid(decade ~ .,
             drop = TRUE,
             scales = "free_y") +
  theme_bw(base_family = "Helvetica") +
  theme(axis.text.y = element_text(size = 8)) +
  xlab("Number of Web of Science Citations") + ylab("") +
  labs(title = "Archaeology's Ten Most-Cited Articles Per Decade (1970-)", size = 7) +
  scale_colour_discrete(name = "Journals")
g  # adjust sizing, etc.
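To write the figure to disk, ggplot2's ggsave() works; the dimensions here are only a guess:
ggsave("top_cited_by_decade.png", g, width = 8, height = 12)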
Another version of the plot, but with no code: http://charlesbreton.ca/?page_id=179

Produce pretty table for print that shows which point estimates differ significantly using R

I want to create a table of point estimates from a sample for print in the following format
variable   group1   group2   group3   etc
age          18.2     18.5     23.2
weight      125.4    130.1    117.1
etc
I also have confidence intervals for each point estimate, but displaying them will cause too much clutter. Instead, I'd like to use text attributes (italics, bold, underline, font) to signal which point estimates in a row differ significantly. So, in the first row, if 23.2 differed significantly from the other two, it would be displayed in bold (for example). I'm not sure if such a display would appear bewildering, but I'd like to try.
Could anyone suggest a table formatting library in R that would allow me to accomplish this? Perhaps one that allows me to supply text attributes in the data table for each point estimate?
Another solution could be to use the ReporteRs package's FlexTable API and send the object to a docx document:
library( ReporteRs )
data = iris[45:55, ]
MyFTable = FlexTable( data = data )
MyFTable[data$Petal.Length < 3, "Species"] = textProperties(color = "red",
                                                            font.style = "italic")
MyFTable[data$Sepal.Length < 5, 1:4] = cellProperties( background.color="#999999")
MyFTable[ , 1:4] = parProperties( text.align="right" )
doc.filename = "test.docx"
doc = docx( )
doc = addFlexTable( doc, MyFTable )
writeDoc( doc, file = doc.filename )
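As noted above, ReporteRs is no longer on CRAN; here is a sketch of the same conditional styling with its successor flextable (the formula-based i argument selects rows):
library(flextable)
data <- iris[45:55, ]
ft <- flextable(data)
ft <- color(ft, i = ~ Petal.Length < 3, j = "Species", color = "red")
ft <- italic(ft, i = ~ Petal.Length < 3, j = "Species")
ft <- bg(ft, i = ~ Sepal.Length < 5, j = 1:4, bg = "#999999")
ft <- align(ft, j = 1:4, align = "right")
save_as_docx(ft, path = "test.docx")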
I believe you can do something like this with the xtable package - if you have xtable output your table, you can use knitr/pandoc to convert it to Word, HTML, etc., or you can just paste the LaTeX output into a document and compile it.
Here's a demo:
library(xtable)
# original data frame
df <- data.frame(var = c("age", "weight", "etc"),
                 group1 = c("18.2", "125.4", "3"),
                 group2 = c("18.5", "130.1", "3"),
                 group3 = c("23.2", "117.1", "3"),
                 etc = c("1", "2", "3"))
# data frame in similar format indicating significance
significant <- data.frame(var = c("age", "weight", "etc"),
                          group1 = c(F, T, F),
                          group2 = c(T, F, T),
                          group3 = c(F, T, F))
library(reshape2)
# transform everything into long form to apply text formatting
df.melt <- melt(df, id.vars = 1, variable.name="group", value.name="value")
sig.melt <- melt(significant, id.vars=1, variable.name = "group", value.name="sig")
# merge datasets together
tmp <- merge(df.melt, sig.melt)
tmp$ans <- tmp$value
# apply text formatting using LaTeX functions
tmp$ans[tmp$sig] <- paste0("\\textit{", tmp$ans, "}")[tmp$sig]
# transform dataset back to "wide form" for table output
df2 <- dcast(tmp, var~group, value.var="ans")
# output table in LaTeX format
print(xtable(df2), include.rownames=FALSE, sanitize.text.function=identity)
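The HTML route mentioned above works the same way: swap the LaTeX markup for HTML tags and ask xtable for HTML output (a sketch):
tmp$ans <- tmp$value
tmp$ans[tmp$sig] <- paste0("<em>", tmp$ans, "</em>")[tmp$sig]
df2 <- dcast(tmp, var ~ group, value.var = "ans")
print(xtable(df2), type = "html", include.rownames = FALSE,
      sanitize.text.function = identity)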
A quick demo based on the pander package the OP mentioned:
Load it:
library(pander)
Create some dummy data, which I will import from the rapport package this time:
df <- rapport::ius2008
Compute a basic cross table:
t <- table(df$dwell, df$net.pay)
Identify those cells with high standardized residuals and emphasize those:
emphasize.cells(which(abs(chisq.test(t)$stdres) > 2, arr.ind = TRUE))
Do not split the markdown table:
panderOptions('table.split.table', Inf)
Print the markdown table:
pander(t)
Resulting in:
----------------------------------------------------------------------------
parents school/faculty employer self-funded other
---------------- --------- ---------------- ---------- ------------- -------
**city** 276 14 26 229 *20*
**small town** 14 1 1 11 *4*
**village** 13 1 0 13 2
----------------------------------------------------------------------------
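For bold instead of italics, pander offers emphasize.strong.cells() with the same interface:
emphasize.strong.cells(which(abs(chisq.test(t)$stdres) > 2, arr.ind = TRUE))
pander(t)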
