I'm using the excellent modelsummary package to summarize my data and regression models.
One problem I've run into is column labels (variable names for datasummary_correlation, model names for regression output) that are too wide for the PDF page size I want. They look fine in RStudio and HTML, but only because width is not an issue there. In PDF, an overly wide table runs into the margin.
Here's an example where you can see how the column labels are too wide.
What I would like is to break up the column labels at places of my choosing, so that each label spans multiple rows (giving narrower columns). In fact, this is closer to what the HTML rendering looks like:
It looks like the correct strategy (h/t the author of modelsummary) is to pipe the output into kableExtra's table helper functions and then use column_spec().
So for me, I can do this (making the data columns narrower and the label column a little wider):
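library(modelsummary)
library(kableExtra)  # provides column_spec() and re-exports %>%
# `dat` is a placeholder for your own data frame of numeric variables
out <- datasummary_correlation(dat, output = "kableExtra", booktabs = TRUE)
# widen the label column, narrow the four data columns
out %>% column_spec(1, width = "1.5in") %>% column_spec(2:5, width = "0.75in")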
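To break a label at a chosen point rather than relying on column width alone, one option in PDF output is to put a "\n" inside the label and let kableExtra's linebreak() turn it into a LaTeX \makecell. The helper below, break_label(), is hypothetical (it is not part of modelsummary or kableExtra): it inserts those breaks automatically using base R's strwrap(). You can just as well type the "\n" by hand wherever you want the break.

```r
# Hypothetical helper: wrap each label at roughly `width` characters,
# joining the pieces with "\n" so kableExtra::linebreak() can split them.
break_label <- function(labels, width) {
  vapply(labels,
         function(s) paste(strwrap(s, width = width), collapse = "\n"),
         character(1),
         USE.NAMES = FALSE)
}

break_label(c("Substitution Requirements", "State"), width = 15)
```

With a data frame `dat` (a placeholder), the wrapped labels could then be passed as `kbl(dat, booktabs = TRUE, escape = FALSE, col.names = linebreak(break_label(names(dat), 12)))`. linebreak() needs `escape = FALSE`, so any LaTeX special characters elsewhere in the table are no longer escaped for you.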
The text from a PDF I scraped is jumbled up across different elements, and some data was lost when it was converted to a data frame. It's really hard to tell where the text should have been split, since it seems I got it right in the code below. How do I split the text so that it looks like the original table?
library(pdftools)
library(dplyr)
library(tidyr)

mintz = "https://www.mintz.com/sites/default/files/media/documents/2019-02-08/State%20Legislation%20on%20Biosimilars.pdf"
mintzText = pdf_subset(mintz, pages = 2:23)
mintzText = pdf_text(mintzText)
q = data.frame(trimws(mintzText))
mintzdf <- q %>%
  rename(x = trimws.mintzText.) %>%
  mutate(x = strsplit(x, "\\n")) %>%
  unnest(x)
View(mintzdf)
mintzDF = mintzdf[-c(1:2), ]
# NB: with its default sep, separate() splits on every run of non-alphanumeric
# characters, and any extra pieces are dropped with a warning
mintzDF = mintzDF %>%
  separate(x, c("a", "State", "Substitution Requirements",
                "Pharmacy Notification Requirements (to prescriber, patient, or others)",
                "Recordkeeping Requirements")) %>%
  select(-a)
View(mintzDF)
(Screenshots: what it looks like, and what it should look like.)
The order in which text is stored for a PDF page may be arbitrary, or even bottom rows upwards, because PDF has no rule about text order: it was designed to describe where marks land on a printed page (think of a laser charging a drum), not the order in which they were typed.
We are lucky when the order can be extracted sensibly at all, though this is a very well ordered PDF. So remember that the extractor has no need to respect the grid: it simply outputs rows, with spaces that, with luck, line up into columns.
In this case, running poppler's pdftotext with no options, the text order for a single page could look like this: the first column is headed State and the second starts with Substitution\nRequirements\n. You may well scratch your head over why State is not spaced away from Alaska, but then it is PDF after all, so expect no rules.
It looks as if it was written down one column, then across two, then perhaps down the last one.
Given how much the pages vary, I would attempt to target vertical strips rather than horizontal rows: set a template of four page-high vertical zones, then hope the horizontal breaks can be matched up across them. The alternative (probably better) is to extract with a tabular layout, where xpdf's pdftotext may give a better result.
Or use a Python extraction library such as pdfminer.
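The vertical-strip idea can be sketched in R with pdftools::pdf_data(), which returns one data frame per page giving the position (x, y) and text of every word. The helper strip_text() below is hypothetical: it assigns each word to a strip by its x coordinate, using column boundaries you would have to measure for this particular PDF, then rebuilds each strip's lines top to bottom, assuming y is measured from the top of the page as in pdf_data() output.

```r
# Hypothetical helper: split a page's words into page-high vertical strips.
# `words` needs columns x, y and text (as returned by pdftools::pdf_data());
# `breaks` are the strip boundaries in the same units as x.
strip_text <- function(words, breaks) {
  # [left, right) interval index for each word
  words$strip <- cut(words$x, breaks = breaks, labels = FALSE, right = FALSE)
  # sort by strip, then top to bottom, then left to right
  words <- words[order(words$strip, words$y, words$x), ]
  lapply(split(words, words$strip), function(w) {
    # words sharing a y coordinate sit on the same visual line
    vapply(split(w$text, w$y), paste, character(1), collapse = " ")
  })
}
```

For the Mintz file this might be called as `strip_text(pdf_data(mintz)[[2]], breaks = c(0, 120, 260, 420, Inf))`, where the boundaries are pure guesses. Words on one visual line can have slightly different y values, so rounding y before splitting may help.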
I am trying to print a series of named numbers (specifically, the fitted values and residuals of an lm regression) as part of an R Markdown HTML document. The list of numbers is very long, so I have set the document to use paged tables with the option df_print: paged. However, there is a lot of unused blank space, because there is only one column besides the row names. I would like the displayed table to wrap around so that two or three sets of columns are shown at once, similar to what happens when I print a named numeric vector and the values are laid out in several columns.
Here is an example, using the mtcars dataset, that produces the kind of output I am trying to avoid.
data(mtcars)
fit<-lm(mpg~cyl,data=mtcars)
data.frame(fit$residuals)
fit$residuals
Printing fit$residuals gets me closer to what I want in terms of space optimization, but it prints every single row and I cannot paginate the data (or at least, I do not know if it can be done).
For two columns:
data(mtcars)
fit<-lm(mpg~cyl,data=mtcars)
df=data.frame(fit$residuals)
df1=cbind(df[1:16,], df[17:32,])
library(htmlTable)
htmlTable(df1, cgroup=c("Residuals 1:16", "Residuals 17:32"), n.cgroup=c(1,1), rnames=FALSE)
For three columns:
data(mtcars)
fit<-lm(mpg~cyl,data=mtcars)
df=data.frame(fit$residuals)
df1=cbind(df[1:11,], df[12:22,], df[23:33,])  # mtcars has 32 rows, so df[33,] is NA and pads the third column
htmlTable(df1, cgroup=c("Residuals 1:11", "Residuals 12:22", "Residuals 23:32"), n.cgroup=c(1,1,1), rnames=FALSE)
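Rather than hard-coding the index ranges, a small helper (hypothetical, not part of htmlTable) can pad any vector with NA up to a multiple of the column count and reshape it column by column with matrix(), so the same code works for two, three, or more columns.

```r
# Hypothetical helper: lay a vector out in `ncol` side-by-side columns,
# filling down each column and padding the end of the last one with NA.
wrap_columns <- function(x, ncol) {
  nrow <- ceiling(length(x) / ncol)
  x <- unname(x)
  length(x) <- nrow * ncol  # extending the length pads with NA
  matrix(x, nrow = nrow, ncol = ncol)
}

m <- wrap_columns(1:7, 3)
m
```

For the residuals this would be `htmlTable(round(wrap_columns(fit$residuals, 3), 2), rnames = FALSE)`. The car names are dropped by unname(); if you want them, wrap names(fit$residuals) the same way. Converting the matrix with as.data.frame() and passing it to rmarkdown::paged_table() may also give you back the pagination.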
I'd like to make a contingency table of sex by disease. Since I use R Markdown for PDF reports, I use kableExtra to customize the tables. kableExtra doesn't format objects that are not data frames well, so it produces an ugly table from tableby output.
With this data.frame here is what I got.
library(kableExtra)
library(arsenal)
set.seed(0)
Disease<-sample(c(rep("Name of the first category of the disease",20),
rep("Name of the Second category of the disease",32),
rep("Name of the third category of the disease",48),
rep("The category of those who do not belong to the first three categories",13)))
ID<-c(1:length(Disease))
Gender<-rbinom(length(Disease),1,0.55)
Gender<-factor(Gender,levels = c(0,1),labels = c("F","M"))
data<-data.frame(ID,Gender,Disease)
When I render this analysis with R Markdown (PDF output), I get this kind of table:
There are two problems. First, kableExtra doesn't deal with the characters.
Second, I can't customize column widths when I use tableby with kableExtra. I would like to widen the column containing the variable names, because in my real data the names of the variable values are very long. If I use knitr's kable instead, the characters are removed, but the table is not scaled down and part of it is cut off. I think knitr has many limitations.
How can I deal with this problem? Or is there another function that can produce good-looking contingency tables with p-values in R Markdown (PDF format)?
To avoid those characters being added to your output when using knitr, use results='asis' in the chunk options, like so:
{r results='asis'}
You can control the width of the column containing your variable names with the width argument of print(). You technically do not need to wrap the summary call in print(); doing so changes nothing except that it lets you adjust the width setting.
So, for your example:
print(summary(tableby(Gender ~ Disease, data = data)), width = 20)
should make it more readable when it renders to PDF. You can change the width, and the text will wrap at the limit you set.
Using the code from your example and the call above, the table looks like this when knitted to PDF:
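Since the question notes that kableExtra behaves well with plain data frames, another route is to build the contingency table yourself in base R and hand kableExtra an ordinary data frame. The helper contingency_df() below is a hypothetical sketch, not how arsenal does it: it cross-tabulates two variables with table(), converts the result to a data frame, and computes a Pearson chi-squared p-value with chisq.test(). Note that chisq.test() applies Yates' continuity correction to 2x2 tables by default and warns when expected counts are small.

```r
# Hypothetical helper: counts of `row` by `col` as a plain data frame,
# plus a chi-squared p-value to report alongside the table.
contingency_df <- function(row, col) {
  tab <- table(row, col)
  counts <- as.data.frame.matrix(tab)  # one column per level of `col`
  counts$Total <- rowSums(tab)
  list(table = counts, p.value = chisq.test(tab)$p.value)
}
```

With the question's data this could be `res <- contingency_df(data$Disease, data$Gender)` followed by `kbl(res$table, booktabs = TRUE, caption = sprintf("Chi-squared test: p = %.3f", res$p.value)) %>% column_spec(1, width = "6cm")`. Because the disease names become row names, kbl() prints them as the first column, which is the one column_spec(1, ...) widens.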
I have made a table in R Markdown comparing information about models I have created. Code below:
mlist <- list(fitdm3hyp,fitdm3.1,fitdm3.2,fitdm3full,fitdm3.5,fitdm3.5b,fitdm3bi,fitdm3bio)
tablea <- compareLavaan(mlist,fitmeas = c("chisq","df","rmsea.robust","cfi.robust","srmr"),digits=4,type="html",chidif=FALSE)
However, the 1st and 2nd columns (chisq and df) end up so close that you can't tell where each value ends and the other begins. It ends up looking like: 343.44160.00 rather than 343.44 | 160.00.
How can I format this to increase the space between columns, please?
You can use the View() function to get a better view of the dataset; make sure you are using a wider window, too. If you use the tidyverse and convert the output to a data frame, the columns should be padded enough to read comfortably.
First I have an input and I want to get an output like this
(I want to group the occurrences in the dataset between columns):
Second:
Display this table in a good looking way (something that looks like Word or Excel)
I can't use Word or Excel as I'm making some calculations in R with this dataset (it contains columns with numbers which aren't displayed here)
calculating the output table
I don't really see how the output table is calculated. Shouldn't it be output_table["A", "C"] = "e"?
Displaying your data
There are a lot of ways to do that. You might consider using RMarkdown to create a report-style output.
The DT package is also a very handy tool for displaying tables. It works well with R Markdown and can be embedded in HTML documents. If you are using RStudio, you can use the following code to display your data:
library(DT)
DT::datatable(iris)