Split xtable output into sub tables - r

Have a question on using xtable with Sweave when there are multiple columns. A table I am working on has about 25 columns and 5 rows. The exact number of columns is not known as that is dynamic.
When I run say,
table1 <- table (df$someField)
I get a table that essentially exceeds the page length.
ColA ColB ColC
---------------------------
RowA 1 2 3 ......
RowB 3 4 6 ......
If I do an xtable on this and run it through Sweave,
xtable(table1, caption="some table")
it overflows.
What I am looking for is something like,
ColA ColB ColC
---------------------------
RowA 1 2 3
RowB 3 4 6
ColD ColE ColF
---------------------------
RowA 11 9 34
RowB 36 8 65
with the \hline etc markups. Basically, split the xtable into parts by say 5 columns per "sub-table".
I am also running this in a batch job, so I won't be able to make changes to individual files. Whatever the solution, it has to be generated by running Sweave on the Rnw file.
Thanks in advance,
Regards,
Raj.

Here's an example of this from ?latex.table.by in the taRifx package. You can brew something similar using longtable in LaTeX and use the latex.table.by code as a prototype.
library(taRifx) # provides latex.table.by
library(xtable)
my.test.df <- data.frame(grp=rep(c("A","B"),10),data=runif(20))
latex.table.by(my.test.df)
# print(latex.table.by(my.test.df), include.rownames = FALSE, include.colnames = TRUE, sanitize.text.function = force)
# then add \usepackage{multirow} to the preamble of your LaTeX document
# for longtable support, add ,tabular.environment='longtable' to the print command (plus add in ,floating=FALSE), then \usepackage{longtable} to the LaTeX preamble
Regardless, the longtable package in LaTeX is the key.
Edit: It appears you have too many columns, not too many rows. In that case, first try landscaping just that page.
In the header:
\usepackage{lscape}
Around your table:
\begin{landscape}
...
\end{landscape}
Or just use sidewaystable.
If your table is too wide to fit in one page, try the supertabular package, which from the description sounds like it might handle breaking over multiple pages based on width (but I've never used it so can't be sure).
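The column-wise split the question asks for can also be done in R before the table ever reaches LaTeX. The following is only a minimal sketch, not the approach from the answer above: the matrix `m`, the helper `split_columns`, and the chunk width of 5 are all made up for illustration. It assumes the table is available as a plain matrix (a two-way table can be converted with `unclass()`), and that the code runs in a Sweave chunk with `results=tex`.

```r
library(xtable)

# Hypothetical example data: 2 rows, 12 columns
m <- matrix(1:24, nrow = 2,
            dimnames = list(c("RowA", "RowB"), paste0("Col", LETTERS[1:12])))

# Hypothetical helper: group column indices into chunks of at most `width`
split_columns <- function(n_col, width = 5) {
  split(seq_len(n_col), ceiling(seq_len(n_col) / width))
}

chunks <- split_columns(ncol(m), width = 5)

# Emit one LaTeX table per chunk of columns
for (idx in chunks) {
  print(xtable(m[, idx, drop = FALSE], caption = "some table"))
}
```

Because the number of chunks is computed from `ncol(m)`, this should keep working when the number of columns changes between batch runs.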

Related

r exporting summarise results to html or word

After looking into how to summarize a data frame, I managed to do it.
I can see the results in my console; they are shown below, after the first two lines of code.
byTue <- group_by(luckyloss.3,L_byUXR)
( sumMon <- summarize(byTue,count=n()) )
Below is what I see in the console. It looks right, because it shows what I was looking for.
The results come from a column of 234 rows in which many values are repeated.
So I summarised the 234 rows: ANA appears 8 times, ARI 14 times, and so on.
# A tibble: 30 × 2
L_byUXR count
<chr> <int>
1 ANA 8
2 ARI 14
3 ATL 16
4 BAL 4
5 BOS 6
6 CHA 12
7 CHN 8
8 CIN 10
9 CLE 4
10 COL 8
# ... with 20 more rows
What I want is to have this output of 30 rows by two columns in a form I can take to a Word document, or even HTML.
I tried to do a write(byTUE.csv) but what I received was the list of 234 rows of the original data frame. It's as if the summary disappeared. I have checked other approaches, such as markdown or creating new files, and tried to see if the knitr package could help, but nothing worked.
library(stringi) # ONLY NECESSARY FOR DATA SIMULATION
library(officer) # <<= install this
library(tidyverse)
Simulate some data:
set.seed(2017-11-18)
tibble( # data_frame() is deprecated in current tibble releases
L_byUXR = stri_rand_strings(30, 3, pattern="[A-Z]"),
count = sample(20, 30, replace=TRUE)
) -> sumMon
Start a new Word doc and add the table, saving to a new doc:
read_docx() %>% # a new, empty document
body_add_table(sumMon, style = "table_template") %>%
print(target="new.docx")
I kept looking for an answer and found the "stargazer" package for R, which let me get the data frame as text that can be edited further.
When you write the call, name the file you want as output in "out = " and stargazer will write it to your session's working directory.
The instruction I used was:
stargazer(count, type = "text", summary = FALSE, title="Any Title", digits=1, out="table1.txt")
Even though I found the answer, I could not have done it without the help of hrbrmstr, who showed me there was a package to do it; I just needed to work more on it.
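If a plain CSV is enough, one likely explanation for the failed attempt in the question is that the grouped data frame (byTue) was written out rather than the summary (sumMon). The sketch below is an assumption about what was intended, not something confirmed in the thread; the data frame `luckyloss` and the output path are made up for illustration.

```r
library(dplyr)

# Hypothetical stand-in for the original 234-row data frame
luckyloss <- data.frame(
  L_byUXR = c("ANA", "ARI", "ANA", "ATL", "ARI", "ANA")
)

# Group and count, keeping the summarised result in its own object
sumMon <- luckyloss %>%
  group_by(L_byUXR) %>%
  summarize(count = n())

# Write the summary (not the grouped input) to a CSV file
out_file <- file.path(tempdir(), "sumMon.csv")
write.csv(sumMon, out_file, row.names = FALSE)
```

The resulting file has one row per team and can be opened in Word or a spreadsheet, or read back with `read.csv()`.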

R Markdown Table Sizing Inconsistent [duplicate]

When I output my R Markdown (knitr / RStudio) to html the following table stretches the full width of the browser you view it in. It's only two columns and looks rather odd stretched out on a widescreen display.
Col1 | Col2
--- | ---
1 | 1
1349 | 143910
This same table shown below, same syntax, correctly limits the width of Column 1 to the width of its own contents. The only difference is the cell contents of position [2, 2] are extremely long.
Col1 | Col2
--- | ---
1 | 1
1349 | 143910143910143910143910143910143910143910143910143910
How do I force knitr or pandoc or R or whatever to limit column width to only slightly larger than the column's contents? Why did the extreme number of characters in cell [2, 2] force my output to behave as I wish? I didn't involve any CSS in the second table and prefer not to mess around with CSS.
I suggest you use kable from the knitr package and kable_styling from the kableExtra package.
Supposing your data frame is df:
library(knitr)
library(kableExtra)
kable(df, "html") %>%
kable_styling(full_width = FALSE)
You can find more info here.
https://www.rdocumentation.org/packages/kableExtra/versions/0.6.1/topics/kable_styling
EDIT: eipi10's clarifications on packages. (Thanks!)

pdf style using R objects and Rmarkdown (template style)

Let's assume I want to generate a template using R markdown. Let's also assume I have some R objects I want to paste in the Rmarkdown document.
For example, the R objects I have are:
dose = 10
units = "mg"
If I want to write a sentence like:
The dose administered was 10 mg every 3 days.
I can use:
```{r}
paste("The dose administered was",dose,units,"every 3 days.")
```
However, the output will be:
## [1] The dose administered was 10 mg every 3 days.
I know I can remove the "##" using comment=NA.
Is there any way to also remove the "[1]"?
Is there any other and more efficient way to insert R objects with text using R markdown?
Thanks in advance,
Don't use an R chunk; use inline code:
The dose administered was `r paste(dose, units)` every 3 days.
The function cat() does the job :)
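As a small illustration of the cat() suggestion: unlike the auto-printed result of paste(), cat() writes the text without the "[1]" index. This sketch reuses the dose and units objects from the question; in an R Markdown chunk it would typically be combined with the chunk option results='asis' so the text appears as normal prose.

```r
dose <- 10
units <- "mg"

# cat() writes the pieces separated by spaces, with no "[1]" prefix
cat("The dose administered was", dose, units, "every 3 days.\n")
# prints: The dose administered was 10 mg every 3 days.
```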

R code chunk printing extra line in Markdown

I'm creating a data analysis report using Markdown, knitr.
When I run a code chunk containing a table,
addmargins(table(x$gender, exclude=NULL))
This is what I get:
##
## Female Male <NA> Sum
## 49 53 0 102
This is what I want:
## Female Male <NA> Sum
## 49 53 0 102
Markdown naturally outputs a lot of white space, and I'm trying to provide as condensed an output as possible since these reports need to be printed. These extra lines add up to be a lot of extra pages.
As far as I've seen, this seems to happen only with tables, and not with other code. It seems that table() is causing the problem by inserting the extra line above the table. Any way to disable this quirk?
I believe table() is printing a blank line for your dimension names. If you specify dnn=NULL, it should go away.
addmargins(table(x$gender, exclude=NULL, dnn=NULL))

Mixing other languages with R

I use R for most of my statistical analysis. However, cleaning/processing data, especially when dealing with sizes of 1 GB+, is quite cumbersome. So I use common UNIX tools for that. But my question is: is it possible to, say, run them interactively in the middle of an R session? An example: let's say file1 is the output dataset from an R process, with 100 rows. From this, for my next R process, I need a specific subset of columns 1 and 2, file2, which can easily be extracted through cut and awk. So the workflow is something like:
Some R process => file1
cut --fields=1,2 <file1 | awk something something >file2
Next R process using file2
Apologies in advance if this is a foolish question.
Try this (adding other read.table arguments if needed):
# 1
DF <- read.table(pipe("cut -f 1,2 < data.txt | awk something_else"))
or in pure R:
# 2
DF <- read.table("data.txt")[1:2]
or to not even read the unwanted fields assuming there are 4 fields:
# 3
DF <- read.table("data.txt", colClasses = c(NA, NA, "NULL", "NULL"))
The last line could be modified for the case where we know we want the first two fields but don't know how many other fields there are:
# 3a
n <- count.fields("data.txt")[1]
read.table("data.txt", header = TRUE, colClasses = c(NA, NA, rep("NULL", n-2)))
The sqldf package can be used. In this example we assume a csv file, data.csv, and that the desired fields are called a and b. If it's not a csv file, then use appropriate arguments to read.csv.sql to specify another separator, etc.:
# 4
library(sqldf)
DF <- read.csv.sql("data.csv", sql = "select a, b from file")
I think you may be looking for littler which integrates R into the Unix command-line pipelines.
Here is a simple example computing the file size distribution of /bin:
edd@max:~/svn/littler/examples$ ls -l /bin/ | awk '{print $5}' | ./fsizes.r
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
4 5736 23580 61180 55820 1965000 1
The decimal point is 5 digit(s) to the right of the |
0 | 00000000000000000000000000000000111111111111111111111111111122222222+36
1 | 01111112233459
2 | 3
3 | 15
4 |
5 |
6 |
7 |
8 |
9 | 5
10 |
11 |
12 |
13 |
14 |
15 |
16 |
17 |
18 |
19 | 6
edd@max:~/svn/littler/examples$
and all it takes for that is three lines:
edd@max:~/svn/littler/examples$ cat fsizes.r
#!/usr/bin/r -i
fsizes <- as.integer(readLines())
print(summary(fsizes))
stem(fsizes)
See ?system for how to run shell commands from within R.
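A minimal sketch of what that can look like, assuming a Unix-like shell with printf and cut available; the input text here is made up for illustration. With intern = TRUE, system() returns the command's standard output as a character vector, one element per line, which can then be parsed in R.

```r
# Run a shell pipeline and capture its output (assumes a Unix-like shell)
out <- system("printf 'a b c\\nd e f\\n' | cut -d' ' -f1,2", intern = TRUE)

# Parse the captured lines into a data frame
DF <- read.table(text = out)
```

system2() is an alternative that takes the command and its arguments separately, which can make quoting easier.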
Staying in the tradition of literate programming, using e.g. org-mode and org-babel will do the job perfectly:
You can combine several different programming languages in one script and execute them separately or in sequence, export the results or the code, ...
It is a little bit like Sweave, only that the code blocks can be Python, bash, R, SQL, and numerous others. Check it out: org-mode and babel, and an example using different programming languages.
Apart from that, I think org-mode and babel is the perfect way of writing even pure R scripts.
Preparing data before working with it in R is quite common, and I have a lot of scripts for Unix and Perl pre-processing, and have, at various times, maintained scripts/programs for MySQL, MongoDB, Hadoop, C, etc. for pre-processing.
However, you may get better mileage for portability if you do some kinds of pre-processing in R. You might try asking new questions focused on some of these particulars. For instance, to load large amounts of data into memory mapped files, I seem to evangelize bigmemory. Another example is found in the answers (especially JD Long's) to this question.

Resources