R code chunk printing extra line in Markdown - r

I'm creating a data analysis report using Markdown, knitr.
When I run a code chunk containing a table,
addmargins(table(x$gender, exclude=NULL))
This is what I get:
##
## Female Male <NA> Sum
## 49 53 0 102
This is what I want:
## Female Male <NA> Sum
## 49 53 0 102
Markdown naturally outputs a lot of white space, and I'm trying to provide as condensed an output as possible since these reports need to be printed. These extra lines add up to be a lot of extra pages.
As far as I've seen, this seems to happen only with tables, and not with other code. It seems that table() is causing the problem by inserting the extra line above the table. Any way to disable this quirk?

I believe table() is printing a blank line for your dimension names. If you specify dnn=NULL, it should go away.
addmargins(table(x$gender, exclude=NULL, dnn=NULL))

Related

R markdown export diacritics

I am trying to export a summary of categorical variables through the R markdown. The output of the summary is in the Czech language with diacritics, but the R isn't able to encode it.
Example
summary(data$Et6_d1q_ii)
3 a více hodin 1 - 2 hodiny méně než 1 hodinu žádný NA's
113 240 196 111 6932
Is there any way how to set the endocing globally so the output is readable? I wasn't able to find it anywhere.
Thank you!

Memory management in R ComplexUpset Package

I'm trying to plot an stacked barplot inside an upset-plot using the ComplexUpset package. The plot I'd like to get looks something like this (where mpaa would be component in my example):
I have a dataframe of size 57244 by 21, where one column is ID and the other is type of recording, and other 19 columns are components from 1 to 19:
ID component1 component2 ... component19 type
1 1 0 1 a
2 0 0 1 b
3 1 1 0 b
Ones and zeros indicate affiliation with a certain component. As shown in the example in the docs, I first convert these ones and zeros to logical, and then try to plot the basic upset plot. Here's the code:
df <- df %>% mutate(across(where(is.numeric), as.logical))
components <- colnames(df)[2:20]
upset(df, components, name='protein', width_ratio = 0.1)
But unfortunately after thinking for a while when processing the last line it spits out an error message like this:
Error: cannot allocate vector of size 176.2 Mb
Though I know I'm using the 32Gb RAM architecture, I'm sure I couldn't have flooded the memory so much that 167 Mb can't be allocated, so my guess is I am managing memory in R somehow wrong. Could you please explein what's faulty in my code, if possible.
I also know that UpsetR package plots the same data, but as far as i know it provides no way for the stacked barplotting.
Somehow, it works if you:
Tweak the min_size parameter so that the plot is not overloaded and makes a better impression
Making the first argument of ComplexUpset a sample with some data also helps, even if your sample is the whole dataset.

Why is R adding empty factors to my data?

I have a simple data set in R -- 2 conditions called "COND", and within those conditions adults chose between one of 2 pictures, we call house or car. This variable is called "SAW"
I have 69 people, and 69 rows of data
FOR SOME Reason -- R is adding an empty factor to both, How do I get rid of it?
When I type table to see how many are in each-- this is the output
table(MazeData$SAW)
car house
2 9 59
table(MazeData$COND)
Apples No_Apples
2 35 33
Where the heck are these 2 mystery rows coming from? it wont let me make my simple box plots and bar plots or run t.test because of this error - can someone help? thanks!!

R readr package - written and read in file doesn't match source

I apologize in advance for the somewhat lack of reproducibility here. I am doing an analysis on a very large (for me) dataset. It is from the CMS Open Payments database.
There are four files I downloaded from that website, read into R using readr, then manipulated a bit to make them smaller (column removal), and then stuck them all together using rbind. I would like to write my pared down file out to an external hard drive so I don't have to read in all the data each time I want to work on it and doing the paring then. (Obviously, its all scripted but, it takes about 45 minutes to do this so I'd like to avoid it if possible.)
So I wrote out the data and read it in, but now I am getting different results. Below is about as close as I can get to a good example. The data is named sa_all. There is a column in the table for the source. It can only take on two values: gen or res. It is a column that is actually added as part of the analysis, not one that comes in the data.
table(sa_all$src)
gen res
14837291 822559
So I save the sa_all dataframe into a CSV file.
write.csv(sa_all, 'D:\\Open_Payments\\data\\written_files\\sa_all.csv',
row.names = FALSE)
Then I open it:
sa_all2 <- read_csv('D:\\Open_Payments\\data\\written_files\\sa_all.csv')
table(sa_all2$src)
g gen res
1 14837289 822559
I did receive the following parsing warnings.
Warning: 4 parsing failures.
row col expected actual
5454739 pmt_nature embedded null
7849361 src delimiter or quote 2
7849361 src embedded null
7849361 NA 28 columns 54 columns
Since I manually add the src column and it can only take on two values, I don't see how this could cause any parsing errors.
Has anyone had any similar problems using readr? Thank you.
Just to follow up on the comment:
write_csv(sa_all, 'D:\\Open_Payments\\data\\written_files\\sa_all.csv')
sa_all2a <- read_csv('D:\\Open_Payments\\data\\written_files\\sa_all.csv')
Warning: 83 parsing failures.
row col expected actual
1535657 drug2 embedded null
1535657 NA 28 columns 25 columns
1535748 drug1 embedded null
1535748 year an integer No
1535748 NA 28 columns 27 columns
Even more parsing errors and it looks like some columns are getting shuffled entirely:
table(sa_all2a$src)
100000000278 Allergan Inc. gen GlaxoSmithKline, LLC.
1 1 14837267 1
No res
1 822559
There are columns for manufacturer names and it looks like those are leaking into the src column when I use the write_csv function.

Split xtable ouput into sub tables

Have a question on using xtable with Sweave when there are multiple columns. A table I am working on has about 25 columns and 5 rows. The exact number of columns is not known as that is dynamic.
When I run say,
table1 <- table (df$someField)
I get a table that essentially exceeds the page length.
ColA ColB ColC
---------------------------
RowA 1 2 3 ......
RowB 3 4 6 ......
If a do a xtable on this, and run it through Sweave,
xtable(table1, caption="some table")
it overflows.
What I am looking for is something like,
ColA ColB ColC
---------------------------
RowA 1 2 3
RowB 3 4 6
ColD ColE ColF
---------------------------
RowA 11 9 34
RowB 36 8 65
with the \hline etc markups. Basically, split the xtable into parts by say 5 columns per "sub-table".
I am also running this in a batch job, so I won't be able to make changes to individual files, whatever the solution it has to be able to be generated by running Sweave on the Rnw file.
Thanks in advance,
Regards,
Raj.
Here's an example of this from ?latex.table.by in the taRifx package. You can brew something similar using longtable in LaTeX and use the latex.table.by code as a prototype.
my.test.df <- data.frame(grp=rep(c("A","B"),10),data=runif(20))
library(xtable)
latex.table.by(my.test.df)
# print(latex.table.by(test.df), include.rownames = FALSE, include.colnames = TRUE, sanitize.text.function = force)
# then add \usepackage{multirow} to the preamble of your LaTeX document
# for longtable support, add ,tabular.environment='longtable' to the print command (plus add in ,floating=FALSE), then \usepackage{longtable} to the LaTeX preamble
Regardless, the longtable package in LaTeX is the key.
Edit: It appears you have too many columns not too many rows. In that case, first try landscaping just that page.
In the header:
\usepackage{lscape}
Around your table:
\begin{landscape}
...
\end{landscape}
Or just use sidewaystable.
If your table is too wide to fit in one page, try the supertabular package, which from the description sounds like it might handle breaking over multiple pages based on width (but I've never used it so can't be sure).

Resources