Insert blank lines in kable - r

I'm tabulating groups of rows with kable. Each group has between 3 and 5 rows. I want to leave blank lines in the table between groups for readability, but can't get it to work.
I put in a row of all NA, and then set options(knitr.kable.NA=""). This works OK when printed in the console, as here:
|C.01.C.00522 | 3| 1203| 0.043| -0.096| -16.441|
|C.01.C.00522 | 4| 8364| 0.298| 0.159| 31.765|
|C.01.C.00522 | 5| 3494| 0.124| -0.014| -2.588|
| | | | | | |
|C.02.A.00577 | 1| 2496| 0.089| -0.014| -2.410|
|C.02.A.00577 | 2| 1975| 0.070| -0.032| -5.609|
|C.02.A.00577 | 3| 3400| 0.121| 0.018| 3.297|
But in the rendered PDF document there is one table for the first group, and then only unformatted lines after that.
C.01.C.00522 3 1203 0.043 -0.096 -16.441 C.01.C.00522 4 8364 0.298 0.159 31.765 C.01.C.00522 5 3494 0.124
-0.014 -2.588
C.02.A.00577 1 2496 0.089 -0.014 -2.410 C.02.A.00577 2 1975 0.070 -0.032 -5.609
I also tried options(knitr.kable.NA='.') and this produces a properly formatted table, but all the dots are a little annoying.
Any ideas?

Thank you, Imran, for mentioning kableExtra. In kableExtra 0.3, which I released last week, a new function called collapse_rows may help in this case.
library(knitr)
library(kableExtra)
dt <- data.frame(id = c(rep("C.01.C.00522", 3), rep("C.02.A.00577", 3)),
                 var1 = c(3, 4, 5, 1, 2, 3),
                 var2 = c(1203, 8364, 3494, 2496, 1975, 3400))
kable(dt, "latex", booktabs = TRUE) %>%
  collapse_rows(columns = 1)
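If the goal is only extra vertical space between groups in the PDF, another option that avoids NA rows is kable's `linesep` argument for LaTeX output. This is a minimal sketch, assuming the booktabs LaTeX package is available and the rows are already sorted by group; the names `ids`, `last_in_group` and `sep` are illustrative.

```r
library(knitr)

dt <- data.frame(
  id   = c(rep("C.01.C.00522", 3), rep("C.02.A.00577", 3)),
  var1 = c(3, 4, 5, 1, 2, 3),
  var2 = c(1203, 8364, 3494, 2496, 1975, 3400)
)

# Compare each id with the next one: TRUE marks the last row of a group.
ids <- as.character(dt$id)
last_in_group <- ids[-length(ids)] != ids[-1]

# One separator per row boundary: extra space after a group, nothing otherwise.
sep <- ifelse(last_in_group, "\\addlinespace", "")

kable(dt, format = "latex", booktabs = TRUE, linesep = sep)
```

Because `sep` is computed from the data, groups of different sizes (3 to 5 rows) should each get their own gap.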

Related

Is there an elegant pyspark solution for the following ranking problem?

Is there an elegant existing function for the following problem?
I've been tasked with creating a function that computes the difference in days and ranks the values. The smallest positive difference gets rank 0 and is the 'starting point'. From there, negative differences receive negative ranks and larger positive differences receive positive ranks.
| Datediff() | Rank |
|-----------:|-----:|
|        -50 |   -3 |
|        -32 |   -2 |
|         -1 |   -1 |
|          5 |    0 |
|         14 |    1 |
|         32 |    2 |
|        128 |    3 |
|        254 |    4 |
My solution so far would be to separate the negative and positive numbers and use Window.partitionBy() to assign the corresponding rank to each. It would work, but I'm curious whether there is a more elegant solution. :)
You can use Window to generate serial numbers to use as rank:
from pyspark.sql.functions import col, row_number
from pyspark.sql.window import Window
df = spark.createDataFrame([(10,),(-10,),(5,),(-5,),(15,),(-15,)], ["dated_diff"])
window = Window.orderBy(df["dated_diff"])
df = df.select("dated_diff", row_number().over(window).alias("row_number"))
df.show()
+----------+----------+
|dated_diff|row_number|
+----------+----------+
| -15| 1|
| -10| 2|
| -5| 3|
| 5| 4|
| 10| 5|
| 15| 6|
+----------+----------+
Then find rank of first positive number:
first_positive_rank = df.filter("dated_diff>=0").first()["row_number"]
print(first_positive_rank)
>> 4
OR
first_positive_rank = df.filter("dated_diff<0").count() + 1
print(first_positive_rank)
>> 4
And finally subtract that rank from rank of all:
df = df.withColumn("row_number", col("row_number") - first_positive_rank)
df.show()
+----------+----------+
|dated_diff|row_number|
+----------+----------+
| -15| -3|
| -10| -2|
| -5| -1|
| 5| 0|
| 10| 1|
| 15| 2|
+----------+----------+

How to combine countpct and binomCI into the same summary statistic to be used in tableby function?

I'm using the tableby function from the arsenal package to create summary tables. For most of the statistics I need to generate, this package gives me exactly the format I'm asked for, with one exception. I need to get something like this in a single cell:
n (%) [95%CI of the percentage]
For now, I'm using the countpct function which gives me the "n (%)" and binomCI which gives me the proportion with 95%CI but it doubles the number of rows in my final table so it's not ideal...
How can I get everything on the same line?
I tried to see if I could create another function from the original ones but I don't really understand their syntax...
Thanks for your help.
EDIT : Here is a reproducible example.
Code for the original functions can be found here.
So this is what I have now :
data<-NULL
data$Visit2<-c(rep("Responder",121),rep("Not Responder",29),rep("Responder",4),rep("Not Responder",47))
data$Group<-c(rep("Tx",150),rep("No Tx",51))
data<-as.data.frame(data)
library(arsenal)
my_controls <- tableby.control(test = F,total = F, cat.stats = c("countpct" ,"binomCI"), conf.level = 0.95)
summary(tableby(Group ~ Visit2,
data = data,
control = my_controls),
digits=2, digits.p=3, digits.pct=1)
# Results :
| | No Tx (N=51) | Tx (N=150) |
|:-------------------------------|:-----------------:|:-----------------:|
|**Visit2** | | |
| Not Responder | 47 (92.2%) | 29 (19.3%) |
| Responder | 4 (7.8%) | 121 (80.7%) |
| Not Responder | 0.92 (0.81, 0.98) | 0.19 (0.13, 0.27) |
| Responder | 0.08 (0.02, 0.19) | 0.81 (0.73, 0.87) |
And this is what I want :
| | No Tx (N=51) | Tx (N=150) |
|:----------------|:-------------------------:|:------------------------:|
|**Visit2** | | |
| Not Responder | 47 (92.2%) [81.1, 97.8] | 29 (19.3%) [13.3, 26.6] |
| Responder | 4 (7.8%) [2.2, 18.9] | 121 (80.7%) [73.4, 86.7] |
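As a starting point, the desired cell text can be built in base R. This is only a sketch: `count_pct_ci` is a hypothetical helper, it assumes the exact (Clopper-Pearson) interval from `binom.test()` is the one wanted, it ignores missing values, and it does not show how to register the result as a custom statistic in tableby.

```r
# Hypothetical helper: formats "n (%) [lower, upper]" for one category of x.
count_pct_ci <- function(x, level, conf.level = 0.95) {
  n <- sum(x == level)   # count of the category of interest
  N <- length(x)         # group size
  # binom.test() returns the exact (Clopper-Pearson) interval for a proportion
  ci <- 100 * binom.test(n, N, conf.level = conf.level)$conf.int
  sprintf("%d (%.1f%%) [%.1f, %.1f]", n, 100 * n / N, ci[1], ci[2])
}

# The "No Tx" group from the example: 4 responders, 47 non-responders
visit_no_tx <- c(rep("Responder", 4), rep("Not Responder", 47))
res <- count_pct_ci(visit_no_tx, "Not Responder")
print(res)
```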

Control digits in specific cells

I have a table that looks like this:
+-----------------------------------+-------+--------+------+
|                                   | Male  | Female | n    |
+-----------------------------------+-------+--------+------+
| way more than my fair share       | 2,4   | 21,6   | 135  |
| a little more than my fair share  | 5,4   | 38,1   | 244  |
| about my fair share               | 54,0  | 35,3   | 491  |
| a little less than my fair share  | 25,1  | 3,0    | 153  |
| way less than my fair share       | 8,7   | 0,7    | 51   |
| Can't say                         | 4,4   | 1,2    | 31   |
| n                                 | 541,0 | 564,0  | 1105 |
+-----------------------------------+-------+--------+------+
Everything is fine, but I would like the last row to show no decimal digits at all, since it holds the margins (actual case counts). Is there a way in R to control the number of digits in specific cells?
Thanks!
You could use ifelse to output the numbers in different formats in different rows, as in the example below. However, it will take some additional finagling to get the values in the last row to line up by place value with the previous rows:
library(knitr)
library(tidyverse)
# Fake data
set.seed(10)
dat = data.frame(category=c(LETTERS[1:6],"n"), replicate(3, rnorm(7, 100,20)))
dat %>%
mutate_if(is.numeric, funs(sprintf(ifelse(category=="n", "%1.0f", "%1.1f"), .))) %>%
kable(align="lrrr")
|category | X1| X2| X3|
|:--------|-----:|-----:|-----:|
|A | 100.4| 92.7| 114.8|
|B | 96.3| 67.5| 101.8|
|C | 72.6| 94.9| 80.9|
|D | 88.0| 122.0| 96.1|
|E | 105.9| 115.1| 118.5|
|F | 107.8| 95.2| 109.7|
|n | 76| 120| 88|
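The same per-row formatting can also be sketched in base R, without dplyr. The example below re-enters the Male and Female columns from the question as ordinary numerics (an assumption about how the data is stored); `fmt` and `num_cols` are illustrative names.

```r
# Data from the question, with decimals stored as ordinary numerics
dat <- data.frame(
  category = c("way more than my fair share", "a little more than my fair share",
               "about my fair share", "a little less than my fair share",
               "way less than my fair share", "Can't say", "n"),
  Male   = c(2.4, 5.4, 54.0, 25.1, 8.7, 4.4, 541),
  Female = c(21.6, 38.1, 35.3, 3.0, 0.7, 1.2, 564),
  stringsAsFactors = FALSE
)

# One format string per row: no decimals for the margin row, one elsewhere
fmt <- ifelse(dat$category == "n", "%1.0f", "%1.1f")

# sprintf() pairs fmt with each column element-wise, so every cell gets its row's format
num_cols <- vapply(dat, is.numeric, logical(1))
dat[num_cols] <- lapply(dat[num_cols], function(col) sprintf(fmt, col))

knitr::kable(dat, align = "lrr")
```

The numeric columns become character columns here, so this step belongs at the very end, just before printing.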
The huxtable package makes it easy to decimal-align the values (see the Vignette for more on table formatting):
library(huxtable)
tab = dat %>%
mutate_if(is.numeric, funs(sprintf(ifelse(category=="n", "%1.0f", "%1.1f"), .))) %>%
hux %>% add_colnames()
align(tab)[-1] = "."
tab
Here's what the PDF output looks like when knitted to PDF from an rmarkdown document:

How to replace 0's with blank in kables

I'm creating a rather large HTML table using the kable function, and this table has a lot of 0's in it. To show the relevant information more clearly, I'm trying to hide the 0's in the table by replacing them with blank space.
Right now, I'm trying something like this but it's not working:
my_table = knitr::kable(...)
cat(gsub(0," ",my_table), sep = '\n')
Something similar to the above works to remove NA's, but I can't seem to get it to work for 0's.
Thanks in advance!
EDIT: example data:
Product = c('A','B','A','A','C','B')
Month = c('Jan', 'Feb', 'Feb', 'Apr', 'Jan', 'Feb')
my_data = data.frame(Product, Month)
my_table = table(my_data)
kable(my_table) #This has the 0's which I don't want
Product | Month
A | Jan
B | Feb
A | Feb
A | Apr
C | Jan
B | Feb
Current output:
----Jan Feb Mar Apr
A 1 1 0 1
B 0 2 0 0
C 1 0 0 0
Desired output:
----Jan Feb Mar Apr
A 1 1 - 1
B - 2 - -
C 1 - - -
except "-" would be a blank space instead of a dash
EDIT2: never mind, I figured it out even though this is really hacky:
my_kable = knitr::kable(my_table)
gsub(0, ' ', my_kable)
lol
The reason your original gsub wasn't working was that it was flattening the table to a vector. One of many options to maintain the table structure would be to use the replace function:
knitr::kable(replace(my_table, my_table==0, ""))
#| |Apr |Feb |Jan |
#|:--|:---|:---|:---|
#|A |1 |1 |1 |
#|B | |2 | |
#|C | | |1 |
You can use base R gsub():
gsub(0, " ", kable(my_table))
To get:
| | Apr| Feb| Jan|
|:--|---:|---:|---:|
|A | 1| 1| 1|
|B | | 2| |
|C | | | 1|
You can try:
gsub(" 0", " ", kable(my_table))
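One caveat with the gsub approaches: a bare `0` pattern also matches zeros inside larger numbers such as 10 or 205. A possible refinement, sketched here on made-up table rows, is to match only standalone zeros with lookarounds (`perl = TRUE`); `blank_zeros` and `rows` are illustrative names, not part of knitr.

```r
# Made-up kable-style rows containing 10 and 205 alongside true zeros
rows <- c("|A  |  10|   0| 205|",
          "|B  |   0|   2|   0|")

# Match a 0 only when it is not next to another digit or a decimal point.
# Replacing one character with one space keeps the column widths intact.
blank_zeros <- function(lines) {
  gsub("(?<![0-9.])0(?![0-9.])", " ", lines, perl = TRUE)
}

cat(blank_zeros(rows), sep = "\n")
```

Replacing the zeros in the data before calling kable, as in the `replace()` answer, avoids pattern matching altogether and is probably the more robust route.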

kable function: "id" in the columns

When I print a table with the knitr::kable function, the word "id" appears in the column names. How can I change it?
Example:
> x <- structure(c(42.3076923076923, 53.8461538461538, 96.1538461538462,
2.56410256410256, 1.28205128205128, 3.84615384615385,
44.8717948717949, 55.1282051282051, 100),
.Dim = c(3L, 3L),
.Dimnames = structure(list(Condition1 = c("Yes", "No", "Sum"),
Condition2 = c("Yes", "No", "Sum")),
.Names = c("Condition1", "Condition2")), class = c("table", "matrix"))
> print(x)
Condition2
Condition1 Yes No Sum
Yes 42,31 2,56 44,87
No 53,85 1,28 55,13
Sum 96,15 3,85 100,00
> library(knitr)
> kable(x)
|id | Yes| No| Sum|
|:----|-----:|-----:|------:|
|Yes | 42,3| 2,56| 44,9|
|No | 53,8| 1,28| 55,1|
|Sum | 96,2| 3,85| 100,0|
Edit: I found the reason for this behavior in the knitr:::kable_mark function, but I don't understand how to make it more flexible.
An alternative to kable might be the general S3 method of pander:
> library(pander)
> pander(x, style = 'rmarkdown')
| | Yes | No | Sum |
|:---------:|:-----:|:-----:|:-----:|
| **Yes** | 42.31 | 2.564 | 44.87 |
| **No** | 53.85 | 1.282 | 55.13 |
| **Sum** | 96.15 | 3.846 | 100 |
If you need a comma as the decimal mark, set the relevant option first; it then applies for the rest of your R session:
> panderOptions('decimal.mark', ',')
> pander(x, style = 'rmarkdown')
| | Yes | No | Sum |
|:---------:|:-----:|:-----:|:-----:|
| **Yes** | 42,31 | 2,564 | 44,87 |
| **No** | 53,85 | 1,282 | 55,13 |
| **Sum** | 96,15 | 3,846 | 100 |
There are also some other possible tweaks: http://rapporter.github.io/pander/#pander-options
I think the easiest way is to rip out and replace kable_mark completely. Note: this is quite dirty – but it seems to work, and there is no current way to customise how kable_mark works (you could submit a patch to knitr though).
km <- edit(knitr:::kable_mark)
# Now edit the code and remove lines 7 and 8.
unlockBinding('kable_mark', environment(knitr:::kable_mark))
assign('kable_mark', km, envir=environment(knitr:::kable_mark))
Explanation: First we edit the function and store the amended definition in a temporary variable. We remove the two lines
if (grepl("^\\s*$", cn[1L]))
cn[1L] = "id"
… of course you can also hard-code the amended function rather than editing it, or change the function around completely.
Next we use unlockBinding to make knitr:::kable_mark overridable. If we don’t do this, the next assign command wouldn’t work.
Finally, we assign the patched function back to knitr:::kable_mark. Done.
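A less invasive workaround, which leaves knitr untouched, is to move the row names into a real, named column before calling kable, so the first header is never blank. This is a sketch with the values from the question rounded for brevity; the column name `Condition1` is taken from the dimnames, and `df` is an illustrative name.

```r
# The matrix from the question, rounded for brevity
x <- matrix(
  c(42.31, 53.85, 96.15, 2.56, 1.28, 3.85, 44.87, 55.13, 100),
  nrow = 3,
  dimnames = list(Condition1 = c("Yes", "No", "Sum"),
                  Condition2 = c("Yes", "No", "Sum"))
)

# Move the row names into a named column so kable has a header to print.
# For an object of class "table", call unclass(x) first; otherwise
# data.frame() would reshape it into long format.
df <- data.frame(Condition1 = rownames(x), x,
                 row.names = NULL, check.names = FALSE)

knitr::kable(df, row.names = FALSE)
```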
