knitr's kable is printing 2.29e-30 as "0"

CODE:
# some data
dat <-
  data.frame(
    log2fc = c(0.28, 10.82, 8.54, 5.64, 8.79, 6.46),
    pvalue = c(0.00e+00, 2.29e-30, 7.02e-30, 4.14e-29, 1.86e-28, 1.78e-27)
  )
# observe in markdown format
knitr::kable(dat, format = "markdown")
OUTPUT:
| log2fc| pvalue|
|------:|------:|
| 0.28| 0|
| 10.82| 0|
| 8.54| 0|
| 5.64| 0|
| 8.79| 0|
| 6.46| 0|
PROBLEM:
The last column, pvalue, is rendered as zeros, but I want to keep the same format that I see in my data frame. How do I do that? I've tried several solutions from various threads, but nothing seems to work. Can someone point me in the right direction?
Please do not suggest converting the pvalue column into a character vector. That is a quick and dirty solution that works, but I don't want to do that because:
I don't want to mess around with my data frame.
I am interested in why the scientific format of the last column is not retained when it is printed in markdown.
I have many tables, each with several columns in scientific format, and I am looking for a way to handle this automatically.

kable() calls the base R function round() on numeric columns, which rounds those small values to zero unless you set digits to a very large value. You can do that, e.g.
knitr::kable(dat, format = "markdown", digits = 32)
which gives
| log2fc| pvalue|
|------:|--------:|
| 0.28| 0.00e+00|
| 10.82| 2.29e-30|
| 8.54| 7.02e-30|
| 5.64| 4.14e-29|
| 8.79| 1.86e-28|
| 6.46| 1.78e-27|
If you do want the regular rounding in some columns, you can specify multiple values for digits, e.g.
knitr::kable(dat, format = "markdown", digits = c(1, 32))
| log2fc| pvalue|
|------:|--------:|
| 0.3| 0.00e+00|
| 10.8| 2.29e-30|
| 8.5| 7.02e-30|
| 5.6| 4.14e-29|
| 8.8| 1.86e-28|
| 6.5| 1.78e-27|
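To see why the values collapse to zero, the rounding step can be reproduced in base R without knitr. This is only a sketch of the behaviour described above: it assumes the default digits is 7 (the usual value of getOption("digits")), and the names rounded_default and rounded_wide are made up for illustration.

```r
pvalue <- c(0.00e+00, 2.29e-30, 7.02e-30, 4.14e-29, 1.86e-28, 1.78e-27)

# Rounding to 7 decimal places: every value is far smaller than 1e-7,
# so each one becomes exactly 0.
rounded_default <- round(pvalue, 7)

# Rounding to 32 decimal places keeps the significant digits,
# so the values can still be printed in scientific notation.
rounded_wide <- round(pvalue, 32)

print(rounded_default)
print(format(rounded_wide))
```

The same comparison can be run on any column to decide how large digits has to be before passing it to kable().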

Related

How to match two columns in one dataframe using values in another dataframe in R

I have two dataframes. One is a set of ≈4000 entries that looks similar to this:
| grade_col1 | grade_col2 |
| --- | --- |
| A-| A-|
| B | 86|
| C+| C+|
| B-| D |
| A | A |
| C-| 72|
| F | 96|
| B+| B+|
| B | B |
| A-| A-|
The other is a set of ≈700 entries that look similar to this:
| grade | scale |
| --- | --- |
| A+|100|
| A+| 99|
| A+| 98|
| A+| 97|
| A | 96|
| A | 95|
| A | 94|
| A | 93|
| A-| 92|
| A-| 91|
| A-| 90|
| B+| 89|
| B+| 88|
...and so on.
What I'm trying to do is create a new column that shows whether grade_col2 matches grade_col1 with a binary 0-1 output (0 = no match, 1 = match). Most of grade_col2 is recorded as a letter grade, but every once in a while an entry in grade_col2 was accidentally entered as a numeric grade instead. I want this match column to give me a "1" even when grade_col2 is a numeric grade rather than a letter grade. In other words, if grade_col1 is B and grade_col2 is 86, I want this to still be read as a match. Only a case such as grade_col1 being F and grade_col2 being 96 should count as no match (just as grade_col1 = B- and grade_col2 = D is not a match).
The second data frame gives me the information I need to translate between one and the other (entries between 97-100 are A+, between 93-96 are A, and so on). I just don't know how to run a script that uses this information to find matches through all ≈4000 entries. Theoretically, I could do this manually, but the real dataset is so lengthy that this isn't realistic.
I had been thinking of using nested if_else statements with dplyr, but once I got past the first "if" statement, I got stuck. I'd appreciate any help people can offer.
You can do this using a join.
Let your first data frame be grades_df and your second data frame be lookup_df; then you want something like the following:
library(dplyr)

output <- grades_df %>%
  # join on the lookup table, keeping every row of the grades table
  # (scale is converted to character in case it is stored as a number,
  # so that its type matches grade_col2)
  left_join(mutate(lookup_df, scale = as.character(scale)),
            by = c(grade_col2 = "scale")) %>%
  # combine grade_col2 from grades_df and grade from lookup_df
  mutate(grade_col2b = ifelse(is.na(grade), grade_col2, grade)) %>%
  # indicator column: 1 = match, 0 = no match
  mutate(indicator = ifelse(grade_col1 == grade_col2b, 1, 0))
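The same lookup idea can be tried on a few rows in base R with match() instead of a join. This is a sketch rather than the method from the answer. The two small data frames are a hypothetical subset of the tables in the question, and the B = 86 row is an assumption taken from the question's own example. The names idx and translated are made up for illustration.

```r
# Hypothetical subset of the grades table from the question
grades_df <- data.frame(
  grade_col1 = c("A-", "B", "C+", "B-", "F"),
  grade_col2 = c("A-", "86", "C+", "D", "96"),
  stringsAsFactors = FALSE
)

# Hypothetical subset of the lookup table (B = 86 is assumed)
lookup_df <- data.frame(
  grade = c("A", "A", "A", "A", "B"),
  scale = c(96, 95, 94, 93, 86),
  stringsAsFactors = FALSE
)

# Position of each grade_col2 entry in the lookup; NA for letter grades
idx <- match(grades_df$grade_col2, as.character(lookup_df$scale))

# Use the looked-up letter where a numeric grade was found, else keep the entry
translated <- ifelse(is.na(idx), grades_df$grade_col2, lookup_df$grade[idx])

# 1 = match, 0 = no match
grades_df$indicator <- as.integer(grades_df$grade_col1 == translated)
print(grades_df)
```

If the real columns carry stray whitespace, they would need trimws() before the comparison.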

Proxy for Excel's split cell in R

Apologies in advance if this turns out to be a duplicate of a question answered before. I haven't found anything, so here it is:
I have a 3x3 contingency table I made in RStudio (I am specifying it as a data frame below, but I can also produce it as a matrix with as.matrix, if that works better):
mat.s=data.frame("WT(H)"=11,"DEL(H)"=2)
mat.s[2,1]=13
mat.s[2,2]=500369
row.names(mat.s)=c("DEL(T)", "WT(T)")
mat.s=cbind(mat.s, Total=rowSums(mat.s))
mat.s=rbind(mat.s, Total=colSums(mat.s))
which looks like:
kable(mat.s)
| | WT.H.| DEL.H.| Total|
|:------|-----:|------:|------:|
|DEL(T) | 11| 2| 13|
|WT(T) | 13| 500369| 500382|
|Total | 24| 500371| 500395|
However, if I wanted to split a cell in this table (like you can do in Excel) into two, how would I do that? So I'd like to get something like the following when I render the document with kable:
| | WT.H.| DEL.H.| Total|
|:------|-----:|------:|------:|
|DEL(T) | S D | 2| 13|
| | 8 3 | | |
|WT(T) | 13| 500369| 500382|
|Total | 24| 500371| 500395|
So that when I want to calculate something from this table, I can call the split 8 or 3. Sorry if this is something very simple and easy to do! Still learning. Thanks!

How to separate out letters in a sentence using R

I have a character vector that is a string of letters and punctuation. I want to create a data frame where each column is made up of a letter/character from this string.
e.g.
Character string = I WENT TO THE FAIR
Dataframe = | I | | W | E | N | T | | T | O | | T | H | E | | F | A | I | R |
I thought I could do this using a loop with substr, but I can't work out how to get R to write into separate columns, rather than just writing over the previous letter. I'm new to writing loops etc so struggling a bit to get my head around the way in which to compose what I need.
Thanks for any help and advice that you can offer.
Best wishes,
Natalie
This should get that result
string <- "I WENT TO THE FAIR"
df <- as.data.frame(t(as.data.frame(strsplit(string,""))), row.names = "1")
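A slightly shorter variant of the same idea, offered as a sketch: split the string once, then transpose the resulting character vector into a one-row data frame. The name chars is made up for illustration, and the columns get R's default names (V1, V2, ...).

```r
string <- "I WENT TO THE FAIR"

# strsplit() with "" splits into single characters; [[1]] takes the
# character vector for the first (and only) input string
chars <- strsplit(string, "")[[1]]

# t() turns the vector into a 1-row matrix, so each character
# becomes its own column in the data frame
df <- as.data.frame(t(chars), stringsAsFactors = FALSE)

print(df)
```

Spaces and punctuation are kept as their own columns, which matches the layout requested in the question.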

kable function: "id" in the columns

When I try to print a table with the knitr::kable function, the word "id" appears in the column names. How can I change it?
Example:
> x <- structure(c(42.3076923076923, 53.8461538461538, 96.1538461538462,
2.56410256410256, 1.28205128205128, 3.84615384615385,
44.8717948717949, 55.1282051282051, 100),
.Dim = c(3L, 3L),
.Dimnames = structure(list(Condition1 = c("Yes", "No", "Sum"),
Condition2 = c("Yes", "No", "Sum")),
.Names = c("Condition1", "Condition2")), class = c("table", "matrix"))
> print(x)
Condition2
Condition1 Yes No Sum
Yes 42,31 2,56 44,87
No 53,85 1,28 55,13
Sum 96,15 3,85 100,00
> library(knitr)
> kable(x)
|id | Yes| No| Sum|
|:----|-----:|-----:|------:|
|Yes | 42,3| 2,56| 44,9|
|No | 53,8| 1,28| 55,1|
|Sum | 96,2| 3,85| 100,0|
Edit: I found the reason for this behavior in the knitr:::kable_mark function, but I don't understand how to make it more flexible.
An alternative to kable might be the general S3 method of pander:
> library(pander)
> pander(x, style = 'rmarkdown')
| | Yes | No | Sum |
|:---------:|:-----:|:-----:|:-----:|
| **Yes** | 42.31 | 2.564 | 44.87 |
| **No** | 53.85 | 1.282 | 55.13 |
| **Sum** | 96.15 | 3.846 | 100 |
If you need to set the decimal mark to a comma, set the relevant option first; it then applies for the rest of your R session:
> panderOptions('decimal.mark', ',')
> pander(x, style = 'rmarkdown')
| | Yes | No | Sum |
|:---------:|:-----:|:-----:|:-----:|
| **Yes** | 42,31 | 2,564 | 44,87 |
| **No** | 53,85 | 1,282 | 55,13 |
| **Sum** | 96,15 | 3,846 | 100 |
There are also some other possible tweaks: http://rapporter.github.io/pander/#pander-options
I think the easiest way is to rip out and replace kable_mark completely. Note: this is quite dirty – but it seems to work, and there is no current way to customise how kable_mark works (you could submit a patch to knitr though).
km <- edit(knitr:::kable_mark)
# Now edit the code and remove lines 7 and 8.
unlockBinding('kable_mark', environment(knitr:::kable_mark))
assign('kable_mark', km, envir=environment(knitr:::kable_mark))
Explanation: First we edit the function and store the amended definition in a temporary variable. We remove the two lines
if (grepl("^\\s*$", cn[1L]))
cn[1L] = "id"
… of course you can also hard-code the amended function rather than editing it, or change the function around completely.
Next we use unlockBinding to make knitr:::kable_mark overridable. If we don’t do this, the next assign command wouldn’t work.
Finally, we assign the patched function back to knitr:::kable_mark. Done.
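The unlock-then-assign step can be tried safely on a throwaway environment before touching a package namespace. The following is a minimal sketch: env, greet and patched are hypothetical stand-ins for the knitr namespace, kable_mark and the edited function.

```r
# A toy environment standing in for a package namespace (hypothetical)
env <- new.env()
assign("greet", function() "original", envir = env)
lockBinding("greet", env)

# The replacement definition
patched <- function() "patched"

# While the binding is locked, assign() raises an error; capture it
locked_result <- tryCatch(
  {
    assign("greet", patched, envir = env)
    "assigned"
  },
  error = function(e) "error: binding is locked"
)

# After unlocking, the same assign() call succeeds
unlockBinding("greet", env)
assign("greet", patched, envir = env)
result <- env$greet()

print(locked_result)
print(result)
```

As with the knitr patch, the change only lasts for the current R session.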

Creating a unique integer on the basis of a string

I have a larger dataset (data.table with approx 9m rows) with a column that I would like to use to aggregate values (min and max etc). The column is a combination of various other columns and has a string based format, like the one below:
string <- "318XXXX | VNSGN | BIER"
To gain some speed in performing tasks, I would like to recode this to a unique integer. Another application that I use regularly to deal with data has a built-in function that transforms a string like the one above into an integer (e.g. 73823). I was wondering whether there is a similar function in R. The idea is that a particular string will always result in the same integer; this will allow it to be used in merging data.tables etc.
Here is a small example of the data.table column that I would like to encode as simple integer values:
sample <- c("318XXXX | VNSGN | BIER", "462XXXX | TZZZH | 9905", "462XXXX | TZZZH | 9905",
"462XXXX | TZZZH | 9905", "511XXXX | FAWOR | 336H", "511XXXX | FAWOR | 336H",
"652XXXX | XXXXR | T136", "652XXXX | XXXXR | T136", "672XXXX | BQQSZ | 7777",
"672XXXX | BQQSZ | 7777")
I am hoping to encode the strings into an additional column to the table like the one below; note that the same strings result in the same numbers.
String Number
318XXXX | VNSGN | BIER 19872
462XXXX | TZZZH | 9905 78392
462XXXX | TZZZH | 9905 78392
462XXXX | TZZZH | 9905 78392
511XXXX | FAWOR | 336H 23053
511XXXX | FAWOR | 336H 23053
652XXXX | XXXXR | T136 95832
652XXXX | XXXXR | T136 95832
672XXXX | BQQSZ | 7777 71829
672XXXX | BQQSZ | 7777 71829
The data.table package will create indexes for you without making you handle them explicitly, so it would be less work than the approach in the question. See the setkey function in data.table.
The sqldf package can also use the SQL create index statement, as per Examples 4h and 4i on the sqldf home page, as can just about any database package.
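If an explicit integer code is still wanted, one possibility in base R is match() against the unique values. This is a different technique from the setkey and SQL index suggestions above, shown only as a sketch. The codes are consecutive integers rather than hash-like numbers, and they are stable only for a given set of unique strings, so the same reference vector would have to be reused when encoding another table. The names strings and ids are made up for illustration.

```r
strings <- c("318XXXX | VNSGN | BIER", "462XXXX | TZZZH | 9905",
             "462XXXX | TZZZH | 9905", "462XXXX | TZZZH | 9905",
             "511XXXX | FAWOR | 336H", "511XXXX | FAWOR | 336H",
             "652XXXX | XXXXR | T136", "652XXXX | XXXXR | T136",
             "672XXXX | BQQSZ | 7777", "672XXXX | BQQSZ | 7777")

# Each string gets the position of its first occurrence among the
# unique values, so identical strings share the same integer
ids <- match(strings, unique(strings))

print(data.frame(String = strings, Number = ids))
```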
