writing a data.frame using cat - r

How can I append the data.frame abc to a text file that I have already opened? I write some important information to that file first, and then I want to append the data.frame below that information. I get an error when I try to write the data.frame abc using cat.
fileConn<-file("metadata.txt","w+")
smoke <- matrix(c(51,43,22,92,28,21,68,22,9),ncol=3,byrow=TRUE)
smoke <- as.data.frame(smoke)
table <- sapply (smoke, class)
abc <- data.frame(nm = names(smoke), cl = sapply(unname(smoke), class))
cat("some imp info","\n", file=fileConn)
cat(abc,"\n", file=fileConn)
close(fileConn)
class(abc)

Just use the standard tools for writing data.frames, e.g. write.table:
write.table(abc, 'yourfile', append=TRUE) # plus whatever additional params
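A minimal sketch of how this could fit together with the open connection from the question. It assumes that passing the already-open connection to write.table is acceptable (output then continues at the current position, so append is not needed); the temporary file path and the tab separator are illustrative choices, not part of the original answer.

```r
# Rebuild the data.frame from the question
smoke <- as.data.frame(matrix(c(51, 43, 22, 92, 28, 21, 68, 22, 9),
                              ncol = 3, byrow = TRUE))
abc <- data.frame(nm = names(smoke),
                  cl = unname(vapply(smoke, class, character(1))))

out_path <- tempfile(fileext = ".txt")  # hypothetical path, stands in for metadata.txt
con <- file(out_path, "w")

# Header text first, then the table, both through the same open connection
cat("some imp info\n", file = con)
write.table(abc, file = con, sep = "\t", quote = FALSE, row.names = FALSE)
close(con)

lines_written <- readLines(out_path)
```

If you prefer to pass a file name instead of a connection, close the connection first and then use append=TRUE as shown above.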

Try this
capture.output(abc, file = fileConn)
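For context, a small self-contained sketch of that call with an open connection. The temporary file path is an assumption for illustration; capture.output writes the printed form of the data.frame (including row numbers), so the result looks like console output rather than a delimited table.

```r
# Rebuild the data.frame from the question
smoke <- as.data.frame(matrix(c(51, 43, 22, 92, 28, 21, 68, 22, 9),
                              ncol = 3, byrow = TRUE))
abc <- data.frame(nm = names(smoke),
                  cl = unname(vapply(smoke, class, character(1))))

out_path <- tempfile(fileext = ".txt")  # hypothetical path for this sketch
fileConn <- file(out_path, "w")

cat("some imp info\n", file = fileConn)
# Sends what print(abc) would show to the open connection
capture.output(abc, file = fileConn)
close(fileConn)

printed_lines <- readLines(out_path)
```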

To make sure the output is readable, you could also use knitr::kable(). It returns your table as a character vector, which has the advantage that you can embed it directly within the cat() call. It also has several printing options (digits, align, row.names, etc.) that make it easy to control how your table is printed:
tab <- knitr::kable(head(swiss))
cat("This is my file:",
"Some important note about it",
tab,
sep="\n")
#> This is my file:
#> Some important note about it
#> |             | Fertility| Agriculture| Examination| Education| Catholic| Infant.Mortality|
#> |:------------|---------:|-----------:|-----------:|---------:|--------:|----------------:|
#> |Courtelary   |      80.2|        17.0|          15|        12|     9.96|             22.2|
#> |Delemont     |      83.1|        45.1|           6|         9|    84.84|             22.2|
#> |Franches-Mnt |      92.5|        39.7|           5|         5|    93.40|             20.2|
#> |Moutier      |      85.8|        36.5|          12|         7|    33.77|             20.3|
#> |Neuveville   |      76.9|        43.5|          17|        15|     5.16|             20.6|
#> |Porrentruy   |      76.1|        35.3|           9|         7|    90.57|             26.6|

RMarkdown: Statamarkdown produces undesired output when collectcode=TRUE

I'm using Statamarkdown to produce HTML documents using RMarkdown and Stata.
As documented here, each code chunk is executed as a separate Stata session. collectcode=TRUE is a chunk option to collect Stata code across chunks.
While this works neatly, the output of the second (and any further) chunk following the first one with collectcode=TRUE contains an undesired echo at the top:
Running .......\profile.do
For instance, when running a second chunk with {stata stata2, echo = T,collectcode=TRUE}
reg mpg price i.foreign , noheader
yields this output:
reg mpg price i.foreign , noheader
Running C:\Cloud\Methods\prog\profile.do . reg mpg price i.foreign , noheader
------------------------------------------------------------------------------
mpg | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
price | -.000959 .0001815 -5.28 0.000 -.001321 -.000597
|
foreign |
Foreign | 5.245271 1.163592 4.51 0.000 2.925135 7.565407
_cons | 25.65058 1.271581 20.17 0.000 23.11512 28.18605
------------------------------------------------------------------------------
Here's my RMarkdown reprex:
---
title: "Statamarkdown output problem"
output: html_document
---
```{r setup, include = F}
library(Statamarkdown)
```
First chunk is clean:
```{stata stata1,collectcode=TRUE}
sysuse auto
su mpg price
```
Second Stata Output contains undesired `Running .......\profile.do` output:
```{stata stata2, echo = T,collectcode=TRUE}
reg mpg price i.foreign , noheader
```
Problem persists even in chunks with `collectcode=FALSE`:
```{stata new_data, echo = T,collectcode=F}
webuse bpwide, clear
su sex agegrp
```
`cleanlog = F` does not do the trick:
```{stata new_data2, echo = T,collectcode=F, cleanlog = FALSE}
webuse bpwide, clear
su sex agegrp
```
Avoiding collectcode=T altogether, i.e. loading and preparing the data in each chunk, would of course be a workaround, but an extremely tedious one.
I'm using R 3.6.3 and Stata 16.1 on a Windows machine.
Any ideas are very much appreciated!
It turns out Stata changed from
running .......\profile.do
to
Running .......\profile.do
A new version of the Statamarkdown package (0.5.0) accommodates this now.

Sparklyr split strings

I have a file with several lines. For example
A B C
awer.ttp.net Code 554
abcd.ttp.net Code 747
asdf.ttp.net Part 554
xyz.ttp.net Part 747
I want to write a command in Spark from R, using the sparklyr library, that splits just column A of the table and adds a new column D to the table, with the values awer, abcd, asdf, and xyz.
I have tried
data_2 %>% sdf_mutate(node2=ft_regex_tokenizer(data_2, input.col = "A", output.col = "D", pattern="[.]")) %>% sdf_register("mutated")
And then I try
mut_trial %>% mutate(E=D[[1]])
Error in eval(expr, envir, enclos) : object 'D' not found.
I'm not sure if I'm doing this the right way, but I wanted to see if there's any other function to use or if there's a way to fix this function to do what I want.
The code below is in Scala Spark; I hope you get the idea and can convert it to SparkR:
import spark.implicits._
val data = spark.sparkContext.parallelize(Seq(
("awer.ttp.net","Code", 554),
("abcd.ttp.net","Code", 747),
("asdf.ttp.net","Part", 554),
("xyz.ttp.net","Part", 747)
)).toDF("A","B","C")
data.withColumn("D", split($"A", "\\.")(0)).show(false)
Output:
+------------+----+---+----+
|A |B |C |D |
+------------+----+---+----+
|awer.ttp.net|Code|554|awer|
|abcd.ttp.net|Code|747|abcd|
|asdf.ttp.net|Part|554|asdf|
|xyz.ttp.net |Part|747|xyz |
+------------+----+---+----+
Hope this helped!
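To make the intended result concrete on the R side, here is a sketch of the same "take the first dot-separated token" logic on a plain local data frame in base R. This is only a local analogue and an assumption about what is wanted: it does not run on a Spark table, and the name data_local is made up for the illustration.

```r
# Local copy of the example data (not a Spark DataFrame)
data_local <- data.frame(
  A = c("awer.ttp.net", "abcd.ttp.net", "asdf.ttp.net", "xyz.ttp.net"),
  B = c("Code", "Code", "Part", "Part"),
  C = c(554, 747, 554, 747),
  stringsAsFactors = FALSE
)

# Split each value of A on a literal dot and keep the first piece
pieces <- strsplit(data_local$A, ".", fixed = TRUE)
data_local$D <- vapply(pieces, `[`, character(1), 1)

print(data_local)
```

On a Spark table the split has to happen inside Spark, as in the Scala code above, rather than through base R functions.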

Pander formats tables weirdly when using significance stars and pandoc

If I run a linear regression with significance stars, render it through pander, and "Knit PDF" such as this:
pander(lm(crimerate ~ conscripted + birthyr + indigenous + naturalized, data = data), add.significance.stars = T)
I occasionally get output where there are weird spacing issues between rows in the output table.
I've tried setting pander options to report fewer digits panderOptions('digits', 2), but the problem persists.
Does anybody have any ideas?
I had the same problem. Something is wrong with the cell alignment; the error disappeared when I changed the style to rmarkdown.
library(data.table)
dt <- data.table(Test = c("0 - 10 000"),
ALDT = "99.18 %")
First (space in table):
pandoc.table(dt, justify = c("left", "right"))
# From pandoc below
------------------
Test          ALDT
---------- -------
0 - 10 000 99.18 %
------------------
Second (good formatting):
pandoc.table(dt, style = "rmarkdown", justify = c("left", "right"))
# From pandoc below
| Test | ALDT |
|:--------------|--------:|
| 0 - 10 000 | 99.18 % |
The first attempt doesn't work: something is wrong with the formatting pandoc gives us. But if you specify the style as rmarkdown, the formatting seems to be as it should be.

Knitr kable/pandoc/pander/geom_title/grid.table custom cell formatting

I would like to add symbols and letters before and after some numbers when using knitr's kable function, but do not know how to do this efficiently. I am however also willing to consider pandoc/pander if it is better/more efficient.
The end result should be an HTML table...or very good graphic of one....
Please see the following code as a mock reproducible example that is in a .Rmd file:
### Notional and Cumulative P&L
```{r echo=FALSE}
Notional <- 10000
yday_pnl <- -2942
wtd_pnl <- 2300
mtd_pnl <- -3334
ytd_pnl <- 5024
yday_rtn <- (yday_pnl/Notional)*10000
wtd_rtn <- (wtd_pnl/Notional)*10000
mtd_rtn <- (mtd_pnl/Notional)*10000
ytd_rtn <- (ytd_pnl/Notional)*10000
Value <- c(Notional,yday_pnl,wtd_pnl,mtd_pnl,ytd_pnl)
rtn <- c(NA,yday_rtn,wtd_rtn,mtd_rtn,ytd_rtn)
COB.basics <- as.data.frame(cbind(Value,rtn))
rownames(COB.basics) <- c('Notional','yday pnl','wtd_pnl','mtd_pnl','ytd_pnl')
```
```{r results='asis',echo=FALSE}
kable(COB.basics,digits=2)
```
Similar to Excel's currency or accounting format types, I would like the Value column to have the $ sign, and the rtn column to have the string bps after the numbers. Also, for readability, is it possible to have a comma after every three digits before the decimal point, i.e. to mark thousands etc.?
Also, is it possible to colour the cells, and to colour the text/numbers too, e.g. red for negative values?
Partial solution with pander:
Set "big mark" for pander so that it would be used for all numbers:
panderOptions('big.mark', ',')
You can also set the table syntax to rmarkdown (optional, as rmarkdown v2 now also uses Pandoc, where the multiline format has some cool features compared to what the rmarkdown format offered before):
panderOptions('table.style', 'rmarkdown')
You can highlight some cells with e.g. which and some custom R expression:
emphasize.strong.cells(which(COB.basics > 0, arr.ind = TRUE))
Simply call pander on your data.frame:
> library(pander)
> emphasize.strong.cells(which(COB.basics > 0, arr.ind = TRUE))
> panderOptions('big.mark', ',')
> pander(COB.basics)
-----------------------------------
                 Value       rtn
-------------- ---------- ---------
 **Notional**  **10,000**    NA
 **yday pnl**    -2,942     -2,942
 **wtd_pnl**   **2,300**  **2,300**
 **mtd_pnl**     -3,334     -3,334
 **ytd_pnl**   **5,024**  **5,024**
-----------------------------------
> panderOptions('table.style', 'rmarkdown')
> pander(COB.basics)
| | Value | rtn |
|:--------------:|:-------:|:------:|
| **Notional** | 10,000 | NA |
| **yday pnl** | -2,942 | -2,942 |
| **wtd_pnl** | 2,300 | 2,300 |
| **mtd_pnl** | -3,334 | -3,334 |
| **ytd_pnl** | 5,024 | 5,024 |
To color the cells, you could add some custom HTML/CSS markup manually (or LaTeX if working with PDF in the long run), and the same also holds for adding % or other symbols/strings to your cells with e.g. paste and apply. Please feel free to submit a feature request at https://github.com/Rapporter/pander
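As a rough illustration of the paste idea, the sketch below builds the $ and bps strings in base R before the table is handed to kable or pander. The column values come from the question; the helper names fmt_value and fmt_rtn and the choice to put the minus sign before the $ are assumptions made for this example.

```r
# Values from the question (Notional = 10000, so rtn equals the pnl here)
Value <- c(10000, -2942, 2300, -3334, 5024)
rtn   <- c(NA, -2942, 2300, -3334, 5024)

# "$" prefix with thousands separators; sign kept in front of the "$"
fmt_value <- paste0(ifelse(Value < 0, "-$", "$"),
                    formatC(abs(Value), format = "d", big.mark = ","))

# "bps" suffix, leaving missing values as NA
fmt_rtn <- ifelse(is.na(rtn), NA,
                  paste(formatC(rtn, format = "d", big.mark = ","), "bps"))

COB.formatted <- data.frame(
  Value = fmt_value,
  rtn = fmt_rtn,
  row.names = c("Notional", "yday pnl", "wtd_pnl", "mtd_pnl", "ytd_pnl"),
  stringsAsFactors = FALSE
)
print(COB.formatted)
```

Because the columns become character vectors, numeric options such as digits or big.mark no longer apply to them, so any rounding has to happen before the strings are built.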

ascii package printing unnecessary characters

Running the following example
require(ascii)
mat <- matrix(c(1,11,2,12),nrow=2)
rownames(mat)<-letters[1:2]
colnames(mat)<-letters[11:12]
tab<- as.table(mat)
ascitab <- ascii(tab,digits=0,align="r")
print(ascitab,type="t2t")
produces the following output:
|| | k | l
| a | 1 | 2
| b | 11 | 12
Warning messages:
1: In rep(rownames, length = nrow(x)) :
'x' is NULL so the result will be NULL
2: In rep(colnames, length = ncol(x)) :
'x' is NULL so the result will be NULL
There are 2 issues:
The double vertical bar at the very beginning is incorrect, it should only be one bar.
And the warnings are very strange. The table has a sensible format.
Changing the print statement to
print(ascitab,type="t2t",include.rownames=TRUE,include.colnames=TRUE)
does not solve the problem.
Can anybody help?
I know a clumsy solution which includes capturing and postprocessing the output,
but I would like to see a clean solution.
