Programmatically inserting headers and text from a data.frame - r

I am trying to make a reproducible "data dictionary" in RMarkdown to ease my job of describing the various undocumented data sets I work with. I've looked at the most related post here: Programmatically insert text, headers and lists with R markdown, but am running into problems. I have a data.frame that has the colnames from my dataset and a column of strings that describe each variable. When I knit my RMarkdown document, I get one formatted header for the first variable and the rest show up with the formatting hash marks (##) and the variable name, but not as a formatted header.
```{r, results = 'asis'}
varnames <- c("A", "B", "C")
vardesc <- c("A is this.", "B is this.", "C is this.")
df <- data.frame(varnames, vardesc)
for(i in 1:nrow(df)) {
cat("##", df$vars[i], " \n")
cat("Description: ", df$vardesc[i])
cat(" \n")
}
```
This gives me only variable "A" as a formatted header. It seems my rookie knowledge of functions could be to blame, but I can't figure out what I am doing wrong.
My output is as follows (with A being formatted and the rest not formatted):
## A
Description:
A is this.
## B
Description:
B is this.
## C
Description:
C is this.
Any advice would be greatly appreciated. I'm open to other methods to do this if they exist.

Try this instead:
varnames <- c("A", "B", "C")
vardesc <- c("A is this.", "B is this.", "C is this.")
df <- data.frame(varnames, vardesc, stringsAsFactors = FALSE)
for(i in 1:nrow(df)) {
cat("##", df$varnames[i], "\n")
cat("Description:", "\n")
cat(df$vardesc[i], "\n")
cat("\n")
}
Output is:
## A
Description:
A is this.
## B
Description:
B is this.
## C
Description:
C is this.
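A minimal sketch of the same idea without an explicit loop, under two assumptions about why the original failed: the column is called varnames (not vars), and the markdown renderer wants a blank line before each ## header. The name dict_df is only illustrative. sprintf() is vectorised, so it builds one section per row, and the code is meant to run inside a chunk with results = 'asis':

```r
# Illustrative data dictionary; dict_df is a hypothetical name
dict_df <- data.frame(
  varnames = c("A", "B", "C"),
  vardesc = c("A is this.", "B is this.", "C is this."),
  stringsAsFactors = FALSE  # keep strings as character, not factor codes
)

# One markdown section per row: header, blank line, description
sections <- sprintf("## %s\n\nDescription: %s\n",
                    dict_df$varnames, dict_df$vardesc)

# sep = "\n" leaves a blank line between one section and the next header
cat(sections, sep = "\n")
```

Setting stringsAsFactors = FALSE matters on R versions before 4.0, where character columns became factors by default and cat() would print their integer codes.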

Related

Write each row of a dataframe as a text chunk and format all chunks in a text file with R

I have a dataframe of experiment materials and would like to format it into an appendix text file. Each row in the dataframe represents one experiment item. Each column contains one aspect of the item. The dataframe looks like this:
df <- data.frame(Item = 1, Title = "Title of the pss", Passage = " Content of the passage", Question = "Is this statement correct?", Answer = "Yes", stringsAsFactors = FALSE)
item2 <- c(2, "Title 2", "Passage 2", "Question 2", "No")
df <- rbind(df, item2)
df
  Item            Title                 Passage                   Question Answer
1    1 Title of the pss  Content of the passage Is this statement correct?    Yes
2    2          Title 2               Passage 2                 Question 2     No
I would like to write this dataframe into a text file in the following format:
Title: Title of the pss
Passage:
Content of the passage
Question:
Is this statement correct?
Answer:
Yes
Title: Title 2
Passage:
Passage 2
Question:
Question 2
Answer:
No
I would like to know
How to write each row as a text chunk?
How to format the text with R code?
I figured out how to do this with Python in a roundabout way, but I would still like to know whether there is a neat way to do it in R.
Many many thanks!
You can use the sprintf function in base R (see this post for more info). This allows you to define a template string and then substitute values into it. See below for an example; from this you could iterate through the rows of the dataframe to get the full output you need.
template <- "Title: %s
Passage:
%s
Question:
%s
Answer:
%s"
df <- data.frame(Item = 1, Title = "Title of the pss", Passage = " Content of the passage", Question = "Is this statement correct?", Answer = "Yes", stringsAsFactors = FALSE)
item2 <- c(2, "Title 2", "Passage 2", "Question 2", "No")
df <- rbind(df, item2)
df
cat(sprintf(template, df[1, "Title"], df[1, "Passage"], df[1, "Question"], df[1, "Answer"]))
This gives the output
Title: Title of the pss
Passage:
Content of the passage
Question:
Is this statement correct?
Answer:
Yes
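Edit: To go further, you could wrap the sprintf call in a function that returns a character vector with one chunk per row, and then print that vector like so:
chunk_fun <- function(df) {
  # Build one formatted text chunk per row of df
  text <- character(nrow(df))
  for (i in seq_len(nrow(df))) {
    text[i] <- sprintf(template, df[i, "Title"], df[i, "Passage"], df[i, "Question"], df[i, "Answer"])
  }
  text
}
textx <- chunk_fun(df)
cat(textx, sep = "\n") # separate chunks with a newline instead of cat's default space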
Output:
Title: Title of the pss
Passage:
Content of the passage
Question:
Is this statement correct?
Answer:
Yes
Title: Title 2
Passage:
Passage 2
Question:
Question 2
Answer:
No
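Because sprintf() is vectorised over its arguments, the loop is optional. The sketch below assumes the same four columns as in the question; the names items, chunks and out_file are illustrative, and a temporary file stands in for the real appendix file:

```r
# Example items mirroring the question's data (items is a hypothetical name)
items <- data.frame(
  Title = c("Title of the pss", "Title 2"),
  Passage = c(" Content of the passage", "Passage 2"),
  Question = c("Is this statement correct?", "Question 2"),
  Answer = c("Yes", "No"),
  stringsAsFactors = FALSE
)

template <- "Title: %s\nPassage:\n%s\nQuestion:\n%s\nAnswer:\n%s"

# sprintf() recycles the template, giving one chunk per row
chunks <- sprintf(template, items$Title, items$Passage,
                  items$Question, items$Answer)

# Write the chunks to a text file; replace tempfile() with your own path
out_file <- tempfile(fileext = ".txt")
writeLines(chunks, out_file)
```

writeLines() ends each chunk with a newline, so the chunks follow one another in the file without running together.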
1) Convert to dcf format, add appropriate newlines and write it out.
library(magrittr)
fileout <- stdout() # replace with your file name
df %>%
.[-1] %>%
write.dcf(stdout()) %>%
capture.output %>%
sub("^(Passage|Question|Answer): (.*)", "\n\\1:\n\\2", .) %>%
sub("^Title", "\nTitle", .) %>%
writeLines(fileout)
giving:
Title: Title of the pss
Passage:
Content of the passage
Question:
Is this statement correct?
Answer:
Yes
Title: Title 2
Passage:
Passage 2
Question:
Question 2
Answer:
No
2) If you are open to using dcf format instead then it is just one line and also has the benefit that read.dcf can read it back in.
write.dcf(df[-1], fileout) # fileout is from above
giving:
Title: Title of the pss
Passage: Content of the passage
Question: Is this statement correct?
Answer: Yes
Title: Title 2
Passage: Passage 2
Question: Question 2
Answer: No

Convert list object to unordered list in markdown

I have a list containing character vectors. I would like to create an unordered list in an RMarkdown document. I have tried to accomplish this by looping through the list and pasting the output into a markdown list. In knitr, I print the results 'asis'. Here is a toy example.
test <- list(x = c('a', 'b', 'c'), y = c('d', 'e'))
I would like to create an unordered list like this:
- x
  - a
  - b
  - c
- y
  - d
  - e
I have tried to do this using a for loop in conjunction with cat and paste0.
cols <- names(test)
for (columns in names(test)) {
cat(paste0("- ", names(test[columns]), '\n', ' ',
"- ", test[[cols[columns]]], '\n'))
}
Which outputs:
- x
-
- y
-
I would appreciate some help to get the desired unordered list I have described above.
Here's a solution that doesn't need loops. A list is very similar to a YAML document, so you can convert it to YAML, modify the result a little, and cat it.
test <- list(A = c("a", "b", "c"), B = c("d", "e"), C = 1:5)
cat(gsub("^!omap\n|:", "", yaml::as.yaml(test, omap = TRUE)))
Explanation:
convert list to ordered yaml using as.yaml function from yaml package.
Remove omap header using gsub.
cat result.
You can also put it in a custom function to avoid cluttering your code:
catList <- function(inputList) {
cat(gsub("^!omap\n|:", "", yaml::as.yaml(inputList, omap = TRUE)))
}
catList(test)
Try this:
---
title: "SO Answer"
author: "duckmayr"
date: "September 14, 2018"
output: html_document
---
```{r unordered_list, echo=FALSE, results='asis'}
test <- list(x = c('a', 'b', 'c'), y = c('d', 'e'))
for (name in names(test)) {
cat("-", name, '\n', paste(' -', test[[name]], '\n'))
}
```
For me, this yields the nested bullet list requested in the question.
The way you were trying it before had two issues:
You should have been subsetting by test[[columns]] rather than test[[cols[columns]]], and
Even after you fix that, you can see that paste was causing some issues for you:
for (columns in names(test)) {
cat(paste0("- ", names(test[columns]), '\n', ' ',
"- ", test[[columns]], '\n'))
}
- x
- a
- x
- b
- x
- c
- y
- d
- y
- e
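The repeated "- x" lines appear because paste0() is vectorised: the header string is recycled once for every element of the sub-vector. A minimal base R sketch that keeps the header and the items in separate calls is shown below; to_md_list and the four-space indent are illustrative choices, not part of either answer:

```r
test <- list(x = c("a", "b", "c"), y = c("d", "e"))

# Hypothetical helper: one top-level bullet per name,
# followed by one indented bullet per element
to_md_list <- function(lst, indent = "    ") {
  unlist(lapply(names(lst), function(name) {
    c(paste0("- ", name),                     # header line, built once
      paste0(indent, "- ", lst[[name]]))      # nested items, one per element
  }), use.names = FALSE)
}

md_lines <- to_md_list(test)
cat(md_lines, sep = "\n")  # use inside a chunk with results = 'asis'
```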

Loop in rmarkdown

I am relatively new to r and rmarkdown so I apologise in advance if this is a stupid question. This is a simple replication of a bigger dataset.
I have three columns in a dataframe:
df <- data.frame(c("a", "b"), c("c", "d"), c("e", NA))
names(df) <- c("X", "Y", "Z")
I want to show them in a rmarkdown file as follows:
I like a b.
This is c
This is e
This is d
I have written a function that includes
X <- 0
for (i in 1:nrow(df)) {
X[i] <- df$X[[i]] }
Y <- 0
for (i in 1:nrow(df)) {
Y[i] <- df$Y[[i]] }
Z <- 0
for (i in 1:nrow(df)) {
Z[i] <- df$Z[[i]] }
And in the markdown file (the bit I'm struggling with)
I like `r X` ### This is fine
``` {r}
for (i in 1:nrow(df)) {
Y[i]
Z[i] } ### Doesn't work and I want to include text i.e. This is
```
I want to make some sort of loop so it prints the element in row 1 of column Y, then Z, then the next row, etc., and skips the element if it is NA.
Any help whatsoever would be majorly appreciated! :)
First, here are some tips on your first loop. If you want to copy a data.frame column into a vector, you can do it in one vectorized step. I recommend you check this later. Hence, instead of:
X <- 0
for (i in 1:nrow(df)) {
X[i] <- df$X[[i]] }
try to do:
X <- vector("numeric", nrow(df)) # optional: creates an empty numeric vector of the right length
X <- as.numeric(df$X)
Answering your main question: you can name your code chunk to keep things organized. Use echo=FALSE if you want only the output and not the code printed (eval=FALSE would instead stop the chunk from being run at all). Now that you have your vectors, you can use @jason's suggestion:
I like `r X`
```{r code_chunk1, echo=FALSE}
paste0("This is ", X)
paste0("This is ", Y)
paste0("This is ", paste(Z, collapse = " ")) # if you want them all on the same line
```
Avoid the operator, it can produce unexpected results and create problems without you noticing! Visit this.
There is no need to use loops. However, the elements of df need to be re-arranged to get printed row-wise.
The rmarkdown file below reproduces the expected result:
---
title: Loop in rmarkdown
output: html_document
---
```{r, echo=FALSE}
df <- data.frame( c("a", "b"), c("c", "d"), c("e", NA))
names(df) <- c("X", "Y", "Z")
```
I like `r df$X`
```{r, echo=FALSE, warning=FALSE}
library(magrittr) # use piping to improve readability
df[, 2:3] %>% # treat columns like a matrix
t() %>% # change from row first to column first order
as.vector() %>% # flatten into vector
na.omit() %>% # drop NAs
paste("This is", .) %>% # Prepend text
knitr::kable(col.names = NULL) # print as table
```
The output is the three lines "This is c", "This is e" and "This is d", in that order.
Note that knitr::kable(col.names = NULL) is used to create inline text, i.e., text output not wrapped in a verbatim element.
Alternatively, the chunk option results='asis' can be used:
```{r, echo=FALSE, warning=FALSE, results='asis'}
library(magrittr) # use piping to improve readability
df[, 2:3] %>% # treat columns like a matrix
t() %>% # change from row first to column first order
as.vector() %>% # flatten into vector
na.omit() %>% # drop NAs
paste("This is", ., collapse = "  \n") %>% # Prepend text and collapse into one string
cat() # use cat() instead of print()
```
Note that the 2 blanks before \n are required to indicate a line break in rmarkdown.
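The same row-wise output can be sketched in base R without a pipe. The data frame mirrors the one in the question, and the names vals and md are illustrative. The separator is two spaces followed by a newline, which markdown treats as a hard line break:

```r
df <- data.frame(X = c("a", "b"), Y = c("c", "d"), Z = c("e", NA),
                 stringsAsFactors = FALSE)

# Transpose Y and Z so that as.vector() walks the values row by row
vals <- as.vector(t(as.matrix(df[, c("Y", "Z")])))
vals <- vals[!is.na(vals)]  # skip missing entries

# Two trailing spaces before the newline force a markdown line break
md <- paste0("This is ", vals, collapse = "  \n")
cat(md)  # use inside a chunk with results = 'asis'
```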

Replace strings in text based on dictionary

I am new to R and need suggestions.
I have a dataframe with 1 text field in it. I need to fix the misspelled words in that text field. To help with that, I have a second file (dictionary) with 2 columns - the misspelled words and the correct words to replace them.
How would you recommend doing it? I wrote a simple "for loop" but the performance is an issue.
The file has ~120K rows and the dictionary has ~5k rows and the program's been running for hours. The text can have a max of 2000 characters.
Here is the code:
output<-source_file$MEMO_MANUAL_TXT
for (i in 1:nrow(fix_file)) { # dictionary file
  target <- paste0(" ", fix_file$change_to_target[i], " ")
  replace <- paste0(" ", fix_file$target[i], " ")
  output <- gsub(target, replace, output, fixed = TRUE)
}
I would try agrep. I'm not sure how well it scales though.
Eg.
> agrep("laysy", c("1 lazy", "1", "1 LAZY"), max = 2, value = TRUE)
[1] "1 lazy"
Also check out pmatch and charmatch although I feel they won't be as useful to you.
Here is an example illustrating @joran's comment, using a data.table left join. It is very fast (instantaneous here).
library(data.table)
n1 <- 120e3
n2 <- 1e3
set.seed(1)
## create vocab
tt <- outer(letters,letters,paste0)
vocab <- as.vector(outer(tt,tt,paste0))
## create the dictionary
dict <- data.table(miss=sample(vocab,n2,rep=F),
good=sample(letters,n2,rep=T),key='miss')
## the text table
orig <- data.table(miss=sample(vocab,n1,rep=TRUE),key='miss')
orig[dict]
miss good
1: aakq v
2: adac t
3: adxj r
4: aeye t
5: afji g
---
1027: zvia d
1028: zygp p
1029: zyjm x
1030: zzak t
1031: zzvs q
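The join treats each misspelled word as a lookup key. For free text, the same lookup idea can be sketched in base R by splitting each string into words and matching them against the dictionary in one pass, instead of running one gsub per dictionary row. This is only a sketch under stated assumptions: fix_words and the named vector dict are hypothetical, words are separated by single spaces, and only exact whole-word matches are replaced.

```r
# dict: named character vector, names = misspelled words, values = corrections
fix_words <- function(text, dict) {
  vapply(strsplit(text, " ", fixed = TRUE), function(words) {
    hit <- match(words, names(dict))          # NA where a word is not in dict
    found <- !is.na(hit)
    words[found] <- unname(dict[hit[found]])  # swap in the corrections
    paste(words, collapse = " ")
  }, character(1))
}

dict <- c(teh = "the", gud = "good")
fixed <- fix_words(c("teh cat sat", "a gud dog"), dict)
print(fixed)
```

Each text is scanned once, with match() doing a hashed lookup per word, so the cost no longer grows with the number of dictionary rows times the number of texts.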

How to add header to a dataset in R?

I need to read the 'wdbc.data' file in the following data folder:
http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/
Doing this in R is easy using the read.csv command, but as the header is missing, how can I add it? I have the information but don't know how to do this, and I'd prefer not to edit the data file.
You can do the following:
Load the data:
test <- read.csv(
"http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data",
header=FALSE)
Note that the default value of the header argument for read.csv is TRUE so in order to get all lines you need to set it to FALSE.
Add names to the different columns in the data.frame
names(test) <- c("A","B","C","D","E","F","G","H","I","J","K")
or, alternatively (and, as I understand it, faster, since it does not reload the entire dataset):
colnames(test) <- c("A","B","C","D","E","F","G","H","I","J","K")
You can also use colnames instead of names if you have a data.frame or a matrix.
You can also solve this problem by creating a vector of the new names and assigning it:
newheaders <- c("a", "b", "c", ... "x")
colnames(data) <- newheaders
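The names can also be supplied while reading, through the col.names argument of read.csv. The sketch below uses a small inline string instead of the real file so that it is self-contained; the three column names and the variable small are illustrative only:

```r
# Header-less CSV content supplied inline via the text argument
raw <- "1,2,3\n4,5,6"

# header = FALSE keeps the first line as data; col.names sets the header
small <- read.csv(text = raw, header = FALSE,
                  col.names = c("A", "B", "C"))
print(small)
```

With the real data, the text argument would be replaced by the file path or URL, and col.names would need one name per column in the file.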
In case you are interested in reading some data from a .txt file and extracting only a few columns of that file into a new .txt file with a customized header, the following code might be useful:
# input some data from 2 different .txt files:
civit_gps <- read.csv(file="/path2/gpsFile.csv",head=TRUE,sep=",")
civit_cam <- read.csv(file="/path2/cameraFile.txt",head=TRUE,sep=",")
# assign the name for the output file:
seqName <- "seq1_data.txt"
#=========================================================
# Extract data from imported files
#=========================================================
# From Camera:
frame_idx <- civit_cam$X.frame
qx <- civit_cam$q.x.rad.
qy <- civit_cam$q.y.rad.
qz <- civit_cam$q.z.rad.
qw <- civit_cam$q.w
# From GPS:
gpsT <- civit_gps$X.gpsTime.sec.
latitude <- civit_gps$Latitude.deg.
longitude <- civit_gps$Longitude.deg.
altitude <- civit_gps$H.Ell.m.
heading <- civit_gps$Heading.deg.
pitch <- civit_gps$pitch.deg.
roll <- civit_gps$roll.deg.
gpsTime_corr <- civit_gps[frame_idx,1]
#=========================================================
# Export new data into the output txt file
#=========================================================
myData <- data.frame(c(gpsTime_corr),
c(frame_idx),
c(qx),
c(qy),
c(qz),
c(qw))
# Write :
cat("#GPSTime,frameIdx,qx,qy,qz,qw\n", file=seqName)
write.table(myData, file = seqName,row.names=FALSE,col.names=FALSE,append=TRUE,sep = ",")
Of course, you should modify this sample script based on your own application.
This should work (add_header_above comes from the kableExtra package):
library(knitr)
library(kableExtra)
kable(dt) %>%
kable_styling("striped") %>%
add_header_above(c(" " = 1, "Group 1" = 2, "Group 2" = 2, "Group 3" = 2))
#OR
kable(dt) %>%
kable_styling(c("striped", "bordered")) %>%
add_header_above(c(" ", "Group 1" = 2, "Group 2" = 2, "Group 3" = 2)) %>%
add_header_above(c(" ", "Group 4" = 4, "Group 5" = 2)) %>%
add_header_above(c(" ", "Group 6" = 6))
For more, you can check the link.

Resources