I would like to create several PDF files with R Markdown.
This is a sample of my data:
mydata <- data.frame(First = c("John", "Hui", "Jared","Jenner"), Second = c("Smith", "Chang", "Jzu","King"), Sport = c("Football","Ballet","Ballet","Football"), Age = c("12", "13", "12","13"), Submission = c("Microbes may be the friends of future colonists living off the land on the moon, Mars or elsewhere in the solar system and aiming to establish self-sufficient homes.
Space colonists, like people on Earth, will need what are known as rare earth elements, which are critical to modern technologies. These 17 elements, with daunting names like yttrium, lanthanum, neodymium and gadolinium, are sparsely distributed in the Earth’s crust. Without the rare earths, we wouldn’t have certain lasers, metallic alloys and powerful magnets that are used in cellphones and electric cars.", "But mining them on Earth today is an arduous process. It requires crushing tons of ore and then extracting smidgens of these metals using chemicals that leave behind rivers of toxic waste water.
Experiments conducted aboard the International Space Station show that a potentially cleaner, more efficient method could work on other worlds: let bacteria do the messy work of separating rare earth elements from rock.", "“The idea is the biology is essentially catalyzing a reaction that would occur very slowly without the biology,” said Charles S. Cockell, a professor of astrobiology at the University of Edinburgh.
On Earth, such biomining techniques are already used to produce 10 to 20 percent of the world’s copper and also at some gold mines; scientists have identified microbes that help leach rare earth elements out of rocks.", "Blank"))
With help from the community, I arrived at an R Markdown solution that creates a single HTML file with all the data I want.
It is saved as Essay to Word.Rmd:
```{r echo = FALSE}
# using data from above
# mydata <- mydata
# Define template (using column names from data.frame)
template <- "**First:** `r First` **Second:** `r Second` <br>
**Age:** `r Age`
**Submission** <br>
`r Submission`"
# Now process the template for each row of the data.frame
src <- lapply(1:nrow(mydata), function(i) {
knitr::knit_child(text=template, envir=mydata[i, ], quiet=TRUE)
})
```
# Print result to document
`r knitr::knit_child(text=unlist(src))`
This creates a single file:
I would like to create a separate HTML (or, preferably, PDF) file for each sport listed in the data. So I would have all the submissions from students who do "Ballet" in one file, and a separate file with all the submissions from students who play football.
I have been looking at a few different solutions, and I found this one the most helpful:
R Knitr PDF: Is there a posssibility to automatically save PDF reports (generated from .Rmd) through a loop?
Following suit, I created a separate R script to loop through the sports and subset the data by sport:
library(rmarkdown)
for (sport in unique(mydata$Sport)){
subgroup <- mydata[mydata$Sport == sport,]
render("Essay to Word.Rmd",output_file = paste0('report.',sport, '.html'))
}
Unfortunately, this creates one file per sport, but each file contains ALL the students, not just those who belong to that sport.
Any idea what might be going on with this code?
Is it possible to create these files directly as PDF documents instead of HTML? I know I can open each file and save it as PDF afterwards, but I will have 40 different sport files to work with.
Is it possible to add a thin line between the "submission" essays within a file?
Any help would be great, thank you!!!
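Your loop creates subgroup, but the Rmd never uses it: its chunk still reads the full mydata, so every file contains all the students. This can be fixed with a parameterized report:
- Add parameters for the data and, e.g., the sport to your Rmd.
- Inside the loop, pass your subgroup dataset to render() via the params argument.
- You can add horizontal lines with `***` (here at the start of the template, so each submission is preceded by a line).
- If you want PDF, use output_format = "pdf_document". Additionally, to render the document I had to switch the LaTeX engine via output_options.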
Rmd:
---
params:
  data: null
  sport: null
---
```{r echo = FALSE}
# using data from above
data <- params$data
# Define template (using column names from data.frame)
template <- "
***
**First:** `r First` **Second:** `r Second` <br>
**Age:** `r Age`
**Submission** <br>
`r Submission`"
# Now process the template for each row of the data.frame
src <- lapply(1:nrow(data), function(i) {
knitr::knit_child(text=template, envir=data[i, ], quiet=TRUE)
})
```
# Print result to document. Sport: `r params$sport`
`r knitr::knit_child(text=unlist(src))`
R Script:
mydata <- data.frame(First = c("John", "Hui", "Jared","Jenner"),
Second = c("Smith", "Chang", "Jzu","King"),
Sport = c("Football","Ballet","Ballet","Football"),
Age = c("12", "13", "12","13"),
Submission = c("Microbes may be the friends of future colonists living off the land on the moon, Mars or elsewhere in the solar system and aiming to establish self-sufficient homes.
Space colonists, like people on Earth, will need what are known as rare earth elements, which are critical to modern technologies. These 17 elements, with daunting names like yttrium, lanthanum, neodymium and gadolinium, are sparsely distributed in the Earth’s crust. Without the rare earths, we wouldn’t have certain lasers, metallic alloys and powerful magnets that are used in cellphones and electric cars.", "But mining them on Earth today is an arduous process. It requires crushing tons of ore and then extracting smidgens of these metals using chemicals that leave behind rivers of toxic waste water.
Experiments conducted aboard the International Space Station show that a potentially cleaner, more efficient method could work on other worlds: let bacteria do the messy work of separating rare earth elements from rock.", "“The idea is the biology is essentially catalyzing a reaction that would occur very slowly without the biology,” said Charles S. Cockell, a professor of astrobiology at the University of Edinburgh.
On Earth, such biomining techniques are already used to produce 10 to 20 percent of the world’s copper and also at some gold mines; scientists have identified microbes that help leach rare earth elements out of rocks.", "Blank"))
for (sport in unique(mydata$Sport)){
subgroup <- mydata[mydata$Sport == sport,]
rmarkdown::render("test.Rmd", output_format = "html_document", output_file = paste0('report.', sport, '.html'), params = list(data = subgroup, sport = sport))
rmarkdown::render("test.Rmd", output_format = "pdf_document", output_options = list(latex_engine = "xelatex"), output_file = paste0('report.', sport, '.pdf'), params = list(data = subgroup, sport = sport))
}
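The per-sport subsets can also come from base R's split(), which returns one data frame per sport, so each render() call gets exactly its own students. A minimal sketch under these assumptions: build_render_jobs() is a hypothetical helper, test.Rmd is the parameterized Rmd above, and the two-column mydata is a toy stand-in. It only assembles the render() arguments, so they can be checked without pandoc or LaTeX:

```r
# Hypothetical helper: one list of render() arguments per sport.
build_render_jobs <- function(data, input = "test.Rmd") {
  groups <- split(data, data$Sport)  # named list: one data frame per sport
  lapply(names(groups), function(sport) {
    list(
      input = input,
      output_format = "pdf_document",
      output_options = list(latex_engine = "xelatex"),
      output_file = paste0("report.", sport, ".pdf"),
      params = list(data = groups[[sport]], sport = sport)
    )
  })
}

# Toy data: only the columns needed here
mydata <- data.frame(
  First = c("John", "Hui", "Jared", "Jenner"),
  Sport = c("Football", "Ballet", "Ballet", "Football")
)
jobs <- build_render_jobs(mydata)

# Rendering needs rmarkdown, pandoc and a LaTeX engine:
# for (job in jobs) do.call(rmarkdown::render, job)
```

Building the argument lists separately keeps the looping logic easy to check; do.call() then passes each list to render() as named arguments.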
To create a PDF directly from your Rmd file, you could define the following function in a separate R script where your data is loaded, and then use map() from the purrr package to iterate over the sports (in the Rmd file, the output must be set to pdf_document):
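library(dplyr)
library(purrr)
library(rmarkdown)
get_report <- function(sport){
  # keep only this sport's rows; render() evaluates the Rmd in the calling
  # environment (envir = parent.frame() by default), i.e. inside this function,
  # so the Rmd sees this filtered mydata
  mydata <- mydata %>%
    filter(Sport == .env$sport)
  render("test.rmd", output_file = paste0("report_", sport, ".pdf"))
}
map(unique(mydata$Sport), get_report)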
Hope that is what you are looking for.
Related
I am trying to convert my data into an HTML document using R Markdown, and I currently rely on converting columns to vectors and indexing them.
Although my sample data has 3 observations, my actual dataset has over 30 records, so indexing seems cumbersome and unnatural.
Is there a better way to pull out each of these elements in sequence? Any suggestions would be great.
---
title: "Rmarkdown report"
output: html_document
---
```{r echo = FALSE}
mydata <- data.frame(First = c("John", "Hui", "Jared"), Second = c("Smith", "Chang", "Jzu"), Sport = c("Football","Soccer","Ballet"), Age = c("12", "13", "12"), submission = c("Microbes may be the friends of future colonists living off the land on the moon, Mars or elsewhere in the solar system and aiming to establish self-sufficient homes. Space colonists, like people on Earth, will need what are known as rare earth elements, which are critical to modern technologies. These 17 elements, with daunting names like yttrium, lanthanum, neodymium and gadolinium, are sparsely distributed in the Earths crust. Without the rare earths, we wouldn’t have certain lasers, metallic alloys and powerful magnets that are used in cellphones and electric cars. But mining them on Earth today is an arduous process. It requires crushing tons of ore and then extracting smidgens of these metals using chemicals that leave behind rivers of toxic waste water.",
"Experiments conducted aboard the International Space Station show that a potentially cleaner, more efficient method could work on other worlds: let bacteria do the messy work of separating rare earth elements from rock. The idea is the biology is essentially catalyzing a reaction that would occur very slowly without the biology, said Charles S. Cockell, a professor of astrobiology at the University of Edinburgh.
On Earth, such biomining techniques are already used to produce 10 to 20 percent of the world’s copper and also at some gold mines; scientists have identified microbes that help leach rare earth elements out of rocks.",
"Experiments conducted aboard the International Space Station show that a potentially cleaner, more efficient method could work on other worlds: let bacteria do the messy work of separating rare earth elements from rock. The idea is the biology is essentially catalyzing a reaction that would occur very slowly without the biology, said Charles S. Cockell, a professor of astrobiology at the University of Edinburgh.
On Earth, such biomining techniques are already used to produce 10 to 20 percent of the world’s copper and also at some gold mines; scientists have identified microbes that help leach rare earth elements out of rocks."))
first<- as.vector(mydata$First)
sec <- as.vector(mydata$Second)
age <- as.vector(mydata$Age)
submission <- as.vector(mydata$submission)
```
##
**First:** `r first[1]` **Second:** `r sec[1]` <br>
**Age:** `r age[1]`
**submission** <br>
`r submission[1]`
***
**First:** `r first[2]` **Second:** `r sec[2]` <br>
**Age:** `r age[2]`
**submission** <br>
`r submission[2]`
Here's a way to iterate over all the rows:
---
title: "Rmarkdown report"
output: html_document
---
```{r echo = FALSE}
# using data from above
# mydata <- data.frame(...)
# Define template (using column names from data.frame)
template <- "**First:** `r First` **Second:** `r Second` <br>
**Age:** `r Age`
**submission** <br>
`r submission`"
# Now process the template for each row of the data.frame
src <- lapply(1:nrow(mydata), function(i) {
knitr::knit_child(text=template, envir=mydata[i, ], quiet=TRUE)
})
```
# Print result to document
`r knitr::knit_child(text=unlist(src))`
Here we use knit_child() to knit a template string once for each row of the data.frame. The trick is to pass the row of the data.frame in as the envir argument, so the inline code in the template sees every column as a variable and we don't need to create vector versions of the data.frame columns.
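The row-as-environment idea can be seen in plain R: eval() accepts a list or data.frame as its envir argument, so column names resolve as variables. A minimal base-R illustration of that mechanism (not knitr's own code; the two-column mydata is a toy stand-in):

```r
mydata <- data.frame(First = c("John", "Hui"), Second = c("Smith", "Chang"))

# The same expression, evaluated once per row: each one-row data.frame acts
# as the environment, so First and Second are looked up as its columns.
expr <- quote(paste(First, Second))
labels <- vapply(seq_len(nrow(mydata)),
                 function(i) eval(expr, envir = mydata[i, ]),
                 character(1))
print(labels)
```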
If we do need the vectors as objects in the global environment, subset the columns into a list, rename them and use list2env():
nm1 <- c('First', 'Second', 'Age', 'submission')
nm2 <- c('first', 'sec', 'age', 'submission')
list2env(setNames(unclass(mydata[nm1]), nm2), .GlobalEnv)
This is the answer I gave to your previous question:
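You can use cat() inside an R Markdown chunk to write the Markdown/HTML for each row as you loop through your data.
Important: you have to add results = "asis" to the chunk options ({r}), so that knitr inserts the printed text into the document as-is instead of showing it as code output.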
Here is the loop:
{r results="asis", echo = FALSE}
# loop over every row; column names instead of positions keep the labels right
for (i in seq_len(nrow(data))) {
  cat("\n **First:** ", data[i, "First"], " **Last:** ", data[i, "Second"], "<br> \n")
  cat("\n **Age:** ", data[i, "Age"], " **Sport:** ", data[i, "Sport"], "<br> \n")
  cat("\n **submission** ", data[i, "submission"], "<br> \n")
  # cat("\n <br> \n") # extra space between entries
  cat("\n *** \n") # line between entries
}
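The loop can also be replaced by building every entry at once with paste0(), which is vectorized over whole columns. A sketch assuming the column names First, Second, Age, Sport and submission from the question; format_entries() is a hypothetical helper and the two-row data is made up. Inside a chunk with results="asis" you would pass its result to cat():

```r
# Hypothetical helper: one Markdown entry per row, separated by a horizontal rule
format_entries <- function(data) {
  entries <- paste0(
    "\n**First:** ", data$First, " **Last:** ", data$Second, "<br>\n",
    "\n**Age:** ", data$Age, " **Sport:** ", data$Sport, "<br>\n",
    "\n**submission** ", data$submission, "<br>\n"
  )
  paste(entries, collapse = "\n***\n")
}

# Toy data with the column names used in the question
data <- data.frame(
  First = c("John", "Hui"), Second = c("Smith", "Chang"),
  Sport = c("Football", "Soccer"), Age = c("12", "13"),
  submission = c("First essay.", "Second essay.")
)
md <- format_entries(data)
# Inside a chunk with results="asis": cat(md)
```

Each entry starts and ends with a newline, so the `***` separator lands on its own line with blank lines around it, which Markdown needs to treat it as a rule.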
Here is the result:
I am attempting to create a document frequency matrix in R.
I currently have a data frame (df_2) made up of 2 columns:
doc_num: the document each term comes from
text_token: the tokenized words of each document
The df's dimensions are 79,447 * 2.
However, there are only 400 actual documents in the 79,447 rows.
I have been trying to create this dfm using the tm package.
I have tried creating a corpus (VectorSource) and then coercing it into a dfm using the appropriately named dfm() function.
However, this fails with the error "dfm() only works on character, corpus, dfm, tokens objects."
I understand my data isn't currently in the correct format for the dfm command to work.
My issue is that I don't know how to get from my current point to a matrix as appears below.
Example of what I would like the matrix to look like when complete:
Where 2 is the number of times cat appears in doc_2.
Any help on this would be greatly appreciated.
Yours sincerely.
It is useful for you and others if all pertinent details come with your code, such as the fact that dfm() comes from the quanteda package.
If the underlying text is set up correctly, dfm() directly gives you what you are looking for; that is precisely what it is designed for.
Here is a simulation:
library(quanteda)
# install.packages("readtext")
library(readtext)
doc1 <- "COVID-19 can be beaten if all ensure social distance, social distance is critical"
doc2 <- "COVID-19 can be defeated through early self isolation, self isolation is your responsibility"
doc3 <- "Corona Virus can be beaten through early detection & slowing of spread, Corona Virus can be beaten, Yes, Corona Virus can be beaten"
doc4 <- "Corona Virus can be defeated through maximization of social distance"
# save the four documents as text files in your working directory
write.table(doc1,"doc1.txt",sep="\t",row.names=FALSE, col.names=FALSE, quote=FALSE)
write.table(doc2,"doc2.txt",sep="\t",row.names=FALSE, col.names=FALSE, quote=FALSE)
write.table(doc3,"doc3.txt",sep="\t",row.names=FALSE, col.names=FALSE, quote=FALSE)
write.table(doc4,"doc4.txt",sep="\t",row.names=FALSE, col.names=FALSE, quote=FALSE)
getwd()
# read every doc*.txt file back in (readtext accepts wildcard patterns)
txt <- readtext("doc*.txt")
txt
corp <- corpus(txt)
# recent quanteda versions expect tokens() before dfm()
x <- dfm(tokens(corp))
View(as.matrix(x))
If the issue is one of formatting/cleaning your data so that you can run dfm(), then you need to post a new question that provides the necessary details about your data.
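If the data really is in the long form described in the question (one row per doc_num/text_token pair), base R's table() already produces the document-by-term count matrix, without going through tm or quanteda. A sketch on a toy df_2 whose values are made up:

```r
# Toy long-format data: one row per token, as described in the question
df_2 <- data.frame(
  doc_num = c("doc_1", "doc_1", "doc_2", "doc_2", "doc_2"),
  text_token = c("cat", "dog", "cat", "cat", "fish")
)

# Cross-tabulate documents against terms; each cell counts how often
# a term occurs in a document
counts <- table(doc = df_2$doc_num, term = df_2$text_token)
m <- unclass(counts)  # plain integer matrix with row/column names
print(m)
```

Here m["doc_2", "cat"] is 2, matching the "cat appears twice in doc_2" example. If quanteda's dfm class is needed afterwards, as.dfm() accepts a matrix such as m.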
I have a data frame with information about several patients. I wrote a loop in R that processes each row and writes it to a docx file using ReporteRs, but the loop produces one docx per subject, whereas I would like a single docx with all the information, one subject after the other.
This is the data frame:
Surname Name Born Subject Place
Halls Ben 09/08/2019 3387502 S.Jeorge
Beck David 12/08/2019 1319735 S.Jeorge
Essimy Daniel 12/08/2019 3387789 S.Jeorge
Rich Maria 12/08/2019 3307988 S.Agatha
And this is the code I have written:
library(ReporteRs)
library(magrittr)
dfY2 <- read.table("file.txt", header = TRUE)
for(i in 1:nrow(dfY2)) {
my_title <- pot('Exam', textProperties(font.weight = "bold",font.size=12, font.family="Times New Roman"))
row1<-pot("Surname and Name",textProperties(font.weight="bold"))+" "+pot(dfY2[i,1])+" "+pot(dfY2[i,2])+" "+pot("Born",textProperties(font.weight="bold"))+pot(dfY2[i,3])
row2<-pot("SubjectID",textProperties(font.weight="bold"))+" "+pot(dfY2[i,4])+pot("Place",textProperties(font.weight="bold"))+" "+pot(dfY2[i,5])
doc<-docx(template = "Template.docx")%>%
addParagraph(my_title, par.properties=parProperties( text.align = "center"))%>%
addParagraph(c(""))%>%
addParagraph(row1)%>%
addParagraph(row2)
writeDoc(doc,file = paste0(dfY2[i,1],"output.docx"))
}
So this way I get several outputs, one per subject, while I would like all the rows written one after the other in a single doc.
What can I do?
Thanks
First of all, I would recommend using the newer package officer from the same author, because ReporteRs is no longer maintained.
To your question: you need to create the 'docx' object before the loop and save it after the loop (you probably want to add the title before the loop as well):
doc <- docx(template = "Template.docx")
for(i in 1:nrow(dfY2)) {
...
doc <- doc %>%
addParagraph(my_title, par.properties=parProperties( text.align = "center")) %>%
addParagraph(c("")) %>%
addParagraph(row1) %>%
addParagraph(row2)
}
writeDoc(doc, file ="output.docx")
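Since the answer recommends officer, here is a rough sketch of the same create-once, write-once pattern with it. It assumes officer's read_docx(), body_add_fpar(), fpar(), ftext(), fp_text(), fp_par() and print(doc, target = ...), and uses a two-row toy version of dfY2 built from the question's table; to start from your own template, read_docx("Template.docx") should work in place of read_docx().

```r
library(officer)

# Toy version of dfY2 with the columns from the question
dfY2 <- data.frame(
  Surname = c("Halls", "Beck"),
  Name = c("Ben", "David"),
  Born = c("09/08/2019", "12/08/2019"),
  Subject = c(3387502, 1319735),
  Place = c("S.Jeorge", "S.Jeorge")
)

bold <- fp_text(bold = TRUE)
plain <- fp_text()
title_fmt <- fp_text(bold = TRUE, font.size = 12, font.family = "Times New Roman")

# Create the document once, before the loop, with a centred title
doc <- read_docx()
doc <- body_add_fpar(doc, fpar(ftext("Exam", prop = title_fmt),
                               fp_p = fp_par(text.align = "center")))

# Append two paragraphs per subject to the same document
for (i in seq_len(nrow(dfY2))) {
  row1 <- fpar(
    ftext("Surname and Name ", prop = bold),
    ftext(paste(dfY2$Surname[i], dfY2$Name[i], ""), prop = plain),
    ftext("Born ", prop = bold),
    ftext(dfY2$Born[i], prop = plain)
  )
  row2 <- fpar(
    ftext("SubjectID ", prop = bold),
    ftext(paste0(dfY2$Subject[i], " "), prop = plain),
    ftext("Place ", prop = bold),
    ftext(dfY2$Place[i], prop = plain)
  )
  doc <- body_add_fpar(doc, row1)
  doc <- body_add_fpar(doc, row2)
}

# Write the single file once, after the loop
print(doc, target = "output.docx")
```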
I have a text file that looks like:
These are the hig hlights. Transit ioning to this, hello. I have
provided this informat ion. The man has this dis eas e. He needs to take this dos age of medicine. Fo r o ne mo nth, thro ug h this pro g ram, do this. Do no t overdose.
Many words are broken up. Is there any way to detect these errors in word structure and fix them in R?
So basically:
These are the highlights. Transitioning to this, hello. I have
provided this information. The man has this disease. He needs to take this dosage of medicine. For one month, through this program, do this. Do not overdose.
I got the text from a pdf using the following code:
library(tm)
file <- 'C:/Project/Section/SubSection/text.pdf'
Rpdf <- readPDF(control = list(text = "-layout"))
corpus <- VCorpus(URISource(file), readerControl = list(reader = Rpdf))
corpus.array <- content(content(corpus)[[1]])
write(corpus.array, "C:/Project/Section/SubSection/text1.txt")
txt <- readLines("C:/Project/Section/SubSection/text1.txt")
This produced the text with awkward spacing. Is there a better way to convert a PDF to a text file?
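For the first part of the question, one possible approach is a dictionary-based heuristic: merge runs of adjacent fragments whenever their concatenation is a known word. This is a minimal sketch, not a library function; fix_split_words() is a hypothetical helper and the tiny dictionary is hand-made for the sample text. In practice you would need a real word list, and the heuristic can wrongly merge genuine word pairs (for example "to day" if the list contains "today").

```r
# Hypothetical helper (not from any package): rejoin words that PDF extraction
# split apart, by merging runs of adjacent fragments (up to max_parts) whose
# concatenation is in the dictionary. The longest such run wins.
fix_split_words <- function(text, dictionary, max_parts = 5) {
  tokens <- strsplit(text, " +")[[1]]
  # lower-cased token with trailing punctuation removed, for dictionary lookup
  core <- function(x) tolower(sub("[[:punct:]]+$", "", x))
  out <- character(0)
  i <- 1
  while (i <= length(tokens)) {
    best_j <- i
    merged <- tokens[i]
    j <- i
    # only extend past fragments that do not end in punctuation
    while (j < length(tokens) && j - i + 1 < max_parts &&
           !grepl("[[:punct:]]$", tokens[j])) {
      j <- j + 1
      merged <- paste0(merged, tokens[j])
      if (core(merged) %in% dictionary) best_j <- j
    }
    out <- c(out, paste(tokens[i:best_j], collapse = ""))
    i <- best_j + 1
  }
  paste(out, collapse = " ")
}

dict <- c("highlights", "transitioning", "information", "disease", "dosage",
          "for", "one", "month", "through", "program", "not")
fix_split_words("These are the hig hlights. Transit ioning to this, hello.", dict)
```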
library(rtf)
#Report Section
output<-"D:/R/Reference program for R/Table_EG_chg.doc" # although this is RTF, we can use the
rtf<-RTF(output,width=8.5,height=11,font.size=9,omi=c(0.5,0.5,0.5,0.5))
addHeader(rtf,title = " Table14.3.2.3.1", subtitle =" Vital Signs - Absolute Values", font.size=9,TOC.level=0)
addTable(rtf,final,font.size=9,row.names=FALSE,NA.string="0",col.justify='L',header.col.justify='L',col.widths=c(1.75,1.5,1.25,0.5,0.5,0.5,0.5,0.5,0.5))
addTable(rtf,as.data.frame(head(iris)),font.size=10,row.names=FALSE,NA.string="-")
addText(rtf, "\n\n", bold=TRUE, italic=FALSE)
done(rtf) # writes and closes the file
final is the data frame I need to print in the RTF output.
This is the code I used to create the RTF output. It works fine for the first page only; the remaining pages do not have the title and footnotes. If anyone has done this, could you please share the code?
This is easily done in SAS. I need it in R.
Does anyone have an answer for this?
I think you are asking for listings like the ones we produce in SAS programming. I tried this in R and got the output. Please find the code below: I used a dummy dataset and applied the logic needed to get an RTF document that shows the titles and footnotes on multiple pages.
library(rtf)
final <- data.frame(Subject = c(1001,1002,1003,1004,1005,1006), Country = c("USA","IND","CHN","JPN","SA","EUR"),
Age = c(50,60,51,63,73,65), Sex = c("M","F","M","F","M","F"), SBP = c(120,121,119,123,126,128),
DBP = c(80,70,75,85,89,71))
final$seq <- rep(seq(1,nrow(final),2),each =2)
rtf<-RTF("Table_EG_chg.rtf",width=11,height=5,font.size=9,omi=c(0.5,0.5,0.5,0.5))
for ( i in unique(final$seq)){
new <- final [final$seq == i , ]
new$seq <- NULL
name.width <- max(sapply(names(new), nchar))
new <- format(new, justify = "centre")
addHeader(rtf,title = "\t\t\t\t\t\t\t\t\tTable14.3.2.3.1", subtitle ="\t\t\t\t\t\t\t\t\tVital Signs - Absolute Values", font.size=9)
addTable(rtf,new,font.size=9,row.names=FALSE,NA.string="0",col.justify='L',header.col.justify='L',col.widths=c(1.75,1.5,1.25,1.5,1.5,1.5))
startParagraph.RTF(rtf)
addText.RTF(rtf,paste("\n","- Vital signs lab values are collected at the day of ICF.\n"))
addText.RTF(rtf,"- Vital signs SBP - systolic blood pressure; DBP - Diastolic blood pressure")
endParagraph.RTF(rtf)
addPageBreak(rtf, width=11,height=5,font.size=9,omi=c(0.5,0.5,0.5,0.5))
}
done(rtf)
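The line final$seq <- rep(seq(1,nrow(final),2),each =2) produces 2 * ceiling(n/2) values, so with an odd number of rows it has one value too many and the assignment fails. A more general way to cut the data into page-sized chunks is to derive a page number from the row index and split() on it. A minimal sketch; paginate() is a hypothetical helper name and the five-row final is made up:

```r
# Hypothetical helper: split a data frame into chunks of rows_per_page rows.
# Works for any row count, including odd ones.
paginate <- function(df, rows_per_page) {
  page <- ceiling(seq_len(nrow(df)) / rows_per_page)
  split(df, page)
}

final <- data.frame(Subject = c(1001, 1002, 1003, 1004, 1005),
                    SBP = c(120, 121, 119, 123, 126))
pages <- paginate(final, 2)
```

Each element of pages can then play the role of new in the loop above, e.g. for (new in paginate(final, 2)) { ... }, which makes the seq helper column unnecessary.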
@Jaikumar Sorry it took 6 years for a package to come out that can finally do what you want. At the end of last year, the reporter package was released. This package replicates a lot of the functionality of SAS PROC REPORT. It can do dataset listings, just like SAS, and it repeats titles and footnotes on every page without your having to do anything special. Here is an example:
library(reporter)
library(magrittr)
# Create table
tbl <- create_table(iris) %>%
titles("Sample Title for Iris Data") %>%
footnotes("My footnote")
# Create report and add table to report
rpt <- create_report("test.rtf", output_type = "RTF") %>%
add_content(tbl)
# Write the report
write_report(rpt)
It can write RTF, PDF, and TXT output. To get a PDF, change the file extension to .pdf and set output_type to "PDF".