How to join PDF files in R? - r

I've an R program that outputs a booklet of graphics as a PDF file onto the local server. There's a separate PDF file, an introduction piece, not written in R, that I would like to join my output to.
I can complete this in Adobe and R-bloggers has the process here, both of which involve joining the files by hand, as it were:
http://www.r-bloggers.com/splitting-and-combining-r-pdf-graphics/
But what I'd really prefer to do is just run my code and have the files join. I wasn't able to find similar posts while searching for "[R] Pdf" and "join", "merge", "import pdf", etc..
My intent is to run the code for a different ID number ("Physician") each time. The report will save as a PDF titled by ID number on the server, and the same addendum would be joined to each document.
Here's the current code creating the R report.
Physician<- 1
#creates handle for file name and location using ID
Jumanji<- paste ("X:\\Feedback_ID_", Physician, ".pdf", sep="")
#PDF graphics device on, using file handle
pdf(file=Jumanji,8.5, 11)
# Several plots for this ID are drawn here, and then the PDF is closed with dev.off().
dev.off()
I think I need to pull the outside document into R and reference it in between the opening and closing, but I haven't been successful here.

To do this in R, follow @cbeleites' suggestion (who, I think, is rightly suggesting you move your whole workflow to knitr) and do just this bit in Sweave/knitr. Knit the following to PDF, where "test.pdf" is the document you're appending your report to, and you'll get the result you want:
\documentclass{article}
\usepackage{pdfpages}
\begin{document}
\includepdf{test.pdf} % your other document
<<echo=FALSE>>=
x <- rnorm(100)
hist(x)
# or whatever you need to do to get your plot
@
\end{document}
Also, the post you link to seems crazy because it's easy to combine plots into a single pdf in R (in fact it's the default option). Simply leave the pdf device open with its parameter onefile=TRUE (the default).
x <- rnorm(100)
y <- rnorm(100)
pdf("test.pdf")
hist(x)
hist(y)
dev.off()
Plots will automatically get paginated.

You can also consider something like :
library(qpdf)
path_PDF1 <- "C:/1.pdf"
path_PDF2 <- "C:/2.pdf"
pdf_combine(input = c(path_PDF1, path_PDF2), output = "C:/merged.pdf")
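To tie the qpdf suggestion back to the per-physician workflow in the question, here is a minimal sketch. It assumes the qpdf package is installed. The names build_report, intro_path and out_dir are hypothetical, the plots are placeholders, and a stand-in introduction PDF is generated so the example runs on its own; in practice intro_path would point at the existing introduction file.

```r
library(qpdf)  # assumption: the qpdf package is installed

out_dir <- tempdir()                            # hypothetical output folder
intro_path <- file.path(out_dir, "intro.pdf")   # hypothetical introduction file

# Stand-in for the externally written introduction (a single page).
pdf(intro_path, width = 8.5, height = 11)
plot.new()
title("Introduction")
dev.off()

# Write the plots for one physician, then join them behind the introduction.
build_report <- function(physician, intro, dir) {
  plots_path <- file.path(dir, paste0("plots_", physician, ".pdf"))
  final_path <- file.path(dir, paste0("Feedback_ID_", physician, ".pdf"))

  pdf(plots_path, width = 8.5, height = 11)
  hist(rnorm(100), main = paste("Physician", physician))  # placeholder plot
  plot(rnorm(100), main = paste("Physician", physician))  # placeholder plot
  dev.off()

  pdf_combine(input = c(intro, plots_path), output = final_path)
  file.remove(plots_path)  # drop the intermediate plots-only file
  final_path
}

reports <- vapply(1:3, build_report, character(1),
                  intro = intro_path, dir = out_dir)
```

Each call leaves one Feedback_ID_<n>.pdf in out_dir, with the introduction first and the plots after it.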

Related

Is there a way to create a code architecture diagram, that gives an overview over R scripts that source each other?

I have a lot of different R scripts that source one another with source(). I'm looking for a way to create an overview diagram that links the scripts visually, so I can easily see the "source hierarchy" of my code.
The result could look something like:
I hope there is a solution that doesn't require a software license.
Hope it makes sense! :)
I can suggest you use KNIME. It has the kind of diagram you are looking for. It comes with scripts already written to clean and visualize data and write output, and it integrates with R and Python.
https://docs.knime.com/?category=integrations&release=2019-12
https://www.knime.com/
Good luck.
For purposes of example change directory to an empty directory and run the code in the Note at the end to create some sample .R files.
In the first two lines of the code below we set the files variable to be a character vector containing the paths to the R files of interest. We also set st to the path to the main source file. Here it is a.R but it can be changed appropriately.
The code first inserts the line contained in variable insert at the beginning of each such file.
Then it instruments source using the trace command shown so that each time source is run a log record is produced. We then source the top level R file.
Finally we read in the log and use the igraph package to produce a tree of source files. (Any other package that can produce suitable graphics could be used instead.)
# change the next two lines of code appropriately.
# Settings shown are for the files generated in the Note at the end
# assuming they are in the current directory and no other R files are.
files <- Sys.glob("*.R")
st <- "a.R"
# inserts indicated line at top of each file unless already inserted
insert <- "this.file <- normalizePath(sys.frames()[[1]]$ofile)"
for(f in files) {
inp <- readLines(f)
ok <- !any(grepl(insert, inp, fixed = TRUE)) # TRUE if insert not in f
if (ok) writeLines(c(insert, inp), f)
}
# instrument source and run to produce log file
if (file.exists("log")) file.remove("log")
this.file <- "root"
trace(source, quote(cat("parent:", basename(this.file),
"file:", file, "\n", file = "log", append = TRUE)))
source(st) # assuming a.R is the top level program
untrace(source)
# read log and display graph
DF <- read.table("log")[c(2, 4)]
library(igraph)
g <- graph_from_data_frame(DF) # current name for graph.data.frame, which newer igraph versions deprecate
plot(g, layout = layout_as_tree(g))
For example, if we have the files generated in the Note at the end then the code above generates this diagram:
Note
cat('
source("b.R")
source("c.R")
', file = "a.R")
cat("\n", file = "b.R")
cat("\n", file = "c.R")
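If running the scripts is not an option, a different technique is to scan the files as text for source("...") calls. This is not the trace-based approach above, only a rough sketch in base R: the function name find_source_edges and the demo files are hypothetical. It only sees literal file names, picks up at most one source() call per line, and does not skip commented-out calls.

```r
# Collect parent -> child pairs from literal source("...") calls.
find_source_edges <- function(files) {
  pattern <- "source\\(\\s*[\"']([^\"']+)[\"']"
  edges <- lapply(files, function(f) {
    lines <- readLines(f, warn = FALSE)
    hits <- regmatches(lines, regexec(pattern, lines))
    # Keep lines that matched; element 2 is the captured file name.
    children <- vapply(Filter(length, hits), `[`, character(1), 2)
    data.frame(parent = rep(basename(f), length(children)),
               child = basename(children),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, edges)
}

# Hypothetical demo files in a scratch directory.
demo_dir <- file.path(tempdir(), "source_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c('source("b.R")', 'source("c.R")'), file.path(demo_dir, "a.R"))
writeLines('source("d.R")', file.path(demo_dir, "b.R"))
writeLines("", file.path(demo_dir, "c.R"))
writeLines("", file.path(demo_dir, "d.R"))

edges <- find_source_edges(
  list.files(demo_dir, pattern = "\\.R$", full.names = TRUE)
)
print(edges)
```

The resulting data frame has the same parent/child shape as DF above, so it could be handed to igraph in the same way.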

Save all plots already present in the panel of Rstudio

I've made a lot of plots (more than a hundred) for a project and I didn't save them along the way (yes, it's bad, I know). Now I need to save them all at once without running my script again (it takes hours). Is there a way to do so within RStudio?
Edit: All the plots are already there and I don't want to generate them again.
In RStudio, every session has a temporary directory that can be obtained using tempdir(). Inside that temporary directory, there is another directory that always starts with "rs-graphics" and contains all the plots saved as ".png" files. Therefore, to get the list of ".png" files you can do the following:
plots.dir.path <- list.files(tempdir(), pattern="rs-graphics", full.names = TRUE);
plots.png.paths <- list.files(plots.dir.path, pattern=".png", full.names = TRUE)
Now, you can copy these files to your desired directory, as follows:
file.copy(from=plots.png.paths, to="path_to_your_dir")
Additional feature:
As you will notice, the .png file names are automatically generated (e.g., 0078cb77-02f2-4a16-bf02-0c5c6d8cc8d8.png). So if you want to number the .png files according to their plotting order in RStudio, you may do so as follows:
plots.png.detials <- file.info(plots.png.paths)
plots.png.detials <- plots.png.detials[order(plots.png.detials$mtime),]
sorted.png.names <- gsub(plots.dir.path, "path_to_your_dir", row.names(plots.png.detials), fixed=TRUE)
numbered.png.names <- paste0("path_to_your_dir/", 1:length(sorted.png.names), ".png")
# Rename all the .png files as: 1.png, 2.png, 3.png, and so on.
file.rename(from=sorted.png.names, to=numbered.png.names)
Hope it helps.
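The copy and the numbering can also be done in one step. The following is a sketch, not part of the answer above: export_plots is a hypothetical helper that copies the .png files straight to numbered names in modification-time order, so no separate rename is needed. It relies on the same assumption that RStudio keeps the plots in an "rs-graphics" directory under tempdir().

```r
# Copy all .png files from src_dir to dest_dir as 001.png, 002.png, ...
# ordered by modification time (oldest first).
export_plots <- function(src_dir, dest_dir) {
  dir.create(dest_dir, recursive = TRUE, showWarnings = FALSE)
  pngs <- list.files(src_dir, pattern = "\\.png$", full.names = TRUE)
  pngs <- pngs[order(file.info(pngs)$mtime)]
  targets <- file.path(dest_dir, sprintf("%03d.png", seq_along(pngs)))
  ok <- file.copy(from = pngs, to = targets, overwrite = TRUE)
  targets[ok]  # paths of the files that were actually written
}

# Usage inside RStudio (directory pattern taken from the answer above):
# src <- list.files(tempdir(), pattern = "rs-graphics", full.names = TRUE)
# export_plots(src, "path_to_your_dir")
```

Zero-padded names keep the files in plotting order when sorted alphabetically.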
Although this discussion has been inactive for a while, there are some persons, like myself, who still come across the same problem, and the other solutions don't really seem to even get what the actual question is.
So, hands on. Your plot history gets saved in a variable called .SavedPlots. You can either access it directly, assign it to another variable in code or do the latter from the plots window.
# ph for plot history
ph <- .SavedPlots
In R 3.4.2, I could index ph to reproduce the corresponding plot in a device. What follows is rather straightforward:
Open a new device (png, jpeg, pdf...).
Reproduce your plot ph[index_of_plot_in_history].
Close the device (or keep plotting if it is a pdf with multiple pages).
Example:
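# lastplot is the index of the last plot in the history
for(i in 1:lastplot) {
png(sprintf('plot_%03d.png', i)) # one file per plot instead of overwriting the same file
print(ph[i])
dev.off()
}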
Note: Sometimes this doesn't happen because of poor programming. For instance, I was using the MICE package to impute many datasets with a large number of variables, and plotting as shown in section 4.3 of this paper. Problem was, that only three variables per plot were displayed, and if I used a png device in my code, only the last plot of each dataset would be saved. However, if the plots were printed to a window, all the plots of each dataset would be recorded.
If your plots are 3d, you can take a snapshot of all your plots and save them as a .png file format.
snapshot3d(filename = '../Plots/SnapshotPlots.png', fmt = 'png')
Or else, the best way is to create a multi-paneled plotting window using the par(mfrow) function. Try the following
plotsPath = "../Plots/allPlots.pdf"
pdf(file=plotsPath)
for (x in seq(1,100))
{
par(mfrow = c(2,1))
p1=rnorm(x)
p2=rnorm(x)
plot(p1,p2)
}
dev.off()
You can also use png, bmp, tiff, and jpeg functions instead of pdf. You can read their advantages and disadvantages and choose the one you think is good for your needs.
I am not sure how RStudio opens the device where the plots are drawn, but I guess it uses dev.new(). In that case, one quick way to save all open graphs is to loop through all the devices and write them out using dev.print.
Something like :
lapply(dev.list(),function(d){dev.set(d);dev.print(pdf,file=file.path(folder,paste0("graph_",d,".pdf"))})
where folder is the path of the folder where you want to store your graph (could be for example folder="~" if you are in linux and want to store all your graph in your home folder).
If you open a PDF device as follows, every plot drawn until dev.off() will be saved in one document:
pdf("nameofthedocument.pdf")
plot(x~y)
plot(...
dev.off()
You can also use tiff(), jpeg()... see ?pdf and ?png

export all the content of r script into pdf

I would like to export the entire content of an R script to a PDF. Is that possible?
I used the commands below, but as far as I can see they only export the graphics:
pdf(file = "example.pdf")
dev.off()
Thank you!
setwd("C:/Users/Prat/Desktop/c")
> dir()
[1] "script.R"
> knitr::stitch('script.r')
output file: script.tex
No script.pdf appears in my folder, only a script.tex and a folder with the figures as PDF files.
You can do this with the knitr package. Here's a workflow:
Save your script as a file (e.g., myscript.r)
Then run knitr::stitch('myscript.r')
The resulting PDF will be saved locally as myscript.pdf. You can use browseURL('myscript.pdf') to view it.
You can generate an HTML file instead by using:
knitr::stitch_rhtml('filename.r')
A .tex file is not easy to read, but an HTML file can be viewed in any browser.
For everyone who is looking for an easy and fast solution, I would propose using the function capture.output (https://www.rdocumentation.org/packages/utils/versions/3.6.2/topics/capture.output) from utils.
One only needs to 1.) capture what ever command one wants to run and assign it to a variable and 2.) then print that variable. Images can be printed along the way as you can see. The example on the webpage I linked above does not use markdown.
Here is my example with markdown (this is really all one needs):
```{r, echo = F}
# fake data-set
x = rnorm(50, mean = 3.3, sd=1)
y = rnorm(50, mean = 3.1, sd=0.9)
z = rnorm(50, mean = 3.2, sd=1.1)
# create dataframe
df <- data.frame(x, y, z)
# adding a graphic
plot(df$x, df$y)
# create a model as example
linearMod <- lm(y ~ x + z, data=df)
# all one needs to capture the output!!:
bla <- capture.output(summary(linearMod))
print(bla)
```
Remark: if one also wants to print the command, that is also easy. Just replace "echo = F" with "warning = F" or remove the text altogether if you also wanna have the warnings printed, in case there are any.
I was having the same issue, but I realized I was working in R 4.1 and ignored the warning that knitr was created using R 4.2. However after updating my R version, I was also just getting a .tex file but when I read the .log file I found the error "sh: pdflatex: command not found."
I used this suggestion with success:
Have you installed a LaTeX distribution in your system? For rmarkdown,
tinytex is recommended, you would need to install the R package and
then the TinyTex distribution.
install.packages('tinytex')
tinytex::install_tinytex()
Make sure you not only install the package but also run that second command tinytex::install_tinytex() as I made that mistake also before finally getting the program to create a pdf file.
Here is the link to the site where I found this method.
https://community.rstudio.com/t/knitting-error-pdflatex-command-not-found/139965/3
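Before calling knitr::stitch(), it may help to check whether R can see a pdflatex executable at all. A minimal sketch using only base R (the variable name has_pdflatex is just for illustration):

```r
# Sys.which() returns "" when the program is not on the PATH seen by R.
has_pdflatex <- unname(nzchar(Sys.which("pdflatex")))

if (has_pdflatex) {
  message("pdflatex found at: ", Sys.which("pdflatex"))
} else {
  message("pdflatex not found; knitr::stitch() will likely stop at the .tex file. ",
          "Installing a LaTeX distribution, for example with ",
          "tinytex::install_tinytex(), may fix this.")
}
```

If this reports that pdflatex is missing even after installing TinyTeX, restarting R so that the updated PATH is picked up is worth trying.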
Please use the below set of codes (you need to modify it according to your dataset/data-frame name).
library(gridExtra)
library(datasets)
setwd("D:\\Downloads\\R Work\\")
data("mtcars") # Write your dataframe name that you want to print in pdf
pdf("data_in_pdf.pdf", height = 11, width = 8.5)
grid.table(mtcars)
dev.off()
Thanks.

R - Keep log of all plots

I do a lot of data exploration in R and I would like to keep every plot I generate (from the interactive R console). I am thinking of a directory where everything I plot is automatically saved as a time-stamped PDF. I also do not want this to interfere with the normal display of plots.
Is there something that I can add to my ~/.Rprofile that will do this?
The general idea is to write a script generating the plot in order to regenerate it. The ESS documentation (in a README) says it well under 'Philosophies for using ESS':
The source code is real. The objects are realizations of the
source code. Source for EVERY user modified object is placed in a
particular directory or directories, for later editing and
retrieval.
With any editor that allows stepwise (or regionwise) execution of commands, you can keep track of your work this way.
The best approach is to use a script file (or sweave or knitr file) so that you can just recreate all the graphs when you need them (into a pdf file or other).
But here is the start of an approach that does the basics of what you asked:
savegraphs <- local({i <- 1;
function(){
if(dev.cur()>1){
filename <- sprintf('graphs/SavedPlot%03d.pdf', i)
dev.copy2pdf( file=filename )
i <<- i + 1
}
}
})
setHook('before.plot.new', savegraphs )
setHook('before.grid.newpage', savegraphs )
Now just before you create a new graph the current one will be saved into the graphs folder of the current working folder (make sure that it exists). This means that if you add to a plot (lines, points, abline, etc.) then the annotations will be included. However you will need to run plot.new in order for the last plot to be saved (and if you close the current graphics device without running another plot.new then that last plot will not be saved).
This version will overwrite plots saved from a previous R session in the same working directory. It will also fail if you use something other than base or grid graphics (and maybe even with some complicated plots). I would not be surprised if some extra plots show up on occasion (when a plot is created internally to get some parameters and then immediately replaced with the one of interest). There are probably other things that I have overlooked as well, but this might get you started.
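Since the question asked for time-stamped files, one possible variation is sketched below. It is an assumption-laden sketch, not a tested drop-in for ~/.Rprofile: plot_dir, timestamped_path and save_current_plot are hypothetical names, and the caveats listed above (last plot not saved, base/grid only) still apply. Putting the time stamp in the file name avoids overwriting plots from earlier sessions, and the counter keeps names unique when several plots are made within one second.

```r
plot_dir <- file.path(tempdir(), "plot_log")  # hypothetical; e.g. "~/RPlots" in practice

# Build a unique, time-stamped file name and make sure the folder exists.
timestamped_path <- local({
  i <- 0
  function(dir) {
    dir.create(dir, recursive = TRUE, showWarnings = FALSE)
    i <<- i + 1
    stamp <- format(Sys.time(), "%Y%m%d_%H%M%S")
    file.path(dir, sprintf("plot_%s_%03d.pdf", stamp, i))
  }
})

# Copy whatever is on the current device to a time-stamped PDF.
save_current_plot <- function() {
  if (dev.cur() > 1) {
    dev.copy2pdf(file = timestamped_path(plot_dir))
  }
}

# Register the hooks only for interactive work, as in the question.
if (interactive()) {
  setHook("before.plot.new", save_current_plot)
  setHook("before.grid.newpage", save_current_plot)
}
```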
You could write your own wrapper functions for your commonly used plot functions. Each wrapper would produce both the on-screen display and a timestamped PDF version. You could source() these functions in your ~/.Rprofile so that they are available every time you run R.
For lattice's xyplot, using the windows device for the on-screen display:
library(lattice)
my.xyplot <- function(...){
dir.create(file.path("~","RPlots"))
my.chart <- xyplot(...)
trellis.device(device="windows",height = 8, width = 8)
print(my.chart)
trellis.device(device = "pdf",
file = file.path("~", "RPlots",
paste("xyplot",format(Sys.time(),"_%Y%m%d_%H-%M-%S"),
".pdf", sep = "")),
paper = "letter", width = 8, height = 8)
print(my.chart)
dev.off()
}
my.data <- data.frame(x=-100:100)
my.data$y <- my.data$x^2
my.xyplot(y~x,data=my.data)
As others have said, you should probably get in the habit of working from an R script, rather than working exclusively from the interactive terminal. If you save your scripts, everything is reproducible and modifiable in the future. Nonetheless, a "log of plots" is an interesting idea.

Figure numbers generated by Sweave/R and why only (PDF)LaTeX

I am continuing my earlier post here:
Beginner's questions (figures, bibliography) with Sweave/R/LaTeX---my first document
The working code is reproduced here:
\documentclass[a4paper]{article}
\usepackage{Sweave} %%%%%%
\begin{document}
<<echo=TRUE>>=
x <- rnorm(100)
xm <- mean(x)
xm
@
<<echo=FALSE>>=
x <- rnorm(100)
xm <- mean(x)
xm
@
<<echo=TRUE>>=
##### Remove all comments from your data file
test.frame<-read.table(file="apples.d",header=T,sep= "")
names(test.frame)
head(test.frame)
class(test.frame)
@
\begin{figure}[htbp]
\begin{center}
\setkeys{Gin}{width=0.5\textwidth}
<<echo=FALSE,fig=TRUE,width=4,height=4>>=
#### Must tell plot where to get the data from. Could also use test.frame$year
with(test.frame,plot(year,value))
@
\end{center}
\end{figure}
\end{document}
The above runs fine with RStudio (latest) and Tinn-R (latest) and the desired pdf document is produced.
Questions:
If I name the above file as goodex.snw and I run Sweave, I get the file goodex-004.pdf with either Tinn-R or RStudio as the PDF image of the plot. Why the trailing 004? Can this be changed?
Can an EPS file be produced? Does Sweave compile to PDF only through (PDF)LaTeX and not through the traditional DVI > PS > PDF route?
Just running the command with(test.frame,plot(year,value)) in the R command window generates more values on the y-axis i.e. 15000, 20000, 25000 and 30000. However in the PDF file produced by Sweave by my code at the top of this post, I do not get all the values on the y-axis (only 15000 and 25000). How to control the size of the plot directly in the code so that all necessary y values appear?
Update: the file apples.d contains:
#Number of apples I ate
year value
8 12050 #year 2008
9 15292 #year 2009
10 23907 #year 2010
11 33997 #year 2011
Your example is not reproducible because we don't have the file apples.d, so we can only guess why the plot goes wrong. Please see:
How to make a great R reproducible example?
on how to make a reproducible example.
Please note that Sweave is not a functionality of Rstudio or Tinn-R, it is an R function (Sweave()) that can be run from command line or with arguments from the R executable. This might be helpful to know if you are searching for information.
As for your questions:
The names of plots always have the form FILENAME-CHUNKLABEL.pdf or eps, where the chunk label can be set as option to the Sweave chunk (it's the first argument). If you don't set a chunk name the plots will be enumerated.
You can use eps with the option eps=true. I am pretty sure that by default both eps and pdf are produced though. As for compiling, Sweave does not compile by itself; it creates a .tex file that you can use in whatever way you want. In R 2.14 there is an option to run pdfLaTeX on the produced .tex file automatically. RStudio and Tinn-R probably compile by calling pdfLaTeX after Sweave. You can do it manually if you want to do it differently.
Without a reproducible example we can only guess. What is going wrong? You could set the limits of the plot with the xlim and ylim arguments but that shouldn't be what is going wrong here.
Edit:
In response to edited question with data. First just a tip. This way of giving the data isn't the most useful way of doing it. We can of course reproduce this but it is much easier if you give the data in a way we can run immediately. e.g.:
test.frame<-data.frame(year=8:11, value= c(12050,15292,23907,33997))
As for the plot, you mean the labels on the y axis? This can be changed by omitting the axes in the plot call and setting them manually with the axis() function:
with(test.frame,plot(year,value,axes=FALSE))
axis(1)
axis(2,test.frame$value,las=1)
This does look a bit weird if the ticks aren't constantly distributed over the axis though. Better would be:
with(test.frame,plot(year,value,axes=FALSE))
axis(1)
axis(2,seq(10000,35000,by=5000),las=1)
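Instead of hard-coding the tick positions, they can be derived from the data with pretty(). The sketch below uses the values from apples.d and writes to a temporary 4 x 4 inch PDF, mirroring the width=4,height=4 chunk options. The ticks and out_file names, the ylim setting and the box() call are my own additions, not part of the answer above.

```r
test.frame <- data.frame(year = 8:11,
                         value = c(12050, 15292, 23907, 33997))

# Round tick positions that cover the whole range of the data.
ticks <- pretty(range(test.frame$value))

out_file <- tempfile(fileext = ".pdf")
pdf(out_file, width = 4, height = 4)
with(test.frame, plot(year, value, axes = FALSE, ylim = range(ticks)))
axis(1)
axis(2, at = ticks, las = 1)  # las = 1 keeps the labels horizontal
box()
dev.off()
```

On a small device R may still drop some labels to avoid overlap; horizontal labels (las = 1) or a smaller cex.axis usually leaves room for all of them.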
