I'm using R with ImageMagick (the magick package) to crop some borders from a PDF file. I'm running the following commands:
library(magick)
pdf_total <- image_read_pdf(path = "file1.pdf")
pdf_cropped <- image_crop(pdf_total,"3000x1500")
After this I have a correctly cropped image, but I run into problems when I try to save it to a new PDF file. What is the correct way to save the cropped PDF?
My final solution is:
library(magick)
pdf_total <- image_read_pdf(path = "file1.pdf")
pdf_cropped <- image_crop(pdf_total, "3000x1500")
# Open a PDF device (output file name is a placeholder); each plot() call adds a page
pdf("file1_cropped.pdf")
for (i in seq_len(length(pdf_cropped))) {
  plot(pdf_cropped[i])
}
# Close the device so the PDF is written to disk
dev.off()
I used a for loop so that every page is saved; calling plot(pdf_cropped) once produces a PDF with only a single page (the first image).
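Instead of redrawing each page with plot(), a multi-frame magick image can also be written straight to a multi-page PDF with image_write(..., format = "pdf"), which produces one page per frame. This is a minimal sketch, assuming your ImageMagick build allows writing PDFs (some Linux distributions disable the PDF coder in ImageMagick's policy.xml). The blank frames stand in for image_read_pdf("file1.pdf"), and the output name file1_cropped.pdf is a placeholder.

```r
library(magick)

# Stand-in for image_read_pdf(path = "file1.pdf"): three blank 400x300 "pages"
pdf_total <- c(image_blank(400, 300, "white"),
               image_blank(400, 300, "grey"),
               image_blank(400, 300, "white"))

# image_crop() is applied to every frame; "300x200" keeps the top-left 300x200 region
pdf_cropped <- image_crop(pdf_total, "300x200")

# Writing a multi-frame image with format = "pdf" gives one PDF page per frame
out <- file.path(tempdir(), "file1_cropped.pdf")
image_write(pdf_cropped, path = out, format = "pdf")
```

With real input, replace the image_blank() frames with image_read_pdf(path = "file1.pdf") and the geometry with "3000x1500".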
Related
I am new to R and am trying to use the code below to crop and save multiple files in an R loop. It works, but as it saves the output images the same output file gets overwritten, so only the last image is kept. I would like the cropped images saved as separate files: 'trial_1.png', 'trial_2.png', etc. I can't figure out how to fix this; any suggestions would be welcome. The code mainly uses functions from the magick package. Thanks in advance.
Code:
library(pdftools)
library(magick)
library(png)
library(raster)
path = "~/Desktop/RME_task"
file.names<-dir(path, pattern = ".png")
for(i in 1:length(file.names)){
rme_stimuli_set1_1<-image_read(file.names[i])
rme_stimuli_set1_1_scaled<-image_scale(rme_stimuli_set1_1, "700x700")
rme_stimuli_set1_1_cropped<-image_crop(rme_stimuli_set1_1_scaled, "305x120+118+322")
image_write(rme_stimuli_set1_1_cropped, "CROPPED/trial_.png")
}
Just change the file name in each iteration.
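library(magick)
path <- "~/Desktop/RME_task"
# full.names = TRUE returns paths image_read() can open from any working directory;
# "\\.png$" matches only names ending in .png
file.names <- dir(path, pattern = "\\.png$", full.names = TRUE)
# make sure the output folder exists
dir.create("CROPPED", showWarnings = FALSE)
for (i in seq_along(file.names)) {
  rme_stimuli_set1_1 <- image_read(file.names[i])
  rme_stimuli_set1_1_scaled <- image_scale(rme_stimuli_set1_1, "700x700")
  rme_stimuli_set1_1_cropped <- image_crop(rme_stimuli_set1_1_scaled, "305x120+118+322")
  # paste0() builds a different name on each iteration: trial_1.png, trial_2.png, ...
  image_write(rme_stimuli_set1_1_cropped, paste0("CROPPED/trial_", i, ".png"))
}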
This way each iteration of the loop creates trial_1.png, trial_2.png, etc.
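The same idea can be wrapped in small helpers that compute all output paths before the loop. A minimal base R sketch; the function names numbered_paths() and derived_paths() are hypothetical, and the zero-padding ("%02d") is an optional choice so that trial_10.png sorts after trial_09.png rather than after trial_1.png.

```r
# Option 1: numbered outputs, zero-padded so they sort correctly as text
numbered_paths <- function(n, out_dir = "CROPPED", prefix = "trial_") {
  file.path(out_dir, sprintf("%s%02d.png", prefix, seq_len(n)))
}

# Option 2: reuse each input file's name, e.g. face1.png -> CROPPED/face1_cropped.png,
# so every output can be traced back to its source image
derived_paths <- function(in_files, out_dir = "CROPPED") {
  stem <- tools::file_path_sans_ext(basename(in_files))
  file.path(out_dir, paste0(stem, "_cropped.png"))
}

numbered_paths(3)
# "CROPPED/trial_01.png" "CROPPED/trial_02.png" "CROPPED/trial_03.png"
derived_paths(c("~/Desktop/RME_task/face1.png", "~/Desktop/RME_task/face2.png"))
# "CROPPED/face1_cropped.png" "CROPPED/face2_cropped.png"
```

Inside the loop, image_write(rme_stimuli_set1_1_cropped, out_paths[i]) then writes each result to its own file, where out_paths is computed once from file.names before the loop starts.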
I have seen a few questions about converting a PDF into a PNG, but none of the answers show how to save each page of a multi-page PDF as a separate PNG file.
Starting out with an example 13-page PDF:
# example pdf
example_pdf <- "https://arxiv.org/ftp/arxiv/papers/1312/1312.2789.pdf"
How can I save each page of the pdf as a different png file?
We can read the PDF with the image_read_pdf function from the magick package, which gives one image frame per page, and then write each frame to its own PNG:
#install magick package
install.packages("magick")
library("magick")
# read the pdf into a magick-image object with one frame per page
pages <- magick::image_read_pdf(example_pdf)
pages
# save each page as its own png; the loop length follows the number of pages
for (i in seq_len(length(pages))) {
  pages[i] %>% image_write(., path = paste0("image", i, ".png"), format = "png")
}
This saves each page as "image<page number>.png" in your working directory.
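The loop can be packaged as a reusable function that works for any page count. A minimal sketch; write_pages_png() is a hypothetical helper, and the two blank frames stand in for image_read_pdf(example_pdf).

```r
library(magick)

# Write every frame of a magick image to its own PNG, numbered from 1.
# The number of files follows length(pages), so no page count is hard-coded.
write_pages_png <- function(pages, dir = ".", prefix = "image") {
  paths <- file.path(dir, sprintf("%s%d.png", prefix, seq_len(length(pages))))
  for (i in seq_len(length(pages))) {
    image_write(pages[i], path = paths[i], format = "png")
  }
  invisible(paths)
}

# Demo with two blank frames standing in for image_read_pdf(example_pdf)
pages <- c(image_blank(200, 100, "red"), image_blank(200, 100, "blue"))
paths <- write_pages_png(pages, dir = tempdir())
```

If you do not need magick for further processing, pdftools::pdf_convert(example_pdf, format = "png") is another route: it writes one file per page and returns their names.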
For a small project I am trying to read some data from scanned PDF files that do not contain the data as text.
Following the instructions of the tesseract package, the code below should work.
Unfortunately it triggers an error:
Error in tiff::writeTIFF(bitmap, "page.tiff") :
INTEGER() can only be applied to a 'integer', not a 'raw'
Any clue on how this can be resolved?
library(pdftools)
library(tiff)
library(tesseract)
# A PDF file with some text
setwd(tempdir())
news <- file.path(Sys.getenv("R_DOC_DIR"), "NEWS.pdf")
orig <- pdf_text(news)[1]
# Render pdf to jpeg/tiff image
bitmap <- pdf_render_page(news, dpi = 300)
tiff::writeTIFF(bitmap, "page.tiff")
# Extract text from images
out <- ocr("page.tiff")
cat(out)
Try using pdf_convert() instead of pdf_render_page(); it writes the image files for you:
library(pdftools)
# A PDF file with some text
setwd(tempdir())
news <- file.path(Sys.getenv("R_DOC_DIR"), "NEWS.pdf")
orig <- pdf_text(news)[1]
# Render pdf to jpeg/tiff image
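# dpi = 300 matches the resolution used in the question; higher resolution generally helps OCR
pdf_convert(news, format = "tiff", dpi = 300)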
This writes one TIFF file per page to the working directory, so you still need code that reads and OCRs each of them in turn.
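A sketch of that follow-up step, assuming a pdftools version whose pdf_render_page() has the numeric argument. ocr_pdf_pages() and render_page_tiff() are hypothetical helper names. The first relies on pdf_convert() returning the names of the files it wrote; the second keeps the original writeTIFF() route by asking pdf_render_page() for a 0-1 double array instead of the raw array that tiff::writeTIFF() rejected with the INTEGER() error.

```r
# OCR every requested page: pdf_convert() writes one TIFF per page and returns
# the file names, and each file is passed to tesseract::ocr() in turn.
ocr_pdf_pages <- function(pdf, pages = NULL, dpi = 300) {
  tiffs <- pdftools::pdf_convert(pdf, format = "tiff", pages = pages, dpi = dpi)
  vapply(tiffs, function(f) tesseract::ocr(f), character(1))
}

# Single-page variant: numeric = TRUE returns doubles in 0-1, which
# tiff::writeTIFF() accepts, instead of a raw array.
render_page_tiff <- function(pdf, page = 1, dpi = 300, path = "page.tiff") {
  bitmap <- pdftools::pdf_render_page(pdf, page = page, dpi = dpi, numeric = TRUE)
  tiff::writeTIFF(bitmap, path)
  invisible(path)
}

# Demo on R's own NEWS.pdf, skipped when the file or packages are missing
news <- file.path(Sys.getenv("R_DOC_DIR"), "NEWS.pdf")
if (file.exists(news) &&
    requireNamespace("pdftools", quietly = TRUE) &&
    requireNamespace("tesseract", quietly = TRUE)) {
  setwd(tempdir())
  text <- ocr_pdf_pages(news, pages = 1)  # first page only, to keep the demo quick
  cat(text)
}
```

Passing pages = NULL (the default) converts and OCRs every page, which can take a while on long documents.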
I have the following data frame, which can be downloaded from here. The column image_path holds JPEG images encoded as base64 data URIs. I want to extract the images and store them in a local folder. I tried using the code given here and here.
While the second one perfectly opens the image in the browser, I couldn't figure out how to save the file locally. I tried the following code:
library(shiny)
for (i in 1:length(df)){
file <- paste(df$id[i])
png(paste0(~images/file, '.png'))
tags$img(src = df$image_path[i])
dev.off()
}
This runs without errors but doesn't create any image files. When I run tags$img(src = df$image_path[1]) on its own to see whether it generates the image, it doesn't. I understand tags$img is a Shiny function that works when passed inside the ui (as suggested by @daatali), but I'm not sure how to save the files locally.
What I want is to run a for loop inside the Shiny server function and save the images locally as JPG files named by their id numbers, so they can be displayed together with the other details captured in the survey.
I have never worked with images, so please bear with me if this is a complete novice question.
This decodes the base64 strings and saves the images to the images/ subfolder of your current working directory. This article describes pretty well how to save files locally in Shiny.
library(base64enc)
filepath <- "images/"
dir.create(filepath, showWarnings = FALSE)
df <- read.csv("imagefiletest.csv", header = TRUE, stringsAsFactors = FALSE)
for (i in seq_len(nrow(df))) {
  # rows without an image hold the string "NULL"
  if (df[i, "image_path"] == "NULL") {
    next
  }
  # drop the "data:image/jpeg;base64," header and keep only the base64 payload
  b64 <- strsplit(df[i, "image_path"], ",")[[1]][2]
  # the payload is JPEG data, so save it with a .jpg extension
  outconn <- file(paste0(filepath, "image_id", df[i, "id"], ".jpg"), "wb")
  base64decode(what = b64, output = outconn)
  close(outconn)
}
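The per-row logic can also be factored into small functions that take the file extension from the data URI header instead of hard-coding it. A minimal sketch using base64enc; save_data_uri() and data_uri_ext() are hypothetical names, and it assumes each cell is either a data URI (or bare base64) or the string "NULL".

```r
library(base64enc)

# Decode one data URI such as "data:image/jpeg;base64,/9j/..." and write the
# raw bytes to `path`. Cells holding "NULL" (no image) are skipped.
save_data_uri <- function(data_uri, path) {
  if (is.na(data_uri) || data_uri == "NULL") return(invisible(NA_character_))
  b64 <- sub("^data:[^,]*,", "", data_uri)  # drop the "data:...;base64," header if present
  writeBin(base64decode(b64), path)         # write the decoded bytes unchanged
  invisible(path)
}

# File extension from the MIME type in the header (image/jpeg -> "jpg")
data_uri_ext <- function(data_uri) {
  mime <- sub("^data:image/([^;,]+)[;,].*$", "\\1", data_uri)
  if (identical(mime, data_uri)) return("bin")  # no recognisable header
  if (mime == "jpeg") "jpg" else mime
}

# Demo with a tiny byte string standing in for real JPEG data
out_dir <- file.path(tempdir(), "images")
dir.create(out_dir, showWarnings = FALSE)
bytes <- as.raw(c(0xFF, 0xD8, 0xFF, 0xD9))
uri <- paste0("data:image/jpeg;base64,", base64encode(bytes))
path <- file.path(out_dir, paste0("image_id", 1, ".", data_uri_ext(uri)))
save_data_uri(uri, path)
```

Applied to the data frame, the loop body becomes save_data_uri(df$image_path[i], file.path("images", paste0("image_id", df$id[i], ".", data_uri_ext(df$image_path[i])))).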
I have ~10,000 PNG images saved neatly in different folders on my PC. I want to write a function that goes to a particular folder and copies all the PNG files in that folder into a Word document, one after another. Is this possible in R?
I've looked at the R2wd package, but it sadly only has a function that takes RData and outputs its plot to a Word document (wdPlot).
I also have the RData saved for each and every plot, so reason would dictate that I should be able to simply load the RData associated with a particular plot and then use wdPlot. The problem is that when I generated my PNGs the plots were grobs, and I did something like the following:
png("rp.png",width=w,height=h)
plot(rp)
#Increase size of title
grid.edit(gridTitle_Ref, gp=gpar(fontsize=20))
#Other grid.edit alterations
dev.off()
save(rp)
Now, when I try to get that rp into a Word document by first loading it into R, I naively do the following, and it does not output a plot to MS Word with the enlarged title or any of the other grid.edit alterations.
load("rp.Rdata")
png("rp.png",width=w,height=h)
wdPlot(rp)
#Increase size of title
grid.edit(gridTitle_Ref, gp=gpar(fontsize=20))
#Other grid.edit alterations
dev.off()
So, to reiterate: I have all these png files. At various times I have to copy-paste a subset of them into a word document. I'm too lazy to do that manually each time and want a program to do it for me.
EDIT 1
So, as per the suggestions below, I've read up on Markdown. Following the post "How to set size for local image using knitr for markdown?", I wrote something along the lines of:
```{r,echo=FALSE,fig.width=100, fig.height=100}
# Generate word documents of reports
# Clear all
rm(list=ls())
library(png)
library(grid)
library(knitr)
dir<-"location\of\file"
setwd(dir)
# Output only directories:
folders<-dir()[file.info(dir())$isdir]
for(folder in folders){
currentDir<-paste(dir,folder,"\\",sep="")
setwd(currentDir)
#All files in current folder
files<-list.files()
imgs<-[A list of all the png images in this particular file that I want in the word document - the png names]
for(img in imgs){
imgRaster<-readPNG(img)
grid.raster(imgRaster)
}
}
```
The following is a screenshot of what's in the resulting word document. How might I fix this? I want the images to appear one after the other in the document as the for loop above runs.
Do note that this is the first time I've ever used Markdown so any relevant tutorials linked in the comments could also be of great help.
EDIT 2
I followed the second answer's example below. Here is the output that I obtained
As you can see there are no images, only the html tags. How do I fix this?
If you have the PNGs saved, you can use a little HTML and a for loop to write them to a .doc file.
Edit 2, for Windows:
# Start the Word doc with an opening body tag
cat("<body>", file = "exOut.doc", sep = "\n")
# Add an <img> tag for every .png file in the working directory
for (i in list.files(pattern = "\\.png$")) {
  temp <- paste0('<img src="', i, '">')
  cat(temp, file = "exOut.doc", sep = "\n", append = TRUE)
}
# Close the body
cat("</body>", file = "exOut.doc", sep = "\n", append = TRUE)
# Some example plots
for(i in 1:5)
{
png(paste0("ex", i, ".png"))
plot(1:5)
title(paste("plot", i))
dev.off()
}
# Start an empty Word doc
cat(file = "exOut.doc")
# Add an <img> tag for every .png file in the working directory
for (i in list.files(pattern = "\\.png$")) {
  temp <- paste0('<img src="', i, '">')
  cat(temp, file = "exOut.doc", sep = "\n", append = TRUE)
}
You will then need to embed the figures, either through Word's drop-down menus or by writing a small macro that you can call with system().
EDIT: small update to show explicit paths to the output file and the figures.
cat("<body>", file = "/home/daff/Desktop/exOut.doc", sep = "\n")
# assumes the working directory is /home/daff/, where the ex*.png files are
for (i in list.files(pattern = "\\.png$")) {
  temp <- paste0('<img src="/home/daff/', i, '">')
  cat(temp, file = "/home/daff/Desktop/exOut.doc", sep = "\n", append = TRUE)
}
cat("</body>", file = "/home/daff/Desktop/exOut.doc", sep = "\n", append = TRUE)
Note that I used paste0 to remove the space between the path /home/daff/ and ex*.png.
Have you tried RStudio and R Markdown? You could put your code into chunks that load the files and render the result as a Word document. http://rmarkdown.rstudio.com/word_document_format.html
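One way to follow that suggestion without drawing the images in R at all is to generate the R Markdown file itself: each PNG becomes a Markdown image on its own paragraph, and rendering with output: word_document lets pandoc embed them one after another. A minimal base R sketch; write_image_rmd() is a hypothetical helper, the demo uses empty placeholder files, and it assumes image paths without spaces (spaces break the plain ![](path) syntax). Rendering itself needs rmarkdown and pandoc.

```r
# Write an R Markdown file that embeds each image as its own paragraph.
write_image_rmd <- function(img_files, rmd_path, title = "Plots") {
  # absolute, forward-slash paths so the Rmd can live in any folder
  img_files <- normalizePath(img_files, winslash = "/", mustWork = FALSE)
  # a blank line after each image keeps them as separate paragraphs
  body <- as.vector(rbind(paste0("![](", img_files, ")"), ""))
  lines <- c("---", paste0('title: "', title, '"'), "output: word_document", "---", "", body)
  writeLines(lines, rmd_path)
  invisible(rmd_path)
}

# Demo: collect the PNGs in one folder (empty placeholder files here)
img_dir <- file.path(tempdir(), "pngs")
dir.create(img_dir, showWarnings = FALSE)
invisible(file.create(file.path(img_dir, c("ex1.png", "ex2.png"))))
imgs <- list.files(img_dir, pattern = "\\.png$", full.names = TRUE)
rmd <- write_image_rmd(imgs, file.path(tempdir(), "plots.Rmd"))

# With real PNGs and pandoc installed, this would produce plots.docx:
# rmarkdown::render(rmd)
```

Calling write_image_rmd() once per folder (for example inside a loop over the folders) gives one Word document per folder.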