Importing pdf in R through package "tm"

I have a working example that reads a PDF into the R workspace with the "tm" package, but I don't understand how the code works, so I can't adapt it to import my own PDF. The PDF imported in the following code is the "tm" vignette.
The code is:
if (file.exists(Sys.which("pdftotext"))) {
  pdf <- readPDF(PdftotextOptions = "-layout")(elem = list(uri = vignette("tm")$pdf),
                                               language = "en",
                                               id = "id1")
  pdf[1:13]
}
Here the PDF is the "tm" vignette, but the PDF I want to import is a different file. How do I change the code above to bring my own PDF into the workspace? minn is the PDF document I am trying to import.
I tried something like:
if (file.exists(Sys.which("pdftotext"))) {
  pdf <- readPDF(PdftotextOptions = "-layout")(elem = list(uri = vignette("minn")$pdf),
                                               language = "en",
                                               id = "id1")
  pdf[1:13]
}

So it seems the problem was with the PDF I was trying to read. The code that works is below; it passes the path of the PDF file directly as the uri. Thanks, Thomas, for the lead. The link for the PDF is "http://www.wine-economics.org/workingpapers/AAWE_WP16.pdf"
tt <- readPDF(PdftotextOptions="-layout")
rr <- tt(elem=list(uri="AAWE_WP16.pdf"),language="en",id="id1")
rr[1:15]
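
For anyone puzzled by the two pairs of parentheses in readPDF(...)(...): readPDF() does not read anything itself; it returns a reader function, and that function is then called on the file. Here is a minimal base-R sketch of the same pattern. Note that make_reader is a made-up stand-in, not part of tm, and it only echoes its arguments instead of parsing a PDF.

```r
# Hypothetical stand-in for tm::readPDF(): a function that returns a reader function.
make_reader <- function(options = character()) {
  # The returned closure remembers `options` from the outer call.
  function(elem, language, id) {
    list(
      uri = elem$uri,
      language = language,
      id = id,
      options = options
    )
  }
}

# Step 1: configure the reader (compare: tt <- readPDF(PdftotextOptions = "-layout")).
reader <- make_reader(options = "-layout")

# Step 2: call the reader on a file (compare: rr <- tt(elem = ..., language = ..., id = ...)).
doc <- reader(elem = list(uri = "AAWE_WP16.pdf"), language = "en", id = "id1")

# Doing both steps in one expression gives the same result.
doc_one_step <- make_reader(options = "-layout")(
  elem = list(uri = "AAWE_WP16.pdf"), language = "en", id = "id1"
)
identical(doc, doc_one_step)
```

So to import a different PDF, the only thing that needs to change is the uri passed in the second call.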

Related

How to reference Rmd files which should be rendered (within a package function)

I am writing a package that will be used to create automated reports.
It has one function, createPdfReport, which basically looks as follows (I use RStudio):
createPdfReport <- function(dataset, save_path) {
  # Base name of the dataset file, up to the first dot
  rmdName <- strsplit(x = basename(dataset), split = ".", fixed = TRUE)[[1]][1]
  # some code here which uses "dataset"
  relPath <- dirname(rstudioapi::getSourceEditorContext()$path)
  rmarkdown::render(input = paste0(relPath, "/myRMDfile.Rmd"),
                    output_dir = save_path,
                    output_file = paste0(rmdName, ".html"),
                    encoding = "UTF-8", quiet = TRUE)
}
Most likely, R will eventually run on a server, and it is not clear which operating system or editor will be used there.
Therefore, I would like to get rid of rstudioapi::getSourceEditorContext().
But how? I could not find anything.
createPdfReport is part of a typical package with the following structure:
DESCRIPTION
NAMESPACE
/man
/R
createPdfReport.R --> Contains the function createPdfReport() above
myRMDfile.Rmd
/tests

You could store myRMDfile.Rmd in inst/extdata, see package raw data.
After the package is installed, you can then get the file path with:
system.file("extdata", "myRMDfile.Rmd", package = "myPackage")
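
To show how this could fit into the function from the question, here is a rough sketch. It assumes the package is called "myPackage" (the placeholder used above) and that the Rmd file was moved to inst/extdata; pkg_file is a small helper I made up to fail with a clear message, because system.file() returns an empty string when the file is not found.

```r
# Hypothetical helper: return the installed path of a file shipped with a
# package, or stop with an error if system.file() cannot find it.
pkg_file <- function(..., package) {
  path <- system.file(..., package = package)
  if (!nzchar(path)) {
    stop("File not found in package '", package, "'", call. = FALSE)
  }
  path
}

# Sketch of the function from the question without the rstudioapi call.
# "myPackage" is a placeholder package name; rmarkdown must be installed to run it.
createPdfReport <- function(dataset, save_path) {
  rmdName <- strsplit(x = basename(dataset), split = ".", fixed = TRUE)[[1]][1]
  template <- pkg_file("extdata", "myRMDfile.Rmd", package = "myPackage")
  rmarkdown::render(input = template,
                    output_dir = save_path,
                    output_file = paste0(rmdName, ".html"),
                    encoding = "UTF-8", quiet = TRUE)
}

# The helper can be tried on a file that every R installation ships:
pkg_file("DESCRIPTION", package = "base")
```

Because the path is resolved from the installed package, this no longer depends on the editor or the operating system.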

Write KML File in R

I am creating a Shiny application where the user can write the data out to either CSV or KML. However, my code below does not write the features out to the KML file: when I open the KML in Google Earth it shows black dots, and clicking a dot displays the row index of the original data rather than all column values for that point. I was using the writeOGR function, but it was not writing the file, so I switched to the plotKML package. I want the user to choose where the file is saved (using the filename I specify with a date and timestamp) and to see all features of any given data point in Google Earth.
output$downloadData <- downloadHandler(
  filename = function() {
    paste0("data_", Sys.Date(), input$download_type)
  },
  content = function(file) {
    if (input$download_type == ".csv") {
      write.csv(data, file, row.names = FALSE)
    } else if (input$download_type == ".KML") {
      features <- c("COLUMN_1", "COLUMN_2", "COLUMN_3") # These are the features I want displayed in Google Earth
      # Convert column by column; as.character() on a whole data frame deparses each column into one string
      data[features] <- lapply(data[features], as.character)
      coordinates(data) <- ~X + Y
      proj4string(data) <- CRS("+proj=longlat +datum=WGS84")
      kml_description(data, caption = "Data",
                      delim.sign = "_", asText = FALSE)
      kml(data, file = file) # Not sure why this produces points but doesn't display features in Google Earth
      # writeOGR(data, dsn = file, layer = "Data", driver = "KML")
    }
  })

Getting a KML that Google Maps can read adequately is not yet as easy as it should be.
You may want to try exporting via the sf package and libkml:
sf::st_write(obj = an_sf_object,
dsn = kml_file_path,
driver = "libkml")
(You may need to install libkml on your server.)
See also this function from the latlon2map package for a working (albeit less than ideal) implementation.
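
In case it helps, here is a rough sketch of getting from a plain data frame with X/Y columns (as in the question) to something sf can write. It assumes the sf package is installed, and it uses GDAL's built-in "KML" driver instead of "libkml" so that it does not depend on libkml being available. write_points_kml and the example data are made up for illustration, and I have not checked how Google Earth renders the attribute columns with this driver.

```r
# Sketch only: assumes the sf package is installed and that `data` has
# numeric longitude/latitude columns named by `coords`.
write_points_kml <- function(data, path, coords = c("X", "Y")) {
  # Build point geometries in WGS84 (EPSG:4326); the remaining columns
  # are kept as attributes of each point.
  pts <- sf::st_as_sf(data, coords = coords, crs = 4326)
  # Built-in GDAL KML driver, swapped in here for the "libkml" driver above.
  sf::st_write(pts, dsn = path, driver = "KML", quiet = TRUE)
  invisible(path)
}

# Made-up example data; the column names follow the question.
example_data <- data.frame(
  X = c(-93.26, -93.09),
  Y = c(44.98, 44.95),
  COLUMN_1 = c("a", "b"),
  COLUMN_2 = c("c", "d")
)

# Only run the export when sf is available.
if (requireNamespace("sf", quietly = TRUE)) {
  kml_path <- write_points_kml(example_data, tempfile(fileext = ".kml"))
  file.exists(kml_path)
}
```

Inside the downloadHandler from the question, the same call would go in the KML branch of content, with file as the path.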

getting rmarkdown to print improved tibble printing

The pillar package offers a number of options to format tibble printing.
https://pillar.r-lib.org/reference/pillar-package.html#package-options
For example, this is what I see on my Windows machine, which supports these options:
But when I set the same options for an rmarkdown document, I don't see any difference in the printed output.
Is there a way to get this to work, or is this not supported in rmarkdown itself?

The vignette for the tibble package has a possible solution. In the setup chunk of your .Rmd file, put:
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
library(tibble)
set.seed(1014)
options(crayon.enabled = TRUE)
options(pillar.bold = TRUE, pillar.subtle_num = TRUE)
knitr::opts_chunk$set(collapse = TRUE, comment = pillar::style_subtle("#>"))
colourise_chunk <- function(type) {
  function(x, options) {
    lines <- x
    if (type != "output") {
      lines <- crayon::red(lines)
    }
    paste0(
      '<div class="sourceCode"><pre class="sourceCode"><code class="sourceCode">',
      paste0(
        fansi::sgr_to_html(htmltools::htmlEscape(lines)),
        collapse = "\n"
      ),
      "</code></pre></div>"
    )
  }
}
knitr::knit_hooks$set(
  output = colourise_chunk("output"),
  message = colourise_chunk("message"),
  warning = colourise_chunk("warning"),
  error = colourise_chunk("error")
)
In a new chunk:
broom::tidy(stats::chisq.test(table(ggplot2::msleep$vore)))
My HTML output:

From my markdown experience, I'd say that pillar will not work, as markdown uses pandoc.
As an alternative, I'd recommend kable with the kableExtra package for a similar look using its theme options. A handy tutorial that uses a relatively similar theme:
https://cran.r-project.org/web/packages/kableExtra/vignettes/awesome_table_in_html.html
Another option for making really nice markdown tables is formattable, which has a lot of in-depth formatting options.
A couple of handy tutorials for that:
https://www.littlemissdata.com/blog/prettytables
https://www.littlemissdata.com/blog/pretty-r-tables-in-github
Hopefully this helps you out.

How to export a 'reactable' table to an image (PDF,JPG,PNG, or HTML page?) in R?

I have created a beautiful table in R using the 'reactable' package. I can export (in effect, knit) it as an HTML page, but is there a way to export it as an image (and if so, automatically)?
It's probably not necessary, but here is some code:
x<-as.data.frame(list(a=c("why","can't","I","figure","this","out"),b=c("it","is","probably","something","really","simple")))
reactable(x)

I have not tried it, but this should work. reactable generates an htmlwidget, so you can use the saveWidget function from the htmlwidgets package to save the table to an HTML file, then use the webshot package to take a snapshot.
library(reactable)
library(htmlwidgets)
library(webshot)
rtable <- reactable(iris[1:8,])
html <- "rtable.html"
saveWidget(rtable, html)
webshot(html, "rtableSnapshot.png") # you can also export to pdf

Here's an example of saving a reactable table as an HTML file using saveWidget, then using that file and webshot2 to create a .png. Does this help?
library("reactable")
library("htmlwidgets")
library("webshot2")
html_file <- "table.html"
img_file <- "img.png"
df <- mtcars[1:3, 1:3]
table <- reactable(df)
saveWidget(widget = table, file = html_file, selfcontained = TRUE)
webshot(url = html_file, file = img_file, delay = 0.1, vwidth = 1245)

Simply use reactablefmtr::save_reactable(x, "x.png")

R tm package readPDF error in strptime(d, fmt) : input string too long

I would like to do text mining of the files on this website using the tm package. I am using the following code to download one of the files (i.e., abell.pdf) to my working directory and attempt to store the contents:
library("tm")
url <- "https://baltimore2006to2010acsprofiles.files.wordpress.com/2014/07/abell.pdf"
filename <- "abell.pdf"
download.file(url = url, destfile = filename, method = "curl")
doc <- readPDF(control = list(text = "-layout"))(elem = list(uri = filename),
language = "en", id = "id1")
But I receive the following error and warnings:
Error in strptime(d, fmt) : input string is too long
In addition: Warning messages:
1: In grepl(re, lines) : input string 1 is invalid in this locale
2: In grepl(re, lines) : input string 2 is invalid in this locale
The PDFs aren't particularly long (5 pages, 978 KB), and I have been able to use the readPDF function to read other PDF files on my Mac (OS X). The information I want most (the total population for the 2010 census) is on the first page of each PDF, so I've tried shortening the PDF to just the first page, but I get the same message.
I am new to the tm package, so I apologize if I am missing something obvious. Any help is greatly appreciated!

Based on what I've read, this error has something to do with the way the "readPDF" function tries to build metadata for the file you're importing. You can change how the metadata is extracted with the "info" option. For example, I usually get around this error by modifying the command as follows (using your code):
doc <- readPDF(control = list(info = "-f", text = "-layout"))(elem = list(uri = filename), language = "en", id = "id1")
The only change is the addition of info = "-f". This doesn't really "fix" the problem, but it bypasses the error. Cheers :)
