Overall, I'm interested in extracting all the text from R documentation so I can put it into data frames and apply text-mining techniques.
PACKAGE LEVEL: Suppose I'm interested in a package, for instance "utils", and I want to get all its text data into a vector.
This works:
package_d <- packageDescription("utils")
package_d$Description
But not this:
package_d$Details
FUNCTIONS LEVEL: Same problem, but for functions. I tried this without success:
function_d <- ?utils::adist
function_d$Description
SUB-LEVELS: I would like to extract all the details, argument descriptions, and values of the functions of a particular package...
Thank you very much for your help!
I couldn't find a built-in one, but looking at the source of the functions that do most of the work, here's a function that extracts the text from a help page.
help_text <- function(...) {
  file <- help(...)
  path <- dirname(file)
  dirpath <- dirname(path)
  pkgname <- basename(dirpath)
  RdDB <- file.path(path, pkgname)
  rd <- tools:::fetchRdDB(RdDB, basename(file))
  capture.output(tools::Rd2txt(rd, out = "", options = list(underline_titles = FALSE)))
}
You can use it with the package help pages and function help pages.
h1 <- help_text(utils)
h2 <- help_text(adist)
You'll get a character vector of lines from the help page. You can print them with
cat(h1, sep="\n")
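To tie this back to the original goal of a data frame for text mining: assuming h1 is the character vector of lines returned by help_text() above, one row per help topic can be built as sketched below (the lines in h1 here are stand-ins, not real help_text() output):

```r
h1 <- c("adist {utils}", "", "Approximate String Distances")  # stand-in lines

# one row per help topic, with the whole page collapsed into a single string
docs <- data.frame(topic = "adist",
                   text  = paste(h1, collapse = "\n"),
                   stringsAsFactors = FALSE)
nrow(docs)  # 1
```

From there, repeating this over several topics and rbind-ing the rows gives a corpus-style data frame.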
I am currently trying to write a function to batch process a number of PNGs I have in a folder into SVGs. I wanted to get this up and running since it is something I regularly have to do, and I find the "free" online alternatives lacking. I have written the following script to do this:
library(magick)

image.list <- list.files("D:/R Exports/06 Graphics")
image.select <- list()

batch.svg.convert <- function(image.list){
  for (i in 1:length(image.list)){
    image.select[i] <- paste0("D:/R Exports/06 Graphics/", image.list[i])
    image.read <- image_read(image.select[i])
    image.svg <- image_convert(image.read, format = "svg")
    image.write <- image_write(image.svg, paste0("D:/R Exports/Graphics SVG", image.list[i]))
  }
}
batch.svg.convert(image.list)
However, when doing so I get the following error in response:
Error in image_read(image.select[i]) :
path must be URL, filename or raw vector
I am unsure why this is occurring, as when I run a test such as:
image_list2 <- list.files("D:/R Exports/06 Graphics")
image_test <- paste0("D:/R Exports/06 Graphics/",image_list2[6])
image_test2 <- image_read(image_test)
the image is read as intended. What exactly am I missing here? Is this a limitation of the magick package, or am I missing something in my code?
Thanks in advance,
Jim
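A likely culprit, sketched below (not tested against this exact setup): image.select is a list, and single-bracket indexing image.select[i] returns a one-element list rather than a character string, which is exactly the kind of input that triggers magick's "path must be URL, filename or raw vector" error. Double brackets (or a plain character vector plus file.path()) avoid this; the directory paths are the ones from the question:

```r
# Single-bracket indexing on a list returns a list, not a string:
image.select <- list()
image.select[1] <- "D:/R Exports/06 Graphics/plot1.png"  # hypothetical file
is.list(image.select[1])         # TRUE -> image_read() would reject this
is.character(image.select[[1]])  # TRUE -> image_read() would accept this

# Sketch of the loop with [[ ]] and file.path(); note file.path() also
# supplies the "/" missing before the output file name:
# for (i in seq_along(image.list)) {
#   img <- image_read(file.path("D:/R Exports/06 Graphics", image.list[i]))
#   svg <- image_convert(img, format = "svg")
#   image_write(svg, file.path("D:/R Exports/Graphics SVG", image.list[i]))
# }
```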
I am trying to develop a script to extract information from XML files. After parsing the XML file, I use
idNodes <- getNodeSet(doc, "//compound[@identifier='101.37_1176.0998m/z']")
to subset a particular part of the document and then extract information I need using lines such as
subject <- sapply(idNodes, xpathSApply, path = './condition/sample', function(x) xmlAttrs(x)['name'])
My XML file has hundreds of identifiers like 101.37_1176.0998m/z.
It is not possible to load all of the identifiers at once, so I need to iterate through the file using getNodeSet followed by data extraction.
My script works fine if I enter the identifier manually, i.e.
idNodes <- getNodeSet(doc, "//compound[@identifier='101.37_1176.0998m/z']")
but I would like to write a function so I can use do.call to pass the function a list of identifiers.
I have tried
xtract <- function(id){
  idNodes <- getNodeSet(doc, "//compound[@identifier='id']")
}
but when I use this function, i.e.
xtract('102.91_1180.5732m/z')
or
compounds <- c("101.37_1176.0998m/z", "102.91_1180.5732m/z")
do.call("xtract", list(compounds))
it is clear that getNodeSet has not worked, i.e. there is no data to be extracted.
If I use
xtract(102.91_1180.5732m/z)
I get: Error: unexpected input in "xtract(102.91_"
Can anyone help resolve this problem?
In the function it should be
idNodes <- getNodeSet(doc, paste0("//compound[@identifier='", id, "']"))
then the following call will work
xtract('102.91_1180.5732m/z')
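To run the extraction over many identifiers, do.call isn't needed either: building the XPath with sprintf() and looping with lapply() works. A sketch, with doc and the corrected xtract() as defined above:

```r
compounds <- c("101.37_1176.0998m/z", "102.91_1180.5732m/z")

# sprintf() is vectorized, so this builds one XPath query per identifier:
xpaths <- sprintf("//compound[@identifier='%s']", compounds)
xpaths[1]  # "//compound[@identifier='101.37_1176.0998m/z']"

# With the corrected xtract() from the answer:
# results <- lapply(compounds, xtract)
```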
I am trying to automate the data ingestion process in an R script that pulls data from a directory that updates regularly.
The general framework follows this process
library(sp)
library(rgdal)
library(raster)

f1.t1.cir  <- stack("../raster/field1/f1_cir_t1.tif")
f1.t1.NDVI <- stack("../raster/field1/f1_ndvi_t1.tif")
f1.t1.RGB  <- stack("../raster/field1/f1_ndvi_t1.tif")

f1.dat <- c(f1.t1.cir, f1.t1.NDVI, f1.t1.RGB)

for (i in f1.dat){
  plotRGB(i)
}
I would like to generate each f1.t1.cir-type object from the directory directly, such that when I add a new TIFF file f1_cir_t2.tif, the R script will create an object f1.cir.t2.
I am trying to use something like
a <- list.files(path= "../raster/field1", pattern = "\\.tif$")
b <- gsub("_", "\\.", a)
for (i in a) {
  assign(get(b[which(a == i)]), stack(paste("../raster/field1/", i, sep = "")))
}
At this point, I would have all tiff files as stacked multiband raster objects in the R workspace.
I am getting the following error,
Error in get(b[(which(a == i))]) : object 'f1_t1_DSM.tif' not found
I cannot figure out if this is a get() problem or something else.
For reference:
> a
[1] "f1_t1_DSM.tif" "f1_t1_NDVI.tif"
> b
[1] "f1.t1.DSM.tif" "f1.t1.NDVI.tif"
so that much is working, I think.
Any suggestions?
@joran, great suggestion...
f1.t1 <- list()
for (i in list.files(path = "../raster/field1", pattern = "\\.tif$")){
  f1.t1[[i]] <- stack(paste("../raster/field1/", i, sep = ""))
}
Worked very well, no need to change the names.
Thank you.
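For reference, the original error was indeed a get() problem: assign() expects the target name as a string, while get() looks up an object with that name, which doesn't exist yet, hence "object 'f1_t1_DSM.tif' not found". Dropping get() makes the assign() approach work too, sketched here with the file names from the question standing in for the list.files() output:

```r
a <- c("f1_t1_DSM.tif", "f1_t1_NDVI.tif")  # stand-in for list.files() output
b <- gsub("_", ".", a)

for (i in seq_along(a)) {
  # assign() takes the target name as a string directly; no get() needed.
  # Here the assigned value is just the source filename instead of stack(...).
  assign(b[i], a[i])
}

exists("f1.t1.DSM.tif")  # TRUE
```

That said, the list approach above is generally preferable to filling the workspace with assign()-created objects.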
Good afternoon,
Thanks for helping me out with this question.
I have a list of multiple URLs that I am interested in scraping for a specific field.
At the moment, I'm using the function below to return the value I'm interested in for a specific field:
dayViews <- function(url) {
  raw <- readLines(url)
  dat <- fromJSON(raw)
  daily <- dat$daily_views$`2014-08-14`
  return(daily)
}
How do I modify this to run on a list of multiple URLs? I tried using sapply/lapply over a list of URLs, but it gives me the following message:
"Error in file(con, "r") : invalid 'description' argument"
If anyone has any suggestions, I would be greatly appreciative.
Many thanks,
Doing something similar to you, @yarbaur, I read into R an Excel spreadsheet that keeps all the URLs of a set of pages I want to scrape. It has columns for company, URL, and XPath. Then try something like the code below, where I have substituted made-up names for your variable names. I am not scraping JSON sites, however:
library(httr)  # provides GET() and content()
library(XML)   # provides getNodeSet() and xmlValue()

temp <- apply(yourspreadsheetReadintoR, 1,
              function(x) {
                yourCompanyName <- x[1]
                yourURLS <- x[2]
                yourxpath <- x[3]  # I also store the XPath expressions for each site
                fetch <- content(GET(yourURLS))
                locs <- sapply(getNodeSet(fetch, yourxpath), xmlValue)
                data.frame(coName = rep(yourCompanyName, length(locs)), location = locs)
              })
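On the original error itself: "invalid 'description' argument" from file(con, "r") typically means readLines() received something other than a single character string, for example a one-element list produced by single-bracket indexing. A sketch with hypothetical URLs:

```r
urls <- list("http://example.org/a.json", "http://example.org/b.json")  # hypothetical

is.list(urls[1])         # TRUE -> readLines(urls[1]) fails with that error
is.character(urls[[1]])  # TRUE -> readLines(urls[[1]]) would be fine

# Keeping the URLs in a character vector sidesteps the issue entirely;
# dayViews() is the function defined in the question:
urls_vec <- unlist(urls)
# daily <- lapply(urls_vec, dayViews)
```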
I have the following problem, please.
I need to read raster images recursively, stack them, and store them in a file under different names (e.g. name1.tiff, name2.tiff, ...).
I tried the following:
for (i in 10) {
  fn <- system.file("external/test.grd", package = "raster")
  fn <- stack(fn)  # not sure if this idea can work
  fnSTACK[,,i] <- fn
}
Here I expect a result of the form:
dim(fnSTACK)
[1] 115 80 10
or something like that, but it didn't work.
Actually, I have around 300 images that I have to store under different names.
The purpose is to extract time-series information (if you know another method, I would appreciate suggestions).
Any suggestions are welcomed. Thank you in advance for your time.
What I would do first is put all your *.tiff files in a single folder. Then read all their names into a list, stack them, and write a multi-layered raster. I'm assuming all the images have the same extent and projection.
### Load necessary packages
library(tiff)
library(raster)
library(sp)
library(rgdal)  # I can't recall which packages you might need, so this is probably
library(grid)   # overkill
library(car)

# function that extracts the last n characters from a string,
# without counting the last m
subs <- function(x, n = 1, m = 0){
  substr(x, nchar(x) - n - m + 1, nchar(x) - m)
}

setwd("your working directory path")  # set your wd to where all your images are
filez <- list.files()   # lists all the files in the wd
no <- length(filez)     # number of files found
imagestack <- stack()   # initialize the raster stack

for (i in 1:no){
  if (subs(filez[i], 4) == "tiff"){
    image <- raster(filez[i])  # fill up the raster stack with only the tiffs
    imagestack <- addLayer(imagestack, image)
  }
}

# write the stack
writeRaster(imagestack, filename = "output path", options = "INTERLEAVE=BAND", overwrite = TRUE)
I did not try this, but it should work.
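As a side note (an alternative sketch, not part of the answer above): the extension check can also be done with a regular expression, which handles both .tif and .tiff and avoids the subs() helper:

```r
filez <- c("f1_t1.tiff", "notes.txt", "f1_t2.tif")  # stand-in file names

# keep only files ending in .tif or .tiff
tiffs <- filez[grepl("\\.tiff?$", filez)]
tiffs  # "f1_t1.tiff" "f1_t2.tif"

# or let list.files() filter directly:
# filez <- list.files(pattern = "\\.tiff?$")
```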
Your question is rather vague; it would have helped if you had provided a full example script so it could be more easily understood. You say you need to read several (probably not recursively?) raster files and create a stack. Then you need to store them in files with different names. That sounds like copying the files to new files with different names, and there are R functions for that, but that is probably not what you intended to ask.
If you have a bunch of files (with full path names, or in the working directory), e.g. from list.files():
f <- system.file("external/test.grd", package = "raster")
ff <- rep(f, 10)
you can do
library(raster)
s <- stack(ff)
I am assuming that you simply need this stack for computations in R (it is an object, not a file). You can extract the values in many ways (see the help files and the vignette of the raster package). If you want a three-dimensional array, you can do
a <- as.array(s)
dim(a)
[1] 115 80 10
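Since the stated goal was extracting time-series information: in such an array the third dimension indexes the layers, so slicing one cell across it gives that cell's time series. A sketch using a random stand-in array with the same dimensions:

```r
a <- array(runif(115 * 80 * 10), dim = c(115, 80, 10))  # stand-in for as.array(s)

# time series of the cell in row 50, column 40 across all 10 layers:
ts_cell <- a[50, 40, ]
length(ts_cell)  # 10
```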
Thanks @JEquihua for your suggestion; I just needed to initialize the variable before addLayer, i.e.:
for (i in 1:no){
  if (subs(filez[i], 4) == "tiff"){
    image <- raster(filez[i])  # fill up the raster stack with only the tiffs
    imagestack <- addLayer(imagestack, image)
  }
}
And sorry @RobertH, I'm a newbie in R. I will try to ask more precisely next time.
Also, any suggestions for extracting data from a time series of stacked MODIS images? Or examples using the rts, ndvits, or bfast packages?
Greetings to the entire community.
Another method for stacking:
library(raster)
files <- list.files("/PATH/of/DATA/", pattern = "NDVI",
                    recursive = TRUE, full.names = TRUE)
data_stack <- stack(files)