I am trying to use an R script, plugin-draw.R, that is hosted on GitHub. How should I use this plugin?
You can simply use source_url() from the devtools package:
library(devtools)
source_url("https://raw.github.com/tonybreyal/Blog-Reference-Functions/master/R/bingSearchXScraper/bingSearchXScraper.R")
Based on @Matifou's reply, but using the "new" method of appending ?raw=TRUE at the end of your URL:
devtools::source_url("https://github.com/tonybreyal/Blog-Reference-Functions/blob/master/R/bingSearchXScraper/bingSearchXScraper.R?raw=TRUE")
You can use the solution offered on R-Bloggers:
source_github <- function(u) {
  # load package
  require(RCurl)
  # read script lines from website
  script <- getURL(u, ssl.verifypeer = FALSE)
  # parse lines and evaluate them
  eval(parse(text = script))
}
source_github("https://raw.github.com/tonybreyal/Blog-Reference-Functions/master/R/bingSearchXScraper/bingSearchXScraper.R")
For the code to be evaluated in the global environment (I'm guessing you will prefer this solution), you can use:
source_https <- function(u, unlink.tmp.certs = FALSE) {
  # load package
  require(RCurl)
  # read script lines from website using a security certificate
  if (!file.exists("cacert.pem")) download.file(url = "http://curl.haxx.se/ca/cacert.pem", destfile = "cacert.pem")
  script <- getURL(u, followlocation = TRUE, cainfo = "cacert.pem")
  if (unlink.tmp.certs) unlink("cacert.pem")
  # parse lines and evaluate in the global environment
  eval(parse(text = script), envir = .GlobalEnv)
}
source_https("https://raw.github.com/tonybreyal/Blog-Reference-Functions/master/R/bingSearchXScraper/bingSearchXScraper.R")
source_https("https://raw.github.com/tonybreyal/Blog-Reference-Functions/master/R/htmlToText/htmlToText.R", unlink.tmp.certs = TRUE)
As mentioned in the original article by Tony Breyal, this discussion on SO should also be credited, as it is relevant to the question discussed here.
If the GitHub page offers a Raw button (next to Blame), you can actually just use ordinary base::source. Go to the R script of your choice and click the Raw button.
The resulting URL will contain raw.githubusercontent.com, and the page shows nothing but the R script itself. Then, for this example,
source(
  paste0(
    "https://raw.githubusercontent.com/betanalpha/knitr_case_studies/master/",
    "stan_intro/stan_utility.R"
  )
)
(paste0 was used just to fit the URL into a narrower screen.)
I have been mainly working with .xlsb files (the binary counterpart of .xlsx), which I would like to read/write using R. Could you please let me know if there is a package available for this, or do I need to create one on my own?
RODBC did not work either.
Try the excel.link package. The xl.read.file function allows rectangular data sets to be read in, though there are other options available.
You also need to (install and) load the RDCOMClient package before running the first excel.link function.
e.g.,
read_xlsb <- function(x, sheet) {
  # RDCOMClient must be attached before the first excel.link call
  require("RDCOMClient")
  message(paste0("Reading ", x, "...\n"))
  # sheet: name of the worksheet to read
  df <- excel.link::xl.read.file(filename = x, header = TRUE,
                                 xl.sheet = sheet)
  df$filename <- x
  df <- as.data.frame(df)
  return(df)
}
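A minimal call might then look like this (the file path and worksheet name below are placeholders, not from the original question):
# hypothetical workbook path and worksheet name -- replace with your own
df <- read_xlsb("C:/data/report.xlsb", sheet = "Sheet1")
head(df)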
The only annoyance I've found is that I can't override Excel's "save on close" behaviour, so these pop-ups need to be closed by hand.
BTW, I think excel.link only works on Windows machines.
I am trying to extract URLs from the website below. The tricky thing is that the website automatically loads new pages as you scroll. I did not manage to get an XPath that scrapes all URLs, including those on the newly loaded pages; I only get the first 15 URLs (of more than 70). I assume the XPath in the last line (new_results ...) is missing some crucial element that would also account for the later pages. Any ideas? Thank you!
# load packages
library(rvest)
library(httr)
library(RCurl)
library(XML)
library(stringr)
library(xml2)
# aim: download all speeches stored at:
# https://sheikhmohammed.ae/en-us/Speeches
# first, create vector which stores all urls to each single speech
all_links <- character()
new_results <- "/en-us/Speeches"
signatures = system.file("CurlSSL", cainfo = "cacert.pem", package = "RCurl")
options(RCurlOptions = list(verbose = FALSE, capath = system.file("CurlSSL", "cacert.pem", package = "RCurl"), ssl.verifypeer = FALSE))
while (length(new_results) > 0) {
  new_results <- str_c("https://sheikhmohammed.ae", new_results)
  results <- getURL(new_results, cainfo = signatures)
  results_tree <- htmlParse(results)
  all_links <- c(all_links, xpathSApply(results_tree, "//div[@class='speech-share-board']", xmlGetAttr, "data-url"))
  new_results <- xpathSApply(results_tree, "//div[@class='speech-share-board']//after", xmlGetAttr, "data-url")
}
# or, alternatively, with phantomjs (here too, it only loads the first 15 urls):
url <- "https://sheikhmohammed.ae/en-us/Speeches#"
# write out a script phantomjs can process
writeLines(sprintf("var page = require('webpage').create();
page.open('%s', function () {
console.log(page.content); //page source
phantom.exit();
});", url), con="scrape.js")
# process it with phantomjs
write(readLines(pipe("phantomjs scrape.js", "r")), "scrape.html")
Running the JavaScript for the lazy loading in RSelenium, or in Selenium from Python, would be the most elegant approach to the problem. As a less elegant but faster alternative, you can manually change the settings of the JSON query in the Firefox developer tools (network panel) to load not just 15 but all speeches at once. This worked fine for me, and I was able to extract all the links via the JSON response.
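For reference, here is a rough sketch of the RSelenium route, assuming a local driver can be started and that scrolling to the bottom triggers the lazy loading; the number of scrolls and the wait time are guesses, and the data-url selector is taken from the question:
library(RSelenium)
library(rvest)

rD <- rsDriver(browser = "firefox", verbose = FALSE)
remDr <- rD$client
remDr$navigate("https://sheikhmohammed.ae/en-us/Speeches")

# scroll repeatedly so the page keeps appending new speeches
for (i in 1:20) {
  remDr$executeScript("window.scrollTo(0, document.body.scrollHeight);")
  Sys.sleep(2)  # give each new batch time to load
}

# parse the fully loaded page and pull the data-url attributes
page <- read_html(remDr$getPageSource()[[1]])
links <- html_attr(html_nodes(page, xpath = "//div[@class='speech-share-board']"), "data-url")

remDr$close()
rD$server$stop()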
I have created some functions in R, and whenever I need any of them I have to re-create it. Please suggest the steps so that I can use these functions directly in any R session without recreating them.
While Carl's answer is acceptable, I personally think that this is exactly the situation where you should package your functions and simply call them as a library.
There are very good reasons to do this:
Documentation (with emphasis!)
Tests
Easy loading (library(mypackage))
Easy to share and portable across systems
Easy to use within reporting (Rmd/knitr)
Reduces potential for duplication
Learning the R package system will be a strong part of your toolbox and other benefits of organizing your code appropriately will become apparent.
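As a rough sketch of the workflow (the package name "myfunctions" and the example function are placeholders):
# one-time scaffolding
# install.packages(c("devtools", "usethis"))
usethis::create_package("~/myfunctions")

# put each function in the R/ directory, e.g. in R/hello.R:
#   #' Say hello
#   #' @param name who to greet
#   #' @export
#   hello <- function(name) paste("Hello,", name)

# generate documentation, install, and use from any session
devtools::document("~/myfunctions")
devtools::install("~/myfunctions")
library(myfunctions)
hello("world")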
I have a series of functions that I need across all sessions. The trick is to add them to your .First function (in your Rprofile) so that they are sourced into every session globally.
A helper function to find the file that holds your .First:
find.first <- function(edit = FALSE, show_lib = TRUE) {
  candidates <- c(Sys.getenv("R_PROFILE"),
                  file.path(Sys.getenv("R_HOME"), "etc", "Rprofile.site"),
                  Sys.getenv("R_PROFILE_USER"),
                  file.path(getwd(), ".Rprofile"))
  first_hit <- Filter(file.exists, candidates)
  if (show_lib & !edit) {
    return(first_hit)
  } else {
    file.edit(first_hit)
  }
}
Say the scripts you use everywhere are in '/mystuff/R'.
# Pop open the first Rprofile file.
find.first(edit = TRUE)
You will see something like this:
##Emacs please make this -*- R -*-
## empty Rprofile.site for R on Debian
##
## Copyright (C) 2008 Dirk Eddelbuettel and GPL'ed
##
## see help(Startup) for documentation on ~/.Rprofile and Rprofile.site
# ## Example of .Rprofile
# options(width=65, digits=5)
# options(show.signif.stars=FALSE)
# setHook(packageEvent("grDevices", "onLoad"),
# function(...) grDevices::ps.options(horizontal=FALSE))
# set.seed(1234)
#.First <- function(){}
#
#
Edit the function to something like:
.First <- function() {
  all_my_r <- list.files('/mystuff/R', full.names = TRUE,
                         recursive = TRUE, pattern = "\\.R$")
  lapply(all_my_r, function(i) {
    tryCatch(source(i), error = function(e) NULL)
  })
}
Save the file. Then restart the session.
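After restarting, you can quickly check that the sourcing worked; my_helper below is a placeholder for any function defined in one of your files under /mystuff/R:
exists("my_helper")            # TRUE in every new session if sourcing worked
head(ls(envir = globalenv()))  # the sourced functions show up here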
I am getting an error from fread:
Internal error: ch>eof when detecting eol
when trying to read a csv file downloaded from an https server, using R 3.2.0. I found something related on Github, https://github.com/Rdatatable/data.table/blob/master/src/fread.c, but don't know how I could use this, if at all. Thanks for any help.
Added info: the data was downloaded from here:
fileURL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06pid.csv"
then I used
download.file(fileURL, "Idaho2006.csv", method = "Internal")
The problem is that download.file doesn't work with https with method = "internal" unless you're on Windows and set an option. Since fread uses download.file when you pass it a URL rather than a local file, it fails too. You have to download the file manually, then read it from the local copy.
If you're on Linux, or already have wget or curl installed, use method = "wget" or method = "curl" instead.
If you're on Windows and don't have either, and don't want to download them, then run setInternet2(use = TRUE) before your download.file call.
http://www.inside-r.org/r-doc/utils/setInternet2
For example:
fileURL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06pid.csv"
tempf <- tempfile()
download.file(fileURL, tempf, method = "curl")
DT <- fread(tempf)
unlink(tempf)
Or
fileURL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06pid.csv"
tempf <- tempfile()
setInternet2(use = TRUE)
download.file(fileURL, tempf)
DT <- fread(tempf)
unlink(tempf)
fread() now uses the curl package for downloading files, and this seems to work just fine at the moment:
require(data.table) # v1.9.6+
fread(fileURL, showProgress = FALSE)
In my experience, the easiest way to fix this problem is to just remove the s from https. Also remove the method argument; you don't need it. My OS is Windows, and I have tried the following code and it works.
fileURL <- "http://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06pid.csv"
download.file(fileURL, "Idaho2006.csv")
I am trying to do some data.table manipulations in an Rmd file. The file works just fine with knit. However, when I run it through easyHtmlReport, it doesn't work: my data.table by-expressions fail with ‘Error: object 'userId' not found’, where userId is one of the columns of the data.table that I am using in the j expression. The broken expression is:
expt.daystat <- expt.users[, list(count = length(userId)),
                           keyby = list(day, status)]
As I said, it works fine in plain knit but breaks in easyHtmlReport.
@Ramnath is correct. Line 40 of EasyHTMLReport.R is:
knit(input=f,output=md.file)
Update this line with:
knit(input = f, output = md.file, envir = envir)
Update the signature of the function from:
easyHtmlReport <-
function(rmd.file,from,to,subject,headers=list(),control=list(),
markdown.options=c("hard_wrap","use_xhtml","smartypants"),
stylesheet="", echo.disable=TRUE, is.debug=F){
to:
easyHtmlReport <-
function(rmd.file,from,to,subject,headers=list(),control=list(),
markdown.options=c("hard_wrap","use_xhtml","smartypants"),
stylesheet="", echo.disable=TRUE, is.debug=FALSE, envir = parent.frame()){
If you don't want to rebuild the package you should be able to make this change using the edit function.
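For instance, assuming the package is called EasyHTMLReport, utils::fixInNamespace opens the function in an editor and patches the loaded namespace for the current session only (a sketch, untested against the package):
library(EasyHTMLReport)
# opens easyHtmlReport in an editor; apply the two changes above and save --
# the patched version is used until the package is reloaded
fixInNamespace("easyHtmlReport", ns = "EasyHTMLReport")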
I wanted to post the alternative solution that I ended up using. It uses mailR, which allows multiple recipients and makes sending HTML easy, without worrying about mime_part commands.
library(mailR)
library(markdown)
library(knitr)
from <- "me#me.com"
to <- "me#me.com"
subject <- "Test"
message <- markdownToHTML(knit("Test.Rmd"))
send.mail(from, to, subject, message, html = TRUE, smtp = list(host.name = "smtp.test.com"))