I've been having trouble saving lightgbm boosters after fitting a model with the bonsai and tidymodels packages (as in the StackOverflow thread "How to save Tidymodels Lightgbm model for reuse"; I never managed to get that approach to work).
Reading further (and as the update in that StackOverflow thread says), I found out that the development version of the lightgbm package (version 4.0, I believe) supports using saveRDS and readRDS directly on lightgbm models.
The thing is, I don't know how to install the development version of lightgbm. Reading this thread I can see that James, who's one of the maintainers of the lightgbm package, wrote this (https://github.com/tidymodels/bonsai/issues/34):
"If you already have a local installation of LightGBM, you can re-install from the latest development version by running the following:"
REPO_DIR="${HOME}/repos"
cd "${REPO_DIR}/lgb-dev"
git checkout master
git pull origin master
git submodule update --recursive
sh build-cran-package.sh \
--no-build-vignettes
R CMD INSTALL \
--with-keep.source \
./lightgbm_*.tar.gz
My problem is that I do not understand this code. Is it supposed to be run in R? (this does not work for me). Or in the command line? Which language is this?
If anyone can help me out regarding installation of the dev version of the lightgbm package, I would be really grateful
The code you've provided is the preferred way to install the development version of {lightgbm} (the R package for LightGBM).
That can be run from any shell that understands Unix commands, as long as you have git and R installed. If you're on a Mac, you could for example use the Terminal application. If you're on a Windows machine, you could use the Git Bash shell that ships with Git for Windows.
Unfortunately, {lightgbm} cannot be installed from its git repo using R-only tools like remotes::install_github(). That is intentional, as described in https://github.com/tidymodels/bonsai/issues/34#issuecomment-1187589193.
If you are not comfortable using git and the command line, the following R-only solution could be used to install the latest development version of {lightgbm}.
library(httr)
library(jsonlite)
# working directory for the downloaded files
TEMP_DIR <- "tmp-lightgbm"
if (!dir.exists(TEMP_DIR)) {
    dir.create(TEMP_DIR)
}
# find the most recent successful CI build on the master branch
builds_response <- httr::RETRY(
    verb = "GET"
    , url = "https://dev.azure.com/lightgbm-ci/lightgbm-ci/_apis/build/builds?branchName=refs/heads/master&resultFilter=succeeded&queryOrder=finishTimeDescending&%24top=1&api-version=7.1-preview.7"
)
build_id <- jsonlite::fromJSON(
    txt = httr::content(builds_response, as = "text")
    , simplifyDataFrame = FALSE
)[["value"]][[1L]][["id"]]
# download and unpack the R-package artifact from that build
artifact_url <- paste0(
    "https://dev.azure.com/lightgbm-ci/lightgbm-ci/_apis/build/builds/"
    , build_id
    , "/artifacts?artifactName=R-package&api-version=7.1-preview.5&%24format=zip"
)
download.file(
    url = artifact_url
    , destfile = file.path(TEMP_DIR, "lightgbm-r.zip")
    , mode = "wb"  # binary mode, so the zip is not corrupted on Windows
)
unzip(
    zipfile = file.path(TEMP_DIR, "lightgbm-r.zip")
    , exdir = file.path(TEMP_DIR, "extracted")
)
# locate the CRAN-style source tarball and install it
artifacts_dir <- file.path(TEMP_DIR, "extracted", "R-package")
tarball_path <- list.files(
    path = artifacts_dir
    , pattern = "lightgbm-[0-9.]+-r-cran\\.tar\\.gz"
    , full.names = TRUE
)[[1L]]
new_tarball_path <- file.path(artifacts_dir, "lightgbm.tar.gz")
file.rename(tarball_path, new_tarball_path)
install.packages(new_tarball_path, repos = NULL, type = "source")
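After either installation route, it may be worth confirming which version R actually picks up. The following is only a minimal sketch: has_min_version() is a hypothetical helper (not part of lightgbm), and the "4.0.0" threshold is an assumption based on the version mentioned in the question.

```r
# Hypothetical helper: TRUE only if `pkg` is installed at version >= `min_version`.
has_min_version <- function(pkg, min_version) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
        return(FALSE)
    }
    utils::packageVersion(pkg) >= min_version
}

# "4.0.0" is an assumption taken from the question, not a confirmed release number.
if (has_min_version("lightgbm", "4.0.0")) {
    message("lightgbm >= 4.0.0 is installed")
} else {
    message("lightgbm is missing or older than 4.0.0")
}
```

If the check reports an older version, a different library path may be shadowing the new install; comparing .libPaths() between sessions is a reasonable next step.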
We have recently got RStudio Connect in my office. For our work, we have made custom packages, which we have updated amongst ourselves by opening the project and build+reloading.
I understand the only way I can get our custom packages to work within apps with RSConnect is to set up a local repo and set our options(repos) to include this.
Currently I have the following:
library(drat)
RepoAddress <- "C:/<RepoPath>" # High level path
drat::insertPackage(<sourcePackagePath>, repodir = RepoAddress)
# Add this new repo to Rs knowledge of repos.
options(repos = c(options("repos")$repos,LocalCurrent = paste0("file:",RepoAddress)))
# Install <PackageName> from the local repo :)
install.packages("<PackageName>")
Currently this works nicely and I can install my custom package from the local repo. This indicates to me that the local repo is set up correctly.
As an aside, I have added an extra line to the DESCRIPTION file saying repository:LocalCurrent.
However, when I try to deploy a Shiny app or Rmd which references the custom package, I get the following error on deploy:
Error in findLocalRepoForPkg(pkg, repos, fatal = fatal) :
No package '<PackageName> 'found in local repositories specified
I understand this is a problem with packrat being unable to find my local repos during the deploy process (I believe at a stage where it uses packrat::snapshot()). This is confusing, since I would have thought packrat would use my options("repos") repos, similar to install.packages. If I follow through the functions, I can see the particular point of failure is packrat:::findLocalRepoForPkg("<PackageName>", repos = packrat::get_opts("local.repos")), which fails even after I define packrat::set_opts("local.repos" = c(CurrentRepo2 = paste0("file:",RepoAddress)))
If I drill into packrat:::findLocalRepoForPkg, it fails because it can't find a file/folder called: "C://". I would have thought this is guaranteed to fail, because repos follow the C://bin/windows/contrib/3.3/ structure. At no point would a repo have the structure it's looking for?
I think this last part is showing I'm materially misunderstanding something. Any guidance on configuring my repo so packrat can understand it would be great.
One should always check which options RStudio Connect currently supports:
https://docs.rstudio.com/connect/admin/r/package-management/#private-packages
Personally, I dislike all the options for including local/private packages, as they defeat the purpose of having a nice, easy target for deploying Shiny apps. In many cases, I can't just set up local repositories in the organization because I do not have clearance for that. It is also inconvenient to have to email IT support to have them manually install new packages. Overall, I think RStudio Connect is a great product because it is simple, but when it comes to local packages it really is not.
I found a nice alternative/hack to RStudio's official recommendations. I suppose this would also work with shinyapps.io, but I have not tried. The solution goes like this:
(1) Add to global.R: if(!require(local_package)) devtools::load_all("./local_package")
(2) Write a script that copies all your source files, so that you get a Shiny app with the source directory of the local package inside it. You could call the app directory ./inst/shinyconnect/ or whatever, and the local package would be copied to ./inst/shinyconnect/local_package
(3) Add the script ./shinyconnect/packrat_sees_these_dependencies.R to the Shiny folder; this will be picked up when packrat builds the manifest.
(4) Hack rsconnect/packrat to ignore specifically named packages when building the manifest.
(1)
#start of global.R...
#load more packages for shiny
library(devtools) #needed for load_all
library(shiny)
library(htmltools) #or whatever you need
#load the already built local_package, or for Shiny Connect pseudo-build it on the fly and load
if(!require(local_package)) {
  #if the local_package source is here, just build it in 2 sec with devtools::load_all()
  if(file.exists("./DESCRIPTION")) load_all(".") #for local test on PC/Mac, where the shinyapp is inside the local_package
  if(file.exists("./local_package/DESCRIPTION")) load_all("./local_package/") #for Shiny Connect, where local_package is inside the shinyapp
}
library(local_package) #now local_package must load
(3)
Make a script that loads all the dependencies of your local package. Packrat will see this. The script will never actually be executed. Place it at ./shinyconnect/packrat_sees_these_dependencies.R
#these codelines will be recognized by packrat and package will be added to manifest
library(randomForest)
library(MASS)
library(whateverpackageyouneed)
(4) During deployment, the manifest generator (packrat) will ignore any package named in ignored_packages. This is an option in packrat, but rsconnect does not expose it. A hack is to load rsconnect into memory and modify the sub-sub-sub-function performPackratSnapshot() to allow this. In the script below, I do that and deploy a Shiny app.
library(rsconnect)
orig_fun = getFromNamespace("performPackratSnapshot", pos="package:rsconnect")
#packages you want to include manually, and packrat to ignore
ignored_packages = c("local_package")
#hijack rsconnect
local({
  assignInNamespace("performPackratSnapshot",value = function (bundleDir, verbose = FALSE) {
    owd <- getwd()
    on.exit(setwd(owd), add = TRUE)
    setwd(bundleDir)
    srp <- packrat::opts$snapshot.recommended.packages()
    packrat::opts$snapshot.recommended.packages(TRUE, persist = FALSE)
    packrat::opts$ignored.packages(get("ignored_packages",envir = .GlobalEnv)) #ignoring packages mentioned here
    print("ignoring following packages")
    print(get("ignored_packages",envir = .GlobalEnv))
    on.exit(packrat::opts$snapshot.recommended.packages(srp,persist = FALSE), add = TRUE)
    packages <- c("BiocManager", "BiocInstaller")
    for (package in packages) {
      if (length(find.package(package, quiet = TRUE))) {
        requireNamespace(package, quietly = TRUE)
        break
      }
    }
    suppressMessages(packrat::.snapshotImpl(project = bundleDir,
      snapshot.sources = FALSE, fallback.ok = TRUE, verbose = FALSE,
      implicit.packrat.dependency = FALSE))
    TRUE
  },
  pos = "package:rsconnect"
  )},
  envir = as.environment("package:rsconnect")
)
new_fun = getFromNamespace("performPackratSnapshot", pos="package:rsconnect")
rsconnect::deployApp(appDir="./inst/shinyconnect/",appName ="shinyapp_name",logLevel = "verbose",forceUpdate = TRUE)
The problem is one of nomenclature.
I have set up a repo in the CRAN sense. It works fine and is OK. When packrat references a local repo, it is referring to a local git-style repo.
This explains why findLocalRepoForPkg doesn't look like it will work: it is designed to work with a different kind of repo.
Feel free to also reach out to support@rstudio.com
I believe the local package code path is triggered in packrat because the Repository: line is missing from the DESCRIPTION file of the package. You mentioned you added this line; could you try the case-sensitive version (Repository: rather than repository:)?
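A quick way to see whether a DESCRIPTION file has the field spelled exactly as "Repository" is to read it with read.dcf() and look at the field names. This is only a sketch: has_repository_field() is a hypothetical helper, and the two throwaway files below stand in for a real package's DESCRIPTION.

```r
# Hypothetical helper: does the DESCRIPTION file contain a field named exactly "Repository"?
has_repository_field <- function(description_path) {
    fields <- colnames(read.dcf(description_path))
    "Repository" %in% fields
}

# Two throwaway DESCRIPTION files, differing only in the case of the field name.
good <- tempfile()
bad <- tempfile()
writeLines(c("Package: demo", "Version: 0.1.0", "Repository: LocalCurrent"), good)
writeLines(c("Package: demo", "Version: 0.1.0", "repository: LocalCurrent"), bad)

print(has_repository_field(good))
print(has_repository_field(bad))
```

Pointing the helper at the DESCRIPTION of the installed copy of the package (rather than the source tree) would show what packrat is likely to see at snapshot time.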
That said, RStudio Connect will not be able to install the package from the RepoAddress as you've specified it (hardcoded on the Windows share). We'd recommend hosting your repo over https from a server that both your Dev environment and RStudio Connect have access to. To make this type of repo setup much easier we just released RStudio Package Manager which you (and IT!) may find a compelling alternative to manually managing releases of your internal packages via drat.
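For illustration, pointing R at a repo served over https could look roughly like this. The internal URL is a placeholder (an assumption, not a real endpoint), and the repo names are arbitrary labels.

```r
# Placeholder URL for an internally hosted CRAN-like repo; replace with a real one.
internal_repo <- "https://packages.example.com/internal"

# Put the internal repo first so that internal packages are resolved from it.
options(repos = c(
    Internal = internal_repo,
    CRAN = "https://cloud.r-project.org"
))

print(getOption("repos"))
```

The same URL would need to be reachable from the RStudio Connect server, which is the point of not using a path on a local Windows drive.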
I am using R on my work computer inside our network. Naturally, I do not have admin rights, and it would be hard to convince my IT department to make an exception and let me install Rtools on my computer.
My main issue with not having Rtools is that I cannot use the saveWorkbook command from openxlsx, which would allow me to save data as Excel table objects.
The error message from the command implies that I could use an alternative zip application:
Please make sure Rtools is installed or a zip application is available to R
Would this be possible? Our work computers have 7-Zip, for instance.
In line with the comment of @Tung and others, I copied an Rtools folder from my private computer to my work computer. I tried the following, to no avail:
Rtools.bin="C:\\Rtools\\bin"
sys.path = Sys.getenv("PATH")
if (Sys.which("zip") == "" ) {
system(paste("setx PATH \"", Rtools.bin, ";", sys.path, "\"", sep = ""))
}
I also tried using Sys.setenv("R_ZIPCMD" = "C:/Program Files/7-Zip/7zG.exe") to use 7-Zip, but then I get the error message: Incorrect Switch postfix: -r1
I am specifically trying to replicate the writeDataTable example from openxlsx.
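One thing that may be worth checking: setx only changes the environment for processes started afterwards, so the running R session would not see the new PATH. A sketch of changing PATH for the current session instead, and of inspecting what R can see, is below. prepend_to_path() and zip_status() are hypothetical helpers, and the Rtools path is the one from the question.

```r
# Hypothetical helper: put `dir` at the front of PATH for the current R session only.
prepend_to_path <- function(dir) {
    new_path <- paste(dir, Sys.getenv("PATH"), sep = .Platform$path.sep)
    Sys.setenv(PATH = new_path)
    invisible(new_path)
}

# Hypothetical helper: report the zip-related settings visible to this session.
zip_status <- function() {
    c(
        R_ZIPCMD = Sys.getenv("R_ZIPCMD", unset = ""),
        zip_on_path = unname(Sys.which("zip"))
    )
}

prepend_to_path("C:\\Rtools\\bin")  # path taken from the question
print(zip_status())
```

If zip_on_path is still empty after this, the copied Rtools folder probably does not contain a zip executable at that location. As for the 7-Zip attempt, my understanding is that 7zG.exe is the GUI build of 7-Zip and does not accept the switches of a regular zip program, which would be consistent with the "-r1" error.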
To avoid the invalid-argument error caused by a directory name containing a space, and to set the working directory and tool path in one place, I tried to build an R package from the R console. My goal is to build the new version, "text2vec-0.4", which I downloaded from https://github.com/dselivanov/text2vec/tree/0.4
setwd("E:/packbuild/")
builder <- "D:/Program Files/R/R-3.3.1/bin/x64/Rcmd.exe"
para <- "INSTALL --build"
packname <- "text2vec-0.4"
system(paste(shQuote(builder), para, packname, sep = " "), wait = FALSE)
When I run this program, it runs “successfully” with no errors or warnings, but when I check the working directory "E:/packbuild/", no zip file has been generated and nothing new has been installed in the R library. Rtools is fully installed in “D:/Rtools/” and its default PATH entries have been set.
Can Rtools not be called from a wrapper program like this?
Or, if we use the command line and don't add the directory of “Rcmd” to the PATH, what is the right way to do it?
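With wait = FALSE, system() returns immediately and any error output from the build is easy to miss. A sketch of running the same command through system2() and keeping its output is below. install_build_args() is a hypothetical helper, and using the R binary from R.home("bin") instead of a hard-coded Rcmd.exe path is an assumption about the setup.

```r
# Hypothetical helper: arguments for `R CMD INSTALL --build <pkg_dir>`.
install_build_args <- function(pkg_dir) {
    c("CMD", "INSTALL", "--build", shQuote(pkg_dir))
}

# R binary belonging to the currently running R installation.
r_binary <- file.path(R.home("bin"), "R")
args <- install_build_args("text2vec-0.4")
print(args)

# Uncomment to actually run the build and keep its output for inspection:
# out <- system2(r_binary, args, stdout = TRUE, stderr = TRUE)
# cat(out, sep = "\n")
# attr(out, "status")  # set when the command exits with a non-zero status
```

Reading the captured output should show whether the compiler from Rtools was found at all, which the fire-and-forget call cannot tell you.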
I'm trying to run some demo R code for the optparse package that I got from R-bloggers. I am using ubuntu 14.04
The code is:
#!/usr/bin/env Rscript
library(optparse)
option_list = list(
  make_option(c("-f", "--file"), type="character", default=NULL,
              help="dataset file name", metavar="character"),
  make_option(c("-o", "--out"), type="character", default="out.txt",
              help="output file name [default= %default]", metavar="character")
);
opt_parser = OptionParser(option_list=option_list);
opt = parse_args(opt_parser);
if (is.null(opt$file)){
  print_help(opt_parser)
  stop("At least one argument must be supplied (input file).\n", call.=FALSE)
}
## program...
df = read.table(opt$file, header=TRUE)
num_vars = which(sapply(df, class)=="numeric")
df_out = df[ ,num_vars]
write.table(df_out, file=opt$out, row.names=FALSE)
If the entire script is saved in a file called yasrs.R, then the call:
Rscript --vanilla yasrs.R
should return the help messages.
I get an error:
Rscript --vanilla yasrs.R
Error in library(optparse) : there is no package called ‘optparse’
I have installed the package (optparse) through RStudio when writing the code and also ensured that it is installed when calling from the terminal. Both terminal and RStudio are running the same R version.
Any suggestions would be appreciated.
Where did RStudio install optparse? Get that from packageDescription("optparse").
Then check the output of .libPaths() in your Rscript environment and your RStudio environment. Maybe RStudio stuck it somewhere that Rscript doesn't look.
Then check whether, even though they may be the same version of R, they are two different installations. What does R.home() say in each?
One or more of these things will show you why it doesn't find it. The solution is probably to write and run a little Rscript script that installs it; then you should be fairly sure it's going to go in a location where Rscript will find it in future.
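The three checks above can be gathered into one small function and run once from RStudio and once via Rscript, then compared. r_env_report() is a hypothetical helper name used only for this sketch.

```r
# Hypothetical helper: collect the facts needed to compare two R environments.
r_env_report <- function(pkg) {
    list(
        r_home = R.home(),
        lib_paths = .libPaths(),
        pkg_location = find.package(pkg, quiet = TRUE)  # character(0) if not found
    )
}

str(r_env_report("optparse"))
```

If pkg_location is empty under Rscript but not under RStudio, the lib_paths entries will usually show which library directory is missing from the Rscript session.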
I'm distributing jobs over a cluster and I'd rather not go to each machine and manually install the right packages. The job controller runs scripts as nobody, so I have to specify uncontroversial writeable paths for the installations. I actually had this working solution:
`%ni%` = Negate(`%in%`) ### "not in"
.libPaths("/tmp/") ### for local (remote non super user) install of packages
if ("xxx" %ni% installed.packages()) {install.packages("xxx", repos = "http://cran.r-project.org", lib="/tmp/")}
# ... and more
library(xxx)
# ... and more
It worked at first, but a week later I've got a strange problem.
> library(xxx)
Error in library(xxx) : there is no package called 'xxx'
xxx (and other packages) is in the manifest of installed.packages(), .libPaths() is reporting /tmp/ on the path, and ls shows a folder for the package in /tmp/. Reinstalling with install.packages throws an error, as do remove.packages, update.packages, and find.package.
Two questions:
Is there a different way that I ought to have managed the remote install?
Any ideas what has caused my problem with the failure to load the package?
Please save me from having to implement a kludge like
locdir <- paste("/tmp/", as.integer(runif(1, 1, 100000)), sep='')
system(paste("mkdir", locdir))
.libPaths(locdir)
install.packages("xxx", repos = "http://cran.r-project.org", lib=locdir)
library(xxx)
You might need option character.only = TRUE, although it is weird that your code worked before but not anymore. Anyway, try this function:
packageLoad <- function(libName){
  # try to load the package
  if (!require(libName, character.only = TRUE)){
    # if package is not available, install it
    install.packages(libName, dependencies = TRUE,
                     repos = "http://cran.r-project.org", lib = "/tmp/", destdir = "/tmp/")
    # try again
    if (!require(libName, character.only = TRUE))
      stop(paste("Package", libName, "not found and its installation failed."))
  }
}
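On the first question (managing the install location), a dedicated library directory may be less fragile than using /tmp/ itself as the library. The following is only a sketch: use_private_lib() is a hypothetical helper, and the tempfile() path stands in for whatever job-specific directory the cluster allows.

```r
# Hypothetical helper: create `lib_dir` if needed and put it first on the library path.
use_private_lib <- function(lib_dir) {
    dir.create(lib_dir, recursive = TRUE, showWarnings = FALSE)
    .libPaths(c(lib_dir, .libPaths()))
    invisible(normalizePath(lib_dir, winslash = "/"))
}

# tempfile() gives a fresh path; a job-specific directory name would work the same way.
lib_dir <- use_private_lib(tempfile("rlib-"))
print(.libPaths()[1L])
```

With the directory first in .libPaths(), install.packages() and library() default to it, so the lib = argument no longer has to be repeated in every call.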