ReadError: file could not be opened successfully. But I am not sure where the tar file is stored to resolve this - bert-language-model

I am using biobert-embeddings==0.1.2 and torch==1.2.0 versions to embed some documents. But, I get the following error when I try to load the model by
from biobert_embedding.embedding import BiobertEmbedding
biobert = BiobertEmbedding()
Output/Error I get is -
Extracting biobert model tar.gz
ReadError: file could not be opened successfully

I was also having the same issue. Please follow the below steps to run the model:
Download the model from the link https://www.dropbox.com/s/hvsemunmv0htmdk/biobert_v1.1_pubmed_pytorch_model.tar.gz?dl=0
Extract all the files from the downloaded tar.gz file.
Use code:
biobert = BiobertEmbedding(model_path = "location_you_installed")
Note: Please make sure "location_you_installed" has config.json, pytorch_model.bin, and vocab.text files. These files are obtained after step 2.

Related

Downloading and unzipping GitHub zipped files directly in R

I am trying to download and unzip a folder of files from GitHub into R. I can manually download the file at https://github.com/dylangomes/SO/blob/main/Shape.zip and then extract all files in working directory, but I'd like to work directly from R.
utils::unzip("https://github.com/dylangomes/SO/blob/main/Shape.zip")
# Warning message:
# In utils::unzip("https://github.com/dylangomes/SO/blob/main/Shape.zip", :
# error 1 in extracting from zip file
It says it is a warning message, although nothing has been downloaded or unzipped into my wd.
I can download the file to my machine:
utils::download.file("https://github.com/dylangomes/SO/blob/main/Shape.zip")
But I get the same message with the unzip function:
utils::unzip("Shape.zip")
And the downloaded file cannot manually be extracted. Here, I get the error that the compressed folder is empty. The unzip line works on the manually downloaded .zip file, which tells me something is wrong with the download.file line.
So if I add raw=TRUE to the end (which can make a difference in downloading data from GitHub):
utils::download.file("https://github.com/dylangomes/SO/blob/main/Shape.zip?raw=TRUE","Shape.zip")
utils::unzip("Shape.zip")
I get a different warning with, similarly, nothing being executed:
Warning message:
In utils::unzip("Shape.zip") : internal error in 'unz' code
I have tried most of the answers at Using R to download zipped data file, extract, and import data, but they appear to be for single files that are zipped and aren't helping here. I've tried the answers at r function unzip error 1 in extracting from zip file, which mentions the same warning message I am getting, but none of the solutions work in this case.
Any idea of what I am doing wrong?
You need to use:
download.file(
"https://github.com/dylangomes/SO/blob/main/Shape.zip?raw=TRUE",
"Shape.zip",
mode = "wb"
)
Without the query string ?raw=TRUE you are downloading the webpage and not the file.
(For Windows) R will use mode = "wb" by default when it detects from the end of the URL that certain file formats, including .zip, are being downloaded. However, the URL finishing with a query string instead of a file format means the check fails so you need to set the mode explicitly.

How can I upload a genepop file? Keep getting error message

I am trying to analyze microsatellite data and am trying to use the Genepop program.
I made a txt document with my data but when I try to run the program I get this message:
There is no such file in the package 'extdata' directory
I have the working directory set to the right file.

Set wd in RStudio

I am creating a series of r scripts that will be used by multiple people, meaning that the working directory of files used and stored will differ. There are two folders, one for the R code, called "rcode," and another to store the generated outputs, called "data". These two folders will always be shared in tandem. To accommodate for the changing working directory I created a "global" script that has the following lines of code and resides in the "rcode" folder:
source_path = rstudioapi::getActiveDocumentContext()$path
setwd(dirname(source_path))
swd_data <- paste0("..\\data\\")
The first line gets the source path of the global script. The second line makes this the working directory. The third line essentially tells the script to store an output in the "data" folder, which has the same path as the "rcode" folder. So to read in a csv file within the "data" folder I write:
old_total_demand <- read.csv(paste0(swd_data, "boerne_total_demand.csv"))
When I use this script on my Windows laptop it works beautifully, but when I use it on my Mac I get the following error:
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
cannot open file '..\data\demand\boerne_total_demand.csv': No such file or directory
Would anyone have any idea why this would be? Thanks in advance for the help.
I'm not sure what systems your collaborators will be using, but you may run into issues due to differences between Window/Mac/Linux with regards to how paths are written. I suggest you create a R Project .Rprj using RStudio and save that in your directory that contains subdirectories for data and rcode, and share the entire project directory.
/Project_dir/MyProject.Rprj
/Project_dir/data/
/Project_dir/rcode/
Then from the R project opened through RStudio you should be able to directly refer to your data by:
data <- read.csv("data/boerne_total_demand.csv")
The working directory will always be where your .Rproj is stored, so you can avoid having to setwd as it causes lots of chaos when sharing and collaborating with others.
I have this code from my current script at hand.
I hope you like it !
path <- dirname(getActiveDocumentContext()$path)
setwd(path)
swd_path <- paste0(path,"/data/")
if(!dir.exists(swd_path)){
dir.create(swd_path)
}
old_total_demand <- read.csv(paste0(swd_data, "boerne_total_demand.csv"))

Load h5 keras model file in R

I'm building a R package for binary classification and I'm using opencpu to host it. Currently I've saved the h5 file as .RData file(serialized), which is then loaded in the environment using the .onLoad() function in R. This enables the R script to use the environment variable to load keras model using keras::unserialized_model().
I've tried directly using keras::load_model_hdf5() in the code, but after building and deploying on opencpu, when I try to hit the prediction API, I get error
ioerror: unable to open file (unable to open file: name = '/home/modelfile_26feb.h5', errno = 13, error message = 'permission denied', flags = 0, o_flags = 0)
I have changed permission for the file(777) and even the groups but still getting the error.
I even tried putting the file in inst/extdata folder so that it gets in the package but still same error.
Can anyone help on this, or suggest some alternative to load the h5 model directly?
Which OS does OpenCPU run on? Why does it try to write in /home/, this is very unusual? The best solution is to adapt your code to write in getwd() or tempdir(). Even better is to store data in a local database or redis server and let R read it from there, so you don't need disk access at all.
If you run on Ubuntu Server, reading from /home/ is not permitted by default. If you want to allow this, you need to add apparmor rules, see section 3.5 of the server manual.
Some relevant topics from the opencpu mailing list:
write in home dir: https://groups.google.com/d/msg/opencpu/5vRvgSKY-qE/4xMzZCGJBAAJ
keras in opencpu: https://groups.google.com/d/msg/opencpu/HhRzFVVFdaA/n5Nu1sxyFgAJ
write tmp folder: https://groups.google.com/d/msg/opencpu/Y1tYhaQUzwU/ubSEd_CDCgAJ

Publishing AzureML Webservice from R requires external zip utility

I want to deploy a basic trained R model as a webservice to AzureML. Similar to what is done here:
http://www.r-bloggers.com/deploying-a-car-price-model-using-r-and-azureml/
Since that post the publishWebService function in the R AzureML package was has changed it now requires me to have a workspace object as first parameter thus my R code looks as follows:
library(MASS)
library(AzureML)
PredictionModel = lm( medv ~ lstat , data = Boston )
PricePredFunktion = function(percent)
{return(predict(PredictionModel, data.frame(lstat =percent)))}
myWsID = "<my Workspace ID>"
myAuth = "<my Authorization code"
ws = workspace(myWsID, myAuth, api_endpoint = "https://studio.azureml.net/", .validate = TRUE)
# publish the R function to AzureML
PricePredService = publishWebService(
ws,
"PricePredFunktion",
"PricePredOnline",
list("lstat" = "float"),
list("mdev" = "float"),
myWsID,
myAuth
)
But every time I execute the code I get the following error:
Error in publishWebService(ws, "PricePredFunktion", "PricePredOnline", :
Requires external zip utility. Please install zip, ensure it's on your path and try again.
I tried installing programs that handle zip files (like 7zip) on my machine as well as calling the utils library in R which allows R to directly interact with zip files. But I couldn't get rid of the error.
I also found the R package code that is throwing the error, it is on line 154 on this page:
https://github.com/RevolutionAnalytics/AzureML/blob/master/R/internal.R
but it didn't help me in figuring out what to do.
Thanks in advance for any Help!
The Azure Machine Learning API requires the payload to be zipped, which is why the package insists on the zip utility being installed. (This is an unfortunate situation, and hopefully we can find a way in future to include a zip with the package.)
It is unlikely that you will ever encounter this situation on Linux, since most (all?) Linux distributions includes a zip utility.
Thus, on Windows, you have to do the following procedure once:
Install a zip utility (RTools has one and this works)
Ensure the zip is on your path
Restart R – this is important, otherwise R will not recognize the changed path
Upon completion, the litmus test is if R can see your zip. To do this, try:
Sys.which("zip")
You should get a result similar to this:
zip
"C:\\Rtools\\R-3.1\\bin\\zip.exe"
In other words, R should recognize the installation path.
On previous occasions when people told me this didn’t work, it was always because they thought they had a zip in the path, but it turned out they didn’t.
One last comment: installing 7zip may not work. The reason is that 7zip contains a utility called 7zip, but R will only look for a utility called zip.
I saw this link earlier but the additional clarification which made my code not work was
1. Address and Path of Rtools was not as straigt forward
2. You need to Reboot R
With regards to the address - always look where it was installed . I also used this code to set the path and ALWAYS ADD ZIP at the end
##Rtools.bin="C:\\Users\\User_2\\R-Portable\\Rtools\\bin"
Rtools.bin="C:\\Rtools\\bin\\zip"
sys.path = Sys.getenv("PATH")
if (Sys.which("zip") == "" ) {
system(paste("setx PATH \"", Rtools.bin, ";", sys.path, "\"", sep = ""))
}
Sys.which("zip")
you should get a return of
" C:\\RTools|\bin\zip"
From looking at Andrie's comment here: https://github.com/RevolutionAnalytics/AzureML/commit/9cf2c5c59f1f82b874dc7fdb1f9439b11ab60f40
Implies we can just download RTools and be done with it.
Download RTools from:
https://cran.r-project.org/bin/windows/Rtools/
During installation select the check box to modify the PATH
At first it didn't work. I then tried R32bit, and that seemed to work. Then R64 bit started working again. Honestly, not sure if I did something in the middle to make it work. Only takes a few minutes so worth a punt.
Try the following
-Download the Rtools file which usually contains the zip utility.
-Copy all the files in the "bin" folder of "Rtools"
-Paste them in "~/RStudio/bin/x64" folder

Resources