Accessing Python functions from a large Git repository in R

My company has a large Git repository that is actively being developed (call it really-big-repo). I would like to use the reticulate package to call some of its Python functions from R. Apologies in advance, as I'm still trying to get my arms around both Python and Git.
Single Python files can be sourced using reticulate::source_python("flights.py") (see here). However, the Python script I would like to source imports modules from other parts of the repository. For example, the file I would like to source, great_python_functions.py, looks like this:
import datetime
import json
import re
import requests
from bs4 import BeautifulSoup
from SomeRepoDirectory import utils
from SomeRepoDirectory.entity.models import Entity, EntityAlias, EntityBase, Subsidiary
import SomeRepoDirectory.entity.wikipedia
from SomeRepoDirectory.io.es import es_h
...
To further complicate it, the repo is pretty large and this file is just a small part of it. I'm not sure it is wise to load ALL of the repo's functions into my R environment.
And one more bonus question: I really would like to access the functions that are on a branch of the repo, not master. I've cloned the repository to a different directory than my R project using git clone git@github.com:my-company-name/really-big-repo.git

Here are the steps you can try, but I have to say this is going to be complicated; might I say learning Python might be easier :p
Like you said you have cloned the repository:
cd ./cloned_repo
conda activate your_virtual_env
git checkout feature/branch
python setup.py develop # this will install the pkg in virtual env, you can also use install instead of develop
Now in R, use the virtual environment in which you installed the repo. In my example I am using a conda env, so you can run reticulate::use_condaenv('your_virtual_env'), and then you should be able to use those functions.
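Before switching back to R, it can help to confirm from the same environment that the repo's package actually resolves. A minimal sketch (the `is_importable` helper name is my own; `SomeRepoDirectory` is the package name from the question):

```python
# Sanity check: run this inside the environment you just installed the
# repo into, before pointing reticulate at it.
import importlib.util

def is_importable(name: str) -> bool:
    """True if `name` resolves to an importable module or package here."""
    return importlib.util.find_spec(name) is not None

if __name__ == "__main__":
    # After `python setup.py develop`, this should print True:
    print(is_importable("SomeRepoDirectory"))
```

If this prints False, reticulate will hit the same import errors, so it is worth fixing on the Python side first.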
Also, in my experience, intermingling Python and R has caused a lot of pain in production development, especially with package management. So I will advise some caution.

Related

I am trying to work with Excel files with Aspose.Cells so that I can encrypt them, but it's not working because of Java.

import jpype
import asposecells
jpype.startJVM()
from asposecells.api import *
For this code, I get the following error:
JVMNotFoundException: No JVM shared library file (jvm.dll) found. Try setting up the JAVA_HOME environment variable properly.
I am doing this through Anaconda, in a Jupyter notebook. I am trying to get the Workbook from Aspose.Cells.
It seems to be a configuration issue. Try to set up your environment properly. Make sure to install Java and set the JAVA_HOME and Path environment variables accordingly; see the document on how to set up the environment and install Aspose.Cells for Python via Java for your reference. You may also post your queries in the dedicated section.
PS. I am working as Support developer/ Evangelist at Aspose.
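Before calling jpype.startJVM(), you can check from Python whether JAVA_HOME is set and points to a real directory; a small sketch (the `find_java_home` helper name is my own):

```python
import os

def find_java_home():
    """Return JAVA_HOME if it is set and points to an existing directory,
    otherwise None. A missing or invalid value is the usual cause of
    JVMNotFoundException."""
    path = os.environ.get("JAVA_HOME")
    if path and os.path.isdir(path):
        return path
    return None
```

Calling this before jpype.startJVM() lets you fail early with a clear message instead of the opaque "No JVM shared library file" error.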

Error while importing pandas in R via reticulate

I am using R and I want to use a function I wrote in Python which needs to import pandas. Hence, I use the following code in R:
library(reticulate)
reticulate::py_install("pandas", force = TRUE)
which runs with no issues. Also, I already installed pandas in Python. Nevertheless, when I run the script which imports pandas:
source("script_with_pandas.py")
I get the following error:
Error in py_run_file_impl(file, local, convert) :
ImportError: C extension: No module named 'pandas._libs.interval' not built. If you want to import pandas from the source directory, you may need to run 'python setup.py build_ext --force' to build the C extensions first.
Any idea how to solve this?
Try
reticulate::source_python("script_with_pandas.py")
But I'm pretty sure this is an environment issue. If you're using RStudio >= v1.4, you can go to Tools --> Global Options --> Python interpreter and check which one you're using; that may be the problem. Aside from that, I think it's only a matter of installing the pandas package into the right environment.
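A quick way to confirm which interpreter and pandas installation reticulate is actually picking up is to run a small diagnostic through it (e.g. with `reticulate::py_run_string()`); a sketch, with `interpreter_info` as a made-up helper name:

```python
# Minimal diagnostic: report which Python interpreter is running and
# where a package would be imported from. Run it from R via
# reticulate::py_run_string() to see which environment is active.
import importlib.util
import sys

def interpreter_info(package: str = "pandas") -> dict:
    spec = importlib.util.find_spec(package)
    return {
        "executable": sys.executable,        # the interpreter in use
        "prefix": sys.prefix,                # its environment root
        "package_found": spec is not None,   # is the package visible?
        "package_origin": spec.origin if spec else None,
    }

if __name__ == "__main__":
    print(interpreter_info())
```

If `executable` points at a different environment than the one where you installed pandas, that mismatch is the source of the C-extension import error.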

Install a R package permanently in Google Colab

I am using the -idefix- R package and I do not want to install it every time I log into Google Colab. Is there any way of installing it permanently? Will it also be installed for other people if I share the notebook?
Thank you :)
As you would on a local computer, copy the source R library to the target location. See some instructions in this blog (atusy.net).
Here are two Colab notebooks that reproduce exporting and importing the R library.
Colab notebook: export local library
Colab notebook: import local library
Here are some minimal snippets from this I/O process.
Open a Colab notebook in Python:
# activate R magic
%load_ext rpy2.ipython
This loads the R magic so R cells can run in the notebook.
%%R
install.packages('tidymodels')
tar("library.tar.gz", "/usr/local/lib/R/site-library")
Install the tidymodels package, then archive your library of installed packages.
from google.colab import drive
drive.mount('/content/drive')
Connect your Google Drive and copy the archive there for future use.
%cp library.tar.gz drive/MyDrive/src/
drive/MyDrive/src/ is the path I chose; you can use another.
Next, you use this library in another or new notebook.
from google.colab import drive
drive.mount('/content/drive')
Connect your Google Drive.
%cp drive/MyDrive/src/library.tar.gz .
Copy it into your working directory.
!tar xf library.tar.gz
Extract the installed packages from the zipped file.
.libPaths('usr/local/lib/R/site-library/')
Update the library search path and put this directory first.
library(tidymodels)
Check that the package can be reused.
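For reference, the shell `tar`/`cp` steps above can be sketched with Python's standard `tarfile` module (the paths mirror the answer and are placeholders for your own setup):

```python
# Sketch of the archive round-trip used above, with Python's tarfile
# module instead of shell `tar`. Paths are placeholders.
import os
import tarfile

def pack_library(lib_dir: str, archive: str) -> None:
    """Bundle an R site-library directory into a gzipped tarball,
    storing it under its base name (so it extracts relative to cwd)."""
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(lib_dir, arcname=os.path.basename(lib_dir))

def unpack_library(archive: str, dest: str = ".") -> None:
    """Restore the bundled library into `dest`."""
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest)

# Usage in the export notebook:
#   pack_library("/usr/local/lib/R/site-library", "library.tar.gz")
# Usage in the import notebook:
#   unpack_library("library.tar.gz")
```

Because the archive stores relative paths, extracting it in the working directory recreates the library tree there, which is why the `.libPaths()` call above uses a relative path.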
As far as I understand it, each virtual machine is recycled after you close the browser window or once the session runs longer than 12 hours. There is no way to install packages such that you can access them without installing them again (to the best of my knowledge).

How do I use PyInstaller to make packaged applications smaller

How do you make packaged generated applications smaller?
I type in the terminal of Windows
PS C:\Users\Lenovo-pc> PyInstaller -w 123.py
But the resulting package is huge, and these are the main modules imported:
import time,threading,itertools
from tkinter import Label,Tk,Button,Text,END,PhotoImage,Entry,RAISED
from random import shuffle,choice
from datetime import datetime
from openpyxl import load_workbook
How do I make the resulting executable smaller?
Actually, you can't do much about that, because PyInstaller bundles every dependency alongside your application output. On the other hand, it sometimes pulls in unnecessary modules that should be excluded.
For a smaller executable output you can do two things:
Always use a virtualenv to build your app. This excludes the unnecessary packages you have installed in your main Python library, so they are ignored in the current build.
According to this, using compression decreases the size of your output executable significantly.
UPX is a free utility available for most operating systems. UPX
compresses executable files and libraries, making them smaller,
sometimes much smaller. UPX is available for most operating systems
and can compress a large number of executable file formats. See the
UPX home page for downloads, and for the list of supported executable
formats.
So bring up a virtualenv, install your external dependencies, then install UPX from here and use --upx-dir to pass the UPX directory.
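Before building, it can also help to see which of your imports carry the most weight on disk; a rough sketch (the `package_size_bytes` helper is my own, not part of PyInstaller, and is most accurate for packages that live in their own directory, like `openpyxl`):

```python
# Rough estimate of a package's on-disk footprint, to spot which
# dependencies are likely to bloat the PyInstaller bundle.
import importlib.util
import os

def package_size_bytes(name: str) -> int:
    """Sum file sizes under the directory that contains the package.
    Returns 0 if the package cannot be found."""
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.origin:
        return 0
    root = os.path.dirname(spec.origin)
    total = 0
    for dirpath, _, filenames in os.walk(root):
        for fname in filenames:
            total += os.path.getsize(os.path.join(dirpath, fname))
    return total
```

Running this over each top-level import in your script shows where the megabytes come from, so you know which dependencies are worth excluding or trimming.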

Dependency management in R

Does R have a dependency management tool to facilitate project-specific dependencies? I'm looking for something akin to Java's maven, Ruby's bundler, Python's virtualenv, Node's npm, etc.
I'm aware of the "Depends" clause in the DESCRIPTION file, as well as the R_LIBS facility, but these don't seem to work in concert to provide a solution to some very common workflows.
I'd essentially like to be able to check out a project and run a single command to build and test the project. The command should install any required packages into a project-specific library without affecting the global R installation. E.g.:
my_project/.Rlibs/*
Unfortunately, Depends: within the DESCRIPTION file is all you get, for the following reasons:
R itself is reasonably cross-platform, but that means we need this to work across platforms and OSs
Encoding Depends: beyond R packages requires encoding the dependencies in a portable manner across operating systems---good luck encoding even something as simple as 'a PNG graphics library' in a way that can be resolved unambiguously across systems
Windows does not have a package manager
AFAIK OS X does not have a package manager that mixes what Apple ships and what other Open Source projects provide
Even among Linux distributions, you do not get consistency: just take RStudio as an example, which comes in two packages (each bundling its own dependencies!) for RedHat/Fedora and Debian/Ubuntu
This is a hard problem.
The packrat package is precisely meant to achieve the following:
install any required packages into a project-specific library without affecting the global R installation
It allows installing different versions of the same packages in different project-local package libraries.
I am adding this answer even though this question is 5 years old, because this solution apparently didn't exist yet at the time the question was asked (as far as I can tell, packrat first appeared on CRAN in 2014).
Update (November 2019)
The newer R package renv has replaced packrat.
As a stop-gap, I've written a new rbundler package. It installs project dependencies into a project-specific subdirectory (e.g. <PROJECT>/.Rbundle), allowing the user to avoid using global libraries.
rbundler on Github
rbundler on CRAN
We've been using rbundler at Opower for a few months now and have seen a huge improvement in developer workflow, testability, and maintainability of internal packages. Combined with our internal package repository, we have been able to stabilize development of a dozen or so packages for use in production applications.
A common workflow:
Check out a project from github
cd into the project directory
Fire up R
From the R console:
library(rbundler)
bundle('.')
All dependencies will be installed into ./.Rbundle, and an .Renviron file will be created with the following contents:
R_LIBS_USER='.Rbundle'
Any R operations run from within this project directory will adhere to the project-specific library and package dependencies. Note that, while this method uses the package DESCRIPTION file to define dependencies, the project needn't have an actual package structure. Thus, rbundler becomes a general tool for managing an R project, whether it be a simple script or a full-blown package.
You could use the following workflow:
1) Create a script file that contains everything you want to set up, and store it in your project directory as, e.g., projectInit.R
2) Source this script from your .Rprofile (or any other file executed by R at startup) with a try statement:
try(source("./projectInit.R"), silent=TRUE)
This guarantees that even when no projectInit.R is found, R starts without an error message.
3) If you start R in your project directory, the projectInit.R file will be sourced if present, and you are ready to go.
This is from a Linux perspective, but it should work the same way under Windows and macOS.