use julia language without internet connection (mirror?) - julia

Problem:
I would like to make julia available for our developers on our corporate network, which has no internet access at all (no proxy), due to sensitive data.
As far as I understand, Julia is designed to use GitHub.
For instance julia> Pkg.init() tries to access:
git://github.com/JuliaLang/METADATA.jl
Example:
I solved this problem for R by creating a local CRAN repository (rsync) and setting up a local webserver.
I also solved this problem for python the same way by creating a local PyPi repository (bandersnatch) + webserver.
Question:
Is there a way to create a local repository for metadata and packages for julia?
Thank you in advance.
Roman

Yes, one of the benefits of using the Julia package manager is that you should be able to fork METADATA and host it anywhere you'd like (and keep a branch where you can actually check new packages before allowing your clients to update). You might be one of the first people to actually set up such a system, so expect that you will need to submit some issues (or better yet, pull requests) in order to get everything working smoothly.
See the extra arguments to Pkg.init() where you specify the METADATA repo URL.
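As a rough sketch for the old (pre-1.0) Pkg that this answer refers to, pointing Pkg at an internally hosted METADATA fork could look like the following; the internal URL and the package name are just placeholders:
# initialize ~/.julia against your internal METADATA mirror (placeholder URL)
julia -e 'Pkg.init("https://git.internal.example.com/julia/METADATA.jl.git")'
# packages are then added as usual, resolved against that mirror
julia -e 'Pkg.add("DataFrames")'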
If you want a simpler solution to manage, I would also think about a two-tier setup where you install packages on one system (connected to the internet) and then copy the resulting ~/.julia directory to the restricted system. If the packages you use have binary dependencies, you might run into problems if you don't have similar systems on both sides, or if some of the dependencies are installed globally, but Pkg.build("Pkgname") might be helpful.

This is how I solved it (for now), using the second suggestion by ivarne. I use a two-tier setup with two networks: one connected to the internet (office network) and one air-gapped network (development network).
System information: openSuSE-13.1 (both networks), julia-0.3.5 (both networks)
Tier one (office network)
installed julia on an NFS share, /sharename/local/julia.
soft linked /sharename/local/bin/julia to /sharename/local/julia/bin/julia
appended /sharename/local/bin/ to $PATH using a script in /etc/profile.d/scriptname.sh
created /etc/gitconfig on all office network machines with the following content, to work around proxy problems with GitHub's git:// protocol:
[url "https://"]
    insteadOf = git://
now every user on the office network can simply run # julia
Pkg.add("PackageName") is then used to install various packages.
The two networks are connected periodically (with certain security measures ssh, firewall, routing) for automated data exchange for a short period of time.
Tier two (development network)
installed julia on an NFS share, just as on tier one.
When the networks are connected I use a shell script with rsync -avz --delete to synchronize the .julia directory of tier one to tier two for every user.
Conclusion (so far):
It seems to work reasonably well.
As ivarne suggested, there are problems if installing a package on tier one does more than just copy files (e.g. compiles binary dependencies); such a package won't run on tier two as copied. But this can be resolved with Pkg.build("Pkgname").
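A rough sketch of that synchronization step; the host name, paths, and package name are placeholders for whatever your setup actually uses:
# run from tier one (or the bridge host) while the networks are connected
rsync -avz --delete ~/.julia/ devnet-host:.julia/
# then, on the tier-two machine, rebuild any packages with binary dependencies
julia -e 'Pkg.build("PackageName")'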

PackageCompiler.jl seems like the best tool for using modern Julia (v1.8) on secure systems. The following approach requires a build server with the same architecture as the deployment server, something your institution probably already uses for developing containers, etc.
Build a sysimage with PackageCompiler's create_sysimage() (a rough sketch of this step follows the launch script below)
Upload the build (sysimage and depot) along with the Julia binaries to the secure system
Alias a script to julia, similar to the following example:
#!/bin/bash
set -Eeu -o pipefail
# use only the bundled project and depot; never reach out to a registry
unset JULIA_LOAD_PATH
export JULIA_PROJECT=/Path/To/Project
export JULIA_DEPOT_PATH=/Path/To/Depot
export JULIA_PKG_OFFLINE=true
# forward all arguments to the real julia binary with the custom sysimage
exec /Path/To/julia -J/Path/To/sysimage.so "$@"
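And a rough sketch of the build-server side (step 1); the package names, project path, and output path are placeholders for your own:
# on the build server, inside the project you want to ship
julia --project=/Path/To/Project -e '
    using PackageCompiler
    # bake the project packages into a sysimage (package names are examples)
    create_sysimage(["DataFrames", "CSV"]; sysimage_path="/Path/To/sysimage.so")
'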
I've been able to run a research pipeline on my institution's secure system, for which there is a public version of the approach.

Related

Restrictive AppArmor Profiles in R

I developed an R shiny web application, hosted on an Ubuntu 20.04 machine and deployed via ShinyProxy. I.e. each instance of the app runs in a separate Docker container. Various directories inside the containers are mapped to directories in the host system.
The app contains a console where users can enter R code which is evaluated in the backend. Allowing users to insert executable code into the backend is always risky. Yet, it is mandatory for this application. Docker containers provide some degree of isolation, but are not a full sandboxing solution.
Therefore, I would like to use AppArmor, called via RAppArmor, to further secure the application and prevent the user from reading, writing, or executing essentially any files on disk. That is more restrictive than what the pre-defined AppArmor profiles in the RAppArmor package implement. The problem is that R will not run if I simply deny access to the entire file system; R's basic functionality requires access to various directories. However, I do not know what the most restrictive configuration that still permits running R would look like.
The setup should not allow R to read, write, or execute any files, except those needed for R to run and the functions included in a pre-defined list of packages. E.g. the user might be allowed to use functions from the gdistance package, but not the DBI package. And of course, the user may not install any packages.
Here is a much less restrictive example profile from RAppArmor:
profile r-base {
#include <abstractions/base>
#include <abstractions/nameservice>
@{PROC}/[0-9]*/attr/current r,
/bin/* rix,
/dev/tty r,
/etc/R/ r,
/etc/R/* r,
/etc/fonts/** mr,
/etc/resolv.conf r,
/etc/xml/* r,
/tmp/** rw,
/usr/bin/* rix,
/usr/lib/R/bin/* rix,
/usr/lib{,32,64}/** mr,
/usr/lib{,32,64}/R/bin/exec/R rix,
/usr/local/lib/R/** mr,
/usr/local/share/** mr,
/usr/share/** mr,
/usr/share/ca-certificates/** r,
}
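As a rough illustration of how a custom profile like this would be loaded and verified, assuming it is saved under /etc/apparmor.d/ (the file name and profile name below are placeholders):
# load (or reload) the custom profile into the kernel
sudo apparmor_parser -r /etc/apparmor.d/r-restricted
# confirm it appears among the loaded profiles
sudo aa-status | grep r-restricted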
This question fits multiple SE forums in that it requires a deeper understanding of the R programming language (Stack Overflow), the Linux (Ubuntu) operating system (Unix SE), and security issues (Security SE). Yet, because its focus lies on R and the RAppArmor package in particular, Stack Overflow is the best fit. This should be obvious from reading the question. I still include this paragraph because there are numerous trigger-happy reviewers and moderators who shoot down any question that is remotely related to another SE forum, without carefully considering their decision.

Single install Apache Karaf with failover configuration using shared disk

I'm looking to implement failover (master/slave) for Karaf. Our current server setup has two application servers that share a SAN disk, where our current Java applications are installed in a single location and can be started on either machine or on both machines at the same time.
I was looking to implement Karaf master/slave failover in a similar way (one install being shared by both app servers); however, I'm not sure this is a well-beaten path and would appreciate some advice on whether the alternatives (mentioned below) are significantly better.
Current idea for failover:
Install Karaf once on the shared SAN and set up basic file locking on this shared disk. Both application servers will effectively initiate the Karaf start script, however only one (the first) will fully start (grabbing the lock) and the second will remain in standby until it grabs the lock (if the master falls over).
The main benefit I can see from this is that I only have to manage deploying components to one Karaf installation, and I only need to manage one Karaf installation.
Alternatives:
We install Karaf in two separate locations on the shared SAN and set both up to lock on the same lock file. Each application server will then have its own Karaf instance, and thus its own start script to run. This will make our deployment slightly more complicated (2 Karaf installations to manage and deploy to).
I'd be interested if anyone can indicate any specific concerns they have with the current idea.
Note: I understand that Karaf Cellar can simplify my Karaf instance management, however we would need to undertake another round of PoCs etc. to approve our company's use of Cellar (as a separate product). It is something that I'd like to migrate to in the future.
Take a look at the documentation
This is from the documentation on how to set a lockfile for HA:
karaf.lock=true
karaf.lock.class=org.apache.karaf.main.lock.SimpleFileLock
karaf.lock.dir=<PathToLockFileDirectory>
karaf.lock.delay=10000
As described there, you can also set a start level that controls which bundles are allowed to start before the lock is acquired:
karaf.lock.level=50
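A minimal sketch of applying this to the shared install, assuming a stock Karaf layout where these properties live in etc/system.properties and the lock directory sits on the shared SAN (all paths below are placeholders):
# create the lock directory and append the failover settings to the shared install
mkdir -p /san/karaf/lock
cat >> /san/karaf/etc/system.properties <<'EOF'
karaf.lock=true
karaf.lock.class=org.apache.karaf.main.lock.SimpleFileLock
karaf.lock.dir=/san/karaf/lock
karaf.lock.delay=10000
EOF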

How to work on several machines, keep them in sync without Internet?

For quite a while now, I have been using Dropbox to sync a Git repository on several virtual machines (one Windows, one Mac, one Linux). I would commit my changes on one of the machines and Dropbox would take care of syncing the changes of the files and the repo onto the other machines.
This works very seamlessly. I code on OSX, test the code on Windows and Linux, maybe make some changes there, then commit from one of the three.
However, it has three major drawbacks:
It requires an internet connection. I frequently have to rely on my cellphone for internet connectivity, which is unreliable if I'm on a train and only good for a few hundred Mb per month.
Dropbox syncs EVERYTHING including object files, Visual Studio debug databases and just a whole lot of unnecessary stuff that does not need to be synced.
It always goes through Dropbox servers, which is fine for some minor project or some open source stuff, but I don't want to push my work files to an untrusted server.
So, how do you manage an environment like this?
Edit:
Actually, all three virtual machines live on the very same laptop, so network connections between them are not a problem. Also, I frequently code on one OS and compile on another--and go back and forth until I have found all errors. I don't want to spam the company repo with hundreds of incremental commits.
Edit 2:
To give you an idea of what I am looking for, here is a partial solution I came up with: On each machine, I created a git repository of the files I want to work with. Typically, I will start working on a bug/feature on one machine, then commit my work. On the next machine, I will call git reset origin to load the changes from the first machine, then continue working on the commit using git commit --amend. This will go back and forth a few times. Once I am done, I will finally commit the changes for real (no more amending) and start working on the next feature/bug.
However, this workflow feels cumbersome and inelegant. What I am looking for is something that results in the same output--one commit on the repo--but is created fluidly across the three machines.
You could consider setting up your own software versioning server.
Most clients for these servers have implementations on varying OS's and platforms.
But if you want to communicate between machines that are not in a LAN, you're going to need an internet connection.
The versioning server's network communication can be exposed over NAT through a gateway to the internet. You could implement security by setting up a tunnel mechanism: any client would then tunnel up to a gateway server and then communicate with the versioning server.
As for control over which files are actually versioned: I have some experience with SVN, with which you can select at the file level which files to add to versioning. The SVN client will then simply ignore the rest of the files and directories.
Edit:
Reading the edit of the original author's question:
Maybe set up a 4th virtual machine containing the versioning server. SVN isn't (by any stretch of the imagination) hard to manage (RTM). Have the three virtual machines connect to the server on the 4th. (This is of course only if it's possible to run the machines in parallel on the same host.)
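A rough sketch of that, assuming a plain svnserve setup on the fourth VM; the repository path, hostname, and repository name are placeholders:
# on the 4th VM: create a repository and serve everything under /srv/svn
svnadmin create /srv/svn/mycode
svnserve -d -r /srv/svn
# on each of the three development VMs: check out a working copy
svn checkout svn://versioning-vm/mycode ~/work/mycode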
If you can share a disk between the three, put the master repo on that. (Make sure you make backups! Especially for removable media.)
Edit: In more detail, you can have your master repo on a USB stick or a shared partition on your hard drive (as you indicate you are using multiple virtual machines on the same hardware).
To set up a private git repo, simply create an empty directory and run git init.
Assuming you are on your Ubuntu box and have an USB stick with a file system which you can read and write in all your operating systems, mounted in /media/usbgit, run this:
vnix$ mkdir /media/usbgit/mycode
vnix$ cd /media/usbgit/mycode
vnix$ git init
Initialized empty Git repository in /media/usbgit/mycode/.git/
(Given that you already have a git repo, you probably just want to clone it to the USB stick instead:
vnix$ cd /media/usbgit
vnix$ git clone /home/you/work/wherever/mycode
Initialized empty Git repository in /media/usbgit/mycode/.git/
This will now contain all the commits from the repo you pulled from.)
Now you have an empty repository which you can clone and pull from on all the boxes. Once you have the USB stick mounted, you can clone from it.
vnix$ cd
vnix$ mount | fgrep usbgit
/dev/whatever on /media/usbgit type NTFS (rw)
vnix$ git clone /media/usbgit/mycode
Initialized empty Git repository in /home/you/mycode/.git/
warning: You appear to have cloned an empty repository.
All of this is doable with SVN too (use svnadmin create to initialize a repository, and svn checkout file:///media/usbgit/mycode to check it out), but you will lose the benefits of a distributed VCS, which seem useful in your scenario.
In particular, with a distributed VCS, you can have multiple private repositories (each working directory is a repository on its own) and you can sync with and pull from your private master and a public repository e.g. on Github -- just make sure you know what you have where.
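For day-to-day use, the round trip between the machines could then look roughly like this; the paths match the earlier example and are otherwise placeholders:
# on the machine you just worked on: update the shared repo by pulling into it
cd /media/usbgit/mycode && git pull /home/you/mycode master
# on another machine (cloned from the USB stick earlier): pick up the changes
cd ~/mycode && git pull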

Alternative tools for Amazon EC2?

Amazon's official tools for interacting with EC2 are kind of clunky and a pain to deal with. I have to set up a bunch of environment variables, store separate private keys just for EC2, add extra items to my PATH, and so on. They all output tab delimited lines that are hundreds of characters long with no headings, so it's a bit of a pain to interpret them. Their instructions for setting up an SSH keypair give you one that isn't protected by a passphrase, rather than letting you use an existing keypair that you already have. The programs are all just a bit clunky and aren't very good Unix programs.
So, are there any easier to use command line tools for accessing EC2? I know there is ElasticFox, and there is their web based console, which do make the process easier, but I'm wondering if anyone else has written better command line tools for interacting with EC2.
I'm a bit late but I have a solution!
I found the same problems with the Amazon AMI tools. They're a decent reference implementation but very difficult to use, particularly when you have more than a couple of instances. I wrote a replacement command-line tool as part of another project, called Rudy, that answers most of your concerns.
The commands are more intuitive than Amazon's AMI tools:
rudy-ec2 instances -C
rudy-ec2 groups -A -p 8080 -a 11.22.33.44 group-name
rudy-ec2 volumes -C -s 100
rudy-ec2 images
...
All configuration is in a single file (~/.rudy/config).
It can output in several formats (yaml, json, csv, tsv, and of course regular text):
rudy-ec2 -f yaml snapshots
---
:awsid: snap-2457b24d
:progress: 100%
:created: "2009-05-08T15:24:17.000Z"
:volid: vol-4ee10427
:status: completed
Regarding the private keys: there are no EC2 tools that let you create a passphrase-protected private key for booting a public instance, because the API doesn't support it. However, if you create your own image, you can use your own private keys.
Here's more info:
GitHub Project
An introduction to rudy-ec2
ElasticFox is handy for most tasks. There are occasions, though, when a command line tool will be better suited. I personally use the boto library for Python. It is very easy to script all the required operations. You can also use it to upload/download files from S3. In general, I would say that a scripting language like Python or Ruby, together with an AWS library, is the best solution.
I personally use Tim Kay's Perl command line tools and haven't used original Java based API for quite some time. Excellent for UNIX environment.
Not command line, but take a look at what a free RightScale account will give you - much, much easier than command line or ElasticFox IMO.
About ec2-api-tools:
I agree that they are a bit too clunky; I particularly dislike the output of ec2-describe-instances.
I recently switched to python-boto which offers a very clean and easy to use interface to ec2.
About not being able to specify a passphrase for the ssh key generated by EC2:
That's not the case. You can change the passphrase of any ssh private key anytime, using:
ssh-keygen -p -f /path/to/keyfile
E.g.
ssh-keygen -p -f ~/.ssh/id_rsa
About uploading your own ssh key pair:
You can use ec2-import-keypair, like this:
for i in $(ec2-describe-regions|cut -f 2);do
ec2-import-keypair --region $i mykey --public-key-file ~/.ssh/id_rsa.pub
done
The example above will upload the public key in ~/.ssh/id_rsa.pub to every region under the name "mykey". Remember that each region has its own keypair.
In order to have the key installed in your ec2 instances, you'll have to pass the "-k mykey" option to ec2-run-instances.
Incidentally, uploading your own keypair is the only way to login with the same key to all the instances in all regions. If you create the keypair from the web interface, you'll have a different key in every region.
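Putting the last two pieces together, launching an instance that uses the imported key looks roughly like this; the AMI id is a made-up placeholder:
# launch an instance with the previously imported keypair (hypothetical AMI id)
ec2-run-instances ami-12345678 -k mykey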
I have an open source graphical system admin tool called EC2Dream that replaces the command line tool. It installs on windows, linux and Mac OS clients and is written in Ruby and FXRuby. See www.ec2dream.com.
Neill Turner
www.ec2dream.com
If you use Windows, try the tool linked below (part of the O2 Platform), which gives you an easy way to start and stop Amazon EC2 images. If you need to extend the tool, you can easily add new features, since it is just a C# script that is dynamically compiled and executed.
O2 Tool – Amazon EC2 Browser
Amazon EC2 Browser – Timer to Stop Instances
The problem with alternative libraries is that they are not always kept up to date, so if new features for AWS are released, you need to wait. You posted that your main problems are the bunch of environment variables, the extra items added to your PATH, etc. We had this issue at BitNami, and it is the main reason we created BitNami Cloud Tools, which ships all of the AWS command line tools together with preconfigured Java and Ruby language runtimes. You only have to download it, and everything you need will be installed in a folder without modifying your system configuration. We keep it regularly up to date.
There is an entire industry called Cloud Management which tries to solve this type of problem. Scalr and RightScale are the leaders in this sector (disclaimer: I work at Scalr).
Cloud management software is built on top of the Amazon EC2 API (and usually on other public IaaS offerings like Rackspace) and provides an improved user interface along with automation tools like backups or SSH management, as you mentioned. They don't provide easier command-line tools stricto sensu; their goal is to make interaction with Amazon EC2 easier.
Different options are available in the market:
Scalr: Scalr is available as a hosted service with a trial version.
Otherwise you can download and install the source code yourself, as it is released under the Apache 2 license.
RightScale: while they are usually considered as expensive for small businesses, they do offer a free account.
enStratus: they offer a freemium model like RightScale.

Keeping dot files synched across machines?

Like most *nix people, I tend to play with my tools and get them configured just the way that I like them. This was all well and good until recently. As I do more and more work, I tend to log onto more and more machines, and have more and more stuff that's configured great on my home machine, but not necessarily on my work machine, or my web server, or any of my work servers...
How do you keep these config files updated? Do you just manually copy them over? Do you have them stored somewhere public?
I've had pretty good luck keeping my files under a revision control system. It's not for everyone, but most programmers should be able to appreciate the benefits.
Read
Keeping Your Life in Subversion
for an excellent description, including how to handle non-dotfile configuration (like cron jobs via the svnfix script) on multiple machines.
I also use subversion to manage my dotfiles. When I login to a box my confs are automagically updated for me. I also use github to store my confs publicly. I use git-svn to keep the two in sync.
Getting up and running on a new server is just a matter of running a few commands. The create_links script just creates the symlinks from the .dotfiles folder items into my $HOME, and also touches some files that don't need to be checked in.
$ cd
# checkout the files
$ svn co https://path/to/my/dotfiles/trunk .dotfiles
# remove any files that might be in the way
$ .dotfiles/create_links.sh unlink
# create the symlinks and other random tasks needed for setup
$ .dotfiles/create_links.sh
It seems like everywhere I look these days I find a new thing that makes me say "Hey, that'd be a good thing to use Dropbox for".
Rsync is about your best solution. Examples can be found here:
http://troy.jdmz.net/rsync/index.html
I use git for this.
There is a wiki/mailing list dedicated to the topic.
vcs-home
I would definitely recommend homesick. It uses git and automatically symlinks your files. homesick track tracks a new dotfile, while homesick symlink symlinks new dotfiles from the repository into your home folder. This way you can even have more than one repository.
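A rough sketch of that workflow, assuming homesick is installed as a Ruby gem; the repository URL is a placeholder:
# install homesick and clone your dotfiles repository (placeholder URL)
gem install homesick
homesick clone https://github.com/you/dotfiles.git
# symlink the cloned dotfiles ("castle") into your home directory
homesick symlink dotfiles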
You could use rsync. It works through ssh which I've found useful since I only setup new servers with ssh access.
Or, create a tar file that you move around everywhere and unpack.
I store them in my version control system.
I use svn ... having a public and a private repository ... so as soon as I get on a server I just
svn co http://my.rep/home/public
and have all my dot files ...
I store mine in a git repository, which allows me to easily merge beyond system dependent changes, yet share changes that I want as well.
I keep master versions of the files under CM control on my main machine, and where I need to, arrange to copy the updates around. Fortunately, we have NFS mounts for home directories on most of our machines, so I actually don't have to copy all that often. My profile, on the other hand, is rather complex - and has provision for different PATH settings, etc, on different machines. Roughly, the machines I have administrative control over tend to have more open source software installed than machines I use occasionally without administrative control.
So, I have a random mix of manual and semi-automatic process.
There is netskel where you put your common files on a web server, and then the client program maintains the dot-files on any number of client machines. It's designed to run on any level of client machine, so the shell scripts are proper sh scripts and have a minimal amount of dependencies.
Svn here, too. Rsync or unison would be a good idea, except that sometimes stuff stops working and I wonder what was in my .bashrc file last week. Svn is a life saver in that case.
Now I use Live Mesh which keeps all my files synchronized across multiple machines.
I put all my dotfiles into a folder on Dropbox and then symlink them to each machine. Changes made on one machine are available to all the others almost immediately. It just works.
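For example, something along these lines on each machine; the Dropbox folder layout is just an assumption:
# keep the real files in Dropbox and point the expected locations at them
ln -s ~/Dropbox/dotfiles/.vimrc ~/.vimrc
ln -s ~/Dropbox/dotfiles/.bashrc ~/.bashrc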
Depending on your environment you can also use (fully backupped) NFS shares ...
Speaking of storing dotfiles publicly, there are
http://www.dotfiles.com/
and
http://dotfiles.org/
But it would be really painful to manually update your files, as (AFAIK) none of these services provides an API.
The latter is really minimalistic (no contact form, no information about who made/owns it, etc.).
briefcase is a tool to facilitate keeping dotfiles in git, including those with private information (such as .gitconfig).
By keeping your configuration files in a public git repository, you can share your settings with others. Any secret information is kept in a single file outside the repository (it’s up to you to back up and transport this file).
-- http://jim.github.com/briefcase
mackup
https://github.com/lra/mackup
lra/mackup is a utility for Linux & Mac systems that will sync application preferences using almost any popular shared storage provider (Dropbox, iCloud, Google Drive). It works by replacing the dot files with symlinks.
It also has a large library of hundreds of applications that are supported https://github.com/lra/mackup/tree/master/mackup/applications
