I am writing a Dockerfile to conduct an evaluation of an R package that I have written. The package is installed inside RStudio Server. My Dockerfile inherits from rocker/rstudio (FROM rocker/rstudio). I want the Dockerfile to install everything (which it does) and create 20 users (user1, user2, ..., user20) as test subjects. Each test subject should additionally have a data folder (/home/data) to contain some CSV files.
Later, when the container is run, RStudio's web interface allows the test subjects to log in using the user credentials just created. They have access to their own data, perform the tests, and generate some new files in their home directories.
I am able to create the users with RUN useradd -ms /bin/bash user1 and set the current user with USER user1, but I do not know how to ship the data files alongside the Dockerfile.
Note 1: I do not like the idea of putting the files on the host machine and sharing them with the Docker users. Instead I want them shipped with the Dockerfile and added to the image and container when the Dockerfile is built and run.
Note 2: I need to do the evaluation in a reproducible way, because I need to repeat it several times.
You can use the COPY or ADD instruction in the Dockerfile to get files into your image. Have a look here for more information.
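For example, a minimal sketch of the relevant Dockerfile part could look like this. It assumes the CSV files sit in a data/ directory next to the Dockerfile and that each user gets a copy under /home/userN/data; the password testpass is a placeholder:

FROM rocker/rstudio

# COPY pulls the files from the build context into the image at build time
COPY data/ /home/data/

# create the 20 test users and give each one a copy of the data
RUN for i in $(seq 1 20); do \
      useradd -ms /bin/bash "user$i" && \
      echo "user$i:testpass" | chpasswd && \
      mkdir -p "/home/user$i/data" && \
      cp /home/data/*.csv "/home/user$i/data/" && \
      chown -R "user$i:user$i" "/home/user$i/data"; \
    done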
I would like to use GitLab CI to compile a LaTeX article as explained in this answer on tex.stackexchange (a similar PDF generation example is shown in the GitLab documentation for artifacts). I use a special LaTeX template given by the journal editor. My LaTeX article contains figures made with the R statistical software. R and LaTeX are two large software installations with a lot of dependencies, so I decided to use two separate containers for the build: one for the statistical analysis and visualization with R, and one to compile the LaTeX document to PDF.
Here is the content of .gitlab-ci.yml:
knit_rnw_to_tex:
  image: rocker/verse:4.0.0
  script:
    - Rscript -e "knitr::knit('article.Rnw')"
  artifacts:
    paths:
      - figure/

compile_pdf:
  image: aergus/latex
  script:
    - ls figure
    - latexmk -pdf -bibtex -use-make article.tex
  artifacts:
    paths:
      - article.pdf
The knit_rnw_to_tex job, executed in the R "rocker" container, is successful, and I can download the figure artifacts from the GitLab "jobs" page. The issue in the second job, compile_pdf, is that ls figure shows me an empty folder and the LaTeX article compilation fails because of the missing figures.
It should be possible to use artifacts to pass data between jobs according to this answer and to this well explained forum post but they use only one container for different jobs. It doesn't work in my case. Probably because I use two different containers?
Another solution would be to use only the rocker/tidyverse container and install latexmk in there, but apt install latexmk fails for an unknown reason. Maybe because it has over a hundred dependencies and that is too much for GitLab CI?
The "dependencies" keyword could help according to that answer, but the artifacts are still not available when I use it.
How can I pass the artifacts from one job to the other?
Should I use cache as explained in docs.gitlab.com / caching?
Thank you for the comment, as I wanted to be sure how you do it. An example would have helped too, but I'll be generic for now (using Docker).
To run multiple containers you need the Docker executor.
To quote the documentation on it:
The Docker executor, when used with GitLab CI, connects to the Docker Engine and runs each build in a separate and isolated container, using the predefined image that is set up in .gitlab-ci.yml and in accordance with config.toml.
Workflow
The Docker executor divides the job into multiple steps:
Prepare: Create and start the services.
Pre-job: Clone, restore cache and download artifacts from previous stages. This is run on a special Docker image.
Job: User build. This is run on the user-provided Docker image.
Post-job: Create cache, upload artifacts to GitLab. This is run on a special Docker Image.
Your config.toml could look like this:
[runners.docker]
  image = "rocker/verse:4.0.0"
  builds_dir = "/home/builds/rocker"
  [[runners.docker.services]]
    name = "aergus/latex"
    alias = "latex"
From the above-linked documentation:
The image keyword
The image keyword is the name of the Docker image that is present in the local Docker Engine (list all images with docker images) or any image that can be found at Docker Hub. For more information about images and Docker Hub please read the Docker Fundamentals documentation.
In short, with image we refer to the Docker image, which will be used to create a container on which your build will run.
If you don’t specify the namespace, Docker implies library which includes all official images. That’s why you’ll see many times the library part omitted in .gitlab-ci.yml and config.toml. For example you can define an image like image: ruby:2.6, which is a shortcut for image: library/ruby:2.6.
Then, for each Docker image there are tags, denoting the version of the image. These are defined with a colon (:) after the image name. For example, for Ruby you can see the supported tags on Docker Hub. If you don't specify a tag (like image: ruby), latest is implied.
The image you choose to run your build in via image directive must have a working shell in its operating system PATH. Supported shells are sh, bash, and pwsh (since 13.9) for Linux, and PowerShell for Windows. GitLab Runner cannot execute a command using the underlying OS system calls (such as exec).
The services keyword
The services keyword defines just another Docker image that is run during your build and is linked to the Docker image that the image keyword defines. This allows you to access the service image during build time.
The service image can run any application, but the most common use case is to run a database container, e.g., mysql. It’s easier and faster to use an existing image and run it as an additional container than install mysql every time the project is built.
You can see some widely used services examples in the relevant documentation of CI services examples.
If needed, you can assign an alias to each service.
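For illustration, a hedged sketch of declaring a service with an alias in .gitlab-ci.yml (the job name and the mysql image are just examples):

test_db:
  image: ruby:2.6
  services:
    - name: mysql:8.0
      alias: db
  script:
    # inside the job, the service container is reachable under the host name "db"
    - echo "connect to the database at host db"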
As for your questions:
It should be possible to use artifacts to pass data between jobs according to this answer and to this well explained forum post but they use only one container for different jobs. It doesn't work in my case. Probably because I use two different containers?
The builds and cache storage (from documentation)
The Docker executor by default stores all builds in /builds/<namespace>/<project-name> and all caches in /cache (inside the container). You can overwrite the /builds and /cache directories by defining the builds_dir and cache_dir options under the [[runners]] section in config.toml. This will modify where the data are stored inside the container.
If you modify the /cache storage path, you also need to make sure to mark this directory as persistent by defining it in volumes = ["/my/cache/"] under the [runners.docker] section in config.toml.
builds_dir -> Absolute path to a directory where builds are stored in the context of the selected executor. For example, locally, Docker, or SSH.
The [[runners]] section documentation
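Following the quoted documentation, a hedged sketch of where these options live in config.toml (the paths are placeholders):

[[runners]]
  executor = "docker"
  # where builds and caches are stored inside the container
  builds_dir = "/home/builds/rocker"
  cache_dir = "/my/cache"
  [runners.docker]
    image = "rocker/verse:4.0.0"
    # mark the cache path as persistent
    volumes = ["/my/cache/"]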
As you may have noticed, I have customized builds_dir in your config.toml to /home/builds/rocker; please adjust it to your own path.
How can I pass the artifacts from one job to the other?
You can use the builds_dir directive. A second option would be to use the Job Artifacts API.
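For the second option, a hedged sketch of downloading a job's artifacts with curl via the Job Artifacts API (project ID, job ID, and the token are placeholders):

curl --header "PRIVATE-TOKEN: <your_access_token>" \
     --output artifacts.zip \
     "https://gitlab.example.com/api/v4/projects/<project_id>/jobs/<job_id>/artifacts"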
Should I use cache as explained in docs.gitlab.com / caching?
Yes, you should use cache to store project dependencies. The advantage is that you fetch the dependencies from the internet only once, and subsequent runs are much faster because they can skip this step. Artifacts are used to share results between build stages.
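A minimal sketch of a cache entry in .gitlab-ci.yml (the key and the cached path are just examples):

cache:
  key: "$CI_COMMIT_REF_SLUG"
  paths:
    - vendor/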
I hope it is now clearer and that I have pointed you in the right direction.
The two different images are not the cause of your problems. The artifacts are saved in one image (which seems to work), and then restored in the other. I would therefore advise against building (and maintaining) a single image, as that should not be necessary here.
The reason you are having problems is that you are missing build stages, which inform GitLab about the dependencies between the jobs. I would therefore advise you to specify stages as well as their respective jobs in your .gitlab-ci.yml:
stages:
  - do_stats
  - do_compile_pdf

knit_rnw_to_tex:
  stage: do_stats
  image: rocker/verse:4.0.0
  script:
    - Rscript -e "knitr::knit('article.Rnw')"
  artifacts:
    paths:
      - figure/

compile_pdf:
  stage: do_compile_pdf
  image: aergus/latex
  script:
    - ls figure
    - latexmk -pdf -bibtex -use-make article.tex
  artifacts:
    paths:
      - article.pdf
Context:
By default, all artifacts of previous build stages are made available in later stages if you add the corresponding specifications.
If you do not specify any stages, GitLab will put all jobs into the default test stage and execute them in parallel, assuming that they are independent and do not require each other's artifacts. It will still store the artifacts but not make them available between the jobs. This is presumably what is causing your problems.
As for the cache: artifacts are how you pass files between build stages. Caches are for, well, caching. In practice, they are used for things like external packages in order to avoid having to download them multiple times, see here. Caches are somewhat unpredictable in situations with multiple different runners. They are only used for performance reasons, and passing files between jobs using the cache rather than the artifact system is a huge anti-pattern.
Edit: I don't know precisely what your knitr setup is, but if you generate an article.tex from your article.Rnw, then you probably need to add that to your artifacts as well.
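A hedged sketch of that tweak, assuming knitr writes article.tex next to article.Rnw:

knit_rnw_to_tex:
  stage: do_stats
  image: rocker/verse:4.0.0
  script:
    - Rscript -e "knitr::knit('article.Rnw')"
  artifacts:
    paths:
      - figure/
      - article.tex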
Also, services are used for things like a MySQL server for testing databases, or the dind (docker in docker) daemon to build docker images. This should not be necessary in your case. Similarly, you should not need to change any runner configuration (in their respective config.toml) from the defaults.
Edit2: I added a MWE here, which works with my GitLab setup.
I have a virtual machine that I have customized with all the packages I need, and it runs under virt-manager. I logged in and created a few directories and scripts in that image. I would like to "snapshot" it and run the snapshot on OpenStack. I don't care if the runtime state is preserved. All I want is for the packages and custom code that I have created on the virtual machine to be there.
I tried various things with the qcow2 image that virt-manager uses, but that did not work. I don't quite understand: where are the changes I made to the file system?
I would like to generate an image with my customizations that I can subsequently run on an OpenStack cloud.
How can I proceed?
Thanks
Ranga
I've set up a Vagrant box that runs my webserver to host my Symfony2 application.
Everything works fine except the folder synchronization.
I tried 2 things:
config.vm.synced_folder LOCALFOLDER, HOSTFOLDER
config.vm.synced_folder LOCALFOLDER, HOSTFOLDER, type: "rsync"
Option 1: The first option works; I actually don't know how the files are shared, but it works.
Files are copied both ways, but the application is SUPER slow.
Symfony is generating cache files which might be the issue, but I don't really know how to troubleshoot this and see what is happening.
Option 2: Sync is only done one way (from my local machine to the Vagrant box), which covers most cases and is fast.
The issue is that when I use the Symfony command line on the Vagrant box to generate some files, they are not copied over to my local machine.
My question is:
What is the best way to proceed with two-way syncing? With option 1, how can I exclude some files from syncing (as that might be the issue)?
With option 2, how can I make sure changes on the remote are copied to my local machine?
If the default synced folder strategy (VirtualBox shared folders, I imagine) is slow for your use case, you can choose a different one and, if you need, maintain the two-way sync (see the sketch after the list):
If your host OS is Linux or Mac OS X, you can go with NFS.
If your host OS is Windows you can instead choose SMB.
Rsync is very fast but, as you've pointed out, is one-way only.
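A hedged Vagrantfile sketch of switching the strategy (the paths and the IP are placeholders; NFS on VirtualBox additionally requires a private network):

# NFS on a Linux or Mac OS X host
config.vm.network "private_network", ip: "192.168.50.4"
config.vm.synced_folder ".", "/var/www/project", type: "nfs"

# SMB on a Windows host
# config.vm.synced_folder ".", "/var/www/project", type: "smb"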
As Vagrant doesn't seem to offer a "built-in" way to do this, here is what I did:
Configure a Vagrant rsync folder on the folders that will contain the application-generated files (in Symfony2 that is your Bundle/Entity folder); a Vagrantfile sketch follows after these steps. Note that I didn't sync the root folder, because some folders don't have to be rsynced (cache/logs...) and also because it was taking way too much time for the rsync process to parse all the folders/subfolders when I know that only the Entity folder will be generated.
As the rsync back has to be done from the Vagrant box to the host, I use the vagrant-rsync-back plugin and run it manually every time I use a command that generates code.
https://github.com/smerrill/vagrant-rsync-back#getting-started
Create a watcher on my local machine that tracks any change in the code and rsyncs it to the Vagrant box.
https://gist.github.com/laurentlemaire/e423b4994c7452cddbd2
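A hedged sketch of the first step (the Symfony2 bundle path is illustrative; the plugin's rsync-back command then pulls changes back from the box):

# one-way rsync (host -> box) of only the folder that will contain generated files
config.vm.synced_folder "src/Acme/DemoBundle/Entity", "/var/www/project/src/Acme/DemoBundle/Entity",
  type: "rsync"

# after generating code inside the box, pull it back manually:
#   vagrant rsync-back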
Vagrant mounts your project root as the /vagrant folder inside the box as a two-way share.
You can run your commands there to get the required files synced. Any I/O will be damn slow (as you already mentioned), but you will get your files. For everything else, use your one-way synced folder.
I love GitHub and RStudio for my workflow. Recently, I've created a project template that makes directories and scripts etc., and I would like to create the repo locally and push it to GitHub.
In the past I created a repo for a project via https://github.com/, used version control in RStudio to create the local repo, and then dumped all the files I already had into it.
This seems like a waste of time. How can one take the directory/repo that's already in RStudio with a .Rproj file and upload it to GitHub without first creating the shell repo at https://github.com/?
I think this could save time in the workflow.
I thought I could just follow the directions -here- (under Adding version control to a project) to add version control, but this doesn't allow me to push to GitHub (nor should it, because how would RStudio know which Git site you want to push to?).
The only way you could create a repository on GitHub directly from your computer, without having to create it with their website first, would be to create a remote branch directly from Git on your system. This is possible on some Git installations, but not on GitHub.
However, GitHub provides an API that allows you to create the repository from the command line, via a call to curl for example. You will find information on how to do it in this answer (not tested):
curl -u 'USER:PASS' https://api.github.com/user/repos -d '{"name":"REPO"}'
git remote add origin git@github.com:USER/REPO.git
git push origin master
But I don't think you will be able to do it directly from RStudio: you will need to put your project under version control, and then execute the three commands provided in the answer in a shell.
Have you seen hub?
hub create
git push -u origin master
will do the job for you, once hub is configured to access your GitHub account. If you want the project to be named differently from its parent directory, use
hub create projectname
The general usage is
hub create [NAME] [-p] [-d DESCRIPTION] [-h HOMEPAGE]
(-p -- private repository), and you can access many more GitHub features with this tool.
I'm trying to get Jenkins to copy the artifacts of a build to an archive directory on another server using the scp plugin.
Ideally, I'd like the destination to be dynamic based on the build version, so the result would look something like /builds/<build version>/
For a build version like 1.2.3.4 it would look like:
/builds/1.2.3.4/
From reading the scp plugin page, it doesn't look like this is possible, but I figured someone here may have figured it out.
Is there a way to do this?
Is it better to just put the artifacts with the version number embedded in the file name in one directory?
Like you said, I don't think the scp plugin can do it directly. However, there may be a workaround.
In your build, you have access to the build number via $BUILD_NUMBER on Linux (or %BUILD_NUMBER% on Windows).
In any case, as part of your script, you could create a directory with $BUILD_NUMBER as the name, so:
mkdir -p $BUILD_NUMBER
-or-
md %BUILD_NUMBER%
So, for example, a new directory would be /path/to/workspace/1.2.3.4
Once your build is complete, at the end of your script, create the above directory, move your artifact into it, and tar/zip the directory up.
Use this tar/zip file as your job's artifact.
Use the scp plugin to transfer this artifact to your destination machine, and untar/unzip it there (let's say at /path/to/artifact/directory)
What you will have then, is /path/to/artifact/directory/1.2.3.4.
For the next build, let's say 1.2.3.5, you will create a new directory (named 1.2.3.5), move your artifact into it at the end of the build, zip it up and transfer it.
When you unzip it at your destination, you'll have a new directory /path/to/artifact/directory/1.2.3.5 with the new build's artifact in it.
I know it sounds confusing, but it is actually quite easy to implement.
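For concreteness, a hedged shell sketch of the packaging step on a Linux node (the artifact name and paths are placeholders):

# at the end of the build, inside the job's workspace
VERSION="$BUILD_NUMBER"              # or a full version string such as 1.2.3.4
mkdir -p "$VERSION"
mv myapp.war "$VERSION/"             # placeholder artifact name
tar czf "build-$VERSION.tar.gz" "$VERSION"

# transfer build-$VERSION.tar.gz with the scp plugin, then on the destination server:
#   tar xzf build-$VERSION.tar.gz -C /path/to/artifact/directory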