Developing in a Docker image that's under version control - r

I currently have a pipeline that I use to build reports in R and publish them with Jekyll. I keep my files under version control on GitHub, and that has been working great so far.
Recently I began thinking about how I might take R, Ruby and Jekyll and build a Docker image that any of my coworkers could download and use to run the same report without having all of the packages and gems set up on their computer. I looked at Docker Hub and found the automated builds triggered by Git commits to be a very interesting feature.
I want to build an image that I could use to run this configuration, keep it under version control as well, and keep it up to date on Docker Hub. How does something like this work?
If I just kept my current setup I could add a Dockerfile to my repo and Docker Hub would build my image for me; I just think it would be interesting to run my work on that same image.
Any thoughts on how a pipeline like this might work?

The Docker Hub build service should work (https://docs.docker.com/docker-hub/builds/). You can also consider using GitLab CI or Travis CI (GitLab is useful for private projects, and it also provides a private Docker registry).
You should have two Dockerfiles: one with all the dependencies and a second, very minimal one for the reports (builds will be much faster). Something like:
FROM base_image:0.1
COPY . /reports
WORKDIR /reports
RUN replace-with-required-jekyll-magic
The Dockerfile above should go in your reports repository.
In a second repository you can create the base image with all the tools, plus nginx or something similar for serving static files. Make sure that the nginx web root is set to /reports. If you need to update the tools, just update the base_image tag in the reports Dockerfile.
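For illustration only, the base image's Dockerfile (in the second repository) might look roughly like this; the specific base image, packages and gems are assumptions, not taken from your setup:
FROM ruby:3.2-slim
# Install R and nginx for serving the generated static site
RUN apt-get update && \
    apt-get install -y --no-install-recommends r-base nginx && \
    rm -rf /var/lib/apt/lists/*
# Install Jekyll and any gems the reports need
RUN gem install jekyll bundler
# Point the nginx web root at /reports, where the reports image copies its files
RUN rm -rf /var/www/html && ln -s /reports /var/www/html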

Related

Passing files from a rocker container to a latex container within a gitlab-ci job

I would like to use GitLab CI to compile a LaTeX article as explained in this answer on tex.stackexchange (a similar PDF generation example is shown in the GitLab documentation for artifacts). I use a special LaTeX template given by the journal editor. My LaTeX article contains figures made with the R statistical software. R and LaTeX are two large software installations with a lot of dependencies, so I decided to use two separate containers for the build: one for the statistical analysis and visualization with R, and one to compile the LaTeX document to PDF.
Here is the content of .gitlab-ci.yml:
knit_rnw_to_tex:
  image: rocker/verse:4.0.0
  script:
    - Rscript -e "knitr::knit('article.Rnw')"
  artifacts:
    paths:
      - figure/

compile_pdf:
  image: aergus/latex
  script:
    - ls figure
    - latexmk -pdf -bibtex -use-make article.tex
  artifacts:
    paths:
      - article.pdf
The knit_rnw_to_tex job executed in the R "rocker" container succeeds and I can download the figure artifacts from the GitLab "jobs" page. The issue in the second job, compile_pdf, is that ls figure shows me an empty folder and the LaTeX article compilation fails because of the missing figures.
It should be possible to use artifacts to pass data between jobs according to this answer and to this well-explained forum post, but they use only one container for the different jobs. It doesn't work in my case, probably because I use two different containers?
Another solution would be to use only the rocker/tidyverse container and install latexmk in there, but apt install latexmk fails for an unknown reason. Maybe because it has over a hundred dependencies and that is too much for GitLab CI?
The "dependencies" keyword could help according to that answer, but the artifacts are still not available when I use it.
How can I pass the artifacts from one job to the other?
Should I use cache as explained in docs.gitlab.com / caching?
Thank you for the comment, as I wanted to be sure how you do it. An example would help too, but I'll stay generic for now (using Docker).
To run multiple containers you need the Docker executor.
To quote the documentation on it:
The Docker executor, when used with GitLab CI, connects to Docker Engine and runs each build in a separate and isolated container using the predefined image that is set up in .gitlab-ci.yml and in accordance with config.toml.
Workflow
The Docker executor divides the job into multiple steps:
Prepare: Create and start the services.
Pre-job: Clone, restore cache and download artifacts from previous stages. This is run on a special Docker image.
Job: User build. This is run on the user-provided Docker image.
Post-job: Create cache, upload artifacts to GitLab. This is run on a special Docker Image.
Your config.toml could look like this:
[[runners]]
  builds_dir = "/home/builds/rocker"
  [runners.docker]
    image = "rocker/verse:4.0.0"
    [[runners.docker.services]]
      name = "aergus/latex"
      alias = "latex"
From the above-linked documentation:
The image keyword
The image keyword is the name of the Docker image that is present in the local Docker Engine (list all images with docker images) or any image that can be found at Docker Hub. For more information about images and Docker Hub please read the Docker Fundamentals documentation.
In short, with image we refer to the Docker image, which will be used to create a container on which your build will run.
If you don’t specify the namespace, Docker implies library which includes all official images. That’s why you’ll see many times the library part omitted in .gitlab-ci.yml and config.toml. For example you can define an image like image: ruby:2.6, which is a shortcut for image: library/ruby:2.6.
Then, for each Docker image there are tags, denoting the version of the image. These are defined with a colon (:) after the image name. For example, for Ruby you can see the supported tags on Docker Hub. If you don’t specify a tag (like image: ruby), latest is implied.
The image you choose to run your build in via image directive must have a working shell in its operating system PATH. Supported shells are sh, bash, and pwsh (since 13.9) for Linux, and PowerShell for Windows. GitLab Runner cannot execute a command using the underlying OS system calls (such as exec).
The services keyword
The services keyword defines just another Docker image that is run during your build and is linked to the Docker image that the image keyword defines. This allows you to access the service image during build time.
The service image can run any application, but the most common use case is to run a database container, e.g., mysql. It’s easier and faster to use an existing image and run it as an additional container than install mysql every time the project is built.
You can see some widely used services examples in the relevant documentation of CI services examples.
If needed, you can assign an alias to each service.
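As a small illustration (the image and alias here are just examples, not something your build needs), a service with an alias can be declared in .gitlab-ci.yml like this:
services:
  - name: mysql:8.0
    alias: db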
As for your questions:
It should be possible to use artifacts to pass data between jobs according to this answer and to this well-explained forum post, but they use only one container for the different jobs. It doesn't work in my case, probably because I use two different containers?
The builds and cache storage (from documentation)
The Docker executor by default stores all builds in /builds/<namespace>/<project-name> and all caches in /cache (inside the container). You can overwrite the /builds and /cache directories by defining the builds_dir and cache_dir options under the [[runners]] section in config.toml. This will modify where the data are stored inside the container.
If you modify the /cache storage path, you also need to make sure to mark this directory as persistent by defining it in volumes = ["/my/cache/"] under the [runners.docker] section in config.toml.
builds_dir -> Absolute path to a directory where builds are stored in the context of the selected executor. For example, locally, Docker, or SSH.
The [[runners]] section documentation
As you may have noticed, I have customized builds_dir in your config.toml to /home/builds/rocker; please adjust it to your own path.
How can I pass the artifacts from one job to the other?
You can use the builds_dir directive. A second option would be to use the Job Artifacts API.
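For the second option, here is a hedged sketch of downloading another job's artifacts via the Job Artifacts API; the host, project ID, branch, and token are placeholders:
curl --header "PRIVATE-TOKEN: <your_access_token>" \
     --output artifacts.zip \
     "https://gitlab.example.com/api/v4/projects/<project_id>/jobs/artifacts/master/download?job=knit_rnw_to_tex"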
Should I use cache as explained in docs.gitlab.com / caching?
Yes, you should use the cache to store project dependencies. The advantage is that you fetch the dependencies only once from the internet, and subsequent runs are much faster because they can skip this step. Artifacts are used to share results between build stages.
I hope it is now clearer and that I have pointed you in the right direction.
The two different images are not the cause of your problems. The artifacts are saved in one image (which seems to work), and then restored in the other. I would therefore advise against building (and maintaining) a single image, as that should not be necessary here.
The reason you are having problems is that you are missing build stages, which inform GitLab about the dependencies between the jobs. I would therefore advise you to specify stages as well as their respective jobs in your .gitlab-ci.yml:
stages:
  - do_stats
  - do_compile_pdf

knit_rnw_to_tex:
  stage: do_stats
  image: rocker/verse:4.0.0
  script:
    - Rscript -e "knitr::knit('article.Rnw')"
  artifacts:
    paths:
      - figure/

compile_pdf:
  stage: do_compile_pdf
  image: aergus/latex
  script:
    - ls figure
    - latexmk -pdf -bibtex -use-make article.tex
  artifacts:
    paths:
      - article.pdf
Context:
By default, all artifacts of previous build stages are made available in later stages if you add the corresponding specifications.
If you do not specify any stages, GitLab will put all jobs into the default test stage and execute them in parallel, assuming that they are independent and do not require each other's artifacts. It will still store the artifacts but not make them available between the jobs. This is presumably what is causing your problems.
As for the cache: artifacts are how you pass files between build stages. Caches are for, well, caching. In practice, they are used for things like external packages in order to avoid having to download them multiple times, see here. Caches are somewhat unpredictable in situations with multiple different runners. They are only there for performance reasons, and passing files between jobs using the cache rather than the artifact system is a huge anti-pattern.
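A minimal sketch of what a cache block could look like (the key and path are assumptions; R would also need to be told to install packages into that project-local directory):
cache:
  key: r-packages
  paths:
    - .r-library/    # cached between pipeline runs, not passed as a build result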
Edit: I don't know precisely what your knitr setup is, but if you generate an article.tex from your article.Rnw, then you probably need to add that to your artifacts as well.
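In that case, the knit job's artifacts block might look like this:
artifacts:
  paths:
    - article.tex
    - figure/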
Also, services are used for things like a MySQL server for testing databases, or the dind (docker in docker) daemon to build docker images. This should not be necessary in your case. Similarly, you should not need to change any runner configuration (in their respective config.toml) from the defaults.
Edit 2: I added an MWE here, which works with my GitLab setup.

Deploy Docker to offline PC

I am new to Docker and have hit a roadblock I am having trouble figuring out.
Here is my scenario:
Current (pre-container)
We do a Visual Studio Online build that outputs an MSI. The build uses the full .NET Framework (we will later move to .NET Core).
The MSI is put on a flash drive and installed on an offline (no internet access) computer.
The MSI installs several Windows services that expose web api web services.
Local clients can query those web services.
Desired (containers)
We wish to replace the windows services with docker containers.
The installation still needs to be performed offline (no internet access)
We wish to use Docker Community Edition to avoid cost.
One assumption: the offline computer will have Docker installed and will have already downloaded the base image "microsoft/aspnet".
To start figuring this out, I simply created a new ASP.NET Web Application from VS 2017. I chose Web API and enabled Docker support. Great, now I have a container running with a website/service in it. I next wanted to figure out how to deploy the container/image. For reference, here is the default Dockerfile that was created when I created the empty project.
FROM microsoft/aspnet:latest
ARG source
WORKDIR /inetpub/wwwroot
COPY ${source:-obj/Docker/publish} .
I first looked at "docker save". My thinking was that I could save the image as a file and use that to deploy the container. However, because I am using the full .NET Framework, the saved file is 7.7 GB. I understand why it is so large; that image has not only my sample web site in it, but also the microsoft/aspnet image. After some googling, I found references to being able to exclude layers (https://github.com/moby/moby/pull/9304), but it does not appear that "docker save" supports that. Ultimately though, that is what I think I want - to be able to save just my layer to a file.
Am I going down the right path with trying to figure out how to save a layer? We are pretty open on how to accomplish this, but we are not able to deploy a 7.7 GB file for every software update.
Any suggestions on how to do this - especially any that incorporate the VS Online build are greatly appreciated.
Thanks.
The only way to transfer an image offline is to save it into a tarball using docker save.
As for the size of the image, the solution is to use a smaller aspnet base image. The one you are using is about 7 GB, so you need to choose the smallest available aspnet image that is still sufficient for your application.
Another solution is to transfer the source code and build the image on the target machine. In this case, you save the microsoft/aspnet:latest image to a tarball and transfer it once to the target machine. Whenever you have new updates in the source, you copy the source and the Dockerfile to the target machine and build the image there.
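As a rough sketch of that workflow (the image name and tag are placeholders):
# On the online build machine
docker build -t mywebapi:1.0 .
docker save -o mywebapi-1.0.tar mywebapi:1.0

# Copy the tarball via the flash drive, then on the offline machine
docker load -i mywebapi-1.0.tar
docker run -d -p 80:80 mywebapi:1.0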

How do I edit files in a docker image?

I am just setting up Docker on my local machine for web development.
I have seen lots of tutorials for Docker with Rails etc.
I am curious how Docker works in terms of editing the project's source code.
I am trying to wrap my head around this -v flag.
In many of the tutorials I have seen, users have stored their Dockerfile in the project base directory and then built from there. Do you just edit the code in the directory and refresh the browser, and leave Docker running?
Just trying to wrap my head around it all, sorry if this is a basic question.
I usually differentiate two use cases of Docker:
in one case I want a Dockerfile that helps end users get started easily
in another case I want a Dockerfile to help code contributors to have a testing environment up and running easily
For end users, you want your Dockerfile to
install dependencies
check out the latest stable code (from GitHub or elsewhere)
setup some kind of default configuration
for contributors, you want your Dockerfile to
install dependencies
document how to run a Docker container setting up a volume to share the source code between their development environment and the docker container.
To sum up, for end users the Docker image should embed the application code, while for contributors the Docker image will just have the dependencies.
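For the contributor case, a hedged example of such a volume mount (the image name, port, and container path are placeholders):
# Mount the current directory into the container; edits on the host show up
# inside the container immediately, without rebuilding the image
docker run --rm -it -p 3000:3000 -v "$(pwd)":/app my-dev-image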

TeamCity for static site

I am a front-end developer. I do a lot of PHP/CSS/JS and HTML. Currently, the way we deploy to the staging environment is to push our code to Git servers, go to our staging servers and do a pull into some directory, and then manually move the files from that directory to the correct directories on our Apache web server.
Would it be overkill if I used TeamCity to do this? I intend to write an Ant script that does the copying, which is to say the runner type will be Ant. So every time there is a push to the Git repo, TeamCity will pull and then run the Ant script to copy the affected code to the correct directories.
If not, I will gladly love to listen to any suggestions.
Thanks
TeamCity may be overkill right now, as you would just be using it as a fancy trigger for your build.
But consider adding custom build parameters, which it can pass to your script. You can then start automating builds to different environments through a friendly UI.
You then have a platform to base a correct deployment process around further down the road.
When the time comes that you need PHP compilation, JS minification, or unit testing, it's all just another step in your TeamCity configuration.
I would recommend it.
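For illustration, a minimal Ant script of the kind described above; the property name and paths are assumptions, and deploy.dir could be supplied by TeamCity as a custom build parameter (ant -Ddeploy.dir=...):
<project name="deploy-static" default="deploy">
  <!-- Target directory; override from TeamCity with -Ddeploy.dir=... -->
  <property name="deploy.dir" value="/var/www/html"/>
  <target name="deploy">
    <!-- Copy the front-end files into the web server's document root -->
    <copy todir="${deploy.dir}" overwrite="true">
      <fileset dir="${basedir}" includes="**/*.php,**/*.css,**/*.js,**/*.html"/>
    </copy>
  </target>
</project>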

A build and deployment machine with a web-based dashboard

Here is what I am trying to do: I have my code sitting on Bitbucket (it is an ASP.NET web application). I want to put together a build machine that has a web-based dashboard. I go to the dashboard and I say: OK, show me all the branches and stuff on my Bitbucket project. Now please get the latest code from this branch for me and build it. Then please deploy it to this location for me, or maybe this other location. I want this dashboard to give me a history of all these builds and deployments too. I am very new to this concept. I have been reading about CruiseControl.NET and MSBuild and other stuff, but I cannot find the right answer. Please point me in the right direction.
The point of a build server is that it automatically runs a build each time you commit something to your repository.
In order for the build server to know exactly what to do, you normally put a build script (with MSBuild or NAnt) into your solution which does everything you want - building your solution, maybe create a setup package and so on.
The build server basically knows where the repository is and where in the repository your build script is.
You need to configure this once in the build server, and then it will always run after you commit (but you can also start a build manually, if you want).
If you want a solution with web-based dashboard, try TeamCity.
It's a commercial product, but it's free for up to 20 users.
You can do everything in the web interface - configuration, running the builds AND browsing the build history.
EDIT:
Houda, concerning your question about deployment:
I don't think that TeamCity has a "deployment mode" in that sense. What you could do is include the deployment stuff in your build script that is run by TeamCity.
So, after the build itself is finished, copy the generated assemblies and files on your web server(s).
If you do it this way, you HAVE to make sure in the build script that the deployment will only happen if the build didn't fail (and if you have unit tests, if the unit tests didn't fail as well).
This is very important for a live application, because if you don't take care of this well enough, your app will go immediately offline every time someone commits "bad" code to your repository (and it will stay offline until the next "good" commit)!!
EDIT 2:
See Lasse V. Karlsen's comment below: it's more comfortable with the new TeamCity version 6.

Resources