I have a models folder that contains multiple SQL files, for example models/mart/table1.sql, models/mart/table2.sql, and models/mart/table3.sql.
I run this command manually on the terminal:
dbt run-operation generate_model_yaml --args '{"model_name": "table1"}'
However, instead of running it individually for each table, I want to include it in the Bitbucket pipeline. How can I modify the command so that it runs in a loop? It should extract the table name (file name) of every file in a specified folder, e.g. models/mart, and then run the command once per file, substituting the file name for model_name each time.
pipelines:
  custom:
    dbt-run-default:
      - step:
          name: 'Compile'
          image: fishtownanalytics/dbt:1.0.0
          script:
            - cd dbt_4flow
            - dbt deps --profiles-dir ./profiles
I'm not sure I quite understand which parts of the paths you want to pick apart. Does this work?
for file in models/mart/*.sql; do
    table=$(basename "$file" .sql)    # file name without directory or .sql suffix, e.g. table1
    dir=${file%/*}                    # directory part, e.g. models/mart, in case you need that too
    printf " --args '{\"model_name\": \"%s\"}'\n" "$table"
    dbt run-operation generate_model_yaml --args "{\"model_name\": \"$table\"}"
done
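If that loop does what you need, one way to wire it into the pipeline is to commit it as a small helper script (generate_yaml.sh is just a placeholder name) and call it from the step. This is only a sketch, reusing the dbt_4flow directory and ./profiles path from your step above; whether run-operation needs the --profiles-dir flag depends on your setup:

#!/bin/bash
# generate_yaml.sh -- hypothetical helper committed alongside the dbt project
set -e
for file in models/mart/*.sql; do
    table=$(basename "$file" .sql)
    dbt run-operation generate_model_yaml \
        --args "{\"model_name\": \"$table\"}" \
        --profiles-dir ./profiles
done

In the Bitbucket step you would then add a line such as - bash generate_yaml.sh to the script section, after - dbt deps --profiles-dir ./profiles.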
Related
How do I execute a command using Rscript myfile.R so that it uses the renv environment of the project/directory it's in, NOT my default environment?
There are a couple ways:
Ensure your working directory is set to the root of your renv project, and that the renv project's auto-loader is active. (You can set up the auto-loader by calling renv::activate() from R in that project.)
In your script, explicitly call renv::load("/path/to/project") to load the requested project.
If neither of these methods suffices, please file an issue at https://github.com/rstudio/renv/issues.
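For the second option, the call from the shell might look roughly like this (the project path and the script name analysis.R are placeholders):

Rscript -e 'renv::load("/path/to/project"); source("/path/to/project/analysis.R")'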
I recently had a similar problem, but the answer by @kevin-ushey was insufficient. Here's the background: I needed to be able to run Rscript from any directory, because I had several statistical models that were called from a Dockerfile, and forcing the Dockerfile to set WORKDIR many times is just too cumbersome when you have long files with several Rscript calls. Moreover, some of these models are called several times in different bash files, making it cumbersome to cd to the directory before every Rscript call. We needed something akin to conda activate, where any Rscript call would just use the activated 'renv environment' by default, regardless of the working directory. Here's a dummy example:
Install renv with install.packages('renv').
Create a dummy folder with a dummy script that uses the beepr library (just for the sake of the example) and initialize the renv environment:
mkdir ~/renv_test/
cd ~/renv_test/
echo "library(beepr); print('success')" >> test.R
Rscript -e "renv::init()"
Create a Docker image with the code below:
FROM rocker/r-base
ENV PROJ_ROOT='/usr/local/src/renv_test'
ENV RENV_DIR='/usr/local/.renv/'
COPY . $PROJ_ROOT
# Copy the project's renv infrastructure to RENV_DIR and remove all traces of renv from PROJ_ROOT
RUN mkdir -p $RENV_DIR/renv/ && \
cp $PROJ_ROOT/renv.lock $RENV_DIR && \
cp $PROJ_ROOT/renv/activate.R $RENV_DIR/renv/ && \
echo "source('renv/activate.R')" >> $RENV_DIR/.Rprofile && \
cd $RENV_DIR && \
Rscript -e "renv::restore()" && \
cd $PROJ_ROOT && Rscript -e "renv::deactivate()" && \
rm -rf renv/ renv.lock
# Set RENV_DIR's restore library as the default library
RUN echo $(cd $RENV_DIR && Rscript -e "cat(paste0('R_LIBS=', renv::paths\$library()), sep = '\n')") >> $HOME/.Renviron
# Run any script from any directory as if you had 'renv activated'
CMD Rscript $PROJ_ROOT/test.R
Here's a summary of the approach:
Copy the project to the docker image
Copy the renv infrastructure to a separate folder (here /usr/local/.renv/, i.e. $RENV_DIR) and restore the project there.
Eliminate all traces of renv from the project folder (this is so we don't mess up the path of the library if for some reason we execute a script from the root of this project).
Edit .Renviron so that it contains the library path restored under $RENV_DIR as the default library. This ensures that any new R session uses that library as the first option.
Execute any R scripts located in the project folder without having to cd or WORKDIR (docker) to the project folder.
If you build and run the previous Docker image, you should get a success statement even though we never cd to the project folder:
docker build -t renv_test .
docker run renv_test
[1] "success"
I believe this is a simpler way than the above answers:
Rscript -e 'renv::run("/path/to/myscript.R")'
It will pick up the renv environment from the script's path. You can also specify the environment explicitly with the project parameter.
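For example, with an explicit project (both paths are placeholders):

Rscript -e 'renv::run("/path/to/myscript.R", project = "/path/to/project")'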
I want to run an R script using SLURM. I have created the R script, "test.R" as shown:
print("Running the test script")
write.csv(head(mtcars), "mtcars_data_test.csv")
I created a bash script, "submit.sh", to run this R script:
#!/bin/bash
#sbatch --job-name=test.job
#sbatch --output=.out/abc.out
Rscript /home/abc/job_sub_test/test.R
And I submitted the job on the cluster:
sbatch submit.sh
I am not sure where my output is saved. I looked in the home directory, but there is no output file.
Edit
I also set my working directory in test.R, but nothing changed.
setwd("/home/abc")
print("Running the test script")
write.csv(head(mtcars), "mtcars_data_test.csv")
When I run the script without SLURM (Rscript test.R), it works fine and saves the output to the path I set.
Slurm will set the job working directory to the directory which was the working directory when the sbatch command was issued.
Assuming the /home directory is mounted on all compute nodes, you can explicitly change the working directory with cd in the submission script, or with setwd() in the R script. But that should not be necessary.
Three possibilities:
either the job did not start at all because of a configuration or hardware issue, which you can find out with the sacct command by looking at the State column (see the example command after this list);
or the file was indeed created, but on the compute node on a filesystem that is not shared; in that case the best option is to SSH to the compute node (which you can also identify with sacct) and look for the file there; or
the script crashed and the file was not created at all; in that case you should look into the output file of the job (.out/abc.out). Beware that the .out directory must exist before the job starts, and that, as its name starts with a ., it is hidden and shown by ls only with the -a argument.
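For reference, a quick way to check the job state and the node it ran on (replace 12345 with your job ID):

sacct -j 12345 --format=JobID,JobName,State,ExitCode,NodeList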
The --output argument to sbatch is relative to the folder you submitted the job from. setwd inside the R script wouldn't affect it, because Slurm has already parsed that argument and started piping output to the file by the time the R script is running.
First, if you want the output to go to /home/abc/.out/, make sure you're in your home directory when you submit the script, or specify the full path in the --output argument.
Second, the .out folder has to exist; I tested this and Slurm does not create it if it doesn't.
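Putting those two points together, a minimal sketch reusing the paths from the question (note that Slurm only recognises directives spelled with an uppercase #SBATCH): first create the output directory, since Slurm will not create it for you:

mkdir -p /home/abc/.out

and then use a submit.sh along these lines:

#!/bin/bash
#SBATCH --job-name=test.job
#SBATCH --output=/home/abc/.out/abc.out
Rscript /home/abc/job_sub_test/test.R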
I have raw micro-array expression data files with the extension .CEL, but for a few files the extension is somehow " .CEL" (i.e. name, space, dot, CEL). I have made a simple shell script that fixes the file names correctly in the Ubuntu terminal, but I have no idea how to use it from the R environment. I have even tried executing the shell script with the system() command, but that did not work for me.
The shell script I have written is as follows:
for file in *.CEL; do
    mv "$file" "${file//[[:space:]]/}"    # remove every whitespace character from the name
done
I need to execute a short script that just renames a few files (mv).
The script is in about 50 folders, each with a numbered name (folder01, folder02, and so on). Currently I'm executing it from the shell and moving to the next folder using:
bash rename && cd ../folder01
Then pressing up and changing the last digit(s).
Is there a way to execute the script in all folders in one line?
Sure, use a for loop:
for f in folder*; do (cd "$f" && bash rename) ; done
Suppose the structure:
/foo/bar/
  --file1
  --file2
  --file3
  --folder1
    --file4
  --folder2
    --file5
I want to run the Unix zip utility from the foo folder, compressing the bar folder and all of its files and subfolders, but without having the bar folder itself inside the zip, using only the command line.
If I try the -j argument, it doesn't create the bar folder inside the zip, as I want, but it also doesn't create folder1 and folder2. Using -rj doesn't work either.
(I know I can cd into bar and do zip -r bar.zip . there; I want to know if it's possible to accomplish what zip -r bar.zip . does from inside /foo/bar, but while staying in /foo.)
You have to cd /foo/bar and then zip -r bar.zip .; however, you can group the commands with parentheses to run them in a subshell:
# instead of: cd /foo/bar; zip -r bar.zip .; cd -
( cd /foo/bar; zip -r bar.zip . )
The enclosed (paren-grouped) commands are run in a subshell and cd within it won't affect the outer shell session.
See the sh manual:
Compound Commands
A compound command is one of the following:
(list) list is executed in a subshell environment (see COMMAND EXECUTION ENVIRONMENT below).
Variable assignments and builtin commands that affect the shell's environment do not remain in effect after the command completes.
The return status is the exit status of list.
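A quick way to convince yourself that the outer shell is untouched (a sketch, with /foo standing in for wherever you happen to be):

pwd                                   # /foo
( cd /foo/bar && zip -r bar.zip . )   # the archive ends up as /foo/bar/bar.zip
pwd                                   # still /foo; the cd only happened in the subshell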
zip doesn't have a -C (change directory) option like tar does.
You can do:
cd folder1 && zip -r ../bar.zip *
from within a command-line shell, or you can use bsdtar, which is a version of tar from libarchive that can create zips:
bsdtar cf bar.zip --format zip -C folder1 .
(this creates an entry called ./ in the archive -- I'm not sure of a way around that)
I can't speak for the OP's reasoning. I was looking for this solution as well.
I am in the middle of coding a program that creates an .ods file by building the internal XML files and zipping them together. The files must be in the root directory of the archive, or you get an error when you try to open the result in OOo.
I'm sure there are a dozen other ways to do this:
Create a blank .ods file in OOo named blank.ods, extract it to a directory named blank, then try running:
cd blank && zip -r ../blank.ods *
The way I wrote mine, the shell closes after one command, so I don't need to navigate back to the original directory; if you do, simply add && cd .. to the command line:
cd blank && zip -r ../blank.ods * && cd ..