read two files, one raw and the other json - jq

I have two files. The first is a dnsmasq.leases file, text only, which I can read using the -R switch. The other file, macs.json, contains a JSON dictionary with MAC address information in it. What I'd like to do is read both files from the jq CLI using something like
jq -s raw:dnsmasq.leases macs.json
I can decompose it, and do it in stages, like:
jq -Rs '.|split("\n")' dnsmasq.leases | jq -s '.[0] as $macs|.[1] as $leases|etc' macs.json -
but I wondered if there was a way to read one raw and the other json at the same time?

You can always read a raw file using the --rawfile VAR FILENAME command-line option.
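For example, here is a minimal sketch using the file names from the question (the final object is only a placeholder for whatever correlation you actually need), assuming jq 1.6 or later:
jq --rawfile leases dnsmasq.leases '
  ($leases | split("\n") | map(select(length > 0))) as $lines   # raw text, split into lease lines
  | {macs: ., leases: $lines}                                   # . is the parsed macs.json
' macs.json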

Related

Can I use conditional "or" statements in selecting files to download with wget

I'm trying to download a bunch of files via FTP with wget. I could do this manually for each of the variables I am interested in, but I was wondering if I could specify these in an "or"-type conditional statement in the file path name.
For example, I would like to download all files that contain the strings "NRRS412", "NRRS443", "NRRS490", etc. I had planned to do individual calls to wget for each of these, like this:
wget -r -A "L3m*NRRS412*.nc" ftp://username:password#ftp.address
I cannot simply use "L3m*NRRS*.nc", as there are other "NRRS" strings that I don't want.
Is there a way to download all of my target strings in a single call to wget?
Thanks for any help
OK, I figured out the solution, which is to create several possible strings separated by commas:
wget -r -A "L3m*NRRS412*.nc,L3m*NRRS443*.nc,L3m*NRRS490*.nc" ftp://username:password#ftp.address
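If the list of strings gets long, you could also build the comma-separated accept list in the shell first. A rough sketch (the pattern names come from the question; the URL is a placeholder):
url="ftp://username:password@ftp.address"         # placeholder FTP URL
patterns=(NRRS412 NRRS443 NRRS490)
accept=$(printf 'L3m*%s*.nc,' "${patterns[@]}")   # builds "L3m*NRRS412*.nc,L3m*NRRS443*.nc,..."
wget -r -A "${accept%,}" "$url"                   # strip the trailing comma before passing it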

How to move files based on a list (which contains the filename and destination path) in terminal?

I have a folder that contains a lot of files. In this case images.
I need to organise these images into a directory structure.
I have a spreadsheet that contains the filenames and the corresponding path where the file should be copied to. I've saved this file as a text document named files.txt
+--------------+-----------------------+
| image01.jpg | path/to/destination |
+--------------+-----------------------+
| image02.jpg | path/to/destination |
+--------------+-----------------------+
I'm trying to use rsync with the --files-from flag but can't get it to work.
According to man rsync:
--include-from=FILE
This option is related to the --include option, but it specifies a FILE that contains include patterns (one per line). Blank lines in the file and lines starting with ';' or '#' are ignored. If FILE is -, the list will be read from standard input
Here's the command i'm using: rsync -a --files-from=/path/to/files.txt path/to/destinationFolder
And here's the rsync error: syntax or usage error (code 1) at /BuildRoot/Library/Caches/com.apple.xbs/Sources/rsync/rsync-52.200.1/rsync/options.c(1436) [client=2.6.9]
It's still pretty unclear to me how the files.txt document should be formatted/structured and why my command is failing.
Any help is appreciated.
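For what it's worth, --files-from expects a plain list of source paths that all go to the single destination given on the command line, so a two-column list needs a loop instead. A minimal sketch, assuming files.txt is tab-separated with the filename in the first column and the destination directory in the second:
while IFS=$'\t' read -r name dest; do
    mkdir -p "$dest"             # create the destination directory if it does not exist
    rsync -a "$name" "$dest/"    # or: mv "$name" "$dest/"
done < files.txt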

How can I list all existing workspaces in JupyterLab?

How can I list all existing workspaces in JupyterLab?
I know that one can view the current workspace name in the URL.
When you create a workspace, this creates a file in ~/.jupyter/lab/workspaces. The name of your workspace is in the ['metadata']['id'] key of the corresponding JSON file.
A simple bit of code to list all workspaces is therefore:
import os, glob, json
for fname in glob.glob(os.path.join(os.environ['HOME'], ".jupyter/lab/workspaces/*")):
    with open(fname, "r") as read_file:
        print(json.load(read_file)['metadata']['id'])
For convenience, I created a gist with that bit of code. I have also added some cosmetics to directly generate the different URLs:
$ list_workspaces.py -u
http://10.164.5.234:8888/lab
http://10.164.5.234:8888/lab/workspaces/BBCP
http://10.164.5.234:8888/lab/workspaces/blog
I think you could try
ls ~/.jupyter/lab/workspaces
Each time you create a new workspace, a corresponding file is generated there. More detailed docs are here.
As others have pointed out, workspace files are located at ~/.jupyter/lab/workspaces. Each workspace is represented by a .jupyterlab-workspace file, which is actually just a JSON file.
If you have the CLI tool jq installed, the following one-liner gives you a quick list of workspaces:
cat ~/.jupyter/lab/workspaces/* | jq -r '.metadata.id'
Sample output:
/lab
/lab/workspaces/aaaaaaaaaaaa
/lab/workspaces/xxxxxxxxxx
With most basic shell commands:
grep metadata ~/.jupyter/lab/workspaces/* | sed -e 's/"/ /g' | awk '{print $(NF-1)}'
Output will look like:
/lab
/lab/workspaces/auto-x
/lab/workspaces/foo

UNIX how to use the base of an input file as part of an output file

I use UNIX fairly infrequently, so I apologize if this seems like an easy question. I am trying to loop through subdirectories and files, generate an output from the specific files that the loop grabs, then pipe that output to a file in another directory whose name will be identifiable from the input file. So far I have:
for file in /home/sub_directory1/samples/SSTC*/
do
samtools depth -r chr9:218026635-21994999 < $file > /home/sub_directory_2/level_2/${file}_out
done
I was hoping to generate an output from file_1_novoalign.bam in sub_directory1/samples/SSTC*/ and to send that output to /home/sub_directory_2/level_2/ as an output file called file_1_novoalign_out.bam. However, it doesn't work; it says 'bash: /home/sub_directory_2/level_2/file_1_novoalign.bam.out: No such file or directory'.
I would ideally like to be able to strip off the '_novoalign.bam' part of the outfile and replace with '_out.txt'. I'm sure this will be easy for a regular unix user but I have searched and can't find a quick answer and don't really have time to spend ages searching. Thanks in advance for any suggestions building on the code I have so far or any alternate suggestions are welcome.
p.s. I don't have permission to write files to the directory containing the input folders
Below is an explanation for filenames without spaces, keeping it simple.
When you want files, not directories, you should end your for-loop with * and not */.
When you only want to process files ending with _novoalign.bam, you should tell this to unix.
The easiest way to replace part of the string is with sed.
A dollar sign anchors the match to the end of the string. The full script will be
OUTDIR=/home/sub_directory_2/level_2
for file in /home/sub_directory1/samples/SSTC*/*_novoalign.bam; do
  echo "Debug: Inputfile including path: ${file}"
  OUTPUTFILE=$(basename "$file" | sed -e 's/_novoalign.bam$/_out.txt/')
  echo "Debug: Outputfile without path: ${OUTPUTFILE}"
  samtools depth -r chr9:218026635-21994999 < "${file}" > "${OUTDIR}/${OUTPUTFILE}"
done
Note 1:
You can use parameter expansion like file=${fullfile##*/} to get the filename without path, but you will forget the syntax in one hour.
Easier to remember are basename and dirname, but you still have to do some processing.
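A quick illustration of both approaches, using a made-up path just to show what each expansion returns:
fullfile=/home/sub_directory1/samples/SSTC/sample1_novoalign.bam   # hypothetical example path
echo "${fullfile##*/}"    # sample1_novoalign.bam  (filename only, like basename)
echo "${fullfile%/*}"     # /home/sub_directory1/samples/SSTC  (path only, like dirname)
file=${fullfile##*/}
echo "${file%_novoalign.bam}_out.txt"   # sample1_out.txt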
Note 2:
When your script first changes the directory to /home/sub_directory1/samples/SSTC, the loop variable holds a bare filename and you can skip the basename call.
When all the files in the dir are to be processed, you can use the asterisk.
When all files have at most one underscore, you can use cut.
You might want to add some error handling. When you want the STDERR from samtools in your outputfile, add 2>&1.
These will turn your script into
OUTDIR=/home/sub_directory_2/level_2
cd /home/sub_directory1/samples/SSTC
for file in *; do
  echo "Debug: Inputfile: ${file}"
  OUTPUTFILE="$(basename "$file" | cut -d_ -f1)_out.txt"
  echo "Debug: Outputfile: ${OUTPUTFILE}"
  samtools depth -r chr9:218026635-21994999 < "${file}" > "${OUTDIR}/${OUTPUTFILE}" 2>&1
done

Split files linux and then grep

I'd like to split a file and grep each piece without writing them to individual files.
I've attempted a couple variations of split and grep and no such luck; any suggestions?
Something along the lines of:
split -b SIZE filename | grep "string"
I've attempted grep/fgrep to find the string but my shell complains that the files are too large. See: use fgrep instead
There is no point in splitting the file if you plan to [linearly] search each of the pieces anyway (assuming that's the only thing you are doing with it). Consider running grep on the entire file.
If however you plan to utilize the fact that the file is split later on, then the typical way would be:
Create a temporary directory and step into it
Run split/csplit on the original file
Use a for loop over the written fragments to do your processing, as sketched below.
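A minimal sketch of that workflow (the chunk size, file name, and search string are placeholders):
tmpdir=$(mktemp -d)                      # 1. create a temporary directory and step into it
cd "$tmpdir" || exit 1
split -b 100M /path/to/bigfile piece_    # 2. split the original file into pieces
for f in piece_*; do                     # 3. loop over the written fragments
    grep "string" "$f"
done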
