How do I parse hard drive identifiers - standards

This image shows some hard drive IDs; they look pretty standardized (the data was acquired from a web GUI, which gathered it from the command line on CentOS).
Are these drive IDs standardized, and how can I parse the data out of any set of hard drives on the market? That is, I want to end up with the following variables (would a regex work for any drive on the market?):
type=scsi
type2=SATA
MFR=WDC
model=WDC_WD1001FALS
serial=WD-WCATR6632234
Is this apparent order truly standardized across all manufacturers, and how do I parse it?

The pattern you see comes from a .rules file on your computer, something like "60-persistent-storage.rules":
# by-id (hardware serial number)
KERNEL=="hd*[!0-9]", ENV{ID_SERIAL}=="?*", \
SYMLINK+="disk/by-id/ata-$env{ID_SERIAL}"
KERNEL=="hd*[0-9]", ENV{ID_SERIAL}=="?*", \
SYMLINK+="disk/by-id/ata-$env{ID_SERIAL}-part%n"
KERNEL=="sd*[!0-9]", ENV{ID_SCSI_COMPAT}=="?*", \
SYMLINK+="disk/by-id/scsi-$env{ID_SCSI_COMPAT}"
KERNEL=="sd*[0-9]", ENV{ID_SCSI_COMPAT}=="?*", \
SYMLINK+="disk/by-id/scsi-$env{ID_SCSI_COMPAT}-part%n"
ENV{DEVTYPE}=="disk", ENV{ID_BUS}=="?*", ENV{ID_SERIAL}=="?*", \
SYMLINK+="disk/by-id/$env{ID_BUS}-$env{ID_SERIAL}"
ENV{DEVTYPE}=="partition", ENV{ID_BUS}=="?*", ENV{ID_SERIAL}=="?*", \
SYMLINK+="disk/by-id/$env{ID_BUS}-$env{ID_SERIAL}-part%n"
ENV{DEVTYPE}=="disk", ENV{ID_EDD}=="?*", \
SYMLINK+="disk/by-id/edd-$env{ID_EDD}"
ENV{DEVTYPE}=="partition", ENV{ID_EDD}=="?*", \
SYMLINK+="disk/by-id/edd-$env{ID_EDD}-part%n"
ENV{DEVTYPE}=="disk", ENV{ID_WWN_WITH_EXTENSION}=="?*", \
SYMLINK+="disk/by-id/wwn-$env{ID_WWN_WITH_EXTENSION}"
ENV{DEVTYPE}=="partition", ENV{ID_WWN_WITH_EXTENSION}=="?*", \
SYMLINK+="disk/by-id/wwn-$env{ID_WWN_WITH_EXTENSION}-part%n"
These rules can be changed.
Note that your strings are SCSI IDs, which are produced by the SCSI rules above (although I'm not sure exactly how the ID string itself is put together).
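If the goal is just to get the manufacturer, model and serial into variables, it may be more robust to skip regexing the by-id symlink names and instead ask udev for the properties those rules are built from. A minimal sketch, assuming a reasonably recent udev; the exact set of ID_* properties varies by bus, device and udev version, so check the full output on your own hardware:
# list the udev properties for each whole disk
for dev in /dev/sd[a-z]; do
    echo "== $dev =="
    udevadm info --query=property --name="$dev" \
        | grep -E '^ID_(BUS|VENDOR|MODEL|SERIAL|SERIAL_SHORT|WWN)='
done
For a SATA drive behind the SCSI layer you will typically see fields such as ID_BUS, ID_MODEL and ID_SERIAL_SHORT, which map directly onto the type/model/serial variables you listed, without having to guess at the symlink format.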


How to use include and exclude precedence with rsync?

I'm trying to backup a machine using rsync and after reading numerous SO QAs and the man pages, I'm still failing to understand how include/exclude precedence works so I can transfer the right set of files. Obfuscating the specific details, I'm attempting the following:
Include recursively:
/home/erik/foo
/home/erik/bar
/home/erik/baz
Include /git recursively, but exclude some specific subdirs, like /git/src/github.com/foo and /git/src/github.com/bar.
Here is the rsync command that I think should accomplish this. It does not, and I've tried a number of variations that fail in different ways:
rsync -am \
--include='*/' \
--include='/home/erik/foo' \
--include='/home/erik/bar' \
--include='/home/erik/baz' \
--include='/git' \
--exclude='/git/bin' \
--exclude='/git/src/github.com/foo' \
--exclude='/git/src/github.com/bar' \
--exclude='*' \
/ nfs.example.com:/data/pool/backup/laptop
Some specific questions:
I have seen it suggested many times that the initial --include='*/' is necessary, although I'm not entirely sure why. I gather it has something to do with ensuring directories are expanded and followed(?). I also assume the final exclude is what excludes any file that doesn't match one of the earlier statements? Could someone elaborate on whether both of these are necessary, and whether their positions are significant?
I am unsure whether or not the directories need leading /. I have seen hints that these paths are relative to the requested transfer root of /, which would suggest it should be something like home/erik, but I have not had success with this either. Could someone expand on how this works?
I'm unsure whether a trailing / is required in the paths if I want to include a directory and all of its contents.
Could someone elaborate on whether the position of the include/exclude options is actually significant, i.e. is it the first one in the list to match that gets applied?
Is there any reason I should prefer --filter='+ X' over --include? Same for exclude?
I'm unsure if there's a more native way to cross-post an answer, but I asked on Stack Exchange and got this excellent answer, which solved what I was trying to do perfectly, with context.
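For reference, here is one filter list that should match what the question describes; it is an untested sketch, assuming rsync 2.6.7 or later for the 'dir/***' pattern. The key rules of thumb are that the first matching pattern wins, and that a deep path is only reachable if each of its parent directories is matched by an include before the final catch-all exclude (which is why /home/ and /home/erik/ are included explicitly, and why the specific /git excludes come before the broader /git include):
rsync -am \
--include='/home/' \
--include='/home/erik/' \
--include='/home/erik/foo/***' \
--include='/home/erik/bar/***' \
--include='/home/erik/baz/***' \
--exclude='/git/bin/' \
--exclude='/git/src/github.com/foo/' \
--exclude='/git/src/github.com/bar/' \
--include='/git/***' \
--exclude='*' \
/ nfs.example.com:/data/pool/backup/laptop
A leading / anchors a pattern at the root of the transfer (here /), and excluding a directory is enough to stop rsync descending into it, so its contents never need separate rules.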

How to select all files from one sample?

I'm having trouble figuring out how to make the input directive select only the files belonging to a given {samples} value in the rule below.
rule MarkDup:
    input:
        expand("Outputs/MergeBamAlignment/{samples}_{lanes}_{flowcells}.merged.bam", zip,
               samples=samples['sample'],
               lanes=samples['lane'],
               flowcells=samples['flowcell']),
    output:
        bam = "Outputs/MarkDuplicates/{samples}_markedDuplicates.bam",
        metrics = "Outputs/MarkDuplicates/{samples}_markedDuplicates.metrics",
    shell:
        "gatk --java-options -Djava.io.tempdir=`pwd`/tmp \
        MarkDuplicates \
        $(echo ' {input}' | sed 's/ / --INPUT /g') \
        -O {output.bam} \
        --VALIDATION_STRINGENCY LENIENT \
        --METRICS_FILE {output.metrics} \
        --MAX_FILE_HANDLES_FOR_READ_ENDS_MAP 200000 \
        --CREATE_INDEX true \
        --TMP_DIR Outputs/MarkDuplicates/tmp"
Currently it creates correctly named output files, but as input it selects every file that matches the pattern across all wildcard values, rather than just the files for one sample. So I'm perhaps halfway there. I tried changing {samples} to {{samples}} in the input directive, like so:
expand("Outputs/MergeBamAlignment/{{samples}}_{lanes}_{flowcells}.merged.bam", zip,
lanes=samples['lane'],
flowcells=samples['flowcell']),`
but this broke the previous rule somehow. What I'm after is conceptually something like:
input:
    "{sample}_*.bam"
but clearly this doesn't work.
Is it possible to collect all files that match {sample}_*.bam with a function and use that as input? And if so, will the function still work with $(echo ' {input}' etc...) in the shell directive?
If you just want all the matching files in the directory, you can use a lambda function:
from glob import glob

rule MarkDup:
    input:
        lambda wcs: glob('Outputs/MergeBamAlignment/%s*.bam' % wcs.samples)
    output:
        bam="Outputs/MarkDuplicates/{samples}_markedDuplicates.bam",
        metrics="Outputs/MarkDuplicates/{samples}_markedDuplicates.metrics"
    shell:
        ...
Just be aware that this approach can't do any checking for missing files, since it will always report that the files needed are the files that are present. If you do need confirmation that the upstream rule has been executed, you can have the previous rule touch a flag, which you then require as input to this rule (though you don't actually use the file for anything other than enforcing execution order).
If I understand correctly, zip needs to be applied only to {lanes} and {flowcells} and not to {samples}. In that case, using two nested expand instances can achieve that.
input:
    expand(expand("Outputs/MergeBamAlignment/{{samples}}_{lanes}_{flowcells}.merged.bam",
                  zip, lanes=samples['lane'], flowcells=samples['flowcell']),
           samples=samples['sample'])
PS: output.tmp file uses {sample} instead of {samples}. Typo?
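As for the last part of the question (whether the $(echo ' {input}' | sed ...) construct still works): it operates on whatever space-separated list Snakemake substitutes for {input}, so it behaves the same whether the input comes from expand or from a lambda/glob. A quick shell illustration, with a made-up file list standing in for {input}:
files="a.bam b.bam c.bam"   # stand-in for what Snakemake substitutes for {input}
echo " $files" | sed 's/ / --INPUT /g'
# prints (with a leading space): --INPUT a.bam --INPUT b.bam --INPUT c.bam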

Firebase BigQuery migration bash error

I am using the standard workflow from Google to run the migration from the old dataset to the new dataset (Migration steps). I filled in the missing values, such as Property ID, BigQuery ID, etc. When I ran the bash script, the following error occurred:
Migrating mindfulness.com_mindfulness_ANDROID.app_events_20180515
--allow_large_results --append_table --batch --debug_mode --destination_table=analytics_171690789.events_20180515 --noflatten_results --nouse_legacy_sql --parameter=firebase_app_id::1:437512149764:android:0dfd4ab1e9926c7c --parameter=date::20180515 --parameter=platform::ANDROID#platform --project_id=mindfulness --use_gce_service_account
FATAL Flags positioning error: Flag '--project_id=mindfulness' appears after final command line argument. Please reposition the flag.
Run 'bq help' to get help.
I couldn't find a solution on Stack Overflow or Google. Does anyone have any idea how to solve this?
My migration.sh script (with small modifications to the IDs to stay anonymous):
# Analytics Property ID for the Project. Find this in Analytics Settings in Firebase
PROPERTY_ID=171230123
# Bigquery Export Project
BQ_PROJECT_ID="mindfulness" #(e.g., "firebase-public-project")
# Firebase App ID for the app.
FIREBASE_APP_ID="1:123412149764:android:0dfd4ab1e1234c7c" #(e.g., "1:300830567303:ios:
# Dataset to import from.
BQ_DATASET="com_mindfulness_ANDROID" #(e.g., "com_firebase_demo_IOS")
# Platform
PLATFORM="ANDROID"#"platform of the app. ANDROID or IOS"
# Date range for which you want to run migration, [START_DATE,END_DATE]
START_DATE=20180515
END_DATE=20180517
# Do not modify the script below, unless you know what you are doing :)
startdate=$(date -d"$START_DATE" +%Y%m%d) || exit -1
enddate=$(date -d"$END_DATE" +%Y%m%d) || exit -1
# Iterate through the dates.
DATE="$startdate"
while [ "$DATE" -le "$enddate" ]; do
# BQ table constructed from above params.
BQ_TABLE="$BQ_PROJECT_ID.$BQ_DATASET.app_events_$DATE"
echo "Migrating $BQ_TABLE"
cat migration_script.sql | sed -e "s/SCRIPT_GENERATED_TABLE_NAME/$BQ_TABLE/g" | bq query \
--debug_mode \
--allow_large_results \
--noflatten_results \
--use_legacy_sql=False \
--destination_table analytics_$PROPERTY_ID.events_$DATE \
--batch \
--append_table \
--parameter=firebase_app_id::$FIREBASE_APP_ID \
--parameter=date::$DATE \
--parameter=platform::$PLATFORM \
--project_id=$BQ_PROJECT_ID
temp=$(date -I -d "$DATE + 1 day")
DATE=$(date -d "$temp" +%Y%m%d)
done
exit
# END OF SCRIPT
If you look at the output of your script, it contains this bit of text, right before the flag that's out of order:
--parameter=platform::ANDROID#platform --project_id=mindfulness
I'm pretty sure you want your platform to be ANDROID, not ANDROID#platform.
I suspect you can fix this just by putting a space between the end of the string and the inline comment, so that you have something like this:
PLATFORM="ANDROID" #"platform of the app. ANDROID or IOS"
To be safe, though, you might want to remove the trailing comments entirely.
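As a quick illustration of why the space matters (a throwaway snippet, not part of the migration script): in bash, a # only starts a comment at the beginning of a word, so a # glued onto the end of a quoted string becomes part of the value:
X="ANDROID"#"platform of the app. ANDROID or IOS"
echo "$X"   # prints: ANDROID#platform of the app. ANDROID or IOS
Y="ANDROID" #"platform of the app. ANDROID or IOS"
echo "$Y"   # prints: ANDROID
The embedded spaces in the first value are also likely what pushes stray words onto the bq command line before --project_id, which would explain the "appears after final command line argument" error.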

How to preload syntaxnet so that it takes less time to give dependency parse output

I am using the demo.sh provided in the syntaxnet repository. If I give it input separated by '\n', it takes 27.05 seconds to run 3000 lines of text, but when I run each line individually, it takes more than one hour.
This means that loading the model takes over 2.5 seconds per run. If that step were separated out and the loaded model kept cached, it would make the whole pipeline faster.
Here is a modified version of demo.sh:
PARSER_EVAL=bazel-bin/syntaxnet/parser_eval
MODEL_DIR=syntaxnet/models/parsey_mcparseface
[[ "$1" == "--conll" ]] && INPUT_FORMAT=stdin-conll || INPUT_FORMAT=stdin
$PARSER_EVAL \
--input=$INPUT_FORMAT \
--output=stdout-conll \
--hidden_layer_sizes=64 \
--arg_prefix=brain_tagger \
--graph_builder=structured \
--task_context=$MODEL_DIR/context.pbtxt \
--model_path=$MODEL_DIR/tagger-params \
--slim_model \
--batch_size=1024 \
--alsologtostderr \
| \
$PARSER_EVAL \
--input=stdin-conll \
--output=stdout-conll \
--hidden_layer_sizes=512,512 \
--arg_prefix=brain_parser \
--graph_builder=structured \
--task_context=$MODEL_DIR/context.pbtxt \
--model_path=$MODEL_DIR/parser-params \
--slim_model \
--batch_size=1024 \
--alsologtostderr \
I want to build a function that takes an input sentence and returns the dependency parser output, with the model held in a local variable, like below (the code below is just to make the question clearer):
dependency_parsing_model = ...

def give_dependency_parser(sentence, model=dependency_parsing_model):
    ...
    # logic here
    ...
    return dependency_parsing_output
In the above, the model is stored in a variable, so running each line through the function takes less time.
How do I do this?
The current version of syntaxnet's Parsey McParseface has two limitations which you've run across:
Sentences are read from stdin or a file, not from a variable
The model is in two parts and not a single executable
I have a branch of tensorflow/models:
https://github.com/dmansfield/models/tree/documents-from-tensor
which I'm working with the maintainers to get merged. With this branch of the code you can build the entire model in one graph (using a new python script called parsey_mcparseface.py) and feed sentences in with a tensor (i.e. a python variable).
Not the best answer in the world I'm afraid because it's very much in flux. There's no simple recipe for getting this working at the moment.

Completion when program has sub-commands

I have written a command-line tool that uses sub-commands much like Mercurial, Git, Subversion &c., in that its general usage is:
>myapp [OPTS] SUBCOMMAND [SUBCOMMAND-OPTS] [ARGS]
E.g.
>myapp --verbose speak --voice=samantha --quickly "hello there"
I'm now in the process of building Zsh completion for it, but have quickly found out that it is a complex beast. I have had a look at the _hg and _git completions, but they are very involved and differ in approach (I struggle to understand them), although both seem to handle each sub-command separately.
Does anyone know if there is a way, using the built-in functions (_arguments, _values, pick_variant &c.), to handle the concept of sub-commands correctly, including handling general options and sub-command-specific options appropriately? Or would the best approach be to handle the general options and sub-commands manually?
A noddy example would be very much appreciated.
Many thanks.
Writing completion scripts for zsh can be quite difficult. Your best bet is to use an existing one as a guide. The one for Git is way too much for a beginner. You can use this repo:
https://github.com/zsh-users/zsh-completions
As for your question, you have to use the concept of state. You define your subcommands in a list and then identify via $state which command you are in. Then you define the options for each command. You can see this in the completion script for play. A simplified version is below:
_play() {
  local ret=1
  _arguments -C \
    '1: :_play_cmds' \
    '*::arg:->args' \
    && ret=0
  case $state in
    (args)
      case $line[1] in
        (build-module|list-modules|lm|check|id)
          _message 'no more arguments' && ret=0
          ;;
        (dependencies|deps)
          _arguments \
            '1:: :_play_apps' \
            '(--debug)--debug[Debug mode (even more informations logged than in verbose mode)]' \
            '(--jpda)--jpda[Listen for JPDA connection. The process will suspended until a client is plugged to the JPDA port.]' \
            '(--sync)--sync[Keep lib/ and modules/ directory synced. Delete unknow dependencies.]' \
            '(--verbose)--verbose[Verbose Mode]' \
            && ret=0
          ;;
      esac
  esac
(If you are going to paste this, use the original source, as this won't work).
It looks daunting, but the general idea is not that complicated:
The subcommand comes first (_play_cmds is a list of subcommands with a description for each one).
Then come the arguments. The arguments are built based on which subcommand you are choosing. Note that you can group multiple subcommands if they share arguments.
With man zshcompsys, you can find more info about the whole system, although it is somewhat dense.
I found a technique that works well and is easy to understand. Basically, you create a new completion function for each subcommand and call it from the top-level completion function. Here's an example with dolt, showing how dolt completes to dolt table and dolt table completes to dolt table import, which then completes with a set of flags:
_dolt() {
    local line state
    _arguments -C \
        "1: :->cmds" \
        "*::arg:->args"
    case "$state" in
        cmds)
            _values "dolt command" \
                "table[Commands for copying, renaming, deleting, and exporting tables.]" \
            ;;
        args)
            case $line[1] in
                table)
                    _dolt_table
                    ;;
            esac
            ;;
    esac
}
_dolt_table() {
    local line state
    _arguments -C \
        "1: :->cmds" \
        "*::arg:->args"
    case "$state" in
        cmds)
            _values "dolt_table command" \
                "import[Creates, overwrites, replaces, or updates a table from the data in a file.]" \
            ;;
        args)
            case $line[1] in
                import)
                    _dolt_table_import
                    ;;
            esac
            ;;
    esac
}
_dolt_table_import() {
    _arguments -s \
        {-c,--create-table}'[Create a new table, or overwrite an existing table (with the -f flag) from the imported data.]' \
        {-u,--update-table}'[Update an existing table with the imported data.]' \
        {-f,--force}'[If a create operation is being executed, data already exists in the destination, the force flag will allow the target to be overwritten.]' \
        {-r,--replace-table}'[Replace existing table with imported data while preserving the original schema.]' \
        '(--continue)--continue[Continue importing when row import errors are encountered.]' \
        {-s,--schema}'[The schema for the output data.]' \
        {-m,--map}'[A file that lays out how fields should be mapped from input data to output data.]' \
        {-pk,--pk}'[Explicitly define the name of the field in the schema which should be used as the primary key.]' \
        '(--file-type)--file-type[Explicitly define the type of the file if it can''t be inferred from the file extension.]' \
        '(--delim)--delim[Specify a delimiter for a csv style file with a non-comma delimiter.]'
}
I wrote a full guide here:
https://www.dolthub.com/blog/2021-11-15-zsh-completions-with-subcommands/
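To actually use functions like these, they have to be registered with zsh's completion system. A minimal sketch, assuming a stock zsh setup (the file name here is only an example): either save the functions in a file named _dolt somewhere on $fpath whose first line is a #compdef tag, or bind them by hand for quick testing in the current shell:
# Option 1: first line of a file named _dolt on $fpath, then open a new shell
#compdef dolt

# Option 2: quick testing in the running shell, after defining the functions above
autoload -Uz compinit && compinit
compdef _dolt dolt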
