Is it possible to inspect all tables in a BigQuery dataset with one dlpJob? - google-cloud-dlp

I'm using Google Cloud DLP to inspect sensitive data in BigQuery. Is it possible to inspect all tables within a dataset with one dlpJob? If so, how should I set the config?
I tried omitting the BigQuery tableId field in the config, but it returns an HTTP 400 error: "table_id must be set". Does this mean that one dlpJob can only inspect one table, and that to scan multiple tables we need multiple dlpJobs? Or is there a way to scan multiple tables within the same dataset with some regex trick?

At the moment, one job scans just one table. The team is working on that feature; in the meantime you can create the jobs manually with a rough shell script like the one below, which combines gcloud with REST calls to the DLP API. You could probably do something a lot smoother with Cloud Functions.
Prerequisites:
1. Install gcloud: https://cloud.google.com/sdk/install
2. Run this script with the following arguments:
   1. The project_id whose BigQuery tables should be scanned.
   2. The dataset id of the output table to store findings to.
   3. The table id of the output table to store findings to.
   4. A number representing the percentage of rows to scan.
# Example:
# ./inspect_all_bq_tables.sh dlapi-test findings_dataset findings_table 50
# Reports a status of execution message to the log file and serial port
function report() {
local tag="${1}"
local message="${2}"
local timestamp="$(date +%s)000"
echo "${timestamp} - ${message}"
}
readonly -f report
# report_status_update
#
# Reports a status of execution message to the log file and serial port
function report_status_update() {
report "${MSGTAG_STATUS_UPDATE}" "STATUS=${1}"
}
readonly -f report_status_update
# create_job
#
# Creates a single dlp job for a given bigquery table.
function create_dlp_job {
local dataset_id="$1"
local table_id="$2"
local create_job_response=$(curl -s -H \
"Authorization: Bearer $(gcloud auth print-access-token)" \
-H "X-Goog-User-Project: $PROJECT_ID" \
-H "Content-Type: application/json" \
"$API_PATH/v2/projects/$PROJECT_ID/dlpJobs" \
--data '
{
"inspectJob":{
"storageConfig":{
"bigQueryOptions":{
"tableReference":{
"projectId":"'$PROJECT_ID'",
"datasetId":"'$dataset_id'",
"tableId":"'$table_id'"
},
"rowsLimitPercent": "'$PERCENTAGE'"
}
},
"inspectConfig":{
"infoTypes":[
{
"name":"ALL_BASIC"
}
],
"includeQuote":true,
"minLikelihood":"LIKELY"
},
"actions":[
{
"saveFindings":{
"outputConfig":{
"table":{
"projectId":"'$PROJECT_ID'",
"datasetId":"'$FINDINGS_DATASET_ID'",
"tableId":"'$FINDINGS_TABLE_ID'"
},
"outputSchema": "BASIC_COLUMNS"
}
}
},
{
"publishFindingsToCloudDataCatalog": {}
}
]
}
}')
if [[ $create_job_response != *"dlpJobs"* ]]; then
report_status_update "Error creating dlp job: $create_job_response"
exit 1
fi
local new_dlpjob_name=$(echo "$create_job_response" \
| head -5 | grep -Po '"name": *\K"[^"]*"' | tr -d '"' | head -1)
report_status_update "DLP New Job: $new_dlpjob_name"
}
readonly -f create_dlp_job
# List the datasets for a given project. Once we have these we can list the
# tables within each one.
function create_jobs() {
# The grep pulls the dataset id. The tr removes the quotation marks.
local list_datasets_response=$(curl -s -H \
"Authorization: Bearer $(gcloud auth print-access-token)" -H \
"Content-Type: application/json" \
"$BIGQUERY_PATH/projects/$PROJECT_ID/datasets")
if [[ $list_datasets_response != *"kind"* ]]; then
report_status_update "Error listing bigquery datasets: $list_datasets_response"
exit 1
fi
local dataset_ids=$(echo $list_datasets_response \
| grep -Po '"datasetId": *\K"[^"]*"' | tr -d '"')
# Each row will look like "datasetId", with the quotation marks
for dataset_id in ${dataset_ids}; do
report_status_update "Looking up tables for dataset $dataset_id"
local list_tables_response=$(curl -s -H \
"Authorization: Bearer $(gcloud auth print-access-token)" -H \
"Content-Type: application/json" \
"$BIGQUERY_PATH/projects/$PROJECT_ID/datasets/$dataset_id/tables")
if [[ $list_tables_response != *"kind"* ]]; then
report_status_update "Error listing bigquery tables: $list_tables_response"
exit 1
fi
local table_ids=$(echo "$list_tables_response" \
| grep -Po '"tableId": *\K"[^"]*"' | tr -d '"')
for table_id in ${table_ids}; do
report_status_update "Creating DLP job to inspect table $table_id"
create_dlp_job "$dataset_id" "$table_id"
done
done
}
readonly -f create_jobs
PROJECT_ID=$1
FINDINGS_DATASET_ID=$2
FINDINGS_TABLE_ID=$3
PERCENTAGE=$4
API_PATH="https://dlp.googleapis.com"
BIGQUERY_PATH="https://www.googleapis.com/bigquery/v2"
# Main
create_jobs
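Parsing JSON with grep -Po and tr, as the script above does, is fragile. If jq is installed, the same extraction can be done structurally. Here is a sketch against a canned datasets.list response (the response shape follows the BigQuery REST API; the project and dataset names are made up):

```shell
# Canned stand-in for the datasets.list response used in create_jobs above.
list_datasets_response='{"kind":"bigquery#datasetList","datasets":[
  {"datasetReference":{"projectId":"my-project","datasetId":"sales"}},
  {"datasetReference":{"projectId":"my-project","datasetId":"hr"}}]}'
# jq walks the JSON structure instead of pattern-matching the raw text.
dataset_ids=$(echo "$list_datasets_response" | jq -r '.datasets[].datasetReference.datasetId')
echo "$dataset_ids"
```

The tables.list response can be handled the same way with `.tables[].tableReference.tableId`.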

Related

jq: how to check two conditions in the any filter?

I have this line jq 'map(select( any(.topics[]; . == "stackoverflow" )))'
Now I want to modify it (I didn't write the original) to add another condition to the any function.
Something like this jq 'map(select( any(.topics[]; . == "stackoverflow" and .archived == "false" )))'
But it gives me the error: Cannot index string with string "archived".
The archived field is on the same level as the topics array (it's repo information from the GitHub API).
It is part of a longer command, FYI:
repositoryNames=$(curl \
-H "Authorization: token $GITHUB_TOKEN" \
-H "Accept: application/vnd.github.v3+json" \
"https://api.github.com/orgs/organization/repos?per_page=100&page=$i" | \
jq 'map(select(any(.topics[]; . == "stackoverflow")))' | \
jq -r '.[].name')
The generator provided to any already descends to .topics[] from where you cannot back-reference two levels higher. Use the select statement to filter beforehand (also note that booleans are not strings):
jq 'map(select(.archived == false and any(.topics[]; . == "stackoverflow")))'
You should also be able to combine both calls to jq into one:
jq -r '.[] | select(.archived == false and any(.topics[]; . == "stackoverflow")).name'
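To see the difference on sample data, here is a small demo; the repo objects below are trimmed, made-up stand-ins for the GitHub API response:

```shell
# Trimmed stand-ins for GitHub repo objects (names and topics are invented):
repos='[{"name":"repo-a","archived":false,"topics":["stackoverflow","cli"]},
        {"name":"repo-b","archived":true,"topics":["stackoverflow"]},
        {"name":"repo-c","archived":false,"topics":["docs"]}]'
# repo-b is archived and repo-c lacks the topic, so only repo-a survives.
echo "$repos" | jq -r '.[] | select(.archived == false and any(.topics[]; . == "stackoverflow")).name'
# prints: repo-a
```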

How can I add a user to a protected branch?

I would like to configure my GitLab project so that every maintainer can merge (after review) but nobody can push to master; only a bot (for releases).
I'm using Terraform to configure my GitLab, with something like this:
resource "gitlab_branch_protection" "BranchProtect" {
project = local.project_id
branch = "master"
push_access_level = "no one"
merge_access_level = "maintainer"
}
But we have a "premium" version, and the Terraform provider does not allow adding a user (see: https://github.com/gitlabhq/terraform-provider-gitlab/issues/165 ).
So what I'd like to do is make some HTTP requests against the API to add the specific user.
So I'm doing it like this:
1. get the current protection
2. delete the current configuration
3. update the retrieved configuration with what I want
4. push the new configuration
BTW: I haven't found a way to just update the configuration... https://docs.gitlab.com/ee/api/protected_branches.html
TMP_FILE=$(mktemp)
http GET \
$GITLAB_URL/api/v4/projects/$pid/protected_branches \
PRIVATE-TOKEN:$GITLAB_TOKEN \
name=$BRANCH_NAME \
| \
jq \
--arg uid $USER_ID \
'.[0] | .push_access_levels |= . + [{user_id: ($uid | tonumber)}]' \
> $TMP_FILE
http DELETE \
"$GITLAB_URL/api/v4/projects/$pid/protected_branches/$BRANCH_NAME" \
PRIVATE-TOKEN:$GITLAB_TOKEN
http --verbose POST \
"$GITLAB_URL/api/v4/projects/$pid/protected_branches" \
PRIVATE-TOKEN:$GITLAB_TOKEN \
< $TMP_FILE
But my problem is that the resulting configuration is not what I expect; I get something like this:
"push_access_levels": [
{
"access_level": 40,
"access_level_description": "Maintainers",
"group_id": null,
"user_id": null
}
],
How can I just update the branch protection to add a single user?
OK, like they say: RTFM!
But note that you need to delete the rule before adding the new configuration.
http \
DELETE \
"$GITLAB_URL/api/v4/projects/$pid/protected_branches/$BRANCH_NAME" \
PRIVATE-TOKEN:$GITLAB_TOKEN
http \
POST \
$GITLAB_URL/api/v4/projects/$pid/protected_branches \
PRIVATE-TOKEN:$GITLAB_TOKEN \
name==${BRANCH_NAME} \
push_access_level==0 \
merge_access_level==40 \
unprotect_access_level==40 \
allowed_to_push[][user_id]==$USER_ID

Extract fields from json using jq

I am trying to write a shell script that will fetch some JSON from a URL, parse it, and extract fields.
This is what I've done so far.
#!/bin/bash
token=$(http POST :3000/signin/frontm user:='{"email": "sourav#frontm.com", "password": "Hello_789"}' | jq -r '.data.id_token')
cred=$(http POST :3000/auth provider_name:frontm token:$token user:=#/tmp/user.json | jq '{ creds: .creds, userUuid: .user.userId }')
echo $cred
access=$(jq -r "'$cred'")
echo $access
So the output from echo $cred is JSON, e.g.:
{ "creds": { "accessKeyId": "ASIAJPM3RDAZXEORAQ5Q", "secretAccessK
ey": "krg5GbU6gtQV+a5pz4ChL+ECVJm+wKogjglXOqr6", "sessionToken": "Ag
oGb3JpZ2luEAYaCXVzLWVhc3QtMSKAAmhOg7fedV+sBw+8c45HL9naPjqbC0bwaBxq
mQ9Kuvnirob8KtTcsiBkJA/OfCTpYNUFaXXYfUPvbmW5UveDJd+32Cb5Ce+3lAOkkL
aZyWJgvhM1u53WNuMekhcZX7SnlCcaO4e/A9TR74qMOsVptonw5jFB5zjbEI4hFsVX
UHXtkYMYpSyG+2P2LxWRqTg4XKcg2vT+qrLtiXu3XNK70wuCe0/L4/HjjzlLvChmhe
TRs8u8ZRcJvSim/j1sLqe85Sl1qrFv/7msCaxUa3gZ3dOcfHliH64+8NHfS1tkaVkS
iM2x4wxTdZI/SafduFDvGCsltxe9p5zQD0Jb1Qe02ccqpgUIWxAAGgw3NzE5NTYwMD
EyNDciDOQZkq8t+c7WatNLHyqDBahqpQwxpGsYODIC1Db/M4+PXmuYMdYKLwjv3Df2
JeTMw2RT1h8M0IOOPvyBWetwB42HLhv5AobIMkNVSw6tpGyZC/bLMGJatptB0hVMBg
/80VnI7pTPiSjb/LG46bbwlbJevPoorCEEqMZ3MlAJ2Xt2hMmA+sHBRRvV1hlkMnS8
NW6w9xApSGrD001zdfFkmBbHw+c4vmX+TMT7Bw0bHQZ5FQSpEBOw9M5sNOIoa+G/pP
p4WoHiYfGHzaXGQe9Iac07Fy36W/WRebZapvF7TWoIpBjAV+IrQKP3ShJdBi3Oa6py
lGUQysPa3EN0AF/gDuTsdz7TDsErzzUERfQHksK495poG92YoG2/ir8yqTQtUDvshO
7U4SbFpUrozCYT6vp7++BWnpe+miIRCvjy2spqBqv2RY6lhgC6QPfS/365T+QbSTMc
R+ZNes0gX/QrEG4q1sMoxyTltL4sXS2Dz9UXywPkg78AWCOr34ii72m/67Gqe1P3KA
vBe9xF9Hem4H1WbYAqBN76ppyJyG17qK8b2/r71c8rdY+1gYcskV1vUfTQUVCGE0y2
JXKV2UMFOwoTzy6SFIGcuTeOAHiYPgTkMZ6X7hNjf56ihzBIbhSHaST8U4eNBka8j8
Y949ilJwz9QO0l1kwdb2+fQSMblHgeYvF1P8HxBSpRA28gKkkXMf73Zk27I3O2DRGb
lcXS4tKRvan4ASTi4qkdrvVwMT5mwJI4mGIJZSiMJqPxjVh5E9OicFbIOCRcbcIRDE
mj5t9EvaSbIm4ELBMuyoFjmKJmesE03uFRcHkEXkPBxhkJbQwkJeUxHll5kR1IYzvA
K2A2EiZqjkhiSJC4NRekEuM+5WowwuWw1wU=" }, "userUuid": "mugqRKHmTPxk
obBAtwTmKk" }
So basically I am stuck here... how do I parse the JSON in $cred further and get access to, say, accessKeyId using jq?
I wonder if the variable $cred really holds a string wrapped at 67 columns; if so, tr can remove the newlines so that jq can extract the accessKeyId:
echo "$cred" | tr -d '\n' | jq -r '.creds.accessKeyId'
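To demonstrate why the tr step matters, here is a self-contained sketch with a made-up stand-in for $cred that contains a line break inside a value, as a terminal wrap would produce (the key values are fake):

```shell
# Stand-in for $cred with a literal newline in the middle of a string value:
cred='{ "creds": { "accessKeyId": "ASIAEXAMPLEKEY", "secretAcc
essKey": "fakeSecret" }, "userUuid": "mugqRKHmTPxk" }'
# jq rejects the raw newline inside the string; deleting newlines first
# restores valid JSON, after which the lookup works.
access_key=$(echo "$cred" | tr -d '\n' | jq -r '.creds.accessKeyId')
echo "$access_key"
# prints: ASIAEXAMPLEKEY
```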

Filtering information from JIRA

I would like to get some information from a Jira project using an HTTP call, e.g.:
curl -D- -u uname:pass -X GET -H "Content-Type: application/json" "http://localhost:8080/jira/rest/api/2/search?jql=project=XXX%20created='-5d'"
This returns a lot of information, but I would like to get only one field:
{"expand":"schema,names","startAt":0,"maxResults":50,"total":1234,"issues":
here - multiple lines....
Do you have any idea how I can get only the "total":1234 field?
Thank you in advance.
Add the following to your URL:
&maxResults=0
Which will result in a return like:
{
"startAt": 0,
"maxResults": 0,
"total": 3504,
"issues": []
}
You can then pipe your curl output through awk to get just the number:
curl --silent "https://jira.atlassian.com/rest/api/2/search?jql=project=STASH%20&created=%27-5d%27&maxResults=0" | awk '{split($0,a,":"); print a[4]}' | awk '{split($0,a,","); print a[1]}'
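The awk split depends on the exact field order and spacing of the response. If jq is available, extracting total is more robust; using the sample return from above:

```shell
# Sample response body with maxResults=0, as shown above.
response='{"startAt":0,"maxResults":0,"total":3504,"issues":[]}'
# jq addresses the field by name, so field order and whitespace don't matter.
total=$(echo "$response" | jq '.total')
echo "$total"
# prints: 3504
```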

Custom ceilometer metrics

I am trying to add a custom metric to Ceilometer via the API. Adding the new metric and new data succeeds, but I fail to see the new metric in the dashboard.
The commands I used:
Get a token:
curl -i -X POST http://controller:35357/v2.0/tokens -H "Content-Type: application/json" -H "User-Agent: python-keystoneclient" -d '{"auth": {"tenantName": "test", "passwordCredentials": {"username": "admin", "password": "password"}}}' ;
Take token:
mysql -e 'use keystone; select id from token;' | tail -n 1
Add custom metric with data:
curl -X POST -H 'X-Auth-Token: TOKEN' -H 'Content-Type: application/json' -d '[{"counter_name": "test","user_id": "admin_user_id","resource_id": "Virtual_machine_ID","resource_metadata": {"display_name": "my_test","my_custom_metadata_1": "value1","my_custom_metadata_2": "value2"},"counter_unit": "%","counter_volume": 10.57762938230384,"project_id": "VM_tenant_ID","counter_type": "gauge"}]' http://controller:8777/v2/meters/test
All of these commands succeed =)
Checking with commands like:
ceilometer sample-list -m test
ceilometer meter-list |grep test
ceilometer statistics -m test
they return the data I entered earlier. But when I open the dashboard's Resources Usage Overview, I can't see the new metric in the list.
So I can't find a solution to my problem. Can anybody help me?
