How to scrape these player ratings from Squawka

I want to scrape the player ratings from Squawka. If I just do a URL request and parse the content in Python using BeautifulSoup, the ratings and player names do not show up anywhere. How should I proceed? For the specific URL see:
http://www2.squawka.com/football-player-rankings#performance-score#player-stats#english-premier-league|season-2017/2018#all-teams#all-player-positions#16#40#0#0#90#11/08/2017#13/05/2018#season#1#all-matches#total

The data is not embedded in the HTML; it is retrieved from a separate JSON API, http://www2.squawka.com/wp-content/themes/squawka_web/leaderboard_process-v2.php, with some URL parameters:
http://www2.squawka.com/wp-content/themes/squawka_web/leaderboard_process-v2.php?type=Player%20Stats&filter=2&league=819&team=0,31,299,301,302,33,169,34,309,315,36,37,38,39,43,44,46,47,323,48,49&played=All%20matches&position=All%20Player%20Positions&agestart=16&ageend=40&noofmatch=0&seasonstart=11/08/2017&seasonend=13/05/2018&by=season&timestart=0&timeend=90&is_home=1&showtype=total
To get each player's full name and total, using curl and jq:
curl -s 'http://www2.squawka.com/wp-content/themes/squawka_web/leaderboard_process-v2.php?type=Player%20Stats&filter=2&league=819&team=0,31,299,301,302,33,169,34,309,315,36,37,38,39,43,44,46,47,323,48,49&played=All%20matches&position=All%20Player%20Positions&agestart=16&ageend=40&noofmatch=0&seasonstart=11/08/2017&seasonend=13/05/2018&by=season&timestart=0&timeend=90&is_home=1&showtype=total' | \
jq '[
.result | .. | {total: .data?.total?, name: .info?.full_name?} | select(.total != null)
] | sort_by(.total) | reverse'
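If you'd rather stay in Python, the same recursive extraction can be sketched with the standard library. The nesting used in `sample` below is a made-up stand-in read off the jq filter above (nodes carrying `data.total` and `info.full_name`); in practice you would fetch the leaderboard_process-v2.php URL and feed the decoded body to this function.

```python
# A Python sketch of the jq filter above: walk the decoded JSON tree
# (like jq's `..`), collect every node that carries a data.total, and
# sort the results descending by total.
def extract_ratings(node):
    found = []
    if isinstance(node, dict):
        total = (node.get("data") or {}).get("total")
        if total is not None:
            name = (node.get("info") or {}).get("full_name")
            found.append({"total": total, "name": name})
        for value in node.values():
            found.extend(extract_ratings(value))
    elif isinstance(node, list):
        for value in node:
            found.extend(extract_ratings(value))
    return found

# Hypothetical sample mirroring the API's shape.
sample = {"result": [
    {"data": {"total": 312}, "info": {"full_name": "Player A"}},
    {"data": {"total": 407}, "info": {"full_name": "Player B"}},
]}
ratings = sorted(extract_ratings(sample), key=lambda r: r["total"], reverse=True)
print(ratings)  # Player B (407) first
```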

Related

jq: how to check two conditions in the any filter?

I have this line: jq 'map(select( any(.topics[]; . == "stackoverflow" )))'
Now I want to modify it (I didn't write the original) to add another condition to the any function, something like this: jq 'map(select( any(.topics[]; . == "stackoverflow" and .archived == "false" )))'
But it gives me "Cannot index string with \"archived\"".
The archived field is on the same level as the topics array (it's repo information from the GitHub API).
For reference, it is part of a longer command:
repositoryNames=$(curl \
-H "Authorization: token $GITHUB_TOKEN" \
-H "Accept: application/vnd.github.v3+json" \
"https://api.github.com/orgs/organization/repos?per_page=100&page=$i" | \
jq 'map(select(any(.topics[]; . == "stackoverflow")))' | \
jq -r '.[].name')
The generator provided to any has already descended into .topics[], from where you cannot reference fields two levels higher. Use select to filter on .archived beforehand (and note that booleans are not strings):
jq 'map(select(.archived == false and any(.topics[]; . == "stackoverflow")))'
You should also be able to combine both calls to jq into one:
jq -r '.[] | select(.archived == false and any(.topics[]; . == "stackoverflow")).name'
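For comparison, the same two-condition filter is easy to sketch in Python over the decoded response. The repo dicts here are minimal made-up stand-ins for the GitHub API objects:

```python
# Keep repos that are not archived AND have "stackoverflow" among their
# topics, then project out the names -- the same logic as the jq filter.
repos = [
    {"name": "docs", "archived": False, "topics": ["stackoverflow", "help"]},
    {"name": "old-tool", "archived": True, "topics": ["stackoverflow"]},
    {"name": "website", "archived": False, "topics": ["frontend"]},
]

names = [r["name"] for r in repos
         if not r["archived"] and "stackoverflow" in r["topics"]]
print(names)  # ['docs']
```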

Problem with jq doing a select and contains

I have limited experience with jq and am having an issue doing a select with contains for a string in an array that also holds a boolean. This is my JSON, and I am looking to get back just tdonn.
[
"user",
"admin"
]
[
[
"tdonn",
true
]
]
Here is what I'm trying; I have tried many different variations too.
jq -e -r '.results[] | .series[] | select(.values[] | contains("tdon"))[]'
With the sample JSON shown in a comment, the following filter would produce the result shown:
.results[] | .series[][] | flatten[] | select(contains("tdon")?)
Output, with the -r option:
tdonn
You might like to consider:
jq '.. | strings | select(contains("tdon"))'
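The `.. | strings | select(contains(...))` idiom translates naturally to a recursive walk in Python, shown here against the sample structure from the question:

```python
# Walk every value in a nested structure and yield the strings that
# contain the needle -- the Python counterpart of jq's `.. | strings`.
def matching_strings(node, needle):
    if isinstance(node, str):
        if needle in node:
            yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from matching_strings(value, needle)
    elif isinstance(node, (list, tuple)):
        for value in node:
            yield from matching_strings(value, needle)

data = [["user", "admin"], [["tdonn", True]]]
matches = list(matching_strings(data, "tdon"))
print(matches)  # ['tdonn']
```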

kusto function to parse json which is number

I am not able to parse my JSON value: I tried parse_json() and todynamic(), but the result column values come back empty.
The issue is that your payload includes an internal invalid JSON payload.
It is possible to "fix" it using the query language (see the usages of replace() in the example below); however, it would be best if you could write a valid JSON payload to begin with.
Try running this:
print s = #'{"pipelineId":"63dfc1f6-5a43-5bca-bffe-6a36a435e19d","vmId":"9252382a-814f-4d02-9b1b-305db4caa208/usl-exepipe-dev/westus/usl-exepipe-lab-dev/asuvp306563","artifactResult":{"Id":"execution-job-2","SourceName":"USL Repository","ArtifactName":"install-lcu","Status":"Succeeded","Parameters":null,"Log":"[{\"code\":\"ComponentStatus/StdOut/succeeded\",\"level\":\"Info\",\"displayStatus\":\"Provisioning succeeded\",\"message\":\"2020-06-02T14:33:04.711Z | I | Starting artifact ''install-lcu''\r\n2020-06-02T14:33:04.867Z | I | Starting Installation\r\n2020-06-02T14:33:04.899Z | I | C:\\USL\\LCU\\4556803.msu Exists.\r\n2020-06-02T14:33:04.914Z | I | Starting installation process ''C:\\USL\\LCU\\4556803.msu /quiet /norestart''\r\n2020-06-02T14:43:14.169Z | I | Process completed with exit code ''3010''\r\n2020-06-02T14:43:14.200Z | I | Need to restart computer after hotfix 4556803 installation\r\n2020-06-02T14:43:14.200Z | I | Finished Installation\r\n2020-06-02T14:43:14.200Z | I | Artifact ''install-lcu'' succeeded\r\n\",\"time\":null},{\"code\":\"ComponentStatus/StdErr/succeeded\",\"level\":\"Info\",\"displayStatus\":\"Provisioning succeeded\",\"message\":\"\",\"time\":null}]","DeploymentLog":null,"StartTime":"2020-06-02T14:32:40.9882134Z","ExecutionTime":"00:11:21.2468597","BSODCount":0},"attempt":1,"instanceId":"a301aaa0c2394e76832867bfeec04b5d:0","parentInstanceId":"78d0b036a5c548ecaafc5e47dcc76ee4:2","eventName":"Artifact Result"}'
| mv-expand log = parse_json(replace("\r\n", " ", replace(#"\\", #"\\\\", tostring(parse_json(tostring(parse_json(s).artifactResult)).Log))))
| project log.code, log.level, log.displayStatus, log.message
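The repair the query performs can be illustrated in Python with a simplified stand-in for the inner Log field: its Windows paths contain lone backslashes (e.g. `\U` in `C:\USL`) that are invalid JSON escapes, so the backslashes are doubled and the `\r\n` sequences dropped before the inner parse. (The ordering and regex details of Kusto's replace() differ; this only shows the idea.)

```python
import json

# Simplified stand-in for the Log field: literal \r\n sequences and
# un-escaped Windows-path backslashes make it invalid JSON as-is.
raw_log = r'[{"code":"StdOut","message":"C:\USL\LCU\4556803.msu installed\r\nDone"}]'

fixed = raw_log.replace("\\r\\n", " ")   # drop the embedded \r\n sequences
fixed = fixed.replace("\\", "\\\\")      # double the remaining lone backslashes
entries = json.loads(fixed)              # now parses cleanly
print(entries[0]["message"])  # C:\USL\LCU\4556803.msu installed Done
```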

Is it possible to inspect all tables in a BigQuery dataset with one dlpJob?

I'm using Google Cloud DLP to inspect sensitive data in BigQuery. Is it possible to inspect all tables within a dataset with one dlpJob? If so, how should I set the configs?
I tried omitting the BQ tableId field in the config, but it returns an HTTP 400 error, "table_id must be set". Does that mean one dlpJob can inspect only one table, and to scan multiple tables we need multiple dlpJobs? Or is there a way to scan multiple tables within the same dataset with some regex tricks?
At the moment, one job scans just one table. The team is working on that feature; in the meantime you can create the jobs yourself with a rough shell script like the one below, which combines gcloud and REST calls to the DLP API. You could probably do something a lot smoother with Cloud Functions.
Prerequisites:
1. Install gcloud: https://cloud.google.com/sdk/install
2. Run this script with the following arguments:
   1. The project id whose BigQuery tables should be scanned.
   2. The dataset id of the output table to store findings in.
   3. The table id of the output table to store findings in.
   4. The percentage of rows to scan.
# Example:
# ./inspect_all_bq_tables.sh dlapi-test findings_dataset findings_table 10
# Reports a status of execution message to the log file and serial port
function report() {
local tag="${1}"
local message="${2}"
local timestamp="$(date +%s)000"
echo "${timestamp} [${tag}] ${message}"
}
readonly -f report
# report_status_update
#
# Reports a status of execution message to the log file and serial port
function report_status_update() {
report "${MSGTAG_STATUS_UPDATE}" "STATUS=${1}"
}
readonly -f report_status_update
# create_job
#
# Creates a single dlp job for a given bigquery table.
function create_dlp_job {
local dataset_id="$1"
local table_id="$2"
local create_job_response=$(curl -s -H \
"Authorization: Bearer $(gcloud auth print-access-token)" \
-H "X-Goog-User-Project: $PROJECT_ID" \
-H "Content-Type: application/json" \
"$API_PATH/v2/projects/$PROJECT_ID/dlpJobs" \
--data '
{
"inspectJob":{
"storageConfig":{
"bigQueryOptions":{
"tableReference":{
"projectId":"'$PROJECT_ID'",
"datasetId":"'$dataset_id'",
"tableId":"'$table_id'"
},
"rowsLimitPercent": "'$PERCENTAGE'"
}
},
"inspectConfig":{
"infoTypes":[
{
"name":"ALL_BASIC"
}
],
"includeQuote":true,
"minLikelihood":"LIKELY"
},
"actions":[
{
"saveFindings":{
"outputConfig":{
"table":{
"projectId":"'$PROJECT_ID'",
"datasetId":"'$FINDINGS_DATASET_ID'",
"tableId":"'$FINDINGS_TABLE_ID'"
},
"outputSchema": "BASIC_COLUMNS"
}
}
},
{
"publishFindingsToCloudDataCatalog": {}
}
]
}
}')
if [[ $create_job_response != *"dlpJobs"* ]]; then
report_status_update "Error creating dlp job: $create_job_response"
exit 1
fi
local new_dlpjob_name=$(echo "$create_job_response" \
| head -5 | grep -Po '"name": *\K"[^"]*"' | tr -d '"' | head -1)
report_status_update "DLP New Job: $new_dlpjob_name"
}
readonly -f create_dlp_job
# List the datasets for a given project. Once we have these we can list the
# tables within each one.
function create_jobs() {
# The grep pulls the dataset id. The tr removes the quotation marks.
local list_datasets_response=$(curl -s -H \
"Authorization: Bearer $(gcloud auth print-access-token)" -H \
"Content-Type: application/json" \
"$BIGQUERY_PATH/projects/$PROJECT_ID/datasets")
if [[ $list_datasets_response != *"kind"* ]]; then
report_status_update "Error listing bigquery datasets: $list_datasets_response"
exit 1
fi
local dataset_ids=$(echo $list_datasets_response \
| grep -Po '"datasetId": *\K"[^"]*"' | tr -d '"')
# Each row is a bare dataset id; the tr above already removed the quotation marks.
for dataset_id in ${dataset_ids}; do
report_status_update "Looking up tables for dataset $dataset_id"
local list_tables_response=$(curl -s -H \
"Authorization: Bearer $(gcloud auth print-access-token)" -H \
"Content-Type: application/json" \
"$BIGQUERY_PATH/projects/$PROJECT_ID/datasets/$dataset_id/tables")
if [[ $list_tables_response != *"kind"* ]]; then
report_status_update "Error listing bigquery tables: $list_tables_response"
exit 1
fi
local table_ids=$(echo "$list_tables_response" \
| grep -Po '"tableId": *\K"[^"]*"' | tr -d '"')
for table_id in ${table_ids}; do
report_status_update "Creating DLP job to inspect table $table_id"
create_dlp_job "$dataset_id" "$table_id"
done
done
}
readonly -f create_jobs
PROJECT_ID=$1
FINDINGS_DATASET_ID=$2
FINDINGS_TABLE_ID=$3
PERCENTAGE=$4
API_PATH="https://dlp.googleapis.com"
BIGQUERY_PATH="https://www.googleapis.com/bigquery/v2"
# Main
create_jobs
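Stripped of the curl plumbing, the fan-out the script performs is just two nested loops: enumerate datasets, enumerate each dataset's tables, create one DLP job per table. A Python sketch with the REST calls stubbed out as plain callables (all names below are hypothetical):

```python
# Control flow of the script above, with the BigQuery list calls and the
# DLP job creation injected as callables so the fan-out is easy to see.
def create_all_jobs(list_datasets, list_tables, create_dlp_job):
    jobs = []
    for dataset_id in list_datasets():
        for table_id in list_tables(dataset_id):
            jobs.append(create_dlp_job(dataset_id, table_id))
    return jobs

# Made-up stand-ins for the REST calls.
datasets = {"sales": ["orders", "refunds"], "hr": ["people"]}
jobs = create_all_jobs(
    list_datasets=lambda: datasets.keys(),
    list_tables=lambda d: datasets[d],
    create_dlp_job=lambda d, t: f"dlpJobs/{d}.{t}",
)
print(jobs)  # ['dlpJobs/sales.orders', 'dlpJobs/sales.refunds', 'dlpJobs/hr.people']
```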

Extract fields from json using jq

I am trying to write a shell script that gets some JSON from a URL, parses it, and extracts fields.
This is what I have done so far.
#!/bin/bash
token=$(http POST :3000/signin/frontm user:='{"email": "sourav#frontm.com", "password": "Hello_789"}' | jq -r '.data.id_token')
cred=$(http POST :3000/auth provider_name:frontm token:$token user:=#/tmp/user.json | jq '{ creds: .creds, userUuid: .user.userId }')
echo $cred
access=$(jq -r "'$cred'")
echo $access
So the output from echo $cred is JSON, e.g.:
{ "creds": { "accessKeyId": "ASIAJPM3RDAZXEORAQ5Q", "secretAccessK
ey": "krg5GbU6gtQV+a5pz4ChL+ECVJm+wKogjglXOqr6", "sessionToken": "Ag
oGb3JpZ2luEAYaCXVzLWVhc3QtMSKAAmhOg7fedV+sBw+8c45HL9naPjqbC0bwaBxq
mQ9Kuvnirob8KtTcsiBkJA/OfCTpYNUFaXXYfUPvbmW5UveDJd+32Cb5Ce+3lAOkkL
aZyWJgvhM1u53WNuMekhcZX7SnlCcaO4e/A9TR74qMOsVptonw5jFB5zjbEI4hFsVX
UHXtkYMYpSyG+2P2LxWRqTg4XKcg2vT+qrLtiXu3XNK70wuCe0/L4/HjjzlLvChmhe
TRs8u8ZRcJvSim/j1sLqe85Sl1qrFv/7msCaxUa3gZ3dOcfHliH64+8NHfS1tkaVkS
iM2x4wxTdZI/SafduFDvGCsltxe9p5zQD0Jb1Qe02ccqpgUIWxAAGgw3NzE5NTYwMD
EyNDciDOQZkq8t+c7WatNLHyqDBahqpQwxpGsYODIC1Db/M4+PXmuYMdYKLwjv3Df2
JeTMw2RT1h8M0IOOPvyBWetwB42HLhv5AobIMkNVSw6tpGyZC/bLMGJatptB0hVMBg
/80VnI7pTPiSjb/LG46bbwlbJevPoorCEEqMZ3MlAJ2Xt2hMmA+sHBRRvV1hlkMnS8
NW6w9xApSGrD001zdfFkmBbHw+c4vmX+TMT7Bw0bHQZ5FQSpEBOw9M5sNOIoa+G/pP
p4WoHiYfGHzaXGQe9Iac07Fy36W/WRebZapvF7TWoIpBjAV+IrQKP3ShJdBi3Oa6py
lGUQysPa3EN0AF/gDuTsdz7TDsErzzUERfQHksK495poG92YoG2/ir8yqTQtUDvshO
7U4SbFpUrozCYT6vp7++BWnpe+miIRCvjy2spqBqv2RY6lhgC6QPfS/365T+QbSTMc
R+ZNes0gX/QrEG4q1sMoxyTltL4sXS2Dz9UXywPkg78AWCOr34ii72m/67Gqe1P3KA
vBe9xF9Hem4H1WbYAqBN76ppyJyG17qK8b2/r71c8rdY+1gYcskV1vUfTQUVCGE0y2
JXKV2UMFOwoTzy6SFIGcuTeOAHiYPgTkMZ6X7hNjf56ihzBIbhSHaST8U4eNBka8j8
Y949ilJwz9QO0l1kwdb2+fQSMblHgeYvF1P8HxBSpRA28gKkkXMf73Zk27I3O2DRGb
lcXS4tKRvan4ASTi4qkdrvVwMT5mwJI4mGIJZSiMJqPxjVh5E9OicFbIOCRcbcIRDE
mj5t9EvaSbIm4ELBMuyoFjmKJmesE03uFRcHkEXkPBxhkJbQwkJeUxHll5kR1IYzvA
K2A2EiZqjkhiSJC4NRekEuM+5WowwuWw1wU=" }, "userUuid": "mugqRKHmTPxk
obBAtwTmKk" }
So basically I am stuck here: how do I parse the JSON in $cred further to get access to, say, accessKeyId using jq?
I wonder if the variable $cred really holds a string hard-wrapped at ~67 columns; if so, tr can remove the newlines so jq can extract the accessKeyId:
echo "$cred" | tr -d '\n' | jq -r '.creds.accessKeyId'
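The same repair works outside jq too. A Python sketch with made-up placeholder credentials, where stripping the newlines plays the role of tr -d '\n' (json.loads rejects raw newlines inside string literals, so the cleanup is genuinely required):

```python
import json

# Hard-wrapped JSON with newlines falling inside string values,
# mimicking the $cred output shown above (values are placeholders).
wrapped = '{ "creds": { "accessKeyId": "ASIAEXAMPL\nEKEY" }, "userUuid": "mugq\nRKHm" }'

parsed = json.loads(wrapped.replace("\n", ""))  # strip newlines, then parse
print(parsed["creds"]["accessKeyId"])  # ASIAEXAMPLEKEY
```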
