HOWTO extract specific text from HTML/XML, using cURL and HTTP

I successfully downloaded a file from a remote server using cURL and HTTP, but the file includes all the HTML code.
Is there a function in cURL so that I can extract the values I want?
For example, I am getting:
...
<body>
Hello,Manu
</body>
...
But I only want Hello,Manu.
Thanks in advance,
Manu

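cURL only transfers data; it has no function for extracting parts of a document. Try DOMDocument or any other XML parser on the result.
$doc = new DOMDocument();
$doc->loadHTML($html_content); // result from curl
$xpath = new DOMXPath($doc);
echo trim($xpath->query('//body')->item(0)->nodeValue); // trim() drops the newlines around the text
Alternatively, on the command line you can use: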
curl 'http://.................' | xpath '//body'
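If the xpath tool is not available, the same idea can be sketched with xmllint from libxml2. This is only an illustration: it assumes xmllint is installed, and the HTML string is made-up sample input standing in for the curl output.

```shell
#!/bin/sh
# Sample input standing in for the page fetched with curl (hypothetical).
html='<html><head><title>t</title></head><body>
Hello,Manu
</body></html>'

if command -v xmllint >/dev/null 2>&1; then
    # --html uses the lenient HTML parser; normalize-space() returns the
    # text of the first <body> with surrounding whitespace removed.
    text=$(printf '%s\n' "$html" |
        xmllint --html --xpath 'normalize-space(//body)' - 2>/dev/null)
    printf '%s\n' "$text"
else
    echo 'xmllint not installed' >&2
fi
```

With a real page, the printf line would be replaced by the curl call, piped into the same xmllint command.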

Related

How to translate this curl command into an R curl call?

I have this curl command that I can call in bash:
curl -X POST -H 'Content-Type: text/csv' --data-binary @data/data.csv https://some.url.com/invocations > data/churn_scored.jsonl
It posts a CSV file to the API endpoint, and I redirect the result into a .jsonl file.
I can't find where to specify a data file to POST to the endpoint, the way curl's @ does.
How can I make this POST with R's curl package (or any other package)? I can handle redirecting the output some other way.

This site is very helpful for converting curl commands to other languages: https://curl.trillworks.com/#r
When I plugged in your curl command there, I got this:
require(httr)
headers = c(
`Content-Type` = 'text/csv'
)
data = upload_file('data/data.csv')
res <- httr::POST(url = 'https://some.url.com/invocations', httr::add_headers(.headers=headers), body = data)

About the @ symbol specifically. From man curl:
--data-binary <data>
(HTTP) This posts data exactly as specified with no extra processing whatsoever.
If you start the data with the letter @, the rest should be a filename. Data is
posted in a similar manner as --data-ascii does, except that newlines are preserved
and conversions are never done.
If this option is used several times, the ones following the first will append data
as described in -d, --data.
So there is no need to worry about @: it only tells curl to read the data from the named file.
As @chinsoon12 mentioned, httr is a good way to handle the request:
-X or --request translates to the verb function POST(), whose body argument covers --data-binary
-H or --header translates to add_headers(), but there are special functions for setting the content type (see below)
So it looks like:
library(httr)
response <- POST(
  url = "https://some.url.com/invocations",
  body = upload_file(
    path = path.expand("data/data.csv"),
    type = 'text/csv'),
  verbose()
)
# get the response body and write it to disk
writeLines(content(response, as = "text", encoding = "UTF-8"), "data/churn_scored.jsonl")

Geofencing Extension API - Uploading a New Layer

I'm trying to use Postman to upload a WKT layer with the cURL code below.
curl -X POST \
'https://gfe.cit.api.here.com/2/layers/upload.json?layer_id=123&app_id={APP_ID}&app_code={APP_CODE}' \
-H 'content-type: multipart/form-data; boundary=----WebKitFormBoundary7MA4YWxkTrZu0gW' \
-F 'zipfile=@C:\xampp\htdocs\here\southern.wkt.zip'
but the response message is
{"issues":[{"message":"Multipart should contain exactly one part"}],"error_id":"2e1e3994-69d6-43bb-8224-2a869b5255ae","response_code":"400 Bad Request"}
Am I doing something wrong?

Make sure your WKT file doesn't contain extra indentation (spaces). Avoid spaces, separate the columns with tabs consistently, and give proper column names.

Obtain filename from url in R

I have a URL like http://example.com/files/01234 that, when I click it in the browser, downloads a zip file with a name like file-08.zip.
With wget I can download using the real file name by running:
wget --content-disposition http://example.com/files/01234
Functions such as basename do not work in this case, for example:
> basename("http://example.com/files/01234")
[1] "01234"
I'd like to obtain just the filename from the URL in R and create a tibble with the zip file names, whether through packages or a system(...) command. Any ideas? What I'd like to obtain is something like:
url | file
--------------------------------------------
http://example.com/files/01234 | file-08.zip
http://example.com/files/03210 | file-09.zip
...

Using the httr library, you can make a HEAD call and then parse the content-disposition header. For example:
library(httr)
hh <- HEAD("https://example.com/01234567")
get_disposition_filename <- function(x) {
sub(".*filename=", "", headers(x)$`content-disposition`)
}
get_disposition_filename(hh)
This function doesn't check that the header actually exists so it's not very robust, but should work in the case where the server returns an alternate name for the downloaded file.
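For the command-line route, a slightly more defensive version of the same idea might look like the sketch below. The function name disposition_filename and the sample headers are made up for illustration; the function reads raw response headers on stdin and returns a non-zero status when no filename is found, which covers the missing-header case.

```shell
#!/bin/sh
# Print the filename from a Content-Disposition header read on stdin,
# or return non-zero when no such header (or no filename) is present.
# disposition_filename is a hypothetical helper name.
disposition_filename() {
    name=$(tr -d '\r' |
        sed -n 's/^[Cc]ontent-[Dd]isposition:.*filename="\{0,1\}\([^";]*\).*/\1/p' |
        head -n 1)
    [ -n "$name" ] && printf '%s\n' "$name"
}

# Sample headers (hypothetical, modelled on the example in the question).
headers='HTTP/1.1 200 OK
Content-Type: application/zip
Content-Disposition: attachment; filename="file-08.zip"'

printf '%s\n' "$headers" | disposition_filename   # prints file-08.zip
```

Against a real server the input would come from something like curl -sI "$url" | disposition_filename, assuming the server answers HEAD requests with the same headers as a download.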

With @Sathish's contribution:
When the URL string doesn't contain the name of the file to download, a valid solution is
system("curl -IXGET -r 0-10 https://example.com/01234567 | grep attachment | sed 's/^.\\+filename=//'")
The idea is to request only the first few bytes of the zip (the range 0-10) instead of the full file before obtaining the file name. It returns file-789456.zip, or whatever the real zip name behind that URL is.

unix sh script - read from file

I'm trying to read the contents of an aux file, but I can't figure out why the command doesn't work when I pass it a string that was read from the file.
Script
file=servers.aux
for server in $(cat $file)
do
echo $server
echo $server
`/usr/IBM/WebSphere/App/profiles/BPM/bin/serverStatus.sh $server -username adm -password adm`
done
Result
BPM.AppTarget.bpm01.0
ServersStatus[7]: ADMU0116I:: not found.
In the past I used something like this: put the values in an array and read them back from that array. But I think reading straight from the file should be possible. What am I doing wrong?
Thanks in Advance
Tiago

I don't think you need the back-ticks on the last line. You're not trying to run the output of the serverStatus.sh script as a command itself, are you?
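Back-ticks substitute the command's output into the line, and the shell then tries to run that output as a command, which is why the first word of the status message (ADMU0116I:) shows up as "not found". A minimal sketch of the loop without them follows. check_status is a hypothetical stand-in for serverStatus.sh so the example runs anywhere, and the second server name is invented sample data.

```shell
#!/bin/sh
# Hypothetical stand-in for serverStatus.sh, so the sketch runs anywhere.
check_status() {
    echo "status requested for $1"
}

# Sample input file (assumed format: one server name per line).
file=$(mktemp)
printf '%s\n' 'BPM.AppTarget.bpm01.0' 'BPM.AppTarget.bpm02.0' > "$file"

# Read line by line; the command runs directly, so its output is
# printed instead of being executed as a second command.
output=$(
    while IFS= read -r server; do
        [ -n "$server" ] || continue   # skip empty lines
        check_status "$server"
    done < "$file"
)
printf '%s\n' "$output"
rm -f "$file"
```

In the real script, check_status "$server" would be replaced by the serverStatus.sh call with its -username and -password arguments, written without back-ticks.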

Find missing URL routes using the command-line

I'm trying to automate a check for missing routes in a Play! web application.
The routing table is in a file in the following format:
GET /home Home.index
GET /shop Shop.index
I've already managed to use my command line-fu to crawl through my code and make a list of all the actions that should be present in the file. This list is in the following format:
Home.index
Shop.index
Contact.index
About.index
Now I'd like to pipe the output of this text into another command that checks if each line is present in the route file. I'm not sure how to proceed though.
The result should be something like this:
Contact.index
About.index
Does someone have a helpful suggestion on how I can accomplish this?

Try this line:
awk 'NR==FNR{a[$NF];next}!($0 in a)' routes.txt list.txt
EDIT
If you want the above line to accept the list from stdin:
cat list.txt | awk 'NR==FNR{a[$NF];next}!($0 in a)' routes.txt -
Replace cat list.txt with your magic command.
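To see how the one-liner behaves, here is a small self-contained run using the sample data from the question. The temporary directory and file names are just scaffolding for the sketch.

```shell
#!/bin/sh
# Recreate the two sample inputs from the question in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' 'GET /home Home.index' 'GET /shop Shop.index' > "$dir/routes.txt"
printf '%s\n' 'Home.index' 'Shop.index' 'Contact.index' 'About.index' > "$dir/list.txt"

# First file (NR==FNR): remember the last field of every route line.
# Second file: print each action that was not remembered.
missing=$(awk 'NR==FNR { seen[$NF]; next } !($0 in seen)' \
    "$dir/routes.txt" "$dir/list.txt")
printf '%s\n' "$missing"
rm -rf "$dir"
```

One caveat: if routes.txt is empty, NR==FNR stays true while the list is read, so nothing is reported as missing.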
