Scraping and Crawling-Issue a token

I am a newbie to Scraping and Crawling. For a research project, I am trying to scrape and crawl the social network site: https://my-plant.org/
There is an API for this site: Foundational API v1.0
It says this is how you issue a token:
curl -X POST -sku "vaughn:**********" https://foundation.iplantc.org/auth-v1/ | python -mjson.tool
I'm trying to use PHP with the help of wikiscraper to get authenticated and into the site so I can scrape it, but I am having a difficult time getting authenticated. I ran the command above on a command line and got back:
curl: No match.
python: module json.tool not found
Can someone help me get authenticated so I can begin crawling and scraping the site in PHP?

The standard json module was added in Python 2.6, so you are probably running an earlier Python release.
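For context, the `python -mjson.tool` part of the command only pretty-prints whatever JSON curl returns; it plays no part in the authentication itself. On a Python that ships the json module, roughly the same effect can be sketched as below. The helper name `pretty` and the sample response are made up for illustration and are not what the API actually returns.

```python
import json

def pretty(raw_json):
    """Re-indent a JSON string, roughly what `python -mjson.tool` does."""
    return json.dumps(json.loads(raw_json), indent=4)

# Hypothetical response body, for illustration only.
sample = '{"status": "success", "result": {"token": "abc123"}}'
print(pretty(sample))
```

If the import fails with an error like the one in the question, the interpreter on that machine predates the json module.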

Related

wget won't download files I can access through browser

I am an amateur historian trying to access newspaper archives. The server where the scans are located "works" using an outdated tif viewer that doesn't seem to actually work at all anymore. I can access the files individually in chrome without logging in, but when I try to use wget or curl, I'm told that viewing the file is unauthorized, even when I use my login info, and even when using my cookies from chrome.
Here is an example of one of the files: https://ulib.aub.edu.lb/nahar/images2/7810W2/78101001.TIF
When I put this into chrome, it automatically downloads the file even though I cannot access the directory itself, but when I use wget, I get the following response: "401 unauthorized Username/Password Authentication Failed."
This is the basic wget command I'm using (if I can get it to work at all, then I'll input a list of the other files):
wget --no-check-certificate https://ulib.aub.edu.lb/nahar/images2/7810W2/78101001.TIF
I've tried variations with and without cookies, with a blank user, and with and without login credentials. As I'm sure you can tell, I'm new to this sort of thing but eager to learn.

From what I can see, authentication on your website is done with HTTP Basic. This kind of authentication does not use HTTP cookies; it uses the HTTP Authorization header. You can pass HTTP Basic credentials to wget with the following arguments:
wget --http-user=YourUsername --http-password=YourPassword https://ulib.aub.edu.lb/nahar/images2/7810W2/78101001.TIF
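To show what those two wget options end up sending, here is a minimal Python sketch that builds the same Authorization header by hand. The function names are my own, the credentials are the placeholders from the command above, and the request is only prepared, not sent.

```python
import base64
import urllib.request

def basic_auth_header(username, password):
    """Build the value of the Authorization header for HTTP Basic auth."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")

def build_request(url, username, password):
    """Prepare (but do not send) a GET request carrying Basic credentials."""
    req = urllib.request.Request(url)
    req.add_header("Authorization", basic_auth_header(username, password))
    return req

# Placeholder credentials, as in the wget command.
req = build_request(
    "https://ulib.aub.edu.lb/nahar/images2/7810W2/78101001.TIF",
    "YourUsername",
    "YourPassword",
)
print(req.get_header("Authorization"))
# Sending it would be: urllib.request.urlopen(req).read()
```

The header is just "Basic " followed by the base64 of `username:password`, which is why cookies copied from Chrome make no difference here.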

Firebase/Flashlight Elastic Search

I was using this tutorial by Lance Samaria to integrate Firebase Flashlight into Heroku with the Bonsai add-on for Elasticsearch. On step 17, after I run the curl POST, I get this error: No handler found for uri [/firebase] and method [POST]. Where is this error coming from?

I fixed this by using a cURL PUT instead of a cURL POST.
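As far as I can tell, the message means the server has no POST handler registered for the bare /firebase path, while the same path does accept PUT. A minimal sketch of preparing such a PUT from Python follows; the function name and the base URL are hypothetical, and nothing is sent.

```python
import urllib.request

def build_index_request(base_url, index_name, method="PUT"):
    """Prepare (but do not send) a request against /<index_name>."""
    url = base_url.rstrip("/") + "/" + index_name
    return urllib.request.Request(url, method=method)

# Hypothetical cluster URL; substitute the one from your Bonsai add-on.
req = build_index_request("https://example.invalid", "firebase")
print(req.get_method(), req.full_url)
```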

How to include or wrap the file or file content or file object when using Softlayer Object Storage REST API

I am using Softlayer Object Storage REST API.
I was told that the following curl command line works:
$ curl -i -XPUT -H "X-Auth-Token: AUTH_tkabcd" --data-binary "Created for testing REST client" https://dal05.objectstorage.softlayer.net/v1/AUTH_abcd/container2/file10.txt
I want to upload files using JavaScript, but I have no clue how to wrap the file in my request.
Can anyone please provide an example? Many thanks.

Here are some ideas and examples that can help you:
CORS
How to consume a RESTful service using jQuery
JavaScript REST client Library [closed]
https://ricardo147.files.wordpress.com/2012/08/image001.png
However, there is a customer who was not able to make REST requests through JavaScript. I will investigate and let you know if there is any news.
(Softlayer, Open Stack Swift) How to solve cross domain origin with object storage api?
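Whichever client you end up using, the curl command in the question boils down to a PUT whose body is the raw file bytes, plus an X-Auth-Token header. Here is a rough sketch of that request layout in Python (not JavaScript, only to show the shape of the request). The function name is made up, the token and URL are the placeholders from the question, and the request is prepared but not sent.

```python
import urllib.request

def build_upload_request(object_url, auth_token, content):
    """Prepare (but do not send) a PUT whose body is the raw file bytes."""
    req = urllib.request.Request(object_url, data=content, method="PUT")
    req.add_header("X-Auth-Token", auth_token)
    return req

# In practice the bytes would come from the file, e.g.
# content = open("file10.txt", "rb").read()
content = b"Created for testing REST client"
req = build_upload_request(
    "https://dal05.objectstorage.softlayer.net/v1/AUTH_abcd/container2/file10.txt",
    "AUTH_tkabcd",
    content,
)
print(req.get_method(), len(req.data), "bytes")
```

The file is not wrapped in a form or encoded in any way; the bytes go into the body as they are, which is what `--data-binary` does in the curl version.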

how to get the link names of a webpage using cURL command

I am using Postman, a Google Chrome add-on that makes cURL commands, and I make a GET request with a website URL. My question goes with an example: on a website like Google, if I type "stackoverflow" and search, then take the resulting URL and make my cURL command, how can I get the names of each link? Is that possible? For example, for this page there would be "Stack Overflow" ... "Stack Overflow - Wikipédia"...

I found out the answer by myself.
I installed Postman Interceptor, activated it, and opened the website in Google Chrome. The interceptor sent all the requests made to load that website to my Postman, so all I had to do was look through them, and I found the information I was looking for in one of them.
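For anyone who has the HTML of a page in hand and wants the link names directly, here is a minimal sketch using Python's standard html.parser. The class and function names are my own, and the sample markup is made up; a real search results page may be built by scripts, so the raw response from a plain GET may not contain the same links the browser shows.

```python
from html.parser import HTMLParser

class LinkTextParser(HTMLParser):
    """Collect (href, visible text) for every <a> element that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> we are currently inside, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

def link_names(html):
    """Return a list of (href, link text) pairs found in the HTML string."""
    parser = LinkTextParser()
    parser.feed(html)
    parser.close()
    return parser.links

# Made-up markup for illustration.
sample = (
    '<p><a href="https://stackoverflow.com/">Stack Overflow</a> and '
    '<a href="/wiki">Stack <b>Overflow</b> - Wikipédia</a></p>'
)
for href, text in link_names(sample):
    print(text, "->", href)
```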

Not able to receive token from OAuth authentication in Apigee tool

I have created AccessTokenClientCredential and RefreshAccessToken in an OAuth proxy through the Apigee tool.
When I try to access the URL "https://damuorgn-prod.apigee.net/oauth/client_credential/accesstoken?grant_type=client_credentials&client_id=07VoDotbGhyl3aG8GxjkyXivoTNH9oiQ&client_secret=fb8ZOrAUUSGp3FAv" with my client ID and client secret filled in, the page is empty. It shows neither an error nor a token value.
I followed the steps for creating a token from this URL:
"http://apigee.com/docs/gateway-services/content/secure-calls-your-api-through-oauth-20-client-credentials".
Please advise.
Regards,
Damodaran
I tried both the Test and Prod environments, but had no luck.
I have requested installation of curl. Is there any other way to test this URL without curl? Your immediate reply is appreciated. Thanks!
curl https://damuorgn-test.apigee.net/oauth/client_credential/accesstoken?grant_type=client_credentials -X POST -d 'client_id=qnYUqb6j3uGraRAh7JF9d651nUXNwMCC&client_secret=mjHIFMcTDCa3YQ6f'
Could you please check this link using curl?

It looks like there may be a couple of issues:
When I try your URL, I get a "CLASSIFICATION_FAILURE" error - which means the proxy can't be found. I noticed that you're using "damuorgn-prod.apigee.net" when you might have deployed your proxy to the test environment, and meant to use: "damuorgn-test.apigee.net".
In step 5.2 of the document you referenced, it says to use POST instead of GET. So you might try this:
curl https://damuorgn-test.apigee.net/oauth/client_credential/accesstoken?grant_type=client_credentials -X POST -d 'client_id=07VoDotbGhyl3aG8GxjkyXivoTNH9oiQ&client_secret=fb8ZOrAUUSGp3FAv'
(When I try this, I get an "invalid client id" error, but maybe that client_id is no longer valid?)
Hope that helps,
Scott
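If curl isn't installed, the same POST can be prepared from Python's standard library. This is only a sketch: the function name is made up, the client ID and secret are placeholders to replace with your own, and the request is built but not sent.

```python
import urllib.parse
import urllib.request

def build_token_request(token_url, client_id, client_secret):
    """Prepare (but do not send) a POST with the credentials in the form body."""
    body = urllib.parse.urlencode(
        {"client_id": client_id, "client_secret": client_secret}
    ).encode("ascii")
    return urllib.request.Request(token_url, data=body, method="POST")

req = build_token_request(
    "https://damuorgn-test.apigee.net/oauth/client_credential/accesstoken"
    "?grant_type=client_credentials",
    "YOUR_CLIENT_ID",      # placeholder
    "YOUR_CLIENT_SECRET",  # placeholder
)
print(req.get_method(), req.data.decode("ascii"))
# Sending it would be: urllib.request.urlopen(req).read()
```

Opening the URL in a browser issues a GET with everything in the query string, which is not the same request as the POST above.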
