Downloading a remote file with WordPress returns Forbidden

I am trying to use WordPress's media_sideload_image() with an image hosted remotely in S3, in order to save it into the WordPress media library.
But I always get a Forbidden response, no matter which request options I set for the WordPress request. Visiting the URL directly in the browser works, wget works, and Postman works.
Does anyone have any ideas on how to get WordPress to download this file successfully?
Here's the code I'm using:
$attachment_ID = media_sideload_image('https://s3.amazonaws.com/mlsgrid/images/0788b2c2-d865-496b-bad3-69ebe9c1db79.png');
And here's the WordPress error response I get:
object(WP_Error)[2090]
  public 'errors' =>
    array (size=1)
      'http_404' =>
        array (size=1)
          0 => string 'Forbidden' (length=9)
  public 'error_data' =>
    array (size=1)
      'http_404' =>
        array (size=2)
          'code' => int 403
          'body' => string '<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>39B59073BBC1205F</RequestId><HostId>6TwMl4cMbLXzr7jbx6ykQKaQuk0Rn5Oyc2Q3+02zmgtNoDqUvcg8VY32qGuS1ZMzgpZuLAefK3g=</HostId></Error>' (length=243)
  protected 'additional_data' =>
    array (size=0)
      empty
Thanks!

After digging around in WordPress's request functionality, it looks like it sets a Referer header on each request, equal to the URL being fetched, and I guess Amazon S3 rejects requests that carry that Referer header. (I'm not sure whether that is specific to the bucket I'm fetching images from or true of every bucket.)
Here's how I got it working: filter for S3 URLs and remove the Referer request header from those requests.
// Remove referer from request headers if the URL is an S3 bucket.
add_action( 'http_api_curl', function ($ch, $parsed_args, $url) {
    $s3_url = 'https://s3.amazonaws.com';
    if (substr($url, 0, strlen($s3_url)) === $s3_url) {
        curl_setopt($ch, CURLOPT_HTTPHEADER, ['Referer:']);
    }
}, 10, 3);
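The prefix check above only matches path-style URLs on s3.amazonaws.com. S3 objects can also be addressed through other hostnames, such as bucket.s3.amazonaws.com or a regional endpoint. A minimal sketch of a host-based check follows, assuming that is what you need; is_s3_url() is a hypothetical helper name, not part of WordPress.

```php
<?php
/**
 * Hypothetical helper (not part of WordPress): returns true when the URL's
 * host ends in ".amazonaws.com" and one of the labels before that suffix is
 * "s3" or starts with "s3-", e.g. s3.amazonaws.com, bucket.s3.amazonaws.com
 * or s3.us-west-2.amazonaws.com.
 */
function is_s3_url(string $url): bool
{
    $suffix = '.amazonaws.com';
    $host   = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return false; // Not a parseable URL with a host.
    }
    $host = strtolower($host);
    if (substr($host, -strlen($suffix)) !== $suffix) {
        return false;
    }
    // Inspect the labels that precede ".amazonaws.com".
    $labels = explode('.', substr($host, 0, -strlen($suffix)));
    foreach ($labels as $label) {
        if ($label === 's3' || strpos($label, 's3-') === 0) {
            return true;
        }
    }
    return false;
}
```

Inside the http_api_curl callback, the substr() comparison could then be swapped for is_s3_url($url).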


How to fix 401 Unauthorized response WP Rest API

I'm making a request to another WordPress site on our network, as below.
// Send the request to update the submission post
$response = wp_remote_request( $this->urls->assign_url, array(
        'headers' => array(
            'Content-Type' => 'application/json; charset=utf-8',
            'Authentication' => 'Basic '.base64_encode('somename:somepassword')
        ),
        'body' => json_encode($array),
        'method' => 'POST',
        'data_format' => 'body'
    )
);
I'm making this request via AJAX.
The callback function is being called and sends back data.
I'm also logged into the remote site.
I'm using a nonce, and the user being authorised in the headers is a valid user.
All I keep getting back is:
body: "{"code":"rest_not_logged_in","message":"You are not currently logged in.","data":{"status":401}}"
I've only started getting this since I updated the WordPress version on the remote site. It was working fine before that.
Any thoughts?
I believe that in order to authenticate the way you want to, you need to use a plugin. The built-in authentication method is not ideal for off-site requests, since it is cookie-based.
https://developer.wordpress.org/rest-api/using-the-rest-api/authentication/#authentication-plugins
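Separately from the plugin question, note that HTTP Basic credentials travel in the Authorization header, whereas the request in the question sends a header named Authentication. A minimal sketch of building the request arguments follows. It assumes the remote site has some Basic Auth handler enabled (a plugin, or Application Passwords on WordPress 5.6 and later). build_basic_auth_args() is a hypothetical helper name.

```php
<?php
/**
 * Hypothetical helper: builds the $args array for wp_remote_request() with
 * HTTP Basic credentials in the standard Authorization header.
 * Assumes the remote site accepts Basic Auth for REST requests.
 */
function build_basic_auth_args(string $user, string $password, array $payload): array
{
    return [
        'method'      => 'POST',
        'headers'     => [
            'Content-Type'  => 'application/json; charset=utf-8',
            // "Basic " followed by base64("user:password").
            'Authorization' => 'Basic ' . base64_encode($user . ':' . $password),
        ],
        'body'        => json_encode($payload),
        'data_format' => 'body',
    ];
}
```

The result would be passed as the second argument, e.g. wp_remote_request($url, build_basic_auth_args('somename', 'somepassword', $array)).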

Computer Vision API - v1.0 Recognize Handwritten Text returns empty response

I am trying to start using the Computer Vision API, but I keep getting an empty response. My request in PHP (as exported by Postman) looks like this:
<?php
$request = new HttpRequest();
$request->setUrl('https://westcentralus.api.cognitive.microsoft.com/vision/v1.0/recognizeText');
$request->setMethod(HTTP_METH_POST);
$request->setQueryData(array(
    'language' => 'en',
    'handwriting' => 'true'
));
$request->setHeaders(array(
    'Postman-Token' => '442d04f7-49a0-4262-9d0f-666fe5240cc7',
    'Cache-Control' => 'no-cache',
    'Content-Type' => 'application/octet-stream',
    'Ocp-Apim-Subscription-Key' => 'KEY'
));
try {
    $response = $request->send();
    echo $response->getBody();
} catch (HttpException $ex) {
    echo $ex;
}
The above code works fine with the OCR endpoint!
The file is passed as binary using Postman.
Edit: I also tried to copy/paste the code from here: https://learn.microsoft.com/en-gb/azure/cognitive-services/computer-vision/quickstarts/php#ocr-php-example-request and if I change the ocr endpoint to recognizeText I get an empty response as well!
Unlike the other Computer Vision endpoints, RecognizeText is an asynchronous operation. Barring some issue with the image, you will get a 202 response instead of the usual 200 response. 202 responses customarily contain an empty response body. In this particular case, a response header carries the URL you can query for completion of the task: the header you're looking for is Operation-Location. The documentation is here.
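As a rough sketch of the first step, the helper below pulls a header value such as Operation-Location out of a list of raw response header lines. find_header() is a hypothetical name, and how you obtain the raw lines depends on your HTTP client. The returned URL would then be requested with the same subscription key until the service reports that the operation has finished.

```php
<?php
/**
 * Hypothetical helper: returns the value of a header from raw header lines
 * such as "Operation-Location: https://...". Header names are compared
 * case-insensitively; returns null when the header is absent.
 */
function find_header(array $headerLines, string $name): ?string
{
    foreach ($headerLines as $line) {
        // Split on the first colon only, so URLs in the value stay intact.
        $parts = explode(':', $line, 2);
        if (count($parts) === 2 && strcasecmp(trim($parts[0]), $name) === 0) {
            return trim($parts[1]);
        }
    }
    return null;
}
```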

Testing file uploads with codeception

Problem:
No data or files come through to the Silex application when a request is made from a Codeception test using the REST module with the PhpBrowser driver.
// ApiTester $I
$I->wantTo('Submit files');
// prepare:
$data = ['key' => 'value'];
$files = [
    'file_key' => 'path/to/file.doc',
    'file2_key' => 'path/to/file2.doc'
];
// act:
$I->haveHttpHeader('Content-Type', 'multipart/form-data');
$I->sendPOST('/attachments/', $data, $files);
Current response
I have http header "Content-Type","multipart/form-data"
I send post "/attachments/",{"key":"value"},{"file_key":"/path/to/file/...}
[Request] POST http://localhost/attachments/ {"key":"value"}
[Request Headers] {"Content-Type":"multipart/form-data"}
[Page] http://localhost/attachments/
[Response] 400
[Request Cookies] []
[Response Headers] {"Date":["Tue, 25 Oct 2016 09:15:31 GMT"],"Server":["Apache/2.4.10 (Debian)"],"Cache-Control":["no-cache"],"Access-Control-Allow-Origin":["*"],"Access-Control-Allow-Headers":["Content-Type, Authorization"],"Access-Control-Allow-Methods":["GET,PATCH,PUT,POST,HEAD,DELETE,OPTIONS"],"Content-Length":["1235"],"Connection":["close"],"Content-Type":["application/json"]}
[Response] {"status":400,"meta":{"time":"2016-10-25 09:15:31"},"title":"Invalid Request","errors":"No data received","details":{"error_class":"Symfony\Component\HttpKernel\Exception\BadRequestHttpException"
Tried:
changing the Content-Type header
changing the files array passed to sendPOST to an array of file paths, file objects (UploadedFile), or file arrays
The test works with Silex driver, but that won't be an option on the CI server. Also we've checked with Postman and the API route works as intended, files are sent and all good.
The actual problems:
$I->haveHttpHeader('Content-Type', 'multipart/form-data'); overwrites the Content-Type that the HTTP library (Guzzle, in PhpBrowser) should set itself so that it includes the boundary. It's related to this.
Also be mindful that a header set this way is not reset after each request; to unset it, use $I->deleteHeader('Content-Type');
Solution
Don't set the 'Content-Type' header when sending files.
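To illustrate why the boundary matters, here is a minimal sketch that builds a multipart/form-data body by hand for plain text fields. It is illustrative only: build_multipart() and the boundary value are made up, and a real client such as Guzzle generates both the body and the header for you.

```php
<?php
/**
 * Illustrative only: builds a multipart/form-data body for plain text
 * fields and returns it together with the matching Content-Type value.
 */
function build_multipart(array $fields, string $boundary): array
{
    $body = '';
    foreach ($fields as $name => $value) {
        // Each part starts with "--" followed by the boundary string.
        $body .= "--{$boundary}\r\n";
        $body .= "Content-Disposition: form-data; name=\"{$name}\"\r\n\r\n";
        $body .= "{$value}\r\n";
    }
    // The closing delimiter has a trailing "--".
    $body .= "--{$boundary}--\r\n";

    return [
        // The server needs this boundary parameter to split the body.
        'content_type' => "multipart/form-data; boundary={$boundary}",
        'body'         => $body,
    ];
}
```

A bare multipart/form-data header, as set by haveHttpHeader in the question, carries no boundary parameter, which is consistent with the "No data received" error.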

Rackspace CloudFile API - get object information

So, here's what I'm doing with the API:
Auth (to get token and publicUrl for the particular region I need from the "object-store")
Use the publicUrl from the endpoint like so to get a list of files:
GET [publicUrl]/[container]
This returns an array where each item (object) looks like the following:
(
    [hash] => 7213ee9a7d9dc119d2921a40e899ec5e
    [last_modified] => 2015-12-29T02:46:08.400490
    [bytes] => 1
    [name] => Some type of file name.jpg
    [content_type] => application/postscript
)
Now, how do I build the url to do a GET on the item (object)? I've tried the following:
[publicUrl]/[container]/[hash]
[publicUrl]/[container]/urlencoded([name])
among other things that don't make sense, but I tried anyway.
Any thoughts/help would be appreciated!
If you are using a Rackspace SDK, you can skip building the URLs yourself.
Here is the documentation for retrieving a Cloud Files object using a public URL. The object URL is the combination of the public URL of the container (found in the X-Cdn-Uri response header) with the object name appended.
For example, for a container named 'foo', send an authenticated HEAD request to the API:
HEAD {cloudFilesEndpoint}/foo
In the response, the container's public URL is in the 'X-Cdn-Uri' header:
HTTP/1.1 204 No Content
X-Cdn-Ssl-Uri: https://83c49b9a2f7ad18250b3-346eb45fd42c58ca13011d659bfc1ac1.ssl.cf0.rackcdn.com
X-Ttl: 259200
X-Cdn-Uri: http://081e40d3ee1cec5f77bf-346eb45fd42c58ca13011d659bfc1ac1.r49.cf0.rackcdn.com
X-Cdn-Enabled: True
X-Log-Retention: False
X-Cdn-Streaming-Uri: http://084cc2790632ccee0a12-346eb45fd42c58ca13011d659bfc1ac1.r49.stream.cf0.rackcdn.com
X-Trans-Id: tx82a6752e00424edb9c46fa2573132e2c
Content-Length: 0
Now, for an object named 'styles/site.css', append that name to the public URL, resulting in the following URL:
http://081e40d3ee1cec5f77bf-346eb45fd42c58ca13011d659bfc1ac1.r49.cf0.rackcdn.com/styles/site.css
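The concatenation can be sketched as below. build_object_url() is a hypothetical helper name; it assumes the container is CDN-enabled, as in the response above, and it percent-encodes each path segment, since object names like the one in the question's listing contain spaces.

```php
<?php
/**
 * Hypothetical helper: joins a container's public (CDN) URL and an object
 * name, percent-encoding each path segment but keeping "/" separators.
 */
function build_object_url(string $cdnUri, string $objectName): string
{
    $segments = array_map('rawurlencode', explode('/', $objectName));
    return rtrim($cdnUri, '/') . '/' . implode('/', $segments);
}
```

For example, build_object_url($cdnUri, 'styles/site.css') appends /styles/site.css to the container URL, and the name 'Some type of file name.jpg' becomes Some%20type%20of%20file%20name.jpg.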

Reading WordPress header.php/footer.php to a text string

I am attempting to read the results of the executed header.php/footer.php files as a string of html. Here's the scenario:
There are pages in the site that are developed in a .net environment but they want to share common headers/footers across the entire domain. They wish to have WordPress be the repository for this code and any time there is an update have a PHP cURL call to a .net web service and feed it the new HTML for the header/footers.
I tried calling get_header() but that does not return a string (as I anticipated) so then I tried this test solution in functions.php:
function write_header() {
    $header_content = file_get_contents(get_bloginfo('wpurl').'/index.php' );
    $fp = fopen('c:\header.txt', 'a+');
    fwrite($fp, $header_content); // just testing the output, this will be a cURL call eventually.
    fclose($fp);
}
add_action( 'wp_update_nav_menu', 'write_header' );
It seems to be a very heavy-handed method of getting the HTML, since I'll have to do a lot of string manipulation to parse out the pieces I want. Is there a simpler way of doing this that I'm missing?
If get_header() outputs the header for you, try just wrapping it with an ob_start() and ob_get_contents() to extract the header to a string. You can then discard the output with ob_end_clean(). See the PHP output buffering documentation.
ob_start();
get_header();
$header_as_string = ob_get_contents();
ob_end_clean();
There are a couple of ways you can approach this problem (both are a bit of a kludge, but what isn't?). The first would be to create a template in your theme's directory that includes only the header and footer calls. The body of the template can contain a delimiter string such as an HTML comment, e.g. <!-- SPLIT HERE -->.
Request the page through cURL into an output buffer, capturing the resulting response, which you can split into its component parts using the above delimiter. That will give you your header and footer, complete with the fully rendered tags in the header for CSS, JS, etc. It's not pretty, but it does the job.
The second approach would be an adaptation of the first: rather than doing the splitting yourself, have your .NET team take care of it on their end, if possible.
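The splitting step of the first approach could look roughly like this. split_header_footer() is a hypothetical helper name, and the delimiter is whatever comment you put in the template.

```php
<?php
/**
 * Hypothetical helper: splits rendered page HTML into header and footer
 * around a delimiter comment. Returns null if the delimiter is missing.
 */
function split_header_footer(string $html, string $delimiter = '<!-- SPLIT HERE -->'): ?array
{
    // Limit of 2: everything after the first delimiter is the footer.
    $parts = explode($delimiter, $html, 2);
    if (count($parts) !== 2) {
        return null;
    }
    return ['header' => $parts[0], 'footer' => $parts[1]];
}
```

Returning null when the delimiter is absent makes it easy to avoid pushing a broken header or footer to the .NET service.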
UPDATE
Okay, so there's actually a third option, which I completely forgot about, and that's to use one of WP's features: wp_remote_get() http://codex.wordpress.org/Function_API/wp_remote_get
Retrieves a URL using the HTTP GET method, returning results in an array. Results include HTTP headers and content.
This is what you should get back (excerpted from the API docs):
Array
(
    [headers] => Array
        (
            [date] => Thu, 30 Sep 2010 15:16:36 GMT
            [server] => Apache
            [x-powered-by] => PHP/5.3.3
            [x-server] => 10.90.6.243
            [expires] => Thu, 30 Sep 2010 03:16:36 GMT
            [cache-control] => Array
                (
                    [0] => no-store, no-cache, must-revalidate
                    [1] => post-check=0, pre-check=0
                )
            [vary] => Accept-Encoding
            [content-length] => 1641
            [connection] => close
            [content-type] => application/php
        )
    [body] => <html>This is a website!</html>
    [response] => Array
        (
            [code] => 200
            [message] => OK
        )
    [cookies] => Array
        (
        )
)
All you'd have to do is pass the URL of a page that uses the template I mentioned above, then handle the response from wp_remote_get(): extract the HTML content from [body] and do your string splitting. Pretty much what you want.
Further reading: wp_remote_retrieve_body() http://codex.wordpress.org/Function_API/wp_remote_retrieve_body
