WordPress REST API fails with Python

I've written a simple Python script to upload an image to a WP site:
import requests
import base64

BASE_URL = "https://example.com/wp-json/wp/v2"

media = {
    "file": open("image.png", "rb"),
    "caption": "a media file",
    "description": "some media file"
}

creds = "wp_admin_user" + ":" + "app password"
token = base64.b64encode(creds.encode())

header = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36",
    "Authorization": "Basic " + token.decode("utf-8")
}

r = requests.post(BASE_URL + "/media", headers=header, files=media)
print(r)
When using Python 3.9 on Windows, everything works as expected: I get a <Response [201]> reply and I can see the image in my site's media library.
When running the exact same script on a Linux machine, it fails with a 503 reply from the WP server:
<Response [503]>
The Linux machine is running Python 3.9.1.
I can run the script on Windows ten times in a row and it always works. I've searched the internet for the error, and it's usually attributed to a WP configuration problem, which doesn't seem to be the case here since the script works on Windows.
Any help is much appreciated!
I think the problem is the IP address of the server that the Linux machine is running on.
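If the 503 only ever appears from the Linux host, it may help to dump the full response rather than just the status code; a security plugin, CDN, or rate limiter in front of WordPress (an assumption, not something confirmed here) will often identify itself in the Server header or the HTML body. A minimal sketch, reusing the BASE_URL, header, and media variables from the script above:

import requests

r = requests.post(BASE_URL + "/media", headers=header, files=media)

# Inspect what the server actually said, not just the status code.
print(r.status_code)
print(r.headers.get("Server"), r.headers.get("Retry-After"))
print(r.text[:500])  # the body often names the blocking layer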

Why does using free proxies not work for scraping Google Scholar?

I have seen that there are many approaches for scraping data from Google Scholar out there. I tried to put together my own code, however I cannot get access to Google Scholar using free proxies. I am interested in understanding why that is (and, secondarily, what to change). Below is my code. I know it is not the most elegant one, it's my first try at data scraping...
This is a list of proxies I got from "https://free-proxy-list.net/", and I did test that they worked by accessing "http://icanhazip.com" with them.
live_proxies = ['193.122.71.184:3128', '185.76.10.133:8081', '169.57.1.85:8123', '165.154.235.76:80', '165.154.235.156:80']
Then I built the URLs I want to scrape and tried to get the content of the pages with one random proxy:
import random
import requests

search_terms = ['Acanthizidae', 'mammalia']

for i in range(len(search_terms)):
    url = 'https://scholar.google.de/scholar?hl=en&as_sdt=0%2C5&q={}&btnG='.format(search_terms[i])
    session = requests.Session()
    proxy = random.choice(live_proxies)
    session.proxies = {"http": proxy, "https": proxy}
    ua = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36'}
    output = session.get(url, headers=ua, timeout=1.5).text
However, I get:
requests.exceptions.ProxyError: HTTPSConnectionPool(host='scholar.google.de', port=443): Max retries exceeded with url: /scholar?hl=en&as_sdt=0%2C5&q=Acanthizidae&btnG= (Caused by ProxyError('Cannot connect to proxy.', OSError('Tunnel connection failed: 400 Bad Request')))
Like I said, I did test the proxies before with a different site. What is the problem?
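The traceback itself offers a hint: "Tunnel connection failed: 400 Bad Request" means the proxy refused the HTTPS CONNECT request. Fetching http://icanhazip.com only exercises plain-HTTP proxying, so a proxy can pass that test and still fail to tunnel HTTPS traffic such as Scholar's. A minimal sketch that instead filters the list against an HTTPS URL:

import requests

def supports_https(proxy, test_url='https://icanhazip.com'):
    # A proxy must handle CONNECT tunnelling to serve any https:// URL.
    try:
        r = requests.get(test_url,
                         proxies={'http': proxy, 'https': proxy},
                         timeout=5)
        return r.ok
    except requests.exceptions.RequestException:
        return False

https_proxies = [p for p in live_proxies if supports_https(p)]
print(https_proxies)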

AJAX requests failing for ASP.NET Core app in Amazon Linux 2 vs Amazon Linux 1

I have an ASP.NET Core 3.0 web application hosted in Amazon AWS Elastic Beanstalk using Docker. The application works fine in Docker running 64bit Amazon Linux (v2.16.11). When I update to Docker running 64bit Amazon Linux 2 (v3.4.12), requests work fine except for AJAX requests, which fail with status code 400 "Bad Request". Nothing else has changed in the source code, dockerfile, etc.; only the Linux version has changed from Amazon Linux to Amazon Linux 2. Does anybody have an idea what is different between Amazon Linux 1 and Amazon Linux 2 that could cause the AJAX requests to fail?
More info:
I cannot replicate this error with the official ASP.NET Core 3.1 examples. I have not updated my application to 3.1 yet; I will do so soon and update this question.
The relevant controller action does not return the partial view on Amazon Linux 2. The controller logs a message just before returning the partial view, and this log is never triggered on Amazon Linux 2.
The nginx access.log file shows the following response of the load balancer:
Amazon Linux 1:
{IP} - - [10/Apr/2022:07:36:01 +0000] "POST {url} HTTP/1.1" 200 3882 "{url2}" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.75 Safari/537.36" "{IP2}"
Amazon Linux 2:
{IP} - - [10/Apr/2022:07:00:14 +0000] "POST {url} HTTP/1.1" 400 0 "{url2}" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.75 Safari/537.36" "{IP2}"
The call is made with jQuery 3.4.1:
var $form = $("#inputForm");
if ($form.length <= 0) return;
var data = $form.serialize();

$.ajax({
    url: "...",
    type: "POST",
    data: data,
    error: function (jqXHR, textStatus, errorThrown) {
        alert("There was an error when loading the results (error type = " + errorThrown + ").");
    },
    success: function (result) {
        $("#calculationTarget").html(result);
    }
});
The issue is no longer present if the project is updated from ASP.NET Core 3.0 to ASP.NET Core 3.1.
There is a very simple fix, which is updating to ASP.NET Core 3.1.
In this version, the issue you are having is fixed.
See the steps below for updating.
If you have a global.json file to target a specific .NET Core SDK version, update the version property.
{
  "sdk": {
    "version": "3.1.101"
  }
}
Update the TFM to netcoreapp3.1, as described below.
<Project Sdk="Microsoft.NET.Sdk.Web">
  <PropertyGroup>
    <TargetFramework>netcoreapp3.1</TargetFramework>
  </PropertyGroup>
</Project>
You need to update the package references. To do this, update every Microsoft.AspNetCore.* package reference (* being a wildcard) to 3.1.0 or any later version.
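For example, in the .csproj (the package names below are only illustrative; update whichever Microsoft.AspNetCore.* packages your project actually references):

<ItemGroup>
  <!-- Illustrative only: bump each Microsoft.AspNetCore.* reference to 3.1.0 -->
  <PackageReference Include="Microsoft.AspNetCore.Mvc.NewtonsoftJson" Version="3.1.0" />
  <PackageReference Include="Microsoft.AspNetCore.Authentication.JwtBearer" Version="3.1.0" />
</ItemGroup>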
If you are using Docker (which I think you are), then you need to use an ASP.NET Core 3.1 base image. See an example below.
$ docker pull mcr.microsoft.com/dotnet/aspnet:3.1
For extra steps and information, see the official guide for migrating to ASP.NET Core 3.1.
In summary, upgrading your current application to ASP.NET Core 3.1 should fix your issue.

Changing the user agent on headless Chrome

I have an issue with changing the user agent.
I am trying to use the following line in my runner.js file in the browsers array:
chrome:headless:userAgent=Mozilla/5.0\ \(Linux\;\ Android\ 5.0\;\ SM-G900P\ Build/LRX21T\)\ AppleWebKit/537.36\ \(KHTML,\ like\ Gecko\)\ Chrome/57.0.2987.133\ Mobile\ Safari/537.36
However, the best I can get is Mozilla/5.0 (Linux in the actual user agent.
The guide doesn't say anything explicit about user agents and how to escape them.
Could someone help me with using a custom user agent for headless Chrome? I can't seem to get past the escaping problem. Thanks.
I actually found the answer: you need to escape every ; character with \\.
E.g.:
chrome:headless:userAgent=Mozilla/5.0 (X11\\; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.50 Safari/537.36
will work.
If you use it in a CLI command, you need to double-escape. (I didn't have any success with that.)
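For context, a minimal sketch of how such a string might be passed to TestCafe's programmatic runner in runner.js (the test path is a placeholder, and this assumes the escaping rule above):

const createTestCafe = require('testcafe');

(async () => {
    const testcafe = await createTestCafe('localhost', 1337, 1338);
    try {
        await testcafe
            .createRunner()
            .src(['tests/sample-test.js'])  // placeholder test file
            // Each ; inside the user agent is escaped as described above.
            .browsers(['chrome:headless:userAgent=Mozilla/5.0 (X11\\; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.50 Safari/537.36'])
            .run();
    } finally {
        await testcafe.close();
    }
})();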

Using urllib in Python 3 to download a file gives HTTP Error 403 - faking a user agent?

I'm using PhantomJS and Selenium to convert YouTube videos to MP3s using anything2mp3.com and then attempting to download the files.
I'm trying to use urllib in Python 3 to download a .mp3 file. However, when I try:
import urllib.request

url = 'example.com'
fileName = 'testFile.mp3'
urllib.request.urlretrieve(url, fileName)
I get the error:
urllib.error.HTTPError: HTTP Error 403: Forbidden
From hours of searching, I have found that it is likely due to the website not liking the user agent being used to access the website. I've tried to alter the user agent but haven't had any luck since I can't simply supply a header to urlretrieve.
Use the requests lib:
import requests
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

SERVICE_URL = 'http://anything2mp3.com/'
YOUTUBE_URL = 'https://youtu.be/AqCWi_-vnTg'
FILE_NAME = 'song.mp3'
USER_AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36'

# Get mp3 link using selenium
browser = webdriver.PhantomJS()
browser.get(SERVICE_URL)
search = browser.find_element_by_css_selector('#edit-url')
search.send_keys(YOUTUBE_URL)
submit = browser.find_element_by_css_selector('#edit-submit--2')
submit.click()
a = WebDriverWait(browser, 20).until(EC.presence_of_element_located((By.CSS_SELECTOR, '#block-system-main > a')))
download_link = a.get_attribute('href')

# Download file using requests
# http://docs.python-requests.org/en/latest/
r = requests.get(download_link, stream=True, headers={'User-Agent': USER_AGENT})
with open(FILE_NAME, 'wb') as f:
    for chunk in r.iter_content(chunk_size=1024):
        if chunk:
            f.write(chunk)
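As a side note, the 403 can also be avoided without switching libraries: urllib can send a custom User-Agent if you build a urllib.request.Request and stream the response yourself instead of calling urlretrieve. A minimal sketch (the URL, file name, and User-Agent are placeholders):

import shutil
import urllib.request

url = 'https://example.com/testFile.mp3'  # placeholder URL
req = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})

# Stream the response body to disk; this replaces urlretrieve.
with urllib.request.urlopen(req) as resp, open('testFile.mp3', 'wb') as f:
    shutil.copyfileobj(resp, f)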

'RCurl' [R] package getURL webpage error when scraping API

I am trying to scrape data from an API using the getURL function of the RCurl package in R. My problem is that a request made with getURL in R does not reproduce the response I get when I open the URL in Chrome. Essentially, when I open the API page (URL below) in Chrome it works fine, but if I request it using getURL in R (or open it in incognito mode in Chrome) I get a '500 Internal Server Error' response and not the pretty JSON that I'm looking for.
URL/API in question:
http://www.bluenile.com/api/public/loose-diamond/diamond-details/panel?country=USA&currency=USD&language=en-us&productSet=BN&sku=LD04077082
Here is my (failed) request in [R].
test2 <- fromJSON(getURL("http://www.bluenile.com/api/public/loose-diamond/diamond-details/panel?country=USA&currency=USD&language=en-us&productSet=BN&sku=LD04077082", ssl.verifypeer = FALSE, useragent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.124 Safari/537.36"))
My Research so Far
First I looked at this prior question on Stack Overflow and added my user agent to the request (this did not solve the problem but may still be necessary):
ViralHeat API issues with getURL() command in RCurl package
Next I looked at this helpful post, which guided my rationale:
R Disparity between browser and GET / getURL
My Ideas About the Solution
This is not my area of expertise, but my guess is that the request is missing a cookie needed to complete it (which would also explain why it doesn't work in my browser in incognito mode). I compared the requests and responses of the successful and unsuccessful attempts (screenshots of the successful and unsuccessful requests omitted).
Does anyone have any ideas? Should I try the RSelenium package that MrFlick suggested in the second post above?
This is a courteous site. It would like to know where you come from, what currency you use, etc., to give you a better user experience. It does this by setting a multitude of cookies on the landing page. So we follow suit: navigate to the landing page first to collect the cookies, then go to the page we want:
library(RCurl)
myURL <- "http://www.bluenile.com/api/public/loose-diamond/diamond-details/panel?country=USA&currency=USD&language=en-us&productSet=BN&sku=LD04077082"
agent="Mozilla/5.0 (Windows NT 6.3; WOW64; rv:32.0) Gecko/20100101 Firefox/32.0"
# Set RCurl parameters
curl = getCurlHandle()
curlSetOpt(cookiejar="cookies.txt", useragent = agent, followlocation = TRUE, curl=curl)
firstPage <- getURL("http://www.bluenile.com", curl=curl)
myPage <- getURL(myURL, curl = curl)
library(RJSONIO)
> names(fromJSON(myPage))
[1] "diamondDetailsHeader" "diamondDetailsBodies" "pageMetadata" "expandedUrl"
[5] "newVersion" "multiDiamond"
and the cookies:
> getCurlInfo(curl)$cookielist
[1] ".bluenile.com\tTRUE\t/\tFALSE\t2412270275\tGUID\tDA5C11F5_E468_46B5_B4E8_D551D4D6EA4D"
[2] ".bluenile.com\tTRUE\t/\tFALSE\t1475342275\tsplit\tver~3&presetFilters~TEST"
[3] ".bluenile.com\tTRUE\t/\tFALSE\t1727630275\tsitetrack\tver~2&jse~0"
[4] ".bluenile.com\tTRUE\t/\tFALSE\t1425230275\tpop\tver~2&china~false&french~false&ie~false&internationalSelect~false&iphoneApp~false&survey~false&uae~false"
[5] ".bluenile.com\tTRUE\t/\tFALSE\t1475342275\tdsearch\tver~6&newUser~true"
[6] ".bluenile.com\tTRUE\t/\tFALSE\t1443806275\tlocale\tver~1&country~IRL&currency~EUR&language~en-gb&productSet~BNUK"
[7] ".bluenile.com\tTRUE\t/\tFALSE\t0\tbnses\tver~1&ace~false&isbml~false&fbcs~false&ss~0&mbpop~false&sswpu~false&deo~false"
[8] ".bluenile.com\tTRUE\t/\tFALSE\t1727630275\tbnper\tver~5&NIB~0&DM~-&GUID~DA5C11F5_E468_46B5_B4E8_D551D4D6EA4D&SESS-CT~1&STC~32RPVK&FB_MINI~false&SUB~false"
[9] "#HttpOnly_www.bluenile.com\tFALSE\t/\tFALSE\t0\tJSESSIONID\tB8475C3AEC08205E5AC6252C94E4B858"
[10] ".bluenile.com\tTRUE\t/\tFALSE\t1727630278\tmigrationstatus\tver~1&redirected~false"
