Unable to start server after JupyterHub upgrade to 0.8.1 - jupyter-notebook

I recently upgraded JupyterHub from 0.7 to 0.8.1. After the upgrade, I also upgraded the SQLite database as described in the upgrade documentation. I can start the JupyterHub service, but after login I am unable to start the single-user server; it fails with the error below. The server is AD-integrated for login, and this worked perfectly before the upgrade. Any idea how this can be resolved?
[I 2019-03-14 15:21:57.698 JupyterHub base:346] User logged in: test
[E 2019-03-14 15:21:57.746 JupyterHub user:427] Unhandled error starting test's server: 'getpwnam(): name not found: test'
[E 2019-03-14 15:21:57.755 JupyterHub web:1590] Uncaught exception POST /hub/login?next= (192.168.0.24)
HTTPServerRequest(protocol='https', host='jupyter2.testing.com', method='POST', uri='/hub/login?next=', version='HTTP/1.1', remote_ip='192.168.0.24', headers={'Content-Type': 'application/x-www-form-urlencoded', 'Accept-Language': 'en-GB,en-US;q=0.9,en;q=0.8', 'Cookie': '_xsrf=2|e7c2dfb6|2e7d6377e8446061ff8be0e64f86210f|1551259887', 'Upgrade-Insecure-Requests': '1', 'Host': 'jupyter2.testing.com', 'X-Forwarded-Proto': 'https', 'Origin': 'https://jupyter2.testing.com', 'X-Real-Ip': '192.168.0.24', 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8', 'Accept-Encoding': 'gzip, deflate, br', 'Content-Length': '42', 'Cache-Control': 'max-age=0', 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.80 Safari/537.36', 'X-Forwarded-Port': '443', 'Referer': 'https://jupyter2.testing.com/hub/login', 'X-Forwarded-Host': 'jupyter2.testing.com', 'X-Forwarded-For': '192.168.0.24,127.0.0.1', 'Connection': 'close', 'X-Nginx-Proxy': 'true'})
Traceback (most recent call last):
  File "/usr/local/python3/lib/python3.4/site-packages/tornado/web.py", line 1511, in _execute
    result = yield result
  File "/usr/local/python3/lib/python3.4/site-packages/jupyterhub/handlers/login.py", line 94, in post
    yield self.spawn_single_user(user)
  File "/usr/local/python3/lib/python3.4/site-packages/jupyterhub/handlers/base.py", line 475, in spawn_single_user
    yield gen.with_timeout(timedelta(seconds=self.slow_spawn_timeout), finish_spawn_future)
  File "/usr/local/python3/lib/python3.4/site-packages/jupyterhub/handlers/base.py", line 445, in finish_user_spawn
    yield spawn_future
  File "/usr/local/python3/lib/python3.4/site-packages/jupyterhub/user.py", line 439, in spawn
    raise e
  File "/usr/local/python3/lib/python3.4/site-packages/jupyterhub/user.py", line 378, in spawn
    ip_port = yield gen.with_timeout(timedelta(seconds=spawner.start_timeout), f)
  File "/usr/local/python3/lib/python3.4/site-packages/jupyterhub/spawner.py", line 968, in start
    env = self.get_env()
  File "/usr/local/python3/lib/python3.4/site-packages/jupyterhub/spawner.py", line 960, in get_env
    env = self.user_env(env)
  File "/usr/local/python3/lib/python3.4/site-packages/jupyterhub/spawner.py", line 947, in user_env
    home = pwd.getpwnam(self.user.name).pw_dir
KeyError: 'getpwnam(): name not found: test'
[E 2019-03-14 15:21:57.756 JupyterHub log:114] {
"Content-Type": "application/x-www-form-urlencoded",
"Accept-Language": "en-GB,en-US;q=0.9,en;q=0.8",
"Cookie": "_xsrf=2|e7c2dfb6|2e7d6377e8446061ff8be0e64f86210f|1551259887",
"Upgrade-Insecure-Requests": "1",
"Host": "jupyter2.testing.com",
"X-Forwarded-Proto": "https",
"Origin": "https://jupyter2.testing.com",
"X-Real-Ip": "192.168.0.24",
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
"Accept-Encoding": "gzip, deflate, br",
"Content-Length": "42",
"Cache-Control": "max-age=0",
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.80 Safari/537.36",
"X-Forwarded-Port": "443",
"Referer": "https://jupyter2.testing.com/hub/login",
"X-Forwarded-Host": "jupyter2.testing.com",
"X-Forwarded-For": "192.168.0.24,127.0.0.1",
"Connection": "close",
"X-Nginx-Proxy": "true"
}
[E 2019-03-14 15:21:57.757 JupyterHub log:122] 500 POST /hub/login?next= (#192.168.0.24) 199.65ms
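The failing call in the traceback is `pwd.getpwnam(self.user.name)`: the Hub can authenticate `test` against AD, but the spawner also needs the account to resolve through the machine's local NSS stack (e.g. via SSSD) to find a home directory. A minimal sketch of the same lookup, useful for testing name resolution outside JupyterHub:

```python
import pwd

def resolve_home(username):
    """Mimic what Spawner.user_env() does: resolve a home dir via getpwnam().

    getpwnam() raises the same KeyError seen in the JupyterHub log when
    the name is not known to the local NSS stack (e.g. the AD user is
    not being exposed on this host).
    """
    try:
        return pwd.getpwnam(username).pw_dir
    except KeyError as e:
        print("lookup failed:", e)
        return None

print(resolve_home("root"))   # an account every Unix host has
print(resolve_home("test"))   # fails unless 'test' resolves locally
```

If this lookup (or `getent passwd test`) fails on the Hub machine, the problem likely lies in the NSS/SSSD configuration rather than in JupyterHub itself.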

Related

Scrapy parsed unknown character

I want to scrape the site https://www.bikebd.com/brand/yamaha/ . Here is my script:
import scrapy
from scrapy.utils.response import open_in_browser
from urllib.parse import urlencode


class BikebdSpider(scrapy.Spider):
    name = 'bikebd'
    allowed_domains = ['www.bikebd.com']
    headers = {
        'accept': '*/*',
        'accept-encoding': 'gzip, deflate, br',
        'accept-language': 'en-US,en;q=0.9,da;q=0.8',
        'cache-control': 'no-cache',
        'cookie': '_ga=GA1.1.1549289426.1669609851; XSRF-TOKEN=eyJpdiI6IjhCN1BnV0RoK3dOQnFEQlhYRUZVZEE9PSIsInZhbHVlIjoiTFQ4Ym15MWhoU1hmR3FxaWdVYnkvbnovMTVDbS9iRm1OVCsrV0F2RzA5dHlmMWpObENoSFBWY0VXclBNWkZaNlV1aitwSXBWNjhNMGs2Z3JqQ3ZvQWVIQ25QcnNOZkNpR3lwMGNkL01aWHM3VDZ5YmZJblRha0kyUk5IMTh2UzQiLCJtYWMiOiJjMzFmMDZlZDFjNzVhNTVlZjY1MWEzNWJkZjY5Y2Q1MjFiZmNmM2UxOWRiZWJlMGRhZWY5OGU0MGQ4OWI5N2ViIiwidGFnIjoiIn0%3D; bikebd_session=eyJpdiI6ImVVb2NqcmFLR2dKSXc2NnNqUlV6ZWc9PSIsInZhbHVlIjoibUVNcEZidUxsbWdkK3c2UDFYdDYwcHFOdVU1WmVXY0ZiV1pHRzJBbzlaUDNuWGl2Vk1OTk5QYnRkdmVXdDg3bEx2SEpiMGE1c2dvakdkU0tQOTBucHc5ajRpcGpod2ViL3B2ME9DRXc4SUFtSG56YU9MVTdEVi9rYW8reXk0TDYiLCJtYWMiOiI5MmU2NWEyZDhkOGFiNTdkYzQ0ZGJhMDQwNzFhYzFmOGY4MzNjNWU2ODczYWNiOTVlNjU4MWUyZWVmMzE5NjNmIiwidGFnIjoiIn0%3D; _ga_HEG073JLWK=GS1.1.1670663205.2.1.1670663540.0.0.0',
        'pragma': 'no-cache',
        'referer': 'https://www.bikebd.com/bike-price-in-bd',
        'sec-ch-ua': '"Not?A_Brand";v="8", "Chromium";v="108", "Google Chrome";v="108"',
        'sec-ch-ua-mobile': '?0',
        'sec-ch-ua-platform': "Windows",
        'sec-fetch-dest': 'empty',
        'sec-fetch-mode': 'no-cors',
        'sec-fetch-site': 'same-origin',
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36',
    }

    def start_requests(self):
        urls = ["https://www.bikebd.com/brand/yamaha"]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        container = response.xpath("//div[@class='tab-cards-rev']/div/div[@class='col-md-3']")
        for item in container:
            title = item.xpath(".//h2[@class='ltn-tittle mb-0']/strong/text()").get()
            yield {'title': title}
But when I run the spider, it yields None for every title. So I started debugging it with open_in_browser, via this code:

def parse(self, response):
    open_in_browser(response)

It then showed me unreadable characters like the ones below.
ôÿ‘šôC##tøœ÷ÿfjun_N?ÈKÛžDÜ$YVy˲UÒ–³TUoŸÄ'…8hI®ö,ëá4·9g¶åÝûtÎOûUéCh|⢀Ð8`÷ D“†b&³“ݪW¯ª~÷À"Á¹¹a]Ðøß¿{©wŽ(€è ¼ÇX¶nû»¥ˆŠ¦eÙËÿ«Íñ"åY³1Vÿõ¯³½ÍUDñ‡½â`¹‰½é½ê”§Œl‡%,{Š»È?8PaÐ-œ[·EÏ&Žl,ö‰êµŽÄ€ŠŒ+ŒMØèãG{L˜ž2 ?£?èa´UWÞ$[0²üÃZ’‡N±ÅÔ%$[pÝ9ä[ ¯±ÖÞW(ñ¥-ˆxf¿ì±
What's going on with the site? I need some help.
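For what it's worth, byte salad like the dump above usually means a compressed response body is being rendered as text instead of being decompressed first. A stdlib-only illustration of the effect (the site and Scrapy are not involved here; this is just a sketch of the symptom):

```python
import gzip

html = b"<html><body><h2>Yamaha</h2></body></html>"
compressed = gzip.compress(html)

# Treating compressed bytes as text produces mojibake much like the dump above:
print(compressed.decode("latin-1"))

# Decompressing first recovers the original markup:
print(gzip.decompress(compressed).decode())
```

Checking the `Content-Encoding` header of the real response would confirm whether this is what is happening.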

requests.get 500 error code trying to access .cfm page

I am trying to scrape the following page:
https://apps.fcc.gov/oetcf/tcb/reports/Tcb731GrantForm.cfm?mode=COPY&RequestTimeout=500&tcb_code=&application_id=ll686%2BwlPnFzHQb6tru2vw%3D%3D&fcc_id=QDS-BRCM1095
import requests

url = "https://apps.fcc.gov/oetcf/tcb/reports/Tcb731GrantForm.cfm?mode=COPY&RequestTimeout=500&tcb_code=&application_id=ll686%2BwlPnFzHQb6tru2vw%3D%3D&fcc_id=QDS-BRCM1095"

headers_initial = {
    'user-agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Mobile Safari/537.36',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'accept-language': 'en-US,en;q=0.9,de;q=0.8',
    'cache-control': 'max-age=0',
    'upgrade-insecure-requests': '1',
}

r = requests.get(url, timeout=100, headers=headers_initial)
print(r.status_code)
print(r.headers)
print(r.text)
My status code is 400, and my requests.get call gets hung up. I would be very appreciative of any help someone can provide.
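One way to see exactly what will be sent, without guessing: build the request on a `requests.Session` and inspect the prepared request before sending. The URL and headers below are the ones from the question; nothing here hits the network until `session.send()` is called. Passing a `(connect, read)` timeout tuple to `send()` also makes the call fail fast instead of hanging indefinitely:

```python
import requests

url = ("https://apps.fcc.gov/oetcf/tcb/reports/Tcb731GrantForm.cfm"
       "?mode=COPY&RequestTimeout=500&tcb_code="
       "&application_id=ll686%2BwlPnFzHQb6tru2vw%3D%3D&fcc_id=QDS-BRCM1095")

headers_initial = {
    'user-agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Mobile Safari/537.36',
    'accept-language': 'en-US,en;q=0.9,de;q=0.8',
}

session = requests.Session()
prepared = session.prepare_request(requests.Request("GET", url, headers=headers_initial))

print(prepared.url)                      # the exact URL that will go on the wire
print(prepared.headers["user-agent"])    # the header as it will actually be sent

# To actually fetch: give up after 5 s connecting or 30 s waiting for data,
# rather than hanging:
# r = session.send(prepared, timeout=(5, 30))
```

This separates "what am I sending?" from "how is the server responding?", which is usually the first thing to pin down when a request hangs or returns 4xx/5xx.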

How to get around 403 error when webscraping with R

I am trying to web-scrape some price information from a local supermarket, but I am being denied access to the site and I'm not sure why. I have updated my user agent to that of Google Chrome, but I am still getting the same error. Thanks!
library(rvest)
library(dplyr)
library(httr)  # GET() and add_headers() come from httr

link <- "https://www.paknsave.co.nz/shop/product/5031015_ea_000pns?name=size-7-eggs"
page <- GET(link, add_headers('user-agent' = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36"))
import requests

cookies = {
    '__cf_bm': 'HrPgHXxwZ21ZHce6g119GTh0TW5PLK226Avwsqr_yRk-1657420237-0-AV3AFcbB1RRPQi9sj9f0jlyEtnLOU3joTSTqvIuc0StLeyezQdAJDeSSBWpSuxYxQLz6k7KvDjIKR4dPCPww4nxztaohWaWLgKR8wJw1OopkzNjFT7V/MPgZknXPuL4W0B//cUxLgOniMWzJyUqDjAPqJ3fIVNZykHBsk3kWx+krXKDl/xVcmgfD0X8HnQoBtw==',
    'cf_chl_2': '6362dc2388c492e',
    'cf_chl_prog': 'x13',
    'cf_clearance': '7Q36fdlfvE_xpzRSuN425iQrAXi0K6t9oMEg9bgBl1E-1657420230-0-150',
    'shell#lang': 'en',
    'SessionCookieIdV2': '2f331eba017f4978a21db30d38bd58bd',
    'SC_ANALYTICS_GLOBAL_COOKIE': '75db5ec972684f1d83a298be947ff26f|False',
    'server_nearest_store_v2': '{"StoreId":"3c5e3145-0767-4066-9349-6c0a1313acc5","UserLat":"37.7697","UserLng":"-122.3933","StoreLat":"-35.09945","StoreLng":"173.258322","IsSuccess":true}',
    '__RequestVerificationToken': 'i7yGKUCMmP0LpzH6Ir9q8Tin79X0zz2C9mzoUh_VUyNxQNWZ-Gm64inb2J8yRT7C89VdUZc85pIIztehy5ypTrgxBmU1',
    'STORE_ID_V2': '3c5e3145-0767-4066-9349-6c0a1313acc5|False',
    'Region': 'NI',
    'AllowRestrictedItems': 'true',
    'sxa_site': 'PAKnSAVE',
    '__cfruid': '8f13df268c53d03a3b3440e47baa5df4671d278d-1657420232',
    '_gcl_au': '1.1.1855441244.1657420235',
    '_ga_8ZFCCVKEC2': 'GS1.1.1657420235.1.1.1657420235.60',
    '_ga': 'GA1.1.444441072.1657420235',
    'FPLC': 'G6JkKZ86eQgLbN2PTg5DU9nts8HFZj2ZdPTjM6VTo6Johf6YgbfYcZZVDcnxgUmYN%2FdRRR6%2Fz4mEDQIYWroUc8Rhy5%2BXkehpQlNuUN%2Bd11JsFx8S%2FzyGohu9wvfYeA%3D%3D',
    'FPID': 'FPID2.3.pLYyjOkBCu9gt8rah2k%2BxfEuOt1pMJfZ%2Fg7VwV%2Fwsy8%3D.1657420235',
    'ASP.NET_SessionId': '1rzzw1ls1vagg4fdeayflrm0',
    'fs-store-select-tooltip-closed': 'true',
}

headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:98.0) Gecko/20100101 Firefox/98.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    # 'Accept-Encoding': 'gzip, deflate, br',
    'Referer': 'https://www.paknsave.co.nz/shop/product/5031015_ea_000pns?name=size-7-eggs&__cf_chl_tk=ZpR7svpE5x07zN1HC3SVHKDAAVXTtdqPUtgz1pBfj.A-1657420229-0-gaNycGzNCD0',
    'Origin': 'https://www.paknsave.co.nz',
    'DNT': '1',
    'Connection': 'keep-alive',
    # Requests sorts cookies= alphabetically
    # 'Cookie': '__cf_bm=HrPgHXxwZ21ZHce6g119GTh0TW5PLK226Avwsqr_yRk-1657420237-0-AV3AFcbB1RRPQi9sj9f0jlyEtnLOU3joTSTqvIuc0StLeyezQdAJDeSSBWpSuxYxQLz6k7KvDjIKR4dPCPww4nxztaohWaWLgKR8wJw1OopkzNjFT7V/MPgZknXPuL4W0B//cUxLgOniMWzJyUqDjAPqJ3fIVNZykHBsk3kWx+krXKDl/xVcmgfD0X8HnQoBtw==; cf_chl_2=6362dc2388c492e; cf_chl_prog=x13; cf_clearance=7Q36fdlfvE_xpzRSuN425iQrAXi0K6t9oMEg9bgBl1E-1657420230-0-150; shell#lang=en; SessionCookieIdV2=2f331eba017f4978a21db30d38bd58bd; SC_ANALYTICS_GLOBAL_COOKIE=75db5ec972684f1d83a298be947ff26f|False; server_nearest_store_v2={"StoreId":"3c5e3145-0767-4066-9349-6c0a1313acc5","UserLat":"37.7697","UserLng":"-122.3933","StoreLat":"-35.09945","StoreLng":"173.258322","IsSuccess":true}; __RequestVerificationToken=i7yGKUCMmP0LpzH6Ir9q8Tin79X0zz2C9mzoUh_VUyNxQNWZ-Gm64inb2J8yRT7C89VdUZc85pIIztehy5ypTrgxBmU1; STORE_ID_V2=3c5e3145-0767-4066-9349-6c0a1313acc5|False; Region=NI; AllowRestrictedItems=true; sxa_site=PAKnSAVE; __cfruid=8f13df268c53d03a3b3440e47baa5df4671d278d-1657420232; _gcl_au=1.1.1855441244.1657420235; _ga_8ZFCCVKEC2=GS1.1.1657420235.1.1.1657420235.60; _ga=GA1.1.444441072.1657420235; FPLC=G6JkKZ86eQgLbN2PTg5DU9nts8HFZj2ZdPTjM6VTo6Johf6YgbfYcZZVDcnxgUmYN%2FdRRR6%2Fz4mEDQIYWroUc8Rhy5%2BXkehpQlNuUN%2Bd11JsFx8S%2FzyGohu9wvfYeA%3D%3D; FPID=FPID2.3.pLYyjOkBCu9gt8rah2k%2BxfEuOt1pMJfZ%2Fg7VwV%2Fwsy8%3D.1657420235; ASP.NET_SessionId=1rzzw1ls1vagg4fdeayflrm0; fs-store-select-tooltip-closed=true',
    'Upgrade-Insecure-Requests': '1',
    'Sec-Fetch-Dest': 'document',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-Site': 'same-origin',
    'Sec-Fetch-User': '?1',
    'Cache-Control': 'max-age=0',
    # Requests doesn't support trailers
    # 'TE': 'trailers',
}

params = {
    'name': 'size-7-eggs',
}

data = {
    'md': 'LY3z3moXjvkiL6.TltBjKutBlxr0gRcWotnWZD224Ik-1657420229-0-AdVFW-EtYzqbcg7Spq0beYQxr56eQ35wUZByyeUdPhP2RYKPi-G5qDV3BcVp9a-cMTclDfdEhbvXZLuhffGQLmLiSva5afHqpVZZYRepw3ej7SDL2x_vpDpT7yDzSdOVhiRIYWNgG82LWigFM5t7GPoG9XgJTDbpt7exsP4fbjENcCSQCGPhzI8H1FZVDUmRLDMMRLSFECC_ntCat-xMaNN1-LMQnqb_ASBKE7tQzjtFlZc3Uix4SsRbeZqs1CWJWdVsRMfm8jNh9hhG9NuMIq2ggRZGd7r1va7C8m1aj1UdlbnzM2juswggBe-1J1gMF6ZFrjmbiulBfe-HSwf3h65MlDrX63uJTU4XCg62A9HMGq_5t2IcNa8V93H4fLeJEI-KMsmHmhM-gE-VHHUV1ygSyaK1RQQvDNVF2K9QRFYaMZBc0rjMaJsZd8tiU5vXW4xEAWKDvxZHSAkXqklXpKY58VTudkiRw_xrcAzIkGotTZ3okQwAIV4BFspJOO6ir9yx4MIyjPr53rGvqQEOSa24GHlpAm8EEojo3FGbu_YUX5vjjptyItyM-juiyxqdiWx7dKA1gY-KJjwNpVKYhfAfLgH7EU86WUTPHZK2Zkx1d3URpbaKnll47i18d-dSnfuWwt1NwAv_rcr_tFdC2cdYxoebGLMqd7DIKdRR7BgNve_lOxnVjv6-tS7eIoOuCw10FX3HN_mVv7ez8RXLYKYlbMFeKw_tbUNqRayHFsjifNPwr0nkZI_dCpxfwc56pYIWLprD4GbRvMW8DLvb9780wJteNcbw3lUAwljK_lVX07rqC69W_SEzU_Tx7SA5XA',
    'r': 'NhJKZcFC0PI8V8pbbsMGnQLopbV2aQcLxBSjbGH0Rnk-1657420229-0-AV1Uyb7KhY0yniBW0nJsIQ9n0cm+m8jbKXLtyUVB8dsxtMpVww0d5vQUlTu06beYL4XRKa9x8nz8ddLZzJhz/rW61z8Bax45FL5KQqVdrnC5Ki3ul+MZnmLwBC1Do1DYP2837DScEbbPB9lKtzv0M6M0+plSLMwyYKolemCmp3vUI3DWKvvv0SLBP1R/hzqGH56HuR/wTrJEv2mmjOUiqa+FyG9Go6wMIE57FV3WmtaRHp2ZE+QtpgLw8o9l6KUN8wZByC+NfSfBLDDK1ofEtb0aDXPPjr+JfgMeM1rigJOGxsYTN8tcdoQA7B/wvA/xWGJ+V49AoJOWo1pXm0WqXenbsHYbFwYYT8wVoiUAMod6uxPKqxJYGrOJkI6N28k/WjcRQaJ4Tbz6hR18WBN8xayfGtTvfc2vVHpfZGzI2BjKIbGVQ6UL6mgGFsQZ16UDfX0FyDOkvDRtd3Q645K8l9oUt1++PLoEQxmOP21FegAjIRFVy1+WfgKVJgubsnpNRIOxZ+U9EZbgRTDjJO+ruPzzUFPixw74ZNCInHVNE4xwpUWndqDTS17RKX/ZR3auc1rltgtJGHrVFzATgLjAroAhSyy24ddDpvGcRXZZSoaI+X6bXf7A0UVh2BxPvPYmbwsxTXblYxve6enVJvN5drt+n+0nVzoih3VhMHuQqDLebFn+Yfq4OJmVgpq0yXuiGM++JPY/5H8bicbKiCMjG+JSUSJOyJoBItscitorPNSyJC6OX1laLmXarxXbLd2AXUMhXAteXoToSDm1gUbCOkYBR/1q7lv/QuBJBiFV9Rnt3zVRsqNPNGzlz6CzTVPpHGU209I21lA79VYTtnLkAjWVRE/PRARD4qK4JdzAvnJyI8xoBGebUYs+nySaewIMnjslXPQSYLisF53fNLwkcjUoZqq1qMEmw7Wc2fq6DB/9MFTa7ZVc9luxy8mbdyRAeh9XXOfbUNhwp0RYaC9ps0pptrj/2e7FJOhe/r63h+DoA3LhSh8JOc40SGE6ayfsgr5FVAmUwsFvQE2sYuCI4GaULBP+tVkhOrEY9793n09AM9ljQ7Cr2dV+0p80xQdzy7td7pEVOa/qw4IvYPTBLWGHjBacJiON0ARj+uO0RdWR7MSJP/6WGvvF01Tbcdd12Ss4JzqYqc+sDJ9VjqaqawOW79JI7DjUYXPhJqJ6iGPxMDLe939qTpystXf6Fvi3ZGovpBru0aMFlCmTU/HwtkwAG3G5Hzg2GFFr2ViuYzB1TrGzzGDmbOwuWEG6p6l/WCeuY8l5f/NfqTq8oLaGCiDYr9sbJL4EOHbJZ+6tcaoQxD5xm+Yd3jskCqk6MY7vGARUrof/Wl0GhU8znpVZeDa7wKmzGd6XGYG/gJKnM6rOf3I/sEnY8HJ5Hj9o7bZ52x9N80DwPJbGbTvVG9JR9pE1B0MPqrUUM1Omkh2aUh/Co7qAf2qC3aeTBLbwKwXN6TcB6S1yOGcvNMT+eKbdMpA1Ac0YjvD0b1t3/SlK3pkx10kBhXJ3HE0bj/WiqmHNW/OX3FiT7B06ynF+rKPrUPKqQ089/rThZ+VAheq7KveUxJtVXAwkOwe0xn7hk5HuhmKLq1i8psr1eFU9IJYmSB8QENvZ1k4ZOUdBbBZxBeMpA8iA2pu57E/+hTCDvjdpxxETwu84Y1sEHxVO80Qsir25+DDemFMiVi9DRUlyaiZ43dHC/qhrb4TEQiRWpYTpOrjv7Z0YPZUm5O3Q5hyXYfpgeuJ1+0JHLz/KH0U0lNLynMAAjyypipScAruzr25YGHAGsexzTwoQRoVED6nNRbc/4hQcFdNhRIyhd1aDNDkkzOC3gKPn8kjpFQqVmoAQU8Yfv6BohhHMyon5+sNV3Fdp1/az30lILeriDWU7KoL70nmvdyBcmboUyGesJS4GPWAe67E0sU9NLcZF6LzoP1YUmdd0FQZ7wvisAg2yJyBVwXD+eehpLE7gGeXEyAxr7DepYT8wwqEGk4Dcx+4AScviP84T8JKiDiWchaGW/GTjdc/5flgFa3BeR/4W94wDpQ==',
    'vc': '1574e190db357034339f269c7c5755d0',
    'captcha_vc': 'c3856b069d07c7e16a7767324ca6f885',
    'captcha_answer': 'cppbodShTefN-13-7285e0b05b37b8af',
    'cf_ch_cp_return': '9a10d0a1037bf2b325009ab7be973b18|{"managed_clearance":"ni"}',
}

response = requests.post('https://www.paknsave.co.nz/shop/product/5031015_ea_000pns', params=params, cookies=cookies, headers=headers, data=data)

scrapy returns response.status 505

Scrapy, when trying to open the site, returns response.status 505:
505 HTTP Version Not Supported
The same site opens normally in the browser. Why might this be, and how can it be fixed?
I invoke scrapy from the console with this command line:
scrapy shell 'https://xiaohua.zol.com.cn/detail60/59411.html'
You should use proper headers to extract the data. Here is a demo with its output:
import scrapy
from scrapy.crawler import CrawlerProcess
import json


class Xiaohua(scrapy.Spider):
    name = 'xiaohua'
    start_urls = 'https://xiaohua.zol.com.cn/detail60/59411.html'

    def start_requests(self):
        headers = {
            'authority': 'xiaohua.zol.com.cn',
            'cache-control': 'max-age=0',
            'sec-ch-ua': '"Chromium";v="94", "Google Chrome";v="94", ";Not A Brand";v="99"',
            'sec-ch-ua-mobile': '?0',
            'sec-ch-ua-platform': '"Linux"',
            'upgrade-insecure-requests': '1',
            'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36',
            'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
            'sec-fetch-site': 'cross-site',
            'sec-fetch-mode': 'navigate',
            'sec-fetch-user': '?1',
            'sec-fetch-dest': 'document',
            'accept-language': 'en-US,en;q=0.9',
            'cookie': 'z_pro_city=s_provice%3Dmengjiala%26s_city%3Dnull; userProvinceId=1; userCityId=0; userCountyId=0; userLocationId=1; ip_ck=7sWD7/jzj7QuOTIyODI0LjE2MzQxMTQxNzg%3D; lv=1634114179; vn=1; Hm_lvt_ae5edc2bc4fc71370807f6187f0a2dd0=1634114179; _ga=GA1.3.116086394.1634114186; _gid=GA1.3.2021660129.1634114186; Hm_lpvt_ae5edc2bc4fc71370807f6187f0a2dd0=1634114447; questionnaire_pv=1634083202; z_day=ixgo20%3D1%26icnmo11564%3D1; 22aa20c0da0b6f1d9a3155e8bf4c364e=cq11lgg54n27u10p%7B%7BZ%7D%7D%7B%7BZ%7D%7Dnull; MyZClick_22aa20c0da0b6f1d9a3155e8bf4c364e=/html/body/div%5B5%5D/div/div/div%5B2%5D/p/a/',
        }
        yield scrapy.Request(url=self.start_urls, callback=self.parse, headers=headers)

    def parse(self, response):
        print(response.status)
        print('*' * 10)
        print(response.css('h1.article-title::text').get())
        print(response.css('ul.nav > li > a::text').getall())
        print('*' * 10)


process = CrawlerProcess()
process.crawl(Xiaohua)
process.start()
output
200
**********
导演你能认真点儿吗
['笑话首页', '最新笑话', '冷笑话', '搞笑趣图', '搞笑视频', '上传笑话']
**********

Can't scrape robots in python using beautifulsoup

I managed to get the soup and the HTML of the webpage, but for some reason I can't find the robots meta tag, even though I can find it when scraping in other languages.
Example:
import requests
from bs4 import BeautifulSoup

headers = {
    'Accept-Encoding': 'gzip, deflate, sdch',
    'Accept-Language': 'en-US,en;q=0.8',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
}

res = requests.get('http://{}'.format("radverdirect.com"), headers=headers, allow_redirects=True)
number = str(res.status_code)
soup = BeautifulSoup(res.text, 'html.parser')
x = soup.find('meta', attrs={'name': 'robots'})
out = x.get("content", None)
out
This site returns noodp to me in other languages, but here I can't find the tag at all. Why, and how do I fix it?
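One thing worth ruling out: the `soup.find('meta', attrs={'name': 'robots'})` call itself is correct, as this offline check shows. If it returns None on the real page, the tag is probably not present in the server-rendered HTML at all (for example, injected later by JavaScript, which `requests` never executes):

```python
from bs4 import BeautifulSoup

# A tiny page that does contain the tag, parsed the same way as in the question:
html = '<html><head><meta name="robots" content="noodp"></head><body></body></html>'
soup = BeautifulSoup(html, 'html.parser')

tag = soup.find('meta', attrs={'name': 'robots'})
print(tag.get('content', None))   # prints: noodp
```

A quick `'robots' in res.text` on the real response tells you whether the tag is in the raw HTML before suspecting the parser.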
