I'm trying to use Scrapy to download my Quora answers, but I can't even seem to download my profile page. Running the simple
scrapy shell 'http://it.quora.com/profile/Ferdinando-Randisi'
returns this error
2017-10-05 22:16:52 [scrapy.utils.log] INFO: Scrapy 1.4.0 started (bot: quora)
2017-10-05 22:16:52 [scrapy.utils.log] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'quora.spiders', 'ROBOTSTXT_OBEY': True, 'DUPEFILTER_CLASS': 'scrapy.dupefilters.BaseDupeFilter', 'SPIDER_MODULES': ['quora.spiders'], 'BOT_NAME': 'quora', 'LOGSTATS_INTERVAL': 0}
....
2017-10-05 22:16:53 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2017-10-05 22:16:53 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2017-10-05 22:16:53 [scrapy.core.engine] INFO: Spider opened
2017-10-05 22:16:54 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://it.quora.com/robots.txt> from <GET http://it.quora.com/robots.txt>
2017-10-05 22:16:55 [scrapy.core.engine] DEBUG: Crawled (429) <GET https://it.quora.com/robots.txt> (referer: None)
2017-10-05 22:16:55 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://it.quora.com/profile/Ferdinando-Randisi> from <GET http://it.quora.com/profile/Ferdinando-Randisi>
2017-10-05 22:16:56 [scrapy.core.engine] DEBUG: Crawled (429) <GET https://it.quora.com/profile/Ferdinando-Randisi> (referer: None)
2017-10-05 22:16:58 [root] DEBUG: Using default logger
What's wrong? Error 429 is associated with too many requests, but I'm making only one request. Why would that be too many?
Quora blocks Scrapy based on its user agent string. Try mimicking a real browser, e.g. Chromium:
scrapy shell "http://it.quora.com/profile/Ferdinando-Randisi" -s USER_AGENT="Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.52 Safari/537.36"
Related
I am doing a practice project about scraping dynamically loaded content with scrapy-playwright, but I've hit a wall and cannot figure out what the issue is. The spider simply refuses to start the crawling process and gets stuck on the "Telnet console listening on 127.0.0.1:6023" part.
I set up the project as recommended in the tutorial.
This is what the relevant part of my settings.py looks like (I also played around with other settings, such as CONCURRENT_REQUESTS and COOKIES_ENABLED, to try to fix it, but nothing changed):
import asyncio
from scrapy.utils.reactor import install_reactor
install_reactor('twisted.internet.asyncioreactor.AsyncioSelectorReactor')
asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}
TWISTED_REACTOR = 'twisted.internet.asyncioreactor.AsyncioSelectorReactor'
And this is the spider itself:
import scrapy
from scrapy import Request
from scrapy_playwright.page import PageMethod


class roksh_crawler(scrapy.Spider):
    name = "roksh_crawler"

    def start_requests(self):
        yield Request(
            url="https://www.roksh.com/",
            callback=self.parse,
            meta={
                "playwright": True,
                "playwright_page_methods": [
                    PageMethod("screenshot", path="example.png", full_page=True),
                ],
            },
        )

    def parse(self, response):
        screenshot = response.meta["playwright_page_methods"][0]
        # screenshot.result contains the image's bytes
Here I tried to take a screenshot of the page, but nothing else works either, so I assume that is not the issue.
And here is the log I am getting:
2022-11-24 09:54:19 [scrapy.utils.log] INFO: Scrapy 2.7.1 started (bot: roksh_crawler)
2022-11-24 09:54:19 [scrapy.utils.log] INFO: Versions: lxml 4.9.1.0, libxml2 2.9.12, cssselect 1.2.0, parsel 1.7.0, w3lib 2.0.1, Twisted 21.7.0, Python 3.9.5 (tags/v3.9.5:0a7dcbd, May 3 2021, 17:27:52) [MSC v.1928 64 bit (AMD64)], pyOpenSSL 22.1.0 (OpenSSL 3.0.7 1 Nov 2022), cryptography 38.0.3,
Platform Windows-10-10.0.19045-SP0
2022-11-24 09:54:19 [scrapy.crawler] INFO: Overridden settings:
{'BOT_NAME': 'roksh_crawler',
'CONCURRENT_REQUESTS': 32,
'NEWSPIDER_MODULE': 'roksh.spiders',
'REQUEST_FINGERPRINTER_IMPLEMENTATION': '2.7',
'ROBOTSTXT_OBEY': True,
'SPIDER_MODULES': ['roksh.spiders'],
'TWISTED_REACTOR': 'twisted.internet.asyncioreactor.AsyncioSelectorReactor'}
2022-11-24 09:54:19 [asyncio] DEBUG: Using selector: SelectSelector
2022-11-24 09:54:19 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.asyncioreactor.AsyncioSelectorReactor
2022-11-24 09:54:19 [scrapy.utils.log] DEBUG: Using asyncio event loop: asyncio.windows_events._WindowsSelectorEventLoop
2022-11-24 09:54:19 [scrapy.extensions.telnet] INFO: Telnet Password: 7aad12ee78cfff92
2022-11-24 09:54:19 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.logstats.LogStats']
2022-11-24 09:54:19 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2022-11-24 09:54:19 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2022-11-24 09:54:19 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2022-11-24 09:54:19 [scrapy.core.engine] INFO: Spider opened
2022-11-24 09:54:19 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2022-11-24 09:54:19 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2022-11-24 09:55:19 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2022-11-24 09:56:19 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2022-11-24 09:57:19 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2022-11-24 09:58:19 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2022-11-24 09:59:19 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2022-11-24 10:00:19 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2022-11-24 10:01:19 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2022-11-24 10:02:19 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2022-11-24 10:03:19 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
And this goes on indefinitely.
I also tried different URLs but got the same result, so I assume the problem is on my end, not the server's. Also, if I run the spider without Playwright (i.e. I take DOWNLOAD_HANDLERS out of the settings), it works, although it only returns the source HTML, which is not my desired result.
It works fine for me.
Just remove or comment out these lines in your settings.py file:
# import asyncio
# from scrapy.utils.reactor import install_reactor
# install_reactor('twisted.internet.asyncioreactor.AsyncioSelectorReactor')
# asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())
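With those lines gone, the Playwright-related part of settings.py only needs the entries you already have; Scrapy installs the asyncio reactor itself when TWISTED_REACTOR is set. A minimal sketch:

# settings.py -- let Scrapy install the asyncio reactor from this setting
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"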
I can't crawl this page: https://www.adidas.pe/. Running scrapy crawl my_spider returns:
2018-12-17 15:36:39 [scrapy.core.engine] INFO: Spider opened
2018-12-17 15:36:39 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-12-17 15:36:39 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6024
2018-12-17 15:36:39 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET http://www.adidas.pe/> from <GET http://adidas.pe/>
2018-12-17 15:37:39 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-12-17 15:38:39 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
I tried to change settings.py:
COOKIES_ENABLED = True
ROBOTSTXT_OBEY = False
but it doesn't work.
You could try changing USER_AGENT in settings.py; it works for me. My settings.py:
# -*- coding: utf-8 -*-
# Scrapy settings for adidas project
#
# For simplicity, this file contains only settings considered important or
# commonly used. You can find more settings consulting the documentation:
#
# https://doc.scrapy.org/en/latest/topics/settings.html
# https://doc.scrapy.org/en/latest/topics/downloader-middleware.html
# https://doc.scrapy.org/en/latest/topics/spider-middleware.html
BOT_NAME = 'adidas'
SPIDER_MODULES = ['adidas.spiders']
NEWSPIDER_MODULE = 'adidas.spiders'
# Crawl responsibly by identifying yourself (and your website) on the user-agent
USER_AGENT = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36'
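If you'd rather not change the project-wide setting, the same user agent can also be set for just one spider through custom_settings (a sketch; the spider name and start URL are placeholders):

import scrapy


class MySpider(scrapy.Spider):
    name = 'my_spider'
    start_urls = ['https://www.adidas.pe/']
    # Per-spider override of the project-level USER_AGENT
    custom_settings = {
        'USER_AGENT': ('Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 '
                       '(KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36'),
    }

    def parse(self, response):
        self.logger.info('Crawled %s (%d bytes)', response.url, len(response.body))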
I am trying to do a manual install of OpenStack. I am not able to create an instance. I followed the documentation but I am still getting errors. Is anyone willing to help me get OpenStack started? Much appreciated.
If I reload the website, it does show up.
Apache error.log:
Timeout when reading response headers from daemon process 'horizon': /usr/share/openstack-dashboard/openstack_dashboard/wsgi.py, referer: http://192.168.1.100/horizon/project/instances/
Neutron-server.log:
2018-11-07 18:07:52.655 31285 DEBUG neutron.wsgi [-] (31285) accepted ('192.168.1.100', 35802) server /usr/local/lib/python2.7/dist-packages/eventlet/wsgi.py:956
2018-11-07 18:07:52.799 31285 DEBUG neutron.pecan_wsgi.hooks.policy_enforcement [req-9b499140-3f8f-471c-8bdf-2c377170e782 b2e32e162a664ad0afd1a0c34643cd0c 85ee25734f664d6d822379674c93da44 - default default] Attributes excluded by policy engine: [u'is_default', u'vlan_transparent'] _exclude_attributes_by_policy /usr/lib/python2.7/dist-packages/neutron/pecan_wsgi/hooks/policy_enforcement.py:256
2018-11-07 18:07:52.801 31285 INFO neutron.wsgi [req-9b499140-3f8f-471c-8bdf-2c377170e782 b2e32e162a664ad0afd1a0c34643cd0c 85ee25734f664d6d822379674c93da44 - default default] 192.168.1.100 "GET /v2.0/networks?shared=True HTTP/1.1" status: 200 len: 866 time: 0.1453359
2018-11-07 18:07:52.817 31286 DEBUG neutron.wsgi [-] (31286) accepted ('192.168.1.100', 35808) server /usr/local/lib/python2.7/dist-packages/eventlet/wsgi.py:956
2018-11-07 18:07:52.959 31286 INFO neutron.wsgi [req-044501ca-4e30-4871-81f1-5b979b859277 b2e32e162a664ad0afd1a0c34643cd0c 85ee25734f664d6d822379674c93da44 - default default] 192.168.1.100 "GET /v2.0/ports?network_id=fb280ae3-bc89-4922-bd85-391c79967ae8 HTTP/1.1" status: 200 len: 1804 time: 0.1413569
2018-11-07 18:08:03.763 31286 INFO neutron.wsgi [req-b48c285d-e875-4a93-ad2e-d810e407d46c b2e32e162a664ad0afd1a0c34643cd0c 85ee25734f664d6d822379674c93da44 - default default] 192.168.1.100 "GET /v2.0/security-groups?fields=id&id=26ee0adb-06a9-49a1-a134-7058131ad216 HTTP/1.1" status: 200 len: 267 time: 0.0607591
2018-11-07 18:08:03.819 31286 INFO neutron.wsgi [req-959eb0d0-b302-4aa0-a57f-226133b7c9d3 b2e32e162a664ad0afd1a0c34643cd0c 85ee25734f664d6d822379674c93da44 - default default] 192.168.1.100 "GET /v2.0/security-groups/26ee0adb-06a9-49a1-a134-7058131ad216 HTTP/1.1" status: 200 len: 2631 time: 0.0522490
2018-11-07 18:08:03.989 31286 DEBUG neutron.pecan_wsgi.hooks.policy_enforcement [req-514f65e7-6aed-42c5-aef8-9863650f8253 b2e32e162a664ad0afd1a0c34643cd0c 85ee25734f664d6d822379674c93da44 - default default] Attributes excluded by policy engine: [u'is_default', u'vlan_transparent'] _exclude_attributes_by_policy /usr/lib/python2.7/dist-packages/neutron/pecan_wsgi/hooks/policy_enforcement.py:256
2018-11-07 18:08:03.991 31286 INFO neutron.wsgi [req-514f65e7-6aed-42c5-aef8-9863650f8253 b2e32e162a664ad0afd1a0c34643cd0c 85ee25734f664d6d822379674c93da44 - default default] 192.168.1.100 "GET /v2.0/networks?id=fb280ae3-bc89-4922-bd85-391c79967ae8 HTTP/1.1" status: 200 len: 866 time: 0.1661508
2018-11-07 18:08:04.007 31286 INFO neutron.wsgi [req-ff26e06a-3ec7-4232-a1f8-0ddbcdafead1 b2e32e162a664ad0afd1a0c34643cd0c 85ee25734f664d6d822379674c93da44 - default default] 192.168.1.100 "GET /v2.0/quotas/85ee25734f664d6d822379674c93da44 HTTP/1.1" status: 200 len: 341 time: 0.0136049
2018-11-07 18:08:04.085 31286 INFO neutron.wsgi [req-b5797013-aab1-440f-b7c4-f0e8f8571a30 b2e32e162a664ad0afd1a0c34643cd0c 85ee25734f664d6d822379674c93da44 - default default] 192.168.1.100 "GET /v2.0/ports?fields=id&tenant_id=85ee25734f664d6d822379674c93da44 HTTP/1.1" status: 200 len: 255 time: 0.0731540
2018-11-07 18:08:04.998 31286 DEBUG neutron.wsgi [req-f3e3f7d8-daa4-4464-ab4a-430cbf2dee58 5673e5fcc9ff4029b9a72e32310570fb 6e60623ffbbe49fb94cc0eade1d4f6e3 - default default] http://controller:9696/v2.0/extensions returned with HTTP 200 __call__ /usr/lib/python2.7/dist-packages/neutron/wsgi.py:715
2018-11-07 18:08:05.000 31286 INFO neutron.wsgi [req-f3e3f7d8-daa4-4464-ab4a-430cbf2dee58 5673e5fcc9ff4029b9a72e32310570fb 6e60623ffbbe49fb94cc0eade1d4f6e3 - default default] 192.168.1.100 "GET /v2.0/extensions HTTP/1.1" status: 200 len: 7807 time: 0.3759091
2018-11-07 18:08:05.118 31286 INFO neutron.wsgi [req-c8c02b81-6a39-4a1a-813c-8eeaffee2d53 5673e5fcc9ff4029b9a72e32310570fb 6e60623ffbbe49fb94cc0eade1d4f6e3 - default default] 192.168.1.100 "GET /v2.0/networks/fb280ae3-bc89-4922-bd85-391c79967ae8?fields=segments HTTP/1.1" status: 200 len: 212 time: 0.1159072
2018-11-07 18:08:05.243 31286 INFO neutron.wsgi [req-17890f41-07cc-4e53-998d-feb6443e3515 5673e5fcc9ff4029b9a72e32310570fb 6e60623ffbbe49fb94cc0eade1d4f6e3 - default default] 192.168.1.100 "GET /v2.0/networks/fb280ae3-bc89-4922-bd85-391c79967ae8?fields=provider%3Aphysical_network&fields=provider%3Anetwork_type HTTP/1.1" status: 200 len: 281 time: 0.1205151
2018-11-07 18:08:19.218 31289 DEBUG neutron.db.agents_db [req-68b39ef4-218e-442f-ab21-f2b8442bdb65 - - - - -] Agent healthcheck: found 0 active agents agent_health_check /usr/lib/python2.7/dist-packages/neutron/db/agents_db.py:326
2018-11-07 18:08:56.222 31289 DEBUG neutron.db.agents_db [req-68b39ef4-218e-442f-ab21-f2b8442bdb65 - - - - -] Agent healthcheck: found 0 active agents agent_health_check /usr/lib/python2.7/dist-packages/neutron/db/agents_db.py:326
Neutron log file after trying to create a floating IP:
0dfe9c3 HTTP/1.1" status: 200 len: 825 time: 0.0563171
2018-11-07 18:20:54.655 31285 DEBUG neutron.wsgi [-] (31285) accepted ('192.168.1.100', 37282) server /usr/local/lib/python2.7/dist-packages/eventlet/wsgi.py:956
2018-11-07 18:20:54.660 31285 WARNING neutron.pecan_wsgi.controllers.root [req-6081c61f-abf0-4f97-9f00-5a2a4870fca4 b2e32e162a664ad0afd1a0c34643cd0c 85ee25734f664d6d822379674c93da44 - default default] No controller found for: floatingips - returning response code 404: PecanNotFound
2018-11-07 18:20:54.661 31285 INFO neutron.pecan_wsgi.hooks.translation [req-6081c61f-abf0-4f97-9f00-5a2a4870fca4 b2e32e162a664ad0afd1a0c34643cd0c 85ee25734f664d6d822379674c93da44 - default default] POST failed (client error): The resource could not be found.
2018-11-07 18:20:54.661 31285 DEBUG neutron.pecan_wsgi.hooks.notifier [req-6081c61f-abf0-4f97-9f00-5a2a4870fca4 b2e32e162a664ad0afd1a0c34643cd0c 85ee25734f664d6d822379674c93da44 - default default] Skipping NotifierHook processing as there was no resource associated with the request after /usr/lib/python2.7/dist-packages/neutron/pecan_wsgi/hooks/notifier.py:74
2018-11-07 18:20:54.662 31285 INFO neutron.wsgi [req-6081c61f-abf0-4f97-9f00-5a2a4870fca4 b2e32e162a664ad0afd1a0c34643cd0c 85ee25734f664d6d822379674c93da44 - default default] 192.168.1.100 "POST /v2.0/floatingips HTTP/1.1" status: 404
I have recently started using scrapy and was setting it up for a typical task of scraping a webpage which requires authentication.
My idea is to start with the login page, submit the form and then download the data from other login protected pages.
I can see that I am authenticated; however, I am stuck in a loop of redirects when it goes to the download page.
My spider class looks like this:
from scrapy import Spider, Request, FormRequest
from scrapy.shell import inspect_response


class MySpiderWithLogin(Spider):
    name = 'my-spider'

    download_url = 'https://example.com/files/1.zip'
    login_url = 'https://example.com/login'
    login_user = '...'
    login_password = '...'

    def start_requests(self):
        # let's start by sending a first request to login page
        yield Request(self.login_url, self.parse_login)

    def parse_login(self, response):
        # got the login page, let's fill the login form...
        return FormRequest.from_response(
            response,
            formdata={'username': self.login_user, 'password': self.login_password},
            callback=self.start_crawl,
            dont_filter=True)

    def start_crawl(self, response):
        # OK, we're in, let's start crawling the protected pages
        yield Request(self.download_url, dont_filter=True)

    def parse(self, response):
        # do stuff with the logged in response
        inspect_response(response, self)
        return
What I see after running the spider is a redirect loop, as below. I have abstracted the URLs as login_page and download_page, along with some of their query parameters, namely ticket, jsession_id and cas_check.
2016-12-21 18:06:36 [scrapy] DEBUG: Redirecting (302) to <GET login_page>
2016-12-21 18:06:39 [scrapy] DEBUG: Crawled (200) <GET login_page>
2016-12-21 18:06:39 [scrapy] DEBUG: Redirecting (302) to <GET login_page/j_spring_cas_security_check;jsessionid=bar?ticket=foo> from <POST login_page/j_spring_cas_security_check;jsessionid=bar?ticket=foo>
2016-12-21 18:06:42 [scrapy] DEBUG: Redirecting (302) to <GET home_page>
2016-12-21 18:06:44 [scrapy] DEBUG: Crawled (200) <GET home_page>
2016-12-21 18:06:44 [scrapy] DEBUG: Redirecting (302) to <GET login_page> from <GET download_page>
2016-12-21 18:06:47 [scrapy] DEBUG: Redirecting (302) to <GET download_page?ticket=biz> from <GET login_page>
2016-12-21 18:06:50 [scrapy] DEBUG: Redirecting (302) to <GET download_page> from <GET download_page?ticket=biz>
2016-12-21 18:06:54 [scrapy] DEBUG: Redirecting (302) to <GET login_page> from <GET download_page>
....
....
2016-12-21 18:07:34 [scrapy] DEBUG: Discarding <GET download_page?ticket=biz_100>: max redirections reached
I have set my User-Agent to that of a browser in settings.py, but to no effect here. Any ideas what could possibly be wrong?
Here's payload of login form from a successful request from a browser for reference:
url: login_page/jsessionid=...?service=../j_spring_cas_security_check%3Bjsessionid%3D...
method: POST
payload:
- username: ...
- password: ...
- lt: e1s1
- _eventId: submit
- submit: Sign In
UPDATE
Using Python's requests library works like a charm for the same URL. It is also worth mentioning that the website uses Jasig CAS for authentication, which makes the download URL accessible only once for a given ticket. For any further access, a new ticket needs to be issued.
I am guessing that might be the reason why Scrapy's Request is stuck in redirects, as it might not be built around the one-time access scenario.
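One way to dig into this (just a sketch built on Scrapy's dont_redirect and handle_httpstatus_list request meta keys, not a verified fix) would be to switch off automatic redirect handling for the download request and follow the CAS redirects by hand, so each ticket is redeemed exactly once. The follow_cas_redirect helper below is hypothetical and would replace start_crawl in the spider above:

# inside MySpiderWithLogin, replacing start_crawl
def start_crawl(self, response):
    # 302s are handed to the callback instead of being followed automatically
    yield Request(
        self.download_url,
        callback=self.follow_cas_redirect,
        dont_filter=True,
        meta={'dont_redirect': True, 'handle_httpstatus_list': [302]},
    )

def follow_cas_redirect(self, response):
    if response.status == 302:
        # Redeem the ticketed Location ourselves, exactly once per ticket
        location = response.headers['Location'].decode()
        yield Request(
            location,
            callback=self.follow_cas_redirect,
            dont_filter=True,
            meta={'dont_redirect': True, 'handle_httpstatus_list': [302]},
        )
    else:
        # Final non-redirect response: inspect it, as parse() does above
        inspect_response(response, self)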
As I say in the title, the app produces a 500 Internal Server Error, but the error isn't in the logs, so I can't tell what is happening.
This is the log:
2015-11-30T16:31:50.881209+00:00 heroku[router]: at=info method=GET path="/" host=myapp.herokuapp.com request_id=52d75ec4-3345-4d3c-88e6-a5f08c366dc2 fwd="151.77.121.140" dyno=web.1 connect=0ms service=10ms status=500 bytes=765
2015-11-30T16:31:50.882326+00:00 app[web.1]: 10.76.13.112 - - [30/Nov/2015:16:31:50 +0000] "GET / HTTP/1.1" 500 495 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36
2015-11-30T16:31:51.618891+00:00 app[web.1]: 10.76.13.112 - - [30/Nov/2015:16:31:51 +0000] "GET /favicon.ico HTTP/1.1" 200 5430 "https://myapp.herokuapp.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36
2015-11-30T16:31:51.617820+00:00 heroku[router]: at=info method=GET path="/favicon.ico" host=myapp.herokuapp.com request_id=cf2ecf03-65ee-4213-8d52-a7eaaa24ac76 fwd="151.77.121.140" dyno=web.1 connect=0ms service=1ms status=200 bytes=5696
How can I find out what kind of error the app is producing so I can fix it?
Solved. To see the errors produced by the application, Symfony has to be configured to be compatible with the ephemeral filesystem that Heroku uses.
So in config_prod.yml, set monolog to write to php://stderr:
monolog:
    handlers:
        main:
            type: fingers_crossed
            action_level: error
            handler: nested
        nested:
            type: stream
            # Required by Heroku ephemeral filesystem
            path: "php://stderr"
            level: debug
        console:
            type: console
More info about how to configure logging for Symfony apps on Heroku can be found on their help pages.
PS
The error was a simple misspelling: I wrote default/index.html.twig instead of Default/index.html.twig. It's a problem of capital letters and Git commits. Very simple, but so tedious and hard to find!