Haskell Network.Browser HTTPS Connection - networking

Is there a way to make HTTPS calls with the Network.Browser package?
I'm not seeing it in the documentation on Hackage.
If there isn't a way to do it with browse, is there another way to fetch HTTPS pages?
My current test code is:
import Network.HTTP
import Network.URI (parseURI)
import Network.HTTP.Proxy
import Data.Maybe (fromJust)
import Control.Applicative ((<$>))
import Network.Browser
retrieveUrl :: String -> IO String
retrieveUrl url = do
  rsp <- browse $ request (Request (fromJust uri) POST [] "Body")
  return $ snd (rspBody <$> rsp)
  where uri = parseURI url
I've been running nc -l -p 8000 and watching the output.
I can see that the request isn't encrypted when I do retrieveUrl "https://localhost:8000".
Also, when I try a real HTTPS site I get:
Network.Browser.request: Error raised ErrorClosed
*** Exception: user error (Network.Browser.request: Error raised ErrorClosed)
Edit: Network.Curl solution (For doing a SOAP call)
import Network.Curl (curlGetString)
import Network.Curl.Opts
soapHeader s = CurlHttpHeaders ["Content-Type: text/xml", "SOAPAction: " ++ s]
proxy = CurlProxy "proxy.foo.org"
envelope = "myRequestEnvelope.xml"
headers = readFile envelope >>= (\x -> return [ soapHeader "myAction"
                                              , proxy
                                              , CurlPost True
                                              , CurlPostFields [x] ])
main = headers >>= curlGetString "https://service.endpoint"

An alternative, and perhaps more "Haskelly" solution as Travis Brown put it, is http-conduit:
To just fetch HTTPS pages:
import Network.HTTP.Conduit
import qualified Data.ByteString.Lazy as L
main = simpleHttp "https://www.noisebridge.net/wiki/Noisebridge" >>= L.putStr
The example below shows how to pass URL-encoded parameters:
{-# LANGUAGE OverloadedStrings #-}
import Network.HTTP.Conduit
import qualified Data.ByteString.Lazy as L
main = do
  initReq <- parseUrl "https://www.googleapis.com/urlshortener/v1/url"
  let req' = initReq { secure = True } -- Turn on https
  let req = (flip urlEncodedBody) req' $
              [ ("longUrl", "http://www.google.com/")
              -- ,
              ]
  response <- withManager $ httpLbs req
  L.putStr $ responseBody response
You can also set the method, content-type, and request body manually. The API is the same as in http-enumerator; a good example is: https://stackoverflow.com/a/5614946
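For reference, here is a minimal sketch of setting those fields by hand, in the same http-conduit style as the examples above (the endpoint and JSON body are placeholders, not a real API):
{-# LANGUAGE OverloadedStrings #-}
import Network.HTTP.Conduit
import qualified Data.ByteString.Lazy as L

main = do
  initReq <- parseUrl "https://example.org/api" -- placeholder endpoint
  let req = initReq
        { method         = "POST"
        , secure         = True                 -- Turn on https
        , requestHeaders = [("Content-Type", "application/json")]
        , requestBody    = RequestBodyLBS "{\"longUrl\": \"http://www.google.com/\"}"
        }
  response <- withManager $ httpLbs req
  L.putStr $ responseBody response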

I've wondered about this in the past and have always ended up just using the libcurl bindings. It would be nice to have a more Haskelly solution, but Network.Curl is very convenient.
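For instance, fetching a page over HTTPS is a one-liner plus a little boilerplate (a minimal sketch; curlGetString returns the curl status code together with the body):
import Network.Curl (withCurlDo, curlGetString)

main :: IO ()
main = withCurlDo $ do
  (code, body) <- curlGetString "https://www.google.com" []
  print code     -- should be CurlOK on success
  putStrLn body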

If all you want to do is fetch a page, Network.HTTP.Wget is the simplest way. Exhibit A:
import Network.HTTP.Wget
main = putStrLn =<< wget "https://www.google.com" [] []

Related

How to make a request to an IPv6 address using the http-client package in haskell?

I've been trying to make a request to an IPv6 address using the parseRequest function from the Network.HTTP.Client package (https://hackage.haskell.org/package/http-client-0.7.10/docs/Network-HTTP-Client.html) as follows:
request <- parseRequest "http://[2001:0db8:85a3:0000:0000:8a2e:0370:7334]"
Instead of being parsed as an address/AddrInfo, it is parsed as a hostname, and the request throws the error: does not exist (Name or service not known). As a next step, I tried pointing a domain at the same IPv6 address and then using the domain name in parseRequest; that successfully resolves to the IPv6 address and makes the request. Is there some other way I can use the IPv6 address directly to make the request with the http-client package?
PS: I also tried without the square brackets around the IP address; in that case the error is Invalid URL:
request <- parseRequest "http://2001:0db8:85a3:0000:0000:8a2e:0370:7334"
More context:
For an IPv4 address, the getAddrInfo function generates the address as:
AddrInfo {addrFlags = [AI_NUMERICHOST], addrFamily = AF_INET, addrSocketType = Stream, addrProtocol = 6, addrAddress = 139.59.90.1:80, addrCanonName = Nothing}
whereas for an IPv6 address (in the square-brackets format) it generates:
AddrInfo {addrFlags = [AI_ADDRCONFIG], addrFamily = AF_UNSPEC, addrSocketType = Stream, addrProtocol = 6, addrAddress = 0.0.0.0:0, addrCanonName = Nothing}
and the error prints as:
(ConnectionFailure Network.Socket.getAddrInfo (called with preferred socket type/protocol: AddrInfo {addrFlags = [AI_ADDRCONFIG], addrFamily = AF_UNSPEC, addrSocketType = Stream, addrProtocol = 6, addrAddress = 0.0.0.0:0, addrCanonName = Nothing}, host name: Just "[2001:0db8:85a3:0000:0000:8a2e:0370:7334]", service name: Just "80"): does not exist (Name or service not known))
When a literal IPv6 address is used in a URL, it should be surrounded by square brackets (as per RFC 2732) so the colons in the literal address aren't misinterpreted as some kind of port designation.
When a literal IPv6 address is resolved using the C library function getaddrinfo (or the equivalent Haskell function getAddrInfo), these functions are not required to handle these extra square brackets, and at least on Linux they don't.
Therefore, it's the responsibility of the HTTP client library to remove the square brackets from the hostname extracted from the URL before resolving the literal IPv6 address using getaddrinfo, and the http-client package doesn't do this, at least as of version 0.7.10. So, this is a bug, and I can see you've appropriately filed a bug report.
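You can see the underlying behaviour in isolation with the network package's getAddrInfo: the bare literal resolves, while the bracketed form is treated as an unknown host name. A small sketch (the second call is expected to throw on Linux):
import Network.Socket

main :: IO ()
main = do
  let hints = defaultHints { addrSocketType = Stream }
  -- Bare IPv6 literal: resolves fine.
  good <- getAddrInfo (Just hints) (Just "::1") (Just "80")
  print good
  -- Bracketed literal: getaddrinfo does not strip the brackets,
  -- so this fails with "does not exist (Name or service not known)".
  bad <- getAddrInfo (Just hints) (Just "[::1]") (Just "80")
  print bad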
Unfortunately, I don't see an easy way to work around the issue. You can manipulate the Request after parsing to remove the square brackets from the host field, like so:
{-# LANGUAGE OverloadedStrings #-}
import Data.ByteString (ByteString)
import qualified Data.ByteString as BS
import Network.HTTP.Client
import Network.HTTP.Types.Status (statusCode)
main :: IO ()
main = do
  manager <- newManager defaultManagerSettings
  request <- parseRequest "http://[::1]"
  let request' = request { host = removeBrackets (host request) }
  response <- httpLbs request' manager
  print response

removeBrackets :: ByteString -> ByteString
removeBrackets bs =
  case BS.stripPrefix "[" bs >>= BS.stripSuffix "]" of
    Just bs' -> bs'
    Nothing  -> bs
The problem with this is that it also removes the square brackets from the value in the Host header, so the HTTP request will contain the header:
Host: ::1
instead of the correct
Host: [::1]
which may or may not cause problems, depending on the web server at the other end.
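If the bracketed Host header matters for your server, one possible workaround (a sketch, not something I've tested extensively) is to set the Host header yourself before stripping the brackets from host; this relies on the assumption that http-client only generates its own Host header when none is already present in requestHeaders:
{-# LANGUAGE OverloadedStrings #-}
import Data.ByteString (ByteString)
import qualified Data.ByteString as BS
import Network.HTTP.Client

main :: IO ()
main = do
  manager <- newManager defaultManagerSettings
  request <- parseRequest "http://[::1]"
  let request' = request
        { host = removeBrackets (host request)
          -- Keep the original bracketed form in an explicit Host header.
        , requestHeaders = ("Host", host request) : requestHeaders request
        }
  response <- httpLbs request' manager
  print response

removeBrackets :: ByteString -> ByteString
removeBrackets bs =
  case BS.stripPrefix "[" bs >>= BS.stripSuffix "]" of
    Just bs' -> bs'
    Nothing  -> bs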
You could try using a patched http-client package. The following patch against version 0.7.10 seems to work, but I didn't test it very extensively:
diff --git a/Network/HTTP/Client/Connection.hs b/Network/HTTP/Client/Connection.hs
index 0e329cd..719822e 100644
--- a/Network/HTTP/Client/Connection.hs
+++ b/Network/HTTP/Client/Connection.hs
@@ -15,6 +15,7 @@ module Network.HTTP.Client.Connection
 import Data.ByteString (ByteString, empty)
 import Data.IORef
+import Data.List (stripPrefix, isSuffixOf)
 import Control.Monad
 import Network.HTTP.Client.Types
 import Network.Socket (Socket, HostAddress)
@@ -158,8 +159,12 @@ withSocket :: (Socket -> IO ())
 withSocket tweakSocket hostAddress' host' port' f = do
     let hints = NS.defaultHints { NS.addrSocketType = NS.Stream }
     addrs <- case hostAddress' of
-        Nothing ->
-            NS.getAddrInfo (Just hints) (Just host') (Just $ show port')
+        Nothing -> do
+            let port'' = Just $ show port'
+            case ip6Literal host' of
+                Just lit -> NS.getAddrInfo (Just hints { NS.addrFlags = [NS.AI_NUMERICHOST] })
+                                           (Just lit) port''
+                Nothing -> NS.getAddrInfo (Just hints) (Just host') port''
         Just ha ->
             return
                 [NS.AddrInfo
@@ -173,6 +178,11 @@ withSocket tweakSocket hostAddress' host' port' f = do
     E.bracketOnError (firstSuccessful addrs $ openSocket tweakSocket) NS.close f
+  where
+    ip6Literal h = case stripPrefix "[" h of
+        Just rest | "]" `isSuffixOf` rest -> Just (init rest)
+        _ -> Nothing
+
 openSocket tweakSocket addr =
     E.bracketOnError
         (NS.socket (NS.addrFamily addr) (NS.addrSocketType addr)

Encoding problem with GET requests in Haskell

I'm trying to get some JSON data from a Jira server using Haskell. I'm counting this as "me having problems with Haskell" rather than with encodings or Jira, because the problem only shows up when I do this in Haskell.
The problem occurs when the URL (or query) has plus signs. After building my request for theproject+order+by+created, Haskell prints it as:
Request {
  host = "myjiraserver.com"
  port = 443
  secure = True
  requestHeaders = [("Content-Type","application/json"),("Authorization","<REDACTED>")]
  path = "/jira/rest/api/2/search"
  queryString = "?jql=project%3Dtheproject%2Border%2Bby%2Bcreated"
  method = "GET"
  proxy = Nothing
  rawBody = False
  redirectCount = 10
  responseTimeout = ResponseTimeoutDefault
  requestVersion = HTTP/1.1
}
But the request fails with this response:
- 'Error in the JQL Query: The character ''+'' is a reserved JQL character. You must
enclose it in a string or use the escape ''\u002b'' instead. (line 1, character
21)'
So it seems like Jira didn't like Haskell's %2B. Do you have any suggestions on what I can do to fix this, or any resources that might be helpful? The same request sans the +order+by+created part is successful.
The code (patched together from these examples):
{-# LANGUAGE OverloadedStrings #-}
import Data.Aeson
import qualified Data.ByteString.Char8 as S8
import qualified Data.Yaml as Yaml
import Network.HTTP.Simple
import System.Environment (getArgs)
-- auth' is echo -e "username:passwd" | base64
foo urlBase proj' auth' = do
  let proj = S8.pack (proj' ++ "+order+by+created")
      auth = S8.pack auth'
  request'' <- parseRequest urlBase
  let request'
        = setRequestMethod "GET"
        $ setRequestPath "/jira/rest/api/2/search"
        $ setRequestHeader "Content-Type" ["application/json"]
        $ request''
      request
        = setRequestQueryString [("jql", Just (S8.append "project=" proj))]
        $ setRequestHeader "Authorization" [S8.append "Basic " auth]
        $ request'
  return request

main :: IO ()
main = do
  args <- getArgs
  case args of
    (urlBase:proj:auth:_) -> do
      request <- foo urlBase proj auth
      putStrLn $ show request
      response <- httpJSON request
      S8.putStrLn $ Yaml.encode (getResponseBody response :: Value) -- apparently this is required
      putStrLn ""
    _ -> putStrLn "usage..."
(If you know a simpler way to do the above then I'd take such suggestions as well; I'm just trying to do something analogous to this Python:
import requests
import sys

if len(sys.argv) >= 4:
    urlBase = sys.argv[1]
    proj = sys.argv[2]
    auth = sys.argv[3]
    urlBase += "/jira/rest/api/2/search?jql=project="
    proj += "+order+by+created"
    h = {}
    h["content-type"] = "application/json"
    h["authorization"] = "Basic " + auth
    r = requests.get(urlBase + proj, headers=h)
    print(r.json())
)
project+order+by+created is the URL-encoded form of the actual query project order by created (with spaces instead of +). The function setRequestQueryString expects the raw query (with spaces, not URL-encoded) and URL-encodes it for you.
The Python script you give for comparison essentially does the URL-encoding by hand.
So the fix is to put the raw query in proj:
foo urlBase proj' auth' = do
  let proj = S8.pack (proj' ++ " order by created") -- spaces instead of +
  ...
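To convince yourself of the encoding behaviour, here is a small sketch (the host is a placeholder) that only builds and prints the request; the resulting queryString should show the spaces percent-encoded rather than literal + signs:
{-# LANGUAGE OverloadedStrings #-}
import Network.HTTP.Simple

main :: IO ()
main = do
  req <- parseRequest "https://myjiraserver.com/jira/rest/api/2/search"
  let req' = setRequestQueryString
               [("jql", Just "project=theproject order by created")] req
  print req' -- queryString comes out roughly as "?jql=project%3Dtheproject%20order%20by%20created"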

How to get the Tor ExitNode IP with Python and Stem

I'm trying to get the external IP that Tor uses, as mentioned here. When using something like myip.dnsomatic.com, this is very slow. I tried what was suggested in the aforementioned link (Python + Stem to control Tor through the control port), but all you get is the circuits' IPs with no assurance of which one is the exit node's, and sometimes the real IP is not even among the results.
Any help would be appreciated.
Also, from here, at the bottom, Amine suggests a way to renew the identity in Tor. There is an instruction, controller.get_newnym_wait(), which he uses to wait until the new connection is ready (controller is from Control in stem.control). Isn't there anything like that in Stem (sorry, I checked and double/triple-checked and couldn't find anything) that tells you that Tor is changing its identity?
You can get the exit node IP without calling a geoip site.
This is, however, answered on a different Stack Exchange site: https://tor.stackexchange.com/questions/3253/how-do-i-trap-circuit-id-none-errors-in-the-stem-script-exit-used-py
As posted by @mirimir, his code below essentially attaches a stream event listener function, which is then used to get the circuit id, the circuit fingerprint, and finally the exit IP address:
#!/usr/bin/python
import functools
import time

from stem import StreamStatus
from stem.control import EventType, Controller

def main():
    print "Tracking requests for tor exits. Press 'enter' to end."
    print

    with Controller.from_port() as controller:
        controller.authenticate()

        stream_listener = functools.partial(stream_event, controller)
        controller.add_event_listener(stream_listener, EventType.STREAM)

        raw_input()  # wait for user to press enter

def stream_event(controller, event):
    if event.status == StreamStatus.SUCCEEDED and event.circ_id:
        circ = controller.get_circuit(event.circ_id)
        exit_fingerprint = circ.path[-1][0]
        exit_relay = controller.get_network_status(exit_fingerprint)
        t = time.localtime()

        print "datetime|%d-%02d-%02d %02d:%02d:%02d" % (t.tm_year, t.tm_mon, t.tm_mday, t.tm_hour, t.tm_min, t.tm_sec)
        print "website|%s" % (event.target)
        print "exitip|%s" % (exit_relay.address)
        print "exitport|%i" % (exit_relay.or_port)
        print "fingerprint|%s" % exit_relay.fingerprint
        print "nickname|%s" % exit_relay.nickname
        print "locale|%s" % controller.get_info("ip-to-country/%s" % exit_relay.address, 'unknown')
        print

if __name__ == '__main__':
    main()
You can use this code to check the current IP (change the SOCKS_PORT value to yours):
import re
import stem.process
import requesocks
SOCKS_PORT = 9053
tor_process = stem.process.launch_tor()
proxy_address = 'socks5://127.0.0.1:{}'.format(SOCKS_PORT)
proxies = {
    'http': proxy_address,
    'https': proxy_address
}
response = requesocks.get("http://httpbin.org/ip", proxies=proxies)
print re.findall(r'[\d.-]+', response.text)[0]
tor_process.kill()
If you want to use socks you should do:
pip install requests[socks]
Then you can do:
import requests
import json
import stem.process
import stem
SOCKS_PORT = "9999"
tor = stem.process.launch_tor_with_config(
    config={
        'SocksPort': SOCKS_PORT,
    },
    tor_cmd='absolute_path/to/tor.exe',
)

r = requests.Session()
proxies = {
    'http': 'socks5://localhost:' + SOCKS_PORT,
    'https': 'socks5://localhost:' + SOCKS_PORT
}
response = r.get("http://httpbin.org/ip", proxies=proxies)
current_ip = response.json()['origin']
print(current_ip)

Create a portal_user_catalog and have it used (Plone)

I'm creating a fork of my Plone site (which has not been forked for a long time). This site has a special catalog object for user profiles (a special Archetypes-based object type) which is called portal_user_catalog:
$ bin/instance debug
>>> portal = app.Plone
>>> print [d for d in portal.objectMap() if d['meta_type'] == 'Plone Catalog Tool']
[{'meta_type': 'Plone Catalog Tool', 'id': 'portal_catalog'},
{'meta_type': 'Plone Catalog Tool', 'id': 'portal_user_catalog'}]
This looks reasonable because the user profiles don't have most of the indexes of the "normal" objects, but have a small set of their own indexes.
Since I found no way to create this object from scratch, I exported it from the old site (as portal_user_catalog.zexp) and imported it into the new site. This seemed to work, but I can't add objects to the imported catalog, not even by explicitly calling the catalog_object method. Instead, the user profiles are added to the standard portal_catalog.
Now I found a module in my product which seems to serve the purpose (Products/myproduct/exportimport/catalog.py):
"""Catalog tool setup handlers.
$Id: catalog.py 77004 2007-06-24 08:57:54Z yuppie $
"""
from Products.GenericSetup.utils import exportObjects
from Products.GenericSetup.utils import importObjects
from Products.CMFCore.utils import getToolByName
from zope.component import queryMultiAdapter
from Products.GenericSetup.interfaces import IBody
def importCatalogTool(context):
"""Import catalog tool.
"""
site = context.getSite()
obj = getToolByName(site, 'portal_user_catalog')
parent_path=''
if obj and not obj():
importer = queryMultiAdapter((obj, context), IBody)
path = '%s%s' % (parent_path, obj.getId().replace(' ', '_'))
__traceback_info__ = path
print [importer]
if importer:
print importer.name
if importer.name:
path = '%s%s' % (parent_path, 'usercatalog')
print path
filename = '%s%s' % (path, importer.suffix)
print filename
body = context.readDataFile(filename)
if body is not None:
importer.filename = filename # for error reporting
importer.body = body
if getattr(obj, 'objectValues', False):
for sub in obj.objectValues():
importObjects(sub, path+'/', context)
def exportCatalogTool(context):
"""Export catalog tool.
"""
site = context.getSite()
obj = getToolByName(site, 'portal_user_catalog', None)
if tool is None:
logger = context.getLogger('catalog')
logger.info('Nothing to export.')
return
parent_path=''
exporter = queryMultiAdapter((obj, context), IBody)
path = '%s%s' % (parent_path, obj.getId().replace(' ', '_'))
if exporter:
if exporter.name:
path = '%s%s' % (parent_path, 'usercatalog')
filename = '%s%s' % (path, exporter.suffix)
body = exporter.body
if body is not None:
context.writeDataFile(filename, body, exporter.mime_type)
if getattr(obj, 'objectValues', False):
for sub in obj.objectValues():
exportObjects(sub, path+'/', context)
I tried to use it, but I have no idea how it is supposed to be done;
I can't call it TTW (should I try to publish the methods?!).
I tried it in a debug session:
$ bin/instance debug
>>> portal = app.Plone
>>> from Products.myproduct.exportimport.catalog import exportCatalogTool
>>> exportCatalogTool(portal)
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File ".../Products/myproduct/exportimport/catalog.py", line 58, in exportCatalogTool
    site = context.getSite()
AttributeError: getSite
So, if this is the way to go, it looks like I need a "real" context.
Update: To get this context, I tried an External Method:
# -*- coding: utf-8 -*-
from Products.myproduct.exportimport.catalog import exportCatalogTool
from pdb import set_trace
def p(dt, dd):
    print '%-16s%s' % (dt + ':', dd)

def main(self):
    """
    Export the portal_user_catalog
    """
    g = globals()
    print '#' * 79
    for a in ('__package__', '__module__'):
        if a in g:
            p(a, g[a])
    p('self', self)
    set_trace()
    exportCatalogTool(self)
However, when I called it, I got the same <PloneSite at /Plone> object as the argument to the main function, which doesn't have the getSite attribute. Perhaps my site doesn't call such External Methods correctly?
Or would I need to mention this module somehow in my configure.zcml, and if so, how? I searched my directory tree (especially below Products/myproduct/profiles) for exportimport, the module name, and several other strings, but I couldn't find anything; perhaps there was an integration once but it has been broken ...
So how do I make this portal_user_catalog work?
Thank you!
Update: Another debug session suggests the source of the problem to be some transaction matter:
>>> portal = app.Plone
>>> puc = portal.portal_user_catalog
>>> puc._catalog()
[]
>>> profiles_folder = portal.some_folder_with_profiles
>>> for o in profiles_folder.objectValues():
...     puc.catalog_object(o)
...
>>> puc._catalog()
[<Products.ZCatalog.Catalog.mybrains object at 0x69ff8d8>, ...]
This population of the portal_user_catalog doesn't persist; after termination of the debug session and starting fg, the brains are gone.
It looks like the problem was indeed related with transactions.
I had
import transaction
...
class Browser(BrowserView):
    ...
    def processNewUser(self):
        ....
        transaction.commit()
before, but apparently this was not good enough (and/or perhaps not done correctly).
Now I start the transaction explicitly with transaction.begin(), save intermediate results with transaction.savepoint(), abort the transaction explicitly with transaction.abort() in case of errors (try / except), and have exactly one transaction.commit() at the end, in the case of success. Everything seems to work.
Of course, Plone still doesn't take this non-standard catalog into account; when I "clear and rebuild" it, it is empty afterwards. But for my application it works well enough.

Download pdf file from wikipedia

Wikipedia provides a link (on the left side, under Print/export) on every article to download the article as a PDF. I wrote a small Haskell script which first gets the Wikipedia link and outputs the rendering link. When I give the rendering URL as input, I get empty tags, but the same URL in a browser provides a download link.
Could someone please tell me how to solve this problem? Formatted code on ideone.
import Network.HTTP
import Text.HTML.TagSoup
import Data.Maybe
parseHelp :: Tag String -> Maybe String
parseHelp ( TagOpen _ y ) = if any ( \( a , b ) -> b == "Download a PDF version of this wiki page" ) y
                              then Just $ "http://en.wikipedia.org" ++ snd ( y !! 0 )
                              else Nothing

parse :: [ Tag String ] -> Maybe String
parse [] = Nothing
parse ( x : xs )
  | isTagOpen x = case parseHelp x of
                    Just s  -> Just s
                    Nothing -> parse xs
  | otherwise = parse xs

main = do
  x <- getLine
  tags_1 <- fmap parseTags $ getResponseBody =<< simpleHTTP ( getRequest x ) --open url
  let lst = head . sections ( ~== "<div class=portal id=p-coll-print_export>" ) $ tags_1
      url = fromJust . parse $ lst --rendering url
  putStrLn url
  tags_2 <- fmap parseTags $ getResponseBody =<< simpleHTTP ( getRequest url )
  print tags_2
If you try requesting the URL through some external tool like wget, you will see that Wikipedia does not serve up the result page directly. It actually returns a 302 Moved Temporarily redirect.
When entering this URL in a browser, it will be fine, as the browser will follow the redirect automatically. simpleHTTP, however, will not. simpleHTTP is, as the name suggests, rather simple. It does not handle things like cookies, SSL or redirects.
You'll want to use the Network.Browser module instead. It offers much more control over how the requests are done. In particular, the setAllowRedirects function will make it automatically follow redirects.
Here's a quick and dirty function for downloading a URL into a String with support for redirects:
import Network.Browser
grabUrl :: String -> IO String
grabUrl url = fmap (rspBody . snd) . browse $ do
  -- Disable logging output
  setErrHandler $ const (return ())
  setOutHandler $ const (return ())
  setAllowRedirects True
  request $ getRequest url
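A quick usage sketch tying this back to the question (assuming the grabUrl function above and parseTags from the question's code):
import Text.HTML.TagSoup (parseTags)

main :: IO ()
main = do
  url  <- getLine
  body <- grabUrl url -- follows the 302 redirect automatically
  print (parseTags body)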
