Haskell HTTP response result unreadable

import Network.URI
import Network.HTTP
import Network.Browser

get :: URI -> IO String
get uri = do
    let req = Request uri GET [] ""
    resp <- browse $ do
        setAllowRedirects True -- handle HTTP redirects
        request req
    return $ rspBody $ snd resp

main = do
    case parseURI "http://cn.bing.com/search?q=hello" of
        Nothing -> putStrLn "Invalid search"
        Just uri -> do
            body <- get uri
            writeFile "output.txt" body
Here is the diff between the Haskell output and the curl output.

It's probably not a good idea to use String as the intermediate data type here, as it causes character conversions both when reading the HTTP response and when writing to the file. This can corrupt the data if the two conversions are not consistent, as they appear not to be here.
Since you just want to copy the bytes directly, it's better to use a ByteString. I've chosen to use a lazy ByteString here, so that it does not have to be loaded into memory all at once, but can be streamed lazily into the file, just like with String.
import Network.URI
import Network.HTTP
import Network.Browser
import qualified Data.ByteString.Lazy as L

get :: URI -> IO L.ByteString
get uri = do
    let req = Request uri GET [] L.empty
    resp <- browse $ do
        setAllowRedirects True -- handle HTTP redirects
        request req
    return $ rspBody $ snd resp

main = do
    case parseURI "http://cn.bing.com/search?q=hello" of
        Nothing -> putStrLn "Invalid search"
        Just uri -> do
            body <- get uri
            L.writeFile "output.txt" body
Fortunately, the functions in Network.Browser are overloaded, so the change to lazy ByteStrings only involves changing the request body to L.empty, replacing writeFile with L.writeFile, and changing the type signature of the function.


Nimlang: Async program does not compile

I'm trying to write an HTTP server that sends an HTTP request and returns the content to the client.
Here is the code:
import asynchttpserver, asyncdispatch
import httpClient

let client = newHttpClient()
var server = newAsyncHttpServer()

proc cb(req: Request) {.async.} =
  let content = client.getContent("http://google.com")
  await req.respond(Http200, content)

waitFor server.serve(Port(8080), cb)
However, I obtain the following compile error message (Nim v1.0.0):
Error: type mismatch: got <AsyncHttpServer, Port, proc (req: Request): Future[system.void]{.locks: <unknown>.}>
but expected one of:
proc serve(server: AsyncHttpServer; port: Port;
callback: proc (request: Request): Future[void] {.closure, gcsafe.};
address = ""): owned(Future[void])
first type mismatch at position: 3
required type for callback: proc (request: Request): Future[system.void]{.closure, gcsafe.}
but expression 'cb' is of type: proc (req: Request): Future[system.void]{.locks: <unknown>.}
This expression is not GC-safe. Annotate the proc with {.gcsafe.} to get extended error information.
expression: serve(server, Port(8080), cb)
The serve function expects a different expression, but I don't know how to fix it.
Surprisingly, the code compiles perfectly fine when I remove the HTTP request from the server callback cb. Does this mean that the serve function expects different callback expressions depending on the callback body?
OK, the problem is that the HttpClient is a global variable used in the callback function cb. As a result, the callback function is not GC-safe.
So it is enough to instantiate the HttpClient within the callback function:
import asynchttpserver, asyncdispatch
import httpClient

var server = newAsyncHttpServer()

proc cb(req: Request) {.async.} =
  let client = newHttpClient()
  let content = client.getContent("https://google.com")
  await req.respond(Http200, content)

waitFor server.serve(Port(8080), cb)

Elm 0.19: How to obtain request body when receiving BadStatus with elm/http 2.0.0

elm/http 1.0.0 defined Http.Error as
type Error
    = BadUrl String
    | Timeout
    | NetworkError
    | BadStatus (Response String)
    | BadPayload String (Response String)
but 2.0.0 changed it to
type Error
    = BadUrl String
    | Timeout
    | NetworkError
    | BadStatus Int
    | BadBody String
When receiving a BadStatus, I cannot obtain the body of the response, only the status code. In the docs, Evan suggests a solution for this, but I don't understand how to make it work.
If we defined our own expectJson similar to
expectJson : (Result Http.Error a -> msg) -> D.Decoder a -> Expect msg
expectJson toMsg decoder =
    expectStringResponse toMsg <|
        \response ->
            case response of
                Http.BadStatus_ metadata body ->
                    Err (Http.BadStatus metadata.statusCode)

                ...
Then we have access to the metadata and body, but how do I use them? Should I define my own myBadStatus and return that instead?
Http.BadStatus_ metadata body ->
    Err (myBadStatus metadata.statusCode body)
Would this work?
What I need is to convert the following code:
myErrorMessage : Http.Error -> String
myErrorMessage error =
    case error of
        Http.BadStatus response ->
            case Decode.decodeString myErrorDecoder response.body of
                Ok err ->
                    err.message

                Err e ->
                    "Failed to parse JSON response."

        ...
Thank you.
Edit 22/4/2019: I updated this answer for version 2.0+ of http-extras, which has some API changes. Thanks to Berend de Boer for pointing this out!
The answer below gives a solution using a package I wrote (as per request), but you don't have to use the package! I wrote an entire article on how to extract detailed information from an HTTP response, it includes multiple Ellie examples that don't require the package, as well as an example that uses the package.
As Francesco mentioned, I created a package for exactly this purpose, using a similar approach described in the question: https://package.elm-lang.org/packages/jzxhuang/http-extras/latest/.
Specifically, the module to use is Http.Detailed. It defines an Error type that keeps the original body around on error:
type Error body
    = BadUrl String
    | Timeout
    | NetworkError
    | BadStatus Metadata body Int
    | BadBody Metadata body String
Make a request like so:
type Msg
    = MyAPIResponse (Result (Http.Detailed.Error String) ( Http.Metadata, String ))

sendRequest : Cmd Msg
sendRequest =
    Http.get
        { url = "/myapi"
        , expect = Http.Detailed.expectString MyAPIResponse
        }
In your update, handle the result including decoding the body when it is BadStatus:
update msg model =
    case msg of
        MyAPIResponse httpResponse ->
            case httpResponse of
                Ok ( metadata, respBody ) ->
                    -- Do something with the metadata if you need! e.g. access a header
                    ...

                Err error ->
                    case error of
                        Http.Detailed.BadStatus metadata body statusCode ->
                            -- Try to decode the body here...
                            ...

                        ...
Thanks to Francesco for reaching out to me about this; hopefully this answer helps anyone who faces the same problem as the OP.

simpleHttp causing 'unsupported browser response?'

I'm executing a simpleHttp request to an https domain, yet the response HTML shows an 'unsupported browser' message. I believe this is because simpleHttp does not support HTTPS.
My function:
import Network.HTTP.Simple

makeRequest :: IO LAZ.ByteString
makeRequest = do
    response <- simpleHttp "https://www.example.com"
    return (response)
Which Haskell libraries support HTTPS?
Wreq provides a very easy-to-follow tutorial on HTTP/S requests using basic lens syntax.
An HTTPS-compatible request is as simple as:
main = do
    r <- get "https://www.example.com"
Response statuses and bodies can be accessed respectively:
r ^. responseStatus . statusCode
r ^. responseBody
This code doesn't compile. Even after adding the LAZ import, the Network.HTTP.Simple module does not provide a simpleHttp function. You can do this with httpLBS:
{-# LANGUAGE OverloadedStrings #-}

import Network.HTTP.Simple
import qualified Data.ByteString.Lazy as LAZ

makeRequest :: IO LAZ.ByteString
makeRequest = do
    response <- httpLBS "https://www.example.com"
    return (getResponseBody response)

main :: IO ()
main = makeRequest >>= LAZ.putStr
Or by using the simpleHttp function from Network.HTTP.Conduit:
{-# LANGUAGE OverloadedStrings #-}
import Network.HTTP.Conduit
import qualified Data.ByteString.Lazy as LAZ
makeRequest :: IO LAZ.ByteString
makeRequest = simpleHttp "https://www.example.com"
main :: IO ()
main = makeRequest >>= LAZ.putStr
Note that wreq uses the same HTTP engine under the surface as http-conduit (http-client). My guess is that you were originally trying to use one of the functions from http-client itself, but I'm not sure what that code would have looked like.
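Since the OP's original code is unknown, here is only a hedged sketch of what a direct http-client version might have looked like (it assumes the http-client and http-client-tls packages; names like makeRequest mirror the snippets above):

```haskell
import Network.HTTP.Client
import Network.HTTP.Client.TLS (tlsManagerSettings)
import qualified Data.ByteString.Lazy as LAZ

makeRequest :: IO LAZ.ByteString
makeRequest = do
    -- A TLS-capable manager is what makes https:// URLs work
    manager <- newManager tlsManagerSettings
    req <- parseRequest "https://www.example.com"
    responseBody <$> httpLbs req manager

main :: IO ()
main = makeRequest >>= LAZ.putStr
```

The key point is the same in all three variants: HTTPS support comes from the TLS-enabled connection manager, not from the request function itself.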

Haskell Https Get Proxy Request

I have a working getRequest via a Proxy :
main = do
    rsp <- browse $ do
        setProxy . fromJust $ parseProxy "128.199.232.117:3128"
        request $ getRequest "https://www.youtube.com/watch?v=yj_wyw6Xrq4"
    print $ rspBody <$> rsp
But it's HTTPS, so basically I get an exception. But I found out here that it can also work with HTTPS:
import Network.Connection (TLSSettings (..))
import Network.HTTP.Conduit
main :: IO ()
main = do
    request <- parseUrl "https://github.com/"
    let settings = mkManagerSettings (TLSSettingsSimple True False False) Nothing
    manager <- newManager settings
    res <- httpLbs request manager
    print res
But I have no idea how to integrate this into my Proxy getRequest Code?
Could someone show me please? Thanks
Looks like you are using the HTTP package in the first snippet and http-conduit in the second one.
Unfortunately, HTTP doesn't support HTTPS, so you can't "integrate" the second snippet into the first one. But http-conduit supports proxies, so you can use the addProxy function to set the proxy host and port (not tested):
{-# LANGUAGE OverloadedStrings #-}
...
request <- do
    req <- parseUrl "https://github.com/"
    return $ addProxy "128.199.232.117" 3128 req
...
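Putting the pieces together, a complete version might look like the following (a sketch only, not tested; it assumes addProxy is in scope from http-conduit, and reuses the proxy address and URL from the question):

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Network.Connection (TLSSettings (..))
import Network.HTTP.Conduit

main :: IO ()
main = do
    req <- parseUrl "https://www.youtube.com/watch?v=yj_wyw6Xrq4"
    -- Route the request through the proxy from the original snippet
    let req' = addProxy "128.199.232.117" 3128 req
        -- TLSSettingsSimple True disables certificate validation, as in
        -- the snippet above; avoid that outside of quick experiments
        settings = mkManagerSettings (TLSSettingsSimple True False False) Nothing
    manager <- newManager settings
    res <- httpLbs req' manager
    print (responseBody res)
```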

Increasing request timeout for Network.HTTP.Conduit

I use the http-conduit library version 2.0+ to fetch the contents from an HTTP web service:
import Network.HTTP.Conduit
main = do
    content <- simpleHttp "http://stackoverflow.com"
    print $ content
As stated in the docs, the default timeout is 5 seconds.
Note: This question was answered by me immediately and therefore intentionally does not show further research effort.
Similar to this previous question, you can't do that with simpleHttp alone. You need to use a Manager together with httpLbs in order to be able to set the timeout.
Note that you don't need to set the timeout in the manager but you can set it for each request individually.
Here is a full example that behaves like your function above, but allows you to modify the timeout:
import Network.HTTP.Conduit
import Control.Monad (liftM)
import qualified Data.ByteString.Lazy.Char8 as LB

-- | A simpleHttp alternative that allows specifying the timeout.
-- Note that the timeout parameter is in microseconds!
downloadHttpTimeout :: Manager -> String -> Int -> IO LB.ByteString
downloadHttpTimeout manager url timeout = do
    req <- parseUrl url
    let req' = req { responseTimeout = Just timeout }
    liftM responseBody $ httpLbs req' manager

main = do
    manager <- newManager conduitManagerSettings
    let timeout = 15000000 -- microseconds --> 15 secs
    content <- downloadHttpTimeout manager "http://stackoverflow.com" timeout
    print $ content
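As an aside, in newer versions of http-client (0.5 and later) the responseTimeout field is a dedicated ResponseTimeout type rather than Maybe Int, and parseUrl has been deprecated in favour of parseRequest. A sketch of the same helper against that newer API (an assumption about the installed version, not tested):

```haskell
import Network.HTTP.Conduit
import Network.HTTP.Client (responseTimeoutMicro)
import qualified Data.ByteString.Lazy.Char8 as LB

-- Same helper as above, adjusted for http-client >= 0.5;
-- the timeout is still given in microseconds
downloadHttpTimeout' :: Manager -> String -> Int -> IO LB.ByteString
downloadHttpTimeout' manager url micros = do
    req <- parseRequest url
    let req' = req { responseTimeout = responseTimeoutMicro micros }
    responseBody <$> httpLbs req' manager
```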
I've found the following to be a version of Uli's downloadHttpTimeout that resembles simpleHTTP more closely (note the record update must bind a fresh name, req'; writing let req = req { ... } would define req recursively and loop forever):
simpleHTTPWithTimeout :: Int -> Request a -> IO (Response LB.ByteString)
simpleHTTPWithTimeout timeout req = do
    mgr <- newManager tlsManagerSettings
    let req' = req { responseTimeout = Just timeout }
    httpLbs req' mgr
The only difference from simpleHTTP is a slightly different return type, so to extract e.g. the response body, one uses http-conduit's responseBody, not Network.HTTP.getResponseBody.
