I'm trying to run a scraper whose output log ends as follows:
2017-04-25 20:22:22 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <429 http://www.apkmirror.com/apk/instagram/instagram-instagram/instagram-instagram-9-0-0-34920-release/instagram-9-0-0-4-android-apk-download/>: HTTP status code is not handled or not allowed
2017-04-25 20:22:22 [scrapy.core.engine] INFO: Closing spider (finished)
2017-04-25 20:22:22 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 16048410,
'downloader/request_count': 32902,
'downloader/request_method_count/GET': 32902,
'downloader/response_bytes': 117633316,
'downloader/response_count': 32902,
'downloader/response_status_count/200': 121,
'downloader/response_status_count/429': 32781,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2017, 4, 25, 18, 22, 22, 710446),
'log_count/DEBUG': 32903,
'log_count/INFO': 32815,
'request_depth_max': 2,
'response_received_count': 32902,
'scheduler/dequeued': 32902,
'scheduler/dequeued/memory': 32902,
'scheduler/enqueued': 32902,
'scheduler/enqueued/memory': 32902,
'start_time': datetime.datetime(2017, 4, 25, 17, 54, 36, 621481)}
2017-04-25 20:22:22 [scrapy.core.engine] INFO: Spider closed (finished)
In short, of the 32,902 requests only 121 are successful (response code 200), while the remainder receive 429 for 'too many requests' (cf. https://httpstatuses.com/429).
Are there any recommended ways to get around this? To start with, I'd like to look at the details of the 429 response rather than just ignoring it, as it may contain a Retry-After header indicating how long to wait before making a new request.
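For example, I assume something like the following per-request flag would let those responses through to a callback where I could read the header (handle_httpstatus_list is Scrapy's standard mechanism; the callback name is just illustrative):

import scrapy

class ApkSpider(scrapy.Spider):
    name = 'apkmirror'

    def start_requests(self):
        # handle_httpstatus_list lets the listed status codes reach the
        # callback instead of being dropped by HttpErrorMiddleware.
        yield scrapy.Request(
            'http://www.apkmirror.com/',
            meta={'handle_httpstatus_list': [429]},
            callback=self.parse_maybe_throttled,
        )

    def parse_maybe_throttled(self, response):
        if response.status == 429:
            # Retry-After may be absent, in which case this is None.
            self.logger.info('429 received, Retry-After: %s',
                             response.headers.get('Retry-After'))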
Also, if the requests are made using Privoxy and Tor as described in http://blog.michaelyin.info/2014/02/19/scrapy-socket-proxy/, it may be possible to implement retry middleware which makes Tor change its IP address when this occurs. Are there any public examples of such code?
You can modify the retry middleware to pause when it gets a 429 error. Put the code below in middlewares.py:
from scrapy.downloadermiddlewares.retry import RetryMiddleware
from scrapy.utils.response import response_status_message
import time

class TooManyRequestsRetryMiddleware(RetryMiddleware):

    def __init__(self, crawler):
        super(TooManyRequestsRetryMiddleware, self).__init__(crawler.settings)
        self.crawler = crawler

    @classmethod
    def from_crawler(cls, crawler):
        return cls(crawler)

    def process_response(self, request, response, spider):
        if request.meta.get('dont_retry', False):
            return response
        elif response.status == 429:
            self.crawler.engine.pause()
            time.sleep(60)  # If the rate limit is renewed in a minute, put 60 seconds, and so on.
            self.crawler.engine.unpause()
            reason = response_status_message(response.status)
            return self._retry(request, reason, spider) or response
        elif response.status in self.retry_http_codes:
            reason = response_status_message(response.status)
            return self._retry(request, reason, spider) or response
        return response
Add 429 to the retry codes in settings.py:
RETRY_HTTP_CODES = [429]
Then activate the middleware in settings.py. Don't forget to deactivate the default retry middleware:
DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.retry.RetryMiddleware': None,
    'flat.middlewares.TooManyRequestsRetryMiddleware': 543,
}
Wow, your scraper is going really fast: over 30,000 requests in 30 minutes. That's more than 10 requests per second.
Such a high volume will trigger rate limiting on bigger sites and will completely bring down smaller sites. Don't do that.
This might even be too fast for Privoxy and Tor, so these might also be the source of some of those 429 replies.
Solutions:
Start slow. Reduce the concurrency settings and increase DOWNLOAD_DELAY so you make at most 1 request per second. Then increase these values step by step and see what happens. It might sound paradoxical, but you might get more items and more 200 responses by going slower (see the sketch after this list).
If you are scraping a big site, try rotating proxies. The Tor network might be a bit heavy-handed for this in my experience, so you might try a proxy service like Umair is suggesting.
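For the first point, a minimal sketch of conservative settings.py values (the exact numbers are just starting guesses, to be loosened step by step):

# settings.py -- start conservatively, then increase step by step.
CONCURRENT_REQUESTS = 1             # one request in flight at a time
CONCURRENT_REQUESTS_PER_DOMAIN = 1
DOWNLOAD_DELAY = 1                  # at most ~1 request per second
AUTOTHROTTLE_ENABLED = True         # let Scrapy back off when replies slow down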
Building upon Aminah Nuraini's answer, you can use Twisted's Deferreds to avoid breaking asynchrony by calling time.sleep().
from twisted.internet import reactor, defer
from scrapy.downloadermiddlewares.retry import RetryMiddleware
from scrapy.utils.response import response_status_message

async def async_sleep(delay, return_value=None):
    deferred = defer.Deferred()
    reactor.callLater(delay, deferred.callback, return_value)
    return await deferred

class TooManyRequestsRetryMiddleware(RetryMiddleware):
    """
    Modifies RetryMiddleware to delay retries on status 429.
    """

    DEFAULT_DELAY = 60  # Delay in seconds.
    MAX_DELAY = 600  # Sometimes, RETRY-AFTER has absurd values.

    async def process_response(self, request, response, spider):
        """
        Like RetryMiddleware.process_response, but, if response status is 429,
        retry the request only after waiting at most self.MAX_DELAY seconds.
        Respect the Retry-After header if it's less than self.MAX_DELAY.
        If Retry-After is absent/invalid, wait only self.DEFAULT_DELAY seconds.
        """
        if request.meta.get('dont_retry', False):
            return response
        if response.status in self.retry_http_codes:
            if response.status == 429:
                retry_after = response.headers.get('retry-after')
                try:
                    retry_after = int(retry_after)
                except (ValueError, TypeError):
                    delay = self.DEFAULT_DELAY
                else:
                    delay = min(self.MAX_DELAY, retry_after)
                spider.logger.info(f'Retrying {request} in {delay} seconds.')
                spider.crawler.engine.pause()
                await async_sleep(delay)
                spider.crawler.engine.unpause()
            reason = response_status_message(response.status)
            return self._retry(request, reason, spider) or response
        return response
The line await async_sleep(delay) blocks process_response's execution until delay seconds have passed, but Scrapy is free to do other things in the meantime. This async/await coroutine syntax was introduced in Python 3.5, and support for it was added in Scrapy 2.0.
It's still necessary to modify settings.py as in the original answer.
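For completeness, that means something like the following, assuming the class lives in yourproject/middlewares.py (the module path here is hypothetical; adjust it to your project):

# settings.py
RETRY_HTTP_CODES = [429]

DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.retry.RetryMiddleware': None,
    'yourproject.middlewares.TooManyRequestsRetryMiddleware': 543,  # hypothetical path
}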
You can use HTTPERROR_ALLOWED_CODES = [404, 429]. I was getting HTTP 429 and I just allowed it, and that fixed the problem. You can allow whichever HTTP code you are getting in the terminal. This may solve your problem.
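Note that an allowed error code is not retried; the response is simply passed through to your callback, so you may want to guard for it there. A minimal sketch:

# settings.py
HTTPERROR_ALLOWED_CODES = [404, 429]

# in the spider: allowed error responses now reach the callback,
# so check the status before parsing.
def parse(self, response):
    if response.status == 429:
        self.logger.warning('Rate limited: %s', response.url)
        return
    # ... normal parsing ...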
Here is what I found, a simple trick:
import scrapy
import time  # just add this line

BASE_URL = 'your any url'

class EthSpider(scrapy.Spider):
    name = 'eth'
    start_urls = [
        BASE_URL.format(1)
    ]
    pageNum = 2

    def parse(self, response):
        data = response.json()
        for i in range(len(data['data']['list'])):
            yield data['data']['list'][i]
        next_page = 'next page url'
        time.sleep(0.2)  # and add this line
        if EthSpider.pageNum <= data['data']['page']:
            EthSpider.pageNum += 1
            yield response.follow(next_page, callback=self.parse)
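Note that time.sleep() blocks Scrapy's event loop while it waits; the built-in DOWNLOAD_DELAY setting gives similar pacing without blocking:

# settings.py -- similar pacing without blocking the reactor
DOWNLOAD_DELAY = 0.2  # seconds between consecutive requests to the same domain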
Consider the following (based on sockserv from LYSE):
%%% The supervisor in charge of all the socket acceptors.
-module(tcpsocket_sup).
-behaviour(supervisor).

-export([start_link/0, start_socket/0]).
-export([init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    {ok, Port} = application:get_env(my_app, tcpPort),
    {ok, ListenSocket} = gen_tcp:listen(
        Port,
        [binary, {packet, 0}, {reuseaddr, true}, {active, true}]),
    lager:info(io_lib:format("Listening for TCP on port ~p", [Port])),
    spawn_link(fun empty_listeners/0),
    {ok, {{simple_one_for_one, 60, 3600},
          [{socket,
            {tcpserver, start_link, [ListenSocket]},
            temporary, 1000, worker, [tcpserver]}
          ]}}.

start_socket() ->
    supervisor:start_child(?MODULE, []).

empty_listeners() ->
    [start_socket() || _ <- lists:seq(1, 20)],
    ok.
%%%-------------------------------------------------------------------
%%% @author mylesmcdonnell
%%% @copyright (C) 2015, <COMPANY>
%%% @doc
%%%
%%% @end
%%% Created : 06. Feb 2015 07:49
%%%-------------------------------------------------------------------
-module(tcpserver).
-author("mylesmcdonnell").
-behaviour(gen_server).

-record(state, {
    next,
    socket}).

-export([start_link/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2, code_change/3, terminate/2]).

-define(SOCK(Msg), {tcp, _Port, Msg}).
-define(TIME, 800).
-define(EXP, 50).

start_link(Socket) ->
    gen_server:start_link(?MODULE, Socket, []).

init(Socket) ->
    gen_server:cast(self(), accept),
    {ok, #state{socket=Socket}}.

handle_call(_E, _From, State) ->
    {noreply, State}.

handle_cast(accept, S = #state{socket=ListenSocket}) ->
    {ok, AcceptSocket} = gen_tcp:accept(ListenSocket),
    kvstore_tcpsocket_sup:start_socket(),
    receive
        {tcp, Socket, <<"store", Value/binary>>} ->
            Uid = kvstore:store(Value),
            send(Socket, Uid);
        {tcp, Socket, <<"retrieve", Key/binary>>} ->
            case kvstore:retrieve(binary_to_list(Key)) of
                [{_, Value}|_] ->
                    send(Socket, Value);
                _ ->
                    send(Socket, <<>>)
            end;
        {tcp, Socket, _} ->
            send(Socket, "INVALID_MSG")
    end,
    {noreply, S#state{socket=AcceptSocket, next=name}}.

handle_info(_, S) ->
    {noreply, S}.

code_change(_OldVsn, State, _Extra) ->
    {ok, State}.

terminate(normal, _State) ->
    ok;
terminate(_Reason, _State) ->
    lager:info("terminate reason: ~p~n", [_Reason]).

send(Socket, Bin) ->
    ok = gen_tcp:send(Socket, Bin),
    ok = gen_tcp:close(Socket),
    ok.
I'm unclear on how each tcpserver process is terminated. Is this leaking processes?
I don't see any place that you are terminating the owning process.
I think what you are looking for are four cases:
The client terminates the connection (you receive tcp_closed)
The connection goes wonky (you receive tcp_error)
The server receives a system message to terminate (this could, of course, just be the supervisor killing it, or a kill message)
The client sends a message telling the server it's done, and you want to do some cleanup other than just reacting to tcp_closed.
The most common case is usually the client just closes the connection, and for that you want something like:
handle_info({tcp_closed, _}, State) ->
    {stop, normal, State};
The connection getting weird is always a possibility. I can't think of any time I want to have the owning process or the socket stick around, so:
%% You might want to log something here.
handle_info({tcp_error, _}, State) ->
    {stop, normal, State};
And in any case where the client tells the server it's done and you need to do cleanup based on the client having done something successful (maybe you have resources open that should be written to first, or a pending DB transaction, or whatever), you would want to expect a success message from the client, close the connection the way your send/2 does, and return {stop, normal, State} to halt the process.
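As a sketch (the <<"done">> message and the reply are hypothetical; adapt them to your protocol):

%% The client signals success; do any cleanup here, reply and close
%% the socket via send/2, then stop the gen_server normally.
handle_info({tcp, Socket, <<"done">>}, State) ->
    ok = send(Socket, <<"ok">>),
    {stop, normal, State};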
The key here is making sure you identify the cases where you want to end the connection and either have the server process killed or (better) return {stop, Reason, State}.
As written above, if you intend send/2 to be a single response and a clean exit (or really, that every accept cast should result in a single send/2 and then termination), then you want:
handle_cast(accept, S = #state{socket=ListenSocket}) ->
    {ok, AcceptSocket} = gen_tcp:accept(ListenSocket),
    kvstore_tcpsocket_sup:start_socket(),
    receive
        %% stuff that results in a call to send/2 in any case.
    end,
    {stop, normal, S}.
The case LYSE demonstrates is one where the connection is persistent and there is ongoing back-and-forth between a client and server. In the case above you are handling a single request, spawning a new listener to refill the listener pool, and should be exiting because you have no plan for this gen_server to do any further work.
Can anyone help me solve this problem? I have the stack trace, but I can't understand what it actually means.
The error occurs when I try to retrieve all data from a bucket in a Riak database, using the java-riak-client library as the ORM. I can figure out that it's a MapReduce problem, but not much beyond that.
Below is the actual stack trace. I could not figure out what error it is pointing to, and I tried to find the record it's displaying in the error.
Update: Yes, the record is there when I cURL it.
com.basho.riak.client.RiakException: java.io.IOException: <html><head><title>500 Internal Server Error</title></head><body><h1>Internal Server Error</h1>The server encountered an error while processing this request:<br><pre>{error,
{error,
{case_clause,
{error,
{0,
[{module,riak_kv_mrc_map},
{partition,913438523331814323877303020447676887284957839360},
{details,
[{fitting,
{fitting,<0.21083.23>,#Ref<0.0.31.39954>,follow,1}},
{name,0},
{module,riak_kv_mrc_map},
{arg,{{jsfun,<<"Riak.mapValuesJson">>},none}},
{output,
{fitting,<0.21081.23>,#Ref<0.0.31.39954>,sink,
undefined}},
{options,
[{log,sink},
{trace,[error]},
{sink,
{fitting,<0.21081.23>,#Ref<0.0.31.39954>,
sink,undefined}},
{sink_type,{fsm,10,infinity}}]},
{q_limit,64}]},
{type,forward_preflist},
{error,[preflist_exhausted]},
{input,
{ok,{r_object,<<"xxxx-users">>,
<<"xxxx#hotmail.com-userpass">>,
[{r_content,
{dict,7,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],[],[],[],
[],[],[],[]},
{{[],[],
[[<<"Links">>]],
[],[],[],[],[],[],[],
[[<<"content-type">>,97,112,112,108,
105,99,97,116,105,111,110,47,106,
115,111,110],
[<<"X-Riak-VTag">>,54,98,119,73,73,
84,107,120,66,70,107,86,102,67,103,
71,73,116,120,121,85,53]],
[[<<"index">>]],
[],
[[<<"X-Riak-Last-Modified">>|
{1407,514685,380030}]],
[],
[[<<"charset">>,117,116,102,45,56],
[<<"X-Riak-Meta">>]]}}},
<<"{\"identityId\":{\"userId\":\"xxxx#hotmail.com\",\"providerId\":\"userpass\"},\"firstName\":\"xx\",\"lastName\":\"xx\",\"fullName\":\"xx xx\",\"email\":\"xxxx#hotmail.com\",\"authMethod\":{\"method\":\"userPassword\"},\"passwordInfo\":{\"hasher\":\"bcrypt\",\"password\":\"$2a$10$Gm1VVCM09iyI7TQY7r8B7.Baa.YrtHHgREkQpTIH9ThyW4WzuUeJ.\"}}">>}],
[{<<35,9,254,249,83,228,76,146>>,
{1,63574733885}}],
{dict,1,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],[],[],[],[],[],
[],[]},
{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],
[[clean|true]],
[]}}},
undefined},
undefined}},
{modstate,
{state,
913438523331814323877303020447676887284957839360,
{fitting_details,
{fitting,<0.21083.23>,#Ref<0.0.31.39954>,
follow,1},
0,riak_kv_mrc_map,
{{jsfun,<<"Riak.mapValuesJson">>},none},
{fitting,<0.21081.23>,#Ref<0.0.31.39954>,sink,
undefined},
[{log,sink},
{trace,[error]},
{sink,
{fitting,<0.21081.23>,#Ref<0.0.31.39954>,
sink,undefined}},
{sink_type,{fsm,10,infinity}}],
64},
{jsfun,<<"Riak.mapValuesJson">>},
none}},
{stack,[]}]}}},
[{riak_kv_wm_mapred,pipe_mapred_nonchunked,3,
[{file,"src/riak_kv_wm_mapred.erl"},{line,180}]},
{webmachine_resource,resource_call,3,
[{file,"src/webmachine_resource.erl"},{line,183}]},
{webmachine_resource,do,3,
[{file,"src/webmachine_resource.erl"},{line,141}]},
{webmachine_decision_core,resource_call,1,
[{file,"src/webmachine_decision_core.erl"},{line,48}]},
{webmachine_decision_core,decision,1,
[{file,"src/webmachine_decision_core.erl"},{line,481}]},
{webmachine_decision_core,handle_request,2,
[{file,"src/webmachine_decision_core.erl"},{line,33}]},
{webmachine_mochiweb,loop,1,
[{file,"src/webmachine_mochiweb.erl"},{line,97}]},
{mochiweb_http,parse_headers,5,
[{file,"src/mochiweb_http.erl"},{line,180}]}]}}</pre><P><HR><ADDRESS>mochiweb+webmachine web server</ADDRESS></body></html>
at com.basho.riak.client.query.MapReduce.execute(MapReduce.java:81)
at models.UserRecordsModel$.getAllUsers(UserRecordsModel.scala:131)
at controllers.DataRetrieval$$anonfun$getRegisteredUserData$1.apply(DataRetrieval.scala:42)
at controllers.DataRetrieval$$anonfun$getRegisteredUserData$1.apply(DataRetrieval.scala:38)
at play.api.mvc.ActionBuilder$$anonfun$apply$10.apply(Action.scala:221)
at play.api.mvc.ActionBuilder$$anonfun$apply$10.apply(Action.scala:220)
at securesocial.core.SecureSocial$SecuredActionBuilder$$anonfun$2$$anonfun$apply$1.apply(SecureSocial.scala:117)
at securesocial.core.SecureSocial$SecuredActionBuilder$$anonfun$2$$anonfun$apply$1.apply(SecureSocial.scala:113)
at scala.Option.map(Option.scala:145)
at securesocial.core.SecureSocial$SecuredActionBuilder$$anonfun$2.apply(SecureSocial.scala:113)
at securesocial.core.SecureSocial$SecuredActionBuilder$$anonfun$2.apply(SecureSocial.scala:112)
at scala.Option.flatMap(Option.scala:170)
at securesocial.core.SecureSocial$SecuredActionBuilder.invokeSecuredBlock(SecureSocial.scala:112)
at securesocial.core.SecureSocial$SecuredActionBuilder.invokeBlock(SecureSocial.scala:146)
at play.api.mvc.ActionBuilder$$anon$1.apply(Action.scala:309)
at play.api.mvc.Action$$anonfun$apply$1$$anonfun$apply$4$$anonfun$apply$5.apply(Action.scala:109)
at play.api.mvc.Action$$anonfun$apply$1$$anonfun$apply$4$$anonfun$apply$5.apply(Action.scala:109)
at play.utils.Threads$.withContextClassLoader(Threads.scala:18)
at play.api.mvc.Action$$anonfun$apply$1$$anonfun$apply$4.apply(Action.scala:108)
at play.api.mvc.Action$$anonfun$apply$1$$anonfun$apply$4.apply(Action.scala:107)
at scala.Option.map(Option.scala:145)
at play.api.mvc.Action$$anonfun$apply$1.apply(Action.scala:107)
at play.api.mvc.Action$$anonfun$apply$1.apply(Action.scala:100)
at play.api.libs.iteratee.Iteratee$$anonfun$mapM$1.apply(Iteratee.scala:481)
at play.api.libs.iteratee.Iteratee$$anonfun$mapM$1.apply(Iteratee.scala:481)
at play.api.libs.iteratee.Iteratee$$anonfun$flatMapM$1.apply(Iteratee.scala:517)
at play.api.libs.iteratee.Iteratee$$anonfun$flatMapM$1.apply(Iteratee.scala:517)
at play.api.libs.iteratee.Iteratee$$anonfun$flatMap$1$$anonfun$apply$13.apply(Iteratee.scala:493)
at play.api.libs.iteratee.Iteratee$$anonfun$flatMap$1$$anonfun$apply$13.apply(Iteratee.scala:493)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:42)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.io.IOException: <html><head><title>500 Internal Server Error</title></head><body><h1>Internal Server Error</h1>The server encountered an error while processing this request:<br><pre>{error,
{error,
{case_clause,
{error,
{0,
[{module,riak_kv_mrc_map},
{partition,913438523331814323877303020447676887284957839360},
{details,
[{fitting,
{fitting,<0.21083.23>,#Ref<0.0.31.39954>,follow,1}},
{name,0},
{module,riak_kv_mrc_map},
{arg,{{jsfun,<<"Riak.mapValuesJson">>},none}},
{output,
{fitting,<0.21081.23>,#Ref<0.0.31.39954>,sink,
undefined}},
{options,
[{log,sink},
{trace,[error]},
{sink,
{fitting,<0.21081.23>,#Ref<0.0.31.39954>,
sink,undefined}},
{sink_type,{fsm,10,infinity}}]},
{q_limit,64}]},
{type,forward_preflist},
{error,[preflist_exhausted]},
{input,
{ok,{r_object,<<"xxxx-users">>,
<<"xxxx#hotmail.com-userpass">>,
[{r_content,
{dict,7,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],[],[],[],
[],[],[],[]},
{{[],[],
[[<<"Links">>]],
[],[],[],[],[],[],[],
[[<<"content-type">>,97,112,112,108,
105,99,97,116,105,111,110,47,106,
115,111,110],
[<<"X-Riak-VTag">>,54,98,119,73,73,
84,107,120,66,70,107,86,102,67,103,
71,73,116,120,121,85,53]],
[[<<"index">>]],
[],
[[<<"X-Riak-Last-Modified">>|
{1407,514685,380030}]],
[],
[[<<"charset">>,117,116,102,45,56],
[<<"X-Riak-Meta">>]]}}},
<<"{\"identityId\":{\"userId\":\"xxxx#hotmail.com\",\"providerId\":\"userpass\"},\"firstName\":\"xx\",\"lastName\":\"xx\",\"fullName\":\"xx xx\",\"email\":\"xxxx#hotmail.com\",\"authMethod\":{\"method\":\"userPassword\"},\"passwordInfo\":{\"hasher\":\"bcrypt\",\"password\":\"$2a$10$Gm1VVCM09iyI7TQY7r8B7.Baa.YrtHHgREkQpTIH9ThyW4WzuUeJ.\"}}">>}],
[{<<35,9,254,249,83,228,76,146>>,
{1,63574733885}}],
{dict,1,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],[],[],[],[],[],
[],[]},
{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],
[[clean|true]],
[]}}},
undefined},
undefined}},
{modstate,
{state,
913438523331814323877303020447676887284957839360,
{fitting_details,
{fitting,<0.21083.23>,#Ref<0.0.31.39954>,
follow,1},
0,riak_kv_mrc_map,
{{jsfun,<<"Riak.mapValuesJson">>},none},
{fitting,<0.21081.23>,#Ref<0.0.31.39954>,sink,
undefined},
[{log,sink},
{trace,[error]},
{sink,
{fitting,<0.21081.23>,#Ref<0.0.31.39954>,
sink,undefined}},
{sink_type,{fsm,10,infinity}}],
64},
{jsfun,<<"Riak.mapValuesJson">>},
none}},
{stack,[]}]}}},
[{riak_kv_wm_mapred,pipe_mapred_nonchunked,3,
[{file,"src/riak_kv_wm_mapred.erl"},{line,180}]},
{webmachine_resource,resource_call,3,
[{file,"src/webmachine_resource.erl"},{line,183}]},
{webmachine_resource,do,3,
[{file,"src/webmachine_resource.erl"},{line,141}]},
{webmachine_decision_core,resource_call,1,
[{file,"src/webmachine_decision_core.erl"},{line,48}]},
{webmachine_decision_core,decision,1,
[{file,"src/webmachine_decision_core.erl"},{line,481}]},
{webmachine_decision_core,handle_request,2,
[{file,"src/webmachine_decision_core.erl"},{line,33}]},
{webmachine_mochiweb,loop,1,
[{file,"src/webmachine_mochiweb.erl"},{line,97}]},
{mochiweb_http,parse_headers,5,
[{file,"src/mochiweb_http.erl"},{line,180}]}]}}</pre><P><HR><ADDRESS>mochiweb+webmachine web server</ADDRESS></body></html>
at com.basho.riak.client.raw.http.ConversionUtil.convert(ConversionUtil.java:589)
at com.basho.riak.client.raw.http.HTTPClientAdapter.mapReduce(HTTPClientAdapter.java:386)
at com.basho.riak.client.query.MapReduce.execute(MapReduce.java:79)
... 36 more
The stack trace is telling us that there was a case_clause error at line 180 of the file riak_kv_wm_mapred.erl.
The clause at that line is handling the responses for the pipe processing the map phase, which appears to be returning the error preflist_exhausted, which is not explicitly handled by the case statement.
That error usually indicates that one or more vnodes were overloaded or otherwise unavailable, and fallbacks had not yet started to take over their workload.
The affected partition was 913438523331814323877303020447676887284957839360; console.log and error.log may have further details about what happened.