Mongolite Error: Failed to read 4 bytes: socket error or timeout

I was trying to query a MongoDB database for all of the IDs it contains so I could compare the list to a separate data frame. However, when I attempt to find all sample_id fields I'm presented with:
Error: Failed to read 4 bytes: socket error or timeout
An example of the find query:
library(mongolite)
mongo <- mongo(collection, url = paste0("mongodb://", user, ":", pass, "@", mongo_host, ":", port, "/", db))
mongo$find(fields = '{"sample_id":1,"_id":0}')
# Error: Failed to read 4 bytes: socket error or timeout
As the error indicates, this is probably an internal socket timeout caused by the large amount of data being returned. However, the MongoDB documentation says the default is never to time out.
socketTimeoutMS:
The time in milliseconds to attempt a send or receive on a socket before the attempt times out. The default is never to timeout, though different drivers might vary. See the driver documentation.
So my question is: why does this error occur when using mongolite? I think I've solved it, but I'd welcome any additional information or input.

The simple answer is that, as the quote from the MongoDB documentation above indicates, "different drivers might vary". In this case the default for mongolite is 5 minutes, as noted in this GitHub issue; I'm guessing it comes from the underlying C driver.
The default socket timeout for connections is 5 minutes. This means
that if your MongoDB server dies or becomes unavailable it will take 5
minutes to detect this. You can change this by providing
sockettimeoutms= in your connection URI.
The GitHub issue also notes the solution, which is to increase sockettimeoutms in the connection URI. Append ?sockettimeoutms=1200000 as an option at the end of the URI to increase the socket timeout (to 20 minutes in this case). Modifying the original example code:
library(mongolite)
mongo <- mongo(collection, url = paste0("mongodb://", user, ":", pass, "@", mongo_host, ":", port, "/", db, "?sockettimeoutms=1200000"))
mongo$find(fields = '{"sample_id":1,"_id":0}')

For Laravel: add 'sockettimeoutms' => '1200000' to the MongoDB connection options in your database.php and enjoy the ride. A sketch of where that option lives follows below.
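A minimal sketch of that database.php change, assuming the jenssegers/laravel-mongodb driver and placeholder credentials (your connection layout may differ):

// config/database.php
'mongodb' => [
    'driver'   => 'mongodb',
    'host'     => env('DB_HOST', '127.0.0.1'),
    'port'     => env('DB_PORT', 27017),
    'database' => env('DB_DATABASE'),
    'username' => env('DB_USERNAME'),
    'password' => env('DB_PASSWORD'),
    'options'  => [
        'sockettimeoutms' => 1200000, // 20 minutes, matching the URI option above
    ],
],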

Related

Google Cloud Bigtable: repeated grpc error code 13, then suddenly success

In short, we are sometimes seeing that a small number of Cloud Bigtable queries fail repeatedly (for 10s or even 100s of times in a row) with the error rpc error: code = 13 desc = "server closed the stream without sending trailers" until (usually) the query finally works.
In detail, our setup is as follows:
We are running a collection (< 10) of Go services on Google Compute Engine. Each service leases tasks from a pair of PULL task queues. Each task contains an ID of a bigtable row. The task handler executes the following query:
row, err := tbl.ReadRow(ctx, <my-row-id>,
    bigtable.RowFilter(bigtable.ChainFilters(
        bigtable.FamilyFilter(<my-column-family>),
        bigtable.LatestNFilter(1))))
If the query fails then the task handler simply returns. Since we lease tasks with a lease time between 10 and 15 minutes, a little while later the lease will expire on that task, it will be leased again, and we'll retry. The tasks have a max retry of 1000, so they can be retried many times over a long period. In a small number of cases, a particular task will fail with the gRPC error above. The task will typically fail with this same error every time it runs for hours or days on end, before (seemingly out of the blue) eventually succeeding (or the task runs out of retries and dies).
Since this often takes so long, it seems unrelated to server load. For example, right now on a Sunday morning these servers are very lightly loaded, yet I see plenty of these errors when I tail the logs. From this answer, I had originally thought that this might be due to trying to query for a large amount of data, perhaps near the max limit that Cloud Bigtable will support. However, I now see that this is not the case; I can find many examples where tasks that have failed many times finally succeed and report that only a small amount of data (e.g. <1 MB) was retrieved.
What else should I be looking at here?
edit: From further testing I now know that this is completely machine (client) independent. If I tail the log on one of the task leasing machines, wait for a "server closed the stream without sending trailers" error, and then try a one-off ReadRow query to the same rowId from another, unrelated, totally unused machine, I get the same error repeatedly.
This error is typically caused by having more than 256MB of data in your reply.
However, there is currently a bug in our server side error handling code that allows some invalid characters in HTTP/2 trailers which is not allowed by the spec. This means that some error messages that have invalid characters will be seen as this kind of error. This should be fixed early next year.
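One hedged way to test whether the reply-size cap is what you're hitting (my suggestion, not part of the original answer; readSizesOnly, rowID, and family are placeholder names): re-issue the failing read with cell values stripped using the cloud.google.com/go/bigtable package's StripValueFilter. If that succeeds where the full read fails, reply size is the likely culprit.

import (
    "context"

    "cloud.google.com/go/bigtable"
)

// readSizesOnly repeats the failing read but replaces each cell value with an
// empty string, so the reply carries cell metadata only. Success here, combined
// with failure on the full read, points at reply size rather than the row or filters.
func readSizesOnly(ctx context.Context, tbl *bigtable.Table, rowID, family string) (bigtable.Row, error) {
    return tbl.ReadRow(ctx, rowID,
        bigtable.RowFilter(bigtable.ChainFilters(
            bigtable.FamilyFilter(family),
            bigtable.LatestNFilter(1),
            bigtable.StripValueFilter())))
}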

ODP.NET or Oracle causing 03113 end of file on communication channel

I've got a console program that does the following using the Enterprise Library and Oracle ODP.NET:
Open connection
Search for some data
Close connection
Wait a small period of time
Repeat
After somewhere between 30 minutes and 3 hours (typically about 2 hours) I get an Oracle ORA-03113 end-of-file on communication channel error.
I've traced it: the program gets a connection (or at least thinks it has one), then fails on the execution of a statement, i.e. the first time it tries to use the connection.
Typically (I'm in development), step 2 returns no data and always uses the same SQL SELECT.
I've fixed it by adding 'Pooling=false' to the DSN:
<add name="DEV64" connectionString="Data Source=dev3;User Id=XXX;Password=YYY;Pooling=false" providerName="System.Data.OracleClient"/>
Obviously this isn't ideal, but I've no idea why it's doing this.
I have no firewall between me and the database, and I tried the fix from "ORA-03113: end-of-file on communication channel after long inactivity in ASP.Net app", which had one solution of:
Validate Connection = True
But that just crashes because the driver doesn't recognize the setting.
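A hedged note on why that crashes (my reading, not from the original post): Validate Connection is an ODP.NET connection-string attribute, so the System.Data.OracleClient provider named in the DSN above won't recognize it. It should only be used with the ODP.NET provider, along the lines of:
<add name="DEV64" connectionString="Data Source=dev3;User Id=XXX;Password=YYY;Validate Connection=true" providerName="Oracle.DataAccess.Client"/>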
The code to do this, btw, is something like:
Get my context:
dac = new DataAccessContext(EnterpriseLibraryContainer.Current.GetInstance<Database>(AppSettingsHandler.ConnectionStringName));
open the connection:
Context.DBConnection = new DataAccessConnection(Context.Database.CreateConnection());
Context.DBConnection.Connection.Open();
return Context.DBConnection.Connection;
read the data, typically where it falls over, presumably a stale connection of sorts:
DataSet ds = context.Database.ExecuteDataSet(CommandType.Text, query);

Error when running TcmReindex.exe

I am currently trying to get search working in my Tridion 2011 installation. I read in another article that I should run the TcmReIndex.exe tool in the Tridion/bin folder to re-index all my sites. So I tried this, and it failed with a message box giving the following details:
Unable to get list of Publication items.
Unable to Intialize TDSE object.
The wait operation timed out
Connection Timeout Expired. The timeout period elapsed while attempting to consume the pre-login handshake acknowledgement. This could be because the pre-login handshake failed or the server was unable to respond back in time. The duration spent while attempting to connect to this server was - [Pre-Login] initialization=21054; handshake=35;
The wait operation timed out
A database error occurred while executing Stored Procedure "EDA_TRUSTEES_GETTRUSTEEETOKEN"
I have four fairly large publications (100 000+ items in total) which I am trying to index.
Any ideas?
Whenever I get "Unable to Intialize TDSE object." errors, I typically write a small test script using VBScript, and try running it on the CMS server. Whilst this does not directly solve the problem, it often gives some insight into the issue by logging information in the event viewer. Try creating a test.vbs file as follows and running it:
Set tdse = CreateObject("TDS.TDSE")  ' create the Tridion TOM COM object
tdse.initialize()                    ' connect to the Content Manager; failures surface here
msgbox(tdse.User.Description)        ' on success, shows the current user's description
Set tdse = Nothing                   ' release the COM object
If it throws any errors, please let me know, and it may help us solve the problem. If it gives you a popup with your user description, then I am completely barking up the wrong tree.
I haven't come to anything conclusive, but it seems my issue may have been a temporary one, as it just started working. I did increase all timeouts in Tridion MMC > Timeout Settings by a factor of 100, but I suspect that this wasn't the issue; when it works, the connection is almost instant.
If anyone else has this issue:
Restart the computer the content manager is installed on, try again.
Wait an hour or two, try again.
Increase timeouts, try again.
I've run the process a few more times and it seems to be working correctly.

What could cause a message (from a polling receive location) to be ignored by subscribing orchestration?

I'll try to provide as much information as possible:
No error message.
The instance stays in the "ready service instances".
The receive location has the same parameters (except URI, the three polling queries, user account/pw and receive pipeline) as another receive location that points to another database/table which works.
The pipeline is waiting for the correct schema.
The port surface and receive location are both waiting for the correct schema.
In my test example, there are only 10 lines being returned.
The message, which contains those 10 lines, validates against the schema.
I tried leaving the instance alone, to no avail; after 30+ minutes there was no change in its condition.
I had also tried suspending and then resuming it which then places the instance in the "dehydrated orchestrations" list. Again, with no error message.
I'm able to see the message by looking at the body of the message in the "ready to run" service instance. (This is the message that validates against the schema I use in Visual Studio.)
How might something like this arise?
Stupid question, but I have to ask... Is the corresponding host instance running?

Cannot enqueue large Oracle AQ message

I am trying to Enqueue a message onto an Oracle Queue from a .NET client. If the message exceeds a certain size, the following error occurs:
ORA-01013: user requested cancel of current operation
This happens with both XMLTYPE and RAW as the queue table's message type.
It seems that the size of the message is to blame, but I cannot tell for sure because of the limited Oracle error message.
Is there a limit on the size, can I increase the size or is there another way to overcome this issue?
Update:
I am able to send the message directly with dbms_aq.enqueue(...)
Setting the timeout from the .NET client does not have any effect. (It times out immediately regardless of the timeout value)
This sounds like a connection timeout from the .NET client. Try increasing the timeout. If that doesn't work, check whether the issue is with the message payload by inserting the message directly through dbms_aq.enqueue(...), as sketched below. If you are able to insert it, then the message itself is fine.
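A minimal PL/SQL sketch of that direct enqueue test, assuming a RAW queue table and a placeholder queue name my_queue (adjust the payload size to match the failing message):

DECLARE
  enq_opts  dbms_aq.enqueue_options_t;
  msg_props dbms_aq.message_properties_t;
  msg_id    RAW(16);
  payload   RAW(32767) := utl_raw.copies(hextoraw('FF'), 10000); -- ~10k test payload
BEGIN
  dbms_aq.enqueue(queue_name         => 'my_queue',
                  enqueue_options    => enq_opts,
                  message_properties => msg_props,
                  payload            => payload,
                  msgid              => msg_id);
  COMMIT;
END;
/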
There are a couple of size-related issues being fixed for 11.2.0.3. See the non-authoritative list here:
http://www.eygle.com/Notes/11.2.0.3.html
Some examples:
9878459 Specific length object binds over 4k may be bound as NULL
10389881 Raw buffered message payload > 8k corrupted when dequeued from a buffered queue
Maybe your issue is in this list?
