I have an iOS app connected to Firebase Crashlytics. For some crashes the console does not show stack traces (or any other data, for that matter), only an error message saying "There was an error loading your session".
This does not happen for all crash events: for some of them, crash data is properly shown. I have tried to get in touch with Firebase support, but no luck so far.
Is anyone experiencing this issue? Any hints about what might be causing it?
Alternatively, is there a way to download the raw crash data from Crashlytics?
Firebase tech support has confirmed that the problem is that their backend is having trouble storing the crash data:
what appears to be happening is that the stacktraces for these crashes are too large for Crashlytics to persist to storage. Normally, Crashlytics removes extra threads and manages to store the crash, but if the frames within the threads are too large, we would encounter issues storing the crashes.
Also they say that since the problem is that they did not manage to store the crashes, they cannot provide any way to download the raw crash data either.
They further mention that they are considering implementing ways to increase our size capacity, but they don't have a specific timeline for that, and they consider this a "feature request".
Needless to say I strongly disagree on this last bit, given that:
On iOS it is not really possible to install more than one exception handler / crash reporter
So basically you need to choose WHO you will be sending your crash reports to
If the crash reporting engine then fails to persist your data, then the data is essentially lost
It's difficult to understand how "not losing your data" is considered a feature request rather than a severe bug.
I have found a workaround for this.
If you manually search for the issue title in the issues list and open it from the search tool, the stack trace appears.
Related
I'm using Firebase and Firestore. My web app was throwing an error when rendering:
FirebaseError: Missing or insufficient permissions
but the stack trace is very long and, as far as I can tell, consists entirely of library code, making it nearly impossible to determine what code of mine is actually causing this.
After lots of hunting through my code base I determined that a library was trying to fetch data from /profiles/{uid}, and fixed the error, but if I had some better debugging tools this would have been much faster.
Is there a way to get the path that the failed read was attempted on from the error, or any other metadata related to the error? I checked monitoring and logs in the Firebase web console but couldn't find anything. A log of Firestore denials, including the attempted path, would be very helpful for this purpose.
According to this answer by Doug Stevenson, there's no way of logging this, since it may reveal security measures to a potential attacker.
Still, you should find more logs if you use the local emulator to test your queries before deploying.
If a crash happens on a mobile device, how can the development team receive it?
What should be logged so that we can reconstruct what happened? Just actions on objects and page transitions?
If my markup looks wrong on some devices, or the application behaves strangely or gets into a weird state, I want functionality to collect a screenshot and device info and send it. What are the best practices here?
The question is about sending the crash stack trace and logs out; it is not about a QML app per se, but about its C++ base, or just about a C++ app if we have one. The app should have logging enabled and collect its activity info, perhaps for a period of time or until the logs get large enough. We split the log into chunk files and removed the oldest once we had accumulated, say, 5 chunks of 100 kB each (a sketch of such rotation follows).
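For illustration, here is a minimal sketch of that kind of chunked rotation; the chunk size, chunk count, and file names are assumptions for the example, not the exact scheme we used:

    #include <cstdint>
    #include <filesystem>
    #include <fstream>
    #include <string>

    namespace fs = std::filesystem;

    // Hypothetical rotating logger: appends to log.0.txt and, once the current
    // chunk exceeds kChunkBytes, shifts log.N.txt -> log.(N+1).txt, dropping
    // the oldest so that at most kMaxChunks files remain on disk.
    class ChunkedLog {
    public:
        explicit ChunkedLog(fs::path dir) : dir_(std::move(dir)) {}

        void write(const std::string& line) {
            const fs::path current = dir_ / chunkName(0);
            if (fs::exists(current) && fs::file_size(current) > kChunkBytes)
                rotate();
            std::ofstream out(current, std::ios::app);
            out << line << '\n';
        }

    private:
        void rotate() {
            fs::remove(dir_ / chunkName(kMaxChunks - 1));   // drop the oldest chunk
            for (int i = kMaxChunks - 2; i >= 0; --i) {     // shift the rest up by one
                const fs::path from = dir_ / chunkName(i);
                if (fs::exists(from))
                    fs::rename(from, dir_ / chunkName(i + 1));
            }
        }

        static std::string chunkName(int i) { return "log." + std::to_string(i) + ".txt"; }

        static constexpr std::uintmax_t kChunkBytes = 100 * 1024;  // ~100 kB per chunk
        static constexpr int kMaxChunks = 5;                       // keep at most 5 chunks
        fs::path dir_;
    };

After a crash, the surviving chunks give you roughly the last 500 kB of activity to send along with the crash report.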
Crash stack/minidump: both the call stacks of all threads at the time of the crash and a minidump of the process, with variable values visible, can be collected.
How do we send the log and crash stack/minidump out? There are solutions like Breakpad that we are supposed to link with and enable in the app code. The app then takes care of sending all the crash info out when it runs again after the crash.
Quite a few things to implement, not to mention the web service that collects the crash info from client apps.
And you have to keep the "symbols" for the app's release code in order to be able to walk the stack and see variable values at the time of a crash.
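For what it's worth, here is a minimal sketch of wiring up Breakpad on Windows. The dump directory and the callback are placeholders, and the constructor signature differs per platform, so treat this as an outline rather than a drop-in implementation:

    #include <memory>
    #include "client/windows/handler/exception_handler.h"  // Google Breakpad

    // Called after a minidump has been written. Keep it trivial: the process
    // state is unreliable at this point. A real app would only mark the dump
    // for upload on the next startup.
    static bool OnMinidump(const wchar_t* dump_path, const wchar_t* minidump_id,
                           void* /*context*/, EXCEPTION_POINTERS* /*exinfo*/,
                           MDRawAssertionInfo* /*assertion*/, bool succeeded) {
        return succeeded;
    }

    static std::unique_ptr<google_breakpad::ExceptionHandler> g_handler;

    void InstallCrashHandler() {
        // L"C:\\crashes" is a placeholder dump directory.
        g_handler = std::make_unique<google_breakpad::ExceptionHandler>(
            L"C:\\crashes",                 // where .dmp files get written
            nullptr,                        // no filter callback
            OnMinidump,                     // invoked after the dump is written
            nullptr,                        // callback context
            google_breakpad::ExceptionHandler::HANDLER_ALL);
    }

On the next launch the app would scan the dump directory and upload whatever it finds, together with the rotated log chunks, to the collecting web service.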
I will start with a TL;DR version, as this may be enough for some of you:
We are trying to investigate an issue that we see in the diagnostic data of our C++ product.
The issue was pinpointed to a timeout on sqlite3_open_v2, which supposedly takes over 60 s to complete (we only give it 60 s).
We tried multiple different configurations, but were never able to reproduce even a 5 s delay on this call.
So the question is whether there are any known scenarios in which sqlite3_open_v2 can take that long (on Windows)?
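To make the "we only give it 60 s" part concrete, here is roughly how such a bounded wait can be expressed; this is an illustration of the measurement, not our production code:

    #include <chrono>
    #include <future>
    #include <memory>
    #include <string>
    #include <thread>
    #include <sqlite3.h>

    // Runs sqlite3_open_v2 on a detached worker thread and stops waiting
    // after 60 s. Note that this only bounds how long we wait: the open call
    // itself keeps running, which is consistent with the symptom that every
    // later open hangs as well.
    bool OpenWithTimeout(const std::string& path, sqlite3** out) {
        auto result = std::make_shared<std::promise<int>>();
        auto db = std::make_shared<sqlite3*>(nullptr);
        std::future<int> fut = result->get_future();
        std::thread([path, result, db] {
            result->set_value(sqlite3_open_v2(
                path.c_str(), db.get(),
                SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE | SQLITE_OPEN_URI,
                nullptr));
        }).detach();
        if (fut.wait_for(std::chrono::seconds(60)) != std::future_status::ready)
            return false;  // the timeout case we observe for ~0.1% of users
        *out = *db;
        return fut.get() == SQLITE_OK;
    }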
Now to the details:
We are using version 3.10.2 of SQLite. We went through the changelogs from this version to the current one, and nothing we found in the bugfix sections suggests that there was an issue, addressed in a later SQLite release, that could have caused our problem.
The issue we see affects around 0.1% of unique users across all supported versions of Windows (Win 7, Win 8, Win 10). There are no manual user complaints/reports about it; this may suggest that the problem happens in a context where something serious enough is happening to the user's machine/system that they don't expect anything to work. So something that indicates a system-wide failure is a valid possibility, as long as it could plausibly happen to 0.1% of random Windows users.
There is no data indicating that the same issue ever occurred on Mac, which is also a supported platform with a large enough sample of diagnostic data.
We are using Poco (https://github.com/pocoproject/poco, version 1.7.2) as the tool for accessing our SQLite database, but we've analyzed the Poco code, and it seems that a failure at that level could only (possibly) explain ~1% of all collected samples. This is how we determined that the problem lies in sqlite3_open_v2 taking a long time.
This happens both in DELETE journal mode and in WAL.
It seems that after this problem happens the first time for a particular user, every subsequent call to sqlite3_open_v2 takes that long, until the user restarts the whole application (possibly the machine; there is no way to tell from our data).
We are using the following flag setup for sqlite3_open_v2 (as in Poco):
sqlite3_open_v2(..., ..., SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE | SQLITE_OPEN_URI, NULL);
This usually doesn't happen at application startup, so it's not likely to be caused by something happening while our application is not running. That includes power cut-offs causing data destruction (which tends to return SQLITE_CORRUPT anyway, as mentioned in https://www.sqlite.org/howtocorrupt.html).
We were never able to reproduce this issue locally even though we tried different things:
Multiple threads writing to and reading from the DB, with the synchronization required by a particular journaling system.
Keeping an SQLite connection open for a long time while working on the DB normally in the meantime.
Trying to hit the HDD hard with other data (dumping /dev/rand (WSL) to multiple files from different processes while accessing the DB normally).
Trying to force antivirus software to scan DB on every file access (tested with Avast with basically everything enabled including "scan on open" and "scan on write").
Breaking our internal synchronization required by particular journaling systems.
Calling the WinAPI CreateFile with all possible combinations of file-sharing options on the DB file - this caused issues, but sqlite3_open_v2 always returned quickly, just with an error.
Calling the WinAPI LockFile on random parts of the DB file, which is, by the way, a nice way of reproducing SQLITE_IOERR - but no luck reproducing the discussed issue (see the sketch after this list).
Some additional attempts to actually stress the Poco layer and double-check that our static analysis of the code is right.
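For anyone who wants to try the LockFile experiment mentioned above, here is a rough sketch; the DB path and the locked offsets are placeholders:

    #include <windows.h>
    #include <cstdio>

    // Locks a byte range of the database file from a second process so that
    // SQLite's reads/writes in the application under test start failing with
    // SQLITE_IOERR. "C:\\path\\to\\app.db" is a placeholder path.
    int main() {
        HANDLE h = CreateFileA("C:\\path\\to\\app.db",
                               GENERIC_READ | GENERIC_WRITE,
                               FILE_SHARE_READ | FILE_SHARE_WRITE,  // let SQLite open it too
                               nullptr, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
        if (h == INVALID_HANDLE_VALUE) {
            std::printf("CreateFileA failed: %lu\n", GetLastError());
            return 1;
        }
        // Lock an arbitrary 4 KB region; vary the offset to hit pages that
        // the application actually touches.
        if (!LockFile(h, /*offsetLow=*/4096, /*offsetHigh=*/0,
                      /*bytesLow=*/4096, /*bytesHigh=*/0)) {
            std::printf("LockFile failed: %lu\n", GetLastError());
            CloseHandle(h);
            return 1;
        }
        std::printf("Region locked; press Enter to release.\n");
        std::getchar();
        UnlockFile(h, 4096, 0, 4096, 0);
        CloseHandle(h);
        return 0;
    }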
We've tried to look for similar issues online, but the only somewhat relevant thing we found was sqlite3-open-v2-performance-degrades-as-number-of-opens-increase. That doesn't seem to explain our case, though, as the number of parallel connections involved is way beyond what we have, as well as what typical Windows users would have (unless there is some fairly popular app exploiting SQLite that we don't know about).
It's very unlikely that this issue is caused by the DB being accessed through a network share, as we put the DB file inside %appdata% - unless there is some fairly standard Windows configuration that sets %appdata% to a remote share.
Do you have any ideas about what could cause this issue?
Maybe some hints on what else we should check, or what additional diagnostic data we could collect from users to help pinpoint the real reason this happens?
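As an example of what I mean by additional diagnostics, a wrapper along these lines could record the duration and the Win32 error state of every open; the logging call is a stand-in for whatever diagnostic pipeline the product already has:

    #include <chrono>
    #include <cstdio>
    #include <windows.h>
    #include <sqlite3.h>

    // Wraps sqlite3_open_v2 and logs slow or failed opens together with
    // GetLastError(), which may hint at what the VFS layer was blocked on.
    int OpenInstrumented(const char* path, sqlite3** db, int flags) {
        using clock = std::chrono::steady_clock;
        const auto start = clock::now();
        const int rc = sqlite3_open_v2(path, db, flags, nullptr);
        const DWORD lastErr = GetLastError();
        const auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                            clock::now() - start).count();
        if (ms > 1000 || rc != SQLITE_OK) {
            // printf stands in for the real diagnostics sink.
            std::printf("sqlite3_open_v2 rc=%d took %lld ms, GetLastError=%lu\n",
                        rc, static_cast<long long>(ms), lastErr);
        }
        return rc;
    }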
Thanks in advance
I am trying to get the maximum benefit out of experience here.
Also, I usually use the Enterprise Library Logging Application Block. I log errors and a portion of statistical information to the database, because it is a centralized place to track errors; if database logging fails, it normally falls back to the Event Log.
Tracing messages should go to a file.
Which option do you believe we should go with?
1- Some tracing messages can be left in the code if there is a complex algorithm or an unstable module.
OR
2- We should not keep any tracing messages in the code; clean them up as soon as the bug is resolved.
For the database:
I think that errors raised from SPs and functions should be logged to another table in the database; that is exactly what the AdventureWorksLT2008 database does.
Is it a bad idea to log database events directly to the Enterprise Library log table without raising these errors to the next tier? I think it is more flexible, because I can put more custom information in the message. Of course, some errors will not be handled and will reach the next tier.
Any ideas or comments? Something else you do, or something you want to clarify?
Thanks
Are you talking about catching errors and logging directly in T-SQL, and not then doing a RAISERROR to get it to the caller?
I think that's a viable strategy for certain kinds of issues - for instance, if an SP detects a problem, corrects it silently, and simply issues a warning.
But the kind of issues it would apply to might not be terribly frequent.
The kinds of things I would think about are unusual cases: where an unexpected UPDATE is done instead of an INSERT, where data already exists and so is not generated, or a deployment or build script that skips an existing table, etc.
What if your database has performance issues and SPs/functions start timing out? Logging the error to the database may not work then.
Having deployed a new build of an ASP.NET site in a production environment, I am logging dozens of data errors every second, almost always with the error "Cannot find table 0." We use DataSets and frequently refer to Tables[0], and while I understand the defensive coding practice of checking the DataSet for tables before accessing Tables[0], it has never been a problem in the past. A certain page will load fine one second and then be missing one of its data-driven components the next. Just seeing if this rings a bell for anyone.
More detail: I used a different build server this time, and while I imagine the compiler settings are the same on both, I have a hard time believing there's a switch that makes 50% of my database calls come back with no tables. I also switched the project to VS 2008, but then reverted all of those changes when I switched back to VS 2005. I notice that the built assembly now has a MyLibrary.XmlSerializers.dll, where it didn't before, but I also can't imagine that that's causing all the trouble. (It also doesn't fall down on calls to MyLibrary, or at least no more than at any other time.)
Updated to add: I've discovered that the troublesome build is a "Release" build, where the working build was compiled as "Debug". Could that explain it?
Rolling back to the build before these changes fixed it. (Rebooting the SQL Server, the step we tried before that, did not.)
The trouble also seems to be load-based - this cruised through our integration and QA environments without a problem, and even our smoke test environment - the one that points to production data - is fine under light load.
Does this have the distinguishing characteristics of anything you might have seen in the past?
Bumping this old question because we have encountered the same issue and perhaps our solution would give more insight in what causes this.
Essentially, this problem occurs in a production environment under very heavy load, in a Windows service that uses multiple threads to process several jobs simultaneously (100 users use the same DB via an ASP.NET web app, and there are about 60 transactions/second on older hardware with SQL Server 2000).
No variables are shared; that is, connections are opened anew, a transaction is started, operations are executed, the transaction is committed, and the connection is closed.
Under heavy load sometimes one of the following exceptions occurs:
NullReferenceException: Object reference not set to an instance of an object.
   at System.Data.SqlClient.SqlInternalConnectionTds.get_IsLockedForBulkCopy()
or
System.Data.SqlClient.SqlException:
The server failed to resume the transaction. Desc:3400000178
or
New request is not allowed to start because it should come with valid transaction descriptor
or
This SqlTransaction has completed; it is no longer usable
It seems that somehow a connection within the pool becomes corrupted and remains associated with previously used transactions. Furthermore, if such a connection is retrieved from the pool, then sqlAdapter.Fill(dataset) results in an empty DataSet, causing "Cannot find table 0". Because our service would retry the operation (reading the job list) on failure, and it would always get the same corrupt connection from the pool, it would keep failing with this error until restarted.
We removed the issue by calling SqlConnection.ClearPool(connection) on exception, to make sure the connection is discarded from the pool, and by restructuring the application so that fewer threads access the same resources simultaneously.
I have no clue what exactly caused this issue, so I am not sure we have really fixed it; maybe we just made it so rare that it hasn't occurred again yet.
I've fought precisely this error message before. The key is that an underlying data method is swallowing a timeout exception.
You're probably doing something like this:
var table = GetEmployeeDataSet().Tables[0];
GetEmployeeDataSet is swallowing an exception, probably a timeout exception, which is why it only happens sporadically - it happens under load. You need to do the following to fix it:
Modify the underlying code so it does not swallow the exception, but rather lets it bubble up to the next level so you can identify it properly.
Identify the query (or queries) causing the problem, and then rewrite, re-index, denormalize, or throw hardware at the problem. See this for more info: System.Data.SqlClient.SqlException: Timeout expired
I've seen something similar. I believe our problem had to do with failed sessions being reused (once the session object failed, it went into a bad state and could not recover). We fixed it by increasing the memory for the session pool and increasing the frequency of web application recycling.
It also was "caused" by a new version that at first blush did not seem to have any changes that could produce such an effect. However, it eventually became clear that the logic of the program was opening and closing a lot more connections (maybe 20% more) than it used to. This small change pushed us past the limit of our prior configuration.
You might check the SQL Server logs for errors, or the web server event log. It sounds like your connection pool could be out of open connections, or your DB could be.
Which database calls changed between versions?
The error is obviously telling you that one of your database calls occasionally isn't returning any data; I can't think of any case where a code/assembly issue would cause it.
I have seen something like this when using NHibernate sessions in a non-thread-safe manner. That would explain why you only see it under load. I would need to see your code to guess at what isn't thread-safe, though.