How to analyze a MariaDB crash log?

When I added a partition to an existing table, the server crashed with the stack trace below:
Thread pointer: 0x1ce1ecc6ab8
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
mysqld.exe!ha_partition::read_par_file()[ha_partition.cc:3021]
mysqld.exe!ha_partition::get_from_handler_file()[ha_partition.cc:3161]
mysqld.exe!ha_partition::initialize_partition()[ha_partition.cc:512]
mysqld.exe!partition_create_handler()[ha_partition.cc:185]
mysqld.exe!get_new_handler()[handler.cc:302]
mysqld.exe!TABLE_SHARE::init_from_binary_frm_image()[table.cc:2025]
mysqld.exe!open_table_def()[table.cc:698]
mysqld.exe!tdc_acquire_share()[table_cache.cc:842]
mysqld.exe!open_table()[sql_base.cc:1905]
mysqld.exe!open_and_process_table()[sql_base.cc:3802]
mysqld.exe!open_tables()[sql_base.cc:4300]
mysqld.exe!mysql_alter_table()[sql_table.cc:9353]
mysqld.exe!Sql_cmd_alter_table::execute()[sql_alter.cc:510]
mysqld.exe!mysql_execute_command()[sql_parse.cc:6087]
mysqld.exe!Prepared_statement::execute()[sql_prepare.cc:4760]
mysqld.exe!Prepared_statement::execute_loop()[sql_prepare.cc:4246]
mysqld.exe!mysql_sql_stmt_execute()[sql_prepare.cc:3364]
mysqld.exe!mysql_execute_command()[sql_parse.cc:3901]
mysqld.exe!sp_instr_stmt::exec_core()[sp_head.cc:3652]
mysqld.exe!sp_lex_keeper::reset_lex_and_exec_core()[sp_head.cc:3335]
mysqld.exe!sp_instr_stmt::execute()[sp_head.cc:3513]
mysqld.exe!sp_head::execute()[sp_head.cc:1346]
mysqld.exe!sp_head::execute_procedure()[sp_head.cc:2288]
mysqld.exe!do_execute_sp()[sql_parse.cc:3005]
mysqld.exe!Sql_cmd_call::execute()[sql_parse.cc:3247]
mysqld.exe!mysql_execute_command()[sql_parse.cc:6087]
mysqld.exe!sp_instr_stmt::exec_core()[sp_head.cc:3652]
mysqld.exe!sp_lex_keeper::reset_lex_and_exec_core()[sp_head.cc:3335]
mysqld.exe!sp_instr_stmt::execute()[sp_head.cc:3513]
mysqld.exe!sp_head::execute()[sp_head.cc:1346]
mysqld.exe!sp_head::execute_procedure()[sp_head.cc:2288]
mysqld.exe!Event_job_data::execute()[event_data_objects.cc:1459]
mysqld.exe!Event_worker_thread::run()[event_scheduler.cc:312]
mysqld.exe!event_worker_thread()[event_scheduler.cc:268]
mysqld.exe!pthread_start()[my_winthread.c:62]
ucrtbase.dll!_configthreadlocale()
KERNEL32.DLL!BaseThreadInitThunk()
ntdll.dll!RtlUserThreadStart()
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x1ce21cf66b8): alter table mytable add PARTITION (PARTITION t20221103 VALUES LESS THAN (TO_DAYS("2022-11-03")+1))
Connection ID (thread ID): 1649
Status: NOT_KILLED
The platform is Windows, so I cannot debug a core file.
I have therefore read the source code, but I still do not fully understand the cause.
The database version is 10.4.6, so here is the source code link:
https://github.com/MariaDB/server/blob/mariadb-10.4.6/sql/ha_partition.cc#L3021
chksum= 0;
for (i= 0; i < len_words; i++)
  chksum^= uint4korr((file_buffer) + PAR_WORD_SIZE * i);
if (chksum)
  goto err2;
m_tot_parts= uint4korr((file_buffer) + PAR_NUM_PARTS_OFFSET);
DBUG_PRINT("info", ("No of parts: %u", m_tot_parts));
tot_partition_words= (m_tot_parts + PAR_WORD_SIZE - 1) / PAR_WORD_SIZE;
tot_name_len_offset= file_buffer + PAR_ENGINES_OFFSET +
                     PAR_WORD_SIZE * tot_partition_words;
tot_name_words= (uint4korr(tot_name_len_offset) + PAR_WORD_SIZE - 1) /
                PAR_WORD_SIZE; // <--- crashed here
There are some macros involved, so I am not sure whether the line number is exact.
It looks like a bug in loading the .par file: the code above validates the file with a checksum, yet MariaDB still crashes afterwards.
So, I wonder whether my analysis is right and how I can work around the crash. Thanks a lot.

First, search the existing bug reports for 'read_par_file': there is no obvious match.
Second, looking at the line where it crashed, we can assume it is reading from memory that was never allocated. The file_buffer was allocated earlier in the function, sized from a word-count field at the beginning of the file.
Third, if you look at the git blame of the function, nothing there has changed in quite a while.
So it is likely a new bug. Please report it, attaching the SHOW CREATE TABLE output and the .par file.
To confirm: take the SHOW CREATE TABLE mytable output, paste it into the latest 10.4 release (containers are good for this), then issue your ALTER TABLE statement again. It is likely to crash, but it is good to confirm.
There does seem to be a lack of checking around some of these offsets with regard to the allocated space.
Given it looks like you are doing daily partitions, have you exceeded the maximum of 8,192 partitions per table? Either way, that should produce an error message rather than a crash.
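A minimal sketch of that confirmation flow (the container name and password are placeholders; the ALTER statement is the one from the crash log):

-- On the affected server: capture the table definition for the bug report.
SHOW CREATE TABLE mytable;

-- In a throwaway container running the latest 10.4, e.g.
--   docker run --name mdb104 -e MARIADB_ROOT_PASSWORD=secret -d mariadb:10.4
-- recreate the table from that output, then re-run the failing statement:
ALTER TABLE mytable ADD PARTITION
  (PARTITION t20221103 VALUES LESS THAN (TO_DAYS('2022-11-03') + 1));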

Related

First token could not be read or is not the keyword 'FoamFile' in OpenFOAM

I am a beginner at programming. I am trying to run a simulation of a combustion chamber using reactingFoam.
I have modified the counterflow2D tutorial.
For those who don't know OpenFOAM: it is a program built in C++, but it does not require C++ programming, just properly defining the variables in the required files.
In one of my first tries I made a very simple model, but since I wanted to check it thoroughly I set it to 60 seconds with a 1e-6 timestep.
My computer is not very powerful, so it took approximately a day (by this I mean I'd like to find a solution rather than repeat the simulation).
I executed the solver reactingFoam on 4 processors in parallel using
mpirun -np 4 reactingFoam -parallel > log
The log does not show any evidence of error.
The problem is that when I use reconstructPar it works perfectly but then I try to watch the results with paraFoam and this error is shown:
From function bool Foam::IOobject::readHeader(Foam::Istream&)
in file db/IOobject/IOobjectReadHeader.C at line 88
Reading "mypath/constant/reactions" at line 1
First token could not be read or is not the keyword 'FoamFile'
I have read that this can happen when files that should have content are empty, but I have not found that problem.
My 'reactions' file has not been modified from the tutorial and has always worked.
edit:
Sorry for the vague question. I have modified it a bit.
A typical OpenFOAM dictionary file always begins with a header entry named FoamFile. An example from a typical system/controlDict file can be seen below:
FoamFile
{
    version     2.0;
    format      ascii;
    class       dictionary;
    location    "system";
    object      controlDict;
}
While constructing a dictionary, OpenFOAM first reads this header; if it is absent, OpenFOAM stops with the error message you have experienced:
First token could not be read or is not the keyword 'FoamFile'
The benefit of the header is presumably to support OpenFOAM's I/O abstraction mechanisms, which would be difficult to implement otherwise.
As mentioned in the comments, adding the header entity almost always solves this problem.
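For the failing file from the question, a plausible header would look like the following (the version and format values are assumptions; copy them from a file that still has its header):

FoamFile
{
    version     2.0;
    format      ascii;
    class       dictionary;
    location    "constant";
    object      reactions;
}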

Qt error is printed on the console; how to see where it originates from?

I'm getting this on the console in a QML app:
QFont::setPointSizeF: Point size <= 0 (0.000000), must be greater than 0
The app is not crashing, so I can't use the debugger to get a backtrace at the point of failure. How do I see where the error originates from?
If you know the function the warning occurs in (in this case, QFont::setPointSizeF()), you can put a breakpoint there. Following the stack trace will lead you to the code that calls that function.
If the warning doesn't include the name of the function and you have the source code available, use git grep with part of the warning to get an idea of where it comes from. This approach can involve some trial and error, as the string may span more than one line in the source, so you might have to try different parts of it.
If the warning doesn't include the name of the function, you don't have the source code available and/or you don't like the previous approach, use the QT_MESSAGE_PATTERN environment variable:
QT_MESSAGE_PATTERN="%{function}: %{message}"
For the full list of variables at your disposal, see the qSetMessagePattern() docs:
%{appname} - QCoreApplication::applicationName()
%{category} - Logging category
%{file} - Path to source file
%{function} - Function
%{line} - Line in source file
%{message} - The actual message
%{pid} - QCoreApplication::applicationPid()
%{threadid} - The system-wide ID of current thread (if it can be obtained)
%{qthreadptr} - A pointer to the current QThread (result of QThread::currentThread())
%{type} - "debug", "warning", "critical" or "fatal"
%{time process} - time of the message, in seconds since the process started (the token "process" is literal)
%{time boot} - the time of the message, in seconds since the system boot if that can be determined (the token "boot" is literal). If the time since boot could not be obtained, the output is indeterminate (see QElapsedTimer::msecsSinceReference()).
%{time [format]} - system time when the message occurred, formatted by passing the format to QDateTime::toString(). If the format is not specified, the format of Qt::ISODate is used.
%{backtrace [depth=N] [separator="..."]} - A backtrace with the number of frames specified by the optional depth parameter (defaults to 5), and separated by the optional separator parameter (defaults to "|"). This expansion is available only on some platforms (currently only platforms using glibc). Names are only known for exported functions. If you want to see the name of every function in your application, use QMAKE_LFLAGS += -rdynamic. When reading backtraces, take into account that frames might be missing due to inlining or tail call optimization.
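For example, on Linux (glibc) you could combine several of these placeholders before launching the app (the binary name is a placeholder):

QT_MESSAGE_PATTERN='%{function}: %{message} -- %{backtrace depth=8 separator=" <- "}' ./myapp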
On an unrelated note, the %{time [format]} placeholder is quite useful to quickly "profile" code by qDebug()ing before and after it.
I think you can use qInstallMessageHandler (Qt5) or qInstallMsgHandler (Qt4) to specify a callback which will intercept all qDebug() / qInfo() / etc. messages (example code is in the link). Then you can just add a breakpoint in this callback function and get a nice callstack.
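A minimal Qt 5 sketch of that idea (the substring filter and handler name are assumptions; set your breakpoint on the marked line):

#include <cstdio>
#include <QtGlobal>
#include <QString>

static void breakHereHandler(QtMsgType type, const QMessageLogContext &context,
                             const QString &msg)
{
    Q_UNUSED(type);
    if (msg.contains(QLatin1String("Point size <= 0")))
        std::fputs("offending warning hit\n", stderr); // <-- breakpoint here
    // Keep normal logging working.
    std::fprintf(stderr, "%s: %s\n",
                 context.function ? context.function : "?", qPrintable(msg));
}

int main(int argc, char *argv[])
{
    qInstallMessageHandler(breakHereHandler);
    // ... construct QGuiApplication app(argc, argv) and the QML engine as usual ...
    return 0;
}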
Aside from the obvious, searching your code for calls to setPointSize[F], you can try the following depending on your environment (which you didn't disclose):
If you have the debugging symbols of the Qt libs installed and are using a decent debugger, you can set a conditional breakpoint on the first line in QFont::setPointSizeF() with the condition set to pointSize <= 0. Even if conditional breakpoints don't work you should still be able to set one and step through every call until you've found the culprit.
On Linux there's the tool ltrace, which displays all calls a binary makes into shared libraries, and I suppose there's something similar in the MS Visual Studio toolbox. You can grep the output for calls to setPointSize directly, but of course this won't catch calls made within the library itself (which I guess could be the case when it handles the QML internally).

Out-of-memory error in parfor: kill the slave, not the master

When an out-of-memory error is raised in a parfor, is there any way to kill only one MATLAB slave to free some memory instead of having the entire script terminate?
Here is what happens by default when an out-of-memory error occurs in a parfor: the entire script terminates.
I wish there were a way to kill just one slave (i.e. remove a worker from the parpool), or to stop using it, so as to release as much memory as possible from it.
If you get an out-of-memory error in the master process, there is no chance to fix this. For out-of-memory on a slave, the following should do it.
The simple idea of the code: restart the parfor again and again with the missing data until you get all results. If one iteration fails, a flag (a file) is written which makes all iterations throw an error as soon as the first error has occurred. This way we get out of the loop without wasting time producing further out-of-memory errors.
%Your intended iterator
iterator=1:10;
%flags which indicate which iterations succeeded
succeeded=false(size(iterator));
%result array
result=nan(size(iterator));
FLAG='ANY_WORKER_CRASHED';
while ~all(succeeded)
    fprintf('Another try\n')
    %determine which iterations still need to be done
    todo=iterator(~succeeded);
    %initialize array for the remaining results
    partresult=nan(size(todo));
    %initialize flags which indicate which iterations succeeded (we can not
    %throw errors; that would throw away results)
    partsucceeded=false(size(todo));
    %flag file indicating that some worker crashed; a file-based
    %solution is used because I don't know a better one
    if exist(FLAG,'file')
        delete(FLAG);
    end
    try
        parfor falseindex=1:sum(~succeeded)
            realindex=todo(falseindex);
            try
                % The flag is used to let all other workers jump out of the
                % loop as soon as one calculation has crashed.
                if exist(FLAG,'file')
                    error('some other worker crashed');
                end
                % insert your code here
                %dummy code which randomly throws an exception
                if rand<.5
                    error('hit out of memory')
                end
                partresult(falseindex)=realindex*2;
                % End of user code
                partsucceeded(falseindex)=true;
                fprintf('trying to run %d and succeeded\n',realindex)
            catch ME
                % catch errors within workers to preserve the work done so far
                partresult(falseindex)=nan;
                partsucceeded(falseindex)=false;
                fprintf('trying to run %d but it failed\n',realindex)
                %touch the flag file so the other workers bail out
                fclose(fopen(FLAG,'w'));
            end
        end
    catch
        %reduce pool size by 1
        newsize = matlabpool('size')-1;
        matlabpool close
        matlabpool(newsize)
    end
    %put the results of the current pass into the full result
    result(~succeeded)=partresult;
    succeeded(~succeeded)=partsucceeded;
end
After quite a bit of research, and a lot of trial and error, I think I may have a decent, compact answer. What you're going to do is:
Declare some max memory value. You can set it dynamically using the MATLAB function memory, but I like to set it directly.
Call memory inside your parfor loop, which returns the memory information for that particular worker.
If the memory used by the worker exceeds the threshold, cancel the task that worker was working on. Now, here it gets a bit tricky. Depending on the way you're using parfor, you'll either need to delete or cancel either the task or the worker. I've verified that the code below works when there is one task per worker, on a remote cluster.
Insert the following code at the beginning of your parfor contents. Tweak as necessary.
memLimit = 280000000; % This doesn't have to be in the parfor. Everything else does.
memData = memory;
if memData.MemUsedMATLAB > memLimit
    task = getCurrentTask();
    cancel(task);
end
Enjoy! (Fun question, by the way.)
One other option to consider: since R2013b, you can open a parallel pool with 'SpmdEnabled' set to false. This allows MATLAB worker processes to die without the whole pool being shut down; see the doc here: http://www.mathworks.co.uk/help/distcomp/parpool.html . Of course, you still need to arrange somehow to shut down the workers.
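A one-line sketch, assuming a local pool of four workers:

% R2013b+: with SpmdEnabled false, a worker can die without taking down the pool.
p = parpool('local', 4, 'SpmdEnabled', false);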

fread protection stack overflow error

I'm using fread in data.table (1.8.8, R 3.0.1) in an attempt to read very large files.
The file in question has 313 rows and ~6.6 million columns of numeric data, and is around 12 GB. The machine is a CentOS 6.4 box with 512 GB of RAM.
When I attempt to read in the file:
g=fread('final.results',header=T,sep=' ')
'header' changed by user from 'auto' to TRUE
Error: protect(): protection stack overflow
I tried starting R with --max-ppsize 500000, which is the maximum, but got the same error.
I also tried setting the stack size to unlimited via
ulimit -s unlimited
Virtual memory was already set to unlimited.
Am I being unrealistic with a file of this size? Did I miss something fairly obvious?
Now fixed in v1.8.9 on R-Forge.
An unintended 50,000-column limit has been removed in fread. Thanks to mpmorley for reporting. Test added.
The reason was that I had got this part of the fread.c source wrong:
// *********************************************************************
// Allocate columns for known nrow
// *********************************************************************
ans=PROTECT(allocVector(VECSXP,ncol));
protecti++;
setAttrib(ans,R_NamesSymbol,names);
for (i=0; i<ncol; i++) {
    thistype = TypeSxp[ type[i] ];
    thiscol = PROTECT(allocVector(thistype,nrow));   // ** HERE **
    protecti++;
    if (type[i]==SXP_INT64)
        setAttrib(thiscol, R_ClassSymbol, ScalarString(mkChar("integer64")));
    SET_TRUELENGTH(thiscol, nrow);
    SET_VECTOR_ELT(ans,i,thiscol);
}
According to R-exts section 5.9.1, that PROTECT inside the loop isn't needed :
In some cases it is necessary to keep better track of whether protection is really needed. Be particularly aware of situations where a large number of objects are generated. The pointer protection stack has a fixed size (default 10,000) and can become full. It is not a good idea then to just PROTECT everything in sight and UNPROTECT several thousand objects at the end. It will almost invariably be possible to either assign the objects as part of another object (which automatically protects them) or unprotect them immediately after use.
So that PROTECT is now removed and all is well. (It seems that the pointer protection stack limit has been reduced to 50,000 since that text was written; Defn.h contains #define R_PPSSIZE 50000L.) I've checked all other PROTECTs in data.table C source for anything similar and found and fixed one in assign.c too (when adding more than 50,000 columns by reference), no others.
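For illustration, the recommended pattern looks roughly like this (a sketch of the idea from R-exts, not the exact committed patch):

// `ans` is already PROTECTed, so storing each column in it immediately
// keeps the column reachable; no per-column PROTECT entry is consumed.
for (i=0; i<ncol; i++) {
    thiscol = allocVector(TypeSxp[type[i]], nrow);
    SET_VECTOR_ELT(ans, i, thiscol);      // containment protects thiscol
    if (type[i]==SXP_INT64)               // safe: thiscol is reachable via ans
        setAttrib(thiscol, R_ClassSymbol, ScalarString(mkChar("integer64")));
    SET_TRUELENGTH(thiscol, nrow);
}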
Thanks for reporting!

Is there a working distribution of sqlite available for OpenVMS?

I am looking for a working distribution of SQLite for OpenVMS. I tried building SQLite 3.7.9 from the amalgamation file, using patches I found in a mailing list, but it does not quite work.
I am using HP C V7.1-015 on OpenVMS Alpha 7.3-2.
Since I cannot install Python, which seems to include SQLite3, I have to build from sources.
I compile using the following commands:
$ CC /OPTIMIZE -
/DEFINE=(SQLITE_THREADSAFE=0, -
SQLITE_OMIT_LOAD_EXTENSION=1, -
SQLITE_OMIT_COMPILEOPTION_DIAGS=1, -
SQLITE_OMIT_MEMORYDB=1, -
SQLITE_OMIT_TEMPDB=1, -
SQLITE_OMIT_DEPRECATED=1, -
SQLITE_OMIT_SHARED_CACHE=1, -
_USE_STD_STAT=ENABLE) -
/FLOAT=IEEE_FLOAT -
sqlite3.c
$ CC /OPTIMIZE -
/DEFINE=(SQLITE_THREADSAFE=0, -
SQLITE_OMIT_LOAD_EXTENSION=1, -
SQLITE_OMIT_COMPILEOPTION_DIAGS=1, -
SQLITE_OMIT_MEMORYDB=1, -
SQLITE_OMIT_TEMPDB=1, -
SQLITE_OMIT_DEPRECATED=1, -
SQLITE_OMIT_SHARED_CACHE=1, -
_USE_STD_STAT=ENABLE) -
/FLOAT=IEEE_FLOAT -
shell.c
I copied the defines from the mailing list and added /FLOAT=IEEE_FLOAT to get rid of most warnings regarding floating point (related to overflows due to exponent 308).
During compilation I got some informational messages and warnings.
I get the following messages while linking:
$ LINK shell.obj,sqlite3.obj
...
%LINK-W-NUDFSYMS, 2 undefined symbols:
%LINK-I-UDFSYM, __STD_FSTAT
%LINK-I-UDFSYM, __STD_STAT
...
Since I am a little bit lost here, I would rather have SQLite3 sources that compile on OpenVMS as-is.
The specific problem you're getting from the linker arises from the fact that you've requested capability at compile time that your system doesn't have. I believe the _USE_STD_STAT option first became available in OpenVMS v8.2, yet you're on 7.3-2. Your compiler and your headers know what to do when _USE_STD_STAT is defined, but the functions to process the X/Open-compliant stat structure do not exist in the C run-time (CRTL in VMS parlance) on your system, and your linker is telling you, "ain't got those functions."
Ideally you would be able to upgrade your operating system. Current as of this writing is v8.4; v7.3-2 was released eight and a half years ago and v8.2 over seven years ago. I understand that there are technical, budgetary, and even political reasons that upgrades aren't always possible. If it were me, and I were stuck on OpenVMS Alpha v7.3-2, I would try removing _USE_STD_STAT=ENABLE from the compilation and see what blows up.
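Concretely, that experiment is just the compile command from the question with the last define dropped (everything else unchanged):

$ CC /OPTIMIZE -
  /DEFINE=(SQLITE_THREADSAFE=0, -
  SQLITE_OMIT_LOAD_EXTENSION=1, -
  SQLITE_OMIT_COMPILEOPTION_DIAGS=1, -
  SQLITE_OMIT_MEMORYDB=1, -
  SQLITE_OMIT_TEMPDB=1, -
  SQLITE_OMIT_DEPRECATED=1, -
  SQLITE_OMIT_SHARED_CACHE=1) -
  /FLOAT=IEEE_FLOAT -
  sqlite3.c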
One of the side effects of enabling _USE_STD_STAT is that you also get _LARGEFILE along with it. If that's the only reason SQLite needs the option, you may be fine but limited to 4GB databases. I suspect there's more to it than that, i.e., SQLite very likely makes use of elements in the stat structure that do actually require the newer structure.
You can read up on the differences in the traditional and standards-compliant stat structures at http://h71000.www7.hp.com/doc/84final/5763/5763profile_062.html#index_x_1699.
I've recently improved my VMSish patch for SQLite and made it available for SQLite version 3.7.14.1: http://www.mail-archive.com/sqlite-users#sqlite.org/msg73570.html (or http://sqlite.1065341.n5.nabble.com/Building-SQLite-3-7-14-1-for-OpenVMS-td65277.html).
POSIX locking still doesn't work though, and I was unable to find out why.
Well, there was a message on the sqlite-users mailing list on getting SQLite 3.7.9 working on OpenVMS. I don't know how relevant that is to the version you've got (or if the patch was adopted by the SQLite developers; they're a bit picky for legal reasons IIRC) but it looks likely to be useful. Good luck.
