MariaDB select with group_concat() - Out of memory

We have a CentOS 7 machine with MariaDB installed.
When I run:
SELECT h.id,
h.name,
group_concat(distinct d.name ORDER BY d.name SEPARATOR " ") AS descriptions
FROM inventar h
LEFT JOIN descriptions d ON(FIND_IN_SET(d.id, h.description_id) > 0) GROUP BY h.id,h.description_id
ORDER BY h.name asc;
it fails with:
ERROR 5 (HY000): Out of memory (Needed 65535816 bytes)
I read that this is probably a limit on the size of the temporary table, so I checked the relevant settings:
MariaDB [wexac_hosts]> show variables like "%table_size%";
+-----------------------+----------------------+
| Variable_name         | Value                |
+-----------------------+----------------------+
| max_heap_table_size   | 1048576000           |
| tmp_disk_table_size   | 18446744073709551615 |
| tmp_memory_table_size | 12572426240          |
| tmp_table_size        | 12572426240          |
+-----------------------+----------------------+
It's bigger than 65535816 bytes.
Which MariaDB variable should I increase?

If it's GROUP_CONCAT that's running out of memory, you need to increase group_concat_max_len.
From the GROUP_CONCAT documentation:
The maximum returned length in bytes is determined by the group_concat_max_len server system variable, which defaults to 1M (>= MariaDB 10.2.4) or 1K (<= MariaDB 10.2.3).
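For example, to raise the limit before running the query (the 16 MB value is purely illustrative; pick something comfortably larger than your longest expected concatenated result):
-- per session, affects only the current connection
SET SESSION group_concat_max_len = 16 * 1024 * 1024;
-- or server-wide (resets on restart unless also added to the config file)
SET GLOBAL group_concat_max_len = 16 * 1024 * 1024;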

Related

Sqoop trying to --split-by ROWID (Oracle) fails

(Be kind, this is my first question and I did extensive research here and on the net beforehand. The question Oracle ROWID for Sqoop Split-By Column did not really solve this issue, as the original person asking resorted to using another column.)
I am using sqoop to copy data from an Oracle 11 DB.
Unfortunately, some tables have no index and no primary key, only partitions (by date). These tables are very large, hundreds of millions if not billions of rows.
So far, I have decided to access the data in the source by explicitly addressing the partitions. That works well and speeds up the process nicely.
I need to do the splits by data that resides in each and every table, in order to avoid too many if-branches in my bash script. (We're talking some 200+ tables here.)
I noticed that a split by 8 tasks results in a very uneven spread of workload among the tasks. I considered using Oracle ROWID to define the split.
To do this, I must define a boundary query. In a standard query 'select * from xyz' the ROWID is not part of the result set; therefore, it is not an option to let Sqoop derive the boundary query from --query.
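(To illustrate that point with the hypothetical table name from above: ROWID is only returned when it is selected explicitly.)
select ROWID, t.* from xyz t;  -- ROWID appears only because it is listed in the select list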
Now, when I run the import with --split-by ROWID, I get the error
ERROR tool.ImportTool: Encountered IOException running import job:
java.io.IOException: Sqoop does not have the splitter for the given SQL
data type. Please use either different split column (argument --split-by)
or lower the number of mappers to 1. Unknown SQL data type: -8
Samples of ROWID:
AAJXFWAKPAAOqqKAAA
AAJXFWAKPAAOqqKAA+
AAJXFWAKPAAOqqKAA/
It is static and unique once it is created for any row.
So I cast this funny datatype into something else in my boundary query:
sqoop import -Dorg.apache.sqoop.splitter.allow_text_splitter=true \
  --connect jdbc:oracle:thin:@127.0.0.1:port:mydb --username $USER --P --m 8 \
  --split-by ROWID \
  --boundary-query "select cast(min(ROWID) as varchar(18)), cast(max(ROWID) as varchar(18)) from table where laufbzdt > TO_DATE('2019-02-27', 'YYYY-MM-DD')" \
  --query "select * from table where laufbzdt > TO_DATE('2019-02-27', 'YYYY-MM-DD') and \$CONDITIONS " \
  --null-string '\\N' \
  --null-non-string '\\N'
But then I get ugly ROWIDs that are rejected by Oracle:
select * from table where laufbzdt > TO_DATE('2019-02-27', 'YYYY-MM-DD')
and ( ROWID >= 'AAJX6oAG聕聁AE聉N:' ) AND ( ROWID < 'AAJX6oAH⁖⁁AD䁔䀷' ) ,
Error Msg = ORA-01410: invalid ROWID
How can I resolve this properly?
I am a Linux embryo and have painfully chewed myself through the topics of bash shell scripting and Sqooping so far, but I would like to make better use of evenly spread mapper-task workload - it would cut sqoop time in half, I guess, saving some 5 to 8 hours.
TIA!
wahlium
You can try ROWNUM, but I think sqoop import does not work with pseudocolumns.
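If you want to experiment with that idea anyway, one pattern that is sometimes suggested is to materialize ROWNUM as a real column in a subquery and split on the alias. This is only a sketch (untested here, using the table/column names from the question), and note that ROWNUM is assigned at execution time, so without a deterministic ORDER BY in the inner query the parallel mappers are not guaranteed a stable, non-overlapping numbering:
sqoop import --connect jdbc:oracle:thin:@127.0.0.1:port:mydb --username $USER --P --m 8 \
  --split-by RNUM \
  --boundary-query "select 1, count(*) from table where laufbzdt > TO_DATE('2019-02-27', 'YYYY-MM-DD')" \
  --query "select * from (select t.*, ROWNUM as RNUM from table t where laufbzdt > TO_DATE('2019-02-27', 'YYYY-MM-DD')) where \$CONDITIONS"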

MariaDB 10.2.10 writing double binlog entries in mixed format

I am using MariaDB 10.2.10 under Debian 9 in master/slave replication. I am experiencing problems with replication since the slave is stopping replication due to 1062 duplicate key errors.
After a long time of investigation I found that the binlog of the master contains the same INSERT statement twice: it is written in statement AND row-based format. binlog_format is set to MIXED.
I had a look at the general log - the INSERT statement was only committed once.
Here's the output of mysqlbinlog:
# at 11481089
#171205 10:22:37 server id 126  end_log_pos 11481132  CRC32 0x73b0f77c  Write_rows: table id 22683990 flags: STMT_END_F
### INSERT INTO `mydb`.`document_reference`
### SET
###   @1=30561
###   @2=6
###   @3=0
# at 11481132
#171205 10:22:37 server id 126  end_log_pos 11481387  CRC32 0x599e2b04  Query  thread_id=3282752  exec_time=0  error_code=0
SET TIMESTAMP=1512465757/*!*/;
INSERT INTO document_reference
(document_reference_document_id, document_reference_type, document_reference_value)
VALUES (30561, "single", 0)
/*!*/;
# at 11481387
#171205 10:22:37 server id 126  end_log_pos 11481418  CRC32 0x73fe1166  Xid = 248234294
COMMIT/*!*/;
Does anyone have an idea why this statement is written twice to the binlog?
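For reference, decoded output in this form can be produced with something like the following (the binlog file name is a placeholder; --verbose prints the ### pseudo-SQL for row events and DECODE-ROWS suppresses the raw base64 payload):
mysqlbinlog --verbose --base64-output=DECODE-ROWS mysql-bin.000123 | less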

Newbie Alert: How Do I Determine How Much Storage I'm Using in an Oracle 11g Database?

Techies--
I am migrating an Oracle 11g database to an MS SQL Server 2012 instance. Before I begin the actual physical copy of data from 11g to MSSQL, how do I determine what kind of space I'll need to set aside? In Oracle SQL Developer I was able to find the Statistics tab while in a SQL session. This tab may give me a high-level/rough idea of what I'm looking for--however, I have several hundred tables, so this approach doesn't seem practical. Is there a way to issue a SQL statement, or execute an existing stored proc, to determine how much physical space will be needed to contain this data?
A few queries that I keep on hand and that are useful for this exercise.
Find out tablespaces and the space they use:
col "Tablespace" for a22
col "Used MB" for 99,999,999
col "Free MB" for 99,999,999
col "Total MB" for 99,999,999
select df.tablespace_name "Tablespace",
totalusedspace "Used MB",
(df.totalspace - tu.totalusedspace) "Free MB",
df.totalspace "Total MB",
round(100 * ( (df.totalspace - tu.totalusedspace)/ df.totalspace))
"Pct. Free"
from
(select tablespace_name,
round(sum(bytes) / 1048576) TotalSpace
from dba_data_files
group by tablespace_name) df,
(select round(sum(bytes)/(1024*1024)) totalusedspace, tablespace_name
from dba_segments
group by tablespace_name) tu
where df.tablespace_name = tu.tablespace_name ;
Get storage per table within a tablespace:
COLUMN TABLE_NAME FORMAT A20
COLUMN TABLESPACE_NAME FORMAT A20
SELECT
SUBSTR(s.segment_name,1,20) TABLE_NAME,
SUBSTR(s.tablespace_name,1,20) TABLESPACE_NAME,
ROUND(DECODE(s.extents, 1, s.initial_extent,
(s.initial_extent + (s.extents-1) * s.next_extent))/1024000,2) ALLOCATED_MB,
ROUND((t.num_rows * t.avg_row_len / 1024000),2) REQUIRED_MB
FROM
dba_segments s,
dba_tables t
WHERE
s.owner = t.owner AND
s.segment_name = t.table_name and
s.tablespace_name = '<yourtablespacename>'
ORDER BY 3 ASC;
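If you just want a single rough number per schema for sizing the migration target, a minimal sketch along the same lines (standard dba_segments view; restrict the owners to the schemas you are actually moving):
-- total allocated space per schema, in MB
select owner,
       round(sum(bytes) / 1024 / 1024) allocated_mb
from dba_segments
group by owner
order by allocated_mb desc;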

How can I get more memory in "Oracle R Enterprise"?

How can I increase the memory available to ORE?
My server has 10 GB of free memory, but I can't use all of it for ORE.
These commands fail with an error:
begin
sys.rqscriptDrop('clustering_test');
sys.rqscriptcreate('clustering_test','function(n){
rm(list=ls())
cnt=30000
d=dist(data.frame(x=rnorm(cnt),
y=rnorm(cnt)))
}');
end;
(To run the above R script, you can run this):
select * from table(rqtableeval(cursor(select 1 n from dual),
cursor(select 1 "ore.connect" from dual),
'select 1 x, 2 y from dual','clustering_test'))
But these commands run fine in R directly on the server:
cnt=30000
d=dist(data.frame(x=rnorm(cnt),y=rnorm(cnt)))
It is the same code, but one run was directly in R and the other went through ORE.
How large cnt can be depends on your free system RAM. The error in ORE has no description, but if you decrease cnt the script runs.
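For a rough sense of why cnt = 30000 is so heavy (a back-of-the-envelope estimate, assuming dist() stores the lower triangle of the distance matrix as 8-byte doubles): that is 30000 * 29999 / 2 ≈ 450 million entries, or roughly 3.4 GB for the dist object alone, before any intermediate copies R makes while building it.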

Postgresql partitions: Abnormally high seq scan cost on master table

I have a little database of a few hundred million rows for storing call detail records. I set up partitioning as per:
http://www.postgresql.org/docs/9.1/static/ddl-partitioning.html
and it seemed to work pretty well until now. I have a master table "acmecdr" which has rules for inserting into the correct partition and check constraints to make sure the correct table is used when selecting data. Here is an example of one of the partitions:
cdrs=> \d acmecdr_20130811
Table "public.acmecdr_20130811"
Column | Type | Modifiers
-------------------------------+---------+------------------------------------------------------
acmecdr_id | bigint | not null default
...snip...
h323setuptime | bigint |
acmesipstatus | integer |
acctuniquesessionid | text |
customers_id | integer |
Indexes:
"acmecdr_20130811_acmesessionegressrealm_idx" btree (acmesessionegressrealm)
"acmecdr_20130811_acmesessioningressrealm_idx" btree (acmesessioningressrealm)
"acmecdr_20130811_calledstationid_idx" btree (calledstationid)
"acmecdr_20130811_callingstationid_idx" btree (callingstationid)
"acmecdr_20130811_h323setuptime_idx" btree (h323setuptime)
Check constraints:
"acmecdr_20130811_h323setuptime_check" CHECK (h323setuptime >= 1376179200 AND h323setuptime < 1376265600)
Inherits: acmecdr
Now, as one would expect with SET constraint_exclusion = on, the correct partition should automatically be preferred, and since there is an index on it there should only be one index scan.
However:
cdrs=> explain analyze select * from acmecdr where h323setuptime > 1376179210 and h323setuptime < 1376179400;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Result (cost=0.00..1435884.93 rows=94 width=1130) (actual time=138857.660..138858.778 rows=112 loops=1)
-> Append (cost=0.00..1435884.93 rows=94 width=1130) (actual time=138857.628..138858.189 rows=112 loops=1)
-> Seq Scan on acmecdr (cost=0.00..1435863.60 rows=1 width=1137) (actual time=138857.584..138857.584 rows=0 loops=1)
Filter: ((h323setuptime > 1376179210) AND (h323setuptime < 1376179400))
-> Index Scan using acmecdr_20130811_h323setuptime_idx on acmecdr_20130811 acmecdr (cost=0.00..21.33 rows=93 width=1130) (actual time=0.037..0.283 rows=112 loops=1)
Index Cond: ((h323setuptime > 1376179210) AND (h323setuptime < 1376179400))
Total runtime: 138859.240 ms
(7 rows)
So, I can see it's not scanning all the partitions, only the relevant one (which uses an index scan and is pretty quick) and also the master table (which seems to be normal from the examples I've seen). But the high cost of the seq scan on the master table seems abnormal. I would love for that to come down, and I see no reason for it, especially since the master table does not have any records in it:
cdrs=> select count(*) from only acmecdr;
count
-------
0
(1 row)
Unless I'm missing something obvious, this query should be quick. But it's not - it takes about 2 minutes, which does not seem normal at all (even for a slow server).
I'm out of ideas of what to try next, so if anyone has any suggestions or pointers in the right direction, it would be very much appreciated.
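One thing that may be worth checking (a suggestion under assumptions, not a confirmed diagnosis): even though count(*) returns zero, the master table could still be physically large, for example full of dead tuples left over from before the partitioning rules were in place, which would make a sequential scan over it genuinely expensive. The standard size functions show this:
select pg_size_pretty(pg_relation_size('acmecdr'));
-- if this is large despite zero live rows, a VACUUM (or VACUUM FULL) on acmecdr should shrink it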
