Can anyone tell me what is the maximum possible throughput of Teradata Parallel Transporter via using Informatica.
I am using Informatica TPT connection to load a huge flat file (around 10 to 15 Gb) in a table which has CANSE_N partitioning.
I am getting a throughput of aroud 50,000 to 56,000 Rows/sec and i am looking for a performance of more than 90,000 rows/sec please suggest as performance am getting is not enough to load such big files.
Thanks.
MLoad (update) is the operator. Source is MQ from Webshpere. One box with 20GB and 2 2.9 GHz CPU's (QA). millions in the source.
Related
We are using Apache pinot as source system. We have loaded 10GB TPCH data into pinot. We are using Presto as query execution engine, using pinot connector.
We are trying with simple configuration. Presto installed on CentOS machine with 8CPUs and 64GB RAM. Only one instance of worker running with embedded coordinator. Pinot is installed on CentOS machine with 4 CPUs and 64 GB RAM. One Controller,one broker,one server and one zookeeper are running.
Running a query on Lineitem table involving group by roll-up, is taking 23 seconds. Around 20 seconds is spent in transferring 2.3GB data from pinot to presto.
In another query, involving join between Lineitem,Nation,Partsupply,Region with group by cube is taking around 2 minutes. Data transfer is taking around 25 seconds in this. Most of the remaiy time is spent in join and aggregation computation.
Is this normal performance with presto-pinot?
If not,what am I missing?
Do I, need to increase hardware? Increase number of presto/pinot processes?
Any specific presto properties I should consider modifying?
Thanks for your help in advance
Please list the queries so that we can provide a better answer. At a high level, Presto Pinot connector tries to pushdown most of the computation (filter, aggregation, group by) to Pinot and minimize the amount of data needed to pull from Pinot.
There are always queries that require a full table scan and computation cannot be pushed to pinot. Query latency can be higher in such cases. Pinot recently added a streaming api that can improve the latency further.
I am not able to find maria DB recommended RAM,disk,number of Core capacity. We are setting up initial level and very minimum data volume. So just i need maria DB recommended capacity.
Appreciate your help!!!
Seeing that over the last few years Micro-Service architecture is rapidly increasing, and each Micro-Service usually needs its own database, I think this type of question is actually becoming more appropriate.
I was looking for this answer seeing that we were exploring the possibility to create small databases on many servers, and was wondering for interest sake what the minimum requirements for a Maria/MySQL DB would be...
Anyway I got this helpful answer from here that I thought I could also share here if someone else was looking into it...
When starting up, it (the database) allocates all the RAM it needs. By default, it
will use around 400MB of RAM, which isn’t noticible with a database
server with 64GB of RAM, but it is quite significant for a small
virtual machine. If you add in the default InnoDB buffer pool setting
of 128MB, you’re well over your 512MB RAM allotment and that doesn’t
include anything from the operating system.
1 CPU core is more than enough for most MySQL/MariaDB installations.
512MB of RAM is tight, but probably adequate if only MariaDB is running. But you would need to aggressively shrink various settings in my.cnf. Even 1GB is tiny.
1GB of disk is more than enough for the code and minimal data (I think).
Please experiment and report back.
There are minor differences in requirements between Operating system, and between versions of MariaDB.
Turn off most of the Performance_schema. If all the flags are turned on, lots of RAM is consumed.
20 years ago I had MySQL running on my personal 256MB (RAM) Windows box. I suspect today's MariaDB might be too big to work on such tiny machine. Today, the OS is the biggest occupant of any basic machine's disk. If you have only a few MB of data, then disk is not an issue.
Look at it this way -- What is the smallest smartphone you can get? A few GB of RAM and a few GB of "storage". If you cut either of those numbers in half, the phone probably cannot work, even before you add apps.
MariaDB or MySQL both actually use very less memory. About 50 MB to 150 MB is the range I found in some of my servers. These servers are running a few databases, having a handful of tables each and limited user load. MySQL documentation claims in needs 2 GB. That is very confusing to me. I understand why MariaDB does not specify any minimum requirements. If they say 50 MB there are going to be a lot of folks who will want to disagree. If they say 1 GB then they are unnecessarily inflating the minimum requirements. Come to think of it, more memory means better cache and performance. However, a well designed database can do disk reads every time without any performance issues. My apache installs (on the same server) consistently use up more memory (about double) than the database.
I have developed a program which uses SQLite 3.7 ... database, in it there is a rather extensive write/read module that imports , checks and updates data. This process takes 14 seconds on my PC and Im pleased as punch with the performance.
I use transactions for everything with paratetrs my PC is a Intel i7 with 18gig of ram. I have not set anything in the database. I used SQLite Expert to create the database and create the data structures including table and columns and checked that all indexes are created. In other words its all OK.
I have since deployed the program/database to 2 other machines. That 14 second process takes over 5 minutes on the other machines. Same program, identical data, identical database. The machines are upto date, one is a 3rd gen Intel i7 bought last week, the other is quite fast as well so hardware should not be an issue.
Im just not understanding what the problem could be? Is it the database itself ? I have not set anything other then encription on it. Remembering that I run the same and it takes the 14 seconds. Could it be that the database is 'optimised' to my PC ? so when I give it to others its not optimised?
I know I could turn off jurnaling to get better performance, but that would only speed up the process and still would leave the problem.
Any ideas would be welcome.
EDIT:
I have tested the program on my 7yo Dual Athelon with 3gig of ram running XP on HDD, and the procedure took 35 seconds. Well in tolerable limits considering. I just dont get what could be making 2 modern machines take 5 min ?
I have an idea that its a write issue, as using a reader they are slower but quite ecceptable.
SQLite speed is affected most by how well the disk does random reads and writes; any SSD is much more better at this than any rotating disk.
Whenever changes overflow the internal cache, they must be written to disk. You should use PRAGMA cache_size to increase the cache to more than the default 2 MB.
Changed data must be written to disk at the end of every transaction. Make sure that there are as many changes as possible in one transaction.
If much of your processing involves temporary tables or indexes, the speed is affected by the speed of the main disk. If your machines have enough RAM, you can force temporary data to RAM with PRAGMA temp_store.
You should enable Write-Ahead Logging.
Note: the default SQLite distribution does not have encryption.
We are in analysis mode for an embedded Linux system which will use an ARM9 processor around 400 megahertz.
It would have multiple sensors and we would need to write logs to a SQLite 3 database. We estimate the maximum load imaginable would be between 100 and 200 database inserts per second.
Is this reasonable or should we go have our heads examined?
I think it is unreasonable to write logs into db. SQL databases are developed with assumption of frequently reading and not writing.
We are trying to Integrate SQLite in our Application and are trying to populate as a Cache. We are planning to use it as a In Memory Database. Using it for the first time. Our Application is C++ based.
Our Application interacts with the Master Database to fetch data and performs numerous operations. These Operations are generally concerned with one Table which is quite huge in size.
We replicated this Table in SQLite and following are the observations:
Number of Fields: 60
Number of Records: 1,00,000
As the data population starts, the memory of the Application, shoots up drastically to ~1.4 GB from 120MB. At this time our application is in idle state and not doing any major operations. But normally, once the Operations start, the Memory Utilization shoots up. Now with SQLite as in Memory DB and this high memory usage, we don’t think we will be able to support these many records.
Q. Is there a way to find the size of the database when it is in memory?
When I create the DB on Disk, the DB size sums to ~40MB. But still the Memory Usage of the Application remains very high.
Q. Is there a reason for this high usage. All buffers have been cleared and as said before the DB is not in memory?
Any help would be deeply appreciated.
Thanks and Regards
Sachin
A few questions come to mind...
What is the size of each record?
Do you have memory leak detection tools for your platform?
I used SQLite in a few resource constrained environments in a way similar to how you're using it and after fixing bugs it was small, stable and fast.
IIRC it was unclear when to clean up certain things used by the SQLite API and when we used tools to find the memory leaks it was fairly easy to see where the problem was.
See this:
PRAGMA shrink_memory
This pragma causes the database connection on which it is invoked to free up as much memory as it can, by calling sqlite3_db_release_memory().