Does encrypting data on Netezza using the SQL extension toolkit utilize more disk space? http://www-01.ibm.com/support/docview.wss?uid=swg21672494
I haven’t used that functionality myself, but I would expect the effect on disk space to be VERY limited.
A comment:
I hope you will not use this functionality for ALL data. At a minimum, join keys and columns you plan to run WHERE clauses against should probably be left unencrypted for performance reasons. Furthermore, I fail to see what you are trying to accomplish with this. Making sure that the DBA cannot read your data?
I suspect that will not work: your pg.log will contain the ‘password’....
Hello there fellow netizens,
I have a SQL database (about 600 MB) that I want to import into my GAE app. I know that one possibility would be to simply use Google Cloud SQL, but I'd rather have the data available in NDB to get the benefits thereof. So I'm wondering: how should I think about converting the SQL schema into an NDB schemaless structure? Should I simply set up Kinds to mirror each table? How ought I deal with foreign keys that relate different tables?
Any pointers are greatly appreciated!
- Lee
How should I think about converting the SQL schema into an NDB schemaless structure?
If you are planning to transfer your SQL data to the Datastore, you need to think about how different these two systems really are.
Should I simply set up Kinds to mirror each table?
In thinking about making this transfer, simple analogies like that will only get you so far. Thinking in SQL terms on a schemaless DB can get you into serious trouble because of the differences in implementation, even if at first it helps to think of a Kind as a table, Entity properties as columns, and so on. In short, no, you should not simply set up Kinds to mirror each table. You could, but it depends on what kind of operations you want to support on these entities, how often those operations will occur, what kind of queries your system relies on, etc.
How ought I deal with foreign keys that relate different tables?
Honestly, if you're relying on MySQL-specific features like foreign keys, your data model will require a lot of rethinking. A "foreign key" could be as little as maintaining a key reference to the other Kind in an Entity of a certain Kind.
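To make the key-reference idea concrete, here is a minimal sketch using the App Engine NDB client library (google.appengine.ext.ndb); the model and property names are illustrative, not from your schema:

```python
from google.appengine.ext import ndb

class Author(ndb.Model):
    name = ndb.StringProperty()

class Book(ndb.Model):
    title = ndb.StringProperty()
    # The "foreign key": a reference to an Author entity's key.
    author_key = ndb.KeyProperty(kind=Author)

# Querying "all books by an author" replaces the SQL join:
#   Book.query(Book.author_key == some_author.key).fetch()
```

Note there is no referential integrity here: nothing stops you from deleting an Author that Books still point to, so your application code has to enforce whatever guarantees the foreign key used to give you.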
I would suggest that you stick with Cloud SQL if your data storage solution is already built in SQL, unless you are willing to A) rethink your whole data model, B) implement the new data model, C) transfer the data you currently have, and D) rewrite all the code that interacts with data storage (unless you're using an ORM, in which case your life might be easier for this aspect).
Depending on how complex your SQL db is, how much time you feel it will take to migrate to Datastore, and how much time/brainpower you are willing to commit to learning a new system and new ways of thinking, you should either stick with SQL or do the above steps to rebuild your data storage solution.
I heard that deferred_segment_creation was introduced in Oracle 11g. I have gone through the documentation. Do we need to set the value of deferred_segment_creation for each table we create? Could someone help me understand the usage of deferred_segment_creation?
Deferred_segment_creation is normally set at the database level, though it can be set at the session level. You can also specify deferred segment creation when you create an individual table, but that is very rare.
Generally, deferred_segment_creation is helpful when you are installing large packaged applications that create thousands of tables, many if not most of which will never be used in a particular installation. That avoids wasting space on tables that will never hold any data. If you're building your own application, you're probably not creating a ton of tables that will never have data, so this is much less useful.
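For reference, the three levels look roughly like this (a sketch of the Oracle 11gR2 syntax; the table name is illustrative):

```sql
-- Database-wide:  ALTER SYSTEM  SET deferred_segment_creation = TRUE;
-- Session-level:
ALTER SESSION SET deferred_segment_creation = TRUE;

-- Per-table override at creation time (the rare case):
CREATE TABLE demo_orders (
    id     NUMBER PRIMARY KEY,
    status VARCHAR2(20)
) SEGMENT CREATION DEFERRED;

-- No row appears in USER_SEGMENTS until the first INSERT
-- actually allocates the segment:
SELECT segment_name FROM user_segments
 WHERE segment_name = 'DEMO_ORDERS';
```

So you do not need to set anything per table: with the parameter TRUE (the default from 11gR2), every eligible CREATE TABLE defers segment allocation automatically.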
I am reworking a .NET application that so far has been running slowly. Our databases are Oracle, and the code is written in VB. When writing queries, I typically pass the parameters to a middle tier function which builds the raw SQL. I have a database class that has a function ExecuteQuery which takes in a SQL string and returns a DataTable. This uses an OleDbDataAdapter to run the query on the database.
I found some existing code that sends the SQL and a parameter to a stored procedure which, as far as I can tell, opens the query and outputs it to a SYS_REFCURSOR / DataSet.
I don't know why it's set up this way, but could someone tell me which is better performance-wise? Or the pros/cons to doing it this way?
Thanks in advance
Stored procedures and dynamic SQL have exactly the same performance; there is no performance advantage of one over the other. (Incidentally, I am a HUGE believer in using stored procs for everything, for a host of other reasons, but that's not the topic at hand.)
Bottlenecks can occur for many reasons.
For one, if you are code-generating SELECT statements, it is highly probable that those statements are poorly optimized for the data the app actually needs. For example, consider a SELECT * that pulls 50 columns back versus a SELECT ID, Description that pulls just the two you need at that point in the application. The amount of data that has to be read from disk, transferred over the network, and pushed into objects in the web server's memory isn't trivial.
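A minimal sketch of the difference, using Python's sqlite3 so it is runnable (table and column names are illustrative; any DB-API driver behaves the same way):

```python
import sqlite3

# One wide row: id, description, and a large "spec" column the app
# doesn't need at this point.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, description TEXT, spec TEXT)")
conn.execute(
    "INSERT INTO products VALUES (1, 'Widget', ?)", ("x" * 10_000,))

# SELECT * drags every column back; the narrow query transfers only
# what the app will use.
wide = conn.execute("SELECT * FROM products").fetchall()
narrow = conn.execute("SELECT id, description FROM products").fetchall()
```

Here every row in `wide` carries the 10,000-character spec column that the caller never looks at; multiply that waste by thousands of rows per request and it stops being trivial.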
These will have to be evaluated on a case by case basis.
I would highly suggest that if you have a "slow" application whose performance you need to improve, the very first thing you ought to do is profile the application. What part of it is running slow? It might be inside the database server, it might be in your middle tier, it may even be a function of your network bandwidth or memory/load limitations on your web server. Heck, there might even be a WAIT command lurking somewhere in there, placed by some previous programmer who left the company...
In short, at this point you have absolutely no idea where to begin, so looking at actual code is premature. Go profile the app and see where things are slowing down. You might find that performance improves radically simply by putting more memory in the database server... which is a much cheaper alternative than rewriting, testing, and deploying vast amounts of code.
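The profiling step above can be sketched in a few lines with Python's built-in cProfile (the function names here are hypothetical stand-ins for your app's layers; in a real app you would wrap a request handler or the data-access call instead):

```python
import cProfile
import io
import pstats

def middle_tier_query():
    # Stand-in for the expensive layer you suspect.
    return sum(i * i for i in range(200_000))

def handle_request():
    return middle_tier_query()

profiler = cProfile.Profile()
profiler.enable()
handle_request()
profiler.disable()

# Report the ten most expensive calls by cumulative time.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(10)
report = buffer.getvalue()
print(report)
```

The hot function shows up near the top of the cumulative-time column, which tells you whether to look at the database, the middle tier, or somewhere else before touching any code.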
a stored procedure will definitely have better performance over building a raw query in code and executing it, but the important thing to realize is that, that difference in performance won't be your performance issue, there are many other things that will affect performance much more than just changing just query to be a stored procedure, even if you run a stored procedure and process the results using adapters, data tables, data sets, you're still incurring in a lot of performance, specially if you pass those large objects around (I have seen cases where datasets are returned wrapped in web service calls), so, don't focus on that, focus on caching data, having a good query, create the proper indexes, minimize the use of datasets, datatables, that will yield better benefits than just moving queries to stored procedures
I'm working on a website running ASP.NET with an MSSQL database. I realized that the number of rows in several tables can be very high (possibly something like a hundred million rows). I think this would make the website run very slowly — am I right?
How should I proceed? Should I base it on a multidatabase system, so that users are separated into different databases and each database is smaller? Or is there a different, more effective and easier approach?
Thank you for your help.
Oded's comment is a good starting point and you may be able to just stop there.
Start by indexing properly and only returning relevant result sets. Consider archiving unused (or rarely accessed) data.
However, if that isn't enough, partitioning or sharding is your next step. This is better than a "multidatabase" solution because your logical entities remain intact.
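The "index properly" step is easy to see in miniature. A runnable sketch using Python's sqlite3 (table and index names are illustrative; the same principle applies to MSSQL, where you would compare execution plans instead):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, payload TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [(i % 100, "payload") for i in range(1_000)])

# Without an index the query plan is a full table scan.
plan_before = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM events WHERE user_id = 7").fetchall()

conn.execute("CREATE INDEX idx_events_user ON events(user_id)")

# With the index it becomes a targeted search.
plan_after = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM events WHERE user_id = 7").fetchall()

print(plan_before)  # detail column reports a SCAN of events
print(plan_after)   # detail column reports a SEARCH using idx_events_user
```

On a hundred-million-row table this is the difference between touching every row and touching a handful, which is why indexing comes before any multidatabase scheme.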
Finally, if that doesn't work, you could introduce caching. Jesper Mortensen gives a nice summary of the options that are out there for SQL Server:
Sharedcache -- open source, mature.
Appfabric -- from Microsoft, quite mature despite being "late beta".
NCache -- commercial, I don't know much about it.
StateServer and family -- commercial, mature.
Try partitioning the data. This should make each query faster, and the website shouldn't be as slow.
I don't know what kind of data you'll be displaying, but try to give users the option to filter it. As someone has already commented, partitioned data will make everything faster.
I have considered SQLite, but from what I've read, it is very unstable at sizes bigger than 2 GB. I need a database that in theory can grow up to 10 GB.
It would be best if it were stand-alone, since that is easier for non-techie users than the extra step of installing something like MySQL, which will most likely require assistance.
Any recommendations?
SQLite should handle your file sizes just fine. The only caveat worth mentioning is that SQLite is not suitable for highly concurrent environments, since the entire database file is exclusively locked during writes.
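You can see the write lock directly from Python's sqlite3 module (a small sketch; `timeout=0` just makes the second writer fail immediately instead of waiting for the lock):

```python
import os
import sqlite3
import tempfile

fd, path = tempfile.mkstemp(suffix=".db")
os.close(fd)

writer = sqlite3.connect(path, timeout=0)
writer.execute("CREATE TABLE t (x INTEGER)")
writer.execute("INSERT INTO t VALUES (1)")   # opens a write transaction

other = sqlite3.connect(path, timeout=0)
try:
    other.execute("INSERT INTO t VALUES (2)")  # second writer is locked out
    blocked = False
except sqlite3.OperationalError:               # "database is locked"
    blocked = True

writer.commit()                                # releases the lock
other.execute("INSERT INTO t VALUES (2)")      # now succeeds
other.commit()
count = other.execute("SELECT COUNT(*) FROM t").fetchone()[0]

writer.close()
other.close()
os.remove(path)
```

In practice the default `timeout` makes competing writers wait and retry rather than error out, which is fine for a single user but becomes a bottleneck with many concurrent writers.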
So if you are writing an application that needs to handle several users concurrently, a better choice would be PostgreSQL.
I believe SQLite will actually work fine for you with large databases, especially if you index them appropriately. Considering SQLite's popularity it seems unlikely that it would have fundamental bugs.
I would suggest that you revisit the decision to rule out SQLite, and you might try to compensate for the selection bias of negative reports. That is, people tend to publicize bug reports, not non-bug reports, and if SQLite were the most popular embedded database then you might expect to see more negative experiences than with less popular packages even if it were superior.