Since two decades author and leadership consultant Frank Kanu helps top managers and executives to improve success ratios and productivity.
 About Frank Kanu  |  Testimonials  |  Order Books  |  Free Articles  |  Press  |  Excellence in Leadership  |  Genius One

Genius One Inc., Smart Solutions for Growth

Frank Uncovers Excellence in Leadership

Posts

How to speed up a database which has gotten slow

© Copyright Frank D. Kanu 2000-2008

PDF version

Preface

While the actual database design usually is able to handle the amount of stored data without major performance problems newly added clients and their data volume changes that. Once the size of a database grows close to 1 GB, the number of concurrent user exceeds 20 or the number of new a-records goes beyond 200 an hour, the need for speed improvement is raised.

This document is not in search for reasons why a database is designed the way it is.

In general there are two types of improvements involved: Direct and indirect.

  • Direct improvement by means of applying a database design.

  • Indirect by changing the way the client applications request data from the database.


It has to be understood that direct changes always force changes in the way client applications access the data. This document only considers direct changes.

Note:
The best way to get performance is to pay up-front for tuning. It is not a hidden cost if included in the initial design effort and is exponentially cheaper than performing tuning as an add-on feature.


Put simply, tuning involves three basic activities:

  1. Eliminating unnecessary processing.

  2. Eliminating redundant processing.

  3. Using more efficient processing in exchange for less efficient processing.


Usually you start of with the existing diagram of your database.

Oops—you do not have one? What about reverse engineering? Most case tools offer that option.

Unfortunately too often reversed-engineered diagrams of existing databases are an unreadable mess of 60 pages or more. This easily proves that databases grow into far too complex systems that do not allow the easy use of a case tool anymore. It is recommended to take the proper steps to not only simplify the database but to also force the existence of an easily readable and up-to-date diagram.

“Performance Design Review” should be part of both the “Database Design Review” and “Application Design Review” processes.

These are experiences while working with many different databases sized 2 GB up to 100 GB. Some of the information in this document can be found on the Internet as others faced the same challenges.

To simplify things blob fields and large varchar field (> 254 characters, but mainly > 2048 characters) will be named blobs and seen as synonym.

There are two major times when the behavior of databases changes: Once the database hits the 2GB size and then once again when the DB grows over 10GB. Once a table hits more then 10 million rows there will be changes in the behavior too, but they are less compared to the 2 GB size. And usually they appear around the same time, especially in non normalized databases. Some of the speed problems appear earlier, but they are usually covered by the use of expensive hardware or spending lots of time to fine tune the used hardware and/or software.

Once a database exceeds the 2GB there will be a few others—name them natural—issues:

  • Backing up and Restoring a database becomes a difficult task, especially when the DB is supposed to be running in a 24/7 environment.

  • When containing blobs the DB will grow exponentially faster with the growing size of the DB.

  • When containing blobs databases tend to crash more often and get corrupted faster.


Redundancy of data

Some of the described speed improvements will lead to more redundancy of data. While this is an unwanted side effect the improved speed makes up for it.

Possible Optimizations


There are quite a few major areas of improvement, which can gain more speed with less work:

  • hardware


    Some RAID controllers can cause database corruption. It is not clear under which circumstances and what vendor. Reports about this behavior can be found on the internet.

    Please avoid the mistake of throwing good hardware at bad design.

  • OS


    Generally the use of Unix based systems will speed up things.

  • database parameters


    Experience shows that changing the database parameters is a time intensive game which does not necessarily leads to consistent results. What works with one DB might fail with another. Having multiple databases on the same server makes a perfect set of settings for a specific database almost impossible. It will be very hard to measure the gain or estimate the time needed for finding the best settings.

  • Blob fields


    Blob fields should be moved out of the main database. Especially since different database vendors handle blobs differently. On top of that some databases need more temporary space for sorting or retrieving data when one of the involved tables contains a blob field. Queries have to be optimized way more to force the RDBMs to use the proper indices and minimize disk IO.

    Most times blobs shouldn’t be stored in the database at all. Access restriction can be handled via the OS.

    In case blobs will be stored in the database there should be only one blob table in the whole system. The access to the proper data will be done via the unique PK. See primary key discussion below.

    As described earlier huge databases containing many tables and blob fields tend to corrupt faster.

  • Indices


    The main condition for proper key design is the understanding that two unique keys have to be considered:

      Logical keys

      A logical key is combined of attributes. It is used to avoid duplicates in the database.

      Primary keys

      The primary key is nothing else then a physical unique identifier for every single row. Most RDBMs have this key implemented for internal use, but not available for the programmers use.

      The primary key must to be stable. With other words it can NEVER be changed.

      The primary key must never be the same as the logical key, because the logical key is made up of entities whose values can change.

      Way too often databases contain many tables where an additional field was added to force the uniqueness of the logical key: e.g. A_ID got a SEQ_NUM added for that sole purpose. Meaning keys represent a hierarchy among the data. This is inflexible and limits the number of levels. And it is plain wrong.

      If a logical key is used as primary key and this key is changed all foreign key references to this table have to be updated. Migration of logical keys used as primary keys will have terrible implications on database flexibility. A primary key independent of all attributes or relationships allows changes to the logical key without cascading changes. It allows maintaining data easily and sufficiently.

      Another major advantage is the minimal impact when adding attributes to the logical key. Since entities are defined by relationships programmers are tempted to use foreign key columns in the primary key. As stated above this is bad practice because changing the primary key will lead to major changes in all client applications and stored procedures. When using a real primary key there would be only the change to the logical key in the involved tables and any place where inserts and updates happen. All references to primary keys stay the way they are.

      Primary keys have to be




    • unique

    • not null

    • stable, meaning unchanged over their lifetime

    • different from the logical key

    • internally assigned, else they belong in the logical key

    • logical meaningless, to avoid the use of them


    Do all tables need to have primary keys?

    Yes. Just do it!
    When a table contains rows which have to be referenced from other tables in the database then that table absolutely needs a primary key. In all other cases it depends on the way programmers use the data. In cases it will make sense to add the primary key to simplify the physical view of the data. From past experience we know that systems and their uses change.

    To be future-compatible always adding the primary key is best practice. Primary keys should be integers filled from one generator per DB.

    Benefits of primary keys


    • Improved speed

    • Improved flexibility

    • Simplification of the database schema

    • Easier transition to databases from other vendors because foreign key constraints are simplified


    There is only one major disadvantage: It takes a lot more time to implement the use of primary keys into an existing system then starting with primary keys right from the beginning. And it has to be done with a lot of discipline.


  • table design / table contents


    Most systems are designed in a way, which allows the easy retrieve of data in specific predefined ways. Adding new ways is complicated and time intensive.

    More often than not tables need a re-design to better reflect the data stored in them.

    A perfect example would be a table containing one varchar field to store numeric, alphanumeric and date values. This table should be split into more then one table to reflect the different types. It should be one logical table with one primary key but containing multiple physical tables, at least one for each different data type stored.

  • normalization


    • First NF
      All columns only contain atoms

    • Second NF
      All columns not containing the primary key are dependent on the primary key

    • Third NF
      All columns which do neither contain the primary nor the logical key are independent from each other

    • Fourth, Fifth NF and further normalization
      Has been proven as not relevant to actual RDBMs and will be therefore ignored.


    Normalizing the tables will lead to speed improvements and a more simplified ERWin diagram.

  • query optimization


    Query optimization depends heftily on the used database and will be discussed in another white paper.


  • Tags:
    10 million  absolutely  alphanumeric  application design  applications access  atoms  attributes  blob fields  blobs  case tool  case tools  challenges  circumstances  client applications  complex systems  concurrent user  consistent results  constraints  crash  data volume  database  database corruption  database design  database flexibility  database hits  database parameters  database schema  database vendors  databases  dependent  design effort  discipline  duplicates  entities  erwin  existence  existing system  experiences  exponentially  field queries  fine tune  flexibility  force changes  foreign key  gb  hierarchy  implications  improvement  improvements  initial design  integers  intensive game  key design  lifetime  many tables  migration  minimal impact  mistake  need for speed  nf  normalization  normalized  normalizing  pdf version  performance design  performance problems  physical tables  pk  predefined  preface  primary key  primary keys  processes  programmers  proper key  proper steps  query optimization  raid controllers  rdbms  readable  redundancy  relationships  request data  restriction  retrieving data  reverse engineering  right from the beginning  seq  simplification  sole purpose  speed improvement  speed improvements  speed problems  stable  stored procedures  synonym  table contents  transition  unchanged  uniqueness  unix  unwanted side effect  used hardware  varchar  vendor reports  volume changes
     Technorati (All Links are external): 10 million  absolutely  alphanumeric  application design  applications access  atoms  attributes  blob fields  blobs  case tool  case tools  challenges  circumstances  client applications  complex systems  concurrent user  consistent results  constraints  crash  data volume  database  database corruption  database design  database flexibility  database hits  database parameters  database schema  database vendors  databases  dependent  design effort  discipline  duplicates  entities  erwin  existence  existing system  experiences  exponentially  field queries  fine tune  flexibility  force changes  foreign key  gb  hierarchy  implications  improvement  improvements  initial design  integers  intensive game  key design  lifetime  many tables  migration  minimal impact  mistake  need for speed  nf  normalization  normalized  normalizing  pdf version  performance design  performance problems  physical tables  pk  predefined  preface  primary key  primary keys  processes  programmers  proper key  proper steps  query optimization  raid controllers  rdbms  readable  redundancy  relationships  request data  restriction  retrieving data  reverse engineering  right from the beginning  seq  simplification  sole purpose  speed improvement  speed improvements  speed problems  stable  stored procedures  synonym  table contents  transition  unchanged  uniqueness  unix  unwanted side effect  used hardware  varchar  vendor reports  volume changes  business  it
     
    Digg  del.icio.us  StumbleUpon  Technorati  Reddit This blog-entry is protected by a digital fingerprint:785273ed81985582c8a1be62f78c9459
     
    • Frank Kanu on Thursday, October 7th, 2004 @ 08:00
    • Filed under Business, IT


    You can follow responses to this entry through the RSS 2.0 feed.

    Readers, who enjoyed reading this posting, also read:
    • follow up to "How to speed up a database which has gotten slow"
    • Index? - Answer
    • Index?
    • Quotes/Quotations Database

    Because I value your thoughtful opinions, I encourage you to add them.

    Please leave your Response right here:




    XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <code> <em> <i> <strike> <strong>



    Please do not be offended if I edit your response for clarity or to keep out questionable matters, however, and I may even delete off-topic responses.

    Archives and Links




    « Happy Birthday Billy!     Congratulations! »


    • Tags
      • blog
        confucius
        decisions
        e book
        education
        failure
        few days
        followers
        job
        kanu
        leader
        leaders
        leadership skills
        love
        mail
        manager
        managers
        mistakes
        money
        president bush
        programmers
        questions
        start leading
        stop telling
        success

      • Tags sorted alphabetically
      • Tags sorted numerically
    • Categories
      • Business
        • Case Studies
        • Negotiations
      • Cooking
      • Definitions
      • Ethics
      • Events
      • Frank Kanu
      • General
        • Daily Question
        • Quote of the Day
        • Weekly Poll
      • IT
        • Code review
      • Leadership
      • Management
        • Teams
      • Motivation
      • Politics
      • Something funny
      • Stop Telling… Start Leading!
    • Random Posts
      • - 11/24/2008
      • - 10/30/2008
      • - Phishing

      • blogmap
      • More about Frank
    • Links
      • Frank Kanu's Books
      • - Andy Coote
      • - Andy Wibbels
      • - Billy McDermott*s Blog
      • - Blogcritics
      • - dandrea projetos
      • - David Intersimone “David I”
      • - Derek Jones
      • - Develop Your Vision
      • - enhance your leadership skills
      • - Forbes.com
      • - Frank Kanu
      • - Frank Kanu’s article archive
      • - Genius One
      • - Handelsblatt
      • - Karel’s Legal Blog
      • - Managing Leadership
      • - New York Nitty-Gritty
      • - News for nerds, stuff that matters
      • - Philip Greenspun’s Weblog
      • - Practical Developmental Ideas
      • - Practical Solutions to Business Puzzles
      • - Recipes
      • - Rick Cooper, The PDA Pro
      • - Roberta Pili
      • - Sacred Cow Dung
      • - Savvy Intrapreneur
      • - Scrapbooker for hire
      • - Scrapbooking 4 others
      • - seth godin’s blog
      • - stop telling… start leading!
      • - The Art of Managing People by Asking Questions
      • - The Washington Post
      • - TomPeters!
    • RSS Feed
      • Syndicate using RSS
      • The latest comments to all posts in RSS
    • Archives/Calendar
      • yearly archives
      • monthly archives
      • weekly archives
      • daily archives
      December 2008
      M T W T F S S
      « Nov    
      1234567
      891011121314
      15161718192021
      22232425262728
      293031  
    • Blog Of The Day Awards Winner

    If not otherwise stated - all postings © Frank D. Kanu. All rights reserved.

    This blog is designed to provide accurate and authoritative information in regard to the subject matter covered.
    If legal advice or other expert assistance is required, the services of a competent professional should be sought.



    Genius One Inc. · USA
    Fax: (509) 463-0129 · E-mail:
    info@GeniusOne.com

     


    Privacy Statement   Sitemap

     
    Copyright © 2000-2008 Genius One Inc.
    All rights reserved.
    Reproduction or use in whole or part in any form or medium without written permission of Genius One Inc. is prohibited.
    If parts or any part of these WebPages is judicially determined to be invalid, that invalidity will not affect the remaining part of these Pages.