[Date Prev][Date Next][Thread Prev][Thread Next]
[Author Index] [Date Index] [Thread Index]
[SQR-USERS Info] [SQRUG Home Page]

Re: SQR Performance Issues (New Question)



Hi Sarah,

Here are my guidelines for efficient Select statements.

I understand that the thought process is that the order of the statements in the
Where clause does not matter because the database's optimizer will automatically
rearrange the clause to pull the data in the most efficient manner.
Unfortunately, I've seen database optimizers not work. Therefore, I have a
tendancy to create my statements in the Where one table at a time.

I can tell you that I saw many times where the order of the statements in both
the From and the Where clauses made a major difference during my six years of
working with Oracle and my year of working with DB2. For example, if I want to
know who was active in department XXX on January 1, 2000, I need to join Job and
Personal_Data. One way to write the From/Where clauses would be:

>From PS_Job A,
  PS_Personal_Data B
Where B.EmplID = A.EmplID
  And A.EffDT = (Select Max(EffDT)
                                From PS_Job
                                Where EmplID = A.EmplID
                                   And Empl_Rcd# = A.Empl_Rcd#
                                   And EffDT <= '01/01/2000')
  And A.EffSeq = (Select Max(EffSeq)
                                  From PS_Job
                                  Where EmplID = A.EmplID
                                    And Empl_Rcd# = A.Empl_Rcd#
                                    And EffDT = A.EffDT)
  And A.Empl_Status = 'A'
  And A.DeptID = 'XXX'

I have seen Oracle take this and join every Job record with its respective
Personal_Data record before it looked at the EffDT, EffSeq, and Empl_Status
fields. I had 15,000 active/inactive employees at the time. If each had only 5
Job records, Oracle was joining 75,000 records before it determined the 6 people
who were active in the department on the given date.

I was able to dramatically cut processing time by rearranging the Where clause
to specify all of the Job criteria first.

>From PS_Job A,
  PS_Personal_Data B
Where A.EffDT = (Select Max(EffDT)
                                From PS_Job
                                Where EmplID = A.EmplID
                                   And Empl_Rcd# = A.Empl_Rcd#
                                   And EffDT <= '01/01/2000')
  And A.EffSeq = (Select Max(EffSeq)
                                  From PS_Job
                                  Where EmplID = A.EmplID
                                    And Empl_Rcd# = A.Empl_Rcd#
                                    And EffDT = A.EffDT)
  And A.Empl_Status = 'A'
  And A.DeptID = 'XXX'
  And B.EmplID = A.EmplID

Using this second Where clause, Oracle would pull the active employees in the
department as of 1/1/2000, and then join the records with Personal_Data. Using
the same number of employees, I would be performing 6 joins rather than 75,000!

Perhaps optimizers have gotten better in the last couple years, but I personally
feel more comfortable having all of the criteria for a specific table together
in the Where clause.

My personal rules for From/Where/Order By clauses are:

     Place tables in the From clause from most restrictive to least restrictive.
     In the above example, I have Job before Personal_Data. I want the 6 records
     from Job before I look at the related Personal_Data records. If the tables
     are reversed, I get 15,000 records from Personal_Data to join with Job
     because there will not be any restrictions on Personal_Data.

     The order of the tables in the Where should match the order of the tables
     in the From. This places the most restrictive conditions at the beginning
     of the Where.

     Within each table within the Where, I follow the table's key as much as
     possible. Again, following the above example, I have EffDT and EffSeq (key
     fields) from Job appearing in the Where prior to Empl_Status and DeptID
     (non-key fields).

     Table - Job
     Fields - EmplID (key), Empl_Rcd# (key), EffDT (key), EffSeq (key), DeptID,
     Empl_Status, ...

     Subqueries within subqueries begin to introduce slowness.

     In general, subqueries do not need to include fields from the record
     structure that are below the field you're concerned with in the key. For
     example, above, I do not include EffSeq as part of the subquery looking at
     EffDT. Since EffSeq is structurally below EffDT in the table, it has
     nothing to do with EffDT. However, EffDT, structurally above EffSeq in the
     table, has everything to do with pulling the proper EffSeq. Thus, EffDT is
     part of the subquery looking at EffSeq.

     Notice that I said "In general" at the beginning of the previous paragraph.
     There are times when it is necessary to include a structurally lower field
     within a subquery. It depends on the data you need.

     Usually, when pulling Select Max(EffDT), you want the EffDT to be <= to
     another date. Otherwise you could retrieve future dated rows that you don't
     really want.

     Don't try to do too much in the Where clause. Where clauses can become
     overburdened and complicated. Sometimes it's more efficient to move
     conditions out of the Where and into If statements within the body of the
     Select.

     Finally, it helps if the tables needed in the Order By clause are in the
     same sequence as the From clause. For example:

     From PS_Job A,
       PS_Dept_Tbl B
     Where ...
     Order By A.DeptID

     is generally more efficient than

     From PS_Job A,
       PS_Dept_Tbl B
     Where ...
     Order By B.Descr

     There are times when conflicts arise in this ordering of tables. During
     those times, the Order By clause takes a lower priority.

HTH,
Rod





"Sarah Jane A. Lohrey" <lohreysj@MUOHIO.EDU> on 03/09/2001 03:40:16 PM

Please respond to sqr-users@list.iex.net

To:   SQR-USERS@list.iex.net
cc:    (bcc: Rodney Wright/Trcb/Rouse)

Subject:  SQR Performance Issues (New Question)



Hello Everyone,

I was watching all of your postings with regard to Deepak's questions on
SQR Performance Issues and I have a new question for you.


I consider myself a novice at SQR yet I have been asked to give a 3 hour
workshop to discuss performance issues in SQR programming to a small group
of users who are even newer to SQR than I am.   I'm hoping some of you can
add to my list of "Performance issues" to consider.


The kinds of things I have been think of so far include:  (NOTE:  THERE ARE
SOME QUESTIONS IN MY LIST)

1. From our most recent postings about using count(*) in subqueries.  There
are more efficient ways to write subqueries.
2. When using evaluate statements, one should use break logic and evaluate
the values that are most frequent first so that the break logic will exit
an evaluate sooner.
3.  When performing selects to multiple tables, does the order of the
fields in the select affect performance?  Perhaps the order of the tables
in the from clause or the where clauses affect the performance.   Do you
want the smaller tables listed first or last?
4. Is it true that the LET command is less efficient than the MORE command.
5.  ....

What are some of the most common consideration you have when attempting to
develop efficient programs?


Thanks in advance for any thoughts you might have on this subject.

Sarah



Sarah Jane A. Lohrey
Programmer
Miami Computing & Information Services
PH: 513.529.6074
FX: 513.529.1434
lohreysj@muohio.edu