You may have looked into a GP performance issue and dug around a bit and discovered that you have high fragmentation in a large number of table indexes. After rebuilding indexes, a large number of indexes still have high fragmentation, and they all have an Index Type of HEAP. What is this and why does GP have messy HEAP tables?
First, let’s start by looking at your index fragmentation, and talking about indexes.
SELECT OBJECT_NAME(ind.OBJECT_ID) AS TableName, ind.name AS IndexName, indexstats.index_type_desc AS IndexType, indexstats.avg_fragmentation_in_percent FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, NULL) indexstats INNER JOIN sys.indexes ind ON ind.object_id = indexstats.object_id AND ind.index_id = indexstats.index_id WHERE indexstats.avg_fragmentation_in_percent > 30 ORDER BY indexstats.avg_fragmentation_in_percent DESC
Run the query above against your company database. Note the “avg_fragmentation_in_percent > 30” line. This restricts the query to only show results if the fragmentation is above 30%.
The link above is to an XLSX of IV00101 (the Item Master) which I’ll reference below. There are several tabs along the bottom of the workbook:
IV00101 shows the complete table, with the data as it could be arranged physically on the hard-drive. Note that it is not in order by Item Number. When the Primary Key is “non-clustered”, data is simply added to the table using the next empty row at the bottom of the SQL table. Think of “SQL Table” as a single Excel worksheet.
For IV00101 the Primary Key (Primary Index) is ITEMNMBR (Item Number). An Index on a table is like an additional tab in an excel workbook. For the Primary Key of ITEMNMBR, SQL creates a page of data that contains only ITEMNMBR, and the data on that page is ordered by ITEMNMBR. Click on the PrimaryKey tab in the workbook.
A Non-Clustered index is a HEAP. The data is physically recorded on the disk in the order it was added to the table, in other words, it’s sort of a “heap” of data with no organization.
When a new record is added to IV00101, it is added to the bottom of the IV00101 page. SQL then inserts it into the correct position in the ordered list of the PrimaryKey tab.
The Key2, Key3 and Key4 tabs are the actual Alternate Keys (Indexes) on IV00101. Key2 is ITEMDESC (Item Description) and ITEMNMBR (Item Number). The data on that page is ordered by ITEMDESC then ITEMNMBR.
Likewise for Key3, the data is ordered by ITMCLSCD (Item Class Code) and ITEMNMBR (Item Number).
Each time you add a record to the IV00101 table, SQL also adds records to the page for each Index.
Now imagine that the index page is actually written on a yellow notepad. If you hit the end of the page you need a new page, and you need to place the paper on an open space on your desk. Now your index is fragmented. SQL Database Maintenance can reorder the indexes so that the data pages are next to each other rather than scattered around your desk.
If you were able to run the query above on your own database, the %-Fragmentation is a measure of how scattered your yellow notepad pages are on your desk. If it’s above 90%, that index is on a bunch of separate pages all over your desk.
I’m simplifying a lot. If you want a more technical description of index and fragmentation, click here.
So, why does GP have HEAP tables? Wouldn’t it make more sense to arrange the data on the disk in order by the Primary Key?
Because it mostly doesn’t matter, and probably not.
A clustered index (aka not a HEAP) arranges the data on the drive in the physical order dictated by the Primary Key. To write a new record it needs to rearrange the data on the page so it can insert the new Item Number into the correct row in the table. This is time consuming. So there is an argument in favor of HEAPs–they offer better performance writing data to the disk.
The HEAP doesn’t make any difference for most tasks. If GP needs a list of Items between A and D, it can use the Primary Key page, which is in the correct order.
Ultimately the real performance question is: Does the table have Indexes that provide (1) the columns needed by a GP function, and (2) in the correct order? For example, if I want a list of all Items in Item Class ATT CORD, SQL Server can use Key3. If that key didn’t exist, SQL would either have to read through the table by the PrimaryKey, pulling out the needed records, or simply read from the disorganized physical table. Either way is not efficient.
For a much more technical discussion of HEAPS, click here.
SQL Server has several tools to help you optimize database performance. In particular, the Database Engine Tuning Advisor can suggest additional table indexes to help performance based on your usage. Read more here.