Home » When should a primary key be declared non-clustered?

When should a primary key be declared non-clustered?


The question is not ‘when should the PK be NC’, but instead you should ask ‘what is the proper key for the clustered index’?

And the answer really depends on how do you query the data. The clustered index has an advantage over all other indexes: since it always includes all columns, is always covering. Therefore queries that can leverage the clustered index certainly do not need to use lookups to satisfy some of the projected columns and/or predicates.

Another piece of the puzzle is how can an index be used? There are three typical patterns:

  • probes, when a single key value is seek-ed in the index
  • range scans, when a range of key values is retrieved
  • order by requirements, when an index can satisfy an order by w/o requiring a stop-and-go sort

So if you analyze your expected load (the queries) and discover that a large number of queries would use a particular index because they use a certain pattern of access that benefits from an index, it makes sense to propose that index as the clustered index.

Yet another factor is that the clustered index key is the lookup key used by all non-clustered indices and therefore a wide clustered index key creates a ripple effect and widens all the non-clustered indices and wide indices mean more pages, more I/O, more memory, less goodness.

A good clustered index is stable, it does not change during the lifetime of the entity, because a change in the clustered index key values means the row has to be deleted and inserted back.

And a good clustered index grows in order not randomly (each newly inserted key value is larger than the preceding value) as to avoid page splits and fragmentation (without messing around with FILLFACTORs).

So now that we know what a good clustered index key is, does the primary key (which is a data modelling logical property) match the requirements? If yes, then the PK should be clustered. If no, then the PK should be non-clustered.

To give an example, consider a sales facts table. Each entry has an ID that is the primary key. But the vast majority of queries ask for data between a date and another date, therefore the best clustered index key would be the sales date, not the ID. Another example of having a different clustered index from the primary key is a very low selectivity key, like a ‘category’, or a ‘state’, a key with only very few distinct values. Having a clustered index key with this low selectivity key as the leftmost key, e.g. (state, id), often makes sense because of ranges scans that look for all entries in a particular ‘state’.

One last note about the possibility of a non-clustered primary key over a heap (i.e. there is no clustered index at all). This may be a valid scenario, the typical reason is when bulk insert performance is critical, since heaps have significantly better bulk insert throughput when compared to clustered indices.

The basic reason to use Clustered indexes is stated on Wikipedia:

Clustering alters the data block into a certain distinct order to match the index, resulting in the row data being stored in order. Therefore, only one clustered index can be created on a given database table. Clustered indices can greatly increase overall speed of retrieval, but usually only where the data is accessed sequentially in the same or reverse order of the clustered index, or when a range of items is selected.

Say that I have a table of People, and these people have a Country column and a unique Primary Key. It’s a demographics table, so these are the only things I care about; what Country and how many unique people are tied to that country.

I am thus only ever likely to SELECT WHERE or ORDER BY the Country column; a clustered index on the Primary Key doesn’t do me any good, I’m not accessing this data by PK, I’m accessing it by this other column. Since I can only have one clustered index on a table, declaring my PK as Clustered would prevent me from using a Clustered Index on Country.

In addition, here’s a good article on Clustered vs Nonclustered Indexes, turns out clustered indexes caused insert performance issues in SQL Server 6.5 (which at least hopefully isn’t relevant for most of us here).

If you put a clustered index on an IDENTITY column, then all of your inserts will happen on the last page of the table – and that page is locked for the duration of each IDENTITY. No big deal… unless you have 5000 people that all want the last page. Then you have a lot contention for that page

Note that this isn’t the case in later versions.

If your primary key is of the UNIQUEIDENTIFIER, make sure to specify that it’s NONCLUSTERED. If you make it clustered, every insert will have to do a bunch of shuffling of records to insert the new row in the correct position. This will tank performance.

Related Solutions

Calculate the sum with minimum usage of numbers

Here's a hint: 23 : 11 + 11+ 1 ( 3 magic numbers) 120: 110+ 10 (2 magic numbers) The highest digit in the target number is the answer, since you need exactly k magic numbers (all having 1 in the relevant position) in order for the sum to contain the digit k. So...

Why not drop the “auto” keyword? [duplicate]

Your proposal would be rejected on the basis of backward compatibility alone. But let's say for the sake of argument that the standards committee like your idea. You don't take into account the numerous ways you can initialize a variable widget w; // (a) widget...

Recursive to iterative using a systematic method [closed]

So, to restate the question. We have a function f, in our case fac. def fac(n): if n==0: return 1 else: return n*fac(n-1) It is implemented recursively. We want to implement a function facOpt that does the same thing but iteratively. fac is written almost in...

How can I match values in one file to ranges from another?

if the data file sizes are not huge, there is a simpler way $ join input1 input2 | awk '$5<$4 && $3<$5 {print $2, $5-$3+1}' B100002 32 B100043 15 B123465 3 This Perl code seems to solve your problem It is a common idiom: to load the entire...

Javascript difference between “=” and “===” [duplicate]

You need to use == or === for equality checking. = is the assignment operator. You can read about assignment operators here on MDN. As a quick reference as you are learning JS: = assignment operator == equal to === equal value and equal type != not equal !==...

Compiler complains about misplaced else [closed]

Your compiler complains about an misplaced else because, well, there is an else without a preceding if: // ... for (j=1; j<n-i; j++) { if(a[j]<=a[j+1]) { // ... } // END OF IF } // END OF FOR else { continue; } // ... The else in your code does not follow...

Bootstrap – custom alerts with progress bar

/* !important are just used to overide the bootstrap css in the snippet */ .alertContainer { border-radius: 0 !important; border-width: 0 !important; padding: 0 !important; height: auto !important; position: absolute !important; bottom: 15px !important; left:...

How to Garbage Collect an external Javascript load?

Yes, s.onload = null is useful and will garbage collect! As of 2019, it is not possible to explicitly or programmatically trigger garbage collection in JavaScript. That means it collects when it wants. Although there is cases where setting to null may do a GC...

Math programming with python

At first, what you are looking for is the modulo operator and the function math.floor() Modulo from wikipedia: In computing, the modulo operation finds the remainder after division of one number by another (sometimes called modulus). for example: 12%12=0...

Android slide over letters to create a word [closed]

Here some advice you can use: First for each cell you can create an object that represents the state of that cell: class Cell { char mChar; int row,column; boolean isSelected; } then you can create a 2D array of your cells Cell[][] mTable = ... For views you...

Sum two integers in Java

You reused the x and y variable names (hence the variable x is already defined in method main error), and forgot to assign the ints read from the Scanner to the x and y variables. Besides, there's no need to create two Scanner objects. public static void...

Extend three classes that implements an interface in Java

Using this simplified implementation of the library, using method() instead of M(): interface IFC { void method(); } class A implements IFC { public void method() { System.out.println("method in A"); }; } As akuzminykh mentions in their comment You'd write a...

How to set the stream content in PHPExcel? [closed]

Okey, First thing first PHPExcel_Worksheet_MemoryDrawing() can't solve your problem if you insist to use stream content and pass that to your worksheet your PDF will not render your image. But you can use `PHPExcel_Worksheet_Drawing()' if you want to render...

How to remove all files from a directory?

Linux does not use extensions. It is up to the creator of the file to decide whether the name should have an extension. Linux looks at the first few bytes to figure out what kind of file it is dealing with. To remove all non-hidden files* in a directory use: rm...

Hacker used picture upload to get PHP code into my site

Client side validation The validation code you have provided is in JavaScript. That suggests it is code that you use to do the validation on the client. Rule number one of securing webapps is to never trust the client. The client is under the full control of...