UUID Primary Keys and Index Performance: UUIDv4, ULID, and UUIDv7
Junyoung Yang
When designing identifiers for a project, I once had to choose between a Long auto-increment ID and a UUID. I wanted externally exposed IDs to be hard to guess, so UUIDs looked like a reasonable option.
But I also kept seeing the claim that using UUIDs as primary keys can hurt index performance. At first, UUIDs looked good because they are hard to guess and seem suitable for distributed environments. After checking how values are inserted into indexes, though, I felt this was not something to choose blindly. So I first looked at the characteristics of the different UUID-like identifiers.
This post records my comparison of UUIDv4, ULID, and UUIDv7 from an index point of view.
Problem
UUIDv4 is almost entirely random: 122 of its 128 bits are random. That makes it hard to guess, but in a B-Tree index each new value lands at an effectively random position among the existing keys.
An auto-increment ID keeps increasing, so new values are usually inserted near the end of the index, and the few pages at the end stay hot in the cache. UUIDv4 values, on the other hand, are spread across the whole index. Inserts then touch many different pages, so page splits can increase and cache efficiency can decrease.
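The difference in insert position can be illustrated with a small simulation. This is only a sketch: a sorted Python list stands in for the key order of a B-Tree, it does not model pages or splits, and `tail_insert_ratio` is a hypothetical helper name.

```python
import bisect
import uuid

def tail_insert_ratio(keys):
    """Return the fraction of keys that are inserted at the very end of a sorted list."""
    ordered = []
    tail_inserts = 0
    for key in keys:
        pos = bisect.bisect_left(ordered, key)
        if pos == len(ordered):  # the new key is larger than everything so far
            tail_inserts += 1
        ordered.insert(pos, key)
    return tail_inserts / len(keys)

N = 2_000
sequential_keys = list(range(1, N + 1))               # auto-increment style
random_keys = [uuid.uuid4().bytes for _ in range(N)]  # UUIDv4 as 16 raw bytes

sequential_ratio = tail_insert_ratio(sequential_keys)
random_ratio = tail_insert_ratio(random_keys)
print(f"auto-increment: {sequential_ratio:.1%} of inserts hit the tail")
print(f"UUIDv4:         {random_ratio:.1%} of inserts hit the tail")
```

Every auto-increment key is appended at the tail, while only a small fraction of the random keys are. In a real index the cost depends on page size, fill factor, and how much of the index fits in memory, so this only shows the access pattern, not the actual slowdown.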
In other words, I understood it not as "UUIDv4 is always slow," but as "UUIDv4 can be disadvantageous depending on the write pattern and index structure."
Approach
If the randomness of UUIDv4 is a concern, identifiers that include some time ordering can be considered. ULID and UUIDv7 are candidates here.
ULID combines a 48-bit millisecond timestamp with 80 random bits. Its 26-character Crockford Base32 string form sorts lexicographically in time order, which makes it easy to sort as a string. UUIDv7 (standardized in RFC 9562) also starts with a 48-bit Unix millisecond timestamp followed by random bits, so it keeps the UUID shape while sorting roughly by creation time.
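To make the two layouts concrete, here is a minimal sketch that builds both kinds of value by hand. The function names `uuid7_sketch` and `ulid_sketch` are hypothetical, and the sketch skips the same-millisecond monotonicity handling that production generators usually add, so treat it as an illustration of the bit layout rather than a reference implementation.

```python
import os
import time
import uuid

CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # Crockford Base32 alphabet

def _timestamp_and_random(unix_ms=None):
    """Pack a 48-bit millisecond timestamp above 80 random bits."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    random_bits = int.from_bytes(os.urandom(10), "big")
    return ((unix_ms & ((1 << 48) - 1)) << 80) | random_bits

def uuid7_sketch(unix_ms=None):
    """UUIDv7-shaped value: timestamp first, then version and variant fields."""
    value = _timestamp_and_random(unix_ms)
    value = (value & ~(0xF << 76)) | (0x7 << 76)  # version field = 7
    value = (value & ~(0x3 << 62)) | (0x2 << 62)  # variant field = 0b10
    return uuid.UUID(int=value)

def ulid_sketch(unix_ms=None):
    """ULID-shaped string: the same 128 bits encoded as 26 Base32 characters."""
    value = _timestamp_and_random(unix_ms)
    chars = []
    for _ in range(26):  # 26 characters x 5 bits covers the 128-bit value
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))

earlier, later = 1_700_000_000_000, 1_700_000_000_001
print(uuid7_sketch(earlier), uuid7_sketch(later))
print(ulid_sketch(earlier), ulid_sketch(later))
```

Because the timestamp occupies the most significant bits in both layouts, a value generated in a later millisecond always compares greater than one generated earlier. Within the same millisecond, the order in this sketch is random.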
The two are not interchangeable. But compared to a random UUID, both tend to place new values near the end of the index, so the insertion position becomes easier to predict.
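A rough way to check this is to measure how far from the end each new key lands. The sketch below uses a simplified time-ordered key (48-bit millisecond timestamp plus 80 random bits, without version or variant fields) and a sorted list as a stand-in for the index. `IDS_PER_MS` is a hypothetical burst rate chosen only for illustration.

```python
import bisect
import os
import uuid

def time_ordered_key(unix_ms):
    """Simplified time-ordered key: 6 timestamp bytes followed by 10 random bytes."""
    return unix_ms.to_bytes(6, "big") + os.urandom(10)

def mean_distance_from_tail(keys):
    """Average number of existing keys that sort after each newly inserted key."""
    ordered = []
    total = 0
    for key in keys:
        pos = bisect.bisect_left(ordered, key)
        total += len(ordered) - pos
        ordered.insert(pos, key)
    return total / len(keys)

N = 2_000
IDS_PER_MS = 10  # hypothetical: ten IDs generated in each millisecond
start_ms = 1_700_000_000_000

ordered_keys = [time_ordered_key(start_ms + i // IDS_PER_MS) for i in range(N)]
random_keys = [uuid.uuid4().bytes for _ in range(N)]

ordered_distance = mean_distance_from_tail(ordered_keys)
random_distance = mean_distance_from_tail(random_keys)
print(f"time-ordered: {ordered_distance:.1f} keys from the tail on average")
print(f"UUIDv4:       {random_distance:.1f} keys from the tail on average")
```

Time-ordered keys are not always appended exactly at the end, because keys from the same millisecond are ordered by their random part. They do stay within a few positions of the tail, whereas random keys land anywhere in the list.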
Trade-offs
That does not mean identifiers with time ordering are always better. There are cases where randomness is needed.
For example, if I want to make it hard to guess the creation time from an externally exposed ID, or if exposing ordering information is undesirable in a given domain, a random UUID can be more appropriate.
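How much a time-ordered ID reveals is easy to see with UUIDv7, whose leading 48 bits are a Unix timestamp in milliseconds. The sketch below reads that timestamp back from an ID. The helper name `uuid7_created_at` and the sample value are made up for illustration.

```python
import uuid
from datetime import datetime, timezone

def uuid7_created_at(value: uuid.UUID) -> datetime:
    """Read the leading 48 bits of a UUIDv7 as Unix milliseconds."""
    if value.version != 7:
        raise ValueError("not a UUIDv7")
    unix_ms = value.int >> 80  # drop the 80 bits below the timestamp
    return datetime.fromtimestamp(unix_ms / 1000, tz=timezone.utc)

# Hand-written sample: the first 12 hex digits encode 1_700_000_000_000 ms,
# and the remaining digits are arbitrary apart from the version and variant.
exposed_id = uuid.UUID("018bcfe5-6800-7abc-8def-0123456789ab")
created_at = uuid7_created_at(exposed_id)
print(created_at.isoformat())
```

Anyone who sees the ID can do the same, so a UUIDv7 or ULID in a URL exposes when the record was created, and comparing two IDs reveals which one came first.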
On the other hand, for internal primary keys or tables where write performance matters, sortable identifiers can be a better choice.
In the end, identifiers should be chosen by looking at performance, exposed information, and implementation convenience together.
Checkpoints
When choosing identifiers for a project, I check these points.
- Is the ID exposed to the outside?
- Is it okay if the creation order is somewhat visible?
- Does this table care a lot about write performance and index efficiency?
- Is the type easy to handle in both the database and the application?
- How will IDs be generated in a distributed environment?
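For the question about type handling, one concrete thing to check is the representation that actually gets stored. The sketch below only compares the two common forms of the same UUID in application code. Whether a given database stores it as a native UUID type, 16 raw bytes, or a 36-character string is something to verify per database.

```python
import uuid

key = uuid.uuid4()

as_text = str(key)    # canonical hyphenated form
as_bytes = key.bytes  # raw binary form

print(len(as_text), "characters as text")
print(len(as_bytes), "bytes as binary")

# Both forms convert back to the same value.
assert uuid.UUID(as_text) == key
assert uuid.UUID(bytes=as_bytes) == key
```

The text form is more than twice as large as the binary form, and that size is repeated in the primary key index and in every secondary index or foreign key that references it.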
Without answers to these questions, it is hard to say simply whether UUIDs are good or bad.
Takeaway
While looking at UUID primary keys and index performance, I saw that identifier choices also depend on the domain and usage conditions.
UUIDv4 has the advantage of being hard to guess, but it can be disadvantageous for index write patterns. ULID and UUIDv7 can reduce that problem through time-ordering properties, but they can also reveal some ordering information.
An identifier is not just a value. It is also a design element that affects how data is accumulated and queried.