This is a subject that has obviously generated a lot of volume and controversy over the years, and there still isn’t any single agreed upon system for naming entities in relational databases. Through years of experience I have developed a system that I believe has a lot of merits above and beyond the typical recommendations I have seen. My system is a little counter intuitive at first, but I think the value is easy to recognize.
Obviously, one of the most well known naming conventions in computer programming in general is Hungarian Notation, which uses the convention of prefixing an entity with an abbreviation indicating the type of entity that it is. In the RDBMS world this has typically been employed in the naming of high level databases entities like tables, stored procedures, views, and functions, etc., but hasn’t been widely used for the naming of things like columns or variables. If we had a table, a view, and a stored procedure, all of which stored or returned customer data, then using Hungarian Notation they may be called something like tbl_Customers, vw_Customers, and sp_Customers respectively.
There is widespread agreement in the database world that this type of naming convention isn’t preferable for an RDBMS, to a large extent because the prefix is mostly meaningless within each type of entity, i.e. prefixing all tables with tbl_ provides no differentiation between tables, it only provides differentiation between types of entities, but there again, we shouldn’t actually be trying to differentiate between types of entities. One of the purposes of views is for them to be interchangeable with tables. In fact, there are cases where views are used to replace legacy tables, i.e. a table needs to be deprecated or re-named, so a view with the same name as the table is created in order to allow legacy code to continue to operate. In addition, SQL Server, and other database systems have a concept of Synonyms, which are basically aliases and in many cases they are used in a way that would be totally inconsistent with Hungarian Notation. For example, you can use Synonyms to create aliases to entities in a different database in order to deploy the same exact code to multiple databases or instances, but have all of the code reference a single database on a single instance. If the underlying lying object being aliased is tbl_Customers, then the Synonym would have to be called tbl_Customers in the other databases as well, which would violate the naming rule, because according to the naming rule it should be called syn_Customers, or syn_tbl_Customers, etc.
So, what convention do I use, and why?
The naming convention that I use for container entities, i.e. tables, views, stored procedures, etc. is to use a prefix that denotes the function of the entity. Some standard prefixes that I use for tables are:
- data_ (Tables that the contain information of primary concern, i.e. in the case of customers, the actual customer names and attributes, etc.)
- join_ (Tables used for the creation of many-to-many relationships)
- ref_ (Reference tables that store look-up values, i.e. customer type, etc.)
For more specialized development I have also used the following:
- app_ (Tables containing data that will be appended another set of data)
- raw_ (Tables housing raw data imported from some external source)
- typed_ (Tables with appropriately defined data types hold data that was previously imported in a raw_ table that may or may not have had correctly defined types (i.e. maybe in the raw_ table date columns were imported into a char column, etc.))
- std_ (Tables that represents a “standard” to which other data has been mapped)
- sum_ (Summary tables that are aggregations of data held in other tables)
- dim_ (Typically convention used in a data warehouse for dimensions)
- fact_ (Typical convention used in a data warehouse for facts that are to be associated with dimensions)
Note that all of those conventions are for Tables. Some of them I also apply to other entities as well, such as app_ and sum_, both of which I also apply to Views. For Views I’m a little less consistent, sometimes I don’t use any prefix at all, but other times I do use them. The prefixes I use for views are less consistent because they have a lot to do with the specific application. For example, I may create a view that pivots data in a table for the purpose of indicating records that are to be excluded from some form of analysis. This view may be joined to another table or view to provide a flattened listing of the reasons why records are marked for exclusion. In this case I’d prefix the view with ex_, etc.
When it comes to stored procedures things get rather “unconventional”, in that here how I prefix the procedures depends on what exactly is being done. In some cases I’ve created procedures that are intended to be run in a sequence. In those cases the sequence is the prefix. There I’ve gone back to the GW Basic days of line number conventions, using prefixes like 010_InitializeTables, 020_AppendWhatever, 030_EtcEtc, etc. The purpose here is making it easy to create a new procedure called 021_ if needed to fit between 020_ and 030_.
In other cases, where sequencing isn’t an issue I’ve tended more to things likes load_, rpt_, seq_, etc. reflecting the primary function of the procedure.
So, what’s the point to all of this? Well its all about organization, which to a large degree is that database are all about. Because entities tend to be listed in alphabetical order, prefixing is a way to group similar things together. In addition, the prefix gives you some information about what the function of the entity is. Looking at a database that I have designed, its easy to know which tables are the reference tables, which tables house the core data, and which tables you need to use for many-to-many joins. Yes, you can get that information from an ERD, assuming one has been created, this just makes it easier and more obvious.
But now on to columns within a Table or View. Here is where my convention is a little less intuitive, but I think is quickly understandable. I tend to name almost everything backwards from the way that we normally say things in the English language. This is because in the English language we tend to put descriptors in front of objects, i.e. we describe the object before we identify the object in the English language, e.g. “red car”, “primary address”, “last name”, “customer id”.
What I do in naming columns is reverse this order, and put the object first, e.g. “address primary”, ”name last”, “id customer”. Of course I would use id_customer or idCustomer, etc. In my case I use all lower case with words separated by underscores. The purpose for this is for avoiding any possible case sensitivity issues. Windows and SQL Server are not case sensitive, but others systems are and some programming languages are, so given that you are “never sure” how the database may be used or whatever, I tend to make my column names all lower case to reduce potential problems. The use of case and isn’t a big issue and these practices depend on what the practice is where you work, or what the customer requests etc.
In addition to putting the object first in the column name, when I design tables I always place the primary key columns at the front of the table in the order of the clustered index, and all other columns are sequenced alphabetically. This makes finding what you are looking for in a table a lot easier, especially if there are more than 10 or so columns.
By putting the object first in the name it not only makes recognizing what a column is easier, it also again leads to better grouping, especially when columns are created in alphabetical order. This becomes even more important when data is de-normalized, for whatever reason, be it in a table, an export file, via a view, or in a OLAP cube.
And last but not least, of course we have the issue of Schemas. (You thought I was going to leave that out didn’t you?) Because security is tied to Schemas they are best used in relation to “who”. Schemas are basically containers into which entities can be placed based on how those entities should be accessed, and by whom. As far as I know there is no “naming convention” for Schemas, other than to try and keep them as short as possible. I try to keep schema names below 5 characters personally. Depending on the setting, I’ve also used Schemas to identify who created the entity, i.e. “data”, “eng”, “dba”, “ba”, etc. In other words, this identifies if a table was created by a Data Analyst, someone in Engineering, a DBA, a Business Analyst, etc. This is more useful in a data warehouse type setting or data staging setting, you wouldn’t use this in the design of a OLTP system, for example.
So now let’s look at what basic database might look like using the naming conventions I’ve described above. Here is what a really basic set of tables would look like using this convention:
And here is an ERD with some additional detail:
As you can see, this does a pretty good job of organizing like entities together. Some possible questions here are:
- In the join table, why is id_customer before id_address since “a” comes before “c”? Answer: Because the columns are in the order of the clustered index, which is puts customer first since that’s how the data would most typically be inserted and read.
- Why prefix everything in the address table with the word address? This is a preference issue, but I did it here so that when the data is de-normalized alongside the customer data the address columns would be clearly identifiable, for example if you were to create a View that returned customer name along with address.
- But the address table looks silly with address_city in front of address_line1. Yeah, and if you are manually scripting the tables this may be a place where, within a given prefix, you violate the alphabetical rule to put things into a logical order, however I often create my metadata separately and use a script generator that automatically sorts all the (non-PK) columns in alphabetical order when generating the CREATE TABLE script.
So there you have it. That’s the naming convention that I’ve developed over the years and I think it works really well. There are some kinks here and there and times when aspects of it can seem subjective (what’s a data table vs what’s a reference table, its usually clear, but not always), but overall its done a good job for me of clarifying database structure and making it easier to find what is being looked for. Where I’ve gotten the most push-back is in the column naming, but once people see it in context they get on board unless there is need to preserve an existing convention. When people hear id_x or name_last, they often say, “Why say name_last instead of last_name, sounds weird,” but then I explain it and they usually get it.