Hungarian Notation

  • A convention of variable-name prefixes, credited to Charles Simonyi and popularized by Microsoft.
  • The intention is to compensate for the weak typing of C / C++ by prepending info about the data type to the name of the variable, so that a programmer is less likely to, for instance, mistakenly use a pointer as if it were an integer variable, or vice versa.

Issues

  • Usage of Hungarian notation is very controversial.
  • The main drawback is that it exposes implementation details, and interferes with abstraction (the concept of hiding implementation details as much as possible).
  • If a program used to use short integers, and then changes to using long integers instead, then throughout the source code there will often be variable names with a prefix that implies a short integer, although the variables are now long integers.
  • Cases like the above (widespread changes of basic data types used) are especially common when porting software from 16-bit to 32-bit, or from 32-bit to 64-bit.
  • Using typedefs for basic data types can make it easy to change the basic data types used throughout a program, but in some cases renaming the variables after the fact is not an option, such as when some of the variables in question are part of a published API.
  • The classic example of this is as follows:
        LRESULT CALLBACK WindowProc(
            HWND   hwnd,    /* handle of window */
            UINT   uMsg,    /* message identifier */
            WPARAM wParam,  /* first message parameter */
            LPARAM lParam); /* second message parameter */
  • The above example is a 32-bit Windows API function that still bears some of the prefix information from the earlier, equivalent API function in 16-bit Windows.
  • In this example, "WPARAM wParam" seems to indicate that the parameter is a word (an unsigned 16-bit integer); but in 32-bit Windows WPARAM is defined as an unsigned 32-bit integer. In this case, both the name of the typedef, WPARAM, and the name of the variable, wParam (with a "w" prefix for "word"), are misleading and inappropriate.
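
The stale-prefix problem described above can be sketched in a few lines. This is a minimal illustration, not actual Windows code: the names MessageParam, Message, and prefix_is_stale are hypothetical, and the typedef is assumed to have been widened from 16 to 32 bits during a port while the field name stayed frozen.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical typedef, assumed to have been widened during a port.
// Before the port it was: typedef std::uint16_t MessageParam;
typedef std::uint32_t MessageParam;

struct Message {
    // "w" stands for "word" (16 bits). The name is assumed to be frozen by a
    // published API, so it could not follow the type when the type widened.
    MessageParam wParam;
};

// Returns true when wParam can hold a value too large for a 16-bit word,
// which means the "w" prefix no longer describes the real type.
bool prefix_is_stale() {
    Message msg;
    msg.wParam = 70000;  // exceeds the 16-bit maximum of 65535
    return sizeof(msg.wParam) > sizeof(std::uint16_t) && msg.wParam == 70000;
}

// Small self-check for the sketch above.
void check_stale_prefix() {
    assert(prefix_is_stale());
}
```

Changing the typedef took one line, but every use of the name wParam still carries the old type information.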

Standard Hungarian notation

Prefix  Data Type
------  ---------
a       Array
b       Boolean
C       Class
d       Double
g_      Global variable
h       Handle
i       Integer (index into)
l       Long
lp      Long pointer to
lpfn    Long pointer to function
m_      Object member variable
n       Integer (number of)
p       Pointer to
str     CString
sz      Zero terminated string
u       Unsigned integer

Possible compromise notation

Prefix  Data Type
------  ---------
g_      Global variable
h       Handle
m_      Object member variable
p       Pointer
pp      Pointer to a pointer
str     String template or user-defined string object
sz      C-style null-terminated string
  • The above compromise notation offers a middle ground: these prefixes are sometimes used by those who want to avoid most of the drawbacks of standard Hungarian notation, but who still find an advantage in using prefixes for certain kinds of data, in particular pointers, object member variables, and strings.

Strings

  • Of the above prefixes, only "str" and "sz" significantly expose implementation details.
  • However, the implementation details they expose do not relate to CPU-size porting (16-bit to 32-bit, or 32-bit to 64-bit).
  • Also, the difference in behavior between a standard C-style string and a template string (or user-defined string object) is great enough that a switch from one to the other would require extensive reworking of the source code, as well as altering the API of any relevant published functions or classes (i.e. the parts of the source code that sometimes "can't" be changed would have to be changed anyway in this case).
  • As such, there wouldn't be an unavoidable problem with variable names that wind up being inconsistent with the underlying data type.
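
The difference in behavior between the two kinds of string can be sketched as follows. This is only an illustration: std::string stands in for the "string template" case, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// "sz": a zero-terminated character buffer. Comparing contents needs strcmp;
// the == operator on two pointers would compare addresses instead.
bool same_text_sz(const char* szLeft, const char* szRight) {
    return std::strcmp(szLeft, szRight) == 0;
}

// "str": a string object. The == operator compares contents directly.
bool same_text_str(const std::string& strLeft, const std::string& strRight) {
    return strLeft == strRight;
}

// Small self-check: the same text, held both ways.
void check_string_prefixes() {
    const char szFirst[] = "window";
    const char szSecond[] = "window";
    assert(same_text_sz(szFirst, szSecond));

    const std::string strFirst = szFirst;
    const std::string strSecond = szSecond;
    assert(same_text_str(strFirst, strSecond));
}
```

Because the two kinds of string are handled so differently, the prefix tells the reader which set of operations applies.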

Global variables

  • It can be reasonably argued that global variables should never be used (or as close to never as possible); they interfere with modularity, since they introduce data that can be accessed or modified from anywhere in the code base.
  • Usage of the "g_" prefix can help draw special attention to such variables, when they do exist.
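
A minimal sketch of how the "g_" prefix makes shared state stand out next to a local variable. The names g_requestCount and record_request are hypothetical.

```cpp
#include <cassert>

// Hypothetical global counter; the "g_" prefix marks it as shared state.
int g_requestCount = 0;

// Adds one request to the global total and returns the new total.
int record_request() {
    int requestCount = 1;            // local: no prefix, visible only here
    g_requestCount += requestCount;  // global: the change is visible everywhere
    return g_requestCount;
}

// Small self-check: one call raises the global total by exactly one.
void check_global_prefix() {
    int before = g_requestCount;
    record_request();
    assert(g_requestCount == before + 1);
}
```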

Object member variables

  • Using the "m_" prefix for object member variables allows the names of a class's methods (functions) and arguments to be chosen freely, without worrying about whether they conflict with the names of the (usually private) member variables.
  • In this way, the API for the class (the public methods and their arguments) can be created without having to worry about the implementation details of the class (in particular, the names of the private member variables).
  • This also often eliminates the need to use awkward names for some of the public methods and arguments just so that they don't conflict with the names of the private member variables.
  • In a sense, the naming awkwardness is passed from the public API to the private implementation details.
  • This awkwardness in the names of the private member variables (the "m_" prefix) need not be a problem, though: in good OOP practice, private member variables are accessed only through getters and setters (e.g. "get_variable1", "set_variable1"), so the actual name of each private member variable appears in only two small methods.
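
The points above can be sketched with a small class. The Account class and its members are hypothetical, chosen only to show that arguments can take the natural names ("owner", "balance") without clashing with the members.

```cpp
#include <cassert>
#include <string>

class Account {
public:
    // Parameters use the natural names; "m_" keeps the members distinct.
    Account(const std::string& owner, long balance)
        : m_owner(owner), m_balance(balance) {}

    const std::string& get_owner() const { return m_owner; }
    long get_balance() const { return m_balance; }
    void set_balance(long balance) { m_balance = balance; }

private:
    std::string m_owner;
    long m_balance;
};

// Small self-check: callers never see the "m_" names.
void check_member_prefix() {
    Account account("Ada", 10);
    account.set_balance(25);
    assert(account.get_owner() == "Ada");
    assert(account.get_balance() == 25);
}
```

Without the prefix, the setter would need either an awkward argument name or an explicit this-> to tell the argument apart from the member.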

Pointers

  • Using the "p" prefix for pointers can help establish the existence of indirection: the variable has to be dereferenced to access the data, via the * operator (for pointers to basic data types) or the -> operator (for pointers to objects or structs).
  • It can also help establish the level of indirection; for instance, if a variable is a pointer to a pointer, the "pp" prefix indicates not only that the variable will need to be dereferenced, but that it will need to be dereferenced twice.
  • Using the pointer prefixes "p" and "pp" (and "ppp", and so on) helps show that indirection exists, without exposing the implementation details of the data that the pointer points to (i.e. whether the data in question is a long integer, a short integer, a boolean, and so forth).
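
A minimal sketch of the "p" and "pp" prefixes in use. The function and variable names are hypothetical; the pointee happens to be an int, but the prefixes say nothing about that.

```cpp
#include <cassert>

// Makes *ppTarget point at pReplacement. "pp" signals two levels of
// indirection, so a single dereference yields a pointer, not a value.
void retarget(int** ppTarget, int* pReplacement) {
    *ppTarget = pReplacement;
}

// Reads the value at the end of two levels of indirection.
int read_through(int** ppValue) {
    return **ppValue;
}

// Small self-check: follow the pointers before and after retargeting.
void check_pointer_prefixes() {
    int first = 1;
    int second = 2;
    int* pCurrent = &first;        // one dereference reaches the int
    int** ppCurrent = &pCurrent;   // two dereferences reach the int

    assert(read_through(ppCurrent) == 1);
    retarget(ppCurrent, &second);
    assert(*pCurrent == 2);
    assert(read_through(ppCurrent) == 2);
}
```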