Hungarian Notation
- A variable-naming convention (type prefixes) devised by Charles
Simonyi and popularized by Microsoft.
- The intention is to compensate for the weak typing of C / C++ by
prepending information about the data type to the name of the variable,
so that a programmer is less likely to, for instance, mistakenly use a
pointer as if it were an integer variable, or vice versa.
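- A minimal sketch of how such prefixes might read in practice. The
function SumValues and all of its variable names are hypothetical,
chosen only to illustrate the "p", "l", "n" and "i" prefixes:

```cpp
// Hypothetical helper: sums the first nCount elements of an array.
// Prefixes used: "p" pointer, "l" long, "n" number of, "i" index into.
long SumValues(const long* plValues, int nCount)
{
    long lTotal = 0;
    for (int iValue = 0; iValue < nCount; ++iValue)
    {
        // "pl" is a reminder that plValues is a pointer to long and
        // must be indexed (or dereferenced) before it is used as a number.
        lTotal += plValues[iValue];
    }
    return lTotal;
}
```

- With these names, an expression such as "lTotal += plValues" looks
wrong at a glance, because a "pl" name appears where an "l" name is
expected.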
Issues
- The use of Hungarian notation is controversial.
- The main drawback is that it exposes implementation details and so
interferes with abstraction (the concept of hiding implementation
details as much as possible).
- If a program originally used short integers and is later changed to
use long integers, the source code will often still contain variable
names whose prefix implies a short integer, although the variables are
now long integers.
- Such widespread changes of basic data types are especially common
when porting software from 16-bit to 32-bit, or from 32-bit to 64-bit.
- Using typedefs for basic data types can make it easy to change the
basic data types used throughout a program, but in some cases renaming
the variables after the fact is not an option, for example when some of
the variables in question are part of a published API.
- The classic example of this is as follows:
LRESULT CALLBACK WindowProc(
    HWND   hwnd,    /* handle of window */
    UINT   uMsg,    /* message identifier */
    WPARAM wParam,  /* first message parameter */
    LPARAM lParam); /* second message parameter */
- The above example is a 32-bit Windows API function that still bears
some of the prefix information from the earlier, equivalent API
function in 16-bit Windows.
- In this example, "WPARAM wParam" seems to indicate that the parameter
is a word (an unsigned 16-bit integer), but in 32-bit Windows WPARAM is
defined as an unsigned 32-bit integer.
- In this case, both the name of the typedef, WPARAM, and the name of
the variable, wParam (with a "w" prefix for "word"), are misleading and
inappropriate.
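- The same effect can be reproduced in miniature. The sketch below is
hypothetical (MSGPARAM, Message and NeedsMoreThanOneWord are invented
names, not part of the Windows API) and assumes a typedef that started
out as 16 bits and was later widened by editing a single line:

```cpp
#include <cstdint>
#include <limits>

// Hypothetical typedef. Assumed to have started as
//     typedef std::uint16_t MSGPARAM;
// and to have been widened later by changing only this line.
typedef std::uint32_t MSGPARAM;

// Hypothetical published struct. Renaming wParam would break every
// caller that accesses the field by name, so the "w" (word) prefix
// stays even though the field is no longer 16 bits wide.
struct Message
{
    unsigned uMsg;
    MSGPARAM wParam;
};

// Reports whether the value would not have fit in the original 16-bit type.
bool NeedsMoreThanOneWord(const Message& msg)
{
    return msg.wParam > std::numeric_limits<std::uint16_t>::max();
}
```

- The typedef made the type change a one-line edit, but the field name
still claims to be a word.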
Standard Hungarian notation
Prefix  Data Type
------  ---------
a       Array
b       Boolean
C       Class
d       Double
g_      Global variable
h       Handle
i       Integer (index into)
l       Long
lp      Long pointer to
lpfn    Long pointer to function
m_      Object member variable
n       Integer (number of)
p       Pointer to
str     CString
sz      Zero terminated string
u       Unsigned integer
Possible compromise notation
Prefix  Data Type
------  ---------
g_      Global variable
h       Handle
m_      Object member variable
p       Pointer
pp      Pointer to a pointer
str     String template or user-defined string object
sz      C-style null-terminated string
- The above compromise notation offers a middle ground: these prefixes
are sometimes used by those who want to avoid most of the drawbacks of
standard Hungarian notation, but who still find an advantage in using
prefixes for certain kinds of data, in particular pointers, object
member variables, and strings.
Strings
- Of the above prefixes, only "str" and "sz" significantly expose
implementation details.
- However, the implementation details exposed do not relate to CPU-size
porting (16-bit to 32-bit, or 32-bit to 64-bit).
- Also, the behavior of a standard C-style string differs so much from
that of a template string (or user-defined string object) that a switch
from one to the other would require extensive reworking of the source
code, as well as altering the API for any relevant published functions
or classes (i.e. the parts of the source code that sometimes "can't" be
changed would have to be changed anyway in this case).
- As such, variable names that end up inconsistent with the underlying
data type would not be an unavoidable problem: the names could be
corrected as part of the same rework.
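- A brief sketch of why the two string kinds are handled so
differently. The function names are hypothetical, and std::string
stands in for the "string template" of the table:

```cpp
#include <cstring>
#include <string>

// "sz": C-style null-terminated string. The length is found by
// scanning for the terminating '\0'.
std::size_t LengthOfSz(const char* szName)
{
    return std::strlen(szName);
}

// "str": string object. The length is reported by the object itself.
std::size_t LengthOfStr(const std::string& strName)
{
    return strName.size();
}

// Mixing the two kinds needs an explicit conversion or a C library
// call. Comparing szLeft == strRight.c_str() would compare addresses,
// not text.
bool SameText(const char* szLeft, const std::string& strRight)
{
    return std::strcmp(szLeft, strRight.c_str()) == 0;
}
```

- The prefixes make it visible at each call site which set of
operations applies to a given variable.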
Global variables
- It can be reasonably argued that global
variables should
never be used (or as close to never as possible); they interfere with
modularity, since they introduce data that can be accessed or modified
from anywhere in the code base.
- When such variables do exist, the "g_" prefix helps draw special
attention to them.
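- A minimal sketch, assuming a hypothetical connection counter (the
names g_openConnections, OpenConnection and CloseConnection are
invented for illustration):

```cpp
// Hypothetical global: the "g_" prefix marks it as data that any
// function in the program can read or modify.
int g_openConnections = 0;

void OpenConnection()
{
    ++g_openConnections;   // visibly touches shared state
}

void CloseConnection()
{
    // Guard against going negative if close is called too often.
    if (g_openConnections > 0)
    {
        --g_openConnections;
    }
}
```

- Every line that touches the shared counter is easy to find, both by
eye and by searching the code base for "g_".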
Object member variables
- Using the "m_" prefix for object member variables allows the names of
a class's methods (functions) and arguments to be chosen freely,
without worrying about whether they conflict with the names of the
(usually private) member variables.
- In this way, the API for the class (the
public methods and
their arguments) can be created without having to worry about the
implementation details of the class (in particular, the names of the
private member variables).
- This also often eliminates the need to use awkward names for some of
the public methods and arguments so that they don't conflict with the
names of the private member variables.
- In a sense, the naming awkwardness is
passed from the public
API to the private implementation details.
- This awkwardness in the names of the private member variables (the
"m_" prefix) does not have to be a problem though, because in good OOP
practice, private member variables will only be accessed by getters and
setters (e.g. "get_variable1", "set_variable1"); thus, the actual name
of each private member variable would only be used in two small
methods.
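- A minimal sketch of the idea, using a hypothetical Rectangle class
(the class and its method names are assumptions made for
illustration):

```cpp
class Rectangle
{
public:
    // The arguments can use the natural names "width" and "height"
    // because the members carry the "m_" prefix.
    Rectangle(int width, int height)
        : m_width(width), m_height(height) {}

    int  get_width() const      { return m_width; }
    void set_width(int width)   { m_width = width; }

    int  get_height() const     { return m_height; }
    void set_height(int height) { m_height = height; }

    // Other methods go through the getters, so the "m_" names appear
    // only in the constructor and the accessors.
    int area() const { return get_width() * get_height(); }

private:
    int m_width;
    int m_height;
};
```

- Without the prefix, a setter argument named "width" would shadow a
member named "width", and one of the two would need a less natural
name.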
Pointers
- Using the "p" prefix for pointers can help establish the existence of
indirection: the variable has to be dereferenced to access the data,
via the * operator (for pointers to basic data types) or the ->
operator (for pointers to objects or structs).
- It can also help establish the level of indirection; for instance, if
a variable is a pointer to a pointer, using the "pp" prefix indicates
not only that the variable will need to be dereferenced, but that it
will need to be dereferenced twice.
- Using the pointer prefixes "p" and "pp" (and "ppp", and so on) helps
show that indirection exists, but without exposing the implementation
details of the data that the pointer points to (i.e. whether the data
in question is a long integer, a short integer, a boolean, and so
forth).
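- A short sketch of the "p" and "pp" prefixes at work. All function
and type names here are hypothetical:

```cpp
// "p": one dereference reaches the value.
void SetValue(int* pValue, int newValue)
{
    *pValue = newValue;
}

// "pp": one dereference reaches the caller's pointer,
// which is redirected to a new target here.
void Retarget(int** ppValue, int* pNewTarget)
{
    *ppValue = pNewTarget;
}

// "pp": two dereferences reach the value itself.
int ReadThrough(int** ppValue)
{
    return **ppValue;
}

struct Point
{
    int x;
    int y;
};

// "p" on a pointer to a struct: members are reached with ->.
int SumCoordinates(const Point* pPoint)
{
    return pPoint->x + pPoint->y;
}
```

- In each case the prefix says how many dereferences are needed, while
the rest of the name says nothing about the type of the data pointed
to.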