Design a site like this with WordPress.com
Get started

Kara’s C++ STL Tutorial

A “Monty Python and the Holy Grail”-themed STL Tutorial

Version 1.3 © Kara Ottewell 1997-2020


Table of Contents

  1. Introduction
  2. Templates ite Domum
  3. What has the STL ever done for us ?
  4. Sequence Adapters
  5. Strings
  6. Iterators
  7. We are searching for the Associative Container
  8. Algorithms and Functions
  9. STL Related Web Pages
  10. Bibliography

Warning for the humour-impaired: Any strange references that you can’t understand are almost certainly a skit on Monty Python’s “Life of Brian”.

These notes formed part of an internal course on the STL which I was asked to give to my colleagues at Yezerski Roper

1. Introduction

The C++ Standard Template Library, generally referred to as the STL, saves you, the programmer, from having to re-invent the wheel. This course is aimed at programmers who have reasonable familiarity with the C++ programming language, and know about classes, constructors and the like. Avoiding esoteric functionality, the examples have been written to clearly demonstrate STL features. The sample programs also use Standard Library (as distinct from Standard Template Library) features like fstream and iostream, providing examples of how to use these. A discussion of the Standard Library as a whole is beyond the scope of this document – see Stroustrup and others in the bibliography for more information.

What motivated people to write the STL ? Many people felt that C++ classes were inadequate in situations requiring containers for user defined types, and methods for common operations on them. For example, you might need self-expanding arrays, which can easily be searched, sorted, added to or removed from without messing about with memory reallocation and management. Other Object-Oriented languages used templates to implement this sort of thing, and hence they were incorporated into C++.

Driving forces behind the STL include Alexander Stepanov and Meng Lee at Hewlett-Packard in Palo Alto, California, Dave Musser at General Electric’s Research Center in Schenectady, New York, Andrew Koenig, and of course “Mr C++” himself, Bjarne Stroustrup at AT&T Bell Laboratories.

The example programs are known to work on Alpha/VAX VMS 6.2 onward using DEC C++ 5.6, and Microsoft Visual Studio 2017 or greater. The free Community Edition is fine. Platform-specific #pragmas have been guarded with #ifdef _VMS or #ifdef _WIN32.

You can download the Microsoft Visual Studio solution, with all the samples as separate projects, as a .zip file here:  KarasSTLTutorial.zip. The original code was written in 1997, the STL was still in its infancy, and I’m slightly embarrassed about some of the code I wrote back then, but that was 23 years ago. The VS solution has precompiled headers switched off  to cut down on unnecessary files, and should not be regarded as a paragon of great coding. It is intended to be illustrative, with a dose of cringe-worthy humor thrown in. The standalone C and C++ files, more suitable for OpenVMS are downloadable as text files from this location , and as links to individual files throughout this tutorial.

To build under OpenVMS use the MAKE.COM command file. Just give a program name like example_1_1 as its argument and it will look for files with extension .CXX, .CPP, .C , in that order. If you provide the extension then it uses that. On Alphas, output files get an _ALPHA suffix. Here is an example:

$ @MAKE EXAMPLE_1_1 ! On an Alpha
  DEV$DISK:[KARA.WWW.STL]
CC/PREFIX=ALL EXAMPLE_1_1.C  -> EXAMPLE_1_1.OBJ_ALPHA
LINK EXAMPLE_1_1  -> EXAMPLE_1_1.EXE_ALPHA
$ @MAKE EXAMPLE_1_2 ! Now on a VAX
DEV$DISK:[KARA.WWW.STL]
CXX/ASSUME=(NOHEADER_TYPE_DEFAULT)/EXCEPTIONS/TEMPLATE_DEFINE=(LOCAL)
EXAMPLE_1_2.CXX -> EXAMPLE_1_2.OBJ
CXXLINK EXAMPLE_1_2 -> EXAMPLE_1_2.EXE

A slight buglet introduced in DEC C++ 5.6 for Alpha VMS means that you might get a warning in the CXXLINK step.

%LINK-W-NUDFSYMS, 1 undefined symbol:
%LINK-I-UDFSYM,         WHAT__K9BAD_ALLOCXV
%LINK-W-USEUNDEF, undefined symbol WHAT__K9BAD_ALLOCXV referenced
        in psect __VTBL_9BAD_ALLOC offset %X00000004
        in module MEMORY file SYS$COMMON:[SYSLIB]LIBCXXSTD.OLB;1

The undefined symbol is harmless and never referenced, but you can obtain the
official patch from ftp.service.digital.com. Download and run cxxae01056.a-dcx_axpexe
to unpack it, then whilst logged on as SYSTEM use @SYS$UPDATE:VMSINSTAL to install it.

Download individual sample programs using Right Click and “Save” on their links, or download all the examples in a .zip file. For Microsoft Visual Studio the solution and project files, as well as the examples, are provided in the Kara’s STL Tutorial .zip distribution.

The first two programs are implementations of expanding, integer arrays which
are sorted into value order. Example 1.1 is in ANSI C, and 1.2 is C++using the STL. I have tried to make the C program as general as possible by using typedef, but this is still not really adequate, as we will see.

Click on the link to see example_1_1.c

In the this program (see Kara’s C Course for an introduction to C) I used a typedef for the type of item I want to store and sort, and a get_array_space function to allocate and/or expand the available space. Even with very basic error handling get_array_space is rather ungainly. It handles different types of data if I change the typedef, but if I wanted more than one type of data storing, or even more than one buffer of the same type, I would have to write a unique get_array_space_type function for each. The compare_values function would also have to be rewritten, though this is also the case in the C++ code, for user defined types or pointers to values.

Click on the link to see example_1_2.cpp

Contrast the C++ program, Example 1.2, with the previous code. Using the STL, we invoke the vector template class, which allows us to store any data type we like in what is essentially a contiguous, self-expanding, random access array.

2. Templates ite Domum

This is (incorrect) Latin for “Templates Go Home !” and represents the ambivalence that some new C++ programmers feel towards this language feature. I hope that you will be persuaded of its usefulness after reading this section.

C++ supports a number of OOP (Object Oriented Programming) concepts. Broadly speaking, it supports encapsulation through the member functions and private or protected data members, inheritance by allowing classes to be derived from other
classes and abstract base classes, and polymorphism through virtual functions, function signatures and templates. Templates achieve polymorphism by allowing us to define classes or functions in a generic way, and let the compiler/linker generate an instantiation of the function or class using the actual types we require.

The STL is built, as its name suggests, on the C++ template feature. There are two types of template: function templates and class templates. Both perform similar roles in that they allow functions or classes to be specified in a generic form, enabling the function or class to be generated for any data type – user defined or built-in.

At first sight this might not appear to be very different from macros in the C language. In the C program Example 1.1 we could have made it more flexible by using a macro to define the comparison function.

#define COMPARE_VALUES( value_type ) \
value_type compare_values_##value_type( const void *a, const void *b ) \
{const value_type *first, *second; \
first = (value_type *)a; second = (value_type *)b; return( *first - *second );}
COMPARE_VALUES( float ) /* Generate function for floats, */
COMPARE_VALUES( double ) /* doubles and */
COMPARE_VALUES( int ) /* ints */
.
/* Pick comparison function */
qsort( v, nitems, sizeof(array_type), compare_values_int );

The same method can be used for structure generation. There are a number of drawbacks to this approach. You have to explicitly generate functions for all
the types you want to use, and there is no type checking in this particular
case, so you could easily pass compare_values_float when you meant
to compare integers, and you would have to be rigorous about your naming
convention. In addition, some people would argue that macros are not as
transparent, since you can’t see what they expand into until compilation time.

Templates avoid these problems. Because they are built into the language, they
are able to provide full type safety checking and deduce the types of their
arguments automatically, generating the code for whatever arguments are used.
C++ allows you to overload operators like < for user-defined types so the same template definition often suffices for built-in and user-defined classes. The following two sections discuss the syntax and use of template functions and classes.

Function Templates

Template functions have the following form:

template < template-argument-list >
function-definition

The template-argument-list is one or more type-names within the scope of
the template definition. In template functions the first argument is always a type, as in this code fragment.

template 
T mymin( T v1, T v2)
{
  return( (v1 < v2) ? v1 : v2 );
}

The above example uses class T but you can also use things like
template if you want to constrain a particular parameter to be of some known type. For example, if you were doing some sort of buffer class, you might want people to be able to pre-allocate a particular size of buffer,

template 
class MyBuffer {
 protected:
  T MyBufferObjects[iMaxObjects];
 public:
   & x(const T & x) { array[0] = x; return *this; }
  MyBuffer void func(const T & buffer)
  {
       // Whatever
  }
};

class StringBuffer : public MyBuffer // Defaults to 256
{
   // Whatever
}

class ScreenBuffer : public MyBuffer
{
   // Whatever
}

template 
class MyBuffer {
 protected:
  T MyBufferObjects[iMaxObjects];
 public:
   & x(const T & x) { array[0] = x; return *this; }
  MyBuffer void func(const T & buffer)
  {
       // Whatever
  }
};

class StringBuffer : public MyBuffer // Defaults to 256
{
   // Whatever
}

class ScreenBuffer : public MyBuffer
{
   // Whatever
}

You should be able to use the typename keyword in your function (or class) declaration, like this

// This may well give an error but is perfectly legal
template 
T mymin( T v1, T v2)
{
  return( (v1 < v2) ? v1 : v2 );
}

Stroustrup favours the class T format because it means fewer
keystrokes. Personally I would prefer the typename T form, but
won’t use it because it will give errors with some compilers.

Those of use who started programming using proper languages like
FORTRAN are used to the idea of the compiler selecting the correct function variant 🙂 Not many people using the FORTRAN MAX function bother to specify IMAX0,JMAX0,KMAX0 and so on. The compiler selects the specific function according to the arguments. Remember that the class T type doesn’t have to be a class. It can be a built-in type like int or float. The C++ compiler always tries to find a “real” function with matching arguments and return type before generating a function from the template, as in the following program.

Click on the link to see example_2_1.cpp

The “real” function signature matched the float case, and was used in preference to the template. The using namespace std line is necessary on Windows if we wish to avoid prefixing STL features with std::, e.g. std::cin. It is not necessary with VMS and DEC C++ 5.6, though it may be with future versions of DEC C++.

Class Templates

Template classes have the following form:

template <template-argument-list>
class-definition

The template-argument-list is one or more type-name within the scope of
the template definition. For example

Click on the link to see example_2_2.cpp

3. What has the STL ever done for us ?

“Well, yes, vectors, I mean obviously the vectors are good …”

“Don’t forget queues Reg, I mean, where would we be without properly organized
queues and iterators for any data type ?” General murmurs of agreement

“Yes, alright, apart from vectors and queues …”

“Sorts Reg – I hated having to code up a new sort routine for every class.” Here, here, etc.

“Right. So apart from vectors, queues and associated containers classes,
iterators, various useful algorithms and functions, what has the STL ever done
for us ?”

“Memory allocators Reg. We can allocate container memory using any scheme we
like, and change it without rewriting all the code. It keeps things in order” Reg loses his temper

“Order ? Order ? Oh shut up!”

At the end of this section, the waffle above should start making sense to you,
but is unlikely to become more humorous as a result of your studies.

There are three types of sequence containers in the STL. These, as their
name suggests, store data in linear sequence. They are the vector,
deque and list:

  • vector
  • deque
  • list

To choose a container, decide what sort of operations you will most
frequently perform on your data, then use the following table to help you.

Time overhead of operations on sequence containers
Operation Vector Deque List
Access 1st Element Constant Constant Constant
Access last Element Constant Constant Constant
Access “random” element Constant Constant Linear
Add/Delete at Beginning Linear Constant Constant
Add/Delete at End Constant Constant Constant
Add/Delete at “random” Linear Linear Constant

Each container has attributes suited to particular applications. The subsections and code samples below should further clarify when and how to use each type of sequence container.

Throughout this tutorial, I have given the #include file needed to use a feature immediately after the subsection heading. Note that some of the header names have changed since earlier versions of the STL, and the .h suffix has been dropped. Older books may refer to, for example, , which you should replace with  . If you include ANSI C headers, they should have the .h, e.g. . C++ versions of the ANSI C headers, prefixed by the letter “c” and minus the .h are becoming more widely available, but not all implementations currently support them, e.g. .

On OpenVMS systems a reference copy of the source code for the STL can be found in SYS$COMMON:[CXX$LIB.REFERENCE.CXXL$ANSI_DEF]. So for look in there for the file VECTOR.; . For Windows, open the C++ file in Visual Studio, right-click on the header file name being included, such as

#include <vector>

and choose “Open Document “, in this example.

Vector

#include <vector>

We introduced the vector in Example 1.2, where we used it
instead of an array. The vector class is similar to an array, and
allows array-type syntax, e.g. my_vector[2] . A vector
is able to access elements at any position (referred to as “random” access in
the preceding table) with a constant time overhead, O(1). Insertion or
deletion at the end of a vector is “cheap”. As with the
string, no bounds checking is
performed when you use operator [].

Insertions and deletions anywhere other than at the end of the
vector incur overhead O(N), N being the number of elements
in the vector, because all the following entries have to be shuffled along to
make room for the new entries, the storage being contiguous. Memory overhead of
a vector is very low and comparable to a normal array.

The table below shows some of the main vector functions.

  Some Vector Access Functions    Purpose
  ----------------------------    -------
  begin()                         Returns iterator pointing to first element
  end()                           Returns iterator pointing _after_ last element
  push_back(...)                  Add element to end of vector
  pop_back(...)                   Destroy element at end of vector
  swap( , )                       Swap two elements
  insert( , )                     Insert new element
  size()                          Number of elements in vector
  capacity()                      Element capacity before more memory needed
  empty()                         True if vector is empty
  []                              Random access operator

The next example shows a vector in use.

Click on the link to see example_3_1.cpp

Note how the element sort takes v.begin() and v.end() as range arguments. This is very common in the STL, and you will meet it again. The STL provides specialized variants of vectors: the bitset and valarray. The former allows a degree of array-like addressing for individual bits, and the latter is intended for numeric use with real or integer quantities. To use them, include the bitset or valarray header files (these are not always supported in current STL implementations). Be careful if you erase() or insert() elements in the middle of a vector. This can invalidate all existing iterators. To erase all elements in a vector use the clear() member function.

Deque

#include <deque>

The double-ended queue, deque (pronounced “deck”) has similar properties to a vector, but as the name suggests you can efficiently insert or delete elements at either end.

The table shows some of the main deque functions.

  Some Deque Access Functions    Purpose
  ---------------------------    -------
  begin()                        Returns iterator pointing to first element
  end()                          Returns iterator pointing _after_ last element
  push_front(...)                Add element to front of deque
  pop_front(...)                 Destroy element at front of deque
  push_back(...)                 Add element to end of deque
  pop_back(...)                  Destroy element at end of deque
  swap( , )                      Swap two elements
  insert( , )                    Insert new element
  size()                         Number of elements in deque
  capacity()                     Element capacity before more memory needed
  empty()                        True if deque is empty
  []                             Random access operator

A deque, like a vector, is not very good at inserting or deleting elements at random positions, but it does allow random access to elements using the array-like [] syntax, though not as efficiently as a vector or array. Like the vector an erase() or insert() in the middle can invalidate all existing iterators.

The following program shows a deque representing a deck of cards. The queue is double-ended, so you could modify it to cheat and deal off the bottom 🙂

Click to see example_3_2.cpp

The card game is a version of pontoon, the idea being to get as close to 21 as possible. Aces are counted as one, picture cards as 10. Try to modify the program to do smart addition and count aces as 10 or 1; use a vector to store your “hand” and give alternative totals.

Notice the check on the state of the input stream after reading in the character response. This is needed because if you hit, say, Z, the input stream will be in an error state and the next read will return immediately, causing a loop if you don’t clear cin to a good state.

List

#include <list>

Lists don’t provide [] random access like an array or vector, but are suited to applications where you want to add or remove elements to or from the middle. They are implemented as double linked list structures in order to support bidirectional iterators, and are the most memory-hungry standard container, vector being the least so. In compensation, lists allow low-cost growth at either end or in the middle.

Here are some of the main list functions.

  Some List Access Functions    Purpose
  --------------------------    ------
  begin()                       Returns iterator pointing to first element
  end()                         Returns iterator pointing _after_ last element
  push_front(...)               Add element to front of list
  pop_front(...)                Destroy element at front of list
  push_back(...)                Add element to end of list
  pop_back(...)                 Destroy element at end of list
  swap( , )                     Swap two elements
  erase(...)                    Delete elements
  insert( , )                   Insert new element
  size()                        Number of elements in list
  capacity()                    Element capacity before more memory needed
  empty()                       True if list is empty
  sort()                        Specific function because 
                                sort routines expect random access iterators

Click to see example_3_3.cpp

The loop over elements starts at yrl.begin() and ends just before yrl.end(). The STL .end() functions return iterators pointing just past the last element, so loops should do a != test and not try to dereference this, most likely invalid, position. Take care not to reuse (e.g. ++) iterators after they have been used with erase() – they will be invalid. Other iterators, however, are still valid after erase() or insert().

Container Caveats

Be aware that copy constructors and copy assignment are used when elements are added to and (in the case of the vector and deque) deleted from containers, respectively. To refresh your memories, copy constructor and copy assignment member functions look like this example:

class MyClass {
public:
  .
// Copy constructor
MyClass( const MyClass &mc )
{
  // Initialize new object by copying mc.
  // If you have *this = mc , you'll call the copy assignment function
}
// Copy assignment
MyClass & operator =( const MyClass &mcRHS )
{
// Avoid self-assignment
if ( this != &mcRHS ) {
// Be careful not to do *this = mcRHS or you'll loop
.
}
return( *this );
}
};

When you put an object in a container, the copy constructor will be called. If you erase elements, then destructors and copy assignments (if other elements need to be shuffled down) will be called. see supplementary example, RefCount.cpp for a demonstration of this. Another point to bear in mind is that, if you know in advance how many elements you are going to add to a container, you can reserve() space, avoiding the need for the STL to reallocate or move the container.


  vector things;
  things.reserve( 30000 );
  for ( ... ) {
    things.push_back( nextThing );
  }

The above code fragment reserves enough space for 30000 objects up front, and produced a significant speed up in the program.

Allocators

Allocators do exactly what it says on the can. They allocate raw memory, and return it. They do not create or destroy objects. Allocators are very “low level” features in the STL, and are designed to encapsulate memory allocation and deallocation. This allows for efficient storage by use of different schemes for particular container classes. The default allocator, alloc, is thread-safe and has good performance characteristics. On the whole, it is best to regard allocators as a “black box”, partly because their implementation is still in a state of change, and also because the defaults work well for most applications. Leave well alone!

4. Sequence Adapters

Sequence container adapters are used to change the “user interface” to other STL sequence containers, or to user written containers if they satisfy the access function requirements. Why might you want to do this ? Well, if you wanted to implement a stack of items, you might at first decide to base your stack class on the list container – let’s call it ListStack – and define public member functions for push(), pop(), empty() and top(). However, you might later decide that another container like a vector might be better suited to the task. You would then have to define a new stack class, with the same public interface, but based on the vector, e.g. VectorStack, so that other programmers could choose a list or a vector based queue. It is obvious that the number of names for what is essentially the same thing start to mushroom. In addition, this approach rules out the programmer using his or her own underlying class as the container.

Container adapters neatly solve this by presenting the same public interface irrespective of the underlying container. Being templatized, they avoid name proliferation. Provided the container type used supports the operations required by the adapter class (see the individual sections below) you can use any type for the underlying implementation. It is important to note that the adapters provide a restricted interface to the underlying container, and you cannot use iterators with adapters.

Stack

#include <stack>

The stack implements a Last In First Out, or LIFO structure, which provide the public functions push(), pop(), empty() and top(). Again, these are self explanatory – empty returns a bool value which is true if the stack is empty. To support this functionality stack expects the underlying container to support push_back(), pop_back(), empty() or size() and back()

  Container Function    Stack Adapter Function
  ------------------    ----------------------
  back()                top()
  push_back()           push()
  pop_back()            pop()
  empty()               empty()
  size()                size()

You would be correct in surmising that you can use vector, deque or list as the underlying container type. If you wanted a user written type as the container, then if provided the necessary public interfaces, you could just “plug” it into a container adapter.

Example 4.1 demonstrates a stack implemented with a vector of pointers to char. Note that the syntax of using container adapters differs from that shown in Saini and Musser or Nelson’s book, and is based on the December 1996 Working Paper of the ANSIC++ Draft Standard.

Click to see example_4_1.cpp

Note how the stack declaration uses two arguments. The first is the type of object stored, and the second is a container of the same type of object.

Queue

#include <queue>

A queue implements a First In First Out, or FIFO structure, which provides the public functions push(), pop(), empty(), back() and front() ( empty() returns a bool value which is true if the queue is empty). To support these, queue expects the underlying container to have push_back(), pop_front(), empty() or size() and back()

  Container Function    Queue Adapter Function
  ------------------    ---------------------
  front()               front()
  back()                back()
  push_back()           push()
  pop_front()           pop()
  empty()               empty()
  size()                size()

You can use deque or list as the underlying container type, or a user-written type. You can’t use a vector because vector doesn’t support pop_front(). You could write a pop_front() function for vector, but this would be inefficient because removing the first element would require a potentially large memory shuffle for all the other elements, taking time O(N).

The following code shows how to use a queue.

Click to see example_4_2.cpp

Note how we haven’t given a second argument in the queue declaration, but used the default deque given in the header file.

Priority Queue

#include <queue>

A priority_queue, defined in the queue header, is similar to a queue, with the additional capability of ordering the objects according to a user-defined priority. The order of objects with equal priority is not really predictable, except of course, they will be grouped together. This might be required by an operating system process scheduler, or batch queue manager. The underlying container has to support push_back(), pop_back(), empty(), front(), plus a random access iterator and comparison function to decide priority order.

  Container Function    Priority Queue Adapter Function
  ------------------    -------------------------------
  front()               top()
  push_back()           push()
  pop_back()            pop()
  empty()               empty()
  size()                size()
  [] random iterators   Required to support heap ordering operations

Hence a vector or a deque can be used as the underlying container, or a suitable user-provided class.

The next sample program demonstrates a priority_queue implemented with a vector of task objects. Note that the syntax of using container adapters differs from that shown in Saini and Musser or Nelson’s book.

Click to see example_4_3.cpp

Example 4.3 program shows a user-defined comparison function object (discussed later), the only member of the PrioritizeTasks class. This is used to determine the relative priority of tasks and, like the << operator, is made a friend of the TaskObject class, with the second parameter being a const TaskObject & ,  so that it can access the private data members. When the tasks are pulled off the priority_queue, they are in our notional execution order, highest priority first.

5. Strings

#include <string>

A member of the C++ standards committee was allegedly told that if strings didn’t appear pretty darn quickly, then there was going to be a lynching. There hasn’t been a lynching, and whilst we can debate the merits of this, I think there is general agreement that it is a good thing to have strings at last. Those of us who started programming with proper languages, like FORTRAN, have long criticized the rather ugly syntax of C string manipulation; “What ? You have to call a function to add two strings ?” being a typical comment.

The C++ string template class is built on the basic_string template. Providing much of the functionality of the container classes like vector, it has built in routines for handling character set conversion, and wide characters, like NT’s Unicode. The string class also provides a variety of specialized search functions for finding substrings. The characteristics of the character set stored in the string are described by the char_traits structure within the string, there being a different definition of this for each type of character set. On the whole you needn’t concern yourself too much with these details if you are using strings of ASCII characters. Strings, like the vector, expand as you add to them, which is much more convenient than C-style strings, where you either have to know how big they will be before you use them, or malloc and realloc. The largest possible string that can be accommodated is given by the max_size() access function.

  Some String Access Functions    Purpose
  ----------------------------    -------
  find(...)                       Find substring or character, start at start
  find_first_of(...)              Find first occurrence of any characters in
                                  given set, starting from start of string
  find_last_of(...)               Find last occurrence of any characters in
                                  given set, starting from start of string
  find_not_first_of(...)          Find first occurrence of characters _not_
                                  in given set, starting from start of string
  find_last_not_of(...)           Find last occurrence of characters _not_ in
                                  given set, starting from start of string
  rfind(...)                      Find substring or character, start at end
  size()                          Number of elements in vector
  []                              Random access to return a single character
                                  - no bounds checking
  at(...)                         Random access to return a single character
                                  - with bounds checking
  +                               Concatenate strings
  swap( , )                       Swap two strings
  insert( , )                     Insert a string at the specified position
  replace(...)                    Replace selected substring with another string

The string provides the highest level of iterator functionality, including [] random access. Hence all relevant standard algorithms work with string. You can sort, reverse, merge and so on. Specialized versions of some algorithms, like swap(), are provided for strings to take advantage of certain optimization techniques. The operator [] allows you to access a single character in a string, but without any bounds checking. Use the at() function if you want bounds checking. The operator+ allows easy string concatenation, so you can now do things like


  string firstname, lastname, name;
  .
  name = firstname + " " + lastname;

or


  name = firstname;
  name += " ";
  name += lastname;

Easily understandable documentation on the string class is still a bit thin on the ground at the moment, so I have compiled some sample code to illustrate the main facilities.

Click to see example_5_1.cpp

The next program puts some of the string functions to use in a simple expression evaluator, which takes arithmetic-style expressions. It also shows the at() function, which unlike operator[], at() will throw an out_of_rangeexception for a bad index. Try calculating the rest energy of an electron and proton (E = m*c^2).

Click to see example_5_2.cpp

The expression evaluator above introduces maps, discussed later. Here they are used to allow us to get a numeric value from the symbolic name stored in a string.

6. Iterators

#include <iterator>

An iterator you will already be familiar with is a pointer into an array.


  char name[] = "Word";
  char ch, *p;

  p = name;   // or &name[0] if you like
  ch = p[3];  // Use [] for random access
  ch = *(p+3);// Equivalent to the above
  *p = 'C';   // Write "through" p into name
  while ( *p && *p++ != 'r' ); // Read name through p, look for letter 'r'

Looking at the above code sample shows how flexible and powerful iterators can be. The above code fragment uses p in at least 5 different ways. We take it for granted that the compiler will generate the appropriate offset for array elements, using the size of a single element.

The STL iterators you’ve already met are those returned by the begin() and end() container access functions, that let you loop over container elements. For example:


  list l;
  list::iterator liter; // Iterator for looping over list elements
  for ( liter = l.begin(); liter != l.end(); ++liter )
  {
    *liter = 0;
  }

The end-of-loop condition is slightly different to normal. Usually the end condition would be a less than < comparison, but as you can see from the table of iterator categories below, not all iterators support <, so we increment the iterator from begin() and stop just before it becomes equal to end(). It is important to note that, for virtually all STL purposes, end() returns an iterator “pointing” to an element just after the last element, which it is not safe to dereference, but is safe to use in equality tests with another iterator of the same type. The pre-increment ++ operator can sometimes yield better performance because there is no need to create a temporary copy of the previous value, though the compiler usually optimizes this away.

Iterators are a generalized abstraction of pointers, designed to allow programmers to access different container types in a consistent way. To put it more simply, you can think of iterators as a “black box” between containers and algorithms. When you use a telephone to directly dial someone in another country, you don’t need to know how the other phone system works. Provided it supports certain basic operations, like dialling, ringing, reporting an engaged tone, hanging up after the call, then you can talk to the remote person. Similarly, if a container class supports the minimum required iterator types for an algorithm, then that algorithm will work with the container.

This is important because it means that you can use algorithms such as the sort and random_shuffle we’ve seen in earlier examples, without their authors having to know anything about the containers they are acting on, provided we support the type of iterator required by that algorithm. The sort algorithm, for example, only needs to know how to move through the container elements, how to compare them, and how to swap them. There are 5 categories of iterator:

  • Random access iterators
  • Bidirectional iterators
  • Forward iterators
  • Input iterators
  • Output iterators

They are not all as powerful in terms of the operations they support – most don’t allow [] random access, as we’ve seen with the difference between vector and list. The following is a summary of the iterator hierarchy, most capable at the top, operations supported on the right.

                      The Iterator Hierarchy

           Iterator Type                   Operations Supported

                 ^               +-------------------------------------+
                / \               \                                   /
               /   \               \                                 /
              /     \               \        == !=  >=      []      /
             /       \               \                             /
            / Access  \               \            =*p            /
           /-----------\               \-------------------------/
          /Bidirectional\               \       == != ++ --     /
         /               \               \      *p= -> =*p     /
        /-----------------\               \-------------------/
       /      Forward      \               \     == != ++    /
      /                     \               \   *p= -> =*p  /
     /-----------+-----------\               \-------+-----/
    /Input       |      Output\               \ == !=| ++ /
   /             |             \               \++ ->|*p=/
  +--------------+--------------+               \ =*p|  /
                                                 \   | /
                                                  \  |/
                                                   \ /
                                                    v

The higher layers have all the functionality of the layers below, plus some extra. Only random iterators provide the ability to add or subtract an integer to or from the iterator, like *(p+3). If you write an iterator it must provide all the operations needed for its category, e.g. if it is a forward iterator it must provide ==, !=, ++, *p=, -> and =*p. Remember that ++p and p++ are different. The former increments the iterator then returns a reference to itself, whereas the latter returns a copy of itself then increments.

Operators must retain their conventional meaning, and elements must have the conventional copy semantics. In a nutshell, this means that the copy operation must produce an object that, when tested for equality with the original item, must match. Because only random iterators support integer add and subtract, all iterators except output iterators provide a distance() function to find the “distance” between any two iterators. The type of the value returned is


template typename iterator_traits::difference_type

This is useful if, for example, you find() a value in a container, and want to know the “position” of the element you’ve found.


  map::iterator im;
  map::difference_type dDiff;
  im = my_map.find( key );
  dDiff = distance( my_map.begin(), im );

Of course, this operation might well be inefficient if the container doesn’t support random access iterators, since in that case it will have to “walk through” the elements comparing the iterators.

Just as you can declare pointers to const objects, you can have iterators to const elements. The const_ prefix is used for this purpose, e.g.

The iterator_traits for a particular class is a collection of information, like the “iterator tag” category, which help the STL “decide” on the best algorithm to use when calculating distances. The calculation is trivial for random iterators, but if you only have forward iterators then it may be a case of slogging through a linked list to find the distance. If you write a new class of container, then this is one of the things you must be aware of. As it happens, the vector, list, deque, map and set all provide at least Bidirectional iterators, but if you write a new algorithm, you should not assume any capability better than that which you really need. The lower the category of iterator you use in your algorithm, the wider the range of containers your algorithm will work with.

Although the input and output iterators seem rather poor in capability, in fact they do add the useful ability to be able to read and write containers to or from files. This is demonstrated in the program below, and again in Example 7.2.

Click to see example_6_1.cpp

The result of the possible restrictions on an iterator is that most algorithms have two iterators as their arguments, or (perhaps less safely) an iterator and a number of elements count. In particular, when you are using iterators, you need to be aware that it isn’t a good idea to test an iterator against NULL, or to test if one iterator is greater than another. Testing for equality or inequality is safe except for output iterators, which is why the loops in the example code use iterator != x.end() as their termination test.

Iterator Adapters

Like the container adapters, queue, priority_queue
and stack, iterators have adapters too. There are three types:

  • Reverse iterators
  • Insert iterators
  • Raw storage iterators

The reverse iterator reverses the behaviour of the ++ and -- operators, so you can write code like this:

Standard containers all provide rbegin() and rend() functions to support this kind of thing.

The insertions iterators will, depending on the type of container, allow insertion at the front, back or middle of the elements, using front_insert_iterator, back_insert_iterator or insert_iterator. Because you might just as well use container.push_back() and so forth, their main use is as the return value of functions like front_inserter(), back_inserter and inserter, which modify how a particular algorithm should work.

Raw storage iterators are used for efficiency when performing operations like copying existing container elements to regions of uninitialized memory, such as that obtained by the STL functions get_temporary_buffer and return_temporary_buffer. Look in the algorithm header for examples of iterator use.

7. We are searching for the Associative Container

“We’ve already got one !”

Mumbles of “Ask them what it looks like”

“Well what does it look like ?”

“It’s a verra naice !”

There are four types of associative container in the STL. Associative containers are used to store objects that we want to be able to retrieve using a key. We could use a map as a simple token/value database, where a token might be a character string, and the value might be an integer.

Associative containers store items in key order, based on a user-supplied comparison function, and the multi variants allow duplicate keys. Lookup is O(logN), N being the number of items stored. The associative containers are:

  • map
  • multimap
  • set
  • multiset

All four associative containers store the keys in sorted order to facilitate fast traversal. For built-in types the Compare function can simply be a suitable STL function object, e.g. map . If you are storing pointers to objects rather than the objects themselves, then you will need to provide your own comparison function object even for built-in types. The multi variant of the container is able to store more than one entry with the same key, whereas map and set can only have one entry with a particular key.

In Stroustrup’s book he shows how to make a hash_map variant of map. When working well, even with large data sets this can perform lookups in O(1) time, compared to O(logN) performance from the map. However, hashing can exhibit pathological behaviour if many keys hash to the same value, and if hash table expansion is required, that can be a slow operation.

Map and Multimap

#include <map>

A map is used to store key-value pairs, with the values retrieved using the key. The multimap allows duplicate keys, whereas maps insist on unique keys. Items in a map are, when they are dereferenced through an iterator for example, returned as a pair, which is a class defined in the utility header. The pair has two members, first and second which are the key and the data respectively. The pair is used throughout the STL when a function needs to return two values.

  Some Map Access Functions    Purpose
  -------------------------    -------
  begin()                      Returns iterator pointing to first element
  end()                        Returns iterator pointing _after_ last element
  swap( , )                    Swap two elements
  insert( , )                  Insert a new element
  size()                       Number of elements in map
  max_size()                   Maximum possible number of elements in map
  empty()                      True if map is empty
  []                           "Subscript search" access operator

In the sample program below, which uses first and second, a list of tokens and values in the form

pi = 3.1415926535898
c = 299792459.0

are read in from the file tokens.dat, then you are prompted to enter a token name for which the corresponding value is displayed. Because map supports the [] subscript operator, you can access the stored value using the key as the subscript.

Click to see example_7_1.cpp

In Example 5.2 we used the following lookup method with a map

This is fine where we know that item is definitely in symbol_values[], but generally you must use the find(...) function and test against end(), which is the value returned if the key doesn’t exist.

Several variants of an insert() function exist for the map. The single argument version of insert allows you test whether the item was already in the map by returning a pair. If successful the .second bool value will be true and the iterator will “point” at the inserted item. On failure .second will be false and the iterator will point at the duplicate key that caused the insertion to fail.

The map can only store one value against each key. Because each key can only appear once, if you try and add a second instance of the same key, then that will supercede the existing one. Edit the tokens.dat file used by Example 7.1 and convince yourself that this is the case. In situations where this restriction is not acceptable, the multimap should be used.

Set and Multiset

#include <set>

The set stores unique keys only, i.e. the key is the value. Here are some of the set access functions:

  Some Set Access Functions    Purpose
  -------------------------    ------
  begin()                      Returns iterator pointing to first element
  end()                        Returns iterator pointing _after_ last element
  swap( , )                    Swap two elements
  insert( , )                  Insert a new element
  size()                       Number of elements in set
  max_size()                   Maximum possible number of elements in set
  empty()                      True if set is empty

Like map, set supports the insert() function. Entries are kept in order, but you can provide your own comparison function to determine that order. Useful algorithms operating on a set are includes(), set_union(), set_intersection(), set_difference() and set_symmetric_difference(). The set supports bidirectional iterators, all set iterators are const_iterators, even if you declare them as set::iterator, so watch out for that.

Click to see example_7_2.cpp

This example also shows the use of an output iterator which in this case is directing the output to cout, but this could just as well be a file. The set can only store unique keys, hence if you try and insert a second instance of the same key a failure will result. The single argument version of insert( const value_type& ) returns pair( it, true or false) with the same meaning as for map. Remember that you can define what equality means, so the situation may arise where two “identical” set elements have different data members. In situations where this restriction is not acceptable, the multiset should be used.

8. Algorithms and Functions

We’ve already met and used several of the STL algorithms and functions in the example programs, sort(...) being one of them. In the STL, algorithms are all template functions, parameterized by iterator types. For example, sort(...) might be implemented like this:


template 
inline void sort (RandomAccessIter first, RandomAccessIter last)
{
    if (!(first == last)) {
      //     Do the sort
    }
}

Because algorithms only depend on the iterator type they need no knowledge of the container they are acting on. This allows you to write your own container and, if it satisfies the iterator requirements, the STL algorithms will work with a container type “unknown” when they were written. Because the algorithms are all templatized, the compiler should be able to generate inline code to do the operation just as efficiently as if you had “hand coded” the routine.

Many algorithms need a little help from the programmer to determine, for example, whether one item in a container is greater, less than or equal to another. This is where function objects come into the picture. Function objects are used similarly to function pointers in C. We have seen one example of function pointers in qsort used by Example 1.1. In OSF/Motif programs we often need to supply a “callback function” that will be executed when a particular event is seen by a “Widget”, e.g.someone clicking on a button widget. In C we might do this:


typedef void (*CallbackProcedure)( Context c, Event e, Userdata u);
struct {
  CallbackProcedure callback;
  Userdata udat;
} CallbackRecord;
      .
/*  My function to be called when button is pressed */
void ButtonPressedCB( Context c, Event e, Userdata u);
      .
  CallbackRecord MyCallbackList[] = { {ButtonPressedCB, MyData}, {NULL,NULL} };
      .
  SetButtonCallback( quit_button, MyCallbackList );

Problems with this approach are the lack of type safety, the overhead associated with indirecting function calls, lack of inline optimization, and problems with interpreting “user data” which tends to end up being a (void *) pointer. STL function objects avoid these problems because they are templatized, so provided the function object is fully defined at the point of use, the compiler can generate the code inline at compilation time. User data can be kept in the function object, and maintains type safety. You can, of course, pass function pointers to algorithms if you wish, but in the following sections it should become apparent that function objects are generally a better idea.

Algorithms

#include <algorithm></span

We all know that an algorithm is abstract logical, arithmetical or computational procedure that, if correctly applied, ensures the solution of a problem. But what is an STL algorithm ? STL algorithms are template functions parameterized by the iterator types they require. In the iterators section I likened iterators to a “black box” that allowed algorithms to act on any container type which supported the correct iterators, as shown in the diagram below.

    +------------------+                     +------------------+
    |                  +---------------------+                  |
    |    Algorithms    |    |    Containers    |
    |                  +---------------------+                  |
    +------------------+                     +------------------+

The “abstraction layer” provided by iterators decouples algorithms from containers, and vastly increases the capability of the STL. Not all containers support the same level of iterators, so there are often several variants of the same algorithm to allow it to work across a range of containers without sacrificing speed benefits for containers with more capable iterators. The appropriate version of the algorithm is automatically selected by the compiler using the iterator tag mechanism mentioned earlier. It does this by using the iterator_category() template function within a “jacket definition” of the algorithm, and a rather convoluted mechanism which you don’t really need to worry about (see Mark Nelson’s book pages 338-346 if you really want to know the grisly details).

Something you should worry about is whether the containers you are using support the iterators you need for an algorithm. The best way to determine this is to use a reference book, or look at the algorithm header file. For example, the min_element algorithm will have a definition similar to this:


template 
ForwardIterator min_element (ForwardIterator first, ForwardIterator last);

Hence it requires a container that supports at least forward iterators. It can lead to strange errors if you try an algorithm with a container that doesn’t provide the necessary iterators, because the compiler will try and generate code from the various templates and get confused before collapsing with an error. On VMS, this might result in a lot of

%CXX-E-PARMTYPLIST, Ill-formed parameter type list.
%CXX-E-BADTEMPINST, Previous error was detected during the instantiation of ..
%CXX-E-OVERLDFAIL, In this statement, the argument list .. matches no ..

errors. In Windows you tend to get lots of

.. <`template-parameter-1',`template-parameter-2', ..
.. could not deduce template argument for  ..
.. does not define this operator or a conversion to a type
     acceptable to the predefined operator

Try compiling the example code and see what errors you get.

Click to see example_8_1.cpp

There are 60 different algorithms in 8 main categories in the STL. See Stroustrup pages 507-511 for a full list of all the functions.

  • Nonmodifying Sequence Operations – these extract information,
    find, position within or move through elements but don’t change
    them, e.g. find() .
  • Modifying Sequence Operations – these are miscellaneous functions
    that do change the element they act on, e.g. swap(),
    transform(), fill(), for_each() .
  • Sorted Sequences – sorting and bound checking functions,
    e.g. sort(), lower_bound() .
  • Set Algorithms – create sorted unions, intersections and so on,
    e.g. set_union(), set_intersection() .
  • Heap Operations – e.g.
    make_heap(), push_heap(), sort_heap() .
  • Minimum and Maximum – e.g. min(), max(),
    min_element(), max_element() .
  • Permutations – e.g. next_permutation(),
    prev_permutation() .
  • Numeric – include <numeric< for general
    numerical algorithms, e.g. partial_sum() .

Some of the algorithms, like unique() (which tries to eliminate adjacent duplicates) or replace(), can’t simply eliminate or replace elements because they have no knowledge of what the elements are. What they actually do is shuffle the unwanted elements to the end of the sequence and return an iterator pointing just past the “good” elements, and it is then up to you to erase() the others if you want to. To get round this several algorithms have an _copy suffix version, which produces a new sequence as its output, containing only the required elements.

Algorithms whose names end with the _if suffix, only perform their operation on objects that meet certain criteria. To ascertain whether the necessary conditions, known as predicates, have been met, you pass a function object returning a bool value. There are two types of predicate: Predicate and BinaryPredicate. Predicates dereference a single item to test, whereas BinaryPredicates dereference two items which they might compare for instance.

template 
void count_if( InputIterator first, InputIterator last, Predicate pred,Size& n);

This will return the number of objects in the range first to just before last that match the Predicate function object pred , which takes one argument – a reference to the data type you are checking. The next algorithm requires a BinaryPredicate.

template 
ForwardIterator adjacent_find (ForwardIterator first, ForwardIterator last,
                               BinaryPredicate binary_pred);

This will look in the range first to just before last for two adjacent objects that “match”. If no match is found, then it returns last. Because a match is determined by the BinaryPredicate function object binary_pred , which takes two arguments (references to the appropriate data types), you can match on any conditions you like. In fact, there are two versions of adjacent_find : one just requires first and last and uses the == operator to determine equality, and the one above which gives you more control over the match test.

With the information above, you should now be able to look at an algorithm in the header file or reference manual, and determine what sort of function object, if any, you need to provide, and what sort of iterators your container must support if the algorithm is to be used on it.

Functions and Function Objects

#include <functional>

objects are the STL’s replacement for traditional C function pointers, and if you look at the STL algorithms, they are written as they would be if function objects were function pointers. This is why you can use function pointers (or plain, old functions)with the correct argument signature if you wish, though function objects offer several advantages, as we will see.

We have already seen a function object used in Example 4.3 to compare two task objects. This is superior to the qsort style comparison function in many respects. The function object provides type safety, allows an object’s copy constructors to be used if necessary (rather than just doing a binary copy), and doesn’t require that the objects be contiguous in memory – it can use the appropriate iterators to walk through the objects. Function objects can be used to “tuck away” data that would otherwise have to be global, or passed in a “user data” pointer. The usual template features like inline optimization and automatic code generation for different data types also apply.

“Why a ‘function object’?”, you might ask. What would be wrong with using a function template like this:


template 
bool is_less_than( const T &x, const T &y ) { return x < y; };
    .
    sort( first, last, is_less_than() );

Unfortunately this is not legal C++. You can’t instantiate a template function in this way (Stroustrup page 855). The correct thing to do is to declare a template class with operator() in it.


template 
class is_less_than {                            // A function object
  bool operator()( const T &x, const T &y ) { return x < y;}
};
    .
  sort( first, last, is_less_than() );

This is legal because we have instantiated the function object for MyObjects. There are three main uses for function objects within the STL. The comparison and predicate function objects which we have met already return a bool value indicating the result of a comparison, e.g. one object greater than another, or telling an algorithm whether to perform a conditional action, e.g. remove all objects with a particular attribute. The numeric function objects perform operations like addition, subtraction, multiplication or division. These usually apply to numeric types, but some, like +, can be used with strings. Several function objects are built in to the STL, such as plus, minus, multiplies, divides, modulus, negate, equal_to, not_equal_to, greater, and so on. See the header file for a complete list. If the data type defines the correct operator, you can use the pre-defined template function objects like this:


  some_algorithm( first, last, greater() );

so you don’t always need to create your own function object from scratch. Try and use one of the library versions if it is available. This saves effort, reduces the chances of error and improves portability. To increase the flexibility of STL function objects, adapters are provided which allow us to compose slightly modified function objects from the standard ones. If we wanted to find values greater than 1997 in a vector of integers we would use a binder to take advantage of the greater() function, which takes two arguments, to compare each value with 1997.


  iter = find_if( v.begin(), v.end(), bind2nd(greater(),1997) );

Other adapters exist to allow negation of predicates, calling of member functions, or use of “real” function pointers with binders. This topic is covered in some detail by Stroustrup in pages 518 onwards.

9. STL Related Web Pages

Here are a few of the URL’s I’ve collected relating to the STL and
C++ draft standard. Use AltaVista
http://www.altavista.digital.com/
and search for STL Tutorial, or the Yahoo!

http://www.yahoo.com/ Standard Template Library
section for more links.

10. Bibliography

The C++ Programming Language, 4th Edition
by Bjarne Stroustrup, Pub. Addison-Wesley. More pages than 2nd edition, much of the new material concerns the STL

Back to Top

Advertisement
%d bloggers like this: