Loops in Objects

Introduction

The classic definition of a loop is that it presents the contents of some data structure for processing, one element at a time. Typically it starts at the beginning of the data structure and proceeds one element at a time to the end. The data structure may be an array (vector), linked list, hash table, tree, or some other structure whose layout is unknown outside the object itself.

Since the data structure is, for all practical purposes, unknown when it is enclosed in a class, the developer of the class should provide a way to loop over the contents. That being so, it is useful to standardize on a simple way to write those loops. Here is what I have settled on for forming loops:

MyIter iter(myObject);
Datum* datum;

for (datum = iter(); datum; datum = iter++) { ooo }

By the construction of the loop, the value of datum, a pointer to an object in the data structure in myObject, is either a pointer to a valid object or zero. Thus when datum is processed in the body of the loop it is guaranteed to point to a valid object.

MyIter is a class that knows a great deal about the MyObject class which defines the object, myObject. MyIter must be a friend of MyObject and since it is a friend it can "see" some private functions in MyObject. The private functions can provide the information needed for MyIter to loop over the data structure.

The best way to organize MyIter is for it not to touch the data in myObject. MyIter should just retain whatever information it needs to loop over that data. Thus two or more MyIter objects may exist at the same time, each with its own position. This is both an advantage and a disadvantage.

Suppose the data cannot be sorted, but we need to find out whether any two nodes are related in some way. One method would be to sort the array using the relationship attributes as the sort key, but that is ruled out. Another method is to loop over the data in one function. As each node is processed, the current node and a copy of the iterator are passed to a second function. The second function uses the iterator copy to examine the remaining nodes in the data structure for a related node. Neither function knows how the iterator works or what kind of data structure is in play.
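The two-function arrangement described above might look like the following minimal sketch. It assumes the array-based MyObject and MyIter developed later in this article. The group field in Datum and the function names findRelated and countRelated are hypothetical, introduced only for illustration. Note that the second function advances its copy with "++" only, because calling the function operator would restart the copy at the first element.

```cpp
// Hypothetical element type: two nodes count as "related" when group matches.
struct Datum {
  int key;
  int group;
};

const int DataSize = 16;     // assumed capacity for this sketch

class MyObject {
  int   nItems;
  Datum data[DataSize];

public:
  MyObject() : nItems(0) { }

  bool add(const Datum& dtm) {
    if (nItems >= DataSize) return false;
    data[nItems++] = dtm;
    return true;
  }

private:
  // Returns zero when i is outside the occupied part of the array
  Datum* datum(int i) {return 0 <= i && i < nItems ? &data[i] : 0;}

  friend class MyIter;
};

class MyIter {
  MyObject& obj;
  int       i;

public:
  MyIter(MyObject& myObject) : obj(myObject), i(0) { }

  Datum* operator() ()    {i = 0; return obj.datum(i);}
  Datum* operator++ (int) {return obj.datum(++i);}
};

// Second function: receives the current node and a COPY of the iterator
// (passed by value), then examines only the nodes after the current one.
Datum* findRelated(const Datum& current, MyIter rest) {
  for (Datum* other = rest++; other; other = rest++) {
    if (other->group == current.group) return other;
  }
  return 0;
}

// First function: the outer loop. Copying iter for findRelated leaves the
// position of iter itself untouched. Returns how many nodes have a related
// node later in the data structure.
int countRelated(MyObject& myObject) {
  MyIter iter(myObject);
  int    count = 0;

  for (Datum* datum = iter(); datum; datum = iter++) {
    if (findRelated(*datum, iter)) ++count;
  }
  return count;
}
```

Passing the iterator by value is what makes the copy. The compiler-generated copy constructor is enough here because MyIter holds only a reference and an index.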

The downside of this kind of looping class is that two functions in different threads (processes), operating independently, can get into trouble when they add and delete nodes. In that case the best solution is to lock the object while the functions in the different threads (processes) perform their loops.
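One way to lock the object for the duration of a loop is sketched below. It assumes threads within a single process and uses std::mutex from the C++11 standard library; separate processes would need a different mechanism. The mutex member, the mutex() accessor, and the functions sumKeys and addKeys are hypothetical additions, not part of the classes in this article.

```cpp
#include <mutex>

struct Datum {
  int key;
};

const int DataSize = 1024;   // assumed capacity for this sketch

class MyObject {
  int        nItems;
  Datum      data[DataSize];
  std::mutex mtx;            // hypothetical: guards nItems and data

public:
  MyObject() : nItems(0) { }

  // Callers must hold mutex() while calling add() or looping with MyIter.
  std::mutex& mutex() {return mtx;}

  bool add(const Datum& dtm) {
    if (nItems >= DataSize) return false;
    data[nItems++] = dtm;
    return true;
  }

private:
  Datum* datum(int i) {return 0 <= i && i < nItems ? &data[i] : 0;}

  friend class MyIter;
};

class MyIter {
  MyObject& obj;
  int       i;

public:
  MyIter(MyObject& myObject) : obj(myObject), i(0) { }

  Datum* operator() ()    {i = 0; return obj.datum(i);}
  Datum* operator++ (int) {return obj.datum(++i);}
};

// Reader: holds the lock for the whole loop, so no other thread that also
// takes the lock can add nodes while the loop is running.
int sumKeys(MyObject& obj) {
  std::lock_guard<std::mutex> guard(obj.mutex());
  MyIter iter(obj);
  int    sum = 0;

  for (Datum* datum = iter(); datum; datum = iter++) sum += datum->key;
  return sum;
}

// Writer: takes the same lock around each modification.
void addKeys(MyObject& obj, int first, int count) {
  for (int k = first; k < first + count; ++k) {
    std::lock_guard<std::mutex> guard(obj.mutex());
    Datum dtm = {k};
    obj.add(dtm);
  }
}
```

Each thread would call sumKeys or addKeys on the shared object. The protection only works if every function that loops over or modifies the object takes the same lock.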

Examples are always good. Suppose we have an array in MyObject:

class MyObject {
int   nItems;
Datum data[DataSize];

public:

  MyObject() : nItems(0) { }
  o o o
  };

What does an iterator need? An index variable, in this case. It also needs to know which object it is looping over, i.e. it needs a reference to that object. Finally, it needs definitions for the function call operator "()" and the postfix "++" operator.

class MyIter {
MyObject& obj;
int       i;

public:

MyIter(MyObject& myObject) : obj(myObject), i(0) { }

Datum* operator() ()    { ooo }
Datum* operator++ (int) { ooo }       // The unused int marks this as the postfix form (iter++)

private:

  MyIter();   // Declared but never defined, so a MyIter cannot be created without a MyObject
  };

Clearly we have to rely on the MyObject class to supply information about the data structure. Since this is an array we need to know how many elements are occupied in the array. We also need to know how to get the data from the array. Let's add the functions and friend declaration to MyObject.

class MyObject {
int   nItems;
Datum data[DataSize];

public:

  MyObject() : nItems(0) { }

  bool add(Datum& dtm)
     {if (nItems >= DataSize) return false; data[nItems++] = dtm; return true;}
  o o o

private:

  Datum* datum(int i) { ooo }
  int    nData()      { ooo }

  friend class MyIter;       // or "friend MyIter;" (C++11) if MyIter is a typedef
  };

The function datum(int i) must protect the data array. In this case it must test for both ends of the array:

Datum* datum(int i) {return 0 <= i && i < nData() ? &data[i] : 0;}
int nData() {return nItems;}

This example does not need an initialization function, although some do. Given that, here are the two functions that need to be defined in MyIter:

Datum* operator() () {i = 0; return obj.datum(i);}
Datum* operator++ (int) {return obj.datum(++i);}

Implementations will vary depending on the data structure. The goal is to always have the following loop where the body of the loop may be anything that needs a pointer to a datum:

Datum* find(Key key) {
MyIter iter(myObject);
Datum* datum;

  for (datum = iter(); datum; datum = iter++) {if (datum->key == key) return datum;}
  return 0;
  }
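To illustrate that the loop stays the same when the data structure changes, here is a minimal sketch with a singly linked list in place of the array. The Node struct, the first() function, and the head and tail members are hypothetical, and Key is assumed to be an int. In this sketch find takes the object as a parameter so that the block stands alone.

```cpp
struct Datum {
  int key;
};

typedef int Key;             // assumed key type for this sketch

// Hypothetical list node, known only to MyObject and MyIter
struct Node {
  Datum datum;
  Node* next;
};

class MyObject {
  Node* head;
  Node* tail;

public:
  MyObject() : head(0), tail(0) { }
 ~MyObject() {while (head) {Node* next = head->next; delete head; head = next;}}

  // Appends a copy of dtm to the end of the list
  bool add(const Datum& dtm) {
    Node* node = new Node;
    node->datum = dtm;
    node->next  = 0;
    if (tail) tail->next = node; else head = node;
    tail = node;
    return true;
  }

private:
  MyObject(const MyObject&);              // declared, never defined: no copying
  MyObject& operator=(const MyObject&);

  Node* first() {return head;}

  friend class MyIter;
};

class MyIter {
  MyObject& obj;
  Node*     cur;             // current node, zero before the loop and at the end

public:
  MyIter(MyObject& myObject) : obj(myObject), cur(0) { }

  Datum* operator() ()    {cur = obj.first(); return cur ? &cur->datum : 0;}
  Datum* operator++ (int) {if (cur) cur = cur->next; return cur ? &cur->datum : 0;}
};

// The loop is identical to the array version.
Datum* find(MyObject& myObject, Key key) {
  MyIter iter(myObject);
  Datum* datum;

  for (datum = iter(); datum; datum = iter++) {if (datum->key == key) return datum;}
  return 0;
}
```

Only MyObject and MyIter changed. The body of find does not know whether it is walking an array or a list.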
