Building Coroutines

Suppose we have three functions A(), B() and C(), where A() calls B() to do something and then calls C() to do some other thing. This calling activity is shown in the following figure. More precisely, when A() calls B(), A() waits until B() returns. When B() is called, it always starts its execution with its first statement. The similar holds true for the call to C().

However, this type of call/return is not always the best way of using routines. Suppose a system runs three threads A, B and C with the round-robin scheduling policy. In other words, the threads run in the order of A, B, C, A, B, C, A, B, C and so on. The following figure shows the scheduling of these threads. The red arrows are thread executions, doted arrows indicate context switching, and different rounds use different colors.

Thus, A runs for a while, and the CPU is switched to B. Then, B runs for a while, and the CPU switched to C. When C finishes its time quantum, the control switches back to A; however, the restarting point is where A was suspended previously. When B and C restart, the restarting points are where B and C were suspended. If we consider A, B and C as three functions calling each other (simulated by switching CPU), the call brings the control to the statement where the caller was suspended. Routines organized in this way are called coroutines. Coroutines have been used to construct finite automata, parsers and, of course, multithreaded systems!

To show you a simple example, let us only consider two functions working as two coroutines. Click here to download a copy of this program.

#include  
#include  

int       max_iteration, iter;

jmp_buf   Main, PointA, PointB;

void      Ping(void);
void      Pong(void);

void  main(int  argc, char* argv[])
{
     max_iteration = abs(atoi(argv[1]));
     iter = 1;
     if (setjmp(Main) == 0)
          Ping();
     if (setjmp(Main) == 0)
          Pong();
     longjmp(PointA, 1);      
}

void  Ping(void)
{
     if (setjmp(PointA) == 0)
          longjmp(Main, 1);
     while (1) {
          printf("%3d : Ping-", iter);
          if (setjmp(PointA) == 0)
               longjmp(PointB, 1);
     }
}

void  Pong(void)
{
     if (setjmp(PointB) == 0)
          longjmp(Main, 1);
     while (1) {
          printf("Pong\n");
          iter++;
          if (iter > max_iteration)
               exit(0);
          if (setjmp(PointB) == 0)
               longjmp(PointA, 1);
     }
}

The logic of this program requires some explanation.

The main program looks easy. The first setjmp() saves the current state into the jump buffer Main. Then, the main program calls function Ping().
Once the execution enters function Ping(), a jump buffer PointA is setup and then immediately returns the control to the main program. This is shown in the following figure:
After returning from Ping() via a long jump, the main program does the same to function Pong(). Function Pong() establishes a jump buffer PointB and uses a long jump back to the main:
Let us return to the main program. The next statement to be executed is longjmp(PointA,1). That is, the main program uses a long jump to enter function Ping()!
Hmmm, things are getting more interesting now.
In function Ping(), since this is a return from a long jump, the control falls through and executes the while loop. In each iteration of this loop, a new state is saved into jump buffer PointA, overwriting the previous one. After saving this state (or marking a return point), function Ping() executes a long jump to PointB which is a point marked by function Pong(). More precisely, function Ping() enters function Pong() via a long jump rather than a call!
Function Pong() follows what we just mentioned. That is, the execution falls through and enters the while loop. In this loop, a new state is saved to jump buffer PointB and long jumps back to Ping(). These activities is shown in the following figure.
Therefore, function Ping() uses a long jump into function Pong()'s PointB. Then, function Pong() also uses a long jump function into function Ping()'s PointA. This alternating execution (between functions Ping() and Pong()) continues until function Pong() has executed max_iteration times.
The following figure shows the control flow. Blue squares indicate function calls, red arrows are program execution flow, and light blue arrows are control flow generated using setjmp() and longjmp(). Numbers by the arrows give the order of that event.

In a multithreaded system, we may have many functions running as threads, each of which has an environment in the runtime stack. The thread scheduler just uses a technique similar to the above program to schedule those threads that are in the ready queue.

But, ....

It seems that we have a working multithreaded system. Really? No, things are not that simple. If you take a look at functions Ping() and Pong(), you will see that both only use global variables. If the program is changed to use local variables, it is likely that you will obtain incorrect result. The following is a modified version. Click here to download a copy.

#include  
#include  

int       max_iteration;      

jmp_buf   Main;        
jmp_buf   PointPing;
jmp_buf   PointPong;    

void      Ping(void);
void      Pong(void);

void  main(int  argc, char* argv[])
{
     if (argc != 2) {             
          printf("Use %s max-#-of-lines\n", argv[0]);
          exit(1);
     }
     max_iteration = abs(atoi(argv[1]));
     if (setjmp(Main) == 0)
          Ping();
     if (setjmp(Main) == 0)
          Pong();
     longjmp(PointPing, 1);
}

void  Ping(void)
{
     int  i;

     if (setjmp(PointPing) == 0) {
          i = 1;
          longjmp(Main, 1);
     }
     while (1) {
          printf("Ping (%2d) - ", i);
          i++;
          if (setjmp(PointPing) == 0)
               longjmp(PointPong, 1);
     }
}

void  Pong(void)
{
     int  j;

     if (setjmp(PointPong) == 0) {
          j = 1;
          longjmp(Main, 1);
     }
     while (1) {
          printf("Pong (%2d)\n", j);
          j++;
          if (j > max_iteration)
               exit(0);
          if (setjmp(PointPong) == 0)
               longjmp(PointPing, 1);
     }
}

Local variables i and j are counters and are initialized when Ping() and Pong() are called. The output of this program with max_iteration set to 4 may look like the following:

Ping ( 1) - Pong ( 2)
Ping ( 3) - Pong ( 4)

This output is definitively incorrect, because the while loop in function Pong() should iterate four times, printing out Pong ( 1), Pong ( 2), Pong ( 3), and Pong ( 4). But, we have have Pong ( 2) and Pong ( 4) in the above output. (WARNING: your system may generate a different output.) Moreover, the output from Ping() is also wrong. So, what is the problem?

The problem is in stack memory allocation as we have discussed a couple of times. When the main starts execution and before reaches the first setjmp(), the stack has only one stack frame as shown in Figure (b) below. Then, Ping() is called, and the stack has two frames as shown in Figure (a). Function Ping() uses a longjmp() to bring the control back to the main program, effectively restoring the stack back to the point of the first setjmp() being called (Figure (a)). When the main program calls setjmp() to mark a second return point and then calls Pong(), the stack frame for function Pong() will likely occupy the same location which was allocated for Ping()! Because both functions Ping() and Pong() have only one local variable, they will likely share the same location on the stack. More precisely, it is highly likely that i of Ping() and j of Pong() share the same address. In fact, you can print their addresses to verify this. Therefore, when Ping() prints and increases the value of i, Pong() loses its original value of j and prints the next one!


(a)	(b)	(c)

To overcome this problem, we need to allocate separate stack space for each function that is running as a thread. Please refer to our mini-project to learn how this can be done easily. Unfortunately, this is usually system-dependent. Check our mini-project for a simple implementation of multiple stacks, on for each thread, on Sun Solaris.