Concurrency Theory: Lecture 12, 08 March 2018
---------------------------------------------

Recall:

Let t = (E, <=, lab) be a trace, with lab : E --> Act

max_i(t) = maximum i-event in t
delta_i(t) = i-view of t = {e | e <= max_i(t)}
latest_{i->j}(t) = max_j(delta_i(t))
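A minimal sketch of these definitions, assuming a hypothetical encoding: a trace is generated by a word whose letters are given as sets of participating processes, two letters are dependent exactly when they share a process, and events are numbered by their position in the word. The helper names (build_trace, max_event, delta, latest) are illustrative and not from the lecture.

```python
from typing import FrozenSet, Iterable, List, Optional, Set, Tuple

Trace = Tuple[List[FrozenSet[int]], List[FrozenSet[int]]]

def build_trace(word: List[Set[int]]) -> Trace:
    """Return (loc, down): loc[e] is the set of processes taking part in
    event e, and down[e] is the set of events e' with e' <= e."""
    loc: List[FrozenSet[int]] = []
    down: List[FrozenSet[int]] = []
    last = {}  # last[p] = most recent p-event seen so far
    for procs in word:
        e = len(loc)
        below = {e}
        for p in procs:
            if p in last:
                below |= down[last[p]]
        loc.append(frozenset(procs))
        down.append(frozenset(below))
        for p in procs:
            last[p] = e
    return loc, down

def max_event(loc: List[FrozenSet[int]], events: Iterable[int], i: int) -> Optional[int]:
    """Maximum i-event in a set of events. The i-events are totally ordered,
    and in this encoding that order agrees with the event indices."""
    cands = [e for e in events if i in loc[e]]
    return max(cands) if cands else None

def delta(trace: Trace, i: int) -> FrozenSet[int]:
    """delta_i(t): all events at or below max_i(t) (empty if i never moved)."""
    loc, down = trace
    m = max_event(loc, range(len(loc)), i)
    return down[m] if m is not None else frozenset()

def latest(trace: Trace, i: int, j: int) -> Optional[int]:
    """latest_{i->j}(t) = max_j(delta_i(t))."""
    return max_event(trace[0], delta(trace, i), j)

# Three processes 0, 1, 2 and the word {0,1} {1,2} {0} {2}.
t = build_trace([{0, 1}, {1, 2}, {0}, {2}])
print(sorted(delta(t, 0)), latest(t, 0, 1), latest(t, 2, 1))  # [0, 2] 0 1
```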

----------

Problem: Suppose each process i incrementally maintains local
  information about latest_{i->k}(t) for every process k.  When i
  and j meet, they need to determine which of latest_{i->k}(t) and
  latest_{j->k}(t) is later, for every process k.

Constraints: Processes are finite-state, so each may maintain only
a bounded amount of information.

----------

Difficulty:

- Comparing local timestamps relies on their actual values.
  Counters are unbounded, so the number of bits needed to store
  counters and timestamps is also unbounded.

- Finite state => bounded memory => a bounded set of labels that
  must be reused.  The order between labels must therefore be fixed
  dynamically, according to context.  (With unbounded numbers, the
  order between labels is fixed statically by their values.)
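For contrast, here is a minimal sketch of the unbounded approach that the constraints rule out. It is essentially a vector clock, a standard technique not named in these notes. Assumptions: processes are numbered 0..n-1, an action is given by the list of processes taking part in it, and clocks[i][k] counts the k-events in i's view, which pins down latest_{i->k}(t) because those k-events form a prefix of k's chain. The counters grow without bound, which is exactly what a finite-state solution cannot afford.

```python
from typing import List

def sync(clocks: List[List[int]], participants: List[int]) -> None:
    """Record one action performed jointly by `participants`.

    clocks[i][k] is the number of k-events in i's view. The merged view is
    the union of the participants' views, so each entry is a componentwise max.
    """
    n = len(clocks)
    merged = [max(clocks[p][k] for p in participants) for k in range(n)]
    for p in participants:
        merged[p] += 1  # the new event is a p-event for every participant p
    for p in participants:
        clocks[p] = list(merged)

def later(clocks: List[List[int]], i: int, j: int, k: int) -> str:
    """Compare latest_{i->k} and latest_{j->k} by counter value alone."""
    a, b = clocks[i][k], clocks[j][k]
    return "i" if a > b else "j" if b > a else "equal"

clocks = [[0] * 3 for _ in range(3)]
for action in ([0, 1], [1, 2], [0], [2]):
    sync(clocks, action)
print(clocks)                  # [[2, 1, 0], [1, 2, 1], [1, 2, 2]]
print(later(clocks, 0, 2, 1))  # j
```

The comparison in `later` is the "static" order on numbers mentioned above: it never looks at context, only at values.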

----------

Comparing labels without relying on values:

Suppose i and j synchronize on an action after t. 

Events in delta_i(t) union delta_j(t) divide into three sets:

1. E_common = delta_i(t) intersect delta_j(t) : both see these events
2. E_i = delta_i(t) \ delta_j(t)  : only i sees these events
3. E_j = delta_j(t) \ delta_i(t)  : only j sees these events

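Note that each pair of events e in E_i and f in E_j is
independent: if e <= f, then e would lie in delta_j(t), since views
are downward closed, and symmetrically if f <= e.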

We want to compare e_ik = latest_{i->k}(t) and e_jk =
latest_{j->k}(t).  Three cases are possible:

A. e_ik in E_i, e_jk in E_common : e_ik is later than e_jk
B. e_jk in E_j, e_ik in E_common : e_jk is later than e_ik
C. e_ik and e_jk both in E_common : e_jk = e_ik

We cannot have e_ik in E_i and e_jk in E_j, because all k-events
are linearly ordered, whereas every event of E_i is independent of
every event of E_j.
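The case analysis can be checked on a concrete trace. The sketch below assumes a hypothetical encoding (a trace generated by a word of process sets, letters dependent when they share a process) and illustrative helper names. It computes E_common directly from the two views, which real processes cannot do; it only confirms that one of A, B, C always applies and that e_ik in E_i together with e_jk in E_j never occurs. An absent k-event (None) is treated as lying outside every set.

```python
from typing import FrozenSet, List, Optional, Set, Tuple

def build_trace(word: List[Set[int]]) -> Tuple[List[FrozenSet[int]], List[FrozenSet[int]]]:
    """loc[e] = processes of event e; down[e] = events at or below e."""
    loc, down, last = [], [], {}
    for procs in word:
        e = len(loc)
        below = {e}
        for p in procs:
            if p in last:
                below |= down[last[p]]
        loc.append(frozenset(procs))
        down.append(frozenset(below))
        for p in procs:
            last[p] = e
    return loc, down

def view(loc, down, i: int) -> FrozenSet[int]:
    """delta_i(t): the downset of the last i-event."""
    mine = [e for e in range(len(loc)) if i in loc[e]]
    return down[max(mine)] if mine else frozenset()

def latest_in(loc, events, k: int) -> Optional[int]:
    """Latest k-event in a downward-closed set of events."""
    ks = [e for e in events if k in loc[e]]
    return max(ks) if ks else None

def classify(loc, down, i: int, j: int, k: int) -> str:
    """Return 'A', 'B' or 'C' as in the case analysis for process k."""
    d_i, d_j = view(loc, down, i), view(loc, down, j)
    common = d_i & d_j
    e_ik, e_jk = latest_in(loc, d_i, k), latest_in(loc, d_j, k)
    ik_in_Ei = e_ik is not None and e_ik not in common
    jk_in_Ej = e_jk is not None and e_jk not in common
    assert not (ik_in_Ei and jk_in_Ej)  # excluded: k-events are linearly ordered
    if e_ik == e_jk:
        return "C"
    return "A" if ik_in_Ei else "B"

loc, down = build_trace([{0, 1}, {1, 2}, {0}, {2}])
print([classify(loc, down, 0, 2, k) for k in range(3)])  # ['A', 'B', 'B']
```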

If we can compute whether e_ik and e_jk are inside or outside
E_common, we are done.

Lemma: Every maximal element in E_common is a primary event for
  both i and j.

Therefore, the set of events that belong to both latest_i and
latest_j (the sets of primary events of i and of j) includes all
the maximal events of E_common.  Hence, for each event in latest_i:

- If it lies at or below such a maximal event, it is in E_common
- If it lies below none of them, it is in E_i
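A minimal sketch of this classification, using only labels and i's own order. It assumes a hypothetical encoding of a primary graph: `latest` maps each process k to the label of latest_{i->k}(t) (None if unknown), and `order` is a transitively closed set of pairs (a, b) meaning the event labelled a is strictly below the event labelled b. Labels are assumed to be used consistently, so a label present in both graphs denotes an event of E_common. The example data (three processes, i = 0, j = 2, labels x, y, z, w) is made up for illustration.

```python
from typing import Dict, Optional, Set, Tuple

Latest = Dict[int, Optional[str]]   # k -> label of latest_{i->k}(t)
Order = Set[Tuple[str, str]]        # (a, b): event a strictly below event b

def labels(latest: Latest) -> Set[str]:
    return {l for l in latest.values() if l is not None}

def split(latest_i: Latest, order_i: Order, latest_j: Latest) -> Dict[int, str]:
    """Place each of i's primary events in E_common or E_i using labels only."""
    marked = labels(latest_i) & labels(latest_j)  # contains the maximal events of E_common
    result = {}
    for k, l in latest_i.items():
        if l is None:
            continue
        at_or_below_marked = l in marked or any((l, m) in order_i for m in marked)
        result[k] = "E_common" if at_or_below_marked else "E_i"
    return result

latest_0 = {0: "z", 1: "x", 2: None}
order_0 = {("x", "z")}
latest_2 = {0: "x", 1: "y", 2: "w"}
print(split(latest_0, order_0, latest_2))  # {0: 'E_i', 1: 'E_common'}
```

The result is relative to the first argument: calling `split` with j's graph first classifies j's events into E_common and E_j.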

Given this, each process i now maintains latest_i as a partial
order, called the primary graph, rather than just an indexed list
(array) of primary events.

Informal algorithm

1. i and j scan primarygraph_i and primarygraph_j and mark all
   events whose labels appear in both graphs

2. For any other k, check the positions of e_ik =
   latest_{i->k}(t) and e_jk = latest_{j->k}(t) with respect to the
   marked events.

   This tells us which of the following holds:

   A. e_ik in E_i, e_jk in E_common : e_ik > e_jk
   B. e_jk in E_j, e_ik in E_common : e_jk > e_ik
   C. e_ik and e_jk both in E_common : e_jk = e_ik

3. For each k, keep the later of e_ik and e_jk.  These events must
   also be assembled into a new primary graph:

   a. If e and f both come from i, inherit the edge (if any) from
      primarygraph_i
   b. If e and f both come from j, inherit the edge (if any) from
      primarygraph_j
   c. If e comes only from i and f only from j, then e is in E_i and
      f is in E_j, so they must be unordered: no edge
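The three steps can be sketched as follows, assuming a hypothetical encoding of a primary graph as a map from each process k to the label of latest_{i->k}(t) (None if unknown), plus a transitively closed set of pairs (a, b) meaning the event labelled a is strictly below the event labelled b. Labels are assumed consistent across graphs. The sketch only merges the information about t; adding the new synchronization event on top is not shown. The names and example labels are illustrative.

```python
from typing import Dict, Optional, Set, Tuple

Latest = Dict[int, Optional[str]]
Order = Set[Tuple[str, str]]

def labels(latest: Latest) -> Set[str]:
    return {l for l in latest.values() if l is not None}

def merge(latest_i: Latest, order_i: Order,
          latest_j: Latest, order_j: Order) -> Tuple[Latest, Order]:
    # Step 1: mark labels that appear in both graphs.
    marked = labels(latest_i) & labels(latest_j)

    def in_common_i(l: str) -> bool:
        """Is i's event l in E_common, i.e. at or below some marked event?"""
        return l in marked or any((l, m) in order_i for m in marked)

    # Step 2: for each k keep the later of e_ik and e_jk.
    new_latest: Latest = {}
    for k in sorted(set(latest_i) | set(latest_j)):
        a, b = latest_i.get(k), latest_j.get(k)
        if a == b:                                 # case C (or both unknown)
            new_latest[k] = a
        elif a is not None and not in_common_i(a):  # case A: e_ik in E_i
            new_latest[k] = a
        else:                                      # case B: e_jk in E_j
            new_latest[k] = b

    # Step 3: inherit edges. A pair in order_i (order_j) has both endpoints
    # in i's (j's) graph; an i-only and a j-only event get no edge.
    kept = labels(new_latest)
    new_order = {(x, y) for x in kept for y in kept
                 if (x, y) in order_i or (x, y) in order_j}
    return new_latest, new_order

latest_0, order_0 = {0: "z", 1: "x", 2: None}, {("x", "z")}
latest_2, order_2 = {0: "x", 1: "y", 2: "w"}, {("x", "y"), ("x", "w"), ("y", "w")}
new_latest, new_order = merge(latest_0, order_0, latest_2, order_2)
print(new_latest)  # {0: 'z', 1: 'y', 2: 'w'}
print(new_order)   # {('y', 'w')}
```

Only label equality is used (membership in `marked` and in the order sets); no ordering on the labels themselves is needed.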

Notice that this comparison needs only equality of labels; their
actual values are unimportant.  So we can use labels without
assuming any static ordering relation.

- To reuse labels, we should ensure that labels are used
  consistently across all primary graphs, that is, equal labels
  always denote the same event.

- Each primary graph has up to N events.  There are N processes, so
  at most N^2 labels are ever "in use" at any time.

- In principle, if we have N^2 + 1 labels, we always have one that
  is free to use.

  Question: how can the processes that synchronize on an action a
  accurately check which labels are being used by other processes
  not involved in a?

----------------------------------------------------------------------

Recycling labels

- Event labels are used across primary graphs to determine the
  maximal events in the intersection.

- If two labels are the same, they must refer to the same event.

- If {i,j} meet to execute an action a, the current event needs a
  label (a,l) that is not in use in any other process's primary
  graph.

----------------------------------------------------------------------

Secondary information

- i's best information about j's primary information:

     latest_{j->k} computed in the view of latest_{i->j}(t), that
     is, the latest k-event at or below latest_{i->j}(t)

- Write as latest_{i->j->k}(t)
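Secondary information can be computed on a small trace in the same style, as a sketch under a hypothetical encoding: the trace is generated by a word of process sets, letters are dependent when they share a process, and the helper names are illustrative. latest_{i->j->k}(t) is taken to be the latest k-event at or below latest_{i->j}(t).

```python
from typing import FrozenSet, Iterable, List, Optional, Set, Tuple

def build_trace(word: List[Set[int]]) -> Tuple[List[FrozenSet[int]], List[FrozenSet[int]]]:
    """loc[e] = processes of event e; down[e] = events at or below e."""
    loc, down, last = [], [], {}
    for procs in word:
        e = len(loc)
        below = {e}
        for p in procs:
            if p in last:
                below |= down[last[p]]
        loc.append(frozenset(procs))
        down.append(frozenset(below))
        for p in procs:
            last[p] = e
    return loc, down

def max_event(loc, events: Iterable[int], k: int) -> Optional[int]:
    """Latest k-event in a set of events (k-events are ordered by index)."""
    ks = [e for e in events if k in loc[e]]
    return max(ks) if ks else None

def latest(loc, down, i: int, j: int) -> Optional[int]:
    """latest_{i->j}(t)."""
    m = max_event(loc, range(len(loc)), i)
    return None if m is None else max_event(loc, down[m], j)

def secondary(loc, down, i: int, j: int, k: int) -> Optional[int]:
    """latest_{i->j->k}(t): the latest k-event at or below latest_{i->j}(t)."""
    e = latest(loc, down, i, j)
    return None if e is None else max_event(loc, down[e], k)

loc, down = build_trace([{0, 1}, {1, 2}, {0}, {2}])
print(secondary(loc, down, 2, 1, 0), secondary(loc, down, 2, 1, 2),
      secondary(loc, down, 0, 1, 2))  # 0 1 None
```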

Observe that secondary information is "inherited" from the
comparison of primary information.

Suppose latest_{i->k} is older than latest_{j->k}.  Then every
event of the form latest_{i->k->k'} is no later than the
corresponding event latest_{j->k->k'}, since everything below
latest_{i->k} is also below latest_{j->k}.

Hence, when i updates its primary information for k from j, it
also copies all of j's secondary information for k.  Note that
this does not involve comparing labels at the secondary level.
The original comparison of labels in the primary graph suffices
to update both primary and secondary information.
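A minimal sketch of this update rule, assuming a hypothetical encoding in which primary[k] is the label of latest_{i->k}(t) and secondary[k][k'] is the label of latest_{i->k->k'}(t). The input `later` stands for the outcome of the primary-graph comparison (which process holds the later copy for each k); the names and example labels are illustrative. For each k the winner's primary entry and its whole secondary row are copied, and no secondary label is ever compared.

```python
from typing import Dict, Optional, Tuple

Primary = Dict[int, Optional[str]]                 # k -> label of latest_{i->k}
Secondary = Dict[int, Dict[int, Optional[str]]]    # k -> (k' -> label of latest_{i->k->k'})

def update_from(prim_i: Primary, sec_i: Secondary,
                prim_j: Primary, sec_j: Secondary,
                later: Dict[int, str]) -> Tuple[Primary, Secondary]:
    """later[k] is 'i' or 'j', decided by comparing primary graphs only."""
    new_prim: Primary = {}
    new_sec: Secondary = {}
    for k, who in later.items():
        src_p, src_s = (prim_i, sec_i) if who == "i" else (prim_j, sec_j)
        new_prim[k] = src_p[k]
        new_sec[k] = dict(src_s[k])  # copy the whole row for k
    return new_prim, new_sec

prim_0 = {0: "z", 1: "x", 2: None}
sec_0 = {0: {0: "z", 1: "x", 2: None}, 1: {0: "x", 1: "x", 2: None},
         2: {0: None, 1: None, 2: None}}
prim_2 = {0: "x", 1: "y", 2: "w"}
sec_2 = {0: {0: "x", 1: "x", 2: None}, 1: {0: "x", 1: "y", 2: "y"},
         2: {0: "x", 1: "y", 2: "w"}}
p, s = update_from(prim_0, sec_0, prim_2, sec_2, {0: "i", 1: "j", 2: "j"})
print(p)  # {0: 'z', 1: 'y', 2: 'w'}
```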

----------------------------------------------------------------------

Lemma:

Suppose e is an i-event that is in the primary graph for some
other process j.  Then, e is also a secondary event for i.

Proof:

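There are paths from e to max_i(t) and max_j(t). Since e is an
i-event that appears in j's primary graph, e lies in both
delta_i(t) and delta_j(t). Consider the path from e to max_j(t).
This path leaves the intersection of delta_i(t) and delta_j(t) at
some point. Let e',e" be consecutive events on this path such that
e' is in the intersection and e" is outside the intersection.

e' and e" are labelled by dependent letters, so there is some
process k that takes part in both.  Clearly, e' =
latest_{i->k}(t) because e' is in delta_i(t) and the next
k-event, e", is not (e" is in delta_j(t) but outside the
intersection).

Since e is in j's primary graph, e = latest_{j->m}(t) for some
process m (for instance m = i). Now e <= e', and every m-event
below e' lies in delta_j(t), so it is no later than
latest_{j->m}(t) = e. Hence e is the latest m-event below e', that
is, e = latest_{i->k->m}(t); when m = i, this reads
e = latest_{i->k->i}(t).

(If max_j(t) itself lies in delta_i(t), the path never leaves the
intersection; take k = j and e' = max_j(t) = latest_{i->j}(t), and
the same argument applies.)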

Corollary:

If e is an i-event and e does not appear in the secondary
information of i, e does not appear in the primary graph of any
other process j.
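The lemma can be sanity-checked by brute force on random traces. This is a check, not a proof, and it assumes a hypothetical encoding: a trace is generated by a random word of process sets, letters are dependent when they share a process, and the secondary events of i are taken to be all latest_{i->k->m}(t). The helper names are illustrative.

```python
import random
from typing import FrozenSet, Iterable, List, Optional, Set, Tuple

def build_trace(word: List[Set[int]]) -> Tuple[List[FrozenSet[int]], List[FrozenSet[int]]]:
    """loc[e] = processes of event e; down[e] = events at or below e."""
    loc, down, last = [], [], {}
    for procs in word:
        e = len(loc)
        below = {e}
        for p in procs:
            if p in last:
                below |= down[last[p]]
        loc.append(frozenset(procs))
        down.append(frozenset(below))
        for p in procs:
            last[p] = e
    return loc, down

def max_event(loc, events: Iterable[int], k: int) -> Optional[int]:
    ks = [e for e in events if k in loc[e]]
    return max(ks) if ks else None

def latest(loc, down, i: int, k: int) -> Optional[int]:
    m = max_event(loc, range(len(loc)), i)
    return None if m is None else max_event(loc, down[m], k)

def secondary_events(loc, down, i: int, n: int) -> Set[int]:
    """All events latest_{i->k->m}(t) over processes k, m."""
    out = set()
    for k in range(n):
        e = latest(loc, down, i, k)
        if e is not None:
            for m in range(n):
                f = max_event(loc, down[e], m)
                if f is not None:
                    out.add(f)
    return out

def lemma_holds(loc, down, n: int) -> bool:
    """Every i-event in j's primary graph (j != i) is a secondary event of i."""
    for i in range(n):
        sec = secondary_events(loc, down, i, n)
        for j in range(n):
            if j == i:
                continue
            for k in range(n):
                e = latest(loc, down, j, k)
                if e is not None and i in loc[e] and e not in sec:
                    return False
    return True

def random_word(rng: random.Random, n: int, length: int) -> List[Set[int]]:
    return [set(rng.sample(range(n), rng.randint(1, n))) for _ in range(length)]

rng = random.Random(0)
print(all(lemma_holds(*build_trace(random_word(rng, 4, 12)), 4) for _ in range(200)))  # True
```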

----------------------------------------------------------------------

This gives us an algorithm with bounded labels for maintaining
primary information.  Since each process has at most n^2
secondary events and there are n processes overall, we need at
most n^3 + 1 labels to ensure that each new event can be labelled
consistently. 

- When i and j synchronize, they choose a label for the new event
  that is not in their secondary information.

- They then compare their primary graphs and update their primary
  and secondary information
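A minimal sketch of the label choice, assuming labels are the integers 0..n^3 and that each participant's secondary information is given simply as the set of labels it contains. Only the l part of a label (a,l) is chosen here; the function name is illustrative.

```python
from typing import Iterable, Set

def fresh_label(n: int, secondary_labels: Iterable[Set[int]]) -> int:
    """Pick a label from {0, ..., n**3} that occurs in the secondary
    information of none of the synchronizing processes."""
    used = set().union(*secondary_labels)
    for label in range(n ** 3 + 1):
        if label not in used:
            return label
    raise ValueError("no free label: more than n**3 labels in use")

print(fresh_label(2, [{0, 1, 2, 3}, {4, 5, 6, 7}]))  # 8
```

By the corollary, a label missing from the participants' secondary information cannot denote one of their events that is still in some other process's primary graph, so the choice is consistent.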

======================================================================