Midterm 3 review

Hash functions:

Remainder treats the input string (byte sequence) like a base-256 number, and then takes that modulo m. m should be a prime, not too close to a power of 2. Note that if the keys are already numbers, then you just take them mod m and you’re done.
Multiplicative multiplies the result of the remainder hash by a floating-point constant \(A\), takes the fractional portion of the product, and then multiplies that by m and rounds it down (resulting in a value between 0 and \(m-1\)). The choice of the multiplier \(A\) affects how well the hash values are distributed. (I don’t expect you to memorize the multiplier we used.)
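
The two hash functions above can be sketched as follows. This is only an illustration, not the code from class: the function names are made up, the multiplier `kAssumedA` is an assumed value (the fractional part of the golden ratio, a common textbook choice) and not necessarily the one we used, and `multiplicative_hash` takes the key as an already-computed number.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <string>

// Remainder hash: read the bytes of `key` as the digits of a base-256
// number and take it modulo m. Reducing after every digit (Horner's
// rule) keeps the running value below m, so it does not overflow as
// long as m is below 2^56.
std::uint64_t remainder_hash(const std::string& key, std::uint64_t m) {
    assert(m > 0);
    std::uint64_t h = 0;
    for (unsigned char byte : key) {
        h = (h * 256 + byte) % m;
    }
    return h;
}

// Assumed multiplier for illustration only; not necessarily the one from class.
const double kAssumedA = 0.6180339887498949;

// Multiplicative hash of a numeric key k: floor(m * frac(k * A)).
std::uint64_t multiplicative_hash(std::uint64_t k, std::uint64_t m) {
    double product = static_cast<double>(k) * kAssumedA;
    double fraction = product - std::floor(product);  // in [0, 1)
    return static_cast<std::uint64_t>(std::floor(fraction * static_cast<double>(m)));
}
```

For example, `remainder_hash("ab", 101)` computes \((97 \cdot 256 + 98) \bmod 101 = 84\), since the bytes of "ab" are 97 and 98.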

If we have two different hash values then we know that their keys must be different. If we have two identical hash values then we don’t know anything, because collisions (different keys that hash to the same value) are possible.

Handling collisions:

Chaining: keep a linked list in every table entry, just add colliding entries to the list.
Open addressing: if an entry is full, look for an open entry somewhere else in the table, using a probe sequence. On probe \(i\) (starting from \(i = 0\)), linear probing looks in entry \(\mathrm{hash}(k) + i\), quadratic probing looks in entry \(\mathrm{hash}(k) + a_0 i + a_1 i^2\), and double-hashing looks in entry \(\mathrm{hash}_0(k) + i \, \mathrm{hash}_1(k)\), each taken modulo \(m\).
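
As a rough sketch of the three probe sequences, the functions below compute the entry examined on probe i for a table of size m. All names here (`linear_probe`, `quadratic_probe`, `double_hash_probe`, `insert_linear`, `EMPTY`) are hypothetical, keys are assumed to be non-negative ints hashed by remainder, and an empty entry is marked with the sentinel -1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Entry examined on probe i (i = 0, 1, 2, ...) for a key whose hash is h.
std::size_t linear_probe(std::size_t h, std::size_t i, std::size_t m) {
    return (h + i) % m;
}

std::size_t quadratic_probe(std::size_t h, std::size_t i, std::size_t m,
                            std::size_t a0, std::size_t a1) {
    return (h + a0 * i + a1 * i * i) % m;
}

std::size_t double_hash_probe(std::size_t h0, std::size_t h1,
                              std::size_t i, std::size_t m) {
    return (h0 + i * h1) % m;
}

const int EMPTY = -1;  // assumed sentinel marking an unused entry

// Insert a non-negative key using linear probing.
// Returns the entry used, or -1 if every entry is already full.
int insert_linear(std::vector<int>& table, int key) {
    assert(key >= 0 && !table.empty());
    const std::size_t m = table.size();
    const std::size_t h = static_cast<std::size_t>(key) % m;
    for (std::size_t i = 0; i < m; ++i) {
        std::size_t slot = linear_probe(h, i, m);
        if (table[slot] == EMPTY) {
            table[slot] = key;
            return static_cast<int>(slot);
        }
    }
    return -1;
}
```

With a table of size 5, inserting 7 and then 12 puts 7 in entry 2 and, because 12 also hashes to 2, puts 12 in entry 3.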

The load factor \(\alpha\) of a hash table is the ratio of the number of elements stored, \(n\), to the size of the table, \(m\): \(\alpha = n/m\). With open addressing, the maximum load factor is 1.0, because every element occupies its own table entry.

Hash functions should be

Deterministic: always hash a particular key to the same value
Fast (normally)
Well-distributed over the range m. No hash value should be dramatically more common than any other.
Sensitive to small differences in keys: a small change in the key should produce a large change in the hash (avalanche effect)

Expression trees

Data structure for encoding the (recursive) structure of various kinds of expressions. (Each kind of expression will have its own expression tree type.) We build expression trees using inheritance: we have a base class for “all expressions of this type” and then subclasses for particular types of subexpressions. E.g., in arithmetic expressions, we have

Numbers
Variables
Unary minus
Infix operators
Parentheses

We definitely need a subclass for the first four, and maybe a fifth for parentheses, depending on what we are doing. (We might even choose to split the different infix operators into different subclasses.)

Example: A ring is a mathematical structure consisting of a set of values a, b, c, etc. together with two operations:

+ a binary operation
* a binary operation

and two identified elements

0, the identity for + and
1, the identity for *

(Parentheses are also allowed.)

An expression tree structure for general rings would need:

A node for elements in general
A node for 0
A node for 1
A binary node for +
A binary node for *

A sketch of this would be something like

#include <string>

struct ring {}; // Base class

struct ring_elem : public ring { std::string e; }; // A named element: a, b, c, ...
struct ring_zero : public ring { };
struct ring_one  : public ring { };
struct ring_plus : public ring { ring* left; ring* right; };
struct ring_mult : public ring { ring* left; ring* right; };

(If I ask you to do this on a test, you don’t need to worry about constructors or methods.)

Example: Suppose we have a sequence of tokens of the following types:

Binary literals 00101001
The binary complement operator ~
The binary OR operator +
The binary AND operator *
The binary XOR operator ^
Open and closed parentheses

and the following grammar

bin -> LITERAL
bin -> '~' bin
bin -> bin OP bin
bin -> bin '~' OP bin
bin -> '(' bin ')'

where OP is any of the infix operators. Note that this grammar allows us to write things like 001 ~* 101 where ~* means NAND.

Given this grammar, write a recognizer, using the helper functions is_bin() (is a binary literal) and is_op() (is an infix operator).
Give a sketch of an expression tree type (just the classes/structs and their inheritance and data members) for this expression type.
Suppose that the precedence of * is higher than that of + and ^, and the precedence of ~ is highest of all. (The precedence of the negated infix operators is the same as that of their unnegated forms.)

a) What precedence levels should be assigned to the different tree node types?

b) Rewrite the grammar so that it enforces these precedence levels.

Review of earlier material

Big-O complexity. E.g., binary search. Cutting the search space in half means logarithmic time.

Linked lists

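class list {
  public:
    class node {
      public:          // members must be public for code outside the class to walk the list
        int value;
        node* next;
        node* prev;
    };

    node* head;
    node* tail;
};

Implement various operations. E.g., how to find the length of a list:

int length(const list& l) {
  list::node* curr = l.head;  // node is nested in list, so it needs the list:: qualifier
  int len = 0;
  while(curr) {               // stop when we walk off the end of the list
    len++;
    curr = curr->next;
  }

  return len;
}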

Stacks: push, pop. Stack underflow: popping from an empty stack. Stack overflow: pushing onto a “full” stack (for stacks that have a maximum size). E.g., matching HTML tags: <html> with </html>. What happens if there are too many opening tags? Too many closing tags? Balanced tags?
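
The tag-matching questions above can be explored with a sketch like the following. It assumes the input has already been split into tags such as `<html>` and `</html>`. The names `tag_result` and `check_tags` are invented for this example.

```cpp
#include <cassert>
#include <stack>
#include <string>
#include <vector>

enum class tag_result { balanced, unclosed_tags, unexpected_close };

// Push every opening tag. For every closing tag, the matching opening
// tag must be on top of the stack.
tag_result check_tags(const std::vector<std::string>& tags) {
    std::stack<std::string> open;
    for (const std::string& tag : tags) {
        if (tag.size() >= 2 && tag[1] == '/') {
            // "</name>" should match "<name>"
            std::string expected = "<" + tag.substr(2);
            if (open.empty() || open.top() != expected) {
                // An empty stack here is the underflow case:
                // too many closing tags.
                return tag_result::unexpected_close;
            }
            open.pop();
        } else {
            open.push(tag);
        }
    }
    // Anything left on the stack is an opening tag that was never closed.
    return open.empty() ? tag_result::balanced : tag_result::unclosed_tags;
}
```

Too many opening tags leave the stack non-empty at the end. Too many closing tags would pop from an empty stack. Balanced input leaves the stack empty.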

Queues

Implementing stack on top of list.

Binary search: implementation, loop based, recursive
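
As a reminder of the two styles, here is a minimal sketch of both versions over a sorted `std::vector<int>`. The function names and the convention of returning -1 for "not found" are choices made for this example.

```cpp
#include <cassert>
#include <vector>

// Loop-based: returns the index of target in the sorted vector a, or -1.
int binary_search_loop(const std::vector<int>& a, int target) {
    int lo = 0;
    int hi = static_cast<int>(a.size()) - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;  // avoids overflow of lo + hi
        if (a[mid] == target) return mid;
        if (a[mid] < target) lo = mid + 1;  // discard the left half
        else                 hi = mid - 1;  // discard the right half
    }
    return -1;
}

// Recursive: searches a[lo..hi] (inclusive bounds).
int binary_search_rec(const std::vector<int>& a, int target, int lo, int hi) {
    if (lo > hi) return -1;  // empty range: not found
    int mid = lo + (hi - lo) / 2;
    if (a[mid] == target) return mid;
    if (a[mid] < target) return binary_search_rec(a, target, mid + 1, hi);
    return binary_search_rec(a, target, lo, mid - 1);
}
```

Each step throws away half of the remaining range, which is where the logarithmic running time comes from.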

Sorting algorithms: bubble sort, insertion sort, selection sort.

Merge sort: merge operation
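
One way to write the merge operation is sketched below, along with a merge sort built on it. The names `merge_sorted` and `merge_sort` are made up here. This version copies into new vectors for clarity and does not merge in place.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Merge two already-sorted vectors into one sorted vector.
std::vector<int> merge_sorted(const std::vector<int>& left,
                              const std::vector<int>& right) {
    std::vector<int> out;
    out.reserve(left.size() + right.size());
    std::size_t i = 0, j = 0;
    // Repeatedly take the smaller front element. Using <= keeps equal
    // elements in their original order (stable).
    while (i < left.size() && j < right.size()) {
        if (left[i] <= right[j]) out.push_back(left[i++]);
        else                     out.push_back(right[j++]);
    }
    // One side is exhausted; copy whatever remains of the other.
    while (i < left.size())  out.push_back(left[i++]);
    while (j < right.size()) out.push_back(right[j++]);
    return out;
}

// Split in half, sort each half recursively, then merge.
std::vector<int> merge_sort(const std::vector<int>& a) {
    if (a.size() <= 1) return a;
    auto mid = a.begin() + static_cast<std::ptrdiff_t>(a.size() / 2);
    std::vector<int> left(a.begin(), mid);
    std::vector<int> right(mid, a.end());
    return merge_sorted(merge_sort(left), merge_sort(right));
}
```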

Quick sort: partition operation, choice of pivot, how it affects performance
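
A sketch of one common partition scheme (Lomuto, with the last element as the pivot) follows. The version from class may choose the pivot differently or partition from both ends, and the names `partition_last` and `quick_sort` are made up for this example.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Partition a[lo..hi] around the pivot a[hi]. Afterwards everything
// left of the returned index is < pivot and everything right of it is
// >= pivot.
int partition_last(std::vector<int>& a, int lo, int hi) {
    int pivot = a[hi];
    int i = lo;  // next position for an element smaller than the pivot
    for (int j = lo; j < hi; ++j) {
        if (a[j] < pivot) {
            std::swap(a[i], a[j]);
            ++i;
        }
    }
    std::swap(a[i], a[hi]);  // move the pivot into its final position
    return i;
}

void quick_sort(std::vector<int>& a, int lo, int hi) {
    if (lo >= hi) return;  // zero or one element: already sorted
    int p = partition_last(a, lo, hi);
    quick_sort(a, lo, p - 1);
    quick_sort(a, p + 1, hi);
}
```

With this pivot choice, an already-sorted input is the worst case: every partition puts all the remaining elements on one side, so the running time is quadratic. A pivot near the median splits the range roughly in half each time.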

Binary trees: tree rotations
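
The two rotations can be sketched like this. The node type `tree_node` and the function names are assumptions for the example. There are no parent pointers here, so each function returns the new root of the rotated subtree and the caller has to reattach it.

```cpp
#include <cassert>

struct tree_node {
    int value;
    tree_node* left;
    tree_node* right;
};

// Right rotation: the left child becomes the new root of the subtree.
// The in-order sequence of the nodes is unchanged.
tree_node* rotate_right(tree_node* root) {
    tree_node* pivot = root->left;
    assert(pivot != nullptr);    // need a left child to rotate right
    root->left = pivot->right;   // pivot's right subtree moves under root
    pivot->right = root;
    return pivot;
}

// Left rotation: the mirror image of rotate_right.
tree_node* rotate_left(tree_node* root) {
    tree_node* pivot = root->right;
    assert(pivot != nullptr);    // need a right child to rotate left
    root->right = pivot->left;   // pivot's left subtree moves under root
    pivot->left = root;
    return pivot;
}
```

Rotating right and then rotating the result left gives back the original tree.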

BST operations: find, insert, delete. Three cases for delete.
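
For reference, here is a sketch of the three operations on a BST of ints. The names (`bst_node`, `bst_find`, `bst_insert`, `bst_remove`) are invented for the example, and duplicate keys are ignored. The two-child case of delete uses the in-order successor. Using the in-order predecessor works just as well, and the version from class may do that instead.

```cpp
#include <cassert>

struct bst_node {
    int key;
    bst_node* left;
    bst_node* right;
    explicit bst_node(int k) : key(k), left(nullptr), right(nullptr) {}
};

bst_node* bst_find(bst_node* root, int key) {
    while (root && root->key != key)
        root = (key < root->key) ? root->left : root->right;
    return root;  // nullptr if the key is absent
}

// Returns the (possibly new) root of the subtree.
bst_node* bst_insert(bst_node* root, int key) {
    if (!root) return new bst_node(key);
    if (key < root->key)      root->left  = bst_insert(root->left, key);
    else if (key > root->key) root->right = bst_insert(root->right, key);
    return root;
}

// Returns the root of the subtree after removing key, if present.
bst_node* bst_remove(bst_node* root, int key) {
    if (!root) return nullptr;
    if (key < root->key) { root->left  = bst_remove(root->left, key);  return root; }
    if (key > root->key) { root->right = bst_remove(root->right, key); return root; }
    // Cases 1 and 2: no children, or exactly one child.
    // Replace the node with its only child (or nullptr for a leaf).
    if (!root->left || !root->right) {
        bst_node* child = root->left ? root->left : root->right;
        delete root;
        return child;
    }
    // Case 3: two children. Copy the in-order successor's key here,
    // then remove the successor from the right subtree.
    bst_node* succ = root->right;
    while (succ->left) succ = succ->left;
    root->key = succ->key;
    root->right = bst_remove(root->right, succ->key);
    return root;
}
```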