Midterm review
Hash functions:
Remainder treats the input string (byte sequence) like a base-256 number, and then takes that modulo m. m should be a prime, not too close to a power of 2. Note that if the keys are already numbers, then you just take them mod m and you’re done.
Multiplicative multiplies the return of the remainder hash by a floating-point constant, takes the fractional portion of the result, and then multiplies that by m and rounds it down (resulting in a value between 0 and \(m-1\)). The multiplier \(A\) has an effect on how well the algorithm performs. (I don’t expect you to memorize the multiplier we used.)
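The two hash functions above can be sketched as follows. This is only a sketch under stated assumptions: the names remainder_hash and multiplicative_hash are illustrative, m is assumed to fit in 32 bits, and the multiplier A = (sqrt(5) - 1)/2 is a common textbook choice that may well differ from the one we used. Instead of building the full base-256 number, the remainder hash reduces mod m after every byte, which gives the same result.

```cpp
#include <cmath>
#include <cstdint>
#include <string>

// Remainder hash: treat the bytes of `key` as digits of a base-256 number
// and return that number mod m. Reducing after each byte keeps the
// intermediate value small without changing the final result.
// Assumes 0 < m <= 2^32 so that h * 256 + byte fits in 64 bits.
std::uint64_t remainder_hash(const std::string& key, std::uint64_t m) {
    std::uint64_t h = 0;
    for (unsigned char byte : key) {
        h = (h * 256 + byte) % m;
    }
    return h;
}

// Multiplicative hash of a numeric key k into the range [0, m-1].
// A is an assumed multiplier, (sqrt(5) - 1) / 2; not necessarily the
// constant used in class. Assumes k is small enough to convert to a
// double without losing precision.
std::uint64_t multiplicative_hash(std::uint64_t k, std::uint64_t m) {
    const double A = 0.6180339887498949;
    double product = static_cast<double>(k) * A;
    double frac = product - std::floor(product);  // fractional part, in [0, 1)
    return static_cast<std::uint64_t>(frac * static_cast<double>(m));
}
```

For a string key, the multiplicative hash would be applied to the output of the remainder hash, e.g. multiplicative_hash(remainder_hash(key, p), m) for some prime p. Which modulus feeds the multiplicative step is an assumption here.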
If we have two different hash values then we know that their keys must be different. If we have two identical hash values then we don’t know anything, because collisions (different keys that hash to the same value) are possible.
Handling collisions:
Chaining: keep a linked list in every table entry, just add colliding entries to the list.
Open addressing: if an entry is full, look for an open entry somewhere else in the table, using a probe sequence. On the \(i\)-th probe, linear probing looks in entry \(\mathrm{hash}(k) + i\), quadratic probing looks in entry \(\mathrm{hash}(k) + a_0 i + a_1 i^2\), and double-hashing looks in entry \(\mathrm{hash}_0(k) + i\,\mathrm{hash}_1(k)\), all taken mod \(m\).
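The three probe sequences can be written as small index functions. This is a sketch, assuming the home slot (the key's hash value) has already been computed; the function names are illustrative.

```cpp
#include <cstddef>

// Index of the i-th probe (i = 0, 1, 2, ...) for a key whose home slot is h,
// in a table of m slots. Every scheme wraps around with mod m.

std::size_t linear_probe(std::size_t h, std::size_t i, std::size_t m) {
    return (h + i) % m;
}

// a0 and a1 are the constants of the quadratic offset a0*i + a1*i^2.
std::size_t quadratic_probe(std::size_t h, std::size_t i, std::size_t m,
                            std::size_t a0, std::size_t a1) {
    return (h + a0 * i + a1 * i * i) % m;
}

// h0 and h1 are two different hash values of the same key. h1 must be
// nonzero mod m, otherwise the probe never leaves the home slot.
std::size_t double_hash_probe(std::size_t h0, std::size_t h1,
                              std::size_t i, std::size_t m) {
    return (h0 + i * h1) % m;
}
```

In every scheme probe 0 is the home slot itself; an insert tries i = 0, 1, 2, ... until it finds an open entry.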
The load factor \(\alpha\) of a hash table is the ratio \(n/m\) of the number of elements stored, \(n\), to the size of the table, \(m\). With open addressing, each entry holds at most one element, so the maximum load factor is 1.0.
Hash functions should be
Deterministic: always hash a particular key to the same value
Fast (normally)
Well-distributed over the range m. No hash value should be dramatically more common than any other.
Amplify small differences in keys (avalanche effect)
Expression trees
Data structure for encoding the (recursive) structure of various kinds of expressions. (Each kind of expression will have its own expression tree type.) We build expression trees using inheritance: we have a base class for “all expressions of this type” and then subclasses for particular types of subexpressions. E.g., in arithmetic expressions, we have
Numbers
Variables
Unary minus
Infix operators
Parentheses
We definitely need a subclass for the first four, and maybe a fifth for parentheses, depending on what we are doing. (We might even choose to split the different infix operators into different subclasses.)
Example: A ring is a mathematical structure consisting of a set of values a, b, c, etc. together with two operations:
+, a binary operation
*, a binary operation
and two identified elements:
0, the identity for +
1, the identity for *
(Parentheses are also allowed.)
An expression tree structure for general rings would need:
A node for elements in general
A node for 0
A node for 1
A binary node for +
A binary node for *
A sketch of this would be something like
#include <string>

struct ring {}; // Base class
struct ring_elem : public ring { std::string e; }; // General element, by name
struct ring_zero : public ring { };
struct ring_one : public ring { };
struct ring_plus : public ring { ring* left; ring* right; };
struct ring_mult : public ring { ring* left; ring* right; };
(If I ask you to do this on a test, you don’t need to worry about constructors or methods.)
Example: Suppose we have a sequence of tokens of the following types:
Binary literals (e.g., 00101001)
The binary complement operator ~
The binary OR operator +
The binary AND operator *
The binary XOR operator ^
Open and closed parentheses
and the following grammar
bin -> LITERAL
bin -> '~' bin
bin -> bin OP bin
bin -> bin '~' OP bin
bin -> '(' bin ')'
where OP is any of the infix operators. Note that this grammar allows us to write things like 001 ~* 101, where ~* means NAND.
Given this grammar, write a recognizer, using the helper functions is_bin() (is a binary literal) and is_op (is an infix operator).
Give a sketch of an expression tree type (just the classes/structs and their inheritance and data members) for this expression type.
Suppose that the precedence of * is higher than that of + and ^, and the precedence of ~ is highest of all. (The precedence of the negated infix operators is the same as that of their unnegated forms.)
a) What precedence levels should be assigned to the different tree node types?
b) Rewrite the grammar so that it enforces these precedence levels.
Review of earlier material
Big-O complexity. E.g., binary search. Cutting the search space in half means logarithmic time.
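As a reminder of why halving gives logarithmic time, here is a minimal loop-based binary search. It is a sketch, assuming a sorted std::vector of ints; the name binary_search_index is illustrative. Each pass through the loop discards half of the remaining range, so a vector of n elements needs about \(\log_2 n\) passes.

```cpp
#include <vector>

// Loop-based binary search over a sorted vector.
// Returns the index of `target`, or -1 if it is not present.
int binary_search_index(const std::vector<int>& sorted, int target) {
    int lo = 0;
    int hi = static_cast<int>(sorted.size()) - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;      // avoids overflow of lo + hi
        if (sorted[mid] == target) return mid;
        if (sorted[mid] < target) {
            lo = mid + 1;                  // discard the left half
        } else {
            hi = mid - 1;                  // discard the right half
        }
    }
    return -1;
}
```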
Linked lists
class list {
public:
    struct node {   // struct, so that the members are public
        int value;
        node* next;
        node* prev;
    };
    node* head;
    node* tail;
};
Implement various operations. E.g., how to find the length of a list:
int length(const list& l) {
    list::node* curr = l.head;   // node is nested inside list
    int len = 0;
    while (curr) {
        len++;
        curr = curr->next;
    }
    return len;
}
Stacks: push, pop. Stack underflow: popping from an empty stack. Stack overflow: pushing onto a “full” stack (for stacks that have a maximum size). E.g., matching HTML tags: <html> with </html>. What happens if there are too many opening tags? Too many closing tags? Balanced tags?
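A sketch of the tag-matching idea, assuming the input has already been split into well-formed tags such as "<b>" and "</b>" with no attributes (the function name tags_balanced is illustrative). Too many closing tags shows up as an attempted pop on an empty stack; too many opening tags shows up as a non-empty stack at the end.

```cpp
#include <stack>
#include <string>
#include <vector>

// Returns true if every opening tag is closed by a matching closing tag
// in the right order. Assumes each element of `tags` is one well-formed
// opening ("<name>") or closing ("</name>") tag.
bool tags_balanced(const std::vector<std::string>& tags) {
    std::stack<std::string> open;   // names of tags not yet closed
    for (const std::string& tag : tags) {
        bool closing = tag.size() > 1 && tag[1] == '/';
        if (!closing) {
            open.push(tag.substr(1, tag.size() - 2));           // "<b>" -> "b"
        } else {
            std::string name = tag.substr(2, tag.size() - 3);   // "</b>" -> "b"
            // An empty stack here means too many closing tags (underflow);
            // a different name on top means the tags are mismatched.
            if (open.empty() || open.top() != name) return false;
            open.pop();
        }
    }
    return open.empty();   // leftovers mean too many opening tags
}
```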
Queues
Implementing stack on top of list.
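One way to do this, sketched with a singly linked list for brevity (the list class earlier in these notes is doubly linked; the names stack_node and list_stack are illustrative): the head of the list is the top of the stack, so push and pop are both constant time.

```cpp
#include <stdexcept>

// Singly linked node, assumed for illustration.
struct stack_node {
    int value;
    stack_node* next;
};

// Stack stored as a linked list: the head of the list is the top.
class list_stack {
public:
    list_stack() = default;
    list_stack(const list_stack&) = delete;             // no copying, to
    list_stack& operator=(const list_stack&) = delete;  // avoid double frees
    ~list_stack() { while (!empty()) pop(); }

    bool empty() const { return head == nullptr; }

    // The new node becomes the head of the list.
    void push(int v) { head = new stack_node{v, head}; }

    // Removes and returns the head; throws on stack underflow.
    int pop() {
        if (empty()) throw std::runtime_error("pop from empty stack");
        stack_node* old = head;
        int v = old->value;
        head = old->next;
        delete old;
        return v;
    }

private:
    stack_node* head = nullptr;
};
```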
Binary search: implementation, loop based, recursive
Sorting algorithms: bubble sort, insertion sort, selection sort.
Merge sort: merge operation
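A sketch of the merge operation, assuming the two inputs are already-sorted vectors of ints and that the result goes into a new vector (the name merge_sorted is illustrative). Merge sort splits the input in half, sorts each half recursively, and then merges the halves like this.

```cpp
#include <cstddef>
#include <vector>

// Merge two already-sorted vectors into one sorted vector.
std::vector<int> merge_sorted(const std::vector<int>& a,
                              const std::vector<int>& b) {
    std::vector<int> out;
    out.reserve(a.size() + b.size());
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        // Taking from `a` on ties keeps the merge stable.
        if (a[i] <= b[j]) {
            out.push_back(a[i++]);
        } else {
            out.push_back(b[j++]);
        }
    }
    // At most one of these loops runs: copy whatever is left over.
    while (i < a.size()) out.push_back(a[i++]);
    while (j < b.size()) out.push_back(b[j++]);
    return out;
}
```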
Quick sort: partition operation, choice of pivot, how it affects performance
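A sketch of one common partition scheme (Lomuto, with the last element as the pivot). This is an assumption for illustration and may not be the exact scheme we used; the names lomuto_partition and quick_sort are illustrative. With this pivot choice, an already-sorted input is the worst case: every partition puts all the remaining elements on one side, giving \(O(n^2)\) time.

```cpp
#include <utility>
#include <vector>

// Lomuto partition of v[lo..hi] (inclusive), using v[hi] as the pivot.
// Afterwards everything left of the returned index is < pivot and
// everything right of it is >= pivot.
int lomuto_partition(std::vector<int>& v, int lo, int hi) {
    int pivot = v[hi];
    int boundary = lo;   // first slot not yet known to hold a value < pivot
    for (int k = lo; k < hi; ++k) {
        if (v[k] < pivot) {
            std::swap(v[k], v[boundary]);
            ++boundary;
        }
    }
    std::swap(v[boundary], v[hi]);   // put the pivot in its final place
    return boundary;
}

// Sorts v[lo..hi] (inclusive) by partitioning and recursing on both sides.
void quick_sort(std::vector<int>& v, int lo, int hi) {
    if (lo >= hi) return;            // zero or one element: already sorted
    int p = lomuto_partition(v, lo, hi);
    quick_sort(v, lo, p - 1);
    quick_sort(v, p + 1, hi);
}
```

To sort a whole vector, call quick_sort(v, 0, static_cast<int>(v.size()) - 1).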
Binary trees: tree rotations
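A sketch of the two rotations, assuming a minimal node type without parent pointers (tree_node and the function names are illustrative). A right rotation lifts the left child to the top of the subtree; a left rotation is the mirror image and undoes it. Both preserve the in-order sequence of the nodes.

```cpp
// Minimal binary tree node, assumed for illustration.
struct tree_node {
    int value;
    tree_node* left;
    tree_node* right;
};

// Right rotation around y: y's left child x becomes the subtree root,
// y becomes x's right child, and x's old right subtree moves under y.
// Precondition: y and y->left are non-null.
tree_node* rotate_right(tree_node* y) {
    tree_node* x = y->left;
    y->left = x->right;   // x's right subtree sits between x and y in order
    x->right = y;
    return x;             // caller re-links x where y used to hang
}

// Left rotation around x: the mirror image of rotate_right.
// Precondition: x and x->right are non-null.
tree_node* rotate_left(tree_node* x) {
    tree_node* y = x->right;
    x->right = y->left;
    y->left = x;
    return y;
}
```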
BST operations: find, insert, delete. Three cases for delete.