Vectors and Arrays

Next we’re going to look at two new types, that allow us to store collections of data. Often, we don’t know how many of a thing we will need; we need some kind of structure that can contain an arbitrary (and sometimes changing!) number of items. Vectors and arrays satisfy this need, in different ways. When combined with loops, vectors/arrays give us a way to store and manipulate unknown amounts of data.

One useful way to think of a vector or an array is as a table. For example, suppose we want to store a list of names, like this

Number	Name
0	David
1	Prudence
2	Todd
3	Ming
4	Alex
5	Shikego
6	Brent
7	Jin

(These are the faculty from my grad school.)

The first column of a vector/array-ish table is always a number, it always starts at 0, and it always counts up by 1. The second column (only 2 columns for now) is the data we want to store.

A vector/array associates a number with a piece of data. The primary operation is looking up a piece of data, from its number. E.g., “what is the name in row 5?”. We can also do things like ask “what is the first name” (i.e., what is the name in row 0), “how many names are there” (what is the number of the last row, plus one). Vectors support the additional ability of adding new rows at the end, and inserting or deleting rows in the middle; the numbering is automatically adjusted. Arrays, for the time begin, have a fixed number of elements.

Vectors

In order to use a vector, you must add

#include <vector>

to the top of your file.

The vector type is a compound type, which means it’s built out of other types. In this case, we have to tell vector what we want it to store: do we want a vector of ints, or a vector of strings, etc. Thus, the type for a vector looks like this:

vector<int>           // a vector of ints
vector<string>        // a vector of strings
vector<vector<float>> // a vector of vectors (!) of floats

You declare a variable of type vector as with any other type:

vector<int> data;     // Creates an empty vector
vector<int> items(3); // Creates a vector with space for 3 ints
vector<string> items(3,"No name"); // Vector with "fill"
vector<float> samples = { 1.0, 2.0, 4.5, 6.6 }; // Give initial contents

The last one corresponds to the table:

Number	Sample
0	1.0
1	2.0
2	4.5
3	6.6

Internally, a vector looks very much like a string: it has internal boxes, numbered starting at 0, that hold, not just chars, but whatever the “element type” is (int, float, etc.). Just as with a string, we can access a particular element, given its row number, with .at:

vector<int> samples = { 1.0, 2.0, 4.5, 6.6 };
samples.at(3)       // evaluates to 6.6
samples.at(0) = 5.2 // sets first element to 5.2

(Demonstrate these on the table)

Vector supports some of the same operations as string:

Operation	Description
`v.size()`	Length of the vector, as an int
`v.insert(v.begin()+pos,w);`	Insert a vector/element, after `pos` position
`v.erase(v.begin()+start,v.begin()+end);`	Remove the portion from `start` up to `end`
`v.at(pos)`	Extract/modify a single element, safely
`v[pos]`	Extract/modify an element unsafely

Examples:

samples.size() // = 4
samples.erase(1,2);

If we have

vector<float> more_data = {-1.2, 0.3};

Number	Data
0	-1.2
1	0.3

Then we can do

samples.insert(samples.begin()+1, more_data);

You can also insert a single value:

samples.insert(samples.begin()+2, 1.5);

Number	Sample
0	1.0
1	-1.2
2	0.3
3	2.0
4	4.5
5	6.6

More operations:

Operation	Description
`v.front()`	Gives the first element of `v`
`v.back()`	Gives the last element of `v`
`v.push_back(e)`	Add another element `e` to the end of `v`
`v.pop_back()`	Remove the last element of the vector
`v.empty()`	Returns `true` if `v`‘s size is 0
`v.assign(s,e)`	Set the size to `s` and then fill with `e`

An .assign() is basically like doing a .clear() and then a .resize().

Examples: (demo all)

samples.front()  //  1.0
samples.back()   // 6.6
samples.push_back(6.7);
samples.pop_back();
samples.empty()  // false

Vectors support an operation for changing their size: v.resize(n).

If n is smaller than the current size, then the elements at the end are simply thrown away. (Resizing always happens at the end of the vector.)
If n is larger than the current size, then new “default” elements are added to pad out the end. E.g., the default element for int is 0, for string it is the empty string, etc.
If n is larger you can optionally specify the new element to be used for the padding, via
```
 v.resize(n, new_element);
```
Enough copies of new_element will be added to the end to fill it out to the new size n.

Interestingly, you can use all the comparison operators on vectors:

vector<int> v1 = {1, 2, 3};
vector<int> v2 = {2, 3, 4};
vector<int> v3 = {1, 1, 3};

v1 == v1 // true
v1 == v2 // false
v1 != v2 // true
v1 < v2  // true
v1 > v3  // true

The ordering comparisons proceed lexicographically, which means they first compare elements 0; if the 0 elements are different, then that gives the result of the comparison. If the 0 elements are the same, it then checks the 1 elements, and so forth. (This is basically just the usual dictionary ordering, extended to things other than words.)

I’m going to show you an example of how to read an arbitrary number of things into a vector:

#include <iostream>
#include <vector>
using namespace std;

int main() {
    vector<int> values;

    cout << "Enter values, press Ctrl-D when finished." << endl;

    int v;
    while(cin >> v) {
        cout << "You entered " << v << endl;
        values.push_back(v);
    }

    cout << "You entered " << values.size() << "values." << endl;
    return 0;
}

This program uses a loop to keep reading input from the user until it reaches the end of the input stream (Ctrl-D means “end of input”). Although we haven’t used it, whenever we do cin >> ... it actually not only reads from the input, it evaluates to a true/false-like value. The value is true if the input was successful, and false if the input stream has nothing else to give. (We can end the stream by pressing Ctrl-D.)

(Walk through example, constructing the rows as we go.)

Now for a puzzle: can we write a program that reads in a vector’s elements, and then prints them out, one by one? There are a couple of ways of doing this:

#include <iostream>
#include <vector>
using namespace std;

int main() {
    vector<int> values;

    cout << "Enter values, press Ctrl-D when finished." << endl;

    int v;
    while(cin >> v) {
        cout << "You entered " << v << endl;
        values.push_back(v);
    }

    cout << "You entered " << values.size() << "values." << endl;

    while(!values.empty()) {
        cout << values.back() << endl;
        values.pop_back();
    }

    return 0;
}

Notice that this prints them out in reverse order, because we add new elements to the back, but then we remove from the back as well.

(Demo this with some sample data, growing and shrinking the table as we go.)

Or, if we don’t want to destroy the vector in order to print it out, we could just use a for loop:

for(int i = 0; i < values.size(); ++i)
    cout << values.at(i);

Let’s work through some more examples of using vectors:

Example: Prompt the user for three ints a, b, and c and then creates a vector that contains (in order) a copies of a, b copies of b, and c copies of c, and then prints the size of the vector.

Example: Using two vectors and the while loop above, write a program that inputs a vector and then reverses it.

Example: Without using size, write a loop which counts the number of elements in a vector.

Arrays

Arrays are like low-level vectors. They don’t support any operations beyond looking up elements by position, and their size is fixed when they are created. While vectors are “better” than arrays, it’s important to understand arrays, as they are the basis for many advanced kinds of structures (vectors, in fact, are built from arrays internally).

The syntax for creating an array comes in a couple different forms:

int a[] = {1, 2, 3}; // Size determined from initial contents;
int b[3];            // No initial contents -> must specify size

(Note that the initial contents of b are undefined, just as with any other variable which we do not initialize.)

Either way, you can’t change the size of an array after it’s created. Arrays also lack all the “fancy” ways of manipulating elements. Pretty much the only thing you can do with an array after it’s created is access its elements, by index:

a[0] // = 1
a[2] = 3;

While vector will signal an error if you try to access an element outside of its range (i.e., v.at[-1] or v.at[v.size()+1]) an array is unsafe; this means that you can access outside its range, but the results will be undefined.

If you read outside of an array, you will get whatever happens to be in the computer’s memory at that location, which is unpredictable.
If you write outside of an array, who knows what will happen. Your program may crash (most likely), or may break itself in strange ways. You can’t break your computer (thankfully) or other programs, but your own program may die a horrible death.

Most of the time, accessing outside an array will cause a “SIGSEGV” error. Unfortunately, the system won’t give you any information about why this happened. If you want all the details, run your program via
```
 catchsegc ./myprogram
```
It will print out a lot of information about what was going on; if you scroll up, it will have the line number where the error occured.

To demonstrate the latter:

int main() {
    int a[4]; 
    a[10] = 100;

    return 0;
}

(We don’t need the #include or using namespace std; because we’re literally not doing anything; arrays and ints are built-in.)

Compile and run via catchsegv to demo, point out where to get the line number.

Two-dimensional vectors and arrays

Sometimes we need to store a two-dimensional arrangement of data. For example, in order to store an image, we must store the color at each pixel, and pixels are arranged in a two-dimensional grid. In general, two-dimensional vectors/arrays can be visualized as grids.

(draw example)

However, as we will see, the optimal arrangement for this is quite different, internally.

There are two ways to store a two-dimensional array/vector:

As a vector of vectors (or array of arrays)
As a one-dimensional array, with a coordinate transform

Constructing an array of arrays is relatively easy:

int a[8][8]; // 8x8 array of arrays

Likewise, a vector of vectors can be constructed, somewhat more obscurely, as

vector<vector<int>> b(8, vector<int>(8));

(We are actually using the “fill” method, where each of the 8 rows is filled with 8 vector<int>(8)s; i.e., with empty rows of length 8.)

To access a particular element x,y we just write

a[x][y]       // for arrays
b.at(x).at(y) // for vectors

Most of the vector operations don’t make sense for a vector-of-vectors (at least, not for one where we intend to keep the size fixed, and all the rows the same length).

An array-of-arrays will contain undefined data when it is first created. A vector of vectors will contain the “default value” for its element type; for numbers this is 0. Thus, vectors are somewhat safer in that it is impossible to use them “uninitialized”.

Doing anything interesting with two (or more) dimensional arrays/vectors generally requires loops, which we’ll cover later.

Storing an array-of-arrays (or a vector-of-vectors) has a few disadvantages

There’s a size penalty. Each array/vector requires some additional data, beyond just that required for the things in it. Thus, storing an $m \times n$ array requires more than $mn$ space (it’s actually closer to $mn + m)$, with the extra data for each row). Normally this isn’t big enough to worry about, but it’s worth taking into consideration.
A vector-of-vectors raises the possibility that the rows may come to have different lengths. This is not something we want. We want to think of a 2D vector as a $m \times n$ grid, and that doesn’t work if some of the rows are shorter or longer than others. Removing this possibility would be good.

The other way of storing an $m \times n$ grid is as an array/vector with exactly $mn$ elements. How can we map x and y to the index into this? Well, time for a puzzle:

Draw 3x3, 4x4, 5x5 grids with elements numbered sequentially. Given a particular x and y, how can we compute the i (index), written in each cell?

The solution is

i = x + y*width;

Note that we don’t care how tall the grid is, just how wide.

This arrangement is called row-major ordering, because the elements of the rows are listed in order. (in order to go left/right in a row, you just move to the next/prev element. In order to move down/up a column, you have to skip ahead/back by width.) Also note that, unlike the way we usually draw a 2D coordinate space, this layout has positive Y moving down.

The row-major ordering is common for images, games, etc. It trades a bit of (easy) math for better space usage.

We can actually extend row-major ordering to 3 (or more) dimensions. E.g., if we have x, y, and z, along with width, height, and depth then we have the computation

i = x + y*width + (width*height)*z

The general computation, if we have coordinates $d_1, d_2, \ldots$ and dimensions $D_1, D_2, \ldots$ is

$$i = d_1 + d_2 D_1 + d_3 D_1 D_2 \ldots + d_n (D_{n-1} D_{n-2} \ldots D_2 D_1)$$

Another puzzle:

Given i and width, can you recover x and y?

Parallel arrays and vectors

If an array or vector corresponds to a table with two columns (one of which is always “Number”) then parallel arrays/vectors give us a way to add more columns.

The idea of a parallel vector/array is when we need to associate more than one piece of information with a particular index. Normally, a vector or array associates a single piece of data (e.g., a string, float, int, etc.) with each index. Suppose, for example, we want to maintain a table of Fullerton College faculty, with names and a bool indicating whether or not they have tenure. A parallel vector for this might look like:

vector<string> names =    { "Andy Clifton", "Scott Edwards", "Caleb Petrie" };
vector<bool> has_tenure = { false,          true,            false };

Each particular index (e.g., 0) is assigned to a single “record”, so names[0] contains my name, while has_tenure[0] says whether or not I have tenure (I don’t). This corresponds to the table:

Number	Name	Has Tenure
0	Andy Clifton	`false`
1	Scott Edwards	`true`
2	Caleb Petrie	`false`

Later on we’ll see a much better way of organizing “aggregate” data like this, but for now, it gives us a good way of associating things together: just make sure the associated items are all at the same index (number).

In general, we’ll assign the number of the thing we are interested to a variable (e.g., i), then we can get all the associated information with

names[i]
has_tenure[i]
office[i]
// etc.

Remember that there’s no relationship between elements with different indices. E.g., it makes no sense to look at

names[i]
has_tenure[i+1]
office[i-1]

On the other hand, since an index identifies a particular person (in this example), we can easily create another column that “points to” a person. Suppose we wanted to store each person’s supervisor. The other person is identified by their own number, so we have the table:

Number	Name	Has Tenure	Supervisor
0	Andy C.	`false`	1
1	Scott E.	`true`	…

Thus, we can easily build “tables” where the rows refer to each other.

Like I said, later on, we’ll see a much better way of building things like this. We will esentially create our own “type” (just as int and float and string are types) faculty that will contain within it all the information associated with a particular faculty member. Thus, instead of being scattered over several different arrays, we will just create an array of faculty and each element of that array will have everything we need for that person. Really, in C++ there’s almost no reason to use parallel arrays; I’m only showing them to you so you know what they are, if someone refers to them.

Processing vectors with loops

If we don’t necessarily know how big a vector is, how can we do something to all the elements in it? Simple: we write a loop that is based on the indices of the elements in the vector. Say a vector has 12 elements (i.e., v.size() == 12). Then we know that the first element has index 0, and the last has index 12 - 1 = 11, so a loop that goes from 0 to v.size()-1 will process everything in the vector. Furthermore, if we define it this way, it will work regardless of what the size is.

Because a for loop is the usual way to expression a numeric from-to loop, you should get used to writing loops like this:

vector<string> names = {"Andy", "Mark", "Scott", "Steve"};
for(int i = 0; i < names.size(); i++) 
    cout << names.at(i) << endl;

(I used i < names.size() as the condition, but I could just as well have used i <= names.size() - 1. It’s just that the first one is less typing.)

The range-based `for` loop

The ranged-based for loop works with vectors, too! Suppose you want to print out all the elements of a vector. One way to do this is to write a for loop that starts at 0 and goes to v.size()-1 (or just less than length). This looks like

vector<int> values = ...;
for(int i = 0; i < values.size(); ++i)
    cout << values.at(i) << endl;

This sort of thing is so common that it’s useful to have a consise way to express “loop over all the contents of this container”. And there is:

for(int v : values)
    cout << v << endl;

Accumulating loops

A general term for loops that compute sums, products, etc. is an accumulating loop. These have the general structure

accumulator = identity_value;
for(element : collection)
    accumulator OP= element ...;

Note that the loop need not be ranged-for. I just wrote it that way for clarity. E.g., the “collection” might be “all the integers between 1 and 100”, in which case a normal for-loop would be much more suitable.

For example, to sum all the elements of a vector we would do

int sum = 0;
for(int e : data)
    sum += e;

The identity value depends on the operator OP, the idea is to give a value that will have no effect on the calculation. That is, the identity value is the value i such that $i \mathrm{OP} a = a$ for any a. For +, the identity value is 0, because $0 + a = a$. For multiplication it is 1.

What is the identity value for &&? Accumulating loops where OP is && usually express the idea that all of the elements of some collection have some property.
What is the identity value for ||? Loops where OP is || express the idea that any of the elements of a collection fill some condition.

There’s no special syntax for accumulating loops, but it can help you to think about what’s going on if you recognize a loop as one that is accumulating.