Monday, November 17, 2014

Arrays of Golden Sun

We've seen processing of simple integer values already, but what if you have to deal with a bunch of them?  Perhaps you have 20 test results and want to do something with them.

I suppose we could write something like this:

int testResult1;
int testResult2;
int testResult3;
...

Then we could write a calculateAverage function...

int calculateAverage(int testResult1, int testResult2, int testResult3..

You know, I'm already tired of typing that, and I've got a whole lot more to go.  And you just know that next year there will be 22 tests, right?

Never fear, we aren't stuck with such awfulness.  Java (as with most languages) supports the concept of arrays.  Don't fear the word, the concept is actually pretty simple to grasp.

First, let's go back and deal with testResult1.  We can think of that variable as being a box that holds a number.  We can put a number in the box, or we can ask the box what number is in it.

testResult1 = 97;  //We've just put 97 into the box

System.out.println("Your first test result was " + testResult1 ); // We've just asked the box what number is in it.

All the operations boil down to one of those.  If we do math, we're just asking what's in the box.  If we reassign the number (like 'testResult1 ++') we're really just asking what's inside, making a new number and putting the new one into the box.  Whatever was in the box before is gone once you do this, there's no room for two numbers.  These are pretty simplistic boxes.

So, I can conceive of 'testResult1' through 'testResult20' as being a row of 20 boxes, each of which I can use to store or retrieve a number.  But man, it's going to be annoying to do much useful work with individually addressable variables like that.

Enter the array.

Let's just take that same row of 20 boxes and relabel them.  In fact, let's glue them together in a long line and give the whole spiel one single label.  We'll call it 'testResults'.

Of course, testResults is not an int.  It's a grouping of 20 ints.  That's going to make it slightly more difficult to do math on it.  Enter the square bracket and the array index.

If we want to refer to the item in the first box, first we have to get over a little hump.  For reasons we do not really need to get into now, just trust me that they're valid, the first box is number 0, not 1.  So, what we previously would have called 'testResult1' can now be called 'testResults[0]'.  It is important to understand that 'testResults' is NOT an integer.  However, testResults[0] (or testResults[13]) IS an integer and is just a marker for one of the boxes in our row.

I know, that seems a bit more difficult at first, but it turns out not to be.  Here's why:

Remember before?  We had to declare testResult1, testResult2, etc. on individual lines?

Well now we can just do:

int [] testResults = new int [ 20 ];

We've specified that testResults is not an int, but an array of ints with the '[]' notation.  We've made it have 20 boxes with 'new int [ 20 ]'.  (This can be done live at runtime, you could do 'new int [ someCalculatedValue ]').  In one fell swoop we've made storage space for 20 individual (but related) integer values.  It gets better.

The 'calculateAverage' function can now be declared like this:

int calculateAverage( int [] valuesToAverage ) {

Well isn't that a whole lot easier?  And you can imagine that before, that calculateAverage method would have had to look something like this:

return (value1 + value2 + value3 + ... + value20) / 20;

Which seems like it takes up a lot of valuable real estate on screen, and, as mentioned before, is irritating to change.

Our new version takes advantage of the fact that the array is indexed (it could do more but I'll talk about that later):

float total = 0;

for (int i = 0; i < 20; i ++ ) {
    total = total + valuesToAverage[i];
}
return total / 20f;

That's the whole thing.  I didn't even get tired.

Of course, this can be improved further.  What if there aren't 20 values in the array?  What if there are 12?  We could change the code, or we could let the array tell us how big it is:

float total = 0;
float valueCount = valuesToAverage.length;

for (int i = 0; i < valueCount; i++ ) {
    total = total + valuesToAverage[i];
}
return total/valueCount;

Now we have a completely generic method for averaging any number of integer values we want, that we will never have to change again.  All we need to do is make sure before using it that the array we're passing in is sized correctly and populated.  We can stash that into a utility class somewhere and use it for years.

Naturally none of this is going to work without stashing numbers in the array in the first place.

One can initialize an array with hard coded values:

int [] testScores = { 100, 95, 22, 84 };

Or one can write code to set the values.  Assuming we have appropriate helper functions in place, this kind of thing could work.  It's more likely that you'd be getting input from the user or a file for your early experiments with the language.  I think we'll talk about those soon.

int [] testScores = new int[ getNumberOfScores() ];
for ( int i = 0; i < getNumberOfScores(); i++ ) {
    testScores[i] = getScore(i);
}

Of course, that's just the tip of the array iceberg, but I think it gets across the main points:

Arrays:

  1. Are just ordered, indexed lists of the data types we already understand
  2. Can be passed as single labels but all the internal values are readily available
  3. Contain 'meta' information about their contents, particularly how many items they hold
  4. Can make many data processing tasks easier to code and understand
If you're having trouble with this concept, please let me know.  It's good to have a firm grounding in basic aggregate data structures like this before we move on to more complex topics.  Much of what we do in programming involves understanding some concept, then not using it directly, but instead using higher level structures that are better handled if you know what's going on underneath.

View Code