Sunday, December 21, 2014

Old McDonald had a File, IO IO IO

One of the most basic and common functions we programmers have to implement involves reading and writing data files. So get used to doing so. The good news is that Java is well equipped with many library functions for doing this. The bad news is exactly the same. There are many library functions, so many that you can get lost in them. In fact, just when we thought we had a handle on the 'java.io' package, along came 'java.nio' (New I/O) to cloud the picture even more. Even an experienced programmer can get a little lost in the multiple classes that are in place for dealing with persistent data, and I certainly don't want to overwhelm my audience, so I'm going to stick to some basics.

 First off, I'm going to pretend that 'new I/O' never happened. I'm also going to offer you only a subset of all the functionality that's out there even in the java.io library. When it comes down to it, when you've got a file to read or write, you want to be able to just do so, without a lot of muss or fuss.

I am going to have to assume that you're reasonably familiar with directories (or folders, if you must) and files as seen by your operating system. I'm going to assume that you understand what a text file is, and that you feel reasonably comfortable with viewing them or editing them with notepad, or UltraEdit, or whatever your choice of text editor might happen to be.

First, a note on file types. There are of course a huge number of file formats out there, to support applications from AutoCAD to PhotoShop to Visual Studio, but all files can be pretty much divided into to major categories: Text and Other. Text files are broken down into lines, and many programs deal with them on a line by line basis. Other files tend to consist of plain old blocks of information, whether those be pixels for an image, or a song or movie, or quite literally anything else you can save.

Java supports both major file types of course, and has objects specifically for dealing with each. When dealing with text files it's very convenient to read or write things one line at a time, so we have that ability in our libraries.

We've used an object we call 'PersonData' before, and we'll make use of it again now.  Suppose we had a file in which we had a bunch of last names and first names stored, for now we'll assume that last names and first names each appear on a line, so the file would look something like this:

PEOPLE.TXT

    Schmoe
    Joe
    Doe
    John
    Smith
    Bill

So we've got the information here to support three PersonData objects, all contained within a file called PEOPLE.TXT.

If we wanted to read the names from this file and create PersonData objects, it's conceptually pretty simple:

Open the file for reading
As long as there's something to read
    Read a last name
    Read a first name
    Create a PersonData object with those values

That makes sense, right?

Enter the Scanner.  The Scanner object is pretty new, and frankly I'm still not completely used to it.  But it's so darned handy I have no excuse not to use it.  The Scanner has several different constructors, but we're going to make use of the one that uses the File object right now.

First, let's define the file that we intend to read.  This sets us up for the next step:

File personFile = new File("PEOPLE.TXT");

Note that this isn't something that will fail, even if PEOPLE.TXT does not actually exist.  Maybe we intend to create it next, or we want to test for its existence.  The File object is handy that way.

Scanner personScanner = new Scanner(personFile);

This can fail.  If personFile refers to a non-existent file bad things can and will happen starting here.  Proper code will be wrapped up in try/catch blocks as I discussed when we talked about exception handling.

At this point we have covered the first line of our pseudo-code algorithm:  "Open the file for reading".  The next part was "As long as there's something to read", which a Scanner can handle quite easily:

while (personScanner.hasNextLine()) {

This loop will now run until we're out of lines.  That's pretty convenient.  Now let's extract the last and first names:

    String lastName = personScanner.nextLine();
    String firstName = personScanner.nextLine();

Yes, there's a weakness or three here.  We're making the assumption that our file is correctly formatted and complete, with two lines for every person.  Good coding practice would include checking 'hasNextLine' in between those two lines and dealing with it if that isn't the case.

We're actually almost done here.  Once we've got the last name and the first name, we can go ahead and create a PersonData object:

    PersonData personFromFile = new PersonData( firstName, lastName );

At this point, technically we can end the loop.  We've done the specified job, although realistically this is useless.  There's not much point in reading a data file only to toss away the data.  You'd want to do something with these 'personFromFile' objects, like adding them to a List or Map, or storing them in some other sort of data structure your program will use.

But for now, we'll just pretend we're doing that and close down the file:

}
Scanner.close();

And that's about all there is to reading text files since Java 7 came out.  It was a bit more difficult before, when we had to create a couple of other objects in place of Scanner, and if the data included numbers or multiple items on one line it was even more involved.

Let's say that the file was formatted like this instead:

PEOPLE.TXT

    Schmoe Joe
    Doe John
    Smith Bill

That seems reasonable, right?  It is, and it's pretty easy to deal with.  All we'd need to do is change the following two lines from this:

    String lastName = personScanner.nextLine();
    String firstName = personScanner.nextLine();

To this:

    String lastName = personScanner.next();
    String firstName = personScanner.nextLine();

And we will have accomplished our goal.  We could also have done this:

    String lastName = personScanner.next();
    String firstName = personScanner.next();
    String anythingLeftover = personScanner.nextLine();

What is important is to realize that 'nextLine' is critical to advance us from one line to the next, so we have to use it before we go through the loop again.

Before I go, I also want to mention that Scanner is capable of reading not just Strings, but also any of the primitive data types as well.  It supports nextInt(), nextDouble(), etc.  It's also capable of reading more complicated input via the use of Regular Expressions, which are a huge topic in their own right.  Regular Expressions essentially provide an entire language for matching complicated text, and an entire series of articles could no doubt be written just on that topic alone.

In our next episode, we'll talk about writing text files, which is just the reverse of what we've done here.

Monday, December 15, 2014

Inheritance Part 2: That's Rather Abstract, Isn't it?

I hope you've been browsing through Java source code outside of your own and the small samples I've provided on this site.  Reading and trying to understand code can only help you in the long run.  You will encounter different code styles, you will see interesting data structures, API calls you didn't know existed, all sorts of stuff.

Today I'm going to talk about something you might have seen, but if you haven't, it's no big deal.  It is the keyword abstract.

We use abstract as part of the definition of a class, which indicates that while it is in fact a valid class file, nobody is allowed to create an object of that type.  That doesn't mean that it's useless though.  Remember, we have inheritance on our side.  You can create concrete classes (concrete is just the opposite of abstract, indicating a class from which you can create objects.  No special keyword is needed for that) that extend abstract classes all you want.

We also use abstract as part of method signatures when we want to create the idea of a particular method call but wish to defer the actual implementation of said method to our concrete classes.  This is similar in concept to interfaces, and sometimes it's more of a personal decision as to which you will use.

You see a lot of abstract classes in use when you look through frameworks, which are basically bodies of code designed to accomplish some function but with the details left up to the final developer.  There are business frameworks, logging frameworks, GUI frameworks, reporting frameworks, the list is very long.  Usually the trigger for creating a framework is when someone realizes they've just written the same thing for the fifth or tenth time, and that the differences between the various programs are relatively minor.  Maybe you query a database, format a report and write it to a file, and it's always the same except for the details of which fields you look for and print.

This is a great application for abstract classes.  Let's take a look at a simple one:

public abstract class Report {
}

Well, that's about as simple as it gets.  To be fair it's also about as useless as it gets but never fear, we'll fill in some details later.  Actually it's not entirely useless.  You might want to have a set of classes that all share a common base class so that you can collect them into a Collection of some sort, although you could also make this work with an interface.  Abstract classes really come into their own when you include some code within them.  So let's write something.

public abstract class Report {
    public abstract void GenerateOutputLine(String dataLine);
    public abstract String ReadData();

    public void runReport() {

        String dataLine = null;
        while( null != ( dataLine = ReadData() ) ) {
            GenerateOutputLine(dataLine);
        }
    }
    
}

OK, that was simple enough.  We have defined not only our abstract Report class, but made it do a little bit of work for us.  It calls ReadData until that method returns a null, and then takes the value returned from ReadData and passes it to GenerateOutputLine.  Of course, you can't actually run this report, you need to create at least one concrete class based on it.  That could be a simple test driver designed to make absolutely certain that your framework behaves properly with known data.  It could be something that reads from one file and writes to another, which would basically give you a 'copy file' command.  It could be something that reads one file, does some calculations or modifications, and then spits out the changed data.  The options are nearly endless, despite the fact that this is a highly simplified example.

Let's write a simple test driver:

public class ReportTester extends Report {

    public static void main(String[] args) {
        ReportTester rt = new ReportTester();
        rt.runReport();
    }

    String [] dataLines = {
        "one", "two", "three", "four", "five"
    };
    int lineNumber = 0;

    public void GenerateOutputLine(String dataLine) {
        System.out.println(dataLine);
    }

    public String ReadData() {
        String toReturn = null;
        if (lineNumber < dataLines.length) {
            toReturn = dataLines[ lineNumber++ ] ;
        }
        return toReturn;
    } 
}

Now if we were to run 'ReportTester' we'd get the output:

one
two
three
four
five

The main driver of the program is still in that 'Report' superclass, but the implementation details have been kept in ReportTester.  We run the code in Report, and it calls the methods it finds in ReportTester to do the work.  ReportTester itself is quite simple.  In addition to a simple main that does nothing more that create ReportTester and run it (actually running that method from the superclass Report), it defines the specific implementations for our two abstract methods.  One method does nothing more than write to the console and I hope I don't need to explain that.  The other goes through an array of Strings, returning the next one in the array and increasing the index value each time it's invoked until it reaches the end of the list, at which time it returns a null which triggers the end of the loop in 'runReport'.

This is of course tremendous overkill for a program of this size, but as your projects get more involved, the ratio of abstract code to concrete code is likely to change quite a bit.  System maintenance gets easier and faster, and your software gets more robust.  After all, a bug fix to a framework may take care of issues across several dozen different specific implementations of different reports.

I will of course revisit this topic later, because we've only scratched the surface of the possibilities brought on by object orientation.  With enough small components intelligently built, one can assemble new programs almost like putting together Tinker Toys or Lego.  One's systems can be defined in large part by configuration files, and changed easily at will, all without writing, testing or deploying new code.