The Seductions of Scala, Part II - Functional Programming 203

Posted by Dean Wampler Wed, 06 Aug 2008 01:32:00 GMT

A Functional Programming Language for the JVM

In my last blog post, I discussed Scala’s support for OOP and general improvements compared to Java. In this post, which I’m posting from Agile 2008, I discuss Scala’s support for functional programming (FP) and why it should be of interest to OO developers.

A Brief Overview of Functional Programming

You might ask, don’t most programming languages have functions? FP uses the term in the mathematical sense of the word. I hate to bring up bad memories, but you might recall from your school days that when you solved a function like

    
y = sin(x)
    

for y, given a value of x, you could input the same value of x an arbitrary number of times and you would get the same value of y. This means that sin(x) has no side effects. In other words, unlike our imperative OO or procedural code, no global or object state gets changed. All the work that a mathematical function does has to be returned in the result.

Similarly, the idea of a variable is a little different than what we’re used to in imperative code. While the value of y will vary with the value of x, once you have fixed x, you have also fixed y. The implication for FP is that “variables” are immutable; once assigned, they cannot be changed. I’ll call such immutable variables value objects.

Now, it would actually be hard for a “pure” FP language to have no side effects, ever. I/O would be rather difficult, for example, since the state of the input or output stream changes with each operation. So, in practice, all “pure” FP languages provide some mechanisms for breaking the rules in a controlled way.

Functions are first-class objects in FP. You can create named or anonymous functions (e.g., closures or blocks), assign them to variables, pass them as arguments to other functions, etc. Java doesn’t support this. You have to create objects that wrap the methods you want to invoke.

Functional programs tend to be much more declarative in nature than imperative programs. This is perhaps more obvious in pure FP languages, like Erlang and Haskell, than it is in Scala.

For example, the definition of Fibonacci numbers is the following.

    
F(n) = F(n-1) + F(n-2) where F(1)=1 and F(2)=1
    

An here is a complete implementation in Haskell.

    
module Main where 
-- Function f returns the n'th Fibonacci number. 
-- It uses binary recursion. 
f n | n <= 2 = 1 
    | n >  2 = f (n-1) + f (n-2) 
    

Without understanding the intricacies of Haskell syntax, you can see that the code closely matches the “specification” above it. The f n | ... syntax defines the function f taking an argument n and the two cases of n values are shown on separate lines, where one case is for n <= 2 and the other case if for n > 2.

The code uses the recursive relationship between different values of the function and the special-case values when n = 1 and n = 2. The Haskell runtime does the rest of the work.

It’s interesting that most domain-specific languages are also declarative in nature. Think of how JMock, EasyMock or Rails’ ActiveRecord code look. The code is more succinct and it lets the “system” do most of the heavy lifting.

Functional Programming’s Benefits for You

Value Objects and Side-Effect Free Functions

It’s the immutable variables and side-effect free functions that help solve the multicore problem. Synchronized access to shared state is not required if there is no state to manage. This makes robust concurrent programs far easier to write.

I’ll discuss concurrency in Scala in my third post. For now, let’s discuss other ways that FP in Scala helps to improve code, concurrent or not.

Value objects are beneficial because you can pass one around without worrying that someone will change it in a way that breaks other users of the object. Value objects aren’t unique to FP, of course. They have been promoted in Domain Driven Design (DDD), for example.

Similarly, side-effect free functions are safer to use. There is less risk that a caller will change some state inappropriately. The caller doesn’t have to worry as much about calling a function. There are fewer surprises and everything of “consequence” that the function does is returned to the caller. It’s easier to keep to the Single Responsibility Principle when writing side-effect free functions.

Of course, you can write side-effect free methods and immutable variables in Java code, but it’s mostly a matter of discipline; the language doesn’t give you any enforcement mechanisms.

Scala gives you a helpful enforcement mechanism; the ability to declare variables as val’s (i.e., “values”) vs. var’s (i.e., “variables”, um… back to the imperative programming sense of the word…). In fact, val is the default, where neither is required by the language. Also, the Scala library contains both immutable and mutable collections and it “encourages” you to use the immutable collections.

However, because Scala combines both OOP and FP, it doesn’t force FP purity. The upside is that you get to use the approach that best fits the problem you’re trying to solve. It’s interesting that some of the Scala library classes expose FP-style interfaces, immutability and side-effect free functions, while using more traditional imperative code to implement them!

Closures and First-Class Functions

True to its functional side, Scala gives you true closures and first-class functions. If you’re a Groovy or Ruby programmer, you’re used to the following kind of code.

    
class ExpensiveResource {
    def open(worker: () => Unit) = {
        try {
            println("Doing expensive initialization")
            worker()
        } finally {
            close()
        }
    }
    def close() = {
        println("Doing expensive cleanup")
    }
}
// Example use:
try {
    (new ExpensiveResource()) open { () =>        // 1
        println("Using Resource")                 // 2
        throw new Exception("Thrown exception")   // 3
    }                                             // 4
} catch {
    case ex: Throwable => println("Exception caught: "+ex)
}
    

Running this code will yield:

    
Doing expensive initialization
Using Resource
Doing expensive cleanup
Exception caught: java.lang.Exception: Thrown exception
    

The ExpensiveResource.open method invokes the user-specified worker function. The syntax worker: () => Unit defines the worker parameter as a function that takes no arguments and returns nothing (recall that Unit is the equivalent of void).

ExpensiveResource.open handles the details of initializing the resource, invoking the worker, and doing the necessary cleanup.

The example marked with the comment // 1 creates a new ExpensiveResource, then calls open, passing it an anonymous function, called a function literal in Scala terminology. The function literal is of the form (arg_list_) => function body or () => println(...) ..., in our case.

A special syntax trick is used on this line; if a method takes one argument, you can change expressions of the form object.method(arg) to object method {arg}. This syntax is supported to allow user-defined methods to read like control structures (think for statements – see the next section). If you’re familiar with Ruby, the four commented lines read a lot like Ruby syntax for passing blocks to methods.

Idioms like this are very important. A library writer can encapsulate all complex, error-prone logic and allow the user to specify only the unique work required in a given situation. For example, How many times have you written code that opened an I/O stream or a database connection, used it, then cleaned up. How many times did you get the idiom wrong, especially the proper cleanup when an exception is thrown? First-class functions allow writers of I/O, database and other resource libraries to do the correct implementation once, eliminating user error and duplication. Here’s a rhetorical question I always ask myself:

How can I make it impossible for the user of this API to fail?

Iterations

Iteration through collections, Lists in particular, is even more common in FP than in imperative languages. Hence, iteration is highly evolved. Consider this example:

    
object RequireWordsStartingWithPrefix {
    def main(args: Array[String]) = {
        val prefix = args(0)
        for {
            i <- 1 to (args.length - 1)   // no semicolon
            if args(i).startsWith(prefix)
        } println("args("+i+"): "+args(i))
    }
}
    

Compiling this code with scalac and then running it on the command line with the command

    
scala RequireWordsStartingWithPrefix xx xy1 xx1 yy1 xx2 xy2
    

produces the result

    
args(2): xx1
args(5): xx2
    

The for loop assigns a loop variable i with each argument, but only if the if statement is true. Instead of curly braces, the for loop argument list could also be parenthesized, but then each line as shown would have to be separated by a semi-colon, like we’re used to seeing with Java for loops.

We can have an arbitrary number of assignments and conditionals. In fact, it’s quite common to filter lists:

    
object RequireWordsStartingWithPrefix2 {
    def main(args: Array[String]) = {
        val prefix = args(0)
        args.slice(1, args.length)
            .filter((arg: String) => arg.startsWith(prefix))
            .foreach((arg: String) => println("arg: "+arg))
    }
}
    

This version yields the same result. In this case, the args array is sliced (loping off the search prefix), the resulting array is filtered using a function literal and the filtered array is iterated over to print out the matching arguments, again using a function literal. This version of the algorithm should look familiar to Ruby programmers.

Rolling Your Own Function Objects

Scala still has to support the constraints of the JVM. As a comment to the first blog post said, the Scala compiler wraps closures and “bare” functions in Function objects. You can also make other objects behave like functions. If your object implements the apply method, that method will be invoked if you put parentheses with an matching argument list on the object, as in the following example.

    
class HelloFunction {
    def apply() = "hello" 
    def apply(name: String) = "hello "+name
}
val hello = new HelloFunction
println(hello())        // => "hello" 
println(hello("Dean"))  // => "hello Dean" 
    

Option, None, Some…

Null pointer exceptions suck. You can still get them in Scala code, because Scala runs on the JVM and interoperates with Java libraries, but Scala offers a better way.

Typically, a reference might be null when there is nothing appropriate to assign to it. Following the conventions in some FP languages, Scala has an Option type with two subtypes, Some, which wraps a value, and None, which is used instead of null. The following example, which also demonstrates Scala’s Map support, shows these types in action.

    
val hotLangs = Map(
    "Scala" -> "Rocks", 
    "Haskell" -> "Ethereal", 
    "Java" -> null)
println(hotLangs.get("Scala"))          // => Some(Rocks)
println(hotLangs.get("Java"))           // => Some(null)
println(hotLangs.get("C++"))            // => None
    

Note that Map stores values in Options objects, as shown by the println statements.

By the way, those -> aren’t special operators; they’re methods. Like ::, valid method names aren’t limited to alphanumerics, _, and $.

Pattern Matching

The last FP feature I’ll discuss in this post is pattern matching, which is exploited more fully in FP languages than in imperative languages.

Using our previous definition of hotLangs, here’s how you might use matching.

    
def show(key: String) = {
    val value: Option[String] = hotLangs.get(key)
    value match {
        case Some(x) => x
        case None => "No hotness found" 
    }
}
println(show("Scala"))  // => "Rocks" 
println(show("Java"))   // => "null" 
println(show("C++"))    // => "No hotness found" 
    

The first case statement, case Some(x) => x, says “if the value I’m matching against is a Some that could be constructed with the Some[+String](x: A) constructor, then return the x, the thing the Some contains.” Okay, there’s a lot going on here, so more background information is in order.

In Scala, like Ruby and other languages, the last value computed in a function is returned by it. Also, almost everything returns a value, including match statements, so when the Some(x) => x case is chosen, x is returned by the match and hence by the function.

Some is a generic class and the show function returns a String, so the match is to Some[+String]. The + in the +String expression is analogous to Java’s extends, i.e., <? extends String>. Capiche?

Idioms like case Some(x) => x are called extractors in Scala and are used a lot in Scala, as well as in FP, in general. Here’s another example using Lists and our friend ::, the “cons” operator.

    
def countScalas(list: List[String]): Int = {
    list match {
        case "Scala" :: tail => countScalas(tail) + 1
        case _ :: tail       => countScalas(tail)
        case Nil             => 0
    }
}
val langs = List("Scala", "Java", "C++", "Scala", "Python", "Ruby")
val count = countScalas(langs)
println(count)    // => 2
    

We’re counting the number of occurrences of “Scala” in a list of strings, using matching and recursion and no explicit iteration. An expression of the form head :: tail applied to a list returns the first element set as the head variable and the rest of the list set as the tail variable. In our case, the first case statement looks for the particular case where the head equals Scala. The second case matches all lists, except for the empty list (Nil). Since matches are eager, the first case will always pick out the List("Scala", ...) case first. Note that in the second case, we don’t actually care about the value, so we use the placeholder _. Both the first and second case’s call countScalas recursively.

Pattern matching like this is powerful, yet succinct and elegant. We’ll see more examples of matching in the next blog post on concurrency using message passing.

Recap of Scala’s Functional Programming

I’ve just touched the tip of the iceberg concerning functional programming (and I hope I got all the details right!). Hopefully, you can begin to see why we’ve overlooked FP for too long!

In my last post, I’ll wrap up with a look at Scala’s approach to concurrency, the Actor model of message passing.