Naming and Body Language in Functional Code

Posted by Michael Feathers Tue, 11 Aug 2009 17:25:00 GMT

I wrote a blog post the other day about functional refactoring and I had what I thought was a good example:

absPosition :: Buffer -> Int
absPosition (Buffer (x, y) contents) = x + lenPreviousLines contents
  where
    lenPreviousLines =
      foldr ((+) . length) 0 . take y . terminatedLines

Almost immediately, I saw replies on a couple of forums (including this one) which pointed out that I could’ve written the code this way:

absPosition (Buffer (x, y) contents) = x + lenPreviousLines contents
  where
    lenPreviousLines =
      sum . map length . take y . terminatedLines
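To see that the two pipelines agree, here is a minimal sketch that applies both shapes to a plain list of strings. The names viaFoldr and viaSum are made up for this illustration and do not appear in the original code.

```haskell
-- Hypothetical names, for illustration only.

-- The original shape: fold the list, adding each element's length to the total.
viaFoldr :: [String] -> Int
viaFoldr = foldr ((+) . length) 0

-- The suggested shape: measure every line first, then add up the lengths.
viaSum :: [String] -> Int
viaSum = sum . map length
```

In GHCi, viaFoldr ["ab\n", "cde\n"] and viaSum ["ab\n", "cde\n"] both evaluate to 7.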

It’s funny, I thought of using sum instead of foldr back when I was using Haskell’s lines function. The code I had looked like this:

absPosition (Buffer (x, y) contents) = x + lenPreviousLines contents
  where
    lenPreviousLines =
      foldr ((+) . ((+1) . length)) 0 . take y . lines

But, I realized that the code wasn’t in great shape for sum, so I created terminatedLines, used it and promptly forgot to do the refactoring I set out to do.

terminatedLines :: String -> [String]
terminatedLines = map (++ "\n") . lines

From an imperative point of view, terminatedLines looks a bit silly: What?? You’re going to append a newline to each line in a list of lines you just created just so that you can count it?? But, I suspect that it isn’t that bad. The evaluator pulls values from each line and as it reaches the end of one it should just put a newline at the end of it. If I’m wrong about this, please let me know.
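One rough way to check the laziness intuition is to feed terminatedLines input that never ends. In this sketch the names firstChars, endlessLine and endlessLines are assumptions made up for illustration. If (++) had to traverse a whole line before producing anything, the expressions in the note below would not terminate.

```haskell
terminatedLines :: String -> [String]
terminatedLines = map (++ "\n") . lines

-- Hypothetical helper: the first n characters of the first terminated line.
firstChars :: Int -> String -> String
firstChars n s = case terminatedLines s of
  (l : _) -> take n l
  []      -> ""

-- A single line with no end: "ababab..." (its newline is never reached).
endlessLine :: String
endlessLine = cycle "ab"

-- Infinitely many short lines: "x\nx\nx\n..."
endlessLines :: String
endlessLines = cycle "x\n"
```

Here firstChars 3 endlessLine yields "aba" and take 2 (terminatedLines endlessLines) yields ["x\n","x\n"], which suggests the newline is attached only when a consumer actually reaches the end of a line. This shows that the work is deferred, not that it is free: each character of a line still passes through one step of (++).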

In any case, I agree that the code looks better with sum than it does with foldr (+) 0. The big question is – should we refactor any more?

Someone with the handle sterl suggested a very cool trick. I could drop the where clause like this:

absPosition (Buffer (x, y) contents) = 
  x + (sum . map length . take y . terminatedLines) contents

And then move on to this:

absPosition (Buffer (x, y) contents) = 
  sum . (x:) . map length . take y . terminatedLines $ contents

What’s going on here? Well as sterl put it, we’re summing anyway so why not prepend the x onto the list that we are already summing?
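The two versions can be checked against each other with a small self-contained sketch. The post never shows the Buffer type, so the definition below (a cursor as an (x, y) pair plus the text as a String) is an assumption, and the names absPositionWhere and absPositionCons are invented to keep the two versions apart.

```haskell
-- Assumed shape of Buffer: cursor (x, y) and the full text.
-- This definition is a guess; the original post does not show it.
data Buffer = Buffer (Int, Int) String

terminatedLines :: String -> [String]
terminatedLines = map (++ "\n") . lines

-- Version with the where clause: x plus the summed lengths of the previous lines.
absPositionWhere :: Buffer -> Int
absPositionWhere (Buffer (x, y) contents) = x + lenPreviousLines contents
  where
    lenPreviousLines = sum . map length . take y . terminatedLines

-- sterl's version: put x on the front of the list that is being summed anyway.
absPositionCons :: Buffer -> Int
absPositionCons (Buffer (x, y) contents) =
  sum . (x:) . map length . take y . terminatedLines $ contents
```

For a buffer such as Buffer (2, 1) "abc\ndef\n", both functions return 6: the first line contributes four characters (three letters and its newline) and the cursor sits two columns into the second line.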

Part of me likes this and part of me doesn’t. On the one hand, it’s brief, but on the other hand, the code isn’t telling us why it is doing what it is doing anymore. In the original code, there is an algorithm:

To get the absolute position, add the x position of the location to the sum of the lengths of all of the previous lines.

In the new code, the algorithm is:

To get the absolute position, sum the current x position with the lengths of all of the previous lines.

Wait, that’s sort of the same, isn’t it?

This example points to a fundamental dilemma that I have with naming in Haskell. I’m used to introducing names in lower-level languages to bridge the gap between intention and mechanism, but what happens when your mechanism is so high-level that it can speak for itself? Maybe we don’t need names as much?

Now, I know as I write this that someone is going to look at this as an extreme statement. It isn’t. Names are useful, and indispensable, but really they are only one of several ways of communicating meaning. In each case, we have to pick the right tool for the job. With Haskell, I think that programmers communicate with structure as much as they communicate with names. It’s the body-language of their code.

Comments

  1. Shae Erisson 28 minutes later:

    I very much agree. Significant Names become a problem when the point of the code is to communicate the naked structure. In that case, single letter names emphasize the ‘wiring pattern’ that is the actual content.

  2. josh about 1 hour later:

    How about x + (length . unlines . take y $ contents)?

  3. Michael Feathers about 1 hour later:

    josh: You mean: x + (length . unlines . take y . lines) contents

    I’ll have to try that out.

  4. http://donsbot.wordpress.com about 1 hour later:

    If your code is sufficiently abstract to properly capture domain concepts, its naming conventions should be the naming conventions of the domain it models (e.g. statistics, cryptography, finance, ...).

    Same rules for good DSLs and EDSLs apply here.

  5. Jake McArthur about 2 hours later:

    I like:

    (x+) . length . unlines . take y . lines

    Also, if you define:

    withLines = (unlines .) . (. lines)

    then you can do:

    (x+) . length . withLines (take y)

  6. lilac about 2 hours later:

    How about:

    sum . zipWith ($) (replicate y length ++ [const x]) $ terminatedLines contents

    Or, “take the sum of the length of the first y lines, and x for the line after that.”

  7. Michael Feathers about 2 hours later:

    Jake: That’s wild. Is that a section of function composition?

    lilac: I’m going to have to work on understanding that one.

  8. Jake McArthur about 3 hours later:

    Yes. (. lines) takes a function f and creates f . lines. That is then applied to (unlines .), which results in this:

    unlines . f . lines

    I normally avoid sections with function composition, but I think it is visually meaningful in this kind of simple case.

  9. Jake McArthur about 3 hours later:

    Correction to the above:

    (unlines .) is then applied to that, which results in this [...]

  10. lilac about 4 hours later:

    It might be clearer to rewrite my suggestion as:

    length $ charsUpTo x y contents
      where charsUpTo x y = withLines (zipWith ($) (replicate y id ++ [take x]))

  11. Matthew about 5 hours later:

    About this game of rephrasing Haskell programs into mysterious expressions involving hordes of composition operators and other higher-level combinators being applied to each other, with few meaningful identifier names:

    For higher mathematics, maybe there are real advantages in the use of abstracted language and definitions. Maybe the abstractions used really elevate you to a higher level of discourse, where you can work in peace and clarity free of the irrelevant details. Perhaps this is also true for well-designed DSLs in a language like Haskell.

    But for simple bits of programming, taking this kind of thing to the extreme seems counter-productive. It doesn’t raise you cleanly to a higher level of abstracted discourse, it just adds indirection to a simple idea, requiring a lot of mental energy to unpack the definition of something which is trivial.

  12. Michael Feathers about 6 hours later:

    Matthew: I agree. The question seems to be: where do you stop? I suspect a lot of it has to do with what you assume about your audience, and whether some idioms are common enough to avoid head-scratching.

  13. Jake McArthur about 6 hours later:

    Matthew: I think my version is a good example of the goal of this whole exercise, though. I have expressed the whole idea of “take y lines, get the length of that, then add x to that” rather clearly, I think. Far more clearly than the original, certainly.

  14. Samuel A. Falvo II about 8 hours later:

    @Matthew: You don’t understand functional programming, so of course this looks very foreign to you. And, by understand, I truly mean eat, breathe, sleep, and think in it. You’re an OO coder, and it shows, just by your comment.

    @Michael: Matthew’s point is valid, but what I think you’re going through is the classic “newbie” phase of functional programming. You know just enough to be dangerous; that is, you can speak the language, but you’re not fluent in it like a native coder/speaker would be. As you indicate, much of functional coding is body language.

    The funny thing is, Lisp and Forth coders have been trying to tell people this FOR DECADES. The real question is, why hasn’t anyone been listening?

  15. wren ng thornton about 11 hours later:

    @Matthew: Taking anything to extremes is liable to be more trouble than it’s worth, but these examples are hardly extreme. As for “unpacking the definition”, using function composition and eschewing names for intermediate parts often simplifies the reading because there is less to unpack. In the C/Java/Perl world I often have to run around in circles reading function after function before finally getting to the real code; in Haskell things are usually a lot more direct.

  16. Mark Wotton about 11 hours later:

    (++) might be space equivalent, but I think it’s not time equivalent:

    14:22 ~ % cat test.hs
    import System
    
    atEnd [] = []
    atEnd (x:xs) = (atEnd xs) ++ [x]
    
    main = do
      (n:_) <- getArgs
      putStrLn $ show $ sum$  atEnd [1..(read n)]
    
    14:21 ~ % time ./test 10000 
    50005000
    ./test 10000  1.26s user 0.02s system 96% cpu 1.332 total
    
    14:22 ~ % cat test2.hs
    import System
    
    atBeginning [] = []
    atBeginning (x:xs) = [x] ++ (atBeginning xs)
    
    main = do
      (n:_) <- getArgs
      putStrLn $ show $ sum$  atBeginning [1..(read n)]
    
    14:22 ~ % time ./test2 10000     
    50005000
    ./test 10000  0.00s user 0.00s system 74% cpu 0.008 total
    
  17. Dirk about 18 hours later:

    Actually, terminatedLines isn’t a good idea, because (++) is slow (Rule of thumb: Putting things in front of a list is good, appending things at the end is bad, because in the latter case you first have to traverse the whole list).

    For the example in question, I’d just use a list comprehension:

    absPosition (Buffer (x,y) contents) = 
      x + sum [length l + 1 | l <- take y $ lines $ contents]
    

    No need to make code harder to read than necessary.

  18. Michael Feathers about 19 hours later:

    Mark: Thx.

    Dirk: I was hoping that with lazy evaluation, the reduction order for something like [1,2,3] ++ [4,5,6] would be 1:(2:([3] ++ [4,5,6])). In other words, we consume elements from the head, and the append happens as a sort of cons at the end of the consumption. From the definition of (++), it looks like that is what happens.

  19. nemo godfrey about 22 hours later:

    Michael Feathers,

    re: Dirk: terminatedLines isn’t a good idea

    While [1,2,3] ++ [4,5,6] indeed reduces to 1:(2:([3] ++ [4,5,6])), the cost of that reduction is the length of the first list. In the context of your map (++ "\n") code, it’s decidedly non-trivial. That’s Dirk’s point.

    In other words, append doesn’t do automagical, never mind zero-cost, consing “at the end of the consumption.” That said, you might want to check out difference lists.

  20. sterl. 6 days later:

    When to extract and not to is a good question. I tend to only give names and extract when there’s some redundancy to be exploited. Otherwise, it always makes sense to me to inline—proper formatting and occasional comments are enough otherwise. Nice Haskell can read very fluently already, I think.

    Viewing the simple stuff inline can also expose all sorts of other nice optimizations such as why “terminatedLines” can be replaced with, e.g., Jake’s very elegant code above.

    “editor combinators” and variants such as the above are really nice ways to denoise your code—abstracting out views on data, so to speak, rather than operations on it.

  21. Walt "BMeph" Rorie-Baety 13 days later:

    I personally love the terseness of functional programming for a very good reason: It breaks the ungodly, pretentious habit programmers have of “nice-naming” – that is, naming functions and procedures with names that suggest what the function is meant to do, but doesn’t.

    I suppose the reason the practice gets on my nerves so much is because I tend to run into the mid-level-skill types – the people who think they’re much cleverer than they are. There’s nothing that makes a team member want to step back and watch a drowning man go under, than hearing, “Jeez, can’t you read?!? It’s OBVIOUS what my program does! It’s called ‘addStudentToTutors’!” It’s obvious what the program is supposed to do, yes…

    One of the nice things about doing all of those Flying Wallendas-style functions is that it breaks your functions up into dense nuggets of power that you are then “compelled” to comment. It’s never bad to just throw in a line of commentary that says: “-- ‘withLines’ takes a function, and makes a new function that applies ‘lines’ to its argument, runs your function on that, and then does ‘unlines’ to the result”.

    Functional programming in general, gives you the idea that you’re describing what you want to happen, instead of what you have to set up to get it to happen.
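
The alternatives proposed across the comments can be compared in one place. This is a sketch under assumptions: the functions take x, y and the text directly instead of a Buffer, and the names viaTerminated, viaUnlines, viaWithLines and viaComprehension are invented labels for the version from the post, josh's idea with Michael's correction, Jake's combinator, and Dirk's list comprehension respectively.

```haskell
terminatedLines :: String -> [String]
terminatedLines = map (++ "\n") . lines

-- The version from the post.
viaTerminated :: Int -> Int -> String -> Int
viaTerminated x y = (x +) . sum . map length . take y . terminatedLines

-- josh's idea with Michael's correction: rejoin the first y lines and measure them.
viaUnlines :: Int -> Int -> String -> Int
viaUnlines x y = (x +) . length . unlines . take y . lines

-- Jake's combinator: run a function on the lines of a string, then rejoin them.
withLines :: ([String] -> [String]) -> String -> String
withLines = (unlines .) . (. lines)

viaWithLines :: Int -> Int -> String -> Int
viaWithLines x y = (x +) . length . withLines (take y)

-- Dirk's list comprehension: count each line plus one for its newline.
viaComprehension :: Int -> Int -> String -> Int
viaComprehension x y contents =
  x + sum [length l + 1 | l <- take y (lines contents)]
```

With x = 2, y = 1 and the text "abc\ndef\n", all four evaluate to 6. They agree because unlines puts a newline after every line, which is the same thing terminatedLines and the + 1 do by other means.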
