If-Methods 21
if (Character.isLetter(ch) && identCharCount > MIN_IDENT_LEN) {
currentToken.append(ch);
}
if (Character.isWhitespace(char ch) {
tokens.add(new Token(tokenBuffer.toString(), TokenType.IDENT));
currentToken = new StringBuilder();
}
The list of ifs went on and on. Each of them did something distinct and I wanted to break the method down into a set of meaningful sub-methods, but how?
The typical pattern to use is to extract the bodies of the if-statements into separate methods, provided you can find good names for them. However, when I imagined what the code would look like after that refactoring, I didn’t really like it. It was still rather wordy:
if (Character.isLetter(ch) && identCharCount > MIN_IDENT_LEN) {
appendToCurrentToken(ch);
}
if (Character.isWhitespace(char ch) {
addToken(currentToken.toString(), TokenType.IDENT);
}
So, I thought that I’d extract each of the if-statements into its own method:
void onIdentChar(char ch) {
if (Character.isLetter(ch)
&& identCharCount > MIN_IDENT_LEN) {
currentToken.append(ch);
}
}
void onWhitespace(char ch) {
if (Character.isWhitespace(char ch) {
tokens.add(new Token(tokenBuffer.toString(),
TokenType.IDENT));
currentToken = new StringBuilder();
}
}
The calling code would certainly look cleaner:
onIdentChar(ch);
onWhitespace(ch);
The only problem is that the method names are a bit of a lie. When we use an “on” prefix we’re writing in event style. We expect that a method like onWhitespace is called only when we have whitespace. It shouldn’t have to check for it.
I’ve run into this situation before. How do you name a function which consists of a single if-block? The situation crops up frequently in legacy code. You want to break up a large method into several smaller ones and you’re left with a dilemma: do you extract the entire if-statement or just its body?
Here’s another example.
if (replyFlag == RE_SEND) {
...
// code that sends a mail message
...
}
I once asked a friend what he’d name a method for that if-block if we extracted it and he said “sendEmailIfNeeded.” We both flinched and moved on to other things, but let’s think about that name a bit. Part of it talks about intention and part of it talks about implementation. Maybe we can pull the implementation bit out and concentrate on the condition:
if (needResend()) {
...
// code that sends a mail message
...
}
Better. Now, how about extracting it like this?
ifNeedsResend();
It’s sort of like the event style I wrote about earlier. The method is a handler for a condition and the condition is checked in the method. Violation of the Single-Responsiblity Principle? Perhaps, but I do know that I’m looking at some code now that has been clarified by this idiom.
public List<Token> getTokens() {
onStart();
for (int n = 0; n < text.length(); n++) {
char ch = text.charAt(n);
ifLetter(ch);
ifWhitespace(ch);
ifLeftParen(ch);
ifRightParen(ch);
ifVerticalBar(ch);
ifColon(ch);
ifDigit(ch);
}
onFinish();
return tokens;
}
What do you think?

My first reaction was “use a better language for parsing, like lex/yacc, antlr, etc.” ;)
I’ve used the
doFooIfBar()and related idioms before. Yea, it’s a bit of an SRP violation, but of course the principles are just that, principles, not laws. The clarity you got at the end is worth it, IMHO.One nit; it looks like only one
ifblock will get used, so calling all the methods for all chars seems wasteful, or is that premature optimization and hence evil? It would certainly require the finalgetTokensto be a little more complicated.ifblock into a an object. Each object an instance ofRuleimplementing two methods:isSatisfied(char ch)andapply(char ch). Stick theRuleobjects in a list/array and you’re down a loop.for(Rule rule in rules) { if(rule.isSatisfied(ch) rule.apply(ch); }I’m thinking most people would have called it handleWhitespace(ch) w/out a second thought.
I can’t see this as an improvement.
When I see
I can only wonder
How does this make the code clearer?
I agree with Micah here and would like to add some.
I’ve tried programming without using If’s. Just to see how far you can take it. In fact I created a smaller application without any conditionals at all.
It’s not something I can recommend for production but it was a fun and mind expanding experience.
I ended up more than one time with solutions similar to Micah’s suggestion. The rules become very clear that way. The implementation as well.
No SRP-violations and you sort of ‘configure’ in new behavior by putting new Rules into a list/map. Easily testable and very flexible.
How about naming them….
whenLetter(ch);
.... etc. ?
This would imply an action would be taken with the parameter when the condition is met. The ifLetter() does not necessarily connote an action will be taken.
I’m with Jeffery. We do this and we also address Deans desire for short-circuiting and add error handling on top of that by returning a boolean from our handleXxx() routines:
if (!(handleLetter(ch) || handleWhitespace(ch) || handleLeftParen(ch) ...)) { throw new LexerException("Did not recognize character: " + ch...)); }I’m with Micah.
Once you get a collection of objects, then iterating through them is easy. Micah is left with just one ‘if’ and you don’t even have to have that inside the for loop.
Now the for loop is closed to modification of the collection – bingo, OCP!
When programming in Forth, this pattern occurs all the time. I demonstrate this in my “Over The Shoulder” video (see the torrent http://archive.forthworks.com/ots/ots-01.mpg?torrent .
To summarize, Forth programmers can, and I argue should, abstract multi-path branches in a more linguistically natural mechanism. For example:
: character ( ch—) eitherLetter orWhitespace orLeftParen …. lexerException ;
Then, we define each choice as a guard on some action, through careful manipulation of the return stack. For example:
: eitherLetter dup isLetter? if handle-it-here r> drop then ;
Interestingly, reading about how Haskell compilers implements some of its optimizations, the use of “structured returns” isn’t novel—Haskell’s been optimizing like this for some time.
I like ‘handle’ and it works in this case, but I keep thinking about the generic problem of naming a method extracted around an if-block.
The thing about ‘handle’ is illustrated by the other example: we could call the method ‘handleResend’, but I wonder if I’m alone in feeling that it sounds so noncommittal.. sort of like you say “handle this” to an object and it replies “Okeh, whatever..” :-)
Rick: I think your objection is very interesting. Yes, if we say ifWhiteSpace() people are bound to say “then WHAT?” but I wonder.. why don’t we have the same reaction to onWhitespace? Why don’t we say “On whitespace WHAT?” I think it’s just because we’re used to the idiom.
DeanW, Micah: the thing that is missing here is that this is an odd lexer with very weird indentation behavior. Doesn’t feel worth trying to find a tool that will handle it well (no pun intended). Also, a class for each of those methods would be fine, except that they fiddle with the state of the class. Not wanting to break it out yet.
Samuel: Have to check that out. Thanks!
The Rules approach is the best I can think of. But I would improve it with anonymous inline classes (like callbacks).
final char ch = ’ ’; final StringBuilder sb = new StringBuilder();
List rules = new ArrayList(); rules.add(new Rule() { public boolean isSatisfied() { return ; } });
// add several rules ...
for(Rule rule in rules) { if(rule.isSatisfied()) rule.apply(); }
—-
Also, sometimes I just create local boolean variables as conditions:
boolean isLetter = Character.isLetter(ch); // bunch of boolean variables
if (isLetter) { // do something }
if (anoterBooleanVariable) { // do one more thing }
Cheers
The thing I don’t like about handle is that you aren’t really handling anything. I would expect it to be a chain of responsibility, where the first match stops the execution.
I thought also about what Micah mentions of turning each if into an object. But it’s the same thing – we don’t really need to go through each rule (or maybe we do).
I almost want to say something like:
public List<Token> getTokens() { onStart(); for (int n = 0; n < text.length(); n++) { char ch = text.charAt(n); handleCharacter(ch); } onFinish(); return tokens; }Or:
With the former just deferring the action to somewhere else, and the latter using polymorphism to determine the handler to create. Bonus points for the latter that you could dynamically wire up new handlers and conditions.
What if you had an if-else if-else if…-else situation instead of just the ifs?
I see the chain of responsability here. Still, if that’s not doable, I’d go for handleIfSpace, handleIfLetter, etc. Just the ifSpace, ifLetter leaves me hanging on the implied but unanswered then-part of the if.
I agree with the previous comments that
ifLetter,ifWhitespaceneed an associated action. If letter then what?I like the suggestion of a table-based loop, with predicates (
isLetter,isWhitespace) and corresponding actions.As you originally mentioned,
onFoois an event idiom: the code raising the event callsonFoo, and somewhere there is a function that handles the event. In this case you still ask the question “on foo then what?”, but the answer is “do what the listener wants”. Hopefully the listener has a sensibly-named function. I don’t like the handlers for button click events to be called “onClick”, I prefer “addUser”, “deleteFile”, “closeWindow”, “saveDocument”. If your framework requires that the button handler is called “onClick” then a simpleworks wonders for clarity.
This doesn’t work with your
ifLetterscheme since the whole point of these functions as you’ve got them is to perform the test and the action.handleLetterworks a bit better, since it says “handle the case that this is a letter”. I’ve usedhandleFoobefore, but “handle” is a bit generic.“Why don’t we say “On whitespace WHAT?” I think it’s just because we’re used to the idiom.”
I don’t think this question is asked much, because it’s usually a framework that’s calling onEvent not your “own” code. So when people read though, they may see onEvent registered somewhere, and then read declarative code saying onEvent { do this; do that; }
I liked Micah’s approach, but I think I’d prefer using a list of tuples with a boolean predicate and an action:
or something similar. The “parse” function would, for each character in the text, simply iterate over all tuples, and whenever a predicate is true, perform the associated action.
This solution allows you to capture both the intent of the predicate, as well as the intent of the action, and avoids coupling the two unnecessarily. I think it’s this coupling between predicate and action that is the source of all the unease with the proposed solutions so far.
Michael, you were heading towards an event style notation, but shied away from it. What’s wrong with an event notation? You just needed to separate the event from the action, as in the above example.
Ultimately, I suspect a Finite State Machine would be one of the most appropriate representations of the intent of this part of the code. You could use the example above as an intermediate step to refactor towards a full FSM implementation, by adding states and state transitions to the rules later. Of course, this application might not require a full FSM, but I can’t judge that based on the information given.
I would agree with Cory seems chain o’ reponsibility would apply here. My only concern would be introducing the complexities of a pattern in a stable area of the code.
Rick is perfectly right. The method doesn’t “if” but it generates a token from character if the character matches a certain pattern so I’d rather opt for:
for (int n = 0; n < text.length(); n++) { char ch = text.charAt(n); scanLetter(ch); scanWhitespace(ch); ... }The general problem here is that the methods do two distinct things so the right name would be something like “scanLetterIfItsALetter” which is rather strange. In the long run, I’d go with Micah. You can extend the isSatisfied method with a context parameter that propagates state and/or apply weighted rules to make the mechanism smarter.
I think it’s a language flaw. If the language supports pattern matching, then there will be a beautiful result.
public class TokenBuilder { StringBuilder token ... List tokens ... public void appendIf(boolean condition, char c) { if (condition) token.append(c); } public void cutIf(boolean condition, TokenType type) { if (condition) { String content = token.toString(); tokens.add(new Token(content,type)); token = new StringBuilder(); } } }The loop would become:... TokenBuilder builder = ... for (int n = 0; n < text.length(); n++) { char ch = text.charAt(n); bool isLetter = ... builder.appendIf(isLetter,ch); bool isWhitespace = ... builder.cutIf(isWhitespace,TokenType.IDENT); ... } ...