Premature Optimization?
I have a theory.
It’s one of those theories that I don’t want to get ruined with actual facts, but I suppose I’ll put it out there and see what I learn.
The theory stems from a series of observations about the way some Microsoft development shops write code versus others. I’ve noticed that some shops like to make many things sealed (final). I’m not talking about the occasional class, but rather that the default is to use sealed on most things. In addition to classes being sealed, there’s a heavy bias against virtual methods. I’ve not noticed this with Java development groups, though in the spirit of honest disclosure, I’ve only noticed this tendency in about 50% of the MS shops I’ve visited – and it seems like the tendency is waning.
- It expresses the design intent
- It’s for performance
- The code as written cannot be safely overridden.
(Before I go any further, I am not arguing against the use of sealed or final. I will say that in my practice, it has to be justified to be used rather than the other way around.)
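To make the habit concrete, here’s a rough sketch of the two styles side by side. The class and method names are invented for illustration; the point is only the difference between locking everything down and leaving the variation point open.

```
// The "locked down by default" style: the class is sealed and nothing is
// virtual, so nobody can subclass it or override its behavior.
public sealed class InvoiceCalculator
{
    public decimal Total(decimal subtotal, decimal taxRate)
    {
        return subtotal * (1 + taxRate);
    }
}

// The open alternative: the class can be subclassed, and the one point
// that is meant to vary is marked virtual.
public class OpenInvoiceCalculator
{
    public virtual decimal Total(decimal subtotal, decimal taxRate)
    {
        return subtotal * (1 + taxRate);
    }
}
```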
Back to the reasons. I generally don’t buy these reasons. Each may be perfectly valid but none of them stand on their own; each requires further discussion.
I’ve had those discussions with various people, and most of the time the reasons really don’t hold water. Here are some quippy responses to each of them:
- It expresses design intent – you don’t know how to design
- It’s for performance – do you have any idea how a VM works?
- The code as written cannot be safely overridden – So the code is so badly written that nobody understands it?
Again, let me be clear, there are good reasons to seal things. My point is that while there are good reasons, most of the time the habit I’ve seen is a form of cargo culting: monkey see, monkey do.
This observation sat around for some time until I read some stuff in a book called CLR via C# (good reading if you want to better understand the CLR, and if you’re developing code for a virtual machine, I think it’s a good idea to understand the virtual machine at least a little).
Reading that book, combined with my observations, led to my theory: people are sealing as much as they are because virtual table binding is done way too early.
The thing I read in the book I mentioned above relates to when virtual methods are bound. This applies to version 2.0 of the CLR; I don’t think there’s any change in the latest version, and it seems like a pretty “hard” design decision. The book mentions that binding of virtual methods is performed at compile time.
I’m not making a distinction between compile time and link time for this discussion, but I’ll be a little more precise. The generated code stored in an assembly “knows” which virtual function it is invoking because the generated byte code contains an index into a virtual table.
How is this a problem?
I write a base class and ship it. You write a derived class (OK, maybe you use delegation, good for you, but the problem still remains). Your coupling to my class involves using a method with the virtual keyword (Java programmers, you don’t have to do this, regular methods [not static, not a constructor, ...] have the potential for dynamic dispatch unless you declare them final). At compile time, your object module (which eventually resides in an assembly that will get loaded and executed some time later) knows that it wants to call the 3rd virtual method in the v-table.
So far, so good.
Next, I add a new virtual method to my class and ship a new assembly.
Question. To use this new assembly, do you need to recompile?
Yes.
What?
Yes. What happens if my newly added method is the first virtual method? Then your compiled code that is happily living in a peaceful assembly somewhere has a stale reference. It thinks it wants to run the 3rd method in the virtual table but now it really should be using the 4th virtual method.
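Here’s a sketch of the scenario with invented class and method names. The slot numbers in the comments are only there to show the bookkeeping this argument depends on – I’m ignoring the virtual slots Object itself contributes and not claiming this is exactly how any given CLR version lays out its tables.

```
// Version 1 of the shipped base assembly.
namespace BaseV1
{
    public class Widget
    {
        public virtual void Load() { }   // 1st virtual slot
        public virtual void Draw() { }   // 2nd virtual slot
        public virtual void Save() { }   // 3rd virtual slot
    }
}

// Your code, compiled against version 1: the call below gets recorded,
// at compile time, as "invoke the 3rd virtual method of Widget".
namespace YourCode
{
    public class FancyWidget : BaseV1.Widget
    {
        public override void Save() { }
    }

    public class Client
    {
        public void Run(BaseV1.Widget w)
        {
            w.Save();   // bound to the 3rd slot when this assembly was compiled
        }
    }
}

// Version 2 of the base assembly: one new virtual method added at the top.
namespace BaseV2
{
    public class Widget
    {
        public virtual void Validate() { }   // new 1st slot
        public virtual void Load() { }       // now 2nd
        public virtual void Draw() { }       // now 3rd
        public virtual void Save() { }       // now 4th
    }
}
// A caller still holding the old "3rd slot" answer now points at Draw
// instead of Save until it is recompiled against version 2.
```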
This is similar to a caching issue or premature optimization. The calling code has a compile-time generated index into the v-table. Why is this? Is it for performance? That’s not a good argument; this pre-calculation is irrelevant with modern JIT compiling techniques. I think the reason it is done this way is that C++ does things this way, and C++ needs to because of its execution model (there’s no virtual machine between the code and its execution).
No big deal, you think, I’ll just recompile my code.
That will fix the problem. Or will it?
Say you’re stuck using an old version of some assembly that uses an old version of another assembly that you also use. Do you recompile both your code and that other assembly? You’ll have to but you might forget. Or you might not be able to.
.NET handles this by having an excellent dependency system to track which versions of which assemblies work with other versions of assemblies. They need this to address problems introduced by the decision to perform such early binding.
Simply put, this is an unfortunate decision that I believe was based more on precedent and history than on sound design.
This is not what happens in the Java Virtual Machine (JVM).
In the JVM, the binding is done later—much later. It is possible that my class got loaded and JIT’ed so that all of the methods that could be virtually invoked (not final) are in fact not virtually invoked. So far, no subclasses have been loaded, so there’s been no need. After my system has been running for a few weeks, a new class (a subclass) gets loaded and invalidates the JIT’d version of my class. No problem, the system will dump the old version and JIT it again, on the fly.
Even if there is a need for virtual dispatch, it might be that most of the time one implementation gets called – say, the one from a particularly heavily used subclass. The JVM might inline the common case and virtually dispatch the others.
Bottom line: what’s being done with JVMs is indistinguishable from black magic for most of us.
I can use new JAR files and not have to worry about recompiling to remove stale v-table references. The method binding will just work. (This does not remove all problems with changing versions of JARs, but it gets rid of one persnickety one.)
But it goes deeper. As I mentioned above, I’ve noticed several Microsoft shops that use a lot of sealed classes or avoid using virtual methods because “they perform poorly”.
OK, let’s talk about method invocation for just a second (or make that a picosecond). On my machine, I timed virtual method dispatch. I should check my numbers, but what I got was that virtual method dispatch was taking around 450 picoseconds. So you can invoke roughly 2.2 billion methods per second.
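If you want to sanity-check a number like that yourself, here’s a rough sketch of the kind of timing loop I mean. The class names and iteration count are mine, and it’s deliberately naive – loop overhead, inlining, and devirtualization can all skew the result, so treat the output as a ballpark figure, not a benchmark.

```
using System;
using System.Diagnostics;

public class Shape
{
    public virtual double Area() { return 1.0; }
}

public class Circle : Shape
{
    public override double Area() { return 3.14159; }
}

public static class DispatchTiming
{
    public static void Main()
    {
        const long calls = 200000000;
        Shape shape = new Circle();  // call through the base type to force virtual dispatch
        double sink = 0.0;           // accumulate the results so the loop isn't optimized away

        Stopwatch watch = Stopwatch.StartNew();
        for (long i = 0; i < calls; i++)
        {
            sink += shape.Area();
        }
        watch.Stop();

        double nsPerCall = watch.Elapsed.TotalMilliseconds * 1000000.0 / calls;
        Console.WriteLine("{0:F3} ns per call (sink = {1})", nsPerCall, sink);
    }
}
```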
So:
- Virtual method dispatch is not that expensive
- The JVM can address caching issues with code analysis and inlining
- You only pay the price if your design requires it
- The people writing the JVM are better at micro-optimization than me, certainly, and I’d say better than most people
Now weigh that small cost against what you give up when you seal everything by default:
- Testability
- Maintainability
- Flexibility
- Extensibility
Look up the percentage of time spent “building” a system versus “maintaining” a system. The percentage has been on the rise, and it’s 80 – 95% typically. 80% of the cost of development is spent maintaining code. (I believe a lot of this percentage has to do with the definition of “done” – but that’s a whole other discussion).
You might think that locking things down makes them more stable and that stability leads to maintainability.
That’s not really the case.
Things change. Requirements change.
Locking down a class to keep it from breaking is like signing off on requirements to keep them from changing. Neither thing is really a good idea.
I CAN lock down my classes in a sense. I can use tests to describe my assumptions so the class is locked down so long as we all agree that we’ll keep the tests passing. That is a fine level of granularity.
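Here’s a minimal sketch of what I mean, using NUnit-style tests and an invented class. The “lock” is the pair of tests, not a sealed keyword; anyone can still subclass, but we’ve all agreed the tests keep passing.

```
using NUnit.Framework;

// An invented class standing in for one of mine; the tests below pin down
// the behavior I rely on without sealing anything.
public class PriceCalculator
{
    public virtual decimal Discount(decimal subtotal)
    {
        return subtotal >= 100m ? subtotal * 0.10m : 0m;
    }
}

[TestFixture]
public class PriceCalculatorAssumptions
{
    [Test]
    public void OrdersOfOneHundredOrMoreGetTenPercentOff()
    {
        Assert.AreEqual(10m, new PriceCalculator().Discount(100m));
    }

    [Test]
    public void SmallOrdersGetNoDiscount()
    {
        Assert.AreEqual(0m, new PriceCalculator().Discount(99.99m));
    }
}
```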
But I digress.
Back to the whole sealed thing. I was wondering why so many places that use Microsoft stuff seal their classes and do not use virtual methods (in my experience, it’s about 50%). I then read that thing about the binding of virtual methods at compile time. And then about 2 weeks later it hit me.
If you use virtual methods and you have a messed up build system, then you’ll get strange behavior. Sealing things makes those surprising behaviors go away, so seal stuff.
It makes sense. This is something C++ was notorious for. You hoped you’d get a segmentation violation, but often the wrong method would get called and the system would keep running…
Unfortunately, the “solution” – sealing – is attacking a symptom, not the actual problem. This is where following Toyota’s 5 Whys would have been a good idea.
“Seal your classes.” Why?
- Because they will be more stable. Safer. It expresses my design. Why?
- Because my design says that this should not change. Why?
- Because when things change, sometimes “bad things happen.” Why?
- I don’t know…
And right there is where you know the solution is a bad idea.
In reality I suspect the habit perpetuates because once something appears to work, it is cut and pasted ad nauseam.
Should you never seal things?
Never say never.
Or better yet, never say “never, never, never.”
There are three levels of never:
- Never: Don’t do it because you may not know what you are doing. (Once you know the rules, you can do it.)
- Never, Never: You know what you’re doing; it’s a bad thing most of the time, but you really do know better.
- Never, Never, Never: DON’T DO IT!
There are few things that are Never, Never, Never. Dereferencing NULL comes to mind (but then again, maybe I wanted a core dump if I get to this point in the code).
I’d place using sealed/final in the Never, Never camp if I’m using tests to describe the semantics of my classes. Or, if I’m writing concurrent code, using final on fields provides certain guarantees…
However, let’s say that I’ve been told not to use tests (I’ve seen it at more than one place). Then that “never, never” becomes a “sure, do it to cover my ass.” Someone can’t subclass and mess things up. Someone can’t forget to compile their code when mine changes and cause their code to break…
Now is this a reason to avoid using Microsoft products? No. I really do like C# (for a statically typed language, that is). And I’m convinced that Java is a better language because of the work done by and in C#. And let’s face it, having more VMs is a good thing. It improves competition. It raises the level at which we can expect to work.
But to quote Tim Booth/James in a rather trite way, “there’s a chain of consequence within, without.” That one thing, binding to the v-table so early, has caused a chain of events that leads to a platform that seems a little bit more sluggish to develop in to me. (I expect flames for this.)
I think this design decision is a reflection of a mind-set that permeates the development environment. For example, if I have a large solution with several projects, it sure seems that dependency checking in the build environment takes a long time. When I watch Developer Studio build things, it looks like make is running under the covers and performing a bunch of file checks to see what has changed. Where’s the incremental compilation?
I’ve seen very good work in both .NET and Java development efforts. It just seems that unnecessary, basic kinds of environmental design debt occur more frequently in a .NET development effort than in similar Java development efforts.
Maybe it’s a sampling effect. To paraphrase Weinberg, “there’s always a sample, be aware of it, you can’t remove it.” I’m a consultant, I go places where they need consultants. I never, never go places that don’t need consultants, so my sample set is biased (I have one experience, maybe two, where I went and the people did not need consultants because they were doing fine – or maybe they didn’t need technically-oriented consultants).
So what do you think?
- Is using sealed by default a good design choice?
- Is the binding of methods early a good thing?
- Was the decision by design or because of history?
- Has the CLR changed, or will it?