Wednesday, December 2, 2009

Spike Day 3

(See New Broom for an explanation....)

12/2/09

Priorities

Finish the extractor BDDing or start on pipelining?

Dave’s keeping the definition of the “Craftsman Spike” open is a blessing and a curse. It’s forcing me to prioritize across “meta” boundaries and confront the local existential question: why am I here?

Prioritization is part of the craft, so in wrestling with these issues, I am learning something important.

The story-level cycle (think about the story, run RGR cycles, repeat) requires a way to guarantee that the “thinking algorithm” will halt at a reasonable point, avoiding “analysis paralysis” and over-engineering. (In fact, the same issue comes up within the RGR cycle itself – cf. Red-Green-Refactor.)

Rereading that post by James Shore reminds me that I still need to work seriously on keeping the coding short in the RGR cycle. That first fiasco with the TagExtractor is a case in point. In fact, this is another example of the halting problem. Given that emitting only short bursts of code – baby steps – is a big priority, that points toward finishing the extractor story.

But another priority here is learning Ruby, which I’m interpreting as “getting some experience with all the constructs that are new to me as a Java/C# developer”. From that perspective, these extractor stories are like micro-spikes that don’t have to lead to production code, just proof of concept.
It occurs to me that I can use this log/blog/diary as part of the cycle in a more granular way as a check on concision.

Interrupt

In the spirit of making this more real-time (and probably more boring to folks reading it – hmm, should I be tweeting?), I'll mention that I just got an email from Andy M. about my Rails getting-started DB problems. He’s asking for my database.yml and Rails logs. (Embarrassing that I never looked at them....)  So hold on while I go look for those.

Interrupting the Interrupt

Todd just cranked up Pandora – his blues station – and I didn’t recognize Howlin’ Wolf.  That’s gigantically embarrassing, but I can’t let it stop my march toward craftsmanship.

Back to Priorities

If I want to finish the constituent extractor (CE), there are two stories I haven’t dealt with.  The easy one is – extract multiple constituents from one string. The hard one is figuring out how to deal with EOS, because in the current rspec context, the end_matcher is the same as the start_matcher – i.e., the beginning of a W1913 entry is a headword element, and the only way to find the end of it is to find the next headword.

So I’ll start with the easy one and concentrate on concision.

It occurs to me that I could be providing zips of the current state of the code for anybody crazy enough to want to follow this stuff in detail.

Don’t see how to do it with Blog*Spot free hosting, although they will host images.

Stupidity Interrupt

How the hell did I ever decide that the initials for Red-Green-Refactor were “RGF”?  Sheesh! It’s RGR from now on. But I won’t go back and correct it.  (Stop laughing, Milo!)

[UPDATE:  I lied - I went back and fixed it. No point in confusing the reader.]

Back to CE

Going with the easy one first. Shore wants me to restrict myself to five lines of code per cycle. Let’s see if I can do that.  Should I prefactor? (Hmm, that word seems to have turned into something else since I heard Paul P. and Micah M. using it.)  I think it will be harmless to prevent duplication by making my headword matcher an instance variable and moving the definition into a before(:each), although Craig D. instilled the maxim “see the duplication before you remove it” in me.
(Okay, I’ll admit it – I really miss Eclipse’s (and Visual Studio/ReSharper’s) instant highlighting of errors.)

Another prefactoring – extract a validation method for the results.

Leaning over backward to keep it under five lines, I wrote this very long one:
results = ConstituentExtractor.new(@headword_matcher, @headword_matcher).extract('AxByC')
(If it works, I’ll go back and verify that Ruby won’t let me break it before or after the ‘.’.)

Oops – I needed the offset argument in the extract call.  Now it’s the “shade of red” I wanted, and the test is only four lines long. (Just checking the size of the results array for now – expected 2, got 1.)
Extracted the extraction code to a private method. The extract method calls it in a loop and increments the offset as long as there’s a match.

Ran the test – good news, bad news: the single extraction test is green, but the double extraction is still red. My guess is that the problem is an off-by-one error in the offset handling in the loop.

Interrupt

Milo T. IMs me.  Interesting convo. Some of it will show up here. He’s reading this blog. Suggesting I use cards to put some of the technical overthinking on the stack to keep it from slowing me down – a good idea for production, but this Craftsman Spike is rehearsal – speed is not the issue here, and overthinking is part of the process.

But Milo’s right, kids – kards are kool!

Back to CE

Wait – there is no loop yet! No wonder it only found one constituent.
Set up a begin .. end until loop.  Oops – infinite loop – had to force-quit TextMate!

Meta: Agile Athletics

After the IM with Milo about metrics and micrometrics, I realize I’ve thought about Agile development as an athletic activity for a long time – it goes back to Kent Beck and the name XP. So the Craftsman Spike is like training camp – not necessarily just boot camp, but a refresher.

Back to CE

Corrected the offset calc, and all is green!
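
For anyone following along, here is a minimal sketch of what a looping extractor along these lines might look like. It is not the actual code from the spike: the private method name extract_one, the default offset, and the use of /[A-Z]/ as a stand-in headword matcher are all assumptions made for illustration. The key point is that each pass restarts at the position of the end match, so the loop always moves forward and the headword that closes one constituent opens the next.

```ruby
# Illustrative sketch only; everything except the names ConstituentExtractor
# and extract is an assumption, not the code written during the spike.
class ConstituentExtractor
  def initialize(start_matcher, end_matcher)
    @start_matcher = start_matcher
    @end_matcher = end_matcher
  end

  # Collects every complete constituent found at or after offset.
  def extract(source, offset = 0)
    results = []
    while (found = extract_one(source, offset))
      constituent, offset = found
      results << constituent
    end
    results
  end

  private

  # Returns [constituent, next_offset], or nil when no complete
  # constituent (start followed by an end) remains.
  def extract_one(source, offset)
    start_match = @start_matcher.match(source, offset)
    return nil unless start_match

    # Look for the end only after the start match, so the start
    # cannot also be taken as its own end.
    end_match = @end_matcher.match(source, start_match.end(0))
    return nil unless end_match

    [source[start_match.begin(0)...end_match.begin(0)], end_match.begin(0)]
  end
end

headword = /[A-Z]/ # assumed stand-in for the real headword pattern
results = ConstituentExtractor.new(headword, headword).extract('AxByC')
# results is ["Ax", "By"]; the trailing "C" has no end match yet (no EOS handling)
```

Restarting at end_match.begin(0) rather than end_match.end(0) is what lets the same pattern act as both start and end matcher.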

Refactor:  deleted the debug console dumps.  Could probably compress the extraction into fewer lines, but I want to move on.

Now the hard one – dealing with the end of the source text (EOS).

Realized I could put the project into git – that would be the simplest way to track it through time. Still a little learning curve there – last time I tried git I had problems getting it to ignore some things.
Okay – trying EOS. I hope it’s as simple as an alternation (‘|’) in the end matcher.  Worked in rubular anyway.
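
The alternation idea can be checked outside rubular with a few lines of Ruby. This is a sketch under assumptions: /[A-Z]/ stands in for the real headword pattern, and \z (end of string) stands in for whatever the end-of-source alternative actually was.

```ruby
headword = /[A-Z]/               # assumed stand-in for the headword pattern
end_matcher = /#{headword}|\z/   # next headword OR end of the source text

source = 'AxByCz'

# Searching from index 1 finds the next headword ("B").
next_headword = source.index(end_matcher, 1)   # 2

# Searching from index 5 finds no further headword, so \z matches
# at the very end of the string.
end_of_source = source.index(end_matcher, 5)   # 6
```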

Got to green on the EOS test, but boy does this need refactoring!  Passing in a couple of procs helped me learn about procs (certainly a spike goal), but the justification for it (duck typing: you could pass in anything as a matcher as long as it returns something that quacks a bit like MatchData, in case ordinary regexps aren’t enough) isn’t cutting it at the moment, mainly because so much of MatchData is being used now that EOS is supported. What’s worse: the extractor is making assumptions about what’s captured in the regexps.

So I’m going to pass in three regexp patterns:  constituent start, constituent end, and source end. It was another case of premature generalization – regexps are fine for my current story. I have to have faith in the process: keeping the code DRY and OO-clean will make it easy to generalize in future if necessary.
Let’s see if I can do this refactoring in baby steps without breaking tests.... With Java constructor overloading it might be easier, but too bad.
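
A rough sketch of the three-pattern shape, for illustration only. The parameter names, the default /\z/ for the source end, and the /[A-Z]/ headword stand-in are assumptions. The sketch also swaps in Regexp.union and begin(0) to find where a constituent ends, which is a different technique from the capture-group check quoted further down.

```ruby
# Sketch only: names and defaults are assumptions, not the spike's code.
class ConstituentExtractor
  def initialize(start_pattern, end_pattern, source_end_pattern = /\z/)
    @start_pattern = start_pattern
    # A constituent closes at the next end pattern or at the end of the source.
    @closing_pattern = Regexp.union(end_pattern, source_end_pattern)
  end

  def extract(source, offset = 0)
    results = []
    while (start_match = @start_pattern.match(source, offset))
      end_match = @closing_pattern.match(source, start_match.end(0))
      break unless end_match

      results << source[start_match.begin(0)...end_match.begin(0)]
      offset = end_match.begin(0) # resume where this constituent ended
    end
    results
  end
end

headword = /[A-Z]/ # assumed stand-in for the real headword pattern
results = ConstituentExtractor.new(headword, headword, /\z/).extract('AxByC')
# results is ["Ax", "By", "C"]; the last constituent is closed by the source end
```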

Okay, after a couple of false steps, it’s done. No more procs – but at least now I know how to use them.
Noticed something nice (it may happen with Java too, but I don’t remember seeing it, maybe because Java code is more verbose):  every time I refactor, the code gets more compact.  Certainly true of the rspec class.  In the extractor class, the private extractor method is 18 lines long, which seems like a lot, but the algorithm is kinda complex and I used intermediate variables to keep it a little clearer as to what’s happening. There’s one comment, and it regrets itself:

        # (smelly that this needs a comment) following line handles the normal and EOS cases:
        end_match_index = end_match.captures[0] ? end_match.begin(1) : end_match.begin(2)

Anyway, the exposed surface of MatchData is not all that transparent.
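
To make the commented line above a little less opaque, here is a small sketch of how MatchData behaves when only one side of an alternation matches. The pattern is an assumption (group 1 is a stand-in headword, group 2 is the end of the source); the real regexps from the spike are not shown in this post.

```ruby
# Assumed pattern: group 1 = next headword, group 2 = end of source.
end_matcher = /([A-Z])|(\z)/

# Normal case: a following headword exists, so group 1 matches.
normal = end_matcher.match('AxBy', 1)
normal_captures = normal.captures   # ["B", nil]
normal_index = normal.begin(1)      # 2

# EOS case: no further headword, so only group 2 (the empty match at \z) matches.
at_eos = end_matcher.match('Ax', 1)
eos_captures = at_eos.captures      # [nil, ""]
eos_group1 = at_eos.begin(1)        # nil, because group 1 did not take part
eos_index = at_eos.begin(2)         # 2

# Same selection logic as the commented line above.
end_match_index = at_eos.captures[0] ? at_eos.begin(1) : at_eos.begin(2)
```

Note that the test has to be on the first capture: an unmatched group yields nil, but the empty string captured by \z is still truthy in Ruby.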

So – pipelining?

Pipelining again

I’ll start from the outside and maybe get into some Ruby file IO.

...

Just got sidetracked into looking at issues with IO and String classes and encoding.  Can’t expect to come up with the ultimate efficient solution now – just want to concentrate on reading in files, pipelining, extracting, etc.

Will get to it tomorrow.
