Conditional Estimation of HMMs for Information Extraction
Submitted to ACL 2003
Sapporo, Japan
July 2003

Download PDF (8 pages)

Download PPT (500KB; presentation to NLP group, including work discussed in this paper)

[Figure: A conditionally-trained HMM in a toy domain]

This is another paper I wrote that didn’t get accepted for publication. Like my character-level paper, it was interesting and useful but not well targeted to the mindset and appetite of the academic NLP community. Also like my other paper, the work here ended up helping us build our CoNLL named-entity recognition model, which performed quite well and became a well-cited paper. If for no other reason, this paper is worth looking at because it contains a number of neat diagrams and graphs (as well as some fancy math that I can barely comprehend any more, heh).

One reason why I think this paper failed to find acceptance is that it wasn’t trying to get a high score in extraction accuracy. Rather, it was trying to use smaller models and simpler data to gain a deeper understanding of what’s working well and what’s not. When you build a giant HMM and run it on 1000 pages of text, it does so-so and there’s not a lot you can learn about what went wrong. It’s way too complex and detailed to look at and grok what it did and didn’t learn. Our approach was to start with a highly restricted toy domain and minimal model so we could see exactly what was going on and test various hypotheses. We then scaled the models up slightly to show that the results held in the real world, but we never tried to beat the state-of-the-art numbers. Sadly, it’s a lot harder to get a paper published when your final numbers aren’t competitive, even if the paper contributes some useful knowledge in the process.

It seems both odd and unfortunate to me that academic NLP, which is supposedly doing basic scientific research for the long-term interest, is culturally focused more on engineering and tweaking systems to boost the numbers by a few percent than on really trying to understand what’s going on under the covers. After all, most of these systems aren’t close to human-level performance, and the current generation of technology is unlikely to get us there, so just doing a little better is a bit like climbing a tree to get to the moon (to quote Hubert Dreyfus, who famously said as much about the field of AI in general).

If companies are trying to use AI in the real world, their interest is performance first, understanding second (make it work). But in academia, it should be just the opposite: careful study of techniques and investigation of hypotheses with the aim of making breakthroughs in understanding today that will lead to high-performance systems in the future. But I guess the reality is that it’s much easier (in any discipline) to pick a metric and compete for the high score. (The race for a 3.6GHz processor to out-do the 3.5GHz competition in consumer desktop computers comes to mind, when both computers are severely bottlenecked on disk I/O and memory size and rarely stress the CPU in either case. Ok, that was either a lucid metaphor or complete gibberish, depending on who you are. :))

In any event, I enjoyed doing this research, and I’m proud of the paper we wrote.
