Jan 02 2010
JavaCC Parser for WordNet 3.0 Noun Data Set
Here is a bare-bones parser (parse_wn.jj) that reads the WordNet 3.0 noun data set (dict/data.noun).
The parser is not perfect. When parsing the original data.noun file as unpacked from WordNet, it fails at the entry “zero”. It also requires that no word in the dictionary start with “0” (which just happens to be the case). Still, this is as close as I could get after two days’ work. “Zero” is the only entry that fails, and it works if you put double quotes around the digit 0 in the synset ring.
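One plausible explanation for the “zero” failure (a guess, not checked against parse_wn.jj) is that a JavaCC token manager matches the lone word “0” as a numeric token, such as the one-digit lex_id, so the grammar sees a number where it expects a word. A reader that consumes fields by position rather than by token type avoids that ambiguity, because the counts in each line (w_cnt in hex, p_cnt in decimal, per the wndb(5WN) layout) say exactly how many words and pointers follow. The sketch below is a hypothetical hand-written alternative, not the JavaCC grammar; the class `NounLineSketch`, the records `Synset` and `Pointer`, and the method `parse` are all illustrative names, and the sample line is made up. It needs Java 16 or later for records.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not parse_wn.jj): reads one data.noun synset line by
// field position, following the wndb(5WN) layout. Requires Java 16+ for records.
public class NounLineSketch {
    record Pointer(String symbol, String offset, char pos, String sourceTarget) {}

    record Synset(String offset, int lexFile, char ssType,
                  List<String> words, List<Pointer> pointers, String gloss) {}

    static Synset parse(String line) {
        // The gloss follows the first " | "; everything before it is data fields.
        int bar = line.indexOf(" | ");
        String data = bar >= 0 ? line.substring(0, bar) : line;
        String gloss = bar >= 0 ? line.substring(bar + 3).trim() : "";

        String[] t = data.trim().split("\\s+");
        int i = 0;
        String offset = t[i++];                   // 8-digit byte offset
        int lexFile = Integer.parseInt(t[i++]);   // 2-digit lexicographer file number
        char ssType = t[i++].charAt(0);           // 'n' in data.noun
        int wCnt = Integer.parseInt(t[i++], 16);  // w_cnt is 2-digit hexadecimal

        List<String> words = new ArrayList<>();
        for (int w = 0; w < wCnt; w++) {
            words.add(t[i++]);  // word taken by position, so "0" is accepted
            i++;                // skip lex_id (one hex digit)
        }

        int pCnt = Integer.parseInt(t[i++]);      // p_cnt is 3-digit decimal
        List<Pointer> pointers = new ArrayList<>();
        for (int p = 0; p < pCnt; p++) {
            // pointer_symbol, target synset_offset, pos, source/target (4 hex digits)
            pointers.add(new Pointer(t[i], t[i + 1], t[i + 2].charAt(0), t[i + 3]));
            i += 4;
        }
        return new Synset(offset, lexFile, ssType, words, pointers, gloss);
    }

    public static void main(String[] args) {
        // Made-up line in data.noun layout, with "0" as a synonym of "zero".
        String line = "12345678 23 n 02 zero 0 0 0 001 @ 87654321 n 0000 | a made-up gloss";
        System.out.println(parse(line).words());  // prints [zero, 0]
    }
}
```

Running this over the real dict/data.noun would still require skipping the license header at the top of the file, whose lines begin with two spaces.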
The last time I wrote a serious parser was 15 years ago. I always enjoyed writing parsers. It is like writing a poem, in a very strange way: you have to choose your words carefully. But if you succeed, you can express a lot of things with very few words.
This time around, the effort started quickly but got stuck in the sand soon after. More than once, I felt like a lab rat running around a maze: I could smell the cheese but kept finding a thin plastic wall in between. So this is the best I could do.
Right now, I am working on a taxonomy-related project, one that really kills brain cells, but it is exciting as hell.