
Natural Language Processing

It's incredible how many follow-ups my previous post about "Human Language Parsers" (the official name of which apparently is $SUBJECT) prompted; it would appear that this is quite a popular subject among academics. I've received at least three pointers to allegedly good books about the subject, a number of claims on both sides of the argument over whether or not it is possible, and some pointers to existing and somewhat-working implementations of the structure I outlined in my previous post.

Noteworthy is the blog link to blogs.msdn.com, which worked only once out of the five or so times I tried (the other attempts all returned 404 or 503 error codes).

My conclusion from all the documentation I read: what makes NLP so hard is that the information required to parse a text includes more than what is actually available in the text itself. One also needs background knowledge, and getting a machine to understand that knowledge would require outright artificial intelligence.

Or so the people actually involved with this stuff claim. Having read even the small amount of information that I did, I can understand why that is.

It's also nice to find out that Dutch, which makes extensive use of compound words, is way more exceptional in that regard than I thought it was. Gheh.

Hottentottententententoonstelling. There.
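
For the curious, here is a minimal sketch of why even pulling such a compound apart is less trivial than it looks. It is a toy, not how any real parser does it: the function name `segmentations` and the five-word `TOY_LEXICON` are made up for illustration, and the approach (try every prefix that appears in a word list, then recurse on the rest) is just the most naive one I could think of.

```python
def segmentations(word, lexicon):
    """Return every way to split `word` into entries of `lexicon`.

    Naive recursive search: try each prefix that is a known word,
    then split the remainder the same way. Exponential in the worst
    case, which is fine for a toy example.
    """
    word = word.lower()
    if not word:
        # The empty string has exactly one segmentation: no parts at all.
        return [[]]
    results = []
    for end in range(1, len(word) + 1):
        head = word[:end]
        if head in lexicon:
            for tail in segmentations(word[end:], lexicon):
                results.append([head] + tail)
    return results


# Hypothetical toy lexicon; a real one would hold many thousands of entries.
TOY_LEXICON = {"hottentotten", "tenten", "tent", "en", "tentoonstelling"}

for parts in segmentations("Hottentottententententoonstelling", TOY_LEXICON):
    print(" + ".join(parts))
```

With this word list the word splits in two ways: once as "hottentotten + tenten + tentoonstelling" and once as "hottentotten + tent + en + tentoonstelling". Both are made of valid Dutch words, and nothing in the text itself says which one is meant; picking the right one takes exactly the kind of background knowledge mentioned above.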