Computational Linguistics

In late 2011 and early 2012, I got interested in this subject because of my fascination with Google Translate. Sure, you can poke fun at Google's translations: how idioms in one language are meaningless in another (recall the “clothes that poke eye” brouhaha), and how you can get funny results by translating a phrase into another language and then translating it back. That completely misses the point. Google Translate works very well, even for some languages that I do not think Google has a resident expert on. For example, “pakaian menjolok mata” is now “revealing clothing”. Google Translate not only makes meaningfully correct translations, it can now make idiomatically correct translations as well. Not only that, but GT actually learns how to improve its algorithm using built-in machine learning. Google developed the BM translator using nothing more than a finite set of BM rules (what are the rules? I’m dying to know), a finite BM vocabulary, and a learning engine, all this without having a single native BM speaker on the development team (as far as I know).

Along with my curiosity about Google Translate, three other things happened at the time that made me interested in computational linguistics:

One, I finished reading The Language Instinct by Steven Pinker. This is one of the finest popular science books ever written. For a layman like me, it shatters intuitive beliefs about the nature of language and how we acquire it. It is one of those books that make you feel dumb for not knowing even the rudiments of how the world works. Of course there are scholars who contest Pinker’s assertions, but as far as I can tell, many linguists now subscribe to Pinker’s theory of language acquisition, which in turn was inspired by Noam Chomsky’s earlier work on universal grammar.

Two, I started to learn programming seriously (only to drop out a few months later, LOL), and what keeps me from being a good programmer is that I ask myself too many questions about the language, rather than the algorithm. As usual, I keep over-intellectualizing everything to the point of inefficiency. I need to work on this personal shortcoming.

Three, I discovered the IOL, the International Linguistics Olympiad (http://www.ioling.org/), which is a competition based on computational linguistics. Having been involved with the national math olympiad program, and having founded three other national olympiad programs (IBO, IESO and IOI), I decided to give this a look. It is quite interesting to know that there is a devoted, close-knit community of people around the world working in the field of linguistics education using the olympiad model. Fascinating as it is, it is not something I see myself working on in the near future. But who knows.

I started a simple blog (http://linguistikmalaysia.wordpress.com/) to put together some useful links for my own reference. The blog is not meant to be updated regularly; in fact, it hasn’t been since March 2012, but I might update it sporadically with interesting new content. I just want to tell people that it exists.
