sub-words

Words, sub-words and compound words!

Not everything I write here is going to be about CS or linguistics, so don't be discouraged from reading in the future even if you find this painfully boring!

In the last year I was lucky enough to work on WordMine2, a linguistics project that is trying to gather Orthographic (letter sizes and differences), Phonological (phonemes or sound properties) and Semantic (meanings) information about words. The project is currently managed by Dr. Lori Buchanan of the University of Windsor. I redesigned the UI for searching this database of about 64 thousand words, and now I'm looking at more data mining.

One of the benefits of doing this work in Dr. Buchanan's lab is that I am able to sit in on the meetings she has with her thesis/graduate students. This lets me see what kind of data, and what kind of access to it, they require for their studies. Recently a few students have been looking into compound words, and that's what sparked my interest in this topic.

So my current self-assigned task is to take our word list and try to find all of the compound words within it. I divided this into two separate operations in the code. The first finds, for each word in the list, the other list words that appear inside it (this will make more sense later). The second takes those sub-words and checks whether they make up the whole word.
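
To make the two operations concrete, here is a minimal sketch in Python. It is not the actual WordMine2 code: the function names, the `min_len` cutoff, and the restriction to compounds made of exactly two parts are all assumptions made for illustration.

```python
def find_subwords(words, min_len=3):
    """Step 1: for each word, collect the other list words found inside it.

    min_len is a hypothetical cutoff that ignores very short pieces.
    """
    vocab = set(words)
    parts = {}
    for word in vocab:
        found = set()
        # Try every substring of at least min_len letters.
        for start in range(len(word)):
            for end in range(start + min_len, len(word) + 1):
                piece = word[start:end]
                if piece != word and piece in vocab:
                    found.add(piece)
        if found:
            parts[word] = found
    return parts


def find_compounds(words, min_len=3):
    """Step 2: keep the words that split exactly into two of their sub-words.

    Assumption: a compound is two list words joined with nothing left over.
    """
    compounds = {}
    for word, found in find_subwords(words, min_len).items():
        for cut in range(min_len, len(word) - min_len + 1):
            left, right = word[:cut], word[cut:]
            if left in found and right in found:
                compounds.setdefault(word, []).append((left, right))
    return compounds


sample = ["sun", "flower", "sunflower", "low",
          "black", "bird", "blackbird", "birdhouse"]
print(find_compounds(sample))
```

In this made-up sample, "birdhouse" is not reported because "house" is not in the list, and "low" turns up as a sub-word of "flower" without making "flower" a compound. That difference is the reason for keeping the two steps separate.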
