A couple of weeks back I was talking with a friend about indexing texts. She (Hi Drea!) works with medieval dye recipes rather than culinary ones, but we both have the same sorts of problems with trying to search texts that are littered with spelling variations and foreign words.
That conversation got me thinking about, and then experimenting with, a couple of possibilities for an improved cookbook search for my website. One of my tests proved to be very functional and much more efficient, so now I've got the new indexing and search interface written and am halfway through building new indexes for all the cookbooks.
(While building the indexes is no longer as much work as it used to be - and the new system is a lot more maintainable - it's still a rather labor-intensive task.)
Two of the texts I've done so far - "Two Fifteenth Century Cookery Books" and "Forme of Cury" - are the most irritating. They're not only large and loaded with medieval spellings, but they contain many uses of certain words I've now come to loathe.
Pyk - This word, along with its variants (pik, pyke, etc.) can mean pike (as in "Take a fresh pyk and remove the scales") or pick (as in "and then pyk out the bones"). This word is by far the worst, with no consistency in how a given spelling is used. The only saving grace is that "pick" isn't an ingredient, so I could go through the text and mark all the cases where "pyk" meant "pick" so they wouldn't be indexed. Anything left is therefore "pike".
Flowre - There are a surprising number of variants for this one (flour, flower, floure). Indexing them was made a little easier in that a plural always indicates "flowers". One text did have a couple of cases where "flower" meant "flour" though, which is really awkward because people who search on "flower" don't want to see recipes with "flour" but do want to see recipes with "flowers".
Eles - Almost as much of a pain as "pyk". Here the many variants (els, elys, etc.) can mean either "eels" or "else". Again though, I only need to index one of the terms.
Grains - For this one I have a different issue. "Grains" can mean ... well, grains, like wheat. Alternatively, it could be part of the term "grains of paradise", meaning the seeds of the plant Aframomum melegueta. Generally the plural means "grains of paradise" and the singular means "grain".
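For the curious, the general idea behind handling these words can be sketched in a few lines of Python. This is only an illustration, not the code that runs on my site: the variant table, the "~" skip marker, and the function name are all made up for the example. The assumption is that each spelling maps to one modern index term, and that any use I've marked by hand (like "pyk" meaning "pick") gets skipped.

```python
import re
from collections import defaultdict

# Hypothetical variant table: medieval spelling -> modern index term.
# Plurals get their own entries, since the plural often settles the meaning.
# Ambiguous singulars (like "flower" meaning "flour") would still need
# hand-marking or a per-text override.
VARIANTS = {
    "pyk": "pike", "pik": "pike", "pyke": "pike",
    "eles": "eels", "els": "eels", "elys": "eels",
    "flowre": "flour", "floure": "flour", "flour": "flour",
    "flowres": "flowers", "flowers": "flowers",
    "grain": "grain", "grains": "grains of paradise",
}

# Hypothetical marker placed by hand in front of a word that should not be
# indexed, e.g. "~pyk" where the word means "pick" rather than "pike".
SKIP_MARK = "~"

WORD_RE = re.compile(re.escape(SKIP_MARK) + r"?[a-z]+")


def build_index(recipes):
    """Map each modern index term to the set of recipe titles that use it."""
    index = defaultdict(set)
    for title, text in recipes.items():
        for word in WORD_RE.findall(text.lower()):
            if word.startswith(SKIP_MARK):
                continue  # marked by hand as "not an ingredient"
            term = VARIANTS.get(word)
            if term is not None:
                index[term].add(title)
    return index


recipes = {
    "Pike in sauce": "Take a fresh pyk and remove the scales, "
                     "and then ~pyk out the bones",
    "Eel pie": "Take eles and floure, or ~els use grains",
}
index = build_index(recipes)
print(sorted(index["pike"]))
print(sorted(index["eels"]))
```

The point of the marker is that only the non-ingredient meaning has to be tagged by hand; anything left unmarked falls through to the variant table.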
Even though I'm an incredible word geek, after working on these texts I really do hate these words. I also have a renewed appreciation for standardized spelling (or "standardised" for those in the UK).