This link prompted a fair bit of chat on FB.
I once worked out myself that The Broons and Oor Wullie *alone* have put out well over a million words of Scots, but Thomas Widmann thought, “Yes, but you would need a parallel corpus of translations, not just a corpus of Scots. And all the orthographical variation would make the job even more difficult”.
Yes, a big project. Off the top of my head, you’d (a) have to choose the Scots texts, (b) regularise the spellings according to set rules (via corpus analysis to pick out the most common variants, or by using a dictionary), and (c) do an *English* translation of the lot. Not impossible, though. It would take 2-3 years, I suspect… and that’s *before* you get to Google, mind.
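Step (b) can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions: the variant groups below are hypothetical and would in practice come from a dictionary or from manual review, and the "rule" is simply to keep whichever spelling in each group is most frequent in the corpus.

```python
import re
from collections import Counter

# Hypothetical variant groups: spellings treated as one and the same word.
# A real project would derive these from a dictionary or manual review.
VARIANT_GROUPS = [
    {"wirds", "wurds"},
    {"juist", "jist"},
    {"heid", "heed"},
]

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[\w'’]+", text.lower())

def build_canonical_map(corpus, groups):
    """Map every variant to the most frequent spelling in its group."""
    counts = Counter(tokenize(corpus))
    mapping = {}
    for group in groups:
        # sorted() makes ties deterministic (alphabetically first wins).
        canonical = max(sorted(group), key=lambda w: counts[w])
        for variant in group:
            mapping[variant] = canonical
    return mapping

def regularise(text, mapping):
    """Replace each known variant with its canonical spelling."""
    return " ".join(mapping.get(tok, tok) for tok in tokenize(text))

corpus = "juist a few wirds jist a wheen wirds wurds juist"
mapping = build_canonical_map(corpus, VARIANT_GROUPS)
print(regularise("Jist twa wurds", mapping))
```

Note that this sketch throws away capitalisation and punctuation, which a real pipeline would need to preserve before the texts went on to translation.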
Níall Páraig Ó Treasaigh wondered, “A) Is Oor Wullie really Scots? B) How many of those words are ‘kartie’, ‘bucket’ and ‘stoorie brae’? There’s not a lot of variation in the language used in the books, so the coverage of the language will be minimal. C) You also need a lot of bilingual material, and the last major push to produce bilingual text in Scots and English was the translation of King James’s back catalogue of Scots writings when he assumed the English throne. That said, Scots is a language that would have a reasonable chance at translating well if there was sufficient material (which there isn’t) due to the structural similarities with English. Google still wouldn’t be able to decide between ‘that’/‘thae’ and ‘thon’.
On the other hand, as your target language is most likely to be English, you’d probably be just as well with a program that does interactive glossing and/or dictionary lookup. For example, there’s Caoimhín Ó Donnaíle’s Wordlink site (http://multidict.net/wordlink/), which already has a Scots option (but unfortunately seems to break the SLC site when you attempt to switch to Scots). It connects to 4 dictionaries at present: Scots Online, DSL, Glosbe and Global Glossary. There’s lots that can be done with a tool like Wordlink, and I’m sure the SLC would be able to make good use of Wordlink for making resources for schools. There’s another interesting webapp on the Multidict site which makes use of Wordlink: http://multidict.net/clilstore/. It combines video and audio recordings with transcripts that are then linked to the dictionaries”.
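The glossing idea can be illustrated with a small Python sketch. This is not how Wordlink works internally; it is a toy example that assumes a hypothetical in-memory glossary (a real tool would look words up in a dictionary) and adds an English gloss after each word it recognises.

```python
import re

# Hypothetical mini-glossary; a real tool would query a dictionary instead.
GLOSSARY = {
    "muckle": "big",
    "leid": "language",
    "owersettin": "translation",
    "wirds": "words",
}

def gloss(text, glossary=GLOSSARY):
    """Append an English gloss in square brackets after each known word."""
    def annotate(match):
        word = match.group(0)
        meaning = glossary.get(word.lower())
        return f"{word} [{meaning}]" if meaning else word
    return re.sub(r"\w+", annotate, text)

print(gloss("A muckle owersettin projeck"))
# prints: A muckle [big] owersettin [translation] projeck
```

Words missing from the glossary are left untouched, so the reader still sees the original Scots text with help only where it is available.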
Warren Maguire concluded, “As far as I can see, you don’t just need a 1,000,000-word bilingual corpus, you need one of about 1,000,000,000 words in both languages, never mind a standard form of Scots. Easy enough if you have newspapers and magazines being put out daily, but that’s not the case with Scots, more’s the pity”.