About the author
I build tools for languages, with a particular emphasis on the many languages that are usually excluded from today’s knowledge and technology resources. I was raised in the rural American state of Vermont, originally thinking I’d follow my mother and grandfather into mathematics. Visiting India as a teenager, though, I witnessed deprivation on a scale I had never imagined. I made poverty the focus of my studies, first for a BA at Columbia in New York, and then for nine years on a PhD at Yale. My anthropology research examined why people stay poor despite decades of well-intended development programs (here’s the abstract and here’s the text). Having spent summers milking cows in the countryside, I was drawn to do my field research in the remote highlands of southern Tanzania, where…
… I needed to speak Swahili if I was going to speak at all. To learn Swahili, I needed a dictionary – or “kamusi”, as it is called in Swahili. The best available was antiquated and difficult to use. I thought: why not divide the alphabet into chunks and use this new thing called “the internet” to get a bunch of people to write a new dictionary together? This was 1994 – screens were green, disks were floppy, and modems trickled signal over telephone lines. I found the roughly 30 people in the world who both spoke Swahili and had email, and the Kamusi Project was born.
After a while, we tried cloning the project for the Yoruba language of Nigeria, and it was a disaster because the languages were too dissimilar. So when another language joined in, we dug deep into figuring out how to make a model that genuinely worked for more than a single language pair. It turned out that going from two languages to three was perniciously complicated, but the jump from three to four was a relatively straightforward extension of the new logic. And if four, why not five? If five, why not all 2000 African languages? If Africa, why not the rest of the world?
I’ve made some big mistakes as I’ve pushed forward on Kamusi. First, I concentrated on Swahili and other African languages, which can’t pay for themselves and don’t interest foundations and government agencies in the slightest. Second, I forgot to become a dot-com millionaire before setting out, and third, I neglected to put any time into the monetization potential of a central unified linguistic data repository. And I totally failed to move to Silicon Valley. So the project has moved forward in fits and starts, with sporadic bits of funding, a lot of effort from students and volunteers, and a new approach to digitizing the world’s languages that will, I hope, begin to make a difference in the very near future.
I moved to Switzerland in 2007 for family reasons, and enjoy raising my daughter away from the threat of gun violence experienced in the US. Working through Kamusi, now an independent, non-profit NGO registered in the State of Delaware (501(c)(3)) and the Canton of Vaud, and also for several years at the Swiss Federal Institute of Technology in Lausanne (the LSIR lab at EPFL and now part of the EdTech Collider), I spend my days designing ways to gather data from all 7000 languages, and exploring new ways to deploy that linguistic knowledge within cutting-edge technology.