About the author

I build tools for languages (see LinkedIn and Zotero), with a particular emphasis on the many languages that are usually excluded from today’s knowledge and technology resources. I was raised in the rural American state of Vermont, originally thinking I’d follow my mother and grandfather into mathematics. Visiting India as a teenager, though, I witnessed deprivation on a scale I had never imagined. I made poverty the focus of my studies, first for a BA at Columbia in New York, and then for a PhD at Yale, which took 9 years. My anthropology research examined why people stay poor despite decades of well-intended development programs (the dissertation was granted with distinction but never published as a book; you can read and download the full text for free). Having spent summers milking cows in the countryside, I was drawn to do my field research in the remote highlands of southern Tanzania, where…

… I needed to speak Swahili if I was going to speak at all. To learn Swahili, I needed a dictionary – or “kamusi”, as it is called in Swahili. The best available was antiquated and difficult to use. Why not divide the alphabet into chunks and use this new thing called “the internet” to get a bunch of people to write a new dictionary together, I thought? This was 1994 – screens were green, disks were floppy, and modems trickled signal over telephone lines. I found the roughly 30 people in the world who both spoke Swahili and had email, and the Kamusi Project was born.

After a while, we tried cloning the project for the Yoruba language of Nigeria, and it was a disaster because the languages were too dissimilar. So when another language joined in, we dug deep into figuring out how to make a model that genuinely worked for more than a single language pair. It turned out that going from two languages to three was perniciously complicated, but the jump from three to four was a relatively straightforward extension of the new logic. And if four, why not five? If five, why not all 2375 African languages? If Africa, why not the rest of the world? The molecular lexicography model gradually evolved into the Kam4D data matrix, a unique and comprehensive multilingual platform to document human expression across time and space.

I’ve made some big mistakes as I’ve pushed forward on Kamusi. First, I concentrated on Swahili and other African languages, which can’t pay for themselves and don’t interest foundations and government agencies in the slightest. Second, I forgot to become a dot.com millionaire before setting out, and third, I neglected to put any time into the monetization potential of a central unified linguistic data repository. And I totally failed to move to Silicon Valley. So the project has moved forward in fits and starts, with sporadic bits of funding, a lot of effort from students and volunteers, and a new approach to digitizing the world’s languages that will, I hope, begin to make a difference in the very near future.

I moved to Switzerland in 2007 for family reasons, and I enjoy raising my daughter away from the threat of gun violence experienced in the US. Working through Kamusi, now an independent non-profit NGO registered in the State of Delaware (501(c)(3)) and the Canton of Vaud, and for several years also at the Swiss Federal Institute of Technology in Lausanne (the LSIR lab at EPFL and now part of the EdTech Collider), I spend my days designing ways to gather data from all 7000 languages and exploring new ways to deploy that linguistic knowledge within cutting-edge technology. In 2013, Kamusi was recognized as a “lighthouse project” by the White House Office of Science and Technology Policy, and I regularly participate in various language actions of the European Union. I serve as a Technology and Language Expert for the ongoing digitization projects of the African Academy of Languages (the African Union’s intergovernmental language agency) and as a member of the UNDP Global Expert Group on Closing the Language Gap in AI, and I have also worked with the multi-organizational AI4D initiative to lay the groundwork for AI in African languages.

You can find many of my formal writings at http://www.zotero.org/malangali, and periodic quick takes on AI on LinkedIn with the hashtag #AIspotcheck. Teach You Backwards is my in-depth study of the inner workings of machine translation, especially Google Translate in its operations across more than 100 languages. Some things in the industry have changed due to AI since I finished Teach You Backwards in 2019, but many of the fundamental flaws, including MUSA (the Make Up Stuff Algorithm), remain very much the same, if not more so.

In 2020 I had a nasty bike accident that cost me the use of my left eye. I leaned into the damage and started a YouTube channel as “The Pirate Professor”, where I sometimes post videos about language technology, AI, climate matters, and more – please watch, enjoy, like, and share! I’ve long enjoyed photography, with lots of photos on Flickr, though I shoot less than I did before going half blind. I also had to quit running, but I walk everywhere all the time (over 2000 miles / 3000 km a year), so connect with me on Strava if you’d like.

One more thing: I run Kamusi Labs as a virtual lab for students who seek interesting projects in language technology. If you are looking for a cool and useful internship, semester project, senior thesis, Master’s project, or even direction for a PhD, please get in touch. As an example of what we can do, the groundbreaking Kamfupi software was started by a student from India and brought to greatness with a student in Kenya – do check it out! Though the lab probably can’t pay you, your work can qualify for course credits at many universities. I enjoy working with students around the world, especially when the projects can involve languages that do not yet have good technology resources, so I’ll be happy to talk about how you can join the lab.

And another one more thing: I’m usually happy to talk about my work and perspectives on languages and technology, so get in touch if you would like me to speak at your organization or university!


An In-Depth Study of Google Translate for 108 Languages
