Kamusi versus Google Translate: Head to Head
This document presents a head-to-head comparison between Kamusi and Google Translate, using actual screenshots from the KamusiHere! (version 1.0) and Google Translate mobile apps for the same set of searches.
The images below show how the Kamusi model provides much more precise vocabulary than Google can. Although the Kamusi data is still young, you can already see how much more confident you can be in its results. The difference is notable when translating between English and another language. When English is not one of the languages involved (over 99% of translation scenarios), the difference is incontrovertible.
The first comparison shows English to Chinese translation of the word “dry”. “Dry” often means “free from liquid”, but it can mean many other things, such as “lacking interest”, “abstaining from alcohol”, or, for mammals, “not producing milk”. Kamusi shows you each sense, with enough information for you to decide which Chinese term matches the meaning you want. Google shows you some options, but gives little or no context; five choices are simply glossed as “dry”. Google does follow with some English dictionary definitions for “dry”, but they are not connected to their matches in Chinese.
It is also important to note that Kamusi highlights data that it knows to be missing, and will soon open its systems for knowledgeable users to help fill in those gaps. Google has a link on its website to “suggest an edit”, but testing over several years shows this to be essentially a placebo button.
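The sense-by-sense lookup described above, including the flagging of known gaps, can be pictured with a minimal sketch. This is not Kamusi's actual data model or code: the `Sense` class, the `lookup` and `gaps` helpers, and the sample entries are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Sense:
    """One meaning of a term, with translations keyed by language code."""
    definition: str
    # A value of None marks a translation that is known to be missing.
    translations: dict[str, Optional[str]] = field(default_factory=dict)


# Toy entries for English "dry"; illustrative only, not Kamusi data.
DRY = [
    Sense("free from liquid", {"zh": "干燥"}),
    Sense("lacking interest", {"zh": "枯燥"}),
    Sense("abstaining from alcohol", {"zh": None}),
    Sense("(of a mammal) not producing milk", {"zh": None}),
]


def lookup(senses: list[Sense], lang: str) -> list[tuple[str, str]]:
    """Return one (definition, translation) row per sense, flagging gaps."""
    rows = []
    for sense in senses:
        term = sense.translations.get(lang)
        rows.append((sense.definition, term if term else "[missing]"))
    return rows


def gaps(senses: list[Sense], lang: str) -> list[str]:
    """List the definitions that still lack a translation in `lang`."""
    return [s.definition for s in senses if not s.translations.get(lang)]


if __name__ == "__main__":
    for definition, term in lookup(DRY, "zh"):
        print(f"{definition}: {term}")
```

Because every row carries its own definition, a reader can pick the target term by meaning, and the `[missing]` rows make the gaps visible rather than hiding them behind a guess.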
The second comparison shows Chinese to English for 干, which means “not wet” and several other things. Kamusi shows each sense separately, always with a definition and often with usage examples. Google shows you a list of English words, without any context to zero in on your meaning. On the Kamusi side, the model allows you to choose with confidence. On the Google side, you’re left to guess.
The remaining comparisons demonstrate translations among languages other than English. These pairs are where Kamusi really shines, and where Google falls completely flat. Kamusi currently connects 44 languages at this level of precision, and the model will support the inclusion of thousands more languages as data is collected over time. Google provides a single “best guess” for a grid of about 100 languages, with no capacity to improve its accuracy for most translation pairs.
As you scroll down, you will quickly notice that Google provides one result per search, based on its statistical estimate of which term in each language correlates with the most frequent intermediary English sense. The Kamusi side, however, shows that the terms in the other languages all have as much nuance as you saw for “dry” in English. For example, 干 currently has six senses in Kamusi, of which three are already matched to Italian, three to Greek, and three to Romanian. You can flip those searches around and find that all four senses of “secco” in Italian are accurately aligned to terms in Chinese, two are matched to Greek, and all four are matched to Romanian. Translating between Chinese and Italian, or between Italian and Greek, or Czech and Zulu, is every bit as complex as translating between Chinese and English. In aggregate, the non-English translation scenarios far exceed the reach that is possible with Google, while the Kamusi model can scale continuously, extending to many valuable markets that are unserved by Google’s roll-the-dice approach.
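The contrast drawn above, between one guess routed through English and a separate match for each sense, can be sketched in a few lines. This is a toy model under stated assumptions, not Google's system or Kamusi's implementation: the `TERM_CONCEPTS` table, the concept labels, the function names, and the sample vocabulary are all hypothetical, and the first concept listed for a term is assumed to be its most frequent one.

```python
# Toy concept-aligned lexicon: each (language, term) lists the concepts it
# can express, most frequent first. Illustrative entries, not Kamusi data.
TERM_CONCEPTS = {
    ("it", "secco"): ["not_wet", "arid", "thin"],
    ("ro", "uscat"): ["not_wet"],
    ("ro", "secetos"): ["arid"],
    ("ro", "slab"): ["thin"],
    ("en", "dry"): ["not_wet", "arid"],
    ("en", "thin"): ["thin"],
}


def by_concept(src_lang, term, tgt_lang):
    """Map each concept of `term` to the target-language terms sharing it."""
    result = {}
    for concept in TERM_CONCEPTS.get((src_lang, term), []):
        result[concept] = [
            t for (lang, t), concepts in TERM_CONCEPTS.items()
            if lang == tgt_lang and concept in concepts
        ]
    return result


def first_term(lang, concept):
    """Return the first term in `lang` that expresses `concept`, if any."""
    for (entry_lang, term), concepts in TERM_CONCEPTS.items():
        if entry_lang == lang and concept in concepts:
            return term
    return None


def by_pivot(src_lang, term, tgt_lang, pivot="en"):
    """Single best guess routed through the pivot language's top sense."""
    concepts = TERM_CONCEPTS.get((src_lang, term), [])
    if not concepts:
        return None
    pivot_term = first_term(pivot, concepts[0])
    if pivot_term is None:
        return None
    top_pivot_concept = TERM_CONCEPTS[(pivot, pivot_term)][0]
    return first_term(tgt_lang, top_pivot_concept)


if __name__ == "__main__":
    print(by_pivot("it", "secco", "ro"))    # one answer for every sense
    print(by_concept("it", "secco", "ro"))  # one answer per sense
```

In this sketch the pivot route returns the same single term whichever sense of “secco” the user has in mind, while the concept route keeps the senses apart so that each can be matched, or flagged as unmatched, independently.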
The very last comparison below, between Italian and Romanian, exemplifies where Kamusi is headed. Definitions are shown in Italian, Romanian, and English for all four concepts that match the Italian term “secco”, including the ideas of drought and thinness. As of 2015, more than 1.1 million Romanians were living in Italy. Google serves this population with a guess that is slightly better than random. In a world with people increasingly on the move, Kamusi serves concrete facts whenever two languages meet.
Google offers one thing Kamusi does not: it translates whole sentences well between English and a handful of other select languages in certain situations. However, vocabulary errors pervade its output when it strays from formal texts in its top few languages. Kamusi has not yet released tools for translating long texts, but you can picture how the software now in the lab will benefit from the vocabulary precision already demonstrated for single words, between any pair of languages for which it can collect the words.
[Screenshots: English to Chinese]
[Screenshots: Chinese to English]
[Screenshots: Chinese to Italian]
[Screenshots: Italian to Chinese]
[Screenshots: Chinese to Greek]
[Screenshots: Italian to Greek]
[Screenshots: Chinese to Romanian]
[Screenshots: Italian to Romanian]