Wu Dao 2.0 – Bigger, Stronger and Faster AI from China
It’s no secret that China has COVID-19 under control. When you travel there you have to go through a two-week hotel quarantine, but once you are in the country you are safe. Likely even safer than before COVID, because wearing a mask is now part of the etiquette and many other viral respiratory illnesses are likely to be on the decline. Therefore, when I was invited to speak at the Beijing Academy of Artificial Intelligence (BAAI) Annual Conference in the AI for Healthcare section, I readily agreed.
BAAI is a great platform to showcase technology and talent across broad categories. The nonprofit institute encourages scientists to tackle problems and promote breakthroughs in AI theories, tools, systems and applications. In addition, the BAAI has a unique focus on long-term research into AI technology.
AI is big in China. So big that over 70,000 people registered for the event, and many more logged on afterwards to watch the BAAI presentations. Some of those presentations introduced very new approaches, algorithms, systems and applications. However, the real highlight of BAAI was Wu Dao 2.0, a system that surpassed OpenAI’s GPT-3 in many ways.
The Encyclopedia Britannica defines language as “a system of conventional spoken, manual or written symbols through which human beings, as members of a social group and participants in its culture, express themselves”. We can conclude from this definition that language is an integral part of human connection. Not only does it allow us to share ideas, thoughts and feelings with each other, but language also allows us to create and build societies and empires. Simply put: language makes us human.
According to Professor Gareth Gaskell of the Department of Psychology at the University of York, an average 20-year-old knows between 27,000 and 52,000 different words. By age 60, that number averages between 35,000 and 56,000. Therefore, when we use words in a conversation, the brain has to make a quick decision about which words to use and in what order. In this context, the brain functions as a processor that can do several things at the same time.
Linguists suggest that every word we know is represented by a separate processing unit that has one task: to assess the likelihood that incoming speech matches that particular word. In the context of the brain, the processing unit that represents a word is similar to a pattern of activity through a group of neurons in the brain. So when we hear the start of a word, several thousand of these units become active because there are many possible matches.
Most people can understand up to eight syllables per second. However, the goal is not to recognize the word but to access its stored meaning. The brain accesses many possible meanings of the word before it is fully identified. Studies show that upon hearing a word fragment like “cap”, listeners begin to register several possible meanings like “captain” or “capital letter” before the full word emerges.
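The narrowing-down process described above can be illustrated with a toy sketch. It is only an analogy, not a model of the brain: the six-word vocabulary and the function names are made up for illustration, and written letters stand in for incoming speech sounds.

```python
# Toy vocabulary (hypothetical); real listeners know tens of thousands of words.
VOCABULARY = ["cap", "capital", "captain", "capture", "cat", "map"]


def active_candidates(heard_so_far, vocabulary=VOCABULARY):
    """Return the words that are still consistent with the fragment heard so far."""
    return sorted(word for word in vocabulary if word.startswith(heard_so_far))


def narrow_down(word):
    """Pair each growing fragment of `word` with the candidates that remain active."""
    return [(word[:i], active_candidates(word[:i])) for i in range(1, len(word) + 1)]


# Letters stand in for speech sounds arriving one after another.
for fragment, candidates in narrow_down("captain"):
    print(f"{fragment!r}: {candidates}")
```

After the fragment “cap”, four of the six toy words are still in play; one more letter cuts the set to two, and only the complete word leaves a single match.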
Like most things touched by artificial intelligence in the 21st century, language has taken on new forms and meanings. Recently, the concept of “language models” has taken center stage in AI. Essentially, a language model analyzes textual data and uses statistical and probabilistic techniques to estimate how likely a given sequence of words is. Language models are commonly used in natural language processing applications that generate text as output, such as machine translation and question answering.
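To make the idea of “the probability of a sequence of words” concrete, here is a minimal sketch of a bigram model, one of the simplest kinds of language model. It is far simpler than the neural models discussed in this article; the three-sentence corpus and the function names are invented for illustration.

```python
from collections import Counter


def train_bigram_model(sentences):
    """Count how often each word follows each other word in the training text."""
    context_counts, bigram_counts = Counter(), Counter()
    for sentence in sentences:
        # <s> and </s> mark the start and end of a sentence.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, curr in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, curr)] += 1
    return context_counts, bigram_counts


def sequence_probability(sentence, context_counts, bigram_counts):
    """Multiply the estimated probability of each word given the word before it."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    probability = 1.0
    for prev, curr in zip(tokens, tokens[1:]):
        if context_counts[prev] == 0:
            return 0.0  # The preceding word never appeared in the training text.
        probability *= bigram_counts[(prev, curr)] / context_counts[prev]
    return probability


corpus = ["the cat sat", "the cat ran", "the dog sat"]  # hypothetical toy corpus
context_counts, bigram_counts = train_bigram_model(corpus)

p_seen = sequence_probability("the cat sat", context_counts, bigram_counts)
p_unseen = sequence_probability("the dog ran", context_counts, bigram_counts)
print(p_seen, p_unseen)
```

A sentence built from word pairs the model has seen gets a non-zero probability, while “the dog ran” gets zero because “dog ran” never occurs in the toy corpus. Large models replace these raw counts with billions of learned parameters, but the underlying question of which word is likely to come next is the same.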
When Microsoft unveiled its Turing-NLG language model in February 2020, it was hailed as the largest model ever released and one that outperformed other models on various language modeling benchmarks. At release, Turing-NLG had 17 billion parameters and could generate words to complete open-ended textual tasks. The model also generated direct responses to questions and summaries of input documents.
A few months later, OpenAI unveiled its own autoregressive language model, called Generative Pre-trained Transformer 3 (GPT-3), which uses deep learning to produce human-like text. This third-generation model in the GPT-n series has a capacity of 175 billion machine learning parameters. OpenAI researchers published a paper in which they demonstrated that GPT-3 can generate news articles that human reviewers have difficulty distinguishing from articles written by humans. The researchers also claim that, once trained, the language model can generate 100 pages of content for just pennies in energy costs.
GPT-3 was deemed so powerful that Microsoft licensed exclusive use of the language model and its underlying code.
A year later, however, another language model overtook both GPT-3 and Turing-NLG in terms of innovation and ingenuity.
This model, called Wu Dao 2.0, was presented at BAAI. The work behind Wu Dao 2.0, which is dubbed China’s first large-scale intelligent model system, was led by BAAI Research Academic Vice President and Tsinghua University Professor Tang Jie. It was supported by a team of more than 100 AI scientists from Peking University, Tsinghua University, Renmin University of China, Chinese Academy of Sciences and other institutions.
Wu Dao 2.0 is actually the successor to Wu Dao 1.0, which was unveiled by BAAI earlier this year. Wu Dao 2.0 is truly China’s biggest and best response to GPT-3.
First, unlike GPT-3, Wu Dao 2.0 covers both Chinese and English, with skills learned by analyzing 4.9 terabytes of images and text. Wu Dao 2.0 has also entered into partnership agreements with 22 brands, including smartphone maker Xiaomi and video app Kuaishou. The Chinese model has 1.75 trillion parameters, 10 times more than GPT-3’s 175 billion.
Wu Dao 2.0 can also write poems in traditional Chinese styles, answer questions, write essays, and generate text describing pictures. Additionally, this language model has reached or surpassed state-of-the-art (SOTA) levels on nine benchmarks, as reported by BAAI. These include:
1- ImageNet (zero-shot): SOTA, surpassing OpenAI CLIP.
2- LAMA (factual and commonsense knowledge): Surpassed AutoPrompt.
3- LAMBADA (cloze tasks): Surpassed Microsoft Turing-NLG.
4- SuperGLUE (few-shot): SOTA, surpassing OpenAI GPT-3.
5- UC Merced Land Use (zero-shot): SOTA, surpassing OpenAI CLIP.
6- MS COCO (text-to-image generation): Surpassed OpenAI DALL·E.
7- MS COCO (English image-text retrieval): Surpassed OpenAI CLIP and Google ALIGN.
8- MS COCO (multilingual image-text retrieval): Surpassed UC² (best multilingual and multimodal pre-trained model).
9- Multi30K (multilingual image-text retrieval): Surpassed UC².
Finally, Wu Dao 2.0 unveiled Hua Zhibing, the world’s first Chinese virtual student. Hua can learn, draw and compose poetry. In the future, she will be able to learn coding. This learning ability of Wu Dao 2.0 contrasts sharply with GPT-3.
Further details on exactly how Wu Dao 2.0 was trained are not yet available, making direct comparison with GPT-3 difficult. However, the new language model is a testament to China’s AI ambitions and superb research agendas. There is no doubt that innovation in AI will accelerate in the years to come, and many of these developments will help advance other industries.
Dr. Kai-Fu Lee, an AI luminary and investor who has helped build at least seven AI-powered unicorns, recently gave a talk at the Hong Kong Science and Technology Park in which he explained the power of fine-tuning huge pre-trained transformer models such as Wu Dao 2.0. These models can be adapted to multiple industries and a large number of applications, such as education, finance, law, entertainment and, most importantly, healthcare and biomedical research.
Applications of transformers in biomedical research have the potential to produce new discoveries that will benefit humans no matter where they live. And we sincerely hope that despite the trade wars, governments will consider collaborating in biomedical research.