China's Long Halftime Walk in AI
China's LLM entrepreneurs have gathered at a crossroads. They include scientists who have studied natural language for nearly 40 years, entrepreneurs with earlier successes behind them, and young people who have only just earned their doctorates.
Entrepreneurs compete on various levels. The crossroads even exists physically: it is the intersection outside the east gate of Tsinghua University. The companies sit so close together that the nearest neighbors are separated by only a few floors.
The Sohu Network Building stands on one side of this intersection and is likely the office building with the highest density of LLM talent in China. Wang Huiwen's <Light Years Away> is on the third floor. <Smart AI>, incubated by the Tsinghua University computer science department, rented the seventh to eleventh floors. Above the ninth floor, the office space is still vacant and retains memories of Sogou's presence, with the hallway displaying "Sogou's Milestones". Sogou's founder Wang Xiaochuan held a press conference in a meeting room on the second floor, announcing his entry into LLM entrepreneurship and the establishment of his new company, <Baichuan Intelligence>, but he planned to choose a different location nearby, stating "I won't be caught up with them here". These companies put up with office rents as high as those in Beijing's CBD just to be geographically close to "China's top AI talent".
On the other side of the road are two representatives of the "Tsinghua faction", <Lingxin Intelligence> and <Shenyan Technology>. The former was founded by Huang Minlie, a deputy professor in Tsinghua University's computer science department, and has been researching a "superhuman-sized large model" since the end of 2021. Almost all of the latter's founders come from the Tsinghua NLP laboratory, and the laboratory's academic leader, Professor Sun Maosong, serves as the company's chief scientist. When the founder and CEO, Qi Fanchao, wants to talk to the professor, he only has to walk a few hundred meters back to campus.
They did not all start at the same moment. <Smart AI> was established in 2019 and was one of the earliest companies to start. In its early days, it developed applications based on BERT, the large model Google released in 2018. <Light Years Away> was officially launched in early April 2023. Wang Huiwen saw the opportunity in LLMs early in the year and "made the decision within a few days" to start another business.
They were all shocked by the "talent" displayed by ChatGPT. <Zhen Fund> designed a test of over 300 questions, ranging from "What is the square root of a banana?" to asking an LLM to simulate Tarot card divination. In mid-March, the newly released GPT-4 achieved an accuracy rate of over 70%, while domestically produced large models released in the same period averaged 20%. By May, the average accuracy rate of domestically produced LLMs had caught up to over 50%.
Entrepreneurs who were impressed by the capabilities of LLMs compared them to "the next-generation computer," "the invention of fire," and "the God created by humans." They reached for various analogies to convey the expected magnitude of change, including "Cambrian explosion," "industrial revolution," "renaissance," "Great Age of Exploration," "Apple-Microsoft era," and "BlackBerry era."
"Large models focus on data, models, and scalable algorithms, while traditional NLP research focuses on designing many intricate models, but many of them are no longer effective in large data and models," said Huang Minlie, founder of <Lingxin Intelligence> and deputy professor of Tsinghua University's computer science department.
The entrepreneurs differ in how they define and understand AGI (artificial general intelligence), the ultimate goal of large models.
Wang Xiaochuan chatted casually with ChatGPT for only a few rounds before he was convinced that "AGI had already arrived." He believed that ChatGPT confirmed a judgment he had made six or seven years earlier: when machines master language, strong artificial intelligence has arrived. At a small sharing session, several AI industry entrepreneurs described ChatGPT's progress only in terms of its functionality.
"You guys are thinking too small," Wang Xiaochuan said. He received a call from someone at the event, asking, "Xiaochuan, are you just pretending again?" A few days later, that same person called again, saying, "You were right again this time."
Wang Huiwen believes that "the recognition of AGI may flip many times along with grasping the facts and the results."
What they share is the belief that the technological revolution brought by large models is bigger than any change they have experienced, and that they stand at the starting point of a wave of change that could last for decades.
"This AI wave should last for decades and consist of multiple small waves. It will not be completed in one wave but will feature different innovations in different waves," Wang Huiwen said.
He agrees with American investor Elad Gil's view that in some technological waves, all value can be captured by startups, while in others, most of the value will belong to mature enterprises or be distributed between startups and mature enterprises. Wang Huiwen believes that the AGI wave belongs to the latter because the difference between large model technology and past technology is significant, leading to market unpredictability and giving startups room to grow.
Until ChatGPT educates the domestic market.
In October 2022, multiple American investors mentioned to Li Zhifei a profitable AIGC application called Jasper. Jasper had been established for only 18 months and was valued at $1.5 billion. Li Zhifei realized that Jasper solved the problem he had been considering for two years: where GPT-3 is applicable. Li Zhifei had previously developed an AI writing tool but did not release it due to a lack of commercial prospects.
Other startups, such as <Fourth Paradigm>, also attempted to use BERT and GPT series models for writing assistance but were unsuccessful due to limited resources and difficulty obtaining external support. However, in June 2020, GPT-3 was released, and Li Zhifei recognized it as a game-changer.
At the time, investors focused more on applications than on large-scale models. Many venture capitalists did not realize the commercial potential behind GPT-3 until 2023, when ChatGPT educated the Chinese market. Li Zhifei restarted his research on large models and began to receive inquiries from a wide range of industries.
However, customers were not always receptive to the idea of large-scale models, as their deployment costs could be in the millions of yuan and required significant investment. Nonetheless, the AI race has intensified, with resources becoming increasingly scarce and competition growing more fierce.
Fixing screws while landing on the moon
Starting on February 7, Wang Huiwen phoned, one by one, the people he thought were suited to LLM entrepreneurship. He would ask "What do you think about...", followed by "Do you want to do it?" The responses he heard were often negative, such as "It costs too much" and "This is something for giants". A week later, he decided to enter the field himself. "People often underestimate the importance of jumping in as soon as you see a big change," Wang Huiwen said.
The president of Meituan's Home Business Group, Wang Pu, praised Wang Huiwen for his "exceptional ability to identify talent". As a result, the people Wang Huiwen contacted also attracted attention from competitors. Qi Fanchao, the CEO of <Shenyan Technology>, was one of them. During his PhD at the NLP Laboratory of Tsinghua University, Qi Fanchao participated in the development of the "Wudao" big model, published more than 30 papers in top international journals, and developed the "WantWords Reverse Dictionary" with classmates, attracting over 5 million users. In the eyes of an employee of <Shenyan Technology>, Qi Fanchao is a rare talent with both technical and product capabilities, someone whose drive to innovate pushes him to do research and create products: "he doesn't want to do the same thing as others."
One month after the release of GPT-4, the atmosphere in the venture capital circle gradually heated up. Various companies held press conferences, demonstrating how big models can be applied in office, marketing, and other scenarios, and how they can be integrated with industries such as medical care and smart transportation. Investment institutions held closed-door meetings, urging their portfolio companies to keep up with the changes and avoid being disrupted. An investor who had been sent to Singapore a year earlier to study web3 projects came back to "study AI vigorously", and many investment managers began to read technical papers. Lu Qi asked his team to produce a "Big Model Daily" to keep everyone up to date, lamenting that there were so many new papers that he "simply couldn't keep up."
Wang Xiaochuan asked ChatGPT, "In order to succeed in this start-up, recruit more partners and outstanding leaders, what should I do?" One of the suggestions was that Wang Xiaochuan should first announce his thoughts to the outside world. He took ChatGPT's advice, held a media communication meeting, and officially announced the establishment of <Baichuan Intelligence>, planning to release a large model benchmarked against GPT-3.5 by the end of the year.
Money quickly poured in. As soon as Wang Xiaochuan told his friends about his entrepreneurial ideas, he was asked "if he could add a friends and family share". The startup capital for <Baichuan Intelligence> came from his personal funds and the support of friends, totalling $50 million. Wang Xing invested in Wang Huiwen's <Light Years Away> as an individual, while Shu Hua invested in several generative AI-related companies.
Currently, the two most highly valued start-ups in China are <Light Years Away> and <MiniMax>.
<Zhen Fund> was one of the early investment institutions to invest in <Light Years Away>. Dai Yusen, managing partner of <Zhen Fund>, believes that the process of commercialization of big model technology is difficult for scientists and requires someone with a business mindset. At the same time, the high financial threshold of big models requires entrepreneurs to have experience in "raising hundreds of millions of dollars and spending it effectively."
Not many people are familiar with MiniMax, a company named after an algorithm, but it has many well-known partners. On April 18, the president of Volcano Engine, Tan Dai, mentioned at a press conference that "Volcano Engine runs Douyin and MiniMax"; on the same day, Kingsoft Office released WPS AI, and CEO Zhang Qingyuan said that the underlying big model was provided by MiniMax.
MiniMax was established in December 2021 and already has self-developed basic models for text, voice, and vision modalities. In March 2023, MiniMax launched an API open platform for enterprise users, supporting the service invocation of text and voice models.
Co-founder Yang Bin previously worked at Uber AI Research Institute and received NVIDIA's Pioneer Research Award in 2018 and Microsoft's Global PhD Scholarship in 2021. Most of the MiniMax team members were born in the 1990s. Several venture capitalists believe that the best age for starting a big model business is under 35. They think big model technology is updated "daily," and that young people, unburdened by outdated knowledge structures, can update their understanding faster.
MiniMax's team is lively and strongly self-motivated. Initially, they nicknamed their big model "ABAB" because, during the early stages of language capability training, the model could only make baby-like sounds ("ah ba ah ba"). Now, nearly 18 months after the company was founded, they are confident that their model's capabilities are among the leaders in China. MiniMax's corporate culture is inspired by SpaceX, with the founding team often comparing big model development to building rockets and hoping to achieve AGI without taking shortcuts.
Now, more organizations in China are striving for AGI as a vision, building teams and securing resources from scratch. A member of a big-model start-up team described the current situation as everyone "fixing screws while landing on the moon."
On the new continent, the most valuable thing may not necessarily be gold
Entrepreneurs are exploring different paths of "technology-product-commercialization" based on their experiences.
At the strategic level, several entrepreneurs are working on both large models and business applications. Zhou Ming, one of the earliest entrepreneurs in large models, believes that his company, <Lanzhou Technology>, has formed a "feedback chain" between the models and applications and has a first-mover advantage. Wang Huiwen established a "dual-wheel drive" strategy for <Light Years Away>. He believes that focusing only on applications or models carries strategic risks. The risk for the former is that modeling capabilities are constantly evolving, potentially covering many application scenarios. The risk for the latter is that the market may be seized by competitors focusing on commercialization, making it difficult to measure the quality of late-stage research models and collect more data through applications.
Li Zhifei believes that OpenAI saw "real user data" earlier than competitors like Google, an important factor behind OpenAI's research direction and barriers. He proposes that AI application data passes through three stages: a pre-launch stage, in which developers can only make assumptions about user data; an initial stage, in which the product collects large amounts of "false data," such as new users who arrive each day and leave meaningless interactions; and the stage OpenAI has reached now, with retained users, paying users, and more realistic data.
Different entrepreneurs are choosing different product routes. Zhou Ming is committed to the 2B (business-facing) market and to developing specialized large models for specific industries, which he believes is a more pragmatic approach. In contrast, some start-ups, like MiniMax and <DeepWord Technology>, believe that large models can provide value in general scenarios and can meet both 2B and 2C (consumer-facing) market demands without deliberately distinguishing between them.
Li Zhifei's direction is to provide AI tools for "professional consumers," or "prosumers": individuals who create certain content for consumption.
Despite their different product paths, entrepreneurs face some common challenges, such as how to prevent large models from generating nonsensical results. By introducing expert knowledge and rules, as well as aligning with human values, start-ups can create more accurate and reliable AI applications.
However, they also face fierce competition from larger companies and a more stringent regulatory environment. As they strive to navigate these challenges and build revolutionary AI products, they are inspired by the stories and experiences of successful start-ups from previous technology waves.
Overall, Chinese entrepreneurs in the large model market are just beginning their race. Undoubtedly, the competition will be fierce, the environment will be harsh, and the journey will be long. Yet, these challenges offer the potential for vast growth and innovation, as they shape the future of AI technology.