A $1.3 trillion market within 10 years: the Model Power Era has arrived

Original source: Qubits

Image source: Generated by Unbounded AI

The large-model storm has raged for the better part of a year, and the AIGC market is shifting once again:

Cool tech demos are being replaced by complete product experiences.

For example, OpenAI's latest image-generation model DALL·E 3 joined forces with ChatGPT the moment it debuted, becoming the most anticipated new productivity feature in ChatGPT Plus.

###### DALL·E 3 accurately reproduces every detail of the text input

Likewise, Microsoft's GPT-4-based Copilot has been fully integrated into Windows 11, officially replacing Cortana as the operating system's new-generation AI assistant.

###### Using Copilot to summarize a blog post in one click

And domestic cars such as the Jiyue 01 have officially brought large models into the cockpit, capable of running fully offline...

If "big models reshape everything" in March 2023 was just an optimistic prediction of technology pioneers, today, the still fierce 100-model war and practical application progress have made this view more and more resonant inside and outside the industry.

In other words, from how the entire Internet is produced to the intelligent cockpit in every car, an era in which large models serve as the technical foundation and drive innovation across thousands of industries is arriving.

Following the naming convention of the steam age and the electric age, it might be called the "Model Power Era".

In the "Moli Era", one of the most concerned scenarios is the smart terminal.

The reason is simple: the smart terminal industry, represented by smartphones, PCs, smart cars, and even XR devices, is among the technology sectors most closely tied to everyday life, and has naturally become the gold standard for testing the maturity of cutting-edge technologies.

So, as the first wave of hype from the technology boom gradually subsides, how should we view and interpret the new opportunities and challenges of the Model Power Era, with smart terminals as the anchor?

Now is the time to break it down and sort it out.

Smart Terminals: The New Battlefield for Large Models

Before analyzing the challenges and opportunities in detail, let's return to the essential question: why is generative AI, represented by large models, so popular, even hailed as the "fourth industrial revolution"?

In response to this phenomenon, many institutions have been conducting research to try to predict or summarize the development of generative AI in different scenarios, such as Sequoia Capital's "Generative AI: A Creative New World".

Meanwhile, many leading companies have drawn on their own experience to analyze where generative AI will land and how it may transform specific industries.

For example, Qualcomm, a representative player in on-device AI, recently released a white paper on the state and trends of generative AI, "Hybrid AI is the Future of AI".

From it, we can distill three major reasons why generative AI has caught fire in the industry.

First, the technology itself is solid enough.

Whether it is the emergent intelligence of large models or AI-generated images realistic enough to pass for the real thing, the technology speaks through its results, demonstrating a striking ability to disrupt traditional workflows in real work involving text, images, video, and automation.

Second, there is a wealth of potential application scenarios. The generational breakthrough that large models brought to AI fired people's imagination from the start: the earliest adopters quickly felt the benefits generative AI brought to their work.

The enormous demand on the user side can be seen in the user growth of representative applications such as ChatGPT.

#### ChatGPT set a record among popular applications for the speed of reaching 100 million registered users; source: Sequoia Capital

From the initial applications in Internet search, programming, and office work to emerging ones in tourism, law, medicine, industry, transportation, and beyond, those riding the winds of generative AI are far from limited to the companies providing foundation models: a large number of startups are thriving as well.

Many industry experts believe that for entrepreneurs, the application layer built on large models holds the greater opportunity.

With a generational technological breakthrough at the bottom and an explosion of application demand at the top, an ecosystem effect has been set in motion.

According to Bloomberg Intelligence, the generative AI market will explode from $40 billion to $1.3 trillion by 2032, spanning a wide range of participants across the ecosystem, including infrastructure, foundation models, developer tools, applications, and terminal products.

The formation of this ecosystem is driving new changes in the industry and is expected to further establish AI as a core underlying productive force.

Against this backdrop, let's look at what is happening in the smart terminal industry today.

On the one hand, the AIGC application storm represented by large models is rapidly moving from the cloud to the terminal, iterating on a cadence of days.

ChatGPT was first to roll out multimodal "see, hear, and speak" features on mobile: users can take and upload a photo, then talk with ChatGPT about its content.

For example, "How to adjust the height of the bike seat":

#### An image-and-text dialogue with GPT-4 yields five suggestions in seconds

Qualcomm, too, quickly got Stable Diffusion and ControlNet, models with more than a billion parameters, running on-device: generating a high-quality AI image on a phone takes only a dozen or so seconds.

Many phone manufacturers have also announced plans to give their voice assistants a large-model "brain".

And it's not just phones.

At major exhibitions at home and abroad, such as the Shanghai Auto Show, the Chengdu Auto Show, and the Munich Motor Show, partnerships between foundation-model providers and automakers are increasingly common, and putting large models "in the car" has become a new point of competition in the intelligent cockpit space.

###### A single sentence lets the in-car model order groceries in an app, so you can cook as soon as you get home

On the other hand, the explosion of applications has worsened the shortage of computing power.

Foreseeably, model inference costs will rise as daily active users and usage frequency grow, and relying on cloud computing power alone will not be enough to scale generative AI quickly.

This can also be seen in the growing attention that industries of all kinds are paying to on-device AI computing power.

For example, the on-device AI player Qualcomm has announced a new generation of PC computing platform, the Snapdragon X series, built on Qualcomm's in-house Oryon CPU; its NPU in particular will deliver more powerful performance for generative AI.

This new computing platform is expected to be released at the 2023 Snapdragon Summit.

Clearly, from the perspective of both applications and computing power, smart terminals have become one of the scenarios where AIGC has the greatest potential to land.

The Reefs Beneath the AIGC Tide

Everything has two sides, and the journey of large models from rapid development to deployment is no exception.

As generative AI has soared to where it is today, the real bottlenecks beneath the huge potential of the smart terminal industry have surfaced.

**One of the biggest constraints is the underlying hardware.**

As Sequoia investors Sonya Huang and Pat Grady noted in their latest generative AI analysis, "Generative AI's Act Two", AIGC is growing fast, but the looming bottleneck is not customer demand; it is supply-side computing power.

The computing power here mainly refers to AI and machine learning hardware accelerators, which can be divided into five categories from the perspective of deployment scenarios:

Data-center-class systems, server-class accelerators, accelerators for assisted and autonomous driving, edge computing (embedded) accelerators, and ultra-low-power accelerators.

###### The five types of AI accelerators; source: the MIT paper "AI and ML Accelerator Survey and Trends"

With the explosion of ChatGPT, large models propelled AIGC into the mainstream, bringing "cloud computing power" such as data centers and server-class processors enormous short-term attention, and even shortages.

However, as generative AI enters its second act, questions about computing power are becoming ever more prominent.

**The first and biggest problem is cost.** As Qualcomm's "Hybrid AI is the Future of AI" white paper observes, more than half a year on, as large models shift from a technology race to application deployment, each company's foundation-model **training** has largely settled down, and most computing power is now spent on large-model **inference**.

Inference costs are acceptable in the short term, but as large-model apps and application scenarios multiply, the cost of inference on server-class accelerators will climb sharply, eventually making it more expensive to serve a large model than to train it.

In other words, once large models enter this second stage, long-term demand for inference compute will far exceed that of a one-off training run, and "cloud computing power" built from data centers and server-class processors alone cannot possibly drive inference down to a cost users will accept.

According to figures in Qualcomm's white paper, taking large-model-powered search as an example, each query can cost 10 times as much as traditional methods, and annual costs in this area alone could rise by billions of dollars.
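To get a feel for how that adds up, here is a purely illustrative back-of-envelope sketch; every figure in it is an assumption for demonstration, not a number from the white paper:

```python
# Back-of-envelope: extra annual cost of putting a large model behind search.
TRADITIONAL_COST_PER_QUERY = 0.0003  # assumed dollars per classic search query
LLM_COST_MULTIPLIER = 10             # white paper: roughly 10x traditional cost
QUERIES_PER_DAY = 1e9                # assumed daily query volume

extra_per_query = TRADITIONAL_COST_PER_QUERY * (LLM_COST_MULTIPLIER - 1)
extra_per_year = extra_per_query * QUERIES_PER_DAY * 365
print(f"Extra annual cost: ${extra_per_year / 1e9:.1f}B")  # ~$1.0B at these assumptions
```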

This is bound to become a key constraint on the deployment of large models.

**Alongside cost come latency, privacy, and personalization issues.** Qualcomm also notes in "Hybrid AI is the Future of AI" that when large models are deployed purely in the cloud, beyond the server compute shortfalls and "queue to use" problems caused by surging user numbers, they must also address user privacy and personalization.

If users are unwilling to upload their data to the cloud, large-model use cases such as office work and intelligent assistants face many restrictions, and most of these scenarios live on the device. And pursuing better results, such as a large model customized for personal use, requires training directly on personal information.

Driven by all these factors, "terminal computing power" that can take part in inference, that is, the processor categories covering assisted and autonomous driving, edge computing (embedded), and ultra-low-power accelerators, has begun to come into view.

Terminals hold enormous computing power. IDC predicts that global IoT devices will exceed 40 billion by 2025, generating nearly 80 zettabytes of data, more than half of which will need to be processed by terminal or edge computing power.

Terminals, however, are constrained by power budgets and heat dissipation, which limit their computing power.

Against this backdrop, how to tap the vast computing power hidden in terminals to break through the bottlenecks facing cloud computing is becoming one of the defining technical problems of the Model Power Era.

**And beyond computing power, the deployment of large models also faces challenges in algorithms, data, and market competition.**

On algorithms, the architecture of the underlying model remains an open question. ChatGPT has achieved strong results, but its technical route is not necessarily the architectural direction of next-generation models.

On data, high-quality data is indispensable for any company hoping to match ChatGPT's results, yet "Generative AI's Act Two" also points out that the data generated by application companies does not truly create a moat.

Advantages built on data are fragile and unsustainable; the next generation of foundation models may simply tear down this "wall". What truly builds a data source, by contrast, is a continuous and stable user base.

On the market side, large-model products still lack a killer application, and it remains unclear which scenarios suit them best.

As for which products large models belong in and which applications can extract their greatest value, the market has yet to offer a methodology or standard answer to follow.

**In response to this series of problems, the industry currently pursues two main approaches.**

One is to improve the large model's own algorithms: without changing the model's "essence", trim its size and strengthen its ability to be deployed on more devices.

Take the Transformer as an example: models with so many parameters must be structurally adjusted to run on-device, which is why lightweight architectures such as MobileViT have emerged recently.

These approaches seek to improve the structure and reduce the parameter count without hurting output quality, so smaller models can run on more devices.
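As one concrete illustration of this shrink-to-deploy toolbox (a minimal sketch of post-training quantization, not MobileViT itself), PyTorch's dynamic quantization can store a model's linear-layer weights as 8-bit integers to ease on-device deployment:

```python
import torch
import torch.nn as nn

# A toy stand-in for one Transformer feed-forward block; a real
# on-device model (e.g., MobileViT) is far more elaborate.
model = nn.Sequential(
    nn.Linear(512, 2048),
    nn.ReLU(),
    nn.Linear(2048, 512),
)

# Dynamic quantization: Linear weights are stored as int8, cutting their
# memory footprint roughly 4x versus fp32 and speeding up CPU inference,
# at a small cost in accuracy.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
print(quantized(x).shape)  # torch.Size([1, 512]): same interface, smaller model
```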

The other is to improve the AI computing power of the hardware itself, so large models can land on-device more readily.

These methods include multi-core hardware designs and software stacks that raise hardware computing performance and make models portable across devices, improving the odds that large models land on the terminal side.

The former can be called software adapting to hardware; the latter is hardware makers adapting to the changing tide of the times. But betting on either direction alone carries the risk of being overtaken.

Under the "era of modular power", technology is changing with each passing day, and new breakthroughs may appear from either side of software and hardware, and once the necessary technical reserves are lacking, they may fall behind.

So must one either blindly chase every development or simply sit this wave out? Not necessarily.

**Companies that found their own value in the Internet and AI eras may be able to explore a third path in the AIGC era, grounded in their own scenarios and accumulated technology.**

Take Qualcomm, an AI company with both software and hardware technologies, as an example.

Facing the challenges large-model technology poses across scenarios, Qualcomm stepped beyond its identity as a chip company and embraced the AIGC wave early.

Beyond continuously improving the AI computing power of its on-device chips, Qualcomm is also investing in foundational AI technology, striving as an enabler to accelerate the entire smart terminal industry's embrace of AIGC.

However, there are also various foreseeable difficulties in this approach:

For ever larger and more complex AI models, how do you preserve quality while making them run smoothly on a terminal?

When should different models be used to best allocate computing power between terminal and cloud?

Even once on-device deployment is solved, which parts of a large model should live in the cloud and which on the terminal, and how do you keep the connections and functions between those parts intact?

And if the terminal's performance advantage proves insufficient, what then?

……

These issues are not isolated cases; they already exist in every industry and scenario touched by AIGC.

Whether one seeks a way to break the deadlock or practical deployment experience, the answers can only be found in specific scenarios and industry cases.

**How to cut through the fog of the Model Power Era?**

AIGC has entered its second act: large models keep gaining popularity, and the industry has begun exploring paths to deployment.

** Qualcomm's "Hybrid AI is the Future of AI" white paper mentioned that taking smartphones and PCs as an example, there have been many cases of AIGC landing scenarios in the new battlefield intelligent terminal industry. **

Companies are already deploying smaller large models on-device for more personalized tasks, including finding messages, drafting replies, modifying calendar events, and one-tap navigation.

For example, "book a favorite restaurant seat", based on the large model, according to the user data analysis of favorite restaurants and free schedules, give scheduling recommendations, and add the results to the calendar.

Qualcomm argues that because an on-device model has limited parameters and may lack network access, its answers can suffer from "AI hallucination"; an orchestrator can then set guardrails for the model when it lacks information, preventing such problems.

If the on-device output is unsatisfactory, the question can also be sent to the cloud with one tap, and the better answer generated by the cloud model is fed back to the device.

This not only relieves the compute pressure of running large models in the cloud, but also lets large models be personalized while protecting user privacy to the greatest extent.
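To make this hybrid device-cloud flow concrete, here is a minimal sketch of such an orchestrator. It is an assumption-laden illustration: the Answer type, the confidence threshold, and the run_on_device / run_in_cloud calls are hypothetical placeholders, not Qualcomm's actual technology:

```python
from dataclasses import dataclass

@dataclass
class Answer:
    text: str
    confidence: float  # the model's self-assessed reliability, 0.0 to 1.0

CONFIDENCE_THRESHOLD = 0.7  # assumed guardrail; tune per application

def run_on_device(prompt: str) -> Answer:
    # Hypothetical call into a small, quantized on-device model.
    return Answer(text=f"[on-device draft for: {prompt}]", confidence=0.6)

def run_in_cloud(prompt: str) -> Answer:
    # Hypothetical call to the full-size cloud model.
    return Answer(text=f"[cloud answer for: {prompt}]", confidence=0.95)

def orchestrate(prompt: str, allow_cloud: bool) -> Answer:
    """Route a query: stay on-device by default, escalate only as needed."""
    local = run_on_device(prompt)  # personal data stays on the device here
    # Guardrail: when the small model lacks information (low confidence),
    # fall back to the cloud, but only with the user's explicit consent.
    if local.confidence < CONFIDENCE_THRESHOLD and allow_cloud:
        return run_in_cloud(prompt)
    return local

print(orchestrate("Book a table at my favorite restaurant", allow_cloud=True).text)
```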

**As for the technical bottlenecks still to be broken, such as terminal computing power and algorithms, some players have also worked out ways to "break the game".**

In the white paper, Qualcomm introduces a class of new techniques already in wide use, such as speculative decoding, which drew much attention recently.

This method, discovered concurrently by Google and DeepMind, accelerates large-model inference by using a smaller model to speed up a larger model's generation.

Simply put, a smaller model is trained to generate a batch of "candidate tokens" in advance, so that instead of "thinking up" each token itself, the large model simply "chooses" among them.

Since the small model generates several times faster than the large one, whenever the large model judges a token the small model has already produced to be acceptable, it takes that token directly rather than slowly generating its own.

The method mainly exploits the fact that large-model inference speed is constrained more by memory bandwidth than by the added computation.

Because their parameter counts are enormous and far exceed cache capacity, large models during inference are limited by memory bandwidth more than by raw compute. GPT-3, for example, must read all 175 billion parameters for every token it generates, so the compute hardware often sits idle waiting for data from DRAM.

In other words, during batched inference, processing 100 tokens in one pass takes little more time than processing a single token.
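A rough calculation shows why; the fp16 and bandwidth figures below are illustrative assumptions, not numbers from the white paper:

```python
# Why decoding is memory-bound: each generated token reads every weight once.
PARAMS = 175e9          # GPT-3 parameter count
BYTES_PER_PARAM = 2     # assuming fp16 weights
MEM_BANDWIDTH = 2e12    # assuming ~2 TB/s of accelerator memory bandwidth

bytes_per_token = PARAMS * BYTES_PER_PARAM       # 350 GB of weight reads
latency_floor = bytes_per_token / MEM_BANDWIDTH  # ~0.175 s per token
print(f"Weight-read floor: {latency_floor * 1000:.0f} ms per decode step")
# Verifying a whole batch of drafted tokens in one pass re-reads the weights
# once, not once per token: exactly the headroom speculative decoding exploits.
```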

Speculative sampling can therefore not only make running models with tens of billions of parameters far easier, but also shift part of the computation to the terminal side, maintaining inference speed while preserving the large model's generation quality.
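As a minimal sketch of this draft-and-verify idea (a greedy variant for clarity; the published Google/DeepMind method uses a probabilistic accept/reject rule, and the draft_model / target_model objects with their next_token / next_tokens methods are hypothetical placeholders):

```python
def speculative_decode(prompt, draft_model, target_model, k=4, max_new=64):
    """Greedy draft-and-verify loop: the small model proposes k tokens,
    then one parallel forward pass of the large model checks them all."""
    tokens = list(prompt)
    generated = 0
    while generated < max_new:
        # 1. The small, fast model drafts k candidate tokens autoregressively.
        draft, ctx = [], list(tokens)
        for _ in range(k):
            t = draft_model.next_token(ctx)  # hypothetical greedy API
            draft.append(t)
            ctx.append(t)
        # 2. verified[i] is the large model's own greedy prediction given
        #    tokens + draft[:i]; one parallel pass yields all k + 1 of them
        #    for roughly the cost of a single token, since the weights are
        #    read from memory only once.
        verified = target_model.next_tokens(tokens, draft)  # hypothetical API
        # 3. Accept the longest prefix on which both models agree, then take
        #    the large model's token at the first mismatch (a correction),
        #    or its extra prediction if the whole draft was accepted.
        n = 0
        while n < k and draft[n] == verified[n]:
            n += 1
        tokens.extend(draft[:n] + [verified[n]])
        generated += n + 1
    return tokens
```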

……

But whether the starting point is a scenario or a technology, each must ultimately find its fit with the other to produce real application value, just as software and hardware are inseparable:

Software breakthroughs such as generative AI, when seeking smart terminal scenarios to land in, will inevitably need to pair with mobile AI hardware from players such as Qualcomm.

Across smartphones, PCs, XR, automobiles, and the Internet of Things, how can each segment of the smart terminal industry find its own play and value amid the AIGC boom?

And how can enterprises seize this wave of the times, unlock the technology's application value, and avoid missing the industry-wide productivity transformation?
