4Paradigm Chief Scientist Yang Qiang: AlphaGo's Weakness Is Its Lack of Transfer Learning

[NetEase Smart News, May 29] The Global Machine Intelligence Summit (GMIS 2017), hosted by Jiqizhixin (Synced), was recently held in Beijing. Professor Yang Qiang, co-founder and chief scientist of 4Paradigm and an expert in artificial intelligence and transfer learning, attended the conference and delivered a keynote speech sharing the latest research on transfer learning.

Professor Yang Qiang believes that AlphaGo is not a "god"; it has weaknesses, and one of them is its lack of transfer learning ability. Humans are very good at this: after learning to ride a bicycle, for example, we find it easy to ride a motorcycle. This comes naturally to humans, but artificial intelligence still does it poorly.

Yang Qiang also addressed why we need to study transfer learning. First, in real life we mostly encounter small data, and a model that can learn from small data is true intelligence. Second, we hope the systems we build can work not only in one domain but also in neighboring ones; that is, we want systems to be reliable, able to draw inferences from one case to another. This, too, is a definition we give to intelligence. Third, and more important, we want to take a general-purpose system, adapt it with an individual's small data, and migrate it into personal scenarios, so that we can move toward personalization. Transfer learning is an essential tool for this.

Are there examples of transfer learning?

Professor Yang Qiang cited installment-based car sales as an example. Each car order involves a large sum, so the number of such orders is very small; this is small data, with fewer than a hundred samples. However, we also have tens of millions of small-sum transactions. We built a model that migrates from the small-transaction data to the large orders, and the final results achieved with transfer learning were better than those of the traditional model. (Yi Zhi)

The following is a transcript of Prof. Yang Qiang's speech, lightly edited.

Hello everyone, I'm Yang Qiang. Today I would like to share a question everyone may care about: what are the latest advances in artificial intelligence? Let me start with a recent hot topic. The hottest topic in the artificial intelligence community this week is the Go summit in Wuzhen, where AlphaGo and Ke Jie played three games. The games were very exciting, but what did we learn from them? That is what I am going to talk about today.

First of all, Ke Jie left us with one remark: AlphaGo looks like a god, seemingly impeccable. But from a machine learning perspective, does it have any weakness? My personal view is that it does, and the weakness is quite serious: it does not have the ability to transfer what it has learned. Transfer learning is a special strength of us humans. What does it look like? We know a machine learns from a large amount of data, so the quality of that data is very important. But how do you take the knowledge learned on a 19x19 board and apply it to a 21x21 board? After learning Go, can the machine apply what it learned to life, to business, to our daily activities? Today's machines have no such ability to generalize. This is the topic I am going to talk about today.

But before that, let me mention a few things I think we learned from AlphaGo 2.0. First, this year's data was very different from last year's match with Lee Sedol, and improving the quality of the data greatly raised AlphaGo's level, so data quality is very important. At the same time, the computing architecture also matters: AlphaGo used thousands of CPUs and hundreds of GPUs last year, but this year it used only a small TPU setup; the leap from last year's architecture to this year's was dramatic. The algorithm is also very important. Letting the computer train itself and learn by itself reminds me of a movie called "The Imitation Game". While studying the German codes, Turing suddenly had the inspiration that we humans have no way to defeat the machine, but a machine can defeat a machine. What is special about a machine? It operates automatically. If we give a machine the ability to learn by itself, it can surpass our abilities in some respects, so this self-learning algorithm is extremely important.

These three points are also very important for the commercial field and for putting artificial intelligence into practice. Think about it: in our work and lives, do we have AI applications that have high-quality data, a good computing framework, and a closed loop in which self-learning keeps improving? These three conditions are the first thing I want to say today.

Now to today's theme: what AlphaGo cannot do, and where its weakness lies. Everyone affectionately calls AlphaGo "teacher", but this teacher also has weaknesses, and one of them is the lack of transfer learning ability. People are very good at this. For example, after we learn to ride a bicycle, riding a motorcycle feels easy; this is natural to us. After looking at one or two pictures, we can extend what we saw to many different scenes; this ability is also very strong. When we have a piece of knowledge, we extend it to other fields. How do we describe this human ability in computing terms? We call it reliability: we are "robust", able to bring past experience to different scenarios and adapt to different environments. How can we give the machine this ability?

Let me give you an example to think about. Cars in mainland China are left-hand drive, while Hong Kong cars are right-hand drive. If we learn to drive in Beijing and then rent a car in Hong Kong, how can we quickly learn to drive there? This is a puzzle, and I mean it as a puzzle about transfer learning. The answer is that the driver's seat is always closer to the middle of the road, whether you are driving in Beijing or in Hong Kong; you may wish to try it. What does this show? It shows that the essence of transfer learning is discovering commonality, the commonality between two domains. Once this commonality is discovered, or, as we say in machine learning, once these common features are found, transfer becomes very easy.

Now I want to talk about why we should study transfer learning. First, what we encounter most is small data. How can we achieve artificial intelligence on small data? We need transfer learning. A child who sees one picture of a cat will, on seeing a real cat, say "this is a cat". We do not need to give the child ten million positive examples and ten million negative examples; he already has this ability. People have this ability, and that is real intelligence.

The second benefit: when we build a system, we hope it will be useful not only in one domain but also in the domains around it. When the surrounding environment changes slightly, our system should work just as well. This is reliability; it means we can draw inferences from one case to another. This is a definition we give to human intelligence.

The third is that we now put more and more emphasis on personalization. On our mobile phones we read news, watch videos, receive reminders; later we will have robots at home. All of these provide services to us personally, and the more personalized the service, the better. But what people may not have considered is that personal data is often small data. We can aggregate the data of tens of millions of people in the cloud, but such a system is only a general-purpose system. What matters more is how to take a general-purpose system, combine it with an individual's small data, and turn it into a personalized scenario. Whether in vision, speech, or recommender systems, every area needs to develop toward personalization, and transfer learning is therefore an indispensable tool.

Having described these benefits, let me talk about why transfer learning has not been promoted on a large scale today. It is because transfer learning itself is very, very difficult. The picture on the right actually asks about the transfer of learning. In pedagogy, everyone cares about how to transfer knowledge to different scenarios, and this concept has a history of more than a hundred years: if we want to measure the quality of a teacher, we often cannot do it through the students' final exam, because that kind of test only covers specific knowledge, and students can sometimes pass just by memorizing. A better way is to observe a student's performance after the course ends: how capable is he of transferring the knowledge of this course to other courses? Only then can we say whether this teacher teaches well or not. This is called the transfer of learning. So in pedagogy everyone asks why the transfer of learning is so difficult, and the difficulty lies in finding commonality. Going back to my driving example: how many of us have experienced switching from left-hand drive to right-hand drive and felt quite distressed? That is to say, it is very difficult to find this commonality. Fortunately, we have focused on the field of transfer learning for ten to twenty years and have achieved good results.

Let me explain what progress has been made recently in this area and what studies are currently under way. We welcome everyone here to take part in such research.

The first point: if we want to discover the commonality between different machine learning problems, one way is to separate the structure of the problem from its content. This is not easy to do, but once it can be done, the ability to generalize becomes very strong. You may not know that Hollywood has training courses that teach how to write a screenplay. Everyone thinks screenwriting must require great artistry and talent, but writing a screenplay can in fact be made almost factory-like. How? The trick is to separate content and structure: what the first ten minutes of the movie should show, what the next five minutes should show, when the conflict should come, when everyone should laugh; there is a structure to it. After a few days of such training, you can become a playwright, perhaps even a Hollywood director. How can we give machine learning the same ability? On the left of the figure we see a 2015 paper in which three researchers learned the structure and the strokes of handwriting on a handwriting recognition dataset. As a result, they found that with the learned structure they could learn from a single example. This is called one-shot learning, and it caused a great sensation in machine learning.

On the right I show recent work by one of my Ph.D. students, Wu Yuxiang, on large-scale text: can a deep neural network distinguish the structure of a text from its specific content? If so, then the structural part it learns becomes very easy to reuse to help our natural language systems do different things, for example topic identification, text summarization, and so on; it can even learn how such text should be generated. Then a robot could write a press release automatically, so I think this work is very promising.
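As a rough illustration of the structure/content split (not Wu Yuxiang's actual model), here is a toy PyTorch autoencoder with two separate latent codes, one intended for structure and one for content, from which the text must be reconstructed. All dimensions are made up, and in practice extra objectives (e.g. adversarial terms) are needed to force the two codes to truly disentangle.

```python
import torch
import torch.nn as nn

class StructureContentAutoencoder(nn.Module):
    """Toy sketch: encode text into a structure code and a content code."""
    def __init__(self, vocab=1000, emb=64, struct_dim=16, content_dim=48):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.struct_enc = nn.GRU(emb, struct_dim, batch_first=True)
        self.content_enc = nn.GRU(emb, content_dim, batch_first=True)
        # The decoder must rebuild the text from both codes together.
        self.decoder = nn.GRU(emb, struct_dim + content_dim, batch_first=True)
        self.out = nn.Linear(struct_dim + content_dim, vocab)

    def forward(self, tokens):
        x = self.embed(tokens)           # (batch, seq, emb)
        _, s = self.struct_enc(x)        # structure code (1, batch, 16)
        _, c = self.content_enc(x)       # content code   (1, batch, 48)
        h0 = torch.cat([s, c], dim=-1)   # combined initial decoder state
        dec, _ = self.decoder(x, h0)
        return self.out(dec)             # per-token reconstruction logits
```

Once such codes are separated, keeping the structure code while swapping in new content codes is one way to reuse learned structure for new material.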

The second development is the realization that in the past we focused too much on finding commonality in what is learned as a whole, without looking for commonality at different levels of a model. It turns out that if we decompose the problem into levels, some levels help us transfer knowledge more easily. For example, suppose we have used tens of millions of examples in one area to train an eight-layer deep neural network for image recognition, and then the problem setting changes. Traditional machine learning would need tens of millions of examples again and a lot of time to retrain. But with this level-wise transfer learning, we find that different layers have different transfer capabilities, and we can quantitatively estimate the transferability of each layer. When a new problem arrives, we can freeze certain layers and train the others with small data, thereby achieving the effect of transfer learning. New applications keep appearing. For example, in speech recognition, if we have trained a speech model on broadcasters' voices, how can we migrate it to an accented speech environment? If we find inner layers that are common across speech, we can use this kind of layer-wise transfer, so that small data suffices to train a dialect speech model.
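To make the layer-freezing idea concrete, here is a minimal PyTorch sketch. The choice of backbone (ResNet-18 pretrained on ImageNet), the ten target classes, and the decision to retrain only the final layer are illustrative assumptions, not the specific networks from the talk.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a network pretrained on a large source domain (ImageNet here).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze every layer: the lower layers hold transferable features.
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer for the new task (say, 10 target classes);
# only this layer remains trainable.
model.fc = nn.Linear(model.fc.in_features, 10)

# Optimize only the unfrozen parameters on the small target dataset.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def train_step(images, labels):
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Which layers to freeze is exactly the quantitative question mentioned above: layers with higher estimated transferability stay fixed, while the rest are retrained on the small target data.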

At the same time, we can modify this structure in all kinds of ways, like engineers. For example, we can find the semantic commonality between images and texts: if we use a multimodal deep learning network, we can learn common internal semantics so that the model can move freely between words and images. This layer-wise transfer really does bring a lot of convenience.
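One common way to realize such a shared semantic space, sketched below under assumed feature dimensions and a contrastive alignment loss (not the specific network from the talk), is to project image features and text features into one embedding space and pull matched pairs together.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedSemanticSpace(nn.Module):
    """Project image and text features into one shared embedding space."""
    def __init__(self, img_dim=2048, txt_dim=300, shared_dim=256):
        super().__init__()
        self.img_proj = nn.Sequential(nn.Linear(img_dim, shared_dim),
                                      nn.ReLU(), nn.Linear(shared_dim, shared_dim))
        self.txt_proj = nn.Sequential(nn.Linear(txt_dim, shared_dim),
                                      nn.ReLU(), nn.Linear(shared_dim, shared_dim))

    def forward(self, img_feats, txt_feats):
        z_img = F.normalize(self.img_proj(img_feats), dim=-1)
        z_txt = F.normalize(self.txt_proj(txt_feats), dim=-1)
        return z_img, z_txt

def alignment_loss(z_img, z_txt, temperature=0.07):
    # Contrastive loss: matched image/text pairs should be close,
    # mismatched pairs far apart, in the shared semantic space.
    logits = z_img @ z_txt.t() / temperature
    targets = torch.arange(z_img.size(0))
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```

Once trained, either modality can be encoded into the shared space and compared directly with the other, which is what lets knowledge "move freely between words and images".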

The third advance: transfer learning in the past usually meant that I have one field that is already well modeled, and I aim to migrate it to a new field, from an old field to a new one, from a data-rich field to a data-poor one. We call this one-step transfer. But we now find that many scenarios require doing it in stages. For example, in our education we go to university for four years. Why? Because we cannot do it overnight; we need to break knowledge into segments, with each course leading into the next. Just like crossing a river by stepping on stones, we need to step on some stones to reach the other side. With this idea we can perform multi-step, transitive transfer. For example, we can build a deep network whose middle layers attend both to our target domain and to our original domain. If we also have some intermediate domains, whose data can be completely unlabeled, what role do they play? They connect the source domain and the target domain step by step, like A to B, B to C, C to D. We can then define two objective functions. The first, at the lower left of the figure, serves the task itself: if your task is classification, it makes the classification better. The second objective, at the lower right, is used to select which samples and features to take as we pass through these intermediate domains, making them useful to our optimization. When these two objectives are optimized together, one aims at the final goal and the other at choosing the best samples, and gradually, as in the figure on the right, the data in our source domain is migrated step by step to the target domain.
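Here is a loose toy rendering of the two-objective idea, not the actual algorithm from the talk: a labeled source task loss is optimized jointly with a bridging loss over unlabeled intermediate-domain data, where learnable per-sample scores softly select which intermediate examples help. All components and sizes are made up for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical pieces: a shared encoder, a task head for the labeled
# source objective, a reconstruction head for the bridging objective,
# and learnable scores that softly select useful intermediate samples.
encoder = nn.Sequential(nn.Linear(64, 32), nn.ReLU())
task_head = nn.Linear(32, 2)
recon_head = nn.Linear(32, 64)

src_x, src_y = torch.randn(16, 64), torch.randint(0, 2, (16,))
mid_x = torch.randn(32, 64)                          # unlabeled intermediate data
sample_logits = torch.zeros(32, requires_grad=True)  # selection scores

params = (list(encoder.parameters()) + list(task_head.parameters()) +
          list(recon_head.parameters()) + [sample_logits])
optimizer = torch.optim.Adam(params, lr=1e-3)

for step in range(100):
    optimizer.zero_grad()
    # Objective 1 (lower left in the talk's figure): the task itself.
    task_loss = nn.functional.cross_entropy(task_head(encoder(src_x)), src_y)
    # Objective 2 (lower right): weight intermediate samples by their
    # usefulness for bridging the domains, here via reconstruction error.
    weights = torch.sigmoid(sample_logits)
    per_sample = ((recon_head(encoder(mid_x)) - mid_x) ** 2).mean(dim=1)
    bridge_loss = (weights * per_sample).mean()
    (task_loss + 0.5 * bridge_loss).backward()
    optimizer.step()
```

With several intermediate domains, the same joint step would be applied hop by hop (A to B, B to C, C to D), carrying the model gradually from source to target.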

A recent practical example from Stanford University used satellite images to analyze poverty on the African continent, and it also used multi-step transfer. Going from daytime satellite imagery to nighttime imagery was the first step of the transfer; going from the brightness at night to the level of development of a place was the second step. Through these two steps of transfer, they successfully built a model that uses satellite images to tell us the economic situation of each area and where poverty is.

The fourth advance is learning how to transfer. Over the past twenty years we have accumulated a great deal of knowledge and hundreds of transfer learning algorithms. When we encounter a new machine learning problem, which algorithm should we use? Since there are so many algorithms and so many papers, we can summarize these experiences and use them to train a new algorithm whose teacher is all of our past machine learning algorithms and data. This "learning to transfer" is like what we often call learning how to learn: the highest level of learning, the acquisition of the method of learning itself. One of our doctoral students, Wei Ying, is doing research in this area. The model trained this way can, given any transfer learning problem, find the most suitable algorithm from our past experience, whether feature-based, multi-layer-network-based, sample-based, or even some hybrid, and all of this can be done automatically.
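A toy sketch of the idea follows: summarize each past transfer problem with simple meta-features, record which algorithm family worked best, and train a model that recommends an algorithm for new problems. The meta-features, algorithm names, and data here are entirely hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

ALGORITHMS = ["feature_based", "layer_based", "instance_based", "hybrid"]

# Past experience: [source_size, target_size, domain_distance] -> winner.
past_meta_features = np.array([
    [1e6, 100,  0.2],
    [1e5, 5000, 0.8],
    [1e6, 50,   0.5],
    [1e4, 1000, 0.1],
])
past_best = np.array([1, 2, 0, 3])  # index into ALGORITHMS

# The "teacher" is the record of past algorithms and their outcomes.
meta_learner = RandomForestClassifier(n_estimators=100, random_state=0)
meta_learner.fit(past_meta_features, past_best)

def recommend(source_size, target_size, domain_distance):
    """Recommend a transfer algorithm family for a new problem."""
    features = np.array([[source_size, target_size, domain_distance]])
    return ALGORITHMS[meta_learner.predict(features)[0]]

print(recommend(5e5, 80, 0.3))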

Here is an example from sentiment analysis. We know that sentiment analysis means taking some labeled text, first training a model, and then, given new user feedback, telling whether it is positive or negative. For example, on Weibo or Twitter we can learn everyone's reaction each day: whether people are optimistic or pessimistic about a particular application, how they evaluate a movie or the stock market, and so on. But this kind of transfer is very difficult, because we often face two domains: the left one is labeled, the right one is not. The key issue is how to connect the left and the right, that is, how to establish the correspondence between words so that the annotated data model can be successfully migrated to the domain without annotation. In the figure, the green words on the left should correspond to the green words on the right, and the red words to the red words, but the machine does not know this; we humans do. In the past, establishing this correspondence was indeed done by people: keyword-based methods do exactly this, and they need people to supply the knowledge. Now we find that in cross-domain sentiment analysis, every word in the text has some ability to make such connections. It is not that some words are connecting words and others are not; every word, at different levels, has some probability of connecting the two domains. The key is how the machine can discover this automatically. We use the learning-to-transfer method I just described to discover the transfer method automatically.
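For contrast, here is a tiny sketch of the bridging idea in its simplest form: restrict the classifier to "pivot" words that appear in both domains, so the model leans only on features whose meaning carries across. The data and vocabulary are made up, and this is the simple shared-feature style of approach, not the learned per-word connection probabilities described above.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Labeled source domain (book reviews) and unlabeled target (phone reviews).
source_texts = ["great book, loved it", "boring plot, awful book"]
source_labels = [1, 0]  # 1 = positive, 0 = negative
target_texts = ["great battery, loved this phone", "awful screen, boring design"]

def words(texts):
    return {w.strip(",.") for t in texts for w in t.split()}

# Pivots: words shared by both domains, e.g. "great", "loved", "awful".
pivots = words(source_texts) & words(target_texts)

# Train on pivot features only, so the model transfers across domains.
vectorizer = CountVectorizer(vocabulary=sorted(pivots))
clf = LogisticRegression().fit(
    vectorizer.transform(source_texts), source_labels)

# Predictions on the unlabeled target domain via the shared vocabulary.
print(clf.predict(vectorizer.transform(target_texts)))
```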

The fifth advance is to treat transfer learning itself as a "meta-learning" method that gives other models the ability to transfer. If we had an ordinary machine learning model before, putting this wrapper on top turns it into a transfer learning model. How can this be achieved? We are currently experimenting with reinforcement learning and deep learning. Suppose we already have a deep learning model or a reinforcement learning model; by adding a wrapper on top, we can successfully turn it into a transfer learning model.

Let me give an example: a personalized human-machine dialogue system. Suppose we have a general-purpose task-oriented dialogue system that can hold conversations in a general domain. How can we make it personalized, attentive to each person's character and interests? We are running this experiment with a recurrent neural network and reinforcement learning. We first built a general task-oriented dialogue system; then, from a few personalized examples, the system learns to find shortcuts in the transitions between dialogue states. These shortcuts correspond to individual preferences.

Now to the last advance I am going to talk about: using data generation for transfer learning. We have recently heard a hot term, the generative adversarial network (GAN). It sounds a bit complicated, but the picture explains it well. We all know the Turing test: a machine is the student, a referee sits outside, and in the end the referee cannot tell which one is the person and which is the machine. In a generative adversarial network, both the outside referee and the machine inside are students, and the purpose is for the two to grow together. If the referee finds that an answer came from the machine, he tells it: you are not realistic enough, go improve yourself. If the machine inside manages to deceive the referee, it can tell the referee: you are not smart enough, you also need to improve. The two keep stimulating each other and form a kind of adversarial, joint learning. So one characteristic of a generative adversarial network is that from small data it can generate a lot of simulated data, then use a discriminator to judge whether that data looks real, and thereby stimulate the growth of the generative model. It is like a game between computer and computer, and between computer and human; on the left we show a game tree. We can use this method to do transfer learning. One example is recent work in which a discriminative model distinguishes whether data comes from the source domain or the target domain, while the generative model keeps simulating the new domain, so that in the end we can generate a lot of new data that closely matches the real data. With a discriminator on one side and a generator producing data on the other, we can generate more data from small data and achieve the goal of transfer learning in the new field.
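As a rough sketch of this adversarial setup (not the specific work described in the talk), the PyTorch snippet below has a discriminator that tries to tell source-domain features from target-domain features while a feature generator learns to make the two indistinguishable. All layer sizes and learning rates are made up for illustration.

```python
import torch
import torch.nn as nn

# Feature generator maps raw inputs to a shared feature space;
# the discriminator guesses which domain a feature vector came from.
feature_gen = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 32))
discriminator = nn.Sequential(nn.Linear(32, 16), nn.ReLU(),
                              nn.Linear(16, 1), nn.Sigmoid())

bce = nn.BCELoss()
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
opt_g = torch.optim.Adam(feature_gen.parameters(), lr=1e-4)

def adversarial_step(src_x, tgt_x):
    # Discriminator step: label source features 1, target features 0.
    opt_d.zero_grad()
    d_loss = (bce(discriminator(feature_gen(src_x).detach()),
                  torch.ones(src_x.size(0), 1)) +
              bce(discriminator(feature_gen(tgt_x).detach()),
                  torch.zeros(tgt_x.size(0), 1)))
    d_loss.backward()
    opt_d.step()

    # Generator step: make target features look like source features,
    # so a classifier trained on the source transfers to the target.
    opt_g.zero_grad()
    g_loss = bce(discriminator(feature_gen(tgt_x)),
                 torch.ones(tgt_x.size(0), 1))
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```

When the discriminator can no longer tell the domains apart, the small target data has effectively been mapped into the same feature distribution as the large source data.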

The transfer learning techniques above have already been used in some business areas. The example cited here is from 4Paradigm, in installment-based car marketing. Each car order involves a large sum, so the number of such orders is very small; this data is small data, with fewer than a hundred samples. However, we also have tens of millions of small-sum transactions. We built a model that migrates from the small-transaction data to the large orders, and the final results achieved with transfer learning were better than those of the traditional model.

In conclusion, I would like to say that we have made great achievements in deep learning, and today we are making all kinds of attempts at reinforcement learning. But I believe the future of machine learning lies in small data, personalization, and reliability. That future is transfer learning.

Thank you all!
