
Researchers have created an algorithm that determines whether a “student” machine should learn from its teacher or on its own.

A tennis beginner might hire a coach to speed up their learning. But because the coach is (ideally) an excellent tennis player, trying to exactly mimic the instructor won't always help the student learn. Perhaps the coach leaps high into the air to deftly return a volley. Unable to copy that move, the student might instead try a few manoeuvres of her own until she has mastered the skills she needs to return volleys.

When the researchers evaluated this strategy in simulations, they found that their mix of imitation learning and trial-and-error learning helped students learn tasks more effectively than techniques that rely on only one form of learning.

This technique could improve the training of machines that will be deployed in uncertain real-world settings, such as a robot learning to navigate a building it has never seen before.

This blend of self-directed learning and teacher-following is extremely effective. "It allows our algorithm to solve exceedingly challenging problems that cannot be completed by utilising either technique alone," says Idan Shenfeld, lead author of a paper on this technique and a graduate student in electrical engineering and computer science (EECS).

Shenfeld wrote the paper with EECS graduate student Zhang-Wei Hong; Aviv Tamar, an assistant professor of electrical engineering and computer science at the Technion; and senior author Pulkit Agrawal, director of the Improbable AI Lab and an assistant professor in the Computer Science and Artificial Intelligence Laboratory. The study will be presented at the International Conference on Machine Learning.

Achieving Balance

Many existing methods that seek to balance imitation learning with reinforcement learning do so by brute-force trial-and-error: training is run with a weighted mixture of the two learning techniques, and the entire process is repeated with different weights until the ideal balance is found. This is inefficient and frequently demands so much computing power that it isn't even practical.
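To make that baseline concrete, here is a minimal sketch of such a brute-force weight sweep. The helper names (train_student, evaluate) are hypothetical stand-ins, not functions from the researchers' code:

```python
# Sketch of the brute-force baseline: train a student from scratch with a
# fixed weighted mixture of reinforcement-learning (RL) and imitation-
# learning (IL) losses, once per candidate weight, and keep the best.
# `train_student` and `evaluate` are assumed stand-ins, not the paper's API.

def combined_loss(rl_loss: float, il_loss: float, w: float) -> float:
    """Weighted mixture of objectives: w=1 is pure RL, w=0 is pure imitation."""
    return w * rl_loss + (1.0 - w) * il_loss

def sweep_weights(train_student, evaluate, candidate_weights):
    """One full (expensive) training run per candidate weight --
    the inefficiency described above."""
    scores = {}
    for w in candidate_weights:
        student = train_student(
            loss_fn=lambda rl, il, w=w: combined_loss(rl, il, w))
        scores[w] = evaluate(student)
    return max(scores, key=scores.get)  # weight whose trained student scored best
```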

These criteria have guided Agrawal’s research: “We want algorithms that are principled, involve tuning of as few knobs as possible, and achieve high performance.”

To accomplish this, the team approached the problem differently than prior efforts. Their technique trains two students on the same task: one with a weighted combination of reinforcement learning and imitation learning, and another with reinforcement learning alone. By continually comparing how the two students perform, the algorithm can dynamically adjust how much weight to give imitation versus trial-and-error as training proceeds.
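A minimal, self-contained sketch of this dynamic-balancing idea follows. The toy Student class, environment, and update rule are illustrative assumptions, not the researchers' implementation:

```python
import random

# Two students train in parallel: one with a weighted RL+IL objective, one
# with pure RL. After each round their performance is compared, and the
# imitation weight is nudged toward whichever learner is doing better.
# All names here are illustrative, not taken from the paper.

class Student:
    def __init__(self):
        self.skill = 0.0

def run_episode(student: Student) -> float:
    """Stand-in environment: a noisy score reflecting current skill."""
    return student.skill + random.gauss(0.0, 0.1)

def update(student: Student, imitation_weight: float) -> None:
    """Stand-in learning step; real code would do gradient updates here."""
    student.skill += 0.01 * (1.0 + imitation_weight)  # toy dynamics

def train(num_rounds: int = 100, step: float = 0.05) -> float:
    w = 0.5  # weight on imitation learning
    combo, rl_only = Student(), Student()
    for _ in range(num_rounds):
        update(combo, imitation_weight=w)
        update(rl_only, imitation_weight=0.0)  # trial-and-error only
        # If the teacher-guided student is ahead, lean more on imitation;
        # otherwise shift weight back toward reinforcement learning.
        if run_episode(combo) > run_episode(rl_only):
            w = min(1.0, w + step)
        else:
            w = max(0.0, w - step)
    return w
```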

Solving Difficult Problems

To evaluate their approach, the researchers set up several simulated teacher-student training exercises, such as navigating through a labyrinth of lava to reach the opposite corner of a grid. In this case, the teacher has a map of the entire grid, while the student can see only a portion of it. Their algorithm achieved a nearly perfect success rate across all testing conditions and was substantially faster than other methods.

To put their algorithm to a harder test, they created a simulation in which a robotic hand with touch sensors, but no vision, must reorient a pen to the correct pose. The teacher had access to the pen's actual orientation, while the student could determine its orientation only through the touch sensors.

Their method again outperformed approaches that employed only reinforcement learning or only imitation learning.

Reorienting objects is one of many manipulation tasks that a future household robot will need to carry out, a goal the Improbable AI Lab is working towards, Agrawal says.

Teacher-student learning techniques have already been used successfully to train robots to perform complex object manipulation and locomotion in simulation before transferring those abilities to the real world. In these techniques, the teacher has access to privileged information that the student will not have once deployed. For instance, the teacher may know the detailed layout of a building, while the student robot is trained to navigate using only images captured by its camera.

Beyond building smarter robots, the researchers believe their approach has the potential to improve performance in a range of settings where imitation or reinforcement learning is already employed. For instance, large language models such as GPT-4 are very good at performing a wide range of tasks, so one could use the large model as a teacher to train a smaller student model to be even "better" at one particular task. Another fascinating direction is to investigate the similarities and differences between machines and humans learning from their respective teachers. Such research could help improve the learning experience, the researchers suggest.
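As a hedged illustration of that teacher-student distillation idea, here is a generic sketch in PyTorch. The tiny networks, random inputs, and temperature value are stand-in choices, not details from the article or the paper:

```python
import torch
import torch.nn.functional as F

# A large "teacher" network's output distribution supervises a smaller
# "student" network -- a generic knowledge-distillation loop, not the
# researchers' method. Shapes and hyperparameters are illustrative.

teacher = torch.nn.Sequential(torch.nn.Linear(16, 64), torch.nn.ReLU(),
                              torch.nn.Linear(64, 10))
student = torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.ReLU(),
                              torch.nn.Linear(32, 10))
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
temperature = 2.0  # softens the teacher's distribution

for _ in range(100):
    x = torch.randn(32, 16)  # stand-in inputs
    with torch.no_grad():
        teacher_probs = F.softmax(teacher(x) / temperature, dim=-1)
    student_log_probs = F.log_softmax(student(x) / temperature, dim=-1)
    # KL divergence pulls the student's distribution toward the teacher's.
    loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```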

“What’s interesting about this method compared to related methods is how robust it seems to various parameter choices, and the variety of domains it shows promising results in,” says Abhishek Gupta, an assistant professor at the University of Washington who was not involved in the research. “While the current set of results are largely based on simulation, I am very excited about the future possibilities of applying this work to problems involving memory and reasoning with different modalities, such as tactile sensing.”
