This project attempts to use Reinforcement Learning to train a model to perform various control tasks on a manipulator, and to train the model to generalise to random target locations.
- The first task for the manipulator is to reach a fixed target location.
- The next task is to learn a generalised policy for the arm to reach any random target location.
- The final task is to follow a trajectory between two locations.
- Developed a gym environment for Reinforcement Learning on manipulator control using PyBullet Physics Simulator.
- The arm's URDF file and the Physics Simulator setup are defined under `/ents/arm.py`.
- The environments are defined under `/envs/`. The action space, the observation space, and the `step` function are defined here.
- In the first environment, the agent starts in a fixed state and is supposed to reach a fixed target location.
- In the second environment, the agent starts from a fixed state and is supposed to reach a target location whose position is decided at the start of every episode.
- The area where the target spawns can be limited.
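As a rough illustration, the sketch below shows how such a Gym environment might be laid out; the class name, goal sampling bounds, and reset/step bodies are assumptions for illustration, not the project's actual code.

```python
import gym
import numpy as np
from gym import spaces


class ReacherEnv(gym.Env):
    """Hypothetical sketch of a PyBullet reacher environment; names are illustrative."""

    def __init__(self, random_target=False):
        super().__init__()
        self.random_target = random_target
        # One position target per joint (7 joints, matching the observation table below)
        self.action_space = spaces.Box(low=-np.pi, high=np.pi, shape=(7,), dtype=np.float32)
        # 13-dim observation for the fixed-target task (see the table below)
        self.observation_space = spaces.Box(low=-10.0, high=10.0, shape=(13,), dtype=np.float32)
        self.goal = np.array([0.5, 0.3, 0.4])  # placeholder fixed goal

    def reset(self):
        if self.random_target:
            # Limit the area where the target spawns (bounds are placeholders)
            self.goal = np.random.uniform(low=[0.3, -0.3, 0.2], high=[0.6, 0.3, 0.5])
        # ... reset the PyBullet simulation here and return the initial observation ...
        return np.zeros(13, dtype=np.float32)

    def step(self, action):
        # ... apply the action, step the simulation, and build the next observation ...
        obs = np.zeros(13, dtype=np.float32)
        reward = 0.0
        done = False
        return obs, reward, done, {}
```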
- Episode Length: 6500 timesteps
- Torque Sensitivity: 200
- Velocity Sensitivity: 10
- Position Sensitivity: 1
- Maximum Force: 300
- Arm Base Position: (0, 0, 0.005)
There are 3 different Action Types:
- Torque Control (Maximum Torque=200)
- Velocity Control (Maximum Velocity = 10)
- Position Control (Maximum Force = 300)
Keeping the hardware implementation of the project in mind, we have used Position Control as the action space. We have also trained policies for the Torque Control action space.
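For illustration, the snippet below sketches how Position Control with the maximum force above could be applied through PyBullet; the robot id, joint count, and helper name are placeholders.

```python
import pybullet as p

NUM_JOINTS = 7     # assumption: one actuated joint per joint-position entry in the observation
MAX_FORCE = 300    # Maximum Force used for Position Control


def apply_position_control(robot_id, target_joint_positions):
    """Drive every joint towards its target angle with PyBullet's built-in position controller."""
    p.setJointMotorControlArray(
        bodyUniqueId=robot_id,
        jointIndices=list(range(NUM_JOINTS)),
        controlMode=p.POSITION_CONTROL,
        targetPositions=target_joint_positions,
        forces=[MAX_FORCE] * NUM_JOINTS,
    )
```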
The observation space (dim = 13) consists of:

Index | Observation | Range |
---|---|---|
0-6 | Joint Position (angular) | - |
7-9 | End Effector location relative to target (cartesian) | -10 to 10 |
10-12 | Velocity of End Effector (cartesian) | -10 to 10 |
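The sketch below shows one way this 13-dimensional observation might be assembled from PyBullet state queries; the end-effector link index and the helper name are assumptions.

```python
import numpy as np
import pybullet as p

EE_LINK_INDEX = 6  # assumption: index of the end-effector link in the URDF


def build_observation(robot_id, goal):
    # Indices 0-6: joint positions (angular)
    joint_states = p.getJointStates(robot_id, list(range(7)))
    joint_positions = [state[0] for state in joint_states]

    # Query the end-effector world position and cartesian velocity
    link_state = p.getLinkState(robot_id, EE_LINK_INDEX, computeLinkVelocity=1)
    ee_pos = np.array(link_state[0])   # linkWorldPosition
    ee_vel = np.array(link_state[6])   # worldLinkLinearVelocity

    # Indices 7-9: end-effector location relative to the target
    rel_pos = ee_pos - np.array(goal)

    # Indices 10-12: cartesian velocity of the end effector
    return np.concatenate([joint_positions, rel_pos, ee_vel]).astype(np.float32)
```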
The models used in this project for Torque Control are from Stable Baselines3, a set of reliable implementations of reinforcement learning algorithms using PyTorch. We have used the Proximal Policy Optimization (PPO) algorithm to train our manipulator agent.
Model HyperParameters
- Network Architecture:
  - Policy Network: [512, 512, 256, 128]
  - Value Function Approximator: [512, 512, 256, 128]
- Activation Function: ReLU
- Learning Rate: 0.0001
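The hyperparameters above roughly translate into the following Stable Baselines3 call; the helper name and the exact `policy_kwargs` layout are assumptions (the `net_arch` dict form matches recent SB3 versions).

```python
import torch
from stable_baselines3 import PPO


def make_ppo_agent(env):
    """Build a PPO agent with the hyperparameters listed above (sketch only)."""
    policy_kwargs = dict(
        activation_fn=torch.nn.ReLU,
        # separate [512, 512, 256, 128] MLPs for the policy and the value function
        net_arch=dict(pi=[512, 512, 256, 128], vf=[512, 512, 256, 128]),
    )
    return PPO(
        "MlpPolicy",
        env,
        learning_rate=1e-4,
        policy_kwargs=policy_kwargs,
        verbose=1,
    )
```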
The reward function is defined as the negative of the Euclidean Distance between the target location and the end effector.
reward = -1*np.linalg.norm(np.array(eeloc)-np.array(self.goal))
A cube of side 0.1 units is defined around the goal location, inside which the agent obtains 0 reward.
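Putting the distance reward and the zero-reward cube together, a minimal sketch of the reward computation could look as follows; treating the 0.1-unit side as ±0.05 per axis around the goal is an assumption.

```python
import numpy as np

CUBE_SIDE = 0.1  # side length of the zero-reward cube around the goal


def compute_reward(eeloc, goal):
    """Negative Euclidean distance, clamped to 0 inside a small cube around the goal."""
    delta = np.array(eeloc) - np.array(goal)
    if np.all(np.abs(delta) <= CUBE_SIDE / 2):
        return 0.0  # end effector is inside the cube around the goal
    return -1 * np.linalg.norm(delta)
```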
The observation space (dim = 17) consists of:

Index | Observation | Range |
---|---|---|
0-6 | Joint Position (angular) | - |
7-13 | Joint Angular Velocity | -10 to 10 |
14-16 | Target Location (cartesian) | -2 to 2 |
The models used in this project are from Stable Baselines3, a set of reliable implementations of reinforcement learning algorithms using PyTorch. We have used the Proximal Policy Optimization (PPO) algorithm to train our manipulator agent.
Model HyperParameters
- Network Architecture:
  - Policy Network: [512, 512, 256, 128]
  - Value Function Approximator: [512, 512, 256, 128]
- Activation Function: ReLU
- Learning Rate: 0.0001
Firstly, the reward function is defined as the negative of the Euclidean Distance between the target location and the end effector.
reward1 = -1*np.linalg.norm(np.array(eeloc)-np.array(self.goal))
Rendering has a delay of 0.1 sec after every action for better visualisation
The vibrations observed above after reaching the target location were decreased by penalizing the agent based on the value of the angular velocity of every joint of the arm.
# Penalize the agent whenever any joint's angular velocity exceeds the threshold (=3)
for ang_vel in ang_vel_joints:
    if ang_vel > 3:
        reward2 = -1 * ed_dis
The Euclidean Distance based reward as defined above was untouched.
Rendering has a delay of 0.1 sec after every action for better visualisation
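For clarity, the sketch below shows one way the two reward terms could be combined inside the step function; the write-up does not spell out the combination, so the simple sum, the constant name, and the helper signature are assumptions.

```python
import numpy as np

ANG_VEL_THRESHOLD = 3  # threshold on each joint's angular velocity


def shaped_reward(eeloc, goal, ang_vel_joints):
    """Distance-based reward plus an angular-velocity penalty (summing the terms is assumed)."""
    ed_dis = np.linalg.norm(np.array(eeloc) - np.array(goal))
    reward1 = -1 * ed_dis               # Euclidean-Distance reward, unchanged
    reward2 = 0.0
    for ang_vel in ang_vel_joints:
        if ang_vel > ANG_VEL_THRESHOLD:
            reward2 = -1 * ed_dis       # penalty proportional to the distance term
    return reward1 + reward2
```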
- The vibrations that were observed after reaching the specific target location in the first simulation were possibly because there were no restrictions on the angular velocity of any joint of the manipulator. Thus, even after reaching its target, the joints had significant angular velocity, hence the shaky movement of the arm.
- We overcame this issue by defining another reward function alongside the Euclidean Distance one, which penalizes the agent if the angular velocity of any of its joints is greater than a certain threshold (=3). To keep this reward proportional to the first reward function, the penalty is equal to the negative of the Euclidean Distance between the end effector and the target.
- As we can observe in the second simulation, the vibrations were reduced considerably, and the manipulator was quite stable after reaching the target location.
- Euclidean Distance of end effector with respect to target location for first case
- Euclidean Distance of end effector with respect to target location for second case, where the agent is penalized on its angular velocity