lib.envs package¶
Submodules¶
lib.envs.envDiscAction module¶
class lib.envs.envDiscAction.Env(fileName, showEnv=False, trainMode=True)[source]¶
Bases: object
A convenience class for generating episodes and memories
This convenience class generates a context manager that can be used for generating a simple discrete environment. This is supposed to be a drop-in replacement for any other environment. This environment is useful for testing whether an Agent that has to select discrete actions is properly doing its job.
Initialize the environment
This sets up the requirements that will later be used for generating the Unity environment. It assumes that you will provide a binary file for generating the environment. There are different ways in which the environment can be generated. It can be generated in headless mode by setting showEnv to False, in which case the environment will not show a window at startup. This is good for training, as well as for situations where you are running the environment without an X server present, especially when you are running this environment remotely. You can also specify that this is being run in trainMode. In this case, the environment will be primed for training: each frame will finish as soon as possible. This is not good for observing what is happening, but it significantly increases the speed of training.
Parameters:
- fileName {str} – Path to the binary file. This must be the same file for which the unityagents package has been generated.
Keyword Arguments:
- showEnv {bool} – Set this to True if you want to view the environment (default: {False})
- trainMode {bool} – Set this to True if you want the environment to be in training mode (i.e. fast execution) (default: {True})
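A minimal usage sketch is shown below. The binary path and the constant policy are hypothetical placeholders for illustration, not part of the package:

    from lib.envs.envDiscAction import Env

    # Hypothetical policy: always selects discrete action 0 for every agent.
    def policy(states):
        return [0 for _ in states]

    # './env.x86_64' is a placeholder path; point this at your own binary.
    with Env('./env.x86_64', showEnv=False, trainMode=True) as env:
        results = env.episode(policy, maxSteps=100)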
__enter__()[source]¶
generate a context manager
This will actually generate the context manager and allow you to use this within a with statement. This is the function that actually initializes the environment and maintains it until it is no longer needed.
Returns: this – Returns an instance of the same class
__exit__(exc, value, traceback)[source]¶
Exit the context manager
The exit function that results in exiting the context manager. Typically one is supposed to check the error, if any, at this point; this will be handled at a higher level.
Parameters:
- exc, value, traceback – The standard exception type, value, and traceback supplied to a context manager on exit.
episode(policy, maxSteps=None)[source]¶
generate data for an entire episode
This function generates an entire episode. It plays the environment by first resetting it to the beginning, and then playing the game for a given number of steps (or until the game is terminated). It generates a list of lists of tuples, one for each agent. Remember that even when the number of agents is 1, it will still return a list of states.
Parameters:
- policy {function} – The function that takes the current state and returns the action vector.
Keyword Arguments:
- maxSteps {int or None} – The maximum number of steps that the agent is going to play before the episode is terminated. (default: {None}, in which case the episode will continue until it actually finishes)
Returns:
- list – The list of tuples for the entire episode. Again, this is a list of lists, one for each agent.
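For instance, within the context manager shown earlier, an episode can be generated with a random policy (a sketch; the 4-action count is an assumption for illustration):

    import random

    # Hypothetical random policy for an environment with 4 discrete actions.
    def randomPolicy(states):
        return [random.randrange(4) for _ in states]

    allResults = env.episode(randomPolicy, maxSteps=500)
    # allResults[0] is the list of (state, action, reward, nextState, done)
    # tuples for the first (and possibly only) agent.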
reset()[source]¶
reset the environment before starting an episode
Returns: status – The current status after the reset
step(policy)[source]¶
advance one step by taking an action
This function takes a policy function and generates an action according to that particular policy. This advances the episode by one step, returning the reward and the next state along with any done information.
Parameters:
- policy {function} – This function takes a state vector and returns an action vector. It is assumed that the policy is of the correct type and is capable of returning the right type of action vector for the current environment. It does not check the validity of the policy function.
Returns:
- list – A list of tuples of the form (s_t, a_t, r_{t+1}, s_{t+1}, d), one tuple for each agent. Even in the case of a single agent, this is going to return a list.
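A sketch of unpacking the result of a single step, assuming a single agent:

    # env.step applies the policy once; the result is a list with one
    # (s_t, a_t, r_{t+1}, s_{t+1}, d) tuple per agent.
    tuples = env.step(policy)
    state, action, reward, nextState, done = tuples[0]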
lib.envs.envGym module¶
class lib.envs.envGym.Env(envName, showEnv=False)[source]¶
Bases: object
A convenience class for generating episodes and memories
This convenience class generates a context manager that can be used for generating a Gym environment. This is supposed to be a drop-in replacement for the Unity environment. It differs from the Unity environment in that it needs the name of the environment as input. The other difference is that there is no such thing as trainMode.
Initialize the environment
This sets up the requirements that will later be used for generating the gym environment. The gym environment can be used in a mode that hides the plotting of the actual environment. This may result in a significant boost in speed.
Parameters:
- envName {str} – The name of the environment to be generated. This should be a valid name; in case the name provided is not valid, this is going to exit with an error.
Keyword Arguments:
- showEnv {bool} – Set this to True if you want to view the environment (default: {False})
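A minimal sketch, assuming the standard Gym name 'CartPole-v1' and an illustrative always-act-0 policy:

    from lib.envs.envGym import Env

    # Illustrative policy: always take action 0 (push left in CartPole).
    def policy(states):
        return [0 for _ in states]

    with Env('CartPole-v1', showEnv=False) as env:
        results = env.episode(policy, maxSteps=200)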
__enter__()[source]¶
generate a context manager
This will actually generate the context manager and allow you to use this within a with statement. This is the function that actually initializes the environment and maintains it until it is no longer needed. The idea of multiple agents within the gym environments does not exist as it does in the Unity agents. However, we incorporate this idea within the gym environment so that a single action can take place.
Returns: this – Returns an instance of the same class
__exit__(exc, value, traceback)[source]¶
Exit the context manager
The exit function that results in exiting the context manager. Typically one is supposed to check the error, if any, at this point; this will be handled at a higher level.
Parameters:
- exc, value, traceback – The standard exception type, value, and traceback supplied to a context manager on exit.
episode(policy, maxSteps=None)[source]¶
generate data for an entire episode
This function generates an entire episode. It plays the environment by first resetting it to the beginning, and then playing the game for a given number of steps (or until the game is terminated). It generates a list of lists of tuples, one for each agent. Remember that even when the number of agents is 1, it will still return a list of states.
Parameters:
- policy {function} – The function that takes the current state and returns the action vector.
Keyword Arguments:
- maxSteps {int or None} – The maximum number of steps that the agent is going to play before the episode is terminated. (default: {None}, in which case the episode will continue until it actually finishes)
Returns:
- list – The list of tuples for the entire episode. Again, this is a list of lists, one for each agent.
reset()[source]¶
reset the environment before starting an episode
Returns: status – The current status after the reset
step(policy)[source]¶
advance one step by taking an action
This function takes a policy function and generates an action according to that particular policy. This advances the episode by one step, returning the reward and the next state along with any done information.
Parameters:
- policy {function} – This function takes a state vector and returns an action vector. It is assumed that the policy is of the correct type and is capable of returning the right type of action vector for the current environment. It does not check the validity of the policy function.
Returns:
- list – A list of tuples of the form (s_t, a_t, r_{t+1}, s_{t+1}, d), one tuple for each agent. Even in the case of a single agent, this is going to return a list.
class lib.envs.envGym.Env1D(envName, N=1, showEnv=False)[source]¶
Bases: object
A convenience class for generating episodes and memories
This convenience class generates a context manager that can be used for generating a Gym environment. This is supposed to be a drop-in replacement for the Unity environment. It differs from the Unity environment in that it needs the name of the environment as input. The other difference is that there is no such thing as trainMode.
This 1D environment is designed to take a 1D state vector and use this vector in its calculations. If you are using a 1D environment, you are advised to use this.
This environment has the added advantage that it will automatically stack together the N previous states into a single state. Note that the first state will be copied N times, rather than zero-padded, as this seems a more natural state for the beginning.
Initialize the environment
This sets up the requirements that will later be used for generating the gym environment. The gym environment can be used in a mode that hides the plotting of the actual environment. This may result in a significant boost in speed.
Parameters:
- envName {str} – The name of the environment to be generated. This should be a valid name; in case the name provided is not valid, this is going to exit with an error.
Keyword Arguments:
- N {integer} – Set this to the number of states that you wish to have concatenated together (default: 1). You will not be able to set a value less than 1.
- showEnv {bool} – Set this to True if you want to view the environment (default: {False})
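A sketch of the stacking behaviour, assuming 'CartPole-v1' with its 4-dimensional observations:

    from lib.envs.envGym import Env1D

    # With N=4, each state is the concatenation of the 4 most recent
    # observations (16 numbers for CartPole); at reset, the first
    # observation is copied 4 times rather than zero-padded.
    with Env1D('CartPole-v1', N=4, showEnv=False) as env:
        status = env.reset()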
__enter__()[source]¶
generate a context manager
This will actually generate the context manager and allow you to use this within a with statement. This is the function that actually initializes the environment and maintains it until it is no longer needed. The idea of multiple agents within the gym environments does not exist as it does in the Unity agents. However, we incorporate this idea within the gym environment so that a single action can take place.
Returns: this – Returns an instance of the same class
__exit__(exc, value, traceback)[source]¶
Exit the context manager
The exit function that results in exiting the context manager. Typically one is supposed to check the error, if any, at this point; this will be handled at a higher level.
Parameters:
- exc, value, traceback – The standard exception type, value, and traceback supplied to a context manager on exit.
episode(policy, maxSteps=None)[source]¶
generate data for an entire episode
This function generates an entire episode. It plays the environment by first resetting it to the beginning, and then playing the game for a given number of steps (or until the game is terminated). It generates a list of lists of tuples, one for each agent. Remember that even when the number of agents is 1, it will still return a list of states.
Parameters:
- policy {function} – The function that takes the current state and returns the action vector.
Keyword Arguments:
- maxSteps {int or None} – The maximum number of steps that the agent is going to play before the episode is terminated. (default: {None}, in which case the episode will continue until it actually finishes)
Returns:
- list – The list of tuples for the entire episode. Again, this is a list of lists, one for each agent.
reset()[source]¶
reset the environment before starting an episode
Returns: status – The current status after the reset
step(policy)[source]¶
advance one step by taking an action
This function takes a policy function and generates an action according to that particular policy. This advances the episode by one step, returning the reward and the next state along with any done information.
Parameters:
- policy {function} – This function takes a state vector and returns an action vector. It is assumed that the policy is of the correct type and is capable of returning the right type of action vector for the current environment. It does not check the validity of the policy function.
Returns:
- list – A list of tuples of the form (s_t, a_t, r_{t+1}, s_{t+1}, d), one tuple for each agent. Even in the case of a single agent, this is going to return a list.
lib.envs.envUnity module¶
class lib.envs.envUnity.Env(fileName, showEnv=False, trainMode=True)[source]¶
Bases: object
A convenience class for generating episodes and memories
This convenience class generates a context manager that can be used for generating a Unity environment. The Unity environment and the OpenAI Gym environment operate slightly differently, and hence it will be difficult to create a uniform algorithm that is able to solve everything at the same time. This environment tries to solve that problem.
Initialize the environment
This sets up the requirements that will later be used for generating the Unity environment. It assumes that you will provide a binary file for generating the environment. There are different ways in which the environment can be generated. It can be generated in headless mode by setting showEnv to False, in which case the environment will not show a window at startup. This is good for training, as well as for situations where you are running the environment without an X server present, especially when you are running this environment remotely. You can also specify that this is being run in trainMode. In this case, the environment will be primed for training: each frame will finish as soon as possible. This is not good for observing what is happening, but it significantly increases the speed of training.
Parameters:
- fileName {str} – Path to the binary file. This must be the same file for which the unityagents package has been generated.
Keyword Arguments:
- showEnv {bool} – Set this to True if you want to view the environment (default: {False})
- trainMode {bool} – Set this to True if you want the environment to be in training mode (i.e. fast execution) (default: {True})
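A minimal sketch; the binary path and the policy below are placeholders for illustration:

    from lib.envs.envUnity import Env

    # Hypothetical policy; the action format depends on your binary.
    def policy(states):
        return [0 for _ in states]

    # './Banana.x86_64' is a placeholder; use your own Unity binary. Set
    # showEnv=True and trainMode=False to watch the agent instead of
    # training at full speed.
    with Env('./Banana.x86_64', showEnv=False, trainMode=True) as env:
        results = env.episode(policy, maxSteps=1000)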
__enter__()[source]¶
generate a context manager
This will actually generate the context manager and allow you to use this within a with statement. This is the function that actually initializes the environment and maintains it until it is no longer needed.
Returns: this – Returns an instance of the same class
__exit__(exc, value, traceback)[source]¶
Exit the context manager
The exit function that results in exiting the context manager. Typically one is supposed to check the error, if any, at this point; this will be handled at a higher level.
Parameters:
- exc, value, traceback – The standard exception type, value, and traceback supplied to a context manager on exit.
episode(policy, maxSteps=None)[source]¶
generate data for an entire episode
This function generates an entire episode. It plays the environment by first resetting it to the beginning, and then playing the game for a given number of steps (or until the game is terminated). It generates a list of lists of tuples, one for each agent. Remember that even when the number of agents is 1, it will still return a list of states.
Parameters:
- policy {function} – The function that takes the current state and returns the action vector.
Keyword Arguments:
- maxSteps {int or None} – The maximum number of steps that the agent is going to play before the episode is terminated. (default: {None}, in which case the episode will continue until it actually finishes)
Returns:
- list – The list of tuples for the entire episode. Again, this is a list of lists, one for each agent.
reset()[source]¶
reset the environment before starting an episode
Returns: status – The current status after the reset
step(policy)[source]¶
advance one step by taking an action
This function takes a policy function and generates an action according to that particular policy. This advances the episode by one step, returning the reward and the next state along with any done information.
Parameters:
- policy {function} – This function takes a state vector and returns an action vector. It is assumed that the policy is of the correct type and is capable of returning the right type of action vector for the current environment. It does not check the validity of the policy function.
Returns:
- list – A list of tuples of the form (s_t, a_t, r_{t+1}, s_{t+1}, d), one tuple for each agent. Even in the case of a single agent, this is going to return a list.
Module contents¶
several environments are available for immediate import
This library contains containerized versions of the different environments that can be used for training. This is essential for being able to check the quality of the different learning algorithms. Currently the available environments are as follows:
- envUnity: The Unity Environment
- envGym: The gym environment
The details of installing each of these environments will be shown below …
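Because the wrappers share the same context-manager interface (reset, step, episode), switching backends is mostly a matter of changing the import (a sketch; note that the constructor arguments differ, since envGym.Env takes a Gym name while envUnity.Env takes a binary path):

    # Swap the import to change the backend; the method calls stay the same.
    from lib.envs.envGym import Env
    # from lib.envs.envUnity import Env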