.. _dataset:

Building a DataSet
==================

For our networks to learn anything, we need a dataset that contains inputs and
targets. PyBrain provides the ``pybrain.datasets`` package for this, and we
will use its ``SupervisedDataSet`` class for our needs.


A customized DataSet
--------------------

The ``SupervisedDataSet`` class is used for standard supervised learning. It
supports input and target values, whose size we have to specify on object
creation::

   >>> from pybrain.datasets import SupervisedDataSet
   >>> ds = SupervisedDataSet(2, 1)

Here we have created a dataset that supports two-dimensional inputs and
one-dimensional targets.


Adding samples
--------------

A classic example for neural network training is the XOR function, so let's
build a dataset for it. We do this by adding samples to the dataset one at a
time, each as an input tuple followed by a target tuple::

   >>> ds.addSample((0, 0), (0,))
   >>> ds.addSample((0, 1), (1,))
   >>> ds.addSample((1, 0), (1,))
   >>> ds.addSample((1, 1), (0,))
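
For larger truth tables, typing each sample by hand gets tedious. The following
is a minimal sketch that generates the same four XOR samples in a loop. The
helper name ``xor_samples`` is hypothetical and not part of PyBrain; it only
builds plain tuples, which can then be passed to ``ds.addSample`` as shown in
the trailing comment.

```python
from itertools import product


def xor_samples():
    """Return the four XOR samples as (input, target) tuples.

    Hypothetical helper for illustration; not part of PyBrain.
    """
    samples = []
    # product((0, 1), repeat=2) yields (0, 0), (0, 1), (1, 0), (1, 1).
    for a, b in product((0, 1), repeat=2):
        # a ^ b is the bitwise XOR of the two input bits.
        samples.append(((a, b), (a ^ b,)))
    return samples


# With a SupervisedDataSet(2, 1) named ds, the samples could be added like so:
# for inpt, target in xor_samples():
#     ds.addSample(inpt, target)
```

The input tuples have length two and the target tuples length one, matching
the sizes given when the dataset was created.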


Examining the dataset
---------------------

We now have a dataset that has four samples in it. We can check that with
Python's idiomatic way of checking the size of something::

   >>> len(ds)
   4

We can also iterate over it in the standard way::

   >>> for inpt, target in ds:
   ...   print inpt, target
   ...
   [ 0.  0.] [ 0.]
   [ 0.  1.] [ 1.]
   [ 1.  0.] [ 1.]
   [ 1.  1.] [ 0.]

We can access the input and target fields directly as arrays::

   >>> ds['input']
   array([[ 0.,  0.],
          [ 0.,  1.],
          [ 1.,  0.],
          [ 1.,  1.]])
   >>> ds['target']
   array([[ 0.],
          [ 1.],
          [ 1.],
          [ 0.]])
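
Because the fields come back as NumPy arrays, ordinary array operations apply
to them. As a sketch, the arrays below are built by hand to mirror the output
shown above (so the snippet runs without PyBrain), and the targets are checked
against the XOR of the two input columns. The variable names are illustrative
only.

```python
import numpy as np

# Hand-built copies of ds['input'] and ds['target'] as printed above.
inputs = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
targets = np.array([[0.], [1.], [1.], [0.]])

# XOR of the two input columns, reshaped into a column to match targets.
expected = np.logical_xor(inputs[:, 0], inputs[:, 1]).astype(float).reshape(-1, 1)

# True when every stored target equals the XOR of its input pair.
targets_match = bool(np.array_equal(targets, expected))
```

A check like this is a cheap way to catch a mistyped sample before training.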

It is also possible to clear the dataset, which deletes all of its samples
while keeping the input and target dimensions::

   >>> ds.clear()
   >>> ds['input']
   array([], shape=(0, 2), dtype=float64)
   >>> ds['target']
   array([], shape=(0, 1), dtype=float64)