Here are the best MSEs I obtained on the validation set for different numbers of rectified linear units (I am still trying to predict the next acoustic sample from the previous 100). Keep in mind that here an epoch consists of one iteration over 5000 mini-batches of size 2048. As you can see, it seems pointless to use more than 500 units.
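For readers who want to reproduce the setup, the input/target pairs are formed by sliding a 100-sample window over the waveform and predicting the sample that follows it. A minimal sketch (the helper name and shapes are my own illustration, not the course code):

```python
import numpy as np

def make_frames(signal, window=100):
    """Turn a 1-D waveform into (previous `window` samples, next sample) pairs."""
    n = len(signal) - window
    X = np.stack([signal[i:i + window] for i in range(n)])
    y = signal[window:]
    return X, y

# A toy waveform standing in for TIMIT audio.
signal = np.sin(np.linspace(0, 20 * np.pi, 5000)).astype(np.float32)
X, y = make_frames(signal)
print(X.shape, y.shape)  # → (4900, 100) (4900,)
```

Mini-batches of size 2048 are then drawn from rows of `X` and `y`.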
I have also started training models with two hidden layers, which seem much more promising. My first try, with 300 units in the first layer and 200 in the second, produced an MSE of 0.028 on the validation set.
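The two-hidden-layer architecture described above can be sketched as a 100 → 300 → 200 → 1 ReLU network; the weights below are random placeholders rather than trained parameters, and the initialization scale is an assumption of mine:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

# 100 input samples -> 300 ReLUs -> 200 ReLUs -> 1 linear output
W1, b1 = rng.normal(0, 0.01, (100, 300)), np.zeros(300)
W2, b2 = rng.normal(0, 0.01, (300, 200)), np.zeros(200)
W3, b3 = rng.normal(0, 0.01, (200, 1)), np.zeros(1)

def predict(X):
    h1 = relu(X @ W1 + b1)
    h2 = relu(h1 @ W2 + b2)
    return (h2 @ W3 + b3).ravel()

# One mini-batch of 2048 windows, as in the experiments above.
X = rng.normal(size=(2048, 100))
y = rng.normal(size=2048)
mse = np.mean((predict(X) - y) ** 2)
print(mse)
```

Training then minimizes this MSE by gradient descent on the weights.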
EDIT: I made a slight improvement with 2500 hidden units and have updated the table accordingly.
Please specify which data you used and how you processed it (with a pointer on your blog), so that others can compare.
I am using the Pylearn2 TIMIT dataset written by Vincent Dumoulin (http://github.com/vdumoulin/research/blob/master/code/pylearn2/datasets/timit.py), in which the acoustic samples are standardized by subtracting the mean and dividing by the standard deviation, both computed over the whole dataset.
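That standardization amounts to the following; the synthetic samples here are just a stand-in for the TIMIT waveform data:

```python
import numpy as np

# Placeholder for the concatenated acoustic samples of the whole dataset.
samples = np.random.default_rng(1).normal(3.0, 2.0, size=100_000)

# Subtract the dataset-wide mean and divide by the dataset-wide std,
# as done in the Pylearn2 TIMIT dataset code referenced above.
mean, std = samples.mean(), samples.std()
standardized = (samples - mean) / std

print(standardized.mean(), standardized.std())  # ≈ 0 and ≈ 1
```

After this, every network input and target is on roughly the same scale, which matters when comparing MSE values across experiments.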