Here are the best validation-set MSEs I obtained for different numbers of rectified linear units (I am still trying to predict the next acoustic sample from the previous 100). Keep in mind that an epoch here consists of one pass over 5000 mini-batches of size 2048, so each epoch processes 10,240,000 training examples. As you can see, there seems to be little to gain from using more than 500 units.
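To make the setup concrete, here is a minimal sketch of how (previous 100 samples, next sample) training pairs could be built from a one-dimensional signal. The function name `make_windows` and the random stand-in signal are assumptions for illustration only; this is not the actual data pipeline or dataset.

```python
import numpy as np

def make_windows(signal, context=100):
    """Split a 1-D signal into inputs of `context` samples and the sample that follows each."""
    signal = np.asarray(signal, dtype=np.float32)
    # Each row holds context + 1 consecutive samples (requires NumPy >= 1.20).
    windows = np.lib.stride_tricks.sliding_window_view(signal, context + 1)
    # First `context` columns are the inputs, the last column is the target.
    return windows[:, :context], windows[:, context]

# Stand-in signal (assumption): random noise instead of real acoustic samples.
rng = np.random.default_rng(0)
signal = rng.standard_normal(1000).astype(np.float32)

X, y = make_windows(signal)

# Number of training examples processed in one epoch as defined in this post.
examples_per_epoch = 5000 * 2048
```

With a 1000-sample signal this yields 900 pairs, since each pair needs 101 consecutive samples.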
I have also started training models with two hidden layers, and they seem much more promising. My first try, with 300 units in the first hidden layer and 200 in the second, reached an MSE of 0.028 on the validation set.
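For reference, here is a rough sketch of the forward pass and MSE computation for a network of that shape: 100 inputs, then 300 and 200 rectified linear units, then one output. The linear output unit, the He-style initialization, and the helper names (`init_mlp`, `forward`, `mse`) are assumptions made for this example, not a description of the actual implementation, and the random inputs stand in for real data.

```python
import numpy as np

def init_mlp(rng, sizes=(100, 300, 200, 1)):
    """Create (weights, bias) pairs for each layer. He-style scaling is an assumption."""
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        W = rng.standard_normal((n_in, n_out)) * np.sqrt(2.0 / n_in)
        b = np.zeros(n_out)
        params.append((W, b))
    return params

def forward(params, X):
    """Apply ReLU hidden layers followed by a linear output unit (assumed)."""
    h = X
    for W, b in params[:-1]:
        h = np.maximum(0.0, h @ W + b)  # rectified linear units
    W, b = params[-1]
    return (h @ W + b)[:, 0]  # one predicted sample per input row

def mse(pred, target):
    """Mean squared error between predictions and targets."""
    return float(np.mean((pred - target) ** 2))

rng = np.random.default_rng(0)
params = init_mlp(rng)

# One mini-batch of 2048 random stand-in examples (assumption, not real audio).
X = rng.standard_normal((2048, 100))
y = rng.standard_normal(2048)

pred = forward(params, X)
loss = mse(pred, y)

# Total number of trainable parameters in this architecture.
n_params = sum(W.size + b.size for W, b in params)
```

Under these assumptions the model has 90,701 parameters, which is modest compared with a single hidden layer of several thousand units fed by the same 100 inputs.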
EDIT: I made a slight improvement with 2500 hidden units and updated the table accordingly.