The highest Accuracies were achieved at the lowest Learning Rates.
The highest Accuracy was achieved at 10 or more Epochs (one Epoch is one complete pass of the training data through the Neural Net).
The highest Accuracy was achieved with about half as many Hidden Nodes as Input Nodes; increasing the number of Hidden Nodes further did not improve Accuracy.
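For context, here is a minimal sketch of the kind of sweep behind these observations: loop over Learning Rates, Epochs, and Hidden Node counts and record the test Accuracy for each combination. The grid values are illustrative, not the exact ones used in my runs, and train_and_test is a hypothetical helper that would build, train, and score one network.

```python
# Hypothetical hyperparameter sweep; train_and_test is a placeholder helper,
# not the actual function from the notebook.
from itertools import product

learning_rates = [0.3, 0.1, 0.01]        # illustrative grid values
epoch_counts = [1, 5, 10, 20]
hidden_node_counts = [100, 200, 400, 800]

for lr, epochs, hidden in product(learning_rates, epoch_counts, hidden_node_counts):
    accuracy = train_and_test(lr, epochs, hidden)   # hypothetical helper
    print(f"lr={lr}, epochs={epochs}, hidden={hidden}: accuracy={accuracy:.4f}")
```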
The signal emerging from a node is the activation function applied to the sum of the signals entering the node.
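As a small worked example, here is that calculation for a single node, assuming a sigmoid activation function (a common choice for this kind of network; the exact activation used in my notebook is not shown here):

```python
# A single node's output: activation function of the summed incoming signals.
# The sigmoid choice and the signal values are illustrative assumptions.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

incoming_signals = np.array([0.9, 0.1, 0.8])     # hypothetical weighted signals entering the node
node_output = sigmoid(incoming_signals.sum())    # signal emerging from the node
print(node_output)                               # about 0.858
```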
My neural network was written in Python, using a Jupyter notebook. It learned (trained) on one set of handwritten digits (0-9), then was tested on a second, separate set of digits. To see details of this neural network, such as samples of the correct or incorrect predictions, see below. Find me at Steven J. Klatte on LinkedIn.
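For readers who want to picture the structure, here is a minimal sketch of a single-hidden-layer network of this kind, written with NumPy and a sigmoid activation. The node counts and learning rate are illustrative assumptions, not the exact values from my notebook.

```python
# A minimal single-hidden-layer network sketch (NumPy, sigmoid activation).
# Node counts and learning rate below are illustrative, not the notebook's exact values.
import numpy as np

class SimpleNetwork:
    def __init__(self, n_input, n_hidden, n_output, learning_rate):
        self.lr = learning_rate
        # small random weights between the layers
        self.w_ih = np.random.normal(0.0, n_input ** -0.5, (n_hidden, n_input))
        self.w_ho = np.random.normal(0.0, n_hidden ** -0.5, (n_output, n_hidden))
        self.activation = lambda x: 1.0 / (1.0 + np.exp(-x))

    def train(self, inputs, targets):
        inputs = np.array(inputs, ndmin=2).T
        targets = np.array(targets, ndmin=2).T
        hidden = self.activation(self.w_ih @ inputs)
        outputs = self.activation(self.w_ho @ hidden)
        # backpropagate the errors and nudge the weights
        output_errors = targets - outputs
        hidden_errors = self.w_ho.T @ output_errors
        self.w_ho += self.lr * (output_errors * outputs * (1 - outputs)) @ hidden.T
        self.w_ih += self.lr * (hidden_errors * hidden * (1 - hidden)) @ inputs.T

    def query(self, inputs):
        inputs = np.array(inputs, ndmin=2).T
        hidden = self.activation(self.w_ih @ inputs)
        return self.activation(self.w_ho @ hidden)

# 784 Input Nodes (28x28 pixels), roughly half as many Hidden Nodes, 10 Output Nodes
net = SimpleNetwork(n_input=784, n_hidden=400, n_output=10, learning_rate=0.1)
```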
The output signal for the "8" was very weak, only 17/99. Generally, the output signal, or "confidence of prediction," of a miss was much lower (52%) than the confidence of an accurate prediction (94%).
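One way to compute that comparison is to take the strongest output signal for each test image as its confidence, then average the confidences separately over hits and misses. The sketch below assumes the net object from the earlier sketch and hypothetical test_images and test_labels arrays.

```python
# Average "confidence of prediction" for hits vs. misses.
# Assumes `net` from the sketch above; `test_images` and `test_labels` are hypothetical names.
import numpy as np

hit_confidences, miss_confidences = [], []
for image, true_label in zip(test_images, test_labels):
    outputs = net.query(image)
    predicted = int(np.argmax(outputs))      # the network's answer
    confidence = float(np.max(outputs))      # strength of the winning output signal
    (hit_confidences if predicted == true_label else miss_confidences).append(confidence)

print("mean confidence on hits:  ", np.mean(hit_confidences))
print("mean confidence on misses:", np.mean(miss_confidences))
```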
The data above represents about 200 runs on my PC. The run times varied from about 1 minute to a few hours, depending on Epochs, Learning Rates, and Hidden Nodes. I also wrote and explored Deep Neural Networks with two hidden layers, and "tilted" the MNIST digits by 10 degrees.
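For illustration, here is one way to "tilt" an MNIST digit by 10 degrees using scipy.ndimage.rotate; my notebook may have done this differently, and the test_images name is an assumption.

```python
# Tilting a 28x28 MNIST digit by +/-10 degrees (one possible approach).
import numpy as np
from scipy import ndimage

image = np.array(test_images[0]).reshape(28, 28)   # assumed 28x28 pixel digit
tilted_plus  = ndimage.rotate(image,  10, reshape=False, order=1, cval=0.0)
tilted_minus = ndimage.rotate(image, -10, reshape=False, order=1, cval=0.0)
# reshape=False keeps the image 28x28; cval fills the exposed background.
```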