Statistical Results

This section is not intended as a discussion of abstract mathematics. Rather, it offers advice on interpreting test results and understanding historical testing.


Realize that the greater the volume of historical data examined during a test, the more meaningful the results. You are well advised to collect as much accurate historical data as you can. Beware of vendors offering enormous quantities of low-cost data: as the accuracy of the data suffers, so does the accuracy of your testing. Firms that guarantee the accuracy of their data may charge more, but the extra cost is quickly recouped through more reliable test results.


A further reason to use large quantities of data is that they better represent the market across all market conditions. For example, it doesn’t do to have a system that tested wonderfully against only bull-market quotes; such a system hasn’t really stood up to the historical rigor that you should demand of your library.


A system becomes statistically significant only when it produces a large number of trades. Systems producing a small number of highly profitable trades are less likely to be meaningful, because a handful of trades can easily be the product of chance; a statistical relationship needs a reasonably large sample to validate it. We use data reflecting all stock and futures activity back to 1968, and we typically look for a minimum of 10,000 trades generated by our system tests.
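The effect of trade count on significance can be sketched with a t-statistic on the average return per trade. This is only one common yardstick, not necessarily the measure used in our own tests; the function name and the sample returns are hypothetical, and the same four returns are simply repeated so that the average stays fixed while the number of trades grows.

```python
import math
import statistics

def mean_return_t_stat(trade_returns):
    """Return (mean, t-statistic) of per-trade returns, tested against zero."""
    n = len(trade_returns)
    if n < 2:
        raise ValueError("need at least two trades")
    mean = statistics.mean(trade_returns)
    stdev = statistics.stdev(trade_returns)  # sample standard deviation
    if stdev == 0:
        raise ValueError("returns have zero variance")
    std_error = stdev / math.sqrt(n)  # shrinks as the trade count grows
    return mean, mean / std_error

# Hypothetical per-trade returns (fractions of capital), for illustration only.
few_trades = [0.05, -0.03, 0.04, -0.02]
many_trades = few_trades * 2500  # 10,000 trades with the same average

mean_few, t_few = mean_return_t_stat(few_trades)
mean_many, t_many = mean_return_t_stat(many_trades)
print(f"{len(few_trades):>6} trades: mean={mean_few:.4f} t={t_few:.2f}")
print(f"{len(many_trades):>6} trades: mean={mean_many:.4f} t={t_many:.2f}")
```

Both samples have the same average return, yet only the large one produces a t-statistic well above the conventional threshold of about 2. Real trades will not repeat this neatly, so treat the figures as an illustration of the trend rather than a forecast.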


A corollary of the need for a large number of trades is that systems claimed to be sensitive to individual instruments or sectors are also less likely to be statistically significant. Limiting the amount of data tested will typically increase the randomness of your results long before it helps you isolate successful systems.


Finally, beware of “curve fitting”. Curve fitting occurs when you add so many indicators to a system that only a few very profitable trades remain. Curve fitting can appear to “prove” many things, but such results are invalidated by the small number of trades behind them.
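A minimal sketch of how quickly stacked indicators thin out the trades follows. Everything in it is hypothetical: the trades are random numbers with no real edge, and each "indicator" is a coin flip that passes about half the time.

```python
import random

def make_random_trades(count, n_indicators, seed):
    """Build hypothetical trades: a random return plus random indicator flags."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    return [
        {
            "ret": rng.gauss(0.0, 0.02),
            "flags": [rng.random() < 0.5 for _ in range(n_indicators)],
        }
        for _ in range(count)
    ]

def filter_trades(trades, required_flags):
    """Keep only trades for which every required indicator flag is true."""
    return [t for t in trades if all(t["flags"][i] for i in required_flags)]

trades = make_random_trades(2000, 8, seed=1)

# surviving[k] is the number of trades left after requiring the first k indicators.
surviving = [len(filter_trades(trades, range(k))) for k in range(9)]
for k, n in enumerate(surviving):
    print(f"{k} indicators required: {n} trades remain")
```

Each added condition removes roughly half of the remaining trades, so after eight of them only a handful of the original 2,000 are left. Whatever those few trades appear to show, the sample is too small to support it.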


You may find yourself optimizing indicator parameters against your data. This can also result in curve fitting: you are manipulating your conditions to match predetermined results. To avoid losing statistical significance, optimize against only a portion of your data (say, one-third), then run historical tests against the remaining data. (You can use Investigator’s Use Directory feature to add or delete specific directories from testing.)
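The split described above can be sketched as follows. This is not how Investigator works internally; the function names, the toy scoring rule, and the sample price changes are all assumptions made for illustration, and the sketch assumes the first third of the records is used for optimization.

```python
def split_for_optimization(records, parts=3):
    """Split records into (optimization set, holdout set) with no overlap.

    The first 1/parts of the records is used for optimization (assumption).
    """
    if parts < 2:
        raise ValueError("parts must be at least 2")
    cut = len(records) // parts
    return records[:cut], records[cut:]

def optimize_then_validate(records, candidate_params, score):
    """Pick the best parameter on one portion, then score it on the rest."""
    optimize_set, holdout_set = split_for_optimization(records)
    best = max(candidate_params, key=lambda p: score(optimize_set, p))
    return best, score(optimize_set, best), score(holdout_set, best)

def toy_score(changes, threshold):
    """Hypothetical rule: sum the change that follows any change above threshold."""
    return sum(nxt for cur, nxt in zip(changes, changes[1:]) if cur > threshold)

# Hypothetical price changes, for illustration only.
changes = [1, 2, -1, 3, -2, 1, 0, 2, -1]
best, in_sample, out_of_sample = optimize_then_validate(changes, [0, 1.5], toy_score)
print(best, in_sample, out_of_sample)  # prints: 0 1 -3
```

In this toy data the threshold that looks best on the optimization portion loses on the remaining data, which is exactly the warning sign the holdout test exists to reveal.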


In general, this section has tried to make you aware of the difference between making your system fit your data and constructing a valid system that is independent of any particular data. By keeping your systems general and looking for systems that produce a large number of transactions, you increase your library’s statistical validity and, thereby, your potential profits.