Our toolkit for the design and implementation of parallel functional programs supports the stepwise development of parallel programs, from a high-level sequential specification to an optimised parallel implementation. The toolkit is used as follows: 1. The algorithm to be implemented is specified in a functional language, and the program is debugged and tested using an interpreter. 2. The program is compiled for a sequential machine, and its performance is analysed and improved. 3. Annotation-driven transformations are applied to the program to indicate parallel tasks. Simulations at task level, basic block level and bus transaction level make it possible to analyse the parallel performance of the program at three levels of detail. 4. Once the performance has been optimised using the simulators, the program is executed on a genuine parallel machine. Several programs have been developed with the toolkit. A program that simulates tidal flow in an estuary of the North Sea is presented as a case study to demonstrate the merits of the toolkit when developing complex parallel programs. The toolkit not only supports the design of parallel applications; it also allows the study of important concepts in parallel computer architecture, including the behaviour of cached memory systems, bus protocols, scheduling algorithms and memory management algorithms.