Local:Running Fluidity in parallel
Fluidity is parallelized using MPI and standard domain decomposition techniques. If you're in sensible company then you have access to a cluster (or supercomputer) which already has MPI and queuing software installed.
New Options Parallel
Generating a Mesh
Meshes can be generated using any program that outputs triangle meshes, or that outputs mesh formats which can be converted to triangle format; conversion programs exist for several common mesh formats.
All surfaces on which boundary conditions are applied should have appropriate boundary IDs, and if multiple regions are used then mesh regions should be assigned appropriate region IDs. See respective mesh generator pages for instructions on how to do this.
To be able to use fldecomp, build the Fluidity tools inside your fluidity folder. The fldecomp binary will then be created in the bin/ directory.
Decomposing the Mesh
To decompose the triangle mesh, run:
fldecomp -m triangle -n [PARTS] [BASENAME]
where BASENAME is the triangle mesh base name (excluding extensions). "-m triangle" instructs fldecomp to perform a triangle-to-triangle decomposition. This will create PARTS partitioned triangle meshes together with PARTS .halo files.
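As a concrete sketch, suppose your mesh files are channel.node, channel.ele and channel.face (the name "channel" is purely illustrative) and you want four partitions:

```shell
# Decompose the (hypothetical) triangle mesh "channel" into 4 parts.
# Assumes channel.node, channel.ele and channel.face exist in the
# current directory and fldecomp is on your PATH.
fldecomp -m triangle -n 4 channel

# You should then find one triangle mesh and one .halo file per
# partition, e.g. channel_0.node ... channel_3.node and
# channel_0.halo ... channel_3.halo
```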
Parallel Specific Options
In the options file, select "triangle" under /geometry/mesh/from_file/format for the from_file mesh. For the mesh filename, enter the triangle mesh base name excluding all file and process number extensions.
To launch a new options parallel simulation, pass the options file on the Fluidity command line, e.g.:
mpiexec fluidity -v2 -l [OPTIONS FILE]
To run in a batch job on cx1, use something like the following PBS script:
#!/bin/bash
# Job name
#PBS -N backward_step
# Time required in hh:mm:ss
#PBS -l walltime=48:00:00
# Resource requirements
# Always try to specify exactly what you need and the PBS scheduler
# will make sure to get your job running as quickly as possible. If
# you ask for too much you could be waiting a while for sufficient
# resources to become available. Experiment!
#PBS -l select=2:ncpus=4
# Files to contain standard output and standard error
##PBS -o stdout
##PBS -e stderr

PROJECT=backward_facing_step_3d.flml

echo Working directory is $PBS_O_WORKDIR
cd $PBS_O_WORKDIR
rm -f stdout* stderr* core*

module load intel-suite
module load mpi
module load vtk
module load cgns
module load petsc/2.3.3-p1-amcg
module load python/2.4-fake

# This will put the location of the temporary directory into a temporary file
# in case you need to check its progress
mpiexec $PWD/fluidity -v2 -l $PWD/$PROJECT
This will run on 8 processors (2 * 4, from the line #PBS -l select=2:ncpus=4).
The output from a parallel run is a bunch of .vtu and .pvtu files. A .vtu file is output for each processor and each timestep, e.g. backward_facing_step_3d_191_0.vtu is the .vtu file for step 191 from processor 0. A .pvtu file is generated for each timestep, e.g. backward_facing_step_3d_191.pvtu is for timestep 191.
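The naming convention can be illustrated with a small shell sketch (the project name and numbers are just those from the example above):

```shell
# Fluidity writes <project>_<dump>_<rank>.vtu per process and a single
# <project>_<dump>.pvtu per dump that references them.
project=backward_facing_step_3d
dump=191
rank=0
echo "${project}_${dump}_${rank}.vtu"   # -> backward_facing_step_3d_191_0.vtu
echo "${project}_${dump}.pvtu"          # -> backward_facing_step_3d_191.pvtu
```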
The best way to view the output is using paraview. Simply open the .pvtu file.
On cx1, you will need to load the paraview module:
module load paraview/3.4.0
Limitations / Known Issues
GEM Options Parallel
The first thing to do is to run flgem/gem as usual. Second, you need to partition the output into subdomains, which will be the input files for parallel Fluidity. You should usually do this interactively, since it's serial and you generally only have to do it once, while you may want to rerun the parallel problem multiple times. It's usually not too expensive anyhow. If you have a really large input you can always do this elsewhere, on a machine which has sufficient memory, and scp the result back to the cluster you want to run on. So, for example:
flgem annulus.gem
fldecomp -n 16 annulus
This runs gem on the project annulus and partitions it into 16 subdomains. To run this in parallel you must first modify your batch queue script to request 16 cores. If you are using PBS on the Imperial College cluster you could do this using:
#PBS -l select=4:ncpus=4:mem=4950mb:icib=true
This also selects Infiniband on the IC cluster. If you're at IC you should use this option when running in parallel, or else you might get compute nodes which only have an ethernet interconnect (i.e. it's going to be slow). It doesn't matter, of course, if your parallel problem is small enough to sit inside a single SMP node. Finally, add a line to actually run fluidity in parallel:
mpiexec ./dfluidity annulus
When you want to visualize dump files you have two options. Either use:
fl2vtu annulus 1
which will create a parallel VTK file which you can visualize using something like:
mayavi -d annulus_1.pvtu -m SurfaceMap
(note the .pvtu). Alternatively, you can merge the partitions to form a single file:
fl2vtu -m annulus 1
mayavi -d annulus_1.vtu -m SurfaceMap
Be warned, though: this is not a good idea if your dump files start getting very big, in which case you need to think about using Paraview for parallel visualization.
Running in parallel on Ubuntu - OpenMPI
Example 1 - Straight run
gormo@rex:~$ cat host_file
rex
rex
rex
rex
gormo@rex:~$ mpirun -np 4 --hostfile host_file $PWD/dfluidity tank.flml
Example 2 - running inside gdb
gormo@rex:~$ xhost +rex
gormo@rex:~$ echo $DISPLAY
:0.0
gormo@rex:~$ mpirun -np 4 -x DISPLAY=:0.0 xterm -e gdb $PWD/dfluidity-debug