WRF-SFIRE and WRFx on Alderaan



Initial setup

Following Atipa User Guide Phoenix

SSH

There was no .ssh directory in my account, so passwordless ssh to compute nodes did not work, contrary to page 8 of the guide. Consequently, commands that run across compute nodes, such as fornodes -s "ps aux | grep user", did not work either.

Setting up ssh:

ssh-keygen
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys

Passwordless ssh to the head node now works fine. Passwordless ssh to compute nodes turned out to be intentionally disabled for regular users.
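Running the append step twice leaves duplicate lines in authorized_keys. A minimal sketch of an idempotent version follows; the function name authorize_key and its optional second argument are made up for illustration and are not part of the Atipa guide.

```shell
# Append a public key to authorized_keys only if it is not already there,
# and set the permissions that sshd normally expects.
# Usage: authorize_key PUBKEY_FILE [SSH_DIR]   (SSH_DIR defaults to ~/.ssh)
# The function name and arguments are illustrative, not from the guide.
authorize_key() {
    pubkey=$1
    sshdir=${2:-$HOME/.ssh}
    mkdir -p "$sshdir" || return 1
    chmod 700 "$sshdir" || return 1
    touch "$sshdir/authorized_keys" || return 1
    # -q quiet, -x whole-line match, -F fixed string, -f patterns from file
    if ! grep -qxF -f "$pubkey" "$sshdir/authorized_keys"; then
        cat "$pubkey" >> "$sshdir/authorized_keys"
    fi
    chmod 600 "$sshdir/authorized_keys"
}
```

After ssh-keygen, it would be called as authorize_key ~/.ssh/id_rsa.pub.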

Compilers

[jmandel@math-alderaan ~]$ gcc --version
gcc (GCC) 8.3.1 20191121 (Red Hat 8.3.1-5)
[jmandel@math-alderaan ~]$ ls -l /shared
drwxr-xr-x. 3 root root 17 Mar  5 20:21 aocl-linux-gcc-2.2-5
drwxr-xr-x. 3 root root 23 Mar  5 20:20 jemalloc-5.2.1
drwxr-xr-x  7 root root 80 Mar 19 06:26 modulefiles
drwxr-xr-x. 3 root root 23 Mar  5 20:27 openmpi-4.1.0
drwxr-xr-x  3 root root 23 Mar 19 06:22 openmpi-4.1.0-cuda
ls /shared/openmpi-4.1.0/
gcc-9.2.1
ls /shared/openmpi-4.1.0/gcc-9.2.1/bin
mpiCC   mpicxx   mpif90   ompi-clean   opal_wrapper  orte-server  orterun  oshcc    oshmem_info  shmemc++  shmemfort
mpic++  mpiexec  mpifort  ompi-server  orte-clean    ortecc       oshCC    oshcxx   oshrun       shmemcc   shmemrun
mpicc   mpif77   mpirun   ompi_info    orte-info     orted        oshc++   oshfort  shmemCC      shmemcxx

So the system gcc is 8.3.1 and no other compiler is on the default path, although Open MPI appears to have been built with gcc 9.2.1. Created .bash_profile with the line

PATH="/shared/openmpi-4.1.0/gcc-9.2.1/bin:$PATH"

Copied example files

mkdir test
cp -a /opt/phoenix/doc/examples test

Fixed missing int before main in mpi-example.c (in the guide called mpihello.c).

The guide says "Intel Math Kernel Library is installed on all Atipa clusters in /opt/intel/cmkl", but no such directory exists on this system.

Modules

But modules are there:

[jmandel@math-alderaan examples]$ module avail
------------------------------------------------ /shared/modulefiles ------------------------------------------------
aocl/2.2-5  gcc/9.2.1  jemalloc/5.2.1/gcc/9.2.1  openmpi-cuda/4.1.0/gcc/9.2.1  openmpi/4.1.0/gcc/9.2.1  

so .bash_profile was changed to

module load gcc/9.2.1 openmpi/4.1.0/gcc/9.2.1
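After logging in again, it is worth confirming that the compiler and MPI wrappers actually resolve on PATH. A small sketch, with a helper name (require_cmds) that is my own and not something provided by the cluster:

```shell
# Report which of the named commands are found on PATH.
# Returns 0 if all are found, 1 if any is missing.
# require_cmds is an illustrative helper, not a system tool.
require_cmds() {
    rc=0
    for cmd in "$@"; do
        if found=$(command -v "$cmd"); then
            echo "ok       $cmd -> $found"
        else
            echo "MISSING  $cmd" >&2
            rc=1
        fi
    done
    return $rc
}
```

For example, require_cmds gcc gfortran mpicc mpif90 mpirun should list paths under the gcc 9.2.1 and openmpi 4.1.0 installations if the modules loaded as intended.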

MPI and scheduler

Copied slurm_submit.sh from the guide and made minor changes:

[jmandel@math-alderaan examples]$ mpicc mpi-example.c 
[jmandel@math-alderaan examples]$ cat slurm_submit.sh
#!/bin/bash
### Sets the job's name.
#SBATCH --job-name=mpihello
### Sets the job's output file and path.
#SBATCH --output=mpihello.out.%j
### Sets the job's error output file and path.
#SBATCH --error=mpihello.err.%j
### Requested number of nodes for this job. Can be a single number or a range.
#SBATCH -N 4
### Requested partition (group of nodes, i.e. compute, fat, gpu, etc.) for the resource allocation.
#SBATCH -p compute
### Requested number of tasks to be invoked on each node. 
#SBATCH --ntasks-per-node=4
### Limit on the total run time of the job allocation. 
#SBATCH --time=10:00
### Amount of real memory required per allocated CPU, in megabytes.
#SBATCH --mem-per-cpu=100
module list
mpirun a.out
[jmandel@math-alderaan examples]$ sbatch slurm_submit.sh
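One easy mistake in a job script like this is a misspelled directive: sbatch only acts on lines that begin with #SBATCH and treats anything else starting with # as an ordinary comment, so the option is silently dropped. A rough sketch of a check that can be run before submitting; lint_sbatch is a made-up helper name and the pattern only catches all-caps misspellings that start with #S.

```shell
# Print (with line numbers) lines that look like a Slurm directive
# but are not spelled exactly "#SBATCH".
# Returns 1 if any suspicious line is found, 0 otherwise.
# lint_sbatch is an illustrative helper, not part of Slurm.
lint_sbatch() {
    if grep -n '^#S[A-Z][A-Z]*[[:space:]]' "$1" \
        | grep -v '^[0-9]*:#SBATCH[[:space:]]'; then
        return 1
    fi
    return 0
}
```

Usage would be lint_sbatch slurm_submit.sh && sbatch slurm_submit.sh, so that the job is submitted only when nothing suspicious is printed.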

Building WRF-SFIRE

Following Running WRF-SFIRE with real data in the WRFx system

Libraries

Downloaded libraries and built NETCDF following https://www2.mmm.ucar.edu/wrf/OnLineTutorial/compilation_tutorial.php

Note: NETCDF is built with --disable-netcdf-4!

Changing .bash_profile to

module load gcc/9.2.1 openmpi/4.1.0/gcc/9.2.1
DIR=$HOME/libraries
export CC="gcc"
export CXX="g++"
export FC="gfortran"
export FCFLAGS="-m64"
export F77="gfortran"
export FFLAGS="-m64"
export JASPERLIB="$DIR/grib2/lib"
export JASPERINC="$DIR/grib2/include"
export LDFLAGS="-L$DIR/grib2/lib"
export CPPFLAGS="-I$DIR/grib2/include"
export PATH="$DIR/netcdf/bin:$PATH"
export NETCDF="$DIR/netcdf"
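Before configuring WRF, it may help to confirm that the directories these variables point to contain what the build will look for. A sketch of a generic check; check_prefix is a hypothetical helper, and the file names in the usage line are my assumptions about what the tutorial build installs, not something listed in these notes.

```shell
# Check that each relative path exists under a given install prefix.
# Prints the missing ones to stderr; returns 0 only if all exist.
# check_prefix is an illustrative helper.
check_prefix() {
    prefix=$1
    shift
    rc=0
    for rel in "$@"; do
        if [ ! -e "$prefix/$rel" ]; then
            echo "missing: $prefix/$rel" >&2
            rc=1
        fi
    done
    return $rc
}
```

For instance, check_prefix "$NETCDF" include/netcdf.inc lib/libnetcdf.a bin/nc-config, and similarly check_prefix "$DIR/grib2" lib include for the JasPer location.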

WRF-SFIRE

Following Running WRF-SFIRE with real data in the WRFx system. Tested building by

./configure -d 
./compile em_fire

and submitting a modified slurm_submit.sh containing

mpirun -np 1 ./ideal.exe
mpirun ./wrf.exe
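To tell whether the job finished normally, the per-rank logs can be inspected. The sketch below assumes the usual behavior of MPI builds of WRF, which write rsl.out.NNNN and rsl.error.NNNN files in the run directory and end a normal run with a line containing SUCCESS COMPLETE; wrf_run_ok is a made-up helper name.

```shell
# Return 0 if the given WRF log ends with a SUCCESS COMPLETE line.
# Usage: wrf_run_ok [LOGFILE]   (defaults to rsl.error.0000 in the
# current directory). wrf_run_ok is an illustrative helper.
wrf_run_ok() {
    log=${1:-rsl.error.0000}
    [ -f "$log" ] || return 1
    # Only look at the tail: the message is printed at the very end.
    tail -n 5 "$log" | grep -q 'SUCCESS COMPLETE'
}
```

In the run directory, wrf_run_ok && echo "run finished" gives a quick answer without paging through the log.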

WPS

Following Running WRF-SFIRE with real data in the WRFx system. From parent directory:

git clone https://github.com/openwfm/WPS
ln -s WRF-SFIRE WRF
cd WPS
./configure 
./compile >& compile_wps.log &

Replace WRF-SFIRE with the name of the directory where WRF-SFIRE was built. The WPS repository is an unmodified fork of https://github.com/wrf-model/WPS, currently frozen at release-v4.2.
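Since the compile runs in the background with output going to compile_wps.log, the simplest check afterwards is whether the three WPS programs exist. A sketch under the assumption that a successful build leaves geogrid.exe, ungrib.exe and metgrid.exe (usually as links) in the top WPS directory; wps_missing is a hypothetical helper name.

```shell
# List WPS executables that are absent or empty in the given directory.
# Returns 0 only if all three are present.
# wps_missing is an illustrative helper.
wps_missing() {
    wpsdir=${1:-.}
    n=0
    for exe in geogrid.exe ungrib.exe metgrid.exe; do
        # -s follows symlinks, so a dangling link counts as missing.
        if [ ! -s "$wpsdir/$exe" ]; then
            echo "$exe"
            n=$((n + 1))
        fi
    done
    [ "$n" -eq 0 ]
}
```

If wps_missing prints anything, compile_wps.log is the place to look for the failing step.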

PnetCDF

Add to .bash_profile

export MPICC=mpicc
export MPICXX=mpicxx
export MPIF77=mpif77
export MPIF90=mpif90
# export PNETCDF="$DIR/pnetcdf"

Uncomment the PNETCDF line only when it is needed. Source .bash_profile again, then

wget https://parallel-netcdf.github.io/Release/pnetcdf-1.12.2.tar.gz
tar xvfz pnetcdf-1.12.2.tar.gz 
cd pnetcdf-1.12.2/
./configure --prefix=$HOME/libraries/pnetcdf
make
make install
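Instead of editing .bash_profile each time, PNETCDF can be switched on and off from the command line. This is only a sketch of one way to do it; use_pnetcdf is a made-up function, and it assumes the install prefix used in the configure line above.

```shell
# Turn the PNETCDF variable on or off in the current shell.
# Uses $DIR if set, otherwise falls back to $HOME/libraries.
# use_pnetcdf is an illustrative helper, not part of PnetCDF.
use_pnetcdf() {
    case $1 in
        on)  export PNETCDF="${DIR:-$HOME/libraries}/pnetcdf" ;;
        off) unset PNETCDF ;;
        *)   echo "usage: use_pnetcdf on|off" >&2
             return 2 ;;
    esac
}
```

The function would go in .bash_profile in place of the commented-out export; use_pnetcdf on is then run before a build that needs PnetCDF, and use_pnetcdf off afterwards.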

Building WRFx

Following Running WRF-SFIRE with real data in the WRFx system

Installing anaconda

Had to add at the end of .bash_profile

unset PYTHONPATH
source ~/.bashrc

Do this before installing Anaconda. For some reason the system sets PYTHONPATH, which throws Anaconda off, and the login shell does not source ~/.bashrc, which Anaconda modifies and relies on to implement environments.
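A slightly more defensive form of the same two lines, wrapped in a function so that a missing ~/.bashrc does not produce an error at login. The function name conda_shell_fix and its optional argument are illustrative only.

```shell
# Clear PYTHONPATH and source the interactive startup file if it exists.
# Usage: conda_shell_fix [RCFILE]   (RCFILE defaults to ~/.bashrc)
# conda_shell_fix is an illustrative helper name.
conda_shell_fix() {
    rcfile=${1:-$HOME/.bashrc}
    unset PYTHONPATH
    if [ -f "$rcfile" ]; then
        . "$rcfile"
    fi
}
```

In .bash_profile the function definition would be followed by a plain conda_shell_fix call as the last line.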