MPI incident
The MPI stage of the GRIS inversion pipeline has started failing. The chain of events in the log below: VFISV's Fortran core reports "Problem in FRECR" together with NaN values and signalling IEEE floating-point exceptions (IEEE_INVALID_FLAG, IEEE_DENORMAL); shortly afterwards the mpi4py Send of the Stokes array in grisinv/invert.py aborts with "Communication error with rank 2: Connection reset by peer", and mpiexec reports bad termination of PID 107904 on itchy.leibniz-kis.de with exit code 9. A defensive sketch for the Send call follows the log.
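One detail worth noting before the log (my arithmetic, not pipeline output): the count in the failed MPI_Send, 20069500, is exactly the product of the cube shape printed just before the traceback, (500, 451, 89), so each Send ships the full float32 Stokes array at roughly 77 MiB:

    # Quick size check (assumption: vfisv_data.SI is float32, as MPI.FLOAT implies).
    nx, ny, nl = 500, 451, 89        # shape printed in the log: (500, 451, 89)
    count = nx * ny * nl             # 20069500 -> matches MPI_Send(count=20069500)
    size_mib = count * 4 / 2**20     # MPI_FLOAT is 4 bytes
    print(count, f"{size_mib:.1f} MiB")  # 20069500 76.6 MiB

With every rank holding buffers of that size, rank 2 being killed (exit code 9 below) would be consistent with memory pressure, though that is a hypothesis, not something the log proves. Full log from the failed run: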
SDC:GRIS Inversion Pipeline v0.1.0
Reading fits files 500/500: |██████████████████████████████|100% (4.3s)
Performing inversion using:
----------------------------------------------------
Very Fast Inversion of the Stokes Vector version 5.0
Node for Spectrograph data: VFISV_SPEC
J.M.Borrero | Leibniz Institut fuer Sonnenphysik.
This software is distributed under GPL 2.0 license.
You can use/modify/distribute as you wish as long
as it is under the GPL 2.0 license. Please cite:
Borrero et al. (2011) Sol.Phys. 273, 267 in your work.
----------------------------------------------------
95246.0000
95246.0000
Problem in FRECR
NaN -1763.43201 -50000.0000 15648.5137 2.99792466E+10 3747.69580 NaN 156.485138 1
-----
10.0000000 90.0000000 0.00000000 0.500000000 156.485138 3747.69580 -50000.0000 0.00000000 0.00000000 1.00000000
Note: The following floating-point exceptions are signalling: IEEE_INVALID_FLAG IEEE_DENORMAL
here.... (500, 451, 89) 95246.0
Traceback (most recent call last):
File "/home/vigeesh/conda/envs/gris_env/bin/vfisv", line 33, in <module>
sys.exit(load_entry_point('grisinv', 'console_scripts', 'vfisv')())
File "/home/vigeesh/conda/envs/gris_env/lib/python3.9/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/home/vigeesh/conda/envs/gris_env/lib/python3.9/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/home/vigeesh/conda/envs/gris_env/lib/python3.9/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/vigeesh/conda/envs/gris_env/lib/python3.9/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "grisinv/invert.py", line 410, in main
data, vfisv_data = vfisv(path, id, line, width, numproc)
File "grisinv/invert.py", line 53, in vfisv
common_comm.Send([vfisv_data.SI, MPI.FLOAT], dest=irank + 1, tag=7770)
File "mpi4py/MPI/Comm.pyx", line 272, in mpi4py.MPI.Comm.Send
mpi4py.MPI.Exception: Unknown error class, error stack:
PMPI_Send(157).............: MPI_Send(buf=0x7f7ddb370010, count=20069500, MPI_FLOAT, dest=3, tag=7770, comm=0x84000002) failed
MPID_nem_tcp_connpoll(1794): Communication error with rank 2: Connection reset by peer
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 107904 RUNNING AT itchy.leibniz-kis.de
= EXIT CODE: 9
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
[proxy:0:0@itchy.leibniz-kis.de] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:897): assert (!closed) failed
[proxy:0:0@itchy.leibniz-kis.de] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status
[proxy:0:0@itchy.leibniz-kis.de] main (pm/pmiserv/pmip.c:169): demux engine error waiting for event
[mpiexec@itchy.leibniz-kis.de] HYDT_bscu_wait_for_completion (tools/bootstrap/utils/bscu_wait.c:73): one of the processes terminated badly; aborting
[mpiexec@itchy.leibniz-kis.de] HYDT_bsci_wait_for_completion (tools/bootstrap/src/bsci_wait.c:21): launcher returned error waiting for completion
[mpiexec@itchy.leibniz-kis.de] HYD_pmci_wait_for_completion (pm/pmiserv/pmiserv_pmci.c:179): launcher returned error waiting for completion
[mpiexec@itchy.leibniz-kis.de] main (ui/mpich/mpiexec.c:326): process manager error waiting for completion
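For reference, a minimal defensive sketch for the sending side of grisinv/invert.py. This is a guess at the surrounding context, not the actual pipeline code: send_stokes is a hypothetical wrapper, and only common_comm, vfisv_data, irank, and tag 7770 come from the traceback. It refuses to ship a NaN-poisoned cube (VFISV already reported NaNs in FRECR) and turns an MPI failure into a readable error instead of the raw stack above:

    import numpy as np
    from mpi4py import MPI

    def send_stokes(common_comm, vfisv_data, irank):
        """Send the Stokes cube to the next rank, with basic sanity checks.

        Sketch only: assumes vfisv_data.SI is a float32 array, as the
        MPI.FLOAT datatype used in invert.py implies.
        """
        # Send needs a contiguous buffer; this is a no-op if SI already is one.
        buf = np.ascontiguousarray(vfisv_data.SI, dtype=np.float32)

        # VFISV printed NaNs ("Problem in FRECR"); fail loudly rather than
        # propagate an invalid inversion result to the next rank.
        n_nan = int(np.isnan(buf).sum())
        if n_nan:
            raise ValueError(f"SI contains {n_nan} NaNs; inversion output is invalid")

        try:
            common_comm.Send([buf, MPI.FLOAT], dest=irank + 1, tag=7770)
        except MPI.Exception as exc:
            # "Connection reset by peer" here usually means the peer rank died
            # first (exit code 9 suggests SIGKILL, e.g. the OOM killer), so the
            # root cause is on the other rank, not in this Send call.
            raise RuntimeError(f"Send to rank {irank + 1} failed: {exc}") from exc

None of this fixes the underlying problems, but it separates them: the NaNs from FRECR are an inversion issue upstream of MPI, while the connection reset points at whatever killed rank 2 (memory on itchy.leibniz-kis.de would be my first check).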