Bernard PREVOSTO wrote:
Hello,
Indeed, I remember: on Saturday we had a problem on the disk server, which caused problems with PBS at restart. I noticed that these 3 jobs were queued but were not restarting, so I killed them. Sorry for not having warned you.
No problem. We were wondering whether the problem came from our side (too much disk space used, too much memory, too much compute time, ...), given that this was our first large-scale test. Apparently that was not the case, so this is rather good news. I will therefore relaunch these simulation plans.
Jean Couteau
Code Lutin
Bernard
Jean Couteau wrote:
Tina ODAKA wrote:
You said you submitted 3 jobs. Can you send me all 3 job IDs? Thanks, Tina
The 3 jobs are:
99338[].service4
99198[].service4
99346[].service4
Jean
Jean Couteau wrote:
Tina ODAKA wrote:
Hi Jean, you need to find the 'PBS job id', which is the number you get back when you run qsub xxx. With this number you (or I) can type tracejob -n days xxx to see when the job died and why (days is the number of days to look back; for example, today is the 4th and you submitted on the 28th, so it is 7).
OK, so here is what I got:
poussin@service4:~> tracejob -n 7 99338[].service4
Job: 99338[].service4
11/28/2009 12:36:28 S Job Modified at request of Scheduler@service4.ice.ifremer.fr
11/28/2009 12:36:28 A user=poussin group=emh jobname=simulation-as_S queue=sequentiel ctime=1259336077 qtime=1259336077 etime=1259336077 start=0 array_indices=0-1574 Resource_List.mem=3gb Resource_List.ncpus=1 Resource_List.nodect=1 Resource_List.place=pack Resource_List.select=1:mem=3gb:ncpus=1 Resource_List.walltime=96:00:00
11/28/2009 12:37:07 L Considering job to run
11/28/2009 12:37:07 L Queue sequentiel per-user job limit reached
11/28/2009 12:37:13 S delete job request received
11/28/2009 12:37:13 S Job to be deleted at request of root@service4.ice.ifremer.fr
11/28/2009 12:37:13 A requestor=root@service4.ice.ifremer.fr
11/28/2009 12:37:20 S delete job request received
11/28/2009 12:37:20 S Job to be deleted at request of root@service4.ice.ifremer.fr
11/28/2009 12:37:20 A requestor=root@service4.ice.ifremer.fr
11/28/2009 12:37:21 S dequeuing from sequentiel, state 7
11/28/2009 12:37:21 A user=poussin group=emh jobname=simulation-as_S queue=sequentiel ctime=1259336077 qtime=1259336077 etime=1259336077 start=0 array_indices=0-1574 Resource_List.mem=3gb Resource_List.ncpus=1 Resource_List.nodect=1 Resource_List.place=pack Resource_List.select=1:mem=3gb:ncpus=1 Resource_List.walltime=96:00:00 session=0 end=1259411841 Exit_status=0
poussin@service4:~> tracejob -n 7 99198[].service4
Job: 99198[].service4
11/28/2009 12:36:17 L Considering job to run
11/28/2009 12:36:17 L Queue sequentiel per-user job limit reached
11/28/2009 12:36:24 S delete job request received
11/28/2009 12:36:24 S Job to be deleted at request of root@service4.ice.ifremer.fr
11/28/2009 12:36:24 A requestor=root@service4.ice.ifremer.fr
11/28/2009 12:37:13 S delete job request received
11/28/2009 12:37:13 S Job to be deleted at request of root@service4.ice.ifremer.fr
11/28/2009 12:37:13 S dequeuing from sequentiel, state 7
11/28/2009 12:37:13 A requestor=root@service4.ice.ifremer.fr
11/28/2009 12:37:13 A user=poussin group=emh jobname=simulation-as_r queue=sequentiel ctime=1259331670 qtime=1259331671 etime=1259331671 start=0 array_indices=0-1499 Resource_List.mem=3gb Resource_List.ncpus=1 Resource_List.nodect=1 Resource_List.place=pack Resource_List.select=1:mem=3gb:ncpus=1 Resource_List.walltime=96:00:00 session=0 end=1259411833 Exit_status=0
11/28/2009 12:37:20 S delete job request received
11/28/2009 12:37:20 S Unknown Job Id
poussin@service4:~> tracejob -n 7 99346[].service4
Job: 99346[].service4
11/28/2009 12:37:07 L Considering job to run
11/28/2009 12:37:07 L Queue sequentiel per-user job limit reached
11/28/2009 12:37:13 S delete job request received
11/28/2009 12:37:13 S Job to be deleted at request of root@service4.ice.ifremer.fr
11/28/2009 12:37:13 S dequeuing from sequentiel, state 1
11/28/2009 12:37:13 A requestor=root@service4.ice.ifremer.fr
11/28/2009 12:37:20 S delete job request received
11/28/2009 12:37:20 S Unknown Job Id
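
For anyone who wants to script this kind of check, here is a minimal Python sketch. It is not part of PBS. It assumes each tracejob record has the layout seen in the output above (date, time, a one-letter source code, then the message), and the function names split_records, deletion_requestors and tracejob_days are made up for illustration. The day count follows Tina's example (submission day and current day both counted) and is not taken from the PBS documentation; the dates assume that "the 4th" means December 4, 2009.

```python
import re
from datetime import date

# Assumed record layout, inferred from the tracejob output quoted above:
#   MM/DD/YYYY HH:MM:SS <one-letter source> <message>
RECORD = re.compile(r"(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) ([A-Z]) ")


def split_records(text):
    """Split tracejob output into (timestamp, source, message) tuples."""
    parts = RECORD.split(text)
    # parts looks like [prefix, ts1, src1, msg1, ts2, src2, msg2, ...]
    return [
        (parts[i], parts[i + 1], parts[i + 2].strip())
        for i in range(1, len(parts), 3)
    ]


def deletion_requestors(records):
    """List the distinct accounts named in 'Job to be deleted' messages."""
    marker = "Job to be deleted at request of "
    seen = []
    for _timestamp, _source, message in records:
        if message.startswith(marker):
            who = message[len(marker):]
            if who not in seen:
                seen.append(who)
    return seen


def tracejob_days(submitted, today):
    """Value for 'tracejob -n', counting both end days as in Tina's example."""
    return (today - submitted).days + 1


# A few records copied from the trace of job 99346[].service4
sample = (
    "11/28/2009 12:37:07 L Considering job to run "
    "11/28/2009 12:37:07 L Queue sequentiel per-user job limit reached "
    "11/28/2009 12:37:13 S delete job request received "
    "11/28/2009 12:37:13 S Job to be deleted at request of root@service4.ice.ifremer.fr "
    "11/28/2009 12:37:13 S dequeuing from sequentiel, state 1"
)

records = split_records(sample)
print(deletion_requestors(records))
print(tracejob_days(date(2009, 11, 28), date(2009, 12, 4)))
```

Run on the sample, this reports root@service4.ice.ifremer.fr as the only requestor of the deletion, which matches the explanation that an administrator killed the jobs, and gives 7 as the value to pass to tracejob -n.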