Run6 CAUTO operation
Description of the main cron job for Run6 Level2 PRDF reconstruction
Contact Person:
Charles F. Maguire
Creation Date: April 4, 2006
Last update June 22, 2006: added times of cron jobs
- Check host
The script will run only on the vmps18 gateway node. If the
script finds itself on any other node, it exits with a wrong-node message. The
script runs twice daily, at 4:05 AM and 4:05 PM.
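A minimal sketch of the host check and the cron schedule, assuming the test
uses the hostname command (the actual cauto code and its message text may
differ, and the script path below is a placeholder):

    # Hypothetical host check at the top of cauto
    if [ "`hostname -s`" != "vmps18" ]; then
        echo "cauto: wrong node `hostname -s`, exiting"
        exit 1
    fi
    # Example crontab entries for the 4:05 AM and 4:05 PM runs:
    #   5 4  * * *  /path/to/cauto
    #   5 16 * * *  /path/to/cauto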
- Check jobs still running in PBS
If there are any jobs still
running under the phenix account from a previous job submission, the
script will exit with a report of the number of jobs still running. (Note: this
restricts the use of the phenix account in PBS to Level2 PRDF
reconstruction only, unless a change is made.)
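A minimal sketch of this test, assuming the count is taken from qstat
output (the exact command and filtering used in cauto are not shown in this
document):

    # Count PBS jobs belonging to the phenix account; exit if any remain.
    njobs=`qstat -u phenix 2>/dev/null | grep -c " phenix "`
    if [ $njobs -gt 0 ]; then
        echo "cauto: $njobs phenix jobs still running, exiting"
        exit 1
    fi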
- Check the database status
This check involves three steps.
- Running of /home/phenix/prod/run6pp/test/checkcalib
The checkcalib
script executes a single-line psql command to produce a calib.out file.
The calib.out file is a list of run numbers, currently starting at
190283 and ending at 206191. The calib.out file is used by the
makefilelist.sh script described below.
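As a rough illustration only, such a one-line psql call could look like the
following; the database host, database name, table, and selection are
placeholders, since the actual query is not reproduced in this document:

    # Hypothetical single-line psql command writing run numbers to calib.out
    psql -h dbhost.example.edu -d calibrations -t -A \
         -c "SELECT runnumber FROM calibration_status WHERE status = 'OK' ORDER BY runnumber;" \
         -o calib.out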
- Running of /home/phenix/prod/run6pp/test/checkrun
The checkrun script
executes two command lines. The first command is a psql command that produces a
run.out file. The second command is a grep command on the
run.out file that produces a run.info file. The run.info
file has a single run number, currently 206859, and this information is used
to set the maxrun variable in the makefilelist.sh script.
However, as of June 16 this variable is hardcoded to a fixed value of 204639.
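A sketch of this two-command pattern follows; the query, table, and grep
pattern are placeholders and not the actual contents of checkrun:

    # Hypothetical checkrun: dump run records, then pull out the line of interest.
    # In the real script the resulting run.info holds a single run number,
    # which is used to set maxrun in makefilelist.sh.
    psql -h dbhost.example.edu -d rundb -t -A \
         -c "SELECT runnumber, runstate FROM run ORDER BY runnumber;" \
         -o run.out
    grep ENDED run.out > run.info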
- Checking OK status of the /home/phenix/prod/CONTROL file
An OK
line is checked to be the first line in the CONTROL file. That file
is sent to the ACCRE phenix account
as part of the nightly database updating done on the VUPAC farm. If there is
no OK line, then the cauto script will exit with a message to that effect.
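A minimal sketch of this first-line test, assuming a plain-text CONTROL file
whose first line should begin with OK:

    # Check that the first line of the CONTROL file starts with OK
    if head -1 /home/phenix/prod/CONTROL | grep -q "^OK"; then
        echo "cauto: CONTROL file status OK"
    else
        echo "cauto: no OK line in CONTROL file, exiting"
        exit 1
    fi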
- Move old contents using the move.sh script
This script runs the
Perl script move.pl, which functions as follows:
- The first step is to regenerate two
files called ok.txt.move and fail.txt.move, which are
located in the /home/phenix/prod/run6pp/list directory. It is not clear
how these two files are used subsequently.
- The second step is to
construct an internal filelist of the full paths of all the log.txt files
located in the subdirectories /home/phenix/prod/run6pp/output/runs/batch*/run_* .
These log files are examined to identify
PRDF runs (run number and segment number) which have failed; these are noted in
the fail.txt.move file.
- Successful reconstruction output files are placed in the
/home/phenix/prod/run6pp/output/newstore areas according to the file type
(CNT, DST, ...). These files will be transported to RCF. After the
files are transported to RCF they are moved to the equivalent
/home/phenix/prod/run6pp/output/store areas.
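A rough shell equivalent of this log scan, assuming success or failure can be
recognized from a marker string in each log.txt file (the actual marker and
the Perl logic in move.pl are not shown in this document):

    # Hypothetical scan of reconstruction logs, sorting runs into ok/fail lists
    LISTDIR=/home/phenix/prod/run6pp/list
    : > $LISTDIR/ok.txt.move
    : > $LISTDIR/fail.txt.move
    for log in /home/phenix/prod/run6pp/output/runs/batch*/run_*/log.txt; do
        rundir=`dirname $log`
        if grep -q "Successfully Completed" $log; then   # placeholder success marker
            echo $rundir >> $LISTDIR/ok.txt.move
        else
            echo $rundir >> $LISTDIR/fail.txt.move
        fi
    done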
- Execute the cleanup script clean.sh.
This script runs the
two scripts clean_batch.sh and clean_newstore.sh .
- The clean_batch.sh script removes all the files in
/home/phenix/prod/run6pp/output/runs/batch* .
- The clean_newstore.sh script
removes all the files in /home/phenix/prod/run6pp/output/newstore/fail/*/* .
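In essence the two cleanup scripts amount to the following removals (a
sketch; the real scripts may include additional checks):

    # clean_batch.sh: remove the old per-job working directories
    rm -rf /home/phenix/prod/run6pp/output/runs/batch*
    # clean_newstore.sh: remove output files from failed jobs
    rm -f /home/phenix/prod/run6pp/output/newstore/fail/*/*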
- Execute the mklist.sh script to make a list of new input files.
The mklist.sh script has three command lines. The first
executes the makefilelist.sh script, which produces files called
total_todo*.txt located in a list subdirectory.
The second command is a line count on these files. The third command
prints the date. The outputs of these commands are written to the mklist.log
file. The principal script makefilelist.sh functions as follows:
- A minimum run number is set to 189838 and a maximum run number was
formerly set to 300000, but is now set to 204639.
- A get_status script is run on the /gpfs2/RUN6PRDF/lvl2_prdf
and /gpfs2/RUN6PRDF/prod directories. Three output files are produced:
done.txt, ok.txt, and fail.txt.
- The file list is updated to produce a filelist_all.txt file.
- The filelist_all.txt file is reduced to a filelist.txt file
containing unique names.
- The ok.txt file is reduced to a total_done.txt file containing unique names.
- The difference between the filelist.txt and the
total_done.txt files is taken to produce a diff.txt file.
- The diff.txt file is reduced to a total_todo_all.txt file
by checking the relevant run range.
- Comparison is then made against the list of calibration runs. There are
four file lists prepared: todo_all_runlist.txt,
todo_runlist.txt, done_runlist.txt, and runlist.txt.
- The total_todo.txt file is copied to the history subdirectory with the
date and time appended to its name, as todo.txt_YYYY-MM-DD_HH:MM:SS.
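The core list manipulation can be pictured with standard UNIX tools; the
sketch below is not the literal content of makefilelist.sh, and it assumes
that the run number appears as the first field on each line:

    MINRUN=189838
    MAXRUN=204639
    # unique input names and unique finished names
    sort -u filelist_all.txt > filelist.txt
    sort -u ok.txt > total_done.txt
    # entries present in the input list but not yet done
    comm -23 filelist.txt total_done.txt > diff.txt
    # keep only entries whose run number lies in the allowed range
    awk -v min=$MINRUN -v max=$MAXRUN '$1 >= min && $1 <= max' \
        diff.txt > total_todo_all.txt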
- Execute the job submission script submit.sh.
The submit.sh script has two main commands. The first
command runs the Perl script submit.pl, and the second command sources the
launch.csh script. The submit.pl script acts as follows:
- An identification key is made from a time command. This
identification tag is attached to the
/home/phenix/prod/run6pp/output/runs/batchID
subdirectory name.
- A count of the number of jobs to be run is obtained from the
total_todo.txt file constructed previously. A maximum limit
of 100 jobs is preset.
- An output file steerlist.txt is constructed from the
input file total_todo.txt by eliminating a certain number of skipped
files from the beginning of the file. This number of skipped
files is presently hardcoded at 0. If there are fewer than 101 files
in the total_todo.txt list, then all of these run numbers are written
to the steerlist.txt file. If there are more than 100 files in
the total_todo.txt list, then only the last 100 are written
to the steerlist.txt file.
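The selection rule can be summarized in shell form as follows (a sketch;
submit.pl implements this in Perl, and NSKIP is the hardcoded skip count,
currently 0):

    NSKIP=0
    MAXJOBS=100
    # drop the first NSKIP entries, then keep at most the last MAXJOBS entries
    start=`expr $NSKIP + 1`
    tail -n +$start total_todo.txt | tail -n $MAXJOBS > steerlist.txt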
- The /home/phenix/prod/run6pp/output/runs/batchID directory is created
and the steerlist.txt file is copied to this directory.
- A "for" loop is executed for the variable ifile going from 0 to the
number jobs to be run. In this for loop a set of commands is constructed
to make /home/phenix/prod/run6pp/output/runs/batchID/run_ifile directories
which will be the working directories for the reconstruction jobs.
Into each of these working directories softlinks will be placed which
link to the various input files needed during the events reconstruction.
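A sketch of this per-job setup, with the batch identification tag, the input
file names in steerlist.txt, and the link targets taken as placeholders:

    BATCHDIR=/home/phenix/prod/run6pp/output/runs/batchID   # ID from the time-based key
    ifile=0
    while read prdf; do
        workdir=$BATCHDIR/run_$ifile
        mkdir -p $workdir
        # softlinks to the input PRDF and any other files needed for reconstruction
        ln -s /gpfs2/RUN6PRDF/lvl2_prdf/$prdf $workdir/$prdf
        ifile=`expr $ifile + 1`
    done < steerlist.txt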
- After the /home/phenix/prod/run6pp/output/runs/batchID/run_ifile
areas are created and filled, the next step is to create the launch.csh
script.
The launch.csh script contains as many command lines as there
are jobs to run. Each command line has three parts. The first part
changes to the working directory for the input file being processed.
The second part is a sleep for one second. The third part is a qsub
command to submit the reconstruction job into the PBS queue. The PBS control
file submitted by qsub was produced earlier, while the launch.csh script
was being constructed.
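Each line of launch.csh therefore has the following shape, where the
directory name and the PBS control file name are illustrative:

    cd /home/phenix/prod/run6pp/output/runs/batchID/run_0 ; sleep 1 ; qsub run_0.pbs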
- Count how many jobs are in the PBS queue with the phenix account name.
This count is written to the cauto.log file.