Run6 CAUTO operation
Description of the main cron job for Run6 Level2 PRDFs reconstruction

http://www.hep.vanderbilt.edu/~maguire/Run6/cautoOperations.html

Contact Person: Charles F. Maguire
Creation Date: April 4, 2006
Last update June 22, 2006: Add times of cron jobs

Operational steps for cauto production script

  1. Check host
    The script will run only on the vmps18 gateway node. If the script finds itself anywhere else, it exits with a wrong-node message. The script runs twice daily, at 4:05 AM and 4:05 PM.
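    The host check and the cron scheduling can be sketched as below. The function name and the path of the cauto script in the crontab comment are illustrative assumptions, not taken from the actual script.

```shell
# Sketch of the node check; the real cauto script's exact test may differ.
# Returns nonzero with a "wrong node" message unless running on the
# vmps18 gateway node.
check_host() {
    # $1: host name to test (normally the output of `hostname -s`)
    case "$1" in
        vmps18*) return 0 ;;
        *) echo "cauto: wrong node '$1', expected vmps18" >&2
           return 1 ;;
    esac
}

# Illustrative crontab entries for the twice-daily 4:05 AM / 4:05 PM runs
# (the path to the cauto script itself is an assumption):
#   5 4  * * * /home/phenix/prod/run6pp/test/cauto
#   5 16 * * * /home/phenix/prod/run6pp/test/cauto
```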

  2. Check jobs still running in PBS
    If any jobs from a previous submission are still running in the phenix account, the script exits with a report of the number of jobs still running. (Note: unless a change is made, this restricts use of the phenix account in PBS to Level2 PRDF reconstruction only.)
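    A minimal sketch of the running-jobs check, assuming a qstat-style listing in which each job line contains the account name in a whitespace-delimited field (the exact qstat output format, and the function name, are assumptions):

```shell
# Count jobs belonging to a given account in a qstat-style listing read
# from stdin; in production the input would come from qstat itself.
count_user_jobs() {
    # $1: account name to count (e.g. phenix)
    grep -c "[[:space:]]$1[[:space:]]" || true
}

# Sketch of the exit-early logic described above:
#   njobs=$(qstat | count_user_jobs phenix)
#   if [ "$njobs" -gt 0 ]; then
#       echo "cauto: $njobs phenix jobs still running, exiting"
#       exit 1
#   fi
```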

  3. Check the database status
    This check involves three steps.
    1. Running of /home/phenix/prod/run6pp/test/checkcalib
      The checkcalib script executes a single line psql command to produce a calib.out file. The calib.out file is a list of run numbers, currently starting at 190283 and ending at 206191. The calib.out file is used by the makefilelist.sh script described below.
    2. Running of /home/phenix/prod/run6pp/test/checkrun
      The checkrun script executes two commands. The first is a psql command that produces a run.out file. The second is a grep command on the run.out file that produces a run.info file. The run.info file has a single run number, currently 206859, which is used to set the maxrun variable in the makefilelist.sh script. However, as of June 16 this variable is hardcoded to the fixed value 204639.
    3. Checking OK status of the /home/phenix/prod/CONTROL file
      An OK line is checked to be the first line of the CONTROL file. That file is sent to the ACCRE phenix account as part of the nightly database updating done on the VUPAC farm. If there is no such OK line, the cauto script exits with a corresponding message.
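    The CONTROL-file test can be sketched as a one-line function; the psql-based checks are only outlined in comments, since their actual SQL is not given in this document:

```shell
# Sketch of the CONTROL-file test: the first line of the file must be "OK".
control_ok() {
    # $1: path to the CONTROL file
    # (normally /home/phenix/prod/CONTROL, refreshed nightly from VUPAC)
    [ "$(head -n 1 "$1")" = "OK" ]
}

# The two psql-based checks are single-command scripts; their SQL is not
# shown in this document, so only the shape is sketched:
#   checkcalib:  psql ... > calib.out          # list of calibrated runs
#   checkrun:    psql ... > run.out
#                grep ... run.out > run.info   # single maximum run number
```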

  4. Move old contents using the move.sh script
    This script runs the PERL script move.pl which functions as follows:
    1. The first step is to produce two new files, ok.txt.move and fail.txt.move, located in the /home/phenix/prod/run6pp/list directory. It is not yet clear how these two files are used subsequently.
    2. The second step is to construct an internal file list of the full paths of all the log.txt files located in the subdirectories /home/phenix/prod/run6pp/output/runs/batch*/run_* . These log files identify the PRDF runs (run number and segment number) which have failed; those failures are noted in the fail.txt.move file.
    3. Successful reconstruction output files are placed in the /home/phenix/prod/run6pp/output/newstore areas according to the file type (CNT, DST, ...). These files will be transported to RCF. After the files are transported to RCF they are moved to the equivalent /home/phenix/prod/run6pp/output/store areas.
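    The scan for reconstruction logs in move.pl can be sketched with a find pipeline. The function name is illustrative, and how move.pl decides that a given run segment has failed is not specified in this document:

```shell
# Build the internal file list of all log.txt files under the batch
# working directories, mirroring the production layout
# /home/phenix/prod/run6pp/output/runs/batch*/run_*/log.txt
list_run_logs() {
    # $1: top-level runs directory
    find "$1" -path '*/batch*/run_*/log.txt' | sort
}
```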

  5. Execute the cleanup script clean.sh.
    This script runs the two scripts clean_batch.sh and clean_newstore.sh.
    1. The clean_batch.sh script removes all the files in /home/phenix/prod/run6pp/output/runs/batch* .
    2. The clean_newstore.sh script removes all the files in /home/phenix/prod/run6pp/output/newstore/fail/*/* .
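    The two cleanup actions can be sketched as functions over a parameterized output directory, so that the production paths above appear only as defaults in the comments:

```shell
# Remove everything under the batch working areas
# (production path: /home/phenix/prod/run6pp/output/runs/batch*).
clean_batch() {
    # $1: runs directory
    rm -rf "$1"/batch*
}

# Remove the failed-output files
# (production path: /home/phenix/prod/run6pp/output/newstore/fail/*/*).
clean_newstore_fail() {
    # $1: newstore directory
    rm -f "$1"/fail/*/*
}
```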

  6. Execute the mklist.sh script to make a list of new input files.
    The mklist.sh script has three commands. The first executes the makefilelist.sh script, which produces files called total_todo*.txt located in a list subdirectory. The second is a line count on these files. The third is the date command. The outputs of these commands are written to the mklist.log file. The principal script makefilelist.sh functions as follows:
    1. A minimum run number is set to 189838 and a maximum run number was formerly set to 300000, but is now set to 204639.
    2. A get_status script is run on the /gpfs2/RUN6PRDF/lvl2_prdf and /gpfs2/RUN6PRDF/prod directories. Three output files are produced: done.txt, ok.txt, and fail.txt.
    3. The file list is updated to produce a filelist_all.txt file.
    4. The filelist_all.txt file is reduced to a unique-names file, filelist.txt.
    5. The ok.txt file is reduced to a unique-names file, total_done.txt.
    6. The difference between the filelist.txt and total_done.txt files is taken to produce a diff.txt file.
    7. The diff.txt file is reduced to a total_todo_all.txt file by keeping only entries within the relevant run range.
    8. Comparison is then made against the list of calibration runs. There are four file lists prepared: todo_all_runlist.txt, todo_runlist.txt, done_runlist.txt, and runlist.txt.
    9. The total_todo.txt file is copied to the history subdirectory under a date-and-time-stamped name of the form todo.txt_YYYY-MM-DD_HH:MM:SS.
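    The list algebra of steps 4 through 7 can be sketched with sort, comm, and awk. For the range cut it is assumed, purely for illustration, that each list line begins with the run number; the real file-name format may differ:

```shell
MINRUN=189838   # minimum run number (step 1 above)
MAXRUN=204639   # maximum run number, currently hardcoded

# Reduce a list file to unique entries (steps 4 and 5).
uniquify() { sort -u "$1"; }

# Lines present in $1 (filelist.txt) but not in $2 (total_done.txt),
# producing diff.txt (step 6); both inputs must already be sorted,
# which uniquify guarantees.
list_diff() { comm -23 "$1" "$2"; }

# Keep only lines whose leading run number lies in [MINRUN, MAXRUN]
# (step 7); the run-number-first line format is an assumption.
range_cut() {
    awk -v lo="$MINRUN" -v hi="$MAXRUN" \
        '$1 + 0 >= lo + 0 && $1 + 0 <= hi + 0' "$1"
}
```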

  7. Execute the job submission script submit.sh.
    The submit.sh script has two main commands. The first runs the PERL script submit.pl, and the second sources the launch.csh script. The submit.pl script acts as follows:
    1. An identification key is made from a time command. This identification tag is attached to the /home/phenix/prod/run6pp/output/runs/batchID subdirectory name.
    2. A count of the number of jobs to be run is obtained from the total_todo.txt file constructed previously. A maximum limit of 100 jobs is preset.
    3. An output file steerlist.txt is constructed from the input file total_todo.txt by eliminating a certain number of skipped files from the beginning of the file. This number of skipped files is presently hardcoded at 0. If there are fewer than 101 files in the total_todo.txt list, then all of these run numbers are written to the steerlist.txt file. If there are more than 100 files in the total_todo.txt list, then only the last 100 are written to the steerlist.txt file.
    4. The /home/phenix/prod/run6pp/output/runs/batchID directory is created and the steerlist.txt file is copied to this directory.
    5. A "for" loop is executed for the variable ifile going from 0 to the number jobs to be run. In this for loop a set of commands is constructed to make /home/phenix/prod/run6pp/output/runs/batchID/run_ifile directories which will be the working directories for the reconstruction jobs. Into each of these working directories softlinks will be placed which link to the various input files needed during the events reconstruction.
    6. After the /home/phenix/prod/run6pp/output/runs/batchID/run_ifile areas are created and filled, the next step is to create the launch.csh script.
    The launch.csh script contains as many command lines as there are jobs to run. Each command line has three parts. The first changes to the working directory for the input file being processed. The second is a one-second sleep. The third is a qsub command that submits the reconstruction job to the PBS queue. The PBS control file for each job was produced earlier, while the launch.csh script was being constructed.
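    The steerlist selection and the shape of the launch.csh command lines can be sketched as follows. NSKIP and MAXJOBS mirror the hardcoded 0 and the preset limit of 100; the PBS control-file name job.pbs is an illustrative assumption:

```shell
NSKIP=0      # number of files skipped from the top of the list (hardcoded)
MAXJOBS=100  # preset maximum number of jobs per submission

# Build steerlist.txt from total_todo.txt: drop the first NSKIP lines,
# then keep at most the last MAXJOBS lines (all remaining lines if fewer).
make_steerlist() {
    # $1: path to total_todo.txt
    tail -n +"$((NSKIP + 1))" "$1" | tail -n "$MAXJOBS"
}

# Emit one launch.csh command line per working directory: change
# directory, sleep one second, submit with qsub. The control-file
# name job.pbs is an assumption.
make_launch_line() {
    # $1: working directory, e.g. .../runs/batchID/run_7
    printf 'cd %s ; sleep 1 ; qsub job.pbs\n' "$1"
}
```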

  8. Count how many jobs are in the PBS queue with the phenix account name. This count is written to the cauto.log file.