Contact Person:
Charles F. Maguire
WWW location
http://www.hep.vanderbilt.edu/~maguirc/PHENIX/buildSL3PHENIX.html
Creation Date: November 2, 2007
Last Revision: November 4, 2007 (explanation of the use of the build.pl script)
The purpose of this WWW site is to document the rebuilding of the PHENIX software system on the SL3 platform at Vanderbilt. This rebuilding became necessary as of September 2007, when the RACF moved to an SL4 operating system while the large ACCRE farm at Vanderbilt remained temporarily on an SL3 clone operating system. As it happens, the SL4 binaries built at RACF do not work on the SL3 clone systems at ACCRE. The ACCRE name stands for Advanced Computing Center for Research and Education, and this farm is expected to jump to an SL5-type operating system in early March 2008. Thus, the experience gained in rebuilding on SL3 should be applicable to a future rebuilding on SL5, should that also become necessary. At Vanderbilt, we plan to be doing real data reconstruction for PHENIX in Run8 as we did in Run7, as well as continuing to support the large scale simulation production for Run7 analyses prior to QM'08. Therefore, it is necessary that the PHENIX software continue to work uninterrupted on the ACCRE farm.
There is, to my knowledge, no other existing documentation for rebuilding the PHENIX software system outside of RACF. I have asked whether such documentation exists and did not receive any answer. So my starting point was to look at the build.pl script to figure out how the process works, along with using my existing knowledge of the PHENIX software system. For these reasons I don't claim that everything described below is completely accurate, and I would appreciate receiving any needed corrections.
Roughly speaking, the PHENIX software system can be viewed as having two components. The first component is a relatively static set of software which at RACF is seen in the /opt/phenix area. The second component is the set of software which is rebuilt daily from a CVS checkout, using the build.pl script mentioned in the introduction. I will explain both of these components, as I understand them.
The /opt/phenix area is actually a local symbolic link to an AFS area at RACF, specifically /afs/rhic.bnl.gov/@sys/opt/phenix. The @sys is the usual AFS token which translates according to the operating system of the user's workstation. Currently, on the SL4 operating systems at RACF, @sys translates to i386_sl4; previously, when RACF was at SL3, @sys for a workstation translated to i386_sl305. So if from an RACF workstation one lists the contents of the /opt/phenix area, one sees the following:
bin  doc  etc  include  info  lib  macros  man  root -> root-5.17.01  root-5.17.01  share  stow

One notices that PHENIX has its own built version of the ROOT software system, as opposed to using the pre-built binaries available from CERN. I leave it to the PHENIX software experts at RACF to give the exact details of why this is necessary, although I think I know in general why it has to be done.
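Incidentally, the @sys translation can be checked directly from a workstation. The following is a minimal sketch using standard AFS and shell commands (the i386_sl4 path is just the @sys token written out explicitly):

    # Print the AFS sysname that @sys will translate to on this machine
    fs sysname
    # e.g. "Current sysname is 'i386_sl4'" on an SL4 node at RACF

    # List the same area through the explicit, untranslated AFS path
    ls /afs/rhic.bnl.gov/i386_sl4/opt/phenix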
The /opt/phenix/bin area has binary versions which the PHENIX software needs to use instead of what may be normally found in the system /usr/bin area, for example. The /opt/phenix/bin area precedes the /usr/bin area in the PATH environment variable. Consequently, one will be using the PHENIX version (1.7) of the automake binary instead of using the one to be found in the /usr/bin area (1.9.2 at RACF, 1.6.3 at ACCRE). The PHENIX build system will fail with the 1.6.3 version of automake, but will work with the 1.7 version. I have not experimented with the 1.9.2 version of automake for PHENIX use. It is then obvious that if one wants to rebuild PHENIX software outside of RACF, then the PHENIX versions of the binaries in /opt/phenix/bin should be used. In turn, that means that one needs to make a tar-archive file of the /opt/phenix/bin area, and port it to the local site as a first step.
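As a hedged sketch of that first step, one could create the archive from the SL3 AFS path at RACF and unpack it at the remote site roughly as follows (the remote host name and the local installation directory are placeholders only):

    # At RACF: archive the SL3 version of the /opt/phenix/bin area
    tar -czf optPhenixBin-sl3.tar.gz -C /afs/rhic.bnl.gov/i386_sl305/opt/phenix bin

    # Copy the archive to the remote site (host and path are placeholders)
    scp optPhenixBin-sl3.tar.gz someuser@remote.site.edu:/scratch/phenix/

    # At the remote site: unpack under the local area that /opt/phenix will point to
    mkdir -p /usr/local/phenix
    tar -xzf /scratch/phenix/optPhenixBin-sl3.tar.gz -C /usr/local/phenix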
Originally, I thought that it would be sufficient to have a copy of the /opt/phenix/bin area at the local site. However, this is not enough. One must have an /opt/phenix symbolic link also working at the local site. The /opt/phenix path is hardcoded into parts of the PHENIX software system, and these parts of the PHENIX software system will fail if there is not a local version of the /opt/phenix symbolic link. At ACCRE, it took some extra pressure on my part to have such a link installed. It was considered a poor programming practice by the software engineers here, representing an extra bit of maintenance that they had to do for one group at the ACCRE farm. They did not want it to become a precedent for other groups asking for their own private symbolic links, nor our group asking for still more symbolic links. I told the ACCRE software engineers that I would investigate trying to remove this dependence from the PHENIX software, but I am not sufficiently knowledgeable of the whole software system to do that at this time.
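The local link itself is simple to create once the ported area is in place; a sketch, with /usr/local/phenix again standing in for wherever the archives are unpacked:

    # Create the /opt/phenix symbolic link pointing at the locally ported area
    # (normally requires root privileges; the target path is a placeholder)
    ln -s /usr/local/phenix /opt/phenix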
A last point on this subject is that the compute nodes on the ACCRE farm do not have access to AFS. So even if I were able to set up an /opt/phenix pointer to an AFS area at RACF, and assuming that there was no confusion with the @sys translation token, these compute nodes would not be able to see the needed libraries. I believe this would affect at least the database access software on these compute nodes. For this same reason, when we run PISA or data reconstruction at Vanderbilt we have to provide local copies of the PHENIX input files which are normally found in AFS areas when running at RACF.
From the above description it becomes clear that one will have to make archive copies of various subdirectories of the SL3 version of /opt/phenix and port these to the local system. They must be visible under the local /opt/phenix/... path. In particular, I have found it necessary to port archives of the following subdirectories to ACCRE:
bin  etc  include  lib  root -> root-5.16.00  [version pointed to by the Sep. 16 OFFLINE_MAIN area, which was the last one at SL3]  root-5.16.00  share  stow

You might notice in the above listing that root-5.16.00 is apparently being used. In reality, this ROOT version starts up showing 5.17.01 even though the libraries are marked as 5.16.00. When I first noticed that discrepancy several weeks ago, I posted an inquiry on phenix-off-l, but received no response. I am assuming that 5.17.01 is actually being used, and someone has mis-typed something. Another possibility is that there is a mixture of 5.16 and 5.17 versions of ROOT that was found to be necessary at the end for SL3.
The PHENIX database is used during real data reconstruction in order to obtain the run-dependent calibration constants and other run-dependent information. The database also contains geometry information about the detector which is used in the simulation and real data reconstruction. The database has not been accessed by the PISA program in the past for the generation of simulation data. A historical side benefit of that feature was that PISA could run on remote site systems where the database was not installed, or could not be installed due to OS issues. As of 2007, the phnx.par geometry file information used by PISA has been put in the database, with the idea that it would be easier to maintain that way, especially in view of the pace of recent upgrade installations. The PISA program can still run without having access to the database if a phnx.par ASCII file is used as before.
In the transition from the SL3 to the SL4 operating system, some new wrinkles were added to the database environment for PHENIX. The database environment has depended upon the presence of a local /opt/phenix/etc directory. For the SL3 version of the PHENIX software, this directory contained the following files at RACF:
odbc.ini  odbc.ini.master  ODBCDataSources -> ../stow/unixODBC-2.2.11/etc/ODBCDataSources  odbc.ini.phenix  odbcinst.ini  odbcinst.ini.irina  odbcinst.ini.old  odbcinst.ini.phenix  wgetrc -> ../stow/wget-1.8.2/etc/wgetrc

By comparison, the SL4 version of this directory contains:
odbc.ini  odbc.ini.master  odbcinst.ini  ODBCDataSources -> ../stow/unixODBC-2.2.12/etc/ODBCDataSources  odbc.ini.phenix  odbcinst.ini.irina  odbcinst.ini.old  odbcinst.ini.phenix  wgetrc -> ../stow/wget-1.8.2/etc/wgetrc

One sees that there has also been a transition to unixODBC-2.2.12 from unixODBC-2.2.11 in the transition to SL4 from SL3. For the purposes of porting the software to a remote site, the two important files in this subdirectory are odbc.ini and odbcinst.ini. The PHENIX database software system appears to be hardcoded to look for these two files in the /opt/phenix/etc subdirectory, or, in the case of odbc.ini, for an .odbc.ini substitute file in the user's home directory, if present. Upon closer inspection, it can also be seen that there has been a change in the odbcinst.ini file during the transition from SL3 to SL4. The SL4 version of odbcinst.ini contains a reference to a file named /opt/phenix/lib/psqlodbcw.so while the SL3 version has the name /opt/phenix/lib/psqlodbc.so. That is, a w has appeared in the name of this library for the SL4 version. There is no /opt/phenix/lib/psqlodbcw.so file in the SL3 library set, but there is a /opt/phenix/lib/psqlodbc.so file. There is a corresponding change in the associated .la files for these two libraries.
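For readers who have not looked inside a unixODBC odbcinst.ini file, the library named above appears in a driver stanza of that file. The following is an illustrative sketch only, not the actual contents of the RACF file; only the Driver line reflects the psqlodbcw.so name discussed above:

    # Inspect the driver definitions seen by unixODBC
    cat /opt/phenix/etc/odbcinst.ini
    # [PostgreSQL]
    # Description = PostgreSQL ODBC driver (illustrative entry)
    # Driver      = /opt/phenix/lib/psqlodbcw.so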
The net effect of these changes in the SL3 to SL4 transition is that if one tries to run a 'pure' SL3 version of the currently checked-out version of the database software at a remote site, then the following error messages will appear during an attempted database access for reconstruction:
/gpfs0/home/phnxreco/new/source/offline/database/pdbcal/pg/RunToTimePg.cc:81: Fatal Exception caught during DriverManager::getConnection Message: Failed to connect to datasource: [unixODBC][Driver Manager]Can't open lib '/opt/phenix/lib/psqlodbcw.so' : /opt/phenix/lib/psqlodbcw.so: cannot open shared object file: No such file or directory

The new version of the database software, as checked out from CVS, is apparently predisposed to look for the /opt/phenix/lib/psqlodbcw.so library. On October 31 I reported this problem to several of the PHENIX software experts at BNL. I received a prompt reply that same day from one of them saying that this was a real issue, and it would be investigated as to how to do a final correction. A little later that same day, while waiting for the official correction information to arrive, I decided to do a 'quick-and-dirty' fix. That fix involved making the following two softlinks in the local /opt/phenix/lib SL3 area:
lrwxrwxrwx 1 phnxreco rhic 11 Oct 31 15:19 psqlodbcw.la -> psqlodbc.la
lrwxrwxrwx 1 phnxreco rhic 11 Oct 31 15:20 psqlodbcw.so -> psqlodbc.so

The above softlinks have the effect of tricking the SL4 version of the PHENIX database software into using the SL3 version of these two files. Mixing different generations of files is not generally a wise thing to do, and I wasn't really expecting it to work. Surprisingly enough, however, it did work, to the point that the database software did not overtly complain that something was wrong. The above run-time error messages no longer appeared, and the reconstruction output seems to be normal. Further checking will be needed to see if this trickery strategy is completely working.
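The commands that produce those two softlinks are simply:

    # Quick-and-dirty fix: make the SL4-expected psqlodbcw names point to the SL3 psqlodbc files
    cd /opt/phenix/lib
    ln -s psqlodbc.la psqlodbcw.la
    ln -s psqlodbc.so psqlodbcw.so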
The setting up of a local PHENIX database server system, and the procedures for doing daily
updates for that server, are beyond the scope of this web site. If someone at another remote site
needs help in doing these things, they are welcome to contact us at Vanderbilt for assistance.
We have had a long, and at times painfully won, experience in setting up PHENIX database servers
here to work for both real data reconstruction and for simulation data reconstruction. One other
item to finish the discussion of database porting issues is that at Vanderbilt we also use a specific
'setenv' command to configure the database environment.
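The exact setting depends on our local database configuration. As an illustration only, ODBCINI is the standard unixODBC environment variable for pointing the ODBC layer at a particular odbc.ini file, and a csh setting of that general form (the variable and the file path shown are illustrative of the kind of command involved, not necessarily the exact one used here) would look like:

    # Illustration only: point unixODBC at a locally maintained odbc.ini
    setenv ODBCINI /home/phnxreco/odbc.ini.local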
This section contains the locations of gzipped tar files which are copies of the /opt/phenix subdirectories developed at Vanderbilt. The original versions of these files are based on the August 6 version of the SL3 /opt/phenix area at RACF. That date was apparently the last change in these subdirectories before the transition to SL4, which was made in September. The tar file copies are all located in the /phenix/subsys/sim/maguire/SL3/optPHENIX subdirectory at RACF. In addition, I indicate a daily rebuild archive file which worked as of November 2, where the rebuild was done on the SL3 Vanderbilt farm. This file is located in the /phenix/subsys/sim/maguire/SL3 area.
There is one caution to be attached to the above testing. The /opt/phenix area seen by the test jobs was the SL3 version of the /opt/phenix area at RACF. So even though the libraries and binaries from the tar files above were the only ones in the explicit path dependencies of the test jobs, any of that software which had hard-coded references to /opt/phenix would have been using the RACF version. There has been work on the SL3 RACF version in early November, perhaps as fallout from this need to rebuild the software on SL3, whereas the last previous change in that version had been in early August.
Finally on the subject of the /opt/phenix software, it will be important to have instructions on how to build those binaries and libraries from a CVS checkout. My assumption is that there does exist an automated procedure for doing this build, since otherwise re-doing the build manually during each OS transition would be exceedingly tedious. In this regard, I am looking ahead to next March, when it may become necessary to have an SL5 version of /opt/phenix working at Vanderbilt.
The build.pl script is used daily at RACF to checkout the latest CVS versions of all the PHENIX software and then rebuild these libraries in a well defined sequence. This ordered sequence is specified by a packages.txt ASCII file. The different packages (or modules) must be compiled in a given order because some modules depend on classes which were defined in a previous module. The packages.txt includes the on-line and event builder systems, the phool system, the database system, the simulation system, the off-line reconstruction system, and the Analysis Train system in that order.
As mentioned previously, I am not aware of any external documentation of build.pl. So my knowledge of how it works comes from reading the script itself and seeing how it performs. The script can take as many as 8 different options, besides a 'help' request. Among these options is one specifying which CVS tag to use, as might be the case if one were building a new production ('pro') version for a given Run. At present I run the build without any options, except that I specify a pre-existing source directory to avoid the long delay needed to check out the whole 'phuniverse' of sources specified by the packages.txt file. I will go back to checking out all the code when I have set up a daily, overnight rebuild at Vanderbilt.
A second time-saving change I have made is to use a modified packages.txt file which omits the Analysis Train modules. For purposes of real data production and simulation projects at Vanderbilt, we will not need those modules. Instead, for now, I have a simple analysis job which scans the simulated CNT output file.
The build.pl script, when initiated without any input options, will create a new directory in the user's home directory. Then a new/source directory will be created to store the contents of the source files obtained from a CVS checkout command. After the CVS checkout is completed, a new/build directory will be created to receive the output of cycling through the autogen.sh scripts for all the source Makefile.am files. Finally, an install.1 directory will be created with bin, include, and lib subdirectories which collect the results of all the compilation and loading steps. This install.1 directory is essentially what constitutes a new OFFLINE_MAIN area as it appears after each successful rebuild at RACF. A copy of one such install.1 directory has been provided in the /phenix/subsys/sim/maguire/SL3 directory at RACF. Again, this install area built at Vanderbilt does not contain the Analysis Train modules.
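A sketch of the resulting directory layout, as I understand it from the description above and from the error-message path quoted earlier (whether install.1 ends up inside the new directory or beside it is an assumption here):

    # Layout produced by a build.pl run started from the user's home directory
    ls ~/new
    #   source/      the checked-out CVS sources (e.g. .../new/source/offline/...)
    #   build/       the autogen.sh / compilation work areas
    #   install.1/   the new OFFLINE_MAIN-style installation
    ls ~/new/install.1
    #   bin/  include/  lib/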
The quality assurance checking for the SL3 build results has barely begun. Ideally we want the results from the SL3 build to be identical to the output from the SL4 build in all respects. For this purpose we can use the pisaRootRead output NTUPLEs to compare PISA output results, and the simCNTCheck output NTUPLE to check the simulated reconstruction. I have a similar real data checking module which I used to track down problems during the early Run7 minimum bias production with pro.76 at Vanderbilt in May of this year. I will post these comparison results as they become available. We have a separate, small SL4 farm at Vanderbilt which we can use to rapidly generate SL4 results using a standard port of the RACF-built software to this farm.