NVIDIA – The Perforce / IC Manage Environment at NVIDIA

Doug Quist: NVIDIA, Shiv Sikand: IC Manage, Henry Grishashvili: IC Manage

NVIDIA uses the Perforce SCM tools as its primary data management solution. The flexibility and open architecture of this system allow customization for the various needs and methodologies of departments throughout the company. Its scalability enables efficient support of almost four thousand users. The robustness of the Perforce server allows extended system uptime with 24/7 usage, limited only by scheduled hardware and OS maintenance requirements.

IC Manage (ICM) solutions further extend the capabilities of Perforce. ICM replication software creates live copies of the Perforce server. These copies provide read-only access to the Perforce database, fail-over nodes, and backup systems that allow daily checkpoint creation with no interruption of Perforce user activity. Use of read-only servers dramatically reduces the load on the main Perforce server, allowing large sync jobs from automated build and test farms to occur without interfering with other Perforce tasks. Overall IC Manage solutions greatly enhance the reliability of the Perforce environment by providing “hot” fail-over nodes and daily backups, and by distributing the server load away from the primary server to multiple read-only nodes.

1. Legacy

Initially Perforce was deployed at NVIDIA as a single repository. Its use grew rapidly as users began to appreciate its functionality and ease of use. Source code, documents, binary files – all data that represented value to the company was submitted to Perforce. This made data easy to manage as it was secure and available to all interested users throughout the company. The server was installed on a Sun/Solaris Starfire host with SAN storage connected via a fiber channel link. The Perforce database files and depot were located on a SAN volume which was continuously backed up.

Ultimately the Perforce depot size, large number of users and heavy usage patterns began to cause performance problems. Large numbers of users running sync operations resulted in file system lock contentions on the database and long wait times as a result. A decision was made to split the single Perforce repository into multiple independent servers in order to reduce the load on each individual server, and to help alleviate the lock contention problem. The two largest Perforce server instances after the split were the p4hw and p4sw servers (used primarily by the Hardware engineering and Software engineering groups respectively). A number of additional Perforce servers were created and assigned to other departments. All servers resided on a single Sun/Solaris host with SAN storage for database and depot files.

2. Evolution

As the p4hw and p4sw Perforce servers grew larger the performance problems resulting from CPU load, high virtual memory use and file system lock contentions returned. At peak usage times there were hundreds of p4d processes all sharing a single set of host resources. The first action of the IC Manage plan to improve the NVIDIA Perforce environment was to move each Perforce server to a dedicated Linux/x86 host with expanded RAM and SCSI direct attached storage (DAS) for Perforce database files.

The DAS approach was taken specifically to improve file system random I/O performance and reduce access latencies. It was configured as a 10-disk RAID-10 array in an attempt to maximize the sequential throughput and minimize file seek times. Currently the storage subsystem of each Perforce server at NVIDIA is configured as a 12 SAS disk RAID-10 array driven in parallel by two P800 (HP) controllers. Such a configuration yields a 400MB/sec sequential read/write rate and 100MB/sec random read/write rate (as measured by bonnie++).

Sixty-four GB of physical memory, together with the XFS file system, provided good file system caching and further improved overall random I/O performance. In addition, eight x86 CPUs were installed in each server to eliminate the possibility of a CPU bottleneck when running large numbers of p4d processes. Perforce depot volumes were left on a NetApp NAS filer mounted over NFS with a Gigabit Ethernet connection to each server. The large size of the depot (over 10 TB) made it impractical to move it to the DAS, and the random I/O performance of these volumes is not nearly as critical for Perforce operation as that of the database volume. Currently the Perforce servers at NVIDIA are powered by 8 CPUs and 256 GB RAM.

The result of this re-architecture was a significant improvement of Perforce performance immediately evident to the user community. One metric of the improved performance was journal file growth at three to four times the prior rate, suggesting a much larger number of Perforce transactions were being executed daily.

Another IC Manage innovation introduced to the NVIDIA Perforce environment was a fail-over configuration. Each Perforce server and its live replica were paired with a Linux high availability system. The IC Manage software enabled near real-time Perforce replication to its fail-over pair with only a few seconds of maximum lag time at peak loads. In the event of a hardware/software failure on the main node the Linux high availability system would redirect Perforce service to the fail-over node, making this failure practically transparent to users (it cannot be completely transparent since pending Perforce transactions at the moment of failure would break and have to be repeated).

Fail-over instances were installed for the most critical and most heavily used p4hw and p4sw servers. Another feature of the Perforce fail-over configuration was the convenience of software/hardware system upgrades and maintenance, such as CPU, RAM or Linux kernel upgrades. The upgrades were first done on a fail-over node then the Perforce service was switched to the fail-over node allowing the upgrade to be performed on the main server. This approach allowed server downtime to occur without interruption to the user community.

The read-only server was another enhancement initiated by IC Manage. A read-only server is a separate Perforce server running on a dedicated host powered by IC Manage replication software. Practice showed that there were a large number of Perforce clients (both users and automated scripts) that did not submit data or otherwise modify the Perforce database, other than using sync commands. A sync does modify the db.have table in the database and as such is not strictly a read-only operation. A large number of such clients were switched from using the main Perforce server to using its read-only replica. There were two read-only servers installed initially – one for p4hw and another for the p4sw Perforce server. The result was an improvement of the user experience for both the read-only server clients and clients of the primary server. Read-only servers helped to significantly reduce the load on main Perforce servers.

3. Network architecture

All Perforce servers, fail-over nodes and read-only servers are mounted in the same rack cabinets in the data center. It is preferable that a Perforce server and its fail-over pair are located adjacent to each other because they need to be connected with an Ethernet crossover cable and a serial cable required for the Linux high availability service. The location of the read-only servers is not critical as long as they are connected to the same LAN. Currently the Perforce journal growth rate at NVIDIA is up to 8 GB per day. Such a high rate can cause high replication latencies at peak times when replicating over a WAN, but will not cause a network bottleneck when replicating over Gigabit Ethernet, even across different subnets. The centralized location of all Perforce servers makes it convenient for IT personnel to perform administrative tasks.

4. Fail-over Perforce servers

One of the first applications of the IC Manage replication technology at NVIDIA was the introduction of fail-over servers. Based on Linux High-Availability technology, the fail-over server acts as a “hot swap” for the entire server hosting the Perforce service. The Linux High-Availability cluster typically consists of two identical servers.

User access to the cluster is provided through a virtual IP address assigned to one of the two servers. The server holding the virtual IP is the one currently providing all services to users, in our case the Perforce service; we’ll call it the “main” node of the cluster. The other node, which we’ll call the “replica”, is a live, real-time copy of the “main” node. IC Manage replication technology keeps the “replica” up to date with the “main” server with minimal lag time. In the case of a catastrophic failure on the “main” server (hardware malfunction, OS failure, etc.) the Perforce service on the “main” node is shut down and the virtual IP address is released; the virtual IP is then acquired by the other node and the Perforce process is started there. Thus the entire fail-over happens quickly and with minimal disturbance to users.
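A two-node cluster of this kind can be expressed in a classic Linux-HA (Heartbeat v1 style) configuration. The node names, virtual IP address, interfaces and resource name below are illustrative placeholders, not NVIDIA’s actual values:

```
# /etc/ha.d/ha.cf -- heartbeat exchanged over the dedicated crossover
# link and the serial cable mentioned in the network architecture section
serial /dev/ttyS0
bcast eth1
keepalive 2
deadtime 30
node p4hw-a
node p4hw-b
auto_failback off

# /etc/ha.d/haresources -- p4hw-a normally holds the virtual IP and
# runs the resource script that starts/stops the Perforce service
p4hw-a IPaddr::10.1.2.50/24/eth0 p4d
```

With `auto_failback off`, the service stays on the surviving node after a failure until an administrator deliberately fails it back, which matches the manual switch-over used for maintenance upgrades.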

5. Read-only Perforce servers

Large build system and verification system farms at NVIDIA are the primary users of read-only Perforce servers powered by IC Manage replication software. These systems run massive Perforce transactions such as syncing the entire depot tree. Each p4d process executing such a transaction can allocate up to 2 GB of virtual memory, so it is crucial to minimize the run time of each transaction in order to prevent memory starvation of the system. The fast I/O subsystem described previously, with its large RAM-based file system cache, helps significantly to reduce the runtime of each transaction. In addition, special feedback mechanisms were developed by the IC Manage support team to manage the number of Perforce transactions originating from the build systems and prevent virtual memory exhaustion of the server.
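A feedback mechanism of the kind described above can be sketched by counting the sync transactions currently reported by ‘p4 monitor show’ and holding new build-farm syncs until the count drops below a limit. This is an illustrative sketch, not the actual ICM implementation; the threshold and the sample monitor output are made up.

```shell
#!/bin/sh
# Hypothetical throttle: count p4d processes currently running 'sync'
# (as listed by 'p4 monitor show') and compare against a limit before
# letting another build-farm sync start.

MAX_SYNCS=20   # illustrative threshold

count_syncs() {
    # $1 holds 'p4 monitor show' output; each line looks like:
    #   1234 R user 00:01:02 sync
    # field 5 is the command name.
    printf '%s\n' "$1" | awk '$5 == "sync"' | wc -l
}

sample='101 R builder 00:00:10 sync
102 R builder 00:00:12 sync
103 R alice 00:00:01 edit'

count_syncs "$sample"   # two of the three processes are syncs
```

In production the sample would be replaced by a live `p4 monitor show` call against the read-only server, and the caller would sleep and retry while the count is at or above `MAX_SYNCS`.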

There are two groups of automation systems at NVIDIA using read-only servers. Their usage patterns are different and each group has its own dedicated read-only replica Perforce server. The DVS group has a large number of Perforce clients, all of which are created on the read-only server and are synced against it. As the information stored in the database of this server is unique and different from the main server (particularly db.have, db.view and db.domain), backup is required. For this purpose IC Manage introduced a “replica of the replica” concept – another server which is replicated both from the main Perforce server and from the read-only server. The use model of the read-only server ensures that there are no conflicts in such replication because the clients used on the read-only server are unique to it. The replica of the read-only server is used to create off-line checkpoints (see checkpoint creation details in chapter 6).

The other team utilizes a different approach – for each verification procedure it creates a temporary Perforce client and syncs it. After the verification procedure is completed the Perforce client and related data can be discarded. Thus the read-only server used by this team contains no unique persistent information and can be rebuilt if necessary from the main Perforce server's checkpoint.

Read-only servers are also used by other departments and individuals. For example: there is a dedicated Perforce replica used for gathering statistical information and producing various reports, and another replica on which the engineering teams run massive ‘p4 integrate -n’ trial transactions.

6. Off-line checkpoint creation

Another use of Perforce replicas is off-line checkpoint creation. Currently the NVIDIA Perforce database size is close to 500 GB (~800 GB for the read-only servers) and the checkpoint size is ~25 GB gzipped (it varies between p4sw and p4hw). Checkpoint creation takes approximately eight hours. Considering that Perforce is in use 24/7 by NVIDIA offices around the world, even eight hours of downtime per year would be unacceptable, let alone that amount each day. One of the features provided by the IC Manage replication software is an automatic checkpoint creation process on the replica server, initiated by a journal rotation procedure on the main Perforce server.

When the replication software detects a journal rotation on the main server it finishes replaying the last (newly created) rotated journal and, before starting to replay the new live journal, creates a checkpoint with the ‘p4d -jd’ command. It uses the -jd rather than the -jc option specifically so as not to rotate the journal or change the journal counter on the replica server. This IC Manage checkpoint creation methodology can be used on a regular replica as well as in ‘replica of the replica’ configurations. NVIDIA has a total of five replicas for daily checkpoint creation: p4sw, p4hw, p4sw-ro, p4hw-ro and p4-icmcds.
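The replica-side checkpoint step can be sketched as below. The paths are illustrative placeholders; the sketch prints the command (a dry run) rather than executing it against a live database, but the flags are standard p4d ones: ‘-jd’ dumps a checkpoint without rotating the journal or bumping the journal counter, and ‘-z’ compresses the output.

```shell
#!/bin/sh
# Dry-run sketch of the checkpoint command issued on the replica once
# the last rotated journal has been replayed. P4ROOT/CKPDIR are
# illustrative locations, not NVIDIA's actual paths.

P4ROOT=/p4/replica/root          # assumed replica database location
CKPDIR=/p4/replica/checkpoints   # assumed checkpoint destination

make_checkpoint_cmd() {
    # $1 = number of the rotated journal just replayed
    echo "p4d -r $P4ROOT -z -jd $CKPDIR/checkpoint.$1.gz"
}

# Print the command instead of running it against a live server.
make_checkpoint_cmd 42
```

In the real flow this command runs between replaying the rotated journal and resuming replay of the live journal, so the checkpoint reflects a consistent database state.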

7. ICM Broker

The ICM Broker runs as a switch between users and the Perforce server. It keeps a customizable list of ‘true read-only’ Perforce transactions, i.e. transactions that don’t modify the Perforce database in any way. Examples of such transactions are ‘p4 files’, ‘p4 filelog’, ‘p4 fstat’ or various commands run with the -o option. The ‘p4 sync’ command, on the other hand, which writes information into the db.have table, is not a true read-only command. The ‘true read-only’ transactions are redirected by the ICM Broker to one or more Perforce server replicas. Because these transactions don’t modify the state of the Perforce server, there can be any number of such load-balancing Perforce replicas. The ICM Broker, like the load-balancing servers, is stateless; thus nodes can be added to or removed from the cluster “on the fly” without service interruption.

Transaction redirection is implemented in the same way as the Perforce Proxy server and is completely transparent to users. One example of where the ICM Broker application is useful is an environment where graphical Perforce clients such as p4v are used. When a p4v user is browsing through the Perforce repository p4v generates a multitude of Perforce queries like ‘p4 dirs’, ‘p4 files’, ‘p4 fstat’ etc. All of these transactions can be redirected to one of the load balancing replicas thus eliminating delays related to high load on the main server.

ICM Broker can also manage transactions based on customizable internal broker rules and the Perforce command, user name, user IP address and arguments. For example, using broker rules it is possible to prohibit users from running commands such as ‘p4 sync //…’ which are rarely intended and place an inordinate stress on the server.
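The routing decision described in this chapter can be sketched as a simple classifier: true read-only commands go to a load-balancing replica, a bare full-depot sync is rejected, and everything else goes to the main server. This is an illustrative sketch of the logic, not the real ICM Broker rule engine; the command names follow the examples in the text.

```shell
#!/bin/sh
# Hypothetical routing sketch: decide per command whether to send it to
# a load-balancing replica, reject it, or pass it to the main server.

route() {
    cmd=$1; args=$2
    case "$cmd" in
        files|filelog|fstat|dirs)
            echo replica ;;          # true read-only: safe on a replica
        sync)
            case "$args" in
                '//...') echo reject ;;  # full-depot sync: rarely intended,
                                         # inordinate stress on the server
                *)       echo main ;;    # sync writes db.have, not read-only
            esac ;;
        *)
            echo main ;;             # everything else hits the main server
    esac
}

route fstat //depot/hw/...   # -> replica
route sync '//...'           # -> reject
route submit ''              # -> main
```

A real broker would also consult the user name, user IP address and full argument list, as the text notes, before choosing a destination.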

8. Scratch servers

Scratch servers are temporary Perforce servers used by various NVIDIA teams during a project's development life cycle. It is convenient for collaborating engineers to submit their work into the Perforce SCM, but the majority of the data can be discarded after the project is completed. Only the final results get submitted to the main Perforce servers; after that the depot and database of the scratch server are simply deleted and the server is started fresh. IC Manage Sync Engine software is used by NVIDIA with scratch servers to effectively propagate changes for selected clients/users. The benefit of using Sync Engine is that it allows a large number of clients to be kept constantly synced to the head revision of the entire depot (or any client view) without running costly ‘p4 sync //depot/…’ commands for each monitored client. See chapter 10 for more details on the IC Manage Sync Engine.

9. Administration and user support

The robustness and simplicity of the Perforce server architecture and its ease of administration allows just three full-time support specialists to manage all 20 servers deployed at NVIDIA. Perforce support engineers at NVIDIA, contractors employed full time at IC Manage, perform a wide variety of tasks ranging from managing Perforce user help requests to server hardware/OS/software maintenance and upgrades. They also develop various scripts for the Perforce environment and/or users such as monitoring scripts or triggers.

All of the information about the NVIDIA Perforce environment, such as host names, IP addresses, port numbers etc., is submitted into one of the Perforce servers and is used by the automated administration scripts. A new-user creation script, for example, goes through all Perforce servers to add the new user in order to keep Perforce users and groups in sync across all servers. A Perforce protections table modification script stores protection table changes, along with corresponding comments, as text files submitted into one of the Perforce servers.
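The cross-server user-creation loop can be sketched as below. The server names and ports are placeholders, and the sketch only prints the commands it would run; the ‘p4 user -o | p4 user -i -f’ pattern requires superuser privileges on each server.

```shell
#!/bin/sh
# Dry-run sketch of a new-user script: walk the server list and emit the
# command pair that would create the account on each Perforce instance.
# SERVERS is an illustrative list, not NVIDIA's real topology.

SERVERS="p4hw:1666 p4sw:1666 p4doc:1666"

new_user_cmds() {
    # $1 = user name to create on every server
    for srv in $SERVERS; do
        echo "p4 -p $srv user -o $1 | p4 -p $srv user -i -f"
    done
}

new_user_cmds jdoe
```

In production the emitted pipelines would be executed rather than printed, keeping users and groups identical across all servers as the text describes.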

The same team of Perforce engineers also supports other IC Manage software such as the replication software, Sync Engine and ICM Broker. Along with that IC Manage offers 24/7 support remotely or on site for any Perforce or IC Manage related problems. IC Manage Perforce support engineers also provide training activities for NVIDIA IT engineers or new Perforce users.

10. Perforce and IC Manage software at NVIDIA

As described earlier, the IC Manage replication software creates and maintains a near real-time Perforce replica server. It uses the Perforce journal file to replay the main server's transactions on the replica. Unlike the Perforce jrep tool, icmrep uses a socket connection and journal copying for replication (there is always an up-to-date copy of the journal on the replica machine, so loss of the host does not lead to journal loss). It uses an ssh secure communication channel for inter-server data transfers. Broken connections are automatically restored and replication continues without interruption. Icmrep doesn’t run queries against the main Perforce server and runs only a very lightweight process on the main server host, so it is not adversely affected by server performance degradation at peak Perforce usage times. It automatically handles journal rotations and is capable of producing checkpoints between journal rotations if configured as a backup replica.
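The core of journal-based replication can be illustrated with a toy sketch: remember a byte offset into the journal, copy only the newly appended records, and replay them on the replica (with ‘p4d -jr’ in the real case). This is a deliberately simplified stand-in for icmrep, using local demo files in place of the ssh channel.

```shell
#!/bin/sh
# Toy journal-tailing sketch: each call emits only the journal bytes
# appended since the previous call, tracking progress in an offset file.
# File locations are illustrative.

JNL="${TMPDIR:-/tmp}/journal.demo"
OFFSET_FILE="${TMPDIR:-/tmp}/journal.offset"
rm -f "$OFFSET_FILE"    # start from a clean offset for the demo

pull_new_records() {
    off=$(cat "$OFFSET_FILE" 2>/dev/null || echo 0)
    # tail -c +N starts at byte N (1-based), so skip 'off' bytes
    tail -c +$((off + 1)) "$JNL"
    wc -c < "$JNL" > "$OFFSET_FILE"
}

printf 'rec1\n' >  "$JNL"; pull_new_records   # first call emits rec1
printf 'rec2\n' >> "$JNL"; pull_new_records   # second call emits only rec2
```

In the real system the new records travel over the ssh channel and are replayed on the replica's database, and the offset survives reconnects, which is how broken connections resume without interruption.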

The ICM Sync Engine keeps a list of managed Perforce clients. Every time a new change list is submitted into Perforce it matches the submitted file names against the managed client views. If there is a match it runs a script on the remote host specified in the client and that script selectively syncs files that were submitted. This way Sync Engine is able to keep a large number of clients synced to the head revision (i.e. user workspaces or workspaces of automated build or QA systems), without requiring p4 sync transactions to be run against the entire client view.
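The matching step above can be sketched by comparing each submitted depot path against a managed client's view prefixes. This is a minimal illustration (the real Sync Engine handles full Perforce view syntax, exclusions and remote script dispatch); the view prefixes are invented examples.

```shell
#!/bin/sh
# Sketch: decide whether a submitted file falls inside a managed
# client's view, i.e. whether the client's remote selective-sync
# script would be triggered. VIEW holds illustrative prefixes.

VIEW="//depot/hw/chip1/ //depot/hw/common/"

matches_view() {
    for prefix in $VIEW; do
        case "$1" in
            "$prefix"*) return 0 ;;   # file is inside this view prefix
        esac
    done
    return 1                          # no prefix matched
}

if matches_view //depot/hw/chip1/rtl/top.v; then echo trigger; fi
if matches_view //depot/sw/driver/main.c; then echo trigger; else echo skip; fi
```

On a match, the Sync Engine would run the client's script on its remote host to sync just the submitted files, avoiding a full-view ‘p4 sync’.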

11. Future Projects (IC Manage puller)

One of the upcoming IC Manage projects at NVIDIA is Perforce replication to a remote disaster recovery site. Such a site is physically located away from the main data center, to serve as a backup in the event of an irrecoverable data center loss. The IC Manage technologies described earlier are not sufficient to support this kind of replication because all Perforce depots at NVIDIA are located on an NFS volume mounted by all Perforce servers and replicas; the replication technology described earlier in this paper is “metadata only” replication.

“Puller” is a product developed by IC Manage to support “full” Perforce replication. The puller runs in tandem with the icmrep engine and transfers Perforce depot files to the replica server as icmrep replicates the metadata. Transaction-based replication following a “depot files first, then metadata” scheme guarantees the consistency of the Perforce replica even in the event of a broken replication transaction. For each replication transaction the puller queries the main Perforce server to obtain the list of files for a specific changelist number (or numbers). This list is then converted to a list of physical files on the file system and scheduled for transfer with the rsync tool. Using rsync as the transport has several advantages: the security of the ssh channel, compression, delta copying, and incremental transfer.
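The per-change transfer step can be sketched as below: the depot path of each file in the change (from ‘p4 files @=N’) is mapped to its on-disk archive, and the resulting list is handed to rsync. The archive layout shown is the simple text-file case (revisions stored in RCS-style ‘,v’ files; binary revisions live under ‘,d’ directories instead), and the paths and hosts are illustrative placeholders.

```shell
#!/bin/sh
# Sketch of mapping a change's depot paths to depot archive files
# (text-file case only), relative to an assumed archive root.

archive_path() {
    # //depot/a/b.c -> depot/a/b.c,v   (path relative to the depot root)
    echo "${1#//},v"
}

list_archives() {
    # stdin: 'p4 files @=N' output lines, e.g.
    #   //depot/hw/top.v#7 - edit change 42 (text)
    # strip the '#rev - action ...' suffix, then map each path.
    sed 's/#.*//' | while read -r f; do archive_path "$f"; done
}

printf '%s\n' '//depot/hw/top.v#7 - edit change 42 (text)' | list_archives
```

The resulting list would then be fed to something like `rsync -az --files-from=- /perforce/depots/ replica-host:/perforce/depots/` (hypothetical paths), giving the compression, delta-copy and incremental-transfer behavior the text mentions.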

The puller has been successfully used by several IC Manage customers for remote-site replication as well as for “full” fail-over and read-only replication.

12. Conclusion

The success of the Perforce SCM system at NVIDIA is largely attributed to the efficient architecture, ease of use, flexibility and scalability of the Perforce server. Constant evolution and improvement of the Perforce server has allowed it to keep up with the high growth rate of the NVIDIA Perforce environment. Using IC Manage software has further broadened the Perforce server's abilities and improved performance. Thanks to the enhanced performance, usage of the NVIDIA Perforce environment has tripled in the last few years (based on the average increase in journal growth rate), which in turn has led to increased efficiency of NVIDIA teams and development processes.