24 Shared Registration System (SRS) Performance

Prototypical answer:

gTLD: .art
Full legal name: Dadotart, Inc.
E-mail suffix: deviantart.com

Q24 - Shared Registration System (SRS) Performance

Since 1997, CORE Internet Council of Registrars has provided a unified registration system for its members. This system grants access to a multitude of top-level domain registries, currently including .com, .net, .org, .info, .biz, .name, .us, .asia, .eu, .coop and .tel, via a single entry point. Operating the CORE Registration System has given CORE extensive expertise in the implementation, operation, maintenance and support of a shared registration system serving a user group that is very heterogeneous in location, language, enterprise size and structure.

CORE also handles the technical operation of the .cat and .museum TLDs on behalf of the puntCAT and MuseDoma registries. This demonstrates that CORE has the knowledge and experience necessary to provide the offered registry services.

1. High-Level System Description

The Shared Registry System for the .ART Registry is a local installation of the CORE Registration System, developed by CORE. Consequently, the SRS is compliant with the various relevant standards for EPP (see Question 25), Whois (see Question 26), DNS (see Question 35), DNSSEC (see Question 43) and IDNs (see Question 44).

Each registry service is handled by its own server. Overall, the services are set up ensuring n+1 redundancy. It is envisaged that further frontends will be added later, when increasing system usage requires such a step.

1.1 Multiple sites

The .ART Registry as a whole is distributed among a set of independent sites. Besides the geographical diversity of the sites, each site is designed to be independent of other sites. A complete failure of one site or of related infrastructure (i.e. upstream providers) does not affect the operation of the others. No networks or vital base services (like DNS resolvers, LDAP or SMTP servers) are shared among the sites.

For the main registry operation, i.e. all services except the name servers, two sites are designated: the primary one in Dortmund, Germany and the secondary one in Amsterdam, the Netherlands. Name servers, as far as operated by the .ART Registry itself, are located on other sites. In this context, name servers operated by contractors are likewise regarded as being located on other sites.

To support scalability of the system, the SRS is modularised into components where possible. Components are allowed to run on different machines, so that the overall load of the system can be distributed hardware-wise. This approach also improves the efficiency of cluster technologies and fail-over strategies within a site.

Some components, for example the EPP interfaces to the registrars, are allowed to run in multiple instances if necessary. With the help of load balancers, the incoming requests are distributed to these instances. By directing the load balancers to exclude an instance, this instance can be maintained with respect to both hardware and software. The latter allows minor patches to be applied to the SRS software without interrupting the actual service.
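The exclusion mechanism described above can be illustrated with a minimal sketch. It assumes a simple round-robin strategy, which the text does not specify, and the class name, method names and instance labels are hypothetical and not part of the CORE Registration System.

```python
from itertools import count


class RoundRobinBalancer:
    """Distributes requests over instances, skipping excluded ones.

    Illustrative only: round-robin is an assumed strategy.
    """

    def __init__(self, instances):
        self.instances = list(instances)
        self.excluded = set()
        self._counter = count()

    def exclude(self, instance):
        # Take an instance out of rotation, e.g. before patching it.
        self.excluded.add(instance)

    def include(self, instance):
        # Return a maintained instance to rotation.
        self.excluded.discard(instance)

    def next_instance(self):
        # Only instances that are not excluded receive requests.
        active = [i for i in self.instances if i not in self.excluded]
        if not active:
            raise RuntimeError("no instance available")
        return active[next(self._counter) % len(active)]
```

While an instance is excluded, requests continue to be served by the remaining instances, which is what allows hardware or software maintenance without interrupting the service.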

Each of the two .ART Registry sites contains the full set of components that are required for operation and provides for redundancy. Under normal conditions, the primary site is active, while the secondary is inactive (components are in hot standby). In case of failure or maintenance that cannot or should not be compensated by redundant systems on the active site, the inactive site can take over the operation. The full switch-over, however, is not a requirement. Since the system consists of multiple subcomponents, the task of a failed subcomponent on one site can be transferred to the mirror subcomponent on the other site, while the other subcomponents remain on the first site. This gives the administration team freedom and flexibility to react to an incident and to minimise the impact on users. Switching of services is done using HSDNS pointers, see the answer to Q32, System and Network Architecture, for details.
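The difference between a full switch-over and the transfer of a single subcomponent can be sketched as follows. The component names are hypothetical, and the sketch only models which site serves which subcomponent. It does not model the actual switching mechanism.

```python
PRIMARY, SECONDARY = "Dortmund", "Amsterdam"


def initial_placement(components):
    """Under normal conditions, every subcomponent runs at the primary site."""
    return {name: PRIMARY for name in components}


def switch_component(placement, component):
    """Move only `component` to the other site; all others stay where they are."""
    updated = dict(placement)
    updated[component] = SECONDARY if placement[component] == PRIMARY else PRIMARY
    return updated


def switch_site(placement):
    """Full switch-over: every subcomponent is served from the secondary site."""
    return {name: SECONDARY for name in placement}
```

For example, `switch_component(initial_placement(["epp", "database"]), "epp")` leaves the database in Dortmund while the EPP frontend is served from Amsterdam.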

The various sites are interconnected by virtual private networks (VPNs). This ensures the security and confidentiality of the communication. The VPNs are used both for data transferred between the sites as part of the .ART Registry operations (e.g. zone files to the name servers, replication data between the databases, data feed of the Whois servers) and for administrative purposes, including monitoring.

In the unlikely event of a simultaneous outage of multiple components that makes it impossible to provide the service at the SRS's main operating site (data centre) in spite of the redundancy provided within each site, or in case of natural or man-made disaster at that main site, a switch-over to a different site is possible. Thanks to continuous database replication, the other site is equipped with the entire data of the repository.

Figure Q24-F1 presents a bird's-eye view of the registry's sites, the services hosted at these sites (as described above), as well as the connections between them. The meanings of the graphical elements and symbols are described in Figure Q24-F2 (which provides a legend for all graphics attached to the answers throughout this gTLD application).

Figure Q24-F3 shows the overall structure of the registry systems per site. The various depicted resources and the relationship between them are described in detail in the answer to Question 31, Technical Overview of Proposed Registry, et seqq.

1.2 Software Development

Like all crucial components of CORE's registry system, the SRS has been developed from scratch by CORE staff or vendors. The custom-built main server component consists of 100% Java code. While it utilises several proven, open-source third-party libraries and products (such as SLF4J for logging and PrimeFaces for the web applications), the core registry functionality remains fully under CORE's control and may thus be customised as needed.

1.2.1 Change Control

All Java code comprising CORE's SRS is maintained in a repository managed by Subversion (SVN), a widely used open-source revision control system. All code check-ins into this repository, either into the SVN trunk or into dedicated development branches (for larger additions or changes), are closely monitored by senior developers.

Software releases meant to be deployed on staging, OT+E or production environments (see below and answer to Question 23, Registry Services) are always built from so-called "release" branches within the SVN repository, i.e. not from the SVN trunk or development branches. Such branches are essentially snapshots of the code known to offer stable functionality with regard to a certain specification of the system. The exclusive use of these release branches ensures that no inadvertent changes from the SVN trunk or development branches affect code deployed on systems used by registrars or the public.

1.2.2 Quality Assurance

Each release scheduled to be deployed undergoes a series of extensive tests by an internal QA team within CORE. This includes functional tests, but also stress tests to evaluate the system's behaviour under extreme load conditions.

Any issues found during these tests are reported back to the developers via JIRA, a widely used, enterprise-grade ticketing and issue system. Only after all issues have been fixed to the satisfaction of the testers is a release deployed, usually on the staging system first (also to give registrars an early opportunity to test their client systems against the new version), then on OT+E and production.

In addition to functional and stress testing, CORE's developers also write so-called unit tests with JUnit, a widely used Java unit testing framework that greatly facilitates regression testing.

1.3 Synchronisation Scheme

The synchronisation scheme is designed to enable either of the two sites to act as the master. However, in all cases except emergencies and short annual fail-over tests, the system in Dortmund is the master. Data is synchronised on database level in real time.

The database software used will be PostgreSQL 9 (current version). There are four database systems altogether: two at the primary site (Dortmund) and two at the secondary site (Amsterdam). At any time, one of these four systems is active. Its data is replicated to the other three systems: locally to the other system at the same site and remotely to the other site, where a local copy is maintained, too.
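The replication layout can be expressed as a small sketch. The system names are hypothetical (the text only states that there are two systems per site), and the sketch merely classifies the three replicas of the active system as local or remote. It makes no assumption about how PostgreSQL replication is actually configured.

```python
# Hypothetical system names; the source only states two systems per site.
SYSTEMS = {
    "dortmund-a": "Dortmund",
    "dortmund-b": "Dortmund",
    "amsterdam-a": "Amsterdam",
    "amsterdam-b": "Amsterdam",
}


def replication_targets(active):
    """Map each non-active system to 'local' or 'remote' relative to `active`."""
    if active not in SYSTEMS:
        raise ValueError(f"unknown database system: {active}")
    active_site = SYSTEMS[active]
    return {
        name: "local" if site == active_site else "remote"
        for name, site in SYSTEMS.items()
        if name != active
    }
```

Whichever of the four systems is active, the function yields exactly one local and two remote replicas, matching the description above.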

2. System Reliability, Stability and Performance

2.1 Outage Prevention

2.1.1 Data Centre Precautions

The data centres hosting the system components of the .ART Registry have taken various precautions to ensure a continuous operation, such as backup power supply, technical and facility security. Please refer to the answer to Question 31, Technical Overview of Proposed Registry, for more details.

2.1.2 Availability by Design

The general system design includes various features to reduce the risk of outages. These are summarised in the following paragraphs.

The network infrastructure of the SRS is designed to compensate for the failure of any one of its components. This is achieved by duplicating each of these components, i.e. the firewall/VPN system, the load balancer and the switches that represent the internal backbone. They are operated in an active-active configuration. All servers within the system are equipped with two Ethernet interfaces for each logical connection. Where applicable, the components themselves are equipped with redundant power supplies. The interconnection between the servers and the network components provides redundant paths between each two nodes without a single point of failure. For more details please refer to Question 32, System and Network Architecture.

For the database system used by the SRS, double redundancy is provided. Firstly, there are two database servers, a primary and a secondary one. The secondary database is operated as a hot-standby solution. Secondly, there are two more database servers at the secondary site. The database data at the active site is replicated to the non-active site.

To process the EPP requests of the registrars, multiple systems are provided, which run the SRS software simultaneously. A load balancer distributes the incoming requests to these systems. An outage of one server does not interrupt the service. Although the available computing power is reduced by such an outage, the provisioned spare capacities ensure that the overall performance does not violate the service level agreement.

In the unlikely event of a simultaneous outage of multiple components that makes it impossible to provide the service, or in case of natural or man-made disaster at the "main" site, a switch-over to the "mirror" site is performed. Thanks to continuous database replication, the mirror site is equipped with the entire data of the repository. Depending on the nature of the main site's failure, a limited data loss regarding transactions that were performed in the last few minutes of main site uptime may occur. Compared to the damage caused by a long-term outage, this is considered negligible.

The actual switch-over procedure consists mainly of the following steps:

Complete shutdown of the main site if necessary. Despite the failure, some components may still be in an operative state. To avoid interference with the mirror site, these are deactivated.

IP address change of the DNS address records belonging to externally visible servers to the corresponding servers on the mirror site. To facilitate this, a short time-to-live (TTL) setting will be used, and registrars are advised to use solely domain names to connect (not IP addresses).

Name servers and Whois servers are reconfigured to use the mirror site as their data source.

The registrars are informed about the switch-over, enabling them to adapt or restart their clients if necessary.

The Whois subsystem has the intrinsic ability to run an arbitrary number of Whois instances in geographically diverse locations (all fed from the same data source in a near-realtime fashion). The Whois servers operate their own databases for managing the Whois data. Load balancers are used to distribute the incoming requests to these instances. In such a setup, the outage of a single Whois instance will not disrupt Whois services for Internet users. Additional Whois servers can be added quickly to the existing setup if need be.

The huge number of different name server locations used by CORE and the involved diversity (in terms of both geography and network topology) provide a high degree of inherent protection against DNS outages. In particular, the use of state-of-the-art Anycast methodology ensures that a server will be able to respond to requests as long as at least one of the sites in its Anycast cloud is available. In addition, reliable facilities with sufficient redundancy are provided at the individual sites hosting the name servers.

2.1.3 Hardware supplies and Software Availability

The data centres will keep spare parts for all critical hardware involved, which allows fast replacement in case of hardware failures. In addition, continuous 24/7 phone and on-site support from the vendors ensures the availability of hardware and software, including operating systems. Contracts guarantee that out-of-stock components are delivered within hours.

2.2 Performance Specifications

All components of the registry system (SRS, Whois, DNS) are operated in full compliance with ICANN's performance requirements as set forth in Specification 10 of the gTLD Applicant Guidebook. In particular, the SRS will meet the following specifications.

2.2.1 SRS Performance

Upper bounds for the round-trip time (RTT) of EPP requests have to be met by at least 90 per cent of all commands. The upper bound for session commands (login, logout) is four seconds, for query commands (check, info, poll, transfer) it is two seconds and for transform commands (create, delete, renew, transfer, update) it is four seconds. The downtime of the EPP service will be not more than 12 hours per month.
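As a minimal sketch of how the 90 per cent criterion could be evaluated against measured round-trip times, consider the following. The function and constant names are hypothetical, and treating an empty measurement set as compliant is an assumption.

```python
# Upper RTT bounds in seconds per EPP command class, as stated above.
RTT_BOUND_SECONDS = {"session": 4.0, "query": 2.0, "transform": 4.0}
REQUIRED_PERCENT = 90


def meets_rtt_target(command_class, rtts_seconds):
    """True if at least 90 per cent of the measured RTTs are within the bound."""
    bound = RTT_BOUND_SECONDS[command_class]
    if not rtts_seconds:
        return True  # assumption: no commands means nothing was violated
    within = sum(1 for rtt in rtts_seconds if rtt <= bound)
    # Integer comparison avoids floating-point rounding at the threshold.
    return within * 100 >= REQUIRED_PERCENT * len(rtts_seconds)
```

With ten query commands of which one takes three seconds, the target is still met. With two such slow commands, it is not.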

2.2.2 Registration Data Directory Services (RDDS) Performance

The upper bound for the round-trip time (RTT) of RDDS queries and for the RDDS update time has to be met by at least 95 per cent of all queries/updates. The upper bound for the collective of "Whois query RTT" and "Web-based-Whois query RTT" is two seconds. The upper bound for the update time (i.e. from the reception of an EPP confirmation to a domain/host/contact transform command until the RDDS servers reflect the changes made) is 60 minutes. The downtime of the RDDS service will be not more than 8 hours per month, where non-availability of any service counts as downtime.

2.2.3 DNS Performance

The upper bound for the round-trip time (RTT) of DNS queries and for the DNS update time has to be met by at least 95 per cent of all queries/updates. The upper bound for the TCP DNS resolution RTT is 1500 milliseconds, for the UDP DNS resolution RTT it is 500 milliseconds. The upper bound for the DNS update time (i.e. from the reception of an EPP confirmation to a domain transform command until the name servers of the parent domain name answer DNS queries with data consistent with the change made) is 60 minutes. The downtime of the DNS service will be zero, i.e. continuous availability of this service is assured.
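The monthly downtime limits stated in sections 2.2.1 to 2.2.3 can be converted into availability percentages. The sketch below assumes a 30-day month (720 hours), which is an assumption and not a figure taken from the specification.

```python
HOURS_PER_MONTH = 30 * 24  # assumption: 30-day month

# Maximum monthly downtime in hours per service, as stated above.
MAX_DOWNTIME_HOURS = {"EPP": 12, "RDDS": 8, "DNS": 0}


def availability_percent(downtime_hours, period_hours=HOURS_PER_MONTH):
    """Share of the period during which the service is available, in per cent."""
    return 100.0 * (period_hours - downtime_hours) / period_hours


for service, hours in MAX_DOWNTIME_HOURS.items():
    print(f"{service}: at least {availability_percent(hours):.2f} % available")
```

Under this assumption, the limits correspond to roughly 98.33 per cent availability for EPP, 98.89 per cent for RDDS and 100 per cent for DNS.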

2.3 Operational Scalability

Operational scalability is primarily achieved by the underlying architecture of the components comprising the CORE Registration System.

The software used for the processing of EPP commands is designed to run on multiple systems simultaneously. Because the software makes extensive use of Java's multi-threading capabilities, it scales well with the number of processors in each system. Therefore, long-term scalability in response to increased registry activity can be accomplished by extending the system with additional processors and/or machines.

The SRS is dimensioned to run with about ten per cent load during regular operation. The initial system is able to handle the additional load resulting from increased domain numbers. To further cope with temporary unexpected load peaks, CORE ensures that at least 100 per cent spare capacity is available all the time.
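The dimensioning rule can be illustrated with a short calculation. The sketch assumes that "100 per cent spare capacity" means that the unused capacity is at least as large as the current load. The text does not define the term precisely, and the function names and units are hypothetical.

```python
def utilisation(load, capacity):
    """Fraction of the total capacity currently in use."""
    return load / capacity


def has_required_spare(load, capacity, spare_fraction=1.0):
    """True if unused capacity is at least `spare_fraction` times the load.

    Assumption: spare capacity is measured relative to the current load.
    """
    return capacity - load >= spare_fraction * load
```

At the stated regular load of about ten per cent, for instance 100 requests per second on a system able to process 1000, the criterion is met with a wide margin. Under this reading, it would only be violated once the load exceeds half of the total capacity.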

The above measures can be applied to scale the system from handling 10,000 names up to 20 million names and beyond. The initial capacity will be 1 million names and can be increased in steps of at least 1 million names within a mutually agreed time frame.

An important point is fair and acceptable use of system resources by registrars. As far as transaction numbers are concerned, the .ART Registry subjects registrars’ access to acceptable use policies that forbid wasteful use of system resources. The registry systematically avoids situations where registrars or potential registrants find themselves under pressure to enter into a race against one another with respect to registry system resources. This applies in particular to launch phases, where a contention resolution mechanism (including the use of auctions) replaces time priority. The .ART Registry furthermore imposes acceptable use restrictions to prevent the abuse of grace periods.

Additionally, the number of concurrent EPP connections per registrar is limited to a certain maximum, which is initially set to 10. Rate limiting is also implemented by limiting the EPP requests within a sliding window of one minute to a configurable number, in order to prevent monopolisation of the service by one registrar.
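A sliding-window rate limit of this kind can be sketched as follows. This is a minimal illustration, not the SRS implementation. The class name and registrar identifiers are hypothetical, and timestamps are passed in explicitly (in seconds) to keep the example deterministic.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 60  # sliding window of one minute, as stated above


class SlidingWindowLimiter:
    """Allows at most `max_requests` per registrar within the sliding window."""

    def __init__(self, max_requests, window_seconds=WINDOW_SECONDS):
        self.max_requests = max_requests  # the configurable number
        self.window_seconds = window_seconds
        self._timestamps = defaultdict(deque)

    def allow(self, registrar_id, now):
        stamps = self._timestamps[registrar_id]
        # Forget requests that have left the window.
        while stamps and now - stamps[0] >= self.window_seconds:
            stamps.popleft()
        if len(stamps) >= self.max_requests:
            return False  # limit reached, request is rejected
        stamps.append(now)
        return True
```

Because the timestamps are kept per registrar, one registrar exhausting its allowance has no effect on the requests of the others.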

Thanks to these measures, the .ART Registry avoids disproportionate demand for registry resources.

3. Employed Hardware

For server and storage systems, HP products are to be used. For network equipment, products from Cisco, HP, Juniper and Foundry are to be used. The use of upgradable blade and RAID systems, together with redundant network components, power supplies and the like, increases not only scalability, but also availability and data integrity.

The database server as the central system component is dimensioned to be able to keep the relevant database content in memory to avoid slow disk I/O operations. An HP server system with two six-core 3 GHz CPUs and 48 GB of RAM will be used. All other servers will be equipped with 24 GB of RAM. The database server is connected to a storage area network (SAN), which is connected to a high-performance RAID system, namely HP P6300 EVA 2.4 TB SFF SAS.

4. Resourcing Plans

4.1 Implementation

Since the CORE Registration System itself has already been implemented, no resources are necessary for the initial implementation. For setting up and configuring database servers, firewalls and so on, the following resource allocations are estimated:

System Administrator: 25 man hours;

Network Operation Centre Officer: 25 man hours;

DNSSEC Signing Operator: 5 man hours.

4.2 Ongoing Maintenance

For ongoing maintenance and occasional adaptation of the system, the following resource allocations are estimated:

System Administrator: 5 man hours per month;

Network Operation Centre Officer: 5 man hours per month;

Software Developer: 2 man hours per month;

Quality Assurance Agent: 1 man hour per month;

DNSSEC Signing Operator: 1 man hour per month.

Employees already working for CORE Internet Council of Registrars will be handling these tasks. The numbers above were determined by averaging the effort required for comparable tasks conducted by CORE in the past over the course of 12 months.
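The estimates in sections 4.1 and 4.2 can be totalled with a short calculation. The variable and function names are hypothetical. The figures are those listed above.

```python
# One-off set-up effort in man hours (section 4.1).
IMPLEMENTATION_HOURS = {
    "System Administrator": 25,
    "Network Operation Centre Officer": 25,
    "DNSSEC Signing Operator": 5,
}

# Recurring effort in man hours per month (section 4.2).
MAINTENANCE_HOURS_PER_MONTH = {
    "System Administrator": 5,
    "Network Operation Centre Officer": 5,
    "Software Developer": 2,
    "Quality Assurance Agent": 1,
    "DNSSEC Signing Operator": 1,
}


def total_hours(allocations):
    """Sum the man hours over all roles."""
    return sum(allocations.values())
```

This yields 55 man hours for the initial set-up and 14 man hours per month for ongoing maintenance.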
