25 Extensible Provisioning Protocol (EPP)

Prototypical answer:

gTLDFull Legal NameE-mail suffixDetail
.boxNS1 Limitedtld.asiaView

We have engaged ARI Registry Services (ARI) to deliver services for this TLD. ARI provide registry services for a number of TLDs including the .au ccTLD. For more background information on ARI please see the attachment ‘Q24 – ARI Background & Roles.pdf’. This response describes the SRS as implemented by ARI.

ARI has demonstrated delivery of an SRS with exceptional availability, performance and reliability. ARI are experienced running mission critical SRSs and have significant knowledge of the industry and building and supporting SRSs.
ARI’s SRS has successfully supported a large group of Registrars for ASCII and IDN based TLDs. The system is proven to sustain high levels of concurrency, transaction load, and system uptime. ARI’s SRS meets the following requirements:
– Resilient to wide range of security & availability threats
– Consistently exceeds performance & availability SLAs
– Allows capacity increase with minimal impact to service
– Provides fair & equitable provisioning for all Registrars

ARI’s SRS was built to sustain 20M domain names. Based on ARI’s experience running a ccTLD registries and industry analysis, ARI were able to calculate the conservative characteristics of a registry this size.
Through conservative statistical analysis of the .au registry and data presented in the May 2011 ICANN reports for the .com & .net, .org, .mobi, .info, .biz and .asia [http:⁄⁄www.icann.org⁄en⁄resources⁄registries⁄reports] we know there is:
– An average of 70 SRS TPS per domain, per month
– A ratio of 3 query to 2 transform txs
This indicates an expected monthly transaction volume of 1,400M txs (840M query and 560M transforms).
Through statistical analysis of the .au registry and backed up by the data published in the .net RFP responses [http:⁄⁄archive.icann.org⁄en⁄tlds⁄net-rfp⁄net-rfp-public-comments.htm] we also know:
– The peak daily TPS is 6% of monthly total
– The peak 5 min is 5% of the peak day
Thus we expect a peak EPP tx rate of 14,000 TPS (5,600 transform TPS and 8,400 query TPS)
Through conservative statistical analysis of the .au registry we know:
– The avg no. contacts⁄domain is 3.76
– The avg no. hosts⁄domain is 2.28
This translates into a requirement to store 75.2M contacts and 45.6M hosts.
Finally through real world observations of the .au registry, which has a comprehensive web interface when compared to those offered by current gTLD registries, we know there is an avg of 0.5 HTTP requests⁄sec to the SRS web interface per Registrar. We also know that this behaviour is reasonably flat. To support an estimated 1000 Registrars, would require 500 requests⁄second.
For perspective on the conservativeness of this, the following was taken from data in the May 2011 ICANN reports referenced above:
– .info: ~7.8M names peaks at ~1,400 TPS (projected peak TPS of ~3,600 with 20M)
– .com: ~98M names peaks at ~41,000 TPS (projected peak TPS of ~8,300 TPS with 20M)
– .org: ~9.3M names, peaks at ~1,400 TPS (projected peak TPS of ~3,100 with 20M)
After performing this analysis the projected TPS for .com was still the largest value.
ARI understand the limitations of this method but it serves as a best estimate of probable tx load. ARI has built overcapacity of resources to account for limitations of this method, however as numbers are more conservative than real world observations, we are confident this capacity is sufficient.
This TLD is projected to reach 17,648
domains at its peak volume and will generate 12.3536
EPP TPS. This will consume 0.09
% of the resources of the SRS infrastructure. As is evident ARI’s SRS can easily accommodate this TLD’s growth plans. See attachment ‘Q24 – Registry Scale Estimates & Resource Allocation.xlsx’ for more information.
ARI expects to provide Registry services to 100 TLDs and a total of 12M domains by end of 2014. With all the TLDs and domains combined, ARI’s SRS infrastructure will be 60% utilized. The SRS infrastructure capacity can be easily scaled as described in Q32
ARI benchmarked their SRS infrastructure and used the results to calculate the required computing resources for each of the tiers within the architecture; allowing ARI to accurately estimate the required CPU, IOPS, storage and memory requirements for each server, and the network bandwidth & packet throughput requirements for the anticipated traffic. These capacity numbers were then doubled to account for unanticipated traffic spikes, errors in predictions, and headroom for growth. Despite doubling numbers, effective estimated capacity is still reported as 20M. The technical resource allocations are explored in Q32.

ARI’s SRS has the following major components:
– Network Infrastructure
– EPP Application Servers
– SRS Web Interface Application Servers
– SRS Database
Attachment ‘Q24 – SRS.pdf’ shows the SRS systems architecture and data flows. Detail on this architecture is in our response to Q32. ARI provides two distinct interfaces to the SRS: EPP and SRS Web. Registrar SRS traffic enters the ARI network via the redundant Internet link and passes (via the firewall) to the relevant application server for the requested service (EPP or SRS Web). ARI’s EPP interface sustains high volume and throughput domain provisioning transactions for a large number of concurrent Registrar connections. ARI’s SRS Web interface provides an alternative to EPP with a presentation centric interface and provides reporting and verification features additional to those provided by the EPP interface.

3.1 EPP
ARI’s EPP application server is based on EPP as defined in RFCs 5730 – 5734. Registrars send XML based transactions to a load balanced EPP interface which forwards to one of the EPP application servers. The EPP application server then processes the XML and converts the request into database calls that retrieve or modify registry objects in the SRS database. The EPP application server tier comprises of three independent servers with dedicated connections to the registry database. Failure of any one of these servers will cause Registrar connections to automatically re-establish with one of the remaining servers. Additional EPP application servers can be added easily without any downtime. All EPP servers accept EPP both IPv4 & IPv6.

3.2 SRS Web
The SRS Web application server is a Java web application. Registrars connect via the load balancer to a secure HTTP listener running on the web servers. The SRS web application converts HTTPs requests into database calls which query or update objects in the SRS database. The SRS Web application server tier consists of two independent servers that connect to the database via JDBC. If one of these servers is unavailable the load balancer re-routes requests to the surviving server. Additional servers can be added easily without any downtime. These servers accept both IPv4 & IPv6.

3.3 SRS Database
The SRS database provides persistent storage for domains and supporting objects. It offers a secure way of storing and retrieving objects provisioned within the SRS and is built on the Oracle 11g Enterprise Edition RDBMS. The SRS Database tier consists of four servers clustered using Oracle Real Application Clusters (RAC). In the event of failure of a database server, RAC will transparently transition its client connections to a surviving database host. Additional servers can be added easily without any downtime.

3.4 Number of Servers
EPP Servers – The EPP cluster consists of 3 servers that can more than handle the anticipated 20M domains. This TLD will utilize 0.09
% of this capacity at its peak volume. As the utilisation increases ARI will add additional servers ensuring the utilisation doesn’t exceed 50% of total capacity. Adding a new server to the cluster can be done live without downtime.
SRS Web Servers – The SRS Web cluster consists of 2 servers that can more than handle the anticipated 20M domains. This TLD will utilize 0.09
% of this capacity at its peak volume. As the utilisation increases ARI will add additional servers ensuring the utilisation doesn’t exceed 50% of total capacity. Adding a new server to the cluster can be done live without downtime.
SRS DB Servers – The SRS DB cluster consists of 4 servers that can more than handle the anticipated 20M domains. This TLD will utilize 0.09
% of this capacity at its peak volume. As the utilisation increases ARI will add additional servers ensuring the total utilisation doesn’t exceed 50% of total capacity. Adding a new server to the cluster can be done live without downtime.

3.5 SRS Security
ARI adopts a multi-layered security solution to protect the SRS. An industry leading firewall is deployed behind the edge router and is configured to only allow traffic on the minimum required ports and protocols. Access to the ARI EPP service is restricted to a list of known Registrar IPs.
An Intrusion Detection device is in-line with the firewall to monitor and detect suspicious activity.
All servers are configured with restrictive host based firewalls, intrusion detection, and SELinux. Direct root access to these servers is disabled and all access is audited and logged centrally.
The SRS database is secured by removal of non-essential features and accounts, and ensuring all remaining accounts have strong passwords. All database accounts are assigned the minimum privileges required to execute their business function.
All operating system, database, and network device accounts are subject to strict password management controls such as validity & complexity requirements.
Registrar access to the SRS via EPP or the Web interface is authenticated and secured with multi-factor authentication (NIST Level 3) and digital assertion as follows:
– Registrar’s source IP must be allowed by the front-end firewalls. This source IP is received from the Registrar via a secure communication channel from within the SRS Web interface
– Registrar must use a digital certificate provided by ARI
– Registrar must use authentication credentials that are provided by encrypted email
All communication between the Registrar and the SRS is encrypted using at least 128 bit encryption which been designated as ‘Acceptable’ till ‘2031 and beyond’ by NIST Special Publication 800-57.

3.6 SRS High Availability
SRS availability is of paramount. Downtime is eliminated or minimised where possible. The infrastructure contains no single points of failure. N+1 redundancy is used as a minimum, which not only protects against unplanned downtime but also allows ARI to execute maintenance without impacting service.
Redundancy is provided in the network with hot standby devices & multiple links between devices. Failure of any networking component is transparent to Registrar connections.
N+N redundancy is provided in the EPP and SRS Web application server tiers by the deployment of multiple independent servers grouped together as part of a load-balancing scheme. If a server fails the load balancer routes requests to the remaining servers.
N+N redundancy is provided in the database tier by the use of Oracle Real Application Cluster technology. This delivers active⁄active clustering via shared storage. This insulates Registrars from database server failure.
Complete SRS site failure is mitigated by the maintenance of a remote standby site – a duplicate of the primary site ready to be the primary if required.
The standby site database is replicated using real time transaction replication from the main database using Oracle Data Guard physical standby. If required the Data Guard database can be activated quickly and service resumes at the standby site.

3.7 SRS Scalability
ARI’s SRS scales efficiently. At the application server level, additional computing resource can be brought on-line rapidly by deploying a new server online. During benchmarking this has shown near linear.
The database can be scaled horizontally by adding a new cluster node into the RAC cluster online. This can be achieved without disruption to connections. The SRS has demonstrated over 80% scaling at the database level, but due to the distributed locking nature of Oracle RAC, returns are expected to diminish as the number of servers approaches double digits. To combat this ARI ensures that when the cluster is ‘scaled’ more powerful server equipment is added rather than that equal to the current members. Capacity can be added to the SAN at any time without downtime increasing storage and IOPs.

3.8 SRS Inter-operability and Data Synchronisation
The SRS interfaces with a number of related registry systems as part of normal operations.

3.8.1 DNS Update
Changes made in the SRS are propagated to the DNS via an ARI proprietary DNS Update process. This process runs on the ‘hidden’ primary master nameserver and waits on a queue. It is notified when the business logic inserts changes into the queue for processing. The DNS Update process reads these queue entries and converts them into DNS update (RFC2136) commands that are sent to the nameserver. The process of synchronising changes to SRS data to the DNS occurs in real-time.

3.8.2 WhoIs
The provisioned data supporting the SRS satisfies WhoIs queries. Thus the WhoIs and SRS share data sets and the WhoIs is instantaneously updated. Under normal operating conditions the WhoIs service is provided by the infrastructure at the secondary site in order to segregate the load and protect SRS from WhoIs demand (and vice versa). WhoIs queries that hit the standby site will query data stored in the standby database – maintained in near real-time using Oracle Active Data Guard. If complete site failure occurs WhoIs and SRS can temporarily share the same operations centre at the same site (capacity numbers are calculated for this).

3.8.3 Escrow
A daily Escrow extract process executes on the database server via a dedicated database account with restricted read-only access. The results are then transferred to the local Escrow Communications server by SSH.

ARI follow defined policies⁄procedures that have developed over time by running critical registry systems. Some principals captured by these are:
– Conduct all changes & upgrades under strict and well-practised change control procedures
– test, test and test again
– Maintain Staging environments as close as possible to production infrastructure⁄configuration
– Eliminate all single points of failure
– Conduct regular security reviews & audits
– Maintain team knowledge & experience via skills transfer⁄training
– Replace hardware when no longer supported by vendor
– Maintain spare hardware for all critical components
– Execute regular restore tests of all backups
– Conduct regular capacity planning exercises
– Monitor everything from multiple places but ensure monitoring is not ‘chatty’
– Employ best of breed hardware & software products & frameworks (such as ITIL, ISO27001 and Prince2)
– Maintain two distinct OT&E environments to support pre-production testing for Registrars

ARI’s SRS adheres to and goes beyond the scope of Specification 6 and Specification 10 of the Registry Agreement. ARI’s EPP service is XML compliant and XML Namespace aware. It complies with the EPP protocol defined in RFC5730, and the object mappings for domain, hosts & contacts are compliant with RFC 5731, 5732 & 5733 respectively. The transport over TCP is compliant with RFC5734. The service also complies with official extensions to support DNSSEC, RFC5910, & Redemption Grace Period, RFC 3915.
ARI’s SRS is sized to sustain a peak transaction rate of 14,000 TPS while meeting strict internal Operational Level Agreements (OLAs). The monthly-based OLAs below are more stringent than those in Specification 10 (Section 2).
EPP Service Availability: 100%
EPP Session Command Round Trip Time (RTT): 〈=1000ms for 95% of commands
EPP Query Command Round Trip Time (RTT): 〈=500ms for 95% of commands
EPP Transform Command Round Trip Time (RTT): 〈=1000ms for 95% of commands
SRS Web Interface Service Availability: 99.9%
ARI measure the elapsed time of every query, transform and session EPP transaction, and calculate the percentage of commands that fall within OLA on a periodic basis. If percentage value falls below configured thresholds on-call personnel are alerted.
SRS availability is measured by ARI’s monitoring system which polls both the EPP and SRS Web services status. These checks are implemented as full end to end monitoring scripts that mimic user interaction, providing a true representation of availability. These ‘scripts’ are executed from external locations on the Internet.

This function will be performed by ARI. ARI staff are industry leading experts in domain name registries with the experience and knowledge to deliver outstanding SRS performance.
The SRS is designed, built, operated and supported by the following ARI departments:
– Products and Consulting Team (7 staff)
– Production Support Group (27 staff)
– Development Team (11 staff)
A detailed list of the departments, roles and responsibilities in ARI is provided in attachment ‘Q24 – ARI Background & Roles.pdf’. This attachment describes the functions of the teams and the number and nature of staff within.
The number of resources required to design, build, operate and support the SRS does not vary significantly with, and is not linearly proportional to, the number or size of TLDs that ARI provides registry services to.
ARI provides registry backend services to 5 TLDs and has a vast experience in estimating the number of resources required to support a SRS.
Based on past experience ARI estimates that the existing staff is adequate to support an SRS that supporting at least 50M domains. Since this TLD projects 17,648
domains, 0.04
% of these resources are allocated to this TLD. See attachment ‘Q24 – Registry Scale Estimates & Resource Allocation.xlsx’ for more information.
ARI protects against loss of critical staff by employing multiple people in each role. Staff members have a primary role plus a secondary role for protection against personnel absence. Additionally ARI can scale resources as required, trained resources can be added to any of the teams with a 2 month lead time.
The Products and Consulting team is responsible for product management of the SRS solution including working with clients and the industry to identify new features or changes required. The team consists of:
– 1 Products and Consulting Manager
– 1 Product Manager
– 1 Technical Product Manager
– 4 Domain Name Industry Consultants
The Production Support Group (PSG) is responsible for the design, deployment and maintenance of the SRS infrastructure including capacity planning and monitoring as well as security aspects – ensuring the SRS services are available and performing at the appropriate level and operating correctly. The team consists of:
– Production Support Manager
– Service Desk:
– 1 Level 1 Support Team Lead
– 8 Customer Support Representatives (Level 1 support)
– 1 Level 2 Support Team Lead
– 4 Registry Specialists (Level 2 support)
– Operations (Level 3 support):
– 1 Operations Team Lead
– 2 Systems Administrators
– 2 Database Administrators
– 2 Network Engineers
– Implementation:
– 1 Project Manager
– 2 Systems Administrators
– 1 Database Administrator
– 1 Network Engineer
The development team is responsible for implementing changes and new features into the SRS as well as bug fixing and complex issue diagnosis. The team consists of:
– 1 Development Manager
– 2 Business Analysts
– 6 Developers
– 2 Quality Analysts
These resources sufficiently accommodate the needs of this TLD, and are included in ARI’s fees as described in our Financial responses.

Similar gTLD applications: (0)

gTLDFull Legal NameE-mail suffixzDetail