24 Shared Registration System (SRS) Performance

gTLD: .skolkovo
Full Legal Name: Fund for Development of the Center for Elaboration and Commercialization of New Technologies
E-mail suffix: cctld.ru


1. INTRODUCTION

The Shared Registration System (SRS) allows multiple registrars to register domain names in the registry. The Registry Operator's SRS will fully comply with the applicable standards.
The Registry Operator will outsource SRS functions to an external subcontractor, Technical Center of Internet, JSC (hereinafter, the Registry Service Provider or RSP). The Registry Service Provider's duties will include installation, maintenance and troubleshooting of the SRS described below.

About the Technical Center of Internet (TCI)

The Technical Center of Internet, JSC (incorporated in the Russian Federation; Primary State Registration Number (OGRN): 1097746536117; Taxpayer Identification Number (INN): 7702714697) is one of the largest technical centers worldwide serving registry operators, and the only such center in Russia. As the successor to the Russian Institute for Public Networks, TCI draws on a 20-year operational track record and has been operating in its current form for two years. TCI provides a full registry solution and DNS service with DNSSEC support to two ccTLD registry operators, supporting a total of more than 4.5 million second-level domains in the .SU, .RU (5th worldwide) and .РФ (1st worldwide among IDN ccTLDs) ccTLDs. TCI serves 26 registrars, including several ICANN-accredited ones. It operates DNS nodes distributed around the world, a geographically distributed, fully redundant, state-of-the-art infrastructure, and highly qualified staff.

2. GENERAL DESCRIPTION

The RSP will implement the SRS using the proven technologies and technical solutions it currently uses to service the .SU, .RU and .РФ TLDs. These technologies and solutions were designed to fully comply with the requirements for gTLD registry service providers.

To provide the required SRS services to registrars, the following elements of the RSP infrastructure will be used:
- communication channels, bandwidth providers and IP numbering schemes;
- networking equipment;
- hosting facilities;
- EPP, Web and database applications;
- server equipment;
- security and unauthorized access protection systems;
- SRS data storage;
- monitoring and troubleshooting services.

The two co-active SRS nodes are installed at two independent, geographically dispersed facilities. Each node includes one database server and three application servers. All internal service traffic within the SRS, used for data replication and the transmission of service commands, is secured by being carried over private network channels. Accredited registrars can access the SRS over both IPv4 and IPv6. Both SRS nodes are located in hosting facilities with a guaranteed uninterruptible power supply, enhanced security and network connectivity via multiple upstream providers.

The SRS database servers run Oracle on the Sun platform. At any given time, one database server is the primary and the other is the standby. Data from the primary database is transmitted live to the standby database using Oracle Data Guard. Combined with a fast database swap procedure, this ensures that no data is lost and that database downtime is kept to a minimum, meeting service level and operational continuity requirements. All six Application servers maintain persistent connections to the primary database server. Database swapping is semi-automated and performed under the supervision of a Database Administrator.

The two co-active SRS nodes are interconnected by Gigabit Ethernet (GE) channels that follow alternative routes with no point in common. The connections between the nodes are protected.
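The semi-automated swap can be pictured as a guarded role change. The sketch below assumes the tooling refuses a swap unless a DBA has confirmed it and the standby has applied all received redo. The names DbNode, swap_roles and apply_lag_s are hypothetical and are not part of Oracle Data Guard, whose switchovers are normally driven through its own tools such as the Data Guard broker.

```python
from dataclasses import dataclass


@dataclass
class DbNode:
    name: str
    role: str           # "primary" or "standby"
    apply_lag_s: float  # hypothetical: seconds of redo received but not yet applied


def swap_roles(primary: DbNode, standby: DbNode, dba_approved: bool,
               max_lag_s: float = 0.0) -> None:
    """Swap primary/standby roles only with DBA approval and a caught-up standby."""
    if primary.role != "primary" or standby.role != "standby":
        raise ValueError("nodes are not in the expected primary/standby roles")
    if not dba_approved:
        # The "semi-automated" part: a human must confirm the swap.
        raise PermissionError("switchover requires Database Administrator confirmation")
    if standby.apply_lag_s > max_lag_s:
        # Promoting a lagging standby would lose the unapplied changes.
        raise RuntimeError(f"standby {standby.name} lags by {standby.apply_lag_s}s")
    primary.role, standby.role = "standby", "primary"
```

Once the roles are swapped, the Application servers would reconnect to whichever node now holds the primary role.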

The applicant's technical architecture complies with the geographic diversity and operational continuity requirements described in the answers to Evaluation Questions #34 and #39.

To maximize protection of the Registry Operator's data, additional storage will be provided in the form of a file server and RAID storage located at a facility in Frankfurt, Germany. Construction and testing of the German facility are scheduled for completion in September 2012. Backups of the registry database will be copied regularly over an encrypted VPN across the public Internet at a transfer rate of 1 Gbit/s. This model has been tested and performed well. Each backup saves a full copy of the database together with the set of registry database transaction files needed to restore the database to its current state. For the Oracle database, these transaction files are the archived redo logs. Keeping the archived redo logs off-site allows the SRS database to be recovered quickly to any point in time within the preceding month. For more details, refer to the answer to Evaluation Question #37.
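Point-in-time recovery combines the newest full backup taken before the target time with the archived redo logs that roll it forward. The sketch below models that selection using timestamps only, with "one month" assumed to be 30 days. The function name and data shapes are hypothetical. In practice Oracle RMAN performs this selection, tracking system change numbers rather than wall-clock times.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=30)  # assumption: "one month" modelled as 30 days


def recovery_plan(full_backups, redo_logs, target, now):
    """Choose the base backup and archived redo logs needed to reach `target`.

    full_backups: list of datetimes at which full backups completed
    redo_logs: list of (first_change_time, last_change_time) per archived log
    """
    if target > now or now - target > RETENTION:
        raise ValueError("target time is outside the recovery window")
    candidates = [b for b in full_backups if b <= target]
    if not candidates:
        raise ValueError("no full backup precedes the target time")
    base = max(candidates)
    # Logs overlapping (base, target], in chronological order.
    needed = sorted(log for log in redo_logs if log[1] > base and log[0] <= target)
    cursor = base
    for first, last in needed:
        if first > cursor:  # a missing log would break the roll-forward chain
            raise ValueError(f"redo gap between {cursor} and {first}")
        cursor = max(cursor, last)
    if cursor < target:
        raise ValueError("archived redo logs do not reach the target time")
    return base, needed
```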

In addition, selected partitions of the RAID storage holding critical SRS data are mirrored between the nodes at the filesystem level. That is, the section of the RAID storage containing critical data at one SRS node is mirrored onto the corresponding section of the RAID storage at the other node, and vice versa, keeping critical data synchronized between the SRS nodes.

Using several alternative routes and protocols for data replication substantially reduces the probability of data loss in the unlikely event of an emergency.

Thus, two alternative schemes protect the registry database:
- database replication at the database-server level;
- replication at the RAID storage level, on whose drives all of the Registry Operator's data are stored.

This is why the SRS nodes are described as "co-active".

The use of six Application servers (three at each operational facility) is likewise driven by the geographic diversity and operational continuity requirements. Access to the Application servers and monitoring of their availability are handled by server load balancing (SLB). Any Application server can be taken offline or brought back online at any time.

The SRS architecture is shown in Figure Q24_SRSNodes.

3. PERFORMANCE METRICS

The SRS meets the following monthly service level requirements (SLRs), in compliance with the SLA Matrix in Specification 10 of the Registry Agreement:
- EPP service availability for registrars: 99.95%;
- EPP session-command RTT: at most 1000 ms for at least 90% of commands;
- EPP query-command RTT: at most 300 ms for at least 90% of commands;
- EPP transform-command RTT: at most 500 ms for at least 90% of commands;
- Web application availability: 99.95%.

The SRS described above can handle up to 1,000 EPP requests per second
and maintain 20,000 concurrent EPP sessions.
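The availability and RTT targets above reduce to two simple checks: how much downtime a monthly availability figure allows, and whether the 90th-percentile RTT of measured commands stays under its limit. The sketch below assumes a 30-day month and a nearest-rank percentile. The function names and the RTT_LIMITS table are illustrative and do not come from the monitoring system. At 99.95%, a 30-day month allows 43,200 × 0.0005 = 21.6 minutes of downtime.

```python
# Thresholds copied from the SLR list above (milliseconds).
RTT_LIMITS = {"session": 1000, "query": 300, "transform": 500}


def downtime_budget_minutes(availability_pct: float, days: int = 30) -> float:
    """Maximum downtime per month allowed by an availability target."""
    return days * 24 * 60 * (1 - availability_pct / 100)


def rtt_percentile(rtts_ms, pct: int = 90):
    """Nearest-rank percentile: the smallest value covering pct% of samples."""
    ordered = sorted(rtts_ms)
    rank = max(1, -(-pct * len(ordered) // 100))  # integer ceiling
    return ordered[rank - 1]


def meets_rtt_slr(command_type: str, rtts_ms) -> bool:
    """True if at least 90% of commands completed within the limit."""
    return rtt_percentile(rtts_ms, 90) <= RTT_LIMITS[command_type]
```

For example, ten session commands where nine take 100 ms and one takes 5000 ms still meet the SLR, because the single slow command falls in the permitted 10%.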

4. HARDWARE AND SOFTWARE SPECIFICATION

4.1. Database servers

The SRS database functions run on Oracle Database on the Sun server platform, in the following hardware and software configuration:

Moscow node
- Sun SPARC Enterprise M5000 Server
- 2.66 GHz SPARC64 VII quad-core processors
- 16 x 2 GB DDR2 DIMMs
- 2 x 146 GB SAS HDDs
- 2 x Gigabit Ethernet
- 2 PSUs with N+N redundancy
- Fibre Channel: Sun StorageTek PCI-E, 40 Gb
- Solaris 10 (SunOS 5.10)
- Oracle Database 11g Enterprise Edition

St.Petersburg node
- Sun SPARC Enterprise M4000 Server
- 2.4 GHz SPARC64 VII quad-core processors
- 8 x 2 GB DDR2 DIMMs
- 2 x 146 GB SAS HDDs
- 2 x Gigabit Ethernet
- 2 PSUs with N+N redundancy
- Fibre Channel: Sun StorageTek PCI-E, 40 Gb
- Solaris 10 (SunOS 5.10)
- Oracle Database 11g Enterprise Edition

Database capabilities are described in detail in the answer to Evaluation Question #33.

4.2. SRS Data storage

The data storage system is one of the critical elements of the SRS nodes. RAID storage is used at each location to provide the required disk space and transaction speed and to prevent data loss. This industry-standard storage is connected to the database and application servers. In Frankfurt, EMC storage holds the registry database backups and the archived redo logs.

In addition to RAID, daily backups are performed at both SRS nodes to prevent data loss in the unlikely event of a catastrophe.

- Moscow: EMC VNX5100 (16 x 600 GB FC disks, RAID 10);
- St.Petersburg: EMC CLARiiON CX4-120 (15 x 300 GB FC disks, RAID 10);
- Frankfurt: EMC CLARiiON CX4-120 (11 x 1000 GB SATA II disks, RAID 5).
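Usable capacity follows from the RAID level. RAID 10 mirrors disk pairs, so half the raw capacity is usable. RAID 5 gives up one disk's worth of space to parity. The sketch below applies these rules to the disk counts listed above, using nominal decimal gigabytes and ignoring formatting and hot-spare overhead. It rejects odd RAID 10 disk counts rather than guessing how the 15th disk at St.Petersburg is used. The function name is hypothetical.

```python
def usable_capacity_gb(disks: int, disk_gb: int, level: str) -> int:
    """Nominal usable capacity of a single RAID group."""
    if level == "RAID 10":
        if disks < 2 or disks % 2:
            raise ValueError("RAID 10 needs an even number of disks (at least 2)")
        return disks // 2 * disk_gb  # every block is stored twice
    if level == "RAID 5":
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (disks - 1) * disk_gb  # one disk's worth of parity
    raise ValueError(f"unsupported RAID level: {level}")
```

With these rules, Moscow's 16 x 600 GB RAID 10 group yields 4,800 GB, and Frankfurt's 11 x 1000 GB RAID 5 group yields 10,000 GB.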

To monitor critical functions, monitoring software tracks the general condition of the storage, including disk health, free space, and memory and processor utilization.

This storage configuration provides the necessary operational capacity and prevents data loss.

4.3. Application servers

Registrars access registration services via EPP and via a Web interface over HTTPS, both served by load-balanced front-end Application servers. Because all Application servers share the same hardware and software configuration, more servers can be added without interrupting service. EPP, as described in the answer to Evaluation Question #25, is used in the SRS for registrars' domain name registration functions. The HTTPS-secured Web interface gives registrars access to statistics, billing information and account status. The Registry Operator publishes service information for registrars on dedicated service pages within its own Web site. The Application servers run on an Intel platform with the following hardware and software configuration:
- Intel SR1625UR 1U server: 2 CPUs, 4 GB RAM, 2 x 140 GB HDDs in RAID 1
- Red Hat Enterprise Linux
- Oracle WebLogic 10.3 middleware

4.4. Version control

To maintain the required service parameters, all SRS components are updated from a single center. Software version control for the database and application servers relies on a source code repository running on Oracle, together with a set of instructions and scripts to install or roll back software modules and to manage source files in line with ISO 27005 recommendations. Detailed version control procedures are described in the answer to Evaluation Question #39, "Registry Continuity".
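Install and roll-back scripts depend on knowing which version of each module is live and which version preceded it. The sketch below keeps that history as a per-module stack. The class, its method names and the module names used with it are hypothetical; the RSP's actual repository and scripts are not described in this answer.

```python
from __future__ import annotations


class ModuleDeployments:
    """Per-module install history so that a release can be rolled back."""

    def __init__(self) -> None:
        self._history: dict[str, list[str]] = {}

    def install(self, module: str, version: str) -> None:
        """Record a newly installed version as the live one."""
        self._history.setdefault(module, []).append(version)

    def current(self, module: str) -> str | None:
        """Live version of a module, or None if it was never installed."""
        versions = self._history.get(module)
        return versions[-1] if versions else None

    def roll_back(self, module: str) -> str:
        """Drop the live version and return the one that becomes live again."""
        versions = self._history.get(module, [])
        if len(versions) < 2:
            raise RuntimeError(f"no earlier version of {module} to roll back to")
        versions.pop()
        return versions[-1]
```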

5. NETWORK AND OTHER EQUIPMENT. IP-CONNECTIVITY

The SRS network design described below has proven well suited to the task and is also used for many other critical registry functions: each node has a pair of routers and switches connected to at least two network connections with non-overlapping physical paths and diverse routing. The SRS upstream providers supply DDoS-filtered connections.

The following network equipment and load balancing system is in use:
- A pair of Cisco 7600 series routers is installed at each SRS node. These routers perform routing, switching, load balancing and failover, and enforce access policies such as IP access lists, port filtering and other firewall functions.
- At each SRS facility, the router pair runs the Hot Standby Router Protocol (HSRP), which provides automatic switchover if either router goes offline. Cisco Server Load Balancing (SLB) balances the load across the Application servers. If an Application server fails two probe connections, SLB takes the server out of service and sends an alarm to the Monitoring Duty Operators. Traffic is distributed using a simple round-robin scheme.
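The load-balancing behavior described above can be modelled in a few lines: round-robin selection over in-service servers, removal after two failed probes, and an alarm callback for the duty operators. The sketch below assumes the two failures must be consecutive, which the text does not state. It models the described behavior only and is not Cisco IOS SLB configuration. Class and method names are hypothetical.

```python
class RoundRobinPool:
    """Round-robin balancer that removes a server after FAIL_LIMIT failed probes."""

    FAIL_LIMIT = 2  # assumption: two consecutive failed probe connections

    def __init__(self, servers, alarm):
        self.servers = list(servers)
        self.failures = {s: 0 for s in self.servers}
        self.in_service = set(self.servers)
        self.alarm = alarm  # callback that notifies the Monitoring Duty Operators
        self._next = 0

    def record_probe(self, server, ok: bool) -> None:
        """Update probe state; take the server out of rotation on repeated failure."""
        if ok:
            self.failures[server] = 0
            return
        self.failures[server] += 1
        if self.failures[server] >= self.FAIL_LIMIT and server in self.in_service:
            self.in_service.discard(server)
            self.alarm(server)

    def restore(self, server) -> None:
        """Operator brings a server back online."""
        self.failures[server] = 0
        self.in_service.add(server)

    def pick(self):
        """Next in-service server in round-robin order, skipping removed ones."""
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server in self.in_service:
                return server
        raise RuntimeError("no Application server in service")
```

Usage: with three servers, four picks cycle app1, app2, app3, app1. After app2 fails two probes, it is skipped and an alarm is raised for it.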

Each SRS node has console routers connected to the network and server equipment via serial ports. This allows commands to be executed remotely from the command line for control purposes, such as starting and stopping services. Application management and monitoring traffic runs over secure VLAN connections.

PDUs (power distribution units) at the SRS facilities allow equipment to be power-cycled remotely. The PDUs are connected via serial ports and can also be managed over Ethernet.

The SRS node network infrastructure is also used by other critical registry elements, such as the hidden primary DNS server, the DNSSEC signing server, the WHOIS server and the NTP server.

All servers and their applications are connected over a number of VLANs according to the security policy, as described in the answer to Evaluation Question #30 (b).

The Application servers are available to registrars over both IPv4 and IPv6. Anycast is used to provide secure, attack-resistant access.

The SRS uses the following Internet connections:
- 2 x Ethernet 100 Mbit/s, Internet Exchange (SPB-IX), St.Petersburg
- 2 x Ethernet 1000 Mbit/s, Internet Exchange (MSK-IX), Moscow
- Ethernet 1000 Mbit/s, Internet (MAP), Moscow
- Ethernet 1000 Mbit/s, Internet (RELARN), Moscow
- Ethernet 1000 Mbit/s, Internet (RETN), St.Petersburg
- Ethernet 100 Mbit/s, Internet (Relcom-SPB), St.Petersburg
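Summing the links listed above gives the nominal Internet-facing port capacity per city: 4,000 Mbit/s in Moscow and 1,300 Mbit/s in St.Petersburg. The sketch below reproduces that arithmetic. These are port speeds, not measured throughput, and they exclude the private replication links. The variable and function names are illustrative.

```python
# (number of links, Mbit/s per link, city), taken from the list above
LINKS = [
    (2, 100, "St.Petersburg"),   # SPB-IX
    (2, 1000, "Moscow"),         # MSK-IX
    (1, 1000, "Moscow"),         # MAP
    (1, 1000, "Moscow"),         # RELARN
    (1, 1000, "St.Petersburg"),  # RETN
    (1, 100, "St.Petersburg"),   # Relcom-SPB
]


def nominal_capacity_by_city(links):
    """Sum nominal port speeds (Mbit/s) per city."""
    totals = {}
    for count, mbps, city in links:
        totals[city] = totals.get(city, 0) + count * mbps
    return totals
```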

In addition, two private Gigabit Ethernet connections, over separate fibers from different operators along different paths, carry live database replication traffic.

6. RELATIONS TO OTHER SYSTEMS

The SRS is the primary functional unit of the registry. It is the source of data for the other registry components, such as the zone file, WHOIS data, statistics and reports, billing data and internal service data. The SRS connections to other systems are shown in Figure Q24_SRSConnections.

Data from the database servers is transferred to the hidden primary DNS servers, where it makes up the zone file for the .SKOLKOVO TLD, as described in the answer to Evaluation Question #35.

The SRS is the data source for the RDDS (WHOIS) service, as described in the answer to Evaluation Question #26.

The daily data exchange with the data escrow provider uses database information from the SRS, as described in the answer to Evaluation Question #38.

The primary database server generates the Oracle archived redo logs and delivers them to the remote location in Frankfurt, Germany.

The SRS primary database is also the source of statistics and billing data for registrars.

7. DATA PROTECTION

An accredited registrar is responsible for the accuracy and format of the data it transmits to the SRS. However, if a registrar damages its data, the .SKOLKOVO Registry Operator will make every effort to recover it, since the .SKOLKOVO SRS architecture allows data to be rolled back in accordance with the registry continuity parameters set out in the answer to Evaluation Question #39.

8. SECURITY AND ACCESS CONTROL

To prevent unauthorized access to, and modification of, SRS data by service personnel, the following procedures are in place:
- Access control rules;
- Logging of commands executed in SRS applications;
- Regular audits of staff operations and procedures;
- One-time passwords for staff engaged in system management;
- Continuous intrusion monitoring.
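This answer does not say which one-time password scheme is used. As one standard option, the sketch below implements HOTP from RFC 4226: HMAC-SHA1 over an 8-byte counter, dynamic truncation, then reduction to six digits. It is a swapped-in illustration, not a description of the RSP's system. Its output can be checked against the test vectors published in RFC 4226.

```python
import hashlib
import hmac
import struct


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) using HMAC-SHA1."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation: low 4 bits pick the offset
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)
```

Each successful login advances the counter, so a captured password cannot be reused.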

Upstream providers filter DDoS packets out of incoming traffic. Cisco firewalls, Cisco Guard and Cisco Detector protect the SRS from external attacks.

The Security policy for SRS is described in the answer to Evaluation Question #30 (a). Security methods are described in the answer to Evaluation Question #30 (b).

9. SRS MONITORING AND MAINTENANCE

The RSP's Monitoring Department monitors SRS performance 24x7x365. The parameters of all SRS elements are displayed on a monitoring dashboard showing each element's load, storage capacity, temperature, online/offline status and downtime. Active monitoring, a set of probes and executable procedures, generates emergency reports if any critical SRS element stops functioning or its functional parameters degrade. A trouble ticket management system and an escalation matrix are used in emergencies, as described in the answer to Evaluation Question #42.
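The active-monitoring step can be sketched as a threshold check that turns one element's metrics into emergency report lines. The threshold values and metric names below are hypothetical placeholders, because this answer does not give the RSP's actual limits.

```python
# Hypothetical thresholds; the RSP's real limits are not stated in this answer.
THRESHOLDS = {
    "cpu_util_pct": 90.0,
    "mem_util_pct": 90.0,
    "storage_used_pct": 85.0,
    "temperature_c": 40.0,
}


def evaluate(element: str, metrics: dict, online: bool) -> list:
    """Return emergency report lines for one SRS element (empty list = healthy)."""
    reports = []
    if not online:
        reports.append(f"{element}: OFFLINE")
    for name, limit in THRESHOLDS.items():
        value = metrics.get(name)
        if value is not None and value > limit:
            reports.append(f"{element}: {name}={value} exceeds {limit}")
    return reports
```

Any non-empty result would open a trouble ticket and enter the escalation matrix.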

The RSP reviews the SRS once a month. Each review produces SRS modification plans and element upgrades to meet changing requirements, such as network capacity, processor power and memory. Input from registrars is highly valued for improving the system and enhancing service quality. SRS elements also undergo failover testing, as described in the answer to Evaluation Question #41.

10. SRS SCALABILITY

The planned capacity will be sufficient for the .SKOLKOVO Registry Operator in the long term, since its agreement with the Registry Service Provider provides that .SKOLKOVO will not use more than 10% of the total SRS capacity.
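Applied to the capacity figures in Section 3, the 10% cap gives .SKOLKOVO up to 100 EPP requests per second and 2,000 concurrent EPP sessions. The sketch below computes the share with integer arithmetic and checks a projected peak against it. The peak figures passed to within_allocation are hypothetical inputs, not forecasts from this application.

```python
TOTAL_EPP_REQUESTS_PER_SEC = 1_000  # from Section 3
TOTAL_EPP_SESSIONS = 20_000         # from Section 3
SKOLKOVO_SHARE_PCT = 10             # cap agreed with the RSP


def allocation(total: int, share_pct: int = SKOLKOVO_SHARE_PCT) -> int:
    """Integer share of a capacity figure (avoids float rounding)."""
    return total * share_pct // 100


def within_allocation(peak_requests_per_sec: int, peak_sessions: int) -> bool:
    """True if a projected .SKOLKOVO peak stays inside the agreed share."""
    return (peak_requests_per_sec <= allocation(TOTAL_EPP_REQUESTS_PER_SEC)
            and peak_sessions <= allocation(TOTAL_EPP_SESSIONS))
```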

The SRS configuration can be upgraded without service interruption, since every service and network element is duplicated.

The SRS database is designed for clustering, i.e., a single database element can be replaced with a cluster of several.

Hardware is scaled by adding front-end and back-end servers, which can be added at any time without interrupting service. CPUs and memory can be added to the database servers with a short service outage.

11. PERSONNEL

The following staff roles are engaged in development, initial implementation and maintenance for the SRS service:

11.1. Initial implementation and ongoing maintenance
Most of the architectural elements, such as networks and servers, are already in place, so only minimal implementation work is required.

- Applied Systems Administrator (Registry Services Department, RS Support Group): System administration of Application servers.
- Applied Systems Engineer (Registry Services Department, RS Support Group): Installation of server equipment and operating systems.
- Database Administrator (Registry Services Department, RS Support Group): Installation of the SRS database, database administration, application performance tuning, system administration of the database servers, post-implementation support, database performance tuning.
- Data Storage System Administrator (Registry Services Department, RS Support Group): Pre- and post-implementation support for RAID data storage.
- Network Engineer (Networking Department, IP-Network Group): Installation and maintenance of network equipment, IP numbering schemas design, installation and maintenance of communication channels, communication with colocation providers regarding IP-network issues.
- IT-Security Engineer (IT-Security Department): Development and review of the Network Security Plan.
- Project Manager (Quality Assurance Department): Provide direction to technology teams in defining and prioritizing features, enhancements, platform improvements and customer requests.

11.2. Development

- Application Developer (Registry Services Department, R&D Group): Design and implementation of the SRS and its security solutions; pre- and post-implementation support.
- Database Developer (Registry Services Department, R&D Group): Design and implementation of the SRS database; pre- and post-implementation support; design and implementation of system monitoring; management of schema objects such as tables, stored procedures, functions, triggers, packages and views; building of queries.
- System Testing Engineer (Registry Services Department, Testing and Versioning Group): Testing and version control.

12. CONCLUSION

The RSP's co-active SRS architecture meets the most rigorous requirements for building critical infrastructure. Every SRS element is duplicated, including servers, network equipment and channels. Critical data are replicated onto RAID storage at both the database and filesystem levels.

The backup location in Germany stores backups of the primary database, allowing the registry database to be restored to any point in time within the preceding month. Six Application servers maintain connections to the primary database and provide Anycast access to the EPP, billing and statistics services. More Application servers can be added without stopping service.

Database capacity can be upgraded without compromising the ultra-high continuity parameters. The SRS service levels meet the service levels set by ICANN.
