Network Improvement Method Road to 5 9’s • Establish a standard measurement method • Define business goals as related to metrics • Categorize failures, root causes, and improvements • Take action for root cause resolution and improvement implementation
What Is “High Availability”? • The ability to define, achieve, and sustain “target availability objectives” across services and/or technologies supported in the network that align with the objectives of the business (e.g., 99.9%, 99.99%, 99.999% availability)
Why Improve Network Availability? Recent Studies by Sage Research Determined That US-Based Service Providers Encountered: • 44% of downtime is unscheduled • 18% of customers experience over 100 hours of unscheduled downtime, i.e., an availability of 98.5% • Average cost of network downtime per year: $21.6 million, or $2,169 per minute!
Downtime: Costs Too Much!!! SOURCE: Sage Research, IP Service Provider Downtime Study: Analysis of Downtime Causes, Costs and Containment Strategies, August 17, 2001, Prepared for Cisco SPLOB NMS-2T20 9594_04_2004_c2
Netrac Base Package: Security and Administration
Graphical Reports
Views
Netrac Integrated GUI: Customer Service Orientation (CNM)
Netrac
Alarm Screen
Applications Service Management Service eView Service Monitoring
Fault
Performance
NeTkT
PMM
Trouble Ticketing
Performance Analysis and Trends
Correlator+ (Optional): Advanced Correlation and Root Cause Analysis; Fault Mgmt.; Alarm Surveillance
NetCAP Planning
NetCAP Configuration, NetCAP Provisioning (Engineering, Inventory, Work Order): Service Views, Impact, Topology, Assign/Service Def., Design, Sync, Activate
Asset Mgmt.
Change Mgmt.
CDR Analysis CallExpert CDR Analysis and Reports
Mediations Device Expert
Netrac APIs to Other NMSs: BellSouth
Circuit Diversity • Problem: if links follow a common path through the service provider network, you are back to a single point of failure • Solution: employ as much circuit diversity as possible Links Terminate at Different Devices
Links Use Different Paths in SP Network
(Physical Diversity)
(Geographic Diversity)
Enterprise
Service Provider Diversity?
Configuration/Change What Are the Time Bombs? • No technical ownership • Large failure domains • Layer (II/III) design • Loose or non risk-aware change management • High levels of network inconsistency • Lack of network standards (SW, HW, config) • No capacity planning or performance management
Configuration/Change MTTR―Mean Time to Repair
• No identified tiered support mechanism with individuals who know and understand the network (lack of expertise)
• Poor documentation (topology and config)
• Large failure domain difficult to understand and determine root cause
• Networks with control-plane resource issues require major topology, config and upgrade changes
Resource Utilization What Happens when Networks Fail? • Resource constraints CPU/memory Inability to process messages Inability to process routing updates Routing or bridging loops
Calculated Availability • Calculated availability based on network design, component MTBF and MTTR • MTBF = Mean Time Between Failure Calculated by measuring the average time between failures on a device
• MTTR = Mean Time To Repair The time between when the device/network broke and when it was brought back into service
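As a quick sketch of the arithmetic behind these definitions (the function names and example MTBF/MTTR figures are illustrative, not from any Cisco tool):

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability implied by MTBF and MTTR."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

def downtime_minutes_per_year(avail: float) -> float:
    """Expected annual downtime implied by an availability figure."""
    return (1 - avail) * 365 * 24 * 60

# A device with a 50,000-hour MTBF and a 4-hour MTTR:
a = availability(50_000, 4)
print(f"availability = {a:.6f}")   # ~0.999920
print(f"downtime/yr  = {downtime_minutes_per_year(a):.0f} min")
# Five nines, for comparison, allows only about 5.3 minutes per year:
print(f"{downtime_minutes_per_year(0.99999):.1f} min/yr")
```

This makes the later key point concrete: availability rises either by stretching MTBF or by shrinking MTTR.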
Cisco Internal Tools: Calculated Availability Contact Your Sales Team for Quality Data • MTBF query tool
MTBF for components can be requested from Cisco User enters part number/product family and predicted MTBF is provided A system is a chassis populated with Field Replaceable Units (FRU) and software
• NARC: Network Availability and Reliability
Calculation Excel spreadsheet; calculates availability/downtime for a system/network given MTBF and MTTR
Calculated Availability Key Points • Carried out at design time • Availability can be increased by decreasing MTTR, increasing MTBF, or both • If the service availability target is 99.999%, calculated availability must be better than 99.999% Customer experience shows field MTBF can typically be 2x the listed MTBF; this may not necessarily be a good thing
• Series components reduce availability, parallel (redundant) components increase availability • Complex networks require modelling tools to calculate engineered availability • Core networks are designed for high availability; i.e., need to be 99.999% available with any single point of failure
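The series/parallel rule above can be sketched numerically (the per-component availability of 0.999 is a hypothetical example):

```python
from math import prod

def series(avails):
    """Series: all components must be up, so availabilities multiply."""
    return prod(avails)

def parallel(avails):
    """Parallel (redundant): up unless every component is down."""
    return 1 - prod(1 - a for a in avails)

a = 0.999  # three nines per component
print(f"two in series:   {series([a, a]):.6f}")    # ~0.998001, worse
print(f"two in parallel: {parallel([a, a]):.6f}")  # ~0.999999, better
```

Two components in series lose nearly a nine; the same two in parallel gain three.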
Availability Measurement Methodologies • Ping (network availability, device availability) • Service assurance agent • Trouble ticket reporting DPM: Defects Per Million A defect may be one user/customer down for one minute or one hour IUM: Impacted User Minutes Number of users affected multiplied by the minutes they were impacted
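One common formulation of the DPM metric above, sketched with hypothetical numbers (the scaling against total user minutes is an assumption; exact definitions vary by organization):

```python
def dpm(impacted_user_minutes: float, users: int, period_minutes: float) -> float:
    """Defects per million: impacted user minutes scaled against
    the total user minutes available in the period."""
    total_user_minutes = users * period_minutes
    return impacted_user_minutes / total_user_minutes * 1_000_000

# 500 users down for 30 minutes, out of 10,000 users over a 30-day month:
month = 30 * 24 * 60
ium = 500 * 30
print(f"IUM = {ium}, DPM = {dpm(ium, 10_000, month):.1f}")
```

Note that availability and DPM are two views of the same ratio: availability = 1 - DPM/1,000,000.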
Creating an HA Culture People • Executive messaging: communicate business plans for high availability and the importance of improvement • Reward positive behavior • Provide world-class training to staff • Create a cross-functional technical team and availability champion
Process
• Identify and resolve process deficiencies
• Start an availability improvement quality process
• Root-cause analysis
• Collect and report availability metrics
Tools • Availability measurement • Processes for consistency (automate where possible) • Metrics for identifying areas of service improvement
Change Management • Change management refers to the consistent process of successfully managing change within an organization • Process includes: Change controller Change documentation requirements Risk level assignment Validation and approval procedures Change meetings Emergency change procedures Post mortem review and root-cause Document change output requirements Change management system and metrics
New Solution Deployment • The biggest challenge is minimizing the impact on the existing networking environment • Success requires structured processes that include resources from planning, design, network management, and implementation
Configuration Management • Collection of processes and tools to: Promote network consistency Provide up to date network documentation Asset management
• Benefits Lower support costs Lower network costs due to device, circuit, and user tracking tools and processes that identify unused network components Improved network availability due to improved time to resolve problems (MTTR)
Software Version Control The Process of Software Version Control Is Critical to Software Consistency and Overall Software Reliability! • Publish and communicate certified device software version standards for identified software tracks • Quality gates during implementation process • Scheduled periodic audits to ensure network is in sync with the certified standard • Utilize tools to identify, track and sort software versions
Maintain Documentation • Current device, link, and end-user inventory • Configuration version control system • Software version control • TACACS configuration log • Network topology
Baselining • How does the network behave normally? • CPU, memory, backplane, buffers, link utilization • Collect data (show commands, performance data) • Determine non-normal thresholds
Develop a Capacity Planning Strategy, Including Common Techniques, Tools, MIB Variables, and Thresholds Used for Capacity Planning
Baselining and Exception Management • Alert mechanisms for performance exceptions • Create trouble ticket to track proactive issues • Investigate and make recommendations accordingly
What-If Analysis What-If Analysis Centers around Network Change and How the Change Affects the Environment • Identify higher risk changes • Determine potential resource issues (CPU, memory, buffer, backplane, link util, device resources) • Ask questions • If possible, take it to the lab • If possible, slow start implementation and measure key resource areas
Fault Management • Fault management Process of identifying faults through the use of network management toolsets NMS architecture design and resiliency Syslog collection, monitoring and analysis SNMP trap collection and notification Exception reporting and analysis
Fault Management Architecture • NMS stations: Centralized vs. distributed architecture Located close to the network core Adequate bandwidth and separation from other services Redundant hardware/network connectivity
• NMS UPS (Uninterruptible Power Supply) All NMS systems should be protected against power failures
SNMP Trap Collection and Notification The Collection and Notification of SNMP Traps Is Essential to Rapid Identification and Resolution • SNMP trap collection SNMP traps include generic traps and platform or technology specific traps Traps must be properly and consistently configured on all network devices as well as the network management systems
• SNMP trap notification NMS systems should notify and alert when a trap has been received
Syslog • Collection Establish a centralized system to log all device messages Implement consistent Syslog server and logging configurations on all network devices
• Monitoring A tool or script that parses Syslog files for pre-determined messages and sends real time alerts or notifications to an event management system
• Analysis Periodic review and analysis of Syslog data should be performed daily
Problem Management • Problem tracking systems Allows the organization to document, track and report on infrastructure technology problems Reactive/proactive issues
• Priority and escalation procedures Help to ensure that business-impacting issues are assigned a priority and quickly escalated to support groups that can resolve the issue
• Tiered operations structure The network support structure should allow ample resources for problem resolution, proactive analysis, specialty areas, and escalation
Tier 1:
• Client liaison
• Recording problems
• Closing problems
• Initial problem determination
• Problem classification
• Problem escalation to tier 2
• Problem resolution for non-complex problems, end user issues
• Management reports for case and availability reporting

Tier 2:
• Help desk mentoring
• Help desk support for complex problems
• Recording network incidents
• Resolving network incidents
• Implementing configuration and network system engineering changes
• Feedback to the design team on operational issues
• Problem resolution and network solution documentation
• Network site implementation documentation
• Escalation to tier 3

Tier 3:
• Network design
• Network instrumentation for management
• Complex problem resolution
• Complex problem identification and isolation
• Network monitoring
• Network tuning
• Network troubleshooting
• Network capacity
• Network standards
• Network documentation integration
• Vendor problem resolution
• Vendor problem reporting
• Vendor SLA reporting
Problem Management Problem Priority Definitions
• Urgent (P1): Severe business impact; Site Service Outages (loss of service or outage at a location)
• High (P2): High business impact; Site Service Impairments (degradation, possible workaround exists; service impairment)
• Medium (P3): Minimal business impact; Client Service Problems (some specific network functionality is lost; loss of redundancy)
• Low (P4): No business impact; Client Admin and Change Requests
Preparation, Prevention and Response Security Basics for High Availability Networks • Preparation Create usage policy statements Conduct a risk analysis Establish a security team structure
• Prevention Approving security changes Monitoring security of your network
Disaster Recovery • A disaster recovery plan covers The hardware and software required to run critical business applications The associated processes to transition smoothly in the event of a disaster
• Assess your mission-critical business processes and associated applications before creating the full disaster recovery plan • Critical steps for best-practice disaster recovery: Disaster recovery planning Resiliency and backup services Vendor support services
Resiliency and Backup Services • Resiliency and backup services are a key part of disaster recovery • Cisco defines network resiliency as the ability to recover from any network failure or issue whether it is related to a disaster, link, hardware, design, or network services • A HA network design is often the foundation for disaster recovery and might handle some minor or local disasters • Key tasks for resiliency planning and backup services include the following: Assess the resiliency of your network, identify gaps and risks Review your current backup services Implement network resiliency and backup services
Vendor Support Services • Having support services from your major vendors in place adds a strong value to disaster recovery planning • For example, specific managed hot standby sites or on-site services with rapid response times can significantly ease disaster recovery • Key questions regarding vendor support include: Are support contracts in place? Has the disaster recovery plan been reviewed by the vendors, and are the vendors included in the escalation processes? Does the vendor have sufficient resources to support the disaster recovery?
• Most vendors have experience handling disaster situations and can offer additional support
Global Server Load Balancing, Stateful NAT, Stateful IPSec, DNS, DHCP, Cisco Server Load Balancing, IP QoS HSRP, VRRP, GLBP, MPLS-TE, IP Event Dampening, Graceful Restart (GR) in BGP, OSPF NSF, ISIS NSF, IP QoS
Networking Transport Evolution Enterprise Scenario • Traffic originating in the Enterprise network is transported using: Ethernet/Fast Ethernet/Gigabit Ethernet for a majority of local area networks ATM and Frame Relay for WAN connectivity Metro Ethernet/Metro Optical DPT/RPR, ATM over SONET, Packet over SONET, etc. MPLS/IPSec
Networking Transport Evolution Service Provider Scenario • Traffic originating on service provider network backbone include Circuit-based like TDM voice and fax Packet-based like IP Cell-based like ATM or Frame Relay
• Majority of traffic is transported over SONET/SDH • Explosive growth of data compared to voice: POS • Scalable technologies like DPT/RPR use SONET/SDH framing and infrastructure: Metro and access networks • DWDM provides scalable solutions to prevent fiber exhaustion: Metro and long haul networks
EtherChannel Protocol • A logical aggregation of similar links (up to 8): 10/100/1000/10GE ports • Operates between switches, routers, and certain vendors’ NICs
Configuring EtherChannel On a Catalyst® 6000:
Console> (enable) set port channel 2/2-8 mode desirable
Ports 2/2-8 left admin_group 1.
Ports 2/2-8 joined admin_group 2.
Console> (enable)
On a Cisco 7500:
Router(config)# interface port-channel 1
Router(config-if)# ip address 10.0.0.1 255.255.255.0
Router(config-if)# ip route-cache distributed
Router(config-if)# interface fasteth 0/0
Router(config-if)# no ip address
Router(config-if)# channel-group 1
Router(config-if)# interface fasteth 0/1
Router(config-if)# no ip address
Router(config-if)# channel-group 1
FastEthernet 0/1 added as member-2 to fechannel1
Spanning Tree Extensions • Extensions decrease STP convergence time • PortFast for access ports (Link4) bypasses listening-learning phases • UplinkFast for direct root link failure (Link2): about 3 to 5 seconds convergence
(Diagram: Root Bridge with Links 1, 2, 3; one redundant port marked X Blocked)
• BackboneFast for indirect link failure (Link1): cuts convergence time by Max_Age seconds
IEEE 802.1w: Rapid Spanning Tree • Takes advantage of today’s topologies (full-duplex point-to-point links) • Remarkably similar to UplinkFast/BackboneFast • No more network-wide timers when all switches run 802.1w • Handshake mechanism between bridges
(Diagram: Proposal/Agreement handshake cascading down from the Root, steps 1 through 4)
• Proposal-Agreement messaging (“I want to become designated: do you agree?”) • Can achieve 1+ second of convergence
Resilient Packet Ring (RPR) Standard • RPR is a layer 2 transport architecture Based on dual counter-rotating ring architecture Uses the best of Ethernet and SONET/SDH Uses SRP-fairness algorithm
• Standards-based on IEEE 802.17 RPR Protocol Draft • IEEE 802.17 is based on Cisco’s SRP (RFC 2892) • Supported on high-end devices • DPT/RPR name used interchangeably • Cisco is committed to SRP and IEEE standards Related Session: OPT 2043 802.17 and Spatial Reuse Protocol (SRP) Protocols
• A pair of dark fiber strands • A pair of DWDM derived wavelengths • A SDH add-drop STM-n circuit • Any combination of the above segments
SF: Signal Fail based on PHY-sensed link failure or keepalive failure SD: Signal Degrade based on PHY-sensed link degradation condition
• Manual FS: Forced Switch initiated by the user MS: Manual Switch initiated by the user
• Detection delay L1 Holdoff: Used to delay the protection response to a PHY-sensed failure (0 to 200 ms) Keepalive Timer: Used to determine the duration of keepalive loss before a protection condition is raised (2 to 200 ms); keepalive frames are also fairness updates and are transmitted approx. every 100 ms
802.17 Protocol: Protection Steering • This protection mechanism requires all stations to exchange protection details, flush the existing queues (for strict traffic) and recalculate the new traffic path prior to completing the protection event (Diagram: Flush + Recalculate at S1)
physical-layer restoration • Restoration of failure within 50 ms • Physical state is communicated to L3 • Available on SONET/SDH line cards on routers • K1/K2 link-layer control information of line overhead (LOH) frame • Two types of APS Single router APS Multi-router APS
Multi-Router APS
• The major benefit of multi-router APS is protection against fiber faults, linecard faults and even complete router failures
• Usually the working port is configured on one router and the protect port is configured on a different router
• Supported on Cisco high-end routing platforms
• Multi-Router APS is a hybrid which depends partially on APS switching and partially on layer 3 routing to direct the flow of packets
• The two routers communicate control information using the protect group protocol
Anatomy of an MR-APS Switchover Due to LOS Detected by the Working Router
(Diagram: CE Router (West) connects through ADM#1/ADM#2 on a SONET ring to PE Router#1 (working) and Router#2 (protect); a PGP channel links the PE routers; K1/K2 bytes carry the protect switching signaling; PE Routers (East) and CE Routers (East) on the far side)
1. Initially packets are routed over the working lines, which are active
2. PE Router#1 detects LOS on the received working line and starts to bring the interface down
3. Working router sends a PGP “State Change” message to the protect router
4. Protect router signals a switch-to-protect request to the ADM using K1/K2 bytes
5. ADM selects the protect line and sends a K1/K2 response back to the protect router
6. Router selects the protect line and sends a PGP “Working Disable” message to the working router
7. Working router deselects the working line
8. After the routers reconverge, packets get routed over the newly active protect lines
Protect Group Protocol
• Protect Group Protocol: Proprietary protocol sent as UDP packets (port 172) between routers with MR-APS
• Messages are retransmitted if no reply or Ack
• PGP Hellos are sent at regular intervals
• Authenticated by a configurable authentication string sent with messages
• Supports protocol versioning
• Switching may occur due to LC/router crash, signal degradation, LOS (SF), manual switch
GLBP Entities (Definitions) • GLBP Group A GLBP group consists of one or more GLBP gateways configured with the same GLBP group number
• GLBP Gateway A gateway or router running the Gateway Load Balancing Protocol; it may participate in one or more GLBP groups
• Virtual IP Address (vIP) An IPv4 address or IPv6 prefix; this is the IP address used as the hosts’ default gateway
• Virtual MAC Address A MAC address that a host may receive when it issues an address resolution request for the virtual IP address; there MAY be multiple virtual MAC addresses for each GLBP group
GLBP Entities (Definitions) (Cont.) • Active Virtual Gateway (AVG) One Virtual Gateway in a GLBP group is elected Active Virtual Gateway (AVG), and is responsible for operation of the protocol, i.e. allocating MAC addresses
• Active Virtual Forwarder (AVF) One Virtual Forwarder in a GLBP group is elected the Active Virtual Forwarder (AVF), and is responsible for forwarding packets sent to a particular virtual MAC address; there may be multiple Active Virtual Forwarders in a GLBP group
• Secondary Virtual Forwarder (SVF) A Virtual Forwarder that has learned the virtual MAC address from a Hello message
GLBP • GLBP routers function as one virtual router sharing one virtual IP address but using multiple virtual MAC addresses to forward traffic GLBP uses multicast to communicate between GLBP members, with the following detail: 224.0.0.102, UDP port 3222 Virtual MAC addresses are of the form 0007.b4yy.yyyy, where yy.yyyy is the lower 24 bits; these bits consist of 6 zero bits, 10 bits that correspond to the GLBP group number, and 8 bits that correspond to the virtual forwarder number Example: 0007.b400.0102, last 24 bits = 000000 0000000001 00000010 = GLBP group 1, forwarder 2
• Allows traffic from a single common subnet to go through multiple redundant gateways using a single virtual IP address
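The virtual MAC layout described above (6 zero bits, 10-bit group, 8-bit forwarder in the low 24 bits) can be reproduced with a short sketch; the helper name is illustrative:

```python
def glbp_vmac(group: int, forwarder: int) -> str:
    """Build the GLBP virtual MAC 0007.b4yy.yyyy from group and
    forwarder numbers: low 24 bits = 6 zero bits, 10-bit group,
    8-bit forwarder."""
    assert 0 <= group < 1024 and 0 <= forwarder < 256
    low24 = (group << 8) | forwarder      # top 6 bits remain zero
    raw = (0x0007B4 << 24) | low24        # prepend the fixed OUI prefix
    h = f"{raw:012x}"
    return f"{h[0:4]}.{h[4:8]}.{h[8:12]}"

print(glbp_vmac(1, 2))  # 0007.b400.0102, matching the slide's example
```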
GLBP Configuration Rules • Load balancing operates on a per-host basis All outbound traffic for a given host will use the same gateway
• Maximum of 4 MAC addresses per GLBP Group • Load balancing algorithm, 3 types: Round-robin Each virtual forwarder MAC takes turns Weighted Directed load determined by advertised weighting factor Host-dependent Ensures that each host is always given the same vMAC
• Default algorithm is round-robin
GLBP Implementation Issues • Four entries per GLBP group will be used in
the MAC address filter of Ethernet interfaces configured with GLBP groups This may limit the number of groups configurable on an interface that supports only a hardware MAC address filter
• Security includes MD5 authentication • Only use GLBP for layer 2 switched environments So duplicate IP addresses will not be noticed
• Be careful with other IP services NAT, IPSec, Mobile IP, HA
• IP route reachability • IP route metric threshold
• Enhanced Object Tracking is a stand-alone process that tracks objects • HSRP, GLBP and VRRP act as clients seeking services of Enhanced Object Tracking
Benefits of Enhanced Object Tracking • More options to ensure high availability • Can help verify the end-to-end path is good • Provides a scalable solution • Support for GLBP, HSRP, and VRRP
What Can I Track? • Interface “line-protocol” state Tracking process tracks the line-protocol state of the interface
• Interface “routing” state A tracked IP routing object is up when IP routing is enabled, the interface line-protocol is up and IP routing is active on the interface
• State of an IP route (reachability) A tracked IP route object is considered up and reachable when a routing table entry exists for the route and the route is reachable
• IP route metric threshold Tracks the scaled metric value of an IP route to determine if it is above or below a threshold NMS-2T20 9594_04_2004_c2
Enhanced Tracking Example Interface IP Routing Tracking Router A Configuration track 100 interface serial1/0 ip routing interface Ethernet0/0 ip address 10.1.0.21 255.255.0.0 standby 1 ip 10.1.0.1 standby 1 priority 105 standby 1 track 100 decrement 10
(Diagram: Router A (Active) and Router B (Standby) on subnet 10.1.0.0; each has uplink s1/0 and LAN interface e0/0)
• Interface IP routing will go down if: IP routing is disabled globally Interface IP address is unknown (or IP is disabled or failed to negotiate) Interface line-protocol is down
• Useful for interfaces where IP address is negotiated For example, on a serial interface that uses PPP, the line protocol could be up (LCP negotiated successfully), but IP could be down (IPCP negotiation failed)
switching performance • Multiprotocol support at layer 3 • Redundancy and load balancing Distribution switch redundancy HSRP/GLBP can be tuned to achieve 1+ second recovery!
Access • Availability, load balancing, QoS and provisioning are the important considerations at this layer • Aggregates wiring closets (access layer) and uplinks to core • Use layer 3 switching in the distribution layer • Protects core from high density peering and problems in access layer
Core Layer Detail • Redundant, fast-converging core • Choice to be made for Layer 2 or Layer 3: Layer 3 favored: Less RP neighbors Better multicast support Scales to more campus network modules
Failure Detection Tuning • Cisco IOS ® exposes some timers which can be
tuned to speed failure detection/convergence • Tweaking will not help a network that already has significant problems • Only tweak if: You have a stable, predictable network You have a lab which can provide an accurate simulation You have a backout plan
Layer 3 Failure Detection Tweaking
HSRP: Must Be the Same for All Routers in the Group!
Router(config)# int eth0
Router(config-if)# standby 10 timers 1 3

HSRP also supports subsecond timers with the msec keyword:
Router(config-if)# standby 10 timers msec 30 msec 90

OSPF: Must Be the Same for All Routers on the Subnet!
Router(config)# int eth0
Router(config-if)# ip ospf hello-interval 1
Router(config-if)# ip ospf dead-interval 3

EIGRP: Must Be the Same for All Routers on the Subnet!
Router(config)# int eth0
Router(config-if)# ip hello-interval eigrp <AS#> 1
Router(config-if)# ip hold-time eigrp <AS#> 3
Routing Protocol Optimization
• LSP throttling: provides the ability to generate LSPs quickly after failure, with exponential back-off to handle subsequent multiple failures on the router
• SPF throttling: ability to respond to changes very quickly, followed by exponential back-off to handle instabilities in the network
• Incremental SPF (ISPF): leaf nodes impacted by a failure will not cause a full SPF calculation
• Partial route computation
• Available in Cisco IOS: 12.0(24)S, 12.2(18)S,
(Diagram: F reports a new neighbor; the SPT need only be extended behind F; there is no need for router A to recompute the whole SPT; router A will compute SPF from node F)
IP Event Dampening • Prevents routing protocol churn caused by constant interface state changes • Supports all IP routing protocols Static routing, RIP, EIGRP, OSPF, IS-IS, BGP In addition, it supports HSRP and CLNS routing Applies on physical interfaces and can’t be applied on subinterfaces individually
IP Event Dampening Absorbs Link Flapping Effects on Routing Protocols (Diagram: physical vs. logical up/down state of the primary link; traffic moves to the R3 path)
IP Event Dampening: Algorithm
interface Serial 0
 dampening [half-life] [reuse suppress max-suppress] [restart <penalty>]

• Penalty: A value applied to the interface each time it flaps
• Half-life: Amount of time that must elapse without a flap to reduce the penalty by half
• Suppress: If the penalty exceeds this value, the interface is suppressed from the routing protocols’ perspective
• Reuse: If the penalty goes below this limit, the interface is reintroduced to the routing protocols
• Max-suppress: Maximum amount of time an interface can be suppressed
• Restart <penalty>: Determines the initial penalty (if any) applied to the interface when the system boots
iBGP Multi-path: BGP Behavior before iBGP Multi-path
(Diagram: R1 in AS 100 with iBGP paths via R2 and R3; AS 200 advertises 10.0.0.0/8 via R4 and R5)
• R1 has two paths for 10.0.0.0/8 • Both paths have identical <weight, AS-PATH, origin, localpref, MED>; ONLY next hops are different • R1 selects one path as best and sends all traffic for 10.0.0.0/8 towards one of the exit points • BGP installs only the best path, unlike other routing protocols!!
BGP Multi-path Review • Allows a router to install multiple paths in the RIB • Traffic will be sent to destinations on multiple paths for load balancing and efficient link utilization • Conditions for iBGP multipath selection All attributes (weight, local preference, AS-path (the entire attribute, not just the length), origin, MED, and IGP distance) are the same The next-hops of the paths are different
iBGP Multi-path R1#sh ipentry routefor 10.0.0.0 Routing 10.0.0.0/8 * 20.20.20.3, from 20.20.20.3, Route metric is 0, traffic AS Hops 1 20.20.20.2, from 20.20.20.2, Route metric is 0, traffic AS Hops 1
00:00:09 ago share count is 1 00:00:09 ago share count is 1
R1#show ip cef 10.0.0.0 10.0.0.0/8, version 237, per-destination sharing 0 packets, 0 bytes via 20.20.20.3 20.20.20.3, , 0 dependencies, recursive traffic share 1 next hop 20.20.20.3, FastEthernet0/0 via 20.20.20.3/32 valid adjacency via 20.20.20.2, 20.20.20.2, 0 dependencies, recursive traffic share 1 next hop 20.20.20.2, FastEthernet0/0 via 20.20.20.2/32 valid adjacency
• These two paths are installed in the RIB/FIB • Traffic is load-balanced across the two paths/exit points
• On PE1 [with eiBGP enabled] Traffic coming from Site 1 to Site 2 Incoming traffic is IP Hence only the FIB table will be looked at [not the LFIB] The FIB table will have the labels for iBGP path(s) only, since PE1 doesn’t have any eBGP paths [to Site 2] The iBGP multi-path portion of eiBGP multi-path comes into the picture here Hence PE1 will load-share on iBGP paths only, on the link(s) between PE1 and PE2 [iBGP] and PE1 and PE3 [iBGP]
(Diagram: Site 1 (CE1, CE3) attached to PE1; Site 2 (CE2) attached to PE2 and PE3)
Traffic coming from Site 1 to Site 2 Incoming traffic has at least one label [VPN] [MPLS traffic] Hence only the LFIB table will be looked at [not the FIB] The LFIB will have only eBGP path(s) installed Hence PE2 and PE3 will send traffic on eBGP path(s) only If iBGP paths are also installed in the LFIB, we may get into forwarding loops [e.g., between PE2 and PE3] [Because we don’t want to forward a packet received from the provider network back into it]
• On PE3 [with eiBGP enabled] Traffic coming from Site 3 to Site 2 Incoming traffic is IP Hence only the FIB table will be looked at [not the LFIB] The FIB table will have the label for iBGP path(s) and IP forwarding information for eBGP path(s) [layer 2 header, output interface] Hence PE3 will send traffic on both eBGP and iBGP path(s), on the links between PE3 and PE2 [iBGP] and PE3 and CE2 [eBGP]
(Diagram: Site 1 (CE1, CE3) attached to PE1; Site 2 (CE2) attached to PE2 and PE3)
Traffic coming from Site 3 to Site 2 Incoming traffic has at least one label [VPN] [MPLS traffic] Hence only the LFIB table will be looked at [not the FIB] The LFIB will have only eBGP path(s) installed Hence PE2 will send traffic on eBGP path(s) only If iBGP paths are also installed in the LFIB, we may get into routing loops [e.g., between PE2 and PE3] [Because we don’t want to forward a packet received from the provider network back into it]
to define explicit paths for traffic with some constraint (e.g., a path from A to B with 50 Mbps of bandwidth)
• FRR (Fast ReRoute) is a method of protecting MPLS Traffic Engineering label switched paths
• The idea is to locally repair the LSP at the point of failure by re-routing traffic over a pre-defined backup tunnel, preventing packet loss while the IGP converges
• Protects against both link and node failures
Related Session: RST-2603 Deploying MPLS Traffic Engineering
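As an illustrative sketch of what this looks like in Cisco IOS (interface names, addresses, tunnel numbers, and the explicit-path name are hypothetical): the head-end marks the primary TE tunnel for fast-reroute protection, and the point of local repair pre-builds a backup tunnel for the protected link.

```
! Head-end: primary TE tunnel requesting 50 Mbps (50000 kbps) and FRR protection
interface Tunnel1
 ip unnumbered Loopback0
 tunnel destination 10.0.0.9
 tunnel mode mpls traffic-eng
 tunnel mpls traffic-eng bandwidth 50000
 tunnel mpls traffic-eng path-option 1 explicit name PRIMARY-PATH
 tunnel mpls traffic-eng fast-reroute
!
! Point of local repair: pre-defined backup tunnel protecting this link
interface POS3/0
 mpls traffic-eng backup-path Tunnel2
```

On failure of the protected link, traffic is switched onto Tunnel2 locally while the head-end re-signals the LSP.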
SSO
• Standby RP is in "hot standby"
• Chassis and line card states are in sync
• Line protocols (ATM, Frame Relay, etc.) are in sync
• Forwarding table is in sync

RPR+
• Standby RP is in "hot boot"
• Startup and running configs are in sync
• During failover, line protocols reset
• Forwarding table is NOT in sync

RPR
• Standby RP is in "cold boot"
• Startup configs are in sync; running configs are not
• During failover, the standby resets line cards and restarts the system

NOTE: A Router Reload Forces Both Route Processors to Restart
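On dual-RP platforms the redundancy mode is selected globally; a minimal sketch (exact mode keywords vary by platform and IOS release):

```
redundancy
 mode sso
! Some platforms also accept: mode rpr / mode rpr-plus
```

`show redundancy` then displays the operating mode and the peer RP state.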
What Is Graceful Restart?
• Under certain failure conditions, when a routing process restarts it seeks the help of peer routers to re-learn routes and resume neighbor relationships, while:
Data traffic continues to be routed between the restarting router and its peers
The peers do not prematurely declare the restarting router dead
NSF/SSO Terminology
• NSF capable router (restarting router): a router that preserves its forwarding table and rebuilds its routing topology after an RP switchover; currently a dual-RP router, e.g., Cisco 7500, 10000, 12000, 7304
• NSF aware router (peer): a router that assists an NSF capable router during restart and can preserve routes reachable via the restarting router; e.g., Cisco 7200, 3600, 2600, 1700
• NSF unaware router: a router that is not capable of assisting an NSF capable router during an RP switchover
• An NSF capable router is NSF aware too!
[Diagram: NSF capable Router A peers with NSF aware Router B and NSF unaware Router C]
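Enabling OSPF NSF on the NSF capable router is a one-line addition under the routing process (process number illustrative); the NSF aware peer needs no extra configuration, only an NSF-aware IOS image:

```
router ospf 1
 nsf
```

The `show ip ospf` output on the following slide reflects the result of this configuration after a restart.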
Relevant Show Commands
Active RP: show ip ospf
esr2#show ip ospf
 Routing Process "ospf 1" with ID 2.2.2.1 and Domain ID 0.0.0.1
 Supports only single TOS(TOS0) routes
 <snip>
 Number of areas in this router is 1. 1 normal 0 stub 0 nssa
 External flood list length 0
 Non-Stop Forwarding enabled, last NSF restart 00:02:51 ago (took 37 secs)
    Area BACKBONE(0)
        Number of interfaces in this area is 1 (0 loopback)
Relevant Show Commands (Cont.)
Active RP: show ip ospf neighbor detail
esr2#show ip ospf neighbor det
 Neighbor 3.3.3.1, interface address 192.10.0.3
    In the area 0 via interface GigabitEthernet1/0/0
    Neighbor priority is 1, State is FULL, 7 state changes
    DR is 192.10.0.3 BDR is 192.10.0.2
    Options is 0x52
    LLS Options is 0x1 (LR), last OOB-Resync 00:03:08 ago
    Dead timer due in 00:00:37
    Neighbor is up for 00:03:32
BGP Graceful Restart Timers • Important to keep restart timer below hold time • Default values BGP hold time 180 seconds (3 x 60 sec keepalive) Restart timer default 120 seconds Stale path timer default 180 seconds
• Restart timer is advertised to the peer • Stale path timer is used internally by the router
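A sketch of the corresponding Cisco IOS configuration (AS number illustrative); note the restart time is kept below the 180-second hold time, as recommended above:

```
router bgp 100
 ! Advertise the graceful restart capability to peers
 bgp graceful-restart
 ! Advertised to the peer; must stay below the BGP hold time
 bgp graceful-restart restart-time 120
 ! Used internally: how long stale paths are retained
 bgp graceful-restart stalepath-time 180
```

Both sides must exchange the capability before it takes effect, so existing sessions need to be reset once after configuration.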
BGP GR: Deployment Consideration 1
• Consider routes between R1 and R2 when R1 undergoes graceful restart:
R1 preserves all routes to AS200 and continues forwarding traffic
R2 reaches AS100 via AS300, so all traffic from R2 to R1 goes via R3
All traffic from R1 to R2 goes directly; this can lead to temporary asymmetric routing
No packet loss is experienced from R1 to R2; some packet loss from R2 to R1 during the re-convergence
[Diagram: R1 (NSF/SSO) peers with R2 (AS200, non-NSF), R3 (AS300, NSF aware), and R4 (AS400, non-NSF)]
BGP GR: Deployment Consideration 3
• Consider routes between R1 and R3 when R1 undergoes graceful restart:
R1 preserves all routes to AS300 and continues forwarding traffic
R3 preserves all routes to AS100 and continues forwarding traffic
[Diagram: R1 (NSF/SSO) peers with R2 (AS200, non-NSF), R3 (AS300, NSF aware), and R4 (AS400, non-NSF); within AS100, a route reflector (RR) serves clients RC1 and RC2]
All routers are NSF aware unless indicated otherwise
BGP Graceful Restart Commands R18C12KRP#sh ip bgp nei BGP neighbor is 10.10.104.1, remote AS 100, internal link BGP version 4, remote router ID 10.10.104.1 BGP state = Established, up for 00:00:10 Last read 00:00:09, hold time is 180, keepalive interval is 60 seconds Neighbor capabilities: Route refresh: advertised and received(new) Address family IPv4 Unicast: advertised and received
Indicates Neighbor Is NSF Aware
Graceful Restart Capability: advertised and received
Remote Restart timer is 140 seconds
Address families preserved by peer: IPv4 Unicast
Show Command on Peer Router
On Peer of Restarting Router
ip9-75b# show ip bgp
BGP table version is 209, local router ID is 11.11.11.11
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
    Network          Next Hop      Metric LocPrf Weight Path
*>  11.0.0.0         0.0.0.0            0        32768 i
*>S 170.10.10.0/24   180.10.10.3        0            0 200 101 e
*>S 180.10.10.0/24   180.10.10.3        0            0 200 101 e
*>S 190.10.10.0/24   180.10.10.3        5            0 200 101 e
EIGRP NSF Operation Summary
Between the NSF capable router and its NSF aware peer:
1. EIGRP NSF capability exchange: Hello with Restart options fields
2. The router restarts, sets its signal timer, and sends a restart notification (Hello with the Restart bit set)
3. The peer responds with a Hello without the Restart bit set, plus Null Restart + INIT
4. Restart + Update packets are exchanged
5. The restarting router stops the signal timer and starts the convergence timer
EIGRP NSF Timers
• On the restarting router
Signal timer: used to send Hellos with the Restart bit set; when this timer expires, Hellos are sent without the Restart bit set
Convergence timer: sets the amount of time the restarting router waits to receive the EOT (end-of-table) marker from peers
• On the peer
Route hold timer: the amount of time the peer waits to receive routing updates and the EOT marker from the restarting router
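These timers map to commands under the EIGRP process; a sketch with illustrative values (the AS number and timer values are assumptions, not recommendations):

```
router eigrp 100
 ! Enable EIGRP NSF on the restarting router
 nsf
 ! Seconds to keep sending Hellos with the Restart bit set
 timers nsf signal 20
 ! Seconds the restarting router waits for the EOT marker from peers
 timers nsf converge 120
 ! Seconds the peer holds routes while waiting for updates + EOT
 timers nsf route-hold 240
```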
• IS-IS graceful restart comes in two flavors
IETF version: draft-ietf-isis-restart-0X
Cisco version
• The difference between them
The IETF version depends on neighbors to rebuild the routing table
The Cisco version does not depend on neighbors to rebuild the routing table; the peer can be non-NSF aware
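Selecting a flavor is done under the IS-IS process; a minimal sketch:

```
router isis
 ! IETF flavor: relies on NSF aware neighbors
 nsf ietf
 ! Cisco flavor instead (peers need not be NSF aware):
 ! nsf cisco
```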
Show and Debug Commands: IETF Version show clns neighbor detail Router#show clns nei detail System Id Interface SNPA State Holdtime Type Protocol esr2 PO1/0/0 *HDLC* Up 24 L2 IS-IS Area Address(es): 49.0002 IP Address(es): 180.10.10.1* Uptime: 00:02:27 NSF capable
• Global Server Load Balancing, Stateful NAT, Stateful IPSec, DNS, DHCP, Cisco Server Load Balancing, IP QoS
• HSRP, VRRP, GLBP, MPLS-TE, IP Event Dampening, Graceful Restart (GR) in BGP, OSPF NSF, ISIS NSF, IP QoS
Network Address Translation (NAT) • Originally defined in RFC 1631 • NAT has been a factor in: Reducing address depletion Allowing interconnection of private networks using addresses as defined in RFC 1918 Hiding networks from outside the administrative domain
• Typically at domain edges To connect B2B To connect to Internet For VPN connections Between “test” and “production” networks
• These domain interconnect points become critical points of failure
More about NAT: Session 2102, Deploying and Troubleshooting NAT
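A classic NAT overload (PAT) configuration at such a domain edge might look like the following sketch; interfaces and addresses are hypothetical:

```
! Inside (RFC 1918) network
interface Ethernet0
 ip address 10.1.1.1 255.255.255.0
 ip nat inside
!
! Outside (Internet-facing) link
interface Serial0
 ip nat outside
!
! Translate inside sources to the outside interface address, many-to-one
access-list 1 permit 10.1.1.0 0.0.0.255
ip nat inside source list 1 interface Serial0 overload
```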
Phased Implementation
• Stateful NAT is being delivered with Cisco IOS in phases
• Phase I: provides support for protocols that do not embed IP address and port information within the payload of the IP packet
Includes HTTP, ICMP, ping, rcp, rlogin, rsh, TCP, Telnet
Requires symmetric routing of return traffic
Supports only "inside" NAT pools
• Phase II: the following protocols and applications are targeted for support:
FTP, H.225, H.245, PPTP/GRE, NetMeeting Directory (ILS), RAS, SIP (both TCP and UDP based), Skinny, TFTP
Asymmetric routing support
Support for outside NAT pools, using the configuration command ip nat outside source pool
Dynamic entries, which are extended out of static definitions
Support for ip nat inside destination
Mapping-ID?
ip nat inside source route-map rm-101 pool SNATPOOL1 mapping-id 10 overload
• Used to specify whether the local SNAT router will distribute a particular set of locally created entries to a peer SNAT router
• Each dynamically created entry inherits a mapping-id number Comes from the mapping defined on the NAT rule At the point of creation
• Mapping list Specifies which of the entries will be forwarded to peers Provides a way to specify that entries from particular NAT rules should be forwarded
ip nat Stateful id 1 redundancy SNATHSRP mapping-id 10 mapping-id 11
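Putting the pieces together, a Stateful NAT pair might be sketched as follows; the HSRP group name ties SNAT redundancy to the HSRP configuration, and the route-map, pool, and addresses are hypothetical:

```
! HSRP group whose name the SNAT configuration references
interface Ethernet0/0
 standby 1 ip 10.1.1.254
 standby 1 name SNATHSRP
!
! Distribute entries carrying this mapping-id to the SNAT peer
ip nat Stateful id 1 redundancy SNATHSRP mapping-id 10
!
! NAT rule whose dynamically created entries inherit mapping-id 10
ip nat pool SNATPOOL1 172.16.10.1 172.16.10.254 prefix-length 24
ip nat inside source route-map rm-101 pool SNATPOOL1 mapping-id 10 overload
```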
IPSec Connection Failures
[Diagram: Main Office with VPN Primary and VPN Backup routers, connected across the WAN to a Remote Site VPN router]
• IPSec connection flows need to be maintained through the correct router in the case of multiple head-end devices • HSRP is used for failover, but can an HSRP vIP be used as the VPN tunnel endpoint?
More on IPSec VPNs: Session SEC-2011, Deploying Site-to-Site IPSec VPN
IPSec Stateful Failover
[Diagram: Main Office with VPN Primary and VPN Backup routers, connected across the WAN to a Remote Site VPN router]
Features
• Ensures the transport network is always available: business resiliency
• Delivers sub-second central-site failover
• Scales to 1000s of remote peers
• Transparent to remote sites
Stateful IPSec Tunneling
[Diagram: an access router connects via IPSec VPN tunnels to the Data Center aggregation site; on a fault, traffic fails over to the surviving tunnel]
• Used in conjunction with HSRP; the HSRP virtual IP is used as source/destination for the IPSec tunnels
• State Synchronization Protocol (SSP) is used to transfer state
• A TCP connection is formed from the Active to each Standby router
SYNC and Sync-Check
• What is exchanged?
Sequence number counters and window states
IKE session keys
Security association attributes, such as cipher, authentication, and compression algorithms
Standby integrity (sync check)
HA for Single Attached Servers
• Single point of failure
• Dual supervisors: fast stateful recovery
• No increase in complexity
[Diagram: a single-attached server running a mission-critical application connects via 100BaseT to a Cisco Catalyst 6000 Series switch with HA dual supervisors and GE or GEC uplinks; harden with intra-chassis redundancy here]
Redundant Servers with Server Load Balancing
[Diagram: a user requesting virtual server 10.1.1.1 reaches the Cisco IOS-SLB device, which directs the request to one of several identical servers at 10.1.1.2, 10.1.1.3, and 10.1.1.4]
Eliminates the server as a single point of failure
ip slb serverfarm WEB-FARM
 real 10.1.1.2
  inservice
 real 10.1.1.3
  inservice
 real 10.1.1.4
  inservice
!
ip slb vserver WEBSVR
 virtual 10.1.1.1
 serverfarm WEB-FARM
 inservice
Cisco IOS Server Load Balancing: available in Cisco IOS images for the Cisco Catalyst 6000 and Cisco 7200, or via the Content Switching Module (CSM)
Cisco Global Site Selector (GSS)
• GSS becomes the authoritative name server for selected applications (i.e., sub-domains)
Works with the existing DNS infrastructure to connect the client to the SLB supporting the requested website
Monitors load and availability of SLBs to select the best SLB (site) to support the request
• Benefits: better control over the request resolution process
High availability for disaster recovery and GSLB applications
Policy-determined, load-balanced resource utilization across sites
Improved performance and fast recovery yield a positive user experience
Cisco Global Server Load Balancing
• In real time, globally load balance all web-based traffic across multiple data centers
• Re-route all traffic to a backup data center in case of a disaster
• Simplify management of the DNS process by providing centralized command and control
[Diagram: Data Center A and Data Center B, each fronted by SLB, CSM, or CSS devices]
In Summary…
• For HA networking, focus on network management, HA technologies, and design optimization (we have covered the first two; breakout sessions cover design optimization in detail)
• Understand and choose the appropriate redundancy protocols available for each network layer
• Outfit critical edge systems with redundant intra-chassis components: processor, power, fans, line cards, switch matrix
• Incorporate load sharing when possible
• Measure and evaluate improvements
• Keep the user perspective
Appendix A: Acronyms 2
• MTTR: Mean Time to Repair
• NAT: Network Address Translation
• NIC: Network Interface Card
• NSF: Non-Stop Forwarding
• PAT: Port Address Translation
• PAgP: Port Aggregation Protocol
• PPP: Point-to-Point Protocol
• PVF: Primary Virtual Forwarder (in GLBP)
• RIB: Routing Information Base (routing table)
• RFC: Request For Comments
• RPR: Resilient Packet Ring (L1/L2 resiliency technology)
• RPR, RPR+: Cisco's Route Processor Redundancy (device resiliency)
• RRI: Reverse Route Injection
• RU: Rack Unit
• SLB: Server Load Balancing
• sNAT: Stateful Network Address Translation
• SNMP: Simple Network Management Protocol
• SPF: Single Point of Failure; Shortest Path First (in routing protocols)
• SSO: Stateful Switchover
• SSP: State Synchronization Protocol
• SVF: Secondary Virtual Forwarder (in GLBP)
• TCP: Transmission Control Protocol
• UDLD: Unidirectional Link Detection