## Basic

### Role of the Control Word

The control word has these five functions:

1. **Pad small packets** - If an AToM packet does not meet the minimum frame length of the Ethernet link, the frame is padded to that minimum length. Because the MPLS header has no field that indicates the length of the frame, the control word carries a length field for this purpose. If the AToM packet received at the egress PE router has a control word with a length that is not 0, the router knows that padding was added and can correctly remove it before forwarding the frame.
2. Carry control bits of the Layer 2 header of the transported protocol
3. **Preserve the sequencing of the transported frames** - With the sequence number, the receiver can detect out-of-sequence packets. The first packet sent onto the PW has a sequence number of 1, and the number increments by 1 for each subsequent packet until it reaches 65535. Packets detected as out of sequence are dropped; AToM does not reorder them. Sequencing is disabled by default.
4. Facilitate the correct **load balancing** of AToM packets in the MPLS backbone network - Routers perform MPLS payload inspection and decide how to load-balance the traffic based on it. The router looks at the first nibble of the payload: if the first nibble is 4, the payload is treated as an IPv4 packet. The generic control word starts with a nibble of value 0, and the control word used for OAM data starts with value 1, so neither is mistaken for an IP header.
5. **Facilitate fragmentation and reassembly** - May be used to indicate payload fragmentation:
    - 00 = unfragmented
    - 01 = 1st fragment
    - 10 = last fragment
    - 11 = intermediate fragment

## Virtual Private LAN Service (VPLS)

- [RFC 4664: Framework for Layer 2 Virtual Private Networks (L2VPNs)](https://www.rfc-editor.org/rfc/rfc4664)
- [RFC 4761: Virtual Private LAN Service (VPLS) Using BGP for Auto-Discovery and Signaling](https://www.rfc-editor.org/rfc/rfc4761)
- [RFC 4762: Virtual Private LAN Service (VPLS) Using Label Distribution Protocol (LDP) Signaling](https://www.rfc-editor.org/rfc/rfc4762)
- [RFC 6074: Provisioning, Auto-Discovery, and Signaling in Layer 2 Virtual Private Networks (L2VPNs)](https://www.rfc-editor.org/rfc/rfc6074)
- [RFC 8077: Pseudowire Setup and Maintenance Using the Label Distribution Protocol (LDP)](https://www.rfc-editor.org/rfc/rfc8077)

VPLS types:

- LDP-VPLS (defined in RFC 4762)
- BGP-VPLS (defined in RFC 4761)

> **Draft Martini:** Named after its author Luca Martini (who worked for Level 3 at that time). It simply uses LDP for signaling to establish point-to-point Layer 2 VPNs over an MPLS backbone, with no provision for auto-discovery. - Obsolete
>
> **Draft Kompella:** Named after its author Kireeti Kompella (who worked for Juniper at that time). It simply uses BGP for both signaling and auto-discovery to establish multipoint (more specifically, fully meshed point-to-point pseudowires) Layer 2 VPNs over an MPLS backbone. - Obsolete
>
> Both drafts use the same encapsulation: Provider's Layer 2 header + Tunnel Label + VC Label + Control Word + Payload (L2 Frame). The encapsulation details for each Layer 2 protocol are described in several RFCs.
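The encapsulation order quoted above (Tunnel Label + VC Label + Control Word + Payload) and the control word fields from the Basic section can be sketched in a few lines of Python. This is only an illustration, assuming the generic PW control word layout (first nibble 0, 4 flag bits, 2 fragmentation bits, 6-bit length, 16-bit sequence number) and the standard 4-byte MPLS label stack entry. The function names and the label values are hypothetical, and the provider's outer Layer 2 header is left out.

```python
import struct


def mpls_label_entry(label: int, tc: int = 0, bottom: bool = False, ttl: int = 255) -> bytes:
    """Encode one MPLS label stack entry: label(20) | TC(3) | S(1) | TTL(8)."""
    if not 0 <= label < 2 ** 20:
        raise ValueError("label must fit in 20 bits")
    word = (label << 12) | ((tc & 0x7) << 9) | (int(bottom) << 8) | (ttl & 0xFF)
    return struct.pack("!I", word)


def control_word(seq: int = 0, length: int = 0, flags: int = 0, frg: int = 0) -> bytes:
    """Encode a generic control word: 0000 | flags(4) | FRG(2) | length(6) | seq(16)."""
    word = ((flags & 0xF) << 24) | ((frg & 0x3) << 22) | ((length & 0x3F) << 16) | (seq & 0xFFFF)
    return struct.pack("!I", word)


def parse_control_word(raw: bytes) -> dict:
    """Decode the first 4 bytes of raw as a generic control word."""
    (word,) = struct.unpack("!I", raw[:4])
    return {
        "first_nibble": word >> 28,      # 0 = generic control word, 1 = OAM
        "flags": (word >> 24) & 0xF,
        "frg": (word >> 22) & 0x3,       # 00 unfragmented, 01 first, 10 last, 11 intermediate
        "length": (word >> 16) & 0x3F,   # non-zero means padding was added
        "seq": word & 0xFFFF,            # 0 when sequencing is not in use
    }


def encapsulate(l2_frame: bytes, tunnel_label: int, vc_label: int, seq: int = 0) -> bytes:
    """Tunnel label + VC label (bottom of stack) + control word + L2 frame."""
    return (
        mpls_label_entry(tunnel_label)
        + mpls_label_entry(vc_label, bottom=True)
        + control_word(seq=seq)
        + l2_frame
    )


# Hypothetical labels 16 (tunnel) and 100 (VC); the payload is a dummy 14-byte frame.
packet = encapsulate(b"\xaa" * 14, tunnel_label=16, vc_label=100, seq=1)
print(parse_control_word(packet[8:12]))
```

A transit router that inspects the first nibble after the label stack sees 0 here rather than 4, which is why the control word keeps a pseudowire flow from being hashed as if it were an IPv4 packet.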
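- BGP Auto Discovery (RFC 6074)
    - BGP AD VPLS first uses extended BGP UPDATE messages to automatically discover the other members of the VPLS domain, and then uses LDP FEC 129 signaling messages to automatically negotiate and establish the VPLS PWs between the local VSI and the remote VSIs.

---

When a device cannot establish a large number of LDP sessions, or when VC labels should be managed and assigned manually, static VPLS can be configured. The VPLS PW is established from manually configured VC labels, so LDP is not needed to carry the Layer 2 VC and link information.

```bash
vsi vsi-name [ static | auto ] [ bd-mode ]

# static - the VSI instance uses static member discovery.
#          Static VPLS allows VC labels to be managed and assigned manually.
# Example: static VPLS
#
vsi a2
 pwsignal ldp
  vsi-id 2
  peer 3.3.3.9 static-upe trans 100 recv 100
#
# Peer Type : static

# Example: LDP VPLS
#
mpls ldp remote-peer 3.3.3.9
 remote-ip 3.3.3.9
#
vsi a2
 pwsignal ldp
  vsi-id 2
  peer 3.3.3.9
#
# PW Type : label

# auto - the VSI instance uses automatic member discovery.
# Example: BGP VPLS
# Once the RD of the VSI instance is configured, the BGP session between the VSIs comes up.
# PE1 sends PE2 an Update message carrying the MP-REACH attribute, including its Site ID
# and label block. PE2 uses its own Site ID and the label block in the message to compute
# one unique label value as the VC label; VC1 from PE2 to PE1 is then established.
# From the Site ID in the received Update and its local label block, PE2 also derives the
# VC label for PE1 and sends PE1 an Update message. PE1 performs the same checks and
# processing on PE2's Update, and VC2 is established as well.
#
vsi bgp-vpls auto
 pwsignal bgp
  route-distinguisher 192.168.10.2:1
  vpn-target 100:1 import-extcommunity
  vpn-target 100:1 export-extcommunity
  site 1 range 5 default-offset 0
  # site 2 range 5 default-offset 0
  # The Site IDs at the two ends of a VSI must differ.
  # default-offset - offset of the initial Site ID of the VSI instance. Value 0 or 1; default 0.
  # range - number of sites that can be connected
#
bgp 100
 peer 1.1.1.1 as-number 100
 peer 1.1.1.1 connect-interface LoopBack1
 #
 l2vpn-ad-family  # older versions: vpls-family
  policy vpn-target
  peer 1.1.1.1 enable
  peer 1.1.1.1 signaling vpls
#
# PW Type : label

# Example: BGP AD VPLS
vsi vplsad1 auto
 bgp-ad
  vpls-id 10.10.1.1:1
  vpn-target 100:1 import-extcommunity
  vpn-target 100:1 export-extcommunity
#
bgp 100
 peer 2.2.2.9 as-number 100
 peer 2.2.2.9 connect-interface LoopBack1
 peer 3.3.3.9 as-number 100
 peer 3.3.3.9 connect-interface LoopBack1
 #
 ipv4-family unicast
  peer 2.2.2.9 enable
  peer 3.3.3.9 enable
 #
 l2vpn-ad-family
  policy vpn-target
  peer 2.2.2.9 enable
  peer 3.3.3.9 enable
#
# PW Type : label

# The vsi bd-mode command creates a VSI in bridge-domain mode.
```

### BGP-VPLS

`site 1 range 5 default-offset 0`

```bash
+------------------------------------+
| Length (2 octets)                  |
+------------------------------------+
| Route Distinguisher (8 octets)     |
+------------------------------------+
| VE ID (2 octets)                   |
+------------------------------------+
| VE Block Offset (2 octets)         |
+------------------------------------+
| VE Block Size (2 octets)           |
+------------------------------------+
| Label Base (3 octets)              |
+------------------------------------+

# Encapsulation information is also attached to the prefix. It is carried in the BGP update
# as the 'Layer2 Info Extended Community'. The type value is 0x800A, encoded as:

+------------------------------------+
| Extended community type (2 octets) |
+------------------------------------+
| Encaps Type (1 octet)              |
+------------------------------------+
| Control Flags (1 octet)            |
+------------------------------------+
| Layer-2 MTU (2 octets)             |
+------------------------------------+
| Reserved (2 octets)                |
+------------------------------------+
```

- Each PE router must advertise at least one label block. The label block is a contiguous set of MPLS labels from which the remote PE routers select one remote VC label. That label is used for the PW between the local and the remote PE router. (A PE router can advertise multiple label blocks.)
- The **VE-ID** must be configured on each PE. It identifies the PE routers within the VPLS domain.
- The **VE Block Size (VBS)** is the size of the label block and has a default value of 10 (Cisco).
    - If `ve range` is configured, the VBS is that value.
    - `ve range` can be configured from 11 to 100.
- The **Label Base (LB)** is the first label value of a free set of labels that the PE router can reserve for this VPLS domain.
- The **VE Block Offset (VBO)** is the offset value used when a PE router must create multiple label blocks. VBO is calculated with this equation: `VBO = RND(VE-ID/VBS) * VBS` (Cisco)
- The VC label is derived from the block as `Label = Label base + (Remote-site-identifier - offset)`

1. [VPLS with BGP Signaling Tech Note - Cisco](https://www.cisco.com/c/en/us/support/docs/ios-nx-os-software/virtual-private-lan-services-vpls/116121-tech-vpls-bgp-00.html)
2. [L2VPN: THE LABEL BASE AND OFFSET, EXPLAINED! – NETWORK FUN-TIMES (networkfuntimes.com)](https://www.networkfuntimes.com/l2vpn-the-label-base-and-offset-explained/)

## Ethernet VPN (EVPN)

### 01 References

- [RFC 7209: Requirements for Ethernet VPN (EVPN)](https://www.rfc-editor.org/rfc/rfc7209)
- [RFC 7432: BGP MPLS-Based Ethernet VPN](https://datatracker.ietf.org/doc/html/rfc7432)
- [RFC 8317: Ethernet-Tree (E-Tree) Support in Ethernet VPN (EVPN) and Provider Backbone Bridging EVPN (PBB-EVPN)](https://datatracker.ietf.org/doc/html/rfc8317)
- [RFC 8365: A Network Virtualization Overlay Solution Using Ethernet VPN (EVPN)](https://datatracker.ietf.org/doc/html/rfc8365)
- [RFC 8584: Framework for Ethernet VPN Designated Forwarder Election Extensibility](https://www.rfc-editor.org/rfc/rfc8584)
- [RFC 9136: IP Prefix Advertisement in Ethernet VPN (EVPN)](https://www.rfc-editor.org/rfc/rfc9136.html)
- [RFC 9161: Operational Aspects of Proxy ARP/ND in Ethernet Virtual Private Networks](https://www.rfc-editor.org/rfc/rfc9161)
- [Advanced IP Technologies Series - EVPN - Huawei (IP新技术进阶系列)](https://www.bilibili.com/video/BV1QA411K7Ri/?spm_id_from=333.337.search-card.all.click&vd_source=e18936fea2c36c346889607e0c1e7f44)
- [BRKMPL-2253 - EVPN IOS XR Deep Dive for Service Providers and Data Center](https://www.ciscolive.com/c/dam/r/ciscolive/global-event/docs/2023/pdf/BRKMPL-2253.pdf)

### 02 Main EVPN Functions

#### 0) Basic Concepts

![[Pasted image 20231008131334.png]]

#### 1) What Problems Does EVPN Solve?

> In traditional VPLS, MAC learning is triggered in the data plane, whereas EVPN learns MAC addresses through the control plane. Control-plane learning makes it possible to apply policy, so the protocol controls the network more flexibly. This lays the groundwork for the load balancing described later.

1. Protocol simplification: the traditional L2VPN (VLL, VPLS) and L3VPN (VPNv4) services are unified under EVPN
    - ==This is not the main issue, however. The so-called simplification is only superficial: the required configuration and protocol exchanges are hardly reduced.==
    - In addition, LDP and RSVP-TE are consolidated into SR.
    - ![[Pasted image 20231008101007.png]]
    - ![[Pasted image 20231008103155.png]]
2. Solves the single-active limitation of traditional L2VPN
    - ==All-active multihoming is the main problem EVPN sets out to solve==
    - ![[Pasted image 20231008101418.png]]
    - ![[Pasted image 20231008101444.png]]
3. Solves the slow convergence of traditional L2VPN
    - Traditional L2VPN
        - ![[Pasted image 20231008101557.png]]
    - EVPN
        - ![[Pasted image 20231008101614.png]]
4. Automation: combined with VXLAN, EVPN enables data center automation (to be discussed in a later session on VXLAN)

#### 2) Automatic Neighbor Discovery

For traditional L2VPN, RFC 6074 already supports automatic neighbor discovery. In large carrier network deployments, however, static configuration per RFC 4762 remains the mainstream: services are still provisioned manually, so the difference in workload is small. In automated environments such as data centers, automatic neighbor discovery becomes necessary.

**EVPN startup process:**

0. Initial state
    - ![[Pasted image 20231008010828.png]]
1. Create the EVPN instance.
    - Create an EVPN instance (EVI) on each PE device, and configure the RD and RT attributes for each EVPN instance.
    - ![[Pasted image 20231008011025.png]]
2. Configure the BGP peers and enable EVPN.
    - The PE device sends Type 3 (`Inclusive Multicast Ethernet Tag Route`) routes to its neighbors. A Type 3 route carries the RD and the Label (allocated by MPLS). The Type 3 route attributes are shown below:
    - ![[Pasted image 20231008011243.png]]
    - When a neighbor receives the route, it installs the route information in its BUM forwarding table, which guides the forwarding of BUM packets. BUM is the collective term for broadcast, unknown unicast, and multicast traffic.
    - ![[Pasted image 20231008011102.png]]
    - ![[Pasted image 20231008011358.png]]
    - ![[Pasted image 20231008011423.png]]
    - ![[Pasted image 20231008011446.png]]
    - At this point, every device has learned and built the ==BUM forwarding table== of this EVI.
3. Bind the ESI to the EVPN instance.
    - The ESI is generated and bound to the EVPN instance.
    - ![[Pasted image 20231008011744.png]]
    - This triggers Type 4 routes. A Type 4 route carries the RD, the ESI, and the source address of the PE. The Type 4 route attributes are shown below:
    - ![[Pasted image 20231008013244.png]]
    - ![[Pasted image 20231008011842.png]]
    - The ESI that a Type 4 route advertises to the peers is recorded in their ES membership table.
    - ![[Pasted image 20231008012040.png]]
    - ![[Pasted image 20231008012155.png]]
    - At this point, every router has built the ==ES membership table== of this EVI.
    - Once the ES membership table exists, DF election starts, see [[#5) Designated Forwarder (DF)]]
    - ![[Pasted image 20231008012339.png]]
4. The PE devices exchange Type 1 routes and update the ESI Label information.
    - The Type 1 route attributes are shown below:
    - ![[Pasted image 20231008013428.png]]
    - After DF election completes, the PEs advertise Type 1 routes to each other. The main information in a Type 1 route is the ESI value and the label that the local PE allocated for the ESI.
    - ![[Pasted image 20231008012431.png]]
    - When a PE receives a Type 1 route from a neighbor, it first checks whether the ESI value in the route matches a local ESI. If it does, the PE writes the ESI label carried in the route into its local ES membership table.
    - ![[Pasted image 20231008012451.png]]
    - ![[Pasted image 20231008012511.png]]
5. An ARP request triggers the control plane
    - ![[Pasted image 20231008012652.png]]
    - ![[Pasted image 20231008012719.png]]
    - ![[Pasted image 20231008012753.png]]
    - The ==DF and non-DF== protection mechanism is triggered here
    - ![[Pasted image 20231008012824.png]]
    - The ==split horizon== protection mechanism is triggered here, see [[#6) Split Horizon]]
    - ![[Pasted image 20231008012916.png]]
    - Why must broadcast packets carry a special label?
    - ![[Pasted image 20231008012957.png]]

#### 3) BUM Traffic Forwarding Rules

- When traffic is forwarded to a remote PE that is not on the same ESI, only the BUM label is added
![[Pasted image 20231008115327.png]]
- When traffic is forwarded to a remote PE that is on the same ESI, the BUM label and the split horizon label are both required
![[Pasted image 20231008115409.png]]

#### 4) MAC Address Advertisement and Forwarding

- After a PE learns a MAC address, it advertises a Type 2 route so that the remote PEs learn the MAC address
![[Pasted image 20231008115737.png]]
- When forwarding packets, the PE pushes the label advertised in the Type 2 route underneath the outer tunnel label
![[Pasted image 20231008115819.png]]

#### 5) Designated Forwarder (DF)

> C2 is dual-homed to L3 and L4, and C1 sends multicast traffic to both L3 and L4. To keep C2 from receiving duplicate traffic from L3 and L4, which would waste network resources, EVPN introduces DF election: one PE out of L3 and L4 is designated to forward the multicast traffic. If L4 is elected as the primary DF, multicast traffic arriving from the direction of C1 is forwarded to C2 only by L4, and L3, which was not elected, becomes the backup DF.
> ![[Pasted image 20231008112242.png]]

- Each PE obtains the Source IP addresses from the Ethernet Segment routes received from the other PEs, sorts the PEs in the multihoming PE list by Source IP address, and assigns them ordinals starting from 0 in that order;
- With interface-based DF election, the PE with the smallest Source IP address is elected as the primary DF;
- With VLAN-based DF election, the ordinal of the PE that acts as DF is computed with the formula `(V mod N) = i`, where i is the ordinal of the PE, N is the number of PEs multihomed to the same CE, and V is the VLAN ID of the VLAN on the Ethernet Segment.

#### 6) Split Horizon

- When C1 is dual-homed to L1 and L2 with load balancing enabled, and L1 and L2 have established a neighbor relationship, L1 forwards the multicast traffic it receives from C1 to L2. To keep L2 from sending that traffic back to C1 and forming a loop, EVPN defines split horizon: when L2 receives the packet, it checks the EVPN ESI Label carried in the traffic. If the ESI identified by the label equals the ESI of the segment that connects L2 to C1, L2 does not send the multicast traffic to C1, so no loop forms.
![[Pasted image 20231008112836.png]]

> [!tip] Who advertises the ESI MPLS label?
> This label is referred to as the ESI label and MUST be distributed by all PEs when operating in All-Active redundancy mode using a set of **Ethernet A-D per ES routes**

- `Ethernet A-D per ES routes` carry the `ESI Label Extended Community` to advertise the ESI Label
![[Pasted image 20231008125719.png]]

#### 7) Aliasing and Backup Path

- In the figure below, C1 is dual-homed to L1 and L2 but sends its data flow only to L1. L1 learns the MAC address and advertises it to L3 and L4. Because no traffic triggers MAC learning on L2, L2 sends no MAC route to L3 and L4, so traffic from L3 and L4 toward C1 can only go to L1 and cannot be load-balanced. With ==aliasing==, even though L3 and L4 have learned the MAC route only from L1, they can tell from the `Ethernet A-D per EVI route` advertised earlier by L1 and L2 that L2 can also reach C1, and can then load-balance the traffic directly across L1 and L2.
![[Pasted image 20231008113308.png]]
- ==Backup path== works the same way as aliasing. The difference is that aliasing applies to all-active multihoming, whereas backup path applies to single-active multihoming: it provides a backup path for fast convergence and carries no traffic before a failure.

Question: why is the `Ethernet A-D per EVI route` required for aliasing and backup path?

`Ethernet A-D per ES Route` and `Ethernet A-D per EVI Route` are compared in the table at the end of this subsection.

- Once the BGP EVPN neighbor relationships between the gateway devices are established, the gateways exchange Ethernet auto-discovery routes. An Ethernet auto-discovery route tells the other gateways whether the local gateway can reach the MAC addresses of its attached site, that is, whether the gateway can reach the connected site. The Ethernet Auto-Discovery Per ES route is mainly used for fast convergence and split horizon in ESI all-active scenarios, and the Ethernet Auto-Discovery Per EVI route is mainly used for aliasing in ESI all-active scenarios
- The MPLS label advertised in the `Ethernet A-D per EVI Route` is needed to forward the traffic
    - If the PE has learned the MAC address of the remote CE from that PE, it pushes two labels: the outer one is the tunnel MPLS label, and the inner one is the label advertised in the Type 2 route
        - The MPLS label stack to send the packets to PE1 is the **MPLS LSP** stack to get to PE1 (at the top of the stack) followed by the **EVPN label** advertised by PE1 for CE1's MAC.
    - If the PE has not learned the MAC address of the remote CE from that PE, it pushes two labels: the outer one is the tunnel MPLS label, and the inner one is the label advertised in the `Ethernet A-D per EVI Route`
        - The MPLS label stack to send packets to PE2 is the **MPLS LSP** stack to get to PE2 (at the top of the stack) followed by the **MPLS label in the Ethernet A-D route** advertised by PE2 for <ES1, VLAN1>, if PE2 has not advertised MAC1 in BGP.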
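The inner-label choice described in the two cases above can be sketched as follows. This is a minimal illustration rather than how any router implements it: the `RemotePE` class, the function name, and all label values are hypothetical, and the outer tunnel label is omitted. A PE on the Ethernet Segment that advertised the MAC in a Type 2 route is reached with that route's label. A PE on the same segment that only advertised an Ethernet A-D per EVI route is reached with the label from that route, which is what makes aliasing possible.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class RemotePE:
    """What the local PE has learned from one remote PE attached to the ES (hypothetical model)."""
    name: str
    ad_per_evi_label: Optional[int] = None                    # from Ethernet A-D per EVI route
    mac_labels: Dict[str, int] = field(default_factory=dict)  # MAC -> label from Type 2 routes


def candidate_next_hops(mac: str, es_members: List[RemotePE]) -> List[Tuple[str, int]]:
    """Return (PE name, inner label) pairs usable to reach mac behind a multihomed ES."""
    # Aliasing only applies once at least one PE on the segment has advertised the MAC.
    if not any(mac in pe.mac_labels for pe in es_members):
        return []
    hops = []
    for pe in es_members:
        if mac in pe.mac_labels:
            hops.append((pe.name, pe.mac_labels[mac]))      # label from the Type 2 route
        elif pe.ad_per_evi_label is not None:
            hops.append((pe.name, pe.ad_per_evi_label))     # aliasing label from A-D per EVI
    return hops


# L1 advertised C1's MAC; L2 only advertised its A-D per EVI route for the same ES.
l1 = RemotePE("L1", ad_per_evi_label=2001, mac_labels={"00:11:22:33:44:55": 1001})
l2 = RemotePE("L2", ad_per_evi_label=2002)
print(candidate_next_hops("00:11:22:33:44:55", [l1, l2]))
```

In an all-active segment the sender would load-balance across every returned entry; in a single-active segment the entry derived from the A-D per EVI route would serve only as the backup path.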
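| Field | Ethernet Auto-Discovery Per ES route | Ethernet Auto-Discovery Per EVI route |
|-----------------|------------------------------------------------------------|------------------------------------------------------------|
| RD | Built from the source VTEP IP address configured on the VXLAN gateway combined with 0, for example X.X.X.X:0. | The RD (Route Distinguisher) value configured under the EVPN instance. |
| ESI | Unique identifier of the connection between the VXLAN gateway and a VM. In VM multihoming scenarios, a VXLAN gateway uses this field to learn which gateways connect to the same VM. | Unique identifier of the connection between the VXLAN gateway and a VM. In VM multihoming scenarios, a VXLAN gateway uses this field to learn which gateways connect to the same VM. |
| Ethernet Tag ID | All Fs. | Identifies the different sub-broadcast domains under an ES. All 0s indicates that the EVI has only one broadcast domain. |
| MPLS Label | All 0s. | The VNI associated with the BD that is bound to the EVPN instance. |

#### 8) Fast Convergence

When the link between C1 and L1 fails, L1 sends L3 and L4 a withdrawal of its Ethernet auto-discovery route (`Ethernet A-D per ES Route`), telling them that Site1 is no longer reachable through L1. After L3 and L4 receive the withdrawal, they send traffic to Site1 only through L2. This avoids withdrawing the MAC routes one by one and greatly reduces the convergence time.
![[Pasted image 20231008113851.png]]

#### MAC Mobility

![[Pasted image 20231008131403.png]]

### 03 EVPN Routes

- Types 1 through 5 are the ones mainly used; for types 6 through 8 a general awareness is enough
![[Pasted image 20231008014810.png]]
- A classic summary
![[Pasted image 20231008120255.png]]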
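As a companion to the route types above, the sketch below splits a buffer of EVPN NLRI into individual routes. It assumes only the common NLRI framing from RFC 7432 (a 1-octet Route Type, a 1-octet Length, then the type-specific body). The names for types 6 to 8 come from RFC 9251 rather than from these notes, the function name is hypothetical, and the sample bytes are dummy data, not real route bodies.

```python
from typing import List, Tuple

# Route type names; types 6-8 are taken from RFC 9251 (an addition, not from these notes).
EVPN_ROUTE_TYPES = {
    1: "Ethernet Auto-Discovery (A-D)",
    2: "MAC/IP Advertisement",
    3: "Inclusive Multicast Ethernet Tag",
    4: "Ethernet Segment",
    5: "IP Prefix",
    6: "Selective Multicast Ethernet Tag",
    7: "Multicast Membership Report Synch",
    8: "Multicast Leave Synch",
}


def split_evpn_nlri(data: bytes) -> List[Tuple[int, str, bytes]]:
    """Split concatenated EVPN NLRI into (route type, type name, raw body) tuples."""
    routes = []
    offset = 0
    while offset < len(data):
        if offset + 2 > len(data):
            raise ValueError("truncated NLRI header")
        route_type, length = data[offset], data[offset + 1]
        body = data[offset + 2 : offset + 2 + length]
        if len(body) != length:
            raise ValueError("truncated NLRI body")
        routes.append((route_type, EVPN_ROUTE_TYPES.get(route_type, "unknown"), body))
        offset += 2 + length
    return routes


# Dummy buffer: a "type 3" entry with a 2-byte placeholder body, then an empty "type 4" entry.
for route_type, name, body in split_evpn_nlri(bytes([3, 2, 0xAA, 0xBB, 4, 0])):
    print(route_type, name, body.hex())
```

Decoding the body of each route type (RD, ESI, Ethernet Tag ID, labels) is left out on purpose, since the field layouts differ per type.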