This is my revision notes for Computer Network and Application. I will go through a five-level protocol model: Application Layer, Transport Layer, Network Layer, Link Layer, and Physical Layer. This post includes the summary of computer network knowledge as well as network security.
Measuring Performance
Propagation Delay
The time it takes for a signal to travel from the sender to the receiver in a communication system.
Formula: Distance / Transmission speed
Transmission Delay
The time it takes to put bits (1/0) into a medium controlled by bandwidth of a link (data carrying capacity of a link)
Formula: Data size / Link bandwidth
Queuing Delay
Queuing delay refers to the time a packet spends waiting in a queue before it can be transmitted or processed further.
Processing Delay
Processing delay refers to the time taken to process a packet at a network node before it can be forwarded or transmitted.
Application Architecture
Definition: Designed by the application developer and dictates how the application is structured over various end systems
Client-server architecture (C/S)
There is an always-on host, called the server, which serves requests from many other hosts, called clients. With the client-server architecture, clients do not directly communicate with each other.
Peer-to-peer architecture (P2P)
There is minimal (or no) reliance on dedicated servers in data centers. Instead the application exploits direct communication between pairs of intermittently connected hosts, called peers.
Advantages:
Scalability: P2P systems can be more scalable as the workload is distributed across multiple peers, and adding more peers can increase the system’s capability.
Fault Tolerance: There is no central point of failure.
Cost: P2P architectures are also cost effective since they normally don’t require significant server infrastructure and server bandwidth.
Disadvantages:
Faces challenges of security, performance, and reliability due to their high decentralized structure.
Performance (time)
Assume file size: F, server upload rate: us, N peers, lowest download rate of the peer: dmin
C/S: max{ NF/us, F/dmin }
P2P: max{ F/us, F/dmin, NF/(us + u1 + … + uN) }
C/S faster than P2P conditions
Participating peers: In a P2P network, the availability of the file depends on the peers hosting it, and if these peers go offline or become unreliable, it can impact the file distribution speed
Network infrastructure: If the network infrastructure is well-designed and optimized, a C/S architecture can take advantage of high speed connections and low latency
Socket Programming
Socket: A software interface that allows a process to send messages into, and receive messages from the network. A socket is the interface between the application layer and the transport layer within a host.
Socket programming for C/S and P2P
C/S:
CS systems follow a centralized approach where a server listens on a specific socket for clients connections
In CS architecture, the server creates and binds a socket to a specific IP address and port number. It listens for incoming client connections on this socket using functions like ‘socket()’, ‘bind()’, and ’listen()’.
Clients create a socket and connect it to the server’s IP address and port number using functions like ‘socket()’ and ‘connect()’.
Communication in CS architecture usually involves a request-response model, where clients send requests to the server, and the server responds to those requests.
P2P:
P2P systems are decentralized, and peers communicate directly with each other without relying on a central server.
Each peer in a P2P network creates a socket to listen for incoming connections from other peers, as well as initiate connections to other peers.
Socket programming for TCP and UDP
TCP
Server: socket(), bind(), listen()
Client: socket(), connect(), send()
UDP: (no connection between client and server)
Server: socket(), bind()
Client: socket(), sendto()
Domain Name System (DNS)
A DNS server provides name resolution (conversion from a domain name to an IP address). A name server is a process listening on UDP/TCP port 53 for requests. When a domain name is detected, it will be resolved and a reply will be sent.
DNS hierarchical database
Root name server: able to resolve all queries or another intermediate name server
Top-level domain servers (TLD): responsible for com, org, net, edu, etc, and all top-level country domains uk, fr, ca, jp.
Authoritative DNS servers: organization’s DNS servers, providing authoritative hostname to IP mappings for organization’s servers
Local name server: handles local DNS requests. Caches resolved addresses.
Queries
Iterative: contacted server replies with name of server to contact
Recursive: puts burden of name resolution on contacted name server
DNS records (stored in distributed database)
Resource records (RR) format: (name, value, type, ttl)
Type = A: name = hostname, value = IP address
Type = NS: name = domain, value = hostname of authoritative name server
Type = CNAME: name = alias name, value: canonical name
Type = MX: value = name of mail server associated with the name
HTTP
Behaviors of HTTP
Requests methods: GET, POST, HEAD, PUT(update), DELETE(remove)
Persistent connections: HTTP 1.0 requires 2 RTTs per object. HTTP 1.1 allows multiple requests to be sent and received over the same connection, reducing the overhead of establishing new connections for each request.
Pipelining: HTTP 1.1 enables sending multiple requests without waiting for each response. In HTTP 1.0, the client had to wait for a response before sending the next request.
Host header: HTTP 1.1 introduced the host header which allows hosting multiple websites on a single IP address, and enables server to identify the appropriate website to handle the request.
Caching/proxies:
Goal: satisfy client request without involving origin server
Advantages
Reduce response time for client request
Reduce traffic on an institution’s access link
Enables ‘poor’ content providers to effectively deliver content
Transport Layer
The transport layer is responsible for providing logical communication between processes. It uses the services of the network layer to transfer data between processes.
UDP: out of order delivery / no connection establishment / best effort
TCP and UDP both provide multiplexing and de-multiplexing of data from several processes
Multiplexing at sender: handle data from multiple sockets, add transport header
De-multiplexing at receiver: use header info to deliver received segments to correct socket
Port numbers and IP address in TCP and UDP
TCP: Datagram must specify: source IP address / source port number / destination IP address / destination port number
UDP: Datagram must specify: destination IP address / destination port number
UDP de-multiplexing: IP datagrams with same destination port, but different source IP address and/or source port numbers will be directed to same socket at destination
Reliable data transfer
rdt1.0: underlying channel perfectly reliable
rdt2.0: underlying channel may flip bits in packet
checksum to detect bit errors
sender retransmits packet on receipt of NAK
rdt2.1: handles garbled ACK/NAKs
sender: sequence number added to packet / must check if received ACK/NAK corrupted
receiver: must check if received packet is duplicated or not
rdt2.2: a NAK-free protocal
receiver sends ACK for last packet received OK (receiver must include sequence number of packet being ACKed)
rdt3.0 (Alternating Bit Protocol): channels with errors and loss
sender waits ‘reasonable’ amount of time for ACK
retransmits if no ACK received in this time
Go-Back-N (GBN)
One timer, on timeout, retransmit all packets in window from last ACK + 1
Receiver discards out of order packets and sends back cumulative ACK
Selective Repeat (SR)
Sender keeps track of the ACK received from the receiver and retransmits only the lost or damaged packets, instead of retransmitting the entire window of packets.
Receiver acknowledges the receipt of each packet, and uses a buffer to store out-of-order packets.
If the window size is N, we need 2N sequence numbers
TCP
TCP is designed to provide the appearance of a reliable channel over the unreliable network layer (IP). TCP is a modified hybrid of go-Back-N and selective repeat.
TCP segment
Source and destination ports: The combination of the source port and destination port fields allows TCP to establish and maintain multiple simultaneous connections between different applications or services running on the same or different hosts.
SYN: indicating that this segment is a synchronous request to initialize a TCP connection. It is used during the TCP handshake process to establish a connection
ACK (cumulative ACKS): indicating that this segment is an acknowledgment of a previously received segment. It confirms that the recipient has received the segment identified by the ACKNUM field
SEQNUM: specifies the sequence number assigned by the sender to this segment. It is used to number the data bytes in a TCP stream. By using sequence numbers, TCP ensures that data is received in the correct order and any missing or duplicate segments can be detected.
Window size: The window size field is a 16-bit value that indicates the amount of data, in bytes, that a receiver is willing to accept before requiring the sender to pause and wait for acknowledgement.
ACKNUM: indicates the sequence number that the sender of this segment expects to receive next
Checksum: used for error detection. The TCP checksum is a 16-bit field that helps ensure the integrity of the TCP segment during transmission. It is calculated by the sender and verified by the recipient to detect errors that may have occurred during transmission.
TCP connections are established by a 3-way handshake: request connection, grant connection, acknowledge
Why not 2 way handshake:
1: Server does not know the client is able to receive its message
2: If the first ACK is lost, client will send another request which will result in the server allocating more resources to make unnecessary establishment
Why picking random values for starting sequence numbers
It is possible that a server establishes two connections, retransmitted data will be delivered to the second connection
Security issue, if an attacker knows all sequence numbers starting with the same value, he is able to inject some data
Termination
Client -> Server: FIN
Server -> Client: ACK / FIN
Client -> Server: ACK
Time wait increases the chance of server receiving the final ACK
TCP flow control
Receiver informs sender of the receive window size in the header of TCP segments
At the sender, LastByteSent - LastByteAcked <= Receive Window
If the receive buffer is full, the sender will just send one byte
TCP congestion control
Slow start: the value of the congestion window begins at 1 MSS and increase by 1 MSS every time a transmitted segment is first acknowledged (grows exponentially). If there is a loss event indicated by a timeout, TCP sender sets the value of congestion window to be 1 and ‘ssthresh’ to be ‘cwnd / 2’. Slow start will end when the value of ‘cwnd’ equals ‘ssthresh’, or three duplicated ACKs detected.
Congestion avoidance mode: TCP increases the value of ‘cwnd’ by just a single MSS every RTT (grows linearly). If there is a loss event indicated by a timeout, TCP sender sets the value of congestion window to be 1 and ‘ssthresh’ to be ‘cwnd / 2’. Congestion avoidance will end when a loss event occurs, or there are three duplicated ACKs detected.
Fast recovery (TCP Reno incorporated): TCP sender sets ‘ssthresh’ to be ‘cwnd / 2’, and ‘cwnd’ to be ssthresh + 3 * MSS.
Fast retransmit: For TCP Reno - 1990, if there are three duplicated ACKs detected, retransmit immediately.
If there is timeout, TCP Reno will behave the same as TCP Tahoe. They behave differently when there are three duplicated ACKs detected. TCP Reno will go to fast recovery mode, while TCP Tahoe will go to slow start mode.
Network Layer
Datagram protocol
A connectionless protocol where each packet, also known as a datagram, is treated independently and can take different routes to reach its destination. In the case of IP, each IP packet (datagram) contains the source and destination IP addresses, as well as the data payload.
Limitation: Reliable delivery / ordering / congestion control / Fragmentation and reassembly
Functions
Path determination: route taken by packets from source to dest
Forwarding: move packets from routers input to appropriate router output
Call setup: some network architectures require router call setup along path before data flows
Network layer
Control plane
network-wide logic
determines how datagram is routed among routers along end-end path from source host to destination host
Two control-plane approaches
Traditional routing algorithms: implemented in routers
Classification
Global or decentralized
Global: all routers have complete topology, link cost info (link state algorithms)
Decentralized: router knows physically-connected neighbors, link cost to neighbors / Iterative process of computation, exchange of info with neighbors
Static or dynamic
Static: routers change slowly over time
Dynamic: routers change more quickly (periodic update / in response to link cost changes)
Dijkstra’s algorithm
Complexity: O(n2), more efficient: O(nlogn). Requires O(nE) messages
Might have oscillations
Distance Vector algorithm
Attributes
iterative: continues until no nodes exchange info (self-terminating)
asynchronous: nodes need not exchange info / iterate in lock step
distributed: each node communicates only with directly-attached neighbors
Formula
DX(Y, Z) = distance from X to Y, via Z as next hop
DX(Y, Z) = c(X, Z) + minw{DZ(Y, w)}
Process
Each node waits for message from neighbors
Recomputes distance table
If least cost path to any dest has changed, notify neighbors
Convergence time varies
May be routing loops / count-to-infinity problem
Comparison
Link state algorithm
Faster convergence / Efficient routing
Distance vector
Simple to implement / Lower overhead
Software-defined networking (SDN): implemented in remote servers
Each router contains a flow table that is computed and distributed by a logically centralized routing controller
programmable control applications
Data plane
local, per-router function
determines how datagram arriving on router input port is forwarded to router output port
forwarding table
IP Addressing
Classfull addressing
Disadvantages: inefficient use of address space, address space exhaustion
address format: a.b.c.d/x, where x is the number of bits in network portion of address
Subnetting
The network administrator takes a network with a given IP address range and subnet mask and further divides it into smaller subnets by borrowing bits from the host portion of the IP address.
Intra-AS routing refers to the routing protocols and mechanisms used within an individual Autonomous System (AS). All routers in AS must run same intra-domain protocol. An AS is a collection of networks under a single administrative control, such as an Internet Service Provider (ISP) or a large enterprise network. Intra-AS routing protocols, also known as Interior Gateway Protocols (IGPs), are used to exchange routing information and determine the best paths for routing packets within the boundaries of the AS.
Inter-AS routing refers to the routing protocols and mechanisms used for exchanging routing information between different Autonomous Systems (ASes) in the Internet. Inter-AS routing protocols, also known as Exterior Gateway Protocols (EGPs), are responsible for facilitating the exchange of routing information and enabling communication between ASes.
Fragmentation
Large IP datagram divided (fragmented) within net
One datagram becomes several datagrams
Reassembled only at final destination
IP header bits used to identify order related fragments
16-bit identifier: Examine the identification numbers of the datagrams to determine which of the datagrams are actually fragments of the same larger datagram
Flags: In order for the destination host be absolutely sure it has received the last fragment of the original datagram, the last fragment has a flag bit set to 0, whereas all the other fragments have this flag bit set to 1
Fragment offset: Specify where the fragment fits within the original IP datagram
IPv6
Why do we transition to IPv6
Address space exhaustion / Enhance security / Improved network performance
Features
128 bit address are written in hex: x : x : x : x : x : x : x : x, each x is 4 hex digits, sequence of zero fields given by “::”
IPv6 8 fields in base header vs 13 fields in IPv4 can provide faster processing, simpler management, and more flexibility
Fragmentation is no longer performed at intermediate routers. The source host should choose datagram size so fragmentation is not necessary. (e.g. send datagram with different sizes to target until they don’t arrive)
No checksum design reduces the processing overhead on intermediate routers and simplifies packet forwarding. The link-layer protocols in modern networks, such as Ethernet, already have their own error detection mechanisms, such as CRC (Cyclic Redundancy Check), which ensure data integrity during transmission.
Additional support
Multicast and anycast routing
Security
Mobile hosts and auto-configuration
Real time applications
Transition from IPv4 to IPv6
Dual stack: some routers with dual stack can translate between formats (tend to lose some functionalities)
Tunneling: IPv6 carried as payload in IPv4 datagram among IPv4 routers
Internet Control Message Protocol (ICMP)
ICMP is primarily used for diagnostic and error reporting purposes in IP networks. It provides a means for network devices, such as routers and hosts, to communicate with each other regarding network-related issues.
Main functions
Error Reporting: ICMP is used to report errors and anomalies in IP packet delivery. For example, if a router encounters a problem while forwarding an IP packet, it can generate an ICMP error message and send it back to the source IP address to inform the sender about the issue.
Network Reachability: ICMP is used to check the reachability of hosts or networks. The most common example is the ICMP Echo Request and Echo Reply messages, also known as “ping.” By sending an Echo Request message, a device can determine if another device on the network is reachable and functioning properly.
Path MTU Discovery: ICMP is involved in the process of Path Maximum Transmission Unit (PMTU) Discovery. It helps determine the maximum size of IP packets that can be transmitted without fragmentation along a path between two hosts. ICMP messages, such as “Packet Too Big,” are used to inform the sender that the packet is too large and needs to be fragmented or reduced in size.
Trace the path taken by a packet over the network: By sending packets with increasing TTL values, traceroute effectively maps the network path from the source to the destination. The ICMP “Time Exceeded” messages indicate the presence of routers along the path, while the “Echo Reply” message confirms the packet’s successful arrival at the destination.
Link Layer
Data-link layer has responsibility of transferring datagram from one node to physically adjacent node over a link
Offered services:
Framing
Link access
Reliable delivery between two physically connected devices
Flow control
Error detection / correction
Error detection and correction
Parity
Single bit parity (detect single bit errors)
odd parity: number of 1, if odd, parity bit = 0
even parity: number of 1, if odd, parity bit = 1
Two dimensional bit parity (detect and correct single bit errors)
Checksums
Errors do not occur as a one-off single bit error, we normally have an error burst
Internet checksum is 1’s compliment sum of the segment contents
When information is received, sum the contents and add with the checksums. If it is “FFFFFFFF”, then the information received is correct
Cyclic Redundancy Check polynomials (CRCs)
Choose r + 1 bit pattern (Generator) G
Goal: choose r CRC bits, R such that
<D, R> exactly divisible by G
receiver knows G and divides <D, R> by G
can detect all burst errors less that r + 1 bits
If G(x) with degree of r (length r + 1), then it is guaranteed to detect burst length errors of r or less
Advantages: CRCs are popular because they are simple to implement in binary hardware, easy to analyze mathematically, and particularly good at detecting common errors caused by noise in transmission channels.
Multiple access protocols
Channel partitioning
Frequency Division Multiple Access (FDMA)
Channel spectrum divided into frequency bands
Each station assigned fixed frequency band
Unused transmission time in frequency bands go idle
Time Division Multiple Access (TDMA)
Access to channel in ‘rounds’
Each station gets fixed length slot (length = packet transmission time) in each round
Unused slots go idle
Code Division Multiple Access (CDMA)
Unique code assigned to each user to encode data
Allows multiple users to ‘coexist’ and transmit simultaneously with minimal interference (if codes are orthogonal)
Random access
Slotted ALOHA
Time is divided into equal size slots
Node with new arriving packets: transmit at beginning of next slot
If there is a collision: re-transmit the packet in future slots with probability p, until successful
At best: channel used for useful transmissions 37% of time (Np = 1)
Pure (unslotted) ALOHA
Transmit immediately without awaiting for beginning of slot
Collision probability increases: frame sent at t0 collides with other packets sent in [t0 - 1, t0 + 1]
At best: 18% of time (Np = 0.5)
Carrier Sense Multiple Access (CSMA)
Listen before transmit
Types of CSMA
Persistent CSMA: retry immediately with probability p when channel becomes idle
Non-persistent CSMA: retry after random interval
CSMA collisions
Collisions can occur due to propagation delay
Collisions means entire packet transmission time is wasted - up to 2 packet times
Carrier Sense Multiple Access with Collision Detection (CSMA/CD)
Master node invites slave nodes to transmit in turn
Token passing
Control token passed from one node to next sequentially
Distributed polling
Begins with N short reservation slots
Stations with message to send posts reservation
Reservation seen by all stations
After reservation slots, message transmissions ordered by known priority
LAN / MAC / physical address
Used locally to get frame from one interface to another physically-connected interface
Format: 48 bit MAC address burned in the adapter ROM (hexadecimal notation)
Address Resolution Protocol (ARP)
Given a destination IP address, work out the MAC address of the destination
ARP Table: IP/MAC address mappings for some LAN nodes
Table entry format: <IP Address; MAC Address; TTL>
Process
1: A broadcasts ARP query packet, containing B’s IP address
2: B receives ARP packet, replies to A with its MAC address
3: A caches IP-to-MAC address pair in its ARP table until timeout
Ethernet
Dominant wired LAN technology
Sending adapter encapsulates IP datagram in Ethernet frame
Physical topology
Bus: all nodes in same collision domain (can collide with each other | CSMA/CD)
Star: active switch in center / each “spoke” runs a separate Ethernet protocol (nodes do not collide with each other)
Switches
Functions
Filtering, storing, forwarding Ethernet frames
Examine incoming frame’s MAC address, selectively forward frame to one-or-more outgoing links when frame is to be forwarded on segment.
Plug and play, self-learning
Switch table entry: [node LAN address, switch interface, time stamp, TTL]
If a frame’s destination is unknown, flood (flood learning)
Cycles result
organize switches in a spanning tree by disabling subset of interfaces
The Spanning Tree Protocol (STP) is a network protocol used in Layer 2 Ethernet networks to prevent loops and ensure the creation of a loop-free topology. Its main role is to establish a tree-like structure within a network by selectively blocking redundant paths.
MPLS
Initial goal: high-speed IP forwarding using fixed length labels
Fast lookup using fixed length identifier (rather than longest prefix matching)
MPLS routing: path to destination can be based on source and destination addresses
Fast reroute: pre-compute backup routes in case of link failure
Traffic Engineering: MPLS allows network administrators to control the path and flow of traffic through the network.
Flexibility: MPLS forwarding decisions can differ from those of IP
Use destination and source addresses to route flows to same destination differently.
Re-route flows quickly if link fails: pre-computed backup paths
MPLS can support multiple levels of connection tunnelling through label stacking (VPN support)
Traceroute
A network diagnostic tool used to trace the path that an IP packet takes from a source device to a destination device over an IP network. It provides valuable information about the routers or network devices traversed by the packet, helping identify network connectivity issues, bottlenecks, and routing problems
Security
Confidentiality: only sender, intended receiver should understand message contents
Authentication: sender, receiver want to confirm identity of each other
Message integrity: sender, receiver want to ensure message not altered without detection
Access and availability: services must be accessible and available to users
Attacks
Eavesdrop: intercept messages
Actively insert messages into connections
Impersonation: Fake (spoof) source address in packet (or any field in packet)
Hijacking: take over ongoing connection by removing sender or receiver, inserting himself in place
Deny of service: prevent service from being used by others
Nonce: number used only once-in-a-lifetime
Challenge-response protocols are authentication protocols that involve a challenge presented by the verifier to the prover, who must provide a response that proves their identity or possession of a secret key.
Hash function
produces fixed-size message digest (fingerprint)
given message digest x, computationally infeasible to find m such that x = H(m)
process data quickly and hashed value is efficient for verification
Public key encryption algorithm
Requirements
1: need K+(B) and K-(B) such that K-(K+(m)) = m
2: given public key K+(B), it should be impossible to compute private key K-(B).
Important property
K-(K+(m)) = m = K+(K-(m))
Certification authority (CA): binds public key to particular entry, E.
E registers its public key with CA by providing “proof of identity” to CA
CA creates certificate binding E to its public key
Key distribution problem: refers to the challenge of securely distributing encryption keys between parties involved in secure communication or cryptographic systems. The key distribution problem arises from the need to establish and exchange encryption keys in a secure and reliable manner.
Secure e-mail
Alice wants to provide confidentiality, sender authentication, message integrity.