Lecture 1: Goals, Principles, and Paradigms

Goals of Distributed Systems
Principles and Paradigms
- Key Principles of Distributed Systems

Goals of Distributed Systems

Transparency
Dependability
Scalability
Performance
Flexibility

Transparency

Concealment of the separation of components of a distributed system, giving a single-system image: the user can’t see that the system is distributed.

There are a number of forms of transparency, including the following (ordered from easiest to hardest to achieve):

Access: Local/remote resources accessed in the same way (e.g. file system - same commands to access any file)
Location: Users are unaware of the location of resources (e.g. remote files don’t have different names, such as host:filename)
Migration: Resources can migrate without a name change (you don’t have this with Google Drive/Dropbox cloud sync: you have location/access transparency, but the file’s name changes if it is moved to a different location)
Replication: Users are unaware of the existence of multiple copies (e.g. if you might not see an edit another person has made to a different copy, you don’t have replication transparency)
Failure: Users are unaware of the failure of individual components
Concurrency: Users are unaware of sharing resources with others
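Location and migration transparency can be illustrated with a name service that maps location-independent names to wherever a resource currently lives. The sketch below is a minimal, in-memory illustration under stated assumptions: the hosts, the `name_service` table, and the `read`/`migrate` helpers are all hypothetical, and a real system would perform the lookup and the move over the network.

```python
# Hypothetical in-memory "hosts", each storing files by local path.
hosts = {
    "server-a": {"/data/report.txt": "Q3 figures"},
    "server-b": {},
}

# Hypothetical name service: location-independent name -> current (host, path).
name_service = {"reports/q3": ("server-a", "/data/report.txt")}


def read(name):
    """Clients use only the logical name; they never see host:path (location transparency)."""
    host, path = name_service[name]
    return hosts[host][path]


def migrate(name, new_host, new_path):
    """Move the resource and update the mapping; the logical name stays the same."""
    host, path = name_service[name]
    hosts[new_host][new_path] = hosts[host].pop(path)
    name_service[name] = (new_host, new_path)


if __name__ == "__main__":
    print(read("reports/q3"))
    migrate("reports/q3", "server-b", "/archive/q3.txt")
    print(read("reports/q3"))  # same name, same content, different location
```

The client code calling `read("reports/q3")` is identical before and after the move, which is exactly what a cloud-sync folder that exposes the path lacks.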

Is transparency always desirable?

In some situations, it is useful to know that a file is stored remotely (e.g. to reduce latency by avoiding reading/writing the remote file all the time)
More generally, some forms of transparency may be useful in a given situation while others may not be

Is it always possible?

Concurrency and failure transparency can be very difficult to achieve
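One reason failure transparency is hard: if a caller detects failure by waiting for a reply with a timeout (an assumption here; the notes do not prescribe a mechanism), a server that is merely slow looks exactly like one that has crashed. A minimal sketch, where `call_with_timeout` and the three server functions are hypothetical stand-ins for remote calls:

```python
import queue
import threading
import time


def call_with_timeout(server, request, timeout):
    """Send `request` to `server`; return its reply, or None if none arrives in time."""
    replies = queue.Queue()
    # Daemon thread, so a server that never replies does not block program exit.
    worker = threading.Thread(target=lambda: replies.put(server(request)), daemon=True)
    worker.start()
    try:
        return replies.get(timeout=timeout)
    except queue.Empty:
        return None  # the caller cannot tell *why* no reply arrived


def healthy_server(request):
    return f"ok:{request}"


def slow_server(request):
    time.sleep(0.5)  # still alive, just overloaded
    return f"ok:{request}"


def crashed_server(request):
    threading.Event().wait()  # event is never set: models a server that has crashed


if __name__ == "__main__":
    for server in (healthy_server, slow_server, crashed_server):
        print(server.__name__, call_with_timeout(server, "ping", timeout=0.1))
```

Both the slow and the crashed server produce the same `None`, so any attempt to hide failure from the user has to guess which case it is in.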

Dependability

In theory, distributed systems promise higher availability (through replication), but in practice availability may degrade (more components mean more points of failure)
- Constantly have to deal with the threat of failure
Dependability usually requires all the other goals/principles of distributed systems
- Dependability requires consistency, security and fault tolerance
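The tension between "more points of failure" and "higher availability through replication" can be quantified with a back-of-envelope calculation. This sketch assumes each component is independently up with probability `a` (a simplifying assumption; real failures are often correlated), and the function names are hypothetical:

```python
def all_required(a: float, n: int) -> float:
    """Availability if the system needs all n components up at once (series)."""
    return a ** n


def any_replica(a: float, k: int) -> float:
    """Availability if any one of k replicas is enough (parallel)."""
    return 1 - (1 - a) ** k


if __name__ == "__main__":
    print(f"10 components, all needed: {all_required(0.99, 10):.4f}")
    print(f"3 replicas, any one enough: {any_replica(0.99, 3):.6f}")
```

With 99% per-component availability, needing all 10 components gives roughly 90.4%, while three replicas of which any one suffices give roughly 99.9999%, under the independence assumption.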

Scalability

A system is scalable if it can handle the addition of users and resources without suffering a noticeable loss of performance or increase in admin complexity

Scaling has three dimensions:
- Size: number of users and resources (may overload the system)
- Geography: the distance between users and resources (communication problem)
- Administration: number of organisations that have admin control over parts of the system (administrative mess)
Scalability often conflicts with performance at small scale: ask yourself how big the system is actually going to be, and don’t over-engineer your solution.
‘Scalability’ claims are often abused because the meaning isn’t clear: the target size is often undefined, or may not relate to what is actually needed
Vertical Scaling (“Scaling UP”): increasing the resources of a single machine. Not really a distributed system, just replacing one machine with a faster one
Horizontal Scaling (“Scaling OUT”): adding more machines

Techniques for Scaling:

Hiding communication latencies
Distribution (spreading data and control around)
Replication (making copies of data and processes)
Decentralisation
- Services (Don’t want to have a service running on only one machine)
- Data (don’t centralise directories/storage)
- Algorithms
  - Don’t require any machine to hold the complete system state
  - Allow nodes to make decisions based on local information
  - Algorithms must survive failure of nodes
  - No assumption of a global clock
- It’s hard to write decentralised algorithms
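The first technique, hiding communication latencies, typically means not waiting idly for each remote reply before sending the next request. A minimal sketch using `asyncio`, where `remote_fetch` is a hypothetical remote call simulated by sleeping for an assumed round-trip time:

```python
import asyncio
import time

RTT = 0.1  # assumed round-trip time per remote request, in seconds


async def remote_fetch(key):
    await asyncio.sleep(RTT)  # stand-in for one network round trip
    return f"value-of-{key}"


async def fetch_sequential(keys):
    # Waits for each reply before sending the next request: latencies add up.
    return [await remote_fetch(k) for k in keys]


async def fetch_overlapped(keys):
    # Issues all requests at once so their latencies overlap.
    return await asyncio.gather(*(remote_fetch(k) for k in keys))


def timed(coro):
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    keys = list(range(5))
    _, t_seq = timed(fetch_sequential(keys))
    _, t_ovl = timed(fetch_overlapped(keys))
    print(f"sequential: {t_seq:.2f}s, overlapped: {t_ovl:.2f}s")
```

Five requests take about 5 × RTT sequentially but about 1 × RTT when overlapped; the total work is the same, only the waiting is hidden.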

Performance

Any system should strive for maximum performance, but in distributed systems, performance conflicts with other desired properties (transparency, security, dependability, scalability)

Numbers Every Programmer should Know

L1 cache reference ...................... 0.5 ns
Branch mispredict ......................... 5 ns
L2 cache reference ........................ 7 ns
Mutex lock/unlock ........................ 25 ns
Main memory reference ................... 100 ns
Compress 1K bytes with Zippy .......... 3,000 ns = 3 us
Send 2K bytes over 1 Gbps network .... 20,000 ns = 20 us
Read 1 MB sequentially from memory .. 250,000 ns = 250 us
Round trip within same datacenter ... 500,000 ns = 0.5 ms
Disk seek ........................ 10,000,000 ns = 10 ms
Read 1 MB sequentially from disk . 20,000,000 ns = 20 ms
Send packet CA->Netherlands->CA . 150,000,000 ns = 150 ms

From Peter Norvig and Jeff Dean; see also: http://www.eecs.berkeley.edu/~rcs/research/interactive_latency.html
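These numbers are most useful for back-of-envelope comparisons. The sketch below uses only figures from the table to compare reading 1 MB from local disk with reading 1 MB from another machine's memory in the same datacenter. It assumes 1 MB is sent as 500 chunks of 2 KB, one request round trip, and no overlap between steps; the dictionary keys and function names are hypothetical.

```python
# Latencies in nanoseconds, taken from the table above.
NS = {
    "main_memory_read_1mb": 250_000,
    "send_2kb_1gbps": 20_000,
    "dc_round_trip": 500_000,
    "disk_seek": 10_000_000,
    "disk_read_1mb": 20_000_000,
}


def local_disk_read_1mb():
    # One seek, then a 1 MB sequential read.
    return NS["disk_seek"] + NS["disk_read_1mb"]


def remote_memory_read_1mb():
    # Assumes 1 MB ~ 500 x 2 KB chunks sent over a 1 Gbps link.
    transfer = 500 * NS["send_2kb_1gbps"]
    return NS["dc_round_trip"] + NS["main_memory_read_1mb"] + transfer


if __name__ == "__main__":
    print(f"local disk:    {local_disk_read_1mb() / 1e6:.2f} ms")
    print(f"remote memory: {remote_memory_read_1mb() / 1e6:.2f} ms")
```

Under these rough numbers the remote-memory read (about 10.75 ms) beats the local-disk read (about 30 ms), a reminder that "remote" is not automatically "slow".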

Flexibility

Build a system out of required components
Extensibility: Components/services can be changed or added
Openness (and clarity) of interfaces/spec
- Allows reimplementation and extension
Interoperability - allow others to add/build onto your system
Separation of policy and mechanism (standardised internal interfaces)
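Separation of policy and mechanism can be sketched with a cache: the mechanism (storage, lookup, recency bookkeeping) is fixed, while the eviction policy is plugged in through a small internal interface. This is an illustrative example chosen here, not one from the lecture; `Cache`, `lru_policy` and `mru_policy` are hypothetical names.

```python
from collections import OrderedDict
from typing import Callable, Hashable

# Policy interface: given entries ordered least -> most recently used, pick a key to evict.
EvictionPolicy = Callable[[OrderedDict], Hashable]


def lru_policy(entries: OrderedDict) -> Hashable:
    return next(iter(entries))  # least recently used


def mru_policy(entries: OrderedDict) -> Hashable:
    return next(reversed(entries))  # most recently used


class Cache:
    """Mechanism: storage, lookup and recency tracking. Policy is injected."""

    def __init__(self, capacity: int, policy: EvictionPolicy):
        self.capacity = capacity
        self.policy = policy
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # record recency for whichever policy is used
        return self.entries[key]

    def put(self, key, value):
        if key in self.entries:
            self.entries.move_to_end(key)
        elif len(self.entries) >= self.capacity:
            del self.entries[self.policy(self.entries)]  # policy decides, mechanism acts
        self.entries[key] = value
```

Switching from LRU to MRU eviction means passing a different function; the `Cache` mechanism itself never changes.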

Common Mistakes, and Rules of Thumb

Common Mistakes

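Each of the following is a false assumption (together often called the “fallacies of distributed computing”):

The network is reliable
Latency is zero
Bandwidth is infinite
The network is secure
Topology does not change
There is one administrator
Transport cost is zero
Everything is homogeneous (in practice, people run different versions, different libraries, etc.)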

‘Rules of Thumb’

Trade-offs: many challenges impose conflicting requirements, so you need to make trade-offs and understand what is more important
Separation of Concerns: Split the problem into individual concerns, and address these independently
End-to-End Argument: Some communication functions can only be reliably implemented at the application level
Policy vs. Mechanism: Systems should build mechanisms that allow flexible application of policies (avoid built-in policies, e.g. a hard-coded choice of cryptographic algorithm)
K.I.S.S.: Keep it as simple as possible
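The end-to-end argument can be illustrated with a transfer through a relay: every hop checks its own CRC, yet a bug inside the relay corrupts the data after one check and before the next. Only a checksum computed and verified by the applications at the two ends catches it. A minimal sketch; `link_send`, `buggy_relay`, `transfer` and the corruption itself are hypothetical.

```python
import hashlib
import zlib


def link_send(payload):
    """Per-hop mechanism: attach a CRC32 covering this hop's transmission only."""
    return payload, zlib.crc32(payload)


def link_receive(frame):
    payload, crc = frame
    if zlib.crc32(payload) != crc:
        raise ValueError("corruption detected on this hop")
    return payload


def honest_relay(frame):
    return link_send(link_receive(frame))


def buggy_relay(frame):
    payload = link_receive(frame)  # incoming hop checks out
    payload = payload[:-1] + b"?"  # hypothetical bug corrupts data inside the relay
    return link_send(payload)      # outgoing hop gets a fresh, valid CRC


def transfer(data, relay):
    digest = hashlib.sha256(data).digest()  # end-to-end check, computed by the sender
    received = link_receive(relay(link_send(data)))  # every per-hop check passes
    return received, hashlib.sha256(received).digest() == digest


if __name__ == "__main__":
    print(transfer(b"hello", honest_relay))
    print(transfer(b"hello", buggy_relay))
```

The per-hop CRCs are still worth having as a performance optimisation (they catch link errors early), but correctness can only be guaranteed by the end-to-end check.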

Principles and Paradigms

Key Principles of Distributed Systems

Principles

System Architecture
Communication
Partitioning, Replication and Consistency
Synchronisation and Coordination
Naming
Fault Tolerance
Security

Paradigms

Shared Memory
Distributed Objects
Distributed file system (everything is a file)
Distributed coordination
Service-Oriented Architecture (SOA), web services
Distributed database
Shared documents (e.g. web: everything’s a document)
Agents