What is a distributed system?
multiple cooperating computers
storage for big web sites, MapReduce, peer-to-peer sharing, &c
lots of critical infrastructure is distributed
Why do people build distributed systems?
to increase capacity via parallelism
to tolerate faults via replication
to place computing physically close to external entities
to achieve security via isolation
But:
many concurrent parts, complex interactions
must cope with partial failure
tricky to realize performance potential
Why take this course?
interesting -- hard problems, powerful solutions
used by real systems -- driven by the rise of big Web sites
active research area -- important unsolved problems
hands-on -- you'll build real systems in the labs
LABS
Lab goal: deeper understanding of some important techniques
Lab goal: experience with distributed programming
first lab is due a week from Friday
one per week after that for a while
Lab 1: MapReduce
Lab 2: replication for fault tolerance using Raft
Lab 3: fault-tolerant key/value store
Lab 4: sharded key/value store
MAIN TOPICS
This is a course about infrastructure for applications.
* Storage.
* Communication.
* Computation.
The big goal: abstractions that hide the complexity of distribution.
A couple of topics will come up repeatedly in our search.
Topic: implementation
RPC, threads, concurrency control.
The labs...
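The labs are written in Go; here is a minimal sketch (not lab code) of the threads-plus-concurrency-control part: many goroutines update shared state, and a mutex keeps the updates correct. RPC is the other main ingredient and isn't shown here.

package main

import (
  "fmt"
  "sync"
)

// Counter is shared state updated by many goroutines ("threads");
// the mutex is the concurrency control that keeps updates safe.
type Counter struct {
  mu sync.Mutex
  n  int
}

func (c *Counter) Inc() {
  c.mu.Lock()
  defer c.mu.Unlock()
  c.n++
}

func main() {
  var c Counter
  var wg sync.WaitGroup
  for i := 0; i < 100; i++ {
    wg.Add(1)
    go func() {
      defer wg.Done()
      c.Inc()
    }()
  }
  wg.Wait()
  fmt.Println(c.n) // always 100; without the lock the result would be unpredictable
}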
Topic: performance
The goal: scalable throughput
Nx servers -> Nx total throughput via parallel CPU, disk, net.
[diagram: users, application servers, storage servers]
So handling more load only requires buying more computers.
Rather than re-design by expensive programmers.
Effective when you can divide work w/o much interaction.
Scaling gets harder as N grows:
Load imbalance, stragglers, slowest-of-N latency.
Non-parallelizable code: initialization, interaction.
Bottlenecks from shared resources, e.g. network.
Some performance problems aren't easily solved by scaling
e.g. quick response time for a single user request
e.g. all users want to update the same data
often requires better design rather than just more computers
Lab 4
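A toy Go sketch of the easy case described above (illustration only): the work splits into independent shards, one worker per shard, and the workers never interact, so adding workers adds throughput.

package main

import (
  "fmt"
  "sync"
)

// Toy version of scalable throughput: independent shards, one
// worker ("server") per shard, and no interaction between workers,
// so N workers give roughly Nx total throughput.
func main() {
  shards := [][]int{{1, 2}, {3, 4}, {5, 6}, {7, 8}}
  sums := make([]int, len(shards)) // one slot per worker, no sharing

  var wg sync.WaitGroup
  for i := range shards {
    wg.Add(1)
    go func(i int) {
      defer wg.Done()
      for _, v := range shards[i] {
        sums[i] += v
      }
    }(i)
  }
  wg.Wait()

  total := 0
  for _, s := range sums {
    total += s
  }
  fmt.Println(total) // 36
}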
Topic: fault tolerance
1000s of servers, big network -> always something broken
We'd like to hide these failures from the application.
We often want:
Availability -- app can make progress despite failures
Recoverability -- app will come back to life when failures are repaired
Big idea: replicated servers.
If one server crashes, can proceed using the other(s).
Labs 1, 2 and 3
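A made-up Go sketch of the availability idea, not a lab API: the client tries each replica in turn, so one crashed server doesn't block progress. It deliberately ignores the hard part -- keeping the replicas identical -- which is what Raft handles in Lab 2.

package main

import (
  "errors"
  "fmt"
)

// replica and replicatedGet are illustrative names only: the client
// keeps trying replicas until one answers (availability via replication).
type replica interface {
  Get(key string) (string, error)
}

func replicatedGet(replicas []replica, key string) (string, error) {
  lastErr := errors.New("no replicas configured")
  for _, r := range replicas {
    v, err := r.Get(key)
    if err == nil {
      return v, nil // one live replica is enough
    }
    lastErr = err // this replica failed; try the next one
  }
  return "", lastErr
}

// Two fake replicas for the example: one crashed, one alive.
type deadReplica struct{}

func (deadReplica) Get(string) (string, error) { return "", errors.New("server down") }

type liveReplica struct{ data map[string]string }

func (r liveReplica) Get(k string) (string, error) { return r.data[k], nil }

func main() {
  rs := []replica{deadReplica{}, liveReplica{data: map[string]string{"x": "1"}}}
  v, err := replicatedGet(rs, "x")
  fmt.Println(v, err) // "1" <nil> -- the crashed replica was skipped
}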
Topic: consistency
General-purpose infrastructure needs well-defined behavior.
E.g. "Get(k) yields the value from the most recent Put(k,v)."
Achieving good behavior is hard!
"Replica" servers are hard to keep identical.
Clients may crash midway through multi-step update.
Servers may crash, e.g. after executing but before replying.
Network partition may make live servers look dead; risk of "split brain".
Consistency and performance are enemies.
Strong consistency requires communication,
e.g. Get() must check for a recent Put().
Many designs provide only weak consistency, to gain speed.
e.g. Get() does *not* yield the latest Put()!
Painful for application programmers but may be a good trade-off.
Many design points are possible in the consistency/performance spectrum!
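For contrast, a single-server Go sketch (names illustrative, not a lab API) that trivially meets the strong spec quoted above: Get(k) returns the value of the most recent Put(k,v). The whole difficulty in this section is keeping that promise once the data is replicated and the network can partition.

package main

import (
  "fmt"
  "sync"
)

// One server, one map, one lock: strong consistency is easy here.
// Replication, crashes, and partitions are what make it hard.
type KV struct {
  mu   sync.Mutex
  data map[string]string
}

func NewKV() *KV { return &KV{data: make(map[string]string)} }

func (kv *KV) Put(k, v string) {
  kv.mu.Lock()
  defer kv.mu.Unlock()
  kv.data[k] = v
}

func (kv *KV) Get(k string) string {
  kv.mu.Lock()
  defer kv.mu.Unlock()
  return kv.data[k]
}

func main() {
  kv := NewKV()
  kv.Put("k", "v1")
  kv.Put("k", "v2")
  fmt.Println(kv.Get("k")) // "v2": the most recent Put wins
}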
CASE STUDY: MapReduce
Let's talk about MapReduce (MR) as a case study
a good illustration of 6.824's main topics
hugely influential
the focus of Lab 1
MapReduce overview
context: multi-hour computations on multi-terabyte data-sets
e.g. build search index, or sort, or analyze structure of web
only practical with 1000s of computers
applications not written by distributed systems experts
overall goal: easy for non-specialist programmers
programmer just defines Map and Reduce functions
often fairly simple sequential code
MR takes care of, and hides, all aspects of distribution!
Abstract view of a MapReduce job
input is (already) split into M files
Input1 -> Map -> a,1 b,1
Input2 -> Map -> b,1
Input3 -> Map -> a,1 c,1
| | |
| | -> Reduce -> c,1
| -----> Reduce -> b,2
---------> Reduce -> a,2
MR calls Map() for each input file, produces set of k2,v2
"intermediate" data
each Map() call is a "task"
MR gathers all intermediate v2's for a given k2,
and passes each key + values to a Reduce call
final output is set of <k2,v3> pairs from Reduce()s
Example: word count
input is thousands of text files
Map(k, v)
  split v into words
  for each word w
    emit(w, "1")
Reduce(k, v)
  emit(len(v))
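A sequential Go sketch of the word-count job above (an illustration, not the Lab 1 framework): it shows the shape of Map and Reduce and what MR does around them -- call Map() on each input, gather the intermediate values by key, then call Reduce() once per key -- while ignoring all the distribution, tasks, and failures.

package main

import (
  "fmt"
  "sort"
  "strconv"
  "strings"
  "unicode"
)

// KeyValue is one intermediate k2,v2 pair emitted by Map.
type KeyValue struct {
  Key   string
  Value string
}

// Map splits its input into words and emits (word, "1") for each.
func Map(filename, contents string) []KeyValue {
  words := strings.FieldsFunc(contents, func(r rune) bool { return !unicode.IsLetter(r) })
  var kva []KeyValue
  for _, w := range words {
    kva = append(kva, KeyValue{Key: w, Value: "1"})
  }
  return kva
}

// Reduce receives all values for one key and emits the count.
func Reduce(key string, values []string) string {
  return strconv.Itoa(len(values))
}

func main() {
  inputs := map[string]string{
    "Input1": "a b",
    "Input2": "b",
    "Input3": "a c",
  }

  // MR calls Map() on each input and gathers the intermediate pairs by key.
  intermediate := make(map[string][]string)
  for name, contents := range inputs {
    for _, kv := range Map(name, contents) {
      intermediate[kv.Key] = append(intermediate[kv.Key], kv.Value)
    }
  }

  // MR then calls Reduce() once per key with all of that key's values.
  var keys []string
  for k := range intermediate {
    keys = append(keys, k)
  }
  sort.Strings(keys)
  for _, k := range keys {
    fmt.Printf("%v %v\n", k, Reduce(k, intermediate[k])) // a 2, b 2, c 1
  }
}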
MapReduce scales well:
N "worker" computers get you Nx throughput.
Map()s can run in parallel, since they don't interact.
Same for Reduce()s.
So you can get more throughput by buying more computers.
MapReduce hides many details:
sending app code to servers
tracking which tasks are done
moving data from Maps to Reduces
balancing load over servers
recovering from failures
However, MapReduce limits what apps can do:
No interaction or state (other than via intermediate output).
No iteration, no multi-stage pipelines.
No real-time or streaming processing.