4 Representing Numbers and Letters with Binary

4 二进制

Hi I'm Carrie Anne, this is Crash Course Computer Science

嗨,我是CarrieAnne,欢迎收看计算机科学速成课!

and today we're going to talk about how computers store and represent numerical data.

今天,我们讲计算机如何存储和表示数字

Which means we've got to talk about Math!

所以会有一些数学

But don't worry.

不过别担心

Every single one of you already knows exactly what you need to know to follow along.

你们的数学水平绝对够用了

So, last episode we talked about how transistors can be used to build logic gates,

上集我们讲了,怎么用晶体管做逻辑门

which can evaluate boolean statements.

逻辑门可以判断布尔语句

And in boolean algebra, there are only two, binary values: true and false.

布尔代数只有两个值:True和False

But if we only have two values,

但如果只有两个值,

how in the world do we represent information beyond just these two values?

我们怎么表达更多东西?

That's where the Math comes in.

这就需要数学了

So, as we mentioned last episode, a single binary value can be used to represent a number.

上集提到,1个二进制值可以代表1个数

Instead of true and false, we can call these two states 1 and 0 which is actually incredibly useful.

我们可以把真和假,当做1和0

And if we want to represent larger things we just need to add more binary digits.

如果想表示更多东西,加位数就行了

This works exactly the same way as the decimal numbers that we're all familiar with.

和我们熟悉的十进制一样

With decimal numbers there are "only" 10 possible values a single digit can be; 0 through 9,

十进制只有10个数(0到9)

and to get numbers larger than 9 we just start adding more digits to the front.

要表示大于9的数,加位数就行了

We can do the same with binary.

二进制也可以这样玩

For example, let's take the number two hundred and sixty three.

拿263举例

What does this number actually represent?

这个数字"实际"代表什么?

Well, it means we've got 2 one-hundreds, 6 tens, and 3 ones.

2个100,6个10,3个1

If you add those all together, we've got 263.

加在一起,就是263

Notice how each column has a different multiplier.

注意每列有不同的乘数

In this case, it's 100, 10, and 1.

100,10,1

Each multiplier is ten times larger than the one to the right.

每个乘数都比右边大十倍

That's because each column has ten possible digits to work with, 0 through 9,

因为每列有10个可能的数字(0到9)

after which you have to carry one to the next column.

如果超过9,要在下一列进1.

For this reason, it's called base-ten notation, also called decimal since deci means ten.

因此叫"基于十的表示法"或十进制

AND Binary works exactly the same way, it's just base-two.

二进制也一样,只不过是基于2而已

That's because there are only two possible digits in binary 1 and 0.

因为二进制只有两个可能的数,1和0

This means that each multiplier has to be two times larger than the column to its right.

意味着每个乘数必须是右侧乘数的两倍

Instead of hundreds, tens, and ones, we now have fours, twos and ones.

就不是之前的100,10,1,而是4,2,1

Take for example the binary number: 101.

拿二进制数101举例

This means we have 1 four, 0 twos, and 1 one.

意味着有,1个"4",0个"2",1个"1"

Add those all together and we've got the number 5 in base ten.

加在一起,得到十进制的5

But to represent larger numbers, binary needs a lot more digits.

为了表示更大的数字,二进制需要更多位数

Take this number in binary 10110111.

拿二进制数10110111举例

We can convert it to decimal in the same way.

我们可以用相同的方法转成十进制

We have 1 x 128, 0 x 64, 1 x 32, 1 x 16, 0 x 8, 1 x 4, 1 x 2, and 1 x 1.

1x128,0x64,1x32,1x16,0x8,1x4,1x2,1x1

Which all adds up to 183.

加起来等于183

Math with binary numbers isn't hard either.

二进制数的计算也不难

Take for example decimal addition of 183 plus 19.

以十进制数183加19举例

First we add 3 + 9, that's 12, so we put 2 as the sum and carry 1 to the ten's column.

首先3+9,得到12,然后位数记作2,向前进1

Now we add 8 plus 1 plus the 1 we carried, thats 10, so the sum is 0 carry 1.

现在算8+1+1=10,所以位数记作0,再向前进1

Finally we add 1 plus the 1 we carried, which equals 2.

最后1+1=2,位数记作2

So the total sum is 202.

所以和是202

Here's the same sum but in binary.

二进制也一样

Just as before, we start with the ones column.

和之前一样,从个位开始

Adding 1+1 results in 2, even in binary.

1+1=2,在二进制中也是如此

But, there is no symbol "2" so we use 10 and put 0 as our sum and carry the 1.

但二进制中没有2,所以位数记作0,进1

Just like in our decimal example.

就像十进制的例子一样

1 plus 1, plus the 1 carried,

1+1,再加上进位的1

equals 3 or 11 in binary,

等于3,用二进制表示是11

so we put the sum as 1 and we carry 1 again, and so on.

所以位数记作1,再进1,以此类推

We end up with this number, which is the same as the number 202 in base ten.

最后得到这个数字,跟十进制202是一样的

Each of these binary digits, 1 or 0, is called a "bit".

二进制中,一个1或0叫一"位"

So in these last few examples, we were using 8-bit numbers with their lowest value of zero

上个例子我们用了8位,8位能表示的最小数是0,8位都是0

and highest value is 255, which requires all 8 bits to be set to 1.

最大数是255,8位都是1

Thats 256 different values, or 2 to the 8th power.

能表示256个不同的值,2的8次方

You might have heard of 8-bit computers, or 8-bit graphics or audio.

你可能听过8位机,8位图像,8位音乐

These were computers that did most of their operations in chunks of 8 bits.

意思是计算机里,大部分操作都是8位8位这样处理的

But 256 different values isn't a lot to work with, so it meant things like 8-bit games

但256个值不算多,所以它意味着8位游戏

were limited to 256 different colors for their graphics.

所使用的图形被限制为256种不同的颜色

And 8-bits is such a common size in computing, it has a special word: a byte.

8位是如此常见,以至于有专门的名字:字节

A byte is 8 bits.

1字节=8位,1bytes=8bits

If you've got 10 bytes, it means you've really got 80 bits.

如果有10个字节,意味着有80位

You've heard of kilobytes, megabytes, gigabytes and so on.

你听过千字节(KB)兆字节(MB)千兆字节(GB)等等

These prefixes denote different scales of data.

不同前缀代表不同数量级

Just like one kilogram is a thousand grams,

就像1千克=1000克,

1 kilobyte is a thousand bytes.

1千字节=1000字节

or really 8000 bits.

或8000位

Mega is a million bytes (MB), and giga is a billion bytes (GB).

Mega是百万字节(MB),Giga是十亿字节(GB)

Today you might even have a hard drive that has 1 terabyte (TB) of storage.

如今你可能有1TB的硬盘

That's 8 trillion ones and zeros.

8万亿个1和0

But hold on!

等等,

That's not always true.

我们有另一种计算方法

In binary, a kilobyte has two to the power of 10 bytes, or 1024.

二进制里,1千字节=2的10次方=1024字节

1000 is also right when talking about kilobytes,

1000也是千字节(KB)的正确单位

but we should acknowledge it isn't the only correct definition.

1000和1024都对

You've probably also heard the term 32-bit or 64-bit computers

你可能听过32位或64位计算机

you're almost certainly using one right now.

你现在用的电脑几乎肯定是其中一种

What this means is that they operate in chunks of 32 or 64 bits.

意思是一块块处理数据,每块是32位或64位

That's a lot of bits!

这可是很多位

The largest number you can represent with 32 bits is just under 4.3 billion.

32位能表示的最大数,是43亿左右

Which is thirty-two 1's in binary.

也就是32个1

This is why our Instagram photos are so smooth and pretty

所以Instagram照片很清晰

they are composed of millions of colors,

它们有上百万种颜色

because computers today use 32-bit color graphics

因为如今都用32位颜色

Of course, not everything is a positive number

当然,不是所有数都是正数

like my bank account in college.

比如我上大学时的银行账户T_T

So we need a way to represent positive and negative numbers.

我们需要有方法表示正数和负数

Most computers use the first bit for the sign:

大部分计算机用第一位表示正负:

1 for negative, 0 for positive numbers,

1是负,0是正

and then use the remaining 31 bits for the number itself.

用剩下31位来表示符号外的数值

That gives us a range of roughly plus or minus two billion.

能表示的数的范围大约是正20亿到负20亿

While this is a pretty big range of numbers, it's not enough for many tasks.

虽然是很大的数,但许多情况下还不够用

There are 7 billion people on the earth, and the US national debt is almost 20 trillion dollars after all.

全球有70亿人口,美国国债近20万亿美元

This is why 64-bit numbers are useful.

所以64位数很有用

The largest value a 64-bit number can represent is around 9.2 quintillion!

64位能表达最大数大约是9.2×10^18

That's a lot of possible numbers and will hopefully stay above the US national debt for a while!

希望美国国债在一段时间内不会超过这个数!

Most importantly, as we'll discuss in a later episode,

重要的是(我们之后的视频会深入讲)

computers must label locations in their memory,

计算机必须给内存中每一个位置,做一个"标记"

known as addresses, in order to store and retrieve values.

这个标记叫"地址",目的是为了方便存取数据

As computer memory has grown to gigabytes and terabytes that's trillions of bytes

如今硬盘已经增长到GB和TB,上万亿个字节!

it was necessary to have 64-bit memory addresses as well.

内存地址也应该有64位

In addition to negative and positive numbers,

除了负数和正数,

computers must deal with numbers that are not whole numbers,

计算机也要处理非整数

like 12.7 and 3.14, or maybe even stardate: 43989.1.

比如12.7和3.14,或"星历43989.1"

These are called "floating point" numbers,

这叫浮点数

because the decimal point can float around in the middle of number.

因为小数点可以在数字间浮动

Several methods have been developed to represent floating point numbers.

有好几种方法表示浮点数

The most common of which is the IEEE 754 standard.

最常见的是IEEE754标准

And you thought historians were the only people bad at naming things!

你以为只有历史学家取名很烂吗?

In essence, this standard stores decimal values sort of like scientific notation.

它用类似科学计数法的方法,来存十进制值

For example, 625.9 can be written as 0.6259 x 10^3.

例如,625.9可以写成0.6259×10^3

There are two important numbers here: the .6259 is called the significand.

这里有两个重要的数:

And 3 is the exponent.

.6259叫"有效位数",3是指数

In a 32-bit floating point number,

在32位浮点数中

the first bit is used for the sign of the number -positive or negative.

第1位表示数的符号——正或负

The next 8 bits are used to store the exponent

接下来8位存指数

and the remaining 23 bits are used to store the significand.

剩下23位存有效位数

Ok, we've talked a lot about numbers, but your name is probably composed of letters,

好了,聊够数了,但你的名字是字母组成的

so it's really useful for computers to also have a way to represent text.

所以我们也要表示文字

However, rather than have a special form of storage for letters,

与其用特殊方式来表示字母,

computers simply use numbers to represent letters.

计算机可以用数表示字母

The most straightforward approach might be to simply number the letters of the alphabet:

最直接的方法是给字母编号:

A being 1, B being 2, C 3, and so on.

A是1,B是2,C是3,以此类推

In fact, Francis Bacon, the famous English writer,

著名英国作家弗朗西斯·培根(FrancisBacon)

used five-bit sequences to encode all 26 letters of the English alphabet

曾用5位序列来编码英文的26个字母

to send secret messages back in the 1600s.

在十六世纪传递机密信件

And five bits can store 32 possible values so that's enough for the 26 letters,

五位(bit)可以存32个可能值(2^5)这对26个字母够了

but not enough for punctuation, digits, and upper and lower case letters.

但不能表示标点符号,数字和大小写字母

Enter ASCII, the American Standard Code for Information Interchange.

ASCII,美国信息交换标准代码

Invented in 1963, ASCII was a 7-bit code, enough to store 128 different values.

发明于1963年,ASCII是7位代码,足够存128个不同值

With this expanded range, it could encode capital letters, lowercase letters,

范围扩大之后,可以表示大写字母,小写字母,

digits 0 through 9, and symbols like the @ sign and punctuation marks.

数字0到9,@这样的符号,以及标点符号

For example, a lowercase 'a' is represented by the number 97, while a capital 'A' is 65.

举例,小写字母a用数字97表示,大写字母A是65

A colon is 58 and a closed parenthesis is 41.

:是58,)是41

ASCII even had a selection of special command codes,

ASCII甚至有特殊命令符号

such as a newline character to tell the computer where to wrap a line to the next row.

比如换行符,用来告诉计算机换行

In older computer systems,

在老计算机系统中

the line of text would literally continue off the edge of the screen if you didn't include a new line character!

如果没换行符,文字会超出屏幕

Because ASCII was such an early standard,

因为ASCII是个很早的标准

it became widely used,

所以它被广泛使用

and critically, allowed different computers built by different companies to exchange data.

让不同公司制作的计算机,能互相交换数据

This ability to universally exchange information is called "interoperability".

这种通用交换信息的能力叫"互操作性"

However, it did have a major limitation: it was really only designed for English.

但有个限制:它是为英语设计的

Fortunately, there are 8 bits in a byte, not 7,

幸运的是,一个字节有8位,而不是7位

and it soon became popular to use codes 128 through 255,

128到255的字符渐渐变得常用

previously unused, for "national" characters.

这些字符以前是空的,是给各个国家自己"保留使用的"

In the US, those extra numbers were largely used to encode additional symbols,

在美国,这些额外的数字主要用于编码附加符号

like mathematical notation, graphical elements, and common accented characters.

比如数学符号,图形元素和常用的重音字符

On the other hand, while the Latin characters were used universally,

另一方面,虽然拉丁字符被普遍使用

Russian computers used the extra codes to encode Cyrillic characters,

在俄罗斯,他们用这些额外的字符表示西里尔字符

and Greek computers, Greek letters, and so on.

而希腊电脑用希腊字母,等等

And national character codes worked pretty well for most countries.

这些保留下来给每个国家自己安排的空位,对大部分国家都够用

The problem was,

问题是

if you opened an email written in Latvian on a Turkish computer,

如果在土耳其电脑上打开拉脱维亚语写的电子邮件

the result was completely incomprehensible.

会显示乱码

And things totally broke with the rise of computing in Asia,

随着计算机在亚洲兴起,这种做法彻底失效了

as languages like Chinese and Japanese have thousands of characters.

中文和日文这样的语言有数千个字符

There was no way to encode all those characters in 8-bits!

根本没办法用8位来表示所有字符!

In response, each country invented multi-byte encoding schemes,

为了解决这个问题,每个国家都发明了多字节编码方案

all of which were mutually incompatible.

但相互不兼容

The Japanese were so familiar with this encoding problem that they had a special name for it:

日本人总是碰到编码问题,以至于专门有词来称呼:

mojibake, which means "scrambled text".

mojibake意思是乱码

And so it was born Unicode one format to rule them all.

所以Unicode诞生了统一所有编码的标准

Devised in 1992 to finally do away with all of the different international schemes

设计于1992年,解决了不同国家不同标准的问题

it replaced them with one universal encoding scheme.

Unicode用一个统一编码方案

The most common version of Unicode uses 16 bits with space for over a million codes -

最常见的Unicode是16位的,有超过一百万个位置-

enough for every single character from every language ever used

对所有语言的每个字符都够了

more than 120,000 of them in over 100 types of script

100多种字母表加起来占了12万个位置。

plus space for mathematical symbols and even graphical characters like Emoji.

还有位置放数学符号,甚至Emoji

And in the same way that ASCII defines a scheme for encoding letters as binary numbers,

就像ASCII用二进制来表示字母一样

other file formats like MP3s or GIFs -

其他格式比如MP3或GIF-

use binary numbers to encode sounds or colors of a pixel in our photos, movies, and music.

用二进制编码声音/颜色,表示照片,电影,音乐

Most importantly, under the hood it all comes down to long sequences of bits.

重要的是,这些标准归根到底是一长串位

Text messages, this YouTube video, every webpage on the internet,

短信,这个YouTube视频,互联网上的每个网页

and even your computer's operating system, are nothing but long sequences of 1s and 0s.

甚至操作系统,只不过是一长串1和0

So next week,

下周

we'll start talking about how your computer starts manipulating those binary sequences,

我们会聊计算机怎么操作二进制

for our first true taste of computation.

初尝"计算"的滋味

Thanks for watching. See you next week.

感谢观看,下周见