[TED] How to read the genome and build a human being

Posted on 2017-10-10 01:27:11



00:12

For the next 16 minutes, I'm going to take you on a journey that is probably the biggest dream of humanity: to understand the code of life.

00:21

So for me, everything started many, many years ago when I met the first 3D printer. The concept was fascinating. A 3D printer needs three elements: a bit of information, some raw material, some energy, and it can produce any object that was not there before.

00:38

I was doing physics, I was coming back home and I realized that I actually always knew a 3D printer. And everyone does. It was my mom.

00:47

My mom takes three elements: a bit of information, which is between my father and my mom in this case, raw elements and energy in the same media, that is food, and after several months, produces me. And I was not existent before.

01:02

So apart from the shock of my mom discovering that she was a 3D printer, I immediately got mesmerized by that piece, the first one, the information. What amount of information does it take to build and assemble a human? Is it much? Is it little? How many thumb drives can you fill?

01:21

Well, I was studying physics at the beginning and I took this approximation of a human as a gigantic Lego piece. So, imagine that the building blocks are little atoms and there is a hydrogen here, a carbon here, a nitrogen here. So in the first approximation, if I can list the number of atoms that compose a human being, I can build it. Now, you can run some numbers and that happens to be quite an astonishing number. So the number of atoms, the file that I will save in my thumb drive to assemble a little baby, will actually fill an entire Titanic of thumb drives -- multiplied 2,000 times. This is the miracle of life. Every time you see from now on a pregnant lady, she's assembling the biggest amount of information that you will ever encounter. Forget big data, forget anything you heard of. This is the biggest amount of information that exists.

02:26

But nature, fortunately, is much smarter than a young physicist, and in four billion years, managed to pack this information in a small crystal we call DNA. We met it for the first time in 1950 when Rosalind Franklin, an amazing scientist, a woman, took a picture of it. But it took us more than 40 years to finally poke inside a human cell, take out this crystal, unroll it, and read it for the first time. The code comes out to be a fairly simple alphabet, four letters: A, T, C and G. And to build a human, you need three billion of them. Three billion. How many are three billion? It doesn't really make any sense as a number, right?
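
[Editor's note: the "three billion letters" figure can be turned into a rough file size. The sketch below is a back-of-the-envelope estimate only. It assumes a plain 2-bits-per-letter encoding with no compression or metadata, which is an assumption of this note and not something stated in the talk; the function names are illustrative.]

```python
# Rough storage estimate for a four-letter code of three billion letters.
# Assumption (not from the talk): each letter is stored in the minimum
# whole number of bits needed to tell the alphabet's symbols apart.
import math

ALPHABET = "ATCG"                # the four letters named in the talk
GENOME_LETTERS = 3_000_000_000   # "three billion", as quoted in the talk


def bits_per_letter(alphabet: str) -> int:
    """Minimum whole bits needed to distinguish every symbol."""
    return math.ceil(math.log2(len(alphabet)))


def genome_size_bytes(letters: int, alphabet: str = ALPHABET) -> int:
    """Total bytes for `letters` symbols at the minimal fixed-width encoding."""
    return letters * bits_per_letter(alphabet) // 8


if __name__ == "__main__":
    size = genome_size_bytes(GENOME_LETTERS)
    print(f"{bits_per_letter(ALPHABET)} bits per letter")
    print(f"{size / 1e6:.0f} MB for the whole sequence")
```

[Under that assumption the sequence fits in well under a gigabyte, far less than the atom-by-atom list described earlier in the talk.]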

03:12

So I was thinking how I could explain myself better about how big and enormous this code is. But there is -- I mean, I'm going to have some help, and the best person to help me introduce the code is actually the first man to sequence it, Dr. Craig Venter. So welcome onstage, Dr. Craig Venter.

03:39

Not the man in the flesh, but for the first time in history, this is the genome of a specific human, printed page-by-page, letter-by-letter: 262,000 pages of information, 450 kilograms, shipped from the United States to Canada thanks to Bruno Bowden. Lulu.com, a start-up, did everything. It was an amazing feat.

04:07

But this is the visual perception of what is the code of life. And now, for the first time, I can do something fun. I can actually poke inside it and read. So let me take an interesting book ... like this one. I have an annotation; it's a fairly big book. So just to let you see what is the code of life. Thousands and thousands and thousands and millions of letters. And they apparently make sense. Let's get to a specific part. Let me read it to you:

04:46

"AAG, AAT, ATA."

04:50

To you it sounds like mute letters, but this sequence gives the color of the eyes to Craig. I'll show you another part of the book. This is actually a little more complicated.

05:02

Chromosome 14, book 132:

05:07

As you might expect.

05:14

"ATT, CTT, GATT."

05:20

This human is lucky, because if you miss just two letters in this position -- two letters of our three billion -- he will be condemned to a terrible disease: cystic fibrosis. We have no cure for it, we don't know how to solve it, and it's just two letters of difference from what we are.

05:39

A wonderful book, a mighty book, a mighty book that helped me understand and show you something quite remarkable. Every one of you -- what makes me, me and you, you -- is just about five million of these, half a book. For the rest, we are all absolutely identical. Five hundred pages is the miracle of life that you are. The rest, we all share it. So think about that again when we think that we are different. This is the amount that we share.
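
[Editor's note: the "five million letters, about five hundred pages" claim can be checked against the other figures quoted in the talk. The sketch below is only a consistency check. It assumes the letters are spread evenly over the 262,000 printed pages, which the talk does not state; the names are illustrative.]

```python
# Consistency check using only numbers quoted in the talk.
# Assumption (not from the talk): every printed page holds the same
# number of letters.

GENOME_LETTERS = 3_000_000_000   # total letters
PRINTED_PAGES = 262_000          # pages in the printed genome
DIFFERING_LETTERS = 5_000_000    # letters said to differ between people


def differing_fraction(differing: int, total: int) -> float:
    """Share of the genome that differs between two people."""
    return differing / total


def pages_for(letters: int) -> float:
    """Pages needed for `letters`, assuming an even spread over all pages."""
    return letters * PRINTED_PAGES / GENOME_LETTERS


if __name__ == "__main__":
    frac = differing_fraction(DIFFERING_LETTERS, GENOME_LETTERS)
    print(f"differing share: {frac:.2%}")
    print(f"pages needed: {pages_for(DIFFERING_LETTERS):.0f}")
```

[With an even spread, five million letters come to a few hundred pages, the same order of magnitude as the "five hundred pages" in the talk.]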

06:15

So now that I have your attention, the next question is: How do I read it? How do I make sense out of it? Well, for however good you can be at assembling Swedish furniture, this instruction manual is nothing you can crack in your life.

06:32

And so, in 2014, two famous TEDsters, Peter Diamandis and Craig Venter himself, decided to assemble a new company. Human Longevity was born, with one mission: trying everything we can try and learning everything we can learn from these books, with one target -- making real the dream of personalized medicine, understanding what things should be done to have better health and what are the secrets in these books.

07:00

An amazing team, 40 data scientists and many, many more people, a pleasure to work with. The concept is actually very simple. We're going to use a technology called machine learning. On one side, we have genomes -- thousands of them. On the other side, we collected the biggest database of human beings: phenotypes, 3D scan, NMR -- everything you can think of. Inside there, on these two opposite sides, there is the secret of translation. And in the middle, we build a machine. We build a machine and we train a machine -- well, not exactly one machine, many, many machines -- to try to understand and translate the genome in a phenotype. What are those letters, and what do they do? It's an approach that can be used for everything, but using it in genomics is particularly complicated. Little by little we grew and we wanted to build different challenges. We started from the beginning, from common traits. Common traits are comfortable because they are common, everyone has them.
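
[Editor's note: the idea of training a machine to translate a genome into a phenotype can be illustrated with a deliberately tiny example. This is not the method used by Human Longevity, which the talk does not describe in detail. The sequences, the "effect" letters, and the heights below are all invented for illustration and have no biological meaning. The sketch simply fits a straight line by ordinary least squares.]

```python
# Toy genotype-to-phenotype model. Everything here is hypothetical:
# the sequences, the "effect" letters, and the heights are made up.


def count_effect_letters(genome: str, effect: str) -> int:
    """Count positions where the genome carries the hypothetical effect letter."""
    return sum(1 for g, e in zip(genome, effect) if g == e)


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(genome: str, effect: str, model) -> float:
    """Predict the phenotype for a genome the model has not seen."""
    intercept, slope = model
    return intercept + slope * count_effect_letters(genome, effect)


EFFECT = "AGTC"  # hypothetical letters associated with greater height

# (letters at four hypothetical positions, measured height in cm)
TRAINING = [
    ("CCAG", 160.0),  # 0 matching positions
    ("ACAG", 165.0),  # 1 matching position
    ("AGAG", 170.0),  # 2 matching positions
    ("AGTG", 175.0),  # 3 matching positions
]

if __name__ == "__main__":
    xs = [count_effect_letters(g, EFFECT) for g, _ in TRAINING]
    ys = [height for _, height in TRAINING]
    model = fit_line(xs, ys)
    # "AGTC" is held out: it never appears in the training data.
    print(f"predicted height: {predict('AGTC', EFFECT, model):.1f} cm")
```

[A real genome has billions of positions and traits depend on many of them at once, which is part of why the speaker calls the genomic case particularly complicated.]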

08:02

So we started to ask our questions: Can we predict height? Can we read the books and predict your height? Well, we actually can, with five centimeters of precision. BMI is fairly connected to your lifestyle, but we still can, we get in the ballpark, eight kilograms of precision. Can we predict eye color? Yeah, we can. Eighty percent accuracy. Can we predict skin color? Yeah we can, 80 percent accuracy. Can we predict age? We can, because apparently, the code changes during your life. It gets shorter, you lose pieces, it gets insertions. We read the signals, and we make a model.

08:40

Now, an interesting challenge: Can we predict a human face? It's a little complicated, because a human face is scattered among millions of these letters. And a human face is not a very well-defined object. So, we had to build an entire tier of it to learn and teach a machine what a face is, and embed and compress it. And if you're comfortable with machine learning, you understand what the challenge is here.

09:04

Now, after 15 years -- 15 years after we read the first sequence -- this October, we started to see some signals. And it was a very emotional moment. What you see here is a subject coming in our lab. This is a face for us. So we take the real face of a subject, we reduce the complexity, because not everything is in your face -- lots of features and defects and asymmetries come from your life. We symmetrize the face, and we run our algorithm. The results that I show you right now, this is the prediction we have from the blood.

09:43

Wait a second. In these seconds, your eyes are watching, left and right, left and right, and your brain wants those pictures to be identical. So I ask you to do another exercise, to be honest. Please search for the differences, which are many. The biggest amount of signal comes from gender, then there is age, BMI, the ethnicity component of a human. And scaling up over that signal is much more complicated. But what you see here, even in the differences, lets you understand that we are in the right ballpark, that we are getting closer. And it's already giving you some emotions.

10:21

This is another subject that comes in place, and this is a prediction. A little smaller face, we didn't get the complete cranial structure, but still, it's in the ballpark. This is a subject that comes in our lab, and this is the prediction. So these people have never been seen in the training of the machine. These are the so-called "held-out" set. But these are people that you will probably never believe. We're publishing everything in a scientific publication, you can read it.

10:53

But since we are onstage, Chris challenged me. I probably exposed myself and tried to predict someone that you might recognize. So, in this vial of blood -- and believe me, you have no idea what we had to do to have this blood now, here -- in this vial of blood is the amount of biological information that we need to do a full genome sequence. We just need this amount. We ran this sequence, and I'm going to do it with you. And we start to layer up all the understanding we have. In the vial of blood, we predicted he's a male. And the subject is a male. We predict that he's a meter and 76 cm. The subject is a meter and 77 cm. So, we predicted that he's 76 [kilograms]; the subject is 82. We predict his age, 38. The subject is 35. We predict his eye color. Too dark. We predict his skin color. We are almost there. That's his face.
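
[Editor's note: the predicted and actual values quoted above can be compared directly. The sketch below computes absolute and relative errors and checks them against the tolerances mentioned earlier in the talk (five centimeters for height, eight kilograms for weight). Reading "76" and "82" as kilograms follows the Chinese translation in this post. The talk gives no tolerance for age, so none is checked.]

```python
# Predicted vs. actual values as quoted in the talk.
# Assumption: the weight figures are in kilograms (per the Chinese
# translation in this post; the English transcript gives no unit).

RESULTS = {
    "height_cm": (176, 177),   # (predicted, actual)
    "weight_kg": (76, 82),
    "age_years": (38, 35),
}

# Tolerances quoted earlier in the talk; none is given for age.
TOLERANCE = {"height_cm": 5, "weight_kg": 8}


def errors(predicted: float, actual: float):
    """Return (absolute error, relative error in percent of the actual value)."""
    abs_err = abs(predicted - actual)
    return abs_err, 100 * abs_err / actual


if __name__ == "__main__":
    for name, (predicted, actual) in RESULTS.items():
        abs_err, rel_err = errors(predicted, actual)
        line = f"{name}: off by {abs_err} ({rel_err:.1f}%)"
        if name in TOLERANCE:
            within = abs_err <= TOLERANCE[name]
            line += f", within quoted tolerance: {within}"
        print(line)
```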

11:57

Now, the reveal moment: the subject is this person.

12:04

And I did it intentionally. I am a very particular and peculiar ethnicity. Southern European, Italians -- they never fit in models. And it's particular -- that ethnicity is a complex corner case for our model. But there is another point. So, one of the things that we use a lot to recognize people will never be written in the genome. It's our free will, it's how I look. Not my haircut in this case, but my beard cut. So I'm going to show you, I'm going to, in this case, transfer it -- and this is nothing more than Photoshop, no modeling -- the beard on the subject. And immediately, we get much, much better in the feeling.

12:42

So, why do we do this? We certainly don't do it for predicting height or taking a beautiful picture out of your blood. We do it because the same technology and the same approach, the machine learning of this code, is helping us to understand how we work, how your body works, how your body ages, how disease generates in your body, how your cancer grows and develops, how drugs work and if they work on your body.

13:19

This is a huge challenge. This is a challenge that we share with thousands of other researchers around the world. It's called personalized medicine. It's the ability to move from a statistical approach where you're a dot in the ocean, to a personalized approach, where we read all these books and we get an understanding of exactly how you are. But it is a particularly complicated challenge, because of all these books, as of today, we just know probably two percent: four books of more than 175.
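
[Editor's note: the "two percent" figure follows from the book counts in the same sentence. Because the talk says "more than 175" books, the result below is an upper bound on the share, not an exact value.]

```python
# Share of the printed genome described as understood, from the talk's
# own book counts. "More than 175" makes 175 a lower bound on the total,
# so the computed share is an upper bound.

BOOKS_TOTAL = 175
BOOKS_UNDERSTOOD = 4


def understood_share(understood: int, total: int) -> float:
    """Fraction of the books that are understood."""
    return understood / total


if __name__ == "__main__":
    share = understood_share(BOOKS_UNDERSTOOD, BOOKS_TOTAL)
    print(f"at most {share:.1%} understood")
```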

13:58

And this is not the topic of my talk, because we will learn more. There are the best minds in the world on this topic. The prediction will get better, the model will get more precise. And the more we learn, the more we will be confronted with decisions that we never had to face before about life, about death, about parenting.

14:32

So, we are touching the very inner detail on how life works. And it's a revolution that cannot be confined in the domain of science or technology. This must be a global conversation. We must start to think of the future we're building as a humanity. We need to interact with creatives, with artists, with philosophers, with politicians. Everyone is involved, because it's the future of our species. Without fear, but with the understanding that the decisions that we make in the next year will change the course of history forever.

15:15

Thank you.

00:12

接下来的一刻钟,我要带大家踏上一段旅程 这大概是全人类的终极梦想—— 解读生命的密码!

00:21

我的经历开始于很多很多年以前, 那时我遇到了第一台3D打印机。 3D打印真是个非常赞的概念 它需要三个要素: 少量的信息,一些原材料,再加上点能量

00:34

就能制造出以前从没存在过的任何东西。 当时我正在研究物理学 有天我回到家,突然意识到我家里就有台3D打印机

00:43

而且每人家里都有一台 就是我妈妈。

00:47

我妈妈用这三个要素: 少量的信息—— 来自我爸和我妈的共同投入 原材料和能量的共同来源——食物 历时几个月,制造出了我 而我以前从来没有存在过!

01:02

除了震惊的发现我妈其实是台3D打印机 我还立即被另一个部分吸引了 第一个要素,信息—— 到底需要多少信息 才能制造和组装一个人呢? 是要很多?还是很少? 要用多少个U盘去储存?

01:21

我最开始是学物理的, 我想如果把人看成是一个巨型的乐高玩具 小的乐高模块就像是原子—— 这里有氢原子,这边有碳原子,上面这有氮原子。 按照最初的这个设定 如果能够列出组成人类的所有原子 应该就能组装出一个人。 大致计算一下 得到的结果非常惊人。 所需要的原子的总数, 全部存到U盘里面——即便是组装一个小婴儿 用掉的U盘就能装满整个泰坦尼克号 再乘以2000倍... 这就是生命的奇迹。 现在你再看到一个孕妇 她正在组装你能见到的最大量的信息

02:16

不要谈大数据,不要谈以前听说过的数字 这就是现存的,最最大量的信息。

02:22

(掌声) 但是......

02:26

好在大自然比一个年轻的物理学家要聪明多了。 在四十亿年的进化过程中 这些信息被压缩在叫做DNA的小晶体当中。 在1950年代我们第一次知道了DNA 那时一位杰出的女科学家Rosalind Franklin 给DNA拍了张照 但我们花了超过40年的时间, 才最终能够从人类细胞中提取这种晶体, 展开来,第一次去阅读它。 这个遗传密码由简单的字母表组成, 四个字母,A,T,C和G (碱基)。 要组装一个人,需要30亿个字母。 30亿....30亿是多少? 光这么说大家可能都没概念, 我在想怎么表达才能让人更清楚, 这些遗传密码的数量到底有多庞大。 所以...我需要点帮助... 最合适来帮我介绍遗传密码的人, 就是第一位进行人类基因组测序的人, Craig Venter 博士。

03:29

我们欢迎Craig Venter博士到台上来—— (掌声) 不是他本人—— 但这是史上第一次,一个人的基因组 被一页一页,一个字母一个字母的打印在纸上—— 总共26万2千页,450千克, 从美国运到加拿大 感谢Bruno Bowden还有 Lulu.com—— 他们负责完成了这一切,一项壮举。

04:07

这些就是生命密码给人最直观的视觉感受。 现在我可以来玩点有趣的—— 从这里面挑一段来读一读。 我来找一本有意思的...比如这一本... 我放了书签在里面,这书太厚了... 给你们看一下,生命的密码长什么样子 成百上千...成千上万...上百万的字母... 它们当然都有意义。 让我来找一段特殊的 读给你们听...

04:46

"AAG, AAT, ATA"

04:50

你们可能觉得像是听天书, 但这段序列决定了Craig眼睛的颜色。 再看看另外一段... 这一段稍微复杂一些...

05:02

第14号染色体,书本编号132...

05:07

你们想象到了哦...

05:14

"ATT, CTT, GATT"

05:20

这个人很幸运, 因为如果他在这个位点上少了2个字母, 30亿中的2个... 他就会患上一种非常可怕的疾病—— 囊肿性纤维化(cystic fibrosis) 目前没有治疗的方法,这是绝症, 仅仅是2个字母的区别。

05:39

这是一部鸿篇巨著, 它帮助我理解,也能让你们看到 一件更加令人叹为观止的事。 我们中的每一个人, 是什么让我成为我,让你成为你... 大概只占这其中的500万... 只有半本书... 所有剩下的,我们完全一模一样。 500页,涵盖了你的生命奇迹; 余下的,我们全都一样。 讨论人与人差异的时候反思一下, 我们有这么多共通的东西。

06:15

现在我已经引起了你们的兴趣, 下一步就是: 怎么去读取这些信息? 怎么理解和运用它们? 不管你在组装宜家家居上有多在行... 这么长的说明书...基本是不可能完成的任务

06:32

2014年,两位著名的TED参加者 Peter Diamandis 和 Craig Venter 决定成立一个新公司 人类长寿公司(Human Longevity, Inc.)诞生了。 唯一的任务—— 竭尽全力,穷尽其学的研究这些书目 只为达到一个目的: 让个人化医疗成为现实。 怎么做才能提高人类健康水平 了解这些书目背后的秘密。

07:00

一个强大的团队,拥有40位数据分析人员 还有很多其他的人力支持 和他们一起工作十分愉快。 实际上工作流程不很复杂 我们用一种叫做机器学习的方法。 一方面,我们有几千个基因组; 另一边我们建立一个超大的人类信息数据库: 性状,3D扫描,核磁共振,所有能想到的 在这两个端点之间, 有神秘的翻译在进行。 我们在中间建了一个机器, 建好之后训练这台机器—— 实际上不只一台机器,而是很多台... 试图去理解基因组并把它翻译成性状。 有哪些字母——它们控制什么性状—— 这是普适的方法,可以用在所有问题上, 但用在基因组学上异常的复杂。 一点一点有了进展,我们再尝试更有挑战性的东西 最开始我们从常见的特征下手, 常见特征最容易因为它们太常见了, 每个人都有。

08:02

我们开始提出如下问题: 能预测身高吗? 能不能根据这些信息预测身高? 可以,在5厘米的误差范围以内。 BMI 主要跟生活习惯有关, 但我们仍然能预测得差不多,8千克上下的误差。 眼睛的颜色能不能预测? 可以,80%准确率。 皮肤颜色? 可以,80%准确。 年龄? 可以,因为很明显基因随着年龄产生变化。 DNA 会变短,缺失一些片段,插入另外一些片段 我们读取这些信号,然后建立模型。

08:40

现在来个有意思点的挑战: 我们能不能预测人的面孔? 这个略有点复杂, 因为有几百万个碱基都对人脸产生影响。 而且人脸并不是一个构造十分精准的物体。 所以必须要建立一整个单独的模块, 给机器去训练和学习人脸是什么, 再把这个模块压缩整合进去。 如果你对机器学习有点概念的话, 就能够想象这个挑战是有多大。

09:04

现在15年过去了——15年前我们读取第一条序列 ——今年10月,我们总算有了些进展, 当时还是很激动人心的。 这是我们的一个测试对象,一张人的脸—— 我们要对测试对象的面孔进行简化, 因为并不是所有的特征都是面孔的一部分—— 很多特点、缺陷和不对称是生活的痕迹。 把面孔调整对称之后,跟我们运算的结果比较。 现在给你们看,我们根据血液样本生成的预测。

09:43

等一下—— 你们的眼睛正在左右两边交替看, 大脑希望两幅图是一模一样的。 我其实想请大家反过来, 找找两幅图的不同点, 其实非常多。 性别提供最多的信息, 接下来是年龄,BMI(体质指数),种族; 再考虑更多因素会变得更加复杂。 但是这样的结果,即便有很多不同, 表示我们已经接近了, 正在逐渐靠得更近——而且这已经能够鼓舞人心了

10:21

这是另外一个测试对象, 这边是预测结果。 脸小了一点,完整的颅骨结构没预测到。 但至少像那么回事。 这是又一个测试对象, 这是预测结果。 这些面孔在训练机器的时候是没有用过的, 就是所谓的随机测试组。 并且你们不认识这些人,可能说服力不太够。 我们在学术期刊上发表了这些结果, 你们可以去读一下。

10:53

但既然我们在台上,Chris 给我出了个点子, 我可以挑战一下,尝试预测一个你们都认识的人。 这里有管血液——你们很难想象 我们为了带一管血液到这里花了多少工夫... 这支试管里的血液足够完成一次全基因组测序 只需要这么多。 完成了测序,下面我们一条条来看—— 我们综合了所有已知的信息—— 从血液测试的结果,我们预测这是一名男性, 被试是男性。 预测他身高1米76, 被试身高1米77。 预测他体重76kg,被试是82kg; 我们还预测了年龄,38岁 被试实际是35岁。 预测了眼睛的颜色,有点偏深了; 预测他的皮肤颜色, 基本上准确。 这是他的面孔...

11:57

现在到了揭晓的时刻: 被试对象是这个人。

12:04

我是有意拿自己做测试的, 我属于一个特别又特殊的种族, 南欧人,意大利人——从来都不符合模型预测。 而且这一种族在模型里是一个复杂的边界情况。 但还有另一个重点—— 最常用的来辨识人的方法, 不是由基因组编译的。 是人们的自由意志——我想让自己看起来怎么样, 虽然我的发型不是我自己决定的,但胡子是的。 下面我们来看一下—— 单纯的用photoshop,不用建模—— 把胡子加上去。 是不是立即觉得变得很相像了。

12:42

那么,我们为什么要研究这些? 当然不是为了预测身高, 或者是根据血液样本得到一张美照; 我们研究是因为同样的技术和手段—— 对基因组的机器学习, 能帮助我们了解人类自身, 你的身体怎么运作,身体如何老化, 疾病是如何产生的, 癌症是怎么出现和恶化的; 药物如何起作用—— 药物是不是能够对你有效。

13:19

这是一个巨大的挑战, 而且是一个全球的科学家都面临的挑战 ——个性化医疗。 从只能借助统计学方法—— 每个人都只是沧海一粟—— 到能够实现有针对性的治疗, 通过解码这些基因信息, 我们能够彻底了解每一个人。 但这是一项异常复杂的挑战, 因为到目前为止在这么庞大的基因组信息中, 我们大概只了解2%: 175本书里的4本...

13:58

当然这不是我今天演讲的主题, 因为我们会进步,会了解更多—— 有很多顶尖的人才在从事这项工作。 预测能力会提升,模型会更准确。 随着了解的逐渐深入, 我们需要做的决定会越来越多, 而且是一些从前没有想象过的决定—— 关于生,关于死,关于子孙后代... 所以我们在此的讨论,涉及生命最本质的东西, 这些改变不只是在科学和技术层面。 我们必须要有全球性的对话, 必须要为全人类的未来设想。 我们需要和创新人才、艺术家、哲学家交流, 还需要政治家的参与。 每个人都身在其中,因为这关乎人类的未来。 不需要惊慌—— 但必须了解我们现在做出的每一项决定, 都会彻底改变历史。
