
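Talk title: How AI models steal creative work — and what to do about it
Talk summary:
Generative AI relies on three resources: people, compute and data. While companies invest heavily in the first two, they often use unlicensed creative work as training data without permission or payment. AI expert Ed Newton-Rex proposes a plan to ensure that AI companies and creators can thrive together.
English and Chinese subtitles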
The technology and vision behind generative AI is amazing, but stealing the work of the world's creators to build it is not.
生成式人工智能背后的技术和愿景都令人惊叹,但靠窃取全世界创作者的成果来构建它,却并不光彩。
There are three key things that AI companies need to build their models, three key resources -- people, compute and data.
人工智能公司建立模型需要三个关键要素,三种关键资源——人、计算和数据。
That is, engineers to build the models,
也就是说,工程师来构建模型,
GPUs to run the training process and data to train the models on.
GPU 来运行训练过程,用数据来训练模型。
AI companies spend vast sums on the first two, sometimes a million dollars per engineer and up to a billion dollars per model.
人工智能公司在前两者上投入巨资,有时每位工程师的成本高达百万美元,每个模型成本高达十亿美元。
But they expect to take the third resource, training data, for free.
却希望免费获得第三种资源,即训练数据。
Right now, many AI companies train on creative work they haven't paid for or even asked permission to use.
目前,许多人工智能公司在未经许可、也未支付报酬的情况下,利用创作者的作品来训练模型。
This is unfair and unsustainable.
这是不公平且不可持续的。
But if we reset, and license our training data, we can build a better generative AI ecosystem that works for everyone,
但是,如果我们重新来过,为训练数据获取授权,我们就能建立一个更好的、惠及所有人的生成式人工智能生态系统,
both the AI companies themselves and the creators, without whose work these models would not exist.
既包括人工智能公司本身,也包括创作者,没有创作者的作品,这些模型根本不会存在。
Most AI companies today do not license the majority of their training data.
当今大多数人工智能公司并未为其大部分训练数据获取授权。
They use web scrapers to find, download and train on as much content as they can gather.
他们使用网络爬虫尽可能多地查找、下载内容,并用这些内容训练模型。
They're often pretty secretive about what they do train on, but what's clear is that training on copyrighted work without a license is rife.
他们通常对训练所用的内容讳莫如深,但很明显,未经许可使用受版权保护的作品进行训练的现象十分普遍。
And the people who make the art that society consumes feel the same way.
而那些创作供社会消费的艺术品的人也有同样的态度。
Today, we launched a "Statement on AI Training," a short, simple open letter, which simply reads:
今天,我们发布了《人工智能训练声明》,这是一封简短的公开信,内容很简单:
"The unlicensed use of creative works for training generative AI is a major, unjust threat to the livelihoods of the people behind those works, and must not be permitted."
“未经许可使用创意作品训练生成式人工智能,是对这些作品背后创作者生计的重大且不公正的威胁,绝不能被允许。”
This has already been signed by 11,000 creators around the world and counting, including Nobel Prize-winning authors,
全球已有 11,000 多名创作者签署了这份声明,且人数还在不断增加,其中包括诺贝尔奖获奖作家、
Academy Award-winning actors and Oscar-winning composers.
奥斯卡获奖演员和奥斯卡获奖作曲家。
And if you agree with this sentiment, I encourage you to sign it today at aitrainingstatement.org.
如果你认同这一观点,我鼓励你今天就到 aitrainingstatement.org 签名。
What this statement and previous ones like it make abundantly clear is that these artists, these creators,
这份声明和之前的类似声明非常清楚地表明,这些艺术家,这些创作者,
view the unlicensed training on their work by generative AI models
认为生成式人工智能模型用他们的作品进行未经许可的训练
as totally unjust and potentially catastrophic to their professions.
是完全不公正的,而且可能对他们的职业造成灾难性影响。
So if you are an advocate for unlicensed AI training,
因此,如果你支持未经授权的人工智能训练,
just remember that the people who wrote the music that you are listening to and the books you're reading probably disagree.
请记住,你聆听的音乐的创作者们,你阅读的书籍的创作者们可能不同意。
So where does this leave us?
那么,这会让我们走向何方呢?
Well, right now, many of the world's artists, writers, musicians, creators straight-up hate generative AI.
好吧,现在,世界上许多艺术家、作家、音乐家、创作者都对生成式人工智能深恶痛绝。
And we know, from their own words, that one of the reasons for this is that we're training on their work without asking them.
而且我们知道,用他们的话来说,造成这种情况的原因之一,就是在未经过授权的情况下用他们的劳动成果进行训练。
But it doesn't have to be this way.
但本不必如此。
The AI industry and the creative industries
人工智能行业和创意产业
can be and should be mutually beneficial.
可以而且应该是互惠互利的。
But for this mutually beneficial relationship to emerge,
但是,要想建立这种互惠互利的关系,
we have to start from a position of respect for the value of the works being trained on and the rights of the people who made them.
我们必须从尊重被用于训练的作品的价值、尊重其创作者权利的立场出发。
I'm not arguing that all AI development should be halted.
我并不是说所有的人工智能开发都应该停止。
I'm not arguing that AI should not exist.
我并不是在争辩人工智能不应该存在。
What I'm arguing is that the resources used to build generative AI should be paid for.
我的论点是,构建生成式人工智能所用的资源应当付费获取。
Licensing is hard work.
授权许可是一项繁重的工作。
It will slow you down in the short term, but you'll ultimately reach exactly the same point -- models that are just as capable, just as powerful --
它会在短期内减慢你的速度,但你最终会达到完全相同的地步——同样强大、同样高效的模型 ——
and you'll do so without forcing the world's publishers to batten down the hatches and destroy the commons,
这样,你既能避免让全球的出版商严防死守、摧毁公共资源,
and without pitting the world's creators against you.
也不会把世界上的创作者变成你的对立面。
So I hope that more AI companies will follow the example set by those we've certified at Fairly Trained, and license all their training data.
因此,我希望更多的人工智能公司能效仿那些已通过 Fairly Trained 认证的企业,为其全部训练数据获取正式授权。
I hope that employees at these companies will demand this of their employers.
我希望这些公司的员工能要求雇主这样做。
And I hope that everyone who uses generative AI will ask what their favorite models were trained on.
我希望每个使用生成式人工智能的人都会去问他们最喜欢的模型是用什么数据训练的。
There is a future in which generative AI and human creativity can coexist, not just peacefully, but symbiotically.
在未来,生成式人工智能与人类创造力不仅可以和平共存,还能实现共生。
It's been a rough start, but it's not too late to change course.
虽然起步艰难,但现在调整方向还为时不晚。
Thank you.
谢谢大家。