Voice technology 语音技术

Now we’re talking 轻而易“语”

Voice technology is making computers less daunting and more accessible

有了语音技术,电脑不再令人敬而远之,反而更加平易近人

ANY sufficiently advanced technology, noted Arthur C. Clarke, a British science-fiction writer, is indistinguishable from magic. The fast-emerging technology of voice computing proves his point. Using it is just like casting a spell: say a few words into the air, and a nearby device can grant your wish.

英国科幻小说作家亚瑟·克拉克(Arthur C. Clarke)曾经指出,任何科技只要先进到足够的程度,就和魔法没有区别。迅速兴起的语音电脑证明了他的观点。它用起来就像是变魔法:对着空气说句话,附近的智能设备就会帮你如愿以偿。

The Amazon Echo, a voice-driven cylindrical computer that sits on a table top and answers to the name Alexa, can call up music tracks and radio stations, tell jokes, answer trivia questions and control smart appliances; even before Christmas it was already resident in about 4% of American households. Voice assistants are proliferating in smartphones, too: Apple’s Siri handles over 2bn commands a week, and 20% of Google searches on Android-powered handsets in America are input by voice. Dictating e-mails and text messages now works reliably enough to be useful. Why type when you can talk?

亚马逊智能音箱(Amazon Echo)是一种声控筒状台式电脑,听到“阿丽夏”(Alexa)这个名字,它就会做出反应,挑选歌曲,选择电台,讲笑话,回答各种琐碎问题,还能控制智能设备;甚至早在圣诞节到来之前,它就已经入住了4%的美国家庭。语音智能助手还扩张到了智能手机行业:苹果(Apple)的语音助手西里(Siri)每周要处理超过20亿个指令;在美国,安卓手机的谷歌搜索指令有20%都是通过语音发布的。电邮和短信的语音输入技术已经发展得足够稳定,十分好用了。说话就能解决,何必还去打字?

This is a huge shift. Simple though it may seem, voice has the power to transform computing, by providing a natural means of interaction. Windows, icons and menus, and then touchscreens, were welcomed as more intuitive ways to deal with computers than entering complex keyboard commands. But being able to talk to computers abolishes the need for the abstraction of a “user interface” at all. Just as mobile phones were more than existing phones without wires, and cars were more than carriages without horses, so computers without screens and keyboards have the potential to be more useful, powerful and ubiquitous than people can imagine today.

这种转变非同小可。尽管语音技术看起来如此简单,但通过提供自然的交流方式,它具备着改变电脑的力量。窗口、图标和菜单,以及后来的触屏技术,都曾因为比输入复杂的键盘指令更加直观而受到欢迎。但是,一旦能够与电脑交谈,就完全不需要“用户界面”这种抽象了。就像手机不光是没有线的电话,汽车不光是没有马的马车,没有屏幕和键盘的电脑也有潜力变得更好用、更强大、更无所不在,超乎今人的想象。

Voice will not wholly replace other forms of input and output. Sometimes it will remain more convenient to converse with a machine by typing rather than talking (Amazon is said to be working on an Echo device with a built-in screen). But voice is destined to account for a growing share of people’s interactions with the technology around them, from washing machines that tell you how much of the cycle they have left to virtual assistants in corporate call-centres. However, to reach its full potential, the technology requires further breakthroughs—and a resolution of the tricky questions it raises around the trade-off between convenience and privacy.

语音技术并不会完全替代其他形式的输入与输出。有时候,要和机器聊天,打字仍旧比语音更容易(据说,亚马逊打算研发一种带有内置屏幕的Echo设备)。不过,从告诉你剩余洗衣时间还有多长的洗衣机,到企业呼叫中心的虚拟助手,作为人们与周边科技互动的方式,语音注定会越来越受青睐。然而,要充分发挥其潜力,语音技术还需要进一步突破,并解决好由此产生的棘手问题,拿捏好便利性与隐私权之间的平衡。

Alexa, what is deep learning? 

阿丽夏,深度学习是啥?

Computer-dictation systems have been around for years. But they were unreliable and required lengthy training to learn a specific user’s voice. Computers’ new ability to recognise almost anyone’s speech dependably without training is the latest manifestation of the power of “deep learning”, an artificial-intelligence technique in which a software system is trained using millions of examples, usually culled from the internet. Thanks to deep learning, machines now nearly equal humans in transcription accuracy, computerised translation systems are improving rapidly and text-to-speech systems are becoming less robotic and more natural-sounding. Computers are, in short, getting much better at handling natural language in all its forms (see Technology Quarterly).

电脑听写系统已经伴随我们好些年了。但是,它们性能不稳定,需要经过长时间训练,才能识别特定用户的语音。无需训练就能可靠地识别几乎任何人的讲话,这是电脑的新功能,也是“深度学习”能力的最新印证。“深度学习”是一种人工智能技术,该技术利用数百万个样例来训练软件系统,这些样例往往是从网络上采集来的。现在,有了深度学习技术,机器在转录准确性上已经几乎与人无异,电脑翻译系统正在飞速发展,文本转语音系统的机器人腔越来越少,更加接近自然人声。简言之,电脑对各种形式的自然语言的处理能力都今非昔比了。

Although deep learning means that machines can recognise speech more reliably and talk in a less stilted manner, they still don’t understand the meaning of language. That is the most difficult aspect of the problem and, if voice-driven computing is truly to flourish, one that must be overcome. Computers must be able to understand context in order to maintain a coherent conversation about something, rather than just responding to simple, one-off voice commands, as they mostly do today (“Hey, Siri, set a timer for ten minutes”). Researchers in universities and at companies large and small are working on this very problem, building “bots” that can hold more elaborate conversations about more complex tasks, from retrieving information to advising on mortgages to making travel arrangements. (Amazon is offering a $1m prize for a bot that can converse “coherently and engagingly” for 20 minutes.)

尽管深度学习意味着机器能更加可靠地识别人声,发音也不再生硬,但是机器依然无法理解语言的意思。这是语音技术中最困难的一点,声控电脑要想真正蓬勃发展,就必须克服这个问题。电脑必须能够理解语境,才能就某个话题展开连贯对话,而不是像今天常见的那样,只对简单的、一次性的语音指令做出回应(“嘿,西里,帮我设个10分钟的计时器”)。在高校和大大小小的公司中,研究员们正在研究这个问题,设计能够针对更复杂的任务进行更细致对话的机器人,从检索信息,到提供房产按揭建议,再到安排行程等。(亚马逊设置了100万美元奖金,奖励能够“连贯而有趣地”聊天20分钟的机器人。)

When spells replace spelling

动口代替动手

Consumers and regulators also have a role to play in determining how voice computing develops. Even in its current, relatively primitive form, the technology poses a dilemma: voice-driven systems are most useful when they are personalised, and are granted wide access to sources of data such as calendars, e-mails and other sensitive information. That raises privacy and security concerns.

在语音电脑的发展问题上,消费者和监管者也发挥着一定的决定作用。尽管目前来说,语音技术尚处于相对原始的发展阶段,但它已然让人们陷入两难:声控系统的个性化程度越高,允许接触的私人日程、电邮和其他敏感信息越丰富,则发挥的用处也越大。这引发了人们对隐私和安全问题的担忧。

To further complicate matters, many voice-driven devices are always listening, waiting to be activated. Some people are already concerned about the implications of internet-connected microphones listening in every room and from every smartphone. Not all audio is sent to the cloud—devices wait for a trigger phrase (“Alexa”, “OK, Google”, “Hey, Cortana”, or “Hey, Siri”) before they start relaying the user’s voice to the servers that actually handle the requests—but when it comes to storing audio, it is unclear who keeps what and when.

更复杂的是,许多声控设备一直都处于监听状态,等待被激活。联网的麦克风在每个房间、每一部智能手机上监听,已经有人在担心这一切意味着什么了。并非所有的音频都会发送到云端:设备要等到“一声令下”(“阿丽夏”,“好,谷歌”,“嘿,微软小娜”,或“嘿,西里”)之后,才会开始将用户语音传送给实际处理请求的服务器;但是,说到音频的储存,谁在什么时候保留了什么,目前尚不清楚。

Police investigating a murder in Arkansas, which may have been overheard by an Amazon Echo, have asked the company for access to any audio that might have been captured. Amazon has refused to co-operate, arguing (with the backing of privacy advocates) that the legal status of such requests is unclear. The situation is analogous to Apple’s refusal in 2016 to help FBI investigators unlock a terrorist’s iPhone; both cases highlight the need for rules that specify when and what intrusions into personal privacy are justified in the interests of security.

警方在阿肯色州调查一桩谋杀案,亚马逊智能音箱有可能听到了凶杀过程,于是警方要求亚马逊提供该设备可能获取的任何音频资料。亚马逊公司却拒绝合作,理由(受到了隐私拥护者们的支持)是这种要求的法律地位尚不明确。无独有偶,2016年,苹果公司也拒绝协助FBI调查员解锁一名恐怖分子的苹果手机;这两个案例都突出了明确法规的必要性:出于安全利益考虑,何时、何种对个人隐私的侵扰属于正当,应该有明确规定。

Consumers will adopt voice computing even if such issues remain unresolved. In many situations voice is far more convenient and natural than any other means of communication. Uniquely, it can also be used while doing something else (driving, working out or walking down the street). It can extend the power of computing to people unable, for one reason or another, to use screens and keyboards. And it could have a dramatic impact not just on computing, but on the use of language itself. Computerised simultaneous translation could render the need to speak a foreign language irrelevant for many people; and in a world where machines can talk, minor languages may be more likely to survive. The arrival of the touchscreen was the last big shift in the way humans interact with computers. The leap to speech matters more.

即使这些问题尚未解决,消费者们仍旧会接受语音电脑。在许多情况下,语音比其他交流方式要方便、自然得多。独特的是,使用语音时还可以同时做其他事情(开车,健身或在街上走路)。语音可以让由于种种原因不能使用屏幕和键盘的人们也用上电脑的力量。它不仅可能给电脑带来巨大影响,还可能影响语言使用本身。电脑同声传译可能让许多人不必再会说外语;在一个机器可以讲话的世界中,小语种生存下去的可能性或许更高。触屏的到来是人类与电脑互动模式的上一次重大转变。语音的飞跃意义更为重大。

原文出处:经济学人网站

译者:linda10030

本译文仅供个人研习、欣赏语言之用,谢绝任何转载及用于任何商业用途。本译文所涉法律后果均由本人承担。本人同意简书平台在接获有关著作权人的通知后,删除文章。

——
Jan 7th 2017 | 1045 words

经济学人双语_ 史桂盛