Tech
Geoffrey Hinton has a hunch about what’s next for AI
Published
4 years agoon
By
Terry Power
Deep learning set off the latest AI revolution, transforming computer vision and the field as a whole. Hinton believes deep learning should be almost all that’s needed to fully replicate human intelligence.
But despite rapid progress, there are still major challenges. Expose a neural net to an unfamiliar data set or a foreign environment, and it reveals itself to be brittle and inflexible. Self-driving cars and essay-writing language generators impress, but things can go awry. AI visual systems can be easily confused: a coffee mug recognized from the side would be an unknown from above if the system had not been trained on that view; and with the manipulation of a few pixels, a panda can be mistaken for an ostrich, or even a school bus.
GLOM addresses two of the most difficult problems for visual perception systems: understanding a whole scene in terms of objects and their natural parts; and recognizing objects when seen from a new viewpoint.(GLOM’s focus is on vision, but Hinton expects the idea could be applied to language as well.)
An object such as Hinton’s face, for instance, is made up of his lively if dog-tired eyes (too many people asking questions; too little sleep), his mouth and ears, and a prominent nose, all topped by a not-too-untidy tousle of mostly gray. And given his nose, he is easily recognized even on first sight in profile view.
Both of these factors—the part-whole relationship and the viewpoint—are, from Hinton’s perspective, crucial to how humans do vision. “If GLOM ever works,” he says, “it’s going to do perception in a way that’s much more human-like than current neural nets.”
Grouping parts into wholes, however, can be a hard problem for computers, since parts are sometimes ambiguous. A circle could be an eye, or a doughnut, or a wheel. As Hinton explains it, the first generation of AI vision systems tried to recognize objects by relying mostly on the geometry of the part-whole-relationship—the spatial orientation among the parts and between the parts and the whole. The second generation instead relied mostly on deep learning—letting the neural net train on large amounts of data. With GLOM, Hinton combines the best aspects of both approaches.
“There’s a certain intellectual humility that I like about it,” says Gary Marcus, founder and CEO of Robust.AI and a well-known critic of the heavy reliance on deep learning. Marcus admires Hinton’s willingness to challenge something that brought him fame, to admit it’s not quite working. “It’s brave,” he says. “And it’s a great corrective to say, ‘I’m trying to think outside the box.’”
The GLOM architecture
In crafting GLOM, Hinton tried to model some of the mental shortcuts—intuitive strategies, or heuristics—that people use in making sense of the world. “GLOM, and indeed much of Geoff’s work, is about looking at heuristics that people seem to have, building neural nets that could themselves have those heuristics, and then showing that the nets do better at vision as a result,” says Nick Frosst, a computer scientist at a language startup in Toronto who worked with Hinton at Google Brain.
With visual perception, one strategy is to parse parts of an object—such as different facial features—and thereby understand the whole. If you see a certain nose, you might recognize it as part of Hinton’s face; it’s a part-whole hierarchy. To build a better vision system, Hinton says, “I have a strong intuition that we need to use part-whole hierarchies.” Human brains understand this part-whole composition by creating what’s called a “parse tree”—a branching diagram demonstrating the hierarchical relationship between the whole, its parts and subparts. The face itself is at the top of the tree, and the component eyes, nose, ears, and mouth form the branches below.
One of Hinton’s main goals with GLOM is to replicate the parse tree in a neural net—this would distinguish it from neural nets that came before. For technical reasons, it’s hard to do. “It’s difficult because each individual image would be parsed by a person into a unique parse tree, so we would want a neural net to do the same,” says Frosst. “It’s hard to get something with a static architecture—a neural net—to take on a new structure—a parse tree—for each new image it sees.” Hinton has made various attempts. GLOM is a major revision of his previous attempt in 2017, combined with other related advances in the field.
“I’m part of a nose!”
GLOM vector
A generalized way of thinking about the GLOM architecture is as follows: The image of interest (say, a photograph of Hinton’s face) is divided into a grid. Each region of the grid is a “location” on the image—one location might contain the iris of an eye, while another might contain the tip of his nose. For each location in the net there are about five layers, or levels. And level by level, the system makes a prediction, with a vector representing the content or information. At a level near the bottom, the vector representing the tip-of-the-nose location might predict: “I’m part of a nose!” And at the next level up, in building a more coherent representation of what it’s seeing, the vector might predict: “I’m part of a face at side-angle view!”
But then the question is, do neighboring vectors at the same level agree? When in agreement, vectors point in the same direction, toward the same conclusion: “Yes, we both belong to the same nose.” Or further up the parse tree. “Yes, we both belong to the same face.”
Seeking consensus about the nature of an object—about what precisely the object is, ultimately—GLOM’s vectors iteratively, location-by-location and layer-upon-layer, average with neighbouring vectors beside, as well as predicted vectors from levels above and below.
However, the net doesn’t “willy-nilly average” with just anything nearby, says Hinton. It averages selectively, with neighboring predictions that display similarities. “This is kind of well-known in America, this is called an echo chamber,” he says. “What you do is you only accept opinions from people who already agree with you; and then what happens is that you get an echo chamber where a whole bunch of people have exactly the same opinion. GLOM actually uses that in a constructive way.” The analogous phenomenon in Hinton’s system is those “islands of agreement.”
“Imagine a bunch of people in a room, shouting slight variations of the same idea,” says Frosst—or imagine those people as vectors pointing in slight variations of the same direction. “They would, after a while, converge on the one idea, and they would all feel it stronger, because they had it confirmed by the other people around them.” That’s how GLOM’s vectors reinforce and amplify their collective predictions about an image.
GLOM uses these islands of agreeing vectors to accomplish the trick of representing a parse tree in a neural net. Whereas some recent neural nets use agreement among vectors for activation, GLOM uses agreement for representation—building up representations of things within the net. For instance, when several vectors agree that they all represent part of the nose, their small cluster of agreement collectively represents the nose in the net’s parse tree for the face. Another smallish cluster of agreeing vectors might represent the mouth in the parse tree; and the big cluster at the top of the tree would represent the emergent conclusion that the image as a whole is Hinton’s face. “The way the parse tree is represented here,” Hinton explains, “is that at the object level you have a big island; the parts of the object are smaller islands; the subparts are even smaller islands, and so on.”
According to Hinton’s long-time friend and collaborator Yoshua Bengio, a computer scientist at the University of Montreal, if GLOM manages to solve the engineering challenge of representing a parse tree in a neural net, it would be a feat—it would be important for making neural nets work properly. “Geoff has produced amazingly powerful intuitions many times in his career, many of which have proven right,” Bengio says. “Hence, I pay attention to them, especially when he feels as strongly about them as he does about GLOM.”
The strength of Hinton’s conviction is rooted not only in the echo chamber analogy, but also in mathematical and biological analogies that inspired and justified some of the design decisions in GLOM’s novel engineering.
“Geoff is a highly unusual thinker in that he is able to draw upon complex mathematical concepts and integrate them with biological constraints to develop theories,” says Sue Becker, a former student of Hinton’s, now a computational cognitive neuroscientist at McMaster University. “Researchers who are more narrowly focused on either the mathematical theory or the neurobiology are much less likely to solve the infinitely compelling puzzle of how both machines and humans might learn and think.”
Turning philosophy into engineering
So far, Hinton’s new idea has been well received, especially in some of the world’s greatest echo chambers. “On Twitter, I got a lot of likes,” he says. And a YouTube tutorial laid claim to the term “MeGLOMania.”
Hinton is the first to admit that at present GLOM is little more than philosophical musing (he spent a year as a philosophy undergrad before switching to experimental psychology). “If an idea sounds good in philosophy, it is good,” he says. “How would you ever have a philosophical idea that just sounds like rubbish, but actually turns out to be true? That wouldn’t pass as a philosophical idea.” Science, by comparison, is “full of things that sound like complete rubbish” but turn out to work remarkably well—for example, neural nets, he says.
GLOM is designed to sound philosophically plausible. But will it work?
You may like
-
What’s changed in the US since the breakthrough climate bill passed a year ago?
-
What’s next for China’s digital currency?
-
The Download: what’s next for the moon, and facial recognition’s stalemate
-
What’s next for the moon
-
Introducing MIT Technology Review Roundtables, real-time conversations about what’s next in tech
-
What “rewilding” means—and what’s missing from this new movement
My senior spring in high school, I decided to defer my MIT enrollment by a year. I had always planned to take a gap year, but after receiving the silver tube in the mail and seeing all my college-bound friends plan out their classes and dorm decor, I got cold feet. Every time I mentioned my plans, I was met with questions like “But what about school?” and “MIT is cool with this?”
Yeah. MIT totally is. Postponing your MIT start date is as simple as clicking a checkbox.
Now, having finished my first year of classes, I’m really grateful that I stuck with my decision to delay MIT, as I realized that having a full year of unstructured time is a gift. I could let my creative juices run. Pick up hobbies for fun. Do cool things like work at an AI startup and teach myself how to create latte art. My favorite part of the year, however, was backpacking across Europe. I traveled through Austria, Slovakia, Russia, Spain, France, the UK, Greece, Italy, Germany, Poland, Romania, and Hungary.
Moreover, despite my fear that I’d be losing a valuable year, traveling turned out to be the most productive thing I could have done with my time. I got to explore different cultures, meet new people from all over the world, and gain unique perspectives that I couldn’t have gotten otherwise. My travels throughout Europe allowed me to leave my comfort zone and expand my understanding of the greater human experience.
“In Iceland there’s less focus on hustle culture, and this relaxed approach to work-life balance ends up fostering creativity. This was a wild revelation to a bunch of MIT students.”
When I became a full-time student last fall, I realized that StartLabs, the premier undergraduate entrepreneurship club on campus, gives MIT undergrads a similar opportunity to expand their horizons and experience new things. I immediately signed up. At StartLabs, we host fireside chats and ideathons throughout the year. But our flagship event is our annual TechTrek over spring break. In previous years, StartLabs has gone on TechTrek trips to Germany, Switzerland, and Israel. On these fully funded trips, StartLabs members have visited and collaborated with industry leaders, incubators, startups, and academic institutions. They take these treks both to connect with the global startup sphere and to build closer relationships within the club itself.
Most important, however, the process of organizing the TechTrek is itself an expedited introduction to entrepreneurship. The trip is entirely planned by StartLabs members; we figure out travel logistics, find sponsors, and then discover ways to optimize our funding.
In organizing this year’s trip to Iceland, we had to learn how to delegate roles to all the planners and how to maintain morale when making this trip a reality seemed to be an impossible task. We woke up extra early to take 6 a.m. calls with Icelandic founders and sponsors. We came up with options for different levels of sponsorship, used pattern recognition to deduce the email addresses of hundreds of potential contacts at organizations we wanted to visit, and all got scrappy with utilizing our LinkedIn connections.
And as any good entrepreneur must, we had to learn how to be lean and maximize our resources. To stretch our food budget, we planned all our incubator and company visits around lunchtime in hopes of getting fed, played human Tetris as we fit 16 people into a six-person Airbnb, and emailed grocery stores to get their nearly expired foods for a discount. We even made a deal with the local bus company to give us free tickets in exchange for a story post on our Instagram account.
Tech
The Download: spying keyboard software, and why boring AI is best
Published
1 year agoon
22 August 2023By
Terry Power
This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology.
How ubiquitous keyboard software puts hundreds of millions of Chinese users at risk
For millions of Chinese people, the first software they download onto devices is always the same: a keyboard app. Yet few of them are aware that it may make everything they type vulnerable to spying eyes.
QWERTY keyboards are inefficient as many Chinese characters share the same latinized spelling. As a result, many switch to smart, localized keyboard apps to save time and frustration. Today, over 800 million Chinese people use third-party keyboard apps on their PCs, laptops, and mobile phones.
But a recent report by the Citizen Lab, a University of Toronto–affiliated research group, revealed that Sogou, one of the most popular Chinese keyboard apps, had a massive security loophole. Read the full story.
—Zeyi Yang
Why we should all be rooting for boring AI
Earlier this month, the US Department of Defense announced it is setting up a Generative AI Task Force, aimed at “analyzing and integrating” AI tools such as large language models across the department. It hopes they could improve intelligence and operational planning.
But those might not be the right use cases, writes our senior AI reporter Melissa Heikkila. Generative AI tools, such as language models, are glitchy and unpredictable, and they make things up. They also have massive security vulnerabilities, privacy problems, and deeply ingrained biases.
Applying these technologies in high-stakes settings could lead to deadly accidents where it’s unclear who or what should be held responsible, or even why the problem occurred. The DoD’s best bet is to apply generative AI to more mundane things like Excel, email, or word processing. Read the full story.
This story is from The Algorithm, Melissa’s weekly newsletter giving you the inside track on all things AI. Sign up to receive it in your inbox every Monday.
The ice cores that will let us look 1.5 million years into the past
To better understand the role atmospheric carbon dioxide plays in Earth’s climate cycles, scientists have long turned to ice cores drilled in Antarctica, where snow layers accumulate and compact over hundreds of thousands of years, trapping samples of ancient air in a lattice of bubbles that serve as tiny time capsules.
By analyzing those cores, scientists can connect greenhouse-gas concentrations with temperatures going back 800,000 years. Now, a new European-led initiative hopes to eventually retrieve the oldest core yet, dating back 1.5 million years. But that impressive feat is still only the first step. Once they’ve done that, they’ll have to figure out how they’re going to extract the air from the ice. Read the full story.
—Christian Elliott
This story is from the latest edition of our print magazine, set to go live tomorrow. Subscribe today for as low as $8/month to ensure you receive full access to the new Ethics issue and in-depth stories on experimental drugs, AI assisted warfare, microfinance, and more.
The must-reads
I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.
1 How AI got dragged into the culture wars
Fears about ‘woke’ AI fundamentally misunderstand how it works. Yet they’re gaining traction. (The Guardian)
+ Why it’s impossible to build an unbiased AI language model. (MIT Technology Review)
2 Researchers are racing to understand a new coronavirus variant
It’s unlikely to be cause for concern, but it shows this virus still has plenty of tricks up its sleeve. (Nature)
+ Covid hasn’t entirely gone away—here’s where we stand. (MIT Technology Review)
+ Why we can’t afford to stop monitoring it. (Ars Technica)
3 How Hilary became such a monster storm
Much of it is down to unusually hot sea surface temperatures. (Wired $)
+ The era of simultaneous climate disasters is here to stay. (Axios)
+ People are donning cooling vests so they can work through the heat. (Wired $)
4 Brain privacy is set to become important
Scientists are getting better at decoding our brain data. It’s surely only a matter of time before others want a peek. (The Atlantic $)
+ How your brain data could be used against you. (MIT Technology Review)
5 How Nvidia built such a big competitive advantage in AI chips
Today it accounts for 70% of all AI chip sales—and an even greater share for training generative models. (NYT $)
+ The chips it’s selling to China are less effective due to US export controls. (Ars Technica)
+ These simple design rules could turn the chip industry on its head. (MIT Technology Review)
6 Inside the complex world of dissociative identity disorder on TikTok
Reducing stigma is great, but doctors fear people are self-diagnosing or even imitating the disorder. (The Verge)
7 What TikTok might have to give up to keep operating in the US
This shows just how hollow the authorities’ purported data-collection concerns really are. (Forbes)
8 Soldiers in Ukraine are playing World of Tanks on their phones
It’s eerily similar to the war they are themselves fighting, but they say it helps them to dissociate from the horror. (NYT $)
9 Conspiracy theorists are sharing mad ideas on what causes wildfires
But it’s all just a convoluted way to try to avoid having to tackle climate change. (Slate $)
10 Christie’s accidentally leaked the location of tons of valuable art
Seemingly thanks to the metadata that often automatically attaches to smartphone photos. (WP $)
Quote of the day
“Is it going to take people dying for something to move forward?”
—An anonymous air traffic controller warns that staffing shortages in their industry, plus other factors, are starting to threaten passenger safety, the New York Times reports.
The big story
Inside effective altruism, where the far future counts a lot more than the present
October 2022
Since its birth in the late 2000s, effective altruism has aimed to answer the question “How can those with means have the most impact on the world in a quantifiable way?”—and supplied methods for calculating the answer.
It’s no surprise that effective altruisms’ ideas have long faced criticism for reflecting white Western saviorism, alongside an avoidance of structural problems in favor of abstract math. And as believers pour even greater amounts of money into the movement’s increasingly sci-fi ideals, such charges are only intensifying. Read the full story.
—Rebecca Ackermann
We can still have nice things
A place for comfort, fun and distraction in these weird times. (Got any ideas? Drop me a line or tweet ’em at me.)
+ Watch Andrew Scott’s electrifying reading of the 1965 commencement address ‘Choose One of Five’ by Edith Sampson.
+ Here’s how Metallica makes sure its live performances ROCK. ($)
+ Cannot deal with this utterly ludicrous wooden vehicle.
+ Learn about a weird and wonderful new instrument called a harpejji.
Tech
Why we should all be rooting for boring AI
Published
1 year agoon
22 August 2023By
Terry Power
This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.
I’m back from a wholesome week off picking blueberries in a forest. So this story we published last week about the messy ethics of AI in warfare is just the antidote, bringing my blood pressure right back up again.
Arthur Holland Michel does a great job looking at the complicated and nuanced ethical questions around warfare and the military’s increasing use of artificial-intelligence tools. There are myriad ways AI could fail catastrophically or be abused in conflict situations, and there don’t seem to be any real rules constraining it yet. Holland Michel’s story illustrates how little there is to hold people accountable when things go wrong.
Last year I wrote about how the war in Ukraine kick-started a new boom in business for defense AI startups. The latest hype cycle has only added to that, as companies—and now the military too—race to embed generative AI in products and services.
Earlier this month, the US Department of Defense announced it is setting up a Generative AI Task Force, aimed at “analyzing and integrating” AI tools such as large language models across the department.
The department sees tons of potential to “improve intelligence, operational planning, and administrative and business processes.”
But Holland Michel’s story highlights why the first two use cases might be a bad idea. Generative AI tools, such as language models, are glitchy and unpredictable, and they make things up. They also have massive security vulnerabilities, privacy problems, and deeply ingrained biases.
Applying these technologies in high-stakes settings could lead to deadly accidents where it’s unclear who or what should be held responsible, or even why the problem occurred. Everyone agrees that humans should make the final call, but that is made harder by technology that acts unpredictably, especially in fast-moving conflict situations.
Some worry that the people lowest on the hierarchy will pay the highest price when things go wrong: “In the event of an accident—regardless of whether the human was wrong, the computer was wrong, or they were wrong together—the person who made the ‘decision’ will absorb the blame and protect everyone else along the chain of command from the full impact of accountability,” Holland Michel writes.
The only ones who seem likely to face no consequences when AI fails in war are the companies supplying the technology.
It helps companies when the rules the US has set to govern AI in warfare are mere recommendations, not laws. That makes it really hard to hold anyone accountable. Even the AI Act, the EU’s sweeping upcoming regulation for high-risk AI systems, exempts military uses, which arguably are the highest-risk applications of them all.
While everyone is looking for exciting new uses for generative AI, I personally can’t wait for it to become boring.
Amid early signs that people are starting to lose interest in the technology, companies might find that these sorts of tools are better suited for mundane, low-risk applications than solving humanity’s biggest problems.
Applying AI in, for example, productivity software such as Excel, email, or word processing might not be the sexiest idea, but compared to warfare it’s a relatively low-stakes application, and simple enough to have the potential to actually work as advertised. It could help us do the tedious bits of our jobs faster and better.
Boring AI is unlikely to break as easily and, most important, won’t kill anyone. Hopefully, soon we’ll forget we’re interacting with AI at all. (It wasn’t that long ago when machine translation was an exciting new thing in AI. Now most people don’t even think about its role in powering Google Translate.)
That’s why I’m more confident that organizations like the DoD will find success applying generative AI in administrative and business processes.
Boring AI is not morally complex. It’s not magic. But it works.
Deeper Learning
AI isn’t great at decoding human emotions. So why are regulators targeting the tech?
Amid all the chatter about ChatGPT, artificial general intelligence, and the prospect of robots taking people’s jobs, regulators in the EU and the US have been ramping up warnings against AI and emotion recognition. Emotion recognition is the attempt to identify a person’s feelings or state of mind using AI analysis of video, facial images, or audio recordings.
But why is this a top concern? Western regulators are particularly concerned about China’s use of the technology, and its potential to enable social control. And there’s also evidence that it simply does not work properly. Tate Ryan-Mosley dissected the thorny questions around the technology in last week’s edition of The Technocrat, our weekly newsletter on tech policy.
Bits and Bytes
Meta is preparing to launch free code-generating software
A version of its new LLaMA 2 language model that is able to generate programming code will pose a stiff challenge to similar proprietary code-generating programs from rivals such as OpenAI, Microsoft, and Google. The open-source program is called Code Llama, and its launch is imminent, according to The Information. (The Information)
OpenAI is testing GPT-4 for content moderation
Using the language model to moderate online content could really help alleviate the mental toll content moderation takes on humans. OpenAI says it’s seen some promising first results, although the tech does not outperform highly trained humans. A lot of big, open questions remain, such as whether the tool can be attuned to different cultures and pick up context and nuance. (OpenAI)
Google is working on an AI assistant that offers life advice
The generative AI tools could function as a life coach, offering up ideas, planning instructions, and tutoring tips. (The New York Times)
Two tech luminaries have quit their jobs to build AI systems inspired by bees
Sakana, a new AI research lab, draws inspiration from the animal kingdom. Founded by two prominent industry researchers and former Googlers, the company plans to make multiple smaller AI models that work together, the idea being that a “swarm” of programs could be as powerful as a single large AI model. (Bloomberg)
You must be logged in to post a comment Login