Digital journalism has evolved to a point of paradox: we now have access to such an overwhelming amount of news that it’s actually become more difficult to understand current events.
IDEO New York developer Francis Tseng is—in his spare time—searching for a solution to the problem by exploring its root: the relationship between content and code. Tseng received a grant from the Knight Foundation to develop Argos*, an online news aggregation app that intelligently collects, summarizes and provides contextual information for news stories. Having recently finished version 0.1.0, which he calls the first “complete-ish” release of Argos, Tseng spoke with veteran journalist and documentary filmmaker Jason Cohn about the role technology can play in our consumption—and comprehension—of the news.
I’d say that it collects all of the important news and tells you exactly what you need to know about it. Instead of tracking separate publications, it pulls in new articles from a variety of different sources (right now, just a few major publications) and condenses everything into a set of bullet points. Argos uses a technique called “hierarchical agglomerative clustering” (HAC) to organize news stories. HAC takes each individual article, compares it to every other article, and comes up with a number representing how similar they are, on a scale of 0 to 1. If the similarity is above some threshold (say, greater than 0.7), the articles are considered to be talking about the same event, and they are grouped together into “event clusters.” The same HAC technique is then applied to the event clusters to organize events into “story clusters.”
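To make the clustering step concrete, here is a minimal Python sketch of threshold-based agglomerative clustering. It is not Argos's actual code. The bag-of-words vectors, the cosine similarity measure, the average-linkage rule, and every function name are assumptions chosen for illustration. Only the 0 to 1 similarity scale and the example threshold of 0.7 come from the description above.

```python
import math
import re
from collections import Counter
from itertools import combinations


def vectorize(text):
    """Bag-of-words count vector (an assumed representation, not Argos's)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity of two count vectors; ranges from 0 to 1."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_similarity(c1, c2, vectors):
    """Average linkage: mean pairwise similarity between two clusters."""
    total = sum(cosine(vectors[i], vectors[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def hac(texts, threshold=0.7):
    """Repeatedly merge the two most similar clusters while their
    similarity stays above the threshold. Returns lists of text indices."""
    vectors = [vectorize(t) for t in texts]
    clusters = [[i] for i in range(len(texts))]  # start: one cluster per article
    while len(clusters) > 1:
        best_sim, best_pair = -1.0, None
        for a, b in combinations(range(len(clusters)), 2):
            sim = cluster_similarity(clusters[a], clusters[b], vectors)
            if sim > best_sim:
                best_sim, best_pair = sim, (a, b)
        if best_sim <= threshold:
            break  # nothing left that is similar enough to merge
        a, b = best_pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


articles = [
    "Senate passes budget bill after long debate",
    "Budget bill passes Senate after long debate",
    "Local team wins championship game in overtime",
]
print(hac(articles))  # the first two headlines end up in one event cluster
```

Grouping events into "story clusters" would mean running the same procedure a second time, with each event cluster treated as a single item. That second pass is not shown here.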
A sentence ranking algorithm generates a summary of each event cluster by analyzing the text of every article in the cluster and assigning an “informativeness” value to each sentence. It then takes the five highest-ranked sentences and forms a summary. Right now I have a very simple version implemented, but the long-run focus will be on improving these core algorithms that do the heavy lifting.
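The summarization step can be sketched in the same spirit. The interview does not say how Argos computes "informativeness," so the scoring rule below (the average cluster-wide frequency of a sentence's non-stopword terms) is a stand-in assumption, as are the small stopword list, the regex-based sentence splitter, and all of the function names. Only the idea of keeping the five highest-ranked sentences comes from the text.

```python
import re
from collections import Counter

# A tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "by", "in", "is", "of", "the", "to", "was"}


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercased word tokens with stopwords removed."""
    words = re.findall(r"[a-z0-9']+", sentence.lower())
    return [w for w in words if w not in STOPWORDS]


def summarize(articles, top_n=5):
    """Rank every sentence in a cluster of articles and return the top_n."""
    sentences = []
    for article in articles:
        for sentence in split_sentences(article):
            if sentence not in sentences:  # skip verbatim duplicates
                sentences.append(sentence)

    # Term frequency across the whole cluster: words that many sentences
    # share are treated as central to the event (an assumed heuristic).
    freq = Counter(w for s in sentences for w in tokenize(s))

    def informativeness(sentence):
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(sentences, key=informativeness, reverse=True)
    return ranked[:top_n]


cluster = [
    "The council approved the budget. Rain is expected tomorrow.",
    "The budget was approved by the council.",
]
for line in summarize(cluster, top_n=2):
    print("-", line)
```

Averaging by sentence length keeps long sentences from winning just because they contain more words. Other scoring rules would fit the same structure.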
I personally don’t think that journalists will be replaced, at least not for a long time. Someone still needs to go out and do the research and the original reporting to get the information that the algorithm uses, which seems like the hardest and most valuable part of journalism.
Both. I’ve always had a fascination with artificial intelligence, and that fascination has probably made me more sensitive to problem areas where AI approaches could be applicable. Personally, I find information overload, especially when it comes to keeping up with the news, really anxiety-inducing and overwhelming. Because I was already familiar with some of these text processing techniques, it made sense to try applying them to this particular problem.
That’s a particularly good example of the fact that there are different kinds of information overload. There’s one that is just about having too much information, but there’s another that is about figuring out which information you can trust. In general I would say technology has the potential to salve these kinds of anxieties, but it can just as easily do the opposite. More than anything, I hope Argos reduces the problem rather than adds to it, but one of the big failings I’ve felt with this first version is that it doesn’t accomplish that; it ends up generating more noise. Based on the feedback I’ve gotten, I’m looking at new designs and reevaluating techniques in order to rectify that.
When I originally conceived of Argos, I was (and probably still am) naive about the extent of the issues that plague digital journalism. The more I looked into it, the more problems came up. Initially, the idea was about reducing information overload, mostly by supplying missing context to stories. But after completing this first version, I think that reducing overload and providing context might be really big and different issues. So right now I’m trying to determine how—and if—they can be solved simultaneously.
Exactly. The other main goal was increasing the understanding one got out of reading the news, which is what context is supposed to help with. But that’s kind of useless if all that information ends up driving people away from reading the news in the first place.
In the first version the design was focused more on providing an experience similar to a traditional news publication or news reader. The app was organized around different feeds, or streams, presented as full-screen images with the title and summary of each event.
One issue was that the design wasn’t distinct from other news readers; you can’t really tell from looking at it that it’s offering anything different from a traditional RSS reader or news application. And it’s really hard to expect that every news story will come with a nice big image. I also think that in approaching the app as a traditional news publication, I missed the point of what I was trying to accomplish. The feedback I’ve gotten has helped me understand that Argos can have a more useful and unique place as a kind of “daily scan” of the day’s news events. So the new designs are more akin to brief updates about events around the world, to provide people with an entry or starting point for reading more and getting involved in discussions.
Yes. The original source material is there to read, and context is optionally provided. You can choose to read more about the event itself, or about the people, places, organizations, etc. that are involved.
The issue of bias in journalism is one of those big problems that cropped up when I looked more closely at what I wanted Argos to do. It seems like a perennial issue. One potential solution is that, if you sample a large enough set of publications, bias might “average out,” so that any summary that is formed from their aggregate has less bias than any individual article. I don’t know if that works in practice though. My cop-out answer would be that Argos isn’t really trying to solve the problem of bias, but personally I hope that one day it can. Looking at the history and associations of the author who wrote the piece could actually be a very good technique. In the new version of Argos, I want to incorporate more community dialogue around the events and stories. In my experience, a lot of these issues about bias—being as complex and nuanced as they are—are best hashed out by individuals discussing them.
It is very difficult to pull off correctly. A lot of comments devolve into knee-jerk shouting matches. Occasionally, though, I’ve found something really insightful. On Reddit, sometimes experts will come in and succinctly explain things, and every once in a while you see a very levelheaded discussion about a story. I think the potential is there, but it’s hard to get it right.
Yes! I have been thinking a lot about how to incorporate grassroots and community reporting. I’m dissatisfied with Argos’s current approach of relying on major publications, but it’s just easier at this stage.
It’s possible this usage of the content falls under the terms of “fair use.” All the original articles are cited, and I’m hoping that these high-level summaries can interest people in reading the original posts. Since things are still quite experimental, I’m not worrying about the potential legal consequences now, but at some point I will want to talk to the publishers.
Yes, if the clustering algorithm works well enough, it should be able to recognize that these articles are talking about exactly the same thing as the ones from last week, so it’s not a new event.
Maybe with Argos it will!
*Argos is a personal project of Francis Tseng's and is not affiliated with IDEO.
Jason Cohn is a journalist and documentary filmmaker who has written for Rolling Stone, the Los Angeles Times and other major periodicals. He recently produced and directed the award-winning EAMES: The Architect and the Painter for American Masters on PBS.