Digital journalism has evolved to a point of paradox: we now have access to such an overwhelming amount of news that it’s actually become more difficult to understand current events.
IDEO New York developer Francis Tseng is—in his spare time—searching for a solution to the problem by exploring its root: the relationship between content and code. Tseng received a grant from the Knight Foundation to develop Argos*, an online news aggregation app that intelligently collects, summarizes and provides contextual information for news stories. Having recently finished version 0.1.0, which he calls the first “complete-ish” release of Argos, Tseng spoke with veteran journalist and documentary filmmaker Jason Cohn about the role technology can play in our consumption—and comprehension—of the news.
How would you describe to my mom, who just got her first smartphone, how Argos works and why it’s useful?
I’d say that it collects all of the important news and tells you exactly what you need to know about it. Instead of tracking separate publications, it pulls in new articles from a variety of different sources (right now, just a few major publications) and condenses everything into a set of bullet points. Argos uses a technique called “hierarchical agglomerative clustering” (HAC) to organize news stories. HAC compares each article to every other article and comes up with a number, on a scale from 0 to 1, representing how similar the two are. If the similarity is above some threshold (say, greater than 0.7), the articles are considered to be talking about the same event, and they are grouped together into “event clusters.” The same technique is then applied to the event clusters to organize events into “story clusters.”
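For illustration only (this is not Argos's actual code), the clustering step described above might be sketched in Python as follows. The interview specifies only a pairwise similarity score between 0 and 1 and a threshold such as 0.7. The bag-of-words cosine similarity, the average-linkage merge rule, and every function name here are assumptions made for the example.

```python
import math
import re
from collections import Counter
from itertools import combinations


def vectorize(text):
    """Turn text into a bag-of-words term-count vector (assumed representation)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity of two count vectors; between 0 and 1 for word counts."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def average_similarity(c1, c2, vectors):
    """Average linkage: mean pairwise similarity between members of two clusters."""
    total = sum(cosine(vectors[i], vectors[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def cluster(texts, threshold=0.7):
    """Agglomerative clustering; returns lists of indices into `texts`."""
    vectors = [vectorize(t) for t in texts]
    clusters = [[i] for i in range(len(texts))]  # start with one cluster per article
    while len(clusters) > 1:
        # Find the most similar pair of clusters.
        best_sim, a, b = max(
            (average_similarity(clusters[x], clusters[y], vectors), x, y)
            for x, y in combinations(range(len(clusters)), 2)
        )
        if best_sim <= threshold:
            break  # nothing left that is similar enough to merge
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because a < b
    return clusters


articles = [
    "Plane vanishes over ocean search teams deployed",
    "Search teams deployed after plane vanishes over ocean",
    "Central bank raises interest rates again",
]
event_clusters = cluster(articles, threshold=0.7)
print(event_clusters)
```

In this toy run the two plane articles share nearly all of their words and end up in one event cluster, while the bank article stays on its own. Grouping events into story clusters could reuse `cluster` on the combined text of each event cluster, probably with a lower threshold, though that detail is also a guess.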
A sentence-ranking algorithm generates a summary of each event cluster by analyzing the text of every article in the cluster and assigning an “informativeness” value to each sentence. It then takes the five highest-ranked sentences and forms a summary. Right now I have a very simple version implemented, but the long-term focus will be on improving these core algorithms that do the heavy lifting.
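The interview does not say how “informativeness” is calculated, so the following Python sketch substitutes one simple, common stand-in: a sentence scores higher when its words appear often across the whole cluster. The stopword list, the naive sentence splitter, and the function names are all hypothetical. Only the idea of ranking sentences and keeping the top five comes from the description above.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not from the source).
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "are",
             "was", "on", "for", "that", "it"}


def words(text):
    """Lowercased word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize(articles, top_n=5):
    """Return the top_n highest-scoring sentences, in their original order."""
    # Collect unique sentences from every article in the cluster.
    sentences = []
    for article in articles:
        for sentence in split_sentences(article):
            if sentence not in sentences:
                sentences.append(sentence)

    # Word frequencies over the whole cluster act as the informativeness signal.
    freq = Counter(w for article in articles for w in words(article))

    def score(sentence):
        tokens = words(sentence)
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    top = sorted(sentences, key=score, reverse=True)[:top_n]
    return sorted(top, key=sentences.index)  # restore reading order


cluster_articles = [
    "The plane vanished over the ocean. Weather was calm.",
    "Search teams say the plane vanished near the ocean. Officials will brief reporters.",
]
for line in summarize(cluster_articles, top_n=2):
    print("-", line)
```

Averaging the word frequencies, rather than summing them, keeps long sentences from winning on length alone. A real system would likely need a stronger scoring method and a proper sentence tokenizer.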
Are journalists necessary anymore? Have I been replaced by an algorithm?
I personally don’t think that journalists will be replaced, at least not for a long time. Someone still needs to go out and do the research and the original reporting to get the information that the algorithm uses, which seems like the hardest and most valuable part of journalism.
It seems like some new tech applications derive from the recognition of a new technical capacity, which you then look for novel ways to use. In other cases, it begins with the recognition of a need, and you look for extant technologies that provide a possible solution. Which way did you approach Argos?
Both. I’ve always had a fascination with artificial intelligence, and that fascination has probably made me more sensitive to problem areas where AI approaches could be applicable. Personally, I find information overload, especially when it comes to keeping up with the news, really anxiety inducing and overwhelming, and I think because I was already familiar with some of these text processing techniques, it made sense to try applying them to this particular problem.
I’m glad you brought up information anxiety and overload. I think it’s a huge barrier to understanding the news today. There is so much information and analysis, but I don’t trust any of it, and I don’t know how to begin to get to the bottom of what’s really happening. Do you think Argos, or technology in general, can be an aid in this problem?
That’s a particularly good example of the fact that there are different kinds of information overload. There’s one that is just about having too much information, but there’s another that is about figuring out which information you can trust. In general I would say technology has the potential to salve these kinds of anxieties, but it can just as easily do the opposite. More than anything, I hope Argos reduces the problem rather than adds to it, but one of the big failings I’ve felt with this first version is that it doesn’t accomplish that; it ends up generating more noise. Based on the feedback I’ve gotten, I’m looking at new designs and reevaluating techniques in order to rectify that.
What did you learn through building the first version that will influence the next one?
When I originally conceived of Argos, I was (and probably still am) naive about the extent of the issues that plague digital journalism. The more I looked into it, the more problems came up. Initially, the idea was about reducing information overload, mostly by supplying missing context to stories. But after completing this first version, I think that reducing overload and providing context might be really big and different issues. So right now I’m trying to determine how—and if—they can be solved simultaneously.
You mean because providing more context increases—by a lot—the amount of information a news consumer requires?
Exactly. The other main goal was increasing the understanding one got out of reading the news, which is what context is supposed to help with. But that’s kind of useless if all that information ends up driving people away from reading the news in the first place.
What changes are you planning for the next version of Argos?
In the first version the design was focused more on providing an experience similar to a traditional news publication or news reader. The app was organized around different feeds, or streams, presented as full-screen images with the title and summary of each event.
One issue was that the design wasn’t distinct from other news readers: you can’t really tell from looking at it that it’s offering anything different from a traditional RSS reader or news application. And it’s unrealistic to expect that every news story will come with a nice big image. More than that, though, I think that in approaching the app as a traditional news publication, I missed the point of what I was trying to accomplish. The feedback I’ve gotten has helped me understand that Argos can have a more useful and unique place as a kind of “daily scan” of the day’s news events. So the new designs are more akin to brief updates about events around the world, to provide people with an entry or starting point for reading more and getting involved in discussions.
And you give them avenues for digging deeper into contexts and definitions?
Yes. The original source material is there to read, and context is optionally provided. You can choose to read more about the event itself, or about the people, places, organizations, etc. that are involved.
What about the issue of bias? As journalists, we’ve always had to be wary of our sources. And as consumers of news, it’s always been a good idea to assume that someone with an agenda is feeding reporters their information. How does Argos handle biases and opinions that are embedded in a story?
The issue of bias in journalism is one of those big problems that cropped up when I looked more closely at what I wanted Argos to do. It seems like a perennial issue. One potential solution is that, if you sample a large enough set of publications, bias might “average out,” so that any summary that is formed from their aggregate has less bias than any individual article. I don’t know if that works in practice though. My cop-out answer would be that Argos isn’t really trying to solve the problem of bias, but personally I hope that one day it can. Looking at the history and associations of the author who wrote the piece could actually be a very good technique. In the new version of Argos, I want to incorporate more community dialogue around the events and stories. In my experience, a lot of these issues about bias—being as complex and nuanced as they are—are best hashed out by individuals discussing them.
I think I have to disagree with you on that! My feeling about web comments on news stories is that they rarely shed much light relative to heat.
It is very difficult to pull off correctly. A lot of comments devolve into knee-jerk shouting matches. Occasionally, though, I’ve found something really insightful. On Reddit, sometimes experts will come in and succinctly explain things, and every once in a while you see a very levelheaded discussion about a story. I think the potential is there, but it’s hard to get it right.
One thing I’ve been focusing on is getting my hands mostly on reporting that comes from the field. The rest is just analysis, which, in my opinion, is more susceptible to ideological influence.
Yes! I have been thinking a lot about how to incorporate grassroots and community reporting. I’m dissatisfied with Argos’s current approach of relying on major publications, but it’s just easier at this stage.
Do you need to have content licensing relationships with news organizations to harvest the data or can you just… borrow it?
It’s possible this usage of the content falls under the terms of “fair use.” All the original articles are cited, and I’m hoping that these high-level summaries can interest people in reading the original posts. Since things are still quite experimental, I’m not worrying about the potential legal consequences now, but at some point I will want to talk to the publishers.
It seems to me that one really useful type of story for Argos is something like the lost Malaysian airliner. There was a big burst of real news when the plane disappeared, and then there was an endless cycle of conjecture and empty reportage with very little actual “news.” If you were following that story on Argos, would you avoid being updated every five minutes to discover that they still haven’t found the missing plane?
Yes, if the clustering algorithm works well enough, it should be able to recognize that these articles are talking about exactly the same thing as the ones from last week, so it’s not a new event.
It would be nice if cable news worked like that.
Maybe with Argos it will!
*Argos is a personal project of Francis Tseng and is not affiliated with IDEO.