ArXivSorter: A Sorting Algorithm for Astro-ph

by saurav on April 18, 2012

Overwhelmed by the large number of papers on astro-ph? Only have time to look at the relevant papers? Or back from an internet-free vacation (what is that?) and want to find out what you missed? Try ArXivSorter, which uses a friends-of-friends algorithm to sort the papers on astro-ph for you. Taking an initial list of authors (including yourself) as the input, the algorithm quantifies how connected you are to the co-authors of each paper, assigns a probability to each paper, and sorts the papers for you.
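To make the friends-of-friends idea concrete, here is a minimal sketch in Python. It is not ArXivSorter's actual code: the function names, the damping factor, the hop limit, and the choice to score a paper by its best-connected author are all assumptions made for illustration. It builds a co-authorship graph from past papers, gives each author a weight that shrinks with distance from your seed list, and then sorts new papers by that weight without dropping any of them.

```python
from collections import defaultdict, deque


def build_coauthor_graph(papers):
    """Map each author to the set of people they have co-authored with.

    `papers` is an iterable of author lists (hypothetical input format).
    """
    graph = defaultdict(set)
    for authors in papers:
        for a in authors:
            graph[a].update(b for b in authors if b != a)
    return graph


def connection_weights(graph, seeds, damping=0.5, max_hops=3):
    """Breadth-first walk outward from the seed authors.

    A seed gets weight 1.0; an author first reached after n hops gets
    damping ** n. Both parameters are illustrative assumptions.
    """
    weights = {s: 1.0 for s in seeds}
    queue = deque((s, 0) for s in seeds)
    while queue:
        author, hops = queue.popleft()
        if hops == max_hops:
            continue
        for friend in graph.get(author, ()):
            if friend not in weights:  # first visit is the shortest path
                weights[friend] = damping ** (hops + 1)
                queue.append((friend, hops + 1))
    return weights


def rank_papers(new_papers, weights):
    """Sort papers by their best-connected author; nothing is filtered out."""
    def score(authors):
        return max((weights.get(a, 0.0) for a in authors), default=0.0)
    return sorted(new_papers, key=score, reverse=True)
```

With a seed list containing only yourself, a direct co-author gets weight 0.5 and a co-author's co-author gets 0.25 under these assumed settings, while an author with no connection scores 0 and sinks to the bottom of the list but still appears in it.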

If you are like me, you probably don’t have the stamina to read 40–50 abstracts at a time. By the time I get to the end, I am only skimming the abstracts, perhaps even ignoring a relevant paper. By sorting the abstracts, ArXivSorter allows me to first read the abstracts for papers that I am actually likely to read. There is no loss of information. ArXivSorter is not a filter but simply ranks the list of papers based on your settings.

While the level of connectivity is recalculated each day, the algorithm does not learn or remember which papers you actually read. In other words, which authors you choose to read today has no effect on the future ranking of their papers.

One feature I particularly like is that I have access to a ranked list of recent papers for times when I was not able to stay up to date. I can also sort the papers for any particular month since astro-ph was launched in 1992.

Of course, reading papers that are ranked based on your preferences has a major drawback: you always read what you are likely to read (just like what Google search does for you these days). Papers from authors outside your “network” will be ranked low. The algorithm also discriminates against newer authors, as they are not yet well-connected. For this reason, I almost always look at the entire list, although my attention is likely to drop off as I move down the ranking. Because the algorithm uses the last name plus first initial (e.g., dhital_s) as the identifier for all authors with that name, common names (e.g., smith_a) are going to be confusion-limited. Short Chinese surnames, in particular, are likely to be affected.
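The name-confusion problem is easy to see in a small sketch. The key format follows the dhital_s example above; everything else here (the function names, the (last name, first name) input format, and the sample names other than dhital_s) is hypothetical and not taken from ArXivSorter itself. The sketch assumes non-empty names and does nothing special for multi-word surnames.

```python
from collections import defaultdict


def author_key(last_name, first_name):
    """Build a lastname_firstinitial key, e.g. ("Dhital", "Saurav") -> "dhital_s".

    Assumes both names are non-empty strings.
    """
    return f"{last_name.strip().lower()}_{first_name.strip()[0].lower()}"


def find_collisions(authors):
    """Return the keys that are shared by more than one distinct person.

    `authors` is an iterable of (last_name, first_name) pairs.
    """
    by_key = defaultdict(set)
    for last, first in authors:
        by_key[author_key(last, first)].add((last, first))
    return {key: sorted(people) for key, people in by_key.items() if len(people) > 1}
```

Two different people named, say, Alice Smith and Adam Smith both collapse to smith_a, so any connection one of them has to you is credited to both.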

The website was started circa 2007 by Jean-Philippe Magué & Brice Ménard, who continue to maintain it. I have been using it ever since and have found the ranking to work very well. There are occasional very relevant papers that are ranked low, but I have learned to skim the entire list. It was particularly useful when I was an early graduate student, needing to read a lot of papers but not knowing how to choose them.

There are some great ways to read astro-ph: subscribing to the mailing list, reading the web-based version in its entirety or the categorized version, or using the collaborative web-based Vox Charta. In an age when we read papers months before they are actually published in the journals, ArXivSorter provides another method to find the papers we want to read.

How do you read astro-ph papers?

[Update: Check out the Astro-ph tools page on the wiki.]

11 comments

1 Anthony Smith April 18, 2012 at 6:20 am

I like the CosmoCoffee filter for the arXiv, which searches for keywords and phrases and sorts the list accordingly: http://cosmocoffee.info/arxiv_new.php

2 Thomas April 18, 2012 at 11:20 am

myADS (http://myads.harvard.edu/) also sorts astro-ph by user-chosen keywords and sends the results to you every morning in an email.

I’m always surprised more people don’t seem to know about it.

3 Kelle April 22, 2012 at 11:05 pm

4 Warrick April 18, 2012 at 12:48 pm

I’m subscribed to the RSS feed for astro-ph (and the major journals). I have a sort of iterative scheme. On first glance I discard based on title. When I have a little time to spare, I go through some abstracts and discard some more. That leaves me with a manageable selection of papers to consider more thoroughly.

I think the main advantages of subscribing via RSS are that (a) I don’t miss anything and (b) I can peruse the list from anywhere I can log in to Google Reader, including my phone. Still, this might be a useful way of finding old material, too.

5 Chris Beaumont April 18, 2012 at 1:26 pm

I found that organizing papers by subject, and hiding the abstracts by default, allows me to quickly scan each day’s listing for relevant articles. I made the astro-ph map to make that easier: http://www.ifa.hawaii.edu/users/beaumont/astroph/

6 saurav April 18, 2012 at 1:44 pm

Chris, that is really cool! I assume the different colors only serve to delineate the different papers?

7 Chris Beaumont April 18, 2012 at 2:07 pm

Yes, that’s right. An early version used color and box size to encode extra information, but I was never really happy with it. Color now just serves to make things legible.

8 Nathan Goldbaum April 18, 2012 at 2:32 pm

If your institution uses voxcharta, I’ve found the sorting it does based on either your votes or your institution’s votes is usually quite good. It’s all based on keywords in abstracts.

9 Jocelyn April 18, 2012 at 2:55 pm

I’ll second http://myads.harvard.edu/. With a few good keywords, the papers I am interested in float to the top of the list, including those from authors I am less familiar with.

10 Jessica Lu April 18, 2012 at 6:33 pm

I believe VoxCharta ranks on keywords and authors. And the ranking is dynamic so that every time you vote for a paper, the keywords and authors for that paper are added or upped in your ranking criteria. However, you can only receive the “recommended” papers rather than a total ranked list.

11 Ciska Kemper April 20, 2012 at 12:00 pm

@Jessica : In Vox Charta you can actually sort “Today’s posts” based on your voting history. This sorts the entire list per day according to your voting record on keywords and authors. That way you do get a total ranked list.
