ArXivSorter: A Sorting Algorithm for Astro-ph

Overwhelmed by the large number of papers on astro-ph? Only have time to look at the relevant papers? Or back from an internet-free vacation (what is that?) and want to find out what you missed? Try ArXivSorter, which uses a friends-of-friends algorithm to sort the papers on astro-ph for you. Taking an initial list of authors (including yourself) as input, the algorithm quantifies how connected you are to each paper's co-authors, assigns a probability to each paper, and sorts the papers for you.
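To make the friends-of-friends idea concrete, here is a minimal sketch in Python. It is not ArXivSorter's actual code, and the real scoring is not described in this post. The function names, the `decay` factor, the `max_hops` limit, and the rule that a paper takes the score of its best-connected author are all illustrative assumptions.

```python
from collections import defaultdict


def build_coauthor_graph(papers):
    """Map each author key to the set of authors they have published with.

    `papers` is an iterable of author-key lists, one list per past paper.
    """
    graph = defaultdict(set)
    for authors in papers:
        for a in authors:
            graph[a].update(b for b in authors if b != a)
    return graph


def connection_scores(graph, seeds, decay=0.5, max_hops=3):
    """Walk outward from the seed authors, one co-authorship hop at a time.

    Seeds score 1.0; an author first reached after n hops scores decay**n.
    Authors not reached within max_hops get no entry (treated as 0 later).
    """
    scores = {s: 1.0 for s in seeds}
    frontier = set(seeds)
    for hop in range(1, max_hops + 1):
        next_frontier = set()
        for a in frontier:
            for b in graph.get(a, ()):
                if b not in scores:
                    scores[b] = decay ** hop
                    next_frontier.add(b)
        frontier = next_frontier
    return scores


def rank_papers(new_papers, scores):
    """Sort papers by their best-connected author. Nothing is dropped."""
    def paper_score(authors):
        return max((scores.get(a, 0.0) for a in authors), default=0.0)

    return sorted(new_papers, key=paper_score, reverse=True)


# Hypothetical publication history and today's listing.
history = [["dhital_s", "alpha_a"], ["alpha_a", "beta_b"]]
today = [["gamma_c"], ["beta_b", "gamma_c"], ["alpha_a"]]

graph = build_coauthor_graph(history)
scores = connection_scores(graph, seeds=["dhital_s"])
for authors in rank_papers(today, scores):
    print(authors)
```

Because the list is only reordered, a paper by an unknown author still appears, just at the bottom, which matches the "not a filter" behavior of a ranked list.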

If you are like me, you probably don’t have the stamina to read 40–50 abstracts at a time. By the time I get to the end, I am only skimming the abstracts, perhaps even ignoring a relevant paper. By sorting the abstracts, ArXivSorter allows me to first read the abstracts for papers that I am actually likely to read. There is no loss of information. ArXivSorter is not a filter but simply ranks the list of papers based on your settings.

While the level of connectivity is calculated each day, the algorithm does not learn or remember which papers you actually read. In other words, which authors you choose to read today has no effect on the future ranking of their papers.

One feature I particularly like is that I have access to a ranked list of recent papers for times when I was not able to stay up to date. I can also sort the papers for any particular month since astro-ph was launched in 1992.

Of course, reading papers that are ranked based on your preferences has a major drawback: you always read what you are already likely to read (much like what Google search does for you these days). Papers by authors outside your “network” will be ranked low. The algorithm discriminates against newer authors, as they are not yet well connected. For this reason, I almost always look at the entire list, although my attention is likely to fall off as I move down the ranking. As the algorithm uses lastname_firstinitial (e.g., dhital_s) as the identifier for all authors with that name, common names (e.g., smith_a) are going to be confusion-limited. Short Chinese surnames, in particular, are likely to be affected.
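The name-confusion problem is easy to see with a small Python sketch of a lastname_firstinitial key. This is an assumption-laden illustration, not ArXivSorter's actual normalization: it assumes names arrive as "First [Middle] Last" and that accents are stripped. The helpers `author_key` and `colliding_names`, and the example names, are hypothetical.

```python
import unicodedata
from collections import defaultdict


def author_key(full_name):
    """Reduce 'First [Middle] Last' to 'last_f' (lowercase, accents stripped)."""
    # Decompose accented characters, then drop the non-ASCII combining marks.
    ascii_name = (
        unicodedata.normalize("NFKD", full_name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    parts = ascii_name.lower().split()
    if not parts:
        raise ValueError("empty author name")
    return f"{parts[-1]}_{parts[0][0]}"


def colliding_names(names):
    """Group names by key and keep only the keys shared by several people."""
    groups = defaultdict(set)
    for name in names:
        groups[author_key(name)].add(name)
    return {key: sorted(group) for key, group in groups.items() if len(group) > 1}


# Hypothetical author names for illustration.
names = ["Anna Smith", "Adam Smith", "Li Wang", "Lei Wang", "S. Dhital"]
print(colliding_names(names))
```

Any two people who share a key are indistinguishable to a ranking built on it, so a connection to one "smith_a" is effectively credited to all of them.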

The website was started circa 2007 by Jean-Philippe Magué & Brice Ménard, who also maintain it. I have been using the website ever since and have found the ranking to work very well. There are occasional very relevant papers that are ranked low, but I have learned to skim the entire list. It was particularly useful when I was an early graduate student, needing to read a lot of papers but not knowing how to choose them.

There are some great ways to read astro-ph: subscribing to the mailing list, reading the web-based version in its entirety or the categorized version, or using the collaborative web-based Vox Charta. In an age when we read papers months before they are actually published in the journals, ArXivSorter provides another method to find the papers we want to read.

How do you read astro-ph papers?

[Update: Check out the Astro-ph tools page on the wiki.]
11 comments
  • Anthony Smith Apr 18, 2012 @ 6:20

    I like the CosmoCoffee filter for the arXiv, which searches for keywords and phrases and sorts the list accordingly: http://cosmocoffee.info/arxiv_new.php

  • Thomas Apr 18, 2012 @ 11:20

    myADS (http://myads.harvard.edu/) also sorts astro-ph by user-chosen keywords and sends the results to you every morning in an email.

    I’m always surprised more people don’t seem to know about it.

  • Warrick Apr 18, 2012 @ 12:48

    I’m subscribed to the RSS feed for astro-ph (and the major journals). I have a sort of iterative scheme. On first glance I discard based on title. When I have a little time to spare, I go through some abstracts and discard some more. That leaves me with a manageable selection of papers to consider more thoroughly.

    I think the main advantages of subscribing via RSS are that (a) I don’t miss anything and (b) I can peruse the list from anywhere I can login to Google Reader, including my phone. Still, this might be a useful way of finding old material, too.

  • Chris Beaumont Apr 18, 2012 @ 13:26

    I found that organizing papers by subject, and hiding the abstracts by default, allows me to quickly scan each day’s listing for relevant articles. I made the astro-ph map to make that easier: http://www.ifa.hawaii.edu/users/beaumont/astroph/

    • saurav Apr 18, 2012 @ 13:44

      Chris, that is really cool! I assume the different colors only serve to delineate the different papers?

    • Chris Beaumont Apr 18, 2012 @ 14:07

      Yes, that’s right. An early version used color and box size to encode extra information, but I was never really happy with it. Color now just serves to make things legible.

  • Nathan Goldbaum Apr 18, 2012 @ 14:32

    If your institution uses voxcharta, I’ve found the sorting it does based on either your votes or your institution’s votes is usually quite good. It’s all based on keywords in abstracts.

  • Jocelyn Apr 18, 2012 @ 14:55

    I’ll second http://myads.harvard.edu/. With a few good keywords, the papers I am interested in float to the top of the list – including those from authors I am less familiar with.

  • Jessica Lu Apr 18, 2012 @ 18:33

    I believe VoxCharta ranks on keywords and authors. And the ranking is dynamic so that every time you vote for a paper, the keywords and authors for that paper are added or upped in your ranking criteria. However, you can only receive the “recommended” papers rather than a total ranked list.

    • Ciska Kemper Apr 20, 2012 @ 12:00

      @Jessica : In Vox Charta you can actually sort “Today’s posts” based on your voting history. This sorts the entire list per day according to your voting record on keywords and authors. That way you do get a total ranked list.
