The PageRank and other metadata we compute are not part of the S3 corpus. We do collect this information, though, and will probably make it available in a separate S3 bucket in Hadoop SequenceFile format. Be aware that our PageRank values will probably not correlate closely with Google's PageRank numbers, since their calculation is bound to be a lot more sophisticated than our version.
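
For readers wondering what a "simple" PageRank looks like, here is a minimal sketch of the textbook power-iteration form. It is not our actual computation and certainly not Google's; the function name `pagerank`, the `links` dictionary layout, the damping factor of 0.85, and the iteration count are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank (illustrative only).

    links: dict mapping each page to a list of pages it links to.
    Returns a dict mapping each page to its rank; ranks sum to 1.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start from a uniform distribution.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Rank held by pages with no outlinks is spread evenly over all pages.
        dangling = sum(rank[page] for page in pages if not links.get(page))
        new_rank = {
            page: (1.0 - damping) / n + damping * dangling / n
            for page in pages
        }
        # Each page passes an equal share of its rank to every page it links to.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank

    return rank


if __name__ == "__main__":
    # Hypothetical three-page graph: both "a" and "b" link to "c".
    example = {"a": ["c"], "b": ["c"], "c": []}
    for page, score in sorted(pagerank(example).items()):
        print(page, round(score, 4))
```

A production ranking would differ in many ways (link weighting, spam handling, scale), which is one reason scores from a basic version like this would not be expected to track Google's closely.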