Tuesday, March 20, 2012

Stanford Large Network Dataset Collection

http://snap.stanford.edu/snap/index.html
The SNAP library focuses on the analysis of large social and information networks. The data were collected from sources including Epinions.com, LiveJournal, Slashdot, Wikipedia, Enron email, Arxiv High Energy Physics, US Patents, Google, Amazon, Gnutella, road networks of PA/CA/TX, Gowalla, Brightkite, Twitter, and Memetracker.

Dataset categories include:
Social networks: online social networks, edges represent interactions between people

Communication networks: email communication networks with edges representing communication

Citation networks: nodes represent papers, edges represent citations

Collaboration networks: nodes represent scientists, edges represent collaborations (co-authoring a paper)

Web graphs: nodes represent webpages and edges are hyperlinks

Amazon networks: nodes represent products and edges link commonly co-purchased products

Internet networks: nodes represent computers and edges represent communication between them

Road networks: nodes represent intersections and edges represent the roads connecting them

Autonomous systems: graphs of the internet

Signed networks: networks with positive and negative edges (friend/foe, trust/distrust)

Location-based online social networks: social networks with geographic check-ins

Wikipedia networks and metadata: talk, editing and voting data from Wikipedia

Twitter and Memetracker: Memetracker phrases, links and 467 million Tweets
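
Since every category above describes a graph as nodes and edges, a small sketch may help show how such data could be read into memory. It assumes a plain-text edge list with one whitespace-separated "source target" pair per line and header lines starting with "#"; individual datasets may use other layouts, so check each file before relying on this. The function name parse_edge_list and the sample rows are hypothetical and are not taken from any real dataset.

```python
from collections import defaultdict


def parse_edge_list(lines):
    """Build a directed adjacency map from whitespace-separated edge lines.

    Assumption: each data line holds two integer node ids, and lines
    starting with '#' are comments. This is a sketch, not SNAP's own loader.
    """
    adjacency = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        source, target = line.split()[:2]
        adjacency[int(source)].add(int(target))
        adjacency[int(target)]  # register nodes that only appear as targets
    return adjacency


# Hypothetical sample rows, made up for illustration only.
sample = """\
# FromNodeId	ToNodeId
0	1
0	2
1	2
2	0
""".splitlines()

graph = parse_edge_list(sample)
num_nodes = len(graph)
num_edges = sum(len(targets) for targets in graph.values())
print(num_nodes, num_edges)
```

For an undirected dataset such as a road or collaboration network, the same loop could add each edge in both directions instead.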
