Korean Random

UfoRia

User
  • Posts

    1
  • Joined

  • Last visited

Reputation

0 Noob

Basic information

  • Gender
    Undetermined ;)

Contacts

  • Nick
    UfoRia
  1. Hi, I can't post in the development forum since I just registered a moment ago, so I thought I would post here in the hope someone can answer my question. Is there any interface for querying the database directly, a flat-file feed to gobble up, or a data dump (we'd happily donate for one)?

     I run a large Cloudera Hadoop/YARN cluster at my house and I am looking for large datasets to test some IPython modules we have written. I also would like to be able to calculate my stats, my friends' stats, and my son's stats in real time without waiting a week. We have written modules for map-reducing a large number of stock indexes (15-minute delays), and I am working on blending WG API stats with replay mining, so I was wondering whether the XVM database (or its reporting database) is accessible for this kind of use.

     I am already dumping all of my stats to Amazon S3/SQS, then map-reducing the files and dropping them into Hadoop. I use a blend of Hive, Cassandra, and MySQL backing an Apache cluster that lives in a room by my garage, plus assorted Amazon EC2 instances in multiple availability zones.

     I enjoy most of the existing WoT stat sites and thought this would be a relatively good test for our StarCluster, IPython, and map-reduce routines. We can handle several million transactions per minute, but the data we use is boring, and serving several hundred statistics in quasi-real time isn't a real test when it runs on crap data. We have serialization routines that can parse flat files as well, so if any mechanism is available, please let me know. We'd comply with any NDA, GPL, or similar requirements, since we want to test our framework on something fun and give something back to the community once we're done.

     Anyhow, it doesn't hurt to ask. :) Thanks!

     Ufo

     Edit: if there is a mechanism in place for this, we would honor it. We'd likely serialize the data, so as long as it's less than 500 TB serialized/map-reduced, we'd be good to go.
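     For the curious, the dump step is roughly this. A minimal sketch, assuming the public Wargaming account/info endpoint; the application_id, account ids, and bucket name below are placeholders, not our real setup:

         import json
         import time

         import boto3
         import requests

         # Region-specific host for the public Wargaming API.
         API_URL = "https://api.worldoftanks.eu/wot/account/info/"
         APPLICATION_ID = "demo"      # placeholder WG developer key
         ACCOUNT_IDS = [500000001]    # placeholder account ids
         BUCKET = "my-wot-stats"      # placeholder S3 bucket

         s3 = boto3.client("s3")

         for account_id in ACCOUNT_IDS:
             # Pull the raw account record for one player.
             resp = requests.get(API_URL, params={
                 "application_id": APPLICATION_ID,
                 "account_id": account_id,
             })
             resp.raise_for_status()
             # Stage the raw JSON in S3, keyed by account and timestamp,
             # for later map-reduction into Hadoop.
             key = "raw/%d/%d.json" % (account_id, int(time.time()))
             s3.put_object(Bucket=BUCKET, Key=key,
                           Body=json.dumps(resp.json()))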
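     Once the flat files are staged, the per-account stats reduce to trivial jobs. A minimal mrjob sketch, assuming a tab-separated account_id/battles/wins layout (mrjob is just what we'd reach for here, not anything XVM-specific):

         from mrjob.job import MRJob

         class AvgWinRate(MRJob):
             """Average win rate per account from staged flat files."""

             def mapper(self, _, line):
                 # Assumed layout: account_id<TAB>battles<TAB>wins per line.
                 account_id, battles, wins = line.split("\t")
                 yield account_id, (int(battles), int(wins))

             def reducer(self, account_id, pairs):
                 total_battles = total_wins = 0
                 for battles, wins in pairs:
                     total_battles += battles
                     total_wins += wins
                 yield account_id, (total_wins / float(total_battles)
                                    if total_battles else 0.0)

         if __name__ == "__main__":
             AvgWinRate.run()

     Run it locally with "python avg_win_rate.py stats.tsv", or point it at the Hadoop cluster with "-r hadoop".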