
Recent activity

  • question

    cboyce replied on April 30, 2009 05:23 to the question "Is Hadoop appropriate for real-time log processing?" in Cloudera:

    Thank you both for your quick and helpful feedback. In our particular environment, real-time would ideally mean within 5 seconds, so that, for example, a support technician could observe whether or not a user was properly authenticating while they had the customer on the phone. Even if real-time is not an option, I think if we were able to compile log information around every other hour, and provide support technicians with access to the real-time data, we may still be able to give our support group a greater set of tools than they have now.
  • question

    cboyce asked a question in Cloudera on April 29, 2009 22:14:

    Is Hadoop appropriate for real-time log processing?
    I'm still working through the training videos and docs, so I haven't quite gotten into the Hadoop way of thinking yet. Hopefully the community can help me determine if Hadoop is the right solution for what I'm looking to do. I'm considering Hadoop for processing various access logs for use by technical support at an independent ISP. The idea is to provide something like an index by username, so that a support technician can look up all activity across various logs associated with a particular customer. One of my concerns is that we don't have the scale that would benefit from the large-scale data processing that Hadoop is targeted at. A month's worth of logs weighs in at around 100 gigs, and we'd probably have 10 or so machines (at the most) to throw at Hadoop. This seems like small potatoes compared to what Hadoop is meant to do. Another concern is HDFS's inability to update files. Ideally, support techs would have access to real-time log indexes, since oftentimes they'll be watching the logs with the customer on the phone. The more I think about it, the more it seems like the data would need to be processed in real time and queried later. The training videos recommended against querying HDFS directly because of speed concerns, so I'm not sure if that's an option. Not having a thorough understanding of Hadoop yet makes it difficult for me to tell if I'm suffering from "everything's a nail when you have a hammer" syndrome, so hopefully the community can chime in and help me determine whether this is an appropriate tool for what we want.
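
To make the "index by username" idea a bit more concrete, here is a rough sketch in plain Python (not Hadoop itself) of the map-then-group shape such a job might take. The log format (`<timestamp> <username> <message>`), the source names, and the function names `map_line` and `build_index` are all assumptions for illustration; real access logs would need their own parsers.

```python
from collections import defaultdict


def map_line(source, line):
    """Map step: turn one log line into (username, record).

    Assumes a hypothetical format: '<timestamp> <username> <message>'.
    Returns None for lines that do not fit that format.
    """
    parts = line.strip().split(" ", 2)
    if len(parts) < 3:
        return None  # skip malformed lines
    timestamp, username, message = parts
    return username, (timestamp, source, message)


def build_index(logs):
    """Group step: collect every record under its username, oldest first.

    `logs` maps a log source name to an iterable of raw lines.
    """
    index = defaultdict(list)
    for source, lines in logs.items():
        for line in lines:
            pair = map_line(source, line)
            if pair is not None:
                username, record = pair
                index[username].append(record)
    for records in index.values():
        records.sort()  # ISO-style timestamps sort correctly as strings
    return dict(index)


# Hypothetical sample data standing in for two different access logs.
logs = {
    "radius": [
        "2009-04-29T22:10:05 alice auth ok",
        "2009-04-29T22:10:01 bob auth failed",
    ],
    "mail": ["2009-04-29T22:09:59 alice pop3 login"],
}
index = build_index(logs)
```

Looking up `index["alice"]` would then return that customer's activity from both logs in time order. Whether this grouping runs as a periodic batch job or is kept up to date as lines arrive is exactly the open question in the post; the sketch only shows the shape of the output.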