Huge amounts of information are flooding companies every second, which has led to an increased focus on big data and the ability to capture and analyze this sea of information. Enterprises are turning to big data and Apache Hadoop in order to improve business performance and provide a competitive advantage. But to unlock business value from data quickly, easily and cost-effectively, organizations need to find and deploy a truly reliable Hadoop infrastructure that can perform, scale, and be used safely for mission-critical applications.
As more Hadoop projects are deployed to provide actionable results in real time or near real time, low latency has become a key factor in a company's choice of Hadoop distribution. Performance and scalability should therefore be evaluated closely before choosing a particular Hadoop solution.
Performance
The raw performance of a Hadoop platform, meaning how quickly it can ingest, process and analyze information, is critical. The MapR Distribution for Hadoop in particular provides world-record performance for MapReduce operations on Hadoop. Its advanced architecture combines distributed metadata with an optimized shuffle process, delivering consistent high performance.
The graph below compares the MapR M7 Edition with another Hadoop distribution, and it vividly illustrates the vast difference in latency and performance between these Hadoop distributions.
One particular solution that is optimized for performance is Cisco UCS with MapR. MapR on the Cisco Unified Computing System (Cisco UCS) is a powerful, production-ready Hadoop solution that increases business and IT agility, supports mission-critical workloads, reduces total cost of ownership (TCO), and delivers exceptional return on investment (ROI) at scale.
The Cisco/MapR solution for Hadoop is designed, tested and validated to handle the most demanding workloads. The MapR and Cisco UCS combination brings the power of the MapR distribution to a dependable deployment model that can be quickly implemented and customized using the Cisco Unified Fabric and powerful Cisco UCS rack servers. In addition, MapR has integrated with Cisco Tidal Enterprise Scheduler (Cisco TES), making it easy for administrators to provide automated load balancing, data exchange, and advanced event-based scheduling on MapR.
Scalability
Another attribute of any Hadoop platform is its scalability, which is the ability to expand in terms of number of nodes, node density, tables, files, etc. When looking at various Hadoop alternatives, make sure that your choice of Hadoop platform can scale easily and cost-effectively without requiring administrators to make changes to application logic.
The MapR Distribution for Hadoop is fully optimized for scalability. The Cisco UCS solution for MapR is based on the Cisco UCS Common Platform Architecture (CPA) for Big Data (Cisco Validated Design). The Cisco UCS CPA is a highly scalable architecture designed to meet a variety of scale-out application demands with transparent data and management integration capabilities. Whether you're deploying a large data center or buying single racks, the Cisco UCS CPA with MapR solution can be sized to deliver advanced performance, enabling Hadoop to scale as your workload increases.
Cisco UCS CPA for Big Data integrates industry-leading computing, networking, and management capabilities into a unified, fabric-based architecture optimized for big data workloads.
If you're looking to Hadoop to help you unlock business value from your data, it's important to consider your Hadoop distribution choice carefully. With Cisco UCS and MapR, you'll benefit from maximum availability, high performance and scalability.
By the way, Cisco will be showcasing our Unified Data Center portfolio at Red Hat Summit in San Francisco from April 14th to April 17th and at Cisco Live San Francisco from May 18th to May 22nd. Stop by and say hello. If you have any comments or questions, let me know there or on Twitter at @CicconeScott.