Posts tagged ‘Cloudera’

6
Nov

What is a “Hadoop”? Explaining Big Data to the C-Suite


Keep hearing about Big Data and Hadoop? Having a hard time explaining what is behind the curtain?

The term “big data” comes from computational sciences to describe scenarios where the volume of the data outstrips the tools to store it or process it.

Three reasons why we are generating data faster than ever: (1) Processes are increasingly automated; (2) Systems are increasingly interconnected; (3) People are increasingly “living” online.

As huge data sets invaded the corporate world, new tools emerged to help process big data. Corporations have to run analysis on massive data sets to separate the signal from the noise. Hadoop is an emerging framework for Web 2.0 and enterprise businesses that face a data deluge and must store, process, index, and analyze large amounts of data as part of their business requirements.

So what’s the big deal? The first phase of e-commerce was primarily about cost and enabling transactions, and everyone got really good at this. Then we saw differentiation around convenience: fulfillment excellence (e.g., Amazon Prime) or relevant recommendations (if you bought this, then you may like that: the next best offer).

Then the game shifted as new data mashups became possible: seeing who is talking to whom in your social network, seeing who you are transacting with via credit-card data, looking at what you are visiting via clickstreams, seeing which ads influence you via clickthrough data, leveraging where you are standing via mobile GPS location data, and so on.

The differentiation is shifting to turning volumes of data into useful insights to sell more effectively. For instance, eBay apparently has 9 petabytes of data in its Hadoop and Teradata clusters. With 97 million active buyers and sellers, it handles 2 billion page views and 75 billion database calls each day. eBay, like others, is racing to put in the analytics infrastructure to (1) collect real-time data; (2) process data as it flows; (3) explore and visualize. Read more »

15
May

New Tools for New Times – Primer on Big Data, Hadoop and “In-memory” Data Clouds


Data growth curve:  Terabytes -> Petabytes -> Exabytes -> Zettabytes -> Yottabytes -> Brontobytes -> Geopbytes.  It is getting more interesting.
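To make the jumps on this curve concrete, here is a minimal sketch that treats each step as a factor of 1,000. It assumes the decimal convention, and it assumes the informal names "brontobyte" and "geopbyte" mean 10^27 and 10^30 bytes, which is common usage rather than an official standard. The helper names `bytes_in` and `humanize` are made up for illustration.

```python
# Decimal (power-of-1000) byte units. The last two names are informal,
# and their sizes here (10**27, 10**30) are an assumption, not a standard.
UNITS = ["bytes", "kilobytes", "megabytes", "gigabytes", "terabytes",
         "petabytes", "exabytes", "zettabytes", "yottabytes",
         "brontobytes", "geopbytes"]


def bytes_in(unit: str) -> int:
    """Return how many bytes make up one of the given unit."""
    return 1000 ** UNITS.index(unit)


def humanize(n_bytes: int) -> str:
    """Express a byte count in the largest unit that keeps the value >= 1."""
    value = float(n_bytes)
    idx = 0
    while value >= 1000 and idx < len(UNITS) - 1:
        value /= 1000
        idx += 1
    return f"{value:g} {UNITS[idx]}"
```

For example, `humanize(9 * bytes_in("petabytes"))` gives back "9 petabytes", and one exabyte is a thousand of those petabytes.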

Analytical Infrastructure curve: Databases -> Datamarts -> Operational Data Stores (ODS) -> Enterprise Data Warehouses -> Data Appliances -> In-Memory Appliances -> NoSQL Databases -> Hadoop Clusters

———————

In most enterprises, public or private, there is typically a mountain of structured and unstructured data that contains potential insights about how to serve customers better, engage with them better, and make processes run more efficiently.  Consider this:

  • Online firms, including Facebook, Visa, and Zynga, use Big Data technologies like Hadoop to analyze massive amounts of business-transaction, machine-generated, and application data.
  • Wall Street investment banks, hedge funds, and algorithmic and low-latency traders are leveraging data appliances such as EMC Greenplum hardware with Hadoop software to do advanced analytics in a “massively scalable” architecture.
  • Retailers use HP Vertica or Cloudera to analyze massive amounts of data simply, quickly, and reliably, resulting in “just-in-time” business intelligence.
  • New public and private “data cloud” software startups capable of handling petascale problems are emerging to create a new category: Cloudera, Hortonworks, Northscale, Splunk, Palantir, Factual, Datameer, Aster Data, TellApart.

Data is seen as a resource that can be extracted, refined, and turned into something powerful. It takes a certain amount of computing power to analyze the data and pull out and use those insights. That is where new tools like Hadoop, NoSQL, in-memory analytics, and other enablers come in.

What business problems are being targeted?

Why are some companies in retail, insurance, financial services, and healthcare racing to position themselves in Big Data and in-memory data clouds while others don’t seem to care?

World-class companies are targeting a new set of business problems that were hard to solve before: modeling true risk, customer churn analysis, flexible supply chains, loyalty pricing, recommendation engines, ad targeting, precision targeting, PoS transaction analysis, threat analysis, trade surveillance, search quality fine-tuning, and mashups such as location + ad targeting.

To address these petascale problems, an elastic/adaptive infrastructure for data warehousing and analytics is emerging, converging on three capabilities:

  • Ability to analyze transactional, structured, and unstructured data on a single platform
  • Low-latency in-memory or solid-state device (SSD) storage for super-high-volume web and real-time apps
  • Scale-out on low-cost commodity hardware, with processing and workloads distributed across machines
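The third point, scaling out by distributing the work, can be illustrated with a toy map/reduce word count. This is only a sketch of the pattern that Hadoop popularized, not Hadoop's API. Threads on one machine stand in for cluster nodes, and the function names (`partition`, `map_partition`, `reduce_counts`, `word_count`) are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(records, n_parts):
    """Split records into n_parts slices (stand-ins for cluster nodes)."""
    return [records[i::n_parts] for i in range(n_parts)]


def map_partition(lines):
    """Map step: count words within one partition only."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def reduce_counts(partials):
    """Reduce step: merge the per-partition counts into one total."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def word_count(records, n_parts=3):
    """Run the map step on each partition concurrently, then reduce."""
    parts = partition(records, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(map_partition, parts))
    return reduce_counts(partials)
```

Calling `word_count(["big data", "big clusters", "data data"])` counts "data" three times and "big" twice. The point of the design is that each partition is processed without knowing about the others, so adding machines (here, workers) adds capacity.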

As a result,  a new BI and Analytics framework is emerging to support public and private cloud deployments.

Read more »

1
May

The Vendor Landscape of BI and Analytics


“In God we trust, all others bring data”
—————————-

“Raw Data -> Aggregated Data -> Intelligence -> Insights -> Decisions” is a differentiating causal chain in business today.  To service this “data->decision” chain, a very large industry is emerging.

Business Intelligence, Performance Management, and Data Analytics form a large, confusing software category with multiple sub-categories: mega-vendors (full stack), niche vendors, data discovery, visualization, data appliances, Open Source, Cloud – SaaS, Data Integration, Data Quality, Mobile BI, Services, and Custom Analytics.

But the interest in BI and analytics is surging. Arnab Gupta, CEO of Opera, states why analytics are taking center stage: “We live in a world where computers, not people, are in the driver’s seat. In banking, virtually 100% of the credit decisions are made by machines. In marketing, advanced algorithms determine messages, sales channels, and products for each consumer. Online, more and more volume is spurred by sophisticated recommender engines. At Amazon.com, 40% of business comes from its ‘other people like you bought…’ program.”  (Businessweek, September 29, 2009).

Here is a list of vendors who participate in this marketspace:

Read more »