If the analytics team wrestles with getting access to data, how timely are the insights?
To address this question, global CIOs are shifting their strategy: "I need to build a data-as-a-service offering for my data" to enable the analytics users in the organization. The more advanced CIOs are asking, "How should I build data science capabilities as a shared foundation service?"
The CIO challenge is not trivial. Successful organizations today operate within application and data ecosystems that extend across front-to-back functions (sales and marketing all the way to fulfillment and service) and well beyond their own boundaries. They must connect digitally to their suppliers, partners, distributors, resellers, regulators, and customers. Each of these has its own "data fabric" and applications that were never designed to connect, so with all the data-as-a-service and big data rhetoric, the application development community is being asked to "work magic" in bringing them together.
Underutilization and the complexity of managing growing data sprawl are not new. But the urgency to address them has increased dramatically over the last several years. Data-as-a-Service (DaaS) is seen as a big opportunity to improve IT efficiency and performance through centralization of resources. DaaS strategies have proliferated in the last few years with the maturation of technologies such as data virtualization, data integration, MDM, SOA, BPM, and Platform-as-a-Service.
The questions accelerating the Data-as-a-Service (DaaS) trend: How do we deliver the right data to the right place at the right time? How do we "virtualize" the data often trapped inside applications? How do we support changing business requirements (analytics, reporting, and performance management) in spite of ever-changing data volumes and complexity?
The Execution Challenge for DaaS
Typical analytics environments are complex programming environments (e.g., SAS) separated from the sources of data, leading to costly data movement, cumbersome model deployment, and all of the errors and delays associated with silos of development. This challenge is multiplied with the information volumes associated with Big Data.
Extracting data from the source databases and application silos (like SAP) with traditional analytics programs is slow and complex, prompting most enterprises to leverage subsets or samples of the available information. While samples are sufficient in certain contexts, enterprises require a “DaaS” platform capable of delivering accurate, actionable, and timely data no matter the size or the location of the data.
DaaS strategies are becoming extraordinarily complex as real-time and low-latency business processes become the target state. They have to integrate .NET applications, mobile devices, the cloud, and social data into a universal connectivity fabric. Informatica, at their analyst meeting, presented a great layered figure that illustrates the evolving DaaS challenge facing us as we migrate toward the Internet of Things.
Given this growing complexity, enterprise DaaS strategy and infrastructure are a core focus area for business-unit and enterprise CIOs. The reasons include:
- Enterprise Data Warehouse (EDW) strategies are increasingly giving way to cross-enterprise Data-as-a-Service (DaaS) strategies.
- Structured and unstructured data growth forces the evolution to DaaS; otherwise, chaos ensues.
- As data in application silos becomes a centralized corporate/enterprise asset, DaaS infrastructure becomes critical.
- To deliver any form of real-time enterprise analytics, operational transparency, or KPIs, you need DaaS in place first.
In the early years of this market, most DaaS was focused primarily on the financial services, telecom, and government sectors. However, in the past 24 months, we have seen a significant increase in adoption in the healthcare, insurance, retail, manufacturing, eCommerce, and media/entertainment sectors.
Data as a Service (DaaS) Use Cases
DaaS value is articulated well by Sam Hamilton, vice president of data at PayPal, in an MIT/SAS survey: "[With a DaaS foundation] we have gone from report creation that takes weeks or months to deliver, to self-service, real-time data analysis. And we have gone from data analysis done by a small group of analysts to data-driven decisions throughout PayPal, made by most of the staff. All of this has progressively shrunk the latency of time to value of data."
Data as a Service (DaaS) is based on the concept that fragmented transaction, product, and customer data can be provided on demand to the user regardless of the geographic or organizational separation of provider and consumer. Additionally, the emergence of PaaS and service-oriented architecture (SOA) has rendered the actual platform on which the data resides irrelevant as well.
Data as a Service (DaaS) has many target use cases:
- providing a single version of the truth
- integration of data from multiple systems of record
- integrating data across different systems of engagement
- enabling real-time business intelligence (BI)
- high-performance scalable transaction processing
- federating views across multiple domains
- improving security and access
- integrating with cloud and partner data and social media
- delivering real-time information to mobile apps
- enterprisewide search
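Several of these use cases, notably federating views across domains and providing a single version of the truth, can be illustrated with a minimal sketch. The example below merges records from two hypothetical systems of record into one on-demand view; the record names and fields are illustrative assumptions, not any specific product's API:

```python
# Two hypothetical systems of record, represented as in-memory stores.
crm_records = {
    "C100": {"name": "Acme Corp", "email": "ops@acme.example"},
}
billing_records = {
    "C100": {"balance": 1250.00, "currency": "USD"},
}

def federated_customer_view(customer_id):
    """Merge fields from each system of record into a single on-demand view."""
    view = {"customer_id": customer_id}
    view.update(crm_records.get(customer_id, {}))
    view.update(billing_records.get(customer_id, {}))
    return view

print(federated_customer_view("C100"))
```

In a real DaaS layer the two stores would be live services or databases, but the consuming application sees only the unified view, which is the point of the federation use case.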
Organizations are looking to solve tough data and process integration challenges as they once again begin to invest in new business capabilities.
What is Data-as-a-Service (DaaS)?
Data as a Service (DaaS) is built on the notion that data-related services (aggregation, quality, cleansing, and enrichment) can happen in a centralized place, with the resulting data offered to different systems, applications, or mobile users, irrespective of where they reside. DaaS is a major enabler of the Master Data Management (MDM) concept.
Master Data has been the holy grail of enterprise data management for almost two decades now. The focus for most firms is on a single version of the truth, or Golden Source, for "Product", "Customer", "Transaction", and "Supplier" data. Why? Fragmented, inconsistent Product data slows time-to-market, creates supply chain inefficiencies, results in weaker-than-expected market penetration, and drives up the cost of compliance. Fragmented, inconsistent Customer data obscures revenue recognition, introduces risk, creates sales inefficiencies, and results in misguided marketing campaigns and lost customer loyalty. Fragmented, inconsistent Supplier data reduces efficiency, negatively impacts spend-control initiatives, and increases the risk of supplier exceptions.
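The Golden Source idea can be sketched as survivorship rules applied to fragmented records. The sketch below assumes one common MDM convention, "most recent non-null value per field wins"; the source systems, records, and field names are all hypothetical:

```python
from datetime import date

# Hypothetical fragmented customer records from three silos.
fragments = [
    {"source": "erp", "updated": date(2013, 1, 5),
     "name": "ACME CORP", "phone": None},
    {"source": "crm", "updated": date(2013, 6, 1),
     "name": "Acme Corp", "phone": "555-0100"},
    {"source": "web", "updated": date(2012, 11, 20),
     "name": "Acme", "phone": "555-0199"},
]

def golden_record(fragments, fields=("name", "phone")):
    """Build a single 'golden' record: latest non-null value per field wins."""
    record = {}
    for field in fields:
        candidates = [f for f in fragments if f.get(field) is not None]
        if candidates:
            latest = max(candidates, key=lambda f: f["updated"])
            record[field] = latest[field]
    return record

print(golden_record(fragments))
# {'name': 'Acme Corp', 'phone': '555-0100'}
```

Real MDM hubs support richer survivorship rules (source-system priority, data-quality scores), but the principle of collapsing fragments into one trusted record is the same.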
DaaS solutions provide the plumbing that enables MDM playbooks. They provide the following advantages:
- Agility (and Time to Market) – Customers can move quickly due to the consolidation of data access and the fact that they don't need extensive knowledge of the underlying data. If customers require a slightly different data structure or have location-specific requirements, implementation is easy because the changes are minimal.
- Cost-effectiveness – Providers can build the base with the data experts and outsource the presentation layer, which makes for very cost-effective report and dashboard user interfaces and makes change requests at the presentation layer much more feasible.
- Data quality – Access to the data is controlled via data services, which tends to improve data quality, as there is a single point for updates. Once those services are tested thoroughly, they only need to be regression tested if they remain unchanged for the next deployment.
- Cloud-like efficiency, high availability, and elastic capacity – These benefits derive from the virtualization foundation: one gets efficiency from the high utilization of shared physical servers, availability from clustering across multiple physical servers, and elastic capacity from the ability to dynamically resize clusters and/or migrate live cluster nodes to different physical servers.
Agility (and time to market) is probably the most important driver for DaaS, even more so than cost.
Data-as-a-Service (DaaS) Elements
Client need: "I want to enable the MDM strategy. I want to build a data-as-a-service offering for my data" for the rest of the organization.
Components to enable this are as follows:
1) Data acquisition – data can come from any source: data warehouses, emails, portals, third-party data sources.
2) Data stewardship and standardization – boil the data down to a standard, whether manually or automatically.
3) Data aggregation – build the data warehouse for acquisition with a strong service- and technology-driven quality-control mechanism, rather than writing 100 one-off ETL programs.
4) Data servicing – via web services, extracts, reports, etc. Make it easy for the end user to consume, either machine-to-machine or directly via a reporting universe.
5) Data consumption – via visualization tools like Tableau Software or Spotfire, and in-memory data association and discovery tools like QlikView.
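The components above can be sketched end to end as stages in a minimal pipeline. The data, function names, and the choice of JSON as the servicing format are illustrative assumptions, not a prescribed design:

```python
import json

# Illustrative DaaS flow: acquisition -> standardization -> aggregation -> servicing.

def acquire():
    """1) Acquisition: pull raw records from any source (hard-coded here)."""
    return [
        {"cust": " acme corp ", "amount": "100.50"},
        {"cust": "ACME CORP", "amount": "49.50"},
    ]

def standardize(rows):
    """2) Stewardship/standardization: normalize names and types."""
    return [{"cust": r["cust"].strip().title(), "amount": float(r["amount"])}
            for r in rows]

def aggregate(rows):
    """3) Aggregation: total amount per standardized customer."""
    totals = {}
    for r in rows:
        totals[r["cust"]] = totals.get(r["cust"], 0.0) + r["amount"]
    return totals

def serve(totals):
    """4) Servicing: expose the aggregate as JSON, easy to consume."""
    return json.dumps(totals)

payload = serve(aggregate(standardize(acquire())))
print(payload)  # {"Acme Corp": 150.0}
```

Consumption (step 5) would then happen in a visualization or discovery tool reading this service output; note how standardization collapses the two spellings of the customer name before aggregation, which is exactly the quality-control point made above.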
All these capabilities come together around the data logistics chain. The last few decades have seen a dramatic shift in how data is handled in companies. Firms are shifting away from a hierarchical, one-dimensional enterprise data warehouse (EDW) initiative (with fixed data sources) to a fragmented network that favors strategic partnerships with external data sources. This phenomenon causes ripple effects throughout the old data logistics network. Data-as-a-Service (DaaS), at its core, is a way to address this problem of fragmentation.
Behavioral Politics around DaaS
In many organizations the individual who owns the data has power. They can determine who is in the know, and in addition they can shape the “story”. One of the key benefits of DaaS is fast, low cost access to the data. Removing barriers to data access will impact the level of control/power of the current data owner.
Best-practice case studies inform us that a DaaS effort focused on critical enterprise data must be a joint effort between business and IT, and it often requires senior executive (CEO?) support to get over ownership issues. Senior-level engagement is typically driven by ROI business cases, and this may be part of an engagement or offering.
The challenge for DaaS may be more around organizational alignment than technical deployment. One of the key drivers of a DaaS environment is the integration of data from multiple systems of record. Different systems of record are likely to have different data definitions and hierarchies. Metadata management and data integration services are important in this situation.
Data-as-a-Service (DaaS) is a combination of applications and technologies that consolidates, cleans, and augments source enterprise data, and synchronizes it with all applications, business processes, and analytical tools. The target goal: significant improvements in operational efficiency, reporting, and fact-based decision-making.
Domain knowledge, application knowledge, people/talent, processes, and technology platforms are the key requirements of a DaaS strategy.
Obviously, the market leaders want to position themselves to become the experts in knowing the underlying data so everyone else in the organization does not have to; domain expertise becomes really important here.
- See also: A Very Short History of Data Science – a good overview timeline that traces the evolution of the term "Data Science" and its use, attempts to define it, and related terms.
- Platform as a Service (PaaS) is being applied to Enterprise Data
- Data Virtualization is a pre-cursor to DaaS. Vendors include: Composite Software, Denodo Technologies, IBM, Informatica, Microsoft, Oracle, and Red Hat. Other vendors who fill pieces of the DaaS puzzle include Endeca Technologies, Gigaspaces, Ipedo, Memcached, Pentaho, Quest Software, Talend, and Terracotta.
- A variety of technologies comprise the DaaS category including distributed data caching, search engines, elastic caches, information lifecycle management (ILM) solutions, data replication, data quality, data transformation, content management, and data modeling.
- The IT landscape has evolved into a complex array of different systems, applications, and technologies. This fragmented environment has created significant data problems that are beginning to impede business processes; reducing the ROI of Customer Relationship Management (CRM), Enterprise Resource Planning (ERP), and Supply Chain Management (SCM) initiatives; corrupting analytics; and costing corporations billions of dollars a year in rework. Improving enterprise data quality is a growing issue. This needs to be done in a coordinated fashion with the downstream and upstream data warehousing / analytical side of the business.