Mass Data Fragmentation: A Problem to Be Reckoned With

Jeff Byrne, Senior Analyst & Consultant


When it comes to the short list of IT priorities in 2019, data management is becoming a white-hot issue. In the era of big data and multicloud, organizations’ data is growing in velocity, volume and variety as never before. Whether your data is at the edge, on-premises, in the cloud, or somewhere in between, it’s one of your organization’s most valuable assets, and yet one of the most difficult to control and manage. As a result, many IT leaders today are asking the same question: how can we manage our data more effectively, so that it is not only secure and protected but also visible and accessible, and can be analyzed and used productively to help improve and grow our business?

Unfortunately, if you’re like most companies, the answer is not so simple. The reality is that companies’ data tends to be stored in silos across multiple locations. These can include:

  • Traditional use-case silos: Data is stored in backups, archives and numerous separate file shares. To manage these workloads, companies often use point products that do not integrate with one another and cannot be centrally managed, which makes data management even more challenging.
  • Secondary use-case silos: Multiple copies of the same data exist for test/dev, analytics and backup scenarios.
  • Geographically dispersed silos: Data silos exist on-premises, across regional offices, and in different public clouds.

And the problem is not just that your data is siloed; large chunks are likely also dark, invisible to users and apps. As a company grows, it accumulates data faster than it can process and analyze it, and much of that data is buried in storage systems and copies scattered around the organization, untagged, untapped and unrecognized.

Once data becomes siloed and in many cases also dark, it becomes a costly but largely invisible burden that can no longer benefit your organization. This problem is now becoming known as “mass data fragmentation”.

Though many firms don’t realize it, the problem is already huge: think of all the backup and archival copies, test/dev and analytics clones, and DR replicas of data that are routinely made in your organization. Based on Taneja Group research, companies tend to make 5 or more copies of a majority of their data, resulting in additional data stores that are 4 or 5 times the size of their primary data stores. As data copies proliferate, the problem continues to get worse. Who can possibly be aware of all this data and keep track of it, let alone manage and protect it?
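To make the copy arithmetic concrete, here is a minimal back-of-the-envelope sketch. It assumes each copy is a full, undeduplicated replica of the data it duplicates. The function name, the 100 TB primary store, and the 80% and 100% "fraction copied" values are illustrative assumptions, not figures from the Taneja Group research.

```python
def secondary_footprint_tb(primary_tb: float, fraction_copied: float, copies: int) -> float:
    """Estimate the capacity (in TB) consumed by copies of primary data.

    Simplifying assumption: every copy is a full replica of the copied
    fraction, with no deduplication or compression applied.
    """
    if primary_tb < 0 or copies < 0 or not 0.0 <= fraction_copied <= 1.0:
        raise ValueError("primary_tb and copies must be >= 0; fraction_copied must be in [0, 1]")
    return primary_tb * fraction_copied * copies


if __name__ == "__main__":
    primary_tb = 100.0  # hypothetical primary data store size
    for fraction in (0.8, 1.0):  # hypothetical share of data that gets copied
        secondary = secondary_footprint_tb(primary_tb, fraction, copies=5)
        ratio = secondary / primary_tb
        print(f"{fraction:.0%} of data copied 5 times: "
              f"{secondary:.0f} TB of copies ({ratio:.1f}x primary)")
```

Under these assumptions, five copies of 80% to 100% of the primary data yield secondary stores roughly 4 to 5 times the size of the primary store, which is in line with the range cited above.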

Simply google “mass data fragmentation”, and you’ll get a sense of the scope and severity of the issue.

As companies move their IT assets from on-premises to one or more public clouds, data becomes even more fragmented. Data often becomes captive in the cloud, due to proprietary data formats and the egress tax that service providers impose. In a recent Taneja Group study, we learned that cross-cloud data portability and avoidance of lock-in are the two primary motivators for adopting multicloud storage, and yet a majority of companies are not achieving those benefits today.

Mass data fragmentation leads to several significant challenges, including a lack of data visibility, the risk of regulatory non-compliance, and exploding storage requirements. Together, these issues increase management complexity and cost and, worse yet, reduce a company’s competitiveness and ability to serve its customers.

Companies have attempted to rein in the problem of data fragmentation using various approaches, such as deduplicating incoming data and limiting the number and frequency of copies. But at the end of the day, each of these attempts has fallen short.

How Do We Rein In Mass Data Fragmentation, Before the Problem Gets Any Bigger?

How can companies overcome the challenges of mass data fragmentation, even as their data stores continue to grow exponentially?

Attempts to address mass data fragmentation using conventional methods and technologies have failed. What’s needed is a new, breakthrough approach, based on innovative technology that supports existing data management processes but is not constrained by them.

Just as virtualization helped address the problem of underutilized islands of compute, a new paradigm is required to help organizations identify, reach and benefit from the vast pockets of inaccessible, untapped and often invisible data trapped in their data centers and beyond.

Here are seven characteristics we believe you should look for in a solution to address mass data fragmentation:

  • First and foremost, look for a solution based on an architectural platform designed to consolidate data silos and make data visible and accessible across location and silo boundaries, no matter how or where it resides. That means one platform for accessing your data and related services across on-prem and cloud environments, managed from a single user interface.
  • Given the sensitivity of much of this data, solutions to address mass data fragmentation should include data protection, security and compliance capabilities.
  • Find a solution that brings compute to where your data sits, both simply and cost-effectively. This will enable apps to run directly on the data, whether to secure and protect it or to analyze it and extract value. To take advantage of continuing innovation, focus on solutions that are extensible enough to let third-party apps fit into the architectural framework, and that enable your developers to write their own custom apps, all within the context of the same platform, UI and operating model.
  • Integrated machine learning capabilities are now table stakes: they let users gain operational and business insights from the vast troves of data residing in different departments and functions throughout your organization.
  • Emphasize multi-protocol support, so users and apps can access data in its native format.
  • Look for offerings with proven data resiliency and integrity.
  • Last but not least, focus on mass data fragmentation solutions that enable simple, SaaS-based management.

Some exciting new solutions aimed at overcoming mass data fragmentation challenges are now emerging, and they don’t require you to change your data collection, curation or management processes. Take a test drive of one or more of these solutions in your own environment to see how effectively they work for you.

Published by Jeff Byrne

Jeff brings to Taneja Group more than 25 years of marketing and operational experience at a variety of infrastructure software, systems and semiconductor companies. He focuses on all flavors of cloud and virtualization technologies, and also covers the intersection of these technologies with various types of storage. Jeff develops and leads primary research initiatives to help vendors better understand market trends and customer requirements, and in response, to adapt their products, solutions and messaging to more effectively address IT buyers’ needs. Jeff advises clients on issues ranging from product and competitive positioning to messaging and go-to-market programs, and helps companies to work through challenging product and technology transitions. Prior to joining Taneja Group, Jeff spent five years as Vice President of Marketing and later Vice President of Corporate Strategy at VMware, a leading provider of virtualization, cloud and mobility solutions acquired by EMC in 2004. Earlier in his career, Jeff held senior management positions at DG Systems, Dataquest, MIPS, and HP. He holds a Bachelor of Science degree in Math and Computational Sciences from Stanford University and an MBA from Harvard.
