Tighter Quarters for Big Data

A new tool for compressing complex data sets could lead to wider adoption of an analytical technique that may pave the way for improved software applications and development.

People have about 5,000 heartbeats per hour, or roughly 115,000 per day. If you wanted to analyze how the heart behaves over time, such as how the beats change when someone jumps, sleeps, or has a heart attack, a computer could need to store millions of bits of information. And data ordered by time has traits that make many traditional analytics methods burdensome to use.

Now, research from Justin O’Pella, Assistant Dean in the Kanbar College of Design, Engineering & Commerce at Thomas Jefferson University, describes a new way to compress features of the data, similar to how a jpg image is a compressed version of a higher-resolution picture.

In image storage, a smaller jpg image leaves more room on your phone or computer for other pictures, but still looks high quality. The same idea applies to data: if an ordered heartbeat data set takes up 36 million bits of computer memory for analysis, O’Pella’s method would compress important features of that data set into a mere 12 thousand bits, without sacrificing any of the structure the analysis needs. In other words, the 36 million bits are compressed to about 0.03 percent of the original size, well under one percent.

This matters because, as methods are adopted to analyze increasingly large sets of time series data such as heartbeat recordings, carrying out the analysis can be computationally challenging. A tool like O’Pella’s method, which compresses features of a data set with fidelity, is likely to make analysis with horizontal visibility graphs (HVGs, as they are known in the field) more user-friendly and accessible. The research is likely to have a large impact on developers creating the software that will use HVGs to crunch time-ordered data.

“It is always interesting to realize the contributions of theoretical mathematics to real-world applications, particularly when there are such rich insights waiting to be made from large data sets,” says O’Pella.

Article reference: Justin O’Pella, “Horizontal Visibility Graphs are Uniquely Determined by their Directed Degree Sequence,” Physica A: Statistical Mechanics and its Applications, DOI: 10.1016/j.physa.2019.04.159, 2019

Media Contact: Edyta Zielinska, 215-955-7359, [email protected].