How to Collaborate Over Data

The proliferation of big data has opened the floodgates to more information than ever before on how we work, consume and play, and social scientists are finally able to test theories and hypotheses on real-world data. However, with the exception of a few companies like Twitter that have created grants to give research institutions access to their data, most companies with large stores of data have taken a relatively tightfisted approach to data sharing. Keeping data in silos holds back potentially groundbreaking academic research and limits opportunities for private-public partnerships to move commercial innovation forward.

I believe a very simple exchange explains why academic data partnerships can be so valuable. Companies have access to hundreds of millions of data points that they cannot possibly analyze on their own. Academics have the resources and ability to do this research but often lack access to interesting, proprietary data. It’s a perfect scenario in which businesses can essentially trade data for “Ph.D. horsepower.”

Partnerships between private companies and academic institutions are not new. Pharmaceutical and medical device companies have long partnered with biomedical engineers at universities. With the recent flood of data, such partnerships should become far more common. The benefits speak for themselves: companies can use the research to build thought leadership, while academic partners gain by presenting it at conferences and submitting it to top-tier journals.

The benefits are clear, so why haven’t more companies adopted a pro-research stance toward their data? Because there are barriers, though they are certainly not insurmountable. The challenges and solutions include:

1. It’s not easy: First and foremost, nothing described above should suggest that these partnerships are easy to form. Academics and practitioners need to identify a project of mutual interest; then a research agreement needs to be structured, reviewed and executed. After that, the data need to be scrubbed of personal identifiers, shared via a secure server and often explained in painstaking detail to the researchers.

2. Data security: The second reason these relationships don’t proliferate is concern about data security. Stringent data-use policies, stipulated in the research agreements, are necessary. Not only do personal identifiers need to be scrubbed from the data, but so do client and location identifiers. There are also strict regulations governing who gets access to the data, where it’s stored and how it’s shared.
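The scrubbing step can be made concrete with a short example. What follows is a minimal Python sketch under stated assumptions, not any particular company’s pipeline: it assumes records arrive as dictionaries, uses hypothetical field names (`name`, `email`, `phone`, `client_id`, `location_id`, `hours_worked`), drops direct personal identifiers outright, and replaces client and location identifiers with keyed hashes (HMAC-SHA256) so researchers can still group records by client or location without seeing the raw identifiers.

```python
import hashlib
import hmac

# Hypothetical field names; a real dataset's schema will differ.
DROP_FIELDS = {"name", "email", "phone"}            # direct personal identifiers: removed outright
PSEUDONYMIZE_FIELDS = {"client_id", "location_id"}  # replaced with stable, keyed tokens


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Map a raw identifier to a stable token using HMAC-SHA256.

    The same input and key always yield the same token, so researchers can
    still group or join on it, but without the company-held key the token
    cannot be recomputed from a guessed identifier.
    """
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def scrub_record(record: dict, secret_key: bytes) -> dict:
    """Return a new dict with identifiers dropped or pseudonymized."""
    cleaned = {}
    for field, value in record.items():
        if field in DROP_FIELDS:
            continue  # never leaves the company
        if field in PSEUDONYMIZE_FIELDS:
            cleaned[field] = pseudonymize(str(value), secret_key)
        else:
            cleaned[field] = value  # non-identifying fields pass through unchanged
    return cleaned


if __name__ == "__main__":
    key = b"company-held-secret"  # hypothetical; kept by the company, never shared
    rows = [
        {"name": "Ana", "email": "ana@example.com", "client_id": "C-17",
         "location_id": "NYC-3", "hours_worked": 38},
        {"name": "Ben", "email": "ben@example.com", "client_id": "C-17",
         "location_id": "SF-1", "hours_worked": 41},
    ]
    for row in (scrub_record(r, key) for r in rows):
        print(row)
```

A keyed hash is used instead of a plain one because short identifiers such as client numbers could otherwise be recovered by hashing every plausible value. Scrubbing reduces re-identification risk without eliminating it, which is why the access and storage rules in the research agreement still matter.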

3. Cross-sector collaboration: It’s rare for companies, especially startups, to recruit academics into private-sector roles. This is particularly true in social sciences like economics, where tenure-track positions are considered the ultimate prize and private-sector roles a consolation. Researchers love to publish their work, so, paradoxically, these corporate-academic collaborations will likely flourish only after published results make their benefits better known.

4. Value proposition: A concept in the management literature holds that companies need to maintain a careful balance between “exploration” of new technologies and “exploitation” of existing ones. This kind of research and development falls on the exploration side of the equation, which means it can yield dividends, but the returns are less certain and harder to monetize. Add to that concerns about who owns the intellectual property resulting from any discovery, and it seems logical that companies might want to tread lightly.

5. Academic standards: The peer-review process is onerous, and with good reason: it is designed to ensure that only work meeting the highest academic standards is published. It requires solid data and a strong thesis, a level of rigor that has not permeated all parts of corporate America. Turning a five-page white paper that is valuable to the company into a full-length paper that is valuable to researchers takes a tremendous amount of work, even before the additional requests that inevitably accompany a “revise and resubmit” letter from any journal.

While some obstacles clearly stand in the way of these academic-corporate partnerships, their value far outweighs the costs. As data warehouses grow larger and the supply of analysts in the private sector remains relatively inelastic, some companies are already taking novel approaches to data collaboration. For example, Kaggle hosts competitions in which it acts as a data intermediary: it procures data from private companies, makes the data available to data scientists and rewards whoever builds the most predictive model. Hopefully, data sharing will not only pave the way for thousands of new journal articles, conference presentations and books but will also give innovative companies the opportunity to generate novel insights from their massive data warehouses and put them into practice.