The Growing Data Opportunity
tl;dr Data silos are a problem, and data is only getting more fragmented. There's a lot of value in data for analytics, BI, and computer cognition, but it's impossible to capture that value when your data is scattered across many SaaS services and databases. Astronomer, a cloud-based tool for building unified data firehoses, is our vision for solving the problem of data silos.
I'm old enough to remember a time when:
- All of my company's data could be found in a locked closet at the company headquarters (and that room was hot!)
- All of our applications read and wrote data to a single production database housed in that closet. It comprised 100+ tables, each prefixed by the application or business function it pertained to (i.e. it was a fucking mess).
Regrettably, some companies are still living in this reality :(
In the late 90s, early SaaS pioneers NetLedger and Salesforce challenged our views of where software resides, but AWS wouldn't arrive until 2006.
In 2004, the mobile revolution started (for me). The Palm Treo 650 was the first mobile device that I used heavily (it had a 0.3 megapixel camera, baby!). But it would be three years until the first iPhone, and a few more after that before mobile adoption was widespread.
As computer technology rolls along, sometimes a perfect storm emerges that catalyzes a major transformation. We're entering one of these eras now, due to a confluence of trends:
- Computers are everywhere (mobile, wearables, IoT) and as a result "normal people" aren't afraid of technology anymore. This grows the population of data-ambitious people.
- Cloud computing is entering a major deployment phase (AWS, Google Cloud, Docker). Data will be in the cloud and from there it's staged to flow more fluidly.
- SaaS/microservices proliferation is spreading data across organizational borders, into many data silos.
- Computer cognition is emerging, and machine learning is data hungry.
As a result, machine-to-machine data communication will skyrocket, and this will catch a lot of companies unprepared.
Computers are Everywhere
Because we all have supercomputers in our pockets, on our desks, and soon in everyday objects all around us, responsibility for information technology has become dispersed. People are taking responsibility for their own technology, and IT departments are willingly releasing power, especially as the "bottom-up" SaaS product strategy gains popularity (e.g. Slack).
As more organizations master their own data, the population of people inside organizations learning to wield that data will grow.
Companies are certainly becoming more ambitious with data, but are their people being empowered? Most people don't have access to relevant customer data, and if they do, it usually arrives well after the fact.
Cloud computing
image source: reactionwheel.net
Companies are moving applications and databases to the cloud in droves. Cloud computing is in a deployment period.
For example, GE has already moved 1,000 apps to AWS, and moving the remaining 75% of its apps is a question of when, not if.
As more organizations adopt cloud computing, more of their data ends up in the cloud as a side effect, and there it is staged to flow more fluidly.
"Big data in the cloud democratizes the use of big data beyond large companies." — Peter Levine
Data in the cloud lets you give access to anyone. And if everyone has access to the organization's real-time data, the organization becomes more agile and can iterate more rapidly. This has the effect of flattening the organization, fostering a collaborative culture, and speeding up decisions.
SaaS/Microservices Proliferation
We are moving towards a tipping point where more of an organization's data is generated and stored outside of the company than within, but this varies by company.
With SaaS, this effect is fractal. If you look inside the company that you think has your data, the majority of their data (which is really your data) is in their SaaS tools, and so on down the rabbit hole (a few levels, at least.)
And even within your organization, your developers are likely adopting microservice architectures, using "the right database for the job" (read up on Hadoop, MongoDB, MySQL, Redshift, Cassandra, InfluxDB; the number of databases is exploding). Using the right database for the job is a technically sound decision, but there are side effects. There can be several databases per application, and usually these applications will not share databases with each other (sharing a database between services is widely considered an "anti-pattern").
This is a hard truth for some but a truth nonetheless: your data is becoming spread out and there's no going back to centralization.
Computer cognition
Finally, the most powerful trend, and one that is really just getting started: software can learn from data. Once we see self-driving cars in everyday life, we will scrutinize every job that humans do and ask ourselves "can a self-learning computer do this work?"
In the near future, analytics and business intelligence will be "table stakes" because computers are ready to do real work for us. And machine learning is very data hungry: it wants data from as many sources as you can provide. A lot of the time, providing more data is more effective than trying to improve the algorithms.
"Just as all real-world workflows became software, all software will become analytical. Computers will have cognition; machines will have intelligence." — Zetta Venture Partners
Will you isolate?
You have a choice: be on an isolated island or put your data on the wire. It reminds me of the early days of the Internet, when I would have serious conversations with large companies (especially the brick-and-mortar ones) about whether they really needed a website. I often lost those conversations, but they all came around eventually.
I see the same thing now with data. Any company that creates value also creates valuable data, and the smart ones will be prepared to share it.
"Data is incredibly valuable. It helps create superior products, it forms a barrier to entry, and it can be directly monetized." — Leo Polovets
99.9% of companies aren't even close to being prepared to capitalize on this data opportunity. A good early move is to begin gathering your data into a flexible, real-time firehose that you can wield in an agile way.
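To make "firehose" a little more concrete, here is a minimal Python sketch of the idea. It is not how Astronomer works; the `Event` envelope, the `ts` field, and the source names are all assumptions made up for illustration. It wraps records from two pretend silos in a common envelope and merges them into one time-ordered stream.

```python
import heapq
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Iterator


@dataclass(frozen=True)
class Event:
    """A common envelope for records from any source (hypothetical schema)."""
    timestamp: float          # seconds since some shared epoch
    source: str               # e.g. "crm" or "billing_db" (made-up names)
    payload: Dict[str, Any]   # the original record, untouched


def tag(source: str, records: Iterable[Dict[str, Any]]) -> Iterator[Event]:
    """Wrap raw records in the envelope; assumes each record has a 'ts' field."""
    for record in records:
        yield Event(timestamp=record["ts"], source=source, payload=record)


def firehose(*streams: Iterable[Event]) -> Iterator[Event]:
    """Lazily merge streams, each already sorted by time, into one stream."""
    return heapq.merge(*streams, key=lambda event: event.timestamp)


# Two pretend silos: a SaaS CRM export and an internal billing database.
crm = [{"ts": 1.0, "lead": "acme"}, {"ts": 4.0, "lead": "globex"}]
billing = [{"ts": 2.5, "invoice": 17}]

merged = list(firehose(tag("crm", crm), tag("billing_db", billing)))
for event in merged:
    print(event.timestamp, event.source, event.payload)
```

A real firehose has to deal with far more (live connections, out-of-order data, failures), but the shape is the same: many sources in, one stream out.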
This is why we're building Astronomer — to facilitate this vision.
— Ry
