From Behavioral Analytics to Data Science

Great companies create great products. We admire Apple, Tesla, Nike, and Netflix because of what they build. They are winning because they make sure that product innovation isn't reactive, but predictive.

The good news is that yours can be, too. The technology exists to understand what users do (or don't do) before, during, and after they experience your product.

Astronomer is a means to that end -- but let's start at the beginning...

The USERcycle era

Before we were Astronomer, we were USERcycle, a platform for user analytics. We realized, however, that all of our customers had the same problem: getting data into their analytics platform was very painful. What if we solved that problem? We could be a platform for user event -- or clickstream -- data collection.

After all, user event data has multiple uses and it's low-hanging fruit. Everybody has a website and apps. Some companies have great product managers; most don't. And it's quite rare to see data used well. We felt that needed to change.

The first step was making user behavioral analytics easier. But that quickly evolved when our clients started needing access to more types of data, asking us to extend our clickstream infrastructure. We realized then that a component of our product -- Apache Airflow, which we used in a variety of ways -- could solve quite a few customer problems out of the box.

Airflow is a workflow management system created by engineers at Airbnb to schedule, deploy and monitor dependency-based data pipelines.
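To make "dependency-based" concrete, here is a minimal sketch in plain Python. It does not use Airflow's API; it only illustrates the core idea that a task runs after everything it depends on. The task names and the `run_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
pipeline = {
    "extract_events": set(),
    "extract_crm": set(),
    "join_sources": {"extract_events", "extract_crm"},
    "load_warehouse": {"join_sources"},
}


def run_order(tasks):
    """Return one valid order in which every task follows its dependencies."""
    return list(TopologicalSorter(tasks).static_order())


order = run_order(pipeline)
print(order)
```

A scheduler like Airflow layers timing, retries and monitoring on top of this kind of ordering.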

We started by exposing Apache Airflow to our internal team to enable them to move more quickly. Then we thought, we work with smart customers. They have engineers, too. Could they derive value from Airflow? The quick answer seemed to be yes, but it was always in conjunction with clickstream data, not as a separate entity.

As we were building tools, we realized little by little that moving all data around is critical to product analytics ... and frankly, nobody is good at it. Suddenly, we were a total data hub with two offerings: a Clickstream module and an Apache Airflow module. We were becoming a platform for modern data engineering.

This trend continued: we stepped into different parts of the ecosystem to provide customers with value, and they pointed us to another problem. Every problem had to do with moving data. So the growth of the platform has been completely organic -- and will continue to be. Any tool is disposable; it's the data engineering capability that's not.

Currently, our organic growth involves further refactoring our Clickstream module to pure open source, and making it easier to deploy to a private cloud. And while we were using Apache Airflow to move data, that's not really where it shines -- there are better technologies emerging. So we are investing in Kafka to build out a next-level real-time ETL streaming module.

We'll innovate however we have to, so that others can do the same.

Our First Product, A Good First Step

Clickstream is now relatively simple. You might need a few minutes of dev time, but the setup is fast. It's a perfect first step for any company (and an initiative is fairly painless to kick off). Clickstream's sole focus is ingestion -- no transformation, processing or manipulation. Ingestion is the first thing to tackle because if you get down the road with logic too early, you won't have the right data, and the whole process will need to be reworked once you do.
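As a rough illustration of what "ingestion only" means, the sketch below stores an event exactly as it arrives and adds nothing but delivery metadata. It is not Astronomer's Clickstream code; the `ingest` function, the envelope fields and the in-memory `sink` are all assumptions made for the example.

```python
import json
import time
import uuid


def ingest(raw_event, sink):
    """Append the event as received, adding only delivery metadata."""
    envelope = {
        "message_id": str(uuid.uuid4()),   # hypothetical field name
        "received_at": time.time(),        # hypothetical field name
        "payload": raw_event,              # untouched: no transformation or filtering
    }
    sink.append(json.dumps(envelope))
    return envelope["message_id"]


sink = []  # stands in for a queue, log or object store
event = {"type": "track", "event": "Signed Up", "user_id": "u-123"}
message_id = ingest(event, sink)
stored = json.loads(sink[0])
```

Because the payload is kept verbatim, any transformation logic can be written (and rewritten) later without re-collecting the data.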

Of course, the ingestion market is more mature, so the window of opportunity will close sooner. But at that point, Apache Airflow and Kafka Connect will be accessible through the "command center." Companies at any stage can sign up and access one, two or three of these "modules."

You can think of us like a platform with multiple applications on top, which means we're going to use the same components to drive different applications and can freely share these (open source) tools. Then organizations will install one platform and get different tools. It's worth it, even if they don't use all of them at first, because deployment is so simple. And when they do grow, the tool's there. All they have to do is turn it on.

So while we're currently a great alternative to something like Segment, we're built for so much more. And as we grow into more of a data engineering platform, we'll continue to focus on whatever our customers need. It might be building out hundreds of clickstream integrations, but it also might be operationalizing a vendor-neutral data warehouse or streaming third-party data to a private data lake. It's all about being agile enough to give our customers what they need to be agile.

It All Comes Down to Agility

This agile, Lean Startup mentality has always resonated: build something small, measure it, learn from it, make changes. Lots of people have read the book, but most aren't practicing it. And once the few who are doing it with data take off, it's going to be too late for everyone else. That's the nature of technology. Our tools are centered around that concept. We want people to be able to try new tools, and quickly.

I was talking to a friend about the need for this the other day. Let's call him Joe. He works at an enterprise company that just signed a three-year project with a top-five consulting firm to build out their Hadoop data structure. But Joe and the others who actually engineer the data know that's way too slow. The problem is, everything about their business will be different in three years. Not to mention, Joe has to deliver value today. So he has shadow IT building what they need, in an agile way. We want to make that easier, whether it has to be done in the shadows or in the light.

The thing is, we have to get people to buy into the vision -- and when we find someone willing to think outside the box to create an excellent product, we equip them with the tools they need to keep on growing. Data engineering is required for any kind of business today, whether you're building a product or offering services. You need data to make decisions and provide value. But you shouldn't be spending your time building out a platform when that's someone else's core competency.

Agility for Us, Too

While we're enabling this agility, we have to operate in the same way with our products. Yet we want to maintain a reliable, rock-solid core. Our solution is what we call the Houston API, built with GraphQL. Even if a module gets "retired," the API stands firm. It's the glue that holds the frontend and backend together, no matter what technologies we want to switch out on either end. Everything is interchangeable, but the data engineering hub remains strong. And customers can count on access to the technology they signed up for.

That, too, was driven by customer needs. They wanted the option to interact with the API and ignore the frontend altogether, so we gave it to them. They can connect their systems to ours and create pipelines programmatically. Because the API is GraphQL, developers get tooling to tinker with it and explore it. That's a big differentiator from our competitors.
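For a sense of what creating a pipeline programmatically could look like, here is a sketch that builds a GraphQL request body in Python. The Houston schema isn't shown in this essay, so the `createPipeline` mutation, its arguments and the pipeline name are hypothetical; only the JSON shape (`query` plus `variables`) is the usual convention for GraphQL over HTTP.

```python
import json

# Hypothetical mutation: the real Houston schema may look quite different.
CREATE_PIPELINE = """
mutation CreatePipeline($name: String!, $schedule: String!) {
  createPipeline(name: $name, schedule: $schedule) {
    id
    name
  }
}
"""


def build_request(query, variables):
    """Serialize a GraphQL operation into the JSON body usually sent via HTTP POST."""
    return json.dumps({"query": query, "variables": variables})


body = build_request(
    CREATE_PIPELINE,
    {"name": "nightly_orders", "schedule": "0 2 * * *"},
)
decoded = json.loads(body)
```

The same body could be posted from any HTTP client, which is what lets a customer skip the frontend entirely.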

So Who Needs Astronomer?

Companies that are aligned and operate (or want to operate) with agility to do better analytics -- not just product and behavioral analytics, but also data science or more -- are perfect fits for us. Or, if you love this idea of data engineering and having "librarians" working in your org to decide where data goes, we're a good fit. Some companies are willing to spend millions to get the right data structure, but three-year consulting projects aren't the answer anymore -- the world is moving too fast. Those who want to iterate to success are a good fit.

Another way to look at it is like this: If you're dissatisfied with how you're doing analytics today or feeling pressured by a competitor and can't afford to wait three years, Astronomer can help. Quick innovation is something that many companies aren't equipped to do, especially in certain industries. Big consulting firms aren't equipped either. It requires thinking outside the box and solving a very real problem at a striking velocity.

— Ry