Mattermark, a popular startup providing data on private companies, is young and relatively small from an IT infrastructure perspective, but it's not immune from the types of operational inefficiencies that often plague larger, more established companies. And, like a growing number of those larger companies, Mattermark is using Apache Mesos, as well as the Mesosphere-developed Marathon framework, to solve its problems.
The 2-year-old company, which runs on just a handful of Amazon Web Services instances and stores mere gigabytes of data, realized earlier this year that its IT operations had become untenable. One illustrative example of the problem has to do with data processing, specifically the hundreds of data mining, machine learning and indexing jobs that Mattermark runs on a daily basis, but that previously were launched haphazardly without much documentation.
"We would have lots of EC2 instances running things that are very important, but no one knew which one was running what," explained Samiur Rahman, a machine learning engineer at Mattermark. "Very problematic."
The company's leadership knew that if it wanted to fulfill its goal of being the go-to source for data on private companies, something had to give. "We know we need to scale in the next year or two," Rahman says. "So either we can stay in this environment and keep building on it, or we can start cleaning it up and move to the modern infrastructures that a lot of companies are making big bets on."
Scheduling away with Mesos
Mattermark decided several months ago to re-architect its infrastructure with Mesos, and it had some specific requirements in mind for what a new system should be able to provide:
- An abstraction layer between developers and the company's AWS instances.
- The ability to distribute jobs across AWS instances.
- The ability to specify how many resources any given job needs.
- Fine-grained control over the scheduling of jobs.
- Resource isolation to prevent the noisy-neighbor problem.
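To make the resource-specification requirement concrete, here is a minimal sketch of how a job might declare its CPU and memory needs in a Marathon app definition, the JSON body that Marathon's REST API accepts at `/v2/apps`. The app IDs, Docker image names and resource figures are hypothetical, chosen only for illustration; they are not Mattermark's actual configuration.

```python
import json


def marathon_app(app_id, image, cpus, mem_mb, instances=1):
    """Build a Marathon app definition as a plain dict.

    Every value passed in below is an illustrative assumption.
    """
    return {
        "id": app_id,
        "cpus": cpus,            # CPU share requested per instance
        "mem": mem_mb,           # memory in MB requested per instance
        "instances": instances,  # how many copies Marathon should keep running
        "container": {
            "type": "DOCKER",
            "docker": {"image": image},
        },
    }


# A lightweight scraper and a heavyweight training job (hypothetical names).
scraper = marathon_app("/scrapers/web-scraper", "example/scraper:latest",
                       cpus=0.1, mem_mb=128, instances=4)
trainer = marathon_app("/ml/model-trainer", "example/trainer:latest",
                       cpus=4.0, mem_mb=8192)

payload = json.dumps(scraper, indent=2)
print(payload)
```

Sending a body like this to a Marathon endpoint with any HTTP client asks the scheduler to find room for the requested instances, so developers describe what a job needs rather than choosing which AWS instance it lands on.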
The company now runs Chronos and Marathon on top of Mesos and has seen great results. Mattermark has been able to schedule batch jobs in a controlled manner and better utilize its AWS infrastructure by intelligently packing different job types onto the same resources. One example is the ability to place low-power web-scraping tasks on the same instances as high-memory, high-CPU machine learning jobs.
"Something that can actually do this kind of resource allocation, allowing these workloads to coexist on the same resources, is very important," Rahman notes. The new setup automates processes in a reliable manner and also saves money by letting Mattermark use fewer AWS instances for the same amount of work.
Compared with historical server-monitoring best practices, where companies would start getting worried as capacity crept toward 100 percent, "Our monitoring is different because now hitting 80-90 percent is exactly what you want," he explains. "That means you're using your resources properly."
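The idea of packing small scraping tasks alongside heavy machine learning jobs can be sketched with a simple first-fit-decreasing heuristic. This is not how Mesos or Marathon allocate resources internally; it is a simplified stand-in used for illustration, and the job names, job sizes and node capacity are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    cpus: float
    mem_mb: int


@dataclass
class Node:
    cpus: float       # CPUs still free on this node
    mem_mb: int       # memory still free on this node
    jobs: list = field(default_factory=list)

    def fits(self, job):
        return job.cpus <= self.cpus and job.mem_mb <= self.mem_mb

    def place(self, job):
        self.cpus -= job.cpus
        self.mem_mb -= job.mem_mb
        self.jobs.append(job.name)


def pack(jobs, node_cpus, node_mem_mb):
    """First-fit decreasing: place the largest jobs first and
    open a new node only when no existing node has room."""
    nodes = []
    for job in sorted(jobs, key=lambda j: (j.mem_mb, j.cpus), reverse=True):
        target = next((n for n in nodes if n.fits(job)), None)
        if target is None:
            target = Node(node_cpus, node_mem_mb)
            if not target.fits(job):
                raise ValueError(f"{job.name} is larger than a whole node")
            nodes.append(target)
        target.place(job)
    return nodes


# Two heavy ML jobs plus six light scrapers (hypothetical sizes).
jobs = [Job("ml-train", 3.0, 12000), Job("ml-index", 3.0, 10000)]
jobs += [Job(f"scrape-{i}", 0.25, 256) for i in range(6)]

nodes = pack(jobs, node_cpus=4.0, node_mem_mb=15000)
for i, node in enumerate(nodes):
    print(i, node.jobs, "free cpus:", node.cpus, "free mem:", node.mem_mb)
```

In this toy run the eight jobs fit on two nodes, with the scrapers filling the CPU headroom left over by each machine learning job. That is the kind of high utilization Rahman describes as the goal rather than a warning sign.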
Although some of Mattermark's workloads remain on "bare" AWS instances (i.e., not Mesos worker nodes) backed by Elastic Load Balancer, Rahman added that several tasks requiring high availability are already running in the Mesos-Marathon environment. As Mattermark rolls out new workloads and job types, they'll all go into the new system.
"We're pretty committed at this point to working inside of Mesos," he says. And because Mattermark has already captured most of its application environments inside preconfigured Docker containers, it's actually a relatively easy process to move from pure AWS instances onto a Mesos cluster running on AWS instances.
A "positive Catch-22" with big data
In the case of Mattermark, though, the real value is how Mesos will allow the company to scale its infrastructure along with its business. And vice versa.
Although its 150-gigabyte MySQL database would hardly be considered "big data," Rahman says that Mattermark plans to grow its size pretty significantly in the years to come. One obvious reason is that the longer Mattermark is in business, the more historical data it will be storing about each company in its database.
More strategically, though, Mattermark hopes to scale its database from about 1 million companies to hundreds of millions of companies across the globe. As it adds more companies, it also wants to add more types of data about each—employee counts, website traffic, funding info, news mentions and social media growth, for example.
"Our scale is more around the speed at which we keep our data up to date and the speed at which we can grow the number of companies in our database," Rahman notes.
He thinks Mesos can help out a lot in this regard, because Mattermark can easily incorporate new data-processing technologies such as Kafka and Spark when it needs to, and easily add capacity to run bigger processing jobs. The less the company has to worry about whether its infrastructure is up to the task, the more it can focus on accessing, analyzing and delivering the right data at the right time.
"We want the operations to be ready to scale to the point where we're getting more data and we're doing more with the data," Rahman says. "And we still want to be able to give our customers the results in the same amount of time that we can right now."
There's no need to fear Mesos or the Datacenter Operating System
Looking into the future, Rahman says he'd love to see Mattermark move its operations from open source Mesos to the Mesosphere Datacenter Operating System, largely as a means to offload concerns over updating the software components and fixing bugs. While experimenting with an Early Access version of the DCOS, he was able to set up in 30 minutes a system that normally would have taken at least a couple of weeks.
"Startups should know about DCOS because it makes operations so easy," he said. And they shouldn't be afraid to experiment with it, or with the open source Mesos components, just because they don't operate at Yelp- or Apple- or Twitter-scale.
"The amount of developer freedom and operational efficiency Mesos has given us is worth any effort it took to re-architect our system," Rahman said. "As soon as you have more than one server, it's going to become worth it."