AWS as a legacy


AWS has a long history as a public cloud: more than ten years. Over those years, about 100 services covering nearly every possible use case have been launched. The question is whether some of the solutions AWS offers have become obsolete, and if so, by how much. Modern technologies develop quickly, and the recent rise of the OSS business model has allowed many interesting free products to enter the market.

Some services are certainly in good shape, simply because they are simple and hard to replace: S3 for data storage, EC2 virtual machines, or the SQS message queue. On the other hand, some technologies turn out, on close examination, to be replaceable by modern and cheaper solutions.

Recently, I partnered with a startup to optimize the cost of its AWS infrastructure. It is a small company serving a huge entertainment website with UGC (user-generated content). Of course, they collect and store as much analytical data as possible from the website and the application. The primary long-term storage is, of course, S3. Analytics and processing are performed with advanced AWS services.

AWS Redshift is a columnar database for storing and analyzing big data. Amazon uses Redshift, combined with MongoDB, to migrate from Oracle DB solutions. The technology, acquired many years ago, is based on a heavily modified PostgreSQL 8 kernel. It has many significant installations with more than 100 nodes and hundreds of petabytes of data.

On the other hand, there is a modern, free and open-source solution created from scratch: ClickHouse, by the Russian internet giant Yandex. Initially, the database was developed as the storage behind the Yandex.Metrica service (a competitor to Google Analytics). As the volume of collected data grew, the cluster and the cost of the service scaled accordingly. One of the problems is that scaling out doesn't guarantee a proportional increase in performance, and scaling up shows even worse dynamics. So the cluster's storage usage does not keep pace with the amount of computational resources needed to process an increasing amount of data.

After a small test, it turned out that a cluster of 30 Redshift nodes could be replaced with 10 R4 nodes, giving a performance boost of several times while cutting TCO by a factor of three.

Encouraged by these results, the company decided to conduct a full audit, after which it replaced all I2 nodes with I3, which reduced the cost by another 30%.
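To make the arithmetic concrete, here is a minimal sketch of how a threefold reduction and a further 30% cut would compound. It assumes, purely for illustration, that both reductions apply to the same cost baseline, which the audit described above does not confirm; the function name is hypothetical.

```python
def remaining_cost_fraction(reduction_factor: float, further_cut: float) -> float:
    """Fraction of the original cost left after dividing it by
    reduction_factor and then cutting the result by further_cut.

    Assumption: both reductions act on the same cost baseline.
    """
    return (1.0 / reduction_factor) * (1.0 - further_cut)


# A 3x TCO reduction followed by a further 30% cut.
remaining = remaining_cost_fraction(3.0, 0.30)
total_savings = 1.0 - remaining

print(f"remaining: {remaining:.1%}, saved: {total_savings:.1%}")
```

Under that assumption the savings multiply rather than add: roughly 23% of the original cost remains, a total saving of about 77%.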

After that, management decided to re-architect as much as possible and to abandon legacy wherever possible, whether it is an AWS service or an internal application.

During aggressive growth with limited operational support, managed services and PaaS can be a rescue and a solution for immediate tasks, but within a few years they can become a factor limiting the company's growth, or even an anchor pulling it down.

A few years ago, when Netflix was substantially smaller, company speakers gave a lot of presentations at various events and talked about its strategy for using AWS services. No advanced services, only basic and simple ones, essentially EC2. The reason is simple: when you hand control of the basics to someone else, you stop seeing the forest for the trees and may begin to lose more than you gain.

About five years ago, various vendors promoted to customers the idea of a CoE (Center of Excellence): a group of initiators and deeply technical specialists who would solve the problems of IT development, migration to the cloud, and so on. This company decided to gather such a group once a year to audit the infrastructure and assess what else to "throw out".

It seems that history, as it should, repeats itself: 10 years ago we learned how to keep the virtualization footprint minimal and effective, and today the same is happening with clouds. And some companies, oops, are already in a similar situation with container applications, which are also gradually turning into legacy and will have to be retired.
