My experience is in websites/apps/services. From tiny personal projects to commercial apps running on 8,000 servers. If what you do is AI, ML, ETL, HPC, DBs, blockchain, or anything significantly different from web apps, what I’m writing here might not be relevant.
3/25
Step 1: Forget that all these things exist: Microservices, Lambda, API Gateway, Containers, Kubernetes, Docker.
Anything whose main value proposition is about “ability to scale” will likely trade off your “ability to be agile & survive”. That’s rarely a good trade-off.
4/25
Start with a t3.nano EC2 instance, and do all your testing & staging on it. It only costs $3.80/mo.
Then before you launch, use something bigger for prod, maybe an m5.large (2 vCPU & 8 GB mem). It’s $70/mo and can easily serve 1 million page views per day.
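If it helps to see it concretely, a single staging box like that is one CLI call. A minimal sketch with the AWS CLI; the AMI, key pair, subnet, and security group IDs are placeholders, not values from this thread:

```bash
# Launch one t3.nano for testing/staging.
# Every ID below is a placeholder -- substitute your own AMI, key, subnet, and SG.
aws ec2 run-instances \
  --image-id ami-0123456789abcdef0 \
  --instance-type t3.nano \
  --key-name my-keypair \
  --subnet-id subnet-0123456789abcdef0 \
  --security-group-ids sg-0123456789abcdef0 \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=staging}]'
```

For prod you'd run the same command with --instance-type m5.large.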
5/25
1 million views is a lot. For example, getting on the front page of @newsycombinator will get you ~15-20K views. That’s just 2% of the capacity of an m5.large.
6/25
It might be tempting to use Lambda & API Gateway to save $70/mo, but then you’re going to have to write your software to fit a new immature abstraction and deal with all sorts of limits and constraints.
7/25
Basic stuff such as using a cache, debugging, or collecting telemetry/analytics data becomes significantly harder when you don’t have access to the server. But probably the biggest disadvantage is that it makes local development much harder.
8/25
And that’s the last thing you need. I can’t emphasize enough how important it is that you can easily start your entire application on your laptop, with one click.
With Lambda & API Gateway you’re going to be constantly battling your dev environment. Not worth it, IMO.
9/25
CloudFormation: Use it. But too much of it can also be a problem. First of all, there are some things that CFN can’t do. But more importantly, some things are best left out of CFN because it can do more harm than good.
10/25
The rule of 👍: If something is likely to be static, it’s a good candidate for CFN. Ex: VPCs, load balancers, build & deploy pipelines, IAM roles, etc. If something is likely to be modified over time, then using CFN will likely be a big headache. Ex: Autoscaling settings.
11/25
I like having a separate shell script to create things that CFN shouldn’t know about.
And for things that are hard/impossible to script, I just do them manually. Ex: Route 53 zones, ACM cert creation/validation, CloudTrail config, domain registration.
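To make that concrete, here's the kind of mutable knob I'd keep in a plain script rather than in CFN; the Auto Scaling group name and numbers are made up for illustration:

```bash
#!/bin/bash
# Settings that change over time, deliberately kept out of CloudFormation.
# Group name and capacity values are placeholders.
set -euo pipefail

aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name web-asg \
  --min-size 2 \
  --max-size 4 \
  --desired-capacity 2
```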
12/25
The test for whether your infra-as-code setup is good enough is whether you feel confident that you can tear down your stack & bring it up again in a few minutes without any mistakes. Spending an unbounded amount of time in pursuit of scripting everything is dumb.
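A sketch of what that test looks like in practice, assuming the stack lives in a single template (stack and file names are placeholders):

```bash
# Tear the whole stack down...
aws cloudformation delete-stack --stack-name my-app
aws cloudformation wait stack-delete-complete --stack-name my-app

# ...and bring it back up. If this round trip takes a few minutes and zero
# manual fixes, your infra-as-code is good enough.
aws cloudformation deploy \
  --stack-name my-app \
  --template-file infra.yml \
  --capabilities CAPABILITY_NAMED_IAM
```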
13/25
Load balancers: You should probably use one even if you only have 1 instance. For $16/mo you get automatic TLS cert management, and that alone makes it worth it IMO. You just set it up once & forget about it. An ALB is probably what you’ll need, but NLB is good too.
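For reference, the one-time setup is roughly this, sketched with the CLI (it's also a good CFN candidate); the subnet/SG IDs and ARNs are placeholders:

```bash
# Create the ALB and grab its ARN.
ALB_ARN=$(aws elbv2 create-load-balancer \
  --name web-alb \
  --subnets subnet-aaaa1111 subnet-bbbb2222 \
  --security-groups sg-0123456789abcdef0 \
  --query 'LoadBalancers[0].LoadBalancerArn' --output text)

# HTTPS listener with an ACM certificate; renewal is then handled for you.
aws elbv2 create-listener \
  --load-balancer-arn "$ALB_ARN" \
  --protocol HTTPS --port 443 \
  --certificates CertificateArn=arn:aws:acm:us-east-1:111122223333:certificate/placeholder \
  --default-actions Type=forward,TargetGroupArn=arn:aws:elasticloadbalancing:us-east-1:111122223333:targetgroup/web/0123456789abcdef
```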
14/25
Autoscaling: You won’t need it to spin instances up & down based on utilization. Unless your profit margins are as thin as Amazon’s, what you need instead is abundant capacity headroom. Permanently. Then you can sleep well at night — unlike Amazon’s oncall engineers 🤣
15/25
But Autoscaling is still useful. Think of it as a tool to help you spin up or replace instances according to a template. If you have a bad host, you can just terminate it and AS will replace it with an identical one (hopefully healthy) in a couple of minutes.
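The replace-a-bad-host flow is a one-liner (the instance ID is a placeholder):

```bash
# Kill the bad instance but keep desired capacity the same,
# so the Auto Scaling group launches an identical replacement.
aws autoscaling terminate-instance-in-auto-scaling-group \
  --instance-id i-0123456789abcdef0 \
  --no-should-decrement-desired-capacity
```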
16/25
VPCs, Subnets, & Security Groups: These may look daunting, but they’re not that hard to grasp. You have no option but to use them, so it’s worth spending a day or two learning all there is about them. Learn through the console, but at the end set them up with CFN.
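As a taste of the rules you end up writing, here's a sketch of a typical two-tier setup: HTTPS open to the world on the load balancer's security group, and the instance group reachable only from the load balancer (group IDs and ports are placeholders):

```bash
# Load balancer SG: allow HTTPS from anywhere.
aws ec2 authorize-security-group-ingress \
  --group-id sg-0aaa111122223333a \
  --protocol tcp --port 443 --cidr 0.0.0.0/0

# Instance SG: only accept traffic from the load balancer's SG.
aws ec2 authorize-security-group-ingress \
  --group-id sg-0bbb444455556666b \
  --protocol tcp --port 80 \
  --source-group sg-0aaa111122223333a
```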
17/25
Route 53: Use it. It integrates nicely with the load balancers, and it does everything you need from a DNS service. I create hosted zones manually, but I set up A records via CFN. I also use Route 53 for .com domain registration.
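A sketch of wiring an A record (alias) to the ALB; the hosted zone ID, domain, and load balancer name are placeholders:

```bash
# Look up the ALB's DNS name and canonical hosted zone ID.
ALB_DNS=$(aws elbv2 describe-load-balancers --names web-alb \
  --query 'LoadBalancers[0].DNSName' --output text)
ALB_ZONE=$(aws elbv2 describe-load-balancers --names web-alb \
  --query 'LoadBalancers[0].CanonicalHostedZoneId' --output text)

# Upsert an alias A record pointing the apex domain at the ALB.
aws route53 change-resource-record-sets \
  --hosted-zone-id Z0123456789EXAMPLE \
  --change-batch '{
    "Changes": [{
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "example.com",
        "Type": "A",
        "AliasTarget": {
          "HostedZoneId": "'"$ALB_ZONE"'",
          "DNSName": "'"$ALB_DNS"'",
          "EvaluateTargetHealth": false
        }
      }
    }]
  }'
```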
18/25
CodeBuild/Deploy/Pipeline: This suite has a lot of rough edges and setup can be frustrating. But once you do set it up, the final result is simple and with few moving parts.
Don’t bother with CodeCommit though. Stick with GitHub.
Sample pipeline: github.com/dvassallo/gith…
19/25
S3: At 2.3 cents per GB/mo, don’t bother looking elsewhere for file storage. You can expect downloads of 90 MB/s per object and about a 50 ms first-byte latency. Use the default standard storage class unless you really know what you’re doing.
20/25
Database: Today, DynamoDB is an option you should consider. If you can live without “joins”, DDB is probably your best option for a database. With per-request pricing it’s both cheap and a truly zero burden solution. Remember to turn on point-in-time backups.
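A minimal sketch of that setup: an on-demand (per-request) table with point-in-time recovery turned on. Table and key names are illustrative:

```bash
# Per-request billing: no capacity planning, pay only for what you use.
aws dynamodb create-table \
  --table-name app-data \
  --attribute-definitions AttributeName=pk,AttributeType=S AttributeName=sk,AttributeType=S \
  --key-schema AttributeName=pk,KeyType=HASH AttributeName=sk,KeyType=RANGE \
  --billing-mode PAY_PER_REQUEST

# Don't forget point-in-time recovery.
aws dynamodb update-continuous-backups \
  --table-name app-data \
  --point-in-time-recovery-specification PointInTimeRecoveryEnabled=true
```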
21/25
But if you want the query flexibility of SQL, I’d stick with RDS. Aurora is fascinating tech, and I’m really optimistic about its future, but it hasn’t passed the test of time yet. You’ll end up facing a ton of poorly documented issues with little community support.
22/25
CloudFront: I’d usually start without CloudFront. It’s one less thing to configure and worry about. But it’s something worth considering eventually, even just for the DDoS protection, if not for performance.
23/25
SQS: You likely won’t need it, and if you needed a message queue I’d consider something in-process first. But if you do have a good use case for it, SQS is solid, reliable, and reasonably straightforward to use.
24/25
Conclusion: I like to separate interesting new tech from tech that has survived the test of time. EC2, S3, RDS, DDB, ELB, EBS, SQS definitely have. If you’re considering alternatives, there should be a strong, compelling reason for losing all the benefits accrued over time.
25/25
No matter your experience level, AWS takes time to get right.
With @getRender, I just push to @github. That's all. Performance is great and it's super affordable
Good thread. With infinite options to use this or that, configured this or that way, it's useful to get a high-level outlook from someone who's experienced.
I think that would be more trouble than it's worth. If the app can run on multiple servers, I'd rather use small instances, and just bring up a few more based on a schedule. But in general, for production, I prefer not scaling down at all.
I long ago figured out that Elastic Beanstalk + RDS + Route53 + CloudFront was the easiest way to get a simple project going. I hate managing EC2 instances and VPCs manually. What do you think about Elastic Beanstalk?
I have no direct experience with it, but I don’t see any downside, except maybe that you might be locked into the runtime versions they support. I agree it simplifies the setup a lot.
Yeah, you do get locked to specific runtime versions, or you have to roll out a Docker instance (which, again, means more setup). I did run into a few really weird bugs when trying to delete Elastic Beanstalk resources.
Short version: if you manually delete anything created by Elastic Beanstalk (VPC, EC2, or RDS instances) you can't delete the Elastic Beanstalk environment itself; it will remain in a "zombie" mode.
What do you use for deploying? We use CodeDeploy on petition.parliament.uk but I never feel totally comfortable with it when we're being hit by 100k+ active users, and deploying one server at a time takes forever when you have 32 servers.
I use CodeDeploy too. At Amazon we used it for deployments to thousands of servers. You can change the deployment preferences to deploy to more than 1 instance at a time, either as a % or a specific number: docs.aws.amazon.com/AWSCloudFormat… ...
You can also check what deployment stage is taking a long time to execute. There could be something that can be optimized. My deploys take less than a minute on each server.
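A sketch of both options: pick one of the built-in deployment configs, or define your own healthy-host threshold. Application, group, config, and bucket names are placeholders:

```bash
# Built-ins already cover "more than one at a time":
#   CodeDeployDefault.OneAtATime | CodeDeployDefault.HalfAtATime | CodeDeployDefault.AllAtOnce

# Or define a custom threshold: keep 75% of hosts healthy,
# i.e. deploy to roughly a quarter of the fleet in parallel.
aws deploy create-deployment-config \
  --deployment-config-name ThreeQuartersHealthy \
  --minimum-healthy-hosts type=FLEET_PERCENT,value=75

aws deploy create-deployment \
  --application-name my-app \
  --deployment-group-name production \
  --deployment-config-name ThreeQuartersHealthy \
  --s3-location bucket=my-deploy-bucket,key=app.zip,bundleType=zip
```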
We're somewhere around 2.5 minutes per server - a lot of which is connection draining and installing dependencies. I guess building custom AMIs would be an option using something like
Seems like a good use case for Docker, despite Daniel's advice. Building custom AMIs is a step above modifying instances during deployments, but both are brittle and inefficient compared to Docker-based deployments, which would reduce your rollout time from 80 minutes to <10.
Yes, we're currently reworking our architecture and looking at using Fargate and Aurora after having a chat with AWS. Traffic to the site is very sporadic - 99% of the time it's at 100k active users per day but jumped to 6m at the time of the Revoke Article 50 petition.
I understand the approach of using the things that you know, but Lambda is a great technology. The fact that I don’t have to worry about money, scaling, OS updates, runtime updates, security (mostly), ... is just too big of an advantage not to use it. And I can still test locally
I know Lambda pretty well. I worked very closely with the team that built it. I agree it's great tech, and I am optimistic that abstracting away the OS is the future of compute. But the Lambda of today has too many restrictions to make it worth it (for the things I work on). IMO.
No file system access beyond tmp, no stateful web sockets (have to persist state in DDB, requiring 1 read and 1 write per message — plus handling new complex failure modes), the 15 min timeout (mostly related to the previous one), hard to send telemetry data async, ... [cont]
... the 250MB bundle limit (requires convoluted workarounds), no sticky sessions (calls from same user going to same proc). That's just off the top of my head, and just things related to what I'm doing.
I can’t relate to your use case, but replacing traditional LAMP stacks with Lambda+APIGW+SNS+SQS+DDB has proven to work for a fraction of the price compared to running EC2. My idea is to put anything into Lambda, and if it doesn’t fit >> ECS Fargate. [cont...]
But the main issue in my case could be that I have no one to look after the setup. No one checks CVE vulns for OS, checks under/overprovisioning etc. All I have is a couple of CW Alarms and that’s it. Maybe it’s all about the solution architecture? Event-driven and EC2 don’t fit?
I believe the cost savings at the low end. But cost-optimizing ~$100/mo is not my goal.
As for CVEs, I just run “yum update --security” in a cron once a week. If you do that, your patching will be more frequent and robust than Lambda's (wink!)
Beware of scale: ...
Don't be fooled by Lambda's claims of capacity management. You still need to monitor your invocation & concurrency rates, and request limit increases ⬆️ when you get close. A single small-ish EC2 instance can give you more capacity than Lambda's default capacity, and IMO EC2 capacity is easier to monitor.
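One way to do that monitoring, sketched as a CloudWatch alarm on account-level Lambda concurrency; the threshold and SNS topic are placeholders, and the default regional limit is 1,000 concurrent executions:

```bash
# Alarm well before the default 1,000-concurrency regional limit.
aws cloudwatch put-metric-alarm \
  --alarm-name lambda-concurrency-high \
  --namespace AWS/Lambda \
  --metric-name ConcurrentExecutions \
  --statistic Maximum \
  --period 60 \
  --evaluation-periods 5 \
  --threshold 800 \
  --comparison-operator GreaterThanThreshold \
  --alarm-actions arn:aws:sns:us-east-1:111122223333:oncall
```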
Extremely slow iteration speed, 10 years of partially undocumented behavior on the EC2 machine, broken environments that need replacing and can't be updated, needing to set up a 0.0.0.0/0 -> NAT routing table manually when you uncheck the "allocate EIPs" option...
Lately the new Docker images have also failed to start sshd when a key is set, so you can't even check what part of the machine didn't match up with your expectations.
The unit it scales on is also "1 EC2 machine". There's a reason we moved to DIY Kubernetes years ago.
The update icon spinner is also unbalanced. And you stare at it a lot.
(We run a few services that need extra 9s because they are prerequisites for other things working, think Docker Hub mirror and ACS, but we've gone and made those EB via CF now so nobody has to see the spinner)
About sums up my experience of the last 8 years. Disagree w/ both the Docker and Lambda comments, as both solve your requirement about running locally being priority 1. We’ve bootstrapped several businesses w/ @goserverless & RDS with crazy velocity & it’s so easy.
Fair point about Docker. In hindsight I should have probably not included it in the list. I still feel that the advantages are not that significant in reality, but I guess it depends on how complex the runtime env is. ...
About Lambda, it’s not that you can’t run the code locally—but the fact that it forces you to break up your app into methods/jobs/functions, which is what makes it harder to run & debug locally. There’s a big benefit in having all the app functionality running from one process.
If you’re working raw then yes, especially in static/compiled stacks. But @goserverless + the Serverless Offline plugin completely addresses that, making it behave like any other single process. Plenty of others do the same across most mainstream languages.
It’s totally true that you still have to learn the quirks of lambda, and APIG can do crazy shiz until you learn the ropes. It’s no silver bullet. But I’ve watched so many miss on simple and insane time to market because they only look at raw lambda
With SQS, I think it depends on your needs and experience... the failure modes in SQS may be surprising initially, and you have to have a plan to deal with those cases. Lambda + SQS is actually really nice depending on needs for certain work loads.
Oh, I simply meant just queuing messages in memory (in a linked list, etc). Obviously if persistence/durability is required, then SQS is a decent choice.
I made that comment when I was still comparing with Lambda in my head. ...
... With Lambda it’s common to use SQS for anything async even when durability is not necessary, either because of function timeout concerns, or concerns about cost from long execution time.
If I were to start from scratch, I'd give the CDK a go. I agree it's much more appealing than raw CloudFormation. Doing something as simple as lower-casing a string is a nightmare in CFN, but should be straightforward in CDK. I haven't used it yet, but one day I'll give it a shot.
We found CloudFront had a huge impact on performance in our use case - people downloading files from different parts of the world. You won’t notice the performance hit if you are based in the US without a CDN but drop yourself in NZ and you feel the pain. Pretty easy to set up.
Speaking of CloudFront, what do you think about hosting websites on S3+CF?
I've found it to be a rather easy setup which takes care of all your website hosting needs. But I'd love to hear options and opinions.
Giving up relational capabilities is also a huge self-imposed constraint - one could do well without it, just like not choosing Lambda or Kubernetes.
And Redis brings enough chops by itself, with write-to-disk and an interface with Postgres.
Hmmm... an example please. Aurora gives you faster failovers, better performance, elastic storage, 15 replicas, higher durability, better caching, etc., and full protocol compatibility with no issues. No comparison.
I agree 100% about the 'storage dichotomy' - try really REALLY hard to use S3 ... until the 'NoJoin' world (aka NoSQL) gets you down, then RDS all the way!
This is just wrong. We use Aurora Postgres and it easily handles the burstiest workloads with millions of monthly active users. It’s battle tested and works great.
Here’s some evidence of the kind of problems I describe: reddit.com/r/aws/comments… — There’s no comparison in test of time fitness between Postgres & Aurora Postgres.
A bit maybe, but not as much. There’s a local version of DDB that’s quite good: docs.aws.amazon.com/amazondynamodb…. But recently I started using the real DDB even for dev, with table names namespaced per developer (all devs sharing the same AWS account).
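A sketch of the per-developer namespacing idea; the prefix convention and table layout are just illustrative (DynamoDB Local works the same way if you add --endpoint-url http://localhost:8000):

```bash
# Each developer gets their own copy of every table, in the shared dev account.
DEV=alice
aws dynamodb create-table \
  --table-name "${DEV}-app-data" \
  --attribute-definitions AttributeName=pk,AttributeType=S \
  --key-schema AttributeName=pk,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST
```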
DDB seems unsuitable for all use cases I have encountered because you have to overprovision by a large factor if your access to keys is ever not uniform
1/2 Only one side of the medal: CodePipeline doesn't integrate with CodeBuild very well. No .git context, pipelines can't be canceled programmatically or manually. No post-deployment hooks are available, so it can result in broken deployments.
2/2 If you configure CodePipeline with CodeBuild, only one system can use webhooks, otherwise two builds will start. This results in broken branch "badges" because CodePipeline doesn't communicate the correct "badge" state to CodeBuild.
In fact you can’t even use badges if the pipeline is set up through CloudFormation. See “BadgesEnabled”: docs.aws.amazon.com/AWSCloudFormat… — There are many small annoying pitfalls like that.
Do you know if there is a good guide to using Route53?
I tried once for a static S3 website, and I couldn't make it work.
I haven't had similar troubles with other web hosts and DNS management tools.
Agreed - even if you do not need to autoscale, having the "desired", "min" and "max" set to 1 is extremely useful. Whenever the instance is unreachable (i.e., your health check fails), it'll automatically try to replace it. LOVE that!
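A sketch of that min/max/desired = 1 setup; the launch template, subnets, and target group ARN are placeholders:

```bash
# An Auto Scaling group of exactly one: it never scales, but it replaces
# an instance that fails its ELB health checks.
aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name web-asg \
  --launch-template LaunchTemplateName=web,Version='$Latest' \
  --min-size 1 --max-size 1 --desired-capacity 1 \
  --health-check-type ELB \
  --health-check-grace-period 120 \
  --vpc-zone-identifier "subnet-aaaa1111,subnet-bbbb2222" \
  --target-group-arns arn:aws:elasticloadbalancing:us-east-1:111122223333:targetgroup/web/0123456789abcdef
```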
You should try the AWS CDK; scripting complex infra is *significantly* faster.
Especially using TypeScript. Also, scripting simple infra becomes a trivial task. docs.aws.amazon.com/cdk/latest/gui…
Terraform > CF.
It works with many more vendors, so you can use the same language to automate your EC2 server creation, ALB, etc., and your New Relic, your DNS solution (Cloudflare, etc., if not R53), your logging solution, and so on.
I was resistant. CF was in the works then TF was brought up and I couldn't deny it.
Having a modules registry makes it even better. The community builds solid modules to help decrease the learning curve; that can be good and bad since you should know what you're creating.
Experiencing this right now in a project. Tests can't really cover all aspects of the different AWS products I'm using, so I have to deploy and then manually test everything. CloudFormation is also remarkably slow and takes 20-25 minutes on each deployment.
Just curious, how current are you with the latest tooling around Lambda & API Gateway? It's super easy to run a local dev environment and test Lambda functions, whether they're regular funcs or behind an API endpoint.
In my extensive experience using Lambda, the biggest pain point is having a place to centrally log everything from various functions so it's easy to track down errors or issues. I'm experimenting with a system using CloudWatch where every operation starts by logging to a..
Not very current, I'll admit. But I'm reasonably up-to-date on the limits of Lambda & API Gateway. The challenge you're facing with logging is one example of battling the environment.
Another one relevant to the work I'm doing is WebSocket support. ...
Sure, it's technically supported, but one of the benefits of WS is that you can have a stateful session over a connection. But a separate Lambda function gets invoked for every message, so if you want to keep state you have to save it & fetch it from DynamoDB — on every message!
Disagree with this point. Serverless is the future and the tech is improving every day. Look at JAMStack and Gatsby and you will find examples of easy debugging / CI/CD.
I agree it’s the future. But is it the present? JAMStacks are good for simple things, but there’s a limit (right now) on how much you can do with that approach. (I’m working on something to improve that 😁)
It runs—but the Lambda environment is significantly more restricted. You can only run something for 15 minutes, filesystem access is only for temp files, network bandwidth is low, it’s hard to use a cache or buffer work across invocations, etc., etc.
It does depend on the project. There are limitations for sure. Start up time could be a major issue for some. I just want to say it’s very easy to have a basic api without things like cache or file system etc to work cross platform and in Lambda without code changes.
I run “yum update --security” in a cron once a week. I make the script sleep a random amount of minutes before running the update so that multiple hosts won’t update at the same time. If there’s something really urgent, I run it manually. (I use the latest AmazonLinux 2 AMIs.)
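Roughly what that cron job looks like, assuming Amazon Linux 2; the path and one-hour jitter window are my own choices, not something prescribed in the thread:

```bash
#!/bin/bash
# e.g. /etc/cron.weekly/security-updates
# Random sleep keeps a fleet of hosts from all patching at the same minute.
sleep $(( RANDOM % 3600 ))
yum update --security -y
```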
I run “yum update” automatically on instance start using a UserData script: github.com/encrypted-dev/… — And if the instance doesn’t pass the health check, it won’t go online.
CodeDeploy has an option to deploy to “outdated instances”. I just run this when the host boots up in the UserData script: github.com/dvassallo/gith…. CodeDeploy is configured to discover instances based on their tags, so the instance gets “registered” immediately.
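One way to trigger that "deploy to outdated instances" behavior from a boot script (not necessarily what the linked script does, and assuming the CodeDeploy agent is already installed; application and group names are placeholders):

```bash
# Tail end of UserData: ask CodeDeploy to bring this freshly booted,
# tag-discovered instance up to the latest successful revision.
aws deploy create-deployment \
  --application-name my-app \
  --deployment-group-name production \
  --update-outdated-instances-only
```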
Except, of course, the cost of maintaining and securing a Unix machine. Even if you pay that cost in time spent, you still will be paying it. Also if this is what you use AWS for, then Linode is far simpler and cheaper.
Lambda can be a pain, but often worth it. Separate your function logic from the Lambda API interface in/out for easier testing. Use the max memory and time options, you generally don't save by not using max options, and often will cause other issues down the road.
Not just ML or AI or ETL or HPC or any other profitless projects: these 25 steps are good only for small-to-medium web apps. Big web apps need microservices because of heavy computations all over the place. A web app can be quite heavy because it syncs data with external third parties.
I disagree. Or maybe we disagree on what “big” is 😁 — Microservices might be necessary to scale the tech with the number of people maintaining it. But I’m pretty convinced that app utilization scale is not increased by microservices.
Big as in "1 server to handle the website", "1 server to handle CRON & daily computation", "1 server to handle requests from Android/iPhone", and maybe 1 server for the bots, haha. Also you have many devs with different skill sets, so each uses their own language; combine it with SQS.
You don’t need microservices for horizontal scaling though. I ran monolithic apps on thousands of servers. You just host the same app on a fleet of servers and put a load balancer in front.
Well yeah, but horizontal scaling just adds extra bother, maintenance, and extra configuration; might as well just vertically scale the machine to be agile and survive. But to be frank, if "the market" isn't big, no matter how agile or efficient you are, it will still go zombie.
The big problem isn't being agile, surviving, or being efficient. The problem is more basic: the market isn't there. Which is why some use bots to jack up their numbers. You need to go global with the web these days; you cannot do just "one country". I learned this the hard way.
Thousands of servers? Well, I'll tell you what: I made a maternity e-commerce site that integrates with all the marketplaces & warehouse systems in Indonesia, with promos, SMS, email marketing, combos, and mini stores & brands inside it... alone, and it handles thousands of visitors. Can you go more agile than that?
Now despite all that, I insist anyone else who tries it will end up as a zombie because the market isn't there. Unless you want to be a professional scammer, it's best to think of an idea that goes global. One country only = dead. The market is not big enough, even if you are "agile".
So it is not agility or microservices that is the problem; it is more basic. It is the market. Telling people to ignore microservices can be the wrong advice because they might want to skip the monolith part and just go microservices or serverless.
The problem with a monolith is that it's only good for the "cheap and agile" early phase. When you become bigger, you go microservices, but to be frank, I was never a fan of it. I think people should just skip it and go serverless. Cheaper.
If you combine all of it on 1 server and just scale that one server, it hits the CPU limit really fast. I tried it. It's really for "early phase" web apps. At the middle and later stages, you still need to go microservices or serverless. Java for big data, Python for ML, etc. You need a team.
In terms of learning and figuring stuff out on AWS, do you think any certifications might help? There are a number of certifications and I wasn't really sure which one to pick. The objective is to learn AWS and be aware of all the things that you talk about right now.
I don't know tbh. For me personally, I tend to learn more/faster while trying to accomplish something. But everyone is different. I hear that AWS's certifications are quite thorough, but I have no first-hand experience with them.
Karthik - Pure learning can be done without certs. However, they provide a structured learning approach if you're just starting out (recognition in the job market is a bonus). I recommend starting with AWS Solutions Architect - Associate.
Personally speaking, I think it helps. I am not sure if it adds value having a certified title, but it helped me expand my knowledge on several AWS services which otherwise I wouldn't have learnt.
AWS Certified Solutions Architect - Associate will give you a bit of knowledge on most AWS services and security/networking. Knowing what most of the services do can help you pick the right tool for the job.
You don’t need to know each service in depth. Knowing what’s out there is half the battle.
E.g., simply knowing that something like Athena can help you query CSV files in S3 is useful, and if you then decide to use it, you can do a deep dive and learn it in depth.
Architect Associate is the most popular one. If your role is mostly business-oriented, start with the Cloud Practitioner one (it's not difficult to pass at all), and if you want to mark the difference in your resume, do the professional level (challenging, even with experience).
To learn about the services this book is awesome: amazon.es/Certified-Solu…. But if you want to establish a solid architecture knowledge, try to attend the Architecting on AWS course... I love the current version (I deliver most AWS courses as part of my job).
For Route 53, hosted zones, etc., Terraform does a much better job than CFN IMHO, while definitely having its own limitations.
While there’s definitely a learning curve for lambda development, once you’re past it, I don’t think it’s a bad experience. Serverless Framework does a lot.
Personally, lock-in with AWS is the least of my worries. What is the worry? That they go out of business? Or they discontinue a product? Or they raise prices?
I think there is more chance that I get hit by a comet vs that happening 😁
Maybe the only downside of lock-in is a new vendor offering something better and not being able to migrate easily. But that’s a risk I’m willing to take.
Personally I'm dealing with hybrid clouds, on and off premises, so some of these features will restrict what I can do if I cannot easily replicate them on premises, but I guess that's a subset of what is needed and can be managed thusly...
Isn't this entire article based on the premise that I am a startup and need to rush to market with a product in order to return 10X on a VC's money...?
Agree! Sometimes in striving to be lean, we think we need to use the latest cutting edge tech like Lambda and API gateway but if you factor employee hours into the total cost of ownership, it sometimes becomes more worthwhile/economic to run a server and leave a little headroom.
I think they are helpful for building ops support tools, like sending Slack messages based on events. Though AWS has come up with a new product for this.
Yeah, precisely. Such as the example of responding to a Slack command.
I said 'microservice' as a joke bc people throw that term around, but may have just made it confusing!
Fantastic read - I have just completed all 3 AWS associate certs and spent many hours contemplating this topic - your experience and insight is really valuable!
This really speaks to how I feel about the AWS ecosystem, way better than I've ever managed to elucidate it. It's so much easier to just lament the mediocre (at best) docs of the various pieces of the AWS ecosystem, rather than the usual REAL issue - unnecessary overcomplexity.
I don’t agree with the point on ignoring Docker containers. They are used to define how the app is built, and in different ways based on arguments. I think their main value proposition is that they put developers in charge of coding the build and deployment process.
Yes, Docker is the only one I’m a bit on the fence on. I see some benefits, and the drawbacks are not huge. But the last time I tried it I realized that my EC2 image is already a decent container, and having an extra layer was probably not worth it.
Single-container VMs are underrated IMO. It's much faster than baking new AMIs for each release, and it decouples system vs. application configuration quite nicely.
Yes, if I were to use Docker that’s how I’d use it. Right now I’m not even baking new AMIs, but just adding my stuff on instance start using UserData. Example: github.com/encrypted-dev/…
But I agree that if this became more complicated, Docker would likely be better.
I’m sure the stats are based on some real info, but IMO it is very misleading to interpret that chart as *you* will save money or become more agile. It’s just an ad after all.
Depends on what you do. If serving web pages, and each page view is 100 KB gzipped, serving 1 million page views per day would cost $8.50/day. Doesn’t sound that insane for something so popular.
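The arithmetic behind that figure, assuming roughly $0.085/GB for data transfer out (which is what $8.50 for ~100 GB implies):

```bash
# 1,000,000 views/day * 100 KB/view ~= 100 GB/day out
# 100 GB/day * $0.085/GB ~= $8.50/day
echo "1000000 * 100 / 1000 / 1000 * 0.085" | bc -l   # prints 8.500...
```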
Parameter Store of AWS Systems Manager.
It's a centralized hierarchical store for managing secrets or plain-text data. You can integrate it easily in CodePipeline or CloudFormation templates, but be aware of versioning: it doesn't always point to the latest version as expected.
Agree with most of this, but I would make the point that Kubernetes is now the standard base layer for "L7 apps". Spend a bit of time deploying a minimal Kubernetes cluster and launch your containers directly on it. Scaling then comes for free.
Terraform can be a good choice for the capabilities and its ability to configure stuff outside AWS.
But for me, cloud portability is the least of my worries. What is the concern? That AWS would go out of business? Or they discontinue a product? Or they raise prices? Unlikely 😁
Companies often want to have engineers focused on business features, when things like #microservices, #Kubernetes, and #Serverless are pushing engineers to solve many technical and organizational problems.
Also check out @goconvox if you want to leverage lots of the good parts of @awscloud without all the complexity. Open-source and free for the first developer. You can get up and running with most web apps in minutes. Built by a few early heroku engineers.