We had a unique webinar last week - "More than Functions", a discussion about the future of serverless and observability. We were lucky to have the best minds in serverless with us:
- Yan Cui, Principal Engineer at DAZN
- Jeremy Daly, CTO at AlertMe.news
- Corey Quinn, Last Week in AWS
- Ran Ribenzaft, CTO at Epsagon
The webinar is available online:
Below is a transcript of the webinar. Enjoy!
More than Functions - Serverless Observability Webinar
Shannon: My name is Shannon Brown and I work for Epsagon. First, thank you so much for coming to the webinar. We're just over the moon to have these amazing panelists. You're about to hear their credentials, and then you're going to hear about me and it's going to go "whoa," but we're really excited today. Today we're going to talk about "More than Functions," a discussion about the future of serverless and observability, which is what we do here at Epsagon. We're going to try to make this the least marketing-heavy of Epsagon webinars, so don't expect a lot of talk about Epsagon, but more about serverless in general and its future. These are important people, and when I look away like this, it's because I'm looking at them here. I apologize. So, the participants we have today: first, Yan Cui. Of course, anyone who knows serverless knows Yan. He's an AWS Serverless Hero and a well-known speaker and blogger on the serverless circuit. He's worked with AWS for 10 years and has extensive experience with AWS Lambda in production. He has also contributed content to the serverless Well-Architected white paper published by AWS, and he is the instructor of Production-Ready Serverless. Just so you know, I will share all of these guys' contact information at the end, but you can find him on Twitter @theburningmonk or read his thoughts on serverless at theburningmonk.com. Say hi, Yan. There we go.
Alright, next we have Corey. He's a [hoot]. He's fun. Corey is a cloud economist who goes into companies and literally helps them with their horrifying AWS bills. We're going to get to that a little bit later, Corey. He publishes Last Week in AWS, which is a great publication. I suggest it. It's a weekly roundup of news from throughout Amazon's ecosystem. He is also the host of a very entertaining podcast called Screaming in the Cloud, which features conversations with domain experts in the world of cloud computing in general, not just AWS Lambda. You can find him on Twitter @QuinnyPig. I will share it at the end, and you can find him online at screaminginthecloud.com. Say hi, Corey.
Corey: I'm a treasure and a delight.
Shannon: Absolutely. And Jeremy. I've had a lot of great conversations with Jeremy, and I definitely go to him when I have questions. Jeremy is the CTO at AlertMe, with over 20 years of experience managing the development of complex web and mobile applications for domestic and international businesses. He has a weekly newsletter that I also get, Off-by-none, that focuses on the technical details of building applications and products in the cloud using serverless technology. So, he really does focus on serverless. He's very active in the serverless community and he writes a lot about serverless. You can find him on Twitter @jeremy_daly and at jeremydaly.com. Say hi, Jeremy.
Jeremy: Hello. Thanks Shannon.
Shannon: Ran is the CTO and cofounder at Epsagon. He's also a passionate developer. I can tell you that for sure, because he never sleeps, I don't think. He has vast experience in network, infrastructure, and cybersecurity software. He loves sharing open source tools to make everyone's life easier. You can see that in some of his blog posts about cost. He's been on the conference circuit speaking about his passion for serverless and the future of observability, and you can find him on Twitter @ranrib. That's it. Say hi, Ran.
Ran: Hi, everyone.
Shannon: Finally, my CEO, Nitzan, who is also the other cofounder of Epsagon. He is a software engineer with 12 years of experience in coding, machine learning, cybersecurity, and reverse engineering. He is a serverless advocate. He can also be seen speaking at quite a few (sorry, guys) conferences and meetups, and you can find him on Twitter @nitzanshapira. Say hi, Nitzan.
Okay, just a few things. Welcome to everyone else who's been joining. Don't worry if you missed it. I'm going to share the bios and how to get in touch with these guys and follow them online at the end, so please look for that. So, just a quick format opener. Nitzan is going to talk for about five minutes about observability and serverless. He's going to err toward not talking about Epsagon, although I think he'll brag on the team a little bit. Then, well, I've already introduced the panelists, so I will go to questions with the panelists. I'll ask questions to specific panelists, and after they speak for a little bit, the other panelists will be able to join in. The audience will be muted, but if you do have questions, please post them in the chat. I just talked to the team that's on here, and they are willing to answer questions afterwards if we don't get to them. So, if you would like to leave an email address (I won't send you marketing email, or you can block me like everyone else), or you want to email me with a question that's specific to somebody, I'm more than happy to pass the information on. If we do have time afterwards, we will open up for Q&A and then we'll close it out. We're recording the session, and after we finish the webinar we'll share with you where the recording is, in case you have to miss part of it or want to share it with somebody else, because these guys are amazing. We would love that. So, I think I'm going to hand it over to Nitzan. Nitzan, your turn.
Nitzan: Thanks, Shannon. Thank you all for joining, and mostly thank you Corey, Jeremy, and Yan for being here. We really appreciate it, and I'm sure everyone is interested to hear what you have to say. I'm just going to do a quick overview and give the background to the session. It's not going to be too long. I'm going to share my screen before I start, and say just a few words about Epsagon and our team. Epsagon was founded by me and Ran; we are the two guys on the top left. We came from a background of cybersecurity, network operations, and so on. We started the company about a year ago. What we did was basically research the serverless domain, speak with people, and find out that monitoring and troubleshooting, what they call today observability, is really a very big problem. We were funded in January, and the team currently has 11 people, including people with experience from New Relic and AppDynamics. So, we bring this knowledge of deep cybersecurity and technology together with APM. We just launched last week, so you probably saw it on TechCrunch. Now let's talk about what serverless is. Serverless doesn't really have a specific meaning; everyone defines it differently. What I think serverless is, is a mix of both managed compute and managed services that are tied together to create an application that is managed. You don't need to manage the infrastructure, and you can focus on your business logic. You can see functions as a service, such as AWS Lambda or Google Cloud Functions, and of course the managed services, which are very, very diverse and are all connected together so you can actually focus on your business logic without worrying about the infrastructure. The benefits are, of course, the pay-per-use. Server utilization is usually very low, and using serverless, you can just pay for what you're using.
The main benefits, in my opinion, are actually the auto scaling and the lower need for operations. The infrastructure is managed by the cloud provider, so you can focus, again, on your application and iterate faster, and this can actually give you an advantage in today's market. As for adoption, just looking at the recent survey, all the metrics show that it's growing about two times compared to last year. I mean, I don't have to tell you; you can look at the conferences. It's everywhere and it's growing very fast. However, it's still quite at the beginning. That's why these discussions can be very interesting. The problem is, looking again at the famous yarn drawing... I don't think anyone has actually made a better one since. Everyone was like, okay, this is done, no need to make another one ever. So, the problem is basically that people think about functions and how easy it is. You just deploy a function and it's working, but when you actually connect it to an application, it's usually quite complicated. You can end up with a very distributed, event-driven application with a lot of components, and then when you think about how to monitor this thing, how to troubleshoot these issues, how to understand performance, this can be very complicated. So, this is where the problem actually comes from. You have this, and you have those problems that are unique to serverless. You get timeouts, you get out-of-memory errors, and costs can actually go up without you even knowing. These are some things that are new in serverless. Combining both creates a big challenge that, even in the latest survey, is very dominant.
So, debugging and monitoring are the top two challenges in this year's survey and also in the previous years'. It's definitely not getting better, because applications just get more complicated, and that's why these problems are so big today. When we talk about APM or observability or whatever you call it, eventually, as a company, people need the same things they used to need in any kind of application. They want to troubleshoot issues fast. In serverless, every small problem can become a big problem, so troubleshooting becomes one of the first things people notice. Then you need to identify the problems unique to serverless. What about those timeouts and out-of-memory errors? Maybe an API that you're using can cause your Lambda to time out. These things did not exist in a server-based environment. And then cost is another issue that people have become a bit worried about.
So, how do I know what my bill is going to be, and how do I monitor this in order to enjoy the benefits of pay-per-use? And just one last slide to say why we actually started the company and why we think the existing, traditional tools do not fit.
So, many of them are just not distributed. They just think about serverless as, okay, I had something for a server, let's try it for a function. This works on a certain level, but once you gain some complexity, you need a distributed tracing solution, and this is true for microservices and containers as well. The second thing is that some of them are very heavily focused on agents, which cannot run in a function. If the agent has all this logic inside of it, you cannot just take it into a function. It doesn't work.
Another thing is automation. When you have hundreds or thousands of functions, you want to move fast and you want to iterate all the time, so being automatic is, in my opinion, very, very critical, and this is something that we focus on. And again, making sure you take care of the problems unique to serverless, to provide whatever monitoring you can expect, so you will be ready for any problem that you have. So, this is just the way we see the world of serverless and observability, and I'm happy to hear the panelists, who probably have much more interesting things to say. Thank you.
Shannon: Awesome. Thanks, Nitzan. I really appreciate it. Okay. So, I think it's time to go to the panelists. The first question that I want to ask everybody is really just to give context. We're here to talk about observability and the future of observability and serverless, so I think it would be important for everybody to understand what each one of you thinks observability is. So, if you don't mind spending maybe a minute or two just briefly letting us all know what you think observability is, let's go ahead and start with Yan. [laughs].
Yan: I guess for me, the difference between monitoring and observability is that with monitoring, you can look out for those things that you know can go wrong in your system. But once you know something's gone wrong, how do you then actually figure out what the problem is? This is especially difficult when you've got a massively complicated distributed system, where the end result I can observe could be, say, a user getting a certain response time, but the problem could be hidden in so many different layers in the call chain. Having good observability means that you can ask questions that you didn't know to ask before, and dig deeper into any arbitrary problems that can come up in your system.
Shannon: Nice. Alright. Going to be hard to beat. I'm kidding. [laughs] Not a competition. Corey, your turn. What is observability to you?
Corey: Fundamentally, it's a buzzword to some extent, because if you call something monitoring in 2018, you're going to have a hard time selling it to people who think of it in the same way. I mean, the challenge you see with serverless is that people love conference talks where you get on stage and start talking about how you're breaking a monolith into microservices, because fundamentally you want to turn every outage into a murder mystery. There's no good way to trace this back in the traditional sense. There needs to be a new set of tooling that provides visibility into the ridiculous monstrosity of a labyrinth that you've built, that no one can fit in their head at once. It opens up a whole new realm of, I want to say, failure modes that break in new and exciting ways and cause you to tear your hair out, and always play the game of: is it me? Is it the provider? Is it something else that I can't control? It's just a continuation of the old saw that everything is terrible, but now we have new ways of making things terrible.
What observability brings is letting us know exactly how terrible they are at any given moment.
Shannon: [laughs]. Alright. I love it. [laughs]. I hear you. Ran, what do you think observability is?
Ran: So, I really relate to what Yan said, and I really think that being able to identify problems as efficiently and quickly as possible is the key thing about observability. It's definitely an overused word; you can hear it in security, you can hear it in monitoring, and so on. But in serverless, things have become much more distributed than before, and now identifying issues, which can be a code break or even a performance issue, is getting really hard. It's hard to detect them or to understand the real root cause, and you can't just depend on a single individual function or resource. It's about being able to tell the whole story, the full picture, the bigger picture of what's going on with the application.
Shannon: Awesome. Great. Thank you. Jeremy, what’s observability for you?
Jeremy: Yeah, so I mean, I think observability is sort of one of those things. It is a buzzword; I think Corey is right about that. But it's also an unintended consequence that developers and cloud architects actually brought on ourselves, because we started with these monolithic applications, and something broke along the way: you put an item in your cart, you go to the checkout, you put in your credit card number, it reaches out to the payment processor. It fails. Oh, there's a problem. Everything stops. And then we said, well, yeah, but we want to make distributed systems, because maybe on Black Friday we don't want our payment system overloaded. So, we go and we build these distributed systems where we say, well, let's just record that they tried to make a payment. We'll throw that in some pipeline somewhere. Then you have marketing, who says, "Well, we want to know that somebody tried to buy something, because even if they don't buy it, we still want to know they tried to," and then you've got your analytics team. So, you start sending these events and messages all over the place and it gets more and more complex. There is no longer this clean, linear failure path; it now just kind of spreads out everywhere. So, if you look at how we monitored applications in the past and try to adapt it to what we do now, it gets much, much more complex. To me, the idea of observability is not just being able to see the events flowing through the system, but also being able to do that tracing, do that debugging, and raise the issue of who's responsible, as I think Ran said. Trying to figure out, what do we do? How do we diagnose that problem? And I think that's what observability, or these tools that are being built around observability, give us the option to do.
Shannon: Nice. Well, I'm actually going to stay with you, Jeremy, to kind of extend on that, because in your "Serverless Microservice Patterns for AWS" blog post, from August of this year, you stated that you're a huge fan of building microservices with serverless systems. So, I just wanted to ask you a couple of questions around that. First, tell us why, of course. And second, a lot of people are moving towards microservices for distributed systems. What makes serverless a better choice for these types of architectures?
Jeremy: Yeah, so, I think that microservices is something that I started embracing a while ago. Again, you look at containers, you look at Docker, you look at those sorts of things that give you the ability to start breaking up your application into smaller parts. But the issue you run into is that microservices in themselves are these mini monoliths, right? Even though they may have a number of different functions, and you're splitting your organization along seams, you still have to scale that one unit, and that one unit is often complex. It's got databases behind it, and it has a number of separate things that have to run to make something happen. The problem with a traditional microservice is that in order to scale it up, whether you're going horizontal or vertical, you have to scale the entire microservice. You run into problems there. And again, this is the worst example, but everybody uses it: the image processing example. Say you have a part of some service that has to process a massive number of images. If you don't have a separate image processing service, then to scale that part you have to scale the entire service, which is overkill and, again, potentially a waste of money. It's over-provisioning, and you don't need to do it.
What's really interesting when you start looking at serverless functions is that now you're looking at nano services. Some people talk about nano services being their own separate things, but I look at nano services as being services within microservices. So, you may have a small component of your microservice that needs to process millions of images or something like that. With serverless, those individual functions, the few functions or the single function, and the database and the managed services that are attached to it, can all scale independently of the rest of the microservice. So, now you could have a microservice that is going to scale as much as you need it to, but you'll have individual parts, those nano services, that will also scale. So, you're basically perfectly provisioning your resources, your compute time, and all that kind of stuff is now scaled to what you need. Of course, there are limitations, and there are all kinds of issues that you're going to run into with regard to scaling some of those things, backend services and so forth, and what you might need to do with throttling. But overall, it's a much better approach, in my opinion, than just saying, "Okay, we built this small little monolith that does our billing system, and now if we get a lot of activity on Black Friday, okay, well, now we have to scale it out to 100 containers or a thousand containers or whatever it is." With serverless, you might say, "Well, that service can just scale, because for the actual payment processing function, we can throttle, we can queue, we can do anything we need to do to make sure that we don't overwhelm the backend systems." So, it's just a much better approach, in my opinion.
The other thing that I really like about this is the idea of tracking costs, right? I didn't mention that when I talked about observability, but understanding how much you're being charged to do particular things is, I think, a really important factor that is left out of the equation. So, you can go and see, okay, this function ran, whatever, 20 million times and it cost us this much, and that's fine. But what we're doing at AlertMe is processing articles from publishers' websites. We have a service that goes and downloads those articles, and we pay a certain amount of money for that processing piece. Then we run it through IBM Watson NLU, which is their natural language processing, and that gives us back a number of compute units, so we know exactly how much it costs us to run that through NLP. Then we throw the articles into a queue and run them through another bunch of algorithms. If an article doesn't pass certain thresholds, we have human curators who will actually look and see whether or not it has a high enough confidence rate, and then they'll go ahead and approve or disapprove it, and all of that information is tracked. So, we know that it takes a particular curator 5.6 seconds, or whatever it is, to review a particular article.
So, if you look at it from a traditional approach, we knew how much it cost to download the article. We knew how much it cost for the curator to review the article. We knew how much the NLP cost, because we get the compute units from IBM Watson. But what we didn't know was how much of our server resources we were actually using. Now we can actually say, "Okay, to process this article, this function takes 22 seconds or whatever it is, and then to do this piece of it, it takes this amount of time." While that may seem like getting really, really granular, for us, we base a lot of the things we do on volume. So, we want to know: if we download 50 articles a day from a particular publisher, and X number of those have to be reviewed by curators because they're shorter articles and maybe don't have all the data we need to make a good judgment, how many times per month does that happen? Thirty times per month, whatever it is.
We can actually look at the cost now and say, "This is exactly how much it costs to provide the service, right down to the compute level, for this particular customer." In our circumstances, it makes a lot of sense, because we're working with a lot of large publishers and we have a distributed team, so it's very helpful for us to have that type of information. Now, if you go back to the observability side of things, though, that comes with a cost, right? Because now you've got these individual functions that are running, and if there's a breakdown somewhere in that, then somebody has to go back and look at the logs or try to figure out what exactly went wrong. We get this all the time, where we have bad keywords or bad entities that get extracted. So, being able to go ahead and trace that information is much more difficult in these distributed systems. I don't know if that answers the question.
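As a rough illustration of the per-customer cost attribution Jeremy describes, the arithmetic can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the unit prices, the `ArticleRun` record, and the function names are hypothetical and do not come from the webinar or from AlertMe's actual system.

```python
from dataclasses import dataclass

# Hypothetical unit prices (USD), chosen only for illustration.
COMPUTE_PRICE_PER_GB_SECOND = 0.0000166667  # assumed function compute price
NLU_PRICE_PER_UNIT = 0.003                  # assumed price per NLU compute unit
CURATOR_PRICE_PER_SECOND = 0.01             # assumed curator labor rate


@dataclass
class ArticleRun:
    """One processed article (hypothetical record shape)."""
    customer: str
    function_seconds: float  # total function duration for this article
    memory_gb: float         # memory configured for the function
    nlu_units: int           # compute units reported by the NLU service
    curator_seconds: float   # 0.0 when no human review was needed


def article_cost(run: ArticleRun) -> float:
    """Sum compute, NLU, and human-review cost for one article."""
    compute = run.function_seconds * run.memory_gb * COMPUTE_PRICE_PER_GB_SECOND
    nlu = run.nlu_units * NLU_PRICE_PER_UNIT
    curation = run.curator_seconds * CURATOR_PRICE_PER_SECOND
    return compute + nlu + curation


def cost_per_customer(runs: list[ArticleRun]) -> dict[str, float]:
    """Roll per-article costs up to a total per customer."""
    totals: dict[str, float] = {}
    for run in runs:
        totals[run.customer] = totals.get(run.customer, 0.0) + article_cost(run)
    return totals
```

Feeding it one record per processed article (for example, a 22-second function run plus 5.6 seconds of curator time) yields a per-publisher total that can be compared against volume.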
Shannon: No, it does. It's really helpful. And I'm going to actually ask a question about this afterwards, but does anyone else on the panel want to raise their hand and jump in on this one? I think... Corey, are you itching? Are you? No? Ran?
Ran: Actually, I just want to add to what Jeremy said about the fine-grained scalability: you can pinpoint the function or the method that needs to be scaled, and not just scale a big monolith, which involves lots of procedures and lots of stuff going on. I can say from my experience at Epsagon, we've built some functions that run millions of times per day, and we've got some functions that run 10 times per day because they just generate some reports. So, the ability to scale only those, I don't know, three functions that are handling really high-volume traffic, and to understand and break down how much each one of them costs, is priceless. I don't want to scale some random EC2 instance or any VM that is consuming lots of memory and a lot of CPU without understanding how much each of my resources is handling. So, it makes more sense to be able to get this fine-grained scalability for the resources that need it.
Jeremy: Yeah, and even just a little bit beyond that, too: it's not just the idea of fine-grained, it's also how quickly it happens. I mean, I still have plenty of implementations that I deal with that are using OpsWorks to scale VMs, and if you need to bring a new one online, it has to run through all those Chef scripts and all that stuff to bring it up, and you're looking at, I don't know, six or seven minutes before a new server comes online. And then the problem is that even if you have to scale just barely past that tipping point, now you bring on another m4.large instance or something like that, which is completely overkill for that tiny bit of added traffic that we need to deal with. Whereas with serverless, cold starts aside, which are fairly minimal in most cases, you're looking at three or four seconds max before all of that extra traffic can start to be handled, which I think is a pretty amazing thing.
Shannon: Yeah. Ran, did you want to jump in, or Corey, before I hit the next question? Okay. [crosstalk] [laughs] So, Corey, this is actually a really good segue, because I told you I'd get back to it: on your LinkedIn profile, it says that you're a cloud economist who helps companies with their horrifying AWS bills. And we've also been running into customers that had significant cost spikes based on, like, a single bug. We call it the $50,000 bug. So, not specifically for AWS Lambda, but Corey, in general, what are the best practices to make sure your serverless application cost stays within budget and you don't get an unexpected bill?
Corey: Cloud economist is one of those great job titles because it's comprised of two words that no one can accurately define. Also, when someone walks up to you at a party and says, "What do you do?" and you say, "I'm a cloud economist," suddenly people realize they want to be anywhere but in the conversation they're about to have. So, it really cuts down on the meaningless chitchat. I'm going to talk about cloud economics and even some of the panelists are already asleep. Right now, the way that we're seeing Lambda deployed in almost every company, even in the serverless-first environments, the costs of the Lambda functions, and to an extent the API gateways tied to them, basically equate to a rounding error. Even if they have no instances running at all, data transfer or storage costs wind up dwarfing this by orders of magnitude. If you see a company spending hundreds of dollars on Lambdas, they're spending tens of thousands on other things. It tends to be the sort of thing that today is not a major cost driver. The future of this is fascinating, where you can start attributing cost directly down to components of your applications, and that's great, but the economics of it tend to work out slightly differently. People tend to sit there and do the math out. Okay, with a Lambda function, if I'm running it throughout the entire month, what will it cost me, ignoring the instantiation fees? They try to do an apples-to-apples comparison with EC2 instances, and they put a lot of work into it, and they come out with the wrong answer to a dumb question that nobody's asking. The problem that you see is that you also get a lot of stuff for free with Lambda. You get auto scaling on demand that is perfectly suited. You don't have to consult the bones to wind up calculating out your reserved instance usage for the next three years. You don't have to wind up mucking about with a whole bunch of load balancers and things like it.
It abstracts away a lot of the toil, the tedium that everyone loves to complain about over their eighth beer when we're crying about cloud.
The economic story, as far as what to do if you're going into it right now, is almost irrelevant, in the sense that it's not a major cost driver. As workloads shift, as people get more ingrained in this, that changes significantly, and the best thing you can do today to plan for that far-off glorious future is tag appropriately. Make sure that you have a tagging strategy that works. Trying to go back later and figure out what drove what cost and why everything skyrocketed is challenging. Tag not just on a per-function basis, but on a per-service, per-component basis, so you're able to answer the question later when you suddenly scale massively and the CFO poops an abacus, kicks the door off the hinges, and starts screaming. It's never about the money that was spent. It's about the fact that it was unexpected money. If you want to enrage an accountant, a CFO, an analyst, whatever, start a sentence with the word "Surprise," and suddenly everyone winds up having a very unpleasant day. They don't like changes to anything that impacts what they do, because they're being asked to forecast 18 to 36 months out in almost every case. Doing that with something on demand, that's spiky, and being able to anticipate what that's going to be, is and will continue to remain a challenge. But you can build out a corpus of data historically to show, "Okay, we know that when we have X users, it costs us exactly Y cents to service them." With the serverless things, you don't have the same fudge factor in quite the same way. You know, down to a very small fraction of a penny, what each request costs to service, and the economic model becomes more accurate as a direct result. So "it's going to cost us X dollars" isn't an answer we're ever going to be able to give, but "it will cost us X dollars per customer" starts to be something that informs models a lot more effectively than that.
Then it just becomes a debate over normalizing what a given customer behavior pattern looks like. I mean, the answer of course, long-term, becomes that it's complicated. There is no silver bullet answer to this, and I'm somewhat unconvinced that there's ever going to be a one-size-fits-most answer here, especially since if you talk to five different companies doing interesting things with serverless, you'll find at least 20 use cases that don't look like anything else. People are going in so many different directions with this that right now it's a wide-open field. I think that trying to predict what patterns are going to emerge in a couple of years is a little bit of a fool's errand, but I do know that being able to have a platform that gives you a shared context to wind up doing these things is increasingly necessary, and for better or worse, the native tooling offerings in this space are not super right now.
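The tagging advice above can be made concrete with a small sketch. It assumes billing line items have already been exported into plain dictionaries with a `cost_usd` field and a `tags` mapping; that record shape, the tag key, and the `cost_by_tag` helper are hypothetical illustrations, not any provider's billing API.

```python
from collections import defaultdict


def cost_by_tag(line_items, tag_key):
    """Total cost per value of one tag; items missing the tag land in 'UNTAGGED'.

    Each line item is assumed (hypothetically) to look like:
        {"cost_usd": 12.5, "tags": {"service": "checkout"}}
    """
    totals = defaultdict(float)
    for item in line_items:
        owner = item.get("tags", {}).get(tag_key, "UNTAGGED")
        totals[owner] += item["cost_usd"]
    return dict(totals)
```

A large `UNTAGGED` bucket is the early warning sign here: it is the part of the bill that nobody will be able to explain when costs spike later.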
Shannon: Okay. Anyone else want to touch on that?
Corey: Fight me
Shannon: [laughs]. Alright.
Yan: Not with what you've got behind you. That's not a fair fight.
Shannon: [laughs]. You're scared of the dog?
Corey: That's not an issue. She's not a real dog.
Jeremy: I was wondering how she's staying so still.
Corey: That's taxidermy.
Jeremy: No, but I would say that I think Corey is right. I mean, again, trying to get extremely granular, trying to figure out exactly what everything costs, anything that can help with the models. I think what you notice with serverless is that it is much more of a linear scale as opposed to this stair-step approach, which is the problem that you have with most virtual instances, or even containers to some degree. So, I do think that companies can look at those costs and, again, you know, you don't want to surprise your accountant, as Corey said, but you can predict with some amount of accuracy. It's not, "Okay, well, if we add a new server, we can add another thousand users, and if we add another server, we can add another thousand users." Instead you can basically say, "Look, from a scalability standpoint, this is where it's going to grow." And even better than that is to look at it and say, "I'm a small company. You know, look, I'm working off a thousand-dollar-a-month budget that I gave myself to start this little tiny company," or something like that. You can be very, very prescriptive about what you spend and grow as the demand grows, as opposed to saying, "Okay, well, I've got to get a bunch of services to do image processing, and I've got to get a bunch of services to do ML, and I've got to get all these other things." Now you have this ability to start from these tiny, tiny microservices or tiny nanoservices and grow up, and I think that's an interesting approach, which is one of the reasons I think serverless is very, very good for startups.
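To make the stair-step versus linear contrast concrete, here is a small sketch. The thousand-users-per-server figure comes from Jeremy's example; the dollar amounts and function names are made-up assumptions for illustration.

```python
import math


def server_cost(users: int, users_per_server: int = 1000,
                cost_per_server: float = 100.0) -> float:
    """Stair-step model: you pay for a whole server per started block of users."""
    return math.ceil(users / users_per_server) * cost_per_server


def serverless_cost(users: int, cost_per_user: float = 0.10) -> float:
    """Linear model: cost grows in proportion to the users actually served."""
    return users * cost_per_user


# Compare the two models at a few user counts around the first step.
for users in (1, 500, 1000, 1001, 2500):
    print(users, server_cost(users), round(serverless_cost(users), 2))
```

Under these assumed prices, user number 1,001 doubles the server bill, while the serverless bill moves by ten cents.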
Shannon: Nice. Awesome. Well, thanks, you guys. I actually wanted to jump to my CTO for a minute. So, Ran, you know, we have a lot of experience with serverless at Epsagon, because of course we use serverless, we eat our own dog food. We also have the luxury of speaking every day with people who are focused on serverless development, but in these conversations we don't often see traditional DevOps teams. We're seeing environments where developers are deploying directly to a production environment. So, how do you think operations, DevOps, and R&D teams and organizations will evolve in order to develop and operate serverless apps?
Ran: Yeah. So, I think that it's actually true. We're seeing developers now really focusing on the deployment and doing everything by themselves, and the ops are, like, left out of the party. It's a bit of a weird situation, because you don't want developers to handle the monitoring and, I don't know, all the CI/CD stuff. You want someone else to take care of it, so developers keep their velocity to develop their new business logic and new features and so on. I think what led to this is that, initially, the developers are the ones who brought serverless alive. They're the ones who promoted it at conferences. They're the ones who started using it all over, and they are the ones who led the revolution. And as I said, it seems like the DevOps guys felt, "We're still handling Kubernetes and scaling issues and orchestration and so on," and developers just went down another branch and said, "Hey, we don't need it anymore, we'll just use the better infrastructure." So the developers themselves felt responsible for taking care of the deployment, tooling, monitoring, troubleshooting and so on. And if you look at it from the DevOps side, it makes sense. It's something different. It's not the same thing that they're used to. DevOps itself is a new role. It was born, I don't know, in the last couple of years, and suddenly something new happened, and there is new infrastructure that they need to adapt to, and it's definitely different. I keep hearing the theme of "serverless equals NoOps," which is definitely not true. In serverless there are different ops, I don't know how to say it in a nicer way, but there are definitely ops to handle. It's different, and it's something that hasn't fully arrived yet at operations because, as we see, this whole ecosystem is still growing.
We can see new tooling coming every day, for deployments, definitely frameworks, some for monitoring, for security, and for so many things, even the costs that we talked about previously. It's still being established, and I believe that once everything is fully established and the ecosystem is fully mature, you'll be able to hand over what the developers are doing now to the ops team, to keep things running constantly in production.
Just a few things about what it means to do ops in serverless. So, first of all, it's taking care of the CI/CD pipeline. I know it's pretty simple today to deploy a function, but once you end up with hundreds of functions, you really need to take care of how your pipeline looks, make sure that you're deploying the right version and so on, and work out how to orchestrate this whole thing. And definitely there is the obstacle related to monitoring: understanding that everything is alive and working correctly and that it meets the SLA that you're expecting. Also, we talked about costs, like who is going to monitor the cost. It's definitely not the job of the developer to understand how much a business flow or a user registration costs the company. Someone else will need to take care of it. Ultimately, when you're talking about serverless, it's mostly taking care of hundreds of resources, which are mainly functions but also the queues, the web servers, the database servers, the storage and so on, and someone will need to take care of it to make sure everything is playing like an orchestra, where you've got, I don't know, hundreds of people who are playing something and there's one conductor who makes sure that everything is running properly. So, that's probably what the DevOps role will be, but it will take time. I think that in startups we'll still see developers taking care of it for the long term, and in enterprises we'll start to see the shift over time. Ops will take care of this stuff once it is more mature.
Shannon: Jeremy, you develop quite a bit in serverless and you work as a consultant with a lot of companies that are moving into serverless. What are you seeing? Do you have any responses to Ran's answer?
Jeremy: Yeah. I mean, ops is not going away. Maybe you read my newsletter this morning, but basically, you know, I'm actually doing consulting for a company right now where I'm completing a risk assessment for the infrastructure. The infrastructure is entirely serverless, so I have to go through it, and we have to have disaster recovery plans and we have to have backup plans and, you know, all these different things that have to be in place in order to prove to a company that everything's going to be okay. Sometimes you tell them that everything is serverless, there are no servers, there's no way for somebody to hack into it other than hacking into our AWS account. You know, yes, there are ways you can have security issues, but for the most part a lot of those are mitigated, and so we don't have to worry about failover and backups and some of these other things in the same ways we used to. So, it's sort of a challenge to fill out these risk assessment things, because you're wanting to put N/A in most of those columns. But no, I mean, there's so much that operations still has to do, and it goes well beyond just configuring a couple of things. Yeah, you're not setting up the orchestration anymore, but you're still doing the CI/CD, as Ran said. You're still planning for that disaster recovery. Somebody still has to be responsible for monitoring the system. You know, developers, yes, they're closer to the infrastructure now, but at the same time you're probably not alerting them at two in the morning if something goes wrong, right? So, you sort of have a first line of defense, somebody who can trace those issues and identify which of the hundred microservices the issue happened in, and then potentially wake up the developer. Because, again, everything's moved down to application code, where we don't worry so much about failures at the infrastructure level anymore.
We worry about failures because, you know, some null value couldn't be mapped over, and that's causing an issue with it slowing down the processing of records or something like that. So, that's more where this is moving to. I do think that ops people are going to have to take a step back and say, "How can I help in the development of the application? How can I get closer to, you know, understanding how the code is working and what services it needs to interact with?" But I think, if anything, they'll be more of a sort of conductor or helper in that regard. Ops is certainly not going away.
Shannon: And Yan, I saw you nodding. Do you want to chime in here?
Yan: Yeah. So, I think it's true that at a lot of companies the adoption of serverless is driven by developers as opposed to ops, but I've been at quite a few companies, and in fact at my current company and also at my previous company, a lot of the adoption has been driven by the ops team. They found that they also got this amazing tool that allows them to automate a lot of the things that they had to do by hand, or using very complicated scripts or mechanisms, because now they have all these hooks into the AWS environment, with AWS Config, CloudTrail and a bunch of other things, whereby they can react to things happening in the environment and take better control, using Lambda to automate a lot of things that they had to do on a daily basis as well. So, I've also seen cases where the adoption has been driven from the other side, from the ops team, as opposed to the development team.
Shannon: Yeah. Corey, did you want to jump in on this one?
Corey: Not touching that one with a 10-foot pole. I have loud, angry opinions because I'm an old grumpy ops person who has not quite adjusted to the new state of the world.
Shannon: I just had these nightmares of, like, PHP scripts and, you know, Perl back in the day to manage infrastructure, and I was having palpitations. You know, I'm going to calm down and--
Corey: Yeah, right now and you wake up and then you have to do it.
Jeremy: Well it was much, much easier managing a single Linux server, running a CGI bin on the T1 line that was plugged into your office. That was when days were simpler.
Corey: Yeah. Spoken like someone who was good at things. I don't know what that's like.
Shannon: Me neither. I was terrible. Alright. Okay. Yan, I wanted to segue and follow up with you on the last question, because you speak a lot about the future of serverless. I don't know if any of the attendees watching have, but certainly our team has watched your future of serverless talk quite a few times. So, for the people who are seeing you for the first time, I thought maybe you wouldn't mind talking a little bit about how cloud vendors will evolve in terms of their serverless offerings, and what they're going to do regarding monitoring and tracing. A lot of our audience, of course, is interested in the monitoring and tracing piece. So, I was hoping you could touch on that.
Shannon: Yan, I can't hear you. I think you're on mute.
Yan: Sorry. Up until now I've seen a lot of companies adopt serverless where it turns out to be the first time they've actually had contact with AWS. I find that oftentimes these companies find themselves a bit lost in the whole complexity of AWS, and a lot of the current tools that we have did a good job of catering for this particular market, making things accessible and simplifying things for you. But then again, many of these vendors don't provide enough value-add on top of what I get from [unintelligible 00:43:22] already. Maybe sometimes you get additional data points that I don't otherwise get, but for someone who is already familiar with AWS, they don't really help me out enough in terms of addressing the gaps in what I have from AWS already. And I think as more and more of these customers level up their usage of serverless and build more and more complicated systems, the vendors also need to up their game and cater for more advanced use cases.
I think that's why I like what you guys have been doing at Epsagon, because even early on, when I was speaking to Nitzan and Ran when you guys were starting out, you were really focused on tackling the problem of tracing workflows, as opposed to individual functions. As someone who's been building, I guess, non-trivial serverless applications, that is the kind of problem that I run into all the time, as opposed to what's going on in one particular function. And working in a much larger company now, I think as an enterprise I also have many different use cases for the same tracing information from different angles. As a developer, I'm interested in being able to debug the workflow. As the security team, I'm interested in being able to detect suspicious traffic in my infrastructure, and maybe accurately identify weaknesses such as over-permissioned functions, based on what the function actually needs to talk to. As the infrastructure team, I'm interested in understanding what my infrastructure is costing me and where to focus my optimization efforts, so that I target the parts of my infrastructure that are costing disproportionately more than other parts. And as the product team, I'm interested in understanding the cost of each user transaction, to understand which features are profitable to me as a company and really just how much I should charge for those features.
I think all of those things have this common intersection with observability, and having that visibility into what's going on in my application. I think that's why observability is such an important piece of the puzzle. And, I guess, with some of the existing providers I've worked with in the past, things like [unintelligible 00:45:37] are a very good example, whereby on top of giving you that visibility, they also did a lot to diagnose potential problems before they happen and alert you proactively. That's something I'd really love to see more and more vendors do, so that the tool says, "Oh, you just did a release, and the performance profile for this function has changed. Maybe something has changed that you didn't intend. So, perhaps you should look into that." Or maybe something like, "Oh, the cost for this particular function has gone up since you did a release yesterday. Maybe, again, you made a mistake here." Things like that. I'd like to see the vendors do a lot more for me going forward. Even though, given the right information, I can do it myself, it's just so much easier if someone else does the job for me.
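As a rough illustration of the kind of proactive check Yan is asking vendors for, the sketch below compares a function's 95th-percentile duration before and after a release. The function names, the nearest-rank percentile method, and the 20% tolerance are all assumptions chosen for the example, not any vendor's actual behavior.

```python
import math


def percentile(values, pct: int):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def release_regression(before_ms, after_ms, pct: int = 95, tolerance: float = 1.2):
    """Return a warning string if the chosen percentile grew by more than
    `tolerance` times after the release, otherwise None."""
    baseline = percentile(before_ms, pct)
    current = percentile(after_ms, pct)
    if current > baseline * tolerance:
        return (f"p{pct} duration rose from {baseline} ms to {current} ms "
                f"since the release")
    return None
```

The same comparison could be run on cost per invocation instead of duration to cover the "cost has gone up since your release" case.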
Shannon: I got you. I got you. Does anyone want to chime in on this one?
Ran: Yeah, actually I can jump in. I can relate to your-- I think it was one of your last posts, about percentiles and that we're doing it all wrong, and you mentioned it again right now: observability, or in general the need to understand what's going on in production, can be very different from one role to another in the company. For product, it will be more of a breakdown, understanding what the main business transactions are, how long they take, how many times they occur, and even cost, like how much each one costs. For ops, it's mostly monitoring, and for developers, it's troubleshooting. Some will want a large amount of logs, and some will want only the ratio, to understand whether they meet the SLA. So I can really relate to that, and it's correct. It's starting to sink in for me that you can't have a one-size-fits-all product, because you can't have a single dashboard that will fit all the roles for all the serverless needs. It's more that you need to have an appropriate view for each role.
Shannon: Yeah. Jeremy, I’m seeing your head going up and down.
Jeremy: I'm just agreeing. I mean, I think that's one of the things that serverless gives us the ability to do, especially from an event-driven standpoint. I'm not sure if anyone's familiar with Rob Gruhl at Nordstrom and his idea of the distributed ledger, but it's this fascinating idea to basically say: any event that happens in the system, just capture it, and then decide later on if it needs to go somewhere, or someone else can decide later on if it needs to go somewhere. It's interesting just in terms of, did an event happen, or did something happen as a result of an event? Those are the kinds of things where, if you have this ability to replay those events, you can see that sort of larger picture. It's just an interesting approach, in that we could replay those events and fix errors that happened, or we could go back and say maybe we want to simulate those events to see why this particular function is taking longer to execute now, or whatever.
So I just-- There are a lot of benefits to serverless, and I know I didn't explain that anywhere near as well as Rob does, but there are so many of these benefits that you don't have with a lot of these traditional systems. But that does come down to, like Yan said, whether you can find the right way to figure out where things are going wrong. Just because, you know, maybe a function now takes one second extra to complete every time, that's a bad example, but let's say that's the case: at what point do you notice it? At what point does it become a problem? Is it a cost thing? Is it a latency thing? I mean, where is it? So, the more tools we have, and the more ways and facets in which we can slice and look at that data, the more powerful it becomes for developers and people working in these distributed systems to start pinpointing those problems and optimizing for a solution.
Yan: Yeah, one interesting thing to think about is data science. We've been doing this for a very long time for BI: we collect as much information as we can, user actions and whatnot, and then we store it so that we can ask arbitrary questions by running all kinds of ad hoc queries. We should start doing that for our own systems, so that we can, like Jeremy said, capture every event that is happening in the system, and then go back and ask questions and understand how our system is actually behaving.
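A toy sketch of the capture-everything-then-replay idea discussed above. It is not Nordstrom's implementation: `EventLedger`, its methods, and the in-memory list standing in for durable append-only storage are all hypothetical names chosen for illustration.

```python
import json
import time


class EventLedger:
    """Append-only log of events that can be replayed or queried later."""

    def __init__(self):
        # In-memory stand-in for durable storage (an assumption for the sketch).
        self._lines = []

    def append(self, event_type: str, payload: dict) -> int:
        """Capture an event without deciding yet who will consume it."""
        record = {
            "seq": len(self._lines),
            "ts": time.time(),
            "type": event_type,
            "payload": payload,
        }
        self._lines.append(json.dumps(record))
        return record["seq"]

    def replay(self, handler, event_type=None, from_seq: int = 0) -> list:
        """Feed stored events, in order, to a handler (e.g. a fixed function)."""
        results = []
        for line in self._lines[from_seq:]:
            record = json.loads(line)
            if event_type is None or record["type"] == event_type:
                results.append(handler(record))
        return results


# Usage: capture now, ask an ad hoc question later.
ledger = EventLedger()
ledger.append("order_placed", {"amount": 30})
ledger.append("user_signed_up", {"plan": "free"})
ledger.append("order_placed", {"amount": 12})
total = sum(ledger.replay(lambda r: r["payload"]["amount"], "order_placed"))
```

Because nothing is discarded at capture time, the "total order amount" question can be asked long after the events occurred, and a corrected handler can be re-run over the same history.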
Shannon: Yeah, and Corey? I see.
Corey: Well I'm still hiding here.
Shannon: [laughs] I know. I’m trying to call you out Corey. C’mon. We need the other favorite, you know.
Corey: No, I hear you. The problem that I tend to see, broadly, is that people are in some cases diving into this without really having a conception of what that means. Whether that means in terms of economics, which, okay, fine. That's where I live. That's not the interesting part of the story. There's the point of not knowing what that's going to mean for their architecture, not knowing what it's going to mean from a staffing perspective. "We have problems with 50 developers all clobbering each other, so now we're going to go ahead and build out a microservices environment," and they still wind up with a world of people who don't know how to communicate, because this is a political problem more than it is a technical one. These are still very early days. These are exciting times, and, to make this more timely, we saw recently that Lambda has extended its run time to 15 minutes, which is great and awesome. Don't get me wrong, but that enables three really interesting and helpful use cases and millions of terrible ones. So, by and large, if you're extending the run time of a Lambda function past five minutes, you're almost certainly doing it wrong.
There are exceptions to this, but as a general rule, don't go down that path unless you enjoy pain. At that point it's, "Oh, if we can get it to 45 minutes, we can shove our whole monolith into a Lambda function." Don't do that. I don't want to see a COBOL runtime. There is no AWS/400, so let's act appropriately here. Let's be responsible with the tools we're given to work with, and that's a challenge because, again, going back to an earlier sentiment, you don't very often have people who have grown up doing operations and then pivoted over to this. The lessons you learn from working in operations come in the form of scars, where we did this thing, and everything caught fire, so we know next time not to do it that way. We set fires in new and exciting ways, but those hard-won lessons are more or less being dropped by the wayside.
So painful lessons around things like caching, for example, tend to wind up manifesting in new and exciting ways with Lambda. There are fun account limits you forget about until you actually smack into one and your site breaks. There's a whole list of learning opportunities here. So, even approaching this from the perspective of, "Oh, we can fire all of our hideously expensive ops people," you still need someone who understands the principles of how economics-- Not economics. How operational environments work in this context, and that's something that I think is somehow falling by the wayside, messaging-wise. There we go. Thought completed.
Shannon: Nice. So, I think that's all the questions, unless anybody else wants to jump in on this one again. We got about seven minutes left and I promised people that I would let them ask questions. So—
Corey: Also noted. You know what your problem is.
Shannon: [laughs]. Thanks, Corey. I love you. Is there anyone watching, any attendees, who would like to ask any specific questions?
Corey: Fight me.
Jeremy: Maybe while we're waiting, I'll jump in on the 15-minute Lambda thing. I actually think that the idea of letting a serverless function run as long as it needs to is a really good idea, because of where we're going. I think you see this with Fargate and a couple of these other things, and of course, if anyone's familiar with Serverless Framework V2 that's coming out, you can actually now launch a function either as a Lambda function or as a container, and there's a blurred line between the two. I can see Corey shaking his head there, but the blurred line between the two is that, essentially, if you've got functions that need to run for a longer period of time because they're doing some sort of more complex level of processing, it doesn't have to be a monolith. I think there are examples here where that function will need to run for a longer period of time. Maybe a container is a better place for that, but the idea of being able to spin that up as quickly as a Lambda function can, I think, is a powerful concept.
Corey: Oh, you could shove an entire container into it and run indefinitely. Great. But in its current incarnation, if you have indefinitely running Lambda functions-- You ever leave the oven on and freak out and go back home to make sure it's off? Picture that with 10,000 Lambda functions left running concurrently. Yeah. There's no great cost or economic story right now around Lambda if you give it an indefinite running period. Oh boy, does that change. And it's like, "Wow, is that our phone number? No, that's our bill for the day. Oops." Yeah, we're adding commas at an unsustainable rate, and that winds up being a problem for people. So yes, there is a neat story around being able to do that, but I think it is also a regression in terms of how you wind up thinking about this. Something like being able to hand off to Fargate and blurring the line between those two, I'm completely on board with. To some extent, I think the more innovative aspect of this is the event model, as opposed to going down the path of just, "Ooh, I just throw code over the wall." That in some ways is the more compelling story, but I'm not sold on letting these things run forever, because again, there are three to five use cases that are super handy for that and the rest are terrible. "I'm going to run WordPress in a Lambda function." Not Lambda functions, plural. A single Lambda function.
Jeremy: Well, yeah. I mean, you still have the event model, though. That's the biggest thing: it's got to be triggered by an event. So, it's not just sitting there responding to ongoing events, because then you're getting into state maintenance and all that kind of stuff. So, I do think there's a big difference there. I'm just thinking about things that need to process, and I use the example in my posts of maybe video transcoding or something like that. If that is a, you know, 25- or 30-minute process, and you can do that by triggering the compiled code that does that for you, and you can trigger 50 of those at a time if you need to, or a thousand of those at a time if you need to, as opposed to scaling up more servers or scaling up containers or doing something like that, I think that's a really good use case. And I don't like putting limits on things just because somebody may misuse them or abuse them. I think that you need to give people certain freedoms, and a lot of the criticism of serverless comes from people saying, "Well, why only five minutes? Why do I only get, you know, three gigs of memory? Why can I only do this? Why can I only do that?" I think that hurts the adoption to some degree. I'd like people to say, "Oh yeah, I'm going to put WordPress in Lambda," and then let them realize that's a terrible idea. But for the people who need that extra compute, who need those extra capabilities, don't hamstring them, because I think there are plausible use cases that it would make sense for.
Corey: Sure, and to some extent, I think that the narrative for building that is something you want to gate. We saw this with S3 for a while, where you were a checkbox away from granting access to "any authenticated user," which meant globally, not just in your account, and we're still dealing with the fallout from that years later. It's one of those things where, yes, there are valid use cases, but by the same token, you're burying a landmine for people to step on, and that's not going to be a great feeling from a customer perspective. I think having to reach out to get it unlocked for your account, and then signing a waiver, might wind up being something that works a little bit better, but you're still going to see people who don't fully grasp this, and they learn how to fill out the form so it gets approved, and it winds up becoming a nightmare. I think that a few stories like that resonate a hell of a lot louder than the interesting capability stories. One bad experience outweighs 10 good ones, if you take a look at how people wind up viewing these things. From a pure engineering capability story, I agree wholeheartedly with what you're saying. It's wonderful. I'm just thinking of the human element as we take a look at the varying levels of technical sophistication of many AWS customers.
Ran: Which is why we need ops people, because developers need to be saved from themselves sometimes. Not all of them.
Shannon: Unfortunately, I'm going to interrupt on that, but I wholeheartedly agree. I wholeheartedly agree. We need ops people. I think that's a good thing that came out of the panel. Nobody has questions, but I do want to give each of you the opportunity to say something. There may be somebody here who's just getting started with serverless. If you want to, give one or two quick examples of what's, you know, a good application for serverless, or a quick sentence or two of advice. I'm going to go around the panel, and I'll hit Nitzan as well, and then I think we're going to sign off. So, Jeremy, you're up. Do you have anything to say to the serverless newbies out there?
Jeremy: Yeah, I mean, I love serverless. It's a complete change, it's a paradigm shift. It's sort of a mind bend sometimes to think about events and how those cascade and how that works. So, it's a lot different than working with a monolith, or even, to some degree, individual microservices. How functions communicate with one another is a lot different than how subroutines can easily be accessed and shared within a monolithic application. So, my advice would be, you know, really think about the smallest unit of compute that you can do. What has to happen to this event? And really get used to thinking about events. So, an event comes in: a file is uploaded, a user makes a request to an API gateway, or something like that. What is the smallest amount of compute you have to do on that event to create something useful as a result of it? If you can think in those terms, then you can start to experiment. I am not in the camp that says never have, you know, one function respond to several API routes, for example. I think there are use cases where that makes sense. Think about scaling, and always build all of your Lambda functions n+1. Always assume that these things will scale up. Never build it in a way that will constrain it to only run at low concurrency.
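As one way to picture "the smallest unit of compute" for the file-upload case Jeremy mentions, here is a sketch of a Lambda-style handler for an S3 upload notification. The `Records[].s3.bucket.name` and `Records[].s3.object.key` fields follow the S3 event notification format; the handler name and the shape of the return value are assumptions for illustration, and the handler deliberately does nothing beyond describing what arrived.

```python
from urllib.parse import unquote_plus


def handler(event, context):
    """Do the least work needed per event: note which file arrived and where.

    Holds no state between invocations, so any number of copies can run
    concurrently without coordinating with each other.
    """
    uploaded = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        uploaded.append({"bucket": bucket, "key": key})
    return {"processed": len(uploaded), "files": uploaded}


# Local usage with a hand-built event (bucket and key are made-up examples).
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "example-uploads"},
                "object": {"key": "photos/my+cat.jpg"}}}
    ]
}
result = handler(sample_event, None)
```

Anything heavier, such as resizing the image, would be the job of the next function triggered by the next event.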
Shannon: Awesome. Ran, you're next.
Ran: I would say, from my end, it's great to start by picking a random tutorial and using Lambda, but if you're intending to go fully serverless, read some thorough tutorials. I know that Jeremy and Yan have some nice ones. Go through them and understand what it takes to really have a serverless application in production, and not just a random function, before you dive in. There are lots of caveats that you need to understand before you get into it, like mono-repo, how to monitor, how to detect cost issues, how to know about performance, and so many other things that you need to be aware of before jumping in, because it's not shallow water. It's very deep, and you want to know what you are getting into.
Corey: I would say that when you're starting out, take a look at things like the Serverless Application Repo, look at GitHub, for patterns similar to what you're trying to do. And if you can't find any similar patterns, jump into a public forum, hire a consultant (not me, that is not my field), and have someone take a look at what it is you're doing. If the response is a bunch of people looking at you like this [..], then maybe stop and think, "Is this an appropriate use case for it?", because most patterns that we've seen have evolved to the point of relative stability. If you've followed one of those, you're probably on the right path. If you're building something new, it's possible you've discovered something new and awesome. It's also possible you're about to make a grievous error, and you can save a lot of pain by talking to people who will shriek and then help you find a better way for that to go. One of the things I love about the entire serverless community right now is how open and willing to pitch in people are, even if the value of their entire contribution is "NOOOO!!!!".
Shannon: Thank you! Nitzan.
Nitzan: Honestly, I think that you've said all the important things. I think that today, if you're doing something new, you probably should do it in the cloud, and if you do it in the cloud, you should probably think if it should be serverless, and usually the answer is yes. So I think "serverless first" is the way to go today.
Shannon: Spoken like a true serverless fan.
Corey: One other comment to weigh in on that as well: it does save you a lot of time going serverless with the second application. Expect the first one to be a learning curve, or a learning cliff, as the case may be. I wound up figuring that out when I spent two weeks writing a very simple application, and sobbing. Just expect there to be a little bit of learning to think in new ways.
Jeremy: Yeah. I've re-written serverless applications. There is one application that I think I've re-written maybe six times at this point. And it's quick, because of the quick functions, but it's just looking at it from a different approach and choosing different things. It is definitely a learn-by-doing situation in any case.
Ran: Sorry, I must jump in as well. At Epsagon, too, we changed the architecture at least three times, like completely removing a hundred functions and deploying a hundred new functions. Because, as time goes by, you understand how it's best to construct such an application, how things should communicate with each other. It's something you get from experience. Hopefully someone will be able to put together a tutorial, but it's more of an experience thing.
Shannon: Yan, I'm gonna end with you, our local serverless hero. So, what's your advice for someone new who's starting out?
Yan: I guess most things have already been said. I guess that, like Corey said, don't expect a silver bullet. Serverless solves many problems, but it comes with its own caveats. And yet, when you're starting out, just try to read as much as possible, from many people here on this panel, including Ran, and Nitzan, and the others at Epsagon that provide good content for the community, and other people as well, such as Tom McLaughlin, and PureSec as well have great content. Definitely spend the time to learn and don't just jump in and expect magic will happen.
Shannon: Yeah, thank you. And I think with that, we are going to end, but before that, first - thank you so much! To the panelists, I learned so much today, I am just humbled by having you here today, I really appreciate it. I do suggest for everybody who is an attendee - each one of these gentlemen has a ton of content. I'd get their emails. Even if you're just starting out, or if you're trying to go from just one application to, you know, advocating inside your enterprise for serverless. Working with people like Corey, and like Jeremy, and like Ran, and like Nitzan, and like Yan, and working through their experience and their strengths helping other customers grow their serverless footprint, those experiences will help you. So, we put their Twitter handles in the chat, we put their websites in the chat. I do suggest certainly subscribing to their emails, and thank you. Well done. The other thing to say is that we did record this session and we will be posting it and sending you information. So, everyone, thank you so much! I really appreciate it! Bye guys.