deploy-your-own-saas, How do you cut a monolith in half, #define CTO, Everything’s Broken Everything’s Too Complicated
The scryer looks at you with sympathetic eyes.
"If 'the cloud' is just someone else's computer so let's go with mine then."
A list of all sorts of self-hosted alternatives to SaaS products such as Netflix, Evernote, Dashlane, Google Analytics, Mailchimp, … It's interesting if only to see how diverse the options are!
Performance isn’t easy either. You don’t want queues, or persistence in the central or underlying layers of your system. You want them at the edges.
“It’s slow” is the hardest problem to debug, and often the reason is that something is stuck in a queue. For long and short-lived tasks, we used back-pressure to keep the queue empty, to reduce latency.
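Here is a minimal sketch of that back-pressure idea, assuming a single in-process `queue.Queue` stands in for the queue between clients and workers. `submit`, `Overloaded` and the `MAX_PENDING` bound are hypothetical names chosen for illustration, not taken from the article. The point is only that a bounded queue refuses new work as soon as it is full, so the caller at the edge decides whether to wait or retry instead of a backlog silently growing in the middle.

```python
import queue

# Hypothetical bound: a small limit keeps the queue close to empty,
# so latency stays predictable instead of growing with the backlog.
MAX_PENDING = 4


class Overloaded(Exception):
    """Raised to the caller when the queue is full (back-pressure)."""


def submit(q: queue.Queue, task: str) -> None:
    # Refuse work immediately rather than letting it pile up in the middle
    # of the system; the caller decides whether to wait, retry or give up.
    try:
        q.put_nowait(task)
    except queue.Full:
        raise Overloaded(f"queue full, rejecting {task!r}") from None


if __name__ == "__main__":
    q: queue.Queue = queue.Queue(maxsize=MAX_PENDING)
    accepted, rejected = [], []
    for i in range(6):
        try:
            submit(q, f"task-{i}")
            accepted.append(i)
        except Overloaded:
            rejected.append(i)
    print("accepted:", accepted)
    print("rejected:", rejected)
```

With a bound of four and no worker draining the queue, the first four tasks are accepted and the last two are refused straight away.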
When you have several queues between you and the worker, it becomes even more important to keep the queue out of the centre of the network. We’ve spent decades on tcp congestion control to avoid it.
If you’re curious, the history of tcp congestion makes for interesting reading. Although the ends of a tcp connection were responsible for failure and retries, the routers were responsible for congestion: drop things when there is too much.
The problem is that it worked until the network was saturated, and similar to backlog in queues, when it broke, errors cascaded. The solution was similar: back-pressure. Similar to sleeping twice as long on errors, tcp sends half as many packets, before gradually increasing the amount as things improve.
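The "send half as many, then gradually increase" behaviour is usually called additive increase, multiplicative decrease (AIMD). Below is a deliberately simplified sketch. It assumes a window counted in packets per round trip and at most one loss signal per round trip. Real TCP congestion control (slow start, fast recovery, per-ACK growth) is considerably more involved. `aimd_step` and `simulate` are illustrative names only.

```python
from typing import List


def aimd_step(cwnd: float, loss: bool, increase: float = 1.0,
              decrease: float = 0.5, floor: float = 1.0) -> float:
    """One round trip of additive increase / multiplicative decrease.

    On loss the window is cut (halved by default, as the text describes);
    otherwise it grows by a fixed amount per round trip.
    """
    if loss:
        return max(floor, cwnd * decrease)
    return cwnd + increase


def simulate(losses: List[bool], cwnd: float = 1.0) -> List[float]:
    """Return the window size after each round trip in `losses`."""
    history = []
    for loss in losses:
        cwnd = aimd_step(cwnd, loss)
        history.append(cwnd)
    return history


if __name__ == "__main__":
    # Made-up loss pattern: seven clean round trips, one loss, three clean.
    pattern = [False] * 7 + [True] + [False] * 3
    print(simulate(pattern))
```

With that made-up pattern, the window climbs to 8, is halved to 4 on the loss, and then starts growing again. This is the sawtooth that lets many senders share a saturated link without collapsing it.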
[…]
Pushing work to the edges is how your system scales. We have spent a lot of time and a considerable amount of money on IP-Multicast, but nothing has been as effective as BitTorrent. Instead of relying on smart routers to work out how to broadcast, we rely on smart clients to talk to each other.
Pushing recovery to the outer layers is how your system handles failure. In the earlier examples, we needed to get the client or the scheduler to handle the lifecycle of a task, as it outlived the time on the queue.
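Here is a sketch of recovery living at the edge. The client owns the retry loop, sleeps twice as long after each failure (capped at a maximum), and decides when to give up. It assumes the operation is safe to retry (idempotent). `call_with_backoff`, its parameters, and the injected `sleep` hook (which makes the delays observable in tests) are hypothetical and not taken from the article. Production code would usually retry only on specific transient errors and add random jitter to the delays.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(
    operation: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying from the client side with exponential backoff.

    The caller, not the queue in the middle, owns the task's lifecycle:
    it decides how long to keep trying and when to give up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error to the caller
            sleep(delay)
            delay = min(max_delay, delay * 2)  # sleep twice as long next time
    raise AssertionError("unreachable")


if __name__ == "__main__":
    calls = {"n": 0}
    waits: list = []

    def flaky() -> str:
        # Hypothetical worker call that fails three times before succeeding.
        calls["n"] += 1
        if calls["n"] < 4:
            raise ConnectionError("worker unavailable")
        return "done"

    print(call_with_backoff(flaky, sleep=waits.append))
    print("waited:", waits)
```

Because the waiting happens in the client, a struggling worker sees fewer requests rather than a growing queue. This is the same back-pressure idea as above, applied at the outer layer.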
[…]
Be warned: A distributed system is something you can draw on a whiteboard pretty quickly, but it’ll take hours to explain how all the pieces interact.
An article about communication between the services of a single system. The author first lays out the bad practices, then recalls how old problems were solved in distributed systems. It's packed with good guidelines.
I joined Stripe as an engineer in 2010. I began by working on the backend infrastructure: designing the server architecture, creating our credit card vault, and producing internal abstractions to make people’s jobs easier. I loved writing code, but I also spent a bunch of time on other things: figuring out our recruiting program, shaping the culture, or making our first T-shirts (which have been banned since we hired our first designer). I wasn’t doing these things particularly because I preferred them to coding: instead, I had a very strong vision of the environment I wanted to be a part of, and I was willing to go out of my way to make it exist.
As time went on, I accumulated more and more responsibilities which were not strictly writing code. As Nelson Elhage liked to put it, my job became full-time “early employee”. My days were filled with writing cultural guides, acclimatizing new people, running our recruiting program, and the like. I’d often think it was time to give up on coding altogether, but I somehow always found a way back to it.
About a year and a half ago, we officially declared me CTO. It was really just putting a word to what I was already doing: the most common reaction was “wait, I assumed Greg was CTO already”. This post is the story of what happened next: finding a partner to build our engineering team with, and figuring out my role as the organization changed.
The story of a CTO looking for his place. But at heart, I think the story and the questions it raises could apply to any technical management role.
It’s not hard to understand either why the hacker-types who spend their days and nights in front of a computer and who make all the decisions about what tools will run your thermostat and phone love it, or why everyone else thinks it’s not for them.
If you can't be bothered to read it all, you can skip the first part and start reading from “Not an exception”. To sum up the first part: the author describes the trouble he had taking a remote exam for his Azure certification, and all the complexity and technical incidents involved in getting through it. The second part sums up the setbacks, which you'll recognize even outside exam systems, and that is the real subject. Systems and applications keep getting more complex, but unfortunately not because the services they provide have become more complex. I think this trend is even more damaging in public systems, such as tax filing.