November 1, 2014
In 1975, Frederick Brooks wrote a book on software engineering that is still applicable today in 2014! That book is called The Mythical Man Month and I found myself relating to many of the software-related scheduling and planning issues Brooks encountered almost half a century ago. Below are some of the key ideas I found the most compelling:
Programmers are naturally optimistic and programming is a tasks that lends itself to optimism.
Brooks argues “All programmers are optimists… Perhaps the hundreds of nitty frustrations drive away all but those who habitually focus on the end goal.” which leads to a false assumption that engineering tasks will take only as long as it ‘ought’ to take. However, any software development effort usually consists of many tasks chained end-to-end. The probability that every single one of them will go well is almost zero considering the perfection necessary as a programmer. Considering the volume of nitty frustrations I encounter everyday I definitely relate to being an optimist and have observed others and myself misjudge how long things will take due to this optimism.
1/3 planning 1/6 coding, 1/4 component tests, 1/4 integration tests
From my anecdotal evidence, this breakdown for how software time is spent is spot-on. Almost all of the engineering tasks that have been underestimated at our company have not taken into account the amount of time needed to properly test and integrate the system before it is production ready. Much of the reason is because we (the engineering team) are transitioning from “startup-mode” where we didn’t need as adequate testing because we had fewer customers. More customers find more bugs, so our acceptable threshold for stability has increased. And with it, the effort spend testing. I’ve just recently started to integrate testing into my estimates and so far both quality of product is higher, and target completion dates are more accurate.
**The Mythical Man Month: More people doesn’t equate to faster completion. **
Consider the following 2 graphs:
The first one shows just how difficult it is to maintain communication among more than a few people. The second illustrates how many months a project will take given the number of people on a team. Organizing work around a complex task is difficult with more people involved. But just how much more difficult did not dawn on me until I saw it visually. This bolsters my belief that features should be owned by two people or less who serve as a hub for collecting the knowledge necessary.
The Second System Effect: The second system you build will tend to be overengineered due to pent up desires
We are in the beginning stages of building parts of a second system, so we have not witnessed this yet. But after reading this chapter I will be more vigilant to make sure every part of a spec has solid business value, and to watch out for costly unimportant components.
Better to extend the schedule than release a half baked product
Brooks uses the analogy of an omelette as a delayed software project. You can either spend more time cooking it properly, turn off the heat and serve it raw, or turn up the heat and burn it. However, from my experience it is actually more effective to use less egg from the start. I actually disagreed with this point because you can better hit a target by removing non-essential functionality from a feature early on in the planning stages, which is a better alternative than extending the schedule or releasing a bad product.
Conceptual integrity is the most important consideration in system design
“It is better to have a system omit certain anomalous features and improvements, but to reflect one set of design ideas, than to have one that contains many good but independent and uncoordinated ideas”, Brooks comments, “[the] ratio of function to conceptual complexity is the ultimate test of system design”. This definition is great because it describes what distinguishes good code from bad. It also helps in clarifying the objective of certain processes we have at the office, like code reviews and pair programming. By collaborating and reviewing each other’s code, we can hold each other up to high standards and maintain conceptual integrity.
An interesting example they brought up of conceptual integrity was the WIMP (windows, icons, menus, and pointing) interface of the modern GUI. I never much thought about it until now, but on further inspection, it is incredible how much can be done on a modern OS with such a simple concept (compared to typing commands in a terminal, as was computing before the GUI).
Documentation is an essential tool that can be the difference between catastrophe and success
A couple of months ago we started instituting a process where features are planned out using spec documents. It was a process modeled after what we were already doing informally: putting together a rough outline of how we were going to build things out so that we could get feedback on it. Over time, we’ve seen these documents come in handy, but only if the document has an owner, and only if effort is put into it to make it the canonical source of truth for anything related to the feature. This requires careful effort in not just making sure the spec covers all the details, but also in writing it such that it is easy to digest. I think the ability to write organized prose is undervalued among technical people, as this is essential in making sure a spec document delivers value.
People don’t set targets or write specs if they feel the organization will not see them as tentative.
I liked this note because it makes sure we understand that specs are living documents and never to expect them not to change over time.
Members of the team need to strive to be flexible because change is the only thing that’s guaranteed.
“Structuring an organization for change is much harder than designing a system for change. Each man must be assigned to jobs that broaden him, so that the whole force is technically flexible. On a large project the manager needs to keep two or three top programmers as a technical cavalry that can gallop to the rescue wherever the battle is thickest.”
I truly believe that the last sentence applies to each member of the current engineering team and our aspiration is to make sure every member can be part of the “cavalry”. It helps define who we are looking for technically as well, since we expect every member of the team to respond fast to changes in requirements, fast in both communication, understanding, and implementation.
Program maintenance is 40% more than cost of development
Cumulatively, I’ve spent a few months out of this year working purely on regressions and bug fixes. We need to always remember that maintenance is far larger than development. It is especially important as we build out features and make choices about technical debt and how much effort we’re going to spend time on testing. Because a couple days spent on testing can save us weeks of time fixing bugs. And it results in happier customers!
Bugs will naturally scale with time and customers
The more time customers spend with a product, the more bugs they’ll find as they bump into edge cases. I’ve experienced this firsthand as well.
Tooling - make effort to share and find tools. Unified toolsets can boost productivity.
We definitely embody this both on the engineering and customer happiness teams at our company. Although, as we’ve grown it’s become more difficult to get tool usage to be adopted company-wide. Finding and adopting great tools is something our company culture promotes.
Disastrous schedule slippage happens one day at a time.
The takeaway for this point is that it is essential to recognize slippage faster and communicate it clearly. One thing that I have found works is setting more granular targets that allow for more segmented estimation. Targets that are 1-2 weeks are less likely to be totally derailed than targets that span multiple months.
Milestones need to be concrete and defined with ‘knife edge’ sharpness. On the flip side, fuzzy milestones are actually millstones that grind down morale
I have seen this first hand, but wasn’t able to pinpoint exactly what was causing the problem. I am glad to see the idea expressed in a way that presents the root of the problem clearly. Milestones need to be concrete. This is where having a test suite comes in handy, because tests either run green, or they don’t.
No silver bullet, software is inherently complex and no management or process changes can improve the inherent complexity.
Closest thing to silver bullet is to buy not build
We’ve been lucky to have the budget to buy and not build, and personally my preference swings towards buying instead of building for components just because maintaining your own system is expensive! I found it fascinating to hear someone from 40 years ago, before the existence of SAAS, say the same thing.
In conclusion, I think the book had a plethora of wisdom, a good amount of truisms, but allowed me to better form a framework that our existing engineering processes can call on for justification. For example, pair programming doesn’t just make our code vaguely better, it establishes a consistent conceptual integrity. Why do we set concrete targets for ourselves? Because disastrous schedule slippage happens one day at a time. And how about removing that feature from Tint 2.0? Because of the Second System Effect! The reason why this book is timeless is because it’s about people, not software, and as long as writing software is complex, people, not computers, will be the ones dealing with the complexity.
October 3, 2014
I’ll be the first to admit my failures. But I hope that also means that I’ll be the first to learn from them. Over the past year, I helped write a backbone app that started out as a free widget to powering a display in Times Square. The rapid growth in customers and increasing demand for key new features helped accelerate an already growing amount of technical debt. As we tacked on new feature after new feature, things got complicated. Eventually, we reached a point where we all agreed that a refactor was in order.
We chose to refactor using Marionette because we were already familiar with Backbone’s patterns and figured that it would be an easier learning curve. Sure enough, after 3 weeks using Marionette to refactor Tint Analytics, we’ve gotten up to speed. We have identified some key Backbone mistakes we made that Marionette helps us handle. Here’s a list of Backbone pitfalls and how Marionette works to help us avoid them.
Views containing too much logic
In Backbone, there are only 2 kinds of objects: Models and Views. Models help connect to the API and serve to maintain state. Views do everything else. As you can imagine, this can lead to Views which start out small, but quickly grow. And since the library doesn’t have any established pattern on how to handle composition, it’s easy to have Views grow to an unmanageable size. Marionette helps us by extracting Views into Controllers and Views, and encouraging highly composited architecture with the idea of Layouts and Regions. The Marionette Controller is in charge of view initialization and communication between subviews, acting as a Mediator.
Not having enough Modularization
This point also ties in with the above point of views containing too much logic. Because it requires a lot of boilerplate in Backbone to create composite views, it is too easy to hold off on creating lots of small views composed to create the larger overall view. Marionette helps by extending Backbone Views into ItemViews, CompositeViews, and LayoutViews. Marionette automatically takes care of accepting collections and iterating through models to create ItemViews, reducing the cost of composition and increasing modularization of view code.
Forgetting to unbind events causing unexpected behavior and memory leaks
Backbone has no pattern or tool to help get rid of zombie views. Instead, it relies on developers to come up with their own solutions to unbind events. I relied on the BaseView technique to make sure events were being unbound, but I always thought it was a little goofy for the library to not handle this automatically. Luckily, Marionette does. Marionette Controllers, Views, and Modules have built in functionality to automatically unbind events. Yay!
Interdependency through global variables
One conundrum that I encountered while building our large Backbone app was how to handle multiple views that share a model, or a view that needs to reference another view’s model. I eventually ended up having a couple global state variables. The problem was, there was no way to figure out the parts of the code that were manipulating or reading the global. In addition, it was easy for the Backbone model and the global variable to become out of sync. Oy!
Entities to the rescue. Marionette Entities are an additional abstraction that allows you to have clearly defined entry and exit points for your models, making them all both globally accessible and also well defined and easily debuggable. It also lets you easily implement functionality like making a model getter a Singleton or having custom model initialization. The best thing is that the View considers the Entity a black box and communicates to it via messaging, reducing unnecessary coupling.
A giant router with all module initialization
Almost every sample app I’ve looked at for Backbone has a single router file. For simple applications, this works fine, but for larger scale application, the router can grow to be unmanageable. A large router is hard to read and maintain because it often times will be responsible for initializing unrelated Views and Models. Marionette helps solve this by distributing routing and initialization among marionette controllers. It allows you to define Model and View instantiation where you can find it later.
Overall, the codebase is looking easier to digest, although we still have much to learn. By not making the same mistakes we did above, you too will be able to avoid code hell, and instead, see something like this:
August 14, 2014
An open letter to the tech workers of san francisco
Dear fellow tech workers,
“We interviewed a senior being evicted from their home in the Mission who said, ‘Google is Hitler’. What would you say to that?”
An interviewer from TechCrunch asked me this question a month ago. The question didn’t surprise me, even though it should have. It seems like that’s all that’s been in the news lately.
The same week, I went to a Youth Speaks poetry slam with Monica. This was the first poetry slam that I’ve ever been to, and I was excited to hear youth speaking out about the issues they hold dearest. The event was fantastic and it inspired me to see youth cultivating their creativity.
But it wasn’t long until a slam came up accusing “toxic” tech workers of ruining the city:
Link to video
On valencia now that's all you see. It's spreading. Like an airborne toxicity. And that's exactly what I mean, it's a toxic city. So they force us out. Both young and old. Raised up the cost of living, no rent control. So if we can't afford to live our only option is to die or move out to Tracy or Antioch like a couple of my guys. While I'm in my city, they're out in the burbs. Not to mention that Twitter and Google are too strung up for words. They're speechless. Denying the fact that the only ones who can afford to live here now are the ones that are Google bussed in. Like they're employees from the mystical wonderland called the valley of silicon. It's really damn sickening, and I'm a 19 year old mother f**cking San Franciscan, damn. - Jerome Robles-Reyes "In My City"
It seems, from all fronts, that the city hates tech workers. Even SF Streetsblog, a blog I hold near and dear as a daily cyclist, declares the tech community as a monoculture that “blames those less wealthy for their own problems”.
Monocultures serve no one, including those whose culture takes over. - Fran Taylor [SF Streetsblog](http://bit.ly/1Ad0qEo)
From these articles, I should be ashamed. I should move back to where I came from. I guess that would be Indiana.
But I’m staying in San Francisco. The solution to evictions is building more housing. But building more housing isn’t going to conquer the root problem which is the animosity many native SF’ers have against people who work in software.
Instead of leaving, I’m going to see all the hate as a challenge to become a better member of the local San Francisco community. I think as tech workers we can make a big difference in public perception with consistent, everyday steps that any techie is capable of doing. You don’t need to be a community organizer to make things happen. A community is just a bunch of ordinary folks having relationships with each other.
I did some research, and apparently there are 20 ways to not be a gentrifier as described by local paper Oakland Local. It inspired me to make a list of my own:
Go get a haircut at a local barbershop or hairdresser (price must be < $15 (guys) or < $30 (gals)). Talk to your hairdresser. Talk about the car accident that happened down the block last weekend. Talk about the traffic issues from Outside Lands. And listen. Learn what’s on the mind of folks in the community.
Read and talk about local news. Be aware of the pulse of the city and about what’s affecting everyone, not just the software industry.
Get involved in local volunteerism. - This summer I helped Doug, a local SFUSD high school teacher, in an externship hosted at Tint. He learned technical skills with us that he can bring to the classroom in the upcoming school year. This fall, I hope to mentor local high school students so they too can learn how to write code. There are lots of resources for you out there, you just have to look! For starters, check out SF Citi or Mission Bit.
Participate in local art. It could be as simple as going to a poetry slam or an art walk, or go even further! My friend and colleague Brandon is a great example for this. He’s working with a local organization called Clittorati on the Vulvatron. What could be more SF than a visually iconic mobile art piece, empowering women, goddesses, and the feminine identity?
Don’t talk down to people less fortunate than you - I once met a fellow tech worker who condescendingly referred to the 38 as the ‘dirty eight’. As someone who rides the 38 every day, it made my blood boil to hear that comment. I finally knew how it felt to hate techie outsiders. Don’t reinforce negative stereotypes.
These are just a small subset of the many things that can be done to cultivate a community and dismantle the image of the evil techie outsider. But, the biggest change that anyone can make is to treat everyone from all walks of life with respect. Even with the fairest of intentions, it’s easy to seem condescending to outsiders, so it’s our responsibility to think and act actively to participate in the community.
May 3, 2014
6 months ago, Nikhil and I were the only developers at our 4 person startup. With business growing steadily, we were so spread thin that there was no hope of improving our product if we didn’t bring more help onto the team. So, Nik and I put our recruiting hats and began our journey to find talented engineers to join us. Fast forward 6 months to the present, and our Engineering team is about to grow to 7 (including 2 interns!), and I can safely say that I’ve learned a whole lot in the process:
Never consider recruitment work “a waste of time”
Time spent finding the right people for your team lays the foundation for everything else at a startup. A great product starts with a great team. So no matter how disheartening it feels to comb through resume after resume and still not find the right fit, always remember that recruitment work is as important as building a new feature or optimizing a process. So do yourself a favor and put quality time into doing the following:
Work your network
Our second engineering hire, Brett, came from Nikhil’s extended network and not from any job board or recruiting company. You never know who’s looking and with social media, it’s easier than ever to let all of your friends know that you’re looking to hire. It’s also easier to bring someone onto the team if they’re vetted by a friend than if they are a stranger. Not only do you feel like you can trust their competency, they can also better trust your competency!
Post a Quality Job Post
Know what people are looking for in their next job.
Hint: it’s probably Mastery, Autonomy, and Purpose
And know why your job is what people are looking for.
Your job post should highlight your strengths.
For example, our strength is our company culture. Our mission is to build a company culture that champions transparency, fairness, happiness, and sustainability. And we make sure to highlight that in our job posting:
- Profit Sharing - We split 20% of all revenue made over 100k and distribute it evenly among the team.
- Team Transparency - We calculate compensation based on a formula that we all agree on. Cap table is made available to all employees. Business financials are known by all teammates.
- Personal Autonomy / Consensus Driven Culture - We foster consensus-driven rather than top-down decision making when it comes to important business decisions. From what features to build next to what furniture to buy for the office, we believe it’s the fairest way of making decisions.
- Customer Driven Culture - We’re very in-tune with our customers and they love us. For example, we decide what features to build based on surveys we send directly to customers. Check out this one that we sent out last year to decide what we would build this past quarter.
- Personal Development Stipend - A monthly stipend designed for self-improvement. Whether it’s books, yoga classes, or a fitness tracker, we want our teammates to improve themselves.
Send Quality Emails
Quality recruiting emails are emails that recognize and understand the candidate. Here are some tips to add some empathy to your correspondance, embedded in a sample Tint recruitment email:
Thanks for scheduling a time with me! To prep for our interview, I’d recommend reading up on our company, getting familiar with what we do, and coming up with a few questions to ask us.
Here are some helpful links to peruse:
- Give the candidate a small assignment to assess their interest in the listing.
- Arm your candidate with the basic knowledge you expect them to know so you can have a productive discussion.
- Give the candidate the motivation they need to get excited about the opportunity.
Protip: Use Assistant.to to schedule your interviews. It’s a Gmail extension that allows you to easily give candidates a way to instantly book a meeting with you and have it show up in your calendar!
Protip 2: Use Yesware to create templates for your common recruiting emails, saving you further time.
Use a CRM
Handing resumes manually through email is incredibly time consuming. Use one of thousands of resume tracking Applicant Tracking Software (ATS in recruiter lingo) such as Resumator, Jobvite, or JobScore to simplify your life.
Find creative places to post to
A job listing link can travel far! But it’s your job to take it there. Consider the following places we posted our link to:
- Craigslist - We posted our listing on 10 major metro tech centers advertising paid relocation and had some success on attracting some good candidates. At $25 a posting, it was an affordable way to reach attractive candidates in markets that have much less competition than San Francisco.
- Hacker News - We found some quality candidates (including one of our interns) from posting our listing as a comment within the monthly “Who’s Hiring” thread. It gets posted on the 1st of every month, so don’t miss out!
- Indeed/Careers/Monster - Surprisingly, these mainstream job boards are frequented by talented people too! Most ATS systems will post to these major boards automatically, so be sure to configure your system to do that.
- Github Jobs - We found some alright leads from this paid posting, fewer applicants but the average quality was higher.
- StackOverflow Careers - We paid to run a campaign on StackOverflow but found that all of the submissions were overseas Java developers at big corporations looking for visa sponsorships. Maybe we were doing something wrong, but we ended up asking for a refund.
- Reddit - Plenty of subreddits to explore if you’re looking to find a community of people who you think would be a good fit. Think /r/bigdatajobs or /r/sysadminjobs
If you’re looking to expand your team, you have to recruit like a pro. It’s better to do things thoroughly from the get-go than to lukewarmly recruit for a longer period of time. Follow the tips above, and finding an engineer in San Francisco shouldn’t be as impossible as everyone says.
April 8, 2014
I’ve been meaning to set up a build / integration server for the past year but haven’t gotten around to it for a myriad of reasons. Last week, I had enough with the
- Features breaking every time a new feature is released (regression)
- Manually smoke testing URLs
- Having no structure for developing/testing new features
So, I decided to setup a Continuous Integration system for Tint! Here’s some notes on what I found as I navigated the confusing waters of setting up a build server.
Continuous integration: the practice, in software engineering, of merging all developer working copies with a shared mainline several times a day.
Outline what your needs are with the build server. For example, my needs were:
- Run selenium tests, preferably in parallel
- Be triggered by Github pull requests and git pushes
- Have an easy to use UI to see breaking builds
- Have easy integration to email, HipChat, and Github
- Travis CI - I used them over CircleCI because Travis seemed to have more industry adoption and also had a better UI and documentation.
- Sauce Labs - The leader in Selenium Grid SAAS, they also do a lot of active development on open source Selenium projects such as Selenium Builder, which is cool.
- Ruby/Rspec/Rake - Wanted to use a language that had strong automation tools around it and was low on verbosity, yet still easily readable, so we went with Ruby and company.
- Of course, there are many, many alternatives to Travis and Sauce (I actually started this project using CircleCI and BrowserStack), however, I chose Travis and Sauce in the end because they had more documentation and were easier to use.
- (Optional) Use The Selenium Guidebook to setup a framework for your Selenium Tests, and to learn how to write maintainable tests. Make sure all the tests run on your Cloud Selenium tool you plan on using (Sauce or BrowserStack) before going forward.
- Make an account with TravisCI, and turn on the repo that you want to set up a build server for.
- TravisCI will automatically hook into Github events so that it will trigger a build on pull requests and pushes. It uses a .travis.yml file in the root of your directory to figure out how to create your build server.
- Configure your .travis.yml so it builds your server properly. For example, in our travis.yml file, we clone our puppet manifest and then use puppet to create our webapp server and handle package dependencies, virtualhost files, random config files, and starting the services.
- Once the server is built properly, you can use Sauce Lab’s Connect feature to run Selenium Webdriver test on Sauce against your build server. Unfortunately, how this all works is not adequately covered in the Sauce Labs documentation, so bear with the magic (this is literally what Sauce Labs gives you for documentation).
Debugging Your TravisCI Build Server
If you’re having trouble creating a working build server, you can email firstname.lastname@example.org nicely and they will allow you to setup a debug build server for you to log into to test things out. However, according to their support team, running a Vagrant VM using the default precise32 ubuntu image is very close to their current setup, so consider that as an option as well.
I highly recommend using Puppet to simplify your build process. Puppet is also useful outside of setting up CI as it allows you to easily configure many servers quickly (for example, adding a new virtualhost file to 10 servers in one shot), and does it in a way that is maintainable and version controlled.
Even with all of these useful tools, it took me a good couple of days to get the build up and running, so don’t be discouraged by how tedious it may seem. The only way to really debug a build server at the moment seems to be to make changes, rebuild the server, see if the build goes farther, and then repeat until everything is working as expected.
Weird things I ran into
When sauce labs tells you to insert the following into your travis.yml:
curl https://gist.github.com/santiycr/5139565/raw/sauce_connect_setup.sh | bash
It should actually be
curl https://gist.githubusercontent.com/santiycr/5139565/raw/sauce_connect_setup.sh | bash
Otherwise, you’ll get a strange “Connection refused” error.
Trying to load a second private repo in your Travis build server will result in a “Repository not found” or “Authentication failed” error which can only be fixed using this obscure support article.
March 26, 2014
I was running into the issue where our CPU on our nginx webapp servers was not being fully utilized, and caused timeouts whenever CPU went above about 10% and memory was hardly being used. I had tried changing the configurations for nginx in the past with no success, so things were getting out of hand. When our traffic spiked yesterday morning due to Google Cloud Developers Conference, Where they are using Tint, we went down, and I had to increase our server count to 20 8GB 8vCPU servers.
Twenty servers to handle 20RPS just seemed ridiculous to me since nginx can handle thousands of RPS on a tuned machine. So I spent a couple of hours yesterday formulating a process to guess and check the effects of the server configurations in order to find out what was causing the issue.
- Isolate a single production server by removing it from all load balancers.
- Set up a Blitz.io account and validate the server in step 1 using the various methods outlined within blitz.
- Load test the server to see its performance.
- Shell into the server and change the server configuration, I was experimenting with /etc/nginx/nginx.conf and /etc/php5/fpm/pool.d/www.conf (don’t forget to restart the server)
- Load test the server while running ‘top’ and see if the performance changed.
Those 5 steps allowed me to finally figure out a combination of settings that allowed nginx and PHP to better utilize the CPU.
Server Configuration Changes
I changed pm.max_children = 5 to pm.max_children = 375
See the links below for more details on what these settings mean.
- All of our traffic (~1600 concurrent users on Google Analytics realtime overview) can be handled by a single server with these new configurations. CPU of the single server handling all of our traffic was ~40%.
- 6 of these servers behind a load balancer, an average of 53RPS could be handled while keeping response time less than 1s. Usually our RPS is around 5-15.
March 12, 2014
Yesterday, we ran into an issue where we were trying to release our white-label app in the Apple App Store. We wanted to submit it under a pseudonym or anonymously, however the App Store requires you to use a credit card with a name attached to it, and uses that name when you publish in the App Store. This is obviously unacceptable for a white label app, since searching any of our teammate’s names will quickly lead back to our company, and it put us in a bind because creating a
credit card under a pseudonym is a pain in the ass.
So, after some sleuthing around, we found a credit card masking service called Do Not Track Me run by a company called Abine. Some further sleuthing revealed that this is in fact a legitimate company as cited by Forbes. We signed up for the premium monthly service, created a masked card, and voila! We were able to sign up to an Apple
Developer Account under a pseudonym.