Does Failing At Startups Make You More Successful Than Succeeding At Them?

Springtime For Hitler, The Startup

The other day someone asked me how things were going with Exversion, a question that startup founders get asked every time we pull our noses out of our Cup ‘o Noodle and dare to go outside. Let’s be honest: the person asking expects reports of progress and achievements, but given the odds in startup land, for the person answering it’s more about benchmarking where you are on the road to failure.

In my case it was impossible to answer honestly. Things were great: I asked my last remaining cofounder to leave four months ago, I’d terminated a free hosting arrangement and put Exversion’s small collection of expenses entirely on my credit card, I’d stopped meeting with investors and had taken a job that made it impossible to work on this company full time.

Everything about the current situation screamed failure, and yet since I had started the downward spiral I signed a lease on my dream apartment in my dream neighborhood, paid off all of my outstanding credit card debt, developed a steady plan for paying my student loans off, stabilized and strengthened Exversion’s infrastructure, hired three people, began a collaboration with Microsoft to integrate our API into Excel, sipped champagne on a private yacht, smoked Cuban cigars on the beach at Montauk with Ja Rule, committed to speak at three major conferences and received invitations to speak at two others. Oh and that new job? It’s at the UN.

About a year ago Exversion was taking off: YC flew us out for a chat, TechCrunch covered us, powerful people in DC invited us to VIP lunches, investors sought us out, but I was tired, in debt, isolated from my friends and family and perpetually broke. It would not be fair to say I was miserable because I loved what I was building, I loved working with my cofounders, and I loved the perks offered to us when everyone assumed our success was inevitable. But my life was in a perpetual state of chaos, as was the life of my team. Handling my stress was one thing– living in the 3rd world without running water or electricity then being homeless in the former Soviet Union does wonders for your crisis management skills let me tell you– but watching my cofounders (people I had bonded with and adored) deteriorate under this same stress was too difficult. That made me miserable.

If anyone else was writing this post, the remaining paragraphs would roll out like this: a little symbolic flagellation, a few trite “lessons learned”, capped off with reaffirming one’s vows to the cult of personality that poisons the tech scene. “Despite all that I have suffered I still firmly believe in the effectiveness of this snake oil rich white men have sold me, please don’t shun me for my failure. I still deserve to be among the best and brightest.”

But fuck that. (1)

Don’t Bitch About It, Get the Data

Right now I feel like I’m starring in a modern day adaptation of The Producers, because having a FAILING startup has been much better for my career than a successful one ever was. I’m not joking, this is actually my second time failing and the second time I’ve seen a huge jump in my status and income as a result. And when I look around my community I see the same strange little pattern: the people “succeeding” with their startups look terrible while the people “failing” are flying around the world, eating at good restaurants, making important new friends. I even know a few serial failures: folks who keep founding startups that never come close to raising money or getting significant traction, and yet they are nowhere near financially ruined.

And I’ve come to feel that the difference between founders who profit from failure and founders who are crushed by it is how much they buy into the rhetoric of startup Gods (the investors, bloggers, and authors who write long, generic, feel good advice about how startup founders should think/feel/behave ultimately based on assumptions of meritocracy and consumer intelligence that have been so thoroughly debunked by now it’s a wonder anyone can base an argument on them and still be taken seriously). Simply put, in my experience, those who think the path to success is built on doing exactly what startup experts tell you to do tend to get ripped apart. While those who are open to following or dismissing the startup dogma depending on their needs sometimes do better as failures than others do as successes.

But I’m a data person, so observing this in retrospect was not enough. I wanted to figure out a way to demonstrate it with science.

Step One: Let’s piss off Naval Ravikant

A lot of people fail at data science because they forget about the SCIENCE part of it. They treat data like tea leaves, mindlessly throwing a bunch of it into a cup, then staring at the bottom and trying to interpret patterns. The first step in exploring an issue with data is to establish a research question. So what did I want to know?

  • I want to know what positions startup founders are most likely to have before they become founders
  • I want to know what positions startup founders are most likely to have after they are no longer founders
  • I want to know if things like how long their startup lasted, their gender, their age, whether they are technical or not have any influence on the first two questions.

So how do I find that data? First thing I need is employment histories, which can be easily found on LinkedIn … except… well, there are two problems. First, LinkedIn has records on millions of people, most of whom will never ever found a company of any kind. The second problem is that LinkedIn really hates it when you scrape them.

They really really hate it.

Rather than try to write a perfect script to thwart LinkedIn’s anti-scraping methods and filter millions of records down to the population of people I wanted to look at, I decided to start with another site first: AngelList.


Of course AngelList also doesn’t like being scraped, and their markup makes it very difficult to get information, if indeed the information is even there to begin with. But people do tend to link their LinkedIn profiles to their AngelList profiles, and by forgoing the need to crawl LinkedIn we eliminate most of the problems with harvesting data from LinkedIn.

So here’s what I did: After playing around on AngelList I realized the PEOPLE section had an AJAX request that hit the following url: which would return a nice neat chunk of HTML easily parsed by BeautifulSoup. From there I could extract all the links to AngelList profiles and parse the profiles themselves with BeautifulSoup. Essentially all I’m looking for is a link with the class name “fontello-linkedin”. If that exists, the script grabs the location of the LinkedIn profile and downloads the page from LinkedIn’s servers, saves it to an HTML file on my hard drive and moves on.
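For the curious, the link-hunting step might look roughly like the sketch below. It is not the original script: it swaps BeautifulSoup for Python's built-in html.parser so it runs without installing anything, it works on a string of HTML instead of a live page, and the names LinkedInLinkFinder and find_linkedin_url plus the sample markup are made up for illustration. Only the “fontello-linkedin” class name comes from the description above.

```python
from html.parser import HTMLParser


class LinkedInLinkFinder(HTMLParser):
    """Collects href values from <a> tags carrying the fontello-linkedin class."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # An element can carry several classes, so split before checking.
        classes = (attr_map.get("class") or "").split()
        href = attr_map.get("href")
        if "fontello-linkedin" in classes and href:
            self.urls.append(href)


def find_linkedin_url(profile_html):
    """Return the first LinkedIn link found in a profile page, or None."""
    finder = LinkedInLinkFinder()
    finder.feed(profile_html)
    return finder.urls[0] if finder.urls else None


# Hypothetical markup, invented for this example.
sample = (
    '<div><a class="icon fontello-linkedin" '
    'href="https://www.linkedin.com/in/example">in</a></div>'
)
print(find_linkedin_url(sample))
```

A profile with no such link simply returns None, which is the cue to skip it and move on.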

The reason for saving profiles rather than parsing them was so I could tweak what information I was extracting as often as I wanted without having to worry about LinkedIn finding out and blocking me.

Everything seemed perfect and simple. Until I crashed AngelList’s servers.


For the record … I’m almost 100% sure the downtime AngelList experienced and the running of my deliberately misbehaving bot was a coincidence. I mean… it’s not that complicated of a process, it wasn’t hitting that many pages. But whatever, sorry.

Once AngelList was back online, I tweaked the script to sleep for a random number of seconds (up to a full minute) both to mitigate the small chance that I was overloading AngelList’s servers and to keep LinkedIn from detecting my bot. It made the script very slow, but I just opened up work for my actual job in another window and let it run.
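The random pause is simple enough to sketch. This is an illustration under assumptions, not the script I ran: crawl_politely, fetch and the injectable sleep argument are hypothetical names, and the only detail taken from the text is the random wait of up to a full minute between requests.

```python
import random
import time


def crawl_politely(urls, fetch, max_delay=60.0, sleep=time.sleep, rng=random):
    """Fetch each URL in turn, pausing a random 0..max_delay seconds between requests."""
    pages = {}
    for i, url in enumerate(urls):
        if i > 0:
            # No need to wait before the very first request.
            sleep(rng.uniform(0, max_delay))
        pages[url] = fetch(url)
    return pages


# Example with a stand-in fetcher and a fake sleep, so it finishes instantly.
waits = []
pages = crawl_politely(
    ["page-1", "page-2", "page-3"],
    fetch=lambda url: "<html>%s</html>" % url,
    sleep=waits.append,
)
print(len(pages), "pages fetched with", len(waits), "pauses")
```

Passing the sleep function in as an argument is just a convenience for trying the loop out without actually waiting; leave it at the default for a real crawl.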

Step Two: Cleaning and Normalizing the Data

I’ve gone into more detail about this in other posts, but once I had the data from LinkedIn I couldn’t really do anything with it until I had normalized it. Particularly the job titles. Let me tell you, people put some ridiculous things down as their official positions. I mean people were identifying themselves as “VP Crystal Ball” and “Wish Granter”. One guy was apparently a janitor before starting his startup. Plus there was every conceivable spelling, abbreviation and capitalization variation of “Cofounder and CEO” imaginable.
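To give a feel for what normalizing titles involves, here is a minimal sketch. The patterns, the normalize_title name and the choice to file every “Cofounder and CEO” variant under Founder are assumptions for illustration; the real classification had far more rules than this.

```python
import re

# Hypothetical patterns, checked in order; the first match wins.
TITLE_PATTERNS = [
    (re.compile(r"\bco-?\s*founder\b|\bfounder\b"), "Founder"),
    (re.compile(r"\bboard\b"), "Board Member"),
    (re.compile(r"\binvestor\b|\bangel\b"), "Investor"),
    (re.compile(r"\bceo\b|\bchief executive\b"), "CEO"),
]


def normalize_title(raw):
    """Map a free-text job title onto one of a small set of categories."""
    # Lowercase, replace punctuation with spaces, collapse repeated whitespace.
    cleaned = re.sub(r"[^a-z\s-]", " ", raw.lower())
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    for pattern, label in TITLE_PATTERNS:
        if pattern.search(cleaned):
            return label
    return "Other"


for title in ["Co-Founder & CEO", "cofounder and ceo", "VP Crystal Ball"]:
    print(title, "=>", normalize_title(title))
```

Anything that falls through to “Other” is a candidate for a new rule or a manual look, which is where the Wish Granters end up.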

And the reality of data is that it’s inherently biased, and gets more biased the more you clean it. So where are the biases with this data? We have a potential sampling error with AngelList, as investors are much more likely to have public profiles on AngelList than small time entrepreneurs. So right away we might have a skew just in the very nature of who is likely to submit their info.

In the course of cleaning I did a lot of guessing. I can use machine learning tools to determine gender with a reasonable degree of certainty, but that’s not fool-proof. I estimated the date of birth by assuming that the earliest date of education was undergrad and that undergrad education starts at about 19. Reasonable, but again not fool-proof.
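The age guess boils down to one subtraction. A minimal sketch, assuming each profile yields a list of education start years; estimate_birth_year is a hypothetical name, and 19 is the assumed undergrad starting age mentioned above.

```python
def estimate_birth_year(education_start_years, assumed_start_age=19):
    """Guess a birth year from the earliest education start year on a profile."""
    if not education_start_years:
        # No education listed, so there is nothing to estimate from.
        return None
    return min(education_start_years) - assumed_start_age


print(estimate_birth_year([2004, 2010]))  # 2004 - 19 = 1985
```

Someone who only lists a graduate degree, or who went back to school later in life, comes out looking younger than they are, which is exactly the kind of error this guess lets in.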

Step Three: Choosing a Method of Analysis

Once I had the data I had to figure out a way of tracing paths in and out of founder status and figure out which ones were the most frequently travelled. And how should I arrange the data so that it could be done easily?

Normally my first instinct would be to graph it. As a first step graphs are pretty nice, they give a clear picture you can show basically anyone to illustrate a relationship. Afterwards you want to dot all your i’s and cross all your t’s with statistical significance, but graphs are a nice way to get your bearings and see if you’re on the right track.

Except when I built my job title classification system I ended up with about 35 titles, which is way too many to graph nicely. If I narrow it down further, I lose nuance and possibly inject more vulnerabilities into the process.

So what to do?

Markov chains!

Markov chains would allow me to look at what the most common transitions between jobs were, and how my demographic criteria affected (or didn’t) the results. In this case I don’t really care so much about the probability of each chain. If I did calculate those values it would be possible to build a hack that analyzed a person’s LinkedIn profile and determined their likely career path in startup land, which could be neat but is essentially a project for another day. So really what I did is a simplified version of Markov chains, reporting only the frequency of each full chain itself.
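Counting chain frequencies can be sketched in a few lines. The assumptions here are mine: each career is a list of normalized titles in chronological order, a chain is any run of three consecutive titles, and the names three_step_chains and count_chains and the sample careers are invented for the example.

```python
from collections import Counter


def three_step_chains(titles):
    """Yield every run of three consecutive titles as an 'A->B->C' string."""
    for i in range(len(titles) - 2):
        yield "->".join(titles[i:i + 3])


def count_chains(careers):
    """Tally how often each three-step chain appears across all careers."""
    counts = Counter()
    for titles in careers:
        counts.update(three_step_chains(titles))
    return counts


# Made-up careers, oldest job first.
careers = [
    ["Engineer", "Founder", "Senior Engineer"],
    ["Investor", "Board Member", "Board Member", "Board Member"],
    ["Founder"],  # too short to form a chain, so it contributes nothing
]
for chain, n in count_chains(careers).most_common():
    print(n, chain)
```

Turning these counts into actual transition probabilities would mean dividing each one by the total for its starting state, which is the part being skipped here.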

The most common chains in our sample look like this:

Board Member->Board Member->Board Member
Investor->Board Member->Board Member
Board Member->Board Member->Investor
Board Member->Investor->Board Member
Founder->Board Member->Board Member
Investor->Investor->Board Member
Board Member->Investor->Investor
Board Member->Board Member->Founder

Just on first glance the data suggests that startup land is one giant game of musical chairs, with people already on the top switching off between investing and founding. But remember we already established that our dataset was vulnerable to a sampling error, potentially skewing too heavily on the investor side. These results aren’t necessarily surprising.

Big Data Is Still Bullshit, But Sometimes Size Matters

Everyone thinks the benefit of having lots of data is accuracy, but that’s not true. More data collected in a biased manner is still biased. The main advantage to having more data is the ability to control for these kinds of errors. We have data from about 750 people, split into roughly 4,000 items. For most forms of statistical analysis you really only need about 100 cases, so we have some leeway here.

There are a couple of things we could do to control for a sampling bias. We could take a random sample of our 700+ individuals. We could also engineer a completely unbiased sample by randomly selecting a set number of investors, engineers, designers, product managers, etc.

So let’s see what– if any– skew might exist in our data. I wrote a quick script to count the number of profiles with at least one job title equivalent to either “Investor” or “Board Member”.

384 investors out of 778

Not bad. Much better than I expected.
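The count itself is a one-liner (384 of 778 is roughly 49%). A sketch, assuming each profile has already been reduced to a list of normalized titles; count_investor_profiles and the sample profiles are hypothetical.

```python
INVESTOR_TITLES = {"Investor", "Board Member"}


def count_investor_profiles(careers):
    """Count profiles holding at least one Investor or Board Member title."""
    return sum(1 for titles in careers if INVESTOR_TITLES.intersection(titles))


# Made-up profiles for illustration.
profiles = [
    ["Engineer", "Founder", "Board Member"],
    ["Founder", "Senior Marketing"],
    ["Investor", "Investor"],
]
print(count_investor_profiles(profiles), "investors out of", len(profiles))
```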

Step Four: Selecting At Random

Still I really like drawing random samples from my data and running the same analysis a few times. Besides doing it really only requires adding two lines of code to our Markov chain generator. No reason not to.
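Drawing the random sample really is about two lines. A minimal sketch using the standard library; sample_careers, the seed argument and the sample size are assumptions, not the values I used.

```python
import random


def sample_careers(careers, k, seed=None):
    """Pick k careers at random without replacement (all of them if k is too big)."""
    rng = random.Random(seed)
    return rng.sample(careers, min(k, len(careers)))


# Made-up careers for illustration.
everyone = [["Engineer", "Founder"], ["Investor"], ["Founder", "CEO"], ["Advisor"]]
subset = sample_careers(everyone, 2, seed=1)
print(len(subset), "careers sampled out of", len(everyone))
```

The sampled careers then go through the same chain counting as the full set. Fixing the seed makes a given sample repeatable; leaving it as None gives a fresh draw every run.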

And our most common chains with a random sampling?

Sample One:

Board Member->Board Member->Board Member
Board Member->Board Member->Investor
Investor->Board Member->Board Member
Board Member->Investor->Board Member

Sample Two:

Board Member->Board Member->Board Member
Founder->Board Member->Board Member
Investor->Board Member->Board Member

Sample Three:

Board Member->Board Member->Board Member
Investor->Board Member->Board Member
Board Member->Board Member->Investor

So let’s have some fun and see what the situation looks like for the rest of us by completely removing all the investors.

Engineer->Founder->Senior Engineer
Founder->Senior Marketing->Founder
CEO->Lead Marketing->Founder

But we started collecting this data in order to look at the most common patterns in and out of Founder status. So let’s tweak our scripts again, bring the investors back in and restrict our chains to only the ones that follow this pattern: ?->Founder->?

Board Member->Founder->Board Member
Board Member->Founder->Founder
Founder->Founder->Board Member
Investor->Founder->Board Member
Founder->Founder->Lead Engineer
Investor->Founder->Lead Engineer
Founder->Founder->Senior Marketing
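The ?->Founder->? restriction can be sketched as a filter over the chain counts. The function names and the counts below are invented for illustration; only the pattern itself comes from the text.

```python
def is_founder_centered(chain):
    """True when a three-step chain has Founder as its middle step."""
    parts = chain.split("->")
    return len(parts) == 3 and parts[1] == "Founder"


def founder_centered(chain_counts):
    """Keep only ?->Founder->? chains, most frequent first."""
    kept = {c: n for c, n in chain_counts.items() if is_founder_centered(c)}
    return sorted(kept.items(), key=lambda item: item[1], reverse=True)


# Made-up counts, not the real numbers.
counts = {
    "Investor->Founder->Board Member": 3,
    "Board Member->Board Member->Investor": 9,
    "Founder->Founder->Lead Engineer": 5,
}
for chain, n in founder_centered(counts):
    print(n, chain)
```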

As I continued to pick through the data I saw a lot more downward motion– that is people starting off with one role, founding a company and their next position being lower than what they had before. But with 35 categories there are so many different combinations it becomes hard to really get an overview of that idea.

What I decided to do was build another dictionary that would take those 35 categories and assign each a numerical value from 1 to 6, with 1 being an entry level (or non-startup) job and 6 being Board Member. Here’s a general idea of what that looked like:

6 – Board Member
5 – Investor
4 – Advisor
3 – Lead Engineer
2 – Senior Engineer
1 – Engineer

Now you may find yourself thinking “But that’s completely arbitrary and not at all objective…” and you’d be right. Welcome to data science 🙂 At some point in every analysis there is some kind of judgement call that is based on a completely subjective assumption. Usually these decisions happen at the collection stage, so they are easy to cover up. If I present data on the habits of doctors, not many people are going to think to ask how I’ve defined “doctor” (medical? PhD? non-MD medical professionals?) yet some decision had to be made in order to collect the data in the first place.

The main problem with the assumption I’m making using this ranking system is that it does things like put an entry level Engineer roughly equal with an Intern. HackerNews will love this idea I am sure.
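With a ranking in hand, sorting a chain into upward, downward or no movement is a comparison between the job before founding and the job after. A sketch under assumptions: RANKS holds only the six example titles listed above rather than all 35 categories, classify_movement is a hypothetical name, and the middle Founder step is deliberately ignored.

```python
# Only the six example ranks from the list above; the full dictionary was larger.
RANKS = {
    "Board Member": 6,
    "Investor": 5,
    "Advisor": 4,
    "Lead Engineer": 3,
    "Senior Engineer": 2,
    "Engineer": 1,
}


def classify_movement(before, after, ranks=RANKS):
    """Compare the rank of the job held before founding with the one held after."""
    diff = ranks[after] - ranks[before]
    if diff > 0:
        return "Upward"
    if diff < 0:
        return "Downward"
    return "No Movement"


print(classify_movement("Engineer", "Senior Engineer"))    # Upward
print(classify_movement("Investor", "Lead Engineer"))      # Downward
print(classify_movement("Board Member", "Board Member"))   # No Movement
```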

Anyway, keeping our filter on the ?->Founder->? pattern this is what things look like with the average time spent at each position and the average time spent as a founder:

chain        freq  first_avg  second_avg  third_avg  founder_years  male  female  technical  non-technical
No Movement   199   3.451843    3.988275   3.577052       3.981381   178       9   93 (47%)      106 (53%)
Downward      229   3.650291    4.185953   3.063683       4.239042   200      10  116 (51%)      113 (49%)
Upward        116   3.186782    4.429598   3.635057       4.320076   104       5   50 (43%)       66 (57%)

Couple of things that surprised me about this:

– Founders who see a career boost post founding actually stay LONGER than other groups. Remember we’re restricting the data to the pattern ?->Founder->? here so it’s not just about founder_years being higher, but the difference in second_avg too. This is the complete opposite of what I was expecting, but seems to support PG’s “stay alive and get rich” axiom.

– I added the percentages to the technical/non-technical columns because I could not believe what I was looking at … WHOA seriously? Technical founders make up 51% of the Downward group but only 43% of the Upward group. I would think technical skills would give you a better chance of coming out of the founding experience with a better job. After all aren’t we all saying how badly everyone needs good developers? Ouch.

– ….. there really are no women in this industry. Well, okay that wasn’t actually a surprise… but… man.

Places Where Things Might Be Wrong

A smart reader might be wondering: if the data isn’t skewed significantly towards Investors, why do they dominate so many of our chains? Is there something worth reading into in that? Maybe startup land is less meritocracy and more “rich getting richer helping out their friends”?

And while that might actually be true and I’d certainly love to write that blog post, I don’t think this data can support that conclusion. One of the things I noticed while writing the script that parsed LinkedIn files was that investors tended to list all the prominent companies they had invested in under their experience. So in other words the job title “Investor” might be at Company “Bullshit Capital” but it might also be seen listed under “eBay” followed immediately by listings for “Investor – Facebook”, “Investor – Twitter”, etc.

That would naturally make the chain Investor->Investor->Investor (or Board Member->Board Member->Board Member) way more likely than other possibilities.

The other factor is that I lumped ALL types of investing together (angel, VC, seed, etc) whereas other roles were split into hierarchies (Engineers, Designers, Marketing People, etc). The rationale for this was twofold: one, it’s not really fair to equate angel investing with entry level investing; two, I was interested in the technical and non-technical paths to founding a company … less interested in the breakdown of investors in the pool.

Fail Slow and Be Sneaky

My takeaway from this data is that coming out of starting a company with a better career than the one you left is about staying alive as long as possible. I’m very curious as to why founders with technical skills appear to be less likely to rebound strongly in post-startup life. It could be a “Big fish, Little Pond” illusion– after all a lowly “Engineer” at Google might be a lead architect anywhere else– or it could be something else.

What remains to be seen is what keeps a startup alive for a long time. After all it costs nothing at all to add the title “Founder” to a listing on LinkedIn. Creating an AngelList account is free. We tend to assume that startup success is fast, overnight if possible. Lots of users, lots of capital raised, the earlier the better. It’s almost like working for it is a sign of something lacking. If you were smarter you’d have found the lightning in the bottle right away.

But the data suggests that while that might be good for investors, it’s not so good for the founders.

And that’s really what I find so striking about my experiences with “startup success” -vs- “startup failure”. Everything that investors and incubator gatekeepers were telling us we HAD TO DO in order for them to take us seriously left me and my team worse off. I have yet to see anyone present any evidence that quitting your job to live just below the poverty line without medical insurance is more likely to lead to startup success than bootstrapping or side hustling.

It’s worth asking: who benefits from furthering the crash-and-burn methodology? Well … the investors. If you’re bootstrapping you don’t need to raise money until you start to see growth, and that gives you leverage and possibly your pick of investors. Investing in your company becomes more expensive. The payoff gets smaller. You might forgo incubation altogether.

Investors have fewer options to choose from, so their odds go down. Maybe your odds go up– or not– but their odds definitely go down. That’s not nefarious, that’s just basic common sense. The cheaper you can invest, the more investments you can make and the better you spread out your risks. The cheaper you invest, the more money you make when one of those investments hits.

It all makes perfect sense.

It’s just not a very good system if you happen to be a founder.


Data (2)

Raw job listings

Raw job listings with investors filtered out

Full chains


(1) – You may have noticed that my cursing has increased. That’s because I always used to give my blog posts to Jacek before publishing and he thought my fondness for coarse language was unprofessional … he wasn’t wrong about that. (*waves* Hi Jacek!)

(2) – Will be putting most of the data up on Exversion tomorrow and adding links here as I go. It doesn’t take that long to get it on Exversion, but in the middle of cleaning up the final files I ran out of API credits, so it’s either wait until tomorrow or release the data without the male/female split. LOL

