Big Crisis Data

August 21, 2016 / b3llm4r / Leave a comment


For the last three years humanitarian data has been a huge part of our consulting work, our bread and butter so to speak. It’s something we’re deeply passionate about, which is why we got super excited when we saw a new book specifically on data science in crisis situations was being released this July.

We’ve had so much fun reading How Not to Network a Nation with all of you that we decided to take advantage of this opportunity to continue the book club with Big Crisis Data. (Our review of How Not to Network will be online next week.)

This promises to be a wholly different reading experience: less narrative, more detailed technical approaches. We’re not entirely sure it will work as a group reading experience, but we’re excited to see what the response is!

Like before, we’re kicking off this selection by giving away a free copy … which you’ll want to try to take advantage of because Big Crisis Data does have a hefty sticker price!

Data Beach Reads

June 14, 2016 / June 15, 2016 / b3llm4r / Leave a comment


For the longest time my summer reading was an academic work titled something like “Global Capital Markets of the Czech Republic.” It was about 8 inches by 11 inches, a gazillion pages with a teal hard cover. It was filled with amusing and bizarre stories about the policy blunders that took the Czech Republic from Communist to Capitalist. It was wonderful.

There’s still sand from the beach stuck between its pages.

There’s a point just after high school where the concept of summer reading goes from stuff that’s designed to educate you, to stuff that’s designed to be rather mindless.

So in the spirit of being oddballs we thought it might be fun to invite the internet to our book club 🙂

We’ll be reading Benjamin Peters’s How Not to Network a Nation. And … oh … we’re going to give away a copy of the book too! Just in case you were wondering 😀 Follow the link below to enter!

Win a copy of How Not To Network a Nation

We’ll also be posting regular status updates as we read on GoodReads. Come follow along!

How to Accelerate Smartly

July 14, 2015 / b3llm4r / Leave a comment

Accelerating in business is like accelerating in traffic: you could get where you want to go faster, but you’re much more likely to slam into a wall and ruin your chances of getting there at all. Last month I gave a talk at StartupBus finals about making decisions that were wrong for our actual business but were nevertheless attractive because we assumed they would make us seem like a more legitimate startup. We wasted a lot of time chasing validation at Exversion’s expense, and it finally got to the point where I started asking myself who really benefitted from pushing this idea that all startups need to take the same path into the world of venture capital. There are so many ways to be successful; I cannot overstate how important it is to pick an approach that fits your business and your industry.

Accelerators and incubators are generally thought of as the first step on the road to startup legitimacy. So much so that founders seem willing to give up a slice of their equity to push their business into a program without any real sense of what they expect the program to do for them. Joining an accelerator is the startup equivalent of going to college: you do it because you’re supposed to and because you’re raised to think that it will improve your odds going forward.

Unlike college, there’s very little evidence that it will.

Still, there are so many different kinds of programs now there’s bound to be one that fits your business. Now the two programs we got the most out of are accepting new applications. And guess what? Neither one of them takes equity. I have often joked that we are the masters of accelerators that give you no money. If you think the money is the most important part of joining an accelerator you won’t be in business much longer. I’m here to convince you that you don’t need the money, and to be frank you don’t even want the money either.

Friends of Ebay

Internally Friends of Ebay is always abbreviated as FoB, even though obviously the correct acronym is FoE. Fitting because your relationship with Ebay is kind of awkwardly undefined. Essentially FoB is just Ebay subletting their unused office space to startups that interest them. It’s a great program in that you get free, fully equipped office space including printers, scanners, conference rooms, a kitchen and private bathrooms. Lunch is free on Tuesdays, you can book a free massage on Mondays and you can usually secure the event space for your own events free of charge. Every month there would be a little ice cream party for people who had birthdays that month. If a speaker drops by to address Ebay’s workers, you get to be there too.

However there’s also a careful and strict separation between you and Ebay, perhaps for everyone’s benefit. Your key card will only give you access to certain parts of the offices at certain times, which means you can’t raid the snacks in Ebay’s kitchen or blow off some steam with a game of Street Fighter on their arcade machine. The isolation makes it hard to mingle with Ebay staff and find that mentorship they promise but you are invited to their Christmas party.

Instead you bond pretty tightly with the other startups because you are a little colony all to yourself. It ends up being a tiny WeWork that you get completely 100% for free.

First Growth Venture Network

We used to jokingly refer to FGVN as group therapy for startups. Every month we would all gather and talk about our problems, ask for advice and just generally commiserate with our colleagues. FGVN was exactly the right thing for us, at exactly the right time. We would come in feeling beaten down and lost but leave with a renewed sense of determination and purpose.

Plus, you know, they always fed us well.

FGVN days are full days. The first couple of hours are private discussion as a founder group, then you move to the event space where an elite mix of alumni, entrepreneurs, venture capitalists and tech journalists have gathered for a panel discussion from an even more elite group of entrepreneurs, VCs and executives. I mean, where else can you eat strawberry shortcake with Alan Patricof while listening to David Karp talk about the early days of Tumblr? Or the time we had a breakout session with David Draiman from Disturbed and I kept catching myself humming “Down with the Sickness” while he talked about his new app? Or the time Jacek turned around to hold the door for someone and it turned out to be one of the founding investors in Spotify?

Surreal things happen at First Growth.

But beyond the extraordinary experiences the ordinary experiences of FGVN are pretty awesome as well: the networking, free consultations from fundraising experts on everything from pitching to deal structure. Actually I think the most valuable thing about FGVN is the ability to just talk openly and honestly about what’s really going on. You don’t realize how valuable the opportunity to have feedback from someone who has no agenda is until you’re without it.

FGVN is also just a great opportunity to get to know the startup team at Lowenstein Sandler. The program is free because they want your business, but to be honest, they are among the best and most well connected startup lawyers around. By the end of the program we wanted to give them our business too!

A necessary trade-off here is that because Lowenstein makes their money providing legal services for VC deals, the content of FGVN meetings inevitably seems to be about fundraising. Even when the formal topic is something else, discussions ultimately work their way back to fundraising. The audience is heavily skewed toward VCs, and every panel has at least one VC on it who seems to take every question and make it somehow about their investing strategy. Josh Kopelman is invited to speak A LOT and sometimes it seems there are more inside jokes and banter between him and Ed Zimmerman than actual advice for entrepreneurs.

The takeaway here is that bootstrapped startups will not get as much out of FGVN as startups actively looking for investors.

Apply Soon

Both programs are now accepting applications for their next class. These programs are open to startups at all stages (idea stage, pre-money, funded, etc) and take absolutely no equity. They were great experiences for us, so we encourage you to apply today!

Proposal: Building Scalable Data Infrastructure Without Geeks

July 8, 2015 / b3llm4r / Leave a comment

Original image by Tom Carmony

Every year there’s a technical conference just for the nonprofit community run by the Nonprofit Technology Enterprise Network. This year we had so much fun talking data with so many great organizations we submitted a session idea to the community for consideration at the 2016 conference. We’re calling it: Building Scalable Data Infrastructure Without Geeks

The most important decisions about an organization’s data are often made before the organization has enough money to hire an expert. Most of the advice small, cash strapped nonprofits get on how to manage their data is “buy this piece of software”, and yet it is possible to set up a scalable, developer/analyst friendly infrastructure MacGyver style from tools the nontechnical staff knows.

If you like this idea, we encourage you to vote for it. If it gets picked we’ll publish a companion blog post here with plenty of resources and advanced topics.

Guide to Data Science Competitions

June 14, 2015 / June 17, 2015 / b3llm4r / 4 Comments

Summer is finally here and so are the long form virtual hackathons. Unlike a traditional hackathon, which focuses on what you can build in one place in one limited time span, virtual hackathons typically give you a month or more to work from wherever you like.

And for those of us who love data, we are not left behind. There are a number of data science competitions to choose from this summer. Whether it’s a new Kaggle challenge (which are posted year round) or the data science component of Challenge Post’s Summer Jam Series, there are plenty of opportunities to spend the summer either sharpening or showing off your skills.

The Landscape: Which Competitions are Which?

Kaggle
Kaggle competitions have corporate sponsors that are looking for specific business questions answered with their sample data. In return, winners are rewarded handsomely, but you have to win first.
Summer Jam
Challenge Post’s Summer Jam Open Data Mashup runs in June and focuses on mashing up multiple open data sets (use the Data Search Engine to find some great options). Competitors are not asked to answer a specific question, so this competition is well suited for beautiful experiments in visualizing data.
DrivenData
Like Kaggle, DrivenData competitions have a sponsor with a specific research question and specific sample data. DrivenData sponsors, however, tend to be more social impact minded.

Over the months we’ve posted many great links on winning data science competitions through our mailing list, but if you’ve missed them here’s a list of the best resources, advice and tutorials:

Choosing Your Weapons
DATA SCIENCE WARS: R VS. PYTHON
http://101.datascience.community/2015/05/12/data-science-wars-r-vs-python/

3 Must-Ask Questions Before Choosing That Machine Learning Algorithm!
http://www.analyticbridge.com/profiles/blogs/wait-why-are-you-using-that-algorithm

Dictionary of Algorithms and Data Structures
http://xlinux.nist.gov/dads/

Fast Non-Standard Data Structures for Python
http://kmike.ru/python-data-structures/

A list of assorted tools and such mentioned and used During DSSG 2014
https://hackpad.com/A-list-of-assorted-tools-and-such-mentioned-and-used-During-DSSG-2014-wl5QgF3LsSU

Data Science Resources
https://github.com/jonathan-bower/DataScienceResources

12 Best Free Ebooks for Machine Learning
http://designimag.com/best-free-machine-learning-ebooks/

Top 10 data mining algorithms in plain English
http://rayli.net/blog/data/top-10-data-mining-algorithms-in-plain-english/

Python Shortcuts
The Top Mistakes Developers Make When Using Python for Big Data Analytics
https://www.airpair.com/python/posts/top-mistakes-python-big-data-analytics

11 Python Libraries You Might Not Know
http://blog.yhathq.com/posts/11-python-libraries-you-might-not-know.html

iPython Notebook Gallery (includes pandas cheat sheet)
http://nb.bianp.net/sort/views/

Visualizations
D3.js Step by Step
http://zeroviscosity.com/category/d3-js-step-by-step

For inspiration, check this index of visualization types for visualizing text
http://textvis.lnu.se/

Gestalt Principles for Data Visualization
http://emeeks.github.io/gestaltdataviz/section1.html

Advice From Past Competitors
Machine learning best practices we’ve learned from hundreds of competitions – Ben Hamner of Kaggle
https://www.youtube.com/watch?v=9Zag7uhjdYo

LESSONS LEARNED FROM THE HUNT FOR PROHIBITED CONTENT ON KAGGLE
http://mlwave.com/lessons-from-avito-prohibited-content-kaggle/

What I Learned From The Kaggle Criteo Data Science Odyssey
https://medium.com/@chris_bour/what-i-learned-from-the-kaggle-criteo-data-science-odyssey-b7d1ba980e6

6 Tricks I Learned From The OTTO Kaggle Challenge
https://medium.com/@chris_bour/6-tricks-i-learned-from-the-otto-kaggle-challenge-a9299378cd61

How to use R, H2O, and Domino for a Kaggle competition
http://blog.dominodatalab.com/using-r-h2o-and-domino-for-a-kaggle-competition/

Competing in a data science contest without reading the data
http://blog.mrtz.org/2015/03/09/competition.html

KAGGLE ENSEMBLING GUIDE
http://mlwave.com/kaggle-ensembling-guide/

Building a Search Engine for Data

April 21, 2015 / b3llm4r / Leave a comment

I don’t know if you’ve heard, but last month we built a search engine that is currently indexing open data sites around the world.

In the current startup ecosystem, that seems so retro. We’ve all basically accepted Google as our lord and master, the tastemaker to end all tastemakers. Why bother investing time and energy when you will never be able to compete with such a dominant player?

But as anyone who has dipped their toes in the waters of SEO will tell you, Google’s algorithms judge quality by making a bunch of core assumptions about what useful internet content is supposed to look like. These assumptions overemphasize pages with lots of high quality text (blogs) and underemphasize pages with duplicate structure and low amounts of text (like … for example, catalogues).

That means using Google to try to figure out which open data site has the data you need is practically impossible. Suppose you are a reporter living in Brooklyn trying to find data on animal sacrifice (hey, it happens). Such data undoubtedly exists, either through 311 calls or police reports, but the question is where do you look for it? You could search at the national level through Data.gov, at the state level through Data.ny.gov, at the city level through NYC Open Data, or you could search any number of repositories run by informal local initiatives such as BetaNYC’s data repository.

But that assumes that you know that any of those sites exist in the first place.

And here’s what would happen if you ran those searches:

– Data.gov’s search returns no results for “animal sacrifice”

– Data.ny.gov returns data on the number of horses injured or killed at racetracks

– NYC Open Data returns the Brooklyn Public Library Catalogue (wtf?) and a mysteriously named “Multiagency Permits” dataset.

– BetaNYC returns no results for “animal sacrifice”

Exversion’s search engine returns reports from animal services, animal shelters, animal care enforcement, etc. And the best part is most of this data links back to Data.gov … the first site we searched that told us it had no data that fit our query.

Building a Search Engine is Hard

Building search engines isn’t just passé, it’s freaking hard. And it gets harder the more the web grows. The processing power required to crawl, which essentially means downloading and parsing billions of documents, is pretty intense. Then storing that data, indexing it and running queries over those billions and billions of documents requires more resources than most startups can manage, even with valuations swelling up like they are.

It’s not easy. It’s no longer the low hanging fruit that it was in the nineties when the web was smaller with fewer devices generating content.

Fortunately for us we only need to crawl a very specific subset of the web, which means we can slash the resources needed to complete this task by making a few specific assumptions:

– The information we want is hosted on specific sites
– These sites have a common API and structure
– New information is not added to these sites on a daily basis and old information is rarely if ever updated

Anyone with a computer can create content Google might want to index. Not everyone can run an open data portal. Likewise, the number of companies that provide ways of publishing content on the internet is enormous and still growing, while only a handful of companies provide ways to publish data. And of that handful there are two major players who dominate the market: Socrata and CKAN.

All Socrata instances come with a sitemap. It’s not obvious where this is, but their robots.txt will give you the link. From there we just carefully follow each link and scrape the title and description from each listing of data. It works well because this is what sitemaps are designed to do.
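The discovery step described above can be sketched roughly like this. It is a minimal illustration using only Python’s standard library, not our actual crawler: the function names and the example.org URLs are made up for the example, and the scraping of titles and descriptions from each listing page is left out because it depends on the portal’s page markup.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_urls_from_robots(robots_txt):
    """Return every URL declared on a 'Sitemap:' line of a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        # Split at the first colon only, so the URL's own colons survive.
        key, _, value = line.partition(":")
        if key.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


def locs_from_sitemap(xml_doc):
    """Return the <loc> URLs from a sitemap or sitemap index (str or bytes)."""
    root = ET.fromstring(xml_doc)
    return [el.text.strip() for el in root.iter(SITEMAP_NS + "loc") if el.text]


def fetch(url):
    """Download a URL and return the raw bytes (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def listing_urls(portal):
    """Follow robots.txt to the sitemap(s) and collect the URLs they list."""
    robots = fetch(portal.rstrip("/") + "/robots.txt").decode("utf-8", "replace")
    found = []
    for sitemap_url in sitemap_urls_from_robots(robots):
        found.extend(locs_from_sitemap(fetch(sitemap_url)))
    return found
```

If a portal publishes a sitemap index rather than a single sitemap, the URLs returned are themselves sitemaps, so a real crawler would need to call locs_from_sitemap on each of those in turn.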

CKAN makes it even easier because CKAN has a pretty decent API. One of the endpoints allows the user to search through the data available on that instance. It requires no authentication and if you don’t provide a query it will return …. well everything.

Even better, package search returns all the metadata. So with pagination we can scrape hundreds of thousands of datasets in minutes.
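For the curious, here is a rough sketch of what paging through that endpoint can look like. It assumes the standard CKAN action API (package_search with its rows and start parameters); the helper names, the page size and the example.org URL are illustrative choices, not a copy of our crawler.

```python
import json
import urllib.parse
import urllib.request


def fetch_json(url):
    """GET a URL and decode the JSON response body."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def iter_ckan_packages(base_url, rows=100, fetch=fetch_json):
    """Yield every dataset record on a CKAN instance, one page at a time.

    No query is sent, so the instance returns everything it has.
    `fetch` is injectable so the paging logic can be exercised offline.
    """
    endpoint = base_url.rstrip("/") + "/api/3/action/package_search"
    start = 0
    while True:
        query = urllib.parse.urlencode({"rows": rows, "start": start})
        payload = fetch(endpoint + "?" + query)
        if not payload.get("success"):
            raise RuntimeError("package_search failed at offset %d" % start)
        result = payload["result"]
        page = result["results"]
        for package in page:
            yield package
        start += len(page)
        # Stop on an empty page or once every counted record has been seen.
        if not page or start >= result["count"]:
            break
```

Usage would be something like `for pkg in iter_ckan_packages("https://ckan.example.org"): print(pkg["title"], pkg.get("notes"))`, where title and notes are the dataset’s name and description. Instances cap how many rows a single request may return, so a modest page size is the safer default.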

Outliers, Special Snowflakes, and Scrapers As Service

Of course not EVERY site uses either Socrata or CKAN. Most of the world’s open scientific data is on proprietary platforms, and most of the world’s open geodata is on GeoNode instances. Before we could consider the challenge completed we had to figure out a way to handle sites that didn’t share a standard structure.

I’m awfully fond of Scraping As Service because I’m really not fond of writing individual scrapers every time I need to grab data from a site. But as useful as they are, these companies always seem to have a hard life. ScraperWiki decided they make more money from consulting and closed down their interface for individual devs. Import.io raised venture funding and inexplicably stopped working. 3Taps curates what they will scrape. 80legs tends to accidentally DDoS sites.

About a year ago someone showed me Kimono and there are a few really interesting features that stood out. While Import.io will allow you just to control scrapers via API, you can actually access your data via API with Kimono. You can schedule scrapes to occur regularly. And best of all, Kimono has webhooks.

Which means we could set up Kimono to scrape our outliers and when the scrape is complete, it sends the data directly to our servers via webhook.
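On the receiving end, a webhook is just an HTTP endpoint that accepts a POST when the scrape finishes. Here is a minimal sketch using Python’s standard library. It assumes the scraping service sends a JSON object in the request body; the handler name, the RECEIVED list and the port are placeholders for illustration, and the payload’s actual fields would come from the service’s documentation.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stand-in for a real indexing queue (hypothetical).
RECEIVED = []


def parse_webhook_body(body):
    """Decode a webhook body; return None unless it is a JSON object."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return None
    return payload if isinstance(payload, dict) else None


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = parse_webhook_body(self.rfile.read(length))
        if payload is None:
            self.send_response(400)  # reject anything we cannot parse
        else:
            RECEIVED.append(payload)  # hand off to the indexer here
            self.send_response(204)  # accepted, nothing to send back
        self.end_headers()


def serve(port=8000):
    """Block forever, accepting webhook deliveries on the given port."""
    HTTPServer(("", port), WebhookHandler).serve_forever()
```

A production endpoint would also want to check that the request really came from the scraping service (a shared secret in the URL or a header, for example) before trusting the data.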

Sweet!

Indexing All The Data

The technology is stable, so right now we’re just adding as many sites to our crawl list as possible. We want to prioritize the sites being crawled to make sure that the ones you are most interested in searching get indexed first. You can check out what we’ve indexed so far and what’s on the queue right here. Feel free to add your favorite data site if you don’t see it there.

(photo by pleuntje)

A Tale of Two Conferences

March 17, 2015 / b3llm4r / Leave a comment

This year we didn’t go to SXSW.

Instead Exversion went to NTEN’s Nonprofit Technology Conference in Austin, basically the week before the techies descended and bought out Sixth Street for their private parties.

It would have been easy to stick around in town for SXSW, several NTEN people did, but in the end I was glad we didn’t. I love SXSW, but this week I was flabbergasted by the quality of leads that came to us through the smaller, more specialized conference. Had we stayed, I would just now be following up on those connections and as a result the momentum might have been lost.

NTEN is basically paying for itself in clients and partners, which really surprised me. I’m used to the benefits of conferences being more intangible.

Last year we had an extremely productive SXSW, filled with glimmers of that unique SXSW magic the organizers have practically trademarked. I ran into a guy I had been trying to set up a meeting with for three months on line at the Spotify House (he was behind me!) and we ended up having a critical meeting right there on the street. I grabbed beers with some Ushahidi devs and got great free advice on how to structure an open source consultancy. I met one of the cofounders of Infogram on the dance floor. I got frequently– and inexplicably– mistaken for Anna Kendrick (Was she even in town?).

But we didn’t bring in any new clients or new users. The opportunities that came from SXSW came months later, whereas the opportunities from NTEN started pouring in almost as soon as we landed back home.

And some people would look at that and say that large conferences like SXSW are not worth the trouble, but really I think the truth is these are two distinctly different types of conferences.

A lot of startup people think they’re going to SXSW to sell something– get new customers, get investors, launch their hot new whatever– but think about this for a minute: who comes to SXSW looking to buy things? Who looks at their business problem and thinks to themselves “I’m sure I can find a vendor with a solution to this at SXSW”?

No one. There may be some investors looking to scout startups, but actual deals are few and far between. If the crowd at SVB’s club house is any indication, most VC firms are sending associates rather than partners.

No, SXSW peddles in influence and novelty. People go hoping to build connections with influential people. And the influential people go looking to build connections with more influential people. People go to taste test the hot new thing, but only if the hot new thing is given to them for free. Nobody is going home with a new contract, a new client, or a big investment. Successful hustlers come home with hundreds of new contacts, maybe one or two of those will turn into something real.

Basically all developer conferences are like this. People come to learn, and to meet people, not to buy things.

NTEN, on the other hand, is a conference that people attend specifically to buy things. Thousands of organizations send representatives to find solutions to their technical problems. One such colleague told me that he had received specific instructions from his boss to come home with either a great product they could buy or a great consultant they could hire.

At the same time, a couple of years ago I attended another conference that peddled in influence. Thought leaders galore! One of the many people I met there was an entrepreneur running a small technical business in the developing world. Nobody was paying him much attention, he wasn’t anyone’s prestige catch.

Today things are completely different: he’s a TED fellow, a VC, and was named to one of those fancy “30 under 30” lists. When Exversion was working on ebola data for the UN this summer, we were able to collaborate. He turned out to be one of those contacts that paid for the conference, but it took years for that connection to bring returns.

The moral of the story is it’s really critical to research conferences before you buy a badge. SXSW and events of that nature can offer fantastic long-term benefits, but if you need immediate results you’ll probably leave feeling like you’ve wasted your money. It’s not difficult to figure out whether a conference will be a buyers’ conference or an influence builder: look at the speakers, the panel topics, the branding. Ask yourself: who is going to buy a ticket to attend this and what will they hope to get out of it?

And the types of conferences you’re attending really should be strategic. Many people see SXSW as a conference to “launch” … I could not disagree more. SXSW is a valuable conference a year or more before you launch. It will connect you with journalists and hustlers whose networks and resources could be game changing. But in order to get access to those advantages you need to develop the relationship in a natural way, over time.

(image credit: Anthony Quintano)

8 Absolutely No Bullshit Things Every Entrepreneur Should Know

January 28, 2015 / b3llm4r / Leave a comment

There are a lot of bullshit posts about “lessons learned” out there. Posts that you would assume by their title should be full of helpful advice for starting and running a business, but more often than not are overloaded with rebranded self-help fluff. I think I’m supposed to be inspired, instead I always find myself wondering why there’s so much no one ever tells you about the financial, legal, and tax complications of starting a company. Why five blogs will run virtually identical stories advising founders to eat right, get plenty of sleep, and build a strong team (without ever really spelling out what that means) but no one talks about the errors that are truly startup lethal. There are screeds on hiring the best programmers, but very little on hiring a good accountant. Guides on pitching investors are published to the internet in triplicate, but good luck finding thoughts about managing corporate credit written in plain English.

So I decided to share the information I’ve gathered– in a safe, general purpose way, mind you. You shouldn’t interpret any of this as financial, legal or tax advice as I am not a financial advisor, a lawyer or a tax expert.

But anyway, here are eight things you should definitely know as an entrepreneur.

1) Learn How to Loan Your Startup Cash
In the beginning the business will be run largely out of your own pocket. No matter what the legal structure or tax situation, more likely than not you won’t be able to pay the company’s bills without injecting cash from your own account.

Make sure you understand how to enter this correctly in your accounting software of choice. It should be considered a loan from the owner (or shareholder loan). If you don’t specify that, it could be counted as income which the company will be taxed on.

2) Understand Revolving Credit
Best thing I ever did for Exversion was set up a separate, personal credit card for business expenses. Servers do not cost that much money; our biggest expenses are often things like my travel expenses, conference tickets, memberships to professional organizations. Things that I can easily claim as deductions on my taxes.

But these expenses were uneven too. There would be nothing for months, then suddenly I’d need to drop a grand on something. Getting a second card did two things: First, it simplified the process of itemizing those deductions for the IRS. Second, it meant instead of having some months where I had to pay thousands of dollars to avoid falling too far into debt, I could treat the debt I did have as revolving credit.

We all occasionally have expenses it will take us a month or two to pay off. This summer I moved into a new apartment and had to buy a lot of furniture for it, which got expensive. If a business trip comes up while I’m still paying that off suddenly I’m super close to maxing out my card and racking up higher interest payments. By splitting my expenses it became much easier to keep what was outstanding on each card low while making what I was paying out pretty predictable.

Plus it’s really good for your credit score.

3) Top-quality Lawyers, Accountants, and Sales People Pay for Themselves.
It’s really hard to pay thousands of dollars to a professional for something that seems so routine and boring you could do it yourself or go to a low cost alternative. But… no, a $100 accountant is not cheaper than a $500 one if he misses deductions that trim $700 off your tax burden.

4) At The Same Time, Don’t Be a Snob
H&R Block does Exversion’s corporate taxes. We’re not ready for a big accounting firm just yet and after a few experiments with independent CPAs I went back to people I trust at the Block. Finding a good accountant is without a doubt the hardest part of running a business. I’ve met so many small business owners who, as their business grows, think to themselves they really should stop using the McDonald’s of tax prep. Real businesses don’t go to H&R Block, right?

Unfortunately, what you think is a dedicated independent professional isn’t always what’s going on. This year I spoke to two CPAs before going back to my old preparer: the first one was chronically overbooked, the second one hired interns he barely supervised to do his clients’ taxes so that he could take on as many clients as possible. In both cases I found that after the initial courtship they were impossible to contact, never followed up with me and would often refuse to answer questions or really put much time or effort into discussing options.

Meanwhile my guy at the Block would do my taxes right in front of me, explaining the consequences of each decision, and always on the lookout for new deductions. If by some chance he screws up, H&R Block will handle the problem for me.

So don’t assume that resources that most people consider out of their reach are better than what you can find on Main Street.

5) Everything Important You Will Ever Do For Your Company Comes Pre-Launch
In startup land there’s a certain susceptibility to The Field of Dreams Fallacy. What’s funny about this is that almost every entrepreneur will deny it up-down-and-sideways, but when you press them for details their launch strategy starts with … well, the launch.

This was the hardest lesson I learned this year: the launch is not the beginning of the launch strategy, it’s the end. Startup mythos tells you to launch ASAP and if you don’t immediately blow up then tweak and pivot until you find the right fit.

This is incredibly stupid. Do not do this.

For one, it puts you in the position of trying to build a market, a community, and a product concurrently. You will never fully be able to determine if you’re failing to get traction because no one wants what you’ve built, because you’re missing two or three key features or because your users don’t know you exist. The fail fast mentality relies on the assumption that users will be drawn to the products and services the same way the ghosts of baseball players are drawn to a baseball field in the middle of nowhere.

So for months I’ve been working on building up our mailing list, our Twitter following, working on a little ebook to recruit contacts in a key sector … all in preparation for something new that’s in alpha testing right now. The more I work on this strategy, the more I think how stupid we were to just launch before we had established channels of communication with the people we wanted to serve.

You can still iterate fast and agile. In fact if you devote a large chunk of time in the beginning to building your userbase before you have a product, your iterations will be smarter and more accurate.

6) “Failure is not much of a thing at all. It’s mostly a point of view.”
The quote comes from Ben Pieratt’s Creative Mornings talk and nothing more poignantly sums up how I feel about the events of the last year. I’m not going to say don’t listen to advice from startup people in a blog post of advice from a startup person, but I will say don’t TAKE the advice of startup people just because you assume you will look successful or are afraid of looking like a failure. The perception of success and failure in this community is warped beyond belief. It has absolutely no relationship with reality.

Early in the year my cofounder left the company. Now, FOR MONTHS people had been suggesting to me that I really should encourage him to move on. There were different reasons why people thought that was a good idea, but the general consensus was the pros and cons of the current situation weren’t really lining up in Exversion’s favor.

But when I finally did it, nearly all of these same people suddenly started dismissing Exversion as a failure just because conventional wisdom says the loss of a cofounder signals failure. In retrospect I can’t tell you how glad I am that this decision was not made because they told me to make it. If it had been, I would have been pissed at this outcome.

The truth is nearly every successful company has had a period where the community labelled them a failure, a money pit, or a lost cause. I can think of two friends who had startups everyone said were “failures”, quietly working away under that terrible label until one day Google and Intel took notice and bought them. Not bad for a “failed” startup.

For a more specific example, check out what people were saying about Spotify in 2009.

The startup community is a small community of entrepreneurs who understand this, and a much larger community of rubberneckers who need to generate commentary in order to stay relevant.

7) The Most Important Designs Are Empty States
What does your app/site/whatever look like when nothing’s on it? What does my account dashboard look like when I haven’t done anything? Remember that this is the user’s first impression, not the beautiful, content-rich mockups of active users.

8) Learn to Write A Functional Spec
One of the best decisions I’ve ever made is asking a PM friend of mine to teach me how to write a spec. As a freelance developer, specs had never been a formal part of the process for me. I had no idea what a good one included or how to organize them. Now I’m a total convert. Writing detailed functional specs helps me fully articulate what my expectations are, which in turn makes the experience of working with freelance, remote, and overseas developers much smoother.

Lies, Damn Lies and Podcasts

December 16, 2014 / b3llm4r / Leave a comment

What makes Serial so hypnotic is that you end up kind of believing that Adnan is innocent, while at the same time kind of believing that the witnesses against him are telling the truth too. Maybe not the full truth, granted, but it feels unlikely that everything is a complete fabrication.

And yet, obviously, both of those positions cannot be correct at the same time. Adnan can’t be completely innocent like he claims if Jay is telling even a fraction of the truth. So the listener remains locked in this morbid fascination, looking for a clue, a hint, that will push one of these options off the table completely.

Of all the people discussed in Sarah Koenig’s Serial, the one that has always interested me the most is Jen. I can understand the reasons why Adnan would lie. I can understand the reasons why Jay would lie. But Jen is the one who doesn’t make sense to me. While Jen’s truthfulness doesn’t necessarily mean Adnan is guilty, it is certainly easier to believe in Adnan’s innocence if you can debunk Jen’s testimony.

Plus there were bits about the way Serial presented things that didn’t make sense to me. Why would Jen continue to hang out with someone who involved her in a murder investigation? Why didn’t she know the number of shovels she took Jay to clean off? Why does everyone keep talking about the “consistency” in Jay’s story when that story seems to change every time he tells it?

So I found myself going through the transcription of Jen and Jay’s police interviews and creating timelines of the various accounts of January 13th, 1999. I built a data repository to allow easy manipulation and visualization of each element that you can play with here.

The funny thing is, when people look at the various accounts of the 13th they do so under the assumption that people are telling the truth. Therefore Jen’s testimony of what Jay told her that Adnan told him is given the same weight as Jay’s testimony of what Adnan said, when logically it shouldn’t be. Third- and fourth-hand knowledge, rumor, and supposition are much more likely to be where the inaccuracies are, even if the person is honest.

I could never keep straight all the various versions of what might have happened on January 13th and where that information came from. So I wanted the timelines I built to take into account both THE SOURCE of the information and who the information was about. Here’s the visualization of that. You can examine accounts from February 27th (Jen’s interview), February 28th (Jay’s first interview), or March 15th (Jay’s second interview), arranged either by person or by interview. You can easily filter out one source of information and compare the various timelines against each other to see where the inconsistencies lie.

If you want to build your own visualization of the events of January 13th, you can find the data right here.
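For anyone starting from scratch, here is a minimal Python sketch of the idea behind those timelines: every data point records both who said it and who it is about, so you can drop a source and see which accounts disagree. The `Claim` class, its field names, and the `without_source` and `conflicts` helpers are hypothetical and are not the schema of the actual data repository. The two sample rows are the landline-call discrepancy between Jen's and Jay's accounts.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    source: str   # who gave this account to the police
    subject: str  # who the account is about
    event: str    # what is said to have happened
    detail: str   # this source's version of the event


# Two sample rows taken from the accounts described in this post.
claims = [
    Claim("Jen", "Adnan", "landline call to Jay", "third call"),
    Claim("Jay", "Adnan", "landline call to Jay", "second call"),
]


def without_source(rows, source):
    """Filter out every claim that comes from one source."""
    return [c for c in rows if c.source != source]


def conflicts(rows):
    """Group claims by (subject, event); keep groups whose details disagree."""
    grouped = defaultdict(list)
    for c in rows:
        grouped[(c.subject, c.event)].append(c)
    return {
        key: group
        for key, group in grouped.items()
        if len({c.detail for c in group}) > 1
    }


for (subject, event), group in conflicts(claims).items():
    print(subject, "/", event)
    for c in group:
        print("  ", c.source, "says:", c.detail)
```

Keeping `source` and `subject` as separate fields is the whole point: it lets you weigh a first-hand statement differently from a second- or third-hand one instead of flattening them into a single timeline.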

As I started breaking down the details of Jay and Jen’s stories into short place and time data points, it cleared up a lot of the confusion created by Serial’s storytelling style of reporting. Jen didn’t know how many shovels there were because she was in the car, playing lookout while Jay went behind the dumpster to find them. A lot of the large inconsistencies in Jay’s story come from him clearly attempting to keep his friends out of trouble … lies that in context seem less like deceit and more like misplaced loyalty and naiveté. While Jay’s first and second interviews change many major details, the core timeline of events remains pretty much the same … even as the state claims otherwise.

But looking at the events of January 13th this way also raised some new questions:

– According to both Jen and Jay, Adnan called Jay at least three times between 1pm and 4pm: twice on the cellphone and once on Jen’s landline. Jen says the landline call was the third and final call (presumably the notorious “pick me up at Best Buy”), Jay says the landline call was the second call. Obviously there was some reason why the police ruled out this information, but it’s interesting as a hypothetical nevertheless. If this important call came through the landline, then it’s possible the murder took place later, at a time consistent with Jay’s testimony rather than where the state tried to cram it in.

– The Patapsco Park bit comes originally from Jen. According to Jen, Jay leaves a message for her asking for a pickup at a park around 7pm. Jen isn’t sure what he means, calls him back for clarification, gets Adnan who claims “Jay is busy”. According to her he sounds high at the time. When the police press Jen for the name of the park she can’t remember exactly what it’s called (or rather the person doing the transcript couldn’t understand her), but the cross streets she gives indicate Patapsco.

Ritz: Where’s that located?

Jen: On [inaudible] Park Road

Ritz: Where’s that, is that Baltimore County?

Jen: Yeah, it’s off of Crosby and Chesworth I believe.

There are so many interesting possibilities here: Jen has already admitted she wasn’t sure if she understood where Jay wanted to get picked up … perhaps Jay really was in Patapsco Park at some point that day … perhaps Jay, young and naturally skeptical of the police, made up the bit about Patapsco in order to better match Jen’s misunderstood account of things because he assumed if their stories didn’t match up the police wouldn’t believe him.

– In Jen’s interview she says Jay dropped Adnan off at some girl’s house at some point after picking him up from Best Buy (4ish maybe) … that’s such a weird detail to get wrong. Weirder still that it supposedly happened around the time of the Nisha call. Now, I’m not saying Jay dropped Adnan off at Nisha’s because that would be absolutely impossible given the geography, but it stood out to me for some reason.

– According to Jen, Cathy didn’t know about the murder until the day before Jen’s interview. Jen also makes no mention of Adnan hanging out with them at Cathy’s house which makes way more sense to me than this fantastic idea that Jay, Jen, Cathy and Cathy’s boyfriend Jeff have all heard Adnan has killed his girlfriend and yet still have no qualms about hanging out with him.

– There are points in Jay’s interview where he seems like he is trying to protect Jen by putting distance between them. For example he originally identifies Jen’s house as the house of her brother, and speaks about Jen the way one would speak about the sibling of an acquaintance one barely knows.

Ritz: Okay When you arrive at [Jen’s brother] house, what do you do?

Jay: Sit down, video game out.

Ritz: Who was home at that time?

Jay: Just me and [Jen’s brother]

Ritz: Do you recall what time you arrived at the house?

Jay: No, not exactly. I know a little while his sister came in the house.

Ritz: And who is his sister?

Jay: Jen

Ritz: And how old is Jen?

Jay: Excuse me, I think she’s eighteen.

Ritz: and how old is [Jen’s brother].

Jay: 15

The more I read of the transcripts the more I think the general story Jay is telling is true but that he fiddles with the details in order to improve his position, whether to protect his friends, minimize his own involvement, or improve the state’s case. I don’t think he made the whole thing up under police pressure, but I could buy the idea that he made up the bits about Adnan repeatedly telling him he was going to kill Hae Lee before the 13th in order to support premeditation.

Devil’s Advocate: Don’t Be Agile

November 18, 2014 (updated November 21, 2014) / b3llm4r / Leave a comment

Continuing our series on the pros and cons of popular technical topics for non-technical managers, we look today at the concept of Agile Development and how it will really #$%* you and your developers over.

As a developer, there are three words I dread hearing when I’m interviewing at startups:

We are agile-ish.

Or really any variation thereof: agile inspired, agile influenced, kind of agile. UGH. Going agile is like going gluten-free: doing it half-way causes more pain than not doing it at all.

So just don’t be agile.

The Spirit of Agile is Trusting Your Developers to Lead

To non-technical managers Agile is appealing because it means getting results fast. Indeed, Agile promises an MVP in your hands ASAP, a quick launch, followed by growth growth growth.

Who wouldn’t love that?

But the problem with Agile Development is that most non-technical managers do not really do much research into why Agile methodology gets results so quickly. They don’t realize that Agile aims to move the decision making power out of their hands and into the hands of their developers. They think that by importing the format of Agile– sprints and scrums and whatnot– efficiency will just naturally follow. They micromanage, which is exactly what Agile Development is trying to prevent, and when delays hit they micromanage even more.

Agile Development aims to destroy the typical model of “the waterfall”, in which orders of what to build come down from above for programmers to execute. That doesn’t mean that non-technical team members have no place in the process. On the contrary, non-technical team members, particularly customers and other stakeholders, are a vital part of Agile Development.

But in order to work, programmers in Agile need the freedom to experiment, and to execute without getting caught up in multiple levels of management approval. Agile aims to create small teams where consensus and collaboration are easier.

What typically happens in “Agile-ish” situations is that as developers work on initial product specs, they must constantly come back to the non-technical manager for clarification or adjustment. As the non-technical manager tries to save time and money, acting as Agile Development intends (that is, experimenting with one approach, collecting feedback, and evolving) becomes nearly impossible. Systems to collect feedback take resources away from the core product, so building them is usually put off until “later”. Testing out new ideas is resented, as non-technical managers have already decided what they want built and perceive these experiments as time and money taken away from the real product. (This is particularly true when the dev team is working freelance.)

Agile Development relies on the assumption that you trust the judgement of the people you’ve hired and therefore do not feel the need to dictate every element of every decision to them. After all, what does the manager’s opinion on the position or color of a button matter if you’re going to experiment with different options and choose the one that is the most successful with the customer?

But too often non-technical managers get caught up in policing their technical staff in order to eliminate waste. They are afraid of the false starts that Agile Development tells them to embrace. They want Agile, but they want an impossible hybrid of Agile where all the decisions about the product are made in the beginning and are all 100% right.

This is not Agile Development at all. The first rule of Agile Development is to assume that you’ve gotten some aspect of the product wrong and to structure your process around systematically identifying and correcting those mistakes. Even if by some miracle you do manage to get everything 100% correct the first time, business situations change. Products need to evolve.

Yet in the Agile-ish development cycle the product must first be built exactly as the non-technical manager wants it. If the non-technical manager has overlooked a decision or a problem comes up, the dev team must wait for commands. To take initiative and to try a solution without the non-technical manager’s approval is insubordination. Feedback from customers is cherry-picked and reinterpreted by the manager. When one component of the product is finished, very little effort is “wasted” talking with stakeholders. It’s more important to move on to the next feature.

And so it goes…

Agile + Remote == Death

What is it that people do when they want to be Agile without actually giving up the control necessary to be Agile? They take on the structure of Agile Development without any of the philosophy and end up with an impossible boondoggle in code. We’ve all experienced the horrors of failing Agile: the 15-min stand-ups that last for two hours, the endless series of planning meetings, “sprints” composed mainly of bug fixes and tweaks because not enough time was budgeted for proper testing and code review.

Agile relies on free flowing communication between members of a small team. Co-location, while not absolutely essential, is considered extremely important.

When your team is remote, especially when they are spread out across timezones, the type of informal collaboration and communication Agile aspires to becomes very difficult to achieve. As a result the daily morning “stand up” becomes the primary (and sometimes sole) method of communication between team members. Instead of fifteen minutes touching base, these conference calls become impossibly bogged down with conversations that would have naturally happened throughout the work day if everyone were working out of the same space.

It is possible for a remote team to be Agile, but it is very difficult … especially when the team members are strangers to each other.

Becoming Agile: Spirit First, Process Second

The goal of Agile Development is to rid dev teams of bureaucracy by throwing out the restrictions of stuffy management processes. It is therefore ironic that the first thing non-technical managers do when going Agile is ignore Agile Development’s core philosophy and skip straight to implementing its processes. Agile Development, when poorly done, becomes the very monster it was intended to slay.

Its core principles read something like a eulogy now:

Customer satisfaction by rapid delivery of useful software

Welcome changing requirements, even late in development

Working software is delivered frequently (weeks rather than months)

Close, daily cooperation between business people and developers

Projects are built around motivated individuals, who should be trusted

Face-to-face conversation is the best form of communication (co-location)

Working software is the principal measure of progress

Sustainable development, able to maintain a constant pace

Continuous attention to technical excellence and good design

Simplicity—the art of maximizing the amount of work not done—is essential

Self-organizing teams

Regular adaptation to changing circumstances

If you aren’t willing to sign up for that, just don’t be Agile.


Happy Endpoints

A data blog by the Exversion team