Every year, thousands of entrepreneurs launch startups, aiming to make it big. The journey is familiar to many, from the story of the tech savant who dropped out of college to start an iconic business to the advice articles that fill magazines. The struggle of startups to survive is just as familiar. But how do survival rates shake out when we turn to the evidence? As it turns out, the U.S. Census Bureau collects data on business dynamics that can be used for survival analysis of jobs and firms.
In this tutorial, we build out a series of functions in Python to better understand business survival across the United States. Central to this tutorial are Kaplan-Meier curves (KM curves), which plot a product-limit estimator: an estimate of the share of a defined cohort of businesses that survives over time. By comparing survival rates across Metropolitan Statistical Areas (MSAs), we find regions that may fare far better in business survival than others. One particularly interesting finding is that while firm survival rates fall over time, it's possible for employment to grow among surviving firms.
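To make the product-limit idea concrete before touching any real data, here is a minimal sketch in plain Python. The estimate after period t is the product, over every period i up to t, of (1 - d_i / n_i), where n_i is the number of firms still operating at the start of period i and d_i is the number that close during it. The function name `kaplan_meier` and the cohort counts are made up for illustration; they are not Census Bureau figures and not necessarily how the functions later in this tutorial are organized.

```python
def kaplan_meier(at_risk, exits):
    """Return the product-limit survival estimate after each period.

    at_risk[i] is the number of firms still operating at the start of
    period i; exits[i] is how many of those firms closed during period i.
    """
    survival = []
    s = 1.0
    for n, d in zip(at_risk, exits):
        if n <= 0:
            raise ValueError("each period needs at least one firm at risk")
        # Multiply by the share of at-risk firms that made it through the period.
        s *= 1.0 - d / n
        survival.append(s)
    return survival


# Hypothetical cohort of 1,000 firms followed for four years (illustrative only).
at_risk = [1000, 800, 680, 612]
exits = [200, 120, 68, 61]

km_curve = kaplan_meier(at_risk, exits)
for year, s in enumerate(km_curve, start=1):
    print(f"year {year}: {s:.3f}")
```

Because no firm in this toy cohort drops out of observation for reasons other than closing, the final estimate equals the plain share of the original cohort still alive (551 of 1,000). The product form matters once the number at risk can shrink for other reasons, such as firms whose follow-up window ends early.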
To get started, we first load the Python packages that will allow us to build out a survival analysis.
Loading the packages follows the usual routine:
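# io and zipfile handle in-memory files and zip archives; requests makes HTTP requests
import io, requests, zipfile
# pandas provides the DataFrame used for tabular data
import pandas as pd
# plotly's offline mode renders interactive charts without a plotly account
from plotly.offline import download_plotlyjs, init_notebook_mode, iplot
import plotly.graph_objs as go
# Load plotly.js into the notebook so charts display inline
init_notebook_mode()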