Data Science and Law: What One Lawyer Learned From a 50-Hour Data Science Bootcamp
Is there a relationship between data science and law? Here's what a lawyer learned from a 50-hour data science bootcamp at Data Science Dojo.
With an increased focus on the growing role of data science and data analytics in the future of law, I decided that it was high time to learn what all the fuss is about. How do data science and law work together? Initially, I considered taking a course on data analytics geared toward lawyers, but shockingly, I couldn’t find much, with the exception of a couple of classes that focused on e-discovery, where predictive coding is a hot topic. One new player in the legal data space, LexPredict, also offers a range of training for lawyers, but the company seemed geared toward biglaw and, in any event, didn’t list dates or prices for its classes.
Unable to find data analytics classes for lawyers, I decided to approach the subject from another angle: start with the data science tech and work my way back to the law. That approach gave me a plethora of options, from low-cost classes at Udemy and Coursera to 12-week bootcamps costing $10k or more. However, because I didn’t have the luxury of giving up my day job, I knew I’d need a compact course. Any program that dragged on for weeks or months would increase the chances that I’d drop out once my caseload and client emergencies presented a conflict. Likewise, since taking time away from my practice for a class would already cost me some income, I didn’t want to shell out several thousand dollars on top of that.
Based on my criteria, Data Science Dojo’s data science bootcamp fit the bill: it’s a reasonably priced 5-day, 50-hour onsite program with no prerequisites (though there were about 10 hours of pre-class prep). And the class covered broad ground: in the span of a week I learned coding tools such as basic R, MS Azure, Hadoop, and Hive, along with concepts like data mining and visualization, predictive modeling, ensemble methods like bagging and boosting, random forests, the importance of cross-validation, the difference between training and test data, A/B testing basics, building a recommendation system, and handling real-time and streaming data (we hacked together a quick IoT solution using Azure tools, though truth be told, I was pretty much lost by then). Below are some of my takeaways on big data, especially as it relates to the legal profession, and on what it’s like for a lawyer to learn a new skill at an advanced age.
The mechanics of building a predictive model aren’t particularly difficult; understanding which features to include and how to approach the problem is, and that’s where domain knowledge is important.
One of the underlying themes of the class is that data science (itself a buzzword) is merely a collection of skills; intuition and domain knowledge matter as much as coding a predictive model. Yet oddly, when data science is discussed in the legal profession, we downplay the importance of legal expertise and its value in creating effective models.
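To make that point concrete, here is a minimal Python sketch of what "building a predictive model" can look like. It is not taken from the bootcamp (which used R and Azure), and it deliberately uses one of the simplest techniques available, a nearest-centroid classifier, together with a split into training and test data. The function names and the feature names mentioned below are hypothetical placeholders. Note how little code the fitting and scoring take; the `features` list, which decides what the model is allowed to see, is where legal domain knowledge would come in.

```python
import random


def train_test_split(records, test_fraction=0.25, seed=0):
    """Shuffle a copy of the records and hold out a fraction as test data."""
    rows = list(records)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]


def fit_nearest_centroid(train, features):
    """Average each chosen feature per outcome label (the 'label' key).

    The choice of `features` is the domain-knowledge step: the model only
    ever sees the columns listed here.
    """
    totals, counts = {}, {}
    for row in train:
        label = row["label"]
        sums = totals.setdefault(label, [0.0] * len(features))
        for i, name in enumerate(features):
            sums[i] += row[name]
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in sums] for label, sums in totals.items()}


def predict(centroids, row, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    point = [row[name] for name in features]

    def distance(centroid):
        return sum((p - c) ** 2 for p, c in zip(point, centroid))

    return min(centroids, key=lambda label: distance(centroids[label]))


def accuracy(centroids, test, features):
    """Fraction of held-out rows whose predicted label matches the true label."""
    if not test:
        raise ValueError("test set is empty")
    hits = sum(predict(centroids, row, features) == row["label"] for row in test)
    return hits / len(test)
```

For example, fitting on hypothetical matter records with `features=["days_open", "num_motions"]` and then calling `accuracy` on the held-out rows measures how well those particular features predict outcomes for cases the model has never seen. Swapping in a different feature list and re-scoring is one round of the iteration and questioning a model needs.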
Predictive models are iterative and constant questioning is a good thing.
Although most lawyers will argue a legal principle ad nauseam, when it comes to data, we’re surprisingly passive. For the past two years, Clio has released a Trends Report that produced interesting, albeit counterintuitive, results. Yet the results are reported as-is, with no questions asked about the methodology used, how the data was gathered, or what it means. That’s not true data science: it’s groupthink.
Big Legal Data Isn't all that Big
Our instructor shared with us the Five V’s (Volume, Velocity, Variety, Veracity, and Value), which are used to evaluate whether data rises to the level of big data. For volume, we’re talking about huge amounts of data, not terabytes but exabytes and beyond, too large to be stored and processed on traditional machines. For example, on Facebook, 10 billion messages are exchanged each day. It’s hard to imagine many sources of legal data that approach that volume. Our instructor’s point was that we shouldn’t turn a data problem into a big data problem unless absolutely necessary. So I wonder whether lawyers are using the term “big data” for small data, or treating ordinary data problems as big data problems.
Kaggle Competitions are Way Cool
I hadn’t known much about Kaggle before my class. Although our involvement with Kaggle was limited to an in-class competition over who could build the most accurate model to predict survival on the Titanic, more broadly, Kaggle serves as a platform where companies can crowd-source the creation of data models. Many of the contests attract large numbers of participants, most likely because the sponsors pony up substantial cash prizes as an incentive. Lawyers are often criticized for not crowd-sourcing or sharing information like other professions, but I’ve not seen a single platform that offers any financial reward to lawyers for creating content that might be used as the equivalent of case notes. If any of the companies adding blog content to supplement caselaw (as Fastcase, in collaboration with Lexblog, is doing now) offered a thousand-dollar award every week for the best content, I think we’d see an explosion of high-quality crowd-sourced materials.
All Practicing Lawyers, Not Just Millennials, Need to Understand New Technology
Most of the conversation about the importance of learning about big data, AI, or other new tools comes in the context of advice about what millennials need to learn. But I think it’s even more important for us mid-career and older lawyers to keep pace with the future if we want to have control over how the last decade or two of our careers play out.
After 50 hours of bootcamp, I’ve had to catch up on client work, and I’m not sure how soon I’ll be able to apply all the fancy new tricks and knowledge I’ve picked up. For now, I’m satisfied that I’ve at least taken the first step. When will you do the same?