Topic modeling: a Swiss Army knife for faculty, geeks, and librarians.

Speaker(s): 

Unsupervised topic modeling is a sophisticated, machine-learning-based technique for extracting “aboutness” information from large collections of documents. That sounds scary, but it’s really something anyone can use: software packages such as Mallet make it possible for scholars, librarians, and other non-geeks to take advantage of what topic modeling offers on their own laptops, right out of the box. Already popular with digital scholars in history and political science, topic modeling promises to be an important tool for legal scholarship as well.
Topic modeling answers questions like:

  • What’s in this huge collection of documents, cases, or statutes?
  • What are these documents about?
  • How does the discourse represented by this set of documents differ from the one represented by this other set?
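
For readers curious about what happens under the hood, here is a minimal sketch of one common approach, latent Dirichlet allocation fitted by collapsed Gibbs sampling. It is a toy illustration only: it is not Mallet's code and not the LII's models, and the function name `fit_lda`, the parameter values, and the four tiny "documents" are all invented for the example.

```python
import random
from collections import Counter


def fit_lda(docs, num_topics=2, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA (hypothetical helper, not a library API).

    docs is a list of token lists. Returns (top_words, doc_topic), where
    doc_topic[d][k] counts the tokens of document d assigned to topic k.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []

    def move(d, w, k, delta):
        # Add (delta=+1) or remove (delta=-1) one token from the count tables.
        doc_topic[d][k] += delta
        topic_word[k][w] += delta
        topic_total[k] += delta

    # Start by giving every token a random topic.
    for d, doc in enumerate(docs):
        assignments.append([rng.randrange(num_topics) for _ in doc])
        for w, k in zip(doc, assignments[d]):
            move(d, w, k, +1)

    # Repeatedly resample each token's topic given all the other tokens.
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                move(d, w, assignments[d][i], -1)
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + beta * vocab_size)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                move(d, w, k, +1)

    # Summarize each topic by its most frequently assigned words.
    top_words = [
        [w for w, c in topic_word[t].most_common(3) if c > 0]
        for t in range(num_topics)
    ]
    return top_words, doc_topic


# Invented toy corpus: two bankruptcy-flavored and two patent-flavored documents.
docs = [
    "debtor creditor bankruptcy trustee estate".split(),
    "bankruptcy debtor discharge creditor".split(),
    "patent claim infringement invention".split(),
    "invention patent claim license".split(),
]
top_words, doc_topic = fit_lda(docs)
```

Here `top_words` gives a rough answer to "what are these documents about?", while each row of `doc_topic` shows how one document's words are spread across the topics. Comparing those rows between two sets of documents is one simple way to approach the third question above. Real corpora need far more text, stop-word removal, and some experimentation with the number of topics.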

At the LII, we’ve developed models that:

  • show the relative level of development of different areas of law in different states
  • suggest, with eerie accuracy, the topics found in a particular judicial opinion
  • automatically replicate (though admittedly with less granularity) the organization of a human-written treatise such as Collier on Bankruptcy
  • discover topical themes within faculty scholarship and relate them to Wikipedia-style encyclopedia entries

and we’re thinking about using them to:

  • replicate hand-constructed indexes to large corpora like the Code of Federal Regulations or the Congressional Record
  • discover the differences between the discourse surrounding crime and criminality in the 1980s and that in the period starting around the year 2000
  • construct finding aids for large, confusingly titled bodies of guidance documents such as IRS written determinations and SEC no-action letters
  • figure out what’s in all those Congressional committee prints

Sara Frug and Tom Bruce will show the LII’s work in these areas, along with a set of visualization and scaling tools that help in model development.  This is perhaps the most exciting, scholar-friendly technology to come along in quite some time, and we hope you’ll take a look.
 

Slides and Documents: 

Schedule info

Time slot: 
18 June 13:00 - 14:00
Room: 
180
Video: 
See video