Men Who Stare at Code: Regular Expressions, Metadata Retrieval, And The Use of ESP To Guess Titles Without Actually Looking.

Submitted by John Joergensen, Rutgers, The State University of NJ, Newark on Mon, 02/28/2011 - 9:22pm

Presenter(s): John Joergensen, Rutgers, The State University of NJ, Newark

For those with some experience establishing and maintaining scholarly and other document repositories, the problem of gathering quality metadata for cataloging and retrieval is well known. The solution is to find methods to extract metadata from existing documents by the most efficient means available. Social tagging, various commercial products, and asking authors to fill out forms all present themselves as solutions, but they typically fall far short of what is needed.

This session will illustrate methods for extracting useful metadata from structured and semi-structured documents, including efficient methods for manual extraction as well as automated extraction methods.

Methods to be discussed will include analyzing and parsing text for metadata extraction and using metadata extraction tools for binary files.
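As a rough illustration of the text-parsing approach (not taken from the session materials), the sketch below uses a Python regular expression to pull an author, affiliation, and timestamp out of a semi-structured byline like the "Submitted by" line on this page. The pattern, the function name parse_byline, and the assumed byline layout are all hypothetical; real documents would need patterns tuned to their own conventions.

```python
import re
from datetime import datetime

# Hypothetical pattern for a byline shaped like:
#   "Submitted by <name>, <affiliation> on <Day>, <MM/DD/YYYY> - <H:MMam|pm>"
# The layout is an assumption made for this example only.
BYLINE = re.compile(
    r"^Submitted by (?P<name>[^,]+), (?P<affiliation>.+) "
    r"on \w{3}, (?P<date>\d{2}/\d{2}/\d{4}) - (?P<time>\d{1,2}:\d{2}[ap]m)$"
)


def parse_byline(line):
    """Return a metadata dict for a matching byline, or None if it does not match."""
    m = BYLINE.match(line.strip())
    if m is None:
        return None
    # Combine the date and 12-hour time into one datetime value.
    submitted = datetime.strptime(f"{m['date']} {m['time']}", "%m/%d/%Y %I:%M%p")
    return {
        "name": m["name"],
        "affiliation": m["affiliation"],
        "submitted": submitted.isoformat(),
    }


if __name__ == "__main__":
    sample = (
        "Submitted by John Joergensen, Rutgers, The State University of NJ, "
        "Newark on Mon, 02/28/2011 - 9:22pm"
    )
    print(parse_byline(sample))
```

The name is taken as everything before the first comma, which will misfire on names that contain commas; a pattern like this is a starting point that still needs spot checks against real data.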

Some experience with Perl, Python, or another scripting language with a regular expression component will be assumed, but the intrepid beginner will be welcome.

Schedule info

Time slot: 23 June 16:00 - 17:00

Room: 267

Audience

Track: Librarian, Technologist