Last Wednesday, a small group of reporters gathered in The Tyee’s downtown Vancouver newsroom to spend their lunch hour learning about something new, something with the potential to assist their reporting. Or perhaps they were there for the free lunch? Regardless, this gathering kicked off a month-long series of brown-bag lunch-and-learn sessions on new reporting tools that I’ll be delivering in The Tyee’s newsroom.
The first session focused on DocumentCloud, a tool that makes it easier for reporters to work with “primary source” material such as reports, contracts, and other large, typically hard-to-parse documents. I kept the presentation intentionally short, roughly thirty minutes, and walked through DocumentCloud’s features using an existing PDF that had been used in a recent story on The Tyee. Specifically, I covered DocumentCloud’s analysis tools (which display entities, such as names of people and places, as well as timelines), creating highlighted notes in a document, and embedding either the notes or the full document into a story using The Tyee’s content-management system.
Questions ranged from how accurate the entity extraction is (DocumentCloud uses Thomson Reuters’ OpenCalais to extract semantic information from text) to the challenges that arise with scanned paper documents, which depend on DocumentCloud’s optical character recognition (OCR) capabilities (again, DocumentCloud relies on a third-party tool, Tesseract, for this). In anticipation of some of these questions, I had uploaded a number of hard-to-parse documents and demonstrated how DocumentCloud had both extracted their text and analyzed them for meaningful entities.
Immediately after the session, I set up DocumentCloud accounts for each of the reporters in attendance.
Two days later, The Tyee published this story by education reporter Katie Hyslop, which uses DocumentCloud to highlight corporate influence on British Columbia’s educational plan. Award-winning food reporter Colleen Kimmett is also turning to DocumentCloud for an upcoming story that involves a large contract document. So we’re off to a good start, and I’m keen to see how the other Tyeesters who attended (reporter Geoff Dembicki, legislative reporter Andrew MacLeod, and contributors such as Mitch Anderson) make use of it.
It just goes to show how a small time investment can have a potentially big impact. My sense is that part of the secret sauce is keeping the sessions short and practical, focusing on tools that make sense for the type of reporting being done, and using existing stories to create a “before and after” moment.
There are three more sessions this month: October 10 (tomorrow!), October 17, and October 24. If you have suggestions for new reporting tools or techniques that could be covered in thirty minutes, whether something you’d like to learn or something you’d like to see presented to a room full of talented reporters, drop me a note or leave a comment.