Automatically Identifying Writing Strategies for Science Communication
This blog post explains our paper, Writing Strategies for Science Communication: Data and Computational Analysis, which was published this year at the Conference on Empirical Methods in Natural Language Processing (EMNLP). If you’re interested in the technical details of the work, feel free to check out our paper or the project’s GitHub repo!
It’s important that the public gets trustworthy, understandable information about new scientific findings. But have you ever tried reading a scientific paper or talking to a scientist about their research? Chances are you and the scientist both left with a headache!
Many scientists would love to share more of their work, and many readers would love to learn about it, but explaining and understanding scientific concepts is hard.
Researchers interested in writing a blog post about their research can learn about strategies for making research accessible and interesting to the general public in style guides like 12 Tips for Scientists Writing for the General Public. While great resources, these guides only provide general strategies like ‘don’t use jargon’ or ‘know your audience.’ There isn’t guidance on how to use these strategies in one’s own writing. This makes it difficult for researchers or novice science writers to communicate their work to an audience outside their research field.
What we did
To address this communication issue, we used modern natural language processing (NLP) techniques to automatically identify writing strategies for science communication. Our goal is to eventually build tools that give writers automatic feedback on how to use these strategies in their own writing to best communicate with their audience.
To understand which writing strategies were common across various outlets of science communication, we consulted existing style guides and converged on a set of writing strategies. These strategies include guidance like reporting only the main ideas of the findings or stating their real-world impact.
We gathered documents from a wide span of science news publications, including blog sites, scientific magazines (like Scientific American), and university press releases. We annotated a subset of these documents to mark occurrences of the different writing strategies, then used a mix of rule-based and machine-learning approaches to identify the strategies across the whole set of publications.
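To give a flavor of what a rule-based check can look like, here is a minimal Python sketch. It is an illustration only, not the rules from our paper: the tiny COMMON_WORDS list, the passive-voice pattern, and the function names are all assumptions made up for this example, and a real system would rely on a large word-frequency list and a proper parser.

```python
import re

# Hypothetical, tiny list of everyday words (an assumption for this sketch).
# A real system would use a large word-frequency list instead.
COMMON_WORDS = {
    "the", "a", "of", "and", "to", "in", "is", "we",
    "that", "it", "how", "new", "found",
}

# Rough heuristic: a form of "to be" followed by a word ending in -ed or -en.
# It misses irregular participles and can produce false positives.
PASSIVE_PATTERN = re.compile(
    r"\b(?:is|are|was|were|been|being|be)\s+\w+(?:ed|en)\b",
    re.IGNORECASE,
)


def tokenize(sentence):
    """Lowercase the sentence and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def jargon_ratio(sentence, common_words=COMMON_WORDS):
    """Return the fraction of tokens that are not in the common-word list."""
    tokens = tokenize(sentence)
    if not tokens:
        return 0.0
    rare = [t for t in tokens if t not in common_words]
    return len(rare) / len(tokens)


def looks_passive(sentence):
    """Return True if the sentence matches the rough passive-voice pattern."""
    return PASSIVE_PATTERN.search(sentence) is not None
```

Simple rules like these are cheap and easy to inspect, which makes them a reasonable fit for surface-level strategies such as jargon or voice. Strategies that depend on meaning, like stating real-world impact, are harder to capture this way and are where learned models tend to help.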
What we found
We found notable differences in how venues used the strategies. For example, press releases highlight the real-world impact of research more than scientific magazines do. Magazines, in turn, use less specialized jargon and incorporate more storytelling, active voice, and present tense into their sentences.
These findings are important because they suggest that, depending on the audience, different strategies for science communication might be more effective. This is a valuable first step toward creating writing tools that help scientific writers write for different audiences. The next step for us is to explore how these writing strategies affect audience engagement. By using these methods we hope to help bridge the communication gap between researchers and the public, and prevent a few headaches along the way.
By: Lauren Kim and Tal August
This work would not have been possible without our collaborators, Katharina Reinecke and Noah Smith, as well as our annotators and those who read drafts of the paper and offered feedback. Thank you!