Data
As part of my work in computational semantics and its interfaces with lexical semantics, I have been involved in the release of data sets that may be useful to others. These data sets have typically been released under the umbrella of two collaborative projects:
Here are a few highlights:
- MegaAcceptability: Acceptability judgments for effectively all clause-embedding verbs of English in a range of frames, collected using our “bleaching method”. Originally released with White & Rawlins 2016. With Aaron Steven White.
- MegaVeridicality: Judgments about projection of embedded clause content in positive and negative syntactic contexts for both finite and non-finite frames, covering all verbs in MegaAcceptability. That is, a complete window into factivity and veridicality in English. Version 1 (the finite subset) was released with White & Rawlins 2018. With Aaron Steven White.
- Semantic Proto-Roles: Thematic roles have resisted comprehensive semantic analysis despite their incredible usefulness in linguistic theory. One influential proposal for why is that they should be thought of as decomposed, in a potentially gradient fashion, into more fine-grained thematically relevant properties of events and participants. These data sets allow lexicon-scale investigation of this hypothesis by annotating large amounts of data with proto-role property judgments. This emerged out of the first Fred Jelinek Memorial CLSP Summer Workshop in 2014 and was first published in Reisinger et al. 2015 (the figure is drawn from White, Rawlins & Van Durme 2017). This project involves many other researchers; key collaborators include Ben Van Durme, Aaron Steven White (who worked with Ben and me as a postdoc during the second phase of this research), Dee Ann Reisinger, Rachel Rudinger, and Frank Ferraro. See our recent arXiv preprint (White et al. 2019) for a comprehensive overview of this and related data sets, as well as toolkits.
- Reisinger, Dee Ann, Frank Ferraro, Craig Harman, Rachel Rudinger, Kyle Rawlins & Benjamin Van Durme. 2015. Semantic proto-roles. Transactions of the ACL 3. 475–488. DOI: 10.1162/tacl_a_00152
- White, Aaron Steven & Kyle Rawlins. 2016. A computational model of S-selection. In Mary Moroney, Carol-Rose Little, Jacob Collard, & Dan Burgdorf (eds.), Proceedings of SALT 26, 641–663. DOI: 10.3765/salt.v26i0.3819
- White, Aaron Steven & Kyle Rawlins. 2018. The role of veridicality and factivity in clause selection. In Sherry Hucklebridge & Max Nelson (eds.), Proceedings of NELS 48. Download: https://ling.auf.net/lingbuzz/004012
- White, Aaron Steven, Kyle Rawlins & Benjamin Van Durme. 2017. The semantic proto-role linking model. In Proceedings of the European chapter of the Association for Computational Linguistics, 92–98. ACL. Download: https://www.aclweb.org/anthology/E17-2015/
- White, Aaron Steven, Elias Stengel-Eskin, Siddharth Vashishtha, Venkata Govindarajan, Dee Ann Reisinger, Tim Vieira, Keisuke Sakaguchi, Sheng Zhang, Francis Ferraro, Rachel Rudinger, Kyle Rawlins & Benjamin Van Durme. 2019. The Universal Decompositional Semantics Dataset and Decomp Toolkit. Download: https://arxiv.org/abs/1909.13851