NASSLLI 2022: implementing semantic compositionality¶

Kyle Rawlins, kgr@jhu.edu¶

Johns Hopkins University, Department of Cognitive Science¶

This course provides an introduction to what is involved in actually implementing, in a computational sense, a system of compositional semantics of the sort commonly assumed in theoretical linguistics and philosophy (see e.g. Szabó 2017). The target audience is students who have had introductory-level programming experience as well as some basic exposure to linguistic or logical semantics, or to compositional semantics in some other form; it is an introductory course that does not assume deep background knowledge in either area.

(Current) plan for the week¶

  1. Overview, basics of jupyter, compositional architecture
  2. Composition systems and composition operations
  3. Metalanguages and simple type inference
  4. Type inference redux: polymorphism
  5. Case study: quantification. Computational semantics vs NLU.

Plan for this notebook/slide set:¶

  1. Course overview
  2. Compositionality, preliminary
  3. Implementations, and why bother
  4. Basics: Python and Jupyter
  5. An architecture for implementation
  6. Recap so far

Tech note¶

  • Slides for this class are generated via a Jupyter notebook, using my Lambda Notebook project.
  • The notebooks and generated slides are available via https://rawlins.io/teaching/nasslli2022/
  • To get the full class experience, I hope you'll install Python/Jupyter/Lambda notebook on a laptop and play along.
  • I'll give a few "exercises" along the way (~ 1 per day) that could be fun to try if you want something to help you integrate knowledge.
  • For more background, you may be interested in handouts from my S22 seminar on compositionality: https://www.dropbox.com/sh/qxy1yuogr0s1eho/AABWkbHtDeu-RIJO5Esk-v9Ta?dl=0

Overview¶

Goals: computation¶

  • Take the core of compositional semantics, and show what it takes to implement it
  • Provide the basic tools needed to implement specific semantic(/pragmatic) analyses in Python, and verify implementability
  • Provide the tools to understand how computational semantics and Natural Language Understanding (NLU) might meet up with theoretical linguistics desiderata, and how the toolkit that computational researchers may already have could contribute to linguistics research.

Goals: linguistics/philosophy¶

  • Introduce how linguists think about compositionality and semantics in a way that directly connects to computation/programming
  • Deepen understanding of compositionality by providing a new lens to view it through, that of computational implementation

Background?¶

This course assumes:

  1. some exposure to linguistic semantics / philosophy of language, and/or
  2. some exposure to programming / NLP / computational semantics

I expect it will lean on 1 a bit more than 2. However:

  • This course won't teach programming per se, so if you have no programming background at all, it would pair well with a course that does (e.g. the NASSLLI course "Python for logic and language", Khalil Iskarous)

What is your background?


Compositionality, preliminary¶

Sloganized: The meaning of the whole is determined by the meanings of the parts and the ways in which they combine.

Szabo, SEP entry on compositionality def (C): The meaning of a complex expression is determined by its structure and the meanings of its constituents.

(1) a. The chair is under the rug.
    b. The rug is under the chair.

Frege's Conjecture (Heim and Kratzer 1998): Semantic composition is function application.

(1')  a. under(the_rug)
      b. under(the_chair)

(1'') a. (under(the_rug))(the_chair)
      b. (under(the_chair))(the_rug)
In [112]:
tree1 = ("S", ("NP", ("D", "the"), ("N", "chair")), ("VP", ("V", "is"),
                                                               ("PP", ("P", "under"),
                                                                ("NP", ("D", "the"), ("N", "rug")))))
tree2 = ("S", ("NP", ("D", "the"), ("N", "rug")), ("VP", ("V", "is"),
                                                               ("PP", ("P", "under"),
                                                                ("NP", ("D", "the"), ("N", "chair")))))

SideBySide(svgling.draw_tree(tree1), svgling.draw_tree(tree2))
Out[112]:
[rendered output: the two constituent trees for (1a) and (1b), drawn side by side]

First set of desiderata¶

  1. representations for constituents
  2. representations for constituent meanings
  3. representations for ways of combining
    • where the immediate target is probably Function Application of some stripe
  4. what even is a "representation" in the computational version of this project?
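To make the first three desiderata concrete before introducing any real machinery, here is a minimal, purely illustrative pure-python sketch (none of these names are lambda notebook API):

# 1. constituents: labeled tuples, as in the trees above
pp = ("PP", ("P", "under"), ("NP", ("D", "the"), ("N", "rug")))

# 2. constituent meanings: ordinary python values and functions standing in for denotations
the_rug = "the rug"                                                    # placeholder for an individual
under = lambda ground: (lambda figure: f"{figure} is under {ground}")  # placeholder for a relation

# 3. a way of combining: Function Application, trying both orders
def apply_fa(m1, m2):
    """Return m1(m2) if m1 is a function, else m2(m1) if m2 is; fail otherwise."""
    if callable(m1):
        return m1(m2)
    if callable(m2):
        return m2(m1)
    raise TypeError("no function to apply")

apply_fa(under, the_rug)("the chair")    # -> 'the chair is under the rug'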

Implementations, and why bother¶

What is an implementation?¶

A semantic analysis that gives details is sometimes called an implementation.

  • Idea $\Rightarrow$ implementation details + assumptions = implementation?
  • How many details needed? (what is a "detail"?)

Marr levels?¶

Marr 1982 (pp. 24-25):

  1. Computational level: what does the analysis do -- in terms of a mapping from "one kind of information to another"
  2. Algorithmic level: how does it do it -- what are the exact representations and algorithms employed in this transformation
  3. Implementational / Hardware level: "the details of how the algorithm and representation are realized physically"



  • Marr 1982. Vision: A computational investigation into the human representation and processing of visual information. MIT Press. doi:10.7551/mitpress/9780262514620.001.0001.
  • https://plato.stanford.edu/entries/computational-mind/#FunEva

Degrees of implementation¶

What is sufficient to count as an implementation? No one answer. Some possibilities for now:

  • Computational level: are enough details and assumptions made explicit (or "obvious") that another researcher could reconstruct the same analysis?
  • Algorithmic level: Is there a means provided to mechanistically run the analysis? (Is it "executable"?)

We need to differentiate competence/"proof of concept" implementation vs. performance/human-like implementation. We will not be doing psycholinguistics here...

  • An implementation may or may not use the same algorithms that humans do!


This course is about what it takes to have an executable, competence-oriented, implementation.

Other relevant projects¶

I'm not the only one working in this space, but it's not crowded. A few pointers:

  • Haskell: functional programming language that is well-suited to direct implementations of semantics. Van Eijck and Unger 2010, Computational Semantics with Functional Programming is the now classic starting point.
    • Example: Julian Grove's dissertation work on presuppositions has an implemented Haskell fragment
  • The Lambda Calculator (Lucas Champollion and others) is a teaching-oriented tool that makes good on implementing what you need for a Semantics 1 course. (Not really research-oriented; source code no longer available?)
  • Boxer for DRT parsing (but I don't know how to get ahold of it any more).
  • Blackburn and Bos 2005, Representation and Inference for Natural Language

Toy example: arithmeticese¶

(1) two plus two is four
(2) three times two plus two is eight
...
  • note that intonation disambiguates ex. 2
  • Specialized language, but parasitic on ordinary natural language copular sentences
  • The language fragment is compositional
  • straightforward mapping to arithmetic formulas

Straightforward mapping:

In [113]:
# Some rather basic python code
2 + 2 == 4
Out[113]:
True
In [114]:
# some very slightly less basic python code
def isV(x): # `is` is a python reserved word
    return lambda y: x == y

isV(2+2)(4)
Out[114]:
True
In [115]:
# even less basic, but closer to what we want...

def isV(x):
    return lambda y: x == y

def plus(x):
    return lambda y: y + x

def two():
    return 2

def four():
    return 4

isV(plus(two())(two()))(four())
Out[115]:
True

Toy example: arithmeticese¶

Alright, but how could you more fully implement arithmeticese as a natural language fragment? Here's some lambda notebook code to do it:

In [116]:
%%lamb
||two|| = 2
||four|| = 4
||isV|| = lambda x_n: lambda y_n: x <=> y
||plus|| = lambda x_n: lambda y_n: x + y
||times|| = lambda x_n: lambda y_n: x * y
$[\![\mathbf{\text{two}}]\!]^{}_{n} \:=\: $${2}$
$[\![\mathbf{\text{four}}]\!]^{}_{n} \:=\: $${4}$
$[\![\mathbf{\text{isV}}]\!]^{}_{\left\langle{}n,\left\langle{}n,t\right\rangle{}\right\rangle{}} \:=\: $$\lambda{} x_{n} \: . \: \lambda{} y_{n} \: . \: ({x} = {y})$
$[\![\mathbf{\text{plus}}]\!]^{}_{\left\langle{}n,\left\langle{}n,n\right\rangle{}\right\rangle{}} \:=\: $$\lambda{} x_{n} \: . \: \lambda{} y_{n} \: . \: ({x} + {y})$
$[\![\mathbf{\text{times}}]\!]^{}_{\left\langle{}n,\left\langle{}n,n\right\rangle{}\right\rangle{}} \:=\: $$\lambda{} x_{n} \: . \: \lambda{} y_{n} \: . \: ({x} * {y})$
In [117]:
(two * (plus * two)).trace()
Out[117]:
Full composition trace. 1 path:
    Step 1: $[\![\mathbf{\text{plus}}]\!]^{}_{\left\langle{}n,\left\langle{}n,n\right\rangle{}\right\rangle{}} \:=\: $$\lambda{} x_{n} \: . \: \lambda{} y_{n} \: . \: ({x} + {y})$
    Step 2: $[\![\mathbf{\text{two}}]\!]^{}_{n} \:=\: $${2}$
    Step 3: $[\![\mathbf{\text{plus}}]\!]^{}_{\left\langle{}n,\left\langle{}n,n\right\rangle{}\right\rangle{}}$ * $[\![\mathbf{\text{two}}]\!]^{}_{n}$ leads to: $[\![\mathbf{\text{[plus two]}}]\!]^{}_{\left\langle{}n,n\right\rangle{}} \:=\: $$\lambda{} y_{n} \: . \: ({2} + {y})$ [by FA]
    Step 4: $[\![\mathbf{\text{two}}]\!]^{}_{n} \:=\: $${2}$
    Step 5: $[\![\mathbf{\text{[plus two]}}]\!]^{}_{\left\langle{}n,n\right\rangle{}}$ * $[\![\mathbf{\text{two}}]\!]^{}_{n}$ leads to: $[\![\mathbf{\text{[[plus two] two]}}]\!]^{}_{n} \:=\: $${4}$ [by FA]

(Derivation for step 5 is $\Downarrow$)

In [118]:
(two * (plus * two))[0].content.derivation
Out[118]:
1.
${[\lambda{} y_{n} \: . \: ({2} + {y})]}({2})$
2.
$({2} + {2})$
Reduction
3.
${4}$
In [119]:
((two * (plus * two)) * (isV * four)).trace()
Out[119]:
Full composition trace. 1 path:
    Step 1: $[\![\mathbf{\text{isV}}]\!]^{}_{\left\langle{}n,\left\langle{}n,t\right\rangle{}\right\rangle{}} \:=\: $$\lambda{} x_{n} \: . \: \lambda{} y_{n} \: . \: ({x} = {y})$
    Step 2: $[\![\mathbf{\text{four}}]\!]^{}_{n} \:=\: $${4}$
    Step 3: $[\![\mathbf{\text{isV}}]\!]^{}_{\left\langle{}n,\left\langle{}n,t\right\rangle{}\right\rangle{}}$ * $[\![\mathbf{\text{four}}]\!]^{}_{n}$ leads to: $[\![\mathbf{\text{[isV four]}}]\!]^{}_{\left\langle{}n,t\right\rangle{}} \:=\: $$\lambda{} y_{n} \: . \: ({4} = {y})$ [by FA]
    Step 4: $[\![\mathbf{\text{plus}}]\!]^{}_{\left\langle{}n,\left\langle{}n,n\right\rangle{}\right\rangle{}} \:=\: $$\lambda{} x_{n} \: . \: \lambda{} y_{n} \: . \: ({x} + {y})$
    Step 5: $[\![\mathbf{\text{two}}]\!]^{}_{n} \:=\: $${2}$
    Step 6: $[\![\mathbf{\text{plus}}]\!]^{}_{\left\langle{}n,\left\langle{}n,n\right\rangle{}\right\rangle{}}$ * $[\![\mathbf{\text{two}}]\!]^{}_{n}$ leads to: $[\![\mathbf{\text{[plus two]}}]\!]^{}_{\left\langle{}n,n\right\rangle{}} \:=\: $$\lambda{} y_{n} \: . \: ({2} + {y})$ [by FA]
    Step 7: $[\![\mathbf{\text{two}}]\!]^{}_{n} \:=\: $${2}$
    Step 8: $[\![\mathbf{\text{[plus two]}}]\!]^{}_{\left\langle{}n,n\right\rangle{}}$ * $[\![\mathbf{\text{two}}]\!]^{}_{n}$ leads to: $[\![\mathbf{\text{[[plus two] two]}}]\!]^{}_{n} \:=\: $${4}$ [by FA]
    Step 9: $[\![\mathbf{\text{[isV four]}}]\!]^{}_{\left\langle{}n,t\right\rangle{}}$ * $[\![\mathbf{\text{[[plus two] two]}}]\!]^{}_{n}$ leads to: $[\![\mathbf{\text{[[isV four] [[plus two] two]]}}]\!]^{}_{t} \:=\: $${True}_{t}$ [by FA]

(Derivation for step 9 is $\Downarrow$)

In [120]:
((two * (plus * two)) * (isV * four))[0].content.derivation
Out[120]:
1.
${[\lambda{} y_{n} \: . \: ({4} = {y})]}({4})$
2.
$({4} = {4})$
Reduction
3.
${True}_{t}$
Equality

Toy example: arithmeticese¶

Ok, that was fun, but what's the point?

A few questions to be answered:

  • what exactly is happening in each of these jupyter cells?
  • what underlies the implementation here? Types, derivations, composition, metalanguage python objects, ...
  • what does it take to implement an analysis with off-the-shelf lambda notebook / python components?
  • what does it take to add something new?

Implementations: why bother?¶

  1. More details $\Rightarrow$ stronger analysis, better sense of what assumptions matter.
  2. Verifiability / replicability.
    • In principle: a digital fragment is a compact representation of all necessary assumptions, well suited as an appendix (a la Montague Grammar fragments of old).
  3. Interesting(?), strengthens useful skillsets
  4. Possible: Connections to computational semantics / natural language understanding (NLU)

Alternatives to implementing?¶

Perhaps worth noting: NLU/NLP is not currently in the business of doing anything like this!

  • Large, powerful language models: predict the next word, predict masked word. Seem to "know" something (arguably, a lot) to do with "meanings".
  • E.g. latest internet flavor, Dall-E/Dall-E mini, based on GPT-3 ($\Downarrow$)

Dall-E mini on doing semantics:

dall-e "linguists doing semantics"

Implementations of compositionality in NLU?¶

There was a brief window around 2010 where it seemed like mainstream NLP might take on the challenge I am developing here. This wave was crushed by language modeling.

A few high points (for me) if you want to follow up; see the Kornai Sunday bootcamp for a lot more on this topic:

  • Baroni, Marco & Roberto Zamparelli. 2010. Nouns are vectors, adjectives are matrices: Representing adjective-noun constructions in semantic space. EMNLP 2010, 1183–1193.
  • Mitchell, Jeff & Mirella Lapata. 2010. Composition in distributional models of semantics. Cognitive Science 34(8). 1388–1429.
  • Baroni, Marco, Raffaella Bernardi & Roberto Zamparelli. 2014. Frege in space: A program for compositional distributional semantics. Linguistic Issues in Language Technologies 9(6). 5–110.
  • Socher, Richard, Brody Huval, Christopher D. Manning & Andrew Y. Ng. 2012. Semantic compositionality through recursive matrix-vector spaces. EMNLP/CoNLL, 1201–1211.
  • Socher, Richard, Alex Perelygin, Jean Y. Wu, Jason Chuang, Christopher D. Manning, Andrew Y. Ng & Christopher Potts. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. EMNLP 2013, 1631–1642.

Can linguistics-style computational semantics meet up with current NLU?¶

Unfortunately, this is an open question.

Certain recent work has benefitted from the overall compositional architecture, mapping object-language to metalanguage. A few highlights that I hope to touch on on Friday:

  • Lake, Brenden & Marco Baroni. 2018. Generalization without systematicity: On the compositional skills of sequence-to-sequence recurrent networks. ICML 35, 2873–2882.
  • Kim, Najoung & Tal Linzen. 2020. COGS: A compositional generalization challenge based on semantic interpretation. EMNLP 2020, 9087–9105.
  • Soulos, Paul, R. Thomas McCoy, Tal Linzen & Paul Smolensky. 2020. Discovering the compositional structure of vector representations with role learning networks. BlackboxNLP 3, 238–254.

Basics: python and jupyter¶

Python¶

As noted earlier, this course won't introduce python. But, a quick cheat sheet if you're rusty:

  • code blocks in python are delimited by indentation
  • functions are typically defined by def fname(arguments): followed by an indented block. There's also a lambda args: returnval notation for writing "anonymous" functions.
  • Operators: the usual arithmetic operators; == for checking equality, = for assignment; and, or, not for boolean operations.
  • lists use [], e.g. ['a','b','c']; tuples use () (with at least one comma); sets and dicts (~ hash tables) both use {}, except that dicts map a key to a value for each element (e.g. {1: 'a', 2: 'b', 3: 'c'})
  • main flow control elements: if condition: ... elif condition: ... else:, for x in iter:, while condition:
  • comprehensions (especially list comprehensions) are well worth understanding, as are try: ... except: ... blocks; a quick illustration follows
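A tiny, purely illustrative snippet putting several of these pieces together (all names here are made up for the example):

# a named function; the indented block is its body
def square(x):
    return x * x

double = lambda x: 2 * x    # an anonymous function bound to a name

nums = [1, 2, 3, 4]                      # a list
squares = {n: square(n) for n in nums}   # a dict comprehension
evens = [n for n in nums if n % 2 == 0]  # a list comprehension

for n in nums:                           # flow control: for / if
    if n > 2 and square(n) == squares[n]:
        print(n, double(n))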

Jupyter: code, cells & cell outputs, magics, kernels¶

Jupyter: takes the idea of an interactive interpreter and goes to 11, combining it with techniques for integrating code and documentation.

  • The document unit for Jupyter is a notebook.
  • These slides are generated from a notebook, which you can download and play around with.
  • The main app for interacting with notebooks is JupyterLab. There's also the "classic" notebook editor, which is simpler.

Note: Google Colab (seen in Khalil Iskarous' course) is running a jupyter-based UI. However, it's (annoyingly) not compatible with lambda notebook frontend code. TBD.

Cells¶

Jupyter breaks code into units of "cells".

  • Cells have content (typically either code or markdown)
  • Cells have an output, which can be displayed.
  • The output of a code cell is determined by the value of its last expression. (If it's not None, it's displayed.)
In [121]:
x = "this is a string assigned to a variable x"
x
Out[121]:
'this is a string assigned to a variable x'
In [122]:
print("You can also use python `print`, but it doesn't determine the cell output")
2+2
You can also use python `print`, but it doesn't determine the cell output
Out[122]:
4

More cell outputs¶

The jupyter frontend can handle displaying many useful things beyond text, including graphical outputs, and this is why Jupyter is fairly popular in data science. It also can be controlled more directly, e.g. via the display function.

In [123]:
from IPython.display import HTML
import nltk
import svgling  # importing svgling enables SVG rendering of nltk trees in Jupyter

display(HTML("<b>A table drawn via an HTML object:</b><table><tr><td>1</td><td>2</td></tr><tr><td>3</td><td>4</td></tr><tr><td>5</td><td>6</td></tr></table><br /><b>A tree drawn via an SVG object:</b>"))
display(nltk.Tree.fromstring("(S NP (VP V NP))")) # nltk trees render to an svg object via `svgling`
A table drawn via an HTML object:
1   2
3   4
5   6

A tree drawn via an SVG object:
[rendered tree diagram for (S NP (VP V NP))]

Jupyter notebooks also support MathJax in markdown (including in markdown cell outputs). Aka latex math mode in the browser:

$\lambda x_e . \mathit{Cat}'(x)$ renders as:

$\lambda x_e . \mathit{Cat}'(x)$
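Rendered math can also be produced programmatically from a code cell; here's a minimal example using IPython's display machinery:

from IPython.display import Math, display

# Math takes a latex string (no $ delimiters) and renders it via MathJax
display(Math(r"\lambda x_e . \mathit{Cat}'(x)"))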

More advanced frontend things: injected javascript code, widgets / notebook extensions, ...

There's so much potential here for interesting interactive tools, teaching aids, etc...largely untapped in linguistics/logic.

Side note: linguistics wishlist¶

  • running numbered examples
  • better bibtex support (though things now exist)

Tree diagrams were on this list for a long time until I finally buckled down and wrote a solution myself - svgling.

Magics¶

If you see %% at the beginning of a cell, the cell is invoking a cell magic. Magics change how the cell's contents are processed (by default, as python code), in potentially drastic ways.

  • There is also % for line magics.

Lambda notebook uses various custom magics (return to this shortly):

In [124]:
%%lamb
||cat|| = lambda x_e : Cat_<e,t>(x)
$[\![\mathbf{\text{cat}}]\!]^{}_{\left\langle{}e,t\right\rangle{}} \:=\: $$\lambda{} x_{e} \: . \: {Cat}({x})$

Kernels¶

Behind the scenes, each jupyter notebook makes use of a kernel while running. This is the basic program that executes cells.

  • The default kernel is a python kernel.
  • The lambda notebook is primarily implemented as a kernel replacing this default.
  • It lets you run python code normally, but enables the various magics and support code that constitutes the lambda notebook.

The number one jupyter caveat¶

If there's one thing that trips people up in Jupyter, it's order of execution.

  • Code cells don't run until you run them.
  • State is shared across cells.
  • But the order in which state is updated is determined by the order in which cells are run.

Tips: be wary of running out of order. Make cells self-contained. Group imports. Pay attention to input numbers. Use "Run all above selected cell" (and other "Run all" commands). When in doubt, restart the kernel and run all.
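A minimal illustration of the pitfall (imagine these as three separate cells in a notebook):

# Cell A
x = 1

# Cell B
x = x + 1    # uses whatever value x has *right now*

# Cell C
print(x)     # 2 only if A then B each ran exactly once, in that order;
             # re-running B, or running C before B, changes the result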

The number two jupyter caveat¶

Jupyter is not at its best for developing library code that will be imported or called from elsewhere (e.g. from other notebooks).

Great for:

  • Just messing around in python
  • Prototyping
  • Documenting a package (especially one that supports jupyter/ipython reprs)
  • Documenting a process/analysis (data science, but also stuff like this class)
  • ...

That's Jupyter in a nutshell -- questions?


An architecture for implementation¶

Compositional semantics provides a clear starting point for how to approach implementing.

  • typed metalanguage, so implement types and the corresponding metalanguage
  • mapping: object language $\Rightarrow$ metalanguage
    • composition: recursive combination of meaning representations via composition operations
    • so, we need to figure out composition operations
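As a preview, here's a schematic pure-python version of that architecture (purely illustrative; this is not how the lambda notebook is actually written): a lexicon maps object-language words to metalanguage values, and a recursive interpretation function combines daughters by Function Application.

# toy lexicon: object-language words -> "metalanguage" values (plain python here)
lexicon = {
    "two": 2,
    "four": 4,
    "plus": lambda x: (lambda y: y + x),
    "is": lambda x: (lambda y: x == y),
}

def interpret(node):
    """Interpret a (label, daughters...) tuple tree bottom-up, combining by FA."""
    if isinstance(node, str):                  # a word: look it up
        return lexicon[node]
    daughters = [interpret(d) for d in node[1:]]
    if len(daughters) == 1:                    # non-branching node
        return daughters[0]
    left, right = daughters                    # binary branching assumed
    return left(right) if callable(left) else right(left)

# "two plus two is four", with a toy constituent structure:
interpret(("S", ("NP", ("NP", "two"), ("PP", ("P", "plus"), ("NP", "two"))),
                ("VP", ("V", "is"), ("NP", "four"))))    # -> True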

Lambda notebook¶

The lambda notebook has 4 parts:

  1. code to implement metalanguage representations and tools for manipulating them
  2. code to implement mappings from a structured object language to the metalanguage, aka composition systems
  3. a type system, used in both the object and metalanguage
  4. frontend code to display objects in 1-3 using Jupyter output code.

Metalanguage¶

The metalanguage is an extension of a relatively straightforward first-order logic + lambda calculus.

  • This is a programmer's metalanguage, not a logician's...
  • Various extensions: sets, tuples, numbers, various non-standard operators, partial definedness, ...
  • No real model theory or inference, which could be a big gap depending on your goals. (E.g. this isn't an attempt to reinvent prolog, or write my own solver.)
  • Metalanguage expressions can be parsed via the %te line magic.
In [125]:
%te L f_<e,t>: L g_<e,t>: L x_e: f(x) & g(x)
Out[125]:
$\lambda{} f_{\left\langle{}e,t\right\rangle{}} \: . \: \lambda{} g_{\left\langle{}e,t\right\rangle{}} \: . \: \lambda{} x_{e} \: . \: ({f}({x}) \wedge{} {g}({x}))$

Composition systems¶

A composition system is a set of composition operations together with a bunch of code to do something with them.

  • Systems either work via bottom up composition, or operate on tree objects.
  • When composable objects are combined bottom-up via the python * operator, the composition system looks for a successful combination.
  • Here's the default system:
In [126]:
lamb.lang.get_system()
Out[126]:
Composition system 'H&K simple'
Operations: {
    Binary composition rule FA, built on python function 'lamb.lang.fa_fun'
    Binary composition rule PM, built on python function 'lamb.lang.pm_fun'
    Binary composition rule PA, built on python function 'lamb.lang.pa_fun'
    Binary composition rule VAC, built on python function 'lamb.lang.binary_vacuous'
}
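Schematically, in plain python (this is not the lambda notebook's actual internals), a bottom-up composition step tries each operation in the system and keeps whatever succeeds:

class CompositionFailure(Exception):
    pass

def fa(m1, m2):
    """Function Application, tried in both orders."""
    if callable(m1):
        return m1(m2)
    if callable(m2):
        return m2(m1)
    raise CompositionFailure("FA: neither daughter is a function")

def compose(m1, m2, operations=(fa,)):
    """Try every operation on the pair; collect each successful result."""
    results = []
    for op in operations:
        try:
            results.append((op.__name__, op(m1, m2)))
        except CompositionFailure:
            continue
    return results

compose(lambda y: y + 2, 2)    # -> [('fa', 4)]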

Types¶

The type system has:

  • Some number of elementary types. Defaults: $e, n, t$.
  • Various complex "type constructors", most important being the standard functional type (notated via "$\langle, \rangle$"). Type constructors for sets, tuples.
  • Polymorphic types, aka type variables.
  • Code that implements polymorphic "type unification".
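As a rough illustration of the role the type system plays (again purely schematic, not the actual type classes): elementary types can be modeled as atoms and functional types as pairs, with Function Application licensed only when the function's input type matches the argument's type.

# elementary types as strings, functional types <a,b> as pairs (a, b)
E, N, T = "e", "n", "t"

def fun(left, right):
    """The functional type <left, right>."""
    return (left, right)

def fa_type(f_type, a_type):
    """If f_type is <a_type, b>, FA yields b; otherwise fail (return None)."""
    if isinstance(f_type, tuple) and f_type[0] == a_type:
        return f_type[1]
    return None

fa_type(fun(N, fun(N, T)), N)    # -> ('n', 't'), i.e. <n,t>
fa_type(fun(E, T), N)            # -> None: type mismatch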

Recap so far¶

Goal: provide a means of implementing, in python, key elements of Linguistically-focused Compositional Semantics

  • Questions raised: what is an implementation, why could it be valuable, and how could it meet up with concerns in modern NLP (if it does)?
  • Main tools beyond just "python": Jupyter notebooks, the Jupyter Lambda Notebook project
  • Quick tutorial of Jupyter
  • Architecture for implementation, derived directly from compositional semantics

Exercises¶

  1. Install Jupyter and the Lambda Notebook
  2. Step through this notebook and figure out how to run cells, etc
  3. Play around with the following pure python implementation of arithmeticese. Try constructing some new computations. How is it similar to, or different from, what you expect from an implementation?
In [127]:
def isV(x):
    return lambda y: x == y

def plus(x):
    return lambda y: y + x

def two():
    return 2

def four():
    return 4

isV(plus(two())(two()))(four())
Out[127]:
True