so much depends / upon / a whitespace: Why Whitespace Matters for Poets and LLMs

Published in EMNLP 2025

Recommended citation: Bhyravajjula, S., Walsh, M., Preus, A., & Antoniak, M. (2025). "so much depends / upon / a whitespace: Why Whitespace Matters for Poets and LLMs." EMNLP 2025. https://aclanthology.org/2025.emnlp-main.1783/

Analyzed whitespace as a semantic and spatial element of poetry using 19k Poetry Foundation poems by 4k poets, and released 2.8k public-domain poems with their original formatting preserved. Created WISP, a novel typology of whitespace usage, and released WISP-Bench, a benchmark for evaluating how well linearization methods, such as image-to-text conversion, preserve whitespace. The work highlights how dataset preparation choices affect whitespace representation, with implications for NLP pipelines and LLM tokenization design.
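To make the dataset-preparation point concrete, here is a minimal Python sketch, assuming a toy poem and a few coarse whitespace features chosen for illustration (blank lines, indented lines, internal runs of spaces, line breaks). The names `whitespace_profile`, `preserved_fraction`, `naive_normalize`, and `strip_indent` are hypothetical. This is not the WISP typology or the WISP-Bench scoring; it only shows how a common text-cleaning step can erase a poem's spacing.

```python
import re
from collections import Counter


def whitespace_profile(text: str) -> Counter:
    """Count coarse whitespace features (hypothetical, NOT the WISP typology)."""
    profile = Counter()
    for line in text.split("\n"):
        if not line.strip():
            profile["blank_lines"] += 1
            continue
        # Leading spaces before the first visible character.
        if len(line) - len(line.lstrip(" ")) > 0:
            profile["indented_lines"] += 1
        # Runs of 2+ spaces sitting between two visible characters.
        profile["internal_gaps"] += len(re.findall(r"(?<=\S) {2,}(?=\S)", line))
    profile["line_breaks"] = text.count("\n")
    return profile


def preserved_fraction(original: str, processed: str) -> float:
    """Share of the original's whitespace feature counts that survive processing."""
    orig, proc = whitespace_profile(original), whitespace_profile(processed)
    total = sum(orig.values())
    if total == 0:
        return 1.0
    kept = sum(min(count, proc[name]) for name, count in orig.items())
    return kept / total


def naive_normalize(text: str) -> str:
    # Hypothetical cleaning step: collapse every whitespace run to one space.
    return " ".join(text.split())


def strip_indent(text: str) -> str:
    # Hypothetical cleaning step: keep line breaks, drop leading indentation.
    return "\n".join(line.lstrip() for line in text.split("\n"))


# Toy poem (invented for illustration): an indent, a stanza break, a wide gap.
poem = "so much depends\n    upon\n\na white    space"

print(preserved_fraction(poem, poem))                   # 1.0
print(preserved_fraction(poem, naive_normalize(poem)))  # 0.0
print(preserved_fraction(poem, strip_indent(poem)))     # 0.8333333333333334
```

On this toy poem, collapsing all whitespace keeps none of the counted features, while stripping only leading indentation keeps 5 of 6, which is the kind of gap between processing pipelines that a whitespace-aware benchmark is meant to surface.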


Authors: Sriharsh Bhyravajjula, Melanie Walsh, Anna Preus, Maria Antoniak

Venue: EMNLP 2025 (Main), Suzhou, China
