so much depends / upon / a whitespace: Why Whitespace Matters for Poets and LLMs
Published in EMNLP 2025
Recommended citation: Bhyravajjula, S., Walsh, M., Preus, A., & Antoniak, M. (2025). "so much depends / upon / a whitespace: Why Whitespace Matters for Poets and LLMs." EMNLP 2025. https://aclanthology.org/2025.emnlp-main.1783/
Analyzed whitespace as a semantic and spatial element of poetry, drawing on 19k Poetry Foundation poems by 4k poets, and released 2.8k public-domain texts with their original formatting preserved. Created WISP, a novel whitespace typology, and released WISP-Bench, a benchmark that evaluates how well image-to-text linearization methods handle whitespace. The work highlights how dataset preparation affects NLP and LLM tokenization design strategies.
Authors: Sriharsh Bhyravajjula, Melanie Walsh, Anna Preus, Maria Antoniak
Venue: EMNLP 2025 (Main), Suzhou, China
