June 2020 – Present

Research Programmer

Information Sciences Institute, University of Southern California

Programmer in Natural Language Processing at the USC ISI Center for Vision, Image, Speech and Text Analytics (VISTA).
June 2019 – December 2019
Massachusetts, USA

Research Data Analyst

McLean Hospital, Harvard Medical School

Applied machine learning and NLP techniques to electronic health records to predict early readmission risk.
June 2017 – June 2018
Madrid, Spain

Research Assistant

Universidad Nacional de Educación a Distancia (UNED)

Research Assistant at the Digital Humanities Lab of the School of Computer Science at UNED.
December 2014 – December 2015
Madrid, Spain

Linguistic Data Analyst

Fundación del Español Urgente

Conducted a corpus linguistics project on the evolution of the Spanish language in the media during the 20th century.
October 2010 – January 2016
Madrid, Spain

Analytical Linguist

Molino de Ideas

Developed, annotated and evaluated linguistic resources for the Spanish language.



A shared task on automatic detection of borrowings at IberLEF 2021. Organized with Luis Espinosa Anke, Julio Gonzalo, Constantine Lignos and Jordi Porta.


A PyTorch model that classifies Spanish text as being easy to read (plain language) or not.


A scraper for extracting the text of news articles via RSS.

Observatorio Lázaro

An observatory of anglicism usage in the Spanish press.

Corpus of political speeches

Analysis and visualizations in Python of a corpus of Spanish political speeches from 1937 to 2019.


Named Entity Recognition for podcast transcripts. With Julian Fernandez, Kristen Sheets and Linxuan Yang.

Figurative language classification

A project on the annotation and classification of non-literal tweets. With Qingwen Ye and Julia Cathcart.

Subtitles Corpus

A corpus of Spanish subtitles from LOTR, Star Wars, OITNB, GoT, HIMYM, etc.


A corpus linguistics project supported by Fundeu on the evolution of the Spanish language in the media during the 20th century. With Leticia Martín-Fuertes and Molino de Ideas.


A rule-based automatic language detector based on the syllable structure of words. Currently supported languages: Spanish, French, Italian, Portuguese, Catalan, Latin and Basque.


More Publications

Proceedings of the Society for Computation in Linguistics (SCiL 2021): Vol. 4, Article 41.

Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2020), at SEPLN 2020.

Computational Linguistics MS thesis, Department of Computer Science, Brandeis University

Proceedings of the 4th Workshop on Computational Approaches to Code Switching at LREC 2020.

Proceedings of the 12th Language Resources and Evaluation Conference (LREC 2020).

Proceedings of the 10th International Workshop on Health Text Mining and Information Analysis at EMNLP 2019.

Writing & dissemination

I occasionally write a column about language for the Spanish newspaper eldiario.es. In 2017 the column received the Miguel Delibes National Journalism Award (Premio Nacional de Periodismo Miguel Delibes) for an article about conceptual metaphor and cancer (Metáforas peligrosas. El cáncer como lucha).

I also write for Archiletras, a pop linguistics magazine where I am also a member of the editorial board.

In 2016 I wrote the pop linguistics book Anatomía de la Lengua.

From 2012 to 2015 I contributed a weekly segment about language and linguistics to Spanish National Radio (RNE).

Some of my personal writing can be read on my old blog (in Spanish).

These are the columns and other journalistic contributions I have written so far:

Talks & media appearances

Article by Álex Grijelmo for El País about Observatorio Lázaro

Interview for linguistic podcast Con la lengua fuera

Interview for Universidad Nacional de Educación a Distancia (UNED)

Acceptance speech for Premio Nacional de Periodismo Miguel Delibes 2017.

Interview for the Valladolid Press Association (Asociación de Prensa de Valladolid) for the Miguel Delibes Journalism Award.

Honors & awards

Sep 2020

Outstanding Corpus Thesis Award (MS level)

Institute for Corpus Research, Incheon National University

Jun 2020

Karen Spärck Jones Award for Outstanding Achievement in Natural Language Processing

Brandeis University

Premio Nacional de Periodismo Miguel Delibes

Asociación de Prensa de Valladolid

Jul 2017

LaCaixa Scholarship for graduate studies in the US

LaCaixa Foundation

Nov 2009

First prize in the Arquímedes National Contest for Young Researchers

Ministry of Science and Education of Spain