Experience

July 2021 – Present
Madrid, Spain

Research Staff

NLP & IR group, UNED

PhD student in the Natural Language Processing and Information Retrieval group of the School of Computer Science at UNED.
 
June 2020 – June 2021

Research Programmer

Information Sciences Institute, University of Southern California

Programmer in Natural Language Processing at the USC ISI Center for Vision, Image, Speech and Text Analytics (VISTA).
 
June 2019 – December 2019
Massachusetts, USA

Research Data Analyst

McLean Hospital, Harvard Medical School

Applied machine learning and NLP techniques to electronic health records to predict early readmission risk.
 
June 2017 – June 2018
Madrid, Spain

Research Assistant

Digital Humanities Lab, UNED

Research Assistant at the Digital Humanities Lab of the School of Computer Science at UNED.
 
December 2014 – December 2015
Madrid, Spain

Linguistic Data Analyst

Fundación del Español Urgente

Conducted a corpus linguistics project on the evolution of the Spanish language in the media during the 20th century.
 
October 2010 – January 2016
Madrid, Spain

Analytical Linguist

Molino de Ideas

Developed, annotated and evaluated linguistic resources for the Spanish language.

Talks & media appearances


Radio interview at Noosfera on Linguistics and Computational Linguistics

Talk on Observatorio Lázaro at the Trabalengua 2021 conference

Radio interview on Un idioma sin fronteras at Spanish National Radio

Conversation on Twitch with Lengwitch about linguistics communication on the internet

Interview for the radio program La Tarde at COPE with Pilar Cisneros

Observatorio Lázaro featured on the radio program Julia en la Onda at Onda Cero

Article by Álex Grijelmo for El País about Observatorio Lázaro

Publications


(2022). Borrowing or Codeswitching? Annotating for Finer-Grained Distinctions in Language Mixing. Accepted at the 13th Conference on Language Resources and Evaluation (LREC 2022).


(2022). Detecting Unassimilated Borrowings in Spanish: An Annotated Corpus and Approaches to Modeling. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022): Long Papers.


(2021). Overview of ADoBo 2021: Automatic Detection of Unassimilated Borrowings in the Spanish Press. Procesamiento del Lenguaje Natural (SEPLN 2021): Vol. 67, pp. 277-285.


(2021). Extracting English Lexical Borrowings from Spanish Newswire. Proceedings of the Society for Computation in Linguistics (SCiL 2021): Vol. 4, Article 41.


(2020). Two models for Named Entity Recognition in Spanish: submission for the CAPITEL Shared Task at IberLEF 2020. Proceedings of the Iberian Languages Evaluation Forum.


(2020). Lázaro: An Extractor of Emergent Anglicisms in Spanish Newswire. Computational Linguistics MS thesis, Department of Computer Science, Brandeis University.


Projects

pylazaro

A Python library that automatically detects lexical borrowings (or loanwords) in Spanish.

COALAS 🐨

COrpus of AngLicisms in the SpAnish PresS. With Constantine Lignos.

Observatorio Lázaro

An observatory of anglicism usage in the Spanish press.

@LazaroBot

A Twitter bot that tweets new anglicisms found in the Spanish press.

ADoBo

A shared task on automatic detection of borrowings at IberLEF 2021. Organized with Luis Espinosa Anke, Julio Gonzalo, Constantine Lignos and Jordi Porta.

Caravaggio

A PyTorch model that classifies Spanish text as being easy to read (plain language) or not.

Morssa

A scraper for extracting the text of news articles via RSS.

Corpus of political speeches

Analysis and visualizations in Python of a corpus of Spanish political speeches from 1937 to 2019.

NER4Podcasts

Named Entity Recognition for podcast transcripts. With Julian Fernandez, Kristen Sheets and Linxuan Yang.

Figurative language classification

A project on the annotation and classification of non-literal tweets. With Qingwen Ye and Julia Cathcart.

Subtitles Corpus

A corpus of Spanish subtitles from LOTR, Star Wars, OITNB, GoT, HIMYM, etc.

Aracne

A corpus linguistics project supported by Fundeu on the evolution of the Spanish language in the media during the 20th century. With Leticia Martín-Fuertes and Molino de Ideas.

AZRAEL

A rule-based automatic language detector based on the syllable structure of words. Currently supported languages: Spanish, French, Italian, Portuguese, Catalan, Latin and Basque.

Writing & dissemination

I occasionally write a column about language for the Spanish newspaper elDiario.es. In 2017 the column received the Miguel Delibes National Journalism Award (Premio Nacional de Periodismo Miguel Delibes) for an article about conceptual metaphor and cancer (Metáforas peligrosas. El cáncer como lucha).

I also write for Archiletras, a popular linguistics magazine, where I am also a member of the editorial board.

In 2016 I wrote the popular linguistics book Anatomía de la Lengua.

From 2012 to 2015 I was a contributor at Spanish National Radio (RNE), with a weekly section about language and linguistics.

Some of my personal writing can be read on my old blog (in Spanish).


Honors & awards

Honorary Member of Asetrad (Socia de honor de Asetrad)

Asociación Española de Traductores, Correctores e Intérpretes

Adam Kilgarriff Prize

Adam Kilgarriff Endowment Fund

Premio Archiletras de la Lengua de investigación

Revista Archiletras

Generation Google Scholarship for Women in Computer Science

Google

Premio HDH 2021 (Hispanic Digital Humanities Award)

Asociación de Humanidades Digitales Hispánicas

Outstanding Corpus Thesis Award (MS level)

Institute for Corpus Research, Incheon National University

Jun 2020

Karen Spärck Jones Award for Outstanding Achievement in Natural Language Processing

Brandeis University

Premio Nacional de Periodismo Miguel Delibes

Asociación de Prensa de Valladolid

Jul 2017

LaCaixa Scholarship for graduate studies in the US

LaCaixa Foundation

Nov 2009

First award of the Arquímedes National Contest for Young Researchers

Ministry of Science and Education of Spain

Contact