Corpus of Paraguayan Spanish and Guarani

Languages and Applied Linguistics

WHO: Josefina Bittar, Madeleine Powell.

WHAT: Madeleine helped build and publish a database of interviews (a corpus) in Paraguayan Spanish and Guarani. She corrected the transcriptions and anonymized the audio. Madeleine is currently working with the manager of the California Language Archive to upload the audio files and transcriptions into the Archive.

WHY: A linguistic corpus is a collection of language production items, such as written texts and video and audio recordings, that document language usage in a specific period or periods of time by a particular community or communities and that generally feature language users with diverse characteristics (age, sex, place of birth, linguistic profile). The Corpus of Paraguayan Spanish and Guarani of Asunción (CEGPA) is a collection of fourteen-hour-long audio interviews with people aged 18 to 91 living in Paraguay's capital. Paraguay is known for its widespread bilingualism in Spanish and Guarani, one of the country's native languages. Making the collection available to the academic community through the California Language Archive platform will allow researchers worldwide to study the characteristics of Paraguayan Spanish and Paraguayan Guarani, as well as the code-switching between the two. Exploring dialect and language features, especially among bilingual populations, offers invaluable insight into how languages change when in contact with other languages and how multilingual people use their entire linguistic repertoire to express ideas and form identities. The CEGPA was carefully transcribed by UCSC alum Valentín Barbosa and Guarani expert Antonio Adrián Zena Mereles. Assistant Professor Josefina Bittar, UCSC alumna Erika Garcia Aceves, and current UCSC senior Madeleine Powell edited it. The recordings were collected in 2015 in Asunción, Paraguay, by Josefina Bittar, with community leader and docent Prudencio Israel Pedrozo Candia participating as an interviewer. The anonymous voices of the Corpus are all residents of Asunción who volunteered their time and shared their incredible life stories and points of view during the interviews. Assistant Professor Josefina Bittar hopes this Corpus will promote and increase linguistic research on her country of origin, Paraguay.

WHAT'S NEXT: The Corpus is expected to be published in June of this year! Stay tuned!

THE WOW: “Throughout the process, I was able to learn more about the language landscape of Asunción, Paraguay, and I’ve learned a lot about the amount of work that goes into creating a resource as extensive as a linguistic corpus,” Madeleine said. “In working through the data, I developed strategies that helped me focus on the smaller details in the transcriptions to ensure each participant in the corpus has their personally identifying information removed from the data. I’ve developed time management and task delegation skills, and I’ve truly enjoyed working with Professor Bittar. I am so glad to have been able to contribute to a resource promoting future studies of bilingualism.”

Madeleine Powell and Josefina Bittar work on the final edits of one of the interviews featured in the Corpus of Paraguayan Spanish and Guarani of Asunción (CEGPA).