2024-11-11

Workshop "Data Structures and Cleaning Textual Data"

Yann Audin (PhD candidate in digital humanities at the Université de Montréal and project leader at the Canada Research Chair in Digital Textualities) will lead a series of three workshops on natural language processing.

The third workshop, "Data Structures and Cleaning Textual Data", is aimed at people with some knowledge of Python who want to learn how to clean textual data and work with the JSON, CSV and XML data formats. Participants will analyze a literary text of their choice using the Python libraries spaCy and NLTK. They will also learn how to turn a text into structured textual data suited to their research interests.
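To give a rough idea of the kind of task involved, here is a minimal sketch of cleaning a short passage and writing the result to JSON, CSV and XML. It is an illustration only, not the workshop's actual material: the function names (`clean_text`, `to_json`, `to_csv`, `to_xml`), the sample sentence and the output layouts are all hypothetical, and it relies on the Python standard library rather than spaCy or NLTK so that it runs without any installation.

```python
import csv
import io
import json
import re
import xml.etree.ElementTree as ET


def clean_text(raw: str) -> list[str]:
    """Lowercase the text, replace punctuation with spaces, and split into tokens."""
    lowered = raw.lower()
    # Keep word characters (accented letters included), whitespace and apostrophes.
    without_punctuation = re.sub(r"[^\w\s']", " ", lowered)
    return without_punctuation.split()


def to_json(tokens: list[str]) -> str:
    """Serialize the tokens as a JSON object (hypothetical layout)."""
    return json.dumps({"tokens": tokens}, ensure_ascii=False)


def to_csv(tokens: list[str]) -> str:
    """Serialize the tokens as CSV, one row per token with its position."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["position", "token"])
    for position, token in enumerate(tokens):
        writer.writerow([position, token])
    return buffer.getvalue()


def to_xml(tokens: list[str]) -> str:
    """Serialize the tokens as XML, one <w> element per token (hypothetical schema)."""
    root = ET.Element("text")
    for position, token in enumerate(tokens):
        word = ET.SubElement(root, "w", n=str(position))
        word.text = token
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = "Call me Ishmael. Some years ago..."
    tokens = clean_text(sample)
    print(to_json(tokens))
    print(to_csv(tokens))
    print(to_xml(tokens))
```

The same list of tokens ends up in three shapes: a JSON object, a table with one row per token, and a tree of elements. Which one is most convenient depends on the research question and on the tools used afterwards.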

Python is used in natural language processing, programming education, artificial intelligence, scientific computing, web development and many other fields. As a high-level language, it is particularly readable, which contributes to its popularity. Python is distributed under a very permissive license and is supported by a large, active community of practice that develops libraries for almost any situation.

This workshop will take place on November 11, 2024, from 10:30 a.m. to noon at CRIHN, room C-8132, 3150 rue Jean Brillant, Université de Montréal.

Installing a recent version of Anaconda beforehand is recommended but not required.
