Computational notebooks (hereafter simply "notebooks") provide a popular environment for literate programming, especially in educational contexts. This also applies to Natural Language Processing (NLP). This page gives a first overview of the use of notebooks in connection with the CLARIN infrastructure.
Notebooks Provided by CLARIN Centres
The following notebooks are provided by CLARIN centres:
- LINDAT/CLARIAH-CZ:
- SAFMORIL:
- ILC4CLARIN:
- PORTULAN CLARIN:
  - Tokenization: Segmentation of texts into lexical tokens.
  - Syllabification: Syllabification of expressions.
  - Sentence splitting: Segmentation of texts into sentences and paragraphs.
  - UPOS tagging: Tokenization and morphosyntactic tagging of expressions in texts with the Universal Dependencies POS tagset.
  - LXPOS tagging: Tokenization and morphosyntactic tagging of expressions in texts with the LX POS tagset.
  - UÉvoraPOS tagging: Tokenization and morphosyntactic tagging of expressions in texts with the UÉvora POS tagset.
  - Universal Sub-syntactic analysis: Tokenization, lemmatization, inflection analysis and morphosyntactic tagging of expressions in texts within the Universal Dependencies framework.
  - LX Sub-syntactic analysis: Tokenization, lemmatization, inflection analysis and morphosyntactic tagging of expressions in texts within the LX framework.
  - Named entity recognition: Detection and semantic classification of names in texts.
  - Universal Dependency parsing: Analysis of grammatical functions in sentences within the Universal Dependencies framework.
  - LX Dependency parsing: Analysis of grammatical functions in sentences within the LX framework.
  - Constituency parsing: Analysis of syntactic constituents in sentences.
  - Grammatical quantitative analysis: Counting occurrences of grammatical elements in texts.
  - Machine Translation: Translation of a sentence from a source language into a target language (Portuguese-Chinese).
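The PORTULAN CLARIN notebook code is not reproduced on this page. As a rough illustration of what a grammatical quantitative analysis can look like on top of Universal Dependencies output, the following minimal Python sketch counts UPOS tags in text annotated in the CoNLL-U format (UPOS is the fourth of its ten tab-separated columns). It assumes the annotations are already available as a CoNLL-U string; the function names `count_upos` and `token` and the two sample sentences are hypothetical and are not taken from the PORTULAN CLARIN notebooks.

```python
from collections import Counter


def count_upos(conllu_text: str) -> Counter:
    """Count UPOS tags (column 4) in a CoNLL-U string.

    Hypothetical helper for illustration; not the PORTULAN CLARIN notebook code.
    """
    counts = Counter()
    for line in conllu_text.splitlines():
        # Skip blank lines (sentence boundaries) and comments (# text = ..., # sent_id = ...)
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Skip multiword-token ranges (e.g. "3-4") and empty nodes (e.g. "8.1"):
        # only ordinary syntactic words carry a UPOS tag worth counting
        if "-" in cols[0] or "." in cols[0]:
            continue
        counts[cols[3]] += 1
    return counts


def token(idx, form, lemma, upos, head, deprel):
    # Build one 10-column CoNLL-U line; columns not needed here are "_"
    return "\t".join([idx, form, lemma, upos, "_", "_", head, deprel, "_", "_"])


# Hypothetical hand-annotated Portuguese sample
sample = "\n".join([
    "# text = O gato dorme.",
    token("1", "O", "o", "DET", "2", "det"),
    token("2", "gato", "gato", "NOUN", "3", "nsubj"),
    token("3", "dorme", "dormir", "VERB", "0", "root"),
    token("4", ".", ".", "PUNCT", "3", "punct"),
    "",
    "# text = Ele vive no Porto.",
    token("1", "Ele", "ele", "PRON", "2", "nsubj"),
    token("2", "vive", "viver", "VERB", "0", "root"),
    "\t".join(["3-4", "no"] + ["_"] * 8),  # multiword token: "no" = "em" + "o"
    token("3", "em", "em", "ADP", "5", "case"),
    token("4", "o", "o", "DET", "5", "det"),
    token("5", "Porto", "Porto", "PROPN", "2", "obl"),
    token("6", ".", ".", "PUNCT", "2", "punct"),
])

if __name__ == "__main__":
    print(sorted(count_upos(sample).items()))
    # → [('ADP', 1), ('DET', 2), ('NOUN', 1), ('PRON', 1), ('PROPN', 1), ('PUNCT', 2), ('VERB', 2)]
```

Skipping the multiword-token line means the contraction "no" is counted once as ADP ("em") and once as DET ("o"), which follows the Universal Dependencies convention of annotating syntactic words rather than surface tokens.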
Notebooks for Processing Europeana Newspaper Collections
Jupyter notebooks for processing Europeana newspaper text resources with CLARIN NLP tools:
- SSHOC Marketplace entry about this tutorial
- Notebooks on github
- Introductory screencast presenting the resources and the notebooks
Tutorial slides:
- Candela, G., Chambers, S., & Sherratt, T. (2023). An approach to assess the quality of Jupyter projects published by GLAM institutions. Journal of the Association for Information Science and Technology, 1–15. https://doi.org/10.1002/asi.24835