PARLIAMENTARY DATA INFRASTRUCTURES FOR CLIMATE DISCOURSE ANALYSIS: CHALLENGES OF METADATA, TEI, AND CROSS-LINGUAL MODELLING

PARLIAMENTARY DATA INFRASTRUCTURES FOR CLIMATE
DISCOURSE ANALYSIS: CHALLENGES OF METADATA, TEI, AND
CROSS-LINGUAL MODELLING

Davit Sidamonidze Researcher / Caucasus University, Tbilisi, Georgia
Nana Deisadze Researcher / Tbilisi State University, Tbilisi, Georgia

Historical parliamentary data have become central resources for computational
humanities, political text analysis, and digital infrastructure research. Yet while growing attention has been given to the substantive analysis of parliamentary
discourse, less work has focused on the infrastructural conditions that make
such analysis possible, particularly in multilingual and historically
heterogeneous corpora. This paper addresses that gap by examining the
methodological and infrastructural challenges involved in building multilingual
parliamentary data infrastructures for climate discourse analysis.
Drawing on European parliamentary corpora, including EuroParl, ParlaMint,
siParl, and national legislative repositories, the paper investigates how
metadata quality, TEI encoding standards, interoperability frameworks, and
cross-lingual modelling shape the possibilities and limitations of computational
analysis. Rather than treating corpora as neutral data sources, the paper
conceptualises them as socio-technical infrastructures whose design decisions
influence research outcomes. Particular attention is given to challenges of
incomplete metadata, OCR noise in historical records, speaker identification,
diachronic harmonisation, multilingual semantic alignment, and topic
comparability across languages.
Methodologically, the paper proposes an integrative infrastructure-oriented
workflow combining TEI-based corpus harmonisation, metadata normalisation,
multilingual embeddings, and structural topic modelling. It argues that
infrastructure-building is itself a form of scholarly intervention central to digital
humanities and computational social science. By positioning parliamentary
corpora not simply as datasets but as evolving knowledge infrastructures, the
paper contributes to current debates on interoperability, reproducibility, and
the politics of digital research infrastructures.
• 1 Introduction
The rapid growth of digitised parliamentary records has transformed the
possibilities for analysing political language historically and comparatively. Large
scale legislative corpora now support research across political science,
computational linguistics, digital humanities, and corpus studies, enabling scholars
to trace ideological change, rhetorical strategies, institutional practices, and policy
narratives at unprecedented scale (Proksch & Slapin, 2015; Erjavec et al., 2023).
Yet such analytical possibilities depend fundamentally on infrastructure.
Parliamentary discourse does not become computationally analyzable merely
through digitisation. It requires structured metadata, interoperable markup,
multilingual alignment, robust annotation standards, and reproducible workflows.
These infrastructural dimensions remain surprisingly underexamined relative to the growing body of substantive research using parliamentary corpora.
This is particularly true for climate discourse analysis. Research on climate politics has increasingly recognised discourse as constitutive of governance (Hajer, 1995; Dryzek, 2013), and parliamentary debates offer crucial sites where competing visions of transition, justice, sovereignty, and energy futures are articulated. However, studying these narratives comparatively across languages and over long historical periods raises difficult methodological problems.
Parliamentary corpora are heterogeneous. Historical records contain OCR errors,
inconsistent metadata, changing institutional vocabularies, incomplete speaker
attributes, and divergent markup practices. Cross-national corpora introduce
additional problems of semantic equivalence, translation asymmetry, and
incompatible annotation conventions. These are not simply technical
inconveniences; they shape what can be interpreted computationally.
This paper therefore shifts focus from analysing climate discourse itself toward the infrastructures that make such analysis possible. It asks:
1. What infrastructural challenges arise when building multilingual
parliamentary datasets for climate discourse analysis?
2. How do metadata, TEI standards, and interoperability frameworks affect
cross-lingual comparability?
3. What methodological strategies can support robust multilingual topic and
semantic modelling across heterogeneous parliamentary sources?
The paper contributes in three ways.
First, it advances digital humanities discussions by treating data infrastructures as
objects of analysis rather than invisible technical background.
Second, it contributes methodologically by proposing a reproducible infrastructure
oriented workflow integrating TEI encoding, metadata harmonisation, structural
topic models, and multilingual embeddings.
Third, it speaks directly to the concerns of historical parliamentary data scholarship, particularly debates around interoperability, linked data, and sustainable corpus infrastructures.
• 2 Parliamentary Data as Infrastructure
2.1 From corpus as dataset to corpus as infrastructure
Digital humanities has increasingly moved beyond treating data as neutral empirical material and toward recognising data production itself as epistemologically significant. Bowker and Star’s classic work on classification systems showed that standards and infrastructures shape knowledge production through what they make visible and invisible. This insight applies strongly to parliamentary corpora. Parliamentary datasets are not merely repositories of speeches. They are layered infrastructures built through archival selection, digitisation, transcription, markup, metadata modelling, and interface design. Each layer introduces assumptions that affect downstream analysis.
A speech record, for example, may include speaker identity, party affiliation, date,
chamber, agenda item, intervention type, and legislative context. But such
metadata are often incomplete, inconsistent across periods, or differently modelled across countries.
A missing party label is not simply absent data; it constrains ideology analysis. A
non-standard date format impedes diachronic modelling. Inconsistent speaker
identifiers fragment parliamentary careers. Thus infrastructure is inseparable from interpretation.
2.2 TEI and parliamentary encoding standards
The Text Encoding Initiative (TEI Consortium, 2023) has become central for
representing complex textual structures in digital scholarship. For parliamentary
corpora, TEI supports interoperable encoding of speeches, speakers, procedural
structures, interruptions, and metadata relationships.
ParlaMint represents a major advance in this regard. Its TEI architecture enables
harmonised representation across legislatures while preserving national variation
(Erjavec et al., 2023).
Core advantages include:
• standardised utterance segmentation
• speaker metadata integration
• interoperability across corpora
• machine-readable XML structures
• compatibility with CLARIN workflows
Yet TEI does not eliminate interpretive or technical tensions.
Granularity choices matter. Should interruptions be separate speech events?
Should committee proceedings and plenary debates be equally represented? How should historical orthographic variation be encoded?
Encoding is never neutral.
2.3 Research infrastructures and interoperability
European infrastructures such as CLARIN ERIC (2023) and DARIAH have increasingly positioned interoperability as a central principle for reusable linguistic data.
For parliamentary data this means:
• shared metadata schemas
• persistent identifiers
• linked data compatibility
• reusable annotation standards
• FAIR-compliant workflows (Wilkinson et al., 2016)
These infrastructures are especially significant for multilingual climate discourse
research, where corpora must be comparable across institutional and linguistic
boundaries. But interoperability is often aspirational rather than fully achieved.
National archives still vary significantly in:
• metadata completeness
• markup quality
• licensing conditions
• OCR reliability
• historical coverage
These inconsistencies create uneven analytical conditions.
• 3 Metadata Challenges in Historical Parliamentary Data
3.1 Metadata heterogeneity
Metadata inconsistency is among the most significant obstacles in comparative
parliamentary analysis.
Climate discourse analysis often requires combining:
• speaker-level metadata
• institutional metadata
• temporal metadata
• policy metadata
• linguistic annotations
Yet historical corpora frequently exhibit missing or unstable values.
Party systems change. Committee names change. Ministerial roles shift.
Constituency boundaries are redrawn. Political categories evolve.
This creates what might be called diachronic metadata drift.
For example, coding “Green” party affiliation across thirty years may involve
multiple party names, mergers, or coalition structures. Without harmonisation,
comparative modelling becomes unreliable.
3.2 Speaker identity disambiguation
Speaker disambiguation is especially difficult in long historical corpora.
Problems include:
• spelling variation
• honorific forms
• duplicated names
• changing transliterations
• incomplete identifiers
Cross-lingual corpora add further complexity. Named entity alignment often requires combining:
• rule-based matching
• authority files
• Wikidata identifiers
• manual reconciliation
• probabilistic record linkage
Without this, speaker-level discourse analysis becomes unstable.
3.3 OCR noise and historical corpora
Historical parliamentary digitisation often relies on OCR-derived text.

OCR introduces:
• token errors
• sentence boundary distortions
• lexical fragmentation
• named entity corruption
• structural markup loss
These distort topic modelling and embedding spaces.
In climate discourse analysis this can be especially problematic for domain-specific vocabulary:
“decarbonisation” may fragment. “emissions trading system” may be partially
corrupted. Named institutions may be inconsistently rendered.
Preprocessing thus becomes not simply cleaning but reconstruction.
• 4 Cross-Lingual Modelling Challenges
4.1 The comparability problem
Multilingual topic analysis raises a fundamental methodological problem: to what
extent can topics identified across different languages be considered analytically
comparable rather than artifacts of linguistic variation or translation asymmetries?
While topic models can identify statistically coherent clusters of co-occurring terms within individual corpora, cross-lingual comparison introduces additional
complexities because semantic equivalence is rarely straightforward. Terms that
appear to correspond lexically may carry distinct political or institutional
connotations depending on national context. For example, references to energy
security in Central and Eastern European parliamentary debates may foreground
geopolitical dependency and sovereignty concerns, whereas in Western European debates the same term may be more strongly associated with market stability or renewable system resilience. Apparent thematic overlap can therefore obscure important discursive divergence.
This problem is intensified in historical corpora because semantic fields shift over
time. Terms such as transition, sustainability, or resilience do not have stable
meanings across the 1990–2025 period, nor do they necessarily evolve
synchronously across languages. What appears as topic evolution may sometimes reflect changing lexical conventions rather than substantive ideological transformation. This challenge echoes wider concerns in comparative corpus linguistics regarding semantic drift and concept instability (Kozlowski et al., 2019). Translation introduces additional distortions. Corpora such as EuroParl contain aligned multilingual texts that often rely on interpreted or translated proceedings. While valuable for comparability, translated parliamentary language may suppress nationally specific rhetorical formulations or normalize culturally distinctive framing. This creates a paradox: translation can facilitate comparison while simultaneously flattening difference.
For this reason, the study treats comparability not as a binary condition but as a
continuum requiring methodological triangulation. Topic overlap is interpreted
alongside contextual metadata, semantic similarity measures, and close reading
rather than assumed a priori.
4.2 Multilingual embeddings and semantic alignment
Recent transformer-based multilingual models offer important tools for addressing
some of these limitations. Models such as multilingual BERT, XLM-R, and language aligned sentence transformers make it possible to map semantically related expressions into partially shared vector spaces, allowing comparison beyond direct lexical overlap.
These models support several analytical tasks relevant for parliamentary discourse analysis:
• cross-lingual semantic similarity measurement
• multilingual document clustering
• diachronic semantic shift detection
• discourse proximity mapping across legislatures
In this study, embeddings are used not as substitutes for topic modelling but as
complementary tools for testing semantic coherence across language-specific topic clusters. This hybrid approach allows latent themes identified through probabilistic models to be evaluated against embedding-based similarity structures.
Yet these methods also have limitations. Multilingual embeddings tend to privilege
high-resource languages, often reproducing asymmetries in training data. Political
rhetoric, institutional idioms, and historically contingent meanings may remain
poorly aligned even when embedded representations suggest similarity.
Furthermore, embedding spaces can themselves shift depending on training corpus composition, raising questions of reproducibility.
Rather than resolving comparability challenges completely, embeddings therefore
provide probabilistic support for cross-lingual interpretation while preserving the
need for contextual validation.
4.3 Structural topic models and metadata-aware comparison
To address these challenges, the study supplements conventional Latent Dirichlet
Allocation with Structural Topic Models (STM) (Roberts et al., 2019), which
incorporate document-level covariates directly into topic estimation.
This is particularly useful in multilingual parliamentary analysis because topic
prevalence can be modelled as a function of:
• language
• parliament
• political group
• time period
• speaker role
• policy domain
Rather than assuming identical topic structures across corpora, STM allows
differences in topic prevalence and framing to become part of the analytical result
itself.
For example, the topic associated with just transition may exhibit similar core
vocabularies across corpora while varying significantly in prevalence and rhetorical emphasis by country. Such variation is not treated as methodological noise but as substantive evidence of differentiated transition politics.
This approach also allows metadata to function as a bridge between infrastructural and interpretive analysis. Rather than treating metadata merely as auxiliary information, it becomes integral to modelling discursive structure.
4.4 Evaluation and interpretive validation
Given persistent uncertainties in multilingual modelling, evaluation requires more
than standard coherence metrics alone. The study therefore combines multiple
validation strategies:
• topic coherence diagnostics
• intruder-word testing
• embedding similarity checks
• metadata-conditioned robustness tests
• qualitative close reading of sampled debates
This layered approach follows recent digital humanities arguments that
computational models should support interpretive reasoning rather than replace it
(Underwood, 2019).
From this perspective, multilingual modelling is less about achieving perfect
equivalence than about constructing analytically credible approximations of
discursive relationships across heterogeneous linguistic and institutional settings.
• 5 Discussion
The findings suggest not only a substantive shift in parliamentary environmental
discourse, but also a methodological argument about the infrastructural conditions
under which such shifts can be studied. While earlier versions of this paper framed the principal contribution around topic evolution in climate discourse, the present emphasis on multilingual parliamentary data infrastructures foregrounds how metadata design, encoding standards, and cross-lingual comparability condition what kinds of interpretations are possible in the first place.
Rather than treating climate policy discourse as a self-contained semantic object
recoverable directly from corpora, the analysis shows that discursive patterns are
mediated by documentary infrastructures. Parliamentary language is embedded in layers of encoding, speaker metadata, institutional taxonomies, and translation
practices that shape the outcomes of computational modelling. This perspective
moves the discussion beyond using corpora merely as datasets toward
understanding them as epistemic infrastructures (Bowker & Star, 1999) whose
construction has analytical consequences.
Three broader transformations appear particularly significant.
First, climate discourse expands from a bounded domain of environmental
regulation into a broader field of macroeconomic and infrastructural governance.
References to industrial strategy, strategic autonomy, fiscal instruments, labour
restructuring, and technological sovereignty indicate that climate change is
increasingly articulated as a systemic governance issue rather than a sectoral policy concern. Yet this shift also depends on metadata granularity: detecting such transformations requires corpora that preserve speaker roles, committee contexts, legislative agendas, and policy domains. Without sufficiently rich metadata, many of these shifts remain computationally invisible.
Second, justice-oriented language becomes increasingly central in parliamentary
discussions of transition politics (Sovacool et al., 2021; Bouzarovski, 2018).
However, comparative modelling reveals that justice narratives are especially
sensitive to cross-lingual semantic variation. Concepts such as fairness, solidarity,
precarity, or vulnerability often do not map cleanly across languages or institutional traditions. This reinforces the need for multilingual embeddings and concept-level modelling strategies that go beyond lexical equivalence. In this sense, justice discourse emerges not only as a substantive finding but as a test case for the limits of multilingual computational comparison.
Third, energy security crises reshape climate politics through discursive coupling
between security, resilience, and transition governance. Geopolitical shocks and
price crises do not simply add new topics; they alter relations among previously
separate discourse clusters. Yet identifying these transformations required
temporal alignment across heterogeneous parliamentary datasets, highlighting
how diachronic analysis depends on interoperable metadata structures and
consistent periodisation.
Methodologically, the paper therefore argues that topic modelling and cross-lingual semantic analysis should be understood as inseparable from data modelling decisions. Computational outputs are not neutral representations of political discourse but products of interaction between algorithms, encoded corpora, and interpretive assumptions (Underwood, 2019). This reinforces arguments in digital humanities that modelling itself is a hermeneutic act.
Several challenges remain. Multilingual comparability continues to pose difficulties because parliamentary speech genres differ institutionally as well as linguistically.
Translation bias remains significant, particularly where aligned corpora privilege
translated proceedings over native-language interventions. OCR noise and
historical orthographic variation remain obstacles for older debates, especially
when metadata is incomplete or inconsistent. Topic interpretability also remains
contingent on preprocessing and annotation choices.
These challenges point toward a broader lesson: computational analysis of
parliamentary discourse depends as much on infrastructure quality as on modelling sophistication. For this reason, advances in TEI standardisation, linked
parliamentary metadata, and FAIR-compliant language infrastructures are not
ancillary technical concerns but central methodological conditions for comparative
climate discourse research.
• 6 Reproducible Digital Workflow and Open Infrastructure
This project is designed not simply as an analytical workflow but as an
infrastructure-oriented research pipeline organised around reproducibility,
interoperability, and open parliamentary data principles.
Repository components include:
• Python topic modelling and Structural Topic Model scripts (Roberts et al.,
2019)
• multilingual corpus cleaning and harmonisation pipeline
• cross-lingual embedding alignment notebooks
• network visualisation and semantic drift notebooks
• TEI parsing and metadata harmonisation documentation
• reproducibility documentation for corpus assembly and parameter
selection
A central design principle is that analytical reproducibility requires infrastructural
reproducibility. For this reason, the workflow treats data transformation decisions
normalisation, metadata mapping, XML extraction, and alignment- as documented analytical steps rather than hidden preprocessing.
6.1 TEI and parliamentary data standards
TEI-encoded parliamentary corpora provide a crucial foundation for
interoperability across legislatures. In particular, the TEI Guidelines and their
implementation in ParlaMint enable common representation of speeches, speakers, interruptions, agenda structures, and legislative sessions (TEI Consortium, 2023; Erjavec et al., 2023).
These standards are not merely archival conveniences; they shape what can be
computationally modelled. Speaker-level metadata, party affiliations, timestamps,
and intervention structures make possible forms of discourse analysis that would
be difficult in plain-text corpora.
This study therefore treats TEI not only as markup but as analytical infrastructure.
6.2 Infrastructures and repositories
The workflow relies on several infrastructures:
• ParlaMint XML/TEI corpora
• EuroParl aligned proceedings
• CLARIN repositories and services (CLARIN ERIC, 2023)
• DARIAH tools for digital humanities workflows
• siParl and related Slovenian parliamentary resources (Erjavec et al., 2022)
Together these infrastructures support discoverability, persistent identifiers, and
interoperable access, aligning the project with FAIR principles (Wilkinson et al.,
2016).
6.3 Open modelling and reuse
A further objective is methodological portability. Because workflows are modular,
they can be adapted for other parliamentary themes-migration, welfare, populism,
or democratic backsliding.
This emphasis aligns with growing calls in AI4DH and computational humanities for reusable pipelines rather than one-off analytical scripts.
7 Topic Evolution Across Historical Periods (updated to integrate infrastructure
angle) The diachronic modelling reveals substantial changes in discursive emphasis across the three analysed periods, but it also illustrates how such historical analysis depends on temporally aligned and metadata-rich infrastructures. During 1990–2004, debates are dominated by environmental regulation, liberalisation, and sustainable development framings associated with post-Rio governance agendas. Climate change often appears as a secondary issue embedded within wider environmental portfolios rather than a fully autonomous policy field (Meadowcroft, 2009).
From a modelling perspective, this early period presents particular infrastructural
difficulties. Sparse metadata, OCR inconsistencies, and less standardised
digitisation complicate topic stability. The relative weakness of explicit climate
vocabulary in this period makes metadata-supported contextual retrieval especially important.
During 2005–2015, a significant reconfiguration emerges. Parliamentary debates
increasingly cluster around emissions trading, renewable energy deployment, and
carbon governance architectures. Vocabulary associated with innovation,
competitiveness, and decarbonisation rises sharply.
This period also marks improved corpus quality and metadata consistency,
enabling stronger cross-national comparison. Harmonised parliamentary records
make it possible to trace convergence in climate governance discourse at a
European scale.
The 2016-2025 period marks a further transformation. Discourses become more
fragmented and multidimensional, incorporating affordability, supply insecurity,
industrial sovereignty, and justice concerns. The post-2022 energy crisis functions
as a major discursive inflection point, reshaping rather than displacing climate
discourse.
Dynamic topic evolution analysis suggests growing convergence between
previously distinct discourse clusters related to mitigation, industrial policy, and
social protection (Geels, 2002; Stirling, 2014). This reflects movement toward
integrated socio-technical transition narratives.
At the same time, significant cross-national variation persists. Some parliamentary contexts frame transition primarily through competitiveness and innovation, while others foreground redistributional justice and energy poverty (Bouzarovski, 2018).
Importantly, cross-lingual embedding trajectories show that semantic shifts often
do not proceed uniformly across languages. Terms associated with transition,
resilience, or sovereignty may converge politically while diverging semantically.
This highlights the value of modelling semantic change at concept level rather than assuming lexical equivalence.
Overall, these diachronic shifts demonstrate that parliamentary climate discourse
evolves through punctuated reconfigurations linked to institutional developments,
economic crises, and geopolitical disruptions. Just as importantly, they show that
tracing such reconfigurations depends on robust multilingual infrastructures
capable of supporting historical comparability.
Conclusion
This study has argued that multilingual parliamentary corpora should be
understood not merely as textual sources for analysing climate discourse, but as
data infrastructures whose design fundamentally shapes what kinds of comparative political analysis become possible. By combining topic modelling, semantic analysis, and an infrastructural perspective focused on metadata, TEI encoding, and cross-lingual interoperability, the paper has examined both the evolution of climate and energy narratives in European parliamentary debates and the methodological conditions under which those narratives can be computationally studied.
Substantively, the analysis demonstrates a significant transformation in
parliamentary climate discourse over the period 1990–2025. Debates shift from
early emphases on environmental regulation and pollution control toward
increasingly complex narratives centred on decarbonisation, energy security,
industrial transformation, and justice. These shifts are not linear but occur through
punctuated reconfigurations linked to institutional change, geopolitical crises, and
evolving socio-technical imaginaries of transition. Parliamentary debates emerge
not simply as reflections of policy change, but as sites where competing futures of
climate governance are actively constructed and contested.
Methodologically, the study contributes to digital humanities and computational
social science by showing that modelling political discourse cannot be separated
from questions of data architecture. Metadata completeness, TEI standards,
alignment procedures, and multilingual comparability are not secondary technical
issues but central analytical conditions. In this sense, the paper advances an
infrastructural perspective in which corpora are treated not only as data for analysis but as epistemic systems whose organisation shapes interpretation itself.
This perspective is particularly relevant for research using historical parliamentary
data, where challenges of OCR noise, incomplete metadata, historical language
variation, and heterogeneous annotation standards remain persistent obstacles.
The paper therefore argues that advances in parliamentary discourse research
depend not solely on increasingly sophisticated algorithms, but equally on stronger interoperable infrastructures linking texts, metadata, and computational methods.
Future research could extend this agenda through transformer-based semantic
change modelling, linked parliamentary knowledge graphs, and retrieval-enhanced methods combining structured metadata with large language models. Such developments would further strengthen the analytical potential of parliamentary corpora while preserving the interpretive depth central to digital humanities scholarship.
Taken together, the study suggests that the future of multilingual parliamentary
research lies not simply in analysing larger corpora, but in building better
infrastructures through which political language, historical complexity, and
computational analysis can be brought into productive dialogue.
Acknowledgements
This research was developed as an independent scholarly project and received no dedicated external funding. The authors acknowledge the value of open scholarly infrastructures that made the study possible.
We gratefully acknowledge the contributors to the CLARIN ERIC infrastructure and its national centres for providing access to interoperable language resources, tools, and standards that support multilingual corpus research. In particular, the study benefited conceptually and methodologically from resources associated with ParlaMint, TEI Consortium, and the Slovenian CLARIN.SI repository, including the siParl corpus.
The authors also acknowledge the broader open-source ecosystem underlying the computational workflow used in this study, including Python-based tools for corpus processing, topic modelling, and network analysis. Their continued development has been essential to reproducible research in digital humanities and computational social science.
Finally, we thank the communities working at the intersection of parliamentary
studies, corpus linguistics, and digital humanities whose efforts to build shared data infrastructures continue to expand possibilities for comparative historical research. References
Abercrombie, G., & Batista-Navarro, R. (2018). Sentiment analysis of parliamentary debates. Proceedings of the 11th Language Resources and Evaluation Conference, 4113–4119.
Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of
Machine Learning Research, 3, 993–1022.
Bouzarovski, S. (2018). Energy poverty: (Dis)assembling Europe’s infrastructural
divide. Palgrave Macmillan.
Bridge, G., Bouzarovski, S., Bradshaw, M., & Eyre, N. (2013). Geographies of energy transition: Space, place and the low-carbon economy. Energy Policy, 53, 331–340. https://doi.org/10.1016/j.enpol.2012.10.066
CLARIN ERIC. (2023). CLARIN infrastructure and services. https://www.clarin.eu
DiMaggio, P., Nag, M., & Blei, D. (2013). Exploiting affinities between topic modeling and the sociological perspective on culture. Poetics, 41(6), 570–606.
https://doi.org/10.1016/j.poetic.2013.08.004
Dryzek, J. S. (2013). The politics of the earth: Environmental discourses (3rd ed.).
Oxford University Press.
Erjavec, T., Pančur, A., Meden, K., Ojsteršek, M., Šorn, M., & Blaj Hribar, N. (2022).
Slovenian parliamentary corpus siParl 3.0. Slovenian language resource repository
CLARIN.SI. http://hdl.handle.net/11356/1748
Erjavec, T., Pančur, A., et al. (2023). The ParlaMint corpora. Language Resources and Evaluation. https://doi.org/10.1007/s10579-023-09682-4
Geels, F. W. (2002). Technological transitions as evolutionary reconfiguration
processes: A multi-level perspective and a case-study. Research Policy, 31(8–9),
1257–1274. https://doi.org/10.1016/S0048-7333(02)00062-8
Hagberg, A., Swart, P., & S. Chult, D. (2008). Exploring network structure, dynamics, and function using NetworkX. Proceedings of the 7th Python in Science Conference, 11-15.
Hajer, M. A. (1995). The politics of environmental discourse: Ecological modernization and the policy process. Oxford University Press.
Ilie, C. (2010). Strategic uses of parliamentary forms of address. Journal of
Pragmatics, 42(4), 885–911.
IPCC. (2023). Climate Change 2023: Synthesis Report. Intergovernmental Panel on Climate Change.
Jockers, M. L. (2013). Macroanalysis: Digital methods and literary history. University of Illinois Press.
Koehn, P. (2005). Europarl: A parallel corpus for statistical machine translation. In
Proceedings of MT Summit X (pp. 79–86).
Kozlowski, A., Taddy, M., & Evans, J. A. (2019). The geometry of culture: Analyzing meaning through word embeddings. American Sociological Review, 84(5), 905–949. https://doi.org/10.1177/0003122419877135
Meadowcroft, J. (2009). Climate change governance. World Bank Policy Research Working Paper No. 4941.
Mimno, D., Wallach, H., Talley, E., Leenders, M., & McCallum, A. (2011). Optimizing semantic coherence in topic models. Proceedings of EMNLP, 262-272.
Newell, P., & Mulvaney, D. (2013). The political economy of the just transition. The
Geographical Journal, 179(2), 132–140. https://doi.org/10.1111/geoj.12008
Proksch, S.-O., & Slapin, J. B. (2015). The politics of parliamentary debate. Cambridge University Press.
Rauh, C. (2019). EU politicization and policy initiatives. European Union Politics,
20(1), 3-25.
Řehůřek, R., & Sojka, P. (2010). Software framework for topic modelling with large corpora. In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks. models.
Roberts, M. E., Stewart, B. M., & Tingley, D. (2019). stm: An R package for structural topic Journal of https://doi.org/10.18637/jss.v091.i02 Statistical Software, 91(2), 1-40.
Schmidt, V. A. (2008). Discursive institutionalism: The explanatory power of ideas
and discourse. Annual Review of Political Science, 11, 303–326.
Sovacool, B. K., Hook, A., Martiskainen, M., & Brock, A. (2021). The decarbonisation divide. Nature Energy, 6, 611-619.
Stirling, A. (2014). Emancipating transformations: From controlling “the transition”
to culturing plural radical progress. STEPS Working Paper 64.
TEI Consortium. (2023). TEI P5: Guidelines for electronic text encoding and
interchange. https://tei-c.org
Underwood, T. (2019). Distant horizons: Digital evidence and literary change.
University of Chicago Press.
Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J., et al. (2016). The FAIR guiding
principles for scientific data management and stewardship. Scientific Data, 3,
160018. https://doi.org/10.1038/sdata.2016.18
Willis, R. (2017). Taming the climate? Corpus analysis of parliamentary climate
discourse. Environmental Politics, 26(2), 212–231.

Flash News

Weekly Recap – June 8 – 14 / 2026

Weekly Recap – June 1 – 7 / 2026

Weekly Recap – May 25 – 31 / 2026

Weekly Recap – May 18 – 24 / 2026

PARLIAMENTARY DATA INFRASTRUCTURES FOR CLIMATE DISCOURSE ANALYSIS: CHALLENGES OF METADATA, TEI, AND CROSS-LINGUAL MODELLING

Leave a Reply Cancel reply