DH Internship, Pt. 1 (on critical discourse analysis, natural language processing and indecisiveness)

While I maintain that my intentions to consistently update the content on this site were good, I have been otherwise preoccupied.

Between group projects and general assessments for my three modules, this semester is certainly shaping up to be a busy one. Despite this, I’ve been enjoying immersing myself in research for the subject of this blog, my internship at CNGL.

Before I tell you what I am personally researching, I’ll briefly explain what CNGL is and why I was particularly interested in working at this institution.

CNGL, the Centre for Global Intelligent Content, is (in its own words) a “collaborative academia-industry research centre dedicated to delivering disruptive innovations in digital intelligent content, and to revolutionising the global content value chain for enterprises, communities and individuals”. They research machine translation and personalisation, amongst many other things, which I’m personally interested in researching.

My predominant research interests, prior to this course, were in critical discourse analysis and the structure of language. I have no formal education in linguistics, however, so my own undergraduate dissertation involved splashing in a large pool that was not my own (or some more functional metaphor for an interest pursued without knowledge?). One of my key reasons for taking a master’s here at TCD was that it offered particularly unique skill sets to people who previously had no formal education in the mysterious world of computers. Corpus linguistics, in particular, and programming are both subjects that I’ve played around with myself and deeply desired to know more about. Part of the course also involves an internship (the core point of the post, I’m coming to it), which I had hoped would be relevant to my future research and employment interests, mainly in the interactions between humans, computers and language.

So in the new year, we were sent a document listing brief descriptions of the various internship positions available to us. I drew up a shortlist of two: one that focused on my research interests (this one) and one on my probable professional interests.

My internship, in its description, was originally titled “Detecting Dialect in Parliamentary Debates”. I actually misread that at first as “Detecting Dialectics in Parliamentary Debates”, which, as luck would have it, is probably a more accurate description of what the work has entailed.

When I had my first meeting with Owen and Alex from CNGL, they asked about my research interests and emphasised that this internship would be flexible enough to reflect them. Owen told me to contact him and let him know what aspect of their project I would be most interested in working on. Personally, I thrive on very particular and strict instruction, so I was racked with indecision for several days before getting back to him. At our next meeting, it was decided that I would focus on researching and producing a white paper on the intersection between critical discourse analysis and natural language processing.

The idea was that the project, perhaps best described as a multidisciplinary discourse analysis, would analyse the utilisation of natural language processing techniques.

Initially, I spent a few weeks familiarising myself with NLP by reading some of the seminal texts (particularly the Jurafsky & Martin book that you’ll find in my references; bonus points for opening with 2001: A Space Odyssey references). For those interested in getting started with NLP or CDA themselves, the next section of my blog post details some of the techniques and overarching concepts that I think are key.

ominous discourse image by Unbekannter Künstler.Kluibi at de.wikipedia [Public domain], from Wikimedia Commons

What is discourse?

Discourse denotes the general concept of communication. It can be, and has been, interpreted as constituting multiple ideas, largely dependent on the field of study through which it is interpreted. Discourse is a language behaviour that is inherently linked to social practices. The examination of discourse has, over time, developed into the field of discourse analysis. Key to this field is the examination of language beyond sentence structure.

What is Discourse Analysis?


Discourse analysis is a multidisciplinary approach to sociolinguistics, wherein language is examined beyond the semantic level. It not only analyses the ways in which speech acts are formed, but also considers the social, cultural and, often, political context of the speech act.

What is critical discourse analysis?

Critical discourse analysis (or CDA, as it is commonly abbreviated) is a discipline within discourse analysis that predominantly examines politically motivated discourse. It is a cross-disciplinary study of the interrelation between language and power, discourse and ideology.

Whilst discourse analysis can utilise qualitative, quantitative and computational analysis, CDA often collects a corpus of texts to analyse language’s function in producing and reproducing ideology. CDA can provide critical analysis of the relationships within language because it views the construction of language as being inseparable from its context in relation to power structures within society.

CDA holds that language is constructed from power relations. Words are carefully chosen and arranged to imply binaries of power: who is the subordinate and who wields the power. CDA is not only a method of analysis, but is also described as a shared belief that language and power have intrinsic links and, as such, act to form all other social relations around us at multiple levels. Discourse analysis is concerned with the organisation of language and social constructs; it attempts to deconstruct constructions of language in order to reveal discourses within a text. Ruth Wodak and Teun van Dijk are particularly useful critics within the field to begin with.

What is NLP and why is it useful here?

Natural language processing as a field is commonly understood to be the computational study and examination of natural language. Natural language refers to human language, as opposed to computer code and the like. Natural language processing includes text classification and sentiment analysis, machine parsing and information extraction, tokenization and meaning extraction.

Natural language processing (or NLP, as I will refer to it from here on) is useful to this project for the evaluation of discourses from multiple linguistic perspectives, from lexical analyses to syntactic and semantic analyses. NLP techniques are utilised for both the comprehension and processing of language. They “can be used alone to address discourse processing or in combination with other investigative techniques” (Crossley, Allen et al., 2014, p.513).

NLP programming predominantly utilises the Java or Python programming languages for the automatic extraction of linguistic features (Jurafsky & Martin, 2008).
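To give a flavour of what the automatic extraction of linguistic features can look like at the lexical level, here is a minimal Python sketch that uses only the standard library. The tokenizer is deliberately naive, the sample sentence is invented for illustration, and the function names (tokenize, lexical_features) are my own assumptions rather than anything from a particular NLP toolkit.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and pull out word tokens (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def lexical_features(text):
    """Return a few simple lexical features for a piece of text."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    return {
        "tokens": len(tokens),   # total number of word tokens
        "types": len(counts),    # number of distinct words
        # Type-token ratio: a rough indicator of lexical variety.
        "type_token_ratio": len(counts) / len(tokens) if tokens else 0.0,
        "most_common": counts.most_common(3),
    }


# Invented sample text, not taken from any real debate.
sample = "The Minister said the budget is fair. The Deputy said the budget is not fair."
features = lexical_features(sample)
print(features)
```

A real project would more likely lean on an established toolkit for tokenization, but even counts this simple show which words dominate a speaker's contribution.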

So, back to my slightly altered internship. Last week I asked if it might be possible for me to include some more practical examination of the political corpus mentioned in the brief, rather than focusing solely on a sort of literature review style paper. Owen was really enthusiastic about this and agreed that some transcription and sample analysis could help to contextualise the work. We agreed that a multi-modal analysis would be of particular interest for a discourse analysis, so I’ll be comparing discourse in a manual transcription of a video clip of the Dáil’s daily live stream with an automatic audio transcription of the same clip. Ideally, I can examine the extent to which things that are missed by audio transcriptions, such as multi-modal elements like gaze direction, influence discourse.
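One simple way to put a number on how far an automatic transcription drifts from a manual one is word error rate (WER): the word-level edit distance between the two, divided by the length of the reference. The Python sketch below is only an illustration of that idea, not the method settled on for this project. It assumes the manual transcription is treated as the reference, both example sentences are invented, and the function name word_error_rate is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between two transcripts, divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from the hypothesis
            insertion = curr[j - 1] + 1  # extra word in the hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    # Assumption: an empty reference is scored as 0.0 rather than raising an error.
    return prev[-1] / len(ref) if ref else 0.0


# Invented example sentences, not real Dáil transcripts.
manual = "the minister will now answer the question"
automatic = "the minister will not answer question"
wer = word_error_rate(manual, automatic)
print(round(wer, 3))
```

A score like this only captures differences in the words themselves. Multi-modal elements such as gaze direction would still need to be annotated and compared by hand.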

The transcriptions I’ll be using will be partially sourced through KildareStreet.com, a volunteer-built website which aggregates information from the parliamentary debate transcriptions, although these are the transcriptions submitted by the speakers’ teams themselves. These transcriptions will then be analysed utilising natural language processing software and, where it is deemed appropriate, an interpretive form of critical discourse analysis that I will perform manually.

The analysis and its finer details will be covered in my next blog post, to avoid a frightening and unreadably long post! So for now, I’ll leave you with this yawning saluki (courtesy of Pleple2000).

Citations and references

Beaman, K. (1984) Coordination and subordination revisited: syntactic complexity in spoken and written narrative discourse. In D. Tannen (ed.) Coherence in spoken and written discourse. Norwood, N.J.: Ablex. pp.45-80.

Biber, D. (1988) Variation across speech and writing. Cambridge: Cambridge University Press.

Crossley, S. A., et al. (2014) Analyzing discourse processing using a simple natural language processing tool. Discourse Processes 51(5-6). pp.511-534.

Harris, Z. S. (1981) Discourse analysis. Springer Netherlands.

Jurafsky, D. and Martin, J. H. (2000) Speech & language processing. Pearson Education India.

