Content analysis is "a wide and heterogeneous set of manual or computer-assisted techniques for contextualized interpretations of documents produced by communication processes in the strict sense of that phrase (any kind of text, written, iconic, multimedia, etc.) or signification processes (traces and artifacts), having as ultimate goal the production of valid and trustworthy inferences."
Though the locution "content analysis" has become a sort of umbrella term covering an almost boundless set of quite diverse research approaches and techniques, it is still used today in the social sciences, computer science and the humanities to identify methods for studying and/or retrieving meaningful information from documents. In a more focused sense, "content analysis" refers to a family of techniques oriented to the study of the "mute evidence" of texts and artifacts. Texts come from communication processes in a narrow sense of that phrase (i.e. types of communication intentionally activated by a sender, using a code sufficiently shared with the receiver). Content analysis distinguishes five types of texts:
- written texts (books, papers, etc.),
- oral texts (speeches, theatre plays, etc.),
- iconic texts (drawings, paintings, icons, etc.),
- audio-visual texts (TV programmes, movies, videos, etc.),
- hypertexts (one or more of the above types of text, on the Internet).
On the other hand, content analysis can also study traces (documents from past times) and artifacts (non-linguistic documents), which come from communication processes in a broad sense of that phrase - commonly referred to as "signification" in semiotics (in the absence of an intentional sender, semiosis develops by abduction).
Despite the wide variety of options, generally speaking every "content analysis" method implies «a series of transformation procedures, equipped with a different degree of formalisation depending on the type of technique used, but which share the scientific re-elaboration of the object examined. This means, in short, guaranteeing the repeatability of the method, i.e.: that pre-set itinerary which, following pre-established procedures (techniques), has led to those results. This path changes consistently depending on the direction imprinted by the interpretative key of the researcher who, at the end of the day, is responsible for the operational decisions made».
Over the years, content analysis has been applied to a wide variety of fields. Hermeneutics and philology have used content analysis since antiquity to interpret sacred and profane texts and, in many cases, to attribute a text's authorship and authenticity.
In recent times, particularly with the advent of mass communication, content analysis has seen increasing use in analysing and understanding media content and media logic. The political scientist Harold Lasswell formulated the core questions of content analysis in its early-to-mid-20th-century mainstream version: "Who says what, to whom, why, to what extent and with what effect?". The strong emphasis on a quantitative approach initiated by Lasswell was carried forward by another "father" of content analysis, Bernard Berelson, who proposed a definition of content analysis that is, from this point of view, emblematic: «a research technique for the objective, systematic and quantitative description of the manifest content of communication». This was the product of a positivist epistemological context quite close to a naïve realism that has long since become obsolete.

Approaches of this type are on the rise again owing to the tremendous fertility of the most recent technologies and their application within mass and personal communication. Content analysis has indeed encountered huge amounts of textual big data as a consequence of the recent spread of new media, particularly social media and mobile devices. The threat is that the complexity of the process of semiosis is not rarely underestimated and trivialized whenever statistics is applied uncritically to large amounts of analogue-native data. In such cases, the main problem stems from a naïve use of measures and numbers as an always-valid certificate of "objectivity" and "systematicity", even though the underlying aim, to contain bad, offhand, evidence-detached analyses spoiled by the «human tendency to read textual material selectively, in support of expectations rather than against them», is a sharable one.
The method of content analysis enables the researcher to process large amounts of textual information and systematically identify its properties, such as the frequencies of the most-used keywords, by locating the more important structures of its communication content. Such amounts of textual information must be categorised to provide a meaningful reading of the content under scrutiny. For example, David Robertson created a coding frame for a comparison of modes of party competition between British and American parties. It was developed further in 1979 by the Manifesto Research Group, aiming at a comparative content-analytic approach to the policy positions of political parties. This group created the Manifesto Project Database.
Since the 1980s, content analysis has become an increasingly important tool in the measurement of success in public relations (notably media relations) programs and the assessment of media profiles. In these circumstances, content analysis is an element of media evaluation or media analysis. In analyses of this type, data from content analysis is usually combined with media data (circulation, readership, number of viewers and listeners, frequency of publication). It has also been used by futurists to identify trends. In 1982, John Naisbitt published his popular Megatrends, based on content analysis in the US media.
The creation of coding frames is intrinsically related to a creative approach to the variables that exert an influence over textual content. In political analysis, these variables could be political scandals, the impact of public opinion polls, sudden events in external politics, inflation, etc. Mimetic Convergence, created by Fátima Carvalho for the comparative analysis of electoral proclamations on free-to-air television, is an example of the creative articulation of variables in content analysis. The methodology describes the construction of party identities during long-term party competitions on TV, from a dynamic perspective, governed by the logic of the contingent. This method aims to capture the contingent logic observed in electoral campaigns by focusing on the repetition and innovation of themes sustained in party broadcasts. According to the post-structuralist perspective from which electoral competition is analysed, party identities ('the real') cannot speak without mediation, because there is no natural centre fixing the meaning of a party structure; meaning depends instead on ad hoc articulations. There is no empirical reality outside articulations of meaning. Reality is an outcome of power struggles that unify ideas of social structure as a result of contingent interventions. In Brazil, these contingent interventions have proven to be mimetic and convergent rather than divergent and polarised, being integral to the repetition of dichotomised world-views.
Mimetic Convergence thus aims to show the process of the fixation of meaning through discursive articulations that repeat, alter and subvert the political issues that come into play. For this reason, parties are not taken as the pure expression of conflicts over the representation of interests (of different classes, religions, ethnic groups) but as attempts to recompose and re-articulate ideas of an absent totality around signifiers gaining positivity.
Every content analysis should depart from a hypothesis. The hypothesis of Mimetic Convergence supports the Downsian interpretation that, in general, rational voters converge in the direction of uniform positions in most thematic dimensions. The hypothesis guiding the analysis of Mimetic Convergence between political parties' broadcasts is: 'public opinion polls on vote intention, published throughout TV campaigns, will contribute to successive revisions of candidates' discourses'. Candidates re-orient their arguments and thematic selections partly in response to the signals sent by voters. One must also consider the interference of other kinds of input on electoral propaganda, such as internal and external political crises and the arbitrary interference of private interests in the dispute. Moments of internal crisis in disputes between candidates might result from the exhaustion of a certain strategy. These moments of exhaustion might consequently precipitate an inversion in the thematic flux.
As an evaluation approach, content analysis is considered by some to be quasi-evaluation because content analysis judgements need not be based on value statements if the research objective is aimed at presenting subjective experiences. Thus, they can be based on knowledge of everyday lived experiences. Such content analyses are not evaluations. On the other hand, when content analysis judgements are based on values, such studies are evaluations.
Quantitative content analysis is "a systematic, replicable technique for compressing many words of text into fewer content categories based on explicit rules of coding". It often involves building and applying a "concept dictionary", or fixed vocabulary of terms, on the basis of which words are extracted from the textual data for concording or statistical computation.
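The idea of coding against a concept dictionary can be sketched in a few lines. This is a minimal illustration under assumed inputs: the categories ("economy", "security") and their vocabularies are invented for the example, not drawn from any actual coding scheme.

```python
# Minimal sketch of dictionary-based coding. A "concept dictionary" maps each
# content category to an explicit vocabulary; coding then compresses a text
# into counts per category, per the "explicit rules of coding" idea.
from collections import Counter
import re

# Hypothetical example categories and terms (not from any real coding frame).
concept_dictionary = {
    "economy": {"tax", "budget", "inflation", "jobs"},
    "security": {"crime", "police", "defence", "terrorism"},
}

def code_text(text: str) -> Counter:
    """Count how many tokens of the text fall in each dictionary category."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for token in tokens:
        for category, vocabulary in concept_dictionary.items():
            if token in vocabulary:
                counts[category] += 1
    return counts

print(code_text("The budget cut taxes but inflation hurt jobs and police funding."))
```

Note that the exact-match lookup is deliberately naive: real concording software also handles stemming, so that "taxes" would count toward "tax".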
Uses of content analysis
Holsti groups fifteen uses of content analysis into three basic categories:
- make inferences about the antecedents of a communication
- describe and make inferences about characteristics of a communication
- make inferences about the effects of a communication.
He also places these uses into the context of the basic communication paradigm.
The following table shows fifteen uses of content analysis in terms of their general purpose, element of the communication paradigm to which they apply, and the general question they are intended to answer.
Uses of Content Analysis by Purpose, Communication Element, and Question

| Purpose | Communication element | Question |
|---|---|---|
| Make inferences about the antecedents of communications | Source | Who? |
| Describe & make inferences about the characteristics of communications | Channel | How? |
| Make inferences about the consequences of communications | Decoding process | With what effect? |

Note. Purpose, communication element, & question from Holsti. Uses primarily from Berelson as adapted by Holsti.
The process of a content analysis
According to Klaus Krippendorff, six questions must be addressed in every content analysis:
- Which data are analysed?
- How are they defined?
- What is the population from which they are drawn?
- What is the context relative to which the data are analysed?
- What are the boundaries of the analysis?
- What is the target of the inferences?
The assumption is that the words and phrases mentioned most often are those reflecting important concerns in every communication. Therefore, quantitative content analysis starts with word frequencies, space measurements (column centimetres/inches in the case of newspapers), time counts (for radio and television time) and keyword frequencies. However, content analysis extends far beyond plain word counts: with Keyword-In-Context (KWIC) routines, for example, words can be analysed in their specific context and disambiguated. Synonyms and homonyms can be isolated in accordance with the linguistic properties of a language.
Qualitatively, content analysis can involve any kind of analysis where communication content (speech, written text, interviews, images, ...) is categorised and classified. In its beginnings, using the first newspapers at the end of the 19th century, analysis was done manually by measuring the number of lines and the amount of space given to a subject. With the rise of common computing facilities like PCs, computer-based methods of analysis are growing in popularity. Answers to open-ended questions, newspaper articles, political party manifestos, medical records or systematic observations in experiments can all be subjected to systematic analysis of textual data. By having the contents of communication available in the form of machine-readable texts, the input is analysed for frequencies and coded into categories for building up inferences. Robert Weber notes: "To make valid inferences from the text, it is important that the classification procedure be reliable in the sense of being consistent: Different people should code the same text in the same way". Validity, inter-coder reliability and intra-coder reliability have long been the subject of intense methodological research. For example, in 2008, Yukihiko Yoshida did a study called "Leni Riefenstahl and German expressionism: research in Visual Cultural Studies using the trans-disciplinary semantic spaces of specialized dictionaries". The study took databases of images tagged with connotative and denotative keywords (a search engine) and found that Riefenstahl's imagery had the same qualities as imagery tagged "degenerate" in the title of the exhibition "Degenerate Art" in Germany in 1937.
One more distinction is between the manifest content (of communication) and its latent meaning. "Manifest" describes what an author or speaker has actually written, while latent meaning describes what the author intended to say or write. Normally, content analysis can only be applied to manifest content; that is, the words, sentences, or texts themselves, rather than their meanings.
Dermot McKeone highlighted the difference between prescriptive analysis and open analysis. In prescriptive analysis, the context is a closely defined set of communication parameters (e.g. specific messages, subject matter); open analysis identifies the dominant messages and subject matter within the text.
A further distinction is between dictionary-based (quantitative) approaches and qualitative approaches. Dictionary-based approaches set up a list of categories derived from the frequency list of words and examine the distribution of words and their respective categories over the texts. While methods of quantitative content analysis thereby transform observations of found categories into quantitative statistical data, qualitative content analysis focuses more on intentionality and its implications.
Following the latest developments in the critique of content analysis epistemology and methodology, evidence set under scrutiny by content analysis may come from processes of communication strictiore sensu (i.e. an active role of a sender, a code shared between sender and receiver) or from processes of what in semiotics is commonly known as signification, or communication latiore sensu (absence of sender and code, semiosis developed by abduction).
As the uncritical use of text is today widely recognized as naive in the Social Sciences domain, we can move from the original classification by Krippendorff 
Reliability in content analysis
Neuendorf suggests that when human coders are used in content analysis, at least two independent coders should be used. Reliability of human coding is often measured using a statistical measure of intercoder reliability, or "the amount of agreement or correspondence among two or more coders".
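The two most common such measures can be sketched concretely: simple percent agreement (the fraction of units both coders labelled identically) and Cohen's kappa, which corrects percent agreement for the agreement expected by chance. The codings below are made-up illustrations, not data from any study cited here.

```python
# Sketch of intercoder reliability for two coders: percent agreement and
# Cohen's kappa (chance-corrected agreement). Example codings are invented.
from collections import Counter

def percent_agreement(a, b):
    """Fraction of units on which the two coders assigned the same code."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Kappa = (observed - expected) / (1 - expected),
    where expected agreement comes from each coder's marginal code frequencies."""
    n = len(a)
    observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

coder1 = ["pos", "pos", "neg", "neg", "pos", "neg"]
coder2 = ["pos", "neg", "neg", "neg", "pos", "neg"]
print(percent_agreement(coder1, coder2))
print(cohens_kappa(coder1, coder2))
```

Here the coders agree on 5 of 6 units, but kappa is lower than the raw agreement because some of that agreement would occur by chance alone; this is why kappa-family coefficients (including Krippendorff's alpha) are usually preferred over bare percent agreement.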
See also
- Donald Wayne Foster
- Transition words
- Text mining
- The Polish Peasant in Europe and America
Further reading
- Graneheim, Ulla Hällgren, and Lundman, Berit (2004). Qualitative content analysis in nursing research: concepts, procedures and measures to achieve trustworthiness. Nurse Education Today, 24(2), 105-112.
- Budge, Ian (ed.) (2001). Mapping Policy Preferences. Estimates for Parties, Electors and Governments 1945-1998. Oxford, UK: Oxford University Press. ISBN 978-0199244003.
- Krippendorff, Klaus, and Bock, Mary Angela (eds) (2008). The Content Analysis Reader. Thousand Oaks, CA: Sage. ISBN 978-1412949668.
- Roberts, Carl W. (ed.) (1997). Text Analysis for the Social Sciences: Methods for Drawing Inferences from Texts and Transcripts. Mahwah, NJ: Lawrence Erlbaum. ISBN 978-0805817348.
- Wimmer, Roger D. and Dominick, Joseph R. (2005). Mass Media Research: An Introduction, 8th ed. Belmont, CA: Wadsworth. ISBN 978-0534647186.
External links
- Yoshikoder (on GitHub), an open-source content analysis program
- Content Analysis Guidebook Online, which provides some CATA software for free download, lists of archives, bibliographies and other sources
- QDA Miner Lite, freeware for qualitative content analysis
- KH Coder, free open-source software for quantitative content analysis and text mining
- Text Analysis Info, a regularly updated overview of text analysis software