Transparent research and accessible data: A trend to pay attention to
Much of the recent debate concerns the transparency and accessibility of both the primary data collected by researchers and the secondary sources they use in their work. Many of the foremost political science journals are also updating their standards to include principles about raw data and replication. Enhancing transparency across the board will make interdisciplinary dialogue easier, and more accessible social science will make for better research.
The Data Access and Research Transparency (DA-RT) initiative involves three broad concepts, each of which is explored below:
1. Data Transparency, whereby researchers publicize the data they use as evidence.
2. Analytic Transparency, whereby researchers publicize ‘how they measure, code, interpret, and analyze that data.’[i]
3. Process (or Production) Transparency, whereby researchers explain how they came to choose their research design, and why they used particular sets of data, theories, and methods.
The idea behind data transparency is that insofar as a researcher makes claims which rely on data they collected, the raw data should be available to readers, so that, ideally, another researcher could replicate the results using that data and thereby further validate the findings. Replicability is a central principle of the DA-RT initiative, which aims to make social science more legitimate and more impactful. In practice, this means that another researcher, using your data, method, and coding process, can reach the same result, thus bolstering the scientific value of your research findings. As Moravcsik puts it, the “recent discovery of a large number of non-replicable positive results in the natural and social sciences” has had detrimental effects on the legitimacy and credibility of the social sciences.[ii] Transparency regimes with regard to data, therefore, are directly linked to legitimizing social science as a discipline, as well as individual research findings. Another benefit is that published data may be useful to other scholars down the line, and that it opens research to criticism and dialogue in the academic community.
In practice, data transparency means making the raw data collected for a research project both public and accessible to readers. Some journals have been using web applications to store data and code, making it possible to link readers directly to datasets. A Dataverse is an open source web application, developed by Harvard University, which “share[s], preserve[s], cite[s], explore[s] and analyze[s] research data,” essentially making data available to other researchers. It also ensures that the creators of data are fully credited for their work, and gives researchers a way to cite data sets directly. Some universities and social science journals have set up such applications, in which, for example, the data for active citation or supplementary materials can be archived with a permanent link. This makes data accessible over the long term. In areas where accessing data is difficult, such as the Western Balkans, building an accessible archive of data from past projects would be particularly useful to other researchers.
Transparent Secondary Sources: Active Citation
In terms of secondary sources, data transparency also implies a new and changing standard in how literature is referenced. Replacing the static citations of old, proponents of DA-RT, such as Moravcsik, call for ‘active citation’, which includes hyperlinks to texts available online, as well as summaries of how the source was used. Moravcsik defines active citation as “the use of rigorous, annotated (presumptively) primary-source citations hyperlinked to the sources themselves.”[iii] This implies two things. For primary sources, it means that when citing your own sources, you must also include a link to some part of the relevant data, or at least a paragraph which contextualizes the data. For secondary sources, it implies the extra step of explaining, in citations, precisely how the secondary source was used and how it corroborates the researcher’s point.
Figure 1: Moravcsik, Andrew. “Transparency: The Revolution in Qualitative Research.” Symposium: Openness in Political Science. American Political Science Association. January 2014.
In practice, active citation involves the creation of a ‘transparency appendix’ at the end of any piece of work. Moravcsik outlines four elements to this appendix: “(1) a copy of the full citation (2) an excerpt from the source, presumptively at least 50–100 words (3) an annotation explaining how the source supports the claim being made (4) optionally, an outside link to and/or a scan of the full source.”[iv] Not only would active citation make research reports more engaging (especially literature reviews), it would lend them credibility by offering hyperlinks and contextualizing information for the sources cited. This gives not just a clear overview of the context of secondary sources, but also access to the source material for those who might not have it. In turn, this could increase the dialogue between researchers and the sources they use, and allow readers to retrace the steps of the research and thus assess its methodological rigour and the strength of its claims. Increasing the transparency of reports will increase their potential impact and discourage plagiarism or methodological laziness.
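The four elements of the transparency appendix can be thought of as a simple record. The following is a minimal sketch in Python, not a format prescribed by DA-RT: the class name, field names, and the `problems` check are all illustrative assumptions. It only shows how an author or editor might verify that an entry has its three required parts and an excerpt of roughly the suggested length.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AppendixEntry:
    """One entry in a transparency appendix (names are hypothetical)."""
    full_citation: str                 # (1) copy of the full citation
    excerpt: str                       # (2) excerpt from the source
    annotation: str                    # (3) how the source supports the claim
    source_link: Optional[str] = None  # (4) optional link to or scan of the source

    def problems(self, min_words: int = 50) -> List[str]:
        """Return a list of missing or too-short elements (empty if none)."""
        issues = []
        if not self.full_citation.strip():
            issues.append("missing full citation")
        if len(self.excerpt.split()) < min_words:
            issues.append(f"excerpt shorter than {min_words} words")
        if not self.annotation.strip():
            issues.append("missing annotation")
        return issues
```

An entry whose `problems()` list is empty has a citation, an annotation, and an excerpt of at least 50 words; the link stays optional, as in the fourth element above.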
Analysis, like narrative, implies making an otherwise chaotic social world legible. The process involves research, observation, and interpretation which unearth causal links, patterns, or descriptions. Through analytic transparency, authors demonstrate how they arrived at certain inferences or conclusions. Analytic transparency provides insight into how data was coded and analyzed, which allows readers to judge its merits. In practice, this means explaining, usually within the text itself, how the author grappled with the complex set of factors and possible interpretations: why did they choose that particular set of literature, focus on a certain historical narrative, or privilege one factor over another? If there are contradictions in secondary research, how were they dealt with? The more transparent a researcher is about how and why they reached certain conclusions, the more validity their claims will have, because their analysis will be open to critique by peers and the public.
Figure 2: www.dartstatement.org
Analytic transparency is in part an admission that data is always ideological, depending on the conditions of its collection, analysis, and reception by an audience. Within the LSE ‘Politics of Data’ project, Jeffrey Alan Johnson argues that data is inherently meta-political, given that those who produce data ‘encode’ it, while data users decode it to “shape social practice.”[v] Both encoding and decoding take place from the standpoints of the particular data producers and users, meaning that there are several steps at which the transmission of data is more than a neutral process. Supporters of DA-RT provide the above chart (Figure 2) in one of their presentations, setting out the steps of evidence-based social inquiry. Ideally, all of these steps would be elucidated in a research transparency scheme, given that observation, categorization, analysis, and interpretation all depend on the researcher, the research method used, the mode of analysis, and the readers interpreting the final results. This is where the notions of analytic and process transparency become key.
Process or Production Transparency
If analytic transparency seeks to justify the interpretations of data and secondary sources, process transparency is about publicizing the process behind the decisions to use certain methods, data, cases, etc., in an attempt to confront the potential biases in these choices. It “obliges social scientists to publicize the broader set of research design choices that gave rise to the particular combination of data, theories, and methods they employ.”[vi] This discourages researchers from cherry-picking from the large set of possible methods, sources, and observations in order to prove their own hypotheses, and demands methodological rigour in these choices. It also obliges researchers who use their own data to “offer a full account of the procedures used to collect or generate the data.”[vii]
Concerns and Criticisms: the ethics of transparency
The DA-RT initiative is not without its skeptics, and many academics have expressed concern about a variety of issues.[viii] Though increased research transparency and access to source data have been described as part of the expansion of ethical standards in research,[ix] some researchers worry precisely about the unintended ethical consequences of higher levels of transparency. Kristen Renwick Monroe writes about the ethical issues of revealing data which could cause injury or other negative effects to research subjects, or which was collected on the condition that it be anonymized prior to publication. Data collected from vulnerable groups or on sensitive issues partly relies on the trust between the subject and the researcher, and this must absolutely be protected in any transparency scheme. Indeed, one of the main arguments made by a group of scholars opposed to the implementation of DA-RT is the ethical dilemma posed by balancing transparency against the unique vulnerability of studies which use human subjects. For the privacy and safety of human subjects to be respected, raw data must be effectively anonymized before being published, if any of it can be made accessible at all.
Other worries about data access include the fact that raw data can differ in form. While some forms, like databases of gathered information, are relatively easy to make accessible, others (interviews, field notes) can be much more difficult to digitize and archive. The very act of making data accessible may put an extra logistical and financial burden on researchers. In addition, analytic transparency is fairly straightforward for quantitatively driven work, but for work which relies in large part on qualitative findings, it is not entirely clear “what norms, principles, or considerations should guide authors and reviewers in pursuing and judging analytic transparency for non-statistical forms of inquiry.”[x] The difficulty of applying the same standards to qualitative and quantitative work has not been fully addressed.
There is also a fear that researchers may be forced to make their own data accessible before they have had a chance to analyze it themselves, and thus give up the opportunity to be the first to publish the results. Proponents of DA-RT therefore acknowledge that data can reasonably be kept private for a specified amount of time. Changes to the American Political Science Association ethics guidelines state that “Researchers who collect or generate data have the right to use those data first. Hence, scholars may postpone data access and production transparency for one year after publication of evidence-based knowledge claims relying on those data, or such period as may be specified by (1) the journal or press publishing the claims, or (2) the funding agency supporting the research through which the data were generated or collected.”[xi] For research institutions which rely on external funding, negotiating the shelf life of data and the timeframe for publication will be an important new development in donor guidelines.
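To make the anonymization step concrete, here is a minimal Python sketch of one possible approach, pseudonymization, in which direct identifiers are dropped and each respondent is given a stable code. The column names, the `pseudonymize` function, and the salt are assumptions made for illustration and do not come from any DA-RT document. Note that this alone does not guarantee anonymity: indirect details such as location or occupation can still identify a person and need separate review.

```python
import hashlib
from typing import Dict, Iterable, List

# Assumed names of columns that identify a person directly.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}


def pseudonymize(records: Iterable[Dict[str, str]],
                 secret_salt: str,
                 id_field: str = "name") -> List[Dict[str, str]]:
    """Drop direct identifiers and add a stable respondent code.

    The same person always gets the same code, so answers can still be
    linked across records, but the code cannot be read back as a name.
    The salt must be kept private by the researcher.
    """
    cleaned = []
    for record in records:
        raw = (secret_salt + str(record[id_field])).encode("utf-8")
        code = hashlib.sha256(raw).hexdigest()[:12]
        public = {
            key: value
            for key, value in record.items()
            if key not in DIRECT_IDENTIFIERS and key != id_field
        }
        public["respondent_id"] = code
        cleaned.append(public)
    return cleaned
```

Only the output of such a step would be deposited in a public archive; the original records and the salt would stay with the researcher, under whatever conditions were agreed with the research subjects.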
DA-RT in Practice
Editors of major journals have mostly been receptive to these initiatives, however, and have begun to implement the new transparency standards. The Journal Editors’ Transparency Statement (JETS) is a joint statement signed by 27 leading political science journals in the United States, Great Britain, and Europe. The key points in the new standards of transparency include ensuring that cited data is available at the time of publication, and requiring authors to “delineate clearly the analytic procedures upon which their published claims rely,” while providing access to these materials if possible.[xii]
For example, the Italian Political Science Review, published by Cambridge University Press, updated its editorial policy in 2015 to reflect the new norms.[xiii] This update obliges potential authors to make ‘replication data’ accessible, meaning all of the raw data used to create tables and charts for the work. For more quantitative-heavy work, relevant code, algorithms, and computer programs must likewise be made available. The Review suggests that this supplementary data and other materials can be archived on its website or its ‘Dataverse’ web application, so that the online article can be hyperlinked to replication data. The Review also emphasizes the anonymization of sensitive data, with confidential material being removed before the data is made accessible.
Rather than individual researchers, it is journals and funders that are increasingly scrutinized regarding these changing standards. The Transparency and Openness Promotion (TOP) Committee, sponsored by the Center for Open Science, recently published a new set of transparency guidelines which attempt to evaluate the effectiveness of journals and funding bodies. The guidelines set out eight components of transparency.
Each component is rated on a pre-set scale. With a growing number of signatories, these guidelines will likely continue to influence social science research and publication in the future, especially at research institutions and think tanks. The Center for Open Science has also issued a set of badges which are awarded to journals and organizations with good transparency practices, which signals the possibility of research transparency becoming as important an indicator of trustworthiness and legitimacy as financial transparency is.
For think tanks, Open Policy Research suggests a set of ‘disclosure regimes’ that will enable them to act transparently not only in their financial functioning but in their research outputs as well. A key part of this is data disclosure, including the statistics, spreadsheets, and other research data which underlie finalized research results.[xv] Naturally, this requires a thorough review of the ethics involved, as well as of the policies surrounding donors’ proprietorship of research. In practice, the implementation of DA-RT will require research institutions to update their codes of ethics to reflect new transparency standards and to put in place policies for the anonymization of personal data and the protection of human subjects.
Though the concerns laid out remain open questions, it appears likely that transparency standards in research will soon become the norm for all journals and organizations which publish social science work, and thus it is a trend which must be paid attention to. As standards of ethics and legitimacy shift to incorporate data access and research transparency, academic and research organizations must respond accordingly.
[i] Moravcsik, Andrew. “One Norm, Two Standards: Realizing Transparency in Qualitative Political Science.” Posted on January 1, 2015. http://thepoliticalmethodologist.com/2015/01/01/one-norm-two-standards-realizing-transparency-in-qualitative-political-science/
[ii] Moravcsik, Andrew. “One Norm, Two Standards: Realizing Transparency in Qualitative Political Science.” Posted on January 1, 2015. http://thepoliticalmethodologist.com/2015/01/01/one-norm-two-standards-realizing-transparency-in-qualitative-political-science/
[iii] Moravcsik, Andrew. “Active Citation: A precondition for replicable qualitative research.” January 2010.
[iv] Moravcsik, Andrew. “Transparency: The Revolution in Qualitative Research.” Symposium: Openness in Political Science. American Political Science Association. January 2014. https://www.princeton.edu/~amoravcs/library/transparency.pdf
[v] Johnson, Jeffrey Alan. “How data does political things: The processes of encoding and decoding data are never neutral.” LSE Impact of Social Science Blog. Oct 7, 2015. http://blogs.lse.ac.uk/impactofsocialsciences/2015/10/07/how-data-does-political-things/
[vi] Moravcsik, Andrew. “One Norm, Two Standards: Realizing Transparency in Qualitative Political Science.” Posted on January 1, 2015. http://thepoliticalmethodologist.com/2015/01/01/one-norm-two-standards-realizing-transparency-in-qualitative-political-science/
[vii] “2012 DA-RT Ethics Guide Changes.” Data Access and Research Transparency. 2015. http://www.dartstatement.org/#!2012-apsa-ethics-guide-changes/c13ay
[viii] “Petition to delay DA-RT implementation.” Nov 3 2015. https://docs.google.com/forms/d/1BWFO6462XNPBO8MyxV5WAcFtWn4m0fSXuOwq84FodKM/viewform
[ix] Efendić, Emir. “Ten ethical principles for public policy research.” Policy Hub. October 2015. http://www.policyhub.net/en/politics-and-standards/99
[x] “Petition to delay DA-RT implementation.” Nov 3 2015. https://docs.google.com/forms/d/1BWFO6462XNPBO8MyxV5WAcFtWn4m0fSXuOwq84FodKM/viewform
[xi] “2012 DA-RT Ethics Guide Changes.” Data Access and Research Transparency. 2015. http://www.dartstatement.org/#!2012-apsa-ethics-guide-changes/c13ay
[xiii] “Instructions for Contributors.” Italian Political Science Review / Rivista Italiana di Scienza Politica, Official journal of the Società Italiana di Scienza Politica. http://journals.cambridge.org/images/fileUpload/documents/IPO_ifc_July15.pdf
[xv] Gonzalez-Capitel, Jaime. “Redefining Transparency for Think Tanks.” Sept 9, 2014. http://openpolicyresearch.com/2014/09/09/redefining-transparency/