The durability of research findings can be cast in terms of three Rs. Findings should be reproducible (the same type of analysis using the same data should produce the same result); replicable (redoing an experiment to collect fresh data should produce the same result); and robust (alternative analyses using the same data should draw the same conclusion).

This week's issue of Nature includes four papers that look at these three Rs in the social and behavioural sciences. Three of them are an outcome of nearly US$8 million in DARPA funding provided back in 2019 to the Systematizing Confidence in Open Research and Evidence (SCORE) programme. The fourth paper reports the outcome of a series of one-day ‘replication games’ workshops organized around the world since 2022 by the Institute for Replication, a virtual, non-profit network.

The results are sobering: researchers could replicate the results of only half of the studies that they tested. Clearly, current rates of replicability and reproducibility leave much room for improvement, to put it mildly. But rather than despair or throw the proverbial stones, we should focus on the value that such methodical retrospection offers. Any insight that makes the path to reliable findings more reliable will accelerate progress. Looking back at previous work is as necessary as looking ahead; it should be funded and it should be published. Rigorous practices for doing so demonstrate the scientific method at work, as the papers just published show so clearly.

You can see the papers, our editorial, the news coverage, and a Q&A with Brian Nosek, a lead on the SCORE project, here: https://lnkd.in/emgPeiXb
Replication Standards
Explore top LinkedIn content from expert professionals.
Summary
Replication standards are guidelines that help ensure research findings can be repeated by other researchers, increasing trust and reliability in scientific results. These standards include practices for sharing data, documenting methods, and clearly reporting procedures so that others can attempt to reproduce or replicate studies and verify their outcomes.
- Share data and code: Make your datasets and analysis scripts publicly available so others can review, reuse, or attempt replication of your work.
- Document study methods: Provide detailed descriptions of your study design, recruitment procedures, and statistical analyses to make it easier for others to follow and repeat your process.
- Encourage open practices: Use badges and incentives to signal transparency and welcome replication studies that test the boundaries or generalizability of your findings.
Attention #Journal #Editors! Given the recent #FrancescaGino #scandal at Harvard Business School, there is an urgency to improve #research #credibility. This article, just published, describes things we can do NOW that do not require much time, effort, or money, to help minimize future #retractions and scandals. For example:

1️⃣ Encourage or Require Data and Code Sharing. Editors can implement a policy that strongly encourages (or requires) authors to submit replication datasets and analysis code as supplementary materials. This supports verification and reuse without requiring financial investment or infrastructure beyond what most journal platforms already offer.

2️⃣ Adopt and Promote Badges for Open Practices. Editors can offer badges (e.g., for open data, open materials, and preregistration) as visual incentives for transparency. These badges cost nothing but can powerfully signal credibility to readers and reviewers.

3️⃣ Ask Reviewers to Comment on Transparency. Editors can modify reviewer forms to include a checkbox or question on whether the manuscript meets basic transparency standards, such as reporting sample sizes, exclusions, and robustness checks. This minimal addition raises awareness and accountability without slowing down the process.

4️⃣ Publish Editorials That Set Expectations. Editors can write brief editorials or editors’ notes that articulate the journal’s commitment to research credibility and encourage authors to follow best practices. This sets a cultural tone and aligns expectations without requiring policy overhauls.

5️⃣ Encourage Submission of Replication Studies. Editors can explicitly state in the journal’s aims and scope that replication studies are welcome, especially those that test boundary conditions or use different samples. This small change encourages cumulative science and helps legitimize replication as a valued scholarly contribution.

6️⃣ Promote Methodological Transparency in Author Guidelines. Revise the journal's author instructions to require detailed reporting of study procedures, including recruitment, exclusions, measures, and statistical methods. Providing a checklist or link to reporting standards (e.g., #APA’s JARS) helps authors comply without confusion.

7️⃣ Create a Fast-Track for Methodologically Transparent Studies. Offer a streamlined review process for papers that meet predefined transparency criteria, such as open data and preregistered protocols. This creates an incentive for authors to adopt best practices and reduces reviewer burden.
-
Whether you want to further research or build innovative applications, replicating papers is often a valuable starting point. However, it can be a complex and time-consuming task, given papers' complexity and the availability of resources. For this reason, in "PaperBench: Evaluating AI's ability to Replicate AI Research" by Giulio Starace et al. 2025 (multiple first authors) at OpenAI, the authors introduce the homonymous benchmark, which can be used to assess an AI agent's capability to perform this task, in particular for AI/ML-oriented papers.

In terms of data, their "benchmark consists of 20 Spotlight and Oral papers selected from those presented at the 2024 International Conference on Machine Learning (ICML)". For each paper, they manually define a "rubric": a set of requirements that the agent needs to complete to ensure a correct replication. This, in turn, leads to a total of 8,316 "individually gradable outcomes". The agent is placed in a GPU-provisioned virtual machine and shown the paper along with an addendum file containing clarifications from the authors. From these inputs, the agent is expected to produce a codebase replicating the paper's empirical results, containing the code and a bash script as an entrypoint for experiment execution.

Particularly interesting is their Replication Scoring system, defined on a continuum, as per Figure 2 of the paper. A rubric (the set of requirements that must be met to claim replication) is associated with each of the 20 papers, and each rubric was developed in collaboration with the original authors of the corresponding paper. In more detail, these requirements are organized hierarchically, forming a tree-like structure. The "leaf" requirements (at the bottom, with no children) are evaluated binarily (0 = failed, 1 = succeeded). Then, for a given non-leaf node, completion is defined as the weighted average of the scores of its children. This is bubbled up through the tree until we obtain a final float describing the degree of replication of the paper (55% in the paper's example). The weight of a given node reflects how important its requirements are for replicating the paper compared to its siblings, and the binary evaluation is done via an LLM-as-a-judge system. The requirements can be of three types: Result Matching (the candidate agent replicated a given result), Execution (whether the candidate agent's entrypoint bash script successfully executed the correct blocks of logic), and Code Development (whether the implementations are correct). A minimal sketch of this scoring scheme follows below.

The paper is rich in results: all in all, their best agent, Claude 3.5 Sonnet with open-source scaffolding, "achieves an average replication score of 21%", "not yet outperform[ing] the human baseline". Link to the article in the comments; the image shows the aforementioned tree-based structure used to evaluate replication scores.
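The weighted bubble-up is easy to express in code. Here is a minimal sketch of such a rubric tree; the node names, weights, and pass/fail values are hypothetical illustrations, not taken from the actual PaperBench rubrics.

```python
# Minimal sketch of tree-based replication scoring (illustrative, not PaperBench code).
from dataclasses import dataclass, field

@dataclass
class RubricNode:
    name: str
    weight: float = 1.0          # importance relative to siblings
    passed: bool | None = None   # leaves only: binary judgement from the LLM judge
    children: list["RubricNode"] = field(default_factory=list)

    def score(self) -> float:
        # Leaf requirements are graded binarily (0 = failed, 1 = succeeded).
        if not self.children:
            return 1.0 if self.passed else 0.0
        # Non-leaf score = weighted average of child scores, bubbled up the tree.
        total = sum(c.weight for c in self.children)
        return sum(c.weight * c.score() for c in self.children) / total

# Hypothetical rubric for one paper, mixing the three requirement types.
rubric = RubricNode("replicate_paper", children=[
    RubricNode("code_development", weight=1.0, children=[
        RubricNode("model_implemented", passed=True),
        RubricNode("training_loop_correct", passed=True),
    ]),
    RubricNode("execution", weight=1.0, children=[
        RubricNode("reproduce_script_runs", passed=True),
    ]),
    RubricNode("result_matching", weight=2.0, children=[
        RubricNode("table1_accuracy_matches", passed=False),
    ]),
])

print(f"Replication score: {rubric.score():.0%}")  # 50% for this toy tree
```

In this toy tree the failed Result Matching leaf drags the score down heavily because its parent carries twice the weight of its siblings, mirroring how the benchmark weights requirements by their importance to the replication.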
-
WINDOWS SERVER 🔄 Post 98: Active Directory Replication: The Inner Workings Unveiled 🔄

Active Directory replication is the backbone of a synchronized and resilient domain environment. But how does it actually work under the hood? From Update Sequence Numbers (USNs) to Up-To-Dateness Vector Tables (UTDV), understanding these core components is key. I've created a detailed PDF guide that breaks down the replication process step by step. It covers:

- The role of the USN, DSA GUID, and Invocation ID in tracking changes.
- How the High-Watermark Vector Table (HWV) and Up-To-Dateness Vector Table (UTDV) ensure efficient updates.
- A real-life example with tables and visuals to simplify complex concepts.

Reference: Chapter 6, Active Directory, 5th Edition by Brian Desmond, Joe Richards, Robbie Allen, and Alistair G. Lowe-Norris, O'Reilly Media.

┈➤ You can also find all previous Windows Server posts here: https://lnkd.in/gw7K5Her

#ActiveDirectory #WindowsServer #ITPro #TechTips #ADReplication #DeepDive
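To make the interplay between the high-watermark and the up-to-dateness vector concrete, here is a toy Python model of the two filters a destination domain controller applies when pulling changes from a source. It is a simplified illustration with assumed names and values, not actual AD code: real AD tracks DSA GUIDs, invocation IDs, and per-attribute replication metadata.

```python
# Toy model of USN-based change filtering in AD replication (illustrative only).
from dataclasses import dataclass

@dataclass
class Change:
    local_usn: int    # USN assigned by the source DC for this write
    origin_dsa: str   # DC where the write originated (stands in for the DSA GUID)
    origin_usn: int   # USN the change had on its originating DC
    attribute: str
    value: str

def pull_changes(source_log, high_watermark, utdv):
    """high_watermark: highest local USN already received from this source.
    utdv: {origin_dsa: highest originating USN already applied} (up-to-dateness vector)."""
    pulled = []
    for ch in source_log:
        if ch.local_usn <= high_watermark:
            continue  # High-watermark filter: already fetched from this source.
        if utdv.get(ch.origin_dsa, 0) >= ch.origin_usn:
            continue  # UTDV filter (propagation dampening): already applied via another path.
        pulled.append(ch)
    return pulled

# DC2 pulls from DC1: it has seen DC1's log up to USN 100, and already holds
# DC3's originating writes up to USN 40 (received directly from DC3).
dc1_log = [
    Change(101, "DC1", 101, "telephoneNumber", "555-0100"),
    Change(102, "DC3", 38, "title", "Engineer"),    # dampened: 38 <= 40
    Change(103, "DC3", 45, "department", "Sales"),  # genuinely new originating write
]
for ch in pull_changes(dc1_log, high_watermark=100, utdv={"DC3": 40, "DC1": 90}):
    print(ch.attribute, "->", ch.value)
```

The first filter avoids re-fetching changes already pulled from this source; the second is the propagation dampening that keeps the same originating write from being replicated twice via different paths.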
-
New chapter on replication studies in my free textbook https://lnkd.in/eB5B9udq I discuss the difference between direct and conceptual replications, how to analyze them, why we don't just do one study with alpha = 0.0025, and how to deal with conflicting results across studies.

I define a direct replication as a study where a researcher has the goal of not introducing variability in the effect size compared to the original study, while in a conceptual replication variability is introduced intentionally, with the goal of testing generalizability. I hope this is helpful. Some approaches focus on a judgment of the similarity in operationalizations between studies, but as there are always differences, I think we should focus on what the researcher intends to test.

Then I discuss three approaches to analyzing replication studies: a test of differences in effect sizes; whether the replication study is significant; and whether the effect (or the difference between effects) is too small to matter. The first is too rarely done, but the most interesting.

Then it becomes a bit niche, but I dive into why we do 2 studies with alpha = 0.05, and not just one study with alpha = 0.05 * 0.05 = 0.0025. There are a bunch of non-statistical reasons (identifying systematic error). Interestingly, it depends which of the 2 options is more efficient, in the sense of requiring a smaller total sample size. It does not matter a lot, but depending on the power, the sidedness of the test, and the effect size, either can be just a bit more efficient (a quick numerical check follows below).

Then I discuss Many Labs 5 as an example of how difficult it is to predict whether studies will replicate. It's a great study showing researchers can't predict whether moderators matter in replication studies. It shows we can only know whether something replicates if we replicate it.

I hope it will be a useful chapter to read through if you are thinking of doing replication research yourself, or if you want to teach about it to your students! https://lnkd.in/eB5B9udq We will also discuss this topic in the next 2 episodes of our podcast Nullius in Verba!
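Here is one way to run that efficiency comparison numerically, sketched with statsmodels; the effect size, test choice, and power target are illustrative assumptions, not values from the chapter.

```python
# Two studies each at alpha = 0.05 (both must be significant) versus one study
# at alpha = 0.05 * 0.05 = 0.0025, for the same 80% overall power.
# Assumptions: independent-samples t-test, Cohen's d = 0.5, two-sided tests.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
d, target_power = 0.5, 0.80

# If both of two independent studies must be significant, each needs
# power sqrt(0.80) ~= 0.894 for 80% joint power.
n_each = analysis.solve_power(effect_size=d, alpha=0.05,
                              power=target_power**0.5, alternative="two-sided")
# One study at the multiplied alpha.
n_single = analysis.solve_power(effect_size=d, alpha=0.0025,
                                power=target_power, alternative="two-sided")

print(f"Two studies:  2 x {n_each:.0f} = {2 * n_each:.0f} per group in total")
print(f"Single study: {n_single:.0f} per group")
```

Rerunning this with different effect sizes, power targets, or one-sided tests shifts which option needs the smaller total sample, which is the point the post makes: either can be slightly more efficient depending on the configuration.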
-
#replication studies are fundamental to advancing #scientific #knowledge. Put differently, replication is a cornerstone of #science. However, the time, money, and energy required for scientific work, and thus for replication studies, are limited. So #researchers sometimes have to decide whether and how a replication should be undertaken. In their PNAS Perspective, Clintin Davis-Stober et al. develop and present an intriguing framework guiding replication decisions (see below). Based on this, they further lay out a set of questions scientists should/could answer when deciding whether and how to replicate.

1) What nonepistemic #values are related to, or impacted by, this replication? E.g., Is there a larger harm we are trying to mitigate for a population? How will broader audiences use the results from the replication?

2) What epistemic values should be considered when evaluating evidence resulting from the replication? E.g., Will the replication results be used to develop or test a theory?

3) What cognitive attitudes do I hold about the replication? E.g., Is the aim to establish a highly robust and replicable result (Book of Truths) or not (Book of Conversations)? How do I consider the claims of the original study?

4) Is there alignment between my replication decisions and my values and attitudes? How does my experimental design connect to my values? Is this consistent with my cognitive attitude on the replication?

Read the full perspective to gain more valuable insights and guidance (incl. #preregistration).