Turbocharged: automating quality analysis in trust & safety

Author

Alexandra Bianca Tîrnăcop

Corresponding Author

Affiliation: Bucharest University of Economic Studies, Bucharest, Romania

Email: tirnacopalexandra21@stud.ase.ro

ORCID: 0009-0004-2945-0643

Published: December 15, 2025

How to Cite

Tîrnăcop, A. B. (2025). Turbocharged: automating quality analysis in trust & safety. CACTUS Tourism Journal, 32(1). https://doi.org/10.24818/CTS/7/2025/2.11

Abstract

Trust and Safety (T&S) is a key framework for online platforms, aiming to protect users from harms such as misinformation, harassment, and exploitation while also supporting free expression. Although policies, AI tools, and cross-platform collaboration (e.g., GIFCT, StopNCII.org) enhance moderation, significant challenges remain. This study uses a demo dataset of 15 social media posts, reviewed by 9 moderators and checked by a single quality analyst; each ticket was reviewed by three raters so that inter-rater agreement could be assessed. The model achieved a precision, recall, and F1 score of 70.37% each, with an overall accuracy of 64.44%. Automation improves efficiency but requires bias mitigation, transparency, and human intervention to handle challenging content. At the same time, outsourcing and underinvestment in moderation raise ethical concerns, as human reviewers face psychological risks without adequate support. To address these issues, this paper proposes a decision matrix for use both in machine learning training and in the training of moderators and quality analysts.
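The reported figures can be reproduced from a confusion matrix. As a minimal sketch (not the paper's code), the counts below (TP = 19, FP = 8, FN = 8, TN = 10 over the 45 rating decisions, i.e., 15 posts × 3 raters) are one hypothetical assignment consistent with the reported accuracy, precision, recall, and F1:

```python
# Hypothetical confusion-matrix counts (not taken from the paper): one
# assignment over 45 decisions (15 posts x 3 raters) that reproduces
# the metrics reported in the abstract.
tp, fp, fn, tn = 19, 8, 8, 10

accuracy = (tp + tn) / (tp + fp + fn + tn)          # 29/45 = 0.6444
precision = tp / (tp + fp)                          # 19/27 = 0.7037
recall = tp / (tp + fn)                             # 19/27 = 0.7037
f1 = 2 * precision * recall / (precision + recall)  # equals P when P == R

print(f"accuracy={accuracy:.2%}, precision={precision:.2%}, "
      f"recall={recall:.2%}, F1={f1:.2%}")
# accuracy=64.44%, precision=70.37%, recall=70.37%, F1=70.37%
```

When precision and recall coincide, F1 equals both, which is why all three reported scores are 70.37%.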

Keywords

artificial intelligence; key performance indicators; machine learning

JEL Classification

M11, O22

References

Ahmed, A. and Khan, M.N. (2024). AI and Content Moderation: Legal and Ethical Approaches to Protecting Free Speech and Privacy. [online] Available at: <https://www.researchgate.net/publication/383661951_AI_and_Content_Moderation_Legal_and_Ethical_Approaches_to_Protecting_Free_Speech_and_Privacy> [Accessed 18 October 2025].

Business and Human Rights Resource Centre (BHRRC) (2021). Santa Clara Principles present standards for tech platforms to provide transparency and accountability in content moderation. [online] Available at: <https://www.business-humanrights.org/en/latest-news/the-santa-clara-principles-on-transparency-and-accountability-in-content-moderation/> [Accessed 20 March 2025].

Cyberhaven (2015). What are False Positives? [online] Available at: <https://www.cyberhaven.com/infosec-essentials/what-are-false-positives> [Accessed 13 October 2025].

Digital Trust and Safety Partnership (DTSP) (2024a). Trust & Safety Best Practices Framework. [pdf] Digital Trust & Safety Partnership. Available at: <https://dtspartnership.org/wp-content/uploads/2021/04/DTSP_Best_Practices.pdf> [Accessed 20 March 2025].

Digital Trust and Safety Partnership (DTSP) (2024b). Best Practices for AI and Automation in Trust & Safety. [pdf] Digital Trust & Safety Partnership. Available at: <https://dtspartnership.org/wp-content/uploads/2024/09/DTSP_Best-Practices-for-AI-Automation-in-Trust-Safety.pdf> [Accessed 21 March 2025].

Eissfeldt, J. and Mukherjee, S. (2023). Evaluating the Forces Shaping the Trust & Safety Industry. [online] Available at: <https://www.techpolicy.press/evaluating-the-forces-shaping-the-trust-safety-industry/> [Accessed 19 October 2025].

Global Internet Forum to Counter Terrorism (GIFCT) (2024). GIFCT’s Hash-Sharing Database. GIFCT. [online] Available at: <https://gifct.org/hsdb/> [Accessed 21 March 2025].

Global Internet Forum to Counter Terrorism (GIFCT) (2022). HSDB Taxonomy - FOR PUBLICATION (Dec 2022). [pdf] GIFCT. Available at: <https://gifct.org/wp-content/uploads/2022/12/HSDB-Taxonomy-FOR-PUBLICATION-Dec-2022-1.pdf> [Accessed 21 March 2025].

Google for Developers (2025). Machine Learning Concepts. Classification: Accuracy, recall, precision, and related metrics. [online] Available at: <https://developers.google.com/machine-learning/crash-course/classification/accuracy-precision-recall> [Accessed 14 October 2025].

Habibi, M., Hovy, D. and Schwartz, C. (2025). The Content Moderator's Dilemma: Removal of Toxic Content and Distortions to Online Discourse. Social and Information Networks. [online] Available at: <https://doi.org/10.48550/arXiv.2412.16114> [Accessed 18 October 2025].

Horatio Insights (2025). What is Content Moderation? Pros, Cons, and Best Practices. [online] Available at: <https://www.hirehoratio.com/blog/what-is-content-moderation> [Accessed 18 October 2025].

IHRB (2025). Content moderation is a new factory floor of exploitation – labour protections must catch up. [online] Available at: <https://www.ihrb.org/latest/content-moderation-is-a-new-factory-floor-of-exploitation-labour-protections-must-catch-up> [Accessed 18 October 2025].

INTERPOL (2024a). Crimes against children. INTERPOL. [online] Available at: <https://www.interpol.int/en/Crimes/Crimes-against-children> [Accessed 21 March 2025].

INTERPOL (2024b). Crimes against children. International Child Sexual Exploitation database. INTERPOL. [online] Available at: <https://www.interpol.int/en/Crimes/Crimes-against-children/International-Child-Sexual-Exploitation-database> [Accessed 21 March 2025].

Juba, B. and Le, H.S. (2019). Precision-Recall versus Accuracy and the Role of Large Data Sets. Proceedings of the AAAI Conference on Artificial Intelligence, 33, pp. 4039–4048. https://doi.org/10.1609/aaai.v33i01.33014039.

Listen Data (2024). How to Calculate Confusion Matrix in Excel. [online] Available at: <https://www.listendata.com/2024/06/confusion-matrix-in-excel.html> [Accessed 13 October 2025].

Microsoft (2025). AND function. [online] Available at: <https://support.microsoft.com/en-us/office/and-function-5f19b2e8-e1df-4408-897a-ce285a19e9d9> [Accessed 13 October 2025].

Mollas, I., Chrysopoulou, Z., Karlos, S. and Tsoumakas, G. (2021). Ethos: an online hate speech detection dataset. [online] Available at: <https://arxiv.org/pdf/2006.08328> [Accessed 19 October 2025].

Oversight Board (2025). Content Moderation in a New Era for AI and Automation. [online] Available at: <https://www.oversightboard.com/news/content-moderation-in-a-new-era-for-ai-and-automation/> [Accessed 18 October 2025].

Reelmind (2025). Ametures Gone Wild: AI Content Moderation Challenges. [online] Available at: <https://reelmind.ai/blog/ametures-gone-wild-ai-content-moderation-challenges> [Accessed 18 October 2025].

Ricknell, E. (2020). Freedom of Expression and Alternatives for Internet Governance: Prospects and Pitfalls. Media and Communication, 8(4), pp. 110–120. https://doi.org/10.17645/mac.v8i4.3299.

Santa Clara Principles (SCP) (2021a). SCP 2.0 Toolkit for Companies. The Santa Clara Principles on Transparency and Accountability in Content Moderation. [online] Available at: <https://santaclaraprinciples.org/toolkit-companies/> [Accessed 20 March 2025].

Santa Clara Principles (SCP) (2021b). Santa Clara Principles 2.0 Open Consultation Report. The Santa Clara Principles on Transparency and Accountability in Content Moderation. [online] Available at: <https://santaclaraprinciples.org/open-consultation/> [Accessed 21 March 2025].

Shulruff, T. (2024). Trust and Safety work: internal governance of technology risks and harms. Journal of Integrated Global STEM, 1(2), pp. 95–105. https://doi.org/10.1515/jigs-2024-0003.

Shweta, Bajpai, R.C. and Chaturvedi, H.K. (2015). Evaluation of Inter-Rater Agreement and Inter-Rater Reliability for Observational Data: An Overview of Concepts and Methods. Journal of the Indian Academy of Applied Psychology, 41(3), pp. 20–27.

Siapera, E. (2021). AI Content Moderation, Racism and (de)Coloniality. International Journal of Bullying Prevention, 4, pp. 55–65. https://doi.org/10.1007/s42380-021-00105-7.

StopNCII.org (2025). How StopNCII.org Works. Stop Non-Consensual Intimate Image Abuse. [online] Available at: <https://stopncii.org/chi-siamo/> [Accessed 22 March 2025].

Tremau (2025). Content Moderation: Key Practices & Challenges. [online] Available at: <https://tremau.com/resources/content-moderation-key-practices-challenges/> [Accessed 19 October 2025].

TSPA (2025). Content Moderation Quality Assurance. [online] Available at: <https://www.tspa.org/curriculum/ts-fundamentals/content-moderation-and-operations/content-moderation-quality-assurance/> [Accessed 19 October 2025].

Vargas Penagos, E. (2025). Platforms on the hook? EU and human rights requirements for human involvement in content moderation. Cambridge Forum on AI: Law and Governance, 1, e23. https://doi.org/10.1017/cfl.2025.3.

Walker, A.R. (2025). Legal Defense Fund exits Meta civil rights advisory group over DEI changes. [online] Available at: <https://www.theguardian.com/technology/2025/apr/11/meta-ldf-dei-policy> [Accessed 19 October 2025].

Weigl, L. and Bodo, B. (2025). Trust and safety in the age of AI - the economics and practice of the platform-based discourse apparatus. Amsterdam Law School Legal Studies & Institute for Information Law, 2025-1. http://dx.doi.org/10.2139/ssrn.5116478.

Woods, J. (2022). Bias in AI Program: Showing Businesses How to Reduce Bias and Mitigate Risk. Vector Institute. [online] Available at: <https://vectorinstitute.ai/bias-in-ai-program-showing-businesses-how-to-reduce-bias-and-mitigate-risk/> [Accessed 20 March 2025].

Zeng, J. and Kaye, D.B.V. (2022). From content moderation to visibility moderation: A case study of platform governance on TikTok. Policy & Internet, 14, pp. 79–95. https://doi.org/10.1002/poi3.287.