Evaluation of ChatGPT-4, Gemini, Claude, and Copilot in Generating Nursing Diagnoses Based on NANDA-I Taxonomy II: A Comparative Cross-Sectional Study

dc.contributor.authorTuncer, Metin
dc.contributor.authorYalcinkaya, Turgay
dc.date.accessioned2026-04-25T14:20:15Z
dc.date.available2026-04-25T14:20:15Z
dc.date.issued2025
dc.departmentSinop Üniversitesi
dc.description.abstractAim: To evaluate the capability of large language models to generate nursing diagnoses based on NANDA-I Taxonomy II and to assess their performance across domains and overall.
Background: Large language models are emerging tools in nursing, showing potential to aid in diagnosis generation and education. However, their accuracy and applicability in clinical and educational settings remain underexplored.
Methods: This cross-sectional comparative study used 10 realistic patient scenarios based on NANDA-I Taxonomy II, covering 12 domains. Four models (ChatGPT-4, Gemini, Claude, and Copilot) generated nursing diagnoses from the patient scenarios, and their responses were assessed by five nursing experts for accuracy and alignment with NANDA-I Taxonomy II in a single-blind evaluation process.
Results: All models performed similarly across domains and overall, with Claude attaining the highest overall performance score. Expert evaluations indicated moderate interrater reliability.
Discussion: Small variations between models and occasional omissions suggest that expert review is still required before clinical use.
Conclusions: Large language models are not yet sufficiently reliable for independent use in clinical settings or nursing education, and their application as supportive tools requires a cautious approach. Moreover, the development of specialized models designed to address the unique demands of the nursing field would be advantageous.
Implications for nursing: When large language models are used in nursing practice, their limitations should be considered and their outputs verified by nurses.
Implications for nursing policy: Safe integration of artificial intelligence tools into nursing requires robust regulatory policies to safeguard patient safety, effective systems to monitor model performance, and comprehensive guidelines and training programs.
dc.identifier.doi10.1111/inr.70135
dc.identifier.issn0020-8132
dc.identifier.issn1466-7657
dc.identifier.issue4
dc.identifier.orcid0000-0002-0115-295X
dc.identifier.orcid0000-0003-1780-9191
dc.identifier.pmid41311033
dc.identifier.scopus2-s2.0-105023232124
dc.identifier.scopusqualityQ1
dc.identifier.urihttps://doi.org/10.1111/inr.70135
dc.identifier.urihttps://hdl.handle.net/11486/8447
dc.identifier.volume72
dc.identifier.wosWOS:001651536100013
dc.identifier.wosqualityQ1
dc.indekslendigikaynakWeb of Science
dc.indekslendigikaynakScopus
dc.indekslendigikaynakPubMed
dc.language.isoen
dc.publisherWiley
dc.relation.ispartofInternational Nursing Review
dc.relation.publicationcategoryArticle - International Peer-Reviewed Journal - Institutional Faculty Member
dc.rightsinfo:eu-repo/semantics/closedAccess
dc.snmzKA_WOS_20260420
dc.subjectartificial intelligence
dc.subjectchatgpt
dc.subjectclaude
dc.subjectgemini
dc.subjectlarge language models
dc.subjectmicrosoft copilot
dc.subjectNANDA-I
dc.subjectnursing diagnosis
dc.subjectstandardized nursing language
dc.titleEvaluation of ChatGPT-4, Gemini, Claude, and Copilot in Generating Nursing Diagnoses Based on NANDA-I Taxonomy II: A Comparative Cross-Sectional Study
dc.typeArticle