A cross-sectional study on ChatGPT’s alignment with clinical practice guidelines in musculoskeletal rehabilitation
BMC Musculoskeletal Disorders volume 26, Article number: 411 (2025)
Abstract
Background
AI models like ChatGPT have the potential to support musculoskeletal rehabilitation by providing clinical insights. However, their alignment with evidence-based guidelines needs evaluation before integration into physiotherapy practice.
Objective
To evaluate the performance of ChatGPT (GPT-4 model) in generating responses to musculoskeletal rehabilitation queries by comparing its recommendations with evidence-based clinical practice guidelines (CPGs).
Design
This study was designed as a cross-sectional observational study.
Methods
Twenty questions covering disease information, assessment, and rehabilitation were developed by two experienced physiotherapists specializing in musculoskeletal disorders. The questions were distributed across three anatomical regions: upper extremity (7 questions), lower extremity (9 questions), and spine (4 questions). ChatGPT’s responses were obtained and evaluated independently by two raters using a 5-point Likert scale assessing relevance, accuracy, clarity, completeness, and consistency. Weighted kappa values were calculated to assess inter-rater agreement and consistency within each category.
Results
ChatGPT’s responses received the highest average score for clarity (4.85), followed by accuracy (4.62), relevance (4.50), and completeness (4.20). Consistency received the lowest score (3.85). The highest agreement (weighted kappa = 0.90) was observed in the disease information category, whereas rehabilitation displayed relatively lower agreement (weighted kappa = 0.56). Variability in consistency and moderate weighted kappa values in relevance and clarity highlighted areas requiring improvement.
Conclusions
This study demonstrates ChatGPT’s potential to provide guideline-aligned information in musculoskeletal rehabilitation. However, given the observed limitations in consistency, completeness, and the replication of nuanced clinical reasoning, its use should remain supplementary rather than serve as a primary decision-making tool. ChatGPT performed best in the disease information category, as evidenced by higher inter-rater agreement and scores, whereas its performance in the rehabilitation category was comparatively lower, highlighting challenges in addressing complex, nuanced therapeutic interventions. This variability in consistency and domain-specific reasoning underscores the need for further refinement to ensure reliability in complex clinical scenarios.
Clinical trial number
Not applicable.
Background
Artificial intelligence (AI) has become an integral component of modern healthcare, offering innovative tools to enhance diagnostics, clinical decision-making, and patient care efficiency [1, 2]. Among AI applications, natural language processing (NLP) models such as ChatGPT have garnered significant attention due to their ability to generate human-like responses to complex medical inquiries [3, 4]. These models are trained on vast datasets derived from diverse information sources, enabling them to produce contextually relevant and coherent outputs. However, ensuring the alignment of AI-generated content with evidence-based clinical practice guidelines (CPGs) is paramount to establishing their reliability in medical practice [5].
Recent studies have demonstrated the potential of AI tools in various healthcare domains, including disease diagnosis, treatment planning, and patient education. In musculoskeletal physical therapy, ChatGPT has shown notable promise, achieving up to 80% compliance with CPGs [6,7,8]. Additionally, it has been proposed as a means to optimize workflows in resource-limited settings where rapid access to accurate clinical information is crucial [9]. Despite these advantages, concerns persist regarding the consistency and reliability of AI-generated responses, particularly in complex clinical scenarios that demand nuanced clinical reasoning [10, 11]. While AI models like ChatGPT can generate well-structured and plausible content, they may also produce inaccurate or misleading information, underscoring the necessity for critical evaluation by healthcare professionals [12].
Musculoskeletal rehabilitation, a cornerstone of physical therapy, plays a crucial role in restoring function, alleviating pain, and enhancing the quality of life for individuals with musculoskeletal disorders [13]. For physiotherapists, adherence to evidence-based CPGs ensures the effectiveness and standardization of treatments, thereby optimizing patient outcomes and maintaining professional accountability [14, 15]. Integrating AI systems such as ChatGPT presents an opportunity to support physiotherapists by providing rapid, guideline-consistent recommendations, particularly in time-constrained or resource-limited settings [9]. Previous research has predominantly examined ChatGPT’s responses in diagnosing specific musculoskeletal pathologies or surgical conditions, such as lumbar radicular pain [11], degenerative spondylolisthesis [7], as well as surgical procedures like anterior cruciate ligament (ACL) reconstruction [10], and rotator cuff repairs [16]. Additionally, its application in clinical decision support has been explored [8]. However, physiotherapy practice encompasses not only diagnostics but also detailed patient assessment and evidence-based rehabilitation interventions, requiring comprehensive clinical reasoning beyond mere diagnostic capabilities. Yet, the current literature lacks an in-depth evaluation of ChatGPT’s capability to provide physiotherapy-specific recommendations, particularly regarding assessment strategies and rehabilitation interventions aligned with CPGs. Unlike previous studies that focused primarily on medical diagnoses or surgical scenarios, our study specifically investigates how ChatGPT performs in physiotherapy-specific domains such as assessment and rehabilitation—core components of physiotherapeutic management in musculoskeletal care. 
Therefore, the novelty of this study lies in its focused evaluation of ChatGPT’s performance in three physiotherapy-specific domains: disease information, patient assessment, and evidence-based rehabilitation practices—areas that have been less explored in previous research. By identifying areas where AI aligns with or diverges from established guidelines, this research seeks to inform the potential role of ChatGPT in clinical education and decision support within musculoskeletal rehabilitation.
Methods
Study design
This study employed a cross-sectional observational design to compare recommendations generated by evidence-based CPGs with those provided by ChatGPT’s GPT-4 model for musculoskeletal conditions. A total of twenty questions were systematically developed by two physiotherapists, each with over eight years of clinical experience in musculoskeletal rehabilitation. The development process began by identifying the most commonly encountered musculoskeletal conditions and clinical decision-making challenges in physiotherapy. Only conditions with established and accessible CPGs were included to ensure objective benchmarking. The questions were then categorized into three key domains: disease information, patient assessment, and rehabilitation. Additionally, efforts were made to ensure a balanced anatomical distribution—seven questions for the upper extremity, nine for the lower extremity, and four for the spine. A purposive sampling approach was adopted, and the number of questions was determined based on the need to cover a diverse range of physiotherapy-relevant scenarios while ensuring feasibility for expert rating and statistical analysis.
Each question was submitted in a new session to avoid memory retention or contextual influence between prompts. While ChatGPT does not learn from individual prompts in real time, periodic updates released by OpenAI may influence model performance; therefore, specifying the data collection period enhances reproducibility. The model was explicitly instructed to respond from the perspective of a physiotherapist to simulate clinical reasoning and decision-making comparable to that of an experienced practitioner. All responses were recorded verbatim to ensure accuracy and consistency during subsequent analysis.
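The stateless querying protocol described above can be sketched as follows. This is an illustrative reconstruction, not the authors’ actual script: the system prompt wording, endpoint, and model identifier are assumptions chosen to mirror the stated design (a fresh session per question, answered from a physiotherapist’s perspective).

```python
# Illustrative sketch of the querying protocol: every question is sent as a
# fresh, stateless request with no conversation history, mirroring the
# "new session per question" design described in the Methods.
# The prompt text, endpoint, and model name are assumptions for illustration.
import json
import os
import urllib.request

API_URL = "https://api.openai.com/v1/chat/completions"

SYSTEM_PROMPT = (
    "You are an experienced musculoskeletal physiotherapist. "
    "Answer from the perspective of an evidence-based clinician."
)

def build_payload(question: str) -> dict:
    """One self-contained request: system role plus a single user question."""
    return {
        "model": "gpt-4",
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
    }

def ask_once(question: str) -> str:
    """Submit one question with no carried-over context between prompts."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(question)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Calling `ask_once` separately for each of the twenty questions, with each answer recorded verbatim, would reproduce the protocol: because no `messages` from earlier exchanges are carried forward, no prompt can influence the response to another.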
Outcome measurement
The responses generated by ChatGPT were independently assessed by two musculoskeletal physiotherapists, each with over eight years of clinical experience. The two physiotherapists who evaluated the responses were also involved in developing the questions. To mitigate potential bias, each response was rated independently by both raters using a predefined Likert scale, and inter-rater agreement was statistically analyzed using weighted kappa coefficients.
A 5-point Likert scale (ranging from 1 = Strongly Disagree to 5 = Strongly Agree) was used to evaluate the following predefined response characteristics:
Relevance
Does the response directly address the question posed?
Accuracy
Is the information provided accurate and consistent with current clinical practice guidelines?
Clarity
Is the response well-organized and easy to comprehend?
Completeness
Does the response comprehensively address all aspects of the question?
Consistency
How consistent are the responses generated by ChatGPT when the same question is posed multiple times?
To facilitate comparative analysis, Table 1 presents the questions, CPG-derived answers, and ChatGPT’s verbatim responses, offering a comprehensive overview of the dataset.
Statistical analysis
The average scores from the two raters’ Likert scale evaluations were computed. To assess inter-rater agreement, weighted kappa coefficients were calculated for each evaluation criterion: relevance, accuracy, clarity, completeness, and consistency. Furthermore, the dataset was stratified into three primary categories—disease information, assessment, and rehabilitation—to examine inter-rater agreement within each category. Weighted kappa values were computed both for the overall dataset and within these specific domains.
All statistical analyses, including weighted kappa computations, were conducted using IBM SPSS Statistics version 25.
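For readers wishing to reproduce the agreement analysis, weighted Cohen’s kappa can be computed directly from the two raters’ scores. The sketch below is a minimal pure-Python illustration; the paper does not state whether linear or quadratic weights were used in SPSS, so linear weighting is an assumption here.

```python
# Linear-weighted Cohen's kappa for two raters scoring on an ordinal scale
# (e.g., the 5-point Likert ratings in this study). Linear weights are an
# assumption; SPSS also offers quadratic weighting.
def weighted_kappa(rater1, rater2, categories=(1, 2, 3, 4, 5)):
    """Return linear-weighted kappa for two equal-length lists of scores."""
    k = len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    n = len(rater1)
    # Observed joint distribution of the two raters' scores
    obs = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater1, rater2):
        obs[idx[a]][idx[b]] += 1.0 / n
    # Marginal distributions for each rater
    p1 = [sum(obs[i][j] for j in range(k)) for i in range(k)]
    p2 = [sum(obs[i][j] for i in range(k)) for j in range(k)]
    # Linear disagreement weights: w_ij = |i - j| / (k - 1)
    num = sum(abs(i - j) / (k - 1) * obs[i][j]
              for i in range(k) for j in range(k))
    den = sum(abs(i - j) / (k - 1) * p1[i] * p2[j]
              for i in range(k) for j in range(k))
    return 1.0 - num / den
```

With identical score lists the function returns 1.0 (perfect agreement); values near 0 indicate chance-level agreement, and negative values indicate systematic disagreement, matching the interpretation of the kappa coefficients reported in the Results.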
Results
A comparative analysis was conducted on the 20 predefined questions and their corresponding ChatGPT-generated responses, with the original questions, ChatGPT responses, and guideline-based answers presented in Table 1.
Section-wise average scores revealed that clarity received the highest mean score of 4.85, suggesting that ChatGPT’s responses were well-structured and easy to comprehend. In contrast, consistency obtained the lowest mean score of 3.85, indicating notable variability in responses when the same question was posed multiple times. Additional scores included accuracy at 4.62, relevance at 4.50, and completeness at 4.20.
Inter-rater agreement, evaluated using weighted kappa values, varied across the assessed criteria. Consistency exhibited the highest inter-rater agreement, with a weighted kappa of 0.88, followed by completeness (κ = 0.71) and accuracy (κ = 0.57). Moderate agreement was observed for relevance (κ = 0.45), whereas clarity exhibited the lowest agreement, with a weighted kappa of 0.27. The question-wise average scores and weighted kappa values for each criterion are detailed in Table 2.
Further subgroup analysis based on question categories (disease information, assessment, and rehabilitation) revealed notable differences. The disease information category exhibited the highest agreement, with a weighted kappa of 0.90, suggesting ChatGPT provided consistent and accurate responses in this domain. In contrast, the rehabilitation category showed lower agreement, with a weighted kappa of 0.56, indicating greater variability in ChatGPT’s performance when addressing rehabilitation-related queries. The average scores and weighted kappa values for each category are summarized in Table 3.
Discussion
The findings of this study provide valuable insights into both the potential and the limitations of ChatGPT’s GPT-4 model in addressing queries related to musculoskeletal rehabilitation. By systematically comparing ChatGPT’s responses with evidence-based CPGs, this study highlights the model’s ability to generate clinically relevant and accurate information while also identifying areas requiring improvement. ChatGPT demonstrated notable strengths in clarity and accuracy, as reflected in its high ratings. The highest mean score, for clarity (4.85), underscores the model’s ability to present well-structured and easily comprehensible information, while the scores for accuracy (4.62) and relevance (4.50) reflect its ability to provide contextually appropriate and largely evidence-aligned responses. Furthermore, the high weighted kappa values for consistency (0.88) and completeness (0.71), together with the strong agreement in the disease information category (0.90), suggest that ChatGPT’s responses to well-defined queries were rated in a highly reproducible manner. Despite these strengths, the comparatively low scores for consistency (3.85) and completeness (4.20) highlight areas in need of improvement. Variability in responses to repeated queries raises concerns about reliability, particularly in scenarios requiring precise and consistent recommendations, and the rehabilitation category exhibited the lowest weighted kappa value (0.56), suggesting challenges in addressing the nuanced and complex aspects of therapeutic interventions.
These results align with prior research emphasizing both the strengths and the challenges of AI-driven decision-support tools in healthcare [17, 18]. Bilika et al., for instance, explored ChatGPT’s application in physiotherapy decision-making and underscored the importance of cautious use and informed clinical judgment [19]. The clarity observed in the present study enhances ChatGPT’s usability as an educational tool for physiotherapists and other healthcare professionals, particularly in environments where rapid access to clear, concise information is critical. The results are also consistent with earlier studies demonstrating ChatGPT’s adherence to clinical guidelines in musculoskeletal care, with reported compliance rates reaching up to 80% in similar contexts [8], underscoring its potential as a decision-support tool for routine clinical queries. By contrast, the weaker performance in the rehabilitation domain may be attributed to limitations in ChatGPT’s training data, particularly concerning specialized and context-sensitive rehabilitation scenarios. Gianola et al., for example, found inconsistencies in ChatGPT’s recommendations for lumbosacral radicular pain compared with CPGs, raising concerns about accuracy and internal consistency [11]. Similarly, Sawamura et al. [12] concluded that although ChatGPT can generate accurate responses, its reference reliability and selection remain notable limitations; they accordingly emphasized cautious use, as ChatGPT is not entirely dependable for clinical decision-making. Additionally, the moderate inter-rater agreement for relevance and the low agreement for clarity point to the need for refining ChatGPT’s training to enhance alignment with evidence-based guidelines. Such variability in content quality has also been documented in studies examining AI-generated responses to complex medical queries [20].
Limitations
This study has several limitations that should be acknowledged. First, only two assessors were involved in the evaluation, which may limit the generalizability of inter-rater agreement results. Second, the assessors also developed the questions, introducing potential bias despite independent scoring and statistical analysis of agreement. Third, while the assessment criteria were informed by existing literature and expert consensus, the rating tool itself has not been psychometrically validated. Additionally, the study was limited to a fixed set of 20 questions, which, although diverse, may not fully represent the range of real-world clinical scenarios. Finally, the findings reflect ChatGPT’s performance during a specific time window and may not apply to future updates of the model.
Implications for clinical practice
The findings of this study carry significant implications for the integration of AI tools like ChatGPT into musculoskeletal rehabilitation. While the model demonstrates promise in delivering clear and accurate information, its limitations necessitate careful implementation. ChatGPT should be utilized as a supplementary resource rather than a primary decision-making tool, ensuring that AI-generated responses are cross-checked against established clinical guidelines to maintain accuracy. Furthermore, the results underscore ChatGPT’s potential role in clinical education. By providing readily accessible, guideline-consistent information, ChatGPT can serve as a valuable resource in the training of physiotherapists and other healthcare professionals. However, further refinement of the model is necessary to enhance its ability to address the complexities of rehabilitation interventions and ensure adherence to domain-specific practices. Particularly, improving the reliability of ChatGPT’s recommendations in rehabilitation contexts could significantly expand its clinical utility. At the same time, educators should be mindful that reliance on AI tools does not replace the need for students to engage critically with original sources, learn how to interpret clinical guidelines, and develop independent reasoning skills.
In summary, ChatGPT exhibits high potential in musculoskeletal rehabilitation education and information retrieval but requires cautious integration into practice due to its variability in complex clinical reasoning tasks. Ongoing evaluation and refinement of AI tools are essential to enhance their reliability, particularly in context-specific applications such as physiotherapy.
Future directions
To address the identified limitations, future research should focus on fine-tuning ChatGPT for specific healthcare domains, particularly musculoskeletal rehabilitation. Incorporating domain-specific datasets and enhancing the model’s ability to interpret complex clinical scenarios could substantially improve its practical applicability. Additionally, longitudinal studies assessing the long-term impact of ChatGPT’s integration into clinical practice are warranted.
Furthermore, the role of contextual and placebo effects in shaping user perceptions of AI-generated responses requires further exploration. As highlighted in this study, non-specific factors—such as response presentation and perceived authority—may influence both the acceptability and perceived reliability of AI tools. Investigating these factors could inform strategies to optimize the design and deployment of AI systems in healthcare, ensuring their effective and responsible use. Future research should include controlled trials assessing the impact of ChatGPT use in clinical education and patient care decision-making, particularly in real-time clinical scenarios.
Conclusion
This study underscores ChatGPT’s dual role as both a promising and a challenging tool in musculoskeletal rehabilitation. While it excels in clarity, relevance, and accuracy, its limitations in consistency and domain-specific reasoning necessitate careful oversight and ongoing refinement. Addressing these challenges could enable ChatGPT to evolve into a reliable decision-support tool, ultimately enhancing clinical practice, education, and patient care in musculoskeletal rehabilitation.
Key Points
Findings
- ChatGPT’s responses demonstrated high clarity (average score of 4.85) and relevance (4.50) when compared to clinical guidelines for musculoskeletal rehabilitation.
- The lowest score was observed in consistency (3.85), highlighting variability in repeated responses.
Implications
- These findings suggest that ChatGPT has potential as a supplementary tool for physiotherapists, offering guideline-aligned recommendations in musculoskeletal care.
- Enhancing the model’s consistency could improve its reliability for routine clinical use and decision support.
Caution
- The study relied on a predefined set of questions, which may not represent the full complexity of clinical practice.
- The evaluation was based on subjective scoring, which, despite using experienced raters, introduces potential variability. Furthermore, although the assessment criteria were informed by existing literature and expert consensus, the rating tool itself has not been formally validated.
Data availability
All data supporting the findings of this study, including ChatGPT-generated responses and Likert-scale ratings, are available from the corresponding author upon reasonable request.
Abbreviations
- ACL: Anterior Cruciate Ligament
- AI: Artificial Intelligence
- CPG: Clinical Practice Guideline
- GPT-4: Generative Pre-trained Transformer 4
- NLP: Natural Language Processing
- SPSS: Statistical Package for the Social Sciences
References
Sarella PNK, Mangam VT. AI-driven natural language processing in healthcare: transforming patient-provider communication. Indian J Pharm Pract. 2024;17(1).
Kalra N, Verma P, Verma S. Advancements in AI-based healthcare techniques with focus on diagnostic techniques. Comput Biol Med. 2024;179:108917.
Jain K, Prajapati V. NLP/deep learning techniques in healthcare for decision making. Prim Health Care: Open Access. 2021;11(3):373–80.
Biswas SS. Role of ChatGPT in public health. Ann Biomed Eng. 2023;51(5):868–9.
Van Dis EA, Bollen J, Zuidema W, Van Rooij R, Bockting CL. ChatGPT: five priorities for research. Nature. 2023;614(7947):224–6.
Rajjoub R, Arroyave JS, Zaidat B, Ahmed W, Mejia MR, Tang J, et al. ChatGPT and its role in the decision-making for the diagnosis and treatment of lumbar spinal stenosis: a comparative analysis and narrative review. Global Spine J. 2024;14(3):998–1017.
Ahmed W, Saturno M, Rajjoub R, Duey AH, Zaidat B, Hoang T et al. ChatGPT versus NASS clinical guidelines for degenerative spondylolisthesis: a comparative analysis. Eur Spine J. 2024:1–22.
Hao J, Yao Z, Tang Y, Remis A, Wu K, Yu X. Artificial intelligence in physical therapy: evaluating ChatGPT’s role in clinical decision support for musculoskeletal care. Ann Biomed Eng. 2025:1–5.
Ismail AMA. ChatGPT: an expected excellent future technology in enhancing patient care education and physiotherapists’ continuous training. Eur J Physiotherapy. 2024;26(1):62–3.
Johns WL, Martinazzi BJ, Miltenberg B, Nam HH, Hammoud S. ChatGPT provides unsatisfactory responses to frequently asked questions regarding anterior cruciate ligament reconstruction. Arthroscopy: The Journal of Arthroscopic & Related Surgery; 2024.
Gianola S, Bargeri S, Castellini G, Cook C, Palese A, Pillastrini P, et al. Performance of ChatGPT compared to clinical practice guidelines in making informed decisions for lumbosacral radicular pain: a cross-sectional study. J Orthop Sports Phys Therapy. 2024;54(3):222–8.
Sawamura S, Bito T, Ando T, Masuda K, Kameyama S, Ishida H. Evaluation of the accuracy of ChatGPT’s responses to and references for clinical questions in physical therapy. J Phys Therapy Sci. 2024;36(5):234–9.
Finucane LM, Stokes E, Briggs AM. It’s everyone’s responsibility: responding to the global burden of musculoskeletal health impairment. Musculoskelet Sci Pract. 2023:102743.
Lin I, Wiles L, Waller R, Goucke R, Nagree Y, Gibberd M, et al. What does best practice care for musculoskeletal pain look like? Eleven consistent recommendations from high-quality clinical practice guidelines: systematic review. Br J Sports Med. 2020;54(2):79–86.
Hoffmann TC, Lewis J, Maher CG. Shared decision making should be an integral part of physiotherapy practice. Physiotherapy. 2020;107:43–9.
Kolac UC, Karademir OM, Ayik G, Kaymakoglu M, Familiari F, Huri G. Can popular AI large language models provide reliable answers to frequently asked questions about rotator cuff tears? JSES Int. 2024.
Neravetla AR, Nomula VK, Mohammed AS, Dhanasekaran S. Implementing AI-driven diagnostic decision support systems for smart healthcare. In: 2024 15th International Conference on Computing Communication and Networking Technologies (ICCCNT). IEEE; 2024.
Dlugatch R, Georgieva A, Kerasidou A. AI-driven decision support systems and epistemic reliance: a qualitative study on obstetricians’ and midwives’ perspectives on integrating AI-driven CTG into clinical decision making. BMC Med Ethics. 2024;25(1):6.
Bilika P, Stefanouli V, Strimpakos N, Kapreli EV. Clinical reasoning using ChatGPT: is it beyond credibility for physiotherapists use? Physiother Theory Pract. 2024;40(12):2943–62.
AlShehri Y, McConkey M, Lodhia P. ChatGPT provides satisfactory but occasionally inaccurate answers to common patient hip arthroscopy questions. Arthroscopy: The Journal of Arthroscopic & Related Surgery; 2024.
Erickson M, Lawrence M, Jansen CWS, Coker D, Amadio P, Cleary C, et al. Hand pain and sensory deficits: carpal tunnel syndrome: clinical practice guidelines linked to the international classification of functioning, disability and health from the academy of hand and upper extremity physical therapy and the academy of orthopaedic physical therapy of the American physical therapy association. J Orthop Sports Phys Therapy. 2019;49(5):CPG1–85.
Blanpied PR, Gross AR, Elliott JM, Devaney LL, Clewley D, Walton DM, et al. Neck pain: revision 2017: clinical practice guidelines linked to the international classification of functioning, disability and health from the orthopaedic section of the American physical therapy association. J Orthop Sports Phys Therapy. 2017;47(7):A1–83.
Martin RL, Cibulka MT, Bolgla LA, Koc TA Jr, Loudon JK, Manske RC, et al. Hamstring strain injury in athletes: clinical practice guidelines linked to the international classification of functioning, disability and health from the academy of orthopaedic physical therapy and the American academy of sports physical therapy of the American physical therapy association. J Orthop Sports Phys Therapy. 2022;52(3):CPG1–44.
McDonough CM, Harris-Hayes M, Kristensen MT, Overgaard JA, Herring TB, Kenny AM, et al. Physical therapy management of older adults with hip fracture: clinical practice guidelines linked to the international classification of functioning, disability and health from the academy of orthopaedic physical therapy and the academy of geriatric physical therapy of the American physical therapy association. J Orthop Sports Phys Therapy. 2021;51(2):CPG1–81.
Carcia CR, Martin RL, Houck J, Wukich DK, Altman RD, Curwin S, et al. Achilles pain, stiffness, and muscle power deficits: Achilles tendinitis: clinical practice guidelines linked to the international classification of functioning, disability, and health from the orthopaedic section of the American physical therapy association. J Orthop Sports Phys Therapy. 2010;40(9):A1–26.
Kreiner DS, Hwang SW, Easa JE, Resnick DK, Baisden JL, Bess S, et al. An evidence-based clinical guideline for the diagnosis and treatment of lumbar disc herniation with radiculopathy. Spine J. 2014;14(1):180–91.
Lucado AM, Day JM, Vincent JI, MacDermid JC, Fedorczyk J, Grewal R, et al. Lateral elbow pain and muscle function impairments: clinical practice guidelines linked to the international classification of functioning, disability and health from the academy of hand and upper extremity physical therapy and the academy of orthopaedic physical therapy of the American physical therapy association. J Orthop Sports Phys Therapy. 2022;52(12):CPG1–111.
Willy RW, Hoglund LT, Barton CJ, Bolgla LA, Scalzitti DA, Logerstedt DS, et al. Patellofemoral pain: clinical practice guidelines linked to the international classification of functioning, disability and health from the academy of orthopaedic physical therapy of the American physical therapy association. J Orthop Sports Phys Therapy. 2019;49(9):CPG1–95.
Koc TA Jr, Bise CG, Neville C, Carreira D, Martin RL, McDonough CM. Heel pain–plantar fasciitis: revision 2023: clinical practice guidelines linked to the international classification of functioning, disability and health from the academy of orthopaedic physical therapy and American academy of sports physical therapy of the American physical therapy association. J Orthop Sports Phys Therapy. 2023;53(12):CPG1–39.
Kotsifaki R, Korakakis V, King E, Barbosa O, Maree D, Pantouveris M, et al. Aspetar clinical practice guideline on rehabilitation after anterior cruciate ligament reconstruction. Br J Sports Med. 2023;57(9):500–14.
Enseki KR, Bloom NJ, Harris-Hayes M, Cibulka MT, Disantis A, Di Stasi S, et al. Hip pain and movement dysfunction associated with nonarthritic hip joint pain: A revision: clinical practice guidelines linked to the international classification of functioning, disability, and health from the academy of orthopaedic physical therapy and American academy of sports physical therapy of the American physical therapy association. J Orthop Sports Phys Therapy. 2023;53(7):CPG1–70.
Michener LA, Heitzman J, Abbruzzese LD, Bondoc SL, Bowne K, Henning PT, et al. Physical therapist management of glenohumeral joint osteoarthritis: a clinical practice guideline from the American physical therapy association. Phys Ther. 2023;103(6):pzad041.
Kelley MJ, Shaffer MA, Kuhn JE, Michener LA, Seitz AL, Uhl TL, et al. Shoulder pain and mobility deficits: adhesive capsulitis: clinical practice guidelines linked to the international classification of functioning, disability, and health from the orthopaedic section of the American physical therapy association. J Orthop Sports Phys Therapy. 2013;43(5):A1–31.
Mehta SP, Karagiannopoulos C, Pepin M-E, Ballantyne BT, Michlovitz S, MacDermid JC, et al. Distal radius fracture rehabilitation: clinical practice guidelines linked to the international classification of functioning, disability, and health from the academy of orthopaedic physical therapy and academy of hand and upper extremity physical therapy of the American physical therapy association. J Orthop Sports Phys Therapy. 2024;54(9):CPG1–78.
George SZ, Fritz JM, Silfies SP, Schneider MJ, Beneciuk JM, Lentz TA, et al. Interventions for the management of acute and chronic low back pain: revision 2021: clinical practice guidelines linked to the international classification of functioning, disability and health from the academy of orthopaedic physical therapy of the American physical therapy association. J Orthop Sports Phys Therapy. 2021;51(11):CPG1–60.
Martin RL, Davenport TE, Paulseth S, Wukich DK, Godges JJ, Altman RD, et al. Ankle stability and movement coordination impairments: ankle ligament sprains: clinical practice guidelines linked to the international classification of functioning, disability and health from the orthopaedic section of the American physical therapy association. J Orthop Sports Phys Therapy. 2013;43(9):A1–40.
Acknowledgements
The authors have no acknowledgments.
Funding
None.
Author information
Authors and Affiliations
Contributions
E.S. was responsible for the conception and design of the study, data collection, and interpretation of the data. S.Y. contributed to the study design, data collection, and interpretation of the data. Both E.S. and S.Y. were involved in the drafting of the manuscript, and all authors contributed to the interpretation of the data for the work and revising it critically for important intellectual content. All authors have finally approved the manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Safran, E., Yildirim, S. A cross-sectional study on ChatGPT’s alignment with clinical practice guidelines in musculoskeletal rehabilitation. BMC Musculoskelet Disord 26, 411 (2025). https://doi.org/10.1186/s12891-025-08650-8
DOI: https://doi.org/10.1186/s12891-025-08650-8