AI-driven rehabilitation: evaluation of ChatGPT-4o for generating personalized physical rehabilitation plans in comorbid patients

Михалко, Ярослав Омелянович; Дудіцька, Світлана; Лариса, Балацька; Філак, Фелікс Георгійович; Рубцова, Єлізавета Іллівна

Please use this identifier to cite or link to this item: https://dspace.uzhnu.edu.ua/jspui/handle/lib/73809

Title:	AI-driven rehabilitation: evaluation of ChatGPT-4o for generating personalized physical rehabilitation plans in comorbid patients
Other Titles:	Реабілітація на основі штучного інтелекту: оцінка ChatGPT-4o для створення персоналізованих планів фізичної реабілітації у пацієнтів з коморбідними захворюваннями
Authors:	Михалко, Ярослав Омелянович; Дудіцька, Світлана; Лариса, Балацька; Філак, Фелікс Георгійович; Рубцова, Єлізавета Іллівна
Keywords:	ChatGPT-4o, large language model, performance, physical rehabilitation
Issue Date:	Apr-2025
Publisher:	ALUNA Publishing
Citation:	AI-driven rehabilitation: evaluation of ChatGPT-4o for generating personalized physical rehabilitation plans in comorbid patients / Y.O. Mykhalko, S. Dyditska, L. Balatska, F. Filak, Y. Rubtsova // Wiadomości Lekarskie Medical Advances. – 2025. – Vol. 78(4). – P. 753-759.
Abstract:	Aim: To evaluate the performance of ChatGPT-4o in creating personalized physical rehabilitation plans for comorbid patients. Materials and Methods: ChatGPT-4o was used to generate physical rehabilitation plans for 50 clinical cases of comorbid patients. Two experts independently evaluated these plans against 6 criteria using a 5-point Likert scale. The experts also classified each plan by its suitability for use into 3 categories: “Completely unsuitable for use”, “Suitable for use with corrections”, “Completely suitable for use”. Statistical analysis included the Mann–Whitney U test, the intraclass correlation coefficient (ICC) and linearly weighted Cohen's kappa (kw). Statistical significance was set at p<0.05. Results: The overall mean score of the ChatGPT-4o-generated rehabilitation plans was 4.30±0.28, with the highest scores for respiratory and musculoskeletal pathology (4.37±0.36 and 4.33±0.24, respectively). Among the evaluation criteria, the highest scores were observed for Clinical accuracy and Safety (4.59±0.59 and 4.41±0.71, respectively). Of the generated plans, 72.00% were classified as “Suitable for use with corrections”. None of the plans were classified as “Completely unsuitable for use”. The percentage agreement between the experts ranged from 84% to 90%, ICC values were 0.80-0.86, and kw for overall suitability was 0.77. Conclusions: LLM-generated rehabilitation plans show promise as supportive tools in clinical practice, but they are not yet at a stage where they can be implemented without expert review and modification. The high overall inter-rater reliability provides confidence in the evaluation process, while also highlighting areas for improvement in both the LLM's performance and the assessment methodology.
Type:	Text
Publication type:	Article
URI:	https://dspace.uzhnu.edu.ua/jspui/handle/lib/73809
ISSN:	0043-5147
Appears in Collections:	Scientific publications of the Department of Therapy and Family Medicine
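
The abstract reports two agreement statistics for the experts' 3-category suitability ratings: percentage agreement and linearly weighted Cohen's kappa. The sketch below shows one common way these could be computed; it is not the authors' analysis code. The function names and the six example ratings are hypothetical and chosen only for illustration, with the categories coded 1 ("Completely unsuitable for use") to 3 ("Completely suitable for use").

```python
def percent_agreement(rater_a, rater_b):
    """Share of cases on which both raters chose the same category."""
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def linear_weighted_kappa(rater_a, rater_b, categories):
    """Cohen's kappa with linear disagreement weights |i - j| / (k - 1).

    `categories` lists the ordered categories, lowest to highest.
    """
    k = len(categories)
    n = len(rater_a)
    index = {c: i for i, c in enumerate(categories)}

    # Cross-tabulate the two raters' choices (raw counts).
    counts = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        counts[index[a]][index[b]] += 1

    row_totals = [sum(counts[i]) for i in range(k)]
    col_totals = [sum(counts[i][j] for i in range(k)) for j in range(k)]

    observed = 0.0  # weighted disagreement actually observed
    expected = 0.0  # weighted disagreement expected by chance
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)
            observed += weight * counts[i][j] / n
            expected += weight * row_totals[i] * col_totals[j] / (n * n)

    if expected == 0:
        # Both raters used one and the same category throughout.
        raise ValueError("kappa is undefined when no disagreement is expected")
    return 1.0 - observed / expected


# Hypothetical ratings for six plans (NOT data from the study).
expert_1 = [1, 2, 3, 2, 2, 3]
expert_2 = [1, 2, 3, 2, 3, 3]

agreement = percent_agreement(expert_1, expert_2)
kappa_w = linear_weighted_kappa(expert_1, expert_2, categories=[1, 2, 3])
print(f"percent agreement = {agreement:.2%}, weighted kappa = {kappa_w:.3f}")
```

Linear weights penalise a two-category disagreement twice as heavily as a one-category disagreement, which suits an ordered scale such as the suitability categories above.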

Files in This Item:

File	Description	Size	Format
AI-rehabilitation.pdf		4.18 MB	Adobe PDF
