Adaptive Difficulty Setting for Exercises

At Karuna Labs we have created a virtual reality software to treat chronic pain through cortical reorganization (aka neuroplasticity).

This entails slowly training our patient’s brain and body to recover the full extent of their movements through physiotheraputic exercises and visual inputs to reprogram the patient’s representations of their pain.

For example, if a patient has suffered a shoulder injury. In their sessions they will put on the VR headset and perform a daily task, such as picking up a can from a shelf. We will then slowly increase the difficulty level of the exercise (i.e. height of the shelf in this case) in subsequent sessions such that the patient does not notice this increase. In doing so the patient is able to progressively, but imperceptibly, regain their mobility.

The machine learning problem: How can we prepare a specialized training regime for a new patient using our database of patients who have successfully completed the sessions. Our database contains movement data captured through our sensors and the patients’ medical history.

That is, how do we decide the difficulty level of the exercises based on their health and current range of mobility.

We implement various supervised learning algorithms for regression in Scikit-learn.

General steps in creating a regression model

  1. Data Exploration
  2. Data Preparation – data cleaning, handling missing values, data transformations, normalizing.
  3. Handling Outliers
  4. Dimensionality Reduction – PCA, feature selection
  5. Choosing appropriate models based on given data and desired outcomes.
  6. Training models
  7. Parameter Tuning
  8. Measuring Performance on test/unseen data

In this tutorial we will primarily focus on training various regression models in Scikit-learn and measuring their performance.

1. Data Preparation

1.1. Read input data

# import libraries
import pandas as pd
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# read in the data
data = pd.read_excel(“synthetic_final.xlsx”)

# preview of the data

# all column names


['patient_id',  'promis29_score', 'age', 'sex', 'heart_health', 'diabetes', 'asthma', 'back_pain', 'arthritis', 'high_blood_pressure', 'height_inches', 'weight_pounds', 'difficulty_1', 'pain_1', 'rom_1', 'success_rate_1',  'difficulty_2', 'pain_2', 'rom_2', 'success_rate_2', 'difficulty_3', 'pain_3',  'rom_3', 'success_rate_3', 'difficulty_4', 'pain_4', 'rom_4', 'success_rate_4', 'difficulty_5', 'pain_5', 'rom_5', 'success_rate_5', 'difficulty_6', 'pain_6', 'rom_6', 'success_rate_6', 'difficulty_7', 'pain_7', 'rom_7', 'success_rate_7', 'difficulty_8', 'pain_8', 'rom_8', 'success_rate_8']

1.2. Data Dictionary

# patient_id - masked patient ID
# promis29_score - a patient reported score based on a questionnaire on their health
# age - age of the patient at the time of the first session
# sex - sex of the patient
# heart_health - 0 = healthy heart, 1 = unhealthy heart
# diabetes - 0 = does not suffer from diabetes, 1 = suffers from diabetes
# asthma - 0 = does not suffer from asthma, 1 = suffers from asthma
# back_pain - 0 = no back pain, 1 = has back pain
# arthritis - 0 = does not suffer from arthritis, 1 = suffers from arthritis
# high_blood_pressure - 0 = normal BP, 1 = high BP
# height_inches - height in inches
# weight_pounds - wieght in pounds
# difficulty_* - difficulty setting for the particlaur session.
# pain_*' - pain reported by patient during session
# rom_* - range of movement of the patient.
# success_rate_* - success rate in completing the exercise.


1.3. Splitting data into train and test sets

# split data into training and test
train = data.iloc[:80,:]
test = data.iloc[80:,:]

# Predicting difficulty level for 8th session

# Preparing data by dropping unnecessary columns

train8X = train.drop(['patient_id', 'success_rate_1', 'success_rate_2', 'success_rate_3', 'success_rate_4', 'success_rate_5', 'success_rate_6', 'success_rate_7', 'difficulty_8', 'pain_8', 'rom_8','success_rate_8'], axis = 1)
test8X = test.drop(['patient_id', 'success_rate_1', 'success_rate_2', 'success_rate_3', 'success_rate_4', 'success_rate_5', 'success_rate_6', 'success_rate_7','difficulty_8','pain_8','rom_8','success_rate_8'], axis = 1)
train8Y = train.drop(['patient_id', 'promis29_score', 'age', 'sex', 'heart_health', 'diabetes', 'asthma', 'back_pain', 'arthritis', 'high_blood_pressure', 'height_inches', 'weight_pounds','difficulty_1', 'pain_1', 'rom_1', 'success_rate_1','difficulty_2', 'pain_2', 'rom_2', 'success_rate_2', 'difficulty_3', 'pain_3',rom_3','success_rate_3', 'difficulty_4', 'pain_4', 'rom_4', success_rate_4','difficulty_5', 'pain_5', 'rom_5', 'success_rate_5', 'difficulty_6', 'pain_6', 'rom_6', 'success_rate_6', 'difficulty_7', 'pain_7', 'rom_7', 'success_rate_7', 'pain_8', 'rom_8','success_rate_8'], axis = 1)
test8Y = test.drop(['patient_id', 'promis29_score', 'age', 'sex', 'heart_health', 'diabetes', 'asthma', 'back_pain', 'arthritis', 'high_blood_pressure', 'height_inches', 'weight_pounds', 'difficulty_1','pain_1',
'rom_1', 'success_rate_1', 'difficulty_2', 'pain_2', 'rom_2','success_rate_2','difficulty_3','pain_3','rom_3','success_rate_3','difficulty_4','pain_4', 'rom_4', 'success_rate_4','difficulty_5','pain_5','rom_5','success_rate_5','difficulty_6','pain_6','rom_6','success_rate_6','difficulty_7','pain_7','rom_7',success_rate_7', 'pain_8','rom_8','success_rate_8'], axis = 1)

2. Regression Models

The regression models aim to make a prediction about what the difficulty setting should be for the eighth session, i.e. a value between 0 and 100.

2.1. Support Vector Machines

# import library

from sklearn import svm

# train model on the training data
model_svm = svm.SVR(), train8Y)

# make predictions on the test data 
predicted_values = model_svm.predict(test8X)

# calculate error between the predicted values and the ground truth
print("Mean squared error: %.5f"% mean_squared_error(test8Y, predicted_values))
print('Variance score: %.2f' % r2_score(test8Y, predicted_values))

Mean squared error: 1.16618
Variance score: -0.02


2.2. Linear Regression

from sklearn import linear_model
reg_model = linear_model.LinearRegression(), train8Y)
predicted_reg = reg_model.predict(test8X)
mean_squared_error(test8Y, predicted_reg)
print("Mean squared error: %.5f"% mean_squared_error(test8Y, predicted_reg))
print('Variance score: %.2f' % r2_score(test8Y, predicted_reg))

Mean squared error: 1.96415
Variance score: -0.71

2.3. Ridge Regression

ridge_reg = linear_model.Ridge(alpha = .5), train8Y)
predicted_ridge = ridge_reg.predict(test8X)
mean_squared_error(test8Y, predicted_ridge)
print("Mean squared error: %.5f" % mean_squared_error(test8Y, predicted_ridge))
print('Variance score: %.2f' % r2_score(test8Y, predicted_ridge))

Mean squared error: 1.78342
Variance score: -0.55

2.4. Lasso Regression

lasso_reg = linear_model.Lasso(alpha = 0.1), train8Y)
predicted_lasso = lasso_reg.predict(test8X)
mean_squared_error(test8Y, predicted_lasso)
print("Mean squared error: %.5f"% mean_squared_error(test8Y, predicted_lasso))
print('Variance score: %.2f' % r2_score(test8Y, predicted_lasso))

Mean squared error: 1.09308
Variance score: 0.05

2.5. Bayesian Ridge Regression

bayesian_ridge_reg = linear_model.BayesianRidge(), train8Y)
predicted_bayes = bayesian_ridge_reg.predict(test8X)
mean_squared_error(test8Y, predicted_bayes)
print("Mean squared error: %.5f" % mean_squared_error(test8Y, predicted_bayes))
print('Variance score: %.2f' % r2_score(test8Y, predicted_bayes))

Mean squared error: 1.25829
Variance score: -0.10

2.6. Regression Tree

#regression tree
from sklearn import tree
clf = tree.DecisionTreeRegressor()
clf =, train8Y)
clf_pred = clf.predict(test8X)
print("Mean squared error: %.5f" % mean_squared_error(test8Y, clf_pred))
print('Variance score: %.2f' % r2_score(test8Y, clf_pred))

Mean squared error: 2.00000
Variance score: -0.74

2.7. Gradient Boosting Regression

from sklearn.ensemble import GradientBoostingRegressor
est = GradientBoostingRegressor(n_estimators=10, learning_rate=0.1,
max_depth=1, random_state=0, loss='ls').fit(train8X, train8Y)
print("Mean squared error: %.5f" % mean_squared_error(test8Y, est.predict(test8X)))
print('Variance score: %.2f' % r2_score(test8Y, est.predict(test8X)))

Mean squared error: 0.78961
Variance score: 0.31

2.8. Multi Layer Perceptron

# multi layer perceptron
from sklearn.neural_network import MLPRegressor
mlp_reg = MLPRegressor(solver='lbfgs', alpha=1e-5, hidden_layer_sizes=(15,), random_state=1), train8Y)
print("Mean squared error: %.5f" % mean_squared_error(test8Y, mlp_reg.predict(test8X)))
print('Variance score: %.2f' % r2_score(test8Y, mlp_reg.predict(test8X)))

Mean squared error: 5.40565
Variance score: -3.71

3. Conclusion

We see that the model with the least mean squared error in the ensemble method Gradient Boosting Regression. In the linear regression models we can see that Lasso regression performs the best.

C:\Users\Karan Hora\Desktop\karuna-patient-VRdifficulty\code_data\results.png