Guillaume Macneil

Durham University Computer Science Undergraduate

Email: guillaume@gcmacneil.com

guillaume.macneil@durham.ac.uk


The Multisport Exercise Dataset

About

This is a dataset containing all of my recorded running, strength training, cycling and swimming activities from Tuesday 06 September 2022 to Saturday 23 March 2024. It was created with the goal of facilitating the open development of predictive models in the domain of sports and general fitness. I aim to update the dataset at the start of every month for the foreseeable future. Currently, the dataset contains 306 entries split in the following way:

Activity    Number of Entries    Percentage (%)    Storage Size (MB)
Running     123                  40.2              22.4
Strength    183                  59.8              29.3
Cycling     0                    0.0               0.0
Swimming    0                    0.0               0.0
Note that the cycling and swimming entries are not currently implemented; see Limitations.
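
As a quick sanity check, the percentage column can be reproduced from the entry counts alone. The snippet below is only a minimal sketch: the counts are copied from the table above, while the dictionary and the function name `percentage_split` are illustrative and not part of the dataset.

```python
# Entry counts copied from the table above.
counts = {"Running": 123, "Strength": 183, "Cycling": 0, "Swimming": 0}


def percentage_split(counts):
    """Return each activity's share of all entries, rounded to one decimal place."""
    total = sum(counts.values())
    if total == 0:
        # Avoid dividing by zero for an empty dataset.
        return {name: 0.0 for name in counts}
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


split = percentage_split(counts)
total_entries = sum(counts.values())

for name, share in split.items():
    print(f"{name}: {counts[name]} entries ({share}%)")
print(f"Total: {total_entries} entries")
```

The running and strength counts sum to the 306 entries quoted above, and the rounded shares match the table.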

All data in the dataset has been collected using a Garmin Forerunner 255. For most running activities, further running dynamics data is provided by a Garmin HRM-Pro Plus (though I do sometimes forget to wear it). Although the accuracy of such devices is debatable, the measurements are at least consistent.

It should be noted that I am, first and foremost, training with the purpose of becoming a better judoka, though I am also interested in hypertrophy, strength and cardiovascular endurance. With that said, my training goals often vary based on what I am currently working towards (e.g. a judo competition, a 10k race or a swimrun).

Visualisations and Related Projects

As an accompaniment to the dataset, I have also made a static webpage generator that provides statistics and visualisations for every single activity in the dataset. This could be used for visual inspection of trends in the data or simply for inspiration. These visualisations can be accessed here.

When I have time to make them, some example projects based on the dataset will be provided.

Download

Within this dataset, there are two sections (file types): the individual activity files and the activity statistics file. The activity files contain the second-by-second measurements (heart rate, speed, cadence, power, etc.) for each activity. The activity statistics file describes the training effects (calories, load, etc.) and pertinent statistics for each activity (sleep duration, HRV, etc.). The dataset (last updated on Tuesday 02 April 2024) can be downloaded below:

Checksums:
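
If you want to check a download against a published checksum, something like the following could be used. This is only a sketch under assumptions: the hash algorithm is assumed to be SHA-256 (swap in whichever algorithm the checksum is actually published with), and the function names and any file name you pass in are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so that large archives need not fit in memory.

    The default algorithm (SHA-256) is an assumption, not something the
    dataset page specifies.
    """
    digest = hashlib.new(algorithm)
    with Path(path).open("rb") as handle:
        # Read until an empty bytes object signals end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_hex, algorithm="sha256"):
    """Compare a file's digest with an expected hex string, ignoring case and whitespace."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

Calling `matches_checksum("downloaded_archive", "<published checksum>")` returns `True` only if the file on disk hashes to the published value; both arguments here are placeholders.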

Limitations

There are a number of limitations with this dataset, some of which are inherent in its design, and others which will be improved in the future. They are listed below:

Inherent limitations:

  1. As all the data in the dataset is collected from my activities and my activities alone, it has very limited generalisability to anyone else. This dataset should be used for the testing and development of models, not the training of models to be applied to a wider population.
  2. Due to the multisport nature of the dataset, it will have limited applicability to training styles that focus on any one individual sport. See the about section for more information about my personal training style, and use that to inform your use of the dataset.
  3. Personally, I use metric measurements. Since the visualisations are statically generated (by me), they also use metric measurements. However, as a gesture of goodwill to imperial unit users, imperial versions of the dataset are provided too.
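
For anyone who would rather convert the metric values themselves, the standard conversion factors are enough. The sketch below assumes nothing about the dataset's file layout or which fields the imperial versions actually convert; the function names are purely illustrative.

```python
# Exact definitions: 1 mile = 1.609344 km, 1 lb = 0.45359237 kg.
KM_PER_MILE = 1.609344
KG_PER_LB = 0.45359237


def km_to_miles(km):
    """Convert a distance in kilometres to miles."""
    return km / KM_PER_MILE


def kg_to_lb(kg):
    """Convert a weight in kilograms to pounds."""
    return kg / KG_PER_LB


def pace_per_km_to_pace_per_mile(minutes_per_km):
    """Convert a pace in min/km to min/mile (a mile is longer, so the pace value grows)."""
    return minutes_per_km * KM_PER_MILE


print(f"10 km is about {km_to_miles(10):.2f} miles")
print(f"60 kg is about {kg_to_lb(60):.1f} lb")
print(f"5.00 min/km is about {pace_per_km_to_pace_per_mile(5.0):.2f} min/mile")
```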

Temporary limitations:

  1. As I have only recently made this dataset, development is still actively ongoing. For this reason, the cycling and swimming data is not currently implemented. This will be addressed as soon as possible.
  2. Currently, although the strength training activities list the sets, reps and weights during the workout, the exercises themselves are not listed. This is a pretty glaring limitation that will also be addressed soon.
  3. Also, the visualisations of the strength training activities do not illustrate the work performed during bodyweight exercises very well at all.
  4. The number of features in the dataset files is perhaps somewhat limited. Over time, more features will likely be introduced.

If you have any suggestions for the improvement of the dataset, feel free to contact me!