Preliminary Validation Study of Consumer-level Activity Monitors and Mobile Applications for Step Counting under Free Living Conditions
Manolis Adamakis, MSc, PhD candidate
Faculty of Physical Education and Sport Science, National and Kapodistrian University of Athens, Athens, Greece
Corresponding Author: firstname.lastname@example.org
Journal MTM 6:1:26–33, 2017
Background: The last decade’s technological advances have spurred a continuously increasing interest in objective monitoring of physical activity with the use of wearable devices. Even though an increased accuracy is important in some lines of research, a balance between precision, feasibility and low-cost monitoring technologies is clearly needed.
Aims: The purpose of this study was to compare the accuracy for step counting between one spring-levered pedometer, two piezo-electric pedometers and two free of charge pedometer applications for Android smartphones, under free-living conditions.
Methods: Eleven healthy adults, ranging in BMI from 20.20 to 24.77 kg*m−2, volunteered to participate in the study. They wore the selected criterion pedometer Yamax SW-200 (SW), which is considered the “gold standard”, the Garmin Vivofit (GV), Medisana ViFit (MV), Accupedo application (AC) and Pedometer 2.0 application (PD), for a 24-h period, under free-living settings. Data were analyzed using descriptive and inferential statistics.
Results: All devices and applications demonstrated strong correlations with the SW reference pedometer (IC = 0.86 – 0.94), but often large mean differences. Significant differences were observed among the five pedometers (F = 5.21, p = 0.01). Only the PD counted a similar number of steps to the SW (F = 0.57, p = 0.47), even though it had a high Mean Absolute Percent Error (MAPE). The three remaining pedometers significantly overestimated counted steps, with the AC application being the least accurate [F = 11.92, p < 0.01; MAPE = 36.15%].
Conclusions: The results of the present study showed favorable outcomes for the estimation of steps per day for the PD application in healthy and normal weight people. The two piezo-electric pedometers (GV and MV) appeared to give similar values; however, these values consistently overestimated step counts compared with the criterion pedometer. Taking into account that the PD application is free of charge and feasible to use, the results demonstrate good potential for future use in free-living settings.
Regular aerobic physical activity increases exercise capacity and physical fitness, which can lead to many health benefits and overall health improvement.1 Twenty minutes of daily fast walking can reduce the risk of premature death by 16% to 30% for both healthy and overweight adults.2 In order to assist people in increasing their physical activity, low-cost techniques are required.
The last decade’s technological advances have spurred a continuously increasing interest in objective monitoring of physical activity with the use of wearable devices. Even though an increased accuracy is important in some lines of research, a balance between precision, feasibility and low-cost monitoring technologies is recommended.3
Pedometers and consumer-level activity trackers have been an effective, low-cost tool used by many practitioners and scientists. Smart wearable sensors are adequate for preventive methods in many facets of medicine, such as cardiopulmonary, vascular, endocrine, rehabilitation and for monitoring health.4 Furthermore, recent research on mobile phone technology has reported on the feasibility and efficacy of phone applications in interventions for weight management, through behavior change of physical activity and dietary habits.5 However, for the majority of these new monitors and smartphone applications, limited evidence exists regarding their validity and reliability.
A systematic review of 22 studies on the validity and reliability of Fitbit and Jawbone activity trackers indicated high validity for step counting and lower validity for energy expenditure and sleep estimation.6 A recent study compared the validity of Fitbit Ultra, Nike Fuelband and Yamax SW-701 during a two minute walk test.7 It was found that Fitbit Ultra was the most accurate device (5% error), followed by Yamax (15% error), while Nike Fuelband was the least accurate (34% error). Lee, Kim and Welk8 examined the validity of eight consumer-level devices during a 69 minute laboratory protocol. They concluded that BodyMedia Fit was the most accurate device in estimating energy expenditure, followed by Fitbit Zip and Fitbit Ultra. Under free-living conditions, Ferguson and colleagues9 have recently published their results regarding the validity of seven activity monitors, worn by 21 adults who went about their daily life for 48 hours. They concluded that validity ranged widely between devices and the most accurate ones were Fitbit One, Fitbit Zip and Withings Pulse, especially in step counting with very strong correlations (r = 0.94–0.99).
Nowadays these monitors, coupled with smartphone devices equipped with a built-in accelerometer and the appropriate applications, have a vast potential to enhance user experience and utility. They are intended for individuals to log, record, track and evaluate their general fitness, health or wellness.10 According to the Food and Drug Administration (FDA),10 these mobile applications are categorized as mobile Health (mHealth) applications. Although they may meet the definition of a medical device, the FDA intends to exercise enforcement discretion for these applications because they pose a lower risk to the public.10 The report of Research2Guidance11 stated that mHealth and fitness applications numbered almost 97,000 in 2012, while among the 300 most downloaded applications, 102 were related to exercise and physical fitness.12 Fitness and wellness applications are the most rapidly growing mobile sector in the U.S.A., with a 134% rise of users between 2010 and 2011.13 Furthermore, their use may be a powerful tool to encourage and promote physical activity and health.14–16
Smartphone technology and mobile applications have recently been reviewed in physical activity and health promotion.17 Hekler and colleagues18 tested three Android smartphones (HTC MyTouch, Nexus One and Motorola Cliq) and concluded that they can provide comparable physical activity estimates to an ActiGraph, both in laboratory-based and free-living contexts, regardless of the activity classification algorithm used. On the other hand, Boyce, Padmasekara and Blum19 compared three pedometer iOS applications (iSteps Lite, Pedometer Lite and Lyr Free) with a conventional Yamax pedometer and found that they were inaccurate in step count and speed estimate at varying intensities of activity. Lastly, Bergman, Spellman, Hall and Bergman20 compared an iOS application (iPedometer) with a StepWatch 3 and direct observation and found similar results regarding the inaccuracy of the application, concluding that this is not a valid instrument for monitoring activity. To date, little evidence exists regarding the validity of Android physical activity applications and, as Bort-Roig and colleagues17 stated, ‘well designed studies are needed that comprehensively assess physical activity measurement accuracy’.
The purpose of this preliminary study was to compare the convergent validity for step counting between a spring-levered ‘gold standard’ pedometer, two piezo-electric pedometers and two free of charge pedometer applications for Android smartphones equipped with an accelerometer, under free-living conditions.
Participants were eligible for inclusion if they were aged 18 years or over, lived in metropolitan Athens, Greece, had an average BMI, were healthy, did not use medication that would affect their body weight or metabolism, were nonsmokers and could ambulate without walking aids. A convenience sample of seven males and four females, with an average age of 32.45 years (SD = 3.62 years), ranging in Body Mass Index (BMI) from 20.20 to 24.77 kg*m−2, volunteered to participate in the study. Prior to data collection, each individual provided informed consent in order to participate.
Two consumer-level physical activity monitors and two freeware Android applications, running on a Samsung Galaxy S4 smartphone, were examined. The Samsung device was selected because Samsung is currently considered the primary Android phone manufacturer.18 According to Del Rosario, Redmond and Lovell,21 the Samsung S4 is equipped with an accelerometer of resolution ±0.61 m·s−2, gyroscope ±0.06 °/s, magnetometer ±0.15 μT (x/y axis), ±0.30 μT (z axis) and barometer ±1 hPa.
Garmin Vivofit (GV): The GV (Garmin Ltd., USA), software version 3.7, is a wrist-worn triaxial accelerometer that can measure steps taken, distance traveled and calories burned, and can assess sleep patterns. This monitor is small (21 mm × 10.5 mm) and lightweight (25.5 g). It is also water resistant up to 50 m and has an extended battery lifespan of one year. GV can store data for one month and has a wireless function through a USB ANT™ Stick that makes it possible to upload data to the Garmin Connect™ (Web site) database, in order to store, analyze and compare training sessions. No research has been published on the GV.
Medisana ViFit (MV): The MV (Medisana AG, Neuss, Germany) is a triaxial accelerometer that can measure steps taken, distance traveled, total activity time, calories burned and percent of daily target achieved, and can also assess sleep patterns. This monitor is smaller (5.8 × 2.1 × 1.5 cm) and lighter (11.5 g) than GV. Its battery life is considerably shorter, about 10 days, and it is less expensive. MV can synchronize data to the VitaDock Online (Web site) via a USB cable or the accompanying VitaDock application for iOS and Android devices. No research has been published on the MV.
An internet search for all free pedometer applications was conducted on the Samsung Galaxy S4 smartphone using the standard ‘Play Store’ application. The Android operating system was chosen because it was the most widely used operating system22,23 and had the most freeware applications.24 Inclusion criteria for pedometer applications were:
- (1) Free of charge indefinitely after download. Applications with a free trial period of finite length were excluded.
- (2) Full and efficient functionality after downloading, without additional software download being necessary.
- (3) Functionality only through the built-in accelerometer and not using GPS or 3G signal.
- (4) Able to record the number of steps taken, average speed, total distance and energy expenditure.
- (5) Adjustable sensitivity settings.
- (6) Manual input of demographic and somatometric data (gender, age, weight, height and step length).
- (7) Among the most popular and most downloaded applications, according to users’ ratings and number of downloads from the Google Play Store (as listed in the Store on 23 March 2015).
Accupedo (AC): The AC application (Corusen LLC; http://www.accupedo.com), software version 5.0.9, of 2.64 MB size, is one of the most widely used pedometer applications and in 2011 was rated by users as the best pedometer (Consumer Reports, 2011). It had been downloaded 5,000,000 to 10,000,000 times and 28,034 users had given it an average rating of 3.8 on a five-point scale, of whom 12,771 rated it 5. It was available for both Android and iOS devices. The application met all inclusion criteria and could record in real time average speed, total distance, number of steps taken and activity energy expenditure. No validation research has been published on the AC.
Pedometer 2.0 (PD): The PD application (DSD; https://play.google.com/store/apps/details?id=step.counter.pedometer), software version 3.1.9, of 2.39 MB size, had been downloaded 500,000 to 1,000,000 times and 6,829 users had given it an average rating of 3.9 on a five-point scale, of whom 3,419 rated it 5. It was available only for Android devices. The application met all inclusion criteria and could record in real time average speed, total distance, number of steps taken and activity energy expenditure. Furthermore, it was the only application with a self-calibration capability, which was used to determine the appropriate sensitivity settings for every participant separately. No research has been published on the PD.
All devices could measure various physical activity parameters, however in the present study only the accuracy of counted steps for each device was examined. None of the devices had been previously validated.
The consumer-level devices were compared with the selected criterion spring-levered pedometer Yamax SW-200 (SW). This pedometer is considered the ‘gold standard’ instrument of measuring physical activity levels and steps taken in field settings25 and has been used as a criterion device in many studies.26 Furthermore this device’s accuracy is not affected by the BMI of the participants.27,28
All participants attended an appointment at which demographic data were obtained, with mass, height and step length measured following standardized procedures.29 Anthropometric measures were obtained at the beginning of the data collection session. Standing height was measured to the nearest 0.1 cm using a wall mounted Harpenden stadiometer (Harpenden, London, UK). Body mass was measured with participants in light clothes and bare feet on an electronic scale (Omron BF-511) to the nearest 0.1 kg.
Before data collection, all batteries were changed and a series of shake tests was performed in order to ensure correct calibration of the SW.30 For the calibration of the pedometer applications and the selection of the appropriate sensitivity level, the built-in function of the PD application was used, following the application’s recommendations.
The GV was fitted on the left wrist, while the smartphone and MV were placed in an elastic belt fitted tight around the waist. For every participant, demographic and somatometric data were entered manually in the smartphone applications and with the use of proprietary software in the pedometers, prior to data collection.
Participants were instructed to leave all devices on simultaneously for approximately 24 hours, excluding sleeping and showering. The wear period was not limited to a particular period of the week and no guidelines or restrictions on activity levels were provided, in order to ensure the study broadly represented free-living steps. Data collection took place in June–July 2015.
Data were extracted using the proprietary software for all consumer devices, in the same fashion that a consumer would utilize the software and were processed according to the manufacturers’ instructions. Participants were asked about any non-wear periods, and all indicated full compliance (that is, removal only for sleeping and bathing).
Participants’ demographic data were analyzed descriptively. Step count was determined by comparing the pedometers and applications with the SW. Descriptive analyses were conducted to examine the associations with the criterion measure. Intraclass correlations were computed to examine overall group-level associations. Mean absolute percent errors (MAPE) were also calculated to provide an indicator of overall measurement error. MAPE were computed as the average of absolute differences between the activity monitors and the SW value divided by the SW value, multiplied by 100. Validity on step counting was quantified using one-way repeated measures ANOVA, followed by Bonferroni corrections. To further evaluate individual variations in a more systematic way, Bland–Altman plots with corresponding 95% limits of agreement and fitted lines (from regression analyses between mean and difference) with their corresponding parameters (i.e., intercept and slope) were presented. All statistical analyses were computed with SPSS 21.0 and MedCalc 12.7.
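The MAPE definition above can be illustrated with a minimal Python sketch. It assumes that the percent error is computed per participant and then averaged, which is one reading of the description; the function name `mape` and the step counts are hypothetical and are not study data.

```python
from statistics import mean


def mape(monitor_steps, criterion_steps):
    """Mean absolute percent error of a monitor against the criterion (in %).

    Assumption: the error is computed per participant as
    |monitor - criterion| / criterion * 100 and then averaged.
    """
    if len(monitor_steps) != len(criterion_steps):
        raise ValueError("both sequences need one value per participant")
    return mean(
        abs(m - c) / c * 100.0
        for m, c in zip(monitor_steps, criterion_steps)
    )


# Hypothetical daily step counts for three participants (not study data).
sw = [8000, 5000, 10000]   # criterion pedometer
app = [8800, 4500, 12000]  # monitor under test

print(round(mape(app, sw), 2))
```

Because absolute differences are used, over-counting for one participant and under-counting for another do not cancel out, which is why a device can show a small mean difference and still have a large MAPE.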
Participants took an average of 7707 steps*day−1 according to the criterion pedometer.
Table 1 shows the intraclass correlation coefficients (IC), used as indicators of agreement, between SW and all activity monitors. Overall associations across the whole duration were consistently high, with all methods yielding large (> 0.86) and significant correlations with SW.
Table 1. Intraclass correlation matrix between devices and applications.
The MAPE values for the overall group comparisons (Figure 1) were lowest for MV (21.69%) and GV (22.91%), higher for PD (26.18%), and highest for AC (36.15%).
Figure 1. Mean absolute percent error (±SD) for all devices and applications.
Significant differences were observed between the five pedometers (F = 5.21, p = 0.01, η2 = 0.34). Only the PD did not differ significantly from the criterion SW (F = 0.57, p = 0.47, η2 = 0.05). The three remaining pedometers significantly overestimated counted steps. More specifically, GV (M = 9140, SD = 4561 steps) differed significantly from the criterion pedometer (F = 8.88, p = 0.01, η2 = 0.47), as well as MV (M = 9032, SD = 5051 steps; F = 5.49, p = 0.04, η2 = 0.35) and AC (M = 10298, SD = 5610 steps), which was the least accurate (F = 11.92, p = 0.01, η2 = 0.54).
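As a consistency check, the reported effect sizes can be recovered from the F values with the standard partial eta squared formula, η2 = (F × df_effect) / (F × df_effect + df_error). The degrees of freedom are not reported in the text; the sketch below assumes 11 participants, 5 devices, and no sphericity correction, and the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Assumed design (degrees of freedom are not stated in the source).
n_participants = 11
n_devices = 5

# Omnibus repeated measures test across the five devices.
overall = partial_eta_squared(
    5.21,
    df_effect=n_devices - 1,
    df_error=(n_devices - 1) * (n_participants - 1),
)

# Pairwise contrasts against the criterion pedometer: df = (1, n - 1).
reported_f = {"PD": 0.57, "GV": 8.88, "MV": 5.49, "AC": 11.92}
pairwise = {
    name: partial_eta_squared(f, df_effect=1, df_error=n_participants - 1)
    for name, f in reported_f.items()
}

print(round(overall, 2))
for name, value in pairwise.items():
    print(name, round(value, 2))
```

Under these assumed degrees of freedom, the rounded results agree with the η2 values reported above (0.34 overall; 0.05, 0.47, 0.35 and 0.54 for PD, GV, MV and AC).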
The Bland Altman plots (Figure 2) suggested that all devices and applications over-counted steps on average. The mean difference from the SW was largest for AC (difference = 2592 steps; 95% CI = −4264 to −919 steps), smaller for GV (difference = 1433 steps; 95% CI = −2504 to −362 steps) and MV (difference = 1325 steps; 95% CI = −2589 to −60 steps), and smallest for PD (difference = 517 steps; 95% CI = −2043 to 1008 steps), the only device whose interval included zero. The slopes for the fitted lines were not significant for GV (slope = −0.20, p = 0.102) and PD (slope = 0.04, p = 0.847), which suggests no significant patterns of proportional systematic bias with these devices. However, significant bias was observed for the MV (slope = −0.30, p = 0.016) and AC (slope = −0.41, p = 0.001).
Figure 2. Bland Altman plots for step estimation.
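The quantities behind a Bland Altman analysis (mean difference, 95% limits of agreement, and the fitted line of difference against mean) can be sketched as follows. This is a minimal illustration and not the SPSS or MedCalc procedure used in the study: the function name and data are hypothetical, the limits use the conventional mean ± 1.96 SD, and differences are taken as monitor minus criterion, so a positive bias means over-counting (the intervals reported in the text appear to use the opposite sign).

```python
from statistics import mean, stdev


def bland_altman(monitor, criterion):
    """Bias, 95% limits of agreement and fitted line of difference on mean."""
    diffs = [m - c for m, c in zip(monitor, criterion)]
    means = [(m + c) / 2 for m, c in zip(monitor, criterion)]

    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences

    # Ordinary least squares regression of difference on mean; a slope far
    # from zero suggests proportional systematic bias.
    mean_x = mean(means)
    sxx = sum((x - mean_x) ** 2 for x in means)
    sxy = sum((x - mean_x) * (d - bias) for x, d in zip(means, diffs))
    slope = sxy / sxx
    intercept = bias - slope * mean_x

    return {
        "bias": bias,
        "lower_loa": bias - 1.96 * sd,
        "upper_loa": bias + 1.96 * sd,
        "slope": slope,
        "intercept": intercept,
    }


# Hypothetical daily step counts for four participants (not study data).
sw = [4000, 6000, 8000, 10000]
app = [4500, 6400, 8700, 10400]

result = bland_altman(app, sw)
print(result["bias"], result["lower_loa"], result["upper_loa"], result["slope"])
```

Testing whether the slope differs significantly from zero, as reported in the text, additionally requires its standard error, which is omitted here for brevity.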
This preliminary study aimed to examine the convergent validity of two consumer-level pedometers and two Android smartphone applications for step counting, under free-living conditions in healthy and normal weight adults. To our knowledge, this is the first study which has tried to compare the accuracy of commercial pedometers and freeware smartphone applications under free-living settings. Furthermore no previous studies have assessed the validity of GV and MV. The monitors tested in the present study were not marketed as research-grade monitors, as in Lee, Kim and Welk’s8 study; however the present study partially supports the relative accuracy of the various monitoring technologies.
All devices and applications demonstrated strong correlations with the SW reference pedometer (IC = 0.86 – 0.94), but often large mean differences. The results showed favorable outcomes for the estimation of steps per day from the PD Android application: although it did not have the smallest MAPE relative to the SW (26.18%), it over-counted by only 517 steps on average and had the highest limits of agreement in the Bland Altman plots. The AC application produced the largest error in step counting, with 2592 more steps on average, the largest MAPE (36.15%) and the lowest correlation, indicating generally poor validity.
The two piezo-electric pedometers (GV and MV) appeared to give similar values; however, these values consistently overestimated step counts compared with the criterion pedometer, and their MAPE of about 22% was almost the same as that of the PD application. On the other hand, these two monitors had the highest correlations with the SW. It is hard to reconcile how monitors can produce high correlations and still have inaccurate group-level estimates. Therefore, caution should be used when interpreting these findings with the GV and MV.
Furthermore, the slopes of the Bland Altman plots showed significant systematic bias for the MV pedometer device and the AC application. This result further reinforces the validity of the PD application, whose slope was within the accepted limits of non systematic bias pattern. The GV also provided evidence of non proportional systematic bias pattern, rendering it a more valid alternative than MV.
Overall, the performance of these consumer-level monitors and applications is not very impressive, as none of the monitors and applications under examination had a MAPE near the upper acceptable limit of 3% used in previous studies.31 The MAPE of GV, MV and PD could be considered acceptable, considering the diverse range of activities a normal adult performs in a single day. It is possible that monitors overestimated some activities and underestimated others, as in Lee, Kim and Welk’s8 study. The two activity monitors and the PD application may be suitable for consumer use; however, they are not yet valid for use in research settings, due to their large MAPEs.
Finally, results must be viewed with caution, because larger datasets and more diverse samples are needed. The sample population of the present preliminary study was small and included only healthy, young individuals within the normal range of body weight. Therefore, we cannot generalize these findings to other age groups or body sizes. A further limitation was the use of free pedometer programs, while paid programs may be more valid or more functional. Further research is needed to examine these devices and applications across various activities and intensities. We also believe that the FDA should consider these applications as primary medical ones and establish stricter rules regarding their production and use, given their mass penetration in consumers’ downloading preferences. In future studies advanced piezo-electric pedometers, such as Fitbit or Withings Pulse, should be used as reference devices, because they tend to be more valid and accurate than spring-levered ones.7–9,19,32 A different methodological approach in free-living validation studies could also be to perform a validity test on the various devices under examination in semi-structured conditions and later use the most accurate device as the reference tool with the same sample.
In conclusion, the present study offers preliminary evidence for the validity of a freeware Android application in measuring total steps per day under free-living conditions. As suggested by Hekler and colleagues,18 Android smartphones may be an acceptable alternative for tracking the everyday physical activity of individuals. Taking into account that the PD application is free of charge and feasible to use, the results demonstrated good potential for its future use. The two physical activity monitors provided similar estimates for step counting; however, these estimates were inaccurate compared with the criterion measure. More research is clearly needed to establish the validity of these monitors and applications, not only in step counting but also in energy expenditure estimation, under various settings and with diverse samples. Nevertheless, people who own Android smartphones and would like to track their everyday physical activity have the opportunity to use this freeware application, although it may not yet be suitable for use in clinical and research settings.
No competing interests
All authors have completed the Unified Competing Interest form at www.icmje.org/coi_disclosure.pdf (available on request from the corresponding author) and declare: no support from any organisation for the submitted work; no financial relationships with any organisations that might have an interest in the submitted work in the previous 3 years; no other relationships or activities that could appear to have influenced the submitted work.
2. Ekelund U, Ward HA, Norat T, et al. Physical activity and all-cause mortality across levels of overall and abdominal adiposity in European men and women: the European Prospective Investigation into Cancer and Nutrition Study (EPIC). Am J Clin Nutr 2015;101(3):613–21. doi:10.3945/ajcn.114.100065.
9. Ferguson T, Rowlands AV, Olds T, et al. The validity of consumer-level, activity monitors in healthy adults worn in free-living conditions: A cross-sectional study. Int J Behav Nutr Phys Act 2015;12:42.
10. Food and Drug Administration. Mobile medical applications: Guidance for industry and Food and Drug Administration staff. 2015. Available at http://www.fda.gov/downloads/MedicalDevices/…/UCM263366.pdf (accessed 10 November 2015).
12. Dunbrack LA, Duffy J. The second wave of clinical mobility: Strategic solution investments for mobile point of care in Western Europe. Framingham, MA: IDC Health insights. 2012. Available at http://www.ehealthnews.eu/images/stories/pdf/idc_the_second_wave_of_clinical_mobility.pdf (accessed 1 November 2015).
13. Comscore. Mobile Year in Review 2011. 2012. Available at http://www.comscore.com/Insights/Presentations_and_Whitepapers/2012/2012_Mobile_Future_in_Focus (accessed 5 November 2015).
15. Middelweerd A, Mollee JS, van der Wal CN, et al. Apps to promote physical activity among adults: A review and content analysis. Int J Behav Nutr Phys Act 2014;11(97):1–9. doi:10.1186/s12966-014-0097-9.
18. Hekler EB, Buman MP, Grieco L, et al. Validation of physical activity tracking via Android smartphones compared to ActiGraph accelerometer: Laboratory-based and free-living validation studies. JMIR mHealth uHealth 2015;3(2):e36.
24. Austin S. The surprising numbers behind Apps. 2013. Available at http://blogs.wsj.com/digits/2013/03/11/thesurprising-numbers-behind-apps/ (accessed 9 November 2015).