Prediction of Snacking Behavior Involving Snacks Having High Levels of Saturated Fats, Salt, or Sugar Using Only Information on Previous Instances of Snacking: Survey- and App-Based Study


Introduction

General Background

Noncommunicable diseases, such as cardiovascular disease, cancer, and chronic respiratory disease, are currently the biggest threats to health []. These are greatly influenced by behaviors, such as poor diet, physical inactivity, and smoking (eg, [,]). Modification of the obesogenic environment would be most effective at bringing about change on a large scale []. However, such an environmental change is unlikely to occur in a short period. Therefore, most related interventions seek to change people’s responses to their environments. Approaches to achieve these changes have ranged from mass media campaigns at the population level to group-based and individual healthy lifestyle coaching. Over the last decade, digital technologies for supporting a healthy lifestyle have been on the rise, but there is still room for improvement [-].

A potential avenue for improving the effectiveness of technology-assisted interventions is providing just-in-time adaptive interventions (JITAIs). JITAIs are designed to predict the points at which a person is likely to be in most need of and most receptive to reminders or assistance for changing a target behavior []. For example, a person engaged in an effort to quit smoking could be reminded of their goal at the point at which they are most likely to lapse [,]. This may be effective because motivation for a particular behavior can vary over time []. Additionally, health-related behaviors may be elicited by cues in the environment (eg, a certain time of the day or walking past a bakery store) and be largely habitual [-]. This can make change difficult unless behaviors elicited by these cues are disrupted. Helping a person identify those cues may reduce the undesired behavior in a number of ways, for example, by helping the person avoid the cues, adjust their behavior, or respond to the cues in a different way [,]. JITAIs could help people achieve such goals.

There is increasing research into the use of JITAIs (eg, [,]). Their development has been greatly boosted by the widespread adoption of powerful personal smartphones. For example, in 2021, 88% of all adults (aged 16 years or older) possessed a smartphone in the United Kingdom [], with similar statistics throughout Western Europe and North America. Modern smartphones allow increasingly complex data collection from their owners, including date, time, ambient temperature, and location. Furthermore, owing to the high computational capabilities of modern smartphones, many of the required computations (eg, for prediction) can be carried out locally, without the need for distant servers and an active internet connection.

Snacking on foods and drinks that have high saturated fats, salt, or sugar (HFSS) is the focus of this research. Snacking can be defined as “food and beverage intake between meals, including products, such as potato chips, chocolate, and soft beverages” []. HFSS foods contribute to poor health [], and many snacks fall into this category. Indeed, a previous study found that people who are overweight or obese eat an average of 1.3 snacks per day, with 79% of these snacks being high in either fat or sugar []. However, reducing HFSS snacking poses many challenges, particularly because it can be triggered by emotional or environmental factors []. It can also occur in an automatized (reflexive) way, making it less amenable to conscious control efforts [-]. Additionally, snacks that are high in sugar may make a person crave more sugary foods, because consumption of these snacks can lead to a spike and subsequent dip in blood sugar levels [,]. Indeed, feelings of hunger and food preoccupation are key reasons cited for snacking [].

Several sophisticated approaches to predict aspects of maladaptive eating behavior have already been proposed, using ecological momentary assessments (EMAs). For example, Arend et al [] studied binge eating episodes in clinical participants. The authors reported excellent predictive accuracy based on an EMA protocol with 36 items, including emotional and environmental variables. Based on initial testing, they were subsequently able to identify a smaller subset (n=5-9) of highly valid individualized predictors (EMA items), thereby reducing the need for an extensive EMA protocol. Forman et al [] similarly investigated the predictive adequacy of a large number of variables concerning dietary lapses. Some questions, such as those relating to cravings or affect, were answered as many as 4 times a day, and others were answered only once. Kaiser et al [] used data from 2 weeks of EMAs on stress and emotion, together with sensor data, to predict food cravings with reasonable accuracy. Finally, Spanakis et al [] tracked several individual states, such as emotions and cravings, which might predict “unhealthy eating events,” including unhealthy snacking and other events, such as consumption of high-calorie food as part of a meal, in people who were overweight or obese. Participants were questioned as many as 10 times a day. Based on the collected data, a bottom-up clustering algorithm was used to arrive at 6 different subgroups of participants characterized by a specific pattern of eating behavior (eg, eating in the evening at home), to enable tailoring the intervention to a specific profile, which was implemented in a randomized controlled trial [].

Research based on EMAs is valuable because the prediction of a behavior as complex as eating can potentially only be accomplished by considering a multitude of variables, including environmental, psychological, and physiological variables. Moreover, the assessment of these variables takes place in daily life, contributing to ecological validity. However, health interventions based on EMAs typically require long periods to train the machine learning (ML) algorithms for prediction, as well as considerable commitment and motivation from participants. There is therefore interest in exploring whether the prediction of a particular behavior can proceed based on information that is both minimal and easily available, without much effort from participants.

In the domain of mental health, research on so-called digital phenotyping has recently started to develop. Digital phenotyping uses a smartphone as a tool for objective and ecologically valid measurements. This method includes passively obtained data, without needing input from the user. Digital biomarkers, such as sensor technology, geolocation, characteristics of voice and speech, and human-computer interaction, are obtained [-]. Through this method, a pattern might be discovered over several weeks (eg, the user is taking too long to respond to messages, is browsing online until late at night, and is mostly at home). This can lead to suspicion that things are not going particularly well for the user, and the suspicion may be increased by the tone, timing, and content of the user’s social media posts. Research has shown that mood states in mood disorders can be predicted using digital biomarkers based on the circadian rhythm [].

Prediction of HFSS Snacking Using ML

This research project takes a step in the direction of digital phenotyping for the prediction of unhealthy eating behavior. Specifically, to what extent can HFSS snacking be predicted based only on prior HFSS snacking combined with information that can be automatically or easily collected from a smartphone (date, time, and location)? However, this endeavor might fail because of the temporal resolution required: if high precision is needed, failure will be inevitable owing to the intrinsic stochasticity of eating behavior. Another problem is the degree of accuracy that can be achieved after a modest training period, because participants may not have the patience for extended training (eg, Tulu et al []), and even with mostly passively collected data, participants still need to indicate instances of HFSS snack consumption.

On the positive side, ML has progressed to such an extent that modern algorithms have many characteristics desirable for the present application, including the capacity to deal with sparse data and efficient learning of time series. For example, as an alternative to recurrent neural networks, which are well suited to time series data, ensemble methods have a good ability to deal with sparse data by reducing the impact of noise and outliers []. Therefore, our aim was to compare a selection of ML algorithms, with a view to identify a good algorithm for predicting instances of HFSS snacking based on only prior instances, which were coded in terms of time, day, and location (the latter was encoded in terms of broad categories). The algorithms were chosen to reflect complementary characteristics and be representative of the range of good options currently available. Random forest regressor (RFreg), Extreme Gradient Boosting regressor (XGBreg), feed forward neural network (FFNN), and long short-term memory (LSTM) were considered.

RFreg is a tree-based ensemble method that trains many decision trees in parallel using bootstrapping followed by aggregation, collectively referred to as bagging. Bootstrapping involves training each of the individual decision trees (between 100 and 300 in the present case) on a different random subset of the dataset, using various subsets of the available features []. Aggregation means that the outputs from the distinct decision trees are combined into a single decision. Because of this ensemble learning, RFreg is considered to generalize well, be resistant to overfitting, and produce high prediction accuracy [].

XGBreg is another tree-based ensemble method, which uses a form of gradient boosting, relying on the idea that correcting the model’s earlier errors and learning from them can help to improve performance in the future. This is a sequential ensemble learning method where the model tries to improve performance with each iteration []. Both RFreg and XGBreg are ensemble learning techniques, but the former builds multiple trees in parallel and then employs an average for prediction, while the latter constructs 1 tree at a time, in a way that is informed from the errors of the previous tree [].
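To make the contrast concrete, the following is a minimal sketch of how the 2 ensemble regressors can be instantiated with scikit-learn and the xgboost package, using the hyperparameter values later reported in Table 1. The training data here are random placeholders standing in for one participant's encoded features and snack-to-snack intervals; this is illustrative, not the study's actual code.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from xgboost import XGBRegressor

# Placeholder data: 200 snacking instances, 11 encoded feature columns
rng = np.random.default_rng(0)
X_train = rng.random((200, 11))
y_train = rng.random(200) * 600  # minutes until the next HFSS snack

# Bagging: 150 trees grown in parallel on bootstrap samples; predictions are averaged
rfreg = RandomForestRegressor(n_estimators=150, min_samples_split=2, max_depth=None)

# Boosting: 100 trees grown one at a time, each correcting the errors of the ensemble so far
xgbreg = XGBRegressor(n_estimators=100, max_depth=6, learning_rate=0.3, gamma=0.05)

rfreg.fit(X_train, y_train)
xgbreg.fit(X_train, y_train)
```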

FFNN is a relatively simple type of artificial neural network, in which information is processed in 1 direction, from input units to units in one or more hidden layers to output units, such that there are no cycles in the connections between the nodes. Hidden layer units apply nonlinear functions to their input, enabling an FFNN to learn complex associations between input and output []. An FFNN is trained using gradient descent methods, specifically error backpropagation [].
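As an illustration, a network of this kind can be sketched in Keras, assuming the architecture later listed in Table 1 (4 hidden ReLU layers of 32, 32, 8, and 8 units, with dropout of 0.5 after the 2nd and 4th hidden layers, and ReLU in the output layer). The optimizer, loss, and feature count are illustrative assumptions, not reported choices.

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Dropout

n_features = 11  # assumed: 3 location + 7 day-of-week indicators + 1 numeric time bin

ffnn = Sequential([
    Dense(32, activation="relu", input_shape=(n_features,)),
    Dense(32, activation="relu"),
    Dropout(0.5),                 # after the 2nd hidden layer
    Dense(8, activation="relu"),
    Dense(8, activation="relu"),
    Dropout(0.5),                 # after the 4th hidden layer
    Dense(1, activation="relu"),  # output: minutes until the next HFSS snack
])
ffnn.compile(optimizer="adam", loss="mae")  # trained via error backpropagation
```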

Finally, the LSTM model is a kind of recurrent neural network with an architecture designed for learning long-term dependencies in time series []. A recurrent neural network includes cycles that feed network activations from earlier time steps as inputs to determine predictions at the present time step. As a result of these recurrent connections, the model creates an implicit recollection of past occurrences, stored in its hidden layer []. Recurrent models can process contexts of arbitrary length; the LSTM model has a specific structure designed to store values for longer compared to standard recurrent neural networks. The LSTM model is the only recurrent model that was employed, with the other models operating on a fixed context.
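A corresponding Keras sketch for the recurrent model, assuming windows of 4 time steps and the values later listed in Table 1 (an LSTM layer of 128 units, dense hidden layers of 64 ReLU units, and dropout of 0.5 after the 3rd hidden layer); the exact layer arrangement, optimizer, and loss are assumptions made for illustration.

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout

n_features = 11  # same encoded features as assumed for the fixed context models

lstm = Sequential([
    LSTM(128, input_shape=(4, n_features)),  # window of 4 past observations
    Dense(64, activation="relu"),
    Dense(64, activation="relu"),
    Dense(64, activation="relu"),
    Dropout(0.5),                 # after the 3rd dense hidden layer
    Dense(1, activation="relu"),  # minutes until the next HFSS snack
])
lstm.compile(optimizer="adam", loss="mae")
```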

Summary of the Purpose and Aims

Despite much interest in predicting eating behavior, there has been less work on prediction involving minimal data. Therefore, the feasibility of predicting HFSS snacking using only previous instances of snacking collected across a “practical” length of time (practical in the sense of participant recruitment and engagement) is unclear. Additionally, there is a wide range of ML algorithms. It is of interest to explore the quality of prediction against the assumed characteristics of HFSS snacking behavior, such as sparseness and high noise.

With these considerations in mind, this study aimed to (1) develop an app, which would enable data collection on HFSS snacking with minimum effort from participants; (2) define a sensible problem for characterizing HFSS snacking behavior (prediction of HFSS snacking using previous instances of snacking and limited information [time, location, and day of the week]); (3) apply a range of standard and unmodified ML algorithms to the problem; (4) compare the performance of these algorithms to each other and to some baseline statistical models; and (5) consider whether the task of predicting HFSS snacking from minimal information is feasible and suggest some directions for future work.


Methods

Outline of Data Collection

Data on HFSS snacking were collected to examine which ML algorithm was best able to predict such behavior based on only previous instances and minimal information (time, location, and day of the week). We first describe the procedures employed to collect the data.

Data were collected in 2 parts. First, a survey was created to explore various assumptions about the target behavior of interest. Second, we implemented an app to obtain data on HFSS snacking. Participants who reported having 2 or more HFSS snacks daily in the first part were invited to the app-based second part of the study, which involved monitoring participants’ snacking behavior for 28 days. The 2 parts of the study are outlined in . This dataset is referred to as the “UK dataset.” To increase the ecological validity of our work, we also employed a cleaned version of a similar dataset collected in the Netherlands, which has been described by Spanakis et al []. The dataset in the report by Spanakis et al [] is referred to as the “Dutch dataset.”

The data collection details below concern the UK dataset; corresponding details for the Dutch dataset can be found in the report by Spanakis et al [].

Figure 1. Screening and data collection processes for the UK dataset. Initially, demographic and lifestyle (weight/diet) surveys were used to determine eligibility for the trial. In part 2 (the app-based part of the study), participants were asked to record snacking habits for a target period of 4 weeks.

Ethical Considerations

For the UK dataset, ethics approval was granted by the Psychology Research Ethics Committee at City, University of London (reference: PSYETH (S/L) 17/18 87). The informed consent form for the first study part (survey part) informed participants that the study was about snacks that are high in sugar, salt, and fat and that the study would be conducted in 2 parts (a brief first part concerning some general questions about eating behavior and a 28-day second part, which would involve participants recording occasions of snacking on their smartphones via an app). The informed consent form for the second part (app-based part) explained in detail about HFSS snacks and informed participants that they would have to record their HFSS snack consumption on an app for 28 days. The informed consent form provided some information about the next steps, if the individual agreed to participate, and the amount of compensation.

All data were collected anonymously. To protect participant anonymity, the Data Protection Office at City, University of London, which is registered with the Information Commissioner’s Office (registration number: Z8947127), was engaged to confirm that any personally identifiable information (PII) was securely collected. Data collection involved 3 companies external to City, University of London: Dev Technosys implemented the app, Linode provided virtual private servers, and Twilio provided programmable messaging services. It was ensured that these companies were compliant with the General Data Protection Regulation (GDPR) and used encrypted communication.

Participants were compensated £2.50 (US $3.13) for their time for the survey part of the study and £16 (US $20.00) for the app-based part.

For the Dutch dataset, ethics approval for the study was provided by the Faculty of Psychology and Neuroscience of Maastricht University in 2013. While the data were not open access, the principal investigators of the study stated that the data were available upon reasonable request [].

UK Dataset: Survey-Based Part of the Study

Data for the first part of the study were collected using a self-administered survey designed using Qualtrics []. The survey was run on the crowdsourcing platform Prolific Academic []. The study was made available on Prolific Academic, and prospective participants decided whether to take part. Recruitment opened on December 23, 2019, with a target sample of 200 participants, and was closed on February 13, 2020, when this participant number had been reached.

There are no established guidelines for the minimum sample size required for an ML assessment, and for a promising but still experimental proposal, a particular ML algorithm can be considered to work effectively up to a certain error threshold []. Based on previous related research (Spanakis et al [] recruited 100 participants for an ML study broadly similar to our study), the expected time to train all 4 ML models for each participant, and our budget for paying participants to take part for 28 days, we aimed for approximately 100 participants for the ML assessment. The target of 200 participants for the survey part of the study was an estimate of how many participants would be needed to identify enough participants with a reasonably high intake of HFSS snacks, who would be willing to take part in the app-based part of the study across 4 weeks. The survey part had no inferential value.

Only UK citizens between the ages of 18 and 60 years were allowed to participate in the study. Participants were excluded if they did not have a smartphone or stated that they were unwilling to participate in the follow-up study (ie, the app-based part). Additionally, we only recruited participants having a T-Mobile, O2, Vodafone, Three, or EE mobile phone service, since at the time of running the study, these were the only SIM card providers in the United Kingdom providing services compatible with the Twilio messaging service, which we employed in the second part of the study. After these exclusions, there were 184 participants for the first part of the study.

The survey consisted of questions concerning basic demographics and motivation for healthy eating (Tables S1 and S2 in ). Specifically, there were 10 questions covering gender, ethnicity, employment status, weight, height, HFSS snacking habit, whether the person is trying to lose weight (2 questions), and whether the person is trying to eat in a healthy way. Participants were not allowed to skip questions. The duration of the survey was about 15 minutes.

UK Dataset: App-Based Part of the Study – Snack Tracker App

We developed the Snack Tracker app to record unhealthy snacking for this project. The app was designed by SD, and the coding was undertaken by Dev Technosys [], a company specializing in app development. We created versions of the app for both Android and iOS devices. However, the app is no longer available for use.

Mobile app development is usually divided into 2 main components: frontend and backend (). Regarding the frontend (the user interface), the app was designed to be easy to use, with a simple sequence of screens ( and Figure S1 in ). The current date and day of the week were automatically captured for each recording to minimize user effort. The app worked online, allowing users to log in and record any snacks they had eaten. In the case of connection loss, the app allowed users to save their records, and the app transmitted the data to the server once the mobile device was connected to the internet.

Figure 2. The frontend (user interface) and backend (data storage) of the Snack Tracker app used for participants in the second part of the study to record their intake of snacks with high saturated fats, salt, or sugar over a period of 4 weeks.

The app frontend was coded using React Native, an open-source JavaScript framework for writing iOS and Android apps. All operations performed by app users and project admins were handled by REST application programming interfaces (APIs) created in Node.js, which allows JavaScript to run on the server side. The REST APIs were used to communicate with the database of participant data (MongoDB) for store and retrieve operations (ie, these APIs acted as a bridge between the app frontend and the database where participant data were stored). The server used in this project was a cloud-based server (Linode), which controlled all operations and allowed management of the app environment.

The backend part of the app concerned storing the data and user credentials, as well as offering a web-based admin panel to manage the project. Participants accepting the invitation to the app-based part of the study were registered manually using the admin panel to prevent random users from recording data on the app. The backend also handled initial user login and user requests to save and record another snack, go back and edit, or just save an entry ().

Figure 3. The frontend of the Snack Tracker app: (A) splash screen, (B) login screen, (C) home screen, (D) new snack screen, (E) time recording screen (time picker), (F) location recording screen, (G) recording save screen, and (H) review recording summary.

The app development included designing and delivering messages as SMS text messages and app push notifications to keep users engaged and remind them to record snacks (). Push notifications could be offered even if participants had no internet access. For automated SMS text messages, Twilio, a cloud communication tool, was used. The software could programmatically send SMS text messages using Twilio’s web service APIs. For more information, see [].

Figure 4. Examples of reminders sent to participants to “nudge” them to record their intake of snacks with high saturated fats, salt, or sugar during the second part of the study (app-based part).

UK Dataset: App-Based Part of the Study – Data Collection

Among the 184 participants who completed the survey-based part of the study, those who reported consuming fewer than 2 HFSS snacks per day were excluded, leaving 170 participants. We initially invited 100 of these participants to the app-based part of the study, with invitations sent in 3 stages between February and March 2020. In the first stage, 68 participants accepted the invitation; in the second and third stages, 45 and 25 participants, respectively, were invited, with 27 and 16, respectively, accepting, resulting in an overall sample of 111.

Participants were instructed to participate for 28 days. However, exact start and end dates of the study differed between participants, as was expected.

During the study, participants were messaged via Prolific if they made only 1 recording or no recordings on any day and if they had made multiple recordings (two or more) each day for a period but then abruptly made a reduced number of recordings. These messages were sent in part to ensure that there were no technical problems with the Snack Tracker app. Additionally, a few Prolific messages were sent to randomly selected participants to check that the reminders regarding snack recording () were received as intended. Informal feedback from participants throughout the study did not indicate any technical problems.

For the purpose of the study, an HFSS snack was defined as any food eaten between the main meals, which was high in either saturated fat, salt, or sugar. Specifically, participants were informed that we were interested in “snacks high in sugar, salt, or fat, which includes… sugary snacks… salty and fatty snacks.” For each of these categories, several examples common in the United Kingdom were provided (eg, biscuits, cake, chocolate, crisps, salted nuts, and salted popcorn). shows exactly how HFSS snacks were explained to the participants. Considering the instructions and the several examples of typical items provided, it is likely that participants did not have difficulty with the definition of HFSS snacks. There was no poststudy feedback indicating any such difficulties. In addition, in keeping with the broad aim of the study (to help people manage their own behavior), there were no strict criteria on what participants should or should not define as an HFSS snack. It was not considered that the app needed to be independently validated, which is in line with similar work, including the study by Spanakis et al [].

Whenever participants had an HFSS snack, they were asked to record it in the Snack Tracker app. In this mode, they only had to mention the location (coded as home, place of work, and other; although a GPS-based method would aid in this process, this method was not possible in our app) as the time and day of the week were saved automatically. If participants had 2 HFSS snacks at the same time, this would be recorded as a single snacking instance. Throughout the study, participants received 3 kinds of notifications (). Daily reminders were sent at 7 PM, asking participants to record any instances of HFSS snacking that they missed. In this mode of the Snack Tracker app, participants had to manually indicate the time and day of the week as well as the location. Additionally, at the end of each week, participants were sent a notification (instead of the daily notification) to keep them engaged and provide information or ask questions (eg, to ask about any technical problems; ). The final notification was sent at the end of the 28-day period (from the first app recording) to instruct the participants that the study had ended, thank them for their participation, and offer them a completion code for Prolific Academic.

Dutch Dataset: Brief Notes

The Dutch dataset was collected by a research group at the Faculty of Psychology and Neuroscience at Maastricht University (Study I in the report by Spanakis et al []). The dataset was collected with an app called Think Slim. The sample consisted of 57 participants who were overweight and 43 participants with a healthy weight in the Netherlands. This study employed EMAs. Data on 15 variables were collected (eg, mood, activity, and location), and the variables were monitored across 8 to 10 measurements per day depending on the participants’ waking and sleeping times. Spanakis et al [] aimed to study eating behavior in general (including main meals and snacks, both unhealthy and otherwise), with a view to identify participant clusters and provide adaptive feedback for improving eating behavior. Accordingly, we extracted measurements corresponding to HFSS snacks from the general eating behavior data [] to create the Dutch dataset. Only “eating moments” were used from the original dataset []. Eating moments were event contingent (ie, they were initiated by users whenever they were about to eat something). Therefore, users were not prompted or reminded to record their eating behavior.

The Dutch dataset was already present when the data collection for the UK dataset was planned, and there is a reasonable question as to why we did not adopt a sampling approach similar to that used for the Dutch dataset. In brief, the purpose of the UK dataset was different from that of the Dutch dataset. For the UK dataset, the priority was to collect snacking information in a way that was as nonintrusive as possible. Moreover, we wanted to focus on individuals with some snacking behavior in the first place, since sparsity of data would complicate the application of ML. On the other hand, the purpose of the Dutch dataset was to explore prediction using a range of variables, with a focus on eating behavior in general and not just snacking, and it aimed to understand differences in daily lifestyle between people with a healthy weight and people who are overweight. Accordingly, the extent of snacking behavior was less relevant.

Data Preprocessing

Regarding the UK dataset, it was decided to exclude participants who did not remain in the study for its intended duration.

Some basic operations were carried out to remove erroneous entries and ensure consistency between the UK and Dutch datasets. In the UK dataset, we removed missing and duplicate entries and ensured that measurement units for height and weight were the same across participants. In the Dutch dataset, we translated data from Dutch to English and manually reclassified 950 locations, originally in free text, into the 3 categories employed in the UK dataset. Finally, HFSS snacks were extracted from the general information about meals or snacks. This involved focusing on data concerning food items, such as burgers, chocolate bars, strawberries, and pasta, and identifying the items considered as snacks. Each snack was manually categorized as healthy or unhealthy (HFSS).

Data were coded in terms of the following 3 features: location (home, place of work, and other), day of the week (Monday to Sunday), and time. The time variable was encoded as a time bin feature, with each day divided into either 4 large or 12 small time bins. In the former case, the time bins were early morning (midnight to 5:59 AM), morning (6:00 AM to 11:59 AM), afternoon (noon to 4:59 PM), and evening (5:00 PM to 11:59 PM). In the latter case, the first time bin started at midnight, and each time bin had a 2-hour duration, resulting in a total of 12 time bins. Models were trained and evaluated for these 2 encodings of the time variable.
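As an illustration, the mapping from a recorded timestamp to the 2 time bin encodings can be expressed as follows; time_bin is a hypothetical helper written for this sketch, not part of the study's code.

```python
from datetime import datetime

def time_bin(ts: datetime, n_bins: int = 4) -> int:
    """Map a timestamp to a 0-based time bin index."""
    if n_bins == 12:
        return ts.hour // 2  # 2-hour bins: 00:00-01:59 -> 0, ..., 22:00-23:59 -> 11
    if ts.hour < 6:
        return 0             # early morning: midnight to 5:59 AM
    if ts.hour < 12:
        return 1             # morning: 6:00 AM to 11:59 AM
    if ts.hour < 17:
        return 2             # afternoon: noon to 4:59 PM
    return 3                 # evening: 5:00 PM to 11:59 PM

time_bin(datetime(2020, 3, 2, 19, 30))             # 3 (evening)
time_bin(datetime(2020, 3, 2, 19, 30), n_bins=12)  # 9 (the 6:00 PM to 7:59 PM bin)
```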

As is common in ML, for the 2 nominal variables in the datasets (location and day of the week), one-hot encoding was used, converting each variable to separate variables that take the value 1 or 0 to indicate the presence or absence of different levels of the variable []. The time bin variable was kept in its numerical format.
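For example, with pandas, the 2 nominal variables can be expanded into indicator columns while the time bin stays numeric; the column names below are assumptions for illustration.

```python
import pandas as pd

# Toy example: one row per snacking instance (column names are assumed)
df = pd.DataFrame({
    "location": ["home", "work", "other"],
    "day_of_week": ["Mon", "Sat", "Sun"],
    "time_bin": [1, 3, 2],
})
encoded = pd.get_dummies(df, columns=["location", "day_of_week"], dtype=int)
# "location" becomes location_home / location_other / location_work (values 0 or 1),
# likewise for day_of_week; "time_bin" is kept in its numerical format
```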

Finally, although additional data were collected from participants in the United Kingdom, including motivation for healthy eating, these additional variables were not considered in this study for practical reasons and for meeting the focus of snacking prediction using minimal information.

Computational Methods

We employed 3 fixed context models (RFreg, XGBreg, and FFNN) and 1 recurrent model (LSTM). Fixed context models require a fixed input size, which can be achieved by windowing, that is, dividing a longer sequence into smaller, fixed-length sequences. Owing to the sparseness of our data, we treated each individual data point as a separate window for prediction, rather than grouping points into larger sequences; this approach is often used when the data are not abundant enough to create longer sequences, and it simplifies the modeling process. Recurrent models can process an input sequence of arbitrary length. They function in cycles, during which the activation from the previous time step is used as input (together with other information) for the current time step. For the single recurrent model in the study, we used observation sequences of 4 time steps. In our study, the day was divided into different time bins, including a case of 4 time bins (early morning, morning, afternoon, and evening). While the 4 time steps in this model do not directly correspond to these day time divisions, they broadly align with them, and this was sufficient for our analysis. A sketch of the windowing step is shown below.
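The sketch below illustrates windowing for the recurrent model under the assumptions above: consecutive observations are stacked into overlapping windows of 4 time steps, with the observation following each window serving as the target. The function name is hypothetical.

```python
import numpy as np

def make_windows(X: np.ndarray, y: np.ndarray, steps: int = 4):
    """Return (windows, targets): each window holds `steps` consecutive
    observations, and the target is the value that follows the window."""
    xs, ys = [], []
    for i in range(len(X) - steps):
        xs.append(X[i : i + steps])   # shape per window: (steps, n_features)
        ys.append(y[i + steps])
    return np.array(xs), np.array(ys)
```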

Some models can benefit from standardization, normalization, and dropout regularization more than others, and these techniques were explored accordingly. Feature scaling (such as standardization or normalization) was considered for the FFNN and LSTM models. Neural networks often benefit from having consistent scales across features. This is because neural networks learn complex relationships and patterns among features, which can be influenced by the differing scales of features, if not appropriately managed [,]. On the other hand, tree-based models, such as RFreg and XGBreg, operate by splitting nodes based on feature thresholds and thus are less sensitive to feature scaling. For the FFNN model, we standardized features using the formula zi = (xi – µ) / σ, where µ and σ are the mean and SD values of the variable, respectively; zi indicates the standardized value; and xi is the original value. For the LSTM model, we employed the normalization yi = (xi – min(X)) / (max(X) – min(X)).
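Both transformations are available in scikit-learn; a minimal sketch is given below (fitting on the training data only, so that no information leaks from the test set). X_train and X_test are placeholders for the encoded feature matrices.

```python
from sklearn.preprocessing import MinMaxScaler, StandardScaler

std = StandardScaler()  # z_i = (x_i - mu) / sigma, used for the FFNN
X_train_std = std.fit_transform(X_train)
X_test_std = std.transform(X_test)    # reuse the training mean and SD

mm = MinMaxScaler()     # y_i = (x_i - min(X)) / (max(X) - min(X)), used for the LSTM
X_train_mm = mm.fit_transform(X_train)
X_test_mm = mm.transform(X_test)      # reuse the training min and max
```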

Dropout was also explored. It is a regularization technique that randomly sets to zero (drops out) a percentage of the features during training []. Dropout regularization introduces randomness during training and prevents overspecialization of units, which can improve generalization. This is important for neural networks, as they can overfit the training data. Accordingly, we evaluated and ended up retaining dropout regularization for the FFNN and LSTM models. Tree-based models use other (inherently incorporated) mechanisms, such as feature selection, bootstrapping, and ensemble aggregation, to manage overfitting.

We executed a total of 5 experiments for each model (5-fold cross-validation) to evaluate their performance. In each experiment, the models were evaluated based on their predefined configurations (). Both random forest and XGBoost are based on decision trees, and illustrates their differences as well as some of the main parameters.

Table 1. Hyperparameters for each of the machine learning algorithms used to analyze the data collected about consumption of snacks with high saturated fats, salt, or sugar during the app-based part of the study.

Model and hyperparameter | Value or range

RFreg^a
  nE, number of estimators (trees in the forest) | 150
  Minss, minimum number of samples an internal node must cover to consider splitting (when a node has fewer samples than this value, it is regarded as final and called a terminal node or leaf) | 2
  Maxdepth, maximum number of splits that each tree is permitted to execute | Not user specified (nodes are expanded until leaves are pure or contain fewer samples than Minss)

XGBreg^b
  nE, number of boosting rounds (estimators) | 100
  Minss | 2
  Maxdepth | 6
  LR, learning rate (step size for each boosting iteration) | 0.3
  Gamma^c (how much the loss must be decreased by a split in order for that split to occur) | 0.05

FFNN^d
  Number of hidden layers | 4
  Dropout regularization^e | 0.5, after the 2nd and 4th hidden layers
  Number of neurons at the hidden layer (NHL) | 1st hidden layer: 32; 2nd hidden layer: 32; 3rd hidden layer: 8; 4th hidden layer: 8
  Activation function | ReLU^f, in the dense and output layers

LSTM^g
  Number of neurons in the LSTM layer (NLL) | 128
  Number of hidden layers | 3
  Dropout regularization^e | 0.5, after the 3rd hidden layer
  NHL | 64
  Activation function | ReLU, in the dense and output layers

^a RFreg: random forest regressor.

^b XGBreg: Extreme Gradient Boosting regressor.

^c In relation to gamma, loss is the function minimized during model training. It is based on the difference between the current output of the machine learning model and the target (ie, the true value).

^d FFNN: feed forward neural network.

^e Dropout regularization is a technique that randomly switches off neurons in a neural network during training to avoid overfitting. The number given refers to the fraction of neurons switched off.

^f ReLU: rectified linear unit.

^g LSTM: long short-term memory.

Figure 5. The 2 panels of the figure illustrate the differences between the 2 tree-based ensemble machine learning algorithms in the study. (A) Both random forest regressor (RFreg) and Extreme Gradient Boosting regressor (XGBreg) are based on the same building block, a decision tree, which is a set of rules built from the dataset. In our case, the aim is to predict the time until the next snack having high saturated fats, salt, or sugar in minutes. n is the number of trees in the model, selected based on cross-validation performance. Time k is the expected time until the next unhealthy snack, as predicted using Tree k. (B) For RFreg (on the left), the model creates several different trees and then averages their predictions. For XGBreg (on the right), there is a sequence of trees, which progressively refine the prediction. Splits in a decision tree correspond to the data points associated with a node being divided and the parts assigned to child nodes. RF: random forest; XGBoost: Extreme Gradient Boosting.

All algorithms had the same objective, which was to predict the time until the next HFSS snack (in minutes), given the current time bin, location (work, home, and other), and day of the week. Specifically, given the presence of an HFSS snack in a time bin, the ML algorithms attempted to predict the number of minutes to the next HFSS snack (snack to snack interval). ML algorithms were applied separately to the snacking data from each participant to take into account individual differences in how the available predictors can be related to snacking behavior. In ML terms, the question is whether there is structure in the data to predict the density of snacking instances, based on the available predictors, location, time bin, and day of the week. To summarize the main approach, we used an input of fixed size, comprising current location, current day of the week, and current time bin, for RFreg, XGBreg, and FFNN. The LSTM model processed data as a sequence of inputs.
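In other words, the target variable is the snack-to-snack interval in minutes, computed per participant from the ordered event log. A minimal sketch follows; the function name is illustrative, not the study's code.

```python
import pandas as pd

def snack_to_snack_minutes(timestamps: pd.Series) -> pd.Series:
    """Minutes from each recorded HFSS snack to the participant's next snack."""
    ordered = timestamps.sort_values()
    delta = ordered.diff().shift(-1)        # interval to the *next* snack
    return delta.dt.total_seconds() / 60.0  # the last snack has no target (NaN)

snacks = pd.Series(pd.to_datetime(["2020-03-02 10:15", "2020-03-02 16:40",
                                   "2020-03-03 09:05"]))
targets = snack_to_snack_minutes(snacks)    # 385.0, 985.0, NaN
```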

To appreciate the capacity of the ML models to extract regularities from the data, 2 baseline models were explored. The first was a simple linear regression model, which attempted to predict snacking instances based on a suitably weighted linear combination of the available features. The second was a basic baseline model corresponding to prediction based on the grand mean of snacking (computed separately for each participant) across the time bins.

Five-fold cross-validation was applied on all fixed-context models. In each data split, the datasets were divided into 2 parts (80% for training and 20% for testing). That is, 20% of the data reserved for testing constitutes out-of-sample validation of the models. The error (mean absolute error [MAE]) values were computed on the part of the dataset left for testing (20%). This procedure was repeated 5 times, with each iteration corresponding to a different, randomly determined partition of the data into training and test parts. Reported MAE values were averaged across these 5 iterations. We ensured that data distributions in the training and testing data subsets were similar to those in the whole dataset, using stratified data sampling []. Regarding the choice of using 5-fold cross-validation, there is a question of whether it would make sense to have more “folds.” However, each different “fold” increases the number of simulations that must be run and leads to more unbalanced training and testing subsets. Five-fold cross-validation is a fairly standard approach.
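Schematically, the evaluation loop for a fixed context model looks as follows. A plain KFold split with random placeholder data is shown for brevity, whereas the study used stratified sampling to keep the training and testing distributions similar.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import KFold

# Placeholders for one participant's encoded features and snack-to-snack intervals
rng = np.random.default_rng(0)
X, y = rng.random((100, 11)), rng.random(100) * 600

kf = KFold(n_splits=5, shuffle=True, random_state=0)  # each split: 80% train, 20% test
fold_maes = []
for train_idx, test_idx in kf.split(X):
    model = RandomForestRegressor(n_estimators=150)
    model.fit(X[train_idx], y[train_idx])
    fold_maes.append(mean_absolute_error(y[test_idx], model.predict(X[test_idx])))

mae = float(np.mean(fold_maes))  # reported MAE: average across the 5 iterations
```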

For the LSTM, a time series cross-validation was employed. Time series cross-validation is more suitable for temporal data modeling, because the training set only includes the observations occurring before those in the testing set. It begins with a small subset of data for training that is successively extended to generate new predictions []. Because of the sparseness of the data, it was decided to use time series cross-validation with only 2 splits.
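With scikit-learn, this corresponds to TimeSeriesSplit, where each training set strictly precedes its test set and grows with each successive split. X_windows and y_targets stand for the windowed sequences sketched earlier, and lstm for the recurrent model; the fitting details are assumptions.

```python
from sklearn.model_selection import TimeSeriesSplit

tscv = TimeSeriesSplit(n_splits=2)  # only 2 splits, owing to the sparseness of the data
for train_idx, test_idx in tscv.split(X_windows):
    # train_idx always indexes windows that occur earlier in time than test_idx
    lstm.fit(X_windows[train_idx], y_targets[train_idx], epochs=10, verbose=0)
    preds = lstm.predict(X_windows[test_idx])
```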

MAE is the main evaluation metric for the models. MAEs were computed as differences in minutes between the time of the current snack and the predicted time for the next snack. This is because our research focus was the time difference between the prediction and the actual time until the next HFSS snack occurrence. Additionally, residuals allow quantification of positive and negative errors, which were employed in a further analysis (related to the creation of hypothetical interventions). Residuals were calculated as true values minus predicted values. Thus, positive residuals indicate that the predicted time is earlier than the observed time, and negative residuals indicate that the predicted time is later than the observed time. It is important to note that the MAE is computed as the average of the absolute values of the residuals. Residuals were examined as standard, and corresponding plots are provided in .
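The relationship between residuals and the MAE can thus be expressed in 2 lines; y_true and y_pred are placeholder arrays of observed and predicted intervals.

```python
import numpy as np

residuals = y_true - y_pred        # positive: prediction earlier than the observed event
mae = np.mean(np.abs(residuals))   # MAE is the mean of the absolute residuals
```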


Results

Preliminary Notes

Regarding the UK dataset, as expected, some participants (n=4) did not stay in the study for its intended duration. The majority of the participants (107/111) recorded their HFSS snacks for the full 28 days or more. Among these participants, there were 34 female and 73 male participants, with an average age of 32.85 (SD 10.10) years. Some participants continued recording snacking for an additional 6 to 11 days after reaching the 28-day target and after receiving the completion message. These additional data were used in the analyses. Overall, the total number of recordings across all participants was 5391, which reduced to 4978 data points after data cleaning. There were some positive comments about the study in general. For example, participants mentioned that recording their snacks made them aware of the amount consumed daily and helped them reduce food intake.

There were 413 missing and duplicate entries in the UK dataset (approximately 7.7% of all recordings). Regarding duplicates, when participants recorded their snacking instances, they might have pressed the submit button twice, resulting in 2 instances of the same event. For each participant, the total number of recorded snack instances ranged from 7 to 157.

The Dutch dataset included 3705 data points. For each participant, the number of recorded snack instances ranged from 6 to 348. Demographic and other characteristics of the Dutch dataset can be found in the original publication [].

The sizes of the training and testing sets were 3982 and 996, respectively, in the UK dataset and 2778 and 927, respectively, in the Dutch dataset.

We noted that normalization slightly enhanced the results from the LSTM model but not the FFNN model.

Model Results

and show the MAEs for the 4 models using all input features. In brief, the RFreg model performed very poorly with the UK dataset using 4 time bins. In contrast, the XGBreg, FFNN, and LSTM models all demonstrated reasonable performance with the UK dataset using 4 time bins, achieving lower MAE values and indicating better predictive ability compared to the RFreg and baseline (grand mean) models. When using 12 time bins for the UK dataset, all 4 models performed relatively similarly and demonstrated satisfactory performance. It is not possible to compute significance values for MAE differences in such an ML analysis, because doing so would require knowledge of the sampling distribution of MAE differences under the null hypothesis (that there is no structure in the time series), and this distribution is not available. The best guide for the interpretation of MAE values is the MAE of the baseline (grand mean) model.

Table 2. Performance of the machine learning algorithms for predicting consumption of snacks with high saturated fats, salt, or sugar in the UK and Dutch datasets.

Dataset and model | Training set, 4 time bins (MAE^a,b) | Testing set, 4 time bins (MAE^b) | Training set, 12 time bins (MAE^b) | Testing set, 12 time bins (MAE^b)

UK dataset
  Grand mean | 29.2 | 47.3 | 29.2 | 47.1
  LR^c | 9.3 | 56.3 | 6.8 | 60.3
  RFreg^d | 14.7 | 52.3 | 6.2 | 16.2
  XGBreg^e | 4.8 | 17.8 | 2.3 | 17.5
  FFNN^f | 15.3 | 15.8 | 15.3 | 16.3
  LSTM^g | 14.7 | 15.9 | 14.8 | 16.5

Dutch dataset
  Grand mean | 1049.0 | 1397.3 | 1049.0 | 1400.1
  LR | 116.1 | 816.1 | 99.2 | 914.6
  RFreg | 306.9 | 309.9 | 311.5 | 239.1
  XGBreg | 3.2 | 238.9 | 4.1 | 229.9
  FFNN | 690.3 (133.8^h) | 151.5 (130.0^h) | 723.3 (133.7^h) | 154.0 (129.8^h)
  LSTM | 1271.5 | 171.0 | 1330.0 | 174.2

^a MAE: mean absolute error.

^b MAEs for predicting the time of the next snack having high saturated fats, salt, or sugar in minutes using all features (the fractional part of a minute is indicated as decimals). A lower MAE is better. The average across participants is provided.

^c LR: linear regression.

^d RFreg: random forest regressor.

^e XGBreg: Extreme Gradient Boosting regressor.

^f FFNN: feed forward neural network.

^g LSTM: long short-term memory.

^h The numbers in parentheses refer to model performance after applying regularization and early stopping.

Figure 6. A graphical illustration of the relative performance of the machine learning algorithms (random forest regressor [RFreg], Extreme Gradient Boosting regressor [XGBreg], feed forward neural network [FFNN], and long short-term memory [LSTM]) for predicting consumption of snacks with high saturated fats, salt, or sugar in the UK and Dutch datasets and using 2 different approaches for partitioning time. Performance is quantified using the average mean absolute error (MAE) values from Table 2 (lower values are better). The error bars indicate the IQR. LR: linear regression; RF: random forest; XGBoost: Extreme Gradient Boosting.

For the Dutch dataset, errors were high. The FFNN model was the best performing model when using both 4 and 12 time bins. The training MAE for the FFNN model was much higher than the testing MAE, suggesting underfitting but good generalization. To assess whether the training MAE could be reduced, regularization and early stopping were employed with the FFNN model for the Dutch dataset.

The results can be explored in 2 ways. First, the scatter plots of residuals can be considered for the best performing version of each model. For most noisy processes, the default expectation is that residuals should be evenly distributed around zero, with the spread of the residuals dependent on the amount of noise and the quality of the model. A narrow spread of residuals around zero is generally indicative of good quality predictions []. Model residuals with scatter plots are presented in . Overall, nonlinear models, particularly random forest, XGBoost, and FFNN, show the most potential for accurately predicting the time until the next HFSS snack, with the predominance of positive residuals being advantageous for just-in-time interventions (since positive residuals indicate that the prediction is earlier than the actual event).

Second, feature importance can be examined in the prediction task. Although there were only 3 features for prediction (time bin, location, and day of the week) and all of them appeared important, there might be enough information for prediction after eliminating day of the week or location. Feature importance analysis is one way to proceed in this case, but because of the approach used for model fit (individually for each participant), this was deemed impractical. Therefore, instead, we carried out 2 feature ablation analyses, one for location and another for day of the week. For the UK dataset, both these features appeared essential for the prediction task, since eliminating either one of them reduced model performance quite substantially (without either of the features, model performance became equivalent to that of the grand mean model) ( and ). On the other hand, for the Dutch dataset, the reduction in model performance in the ablation analyses was milder, indicating that prediction could be based on a subset of the available features.

Table 3. Location ablation analysis of the machine learning algorithms for predicting consumption of snacks with high saturated fats, salt, or sugar in the UK and Dutch datasets.

Dataset and model | Training set, 4 time bins (MAE^a,b) | Testing set, 4 time bins (MAE^b) | Training set, 12 time bins (MAE^b) | Testing set, 12 time bins (MAE^b)

UK dataset
  Grand mean | 29.2 | 47.3 | 29.2 | 47.1
  LR^c | 17.1 | >10,000^d | 13.8 | >10,000^d
  RFreg^e | 15.3 | 52.2 | 12.1 | 42.5
  XGBreg^f | 10.2 | 49.9 | 4.2 | 46.6
  FFNN^g | 26.0 | 45.9 | 26.0 | 45.9
  LSTM^h | 23.4 | 55.4 | 23.5 | 55.4

Dutch dataset
  Grand mean | 1049.0 | 1397.3 | 1049.0 | 1398.0
  LR | 132.4 | >10,000^d | 76.1 | >10,000^d
