Gambling disorder is characterized by problematic gambling behavior that causes significant problems and distress. This study aimed to develop and validate a predictive model for screening online problem gamblers based on players' account data.
Two random samples of French online gamblers in skill-based (poker, horse race betting and sports betting, n = 8,172) and pure chance games (scratch games and lotteries, n = 5,404) answered an online survey and gambling tracking data were retrospectively collected for the participants. The survey included age and gender, gambling habits, and the Problem Gambling Severity Index (PGSI). We used machine learning algorithms to predict the PGSI categories with gambling tracking data. We internally validated the prediction models in a leave-out sample.
When gambling problems were predicted as a binary outcome at each PGSI threshold (1 for low-risk gambling, 5 for moderate-risk gambling and 8 for problem gambling), the predictive performances were good for the model for skill-based games (AUROCs from 0.72 to 0.82), but moderate for the model for pure chance games (AUROCs from 0.63 to 0.76, with wide confidence intervals) due to the lower frequency of problem gambling in this sample. When the four PGSI categories were predicted together, performances were good for identifying the extreme categories (non-problem and problem gamblers) but poorer for the intermediate categories (low-risk and moderate-risk gamblers), regardless of the type of game.
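To make the threshold-based evaluation concrete, the following minimal sketch binarizes PGSI scores at each threshold and computes the AUROC as the probability that a randomly chosen positive case receives a higher predicted risk than a randomly chosen negative case. It is not the study's code: the toy scores, the predicted risks, and the rule that a case is positive when its PGSI score is at or above the threshold are all assumptions made for illustration.

```python
# Illustrative only: toy PGSI scores and predicted risks, not study data.
# Assumption: a case is "positive" when its PGSI score is >= the threshold.
PGSI_THRESHOLDS = {"low-risk": 1, "moderate-risk": 5, "problem": 8}


def binarize(pgsi_scores, threshold):
    """Return 1 where the PGSI score is at or above the threshold, else 0."""
    return [1 if score >= threshold else 0 for score in pgsi_scores]


def auroc(labels, scores):
    """AUROC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


toy_pgsi = [0, 0, 1, 2, 5, 6, 8, 12]
toy_risk = [0.05, 0.20, 0.15, 0.30, 0.45, 0.40, 0.90, 0.70]

# One AUROC per binary outcome, mirroring the three thresholds in the text.
aurocs = {
    name: auroc(binarize(toy_pgsi, threshold), toy_risk)
    for name, threshold in PGSI_THRESHOLDS.items()
}
```

With few positive cases, as in the pure chance sample at the highest threshold, the number of positive-negative pairs is small, which is one way to see why the confidence intervals widen.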
At the same time, online gambling provides interesting opportunities to study actual gambling behavior by using operators' routinely collected data (Deng, Lesch, & Clark, 2019; S. Gainsbury, 2011). Although they come with some limitations, related for example to the lack of contextual factors, ethical issues (protection of participants' privacy or difficulty obtaining informed consent) or methodological problems (multiple accounts for one gambler or multiple gamblers for one account) [for a detailed analysis of advantages and disadvantages of account-based gambling data, see (S. Gainsbury, 2011)], players' account-based gambling data are considered more reliable and less biased than self-reported online gambling behaviors (Braverman, Tom, & Shaffer, 2014; Catania & Griffiths, 2021; S. Gainsbury, 2011; Heirene, Wang, & Gainsbury, 2021).
Given the number of gamblers whose activity can be observed and the richness of players' account-based data, supervised machine learning algorithms appear to be an interesting option to identify individuals with gambling problems (Percy et al., 2016; Philander, 2014). However, studies based on the analysis of gamblers' account data were often restricted to a single type of gambling (for example, only sports betting or only poker), using data from a single gambling operator. This is due to the difficulty researchers face in accessing gambling data from operators, and the virtual impossibility of linking a player's account data across distinct operators. Thus, the observed gambling behavior might not reflect the complete online gambling behavior of an individual. A possible way to overcome this limitation is to gather data from national regulatory authorities, where they exist. Indeed, they usually store account-based gambling data from all operators for the purposes of regulatory compliance checks, but may also send them to research teams under certain conditions when authorized by law and justified by the interest of the study.
Since 2010, French regulation has authorized only four types of online gambling: poker, horse race betting, sports betting and lotteries (including draws, bingo and scratch games). All other forms of online gambling, especially online casino games (online slot machines, online table games except poker), have always been banned in France. On the one hand, among the four authorized types of online gambling, only poker, horse race betting and sports betting are open to competition, within the framework of a license-based system managed by the Regulatory Authority for Online Gambling (Autorité de Régulation des Jeux En Ligne, ARJEL). In its capacity as the national regulator, the ARJEL is authorized to collect and store account-based gambling data from all licensed operators. On the other hand, lotteries are subject to a monopoly held by the historical national operator (Française des Jeux, FDJ), which was not regulated by the ARJEL before 2020.
Given the French regulations, we requested account-based data from both ARJEL and the FDJ. The ARJEL dataset contained individual-level data from all authorized operators. The data covered poker, horse race betting, and sports betting. If a gambler had multiple accounts (e.g., across multiple operators), then the ARJEL aggregated the data across all of the accounts. The FDJ dataset contained individual-level data related only to lotteries. This approach allowed us to cover the whole range of online gambling activities authorized in France rather than being operator-specific. Moreover, this allowed us to develop the prediction model separately for gambling forms that involve skill (sports and horse race betting, poker) and for pure chance games (lotteries) (Bjerg, 2010). The architecture of those two datasets is given in Table S1 of the supplementary material.
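The account-level aggregation described above can be sketched as follows. This is only an illustration under stated assumptions: the row layout `(gambler_id, operator, game, stake)`, the function name `aggregate_by_gambler`, and the existence of a common gambler identifier across operators are hypothetical, and the actual structure of the datasets is the one given in Table S1.

```python
from collections import Counter


def aggregate_by_gambler(account_rows):
    """Sum stakes per gambler and game, pooling all of a gambler's accounts.

    Each row is a hypothetical (gambler_id, operator, game, stake) tuple.
    The operator is ignored on purpose: accounts held with different
    operators are merged into one record per gambler.
    """
    totals = {}
    for gambler_id, _operator, game, stake in account_rows:
        totals.setdefault(gambler_id, Counter())[game] += stake
    return totals


toy_rows = [
    ("g1", "operator_a", "poker", 50.0),
    ("g1", "operator_b", "poker", 25.0),
    ("g1", "operator_b", "sports betting", 10.0),
    ("g2", "operator_a", "horse race betting", 5.0),
]
totals = aggregate_by_gambler(toy_rows)
```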
The ARJEL sent an email to a random sample of 840,797 online gamblers who had an active gambling account (i.e., had placed at least one bet during the previous twelve months) in the competition market, in two successive waves (November 2015 and February 2016). The email contained an invitation to respond to an online survey hosted by the ARJEL. A total of 9,306 gamblers (1.1% of those invited to participate) responded to the whole survey.
The FDJ sent the same type of email in July 2019 to a random sample of 303,000 online gamblers who had an active gambling account with the FDJ monopoly. The email contained an invitation to respond to an online survey, with the same content as the ARJEL survey, hosted by the University Hospital of Nantes. A total of 5,682 gamblers (1.9% of those invited to participate) responded to the whole survey.
Both datasets contained basic demographic data (age and sex), gambling tracking data during the twelve months preceding survey completion, and answers to the survey questions (see Table S1 of the supplementary material for a detailed list of variables and how they were operationalized). The list of metrics extracted from the gambling accounts was chosen so that the model could handle a large roster of gamblers without running into run time or memory issues. As a consequence, data providers eliminated metrics that were found to be computationally infeasible. This was especially the case for time-related metrics; for example, computing session length requires extracting start and end points from a sequence of bets' timestamps and looping over the sequence of timestamps multiple times.
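To illustrate what a time-related metric such as session length involves, the sketch below splits one gambler's bet timestamps into sessions and returns each session's duration. It is not the data providers' procedure: the function name `session_lengths` and the 30-minute inactivity rule used to close a session are assumptions, since the source does not define how start and end points would be identified.

```python
from datetime import datetime, timedelta


def session_lengths(timestamps, max_gap=timedelta(minutes=30)):
    """Split bet timestamps into sessions and return each session's duration.

    Assumption: a session ends when more than `max_gap` elapses between two
    consecutive bets. A session with a single bet has a duration of zero.
    """
    ordered = sorted(timestamps)
    if not ordered:
        return []
    lengths = []
    start = prev = ordered[0]
    for ts in ordered[1:]:
        if ts - prev > max_gap:
            # Gap too long: close the current session and open a new one.
            lengths.append(prev - start)
            start = ts
        prev = ts
    lengths.append(prev - start)  # close the last open session
    return lengths


toy_timestamps = [
    datetime(2016, 1, 1, 10, 0),
    datetime(2016, 1, 1, 10, 10),
    datetime(2016, 1, 1, 10, 25),
    datetime(2016, 1, 1, 12, 0),
    datetime(2016, 1, 1, 12, 5),
]
sessions = session_lengths(toy_timestamps)
```

Even in this simplified form, the metric requires the full, sorted bet history of every account, whereas most of the retained indicators only need counts or sums.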
Moreover, we computed a second type of indicator related to breadth of involvement, defined as the range of participation in various forms of gambling. Breadth of involvement is traditionally measured as the number of games an individual plays (Binde, Romild, & Volberg, 2017). High breadth of involvement, also referred to as versatility (Welte, Barnes, Tidwell, & Hoffman, 2009), means that the gambler is engaged in multiple forms of gambling (Binde et al., 2017). It has been found to be associated with problem gambling, potentially as a moderator between gambling on the internet and developing gambling problems (Baggio et al., 2017). We computed the breadth of involvement as the number of different games for which at least one bet was placed by a given participant. This variable ranged from 1 to 10 for the ARJEL and from 1 to 3 for the FDJ (see the online appendix for the types of game considered).
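The breadth-of-involvement indicator defined above amounts to counting distinct games per participant. A minimal sketch, assuming a hypothetical list of `(gambler_id, game)` bet records and a hypothetical function name:

```python
from collections import defaultdict


def breadth_of_involvement(bets):
    """Map each gambler to the number of distinct games with at least one bet."""
    games_played = defaultdict(set)
    for gambler_id, game in bets:
        games_played[gambler_id].add(game)  # repeated bets on a game count once
    return {gambler_id: len(games) for gambler_id, games in games_played.items()}


# Toy records for illustration only.
toy_bets = [
    ("g1", "poker"),
    ("g1", "sports betting"),
    ("g1", "poker"),
    ("g2", "horse race betting"),
]
breadth = breadth_of_involvement(toy_bets)
```

Because only gamblers with at least one recorded bet appear in the records, the indicator has a minimum of 1, consistent with the ranges reported above.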
As the reference period of current PGSI status was the last thirty days, we used data from the previous four months to develop the model and predict current PGSI status. Indeed, all gambling indicators were aggregated at the month level (i.e., over thirty-day periods) and variability indicators were estimated in relation to the previous three months (usual activity). The four-month period was a trade-off between having sufficient hindsight on gambling activity and having a reactive screening tool that would not require going too far back in gambling activity history. As a consequence, we excluded from the analyses gamblers who had created their account less than four months before survey completion (n = 1,134 for the ARJEL dataset and n = 278 for the FDJ dataset). We also excluded individuals who did not gamble in the reference period covered by the current PGSI status (i.e., thirty days before the survey) (n = 813 for the ARJEL dataset and n = 325 for the FDJ dataset). The final ARJEL and FDJ datasets contained 7,359 and 5,079 gamblers, respectively.
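The windowing logic can be sketched as follows, under several assumptions that are not taken from the source: the period length is a parameter (set here to 30 days, to match the thirty-day PGSI reference window), the survey day itself is excluded, stakes are the aggregated quantity, and variability is expressed as the ratio of the most recent period to the mean of the three preceding ones. The function names are hypothetical.

```python
from datetime import date, timedelta

PERIOD_DAYS = 30  # assumption: one "month" is a 30-day window
N_PERIODS = 4     # four periods: 1 current + 3 used as "usual activity"


def monthly_totals(bets, survey_date):
    """Sum stakes into consecutive periods before the survey date.

    Index 0 is the most recent period. Bets placed on the survey date or
    more than PERIOD_DAYS * N_PERIODS days before it are ignored.
    """
    totals = [0.0] * N_PERIODS
    for bet_date, stake in bets:
        days_before = (survey_date - bet_date).days
        if 1 <= days_before <= PERIOD_DAYS * N_PERIODS:
            totals[(days_before - 1) // PERIOD_DAYS] += stake
    return totals


def variability_ratio(totals):
    """Most recent period relative to the mean of the three previous ones."""
    usual = sum(totals[1:]) / (N_PERIODS - 1)
    return totals[0] / usual if usual > 0 else None


survey = date(2016, 2, 1)
toy_bets = [
    (survey - timedelta(days=1), 10.0),
    (survey - timedelta(days=30), 20.0),
    (survey - timedelta(days=31), 5.0),
    (survey - timedelta(days=75), 10.0),
    (survey - timedelta(days=120), 15.0),
    (survey - timedelta(days=121), 99.0),  # outside the four periods, ignored
]
totals = monthly_totals(toy_bets, survey)
ratio = variability_ratio(totals)
```

A gambler whose account is younger than the full look-back window would have structurally empty early periods, which is why such accounts cannot be scored with indicators of this kind.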