A variety of real-world processes produce sequences of data whose complex temporal dynamics need to be modeled. For example, real-time event-driven customer interaction systems capture user actions (events) generated by customers on web and mobile applications. These user actions across various digital endpoints are directed towards achieving specific outcomes and represent a certain user intent. The ability to learn user intent from a user’s actions can help derive deeper insights into customer behavior vis-à-vis specific outcomes.
Typically, a user starts interacting with an interface with a specific initial intent, or even without one. The intent at a particular moment is locally affected by the actions performed during a specific period of interaction, i.e., a session. This series of actions therefore creates an event sequence during a user interaction session. The combined effect of these events and the intent leads to an outcome that can itself be either intermediate (for example, the user logs off or closes the browser window to terminate the current session) or final (for example, the user makes a purchase or abandons the cart). In most cases, an intermediate outcome leads to further actions or event sequences that ultimately end in a final outcome in a subsequent user session.
The ability to transform these actions into ordered sequences with available outcomes helps us apply supervised sequence learning methods to learn models of user interaction. Since the action sequences during a session are driven by behavior and intent, being able to learn from these sequences can help us gain deeper insights into the underlying models.
Such sequence classification has a broad range of real-world applications, including predicting order probabilities to drive recommender system use cases, early prediction of shopping cart abandonment, injecting interventions such as on-the-spot discounts or loyalty point redemptions into the purchase event sequence, fraud detection, online inventory positions, real-time in-store interventions by a sales associate, and so on. Additionally, such applications can help identify product design improvements or drive real-time cross-sell and upsell use cases such as presenting the best offers for each customer.
Challenges in event-sequence modeling
The nature of events and event sequences presents several challenges in modeling them. Because temporal order matters in sequence data, early prediction is a highly desirable property of sequence classifiers in many critical applications, such as checkout prediction or cart abandonment. In early prediction, a sequence classifier uses a prefix of a sequence, as short as possible, to make a reasonably accurate prediction.
There are three important characteristics of features for early prediction. First, a feature should be relatively frequent. A feature that is frequent in the training set is likely to apply to many of the sequences to be classified in the future, whereas an infrequent feature may lead to overfitting on a small number of training samples. Second, a feature should be discriminative between classes, since discriminative features are the ones that carry classification power. Last, a feature should appear early: we prefer features that occur early in training sequences so that they can effectively support predictions based on sequence prefixes.
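As a minimal sketch of these three criteria, assume each distinct event type is treated as a candidate feature and each training sequence carries a binary label. The function name `score_features`, the event names, and the specific definitions of support, discrimination, and earliness used below are illustrative assumptions rather than a standard formulation.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def score_features(
    sequences: Sequence[Sequence[str]], labels: Sequence[int]
) -> Dict[str, Dict[str, float]]:
    """Score each event type by support, discrimination, and earliness.

    Assumed definitions (illustrative only):
      support        = fraction of sequences that contain the event
      discrimination = |P(event | label 1) - P(event | label 0)|
      earliness      = 1 - mean relative position of the first occurrence
    """
    n = len(sequences)
    class_sizes = {0: 0, 1: 0}
    for y in labels:
        class_sizes[y] += 1

    # event -> number of sequences of each class that contain it
    contains = defaultdict(lambda: {0: 0, 1: 0})
    first_pos: Dict[str, List[float]] = defaultdict(list)

    for seq, y in zip(sequences, labels):
        seen = set()
        for i, event in enumerate(seq):
            if event in seen:
                continue  # only the first occurrence matters here
            seen.add(event)
            contains[event][y] += 1
            first_pos[event].append(i / max(len(seq) - 1, 1))

    scores = {}
    for event, counts in contains.items():
        rate0 = counts[0] / class_sizes[0] if class_sizes[0] else 0.0
        rate1 = counts[1] / class_sizes[1] if class_sizes[1] else 0.0
        positions = first_pos[event]
        scores[event] = {
            "support": (counts[0] + counts[1]) / n,
            "discrimination": abs(rate1 - rate0),
            "earliness": 1.0 - sum(positions) / len(positions),
        }
    return scores


# Toy data: label 1 = session ended in a purchase, label 0 = it did not.
sequences = [
    ["home", "search", "add_to_cart", "checkout"],
    ["home", "add_to_cart", "checkout"],
    ["home", "search", "exit"],
    ["home", "exit"],
]
labels = [1, 1, 0, 0]
scores = score_features(sequences, labels)
```

In this toy data, `home` is frequent and early but does not separate the classes, while `add_to_cart` separates them perfectly but appears later and in fewer sequences. A feature selector for early prediction would have to weigh these scores against each other.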
The longer the sequence for a shopper, the more information is available about the shopper and the more accurate a classification decision can be. However, predicting early allows earlier interventions and leaves more options for choosing the point in the event sequence at which to inject the intervention. For example, the checkout rate can be improved if the undesirable outcome is detected at an early stage. Another example is intervening with a discount offer on a pair of shoes in the cart after the shopper revisits the shoes section of the site for the third time.
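The trade-off between prefix length and confidence can be sketched as a loop that scores ever longer prefixes and stops at the first one that is confident enough. The function `early_predict`, the 0.8 threshold, and the hand-set weights in `toy_checkout_proba` are hypothetical stand-ins. In practice the probability would come from a trained sequence classifier.

```python
from typing import Callable, Optional, Sequence, Tuple


def early_predict(
    events: Sequence[str],
    predict_proba: Callable[[Sequence[str]], float],
    threshold: float = 0.8,
    min_prefix: int = 1,
) -> Tuple[Optional[int], int, float]:
    """Return (label, prefix_length, probability) for the shortest confident prefix.

    predict_proba gives the positive-class probability p for a prefix.
    The label is 1 once p >= threshold, 0 once p <= 1 - threshold,
    and None if no prefix of the sequence is confident enough.
    """
    p = 0.5
    for n in range(min_prefix, len(events) + 1):
        p = predict_proba(events[:n])
        if p >= threshold:
            return 1, n, p
        if p <= 1.0 - threshold:
            return 0, n, p
    return None, len(events), p


def toy_checkout_proba(prefix: Sequence[str]) -> float:
    """Hypothetical scorer with hand-set weights, standing in for a trained model."""
    weights = {"add_to_cart": 0.2, "view_cart": 0.15, "remove_from_cart": -0.25}
    score = 0.5 + sum(weights.get(event, 0.0) for event in prefix)
    return min(1.0, max(0.0, score))


session = ["view_product", "add_to_cart", "view_cart", "checkout"]
label, prefix_length, probability = early_predict(session, toy_checkout_proba)
```

Here the decision is made after three of the four events. Lowering the threshold would trigger an intervention sooner at the cost of more wrong calls, which is the trade-off described above.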
In (conventional) sequence classification, each sequence is associated with only one class label and the whole sequence is available to a classifier before the classification. For a streaming sequence, which can be regarded as a virtually unlimited sequence, it is sometimes more desirable to predict a sequence of future labels instead of a single class label. Such predictions are harder to make but are very attractive, as they present multiple opportunities for interventions that nudge the customer at key times towards completing the journey with a desired outcome.
Learning the behavior of an individual user is a personalization problem and is generally harder, whereas event-sequence or intent recognition can be generalized across users. Due to the extremely large number of users and the sparseness of data across features, learning behavior for each user remains a challenging task. Therefore, to avoid the inherent cold-start problems associated with user-based modeling, we can start with a model that learns from all event sequences (across all users and sessions) instead of a specific user’s event sequences.
Unlike feature vectors, sequences do not have explicit features. Most classifiers, such as decision trees, can only take feature vectors as input. Moreover, even with sophisticated feature selection techniques, the dimensionality of potential features may still be very high and the computations costly. Besides accurate classification results, some applications also call for an interpretable classifier. Predictions that are hard to explain make it tricky to answer customers who ask for transparency about how algorithms determine their individual user experience. Furthermore, a better understanding of trained models can help retailers improve their services. Building an interpretable sequence classifier is difficult precisely because there are no explicit features. All of this makes sequence classification a more challenging task than classification on feature vectors.
Implementing ML applications based on event sequences
Supervised sequence learning identifies the ordered relatedness of different events and uses them together as an interlinked sequential input instead of treating them as independent events.
There are typically two distinct parts to the application of machine learning algorithms in event-driven systems: Events Data Generation and Sequence Learning.
Events data generation
Depending on the use case, a web or mobile app can be instrumented to generate the requisite events. Such instrumentation of the customer digital endpoints can be accomplished offline, or it can be configured dynamically from the server side using tags. Dynamic configuration allows you to go live sooner and supports the progressive introduction of increasingly sophisticated real-time use cases. It can also help you change the instrumentation on the fly in response to a particular campaign’s effectiveness against expected performance metrics.
The events generated from the digital channels or apps are assembled into an event sequence for a given user session. These can include events related to clicks, taps, scrolls, and so on. Additionally, some events can also have associated data payloads, for example, an add-to-cart event can have product categories, cart value, item details, etc. of the current cart contents, as the data payload. The session events are then filtered and ordered by their timestamps to form the event sequence. For the purposes of machine learning, we need to represent these events as a set of features based on the use case scenario. We perform a feature extraction process at this stage, and then standardize and normalize the features as required.
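The filter, order, extract, and standardize steps can be sketched as follows. The event schema (`session_id`, `ts`, `type`, `payload`), the `EVENT_TYPES` vocabulary, and the three per-event features are assumptions made for illustration. A real pipeline would choose features to suit the use case.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Any, Dict, List

# Hypothetical vocabulary of event types the model should see.
EVENT_TYPES = ["click", "scroll", "add_to_cart", "checkout"]


def build_sequences(raw_events: List[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Group events by session, drop unknown types, and order by timestamp."""
    sessions: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for ev in raw_events:
        if ev["type"] in EVENT_TYPES:
            sessions[ev["session_id"]].append(ev)
    return {sid: sorted(evs, key=lambda e: e["ts"]) for sid, evs in sessions.items()}


def extract_features(seq: List[Dict[str, Any]]) -> List[List[float]]:
    """Per event: [event-type index, seconds since previous event, cart value]."""
    rows = []
    prev_ts = seq[0]["ts"] if seq else 0.0
    for ev in seq:
        cart_value = float(ev.get("payload", {}).get("cart_value", 0.0))
        rows.append([EVENT_TYPES.index(ev["type"]), ev["ts"] - prev_ts, cart_value])
        prev_ts = ev["ts"]
    return rows


def standardize(rows: List[List[float]], columns=(1, 2)) -> None:
    """Z-score the chosen numeric columns in place, using statistics over all rows."""
    if not rows:
        return
    for c in columns:
        values = [r[c] for r in rows]
        mu, sigma = mean(values), pstdev(values)
        for r in rows:
            r[c] = (r[c] - mu) / sigma if sigma > 0 else 0.0


raw_events = [
    {"session_id": "s1", "ts": 12.0, "type": "add_to_cart", "payload": {"cart_value": 40.0}},
    {"session_id": "s1", "ts": 10.0, "type": "click"},
    {"session_id": "s1", "ts": 11.0, "type": "heartbeat"},  # filtered out
    {"session_id": "s2", "ts": 5.0, "type": "scroll"},
]
sequences = build_sequences(raw_events)
rows = extract_features(sequences["s1"])
scaled = [row[:] for row in rows]
standardize(scaled)
```

In a training pipeline, the mean and standard deviation would normally be computed once over the training set and reused at prediction time, rather than recomputed per session as in this toy run.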
Sequence learning AI models
In this part, we perform sequence learning on the event sequences with respect to a specific target. The trained sequential learning model is used for making appropriate predictions. Note that multiple sequence learners can be trained simultaneously for different target labels using the same set of event sequences, for use in different prediction models.
Most of the popular machine learning methods used in e-commerce employ vector-based models: they operate on fixed-length feature vectors as input. To apply them, one typically needs to convert event sequence history data into fixed sets of features. These features are usually handcrafted by domain experts. This exercise requires many iterations of empirical experiments and is time-consuming, tedious human work.
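To make the conversion concrete, here is a minimal sketch of a handcrafted mapping from a variable-length event sequence to a fixed-length vector. The vocabulary `VOCAB` and the chosen features (per-event counts, session length, and a flag for ending on the cart page) are hypothetical examples of the kind of features a domain expert might pick.

```python
from collections import Counter
from typing import List, Sequence

# Hypothetical event vocabulary; its order fixes the layout of the vector.
VOCAB = ["view_product", "search", "add_to_cart", "view_cart"]


def to_feature_vector(events: Sequence[str]) -> List[float]:
    """Map a session of any length to a vector of len(VOCAB) + 2 numbers."""
    counts = Counter(events)
    vector = [float(counts[event]) for event in VOCAB]  # how often each event occurred
    vector.append(float(len(events)))  # session length
    vector.append(1.0 if events and events[-1] == "view_cart" else 0.0)  # ended on cart page
    return vector


short_session = ["view_product", "add_to_cart"]
long_session = ["search", "view_product", "view_product", "add_to_cart", "view_cart"]
short_vector = to_feature_vector(short_session)
long_vector = to_feature_vector(long_session)
```

Both sessions map to vectors of the same length, which is what a vector-based classifier needs. Apart from the hand-picked last-event flag, though, the ordering of events is lost, and every additional aspect of order has to be engineered in by hand.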
Recurrent neural networks (RNNs) can avoid the tedious feature engineering work required by vector-based methods. Historical user events are inherently sequential and of varying lengths, making RNNs a natural model choice. In e-commerce, available data sources and prediction scenarios often change, which makes the generality of RNNs appealing because no problem-specific feature engineering has to take place.
Recurrent neural network models like Long Short-Term Memory (LSTM) are powerful neural models that efficiently learn sequences to derive the implicit relationship between sequence elements. Such models can be trained to learn user patterns on the system corresponding to several outcomes. For example, scroll rate and screen time on specific product pages, product reviews, and Q&A sections can provide a quantifiable measure of a user’s indecisiveness or confusion with respect to the product. While these models may not capture true causation, they can at a minimum learn the presence of any strong correlation between the evolving sequence patterns and outcomes.
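The recurrence that lets an LSTM consume sequences of any length can be illustrated with a forward pass written in plain Python. This is only a sketch under stated assumptions: the class `TinyLSTM`, the `EVENTS` vocabulary, and the one-hot encoding are hypothetical, and the weights are random and untrained. A real model would be built and trained with a deep learning framework.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical event vocabulary used for one-hot encoding.
EVENTS = ["view_product", "scroll", "read_reviews", "add_to_cart"]


def one_hot(event: str) -> List[float]:
    return [1.0 if e == event else 0.0 for e in EVENTS]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTM:
    """Forward pass of a single LSTM layer with random, untrained weights."""

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        width = input_size + hidden_size + 1  # inputs, previous hidden state, bias

        def matrix() -> List[List[float]]:
            return [[rng.uniform(-0.5, 0.5) for _ in range(width)] for _ in range(hidden_size)]

        # Input, forget, and output gates plus the candidate cell update.
        self.w_i, self.w_f, self.w_o, self.w_g = matrix(), matrix(), matrix(), matrix()

    @staticmethod
    def _affine(weights: List[List[float]], z: List[float]) -> List[float]:
        return [sum(w * v for w, v in zip(row, z)) for row in weights]

    def step(
        self, x: List[float], h: List[float], c: List[float]
    ) -> Tuple[List[float], List[float]]:
        z = list(x) + list(h) + [1.0]  # the trailing 1.0 multiplies the bias weight
        i = [sigmoid(a) for a in self._affine(self.w_i, z)]
        f = [sigmoid(a) for a in self._affine(self.w_f, z)]
        o = [sigmoid(a) for a in self._affine(self.w_o, z)]
        g = [math.tanh(a) for a in self._affine(self.w_g, z)]
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new

    def encode(self, events: Sequence[str]) -> List[float]:
        """Run the whole sequence and return the final hidden state."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for event in events:
            h, c = self.step(one_hot(event), h, c)
        return h


lstm = TinyLSTM(input_size=len(EVENTS), hidden_size=3)
short_state = lstm.encode(["view_product", "add_to_cart"])
long_state = lstm.encode(["view_product", "scroll", "read_reviews", "scroll", "add_to_cart"])
```

Both sessions end up as a hidden state of the same size, regardless of length, and the state depends on the order of events. A trained classifier would feed this final state into an output layer that predicts the outcome of interest.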
There are other ML applications that use unsupervised learning algorithms. For example, we can cluster event sequences into a reasonably sized set of groups that is much smaller than the number of behaviors observed on the system overall. Such clusters can be matched to categories related to shopping strategies such as shallow, directed buying, search deliberation, hedonic browsing, knowledge building, and so on. Additionally, we can use clustering algorithms for customer segmentation based on a combination of event sequences, event payloads such as the contents of the cart, and customer attributes (customer demographic data) to validate existing segments and discover new ones as campaign target groups.
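One simple way to sketch such clustering is to summarize each session by the relative frequency of its events and run k-means on those summaries. This choice is an assumption made for illustration. The vocabulary, the naive initialization, and the toy sessions are hypothetical, and sequence-aware distance measures may suit real session data better.

```python
from typing import List, Sequence


def event_frequencies(events: Sequence[str], vocab: Sequence[str]) -> List[float]:
    """Relative frequency of each vocabulary event within one session."""
    total = max(len(events), 1)
    return [sum(1 for e in events if e == v) / total for v in vocab]


def kmeans(points: List[List[float]], k: int, iterations: int = 20) -> List[int]:
    """Plain k-means; returns the cluster index assigned to each point.

    Initialization is deliberately naive (the first k points), so the
    result depends on input order.
    """
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid (squared Euclidean distance).
        for idx, p in enumerate(points):
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            assignment[idx] = dists.index(min(dists))
        # Move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


vocab = ["search", "view_product", "add_to_cart"]
sessions = [
    ["search", "search", "search", "view_product"],  # browsing-heavy
    ["view_product", "add_to_cart", "add_to_cart"],  # buying-heavy
    ["search", "search", "view_product"],  # browsing-heavy
    ["add_to_cart", "view_product", "add_to_cart", "add_to_cart"],  # buying-heavy
]
points = [event_frequencies(s, vocab) for s in sessions]
cluster_ids = kmeans(points, k=2)
```

The two browsing-heavy sessions land in one cluster and the two buying-heavy sessions in the other. Mapping clusters to named shopping strategies would still require inspecting the cluster centroids by hand.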