Interpretable Multimodal Learning for Intelligent Regulation in Online Payment Systems

Shuoyao Wang and Diwei Zhu

With the explosive growth of transaction activities in online payment systems, effective and real-time regulation has become a critical problem for payment service providers. Thanks to the rapid development of artificial intelligence (AI), AI-enabled regulation has emerged as a promising solution. One main challenge of AI-enabled regulation is how to utilize multimedia information, i.e., multimodal signals, in Financial Technology (FinTech). Inspired by the attention mechanism in natural language…

Deep LOB trading: Half a second please!

  • Jie Yin, H. Wong
  • Computer Science, Economics
    Expert Systems with Applications
  • 2022

Intelligent financial fraud detection practices in post-pandemic era

A comprehensive overview of intelligent fraud risk caused by the pandemic is provided, and the development of data types used in fraud detection practices, from quantitative tabular data to various unstructured data, is reviewed.

ℓ2 Norm is all Your Need: Infrared-Visible Image Fusion via Guided Transformation Minimization

A low-complexity fusion algorithm via guided transformation minimization, namely L2GTM, is proposed, which enjoys the uniqueness of the optimal solution to adaptively integrate information from both infrared and visible images.



Success Prediction on Crowdfunding with Multimodal Deep Learning

This work designed and evaluated advanced neural network schemes that combine information from different modalities to study the influence of sophisticated interactions among textual, visual, and metadata on project success prediction.

Attention is All you Need

A new simple network architecture, the Transformer, is proposed. It is based solely on attention mechanisms, dispensing with recurrence and convolutions entirely, and it generalizes well to other tasks, as shown by applying it successfully to English constituency parsing with both large and limited training data.

Dynamic Fusion With Intra- and Inter-Modality Attention Flow for Visual Question Answering

A novel method is proposed to dynamically fuse multimodal features with intra- and inter-modality information flow, which alternately passes dynamic information between and across the visual and language modalities and can robustly capture the high-level interactions between the language and vision domains.

AttnSense: Multi-level Attention Mechanism For Multimodal Human Activity Recognition

AttnSense introduces a framework that combines an attention mechanism with a convolutional neural network and a Gated Recurrent Unit network to capture the dependencies of sensing signals in both the spatial and temporal domains, which shows advantages in prioritized sensor selection and improves comprehensibility.

Bilinear Attention Networks

BAN is proposed, which finds bilinear attention distributions to utilize the given vision-language information seamlessly. The model is evaluated quantitatively and qualitatively on the visual question answering and Flickr30k Entities datasets, showing that BAN significantly outperforms previous methods and achieves new state-of-the-art results on both datasets.

Comprehensive Semi-Supervised Multi-Modal Learning

This paper proposes a novel Comprehensive Multi-Modal Learning (CMML) framework, which strikes a balance between the consistency and divergency modalities by considering the insufficiency in one unified framework, and which utilizes an instance-level attention mechanism to weight the sufficiency for each instance on different modalities.

Position Focused Attention Network for Image-Text Matching

This paper proposes a novel position focused attention network (PFAN) to investigate the relation between the visual and the textual views, and integrates the object position clue to enhance the visual-text joint-embedding learning.

VQA: Visual Question Answering

We propose the task of free-form and open-ended Visual Question Answering (VQA). Given an image and a natural language question about the image, the task is to provide an accurate natural language

Visualizing Data using t-SNE

A new technique called t-SNE is presented that visualizes high-dimensional data by giving each datapoint a location in a two- or three-dimensional map. It is a variation of Stochastic Neighbor Embedding that is much easier to optimize and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map.

Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models

This work proposes to incorporate generative processes into the cross-modal feature embedding, through which it is able to learn not only the global abstract features but also the local grounded features of image-text pairs.