RESEARCH PAPER

Auto Insurance Fraud Detection with Multimodal Learning

Jiaxi Yang1, Kui Chen1, Kai Ding1, Chongning Na1† & Meng Wang2

1Financial Technological Research Center, Zhejiang Lab, Hangzhou 361005, China

2School of Computer Science and Engineering, Southeast University, Nanjing 211189, China

Keywords: Auto Insurance Multi-modal Learning; Fraud detection; Ensemble learning

Citation: Yang, J.X., Chen, K., Ding, K., et al.: Auto insurance fraud detection with multimodal learning. Data Intelligence 5(2), 388-412 (2023). doi: 10.1162/dint_a_00191

Received: Feb. 10, 2022; Revised: Apr. 20, 2022; Accepted: July 1, 2022

ABSTRACT

In recent years, feature engineering-based machine learning models have made significant progress in auto insurance fraud detection. However, most models or systems focus only on structural data and do not utilize multi-modal data to improve fraud detection efficiency. To solve this problem, we adapt both natural language processing and computer vision techniques to our knowledge-based algorithm and construct an Auto Insurance Multi-modal Learning (AIML) framework. We then apply AIML to detect fraud behavior in auto insurance cases with data from real scenarios and conduct experiments to examine the improvement in model performance with multi-modal data compared to a baseline model with structural data only. A self-designed Semi-Auto Feature Engineer (SAFE) algorithm to process auto insurance data and a visual data processing framework are embedded within AIML. Results show that AIML substantially improves the model performance in detecting fraud behavior compared to models that only use structural data.

1. INTRODUCTION

According to the insurance industry development report issued by the China Insurance Regulatory Commission (CIRC) in April 2021, by the end of 2020 there were 235 insurance companies in total with total assets of 23 trillion RMB, among which the income from insurance premiums was 4.53 trillion RMB, making China the second largest insurance market in the world. Conservatively speaking, China's auto insurance fraud leakage accounts for at least 20% of the total compensation amount [1]. The estimate of China's auto

† Corresponding author: Chongning Na (Email: na@zhejianglab.com; ORCID: 0000-0003-2680-5774).

© 2023 Chinese Academy of Sciences. Published under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.


insurance compensation is 472.55 billion RMB in 2020; correspondingly, the loss caused by insurance fraud leakage is more than 90 billion RMB [2]. These huge losses have led to great efforts being spent on auto insurance fraud detection. Besides, concealed crime and gang crime also make investigation, evidence collection and automatic identification of fraud information challenging.

Many methods have been developed to analyze and predict insurance fraud behaviors, such as Bayesian modelling, clustering analysis, data mining and random forest [3, 4, 5, 6, 7, 8]. Most existing models rely only on structural tabular data and are highly likely to suffer from over-fitting and poor performance on real data due to sparse features, poor label quality and missing data in other modalities. Different types of data are collected in different stages of the insurance claim, such as structural data, photos of accident scenes, invoices and letters of responsibility, and these provide a promising means of automatically detecting auto insurance fraud with multi-modal information by using deep learning models. Extracting information from multi-modal data would provide useful anti-fraud insights for professionals in the insurance industry and an entry point for questioning high-risk cases. It also reduces the loss from insurance fraud and the cost of repeated investigation.


This paper proposes an ensemble learning method, Auto Insurance Multi-modal Learning (AIML). The system of AIML includes feature extraction from multi-modal data, feature engineering and tree-based classification. Computer vision and natural language processing models are necessary to extract factors, in the form of structural data, from images and texts that may be correlated with auto insurance fraud behaviors. AIML will be examined by its capability of detecting fraud behavior in real data from an auto insurance company. Our research will answer the following questions:

1. How to build AI models that could precisely predict high-risk cases?
2. How to use AI to make maximum utilization of multi-modal data that are collected during the insurance business?
3. How to use AI to extract risk factors from different types of data, and will these factors be helpful in predicting insurance fraud?

Results show that AIML could extract risk factors from multi-modal data efficiently and improve the model performance in predicting auto insurance fraud behavior. Compared to a baseline machine learning model that only uses structural data, the ensemble model in AIML increases the AUC by 12.24% in predicting fraud behavior with multi-modal data. The rest of the paper is organized as follows. Section 2 outlines the related work on auto insurance fraud detection and the state-of-the-art methods of multi-modal data processing. Section 3 describes details of the experimental dataset and the design of our evaluation. Section 4 shows the results and model performances based on our design. Section 5 concludes and discusses possible future topics.

2. RELATED WORK

In this section, we summarize related work in two main areas: auto insurance fraud detection and multi-modal data processing methods.


2.1 Auto Insurance Fraud Detection

Insurance fraud detection can be treated as a binary or multi-class classification problem. Many researchers have adapted machine learning models to auto insurance fraud detection and have achieved solid results. Viaene et al. [3], Kašćelan et al. [4] and Li et al. [5] examined the performance of Bayesian modelling, clustering analysis, data mining and random forest in auto insurance fraud detection. David et al. [9] identified features and characteristics of the population at high risk of fraud behavior by analyzing the age variable of the insurance holder. He et al. [6], Guo et al. [7] and Wang et al. [10] further explored the potential of deep learning models in fraud detection. Subudhi et al. [11] and Majhi et al. [12] built mixture models that could detect auto insurance fraud effectively. Tuo et al. [13] and Liu et al. [14] first discussed and studied the game theory of insurance fraud in China. Gui et al. [15] reviewed and classified the literature on moral hazard in auto insurance. Zhao et al. [16], Tang et al. [17] and Wang et al. [18] applied traditional machine learning methods to model insurance fraud behavior based on Chinese auto insurance market data. It was not until recently that Yan et al. [19, 20], Yu et al. [1] and Xu et al. [21] started to analyze the insurance fraud problem with deep learning models and mixture models and made progress in the field of auto insurance fraud detection.

Although different methods have been proposed to analyze different types of data generated from the business process of auto insurance, few multi-modal data-oriented models have been built in the field of auto insurance fraud detection. More high-risk factors remain to be extracted from multi-modal data, e.g., images and texts, to detect fraud behavior.

2.2 Multi-modal Data Processing

Multi-modal data processing has been widely adopted in scenarios such as multimedia [22], disaster monitoring [23] and intelligence analysis [24]. A representative work is GAIA, proposed by Li et al. [22]. The GAIA system consists of a text knowledge extraction branch and a visual knowledge extraction branch; it thus enables seamless search of complex graph queries and retrieves multimedia evidence including text, images and videos.

In the aspect of machine learning in multi-modal processing, Ngiam et al. [25] adopted the idea of shared representation learning to extend the unsupervised learning of auto-encoders to the field of multi-modal learning, aiming to map data from different modalities to a uni-dimensional space. The core idea is to use denoising auto-encoders to represent each modality separately and then use another auto-encoder to fuse them into a multi-modal representation at the neural network fusion layer. Another method is shared representation learning, whose idea is to project each modality into independent but constrained spaces for representation. For example, Wang et al. [26] proposed a compact hash coding method for multi-modal expression. In their work, a deep learning model is designed to generate hash codes based on the inter-modal and intra-modal correlation constraints, and then the redundancy of hash coding features is reduced based on an orthogonal regularization method. Peng et al. [27] proposed the concept of cross-media intelligence. It refers to the function of human brains across different sensory information, such as sight, hearing, language and other cognitive features of the outside world. It mainly

studies the techniques and application of multi-modal learning in cross-media reasoning analysis, including fine-grained image classification, cross-media retrieval, text-generated image and video description generation, etc. Wu et al. [28] proposed a neural network that combines both visual information and text information to recognize and disambiguate entities in short texts, whose core idea is to connect visual and text information through embedding generated representation learning and to introduce a common concern mechanism for fine-grained information interaction. Experiments show that this method is superior to methods that only rely on text information.

In the aspect of knowledge engineering, a representative work is from Mousselly et al. [29], who constructed a unified knowledge embedding based on visual features, text features and structural features of symbolic knowledge. Compared with traditional structure-based knowledge graph representation learning, their performance in link prediction and entity classification tasks was improved. Xie et al. [30] later proposed an improved model, IKRL, whose core idea is to conduct joint modeling of visual features and structural features of a knowledge graph, so as to generate a higher-quality multi-modal knowledge graph embedding through connections between different types of modality. Chen et al. [31] explored how to effectively and jointly map and model cross-modal semantic information in the knowledge graph, thus laying an important foundation for supporting intelligent application services for multi-modal content. Guo et al. [32] further explored the entity alignment task of multi-modal knowledge graphs, mainly extending the multi-modal entity alignment task from Euclidean space to hyperbolic space.

Since there are many relatively mature algorithms for each type of data, digging more information and factors from both text data and visual data in the scenario of auto insurance is practical and promising.

3. FRAMEWORK

In this section, the multi-modal insurance fraud detection framework of AIML is explained in detail. In general, our framework includes three modules, as shown in Figure 1.

Figure 1. AIML workflow.


Structural data will be cleaned and processed by a feature engineering model to extract and generate risk factors for fraud behavior. Both text and visual data will be processed by systems that are embedded with Natural Language Processing (NLP) models and Computer Vision (CV) algorithms to extract the corresponding risk factors. Finally, the ensemble factors will be assigned to a machine learning model to predict fraud behavior.

3.1 Structural Data and Baseline Model

The workflow of baseline model in AIML is:

1. Data are collected based on cases and stages from insurance companies, including case reporting
stage, investigating stage and loss verification stage (All data are labeled and verified by experts and
professionals from insurance companies).

2. Collected data are then cleaned and pre-processed, i.e., cases with more than 50% missing information will be removed and categorical variables will be one-hot encoded.

3. New features are generated with feature engineering algorithms from original features.
4. New features are fed to a machine learning model to achieve predicted outcomes.
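The cleaning step in the workflow above can be sketched in a few lines. This is a minimal illustration only, not the authors' pipeline: the field names (`case_id`, `province`, `claim`, `stage`) and the helper functions `drop_sparse_cases` and `one_hot` are hypothetical, and missing values are assumed to be represented as `None`.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[object]]


def drop_sparse_cases(cases: List[Record], max_missing: float = 0.5) -> List[Record]:
    """Keep cases whose share of missing (None) fields is at most max_missing."""
    kept = []
    for case in cases:
        missing = sum(1 for value in case.values() if value is None)
        if missing / len(case) <= max_missing:
            kept.append(case)
    return kept


def one_hot(cases: List[Record], column: str) -> List[Record]:
    """Replace a categorical column with one 0/1 column per observed level."""
    levels = sorted({c[column] for c in cases if c[column] is not None})
    encoded = []
    for case in cases:
        row = {k: v for k, v in case.items() if k != column}
        for level in levels:
            row[f"{column}_{level}"] = int(case[column] == level)
        encoded.append(row)
    return encoded


# Hypothetical toy records; real cases carry many more fields.
cases = [
    {"case_id": "A1", "province": "A", "claim": 1200.0, "stage": "report"},
    {"case_id": "A2", "province": None, "claim": None, "stage": None},  # 75% missing
    {"case_id": "A3", "province": "B", "claim": 800.0, "stage": None},  # 25% missing
]
clean = one_hot(drop_sparse_cases(cases), "province")
```

Case `A2` is dropped because more than half of its fields are missing, and `province` becomes the two indicator columns `province_A` and `province_B`.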

Feature engineering is an essential part of prediction problems in real-case scenarios. It is divided into feature classification and feature derivation: feature classification refers to the classification of original features based on their distributions; feature derivation refers to feature synthesis based on the classified features in order to obtain richer feature combinations. After comparing multiple popular machine learning and deep learning methods, AIML combines Semi-Auto Feature Engineering (SAFE), a self-designed, semi-automatic method for feature engineering, with an eXtreme Gradient Boosting tree (XGB) [33] to predict whether a case is fraudulent or not.
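To make the idea of feature derivation concrete, the sketch below synthesizes new features from pairs of numeric features. It is not the SAFE algorithm itself, whose internals are not specified here; the choice of pairwise sum, difference and product as derivation operators, the function `derive_features` and the column names are all assumptions for illustration.

```python
from itertools import combinations
from typing import Dict, List


def derive_features(row: Dict[str, float], numeric_cols: List[str]) -> Dict[str, float]:
    """Return the row extended with pairwise sum, difference and product features."""
    out = dict(row)
    for a, b in combinations(numeric_cols, 2):
        out[f"{a}_plus_{b}"] = row[a] + row[b]
        out[f"{a}_minus_{b}"] = row[a] - row[b]
        out[f"{a}_times_{b}"] = row[a] * row[b]
    return out


# Hypothetical numeric features of a single case.
row = {"claim_amount": 1200.0, "repair_cost": 900.0, "days_to_report": 3.0}
derived = derive_features(row, ["claim_amount", "repair_cost", "days_to_report"])
```

Three numeric inputs give three pairs and therefore nine derived columns, which shows how quickly a derivation step of this kind inflates the feature count.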

3.2 Unstructured Text Data Processing

Extracting risk factors from auto insurance case description texts is treated as a set of NLP text mining tasks. There are six text data mining tasks in total in AIML, i.e., recognizing driving status, type of accident, type of roads, cause of accident, number of cars and parties involved in the accident.


Table 1 illustrates the key information we extracted from the unstructured text data:

Table 1. Example of text data.

Descriptions | Num. of Cars | Cause of Accident | Driving Status | Type of Accident | Type of Roads | Parties Involved
标的车与三者车高速公路行驶相撞，两车受损 (Insured car crashed into a third-party car when driving on the highway. Both cars were damaged.) | 双车事故 (Two cars accident) | 疏忽 (Negligence) | 行驶状态 (Driving) | 撞伤 (Crashed) | 高速公路 (Highway) | 车/车 (Car/Car)
标的车与障碍物高速公路行驶相撞，本车受损 (Insured car crashed into an obstacle when driving on the highway. Insured car was damaged.) | 单车事故 (Single car accident) | 疏忽 (Negligence) | 行驶状态 (Driving) | 撞伤 (Crashed) | 高速公路 (Highway) | 车/障碍物 (Car/Object)

AIML uses a multi-task classification framework to achieve the goal of risk factor mining, wherein a common backbone representation learning model is shared by the six text mining tasks. The advantage of multi-task learning is that it could reduce computational complexity and the cost of training, while taking into account different levels of correlation between tasks. Specifically, the feature extraction layer is fully shared and based on the Bidirectional Encoder Representations from Transformers (BERT) pre-trained model, combined with multi-task loss linear fusion fine-tuning and the Conditional Random Fields (CRF) method to achieve multi-task learning. The multi-task model is shown in Figure 2, including an input layer, encoding layer, fully connected layer (FC layer), activation layer, CRF layer and output layer.


Figure 2. Unstructured text data processing.


First, AIML treats the description of the accident as input and chooses BERT (Chinese version) as the pre-trained encoder. BERT could dynamically represent the meaning and relevance of characters in context by using a powerful multi-directional self-attention mechanism combined with self-supervised learning, so as to construct a vector that represents the semantic feature of the whole sentence after weighted combination. In addition, the BERT pre-trained model uses massive data, including Wikipedia and other knowledge, as training corpus to ensure its applicability to insurance text. BERT could still achieve fairly good classification accuracy by adding a fully connected layer to the output layer, even without fine-tuning of model parameters.

Second, AIML uses multiple classifiers to extract multi-event factors, taking the sequence vector output by BERT as their inputs. A cost function is defined for each classification task, and each task is considered independent of the others. Parameters of the newly added FC layer and the BERT sequence output layer are tuned by the multi-task loss linear fusion method.
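One common reading of "multi-task loss linear fusion" is a weighted sum of the per-task losses, which is then minimized jointly. The sketch below illustrates that reading only; the use of cross-entropy as the per-task cost, the equal weights, the task names and the predicted probabilities are all assumptions and are not taken from the AIML implementation.

```python
import math
from typing import Dict, List


def cross_entropy(probs: List[float], target: int) -> float:
    """Negative log-likelihood of the target class under predicted probabilities."""
    return -math.log(probs[target])


def fused_loss(
    task_probs: Dict[str, List[float]],
    task_targets: Dict[str, int],
    weights: Dict[str, float],
) -> float:
    """Linear fusion: weighted sum of the independent per-task losses."""
    return sum(
        weights[task] * cross_entropy(task_probs[task], task_targets[task])
        for task in task_probs
    )


# Hypothetical classifier outputs for one accident description.
task_probs = {"driving_status": [0.9, 0.1], "road_type": [0.2, 0.5, 0.3]}
task_targets = {"driving_status": 0, "road_type": 1}
weights = {"driving_status": 0.5, "road_type": 0.5}
loss = fused_loss(task_probs, task_targets, weights)
```

Because the fused loss is a single scalar, one backward pass updates the shared encoder and every task head at the same time, which is where the training-cost saving of multi-task learning comes from.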

Finally, based on the correlation between computing tasks, AIML uses CRF to calculate the maximum joint probability for multiple classification results. CRF is most commonly used in the field of sequential annotation in NLP, using joint probability to calculate the co-occurrence relationship between text and annotation to optimize the overall accuracy of sequential annotation. Here we use a similar mechanism to optimize the overall accuracy of multi-task prediction with a CRF layer. The original CRF must satisfy two prerequisites: (1) the distribution is exponential, and (2) only adjacent elements are correlated. The input for CRF is the output sequence vector for the multi-classification task, presented as:

$$P(Y \mid X) = P(y_i, y_{i-1}, \ldots, y_0 \mid X) = \frac{1}{Z(X)} \exp\big(f(y_i, y_{i-1}, \ldots, y_0, X)\big) \tag{1}$$

$$f(y_i, y_{i-1}, \ldots, y_0, X) = h(y_i, X) + g(y_i, y_{i-1}) + \cdots + g(y_1, y_0) + h(y_0, X) \tag{2}$$

where P indicates the probability function, Z indicates the normalization factor, h indicates the mapping function between a single output and the global input, g indicates the function for local correlation between output elements, y indicates a single output element and X indicates the global input.
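The factorization described above (a score f built from single-output terms h and adjacent-output terms g, exponentiated and normalized by Z) can be evaluated by brute force when the number of tasks and labels is small. The sketch below does exactly that; the task labels, the numeric scores and the function names are hypothetical, and a practical CRF layer would use dynamic programming rather than enumeration.

```python
import math
from itertools import product
from typing import Dict, List, Sequence, Tuple


def crf_score(
    y: Sequence[str],
    unary: List[Dict[str, float]],
    pairwise: Dict[Tuple[str, str], float],
) -> float:
    """f(y, X): sum of h(y_k, X) plus g over adjacent outputs (y_{k-1}, y_k)."""
    score = sum(unary[k][label] for k, label in enumerate(y))
    score += sum(pairwise.get((y[k - 1], y[k]), 0.0) for k in range(1, len(y)))
    return score


def crf_distribution(
    unary: List[Dict[str, float]],
    pairwise: Dict[Tuple[str, str], float],
) -> Dict[Tuple[str, ...], float]:
    """P(Y | X) = exp(f) / Z, enumerating every joint labeling."""
    label_sets = [list(scores) for scores in unary]
    joint = list(product(*label_sets))
    weights = [math.exp(crf_score(y, unary, pairwise)) for y in joint]
    z = sum(weights)  # normalization factor Z
    return {y: w / z for y, w in zip(joint, weights)}


# Hypothetical scores: task 0 = number of cars, task 1 = parties involved.
unary = [
    {"single_car": 0.2, "two_cars": 0.1},
    {"car/object": 0.0, "car/car": 0.0},
]
pairwise = {("single_car", "car/object"): 1.0, ("two_cars", "car/car"): 1.0}
dist = crf_distribution(unary, pairwise)
best = max(dist, key=dist.get)
```

The pairwise term rewards consistent combinations (a single-car accident with a car/object collision), so the jointly most probable labeling can differ from picking each task's top label independently.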

3.3 Visual Data and Processing

Based on the scenarios of auto insurance fraud detection, this paper mainly focuses on three techniques, namely, Object Detection, Optical Character Recognition (OCR) and Pedestrian Re-identification (ReID). We design a systematic approach for AIML, as shown in Figure 3, to process and extract risk factors from visual data, i.e., photos and pictures of car accidents.

Raw visual data are stored in folders with the case ID as the folder name. The first step is to classify pictures into seven categories, i.e., accident scene, car components, invoices, driver license, driving license, photos of inspectors and cars, and others. A ResNet classification model is trained on 413 cases with 1,392 well-labeled pictures. AIML then applies the trained model to a much larger test set of 22,385 pictures to make a rough classification, as those 22,385 pictures were originally unlabeled. Following a manual fine classification, all pictures with correct categories are used to re-train the classification model.


Figure 3. Visual data processing.

The second step is to extract risk factors from each category of pictures. For pictures in the categories of accident scene and car components, a Yolov5 model is used to identify the damage conditions of cars and a ResNet model is used to extract scene information such as daytime or nighttime. For pictures that contain text information, AIML uses OCR to recognize information from licenses and invoices. For pictures that contain both investigators and cars, AIML uses ReID to identify different investigators and check anomalies, i.e., whether they appeared in previously detected fraud cases or appear in multiple cases.

In the last step, factors extracted from visual data will be merged with structural data by case ID to improve model performance.
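The merge by case ID amounts to a left join of the visual factors onto the structural records. The sketch below shows one way to do this with plain dictionaries; the feature names, the case IDs and the decision to fill absent visual factors with `None` are assumptions for illustration, not details taken from AIML.

```python
from typing import Dict, Optional

Features = Dict[str, Optional[float]]


def merge_by_case_id(
    structural: Dict[str, Features],
    visual: Dict[str, Features],
) -> Dict[str, Features]:
    """Left-join visual factors onto structural records, keyed by case ID."""
    visual_cols = sorted({col for feats in visual.values() for col in feats})
    merged = {}
    for case_id, feats in structural.items():
        row = dict(feats)
        extra = visual.get(case_id, {})
        for col in visual_cols:
            # Cases without usable pictures keep the column, marked as missing.
            row[col] = extra.get(col)
        merged[case_id] = row
    return merged


# Hypothetical records: only case C001 has extracted visual factors.
structural = {"C001": {"claim_amount": 1200.0}, "C002": {"claim_amount": 800.0}}
visual = {"C001": {"is_night": 1.0, "car_count": 2.0}}
merged = merge_by_case_id(structural, visual)
```

Keeping every structural case and marking absent visual factors as missing preserves the sample count, which matters when the merged table is compared against a baseline trained on the same cases.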

4. RESULTS

In this section, we report our experimental results on a real-world auto insurance dataset. The results of the baseline model are presented first. Then, risk factors extracted from text data and visual data are added to show the effectiveness of multi-modal learning in improving the fraud detection capability.


4.1 Dataset

Experimental data are collected and resampled from 4,946 auto insurance cases from November 10, 2014, to October 26, 2020, among which 3,613 are non-fraud cases and 1,333 are confirmed fraud cases. Data are organized on a per-car basis, i.e., cases containing multiple car accidents are treated as multiple data samples, indicated by a compound Case Unique ID (CaseUID, including both case ID and car plate). Therefore, the number of samples in the entire dataset is slightly larger than the number of cases, i.e., including 5,034 non-fraud Case Unique IDs and 1,413 fraud Case Unique IDs, respectively. There are 216 fields of data in total, containing information collected from the case reporting stage, investigating stage and loss verification stage. Variables with over 70% missing information are excluded; variables with information that is not suitable for fitting into the XGB model are excluded (e.g., ID-type variables, names, etc.); only structural data, i.e., mainly categorical and numerical data, are used in the baseline model.

4.2 Results of Baseline Models

After the original variables are pre-processed by SAFE, i.e., our self-designed feature selection and feature interaction tool, there are 1,155 features in total, which are generated from the original 216 variables by combining, one-hot encoding, interacting, adding and subtracting them according to their types. One special Boolean feature named 'Compensation Type_Normal Case' is excluded, because it is a huge giveaway in predicting fraud cases. In order to evaluate the performance of the model comprehensively, four criteria, precision, recall, F1-score and Area Under the Curve (AUC), will be used.

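$$\text{Precision} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Positive}} \tag{3}$$

$$\text{Recall} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Negative}} \tag{4}$$

$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{5}$$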
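As a quick numerical illustration of precision, recall and F1-score, the sketch below computes them from confusion-matrix counts. The counts are hypothetical; the last line re-derives the F1-score from the baseline precision (0.7095) and recall (0.4441) reported in this section.

```python
def precision(tp: int, fp: int) -> float:
    """Share of predicted fraud cases that are truly fraud."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Share of true fraud cases that the model catches."""
    return tp / (tp + fn)


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# Hypothetical confusion-matrix counts, for illustration only.
p = precision(tp=30, fp=10)
r = recall(tp=30, fn=20)
f1 = f1_score(p, r)

# Consistency check against the reported baseline precision and recall.
baseline_f1 = f1_score(0.7095, 0.4441)
```

The recomputed `baseline_f1` agrees with the reported baseline F1 score of 0.5462 to three decimal places; the small residual comes from rounding in the published precision and recall.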

All 6,447 samples were randomly separated into a train set and a test set with a ratio of 80%/20%, and all 1,154 features are fed to the XGB model. The trained model has an overall accuracy of 0.8364, with precision equal to 0.7095, recall equal to 0.4441 and an F1 score of 0.5462. The plots of the ROC and PR curves are presented in Figure 4 and Figure 5, respectively.
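A random 80%/20% split of the 6,447 samples can be sketched as follows. The seed, the rounding of the test-set size and the function name are assumptions, so the exact split sizes used in the experiments may differ by one sample.

```python
import random
from typing import Iterable, List, Tuple


def split_ids(
    ids: Iterable[int], test_ratio: float = 0.2, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Shuffle sample IDs reproducibly and cut off the test share."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


train_ids, test_ids = split_ids(range(6447))
```

Since the data are organized per car, samples from the same case can land on both sides of a purely random split; splitting by case ID instead would be a stricter design, though nothing here indicates which variant was used.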

Based on the results for the baseline model, the model performance is rather moderate in predicting fraud behavior in auto insurance cases.


Figure 4. ROC curve for baseline model.

Table 2. Feature importance baseline model.

Ranks | Feature | Feature importance
1 | Type of Cases | 0.136462
2 | Type of Compensation | 0.068291
3 | Gender of Driver Unknown (T/F) | 0.025964
4 | Case Only Contain Automobile Damage Insurance | 0.022832
5 | Case Contain 4S Store (T/F) | 0.014042
6 | Third Party Insurance | 0.012947
7 | Province A | 0.011575
8 | Valid Date for Auto Insurance Cases | 0.010736
9 | Time Length for Damage Assessment | 0.009672
10 | Case Contain Automobile Damage Insurance (T/F) | 0.009474
11 | Province B | 0.009463
12 | Province C | 0.008983
13 | Object Damage (T/F) | 0.008516
14 | Unknown Auto Insurance Type (T/F) | 0.007255
15 | Province D | 0.007206


4.3 Results of Unstructured Text Data Processing

Figure 5. PR curve for baseline model.


In order to extract information that is relevant to fraud detection from text data, we formulated five text classification tasks, each of which is a multi-class classification task. For each task, we defined text labels, i.e., 12 types of accidents, types of driving status, 11 types of cause of accident, 4 types of car numbers and 5 types of roads. We manually labeled each accident description text with those five types of labels. To simplify the labeling work, we first selected 750 relatively uncorrelated samples and labeled them manually. The uncorrelation is achieved by clustering the texts and selecting texts from different clusters. Then, for each label within a single task, we ensured at least 35 samples among those 750 labeled data samples. In the end, we obtained a small data set to train a coarse classifier for each of the five tasks. The coarse classifier is then used to categorize all text samples, and incorrect categorization results are manually adjusted. Final classification results for those five tasks are shown in Table 3 below.


Table 3. Precision in categorizing text data with BERT.

Criterion | Driving status | Type of accident | Type of roads | Cause of accident | Number of cars
F1-Score | 0.93 | 0.84 | 0.79 | 0.85 | 0.94


Five new features were generated from text data. After one-hot encoding, there are 45 new Boolean factors. The trained XGB model with factors extracted from text has an overall accuracy of 0.8481, with precision equal to 0.7473, recall equal to 0.4755 and an F1 score of 0.5812.

According to Table 4, although the number of features extracted and derived from text data, i.e., 45, is relatively small compared to the original number of features, i.e., 1,154, there is a significant improvement in model performance. Both recall and F1 score increase by around 6–7%. The plots of the ROC and PR curves are presented in Figure 6 and Figure 7, respectively.

Table 4. Model performance for baseline model and model with text factors.

Criterion | Accuracy | Precision | Recall | F1-Score | AUC
Baseline Model | 0.8364 | 0.7095 | 0.4441 | 0.5462 | 0.8325
Model with Text Factors | 0.8481 | 0.7473 | 0.4755 | 0.5812 | 0.841
Increase | 1.40% | 5.33% | 7.07% | 6.41% | 1.02%
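The "Increase" row of Table 4 is a relative change with respect to the baseline, not a difference in percentage points. The short check below recomputes it from the two rows of scores in Table 4; only the function and variable names are introduced here.

```python
def relative_increase(baseline: float, new: float) -> float:
    """Relative change of a metric with respect to the baseline, in percent."""
    return (new - baseline) / baseline * 100.0


# (baseline model, model with text factors), copied from Table 4.
table4 = {
    "Accuracy": (0.8364, 0.8481),
    "Precision": (0.7095, 0.7473),
    "Recall": (0.4441, 0.4755),
    "F1-Score": (0.5462, 0.5812),
    "AUC": (0.8325, 0.841),
}
increases = {
    name: round(relative_increase(base, new), 2)
    for name, (base, new) in table4.items()
}
```

For recall, for example, the absolute gain is 0.0314, which is 7.07% of the baseline value 0.4441.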


Figure 6. ROC curve for model with text factors.


Figure 7. PR curve for fraud detection with text factors.

The feature importance of some of the factors extracted from text data is listed in Table 5, along with their rankings.

Table 5. Feature importance for text factors.

Ranks | Extracted factors from text | Feature importance
24 | Single_Car_Accident | 0.005487
30 | Negligence | 0.005257
48 | Transportation_Facility | 0.004256
53 | Other_Minicars | 0.004139
61 | Rear_End | 0.003830
73 | Auxiliary_Buildings | 0.003349
95 | Third-party_Responsibility | 0.002961

4.4 Results of Visual Data Encoding

As mentioned in Section 3.3, the first task of visual data mining is the categorization of the raw data.
The accuracy of the automatic multi-label classification algorithm is listed in Table 6 for each category,
wherein most categories are well classified.


Table 6. Categories of visual data.

Task | Algorithm | Accuracy | Categories | Training pictures
Classification | ResNet | 97.00% | 7 Categories | 22,385
Re-identification | ReID | 83.23% | 79 persons | 453
Invoice recognition | OCR | 86.58% | 8 Features | 65
Driver license recognition | OCR | 80.67% | 9 Features | 403
Driving license recognition | OCR | 76.37% | 8 Features | 436
Day/Night recognition | ResNet | 99.64% | Day or Night | 3,321
Scene recognition | ResNet | 62.67% | 5 Categories | 3,321
Car plate recognition | Yolov5/LPRNet | 84.76% | NA | 3,983
Car parts detection | Yolov5 | 62.39% | 8 Parts | 8,059
Damage detection | Yolov5 | 21.30% | 5 Categories | 11,023

Detailed risk factors extracted from visual data are listed in Table 8.

Table 8. Visual risk factors descriptions.

High-risk Factors | Extracted Factors from Pictures | Algorithms | Processing
Correlation between investigators | ReID 0/1 | ReID | Recognition and matching between multiple face images
Cost | Cost of repair | OCR | Recognition of invoice
Car brand | Car brand | OCR | Recognition of driving license
License type | Car type | OCR | Recognition of driver license
Stone | Boolean 0/1 | Yolov5 | Object Detection
Cars involved | Car count | Yolov5 | Object Detection
Recognition of damage | Scratch, break, deformation etc. | Yolov5 | Object Detection
Location of damage | Nearest car parts to damage spot | Yolov5 | Object Detection
Daytime/Nighttime | Boolean 0/1 | ResNet | Environmental identification
Road condition | Categorical variable | ResNet | Environmental identification


All factors are designed and defined based on prior expert knowledge and reports on detecting fraud
cases. However, due to the quality of the pictures, only 10 variables are relatively complete (less than 30%
missing) and were extracted from documentary pictures. After one-hot encoding of the categorical variables,
there are 29 new features from visual data. The trained XGB model with factors extracted from text has an
overall accuracy of 0.8736, with precision of 0.724, recall of 0.6107 and an F1 score of 0.6625.
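
The variable selection and encoding step described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the 30% missingness threshold comes from the text, while the record layout, the column names (`road_condition`, `car_count`, `is_night`) and the helper functions are hypothetical.

```python
from typing import Any

MISSING_THRESHOLD = 0.30  # keep variables with less than 30% missing values (from the text)


def missing_rate(records: list[dict[str, Any]], column: str) -> float:
    """Fraction of records in which `column` is absent or None."""
    missing = sum(1 for r in records if r.get(column) is None)
    return missing / len(records)


def select_complete_columns(records, columns, threshold=MISSING_THRESHOLD):
    """Return the columns whose missing rate is strictly below `threshold`."""
    return [c for c in columns if missing_rate(records, c) < threshold]


def one_hot(records, column):
    """Expand a categorical column into 0/1 indicators, one per observed level."""
    levels = sorted({r[column] for r in records if r.get(column) is not None})
    return [
        {f"{column}={level}": int(r.get(column) == level) for level in levels}
        for r in records
    ]


# Hypothetical toy records; None marks a value that could not be read from the pictures.
claims = [
    {"road_condition": "highway", "car_count": 2, "is_night": 0},
    {"road_condition": "urban", "car_count": 1, "is_night": None},
    {"road_condition": "highway", "car_count": None, "is_night": None},
    {"road_condition": "rural", "car_count": 1, "is_night": 1},
]

kept = select_complete_columns(claims, ["road_condition", "car_count", "is_night"])
encoded = one_hot(claims, "road_condition")
```

In this toy data `is_night` is missing in half of the records and is therefore dropped, while the three observed levels of `road_condition` become three indicator columns. The same expansion of categorical variables is what turns 10 extracted variables into a larger number of model features.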

According to Table 9, model performance improves substantially with these visual features. Both recall
and F1-score increase by over 20% relative to the baseline, which may be because visual data
contain key information that is not included in structural data. The ROC and PR curves are
presented in Figure 8 and Figure 9, respectively.

Table 9. Model performance for baseline model and model with visual factors.

| Criterion | Accuracy | Precision | Recall | F1-score | AUC |
|---|---|---|---|---|---|
| Baseline model | 0.8364 | 0.7095 | 0.4441 | 0.5462 | 0.8325 |
| Model with visual factors | 0.8837 | 0.7456 | 0.6489 | 0.6939 | 0.9288 |
| Increase | 5.66% | 5.09% | 46.12% | 27.04% | 11.57% |
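
The "Increase" values in Table 9 are consistent with a relative change over the baseline rather than a difference in percentage points. The short check below recomputes them from the reported metrics and also verifies that the reported F1-score matches the harmonic mean of the reported precision and recall. The function and variable names are illustrative only.

```python
def relative_increase(baseline: float, improved: float) -> float:
    """Relative change of `improved` over `baseline`, in percent."""
    return (improved - baseline) / baseline * 100.0


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Values copied from Table 9.
baseline = {"accuracy": 0.8364, "precision": 0.7095, "recall": 0.4441, "f1": 0.5462, "auc": 0.8325}
visual = {"accuracy": 0.8837, "precision": 0.7456, "recall": 0.6489, "f1": 0.6939, "auc": 0.9288}

# Relative increase per metric, rounded to two decimals as in the table.
increase = {name: round(relative_increase(baseline[name], visual[name]), 2) for name in baseline}

# F1 recomputed from the reported precision and recall of the visual model.
f1_check = round(f1_score(visual["precision"], visual["recall"]), 4)
```

For example, recall rises from 0.4441 to 0.6489, which is about 20 percentage points in absolute terms but 46.12% relative to the baseline.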


Figure 8. ROC curve for fraud detection model with visual factors.


Figure 9. PR curve for fraud detection model with visual factors.

To be more specific, some anonymised visual data are shown in Figure 10 to Figure 14.

Figure 10. Car component pictures.


Figure 11. Invoice pictures.


Figure 12. Inspectors + Cars pictures.

Figure 13. Driver and driving license pictures.


Figure 14. Accident scene pictures.

The rectangle annotations mark different parts of cars, damage on cars and inspectors hired by insurance
companies. These annotations are then converted to structural features that serve as risk factors from visual data.
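
One way such rectangle annotations could be turned into structural features is sketched below. It is an assumption-laden illustration rather than the pipeline used in the paper: the `Box` type, the label strings and the choice of centre-to-centre distance for "nearest car part to damage spot" are all hypothetical.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle annotation: a label plus pixel corners."""
    label: str
    x1: float
    y1: float
    x2: float
    y2: float

    @property
    def center(self) -> tuple[float, float]:
        return ((self.x1 + self.x2) / 2, (self.y1 + self.y2) / 2)


def center_distance(a: Box, b: Box) -> float:
    """Euclidean distance between the centres of two boxes."""
    (ax, ay), (bx, by) = a.center, b.center
    return hypot(ax - bx, ay - by)


def boxes_to_features(cars: list[Box], parts: list[Box], damages: list[Box]) -> dict:
    """Flatten the detections of one picture into a feature record."""
    features = {"car_count": len(cars), "has_damage": int(bool(damages))}
    if damages and parts:
        damage = damages[0]  # simplification: describe only the first damage box
        nearest = min(parts, key=lambda part: center_distance(part, damage))
        features["damage_type"] = damage.label
        features["damage_location"] = nearest.label
    return features


# Hypothetical detections for a single accident picture.
cars = [Box("car", 0, 0, 400, 300), Box("car", 450, 50, 800, 300)]
parts = [Box("front_bumper", 0, 200, 150, 300), Box("left_door", 150, 100, 300, 250)]
damages = [Box("scratch", 160, 120, 220, 180)]

features = boxes_to_features(cars, parts, damages)
```

Here the scratch is assigned to `left_door` because that part's centre is closest to the centre of the damage box. Categorical outputs such as `damage_type` and `damage_location` would still need encoding before being passed to a model.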

4.5 Results for Ensemble Model

Finally, we add the high-risk factors extracted from both text data and visual data to our baseline
model in order to check the improvement in model performance brought by multi-modal data.

Table 10. Model performance for ensemble model.

| Criterion | Accuracy | Precision | Recall | F1-Score | AUC |
|---|---|---|---|---|---|
| Baseline model | 0.8364 | 0.7095 | 0.4441 | 0.5462 | 0.8325 |
| Model with text factors | 0.8481 | 0.7473 | 0.4755 | 0.5812 | 0.841 |
| Model with visual factors | 0.8837 | 0.7456 | 0.6489 | 0.6939 | 0.9288 |
| Ensemble model | 0.8713 | 0.7143 | 0.6107 | 0.6584 | 0.9344 |
| Overall increase compared to baseline model | 4.17% | 0.68% | 37.51% | 20.54% | 12.24% |


The results in Table 10 show a substantial increase in model performance after adding factors
extracted from multi-modal data in auto insurance cases. Compared to the baseline model, AUC
increases by 12.24% after adding 45 text features and 29 visual features. The ensemble model has the
highest AUC (0.9344 versus 0.9288 for the model with visual factors), although its accuracy, precision,
recall and F1-score are lower than those of the model with visual factors alone. The ROC and PR curves
for the ensemble model are presented in Figure 15 and Figure 16, respectively.
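
Adding text and visual factors to the baseline can be read as feature-level fusion: the feature blocks of each claim are joined into one row before a single classifier is trained. The sketch below shows this under that assumption. The claim identifiers, feature names and the `fuse_features` helper are hypothetical, and the paper does not specify how the blocks were joined.

```python
from typing import Optional

FeatureRow = dict[str, Optional[float]]


def fuse_features(
    structured: dict[str, FeatureRow],
    text: dict[str, FeatureRow],
    visual: dict[str, FeatureRow],
) -> dict[str, FeatureRow]:
    """Join per-claim feature blocks into one row per claim.

    Claims without a text or visual block get None for those columns,
    so every fused row has the same set of columns.
    """
    text_cols = sorted({name for row in text.values() for name in row})
    visual_cols = sorted({name for row in visual.values() for name in row})
    fused: dict[str, FeatureRow] = {}
    for claim_id, base in structured.items():
        row: FeatureRow = dict(base)
        for name in text_cols:
            row[f"text_{name}"] = text.get(claim_id, {}).get(name)
        for name in visual_cols:
            row[f"visual_{name}"] = visual.get(claim_id, {}).get(name)
        fused[claim_id] = row
    return fused


# Hypothetical feature blocks keyed by claim ID.
structured = {"c1": {"claim_amount": 5200.0}, "c2": {"claim_amount": 830.0}}
text = {"c1": {"cause_driver_fault": 1.0}, "c2": {"cause_driver_fault": 0.0}}
visual = {"c1": {"car_count": 2.0, "is_night": 1.0}}  # no usable pictures for c2

fused = fuse_features(structured, text, visual)
```

Missing blocks are kept as `None` rather than filled with zeros, so that a downstream model which supports missing values, as gradient-boosted tree libraries such as XGBoost do, can treat them as missing instead of as a real measurement.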


Figure 15. ROC curve for ensemble model.

4.6 Limitation Analysis

Although adding factors extracted from the multi-modal data yields good model performance, we still
observe some limitations of the current scheme. First, the categorization of text data is
extremely imbalanced. For example, the main causes of accidents are driver’s fault and third-party responsibility,
while other causes, e.g., bad weather, are not adequately represented in the current dataset. Moreover, the
consequences caused by driver’s fault are also imbalanced and varied. A case is easily misclassified when
the consequence of one accident is semantically close to that of another. Examples are shown below.

When driving through the water section, vehicle flameout. Car damaged. —Single car accident
—Flooding —Driving

行驶到积水路段，车辆熄火。本车有损；—单车事故—水淹—行驶状态


Figure 16. PR curve for ensemble model.

2). Heavy rain flooded car in the parking lot. Car damaged. —Single car accident —Flooding —Parking
暴雨积水，车辆在停车场被淹。本车有损；—单车事故—水淹—停放

The cause of the first accident should be driver’s fault, while the cause of the second accident should be
bad weather. However, due to the imbalanced number of samples for these two causes in
the training data, the model may easily misclassify the cause of the first case as weather.
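
The degree of imbalance between accident causes can be quantified with a simple count. The sketch below also computes inverse-frequency class weights, a common mitigation that is not described in the paper and is shown here only as an assumption. The label names and counts are invented for illustration, and the weight formula is the same one scikit-learn uses for `class_weight="balanced"`.

```python
from collections import Counter


def imbalance_ratio(labels: list[str]) -> float:
    """Size of the largest class divided by the size of the smallest class."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def inverse_frequency_weights(labels: list[str]) -> dict[str, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Invented label distribution, for illustration only.
causes = ["driver_fault"] * 70 + ["third_party"] * 25 + ["bad_weather"] * 5

ratio = imbalance_ratio(causes)
weights = inverse_frequency_weights(causes)
```

With these invented counts the rarest cause is 14 times less frequent than the most common one and receives a correspondingly larger weight, so errors on it would count more during training.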

Also, due to the limitations of BERT (mainly its adaptation to the specific context of car accident
scenarios), semantic ambiguity and limited data samples, the causal relationship or the sequence of multiple
elements in one sentence cannot be identified clearly. A large portion of the text data is still left unused,
for example, information on injuries to drivers or other people involved in the accident, and the descriptions of
injuries in doctors’ notes and traffic police reports.

Due to the poor quality of much of the visual data, only 10 variables were extracted from visual data with
satisfactory accuracy. Pictures for some cases cannot be detected or recognized, which leads to a large amount of
missing data and consequently brings a data leakage issue to some extent. Because of the small quantity of data
and bias issues, the performance of the damage detection model is limited as well. More visual data are needed
to train models on fine-grained images or parts. Moreover, there should be a better way to annotate the visual
data. Rectangle annotation is relatively rough when marking tiny or irregular damage. Semantic segmentation
is worth trying in future research.
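
The roughness of rectangle annotation for irregular damage can be illustrated with a toy example. Assuming a thin diagonal scratch on a small pixel grid (an invented case, not data from the paper), the code below measures how much of the enclosing rectangle actually contains damage, which is the information a pixel-level segmentation mask would preserve.

```python
def bounding_box(mask: list[list[int]]) -> tuple[int, int, int, int]:
    """Smallest rectangle (top, left, bottom, right) enclosing all nonzero pixels."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return rows[0], cols[0], rows[-1], cols[-1]


def box_fill_ratio(mask: list[list[int]]) -> float:
    """Fraction of the bounding box that is covered by damaged pixels."""
    top, left, bottom, right = bounding_box(mask)
    box_area = (bottom - top + 1) * (right - left + 1)
    damaged_pixels = sum(sum(row) for row in mask)
    return damaged_pixels / box_area


# Toy 8x8 mask with a one-pixel-wide diagonal scratch.
size = 8
scratch = [[1 if r == c else 0 for c in range(size)] for r in range(size)]

ratio = box_fill_ratio(scratch)
```

In this toy case the rectangle spans the whole 8x8 grid while only 8 of its 64 pixels are damaged, so most of the annotated region is undamaged surface.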

6. CONCLUSION AND FUTURE WORK

In this paper, we combine a structural data feature engineering algorithm, a natural language processing
model and a processing framework for visual data with a machine learning model to handle the
task of auto insurance fraud detection based on multi-modal data. We first design an auto insurance
multi-modal learning (AIML) framework to analyze multi-modal data collected during the auto insurance business.
With AIML, we can utilize multi-modal data efficiently and improve model performance in predicting auto
insurance fraud behavior. We also design a text mining algorithm and a framework to process visual data.
Both achieve significant improvements in predicting fraud behavior. Experimental results
show the high quality of AIML and the effectiveness of applying AIML to auto insurance fraud detection
on multi-modal data.

As we have achieved a substantial increase in model performance based on multi-modal data mining
with a real-world dataset, constructing a real-time system or pipeline is an appealing next step
for introducing multi-modal data mining into the auto insurance industry. One possible challenge
is multi-modal big data. As the amount of data increases, a bottleneck will arise in each
branch that processes a different type of data. A potential solution is a distributed system with
load balancing between the algorithms handling different types of data, e.g., NLP for text data and CV for
visual data. To improve performance further, one may consider using a
knowledge graph to connect and represent multi-modal data in a more structured way.


ACKNOWLEDGEMENTS

This research was supported by “Research on intelligent Computing technology in Financial Risk Control
and Anti-fraud”, funding code 2020NF0AC01, Zhejiang Lab, led by Dr. Chongning Na. We thank our
colleagues from Zhejiang Lab who provided insight and expertise that greatly assisted the research.

We would also like to show our gratitude to the auto insurance companies for sharing their pearls of
wisdom and data with us during the course of this research.


AUTHOR CONTRIBUTIONS

C.N. Na (ORCID: 0000-0003-2680-5774, na@zhejianglab.com) and J.X. Yang (ORCID: 0000-0002-5055-8729,
jiaxiyang@zhejianglab.com) conceived of the presented idea and designed the framework of the
AIML system. J.X. Yang wrote the manuscript with the help of K. Chen (ORCID: 0000-0002-7925-9968,
chenkui@zhejianglab.com), K. Ding (ORCID: 0000-0003-4534-2904, dingkaid@zhejianglab.com) and
M. Wang (ORCID: 0000-0002-2293-1709, meng.wang@seu.edu.cn). J.X. Yang, K. Chen and K. Ding
designed and carried out the experiment. C.N. Na and M. Wang proofread and edited the manuscript. All
authors provided critical feedback and helped shape the research and analysis.



AUTHOR BIOGRAPHY

Jiaxi Yang is a postdoctoral researcher at the Financial Technological Research
Center at Zhejiang Lab. He received his M.S./Ph.D. in Statistics in 2016/2021
from Columbia University and his B.A. in Economics in 2014 from Chu Kochen
Honors College, Zhejiang University. He was enrolled in the international
postdoctoral exchange & introduction program funded by the China Postdoctoral
Science Foundation. His research interests include causal inference, machine
learning and multi-modal learning, and applications in areas such as finance,
economics and education. He is now focusing on research into financial
risk management, multi-modal data generation/imputation and improving
model robustness.
ORCID: 0000-0002-5055-8729

Kui Chen is a postdoctoral fellow in the Fintech Research Center of Zhejiang Lab.
His main research interests include theoretical analysis and research of
machine learning, model construction and optimization, data governance,
automatic feature engineering and scenario solution construction. He
graduated from Shanghai University with a doctoral degree in mathematics and
applied mathematics. During his doctoral studies, his research
focused on dimension reductions and integrabilities of high-dimensional
semi-discrete integrable systems, and he completed seven SCI academic papers,
two of which were listed as highly cited papers by Web of Science. After
graduation, he went to the School of Mathematics and Science of Fudan
University to engage in full-time postdoctoral research, where his main research
topic was the algebraic structure of constrained high-dimensional
semi-discrete integrable systems.
ORCID: 0000-0002-7925-9968

Ding Kai is currently a senior researcher at the Financial Technology Center in
Zhejiang Lab. He received his Ph.D. degree from the School of Automation Science
and Electronic Engineering at Beihang University, China. His research interests
include multimedia information retrieval, natural language processing, and deep
learning.
ORCID: 0000-0003-4534-2904


Chongning Na received his B.E. degree in 2002 from Tsinghua University, and his
M.S. and Ph.D. degrees in Electrical Engineering in 2005 and 2010 from
the Technical University of Munich. He is now a research expert at the FinTech
Research Center, Zhejiang Lab. His research interests include machine
learning, probabilistic inference, graphical models and their applications in
various areas, e.g., industrial automation, telecommunications and FinTech.
He was with Siemens Corporate Research and NTT DOCOMO Research Lab,
and contributed to the R&D of ProfiNet and 4G/5G physical layers in the aspects
of high-performance algorithm development and associated standardization,
e.g., IEEE 1588, 4G LTE/LTE-A, 5G NR. He is now focusing on research
into intelligent computing technologies in financial risk management, using
multi-modal-machine-learning-based and knowledge-based financial data
mining techniques.
ORCID: 0000-0003-2680-5774


Meng Wang is an assistant professor in the Knowledge Graph & AI Research
Group, School of Computer Science and Engineering, Southeast University,
China. He obtained his doctoral degree from the Department of Computer
Science and Technology, Xi’an Jiaotong University in 2018. He was a visiting
scholar in the DKE lab at the University of Queensland, Australia in 2016. His
research areas are knowledge graphs (KG), semantic search, NLP, and
cross-modal data.
ORCID: 0000-0002-2293-1709