An example of Outlier Detection using Quantum Machine Learning¶
import pennylane as qml
from pennylane import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import RobustScaler
from sklearn.metrics import confusion_matrix, f1_score, recall_score, precision_score, roc_auc_score
from imblearn.under_sampling import RandomUnderSampler
import keras
from keras.models import Sequential
from keras.layers import Dense
import time
import warnings
import matplotlib.colors
warnings.filterwarnings('ignore')
import pygrnd
from pygrnd.qc.qml import QNNClassifier, QNNRegressor, one_epoch, accuracy_loss
jos_palette = ['#4c32ff', '#b332ff', '#61FBF8', '#1E164F', '#c71ab1ff']
sns.set_palette(jos_palette)
Introduction¶
Quantum Machine Learning¶
In this notebook, we look at how **parametrized quantum circuits** can be used as neural networks to perform a classification task. Using an openly accessible **credit card fraud detection** dataset published on Kaggle and the quantum simulation tools provided by **PennyLane**, we look at how quantum machine learning models perform on a relatively small example compared to their classical counterparts.
Machine learning techniques and especially neural networks have drawn much attention in the past few years and shown interesting results in a variety of domains (e.g. image or speech recognition). Their power lies in their ability to fit a broad range of functions and express interesting relations between variables. Quantum Neural Networks harness the properties of quantum mechanics to span a much larger space which could bring a **potential advantage** compared to the classical models.
However, the development of quantum computers is at an early stage, and no practically relevant advantage over classical computers has been obtained so far. Because of the small number of available qubits and other constraints (such as noise or decoherence), current quantum machine learning models rely on a **hybrid approach** where the quantum circuit is interfaced with a classical computer. The classical machine takes on tasks that are not currently feasible on a quantum processor, which helps save precious quantum resources.
Binary Classification¶
The problem considered here can be seen as a **supervised** classification task. Given an input space $X$ of features and an output space $Y$, we seek a function $g: X \rightarrow Y$ that is able to **approximate** the output.
Here there are only two possible outcomes $Y=\{\text{fraud},\text{ non-fraud}\}$ or $Y=\{0,1\}$. This is called a **binary classification** problem. $Y$ can also have more outcomes (multiclass classification) or be continuous (regression).
The model is a quantum circuit with **parameters** $\Theta$. It acts as a function $f_\Theta:x\rightarrow \hat{y}=f_\Theta(x)$. The circuit can be viewed as a unitary $\hat{U}_\Theta(x)$ and its output $f_\Theta(x)$ as the expectation value of a Hamiltonian $H$. For an input state $\ket{0^{\otimes n}}$, $f_\Theta(x)=\bra{\psi}H\ket{\psi}$ where $\ket{\psi} = \hat{U}_\Theta(x)\ket{0^{\otimes n}}$.
During the learning process, the model is fed with **labeled data** $x_i$ for which the output $y_i$ is known. A chosen loss function $L(y_i, \hat{y}_i)$ is minimized through a classical **optimization** process (such as gradient descent) on the parameters $\Theta$. The loss function estimates how close the model predictions are to the real data. The performance of the model is then evaluated on unseen data.
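The training loop described above can be sketched on a deliberately tiny model. The example below is an illustration only, not the model trained in this notebook: it assumes a hypothetical one-qubit circuit whose output is $f_\theta(x)=\cos(\text{arccos}(x)+\theta)$ (an RX encoding followed by one trainable RX rotation, measured in Z), a mean squared error loss, and synthetic labels generated from a hidden value `theta_true`. The model derivative is obtained with the parameter-shift rule and used in plain gradient descent.

```python
import math

def model(theta, x):
    """Z expectation of RX(theta) RX(arccos(x)) |0>, i.e. cos(arccos(x) + theta)."""
    return math.cos(math.acos(x) + theta)

def loss(theta, xs, ys):
    """Mean squared error between predictions and labels."""
    return sum((model(theta, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def grad_loss(theta, xs, ys):
    """Gradient of the loss; the model derivative uses the parameter-shift rule."""
    total = 0.0
    for x, y in zip(xs, ys):
        shift = (model(theta + math.pi / 2, x) - model(theta - math.pi / 2, x)) / 2
        total += 2 * (model(theta, x) - y) * shift
    return total / len(xs)

# Synthetic data (hypothetical): labels produced by a hidden parameter value
theta_true = 0.7
xs = [-0.8, -0.3, 0.2, 0.6, 0.9]
ys = [model(theta_true, x) for x in xs]

# Plain gradient descent on the single parameter
theta, learning_rate = 0.0, 0.5
for step in range(200):
    theta -= learning_rate * grad_loss(theta, xs, ys)

print(f"learned theta = {theta:.4f}, final loss = {loss(theta, xs, ys):.2e}")
```

On this toy problem the learned parameter should end up close to `theta_true`. The full model follows the same pattern with many parameters and a circuit evaluation in place of the closed-form cosine.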
Credit Card Fraud Data¶
The dataset is taken from Kaggle and can be found here.
It contains two days of European credit card transactions. It is highly **imbalanced** since only 0.172% (492 out of 284 807) of these transactions are frauds. Except for the time and amount of the transaction, the other 28 features are numerical values that come from a PCA transformation. The original features are not provided for confidentiality reasons.
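To see why this imbalance matters, the arithmetic below computes the fraud rate and the accuracy of a hypothetical baseline that labels every transaction as non-fraud. It is a sketch that assumes 492 frauds among 284,807 transactions. Accuracy alone looks excellent while recall is zero, which is why recall, precision and F1 are more informative for this problem.

```python
n_total = 284_807   # assumed total number of transactions
n_fraud = 492       # fraudulent transactions

fraud_rate = n_fraud / n_total

# Hypothetical baseline that always predicts "non-fraud" (class 0)
true_negatives = n_total - n_fraud
baseline_accuracy = true_negatives / n_total
baseline_recall = 0 / n_fraud   # no fraud is ever detected

print(f"fraud rate        : {fraud_rate:.3%}")
print(f"baseline accuracy : {baseline_accuracy:.3%}")
print(f"baseline recall   : {baseline_recall:.0%}")
```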
The path to the `creditcard.csv` file must be filled in the `filepath` variable so the data can be accessed by the notebook.
filepath = ''
df = pd.read_csv(filepath)
Data Exploration¶
Because of the small number of qubits available in current quantum computers, and the computational resources needed to simulate large circuits on a classical machine, we want to **reduce** the number of considered **features**. To do that, we take a first look at the data.
df = df.sample(frac=1)
df_1 = df[df['Class'] == 1] #Fraud transactions
df_0 = df[df['Class'] == 0] #Regular transactions
df_sample = pd.concat([df_1, df_0[:len(df_1)]])
Distribution of the features¶
A first way to select relevant features is to look at how the two classes (1 for frauds and 0 for regular transactions) are distributed. For a given feature, if both **distributions** are similar then it might not be an interesting feature to consider. Conversely, huge disparities between the distributions could help tell apart the two classes.
For computational reasons, we randomly select as many regular transactions as there are fraud transactions (492) to obtain the following distributions. This **undersampling** method will also be used when training the quantum model.
f, axes = plt.subplots(nrows=5, ncols=6, figsize=(32,28))
for k, col in enumerate(df.columns[:-1]):
    i, j = k//6, k%6
    sns.boxplot(y=col, x='Class', data=df_sample, ax=axes[i, j])
    axes[i, j].set_title(col)
Correlation between the features¶
To reduce the number of features, we can also look at the **correlation** matrix. If two features are highly correlated, they carry largely redundant information and we may be able to keep only one of the two.
norm = matplotlib.colors.Normalize(-1,1)
colors = [[norm(-1.0), jos_palette[1]],[norm(0), "white"],[norm(1.0), jos_palette[0]]]
cmap = matplotlib.colors.LinearSegmentedColormap.from_list("", colors)
plt.figure(figsize=(8,6))
sns.heatmap(df_sample.corr(method='pearson'), cmap=cmap, vmin=-1, vmax=1)
plt.show()
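As a complement to the heatmap, the sketch below lists the feature pairs whose absolute Pearson correlation exceeds a threshold. It is self-contained and uses a small made-up table; the column names `a`, `b`, `c`, the helper names and the threshold of 0.9 are assumptions for illustration. With the notebook's data, the same idea could be applied to the matrix returned by `df_sample.corr()`.

```python
import math

def pearson(u, v):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)

def correlated_pairs(columns, threshold=0.9):
    """Return (name_1, name_2, r) for every pair with |r| above the threshold."""
    names = list(columns)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson(columns[first], columns[second])
            if abs(r) > threshold:
                pairs.append((first, second, r))
    return pairs

# Made-up data: "b" is almost a negative multiple of "a", "c" is unrelated
toy = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [-2.1, -3.9, -6.2, -8.0, -9.9],
    "c": [0.3, -1.2, 0.8, -0.5, 0.1],
}
for first, second, r in correlated_pairs(toy):
    print(f"{first} and {second}: r = {r:+.3f}")
```

Only the pair `a`, `b` is reported here, so one of the two could be dropped without losing much information.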
Data Preparation¶
We reduce the number of features to 4 so that the data can be encoded in a PQC that can be simulated. The 4 selected features are `V14`, `V4`, `V12` and `Amount`.
The first three features are already **normalized** through the PCA. We also normalize `Amount` in a way that is robust to outliers.
keep_cols = ['V14', 'V4', 'V12', 'Amount', 'Class']
df = df[keep_cols]
rob_scaler = RobustScaler()
df['Amount'] = rob_scaler.fit_transform(df['Amount'].values.reshape(-1, 1))
X = df.drop('Class', axis=1)
y = df['Class']
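For intuition, robust scaling centres the values on the median and divides by the interquartile range, so a few extreme amounts barely affect the centre and the scale. The sketch below reproduces that idea with the standard library on made-up amounts. It is a conceptual illustration, not scikit-learn's implementation, and the helper `robust_scale` is hypothetical.

```python
import statistics

def robust_scale(values):
    """Centre on the median and divide by the interquartile range (IQR)."""
    median = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return [(v - median) / (q3 - q1) for v in values]

amounts = [1.0, 2.0, 3.0, 4.0, 100.0]   # made-up amounts with one outlier
scaled = robust_scale(amounts)
print(scaled)
```

The outlier stays far from the bulk of the data after scaling, but it does not distort the scale applied to the typical values, unlike a mean and standard deviation computed on the same list.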
The data is split into a **training** set (80%) and a **testing** set (20%).
Because the data is highly **imbalanced**, the training set is randomly undersampled (we randomly select a number of regular credit card transactions that matches the number of fraudulent transactions). There are other ways to deal with the imbalance of the data, but this one keeps the **training time** of the quantum model reasonably short. This new training set is `(X_rus, y_rus)`.
The testing set `(X_test, y_test)` is left untouched.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)
rus = RandomUnderSampler()
X_rus, y_rus = rus.fit_resample(X_train, y_train)
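The idea behind random undersampling can be sketched in a few lines: keep every row of the minority class and draw an equally sized random subset of the majority class. The helper `random_undersample` and the toy data below are hypothetical and only illustrate the principle; the notebook itself relies on `RandomUnderSampler`.

```python
import random

def random_undersample(features, labels, seed=0):
    """Keep all minority rows and an equally sized random subset of the other class."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    n_minority = min(len(indices) for indices in by_class.values())
    kept = []
    for indices in by_class.values():
        kept.extend(rng.sample(indices, n_minority))
    kept.sort()
    return [features[i] for i in kept], [labels[i] for i in kept]

# Made-up imbalanced data: 3 frauds (1) among 20 transactions
labels = [1 if i in (4, 11, 17) else 0 for i in range(20)]
features = [[float(i)] for i in range(20)]

X_bal, y_bal = random_undersample(features, labels)
print(y_bal.count(0), y_bal.count(1))
```

Fixing the seed makes the selected subset reproducible. Passing `random_state` to `RandomUnderSampler` plays the same role.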
The classical data has to be encoded in the **quantum state** of the parametrized quantum circuit. Another normalization $f:x\rightarrow\frac{2}{1+e^{-x}}-1$ is applied so that the values fall in the $[-1,1]$ range. From the normalized value, still written $x$ below, is derived an **angle** $\theta = \text{arccos}(x)$ in the Bloch sphere that enables the encoding of the data in the quantum system using **single qubit rotations**.
X_rus_, y_rus = X_rus.to_numpy(), y_rus.to_numpy()
norm = lambda x: 2/(1+np.exp(-x)) - 1
X_rus = norm(X_rus_)
X_test_, y_test = X_test.to_numpy(), y_test.to_numpy()
X_test = norm(X_test_)
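The short check below illustrates this normalization on a few made-up values. It uses the identity $\frac{2}{1+e^{-x}}-1=\tanh(x/2)$ and shows that the result stays strictly between -1 and 1, so that the angle $\text{arccos}$ is always defined and lies in $[0,\pi]$. The helper names `squash` and `encoding_angle` are hypothetical.

```python
import math

def squash(x):
    """Map a real value into (-1, 1); equal to 2 / (1 + exp(-x)) - 1."""
    return math.tanh(x / 2)

def encoding_angle(x):
    """Rotation angle in [0, pi] derived from a raw feature value."""
    return math.acos(squash(x))

for raw in (-5.0, -1.0, 0.0, 1.0, 5.0):
    direct = 2 / (1 + math.exp(-raw)) - 1
    # Both expressions of the normalization agree
    assert math.isclose(squash(raw), direct, abs_tol=1e-12)
    print(f"x = {raw:+.1f}  f(x) = {squash(raw):+.4f}  theta = {encoding_angle(raw):.4f}")
```

Large positive inputs map to angles close to 0 and large negative inputs to angles close to $\pi$, so very large magnitudes become hard to tell apart after encoding.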
Quantum Neural Network Classifier¶
Parametrized Quantum Circuit¶
A parametrized quantum circuit (PQC) is essentially defined by three components:
- **Feature Map**: the part of the circuit that encodes the classical data in a superposed quantum state.
- **Ansatz**: the variational part of the circuit, which is built of parametrized gates and entangling layers. It transforms the input state so it fits the output data.
- **Measurement**: yields the output through the expectation value of a Hamiltonian or the readout of qubits.
Many architectures exist and provide different expressibility or entangling capacities. The architecture chosen here is the following:
Feature Map¶
The **encoding circuit** we choose is a product feature map composed of single qubit rotations. The number of qubits in the circuit is the same as the number of classical values to encode and each value has its own qubit. The feature map can be viewed as a unitary:
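\begin{equation*} U_{\phi(x)} = \bigotimes_{i=1}^n\,R_{\alpha,i}(\phi(x_i)) \end{equation*}
Choosing rotations about the X axis, with the convention $R_x(\theta)=e^{-i\theta X/2}$ used by PennyLane's RX gate, and $\phi(x) = \text{arccos}(x)$, we have:
\begin{equation*} R_{x}(\text{arccos}(x))\ket{0} = \cos\left(\tfrac{\text{arccos}(x)}{2}\right)\ket{0} - i\sin\left(\tfrac{\text{arccos}(x)}{2}\right)\ket{1} = \sqrt{\tfrac{1+x}{2}}\ket{0} - i\sqrt{\tfrac{1-x}{2}}\ket{1} \end{equation*}
so that the probability of measuring $\ket{0}$ is $\frac{1+x}{2}$ and the $Z$ expectation value of the encoded qubit is $x$. We can thus obtain the following **superposition**:
\begin{equation*} U_{product}(x)\ket{0^{\otimes n}} = \bigotimes_{j=1}^n\,\left(\sqrt{\tfrac{1+x_j}{2}}\ket{0} - i\sqrt{\tfrac{1-x_j}{2}}\ket{1}\right) \end{equation*}
Ansatz¶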
The ansatz is the **variational** part of the circuit. It can be defined as a unitary $U_\Theta$ where $\Theta = (\theta_i)_{i\in \{1,\dots,m\}} \in \mathbb{R}^m$ is the set of parameters that is **optimized** during the learning process. Each of these parameters is associated with a particular quantum gate in the ansatz circuit.
The so-called **hardware efficient ansatz** used here was originally designed for VQE (Variational Quantum Eigensolver). It is made of successive layers of single-qubit rotations about the three axes of the Bloch sphere, each followed by a ring of entangling CX gates.
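To make the structure of one layer concrete, the sketch below lists the control and target pairs of a ring of CX gates and counts the trainable angles for the 4 qubits and 2 layers used later in this notebook. It assumes more than two wires and that the ring links each wire to its successor and wraps around from the last wire to the first. The helper `ring_pairs` is hypothetical.

```python
def ring_pairs(wires):
    """(control, target) pairs linking each wire to the next, wrapping around."""
    n = len(wires)
    return [(wires[i], wires[(i + 1) % n]) for i in range(n)]

n_qubits, n_layers = 4, 2
rotations_per_qubit = 3   # one RX, one RY and one RZ angle per qubit and per layer

print(ring_pairs(list(range(n_qubits))))   # [(0, 1), (1, 2), (2, 3), (3, 0)]

n_parameters = n_layers * n_qubits * rotations_per_qubit
print(n_parameters)                        # 24
```

The parameter count matches a weight array of shape `[n_layers, n_qubits, 3]`.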
Measurement¶
While **expectation values** of complex Hamiltonians can be used for the output of the PQC, we only measure the first qubit in the Z basis here.
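To make the measurement concrete, the sketch below computes the Z expectation value of a single qubit prepared by the RX encoding alone, with no ansatz applied. It assumes the convention $R_x(\theta)=e^{-i\theta X/2}$, which is the one PennyLane documents for `qml.RX`. The helper names are hypothetical. In this setting the expectation value returns the encoded value $x$, since $\cos^2(\theta/2)-\sin^2(\theta/2)=\cos\theta$.

```python
import math

def rx_on_zero(theta):
    """Amplitudes of RX(theta)|0> with RX(theta) = exp(-i * theta * X / 2)."""
    return (complex(math.cos(theta / 2)), -1j * math.sin(theta / 2))

def expval_z(state):
    """Expectation value of Pauli Z: P(0) - P(1)."""
    amp0, amp1 = state
    return abs(amp0) ** 2 - abs(amp1) ** 2

for x in (-0.9, -0.25, 0.0, 0.5, 0.99):
    state = rx_on_zero(math.acos(x))
    print(f"x = {x:+.2f}  <Z> = {expval_z(state):+.4f}")
```

The output lies in $[-1,1]$, so a threshold (for instance on its sign) is needed to turn it into one of the two class labels.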
Chosen architecture¶
The architecture described above is defined here with PennyLane. Because the number of features is 4, we also use 4 qubits. The plotted circuit has two successive ansatz layers.
# Feature Map
def rx_encoding(x, wires):
    for i, qb in enumerate(wires):
        qml.RX(np.arccos(x[i%len(x)]), wires=qb)
# Ansatz
def hea_layer(params, wires):
    for qb in wires:
        qml.RX(params[qb, 0], wires=qb)
        qml.RY(params[qb, 1], wires=qb)
        qml.RZ(params[qb, 2], wires=qb)
    qml.broadcast(qml.CNOT, wires=wires, pattern="ring")
n_qubits = 4
n_layers = 2
dev = qml.device('default.qubit', wires=n_qubits)
# Parametrized Quantum Circuit definition
@qml.qnode(dev)
def pqc(weights, x):
    rx_encoding(x, wires=list(range(n_qubits))) # Feature Map
    for params in weights:
        hea_layer(params, wires=list(range(n_qubits))) # Ansatz
    return qml.expval(qml.PauliZ(0)) # Measurement
model = QNNClassifier(pqc, [n_layers, n_qubits, 3])
model.display_circuit()
plt.show()