A Quantum Implementation Model for Artificial Neural Networks

The learning process for multi layered neural networks with many nodes makes heavy demands on computational resources. In some neural network models, the learning formulas, such as the Widrow-Hoff formula, do not change the eigenvectors of the weight matrix while flatting the eigenvalues. In infinity, this iterative formulas result in terms formed by the principal components of the weight matrix: i.e., the eigenvectors corresponding to the non-zero eigenvalues. In quantum computing, the phase estimation algorithm is known to provide speed-ups over the conventional algorithms when it is used for the eigenvalue related problems. Therefore, it is appealing to ask whether we can model such learning formulas in quantum computing and gain a computational speed-up. Combining the quantum amplitude amplification with the phase estimation algorithm, a quantum implementation model for artificial neural networks using the Widrow-Hoff learning rule is presented. In addition, the complexity of the model is found to be linear in the size of the weight matrix. This provides a quadratic improvement over the classical algorithms.


I. INTRODUCTION AND BACKGROUND
Artificial neural networks (ANN) [1]- [3] are adaptive statistical models which mimic the neural structure of the human brain to find optimal solutions for multivariate problems. In the design of ANN, the followings are determined: the structure of the network, input-output variables, local activation rules, and a learning algorithm. Learning algorithms are generally linked to the activities of neurons and describe a mathematical cost function. Often, a minimization of this cost function composed of the weights and biases describes the learning process in artificial neural networks. Moreover, the learning rule in this process specifies how to update synaptic weights at each iteration. In general, learning rules can be put in two different categories: supervised and unsupervised learning. While the distance between the response of the neuron and a specified response, called target t, is considered in supervised learning rules, it is not required in unsupervised learning rules. Hebian learning rule is a typical example of the unsupervised learning, in which the weights at the (j + 1)th iteration is updated by the following rule: Here, η is a positive learning constant. On the other hand, Widrow-Hoff learning rule, which is the main interest of this paper, illustrates a typical supervised learning rule [2]- [5]: A. Daskin is with the Computer Engineering Department, Istanbul Medeniyet University, Istanbul, Turkey, email: adaskin25@gmail.com If we consider multi-neurons; the input, the output, and the target values become vectors: viz., respectively, v, y and t. Then, the above equations come in matrix forms as follows: It is known that the learning task for multi layered neural networks with many nodes makes heavy demands on computational resources. Reducing the size of a transistor, the computational power of digital computers has been successfully increased for decades in accordance with the Moore's law. However, even if we can continue on this reduction, it is predicted to be still impossible to obtain exact features of big quantum physical systems on digital computers based on classical physics for the next decades [6]. This physical limitations lead scientists to design a computational model based on quantum physics by using the quantum phenomena such as entanglement and superposition, named quantum computation [7]- [9]. Algorithms in this computational model provide computational speed-up over their classical counterparts for some particular problems: e.g., Shor's factoring algorithm [10] and Grover's search algorithm [11]. Quantum computing is also considered so as to develop a quantum analog of the artificial neural networks: e.g. [12]- [17]. These models should not be confused with the classical algorithms [18], [19] inspired by the quantum computing. Furthermore, using the Grover search algorithm [11], a quantum associative memory is introduced [20]. To the best of our knowledge, the common idea in these works is the use of features of quantum information processing to describe a quantum analog of artificial networks particularly by relating the neurons in the networks with qubits [21].
The quantum phase estimation algorithm (PEA) [22], [23] provides computational speed-ups over the known classical algorithms in eigenvalue related problems. The algorithm mainly finds the phase value of the eigenvalue of a unitary matrix (considered as the time evolution operator of a quantum Hamiltonian) for a given approximate eigenvector. Because of this property, it is very common to use PEA as subcomponent of other algorithms. While in the general case, PEA requires a good initial estimate of an eigenvector to produce the phase; in some cases, it is able to find the phase by using an initial equal superposition state: e.g., Shor's factoring algorithm [10]. In addition, in a prior work [24], we have described how to obtain eigenvalues in a given interval and corresponding eigenvectors from an initial equal superposition state. In that work, a linear combination of the eigenvectors is obtained, in which the coefficients of the eigenvectors are determined by the sum of the coefficients of each eigenvector: i.e., the overlap between the initial equal superposition state and the eigenvector.
For a given weight matrix W ; it is known that in networks using the Widrow-Hoff learning rule, while the eigenvalues goes to one, the eigenvectors does not change: i.e., lim j→∞ W [j] converges to QQ T , where Q represents the eigenvectors of W . Therefore, for a given input x, the considered network produces the output QQ T x. In this paper, we present a quantum implementation model for the artificial neural networks by employing the algorithm in Ref. [24]. In particular, we show how to construct QQ T x on quantum computers in linear time. Furthermore, we discuss all the implementation details and show solutions for the possible issues such as the circuit implementation of W , the preparation of the input x as a quantum circuit, and determining the number of iterations in the algorithm. We also show a random numerical example.

II. METHODS AND BACKGROUND
In this section, we shall describe the Widrow-Hoff learning rule and the quantum algorithms used in the paper.

A. Widrow-Hoff Learning
Widrow-Hoff learning rule given in Eq.(4), also known as LMS algorithm, in matrix form can be described as follows [2]- [5]: This can be also expressed by using the eigendecomposition of W , is called the eigenvalue matrix at the epoch j. Based on this formulation, it is shown that Widrow-Hoff error correction rule flattens the eigenvalues. and does not affect the eigenvectors: i.e., lim j→∞ Φ [j] = I. At the end of a learning process, the eigenvalues of W is sphericized:

B. Quantum Algorithms Used in the Model
Here, we first give a very brief introduction to quantum computing and explain the quantum algorithms used in this paper (For a broad introduction, please refer to the first couple of chapters of Ref. [23]). Quantum computers encode the information by using vectors represented in Dirac notation: . These vectors represent two distinguishable states of a quantum system called qubit. In general, the state of a qubit can be represented as a superposition of the two-distinguishable-basis states: In this representation, |ψ is the state of the considered qubit with α 0 and α 1 complex amplitudes. Here, |α 0 | 2 and |α 1 | 2 also describe the probabilities to see the qubit in, respectively, |0 and |1 states after a measurement. The number of possible states describing a quantum system grows exponentially with the number of qubits (particles) involved in the system. For instance, a system with two qubits can be expressed in the following form by using the four distinguishable states: where |0 = |0 ⊗ |0 = |00 which is the first vector and so |i is the ith vector of the standard basis (⊗ is the Kronecker tensor product.). In a similar fashion, |α i | 2 gives the probability of measuring the system in |i state. Changing the state of a qubit corresponds to some computation, operation, applied to the qubit. In general, a quantum operation can be mathematically represented by a unitary matrix. This matrix takes the state vector into another state vector; therefore, the quantum operations can be summarized as a linear transformation of the amplitudes, the complex vector elements: Here, U is a 4x4 unitary matrix describing a quantum operation applied to the state in Eq. (7). In circuit terms, with the lines representing the qubits evolving in time. In addition, quantum circuits allow the realization of conditionals operations:i.e., if some qubits in |ψ apply U to the other qubits. For instance, the following circuit implements the conditional "If the first qubit, the control qubit, is in |1 , then apply the quantum gate U to the second qubit, the target qubit; else do nothing".
Engineering the matrix elements of a quantum operator, unitary matrix U , along with using the entanglement and superposition phenomena, different quantum algorithms can be designed. In the following, we shall explain two well-known quantum algorithms, then describe how they are used in Ref. [24] to obtain the linear combination of the eigenvectors.

1) Quantum Phase Estimation Algorithm:
The phase estimation algorithm (PEA) [22], [23] finds an estimation for the phase of an eigenvalue of a given operator. In mathematical terms, the algorithm seen in Fig.1 as a circuit works as follows: • An estimated eigenvector |ϕ j associated to the jth eigenvalue e iφj of a unitary matrix, U of order N is given.
-Here, U generally represents a time evolution of the Hamiltonian (H) representing the dynamics of the quantum system: where t represents the time and is the Planck constant. As a result, while e iφj is the eigenvalue of U , its phase φ j is the eigenvalue of H.
• The algorithm uses two quantum registers dedicated to the eigenvalue and the eigenvector: respectively, |reg 1 and |reg 2 with m and (n = log 2 N ) number of qubits. The initial state of the system is set to |reg 1 |reg 2 =|0 |ϕ j , where |0 is the first standard basis vector. • Then, the quantum Fourier transform given in Appendix A is applied to |reg 1 , which produces the following equal superposition state: where M = 2 m and |k is the kth standard basis vector. • For each kth qubit in the first register, a quantum operator, U 2 k−1 , controlled by this qubit is applied to the second register. This operation leads the the first register to hold the discrete Fourier transform of the phase, φ j . • The inverse quantum Fourier transform on the first register produces the binary digits of φ j . • Finally, the phase is obtained by measuring the first register. 2) Quantum Amplitude Amplification Algorithm: If a given quantum state |psi in N-dimensional Hilbert space can be rewritten in terms of some orthonormal states considered as the "good" and the "bad" parts of |ψ as: then amplitude amplification technique [11], [25], [26] can be used to increase the amplitude of |ψ good in magnitude while decreasing the amplitude of |ψ bad . The technique mainly consists of two parts: the marking and the amplifying implemented by two operators, respectively U f and U ψ . Here, U f marks-flips the sign of-the amplitudes of |ψ good and does nothing to |ψ bad . U f can be implemented as a reflection operator when |ψ good and |ψ bad are known: where I is an identity matrix. In the amplification part, the marked amplitudes are amplified by the application of the operator U ψ : To maximize the probability of |ψ good , the iteration operator times applied to the resulting state.

C. Quantum principal component analysis
In Ref. [24], we have shown that combining PEA with the amplitude amplification, one can obtain eigenvalues in certain intervals.
In the phase estimation part, the initial state of the registers is set to |0 |0 . Then, the second register is put into the equal superposition state 1/ √ N (1, . . . , 1) T . The phase estimation process in this input generates the superposition of the eigenvalues on the first and the eigenvectors on the second register. In this final superposition state, the amplitudes for the eigenpairs are proportional to the norm of the projection of the input vector onto the eigenvector: i.e., the normalized sum of the eigenvector elements. This part is represented by U P EA and also involves the input preparation circuit, U input , on the second register.
In the amplification part, first, U f is applied to the first register to mark the eigenvalues determined by the binary values of the eigenvalues: For instance, if we want to mark an eigenvalue equal to 0.25 in |reg 1 with 3 qubits, we use U f = I − 2 |010 010| since the binary form of 0.25 is (010). The amplitudes of the marked eigenvalues are then amplified by the application of U ψ with |ψ representing the output of the phase estimation: Using the above equation, U ψ can be implemented as: where U 0 = I − 2 |0 0|. The amplitudes of the eigenvalues in the desired region are further amplified by the iterative application of the operator G = U ψ U f . At the end of this process, a linear combination of the eigenvectors with the coefficients determined by the normalized sum of the vector elements of the eigenvectors are produced. In the following section, we shall show how to apply this process to model the implementation of the neural networks based on the Widrow-Hoff learning rule.

III. APPLICATION TO THE NEURAL NETWORKS
Since the weight matrix in Widrow-Hoff learning rule converges to the principal components in infinity [5]: i.e., W [∞] = QQ T , the behavior of the trained network on some input |x can be concluded as: Our main purpose is to find an efficient way to implement this behavior on quantum computers by using the quantum principal component analysis. For this purpose, we form U f in a way that marks only the non-zero eigenvalues and their corresponding eigenvectors: For zero eigenvalues ( in binary form (0 . . . 0) ), the first register is in |0 = (1, 0, 0 . . . , 0, 0) T state. Therefore, we need to construct a U f which "marks" the nonzero eigenvalues and does nothing to |0 . This can be done by using a vector |f in the standard basis which has the same non-zero coefficients for the all basis states except the first one: Here, µ is a normalization constant equal to 1 √ M−1 . U f does nothing when the first register in |0 state; however, it does not only flip the signs but also changes the amplitudes of the other states. Then, U ψ is applied for the amplification of the marked amplitudes. The iterative application of U ψ U f results a quantum state where the amplitude of |0 becomes almost zero  and the amplitudes of the other states becomes almost equal. At this point, the second register holds QQ T |x which is the expected output from the neural network. This is explained more below.

A. Details of the Algorithm
Here, we assume that U = e iW t is given: Later, in Sec.III-D, we shall also discuss how U may be obtained as a quantum circuit from a given W matrix. Fig.2 shows the algorithm as a quantum circuit where the dashed lines indicates an iteration in the amplitude amplification. At the beginning, U P EA is applied to the initial state |0 |0 . U P EA includes also an input preparation circuit, U input , bringing the second register from |0 state to the input |x . U P EA generates a superposition of the eigenvalues and associated eigenvectors, respectively, on the first and the second registers with the amplitudes defined by the overlap of the eigenvector and the input |x : where α j = ϕ j | |x .
In the second part, the operator G = U ψ U f is applied to |ψ iteratively until we can obtain QQ T |x on the second register. The action of U f applied to |ψ is as follows: Here, assuming the first k number of eigenvalues are zero, the unnormalized state |φ is defined as: It is easy to see that |φ = QQ T |x , which is our target output. When U ψ is applied to the output in Eq.(21), we simply change the amplitudes of |ψ : Here, P f is the initial success probability and equal to N −1 j=k α 2 j . The repetitive applications of G only changes the amplitudes of |ψ and |f |φ : e.g., where c = (4µ 2 P f − 1). The normalized probability of (2µ |f |ϕ ) is presented in Fig. 3 by using different values for c (The amplitudes of |ψ and (2µ |f ) are normalized.). The amplitude of |ψ through the iterations of the amplitude amplification oscillates with a frequency depending on the overlaps of the input with the eigenvectors. When the amplitude of |ψ becomes close to zero, the second register in the remaining part |f |φ is exactly QQ T |x and the first register is equal to |f . Fig.5 represents the iterations of the algorithm for a random 2 7 × 2 7 matrix with 2 7 /2 number of zero eigenvalues and a random input |x (MATLAB code for the random generation is given in Appendix B). In each subfigure, we have used different numbers of qubits for the first register to see the effect on the results. The bar graphs in the subfigures shows the probability change for each state |j , j = 0 . . . 1, of the first qubit (A different color tone indicates a different state.). When the probability for |0 becomes close to zero, the probability for the rest of the states becomes equal and so the total of these probabilities as shown in the bottom figure of each subfigure becomes almost one. At that point, the fidelity found by reg 2 | QQ T |x also comes closer to one.

B. Number of Iterations
Through the iterations, while the probability for |0 state goes to zero, the probabilities for the rest of the states become almost equal. This indicates that the individual states of each qubit turns into the equal superposition state. Therefore, if the state of a qubit in the first register is in the almost equal superposition state, then the success probability is very likely to be in its maximum level. In the Hadamard basis, |0 and |1 are represented in the equal superposition states as follows: Therefore, using the Hadamard basis, if the probability of measuring |0 is close to one, in other words, if |1 is not seen in the measurement, then the second register likely holds QQ T |x with a maximum possible fidelity. Fig.4 shows the comparisons of the individual qubit probabilities (i.e., the probability to see a qubit in the first register in |0 in the Hadamard basis.) with the total probability observed in Fig.5(f) for the random case: As seen in the figure, the individual probabilities exhibit the same behavior as the total probability. Generally, obtaining a possible probability density of an unknown quantum state is a difficult task. However, since we are dealing with only a single qubit and does not require the exact density, this can be done efficiently. For instance, if |0 is seen a number times in ten measurements, then the success probability is expected to be a/10. Here, the number of measurements obviously determines the precision in the obtained probability which may also affect the fidelity.

C. Error-Precision (Number of Qubits in |reg 1 )
The number of qubits, m, in the first register should be sufficient to distinguish very small nonzero eigenvalues from the ones which are zero. In our numerical random experiments, we have observed that choosing only six or five qubits are enough to get very high fidelity while not requiring a high number of iterations. The impact of the number of qubits on the fidelity and the probability is shown in Fig.5 in which each sub-figure is drawn by using different register sizes for the same random case. As seen in the figure, the number of qubits also affects the required number of iterations: e.g., while for m = 3, the highest fidelity and probability are seen at the fourth iteration; for m = 6, it happens around the ninth iteration.

D. Circuit Implementation of W
The circuit implementation of W requires forming a quantum circuit representing the time evolution of W : i.e., U = e i2πW t . When W is a sparse matrix, the circuit can be formed by following the method in Ref. [27]. However, when it is not sparse but in the following form W = j x j x j T , then the exponential is equal to: To approximate the above exponential, We apply the Trotter Suzuki formula [28]- [31] to decompose Eq.(26) into the terms whereĪ is a kind of identity matrix with the first element set to e i2πt , and U x j is a unitary matrix with the first row and column equal to x j . For instance, if we apply the second order Trotter-Suzuki decomposition to Eq.(26) (Note that the order of the decomposition impacts the accuracy of the approximation.), the following is obtained: Then, the same decomposition is applied to the term e i2πt/2 κ j=2 x j x j T in the above equation. This recursive decomposition yields an approximation composed of (4κ) number of U xj matrices. U xj can be implemented as a Householder matrix requires O(2 n ) quantum operations which is linear in the size of x j [32]- [35].

E. Obtaining a solution from the output
Generally, the amplitudes of the output vector (the final state of the second register) encodes the information needed for the solution of the considered problem. Since obtaining the full density of a quantum state is known to be very inefficient for larger systems, one needs to drive efficient measurement schemes specific to the problem. For instance, for some problems, comparisons of the peak values instead of the whole vectors may be enough to gauge a conclusion: In this case, since a possible outcome in a measurement would be the one with an amplitude likely to be greater than most of the states in magnitude, the peak values can be obtained efficiently. However, this alone may not be enough for some applications.

IV. COMPLEXITY ANALYSIS
The computational complexity of a quantum algorithm is assessed by the total number of single gates and two qubit controlled NOT (CNOT) gates in the quantum circuit implementing the algorithm. We derive the computational complexity of the whole method by finding the complexities of U f on the first register with m number qubits and U ψ on the second register with n number of qubits. We will use M = 2 m and N = 2 n to describe the sizes of the operators on the registers.

A. The complexity of U f
It is shown that the number of quantum gates to implement a Householder matrix is bounded by the size of the matrix Ref.
[32]- [35]. Therefore, the circuit for U f requires O(M ) CNOT gates since it is a Householder transformation formed by the vector |f of size M .
B. The complexity of U ψ U ψ is equal to U P EA U 0 U † P EA in which the total complexity will be typically governed by the complexity of U P EA . U P EA involves the Fourier transforms, input preparation circuit, and the controlled U = e itW with different t values: • The circuits for the quantum Fourier transform and its inverse are well known [23] and can be implemented on the first register in O(m 2 ). • The input preparation circuit on the second register, U input , can be implemented again as a Householder transformation by using O(N ) number of quantum gates.
(It can be also designed by following Sec. III.B. of Ref. [36]: In that case, for every two vector elements, a controlled rotation gate is used to construct U input with the initial row equal to x; thus, U input |0 = |x .) • The circuit complexity of U = e itW is highly related to the structure of W . When W of order N is sparse enough: i.e., the number of nonzero entries is bounded by some polynomial of the number of qubits, poly(n); then W can be simulated by using only O(poly(n)) number of quantum gates [27], [37], [38]. However, when W is not sparse but equal to x i x i T , then as shown in Sec. III-D, we use Trotter-Suzuki formula which yields a product of (4κ) number of U xj matrices with 1 ≤ j ≤ κ. Since U xj can be implemented as a Householder transformation by using O(N ) quantum gates, U requires O(κN ) quantum gates. If we combine all the above terms, the total complexity can be concluded as: which is linear in the system size, but exponential in the number of qubits involved in either one of the registers. When a classical method is applied to obtain QQ T x; then because of the matrix vector multiplication, it will at least require O(N 2 ) time complexity. Therefore, the quantum model presented here may provide a quadratic speed-up over the classical methods for some applications.

V. AN EXAMPLE
Here, we give a simple example to show how the algorithm works: Let us assume, we have given weights represented by the columns of the following matrix [39]: where we scale the vectors by 1 10 so as to make sure that the eigenvalues of W are less than one. To validate the simulation results, first, W [∞] is classically computed by following the singular value decomposition of X: We use the following Trotter-Suzuki formula [28]- [31] to compute the exponential of W = XX ′ : In the simulation for a random input |x , the comparison of W [∞] |x = QQ T |x with the output of the second register in the quantum model yields the fidelity. For two different random inputs, the simulation results in each iteration are shown in Fig.6(a)

VI. CONCLUSION
The weight matrix of the networks based on the Widrow-Hoff learning rule converges to QQ T , where Q represents the eigenvectors of the matrix corresponding to the nonzero eigenvalues. In this paper, we have showed how to apply the quantum principal component analysis method described in Ref. [24] to artificial neural networks using the Widrow-Hoff learning rule. In particular, we have show that one can implement an equivalent quantum circuit which produces the output QQ T x for a given input x in linear time. The implementation details are discussed by using random cases, and the computation complexity is analyzed based on the number of quantum gates. In addition, a simple numerical example is presented. The model is general and requires only linear time computational complexity in the size of the weight matrix.
(34) Therefore, if we can somehow produce the bits of an eigenvalue j in the above form, the quantum inverse Fourier transform can be used to obtain the bit values on the quantum register (It is the main intuition of the phase estimation algorithm.). QFT can be implemented as a quantum circuit by using O(n 2 ) quantum gates.

APPENDIX B MATLAB CODE FOR THE RANDOM MATRIX
The random matrix used in the numerical example is generated by the following MATLAB code snippet: %number of non-zero eigenvalues npc = ceil(N/2); d = rand(N,1);%random eigenvalues d(npc+1:end) = 0; %random eigenvectors [Qfull,˜] = qr(randn(N)); %the unitary matrix in PEA U = Qfull * diag(exp(1i * 2 * pi * d)) * Qfull'; %normalized input vector x = rand(N,1); x = x/norm(x);      (f) m is 6. Fig. 5. The probability changes in the iteration of the amplitude amplification for a random 2 7 × 2 7 matrix with 2 7 /2 number of zero eigenvalues and a random input |x (MATLAB code for the random generation is given in Appendix B). In each subfigure, we have used different numbers of qubits, m, for the first register to see the effect on the results. The bar graphs in the subfigures shows the probability change for each state |j , j = 0 . . . 1, of the first qubit. For each state, a different color tone is used.