Optimisation en Python

Contenu

6. Optimisation en Python#

Marc BUFFAT , Université Claude Bernard Lyon 1

Premature optimization is the root of all evil (or at least most of it) in programming.
- Donald Knuth (créateur de TeX)
It’s easier to optimize correct code than to correct optimized code.
Avant toute optimisation, il faut d’abord écrire un code correct, documenté et validé.

# option de mise en page
from IPython.core.display import HTML,display
#from JSAnimation import IPython_display
#css_file = 'style.css'
#HTML(open(css_file, "r").read())

/tmp/ipykernel_3693640/1945229322.py:2: DeprecationWarning: Importing display from IPython.core.display is deprecated since IPython 7.14, please import from IPython display
  from IPython.core.display import HTML,display

%matplotlib inline
# bibliotheques de base
import matplotlib
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import rcParams
rcParams['font.family'] = 'serif'
rcParams['font.size'] = 14

6.1. Calcul de la matrice de Vandermonde V (N,M)#

Soit un vecteur $X$ de dimension $N$:

\[ V_{ij} = x_i ^{j-1} \;\; \forall i=1,N \;\; \forall j=1,n\]

écrire un algorithme avec des boucles
vectoriser la boucle interne et comparer

# version avec des boucles
def Vandermonde_v1(X) :
    """Calcul de la matrice de VanderMonde V a partir de X"""
    N= X.size
    V= np.ones((N,N))
    for i in range(N) :
        for j in range(N) :
            V[i,j] = X[i]**j
    return V

# donnees
N=100
X=np.linspace(0,1.,N)

A1=Vandermonde_v1(X)
print(A1[:,0],A1[:,1])

[1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
1. 1. 1.] [0.         0.01010101 0.02020202 0.03030303 0.04040404 0.05050505
06060606 0.07070707 0.08080808 0.09090909 0.1010101  0.11111111
12121212 0.13131313 0.14141414 0.15151515 0.16161616 0.17171717
18181818 0.19191919 0.2020202  0.21212121 0.22222222 0.23232323
24242424 0.25252525 0.26262626 0.27272727 0.28282828 0.29292929
3030303  0.31313131 0.32323232 0.33333333 0.34343434 0.35353535
36363636 0.37373737 0.38383838 0.39393939 0.4040404  0.41414141
42424242 0.43434343 0.44444444 0.45454545 0.46464646 0.47474747
48484848 0.49494949 0.50505051 0.51515152 0.52525253 0.53535354
54545455 0.55555556 0.56565657 0.57575758 0.58585859 0.5959596
60606061 0.61616162 0.62626263 0.63636364 0.64646465 0.65656566
66666667 0.67676768 0.68686869 0.6969697  0.70707071 0.71717172
72727273 0.73737374 0.74747475 0.75757576 0.76767677 0.77777778
78787879 0.7979798  0.80808081 0.81818182 0.82828283 0.83838384
84848485 0.85858586 0.86868687 0.87878788 0.88888889 0.8989899
90909091 0.91919192 0.92929293 0.93939394 0.94949495 0.95959596
96969697 0.97979798 0.98989899 1.        ]

%%timeit -n 4
A=Vandermonde_v1(X)

2.42 ms ± 30 μs per loop (mean ± std. dev. of 7 runs, 4 loops each)

6.1.1. optimisation python#

vectorisation boucle interne
élimine la fonction **

# version optimisee
def Vandermonde_v2(X) :
    N=X.size
    V=np.ones((N,N))
    for j in range(1,N):
        V[:,j]=V[:,j-1]*X[:]
    return V

A2=Vandermonde_v2(X)
print(A2[:,0],A2[:,1])

[1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
1. 1. 1.] [0.         0.01010101 0.02020202 0.03030303 0.04040404 0.05050505
06060606 0.07070707 0.08080808 0.09090909 0.1010101  0.11111111
12121212 0.13131313 0.14141414 0.15151515 0.16161616 0.17171717
18181818 0.19191919 0.2020202  0.21212121 0.22222222 0.23232323
24242424 0.25252525 0.26262626 0.27272727 0.28282828 0.29292929
3030303  0.31313131 0.32323232 0.33333333 0.34343434 0.35353535
36363636 0.37373737 0.38383838 0.39393939 0.4040404  0.41414141
42424242 0.43434343 0.44444444 0.45454545 0.46464646 0.47474747
48484848 0.49494949 0.50505051 0.51515152 0.52525253 0.53535354
54545455 0.55555556 0.56565657 0.57575758 0.58585859 0.5959596
60606061 0.61616162 0.62626263 0.63636364 0.64646465 0.65656566
66666667 0.67676768 0.68686869 0.6969697  0.70707071 0.71717172
72727273 0.73737374 0.74747475 0.75757576 0.76767677 0.77777778
78787879 0.7979798  0.80808081 0.81818182 0.82828283 0.83838384
84848485 0.85858586 0.86868687 0.87878788 0.88888889 0.8989899
90909091 0.91919192 0.92929293 0.93939394 0.94949495 0.95959596
96969697 0.97979798 0.98989899 1.        ]

%%timeit -n 4
A=Vandermonde_v2(X)

168 μs ± 86.3 μs per loop (mean ± std. dev. of 7 runs, 4 loops each)

6.2. Cython#

interface notebook ipython (+simple)
ou compilation explicite

%load_ext Cython

%%cython
# reecriture en Cython
# ====================
#cython: boundscheck=False
#cython: wraparound=False
#
import numpy as np
cimport numpy  as np
cimport cython

cpdef np.ndarray[double,ndim=2] vandermonde(np.ndarray[double,ndim=1] X):
    cdef int N=X.size
    cdef np.ndarray[double,ndim=2] A = np.ones((N,N))
    cdef int i,j
    for j in xrange(1,N):
       for i in xrange(N):
         A[i,j] = A[i,j-1]*X[i]
    return A

A3=vandermonde(X)
print(A3[:,0],A3[:,1])

[1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
1. 1. 1.] [0.         0.01010101 0.02020202 0.03030303 0.04040404 0.05050505
06060606 0.07070707 0.08080808 0.09090909 0.1010101  0.11111111
12121212 0.13131313 0.14141414 0.15151515 0.16161616 0.17171717
18181818 0.19191919 0.2020202  0.21212121 0.22222222 0.23232323
24242424 0.25252525 0.26262626 0.27272727 0.28282828 0.29292929
3030303  0.31313131 0.32323232 0.33333333 0.34343434 0.35353535
36363636 0.37373737 0.38383838 0.39393939 0.4040404  0.41414141
42424242 0.43434343 0.44444444 0.45454545 0.46464646 0.47474747
48484848 0.49494949 0.50505051 0.51515152 0.52525253 0.53535354
54545455 0.55555556 0.56565657 0.57575758 0.58585859 0.5959596
60606061 0.61616162 0.62626263 0.63636364 0.64646465 0.65656566
66666667 0.67676768 0.68686869 0.6969697  0.70707071 0.71717172
72727273 0.73737374 0.74747475 0.75757576 0.76767677 0.77777778
78787879 0.7979798  0.80808081 0.81818182 0.82828283 0.83838384
84848485 0.85858586 0.86868687 0.87878788 0.88888889 0.8989899
90909091 0.91919192 0.92929293 0.93939394 0.94949495 0.95959596
96969697 0.97979798 0.98989899 1.        ]

6.2.1. compilation explicite cython#

creation code vandermonde.pyx en Cython
ecriture interface python (bibliotheque)
compilartion du code C genere

%%bash
cat vandermonde.pyx

# code reecrit en Cython

# ====================

#cython: boundscheck=False

#cython: wraparound=False

import numpy as np

cimport numpy  as np

cimport cython

cpdef np.ndarray[double,ndim=2] vandermonde(np.ndarray[double,ndim=1] X):

    cdef int N=X.size

    cdef np.ndarray[double,ndim=2] A = np.ones((N,N))

    cdef int i,j

    for j in xrange(1,N):

       for i in xrange(N):

         A[i,j] = A[i,j-1]*X[i]

    return A

%%bash
cat setup.py

# creation librairie

from distutils.core import setup, Extension

from Cython.Distutils import build_ext

import numpy

ext = Extension("vandermonde", ["vandermonde.pyx"], extra_compile_args=['-O3','-Wno-unused-function'],

            include_dirs = [numpy.get_include()])

setup(ext_modules=[ext],

     cmdclass={'build_ext': build_ext}

%%bash
python setup.py build_ext --inplace
./test_vandermonde.py 

running build_ext

skipping 'vandermonde.c' Cython extension (up-to-date)

test vandermonde N= 100

matrice :  <class 'numpy.ndarray'> (100, 100)

colonne 0:

 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.

 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.

 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.

 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.

 1. 1. 1. 1.]

colonne 1:

 [0.         0.01010101 0.02020202 0.03030303 0.04040404 0.05050505

 0.06060606 0.07070707 0.08080808 0.09090909 0.1010101  0.11111111

 0.12121212 0.13131313 0.14141414 0.15151515 0.16161616 0.17171717

 0.18181818 0.19191919 0.2020202  0.21212121 0.22222222 0.23232323

 0.24242424 0.25252525 0.26262626 0.27272727 0.28282828 0.29292929

 0.3030303  0.31313131 0.32323232 0.33333333 0.34343434 0.35353535

 0.36363636 0.37373737 0.38383838 0.39393939 0.4040404  0.41414141

 0.42424242 0.43434343 0.44444444 0.45454545 0.46464646 0.47474747

 0.48484848 0.49494949 0.50505051 0.51515152 0.52525253 0.53535354

 0.54545455 0.55555556 0.56565657 0.57575758 0.58585859 0.5959596

 0.60606061 0.61616162 0.62626263 0.63636364 0.64646465 0.65656566

 0.66666667 0.67676768 0.68686869 0.6969697  0.70707071 0.71717172

 0.72727273 0.73737374 0.74747475 0.75757576 0.76767677 0.77777778

 0.78787879 0.7979798  0.80808081 0.81818182 0.82828283 0.83838384

 0.84848485 0.85858586 0.86868687 0.87878788 0.88888889 0.8989899

 0.90909091 0.91919192 0.92929293 0.93939394 0.94949495 0.95959596

 0.96969697 0.97979798 0.98989899 1.        ]

6.2.2. code C#

%%bash
cat vandermonde_C.C

#include <math.h>

#include <stdio.h>

#include <stdlib.h>

#include <string.h>

#include "vandermonde_C.h"

// calcul la matrice de Vandermonde a partir de X

void vandermondeC(double *V,double *X,int N)

	for (int i=0; i<N; i++)

		V[i]=1.0;

	for (int j=1; j<N; j++)

		for (int i=0; i<N; i++)

			V[i+j*N]=V[i+(j-1)*N]*X[i];

	return;

%%bash
g++ test_vandermondeC.C vandermonde_C.C -O3
./a.out 10 -d

calcul de la matrice de Vandermonde d'ordre N=10

[  0]	           1	           1 ....	           1

[  1]	           1	         0.5 ....	  0.00195312

[  2]	           1	    0.333333 ....	 5.08053e-05

[  3]	           1	        0.25 ....	  3.8147e-06

[  4]	           1	         0.2 ....	    5.12e-07

[  5]	           1	    0.166667 ....	  9.9229e-08

[  6]	           1	    0.142857 ....	 2.47809e-08

[  7]	           1	       0.125 ....	 7.45058e-09

[  8]	           1	    0.111111 ....	 2.58117e-09

[  9]	           1	         0.1 ....	       1e-09

6.3. Démarche d’optimisation#

« Premature optimization is the root of all evil (or at least most of it) in programming.»
    — Donald Knuth (créateur de TeX)
    
« It's easier to optimize correct code than to correct optimized code.»

"Un code correct et validé est indispensable. Un code optimisé non !"

6.4. étape d’optimisation#

écrire tout d'abord un code correcte et documenté
valider le code
faire du profiling (avec timeit) pour déterminer les fonctions les plus couteuses en temps CPU
optimiser uniquement ces fonctions (en général 5 à 10% du code total)

6.5. Optimisation sous Python#

Dans l’ordre:

utiliser les bibliothéques (numpy, scipy)
vectoriser les boucles internes
utiliser un compilateur python (Cython,..)
réecrire le code en C/FORTRAN

6.6. exemple: produit matrice vecteur#

$ b = A*x $ avec $$ b_i = \sum_j A_{ij} x_j $$

programme avec boucles
vectorisation
utilisation de numpy dot

def produit(A,x):
    """ calcul b=Ax """
    n=x.size
    b=np.zeros(n)
    for i in range(n):
        for j in range(n):
            b[i] += A[i,j]*x[j]
    return b

# version vectoriel 1
def produitv1(A,x):
    n=x.size
    b=np.zeros(n)
    for i in range(n):
        b[i] = np.sum(A[i,:]*x[:])
        #b[i] = sum(A[i,:]*x[:]) # attention sum
    return b

# version vectoriel 2
def produitv2(A,x):
    B = A * x
    b = np.sum(B[:,:],axis=1)
    return b

N=1000
A  = np.random.rand(N,N)
X  = np.random.rand(N)
B0 = np.dot(A,X)
B1 = produit(A,X)
B2 = produitv1(A,X) 
B3 = produitv2(A,X) 
print("erreur=",np.linalg.norm(B1-B0),\
                np.linalg.norm(B2-B0),\
                np.linalg.norm(B3-B0))

erreur= 6.574329060294259e-12 1.6771217010423162e-12 1.6771217010423162e-12

%%timeit -n 2
B = produit(A,X)

395 ms ± 3.69 ms per loop (mean ± std. dev. of 7 runs, 2 loops each)

%%timeit -n 2
B = produitv1(A,X)

5.8 ms ± 479 μs per loop (mean ± std. dev. of 7 runs, 2 loops each)

%%timeit -n 2
B = produitv2(A,X)

1.82 ms ± 261 μs per loop (mean ± std. dev. of 7 runs, 2 loops each)

%%timeit -n 2
B = np.dot(A,X)

The slowest run took 12.86 times longer than the fastest. This could mean that an intermediate result is being cached.
541 μs ± 668 μs per loop (mean ± std. dev. of 7 runs, 2 loops each)