6. Optimisation en Python#

Marc BUFFAT , Université Claude Bernard Lyon 1

  • Premature optimization is the root of all evil (or at least most of it) in programming.

    • Donald Knuth (créateur de TeX)

  • It’s easier to optimize correct code than to correct optimized code.

  • Avant toute optimisation, il faut d’abord écrire un code correct, documenté et validé.

# option de mise en page
from IPython.core.display import HTML,display
#from JSAnimation import IPython_display
#css_file = 'style.css'
#HTML(open(css_file, "r").read())
/tmp/ipykernel_3693640/1945229322.py:2: DeprecationWarning: Importing display from IPython.core.display is deprecated since IPython 7.14, please import from IPython display
  from IPython.core.display import HTML,display
%matplotlib inline
# bibliotheques de base
import matplotlib
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import rcParams
rcParams['font.family'] = 'serif'
rcParams['font.size'] = 14

6.1. Calcul de la matrice de Vandermonde V (N,M)#

Soit un vecteur \(X\) de dimension \(N\):

\[ V_{ij} = x_i ^{j-1} \;\; \forall i=1,N \;\; \forall j=1,n\]
  • écrire un algorithme avec des boucles

  • vectoriser la boucle interne et comparer

# version avec des boucles
def Vandermonde_v1(X) :
    """Calcul de la matrice de VanderMonde V a partir de X"""
    N= X.size
    V= np.ones((N,N))
    for i in range(N) :
        for j in range(N) :
            V[i,j] = X[i]**j
    return V
# donnees
N=100
X=np.linspace(0,1.,N)
A1=Vandermonde_v1(X)
print(A1[:,0],A1[:,1])
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1.] [0.         0.01010101 0.02020202 0.03030303 0.04040404 0.05050505
 0.06060606 0.07070707 0.08080808 0.09090909 0.1010101  0.11111111
 0.12121212 0.13131313 0.14141414 0.15151515 0.16161616 0.17171717
 0.18181818 0.19191919 0.2020202  0.21212121 0.22222222 0.23232323
 0.24242424 0.25252525 0.26262626 0.27272727 0.28282828 0.29292929
 0.3030303  0.31313131 0.32323232 0.33333333 0.34343434 0.35353535
 0.36363636 0.37373737 0.38383838 0.39393939 0.4040404  0.41414141
 0.42424242 0.43434343 0.44444444 0.45454545 0.46464646 0.47474747
 0.48484848 0.49494949 0.50505051 0.51515152 0.52525253 0.53535354
 0.54545455 0.55555556 0.56565657 0.57575758 0.58585859 0.5959596
 0.60606061 0.61616162 0.62626263 0.63636364 0.64646465 0.65656566
 0.66666667 0.67676768 0.68686869 0.6969697  0.70707071 0.71717172
 0.72727273 0.73737374 0.74747475 0.75757576 0.76767677 0.77777778
 0.78787879 0.7979798  0.80808081 0.81818182 0.82828283 0.83838384
 0.84848485 0.85858586 0.86868687 0.87878788 0.88888889 0.8989899
 0.90909091 0.91919192 0.92929293 0.93939394 0.94949495 0.95959596
 0.96969697 0.97979798 0.98989899 1.        ]
%%timeit -n 4
A=Vandermonde_v1(X)
2.42 ms ± 30 μs per loop (mean ± std. dev. of 7 runs, 4 loops each)

6.1.1. optimisation python#

  • vectorisation boucle interne

  • élimine la fonction **

# version optimisee
def Vandermonde_v2(X) :
    N=X.size
    V=np.ones((N,N))
    for j in range(1,N):
        V[:,j]=V[:,j-1]*X[:]
    return V
A2=Vandermonde_v2(X)
print(A2[:,0],A2[:,1])
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1.] [0.         0.01010101 0.02020202 0.03030303 0.04040404 0.05050505
 0.06060606 0.07070707 0.08080808 0.09090909 0.1010101  0.11111111
 0.12121212 0.13131313 0.14141414 0.15151515 0.16161616 0.17171717
 0.18181818 0.19191919 0.2020202  0.21212121 0.22222222 0.23232323
 0.24242424 0.25252525 0.26262626 0.27272727 0.28282828 0.29292929
 0.3030303  0.31313131 0.32323232 0.33333333 0.34343434 0.35353535
 0.36363636 0.37373737 0.38383838 0.39393939 0.4040404  0.41414141
 0.42424242 0.43434343 0.44444444 0.45454545 0.46464646 0.47474747
 0.48484848 0.49494949 0.50505051 0.51515152 0.52525253 0.53535354
 0.54545455 0.55555556 0.56565657 0.57575758 0.58585859 0.5959596
 0.60606061 0.61616162 0.62626263 0.63636364 0.64646465 0.65656566
 0.66666667 0.67676768 0.68686869 0.6969697  0.70707071 0.71717172
 0.72727273 0.73737374 0.74747475 0.75757576 0.76767677 0.77777778
 0.78787879 0.7979798  0.80808081 0.81818182 0.82828283 0.83838384
 0.84848485 0.85858586 0.86868687 0.87878788 0.88888889 0.8989899
 0.90909091 0.91919192 0.92929293 0.93939394 0.94949495 0.95959596
 0.96969697 0.97979798 0.98989899 1.        ]
%%timeit -n 4
A=Vandermonde_v2(X)
168 μs ± 86.3 μs per loop (mean ± std. dev. of 7 runs, 4 loops each)

6.2. Cython#

  • interface notebook ipython (+simple)

  • ou compilation explicite

%load_ext Cython
%%cython
# reecriture en Cython
# ====================
#cython: boundscheck=False
#cython: wraparound=False
#
import numpy as np
cimport numpy  as np
cimport cython

cpdef np.ndarray[double,ndim=2] vandermonde(np.ndarray[double,ndim=1] X):
    cdef int N=X.size
    cdef np.ndarray[double,ndim=2] A = np.ones((N,N))
    cdef int i,j
    for j in xrange(1,N):
       for i in xrange(N):
         A[i,j] = A[i,j-1]*X[i]
    return A
A3=vandermonde(X)
print(A3[:,0],A3[:,1])
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1.] [0.         0.01010101 0.02020202 0.03030303 0.04040404 0.05050505
 0.06060606 0.07070707 0.08080808 0.09090909 0.1010101  0.11111111
 0.12121212 0.13131313 0.14141414 0.15151515 0.16161616 0.17171717
 0.18181818 0.19191919 0.2020202  0.21212121 0.22222222 0.23232323
 0.24242424 0.25252525 0.26262626 0.27272727 0.28282828 0.29292929
 0.3030303  0.31313131 0.32323232 0.33333333 0.34343434 0.35353535
 0.36363636 0.37373737 0.38383838 0.39393939 0.4040404  0.41414141
 0.42424242 0.43434343 0.44444444 0.45454545 0.46464646 0.47474747
 0.48484848 0.49494949 0.50505051 0.51515152 0.52525253 0.53535354
 0.54545455 0.55555556 0.56565657 0.57575758 0.58585859 0.5959596
 0.60606061 0.61616162 0.62626263 0.63636364 0.64646465 0.65656566
 0.66666667 0.67676768 0.68686869 0.6969697  0.70707071 0.71717172
 0.72727273 0.73737374 0.74747475 0.75757576 0.76767677 0.77777778
 0.78787879 0.7979798  0.80808081 0.81818182 0.82828283 0.83838384
 0.84848485 0.85858586 0.86868687 0.87878788 0.88888889 0.8989899
 0.90909091 0.91919192 0.92929293 0.93939394 0.94949495 0.95959596
 0.96969697 0.97979798 0.98989899 1.        ]

6.2.1. compilation explicite cython#

  • creation code vandermonde.pyx en Cython

  • ecriture interface python (bibliotheque)

  • compilartion du code C genere

%%bash
cat vandermonde.pyx
# code reecrit en Cython
# ====================
#cython: boundscheck=False
#cython: wraparound=False
#
import numpy as np
cimport numpy  as np
cimport cython

cpdef np.ndarray[double,ndim=2] vandermonde(np.ndarray[double,ndim=1] X):
    cdef int N=X.size
    cdef np.ndarray[double,ndim=2] A = np.ones((N,N))
    cdef int i,j
    for j in xrange(1,N):
       for i in xrange(N):
         A[i,j] = A[i,j-1]*X[i]
    return A

%%bash
cat setup.py
# creation librairie
from distutils.core import setup, Extension
from Cython.Distutils import build_ext

import numpy 

ext = Extension("vandermonde", ["vandermonde.pyx"], extra_compile_args=['-O3','-Wno-unused-function'],
            include_dirs = [numpy.get_include()])

setup(ext_modules=[ext],
     cmdclass={'build_ext': build_ext}
)


%%bash
python setup.py build_ext --inplace
./test_vandermonde.py 
running build_ext
skipping 'vandermonde.c' Cython extension (up-to-date)
test vandermonde N= 100
matrice :  <class 'numpy.ndarray'> (100, 100)
colonne 0:
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1.]
colonne 1:
 [0.         0.01010101 0.02020202 0.03030303 0.04040404 0.05050505
 0.06060606 0.07070707 0.08080808 0.09090909 0.1010101  0.11111111
 0.12121212 0.13131313 0.14141414 0.15151515 0.16161616 0.17171717
 0.18181818 0.19191919 0.2020202  0.21212121 0.22222222 0.23232323
 0.24242424 0.25252525 0.26262626 0.27272727 0.28282828 0.29292929
 0.3030303  0.31313131 0.32323232 0.33333333 0.34343434 0.35353535
 0.36363636 0.37373737 0.38383838 0.39393939 0.4040404  0.41414141
 0.42424242 0.43434343 0.44444444 0.45454545 0.46464646 0.47474747
 0.48484848 0.49494949 0.50505051 0.51515152 0.52525253 0.53535354
 0.54545455 0.55555556 0.56565657 0.57575758 0.58585859 0.5959596
 0.60606061 0.61616162 0.62626263 0.63636364 0.64646465 0.65656566
 0.66666667 0.67676768 0.68686869 0.6969697  0.70707071 0.71717172
 0.72727273 0.73737374 0.74747475 0.75757576 0.76767677 0.77777778
 0.78787879 0.7979798  0.80808081 0.81818182 0.82828283 0.83838384
 0.84848485 0.85858586 0.86868687 0.87878788 0.88888889 0.8989899
 0.90909091 0.91919192 0.92929293 0.93939394 0.94949495 0.95959596
 0.96969697 0.97979798 0.98989899 1.        ]

6.2.2. code C#

%%bash
cat vandermonde_C.C
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "vandermonde_C.h"

// calcul la matrice de Vandermonde a partir de X
void vandermondeC(double *V,double *X,int N) 
{
	for (int i=0; i<N; i++)
		V[i]=1.0;
	for (int j=1; j<N; j++)
		for (int i=0; i<N; i++)
			V[i+j*N]=V[i+(j-1)*N]*X[i];
	return;
}

%%bash
g++ test_vandermondeC.C vandermonde_C.C -O3
./a.out 10 -d
calcul de la matrice de Vandermonde d'ordre N=10
[  0]	           1	           1 ....	           1
[  1]	           1	         0.5 ....	  0.00195312
[  2]	           1	    0.333333 ....	 5.08053e-05
[  3]	           1	        0.25 ....	  3.8147e-06
[  4]	           1	         0.2 ....	    5.12e-07
[  5]	           1	    0.166667 ....	  9.9229e-08
[  6]	           1	    0.142857 ....	 2.47809e-08
[  7]	           1	       0.125 ....	 7.45058e-09
[  8]	           1	    0.111111 ....	 2.58117e-09
[  9]	           1	         0.1 ....	       1e-09

6.3. Démarche d’optimisation#

« Premature optimization is the root of all evil (or at least most of it) in programming.»
    — Donald Knuth (créateur de TeX)
    
« It's easier to optimize correct code than to correct optimized code.»

"Un code correct et validé est indispensable. Un code optimisé non !"

6.4. étape d’optimisation#

1. écrire tout d'abord un code correcte et documenté
2. valider le code
3. faire du profiling (avec timeit) pour déterminer les fonctions les plus couteuses en temps CPU
4. optimiser uniquement ces fonctions (en général 5 à 10% du code total)

6.5. Optimisation sous Python#

Dans l’ordre:

  1. utiliser les bibliothéques (numpy, scipy)

  2. vectoriser les boucles internes

  3. utiliser un compilateur python (Cython,..)

  4. réecrire le code en C/FORTRAN

6.6. exemple: produit matrice vecteur#

\( b = A*x \) avec $\( b_i = \sum_j A_{ij} x_j \)$

  • programme avec boucles

  • vectorisation

  • utilisation de numpy dot

def produit(A,x):
    """ calcul b=Ax """
    n=x.size
    b=np.zeros(n)
    for i in range(n):
        for j in range(n):
            b[i] += A[i,j]*x[j]
    return b
# version vectoriel 1
def produitv1(A,x):
    n=x.size
    b=np.zeros(n)
    for i in range(n):
        b[i] = np.sum(A[i,:]*x[:])
        #b[i] = sum(A[i,:]*x[:]) # attention sum
    return b
# version vectoriel 2
def produitv2(A,x):
    B = A * x
    b = np.sum(B[:,:],axis=1)
    return b
N=1000
A  = np.random.rand(N,N)
X  = np.random.rand(N)
B0 = np.dot(A,X)
B1 = produit(A,X)
B2 = produitv1(A,X) 
B3 = produitv2(A,X) 
print("erreur=",np.linalg.norm(B1-B0),\
                np.linalg.norm(B2-B0),\
                np.linalg.norm(B3-B0))
erreur= 6.574329060294259e-12 1.6771217010423162e-12 1.6771217010423162e-12
%%timeit -n 2
B = produit(A,X)
395 ms ± 3.69 ms per loop (mean ± std. dev. of 7 runs, 2 loops each)
%%timeit -n 2
B = produitv1(A,X)
5.8 ms ± 479 μs per loop (mean ± std. dev. of 7 runs, 2 loops each)
%%timeit -n 2
B = produitv2(A,X)
1.82 ms ± 261 μs per loop (mean ± std. dev. of 7 runs, 2 loops each)
%%timeit -n 2
B = np.dot(A,X)
The slowest run took 12.86 times longer than the fastest. This could mean that an intermediate result is being cached.
541 μs ± 668 μs per loop (mean ± std. dev. of 7 runs, 2 loops each)

6.7. FIN#