6. Optimisation en Python#
Marc BUFFAT , Université Claude Bernard Lyon 1
Premature optimization is the root of all evil (or at least most of it) in programming.
Donald Knuth (créateur de TeX)
It’s easier to optimize correct code than to correct optimized code.
Avant toute optimisation, il faut d’abord écrire un code correct, documenté et validé.
# option de mise en page
from IPython.display import HTML,display
#from JSAnimation import IPython_display
#css_file = 'style.css'
#HTML(open(css_file, "r").read())
%matplotlib inline
# bibliotheques de base
import matplotlib
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import rcParams
rcParams['font.family'] = 'serif'
rcParams['font.size'] = 14
6.1. Calcul de la matrice de Vandermonde V (N,M)#
Soit un vecteur \(X\) de dimension \(N\):
écrire un algorithme avec des boucles
vectoriser la boucle interne et comparer
# version avec des boucles
def Vandermonde_v1(X) :
"""Calcul de la matrice de VanderMonde V a partir de X"""
N= X.size
V= np.ones((N,N))
for i in range(N) :
for j in range(N) :
V[i,j] = X[i]**j
return V
# donnees
N=100
X=np.linspace(0,1.,N)
A1=Vandermonde_v1(X)
print(A1[:,0],A1[:,1])
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
1. 1. 1. 1.] [0. 0.01010101 0.02020202 0.03030303 0.04040404 0.05050505
0.06060606 0.07070707 0.08080808 0.09090909 0.1010101 0.11111111
0.12121212 0.13131313 0.14141414 0.15151515 0.16161616 0.17171717
0.18181818 0.19191919 0.2020202 0.21212121 0.22222222 0.23232323
0.24242424 0.25252525 0.26262626 0.27272727 0.28282828 0.29292929
0.3030303 0.31313131 0.32323232 0.33333333 0.34343434 0.35353535
0.36363636 0.37373737 0.38383838 0.39393939 0.4040404 0.41414141
0.42424242 0.43434343 0.44444444 0.45454545 0.46464646 0.47474747
0.48484848 0.49494949 0.50505051 0.51515152 0.52525253 0.53535354
0.54545455 0.55555556 0.56565657 0.57575758 0.58585859 0.5959596
0.60606061 0.61616162 0.62626263 0.63636364 0.64646465 0.65656566
0.66666667 0.67676768 0.68686869 0.6969697 0.70707071 0.71717172
0.72727273 0.73737374 0.74747475 0.75757576 0.76767677 0.77777778
0.78787879 0.7979798 0.80808081 0.81818182 0.82828283 0.83838384
0.84848485 0.85858586 0.86868687 0.87878788 0.88888889 0.8989899
0.90909091 0.91919192 0.92929293 0.93939394 0.94949495 0.95959596
0.96969697 0.97979798 0.98989899 1. ]
%%timeit -n 4
A=Vandermonde_v1(X)
3.04 ms ± 302 μs per loop (mean ± std. dev. of 7 runs, 4 loops each)
6.1.1. optimisation python#
vectorisation boucle interne
élimine la fonction **
# version optimisee
def Vandermonde_v2(X) :
N=X.size
V=np.ones((N,N))
for j in range(1,N):
V[:,j]=V[:,j-1]*X[:]
return V
A2=Vandermonde_v2(X)
print(A2[:,0],A2[:,1])
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
1. 1. 1. 1.] [0. 0.01010101 0.02020202 0.03030303 0.04040404 0.05050505
0.06060606 0.07070707 0.08080808 0.09090909 0.1010101 0.11111111
0.12121212 0.13131313 0.14141414 0.15151515 0.16161616 0.17171717
0.18181818 0.19191919 0.2020202 0.21212121 0.22222222 0.23232323
0.24242424 0.25252525 0.26262626 0.27272727 0.28282828 0.29292929
0.3030303 0.31313131 0.32323232 0.33333333 0.34343434 0.35353535
0.36363636 0.37373737 0.38383838 0.39393939 0.4040404 0.41414141
0.42424242 0.43434343 0.44444444 0.45454545 0.46464646 0.47474747
0.48484848 0.49494949 0.50505051 0.51515152 0.52525253 0.53535354
0.54545455 0.55555556 0.56565657 0.57575758 0.58585859 0.5959596
0.60606061 0.61616162 0.62626263 0.63636364 0.64646465 0.65656566
0.66666667 0.67676768 0.68686869 0.6969697 0.70707071 0.71717172
0.72727273 0.73737374 0.74747475 0.75757576 0.76767677 0.77777778
0.78787879 0.7979798 0.80808081 0.81818182 0.82828283 0.83838384
0.84848485 0.85858586 0.86868687 0.87878788 0.88888889 0.8989899
0.90909091 0.91919192 0.92929293 0.93939394 0.94949495 0.95959596
0.96969697 0.97979798 0.98989899 1. ]
%%timeit -n 4
A=Vandermonde_v2(X)
166 μs ± 43.8 μs per loop (mean ± std. dev. of 7 runs, 4 loops each)
6.2. Cython#
interface notebook ipython (+simple)
ou compilation explicite
%load_ext Cython
%%cython
# reecriture en Cython
# ====================
#cython: boundscheck=False
#cython: wraparound=False
#
import numpy as np
cimport numpy as np
cimport cython
cpdef np.ndarray[double,ndim=2] vandermonde(np.ndarray[double,ndim=1] X):
cdef int N=X.size
cdef np.ndarray[double,ndim=2] A = np.ones((N,N))
cdef int i,j
for j in xrange(1,N):
for i in xrange(N):
A[i,j] = A[i,j-1]*X[i]
return A
A3=vandermonde(X)
print(A3[:,0],A3[:,1])
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
1. 1. 1. 1.] [0. 0.01010101 0.02020202 0.03030303 0.04040404 0.05050505
0.06060606 0.07070707 0.08080808 0.09090909 0.1010101 0.11111111
0.12121212 0.13131313 0.14141414 0.15151515 0.16161616 0.17171717
0.18181818 0.19191919 0.2020202 0.21212121 0.22222222 0.23232323
0.24242424 0.25252525 0.26262626 0.27272727 0.28282828 0.29292929
0.3030303 0.31313131 0.32323232 0.33333333 0.34343434 0.35353535
0.36363636 0.37373737 0.38383838 0.39393939 0.4040404 0.41414141
0.42424242 0.43434343 0.44444444 0.45454545 0.46464646 0.47474747
0.48484848 0.49494949 0.50505051 0.51515152 0.52525253 0.53535354
0.54545455 0.55555556 0.56565657 0.57575758 0.58585859 0.5959596
0.60606061 0.61616162 0.62626263 0.63636364 0.64646465 0.65656566
0.66666667 0.67676768 0.68686869 0.6969697 0.70707071 0.71717172
0.72727273 0.73737374 0.74747475 0.75757576 0.76767677 0.77777778
0.78787879 0.7979798 0.80808081 0.81818182 0.82828283 0.83838384
0.84848485 0.85858586 0.86868687 0.87878788 0.88888889 0.8989899
0.90909091 0.91919192 0.92929293 0.93939394 0.94949495 0.95959596
0.96969697 0.97979798 0.98989899 1. ]
6.2.1. compilation explicite cython#
creation code vandermonde.pyx en Cython
ecriture interface python (bibliotheque)
compilartion du code C genere
%%bash
cat vandermonde.pyx
# code reecrit en Cython
# ====================
#cython: boundscheck=False
#cython: wraparound=False
#
import numpy as np
cimport numpy as np
cimport cython
cpdef np.ndarray[double,ndim=2] vandermonde(np.ndarray[double,ndim=1] X):
cdef int N=X.size
cdef np.ndarray[double,ndim=2] A = np.ones((N,N))
cdef int i,j
for j in xrange(1,N):
for i in xrange(N):
A[i,j] = A[i,j-1]*X[i]
return A
%%bash
cat setup.py
# creation librairie
from distutils.core import setup, Extension
from Cython.Distutils import build_ext
import numpy
ext = Extension("vandermonde", ["vandermonde.pyx"], extra_compile_args=['-O3','-Wno-unused-function'],
include_dirs = [numpy.get_include()])
setup(ext_modules=[ext],
cmdclass={'build_ext': build_ext}
)
%%bash
python setup.py build_ext --inplace
./test_vandermonde.py
running build_ext
skipping 'vandermonde.c' Cython extension (up-to-date)
building 'vandermonde' extension
x86_64-linux-gnu-gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -I/home/buffat/venvs/jupyter/lib/python3.10/site-packages/numpy/core/include -I/home/buffat/venvs/jupyter/include -I/usr/include/python3.10 -c vandermonde.c -o build/temp.linux-x86_64-cpython-310/vandermonde.o -O3 -Wno-unused-function
In file included from /home/buffat/venvs/jupyter/lib/python3.10/site-packages/numpy/core/include/numpy/ndarraytypes.h:1929,
from /home/buffat/venvs/jupyter/lib/python3.10/site-packages/numpy/core/include/numpy/ndarrayobject.h:12,
from /home/buffat/venvs/jupyter/lib/python3.10/site-packages/numpy/core/include/numpy/arrayobject.h:5,
from vandermonde.c:698:
/home/buffat/venvs/jupyter/lib/python3.10/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:17:2: warning: #warning "Using deprecated NumPy API, disable it with " "#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
17 | #warning "Using deprecated NumPy API, disable it with " \
| ^~~~~~~
x86_64-linux-gnu-gcc -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -g -fwrapv -O2 build/temp.linux-x86_64-cpython-310/vandermonde.o -L/usr/lib/x86_64-linux-gnu -o /data/cours/MGC2367M/cours/vandermonde.cpython-310-x86_64-linux-gnu.so
test vandermonde N= 100
matrice : <class 'numpy.ndarray'> (100, 100)
colonne 0:
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
1. 1. 1. 1.]
colonne 1:
[0. 0.01010101 0.02020202 0.03030303 0.04040404 0.05050505
0.06060606 0.07070707 0.08080808 0.09090909 0.1010101 0.11111111
0.12121212 0.13131313 0.14141414 0.15151515 0.16161616 0.17171717
0.18181818 0.19191919 0.2020202 0.21212121 0.22222222 0.23232323
0.24242424 0.25252525 0.26262626 0.27272727 0.28282828 0.29292929
0.3030303 0.31313131 0.32323232 0.33333333 0.34343434 0.35353535
0.36363636 0.37373737 0.38383838 0.39393939 0.4040404 0.41414141
0.42424242 0.43434343 0.44444444 0.45454545 0.46464646 0.47474747
0.48484848 0.49494949 0.50505051 0.51515152 0.52525253 0.53535354
0.54545455 0.55555556 0.56565657 0.57575758 0.58585859 0.5959596
0.60606061 0.61616162 0.62626263 0.63636364 0.64646465 0.65656566
0.66666667 0.67676768 0.68686869 0.6969697 0.70707071 0.71717172
0.72727273 0.73737374 0.74747475 0.75757576 0.76767677 0.77777778
0.78787879 0.7979798 0.80808081 0.81818182 0.82828283 0.83838384
0.84848485 0.85858586 0.86868687 0.87878788 0.88888889 0.8989899
0.90909091 0.91919192 0.92929293 0.93939394 0.94949495 0.95959596
0.96969697 0.97979798 0.98989899 1. ]
6.2.2. code C#
%%bash
cat vandermonde_C.C
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "vandermonde_C.h"
// calcul la matrice de Vandermonde a partir de X
void vandermondeC(double *V,double *X,int N)
{
for (int i=0; i<N; i++)
V[i]=1.0;
for (int j=1; j<N; j++)
for (int i=0; i<N; i++)
V[i+j*N]=V[i+(j-1)*N]*X[i];
return;
}
%%bash
g++ test_vandermondeC.C vandermonde_C.C -O3
./a.out 10 -d
calcul de la matrice de Vandermonde d'ordre N=10
[ 0] 1 1 .... 1
[ 1] 1 0.5 .... 0.00195312
[ 2] 1 0.333333 .... 5.08053e-05
[ 3] 1 0.25 .... 3.8147e-06
[ 4] 1 0.2 .... 5.12e-07
[ 5] 1 0.166667 .... 9.9229e-08
[ 6] 1 0.142857 .... 2.47809e-08
[ 7] 1 0.125 .... 7.45058e-09
[ 8] 1 0.111111 .... 2.58117e-09
[ 9] 1 0.1 .... 1e-09
6.3. Démarche d’optimisation#
« Premature optimization is the root of all evil (or at least most of it) in programming.»
— Donald Knuth (créateur de TeX)
« It's easier to optimize correct code than to correct optimized code.»
"Un code correct et validé est indispensable. Un code optimisé non !"
6.4. étape d’optimisation#
1. écrire tout d'abord un code correcte et documenté
2. valider le code
3. faire du profiling (avec timeit) pour déterminer les fonctions les plus couteuses en temps CPU
4. optimiser uniquement ces fonctions (en général 5 à 10% du code total)
6.5. Optimisation sous Python#
Dans l’ordre:
utiliser les bibliothéques (numpy, scipy)
vectoriser les boucles internes
utiliser un compilateur python (Cython,..)
réecrire le code en C/FORTRAN
6.6. exemple: produit matrice vecteur#
\( b = A*x \) avec $\( b_i = \sum_j A_{ij} x_j \)$
programme avec boucles
vectorisation
utilisation de numpy dot
def produit(A,x):
""" calcul b=Ax """
n=x.size
b=np.zeros(n)
for i in range(n):
for j in range(n):
b[i] += A[i,j]*x[j]
return b
# version vectoriel 1
def produitv1(A,x):
n=x.size
b=np.zeros(n)
for i in range(n):
b[i] = np.sum(A[i,:]*x[:])
#b[i] = sum(A[i,:]*x[:]) # attention sum
return b
# version vectoriel 2
def produitv2(A,x):
B = A * x
b = np.sum(B[:,:],axis=1)
return b
N=1000
A = np.random.rand(N,N)
X = np.random.rand(N)
B0 = np.dot(A,X)
B1 = produit(A,X)
B2 = produitv1(A,X)
B3 = produitv2(A,X)
print("erreur=",np.linalg.norm(B1-B0),\
np.linalg.norm(B2-B0),\
np.linalg.norm(B3-B0))
erreur= 6.369507367631486e-12 1.5755474068481816e-12 1.5755474068481816e-12
%%timeit -n 2
B = produit(A,X)
462 ms ± 1 ms per loop (mean ± std. dev. of 7 runs, 2 loops each)
%%timeit -n 2
B = produitv1(A,X)
5.92 ms ± 339 μs per loop (mean ± std. dev. of 7 runs, 2 loops each)
%%timeit -n 2
B = produitv2(A,X)
1.66 ms ± 405 μs per loop (mean ± std. dev. of 7 runs, 2 loops each)
%%timeit -n 2
B = np.dot(A,X)
The slowest run took 11.73 times longer than the fastest. This could mean that an intermediate result is being cached.
464 μs ± 608 μs per loop (mean ± std. dev. of 7 runs, 2 loops each)
6.7. FIN#
np.__version__
'1.26.4'