Effective Numerical Computation in NumPy and SciPy 
Kimikazu Kato 
PyCon JP 2014 
September 13, 2014 
About Myself 
Kimikazu Kato 
Chief Scientist at Silver Egg Technology Co., Ltd.
Ph.D. in Computer Science
Background in Mathematics, Numerical Computation, Algorithms, etc.
<2 years of experience in Python
>10 years of experience in numerical computation
Now designing algorithms for recommendation systems and doing research
on machine learning and data analysis.
This talk... 
is about effective usage of NumPy/SciPy
is NOT an exhaustive introduction to their capabilities, but shows some case
studies based on my experience and interests
Table of Contents 
Introduction 
Basics about NumPy 
Broadcasting 
Indexing 
Sparse matrix 
Usage of scipy.sparse 
Internal structure 
Case studies 
Conclusion 
Numerical Computation 
Differential equations 
Simulations 
Signal processing 
Machine Learning 
etc... 
Why Numerical Computation in Python? 
Productivity 
Easy to write 
Easy to debug 
Connectivity with visualization tools 
Matplotlib 
IPython 
Connectivity with web systems
Many frameworks (Django, Pyramid, Flask, Bottle, etc.)
But Python is Very Slow! 
Code in C
#include <stdio.h>
int main() {
    int i; double s=0;
    for (i=1; i<=100000000; i++) s+=i;
    printf("%.0f\n",s);
}
Code in Python
s=0.
for i in xrange(1,100000001):
    s+=i
print s
Both programs compute the sum of the integers from 1 to 100,000,000.
Benchmark results in a certain environment:
C: 0.109 sec (compiled with the -O3 option)
Python: 8.657 sec
(about 80 times slower!!)
Better code 
import numpy as np 
a=np.arange(1,100000001) 
print a.sum() 
Now it takes 0.188 sec. (Measured with the Linux "time" command, load time
included.)
Still slower than C, but fast enough for a scripting language.
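The gap between the explicit loop and the NumPy version can be checked with a sketch like the one below. It is written for Python 3 (the slides use Python 2) and uses a smaller upper bound `n` so that it finishes quickly. The helper names `loop_sum` and `numpy_sum` are illustrative and not from the talk, and the timings will differ from machine to machine.

```python
import timeit
import numpy as np

def loop_sum(n):
    """Sum 1..n with an explicit Python loop (the slow style)."""
    s = 0.0
    for i in range(1, n + 1):
        s += i
    return s

def numpy_sum(n):
    """Sum 1..n by letting NumPy iterate in compiled code."""
    return np.arange(1, n + 1, dtype=np.int64).sum()

n = 1_000_000
t_loop = timeit.timeit(lambda: loop_sum(n), number=3)
t_numpy = timeit.timeit(lambda: numpy_sum(n), number=3)
print("loop :", t_loop)
print("numpy:", t_numpy)
```

Both functions return the same value; only the place where the iteration happens differs.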
Lessons 
Python is very slow when written badly
Translating C (or Java, C#, etc.) code directly into Python is often a bad idea.
Python-friendly rewriting sometimes results in a drastic performance
improvement
Basic rules for better performance 
Avoid for loops as far as possible
Use the libraries' capabilities instead
Forget about the cost of copying memory
A typical C programmer might care about it, but ...
Basic techniques for NumPy 
Broadcasting 
Indexing 
Broadcasting 
>>> import numpy as np 
>>> a=np.array([0,1,2]) 
>>> a*3 
array([0, 3, 6]) 
>>> b=np.array([1,4,9]) 
>>> np.sqrt(b) 
array([ 1., 2., 3.]) 
A function that is applied element by element when given an array is called
a universal function.
Broadcasting (2D) 
>>> import numpy as np 
>>> a=np.arange(9).reshape((3,3)) 
>>> b=np.array([1,2,3]) 
>>> a 
array([[0, 1, 2], 
[3, 4, 5], 
[6, 7, 8]]) 
>>> b 
array([1, 2, 3]) 
>>> a*b 
array([[ 0, 2, 6], 
[ 3, 8, 15], 
[ 6, 14, 24]]) 
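The 2D result follows from NumPy's rule that shapes are aligned from the trailing axis, so `b` of shape `(3,)` is treated as one row and reused for every row of `a`. The minimal sketch below (Python 3) contrasts this with scaling row by row; the reshaped column `b_col` is an illustrative addition, not part of the slides.

```python
import numpy as np

a = np.arange(9).reshape((3, 3))
b = np.array([1, 2, 3])

# Shape (3,3) * (3,): b is stretched along axis 0, so column j is scaled by b[j].
by_column = a * b

# Shape (3,3) * (3,1): b_col is stretched along axis 1, so row i is scaled by b[i].
b_col = b[:, np.newaxis]
by_row = a * b_col

print(by_column)
print(by_row)
```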
Indexing 
>>> import numpy as np 
>>> a=np.arange(10) 
>>> a 
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) 
>>> indices=np.arange(0,10,2) 
>>> indices 
array([0, 2, 4, 6, 8]) 
>>> a[indices]=0 
>>> a 
array([0, 1, 0, 3, 0, 5, 0, 7, 0, 9]) 
>>> b=np.arange(100,600,100) 
>>> b 
array([100, 200, 300, 400, 500]) 
>>> a[indices]=b 
>>> a 
array([100, 1, 200, 3, 300, 5, 400, 7, 500, 9]) 
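Integer index arrays are not the only option: a boolean mask of the same shape selects elements by condition. The sketch below (Python 3, with illustrative variable names) repeats the even-position update with a mask, then performs a conditional update that would otherwise need a loop with an `if`.

```python
import numpy as np

a = np.arange(10)

# Boolean mask: True where the value is even.
mask = (a % 2 == 0)

# Assign to all selected positions at once, as with an integer index array.
a[mask] = np.arange(100, 600, 100)
print(a)

# A mask built from a comparison gives a conditional update without a for loop.
a[a > 250] = -1
print(a)
```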
References
Gabriele Lanaro, "Python High Performance Programming," Packt
Publishing, 2013.
Stéfan van der Walt, "Numpy Medkit"
Sparse matrix 
Defined as a matrix in which most elements are zero
A compressed data structure is used to represent it, so that it is...
Space efficient
Time efficient
scipy.sparse 
The module scipy.sparse provides three main types for representing a sparse
matrix. (There are other types, but they are not covered here.)
lil_matrix : convenient for setting data; setting a[i,j] is fast
csr_matrix : convenient for computation; fast to retrieve a row
csc_matrix : convenient for computation; fast to retrieve a column
Usually, you set the data in a lil_matrix and then convert it to csc_matrix or
csr_matrix.
For csr_matrix and csc_matrix, computation between matrices of the same type is fast,
but you should avoid computation that mixes different types.
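One way to follow the advice about not mixing types is to check the `format` attribute and convert explicitly before combining matrices. This is a small sketch for Python 3 under that assumption; the variable names are illustrative.

```python
from scipy.sparse import lil_matrix

# Fill a lil_matrix element by element, then convert for computation.
m = lil_matrix((3, 3))
m[0, 1] = 1.0
m[2, 0] = 4.0

a = m.tocsr()
b = m.tocsc()
print(a.format, b.format)

# Bring both operands to the same format before adding them.
b_as_csr = b.tocsr()
total = a + b_as_csr
print(total.format)
print(total.toarray())
```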
Use case 
>>> from scipy.sparse import lil_matrix, csr_matrix 
>>> a=lil_matrix((3,3)) 
>>> a[0,0]=1.; a[0,2]=2. 
>>> a=a.tocsr() 
>>> print a 
(0, 0) 1.0 
(0, 2) 2.0 
>>> a.todense() 
matrix([[ 1., 0., 2.], 
[ 0., 0., 0.], 
[ 0., 0., 0.]]) 
>>> b=lil_matrix((3,3)) 
>>> b[1,1]=3.; b[2,0]=4.; b[2,2]=5. 
>>> b=b.tocsr() 
>>> b.todense() 
matrix([[ 0., 0., 0.], 
[ 0., 3., 0.], 
[ 4., 0., 5.]]) 
>>> c=a.dot(b) 
>>> c.todense() 
matrix([[ 8., 0., 10.], 
[ 0., 0., 0.], 
[ 0., 0., 0.]]) 
>>> d=a+b 
>>> d.todense() 
matrix([[ 1., 0., 2.], 
[ 0., 3., 0.], 
[ 4., 0., 5.]])
Internal structure: csr_matrix 
>>> from scipy.sparse import lil_matrix, csr_matrix 
>>> a=lil_matrix((3,3)) 
>>> a[0,1]=1.; a[0,2]=2.; a[1,2]=3.; a[2,0]=4.; a[2,1]=5. 
>>> b=a.tocsr() 
>>> b.todense() 
matrix([[ 0., 1., 2.], 
[ 0., 0., 3.], 
[ 4., 5., 0.]]) 
>>> b.indices 
array([1, 2, 2, 0, 1], dtype=int32) 
>>> b.data 
array([ 1., 2., 3., 4., 5.]) 
>>> b.indptr 
array([0, 2, 3, 5], dtype=int32) 
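The three arrays can be read as follows: row `i` owns the slice `indptr[i]:indptr[i+1]` of both `indices` (column numbers) and `data` (values). The sketch below rebuilds the dense matrix from them by hand. It is written for Python 3, and the helper name `csr_to_dense` is illustrative.

```python
import numpy as np
from scipy.sparse import csr_matrix

def csr_to_dense(data, indices, indptr, shape):
    """Expand CSR arrays into a dense array, one row at a time."""
    dense = np.zeros(shape)
    for i in range(shape[0]):
        start, end = indptr[i], indptr[i + 1]
        # Column numbers and values stored for row i.
        dense[i, indices[start:end]] = data[start:end]
    return dense

data = np.array([1., 2., 3., 4., 5.])
indices = np.array([1, 2, 2, 0, 1])
indptr = np.array([0, 2, 3, 5])

dense = csr_to_dense(data, indices, indptr, (3, 3))
print(dense)

# Cross-check against SciPy's own interpretation of the same arrays.
same = np.array_equal(
    dense, csr_matrix((data, indices, indptr), shape=(3, 3)).toarray())
print(same)
```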
Internal structure: csc_matrix 
>>> from scipy.sparse import lil_matrix, csr_matrix 
>>> a=lil_matrix((3,3)) 
>>> a[0,1]=1.; a[0,2]=2.; a[1,2]=3.; a[2,0]=4.; a[2,1]=5. 
>>> b=a.tocsc() 
>>> b.todense() 
matrix([[ 0., 1., 2.], 
[ 0., 0., 3.], 
[ 4., 5., 0.]]) 
>>> b.indices 
array([2, 0, 2, 0, 1], dtype=int32) 
>>> b.data 
array([ 4., 1., 5., 2., 3.]) 
>>> b.indptr 
array([0, 1, 3, 5], dtype=int32) 
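A compact way to remember the CSC layout is that it stores the columns of a matrix the way CSR stores the rows of its transpose. The sketch below (Python 3, illustrative variable names) checks this on the same 3x3 example by comparing the three internal arrays.

```python
import numpy as np
from scipy.sparse import csr_matrix

dense = np.array([[0., 1., 2.],
                  [0., 0., 3.],
                  [4., 5., 0.]])

# CSC representation of the matrix itself.
as_csc = csr_matrix(dense).tocsc()
# CSR representation of the transposed matrix.
transposed_csr = csr_matrix(dense.T)

print(as_csc.indices, as_csc.data, as_csc.indptr)
same = (np.array_equal(as_csc.indices, transposed_csr.indices)
        and np.array_equal(as_csc.data, transposed_csr.data)
        and np.array_equal(as_csc.indptr, transposed_csr.indptr))
print(same)
```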
Merit of knowing the internal structure 
Setting csr_matrix or csc_matrix with its internal structure is much faster than 
setting lil_matrix with indices. 
See the benchmark of setting
$$\begin{pmatrix} 2 & 1 & & \\ & 2 & \ddots & \\ & & \ddots & 1 \\ & & & 2 \end{pmatrix}$$
from scipy.sparse import lil_matrix, csr_matrix
import numpy as np
from timeit import timeit

def set_lil(n):
    a=lil_matrix((n,n))
    for i in xrange(n):
        a[i,i]=2.
        if i+1<n:
            a[i,i+1]=1.
    return a

def set_csr(n):
    data=np.empty(2*n-1)
    indices=np.empty(2*n-1,dtype=np.int32)
    indptr=np.empty(n+1,dtype=np.int32)
    # to be fair, a for loop is intentionally used
    # (using the indexing technique is faster)
    for i in xrange(n):
        indices[2*i]=i
        data[2*i]=2.
        if i<n-1:
            indices[2*i+1]=i+1
            data[2*i+1]=1.
        indptr[i]=2*i
    indptr[n]=2*n-1
    a=csr_matrix((data,indices,indptr),shape=(n,n))
    return a

print "lil:",timeit("set_lil(10000)",
    number=10,setup="from __main__ import set_lil")
print "csr:",timeit("set_csr(10000)",
    number=10,setup="from __main__ import set_csr")
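The comment in set_csr notes that an indexing-based construction would be faster still. One possible version is sketched below for Python 3. The name `set_csr_vectorized` is illustrative, and the layout assumes the same matrix as the benchmark (2 on the diagonal, 1 on the superdiagonal).

```python
import numpy as np
from scipy.sparse import csr_matrix

def set_csr_vectorized(n):
    """Build the n x n matrix with 2 on the diagonal and 1 above it, without a loop."""
    data = np.empty(2 * n - 1)
    indices = np.empty(2 * n - 1, dtype=np.int32)
    # Even slots hold the diagonal entry of each row.
    data[0::2] = 2.
    indices[0::2] = np.arange(n)
    # Odd slots hold the superdiagonal entry of rows 0..n-2.
    data[1::2] = 1.
    indices[1::2] = np.arange(1, n)
    # Row i starts at slot 2*i; the final entry marks the end of the last row.
    indptr = np.empty(n + 1, dtype=np.int32)
    indptr[:n] = 2 * np.arange(n)
    indptr[n] = 2 * n - 1
    return csr_matrix((data, indices, indptr), shape=(n, n))

m = set_csr_vectorized(4)
print(m.toarray())
```

The strided assignments replace the per-row loop, so the cost of building the arrays no longer depends on the Python interpreter stepping through each row.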
Result: 
lil: 11.6730761528 
csr: 0.0562081336975 
Remark 
When you deal with already sorted data, setting csr_matrix or csc_matrix 
with data, indices, indptr is much faster than setting lil_matrix 
But the code tends to be more complicated if you use the internal structure
of csr_matrix or csc_matrix
Case Studies 
Case 1: Norms 
If $v$ is dense:
norm=np.dot(v,v)
$$\|v\|^2 = v^{\mathsf T} v$$
Expressed as a product of matrices. (dot means matrix product, but you don't
have to take the transpose explicitly.)
When $v$ is sparse, suppose that $v$ is expressed as a $1 \times n$ matrix:
norm=v.multiply(v).sum()
(multiply() is the element-wise product)
This is because taking the transpose of a sparse matrix changes its type.
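Both expressions give the squared Euclidean norm, which can be checked on a small vector. The sketch below is for Python 3 and assumes the sparse vector is stored as a one-row `csr_matrix`; the variable names are illustrative.

```python
import numpy as np
from scipy.sparse import csr_matrix

v_dense = np.array([0., 3., 0., 4.])
# A 1-D array is stored as a 1 x 4 sparse matrix.
v_sparse = csr_matrix(v_dense)

# Dense case: inner product of the vector with itself.
sq_norm_dense = np.dot(v_dense, v_dense)
# Sparse case: element-wise product, then sum over the stored entries.
sq_norm_sparse = v_sparse.multiply(v_sparse).sum()

print(sq_norm_dense, sq_norm_sparse)
```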
Frobenius norm:
norm=a.multiply(a).sum()
$$\|A\|_{\mathrm{Fro}}^2 = \sum_{i,j} a_{ij}^2$$
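The same element-wise trick can be compared against a dense computation of the squared Frobenius norm. In the sketch below (Python 3), the variant that squares the `data` array directly is an illustrative alternative that relies on the internal structure rather than something shown on the slide.

```python
import numpy as np
from scipy.sparse import csr_matrix

a_dense = np.array([[1., 0., 2.],
                    [0., 0., 0.],
                    [0., 3., 0.]])
a = csr_matrix(a_dense)

# Sparse element-wise product followed by a sum, as on the slide.
sq_fro_sparse = a.multiply(a).sum()
# Alternative: only the stored values contribute, so square them directly.
sq_fro_data = (a.data ** 2).sum()
# Dense reference value.
sq_fro_dense = np.linalg.norm(a_dense, 'fro') ** 2

print(sq_fro_sparse, sq_fro_data, sq_fro_dense)
```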
Case 2: Applying a function to all of the elements of a 
sparse matrix 
A universal function can be applied to a dense matrix:
>>> import numpy as np
>>> a=np.arange(9).reshape((3,3))
>>> a
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
>>> np.tanh(a)
array([[ 0. , 0.76159416, 0.96402758],
[ 0.99505475, 0.9993293 , 0.9999092 ],
[ 0.99998771, 0.99999834, 0.99999977]])
This is convenient and fast.
However, we cannot do the same thing for a sparse matrix.
>>> from scipy.sparse import lil_matrix
>>> a=lil_matrix((3,3))
>>> a[0,0]=1.
>>> a[1,0]=2.
>>> b=a.tocsr()
>>> np.tanh(b)
<3x3 sparse matrix of type '<type 'numpy.float64'>'
with 2 stored elements in Compressed Sparse Row format>
This is because, for an arbitrary function, the result of applying it to a sparse matrix is
not necessarily sparse.
However, if a universal function $f$ satisfies $f(0) = 0$, the sparsity is
preserved.
Then, how can we compute it?
Use the internal structure!! 
The positions of the non-zero elements are not changed by applying
the function.
Keep indices and indptr, and just change data.
Solution (where a is a csr_matrix):
b = csr_matrix((np.tanh(a.data), a.indices, a.indptr), shape=a.shape)
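The one-line solution can be wrapped in a small helper and checked against the dense result. This is a sketch for Python 3; the name `apply_zero_preserving` is illustrative, and the helper is only valid under the stated assumption that the function maps 0 to 0.

```python
import numpy as np
from scipy.sparse import csr_matrix

def apply_zero_preserving(func, m):
    """Apply func to the stored values of a csr_matrix, assuming func(0) == 0."""
    # indices and indptr are reused unchanged; only data is transformed.
    return csr_matrix((func(m.data), m.indices, m.indptr), shape=m.shape)

a = csr_matrix(np.array([[1., 0., 0.],
                         [2., 0., 0.],
                         [0., 0., 0.]]))
b = apply_zero_preserving(np.tanh, a)

print(b.toarray())
matches_dense = np.allclose(b.toarray(), np.tanh(a.toarray()))
print(matches_dense)
```

A function such as `np.cos`, which maps 0 to 1, would give a wrong answer here because the implicit zeros are never touched.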
Case 3: Formula which appears in a paper 
In the algorithm for a recommendation system [1], the following formula
appears:
$$A^{\mathsf T} D A$$
where $A$ is a dense $n \times f$ matrix, and $D$ is a diagonal matrix defined from a
given array $d = (d_1, \dots, d_n)$ as:
$$D = \begin{pmatrix} d_1 & & & \\ & d_2 & & \\ & & \ddots & \\ & & & d_n \end{pmatrix}$$
Here, $n$ (which corresponds to the number of users or items) is big and $f$
(which means the number of latent factors) is small.
[1] Hu et al. Collaborative Filtering for Implicit Feedback Datasets, ICDM,
2008.
Solution 1: 
There is a special class dia_matrix to deal with a diagonal sparse matrix.
import scipy.sparse as sparse
import numpy as np
def f(a,d):
    """a: 2d array of shape (n,f), d: 1d array of length n"""
    dd=sparse.diags([d],[0])
    return np.dot(a.T,dd.dot(a))
Solution 2: 
Pack csr_matrix with data,indices,indptr
data=d
indices=[0,1,...,n-1]
indptr=[0,1,...,n]
def g(a,d):
    n,f=a.shape
    data=d
    indices=np.arange(n)
    indptr=np.arange(n+1)
    dd=sparse.csr_matrix((data,indices,indptr),shape=(n,n))
    return np.dot(a.T,dd.dot(a))
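Solution 3:
Multiplying by a diagonal matrix from the right scales each column:
$$(A^{\mathsf T} D)_{ik} = a_{ki}\, d_k, \qquad A^{\mathsf T} D = \begin{pmatrix} a_{11} d_1 & \cdots & a_{n1} d_n \\ \vdots & & \vdots \\ a_{1f} d_1 & \cdots & a_{nf} d_n \end{pmatrix}$$
This is equivalent to the broadcasting!
def h(a,d):
    return np.dot(a.T*d,a)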
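Before comparing speed, it is worth confirming that the three solutions agree. The sketch below restates them for Python 3 and checks each against a straightforward dense computation with `np.diag`. The small problem size and the seeded `RandomState` are illustrative choices, not the benchmark settings.

```python
import numpy as np
import scipy.sparse as sparse

def f(a, d):
    """Solution 1: diagonal matrix built with sparse.diags."""
    dd = sparse.diags([d], [0])
    return np.dot(a.T, dd.dot(a))

def g(a, d):
    """Solution 2: diagonal matrix packed directly as CSR arrays."""
    n = a.shape[0]
    dd = sparse.csr_matrix((d, np.arange(n), np.arange(n + 1)), shape=(n, n))
    return np.dot(a.T, dd.dot(a))

def h(a, d):
    """Solution 3: broadcasting scales column k of a.T by d[k]."""
    return np.dot(a.T * d, a)

rng = np.random.RandomState(0)
a = rng.random_sample((1000, 10))
d = rng.random_sample(1000)

# Dense reference: form D explicitly (only feasible for small n).
reference = a.T.dot(np.diag(d)).dot(a)
all_equal = all(np.allclose(func(a, d), reference) for func in (f, g, h))
print(all_equal)
```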
Benchmark 
def datagen(n,f):
    np.random.seed(0)
    a=np.random.random((n,f))
    d=np.random.random(n)
    return a,d
from timeit import timeit
print "dia_matrix :",timeit("f(a,d)",number=10,
    setup="from __main__ import f,datagen; a,d=datagen(1000000,10)")
print "csr_matrix :",timeit("g(a,d)",number=10,
    setup="from __main__ import g,datagen; a,d=datagen(1000000,10)")
print "broadcasting :",timeit("h(a,d)",number=10,
    setup="from __main__ import h,datagen; a,d=datagen(1000000,10)")
Result:
dia_matrix : 1.60458707809
csr_matrix : 1.32580018044
broadcasting : 1.30032682419
Conclusion 
Try not to use for loops; use the libraries' capabilities instead.
Knowledge of the internal structure of sparse matrices is useful for
extracting further performance.
Mathematical derivation is important. The key is to find a mathematically
equivalent and Python-friendly formula.
Computational speed does not necessarily matter. Finding better code in
a short time is valuable; otherwise, you shouldn't pursue it too far.
Acknowledgment 
I would like to thank 
(@shima__shima) 
who gave me useful advice on Twitter.
35 / 35

More Related Content

What's hot

Fine tuning large LMs
Fine tuning large LMsFine tuning large LMs
Fine tuning large LMs
SylvainGugger
 
Pytorch
PytorchPytorch
Pytorch
ehsan tr
 
Expectation maximization
Expectation maximizationExpectation maximization
Expectation maximization
LALAOUIBENCHERIFSIDI
 
NumPy/SciPy Statistics
NumPy/SciPy StatisticsNumPy/SciPy Statistics
NumPy/SciPy Statistics
Enthought, Inc.
 
Logiques de descriptions.pptx
Logiques de descriptions.pptxLogiques de descriptions.pptx
Logiques de descriptions.pptx
mohmll
 
Introduction to particle swarm optimization
Introduction to particle swarm optimizationIntroduction to particle swarm optimization
Introduction to particle swarm optimization
Mrinmoy Majumder
 
NLP State of the Art | BERT
NLP State of the Art | BERTNLP State of the Art | BERT
NLP State of the Art | BERT
shaurya uppal
 
PPT4: Frameworks & Libraries of Machine Learning & Deep Learning
PPT4: Frameworks & Libraries of Machine Learning & Deep Learning PPT4: Frameworks & Libraries of Machine Learning & Deep Learning
PPT4: Frameworks & Libraries of Machine Learning & Deep Learning
akira-ai
 
NAIVE BAYES CLASSIFIER
NAIVE BAYES CLASSIFIERNAIVE BAYES CLASSIFIER
NAIVE BAYES CLASSIFIER
Knoldus Inc.
 
BERT
BERTBERT
Scikit Learn intro
Scikit Learn introScikit Learn intro
Scikit Learn intro
9xdot
 
TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...
TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...
TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...
Simplilearn
 
Deep Generative Models
Deep Generative Models Deep Generative Models
Deep Generative Models
Chia-Wen Cheng
 
INTRODUCTION TO ALGORITHMS Third Edition
INTRODUCTION TO ALGORITHMS Third EditionINTRODUCTION TO ALGORITHMS Third Edition
INTRODUCTION TO ALGORITHMS Third Edition
PHI Learning Pvt. Ltd.
 
Python Sequence | Python Lists | Python Sets & Dictionary | Python Strings | ...
Python Sequence | Python Lists | Python Sets & Dictionary | Python Strings | ...Python Sequence | Python Lists | Python Sets & Dictionary | Python Strings | ...
Python Sequence | Python Lists | Python Sets & Dictionary | Python Strings | ...
Edureka!
 
How to fine-tune and develop your own large language model.pptx
How to fine-tune and develop your own large language model.pptxHow to fine-tune and develop your own large language model.pptx
How to fine-tune and develop your own large language model.pptx
Knoldus Inc.
 
NumPy
NumPyNumPy
Introduction to numpy
Introduction to numpyIntroduction to numpy
Introduction to numpy
Gaurav Aggarwal
 
NUMPY
NUMPY NUMPY
Data Analysis in Python-NumPy
Data Analysis in Python-NumPyData Analysis in Python-NumPy
Data Analysis in Python-NumPy
Devashish Kumar
 

What's hot (20)

Fine tuning large LMs
Fine tuning large LMsFine tuning large LMs
Fine tuning large LMs
 
Pytorch
PytorchPytorch
Pytorch
 
Expectation maximization
Expectation maximizationExpectation maximization
Expectation maximization
 
NumPy/SciPy Statistics
NumPy/SciPy StatisticsNumPy/SciPy Statistics
NumPy/SciPy Statistics
 
Logiques de descriptions.pptx
Logiques de descriptions.pptxLogiques de descriptions.pptx
Logiques de descriptions.pptx
 
Introduction to particle swarm optimization
Introduction to particle swarm optimizationIntroduction to particle swarm optimization
Introduction to particle swarm optimization
 
NLP State of the Art | BERT
NLP State of the Art | BERTNLP State of the Art | BERT
NLP State of the Art | BERT
 
PPT4: Frameworks & Libraries of Machine Learning & Deep Learning
PPT4: Frameworks & Libraries of Machine Learning & Deep Learning PPT4: Frameworks & Libraries of Machine Learning & Deep Learning
PPT4: Frameworks & Libraries of Machine Learning & Deep Learning
 
NAIVE BAYES CLASSIFIER
NAIVE BAYES CLASSIFIERNAIVE BAYES CLASSIFIER
NAIVE BAYES CLASSIFIER
 
BERT
BERTBERT
BERT
 
Scikit Learn intro
Scikit Learn introScikit Learn intro
Scikit Learn intro
 
TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...
TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...
TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...
 
Deep Generative Models
Deep Generative Models Deep Generative Models
Deep Generative Models
 
INTRODUCTION TO ALGORITHMS Third Edition
INTRODUCTION TO ALGORITHMS Third EditionINTRODUCTION TO ALGORITHMS Third Edition
INTRODUCTION TO ALGORITHMS Third Edition
 
Python Sequence | Python Lists | Python Sets & Dictionary | Python Strings | ...
Python Sequence | Python Lists | Python Sets & Dictionary | Python Strings | ...Python Sequence | Python Lists | Python Sets & Dictionary | Python Strings | ...
Python Sequence | Python Lists | Python Sets & Dictionary | Python Strings | ...
 
How to fine-tune and develop your own large language model.pptx
How to fine-tune and develop your own large language model.pptxHow to fine-tune and develop your own large language model.pptx
How to fine-tune and develop your own large language model.pptx
 
NumPy
NumPyNumPy
NumPy
 
Introduction to numpy
Introduction to numpyIntroduction to numpy
Introduction to numpy
 
NUMPY
NUMPY NUMPY
NUMPY
 
Data Analysis in Python-NumPy
Data Analysis in Python-NumPyData Analysis in Python-NumPy
Data Analysis in Python-NumPy
 

Viewers also liked

Zuang-FPSGD
Zuang-FPSGDZuang-FPSGD
Zuang-FPSGD
Kimikazu Kato
 
A Safe Rule for Sparse Logistic Regression
A Safe Rule for Sparse Logistic RegressionA Safe Rule for Sparse Logistic Regression
A Safe Rule for Sparse Logistic Regression
Kimikazu Kato
 
Recommendation System --Theory and Practice
Recommendation System --Theory and PracticeRecommendation System --Theory and Practice
Recommendation System --Theory and Practice
Kimikazu Kato
 
【論文紹介】Approximate Bayesian Image Interpretation Using Generative Probabilisti...
【論文紹介】Approximate Bayesian Image Interpretation Using Generative Probabilisti...【論文紹介】Approximate Bayesian Image Interpretation Using Generative Probabilisti...
【論文紹介】Approximate Bayesian Image Interpretation Using Generative Probabilisti...
Kimikazu Kato
 
特定の不快感を与えるツイートの分類と自動生成について
特定の不快感を与えるツイートの分類と自動生成について特定の不快感を与えるツイートの分類と自動生成について
特定の不快感を与えるツイートの分類と自動生成について
Kimikazu Kato
 
About Our Recommender System
About Our Recommender SystemAbout Our Recommender System
About Our Recommender System
Kimikazu Kato
 
養成読本と私
養成読本と私養成読本と私
養成読本と私
Kimikazu Kato
 
Googleにおける機械学習の活用とクラウドサービス
Googleにおける機械学習の活用とクラウドサービスGoogleにおける機械学習の活用とクラウドサービス
Googleにおける機械学習の活用とクラウドサービス
Etsuji Nakai
 
どの言語でつぶやかれたのか、機械が知る方法 #WebDBf2013
どの言語でつぶやかれたのか、機械が知る方法 #WebDBf2013どの言語でつぶやかれたのか、機械が知る方法 #WebDBf2013
どの言語でつぶやかれたのか、機械が知る方法 #WebDBf2013Shuyo Nakatani
 
「TensorFlow Tutorialの数学的背景」 クイックツアー(パート1)
「TensorFlow Tutorialの数学的背景」 クイックツアー(パート1)「TensorFlow Tutorialの数学的背景」 クイックツアー(パート1)
「TensorFlow Tutorialの数学的背景」 クイックツアー(パート1)
Etsuji Nakai
 
Googleのインフラ技術に見る基盤標準化とDevOpsの真実
Googleのインフラ技術に見る基盤標準化とDevOpsの真実Googleのインフラ技術に見る基盤標準化とDevOpsの真実
Googleのインフラ技術に見る基盤標準化とDevOpsの真実
Etsuji Nakai
 
Life with jupyter
Life with jupyterLife with jupyter
Life with jupyter
Etsuji Nakai
 
Numpy scipy matplotlibの紹介
Numpy scipy matplotlibの紹介Numpy scipy matplotlibの紹介
Numpy scipy matplotlibの紹介
Tatsuro Yasukawa
 
Introducton to Convolutional Nerural Network with TensorFlow
Introducton to Convolutional Nerural Network with TensorFlowIntroducton to Convolutional Nerural Network with TensorFlow
Introducton to Convolutional Nerural Network with TensorFlow
Etsuji Nakai
 
数式をnumpyに落としこむコツ
数式をnumpyに落としこむコツ数式をnumpyに落としこむコツ
数式をnumpyに落としこむコツ
Shuyo Nakatani
 
NumPy闇入門
NumPy闇入門NumPy闇入門
NumPy闇入門
Ryosuke Okuta
 
Spannerに関する技術メモ
Spannerに関する技術メモSpannerに関する技術メモ
Spannerに関する技術メモ
Etsuji Nakai
 
言語処理するのに Python でいいの? #PyDataTokyo
言語処理するのに Python でいいの? #PyDataTokyo言語処理するのに Python でいいの? #PyDataTokyo
言語処理するのに Python でいいの? #PyDataTokyoShuyo Nakatani
 
Using Kubernetes on Google Container Engine
Using Kubernetes on Google Container EngineUsing Kubernetes on Google Container Engine
Using Kubernetes on Google Container Engine
Etsuji Nakai
 
数式を綺麗にプログラミングするコツ #spro2013
数式を綺麗にプログラミングするコツ #spro2013数式を綺麗にプログラミングするコツ #spro2013
数式を綺麗にプログラミングするコツ #spro2013Shuyo Nakatani
 

Viewers also liked (20)

Zuang-FPSGD
Zuang-FPSGDZuang-FPSGD
Zuang-FPSGD
 
A Safe Rule for Sparse Logistic Regression
A Safe Rule for Sparse Logistic RegressionA Safe Rule for Sparse Logistic Regression
A Safe Rule for Sparse Logistic Regression
 
Recommendation System --Theory and Practice
Recommendation System --Theory and PracticeRecommendation System --Theory and Practice
Recommendation System --Theory and Practice
 
【論文紹介】Approximate Bayesian Image Interpretation Using Generative Probabilisti...
【論文紹介】Approximate Bayesian Image Interpretation Using Generative Probabilisti...【論文紹介】Approximate Bayesian Image Interpretation Using Generative Probabilisti...
【論文紹介】Approximate Bayesian Image Interpretation Using Generative Probabilisti...
 
特定の不快感を与えるツイートの分類と自動生成について
特定の不快感を与えるツイートの分類と自動生成について特定の不快感を与えるツイートの分類と自動生成について
特定の不快感を与えるツイートの分類と自動生成について
 
About Our Recommender System
About Our Recommender SystemAbout Our Recommender System
About Our Recommender System
 
養成読本と私
養成読本と私養成読本と私
養成読本と私
 
Googleにおける機械学習の活用とクラウドサービス
Googleにおける機械学習の活用とクラウドサービスGoogleにおける機械学習の活用とクラウドサービス
Googleにおける機械学習の活用とクラウドサービス
 
どの言語でつぶやかれたのか、機械が知る方法 #WebDBf2013
どの言語でつぶやかれたのか、機械が知る方法 #WebDBf2013どの言語でつぶやかれたのか、機械が知る方法 #WebDBf2013
どの言語でつぶやかれたのか、機械が知る方法 #WebDBf2013
 
「TensorFlow Tutorialの数学的背景」 クイックツアー(パート1)
「TensorFlow Tutorialの数学的背景」 クイックツアー(パート1)「TensorFlow Tutorialの数学的背景」 クイックツアー(パート1)
「TensorFlow Tutorialの数学的背景」 クイックツアー(パート1)
 
Googleのインフラ技術に見る基盤標準化とDevOpsの真実
Googleのインフラ技術に見る基盤標準化とDevOpsの真実Googleのインフラ技術に見る基盤標準化とDevOpsの真実
Googleのインフラ技術に見る基盤標準化とDevOpsの真実
 
Life with jupyter
Life with jupyterLife with jupyter
Life with jupyter
 
Numpy scipy matplotlibの紹介
Numpy scipy matplotlibの紹介Numpy scipy matplotlibの紹介
Numpy scipy matplotlibの紹介
 
Introducton to Convolutional Nerural Network with TensorFlow
Introducton to Convolutional Nerural Network with TensorFlowIntroducton to Convolutional Nerural Network with TensorFlow
Introducton to Convolutional Nerural Network with TensorFlow
 
数式をnumpyに落としこむコツ
数式をnumpyに落としこむコツ数式をnumpyに落としこむコツ
数式をnumpyに落としこむコツ
 
NumPy闇入門
NumPy闇入門NumPy闇入門
NumPy闇入門
 
Spannerに関する技術メモ
Spannerに関する技術メモSpannerに関する技術メモ
Spannerに関する技術メモ
 
言語処理するのに Python でいいの? #PyDataTokyo
言語処理するのに Python でいいの? #PyDataTokyo言語処理するのに Python でいいの? #PyDataTokyo
言語処理するのに Python でいいの? #PyDataTokyo
 
Using Kubernetes on Google Container Engine
Using Kubernetes on Google Container EngineUsing Kubernetes on Google Container Engine
Using Kubernetes on Google Container Engine
 
数式を綺麗にプログラミングするコツ #spro2013
数式を綺麗にプログラミングするコツ #spro2013数式を綺麗にプログラミングするコツ #spro2013
数式を綺麗にプログラミングするコツ #spro2013
 

Similar to Effective Numerical Computation in NumPy and SciPy

Numerical tour in the Python eco-system: Python, NumPy, scikit-learn
Numerical tour in the Python eco-system: Python, NumPy, scikit-learnNumerical tour in the Python eco-system: Python, NumPy, scikit-learn
Numerical tour in the Python eco-system: Python, NumPy, scikit-learn
Arnaud Joly
 
Introduction to NumPy for Machine Learning Programmers
Introduction to NumPy for Machine Learning ProgrammersIntroduction to NumPy for Machine Learning Programmers
Introduction to NumPy for Machine Learning Programmers
Kimikazu Kato
 
Profiling and optimization
Profiling and optimizationProfiling and optimization
Profiling and optimization
g3_nittala
 
CE344L-200365-Lab2.pdf
CE344L-200365-Lab2.pdfCE344L-200365-Lab2.pdf
CE344L-200365-Lab2.pdf
UmarMustafa13
 
Python for R developers and data scientists
Python for R developers and data scientistsPython for R developers and data scientists
Python for R developers and data scientists
Lambda Tree
 
Advance data structure & algorithm
Advance data structure & algorithmAdvance data structure & algorithm
Advance data structure & algorithm
K Hari Shankar
 
Python for Scientific Computing -- Ricardo Cruz
Python for Scientific Computing -- Ricardo CruzPython for Scientific Computing -- Ricardo Cruz
Python for Scientific Computing -- Ricardo Cruz
rpmcruz
 
Ds lab manual by s.k.rath
Ds lab manual by s.k.rathDs lab manual by s.k.rath
Ds lab manual by s.k.rath
SANTOSH RATH
 
Numpy python cheat_sheet
Numpy python cheat_sheetNumpy python cheat_sheet
Numpy python cheat_sheet
Nishant Upadhyay
 
Python_cheatsheet_numpy.pdf
Python_cheatsheet_numpy.pdfPython_cheatsheet_numpy.pdf
Python_cheatsheet_numpy.pdf
AnonymousUser67
 
Numpy python cheat_sheet
Numpy python cheat_sheetNumpy python cheat_sheet
Numpy python cheat_sheet
Zahid Hasan
 
Time Series Analysis:Basic Stochastic Signal Recovery
Time Series Analysis:Basic Stochastic Signal RecoveryTime Series Analysis:Basic Stochastic Signal Recovery
Time Series Analysis:Basic Stochastic Signal Recovery
Daniel Cuneo
 
07. Arrays
07. Arrays07. Arrays
07. Arrays
Intro C# Book
 
Pythran: Static compiler for high performance by Mehdi Amini PyData SV 2014
Pythran: Static compiler for high performance by Mehdi Amini PyData SV 2014Pythran: Static compiler for high performance by Mehdi Amini PyData SV 2014
Pythran: Static compiler for high performance by Mehdi Amini PyData SV 2014
PyData
 
Introduction to NumPy (PyData SV 2013)
Introduction to NumPy (PyData SV 2013)Introduction to NumPy (PyData SV 2013)
Introduction to NumPy (PyData SV 2013)
PyData
 
Introduction to NumPy
Introduction to NumPyIntroduction to NumPy
Introduction to NumPy
Huy Nguyen
 
Writing Faster Python 3
Writing Faster Python 3Writing Faster Python 3
Writing Faster Python 3
Sebastian Witowski
 
From NumPy to PyTorch
From NumPy to PyTorchFrom NumPy to PyTorch
From NumPy to PyTorch
Mike Ruberry
 
Learn Matlab
Learn MatlabLearn Matlab
Learn Matlab
Abd El Kareem Ahmed
 
Clustering com numpy e cython
Clustering com numpy e cythonClustering com numpy e cython
Clustering com numpy e cython
Anderson Dantas
 

Similar to Effective Numerical Computation in NumPy and SciPy (20)

Numerical tour in the Python eco-system: Python, NumPy, scikit-learn
Numerical tour in the Python eco-system: Python, NumPy, scikit-learnNumerical tour in the Python eco-system: Python, NumPy, scikit-learn
Numerical tour in the Python eco-system: Python, NumPy, scikit-learn
 
Introduction to NumPy for Machine Learning Programmers
Introduction to NumPy for Machine Learning ProgrammersIntroduction to NumPy for Machine Learning Programmers
Introduction to NumPy for Machine Learning Programmers
 
Profiling and optimization
Profiling and optimizationProfiling and optimization
Profiling and optimization
 
CE344L-200365-Lab2.pdf
CE344L-200365-Lab2.pdfCE344L-200365-Lab2.pdf
CE344L-200365-Lab2.pdf
 
Python for R developers and data scientists
Python for R developers and data scientistsPython for R developers and data scientists
Python for R developers and data scientists
 
Advance data structure & algorithm
Advance data structure & algorithmAdvance data structure & algorithm
Advance data structure & algorithm
 
Python for Scientific Computing -- Ricardo Cruz
Python for Scientific Computing -- Ricardo CruzPython for Scientific Computing -- Ricardo Cruz
Python for Scientific Computing -- Ricardo Cruz
 
Ds lab manual by s.k.rath
Ds lab manual by s.k.rathDs lab manual by s.k.rath
Ds lab manual by s.k.rath
 
Numpy python cheat_sheet
Numpy python cheat_sheetNumpy python cheat_sheet
Numpy python cheat_sheet
 
Python_cheatsheet_numpy.pdf
Python_cheatsheet_numpy.pdfPython_cheatsheet_numpy.pdf
Python_cheatsheet_numpy.pdf
 
Numpy python cheat_sheet
Numpy python cheat_sheetNumpy python cheat_sheet
Numpy python cheat_sheet
 
Time Series Analysis:Basic Stochastic Signal Recovery
Time Series Analysis:Basic Stochastic Signal RecoveryTime Series Analysis:Basic Stochastic Signal Recovery
Time Series Analysis:Basic Stochastic Signal Recovery
 
07. Arrays
07. Arrays07. Arrays
07. Arrays
 
Pythran: Static compiler for high performance by Mehdi Amini PyData SV 2014
Pythran: Static compiler for high performance by Mehdi Amini PyData SV 2014Pythran: Static compiler for high performance by Mehdi Amini PyData SV 2014
Pythran: Static compiler for high performance by Mehdi Amini PyData SV 2014
 
Introduction to NumPy (PyData SV 2013)
Introduction to NumPy (PyData SV 2013)Introduction to NumPy (PyData SV 2013)
Introduction to NumPy (PyData SV 2013)
 
Introduction to NumPy
Introduction to NumPyIntroduction to NumPy
Introduction to NumPy
 
Writing Faster Python 3
Writing Faster Python 3Writing Faster Python 3
Writing Faster Python 3
 
From NumPy to PyTorch
From NumPy to PyTorchFrom NumPy to PyTorch
From NumPy to PyTorch
 
Learn Matlab
Learn MatlabLearn Matlab
Learn Matlab
 
Clustering com numpy e cython
Clustering com numpy e cythonClustering com numpy e cython
Clustering com numpy e cython
 

More from Kimikazu Kato

Tokyo webmining 2017-10-28
Tokyo webmining 2017-10-28Tokyo webmining 2017-10-28
Tokyo webmining 2017-10-28
Kimikazu Kato
 
機械学習ゴリゴリ派のための数学とPython
機械学習ゴリゴリ派のための数学とPython機械学習ゴリゴリ派のための数学とPython
機械学習ゴリゴリ派のための数学とPython
Kimikazu Kato
 
Pythonを使った機械学習の学習
Pythonを使った機械学習の学習Pythonを使った機械学習の学習
Pythonを使った機械学習の学習
Kimikazu Kato
 
Fast and Probvably Seedings for k-Means
Fast and Probvably Seedings for k-MeansFast and Probvably Seedings for k-Means
Fast and Probvably Seedings for k-Means
Kimikazu Kato
 
Pythonで機械学習入門以前
Pythonで機械学習入門以前Pythonで機械学習入門以前
Pythonで機械学習入門以前
Kimikazu Kato
 
Pythonによる機械学習
Pythonによる機械学習Pythonによる機械学習
Pythonによる機械学習
Kimikazu Kato
 
Introduction to behavior based recommendation system
Introduction to behavior based recommendation systemIntroduction to behavior based recommendation system
Introduction to behavior based recommendation system
Kimikazu Kato
 
Pythonによる機械学習の最前線
Pythonによる機械学習の最前線Pythonによる機械学習の最前線
Pythonによる機械学習の最前線
Kimikazu Kato
 
Sparse pca via bipartite matching
Sparse pca via bipartite matchingSparse pca via bipartite matching
Sparse pca via bipartite matching
Kimikazu Kato
 
正しいプログラミング言語の覚え方
正しいプログラミング言語の覚え方正しいプログラミング言語の覚え方
正しいプログラミング言語の覚え方
Kimikazu Kato
 
Sapporo20140709
Sapporo20140709Sapporo20140709
Sapporo20140709
Kimikazu Kato
 
ネット通販向けレコメンドシステム提供サービスについて
ネット通販向けレコメンドシステム提供サービスについてネット通販向けレコメンドシステム提供サービスについて
ネット通販向けレコメンドシステム提供サービスについて
Kimikazu Kato
 
関東GPGPU勉強会資料
関東GPGPU勉強会資料関東GPGPU勉強会資料
関東GPGPU勉強会資料
Kimikazu Kato
 
2012-03-08 MSS研究会
2012-03-08 MSS研究会2012-03-08 MSS研究会
2012-03-08 MSS研究会
Kimikazu Kato
 
純粋関数型アルゴリズム入門
純粋関数型アルゴリズム入門純粋関数型アルゴリズム入門
純粋関数型アルゴリズム入門
Kimikazu Kato
 

More from Kimikazu Kato (15)

Tokyo webmining 2017-10-28
Tokyo webmining 2017-10-28Tokyo webmining 2017-10-28
Tokyo webmining 2017-10-28
 
機械学習ゴリゴリ派のための数学とPython
機械学習ゴリゴリ派のための数学とPython機械学習ゴリゴリ派のための数学とPython
機械学習ゴリゴリ派のための数学とPython
 
Pythonを使った機械学習の学習
Pythonを使った機械学習の学習Pythonを使った機械学習の学習
Pythonを使った機械学習の学習
 
Fast and Probvably Seedings for k-Means
Fast and Probvably Seedings for k-MeansFast and Probvably Seedings for k-Means
Fast and Probvably Seedings for k-Means
 
Pythonで機械学習入門以前
Pythonで機械学習入門以前Pythonで機械学習入門以前
Pythonで機械学習入門以前
 
Pythonによる機械学習
Pythonによる機械学習Pythonによる機械学習
Pythonによる機械学習
 
Introduction to behavior based recommendation system
Introduction to behavior based recommendation systemIntroduction to behavior based recommendation system
Introduction to behavior based recommendation system
 
Pythonによる機械学習の最前線
Pythonによる機械学習の最前線Pythonによる機械学習の最前線
Pythonによる機械学習の最前線
 
Sparse pca via bipartite matching
Sparse pca via bipartite matchingSparse pca via bipartite matching
Sparse pca via bipartite matching
 
正しいプログラミング言語の覚え方
正しいプログラミング言語の覚え方正しいプログラミング言語の覚え方
正しいプログラミング言語の覚え方
 
Sapporo20140709
Sapporo20140709Sapporo20140709
Sapporo20140709
 
ネット通販向けレコメンドシステム提供サービスについて
ネット通販向けレコメンドシステム提供サービスについてネット通販向けレコメンドシステム提供サービスについて
ネット通販向けレコメンドシステム提供サービスについて
 
関東GPGPU勉強会資料
関東GPGPU勉強会資料関東GPGPU勉強会資料
関東GPGPU勉強会資料
 
2012-03-08 MSS研究会
2012-03-08 MSS研究会2012-03-08 MSS研究会
2012-03-08 MSS研究会
 
純粋関数型アルゴリズム入門
純粋関数型アルゴリズム入門純粋関数型アルゴリズム入門
純粋関数型アルゴリズム入門
 



Effective Numerical Computation in NumPy and SciPy

  • 1. Effective Numerical Computation in NumPy and SciPy Kimikazu Kato PyCon JP 2014 September 13, 2014 1 / 35
  • 2. About Myself Kimikazu Kato Chief Scientist at Silver Egg Technology Co., Ltd. Ph.D. in Computer Science Background in Mathematics, Numerical Computation, Algorithms, etc. <2 years of experience in Python >10 years of experience in numerical computation Now designing algorithms for recommendation systems, and doing research on machine learning and data analysis. 2 / 35
  • 3. This talk... is about effective usage of NumPy/SciPy is NOT an exhaustive introduction to their capabilities, but shows some case studies based on my experience and interests 3 / 35
  • 4. Table of Contents Introduction Basics about NumPy Broadcasting Indexing Sparse matrix Usage of scipy.sparse Internal structure Case studies Conclusion 4 / 35
  • 5. Numerical Computation Differential equations Simulations Signal processing Machine Learning etc... Why Numerical Computation in Python? Productivity Easy to write Easy to debug Connectivity with visualization tools Matplotlib IPython Connectivity with web system Many frameworks (Django, Pyramid, Flask, Bottle, etc.) 5 / 35
  • 6. But Python is Very Slow!
      Code in C
        #include <stdio.h>
        int main() {
          int i; double s=0;
          for (i=1; i<=100000000; i++) s+=i;
          printf("%.0f\n",s);
        }
      Code in Python
        s=0.
        for i in xrange(1,100000001):
            s+=i
        print s
      Both programs compute the sum of the integers from 1 to 100,000,000.
      Result of a benchmark in a certain environment:
        C: 0.109 sec (compiled with -O3 option)
        Python: 8.657 sec (80+ times slower!!)
      6 / 35
  • 7. Better code
        import numpy as np
        a=np.arange(1,100000001)
        print a.sum()
      Now it takes 0.188 sec. (Measured by the "time" command on Linux, loading time included.)
      Still slower than C, but sufficiently fast for a scripting language.
      7 / 35
  • 8. Lessons Python is very slow when written badly. Translating C (or Java, C#, etc.) code directly into Python is often a bad idea. Python-friendly rewriting sometimes results in a drastic performance improvement. 8 / 35
  • 9. Basic rules for better performance Avoid for loops as far as possible; utilize libraries' capabilities instead. Forget about the cost of copying memory: a typical C programmer might care about it, but ... 9 / 35
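The comparison on the last two slides can be checked on a smaller scale. The following is a minimal sketch in Python 3 syntax (the slides use Python 2); the function names `loop_sum` and `numpy_sum` and the reduced size `n` are assumptions made for this illustration, not part of the talk.

```python
import numpy as np

def loop_sum(n):
    """Sum 1..n with an explicit Python loop (the slow style)."""
    s = 0
    for i in range(1, n + 1):
        s += i
    return s

def numpy_sum(n):
    """Sum 1..n by letting NumPy run the loop in compiled code."""
    # int64 is requested explicitly so the sum cannot overflow on
    # platforms where NumPy's default integer is 32 bits wide.
    return int(np.arange(1, n + 1, dtype=np.int64).sum())

n = 1_000_000  # much smaller than the slide's 100,000,000, to keep this quick
closed_form = n * (n + 1) // 2  # Gauss's formula, used as a reference value
loop_result = loop_sum(n)
numpy_result = numpy_sum(n)
```

Both results should agree with the closed form; only the running time differs, which can be measured with `timeit` as in the later benchmark slides.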
  • 10. Basic techniques for NumPy Broadcasting Indexing 10 / 35
  • 11. Broadcasting >>> import numpy as np >>> a=np.array([0,1,2]) >>> a*3 array([0, 3, 6]) >>> b=np.array([1,4,9]) >>> np.sqrt(b) array([ 1., 2., 3.]) A function which is applied to each element when applied to an array is called a universal function. 11 / 35
  • 12. Broadcasting (2D) >>> import numpy as np >>> a=np.arange(9).reshape((3,3)) >>> b=np.array([1,2,3]) >>> a array([[0, 1, 2], [3, 4, 5], [6, 7, 8]]) >>> b array([1, 2, 3]) >>> a*b array([[ 0, 2, 6], [ 3, 8, 15], [ 6, 14, 24]]) 12 / 35
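In the 2D example, `b` is matched against the last axis of `a`, so each column of `a` is scaled. As a small sketch (Python 3; the variable names `col_scaled` and `row_scaled` are made up for this illustration), scaling rows instead only requires giving `b` an explicit second axis:

```python
import numpy as np

a = np.arange(9).reshape((3, 3))
b = np.array([1, 2, 3])

# b has shape (3,), which is treated as (1, 3): column j is multiplied by b[j].
col_scaled = a * b

# b[:, np.newaxis] has shape (3, 1): row i is multiplied by b[i].
row_scaled = a * b[:, np.newaxis]
```

No loop and no explicit copy of `b` is written in either case; NumPy stretches the smaller operand along the axis of length 1.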
  • 13. Indexing >>> import numpy as np >>> a=np.arange(10) >>> a array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) >>> indices=np.arange(0,10,2) >>> indices array([0, 2, 4, 6, 8]) >>> a[indices]=0 >>> a array([0, 1, 0, 3, 0, 5, 0, 7, 0, 9]) >>> b=np.arange(100,600,100) >>> b array([100, 200, 300, 400, 500]) >>> a[indices]=b >>> a array([100, 1, 200, 3, 300, 5, 400, 7, 500, 9]) 13 / 35
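The same assignments can also be written with a boolean mask instead of an integer index array. This is a sketch of that variant in Python 3, not something shown on the slide; the name `mask` is an assumption for this illustration.

```python
import numpy as np

a = np.arange(10)
mask = (a % 2 == 0)      # True at the even positions 0, 2, 4, 6, 8

a[mask] = 0              # scalar assignment to all selected elements
after_zero = a.copy()

a[mask] = np.arange(100, 600, 100)  # one value per selected element
after_fill = a.copy()
```

The mask is computed once, before `a` is modified, so it keeps selecting the same five positions in the second assignment.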
  • 14. References Gabriele Lanaro, "Python High Performance Programming," Packt Publishing, 2013. Stéfan van der Walt, "Numpy Medkit." 14 / 35
  • 15. Sparse matrix Defined as a matrix in which most elements are zero. A compressed data structure is used to represent it, so that it is both space-efficient and time-efficient. 15 / 35
  • 16. scipy.sparse
      The module scipy.sparse provides three main classes for representing a sparse matrix. (There are other types, but they are not covered here.)
        lil_matrix : convenient for setting data; setting a[i,j] is fast
        csr_matrix : convenient for computation; fast to retrieve a row
        csc_matrix : convenient for computation; fast to retrieve a column
      Usually, you set the data into a lil_matrix and then convert it to a csc_matrix or csr_matrix.
      For csr_matrix and csc_matrix, calculation between matrices of the same type is fast, but you should avoid calculation between different types.
      16 / 35
  • 17. Use case >>> from scipy.sparse import lil_matrix, csr_matrix >>> a=lil_matrix((3,3)) >>> a[0,0]=1.; a[0,2]=2. >>> a=a.tocsr() >>> print a (0, 0) 1.0 (0, 2) 2.0 >>> a.todense() matrix([[ 1., 0., 2.], [ 0., 0., 0.], [ 0., 0., 0.]]) >>> b=lil_matrix((3,3)) >>> b[1,1]=3.; b[2,0]=4.; b[2,2]=5. >>> b=b.tocsr() >>> b.todense() matrix([[ 0., 0., 0.], [ 0., 3., 0.], [ 4., 0., 5.]]) >>> c=a.dot(b) >>> c.todense() matrix([[ 8., 0., 10.], [ 0., 0., 0.], [ 0., 0., 0.]]) >>> d=a+b >>> d.todense() matrix([[ 1., 0., 2.], [ 0., 3., 0.], [ 4., 0., 5.]]) 17 / 35
  • 18. Internal structure: csr_matrix >>> from scipy.sparse import lil_matrix, csr_matrix >>> a=lil_matrix((3,3)) >>> a[0,1]=1.; a[0,2]=2.; a[1,2]=3.; a[2,0]=4.; a[2,1]=5. >>> b=a.tocsr() >>> b.todense() matrix([[ 0., 1., 2.], [ 0., 0., 3.], [ 4., 5., 0.]]) >>> b.indices array([1, 2, 2, 0, 1], dtype=int32) >>> b.data array([ 1., 2., 3., 4., 5.]) >>> b.indptr array([0, 2, 3, 5], dtype=int32) 18 / 35
  • 19. Internal structure: csc_matrix >>> from scipy.sparse import lil_matrix, csr_matrix >>> a=lil_matrix((3,3)) >>> a[0,1]=1.; a[0,2]=2.; a[1,2]=3.; a[2,0]=4.; a[2,1]=5. >>> b=a.tocsc() >>> b.todense() matrix([[ 0., 1., 2.], [ 0., 0., 3.], [ 4., 5., 0.]]) >>> b.indices array([2, 0, 2, 0, 1], dtype=int32) >>> b.data array([ 4., 1., 5., 2., 3.]) >>> b.indptr array([0, 1, 3, 5], dtype=int32) 19 / 35
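To make the meaning of the three arrays concrete, here is a sketch (Python 3) that rebuilds the dense matrix from the `data`, `indices` and `indptr` values printed on the csr_matrix slide. The helper name `csr_to_dense` is hypothetical and written only for this illustration; in practice `todense()` or `toarray()` does the same job.

```python
import numpy as np
from scipy.sparse import csr_matrix

def csr_to_dense(data, indices, indptr, shape):
    """Expand CSR arrays into a dense array, one row at a time."""
    dense = np.zeros(shape)
    for row in range(shape[0]):
        # Entries of this row live in data[start:stop];
        # indices[start:stop] holds their column numbers.
        start, stop = indptr[row], indptr[row + 1]
        dense[row, indices[start:stop]] = data[start:stop]
    return dense

# Values copied from the csr_matrix slide above.
data = np.array([1., 2., 3., 4., 5.])
indices = np.array([1, 2, 2, 0, 1])
indptr = np.array([0, 2, 3, 5])

rebuilt = csr_to_dense(data, indices, indptr, (3, 3))
via_scipy = csr_matrix((data, indices, indptr), shape=(3, 3)).toarray()
```

For csc_matrix the roles of rows and columns are swapped: `indptr` delimits columns and `indices` holds row numbers.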
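  • 20. Merit of knowing the internal structure Setting csr_matrix or csc_matrix through its internal structure is much faster than setting lil_matrix with indices. See the benchmark of setting the $n \times n$ matrix with 2 on the diagonal and 1 on the superdiagonal: $\begin{pmatrix} 2 & 1 & & \\ & 2 & \ddots & \\ & & \ddots & 1 \\ & & & 2 \end{pmatrix}$ 20 / 35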
  • 21. Benchmark code:
        from scipy.sparse import lil_matrix, csr_matrix
        import numpy as np
        from timeit import timeit
        def set_lil(n):
            a=lil_matrix((n,n))
            for i in xrange(n):
                a[i,i]=2.
                if i+1<n:
                    a[i,i+1]=1.
            return a
        def set_csr(n):
            data=np.empty(2*n-1)
            indices=np.empty(2*n-1,dtype=np.int32)
            indptr=np.empty(n+1,dtype=np.int32)
            # to be fair, a for loop is intentionally used
            # (using the indexing technique is faster)
            for i in xrange(n):
                indices[2*i]=i
                data[2*i]=2.
                if i<n-1:
                    indices[2*i+1]=i+1
                    data[2*i+1]=1.
                indptr[i]=2*i
            indptr[n]=2*n-1
            a=csr_matrix((data,indices,indptr),shape=(n,n))
            return a
        print "lil:",timeit("set_lil(10000)",
            number=10,setup="from __main__ import set_lil")
        print "csr:",timeit("set_csr(10000)",
            number=10,setup="from __main__ import set_csr")
      21 / 35
  • 22. Result (seconds for 10 runs):
        lil: 11.6730761528
        csr: 0.0562081336975
      Remark
      When you deal with already sorted data, setting csr_matrix or csc_matrix with data, indices, indptr is much faster than setting lil_matrix.
      But the code tends to be more complicated if you use the internal structure of csr_matrix or csc_matrix.
      22 / 35
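The comment in the benchmark code notes that the indexing technique would be faster still than the explicit loop. A possible loop-free version is sketched below in Python 3; the function name `set_csr_vectorized` is an assumption for this illustration and does not appear in the talk, and no timing is claimed for it here.

```python
import numpy as np
from scipy.sparse import csr_matrix

def set_csr_vectorized(n):
    """Build the n x n matrix with 2 on the diagonal and 1 on the
    superdiagonal directly from CSR arrays, without a Python loop."""
    data = np.empty(2 * n - 1)
    indices = np.empty(2 * n - 1, dtype=np.int32)
    indptr = np.empty(n + 1, dtype=np.int32)

    # Even slots hold the diagonal entries (value 2, column i).
    data[0::2] = 2.0
    indices[0::2] = np.arange(n)
    # Odd slots hold the superdiagonal entries (value 1, column i+1).
    data[1::2] = 1.0
    indices[1::2] = np.arange(1, n)

    # Row i starts at slot 2*i; the final entry is the total count.
    indptr[:n] = 2 * np.arange(n)
    indptr[n] = 2 * n - 1
    return csr_matrix((data, indices, indptr), shape=(n, n))

small = set_csr_vectorized(5).toarray()
expected = 2 * np.eye(5) + np.eye(5, k=1)
```

The slice assignments fill exactly the same array slots that the loop in `set_csr` fills one by one.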
  • 24. Case 1: Norms
      If $v$ is dense:
        norm=np.dot(v,v)
      $\|v\|^2 = v^T v$, expressed as a product of matrices. (dot means matrix product, but you don't have to take the transpose explicitly.)
      When $v$ is sparse, suppose that $v$ is expressed as a $1 \times n$ matrix:
        norm=v.multiply(v).sum()
      (multiply() is the element-wise product.)
      This form is used because taking the transpose of a sparse matrix changes its type.
      24 / 35
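The two ways of computing the squared norm can be compared on a tiny vector. This is a sketch in Python 3 with made-up example values; the third variant, which works on the stored `data` array only, is an addition for this illustration and is not taken from the slide.

```python
import numpy as np
from scipy.sparse import csr_matrix

v_dense = np.array([0., 3., 0., 4.])
v_sparse = csr_matrix(v_dense.reshape(1, -1))  # stored as a 1 x n matrix

# Dense case: dot of a 1-D array with itself.
norm_dense = np.dot(v_dense, v_dense)

# Sparse case from the slide: element-wise product, then sum.
norm_sparse = v_sparse.multiply(v_sparse).sum()

# Assumed alternative: only the stored non-zero values contribute.
norm_from_data = np.dot(v_sparse.data, v_sparse.data)
```

All three give the squared Euclidean norm, which is 25 for this vector.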
  • 26. Case 2: Applying a function to all of the elements of a sparse matrix
      A universal function can be applied to a dense matrix:
        >>> import numpy as np
        >>> a=np.arange(9).reshape((3,3))
        >>> a
        array([[0, 1, 2],
               [3, 4, 5],
               [6, 7, 8]])
        >>> np.tanh(a)
        array([[ 0.        ,  0.76159416,  0.96402758],
               [ 0.99505475,  0.9993293 ,  0.9999092 ],
               [ 0.99998771,  0.99999834,  0.99999977]])
      This is convenient and fast. However, we cannot do the same thing for a sparse matrix.
      26 / 35
  • 27. Attempt with a sparse matrix:
        >>> from scipy.sparse import lil_matrix
        >>> a=lil_matrix((3,3))
        >>> a[0,0]=1.
        >>> a[1,0]=2.
        >>> b=a.tocsr()
        >>> np.tanh(b)
        <3x3 sparse matrix of type '<type 'numpy.float64'>'
                with 2 stored elements in Compressed Sparse Row format>
      This is because, for an arbitrary function, its application to a sparse matrix is not necessarily sparse.
      However, if a universal function $f$ satisfies $f(0) = 0$, the sparsity is preserved. Then, how can we compute it?
      27 / 35
  • 28. Use the internal structure!!
      The positions of the non-zero elements are not changed by applying the function. Keep indices and indptr, and just change data.
      Solution (where a is a csr_matrix):
        b = csr_matrix((np.tanh(a.data), a.indices, a.indptr), shape=a.shape)
      28 / 35
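The trick can be checked against the dense computation. Below is a minimal sketch in Python 3 that reuses the small matrix from the previous slide; the name `tanh_b` is an assumption for this illustration.

```python
import numpy as np
from scipy.sparse import lil_matrix, csr_matrix

a = lil_matrix((3, 3))
a[0, 0] = 1.
a[1, 0] = 2.
b = a.tocsr()

# tanh(0) == 0, so only the stored values need to be transformed;
# the sparsity pattern (indices, indptr) is reused unchanged.
tanh_b = csr_matrix((np.tanh(b.data), b.indices, b.indptr), shape=b.shape)

dense_reference = np.tanh(b.toarray())
```

This only works for functions that map zero to zero. For something like `np.cos`, the implicit zeros would have to become ones, and the result would no longer be sparse.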
  • 29. Case 3: Formula which appears in a paper
      In the algorithm for a recommendation system [1], the following formula appears:
        $A^T D A$
      where $A$ is an $n \times f$ dense matrix, and $D$ is a diagonal matrix defined from a given array $d$ as:
        $D = \mathrm{diag}(d_1, \ldots, d_n)$
      Here, $n$ (which corresponds to the number of users or items) is big and $f$ (which means the number of latent factors) is small.
      [1] Hu et al. Collaborative Filtering for Implicit Feedback Datasets, ICDM, 2008.
      29 / 35
  • 30. Solution 1: There is a special class dia_matrix to deal with a diagonal sparse matrix.
        import scipy.sparse as sparse
        import numpy as np
        def f(a,d):
            """a: 2d array of shape (n,f), d: 1d array of length n"""
            dd=sparse.diags([d],[0])
            return np.dot(a.T,dd.dot(a))
      30 / 35
  • 31. Solution 2: Pack csr_matrix with data, indices, indptr
      data=d, indices=[0,1,...,n-1], indptr=[0,1,...,n]
        def g(a,d):
            n,f=a.shape
            data=d
            indices=np.arange(n)
            indptr=np.arange(n+1)
            dd=sparse.csr_matrix((data,indices,indptr),shape=(n,n))
            return np.dot(a.T,dd.dot(a))
      31 / 35
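  • 32. Solution 3:
      $(A^T D)_{ij} = \sum_k (A^T)_{ik} D_{kj} = (A^T)_{ij} \, d_j$
      This is equivalent to the broadcasting!
        def h(a,d):
            return np.dot(a.T*d,a)
      That is, $A^T D$ is $A^T$ with its $j$-th column multiplied by $d_j$, which is exactly what a.T*d computes.
      32 / 35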
  • 33. Benchmark
        def datagen(n,f):
            np.random.seed(0)
            a=np.random.random((n,f))
            d=np.random.random(n)
            return a,d
        from timeit import timeit
        print "dia_matrix :",timeit("f(a,d)",number=10,
            setup="from __main__ import f,datagen; a,d=datagen(1000000,10)")
        print "csr_matrix :",timeit("g(a,d)",number=10,
            setup="from __main__ import g,datagen; a,d=datagen(1000000,10)")
        print "broadcasting :",timeit("h(a,d)",number=10,
            setup="from __main__ import h,datagen; a,d=datagen(1000000,10)")
      Result (seconds for 10 runs):
        dia_matrix : 1.60458707809
        csr_matrix : 1.32580018044
        broadcasting : 1.30032682419
      33 / 35
  • 34. Conclusion Try not to use for loops; use libraries' capabilities instead. Knowledge about the internal structure of the sparse matrix is useful for extracting further performance. Mathematical derivation is important: the key is to find a mathematically equivalent and Python-friendly formula. Computational speed does not necessarily matter. Finding better code in a short time is valuable; otherwise, you shouldn't pursue it too far. 34 / 35
  • 35. Acknowledgment I would like to thank @shima__shima, who gave me useful advice on Twitter. 35 / 35
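Before comparing timings, it is worth confirming that the three solutions really compute the same matrix. The sketch below (Python 3) restates `f`, `g` and `h` from the slides and checks them against a naive dense reference on a small input; the sizes 1000 and 10 and the name `reference` are assumptions chosen to keep the check fast, not the benchmark settings.

```python
import numpy as np
import scipy.sparse as sparse

def f(a, d):
    """Solution 1: diagonal matrix built with sparse.diags."""
    dd = sparse.diags([d], [0])
    return np.dot(a.T, dd.dot(a))

def g(a, d):
    """Solution 2: diagonal matrix packed directly as CSR arrays."""
    n = a.shape[0]
    dd = sparse.csr_matrix((d, np.arange(n), np.arange(n + 1)), shape=(n, n))
    return np.dot(a.T, dd.dot(a))

def h(a, d):
    """Solution 3: broadcasting scales column j of a.T by d[j]."""
    return np.dot(a.T * d, a)

rng = np.random.RandomState(0)
a = rng.random_sample((1000, 10))
d = rng.random_sample(1000)

# Naive reference that materialises the full n x n diagonal matrix.
# This is only feasible because n is small here.
reference = a.T @ np.diag(d) @ a
```

With the talk's n = 1,000,000 the dense `np.diag(d)` would need far too much memory, which is why all three solutions avoid building it.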