Machine Learning: Your Path to Deeper Insight. Driving increasing innovation and competitive advantage across industries
Motivation
Intel® Distribution for Python*: Advancing Python performance closer to native speeds
Performance Gain from MKL (Compared to “vanilla” SciPy)
Out-of-the-box Performance with Intel® Distribution for Python*: Mature AVX2 instructions based product
Out-of-the-box Performance with Intel® Distribution for Python*: New AVX512 instructions based product
WORKSHOP: BASIC functions
Examples of Basic Functions
Intel Python Landscape
Scikit-Learn* optimizations with Intel® MKL: Speedups of Scikit-Learn* Benchmarks (2017 Update 1)
More Scikit-Learn* optimizations with Intel® DAAL: Speedups of Scikit-Learn* Benchmarks (2017 Update 2)
Intel® DAAL: Heterogeneous Analytics
Performance Example: Read and Compute. SVM Classification with RBF kernel
WORKSHOP: PyDAAL
pyDAAL Getting Started
Intel® TBB: parallelism orchestration in Python ecosystem
Profiling Python* code with Intel® VTune™ Amplifier: Right tool for high performance application profiling at all levels
Installing Intel® Distribution for Python* 2017
Intel® Distribution for Python
backup
Collaborative Filtering
Training: Profiling pure Python*
Training: Profiling pure Python*
Training: Python + Numpy (MKL)
Legal Disclaimer & Optimization Notice

IDP for Machine Learning

1.

Navigate Machine Learning with Intel®
Victoriya Fedotova
© 2017 Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries.
*Other names and brands may be claimed as the property of others.
For more complete information about compiler optimizations, see our Optimization Notice.

2. Machine Learning: Your Path to Deeper Insight. Driving increasing innovation and competitive advantage across industries

Strategy provides the foundation for success using AI
Solutions: for reference across industries
Tools/Platforms: to accelerate deployment
• Intel® Deep Learning SDK for Training & Deployment
Optimized Frameworks: to simplify development
Libraries/Languages: featuring optimized building blocks
• Intel® Math Kernel Library (Intel® MKL & MKL-DNN)
• Intel® Data Analytics Acceleration Library (Intel® DAAL)
• Intel® Distribution for Python*
Hardware Technology: portfolio that is broad and cross-compatible
• Datacenter, Endpoint
• + Network, + Memory, + Storage

3. Motivation

Python is among the most popular programming languages
Challenge #1: Domain specialists are not professional software programmers
Challenge #2: Python performance limits migration to production systems
• Hire a team of Java/C++ programmers …
• OR have a team of Python programmers deploy optimized Python in production
* L. Prechelt, An empirical comparison of seven programming languages, IEEE Computer, 2000, Vol. 33, Issue 10, pp. 23-29
** RedMonk - D. Berkholz, Programming languages ranked by expressiveness

4. Intel® Distribution for Python*: Advancing Python performance closer to native speeds

Easy, out-of-the-box access to high performance Python
• Prebuilt, optimized for numerical computing, data analytics, HPC
• Drop-in replacement for your existing Python. No code changes required
High performance with multiple optimization techniques
• Accelerated NumPy*/SciPy*/Scikit-Learn* with Intel® MKL
• Data analytics with pyDAAL, enhanced thread scheduling with TBB, Jupyter* Notebook interface, Numba*, Cython*
• Scale easily with optimized MPI4Py and Jupyter notebooks
Faster access to latest optimizations for Intel architecture
• Distribution and individual optimized packages available through conda and Anaconda Cloud: anaconda.org/intel
• Optimizations upstreamed back to main Python trunk
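One way to check the "drop-in replacement" claim on a given machine is to ask NumPy which math libraries it was built against and to time a matrix product. The sketch below is illustrative only: the helper names `blas_report` and `time_matmul` are made up for this example, and the assumption that an MKL-linked build mentions "mkl" in its build information is typical but not guaranteed.

```python
import contextlib
import io
import time

import numpy as np


def blas_report():
    """Return the text that numpy.show_config() prints (build and link information)."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        np.show_config()
    return buffer.getvalue()


def time_matmul(n=500, repeats=3):
    """Return the best wall-clock time in seconds for an n x n matrix product."""
    rng = np.random.RandomState(0)
    a = rng.rand(n, n)
    b = rng.rand(n, n)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        np.dot(a, b)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    report = blas_report()
    # Assumption: MKL-linked builds usually mention "mkl" somewhere in this report.
    print("MKL mentioned in build info:", "mkl" in report.lower())
    print("500x500 matmul, best of 3: %.4f s" % time_matmul())
```

Running the same script unchanged under the system Python and under the Intel distribution gives a rough, machine-specific comparison; it is not a substitute for the benchmarks quoted on the following slides.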

5. Performance Gain from MKL (Compared to “vanilla” SciPy)

Linear Algebra (up to 100x faster)
• BLAS
• LAPACK
• ScaLAPACK
• Sparse BLAS
• Sparse Solvers
Fast Fourier Transforms (up to 10x faster!)
• Multidimensional
• FFTW interfaces
• Cluster FFT
Vector RNGs (up to 60x faster!)
• Multiple BRNG
• Support methods for independent streams creation
• Support all key probability distributions
Vector Math (up to 10x faster!)
• Trigonometric
• Hyperbolic
• Exponential
• Log
• Power, Root
Summary Statistics
• Kurtosis
• Variation coefficient
• Order statistics
• Min/max
• Variance-covariance
And More
• Splines
• Interpolation
• Trust Region
• Fast Poisson Solver
Configuration info: Versions: Intel® Distribution for Python 2017 Beta, icc 15.0; Hardware: Intel® Xeon® CPU E5-2698 v3 @ 2.30GHz (2 sockets, 16 cores each, HT=OFF), 64 GB of RAM, 8 DIMMS of [email protected]; Operating System: Ubuntu 14.04 LTS.

6. Out-of-the-box Performance with Intel® Distribution for Python*: Mature AVX2 instructions based product

Configuration Info: apt/atlas: installed with apt-get, Ubuntu 16.10, python 3.5.2, numpy 1.11.0, scipy 0.17.0; pip/openblas: installed with pip, Ubuntu 16.10, python 3.5.2, numpy 1.11.1, scipy 0.18.0; Intel Python: Intel Distribution for Python 2017
Hardware: Xeon: Intel Xeon CPU E5-2698 v3 @ 2.30 GHz (2 sockets, 16 cores each, HT=off), 64 GB of RAM, 8 DIMMS of [email protected]

7. Out-of-the-box Performance with Intel® Distribution for Python*: New AVX512 instructions based product

Configuration Info: apt/atlas: installed with apt-get, Ubuntu 16.10, python 3.5.2, numpy 1.11.0, scipy 0.17.0; pip/openblas: installed with pip, Ubuntu 16.10, python 3.5.2, numpy 1.11.1, scipy 0.18.0; Intel Python: Intel Distribution for Python 2017
Hardware: Intel® Xeon Phi™ CPU 7210 1.30 GHz, 96 GB of RAM, 6 DIMMS of [email protected]

8. WORKSHOP: BASIC functions


9. Examples of Basic Functions

NumPy, SciPy
• Matrix multiplication
• Random number generation
• Vector Math
• Linear algebra decompositions
Not so basic functions: Scikit-Learn
• Linear regression
NOTE: Only Python 2.7 and 3.5 are supported for now
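The NumPy items in the list above can be sketched in a few lines. This is a minimal illustration rather than the workshop's actual exercise: array sizes, the seed, and all variable names are arbitrary choices made here, and the Scikit-Learn linear regression item is not covered.

```python
import numpy as np

rng = np.random.RandomState(42)  # fixed seed so the run is repeatable
n = 200

# Matrix multiplication (dispatched to the BLAS library NumPy is linked with)
a = rng.rand(n, n)
b = rng.rand(n, n)
product = np.dot(a, b)

# Random number generation: 100,000 draws from a standard normal distribution
samples = rng.normal(loc=0.0, scale=1.0, size=100000)

# Vector math: element-wise transcendental functions over a whole array
transformed = np.exp(np.sin(samples))

# Linear algebra decompositions
spd = np.dot(a, a.T) + n * np.eye(n)  # symmetric positive definite by construction
chol = np.linalg.cholesky(spd)        # lower-triangular factor: spd = chol * chol^T
u, s, vt = np.linalg.svd(a, full_matrices=False)  # a = u * diag(s) * vt

print("product shape:", product.shape)
print("largest singular value: %.3f" % s[0])
```

Because every call goes through NumPy's public API, the same script runs unchanged on any NumPy build; only the underlying math library differs.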

10. Intel Python Landscape

Intel® Distribution for Python*: NumPy*, SciPy*, Scikit-learn*, pyDAAL, Pandas*, Mpi4py*
Intel® Performance Libraries: Intel® IPP, Intel® TBB, Intel® MKL, Intel® DAAL, Intel® MPI Library

11. Scikit-Learn* optimizations with Intel® MKL: Speedups of Scikit-Learn* Benchmarks (2017 Update 1)

Intel® Distribution for Python* 2017 Update 1 vs. system Python & NumPy*/Scikit-Learn*
[Bar chart: speedup (0x to 9x) per benchmark. Benchmarks: Approximate neighbors; Fast K-means; GLM; GLM net; LASSO; Lasso path; Least angle regression, OpenMP; Non-negative matrix factorization; Regression by SGD; Sampling without replacement; SVD]
System info: 32x Intel® Xeon® CPU E5-2698 v3 @ 2.30GHz, disabled HT, 64GB RAM; Intel® Distribution for Python* 2017 Gold; Intel® MKL 2017.0.0; Ubuntu 14.04.4 LTS; Numpy 1.11.1; scikit-learn 0.17.1. See Optimization Notice.

12. More Scikit-Learn* optimizations with Intel® DAAL: Speedups of Scikit-Learn* Benchmarks (2017 Update 2)

Accelerated key Machine Learning algorithms with Intel® DAAL
• Distances, K-means, Linear & Ridge Regression, PCA
• Up to 160x speedup on top of MKL initial optimizations
Scikit-Learn optimizations due to Intel® DAAL: speedup, Intel Python 2017 U2 vs. U1
• Correlation Distance (1Kx150K): 158.91
• Cosine Distance (1Kx150K): 157.94
• K-means (100Kx50, 10 clusters): 39.65
• Linear Regression (10Mx25, training): 2.56
• Ridge Regression (10Mx25, training): 5.39
• PCA (1Mx50, 3 components): 1.57

13. Intel® DAAL: Heterogeneous Analytics

Available also in open source:
https://software.intel.com/en-us/articles/opendaal
Targets both data centers (Intel® Xeon® and Intel® Xeon Phi™) and edge devices (Intel® Atom™)
• Perform analysis close to data source (sensor/client/server) to optimize response latency, decrease network bandwidth utilization, and maximize security
• Offload data to server/cluster for complex and large-scale analytics
Data sources: Business, Web/Social, Scientific/Engineering
Analytics stages: Pre-processing, Transformation, Analysis, Modeling, Validation, Decision Making
Algorithms and building blocks:
• (De-)Compression, (De-)Serialization
• PCA, Outlier detection, Normalization, Math functions, Sorting
• Statistical moments, Quantiles, Variance matrix, Distances, QR, SVD, Cholesky, Apriori, Optimization solvers
• Regression: Linear, Ridge
• Classification: Naïve Bayes, SVM, Classifier boosting, kNN, Decision Forest
• Clustering: Kmeans, EM GMM
• Collaborative filtering: ALS
• Neural Networks
• Quality metrics

14. Performance Example: Read and Compute. SVM Classification with RBF kernel

Training dataset: CSV file (PCA-preprocessed MNIST, 40 principal components) n=42000, p=40
Testing dataset: CSV file (PCA-preprocessed MNIST, 40 principal components) n=28000, p=40
[Bar charts: "Training (sec)" and "Prediction (sec)", time in seconds, Scikit-Learn + Pandas vs. pyDAAL. Training bars: Read Training Dataset (incl. labels), Training Compute. Prediction bars: Read Test Dataset, Prediction Compute. Callouts: 60% faster CSV read; 2.2x (training); 66x (prediction); balanced read and compute]
System Info: Intel® Xeon® CPU E5-2680 v3 @ 2.50GHz, 504GB, 2x24 cores, HT=on, OS RH7.2 x86_64, Intel® Distribution for Python* 2017 Update 1 (Python* 3.5)

15. WORKSHOP: PyDAAL


16. pyDAAL Getting Started

https://github.com/daaltces/pydaal-getting-started
DAAL4PY: Tech Preview
https://software.intel.com/en-us/articles/daal4py-overview-a-high-levelpython-api-to-the-intel-data-analytics-acceleration-library

17. Intel® TBB: parallelism orchestration in Python ecosystem

Software components are built from smaller ones
If each component is threaded, there can be too many threads (oversubscription)!
Intel TBB dynamically balances thread loads and effectively manages oversubscription
> python -m TBB application.py
[Diagram: an Application consists of Components 1..N, each built from Subcomponents 1..K (or 1..M). Software stack shown: NumPy, SciPy, pyDAAL, Joblib, Dask, Thread Pool, Numba; Intel® MKL, Intel® DAAL, Intel® TBB module for Python; Intel® TBB runtime]
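To make the oversubscription scenario concrete, here is a hypothetical `application.py` with two levels of parallelism: an outer thread pool, and inner NumPy calls that the underlying math library may thread on its own. The file name matches the command shown on the slide; the function names and sizes are illustrative assumptions, and the script also runs under plain `python`.

```python
# application.py: hypothetical workload with two levels of parallelism
from multiprocessing.pool import ThreadPool

import numpy as np


def component(seed, size=200):
    """One 'component': a QR factorization that the math library may thread internally."""
    rng = np.random.RandomState(seed)
    a = rng.rand(size, size)
    q, r = np.linalg.qr(a)
    # Reconstruction error, to confirm the factorization is correct
    return float(np.abs(np.dot(q, r) - a).max())


def run(n_components=8, n_workers=4):
    """Outer level: run several components concurrently in a thread pool."""
    pool = ThreadPool(n_workers)
    try:
        return pool.map(component, range(n_components))
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    errors = run()
    print("components: %d, max reconstruction error: %.2e" % (len(errors), max(errors)))
```

With 4 outer workers and a math library that starts one thread per core for each call, the total thread count can exceed the core count several times over. Launching the script through the TBB module, as in the command on the slide, is the remedy the slide proposes for that situation.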

18. Profiling Python* code with Intel® VTune™ Amplifier: Right tool for high performance application profiling at all levels

• Function-level and line-level hotspot analysis, down to disassembly
• Call stack analysis
• Low overhead
• Mixed-language, multi-threaded application analysis
Feature | cProfile | Line_profiler | Intel® VTune™ Amplifier
Profiling technology | Event | Instrumentation | Sampling, hardware events
Analysis granularity | Function-level | Line-level | Line-level, call stack, time windows, hardware events
Intrusiveness | Medium (1.3-5x) | High (4-10x) | Low (1.05-1.3x)
Mixed language | Python | Python | Python, Cython, C++, Fortran
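For comparison with the first column of the table, function-level profiling with the standard-library cProfile module can be sketched as follows. The workload (`dot`, `workload`) and the `profile` helper are made-up names for this illustration; VTune Amplifier itself is a separate tool and is not driven by this code.

```python
import cProfile
import io
import pstats


def dot(u, v):
    """Pure-Python dot product, deliberately slow."""
    return sum(x * y for x, y in zip(u, v))


def workload(n=60, dim=30):
    """Build n vectors and compute all pairwise dot products."""
    vectors = [[(i * j) % 7 + 1.0 for j in range(dim)] for i in range(n)]
    return [[dot(u, v) for v in vectors] for u in vectors]


def profile(func, top=5):
    """Run func under cProfile and return a report of the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return stream.getvalue()


if __name__ == "__main__":
    print(profile(workload))
```

The report attributes time to whole functions only, which matches the "Function-level" granularity listed for cProfile in the table.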

19. Installing Intel® Distribution for Python* 2017

Stand-alone installer and anaconda.org/intel
Platforms: Linux, Windows*, OS X*
Download full installer from
https://software.intel.com/en-us/intel-distribution-for-python
OR install with conda (full or core package set):
> conda config --add channels intel
> conda install intelpython3_full
> conda install intelpython3_core
OR pull the Docker image:
> docker pull intelpython/intelpython3_full

20. Intel® Distribution for Python

https://software.intel.com/en-us/distribution-for-python

21. backup


22. Collaborative Filtering

Processes users’ past behavior: their activities and ratings
Predicts what a user might want to buy based on their preferences
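A minimal sketch of one common variant, item-based collaborative filtering with cosine similarity, is shown below. The slides do not state which variant the training example uses, so the ratings matrix, the function names, and the weighting scheme are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical ratings: rows are users, columns are items, 0 means "not rated"
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)


def item_similarity(r):
    """Cosine similarity between item columns of the ratings matrix."""
    norms = np.linalg.norm(r, axis=0)
    norms[norms == 0] = 1.0  # avoid division by zero for items nobody rated
    normalized = r / norms
    return np.dot(normalized.T, normalized)


def predict(r, sim, user, item):
    """Predict a rating as the similarity-weighted mean of the user's known ratings."""
    rated = r[user] > 0
    weights = sim[item, rated]
    if weights.sum() == 0:
        return 0.0
    return float(np.dot(weights, r[user, rated]) / weights.sum())


if __name__ == "__main__":
    sim = item_similarity(ratings)
    print("predicted rating of item 2 for user 0: %.2f" % predict(ratings, sim, 0, 2))
```

User 0 rated items 0 and 1 highly and item 3 poorly; since item 2 is most similar to item 3, the predicted rating for item 2 comes out low.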

23. Training: Profiling pure Python*

Items similarity assessment (similarity matrix computation) is the main hotspot
Configuration Info: Versions: Red Hat Enterprise Linux* built Python*: Python 2.7.5 (default, Feb 11 2014), NumPy 1.7.1, SciPy 0.12.1, multiprocessing 0.70a1 built with gcc 4.8.2; Hardware: 24 CPUs (HT ON), 2 Sockets (6 cores/socket), 2 NUMA nodes, Intel(R) Xeon(R) [email protected], RAM 24GB; Operating System: Red Hat Enterprise Linux Server release 7.0 (Maipo)

24. Training: Profiling pure Python*

This loop is the major bottleneck. Use appropriate technologies (NumPy/SciPy/Scikit-Learn or Cython/Numba) to accelerate it
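The kind of rewrite suggested here can be sketched as follows: a triple-nested pure-Python loop for an item similarity matrix, next to a NumPy version that produces the same result with one matrix product. The profiled loop itself is not reproduced in the slides, so both functions are illustrative stand-ins, and they assume every item has at least one non-zero rating.

```python
import math

import numpy as np


def similarity_loops(ratings):
    """Cosine similarity between item columns, written with explicit Python loops."""
    n_items = len(ratings[0])
    sim = [[0.0] * n_items for _ in range(n_items)]
    for i in range(n_items):
        for j in range(n_items):
            dot = norm_i = norm_j = 0.0
            for row in ratings:
                dot += row[i] * row[j]
                norm_i += row[i] ** 2
                norm_j += row[j] ** 2
            sim[i][j] = dot / math.sqrt(norm_i * norm_j)
    return sim


def similarity_numpy(ratings):
    """Same computation, expressed as one matrix product plus a normalization."""
    r = np.asarray(ratings, dtype=float)
    gram = np.dot(r.T, r)             # all pairwise item dot products at once
    norms = np.sqrt(np.diag(gram))    # per-item Euclidean norms
    return gram / np.outer(norms, norms)


if __name__ == "__main__":
    data = [[5, 4, 1, 1], [4, 5, 1, 2], [1, 2, 5, 4], [2, 1, 4, 5]]
    same = np.allclose(similarity_loops(data), similarity_numpy(data))
    print("loop and NumPy versions agree:", same)
```

The NumPy version moves the inner loops into a single matrix product, which is the operation an optimized BLAS library accelerates.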
Configuration Info: Versions: Red Hat Enterprise Linux* built Python*: Python 2.7.5 (default, Feb 11 2014), NumPy 1.7.1, SciPy 0.12.1, multiprocessing 0.70a1 built with gcc 4.8.2; Hardware: 24 CPUs (HT ON), 2 Sockets (6 cores/socket), 2 NUMA nodes, Intel(R) Xeon(R) [email protected], RAM 24GB; Operating System: Red Hat Enterprise Linux Server release 7.0 (Maipo)

25. Training: Python + Numpy (MKL)

Much faster!
The most compute-intensive part takes ~5% of the total execution time
Configuration info: 96 CPUs (HT ON), 4 Sockets (12 cores/socket), 1 NUMA node, Intel(R) Xeon(R) E5-4657L [email protected], RAM 64GB, Operating System: Fedora release 23 (Twenty Three)

26. Legal Disclaimer & Optimization Notice

INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY
ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS
DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR
IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES
RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY
PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
Software and workloads used in performance tests may have been optimized for performance only on Intel
microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer
systems, components, software, operations and functions. Any change to any of those factors may cause the
results to vary. You should consult other information and performance tests to assist you in fully evaluating your
contemplated purchases, including the performance of that product when combined with other products.
For more complete information about compiler optimizations, see our Optimization Notice at
https://software.intel.com/en-us/articles/optimization-notice#opt-en.
Copyright © 2017, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks of Intel
Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of
others.
