SparkML basics

1. SparkML basics

Dmitry Bugaychenko

2.

3.

4. RDD Basics

5. RDD Basics

6. RDD Basics

7. RDD Basics

8. DataFrames

9. Datasets

10. SQL vs. DataFrame vs. Dataset

11. Spark ML Pipelines

12. Spark ML Pipelines

Transformer

13. Spark ML Pipelines

Transformer
Estimator
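
The two roles named on this slide can be sketched in plain Python. This is a toy illustration only, assuming the usual Spark ML contract that an Estimator's fit() returns a fitted model which is itself a Transformer, and that a pipeline fits its stages in order. MinMaxEstimator, MinMaxModel and fit_pipeline are hypothetical names invented for this sketch, not the pyspark.ml API, and rows are plain dicts rather than DataFrames.

```python
# Plain-Python illustration of the Estimator/Transformer contract.
# All classes here are hypothetical stand-ins, not Spark classes.
from dataclasses import dataclass
from typing import Dict, List, Tuple

Row = Dict[str, float]


@dataclass(frozen=True)
class MinMaxModel:
    """Transformer: holds statistics learned by fit() and adds a scaled column."""
    column: str
    lo: float
    hi: float

    def transform(self, rows: List[Row]) -> List[Row]:
        span = self.hi - self.lo
        out = []
        for row in rows:
            # For a constant column this sketch emits 0.5 (an arbitrary choice here).
            scaled = 0.5 if span == 0 else (row[self.column] - self.lo) / span
            # Build a new row instead of mutating the input.
            out.append({**row, self.column + "_scaled": scaled})
        return out


@dataclass(frozen=True)
class MinMaxEstimator:
    """Estimator: learns min/max from data and returns a fitted Transformer."""
    column: str

    def fit(self, rows: List[Row]) -> MinMaxModel:
        values = [row[self.column] for row in rows]
        return MinMaxModel(self.column, min(values), max(values))


def fit_pipeline(stages: list, rows: List[Row]) -> Tuple[list, List[Row]]:
    """Fit stages in order; each fitted stage transforms the data for the next."""
    fitted = []
    for stage in stages:
        # Estimators are fitted first; plain Transformers are used as they are.
        model = stage.fit(rows) if hasattr(stage, "fit") else stage
        rows = model.transform(rows)
        fitted.append(model)
    return fitted, rows


if __name__ == "__main__":
    train = [{"x": 2.0}, {"x": 4.0}, {"x": 6.0}]
    models, scaled = fit_pipeline([MinMaxEstimator("x")], train)
    print(models[0])
    print(scaled)
```

The fitted models list plays the part of a fitted pipeline: it can be applied to new rows without touching the training data again.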

14. Spark ML Pipelines

15. Spark ML Pipelines

16. Spark ML Pipelines

17. Spark ML Core

18. Field Metadata and Attributes

19. Prediction Model

20. “My Spark ML Model”

21. Spark ML Features

ETL: SQLTransformer, SqlFilter, ColumnsExtractor
Numerization: OneHotEncoder, StringIndexer, MultinomialExtractor
Missing values: Imputer, NullToDefaultReplacer, NaNToMeanReplacer
Feature Normalization: MaxAbsScaler, MinMaxScaler, Normalizer, QuantileDiscretizer, StandardScaler
Vectorization: VectorAssembler, FeatureHasher, AutoAssembler

22. Spark ML Features

Feature Engineering: DCT, ElementwiseProduct, Interaction, VectorIndexer, PolynomialExpansion
Dimension reduction: PCA, MinHashLSHModel, BucketedRandomProjectionLSH, RandomProjectionsHasher
Feature Selection: ChiSqSelector, FoldedFeaturesSelector

23. Spark ML Features

Text extraction: Tokenizer, RegexTokenizer, NGram, StopWordsRemover
NLP in Pravda-ML: LanguageDetectorTransformer, LanguageAwareAnalyzer, NGramExtractor, URLElimminator
Text vectorization: CountVectorizer, HashingTF, IDF
Text embedding: Word2Vec
Clustering: LDA, KMeans/BisectingKMeans, GaussianMixture

24. Spark ML Features

Regression
Classification

25. Spark ML Features

Recommendations: ALS, FPGrowth
Tuning: ParamGridBuilder, CrossValidator
Evaluation: BinaryClassificationEvaluator, ClusteringEvaluator, MulticlassClassificationEvaluator, RegressionEvaluator
More from Pravda-ML: CombinedModel, PartitionedRankingEvaluator, CRRSampler, XGBoost, StochasticHyperopt

26. Thank you for your attention!

Dmitry.Bugaychenko@corp.mail.ru