SparkML basics
1. SparkML basics
Dmitry Bugaychenko
2.
3.
4. RDD Basics
5. RDD Basics
6. RDD Basics
7. RDD Basics
8. DataFrames
9. Datasets
10. SQL vs. DataFrame vs. Dataset
11. Spark ML Pipelines
12. Spark ML Pipelines
Transformer
13. Spark ML Pipelines
Transformer, Estimator
14. Spark ML Pipelines
15. Spark ML Pipelines
16. Spark ML Pipelines
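The pipeline slides name two building blocks, Transformer and Estimator. A minimal sketch of that contract follows, in plain Python rather than the Spark API. The classes `MeanCenterer`, `MeanCentererModel`, `Pipeline` and `PipelineModel` are hypothetical stand-ins written for this illustration, and rows are plain dicts instead of DataFrames.

```python
class MeanCentererModel:
    """Transformer: maps a dataset to a new dataset, no learning involved."""

    def __init__(self, column, mean):
        self.column = column
        self.mean = mean

    def transform(self, rows):
        # Return new rows; the input rows are left unchanged.
        return [{**r, self.column: r[self.column] - self.mean} for r in rows]


class MeanCenterer:
    """Estimator: fit() learns a statistic and returns a Transformer."""

    def __init__(self, column):
        self.column = column

    def fit(self, rows):
        mean = sum(r[self.column] for r in rows) / len(rows)
        return MeanCentererModel(self.column, mean)


class PipelineModel:
    """Fitted pipeline: applies every fitted stage in order."""

    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


class Pipeline:
    """Chain of stages; estimators are fitted on the output of earlier stages."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            # Estimators are fitted first; plain transformers are used as-is.
            model = stage.fit(rows) if hasattr(stage, "fit") else stage
            rows = model.transform(rows)
            fitted.append(model)
        return PipelineModel(fitted)


data = [{"x": 1.0}, {"x": 3.0}]
model = Pipeline([MeanCenterer("x")]).fit(data)
centered = model.transform(data)
```

Fitting a pipeline yields a model that is itself a transformer, so the same fitted object can be applied to training and scoring data alike.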
17. Spark ML Core
18. Field Metadata and Attributes
19. Prediction Model
20. “My Spark ML Model”
21. Spark ML Features
ETL: SQLTransformer, SqlFilter, ColumnsExtractor
Numerization: OneHotEncoder, StringIndexer, MultinomialExtractor
Vectorization: VectorAssembler, FeatureHasher, AutoAssembler
Feature Normalization: MaxAbsScaler, MinMaxScaler, Normalizer, QuantileDiscretizer, StandardScaler
Missing values: Imputer, NullToDefaultReplacer, NaNToMeanReplacer
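As an illustration of the feature normalization group, here is a plain-Python sketch of the arithmetic behind min-max and max-abs scaling for a single numeric column. It does not call Spark. The function names are hypothetical, and the handling of constant and all-zero columns is an assumption made for this sketch.

```python
def min_max_scale(values, lo=0.0, hi=1.0):
    """Rescale values linearly so the column minimum maps to lo and the maximum to hi."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # Assumption: a constant column maps to the midpoint of the target range.
        return [0.5 * (lo + hi) for _ in values]
    return [(v - v_min) / (v_max - v_min) * (hi - lo) + lo for v in values]


def max_abs_scale(values):
    """Divide by the largest absolute value, so results fall in [-1, 1]."""
    m = max(abs(v) for v in values)
    # Assumption: an all-zero column is returned as zeros.
    return [v / m if m else 0.0 for v in values]


scaled = min_max_scale([2.0, 4.0, 6.0])
signed = max_abs_scale([-4.0, 2.0, 1.0])
```

Max-abs scaling neither shifts nor centers the data, so zeros stay zeros, which matters for sparse vectors.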
22. Spark ML Features
Feature Engineering: DCT, ElementwiseProduct, Interaction, VectorIndexer, PolynomialExpansion
Dimension reduction: PCA, MinHashLSHModel, BucketedRandomProjectionLSH, RandomProjectionsHasher
Feature Selection: ChiSqSelector, FoldedFeaturesSelector
23. Spark ML Features
Text extraction: Tokenizer, RegexTokenizer, NGram, StopWordsRemover
NLP in Pravda-ML: LanguageDetectorTransformer, LanguageAwareAnalyzer, NGramExtractor, URLElimminator
Text vectorization: CountVectorizer, HashingTF, IDF
Text embedding: Word2Vec
Clustering: LDA, KMeans/BisectingKMeans, GaussianMixture
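The text vectorization group pairs term counting with IDF weighting. The sketch below shows that arithmetic in plain Python on already tokenized documents. It uses the smoothed form log((n + 1) / (df + 1)), where n is the number of documents and df is the number of documents containing the term. The helper names `fit_idf` and `tf_idf` are hypothetical, and giving unseen terms a weight of zero is an assumption of this sketch.

```python
import math
from collections import Counter


def fit_idf(docs):
    """Learn one IDF weight per term from a list of tokenized documents."""
    n = len(docs)
    # Document frequency: in how many documents each term occurs at least once.
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((n + 1) / (count + 1)) for term, count in df.items()}


def tf_idf(doc, idf):
    """Weight raw term counts of one document by the learned IDF values."""
    tf = Counter(doc)
    # Assumption: terms not seen during fitting get weight 0.0.
    return {term: count * idf.get(term, 0.0) for term, count in tf.items()}


corpus = [["spark", "ml"], ["spark", "sql"]]
idf = fit_idf(corpus)
weights = tf_idf(["ml", "ml", "spark"], idf)
```

A term that occurs in every document, such as "spark" in this toy corpus, gets an IDF of zero and drops out of the weighted vector.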
24. Spark ML Features
Regression
Classification
25. Spark ML Features
Recommendations: ALS, FPGrowth
Tuning: ParamGridBuilder, CrossValidator
Evaluation: BinaryClassificationEvaluator, ClusteringEvaluator, MulticlassClassificationEvaluator, RegressionEvaluator
More from Pravda-ML: CombinedModel, PartitionedRankingEvaluator, CRRSampler, XGBoost, StochasticHyperopt
26. Thank you for your attention!
Dmitry.Bugaychenko@corp.mail.ru