https://habr.com/ru/company/skillfactory/blog/510688/ . What is a p-value?
http://www.stochasticlifestyle.com/the-essential-tools-of-scientific-machine-learning-scientific-ml/
https://habr.com/ru/post/475552/ Blitz-testing machine learning algorithms: feed your dataset to scikit-learn
https://habr.com/ru/post/460557/
https://habr.com/ru/post/462961/ . ML Digest
http://themlbook.com/wiki/doku.php
https://vas3k.ru/blog/machine_learning/
https://ml-cheatsheet.readthedocs.io/
https://github.com/danielhanchen/hyperlearn/blob/master/Modern%20Big%20Data%20Algorithms%20(Lower%20quality%20PDF).pdf
https://habr.com/ru/post/453290/ Data Science Digest
https://github.com/kmario23/deep-learning-drizzle
https://github.com/trekhleb/homemade-machine-learning . Homemade ML in Jupyter notebooks
https://habr.com/ru/post/449260/ . AutoML
https://github.com/mljar/mljar-supervised . AutoML
https://ai.googleblog.com/2019/05/an-end-to-end-automl-solution-for.html . AutoML
https://news.ycombinator.com/item?id=19712465 . ML workflow
https://www.textbook.ds100.org/ Introduction to data science
https://github.com/machinelearningmindset/machine-learning-course
https://blog.floydhub.com/introduction-to-anomaly-detection-in-python/
https://ml-cheatsheet.readthedocs.io/en/latest/loss_functions.html Loss function
https://gombru.github.io/2018/05/23/cross_entropy_loss/
https://residentmario.github.io/machine-learning-notes/kernels.html
https://aws.amazon.com/training/learning-paths/machine-learning/
https://www.youtube.com/playlist?list=PLl8OlHZGYOQ7bkVbuRthEsaLr7bONzbXS . CORNELL CS4780
https://news.ycombinator.com/item?id=20570025 . ML Books
https://github.com/r0f1/datascience a list of links
https://deepmind.com/blog/unsupervised-learning/
https://www.octavian.ai/machine-learning-on-graphs-course
https://jinchuika.com/en/post/1-preprocessing-part-1/ . Preprocessing
https://skymind.ai/wiki/
https://github.com/clone95/Machine-Learning-Study-Path/blob/master/README.md
In math terms, an operation F is linear if scaling inputs scales the output, and adding inputs adds the outputs:
F(ax) = a * F(x)
F(x + y) = F(x) + F(y)
Linear Models
https://habrahabr.ru/company/ods/blog/323890/ Linear models
https://medium.freecodecamp.org/learn-how-to-improve-your-linear-models-8294bfa8a731
http://www.jmlr.org/papers/volume18/17-468/17-468.pdf . Automatic Differentiation
Statistical tests:
https://lindeloev.github.io/tests-as-linear/
https://www.youtube.com/watch?v=enpPFqcIFj8&list=PLlb7e2G7aSpRb95_Wi7lZ-zA6fOjV3_l7 .
Data Analysis in Python in Examples and Problems
https://distill.pub/2019/visual-exploration-gaussian-processes/ . Gaussian process
https://blog.finxter.com/python-linear-regression-1-liner/
from sklearn.linear_model import LinearRegression
import numpy as np
## Data (Apple stock prices)
apple = np.array([155, 156, 157])
n = len(apple)
## One-liner
model = LinearRegression().fit(np.arange(n).reshape((n,1)), apple)
print(model.predict([[3],[4]]))  # predict days 3 and 4 -> [158. 159.]
## Result
print(model.coef_)
# [1.]
print(model.intercept_)
# 155.0
Linear regression can be applied to model a non-linear relationship between input and response.
This can be done by replacing the input x with some nonlinear function φ(x).
Note that doing so preserves linearity as a function of the parameters w.
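A minimal sketch of this basis-expansion idea (assuming sklearn and numpy; the noisy quadratic data is made up for illustration):
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
## Noisy quadratic data: y ~ x^2
x = np.linspace(-3, 3, 50).reshape(-1, 1)
y = x.ravel() ** 2 + np.random.default_rng(0).normal(0, 0.1, 50)
## phi(x) = [x, x^2]; the model is nonlinear in x but still linear in w
phi = PolynomialFeatures(degree=2, include_bias=False)
model = LinearRegression().fit(phi.fit_transform(x), y)
print(model.coef_)  # the x^2 coefficient should come out close to 1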
https://habr.com/ru/company/mailru/blog/513842/. different types of regression
https://www.youtube.com/watch?v=68ABAU_V8qI . Linear models
https://github.com/Yorko/mlcourse.ai
https://medium.com/@vimarshk . ML interview
https://github.com/trekhleb/homemade-machine-learning
https://jalammar.github.io/ visualization of ML concepts
http://blog.christianperone.com/2019/01/a-sane-introduction-to-maximum-likelihood-estimation-mle-and-maximum-a-posteriori-map/
Logistic regression
https://towardsdatascience.com/logistic-regression-b0af09cdb8ad
https://habr.com/ru/post/485872/
https://realpython.com/logistic-regression-python/
https://towardsdatascience.com/10-gradient-descent-optimisation-algorithms-86989510b5e9
https://github.com/turingbirds/gradient_descent/blob/master/gradient_descent.ipynb
https://raiboso.me/backpropagation-demo/
https://www.reddit.com/r/learnmachinelearning/comments/ax6ep5/machine_learning_git_codebook_case_study_of/
https://hackernoon.com/tackle-bias-and-other-problems-solutions-in-machine-learning-models-f4274c5fe538
https://erikbern.com/2018/10/08/the-hackers-guide-to-uncertainty-estimates.html
https://brohrer.github.io/how_modeling_works_1.html
https://github.com/zekelabs/data-science-complete-tutorial
https://dyakonov.org/
https://github.com/AntonioErdeljac/Google-Machine-Learning-Course-Notes
https://github.com/robertmartin8/udemyML . code and notes for Kirill Eremenko's Machine Learning course
https://habr.com/ru/company/singularis/blog/440026/ . Real Kaggle project
Books
http://themlbook.com . The Hundred-Page Machine Learning Book (Andriy Burkov)
https://github.com/jakevdp/PythonDataScienceHandbook
https://news.ycombinator.com/item?id=19296031
https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/ e-book
https://play.google.com/store/books/details/Николенко_Сергей_Игоревич_Глубокое_обучение?id=Zi48DwAAQBAJ
https://john.specpal.science/deepvision/
https://jakevdp.github.io/PythonDataScienceHandbook/ BOOK ONLINE
http://www.cs.huji.ac.il/~shais/UnderstandingMachineLearning/ Book
https://github.com/zackchase/mxnet-the-straight-dope e-book
https://github.com/rasbt/python-machine-learning-book-2nd-edition ML book with python code
https://christophm.github.io/interpretable-ml-book/ . Book
http://www.cs-114.org/wp-content/uploads/2015/01/Elements_of_Information_Theory_Elements.pdf
https://www.microsoft.com/en-us/research/publication/pattern-recognition-machine-learning/ Bishop Book
ISLR book and videos:
http://auapps.american.edu/alberto/www/analytics/ISLRLectures.html
https://github.com/JWarmenhoven/ISLR-python/tree/master/Notebooks
https://mml-book.github.io/
http://www.inference.phy.cam.ac.uk/itprnn/book.pdf David MacKay. Information Theory, Inference and Learning Algorithms
http://mbmlbook.com/
https://universalflowuniversity.com/ulibrary/?drawer1=Computer%20Programming*Neural%20Networks%20and%20Deep%20Learning
https://github.com/joelgrus/data-science-from-scratch - Code from book "Data science from scratch"
https://news.ycombinator.com/item?id=18201986
Metaheuristics
https://proplot.readthedocs.io/en/stable/
https://habr.com/ru/post/688820/
http://www2.cscamm.umd.edu/publications/BookChapter_CS-09-13.pdf
https://cs.gmu.edu/~sean/book/metaheuristics/Essentials.pdf
https://medium.com/huggingface/from-zero-to-research-an-introduction-to-meta-learning-8e16e677f78a MetaLearning
https://sgfin.github.io/learning-resources/
https://see.stanford.edu/Course/CS229
https://github.com/danielhanchen/hyperlearn/blob/master/Modern%20Big%20Data%20Algorithms.pdf
https://www.coursera.org/promo/NEXTExtended
https://habr.com/company/tssolution/blog/423783/ Splunk
SageMaker:
https://towardsdatascience.com/building-fully-custom-machine-learning-models-on-aws-sagemaker-a-practical-guide-c30df3895ef7
Deployment
https://towardsdatascience.com/create-a-complete-machine-learning-web-application-using-react-and-flask-859340bddb33
https://www.inovex.de/blog/machine-learning-model-management/
https://towardsdatascience.com/deploying-a-machine-learning-model-as-a-rest-api-4a03b865c166 . Flask Rest API for model
https://heartbeat.fritz.ai/brilliant-beginners-guide-to-model-deployment-133e158f6717
https://towardsdatascience.com/deploying-a-keras-deep-learning-model-as-a-web-application-in-p-fc0f2354a7ff
https://habr.com/ru/company/otus/blog/442918/
https://www.dataquest.io/blog/learning-curves-machine-learning/
https://arxiv.org/abs/1809.10756 . probabilistic programming
https://github.com/Avik-Jain/100-Days-Of-ML-Code
https://github.com/seddonr/Ng_ML . Ng's Coursera course implemented in Python
https://www.youtube.com/channel/UCsBKTrp45lTfHa_p49I2AEQ Brandon Rohrer
Automatic differentiation
https://github.com/tensorflow/swift/blob/master/docs/AutomaticDifferentiation.md
https://www.sanyamkapoor.com/machine-learning/autograd-magic/ . Automatic Differentiation and back propagation
https://aws.amazon.com/training/learning-paths/machine-learning/
http://www.fast.ai/2018/09/26/ml-launch/ . Online Course
Boosting and bagging
https://habr.com/ru/company/piter/blog/488362/
https://towardsdatascience.com/ensemble-methods-bagging-boosting-and-stacking-c9214a10a205
https://medium.com/mlreview/gradient-boosting-from-scratch-1e317ae4587d
https://habr.com/ru/company/piter/blog/445780/
https://quantdare.com/what-is-the-difference-between-bagging-and-boosting/
An ensemble method combines many weak learners, all based on the same learning algorithm,
to build a (stronger) learner whose performance is better
than that of any individual learner. Ensemble methods help reduce bias and/or variance.
Boosting is quite different from bagging:
- Individual classifiers are fitted sequentially.
- Poorly performing classifiers are given less weight.
- At each iteration, the observations are weighted differently.
XGBoost
https://habr.com/ru/company/mailru/blog/438560/
https://habr.com/ru/company/mailru/blog/438562/
https://saru.science/tech/2018/02/15/kl-divergence-explanation.html
Kullback-Leibler divergence
https://news.ycombinator.com/item?id=17916981
https://www.coursera.org/learn/machine-learning-projects/
https://www.youtube.com/user/PyDataTV/videos
https://bloomberg.github.io/foml/#lectures
https://appliedmachinelearning.blog/
https://ml-cheatsheet.readthedocs.io/en/latest/
https://github.com/afshinea/stanford-cs-229-machine-learning/blob/master/super-cheatsheet-machine-learning.pdf
https://stanford.edu/~shervine/teaching/cs-229/
http://anotherdatum.com/index2.html
ML BOOK with code:
https://arxiv.org/pdf/1803.08823
http://physics.bu.edu/~pankajm/ML-Notebooks/NotebooksforMLReview.zip - jupyter notebooks (zip)
X-means http://docs.splunk.com/Documentation/MLApp/3.4.0/User/Algorithms#X-means
The X-means clustering algorithm is an extension of k-means
that automatically determines the number of clusters using the Bayesian Information Criterion (BIC).
It is convenient when there is no prior information about how many clusters
the data may fall into.
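X-means itself is not in sklearn; as a hedged sketch of the same idea (pick the cluster count by minimizing BIC), sklearn's GaussianMixture exposes a .bic() method:
import numpy as np
from sklearn.mixture import GaussianMixture
## Three well-separated blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(100, 2)) for c in (0, 5, 10)])
## Fit mixtures with k = 1..7 components and keep the k with the lowest BIC
bics = {k: GaussianMixture(n_components=k, random_state=0).fit(X).bic(X) for k in range(1, 8)}
print(min(bics, key=bics.get))  # expected: 3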
RobustScaler http://docs.splunk.com/Documentation/MLApp/3.4.0/User/Algorithms#RobustScaler
This is a data preprocessing algorithm. It is similar in use to StandardScaler,
which transforms the data so that every feature has mean 0 and variance 1,
putting all features on the same scale.
This scaling, however, does not guarantee any particular minimum or maximum feature values.
RobustScaler is similar to StandardScaler in that the resulting features share the same scale.
However, RobustScaler uses the median and quartiles instead of the mean and variance.
This lets RobustScaler ignore outliers and measurement errors that can be a problem for other scaling
methods.
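A small sketch contrasting the two scalers on data with one outlier (standard sklearn API):
import numpy as np
from sklearn.preprocessing import StandardScaler, RobustScaler
X = np.array([[1.0], [2.0], [3.0], [4.0], [1000.0]])  # 1000 is an outlier
## StandardScaler's mean/variance are dragged by the outlier...
print(StandardScaler().fit_transform(X).ravel())
## ...while RobustScaler's median/quartiles barely move
print(RobustScaler().fit_transform(X).ravel())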
Links
https://sandipanweb.wordpress.com/
https://habr.com/company/intel/blog/417809/ . NN architectures for image recognition
https://kite.com/blog/python/data-analysis-visualization-python
https://habr.com/company/nixsolutions/blog/417935/ AI cheat sheets
https://thegradient.pub/why-rl-is-flawed/
https://habr.com/post/418249/ . Google VM for ML
https://medium.com/syncedreview/google-ai-chief-jeff-deans-ml-system-architecture-blueprint-a358e53c68a5
https://news.ycombinator.com/item?id=17667705 . ML intro
https://news.ycombinator.com/item?id=17422770 Matrix 101 for ML
https://news.ycombinator.com/item?id=17664084 Math for ML
http://tools.google.com/seedbank/
https://developers.google.com/machine-learning/guides/
https://codequs.com/p/BkaLEq8r4/a-complete-machine-learning-project-walk-through-in-python
https://morioh.com/p/b56ae6b04ffc/a-complete-machine-learning-project-walk-through-in-python
ML from start to end
Open Machine Learning
https://towardsdatascience.com/forecasting-with-python-and-tableau-dd37a218a1e5 . Tableau+ARIMA+Python
https://mlcrunch.blogspot.com/2018/08/dimensionality-reduction-techniques-guide-python.html
https://github.com/Avik-Jain/100-Days-Of-ML-Code
https://sandipanweb.wordpress.com/2018/05/31/8626/
http://ciml.info/
https://news.ycombinator.com/item?id=17214588
http://ods.ai/
https://habrahabr.ru/company/ods/blog/344044/ Open Data Science
https://habrahabr.ru/company/ods/blog/325422/ Open Machine Learning Course. Topic 6: Feature engineering and feature selection
Part 1
Part 2
Part 3
https://towardsdatascience.com/another-machine-learning-walk-through-and-a-challenge-8fae1e187a64
## Russian translation of 3 links above:
https://habr.com/company/nixsolutions/blog/425253
https://habr.com/company/nixsolutions/blog/425907/
https://habr.com/company/nixsolutions/blog/426771/
https://github.com/esokolov/ml-course-hse (ru)
intrepretable-machine-learning-nfl
https://spandan-madan.github.io/DeepLearningProject/ End to End Implementation
https://spandan-madan.github.io/DeepLearningProject/docs/Deep_Learning_Project-Pytorch.html
https://towardsdatascience.com/visualizing-data-with-pair-plots-in-python-f228cf529166 Pair plots
Markov Chain Monte Carlo
https://towardsdatascience.com/markov-chain-monte-carlo-in-python-44f7e609be98
https://habr.com/ru/company/piter/blog/491268/
https://news.ycombinator.com/item?id=19633212
http://arogozhnikov.github.io/2016/12/19/markov_chain_monte_carlo.html
https://news.ycombinator.com/item?id=15986687 Markov chain Monte-Carlo
http://www.moderndescartes.com/essays/deep_dive_mcts/ monte carlo tree search
https://skymind.ai/wiki/generative-adversarial-network-gan
https://habr.com/post/429276/ A variational autoencoder (VAE) is a generative model
that learns to map objects into a given latent space.
https://www.youtube.com/watch?v=Lo1rXJdAJ7w C++ ML
https://software.intel.com/en-us/ai-academy Intel AI
https://research.fb.com/the-facebook-field-guide-to-machine-learning-video-series/ Facebook ML video series
https://medium.com/@deepsystems
https://datamonsters.com/ company
https://eli.thegreenplace.net/2018/minimal-character-based-lstm-implementation/
http://www.wildml.com/
PyTorch
https://habr.com/company/otus/blog/358096/
https://habr.com/company/piter/blog/354912/
https://www.reddit.com/r/Python/comments/878vjb/compute_distance_between_strings_30_algorithms/
https://thomaswdinsmore.com/
https://towardsdatascience.com/data-science-interview-guide-4ee9f5dc7784
https://medium.com/acing-ai/apple-ai-interview-questions-acing-the-ai-interview-803a65b0e795
https://towardsdatascience.com/data-science-and-machine-learning-interview-questions-3f6207cf040b
http://savvastjortjoglou.com/intrepretable-machine-learning-nfl-combine.html
PDF
QnA
My code
Neural Networks and Image Processing
https://towardsdatascience.com/building-prediction-apis-in-python-part-4-decoupling-the-model-and-api-4b5eaf2ed125
Statistics
https://en.wikipedia.org/wiki/Correlation_and_dependence
http://pages.cs.wisc.edu/~tdw/files/cookbook-en.pdf
https://etav.github.io/articles/ida_eda_method.html
http://statistics.zone/
https://h4labs.wordpress.com/2017/12/30/learning-probability-and-statistics/
https://news.ycombinator.com/item?id=18462520 . estimate probability of yet unhappen
Calculating avg and stdev on stream
--------------------------------------
https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
https://math.stackexchange.com/questions/20593/calculate-variance-from-a-stream-of-sample-values
https://blog.superfeedr.com/streaming-percentiles/
https://www.johndcook.com/blog/standard_deviation/
https://dev.to/nestedsoftware/calculating-a-moving-average-on-streaming-data-5a7k
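A minimal sketch of Welford's online algorithm, the approach behind the links above: one pass, O(1) memory, numerically stable running mean and variance.
class RunningStats:
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean
    def push(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
    def variance(self):  # sample variance
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

rs = RunningStats()
for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    rs.push(x)
print(rs.mean, rs.variance() ** 0.5)  # 5.0, ~2.14 (sample stdev)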
https://en.wikipedia.org/wiki/Receiver_operating_characteristic ROC curve
https://habrahabr.ru/post/311092/ standard distributions
https://en.wikipedia.org/wiki/Outlier
https://medium.com/netflix-techblog/rad-outlier-detection-on-big-data-d6b0494371cc
https://en.wikipedia.org/wiki/Maximum_likelihood_estimation
https://en.wikipedia.org/wiki/Precision_and_recall
https://data36.com/statistical-bias-types-explained/
https://data36.com/statistical-bias-types-examples-part2/
Precision is the number of correct positive classifications divided by the total number of positive labels assigned.
precision = true positives / (true positives + false positives)
Recall is the number of correct positive classifications divided by the number of positive instances that should have been identified.
recall = true positives / (true positives + false negatives)
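A minimal sketch of these two definitions using plain counts (toy labels, no library needed):
def precision_recall(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fp), tp / (tp + fn)

print(precision_recall([1, 1, 0, 1, 0, 0], [1, 0, 1, 1, 0, 0]))  # (0.667, 0.667): tp=2, fp=1, fn=1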
https://en.wikipedia.org/wiki/Quantile
https://www.analyticsvidhya.com/blog/2017/02/basic-probability-data-science-with-examples/
https://en.wikipedia.org/wiki/Simpson%27s_paradox
Bayes
https://greenteapress.com/wp/think-bayes/
https://habr.com/ru/post/510526/ bayes in python
http://web.ipac.caltech.edu/staff/fmasci/home/astro_refs/Science-2013-Efron.pdf
https://habrahabr.ru/post/337028/ video bayes deep ML
https://www.sanyamkapoor.com/machine-learning/the-beauty-of-bayesian-learning/
https://medium.freecodecamp.org/statistical-inference-showdown-the-frequentists-vs-the-bayesians-4c1c986f25de
https://www.analyticsvidhya.com/blog/2017/03/conditional-probability-bayes-theorem/
https://malobukov.dreamwidth.org/7960.html bayes
https://www.datascience.com/blog/introduction-to-bayesian-inference-learn-data-science-tutorials
https://news.ycombinator.com/item?id=18213117
In the case of normally distributed data,
the three sigma rule means that roughly 1 in 22 observations will differ from the mean by two or more standard deviations,
and 1 in 370 will deviate by three or more standard deviations.
Probability density function for the normal distribution with sigma = 1:
f(x) = exp(-(x - mu)^2 / 2) / sqrt(2*pi)
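A quick check of the 1-in-22 and 1-in-370 figures (assuming scipy is available): the two-sided tail mass beyond k standard deviations is 2 * norm.sf(k).
from scipy.stats import norm
for k in (2, 3):
    p = 2 * norm.sf(k)  # P(|X - mu| >= k * sigma)
    print(k, p, 1 / p)  # k=2 -> ~0.0455 (~1 in 22); k=3 -> ~0.0027 (~1 in 370)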
https://www.dataquest.io/onboarding
https://www.dataquest.io/blog/learning-curves-machine-learning/
http://efavdb.com/
https://www.hardikp.com/
https://unsupervisedpandas.com/
https://www.zabaras.com/statisticalcomputing
Signal Processing
https://terpconnect.umd.edu/~toh/spectrum/
https://habr.com/post/358868/ Kalman filter
Machine Learning
Machine Learning code snippets
List of machine learning concepts
Tour-of-machine-learning-algorithms
Regression
https://developers.google.com/machine-learning/crash-course/
https://avva.livejournal.com/3074895.html#comments
https://robertheaton.com/2014/05/02/jaccard-similarity-and-minhash-for-winners/
http://efavdb.com/
https://talkery.io/conferences/507?pageNumber=1 PyData 2017 videos
www.wildml.com/2017/12/ai-and-deep-learning-in-2017-a-year-in-review/
https://habr.com/company/ods/blog/354944/
https://habrahabr.ru/company/itinvest/blog/262155/ TOP 10 ML algo
https://habrahabr.ru/company/cloud4y/blog/346968/
https://habrahabr.ru/post/347008/
https://habrahabr.ru/post/349048/ Autoencoders
https://habrahabr.ru/company/ods/blog/325422/ Feature extraction
https://github.com/featuretools/featuretools
https://www.youtube.com/watch?v=BfS2H1y6tzQ
https://www.youtube.com/watch?v=GsAVf3fn3yM&feature=youtu.be Artificial Intelligence with Python | Sequence Learning
https://www.youtube.com/watch?v=RLsKzkxWpK8
https://github.com/AxeldeRomblay/MLBox
https://habrahabr.ru/company/ods/blog/350440/ Gini index
Apple
https://github.com/apple/coremltools
https://attardi.org/pytorch-and-coreml
https://github.com/apple/turicreate
https://news.ycombinator.com/item?id=15406237 Apple CoreML
https://machinelearning.apple.com/2017/08/06/siri-voices.html
https://news.ycombinator.com/item?id=16364826
NLP
Speech recognition
https://habrahabr.ru/post/350222/
https://blog.acolyer.org/2016/04/21/the-amazing-power-of-word-vectors/
https://news.ycombinator.com/item?id=16626374 word2vec
http://fast.ai
https://github.com/vicky002/AlgoWiki/blob/gh-pages/Machine-Learning/Sources.md
http://www.inference.vc/design-patterns/
https://notebooks.azure.com/jakevdp/libraries/PythonDataScienceHandbook
https://eli.thegreenplace.net/tag/machine-learning
http://course.fast.ai/
http://learningsys.org/nips17/assets/slides/dean-nips17.pdf TPU Google
R
https://radiant-rstats.github.io/docs/index.html
https://rattle.togaware.com/
Visualization and ML packages
https://veusz.github.io/
https://www.knime.com/
https://rapidminer.com/
https://sourceforge.net/projects/weka/
https://orange.biolab.si/
https://elki-project.github.io/
MATLAB book
https://www.amazon.com/Exploratory-Analysis-Chapman-Computer-Science/dp/149877606X Exploratory Data Analysis with MATLAB, Third Edition
JavaScript
http://propelml.org/
https://news.ycombinator.com/item?id=16465105
Clustering
https://habrahabr.ru/post/164417/
https://www.youtube.com/watch?v=-_gIcc5_uHY
https://habrahabr.ru/post/322034/ DBSCAN
https://en.wikipedia.org/wiki/DBSCAN DBSCAN
https://towardsdatascience.com/a-gentle-introduction-to-hdbscan-and-density-based-clustering-5fd79329c1e8
https://mubaris.com/2017/10/01/kmeans-clustering-in-python/
https://www.analyticsvidhya.com/blog/2016/01/complete-tutorial-ridge-lasso-regression-python/
Bias is the difference between your model's expected predictions and the true values.
The error due to bias is taken as the difference between the expected (or average) prediction of our model and the correct value which we are trying to predict.
The error due to variance is taken as the variability of a model prediction for a given data point.
The variance is how much the predictions for a given point vary between different realizations of the model.
A small sample size is a source of variance; increasing the sample size makes the results more consistent.
The results might still be highly inaccurate due to large sources of bias, but the variance of the predictions will be reduced.
Variance refers to your algorithm's sensitivity to the specific set of training data.
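In equation form (the standard decomposition for squared-error loss, stated here for reference):
E[(y - fhat(x))^2] = (E[fhat(x)] - f(x))^2 + Var[fhat(x)] + sigma^2, i.e. Bias^2 + Variance + irreducible noise.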
https://oneraynyday.github.io/ml/2017/08/08/Bias-Variance-Tradeoff/
High bias, low variance: models are consistent but inaccurate on average.
High variance, low bias: models are inconsistent but accurate on average.
Low variance tends to go with simpler algorithms (regression, naive Bayes, linear, parametric).
Low bias tends to go with more complex algorithms (decision trees, nearest neighbors, non-parametric).
https://medium.com/@kevin_yang/simple-approximate-nearest-neighbors-in-python-with-annoy-and-lmdb-e8a701baf905
Regression algorithms can be regularized to reduce complexity.
A decision tree can be pruned to reduce complexity.
Too complex model -> overfitting
Too simple model -> underfitting
The Linear model does not fit the data very well and is therefore said to have a higher bias than the polynomial model.
Overfitting:
---------------
Our model doesn't generalize well from our training data to unseen data.
Cross-validation is a powerful preventative measure against overfitting.
K-fold cross-validation: partition the data into k subsets, called folds; then iteratively train the algorithm on k-1 folds while using the remaining fold as the test set (the "holdout fold"). See the sketch after this list.
Other ways to reduce overfitting:
- Remove features
- Regularization: you could prune a decision tree, use dropout on a neural network, or add a penalty parameter to the cost function in regression.
- Early stopping
When you’re training a learning algorithm iteratively, you can measure how well each iteration of the model performs.
Up until a certain number of iterations, new iterations improve the model. After that point, however, the model’s ability to generalize can weaken as it begins to overfit the training data.
Early stopping means stopping the training process before the learner passes that point.
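A minimal k-fold cross-validation sketch, as referenced above (assuming sklearn; iris and logistic regression are just placeholders):
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
X, y = load_iris(return_X_y=True)
## Train on 4 folds, score on the holdout fold, 5 times
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores.mean(), scores.std())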
Underfitting
--------------
Underfitting occurs when a model is too simple (informed by too few features or regularized too much), which makes it too inflexible to learn from the dataset.
In both Machine Learning and Curve Fitting, you want to come up with a model that explains (fits) the data. However, the difference in the end goal is both subtle and profound.
In Curve Fitting, we have all the data available to us at the time of fitting the curve. We want to fit the curve as best as we can.
In Machine Learning, only a small set (the training set) of data is available at the time of training. We obviously want a model that fits the data well, but more importantly, we want the model to generalize to unseen data points
http://blog.dlib.net/2017/12/a-global-optimization-algorithm-worth.html
https://towardsdatascience.com/improving-vanilla-gradient-descent-f9d91031ab1d
Classification is forecasting the target class/category.
Regression is forecasting a value.
Logistic regression: the dependent variable is categorical.
https://www.analyticsvidhya.com/blog/2017/08/skilltest-logistic-regression/
The logistic function predicts the probability of the target class:
y = 1 / (1 + exp(-f(x))), in the range from 0 to 1.
if f(x) = 0, then y = 0.5
if f(x) is large and negative, then y -> 0
if f(x) is large and positive, then y -> 1
f(x) = a*x + b, where x is the input vector and a is a parameter vector.
The goal is to find a.
The common method is the maximum (log-)likelihood criterion; gradient descent can be used.
x - random outcomes
theta - parameter
L(theta| x) = P(x | theta)
Because the logarithm is a monotonically increasing function, the logarithm of a function achieves its maximum value at the same points as the function itself, and hence the log-likelihood can be used in place of the likelihood in maximum likelihood estimation
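The logistic function from the formulas above, checked at a few points (plain Python, nothing beyond the standard library):
import math
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))
print(sigmoid(0))    # 0.5
print(sigmoid(-10))  # ~0: f(x) large and negative
print(sigmoid(10))   # ~1: f(x) large and positive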
Regularization: to decrease overfitting.
https://habr.com/ru/post/456176/ . L1 and L2 Stochastic Gradient Descent
https://www.analyticsvidhya.com/blog/2016/01/complete-tutorial-ridge-lasso-regression-python/
https://www.quora.com/What-is-the-difference-between-L1-and-L2-regularization-How-does-it-solve-the-problem-of-overfitting-Which-regularizer-to-use-and-when
L2 (ridge): penalizes the Euclidean norm of the weights; shrinks all coefficients smoothly toward zero.
L1 (lasso): produces many coefficients that are exactly zero or very small, with a few large coefficients.
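A sketch of that sparsity effect (assuming sklearn; the synthetic data, where only 2 of 10 features matter, is made up for illustration):
import numpy as np
from sklearn.linear_model import Lasso, Ridge
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(0, 0.1, 200)
print(Ridge(alpha=1.0).fit(X, y).coef_.round(2))  # L2: all coefficients small but nonzero
print(Lasso(alpha=0.1).fit(X, y).coef_.round(2))  # L1: irrelevant coefficients driven to exactly 0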
Bagging and other resampling techniques can be used to reduce the variance in model predictions.
In bagging (Bootstrap Aggregating), numerous replicates of the original data set are created using random selection with replacement.
Each derivative data set is then used to construct a new model and the models are gathered together into an ensemble.
To make a prediction, all of the models in the ensemble are polled and their results are averaged.
Bagging attempts to reduce the chance of overfitting complex models.
It trains a large number of "strong" learners in parallel.
A strong learner is a model that's relatively unconstrained.
Bagging then combines all the strong learners together in order to "smooth out" their predictions.
Boosting attempts to improve the predictive flexibility of simple models.
It trains a large number of "weak" learners in sequence.
A weak learner is a constrained model (i.e. you could limit the max depth of each decision tree).
Each one in the sequence focuses on learning from the mistakes of the one before it.
Boosting then combines all the weak learners into a single strong learner.
While bagging and boosting are both ensemble methods, they approach the problem from opposite directions.
Bagging uses complex base models and tries to "smooth out" their predictions, while boosting uses simple base models and tries to "boost" their aggregate complexity.
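A sketch of this contrast (assuming sklearn; the synthetic dataset is a placeholder): bagging averages deep, unconstrained trees trained in parallel on bootstrap samples, while boosting chains shallow stumps trained in sequence.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier
X, y = make_classification(n_samples=500, random_state=0)
bagging = BaggingClassifier(DecisionTreeClassifier(max_depth=None), n_estimators=100, random_state=0).fit(X, y)  # "strong" base learners
boosting = GradientBoostingClassifier(max_depth=1, n_estimators=100, random_state=0).fit(X, y)  # "weak" base learners (stumps)
print(bagging.score(X, y), boosting.score(X, y))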
https://www.analyticsvidhya.com/blog/2017/02/40-questions-to-ask-a-data-scientist-on-ensemble-modeling-techniques-skilltest-solution/
Decision Tree
https://github.com/Yorko/mlcourse.ai/blob/master/jupyter_russian/topic03_decision_trees_knn/topic3_trees_knn.ipynb
https://dataaspirant.com/2017/01/30/how-decision-tree-algorithm-works/
https://heartbeat.fritz.ai/introduction-to-decision-tree-learning-cd604f85e236
http://www.win-vector.com/blog/2017/01/why-do-decision-trees-work/
The primary challenge in a decision tree implementation is identifying which attribute to use at the root node and at each subsequent level.
In the decision tree algorithm, choosing the splits and forming the rules is done using information gain and the Gini index.
Information gain measures the expected reduction in entropy from sorting on the attribute.
The Gini index measures how often a randomly chosen element would be incorrectly labeled; an attribute with a lower Gini index should be preferred.
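Both criteria in a few lines (a minimal sketch from the definitions above; "node" is a toy list of class labels):
from collections import Counter
import math
def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())
def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())
node = ['a', 'a', 'a', 'b']
print(gini(node))     # 1 - (0.75^2 + 0.25^2) = 0.375
print(entropy(node))  # ~0.811 bits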
Random Forest
https://habr.com/ru/company/ruvds/blog/488342/
https://habr.com/ru/company/piter/blog/488362/
Random forest is a supervised learning algorithm that can be used for both classification and regression problems.
It works by training numerous decision trees each based on a different resampling of the original training data.
In Random Forests the bias of the full model is equivalent to the bias of a single decision tree (which itself has high variance).
By creating many of these trees (in effect a "forest") and then averaging them, the variance of the final model can be greatly reduced compared to a single tree.
In practice the only limit on the size of the forest is computing time: an infinite number of trees could be trained without ever increasing bias, with a continual (if asymptotically declining) decrease in variance.
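A minimal sketch of that variance reduction (assuming sklearn; iris is a placeholder dataset): the spread of cross-validation scores typically shrinks as the forest grows.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
X, y = load_iris(return_X_y=True)
for n in (1, 10, 100):
    scores = cross_val_score(RandomForestClassifier(n_estimators=n, random_state=0), X, y, cv=5)
    print(n, round(scores.mean(), 3), round(scores.std(), 3))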
https://victorzhou.com/blog/intro-to-random-forests/
http://dataaspirant.com/2017/05/22/random-forest-algorithm-machine-learing/
https://medium.com/@williamkoehrsen/random-forest-simple-explanation-377895a60d2d
https://medium.com/@williamkoehrsen/random-forest-in-python-24d0893d51c0
https://r4ds.had.co.nz/
https://news.ycombinator.com/item?id=19632052
Separate training and test sets
------------------------------------
Split the data into three sets: training (60%), validation (a.k.a. development) (20%), and test (20%).
Use the training set to train different models, the validation set to select a model
and finally report performance on the test set.
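The 60/20/20 split via two calls to sklearn's train_test_split (iris here is only a stand-in dataset):
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
X, y = load_iris(return_X_y=True)
## First peel off 40%, then split that 40% evenly into validation and test
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)
print(len(X_train), len(X_val), len(X_test))  # 90 30 30 for iris's 150 rows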
- Trying appropriate algorithms (No Free Lunch)
- Fitting model parameters
- Tuning impactful hyperparameters
- Proper performance metrics
- Systematic cross-validation
https://blog.statsbot.co/machine-learning-algorithms-183cc73197c
https://towardsdatascience.com/battle-of-the-deep-learning-frameworks-part-i-cff0e3841750
http://pbpython.com/categorical-encoding.html
https://www.kaggle.com/dansbecker/using-categorical-data-with-one-hot-encoding
https://github.com/onurakpolat/awesome-analytics
ML cheatsheets
Tensor Flow
ML DLIB C++
https://medium.com/@mngrwl/explained-simply-how-deepmind-taught-ai-to-play-video-games-9eb5f38c89ee
CoreML
ML blog
ML method
ML plan
MLOSS.org
distill.pub
### PCA
https://joellaity.com/2018/10/18/pca.html
PCA 1
PCA 2
Q & A
https://habrahabr.ru/company/newprolab/blog/350584/ t-SNE and UMAP
http://www.datatau.com/
https://www.bonaccorso.eu/
https://habrahabr.ru/company/oleg-bunin/blog/340184/ Architectures of NN
https://morfizm.livejournal.com/1136917.html BitFunnel
https://blog.statsbot.co/
http://rpubs.com/JDAHAN/172473
https://docs.microsoft.com/en-us/azure/machine-learning/machine-learning-algorithm-choice
https://elitedatascience.com/machine-learning-iteration
https://elitedatascience.com/dimensionality-reduction-algorithms
https://elitedatascience.com/machine-learning-algorithms