https://habr.com/ru/company/skillfactory/blog/510688/ . What is a p-value?
http://www.stochasticlifestyle.com/the-essential-tools-of-scientific-machine-learning-scientific-ml/
https://habr.com/ru/post/475552/ Blitz-testing machine learning algorithms: feed your dataset to scikit-learn
https://habr.com/ru/post/460557/
https://habr.com/ru/post/462961/ . ML Digest
http://themlbook.com/wiki/doku.php
https://vas3k.ru/blog/machine_learning/
https://ml-cheatsheet.readthedocs.io/
https://github.com/danielhanchen/hyperlearn/blob/master/Modern%20Big%20Data%20Algorithms%20(Lower%20quality%20PDF).pdf
https://habr.com/ru/post/453290/ Data Science Digest
https://github.com/kmario23/deep-learning-drizzle
https://github.com/trekhleb/homemade-machine-learning . Homemade ML in Jupyter notebooks
https://habr.com/ru/post/449260/ . AutoML
https://github.com/mljar/mljar-supervised . AutoML
https://ai.googleblog.com/2019/05/an-end-to-end-automl-solution-for.html . AutoML
https://news.ycombinator.com/item?id=19712465 . ML workflow
https://www.textbook.ds100.org/ Introduction to data science
https://github.com/machinelearningmindset/machine-learning-course
https://blog.floydhub.com/introduction-to-anomaly-detection-in-python/
https://ml-cheatsheet.readthedocs.io/en/latest/loss_functions.html Loss function
https://gombru.github.io/2018/05/23/cross_entropy_loss/
https://residentmario.github.io/machine-learning-notes/kernels.html
https://aws.amazon.com/training/learning-paths/machine-learning/
https://www.youtube.com/playlist?list=PLl8OlHZGYOQ7bkVbuRthEsaLr7bONzbXS . CORNELL CS4780
https://news.ycombinator.com/item?id=20570025 . ML Books
https://github.com/r0f1/datascience a list of links
https://deepmind.com/blog/unsupervised-learning/
https://www.octavian.ai/machine-learning-on-graphs-course
https://jinchuika.com/en/post/1-preprocessing-part-1/ . Preprocessing
https://skymind.ai/wiki/
https://github.com/clone95/Machine-Learning-Study-Path/blob/master/README.md
In math terms, an operation F is linear if scaling inputs scales the output, and adding inputs adds the outputs:
F(ax) = a * F(x)
F(x + y) = F(x) + F(y)
Linear Models
https://habrahabr.ru/company/ods/blog/323890/ Linear models
https://medium.freecodecamp.org/learn-how-to-improve-your-linear-models-8294bfa8a731
http://www.jmlr.org/papers/volume18/17-468/17-468.pdf . Automatic Differentiation
Statistical tests:
https://lindeloev.github.io/tests-as-linear/
https://www.youtube.com/watch?v=enpPFqcIFj8&list=PLlb7e2G7aSpRb95_Wi7lZ-zA6fOjV3_l7 .
Data Analysis in Python in Examples and Problems
https://distill.pub/2019/visual-exploration-gaussian-processes/ . Gaussian process
https://blog.finxter.com/python-linear-regression-1-liner/
from sklearn.linear_model import LinearRegression
import numpy as np
## Data (Apple stock prices)
apple = np.array([155, 156, 157])
n = len(apple)
## One-liner
model = LinearRegression().fit(np.arange(n).reshape((n,1)), apple)
print(model.predict([[3],[4]]))  # predict days 3 and 4 -> [158. 159.]
## Result
print(model.coef_)
# [1.]
print(model.intercept_)
# 155.0
Linear regression can be applied to model a non-linear relationship between input and response.
This can be done by replacing the input x with some nonlinear function φ(x).
Note that doing so preserves linearity as a function of the parameters w.
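A minimal sketch of this basis-expansion idea (assuming sklearn and numpy; the noisy quadratic data is made up for illustration):
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
## Noisy quadratic data: y ~ x^2
x = np.linspace(-3, 3, 50).reshape(-1, 1)
y = x.ravel() ** 2 + np.random.default_rng(0).normal(0, 0.1, 50)
## phi(x) = [x, x^2]; the model is nonlinear in x but still linear in w
phi = PolynomialFeatures(degree=2, include_bias=False)
model = LinearRegression().fit(phi.fit_transform(x), y)
print(model.coef_)  # the x^2 coefficient should come out close to 1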
https://habr.com/ru/company/mailru/blog/513842/. different types of regression
https://www.youtube.com/watch?v=68ABAU_V8qI . Linear models
https://github.com/Yorko/mlcourse.ai
https://medium.com/@vimarshk . ML interview
https://github.com/trekhleb/homemade-machine-learning
https://jalammar.github.io/ visualization of ML concepts
http://blog.christianperone.com/2019/01/a-sane-introduction-to-maximum-likelihood-estimation-mle-and-maximum-a-posteriori-map/
Logistic regression
https://towardsdatascience.com/logistic-regression-b0af09cdb8ad
https://habr.com/ru/post/485872/
https://realpython.com/logistic-regression-python/
https://towardsdatascience.com/10-gradient-descent-optimisation-algorithms-86989510b5e9
https://github.com/turingbirds/gradient_descent/blob/master/gradient_descent.ipynb
https://raiboso.me/backpropagation-demo/
https://www.reddit.com/r/learnmachinelearning/comments/ax6ep5/machine_learning_git_codebook_case_study_of/
https://hackernoon.com/tackle-bias-and-other-problems-solutions-in-machine-learning-models-f4274c5fe538
https://erikbern.com/2018/10/08/the-hackers-guide-to-uncertainty-estimates.html
https://brohrer.github.io/how_modeling_works_1.html
https://github.com/zekelabs/data-science-complete-tutorial
https://dyakonov.org/
https://github.com/AntonioErdeljac/Google-Machine-Learning-Course-Notes
https://github.com/robertmartin8/udemyML . code and notes for Kirill Eremenko's Machine Learning course
https://habr.com/ru/company/singularis/blog/440026/ . Real Kaggle project
Books
http://themlbook.com . The Hundred-Page Machine Learning Book (Andriy Burkov)
https://github.com/jakevdp/PythonDataScienceHandbook
https://news.ycombinator.com/item?id=19296031
https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/ e-book
https://play.google.com/store/books/details/Николенко_Сергей_Игоревич_Глубокое_обучение?id=Zi48DwAAQBAJ
https://john.specpal.science/deepvision/
https://jakevdp.github.io/PythonDataScienceHandbook/ BOOK ONLINE
http://www.cs.huji.ac.il/~shais/UnderstandingMachineLearning/ Book
https://github.com/zackchase/mxnet-the-straight-dope e-book
https://github.com/rasbt/python-machine-learning-book-2nd-edition ML book with python code
https://christophm.github.io/interpretable-ml-book/ . Book
http://www.cs-114.org/wp-content/uploads/2015/01/Elements_of_Information_Theory_Elements.pdf
https://www.microsoft.com/en-us/research/publication/pattern-recognition-machine-learning/ Bishop Book
ISLR book and videos:
http://auapps.american.edu/alberto/www/analytics/ISLRLectures.html
https://github.com/JWarmenhoven/ISLR-python/tree/master/Notebooks
https://mml-book.github.io/
http://www.inference.phy.cam.ac.uk/itprnn/book.pdf David MacKay. Information Theory, Inference and Learning Algorithms
http://mbmlbook.com/
https://universalflowuniversity.com/ulibrary/?drawer1=Computer%20Programming*Neural%20Networks%20and%20Deep%20Learning
https://github.com/joelgrus/data-science-from-scratch - Code from book "Data science from scratch"
https://news.ycombinator.com/item?id=18201986
Metaheuristics
https://proplot.readthedocs.io/en/stable/
https://habr.com/ru/post/688820/
http://www2.cscamm.umd.edu/publications/BookChapter_CS-09-13.pdf
https://cs.gmu.edu/~sean/book/metaheuristics/Essentials.pdf
https://medium.com/huggingface/from-zero-to-research-an-introduction-to-meta-learning-8e16e677f78a MetaLearning
https://sgfin.github.io/learning-resources/
https://see.stanford.edu/Course/CS229
https://github.com/danielhanchen/hyperlearn/blob/master/Modern%20Big%20Data%20Algorithms.pdf
https://www.coursera.org/promo/NEXTExtended
https://habr.com/company/tssolution/blog/423783/ Splunk
SageMaker:
https://towardsdatascience.com/building-fully-custom-machine-learning-models-on-aws-sagemaker-a-practical-guide-c30df3895ef7
Deployment
https://towardsdatascience.com/create-a-complete-machine-learning-web-application-using-react-and-flask-859340bddb33
https://www.inovex.de/blog/machine-learning-model-management/
https://towardsdatascience.com/deploying-a-machine-learning-model-as-a-rest-api-4a03b865c166 . Flask Rest API for model
https://heartbeat.fritz.ai/brilliant-beginners-guide-to-model-deployment-133e158f6717
https://towardsdatascience.com/deploying-a-keras-deep-learning-model-as-a-web-application-in-p-fc0f2354a7ff
https://habr.com/ru/company/otus/blog/442918/
https://www.dataquest.io/blog/learning-curves-machine-learning/
https://arxiv.org/abs/1809.10756 . probabilistic programming
https://github.com/Avik-Jain/100-Days-Of-ML-Code
https://github.com/seddonr/Ng_ML . Ng's Coursera course implemented in Python
https://www.youtube.com/channel/UCsBKTrp45lTfHa_p49I2AEQ Brandon Rohrer
Automatic differentiation
https://github.com/tensorflow/swift/blob/master/docs/AutomaticDifferentiation.md
https://www.sanyamkapoor.com/machine-learning/autograd-magic/ . Automatic Differentiation and back propagation
https://aws.amazon.com/training/learning-paths/machine-learning/
http://www.fast.ai/2018/09/26/ml-launch/ . Online Course
Boosting and bagging
https://habr.com/ru/company/piter/blog/488362/
https://towardsdatascience.com/ensemble-methods-bagging-boosting-and-stacking-c9214a10a205
https://medium.com/mlreview/gradient-boosting-from-scratch-1e317ae4587d
https://habr.com/ru/company/piter/blog/445780/
https://quantdare.com/what-is-the-difference-between-bagging-and-boosting/
An ensemble method combines many weak learners, all based on the same learning algorithm,
to build a (stronger) learner whose performance is better
than that of any individual learner. Ensemble methods help reduce bias and/or variance.
Boosting is quite different from bagging:
- Individual classifiers are fitted sequentially.
- Poorly performing classifiers are given less weight.
- At each iteration, the observations are weighted differently.
XGBoost
https://habr.com/ru/company/mailru/blog/438560/
https://habr.com/ru/company/mailru/blog/438562/
https://saru.science/tech/2018/02/15/kl-divergence-explanation.html
Kullback-Leibler divergence
https://news.ycombinator.com/item?id=17916981
https://www.coursera.org/learn/machine-learning-projects/
https://www.youtube.com/user/PyDataTV/videos
https://bloomberg.github.io/foml/#lectures
https://appliedmachinelearning.blog/
https://ml-cheatsheet.readthedocs.io/en/latest/
https://github.com/afshinea/stanford-cs-229-machine-learning/blob/master/super-cheatsheet-machine-learning.pdf
https://stanford.edu/~shervine/teaching/cs-229/
http://anotherdatum.com/index2.html
ML BOOK with code:
https://arxiv.org/pdf/1803.08823
http://physics.bu.edu/~pankajm/ML-Notebooks/NotebooksforMLReview.zip - jupyter notebooks (zip)
X-means http://docs.splunk.com/Documentation/MLApp/3.4.0/User/Algorithms#X-means
The X-means clustering algorithm is an extension of k-means
that automatically determines the number of clusters using the Bayesian Information Criterion (BIC).
It is convenient when there is no prior information about how many clusters
the data may fall into.
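X-means itself is not in sklearn; as a hedged sketch of the same idea (pick the cluster count by minimizing BIC), sklearn's GaussianMixture exposes a .bic() method:
import numpy as np
from sklearn.mixture import GaussianMixture
## Three well-separated blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(100, 2)) for c in (0, 5, 10)])
## Fit mixtures with k = 1..7 components and keep the k with the lowest BIC
bics = {k: GaussianMixture(n_components=k, random_state=0).fit(X).bic(X) for k in range(1, 8)}
print(min(bics, key=bics.get))  # expected: 3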
RobustScaler http://docs.splunk.com/Documentation/MLApp/3.4.0/User/Algorithms#RobustScaler
This is a data preprocessing algorithm. It is similar in use to StandardScaler,
which transforms the data so that every feature has mean 0 and variance 1,
putting all features on the same scale.
This scaling, however, does not guarantee any particular minimum or maximum feature values.
RobustScaler is similar to StandardScaler in that the resulting features share the same scale.
However, RobustScaler uses the median and quartiles instead of the mean and variance.
This lets RobustScaler ignore outliers and measurement errors that can be a problem for other scaling
methods.
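A small sketch contrasting the two scalers on data with one outlier (standard sklearn API):
import numpy as np
from sklearn.preprocessing import StandardScaler, RobustScaler
X = np.array([[1.0], [2.0], [3.0], [4.0], [1000.0]])  # 1000 is an outlier
## StandardScaler's mean/variance are dragged by the outlier...
print(StandardScaler().fit_transform(X).ravel())
## ...while RobustScaler's median/quartiles barely move
print(RobustScaler().fit_transform(X).ravel())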
Links
https://sandipanweb.wordpress.com/
https://habr.com/company/intel/blog/417809/ . NN architectures for image recognition
https://kite.com/blog/python/data-analysis-visualization-python
https://habr.com/company/nixsolutions/blog/417935/ AI cheat sheets
https://thegradient.pub/why-rl-is-flawed/
https://habr.com/post/418249/ . Google VM for ML
https://medium.com/syncedreview/google-ai-chief-jeff-deans-ml-system-architecture-blueprint-a358e53c68a5
https://news.ycombinator.com/item?id=17667705 . ML intro
https://news.ycombinator.com/item?id=17422770 Matrix 101 for ML
https://news.ycombinator.com/item?id=17664084 Math for ML
http://tools.google.com/seedbank/
https://developers.google.com/machine-learning/guides/
https://codequs.com/p/BkaLEq8r4/a-complete-machine-learning-project-walk-through-in-python
https://morioh.com/p/b56ae6b04ffc/a-complete-machine-learning-project-walk-through-in-python
ML from start to end
Open Machine Learning
https://towardsdatascience.com/forecasting-with-python-and-tableau-dd37a218a1e5 . Tableau+ARIMA+Python
https://mlcrunch.blogspot.com/2018/08/dimensionality-reduction-techniques-guide-python.html
https://github.com/Avik-Jain/100-Days-Of-ML-Code
https://sandipanweb.wordpress.com/2018/05/31/8626/
http://ciml.info/
https://news.ycombinator.com/item?id=17214588
http://ods.ai/
https://habrahabr.ru/company/ods/blog/344044/ Open Data Science
https://habrahabr.ru/company/ods/blog/325422/ Open Machine Learning Course. Topic 6: Feature engineering and feature selection
Part 1
Part 2
Part 3
https://towardsdatascience.com/another-machine-learning-walk-through-and-a-challenge-8fae1e187a64
## Russian translation of 3 links above:
https://habr.com/company/nixsolutions/blog/425253
https://habr.com/company/nixsolutions/blog/425907/
https://habr.com/company/nixsolutions/blog/426771/
https://github.com/esokolov/ml-course-hse (ru)
intrepretable-machine-learning-nfl
https://spandan-madan.github.io/DeepLearningProject/ End to End Implementation
https://spandan-madan.github.io/DeepLearningProject/docs/Deep_Learning_Project-Pytorch.html
https://towardsdatascience.com/visualizing-data-with-pair-plots-in-python-f228cf529166 Pair plots
Markov Chain Monte Carlo
https://towardsdatascience.com/markov-chain-monte-carlo-in-python-44f7e609be98
https://habr.com/ru/company/piter/blog/491268/
https://news.ycombinator.com/item?id=19633212
http://arogozhnikov.github.io/2016/12/19/markov_chain_monte_carlo.html
https://news.ycombinator.com/item?id=15986687 Markov chain Monte-Carlo
http://www.moderndescartes.com/essays/deep_dive_mcts/ monte carlo tree search
https://skymind.ai/wiki/generative-adversarial-network-gan
https://habr.com/post/429276/ A variational autoencoder (VAE) is a generative model
that learns to map objects into a given latent space.
https://www.youtube.com/watch?v=Lo1rXJdAJ7w C++ ML
https://software.intel.com/en-us/ai-academy Intel AI
https://research.fb.com/the-facebook-field-guide-to-machine-learning-video-series/ Facebook ML video series
https://medium.com/@deepsystems
https://datamonsters.com/ company
https://eli.thegreenplace.net/2018/minimal-character-based-lstm-implementation/
http://www.wildml.com/
PyTorch
https://habr.com/company/otus/blog/358096/
https://habr.com/company/piter/blog/354912/
https://www.reddit.com/r/Python/comments/878vjb/compute_distance_between_strings_30_algorithms/
https://thomaswdinsmore.com/
https://towardsdatascience.com/data-science-interview-guide-4ee9f5dc7784
https://medium.com/acing-ai/apple-ai-interview-questions-acing-the-ai-interview-803a65b0e795
https://towardsdatascience.com/data-science-and-machine-learning-interview-questions-3f6207cf040b
http://savvastjortjoglou.com/intrepretable-machine-learning-nfl-combine.html
PDF
QnA
My code
Neural Networks and Image Processing
https://towardsdatascience.com/building-prediction-apis-in-python-part-4-decoupling-the-model-and-api-4b5eaf2ed125
Statistics
https://en.wikipedia.org/wiki/Correlation_and_dependence
http://pages.cs.wisc.edu/~tdw/files/cookbook-en.pdf
https://etav.github.io/articles/ida_eda_method.html
http://statistics.zone/
https://h4labs.wordpress.com/2017/12/30/learning-probability-and-statistics/
https://news.ycombinator.com/item?id=18462520 . estimate probability of yet unhappen
Calculating avg and stdev on stream
--------------------------------------
https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
https://math.stackexchange.com/questions/20593/calculate-variance-from-a-stream-of-sample-values
https://blog.superfeedr.com/streaming-percentiles/
https://www.johndcook.com/blog/standard_deviation/
https://dev.to/nestedsoftware/calculating-a-moving-average-on-streaming-data-5a7k
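A minimal sketch of Welford's online algorithm, the approach behind the links above: one pass, O(1) memory, numerically stable running mean and variance.
class RunningStats:
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean
    def push(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
    def variance(self):  # sample variance
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

rs = RunningStats()
for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    rs.push(x)
print(rs.mean, rs.variance() ** 0.5)  # 5.0, ~2.14 (sample stdev)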
https://en.wikipedia.org/wiki/Receiver_operating_characteristic ROC curve
https://habrahabr.ru/post/311092/ standard distributions
https://en.wikipedia.org/wiki/Outlier
https://medium.com/netflix-techblog/rad-outlier-detection-on-big-data-d6b0494371cc
https://en.wikipedia.org/wiki/Maximum_likelihood_estimation
https://en.wikipedia.org/wiki/Precision_and_recall
https://data36.com/statistical-bias-types-explained/
https://data36.com/statistical-bias-types-examples-part2/
Precision is the number of correct positive classifications divided by the total number of positive labels assigned.
precision = true positives / (true positives + false positives)
Recall is the number of correct positive classifications divided by the number of positive instances that should have been identified.
recall = true positives / (true positives + false negatives)
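A minimal sketch of these two definitions using plain counts (toy labels, no library needed):
def precision_recall(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fp), tp / (tp + fn)

print(precision_recall([1, 1, 0, 1, 0, 0], [1, 0, 1, 1, 0, 0]))  # (0.667, 0.667): tp=2, fp=1, fn=1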
https://en.wikipedia.org/wiki/Quantile
https://www.analyticsvidhya.com/blog/2017/02/basic-probability-data-science-with-examples/
https://en.wikipedia.org/wiki/Simpson%27s_paradox
Bayes
https://greenteapress.com/wp/think-bayes/
https://habr.com/ru/post/510526/ bayes in python
http://web.ipac.caltech.edu/staff/fmasci/home/astro_refs/Science-2013-Efron.pdf
https://habrahabr.ru/post/337028/ video bayes deep ML
https://www.sanyamkapoor.com/machine-learning/the-beauty-of-bayesian-learning/
https://medium.freecodecamp.org/statistical-inference-showdown-the-frequentists-vs-the-bayesians-4c1c986f25de
https://www.analyticsvidhya.com/blog/2017/03/conditional-probability-bayes-theorem/
https://malobukov.dreamwidth.org/7960.html bayes
https://www.datascience.com/blog/introduction-to-bayesian-inference-learn-data-science-tutorials
https://news.ycombinator.com/item?id=18213117
In the case of normally distributed data,
the three sigma rule means that roughly 1 in 22 observations will differ from the mean by two or more standard deviations,
and 1 in 370 will deviate by three or more standard deviations.
Probability density function for the normal distribution with sigma = 1:
f(x) = exp(-(x - mu)^2 / 2) / sqrt(2*pi)
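A quick check of the 1-in-22 and 1-in-370 figures (assuming scipy is available): the two-sided tail mass beyond k standard deviations is 2 * norm.sf(k).
from scipy.stats import norm
for k in (2, 3):
    p = 2 * norm.sf(k)  # P(|X - mu| >= k * sigma)
    print(k, p, 1 / p)  # k=2 -> ~0.0455 (~1 in 22); k=3 -> ~0.0027 (~1 in 370)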
https://www.dataquest.io/onboarding
https://www.dataquest.io/blog/learning-curves-machine-learning/
http://efavdb.com/
https://www.hardikp.com/
https://unsupervisedpandas.com/
https://www.zabaras.com/statisticalcomputing
Signal Processing
https://terpconnect.umd.edu/~toh/spectrum/
https://habr.com/post/358868/ Kalman filter
Machine Learning
Machine Learning code snippets
List of machine learning concepts
Tour-of-machine-learning-algorithms
Regression
https://developers.google.com/machine-learning/crash-course/
https://avva.livejournal.com/3074895.html#comments
https://robertheaton.com/2014/05/02/jaccard-similarity-and-minhash-for-winners/
http://efavdb.com/
https://talkery.io/conferences/507?pageNumber=1 PyData 2017 videos
www.wildml.com/2017/12/ai-and-deep-learning-in-2017-a-year-in-review/
https://habr.com/company/ods/blog/354944/
https://habrahabr.ru/company/itinvest/blog/262155/ TOP 10 ML algo
https://habrahabr.ru/company/cloud4y/blog/346968/
https://habrahabr.ru/post/347008/
https://habrahabr.ru/post/349048/ Autoencoders
https://habrahabr.ru/company/ods/blog/325422/ Feature extraction
https://github.com/featuretools/featuretools
https://www.youtube.com/watch?v=BfS2H1y6tzQ
https://www.youtube.com/watch?v=GsAVf3fn3yM&feature=youtu.be Artificial Intelligence with Python | Sequence Learning
https://www.youtube.com/watch?v=RLsKzkxWpK8
https://github.com/AxeldeRomblay/MLBox
https://habrahabr.ru/company/ods/blog/350440/ Gini index
Apple
https://github.com/apple/coremltools
https://attardi.org/pytorch-and-coreml
https://github.com/apple/turicreate
https://news.ycombinator.com/item?id=15406237 Apple CoreML
https://machinelearning.apple.com/2017/08/06/siri-voices.html
https://news.ycombinator.com/item?id=16364826
NLP
Speech recognition
https://habrahabr.ru/post/350222/
https://blog.acolyer.org/2016/04/21/the-amazing-power-of-word-vectors/
https://news.ycombinator.com/item?id=16626374 word2vec
http://fast.ai
https://github.com/vicky002/AlgoWiki/blob/gh-pages/Machine-Learning/Sources.md
http://www.inference.vc/design-patterns/
https://notebooks.azure.com/jakevdp/libraries/PythonDataScienceHandbook
https://eli.thegreenplace.net/tag/machine-learning
http://course.fast.ai/
http://learningsys.org/nips17/assets/slides/dean-nips17.pdf TPU Google
R
https://radiant-rstats.github.io/docs/index.html
https://rattle.togaware.com/
Visualization and ML packages
https://veusz.github.io/
https://www.knime.com/
https://rapidminer.com/
https://sourceforge.net/projects/weka/
https://orange.biolab.si/
https://elki-project.github.io/
MATLAB book
https://www.amazon.com/Exploratory-Analysis-Chapman-Computer-Science/dp/149877606X Exploratory Data Analysis with MATLAB, Third Edition
JavaScript
http://propelml.org/
https://news.ycombinator.com/item?id=16465105
Clustering
https://habrahabr.ru/post/164417/
https://www.youtube.com/watch?v=-_gIcc5_uHY
https://habrahabr.ru/post/322034/ DBSCAN
https://en.wikipedia.org/wiki/DBSCAN DBSCAN
https://towardsdatascience.com/a-gentle-introduction-to-hdbscan-and-density-based-clustering-5fd79329c1e8
https://mubaris.com/2017/10/01/kmeans-clustering-in-python/
https://www.analyticsvidhya.com/blog/2016/01/complete-tutorial-ridge-lasso-regression-python/
Bias is the difference between your model's expected predictions and the true values.
The error due to bias is taken as the difference between the expected (or average) prediction of our model and the correct value which we are trying to predict.
The error due to variance is taken as the variability of a model prediction for a given data point.
The variance is how much the predictions for a given point vary between different realizations of the model.
A small sample size is a source of variance; increasing the sample size makes the results more consistent.
The results might still be highly inaccurate due to large sources of bias, but the variance of the predictions will be reduced.
Variance refers to your algorithm's sensitivity to the specific set of training data.
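In equation form (the standard decomposition for squared-error loss, stated here for reference):
E[(y - fhat(x))^2] = (E[fhat(x)] - f(x))^2 + Var[fhat(x)] + sigma^2, i.e. Bias^2 + Variance + irreducible noise.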
https://oneraynyday.github.io/ml/2017/08/08/Bias-Variance-Tradeoff/
High bias, low variance: models are consistent but inaccurate on average.
High variance, low bias: models are inconsistent but accurate on average.
Low variance tends to go with simpler algorithms (regression, naive Bayes, linear, parametric).
Low bias tends to go with more complex algorithms (decision trees, nearest neighbors, non-parametric).
https://medium.com/@kevin_yang/simple-approximate-nearest-neighbors-in-python-with-annoy-and-lmdb-e8a701baf905
Regression algorithms can be regularized to reduce complexity.
A decision tree can be pruned to reduce complexity.
Too complex model -> overfitting
Too simple model -> underfitting
The Linear model does not fit the data very well and is therefore said to have a higher bias than the polynomial model.
Overfitting:
---------------
Our model doesn't generalize well from our training data to unseen data.
Cross-validation is a powerful preventative measure against overfitting.
K-fold cross-validation: partition the data into k subsets, called folds; then iteratively train the algorithm on k-1 folds while using the remaining fold as the test set (the "holdout fold"). See the sketch after this list.
Other ways to reduce overfitting:
- Remove features
- Regularization: you could prune a decision tree, use dropout on a neural network, or add a penalty parameter to the cost function in regression.
- Early stopping
When you’re training a learning algorithm iteratively, you can measure how well each iteration of the model performs.
Up until a certain number of iterations, new iterations improve the model. After that point, however, the model’s ability to generalize can weaken as it begins to overfit the training data.
Early stopping means stopping the training process before the learner passes that point.
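A minimal k-fold cross-validation sketch, as referenced above (assuming sklearn; iris and logistic regression are just placeholders):
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
X, y = load_iris(return_X_y=True)
## Train on 4 folds, score on the holdout fold, 5 times
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores.mean(), scores.std())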
Underfitting
--------------
Underfitting occurs when a model is too simple (informed by too few features or regularized too much), which makes it too inflexible to learn from the dataset.
In both Machine Learning and Curve Fitting, you want to come up with a model that explains (fits) the data. However, the difference in the end goal is both subtle and profound.
In Curve Fitting, we have all the data available to us at the time of fitting the curve. We want to fit the curve as best as we can.
In Machine Learning, only a small set (the training set) of data is available at the time of training. We obviously want a model that fits the data well, but more importantly, we want the model to generalize to unseen data points
http://blog.dlib.net/2017/12/a-global-optimization-algorithm-worth.html
https://towardsdatascience.com/improving-vanilla-gradient-descent-f9d91031ab1d
Classification is forecasting the target class/category.
Regression is forecasting a value.
Logistic regression: the dependent variable is categorical.
https://www.analyticsvidhya.com/blog/2017/08/skilltest-logistic-regression/
The logistic function predicts the probability of the target class:
y = 1 / (1 + exp(-f(x))), in the range from 0 to 1.
if f(x) = 0, then y = 0.5
if f(x) is large and negative, then y -> 0
if f(x) is large and positive, then y -> 1
f(x) = a*x + b, where x is the input vector and a is a parameter vector.
The goal is to find a.
The common method is the maximum (log-)likelihood criterion; gradient descent can be used.
x - random outcomes
theta - parameter
L(theta| x) = P(x | theta)
Because the logarithm is a monotonically increasing function, the logarithm of a function achieves its maximum value at the same points as the function itself, and hence the log-likelihood can be used in place of the likelihood in maximum likelihood estimation
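The logistic function from the formulas above, checked at a few points (plain Python, nothing beyond the standard library):
import math
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))
print(sigmoid(0))    # 0.5
print(sigmoid(-10))  # ~0: f(x) large and negative
print(sigmoid(10))   # ~1: f(x) large and positive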
Regularization: to decrease overfitting.
https://habr.com/ru/post/456176/ . L1 and L2 Stochastic Gradient Descent
https://www.analyticsvidhya.com/blog/2016/01/complete-tutorial-ridge-lasso-regression-python/
https://www.quora.com/What-is-the-difference-between-L1-and-L2-regularization-How-does-it-solve-the-problem-of-overfitting-Which-regularizer-to-use-and-when
L2 (ridge): penalizes the Euclidean norm of the weights; shrinks all coefficients smoothly toward zero.
L1 (lasso): produces many coefficients that are exactly zero or very small, with a few large coefficients.
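A sketch of that sparsity effect (assuming sklearn; the synthetic data, where only 2 of 10 features matter, is made up for illustration):
import numpy as np
from sklearn.linear_model import Lasso, Ridge
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(0, 0.1, 200)
print(Ridge(alpha=1.0).fit(X, y).coef_.round(2))  # L2: all coefficients small but nonzero
print(Lasso(alpha=0.1).fit(X, y).coef_.round(2))  # L1: irrelevant coefficients driven to exactly 0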
Bagging and other resampling techniques can be used to reduce the variance in model predictions.
In bagging (Bootstrap Aggregating), numerous replicates of the original data set are created using random selection with replacement.
Each derivative data set is then used to construct a new model and the models are gathered together into an ensemble.
To make a prediction, all of the models in the ensemble are polled and their results are averaged.
Bagging attempts to reduce the chance of overfitting complex models.
It trains a large number of "strong" learners in parallel.
A strong learner is a model that's relatively unconstrained.
Bagging then combines all the strong learners together in order to "smooth out" their predictions.
Boosting attempts to improve the predictive flexibility of simple models.
It trains a large number of "weak" learners in sequence.
A weak learner is a constrained model (i.e. you could limit the max depth of each decision tree).
Each one in the sequence focuses on learning from the mistakes of the one before it.
Boosting then combines all the weak learners into a single strong learner.
While bagging and boosting are both ensemble methods, they approach the problem from opposite directions.
Bagging uses complex base models and tries to "smooth out" their predictions, while boosting uses simple base models and tries to "boost" their aggregate complexity.
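A sketch of this contrast (assuming sklearn; the synthetic dataset is a placeholder): bagging averages deep, unconstrained trees trained in parallel on bootstrap samples, while boosting chains shallow stumps trained in sequence.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier
X, y = make_classification(n_samples=500, random_state=0)
bagging = BaggingClassifier(DecisionTreeClassifier(max_depth=None), n_estimators=100, random_state=0).fit(X, y)  # "strong" base learners
boosting = GradientBoostingClassifier(max_depth=1, n_estimators=100, random_state=0).fit(X, y)  # "weak" base learners (stumps)
print(bagging.score(X, y), boosting.score(X, y))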
https://www.analyticsvidhya.com/blog/2017/02/40-questions-to-ask-a-data-scientist-on-ensemble-modeling-techniques-skilltest-solution/
Decision Tree
https://github.com/Yorko/mlcourse.ai/blob/master/jupyter_russian/topic03_decision_trees_knn/topic3_trees_knn.ipynb
https://dataaspirant.com/2017/01/30/how-decision-tree-algorithm-works/
https://heartbeat.fritz.ai/introduction-to-decision-tree-learning-cd604f85e236
http://www.win-vector.com/blog/2017/01/why-do-decision-trees-work/
The primary challenge in a decision tree implementation is identifying which attribute to use at the root node and at each subsequent level.
In the decision tree algorithm, choosing the splits and forming the rules is done using information gain and the Gini index.
Information gain measures the expected reduction in entropy from sorting on the attribute.
The Gini index measures how often a randomly chosen element would be incorrectly labeled; an attribute with a lower Gini index should be preferred.
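Both criteria in a few lines (a minimal sketch from the definitions above; "node" is a toy list of class labels):
from collections import Counter
import math
def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())
def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())
node = ['a', 'a', 'a', 'b']
print(gini(node))     # 1 - (0.75^2 + 0.25^2) = 0.375
print(entropy(node))  # ~0.811 bits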
Random Forest
https://habr.com/ru/company/ruvds/blog/488342/
https://habr.com/ru/company/piter/blog/488362/
Random forest is a supervised learning algorithm that can be used for both classification and regression problems.
It works by training numerous decision trees each based on a different resampling of the original training data.
In Random Forests the bias of the full model is equivalent to the bias of a single decision tree (which itself has high variance).
By creating many of these trees (in effect a "forest") and then averaging them, the variance of the final model can be greatly reduced compared to a single tree.
In practice the only limit on the size of the forest is computing time: an infinite number of trees could be trained without ever increasing bias, with a continual (if asymptotically declining) decrease in variance.
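A minimal sketch of that variance reduction (assuming sklearn; iris is a placeholder dataset): the spread of cross-validation scores typically shrinks as the forest grows.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
X, y = load_iris(return_X_y=True)
for n in (1, 10, 100):
    scores = cross_val_score(RandomForestClassifier(n_estimators=n, random_state=0), X, y, cv=5)
    print(n, round(scores.mean(), 3), round(scores.std(), 3))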
https://victorzhou.com/blog/intro-to-random-forests/
http://dataaspirant.com/2017/05/22/random-forest-algorithm-machine-learing/
https://medium.com/@williamkoehrsen/random-forest-simple-explanation-377895a60d2d
https://medium.com/@williamkoehrsen/random-forest-in-python-24d0893d51c0
https://r4ds.had.co.nz/
https://news.ycombinator.com/item?id=19632052
Separate training and test sets
------------------------------------
Split the data into three sets: training (60%), validation (a.k.a. development) (20%), and test (20%).
Use the training set to train different models, the validation set to select a model
and finally report performance on the test set.
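The 60/20/20 split via two calls to sklearn's train_test_split (iris here is only a stand-in dataset):
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
X, y = load_iris(return_X_y=True)
## First peel off 40%, then split that 40% evenly into validation and test
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)
print(len(X_train), len(X_val), len(X_test))  # 90 30 30 for iris's 150 rows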
- Trying appropriate algorithms (No Free Lunch)
- Fitting model parameters
- Tuning impactful hyperparameters
- Proper performance metrics
- Systematic cross-validation
https://blog.statsbot.co/machine-learning-algorithms-183cc73197c
https://towardsdatascience.com/battle-of-the-deep-learning-frameworks-part-i-cff0e3841750
http://pbpython.com/categorical-encoding.html
https://www.kaggle.com/dansbecker/using-categorical-data-with-one-hot-encoding
https://github.com/onurakpolat/awesome-analytics
ML cheatsheets
Tensor Flow
ML DLIB C++
https://medium.com/@mngrwl/explained-simply-how-deepmind-taught-ai-to-play-video-games-9eb5f38c89ee
CoreML
ML blog
ML method
ML plan
MLOSS.org
distill.pub
### PCA
https://joellaity.com/2018/10/18/pca.html
PCA 1
PCA 2
Q & A
https://habrahabr.ru/company/newprolab/blog/350584/ t-SNE and UMAP
http://www.datatau.com/
https://www.bonaccorso.eu/
https://habrahabr.ru/company/oleg-bunin/blog/340184/ Architectures of NN
https://morfizm.livejournal.com/1136917.html BitFunnel
https://blog.statsbot.co/
http://rpubs.com/JDAHAN/172473
https://docs.microsoft.com/en-us/azure/machine-learning/machine-learning-algorithm-choice
https://elitedatascience.com/machine-learning-iteration
https://elitedatascience.com/dimensionality-reduction-algorithms
https://elitedatascience.com/machine-learning-algorithms