Michael Lubinsky

Machine Learning

Stochastic gradient explained https://habr.com/ru/companies/airi/articles/883266/

https://www.bishopbook.com/

Free book: An Introduction to Statistical Learning with Python: https://www.statlearning.com/

https://arxiv.org/abs/2502.05244 Probabilistic Artificial Intelligence

https://www.amazon.com/Probabilistic-Robotics-INTELLIGENT-ROBOTICS-AUTONOMOUS/dp/0262201623

https://www.amazon.com/dp/3031064682/ Robotics Vision and Control with Python

Automatic EDA https://habr.com/ru/companies/gazprombank/articles/881386/

https://habr.com/ru/companies/yandex_praktikum/articles/879316/ EDA

https://github.com/NeKonnnn/Exploratory_Data_Analysis

https://www.youtube.com/playlist?list=PL4_hYwCyhAvaprlx_MnGC5xyKR1qb7Y7L Mathematics of Big Data

https://www.llm-book.com/ Book

https://www.amazon.com/_/dp/149204552 Book Deep Learning for Coders with fastai and PyTorch

https://www.amazon.com/dp/1108415199/ref=sspa_dk_detail_1 Roman Vershynin. High-Dimensional Probability: An Introduction with Applications in Data Science

mlcourse.ai/

https://news.ycombinator.com/item?id=42827913 Good links!

https://habr.com/ru/articles/870718/ Naive Bayes classifier: theory and implementation from scratch

https://p.migdal.pl/blog/2025/01/dont-use-cosine-similarity

100+ LLM Interview Questions for Top Companies https://github.com/llmgenai/LLMInterviewQuestions/tree/main

Mathematics for machine learning book https://mml-book.github.io/ https://course.ccs.neu.edu/ds4420sp20/readings/mml-book.pdf

Machine learning in production https://mlip-cmu.github.io/book/

https://karpov.courses/ml-hard Hardcore Machine Learning

Book
https://www.manning.com/books/machine-learning-system-design Valerii Babushkin and Arseny Kravchenko

Designing Machine Learning Systems: An Iterative Process for Production-Ready Applications. Chip Huyen https://www.amazon.com/Designing-Machine-Learning-Systems-Production-Ready/dp/1098107969

Book https://github.com/abhishekkrthakur/approachingalmost/blob/master/AAAMLP.pdf

https://www.youtube.com/@abhishekkrthakur

scikit-learn; XGBoost; LightGBM; CatBoost

https://habr.com/ru/companies/otus/articles/869372/

Compilers for deep neural network models:

https://hpc-education.unn.ru/%D0%BE%D0%B1%D1%83%D1%87%D0%B5%D0%BD%D0%B8%D0%B5/%D0%BA%D1%83%D1%80%D1%81%D1%8B/%D0%B1%D0%B0%D0%BA%D0%B0%D0%BB%D0%B0%D0%B2%D1%80%D0%B8%D0%B0%D1%82/%D0%BA%D0%BE%D0%BC%D0%BF%D0%B8%D0%BB%D1%8F%D1%82%D0%BE%D1%80%D1%8B-%D0%B4%D0%BB%D1%8F-%D0%B3%D0%BB%D1%83%D0%B1%D0%BE%D0%BA%D0%B8%D1%85-%D0%BC%D0%BE%D0%B4%D0%B5%D0%BB%D0%B5%D0%B9

https://poloclub.github.io/transformer-explainer/ Transformer explained

"FDTool: a Python application to mine for functional dependencies and candidate keys in tabular data", Matt Buranosky https://habr.com/ru/articles/866718/

https://pypi.org/project/fdtool/

https://hpi.de/naumann/projects/repeatability/data-profiling/fds.html

https://hpi.de/naumann/projects/repeatability/data-profiling.html

https://parlance-labs.com/education/

https://github.com/chiphuyen/aie-book/blob/main/resources.md

https://rish-01.github.io/blog/posts/ml_estimation/ Loss function

https://stats.stackexchange.com/questions/357963/what-is-the-difference-between-cross-entropy-and-kl-divergence

How Attention actually works https://habr.com/ru/companies/oleg-bunin/articles/865856/

5 ways to install and natively use ChatGPT on Mac computers: https://habr.com/ru/companies/x-com/articles/865858/

https://www.reddit.com/r/MachineLearning/comments/1gwbhxq/d_next_big_thing_in_time_series/

https://github.com/thuml/Time-Series-Library

https://www.reddit.com/r/MachineLearning/comments/1gujfj2/d_whats_the_most_surprising_or_counterintuitive/

https://sites.google.com/view/datascience-cheat-sheets

https://www.kdnuggets.com/7-free-cloud-ide-for-data-science-that-you-are-missing-out

Book: Essential Math for AI: Next‑Level Mathematics for Efficient and Successful AI Systems

https://habr.com/ru/companies/raft/articles/851548/ Autoencoder (in Russian)

https://dokumen.pub/essential-math-for-ai-next-level-mathematics-for-efficient-and-successful-ai-systems-1nbsped-1098107632-9781098107635.html

https://www.justinmath.com/books/

https://physicsbaseddeeplearning.org/intro.html . BOOK

https://github.com/graviraja/MLOps-Basics

http://neuralnetworksanddeeplearning.com/

https://course.fast.ai/ Practical Deep Learning for coders

https://huyenchip.com/ml-interviews-book/

https://www.linkedin.com/groups/961087 Machine Learning Group

https://arxiv.org/abs/2201.00650 Deep Learning Interviews: Hundreds of fully solved job interview questions from a wide range of key topics in AI

https://arxiv.org/pdf/2207.10185.pdf Modern Stat Learning Book

https://habr.com/ru/articles/814343/ What is Kernel

ML with tabular data: https://news.ycombinator.com/item?id=41072616

https://habr.com/ru/articles/829336/ Bootstrap

https://github.com/owainlewis/awesome-artificial-intelligence

https://arxiv.org/pdf/1912.13213.pdf Modern Online learning

https://habr.com/ru/articles/783766/ interview

https://habr.com/ru/companies/megafon/articles/800919/ interview

https://jeroenjanssens.com/dsatcl/ data science from command-line

https://www.evidentlyai.com/ml-system-design

https://habr.com/ru/articles/800973/ data preparation for ML

https://github.com/ml-tooling/best-of-ml-python

https://thecleverprogrammer.com/2023/07/15/machine-learning-projects-using-python/

https://habr.com/ru/users/egaoharu_kensei/publications/articles/

https://stepik.org/course/68260/promo Free course

https://habr.com/ru/companies/raft/articles/811371/ error backpropagation

math behind AlexNet https://towardsdatascience.com/the-math-behind-deep-cnn-alexnet-738d858e5a2f

https://nuancesprog.ru/computer-science/

https://habr.com/ru/articles/813221/ Optimization methods in machine and deep learning.

https://www.youtube.com/watch?v=yyHP4ySDOeg L. M. Mestetskiy | Lecture 12.1, Image Processing and Recognition, spring 2024 | CMC MSU

https://www.youtube.com/watch?v=lEGR3u2SWKk K. V. Vorontsov | Lecture 27, Machine Learning Methods, spring 2024 | CMC MSU

Additional Chapters of Machine Learning, N. E. Karpachev, lecture 1, 05.02.2024 https://www.youtube.com/watch?v=Cs5NuVseHxU&list=PLti61wgkUWHyhCM4jK0ktwyDGN8XiGgHv&index=1

Automatic model retraining in production https://habr.com/ru/companies/alfa/articles/821447/

Free Stanford University machine learning lectures (probability, NLP, LLMs, Transformers, and more):

  1. Probability for Computer Scientists - https://lnkd.in/e6sCyZGj

  2. Machine Learning Full Course taught by Andrew Ng - https://lnkd.in/eWs74qyR

  3. NLP with Deep Learning - https://lnkd.in/eazqcvmk

  4. Machine Learning Explainability - https://lnkd.in/evimZ5Za

  5. Reinforcement Learning - https://lnkd.in/eEf5PETJ

  6. Deep Generative Models - https://lnkd.in/euZ2e3xU

  7. Building Large Language Models (LLMs) - https://lnkd.in/eVUkaJuF

  8. Machine Learning with Graphs - https://lnkd.in/eF_d3iwq

  9. Transformers United - https://lnkd.in/eXdGBqQq

https://www.kdnuggets.com/10-github-repositories-to-master-data-engineering

  1. Airbnb AI & Machine Learning - https://lnkd.in/gzAXfQg5
  2. Databricks Data Science & ML - https://lnkd.in/gRPE8Gbm
  3. Google AI research applications - https://lnkd.in/gctN7Ths
  4. NVIDIA Data Science - https://lnkd.in/ghzhBPnm
  5. Apple Machine Learning Research - https://lnkd.in/gcJggDju
  6. Stripe Machine Learning for Fraud Detection - https://lnkd.in/gKS4F-V3
  7. Netflix Recommender System - https://lnkd.in/gm4pTRf4
  8. Uber AI - https://lnkd.in/gXA8UEBU
  9. X (aka Twitter) - https://lnkd.in/gNDME2-m
  10. Pinterest Ads Recommender - https://lnkd.in/g8QXRH3i
  11. Meta AI - https://lnkd.in/gGMp-Jh2
  12. Microsoft ML - https://lnkd.in/gm-aSSP9
  13. DoorDash Data Science and ML - https://lnkd.in/gHkDwpvC
  14. MongoDB AI - https://lnkd.in/g8c3HNaa
  15. Amazon Machine Learning Blog - https://lnkd.in/g2Q3ZmEh
  16. Grammarly NLP/ML - https://lnkd.in/guMDtPfW
  17. Spotify Machine Learning - https://lnkd.in/gGq7uj9g

  18. Data Science Interviews By Alexey Grigorev : https://lnkd.in/gdRwnWeJ
  19. Cheatsheet for Data Science by Austin Powell : https://lnkd.in/g5MSJrpD
  20. Cracking the DS interview by James Le: https://lnkd.in/geSVJmJR
  21. DS Questions and Answers by Youssef Hosni : https://lnkd.in/gA5ihxs5
  22. Coding and ML System Design by Alireza Dirafzoon: https://lnkd.in/gXFaaaQR

https://habr.com/ru/articles/859478/ not only transformers

Vector DB

https://habr.com/ru/articles/817173/

https://www.youtube.com/watch?v=035I2WKj5F0

https://habr.com/ru/companies/tochka/articles/809493/

https://swirlaiconnect.com/blog/using-vectors-without-a-vector-database

https://duckdb.org/2024/05/03/vector-similarity-search-vss.html

ML on graphs: https://storage.googleapis.com/xavierbresson/lectures/CS6208/lecture01_introduction.pdf

Theoretical foundations of all popular machine learning algorithms and their from-scratch implementation in Python https://habr.com/ru/articles/804605/

https://habr.com/ru/companies/yandex/articles/800945/ data quantization

Feature Selection Tutorial with Python Examples https://arxiv.org/pdf/2106.06437.pdf

https://arxiv.org/pdf/1905.12787.pdf The Theory Behind Overfitting, Cross Validation, Regularization, Bagging, and Boosting: Tutorial

https://smunshi.net/entropy-cross-entropy-and-kullback-leibler-divergence.html

Machine Learning Phystech https://www.youtube.com/@MachineLearningIS

Additional Chapters of Machine Learning, N. E. Karpachev https://www.youtube.com/watch?v=Cs5NuVseHxU

https://habr.com/ru/companies/otus/articles/805801/ Time series, clustering

ML Notebooks

https://github.com/ageron/handson-ml3

https://aman.ai/primers/ai/

Topological data analysis. Nikita Kalinin. https://www.youtube.com/playlist?list=PLKXEsFnBcT5BD47xO19UsKshsQ7DMU3Sa

https://habr.com/ru/articles/814981/ Ensemble learning

Inside the Machine Learning Interview: 151 Real Questions from FAANG

https://leanpub.com/insidethemachinelearninginterview/c/CyberMonday2023HugeSale

beginner-level projects in Machine Learning!

Mathematics of Big Data (4th-5th year students, fall 2022), A. V. Gasnikov

https://www.youtube.com/playlist?list=PL4_hYwCyhAvZIqYnUHqHf7g0G74gPOHo1

https://arxiv.org/abs/2308.10825 Topological data analysis

https://habr.com/ru/companies/otus/articles/773102/ bias, variance, etc

https://news.ycombinator.com/item?id=37137810 links

https://www.youtube.com/playlist?list=PLnu7tVik2MzLUv0wrYiBHOwbf7QpcWlIr

https://www.oreilly.com/library/view/machine-learning-with/9781098135713/ ML Cookbook, 2nd edition

https://docs.profiling.ydata.ai/ Data quality for Pandas and Spark dataframes

https://www.youtube.com/watch?v=6IGx7ZZdS74 Beginner Data Science Portfolio Project Walkthrough (Kaggle Titanic)

https://medium.com/@techlatest.net/exploratory-data-analysis-with-python-jupyter-notebook-a-tutorial-on-how-to-perform-exploratory-5a800791b04f

https://github.com/rougier/ML-Recipes

Steven Brunton book and code

https://www.youtube.com/@Eigensteve

https://faculty.washington.edu/sbrunton/DataBookV2.pdf

https://github.com/dynamicslab/

100 Data Science questions https://www.youtube.com/watch?v=BI-yjkRKymg

https://www.cs.toronto.edu/~duvenaud/cookbook/

https://www.cs.ubc.ca/~schmidtm/Courses/LecturesOnML/

https://onedrive.live.com/?authkey=%21AMC9DofT%5FW6hqE4&cid=199F8C87205FEB30&id=199F8C87205FEB30%21214006&parId=199F8C87205FEB30%21213972&o=OneUp

Example of dimensionality reduction with linear and nonlinear methods in Python:

PCA, ICA, TruncatedSVD, MDS (Multidimensional Scaling), t-SNE (t-Distributed Stochastic Neighbor Embedding), UMAP (Uniform Manifold Approximation and Projection)

https://habr.com/ru/articles/751050/

Cosine Similarity

https://en.wikipedia.org/wiki/Cosine_similarity
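
For quick reference, here is a minimal sketch of the cosine similarity formula, cos(a, b) = (a · b) / (‖a‖ ‖b‖), in plain Python. The function name `cosine_similarity` and the choice to raise on zero vectors are assumptions made for this illustration, not taken from the linked article.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # The angle to a zero vector is undefined; this sketch chooses to raise.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

The result depends only on direction, not magnitude: scaling either vector by a positive constant leaves the value unchanged.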

Graph NN

https://distill.pub/2021/gnn-intro/

aGrUM is a C++ library for graphical models. It is designed for easily building applications using graphical models such as Bayesian networks, influence diagrams, decision trees, GAI networks or Markov decision processes.

https://agrum.gitlab.io/

https://webia.lip6.fr/~phw/aGrUM/BookOfWhy/

EDA exploratory data analysis:

https://habr.com/ru/companies/otus/articles/752434/

Serving ML model https://news.ycombinator.com/item?id=32277894

https://kolodezev.ru/mlsysd1.html

https://www.mlexample.com/ . BOOK

https://arxiv.org/abs/2203.08890 The Mathematics of Artificial Intelligence

https://news.ycombinator.com/item?id=30984662

https://www.cs.ubc.ca/~schmidtm/Courses/LecturesOnML/ Pen and Paper in ML

https://www.youtube.com/user/sanshush

https://arxiv.org/abs/2206.13446 Pen and paper exercises in ML

https://machinelearningrecipes.com/blog/post/1687564/stochastic-processes-and-simulations

https://deepmind.com/learning-resources/reinforcement-learning-series-2021

https://www.youtube.com/watch?v=mLHIMulCHcM Andrew Ng’s New Machine Learning Specialization is Out!!!

Why do tree-based models still outperform deep learning on tabular data https://news.ycombinator.com/item?id=32333565

Differentiable programming

https://www.assemblyai.com/blog/differentiable-programming-a-simple-introduction/

https://news.ycombinator.com/item?id=31000709

Nikita Kalinin, topological data analysis

https://www.youtube.com/playlist?list=PLKXEsFnBcT5BD47xO19UsKshsQ7DMU3Sa

Feature Engineering: https://habr.com/ru/company/ruvds/blog/680498/

Lasso vs Ridge regression https://habr.com/ru/post/679232/

https://news.ycombinator.com/item?id=30417811 What are the most important statistical ideas of the past 50 years? (tandfonline.com)

High-dimensional ML:

https://towardsdatascience.com/high-dimensional-learning-ea6131785802

Monte-Carlo in Python https://towardsdatascience.com/how-to-create-a-monte-carlo-simulation-using-python-c24634a0978a

https://www.manning.com/books/deep-learning-with-python-second-edition Deep Learning with Python, 2nd ed

Real time ML https://huyenchip.com//2022/01/02/real-time-machine-learning-challenges-and-solutions.html

https://arxiv.org/abs/1709.02840 Introduction to Machine Learning

https://web.eecs.umich.edu/~jabernet/eecs598course/fall2015/web/

https://arxiv.org/abs/2004.09280 Towards a theory of machine learning. Vitaly Vanchurin

https://www.youtube.com/watch?v=LlKAna21fLE

https://news.ycombinator.com/item?id=30658324 ML tools

http://shamin.ru/

Lecture 1. R. V. Shamin. Stochastic analysis and its applications in machine learning https://www.youtube.com/playlist?list=PLUbD59ZHv1GSmj1ecP7ARU2LMqZgT_TyQ

https://thegradient.pub/

Fingerprint Matching in Python https://www.youtube.com/watch?v=IIvfqfKkiio

PCA https://news.ycombinator.com/item?id=30876293

UMAP

https://github.com/kmkolasinski/nano-umap

https://www.reddit.com/r/MachineLearning/comments/1gsjfq9/p_analysis_of_why_umap_is_so_fast/

https://habr.com/ru/articles/811437/ how PCA differs from UMAP and t-SNE

Automatic differentiation JAX

https://arxiv.org/pdf/2110.06209.pdf A Brief Introduction to Automatic Differentiation for Machine Learning

Automatic differentiation JAX https://arxiv.org/pdf/2111.00254.pdf

https://www.youtube.com/watch?v=WdTeDXsOSj4

https://deepmind.com/blog/article/using-jax-to-accelerate-our-research

WAX-ML

https://arxiv.org/pdf/2106.06524.pdf WAX-ML: A Python library for machine learning and feedback loops on streaming data

https://wax-ml.readthedocs.io/en/latest/

https://cdanielaam.medium.com/essential-mathematical-equations-for-predictive-models-fcb79630ec96

https://inria.github.io/scikit-learn-mooc/

https://github.com/orico/www.mlcompendium.com

https://github.com/blobcity/ai-seed 1000+ ready code templates to kickstart your next AI experiment

https://lazypredict.readthedocs.io/en/latest/readme.html

Spearman vs Pearson correlation:

https://medium.com/productive-data-science/spearman-coefficient-tool-for-a-generalized-correlation-analysis-d15b70d4ff1e

https://www.argmin.net/

Approximation algorithms (count distinct; count-min sketch for frequency counts)

https://datasketches.apache.org/ A software library of stochastic streaming algorithms

https://medium.com/inspiredbrilliance/fast-approximation-on-massive-datasets-dd23117bab7f
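
To make the frequency-count idea concrete, here is a minimal count-min sketch in plain Python. It is an illustrative sketch only, not the Apache DataSketches implementation: the class name `CountMinSketch`, the default `width` and `depth`, and the use of salted `hashlib.blake2b` digests as the row hash functions are all assumptions made for this example.

```python
import hashlib


class CountMinSketch:
    """Approximate frequency counter: estimates never undercount, but may overcount."""

    def __init__(self, width=1024, depth=4):
        self.width = width    # counters per row (assumed default)
        self.depth = depth    # number of rows, one hash function per row
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, item):
        # One salted hash per row; the salt makes the row hashes differ.
        data = str(item).encode("utf-8")
        for row in range(self.depth):
            digest = hashlib.blake2b(
                data, digest_size=8, salt=row.to_bytes(8, "little")
            ).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        for row, col in self._cells(item):
            self.table[row][col] += count

    def estimate(self, item):
        # Collisions only inflate counters, so the minimum is the tightest bound.
        return min(self.table[row][col] for row, col in self._cells(item))
```

Memory use is fixed at `width * depth` counters regardless of how many distinct items are seen; a wider table lowers the chance of collisions, and more rows lower the chance that every row collides for the same item.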

https://www.youtube.com/watch?v=6nqV58NA_Ew Adam Optimization from Scratch in Python

https://web.stanford.edu/class/cs168/

https://web.stanford.edu/class/cs168/l/

https://web.stanford.edu/class/cs168/l/l1.pdf

https://web.stanford.edu/class/cs168/l/l2.pdf

https://t.me/emeliml Emeli Dral on applied machine learning.

https://github.com/faridrashidi/kaggle-solutions Kaggle solutions

https://habr.com/ru/company/skillfactory/blog/561044/ . useful libs

Active learning

https://lilianweng.github.io/lil-log/2022/02/20/active-learning.html

https://habr.com/ru/post/592177/

https://habr.com/ru/post/593615/

Online machine learning (incremental learning, continual learning, and stream learning)

https://www.quantamagazine.org/the-computer-scientist-trying-to-teach-ai-to-learn-like-we-do-20220802

https://github.com/online-ml/river#readme

https://towardsdatascience.com/river-the-best-python-library-for-online-machine-learning-56bf6f71a403

https://towardsdatascience.com/7-cool-python-packages-kagglers-are-using-without-telling-you-e83298781cf4

https://causalnex.readthedocs.io/en/latest/ Causality vs correlation

https://towardsdatascience.com/summarize-pandas-data-frames-b9770567f940 Skimpy is a convenient way to generate quick summaries of any dataset, even without writing any code.

https://habr.com/ru/post/654907/ Gini index

kernels in ML https://pub.towardsai.net/types-of-kernels-in-machine-learning-291cf85fcdd0

https://numpy-ml.readthedocs.io/en/latest/

https://www.i-programmer.info/news/89-net/14846-free-resources-for-machine-learning.html

https://predictivehacks.com/10-tips-and-tricks-for-data-scientists-vol-10/

https://www.youtube.com/watch?v=RaTe3dhiqdE entropy, mutual info

https://www.toptal.com/algorithms/metropolis-hastings-bayesian-inference

https://huyenchip.com/ml-interviews-book/ INTERVIEW BOOK

https://rentruewang.github.io/learning-machine/intro.html BOOK

https://whitead.github.io/dmol-book

https://github.com/EdemGold/Nutshell-Machine-Learning

https://arxiv.org/abs/2010.03415 Knowledge based learning

https://github.com/r0f1/datascience

https://arxiv.org/pdf/2108.02497

https://habr.com/ru/company/otus/blog/573924/

https://github.com/dair-ai/ML-YouTube-Courses

https://dataelixir.com/

https://github.com/machow/siuba Python library for using dplyr like syntax with pandas and SQL

Kalman filter

https://towardsdatascience.com/kalman-filter-in-a-nutshell-e66154a06862

Geometric machine learning:

https://towardsdatascience.com/geometric-foundations-of-deep-learning-94cdd45b451d

https://www.youtube.com/watch?v=w6Pw4MOzMuo

https://news.ycombinator.com/item?id=27577467 ML beyond curve fitting

Confusion matrix, Accuracy, Precision, Recall, F-score, ROC-AUC

https://habr.com/ru/articles/821547/

https://habr.com/ru/articles/820411/ Confusion matrix, Accuracy, Precision, Recall, F-score, ROC-AUC

https://github.com/phongsathorn1/pretty-confusion-matrix Confusion matrix

https://en.wikipedia.org/wiki/Receiver_operating_characteristic ROC curve

https://towardsdatascience.com/a-graphical-explanation-of-roc-and-auc-183705caeb27

https://towardsdatascience.com/understanding-roc-curves-c7f0b52e931e Understanding ROC Curves with Python

https://habr.com/ru/company/netologyru/blog/582756/

Dimensionality reduction

https://towardsdatascience.com/dimensionality-reduction-explained-5ae45ae3058e Dimensionality reduction

https://featuretools.alteryx.com/en/stable Feature extraction

https://arxiv.org/pdf/2106.06437.pdf Feature selection

https://stackabuse.com/random-projection-theory-and-implementation-in-python-with-scikit-learn/

Spearman’s rank correlation coefficient

https://stackabuse.com/calculating-spearmans-rank-correlation-coefficient-in-python-with-pandas/

http://creatingdata.us/techne/deep_scatterplots/# Zoomable scatterplot

AutoML

https://habr.com/ru/articles/811425/

https://habr.com/ru/post/559130/

Data Driven Causal Relationship Discovery with Python Example Code https://pkghosh.wordpress.com/2021/05/25/data-driven-causal-relationship-discovery-with-python-example-code/

https://habr.com/ru/company/otus/blog/559666/ Kaggle tricks

Accuracy, Precision, Recall, F1-score

https://habr.com/ru/articles/775032/
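
As a quick reminder of how these four metrics relate to the confusion-matrix counts, here is a minimal sketch for binary labels. The function name `classification_metrics`, the `positive` parameter, and the convention of returning 0.0 when a denominator is zero are assumptions made for this illustration.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 from true and predicted binary labels."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1   # true positive
        elif pred == positive:
            fp += 1   # false positive
        elif truth == positive:
            fn += 1   # false negative
        else:
            tn += 1   # true negative

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

Precision answers "of everything predicted positive, how much was right", recall answers "of everything actually positive, how much was found", and F1 balances the two.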

Transformers

https://news.ycombinator.com/item?id=29617087

https://habr.com/ru/company/skillfactory/blog/562928/

https://habr.com/ru/company/wunderfund/blog/592231/

https://habr.com/ru/post/558488/

https://habr.com/ru/post/563778/

https://habr.com/ru/company/wunderfund/blog/594333/

Hardware

https://semiengineering.com/neural-networks-without-matrix-math/

https://semiengineering.com/developers-turn-to-analog-for-neural-nets/

The world of statistical hypotheses

https://habr.com/ru/post/558836/

https://habr.com/ru/post/556856/ Statistics with Python

SHAP - explainable AI

https://christophm.github.io/interpretable-ml-book/

https://www.youtube.com/watch?v=pqNCD_5r0IU Scikit-Learn Course

https://thedatasciencedigest.substack.com/p/python-data-science-digest-may-2021

Welford’s online algorithm

https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Welford’s_online_algorithm

https://natural-blogarithm.com/post/variance-welford-vs-numpy/
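
For reference, a minimal sketch of Welford's single-pass update for the running mean and variance. The class name `RunningStats` and its method names are assumptions made for this illustration; the update rule itself is the standard one described on the Wikipedia page linked above.

```python
class RunningStats:
    """Running mean and variance computed in one pass (Welford's method)."""

    def __init__(self):
        self.n = 0        # number of values seen so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean            # deviation from the old mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)   # uses the updated mean

    def variance(self, sample=True):
        # sample=True divides by n - 1, sample=False divides by n.
        denom = self.n - 1 if sample else self.n
        if denom <= 0:
            raise ValueError("not enough data points")
        return self.m2 / denom
```

Unlike the naive "sum of squares minus square of sum" formula, this update avoids subtracting two large, nearly equal numbers, which is why it behaves better numerically on streaming data.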

https://www.youtube.com/watch?v=ylytZegK–I

https://www.youtube.com/watch?v=IqT551LjKHw

https://www.facebook.com/pythonposts/

https://explained.ai/regularization/index.html Regularization

https://www.youtube.com/watch?v=tX_MeIbfEmw K. V. Vorontsov, "Overview of optimization problem formulations in machine learning"

https://arxiv.org/abs/1803.08823 A high-bias, low-variance introduction to Machine Learning for physicists

Maximum likelihood

https://habr.com/ru/company/otus/blog/585610/

https://towardsdatascience.com/maximum-likelihood-vs-bayesian-estimation-dd2eb4dfda8a

https://ggcarvalho.dev/posts/montecarlo/ Monte Carlo with Go

https://habr.com/ru/company/otus/blog/555980/ stat paradox

Linear regression

https://www.youtube.com/watch?v=EMIyRmrPWJQ

https://habr.com/ru/articles/850168/

https://aegis4048.github.io/mutiple_linear_regression_and_visualization_in_python . Linear regression

Fourier

https://sidsite.com/posts/fourier-nets/

https://news.ycombinator.com/item?id=26980169

Max Welling

https://en.wikipedia.org/wiki/Max_Welling

https://www.youtube.com/watch?v=mmDw5glry9w

https://www.reddit.com/r/MachineLearning/comments/mwwftu/d_your_favorite_ai_podcasts_blogs_newsletters/

R. V. Shamin's course "Machine Learning and Artificial Intelligence in Mathematics and Applications"

http://ai.lector.ru/

http://www.mathnet.ru/conf1243

https://github.com/rwsh

Simulated annealing (ru) https://www.math.spbu.ru/user/gran/sb1/lopatin.pdf

Mikhail Belkin https://www.youtube.com/watch?v=yPwCb12V0Mk

https://www.theinsaneapp.com/2020/12/machine-learning-and-data-science-cheat-sheets-pdf.html

https://www.alexpghayes.com/blog/many-models-workflows-in-python-part-i/

https://www.youtube.com/watch?v=7inArpm-83U Interview

https://theblog.github.io/post/from-tensorflow-to-pytorch/ PyTorch for TensorFlow users

https://twitter.com/icymi_py Python data science

https://github.com/SimonBlanke/Hyperactive collection of optimization algorithms that can be used for a variety of optimization problems.

Gradient-free algorithms

https://github.com/SimonBlanke/Gradient-Free-Optimizers

https://news.ycombinator.com/item?id=26293171

https://habr.com/ru/post/549376 ROC curve

Tidy data

https://vita.had.co.nz/papers/tidy-data.pdf

https://pbeshai.github.io/tidy/

https://uwdata.github.io/arquero/

Stat

http://www.machinelearning.ru/wiki/images/7/7c/SMAIS11_MCMC.pdf

https://habr.com/ru/post/455762/ Markov

https://habr.com/ru/company/skyeng/blog/473124/

http://www.randomservices.org/random/ Markov chain etc

https://www.youtube.com/watch?v=i3AkTO9HLXo&list=PLM8wYQRetTxBkdvBtz-gw8b9lcVkdXQKV Markov chain

http://www.stat.columbia.edu/~gelman/research/unpublished/stat50.pdf

https://news.ycombinator.com/item?id=26374788

https://mixtape.scunning.com/

https://mlfromscratch.com/model-stacking-explained/#/

Educational projects

https://habr.com/ru/post/562640/

https://github.com/rushter/MLAlgorithms

https://github.com/trekhleb/homemade-machine-learning/

https://github.com/Gautam-J/Machine-Learning

https://pypi.org/project/sealion/

https://habr.com/ru/post/541742/ Image processing

Apache Spark

https://sparkbyexamples.com/h2o-sparkling-water/install-running-sparkling-water-on-mac-os/

brew install apache-spark

H2O

https://docs.h2o.ai/h2o/latest-stable/h2o-docs/downloading.html

version 3.32.0.4

https://anaconda.org/h2oai/h2o

conda install -c h2oai h2o

https://pypi.org/project/h2o/

H2O Web GUI

java -jar h2o.jar

http://localhost:54321

https://docs.h2o.ai/h2o/latest-stable/h2o-docs/flow.html

https://towardsdatascience.com/h2o-for-inexperienced-users-7bc064124264

https://www.coursera.org/learn/machine-learning-h2o

Time series example https://www.h2o.ai/products-dai-timeseries/

https://youtu.be/0pvvDHfxdZ8

https://github.com/SeanPLeary/time-series-h2o-automl-example/blob/master/h2o_automl_example_with_multivariate_time_series.ipynb

https://stackoverflow.com/questions/56666876/how-to-predict-future-values-of-time-series-using-h2o-predict

Books

https://www.confetti.ai/assets/ml-primer/ml_primer.pdf Primer

Book: Dive into Deep learning https://d2l.ai/d2l-en.pdf

https://www.deeplearningbook.org/

Book: Computer Age Statistical Inference

https://web.stanford.edu/~hastie/CASI_files/PDF/casi.pdf

Book: https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf

https://www.microsoft.com/en-us/research/people/cmbishop/

Book: Probabilistic Machine Learning: An Introduction. by Kevin Patrick Murphy. MIT Press, 2021. https://probml.github.io/pml-book/book1.html

Book: Foundations of Data Science

http://www.cs.cornell.edu/jeh/book%20no%20so;utions%20March%202019.pdf

Book: Data Science in Production

https://levelup.gitconnected.com/book-launch-data-science-in-production-54b325c03818

https://github.com/bgweber/DS_Production

https://mlpowered.com/book/ Book

Book: https://deeplearningsystems.ai/

https://marksaroufim.medium.com/the-robot-overlord-manual-d4ee709155bc

https://www.youtube.com/channel/UCh8IuVJvRdporrHi-I9H7Vw Unfold Data Science

Linear models: https://www.youtube.com/watch?v=68ABAU_V8qI

Best ML Blogs: https://bloggingfordevs.com/machine-learning-blogs/

https://habr.com/ru/company/skillbox/blog/540940/ Data Scientist job interview at Amazon

500+ Artificial Intelligence Project List with code: https://github.com/ashishpatel26/500-AI-Machine-learning-Deep-learning-Computer-vision-NLP-Projects-with-code

https://github.com/gugarosa/opytimizer

https://www.mindsdb.com/ MindsDB provides a simple way to create, train and test ML models and then publish them as virtual AI-Tables into databases.

Integrate seamlessly with most databases on the market. Use SQL queries for all manipulation of ML models. Improve model training speed with GPU without affecting your database performance.

https://betterexplained.com/articles/intuitive-convolution/ Convolution

https://rubikscode.net/2020/11/15/top-9-feature-engineering-techniques/

https://madewithml.com/

Jupyter notebooks for "The Elements of Statistical Learning" https://github.com/maitbayev/the-elements-of-statistical-learning

Desktop applications for viewing and analyzing tabular data

https://moosetechnology.org/

https://news.ycombinator.com/item?id=24983603

visidata.org

https://github.com/antonycourtney/tad Tad

https://towardsdatascience.com/introduction-to-d-tale-5eddd81abe3f D-Tale

https://towardsdatascience.com/4-libraries-that-can-perform-eda-in-one-line-of-python-code-b13938a06ae

Pandas-Profiling
Sweetviz
Autoviz
D-Tale

https://towardsdatascience.com/drag-and-drop-tools-for-machine-learning-pipelines-worth-a-try-63ace4a18715

Knime
Orange

ML

https://machinelearningmastery.com/calculate-feature-importance-with-python/

https://towardsdatascience.com/from-linear-regression-to-ridge-regression-the-lasso-and-the-elastic-net-4eaecaf5f7e6

https://habr.com/ru/company/skillfactory/blog/524722/ List of useful links

https://habr.com/ru/company/recognitor/blog/524980/

https://habr.com/ru/company/skillfactory/blog/525512/

https://habr.com/ru/company/skillbox/blog/525784/

Julia ML

https://sciml.ai/

https://github.com/tirthajyoti/Papers-Literature-ML-DL-RL-AI

Eigenvectors/eigenvalues: NumPy + Matplotlib https://www.paepper.com/blog/posts/eigenvectors_eigenvalues_machine_learning/

https://dafriedman97.github.io/mlbook/content/introduction.html book

Decision Tree

https://habr.com/ru/company/skillfactory/blog/526970/

https://habr.com/ru/post/526460/

https://habr.com/ru/post/520204/ Decision Tree

https://habr.com/ru/company/productstar/blog/523044/ Decision Tree

https://datalore.jetbrains.com/ free online Jupyter notebook from JetBrains

https://araza6.github.io/posts/autodiff/autodiff/ autodiff

https://www.youtube.com/channel/UCwBs8TLOogwyGd0GxHCp-Dw AIEngineering

https://www.youtube.com/channel/UCts-XMcexTiPSR8QbyRGFxA ML

https://leimao.github.io/article/

Categorical encoding

https://habr.com/ru/post/666234/

https://towardsdatascience.com/stop-one-hot-encoding-your-categorical-variables-bbb0fba89809 One-Hot Encoding

Amazon ML classes: https://www.youtube.com/channel/UC12LqyqTQYbXatYS9AA7Nuw/playlists

https://github.com/aws-samples/aws-machine-learning-university-accelerated-nlp

https://github.com/search?q=org%3Aaws-samples+%22aws-machine-learning%22

https://news.ycombinator.com/item?id=23901729 ML in physics

https://arxiv.org/pdf/2002.04803v2.pdf ML and Python

https://github.com/tirthajyoti/Machine-Learning-with-Python

https://github.com/pycaret/pycaret PyCaret

https://news.ycombinator.com/item?id=24671525 igel (like PyCaret)

https://twitter.com/ComputingByArts Michael Bukatin

https://machine-learning-with-python.readthedocs.io/en/latest/

https://github.com/khuyentran1401/Data_science_on_Medium

https://www.youtube.com/watch?v=bVQUSndDllU

https://libradocs.github.io/ Libra Fully Automated Machine Learning in One-Liners

https://github.com/KartikChugh/Otto

https://www.youtube.com/watch?v=Ozo6hkOaqPk working with datasets

Clustering algos: https://link.springer.com/content/pdf/10.1007/s40745-015-0040-1.pdf

https://habr.com/ru/company/skillfactory/blog/509212/ Good Links

https://github.com/mljar/mljar-supervised

https://habr.com/ru/company/leroy_merlin/blog/511792/ featuretools

https://www.meetup.com/Scala-Bay/events/271129752/ Kirpichev Google

https://arxiv.org/abs/1911.01547 On the Measure of Intelligence François Chollet

https://www.amazon.com/Practical-Deep-Learning-Cloud-Mobile-ebook/dp/B07Z7957PL/

https://arxiv.org/abs/2003.01384 Self-Supervised Object-Level Deep Reinforcement Learning

https://habr.com/ru/post/505516/ First ML model with scikit-learn

https://news.ycombinator.com/item?id=22769319

https://explained.ai/regularization/index.html

https://jaydaigle.net/blog/overview-of-bayesian-inference/

https://dmm.dreamwidth.org/23855.html

https://www.edgeimpulse.com/blog/dsp-key-embedded-ml DSP for ML

https://amitness.com/2020/03/fixmatch-semi-supervised/

https://habr.com/ru/post/491010/

https://habr.com/ru/post/491326/ Data classification

https://blog.insightdatascience.com/bias-variance-tradeoff-explained-fa2bc28174c4 bias-variance explained

https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/

https://jacobgil.github.io/deeplearning/activelearning

https://habr.com/ru/company/otus/blog/497770/ PyCaret, a machine learning library in Python

https://artint.info/2e/index.html BOOK

https://mlcourse.ai/ Russian

https://dlcourse.ai/ Russian

https://ods.ai/ Russian

https://theaisummer.com/Graph_Neural_Networks/ Graph NN

https://prodi.gy/ . Prodigy is a scriptable annotation tool so efficient that data scientists can do the annotation themselves, enabling a new level of rapid iteration.

https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/overview

https://www.ahmedbesbes.com/blog/end-to-end-machine-learning

https://www.sicara.ai/blog/

https://habr.com/ru/post/479398/ Bringing the linear regression equation into matrix form
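
For quick reference, the standard matrix form of ordinary least squares is sketched below. The notation is the common textbook one and is an assumption here, not taken from the linked article: n observations, p features, a design matrix X with a leading column of ones, and an invertible X^T X.

```latex
\begin{aligned}
y_i &= w_0 + w_1 x_{i1} + \dots + w_p x_{ip} + \varepsilon_i, \quad i = 1, \dots, n \\
\mathbf{y} &= X\mathbf{w} + \boldsymbol{\varepsilon}, \qquad X \in \mathbb{R}^{n \times (p+1)} \\
\hat{\mathbf{w}} &= \arg\min_{\mathbf{w}} \lVert \mathbf{y} - X\mathbf{w} \rVert_2^2 = (X^\top X)^{-1} X^\top \mathbf{y}
\end{aligned}
```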

https://lagunita.stanford.edu/courses/HumanitiesSciences/StatLearning/Winter2016/about

https://practicalai.me/ A practical approach to machine learning.

https://jameskle.com/writes/rec-sys-part-3 recommendation systems

https://news.ycombinator.com/item?id=21710863 TF vs PyTorch

https://www.reddit.com/r/MachineLearning/comments/dgog2h/d_why_is_l2_preferred_over_l1_regularization/

https://news.ycombinator.com/item?id=21158487

https://streamlit.io/ . streamlit

https://habr.com/ru/post/473196/ streamlit

https://ahmedbesbes.com/end-to-end-ml.html

https://github.com/firmai/awesome-google-colab/blob/master/README.md . Google’s Colab

https://hn.algolia.com/?q=colab Google’s Colab

Collecting and scraping customer reviews data using Selenium and Scrapy
Training a deep learning sentiment classifier on this data using PyTorch
Building an interactive web app using Dash
Setting up a REST API and a Postgres database
Dockerizing the app using Docker Compose
Deploying to AWS

https://mml-book.github.io/

https://news.ycombinator.com/item?id=21293132 math for machine learning

https://observablehq.com/

https://github.com/fritzlabs/Awesome-Mobile-Machine-Learning ML on mobile devices

https://habr.com/ru/post/460557/

https://www.reddit.com/r/learnmachinelearning/comments/be9e7j/post_all_machine_learning_courses_here/

https://eli5.readthedocs.io/en/latest/index.html . ELI5: explains and visualizes predictions of different ML models

https://uber.github.io/ludwig/ . Ludwig: a toolbox for training and testing deep learning models without writing code.

https://github.com/genular/simon-frontend . https://genular.org/

AutoML

https://github.com/microsoft/FLAML from Microsoft

https://github.com/mljar/mljar-supervised

https://github.com/openml/automlbenchmark

https://habr.com/ru/company/otus/blog/525292/ . pip install autoviml

https://habr.com/ru/company/jetinfosystems/blog/485232/

https://www.ahmedbesbes.com/blog/introduction-to-mlbox

AutoML basic project: https://github.com/minimaxir/automl-gs given X predict Y

https://medium.com/georgian-impact-blog/choosing-the-best-automl-framework-4f2a90cb1826

https://libradocs.github.io/ Libra Fully Automated Machine Learning in One-Liners

https://github.com/KartikChugh/Otto

Using grid search for hyperparameter optimization: grid search is known to perform worse than random search when not all hyperparameters are of similar importance.
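
A minimal sketch of why this happens, using only the standard library. The objective `score`, the hyperparameter names `lr` and `momentum`, and the optimum at 0.37 are all made up for illustration. With the same budget of 9 trials, a 3 x 3 grid tries only 3 distinct values of the one hyperparameter that matters, while random search tries 9.

```python
import random

def score(lr, momentum):
    # Toy objective (an assumption for illustration): only `lr` matters,
    # with the best value at lr = 0.37; `momentum` has no effect at all.
    return -(lr - 0.37) ** 2

def grid_search(points_per_axis):
    # Evenly spaced values in [0, 1] for each hyperparameter.
    axis = [i / (points_per_axis - 1) for i in range(points_per_axis)]
    trials = [(lr, m) for lr in axis for m in axis]
    return max(trials, key=lambda t: score(*t)), trials

def random_search(n_trials, seed=0):
    # Sample both hyperparameters uniformly from [0, 1).
    rng = random.Random(seed)
    trials = [(rng.random(), rng.random()) for _ in range(n_trials)]
    return max(trials, key=lambda t: score(*t)), trials

best_grid, grid_trials = grid_search(3)    # 3 x 3 = 9 trials
best_rand, rand_trials = random_search(9)  # same budget of 9 trials

# Count how many different values of the important hyperparameter were tried.
distinct_lr_grid = len({lr for lr, _ in grid_trials})
distinct_lr_rand = len({lr for lr, _ in rand_trials})

print("distinct lr values: grid =", distinct_lr_grid, "random =", distinct_lr_rand)
print("best lr found: grid =", best_grid[0], "random =", best_rand[0])
```

The grid spends two thirds of its budget re-testing the same `lr` values with a different, irrelevant `momentum`; random search does not have that redundancy.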

Automatic Machine Learning: Methods, Systems, Challenges; Chapter Hyperparameter Optimization:

https://www.automl.org/wp-content/uploads/2018/11/hpo.pdf

https://habr.com/ru/company/mailru/blog/445530/

Awesome Machine Learning https://github.com/josephmisiti/awesome-machine-learning

An impressive list of frameworks, libraries, and software, organized by language and category (computer vision, natural language processing, etc.). The repository also lists free machine learning books, (mostly) free machine learning courses, and data science blogs.

Scikit-learn https://github.com/scikit-learn/scikit-learn

A Python module for machine learning, developed since 2007 and built on SciPy, NumPy, and Matplotlib. Distributed under the BSD 3-Clause license. Scikit-learn is a general-purpose toolkit that contains classification, regression, and clustering algorithms, as well as methods for data preparation and model evaluation.
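
A minimal sketch of that workflow (data preparation, classification, evaluation), assuming scikit-learn is installed. The dataset, model, and split parameters are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load a small built-in dataset and hold out 25% of it for evaluation.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Data preparation (scaling) and the classifier chained into one estimator.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Model evaluation on the held-out split.
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"test accuracy: {accuracy:.3f}")
```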

PredictionIO https://github.com/PredictionIO/PredictionIO

An open-source machine learning framework that supports event collection, algorithm deployment, evaluation, and templates for common tasks such as classification and recommendation. It plugs into existing applications through a REST API or SDK. PredictionIO is built on scalable open-source services such as Hadoop, HBase (and other databases), Elasticsearch, and Spark.

Dive Into Machine Learning https://github.com/hangtwenty/dive-into-machine-learning

Material for beginners. The repository contains a collection of IPython tutorials for Scikit-learn, which implements a large number of machine learning algorithms, along with links to Python-related machine learning topics and more general data analysis material. The author also links to many other tutorials on the subject.

Pattern https://github.com/clips/pattern

A Python web mining module with tools for data mining, natural language processing (part-of-speech tagging, n-gram search, sentiment analysis, WordNet), machine learning, network analysis, and visualization. It is developed and well documented by the Computational Linguistics and Psycholinguistics research center at the University of Antwerp (Belgium). The repository includes more than 50 usage examples.

GoLearn https://github.com/sjwhitworth/golearn

An actively developed machine learning library for Go. It offers developers a full-featured, easy-to-use, and easily customizable package. GoLearn implements the Scikit-learn-style training interface that many will find familiar.

Vowpal Wabbit https://github.com/JohnLangford/vowpal_wabbit

Vowpal Wabbit pushes the boundaries of machine learning with techniques such as hashing, allreduce, learning2search, and active and interactive learning. It is aimed at fast modeling of massive datasets and supports parallel learning. There is a particular focus on reinforcement learning, with several contextual bandit algorithms implemented.

Aerosolve https://github.com/airbnb/aerosolve

aerosolve tries to set itself apart from other libraries by focusing on human-friendly debugging tools, Scala code for training, an image content analysis engine suitable for ranking, and flexibility and control over features. The library is meant to be used with sparse, interpretable features such as those that commonly occur in search (search keywords, filters) or pricing (number of rooms, location, price).

Code for Machine Learning for Hackers https://github.com/johnmyleswhite/ML_for_Hackers

A companion repository for the book "Machine Learning for Hackers", with all code written in R, a language designed for statistical data processing (effectively the standard for statistical software) and graphics. It makes use of numerous R packages. Topics covered include general classification, ranking, and regression problems, as well as statistical procedures such as principal component analysis and multidimensional scaling.

https://habr.com/ru/company/skillfactory/blog/510420/ Automatic feature extraction

IPython (Jupyter) Notebooks

A list of useful GitHub repositories made up of IPython (Jupyter) notebooks focused on working with data and machine learning.

https://github.com/donnemartin/data-science-ipython-notebooks

https://github.com/andypetrella/spark-notebook Spark Notebook

Python Machine Learning Book https://github.com/rasbt/python-machine-learning-book

Companion repository for the first edition of the book "Python Machine Learning" (the second edition has its own repository). The book covers handling missing values, converting categorical variables into formats suitable for machine learning, selecting informative features, and compressing data by projecting it onto lower-dimensional subspaces.

Example Data Science Notebook https://github.com/rhiever/Data-Analysis-and-Machine-Learning-Projects/blob/master/example-data-science-notebook/Example%20Machine%20Learning%20Notebook.ipynb

A repository of teaching materials, code, and data for various data analysis and machine learning projects. The notebook covers the basic principles of data analysis using the Iris dataset and is an excellent illustration of how to build a data science workflow. Its core guidelines are drawn from the book "The Elements of Data Analytic Style" (Jeff Leek, 2015).

Learn Data Science https://github.com/nborwankar/LearnDataScience

A collection of notebooks and datasets covering four algorithmic topics: linear regression, logistic regression, random forests, and K-Means clustering. Learn Data Science is based on material created for the Open Data Science Training project.

IPython Notebooks https://github.com/jdwittenauer/ipython-notebooks

The repository contains a variety of IPython notebooks, ranging from an overview of the language and IPython functionality to examples of using popular data analysis libraries. It includes an extensive collection of material on machine learning, deep learning, and big data frameworks from the courses "Machine Learning" by Andrew Ng (Coursera), "Intro to TensorFlow for Deep Learning" (Udacity), and "Spark" (edX).

Scikit-learn Tutorial https://github.com/jakevdp/sklearn_tutorial

A repository for learning Scikit-learn, a library that implements a large number of machine learning algorithms for both supervised and unsupervised learning. Scikit-learn is built on top of SciPy (Scientific Python).

Machine Learning https://github.com/masinoa/machine_learning

A series of very detailed IPython Notebook tutorials based on material from Andrew Ng's machine learning course (Stanford University), Tom Mitchell's course (Carnegie Mellon University), and Christopher M. Bishop's book "Pattern Recognition and Machine Learning". https://www.amazon.com/Pattern-Recognition-Learning-Information-Statistics/dp/0387310738