STOCHASTIC GRADIENT EXPLAINED https://habr.com/ru/companies/airi/articles/883266/
https://www.bishopbook.com/
Free book: An Introduction to Statistical Learning with Python: https://www.statlearning.com/
https://arxiv.org/abs/2502.05244 Probabilistic Artificial Intelligence
https://www.amazon.com/Probabilistic-Robotics-INTELLIGENT-ROBOTICS-AUTONOMOUS/dp/0262201623
https://www.amazon.com/dp/3031064682/ Robotics Vision and Control with Python
Automatic EDA https://habr.com/ru/companies/gazprombank/articles/881386/
https://habr.com/ru/companies/yandex_praktikum/articles/879316/ EDA
https://github.com/NeKonnnn/Exploratory_Data_Analysis
https://www.youtube.com/playlist?list=PL4_hYwCyhAvaprlx_MnGC5xyKR1qb7Y7L Mathematics of Big Data
https://www.llm-book.com/ Book
https://www.amazon.com/_/dp/149204552 Book Deep Learning for Coders with fastai and PyTorch
https://www.amazon.com/dp/1108415199/ref=sspa_dk_detail_1 Roman Vershynin. High-Dimensional Probability: An Introduction with Applications in Data Science
mlcourse.ai/
https://news.ycombinator.com/item?id=42827913 Good links!
https://habr.com/ru/articles/870718/ Naive Bayes classifier: theory and implementation from scratch
https://p.migdal.pl/blog/2025/01/dont-use-cosine-similarity
100+ LLM Interview Questions for Top Companies https://github.com/llmgenai/LLMInterviewQuestions/tree/main
Mathematics for machine learning book https://mml-book.github.io/ https://course.ccs.neu.edu/ds4420sp20/readings/mml-book.pdf
Machine learning in production https://mlip-cmu.github.io/book/
https://karpov.courses/ml-hard Hardcore Machine Learning
Book: Machine Learning System Design, Valerii Babushkin and Arseny Kravchenko https://www.manning.com/books/machine-learning-system-design
Designing Machine Learning Systems: An Iterative Process for Production-Ready Applications. Chip Huyen https://www.amazon.com/Designing-Machine-Learning-Systems-Production-Ready/dp/1098107969
Book https://github.com/abhishekkrthakur/approachingalmost/blob/master/AAAMLP.pdf
https://www.youtube.com/@abhishekkrthakur
https://habr.com/ru/companies/otus/articles/869372/
Compilers for deep neural network models:
https://hpc-education.unn.ru/%D0%BE%D0%B1%D1%83%D1%87%D0%B5%D0%BD%D0%B8%D0%B5/%D0%BA%D1%83%D1%80%D1%81%D1%8B/%D0%B1%D0%B0%D0%BA%D0%B0%D0%BB%D0%B0%D0%B2%D1%80%D0%B8%D0%B0%D1%82/%D0%BA%D0%BE%D0%BC%D0%BF%D0%B8%D0%BB%D1%8F%D1%82%D0%BE%D1%80%D1%8B-%D0%B4%D0%BB%D1%8F-%D0%B3%D0%BB%D1%83%D0%B1%D0%BE%D0%BA%D0%B8%D1%85-%D0%BC%D0%BE%D0%B4%D0%B5%D0%BB%D0%B5%D0%B9
https://poloclub.github.io/transformer-explainer/ Transformer explained
“FDTool: a Python application to mine for functional dependencies and candidate keys in tabular data”, Matt Buranosky https://habr.com/ru/articles/866718/
https://pypi.org/project/fdtool/
https://hpi.de/naumann/projects/repeatability/data-profiling/fds.html
https://hpi.de/naumann/projects/repeatability/data-profiling.html
https://parlance-labs.com/education/
https://github.com/chiphuyen/aie-book/blob/main/resources.md
https://rish-01.github.io/blog/posts/ml_estimation/ Loss function
https://stats.stackexchange.com/questions/357963/what-is-the-difference-between-cross-entropy-and-kl-divergence
How Attention really works https://habr.com/ru/companies/oleg-bunin/articles/865856/
5 ways to install and natively use ChatGPT on Mac computers: https://habr.com/ru/companies/x-com/articles/865858/
https://www.reddit.com/r/MachineLearning/comments/1gwbhxq/d_next_big_thing_in_time_series/
https://github.com/thuml/Time-Series-Library
https://www.reddit.com/r/MachineLearning/comments/1gujfj2/d_whats_the_most_surprising_or_counterintuitive/
https://sites.google.com/view/datascience-cheat-sheets
https://www.kdnuggets.com/7-free-cloud-ide-for-data-science-that-you-are-missing-out
Book: Essential Math for AI: Next‑Level Mathematics for Efficient and Successful AI Systems
https://habr.com/ru/companies/raft/articles/851548/ Autoencoders (in Russian)
https://dokumen.pub/essential-math-for-ai-next-level-mathematics-for-efficient-and-successful-ai-systems-1nbsped-1098107632-9781098107635.html
https://www.justinmath.com/books/
https://physicsbaseddeeplearning.org/intro.html . BOOK
https://github.com/graviraja/MLOps-Basics
http://neuralnetworksanddeeplearning.com/
https://course.fast.ai/ Practical Deep Learning for coders
https://huyenchip.com/ml-interviews-book/
https://www.linkedin.com/groups/961087 Machine Learning Group
https://arxiv.org/abs/2201.00650 Deep Learning Interviews: Hundreds of fully solved job interview questions from a wide range of key topics in AI
https://arxiv.org/pdf/2207.10185.pdf Modern Stat Learning Book
https://habr.com/ru/articles/814343/ What is Kernel
ML with tabular data: https://news.ycombinator.com/item?id=41072616
https://habr.com/ru/articles/829336/ Bootstrap
https://github.com/owainlewis/awesome-artificial-intelligence
https://arxiv.org/pdf/1912.13213.pdf Modern Online learning
https://habr.com/ru/articles/783766/ interview
https://habr.com/ru/companies/megafon/articles/800919/ interview
https://jeroenjanssens.com/dsatcl/ data science from command-line
https://www.evidentlyai.com/ml-system-design
https://habr.com/ru/articles/800973/ data preparation for ML
https://github.com/ml-tooling/best-of-ml-python
https://thecleverprogrammer.com/2023/07/15/machine-learning-projects-using-python/
https://habr.com/ru/users/egaoharu_kensei/publications/articles/
https://stepik.org/course/68260/promo Free course
https://habr.com/ru/companies/raft/articles/811371/ error backpropagation
math behind AlexNet https://towardsdatascience.com/the-math-behind-deep-cnn-alexnet-738d858e5a2f
https://nuancesprog.ru/computer-science/
https://habr.com/ru/articles/813221/ Optimization methods in machine and deep learning.
https://www.youtube.com/watch?v=yyHP4ySDOeg L. M. Mestetskiy | Lecture 12.1, Image Processing and Recognition, spring 2024 | CMC MSU
https://www.youtube.com/watch?v=lEGR3u2SWKk K. V. Vorontsov | Lecture 27, Machine Learning Methods, spring 2024 | CMC MSU
Additional Chapters of Machine Learning, N. E. Karpachev, lecture 1, 05.02.2024 https://www.youtube.com/watch?v=Cs5NuVseHxU&list=PLti61wgkUWHyhCM4jK0ktwyDGN8XiGgHv&index=1
Automatic model retraining in production https://habr.com/ru/companies/alfa/articles/821447/
Free Stanford University machine learning lectures (probability, NLP, LLMs, Transformers, and more):
Probability for Computer Scientists - https://lnkd.in/e6sCyZGj
Machine Learning Full Course taught by Andrew Ng - https://lnkd.in/eWs74qyR
NLP with Deep Learning - https://lnkd.in/eazqcvmk
Machine Learning Explainability - https://lnkd.in/evimZ5Za
Reinforcement Learning - https://lnkd.in/eEf5PETJ
Deep Generative Models - https://lnkd.in/euZ2e3xU
Building Large Language Models (LLMs) - https://lnkd.in/eVUkaJuF
Machine Learning with Graphs - https://lnkd.in/eF_d3iwq
Transformers United - https://lnkd.in/eXdGBqQq
https://www.kdnuggets.com/10-github-repositories-to-master-data-engineering
Spotify Machine Learning - https://lnkd.in/gGq7uj9g
https://habr.com/ru/articles/859478/ not only transformers
https://habr.com/ru/articles/817173/
https://www.youtube.com/watch?v=035I2WKj5F0
https://habr.com/ru/companies/tochka/articles/809493/
https://swirlaiconnect.com/blog/using-vectors-without-a-vector-database
https://duckdb.org/2024/05/03/vector-similarity-search-vss.html
ML on graphs: https://storage.googleapis.com/xavierbresson/lectures/CS6208/lecture01_introduction.pdf
Theoretical foundations of all popular machine learning algorithms and their from-scratch implementation in Python https://habr.com/ru/articles/804605/
https://habr.com/ru/companies/yandex/articles/800945/ data quantization
Feature Selection Tutorial with Python Examples https://arxiv.org/pdf/2106.06437.pdf
https://arxiv.org/pdf/1905.12787.pdf The Theory Behind Overfitting, Cross Validation, Regularization, Bagging, and Boosting: Tutorial
https://smunshi.net/entropy-cross-entropy-and-kullback-leibler-divergence.html
Machine Learning Phystech https://www.youtube.com/@MachineLearningIS
Additional Chapters of Machine Learning, N. E. Karpachev https://www.youtube.com/watch?v=Cs5NuVseHxU
https://habr.com/ru/companies/otus/articles/805801/ Time series, clustering
https://github.com/ageron/handson-ml3
https://aman.ai/primers/ai/
Topological data analysis. Nikita Kalinin. https://www.youtube.com/playlist?list=PLKXEsFnBcT5BD47xO19UsKshsQ7DMU3Sa
https://habr.com/ru/articles/814981/ Ensemble learning
https://leanpub.com/insidethemachinelearninginterview/c/CyberMonday2023HugeSale
Mathematics of Big Data (4th-5th year, fall 2022) - A. V. Gasnikov
https://www.youtube.com/playlist?list=PL4_hYwCyhAvZIqYnUHqHf7g0G74gPOHo1
https://arxiv.org/abs/2308.10825 Topological data analysis
https://habr.com/ru/companies/otus/articles/773102/ bias, variance, etc
https://news.ycombinator.com/item?id=37137810 links
https://www.youtube.com/playlist?list=PLnu7tVik2MzLUv0wrYiBHOwbf7QpcWlIr
https://www.oreilly.com/library/view/machine-learning-with/9781098135713/ ML Cookbook, 2nd edition
https://docs.profiling.ydata.ai/ Data quality for Pandas and Spark dataframes
https://www.youtube.com/watch?v=6IGx7ZZdS74 Beginner Data Science Portfolio Project Walkthrough (Kaggle Titanic)
https://medium.com/@techlatest.net/exploratory-data-analysis-with-python-jupyter-notebook-a-tutorial-on-how-to-perform-exploratory-5a800791b04f
https://github.com/rougier/ML-Recipes
https://www.youtube.com/@Eigensteve
https://faculty.washington.edu/sbrunton/DataBookV2.pdf
https://github.com/dynamicslab/
100 Data Science questions https://www.youtube.com/watch?v=BI-yjkRKymg
https://www.cs.toronto.edu/~duvenaud/cookbook/
https://www.cs.ubc.ca/~schmidtm/Courses/LecturesOnML/
https://onedrive.live.com/?authkey=%21AMC9DofT%5FW6hqE4&cid=199F8C87205FEB30&id=199F8C87205FEB30%21214006&parId=199F8C87205FEB30%21213972&o=OneUp
Example of dimensionality reduction with linear and nonlinear methods in Python:
PCA, ICA, TruncatedSVD, MDS (Multidimensional Scaling), t-SNE (t-Distributed Stochastic Neighbor Embedding), UMAP (Uniform Manifold Approximation and Projection)
https://habr.com/ru/articles/751050/
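A minimal sketch of the linear case (PCA), using only NumPy and synthetic data. The function name `pca` and the toy data are illustrative assumptions, not taken from the linked article; the nonlinear methods (t-SNE, UMAP) need dedicated libraries and are not shown here.

```python
import numpy as np


def pca(X, n_components):
    """Project the rows of X onto the top principal components (via SVD)."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Variance captured along each kept direction (sample variance, n - 1).
    explained_variance = (S[:n_components] ** 2) / (X.shape[0] - 1)
    return X_centered @ components.T, components, explained_variance


rng = np.random.default_rng(0)
# Hypothetical data: 200 points in 3-D that lie close to a 2-D plane.
latent = rng.normal(size=(200, 2))
mixing = np.array([[1.0, 0.5, 0.0],
                   [0.0, 1.0, 0.5]])
X = latent @ mixing + 0.01 * rng.normal(size=(200, 3))

Z, components, var = pca(X, n_components=2)
print(Z.shape, var)
```

Because the toy data is almost planar, two components should retain nearly all of the variance; on real data the number of components is usually chosen from the explained-variance profile.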
https://en.wikipedia.org/wiki/Cosine_similarity
https://distill.pub/2021/gnn-intro/
aGrUM is a C++ library for graphical models. It is designed for easily building applications using graphical models such as Bayesian networks, influence diagrams, decision trees, GAI networks or Markov decision processes.
https://agrum.gitlab.io/
https://webia.lip6.fr/~phw/aGrUM/BookOfWhy/
EDA exploratory data analysis:
https://habr.com/ru/companies/otus/articles/752434/
Serving ML model https://news.ycombinator.com/item?id=32277894
https://kolodezev.ru/mlsysd1.html
https://www.mlexample.com/ . BOOK
https://arxiv.org/abs/2203.08890 The Mathematics of Artificial Intelligence
https://news.ycombinator.com/item?id=30984662
https://www.cs.ubc.ca/~schmidtm/Courses/LecturesOnML/ Pen and Paper in ML
https://www.youtube.com/user/sanshush
https://arxiv.org/abs/2206.13446 Pen and paper exercises in ML
https://machinelearningrecipes.com/blog/post/1687564/stochastic-processes-and-simulations
https://deepmind.com/learning-resources/reinforcement-learning-series-2021
https://www.youtube.com/watch?v=mLHIMulCHcM Andrew Ng’s New Machine Learning Specialization is Out!!!
Why do tree-based models still outperform deep learning on tabular data https://news.ycombinator.com/item?id=32333565
https://www.assemblyai.com/blog/differentiable-programming-a-simple-introduction/
https://news.ycombinator.com/item?id=31000709
Feature Engineering: https://habr.com/ru/company/ruvds/blog/680498/
Lasso vs Ridge regression https://habr.com/ru/post/679232/
https://news.ycombinator.com/item?id=30417811 What are the most important statistical ideas of the past 50 years? (tandfonline.com)
High-dimensional ML:
https://towardsdatascience.com/high-dimensional-learning-ea6131785802
Monte-Carlo in Python https://towardsdatascience.com/how-to-create-a-monte-carlo-simulation-using-python-c24634a0978a
https://www.manning.com/books/deep-learning-with-python-second-edition Deep Learning with Python, 2nd ed.
Real time ML https://huyenchip.com//2022/01/02/real-time-machine-learning-challenges-and-solutions.html
https://arxiv.org/abs/1709.02840 Introduction to Machine Learning
https://web.eecs.umich.edu/~jabernet/eecs598course/fall2015/web/
https://arxiv.org/abs/2004.09280 Towards a theory of machine learning. Vitaly Vanchurin
https://www.youtube.com/watch?v=LlKAna21fLE
https://news.ycombinator.com/item?id=30658324 ML tools
http://shamin.ru/
Lecture 1. R. V. Shamin. Stochastic analysis and its applications in machine learning https://www.youtube.com/playlist?list=PLUbD59ZHv1GSmj1ecP7ARU2LMqZgT_TyQ
https://thegradient.pub/
Fingerprint Matching in Python https://www.youtube.com/watch?v=IIvfqfKkiio
PCA https://news.ycombinator.com/item?id=30876293
https://github.com/kmkolasinski/nano-umap
https://www.reddit.com/r/MachineLearning/comments/1gsjfq9/p_analysis_of_why_umap_is_so_fast/
https://habr.com/ru/articles/811437/ How PCA differs from UMAP and t-SNE
https://arxiv.org/pdf/2110.06209.pdf A Brief Introduction to Automatic Differentiation for Machine Learning
Automatic differentiation, JAX https://arxiv.org/pdf/2111.00254.pdf
https://www.youtube.com/watch?v=WdTeDXsOSj4
https://deepmind.com/blog/article/using-jax-to-accelerate-our-research
https://arxiv.org/pdf/2106.06524.pdf WAX-ML: A Python library for machine learning and feedback loops on streaming data
https://wax-ml.readthedocs.io/en/latest/
https://cdanielaam.medium.com/essential-mathematical-equations-for-predictive-models-fcb79630ec96
https://inria.github.io/scikit-learn-mooc/
https://github.com/orico/www.mlcompendium.com
https://github.com/blobcity/ai-seed 1000+ ready code templates to kickstart your next AI experiment
https://lazypredict.readthedocs.io/en/latest/readme.html
Spearman vs Pearson correlation:
https://medium.com/productive-data-science/spearman-coefficient-tool-for-a-generalized-correlation-analysis-d15b70d4ff1e
https://www.argmin.net/
https://datasketches.apache.org/ A software library of stochastic streaming algorithms
https://medium.com/inspiredbrilliance/fast-approximation-on-massive-datasets-dd23117bab7f
https://www.youtube.com/watch?v=6nqV58NA_Ew Adam Optimization from Scratch in Python
https://web.stanford.edu/class/cs168/
https://web.stanford.edu/class/cs168/l/
https://web.stanford.edu/class/cs168/l/l1.pdf
https://web.stanford.edu/class/cs168/l/l2.pdf
https://t.me/emeliml Emeli Dral on applied machine learning.
https://github.com/faridrashidi/kaggle-solutions Kaggle solutions
https://habr.com/ru/company/skillfactory/blog/561044/ . useful libs
https://lilianweng.github.io/lil-log/2022/02/20/active-learning.html
https://habr.com/ru/post/592177/
https://habr.com/ru/post/593615/
https://www.quantamagazine.org/the-computer-scientist-trying-to-teach-ai-to-learn-like-we-do-20220802
https://github.com/online-ml/river#readme
https://towardsdatascience.com/river-the-best-python-library-for-online-machine-learning-56bf6f71a403
https://towardsdatascience.com/7-cool-python-packages-kagglers-are-using-without-telling-you-e83298781cf4
https://causalnex.readthedocs.io/en/latest/ Causality vs correlation
https://towardsdatascience.com/summarize-pandas-data-frames-b9770567f940 Skimpy is a convenient way to generate quick summaries of any dataset, even without writing any code.
https://habr.com/ru/post/654907/ Gini index
kernels in ML https://pub.towardsai.net/types-of-kernels-in-machine-learning-291cf85fcdd0
https://numpy-ml.readthedocs.io/en/latest/
https://www.i-programmer.info/news/89-net/14846-free-resources-for-machine-learning.html
https://predictivehacks.com/10-tips-and-tricks-for-data-scientists-vol-10/
https://www.youtube.com/watch?v=RaTe3dhiqdE entropy, mutual info
https://www.toptal.com/algorithms/metropolis-hastings-bayesian-inference
https://huyenchip.com/ml-interviews-book/ INTERVIEW BOOK
https://rentruewang.github.io/learning-machine/intro.html BOOK
https://whitead.github.io/dmol-book
https://github.com/EdemGold/Nutshell-Machine-Learning
https://arxiv.org/abs/2010.03415 Knowledge based learning
https://github.com/r0f1/datascience
https://arxiv.org/pdf/2108.02497
https://habr.com/ru/company/otus/blog/573924/
https://github.com/dair-ai/ML-YouTube-Courses
https://dataelixir.com/
https://github.com/machow/siuba Python library for using dplyr like syntax with pandas and SQL
https://towardsdatascience.com/kalman-filter-in-a-nutshell-e66154a06862
https://towardsdatascience.com/geometric-foundations-of-deep-learning-94cdd45b451d
https://www.youtube.com/watch?v=w6Pw4MOzMuo
https://news.ycombinator.com/item?id=27577467 ML beyond curve fitting
https://habr.com/ru/articles/821547/
https://habr.com/ru/articles/820411/ Confusion matrix, Accuracy, Precision, Recall, F-score, ROC-AUC
https://github.com/phongsathorn1/pretty-confusion-matrix Confusion matrix
https://en.wikipedia.org/wiki/Receiver_operating_characteristic ROC curve
https://towardsdatascience.com/a-graphical-explanation-of-roc-and-auc-183705caeb27
https://towardsdatascience.com/understanding-roc-curves-c7f0b52e931e Understanding ROC Curves with Python
https://habr.com/ru/company/netologyru/blog/582756/
https://towardsdatascience.com/dimensionality-reduction-explained-5ae45ae3058e Dimensionality reduction
https://featuretools.alteryx.com/en/stable Feature extraction
https://arxiv.org/pdf/2106.06437.pdf Feature selection
https://stackabuse.com/random-projection-theory-and-implementation-in-python-with-scikit-learn/
https://stackabuse.com/calculating-spearmans-rank-correlation-coefficient-in-python-with-pandas/
http://creatingdata.us/techne/deep_scatterplots/# Zoomable scatterplot
https://habr.com/ru/articles/811425/
https://habr.com/ru/post/559130/
Data Driven Causal Relationship Discovery with Python Example Code https://pkghosh.wordpress.com/2021/05/25/data-driven-causal-relationship-discovery-with-python-example-code/
https://habr.com/ru/company/otus/blog/559666/ Kaggle tricks
https://habr.com/ru/articles/775032/
https://news.ycombinator.com/item?id=29617087
https://habr.com/ru/company/skillfactory/blog/562928/
https://habr.com/ru/company/wunderfund/blog/592231/
https://habr.com/ru/post/558488/
https://habr.com/ru/post/563778/
https://habr.com/ru/company/wunderfund/blog/594333/
https://semiengineering.com/neural-networks-without-matrix-math/
https://semiengineering.com/developers-turn-to-analog-for-neural-nets/
https://habr.com/ru/post/558836/
https://habr.com/ru/post/556856/ Statistics with Python
https://christophm.github.io/interpretable-ml-book/
https://www.youtube.com/watch?v=pqNCD_5r0IU Scikit-Learn Course
https://thedatasciencedigest.substack.com/p/python-data-science-digest-may-2021
https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Welford’s_online_algorithm
https://natural-blogarithm.com/post/variance-welford-vs-numpy/
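A minimal sketch of Welford's online algorithm, the topic of the two links above: it updates the mean and variance one value at a time, without storing the data and without the cancellation problems of the naive sum-of-squares formula. The class name `RunningStats` and the sample data are hypothetical.

```python
import statistics


class RunningStats:
    """Single-pass mean and sample variance (Welford's update)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the current mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        # Uses the deviation from the old mean and from the updated mean.
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Sample variance (divides by n - 1); undefined for fewer than 2 values.
        return self.m2 / (self.n - 1) if self.n > 1 else float("nan")


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
stats = RunningStats()
for x in data:
    stats.update(x)

# Compare against the standard library's two-pass implementation.
print(stats.mean, stats.variance, statistics.variance(data))
```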
https://www.youtube.com/watch?v=ylytZegK–I
https://www.youtube.com/watch?v=IqT551LjKHw
https://www.facebook.com/pythonposts/
https://explained.ai/regularization/index.html . Regularization
https://www.youtube.com/watch?v=tX_MeIbfEmw K. V. Vorontsov, “Overview of optimization problem formulations in machine learning”
https://arxiv.org/abs/1803.08823 A high-bias, low-variance introduction to Machine Learning for physicists
https://habr.com/ru/company/otus/blog/585610/
https://towardsdatascience.com/maximum-likelihood-vs-bayesian-estimation-dd2eb4dfda8a
https://ggcarvalho.dev/posts/montecarlo/ . Monte Carlo with Go
https://habr.com/ru/company/otus/blog/555980/ stat paradox
https://www.youtube.com/watch?v=EMIyRmrPWJQ
https://habr.com/ru/articles/850168/
https://aegis4048.github.io/mutiple_linear_regression_and_visualization_in_python . Linear regression
https://sidsite.com/posts/fourier-nets/
https://news.ycombinator.com/item?id=26980169
https://en.wikipedia.org/wiki/Max_Welling
https://www.youtube.com/watch?v=mmDw5glry9w
https://www.reddit.com/r/MachineLearning/comments/mwwftu/d_your_favorite_ai_podcasts_blogs_newsletters/
http://ai.lector.ru/
http://www.mathnet.ru/conf1243
https://github.com/rwsh
Simulated annealing (ru) https://www.math.spbu.ru/user/gran/sb1/lopatin.pdf
Mikhail BELKIN https://www.youtube.com/watch?v=yPwCb12V0Mk
https://www.theinsaneapp.com/2020/12/machine-learning-and-data-science-cheat-sheets-pdf.html
https://www.alexpghayes.com/blog/many-models-workflows-in-python-part-i/
https://www.youtube.com/watch?v=7inArpm-83U Interview
https://theblog.github.io/post/from-tensorflow-to-pytorch/ PyTorch for TensorFlow users
https://twitter.com/icymi_py Python data science
https://github.com/SimonBlanke/Hyperactive collection of optimization algorithms that can be used for a variety of optimization problems.
https://github.com/SimonBlanke/Gradient-Free-Optimizers
https://news.ycombinator.com/item?id=26293171
https://habr.com/ru/post/549376 ROC curve
https://vita.had.co.nz/papers/tidy-data.pdf
https://pbeshai.github.io/tidy/
https://uwdata.github.io/arquero/
http://www.machinelearning.ru/wiki/images/7/7c/SMAIS11_MCMC.pdf
https://habr.com/ru/post/455762/ Markov
https://habr.com/ru/company/skyeng/blog/473124/
http://www.randomservices.org/random/ Markov chain etc
https://www.youtube.com/watch?v=i3AkTO9HLXo&list=PLM8wYQRetTxBkdvBtz-gw8b9lcVkdXQKV Markov chain
http://www.stat.columbia.edu/~gelman/research/unpublished/stat50.pdf
https://news.ycombinator.com/item?id=26374788
https://mixtape.scunning.com/
https://mlfromscratch.com/model-stacking-explained/#/
https://habr.com/ru/post/562640/
https://github.com/rushter/MLAlgorithms
https://github.com/trekhleb/homemade-machine-learning/
https://github.com/Gautam-J/Machine-Learning
https://pypi.org/project/sealion/
https://habr.com/ru/post/541742/ Image processing
https://sparkbyexamples.com/h2o-sparkling-water/install-running-sparkling-water-on-mac-os/
brew install apache-spark
https://docs.h2o.ai/h2o/latest-stable/h2o-docs/downloading.html
version 3.32.0.4
https://anaconda.org/h2oai/h2o
conda install -c h2oai h2o
https://pypi.org/project/h2o/
java -jar h2o.jar
http://localhost:54321
https://docs.h2o.ai/h2o/latest-stable/h2o-docs/flow.html
https://towardsdatascience.com/h2o-for-inexperienced-users-7bc064124264
https://www.coursera.org/learn/machine-learning-h2o
Time series example https://www.h2o.ai/products-dai-timeseries/
https://youtu.be/0pvvDHfxdZ8
https://github.com/SeanPLeary/time-series-h2o-automl-example/blob/master/h2o_automl_example_with_multivariate_time_series.ipynb
https://stackoverflow.com/questions/56666876/how-to-predict-future-values-of-time-series-using-h2o-predict
https://www.confetti.ai/assets/ml-primer/ml_primer.pdf Primer
Book: Dive into Deep learning https://d2l.ai/d2l-en.pdf
https://www.deeplearningbook.org/
Book: Computer Age Statistical Inference
https://web.stanford.edu/~hastie/CASI_files/PDF/casi.pdf
Book: https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf
https://www.microsoft.com/en-us/research/people/cmbishop/
Book: Probabilistic Machine Learning: An Introduction. by Kevin Patrick Murphy. MIT Press, 2021. https://probml.github.io/pml-book/book1.html
Book: Foundations of Data Science
http://www.cs.cornell.edu/jeh/book%20no%20so;utions%20March%202019.pdf
Book: Data Science in Production
https://levelup.gitconnected.com/book-launch-data-science-in-production-54b325c03818
https://github.com/bgweber/DS_Production
https://mlpowered.com/book/ Book
Book: https://deeplearningsystems.ai/
https://marksaroufim.medium.com/the-robot-overlord-manual-d4ee709155bc
https://www.youtube.com/channel/UCh8IuVJvRdporrHi-I9H7Vw Unfold Data Science
Linear models: https://www.youtube.com/watch?v=68ABAU_V8qI
Best ML Blogs: https://bloggingfordevs.com/machine-learning-blogs/
https://habr.com/ru/company/skillbox/blog/540940/ Data Scientist job interview at Amazon
500+ Artificial Intelligence project list with code: https://github.com/ashishpatel26/500-AI-Machine-learning-Deep-learning-Computer-vision-NLP-Projects-with-code
https://github.com/gugarosa/opytimizer
https://www.mindsdb.com/ MindsDB provides a simple way to create, train and test ML models and then publish them as virtual AI-Tables into databases.
It integrates with most databases on the market, uses SQL queries for all manipulation of ML models, and can speed up model training with a GPU without affecting database performance.
https://betterexplained.com/articles/intuitive-convolution/ Convolution
https://rubikscode.net/2020/11/15/top-9-feature-engineering-techniques/
https://madewithml.com/
Jupyter notebooks for “The Elements of Statistical Learning” https://github.com/maitbayev/the-elements-of-statistical-learning
https://moosetechnology.org/
https://news.ycombinator.com/item?id=24983603
visidata.org
https://github.com/antonycourtney/tad Tad
https://towardsdatascience.com/introduction-to-d-tale-5eddd81abe3f D-Tale
https://towardsdatascience.com/4-libraries-that-can-perform-eda-in-one-line-of-python-code-b13938a06ae
Pandas-Profiling
Sweetviz
Autoviz
D-Tale
https://towardsdatascience.com/drag-and-drop-tools-for-machine-learning-pipelines-worth-a-try-63ace4a18715
Knime
Orange
https://machinelearningmastery.com/calculate-feature-importance-with-python/
https://towardsdatascience.com/from-linear-regression-to-ridge-regression-the-lasso-and-the-elastic-net-4eaecaf5f7e6
https://habr.com/ru/company/skillfactory/blog/524722/ List of useful links
https://habr.com/ru/company/recognitor/blog/524980/
https://habr.com/ru/company/skillfactory/blog/525512/
https://habr.com/ru/company/skillbox/blog/525784/
https://sciml.ai/
https://github.com/tirthajyoti/Papers-Literature-ML-DL-RL-AI
Eigenvectors/eigenvalues: NumPy + Matplotlib https://www.paepper.com/blog/posts/eigenvectors_eigenvalues_machine_learning/
https://dafriedman97.github.io/mlbook/content/introduction.html book
https://habr.com/ru/company/skillfactory/blog/526970/
https://habr.com/ru/post/526460/
https://habr.com/ru/post/520204/ Decision Tree
https://habr.com/ru/company/productstar/blog/523044/ Decision Tree
https://datalore.jetbrains.com/ free online Jupyter notebooks from JetBrains
https://araza6.github.io/posts/autodiff/autodiff/ autodiff
https://www.youtube.com/channel/UCwBs8TLOogwyGd0GxHCp-Dw AIEngineering
https://www.youtube.com/channel/UCts-XMcexTiPSR8QbyRGFxA ML
https://leimao.github.io/article/
https://habr.com/ru/post/666234/
https://towardsdatascience.com/stop-one-hot-encoding-your-categorical-variables-bbb0fba89809 One-Hot Encoding
Amazon ML classes: https://www.youtube.com/channel/UC12LqyqTQYbXatYS9AA7Nuw/playlists
https://github.com/aws-samples/aws-machine-learning-university-accelerated-nlp
https://github.com/search?q=org%3Aaws-samples+%22aws-machine-learning%22
https://news.ycombinator.com/item?id=23901729 ML in physics
https://arxiv.org/pdf/2002.04803v2.pdf ML and Python
https://github.com/tirthajyoti/Machine-Learning-with-Python
https://github.com/pycaret/pycaret PyCaret
https://news.ycombinator.com/item?id=24671525 igel (like PyCaret)
https://twitter.com/ComputingByArts Michael Bukatin
https://machine-learning-with-python.readthedocs.io/en/latest/
https://github.com/khuyentran1401/Data_science_on_Medium
https://www.youtube.com/watch?v=bVQUSndDllU
https://libradocs.github.io/ Libra Fully Automated Machine Learning in One-Liners
https://github.com/KartikChugh/Otto
https://www.youtube.com/watch?v=Ozo6hkOaqPk working with datasets
Clustering algos: https://link.springer.com/content/pdf/10.1007/s40745-015-0040-1.pdf
https://habr.com/ru/company/skillfactory/blog/509212/ Good Links
https://github.com/mljar/mljar-supervised
https://habr.com/ru/company/leroy_merlin/blog/511792/ featuretools
https://www.meetup.com/Scala-Bay/events/271129752/ Kirpichev, Google
https://arxiv.org/abs/1911.01547 On the Measure of Intelligence François Chollet
https://www.amazon.com/Practical-Deep-Learning-Cloud-Mobile-ebook/dp/B07Z7957PL/
https://arxiv.org/abs/2003.01384 Self-Supervised Object-Level Deep Reinforcement Learning
https://habr.com/ru/post/505516/ First ML model with scikit-learn
https://news.ycombinator.com/item?id=22769319
https://jaydaigle.net/blog/overview-of-bayesian-inference/
https://dmm.dreamwidth.org/23855.html
https://www.edgeimpulse.com/blog/dsp-key-embedded-ml DSP for ML
https://amitness.com/2020/03/fixmatch-semi-supervised/
https://habr.com/ru/post/491010/
https://habr.com/ru/post/491326/ Data classification
https://blog.insightdatascience.com/bias-variance-tradeoff-explained-fa2bc28174c4 bias-variance explained
https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/
https://jacobgil.github.io/deeplearning/activelearning
https://habr.com/ru/company/otus/blog/497770/ PyCaret, a machine learning library for Python
https://artint.info/2e/index.html BOOK
https://mlcourse.ai/ Russian
https://dlcourse.ai/ Russian
https://ods.ai/ Russian
https://theaisummer.com/Graph_Neural_Networks/ Graph NN
https://prodi.gy/ . Prodigy is a scriptable annotation tool so efficient that data scientists can do the annotation themselves, enabling a new level of rapid iteration.
https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/overview
https://www.ahmedbesbes.com/blog/end-to-end-machine-learning
https://www.sicara.ai/blog/
https://habr.com/ru/post/479398/ Putting the linear regression equation into matrix form
https://lagunita.stanford.edu/courses/HumanitiesSciences/StatLearning/Winter2016/about
https://practicalai.me/ A practical approach to machine learning.
https://jameskle.com/writes/rec-sys-part-3 recommendation systems
https://news.ycombinator.com/item?id=21710863 TF vs PyTorch
https://news.ycombinator.com/item?id=21158487
https://streamlit.io/ . streamlit
https://habr.com/ru/post/473196/ streamlit
https://ahmedbesbes.com/end-to-end-ml.html
https://github.com/firmai/awesome-google-colab/blob/master/README.md . Google’s Colab
https://hn.algolia.com/?q=colab Google’s Colab
Collecting and scraping customer reviews data using Selenium and Scrapy
Training a deep learning sentiment classifier on this data using PyTorch
Building an interactive web app using Dash
Setting a REST API and a Postgres database
Dockerizing the app using Docker Compose
Deploying to AWS
https://news.ycombinator.com/item?id=21293132 math for machine learning
https://github.com/fritzlabs/Awesome-Mobile-Machine-Learning ML on mobile devices
https://habr.com/ru/post/460557/
https://eli5.readthedocs.io/en/latest/index.html . ELI5: explanations and plots for different ML models
https://uber.github.io/ludwig/ . Ludwig: a toolbox for training and testing deep learning models without writing code.
https://github.com/genular/simon-frontend . https://genular.org/
https://github.com/microsoft/FLAML from Microsoft
https://github.com/openml/automlbenchmark
https://habr.com/ru/company/otus/blog/525292/ . pip install autoviml
https://habr.com/ru/company/jetinfosystems/blog/485232/
https://www.ahmedbesbes.com/blog/introduction-to-mlbox
AutoML basic project: https://github.com/minimaxir/automl-gs given X predict Y
https://medium.com/georgian-impact-blog/choosing-the-best-automl-framework-4f2a90cb1826
https://github.com/mljar/mljar-supervised . AutoML
Using grid search for hyperparameter optimization: grid search is known to perform worse than random search when not all hyperparameters are of similar importance.
Automatic Machine Learning: Methods, Systems, Challenges; Chapter Hyperparameter Optimization:
https://www.automl.org/wp-content/uploads/2018/11/hpo.pdf
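A small sketch of the intuition behind the grid-versus-random claim above, using only the standard library. The `score` function, the parameter names `lr` and `momentum`, and all numeric values are hypothetical: the score depends strongly on one hyperparameter and only weakly on the other. With the same budget of 9 trials, a 3 x 3 grid tries just 3 distinct values of the important parameter, while random search tries 9.

```python
import random


def score(lr, momentum):
    """Hypothetical validation score: sensitive to lr, nearly flat in momentum."""
    return -(lr - 0.37) ** 2 - 0.001 * (momentum - 0.5) ** 2


budget = 9

# Grid search: 3 x 3 grid, so the important parameter takes only 3 distinct values.
grid_values = [0.1, 0.5, 0.9]
grid_trials = [(lr, m) for lr in grid_values for m in grid_values]

# Random search: same budget, every trial draws a fresh value for each parameter.
rng = random.Random(0)
random_trials = [(rng.random(), rng.random()) for _ in range(budget)]

distinct_lr_grid = len({lr for lr, _ in grid_trials})
distinct_lr_random = len({lr for lr, _ in random_trials})

best_grid = max(score(lr, m) for lr, m in grid_trials)
best_random = max(score(lr, m) for lr, m in random_trials)

print("distinct lr values:", distinct_lr_grid, "(grid) vs", distinct_lr_random, "(random)")
print("best score:", best_grid, "(grid) vs", best_random, "(random)")
```

Which method wins on a single run depends on the seed and the score surface; the structural point is the coverage of the important dimension.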
https://habr.com/ru/company/mailru/blog/445530/
Awesome Machine Learning https://github.com/josephmisiti/awesome-machine-learning
An impressive list of frameworks, libraries, and software, classified by language and category (computer vision, natural language processing, etc.). The repository also lists free machine learning books, (mostly) free machine learning courses, and data science blogs.
Scikit-learn https://github.com/scikit-learn/scikit-learn
A Python module for machine learning, in development since 2007 and built on SciPy, NumPy, and Matplotlib. Distributed under the BSD 3-Clause license. Scikit-learn is a general-purpose tool containing classification, regression, and clustering algorithms, as well as methods for data preparation and model evaluation.
PredictionIO https://github.com/PredictionIO/PredictionIO
An open-source machine learning framework that supports event collection, algorithm deployment, evaluation, and templates for well-known tasks such as classification and recommendation. It connects to existing applications through a REST API or an SDK. PredictionIO is built on scalable open-source services such as Hadoop, HBase (and other databases), Elasticsearch, and Spark.
Dive Into Machine Learning https://github.com/hangtwenty/dive-into-machine-learning
Material for newcomers to the field. The repository contains a collection of IPython tutorials for the Scikit-learn library, which implements a large number of machine learning algorithms, along with several links to Python-related machine learning topics and more general data analysis information. The author links to many other tutorials covering the topic.
Pattern https://github.com/clips/pattern
A Python web mining module with tools for data analysis, natural language processing (part-of-speech tagging, n-gram search, sentiment analysis, WordNet), machine learning, network analysis, and visualization. The module was created and is well documented by the Computational Linguistics and Psycholinguistics research center at the University of Antwerp (Belgium). The repository contains more than 50 usage examples.
GoLearn https://github.com/sjwhitworth/golearn
An actively developed machine learning library for Go. It provides a full-featured, easy-to-use, and easily customizable package for developers. GoLearn implements the Scikit-learn-style training interface that many users already know.
Vowpal Wabbit https://github.com/JohnLangford/vowpal_wabbit
Vowpal Wabbit pushes the boundaries of machine learning with techniques such as hashing, allreduce, learning2search, and active and interactive learning. It targets fast modeling of massive datasets and supports parallel learning. Particular emphasis is placed on reinforcement learning using several contextual bandit algorithms.
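The hashing technique mentioned above (often called the hashing trick) can be sketched in a few lines. This is a generic illustration, not Vowpal Wabbit's own implementation: the function name `hashed_features`, the bucket count, and the choice of MD5 as a stable hash are assumptions made for the example.

```python
import hashlib


def hashed_features(tokens, n_buckets=16):
    """Map tokens to a fixed-length count vector via a stable hash.

    No vocabulary is stored: each token's bucket is computed directly
    from its hash, so memory use is fixed regardless of vocabulary size.
    Different tokens may collide in the same bucket.
    """
    vec = [0] * n_buckets
    for tok in tokens:
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "little") % n_buckets
        vec[index] += 1
    return vec


doc = "the quick brown fox jumps over the lazy dog".split()
vec = hashed_features(doc)
print(vec)
```

The vector length depends only on `n_buckets`, which is what makes the approach attractive for very large or streaming datasets; the cost is that collisions merge unrelated features.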
Aerosolve https://github.com/airbnb/aerosolve
aerosolve tries to differ from other libraries by focusing on user-friendly debugging tools, Scala code for training, an image content analysis engine for convenient ranking, and flexibility and control over features. The library is designed for sparse, interpretable features of the kind commonly found in search (search keywords, filters) or pricing (number of rooms, location, price).
Code for Machine Learning for Hackers https://github.com/johnmyleswhite/ML_for_Hackers
A companion repository to the book "Machine Learning for Hackers", with all code written in R, a language designed for statistical computing (effectively the standard for statistical software) and graphics. It uses numerous R packages. Topics covered include common classification, ranking, and regression tasks, as well as statistical procedures such as principal component analysis and multidimensional scaling.
https://habr.com/ru/company/skillfactory/blog/510420/ Automatic feature extraction
A list of useful GitHub repositories made up of IPython (Jupyter) notebooks focused on working with data and machine learning.
https://github.com/donnemartin/data-science-ipython-notebooks
https://github.com/andypetrella/spark-notebook Spark Notebook
Python Machine Learning Book https://github.com/rasbt/python-machine-learning-book
Companion repository for the first edition of the book "Python Machine Learning" (the second edition has its own repository). The book covers handling missing values, converting categorical variables into formats usable for machine learning, selecting informative features, and compressing data by projecting it onto lower-dimensional subspaces.
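Two of the preprocessing steps named above, filling in missing values and encoding categorical variables, can be sketched without any libraries. This is a minimal illustration, not code from the book; the helper names `impute_mean` and `one_hot` and the sample data are hypothetical.

```python
def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def one_hot(categories):
    """Encode category labels as 0/1 indicator rows, one column per level."""
    levels = sorted(set(categories))
    rows = [[int(c == level) for level in levels] for c in categories]
    return levels, rows


sizes = [1.0, None, 3.0]           # numeric column with a missing value
colors = ["red", "green", "red"]   # categorical column

filled = impute_mean(sizes)
levels, encoded = one_hot(colors)

print(filled)   # [1.0, 2.0, 3.0]
print(levels)   # ['green', 'red']
print(encoded)  # [[0, 1], [1, 0], [0, 1]]
```

In practice a library such as pandas or Scikit-learn would normally handle both steps; the sketch only shows what the transformations do.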
Example Data Science Notebook https://github.com/rhiever/Data-Analysis-and-Machine-Learning-Projects/blob/master/example-data-science-notebook/Example%20Machine%20Learning%20Notebook.ipynb
A repository of teaching materials, code, and data for various data analysis and machine learning projects. The notebook walks through the basic principles of data analysis using the Iris dataset and is a fine illustration of how to structure a data science workflow. The guidelines followed in the repo are drawn from the book "The Elements of Data Analytic Style" (Jeff Leek, 2015).
Learn Data Science https://github.com/nborwankar/LearnDataScience
A collection of notebooks and datasets covering four algorithmic topics: linear regression, logistic regression, random forests, and K-Means clustering. Learn Data Science is based on materials created for the Open Data Science Training project.
IPython Notebooks https://github.com/jdwittenauer/ipython-notebooks
The repository contains a variety of IPython notebooks, from an overview of the language and IPython functionality to examples of using popular libraries for data analysis. It includes an extensive collection of material on machine learning, deep learning, and big data frameworks from the courses "Machine Learning" by Andrew Ng (Coursera), "Intro to TensorFlow for Deep Learning" (Udacity), and "Spark" (edX).
Scikit-learn Tutorial https://github.com/jakevdp/sklearn_tutorial
A repository for learning the Scikit-learn library, which implements a large number of machine learning algorithms. The library provides implementations of a range of algorithms for both supervised and unsupervised learning. Scikit-learn is built on top of SciPy (Scientific Python).
Machine Learning https://github.com/masinoa/machine_learning
A series of very detailed IPython Notebook tutorials based on material from Andrew Ng's machine learning course (Stanford University), Tom Mitchell's course (Carnegie Mellon University), and Christopher M. Bishop's book "Pattern Recognition and Machine Learning". https://www.amazon.com/Pattern-Recognition-Learning-Information-Statistics/dp/0387310738