Michael Lubinsky's homepage

Programming, Math and Physics

View My GitHub Profile

https://the-algorithms.com/

Data Structures for Data-Intensive apps: https://cs-people.bu.edu/mathan/publications/fnt23-athanassoulis.pdf

https://habr.com/ru/articles/808511/ Дерево отрезков

Probabilistic data structures: Bloom filter Hyperloglog

https://towardsdatascience.com/how-to-store-and-query-100-million-items-using-just-77mb-with-python-bloom-filters-6b3e8549f032

https://boyter.org/posts/bloom-filter/ Bloom filter: NO means NO

https://codeconfessions.substack.com/p/bloom-filters-and-beyond

https://habr.com/ru/company/otus/blog/541378/ Bloom filter

https://sagi.io/2017/07/bloom-filters-for-the-perplexed/

https://news.ycombinator.com/item?id=15346337

https://djhworld.github.io/hyperloglog/

https://mungingdata.com/apache-spark/hyperloglog-count-distinct/ HyperLogLog for approximate count distinct https://samwho.dev/bloom-filters/
https://habr.com/ru/articles/743800/ Bloom Filter, Count-Min Sketch, HyperLogLog
https://codeconfessions.substack.com/p/bloom-filters-and-beyond Bloom filter
https://codeconfessions.substack.com/p/cpython-bloom-filter-usage

https://github.com/GauravWalia19/Free-Algorithms-Books/blob/master/Library/Python.md

https://habr.com/ru/post/686806/ ML + information theory 1

https://habr.com/ru/post/687546/ ML + information theory 2

https://usaco.guide/CPH.pdf Competitive programming handbook

https://www.jjj.de/fxt/fxtbook.pdf

https://ru.algorithmica.org/

https://habr.com/ru/articles/790844/ Sparce arrays

https://tproger.ru/books/algorithms-data-structures-books/

https://www.youtube.com/playlist?list=PL4_hYwCyhAvZZe505W6mfZQLvOmd7iAn7 MFTI algo lectures ru

Hash function and table

https://redixhumayun.github.io/databases/2024/01/26/extendible-hash-tables.html

https://papa.bretmulvey.com/post/124027987928/hash-functions

Let’s say we want to count the number of times elements appear in a stream of data.
A simple solution is to maintain a hash table that maps elements to their frequencies.

This approach does not scale:
 Imagine having a stream with billions of elements, most of which are unique.
Even if we are only interested in the most important ones, this method has huge space requirements.
Since we do not know for which items to store counts, our hash table will grow to contain billions of elements

https://florian.github.io//count-min-sketch/

https://courses.csail.mit.edu/6.851/fall17/lectures/

https://www.youtube.com/c/Reducible

https://web.stanford.edu/class/cs168/index.html

https://news.ycombinator.com/item?id=32186203 Obsure data structures

https://blog.bradfieldcs.com/an-introduction-to-hashing-in-the-era-of-machine-learning-6039394549b0

https://www.youtube.com/watch?v=poz15EVC4MQ Stack

Books

http://algorithmsbook.com/ Algorithms for Decision Making

https://news.ycombinator.com/item?id=25716581 Algorithms for Decision Making

Partitionong and hashing

https://m.youtube.com/watch?v=UF9Iqmg94tk consistent hashing

https://habr.com/ru/company/mygames/blog/669390/ consistent hashing

https://www.thedigitalcatonline.com/blog/2022/08/23/data-partitioning-and-consistent-hashing/

https://www.manning.com/books/algorithms-and-data-structures-for-massive-datasets

http://dimacs.rutgers.edu/~graham/ssbd.html . Book - Small summaries for big data

http://csc.kth.se/~jsannemo/slask/main.pdf Book

https://cses.fi/book.pdf Book

http://e-maxx.ru/index.php http://e-maxx.ru/algo/

https://cses.fi/book/ . Competitive programming book

https://www.amazon.com/dp/1793296634 Algo Book

http://www.cs.sjtu.edu.cn/~jiangli/teaching/CS222/files/materials/Algorithm%20Design.pdf Algo book

https://www.reddit.com/r/algorithms/comments/g7qc13/is_there_a_cookbook_for_algorithms_a_collection/

http://opendatastructures.org/

https://algs4.cs.princeton.edu/home/

http://www.dspguide.com/pdfbook.htm DSP book

Not books

https://github.com/tobymao/sqlglot/blob/main/posts/sql_diff.md

https://www.infoq.com/presentations/abstract-algebra-analytics/

https://www.infoq.com/data-analysis/

https://news.ycombinator.com/item?id=28526966 Precentile

http://www.numericalexpert.com/articles/single_pass_stat/ Calculate STD and other metrics in single pass

https://www.algorithm-archive.org/

http://jeffe.cs.illinois.edu/teaching/algorithms/

https://quanticdev.com/algorithms/distributed-computing/distributed-sorting/. distibuted sorting

bitmaps

https://vikramoberoi.com/a-primer-on-roaring-bitmaps-what-they-are-and-how-they-work/

Trie

https://mazzo.li/posts/haskell-readability.html

https://www.cnblogs.com/lexus/archive/2012/03/05/2379830.html

https://habr.com/ru/company/tensor/blog/518726/. Prefix Trie

https://habr.com/ru/company/otus/blog/676692/

Tree

Breadth-first traversal (aka level order traversal) and Depth-first traversals

Breadth-first traversal: This involves traversing the tree level by level, visiting all the nodes at a given level before moving on to the next level. Breadth-first traversal is also known as level order traversal.

from collections import deque

class Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def breadth_first_traversal(root):
    if root is None:
        return
    queue = deque()
    queue.append(root)
    while queue:
        node = queue.popleft()
        print(node.value, end=' ')
        if node.left:
            queue.append(node.left)
        if node.right:
            queue.append(node.right)

# Example usage
root = Node(1)
root.left = Node(2)
root.right = Node(3)
root.left.left = Node(4)
root.left.right = Node(5)
breadth_first_traversal(root) 
# Output: 1 2 3 4 5

In the above example, breadth_first_traversal function takes a root node of a tree, and using a queue, it visits all the nodes at a given level before moving on to the next level.

The function uses a deque from the collections module, it appends the root to the queue and while the queue is not empty, it pops the leftmost element, prints it and appends its children to the queue. The example tree is a simple binary tree where each node has at most two children.

It’s worth noting that if the tree was a graph with unordered edges, you would have to keep track of the visited nodes to avoid infinite loop.

Depth-first traversal: 3 possible ways:

In-order traversal: visits the left subtree, then the root, and then the right subtree.

def inOrder(root):
    if root:
        inOrder(root.left)
        print(root.data, end=' ')
        inOrder(root.right)

Pre-order traversal: visits the root, then the left subtree, and then the right subtree.

def preOrder(root):
    if root:
        print(root.data, end=' ')
        preOrder(root.left)
        preOrder(root.right)

Post-order traversal: visits the left subtree, then the right subtree, and then the root.

def postOrder(root):
    if root:
        postOrder(root.left)
        postOrder(root.right)
        print(root.data, end=' ')

https://habr.com/ru/company/otus/blog/472040/ red black tree

https://blog.sqreen.io/demystifying-radix-trees/ . Radix tree

https://news.ycombinator.com/item?id=18921058 . Radix trees

Tree, suffix tree, B-tree, etc

https://habr.com/ru/post/681940/

https://habr.com/ru/post/455260/ Merkel’s tree

https://humanreadablemag.com/issues/0/articles/the-wonders-of-the-suffix-tree-through-the-lens-of-ukkonen%E2%80%99s-algorithm/

https://benjamincongdon.me/blog/2021/08/17/B-Trees-More-Than-I-Thought-Id-Want-to-Know/

https://erthalion.info/2020/11/28/evolution-of-btree-index-am/

https://medium.com/@adityakumar_98609/fenwick-tree-binary-index-tree-aca7824d9c2a fenwick

https://jornhub.dev/articles/fenwick-trees/ Fenwick tree

###

https://habr.com/ru/company/JetBrains-education/blog/536032/ Видео курсов Computer Science клуба

https://heap.io/blog/engineering/applying-textbook-data-structures-for-real-life-wins

https://www.reddit.com/r/MachineLearning/comments/fdw0ax/d_advanced_courses_update/

http://courses.csail.mit.edu/6.851/

https://compsciclub.ru/

https://www.codementor.io/@svenkatadileepkumar/find-all-the-prime-numbers-less-than-n-in-o-n-time-complexity-1amti1lm2p

https://mitmath.github.io/18S191/Fall20/ . Computational thinking using Julia

https://www.cs.auckland.ac.nz/compsci105s1c/resources/ProblemSolvingwithAlgorithmsandDataStructures.pdf

https://quastor.org/learn/course-introduction/introduction

https://pypi.org/project/circuitbreaker/

https://news.ycombinator.com/item?id=23841491

Succinct

https://github.com/miiohio/succinct

https://habr.com/ru/company/mailru/blog/479822/ Succinct data structures

https://habr.com/ru/post/511646/. how to effectively store integer array

System design

https://www.youtube.com/watch?v=AAMLzNaDkjk&utm_source=reddit https://www.youtube.com/watch?v=q0KGYwNbf-0 https://www.youtube.com/playlist?list=PLMCXHnjXnTnvo6alSjVkgxV-VH6EPyvoX . system design questions

https://www.youtube.com/watch?v=1_HIpTBUHAs&utm_source=reddit design twitter

https://www.youtube.com/watch?v=ap5Z1Q3y3Pw

https://www.youtube.com/watch?v=fowDzrMa-Mo&utm_source=reddit design distributed cache

https://www.youtube.com/watch?v=tzsOc-hBPfw

https://www.youtube.com/watch?v=Eg2BjTmnLS8

Unsorted

https://github.com/mochow13/competitive-programming-library

https://habr.com/ru/company/JetBrains-education/blog/495014/

https://habr.com/ru/company/otus/blog/485194/ Дерево отрезков

https://habr.com/ru/post/484756/ Visual information theory

https://news.ycombinator.com/item?id=21738802

Sort

https://yurichev.com/blog/qsort/ quick sort

https://habr.com/ru/post/484224/ Radix sort

https://youtu.be/3fwCe9Ct8go faster sort

https://youtu.be/lbcswonQmGs selection sort

https://dev.to/s_awdesh duplicates in array, dual pivot sort

https://habr.com/ru/company/otus/blog/476510/ Шахматный Bitboard

Algebraic data types

https://hn.algolia.com/?q=algebraic

https://nrinaudo.github.io/scala-best-practices/definitions/adt.html

https://jrsinclair.com/articles/2019/algebraic-data-types-what-i-wish-someone-had-explained-about-functional-programming/

https://blog.softwaremill.com/algebraic-data-types-in-four-languages-858788043d4e Algebraic data types

Monads

https://blog.kabir.sh/posts/inventing-monads.html

https://samgrayson.me/2019-08-06-monads-as-a-programming-pattern/

https://news.ycombinator.com/item?id=20673506

http://web.stanford.edu/class/archive/cs/cs161/cs161.1168/

http://web.stanford.edu/class/cs166/handouts/100%20Suggested%20Final%20Project%20Topics.pdf

https://www.youtube.com/channel/UC_79TBgKFlEwFp345dYF7ag

алгоритм Хаффмана

 использует только частоту появления одинаковых символов во входном блоке данных.
Ставит в соответствие символам входного потока, которые встречаются чаще, цепочку битов меньшей длины.
И напротив, встречающимся редко - цепочку большей длины.
Для сбора статистики требует двух проходов по входному блоку (также существуют однопроходные адаптивные варианты алгоритма).

Характеристики алгоритма Хаффмана [1]:

Степени сжатия: 8, 1.5, 1 (лучшая, средняя, худшая степени)

Симметричность по времени: 2:1 (за счёт того, что требует двух проходов по массиву сжимаемых данных)

Характерные особенности: один из немногих алгоритмов, который не увеличивает размера исходных данных в худшем случае
 (если не считать необходимости хранить таблицу перекодировки вместе с файлом).

Алгоритм Хаффмана основывается на создании двоичного дерева [3].

Изначально все узлы считаются листьями (конечными узлами), которые представляют символ (в нашем случае байт) и число его появлений. Для построения двоичного дерева используется очередь с приоритетами, где узлу с наименьшей частотой присваивается наивысший приоритет.

Алгоритм будет включать в себя следующие шаги:

Шаг 1. Помещаем все листья с ненулевым числом появлений в очередь с приоритетами (упорядочиваем все листья в порядке убывания числа появления)

Шаг 2. Пока в очереди больше одного узла, выполняем следующие действия:

Шаг 2a. Удаляем из очереди два узла с наивысшим приоритетом (самыми низкими числами появлений)

Шаг 2b. Создаём новый узел, для которого выбранные на шаге 2a узлы являются наследниками. При этом число появлений нового узла (его вес) полагается равным сумме появлений выбранных на шаге 2a узлов

Шаг 2c. Добавляем узел, созданный на шаге 2b, в очередь с приоритетами.

Единственный узел, который останется в очереди с приоритетами в конце работы алгоритма, будет корневым для построенного двоичного дерева.

Чтобы восстановить кодирующие слова, нам необходимо пройти по рёбрам от корневого узла получившегося дерева до каждого листа. При этом каждому ребру присваивается бит “1”, если ребро находится слева, и “0” - если справа (или наоборот).

https://habr.com/ru/post/438512/

https://habr.com/ru/companies/samsung/articles/771572/

https://www.youtube.com/watch?v=TeZqKnC2gvA Visitor design pattern

ML

http://www.benfrederickson.com/distance-metrics/

https://www.youtube.com/watch?v=vS4Zn1a9KUc

https://www.youtube.com/watch?v=7Hlb8YX2-W8

http://cslibrary.stanford.edu/

https://www.youtube.com/watch?v=GiCWlXEhht8 https://www.youtube.com/watch?v=il_t1WVLNxk https://www.youtube.com/watch?v=e5D3NepYvLE https://www.youtube.com/watch?v=eaYX0Ee0Kcg https://www.youtube.com/watch?v=zGv3hOORxh0

https://www.youtube.com/watch?v=eaYX0Ee0Kcg

https://en.wikipedia.org/wiki/Ant_on_a_rubber_rope

https://www.youtube.com/playlist?list=PLBZBJbE_rGRVnpitdvpdY9952IsKMDuev

Hash function and hashtable

https://notes.volution.ro/v1/2022/07/notes/1290a79c/

https://www.andreinc.net/2022/03/15/perfect-hashing-with-java

https://blog.bradfieldcs.com/an-introduction-to-hashing-in-the-era-of-machine-learning-6039394549b0

https://www.andreinc.net/2021/11/08/a-tale-of-java-hash-tables

https://news.ycombinator.com/item?id=29319151

https://www.partow.net/programming/hashfunctions/idx.html

https://programming.guide/hash-tables.html Hash

https://habr.com/ru/post/509220/. Hash table implementation

https://habr.com/ru/company/yandex/blog/572588/ hash tables

<http://www.idryman.org/blog/2017/07/04/learn-hash-table-the-hard-way/

https://medium.com/engineering-brainly/locality-sensitive-hashing-explained-304eb39291e4 Local sensitive hash

minhash

https://en.wikipedia.org/wiki/MinHash minhash

https://moultano.wordpress.com/2018/11/08/minhashing-3kbzhsxyg4467-6/?274619

https://robertheaton.com/2014/05/02/jaccard-similarity-and-minhash-for-winners/

https://www.data-structures-in-practice.com/hash-tables/

https://engineering.fb.com/developer-tools/f14/

https://medium.com/@leventov/smoothiemap-2-the-lowest-memory-hash-table-ever-6bebd06780a3

https://habr.com/ru/company/otus/blog/448992/

https://honest.engineering/posts/hash-functions-survival-guide

https://en.wikipedia.org/wiki/Hash_function

https://rcoh.me/posts/hash-map-analysis/

https://attractivechaos.wordpress.com/2018/10/01/advanced-techniques-to-implement-fast-hash-tables/

https://softwareengineering.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed

https://rcoh.me/posts/hash-map-analysis/

https://www.reddit.com/r/programming/comments/9lw5pu/advanced_techniques_to_implement_fast_hash_tables/

Consistent Hashing

https://www.toptal.com/big-data/consistent-hashing

https://medium.com/@dgryski/consistent-hashing-algorithmic-tradeoffs-ef6b8e2fcae8 Consistent hashing

https://dzone.com/articles/simple-magic-consistent

local sensitive hashing - LSH

http://tylerneylon.com/a/lsh1/

https://www.youtube.com/watch?v=kKRvEJrvvso

https://news.ycombinator.com/item?id=27614381

https://en.wikipedia.org/wiki/Locality-sensitive_hashing

http://unboxresearch.com/articles/lsh_post1.html

Regular expression

https://regexr.com/

https://news.ycombinator.com/item?id=20608032

https://deniskyashif.com/implementing-a-regular-expression-engine/

https://deniskyashif.com/2019/02/17/implementing-a-regular-expression-engine/

https://www.youtube.com/watch?v=l3hda49XcDE

https://nikic.github.io/2012/06/15/The-true-power-of-regular-expressions.html

https://rcoh.me/posts/no-magic-regular-expressions-part-3/

https://medium.com/@cdiggins/beyond-regular-expressions-an-introduction-to-parsing-context-free-grammars-ee77bdab5a92

Inerpretators Compiles

https://bernsteinbear.com/blog/bytecode-interpreters/

https://news.ycombinator.com/item?id=18821475