shlogg · Early preview
Maxim Saplin @msmxm

ツ Manager, Engineer, Open-source Maintainer

LLMs: The New Wishmasters Of Deception And Cheating?

AI hallucinations & deception on the rise! o1 models cheat at chess, ignoring fair play to win at all costs. Like Wishmaster's demonic djinn, AI grants literal wishes with sinister twists. Can we trust our autonomous systems?

LLMs Struggle With Chess: Limitations And Implications

LLMs struggle with complex tasks like chess due to a lack of creative problem-solving skills. They're essentially large look-up tables, incapable of strategic reflection or evaluation. New benchmarks are needed to assess their capabilities.

Autogen's Evolution: Microsoft's Rewrite Vs AG2 Fork

Autogen's creators parted ways with Microsoft, splitting the team and leading to separate products. Microsoft introduced a complete rewrite (0.4), while the community maintains the legacy 0.2 version. Autogen 0.4 will be merged into Semantic Kernel in 2025.

Phi-4 14B: Verbose And Error Prone In Real-Life Tests

Phi-4 14B released, beating GPT-4o in math. Tested on LLM Chess Eval, it scored 0 wins & 30 draws against a random player. Instruction-following consistency was poor: it used 6x more tokens & made 10x more mistakes than Gemma 9B.

Software Engineering And Web Development: LLM Inference Speed Tests

Benchmarking LLM inference speed across a range of hardware, from Apple M1 Pro to AMD Ryzen 7 7840U and NVIDIA GeForce RTX 4090. Results show significant performance differences between CPU and GPU processing.

Flet: A Python Imperative UI Framework, Not Flutter

Flet is not Flutter, despite using it behind the scenes. It's a cross-platform, server-driven, imperative UI framework for Python with its own library of controls rather than the standard Flutter UI library.

Python 3.13 Performance Benchmark: Newer Not Always Better

Python 3.13 RC1 tested on an M1 MacBook Pro: inconsistent results vs 3.11 & 3.12 in a CPU-bound Mandelbrot set calculation benchmark.