A practical, open-access textbook covering foundations through modern LLMs — with exercises, code examples, and linguistic depth for 20+ Turkic languages.
The book is actively being written. Chapter 1 is in first draft. Sign up or watch the GitHub repository to be notified at launch.
Free forever. Because knowledge about under-resourced languages should be open.
Written for NLP researchers, practitioners, and graduate students who want to work with Turkic languages. No prior knowledge of Turkic linguistics required.
Each chapter builds on the last — starting from scripts and encodings, through morphology and syntax, all the way to large language models and generative AI.
Every chapter includes working code examples using the TurkicNLP toolkit. A companion repository hosts full runnable notebooks for each tutorial.
Examples span Turkish, Kazakh, Uzbek, Azerbaijani, Kyrgyz, Uyghur, Tatar, and more — not just the well-resourced flagship language.
Grounded in published research with proper citations. Covers state-of-the-art models, benchmarks, datasets, and open problems at the frontier of Turkic NLP.
Each chapter ends with exercises ranging from conceptual questions to implementation challenges — suitable for self-study and university courses.
The book is published online under an open license. PDF and print editions may follow, but the web version will always be free and open.
From the basics of the Turkic family to cutting-edge generative AI — structured for sequential reading or chapter-by-chapter reference.
What is NLP and why Turkic languages matter. The Turkic language family: 200M+ speakers, 24+ languages, geographic spread from Turkey to Siberia. Core computational challenges: agglutinative morphology, vowel harmony, script diversity, pro-drop.
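To make one of these challenges concrete, here is a minimal sketch of two-way vowel harmony using the Turkish plural suffix, which surfaces as -lar after a back vowel (a, ı, o, u) and -ler after a front vowel (e, i, ö, ü). It is plain Python and does not use the TurkicNLP toolkit. The names `BACK_VOWELS`, `FRONT_VOWELS`, `turkish_lower`, and `pluralize` are hypothetical and chosen only for illustration.

```python
# Minimal sketch: two-way vowel harmony for the Turkish plural suffix.
# All names here are hypothetical illustrations, not TurkicNLP toolkit APIs.

BACK_VOWELS = set("aıou")
FRONT_VOWELS = set("eiöü")


def turkish_lower(text: str) -> str:
    """Lowercase with Turkish casing rules: I -> ı and İ -> i."""
    # Python's default str.lower() maps "I" to "i", which is wrong for Turkish,
    # so the two dotted/dotless pairs are handled before the generic call.
    return text.replace("I", "ı").replace("İ", "i").lower()


def pluralize(noun: str) -> str:
    """Append -lar after a back vowel and -ler after a front vowel."""
    # The suffix harmonizes with the last vowel of the stem, so scan backwards.
    for ch in reversed(turkish_lower(noun)):
        if ch in BACK_VOWELS:
            return noun + "lar"
        if ch in FRONT_VOWELS:
            return noun + "ler"
    raise ValueError(f"no vowel found in {noun!r}")


if __name__ == "__main__":
    for word in ["kitap", "ev", "göz", "kız"]:
        print(word, "->", pluralize(word))
```

The rule covers regular native stems only. Some loanwords take the front suffix despite a back final vowel (saat becomes saatler, not saatlar), which is one reason practical systems rely on lexicons or learned models rather than a single rule.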
A comprehensive reference covering published work, benchmarks, open problems, and research directions. Properly cited throughout — suitable as a course textbook or self-study guide.
Prerequisites: Basic Python, familiarity with machine learning concepts. No prior knowledge of Turkic languages required.
Hands-on tutorials in every chapter using the TurkicNLP toolkit. Real code that runs. Focused on what works in production — not just theory.
Prerequisites: Python and basic NLP concepts (tokenization, embeddings). Each chapter is self-contained enough to jump in anywhere.
The book is being actively written. Watch the GitHub repository to get notified when chapters are published — or follow on social media.
Star the repository and set notifications to "Watching" to be alerted when new chapters land.
The companion Python toolkit is already available. Install it and start working with Turkic languages while the book is being written.
The book and the toolkit are both free and open-source. Writing an open textbook of this scope takes hundreds of hours. If this work is useful to you, whether for research, teaching, or building products, consider supporting its development.
Funds go directly toward time spent researching and writing, compute for model experiments referenced in the book, and infrastructure for hosting.