BY Jason Hunter & Mike Wooldridge, MarkLogic

Covers MarkLogic 8 ∙ April 2016


This book describes the MarkLogic Server internals including its data model, indexing system, update model, and operational behaviors. It’s intended for a technical audience such as someone new to MarkLogic who wants to understand its capabilities, or someone already familiar with MarkLogic who wants to understand what’s going on under the hood.

This book is not an introduction to using MarkLogic Server. For that you can read the official product documentation. Instead, this book explains the principles upon which MarkLogic is built. The goal isn’t to teach you to write code, but rather to help you understand what’s going on behind your code, and thus help you to write better and more robust applications.

Chapter 1 provides a high-level overview of MarkLogic Server. Chapter 2 explains MarkLogic’s core indexes, transactional storage system, multi-host clustering, and various coding options. This is a natural stopping point for the casual reader. Chapter 3 covers advanced indexing as well as topics like bitemporal, semantics, rebalancing, tiered storage, Hadoop integration, failover, and replication. It also discusses the ecosystem of tools, libraries, and plug-ins (many of them open source) built up around MarkLogic.

This third edition of the book adds discussions of features introduced in MarkLogic 7 and 8, including JSON and JavaScript support, semantics, bitemporal, rebalancing and forest assignment policies, tiered storage and super-databases, incremental backup, query-based flexible replication, the Java and Node.js Client APIs, custom tokenization, relevance superboosting, monitoring history, and a new distribute timestamps option. It also adds coverage of a few older features (such as “mild not” and wildcard matching) and expands the coverage of search relevance.


What follows is a conceptual exploration of MarkLogic’s capabilities—not a book about programming. However, the book does include code examples written in XQuery or JavaScript to explain certain ideas that are best relayed through code. A complete version of the code examples is available for download on GitHub.

When referring to MarkLogic built-in functions, we’ll reference the XQuery versions of the functions, e.g., xdmp:document-load(). In most cases, there are equivalent JavaScript versions. To access the function in JavaScript, replace the “:” with a “.” and change the hyphenation to camel-cased text, e.g., xdmp.documentLoad().1
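The naming rule above can be sketched in a few lines of JavaScript. This is only an illustration of the convention, not part of any MarkLogic API: the helper name toJavaScriptName is hypothetical, and because the rule holds "in most cases" rather than always, the output should be treated as a likely name to look up, not a guaranteed one.

```javascript
// Hypothetical helper (not a MarkLogic function): applies the naming rule
// described above to an XQuery-style built-in function name.
function toJavaScriptName(xqueryName) {
  return xqueryName
    // Replace the namespace separator ":" with "."
    .replace(":", ".")
    // Turn each "-x" into "X" so hyphenated names become camel-cased
    .replace(/-([a-z0-9])/g, (match, ch) => ch.toUpperCase());
}

console.log(toJavaScriptName("xdmp:document-load")); // xdmp.documentLoad
console.log(toJavaScriptName("cts:word-query"));     // cts.wordQuery
```

The sketch only transforms the name. It does not check that a function with the resulting name exists, so the documentation remains the authority for any particular function.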

1 Tip: For fast access to documentation for a function, go to You don’t even need the namespace.

2 You can find the full set of API documentation, as well as in-depth guides and tutorials, at the MarkLogic Product Documentation. The site includes a robust search feature that’s built, naturally, on MarkLogic.


Jason Hunter is MarkLogic’s CTO Asia-Pacific and one of the company’s first employees. He works across sales, consulting, partnerships, and engineering (he led development on Jason is probably best known as the author of the book Java Servlet Programming (O’Reilly Media) and the creator of the JDOM open source project for Java-optimized XML manipulation. He’s an Apache Software Foundation member and former vice president as well as an original contributor to Apache Tomcat and Apache Ant. He’s also a frequent speaker.

Mike Wooldridge is a Senior Software Engineer at MarkLogic focusing on front-end application development. He has built some of the software tools that ship with MarkLogic, including Monitoring History, and has authored open-source projects such as MLPHP. He has also written dozens of books for Wiley on graphics software and web design, including Teach Yourself Visually Photoshop CC and Teach Yourself Visually HTML5. He’s a frequent contributor to the MarkLogic Developer Blog and has spoken at MarkLogic World conferences.

