2012-07-18

Use Lucene’s MMapDirectory on 64bit platforms, please!

Don’t be afraid – Some clarification to common misunderstandings

Since version 3.1, Apache Lucene and Solr use MMapDirectory by default on 64bit Windows and Solaris systems; since version 3.3 also for 64bit Linux systems. This change lead to some confusion among Lucene and Solr users, because suddenly their systems started to behave differently than in previous versions. On the Lucene and Solr mailing lists a lot of posts arrived from users asking why their Java installation is suddenly consuming three times their physical memory or system administrators complaining about heavy resource usage. Also consultants were starting to tell people that they should not use MMapDirectory and change their solrconfig.xml to work instead with slow SimpleFSDirectory or NIOFSDirectory (which is much slower on Windows, caused by a JVM bug #6265734). From the point of view of the Lucene committers, who carefully decided that using MMapDirectory is the best for those platforms, this is rather annoying, because they know, that Lucene/Solr can work with much better performance than before. Common misinformation about the background of this change causes suboptimal installations of this great search engine everywhere.

In this blog post, I will try to explain the basic operating system facts regarding virtual memory handling in the kernel and how this can be used to largely improve performance of Lucene (“VIRTUAL MEMORY for DUMMIES”). It will also clarify why the blog and mailing list posts done by various people are wrong and contradict the purpose of MMapDirectory. In the second part I will show you some configuration details and settings you should take care of to prevent errors like “mmap failed” and suboptimal performance because of stupid Java heap allocation.

Virtual Memory[1]

Let’s start with your operating system’s kernel: The naive approach to do I/O in software is the way, you have done this since the 1970s – the pattern is simple: whenever you have to work with data on disk, you execute a syscall to your operating system kernel, passing a pointer to some buffer (e.g. a byte[] array in Java) and transfer some bytes from/to disk. After that you parse the buffer contents and do your program logic. If you don’t want to do too many syscalls (because those may cost a lot processing power), you generally use large buffers in your software, so synchronizing the data in the buffer with your disk needs to be done less often. This is one reason, why some people suggest to load the whole Lucene index into Java heap memory (e.g., by using RAMDirectory).

But all modern operating systems like Linux, Windows (NT+), MacOS X, or Solaris provide a much better approach to do this 1970s style of code by using their sophisticated file system caches and memory management features. A feature called “virtual memory” is a good alternative to handle very large and space intensive data structures like a Lucene index. Virtual memory is an integral part of a computer architecture; implementations require hardware support, typically in the form of a memory management unit (MMU) built into the CPU. The way how it works is very simple: Every process gets his own virtual address space where all libraries, heap and stack space is mapped into. This address space in most cases also start at offset zero, which simplifies loading the program code because no relocation of address pointers needs to be done. Every process sees a large unfragmented linear address space it can work on. It is called “virtual memory” because this address space has nothing to do with physical memory, it just looks like so to the process. Software can then access this large address space as if it were real memory without knowing that there are other processes also consuming memory and having their own virtual address space. The underlying operating system works together with the MMU (memory management unit) in the CPU to map those virtual addresses to real memory once they are accessed for the first time. This is done using so called page tables, which are backed by TLBs located in the MMU hardware (translation lookaside buffers, they cache frequently accessed pages). By this, the operating system is able to distribute all running processes’ memory requirements to the real available memory, completely transparent to the running programs.


Schematic drawing of virtual memory
(image from Wikipedia [1], http://en.wikipedia.org/wiki/File:Virtual_memory.svg, licensed by CC BY-SA 3.0)

By using this virtualization, there is one more thing, the operating system can do: If there is not enough physical memory, it can decide to “swap out” pages no longer used by the processes, freeing physical memory for other processes or caching more important file system operations. Once a process tries to access a virtual address, which was paged out, it is reloaded to main memory and made available to the process. The process does not have to do anything, it is completely transparent. This is a good thing to applications because they don’t need to know anything about the amount of memory available; but also leads to problems for very memory intensive applications like Lucene.

Lucene & Virtual Memory

Let’s take the example of loading the whole index or large parts of it into “memory” (we already know, it is only virtual memory). If we allocate a RAMDirectory and load all index files into it, we are working against the operating system: The operating system tries to optimize disk accesses, so it caches already all disk I/O in physical memory. We copy all these cache contents into our own virtual address space, consuming horrible amounts of physical memory (and we must wait for the copy operation to take place!). As physical memory is limited, the operating system may, of course, decide to swap out our large RAMDirectory and where does it land? – On disk again (in the OS swap file)! In fact, we are fighting against our O/S kernel who pages out all stuff we loaded from disk [2]. So RAMDirectory is not a good idea to optimize index loading times! Additionally, RAMDirectory has also more problems related to garbage collection and concurrency. Because the data residing in swap space, Java’s garbage collector has a hard job to free the memory in its own heap management. This leads to high disk I/O, slow index access times, and minute-long latency in your searching code caused by the garbage collector driving crazy.

On the other hand, if we don’t use RAMDirectory to buffer our index and use NIOFSDirectory or SimpleFSDirectory, we have to pay another price: Our code has to do a lot of syscalls to the O/S kernel to copy blocks of data between the disk or filesystem cache and our buffers residing in Java heap. This needs to be done on every search request, over and over again.

Memory Mapping Files

The solution to the above issues is MMapDirectory, which uses virtual memory and a kernel feature called “mmap” [3] to access the disk files.

In our previous approaches, we were relying on using a syscall to copy the data between the file system cache and our local Java heap. How about directly accessing the file system cache? This is what mmap does!

Basically mmap does the same like handling the Lucene index as a swap file. The mmap() syscall tells the O/S kernel to virtually map our whole index files into the previously described virtual address space, and make them look like RAM available to our Lucene process. We can then access our index file on disk just like it would be a large byte[] array (in Java this is encapsulated by a ByteBuffer interface to make it safe for use by Java code). If we access this virtual address space from the Lucene code we don’t need to do any syscalls, the processor’s MMU and TLB handles all the mapping for us. If the data is only on disk, the MMU will cause an interrupt and the O/S kernel will load the data into file system cache. If it is already in cache, MMU/TLB map it directly to the physical memory in file system cache. It is now just a native memory access, nothing more! We don’t have to take care of paging in/out of buffers, all this is managed by the O/S kernel. Furthermore, we have no concurrency issue, the only overhead over a standard byte[] array is some wrapping caused by Java’s ByteBuffer interface (it is still slower than a real byte[] array, but  that is the only way to use mmap from Java and is much faster than all other directory implementations shipped with Lucene). We also waste no physical memory, as we operate directly on the O/S cache, avoiding all Java GC issues described before.

What does this all mean to our Lucene/Solr application?
  • We should not work against the operating system anymore, so allocate as less as possible heap space (-Xmx Java option). Remember, our index accesses rely on passed directly to O/S cache! This is also very friendly to the Java garbage collector.
  • Free as much as possible physical memory to be available for the O/S kernel as file system cache. Remember, our Lucene code works directly on it, so reducing the number of paging/swapping between disk and memory. Allocating too much heap to our Lucene application hurts performance! Lucene does not require it with MMapDirectory.

Why does this only work as expected on operating systems and Java virtual machines with 64bit?

One limitation of 32bit platforms is the size of pointers, they can refer to any address within 0 and 232-1, which is 4 Gigabytes. Most operating systems limit that address space to 3 Gigabytes because the remaining address space is reserved for use by device hardware and similar things. This means the overall linear address space provided to any process is limited to 3 Gigabytes, so you cannot map any file larger than that into this “small” address space to be available as big byte[] array. And when you mapped that one large file, there is no virtual space (address like “house number”) available anymore. As physical memory sizes in current systems already have gone beyond that size, there is no address space available to make use for mapping files without wasting resources (in our case “address space”, not physical memory!).

On 64bit platforms this is different: 264-1 is a very large number, a number in excess of 18 quintillion bytes, so there is no real limit in address space. Unfortunately, most hardware (the MMU, CPU’s bus system) and operating systems are limiting this address space to 47 bits for user mode applications (Windows: 43 bits) [4]. But there is still much of addressing space available to map terabytes of data.

Common misunderstandings

If you have read carefully what I have told you about virtual memory, you can easily verify that the following is true:
  • MMapDirectory does not consume additional memory and the size of mapped index files is not limited by the physical memory available on your server. By mmap() files, we only reserve address space not memory! Remember, address space on 64bit platforms is for free!
  • MMapDirectory will not load the whole index into physical memory. Why should it do this? We just ask the operating system to map the file into address space for easy access, by no means we are requesting more. Java and the O/S optionally provide the option to try loading the whole file into RAM (if enough is available), but Lucene does not use that option (we may add this possibility in a later version).
  • MMapDirectory does not overload the server when “top” reports horrible amounts of memory. “top” (on Linux) has three columns related to memory: “VIRT”, “RES”, and “SHR”. The first one (VIRT, virtual) is reporting allocated virtual address space (and that one is for free on 64 bit platforms!). This number can be multiple times of your index size or physical memory when merges are running in IndexWriter. If you have only one IndexReader open it should be approximately equal to allocated heap space (-Xmx) plus index size. It does not show physical memory used by the process. The second column (RES, resident) memory shows how much (physical) memory the process allocated for operating and should be in the size of your Java heap space. The last column (SHR, shared) shows how much of the allocated virtual address space is shared with other processes. If you have several Java applications using MMapDirectory to access the same index, you will see this number going up. Generally, you will see the space needed by shared system libraries, JAR files, and the process executable itself (which are also mmapped).

How to configure my operating system and Java VM to make optimal use of MMapDirectory?

First of all, default settings in Linux distributions and Solaris/Windows are perfectly fine. But there are some paranoid system administrators around, that want to control everything (with lack of understanding). Those limit the maximum amount of virtual address space that can be allocated by applications. So please check that “ulimit -v” and “ulimit -m” both report “unlimited”, otherwise it may happen that MMapDirectory reports “mmap failed” while opening your index. If this error still happens on systems with lot’s of very large indexes, each of those with many segments, you may need to tune your kernel parameters in /etc/sysctl.conf: The default value of vm.max_map_count is 65530, you may need to raise it. I think, for Windows and Solaris systems there are similar settings available, but it is up to the reader to find out how to use them.

For configuring your Java VM, you should rethink your memory requirements: Give only the really needed amount of heap space and leave as much as possible to the O/S. As a rule of thumb: Don’t use more than ¼ of your physical memory as heap space for Java running Lucene/Solr, keep the remaining memory free for the operating system cache. If you have more applications running on your server, adjust accordingly. As usual the more physical memory the better, but you don’t need as much physical memory as your index size. The kernel does a good job in paging in frequently used pages from your index.

A good possibility to check that you have configured your system optimally is by looking at both "top" (and correctly interpreting it, see above) and the similar command "iotop" (can be installed, e.g., on Ubuntu Linux by "apt-get install iotop"). If your system does lots of swap in/swap out for the Lucene process, reduce heap size, you possibly used too much. If you see lot's of disk I/O, buy more RUM (Simon Willnauer) so mmapped files don't need to be paged in/out all the time, and finally: buy SSDs.

Happy mmapping!

Bibliography

2012-07-11

The Policeman’s Horror: Default Locales, Default Charsets, and Default Timezones

Time for a tool to prevent any effects coming from them!

Did you ever try to run software downloaded from the net on a computer with Turkish locale? I think most of you never did that. And if you ask Turkish IT specialists, they will tell you: “It is better to configure your computer using any other locale, but not tr_TR”. I think you have no clue what I am talking about? Maybe this article gives you a hint: “A Cellphone’s Missing Dot Kills Two People, Puts Three More in Jail”.

What you see in lots of software is a so-called case-insensitive matching of keywords like parameter names or function names. This is implemented in most cases by lowercasing or upper-casing the input text and compare it with a list of already lowercased/uppercased items. This works in most cases fine, if you are anywhere in the world, except Turkey! Because most programmers don’t care about running their software in Turkey, they do not test their software under the Turkish locale.

But what happens with the case-insensitive matching if running in Turkey? Let’s take an example:

User enters “BILLY” in the search field of you application. The application then uses the approach presented before and lower-cases “BILLY” and then compares it to an internal table (e.g. our search index, parameter table, function table,...). So we search in this table for “billy”. So far so good, works perfect in USA, Germany, Kenia, almost everywhere - except Turkey. What happens in the Turkish locale when we lowercase “BILLY”? After reading the above article, you might expect it: The “BILLY”.toLowerCase() statement in Java returns “bılly” (note the dot-less i: 'ı' U+0131). You can try this out on your local machine without reconfiguring it to use the Turkish locale, just try the following Java code:
assertEquals(“bılly”, “BILLY”.toLowerCase(new Locale(“tr”,“TR”)));
The same happens vice versa, if you uppercase a ‘i’, it gets I with dot (‘İ’ U+0130). This is really serious, million lines of code out there in Java and other languages don’t take care that the String.toLowerCase() and String.toUpperCase() methods can optionally take a defined Locale (more about that later). Some examples from projects I am involved in:

  • Try to run an XSLT stylesheet using Apache XALAN-XSLTC (or Java 5’s internal XSLT interpreter) in the Turkish locale. It will fail with “unknown instruction”, because XALAN-XSLTC compiles the XSLT to Java Bytecode and somehow lowercases a virtual machine opcode before compiling it with BCEL (see XALANJ-2420, BCEL bug #38787).
  • The HTML SAX parser NekoHTML uses locale-less uppercasing/lowercasing to normalize charset names and element names. I opened a bug report (issue #3544334).
  • If you use PHP as your favourite scripting language, which is not case sensitive for class names and other language constructs, it will throw a compile error once you try to call a function with an “i” in it (see PHP bug #18556). Unfortunately it is unlikely that this serious bug is fixed in PHP 5.3 or 5.4!

The question is now: How to solve this?

The most correct way to do this is to not lowercase at all! For comparing case insensitive, Unicode defines “case folding”, which is a so-called canonical form of text where all upper/lower case of any character is normalized away. Unfortunately this case folded text may no longer be readable text (this depends on the implementation, but in most cases it is). It just ensures, that case-folded text can be compared to each other in a case-insensitive way. Unfortunately Java does not offer you a function to get this string, but ICU-4J can do (see UCharacter#foldCase). But Java offers something much better: String.equalsIgnoreCase(String), which internally handles case folding! But in lots of cases you cannot use this fantastic method, because you want to lookup such strings in a HashMap or other dictionary. Without modifying HashMap to use equalsIgnoreCase, this would never work. So we are back at lower-casing! As mentioned before, you can pass a locale to String.toLowerCase(), so the naive approach would be to tell Java, that we are in the US or using the English language: String.toLowerCase(Locale.US) or String.toLowerCase(Locale.ENGLISH). This produces identical results but is still not consistent. What happens if the US government decides to lowercase/uppercase like in Turkey? -- OK, don’t use Locale.US (this is also too US-centric). Locale.ENGLISH is fine and very generic, but languages also change over the years (who knows?), but we want to have it language invariant! If you are using Java 6, there is a much better constant: Locale.ROOT -- You should use this constant for our lowercase example: String.toLowerCase(Locale.ROOT).
You should start now and do a global search/replace on all your Java projects (if you do not rely on language specific presentation of text)! REALLY!
String.toLowerCase is not the only example of “automatic default locale usage” in the Java API. There are also things like transforming dates or numbers to strings. If you use the Formatter class, and you run it somewhere in another country, String.format(“%f”, 15.5f) may not always use a period (‘.’) as decimal separator; most Germans will know this. Passing a specific locale here helps in most cases. If you are writing a GUI in English language, pass Locale.ENGLISH everywhere, otherwise text output of numbers or dates may not match the language of your GUI! If you want Formatter to behave in a invariant way, use Locale.ROOT, too (then it will for sure format numbers with period and no comma for thousands, just like Float.toString(float) does).

A second problem affecting lot’s of software are two other system-wide configurable default settings: default charset/encoding and timezone. If you open a text file with FileReader or convert an InputStream to a Reader with InputStreamReader, Java assumes automatically, that the input is in the default platform encoding. This may be fine, if you want the text to be parsed by the defaults of the operating system -- but if you pass a text file together with your software package (maybe as resource in your JAR file) and then accidentally read it using the platform’s default charset... it’ll break your app! So my second recommendation:
Always pass a character set to any method converting bytes to strings (like InputStream <=> Reader, String.getBytes(),...). If you wrote the text file and ship it together with your app, only you know its encoding!
For timezones, similar examples can be found.

How this affects Apache Lucene!

Apache Lucene is a full-text search engine and deals with text from different languages all the time; Apache Solr is a enterprise search server on top of Lucene and deals with input documents in lots of different charsets and languages. It is therefore essential for a search library like Lucene to be as most independent from local machine settings as possible. A library must make it explicit what input it wants to have. So we require charsets and locales in all public and private APIs (or we only take e.g. java.io.Reader instead of InputStream if we expect text coming in), so the user must take care.

Robert Muir and I reviewed the source code of Apache Lucene and Solr for the coming version 4.0 (an alpha version is already available on Lucene’s homepage, documentation is here). We did this quite often, but whenever a new piece of code is committed to the source tree, it may happen that undefined locales, charsets, or similar things appear again. In most cases it is not the fault of the committer, this happens because auto-complete in IDE automatically lists possible methods and parameters to the developer. Often you select the easiest variant (like String.toLowerCase()).

Using default locales, charsets and timezones are in my opinion a big design issue in programming languages like Java. If there are locale-sensitive methods, those methods should take a locale, if you convert a byte[] stream to a char[] stream, a charset must be given. Automatically falling back to defaults is a no-go in the server environment. 
If a developer is interested in using the default locale of the user’s computer, he can always explicitely give the locale or charset. In our example this would be String.toLowerCase(Locale.getDefault()). This is more verbose, but it is obvious what the developer intends to do.

My proposal is to ban all those default charset and locale methods / classes in the Java API by deprecating them as soon as possible, so users stop using them implicit!


Robert’s and my intention is to automatically fail the nightly builds (or compilation on the developer’s machine) when somebody uses one of the above methods in Lucene’s or Solr’s source code. We looked at different solutions like PMD or FindBugs, but both tools are too sloppy to handle that in a consistent way (PMD does not have any “default charset” method detection and Findbugs has only a very short list of method signatures). In addition, both PMD and FindBugs are very slow and often fail to correctly detect all problems. For Lucene builds we only need a tool, that looks into the byte code of all generated Java classes of Apache Lucene and Solr, and fails the build if any signature that violates our requirements is found.

A new Tool for the Policeman

I started to hack a tool as a custom ANT task using ASM 4.0 (Lightweight Java Bytecode Manipulation Framework). The idea was to provide a list of methods signatures, field names and plain class names that should fail the build, once bytecode accesses it in any way. A first version of this task was published in issue LUCENE-4199, later improvements was to add support for fields (LUCENE-4202) and a sophisticated signature expansion to also catch calls to subclasses of the given signatures (LUCENE-4206).

In the meantime, Robert worked on the list of “forbidden” APIs. This is what came out in the first version:
java.lang.String#<init>(byte[])
java.lang.String#<init>(byte[],int)
java.lang.String#<init>(byte[],int,int)
java.lang.String#<init>(byte[],int,int,int)
java.lang.String#getBytes()
java.lang.String#getBytes(int,int,byte[],int) 
java.lang.String#toLowerCase()
java.lang.String#toUpperCase()
java.lang.String#format(java.lang.String,java.lang.Object[])
java.io.FileReader
java.io.FileWriter
java.io.ByteArrayOutputStream#toString()
java.io.InputStreamReader#<init>(java.io.InputStream)
java.io.OutputStreamWriter#<init>(java.io.OutputStream)
java.io.PrintStream#<init>(java.io.File)
java.io.PrintStream#<init>(java.io.OutputStream)
java.io.PrintStream#<init>(java.io.OutputStream,boolean)
java.io.PrintStream#<init>(java.lang.String)
java.io.PrintWriter#<init>(java.io.File)
java.io.PrintWriter#<init>(java.io.OutputStream)
java.io.PrintWriter#<init>(java.io.OutputStream,boolean)
java.io.PrintWriter#<init>(java.lang.String)
java.io.PrintWriter#format(java.lang.String,java.lang.Object[])
java.io.PrintWriter#printf(java.lang.String,java.lang.Object[])
java.nio.charset.Charset#displayName()
java.text.BreakIterator#getCharacterInstance()
java.text.BreakIterator#getLineInstance()
java.text.BreakIterator#getSentenceInstance()
java.text.BreakIterator#getWordInstance()
java.text.Collator#getInstance()
java.text.DateFormat#getTimeInstance()
java.text.DateFormat#getTimeInstance(int)
java.text.DateFormat#getDateInstance()
java.text.DateFormat#getDateInstance(int)
java.text.DateFormat#getDateTimeInstance()
java.text.DateFormat#getDateTimeInstance(int,int)
java.text.DateFormat#getInstance()
java.text.DateFormatSymbols#<init>()
java.text.DateFormatSymbols#getInstance()
java.text.DecimalFormat#<init>()
java.text.DecimalFormat#<init>(java.lang.String)
java.text.DecimalFormatSymbols#<init>()
java.text.DecimalFormatSymbols#getInstance()
java.text.MessageFormat#<init>(java.lang.String)
java.text.NumberFormat#getInstance()
java.text.NumberFormat#getNumberInstance()
java.text.NumberFormat#getIntegerInstance()
java.text.NumberFormat#getCurrencyInstance()
java.text.NumberFormat#getPercentInstance()
java.text.SimpleDateFormat#<init>()
java.text.SimpleDateFormat#<init>(java.lang.String)
java.util.Calendar#<init>()
java.util.Calendar#getInstance()
java.util.Calendar#getInstance(java.util.Locale)
java.util.Calendar#getInstance(java.util.TimeZone)
java.util.Currency#getSymbol()
java.util.GregorianCalendar#<init>()
java.util.GregorianCalendar#<init>(int,int,int)
java.util.GregorianCalendar#<init>(int,int,int,int,int)
java.util.GregorianCalendar#<init>(int,int,int,int,int,int)
java.util.GregorianCalendar#<init>(java.util.Locale)
java.util.GregorianCalendar#<init>(java.util.TimeZone)
java.util.Scanner#<init>(java.io.InputStream)
java.util.Scanner#<init>(java.io.File)
java.util.Scanner#<init>(java.nio.channels.ReadableByteChannel)
java.util.Formatter#<init>()
java.util.Formatter#<init>(java.lang.Appendable)
java.util.Formatter#<init>(java.io.File)
java.util.Formatter#<init>(java.io.File,java.lang.String)
java.util.Formatter#<init>(java.io.OutputStream)
java.util.Formatter#<init>(java.io.OutputStream,java.lang.String)
java.util.Formatter#<init>(java.io.PrintStream)
java.util.Formatter#<init>(java.lang.String)
java.util.Formatter#<init>(java.lang.String,java.lang.String)
Using this easily extend-able list, saved in a text file (UTF-8 encoded!), you can invoke my new ANT task (after registering it with <taskdef/>) very easy -- taken from Lucene/Solr’s build.xml:
<taskdef resource="lucene-solr.antlib.xml">
  <classpath>
    <pathelement location="${custom-tasks.dir}/build/classes/java" />
    <fileset dir="${custom-tasks.dir}/lib" includes="asm-debug-all-4.0.jar" />
  </classpath>
</taskdef>
<forbidden-apis>
  <classpath refid="additional.dependencies"/>
  <apiFileSet dir="${custom-tasks.dir}/forbiddenApis">
    <include name="jdk.txt" />
    <include name="jdk-deprecated.txt" />
    <include name="commons-io.txt" />
  </apiFileSet>
  <fileset dir="${basedir}/build" includes="**/*.class" />
</forbidden-apis>
The classpath given is used to look up the API signatures (provided as apiFileSet). Classpath is only needed if signatures are coming from 3rd party libraries. The inner fileset should list all class files to be checked. For running the task you also need asm-all-4.0.jar available in the task’s classpath.

If you are interested, take the source code, it is open source and released as part of the tool set shipped with Apache Lucene & Solr: Source, API lists (revision number 1360240).

At the moment we are investigating other opportunities brought by that tool:
  • We want to ban System.out/err or things like horrible Eclipse-like try...catch...printStackTrace() auto-generated Exception stubs. We can just ban those fields from the java.lang.System class and of course, Throwable#printStackTrace().
  • Using optimized Lucene-provided replacements for JDK API calls. This can be enforced by failing on the JDK signatures.
  • Failing the build on deprecated calls to Java’s API. We can of course print warnings for deprecations, but failing the build is better. And: We use deprecation annotations in Lucene’s own library code, so javac-generated warnings don’t help. We can use the list of deprecated stuff from JDK Javadocs to trigger the failures.
I hope other projects take a similar approach to scan their binary/source code and free it from system dependent API calls, which are not predictable for production systems in the server environment.

Thanks to Robert Muir and Dawid Weiss for help and suggestions!

EDIT (2015-03-14): On 2013-02-04, I released the plugin as Apache Ant, Apache Maven and CLI task on Google Code; later on 2015-03-14, it was migrated to Github. The project URL is: https://github.com/policeman-tools/forbidden-apis. The tool is available to your builds using Maven/Ivy through Maven Central and Sonatype repositories. Nightly snapshot builds are done by the Policeman Jenkins Server and can be downloaded from the Sonatype Snapshot repository.