tag:blogger.com,1999:blog-13538962023921924122024-03-29T12:03:38.551+01:00The Generics Policeman BlogUwe Schindlerhttp://www.blogger.com/profile/08079070589736993766noreply@blogger.comBlogger8125tag:blogger.com,1999:blog-1353896202392192412.post-83535784872878111042021-08-11T16:50:00.018+02:002021-08-19T13:04:21.537+02:00Pixel 5 Gap Gate: It's not only the gap!<p>In April this year I bought a <b>Google Pixel 5</b> phone to replace my old <i>Samsung Galaxy S7</i>. I was searching for a phone that is small but still has all the new features and a good camera. The Pixel 5 was <b><u>the</u> </b>ideal candidate, especially as its NFC sensor works much better with <b>Google Pay</b> and my <b>blood sugar sensor Freestyle Libre</b>. In addition, the camera produces fairly good pictures. The aluminium body makes the back much easier to hold without a cover <i>(the Samsung Galaxy S7 is impossible to hold and work with if you don't have a cover on the back).</i></p><h3 style="text-align: left;">The new phone - a quick review</h3><p>When I got the Pixel 5 in April, everything looked great: I was really happy to have the phone. The software was much better than Samsung's bloatware <i>(like Samsung's horrible <b>Bixby</b>)</i>. The phone was very fast and it was a pleasure to work with. After unpacking it, I also checked whether the phone was affected by the <a href="https://support.google.com/pixelphone/thread/77741994/pixel-5-gap-between-screen-and-body?hl=en">Pixel Gap Gate</a> (#PixelGapGate) - which wasn't the case! The well-known "Pixel Gap Gate" is also the reason for this blog post: After heavy use for a few weeks, mostly in my home office, and with inductive charging (which heats up the phone more than charging with a USB3 cable), I figured out that my phone was indeed affected by #PixelGapGate, it just took longer. On the upper side of the phone, the gap between body and display started to open a bit. Over time it got larger. 
In June it was approximately as large to put your fingernail inbetween:</p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi5VMExAkVvGbY3_4MoSyVWznsX-Ixfv5U_nz4dxJ5Cmmi3U_eKSvGwSVX77U_KdBLMTmFt5kaCeQ_D9EM13W3Fe81zTbjeZd1HbBi9MW9AZS2BV884P5WJ1h3bribW4tzDV6kxUJaDDqw/s1442/20210811_160354.jpg" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="961" data-original-width="1442" height="343" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi5VMExAkVvGbY3_4MoSyVWznsX-Ixfv5U_nz4dxJ5Cmmi3U_eKSvGwSVX77U_KdBLMTmFt5kaCeQ_D9EM13W3Fe81zTbjeZd1HbBi9MW9AZS2BV884P5WJ1h3bribW4tzDV6kxUJaDDqw/w516-h343/20210811_160354.jpg" width="516" /></a></div><p>At first, I <b>did not notice any other issues</b>, so I was happy with Google's statement about <a href="https://support.google.com/pixelphone/thread/77741994/pixel-5-gap-between-screen-and-body?hl=en">#PixelGapGate</a>.</p><h3 style="text-align: left;">Proximity sensor issues started</h3><p>Starting of July, when Corona restrictions were loosened, I spent more time outside. I did a large number of phone calls without bluetooth headset: <i>Back to life!</i> I noticed quite fast that the phone screen went dark as soon as you started a call: You enter phone number, press "Call" button and half a second later, screen went dark. At first, I thought that this might be an issue of the Google Phone app, but after several updates with Play Store this did not change. So I started to investigate: It took a while to get the connection:</p><p></p><ul style="text-align: left;"><li>Phone screen only went dark when you did not use a headset. It did not matter if you dial yourself or somebody else is calling you. 
It just happened instantly!</li><li>In the car and with a Bluetooth headset, the phone screen never went dark.</li><li>If you call with the loudspeaker turned on, the screen also does not turn dark.</li></ul><p style="text-align: left;">These observations led me in the right direction: <i>Why does the phone screen normally go dark?</i> <b>YES! </b><b>Because you move the smartphone to your ear!</b> It is just a protection, so you do not accidentally hit on-screen buttons while you listen to the phone call. If you move the phone away from your ear, the screen switches on again.</p><p style="text-align: left;"><i>You can easily test this:</i> Start a phone call with any Android device. The screen should be turned on. Then move your hand towards the top of the phone; when it gets closer than 5 centimeters, the screen switches off. If you move the hand away, the screen goes on again. Responsible for this is the proximity sensor, which is standard on all smartphones (it was also installed on most old Nokia or Siemens phones of the pre-smartphone era).</p><p style="text-align: left;">To investigate further, I installed a <a href="https://play.google.com/store/apps/details?id=it.sourcenetitalia.proximitysensortest" rel="nofollow">proximity sensor test app</a>. I also did this on my old phone, and then the issue was obvious: After starting the app on the Pixel 5, the sensor app showed a green frame and the text "NEAR". On my old phone, the proximity sensor behaved as it should: The frame was colored red and the text "FAR" appeared. As soon as I moved my hand over the sensor of the Galaxy S7, the frame went green and showed "NEAR". In contrast, on the Pixel 5, the sensor did not react to my hand at all.</p><p style="text-align: left;">Digging a bit more, I figured out that the sensor works if you take your other hand and press against the screen on the top-left or the top-right corner of the screen (not in the center, because the sensor is located there). 
If you then move the other hand in front of the sensor, the test app confirms that the proximity sensor reacts in the same way as on my old phone. I searched Google and found a lot of complaints about that:</p><p style="text-align: left;"></p><ul style="text-align: left;"><li><a href="https://www.reddit.com/r/GooglePixel/comments/jbl1j4/pixel_5_proximity_sensor_issue/">Reddit</a></li><li><a href="https://support.google.com/pixelphone/thread/78164488/n%C3%A4herungssensor?hl=de&authuser=0">Google Support Forum (German)</a></li></ul><p></p><p style="text-align: left;">One person also posted a video, which exactly reproduces what I have seen (source: <a href="https://drive.google.com/file/d/1S-nvsvxE_OyiqyS1DURfDLt9vTGyLCkT/view">https://drive.google.com/file/d/1S-nvsvxE_OyiqyS1DURfDLt9vTGyLCkT/view</a>):</p><div class="separator" style="clear: both; text-align: center;"><iframe allowfullscreen='allowfullscreen' webkitallowfullscreen='webkitallowfullscreen' mozallowfullscreen='mozallowfullscreen' width='380' height='590' src='https://www.blogger.com/video.g?token=AD6v5dwTJKslsHDC58FDcZpA9RZjw-JL8dU67fFEj3pgzYh0c6-j-X0i5BuEfb3S8oa-5lk6KHDAEhOTAfxk8reOmw' class='b-hbp-video b-uploaded' frameborder='0'></iframe></div><p style="clear: both; text-align: left;">This quite clearly illustrates my issue. I was happy that I was not alone!</p><h3 style="clear: both; text-align: left;">What does it have to do with #PixelGapGate?</h3><p style="clear: both; text-align: left;">It is simply a matter of putting the two issues together: There is a gap between screen and body (<a href="https://support.google.com/pixelphone/thread/77741994/pixel-5-gap-between-screen-and-body?hl=en">#PixelGapGate</a>) and the proximity sensor issue is solved by pressing on the screen! To explain how the two are related, you have to understand how the proximity sensor is installed in the phone: In older phones, the proximity sensor sits at the top of the phone in a separate hole next to the earpiece. 
In most cases it is powered by a capacitive or infrared sensor (you can test this if you put other objects than fingers in front of sensor, infrared sensors detect only fingers/ears, capacitive sensors detect almost everything). The problem with newer phones is that the whole front of phone is the display, so there is no space to insert a small hole. In fact, the Pixel 5 proximity sensor is at the top center of the display and mounted behind the display glass. If it is enabled, you can see some white flickering at top center of screen:</p><div class="separator" style="clear: both; text-align: center;"><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhIfH9LLZJS41d3CoWYxuMzi4AMF2kdoGaIVa-hxMur37YMxpW4DGx8-6YT0OWxjPWitjsdQLvo7G82FciLqox7hTX_LmsUB28IKWuOuizn4jyb8016RB_nGgxFjWZpAtPBsfii1hOf_8c/s2764/20210811_171151.jpg" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="2764" data-original-width="2268" height="498" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhIfH9LLZJS41d3CoWYxuMzi4AMF2kdoGaIVa-hxMur37YMxpW4DGx8-6YT0OWxjPWitjsdQLvo7G82FciLqox7hTX_LmsUB28IKWuOuizn4jyb8016RB_nGgxFjWZpAtPBsfii1hOf_8c/w409-h498/20210811_171151.jpg" width="409" /></a></div></div><p style="clear: both; text-align: left;">This flickering already <a href="https://9to5google.com/2020/10/19/how-to-disable-the-annoying-flashing-dot-on-your-pixel-5/">annoyed</a> a lot of people, because you can see it and it looks like some broken pixel of the display. Some applications like <a href="https://play.google.com/store/apps/details?id=com.google.android.apps.chromecast.app" rel="nofollow">Google Home</a> or the <a href="https://play.google.com/store/apps/details?id=de.number26.android" rel="nofollow" target="_blank">N26 banking app</a> all the time query the proximity sensor so it is flickering all the time. 
On my phone I disabled all those apps, so the proximity sensor is only active during phone calls and when the test app is running.</p><p style="clear: both; text-align: left;">Because the proximity sensor is mounted behind the display glass, the display glass also has an effect on it. It looks like on some Pixel 5 phones there is a gap between the display glass and the sensor, so when you press on the glass it gets closer. It seems that the sensor is adjusted to look through the display only if the glass is close enough; if there is a small gap, the sensor "thinks" there is something in front of it. Somebody else also told me that the proximity sensor needs to be shielded from light coming in from the side, which is obviously no longer the case if there is a gap. And now you see the connection to <a href="https://support.google.com/pixelphone/thread/77741994/pixel-5-gap-between-screen-and-body?hl=en">#PixelGapGate</a>: GapGate causes an additional gap. If the glass moves away from the body, there is also a gap between sensor and display glass, and the sensor therefore detects an object in front of the display.</p><p style="clear: both; text-align: left;">I contacted Google Support and they told me that they don't know about such a problem and that the gap is no issue at all: the phone is still water resistant. But they offered to replace my phone. The problem was that I did not buy the phone from Google's web store, but from a local retailer. In that case I would need to send in the phone and live without it. As you all know, I am diabetic and I regularly check my blood sugar using my mobile phone with the <a href="https://www.freestylelibre.de/" rel="nofollow">Freestyle Libre</a> Bluetooth Low Energy (BLE) sensor. Switching phones twice and copying data around was also not an option for me. 
So I looked for other possible solutions.</p><h3 style="clear: both; text-align: left;">(Temporarily) Fixing the Issue</h3><p style="clear: both; text-align: left;">So I applied some fix, which was also suggested by members of above forums about <a href="https://support.google.com/pixelphone/thread/77741994/pixel-5-gap-between-screen-and-body?hl=en">#PixelGapGate</a>: Try to fix the gap and check if proximity sensor works again! So I took a hairdryer and a huge amount of analogous books - The Brockhaus Encyclopedia! First I heated up the phone with the hairdryer (this makes the glue smooth) and then placed it between those Brockhaus Encyclopedia books. I put approximate 5 kg of books on top of the display and went for sleep.</p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgjSesYQRyea-r9axJnpPz75j-SdGk1GcOhMtZ0NIopMCWNAr_eV6fmyDVc6DlTz0Rw7vJW7Bgwukpj1bwvemKhYwxK7S-2T5IQks_Kjd6MSBTa4te5LcIW83rvBEglSD07jYi-UdsXS7Y/s2316/20210811_160526%257E2.jpg" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="2316" data-original-width="2213" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgjSesYQRyea-r9axJnpPz75j-SdGk1GcOhMtZ0NIopMCWNAr_eV6fmyDVc6DlTz0Rw7vJW7Bgwukpj1bwvemKhYwxK7S-2T5IQks_Kjd6MSBTa4te5LcIW83rvBEglSD07jYi-UdsXS7Y/s320/20210811_160526%257E2.jpg" width="306" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiTU_h9ZmEVwSuttGpKb2EFRtc0PausyxnXKsixraEljI2apefb-RyIioxapnP8xmQJOIP94nh-dCaDvw2O7EmomznOWVHXVGIfj_pNSiWDVKBcfB498PHcRvvbyraR7f7gZB1p97k8lWI/s3464/20210811_160614%257E2.jpg" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1809" data-original-width="3464" height="167" 
src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiTU_h9ZmEVwSuttGpKb2EFRtc0PausyxnXKsixraEljI2apefb-RyIioxapnP8xmQJOIP94nh-dCaDvw2O7EmomznOWVHXVGIfj_pNSiWDVKBcfB498PHcRvvbyraR7f7gZB1p97k8lWI/s320/20210811_160614%257E2.jpg" width="320" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgInnYNOqlNSs-qMgURaP-LU4qJl5MUoYf2tCmacHDvnyuAuBlBItAdLzbiD5Vzpf3xEcx8h-x_77OD8gooSl31HJM4LxYKAWVQa5kZYWK8hTspUMDJThlk2YMR6RH4h5olOm1kS7oVfPk/s2607/20210811_160625%257E2.jpg" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="2059" data-original-width="2607" height="253" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgInnYNOqlNSs-qMgURaP-LU4qJl5MUoYf2tCmacHDvnyuAuBlBItAdLzbiD5Vzpf3xEcx8h-x_77OD8gooSl31HJM4LxYKAWVQa5kZYWK8hTspUMDJThlk2YMR6RH4h5olOm1kS7oVfPk/s320/20210811_160625%257E2.jpg" width="320" /></a></div><p style="clear: both; text-align: left;">On the next day, I rescued the phone and checked the state: The gap was gone! Hurraaa! A quick test with the proximity sensor test app also showed: All went back to normal. </p><p style="clear: both; text-align: left;">I used the phone a few days and had no problem with the sensor anmore. All phone calls behaved correctly and the screen only went dark when I put the phone close to my ear.</p><h3 style="clear: both; text-align: left;">Time goes by...</h3><p style="clear: both;">Unfortunately, after a week or so, the gap started to open again, so the Brockhaus fix was only temporarily. I repeated the same fix again, just to verify that it is reproducible. As expected: #PixelGapGate and proximity sensor issue are directly related. Searching on Reddit also found the following: <a href="https://www.reddit.com/r/GooglePixel/comments/kuhm8g/the_gap_gate_the_lie_of_google/">"The Gap (gate). 
The Lie of Google."</a></p><p style="clear: both;">I contacted Google Support again. They still offered to replace my phone, but at the moment they are checking if they can replace the phone by sending the new one first<i> (like Amazon normally does)</i>.</p><p style="clear: both;">To conclude everything: <b>Sorry Google, the Gap Gate is a real issue! It is a lie that it does not affect users. Hundreds of users have a gap and hundreds of users have a screen turning black when they make a phone call.</b> It is not good customer support to just say: "Hey all is fine, don't worry!"</p><p style="clear: both;">You should offer all people an easy replacement of their phone without any bureaucracy. But very important: Before doing that, fix your design! My phone was built (according to the serial number) in March, which is 5 months after the first few people noticed the gaps! I would not have expected a phone manufactured 5 months later to still have the issue.</p><p style="clear: both;"><span style="color: #cc0000;"><b>So please Google: Do something! The Pixel 5 is a great phone, but Gap Gate and the proximity sensor issue are a pain! It's easy to fix by replacing the glue, just teach your manufacturer to fix their production!</b></span></p><p style="clear: both;"><span style="color: #cc0000;"><i><b>Update (2021-08-14; may apply to German customers only):</b> Unfortunately,</i><i> Google is not able to replace a Pixel 5 phone by sending the new one first and waiting for the return (like Amazon), if it was not bought through their web store. <b>So please see this as a warning:</b> Never ever buy a Google Pixel phone or other Google hardware through a local retailer or from a third party on a web store like Amazon! If you buy it in the Google Store, they will offer you a direct exchange without having to send in the old phone first, so you can get the replacement phone first and copy the data e.g. through the USB cable. 
This would have been important for me as a diabetic, because I use the phone as my blood sugar sensor. Now I have to copy the data 2 times (broken phone => very old Samsung phone or my laptop => new phone).</i></span></p><p></p>Uwe Schindlerhttp://www.blogger.com/profile/08079070589736993766noreply@blogger.com291tag:blogger.com,1999:blog-1353896202392192412.post-19744860667962904862012-07-18T22:54:00.001+02:002012-07-19T15:44:45.252+02:00Use Lucene’s MMapDirectory on 64bit platforms, please!<h2>
Don’t be afraid – Some clarification to common misunderstandings</h2>
<div>
Since version 3.1, <b>Apache Lucene</b> and <b>Solr </b>use <span style="font-family: 'Courier New', Courier, monospace;">MMapDirectory</span> by default on 64bit Windows and Solaris systems; since version 3.3 also for 64bit Linux systems. This change led to some confusion among Lucene and Solr users, because suddenly their systems started to behave differently than in previous versions. On the Lucene and Solr mailing lists a lot of posts arrived from users asking why their Java installation was suddenly consuming three times their physical memory, or from system administrators complaining about heavy resource usage. Consultants also started to tell people that they should <b>not </b>use <span style="font-family: 'Courier New', Courier, monospace;">MMapDirectory </span>and should change their solrconfig.xml to work instead with the slow <span style="font-family: 'Courier New', Courier, monospace;">SimpleFSDirectory </span>or <span style="font-family: 'Courier New', Courier, monospace;">NIOFSDirectory </span>(which is much slower on Windows, caused by a JVM bug <a href="http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6265734">#6265734</a>). From the point of view of the Lucene committers, who carefully decided that using <span style="font-family: 'Courier New', Courier, monospace;">MMapDirectory </span>is the best choice for those platforms, this is rather annoying, because they know that Lucene/Solr can work with much better performance than before. Common misinformation about the background of this change causes suboptimal installations of this great search engine everywhere.<br />
<br />
In this blog post, I will try to explain the basic operating system facts regarding virtual memory handling in the kernel and how this can be used to greatly improve the performance of Lucene<i> (“VIRTUAL MEMORY for DUMMIES”)</i>. It will also clarify why the blog and mailing list posts written by various people are wrong and contradict the purpose of <span style="font-family: 'Courier New', Courier, monospace;">MMapDirectory</span>. In the second part I will show you some configuration details and settings you should take care of to prevent errors like “mmap failed” and suboptimal performance because of stupid Java heap allocation.<br />
<br />
<h3>
Virtual Memory<sup><a href="http://en.wikipedia.org/wiki/Virtual_memory">[1]</a></sup></h3>
</div>
<div>
Let’s start with your operating system’s kernel: The naive approach to I/O in software is the one that has been used since the 1970s. The pattern is simple: whenever you have to work with data on disk, you execute a <i>syscall </i>to your operating system kernel, passing a pointer to some buffer (e.g. a <span style="font-family: 'Courier New', Courier, monospace;">byte[]</span> array in Java) and transfer some bytes from/to disk. After that you parse the buffer contents and do your program logic. If you don’t want to do too many syscalls (because those may cost a lot of processing power), you generally use large buffers in your software, so synchronizing the data in the buffer with your disk needs to be done less often. This is one reason why some people suggest loading the whole Lucene index into Java heap memory (e.g., by using <span style="font-family: 'Courier New', Courier, monospace;">RAMDirectory</span>).<br />
<br />
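To illustrate this 1970s pattern, here is a minimal sketch in plain Java. It does not use any Lucene code; the class name <span style="font-family: 'Courier New', Courier, monospace;">BufferedReadSketch</span>, the helper methods and the tiny temporary file are made up for this illustration. Every call to <span style="font-family: 'Courier New', Courier, monospace;">FileChannel.read()</span> asks the kernel to copy bytes into a buffer on the Java heap, and a larger buffer means fewer such calls:<br />
<br />
```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class, not part of Lucene.
class BufferedReadSketch {

    /** Sums all bytes of a file by repeatedly copying chunks into a heap buffer. */
    static long sumBytes(Path file, int bufferSize) throws IOException {
        long sum = 0;
        byte[] heapBuffer = new byte[bufferSize];
        ByteBuffer wrapper = ByteBuffer.wrap(heapBuffer);
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            while (true) {
                wrapper.clear();
                // Each read() asks the kernel to copy data into our heap buffer.
                int n = channel.read(wrapper);
                if (n < 0) {
                    break; // end of file
                }
                for (int i = 0; i < n; i++) {
                    sum += heapBuffer[i] & 0xFF;
                }
            }
        }
        return sum;
    }

    /** Writes five sample bytes to a temporary file and sums them with a 2-byte buffer. */
    static long demo() {
        try {
            Path tmp = Files.createTempFile("buffered-read-sketch", ".bin");
            try {
                Files.write(tmp, new byte[] {1, 2, 3, 4, 5});
                return sumBytes(tmp, 2);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 15
    }
}
```
<br />
The deliberately tiny buffer of 2 bytes forces several round trips to the kernel for a 5 byte file; real code would use a buffer of several kilobytes for exactly the reason described above.<br />
<br />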
But all modern operating systems like Linux, Windows (NT+), MacOS X, or Solaris provide a much better alternative to this 1970s style of code by using their sophisticated file system caches and memory management features. A feature called <i>“virtual memory”</i> is a good alternative to handle very large and space intensive data structures like a Lucene index. Virtual memory is an integral part of a computer architecture; implementations require hardware support, typically in the form of a <i>memory management unit (MMU)</i> built into the CPU. The way it works is very simple: Every process gets its own virtual address space into which all libraries, heap and stack space are mapped. This address space in most cases also starts at offset zero, which simplifies loading the program code because no relocation of address pointers needs to be done. Every process sees a large unfragmented linear address space it can work on. It is called “virtual memory” because this address space has nothing to do with physical memory, it just looks like it to the process. Software can then access this large address space as if it were real memory without knowing that there are other processes also consuming memory and having their own virtual address space. The underlying operating system works together with the MMU (memory management unit) in the CPU to map those virtual addresses to real memory once they are accessed for the first time. This is done using so-called page tables, which are backed by <i>TLBs</i> located in the MMU hardware <i>(translation lookaside buffers, which cache the address translations of frequently accessed pages)</i>. By this, the operating system is able to distribute all running processes’ memory requirements to the real available memory, completely transparent to the running programs.<br />
<br /></div>
<div>
<br /></div>
<div>
<div style="text-align: center;">
<img src="https://lh4.googleusercontent.com/NltTm3thAeSa7BCj26dHUUL5or63nCqowKUGXd8QecT0NEOymEnL5ypQyFQwM-juSgwlHg3f75Im0ncbwS74NWJl8qL5DRoGENy-aZH-KnSf3WFCFZs" /></div>
<div style="text-align: center;">
<span style="font-size: x-small;">Schematic drawing of virtual memory</span></div>
<div style="text-align: center;">
<span style="font-size: x-small;"><i>(image from Wikipedia <a href="http://en.wikipedia.org/wiki/Virtual_memory">[1]</a>, <a href="http://en.wikipedia.org/wiki/File:Virtual_memory.svg">http://en.wikipedia.org/wiki/File:Virtual_memory.svg</a>, licensed by CC BY-SA 3.0)</i></span></div>
</div>
<div>
<br />
By using this virtualization, there is one more thing the operating system can do: If there is not enough physical memory, it can decide to “swap out” pages no longer used by the processes, freeing physical memory for other processes or caching more important file system operations. Once a process tries to access a virtual address that was paged out, the page is reloaded into main memory and made available to the process. The process does not have to do anything, it is completely transparent. This is a good thing for applications because they don’t need to know anything about the amount of memory available; but it also leads to problems for very memory intensive applications like Lucene.<br />
<br />
<h3>
Lucene & Virtual Memory</h3>
</div>
<div>
Let’s take the example of loading the whole index or large parts of it into “memory” <i>(we already know, it is only virtual memory)</i>. If we allocate a <span style="font-family: 'Courier New', Courier, monospace;">RAMDirectory </span>and load all index files into it, we are working against the operating system: The operating system tries to optimize disk accesses, so it already caches all disk I/O in physical memory. We copy all these cache contents into our own virtual address space, consuming horrible amounts of physical memory (and we must wait for the copy operation to take place!). <b>As physical memory is limited, the operating system may, of course, decide to swap out our large <span style="font-family: 'Courier New', Courier, monospace;">RAMDirectory </span>and where does it land? – On disk again (in the OS swap file)!</b> In fact, we are fighting against our O/S kernel, which pages out all the stuff we loaded from disk <a href="https://www.varnish-cache.org/trac/wiki/ArchitectNotes">[2]</a>. So <span style="font-family: 'Courier New', Courier, monospace;">RAMDirectory </span>is not a good idea to optimize index loading times! Additionally, <span style="font-family: 'Courier New', Courier, monospace;">RAMDirectory </span>has further problems related to garbage collection and concurrency. Because the data resides in swap space, Java’s garbage collector has a hard job freeing the memory in its own heap management. This leads to high disk I/O, slow index access times, and minute-long latency in your searching code caused by the garbage collector going crazy.<br />
<br />
On the other hand, if we don’t use <span style="font-family: 'Courier New', Courier, monospace;">RAMDirectory </span>to buffer our index and use <span style="font-family: 'Courier New', Courier, monospace;">NIOFSDirectory </span>or <span style="font-family: 'Courier New', Courier, monospace;">SimpleFSDirectory</span>, we have to pay another price: Our code has to do a lot of syscalls to the O/S kernel to copy blocks of data between the disk or filesystem cache and our buffers residing in Java heap. This needs to be done on every search request, over and over again.<br />
<br />
<h3>
Memory Mapping Files</h3>
</div>
<div>
The solution to the above issues is <span style="font-family: 'Courier New', Courier, monospace;">MMapDirectory</span>, which uses virtual memory and a kernel feature called “mmap” <a href="http://en.wikipedia.org/wiki/Memory-mapped_file">[3]</a> to access the disk files.<br />
<br />
<div style="text-align: center;">
<span style="color: #444444;">In our previous approaches, we were relying on using a syscall to <b>copy </b>the data between the file system cache and our local Java heap. How about <b>directly accessing</b> the file system cache? This is what mmap does!</span></div>
<br />
Basically, mmap does the same as handling the Lucene index as a swap file. The <span style="font-family: 'Courier New', Courier, monospace;">mmap()</span> syscall tells the O/S kernel to virtually map our whole index files into the previously described virtual address space, and make them look like RAM available to our Lucene process. We can then access our index file on disk just as if it were a large <span style="font-family: 'Courier New', Courier, monospace;">byte[]</span> array (in Java this is encapsulated by a <span style="font-family: 'Courier New', Courier, monospace;">ByteBuffer </span>interface to make it safe for use by Java code). If we access this virtual address space from the Lucene code we don’t need to do any syscalls, the processor’s MMU and TLB handle all the mapping for us. If the data is only on disk, the MMU will cause an interrupt and the O/S kernel will load the data into file system cache. If it is already in cache, MMU/TLB map it directly to the physical memory in file system cache. It is now just a native memory access, nothing more! We don’t have to take care of paging in/out of buffers, all this is managed by the O/S kernel. Furthermore, we have no concurrency issue. The only overhead over a standard <span style="font-family: 'Courier New', Courier, monospace;">byte[] </span>array is some wrapping caused by Java’s <span style="font-family: 'Courier New', Courier, monospace;">ByteBuffer </span>interface (it is still slower than a real <span style="font-family: 'Courier New', Courier, monospace;">byte[]</span> array, but that is the only way to use mmap from Java and is much faster than all other directory implementations shipped with Lucene). We also waste no physical memory, as we operate directly on the O/S cache, avoiding all Java GC issues described before.<br />
<br />
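As a rough illustration of the idea (this is not Lucene's actual implementation), the following sketch maps a small temporary file with the JDK's <span style="font-family: 'Courier New', Courier, monospace;">FileChannel.map()</span> and reads it through the returned <span style="font-family: 'Courier New', Courier, monospace;">MappedByteBuffer</span>. The class name <span style="font-family: 'Courier New', Courier, monospace;">MmapReadSketch</span>, its helper methods and the sample data are assumptions made for this example:<br />
<br />
```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class, not part of Lucene.
class MmapReadSketch {

    /** Sums all bytes of a file by reading them through a memory mapping. */
    static long sumBytes(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // One mapping call; after that, get() is a plain memory access
            // and does not copy the data into a separate heap buffer.
            MappedByteBuffer mapped =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            long sum = 0;
            while (mapped.hasRemaining()) {
                sum += mapped.get() & 0xFF;
            }
            return sum;
        }
    }

    /** Writes five sample bytes to a temporary file and sums them via mmap. */
    static long demo() {
        try {
            Path tmp = Files.createTempFile("mmap-read-sketch", ".bin");
            // A mapped file cannot always be deleted right away (e.g. on Windows),
            // so cleanup is deferred to JVM exit.
            tmp.toFile().deleteOnExit();
            Files.write(tmp, new byte[] {1, 2, 3, 4, 5});
            return sumBytes(tmp);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 15
    }
}
```
<br />
Note that a single <span style="font-family: 'Courier New', Courier, monospace;">MappedByteBuffer</span> can cover at most <span style="font-family: 'Courier New', Courier, monospace;">Integer.MAX_VALUE</span> bytes, so a file larger than about 2 gigabytes has to be mapped in several pieces; the sketch ignores this.<br />
<br />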
<i>What does this all mean to our Lucene/Solr application?</i><br />
<ul>
<li><b>We should not work against the operating system anymore, so allocate as little heap space as possible (<span style="font-family: 'Courier New', Courier, monospace;">-Xmx</span> Java option).</b> Remember, our index accesses are passed directly to the O/S cache! This is also very friendly to the Java garbage collector.</li>
<li><b>Free as much physical memory as possible to be available for the O/S kernel as file system cache. </b>Remember, our Lucene code works directly on it, which reduces the amount of <i>paging/swapping</i> between disk and memory. Allocating too much heap to our Lucene application hurts performance! Lucene does not require it with <span style="font-family: 'Courier New', Courier, monospace;">MMapDirectory</span>.</li>
</ul>
<br />
<h3>
Why does this only work as expected on operating systems and Java virtual machines with 64bit?</h3>
</div>
<div>
One limitation of 32bit platforms is the size of pointers: they can refer to any address between 0 and 2<sup>32</sup>-1, which is 4 Gigabytes. Most operating systems limit that address space to 3 Gigabytes because the remaining address space is reserved for use by device hardware and similar things. This means the overall linear address space provided to any process is limited to 3 Gigabytes, so you cannot map any file larger than that into this “small” address space to be available as big <span style="font-family: 'Courier New', Courier, monospace;">byte[]</span> array. And once you have mapped that one large file, there is no virtual space (address like “house number”) available anymore. As physical memory sizes in current systems already have gone beyond that size, there is no address space available to make use of for mapping files without wasting resources<i> (in our case “address space”, not physical memory!)</i>.<br />
<br />
On 64bit platforms this is different: 2<sup>64</sup>-1 is a very large number, a number in excess of 18 quintillion bytes, so there is no real limit in address space. Unfortunately, most hardware (the MMU, CPU’s bus system) and operating systems are limiting this address space to 47 bits for user mode applications (Windows: 43 bits) <a href="http://en.wikipedia.org/wiki/X86-64#Virtual_address_space_details">[4]</a>. But there is still plenty of address space available to map terabytes of data.<br />
<br />
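To put numbers on these address space sizes, here is a small worked calculation in Java. The class name <span style="font-family: 'Courier New', Courier, monospace;">AddressSpaceSketch</span> and its helper methods are made up for this example; the bit widths are the ones mentioned above:<br />
<br />
```java
import java.math.BigInteger;

// Hypothetical demo class for the arithmetic in the text.
class AddressSpaceSketch {

    /** Number of distinct byte addresses that fit into the given number of bits. */
    static BigInteger addressableBytes(int bits) {
        return BigInteger.ONE.shiftLeft(bits); // 2^bits
    }

    /** Converts a byte count to whole gibibytes (1 GiB = 2^30 bytes). */
    static BigInteger inGiB(BigInteger bytes) {
        return bytes.shiftRight(30);
    }

    public static void main(String[] args) {
        System.out.println(inGiB(addressableBytes(32))); // 4      (4 GiB)
        System.out.println(inGiB(addressableBytes(43))); // 8192   (8 TiB)
        System.out.println(inGiB(addressableBytes(47))); // 131072 (128 TiB)
        System.out.println(addressableBytes(64));        // 18446744073709551616
    }
}
```
<br />
So even with only 47 usable bits, a process can address 128 terabytes, which is far more than any index you are likely to map.<br />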
<h3>
Common misunderstandings</h3>
</div>
<div>
If you have read carefully what I have told you about virtual memory, you can easily verify that the following is true:<br />
<ul>
<li><b><span style="font-family: 'Courier New', Courier, monospace;">MMapDirectory </span>does not consume additional memory and the size of mapped index files is not limited by the physical memory available on your server.</b> By mmap()ing files, we only reserve address space, not memory! Remember, address space on 64bit platforms is free!</li>
<li><b><span style="font-family: 'Courier New', Courier, monospace;">MMapDirectory </span>will not load the whole index into physical memory.</b> Why should it do this? We just ask the operating system to map the file into address space for easy access, by no means are we requesting more. Java and the O/S optionally provide the option to try loading the whole file into RAM (if enough is available), but Lucene does not use that option (we may add this possibility in a later version).</li>
<li><b><span style="font-family: 'Courier New', Courier, monospace;">MMapDirectory </span>does not overload the server when “top” reports horrible amounts of memory.</b> “top” (on Linux) has three columns related to memory: “VIRT”, “RES”, and “SHR”. The first one (VIRT, virtual) is reporting allocated virtual address space (and that one is free on 64 bit platforms!). This number can be several times your index size or physical memory when merges are running in <span style="font-family: 'Courier New', Courier, monospace;">IndexWriter</span>. If you have only one <span style="font-family: 'Courier New', Courier, monospace;">IndexReader </span>open it should be approximately equal to allocated heap space (<span style="font-family: 'Courier New', Courier, monospace;">-Xmx</span>) plus index size. It does not show physical memory used by the process. The second column (RES, resident memory) shows how much (physical) memory the process allocated for operating and should be roughly the size of your Java heap space. The last column (SHR, shared) shows how much of the allocated virtual address space is shared with other processes. If you have several Java applications using <span style="font-family: 'Courier New', Courier, monospace;">MMapDirectory </span>to access the same index, you will see this number going up. Generally, you will see the space needed by shared system libraries, JAR files, and the process executable itself (which are also mmapped).</li>
</ul>
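The points above can be tried out with plain Java NIO. The following is only a minimal sketch of mapping a file, not Lucene's actual MMapDirectory code; the class name MmapSketch and the temporary file are made up for illustration. Mapping reserves address space, and the O/S pages data in only when a byte is actually touched:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapSketch {

  /** Maps the whole file read-only. This reserves address space; it does not read the file. */
  static MappedByteBuffer mapReadOnly(Path file) throws IOException {
    try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
      // The mapping stays valid after the channel has been closed.
      return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
    }
  }

  public static void main(String[] args) throws IOException {
    // Hypothetical stand-in for an index file.
    Path file = Files.createTempFile("mmap-sketch", ".bin");
    Files.write(file, "hello mmap".getBytes(StandardCharsets.UTF_8));

    MappedByteBuffer buffer = mapReadOnly(file);
    System.out.println("mapped bytes: " + buffer.capacity());
    // Touching a byte lets the O/S page in the surrounding page on demand.
    System.out.println("first byte: " + (char) buffer.get(0));
    // buffer.load() would be the optional "try to load everything into RAM" request.
  }
}
```

Note that a single MappedByteBuffer cannot be larger than 2 Gigabytes, so a real implementation has to map big files in several chunks.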
<br />
<h3>
How to configure my operating system and Java VM to make optimal use of MMapDirectory?</h3>
</div>
<div>
First of all, default settings in Linux distributions and Solaris/Windows are perfectly fine. But there are some paranoid system administrators around who want to control everything (with lack of understanding). Those limit the maximum amount of virtual address space that can be allocated by applications. So please check that “<span style="font-family: 'Courier New', Courier, monospace;">ulimit -v</span>” and “<span style="font-family: 'Courier New', Courier, monospace;">ulimit -m</span>” both report “<span style="font-family: 'Courier New', Courier, monospace;">unlimited</span>”, otherwise it may happen that <span style="font-family: 'Courier New', Courier, monospace;">MMapDirectory </span>reports <i>“mmap failed”</i> while opening your index. If this error still happens on systems with lots of very large indexes, each of those with many segments, you may need to tune your kernel parameters in <span style="font-family: 'Courier New', Courier, monospace;">/etc/sysctl.conf</span>: The default value of <span style="font-family: 'Courier New', Courier, monospace;">vm.max_map_count</span> is 65530, you may need to raise it. I think that similar settings are available for Windows and Solaris systems, but it is up to the reader to find out how to use them.<br />
<br />
For configuring your Java VM, you should rethink your memory requirements: Give only the really needed amount of heap space and leave as much as possible to the O/S. As a rule of thumb: Don’t use more than ¼ of your physical memory as heap space for Java running Lucene/Solr, keep the remaining memory free for the operating system cache. If you have more applications running on your server, adjust accordingly. As usual the more physical memory the better, but you don’t need as much physical memory as your index size. The kernel does a good job in paging in frequently used pages from your index.<br />
<br />
A good way to check that you have configured your system optimally is to look at both "top"<i> (and correctly interpreting it, see above)</i> and the similar command "<a href="http://guichaz.free.fr/iotop/">iotop</a>" (can be installed, e.g., on Ubuntu Linux by "<span style="font-family: 'Courier New', Courier, monospace;">apt-get install iotop</span>"). If your system does lots of swap in/swap out for the Lucene process, reduce the heap size, you possibly used too much. If you see lots of disk I/O, buy more RUM (<a href="http://mail-archives.apache.org/mod_mbox/lucene-java-user/201207.mbox/%3CCAAHmpkhJ7KU3X0wm2VHwDkO0UZd%3D6%2Behh0qWzpzw-WdFvB%2BQ_A%40mail.gmail.com%3E">Simon Willnauer</a>) so mmapped files don't need to be paged in/out all the time, and finally: <a href="http://www.youtube.com/watch?v=H7PJ1oeEyGg">buy SSDs</a>.<br />
<br />
<i>Happy mmapping!</i><br />
<br />
<h3>
Bibliography</h3>
</div>
<div>
[1] <a href="http://en.wikipedia.org/wiki/Virtual_memory">http://en.wikipedia.org/wiki/Virtual_memory</a><br />
[2] <a href="https://www.varnish-cache.org/trac/wiki/ArchitectNotes">https://www.varnish-cache.org/trac/wiki/ArchitectNotes</a><br />
[3] <a href="http://en.wikipedia.org/wiki/Memory-mapped_file">http://en.wikipedia.org/wiki/Memory-mapped_file</a><br />
[4] <a href="http://en.wikipedia.org/wiki/X86-64#Virtual_address_space_details" style="background-color: white;">http://en.wikipedia.org/wiki/X86-64#Virtual_address_space_details</a></div>
Uwe Schindler, published 2012-07-11, updated 2015-03-14
The Policeman’s Horror: Default Locales, Default Charsets, and Default Timezones<h3>
Time for a tool to prevent any effects coming from them!</h3>
<div>
Did you ever try to run software downloaded from the net on a computer with Turkish locale? I think most of you never did that. And if you ask Turkish IT specialists, they will tell you: “It is better to configure your computer using any other locale, but not <span style="font-family: 'Courier New', Courier, monospace;">tr_TR</span>”. I think you have no clue what I am talking about? Maybe this article gives you a hint: “<a href="http://gizmodo.com/382026/a-cellphones-missing-dot-kills-two-people-puts-three-more-in-jail">A Cellphone’s Missing Dot Kills Two People, Puts Three More in Jail</a>”.<br />
<br />
What you see in lots of software is so-called case-insensitive matching of keywords like parameter names or function names. In most cases this is implemented by lowercasing or uppercasing the input text and comparing it with a list of already lowercased/uppercased items. This works fine almost anywhere in the world, except in Turkey! Because most programmers don’t care about running their software in Turkey, they do not test their software under the Turkish locale.</div>
<div>
<br /></div>
<h3>
But what happens with the case-insensitive matching if running in Turkey? Let’s take an example:</h3>
<div>
The user enters “BILLY” in the search field of your application. The application then uses the approach presented before, lowercases “BILLY” and compares it to an internal table (e.g. our search index, parameter table, function table,...). So we search in this table for “billy”. So far so good, this works perfectly in the USA, Germany, Kenya, almost everywhere - except Turkey. What happens in the Turkish locale when we lowercase “BILLY”? After reading the above article, you might expect it: The <span style="font-family: 'Courier New', Courier, monospace;">"BILLY".toLowerCase()</span> statement in Java returns “bılly” (note the dot-less i: 'ı' <i>U+0131)</i>. You can try this out on your local machine without reconfiguring it to use the Turkish locale, just try the following Java code:</div>
<pre style="background: #f0f0f0; border: 1px dashed #CCCCCC; color: black; font-family: arial; font-size: 8pt; height: auto; overflow: auto; padding: 5px; text-align: left; width: 99%;"><code style="color: black; word-wrap: normal;">assertEquals("bılly", "BILLY".toLowerCase(new Locale("tr","TR")));</code></pre>
<div>
The same happens vice versa: if you uppercase an ‘i’, it becomes an I with dot (‘İ’ <i>U+0130)</i>. This is really serious: millions of lines of code out there in Java and other languages ignore the fact that the <span style="font-family: 'Courier New', Courier, monospace;">String.toLowerCase()</span> and <span style="font-family: 'Courier New', Courier, monospace;">String.toUpperCase()</span> methods can optionally take a defined Locale (more about that later). Some examples from projects I am involved in:<br />
<br />
<ul>
<li>Try to run an XSLT stylesheet using <a href="http://xml.apache.org/xalan-j/">Apache XALAN-XSLTC</a> (or Java 5’s internal XSLT interpreter) in the Turkish locale. It will fail with “unknown instruction”, because XALAN-XSLTC compiles the XSLT to Java Bytecode and somehow lowercases a virtual machine opcode before compiling it with BCEL (see <a href="https://issues.apache.org/jira/browse/XALANJ-2420">XALANJ-2420</a>, <a href="https://issues.apache.org/bugzilla/show_bug.cgi?id=38787">BCEL bug #38787</a>).</li>
<li>The HTML SAX parser <a href="http://nekohtml.sourceforge.net/">NekoHTML</a> uses locale-less uppercasing/lowercasing to normalize charset names and element names. I opened a bug report (issue <a href="https://sourceforge.net/tracker/?func=detail&aid=3544334&group_id=195122&atid=952178">#3544334</a>).</li>
<li>If you use <a href="http://www.php.net/">PHP</a> as your favourite scripting language, which is not case sensitive for class names and other language constructs, it will throw a compile error once you try to call a function with an “i” in it (see <a href="https://bugs.php.net/bug.php?id=18556">PHP bug #18556</a>). Unfortunately it is unlikely that this serious bug is fixed in PHP 5.3 or 5.4!</li>
</ul>
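The failure mode behind all of these bugs can be reproduced in a few lines. This is only a sketch; the class TurkishCasing and the method naiveKeywordMatch are made up for illustration and stand in for any keyword lookup that lowercases its input:

```java
import java.util.Locale;

public class TurkishCasing {

  static final Locale TURKISH = Locale.forLanguageTag("tr-TR");

  /** The "lowercase, then compare" approach, with the locale made explicit. */
  static boolean naiveKeywordMatch(String input, Locale locale) {
    return "billy".equals(input.toLowerCase(locale));
  }

  public static void main(String[] args) {
    // Lowercasing 'I' yields the dotless i (U+0131) in the Turkish locale.
    System.out.println("BILLY".toLowerCase(TURKISH));
    // Uppercasing 'i' yields the dotted capital I (U+0130).
    System.out.println("billy".toUpperCase(TURKISH));

    System.out.println(naiveKeywordMatch("BILLY", Locale.ENGLISH)); // true
    System.out.println(naiveKeywordMatch("BILLY", TURKISH));        // false
  }
}
```

Code that calls toLowerCase() without an argument behaves like the second case whenever the JVM's default locale happens to be Turkish.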
</div>
<div>
<br />
<h3>
The question is now: How to solve this?</h3>
The most correct way to do this is to not lowercase at all! For case-insensitive comparison, Unicode defines “case folding”, a canonical form of text in which all upper/lower case distinctions are normalized away. Unfortunately this case-folded text may no longer be readable text (this depends on the implementation, but in most cases it is). It just ensures that case-folded strings can be compared to each other in a case-insensitive way. Unfortunately Java does not offer you a function to get this string, but <b>ICU4J</b> does (see<span style="font-family: 'Courier New', Courier, monospace;"> <a href="http://icu-project.org/apiref/icu4j/com/ibm/icu/lang/UCharacter.html#foldCase(java.lang.String,%20boolean)">UCharacter#foldCase</a></span>). But Java offers something much better: <span style="font-family: 'Courier New', Courier, monospace;">String.equalsIgnoreCase(String)</span>, which internally does a locale-independent, per-character form of case folding! But in lots of cases you cannot use this fantastic method, because you want to look up such strings in a <span style="font-family: 'Courier New', Courier, monospace;">HashMap </span>or other dictionary. Without modifying <span style="font-family: 'Courier New', Courier, monospace;">HashMap </span>to use <span style="font-family: 'Courier New', Courier, monospace;">equalsIgnoreCase</span>, this would never work. So we are back at lowercasing! As mentioned before, you can pass a locale to <span style="font-family: 'Courier New', Courier, monospace;">String.toLowerCase()</span>, so the naive approach would be to tell Java that we are in the US or using the English language: <span style="font-family: 'Courier New', Courier, monospace;">String.toLowerCase(Locale.US)</span> or <span style="font-family: 'Courier New', Courier, monospace;">String.toLowerCase(Locale.ENGLISH)</span>. This produces identical results but is still not consistent. What happens if the US government decides to lowercase/uppercase like in Turkey? 
-- OK, don’t use <span style="font-family: 'Courier New', Courier, monospace;">Locale.US</span> (this is also too US-centric). <span style="font-family: 'Courier New', Courier, monospace;">Locale.ENGLISH</span> is fine and very generic, but languages also change over the years (who knows?), but we want to have it language invariant! If you are using Java 6, there is a much better constant: <span style="font-family: 'Courier New', Courier, monospace;">Locale.ROOT</span> -- You should use this constant for our lowercase example: <span style="font-family: 'Courier New', Courier, monospace;">String.toLowerCase(Locale.ROOT)</span><span style="font-family: inherit;">.</span><br />
<blockquote class="tr_bq">
<i>You should start now and do a global search/replace on all your Java projects (if you do not rely on language specific presentation of text)! <b>REALLY! </b></i></blockquote>
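As a sketch of what such a replacement looks like for the HashMap case mentioned above: the keys are normalized with Locale.ROOT on both insert and lookup. The class KeywordTable is hypothetical and only there to illustrate the idea:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class KeywordTable {

  private final Map<String, Integer> table = new HashMap<String, Integer>();

  /** Locale-invariant key normalization; never depends on the default locale. */
  static String normalize(String key) {
    return key.toLowerCase(Locale.ROOT);
  }

  void put(String key, int value) {
    table.put(normalize(key), value);
  }

  Integer get(String key) {
    return table.get(normalize(key));
  }

  public static void main(String[] args) {
    // Simulate a machine configured for Turkey.
    Locale.setDefault(Locale.forLanguageTag("tr-TR"));

    KeywordTable keywords = new KeywordTable();
    keywords.put("billy", 1);
    System.out.println(keywords.get("BILLY")); // prints 1, even under tr_TR
  }
}
```

The important design choice is that the same normalization function is used for writing and reading the table; mixing toLowerCase() and toLowerCase(Locale.ROOT) would bring the bug back.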
<span style="font-family: 'Courier New', Courier, monospace;">String.toLowerCase</span> is not the only example of “automatic default locale usage” in the Java API. There are also things like transforming dates or numbers to strings. If you use the <span style="font-family: 'Courier New', Courier, monospace;">Formatter </span>class, and you run it somewhere in another country, <span style="font-family: 'Courier New', Courier, monospace;">String.format("%f", 15.5f)</span> may not always use a period (‘.’) as decimal separator; most Germans will know this. Passing a specific locale here helps in most cases. If you are writing a GUI in the English language, pass <span style="font-family: 'Courier New', Courier, monospace;">Locale.ENGLISH</span> everywhere, otherwise text output of numbers or dates may not match the language of your GUI! If you want <span style="font-family: 'Courier New', Courier, monospace;">Formatter </span>to behave in an invariant way, use <span style="font-family: 'Courier New', Courier, monospace;">Locale.ROOT</span>, too (then it will for sure format numbers with period and no comma for thousands, just like <span style="font-family: 'Courier New', Courier, monospace;">Float.toString(float)</span> does).<br />
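The decimal separator problem is easy to see when the locale is passed explicitly. A small sketch (the class name FormatSketch is made up):

```java
import java.util.Locale;

public class FormatSketch {

  /** Formats a float with "%f" using an explicit locale instead of the default one. */
  static String format(Locale locale, float value) {
    return String.format(locale, "%f", value);
  }

  public static void main(String[] args) {
    System.out.println(format(Locale.GERMANY, 15.5f)); // 15,500000
    System.out.println(format(Locale.ROOT, 15.5f));    // 15.500000
  }
}
```

Calling String.format("%f", 15.5f) without a locale gives the first result on a German machine and the second one on a US machine.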
<br />
A second problem affecting lots of software comes from two other system-wide configurable default settings: default charset/encoding and timezone. If you open a text file with <span style="font-family: 'Courier New', Courier, monospace;">FileReader </span>or convert an <span style="font-family: 'Courier New', Courier, monospace;">InputStream</span> to a <span style="font-family: 'Courier New', Courier, monospace;">Reader </span>with <span style="font-family: 'Courier New', Courier, monospace;">InputStreamReader</span>, Java automatically assumes that the input is in the default platform encoding. This may be fine if you want the text to be parsed by the defaults of the operating system -- but if you ship a text file together with your software package (maybe as a resource in your JAR file) and then accidentally read it using the platform’s default charset... it’ll break your app! So my second recommendation:</div>
<div>
<blockquote class="tr_bq">
<i>Always pass a character set to any method converting bytes to strings (like <span style="font-family: 'Courier New', Courier, monospace;">InputStream </span><=> <span style="font-family: 'Courier New', Courier, monospace;">Reader</span>, <span style="font-family: 'Courier New', Courier, monospace;">String.getBytes()</span>,...). If you wrote the text file and ship it together with your app, only you know its encoding!</i></blockquote>
For timezones, similar examples can be found.</div>
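For the charset recommendation, a minimal sketch could look like the following. It assumes the shipped text is UTF-8 encoded; the class ExplicitCharset and the sample text are made up for illustration, and StandardCharsets requires Java 7:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class ExplicitCharset {

  /** Reads the first line of a stream that is known to contain UTF-8 text. */
  static String readFirstLine(InputStream in) throws IOException {
    // The charset is passed explicitly instead of relying on the platform default.
    try (BufferedReader reader =
        new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
      return reader.readLine();
    }
  }

  public static void main(String[] args) throws IOException {
    // Stand-in for a resource shipped inside a JAR file ("Gruesse" with umlaut and sharp s).
    byte[] shipped = "Gr\u00fc\u00dfe\nsecond line".getBytes(StandardCharsets.UTF_8);
    System.out.println(readFirstLine(new ByteArrayInputStream(shipped)));
  }
}
```

With new InputStreamReader(in) instead, the same bytes would decode differently on a machine whose default charset is, for example, windows-1252.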
<br />
<h3>
How this affects Apache Lucene!</h3>
<div>
<a href="http://lucene.apache.org/core/">Apache Lucene</a> is a full-text search engine and deals with text from different languages all the time; <a href="http://lucene.apache.org/solr/">Apache Solr</a> is an enterprise search server on top of Lucene and deals with input documents in lots of different charsets and languages. It is therefore essential for a search library like Lucene to be as independent of local machine settings as possible. A library must make it explicit what input it wants to have. So we require charsets and locales in all public and private APIs (or we only take e.g. <span style="font-family: 'Courier New', Courier, monospace;">java.io.Reader</span> instead of <span style="font-family: 'Courier New', Courier, monospace;">InputStream </span>if we expect text coming in), so the user must take care.<br />
<br />
<a href="http://www.lucidimagination.com/blog/author/robert-muir/">Robert Muir</a> and I reviewed the source code of Apache Lucene and Solr for the <a href="http://www.heise.de/developer/meldung/Lucene-4-mit-Indizierungs-Plug-ins-1631859.html">coming version 4.0</a> (an alpha version is already <a href="http://www.apache.org/dyn/closer.cgi/lucene/java/4.0.0-ALPHA">available</a> on Lucene’s homepage, documentation is <a href="http://lucene.apache.org/core/4_0_0-ALPHA/index.html">here</a>). We did this quite often, but whenever a new piece of code is committed to the source tree, it may happen that undefined locales, charsets, or similar things appear again. In most cases it is not the fault of the committer; it happens because the auto-complete in IDEs automatically lists possible methods and parameters to the developer. Often you select the easiest variant (like <span style="font-family: 'Courier New', Courier, monospace;">String.toLowerCase()</span>).<br />
<br />
<blockquote class="tr_bq">
<b>Using default locales, charsets and timezones is in my opinion a big design issue in programming languages like Java. If there are locale-sensitive methods, those methods should take a locale; if you convert a <span style="font-family: 'Courier New', Courier, monospace;">byte[]</span> stream to a <span style="font-family: 'Courier New', Courier, monospace;">char[]</span> stream, a charset must be given. Automatically falling back to defaults is a no-go in the server environment. </b></blockquote>
<blockquote class="tr_bq">
<i>If a developer is interested in using the default locale of the user’s computer, he can always explicitly give the locale or charset. In our example this would be </i><span style="font-family: 'Courier New', Courier, monospace;">String.toLowerCase(Locale.getDefault())</span><i>. This is more verbose, but it is obvious what the developer intends to do.</i></blockquote>
<br />
<h3 style="text-align: center;">
<span style="color: red;">My proposal is to ban all those default charset and locale methods / classes in the Java API by deprecating them as soon as possible, so users stop using them implicitly!</span></h3>
<br />
Robert’s and my intention is to automatically fail the nightly builds (or compilation on the developer’s machine) when somebody uses one of the above methods in Lucene’s or Solr’s source code. We looked at different solutions like <b>PMD </b>or <b>FindBugs</b>, but both tools are too sloppy to handle that in a consistent way <i>(PMD does not have any “default charset” method detection and FindBugs has only a very short list of method signatures)</i>. In addition, both PMD and FindBugs are very slow and often fail to correctly detect all problems. For Lucene builds we only need a tool that looks into the byte code of all generated Java classes of Apache Lucene and Solr, and fails the build if any signature that violates our requirements is found.</div>
<br />
<h3>
A new Tool for the Policeman</h3>
<div>
I started to hack a tool as a custom ANT task using <a href="http://asm.ow2.org/">ASM 4.0</a> <i>(Lightweight Java Bytecode Manipulation Framework)</i>. The idea was to provide a list of method signatures, field names and plain class names that should fail the build once bytecode accesses them in any way. A first version of this task was published in issue <a href="https://issues.apache.org/jira/browse/LUCENE-4199">LUCENE-4199</a>; later improvements added support for fields (<a href="https://issues.apache.org/jira/browse/LUCENE-4202">LUCENE-4202</a>) and a sophisticated signature expansion to also catch calls to subclasses of the given signatures (<a href="https://issues.apache.org/jira/browse/LUCENE-4206">LUCENE-4206</a>).<br />
<br />
In the meantime, Robert worked on the list of “forbidden” APIs. This is what came out in the first version:</div>
<pre style="background: #f0f0f0; border: 1px dashed #CCCCCC; color: black; font-family: arial; font-size: 8pt; height: auto; overflow: auto; padding: 5px; text-align: left; width: 99%;"><code style="color: black; word-wrap: normal;">java.lang.String#<init>(byte[])
java.lang.String#<init>(byte[],int)
java.lang.String#<init>(byte[],int,int)
java.lang.String#<init>(byte[],int,int,int)
java.lang.String#getBytes()
java.lang.String#getBytes(int,int,byte[],int)
java.lang.String#toLowerCase()
java.lang.String#toUpperCase()
java.lang.String#format(java.lang.String,java.lang.Object[])
java.io.FileReader
java.io.FileWriter
java.io.ByteArrayOutputStream#toString()
java.io.InputStreamReader#<init>(java.io.InputStream)
java.io.OutputStreamWriter#<init>(java.io.OutputStream)
java.io.PrintStream#<init>(java.io.File)
java.io.PrintStream#<init>(java.io.OutputStream)
java.io.PrintStream#<init>(java.io.OutputStream,boolean)
java.io.PrintStream#<init>(java.lang.String)
java.io.PrintWriter#<init>(java.io.File)
java.io.PrintWriter#<init>(java.io.OutputStream)
java.io.PrintWriter#<init>(java.io.OutputStream,boolean)
java.io.PrintWriter#<init>(java.lang.String)
java.io.PrintWriter#format(java.lang.String,java.lang.Object[])
java.io.PrintWriter#printf(java.lang.String,java.lang.Object[])
java.nio.charset.Charset#displayName()
java.text.BreakIterator#getCharacterInstance()
java.text.BreakIterator#getLineInstance()
java.text.BreakIterator#getSentenceInstance()
java.text.BreakIterator#getWordInstance()
java.text.Collator#getInstance()
java.text.DateFormat#getTimeInstance()
java.text.DateFormat#getTimeInstance(int)
java.text.DateFormat#getDateInstance()
java.text.DateFormat#getDateInstance(int)
java.text.DateFormat#getDateTimeInstance()
java.text.DateFormat#getDateTimeInstance(int,int)
java.text.DateFormat#getInstance()
java.text.DateFormatSymbols#<init>()
java.text.DateFormatSymbols#getInstance()
java.text.DecimalFormat#<init>()
java.text.DecimalFormat#<init>(java.lang.String)
java.text.DecimalFormatSymbols#<init>()
java.text.DecimalFormatSymbols#getInstance()
java.text.MessageFormat#<init>(java.lang.String)
java.text.NumberFormat#getInstance()
java.text.NumberFormat#getNumberInstance()
java.text.NumberFormat#getIntegerInstance()
java.text.NumberFormat#getCurrencyInstance()
java.text.NumberFormat#getPercentInstance()
java.text.SimpleDateFormat#<init>()
java.text.SimpleDateFormat#<init>(java.lang.String)
java.util.Calendar#<init>()
java.util.Calendar#getInstance()
java.util.Calendar#getInstance(java.util.Locale)
java.util.Calendar#getInstance(java.util.TimeZone)
java.util.Currency#getSymbol()
java.util.GregorianCalendar#<init>()
java.util.GregorianCalendar#<init>(int,int,int)
java.util.GregorianCalendar#<init>(int,int,int,int,int)
java.util.GregorianCalendar#<init>(int,int,int,int,int,int)
java.util.GregorianCalendar#<init>(java.util.Locale)
java.util.GregorianCalendar#<init>(java.util.TimeZone)
java.util.Scanner#<init>(java.io.InputStream)
java.util.Scanner#<init>(java.io.File)
java.util.Scanner#<init>(java.nio.channels.ReadableByteChannel)
java.util.Formatter#<init>()
java.util.Formatter#<init>(java.lang.Appendable)
java.util.Formatter#<init>(java.io.File)
java.util.Formatter#<init>(java.io.File,java.lang.String)
java.util.Formatter#<init>(java.io.OutputStream)
java.util.Formatter#<init>(java.io.OutputStream,java.lang.String)
java.util.Formatter#<init>(java.io.PrintStream)
java.util.Formatter#<init>(java.lang.String)
java.util.Formatter#<init>(java.lang.String,java.lang.String)
</code></pre>
<div>
Using this easily extensible list, saved in a text file <i>(UTF-8 encoded!)</i>, you can invoke my new ANT task (after registering it with <span style="font-family: 'Courier New', Courier, monospace;"><taskdef/></span>) very easily -- taken from Lucene/Solr’s <span style="font-family: 'Courier New', Courier, monospace;">build.xml</span>:</div>
<pre style="background: #f0f0f0; border: 1px dashed #CCCCCC; color: black; font-family: arial; font-size: 8pt; height: auto; overflow: auto; padding: 5px; text-align: left; width: 99%;"><code style="color: black; word-wrap: normal;"><taskdef resource="lucene-solr.antlib.xml">
  <classpath>
    <pathelement location="${custom-tasks.dir}/build/classes/java" />
    <fileset dir="${custom-tasks.dir}/lib" includes="asm-debug-all-4.0.jar" />
  </classpath>
</taskdef>
<forbidden-apis>
  <classpath refid="additional.dependencies"/>
  <apiFileSet dir="${custom-tasks.dir}/forbiddenApis">
    <include name="jdk.txt" />
    <include name="jdk-deprecated.txt" />
    <include name="commons-io.txt" />
  </apiFileSet>
  <fileset dir="${basedir}/build" includes="**/*.class" />
</forbidden-apis>
</code></pre>
<div>
The classpath given is used to look up the API signatures (provided as <span style="font-family: 'Courier New', Courier, monospace;">apiFileSet</span>). Classpath is only needed if signatures are coming from 3rd party libraries. The inner fileset should list all class files to be checked. For running the task you also need <span style="font-family: 'Courier New', Courier, monospace;">asm-all-4.0.jar</span> available in the task’s classpath.<br />
<br />
If you are interested, take the source code, it is open source and released as part of the tool set shipped with Apache Lucene & Solr: <a href="http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/tools/src/java/org/apache/lucene/validation/ForbiddenApisCheckTask.java?revision=1360240&view=markup">Source</a>, <a href="http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/tools/forbiddenApis/">API lists</a> (revision number 1360240).<br />
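To make the format of the signature list more tangible, here is a rough sketch of how one line of such a file could be split into class name, member name and argument types. This is not the parser of the actual ANT task (which resolves the signatures against the classpath using ASM); the class SignatureSketch and its fields are made up, and the field example System#out is an assumption about the field syntax:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SignatureSketch {

  final String className;
  final String member;     // null for a plain class name
  final List<String> args; // null unless the member is a method or constructor

  private SignatureSketch(String className, String member, List<String> args) {
    this.className = className;
    this.member = member;
    this.args = args;
  }

  /** Splits "class", "class#field" or "class#method(arg,arg)" into its parts. */
  static SignatureSketch parse(String line) {
    String s = line.trim();
    int hash = s.indexOf('#');
    if (hash < 0) {
      return new SignatureSketch(s, null, null); // whole class is forbidden
    }
    String className = s.substring(0, hash);
    String rest = s.substring(hash + 1);
    int open = rest.indexOf('(');
    if (open < 0) {
      return new SignatureSketch(className, rest, null); // field
    }
    if (!rest.endsWith(")")) {
      throw new IllegalArgumentException("Unbalanced parentheses: " + line);
    }
    String argList = rest.substring(open + 1, rest.length() - 1);
    List<String> args = argList.isEmpty()
        ? Collections.<String>emptyList()
        : Arrays.asList(argList.split(","));
    return new SignatureSketch(className, rest.substring(0, open), args);
  }

  public static void main(String[] args) {
    SignatureSketch sig = parse("java.lang.String#toLowerCase()");
    System.out.println(sig.className + " / " + sig.member + " / " + sig.args);
    System.out.println(parse("java.io.FileReader").member); // null: plain class name
    System.out.println(parse("java.lang.System#out").member);
  }
}
```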
<br />
At the moment we are investigating other opportunities brought by that tool:<br />
<ul>
<li>We want to ban <span style="font-family: 'Courier New', Courier, monospace;">System.out/err</span> or things like horrible Eclipse-like <span style="font-family: 'Courier New', Courier, monospace;">try...catch...printStackTrace()</span> auto-generated Exception stubs. We can just ban those fields from the java.lang.System class and of course, <span style="font-family: 'Courier New', Courier, monospace;">Throwable#printStackTrace()</span>.</li>
<li>Using optimized Lucene-provided replacements for JDK API calls. This can be enforced by failing on the JDK signatures.</li>
<li>Failing the build on deprecated calls to Java’s API. We can of course print warnings for deprecations, but failing the build is better. And: We use deprecation annotations in Lucene’s own library code, so javac-generated warnings don’t help. We can use the list of deprecated stuff from JDK Javadocs to trigger the failures.</li>
</ul>
I hope other projects take a similar approach to scan their binary/source code and free it from system dependent API calls, which are not predictable for production systems in the server environment.<br />
<br />
<b>Thanks to Robert Muir and Dawid Weiss for help and suggestions!</b><br />
<b><br /></b>
<b>EDIT (2015-03-14): </b>On 2013-02-04, I released the plugin as Apache Ant, Apache Maven and CLI task on Google Code; later on 2015-03-14, it was migrated to Github. The project URL is: <a href="https://github.com/policeman-tools/forbidden-apis">https://github.com/policeman-tools/forbidden-apis</a>. The tool is available to your builds using Maven/Ivy through <a href="http://repo1.maven.org/maven2/de/thetaphi/forbiddenapis/">Maven Central</a> and <a href="http://oss.sonatype.org/content/repositories/releases/de/thetaphi/forbiddenapis/">Sonatype</a> repositories. Nightly snapshot builds are done by the <a href="http://jenkins.thetaphi.de/job/Forbidden-APIs/">Policeman Jenkins Server</a> and can be downloaded from the <a href="https://oss.sonatype.org/content/repositories/snapshots/de/thetaphi/forbiddenapis/">Sonatype Snapshot</a> repository.</div>
Uwe Schindler, published 2012-02-06, updated 2012-02-07
Is your IndexReader atomic? - Major IndexReader refactoring in Lucene 4.0<div>
<i><b>Note: </b>This blog post was <a href="http://www.searchworkings.org/blog/-/blogs/uwe-says%3A-is-your-reader-atomic">originally posted on the SearchWorkings website</a>.</i><br />
<br />
Since Day 1 Lucene exposed the two fundamental concepts of reading and writing an index directly through IndexReader & IndexWriter. However, the API didn’t reflect reality; from the IndexWriter perspective this was desirable, but when reading the index this caused several problems in the past. In reality a Lucene index isn’t a single index, although it is logically treated as such. The latest developments in Lucene trunk try to expose reality for type-safety and performance, but before I go into details about Composite, Atomic and DirectoryReaders let me go back in time a bit.<br />
<br />
Since version 2.9 / 3.0 Lucene started to move away from executing searches directly on the top-level IndexReaders towards a per-segment orientation. As Simon Willnauer already explained in <a href="http://www.searchworkings.org/blog/-/blogs/simon-says%3A-optimize-is-bad-for-you">his blog</a> entry, this led to the fact that optimizing an index is no longer needed to optimize searching performance. In fact, optimizing would slow your searches down, as after optimizing, all file system and Lucene-internal index caches get invalidated.<br />
<br />
A standard Lucene index consists of several so-called segments, which are themselves fully-functional Lucene indexes. During indexing, Lucene writes new documents into separate segments and, once there are too many segments, they are merged (see Mike McCandless’ blog: <a href="http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html">Visualizing Lucene's segment merges</a>):<br />
<br />
<img height="188" src="https://lh3.googleusercontent.com/l9LpIDzDdzN2g6NkX11vjHrIdUAN6deFp337sO_goajjNnXX3Df1LxFXxJg5BPBU1p7Xn_JjDLsHxdMHvoJ1D8Quw4PJPzQ0Io74_SmEu7jnyrbi0aI" width="400" /> <br />
<br />
Prior to Lucene 2.9, despite consisting of multiple underlying segments, the segments were treated as though they were a single big index. Since then, Lucene has shifted towards a per-segment orientation. By now almost all structures and components in Lucene operate on a per-segment basis; among others this means that Lucene only loads actual changes on reopen, instead of the entire index. From a user’s perspective it might still look like one big logical index, but under the hood everything works per-segment, as this (simplified) IndexSearcher snippet shows:</div>
<pre style="background: #f0f0f0; border: 1px dashed #CCCCCC; color: black; font-family: arial; font-size: 12px; height: auto; line-height: 20px; overflow: auto; padding: 0px; text-align: left; width: 99%;"><code style="color: black; word-wrap: normal;">public void search(Weight weight, Collector collector) throws IOException {
  // iterate through all segment readers & execute the search
  for (int i = 0; i < subReaders.length; i++) {
    // pass the reader to the collector
    collector.setNextReader(subReaders[i], docStarts[i]);
    final Scorer scorer = ...;
    if (scorer != null) { // score documents on this segment
      scorer.score(collector);
    }
  }
}
</code></pre><div>
However, the distinction between a logical index and a segment wasn’t consistently reflected in the code hierarchy. In Lucene 3.x, one could still execute searches on a top-level (logical) reader, without iterating over its subreaders. Doing so could slow down your searches dramatically if your index consisted of more than one segment. Among other reasons, this was why ancient versions of Lucene instructed users to optimize the index frequently.<br />
<br />
Let me explain the problem in a little more detail. An IndexReader on top of a Directory is internally a MultiReader on all enclosing SegmentReaders. If you ask a MultiReader for a TermEnum or the Postings, it executes an on-the-fly merge of all subreaders’ terms or postings data respectively. This merge process uses priority queues or related data structures, leading to a serious slowdown depending on the number of subreaders. <br />
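To give an idea of what such an on-the-fly merge involves, here is a simplified sketch of merging the sorted term lists of several segments with a priority queue. It is not Lucene's actual MultiReader code; the class TermMergeSketch and the use of plain string lists as "segments" are assumptions made for illustration:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class TermMergeSketch {

  /** One queue entry: the current term of a segment plus the iterator over the rest. */
  private static final class Cursor implements Comparable<Cursor> {
    String current;
    final Iterator<String> rest;

    Cursor(Iterator<String> terms) {
      this.rest = terms;
      this.current = terms.next();
    }

    @Override
    public int compareTo(Cursor other) {
      return current.compareTo(other.current);
    }
  }

  /** Merges the sorted term lists of all segments into one sorted list without duplicates. */
  static List<String> mergeTerms(List<List<String>> segments) {
    PriorityQueue<Cursor> queue = new PriorityQueue<Cursor>();
    for (List<String> segment : segments) {
      Iterator<String> terms = segment.iterator();
      if (terms.hasNext()) {
        queue.add(new Cursor(terms));
      }
    }
    List<String> merged = new ArrayList<String>();
    while (!queue.isEmpty()) {
      Cursor top = queue.poll(); // every step costs O(log(number of segments))
      if (merged.isEmpty() || !merged.get(merged.size() - 1).equals(top.current)) {
        merged.add(top.current);
      }
      if (top.rest.hasNext()) {
        top.current = top.rest.next();
        queue.add(top);
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    List<List<String>> segments = Arrays.asList(
        Arrays.asList("apple", "lucene"),
        Arrays.asList("banana", "lucene", "solr"));
    System.out.println(mergeTerms(segments)); // [apple, banana, lucene, solr]
  }
}
```

Every single term that is enumerated has to pass through the queue, which is the overhead a per-segment search avoids.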
<br />
Yet, even beyond these internal limitations using SegmentReaders in combination with MultiReaders can influence higher-level structures in Lucene. The FieldCache is used to uninvert the index to allow sorting of search results by indexed value or Document / Value lookups during search. Uninverting the top-level readers leads to duplication in the FieldCache and essentially multiple instances of the same cache.<br />
<br />
<span style="font-size: large;"> Type-Safe IndexReaders in Lucene 4.0</span><br />
<br />
From day one Lucene 4.0 was designed to not allow retrieving of terms and postings data from “composite” readers like MultiReader or DirectoryReader (which is the implementation that is returned for on-disk indexes, if you get a reader from IndexReader.open(Directory)). Initial versions of Lucene trunk simply threw an UnsupportedOperationException when you tried to get instances of Fields, TermsEnum, or DocsEnum from a non SegmentReader. Because of the missing type safety, one couldn’t rely on the ability to get postings from the IndexReader unless manually checking if it was composite or atomic.<br />
<br />
<a href="https://issues.apache.org/jira/browse/LUCENE-2858">LUCENE-2858</a> is one of the major API changes in Lucene 4.0; it completely changes the “perspective” that Lucene client code has on indexes and their segments. The abstract class IndexReader has been refactored to expose only the essential methods needed to access stored fields when displaying search results. It is no longer possible to retrieve terms or postings data from the underlying index; not even deletions are visible anymore. You can still pass an IndexReader as constructor parameter to IndexSearcher and execute your searches; Lucene will automatically delegate procedures like query rewriting and document collection to the atomic subreaders. <br />
<br />
If you want to dive deeper into the index and write your own queries, take a closer look at the new abstract sub-classes AtomicReader and CompositeReader:<br />
<br />
AtomicReader instances are now the only source of Terms, Postings, DocValues and FieldCache. Queries are forced to execute on an AtomicReader on a per-segment basis, and FieldCaches are keyed by AtomicReaders. Its counterpart, CompositeReader, exposes a utility method to retrieve its composites. But watch out: composites are not necessarily atomic. In addition to the added type-safety, we also removed the notion of index commits and version numbers from the abstract IndexReader; the associations with IndexWriter were pulled into a specialized DirectoryReader. Here is an “example” executing a query in Lucene trunk:</div>
<pre style="background: #f0f0f0; border: 1px dashed #CCCCCC; color: black; font-family: arial; font-size: 12px; height: auto; line-height: 20px; overflow: auto; padding: 0px; text-align: left; width: 99%;"><code style="color: black; word-wrap: normal;"> DirectoryReader reader = DirectoryReader.open(directory);
IndexSearcher searcher = new IndexSearcher(reader);
Query query = new QueryParser("fieldname", analyzer).parse("text");
TopDocs hits = searcher.search(query, 10);
ScoreDoc[] docs = hits.scoreDocs;
Document doc1 = searcher.doc(docs[0].doc);
// alternative:
Document doc2 = reader.document(docs[1].doc);
</code></pre><div>
Does that look familiar? Well, for the actual API user this major refactoring doesn’t bring much of a change. If you run into compile errors related to this change while upgrading you likely found a performance bottleneck.<br />
<br />
<span style="font-size: large;"> Enforcing Per-Segment semantics in Filters</span><br />
<br />
If you have more advanced code dealing with custom Filters, you might have noticed another new class hierarchy in Lucene (see <a href="https://issues.apache.org/jira/browse/LUCENE-2831">LUCENE-2831</a>): IndexReaderContext with the corresponding Atomic-/CompositeReaderContext. This was added quite a while ago but is closely related to atomic and composite readers.<br />
<br />
The move towards per-segment search in Lucene 2.9 exposed lots of custom Queries and Filters that couldn't handle it. For example, some Filter implementations expected the IndexReader passed in to be identical to the IndexReader passed to IndexSearcher, with all its advantages like absolute document IDs. Obviously this “paradigm shift” broke lots of applications, especially those that utilized cross-segment data structures (like Apache Solr). <br />
<br />
In Lucene 4.0, we introduce IndexReaderContext, a “searcher-private” reader hierarchy. During Query or Filter execution, Lucene no longer passes raw readers down to Queries, Filters or Collectors; instead, components are provided an AtomicReaderContext (essentially a hierarchy leaf) holding relative properties like the document base in relation to the top-level reader. This allows Queries & Filters to build up logic based on document IDs despite the per-segment orientation.<br />
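<br />
The arithmetic behind such a document base is simple. The following plain-Java sketch is not Lucene code; the class DocBaseSketch and its method names are hypothetical, and it assumes that no segment is empty. It shows how segment-relative and top-level document IDs relate to each other.<br />
<br />
```java
import java.util.Arrays;

/** Hypothetical sketch: mapping between per-segment and top-level document IDs. */
public class DocBaseSketch {

    /** Computes the first top-level doc ID of every segment from the segment sizes. */
    public static int[] docStarts(int[] maxDocPerSegment) {
        int[] starts = new int[maxDocPerSegment.length];
        int base = 0;
        for (int i = 0; i < maxDocPerSegment.length; i++) {
            starts[i] = base;
            base += maxDocPerSegment[i];
        }
        return starts;
    }

    /** Converts a segment-relative doc ID into a top-level doc ID. */
    public static int toTopLevel(int docBase, int segmentDoc) {
        return docBase + segmentDoc;
    }

    /** Finds the index of the segment containing a top-level doc ID (assumes non-empty segments). */
    public static int segmentOf(int[] starts, int topLevelDoc) {
        int pos = Arrays.binarySearch(starts, topLevelDoc);
        // not found: binarySearch returns -(insertionPoint) - 1, the segment is insertionPoint - 1
        return pos >= 0 ? pos : -pos - 2;
    }
}
```
<br />
A Filter that receives only the leaf plus its document base can therefore still address documents in top-level terms by adding the base, without ever touching the composite reader.<br />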
<br />
<span style="font-size: large;">Can I still use top-level readers?</span><br />
<br />
There are still valid use cases where top-level readers, i.e. “atomic views” on the index, are desirable. Let's say you want to iterate over all terms of a complete index for auto-completion or faceting: Lucene provides utility wrappers like SlowCompositeReaderWrapper that emulate an AtomicReader. Note: using “atomicity emulators” can cause serious slowdowns due to the need to merge terms, postings, DocValues, and FieldCache, so use them with care!</div>
<pre style="background: #f0f0f0; border: 1px dashed #CCCCCC; color: black; font-family: arial; font-size: 12px; height: auto; line-height: 20px; overflow: auto; padding: 0px; text-align: left; width: 99%;"><code style="color: black; word-wrap: normal;"> Terms terms = SlowCompositeReaderWrapper.wrap(directoryReader).terms("field");
</code></pre><div>
Unfortunately, Apache Solr still uses this horrible code in a lot of places, leaving us with a major piece of work undone. Major parts of Solr’s faceting and filter caching need to be rewritten to work per atomic segment! For those implementing plugins or other components for Solr, SolrIndexSearcher exposes an “atomic view” of its underlying reader via SolrIndexSearcher.getAtomicReader().<br />
<br />
If you want to write memory-efficient and fast search applications (that do not need those useless large caches like Solr uses), I would recommend not using Solr 4.0 and instead writing your search application around the new Lucene components like the new facet module and SearcherManager!</div>Uwe Schindlerhttp://www.blogger.com/profile/08079070589736993766noreply@blogger.com45H.-H.-Meier-Allee 63, 28213 Bremen, Deutschland53.0932921 8.848515533.379459600000004 -31.581172000000002 72.8071246 49.278203tag:blogger.com,1999:blog-1353896202392192412.post-22481651642321983782011-12-21T00:17:00.000+01:002011-12-21T01:02:26.266+01:00JDK 7u2 released - How about Linux and other operating systems?Last week, Oracle released <a href="http://www.oracle.com/technetwork/java/javase/7u2-relnotes-1394228.html">Java 7 Update 2</a>, another milestone. This release included, of course, all the fixes that were already in <a href="http://blog.thetaphi.de/2011/10/java-7-update-1-released-does-it-fix.html">Update 1</a> (see also <a href="http://www.oracle.com/technetwork/java/javase/7u1-relnotes-507962.html">Oracle's page</a>), especially those <a href="http://blog.thetaphi.de/2011/07/real-story-behind-java-7-ga-bugs.html">affecting Apache Lucene and Solr</a>. Since my last post on this blog, I have been investigating what changed and how other operating systems like Ubuntu/Redhat Linux and FreeBSD are supported <i>(warning: sarcasm alert!)</i><br />
<br />
<span style="font-size: large;">Linux</span><br />
<br />
First of all, you can of course download the official Linux packages from Oracle. But those are not automatically updated when a new release comes out. So most Linux users prefer to use the automatic update of their operating system. Unfortunately, at the beginning of this month, Ubuntu wrote in an <a href="https://lists.ubuntu.com/archives/ubuntu-security-announce/2011-December/001528.html">announcement</a>:<br />
<blockquote>
<span style="background-color: white; font-family: inherit; white-space: pre-wrap;">As of August 24th 2011, we no longer have permission to redistribute new </span><span style="background-color: white; font-family: inherit; white-space: pre-wrap;">Java packages as Oracle has retired the "Operating System Distributor </span><span style="background-color: white; font-family: inherit; white-space: pre-wrap;">License for Java".</span><span style="background-color: white; font-family: inherit; white-space: pre-wrap;"><br /></span><span style="background-color: white; font-family: inherit; white-space: pre-wrap;">Oracle has published an advisory about security issues in the version of </span><span style="background-color: white; font-family: inherit; white-space: pre-wrap;">Java we currently have in the partner archive. Some of these issues are </span><span style="background-color: white; font-family: inherit; white-space: pre-wrap;">currently being exploited in the wild. </span><span style="background-color: white; font-family: inherit; white-space: pre-wrap;">Due to the severity of the security risk, Canonical is immediately </span><span style="background-color: white; font-family: inherit; white-space: pre-wrap;">releasing a security update for the Sun JDK browser plugin which will </span><span style="background-color: white; font-family: inherit; white-space: pre-wrap;">disable the plugin on all machines. 
This will mitigate users' risk from </span><span style="background-color: white; font-family: inherit; white-space: pre-wrap;">malicious websites exploiting the vulnerable version of the Sun JDK.</span><span style="background-color: white; font-family: inherit; white-space: pre-wrap;"><br /></span><span style="background-color: white; font-family: inherit; white-space: pre-wrap;">In the near future (exact date TBD), Canonical will remove all Sun JDK </span><span style="background-color: white; font-family: inherit; white-space: pre-wrap;">packages from the Partner archive. This will be accomplished by pushing </span><span style="background-color: white; font-family: inherit; white-space: pre-wrap;">empty packages to the archive, so that the Sun JDK will be removed from all </span><span style="background-color: white; font-family: inherit; white-space: pre-wrap;">users machines when they do a software update. Users of these packages who </span><span style="background-color: white; font-family: inherit; white-space: pre-wrap;">have not migrated to an alternative solution will experience failures after </span><span style="background-color: white; font-family: inherit; white-space: pre-wrap;">the package updates have removed Oracle Java from the system.</span><span style="background-color: white; font-family: inherit; white-space: pre-wrap;"><br /></span><span style="background-color: white; font-family: inherit; white-space: pre-wrap;">If you are currently using the Oracle Java packages from the partner </span><span style="background-color: white; font-family: inherit; white-space: pre-wrap;">archive, you have two options:</span> </blockquote>
<blockquote>
<ul>
<li><span style="background-color: white; font-family: inherit; white-space: pre-wrap;">Install the OpenJDK packages that are provided in the main Ubuntu </span><span style="background-color: white; font-family: inherit; white-space: pre-wrap;">archive (<b>openjdk-6-jdk</b> or </span><span style="background-color: white; font-family: inherit; white-space: pre-wrap;"><b>openjdk-6-jre</b> for the virtual machine).</span></li>
<li><span style="background-color: white; font-family: inherit; white-space: pre-wrap;">Manually install Oracle's Java software from their web site.</span></li>
</ul>
</blockquote>
Unfortunately this means that we will never get an official Ubuntu package for Java 7! What are all these security bugs suddenly heavily exploited in the wild?<br />
<br />
OK, the latest version of Ubuntu's JDK 6 was Update 26, so what security fixes came in Update 27, Update 29, and Update 30? I inspected the changelogs shipped with the openjdk6 and openjdk7 packages, which are now the "official Java support" for Ubuntu (and also Redhat), but something is wrong: It's not even OpenJDK! OpenJDK is still on <a href="http://download.java.net/openjdk/jdk7/">build 147</a><i> (as of their official download page)</i> - which is the original Java 7 release that <a href="http://blog.thetaphi.de/2011/07/real-story-behind-java-7-ga-bugs.html">broke</a> <b>Apache Lucene</b> and <b>Apache Solr</b> with index corru(m)ptions and SIGSEGVs. This means no Linux user can run our full text search engine on Linux, because it SIGSEGVs shortly after starting? <i>But that's not what the Ubuntu package contains: What Ubuntu "sells" as OpenJDK is in fact a strange product named "<a href="http://icedtea.classpath.org/">IcedTea</a>" - wtf is that?</i><br />
<br />
<a href="http://blog.fuseyism.com/index.php/2011/10/19/icedtea-2-0-released/">IcedTea 2.0 was released on October 19, 2011 with a long list of security fixes</a>! But the Ubuntu download still has the <i>famous build number 147</i> in its version number: <b>7~b147-2.0-1ubuntu2</b> - how does this fit together? <b>Redhat and Ubuntu both sell another product "IcedTea", but labeled as "OpenJDK"! </b>As this is so widely used, the result seems to be that Oracle no longer updates their original OpenJDK release. The Iced<strike>Cream</strike>Tea seems to be the "new" official release? What about all non-Linux operating systems like <i>FreeBSD (see below)</i>? I think that's a bad idea, because it confuses users. Also, when reviewing fixed bugs in official Oracle releases you get an update number (current is Java 6u30 or Java 7u2), but with OpenJDK (sorry IcedTea) Linux packages you get version numbers that tell you nothing about their relation to Oracle's releases - useless!<br />
<br />
In fact to come back to OpenJDK 6 package in Ubuntu: If you install this replacement package on your machine according to the <a href="https://wiki.ubuntu.com/LucidLynx/ReleaseNotes/Java6Transition">howto on the Ubuntu webpage</a> for the good <span style="font-family: 'Courier New', Courier, monospace;">sun-java6</span> package (which is u26) - you get an older hotspot version (hotspot version numbers are the only things that you can read and compare from "<span style="font-family: 'Courier New', Courier, monospace;">java -version</span>" output)! Something around official Oracle JDK 6u24 - so in fact you get an older version -<b> that's no upgrade, that's a downgrade!</b> For OpenJDK 7 you get something like Oracle's JDK 7u0 but with thousands of patches applied.<br />
<br />
<i>To come back to the Lucene/Solr bugs:</i> Yes, they are fixed in this mysterious OpenJDK/IcedTea 7 release, <a href="http://blog.fuseyism.com/index.php/2011/10/19/icedtea-2-0-released/">the long list of changes</a> verifies that. If you download the wrong-named OpenJDK 7 package with the horrible build number 147 (<span style="font-family: 'Courier New', Courier, monospace;">openjdk-7-jdk 7~b147-2.0-1ubuntu2</span>), you will <b>not crash your JVM with Apache Solr</b> and you can try it out with the new garbage collector (G1) and some performance improvements (indeed Lucene tests run faster with Java 7 on my box). It looks very stable.<br />
<br />
The second shock on this day occurred when I was searching for the famous Lucene bugs in the list of fixed IcedTea issues. They appear there among the horrible security bugs with CVE numbers assigned (<a href="http://www.cve.mitre.org/cgi-bin/cvename.cgi?name=2011-3558">CVE-2011-3558</a>, <i>and others</i>)! This also explains why Oracle made the <a href="http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7070134">original porter stemmer bug report</a> hidden! They also appear in the <span style="font-family: 'Courier New', Courier, monospace;">openjdk-6</span> Ubuntu packages - as horrible security bugs, too. So Ubuntu patched the antique 6u24 and older versions with patches for Java 6u29<i> [they also patched u20, where the bug did not exist, see <a href="http://people.canonical.com/~ubuntu-security/cve/2011/CVE-2011-3558.html">here</a>]</i> - that's really strange. <b>And again, it confuses users!</b><br />
<b><br /></b><br />
<b>And finally:</b><i> The Lucene bugs seem to be one of the reasons to delete the sun-java6 packages from Ubuntu Partner repository in the future. How funny is this? Does anybody have an exploit except starting Apache Solr with the default configuration and <span style="font-family: 'Courier New', Courier, monospace;">-XX:+AggressiveOpts</span> enabled? OK, it is really a security issue for users working on your Solr Search web frontend and suddenly produce corru(m)pt indexes on your machine! They might not find anything after this disaster.</i><br />
<b><br /></b><br />
<span style="font-size: large;">FreeBSD</span><br />
<br />
What about FreeBSD? It looks much worse: There has been no new OpenJDK update to this day, so you cannot use it to build a new Port. The Jenkins Server at Apache, running the Lucene tests, is still running the original OpenJDK7 b147 build that I patched during the summer to work around the Java 7 bugs. I think the problem here is that Oracle no longer releases OpenJDK builds because IcedTea exists. But IcedTea is Linux only!<br />
<br />
<b><span style="color: red;">Please note: This blog post is partially a little bit sarcastic, I just tell my feelings about the whole Linux-Ubuntu-OpenJDK-FreeBSD issue.</span></b><br />
<br />
<i>A short side note: </i><a href="http://www.pangaea.de/">PANGAEA</a> now runs very stable and horribly fast for some operations with Lucene 3.5 (no Apache Solr) and official Oracle JDK 7u2 on Solaris x64 (MMapDirectory of course)! <b><span style="color: #274e13;">I wish you merry Xmas and a happy new year!</span></b>Uwe Schindlerhttp://www.blogger.com/profile/08079070589736993766noreply@blogger.com143Bremen53.09323 8.8484553.0884625 8.8385795 53.0979975 8.8583205tag:blogger.com,1999:blog-1353896202392192412.post-11629591220583809812011-10-19T16:05:00.016+02:002011-10-27T10:35:34.700+02:00Java 7 Update 1 released - Does it fix the Lucene index corru(m)ption and SIGSEGV bugs?After the <a href="http://blog.thetaphi.de/2011/07/real-story-behind-java-7-ga-bugs.html">serious issues</a> with the Java 7 GA release, Oracle <a href="http://www.oracle.com/technetwork/java/javase/7u1-relnotes-507962.html">released Update 1 of Java 7</a> yesterday. The first thing, of course, was to check the release notes, download the package, and finally run our Lucene test cases. When inspecting the <a href="http://www.oracle.com/technetwork/java/javase/downloads/index.html">download page</a> and <a href="http://www.oracle.com/technetwork/java/javase/7u1-relnotes-507962.html">release notes</a> on the Oracle web site, I got confused: The new release is <b>Java 7u1</b>, whereas the developer preview released one week ago was named <b>Java 7u2 Developer Preview</b>. <i>Both builds have the same build number (b08).</i> Even stranger: The release notes of the developer preview <b>differ </b>from the release notes of the Update 1 release:<br />
<ul><li>The official Update 1 release notes only list <b>one bug of the three</b> ones originally reported to Oracle: <a href="http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7068051">#7068051</a> (SIGSEGV in PhaseIdealLoop::build_loop_late_post on T5440). The other ones are not listed, so we cannot be sure that they are really fixed.</li>
<li><b>The famous SIGSEGV bug in Porter Stemmer (<a href="http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7070134">#7070134</a>) is not listed at all</b>; it also disappeared from the Oracle issue tracker. It seems to be hidden from the public<i> (maybe they declared it as confidential because it's security related???)</i>.</li>
</ul><div><br />
No matter what the release notes say - I had to download the official release package! I took some free time at the <a href="http://2011.lucene-eurocon.org/">Lucene Eurocon Conference</a> in Barcelona, downloaded the packages and installed them in parallel on my Windows 64bit Thinkpad. First I tried to run the Porter Stemmer test with 100 iterations on Java 7 GA, and verified that it crashed. <a href="http://www.lucidimagination.com/blog/author/robert-muir/">Robert Muir</a> joined and we tested the new U1 release: <b>Test passes!</b> This means that Hotspot issue <a href="http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7070134">#7070134</a> was fixed, but Oracle failed to put it into the release notes.</div><div><br />
</div><div>The second part of the investigation was more complicated: The index corru(m)ption bugs are harder to reproduce, as the virtual machine does not crash and simply produces corrupt indexes after merging segments in one of the faceting tests. We checked out the Lucene Trunk revision from the time when the bug was first discovered (issue <a href="https://issues.apache.org/jira/browse/LUCENE-3346">LUCENE-3346</a>) and used the random seed mentioned on the issue. We were able to verify the bug with Java 7 GA (the indexes are corrupt 90% of the time), <b>and luckily after 20 iterations of the same test and random seed in Java 7u1, we have seen no corrupt index!</b> It seemed to us that Oracle had probably fixed Java issues <a href="http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7044738">#7044738</a> and <a href="http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7068051">#7068051</a>, but failed to put both of them into the release notes. <b>Of course, without an additional statement from Oracle, we cannot be sure that the issues are really fixed!</b></div><div><b><br />
</b></div><div>Oracle also <a href="http://www.oracle.com/technetwork/java/javase/6u29-relnotes-507960.html">released Java 6 Update 29</a> yesterday. The <a href="http://www.oracle.com/technetwork/java/javase/6u29-relnotes-507960.html">release notes</a> for that version don't mention any relation to the Lucene bugs, so we were not sure if this version is completely different from the Java 6u29 developer preview, released one week ago, which listed those bugs <i>(unfortunately, the package is no longer available on the net)</i>, or if they also failed to mention them. A quick review like the one done for Java 7 showed that Porter Stemmer no longer crashes with <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">-XX:+AggressiveOpts</span>, so the bug seems to be fixed here, too. We were not able to actually discover any index corru(m)ption.</div><div><br />
</div><div>Finally, we can somehow verify that the bugs seem to be fixed for both versions, <b>but without an official statement from Oracle (in their release notes), we cannot recommend using Java 7u1 (and Java 6u29 with aggressive opts) with Lucene and Solr</b>.</div><div><br />
</div><div>Once I am back in Germany, I will try to get an updated FreeBSD package of Java 7 and install it on our Apache Jenkins server.<br />
<br />
<span class="Apple-style-span" style="font-size: large;"><b>Update (2011-10-26)</b></span><br />
<br />
Last night, <b>Oracle updated the release notes of Java 7u1 and Java 6u29</b>, stating that they fixed the three Lucene-relevant bugs (plus another one related to that). <span class="Apple-style-span" style="color: red; font-weight: bold;">Based on this confirmation, it's now safe to use Java 7 Update 1 <i>(and later)</i> with Apache Lucene and Apache Solr.</span><br />
<br />
<span class="Apple-style-span" style="font-weight: bold;">Of course, there is still the recommendation <u>not</u> to use <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">-XX:+AggressiveOpts</span> on any JVM in production!</span><br />
<br />
We are still waiting for updated OpenJDK packages to install this release on our build servers.</div>Uwe Schindlerhttp://www.blogger.com/profile/08079070589736993766noreply@blogger.com52Barcelona, Spain41.3907407 2.141537200000016141.3166072 2.0547142000000163 41.464874200000004 2.228360200000016tag:blogger.com,1999:blog-1353896202392192412.post-61495867403874820072011-10-15T15:33:00.000+02:002011-10-15T15:33:15.562+02:00GotoCon Aarhus 2011Last week I visited the <a href="http://gotocon.com/aarhus-2011/">Gotocon Conference 2011</a> in Aarhus. I was invited by the Java User Group (<a href="http://home.javagruppen.dk/">Javagruppen</a>) of Denmark to give a talk at their <a href="https://secure.trifork.com/aarhus-2011/freeevent/index.jsp?eventOID=3491">event</a> during this conference. Of course this talk was about the famous <a href="http://blog.thetaphi.de/2011/07/real-story-behind-java-7-ga-bugs.html">Java 7 Launch Bug</a> that I posted about earlier on this blog.<div><br />
I started the trip on Monday morning and wanted to arrive in the afternoon, but unfortunately the first train from Bremen to Hamburg was canceled by Deutsche Bahn. Because of this I missed the only direct train to Aarhus, had to use slow trains like RegionalExpress, and arrived at the conference about 3 hours late. Unfortunately I was only able to listen to the final keynote on the first day, so I missed the talks I wanted to attend in the afternoon.<br />
<br />
<b><span class="Apple-style-span" style="font-size: large;">Day One</span></b><br />
<br />
The "party keynote" was about "<a href="http://gotocon.com/aarhus-2011/presentation/Party%20Keynote:%20Cool%20Code">Cool Code</a>", held by <a href="http://asemantic.blogspot.com/">Kevlin Henney</a>. He presented lots of nice code fragments "that are interesting because of historical significance, profound concepts, impressive technique, exemplary style or just sheer geekiness". Unfortunately the room was so big and the screen so small that most code parts were unreadable. You were still able to follow, but the coolness of the shown code was not really visible. Altogether, the talk was very interesting and a must-see! After this keynote, still very hungry from the train ride, I attended the conference dinner and met lots of people. At my table were people from Siemens; one of them was Frank Buschmann, who gave the talk "<a href="http://gotocon.com/aarhus-2011/presentation/Seven%20Secrets%20Every%20Architect%20Should%20Know">Seven Secrets Every Architect Should Know</a>" on Wednesday.</div><div><br />
</div><div><b><span class="Apple-style-span" style="font-size: large;">Day Two</span></b></div><div><br />
</div><div>The next day was packed with lots of talks. Unfortunately, I still had a discussion with Robert Muir through Google Talk about Apache Lucene issue <a href="https://issues.apache.org/jira/browse/LUCENE-1536">LUCENE-1536</a><i> (applying Lucene filters low; once it is committed, I will write about it!) </i>and a bug in our FilteredQuery#Weight implementation, so I missed the first talk. I started with "<a href="http://gotocon.com/aarhus-2011/presentation/Eigenharp%20:%20Experiencing%20Music%20Differently%20(and%20what%20programmers%20can%20learn%20from%20it)">Eigenharp: Experiencing Music Differently (and what programmers can learn from it)</a>", which compared musical instruments with user interfaces and showed what developers can learn from musical instruments and their long history.</div><div><br />
</div><div>After lunch I attended the talk about <a href="http://gotocon.com/aarhus-2011/presentation/node.js">node.js</a> by Bert Belder, who had some problems doing his live demonstration on Apple computers <i>(yes: "die, Apple, die")</i>. It gave me some ideas on how to use a select-based webserver for Apache Solr. Maybe we will use the Java-only <a href="http://www.jboss.org/netty">JBoss' Netty</a> for Solr in the future!</div><div><br />
</div><div><b><span class="Apple-style-span" style="font-size: large;">The Java User Group: My talk about the Java 7 launch bug</span></b></div><div><br />
</div><div>At 15:50, the Java 7 event started in Kammermusiksalen. Martin Boel from Javagruppen gave an introduction to the new features in Java 7 and introduced me. He showed features I did not know about, like an improved Swing user interface. These are all features I generally don't use in my code. One nice addition is the new "try-with-resources" statement, which can make your code cleaner, as you don't have to take care of closing all resources when an exception occurs. Java 7 does this automatically (in fact, the Java compiler automatically adds the correct statements in complicated finally blocks to your code). We are using a similar infrastructure in our <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">IOUtils</span> class in Lucene. I recently upgraded it (<a href="https://issues.apache.org/jira/browse/LUCENE-3334">LUCENE-3334</a>) to make use of the new <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">Throwable.addSuppressed()</span> functionality to log suppressed exceptions in the stack trace of the original exception. As Apache Lucene is still compatible with Java 5, I added a convenience reflection-based method to <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">IOUtils</span>, so we can log the suppressed exception if the code runs on Java 7.</div><div><br />
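As a small illustration of the try-with-resources behaviour described above, here is a minimal, self-contained sketch. It is not Lucene's IOUtils code; the class names SuppressedSketch and FailingResource are hypothetical. It shows that the exception thrown by close() does not replace the original exception but is attached to it as a suppressed one.<br />
<br />
```java
/** Hypothetical sketch: what Java 7 does when both the body and close() fail. */
public class SuppressedSketch {

    /** A resource whose close() always fails. */
    static final class FailingResource implements AutoCloseable {
        @Override
        public void close() {
            throw new IllegalStateException("close failed");
        }
    }

    /** Runs a failing body inside try-with-resources and returns the exception that escapes. */
    public static Throwable run() {
        try {
            try (FailingResource resource = new FailingResource()) {
                throw new RuntimeException("body failed");
            }
        } catch (RuntimeException e) {
            // e is the exception from the body; the close() failure is in e.getSuppressed()
            return e;
        }
    }

    public static void main(String[] args) {
        Throwable t = run();
        System.out.println(t.getMessage());
        for (Throwable suppressed : t.getSuppressed()) {
            System.out.println("suppressed: " + suppressed.getMessage());
        }
    }
}
```
<br />
With a hand-written finally block, the failure in close() would typically mask the exception from the body; the compiler-generated code calls addSuppressed() on the original exception instead.<br />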
</div><div>After the introduction, I started with my talk. In the following hour I showed <a href="http://www.thetaphi.de/share/Schindler-Java7.pdf">a lot of slides<i> (download them here as PDF)</i></a> and explained the whole story of the Java 7 launch bug, as I did in my <a href="http://blog.thetaphi.de/2011/07/real-story-behind-java-7-ga-bugs.html">previous blog post</a>. I also explained, for the first time, why Java 7 crashes on the Porter Stemmer code:</div><blockquote>During the investigation of the bug in July, I had no idea, why Porter Stemmer's <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">ends()</span> method crashed with <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">SIGSEGV</span> - but almost identical code in the Java <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">String</span> class did not. When preparing my presentation, I analyzed the comments by hotspot's developers and was able to understand it for our case. The reason why <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">ends()</span> crashes in our case is also related to a loop unwinding bug. The difference to the <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">String</span> class is the underlying <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">char[]</span> array, which never changes in Java's <i>final </i>and <i>unmodifiable </i><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">String</span> class. On the other hand, the Porter Stemmer code modifies the <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">char[]</span> array and leads to a bug because it used wrong assumptions. 
Java 7's string optimization routines make the hotspot compiler assume <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">String.length()</span> is constant<i> (which is, of course, true)</i>. In the Porter Stemmer case, these lengths are generally very short (it checks for short strings only), so the compiler unwinds the loops in <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">ends()</span>. Unfortunately the underlying <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">char[]</span> array used by Porter Stemmer is modifiable, but the length checks were removed, leading to the bug. Not using string optimizations, as in Java 6 without <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">-XX:+OptimizeStringConcat</span>, therefore prevented the bug (as the loop optimizations cannot be applied).</blockquote><div>My talk was also attended by some Nokia guys from Berlin (I knew them from the Berlin Buzzwords conference). After a short reception we went to Aarhus' nightlife and visited some bars.</div><div><br />
</div><div><b><span class="Apple-style-span" style="font-size: large;">Day Three</span></b></div><div><br />
</div><div>The first talk I attended was Hannes Kruppa's (he works at Nokia in Berlin and joined us the evening before) "<a href="http://gotocon.com/aarhus-2011/presentations/show_presentation.jsp?oid=3580">Improving Search Ranking Through A/B Tests: A Case Study</a>". After that I went to "<a href="http://gotocon.com/aarhus-2011/presentation/Questions%20for%20an%20Enterprise%20Architect">Questions for an Enterprise Architect</a>" by Erik Doernenburg, and finally had some lunch. I left the conference at about 14:00 to head back to Germany, this time with no train problems.</div><div><br />
</div><div>All together, a very nice conference! Thanks to all who organized it!</div>Uwe Schindler (Bremen, Germany)<br />
<br />
<span class="Apple-style-span" style="font-size: large;"><b>The real story behind the Java 7 GA bugs affecting Apache Lucene / Solr</b></span> (2011-07-31, updated 2011-08-12)<br />
<br />
I started this blog about two months ago, but until now, I had no time to write any posts. Since <a href="http://mail.openjdk.java.net/pipermail/announce/2011-July/000106.html">Oracle's release of Java 7</a> on Thursday last week, a lot of myths about the problems with the <a href="http://tartarus.org/~martin/PorterStemmer/">Porter stemmer</a>, wrong calculations in the <a href="http://blog.mikemccandless.com/2010/06/lucenes-pulsingcodec-on-primary-key.html">pulsing codec</a>, and <a href="https://issues.apache.org/jira/browse/LUCENE-3346">corrupt indexes</a> have appeared on the web. Some people have already accused the Apache Software Foundation of using the hype around Java 7 to fight against Oracle (since the ASF <a href="https://blogs.apache.org/foundation/entry/the_asf_resigns_from_the">stepped out of the JCP</a>).<br />
<br />
<span class="Apple-style-span" style="font-size: large;"><b>I want to start with a chronological summary about what happened since last weekend:</b></span><br />
<br />
Last Saturday I woke up in the morning and had no real plans for the weekend. I checked my Lucene JIRA issues and, after I had helped my GSoC student, I read the Heise newsticker. One news article caught my eye: "<a href="http://www.h-online.com/open/news/item/Gearing-up-for-Java-7-1282682.html">Gearing up for Java 7</a>" (English version). I noticed that there was only one week left until Oracle planned to release Java 7. "Maybe we should add some Jenkins jobs to the Apache build server to test Apache Lucene/Solr trunk and 3.x branches?" As always, <a href="http://rcmuir.wordpress.com/">Robert</a> and I had a quick Google Talk session and I laid out my plans. Our first problem was that the <a href="https://builds.apache.org/computer/lucene/">Jenkins server</a> of Lucene runs under FreeBSD, so the chances were high that we would not get a recent release of OpenJDK 7 running. I had already installed the 64-bit Windows preview build from Oracle (b147) on my Thinkpad and run the Lucene Core tests, which passed.<br />
<br />
A quick review showed that OpenJDK 7 build 147 was already available in the <a href="http://www.freshports.org/java/openjdk7">FreeBSD ports collection</a>. I logged onto the Jenkins slave machine, updated the ports, and built the <b>openjdk7 </b>package. In parallel I also rebuilt the <b>openjdk6 </b>package for the standard test runs (a fix for an IPv6 bug was already available, so we were able to use the latest version). After approximately 25 minutes, the build was done and the package installed. I cloned the Jenkins jobs for the half-hourly test builds (3.x and trunk), hacked the shell scripts a little bit and started the first build. Robert and I were watching the build output... live.<br />
<br />
During the tests of the new analyzers module we suddenly got a crash with SIGSEGV in TestPorterStemmer. As I had only run the core tests on my local machine, this had not been discovered yet. We were shocked: "this is just a stupid FreeBSD problem, I am sure... hmhmhm... but a few weeks ago a user on the Solr mailing list reported a crash in PorterStemmer, too" (see <a href="http://www.lucidimagination.com/search/document/3beaa082c4d2fdd4/porterstemfilter_kills_jvm">this post</a>). I started the analyzer tests locally on my Windows box, boom - same issue. Importantly, the issue only appeared when I raised the number of test iterations; with the default of 1, the test passed. That clearly pointed to a hotspot optimization bug, as these optimizations are only triggered once a method has been executed about ten thousand times.<br />
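<br />
Why the iteration count matters can be illustrated with a small warm-up harness. This is only a minimal sketch: <code>WarmupSketch</code> and its <code>endsWith</code> helper are hypothetical stand-ins, not the Lucene test or the real stemmer code. A single call typically runs in the interpreter, while many calls give the JIT compiler a reason to compile (and optimize) the method.<br />

```java
// Hypothetical warm-up harness; names are illustrative, not Lucene's test code.
class WarmupSketch {
    // Stand-in for a small, hot method such as a suffix check on a char[] buffer.
    static boolean endsWith(char[] buf, int len, String suffix) {
        int n = suffix.length();
        if (n > len) return false;
        for (int i = 0; i < n; i++) {
            if (buf[len - n + i] != suffix.charAt(i)) return false;
        }
        return true;
    }

    // Calls the method `iterations` times and counts how often it returned true.
    static int run(int iterations) {
        char[] buf = "relational".toCharArray();
        int hits = 0;
        for (int i = 0; i < iterations; i++) {
            if (endsWith(buf, buf.length, "ational")) hits++;
        }
        return hits;
    }

    public static void main(String[] args) {
        // One iteration normally stays in the interpreter ...
        System.out.println(run(1));
        // ... while many iterations make endsWith() hot enough to be compiled.
        System.out.println(run(100000));
    }
}
```

With a correct JIT both runs must agree with the interpreter; a miscompiled loop would only show up in the second run.<br />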
<br />
I stopped the 2-hourly builds with Java 7 on the Jenkins slave, as we did not want to spam the mailing list with failed build reports. Robert opened <a href="https://issues.apache.org/jira/browse/LUCENE-3335">LUCENE-3335</a> and started to investigate the problem. Step by step, he disabled compilation of several Porter stemmer methods until he reached <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">PorterStemmer.ends()</span>. Robert then opened a bug report in Oracle's bug tracker: <a href="http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7070134">#7070134</a><br />
<br />
For the rest of the day we also fixed another small issue in our test cases (TestWordDelimiterFilter failed because it used a character whose properties changed in Java 7 - Robert committed a fix for this).<br />
<br />
On Sunday morning I gave Jenkins another chance: To proceed with testing on Jenkins, I hacked the shell scripts again to pass the following args to the Lucene test suite (only for Java 7 builds): <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">-Dargs="-XX:CompileCommand=exclude,org/apache/lucene/analysis/en/PorterStemmer,ends"</span><br />
<br />
The Lucene builds then passed fine most of the time, but we had some additional failures in another test inside the new faceting module: <a href="https://issues.apache.org/jira/browse/LUCENE-3346">LUCENE-3346</a> <i>(see below)</i>. But when we started to test the Solr builds, the problems began again: some tests failed randomly with strange error messages. It took us two hours and a hint from Yonik to find the issue behind this (<a href="https://issues.apache.org/jira/browse/SOLR-2673">SOLR-2673</a>):<br />
<blockquote>"It looks like with Java7, that the test methods are not being run in the order they are declared in the file. That's probably the root cause of the other Solr test failures with Java7, too."</blockquote>A quick test with Java 6, inserting a <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">Collections.shuffle()</span> into the <a href="http://blog.mikemccandless.com/2011/03/your-test-cases-should-sometimes-fail.html">customized Lucene TestRunner</a>, produced the same bugs. Additionally, the JDK docs <b>nowhere explicitly state</b> that <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">Class.getMethods()</span> returns the methods in a particular order:<br />
<blockquote>"Returns an array containing <code>Method</code> objects reflecting all the public <em>member</em> methods of the class or interface represented by this <code>Class</code> object, including those declared by the class or interface and those inherited from superclasses and superinterfaces. Array classes return all the (public) member methods inherited from the <code>Object</code> class. The elements in the array returned are not sorted and are not in any particular order."</blockquote>The next few hours were then spent fixing all Solr tests that relied on the order of method execution (which is a violation of testing principles). In the evening we had all tests running fine, except the faceting bug and a random failure in the ICU module: <a href="https://issues.apache.org/jira/browse/LUCENE-3344">LUCENE-3344</a> (I will come back to that later).<br />
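<br />
The shuffle experiment can be sketched roughly as follows. This is a minimal illustration, not the actual Lucene test runner: <code>MethodOrderSketch</code> and <code>SampleTests</code> are hypothetical names. The method names are sorted first, so the starting point does not depend on whatever order <code>Class.getMethods()</code> happens to return, and then shuffled with a fixed seed so that a failing order can be reproduced.<br />

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: pick a reproducible random order for test methods.
class MethodOrderSketch {
    // Hypothetical test class; only the method names matter here.
    static class SampleTests {
        public void testCreate() {}
        public void testUpdate() {}
        public void testDelete() {}
    }

    // Collects the names of all public methods starting with "test", sorted by
    // name, because the reflection order itself is unspecified.
    static List<String> sortedTestNames(Class<?> clazz) {
        List<String> names = new ArrayList<String>();
        for (Method m : clazz.getMethods()) {
            if (m.getName().startsWith("test")) names.add(m.getName());
        }
        Collections.sort(names);
        return names;
    }

    // Shuffles the sorted names with a seed, so the same seed gives the same order.
    static List<String> shuffledTestNames(Class<?> clazz, long seed) {
        List<String> names = sortedTestNames(clazz);
        Collections.shuffle(names, new Random(seed));
        return names;
    }

    public static void main(String[] args) {
        System.out.println(sortedTestNames(SampleTests.class));
        System.out.println(shuffledTestNames(SampleTests.class, 42L));
    }
}
```

Tests that only pass in declaration order will sooner or later fail under such a shuffle, which is exactly what exposes the hidden dependency.<br />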
<br />
<span class="Apple-style-span" style="font-size: large;"><b>Status on Sunday: Until now, no response from Oracle about the new bug...</b></span><br />
<br />
<span class="Apple-style-span" style="font-size: large;"><b>Monday morning CET, still no news from Oracle!</b></span><br />
<br />
In late afternoon we got the first response; Robert informed me:<br />
<blockquote>"Bug is visible, but it's priority is LOW. I am sure, they will release Java 7 with PorterStemmer crashing. All Solr users with the default configuration will blame us!"</blockquote>Dawid Weiss then contacted the <a href="http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2011-July/005962.html">hotspot-compiler-dev</a> mailing list; I subscribed to it and started to ask <a href="http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2011-July/005966.html">questions</a>. In the evening, the developer responsible for these bugs <i>(Vladimir Kozlov)</i> sent me links to patches that might fix the issue. I also got an explanation of why we had this issue: it also exists in Java 6, but is hidden there because the affected optimizations are not enabled <i>(please read the thread for more information)</i>.<br />
<br />
On Tuesday morning CET I started to produce a FreeBSD ports patch, so I was able to compile our openjdk7 package on the Jenkins slave with the proposed fixes (see <a href="https://issues.apache.org/jira/secure/attachment/12487824/patch-0uwe.patch">patch-0uwe.patch</a>). I changed the Jenkins configuration again and suddenly all tests passed, even the faceting tests!<br />
<br />
We then worked around the last Java 7-related test bug (<a href="https://issues.apache.org/jira/browse/LUCENE-3344">LUCENE-3344</a>), which made ICU fail when loading the class <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">ULocale</span>. The problem is caused by some new locales in Java 7 that lead to a chicken-and-egg problem in the static initializer of <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">ULocale</span>. It initializes its default locale from the JDK locale in a static initializer. Until the default ULocale instance has been created, the default is not set in <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">ULocale</span>. But <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">ULocale</span>'s constructor itself needs the default locale to fetch some resource bundles and throws an NPE. We opened a bug report at ICU (<a href="http://bugs.icu-project.org/trac/ticket/8734">#8734</a>) and added a workaround to our <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">LuceneTestCase</span>'s locale randomization.<br />
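<br />
The chicken-and-egg pattern described above can be reproduced in a few lines. The following is a simplified sketch with hypothetical names (<code>StaticInitSketch</code>, <code>Loc</code>); it is not ICU's code, it only mirrors the described shape: a static initializer creates the default instance, and the constructor already needs that default.<br />

```java
// Simplified, hypothetical reproduction of a static-initializer cycle.
class StaticInitSketch {
    static class Loc {
        // Set by the static initializer below; still null while it is running.
        static Loc defaultLoc;

        static {
            // Creating the default instance calls the constructor ...
            defaultLoc = new Loc("en");
        }

        final String name;
        final String fallback;

        Loc(String name) {
            this.name = name;
            // ... which needs the default that has not been assigned yet.
            this.fallback = defaultLoc.name; // NullPointerException during class init
        }
    }

    private static String result;

    // Triggers class initialization of Loc once and reports what happened.
    static synchronized String tryLoad() {
        if (result == null) {
            try {
                result = "loaded: " + Loc.defaultLoc.name;
            } catch (ExceptionInInitializerError e) {
                result = "init failed: " + e.getCause().getClass().getSimpleName();
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tryLoad());
    }
}
```

The exception surfaces as an <code>ExceptionInInitializerError</code> wrapping the NPE, and the class stays unusable afterwards, which is why such a bug looks like a class-loading failure.<br />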
<br />
<b><span class="Apple-style-span" style="font-size: large;">Time goes on...</span></b><br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhGP5gmH4b86HfXs7000B18LsnexTZpRhw6z7e4-K5SRdBefft3dGRs8g-Fa1nDTZeEMUtoTkX2ZbnkNtEmLFVLn_n8omYFqwkpMoGVLVWLpIrzrYwAuG3fWzX5mwR-vXseUOwQWfdE_rU/s1600/jdk7-signature.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhGP5gmH4b86HfXs7000B18LsnexTZpRhw6z7e4-K5SRdBefft3dGRs8g-Fa1nDTZeEMUtoTkX2ZbnkNtEmLFVLn_n8omYFqwkpMoGVLVWLpIrzrYwAuG3fWzX5mwR-vXseUOwQWfdE_rU/s400/jdk7-signature.png" width="400" /></a></div>On Thursday, lunch time CET, Java 7 was released by Oracle. Still hoping for a miracle, I downloaded the Windows x64 build from the <a href="http://www.oracle.com/technetwork/java/javase/downloads/index.html">official download</a> location. When I clicked on install, it reported: "this version of the JDK is already installed on your computer, do you want to reinstall?" I did so and was able to verify: it's exactly the same version as released on June 27th <i>(you can verify this, too: the timestamp on the signature of the Windows Installer EXE file is June 27th)</i>. So Oracle ignored all bugs (not only ours) found in the preview release and simply released a one-month-old package! So what was the point of the preview release? They could have released it one month earlier! It was certainly not intended for public review and bug hunting!<br />
<br />
In the evening (CET) I then opened a new issue (<a href="https://issues.apache.org/jira/browse/LUCENE-3349">LUCENE-3349</a>), as we had already agreed on IRC that we didn't want our users to crash their Solr installations and corrupt their indexes. Robert had spent the last days hunting the causes behind the <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">CheckIndex</span> failures in the faceting module <i>(the so-called index corru(m)ption bug)</i>. He was able to produce some random seeds that triggered the bug. In fact, it is a reincarnation of the well-known <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">readVInt</span>-bug (<a href="https://issues.apache.org/jira/browse/LUCENE-2975">LUCENE-2975</a>) we discovered before the release of Lucene 3.1. It affects the most frequently used Lucene method for reading index contents from disk: <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">DataInput.readVInt()</span>. For optimization purposes we have several variants of this method, depending on the underlying data structures. Java 6 JDKs only triggered this bug in combination with <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">MappedByteBuffer.get()</span>, but now it was triggered almost everywhere! Especially when the new <a href="http://blog.mikemccandless.com/2010/06/lucenes-pulsingcodec-on-primary-key.html">pulsing codec</a> was used (which has a <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">byte[]</span> variant of this method), all was <i>f*cked up!</i> StandardPostingsReader was merging index segments, the wrong results of <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">readVInt()</span> were copied to the newly created index segments, and in the end we produced corrupt indexes!<br />
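<br />
To show what kind of loop is at stake, here is a minimal sketch of variable-length integer decoding over a <code>byte[]</code>, using the common scheme of 7 payload bits per byte with the high bit marking "more bytes follow". The class <code>VIntSketch</code> and its method signatures are illustrative assumptions, not Lucene's actual <code>DataInput</code> implementation.<br />

```java
// Illustrative VInt encoder/decoder over a byte[]; not Lucene's real code.
class VIntSketch {
    // Writes a non-negative int starting at pos, low-order 7 bits first.
    // Returns the position after the last byte written.
    static int writeVInt(byte[] out, int pos, int value) {
        while ((value & ~0x7F) != 0) {
            out[pos++] = (byte) ((value & 0x7F) | 0x80); // high bit: more bytes follow
            value >>>= 7;
        }
        out[pos++] = (byte) value;
        return pos;
    }

    // Reads a value written by writeVInt, starting at pos.
    static int readVInt(byte[] in, int pos) {
        byte b = in[pos++];
        int value = b & 0x7F;
        // This small shift-and-or loop is the kind of code a JIT optimizes heavily.
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = in[pos++];
            value |= (b & 0x7F) << shift;
        }
        return value;
    }

    public static void main(String[] args) {
        byte[] buf = new byte[5];
        int end = writeVInt(buf, 0, 16384);
        System.out.println(end + " bytes, value " + readVInt(buf, 0));
    }
}
```

If the compiled loop returns a wrong value even occasionally, everything derived from it (offsets, counts, deltas) is wrong too, and a merge writes those wrong values into new segments.<br />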
<br />
<b><span class="Apple-style-span" style="font-size: large;">The warning mail was released...</span></b><br />
<br />
Shortly before midnight CET I sent the warning mail, prepared in <a href="https://issues.apache.org/jira/browse/LUCENE-3349">LUCENE-3349</a>, to the mailing lists:<br />
<blockquote><div class="MsoPlainText">Hello Apache Lucene & Apache Solr users, Hello users of other Java-based Apache projects,<o:p></o:p></div><div class="MsoPlainText"><br />
</div><div class="MsoPlainText">Oracle released Java 7 today. Unfortunately it contains hotspot compiler optimizations, which miscompile some loops. This can affect code of several Apache projects. Sometimes JVMs only crash, but in several cases, results calculated can be incorrect, leading to bugs in applications (see Hotspot bugs 7070134 [1], 7044738 [2], 7068051 [3]).<o:p></o:p></div><div class="MsoPlainText"><br />
</div><div class="MsoPlainText">Apache Lucene Core and Apache Solr are two Apache projects, which are affected by these bugs, namely all versions released until today. Solr users with the default configuration will have Java crashing with SIGSEGV as soon as they start to index documents, as one affected part is the well-known Porter stemmer (see LUCENE-3335 [4]). Other loops in Lucene may be miscompiled, too, leading to index corruption (especially on Lucene trunk with pulsing codec; other loops may be affected, too - LUCENE-3346 [5]).<o:p></o:p></div><div class="MsoPlainText"><br />
</div><div class="MsoPlainText">These problems were detected only 5 days before the official Java 7 release, so Oracle had no time to fix those bugs, affecting also many more applications. In response to our questions, they proposed to include the fixes into service release u2 (eventually into service release u1, see [6]). This means you cannot use Apache Lucene/Solr with Java 7 releases before Update 2! If you do, please don't open bug reports, it is not the committers' fault! At least disable loop optimizations using the -XX:-UseLoopPredicate JVM option to not risk index corruptions.<o:p></o:p></div><div class="MsoPlainText"><br />
</div><div class="MsoPlainText"><i>Please note:</i> Also Java 6 users are affected, if they use one of those JVM options, which are not enabled by default: -XX:+OptimizeStringConcat or -XX:+AggressiveOpts<o:p></o:p></div><div class="MsoPlainText"><br />
</div><div class="MsoPlainText">It is strongly recommended not to use any hotspot optimization switches in any Java version without extensive testing!<o:p></o:p></div><div class="MsoPlainText"><br />
</div><div class="MsoPlainText">In case you upgrade to Java 7, remember that you may have to reindex, as the unicode version shipped with Java 7 changed and tokenization behaves differently (e.g. lowercasing). For more information, read JRE_VERSION_MIGRATION.txt in your distribution package!<o:p></o:p></div><div class="MsoPlainText"><br />
</div><div class="MsoPlainText">On behalf of the Lucene project,<o:p></o:p></div><div class="MsoPlainText">Uwe<o:p></o:p></div><div class="MsoPlainText"><br />
</div><div class="MsoPlainText">[1] <a href="http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7070134">http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7070134</a><o:p></o:p></div><div class="MsoPlainText">[2] <a href="http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7044738">http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7044738</a><o:p></o:p></div><div class="MsoPlainText">[3] <a href="http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7068051">http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7068051</a><o:p></o:p></div><div class="MsoPlainText">[4] <a href="https://issues.apache.org/jira/browse/LUCENE-3335">https://issues.apache.org/jira/browse/LUCENE-3335</a><o:p></o:p></div><div class="MsoPlainText">[5] <a href="https://issues.apache.org/jira/browse/LUCENE-3346">https://issues.apache.org/jira/browse/LUCENE-3346</a><o:p></o:p></div><div class="MsoPlainText">[6] <a href="http://s.apache.org/StQ">http://s.apache.org/StQ</a></div></blockquote><br />
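The advice in the mail can be turned into a small startup check. The following is only a sketch under stated assumptions: <code>JvmWarningSketch</code> and its methods are hypothetical, the version string is assumed to look like <code>1.7.0</code> or <code>1.7.0_02</code>, and the rules merely encode what the mail says (Java 7 before Update 2 without <code>-XX:-UseLoopPredicate</code>, or Java 6 with one of the two optimization switches).<br />

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical startup check that encodes the advice from the warning mail.
class JvmWarningSketch {
    // Extracts the update number from a version such as "1.7.0_02"; 0 if absent.
    static int updateOf(String javaVersion) {
        int idx = javaVersion.indexOf('_');
        if (idx < 0) return 0;
        int end = idx + 1;
        while (end < javaVersion.length() && Character.isDigit(javaVersion.charAt(end))) {
            end++;
        }
        return end == idx + 1 ? 0 : Integer.parseInt(javaVersion.substring(idx + 1, end));
    }

    // True if the JVM matches one of the risky configurations named in the mail.
    static boolean needsMitigation(String javaVersion, List<String> jvmArgs) {
        if (javaVersion.startsWith("1.7.")) {
            return updateOf(javaVersion) < 2 && !jvmArgs.contains("-XX:-UseLoopPredicate");
        }
        if (javaVersion.startsWith("1.6.")) {
            return jvmArgs.contains("-XX:+OptimizeStringConcat")
                || jvmArgs.contains("-XX:+AggressiveOpts");
        }
        return false;
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println(version + " needs mitigation: " + needsMitigation(version, jvmArgs));
    }
}
```

Such a check is no substitute for testing; it only flags the configurations the mail warns about.<br />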
<b><span class="Apple-style-span" style="font-size: large;">What then happened is well-known:</span></b><br />
<br />
<ul><li>The first newsticker (heise online) published the message (because Europe wakes up earlier than the U.S.): "<a href="http://www.heise.de/newsticker/meldung/Java-7-legt-Lucene-und-Solr-lahm-1288143.html">Java 7 legt Lucene und Solr lahm</a>" <i>(they added some nice background information, so it was a really good article)</i>.</li>
<li>Shortly after, the Austrians posted a report, interestingly they combined our warning with the official release article about Java 7 <i>(Der Standard)</i>: "<a href="http://derstandard.at/1311802255132/Update-Oracle-stellt-Java-7-vor">Oracle stellt Java 7 vor</a>"</li>
<li>Soon after, German "JAXenter" / "IT Republic" published the news, this time directly mentioning myself (I wrote some Lucene-related articles for their journal "Java Magazin" in the past): "<a href="http://it-republik.de/jaxenter/news/Apache-warnt-vor-Bugs-in-JDK-7-059924.html">Apache warnt vor Bugs in JDK 7</a>"</li>
<li>Some Japanese guys also posted this. I have no idea what they wrote, but the message is the same: "<a href="http://lucene.jugem.jp/?eid=450">Lucene/SolrをJava 7で使うときの注意(あるいはJava 6以前でもホットスポットのバグを踏む可能性あり)</a>"</li>
<li>Now it was time for the British news, UK's "Heise Online" published the English variant of the first article mentioned above: <a href="http://www.h-online.com/open/news/item/Java-7-paralyses-Lucene-and-Solr-1288210.html">Java 7 paralyses Lucene and Solr</a> <i>(Very nice title! In fact this is a translation of my warning message from "German-American" English to real British English)</i>.</li>
<li>Shortly after that, JAXenter posted: "<a href="http://jaxenter.com/apache-warn-java-7-causes-bugs-in-some-apache-projects-37111.html">Java 7 Could Cause Bugs in Some Apache Projects</a>"</li>
<li>Followed by eWeek Europe: "<a href="http://www.eweekeurope.co.uk/news/apache-developers-java-7-contains-bugs-35619">Apache Developers: Java 7 Contains Bugs</a>"</li>
</ul><br />
<div><b>Until now, California was still sleeping...</b><br />
<br />
</div><div><ul><li>The first American newsticker was CNET: "<a href="http://news.cnet.com/8301-1001_3-20085536-92/oracle-releases-buggy-java-se7/">Oracle releases 'buggy' Java SE7</a>"</li>
<li>Followed by i-Programmer: "<a href="http://www.i-programmer.info/news/80-java/2803-java-se7-out-now-but-with-a-bug.html">Java SE7 out now but with a bug</a>"</li>
</ul></div><div><br />
<b>In late afternoon CET, California woke up:</b><br />
<br />
</div><div><ul><li>Now the biggest U.S. newsticker <i>(InfoWorld) </i>posted its article: "<a href="http://www.infoworld.com/t/java-programming/apache-and-oracle-warn-serious-java-7-compiler-bugs-168516">Apache and Oracle warn of serious Java 7 compiler bugs</a>" <i>(the funny thing here is that suddenly not only "Apache" warned about the Java 7 bugs, but also "Oracle" - looks like "whisper down the lane")</i></li>
</ul><br />
<b>On the evening before, Hoss already posted the following blog post on the Lucid Imagination website: "<a href="http://www.lucidimagination.com/blog/2011/07/28/dont-use-java-7-for-anything/">Don’t Use Java 7, For Anything</a>".</b><br />
<br />
</div><div><ul><li>After California woke up, someone submitted the story to Slashdot: "<a href="http://developers.slashdot.org/story/11/07/29/1639233/Java-7-Ships-With-Severe-Bug">Java 7 Ships With Severe Bug</a>", referring to Hoss' blog post.</li>
</ul><br />
<b>And then the whole thing went crazy: </b>On Twitter, new posts referring to Slashdot, InfoWorld, and, finally, Hoss' blog post on Lucid Imagination appeared every few seconds. There were more tweets stating "Don't use Java 7, For Anything" than tweets about "the first Java release in 4 years".</div><div><br />
</div><div>A little bit later I noticed the first pro-Oracle blog trying to explain some background information: "<a href="http://blog.eisele.net/2011/07/dont-use-java-7-are-you-kidding-me.html">Don't Use Java 7? Are you kidding me?</a>" (Markus Eisele). Thanks for posting this!<br />
<br />
I went to sleep and <b>on the following day, the original Oracle Bug report that caused this was upgraded to priority "HIGH" - yeah. So we will hopefully get a corrected Java 7 release quite soon in Update Pack 1!</b><br />
<br />
Finally I wanted to say thank you to the other Lucene committers that helped during investigation: <a href="http://rcmuir.wordpress.com/">Robert Muir</a>, <a href="http://yonik.wordpress.com/">Yonik Seeley</a>, <a href="http://www.dawidweiss.com/">Dawid Weiss</a>, and <a href="http://blog.mikemccandless.com/">Mike McCandless</a>. And of course <a href="http://www.lucidimagination.com/blog/author/hossman/">Hoss</a> for his funny caption on Lucid Imagination's blog!<br />
<br />
<span class="Apple-style-span" style="font-size: large;"><b>Update:</b> </span>(2011-08-01)<br />
<br />
On Saturday morning CET, Cay Horstmann, professor of computer science at San Jose State University, compared the JDK 7 bugs with the <a href="http://en.wikipedia.org/wiki/Pentium_FDIV_bug">Pentium FDIV bug</a> of 1994. In his blog article "<a href="http://weblogs.java.net/blog/cayhorstmann/archive/2011/07/29/java-7-unsafe-any-speed">Java 7 Unsafe at Any Speed?</a>", he stated that SIGSEGVs are easy bugs; much more serious are hidden bugs that appear only under certain conditions and then silently produce wrong computation results. On Sunday evening CET, a user asked on Stack Overflow: "<a href="http://stackoverflow.com/questions/6894104/how-serious-is-the-java7-solr-lucene-bug">How serious is the Java7 'Solr/Lucene' bug?</a>" In his response, Robert Muir described his attempts to track down Java's "pentium bug" and circumvent it, without success <i>(see above)</i>.<br />
<br />
<span class="Apple-style-span">Luckily, Oracle raised the priority of the SIGSEGV bug (<a href="http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7070134">#7070134</a>) to "HIGH", but the other two bugs are still at "MEDIUM". One of them (<a href="http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7044738">#7044738</a>) is exactly the kind of bug that Cay Horstmann described in his blog post. We applied the patches for all three bugs to our JVM installation; the "HIGH" priority one only fixes the SIGSEGV. Since our wrong </span><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">readVInt</span><span class="Apple-style-span"> calculations in the method's loop are only fixed by the combination of all three patches, wouldn't it be a good idea for Oracle to rate all three issues with priority "HIGH"? Otherwise we might get service pack 1 with only the SIGSEGV bug fixed, while it still silently produces corrupt indexes.</span><br />
<br />
On Monday, the German "JAXenter" / "IT Republic" published a nice article: "<a href="http://it-republik.de/jaxenter/news/Wie-gravierend-sind-die-Bugs-in-JDK7-wirklich-059938.html">Wie gravierend sind die Bugs in JDK7 wirklich?</a>", referring to the above posts. There is also an English version available: "<a href="http://jaxenter.com/java-7-causes-headaches-for-lucene-and-solr-users-37195.html">Java 7 Causes Headaches for Lucene and Solr Users</a>".<br />
<br />
<div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><span class="Apple-style-span" style="font-size: large;"><b>Update:</b> </span>(2011-08-03)</div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><br />
</div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;">Yesterday I had some interviews with journalists/bloggers:</div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"></div><ul><li>One of them was again on the German "JAXenter" / "IT Republic": "<a href="http://it-republik.de/jaxenter/artikel/JDK7-Bugs-im-Fokus-Wenn-die-Fehler-auftreten-sind-sie-schwerwiegend-3981.html">JDK7-Bugs im Fokus: 'Wenn die Fehler auftreten, sind sie schwerwiegend'</a>". They also published the view of Markus Eisele: "<a href="http://it-republik.de/jaxenter/artikel/JDK7-Bugs-im-Fokus-Ich-gehe-davon-aus-dass-aktuell-nicht-viele-Anwender-betroffen-sein-werden-3982.html">JDK7-Bugs im Fokus: 'Ich gehe davon aus, dass aktuell nicht viele Anwender betroffen sein werden'</a>"</li>
<li>The Silicon Valley analyst, John K. Waters, published in his blog the following interview: "<a href="http://adtmag.com/Blogs/WatersWorks/2011/08/Uwe-Geir-on-Java-7-Bug.aspx">More on Java 7 Bug: Q&A with Uwe, Comments from Geir</a>"</li>
</ul><br />
<div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><span class="Apple-style-span" style="font-size: large;"><b>Update:</b> </span>(2011-08-04)</div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><br />
</div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;">Today, Neil McAllister published "<a href="http://www.infoworld.com/d/application-development/oracle-javas-worst-enemy-168828">Oracle: Java's worst enemy</a>" on the InfoWorld newsticker. When California woke up in the evening CET, someone posted this article on Slashdot with the title "<a href="http://developers.slashdot.org/story/11/08/04/1427213/Oracles-Java-Policies-Are-Destroying-the-Community">Oracle's Java Policies Are Destroying the Community</a>", resulting in a heated discussion and a high tweet rate again.<br />
<div><br />
</div><div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><span class="Apple-style-span" style="font-size: large;"><b>Update:</b> </span>(2011-08-10)</div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><br />
</div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;">Over the last few days, community members and journalists have been blogging about the background:<br />
<ul><li>Andrew Binstock (Chief editor, Dr. Dobbs Journal): "<a href="http://drdobbs.com/java/231300060">Sloppy Work at Oracle</a>"</li>
<li>Richard Mayhew (The Serverside): "<a href="http://www.theserverside.com/discussions/thread.tss?thread_id=62764">Lucene should just shut up about Java 7</a>"</li>
<li>Ted Neward (DZone): "<a href="http://www.dzone.com/links/r/of_communities_companies_and_bugs_or_dr_dobbs_jou.html">Of communities, companies, and bugs (Or, "Dr Dobbs Journal is a slut!")</a>"</li>
<li>A nice conclusion was published on German JAXenter: "<a href="http://it-republik.de/jaxenter/news/Who-is-bad-Von-Java-7-Bugs-und-boesen-Buben-060011.html">Who is bad? Von Java 7, Bugs und bösen Buben</a>"</li>
<li>Jessica Thornsby (UK JAXenter) posted: "<a href="http://jaxenter.com/java-7-bugs-should-the-release-have-been-delayed-37370.html">Java 7 Bugs: Should the Release Have Been Delayed?</a>"</li>
</ul>Two days ago, Oracle offered some committers access to early builds of Java SE before they are released. This would allow us to check the compatibility of bugfix releases and service packs of Java SE with Apache Lucene and Solr. All this is covered by their <a href="http://www.oracle.com/technetwork/java/index-jsp-137266.html">Java CAP (Compatibility and Performance Program)</a>.<br />
<div><br />
</div><div></div></div></div><div></div></div><br />
<div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><span class="Apple-style-span" style="font-size: large;"><b>Update:</b> </span>(2011-08-12)</div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><br />
</div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;">Oracle published a <a href="http://blogs.oracle.com/henrik/entry/java_7_questions_answers">FAQ about Java 7</a> that clarifies when and how the Lucene-related fixes in Java 7 will be released. They also mentioned another bug (<a href="http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7077439">#7077439</a>) that they investigated while fixing the others. Unfortunately the bug page is not accessible to the public, but the fix and an explanation have already been <a href="http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2011-August/006055.html">reviewed</a> and <a href="http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2011-August/006059.html">committed</a> to OpenJDK. Does this mean Oracle has started to test Lucene builds with their JDK?</div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"></div><ul></ul></div>Uwe Schindler (Bremen, Germany)