Java Resources

The Latest Java Topics

We recently adopted the use of Spring Data. Spring Data provides a nice pattern/API that you can layer on top of JPA to eliminate boiler-plate code. With that adoption, we started looking at the DAO layer we use against Cassandra for some of our operations. Some of the data we store in Cassandra is simple. It does *not* leverage the flexible nature of NoSQL. In other words, we know all the table names, the column names ahead of time, and we don't anticipate them changing all that often. We could have stored this data in an RDBMs, using hibernate to access it, but standing up another persistence mechanism seemed like overkill. For simplicity's sake, we preferred storing this data in Cassandra. That said, we want the flexibility to move this to an RDBMs if we need to. Enter JPA. JPA would provide us a nice layer of abstraction away from the underlying storage mechanism. Wouldn't it be great if we could annotate the objects with JPA annotations, and persist them to Cassandra? Enter Kundera. Kundera is a JPA implementation that supports Cassandra (among other storage mechanisms). OK -- so JPA is great, and would get us what we want, but we had just adopted the use of Spring Data. Could we use both? The answer is "sort of". I forked off SpringSource's spring-data-cassandra: https://github.com/boneill42/spring-data-cassandra And I started hacking on it. I managed to get an implementation of the PagingAndSortingRepository for which I wrote unit tests that worked, but I was duplicating a lot of what should have come for free in the SimpleJpaRepository. When I tried to substitute my CassandraJpaRepository for the SimpleJpaRepository, I ran into some trouble w/ Kundera. Specifically, the MetaModel implementation appeared to be incomplete. MetaModelImpl was returning null for all managedTypes(). SimpleJpa wasn't too happy with this. Instead of wrangling with Kundera, we punted. We can achieve enough of the value leveraging JPA directly. Perhaps more importantly, there is still an impedance mismatch between JPA and NoSQL. In our case, it would have been nice to get at Cassandra through Spring Data using JPA for a few cases in our app, but for the vast majority of the application, a straight up ORM layer whereby we know the tables, rows and column names ahead of time is insufficient. For those cases where we don't know the schema ahead of time, we're going to need to leverage the converters pattern in Spring Data. So, I started hacking on a proper Spring Data layer using Astyanax as the client. Follow along here: https://github.com/boneill42/spring-data-cassandra More to come on that....

July 31, 2012

by Brian O' Neill

· 29,898 Views

Use Lucene’s MMapDirectory on 64bit Platforms, Please!

Don’t be afraid – Some clarification to common misunderstandings Since version 3.1, Apache Lucene and Solr use MMapDirectory by default on 64bit Windows and Solaris systems; since version 3.3 also for 64bit Linux systems. This change lead to some confusion among Lucene and Solr users, because suddenly their systems started to behave differently than in previous versions. On the Lucene and Solr mailing lists a lot of posts arrived from users asking why their Java installation is suddenly consuming three times their physical memory or system administrators complaining about heavy resource usage. Also consultants were starting to tell people that they should not use MMapDirectory and change their solrconfig.xml to work instead with slow SimpleFSDirectory or NIOFSDirectory (which is much slower on Windows, caused by a JVM bug #6265734). From the point of view of the Lucene committers, who carefully decided that using MMapDirectory is the best for those platforms, this is rather annoying, because they know, that Lucene/Solr can work with much better performance than before. Common misinformation about the background of this change causes suboptimal installations of this great search engine everywhere. In this blog post, I will try to explain the basic operating system facts regarding virtual memory handling in the kernel and how this can be used to largely improve performance of Lucene (“VIRTUAL MEMORY for DUMMIES”). It will also clarify why the blog and mailing list posts done by various people are wrong and contradict the purpose of MMapDirectory. In the second part I will show you some configuration details and settings you should take care of to prevent errors like “mmap failed” and suboptimal performance because of stupid Java heap allocation. Virtual Memory[1] Let’s start with your operating system’s kernel: The naive approach to do I/O in software is the way, you have done this since the 1970s – the pattern is simple: whenever you have to work with data on disk, you execute a syscall to your operating system kernel, passing a pointer to some buffer (e.g. a byte[] array in Java) and transfer some bytes from/to disk. After that you parse the buffer contents and do your program logic. If you don’t want to do too many syscalls (because those may cost a lot processing power), you generally use large buffers in your software, so synchronizing the data in the buffer with your disk needs to be done less often. This is one reason, why some people suggest to load the whole Lucene index into Java heap memory (e.g., by using RAMDirectory). But all modern operating systems like Linux, Windows (NT+), MacOS X, or Solaris provide a much better approach to do this 1970s style of code by using their sophisticated file system caches and memory management features. A feature called “virtual memory” is a good alternative to handle very large and space intensive data structures like a Lucene index. Virtual memory is an integral part of a computer architecture; implementations require hardware support, typically in the form of a memory management unit (MMU) built into the CPU. The way how it works is very simple: Every process gets his own virtual address space where all libraries, heap and stack space is mapped into. This address space in most cases also start at offset zero, which simplifies loading the program code because no relocation of address pointers needs to be done. Every process sees a large unfragmented linear address space it can work on. It is called “virtual memory” because this address space has nothing to do with physical memory, it just looks like so to the process. Software can then access this large address space as if it were real memory without knowing that there are other processes also consuming memory and having their own virtual address space. The underlying operating system works together with the MMU (memory management unit) in the CPU to map those virtual addresses to real memory once they are accessed for the first time. This is done using so called page tables, which are backed by TLBs located in the MMU hardware (translation lookaside buffers, they cache frequently accessed pages). By this, the operating system is able to distribute all running processes’ memory requirements to the real available memory, completely transparent to the running programs. Schematic drawing of virtual memory (image from Wikipedia [1], http://en.wikipedia.org/wiki/File:Virtual_memory.svg, licensed by CC BY-SA 3.0) By using this virtualization, there is one more thing, the operating system can do: If there is not enough physical memory, it can decide to “swap out” pages no longer used by the processes, freeing physical memory for other processes or caching more important file system operations. Once a process tries to access a virtual address, which was paged out, it is reloaded to main memory and made available to the process. The process does not have to do anything, it is completely transparent. This is a good thing to applications because they don’t need to know anything about the amount of memory available; but also leads to problems for very memory intensive applications like Lucene. Lucene & Virtual Memory Let’s take the example of loading the whole index or large parts of it into “memory” (we already know, it is only virtual memory). If we allocate a RAMDirectory and load all index files into it, we are working against the operating system: The operating system tries to optimize disk accesses, so it caches already all disk I/O in physical memory. We copy all these cache contents into our own virtual address space, consuming horrible amounts of physical memory (and we must wait for the copy operation to take place!). As physical memory is limited, the operating system may, of course, decide to swap out our large RAMDirectory and where does it land? – On disk again (in the OS swap file)! In fact, we are fighting against our O/S kernel who pages out all stuff we loaded from disk [2]. So RAMDirectory is not a good idea to optimize index loading times! Additionally, RAMDirectory has also more problems related to garbage collection and concurrency. Because the data residing in swap space, Java’s garbage collector has a hard job to free the memory in its own heap management. This leads to high disk I/O, slow index access times, and minute-long latency in your searching code caused by the garbage collector driving crazy. On the other hand, if we don’t use RAMDirectory to buffer our index and use NIOFSDirectory or SimpleFSDirectory, we have to pay another price: Our code has to do a lot of syscalls to the O/S kernel to copy blocks of data between the disk or filesystem cache and our buffers residing in Java heap. This needs to be done on every search request, over and over again. Memory Mapping Files The solution to the above issues is MMapDirectory, which uses virtual memory and a kernel feature called “mmap” [3] to access the disk files. In our previous approaches, we were relying on using a syscall to copy the data between the file system cache and our local Java heap. How about directly accessing the file system cache? This is what mmap does! Basically mmap does the same like handling the Lucene index as a swap file. The mmap() syscall tells the O/S kernel to virtually map our whole index files into the previously described virtual address space, and make them look like RAM available to our Lucene process. We can then access our index file on disk just like it would be a large byte[] array (in Java this is encapsulated by a ByteBuffer interface to make it safe for use by Java code). If we access this virtual address space from the Lucene code we don’t need to do any syscalls, the processor’s MMU and TLB handles all the mapping for us. If the data is only on disk, the MMU will cause an interrupt and the O/S kernel will load the data into file system cache. If it is already in cache, MMU/TLB map it directly to the physical memory in file system cache. It is now just a native memory access, nothing more! We don’t have to take care of paging in/out of buffers, all this is managed by the O/S kernel. Furthermore, we have no concurrency issue, the only overhead over a standard byte[] array is some wrapping caused by Java’s ByteBuffer interface (it is still slower than a real byte[] array, but that is the only way to use mmap from Java and is much faster than all other directory implementations shipped with Lucene). We also waste no physical memory, as we operate directly on the O/S cache, avoiding all Java GC issues described before. What does this all mean to our Lucene/Solr application? We should not work against the operating system anymore, so allocate as less as possible heap space (-Xmx Java option). Remember, our index accesses rely on passed directly to O/S cache! This is also very friendly to the Java garbage collector. Free as much as possible physical memory to be available for the O/S kernel as file system cache. Remember, our Lucene code works directly on it, so reducing the number of paging/swapping between disk and memory. Allocating too much heap to our Lucene application hurts performance! Lucene does not require it with MMapDirectory. Why does this only work as expected on operating systems and Java virtual machines with 64bit? One limitation of 32bit platforms is the size of pointers, they can refer to any address within 0 and 232-1, which is 4 Gigabytes. Most operating systems limit that address space to 3 Gigabytes because the remaining address space is reserved for use by device hardware and similar things. This means the overall linear address space provided to any process is limited to 3 Gigabytes, so you cannot map any file larger than that into this “small” address space to be available as big byte[] array. And when you mapped that one large file, there is no virtual space (address like “house number”) available anymore. As physical memory sizes in current systems already have gone beyond that size, there is no address space available to make use for mapping files without wasting resources (in our case “address space”, not physical memory!). On 64bit platforms this is different: 264-1 is a very large number, a number in excess of 18 quintillion bytes, so there is no real limit in address space. Unfortunately, most hardware (the MMU, CPU’s bus system) and operating systems are limiting this address space to 47 bits for user mode applications (Windows: 43 bits) [4]. But there is still much of addressing space available to map terabytes of data. Common misunderstandings If you have read carefully what I have told you about virtual memory, you can easily verify that the following is true: MMapDirectory does not consume additional memory and the size of mapped index files is not limited by the physical memory available on your server. By mmap() files, we only reserve address space not memory! Remember, address space on 64bit platforms is for free! MMapDirectory will not load the whole index into physical memory. Why should it do this? We just ask the operating system to map the file into address space for easy access, by no means we are requesting more. Java and the O/S optionally provide the option to try loading the whole file into RAM (if enough is available), but Lucene does not use that option (we may add this possibility in a later version). MMapDirectory does not overload the server when “top” reports horrible amounts of memory. “top” (on Linux) has three columns related to memory: “VIRT”, “RES”, and “SHR”. The first one (VIRT, virtual) is reporting allocated virtual address space (and that one is for free on 64 bit platforms!). This number can be multiple times of your index size or physical memory when merges are running in IndexWriter. If you have only one IndexReader open it should be approximately equal to allocated heap space (-Xmx) plus index size. It does not show physical memory used by the process. The second column (RES, resident) memory shows how much (physical) memory the process allocated for operating and should be in the size of your Java heap space. The last column (SHR, shared) shows how much of the allocated virtual address space is shared with other processes. If you have several Java applications using MMapDirectory to access the same index, you will see this number going up. Generally, you will see the space needed by shared system libraries, JAR files, and the process executable itself (which are also mmapped). How to configure my operating system and Java VM to make optimal use of MMapDirectory? First of all, default settings in Linux distributions and Solaris/Windows are perfectly fine. But there are some paranoid system administrators around, that want to control everything (with lack of understanding). Those limit the maximum amount of virtual address space that can be allocated by applications. So please check that “ulimit -v” and “ulimit -m” both report “unlimited”, otherwise it may happen that MMapDirectory reports “mmap failed” while opening your index. If this error still happens on systems with lot’s of very large indexes, each of those with many segments, you may need to tune your kernel parameters in /etc/sysctl.conf: The default value of vm.max_map_count is 65530, you may need to raise it. I think, for Windows and Solaris systems there are similar settings available, but it is up to the reader to find out how to use them. For configuring your Java VM, you should rethink your memory requirements: Give only the really needed amount of heap space and leave as much as possible to the O/S. As a rule of thumb: Don’t use more than ¼ of your physical memory as heap space for Java running Lucene/Solr, keep the remaining memory free for the operating system cache. If you have more applications running on your server, adjust accordingly. As usual the more physical memory the better, but you don’t need as much physical memory as your index size. The kernel does a good job in paging in frequently used pages from your index. A good possibility to check that you have configured your system optimally is by looking at both "top" (and correctly interpreting it, see above) and the similar command "iotop" (can be installed, e.g., on Ubuntu Linux by "apt-get install iotop"). If your system does lots of swap in/swap out for the Lucene process, reduce heap size, you possibly used too much. If you see lot's of disk I/O, buy more RUM (Simon Willnauer) so mmapped files don't need to be paged in/out all the time, and finally: buy SSDs. Happy mmapping! Bibliography [1] http://en.wikipedia.org/wiki/Virtual_memory [2] https://www.varnish-cache.org/trac/wiki/ArchitectNotes [3] http://en.wikipedia.org/wiki/Memory-mapped_file [4] http://en.wikipedia.org/wiki/X86-64#Virtual_address_space_details

July 31, 2012

by Uwe Schindler

· 13,317 Views · 1 Like

How Many Java developers are There in the World?

Oracle says it’s 9,000,000. Wikipedia claims it’s 10,000,000. And the guys from NumberOf.net seem to be the most precise - they know that there are exactly 9,007,346 Java developers out there. Nice numbers. I have used those articles as reference points while speaking about the potential market size for our memory leak detection tool. But something in these numbers has bothered me for years - there is no trustworthy and public analysis behind those numbers. Its just conjured up from thin air. So I finally thought I would do something about it and try to figure it out for good. It proved out to be a challenging task. After all - with more than seven billion people on our planet I couldn't call everyone and ask them. Well, maybe I could, but if every call would take on average 20 seconds I would need at least 4,439 years to complete the survey. If I would not sleep nor eat nor rest. So I had to use other ways for estimation. After playing around with different sources of information, I decided to dig into four of them for a closer look: Labour statistics provided by different governments Language popularity sites such as Tiobe and Langpop Employment portals using Indeed.com and Monster.com Download numbers on popular Java tools and libraries - namely Eclipse and Tomcat. Using that information I wanted to estimate the number using three different calculations - based on language popularity indexes, labour statistics and download figures. So, here we go. How many programmers could there be in total? World population is currently above seven billion. Out of those seven billion we can leave out sub-Saharan Africa (900M) and rural Asia (about 50% of its 2.2B population) as negligible. This leaves us with approximately 5 billion people living in regions where overall economical and cultural background can be considered suitable for software industries to spawn. Now, out of those 5,000,000,000 how many could be actually developing software? A good answer at StackExchange gives us some pointers as to where we can find information on the percentage of software developers in different countries. Using the US, Japan, Canada, the EU27 and the UK as a baseline we can estimate that 0.82% of the population is employed as a software developer or programmer: Country Population Developers % Canada 33,476,688 387,000 1.16% EU27 502,486,499 5,900,000 1.17% Japan 127,799,000 1,016,929 0.80% UK 63,162,000 333,000 0.53% US 313,931,000 1,336,300 0.43% Weighted average: 0.86% 0.86% out of five billion is 43,000,000. Lets remember this number, as it will be used as a baseline in following calculations. Popularity contests In the popularity contest we will use two channels for the source of data - the TIOBE index and the Langpop one. Other sources such as Dataist figures were hard to interpret, so we’ll stick just to those two. For the background - the TIOBE ratings are calculated by counting hits of the most popular search engines. The search query that is used is +" programming", e.g. +“Java programming” in our case. Langpop uses more sources for input besides search engine queries - in equal weights it traces open job positions, book titles, search engine results, the number of open source projects and other data to calculate its popularity score. Simplifying TIOBE and Langpop results, we can conclude that according to TIOBE 17% and according to Langpop ~15% of the programmers in the world are using Java. Averaging those numbers we can say that around 16% out of the 43,000,000 developers in the world use Java. This translates to 6,880,000 Java developers out there. Job portals Job portals, especially when considering both available positions and uploaded resumes, are definitely a good source of information. The larger ones also provide nice reports on labour market, which we will dig into next. Note that we used Indeed.com and Monster.com - if you can point us towards more and/or better sources of information, we would be glad to correct our calculations. But using this analysis from Monster.com and the aggregated statistics from Indeed.com we can say that ~18% of Monster.com applicants can program in Java and ~16% of open engineering / programming positions scanned by Indeed.com are looking for Java talent. Averaging those numbers we arrive at 17%. Which out of 41,000,000 programmers in total would translate to 7,310,000 Java guys and girls in the world. Software downloads Every Java developer uses something to build the application. Well, we expect them to use at least a JVM and a compiler. If you happen to know anyone who can get away without those two, please let us know. We would hire him immediately. But most of us tend to use more than just a compiler and a virtual machine. We use IDEs, application servers, build tools, etc. So we figured that we would look into the publicly available download numbers of these tools and try to estimate the number of developers from the download numbers. When calculating the total number of developers from estimated number of users, we take into account the market share of the corresponding software. To estimate the market share we use Zeroturnaround’s statistics gathered in the spring of 2012. Eclipse downloads. Eclipse Juno was released on June 27 and has been downloaded 1,200,000 times during the first 20 days. Looking into the historical data published by eclipse.org we can predict that Juno will be downloaded approximately 8,000,000 times in total. Last four major Eclipse releases have all been released using a yearly release calendar and all the releases took place in June: Juno - 8,000,000 (in a year, expecting the trend to continue. Currently has 1,200,000 downloads in first 20 days). Indigo - 6,000,000 downloads Helios - 4,100,000 downloads Galileo - 2,200,000 downloads Averaging Juno estimates and Indigo results, we can say that Eclipse is downloaded approximately 7,000,000 times a year. Using the Zeroturnaround’s statistics, we expect 68% of Java developers to use Eclipse as a (primary) IDE. If we now make a bold claim that each Java developer on Eclipse will download the IDE exactly once a year, expect the number of downloads per year to be 7,000,000 and consider that 32% of Java developers do not use Eclipse at all, we come to a conclusion that there should be 10,300,00 Java developers in total. Apache Tomcat downloads. Vadim Gritsenko has put together some nice statistics on top of Apache logs. From there we can see that during the last year Tomcat has been downloaded approximately 550,000 times/month. This gives us a yearly total of 6,600,000 Tomcat downloads. Applying now statistics from the same report used for calculating Eclipse’s market share we can estimate that 59% of Java developers are using Tomcat as one of their development platform. If we now again make a bold claim that each Java developer on Tomcat will download every major release exactly once and consider that 41% of Java developers do not use Tomcat, we reach to conclusion that there should be 11,186,000 Java developers out there. Averaging the numbers from Eclipse and Tomcat downloads, we end up with 10,743,000 Java developers. Conclusions We used three different sources for estimation - popularity contests, job market analysis and download numbers of popular Java development infrastructure products. The numbers varied quite a bit - from 6,880,000 to 10,743,000. Aggressively averaging the three numbers we can conclude that there are 8,311,000 Java developers out there. Not quite as much as Oracle or Wikipedia think, but still enough to build a business that provides developing tools for the Java community. Lies. Damn lies. And statistics.

July 20, 2012

by Nikita Salnikov-Tarnovski

· 23,831 Views

How Changing Java Package Names Transformed my System Architecture

Changing your perspective even a small amount can have profound effects on how you approach your system. Let’s say you’re writing a web application in Java. In the system you deal with orders, customers and products. As a web application, your classes include staples like PersonController, PersonRepository, CustomerController and OrderService. How do you organize your classes into packages? There are two fundamental ways to structure your packages. Either you can focus on the logical tiers, like com.brodwall.myapp.controllers, com.brodwall.myapp.domain or perhaps com.brodwall.myapp.services.customer. Or you can focus on the domain contexts, like com.brodwall.myapp.customer, com.brodwall.myapp.orders and com.brodwall.myapp.products. The first approach is by far the most prevalent. In my view, it’s also the least helpful. Here are some ways your thinking changes if you structure your packages around domain concepts, rather than technological tiers: First, and most fundamentally, your mental model will now be aligned with that of the users of your system. If you’re asked to implement a typical feature, it is now more likely to be focused around a strict subset of the packages of your system. For example, adding a new field to a form will at least affect the presentation logic, entity and persistence layer for the corresponding domain concept. If your packages are organized around tiers, this change will hit all over your system. In a word: A system organized around features, rather than technologies, have higher coherence. This technical term means that a large percentage of a the dependencies of a class are located close to that class. Secondly, organizing around domain concepts will give you more options when your software grows. When a package contains tens of classes, you may want to split it up in several packages. The discussion can itself be enlightening. “Maybe we should separate out the customer address classes into a com.brodwall.myapp.customer.address package. It seems to have a bit of a life on its own.” “Yeah, and maybe we can use the same classes for other places we need addresses, such as suppliers?” “Cool, so com.brodwall.myapp.address, then?” Or maybe you decide that order status codes and payment status codes deserve to be in the “com.brodwall.myapp.order.codes” package. On the other hand, what options do you have for splitting up com.brodwall.myapp.controllers? You could create subpackages for customer, orders and products, but these subpackages may only have one or possibly two classes each. Finally, and perhaps most intriguingly, using domain concepts for packages allows you to vary the design according on a case by case basis. Maybe you really need a OrderService which coordinates the payment and shipping of an order, while ProductController only needs basic create-retrieve-update-delete functionality with a repository. A ProductService would just get in the way. If ProductService is missing from the com.brodwall.myapp.services package, this may be confusing or at the very least give you a nagging feeling that something is wrong. On the other hand, if there’s no Controller in the com.brodwall.myapp.product package, it doesn’t matter much. Also, most systems have some good parts and some not-so-good parts. If your Services package is not working for you, there’s not much you can do. But if the Products package is rotten, you can throw it out and reimplement it without the whole system being thrown into a state of chaos. By putting the classes needed to implement a feature together with each other and apart from the classes needed to implement other features, developers can be pragmatic and innovative when developing one feature without negatively affecting other features. The flip side of this is that most developers are more comfortable with some technologies in the application and less comfortable with other technologies. Organizing around features instead of technologies force each developer to consider a larger set of technological challenges. Some programmers take this as a motivating challenge to learn, while others, it seems, would rather not have to learn something new. If it were my money being spend to create features, I know what kind of developer I would want. Trivial changes can have large effects. By organizing your software around features, you get a more coherent system that allows for growth. It may challenge your developers, but it drives down the number of hand-offs needed to implement a feature and it challenges the developers to improve the parts of the application they are working on. See also my blog post on Architecture as tidying up.

July 20, 2012

by Johannes Brodwall

· 16,884 Views

Spring Data - Apache Hadoop

Spring for Apache Hadoop is a Spring project to support writing applications that can benefit of the integration of Spring Framework and Hadoop. This post describes how to use Spring Data Apache Hadoop in an Amazon EC2 environment using the “Hello World” equivalent of Hadoop programming – a Wordcount application. 1./ Launch an Amazon Web Services EC2 instance. - Navigate to AWS EC2 Console (“https://console.aws.amazon.com/ec2/home”): - Select Launch Instance then Classic Wizzard and click on Continue. My test environment was a “Basic Amazon Linux AMI 2011.09″ 32-bit., Instant type: Micro (t1.micro , 613 MB), Security group quick-start-1 that enables ssh to be used for login. Select your existing key pair (or create a new one). Obviously you can select another AMI and instance types depending on your favourite flavour. (Should you vote for Windows 2008 based instance, you also need to have cygwin installed as an additional Hadoop prerequisite beside Java JDK and ssh, see “Install Apache Hadoop” section) 2./ Download Apache Hadoop - as of writing this article, 1.0.0 is the latest stable version of Apache Hadoop, that is what was used for testing purposes. I downloaded hadoop-1.0.0.tar.gz and copied it into /home/ec2-user directory using pscp command from my PC running Windows: c:\downloads>pscp -i mykey.ppk hadoop-1.0.0.tar.gz ec2-user@ec2-176-34-201-185.eu-west-1.compute.amazonaws.com:/home/ec2-user (the computer name above – ec2-ipaddress-region-compute.amazonaws.com – can be found on AWS EC2 console, Instance Description, public DNS field) 3./ Install Apache Hadoop: As prerequisites, you need to have Java JDK 1.6 and ssh installed, see Apache Single-Node Setup Guide. (ssh is automatically installed with Basic Amazon AMI). Then install hadoop itself: $ cd ~ # change directory to ec2-user home (/home/ec2-user) $ tar xvzf hadoop-1.0.0.tar.gz $ ln -s hadoop-1.0.0 hadoop $ cd hadoop/conf $ vi hadoop-env.sh # edit as below export JAVA_HOME=/opt/jdk1.6.0_29 $ vi core-site.xml # edit as below – this defines the namenode to be running on localhost and listeing to port 9000. fs.default.name hdfs://localhost:9000 $ vi hdsf-site.xml # edit as below this defines that file system replicate is 1 (in production environment it is supposed to be 3 by default) dfs.replication 1 $ vi mapred-site.xml # edit as below – this defines the jobtracker to be running on localhost and listeing to port 9001. mapred.job.tracker localhost:9001 $ cd ~/hadoop $ bin/hadoop namenode -format $ bin/start-all.sh At this stage all hadoop jobs are running in pseudo distributed mode, you can verify it by running: $ ps -ef | grep java You should see 5 java processes: namenode, secondarynamenode, datanode, jobtracker and tasktracker. 4./ Install Spring Data Hadoop Download Spring Data Hadoop package from SpringSource community download site. As of writing this article, the latest stable version is spring-data-hadoop-1.0.0.M1.zip. $ cd ~ $ tar xzvf spring-data-hadoop-1.0.0.M1.zip $ ln -s spring-data-hadoop-1.0.0.M1 spring-data-hadoop 5./ Build and Run Spring Data Hadoop Wordcount example $ cd spring-data-hadoop/spring-data-hadoop-1.0.0.M1/samples/wordcount Spring Data Hadoop is using gradle as build tool. Check build.grandle build file. The original version packaged in the tar.gz file does not compile, it complains about thrift, version 0.2.0 and jdo2-api, version2.3-ec. Add datanucleus.org maven repository to the build.gradle file to support jdo2-api (http://www.datanucleus.org/downloads/maven2/) . Unfortunatelly, there seems to be no maven repo for thrift 0.2.0 . You should download thrift 0.2.0.jar and thrift.0.2.0.pom file e.g. from this repo: “http://people.apache.org/~rawson/repo“ and then add it to local maven repo. $ mvn install:install-file -DgroupId=org.apache.thrift -DartifactId=thrift -Dversion=0.2.0 -Dfile=thrift-0.2.0.jar -Dpackaging=jar $ vi build.grandle # modify the build file to refer to datanucleus maven repo for jdo2-api and the local repo for thrift repositories { // Public Spring artefacts mavenCentral() maven { url “http://repo.springsource.org/libs-release” } maven { url “http://repo.springsource.org/libs-milestone” } maven { url “http://repo.springsource.org/libs-snapshot” } maven { url “http://www.datanucleus.org/downloads/maven2/” } maven { url “file:///home/ec2-user/.m2/repository” } } I also modified the META-INF/spring/context.xml file in order to run hadoop file system commands manually: $ cd /home/ec2-user/spring-data-hadoop/spring-data-hadoop-1.0.0.M1/samples/wordcount/src/main/resources $vi META-INF/spring/context.xml # remove clean-script and also the dependency on it for JobRunner. xmlns=”http://www.springframework.org/schema/beans” xmlns:xsi=”http://www.w3.org/2001/XMLSchema-instance” xmlns:context=”http://www.springframework.org/schema/context” xmlns:hdp=”http://www.springframework.org/schema/hadoop” xmlns:p=”http://www.springframework.org/schema/p” xsi:schemaLocation=”http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context.xsd http://www.springframework.org/schema/hadoop http://www.springframework.org/schema/hadoop/spring-hadoop.xsd”> fs.default.name=${hd.fs} Copy the sample file – nietzsche-chapter-1.txt – to Hadoop file system (/user/ec2-user-/input directory) $ cd src/main/resources/data $ hadoop fs -mkdir /user/ec2-user/input $ hadoop fs -put nietzsche-chapter-1.txt /user/ec2-user/input/data $ cd ../../../.. # go back to samples/wordcount directory $ ../gradlew Verify the result: $ hadoop fs -cat /user/ec2-user/output/part-r-00000 | more “AWAY 1 “BY 1 “Beyond 1 “By 2 “Cheers 1 “DE 1 “Everywhere 1 “FROM” 1 “Flatterers 1 “Freedom 1

July 19, 2012

by Istvan Szegedi

· 11,512 Views

How to Resolve java.lang.NoClassDefFoundError: Part 3

This article is part 3 of our NoClassDefFoundError troubleshooting series. As I mentioned in my first article, there are many possible issues that can lead to a NoClassDefFoundError. This article will focus and describe one of the most common causes of this problem: failure of a Java class static initializer block or variable. A sample Java program will be provided and I encourage you to compile and run this example from your workstation in order to properly replicate and understand this type of NoClassDefFoundError problem. Java static initializer revisited The Java programming language provides you with the capability to “statically” initialize variables or a block of code. This is achieved via the “static” variable identifier or the usage of a static {} block at the header of a Java class. Static initializers are guaranteed to be executed only once in the JVM life cycle and are Thread safe by design which make their usage quite appealing for static data initialization such as internal object caches, loggers etc. What is the problem? I will repeat again, static initializers are guaranteed to be executed only once in the JVM life cycle…This means that such code is executed at the Class loading time and never executed again until you restart your JVM. Now what happens if the code executed at that time (@Class loading time) terminates with an unhandled Exception? Welcome to the java.lang.NoClassDefFoundError problem case #2! NoClassDefFoundError problem case 2 – static initializer failure This type of problem is occurring following the failure of static initializer code combined with successive attempts to create a new instance of the affected (non-loaded) class. Sample Java program The following simple Java program is split as per below: The main Java program NoClassDefFoundErrorSimulator The affected Java class ClassA ClassA provides you with a ON/OFF switch allowing you the replicate the type of problem that you want to study This program is simply attempting to create a new instance of ClassA 3 times (one after each other). It will demonstrate that an initial failure of either a static variable or static block initializer combined with successive attempt to create a new instance of the affected class triggers java.lang.NoClassDefFoundError. #### NoClassDefFoundErrorSimulator.java package org.ph.javaee.tools.jdk7.training2; /** * NoClassDefFoundErrorSimulator * @author Pierre-Hugues Charbonneau * */ public class NoClassDefFoundErrorSimulator { /** * @param args */ public static void main(String[] args) { System.out.println("java.lang.NoClassDefFoundError Simulator - Training 2"); System.out.println("Author: Pierre-Hugues Charbonneau"); System.out.println("http://javaeesupportpatterns.blogspot.com\n\n"); try { // Create a new instance of ClassA (attempt #1) System.out.println("FIRST attempt to create a new instance of ClassA...\n"); ClassA classA = new ClassA(); } catch (Throwable any) { any.printStackTrace(); } try { // Create a new instance of ClassA (attempt #2) System.out.println("\nSECOND attempt to create a new instance of ClassA...\n"); ClassA classA = new ClassA(); } catch (Throwable any) { any.printStackTrace(); } try { // Create a new instance of ClassA (attempt #3) System.out.println("\nTHIRD attempt to create a new instance of ClassA...\n"); ClassA classA = new ClassA(); } catch (Throwable any) { any.printStackTrace(); } System.out.println("\n\ndone!"); } } #### ClassA.java package org.ph.javaee.tools.jdk7.training2; /** * ClassA * @author Pierre-Hugues Charbonneau * */ public class ClassA { private final static String CLAZZ = ClassA.class.getName(); // Problem replication switch ON/OFF private final static boolean REPLICATE_PROBLEM1 = true; // static variable initializer private final static boolean REPLICATE_PROBLEM2 = false; // static block{} initializer // Static variable executed at Class loading time private static String staticVariable = initStaticVariable(); // Static initializer block executed at Class loading time static { // Static block code execution... if (REPLICATE_PROBLEM2) throw new IllegalStateException("ClassA.static{}: Internal Error!"); } public ClassA() { System.out.println("Creating a new instance of "+ClassA.class.getName()+"..."); } /** * * @return */ private static String initStaticVariable() { String stringData = ""; if (REPLICATE_PROBLEM1) throw new IllegalStateException("ClassA.initStaticVariable(): Internal Error!"); return stringData; } } Problem reproduction In order to replicate the problem, we will simply “voluntary” trigger a failure of the static initializer code. Please simply enable the problem type that you want to study e.g. either static variable or static block initializer failure: // Problem replication switch ON (true) / OFF (false) private final static boolean REPLICATE_PROBLEM1 = true; // static variable initializer private final static boolean REPLICATE_PROBLEM2 = false; // static block{} initializer Now, let’s run the program with both switch at OFF (both boolean values at false) ## Baseline (normal execution) java.lang.NoClassDefFoundError Simulator - Training 2 Author: Pierre-Hugues Charbonneau http://javaeesupportpatterns.blogspot.com FIRST attempt to create a new instance of ClassA... Creating a new instance of org.ph.javaee.tools.jdk7.training2.ClassA... SECOND attempt to create a new instance of ClassA... Creating a new instance of org.ph.javaee.tools.jdk7.training2.ClassA... THIRD attempt to create a new instance of ClassA... Creating a new instance of org.ph.javaee.tools.jdk7.training2.ClassA... done! For the initial run (baseline), the main program was able to create 3 instances of ClassA successfully with no problem. ## Problem reproduction run (static variable initializer failure) java.lang.NoClassDefFoundError Simulator - Training 2 Author: Pierre-Hugues Charbonneau http://javaeesupportpatterns.blogspot.com FIRST attempt to create a new instance of ClassA... java.lang.ExceptionInInitializerError at org.ph.javaee.tools.jdk7.training2.NoClassDefFoundErrorSimulator.main(NoClassDefFoundErrorSimulator.java:21) Caused by: java.lang.IllegalStateException: ClassA.initStaticVariable(): Internal Error! at org.ph.javaee.tools.jdk7.training2.ClassA.initStaticVariable(ClassA.java:37) at org.ph.javaee.tools.jdk7.training2.ClassA.(ClassA.java:16) ... 1 more SECOND attempt to create a new instance of ClassA... java.lang.NoClassDefFoundError: Could not initialize class org.ph.javaee.tools.jdk7.training2.ClassA at org.ph.javaee.tools.jdk7.training2.NoClassDefFoundErrorSimulator.main(NoClassDefFoundErrorSimulator.java:30) THIRD attempt to create a new instance of ClassA... java.lang.NoClassDefFoundError: Could not initialize class org.ph.javaee.tools.jdk7.training2.ClassA at org.ph.javaee.tools.jdk7.training2.NoClassDefFoundErrorSimulator.main(NoClassDefFoundErrorSimulator.java:39) done! ## Problem reproduction run (static block initializer failure) java.lang.NoClassDefFoundError Simulator - Training 2 Author: Pierre-Hugues Charbonneau http://javaeesupportpatterns.blogspot.com FIRST attempt to create a new instance of ClassA... java.lang.ExceptionInInitializerError at org.ph.javaee.tools.jdk7.training2.NoClassDefFoundErrorSimulator.main(NoClassDefFoundErrorSimulator.java:21) Caused by: java.lang.IllegalStateException: ClassA.static{}: Internal Error! at org.ph.javaee.tools.jdk7.training2.ClassA.(ClassA.java:22) ... 1 more SECOND attempt to create a new instance of ClassA... java.lang.NoClassDefFoundError: Could not initialize class org.ph.javaee.tools.jdk7.training2.ClassA at org.ph.javaee.tools.jdk7.training2.NoClassDefFoundErrorSimulator.main(NoClassDefFoundErrorSimulator.java:30) THIRD attempt to create a new instance of ClassA... java.lang.NoClassDefFoundError: Could not initialize class org.ph.javaee.tools.jdk7.training2.ClassA at org.ph.javaee.tools.jdk7.training2.NoClassDefFoundErrorSimulator.main(NoClassDefFoundErrorSimulator.java:39) done! What happened? As you can see, the first attempt to create a new instance of ClassA did trigger a java.lang.ExceptionInInitializerError. This exception indicates the failure of our static initializer for our static variable & bloc which is exactly what we wanted to achieve. The key point to understand at this point is that this failure did prevent the whole class loading of ClassA. As you can see, attempt #2 and attempt #3 both generated a java.lang.NoClassDefFoundError, why? Well since the first attempt failed, class loading of ClassA was prevented. Successive attempts to create a new instance of ClassA within the current ClassLoader did generate java.lang.NoClassDefFoundError over and over since ClassA was not found within current ClassLoader. As you can see, in this problem context, the NoClassDefFoundError is just a symptom or consequence of another problem. The original problem is the ExceptionInInitializerError triggered following the failure of the static initializer code. This clearly demonstrates the importance of proper error handling and logging when using Java static initializers. Recommendations and resolution strategies Now find below my recommendations and resolution strategies for NoClassDefFoundError problem case 2: - Review the java.lang.NoClassDefFoundError error and identify the missing Java class - Perform a code walkthrough of the affected class and determine if it contains static initializer code (variables & static block) - Review your server and application logs and determine if any error (e.g. ExceptionInInitializerError) originates from the static initializer code - Once confirmed, analyze the code further and determine the root cause of the initializer code failure. You may need to add some extra logging along with proper error handling to prevent and better handle future failures of your static initializer code going forward Please feel free to post any question or comment. The part 4 will start coverage of NoClassDefFoundError problems related to class loader problems.

July 19, 2012

by Pierre - Hugues Charbonneau

· 90,430 Views · 3 Likes

5 Tips for Proper Java Heap Size

Determination of proper Java Heap size for a production system is not a straightforward exercise. In my Java EE enterprise experience, I have seen multiple performance problem cases due to inadequate Java Heap capacity and tuning. This article will provide you with 5 tips that can help you determine optimal Java Heap size, as a starting point, for your current or new production environment. Some of these tips are also very useful regarding the prevention and resolution of java.lang.OutOfMemoryError problems; including memory leaks. Please note that these tips are intended to “help you” determine proper Java Heap size. Since each IT environment is unique, you are actually in the best position to determine precisely the required Java Heap specifications of your client’s environment. Some of these tips may also not be applicable in the context of a very small Java standalone application but I still recommend you to read the entire article. Future articles will include tips on how to choose the proper Java VM garbage collector type for your environment and applications. #1 – JVM: you always fear what you don't understand How can you expect to configure, tune and troubleshoot something that you don’t understand? You may never have the chance to write and improve Java VM specifications but you are still free to learn its foundation in order to improve your knowledge and troubleshooting skills. Some may disagree, but from my perspective, the thinking that Java programmers are not required to know the internal JVM memory management is an illusion. Java Heap tuning and troubleshooting can especially be a challenge for Java & Java EE beginners. Find below a typical scenario: - Your client production environment is facing OutOfMemoryError on a regular basis and causing lot of business impact. Your support team is under pressure to resolve this problem - A quick Google search allows you to find examples of similar problems and you now believe (and assume) that you are facing the same problem - You then grab JVM -Xms and -Xmx values from another person OutOfMemoryError problem case, hoping to quickly resolve your client’s problem - You then proceed and implement the same tuning to your environment. 2 days later you realize problem is still happening (even worse or little better)…the struggle continues… What went wrong? - You failed to first acquire proper understanding of the root cause of your problem - You may also have failed to properly understand your production environment at a deeper level (specifications, load situation etc.). Web searches is a great way to learn and share knowledge but you have to perform your own due diligence and root cause analysis - You may also be lacking some basic knowledge of the JVM and its internal memory management, preventing you to connect all the dots together My #1 tip and recommendation to you is to learn and understand the basic JVM principles along with its different memory spaces. Such knowledge is critical as it will allow you to make valid recommendations to your clients and properly understand the possible impact and risk associated with future tuning considerations. Now find below a quick high level reference guide for the Java VM: The Java VM memory is split up to 3 memory spaces: The Java Heap. Applicable for all JVM vendors, usually split between YoungGen (nursery) & OldGen (tenured) spaces. The PermGen (permanent generation). Applicable to the Sun HotSpot VM only (PermGen space will be removed in future Java 7 or Java 8 updates) The Native Heap (C-Heap). Applicable for all JVM vendors. I recommend that you review each article below, including Sun white paper on the HotSpot Java memory management. I also encourage you to download and look at the OpenJDK implementation. ## Sun HotSpot VM http://javaeesupportpatterns.blogspot.com/2011/08/java-heap-space-hotspot-vm.html ## IBM VM http://javaeesupportpatterns.blogspot.com/2012/02/java-heap-space-ibm-vm.html ## Oracle JRockit VM http://javaeesupportpatterns.blogspot.com/2012/02/java-heap-space-jrockit-vm.html ## Sun (Oracle) – Java memory management white paper http://java.sun.com/j2se/reference/whitepapers/memorymanagement_whitepaper.pdf ## OpenJDK – Open-source Java implementation http://openjdk.java.net/ As you can see, the Java VM memory management is more complex than just setting up the biggest value possible via –Xmx. You have to look at all angles, including your native and PermGen space requirement along with physical memory availability (and # of CPU cores) from your physical host(s). It can get especially tricky for 32-bit JVM since the Java Heap and native Heap are in a race. The bigger your Java Heap, smaller the native Heap. Attempting to setup a large Heap for a 32-bit VM e.g .2.5 GB+ increases risk of native OutOfMemoryError depending of your application(s) footprint, number of Threads etc. 64-bit JVM resolves this problem but you are still limited to physical resources availability and garbage collection overhead (cost of major GC collections go up with size). The bottom line is that the bigger is not always the better so please do not assume that you can run all your 20 Java EE applications on a single 16 GB 64-bit JVM process. #2 – Data and application is king: review your static footprint requirement Your application(s) along with its associated data will dictate the Java Heap footprint requirement. By static memory, I mean “predictable” memory requirements as per below. - Determine how many different applications you are planning to deploy to a single JVM process e.g. number of EAR files, WAR files, jar files etc. The more applications you deploy to a single JVM, higher demand on native Heap - Determine how many Java classes will be potentially loaded at runtime; including third part API’s. The more class loaders and classes that you load at runtime, higher demand on the HotSpot VM PermGen space and internal JIT related optimization objects - Determine data cache footprint e.g. internal cache data structures loaded by your application (and third party API’s) such as cached data from a database, data read from a file etc. The more data caching that you use, higher demand on the Java Heap OldGen space - Determine the number of Threads that your middleware is allowed to create. This is very important since Java threads require enough native memory or OutOfMemoryError will be thrown For example, you will need much more native memory and PermGen space if you are planning to deploy 10 separate EAR applications on a single JVM process vs. only 2 or 3. Data caching not serialized to a disk or database will require extra memory from the OldGen space. Try to come up with reasonable estimates of the static memory footprint requirement. This will be very useful to setup some starting point JVM capacity figures before your true measurement exercise (e.g. tip #4). For 32-bit JVM, I usually do not recommend a Java Heap size high than 2 GB (-Xms2048m, -Xmx2048m) since you need enough memory for PermGen and native Heap for your Java EE applications and threads. This assessment is especially important since too many applications deployed in a single 32-bit JVM process can easily lead to native Heap depletion; especially in a multi threads environment. For a 64-bit JVM, a Java Heap size of 3 GB or 4 GB per JVM process is usually my recommended starting point. #3 – Business traffic set the rules: review your dynamic footprint requirement Your business traffic will typically dictate your dynamic memory footprint. Concurrent users & requests generate the JVM GC “heartbeat” that you can observe from various monitoring tools due to very frequent creation and garbage collections of short & long lived objects. As you saw from the above JVM diagram, a typical ratio of YoungGen vs. OldGen is 1:3 or 33%. For a typical 32-bit JVM, a Java Heap size setup at 2 GB (using generational & concurrent collector) will typically allocate 500 MB for YoungGen space and 1.5 GB for the OldGen space. Minimizing the frequency of major GC collections is a key aspect for optimal performance so it is very important that you understand and estimate how much memory you need during your peak volume. Again, your type of application and data will dictate how much memory you need. Shopping cart type of applications (long lived objects) involving large and non-serialized session data typically need large Java Heap and lot of OldGen space. Stateless and XML processing heavy applications (lot of short lived objects) require proper YoungGen space in order to minimize frequency of major collections. Example: - You have 5 EAR applications (~2 thousands of Java classes) to deploy (which include middleware code as well…) - Your native heap requirement is estimated at 1 GB (has to be large enough to handle Threads creation etc.) - Your PermGen space is estimated at 512 MB - Your internal static data caching is estimated at 500 MB - Your total forecast traffic is 5000 concurrent users at peak hours - Each user session data footprint is estimated at 500 K - Total footprint requirement for session data alone is 2.5 GB under peak volume As you can see, with such requirement, there is no way you can have all this traffic sent to a single JVM 32-bit process. A typical solution involves splitting (tip #5) traffic across a few JVM processes and / or physical host (assuming you have enough hardware and CPU cores available). However, for this example, given the high demand on static memory and to ensure a scalable environment in the long run, I would also recommend 64-bit VM but with a smaller Java Heap as a starting point such as 3 GB to minimize the GC cost. You definitely want to have extra buffer for the OldGen space so I typically recommend up to 50% memory footprint post major collection in order to keep the frequency of Full GC low and enough buffer for fail-over scenarios. Most of the time, your business traffic will drive most of your memory footprint, unless you need significant amount of data caching to achieve proper performance which is typical for portal (media) heavy applications. Too much data caching should raise a yellow flag that you may need to revisit some design elements sooner than later. #4 – Don’t guess it, measure it! At this point you should: - Understand the basic JVM principles and memory spaces - Have a deep view and understanding of all applications along with their characteristics (size, type, dynamic traffic, stateless vs. stateful objects, internal memory caches etc.) - Have a very good view or forecast on the business traffic (# of concurrent users etc.) and for each application - Some ideas if you need a 64-bit VM or not and which JVM settings to start with - Some ideas if you need more than one JVM (middleware) processes But wait, your work is not done yet. While this above information is crucial and great for you to come up with “best guess” Java Heap settings, it is always best and recommended to simulate your application(s) behaviour and validate the Java Heap memory requirement via proper profiling, load & performance testing. You can learn and take advantage of tools such as JProfiler (future articles will include tutorials on JProfiler). From my perspective, learning how to use a profiler is the best way to properly understand your application memory footprint. Another approach I use for existing production environments is heap dump analysis using the Eclipse MAT tool. Heap Dump analysis is very powerful and allow you to view and understand the entire memory footprint of the Java Heap, including class loader related data and is a must do exercise in any memory footprint analysis; especially memory leaks. Java profilers and heap dump analysis tools allow you to understand and validate your application memory footprint, including detection and resolution of memory leaks. Load and performance testing is also a must since this will allow you to validate your earlier estimates by simulating your forecast concurrent users. It will also expose your application bottlenecks and allow you to further fine tune your JVM settings. You can use tools such as Apache JMeter which is very easy to learn and use or explore other commercial products. Finally, I have seen quite often Java EE environments running perfectly fine until the day where one piece of the infrastructure start to fail e.g. hardware failure. Suddenly the environment is running at reduced capacity (reduced # of JVM processes) and the whole environment goes down. What happened? There are many scenarios that can lead to domino effects but lack of JVM tuning and capacity to handle fail-over (short term extra load) is very common. If your JVM processes are running at 80%+ OldGen space capacity with frequent garbage collections, how can you expect to handle any fail-over scenario? Your load and performance testing exercise performed earlier should simulate such scenario and you should adjust your tuning settings properly so your Java Heap has enough buffer to handle extra load (extra objects) at short term. This is mainly applicable for the dynamic memory footprint since fail-over means redirecting a certain % of your concurrent users to the available JVM processes (middleware instances). #5 – Divide and conquer At this point you have performed dozens of load testing iterations. You know that your JVM is not leaking memory. Your application memory footprint cannot be reduced any further. You tried several tuning strategies such as using a large 64-bit Java Heap space of 10 GB+, multiple GC policies but still not finding your performance level acceptable? In my experience I found that, with current JVM specifications, proper vertical and horizontal scaling which involved creating a few JVM processes per physical host and across several hosts will give you the throughput and capacity that you are looking for. Your IT environment will also more fault tolerant if you break your application list in a few logical silos, with their own JVM process, Threads and tuning values. This “divide and conquer” strategy involves splitting your application(s) traffic to multiple JVM processes and will provide you with: - Reduced Java Heap size per JVM process (both static & dynamic footprint) - Reduced complexity of JVM tuning - Reduced GC elapsed and pause time per JVM process - Increased redundancy and fail-over capabilities - Aligned with latest Cloud and IT virtualization strategies The bottom line is that when you find yourself spending too much time in tuning that single elephant 64-bit JVM process, it is time to revisit your middleware and JVM deployment strategy and take advantage of vertical & horizontal scaling. This implementation strategy is more taxing for the hardware but will really pay off in the long run. Please provide any comment and share your experience on JVM Heap sizing and tuning.

July 19, 2012

by Pierre - Hugues Charbonneau

· 142,504 Views · 7 Likes

Apache Thrift with Java Quickstart

Apache Thrift is a RPC framework founded by facebook and now it is an Apache project. Thrift lets you define data types and service interfaces in a language neutral definition file. That definition file is used as the input for the compiler to generate code for building RPC clients and servers that communicate over different programming languages. You can refer Thrift white paper also. According to the official web site Apache Thrift is a, software framework, for scalable cross-language services development, combines a software stack with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js, Smalltalk, OCaml and Delphi and other languages. Image courtesy wikipedia Installing Apache Thrift in Windows Installation Thrift can be a tiresome process. But for windows the compiler is available as a prebuilt exe. Download thrift.exe and add it into your environment variables. Writing Thrift definition file (.thrift file) Writing the Thrift definition file becomes really easy once you get used to it. I found this tutorial quite useful to begin with. Example definition file (add.thrift) namespace java com.eviac.blog.samples.thrift.server // defines the namespace typedef i32 int //typedefs to get convenient names for your types service AdditionService { // defines the service to add two numbers int add(1:int n1, 2:int n2), //defines a method } Compiling Thrift definition file To compile the .thrift file use the following command. thrift --gen For my example the command is, thrift --gen java add.thrift After performing the command, inside gen-java directory you'll find the source codes which is useful for building RPC clients and server. In my example it will create a java code called AdditionService.java Writing a service handler Service handler class is required to implement the AdditionService.Iface interface. Example service handler (AdditionServiceHandler.java) package com.eviac.blog.samples.thrift.server; import org.apache.thrift.TException; public class AdditionServiceHandler implements AdditionService.Iface { @Override public int add(int n1, int n2) throws TException { return n1 + n2; } } Writing a simple server Following is an example code to initiate a simple thrift server. To enable the multithreaded server uncomment the commented parts of the example code. Example server (MyServer.java) package com.eviac.blog.samples.thrift.server; import org.apache.thrift.transport.TServerSocket; import org.apache.thrift.transport.TServerTransport; import org.apache.thrift.server.TServer; import org.apache.thrift.server.TServer.Args; import org.apache.thrift.server.TSimpleServer; public class MyServer { public static void StartsimpleServer(AdditionService.Processor processor) { try { TServerTransport serverTransport = new TServerSocket(9090); TServer server = new TSimpleServer( new Args(serverTransport).processor(processor)); // Use this for a multithreaded server // TServer server = new TThreadPoolServer(new // TThreadPoolServer.Args(serverTransport).processor(processor)); System.out.println("Starting the simple server..."); server.serve(); } catch (Exception e) { e.printStackTrace(); } } public static void main(String[] args) { StartsimpleServer(new AdditionService.Processor(new AdditionServiceHandler())); } } Writing the client Following is an example java client code which consumes the service provided by AdditionService. Example client code (AdditionClient.java) package com.eviac.blog.samples.thrift.client; import org.apache.thrift.TException; import org.apache.thrift.protocol.TBinaryProtocol; import org.apache.thrift.protocol.TProtocol; import org.apache.thrift.transport.TSocket; import org.apache.thrift.transport.TTransport; import org.apache.thrift.transport.TTransportException; public class AdditionClient { public static void main(String[] args) { try { TTransport transport; transport = new TSocket("localhost", 9090); transport.open(); TProtocol protocol = new TBinaryProtocol(transport); AdditionService.Client client = new AdditionService.Client(protocol); System.out.println(client.add(100, 200)); transport.close(); } catch (TTransportException e) { e.printStackTrace(); } catch (TException x) { x.printStackTrace(); } } } Run the server code(MyServer.java). It should output following and will listen to the requests. Starting the simple server... Then run the client code(AdditionClient.java). It should output following. 300

July 16, 2012

by Pavithra Gunasekara

· 43,196 Views · 2 Likes

JMS With ActiveMQ

Java Message Service is a mechanism for integrating applications in a loosely coupled, flexible manner and delivers data asynchronously across applications.

July 14, 2012

by Pavithra Gunasekara

· 159,121 Views · 13 Likes

Dependency Convergence in Maven

I was running in to a problem with a Java project that occured only in IntelliJ Idea, but not on the command line, when running specific test classes in Maven. The exception stack trace had the following in it: Caused by: com.sun.jersey.api.container.ContainerException: No WebApplication provider is present That seems like an easy problem to fix - it is the exception message that is given when jersey can’t find the provider for JAX-RS. Fixing it is normally just a matter of making sure jersey-core is on the classpath to fulfill SPI requirements for JAX-RS. For some reason though this isn’t happening in IntelliJ Idea. I inspected the log output of the test run and it is quite clear that all of the jersey dependencies are on the classpath. Then it dawns me on the try running mvn dependency:tree from inside of Idea. Here is what I found: [INFO] +- org.mule.modules:mule-module-jersey:jar:3.2.1:provided [INFO] | +- com.sun.jersey:jersey-server:jar:1.6:provided [INFO] | +- com.sun.jersey:jersey-json:jar:1.6:provided [INFO] | | +- com.sun.xml.bind:jaxb-impl:jar:2.2.3-1:provided [INFO] | | \- org.codehaus.jackson:jackson-xc:jar:1.7.1:provided [INFO] | +- com.sun.jersey:jersey-client:jar:1.6:provided [INFO] | \- org.codehaus.jackson:jackson-jaxrs:jar:1.8.0:provided ... [INFO] +- org.jclouds.driver:jclouds-sshj:jar:1.4.0-rc.3:compile [INFO] | +- org.jclouds:jclouds-compute:jar:1.4.0-rc.3:compile [INFO] | | \- org.jclouds:jclouds-scriptbuilder:jar:1.4.0-rc.3:compile [INFO] | +- org.jclouds:jclouds-core:jar:1.4.0-rc.3:compile [INFO] | | +- net.oauth.core:oauth:jar:20100527:compile [INFO] | | +- com.sun.jersey:jersey-core:jar:1.11:compile [INFO] | | +- com.google.inject.extensions:guice-assistedinject:jar:3.0:compile Notice how I have jersey-core 1.11 coming from jclouds-core but jersey 1.6 everywhere else. That, my friends, is a dependency convergence problem. Maven with its default set of plugins (read: no maven-enforcer-plugin) does not even warn you if something like this happens. In this case, somehow jclouds-core depends directly on jersey-core and happens to resolve the dependency to the version that jclouds-core declared first before jersey-core can be resolved as a transitive dependency on mule-module-jersey. To fix the symptom, all I had to do was add the jersey-core dependency explicitely as a top level dependency in my pom: com.sun.jersey jersey-core ${jersey.version} provided But doing so only fixes the symptom, not the problem. The real problem is that the maven project I’m working on does not presently attempt to detect or resolve dependency convergence problems. This is where the maven-enforcer-plugin comes in handy. You can have the enforcer plugin run the DependencyConvergence rule agaisnt your build and have it fail when you have potential conflicts in your transitive dependencies that you haven’t resolved through exclusions or declaring direct dependencies yet. Binding the maven-enforcer-plugin to your build would look something like this: org.apache.maven.plugins maven-enforcer-plugin 1.0.1 enforce enforce validate ... I chose to bind to the validate phase since that is the first phase to be run in the maven lifecycle. Now my build fails immediately and contains very useful output that looks like the following: Dependency convergence error for org.codehaus.jackson:jackson-jaxrs:1.7.1 paths to dependency are: +-com.nodeable:server:1.0-SNAPSHOT +-org.mule.modules:mule-module-jersey:3.2.1 +-com.sun.jersey:jersey-json:1.6 +-org.codehaus.jackson:jackson-jaxrs:1.7.1 and +-com.nodeable:server:1.0-SNAPSHOT +-org.mule.modules:mule-module-jersey:3.2.1 +-org.codehaus.jackson:jackson-jaxrs:1.8.0 There are many rules you can apply besides DependencyConvergence. However, if the output from the DependencyConvergence rule looks anything like mine does presently, it might take you a while before you get around to getting your maven build to pass and conform to other rules.

July 11, 2012

by Jason Whaley

· 24,237 Views · 1 Like

JBoss AS 7 is neat but the documentation is still quite lacking (and error messages not as useful as they could be). This post summarizes how you can create your own JavaEE-compliant login module for authenticating users of your webapp deployed on JBoss AS. A working elementary username-password module provided. Why use Java EE standard authentication? Java EE security primer A part of the Java EE specification is security for web and EE applications, which makes it possible both to specify declarative constraints in your web.xml (such as “role X is required to access resources at URLs “/protected/*”) and to control it programatically, i.e. verifying that the user has a particular role (see HttpServletRequest.isUserInRole). It works as follows: You declare in your web.xml: Login configuration – primarily whether to use browser prompt (basic) or a custom login form and a name for the login realm The custom form uses “magic” values for the post action and the fields, starting with j_, which are intercepted and processed by the server The roles used in your application (typically you’d something like “user” and perhaps “admin”) What roles are required for accessing particular URL patterns (default: none) Whether HTTPS is required for some parts of the application You tell your application server how to authenticate users for that login realm, usually by associating its name with one of the available login modules in the configuration (the modules ranging from simple file-based user list to LDAP and Kerberos support). Only rarely do you need to create your own login module, the topic of this post. If this is new for you than I strongly recommend reading The Java EE 5 Tutorial – Examples: Securing Web Applications (Form-Based Authentication with a JSP Page incl. security constraint specification, Basic Authentication with JAX-WS, Securing an Enterprise Bean, Using the isCallerInRole and getCallerPrincipal Methods). Why to bother? Declarative security is nicely decoupled from the business code It’s easy to propagate security information between a webapp and for example EJBs (where you can protect a complete bean or a particular method declaratively via xml or via annotations such as @RolesAllowed) It’s easy to switch to a different authentication mechanism such as LDAP and it’s more likely that SSO will be supported Custom login module implementation options If one of the login modules (part of a security domain) provided out of the box with JBoss, such as UsersRoles, Ldap, Database, Certificate, isn’t sufficient for you then you can adjust one of them or implement your own. You can: Extend one of the concrete modules, overriding one or some of its methods to ajdust to your needs – see f.ex. how to override the DatabaseServerLoginModule to specify your own encryption of the stored passwords. This should be your primary choice, of possible. Subclass UsernamePasswordLoginModule Implement javax.security.auth.spi.LoginModule if you need maximal flexibility and portability (this is a part of Java EE, namely JAAS, and is quite complex) JBoss EAP 5 Security Guide Ch. 12.2. Custom Modules has an excellent description of the basic modules (AbstractServerLoginModule, UsernamePasswordLoginModule) and how to proceed when subclassing them or any other standard module, including description of the key methods to implement/override. You must read it. (The guide is still perfectly applicable to JBoss AS 7 in this regard.) The custom JndiUserAndPass module example, extending UsernamePasswordLoginModule, is also worth reading – it uses module options and JNDI lookup. Example: Custom UsernamePasswordLoginModule subclass See the source code of MySimpleUsernamePasswordLoginModule that extends JBoss’ UsernamePasswordLoginModule. The abstract UsernamePasswordLoginModule (source code) works by comparing the password provided by the user for equality with the password returned from the method getUsersPassword, implemented by a subclass. You can use the method getUsername to obtain the user name of the user attempting login. Implement abstract methods getUsersPassword() Implement getUsersPassword() to lookup the user’s password wherever you have it. If you do not store passwords in plain text then read how to customize the behavior via other methods below getRoleSets() Implement getRoleSets() (from AbstractServerLoginModule) to return at least one group named “Roles” and containing 0+ roles assigned to the user, see the implementation in the source code for this post. Usually you’d lookup the roles for the user somewhere (instead of returning hardcoded “user_role” role). Optionally extend initialize(..) to get access to module options etc. Usually you will also want to extend initialize(Subject subject, CallbackHandler callbackHandler, Map sharedState, Map options) (called for each authentication attempt), To get values of properties declared via the element in the security-domain configuration – see JBoss 5 custom module example To do other initialization, such as looking up a data source via JNDI – see the DatabaseServerLoginModule Optionally override other methods to customize the behavior If you do not store passwords in plain text (a wise choice!) and your hashing method isn’t supported out of the box then you can override createPasswordHash(String username, String password, String digestOption) to hash/encrypt the user-supplied password before comparison with the stored password. Alternatively you could override validatePassword(String inputPassword, String expectedPassword) to do whatever conversion on the password before comparison or even do a different type of comparison than equality. Custom login module deployment options In JBoss AS you can Deploy your login module class in a JAR as a standalone module, independently of the webapp, under /modules/, together with a module.xml – described at JBossAS7SecurityCustomLoginModules Deploy your login module class as a part of your webapp (no module.xml required) In a JAR inside WEB-INF/lib/ Directly under WEB-INF/classes In each case you have to declare a corresponding security-domain it inside JBoss configuration (standalone/configuration/standalone.xml or domain/configuration/domain.xml): The code attribute should contain the fully qualified name of your login module class and the security-domain’s name must match the declaration in jboss-web.xml: form-auth true The code Download the webapp jboss-custom-login containing the custom login module MySimpleUsernamePasswordLoginModule, follow the deployment instructions in the README.

July 4, 2012

by Jakub Holý

· 31,484 Views

Using the JavaFX AnimationTimer

In retrospect it was probably not a good idea to give the AnimationTimer its name, because it can be used for much more than just animation: measuring the fps-rate, collision detection, calculating the steps of a simulation, the main loop of a game etc. In fact, most of the time I saw AnimationTimer in action was not related to animation at all. Nevertheless there are cases when you want to consider using an AnimationTimer for your animation. This post will explain the class and show an example where AnimationTimer is used to calculate animations. The AnimationTimer provides an extremely simple, but very useful and flexible feature. It allows to specify a method, that will be called in every frame. What this method is used for is not limited and, as already mentioned, does not have anything to do with animation. The only requirement is, that it has to return fast, because otherwise it can easily become the bottleneck of a system. To use it, a developer has to extend AnimationTimer and implement the abstract method handle(). This is the method that will be called in every frame while the AnimationTimer is active. A single parameter is passed to handle(). It contains the current time in nanoseconds, the same as what you would get when calling System.nanoTime(). Why should one use the passed in value instead of calling System.nanoTime() or its little brother System.currentTimeMillis() oneself? There are several reasons, but the most important probably is, that it makes your life a lot easier while debugging. If you ever tried to debug code, that depended on these two methods, you know that you are basically screwed. But the JavaFX runtime goes into a paused state while it is waiting to execute the next step during debugging and the internal clock does not proceed during this pause. In other words no matter if you wait two seconds or two hours before you resume a halted program while debugging, the increment of the parameter will roughly be the same! AnimationTimer has two methods start() and stop() to activate and deactivate it. If you override them, it is important that you call these methods in the super class. The Animation API comes with many feature rich classes, that make defining an animation very simple. There are predefined Transition classes, it is possible to define a key-frame based animation using Timeline, and one can even write a custom Transition easily. But in which cases does it make sense to use an AnimationTimer instead? – Almost always you want to use one of the standard classes. But if you want to specify many simple animations, using an AnimationTimer can be the better choice. The feature richness of the standard animation classes comes with a price. Every single animation requires a whole bunch of variables to be tracked – variables that you often do not need for simple animations. Plus these classes are optimized for speed, not for small memory footprint. Some of the variables are stored twice, once in the format the public API requires and once in a format that helps faster calculation while playing. Below is a simple example that shows a star field. It animates thousands of rectangles flying from the center to the outer edges. Using an AnimationTimer allows to store only the values that are needed. The calculation is extremely simple compared to the calculation within a Timeline for example, because no advanced features (loops, animation rate, direction etc.) have to be considered. package fxsandbox; import java.util.Random; import javafx.animation.AnimationTimer; import javafx.application.Application; import javafx.scene.Group; import javafx.scene.Node; import javafx.scene.Scene; import javafx.scene.paint.Color; import javafx.scene.shape.Rectangle; import javafx.stage.Stage; public class FXSandbox extends Application { private static final int STAR_COUNT = 20000; private final Rectangle[] nodes = new Rectangle[STAR_COUNT]; private final double[] angles = new double[STAR_COUNT]; private final long[] start = new long[STAR_COUNT]; private final Random random = new Random(); @Override public void start(final Stage primaryStage) { for (int i=0; i

June 27, 2012

by Michael Heinrichs

· 17,501 Views

Using Cookies to implement a RememberMe functionality

Some web applications may need a "Remember Me" functionality. This means that, after a user login, user will have access from same machine to all its data even after session expired. This access will be possible until user does a logout. If you are using Spring and its login form, then you should use "Remember Me" functionality already implemented inside the framework. Some web frameworks also offer a type of SignIn panel which already has "remember me" built-in. But in case you have to implement "Remember Me" functionality by your own, this can be easily achieved using Cookies. Java has a Cookie class named javax.servlet.http.Cookie. Algorithm is straight-forward: your login panel must contain a "Remember Me" check after a succesfull login with "Remember Me" check selected, you can create two cookies: one to keep the value for rememberMe and one to keep a token which has to identify the logged user. For sake of security, this token must never contain user name or user password. The ideea is to generate a random id as token value. And token value aside with user id must be saved in your storage (database) whenever a login is needed, you have to look if there is any cookie saved by you, and if so and your "rememberMe" value is true, you can take the user from storage based on your token and do an automatic login. when a logout is done, you have to delete the cookie that keeps the token To add a cookie, you have to specify the maximum age of the cookie in seconds : HttpServletResponse servletResponse = ...; Cookie c = new Cookie(COOKIE_NAME, encodeString(uuid)); c.setMaxAge(365 * 24 * 60 * 60); // one year servletResponse.addCookie(c); To delete a cookie, you have to find cookie by name and set its maximum age to 0, before adding it to servlet response: HttpServletRequest servletRequest = ...; HttpServletResponse servletResponse = ... ; Cookie[] cookies = servletRequest.getCookies(); for (int i = 0; i < cookies.length; i++) { Cookie c = cookies[i]; if (c.getName().equals(COOKIE_NAME)) { c.setMaxAge(0); c.setValue(null); servletResponse.addCookie(c); } }

June 26, 2012

by Mihai Dinca - Panaitescu

· 57,429 Views · 1 Like

JAX-WS Header: Part 1 the Client Side

Manipulating JAXWS header on the client Side like adding WSS username token or logging saop message.

June 25, 2012

by Slim Ouertani

· 89,068 Views

Java Volatile Keyword Explained by Example

Check out an example of the volatile Java keyword.

June 21, 2012

by Thibault Delor

· 259,565 Views · 20 Likes

Top 10 Causes of Java EE Enterprise Performance Problems

Performance problems are one of the biggest challenges to expect when designing and implementing Java EE related technologies.

June 20, 2012

by Pierre - Hugues Charbonneau

· 273,068 Views · 20 Likes

How to Resolve java.lang.NoClassDefFoundError: How to resolve – Part 2

This article is part 2 of our NoClassDefFoundError troubleshooting series. It will focus and describe the simplest type of NoClassDefFoundError problem. This article is ideal for Java beginners and I highly recommend that you compile and run the sample Java program yourself. The following writing format will be used going forward and will provide you with: - Description of the problem case and type of NoClassDefFoundError - Sample Java program “simulating” the problem case - ClassLoader chain view - Recommendations and resolution strategies NoClassDefFoundError problem case 1 – missing JAR file The first problem case we will cover is related to a Java program packaging and / or classpath problem. A typical Java program can include one or many JAR files created at compile time. NoClassDefFoundError can often be observed when you forget to add JAR file(s) containing Java classes referenced by your Java or Java EE application. This type of problem is normally not hard to resolve once you analyze the Java Exception and missing Java class name. Sample Java program The following simple Java program is split as per below: - The main Java program NoClassDefFoundErrorSimulator - The caller Java class CallerClassA - The referencing Java class ReferencingClassA - A util class for ClassLoader and logging related facilities JavaEETrainingUtil This program is simple attempting to create a new instance and execute a method of the Java class CallerClassA which is referencing the class ReferencingClassA.It will demonstrate how a simple classpath problem can trigger NoClassDefFoundError. The program is also displaying detail on the current class loader chain at class loading time in order to help you keep track of this process. This will be especially useful for future and more complex problem cases when dealing with larger class loader chains. #### NoClassDefFoundErrorSimulator.java package org.ph.javaee.training1; import org.ph.javaee.training.util.JavaEETrainingUtil; /** * NoClassDefFoundErrorTraining1 * @author Pierre-Hugues Charbonneau * */ public class NoClassDefFoundErrorSimulator { /** * @param args */ public static void main(String[] args) { System.out.println("java.lang.NoClassDefFoundError Simulator - Training 1"); System.out.println("Author: Pierre-Hugues Charbonneau"); System.out.println("http://javaeesupportpatterns.blogspot.com"); // Print current Classloader context System.out.println("\nCurrent ClassLoader chain: "+JavaEETrainingUtil.getCurrentClassloaderDetail()); // 1. Create a new instance of CallerClassA CallerClassA caller = new CallerClassA(); // 2. Execute method of the caller caller.doSomething(); System.out.println("done!"); } } #### CallerClassA.java package org.ph.javaee.training1; import org.ph.javaee.training.util.JavaEETrainingUtil; /** * CallerClassA * @author Pierre-Hugues Charbonneau * */ public class CallerClassA { private final static String CLAZZ = CallerClassA.class.getName(); static { System.out.println("Classloading of "+CLAZZ+" in progress..."+JavaEETrainingUtil.getCurrentClassloaderDetail()); } public CallerClassA() { System.out.println("Creating a new instance of "+CallerClassA.class.getName()+"..."); } public void doSomething() { // Create a new instance of ReferencingClassA ReferencingClassA referencingClass = new ReferencingClassA(); } } #### ReferencingClassA.java package org.ph.javaee.training1; import org.ph.javaee.training.util.JavaEETrainingUtil; /** * ReferencingClassA * @author Pierre-Hugues Charbonneau * */ public class ReferencingClassA { private final static String CLAZZ = ReferencingClassA.class.getName(); static { System.out.println("Classloading of "+CLAZZ+" in progress..."+JavaEETrainingUtil.getCurrentClassloaderDetail()); } public ReferencingClassA() { System.out.println("Creating a new instance of "+ReferencingClassA.class.getName()+"..."); } public void doSomething() { //nothing to do... } } #### JavaEETrainingUtil.java package org.ph.javaee.training.util; import java.util.Stack; import java.lang.ClassLoader; /** * JavaEETrainingUtil * @author Pierre-Hugues Charbonneau * */ public class JavaEETrainingUtil { /** * getCurrentClassloaderDetail * @return */ public static String getCurrentClassloaderDetail() { StringBuffer classLoaderDetail = new StringBuffer(); Stack classLoaderStack = new Stack(); ClassLoader currentClassLoader = Thread.currentThread().getContextClassLoader(); classLoaderDetail.append("\n-----------------------------------------------------------------\n"); // Build a Stack of the current ClassLoader chain while (currentClassLoader != null) { classLoaderStack.push(currentClassLoader); currentClassLoader = currentClassLoader.getParent(); } // Print ClassLoader parent chain while(classLoaderStack.size() > 0) { ClassLoader classLoader = classLoaderStack.pop(); // Print current classLoaderDetail.append(classLoader); if (classLoaderStack.size() > 0) { classLoaderDetail.append("\n--- delegation ---\n"); } else { classLoaderDetail.append(" **Current ClassLoader**"); } } classLoaderDetail.append("\n-----------------------------------------------------------------\n"); return classLoaderDetail.toString(); } } Problem reproduction In order to replicate the problem, we will simply “voluntary” omit one of the JAR files from the classpath that contains the referencing Java class ReferencingClassA. The Java program is packaged as per below: - MainProgram.jar (contains NoClassDefFoundErrorSimulator.class and JavaEETrainingUtil.class) - CallerClassA.jar (contains CallerClassA.class) - ReferencingClassA.jar (contains ReferencingClassA.class) Now, let’s run the program as is: ## Baseline (normal execution) .\bin>java -classpath CallerClassA.jar;ReferencingClassA.jar;MainProgram.jar org.ph.javaee.training1.NoClassDefFoundErrorSimulator java.lang.NoClassDefFoundError Simulator - Training 1 Author: Pierre-Hugues Charbonneau http://javaeesupportpatterns.blogspot.com Current ClassLoader chain: ----------------------------------------------------------------- sun.misc.Launcher$ExtClassLoader@17c1e333 --- delegation --- sun.misc.Launcher$AppClassLoader@214c4ac9 **Current ClassLoader** ----------------------------------------------------------------- Classloading of org.ph.javaee.training1.CallerClassA in progress... ----------------------------------------------------------------- sun.misc.Launcher$ExtClassLoader@17c1e333 --- delegation --- sun.misc.Launcher$AppClassLoader@214c4ac9 **Current ClassLoader** ----------------------------------------------------------------- Creating a new instance of org.ph.javaee.training1.CallerClassA... Classloading of org.ph.javaee.training1.ReferencingClassA in progress... ----------------------------------------------------------------- sun.misc.Launcher$ExtClassLoader@17c1e333 --- delegation --- sun.misc.Launcher$AppClassLoader@214c4ac9 **Current ClassLoader** ----------------------------------------------------------------- Creating a new instance of org.ph.javaee.training1.ReferencingClassA... done! For the initial run (baseline), the main program was able to create a new instance of CallerClassA and execute its method successfully; including successful class loading of the referencing class ReferencingClassA. ## Problem reproduction run (with removal of ReferencingClassA.jar) ../bin>java -classpath CallerClassA.jar;MainProgram.jar org.ph.javaee.training1.NoClassDefFoundErrorSimulator java.lang.NoClassDefFoundError Simulator - Training 1 Author: Pierre-Hugues Charbonneau http://javaeesupportpatterns.blogspot.com Current ClassLoader chain: ----------------------------------------------------------------- sun.misc.Launcher$ExtClassLoader@17c1e333 --- delegation --- sun.misc.Launcher$AppClassLoader@214c4ac9 **Current ClassLoader** ----------------------------------------------------------------- Classloading of org.ph.javaee.training1.CallerClassA in progress... ----------------------------------------------------------------- sun.misc.Launcher$ExtClassLoader@17c1e333 --- delegation --- sun.misc.Launcher$AppClassLoader@214c4ac9 **Current ClassLoader** ----------------------------------------------------------------- Creating a new instance of org.ph.javaee.training1.CallerClassA... Exception in thread "main" java.lang.NoClassDefFoundError: org/ph/javaee/training1/ReferencingClassA at org.ph.javaee.training1.CallerClassA.doSomething(CallerClassA.java:25) at org.ph.javaee.training1.NoClassDefFoundErrorSimulator.main(NoClassDefFoundErrorSimulator.java:28) Caused by: java.lang.ClassNotFoundException: org.ph.javaee.training1.ReferencingClassA at java.net.URLClassLoader$1.run(Unknown Source) at java.net.URLClassLoader$1.run(Unknown Source) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(Unknown Source) at java.lang.ClassLoader.loadClass(Unknown Source) at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source) at java.lang.ClassLoader.loadClass(Unknown Source) ... 2 more What happened? The removal of the ReferencingClassA.jar, containing ReferencingClassA, did prevent the current class loader to locate this referencing Java class at runtime leading to ClassNotFoundException and NoClassDefFoundError. This is the typical Exception that you will get if you omit JAR file(s) from your Java start-up classpath or within an EAR / WAR for Java EE related applications. ClassLoader view Now let’s review the ClassLoader chain so you can properly understand this problem case. As you saw from the Java program output logging, the following Java ClassLoaders were found: Classloading of org.ph.javaee.training1.CallerClassA in progress... ----------------------------------------------------------------- sun.misc.Launcher$ExtClassLoader@17c1e333 --- delegation --- sun.misc.Launcher$AppClassLoader@214c4ac9 **Current ClassLoader** ----------------------------------------------------------------- ** Please note that the Java bootstrap class loader is responsible to load the core JDK classes and is written in native code ** ## sun.misc.Launcher$AppClassLoader This is the system class loader responsible to load our application code found from the Java classpath specified at start-up. ##sun.misc.Launcher$ExtClassLoader This is the extension class loader responsible to load code in the extensions directories (/lib/ext, or any other directory specified by the java.ext.dirs system property). As you can see from the Java program logging output, the extension class loader is the actual super parent of the system class loader. Our sample Java program was loaded at the system class loader level. Please note that this class loader chain is very simple for this problem case since we did not create child class loaders at this point. This will be covered in future articles. Recommendations and resolution strategies Now find below my recommendations and resolution strategies for NoClassDefFoundError problem case 1: - Review the java.lang.NoClassDefFoundError error and identify the missing Java class - Verify and locate the missing Java class from your compile / build environment - Determine if the missing Java class is from your application code, third part API or even the Java EE container itself. Verify where the missing JAR file(s) is / are expected to be found - Once found, verify your runtime environment Java classpath for any typo or missing JAR file(s) - If the problem is triggered from a Java EE application, perform the same above steps but verify the packaging of your EAR / WAR file for missing JAR and other library file dependencies such as MANIFEST Please feel free to post any question or comment. The part 3 will be available shortly.

June 16, 2012

by Pierre - Hugues Charbonneau

· 173,225 Views · 1 Like

How to Identify and Resolve Hibernate N+1 SELECT's Problems

Let’s assume that you’re writing code that’d track the price of mobile phones. Now, let’s say you have a collection of objects representing different Mobile phone vendors (MobileVendor), and each vendor has a collection of objects representing the PhoneModels they offer. To put it simple, there’s exists a one-to-many relationship between MobileVendor:PhoneModel. MobileVendor Class Class MobileVendor{ long vendor_id; PhoneModel[] phoneModels; ... } Okay, so you want to print out all the details of phone models. A naive O/R implementation would SELECT all mobile vendors and then do N additional SELECTs for getting the information of PhoneModel for each vendor. -- Get all Mobile Vendors SELECT * FROM MobileVendor; -- For each MobileVendor, get PhoneModel details SELECT * FROM PhoneModel WHERE MobileVendor.vendorId=? As you see, the N+1 problem can happen if the first query populates the primary object and the second query populates all the child objects for each of the unique primary objects returned. Resolve N+1 SELECTs problem (i) HQL fetch join "from MobileVendor mobileVendor join fetch mobileVendor.phoneModel PhoneModels" Corresponding SQL would be (assuming tables as follows: t_mobile_vendor for MobileVendor and t_phone_model for PhoneModel) SELECT * FROM t_mobile_vendor vendor LEFT OUTER JOIN t_phone_model model ON model.vendor_id=vendor.vendor_id (ii) Criteria query Criteria criteria = session.createCriteria(MobileVendor.class); criteria.setFetchMode("phoneModels", FetchMode.EAGER); In both cases, our query returns a list of MobileVendor objects with the phoneModels initialized. Only one query needs to be run to return all the PhoneModel and MobileVendor information required.

June 13, 2012

by Singaram Subramanian

· 200,933 Views · 13 Likes

Get TeamCity Artifacts Using HTTP, Ant, Gradle and Maven

In how many ways can you retrieve TeamCity artifacts? I say plenty to choose from! If you’re in a world of Java build tools then you can use plain HTTP request, Ant + Ivy, Gradle and Maven to download and use binaries produced by TeamCity build configurations. How? Read on. Build Configuration “id” Before you retrieve artifacts of any build configuration you need to know its "id" which can be seen in a browser when corresponding configuration is browsed. Let’s take IntelliJ IDEA Community Edition project hosted at teamcity.jetbrains.com as an example. Its “Community Dist” build configuration provides a number of artifacts which we’re going to play with. And as can be seen on the screenshot below, its "id" is "bt343". HTTP Anonymous HTTP access is probably the easiest way to fetch TeamCity artifacts, the URL to do so is: http://server/guestAuth/repository/download/// Fot this request to work 3 parameters need to be specified: btN Build configuration "id", as mentioned above. buildNumber Build number or one of predefined constants: "lastSuccessful", "lastPinned", or "lastFinished". For example, you can download periodic IDEA builds from last successful TeamCity execution. artifactName Name of artifact like "ideaIC-118.SNAPSHOT.win.zip". Can also take a form of "artifactName!archivePath" for reading archive’s content, like IDEA’s build file. You can get a list of all artifacts produced in a certain build by requesting a special "teamcity-ivy.xml" artifact generated by TeamCity. Ant + Ivy All artifacts published to TeamCity are accompanied by "teamcity-ivy.xml" Ivy descriptor, effectively making TeamCity an Ivy repository. The code below downloads "core/annotations.jar" from IDEA distribution to "download/ivy" directory: "ivyconf.xml" "ivy.xml" "build.xml" Gradle Identically to Ivy example above it is fairly easy to retrieve TeamCity artifacts with Gradle due to its built-in Ivy support. In addition to downloading the same jar file to "download/gradle" directory with a custom Gradle task let’s use it as "compile" dependency for our Java class, importing IDEA’s @NotNull annotation: "Test.java" import org.jetbrains.annotations.NotNull; public class Test { private final String data; public Test ( @NotNull String data ){ this.data = data; } } "build.gradle" apply plugin: 'java' repositories { ivy { ivyPattern 'http://teamcity.jetbrains.com/guestAuth/repository/download/[module]/[revision]/teamcity-ivy.xml' artifactPattern 'http://teamcity.jetbrains.com/guestAuth/repository/download/[module]/[revision]/[artifact](.[ext])' } } dependencies { compile ( 'org:bt343:lastSuccessful' ){ artifact { name = 'core/annotations' type = 'jar' } } } task copyJar( type: Copy ) { from configurations.compile into "${ project.projectDir }/download/gradle" } Maven The best way to use Maven with TeamCity is by setting up an Artifactory repository manager and its TeamCity plugin. This way artifacts produced by your builds are nicely deployed to Artifactory and can be served from there as from any other remote Maven repository. However, you can still use TeamCity artifacts in Maven without any additional setups. "ivy-maven-plugin" bridges two worlds allowing you to plug Ivy resolvers into Maven’s runtime environment, download dependencies required and add them to corresponding "compile" or "test" scopes. Let’s compile the same Java source from the Gradle example but using Maven this time. "pom.xml" 4.0.0 com.test maven jar 0.1-SNAPSHOT [${project.groupId}:${project.artifactId}:${project.version}] Ivy Maven plugin example com.github.goldin ivy-maven-plugin 0.2.5 get-ivy-artifacts ivy initialize ${project.basedir}/ivyconf.xml ${project.basedir}/ivy.xml ${project.basedir}/download/maven compile When this plugin runs it resolves IDEA annotations artifact using the same "ivyconf.xml" and "ivy.xml" files we’ve seen previously, copies it to "download/maven" directory and adds to "compile" scope so our Java sources can compile. GitHub Project All examples demonstrated are available in my GitHub project. Feel free to clone and run it: git clone git://github.com/evgeny-goldin/teamcity-download-examples.git cd teamcity-download-examples chmod +x run.sh dist/ant/bin/ant gradlew dist/maven/bin/mvn ./run.sh Resources The links below can provide you with more details: TeamCity – Patterns For Accessing Build Artifacts TeamCity – Accessing Server by HTTP TeamCity – Configuring Artifact Dependencies Using Ant Build Script Gradle – Ivy repositories "ivy-maven-plugin" That’s it, you’ve seen it – TeamCity artifacts are perfectly accessible using either of 4 ways: direct HTTP access, Ant + Ivy, Gradle or Maven. Which one do you use? Let me know!

June 5, 2012

by Evgeny Goldin

· 10,645 Views

Spring Integration - Robust Splitter Aggregator

A Robust Splitter Aggregator Design Strategy - Messaging Gateway Adapter Pattern What do we mean by robust? In the context of this article, robustness refers to an ability to manage exception conditions within a flow without immediately returning to the caller. In some processing scenarios n of m responses is good enough to proceed to conclusion. Example processing scenarios that typically have these tendencies are: Quotations for finance, insurance and booking systems. Fan-out publishing systems. Why do we need Robust Splitter Aggregator Designs? First and foremost an introduction to a typical Splitter Aggregator pattern maybe necessary. The Splitter is an EIP pattern that describes a mechanism for breaking composite messages into parts in order that they can be processed individually. A Router is an EIP pattern that describes routing messages into channels - aiming them at specific messaging endpoints. The Aggregator is an EIP pattern that collates and stores a set of messages that belong to a group, and releases them when that group is complete. Together, those three EIP constructs form a powerful mechanism for dividing processing into distinct units of work. Spring Integration (SI) uses the same pattern terminology as EIP and so readers of that methodology will be quite comfortable with Spring Integration Framework constructs. The SI Framework allows significant customisations of all three of those constructs and furthermore, by simply using asynchronous channels as you would in any other multi-threaded configuration, allows those units of work to be executed in parallel. An interesting challenge working with SI Splitter Aggregator designs is building appropriately robust flows that operate predictably in a number of invocation scenarios. A simple splitter aggregator design can be used in many circumstances and operate without heavy customisation of the SI constructs. However, some service requirements demand a more robust processing strategy and therefore more complex configuration. The following sections describe and show what a Simple Splitter Aggregator design actually looks like, the type of processing your design must be able to deal with and then goes on to suggest candidate solutions for more robust processing. A Simple Splitter Aggregator Design The following Splitter Aggregator design shows a simple flow that receives document request messages into messaging gateway, splits the message into two processing routes and then aggregates the response. Note that the diagram has been built from EIP constructs in OmniGraffle rather than being an Integration Graph view from within STS; the channels are missing from the diagram for the sake of brevity. SI Constructs in detail: Messaging Gateways - there are three messaging gateways. A number of configurations are available for gateway specifications but significantly can return business objects, exceptions and nulls (following a timeout). The gateway to the far left is the service gateway for which we are defining the flow. The other two gateways, between the Router and Aggregator, are external systems that will be providing responses to business questions that our flow generates. The Splitter - a single splitter exists and is responsible for consuming the document message and producing a collection of messages for onward processing. The Java signature for the, most often, custom Splitter specifies a single object argument and a collection for return. The Recipient List Router - a single router exists, any appropriate router can be used, chose the one that closely matches your requirements - you can easily route by expression or payload type. The primary purpose of the router is route a collection of messages supplied by the splitter. This is a pretty typical Splitter Aggregator configuration. Aggregator - a single construct that is responsible for collecting messages together in a group in order that further processing can take place on the gateway responses. Although the Aggregator can be configured with attributes and bean definitions to provide alternative grouping and release strategies, most often the default aggregation strategy suffices. Interesting Aspects of Splitter Aggregator Operation Gateway - the inbound gateway, the one on the far left, may or may not have an error handling bean reference defined on it. If it does then that bean will have an opportunity to handle an exceptions thrown within the flow to the right of that gateway. If not, any exception will be thrown straight out of the gateway. Gateway - an optional default-reply-timeout can be set on each of the gateways, there are significant implications for setting this value, ensure that they're well understood. An expired timeout will result in a null being returned from the gateway. This is the very same condition that can lead to a thread getting parked if an upstream gateway also has no default-reply-timeout set. Splitter Input Channel - this can be a simple direct channel or a direct channel with a dispatcher defined on it. If the channel has a dispatcher specified the flow downstream of this point will be asynchronous, multi-threaded. This also changes the upstream gateway semantics as it usually means that an otherwise impotent default-reply-timeout becomes active. Splitter - the splitter must return a single object. The single object returned by the splitter is a collection, a java.util.List. The SI framework will take each member of that list and feed it into the output-channel of the Splitter - as with this example, usually straight into a router. The contract for Splitter List returns is as its use in Java - it may contain zero, one or more elements. If the splitter returns an empty list it's unlikely that the router will have any work to do and so the flow invocation will be complete. However, if the List contains one item, the SI framework will extract that item from the list and push it into the router, if this gets routed successfully, the flow will continue. Router - the router will simply route messages into one of two gateways in this example. Gateways - the two gateways that are used between the Splitter and Aggregator are interesting. In this example I have used the generic gateway EIP pattern to represent a message sub-system but not defined it explicitly - we could use an HTTP outbound gateway, another SI flow or any other external system. Of course, for each of those sub-systems, a number of responses is possible. Depending on the protocol and external system, the message request may fail to send, the response fail to arrive, a long running process invoked, a network error or timeout or a general processing exception. Aggregator - the single aggregator will wait for a number of responses depending on what's been created by the Splitter. In the case where the splitter return list is empty the Aggregator will not get invoked. In the case where the Splitter return list has one entry, the aggregator will be waiting for one gateway response to complete the group. In the case where the Splitter list has n entries the Aggregator will be waiting for n entries to complete the group. Custom correlation strategies, release strategies and message stores can be injected amongst a set of rich configuration aspects. Interesting Aspects of Simple Splitter Aggregator Operation The primary deciding factor for establishing whether this type of simple gateway is adequate for requirements is to understand what happens in the event of failure. If any exception occurring in your SI flow results in the flow invocation being abandoned and that suits your requirements, there's no need to read any further. If, however, you need to continue processing following failure in one of the gateways the remainder of this article may be of more interest. Exceptions, from any source, generated between the splitter and aggregator, will result in an empty or partial group being discarded by the Aggregator. The exception will propagate back to the closest upstream gateway for either handling by a custom bean or re-throwing by the gateway. Note that a custom release strategy on the Aggregator is difficult to use and especially so alongside timeouts but would not help in this case as the exception will propagate back to the leftmost gateway before the aggregator is invoked. It's also possible to configure exception handlers on the innermost gateways, the exception message could be caught but how do you route messages from a custom exception handler into the aggregator to complete the group, inject the aggregator channel definition into the custom exception handler? This is a poor approach and would involve unpacking an exception message payload, copying the original message headers into a new SI message and then adding the original payload - only four or five lines of code, but dirty it is. Following exception generation, exception messages (without modification) cannot be routed into an Aggregator to complete the group. The original message, the one that contains the correlation and sequence ids for the group and group position are buried inside the SI messages exception payload. If processing needs to continue following exception generation, it should be clear that in order to continue processing, the following must take place: the aggregation group needs to be completed, any exceptions must be caught and handled before getting back to the closet upstream gateway, the correlation and sequence identifiers that allow group completion in the aggregator are buried within the exception message payload and will require extraction and setting on the message that's bound for the aggregator A More Robust Solution - Messaging Gateway Adapter Pattern Dealing with exceptions and null returns from gateways naturally leads to a design that implements a wrapper around the messaging gateway. This affords a level of control that would otherwise be very difficult to establish. This adapter technique allows all returns from messaging gateways to be caught and processed as the messaging gateway is injected into the Service Activator and called directly from that. The messaging gateway no longer responds to the aggregator directly, it responds to a custom Java code Spring bean configured in the Service Activator namespace definition. As expected, processing that does not undergo exception will continue as normal. Those flows that experience exception conditions or unexpected or missing responses from messaging gateways need to process messages in such as way that message groups bound for aggregation can be completed. If the Service Activator were to allow the exception to be propagated outside of it's backing bean, the group would not complete. The same applies not just for exceptions but any return object that does not carry the prerequisite group correlation id and sequence headers - this is where the adaptation is applied. Exception messages or null responses from messaging gateways are caught and handled as shown in the following example code: import com.l8mdv.sample.*; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import org.springframework.integration.Message; import org.springframework.integration.MessageHeaders; import org.springframework.integration.support.MessageBuilder; import org.springframework.util.Assert; public class AvsServiceImpl implements AvsService { private static final Logger logger = LoggerFactory.getLogger(AvsServiceImpl.class); public static final String MISSING_MANDATORY_ARG = "Mandatory argument is missing."; private AvsGateway avsGateway; public AvsServiceImpl(final AvsGateway avsGateway) { this.avsGateway = avsGateway; } public Message service(Message message) { Assert.notNull(message, MISSING_MANDATORY_ARG); Assert.notNull(message.getPayload(), MISSING_MANDATORY_ARG); MessageHeaders requestMessageHeaders = message.getHeaders(); Message responseMessage = null; try { logger.debug("Entering AVS Gateway"); responseMessage = avsGateway.send(message); if (responseMessage == null) responseMessage = buildNewResponse(requestMessageHeaders, AvsResponseType.NULL_RESULT); logger.debug("Exited AVS Gateway"); return responseMessage; } catch (Exception e) { return buildNewResponse(responseMessage, requestMessageHeaders, AvsResponseType.EXCEPTION_RESULT, e); } } private Message buildNewResponse(MessageHeaders requestMessageHeaders, AvsResponseType avsResponseType) { Assert.notNull(requestMessageHeaders, MISSING_MANDATORY_ARG); Assert.notNull(avsResponseType, MISSING_MANDATORY_ARG); AvsResponse avsResponse = new AvsResponse(); avsResponse.setError(avsResponseType); return MessageBuilder.withPayload(avsResponse) .copyHeadersIfAbsent(requestMessageHeaders).build(); } private Message buildNewResponse(Message responseMessage, MessageHeaders requestMessageHeaders, AvsResponseType avsResponseType, Exception e) { Assert.notNull(responseMessage, MISSING_MANDATORY_ARG); Assert.notNull(responseMessage.getPayload(), MISSING_MANDATORY_ARG); Assert.notNull(requestMessageHeaders, MISSING_MANDATORY_ARG); Assert.notNull(avsResponseType, MISSING_MANDATORY_ARG); Assert.notNull(e, MISSING_MANDATORY_ARG); AvsResponse avsResponse = new AvsResponse(); avsResponse.setError(avsResponseType, responseMessage.getPayload(), e); return MessageBuilder.withPayload(avsResponse) .copyHeadersIfAbsent(requestMessageHeaders).build(); } } Notice the last line of the catch clause of the exception handling block. This line of code copies the correlation and sequence headers into the response message, this is mandatory if the aggregation group is going to be allowed to complete and will always be necessary following an exception as shown here. Consequences of using this technique There's no doubt that introducing a Messaging Gateway Adapter into SI config makes the configuration more complex to read and follow. The key factor here is that there is no longer a linear progression through the configuration file. This because the Service Activator must forward reference a Gateway or a Gateway defined before it's adapting Service Activator - in both cases the result is the same. Resources Note:- The design for the software that drove creation of this meta-pattern was based on a requirement that a number of external risk assessment services would be accessed by a single, central Risk Assessment Service. In order to satisfy clients of the service, invocation had to take place in parallel and continue despite failure in any one of those external services. This requirement lead to the design of the Messaging Gateway Adapter Pattern for the project. Spring Integration Reference Manual The solution approach for this problem was discussed directly with Mark Fisher (SpringSource) in the context of building Risk Assessment flows for a large US financial institution. Although the configuration and code is protected by NDA and copyright, it's acceptable to express the design intention and similar code in this article.

June 3, 2012

by Matt Vickery

· 22,989 Views