Searchable Documents? Yes You Can. Another Reason to Choose AsciiDoc
Join the DZone community and get the full member experience.
Join For Free
Let's add required dependencies:
<dependencies> <dependency> <groupId>junit</groupId> <artifactId>junit</artifactId> <version>4.11</version> <scope>test</scope> </dependency> <dependency> <groupId>com.googlecode.lambdaj</groupId> <artifactId>lambdaj</artifactId> <version>2.3.3</version> </dependency> <dependency> <groupId>org.elasticsearch</groupId> <artifactId>elasticsearch</artifactId> <version>0.90.1</version> </dependency> <dependency> <groupId>org.asciidoctor</groupId> <artifactId>asciidoctor-java-integration</artifactId> <version>0.1.3</version> </dependency> </dependencies>
node = nodeBuilder().local(true).node();
Next step is parse AsciiDoc document header, read its content and convert them into a json document.
An example of json document stored in Elasticsearch can be:
{ "title":"Asciidoctor Maven plugin 0.1.2 released!", "authors":[ { "author":"Jason Porter", "email":"example@mail.com" } ], "version":null, "content":"= Asciidoctor Maven plugin 0.1.2 released!.....", "tags":[ "release", "plugin" ] }
And for converting an AsciiDoc File to a json document we are going to useXContentBuilder class which is provided by ElasticsearchJava API to create jsondocuments programmatically.
package com.lordofthejars.asciidoctor; import static org.elasticsearch.common.xcontent.XContentFactory.*; import java.io.File; import java.io.FileInputStream; import java.io.FileNotFoundException; import java.io.IOException; import java.util.List; import org.asciidoctor.Asciidoctor; import org.asciidoctor.Author; import org.asciidoctor.DocumentHeader; import org.asciidoctor.internal.IOUtils; import org.elasticsearch.common.xcontent.XContentBuilder; import ch.lambdaj.function.convert.Converter; public class AsciidoctorFileJsonConverter implements Converter<File, XContentBuilder> { private Asciidoctor asciidoctor; public AsciidoctorFileJsonConverter() { this.asciidoctor = Asciidoctor.Factory.create(); } public XContentBuilder convert(File asciidoctor) { DocumentHeader documentHeader = this.asciidoctor.readDocumentHeader(asciidoctor); XContentBuilder jsonContent = null; try { jsonContent = jsonBuilder() .startObject() .field("title", documentHeader.getDocumentTitle()) .startArray("authors"); Author mainAuthor = documentHeader.getAuthor(); jsonContent.startObject() .field("author", mainAuthor.getFullName()) .field("email", mainAuthor.getEmail()) .endObject(); List<Author> authors = documentHeader.getAuthors(); for (Author author : authors) { jsonContent.startObject() .field("author", author.getFullName()) .field("email", author.getEmail()) .endObject(); } jsonContent.endArray() .field("version", documentHeader.getRevisionInfo().getNumber()) .field("content", readContent(asciidoctor)) .array("tags", parseTags((String)documentHeader.getAttributes().get("tags"))) .endObject(); } catch (IOException e) { throw new IllegalArgumentException(e); } return jsonContent; } private String[] parseTags(String tags) { tags = tags.substring(1, tags.length()-1); return tags.split(", "); } private String readContent(File content) throws FileNotFoundException { return IOUtils.readFull(new FileInputStream(content)); } }
import static ch.lambdaj.Lambda.convert; //.... private void populateData(Client client) throws IOException { List<File> asciidoctorFiles = new ArrayList<File>() {{ add(new File("target/test-classes/java_release.adoc")); add(new File("target/test-classes/maven_release.adoc")); }}; List<XContentBuilder> jsonDocuments = convertAsciidoctorFilesToJson(asciidoctorFiles); for (int i=0; i < jsonDocuments.size(); i++) { client.prepareIndex("docs", "asciidoctor", Integer.toString(i)).setSource(jsonDocuments.get(i)).execute().actionGet(); } client.admin().indices().refresh(new RefreshRequest("docs")).actionGet(); } private List<XContentBuilder> convertAsciidoctorFilesToJson(List<File> asciidoctorFiles) { return convert(asciidoctorFiles, new AsciidoctorFileJsonConverter()); }
It is important to note that the first part of the algorithm is converting all our AsciiDocfiles (in our case two) to XContentBuilder instances by using previous converter class and the method convert of Lambdaj project.
If you want you can take a look to both documents used in this example in https://github.com/asciidoctor/asciidoctor.github.com/blob/develop/news/asciidoctor-java-integration-0-1-3-released.adoc and https://github.com/asciidoctor/asciidoctor.github.com/blob/develop/news/asciidoctor-maven-plugin-0-1-2-released.adoc.
Next part is inserting documents inside one index. This is done by using prepareIndexmethod, which requires an index name (docs), an index type (asciidoctor), and the idof the document being inserted. Then we call setSource method which transforms theXContentBuilder object to json, and finally by calling execute().actionGet(), data is sent to database.
The final step is only required because we are using an embedded instance ofElasticsearch (in production this part should not be required), which refresh the indexes by calling refresh method.
After that point we can start querying Elasticsearch for retrieving information from our AsciiDoc documents.
Let's start with very simple example, which returns all documents inserted:
SearchResponse response = client.prepareSearch().execute().actionGet();
Next we are going to search for all documents that has been written by Alex Sotowhich in our case is one.
import static org.elasticsearch.index.query.QueryBuilders.matchQuery; //.... QueryBuilder matchQuery = matchQuery("author", "Alex Soto"); QueryBuilder matchQuery = matchQuery("author", "Alexander Soto");
More queries, how about finding documents written by someone who is called Alex, but not Soto.
import static org.elasticsearch.index.query.QueryBuilders.fieldQuery; //.... QueryBuilder matchQuery = fieldQuery("author", "+Alex -Soto");
Also you can find all documents which contains the word released on title.
import static org.elasticsearch.index.query.QueryBuilders.matchQuery; //.... QueryBuilder matchQuery = matchQuery("title", "released");
And finally let's find all documents that talks about 0.1.2 release, in this case only one document talks about it, the other one talks about 0.1.3.
QueryBuilder matchQuery = matchQuery("content", "0.1.2");
Now we only have to send the query to Elasticsearch database, which is done by using prepareSearch method.
SearchResponse response = client.prepareSearch("docs") .setTypes("asciidoctor") .setQuery(matchQuery) .execute() .actionGet(); SearchHits hits = response.getHits(); for (SearchHit searchHit : hits) { System.out.println(searchHit.getSource().get("content")); }
So in this post we have seen how to index documents using Elasticsearch, how to get some important information from AsciiDoc files using Asciidoctor-java-integration project, and finally how to execute some queries to inserted documents. Of course there are more kind of queries in Elasticsearch, but the intend of this post wasn't to explore all possibilities of Elasticsearch.
Published at DZone with permission of Alex Soto, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.
Comments