Performance Resources

Service Discovery in a Microservices Architecture

Here's a quick overview of building apps using microservices: specifically, an exploration of service discovery, including why to use it and how the client-side and server-side approaches differ.

October 30, 2015

by Patrick Nommensen

· 129,167 Views · 16 Likes

Failure Detection and Association Teardown With SCTP

It's time to see some failure detection procedures and finally how an SCTP association is closed.

October 20, 2015

by Tsvetomir Dimitrov

· 7,575 Views · 3 Likes

Integrating Syslog With Kinesis: Anticipating Use of the Firehose

On the heels of the Kinesis Firehose announcement, more people are going to be looking to integrate Kinesis with logging systems. Here is one take on solving that problem that integrates syslog-ng with Kinesis.

October 16, 2015

by Brian O'Neill

· 9,927 Views · 5 Likes

Jenkins, JaCoCo, and SonarQube Integration With Maven

Jenkins, SonarQube, and JaCoCo are excellent tools for deploying applications. Check out these awesome Maven integrations.

October 11, 2015

by Tolga Tunca

· 156,549 Views · 16 Likes

Microservices and Kerberos Authentication

How to use Kerberos authentication with microservice architectures and API gateways.

October 6, 2015

by Jethro Bakker

· 18,506 Views · 7 Likes

Deal With Multi-Tenant Data in Solr

Different techniques can be used to handle multi-tenant data in Solr. This article discusses routing techniques you can use depending on the size and number of shards.

October 2, 2015

by Rafał Kuć

· 6,968 Views · 6 Likes

Multiplexing: TCP vs HTTP2

And now the question you've all been waiting for: can you use TCP AND HTTP2? Read on...

September 23, 2015

by Lori MacVittie

· 8,294 Views · 5 Likes

SolrCloud Rebalance API

An innovative approach that helps with an effective index and a dynamic config management system for massive multi-tenant search infrastructure in SolrCloud.

August 28, 2015

by Radu Gheorghe

· 4,673 Views · 2 Likes

How to Monitor TextView Changes in Android

In this tutorial, we will see how to monitor the text changes in Android TextView or EditText.

August 7, 2015

by Nilanchala Panigrahy

· 7,637 Views

Webpack Lazy Loading On Rails With CDN Support

Webpack is the best module bundler I've ever used. Just this week I used it to reduce the JS footprint of an app from 906KB to 87KB for mobile visitors. An 800KB difference! Webpack's core premise is that you can require('./foo') your JavaScripts. That sea of

July 3, 2015

by Swizec Teller

· 12,852 Views

Learning Spring-Cloud - Writing a Microservice

Continuing my Spring-Cloud learning journey: earlier I covered how to write the infrastructure components of a typical Spring-Cloud and Netflix OSS based micro-services environment, in this specific instance two critical components, Eureka to register and discover services and Spring Cloud Configuration to maintain a centralized repository of configuration for a service. Here I will show how I developed two dummy micro-services: a simple "pong" service, and a "ping" service which uses the "pong" service.

Sample-Pong microservice

The endpoint handling the "ping" requests is a typical Spring MVC based endpoint:

    import org.springframework.beans.factory.annotation.Value;
    import org.springframework.hateoas.Resource;
    import org.springframework.web.bind.annotation.RequestBody;
    import org.springframework.web.bind.annotation.RequestMapping;
    import org.springframework.web.bind.annotation.RequestMethod;
    import org.springframework.web.bind.annotation.RestController;

    // Message and MessageAcknowledgement are the sample's own domain classes.
    @RestController
    public class PongController {

        // Sourced from the central configuration server.
        @Value("${reply.message}")
        private String message;

        @RequestMapping(value = "/message", method = RequestMethod.POST)
        public Resource<MessageAcknowledgement> pongMessage(@RequestBody Message input) {
            return new Resource<>(
                    new MessageAcknowledgement(input.getId(), input.getPayload(), message));
        }
    }

It gets a message and responds with an acknowledgement. Here the service uses the Configuration server to source the "reply.message" property. So how does the "pong" service find the configuration server? There are potentially two ways: directly, by specifying the location of the configuration server, or by finding the Configuration server via Eureka. I am used to an approach where Eureka is considered the source of truth, so in this spirit I am using Eureka to find the Configuration server.
Spring Cloud makes this entire flow very simple. All it requires is a "bootstrap.yml" property file with entries along these lines:

    ---
    spring:
      application:
        name: sample-pong
      cloud:
        config:
          discovery:
            enabled: true
            serviceId: SAMPLE-CONFIG

    eureka:
      instance:
        nonSecurePort: ${server.port:8082}
      client:
        serviceUrl:
          defaultZone: http://${eureka.host:localhost}:${eureka.port:8761}/eureka/

The location of Eureka is specified through the "eureka.client.serviceUrl" property, and "spring.cloud.config.discovery.enabled" is set to "true" to specify that the configuration server is discovered via the specified Eureka server.

Just a note: this means that Eureka and the Configuration server have to be completely up before trying to bring up the actual services. They are the prerequisites, and the underlying assumption is that the infrastructure components are available at application boot time.

The Configuration server has the properties for the "sample-pong" service. This can be validated by using the Config server's endpoint, http://localhost:8888/sample-pong/default (8888 is the port I had specified for the server endpoint), which should respond with content along these lines:

    {
      "name": "sample-pong",
      "profiles": [
        "default"
      ],
      "label": "master",
      "propertySources": [
        {
          "name": "classpath:/config/sample-pong.yml",
          "source": {
            "reply.message": "Pong"
          }
        }
      ]
    }

As can be seen, the "reply.message" property from this central configuration server will be used by the pong service as the acknowledgement message.

Now, to set up this endpoint as a service, all that is required is a Spring Boot based entry point along these lines:

    @SpringBootApplication
    @EnableDiscoveryClient
    public class PongApplication {
        public static void main(String[] args) {
            SpringApplication.run(PongApplication.class, args);
        }
    }

and that completes the code for the "pong" service.

Sample-ping micro-service

So now onto a consumer of the "pong" micro-service, very imaginatively named the "ping" micro-service.
Spring-Cloud and Netflix OSS offer a lot of options to invoke endpoints on Eureka registered services. To summarize the options that I had:

1. Use the raw Eureka DiscoveryClient to find the instances hosting a service and make calls using Spring's RestTemplate.
2. Use Ribbon, a client-side load balancing solution which can use Eureka to find service instances.
3. Use Feign, which provides a declarative way to invoke a service call. It internally uses Ribbon.

I went with Feign. All that is required is an interface which shows the contract to invoke the service:

    package org.bk.consumer.feign;

    import org.bk.consumer.domain.Message;
    import org.bk.consumer.domain.MessageAcknowledgement;
    import org.springframework.cloud.netflix.feign.FeignClient;
    import org.springframework.http.MediaType;
    import org.springframework.web.bind.annotation.RequestBody;
    import org.springframework.web.bind.annotation.RequestMapping;
    import org.springframework.web.bind.annotation.RequestMethod;
    import org.springframework.web.bind.annotation.ResponseBody;

    @FeignClient("samplepong")
    public interface PongClient {

        @RequestMapping(method = RequestMethod.POST, value = "/message",
                produces = MediaType.APPLICATION_JSON_VALUE,
                consumes = MediaType.APPLICATION_JSON_VALUE)
        @ResponseBody
        MessageAcknowledgement sendMessage(@RequestBody Message message);
    }

The annotation @FeignClient("samplepong") internally points to a Ribbon "named" client called "samplepong". This means that there has to be an entry in the property files for this named client. In my case I have these entries in my application.yml file:

    samplepong:
      ribbon:
        DeploymentContextBasedVipAddresses: sample-pong
        NIWSServerListClassName: com.netflix.niws.loadbalancer.DiscoveryEnabledNIWSServerList
        ReadTimeout: 5000
        MaxAutoRetries: 2

The most important entry here is "samplepong.ribbon.DeploymentContextBasedVipAddresses", which points to the "pong" service's Eureka registration address, through which Ribbon discovers the service instances.
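To make the idea of client-side load balancing a little more concrete, here is a minimal, language-neutral sketch in Python. It is not Ribbon's or Eureka's API: the `REGISTRY` dictionary, the `RoundRobinBalancer` class, and the instance addresses are all hypothetical stand-ins, and plain round-robin is only one of several policies a real balancer might apply.

```python
import itertools

# Hypothetical in-memory stand-in for a service registry such as Eureka:
# service name -> list of "host:port" instances.
REGISTRY = {
    "sample-pong": ["pong-1:8082", "pong-2:8082", "pong-3:8082"],
}


class RoundRobinBalancer:
    """Picks instances of one service in rotation (one possible policy)."""

    def __init__(self, registry, service_name):
        instances = registry.get(service_name, [])
        if not instances:
            raise LookupError(f"no instances registered for {service_name!r}")
        # Copy the list so later registry changes do not affect the rotation.
        self._cycle = itertools.cycle(list(instances))

    def choose(self):
        """Return the next instance address in the rotation."""
        return next(self._cycle)


balancer = RoundRobinBalancer(REGISTRY, "sample-pong")
picks = [balancer.choose() for _ in range(4)]
print(picks)
```

The caller asks the balancer for an address before every request, so the choice of instance is made in the client rather than in a central proxy.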
The rest of the application is a routine Spring Boot application. I have exposed this service call behind Hystrix, which guards against service call failures and essentially wraps around this FeignClient:

    package org.bk.consumer.service;

    import com.netflix.hystrix.contrib.javanica.annotation.HystrixCommand;
    import org.bk.consumer.domain.Message;
    import org.bk.consumer.domain.MessageAcknowledgement;
    import org.bk.consumer.feign.PongClient;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.beans.factory.annotation.Qualifier;
    import org.springframework.stereotype.Service;

    @Service("hystrixPongClient")
    public class HystrixWrappedPongClient implements PongClient {

        @Autowired
        @Qualifier("pongClient")
        private PongClient feignPongClient;

        @Override
        @HystrixCommand(fallbackMethod = "fallBackCall")
        public MessageAcknowledgement sendMessage(Message message) {
            return this.feignPongClient.sendMessage(message);
        }

        // Invoked by Hystrix when sendMessage fails.
        public MessageAcknowledgement fallBackCall(Message message) {
            MessageAcknowledgement fallback = new MessageAcknowledgement(
                    message.getId(), message.getPayload(),
                    "FAILED SERVICE CALL! - FALLING BACK");
            return fallback;
        }
    }

"Boot"ing up

I have dockerized my entire set-up, so the simplest way to start up the set of applications is to first build the docker images for all of the artifacts this way:

    mvn clean package docker:build -DskipTests

and bring all of them up using the following command, the assumption being that both docker and docker-compose are available locally:

    docker-compose up

Assuming everything comes up cleanly, Eureka should show all the registered services at http://dockerhost:8761.

The UI of the ping application should be available at http://dockerhost:8080.

Additionally, a Hystrix dashboard should be available to monitor the requests to the "pong" app at http://dockerhost:8989/hystrix/monitor?stream=http%3A%2F%2Fsampleping%3A8080%2Fhystrix.stream

References

1. The code is available at my github location - https://github.com/bijukunjummen/spring-cloud-ping-pong-sample
2. Most of the code is heavily borrowed from the spring-cloud-samples repository - https://github.com/spring-cloud-samples
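For readers who do not work in Java, the fallback part of the Hystrix wrapper can be approximated in a few lines of Python. This is only a sketch under stated assumptions: `with_fallback`, `fallback_ack`, and the always-failing `send_message` are hypothetical names made up for illustration, and the sketch leaves out the timeouts, thread isolation, and circuit breaking that Hystrix itself provides.

```python
import functools


def with_fallback(fallback):
    """Decorator: if the wrapped call raises, return fallback(*args, **kwargs)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception:
                # Only the "fall back on failure" behaviour is modelled here.
                return fallback(*args, **kwargs)
        return wrapper
    return decorator


def fallback_ack(message):
    """Build the acknowledgement returned when the real call fails."""
    return {
        "id": message["id"],
        "payload": message["payload"],
        "reply": "FAILED SERVICE CALL! - FALLING BACK",
    }


@with_fallback(fallback_ack)
def send_message(message):
    # Hypothetical remote call that always fails, to exercise the fallback.
    raise ConnectionError("pong service unavailable")


ack = send_message({"id": "1", "payload": "ping"})
print(ack["reply"])
```

The caller still receives an acknowledgement object either way, which is the point of the pattern: a failing dependency degrades the answer instead of propagating an exception.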

July 1, 2015

by Biju Kunjummen

· 13,446 Views · 4 Likes

JBoss BPM Suite Quick Guide: Import External Data Models to BPM Project

You are working on a big project, developing rules, events and processes at your enterprise for mission critical business needs. Part of the requirements state that a certain business unit will be providing their data model for you to leverage. This data model will not be designed in the JBoss BPM Suite Data Modeler, but you need to have access to it while working on your rules, events and processes from the business central dashboard.

For this article we will be using the JBoss BPM Travel Agency demo project as a reference, with its current data model built externally to the JBoss BPM Suite business central. The external data model is called the acme-data-model and is found in the project directory.

This data model is built during installation and provides you with an object data model as a Java Archive (JAR) file, which is installed into the JBoss BPM Suite business central component by placing it into the following location:

    jboss-eap-6.4/standalone/deployments/business-central.war/WEB-INF/lib/acmeDataModel-1.0.jar

[Figure: Authoring --> Artifact repository.]

This way of deploying the data model means that it is available to all projects you work on in JBoss BPM Suite business central, something that might not always be preferable. What we need is a way to deploy external data models into JBoss BPM Suite and then selectively add them to projects as needed.

Within JBoss BPM Suite there is an Artifact Repository that is made just for this purpose. We can upload all our models through the business central dashboard UI and then pick and choose from the repository artifacts (your data model is one artifact) on a per project basis. This gives you absolute control over the models that a project can access.

[Figure: Choose external data model file.]
There are a few steps involved that we will take you through here to change the current installation of JBoss BPM Travel Agency: the acmeDataModel-1.0.jar file will be removed from the previously mentioned business central component, uploaded into the Artifact Repository, and added to the Special Trips Agency project. Here is how you can do it yourself:

- obtain and install the JBoss BPM Travel Agency demo project
- remove the current data model from the global business central application:

    $ rm ./target/jboss-eap-6.4/standalone/deployments/business-central.war/WEB-INF/lib/acmeDataModel-1.0.jar

[Figure: Upload external model jar file.]

- start the JBoss BPM Suite server after installation as stated in the installation instructions
- login to JBoss BPM Suite at http://localhost:8080/business-central with u: erics, p: bpmsuite1!
- go to AUTHORING --> ARTIFACT REPOSITORY
- go to UPLOAD --> CHOOSE FILE... --> projects/acme-data-model/target/acmeDataModel-1.0.jar --> click button to UPLOAD; this puts the external data model into the JBoss BPM Suite artifact repository

[Figure: Select dependencies to add to project.]

- go to AUTHORING --> PROJECT AUTHORING --> OPEN PROJECT EDITOR
- in project editor select GENERAL PROJECT SETTINGS --> DEPENDENCIES
- in dependencies select ADD FROM REPOSITORY --> in pop-up SELECT entry acmeDataModel-1.0.jar

This will result in the external data model being added only to the Special Trips Agency project; it is not available to other projects unless they add this same dependency from the JBoss BPM Suite artifact repository.

If you build and deploy the project and run it as described in the project instructions, you will find that the external data model is available and used by the various rules and process components that make up the JBoss BPM Travel Agency.

As a closing note, this works exactly the same for JBoss BRMS projects.

June 29, 2015

by Eric D. Schabell

CORE

· 2,777 Views · 1 Like

Launching Missiles With Haskell

Haskell advocates are fond of saying that a Haskell function cannot launch missiles without you knowing it. Pure functions have no side effects, so they can only do what they purport to do. In a language that does not enforce functional purity, calling a function could have arbitrary side effects, including launching missiles. But this cannot happen in Haskell.

The difference between pure functional languages and traditional imperative languages is not quite that simple in practice. Programming with pure functions is conceptually easy but can be awkward in practice. You could just pass each function the state of the world before the call, and it returns the state of the world after the call. It's unrealistic to pass a program's entire state as an argument each time, so you'd like to pass just the state that you need to, and have a convenient way of doing so. You'd also like the compiler to verify that you're only passing around a limited slice of the world. That's where monads come in.

Suppose you want a function to compute square roots and log its calls. Your square root function would have to take two arguments: the number to find the root of, and the state of the log before the function call. It would also return two values: the square root, and the updated log. This is a pain, and it makes function composition difficult.

Monads provide a sort of side-band for passing state around, things like our function call log. You're still passing around the log, but you can do it implicitly using monads. This makes it easier to call and compose two functions that do logging. It also lets the compiler check that you're passing around a log but not arbitrary state. A function that updates a log, for example, can affect the state of the log, but it can't do anything else. It can't launch missiles.

Once monads get large and complicated, it's hard to know what side effects they hide. Maybe they can launch missiles after all. You can only be sure by studying the source code.

Now how do you know that calling a C function, for example, doesn't launch missiles? You study the source code. In that sense Haskell and C aren't entirely different. The Haskell compiler does give you assurances that a C compiler does not. But ultimately you have to study source code to know what a function does and does not do.
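The square-root-with-a-log example can be sketched in a few lines. The snippet below is written in Python rather than Haskell, so it offers none of the compiler guarantees discussed above; it only illustrates the plumbing. The names `logged_sqrt`, `logged_double`, and `compose` are made up for this illustration. The first half threads the log by hand, and the second half hides that threading in a helper, which is roughly the convenience a writer-style monad provides.

```python
import math


def logged_sqrt(x, log):
    """Explicit style: take the log in, return (result, updated log)."""
    return math.sqrt(x), log + [f"sqrt({x})"]


def logged_double(x, log):
    """A second logging function, so there is something to compose with."""
    return 2 * x, log + [f"double({x})"]


def compose(*steps):
    """Chain logged functions so callers never touch the log directly."""
    def run(x):
        log = []
        for step in steps:
            x, log = step(x, log)
        return x, log
    return run


# Manual threading: every call site has to pass the log along.
r1, log1 = logged_sqrt(16.0, [])
r2, log2 = logged_double(r1, log1)

# Same computation with the threading hidden inside compose().
result, log = compose(logged_sqrt, logged_double)(16.0)
print(result, log)
```

Both routes produce the same value and the same log; the difference is only in who has to remember to pass the log around.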

June 28, 2015

by John Cook

· 11,711 Views · 1 Like

CryTek's CryEngine 3.8.1 Released: Updates Include Linux and VR Support

CryTek just released its CryEngine 3.8.1, an update packed with features. The CryEngine website dubs this the heftiest upgrade since debuting their Engine-as-a-Service in May. Among the many new features are virtual reality (VR) support, OpenGL compatibility, and Linux support.

One of the biggest trends in gaming is VR, and CryEngine 3.8.1 adds VR support. Initially, it's limited to the Oculus Rift, but chances are that as more headsets emerge and see adoption among both developers and gamers, compatibility will expand to support these as well. Epic Games' Unreal Engine also added VR support, along with Unity3D's most recent release. There's a neat VR demo on the CryTek website, which developers will surely want to check out.

Another significant change is the addition of Linux support. While Wine and PlayOnLinux have both helped many games run on a variety of Linux-based operating systems (OSes), native Linux support means easier use for developers. As more games add Linux compatibility, spearheaded by Steam's SteamOS, the CryEngine itself can now be run on Linux.

This latest update means that the CryEngine will join Unity3D, the Unreal Engine, and Source as a powerful game development engine with Linux and VR support. Virtual reality is seeing widespread adoption among the developer community, and Linux compatibility in gaming is a huge trend. While CryEngine 3.8.1's ability to run on Linux won't necessarily mean games developed with it will be compatible with the popular open source OS, it certainly makes it easier to ensure Linux support. There's also OpenGL support, which will aid cross-platform development.

The CryEngine has been used to create many games known for their gorgeous eye-candy. Notably, CryTek's aptly named Crysis series is built with the CryEngine, as are Ryse: Son of Rome, State of Decay, and Kingdom Come: Deliverance.

June 26, 2015

by Moe Long

CORE

· 1,025 Views

From Design to Execution with JBoss BPM Suite & Signavio Process Editor

Occasionally we are asked about JBoss BPM Suite integration with other products and layers in an enterprise's architecture. We have published articles talking about how to achieve this with various aspects such as:

- Microservices integration
- Data integration

Articles are one thing, but seeing is believing, so we have done a few webinars to show you live how to tackle integration:

- Data integration webinar
- PEX webinar

Along with these articles we have always published demo projects that give you a closer look and a chance to get hands on with these integration strategies:

- JBoss BPM Suite & JBoss Fuse Travel microservices story
- JBoss BPM Suite & JBoss Data Virtualization integration

[Figure: Imported Signavio Process Editor mortgage workflow.]

There is another integration story yet to be told, about how one can leverage other tooling together with JBoss BPM Suite. This article will introduce one such company, Signavio, that provides the Signavio Process Editor so "...you can start modeling and engaging your organization in improving operational efficiency through the development of optimal models..."

The following demo project provides a working example of how you can model an example mortgage process in Signavio Process Editor and then bring it into JBoss BPM Suite, where you can add implementation details, integration details and other details to finally execute the mortgage process end-to-end.

Demo project

As always we bring you not only a story, but a reusable demo project you can easily spin up yourself to explore the details around how a JBoss BPM project would integrate with the model designed in Signavio Process Editor. The project is called the JBoss BPM Suite & Signavio Process Editor Integration Demo.

The project installs JBoss BPM Suite 6.1 with an example mortgage project with rules, process, forms and other artifacts. It also includes a copy of an exported Signavio Process Editor mortgage process that we then show how to import.
[Figure: Final mortgage workflow project with implementation details and integration details completed. Ready to run!]

This gives you the initial starting point after importing the Signavio process and the completely integrated final mortgage project that you can run side-by-side. To set up this project there are just a few simple steps, and you will be up and running in minutes:

Installation

1. Download and unzip.
2. Add products to installs directory.
3. Run 'init.sh' or 'init.bat' file. 'init.bat' must be run with Administrative privileges.
4. Start JBoss BPMS Server by running 'standalone.sh' or 'standalone.bat' in the /target/jboss-eap-6.1/bin directory.
5. Login to http://localhost:8080/business-central - login for admin, appraisor, broker, and manager roles (u:erics / p:bpmsuite1!)
6. Mortgage Loan demo pre-installed as project.
7. Using process designer, import the Signavio process that was exported to the file found in: support/MortgageDemoSignavio.bpmn

See the screenshots provided in the project for how this should look. Note that the JBoss BPM Suite process designer includes validation that shows messages about tasks not being specified; this is correct, as at this point you need to start implementing the process tasks. You can examine the imported process and note that the various details captured during initial workshops have been put into the process details for each step in the workflow.

After implementing these steps you will find the final process ready to run. You can now explore the final project by deploying it and starting a new instance.

We hope you enjoy this example project, and feel free to browse for more at JBoss Demo Central.

June 26, 2015

by Eric D. Schabell

CORE

· 1,672 Views · 1 Like

MaxScale: A New Tool to Solve Your MySQL Scalability Problems

Written by Yves Trudeau

Ever since MySQL replication has existed, people have dreamed of a good solution to automatically split read from write operations, sending the writes to the MySQL master and load balancing the reads over a set of MySQL slaves. While at first it seems easy to solve, the reality is far more complex.

First, the tool needs to make sure it correctly parses and analyses all the forms of SQL that MySQL supports in order to sort writes from reads, something that is not as easy as it seems. Second, it needs to take into account whether a session is in a transaction or not. While in a transaction, the default transaction isolation level in InnoDB, Repeatable-read, and the MVCC framework ensure that you'll get a consistent view for the duration of the transaction. That means all statements executed inside a transaction must run on the master but, when the transaction commits or rolls back, the following select statements on the session can again be load balanced to the slaves, if the session is in autocommit mode of course.

Then, what do you do with sessions that set variables? Do you restrict those sessions to the master or do you replay them to the slave? If you replay the set variable commands, you need to associate the client connection with a set of MySQL backend connections, made of at least a master and a slave. What about temporary objects, as with "create temporary table…"? How do you deal with a slave that lags behind or, worse, with broken replication? Those are just a few of the challenges you face when you want to build a tool to perform read/write splitting.

Over the last few years, a few products have tried to tackle the read/write split challenge. The MySQL_proxy was the first attempt I am aware of at solving this problem, but it ended up with many limitations. ScaleArc does a much better job and is very usable, but it still has some limitations. The latest contender is MaxScale from MariaDB, and this post is a road story of my first implementation of MaxScale for a customer.

Let me first introduce what MaxScale is exactly. MaxScale is an open source project, developed by MariaDB, that aims to be a modular proxy for MySQL. Most of the functionality in MaxScale is implemented as modules, including, for example, modules for the MySQL protocol, client side and server side.

Other families of available modules are routers, monitors and filters. Routers are used to determine where to send a query; read/write splitting is accomplished by the readwritesplit router. The readwritesplit router uses an embedded MySQL server to parse the queries… quite clever and hard to beat in terms of query parsing.

There are other routers available: the readconnrouter is basically a round-robin load balancer with optional weights, the schemarouter is a way to shard your data by schema, and the binlog router is useful to manage a large number of slaves (have a look at Booking.com's Jean-François Gagné's talk at PLMCE15 to see how it can be used).

Monitors are modules that maintain information about the backend MySQL servers. There are monitors for a replicating setup, for Galera and for NDB cluster.

Finally, the filters are modules that can be inserted in the software stack to manipulate the queries and the resultsets. All those modules have well defined APIs and thus writing a custom module is rather easy, even for a non-developer like me, although basic C skills are needed. All event handling in MaxScale uses epoll and it supports multiple threads.

Over the last few months I worked with a customer having a challenging problem. On a PXC cluster, they have more than 30k queries/s and, because of their write pattern and to avoid certification issues, they want to be able to write to a single node and to load balance the reads. The application is not able to do the read/write splitting so, without a tool to do the splitting, only one node can be used for all the traffic. Of course, to make things easy, they use a lot of Java code that sets tons of session variables. Furthermore, for ISO 27001 compliance, they want to be able to log all the queries for security analysis (and also for performance analysis, why not?). So: high query rate, read/write splitting and full query logging. Like I said, a challenging problem.

We experimented with a few solutions. One was a hardware load balancer that failed miserably: the implementation was just too simple, using only regular expressions. Another solution we tried was ScaleArc, but it needed many rules to whitelist the set session variables and to repeat them to multiple servers. ScaleArc could have done the job, but all the rules increase the CPU load and the cost is per CPU. The queries could have been sent to rsyslog and aggregated for analysis. Finally, the HA implementation is rather minimalist and we had some issues with it.

Then, we tried MaxScale. At the time, it was not GA and was (is still) young. Nevertheless, I wrote a query logging filter module to send all the queries to a Kafka cluster and we gave it a try. Kafka is extremely well suited to record a large flow of queries like that. In fact, at 30k qps, the 3 Kafka nodes are barely moving, with CPU under 5% of one core. Although we encountered some issues (remember, MaxScale is very young), it appeared to be the solution with the best potential and so we moved forward.

The folks at MariaDB behind MaxScale have been very responsive to the problems we encountered, and we finally got to a very usable point and the test in the pilot environment was successful. The solution is now being deployed in the staging environment and, if all goes well, it will be in production soon.
The following figure is a simplified view of the internals of MaxScale as configured for the customer.

The blocks in the figure are nearly all defined in the configuration file. We define a TCP listener using the MySQL protocol (client side) which is linked with a router, either the readwritesplit router or the readconn router.

The first step when routing a query is to assign the backends. This is where the read/write splitting decision is made. Also, as part of the steps required to route a query, 2 filters are called: regexp (optional) and Genlog. The regexp filter may be used to hot patch a query, and the Genlog filter is the logging filter I wrote for them. The Genlog filter sends a JSON string containing about what can be found in the MySQL general query log, plus the execution time.

Authentication attempts are also logged, but the process is not illustrated in the figure. A key point to note: the authentication information is cached by MaxScale and is refreshed upon authentication failure, and the refresh process is throttled to avoid overloading the backend servers. The servers are continuously monitored at an adjustable interval, and the server statuses are used when deciding which backend to assign to a query.

In terms of HA, I wrote a simple Pacemaker resource agent for MaxScale that does a few fancy things like load balancing with IPTables (I'll talk about that in a future post). With Pacemaker, we have a full-fledged HA solution with quorum and fencing on which we can rely.

Performance wise, it is very good: a single core in a virtual environment was able to read/write split and log to Kafka about 10k queries per second. Although MaxScale supports multiple threads, we are still using a single thread per process, simply because it yields a slightly higher throughput and the custom Pacemaker agent deals with the use of a clone set of MaxScale instances. Remember that we started using MaxScale early, and the beta versions were not dealing gracefully with threads, so we built around multiple single-threaded instances.

So, since a conclusion is needed: MaxScale has proven to be a very useful and flexible tool that allows us to build solutions to problems that were very hard to tackle before. In particular, if you need to perform read/write splitting, then try MaxScale; it is the best solution for that purpose I have found so far. Keep in touch, I'll surely write other posts about MaxScale in the near future.
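To illustrate why the routing decision depends on session state, here is a deliberately naive sketch in Python. It is not MaxScale code and does not reflect its configuration or internals: the `Session` class and the regular expressions are hypothetical, and pattern matching on statement prefixes is exactly the kind of shortcut that proved too simple in the hardware load balancer mentioned above. It ignores session variables, temporary tables, `SELECT ... FOR UPDATE`, autocommit settings, and replication lag.

```python
import re

# Naive statement classification by prefix (an assumption of this sketch).
READ_PREFIX = re.compile(r"^\s*(select|show)\b", re.IGNORECASE)
BEGIN = re.compile(r"^\s*(begin|start\s+transaction)\b", re.IGNORECASE)
END = re.compile(r"^\s*(commit|rollback)\b", re.IGNORECASE)


class Session:
    """Tracks the per-connection state needed for the routing decision."""

    def __init__(self):
        self.in_transaction = False

    def route(self, sql):
        """Return 'master' or 'slave' for one statement."""
        if BEGIN.match(sql):
            self.in_transaction = True
            return "master"
        if END.match(sql):
            self.in_transaction = False
            return "master"
        if self.in_transaction:
            # Everything inside a transaction must see a consistent view.
            return "master"
        if READ_PREFIX.match(sql):
            return "slave"
        return "master"


session = Session()
statements = [
    "SELECT 1",
    "BEGIN",
    "SELECT * FROM t",
    "COMMIT",
    "SELECT * FROM t",
    "UPDATE t SET a = 1",
]
routes = [session.route(s) for s in statements]
print(routes)
```

The same `SELECT * FROM t` is routed differently depending on whether the session is inside a transaction, which is why a stateless per-statement rule is not enough.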

June 26, 2015

by Peter Zaitsev

· 1,167 Views

Generating CSV-files on .NET

I have a project where I need to output some reports as CSV-files. I found a good library called CsvHelper from NuGet and it works perfectly for me. After some playing with it I was able to generate CSV-files that were shown correctly in Excel. Here is some sample code and also extensions that make it easier to work with DataTables.

Simple report

Here's a simple fragment of code that illustrates how to use CsvHelper.

    using (var writer = new StreamWriter(Response.OutputStream))
    using (var csvWriter = new CsvWriter(writer))
    {
        csvWriter.Configuration.Delimiter = ";";

        // Custom header row
        csvWriter.WriteField("Task No");
        csvWriter.WriteField("Customer");
        csvWriter.WriteField("Title");
        csvWriter.WriteField("Manager");
        csvWriter.NextRecord();

        // One record per project
        foreach (var project in data)
        {
            csvWriter.WriteField(project.Code);
            csvWriter.WriteField(project.CustomerName);
            csvWriter.WriteField(project.Name);
            csvWriter.WriteField(project.ProjectManagerName);
            csvWriter.NextRecord();
        }
    }

Of course, you can use other methods to output a whole object or object list in one shot. Here I just needed custom headers that don't match property names 1:1.

Generic helper for DataTable

Some of my projects come from the service layer as DataTables. I don't want to add new models or Data Transfer Objects (DTO) with no good reason, and DataTable is actually flexible enough if you need to add new fields to a report and you want to do it fast.

As DataTables are not supported by default (yet?), I wrote simple extension methods that work on DataTable views. When called on a DataTable, the extension selects the default view automatically. The idea is that you can set a filter on the default data view and leave out the rows you don't need. If you just want to show a DataTable on screen as a table, then check out my posting Simple view to display contents of DataTable.
public static class CsvHelperExtensions { public static void WriteDataTable(this CsvWriter csvWriter, DataTable table) { WriteDataView(csvWriter, table.DefaultView); } public static void WriteDataView(this CsvWriter csvWriter, DataView view) { foreach (DataColumn col in view.Table.Columns) { csvWriter.WriteField(col.ColumnName); } csvWriter.NextRecord(); foreach (DataRowView row in view) { foreach (DataColumn col in view.Table.Columns) { csvWriter.WriteField(row[col.ColumnName]); } csvWriter.NextRecord(); } } } And here is simple MVC controller action that gets data as DataTable and returns it as CSV-file. The result is CSV-file that opens correctly in Excel. [HttpPost] public void ExportIncomesReport() { var data = // Get DataTable here Response.ContentType = "text/csv"; Response.AddHeader("Content-disposition", "attachment;filename=IncomesReport.csv"); var preamble = Encoding.UTF8.GetPreamble(); Response.OutputStream.Write(preamble, 0, preamble.Length); using (var writer = new StreamWriter(Response.OutputStream)) using (var csvWriter = new CsvWriter(writer)) { csvWriter.Configuration.Delimiter = ";"; csvWriter.WriteDataTable(data); } } One thing to notice – with CsvHelper we have full control over a stream where we write data and this way we can write more performant code. Related Posts .Net Framework 4.0: string.IsNullOrWhiteSpace() method Exporting GridView Data to Excel Code Contracts: Hiding ContractException How to dump object properties My object to object mapper source released The post Generating CSV-files on .NET appeared first on Gunnar Peipman - Programming Blog.
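The two ingredients that make the output open cleanly in Excel here, a UTF-8 byte order mark at the start of the stream and a semicolon delimiter, are not specific to CsvHelper. As a rough comparison only, the following sketch reproduces them with Python's standard `csv` module. The function name `rows_to_excel_csv` and the sample values in the usage note are made up for illustration and are not part of the original post.

```python
import csv
import io


def rows_to_excel_csv(headers, rows, delimiter=";"):
    """Return CSV content as bytes, prefixed with a UTF-8 byte order mark."""
    # newline="" leaves line endings to the csv module (it writes \r\n)
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, delimiter=delimiter)
    writer.writerow(headers)   # header row
    writer.writerows(rows)     # one record per input row
    # The "utf-8-sig" codec prepends the BOM, like Encoding.UTF8.GetPreamble()
    return buffer.getvalue().encode("utf-8-sig")
```

Calling `rows_to_excel_csv(["Task No", "Customer"], [["T-1", "Acme; Ltd"]])` returns bytes that start with the BOM; the field containing a semicolon is quoted automatically so it stays in one column.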

June 26, 2015

by Gunnar Peipman

· 4,425 Views · 1 Like

Overcoming Barriers to Performance and Scalability Test Automation

[This article was written by Ophir Prusak]

Guest author Ophir Prusak is chief evangelist at BlazeMeter. To learn more about load and performance testing automation, he invites readers to attend a meetup this Wednesday, June 24, at New Relic’s San Francisco offices.

Performance and load testing are kind of like flossing your teeth. You know you need to do it, but you might not be doing it as much as you should. When your site goes down because it couldn’t handle the load, you look back and realize you might have easily prevented it with a little more testing in advance. That’s why companies are automating their application testing in an effort to lower costs, increase efficiency, and reduce the time needed to release new features.

The importance of automated testing in a continuous delivery era

Continuous Delivery (CD) is rapidly emerging as the “new normal” in software development, as Perforce discovered in an independent survey, with an estimated 80% of SaaS companies and 51% of non-SaaS companies adopting this practice. Companies that provide Software-as-a-Service know they need to be continuously creating new features, updating their websites, and optimizing their backend.

But while software development has adapted nicely in terms of automation, the testing side has moved more slowly. For a full Continuous Delivery and Integration process to be realized, performance testing must be automated. As the need for testing increases, doing it manually can dramatically increase your time to release. Automating testing throughout the CD process can help detect errors early and deliver software faster.

Making it work

JMeter is the de facto standard in open source load testing. It’s the most widely used open source tool for performance testing for a good reason. There’s virtually nothing it can’t test (websites, native mobile applications, APIs, and Web applications), and it’s extremely powerful and fully featured.

Yet there are challenges. JMeter has a steep learning curve and is not the easiest tool to use. Additionally, it doesn’t integrate easily with APM and Continuous Integration (CI) tools. Many developers have been looking for a way to conduct performance testing with less time and effort, and with fewer hiccups along the way.

Taurus: An effort to simplify test automation

A new open source project called Taurus (Test AUtomation Running Smoothly) is designed to provide exactly that: a way to remove most of the pain of using JMeter on its own. Taurus can give you the ability to

- Create and define a load test even without using JMeter.
- Override existing JMeter files or test configurations.
- Create human-readable configuration files and testing scripts that are easily added to source control systems like GitHub.
- Integrate into CI tools like Jenkins.
- Run multiple tests in parallel.
- Provide pass/fail criteria back into the CI tool for easier automation of test-results analysis.
- Make analysis of test results easier and more intuitive.

Taurus still uses JMeter under the hood, but is designed to have a much gentler learning curve, especially for simple tests. Taurus also offers a built-in result analysis engine that provides both console-based reporting features and result analysis.

Performance testing and optimizing your applications is not simple, yet there are solutions available that make the process easier and more successful. I’m looking forward to seeing how the technology evolves even further in the near future.

If you want to learn more about Taurus, check out the project on GitHub. Better yet, you are invited to come to a meetup this Wednesday, June 24, at New Relic’s San Francisco offices. You can learn a lot more about Taurus and how you can use it to help scale load and performance testing automation.
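To make the idea of feeding pass/fail criteria back into a CI tool more concrete, here is a minimal, generic sketch in Python. It is not Taurus code and does not use the Taurus configuration format; the function name `evaluate_run`, the sample layout, and the threshold values are all assumptions chosen for illustration.

```python
import statistics


def evaluate_run(samples, max_avg_ms=150.0, max_error_rate=0.01):
    """Check one load-test run against simple pass/fail criteria.

    samples is a list of (elapsed_ms, succeeded) tuples.
    Returns (passed, reasons), where reasons lists every violated criterion.
    """
    if not samples:
        return False, ["no samples recorded"]

    avg_ms = statistics.mean(elapsed for elapsed, _ in samples)
    error_rate = sum(1 for _, ok in samples if not ok) / len(samples)

    reasons = []
    if avg_ms > max_avg_ms:
        reasons.append(
            f"average response time {avg_ms:.1f} ms exceeds {max_avg_ms:.1f} ms"
        )
    if error_rate > max_error_rate:
        reasons.append(
            f"error rate {error_rate:.1%} exceeds {max_error_rate:.1%}"
        )
    # The run passes only if no criterion was violated
    return not reasons, reasons
```

A CI step could print the reasons and then call `sys.exit(0 if passed else 1)`, since a non-zero exit code is the usual way for a build server such as Jenkins to mark a step as failed.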

June 24, 2015

by Fredric Paul

· 1,541 Views

Big Data TCO Lessons From Virtualization Technology Sprawl

The complexity of big data makes it a difficult concept for many to grasp, and utilizing it effectively is one of the biggest challenges businesses face today. There is little doubt that big data offers organizations a number of clear advantages, but applying them across the entire enterprise is a formidable obstacle, even for the most technologically savvy companies. One department might be able to create its own business solutions through big data analytics, while another department might come up with answers of its own, but the lack of true coordination and collaboration remains a significant problem. Businesses aren’t without help in this area, however, because they’ve encountered similar problems before. Many companies have dealt with virtualization technology sprawl, and the lessons learned from addressing that problem could prove exceptionally valuable when dealing with big data total cost of ownership (TCO).

To understand the problem and the solution, we must first look back at the rapid growth of virtualization technology, more specifically server virtualization. As businesses adopted virtualization, the mainframe systems soon diverged into multiple systems. The more popular virtualization became, the more projects were taken on and the more technologies diverged. Larger companies eventually sought technology specialists to work within their areas of expertise. The result of using these individual teams was virtualization technology sprawl, an inefficient development that eventually led to even higher operational costs. For all the benefits virtualization technology offered, many of them were outweighed by the increased demands and greater management complexity that came from technology sprawl.

Businesses were quick to come up with new solutions for the problem. The most common was to adopt a converged infrastructure. This strategy directly addressed the higher operational costs that resulted from technology sprawl, basically breaking through the silos by taking multiple technologies and combining them into single stacks for computing, storage, and networking. This made the management of virtualization technology much easier since operational complexity was significantly reduced. In other words, management of this technology was kept at a reasonable size.

The same principle can apply to big data management across an entire organization. When it comes to management of big data and Hadoop security, it’s easy to get caught up in the immensity of it all. The fact that big data is so versatile and can be applied to so many different use cases also means it can apply to any number of different divisions within a company. This creates silos and a general desire to hold onto data sets. In other words, big data ends up in a sprawl of its own, becoming that much more unwieldy and complicated, which is a major problem for a technology that’s already so complex to begin with.

The lesson that every company should take away from the solution to virtualization technology sprawl is the breaking down of barriers to big data management. It all comes down to ready access to all the necessary data no matter what role an employee may have within a company. Businesses shouldn’t have to worry over the cost it takes to store and process data since the insights gained from big data analytics are particularly valuable. Most importantly, it’s about keeping big data from getting too big, to the point where it becomes unmanageable and merely adds to the overall operating costs of a company.

It’s true that big data introduces more complexity, but businesses that have learned how to store and process it efficiently, sometimes through big data platforms or cloud-based services, are in a more advantageous position than companies still dealing with technology sprawl. The lessons learned from previous problems can indeed play a helpful role in solving the problems many experience today.

June 22, 2015

by Rick Delgado

· 1,689 Views

Long-Term Log Analysis with AWS Redshift

You will aggregate a lot of logs over the lifetime of your product and codebase, so it’s important to be able to search through them. In the rare case of a security issue, not having that capability is incredibly painful.

You might be able to use services that allow you to search through the logs of the last two weeks quickly. But what if you want to search through the last six months, a year, or even further? That availability can be rather expensive or not even an option at all with existing services.

Many hosted log services provide S3 archival support, which we can use to build a long-term log analysis infrastructure with AWS Redshift. Recently I’ve set up scripts to be able to create that infrastructure whenever we need it at Codeship.

AWS Redshift

AWS Redshift is a data warehousing solution by AWS. It has an easy clustering and ingestion mechanism ideal for loading large log files and then searching through them with SQL. As it automatically balances your log files across several machines, you can easily scale up if you need more speed.

As I said earlier, looking through large amounts of log files is a relatively rare occasion; you don’t need this infrastructure to be around all the time, which makes it a perfect use case for AWS.

Setting Up Your Log Analysis

Let’s walk through the scripts that drive our long-term log analysis infrastructure. You can check them out in the flomotlik/redshift-logging GitHub repository.

I’ll take you step by step through configuring the environment variables needed, as well as starting the creation of the cluster and searching the logs. But first, let’s get a high-level overview of what the setup script is doing before going into all the different options that you can set:

1. Creates an AWS Redshift cluster. You can configure the number of servers and which server type should be used.
2. Waits for the cluster to become ready.
3. Creates a SQL table inside the Redshift cluster to load the log files into.
4. Ingests all log files into the Redshift cluster from AWS S3.
5. Cleans up the database and prints the psql access command to connect to the cluster.

Be sure to check out the script on GitHub before we go into all the different options that you can set through the .env file.

Options to set

The following is a list of all the options available to you. You can simply copy the .env.template file to .env and then fill in all the options so they get picked up.

- AWS_ACCESS_KEY_ID: AWS key of the account that should run the Redshift cluster.
- AWS_SECRET_ACCESS_KEY: AWS secret key of the account that should run the Redshift cluster.
- AWS_REGION=us-east-1: AWS region the cluster should run in, default us-east-1. Make sure to use the same region that is used for archiving your logs to S3 to have them close.
- REDSHIFT_USERNAME: Username to connect with psql into the cluster.
- REDSHIFT_PASSWORD: Password to connect with psql into the cluster.
- S3_AWS_ACCESS_KEY_ID: AWS key that has access to the S3 bucket you want to pull your logs from. We run the log analysis cluster in our AWS Sandbox account but pull the logs from our production AWS account so the Redshift cluster doesn’t impact production in any way.
- S3_AWS_SECRET_ACCESS_KEY: AWS secret key that has access to the S3 bucket you want to pull your logs from.
- PORT=5439: Port to connect to with psql.
- CLUSTER_TYPE=single-node: The cluster type can be single-node or multi-node. Multi-node clusters get auto-balanced, which gives you more speed at a higher cost.
- NODE_TYPE: Instance type that’s used for the nodes of the cluster. Check out the Redshift Documentation for details on the instance types and their differences.
- NUMBER_OF_NODES=10: Number of nodes when running in multi-node mode.
- CLUSTER_IDENTIFIER=log-analysis
- DB_NAME=log-analysis
- S3_PATH=s3://your_s3_bucket/papertrail/logs/862693/dt=2015

Database format and failed loads

When ingesting log statements into the cluster, make sure to check the number of failed loads. You might have to edit the database format to fit your specific log output style. You can debug this easily by first creating a single-node cluster that only loads a small subset of your logs and is very fast as a result. Make sure to have no, or nearly no, failed loads before you extend to the whole cluster.

In case there are issues, check out the documentation of the copy command, which loads your logs into the database, and the parameters passed to it in the setup script.

Example and benchmarks

It’s quick to set up the whole cluster and run example queries against it. For example, I’ll load all of our logs of the last nine months into a Redshift cluster and run several queries against it. I haven’t spent any time on optimizing the table, but you could definitely gain some more speed out of the whole system if necessary. It’s already fast enough for us out of the box.

Loading all logs of May, more than 600 million log lines, took only 12 minutes on a cluster of 10 machines. We could easily load more than one month into that 10-machine cluster since there’s more than enough storage available, but for this post, one month is enough.

After that, we’re able to search through the history of all of our applications and past servers through SQL. We connect with our psql client and send off SQL queries against the “events” table.

For example, what if we want to know how many build servers reported logs in May:

    loganalysis=# select count(distinct(source_name)) from events where source_name LIKE 'i-%';
     count
    -------
       801
    (1 row)

So in May, we had 801 EC2 build servers running for our customers. That query took ~3 seconds to finish.

Or let’s say we want to know how many people accessed the configuration page of our main repository (the project ID is hidden with XXXX):

    loganalysis=# select count(*) from events where source_name = 'mothership' and program LIKE 'app/web%' and message LIKE 'method=GET path=/projects/XXXX/configure_tests%';
     count
    -------
        15
    (1 row)

So now we know that there were 15 accesses on that configuration page throughout May. We can also get all the details from our logs, including who accessed it and when. This could help in case of any security issues we’d need to look into. The query took about 40 seconds to go through all of our logs, but it could be optimized on Redshift even more.

Those are just some of the queries you could use to look through your logs, gaining more insight into your customers’ use of your system. And you get all of that with a setup that costs $2.50 an hour, can be shut down immediately, and can be recreated any time you need access to that data again.

Conclusions

Being able to search through and learn from your history is incredibly important for building a large infrastructure. You need to be able to look into your history easily, especially when it comes to security issues.

With AWS Redshift, you have a great tool in hand that allows you to start an ad hoc analytics infrastructure that’s fast and cheap for short-term reviews. Of course, Redshift can do a lot more as well.

Let us know what your processes and tools around logging, storage, and search are in the comments.
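The configuration and ingestion steps described in this article can be sketched roughly as follows. This is an illustration in Python, not the actual script from the flomotlik/redshift-logging repository: the helper names `parse_env` and `build_copy_statement`, the default `MAXERROR` value, and the assumption that the archived logs are gzip-compressed, tab-separated files are all hypothetical. `COPY` and the `stl_load_errors` system table are standard Redshift features, but the exact parameters the real setup script passes may differ.

```python
# Settings the ingestion step cannot work without (names taken from the options list)
REQUIRED = ("S3_PATH", "S3_AWS_ACCESS_KEY_ID", "S3_AWS_SECRET_ACCESS_KEY")

# Redshift records rejected rows in the stl_load_errors system table;
# a query like this helps when checking for failed loads.
LOAD_ERRORS_SQL = (
    "SELECT starttime, filename, line_number, colname, err_reason "
    "FROM stl_load_errors ORDER BY starttime DESC LIMIT 10;"
)


def parse_env(text):
    """Parse KEY=VALUE lines from .env content, skipping blanks and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def build_copy_statement(settings, table="events", max_errors=1000):
    """Build a Redshift COPY statement that loads logs from S3 into a table.

    Assumes gzip-compressed, tab-separated files. Values are interpolated
    without escaping, so this is a sketch for trusted input only.
    """
    missing = [key for key in REQUIRED if not settings.get(key)]
    if missing:
        raise ValueError("missing settings: " + ", ".join(missing))
    credentials = "aws_access_key_id={};aws_secret_access_key={}".format(
        settings["S3_AWS_ACCESS_KEY_ID"], settings["S3_AWS_SECRET_ACCESS_KEY"]
    )
    return (
        f"COPY {table} FROM '{settings['S3_PATH']}' "
        f"CREDENTIALS '{credentials}' "
        f"DELIMITER '\\t' GZIP MAXERROR {max_errors};"
    )
```

Reading the `.env` file with `parse_env(open(".env").read())` and passing the result to `build_copy_statement` yields a statement that could be run through psql. Starting with a small `S3_PATH` prefix on a single-node cluster and then running `LOAD_ERRORS_SQL` is one way to follow the advice about debugging failed loads before scaling up.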

June 21, 2015

by Florian Motlik

· 1,276 Views

The Latest Performance Topics