Using Spark, you can identify duplicate files in your S3 storage by calculating checksums. It's a quick, easy way to ensure you aren't carrying extra weight.
Some people say I must have a bot reading and replying to my emails at all hours of the day, some kind of awesome email assistant. Well, I decided to prototype one.
This deep dive into Clojure's reducers, transducers and core.async is not for the faint of heart. Get to know how and when to use these tools for abstraction.
Here's the solution to a timestamp format issue that occurs when reading CSV files in Spark, covering both Spark 2.0.1 and newer and Spark 2.0.0 and older.
Protocol Buffers are a high-performance alternative to text-based protocols like XML or JSON. Adapting them to a Spring Boot application is easy with these steps.
We take a look at how to collect and store data from several BLE devices, a challenge that has to be addressed given the built-in limitations of BLE networking.
Looking to build a spreadsheet web application for your team to chart their work? Read on to learn how to do just that using the open-source framework Webix.
Druid is a high-performance, column-oriented, distributed data store. Learn how it's great for low-latency analytics and why you should integrate it with Apache Hive.
In this article, we'll take a look at the cases where advanced analytics can be implemented in an Order to Cash (O2C) process, going over potential use cases for anomaly detection, streaming analytics, and recommendation engines.