Learn how to back up HBase data and tables in Hortonworks Sandbox 2.5 and explore the different ways a backup can be taken for HBase datasets.
RabbitMQ and Apache Kafka are two of the most popular messaging technologies on the market today. Get the insight you need to choose the right software for you.
Learn what tape data storage is, when it makes the most sense to use it, and why it has become more popular among organizations that need to retain large amounts of data.
Without best practices, storage can become unmaintainable. Automating data quality, lifecycle, and privacy provides ongoing cleansing and movement of the data in your lake.
HDF collects, curates, analyzes, and delivers real-time data to data stores quickly and easily. It can be used with Spark Streaming and Solr to process weather events.
There are many reasons that Cassandra could be the right tool for your app. Knowing your system's requirements, workloads, and future will help you make the right choice.
In the past, database approaches have required the translation of your data model design to the underlying data modeling language of the database. Redis reverses this.
The number of connected devices is skyrocketing, and new solutions are being introduced to process and visualize data every day. But it's not all sunshine and roses.
How to get started with free, preprocessed CommonCrawl web crawl datasets that you can use for machine learning, natural language processing, and more.
Learn about in-memory MapReduce, the Ignite in-memory file system, and the Hadoop file system cache. Also learn how to install and configure Hadoop and Ignite.
In this post, we take a look at how you can control the humidity in a room using a Raspberry Pi, a switch, and a sensor with a dash of JavaScript and Python.
You can augment and enhance Apache Spark clusters using Amazon EC2's computing resources. Find out how to set up clusters and run master and slave daemons on one node.
When it comes to integrating and managing data, there are quite a few tasks that are downright tedious. Data engineering is a tough job, but somebody's gotta do it!