Big Data Resources

Exploratory and Confirmatory Analysis: What's the Difference?

Learn about the differences and uses of exploratory data analysis and confirmatory analysis by considering the process a detective goes through.

November 29, 2017

by Shelby Blitz

· 18,589 Views · 1 Like

Quick Start With Apache Livy

Learn how to get started with Apache Livy, a project in the process of being incubated by Apache that interacts with Apache Spark through a REST interface.

November 28, 2017

by Guglielmo Iozzia

· 47,716 Views · 7 Likes

AWS IoT: Retrieving SQS Messages From a Queue

Let's set up an AWS IoT Rule to call an AWS Lambda function triggered by MQTT messages. See how it's configured with this step-by-step guide.

November 27, 2017

by Kevin Hooke

· 8,516 Views · 4 Likes

Spring Boot Plus Apache Ignite Data Grid

Learn how to integrate Spring Boot with Apache Ignite to use Ignite's persistent durable memory feature and execute SQL queries over Ignite caches.

November 26, 2017

by Mahmoud Romeh

· 33,655 Views · 9 Likes

ETL Pipeline to Analyze Healthcare Data With Spark SQL, JSON, and MapR-DB

Learn how to ETL Open Payments CSV file data to JSON, explore with SQL, and store in a document database using Spark Datasets and MapR-DB.

November 23, 2017

by Carol McDonald

· 16,972 Views · 7 Likes

A Developer’s Introduction to the Pulsar Streaming Messaging System

Apache Pulsar is an open-source distributed pub-sub messaging system that's currently undergoing incubation. Get introduced to it here!

November 20, 2017

by Matteo Merli

· 13,759 Views · 7 Likes

Apache Spark Word Count: Data Analytics With a Publicly Available Dataset

Let's take things up a notch and see how quickly Spark can perform word counts on a huge, publicly available dataset: the Yelp dataset.

November 17, 2017

by Kevin Hooke

· 19,358 Views · 4 Likes

Introduction to Apache Kafka [Tutorial]

What is Apache Kafka, and what can it be used for? Dive deep into what Apache Kafka is all about and learn how to create a Kafka cluster with three brokers.

November 17, 2017

by Siva Prasad Rao Janapati

· 63,223 Views · 22 Likes

Data Manipulation in R Using dplyr

Learn about the primary functions of the dplyr package and the power of this package to transform and manipulate your datasets with ease in R.

November 15, 2017

by Sibanjan Das

· 9,963 Views · 5 Likes

In-Memory Data Grid With Apache Ignite

Take a look at how to create an application that makes use of Apache Ignite, a new platform that is rapidly gaining popularity.

November 15, 2017

by Piotr Mińkowski

· 15,757 Views · 5 Likes

Connecting Message Queuing System With Mule ESB [Video]

In this series of videos, you'll learn how to connect various message queuing systems, like ActiveMQ and Kafka, with Mule ESB.

November 12, 2017

by Jitendra Bafna

· 8,719 Views · 3 Likes

Elevators: An IoT-Enabled Success Story

Settle in to hear the story of a Dutch company that transformed manual elevator logging into a digital industry standard and opened the door for predictive maintenance.

Updated November 9, 2017

by Danielle Goodman

· 8,921 Views · 2 Likes

IoT Glossary: 55 Terms You Need to Know

Beacons, communications protocols, edge gateways, integrators, and more: if you develop IoT solutions, here are the essential phrases for your hobby or career.

November 9, 2017

by Mike Gates

· 20,459 Views · 4 Likes

NGINX and IoT: Adding Protocol Awareness for MQTT

Buckle up for a 30-minute talk about the current state of IoT data and a demo that tackles MQTT, TLS, load balancing, session persistence, and plenty more.

Updated November 8, 2017

by Liam Crilly

· 15,867 Views · 4 Likes

Feature Hashing for Scalable Machine Learning

Feature hashing is a valuable tool in the data scientist's arsenal. Learn how to use it as a fast, efficient, flexible technique for feature extraction that can scale to sparse, high-dimensional data.

November 8, 2017

by Nick Pentreath

· 18,319 Views · 3 Likes

InfluxDB vs. Elasticsearch for Time Series Analysis

InfluxDB was designed for time series data, and Elasticsearch wasn't. However, many people use Elasticsearch for this purpose. Is one database better than the other?

October 30, 2017

by Daniel Berman

· 19,769 Views · 2 Likes

Aggregate and Index Data into Elasticsearch Using Logstash and JDBC

Some of the shortcomings of Elasticsearch can be overcome using some Logstash plugins. Check out how to use them to aggregate and index data into Elasticsearch.

October 27, 2017

by Mohamed Sanaulla

· 24,311 Views · 2 Likes

Import and Ingest Data Into HDFS Using Kafka in StreamSets

Learn about reading data from different data sources such as Amazon Simple Storage Service (S3) and flat files, and writing the data into HDFS using Kafka in StreamSets.

October 26, 2017

by Rathnadevi Manivannan

· 23,632 Views · 5 Likes

Why Smart City Amsterdam Is the Home of Innovation

While the rest of the world embraces smart city technology in fits and starts, here is where Amsterdam shines with open data and citizen-led startup initiatives.

October 25, 2017

by Cate Lawrence

· 10,912 Views · 4 Likes

Credit Scoring: Analytics and Scorecard Development

By: Natasha Mashanovich, Senior Data Scientist at World Programming, UK. Scorecard development describes how to turn data into a scorecard model, assuming tha...

October 25, 2017

by Natasha Mashanovich

· 22,008 Views · 3 Likes

The Latest Big Data Topics