Big Data Resources

Write a Kafka Producer Using Twitter Stream

With the newly open-sourced Twitter HBC, a Java HTTP library for consuming Twitter’s Streaming API, we can easily create a Kafka Twitter stream producer.

Updated October 5, 2020

by Saurabh Chhajed

· 38,760 Views · 13 Likes

Reasons Why You Should Get a Cloud Computing Certification

See why a cloud certification could be a great choice to advance your career and better help you stand out as a potential candidate.

October 5, 2020

by Shormistha Chatterjee

CORE

· 6,621 Views · 7 Likes

Using SocketCluster for Distributed Computing in a Unique Way

HarperDB CTO demonstrates the inner workings of SocketCluster, including a code review to highlight SocketCluster concepts within a database framework.

October 2, 2020

by Margo McCabe

· 4,207 Views · 5 Likes

How We Build an HTAP Database That Simplifies Your Data Platform

This article is based on a talk given by Shawn Ma at TiDB DevCon 2020. TiDB is an open-source, distributed, NewSQL database that supports Hybrid Transactiona...

Updated October 1, 2020

by Xiaoyu Ma

· 7,207 Views · 4 Likes

Kafka Consumer Pooling

In this article, take a look at Kafka consumer pooling and see a use case.

October 1, 2020

by Sutanu Dalui

· 7,926 Views · 5 Likes

Data Ingestion Into Azure Data Explorer Using Kafka Connect

In this blog, we will go over how to ingest data into Azure Data Explorer using the open-source Kafka Connect Sink connector for Azure Data Explorer.

September 28, 2020

by Abhishek Gupta

CORE

· 2,739 Views · 4 Likes

7 Essential Tools for a Competent Data Scientist

Becoming a competent data scientist requires looking beyond the constructed horizon. Learn about the various data science tools.

Updated September 25, 2020

by Niti Sharma

· 10,716 Views · 8 Likes

Power BI vs Tableau: Comparison Between Top Two BI Tools

In this article, compare two BI tools, Power BI and Tableau.

September 25, 2020

by Ankit Kumar

· 9,028 Views · 7 Likes

Distributed Balanced Partition-Queues Assignment Using Kubernetes statefulSet

Partitioning a domain is a useful way to achieve scalability. Instead of putting everything in a single place, you divide work based on some attribute (often an Id).

September 24, 2020

by Tamir Dresher

· 5,664 Views · 2 Likes

Crafting a Multi-Node Multi-Broker Kafka Cluster- A Weekend Project

This article explains how to install and configure a multi-node, multi-broker Kafka cluster with Ubuntu 14.04 LTS as the OS on all nodes in the cluster.

September 24, 2020

by Gautam Goswami

CORE

· 14,070 Views · 4 Likes

Understanding a Messaging Queue by Example

Design Messaging Queue

September 24, 2020

by Mayank Bansal

· 8,364 Views · 8 Likes

End2End Testing With TestContainers...and a Lot of Patience

This time I would like to show my experience creating an End2End test for a Camel integration application.

September 23, 2020

by Jonathan Vila

· 5,770 Views · 3 Likes

OpenAI GPT-3: How It Works and Why It Matters

GPT-3 has many strengths, but it also has some weaknesses. Explore why it matters and how to use it to write code, design an app, and compose music.

Updated September 23, 2020

by Dana Kozubska

· 17,743 Views · 9 Likes

Building a Data Catalog For Small and Medium-Sized Businesses

So you're ready to build a data catalog, but where do you begin? Let's walk through the most important features you should look for that best represent your needs.

September 17, 2020

by Grant Seward

· 6,980 Views · 2 Likes

CData Elasticsearch Driver Features and Differentiators

In this article, we explore how CData drivers grant access to all of Elasticsearch, enable full SQL querying of Elasticsearch, and more.

September 16, 2020

by Jerod Johnson

· 4,018 Views · 2 Likes

CQRS Is an Anti-Pattern for DDD

CQRS solves a very particular set of problems, like executing queries in event-stores or building web applications with extremely high scalability requirements.

Updated September 16, 2020

by Hristiyan Pehlivanov

· 46,536 Views · 49 Likes

EFK Stack on Kubernetes (Part 1)

This is the first post of a 2-part series where we will set up Kubernetes logging for applications deployed in the cluster and the cluster itself.

September 16, 2020

by Sudip Sengupta

CORE

· 6,191 Views · 3 Likes

Modern Cloud-Native Jakarta EE Frameworks: Tips, Challenges, and Trends.

Changes brought by cloud-native architecture impact applications in ways that weren't critical before.

Updated September 16, 2020

by Otavio Santana

CORE

· 4,139 Views · 2 Likes

Horizontally Scaling the Hive Metastore Database by Migrating From MySQL to TiDB

This post shows how TiDB helps Zhihu eliminate their database bottleneck and horizontally scale their Hive Metastore database to meet the growing business needs.

September 15, 2020

by mengyu hu

· 7,513 Views · 5 Likes

Accelerated Automatic Differentiation With JAX: How Does It Stack Up Against Autograd, TensorFlow, and PyTorch?

In this article, take a look at accelerated automatic differentiation with JAX and see how it stacks up against Autograd, TensorFlow, and PyTorch.

September 10, 2020

by Kevin Vu

· 11,079 Views · 2 Likes

The Latest Big Data Topics