Some of Elasticsearch's shortcomings can be overcome with Logstash plugins. Learn how to use them to aggregate data and index it into Elasticsearch.
Learn how to read data from sources such as Amazon Simple Storage Service (S3) and flat files, and write it into HDFS using Kafka in StreamSets.
While the rest of the world embraces smart city technology in fits and starts, Amsterdam shines with open data and citizen-led startup initiatives.
By Natasha Mashanovich, Senior Data Scientist at World Programming, UK. Scorecard development describes how to turn data into a scorecard model, assuming tha...
In today's news: Neo4j is announcing Cypher for Apache Spark and the Neo4j Native Graph Platform. Neo4j's Head of Product Marketing explains what both mean.
Big data, IoT, and AI have all contributed to the widespread use of personal information. The privacy debate is at a crossroads where the public, authorities, and companies must decide which direction the industry will take.
A terabyte is enormous. It's difficult to put into perspective, so let's try to understand it from two points of view: spatially and in terms of time.
Corporations must now embrace TensorFlow and deep learning. The coming flood of audio, video, and image data, and the applications built on it, are key to success.
Apache Zeppelin, an open-source data analytics and visualization platform, helps us analyze data to gain insight and improve business decisions.
Learn how to perform anomaly detection using Kafka Streams, through the example of a loan payment website that needs to send an alert when a payment is too high.
See how to get started writing stream processing algorithms with Apache Flink by reading a stream of Wikipedia edits and extracting meaningful data from it.
Some quick stats: 656 million tweets go out per day, and 15,220,700 texts are sent every minute. This makes for LOTS of data. Read on for more shocking stats!
If you've been following software development news recently, you've probably heard about a new project called Apache Flink. I've already written about it a bit...
Variable selection is a critical step in credit score modeling for finding key information. Learn how to do it and gain a good understanding of your data!
If you've ever wondered about the difference between machine learning and deep learning, read on for a detailed comparison in simple, layman's terms.
More than a third of Fortune 500 companies now use Kafka in production, and for good reason. In this article, learn how to track real-time activity using Kafka.