During my early days as a data engineer (back in 2016), I was responsible for scraping data from different websites. Web scraping means using automated tools to pull large amounts of data from websites, usually from their HTML. I remember building applications around it, digging into the HTML code, and trying to figure out the best solutions for scraping all the data. One of my main challenges was dealing with frequent changes to the websites: for example, the Amazon pages I was scraping changed every one to two weeks. One thought that occurred to me when I started reading about Large Language Models (LLMs) was, "Can I use LLMs to structure data from webpages and avoid all those pitfalls?" Let's see if I can.

## Web Scraping Tools and Techniques

At the time, the main tools I was using were Requests, BeautifulSoup, and Selenium. Each serves a different purpose and targets different types of web environments.

- **Requests** is a Python library for easily making HTTP requests. It performs GET and POST operations against the provided URLs and is frequently used to fetch HTML content that can then be parsed by BeautifulSoup.
- **BeautifulSoup** is a Python library for parsing HTML and XML documents. It builds a parse tree from the page source that lets you access the various elements on the page easily. It is usually paired with libraries like Requests or Selenium, which provide the HTML source code.
- **Selenium** is primarily employed for websites that rely heavily on JavaScript. Unlike BeautifulSoup, Selenium does not simply analyze HTML: it interacts with websites by emulating user actions such as clicks and scrolling. This facilitates data extraction from websites that render content dynamically.

These tools were indispensable when I was trying to extract data from websites.
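To make the division of labor concrete, here is a minimal sketch of the BeautifulSoup pattern described above. It parses a hard-coded HTML snippet (in a real scraper this string would come from a Requests call against a live URL) so it runs offline; the class name `activity` is purely illustrative.

```python
from bs4 import BeautifulSoup

# In a real scraper this HTML would come from requests.get(url).text;
# a literal snippet is used here so the example runs offline.
html = """
<html><body>
  <h1>Activities</h1>
  <ul>
    <li class="activity">Boat trip</li>
    <li class="activity">City tour</li>
  </ul>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all walks the parse tree and returns matching elements
titles = [li.get_text() for li in soup.find_all("li", class_="activity")]
print(titles)  # ['Boat trip', 'City tour']
```

The fragility discussed next comes from exactly this kind of selector: if the site renames the `activity` class, the scraper silently returns nothing.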
However, they also posed challenges: code, tags, and structural elements had to be regularly updated to accommodate changes in a website's layout, complicating long-term maintenance.

## What Are Large Language Models (LLMs)?

Large Language Models (LLMs) are programs that learn by reading and analyzing vast amounts of text data. They can produce human-like text, which makes them effective at processing and comprehending natural language, and they shine in situations where the textual context really matters.

## Integrating LLMs Into Web Scraping

The web scraping process can be greatly optimized by bringing LLMs into it. We take the HTML code of a webpage and feed it to the LLM, which extracts the objects we ask for. This tactic eases maintenance: the markup structure can evolve, but the content itself does not usually change. Here is how the architecture of such an integrated system looks:

1. Getting HTML: Use tools like Selenium or Requests to fetch the HTML content of a webpage. Selenium can handle dynamic content loaded with JavaScript, while Requests is suited for static pages.
2. Parsing HTML: Using BeautifulSoup, parse this HTML into text, removing the noise (footer, header, etc.).
3. Creating Pydantic models: Define the Pydantic model we are going to scrape into. This ensures the extracted data conforms to the pre-defined, typed schema.
4. Generating prompts for LLMs: Design a prompt that tells the LLM what information has to be extracted.
5. Processing by the LLM: The model reads the HTML, understands it, and follows the instructions to process and structure the data.
6. Output of structured data: The LLM returns the output as structured objects defined by the Pydantic model.
This workflow transforms HTML (unstructured data) into structured data using LLMs, solving problems such as non-standard layouts or dynamic modification of the source HTML.

## Integration of LangChain With BeautifulSoup and Pydantic

A static webpage was selected for the example. The idea is to scrape all the activities listed there and present them in a structured way. This method extracts the raw HTML from the static webpage and cleans it before the LLM processes it.

```python
from bs4 import BeautifulSoup
import requests


def extract_html_from_url(url):
    try:
        # Fetch HTML content from the URL using requests
        response = requests.get(url)
        response.raise_for_status()  # Raise an exception for bad responses (4xx and 5xx)

        # Parse HTML content using BeautifulSoup
        soup = BeautifulSoup(response.content, "html.parser")

        # Exclude elements with tag names 'footer' and 'nav'
        excluded_tagNames = ["footer", "nav"]
        for tag_name in excluded_tagNames:
            for unwanted_tag in soup.find_all(tag_name):
                unwanted_tag.extract()

        # Process the soup to maintain hrefs in anchor tags
        for a_tag in soup.find_all("a"):
            href = a_tag.get("href")
            if href:
                a_tag.string = f"{a_tag.get_text()} ({href})"

        # Return text content with preserved hrefs
        return ' '.join(soup.stripped_strings)

    except requests.exceptions.RequestException as e:
        print(f"Error fetching data from {url}: {e}")
        return None
```

The next step is to define the Pydantic objects we are going to scrape from the webpage. Two objects need to be created:

- Activity: A Pydantic object that represents all the metadata related to an activity, with its attributes and data types specified. Some fields are marked as Optional in case they are not available for all activities. Providing a description, examples, and any metadata helps the LLM form a better definition of each attribute.
- ActivityScrapper: The Pydantic wrapper around Activity.
The objective of this wrapper is to make the LLM understand that it needs to scrape several activities.

```python
from pydantic import BaseModel, Field
from typing import Optional


class Activity(BaseModel):
    title: str = Field(description="The title of the activity.")
    rating: float = Field(description="The average user rating out of 10.")
    reviews_count: int = Field(description="The total number of reviews received.")
    travelers_count: Optional[int] = Field(description="The number of travelers who have participated.")
    cancellation_policy: Optional[str] = Field(description="The cancellation policy for the activity.")
    description: str = Field(description="A detailed description of what the activity entails.")
    duration: str = Field(description="The duration of the activity, usually given in hours or days.")
    language: Optional[str] = Field(description="The primary language in which the activity is conducted.")
    category: str = Field(description="The category of the activity, such as 'Boat Trip', 'City Tours', etc.")
    price: float = Field(description="The price of the activity.")
    currency: str = Field(description="The currency in which the price is denominated, such as USD, EUR, GBP, etc.")


class ActivityScrapper(BaseModel):
    Activities: list[Activity] = Field(description="List of all the activities listed in the text")
```

Finally, we configure the LLM. We will use the LangChain library, which provides an excellent toolkit to get started. A key component here is the PydanticOutputParser. Essentially, it translates our object into instructions, as illustrated in the prompt, and also parses the output of the LLM to retrieve the corresponding list of objects.
```python
from langchain.prompts import PromptTemplate
from langchain.output_parsers import PydanticOutputParser
from langchain_openai import ChatOpenAI
from dotenv import load_dotenv

load_dotenv()

llm = ChatOpenAI(temperature=0)
output_parser = PydanticOutputParser(pydantic_object=ActivityScrapper)

prompt_template = """
You are an expert in web scraping and analyzing raw HTML code.
If there is no explicit information, don't make any assumptions.
Extract all objects that match the instructions from the following html
{html_text}
Provide them in a list; also, if there is a next page link, remember to add it to the object.
Please follow carefully the following instructions
{format_instructions}
"""

prompt = PromptTemplate(
    template=prompt_template,
    input_variables=["html_text"],
    partial_variables={"format_instructions": output_parser.get_format_instructions()},
)

chain = prompt | llm | output_parser
```

The final step is to invoke the chain and retrieve the results.

```python
url = "https://www.civitatis.com/es/budapest/"
html_text_parsed = extract_html_from_url(url)
activities = chain.invoke(input={"html_text": html_text_parsed})
activities.Activities
```

Here is what the data looks like. It takes 46 seconds to scrape the entire webpage.

```python
[Activity(title='Paseo en barco al anochecer', rating=8.4, reviews_count=9439, travelers_count=118389, cancellation_policy='Cancelación gratuita', description='En este crucero disfrutaréis de las mejores vistas de Budapest cuando se viste de gala, al anochecer. El barco es panorámico y tiene partes descubiertas.', duration='1 hora', language='Español', category='Paseos en barco', price=21.0, currency='€'),
 Activity(title='Visita guiada por el Parlamento de Budapest', rating=8.8, reviews_count=2647, travelers_count=34872, cancellation_policy='Cancelación gratuita', description='El Parlamento de Budapest es uno de los edificios más bonitos de la capital húngara. Comprobadlo vosotros mismos en este tour en español que incluye la entrada.', duration='2 horas', language='Español', category='Visitas guiadas y free tours', price=27.0, currency='€')
 ...
]
```

## Demo and Full Repository

I have created a quick demo using Streamlit, available here. In the first part, you are introduced to the model. You can add as many rows as you need and specify the name, type, and description of each attribute. This automatically generates a Pydantic model to be used in the web scraping component. The next part lets you enter a URL and scrape all the data by clicking a button. A download button appears when the scraping has finished, allowing you to download the data in JSON format. Feel free to play with it!

## Conclusion

LLMs open up new possibilities for efficiently extracting data from unstructured sources such as websites, PDFs, etc. Automating web scraping with LLMs not only saves time but also helps ensure the quality of the retrieved data. However, sending raw HTML to the LLM increases the token cost and can make the approach inefficient: since HTML is full of tags, attributes, and boilerplate, the cost can rise quickly. It is therefore crucial to preprocess and clean the HTML, removing all the unnecessary metadata and unused information. This keeps the LLM usable as a data extractor for webpages at a reasonable cost. The right tool for the right job!
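To make the cost point above concrete, here is a small stdlib-only sketch that strips tags from a raw HTML snippet and compares sizes. The 4-characters-per-token ratio is a rough rule of thumb, not an exact tokenizer, and the snippet itself is invented for illustration.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects only the visible text, dropping tags and attributes."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())

raw_html = (
    '<div class="card activity" data-id="42">'
    '<h2 class="title"><a href="/tour">Boat trip</a></h2>'
    '<span class="price" data-currency="EUR">21.0</span>'
    '</div>'
)

extractor = TextExtractor()
extractor.feed(raw_html)
clean_text = ' '.join(extractor.chunks)

# Very rough token estimate: ~4 characters per token
print(len(raw_html) // 4, "tokens (raw) vs", len(clean_text) // 4, "tokens (clean)")
print(clean_text)  # Boat trip 21.0
```

Even on this tiny fragment, the markup outweighs the content severalfold; on a full page the savings compound.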
The artificial intelligence (AI) community has long been fascinated by large language models and their impressive capabilities. However, the recent emergence of smaller language models brings a significant paradigm shift in AI development. These models, though compact, are highly efficient and offer scalability, accessibility, and efficiency to both developers and businesses. This article examines the transformative potential of smaller language models and their wide-ranging applications.

## Understanding Smaller Language Models

Compact language models, often referred to as "lite" or "mini" models, are purposefully designed to achieve strong performance while requiring significantly fewer computational resources than their larger counterparts. This is realized through techniques including knowledge distillation, quantization, and pruning.

- **Knowledge distillation** transfers the expertise acquired by a larger model to a smaller one, typically by using the outputs or internal representations of the larger model as targets for the smaller model to emulate. This allows the smaller model to benefit from the knowledge and capabilities of its larger counterpart despite its reduced size.
- **Quantization** reduces the precision of the numerical values used to represent a model's weights and activations. By converting floating-point numbers into fixed-point numbers with fewer bits, quantization reduces the memory footprint and computational complexity of the model without significantly compromising its performance.
- **Pruning** simplifies and compresses the model by identifying and removing redundant connections (weights) between neurons, yielding a more streamlined architecture that is smaller and more efficient while ideally maintaining or even improving its performance.
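As an illustration of the knowledge-distillation idea, here is a PyTorch sketch of a standard distillation loss: a blend of soft-target KL divergence (the student mimics the teacher's softened distribution) and hard-target cross-entropy. The temperature `T` and mixing weight `alpha` are illustrative hyperparameters, not values from any particular paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend of soft-target KL (student mimics teacher) and hard-target CE."""
    # Soft targets: match the teacher's temperature-softened output distribution
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary cross-entropy on the ground-truth labels
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy batch: 4 examples, 10 classes (random stand-ins for real model outputs)
student = torch.randn(4, 10, requires_grad=True)
teacher = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
loss = distillation_loss(student, teacher, labels)
print(loss.item())
```

In practice the teacher logits come from a frozen large model and only the student's parameters receive gradients from this loss.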
Together, these techniques enable compact language models to strike a delicate balance between size and functionality, making them an ideal solution for resource-restricted settings such as mobile applications and edge devices, where computational resources are limited.

## The Emergence of Small Language Models

In the rapidly evolving field of artificial intelligence, the size of a language model has often been synonymous with its capability. While large language models (LLMs) like GPT-4 have dominated the AI landscape, smaller language models are now emerging as potent tools. This shift challenges the long-held notion that bigger is always better.

### Limitations of Large Language Models (LLMs)

LLMs excel in areas like translation, summarization, and question-answering. However, their success comes at a cost:

- High energy consumption: LLMs require substantial computational resources.
- Memory requirements: They demand significant memory.
- Cost: Their computational costs can be prohibitive.

GPU innovation lags behind the growing size of LLMs, hinting at a scaling ceiling.

### The Rise of Smaller Models

Researchers are turning their attention to smaller language models because of their efficiency and versatility. Techniques like knowledge distillation from LLMs into smaller models yield similar performance with reduced computational demands. Transfer learning enables small models to adapt effectively to specific tasks by leveraging knowledge acquired from solving related problems. This approach has demonstrated its efficacy in fields like sentiment analysis and translation, where small language models can achieve comparable or superior results. For instance, consider a scenario where a small language model is initially trained on a large corpus of text data, such as Wikipedia or news articles.
Following this pre-training phase, the model can undergo fine-tuning, where it is further trained on a smaller dataset specifically annotated for sentiment analysis or translation tasks. By fine-tuning on these task-specific datasets, the model learns to discern and extract the features and patterns relevant to sentiment or translation, enabling it to achieve outcomes on par with, or surpassing, those obtained by training from scratch.

## Exploring Leading-Edge Small Language Models

### 1. DeepMind's Chinchilla

Despite its smaller stature, DeepMind's Chinchilla is a formidable contender against larger models, challenging the conventional belief that size equates to superiority. Key features:

- Compact power: With 70 billion parameters, Chinchilla stands tall in performance.
- Data refinement: Trained on an extensive dataset of 1.4 trillion tokens.
- Efficiency unveiled: Chinchilla's research delves into the optimal trade-off between training dataset size, model dimensions, and compute budget, emphasizing efficiency over sheer size.

Its ongoing development underscores the importance of safety and ethical considerations.

### 2. Meta's Llama Models

Meta's Llama models, ranging from 7B to 70B parameters, defy the notion that bigger is always better, excelling particularly in dialogue-based tasks. They are adaptable across various NLP applications, showing prowess at everything from text generation to programming code.

### 3. Stanford's Alpaca

Stanford's Alpaca, born from Meta AI's LLaMa 7B model, demonstrates remarkable performance despite modest resources, targeting instruction-based tasks. Interaction with Alpaca demands caution due to ongoing development nuances.

### 4. Stability AI's StableLM Series

Stability AI's StableLM series offers a blend of efficiency and effectiveness with impressive text generation capabilities. StableLM 1.6B outshines some larger counterparts, underscoring the triumph of efficiency.

## Technological Advancements and Their Implications

- UL2R (Ultra Lightweight 2 Repair) introduces a mixture-of-denoisers objective, enhancing performance across tasks.
- Flan: fine-tuning models on tasks phrased as instructions improves both performance and usability.

## Applications Across Industries

### Natural Language Understanding (NLU) in IoT Devices

Smaller language models revolutionize the functionality of IoT devices by enabling them to comprehend and respond to user queries efficiently. For instance, a smart home assistant equipped with a compact language model can understand commands such as "dim the lights" or "set the thermostat to 72 degrees" without relying heavily on cloud services. This allows for quicker response times and improved privacy.

Example: Consider a smart speaker integrated with a mini language model. When a user asks, "What's the weather forecast for today?" the device processes the query locally and provides an immediate response based on the pre-trained knowledge within the model. This seamless interaction enhances user experience and reduces dependency on external servers.

### Personalized Content Recommendations

Content recommendation systems driven by smaller language models offer personalized suggestions tailored to individual user preferences in real time. By analyzing browsing history, purchase behavior, and other relevant data, these models deliver accurate recommendations across various platforms.

Example: A streaming service uses a lite language model to analyze user viewing habits and preferences. Based on this data, the model suggests movies or TV shows that align with the user's interests.
For instance, if a user frequently watches sci-fi movies, the recommendation system might suggest similar titles, enhancing user engagement and satisfaction.

### Medical Diagnosis and Healthcare

In the healthcare sector, smaller language models assist medical professionals in tasks such as clinical documentation, diagnosis prediction, and drug interaction analysis. By processing medical texts efficiently, these models contribute to improved accuracy and decision-making, ultimately enhancing patient care.

Example: A healthcare application employs a mini language model to assist doctors in diagnosing diseases based on symptoms provided by patients. The model analyzes the symptoms against a vast database of medical knowledge and offers potential diagnoses or treatment recommendations, aiding healthcare providers in delivering timely and accurate care.

### Educational Tools and Language Learning

Language models tailored for educational purposes empower learners with personalized tutoring experiences, language translation, and grammar correction. These models support educators in creating interactive learning materials and adaptive assessment tools, fostering a more engaging and effective learning environment.

Example: A language learning app uses a compact language model to provide personalized feedback and exercises to users. The model identifies areas where the user needs improvement, such as grammar or vocabulary, and offers targeted exercises and explanations to enhance their language skills. This personalized approach accelerates the learning process and improves overall proficiency.

## Code Snippets

Let's explore sample code snippets for building smaller language models in Python. I'll provide examples for an N-gram language model, a neural language model, and Meta's Llama models.

### N-gram Language Model

An N-gram language model is a statistical model used in natural language processing to predict the probability of a word given the previous N-1 words (or tokens) in a sequence of text.
It works by analyzing the frequency of co-occurrences of sequences of N words, known as N-grams, within a corpus of text.

Real-life use case: Consider a smartphone keyboard that suggests the next word while typing a message. This feature often uses an N-gram language model to predict the most probable next word based on the context of the preceding words in the sentence. For example, if the user types "I am going to", the model may predict "the" or "see" as the next word based on the frequency of occurrence of these phrases in the training data.

In the Python code snippet below, we build a simple N-gram language model:

1. We start with a sample text, such as "I love reading blogs about data science on Analytics Vidhya."
2. We tokenize the text into unigrams (individual words) using the split() function.
3. Next, we create bigrams (pairs of consecutive words) by iterating over the list of unigrams.
4. We then compute the probabilities of each bigram occurring in the text. For simplicity, we assume equal probabilities for each bigram.
5. Finally, we predict the probability of a specific bigram, such as "love reading", by querying the probabilities dictionary.

This provides a basic illustration of how an N-gram language model can be implemented in Python to analyze text data and make predictions based on the observed patterns of word sequences.

```python
# Example: Building an N-gram Language Model

# Sample text
text = "I love reading blogs about data science on Analytics Vidhya."

# Tokenize the text into unigrams (1-grams)
unigrams = text.split()

# Create bigrams (2-grams)
bigrams = [(unigrams[i], unigrams[i + 1]) for i in range(len(unigrams) - 1)]

# Compute probabilities (you can use frequency counts or other methods)
# For simplicity, let's assume equal probabilities for each bigram
probabilities = {bigram: 1 / len(bigrams) for bigram in bigrams}

# Example: Predict the probability of the bigram "love reading"
print(f"Probability of 'love reading': {probabilities.get(('love', 'reading'), 0)}")
```

### Neural Language Model

A neural language model uses neural networks to learn the patterns and relationships within a sequence of words. These models can generate coherent and contextually relevant text, making them suitable for tasks such as language generation, machine translation, and text summarization.

Real-life use case: Consider a virtual assistant, like Google Assistant or Siri, that responds to user queries with natural-sounding and contextually appropriate answers. These assistants often use neural language models to understand and generate human-like responses based on user input.

In the Python code snippet below, we construct a neural language model using PyTorch and the Transformer architecture:

1. We load the WikiText2 dataset, which contains a large collection of English-language Wikipedia articles.
2. We tokenize the raw text data using a basic English tokenizer.
3. We build a vocabulary from the tokenized data to convert words into numerical indices.
4. We preprocess the raw text data by converting it into tensors suitable for training the neural network.
5. We define the neural language model architecture, which in this case is based on the Transformer architecture.
The specifics of the model architecture, including the number of layers, hidden units, and attention mechanisms, can be adjusted based on the requirements of the task. We then batchify the preprocessed data to enable efficient training by dividing it into batches, and finally train the model, adjusting the architecture, hyperparameters, and training loop as needed to optimize performance. This provides a foundational framework for building and training neural language models with PyTorch, which can be customized and extended for various NLP tasks.

```python
import torch
from torchtext.datasets import WikiText2
from torchtext.data.utils import get_tokenizer
from torchtext.vocab import build_vocab_from_iterator

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the WikiText2 dataset
train_iter, val_iter, test_iter = WikiText2()
tokenizer = get_tokenizer('basic_english')
vocab = build_vocab_from_iterator(map(tokenizer, train_iter), specials=['<unk>'])
vocab.set_default_index(vocab['<unk>'])

# Convert raw text into tensors
def data_process(raw_text_iter):
    data = [torch.tensor(vocab(tokenizer(item)), dtype=torch.long) for item in raw_text_iter]
    return torch.cat(tuple(filter(lambda t: t.numel() > 0, data)))

train_data = data_process(train_iter)
val_data = data_process(val_iter)
test_data = data_process(test_iter)

# Define your neural language model (e.g., using nn.Transformer)

# Example: Batchify the data for training
def batchify(data, bsz):
    nbatch = data.size(0) // bsz
    data = data.narrow(0, 0, nbatch * bsz)
    data = data.view(bsz, -1).t().contiguous()
    return data.to(device)

batch_size = 32
train_data = batchify(train_data, batch_size)
val_data = batchify(val_data, batch_size)
test_data = batchify(test_data, batch_size)

# Now you can train your neural language model using the Transformer architecture!
# Remember to adjust the model architecture, hyperparameters, and training loop as needed.
```

### Meta's Llama Models

Meta's Llama models are advanced language models designed for fine-tuning and domain adaptation. They are part of the broader set of models from Meta AI, aimed at giving developers powerful natural language processing capabilities.

Real-life use case: Consider a social media platform like Facebook, which could use Llama models to enhance its content generation and recommendation systems. By fine-tuning such models on the platform's vast amount of user-generated content, more relevant and engaging content recommendations can be generated, tailored to individual users' preferences and interests.

In the Python code snippet below, we use a Llama model for text generation:

1. We install the required packages, including PyTorch and the Transformers library.
2. We load a pre-trained LLaMa model and tokenizer; this example refers to a "llama-3B" variant.
3. We specify a prompt, which serves as the starting point for text generation, and encode it with the LlamaTokenizer into input tokens suitable for the model.
4. We generate text by passing the encoded input tokens along with parameters such as the maximum length of the generated text and the number of sequences to generate.
5. Finally, we decode the generated output tokens into human-readable text and print it.

This showcases how Llama models can be leveraged for text generation tasks, such as generating stories, captions, or responses, based on a given prompt.
These models excel at capturing the nuances of natural language and producing coherent, contextually relevant text, making them valuable tools for a wide range of NLP applications.

```python
# Install the required packages
!pip install torch
!pip install transformers

import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

# Load the pre-trained LLaMa model
model_name = "meta-llama/llama-3B"
tokenizer = LlamaTokenizer.from_pretrained(model_name)
model = LlamaForCausalLM.from_pretrained(model_name)

# Example: Generate text using the LLaMa model
prompt = "Once upon a time"
input_ids = tokenizer.encode(prompt, return_tensors="pt")
output = model.generate(input_ids, max_length=50, num_return_sequences=1)
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print("Generated text:", generated_text)
```

## Challenges and Opportunities

Although smaller language models offer many benefits, there are challenges to consider. The techniques used to compress these models may result in a loss of information or decreased performance, which requires careful optimization and fine-tuning. Additionally, ensuring that these models are deployed ethically and without bias is crucial to minimize the risks associated with algorithmic biases. Nevertheless, there is reason for optimism: rapid advancements in model compression algorithms and hardware optimization techniques create significant opportunities for further innovation in this space. As the demand for AI-powered solutions continues to grow, the potential of smaller language models to democratize AI by making it more accessible and affordable across industries and regions is immense.

## Conclusion

To summarize, the emergence of compact language models signifies a significant evolution in the field of AI, presenting an appealing alternative to conventional large-scale models.
Their adaptability, efficiency, and scalability make them an ideal choice for a diverse array of applications, from edge computing to healthcare and education. With smaller language models, companies and developers can explore novel opportunities for advancement while tackling the difficulties of resource limitations and ethical concerns in AI deployment.
If you're eager to learn or understand decision trees, I invite you to explore this article. Alternatively, if decision trees aren't your current focus, you may opt to scroll through social media.

## About Decision Trees

Figure 1: Simple decision tree

The image above shows an example of a simple decision tree. Decision trees are tree-shaped diagrams used for making decisions based on a series of logical conditions. In a decision tree, each node represents a decision statement, and the tree proceeds to make a decision based on whether the given statement is true or false.

There are two main types of decision trees: classification trees and regression trees. A classification tree categorizes the output of the decision statements into discrete categories using if-else logical conditions, while a regression tree outputs numeric values.

In Figure 2, the topmost node of a decision tree is called the root node, while the nodes following the root node are referred to as internal nodes or branches. These branches are characterized by arrows pointing both towards and away from them. At the bottom of the tree are the leaf nodes, which carry the final classification or decision of the tree. Leaf nodes are identifiable by arrows pointing to them, but not away from them.

Figure 2: Nodes of a decision tree

## Primary Objective of Decision Trees

The primary objective of a decision tree is to partition the given data into subsets in a manner that maximizes the purity of the outcomes.

## Advantages of Decision Trees

- Simplicity: Decision trees are straightforward to understand, interpret, and visualize.
- Minimal data preparation: They require minimal effort for data preparation compared to other algorithms.
- Handling of data types: Decision trees can handle both numeric and categorical data efficiently.
- Robustness to non-linear parameters: Non-linear parameters have minimal impact on the performance of decision trees.
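To see a classification tree in code before going further, here is a minimal scikit-learn sketch. The Iris dataset and `max_depth=3` are arbitrary choices for the demo; each printed internal node is an if-else test on one feature, and each leaf carries the final class decision, exactly as described above.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Fit a small classification tree on the classic Iris dataset
iris = load_iris()
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(iris.data, iris.target)

# Print the learned if-else structure: internal nodes test features,
# leaves carry the final class decision
print(export_text(clf, feature_names=list(iris.feature_names)))
print("Training accuracy:", clf.score(iris.data, iris.target))
```

Limiting `max_depth` is one simple guard against the overfitting discussed next.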
Disadvantages of Decision Trees

Overfitting: Decision trees may overfit the training data, capturing noise and leading to poor generalization on unseen data.
High variance: The model may become unstable with small variations in the training data, resulting in high variance.
Low bias, high complexity: Highly complex decision trees have low bias, making them prone to difficulties in generalizing to new data.

Important Terms in Decision Trees

Below are important terms that are also used for measuring impurity in decision trees:

1. Entropy

Entropy is a measure of randomness or unpredictability in a dataset. It quantifies the impurity of the dataset. A dataset with high entropy contains a mix of different classes or categories, making predictions more uncertain. Example: Consider a dataset containing data from various animals, as in Figure 3. If the dataset includes a diverse range of animals with no clear patterns or distinctions, it has high entropy.

Figure 3: Animal datasets

2. Information Gain

Information gain is the measure of the decrease in entropy after splitting the dataset based on a particular attribute or condition. It quantifies the effectiveness of a split in reducing uncertainty. Example: When we split the data into subgroups based on specific conditions (e.g., features of the animals), as in Figure 3, we calculate information gain by subtracting the size-weighted average entropy of the subgroups from the entropy before the split. Higher information gain indicates a more effective split that results in greater homogeneity within subgroups.

3. Gini Impurity

Gini impurity is another measure of impurity or randomness in a dataset. It calculates the probability of misclassifying a randomly chosen element if it were randomly labeled according to the distribution of labels in the dataset. In decision trees, Gini impurity is often used as an alternative to entropy for evaluating splits. Example: Suppose we have a dataset with multiple classes or categories.
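To make these three measures concrete, here is a stdlib-only Python sketch that computes entropy, Gini impurity, and information gain for small label lists (the animal labels are invented for illustration):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity: probability of misclassifying a randomly labeled element."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, subsets):
    """Parent entropy minus the size-weighted entropy of the subsets."""
    n = len(parent)
    weighted = sum(len(s) / n * entropy(s) for s in subsets)
    return entropy(parent) - weighted

# A 50/50 mix is maximally impure; a pure subset has zero impurity,
# so a perfect split recovers all of the parent's entropy as gain.
print(entropy(["cat", "dog", "cat", "dog"]))                     # 1.0
print(gini(["cat", "cat"]))                                      # 0.0
print(information_gain(["cat", "dog", "cat", "dog"],
                       [["cat", "cat"], ["dog", "dog"]]))        # 1.0
```

Note the weighting in `information_gain`: larger subsets contribute more to the post-split entropy, which is why a split that isolates only one element rarely scores well.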
The Gini impurity is high when the classes are evenly distributed or when there is no clear separation between classes. A low Gini impurity indicates that the dataset is relatively pure, with most elements belonging to the same class.

Classifications and Variations

Implementation in Python

The following is used to predict lung cancer in patients.

1. Importing the necessary libraries for data analysis and visualization in Python:

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# to ensure plots are displayed inline in Notebook
%matplotlib inline

# Set Seaborn style for plots
sns.set_style("whitegrid")

# Set default Matplotlib style
plt.style.use("fivethirtyeight")
```

2. Uploading the CSV file containing the data and loading it:

```python
# Load the data from the CSV file
df = pd.read_csv('survey_lung_cancer.csv')

df.head()  # Display the first five rows of the dataframe
```

EDA (Exploratory Data Analysis):

```python
# Count plot using Seaborn to visualize the distribution
# of values in the "LUNG_CANCER" column
sns.countplot(x='LUNG_CANCER', data=df)
```

```python
# Histogram of the AGE column
df['AGE'].plot(kind='hist', bins=20, title='AGE')
plt.gca().spines[['top', 'right']].set_visible(False)
```

3. Iterating through the columns, identifying categorical columns, and appending them to a list:

```python
categorical_col = []
for column in df.columns:
    if df[column].dtype == object and len(df[column].unique()) <= 50:
        categorical_col.append(column)

df['LUNG_CANCER'] = df.LUNG_CANCER.astype("category").cat.codes
```

4. Removing the column "LUNG_CANCER" for further processing:

```python
categorical_col.remove('LUNG_CANCER')
```

5.
Encoding categorical variables using LabelEncoder:

```python
from sklearn.preprocessing import LabelEncoder

# LabelEncoder transforms categorical values into numerical labels
label = LabelEncoder()
for column in categorical_col:
    df[column] = label.fit_transform(df[column])
```

6. Dataset splitting for machine learning with train_test_split:

```python
from sklearn.model_selection import train_test_split

# X contains the features (all columns except 'LUNG_CANCER')
# y contains the target variable ('LUNG_CANCER') from the DataFrame df
X = df.drop('LUNG_CANCER', axis=1)
y = df.LUNG_CANCER

# Performing the split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
```

7. Function for model evaluation and reporting:

Overall, the function below serves as a convenient tool for assessing the performance of classification models and generating detailed reports, facilitating model evaluation and interpretation.

```python
# Import functions from scikit-learn for model evaluation
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# clf: The classifier model to be evaluated
# X_train, y_train: The features and target variable of the training set
# X_test, y_test: The features and target variable of the testing set
def print_score(clf, X_train, y_train, X_test, y_test, train=True):
    if train:
        pred = clf.predict(X_train)
        clf_report = pd.DataFrame(classification_report(y_train, pred, output_dict=True))
        print("Train Result:\n_________________________")
        print(f"Accuracy Score: {accuracy_score(y_train, pred) * 100:.2f}%")
        print("_________________________")
        print(f"CLASSIFICATION REPORT:\n{clf_report}")
        print("_________________________________________________________________________")
        print(f"Confusion Matrix: \n {confusion_matrix(y_train, pred)}\n")
    else:
        pred = clf.predict(X_test)
        clf_report = pd.DataFrame(classification_report(y_test, pred, output_dict=True))
        print("\nTest Result:\n_________________________")
        print(f"Accuracy Score: {accuracy_score(y_test, pred) * 100:.2f}%")
        print("_________________________")
        print(f"CLASSIFICATION REPORT:\n{clf_report}")
        print("_________________________________________________________________________")
        print(f"Confusion Matrix: \n {confusion_matrix(y_test, pred)}\n")
```

Training and evaluation of the decision tree classifier:

Overall, this code provides a comprehensive evaluation of the decision tree classifier's performance on both the training and testing sets, including the accuracy score, classification report, and confusion matrix for each set. During the training process, the decision tree algorithm uses an impurity measure (Gini impurity by default in scikit-learn; entropy and information gain if criterion='entropy' is specified) to recursively split nodes and build a tree that maximizes purity at each step.

```python
from sklearn.tree import DecisionTreeClassifier

tree_clf = DecisionTreeClassifier(random_state=42)
tree_clf.fit(X_train, y_train)

print_score(tree_clf, X_train, y_train, X_test, y_test, train=True)
print_score(tree_clf, X_train, y_train, X_test, y_test, train=False)
```

The results above indicate that the decision tree classifier achieved high accuracy on the training set, with some overfitting evident from the gap between training and testing performance. While the classifier performed well on the testing set, there is room for improvement, particularly in reducing false positives and false negatives. Further tuning of hyperparameters or exploring other algorithms may help improve generalization performance.

8.
Visualization of the decision tree classifier:

```python
# Importing dependencies:
# Image is used to display images in the IPython environment
# StringIO is used to create a file-like object in memory
# export_graphviz is used to export the decision tree in Graphviz DOT format
# pydot is used to interface with the Graphviz library
from IPython.display import Image
from six import StringIO
from sklearn.tree import export_graphviz
import pydot

features = list(df.columns)
features.remove("LUNG_CANCER")
```

```python
dot_data = StringIO()
export_graphviz(tree_clf, out_file=dot_data, feature_names=features, filled=True)
graph = pydot.graph_from_dot_data(dot_data.getvalue())
Image(graph[0].create_png())
```

9. Training and evaluation of the Random Forest classifier:

```python
from sklearn.ensemble import RandomForestClassifier

# Creating an instance of the Random Forest classifier with n_estimators=100,
# which specifies the number of decision trees in the forest
rf_clf = RandomForestClassifier(n_estimators=100)
rf_clf.fit(X_train, y_train)

print_score(rf_clf, X_train, y_train, X_test, y_test, train=True)
print_score(rf_clf, X_train, y_train, X_test, y_test, train=False)
```

The code below will generate heatmaps for both the training and testing sets' confusion matrices. The heatmaps use different shades to represent the counts in the confusion matrix. The diagonal elements (true positives and true negatives) will have higher values and appear lighter, while off-diagonal elements (false positives and false negatives) will have lower values and appear darker.
```python
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

# Compute the confusion matrices used by the heatmaps
# (here from the Random Forest predictions of the previous step)
cm_train = confusion_matrix(y_train, rf_clf.predict(X_train))
cm_test = confusion_matrix(y_test, rf_clf.predict(X_test))

# Create heatmap for training set
plt.figure(figsize=(8, 6))
sns.heatmap(cm_train, annot=True, fmt='d', cmap='viridis', annot_kws={"size": 16})
plt.title('Confusion Matrix for Training Set')
plt.xlabel('Predicted labels')
plt.ylabel('True labels')
plt.show()

# Create heatmap for testing set
plt.figure(figsize=(8, 6))
sns.heatmap(cm_test, annot=True, fmt='d', cmap='plasma', annot_kws={"size": 16})
plt.title('Confusion Matrix for Testing Set')
plt.xlabel('Predicted labels')
plt.ylabel('True labels')
plt.show()
```

XGBoost for Classification

```python
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score

# Instantiate XGBClassifier
xgb_clf = XGBClassifier()

# Train the classifier
xgb_clf.fit(X_train, y_train)

# Predict on the testing set
y_pred = xgb_clf.predict(X_test)

# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
```

The accuracy above indicates that the model's predictions align closely with the actual class labels, demonstrating its effectiveness in distinguishing between the classes. The code below will generate a bar plot showing the relative importance of the top features in the XGBoost model. The importance is typically calculated based on metrics such as gain, cover, or frequency of feature usage across all trees in the ensemble.

```python
from xgboost import plot_importance
import matplotlib.pyplot as plt

# Plot feature importance
plt.figure(figsize=(10, 6))
plot_importance(xgb_clf, max_num_features=10)  # Maximum number of features to show
plt.show()
```

10.
Plotting the first tree in the XGBoost model:

```python
from xgboost import plot_tree

# Plot the first tree
plt.figure(figsize=(10, 20))
plot_tree(xgb_clf, num_trees=0, rankdir='TB')  # Specify the tree number to plot
plt.show()
```

Conclusion

In conclusion, this article has shown how decision trees and their advanced variants, like Random Forest and XGBoost, offer powerful tools for classification and regression tasks in machine learning. Through this journey, we've explored the fundamental concepts of decision trees, including entropy, information gain, and Gini impurity, which form the basis of their decision-making process. As we continue to delve deeper into the realm of machine learning, the versatility and effectiveness of decision trees and their variants underscore their significance in solving real-world problems across diverse domains. Whether it's classifying medical conditions, predicting customer behavior, or optimizing business processes, decision trees remain a cornerstone in the arsenal of machine learning techniques, driving innovation and progress in the field.
The objective behind the solution described in this blog was to share live 3D objects captured by one person, using “normal-looking” glasses, with another person who can then view them in XR (AR, VR, or MR) and/or 3D print them, with an experience similar to what exists today for 2D pictures and 2D printers.

About This Project

While Meta Ray-Ban glasses are not XR headsets (they are smart glasses), they are currently the most unobtrusive, aesthetically “normal”-looking glasses on the market that can be used to capture video (which can then be turned into 3D objects via an intermediary service), as they are indistinguishable from regular Ray-Ban Headliner and Wayfarer glasses aside from the small camera lenses that are mere millimeters in size. See the side-by-side comparison below.

Ray-Ban Wayfarer glasses / Meta Ray-Ban Wayfarer glasses

The Oracle database plays a central role in the solution, as it provides an optimized location for all types of data storage (including 3D objects and point clouds), various inbound and outbound API calls, and AI and spatial operations. Details can be found here. I will start by saying that this process will of course become more streamlined as better hardware, software, and APIs become available; however, the need for workflow logic, interaction with and exposure of APIs, and central storage will remain a consistent requirement of an optimal architecture for this functionality. It is possible to run both Java and JavaScript from within the Oracle database, load and use libraries for those languages, expose these programs as API endpoints, and make calls out from them. It is also possible to simply issue direct HTTP, REST, etc., commands from PL/SQL using the UTL_HTTP.BEGIN_REQUEST call or, for Oracle AI cloud services (or any Oracle cloud services), by using the DBMS_CLOUD.send_request call. This offers a powerful and flexible architecture where the following four combinations are possible.
This being the case, there are several ways to go about the solution described here; for example, by issuing requests directly from the database or from an intermediary external application (such as microservices deployed in a Kubernetes cluster) as shown in the previous diagrams.

Flow

The flow is as follows:

1. The user takes a video with Ray-Bans.
2. The video is automatically sent to Instagram (or cloud storage).
3. The Oracle database calls Instagram to get the video and saves it in the database (or object storage, etc.).
4. The Oracle database sends the video to the photogrammetry/AI service and retrieves the 3D model/capture generated by it.
5. Optionally, further spatial and AI operations are automatically conducted on the 3D model by the Oracle database.
6. Optionally, further manual modifications are made to the 3D model, and a manual workflow step may be added (for example, to gate 3D printing).

From here the 3D capture/model can be 3D printed or viewed and interacted with via an XR (VR, AR, MR) headset - or both can be done in parallel.

3D Printing

1. The Oracle database sends the 3D model (.obj file) to PrusaSlicer, which generates and returns G-code from it.
2. The G-code print job is then sent to the 3D printer via the OctoPrint API server.

XR Viewing and Interaction

1. The 3D model is exposed as a REST (ORDS) endpoint.
2. The XR headset (Magic Leap 2, Vision Pro, Quest, etc.) receives the 3D model from the Oracle database and renders it for viewing and interaction at runtime.

In diagram form, the flow looks roughly like this:

Now let’s elaborate on each step.

Step 1: The User Takes a Video With Ray-Bans

As mentioned earlier, I did not pick Meta Ray-Bans due to their XR functionality. Numerous other glasses have actual XR functionality well beyond Ray-Bans, full-on XR headsets have an increasingly better ability to do 3D scans of various types, and 3D scanners are devoted to extremely accurate, high-resolution scans.
I picked Ray-Bans because they are the glasses that are, in short, the most “normal” looking (without thick lenses, bridges that sit far from the face, extra extensions, etc.). Meta Ray-Bans have a “hey Meta” command that works like Alexa or Siri, though it is fairly limited at this point: it cannot access location services, it can send but not read messages, etc. It's not hard to see how it would be possible to use Vision AI, etc. with them. However, such built-in functionality does not exist currently, and more importantly, there is no API to access any functionality (there are access hacks out there, but this blog will stick to legit, supported approaches), so it is limited for developers at this point. It can play music and, most importantly, take pictures and videos; that is the functionality I am using here. Streaming must be set up in the Meta View app, and the Instagram account being streamed to must be a business account. However, both of these are simple configuration steps that can be found in the documentation and do not require additional cost.

Step 2: Video Is Automatically Sent to Instagram (Or Cloud Storage)

Ray-Ban video recording is limited to one-minute clips, but that is enough for any modern photogrammetry/AI engine to generate a 3D model of small to medium objects. Video taken with the glasses can be stored in cloud services such as iCloud and Google. However, it is not automatically synced until the glasses are placed in the glasses case. This is why I opted for storage in Instagram reels, which, not surprisingly, is supported by the Meta Ray-Bans such that videos can be automatically streamed and saved there as they are taken. Setup steps to stream can be found here.

Step 3: Oracle Database Calls Instagram (App) to Get the Video

Here, the Oracle database itself listens/polls for new Instagram reels/videos using the Instagram Graph API. This requires creating a Meta/Instagram application, etc., and here are the steps involved in doing so.
Register as a Meta developer.

Create an app and submit it for approval. This takes approximately 5 days if everything has been completed correctly for eligibility. This process, in particular for the Instagram Basic Display app type we are creating, is described on this Getting Started page. However, I will provide a few additional screenshots here to elaborate a bit, as certain new items around app types, privileges, and app approval processes are missing from the doc.

First, it is necessary to select a Consumer app type and then the Instagram Basic Display product. Then the app is submitted for approval for the instagram_graph_user_profile and instagram_graph_user_media permissions. Finally, testers are added/invited so that access tokens can be generated.

Once the application is set up and access token(s) acquired, a list of information about the media from the account is obtained by issuing a request in the following format:

https://graph.instagram.com/{{IG_USER_ID}}/media?fields=id,caption,media_url,media_type,timestamp,children{media_url}&limit=10&access_token={{IG_LONG_LIVED_ACCESS_TOKEN}}

Finally, the desired media is filtered out from the JSON returned (i.e., any new videos posted), and the media_url can be used to get the actual media. As stated before, the video can be retrieved from the URL using PL/SQL, JavaScript, or Java from within the Oracle database itself, or via an intermediary service called from the database. It can then be saved in the database, object storage, or other storage and sent to the photogrammetry/AI service.
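Composing that media-listing request programmatically is straightforward; here is a small Python sketch using only the standard library (the user id and token values below are placeholders for illustration, not real credentials):

```python
from urllib.parse import urlencode

def media_list_url(ig_user_id: str, access_token: str) -> str:
    """Compose the Instagram Graph API request used to list recent media."""
    params = urlencode({
        "fields": "id,caption,media_url,media_type,timestamp,children{media_url}",
        "limit": 10,
        "access_token": access_token,
    })
    return f"https://graph.instagram.com/{ig_user_id}/media?{params}"

# Hypothetical id/token, for illustration only
print(media_list_url("1789", "IG_LONG_LIVED_ACCESS_TOKEN"))
```

The same string could then be fetched with `urllib.request.urlopen`, or from PL/SQL with UTL_HTTP as described above; building the URL with `urlencode` avoids escaping mistakes in the `fields` parameter.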
An example of doing this with JavaScript from inside the database can be found in my blog How To Call Cohere and Hugging Face AI From Within an Oracle Database Using JavaScript, and an example using PL/SQL is presented here:

```plsql
CREATE TABLE file_storage (
    id NUMBER PRIMARY KEY,
    filename VARCHAR2(255),
    file_content BLOB
);

DECLARE
    l_http_request  UTL_HTTP.req;
    l_http_response UTL_HTTP.resp;
    l_blob          BLOB;
    l_buffer        RAW(32767);
    l_amount        BINARY_INTEGER := 32767;
    l_pos           INTEGER := 1;
    l_url           VARCHAR2(1000) := 'https://somesite.com/somefile.obj';
BEGIN
    INSERT INTO file_storage (id, filename, file_content)
    VALUES (1, 'file.obj', EMPTY_BLOB())
    RETURNING file_content INTO l_blob;

    -- Open HTTP request to download the file
    l_http_request := UTL_HTTP.begin_request(l_url);
    UTL_HTTP.set_header(l_http_request, 'User-Agent', 'Mozilla/4.0');
    l_http_response := UTL_HTTP.get_response(l_http_request);

    -- Download the file and write it to the BLOB
    LOOP
        BEGIN
            UTL_HTTP.read_raw(l_http_response, l_buffer, l_amount);
            DBMS_LOB.writeappend(l_blob, l_amount, l_buffer);
            l_pos := l_pos + l_amount;
        EXCEPTION
            WHEN UTL_HTTP.end_of_body THEN
                EXIT;
        END;
    END LOOP;

    UTL_HTTP.end_response(l_http_response);
    COMMIT;
    DBMS_OUTPUT.put_line('File downloaded and saved.');
EXCEPTION
    WHEN UTL_HTTP.end_of_body THEN
        UTL_HTTP.end_response(l_http_response);
    WHEN OTHERS THEN
        UTL_HTTP.end_response(l_http_response);
        RAISE;
END;
```

This same technique can be used for any call out from the database and storage of any file/content. Therefore, these snippets can be referred back to for saving any file, including the .obj file(s) generated in the next step.

Step 4: Oracle Database Sends Video to the Photogrammetry/AI Service and Retrieves the 3D Model/Capture Generated by It

There are a few photogrammetry/AI (and NeRF, splat, etc.) services available. I have chosen to use Luma Labs again because it has an API available for direct HTTPS calls, and examples are also given for over 20 programming languages and platforms.
The reference for it can be found here. I will keep things short by giving the succinct curl command for each call in the flow, but the same can be done using PL/SQL, JavaScript, etc. from the database as described earlier. Once a Luma Labs account is created and a luma-api-key obtained, the process of converting the video to a 3D .obj file is as follows:

Create/initiate a capture.

```shell
curl --location 'https://webapp.engineeringlumalabs.com/api/v2/capture' \
  --header 'Authorization: luma-api-key={key}' \
  --data-urlencode 'title=hand'

# example response
# {
#   "signedUrls": {
#     "source": "https://storage.googleapis.com/..."
#   },
#   "capture": {
#     "title": "hand",
#     "type": "reconstruction",
#     "location": null,
#     "privacy": "private",
#     "date": "2024-03-26T15:54:08.268Z",
#     "username": "paulparkinson",
#     "status": "uploading",
#     "slug": "pods-of-kon-66"
#   }
# }
```

This call will return a signedUrls.source URL that is then used to upload the video. Also note the generated slug value returned, which will be used to trigger 3D processing, check processing status, etc.

```shell
curl --location --request PUT 'https://storage.googleapis.com/...' \
  --header 'Content-Type: text/plain' \
  --data 'hand.mov'
```

Once the video file is uploaded, the processing is triggered by issuing a POST request to the slug retrieved in step 1.

```shell
curl --location -g --request POST 'https://webapp.engineeringlumalabs.com/api/v2/capture/{slug}' \
  --header 'Authorization: luma-api-key={key}'
```

If the process is triggered successfully, a value of true will be returned, and the following can be issued to check the status of the capture by calling the capture endpoint.
```shell
curl --location -g 'https://webapp.engineeringlumalabs.com/api/v2/capture/{slug}' \
  --header 'Authorization: luma-api-key={key}'
```

Once the status returned is equal to complete, the 3D capture zip file (which contains the .obj file as well as the .mtl material mapping file and .png texture files) is downloaded and saved by calling the download endpoint. The approaches mentioned earlier can be used to do this and save the file(s).

Step 5: Optionally, Further Spatial and AI Operations Are Automatically Conducted on the 3D Model by the Oracle Database

It is also possible to break down the .obj file and store its various vertices, vertex texture/material mappings, etc. in a table as a point cloud for analysis and manipulation. Here is a simple example of that:

```plsql
create or replace procedure gen_table_from_obj(id number) as
    ord MDSYS.SDO_ORDINATE_ARRAY;
    f   UTL_FILE.FILE_TYPE;
    s   VARCHAR2(2000);
    i   number;
begin
    ord := MDSYS.SDO_ORDINATE_ARRAY();
    ord.extend(3);
    f := UTL_FILE.FOPEN('ADMIN_DIR', 'OBJFROMPHOTOAI.obj', 'R');
    i := 1;
    while true loop
        UTL_FILE.GET_LINE(f, s);
        if (s = '') then
            exit;
        end if;
        -- 'v' lines hold vertex coordinates
        if (REGEXP_SUBSTR(s, '[^ ]*', 1, 1) = 'v') then
            ord(1) := TO_NUMBER(REGEXP_SUBSTR(s, '[^ ]*', 1, 3));
            ord(2) := TO_NUMBER(REGEXP_SUBSTR(s, '[^ ]*', 1, 5));
            ord(3) := TO_NUMBER(REGEXP_SUBSTR(s, '[^ ]*', 1, 7));
            insert into INP_OBJFROMPHOTOAI_TABLE(val_d1, val_d2, val_d3)
            values (ord(1), ord(2), ord(3));
        end if;
        i := i + 1;
    end loop;
end;
/
```

The Oracle database has had a spatial component for decades now, and recent versions have added several operations for different analyses of point clouds, mesh creation, .obj export, etc. These are described in this video. One operation that has existed for a number of releases is the pc_simplify function shown below. This is often referred to as "decimation" by various 3D modeling tools, and it provides the ability to reduce the number of polygons in a mesh, thus reducing overall size.
This is handy for a number of reasons, such as when different clients will use the 3D model: for example, a phone with limited bandwidth versus a client that needs high-poly meshes.

```plsql
procedure pc_simplify(
    pc_table            varchar2,
    pc_column           varchar2,
    id_column           varchar2,
    id                  varchar2,
    result_table_name   varchar2,
    tol                 number,
    query_geom          mdsys.sdo_geometry default null,
    pc_intensity_column varchar2 default null)
DETERMINISTIC PARALLEL_ENABLE;
```

Step 6: Optionally, Further Manual Modifications Can Also Be Made to the 3D Model, and a Manual Approval Can Be Inserted as Part of the Workflow

The 3D model can be loaded from the database and edited in 3D modeling tools like Blender or 3D printing tools such as those from Bambu Lab, among numerous others. Due to the many steps involved in this overall process, the solution is also a good fit for a workflow engine such as the one that exists as part of the Oracle database. In this case, a manual review/approval can be inserted as part of the workflow to prevent sending or printing undesired models. From here the 3D capture/model can be 3D printed or viewed and interacted with via an XR (VR, AR, MR) headset - or both can be done in parallel.

3D Printing

Step 1: Oracle Database Sends the 3D Model (.obj File) to PrusaSlicer, Which Generates and Returns G-Code From It

PrusaSlicer is an extremely robust and successful open-source project/application that takes 3D models (.stl, .obj, .amf) and converts them into G-code instructions for FFF printers or PNG layers for mSLA 3D printers. It supports every conceivable printer and format; however, it does not provide an API, only a CLI. There are a few ways to work/hack around this for automation. One, shown here, is to implement a Spring Boot microservice that takes the .obj file and executes the PrusaSlicer CLI (which must be accessible to the microservice, of course), returning the G-code.
```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.commons.io.FileUtils;
import org.springframework.http.MediaType;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.multipart.MultipartFile;

@RestController
public class SlicerController {

    @PostMapping(value = "/slice", consumes = MediaType.MULTIPART_FORM_DATA_VALUE)
    public byte[] sliceStlFile(@RequestParam("file") MultipartFile file,
                               @RequestParam("config") String configPath)
            throws IOException, InterruptedException {
        Path tempDir = Paths.get(System.getProperty("java.io.tmpdir"));
        Path stlFilePath = Files.createTempFile(tempDir, "stl", ".stl");
        file.transferTo(stlFilePath.toFile());
        Path gcodePath = Files.createTempFile(tempDir, "output", ".gcode");

        // Execute PrusaSlicer CLI command
        String command = String.format("PrusaSlicer --slice --load %s --output %s %s",
                configPath, gcodePath.toString(), stlFilePath.toString());
        Process prusaSlicerProcess = Runtime.getRuntime().exec(command);
        prusaSlicerProcess.waitFor();

        // Return G-code
        byte[] gcodeBytes = FileUtils.readFileToByteArray(gcodePath.toFile());
        Files.delete(stlFilePath);
        Files.delete(gcodePath);
        return gcodeBytes;
    }
}
```

Once the G-code for the .obj file has been returned, it can be passed to the OctoPrint API server for printing.

Step 2: G-Code Print Job Is Then Sent to the 3D Printer via the OctoPrint API Server

OctoPrint is an application for 3D printers that offers a web interface for printer control. It can be installed on essentially any computer (in the case of my setup, a minimal Raspberry Pi) that is connected to the printer. This can even be done over Wi-Fi, cloud, etc., depending on the printer and setup. However, we will keep it to this basic setup. Again, printers have different applications to provide this functionality, but OctoPrint provides a REST API, which allows for programmatic control, including uploading and printing G-code files. First, an API key must be obtained from OctoPrint’s web interface under Settings > API.
Then, the G-code file is uploaded and immediately printed by using a call of this format/content:

```shell
curl -k -X POST "http://octopi.local/api/files/local" \
  -H "X-Api-Key: API_KEY" \
  -F "file=@/path/to/file.gcode" \
  -F "select=true" \
  -F "print=true"
```

XR Viewing and Interaction

Step 1: 3D Model Is Exposed as a REST (ORDS) Endpoint

Any data stored in the Oracle database can be exposed as a REST endpoint. To expose the .obj file for download by the XR headset, we can REST-enable the table created earlier using the following:

```plsql
CREATE OR REPLACE PROCEDURE download_file(p_id IN NUMBER) IS
    l_blob BLOB;
BEGIN
    SELECT file_content INTO l_blob
    FROM file_storage
    WHERE id = p_id;

    -- Use ORDS to deliver the BLOB to the client
    ORDS.enable_download(l_blob);
EXCEPTION
    WHEN NO_DATA_FOUND THEN
        HTP.p('File not found.');
    WHEN OTHERS THEN
        HTP.p('Error retrieving file.');
END download_file;
```

This makes the file (i.e., the .obj, etc. files) accessible via a simple GET call.

```shell
curl -X GET "http://thedbserver/ords/theschema/file/file/{id}" -o "3Dmodelwithobjandtextures.zip"
```

Here, we are exposing and downloading the file by id. The XR headset keeps track of the 3D models it has received and polls for the next id each time. From here, the 3D model can be viewed on computers, phones, etc. as-is. However, it is obviously more interactive to view in an actual XR headset, which is what I will describe next.

Step 2: XR Headset (Magic Leap 2, Vision Pro, Quest, etc.) Receives the 3D Model From the Oracle Database and Renders It for Viewing and Interaction at Runtime

The process for receiving and rendering 3D in Unity as I am showing here (and likewise for Unreal) is the same regardless of the headset used. Interaction with 3D objects (via, e.g., hand tracking, eye gaze, voice, etc.) has also been standardized with OpenXR and WebXR, which Magic Leap, Meta, and others are compliant with. However, Apple (similar to the case of phone, etc.
development) has its own development SDK, ecosystem, etc., and regardless, interaction is not the crux of this blog, so I will only cover the important aspects of the 3D object for viewing. There are a couple of assets on the Unity asset store for doing this conveniently. What is shown below is greatly simplified, but it explains the general approach. First, the 3D model is downloaded using a script like this:

```csharp
using System.Collections;
using System.IO;
using System.Text;
using UnityEngine;
using UnityEngine.Networking;

public class DownloadAndSave3DModel : MonoBehaviour
{
    private string ordsFileUrl = "http://thedbserver/ords/theschema/file/file/{id}";
    private string filePath;
    private string filePathForTextures;

    void Start()
    {
        filePath = Path.Combine(Application.persistentDataPath, "3Dmodelwithobjandtextures.zip");
        StartCoroutine(DownloadFile(ordsFileUrl));
    }

    IEnumerator DownloadFile(string url)
    {
        using (UnityWebRequest webRequest = UnityWebRequest.Get(url))
        {
            yield return webRequest.SendWebRequest();
            if (webRequest.isNetworkError || webRequest.isHttpError)
            {
                Debug.LogError("Error: " + webRequest.error);
            }
            else
            {
                // Unpack the zip here and process each file for the case where it
                // isObjFile, isMtlFile, or isTextureFile.
                // The texture/png files are written to a file like this:
                if (isTextureFile)
                    File.WriteAllBytes(filePathForTextures, webRequest.downloadHandler.data);
                // Whereas the .obj and .mtl files are converted to a stream like this:
                else if (isObjFile)
                {
                    var memoryStream = new MemoryStream(Encoding.UTF8.GetBytes(webRequest.downloadHandler.text));
                    processObjFile(memoryStream);
                }
                else if (isMtlFile)
                {
                    var memoryStream = new MemoryStream(Encoding.UTF8.GetBytes(webRequest.downloadHandler.text));
                    processMtlFile(memoryStream);
                }
            }
        }
    }
}
```

As shown, the texture/.png files in the zip are saved, and the .obj and .mtl files are converted to MemoryStreams for processing/creating the Unity GameObject's MeshRenderer, etc.
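The processObjFile step above boils down to reading the "v" (vertex), "vn" (normal), and "vt" (UV) lines of the Wavefront .obj file. As a language-neutral illustration (Python here for brevity; the C# version follows the same line-by-line pattern), extracting just the vertex coordinates looks like this:

```python
def parse_obj_vertices(obj_text: str):
    """Collect the (x, y, z) coordinates from the 'v' lines of an .obj file."""
    vertices = []
    for line in obj_text.splitlines():
        parts = line.split()
        # 'vn' and 'vt' lines are skipped here; a full loader handles them too
        if parts and parts[0] == "v":
            vertices.append(tuple(float(p) for p in parts[1:4]))
    return vertices

sample = """v 0.0 0.0 0.0
v 1.0 0.0 0.0
vn 0.0 1.0 0.0
f 1 2 1"""
print(parse_obj_vertices(sample))  # [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
```

The "f" (face) lines then index into these vertex, normal, and UV lists to assemble triangles, which is what ultimately populates the Unity Mesh.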
This parsing is similar to how we created a point cloud table from the .obj in the earlier optional step, where we parse the lines of the .obj file; however, it is a bit more complicated, as we also parse the .mtl file (the spec explaining these formats can be found here) and apply the textures to create the resultant 3D model that is rendered for viewing at runtime. The basic logic below creates the GameObject, which can then be placed in the headset wearer's FOV, added to a library menu, etc. for interaction.

```csharp
// Create the Unity GameObject that will be the main result/holder of our 3D object
var gameObject = new GameObject(_name);

// Add a MeshRenderer and MeshFilter to it
var meshRenderer = gameObject.AddComponent<MeshRenderer>();
var meshFilter = gameObject.AddComponent<MeshFilter>();

// Create a Unity Vector object for each of the "v"/Vertices, "vn"/Normals,
// and "vt"/UVs values parsed from each of the lines in the .obj file
new Vector3(x, y, z);

// Create a Unity Mesh and add all of the Vertices, Normals, and UVs to it
var mesh = new Mesh();
mesh.SetVertices(vertices);
mesh.SetNormals(normals);
mesh.SetUVs(0, uvs);

// Similarly parse the .mtl file to create the array of Unity Materials
// (a Unity Material is constructed from a Shader)
var material = new Material(Shader.Find("Standard"));
material.SetTexture(...);
material.SetColor(...);
// collect materials into materialArray

// Add the mesh and materials
meshRenderer.sharedMaterials = materialArray;
meshFilter.sharedMesh = mesh;
```

Conclusion

As you can see, there are many steps to the process; however, hopefully, you have found it interesting to see how it is possible — and will only become easier — to share 3D objects the way we share 2D pictures today. Please let me know if you have any thoughts or questions whatsoever, and thank you very much for reading!

Source Code

The source code for the project can be found here.
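As a language-agnostic illustration of the .obj parsing described above, here is a minimal Go sketch that extracts the "v"/vertex lines; the same pattern applies to "vn" and "vt" lines. The `Vec3` type and function name are my own for this example, not from the project's source:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Vec3 mirrors Unity's Vector3 for this illustration.
type Vec3 struct{ X, Y, Z float64 }

// parseObjVertices extracts the "v" (vertex) lines from .obj content.
// Each such line has the form "v x y z".
func parseObjVertices(obj string) []Vec3 {
	var vertices []Vec3
	for _, line := range strings.Split(obj, "\n") {
		fields := strings.Fields(line)
		if len(fields) == 4 && fields[0] == "v" {
			x, _ := strconv.ParseFloat(fields[1], 64)
			y, _ := strconv.ParseFloat(fields[2], 64)
			z, _ := strconv.ParseFloat(fields[3], 64)
			vertices = append(vertices, Vec3{x, y, z})
		}
	}
	return vertices
}

func main() {
	obj := "v 0 0 0\nv 1 0 0\nv 0 1 0\nf 1 2 3\n"
	fmt.Println(parseObjVertices(obj)) // [{0 0 0} {1 0 0} {0 1 0}]
}
```

The real implementation also resolves the "f"/face indices and the .mtl material references before building the mesh.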
When ChatGPT became a global phenomenon, countless books, papers, and articles about AI (Artificial Intelligence) appeared, but most of them were heavy on theory and mathematics. The series of articles "Introduction to Artificial Intelligence with Code" is a compilation of the most fundamental aspects of AI for beginners, presented with a combination of theory and code (C#) to help readers gain a better understanding of the concepts and ideas discussed in these articles. In the first article of the series, we will introduce propositional logic.

Theory: An Introduction to Propositional Logic

The rules of logic provide precise meanings for propositions. These rules are used to distinguish between valid and invalid mathematical arguments. Alongside its significance in understanding mathematical reasoning, logic also has many applications in computer science, such as designing computer networks, programming, checking program correctness, and so on.

Propositions are the building blocks of the logical edifice of propositional logic. A proposition is a statement that is either true or false but cannot be both true and false simultaneously. The truth value of a proposition (in propositional logic) is referred to as its logical value (true or false). Letters are used to symbolize propositions, much as letters represent variables in programming; the commonly used conventions for these letters are p, q, r, s, and so on.

Many mathematical propositions are created by combining one or more propositions we already have. These new propositions are called compound propositions (denoted temporarily as F), and they are formed from existing propositions using logical operators. Some basic logical operators are AND, OR, and NOT. A classical application of logical operators in computer science is the design of logic gates.
To check the truth value of a compound proposition, we need to apply the rules of logic and consider the truth values of the individual propositions along with the logical operators used.

Coding: Checking the Truth Value of a Compound Proposition (F)

We'll create a set of classes, all related by inheritance, that will allow us to obtain the output of any F from inputs defined a priori. Here is the first class:

```csharp
public abstract class F
{
    public abstract bool Check();
    public abstract IEnumerable<Prop> Props();
}
```

The abstract F class states that all its descendants must implement a Boolean method Check() and an IEnumerable<Prop> method Props(). The former returns the evaluation of the compound proposition, and the latter returns the propositions contained within it.

Because logical operators share some features, we'll create an abstract class to group these features and create a more concise, logical inheritance design. The Op class, which can be seen in the code below, will contain the similarities that every logical operator shares:

```csharp
public abstract class Op : F
{
    public F P { get; set; }
    public F Q { get; set; }

    public Op(F p, F q)
    {
        P = p;
        Q = q;
    }

    public override IEnumerable<Prop> Props()
    {
        return P.Props().Concat(Q.Props());
    }
}
```

The first logical operator, AND, is illustrated below:

```csharp
public class AND : Op
{
    public AND(F p, F q) : base(p, q) { }

    public override bool Check()
    {
        return P.Check() && Q.Check();
    }
}
```

The implementation of the AND class is pretty simple. It receives two arguments that it passes to its parent constructor, and the Check method merely returns the logical AND that is built into C#.
Very similar are the OR, NOT, and Prop classes, which are shown below:

```csharp
// OR class
public class OR : Op
{
    public OR(F p, F q) : base(p, q) { }

    public override bool Check()
    {
        return P.Check() || Q.Check();
    }
}

// NOT class
public class NOT : F
{
    public F P { get; set; }

    public NOT(F p)
    {
        P = p;
    }

    public override bool Check()
    {
        return !P.Check();
    }

    public override IEnumerable<Prop> Props()
    {
        return new List<Prop>(P.Props());
    }
}
```

The Prop class is the one we use for representing propositions in compound propositions. It includes a truthValue field, which is the truth value given to the proposition (true, false), and when the Props() method is called, it returns a List<Prop> whose single element is itself:

```csharp
public class Prop : F
{
    public bool truthValue { get; set; }

    public Prop(bool truthvalue)
    {
        truthValue = truthvalue;
    }

    public override bool Check()
    {
        return truthValue;
    }

    public override IEnumerable<Prop> Props()
    {
        return new List<Prop>() { this };
    }
}
```

Creating and checking F = NOT(p) OR q:

```csharp
var p = new Prop(false);
var q = new Prop(false);
var f = new OR(new NOT(p), q);
Console.WriteLine(f.Check());
p.truthValue = true;
Console.WriteLine(f.Check());
```

The result looks like this:

```
True
False
```

Summary

In this article, we introduced a basic logic — propositional logic — and we also described C# code for representing compound propositions (propositions, logical operators, and so on). In the next article, we'll introduce a very important logic that extends propositional logic: first-order logic.
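For readers who prefer another language, the same composite design translates directly. Here is a compact Go sketch of the evaluator (the type names are my own, mirroring the C# classes above, not part of the article's code):

```go
package main

import "fmt"

// F is the compound-proposition interface: anything with a truth value.
type F interface {
	Check() bool
}

// Prop is an atomic proposition with an assignable truth value.
type Prop struct{ TruthValue bool }

func (p *Prop) Check() bool { return p.TruthValue }

// AND, OR, and NOT combine propositions, mirroring the C# classes.
type AND struct{ P, Q F }

func (a AND) Check() bool { return a.P.Check() && a.Q.Check() }

type OR struct{ P, Q F }

func (o OR) Check() bool { return o.P.Check() || o.Q.Check() }

type NOT struct{ P F }

func (n NOT) Check() bool { return !n.P.Check() }

func main() {
	// F = NOT(p) OR q, as in the C# example
	p := &Prop{TruthValue: false}
	q := &Prop{TruthValue: false}
	f := OR{NOT{p}, q}
	fmt.Println(f.Check()) // true
	p.TruthValue = true
	fmt.Println(f.Check()) // false
}
```

The interface plays the role of the abstract F class; each operator evaluates its children recursively, just as Check() does in the C# version.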
Generative AI development has been democratized, thanks to powerful machine learning models (specifically Large Language Models such as Claude, Meta's Llama 2, etc.) being exposed by managed platforms/services as API calls. This frees developers from infrastructure concerns and lets them focus on the core business problems. It also means that developers are free to use the programming language best suited for their solution. Python has typically been the go-to language for AI/ML solutions, but there is more flexibility in this area.

In this post, you will see how to leverage the Go programming language to use vector databases and techniques such as Retrieval Augmented Generation (RAG) with langchaingo. If you are a Go developer who wants to learn how to build generative AI applications, you are in the right place! If you are looking for introductory content on using Go for AI/ML, feel free to check out my previous blogs and open-source projects in this space. First, let's take a step back and get some context before diving into the hands-on part of this post.

The Limitations of LLMs

Large Language Models (LLMs) and other foundation models have been trained on a large corpus of data, enabling them to perform well at many natural language processing (NLP) tasks. But one of their most important limitations is that most foundation models and LLMs use a static dataset, which often has a specific knowledge cut-off (say, January 2022). For example, if you were to ask about an event that took place after the cut-off date, the model would either fail to answer (which is fine) or, worse, confidently reply with an incorrect response — this is often referred to as hallucination. We also need to consider that LLMs only respond based on the data they were trained on, which limits their ability to accurately answer questions on topics that are either specialized or proprietary.
For instance, if I were to ask a question about a specific AWS service, the LLM may (or may not) be able to come up with an accurate response. Wouldn't it be nice if the LLM could use the official AWS service documentation as a reference?

RAG (Retrieval Augmented Generation) Helps Alleviate These Issues

RAG enhances LLMs by dynamically retrieving external information during the response generation process, thereby expanding the model's knowledge base beyond its original training data. RAG-based solutions incorporate a vector store that can be indexed and queried to retrieve the most recent and relevant information, thereby extending the LLM's knowledge beyond its training cut-off. When an LLM equipped with RAG needs to generate a response, it first queries a vector store to find relevant, up-to-date information related to the query. This process ensures that the model's outputs are not just based on its pre-existing knowledge but are augmented with the latest information, thereby improving the accuracy and relevance of its responses.

But RAG Is Not the Only Way

Although this post focuses solely on RAG, there are other ways to work around this problem, each with its pros and cons:

- Task-specific tuning: Fine-tuning large language models on specific tasks or datasets to improve their performance in those domains.
- Prompt engineering: Carefully designing input prompts to guide language models towards desired outputs, without requiring significant architectural changes.
- Few-shot and zero-shot learning: Techniques that enable language models to adapt to new tasks with limited or no additional training data.

Vector Store and Embeddings

I mentioned vector stores a few times in the last paragraph. These are nothing but databases that store and index vector embeddings, which are numerical representations of data such as text, images, or entities.
Embeddings help us go beyond basic search since they represent the semantic meaning of the source data — hence the term semantic search, which is a technique that understands the meaning and context of words to improve search accuracy and relevance. Vector databases can also store metadata, including references to the original data source (for example, the URL of a web document) of the embedding.

Thanks to generative AI technologies, there has also been an explosion in vector databases. These include established SQL and NoSQL databases that you may already be using in other parts of your architecture — such as PostgreSQL, Redis, MongoDB, and OpenSearch. But there are also databases that are custom-built for vector storage, such as Pinecone, Milvus, and Weaviate. Alright, let's go back to RAG...

What Does a Typical RAG Workflow Look Like?

At a high level, RAG-based solutions have the following workflow, often executed as a cohesive pipeline:

1. Retrieve data from a variety of external sources like documents, images, web URLs, databases, proprietary data sources, etc. This consists of sub-steps such as chunking, which involves splitting up large datasets (e.g., a 100 MB PDF file) into smaller parts (for indexing).
2. Create embeddings: use an embedding model to convert the data into numerical representations.
3. Store/index the embeddings in a vector store.
4. Ultimately, integrate this into a larger application where the contextual data (semantic search results) is provided to the LLM (along with the prompt).

End-To-End RAG Workflow in Action

Each of the workflow steps can be executed with different components. The ones used in this blog include:

- PostgreSQL: It will be used as a vector database, thanks to the pgvector extension. To keep things simple, we will run it in Docker.
- langchaingo: It is a Go port of the langchain framework. It provides plugins for various components, including vector stores.
We will use it for loading data from web URLs and indexing it in PostgreSQL.

- Text and embedding models: We will use the Amazon Bedrock Claude and Titan models (for text and embeddings, respectively) with langchaingo.
- Retrieval and app integration: langchaingo vector store (for semantic search) and chain (for RAG).

You will get a sense of how these individual pieces work. We will cover other variants of this architecture in subsequent blogs.

Before You Begin

Make sure you have:

- Go, Docker, and psql installed (e.g., using Homebrew if you're on a Mac)
- Amazon Bedrock access configured from your local machine (refer to this blog post for details)

Start PostgreSQL on Docker

There is a Docker image we can use!

```shell
docker run --name pgvector --rm -it -p 5432:5432 -e POSTGRES_USER=postgres -e POSTGRES_PASSWORD=postgres ankane/pgvector
```

Activate the pgvector extension by logging into PostgreSQL (using psql) from a different terminal:

```shell
# enter postgres when prompted for the password
psql -h localhost -U postgres -W
```

```sql
CREATE EXTENSION IF NOT EXISTS vector;
```

Load Data Into PostgreSQL (Vector Store)

Clone the project repository:

```shell
git clone https://github.com/build-on-aws/rag-golang-postgresql-langchain
cd rag-golang-postgresql-langchain
```

At this point, I am assuming that your local machine is configured to work with Amazon Bedrock. The first thing we will do is load data into PostgreSQL. In this case, we will use an existing web page as the source of information. I have used this developer guide — but feel free to use your own! Make sure to change the search query accordingly in the subsequent steps.
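Before running the loader, it helps to picture the chunking step mentioned earlier: splitting the source into overlapping pieces for indexing. Here is a minimal, illustrative Go sketch of fixed-size chunking with overlap (langchaingo's recursive character splitter, which the project actually uses, is more sophisticated and respects separators like paragraphs and sentences):

```go
package main

import "fmt"

// chunkText splits text into fixed-size chunks with a small overlap,
// so content cut at a boundary still appears intact in one chunk.
func chunkText(text string, size, overlap int) []string {
	if size <= overlap {
		panic("size must be greater than overlap")
	}
	var chunks []string
	runes := []rune(text)
	for start := 0; start < len(runes); start += size - overlap {
		end := start + size
		if end > len(runes) {
			end = len(runes)
		}
		chunks = append(chunks, string(runes[start:end]))
		if end == len(runes) {
			break
		}
	}
	return chunks
}

func main() {
	chunks := chunkText("abcdefghij", 4, 2)
	fmt.Println(chunks) // [abcd cdef efgh ghij]
}
```

Each chunk is later embedded and indexed individually, which is why the loader below reports a document count rather than a single document.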
```shell
export PG_HOST=localhost
export PG_USER=postgres
export PG_PASSWORD=postgres
export PG_DB=postgres

go run *.go -action=load -source=https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-general-nosql-design.html
```

You should get the following output:

```
loading data from https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-general-nosql-design.html
vector store ready - postgres://postgres:postgres@localhost:5432/postgres?sslmode=disable
no. of documents to be loaded 23
```

Give it a few seconds. Finally, you should see this output if all goes well:

```
data successfully loaded into vector store
```

To verify, go back to the psql terminal and check the tables:

```
\d
```

You should see a couple of tables — langchain_pg_collection and langchain_pg_embedding. These are created by langchaingo since we did not specify them explicitly (that's ok, it's convenient for getting started!). langchain_pg_collection contains the collection name, while langchain_pg_embedding stores the actual embeddings.

| Schema | Name                    | Type  | Owner    |
|--------|-------------------------|-------|----------|
| public | langchain_pg_collection | table | postgres |
| public | langchain_pg_embedding  | table | postgres |

You can introspect the tables:

```sql
select * from langchain_pg_collection;
select count(*) from langchain_pg_embedding;
select collection_id, document, uuid from langchain_pg_embedding LIMIT 1;
```

You will see 23 rows in the langchain_pg_embedding table, since that was the number of langchain documents that our web page source was split into (refer to the application logs above when you loaded the data).

A quick detour into how this works: the data loading implementation is in load.go, but let's look at how we access the vector store instance (in common.go):

```go
brc := bedrockruntime.NewFromConfig(cfg)

embeddingModel, err := bedrock.NewBedrock(bedrock.WithClient(brc), bedrock.WithModel(bedrock.ModelTitanEmbedG1))
//...
```
```go
store, err = pgvector.New(
    context.Background(),
    pgvector.WithConnectionURL(pgConnURL),
    pgvector.WithEmbedder(embeddingModel),
)
```

- pgvector.WithConnectionURL is where the connection information for the PostgreSQL instance is provided.
- pgvector.WithEmbedder is the interesting part, since this is where we can plug in the embedding model of our choice. langchaingo supports Amazon Bedrock embeddings; in this case, I have used the Amazon Bedrock Titan embedding model.

Back to the loading process in load.go. We first get the data in the form of a slice of schema.Document (getDocs function), using the langchaingo built-in HTML loader for this:

```go
docs, err := documentloaders.NewHTML(resp.Body).LoadAndSplit(context.Background(), textsplitter.NewRecursiveCharacter())
```

Then, we load it into PostgreSQL. Instead of writing everything ourselves, we can use the langchaingo vector store abstraction and its high-level AddDocuments function:

```go
_, err = store.AddDocuments(context.Background(), docs)
```

Great. We have set up a simple pipeline to fetch and ingest data into PostgreSQL. Let's make use of it!

Execute Semantic Search

Let's ask a question. I am going with "What tools can I use to design dynamodb data models?", which is relevant to this document which I used as the data source — feel free to tune it as per your scenario.

```shell
export PG_HOST=localhost
export PG_USER=postgres
export PG_PASSWORD=postgres
export PG_DB=postgres

go run *.go -action=semantic_search -query="what tools can I use to design dynamodb data models?" -maxResults=3
```

You should see a similar output — note that we opted to output a maximum of three results (you can change it):

```
vector store ready

============== similarity search results ==============
similarity search info - can build new data models from, or design models based on, existing data models that satisfy your application's data access patterns. You can also import and export the designed data model at the end of the process. For more information, see Building data models with NoSQL Workbench
similarity search score - 0.3141409
============================
similarity search info - NoSQL Workbench for DynamoDB is a cross-platform, client-side GUI application that you can use for modern database development and operations. It's available for Windows, macOS, and Linux. NoSQL Workbench is a visual development tool that provides data modeling, data visualization, sample data generation, and query development features to help you design, create, query, and manage DynamoDB tables. With NoSQL Workbench for DynamoDB, you
similarity search score - 0.3186116
============================
similarity search info - key-value pairs or document storage. When you switch from a relational database management system to a NoSQL database system like DynamoDB, it's important to understand the key differences and specific design approaches.TopicsDifferences between relational data design and NoSQLTwo key concepts for NoSQL designApproaching NoSQL designNoSQL Workbench for DynamoDB Differences between relational data design and NoSQL
similarity search score - 0.3275382
============================
```

Now, what you see here are the top three results (thanks to -maxResults=3). Note that this is not an answer to our question. These are the results from our vector store that are semantically close to the query — the keyword here is semantic. Thanks to the vector store abstraction in langchaingo, we were able to easily ingest our source data into PostgreSQL and use the SimilaritySearch function to get the top N results corresponding to our query (see the semanticSearch function in query.go).

Note that (at the time of writing) the pgvector implementation in langchaingo uses the cosine distance vector operation, but pgvector also supports L2 and inner product; for details, refer to the pgvector documentation.
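Cosine distance, one minus the cosine similarity of two vectors, is easy to compute directly. This small Go sketch illustrates what the vector store does when ranking results; the vectors here are toy values, not real embeddings:

```go
package main

import (
	"fmt"
	"math"
)

// cosineDistance returns 1 - cosine similarity of two equal-length vectors.
// Smaller values mean the vectors (and hence the texts they embed) are
// more semantically similar.
func cosineDistance(a, b []float64) float64 {
	var dot, normA, normB float64
	for i := range a {
		dot += a[i] * b[i]
		normA += a[i] * a[i]
		normB += b[i] * b[i]
	}
	return 1 - dot/(math.Sqrt(normA)*math.Sqrt(normB))
}

func main() {
	query := []float64{1, 0, 0}
	docA := []float64{2, 0, 0} // same direction -> distance 0
	docB := []float64{0, 1, 0} // orthogonal -> distance 1
	fmt.Println(cosineDistance(query, docA)) // 0
	fmt.Println(cosineDistance(query, docB)) // 1
}
```

The "similarity search score" values in the output above are distances of this kind: the lower the score, the closer the match.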
Ok, so far we have:

1. Loaded vector data
2. Executed semantic search

This is the stepping stone to RAG (Retrieval Augmented Generation): let's see it in action!

Intelligent Search With RAG

To execute a RAG-based search, we run the same command as above (almost), only with a slight change in the action (rag_search):

```shell
export PG_HOST=localhost
export PG_USER=postgres
export PG_PASSWORD=postgres
export PG_DB=postgres

go run *.go -action=rag_search -query="what tools can I use to design dynamodb data models?" -maxResults=3
```

Here is the output I got (it might be slightly different in your case):

```
Based on the context provided, the NoSQL Workbench for DynamoDB is a tool that can be used to design DynamoDB data models. Some key points about NoSQL Workbench for DynamoDB:

- It is a cross-platform GUI application available for Windows, macOS, and Linux.
- It provides data modeling capabilities to help design and create DynamoDB tables.
- It allows you to build new data models or design models based on existing data models.
- It provides features like data visualization, sample data generation, and query development to manage DynamoDB tables.
- It helps in understanding the key differences and design approaches when moving from a relational database to a NoSQL database like DynamoDB.

So in summary, NoSQL Workbench for DynamoDB seems to be a useful tool specifically designed for modeling and working with DynamoDB data models.
```

As you can see, the result is not just about "Here are the top X responses for your query." Instead, it's a well-formulated response to the question. Let's peek behind the scenes again to see how it works. Unlike ingestion and semantic search, RAG-based search is not directly exposed by the langchaingo vector store implementation.
For this, we use a langchaingo chain, which takes care of the following:

1. Invokes semantic search
2. Combines the semantic search results with a prompt
3. Sends it to a Large Language Model (LLM), which in this case happens to be Claude on Amazon Bedrock

Here is what the chain looks like (refer to the function ragSearch in query.go):

```go
result, err := chains.Run(
    context.Background(),
    chains.NewRetrievalQAFromLLM(
        llm,
        vectorstores.ToRetriever(store, numOfResults),
    ),
    question,
    chains.WithMaxTokens(2048),
)
```

Let's Try Another One

This was just one example. I tried a different question and increased maxResults to 10, which means that the top 10 results from the vector database will be used to formulate the answer.

```shell
go run *.go -action=rag_search -query="how is NoSQL different from SQL?" -maxResults=10
```

The result (again, it might be different for you):

```
Based on the provided context, there are a few key differences between NoSQL databases like DynamoDB and relational database management systems (RDBMS):

1. Data Modeling:
- In RDBMS, data modeling is focused on flexibility and normalization without worrying much about performance implications. Query optimization doesn't significantly affect schema design.
- In NoSQL, data modeling is driven by the specific queries and access patterns required by the application. The data schema is designed to optimize the most common and important queries for speed and scalability.

2. Data Organization:
- RDBMS organizes data into tables with rows and columns, allowing flexible querying.
- NoSQL databases like DynamoDB use key-value pairs or document storage, where data is organized in a way that matches the queried data shape, improving query performance.

3. Query Patterns:
- In RDBMS, data can be queried flexibly, but queries can be relatively expensive and don't scale well for high-traffic situations.
- In NoSQL, data can be queried efficiently in a limited number of ways defined by the data model, while other queries may be expensive and slow.

4. Data Distribution:
- NoSQL databases like DynamoDB distribute data across partitions to scale horizontally, and the data keys are designed to evenly distribute the traffic across partitions, avoiding hot spots.
- The concept of "locality of reference," keeping related data together, is crucial for improving performance and reducing costs in NoSQL databases.

In summary, NoSQL databases prioritize specific query patterns and scalability over flexible querying, and the data modeling is tailored to these requirements, in contrast with RDBMS, where data modeling focuses on normalization and flexibility.
```

Where to "Go" From Here?

Learning by doing is a good approach. If you've followed along and executed the application thus far, great! I recommend you try out the following:

- langchaingo has support for lots of different models, including ones in Amazon Bedrock (e.g., Meta Llama 2, Cohere, etc.) — try tweaking the model and see if it makes a difference. Is the output better?
- What about the vector database? I demonstrated PostgreSQL, but langchaingo supports others as well (including OpenSearch, Chroma, etc.) - try swapping out the vector store and see how/if the search results differ.
- You probably get the gist, but you can also try out different embedding models. We used Amazon Titan, but langchaingo also supports many others, including the Cohere embed models in Amazon Bedrock.

Wrap Up

This was a simple example to help you better understand the individual steps in building RAG-based solutions. These might change a bit depending on the implementation, but the high-level ideas remain the same. I used langchaingo as the framework, but this doesn't mean you always have to use one. You could also remove the abstractions and call the LLM platforms' APIs directly if you need granular control in your applications or if the framework does not meet your requirements.
Like most of generative AI, this area is rapidly evolving, and I am optimistic that Go developers will have more and more options to build generative AI solutions. If you have feedback or questions, or you would like me to cover something else around this topic, feel free to comment below! Happy building!
Editor's Note: The following is an article written for and published in DZone's 2024 Trend Report, Enterprise AI: The Emerging Landscape of Knowledge Engineering.

Artificial intelligence (AI) has evolved from a futuristic idea into a fundamental aspect of contemporary software development. This evolution has introduced significant milestones, reshaping both our interactions with technology and the methodologies of software creation. This article delves into AI's impact on the realm of software development, highlighting how professionals can adapt to and thrive amidst these transformative changes.

Positive Impacts of AI on Developers' Jobs

AI excels in automating repetitive tasks, ranging from code generation to intricate testing and deployment procedures. Tools like Jenkins and Azure DevOps streamline deployments, enhancing reliability and efficiency, while AI-driven IDEs provide real-time code analysis and bug detection, elevating coding precision and speed. In addition, the advent of AI-assisted tools marks a significant advancement, improving not only coding but also project management.

Negative Impacts of AI on Developers' Jobs

Despite AI's benefits, there's apprehension over job displacement, with predictions suggesting a significant portion of programming roles may become automated. Additionally, the sophistication of AI systems introduces complexity and necessitates a higher level of expertise, potentially sidelining those without specialized knowledge in AI and machine learning (ML). Some AI tools are now capable of generating complex code structures, which may reduce the need for entry-level programming jobs. According to researchers from OpenAI and the University of Pennsylvania, it is predicted that 80% of the U.S. workforce could see an effect on at least 10% of their tasks. Furthermore, as AI systems become more sophisticated, the complexity in understanding and maintaining these systems increases.
For example, the development and maintenance of AI models in platforms like Google's TensorFlow or OpenAI's GPT-3 require specialized knowledge in ML, which is a skill set not all developers possess. Lastly, a heavy reliance on AI tools can lead to a scenario where developers may lack a deep understanding of the underlying code, leading to challenges in troubleshooting and customization.

The Challenge of Staying Up to Date

The fast-paced nature of AI advancements means that tools and techniques can quickly become outdated. For instance, ML frameworks are continuously updated, requiring developers to constantly learn new methodologies. This was evident when TensorFlow 2.0 was released with significant changes from its predecessor, requiring developers to adapt quickly. The need for continuous learning can be overwhelming, especially for developers who are already managing a full workload. The pace of change can lead to skill gaps, as seen in industries like finance and healthcare, where the adoption of AI has outpaced the workforce's ability to keep up with new technologies.

Balancing AI and Human Skills in Development

While AI is unparalleled in its ability to sift through and analyze extensive datasets, it's the human element — creativity, intuition, and ethical foresight — that propels truly innovative solutions. The realm of video gaming serves as a prime example of innovation through creativity, where AI assists in crafting intricate environments and behaviors. Yet it's the human touch that weaves the captivating storylines, character arcs, and the overall design, reflecting a deep understanding of narrative and emotional engagement. Finding the balance for ethical considerations and decision-making is imperative. Particularly in healthcare, AI's capacity to sift through patient data and recommend treatments is revolutionary.
However, it's the human practitioner's role to weigh these suggestions within an ethical framework and make the final call on patient care, ensuring that technology serves humanity's best interests.

AI: A Collaborative Companion, Not a Competitor

Viewing AI as an ally in the development process is crucial for leveraging its full potential without undermining the value of human expertise. For example:

- In cybersecurity, AI's efficiency in identifying threats is invaluable. Nonetheless, it's the human expert's critical thinking and contextual judgment that are irreplaceable in formulating an appropriate response to these threats.
- The advent of collaborative robots (cobots) in manufacturing illustrates the harmonious blend of AI's precision with human dexterity and adaptability, enhancing productivity and safety.

The Symbiotic Relationship Between AI and Human Intelligence

A collaboration between human intelligence and AI's capabilities offers a balanced approach to solving complex challenges, leveraging the strengths of both. In financial sectors, AI excels in processing and analyzing market data to unearth trends. Yet it's the nuanced interpretation and strategic decision-making by humans, considering broader economic and geopolitical factors, that drive impactful outcomes. Leading tech firms, including Google and IBM, underscore the necessity of human oversight in AI's evolution. This ensures that AI technologies not only advance in capabilities but also align with ethical standards and human values, fostering a tech ecosystem that respects and enhances human dignity and welfare. The integration of AI in software development is not about displacing human roles but enriching them. By valuing the unique contributions of human creativity, ethical judgment, and strategic thinking alongside AI's analytical prowess, we pave the way for a future where technology amplifies human potential, driving forward innovation in a manner that is both ethical and impactful.
Leveraging AI for Innovation

The role of AI in software development transcends mere efficiency improvements, acting as a pivotal force for innovation. AI empowers developers to extend the realms of feasibility, facilitating the creation of software solutions that are more advanced, intuitive, and impactful.

AI-Driven Creative Problem-Solving

AI's unparalleled data processing and analysis capabilities unlock novel approaches for creative problem-solving within software development. Take, for example, predictive analytics for enhanced consumer insights. In the e-commerce domain, AI algorithms predict consumer behavior, allowing businesses to customize their offerings. A notable illustration is Amazon's recommendation system, which leverages AI to analyze consumer interactions and tailor shopping experiences accordingly. Additionally, AI has significantly advanced natural language processing (NLP), enabling the development of user interfaces that mimic human conversation. Siri by Apple exemplifies this, utilizing NLP to interpret and respond to user inquiries in a conversational manner.

Pioneering New Software Solutions With AI

AI's application spans a diverse array of industries, driving the development of innovative software solutions. AI plays a crucial role in healthcare by enabling the early detection of diseases and personalizing medical treatments. Google's DeepMind, for instance, has developed algorithms capable of identifying eye diseases from retinal scans, marking a significant leap forward in medical diagnostics. In the fintech sector, AI-driven algorithms offer automated trading systems that meticulously analyze market data to execute trades strategically, optimizing financial outcomes.

Illustrative Case Studies of AI in Action

The integration of AI in real-world development projects showcases its potential to redefine industry standards. Table 1.
Case studies of AI in action Sector Example Automotive Tesla's Autopilot system exemplifies AI's capacity to innovate, employing ML to interpret sensor data for autonomous driving decisions. This represents a harmonious blend of AI's analytical prowess with advanced software engineering techniques. Entertainment Netflix leverages AI for content recommendation and optimization, analyzing viewer preferences to personalize content and guide original production decisions. This not only enhances the user experience but also optimizes content creation strategies. Retail operations Walmart's application of AI in managing inventory and enhancing customer service demonstrates its transformative impact. AI enables Walmart to adjust stock levels dynamically and offer personalized shopping experiences, showcasing the broad applicability and potential of AI across different market segments. Overcoming Challenges in AI Adoption The journey toward integrating AI into software development is fraught with unique challenges. Addressing these effectively demands a strategic focus on education, skill acquisition, and adherence to ethical standards. Bridging the Skills Divide Through Education and Training The swift evolution of AI technologies has precipitated a notable skills gap within the industry, necessitating a concerted effort toward continuous education and specialized training. This commitment to education may encompass engaging in specialized online courses, participating in workshops, and becoming actively involved in AI development communities to stay abreast of the latest trends and tools. Giants like IBM and Microsoft have forged alliances with academic institutions, offering AI and machine learning courses and certifications. These initiatives aim to arm developers with the expertise needed to harness AI technologies effectively. 
Meanwhile, Google has set a precedent with its internal AI training programs, ensuring its workforce remains at the forefront of AI advancements by familiarizing them with the latest tools and methodologies. The future will demand that developers blend AI proficiency with a broad spectrum of skills, including ethical considerations in AI, data science, and specialized industry knowledge. This holistic skill set will enable developers to leverage AI effectively across various application domains.

Simplifying AI Adoption Through Accessible Tools and Resources

The intricacies of AI tools and frameworks present a significant hurdle, particularly for newcomers to the field. Mastery over these technologies necessitates a considerable investment of time and resources. Efforts by companies with platforms such as Amazon SageMaker exemplify the industry's move toward simplifying AI application development. These platforms streamline the process of building, training, and deploying machine learning models, making AI more accessible. The open-source ecosystem also plays a pivotal role in democratizing AI adoption. Tools like TensorFlow and PyTorch are bolstered by extensive documentation and a supportive community, facilitating a smoother learning curve for developers.

Upholding Data Privacy and Security

In an era where AI systems frequently handle sensitive data, ensuring privacy and security is imperative. Adhering to stringent regulations such as GDPR and HIPAA is non-negotiable. IBM's AI ethics guidelines offer a blueprint for crafting AI solutions that honor privacy and security principles. The healthcare industry exemplifies the critical importance of data privacy, too. Firms like Epic Systems have integrated AI into their offerings while strictly complying with patient privacy regulations, setting a standard for ethical AI deployment.
Overcoming the hurdles associated with AI adoption in software development is an endeavor that extends beyond mere technical implementation. It encompasses a holistic approach involving educational outreach, simplification of technological complexities, and a steadfast commitment to ethical practices. By addressing these facets, the industry can pave the way for a future where AI augments development processes in a manner that is both responsible and inclusive.

The Future of AI in Development

The trajectory of AI in software development is set toward groundbreaking shifts, fueled by relentless technological advancements and broader AI integration across diverse sectors. This forward-looking perspective offers insights into potential developments and the opportunities they may unveil.

Emerging AI Trends and Future Directions

As AI becomes increasingly entrenched in software development, we stand on the cusp of significant innovations. AI-powered code analysis platforms illustrate the future of AI in enhancing code quality: these tools are set to extend beyond mere error detection to offer actionable recommendations for optimization, potentially setting new standards for coding efficiency and robustness. And in an era of evolving cyber threats, AI's capacity to preemptively identify and mitigate security risks will be indispensable. Future AI systems are expected to proactively counteract threats, offering a dynamic shield against cyber vulnerabilities.

The future of AI in software development is not merely an extension of its current state but a revolution in how we conceive, develop, and optimize software. As we look ahead, the integration of AI promises not only to streamline development processes but also to inspire innovations that were previously unimaginable. The key to thriving in this evolving landscape lies in embracing continuous learning and interdisciplinary expertise, ensuring developers remain at the forefront of this technological renaissance.
Conclusion

The integration of AI in software development marks a transformative era, bringing both unparalleled opportunities and significant challenges. As innovative, AI-driven solutions reshape the development landscape, it becomes imperative for developers to commit to continuous education in order to balance AI's advanced capabilities with the irreplaceable nuances of human creativity and ethical judgment. Embracing this AI-centric future means not just leveraging its power for efficiency and innovation, but also navigating its complexities with a focus on sustainable and responsible development. Ultimately, the synergy between human intellect and artificial intelligence will define the next frontier in software development, leading to a more efficient, creative, and ethically grounded technological future.

This is an excerpt from DZone's 2024 Trend Report, Enterprise AI: The Emerging Landscape of Knowledge Engineering.
Editor's Note: The following is an article written for and published in DZone's 2024 Trend Report, Enterprise AI: The Emerging Landscape of Knowledge Engineering.

AI continues to transform businesses, but this leads to enterprises facing new challenges in terms of digital transformation and organizational changes. Based on a 2023 Forbes report, those challenges can be summarized as follows:

- Companies whose analytical tech stacks are built around analytical/batch workloads need to start adapting to real-time data processing (Forbes). This change affects not only the way the data is collected, but it also leads to the need for new data processing and data analytics architectural models.
- AI regulations need to be considered as part of AI/ML architectural models. According to Forbes, "Gartner predicts that by 2025, regulations will force companies to focus on AI ethics, transparency, and privacy." Hence, those platforms will need to comply with upcoming standards.
- Specialized AI teams must be built, and they should be capable of not only building and maintaining AI platforms but also collaborating with other teams to support models' lifecycles through those platforms.

The answer to these new challenges seems to be MLOps, or machine learning operations. MLOps builds on top of DevOps and DataOps as an attempt to facilitate the development of machine learning (ML) applications and better manage the complexity of ML systems. The goal of this article is to provide a systematic overview of MLOps architectural challenges and demonstrate ways to manage that complexity.

MLOps Application: Setting Up the Use Case

For this article, our example use case is a financial institution that has been conducting macroeconomic forecasting and investment risk management for years. Currently, the forecasting process is based on partially manual loading and postprocessing of external macroeconomic data, followed by statistical modeling using various tools and scripts based on personal preferences.
However, according to the institution's management, this process is not acceptable due to recently announced banking regulations and security requirements. In addition, the delivery of calculated results is too slow and financially not acceptable compared to competitors in the market. Investment in a new digital solution requires a good understanding of the complexity and the expected cost. It should start with gathering requirements and subsequently building a minimum viable product.

Requirements Gathering

For solution architects, the design process starts with a specification of the problems that the new architecture needs to solve, for example:

- Manual data collection is slow, error prone, and requires a lot of effort
- Real-time data processing is not part of the current data loading approach
- There is no data versioning and, hence, reproducibility is not supported over time
- The model's code is triggered manually on local machines and constantly updated without versioning
- Data and code sharing via a common platform is completely missing
- The forecasting process is not represented as a business process; all the steps are distributed and unsynchronized, and most of them require manual effort
- Experiments with the data and models are not reproducible and not auditable
- Scalability is not supported in case of increased memory consumption or CPU-heavy operations
- Monitoring and auditing of the whole process are currently not supported

The following diagram demonstrates the four main components of the new architecture: monitoring and auditing platform, model deployment platform, model development platform, and data management platform.

Figure 1. MLOps architecture diagram

Platform Design Decisions

The two main strategies to consider when designing an MLOps platform are:

- Developing from scratch vs. selecting a platform
- Choosing between a cloud-based, on-premises, or hybrid model

Developing From Scratch vs. Choosing a Fully Packaged MLOps Platform

Building an MLOps platform from scratch is the most flexible solution. It would provide the possibility to solve any future needs of the company without depending on other companies and service providers. It would be a good choice if the company already has the required specialists and trained teams to design and build an ML platform. A prepackaged solution would be a good option to model a standard ML process that does not need many customizations. One option would even be to buy a pretrained model (e.g., model as a service), if available on the market, and build only the data loading, monitoring, and tracking modules around it. The disadvantage of this type of solution is that if new features need to be added, it might be hard to achieve those additions on time. Buying a platform as a black box often requires building additional components around it. An important criterion to consider when choosing a platform is the possibility to extend or customize it.

Cloud-Based, On-Premises, or Hybrid Deployment Model

Cloud-based solutions are already on the market, with popular options provided by AWS, Google, and Azure. In the case of no strict data privacy requirements and regulations, cloud-based solutions are a good choice due to the unlimited infrastructural resources for model training and model serving. An on-premises solution would be acceptable for very strict security requirements or if the infrastructure is already available within the company. The hybrid solution is an option for companies that already have part of the systems built but want to extend them with additional services, e.g., to buy a pretrained model and integrate it with the locally stored data or incorporate it into an existing business process model.
MLOps Architecture in Practice

The financial institution from our use case does not have enough specialists to build a professional MLOps platform from scratch, but it also does not want to invest in an end-to-end managed MLOps platform due to regulations and additional financial restrictions. The institution's architectural board has decided to adopt an open-source approach and buy tools only when needed. The architectural concept is built around the idea of developing minimalistic components and a composable system. The general idea is built around microservices covering nonfunctional requirements like scalability and availability. Striving for maximal simplicity of the system, the following decisions for the system components were made.

Data Management Platform

The data collection process will be fully automated. There will be a separate data loading component for each data source due to the heterogeneity of external data providers. The database choice is crucial when it comes to writing real-time data and reading large amounts of data. Due to the time-based nature of the macroeconomic data and the institution's already available relational database specialists, they chose the open-source database TimescaleDB. The possibility to provide a standard SQL-based API, perform data analytics, and conduct data transformations using standard relational database GUI clients will decrease the time to deliver a first prototype of the platform. Data versions and transformations can be tracked and saved into separate data versions or tables.

Model Development Platform

The model development process consists of four steps:

1. Data reading and transformation
2. Model training
3. Model serialization
4. Model packaging

Once the model is trained, the parametrized and trained instance is usually stored as a packaged artifact. The most common solution for code storage and versioning is Git.
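The four steps above can be sketched end to end in plain Python. This is a minimal illustration, not the institution's actual pipeline: the indicator data, the least-squares model, and the version string are all invented for the example.

```python
import pickle
import statistics

def read_and_transform(raw_rows):
    # Step 1: data reading and transformation - drop rows with missing values
    return [(x, y) for x, y in raw_rows if x is not None and y is not None]

def train(points):
    # Step 2: model training - ordinary least squares on a single indicator
    xs = [x for x, _ in points]
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(y for _, y in points)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in points)
             / sum((x - mean_x) ** 2 for x in xs))
    return {"slope": slope, "intercept": mean_y - slope * mean_x}

def serialize(model):
    # Step 3: model serialization - turn the trained instance into bytes
    return pickle.dumps(model)

def package(artifact, version):
    # Step 4: model packaging - pair the artifact with release metadata
    return {"artifact": artifact, "version": version}

raw = [(1, 2.1), (2, 3.9), (None, 5.0), (3, 6.2)]
model = train(read_and_transform(raw))
release = package(serialize(model), "1.0.0")
```

In the prototype described here, steps 2 through 4 would run inside the Git-based pipeline, and the packaged artifact would then be pushed to whichever model store the team settles on.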
Furthermore, the financial institution is already equipped with a solution like GitHub, providing functionality to define pipelines for building, packaging, and publishing the code. The architecture of Git-based systems usually relies on a set of distributed worker machines executing the pipelines. That option will be used as part of the minimalistic MLOps architectural prototype to also train the model. After training a model, the next step is to store it in a model repository as a released and versioned artifact. Storing the model in a database as a binary file, a shared file system, or even an artifacts repository are all acceptable options at that stage. Later, a model registry or a blob storage service could be incorporated into the pipeline. A model's API microservice will expose the model's functionality for macroeconomic projections.

Model Deployment Platform

The decision to keep the MLOps prototype as simple as possible applies to the deployment phase as well. The deployment model is based on a microservices architecture. Each model can be deployed using a Docker container as a stateless service and be scaled on demand. That principle applies to the data loading components, too. Once that first deployment step is achieved and the dependencies of all the microservices are clarified, a workflow engine might be needed for orchestrating the established business processes.

Model Monitoring and Auditing Platform

Traditional microservices architectures are already equipped with tools for gathering, storing, and monitoring log data. Tools like Prometheus, Kibana, and Elasticsearch are flexible enough for producing specific auditing and performance reports.

Open-Source MLOps Platforms

A minimalistic MLOps architecture is a good start for the initial digital transformation of a company. However, keeping track of available MLOps tools in parallel is crucial for the next design phase. The following table provides a summary of some of the most popular open-source tools.
Table 1. Open-source MLOps tools for initial digital transformations

- Kubeflow – Makes deployments of ML workflows on Kubernetes simple, portable, and scalable. Functional areas: tracking and versioning, pipeline orchestration, and model deployment.
- MLflow – An open-source platform for managing the end-to-end ML lifecycle. Functional areas: tracking and versioning.
- BentoML – An open standard and SDK for AI apps and inference pipelines; provides features like auto-generation of API servers, REST APIs, gRPC, and long-running inference jobs; and offers auto-generation of Docker container images. Functional areas: tracking and versioning, pipeline orchestration, model development, and model deployment.
- TensorFlow Extended (TFX) – A production-ready platform designed for deploying and managing ML pipelines; includes components for data validation, transformation, model analysis, and serving. Functional areas: model development, pipeline orchestration, and model deployment.
- Apache Airflow, Apache Beam – Flexible frameworks for defining and scheduling complex workflows, data workflows in particular, including ML. Functional areas: pipeline orchestration.

Summary

MLOps is often called DevOps for machine learning, and it is essentially a set of architectural patterns for ML applications. However, despite the similarities with many well-known architectures, the MLOps approach brings some new challenges for MLOps architects. On one side, the focus must be on the compatibility and composition of MLOps services. On the other side, AI regulations will force existing systems and services to constantly adapt to new regulatory rules and standards. I suspect that as the MLOps field continues to evolve, a new type of service providing AI ethics and regulatory analytics will soon become the focus of businesses in the ML domain.
In today's digital age, data has become the cornerstone of decision-making across various domains, from business and healthcare to education and government. The ability to collect, analyze, and derive insights from data has transformed how organizations operate, offering unprecedented opportunities for innovation, efficiency, and growth.

What Is a Data-Driven Approach?

A data-driven approach is a methodology that relies on data analysis and interpretation to guide decision-making and strategy development. This approach encompasses a range of techniques, including data collection, storage, analysis, visualization, and interpretation, all aimed at harnessing the power of data to drive organizational success. Key principles include:

- Data collection – Gathering relevant data from diverse sources is foundational to ensuring its quality and relevance for subsequent analysis.
- Data analysis – Processing and analyzing collected data using statistical and machine learning (ML) techniques reveals valuable insights for informed decision-making.
- Data visualization – Representing insights visually through charts and graphs facilitates understanding and aids decision-makers in recognizing trends and patterns.
- Data-driven decision-making – Integrating data insights into decision-making processes across all levels of an organization enhances risk management and process optimization.
- Continuous improvement – Embracing a culture of ongoing data collection, analysis, and action fosters innovation and adaptation to changing environments.

Data Integration Strategies Using AI

Data integration combines data from various sources for a unified view. Artificial intelligence (AI) improves integration by automating tasks, boosting accuracy, and managing diverse data volumes.
Here are the top four data integration strategies/patterns using AI:

- Automated data matching and merging – AI algorithms, such as ML and natural language processing (NLP), can match and automatically merge data from disparate sources.
- Real-time data integration – AI technologies, such as stream processing and event-driven architectures, can facilitate real-time data integration by continuously ingesting, processing, and integrating data as it becomes available.
- Schema mapping and transformation – AI-driven tools can automate the process of mapping and transforming data schemas from different formats or structures. This includes converting data between relational databases, NoSQL databases, and other data formats, plus handling schema evolution over time.
- Knowledge graphs and graph-based integration – AI can build and query knowledge graphs representing relationships between entities and concepts. Knowledge graphs enable flexible and semantic-driven data integration by capturing rich contextual information and supporting complex queries across heterogeneous data sources.

Data integration is the backbone of modern data management strategies, which are pivotal in providing organizations with a comprehensive understanding of their data landscape. Data integration ensures a cohesive and unified view of organizational data assets by seamlessly combining data from disparate sources, such as databases, applications, and systems. One of the primary benefits of data integration is its ability to enhance data quality. By consolidating data from multiple sources, organizations can identify and rectify inconsistencies, errors, and redundancies, thus improving their data's accuracy and reliability. This, in turn, empowers decision-makers to make informed choices based on trustworthy information. Let's look closely at how we can utilize generative AI for data-related processes.
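As a toy illustration of the first pattern above, the sketch below fuzzy-matches records from two sources on a name field and merges the hits. A real AI-driven pipeline would use learned embeddings or an NLP model rather than plain string similarity; the record fields and the 0.8 threshold here are invented for the example.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    # Normalized string similarity in [0, 1], ignoring case and padding
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def match_and_merge(source_a, source_b, threshold=0.8):
    # Pair each record in source_a with its closest match in source_b,
    # then merge the two records when the match is confident enough.
    merged = []
    for rec_a in source_a:
        best = max(source_b, key=lambda rec_b: similarity(rec_a["name"], rec_b["name"]))
        if similarity(rec_a["name"], best["name"]) >= threshold:
            merged.append({**best, **rec_a})  # rec_a wins on conflicting fields
    return merged

crm = [{"name": "Acme Corp.", "city": "Berlin"}]
billing = [{"name": "ACME Corp", "vat": "DE123"}, {"name": "Globex", "vat": "DE999"}]
unified = match_and_merge(crm, billing)  # one merged Acme record
```

The merge policy (`rec_a` overriding `best`) is a deliberate design choice: when sources disagree on a shared field, one system of record has to win.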
Exploring the Impact of Generative AI on Data-Related Processes

Generative AI has revolutionized various industries and data-related processes in recent years. Generative AI encompasses a wide array of methodologies, spanning generative adversarial networks (GANs) and variational autoencoders (VAEs) to transformer-based models such as GPT (generative pre-trained transformer). These algorithms showcase impressive abilities in producing lifelike images, text, audio, and even videos, which closely emulate human creativity through generating fresh data samples.

Using Generative AI for Enhanced Data Integration

Now, we've come to the practical part of the role of generative AI in enhanced data integration. Below, I've provided some real-world scenarios. This will bring more clarity to AI's role in data integration.

Table 1. Real-world use cases

- Healthcare/image recognition – Generating synthetic medical images for data augmentation in deep learning models: using GANs to create realistic medical images, supplementing limited training data, enhancing the performance of image recognition algorithms, and facilitating tasks like disease diagnosis and medical imaging analysis.
- E-commerce – Automating schema mapping and transformation for product catalog integration: leveraging generative AI techniques, automatically aligning product attributes and specifications from various vendors, creating a unified schema, facilitating seamless integration of product catalogs, and enhancing the shopping experience for customers on e-commerce platforms.
- Social media – Utilizing NLP models to extract metadata from user-generated content: analyzing text-based content, including social media posts or comments; extracting valuable metadata such as sentiment, topic, and user preferences; integrating extracted metadata into recommendation systems; personalizing content delivery based on user preferences; and enhancing user engagement on social media platforms through personalized recommendations.
- Cybersecurity – Using generative AI to detect network traffic anomalies: training on synthetic data resembling real-world patterns, enhancing cybersecurity against threats, and improving intrusion detection and response.
- Financial services – Integrating diverse market data in real time: using generative AI to aggregate data from various sources, enabling informed decisions and trade execution, continuously updating strategies for changing market conditions, and improving investment outcomes and risk management.

Ensuring Data Accuracy and Consistency Using AI and ML

Organizations struggle to maintain accurate and reliable data in today's data-driven world. AI and ML help detect anomalies, identify errors, and automate cleaning processes. Let's look into those patterns a bit closer.

Validation and Data Cleansing

Data validation and cleansing are often laborious tasks, requiring significant time and resources. AI-powered tools streamline and speed up these processes. ML algorithms learn from past data to automatically identify and fix common quality issues. They can standardize formats, fill in missing values, and reconcile inconsistencies. Automating these tasks reduces errors and speeds up data preparation.

Uncovering Patterns and Insights

AI and ML algorithms can uncover hidden patterns, trends, and correlations within datasets. By analyzing vast amounts of data, these algorithms can identify relationships that may not be apparent to human analysts. AI and ML can also understand the underlying causes of data quality issues and develop strategies to address them. For example, ML algorithms can identify common errors or patterns contributing to data inconsistencies. Organizations can then implement new processes to improve data collection, enhance data entry guidelines, or identify employee training needs.
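The cleansing tasks described above, standardizing formats and filling in missing values, can be sketched with plain rules. This is a hand-rolled stand-in: ML-powered tools would learn such rules from past data, and the date formats and median-imputation choice here are illustrative assumptions.

```python
import statistics
from datetime import datetime

def standardize_date(value):
    # Reconcile the formats seen across sources into a single ISO form.
    for fmt in ("%Y-%m-%d", "%d/%m/%Y", "%b %d, %Y"):
        try:
            return datetime.strptime(value, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None  # unrecognized format, flag for manual review

def fill_missing(values):
    # Impute missing numeric readings with the median of the observed ones.
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]

dates = [standardize_date(d) for d in ["2024-01-05", "05/01/2024", "Jan 5, 2024"]]
amounts = fill_missing([10.0, None, 14.0, 12.0])
```

All three date spellings normalize to the same ISO string, and the gap in the numeric series is filled with the median of the observed values.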
Detecting Anomalies in Data

ML models excel at detecting patterns, including deviations from norms. With ML, organizations can analyze large volumes of data, compare them against established patterns, and flag potential issues. Organizations can then identify anomalies and determine how to correct, update, or augment their data to ensure its integrity. Let's have a look at services that can validate data and detect anomalies.

Detecting Anomalies Using Stream Analytics

Azure Stream Analytics, AWS Kinesis, and Google Cloud Dataflow are examples of tools that provide built-in anomaly detection capabilities, both in the cloud and at the edge, enabling vendor-neutral solutions. These platforms offer various functions and operators for anomaly detection, allowing users to monitor anomalies, including temporary and persistent ones. For example, based on my experience building validation using Stream Analytics, here are several key points to consider:

- The model's accuracy improves with more data in the sliding window; data within the timeframe is treated as expected.
- It focuses on event history in the window to spot anomalies, discarding old values as the window moves.
- Functions establish a normal baseline by comparing past data and identifying outliers within a confidence level.
- Set the window size based on the minimum number of events needed for practical training.
- Response time increases with history size, so include only the necessary events for better performance.

Based on ML, you can monitor temporary anomalies like spikes and dips in a time series event stream using the AnomalyDetection_SpikeAndDip operator.
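As a rough illustration of the sliding-window mechanics, the sketch below scores each event against a window of recent history using a z-score. The window size, minimum-history rule, and threshold are invented for the example and are not the operator's actual algorithm, which is configured via a confidence level and window duration instead.

```python
import statistics
from collections import deque

def detect_spikes_and_dips(stream, window=20, min_history=5, z_threshold=3.0):
    # Score each event against the history in a sliding window; old values
    # are discarded automatically as the window moves forward.
    history = deque(maxlen=window)
    anomalies = []
    for t, value in enumerate(stream):
        if len(history) >= min_history:  # minimum events needed for training
            mean = statistics.mean(history)
            spread = statistics.pstdev(history) or 1e-9
            if abs(value - mean) / spread > z_threshold:
                anomalies.append((t, value))
        history.append(value)
    return anomalies

readings = [10, 11, 10, 12, 11, 10, 11, 95, 10, 11]
flagged = detect_spikes_and_dips(readings)  # flags the spike at index 7
```

Raising `z_threshold` plays roughly the role of a higher confidence interval: fewer events qualify as anomalies, which reduces alert volume.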
If a second spike within the same sliding window is smaller than the first, its score might not be significant enough compared to the first spike within the specified confidence level. To address this, consider adjusting the model's confidence level. However, if you receive too many alerts, use a higher confidence interval.

Leveraging Generative AI for Data Transformation and Augmentation

Generative AI helps with data augmentation and transformation, which are also part of the data validation process. Generative models can generate synthetic data that resembles actual data samples. This can be particularly useful when the available dataset is small or lacks diversity. Generative models can also be trained to translate data from one domain to another, or to transform data while preserving its underlying characteristics. For example, sequence-to-sequence models like transformers can be used in NLP for tasks such as language translation or text summarization, effectively transforming the input data into a different representation.

The data transformation process can also be used to solve problems in legacy systems built on an old codebase. Organizations can unlock numerous benefits by transitioning to modern programming languages. For instance, legacy systems are often built on outdated programming languages such as COBOL, Lisp, and Fortran. To modernize them and enhance their performance, we must migrate or rewrite them using modern, high-performance programming languages like Python, C#, or Go. Let's look at the diagram below to see how generative AI can be used to facilitate this migration process:

Figure 1. Using generative AI to rewrite legacy code

The architecture above is based on the following components and workflow: Azure Data Factory is the main ETL (extract, transform, load) service for data orchestration and transformation. It connects to the source Git repositories.
Alternatively, we can use AWS Glue for data integration and Google Cloud Data Fusion for ETL operations. OpenAI is the generative AI service used to transform COBOL and C++ to Python, C#, and Golang (or any other language). The OpenAI service is connected to Data Factory. Alternatives to OpenAI are Amazon SageMaker or Google Cloud AI Platform. Azure Logic Apps and Google Cloud Functions are utility services that provide data mapping and file management capabilities. DevOps CI/CD provides pipelines to validate, compile, and interpret the generated code.

Data Validation and AI: Chatbot Call Center Use Case

An automated call center setup is a great use case to demonstrate data validation. The following example provides an automation and database solution for call centers:

Figure 2. Call center chatbot architecture

The automation and database solution extracts data from the speech bot deployed in call centers or from interactions with real people. It then stores, analyzes, and validates this data using OpenAI's ChatGPT and an AI sentiment analysis service. Subsequently, the analyzed data is visualized using business intelligence (BI) dashboards for comprehensive insights. The processed information is also integrated into customer relationship management (CRM) systems for human validation and further action. The solution ensures accurate understanding and interpretation of customer interactions by leveraging ChatGPT, an advanced NLP model. Using BI dashboards offers intuitive and interactive data visualization capabilities, allowing stakeholders to gain actionable insights at a glance. Integrating the analyzed data into CRM systems enables seamless collaboration between automated analysis and human validation.

Conclusion

In the ever-evolving landscape of enterprise AI, achieving data excellence is crucial. Data and generative AI services that provide data analysis, ETL, and NLP enable robust integration strategies for unlocking the full potential of data assets.
By combining data-driven approaches and advanced technologies, businesses can pave the way for enhanced decision-making, productivity, and innovation through these AI and data services.
Editor's Note: The following is an article written for and published in DZone's 2024 Trend Report, Enterprise AI: The Emerging Landscape of Knowledge Engineering.

Generative AI, a subset of artificial intelligence (AI), stands as a transformative technology. Leveraging deep learning models, it exhibits a unique ability to interpret inputs spanning text, image, audio, video, or code, and to seamlessly generate novel content across various modalities. This innovation has broad applications, ranging from turning textual inputs into visual representations to transforming videos into textual narratives. Its proficiency lies in its capacity to generate high-quality, contextually relevant outputs, a testament to its potential in reshaping content creation. An example is shown in Figure 1, an application of generative AI in which a text prompt has been converted to an image.

Figure 1. DALL·E 2 generates an image from a text prompt

Journey of Generative AI

The fascinating journey of AI started a couple of centuries back, and Table 1 below highlights the key milestones in the evolution of generative AI, covering significant launches and advancements over the years:

Table 1.
Key milestones in the evolution of generative AI

- 1805: First neural network (NN)/linear regression
- 1925: First recurrent neural network (RNN) architecture
- 1958: Multi-layer perceptron (no deep learning)
- 1965: First deep learning
- 1972: Published artificial RNNs
- 1980: Release of autoencoders
- 1986: Invention of backpropagation
- 1990: Introduction of GAN/Curiosity
- 1995: Release of LeNet-5
- 1997: Introduction of LSTM
- 2014: Variational autoencoder, GAN, GRU
- 2017: Transformers
- 2018: GPT, BERT
- 2021: DALL·E
- 2022: Latent diffusion, DALL·E 2, Midjourney, Stable Diffusion, ChatGPT, AudioLM
- 2023: GPT-4, Falcon, Bard, MusicGen, AutoGPT, LongNet, Voicebox, LLaMA
- 2024: Sora, Stable Cascade

Generative AI Across Modalities

Generative AI spans various modalities, as listed in Table 2 below, showcasing its versatile capabilities:

Table 2. Generative AI modalities and major open-source tools

- Text: OpenAI GPT, Transformer models (TensorFlow, PyTorch), BERT (Google)
- Code: CodeT5, PolyCoder
- Image: StyleGAN (NVlabs), DALL·E (OpenAI), CycleGAN (junyanz), BigGAN (Google), Stable Diffusion, StableStudio, Waifu Diffusion
- Audio: WaveNet (DeepMind), Tacotron 2 (Google), MelGAN (descriptinc)
- 3D object: 3D-GANs, PyTorch3D
- Video: Video Generation with GANs, Temporal Generative Adversarial Nets (TGANs)

How Does Generative AI Work?

Generative AI relies on pathbreaking models, including transformer models, generative adversarial networks, and variational autoencoders, to realize its full potential.

The Transformer Model

The transformer architecture relies on a self-attention mechanism, discarding the sequential processing constraints found in recurrent neural networks. The model's attention mechanism allows it to weigh input tokens differently, enabling the capture of long-range dependencies and improving parallelization during training. Transformers consist of an encoder-decoder structure, with multiple layers of self-attention and feedforward sub-layers.
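To make the self-attention mechanism concrete, here is a minimal NumPy sketch of single-head scaled dot-product attention. The dimensions and random weight matrices are purely illustrative assumptions, not taken from any real model:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.

    X: (seq_len, d_model) matrix of token embeddings.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    # Each token scores every other token; scaling by sqrt(d_k) stabilizes gradients.
    scores = Q @ K.T / np.sqrt(d_k)
    # Rows of the attention matrix are probability distributions over positions.
    weights = softmax(scores, axis=-1)
    return weights @ V, weights

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out, attn = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # one contextualized vector per input token
```

Real transformers stack many such heads and interleave them with feedforward sub-layers; GPT-style decoders additionally mask the attention matrix so each token only attends to earlier positions.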
Models like OpenAI's GPT series utilize transformer architectures for autoregressive language modeling, where each token is generated based on the preceding context. Self-attention's ability to handle long-range context dependencies effectively results in coherent and contextually relevant sequences, making transformers a cornerstone in the development of large language models (LLMs) for diverse generative applications like machine translation, text summarization, question answering, and text generation.

Figure 2. Transformer architecture

Generative Adversarial Networks

Comprising two neural networks, namely the discriminator and the generator, generative adversarial networks (GANs) operate through adversarial training to achieve unparalleled results in unsupervised learning. The generator, driven by random noise, endeavors to deceive the discriminator, which, in turn, aims to accurately distinguish between genuine and artificially produced data. This competitive interaction propels both networks toward continuous improvement, generating realistic and high-quality samples. GANs find versatility in a myriad of applications, notably image synthesis, style transfer, and text-to-image synthesis.

Variational Autoencoders

Variational autoencoders (VAEs) are designed to capture and learn the underlying probability distribution of input data, enabling them to generate new samples that share similar characteristics. The architecture of a VAE consists of an encoder network, responsible for mapping input data to a latent space, and a decoder network, which reconstructs the input data from the latent space representation. A key feature of VAEs lies in their ability to model the uncertainty inherent in the data by learning a probabilistic distribution in the latent space. This is achieved through the introduction of a variational inference framework, which incorporates a probabilistic sampling process during training.
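The probabilistic sampling step described above is commonly implemented with the reparameterization trick. The following NumPy sketch illustrates the idea with a placeholder encoder; in a real VAE, the encoder and decoder are trained neural networks, and all names and dimensions here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)

def encode(x):
    # Placeholder encoder: in a real VAE, a neural network predicts
    # the mean and log-variance of the latent distribution q(z|x).
    mu = x.mean() * np.ones(2)
    log_var = np.zeros(2)  # unit variance, for illustration only
    return mu, log_var

def reparameterize(mu, log_var):
    # z = mu + sigma * eps keeps the sampling path differentiable
    # with respect to mu and sigma, which makes training possible.
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_divergence(mu, log_var):
    # KL(q(z|x) || N(0, I)): the regularizer in the VAE loss that
    # pushes the latent distribution toward a standard Gaussian.
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var))

x = rng.normal(size=8)
mu, log_var = encode(x)
z = reparameterize(mu, log_var)  # latent sample fed to the decoder
print(z.shape, kl_divergence(mu, log_var) >= 0)
```

The full training objective adds a reconstruction term from the decoder; sampling new data then amounts to drawing z from N(0, I) and decoding it.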
Their applications span various domains, including image and text generation, and data representation learning in complex high-dimensional spaces.

Figure 3. Q/A generation from image

The State of the Art

Generative AI, with its disruptive innovation, leaves a profound impact across industries.

Generative Use Cases and Applications

Generative AI exhibits a broad range of applications across various industries, revolutionizing processes and fostering innovation. Table 3 showcases how it is reshaping various industries:

Table 3. Applications of generative AI across industries

- Healthcare: Medical image generation and analysis, drug discovery, personalized treatment plans
- Finance: Personalized risk assessment and financial advice, compliance monitoring
- Marketing: Content creation, ad copy generation, personalized marketing campaigns
- Manufacturing: 3D model generation for product design
- Retail: Personalized product recommendations, virtual try-on experiences
- Education: Adaptive learning materials, content generation for e-learning platforms
- Legal: Document summarization, contract drafting, legal research assistance
- Entertainment: Scriptwriting assistance, video game content generation, music composition
- Human resources: Employee training content generation

The Business Benefits

Generative AI offers a myriad of business benefits, including the amplification of creative capabilities, empowering enterprises to autonomously produce expansive and innovative content. It creates significant time and cost efficiencies by automating tasks that previously required human intervention. Hyper-personalized experiences are achieved by leveraging customer data to generate recommendations and offers tailored to individual preferences. Furthermore, generative AI enhances operational efficiency by automating intricate processes, optimizing workflows, and facilitating realistic simulations for training and entertainment.
The technology's adaptive learning capabilities allow continuous improvement based on feedback and new data, culminating in refined performance over time. Lastly, generative AI elevates customer interaction with dynamic AI agents capable of providing responses that mimic human conversation, contributing to an enhanced customer experience.

Managing the Risks of Generative AI

Effectively managing the risks associated with the widespread adoption of generative AI is crucial as this technology transforms various business aspects. Ethical guidelines focused on accuracy, safety, honesty, empowerment, and sustainability provide a framework for responsible AI development. Integrating generative AI requires using reliable data, ensuring transparency, and maintaining a human-in-the-loop approach. Ongoing testing, oversight, and feedback mechanisms are essential to prevent unintended consequences.

Generative AI for Enterprises

This section delves into the key methodologies for enterprises to make a transformative leap in innovation and productivity.

Build Foundation Models

Foundation models (FMs) like BERT and GPT are trained on extensive, generalized, and unlabeled datasets, enabling them to excel in diverse tasks, including language understanding, text and image generation, and natural language conversation. These FMs serve as base models for specialized downstream applications, evolving over a decade to handle increasingly complex tasks. The ability to continually learn from data inputs during inference enhances their effectiveness, supporting tasks like language processing, visual comprehension, code generation, human-centered engagement, and speech-to-text applications.

Figure 4. Foundation model

Bring your own model (BYOM) is a commitment to amplifying the platform's versatility, fostering a collaborative environment, and propelling a new era of AI innovation.
BYOM's promise lies in the freedom to innovate, offering a personalized approach to AI solutions that align with individual visions. Improving an existing model involves a multifaceted approach, encompassing fine-tuning, dataset augmentation, and architectural enhancements.

Fine-Tuning

While pre-trained language models offer the advantage of being trained on massive datasets and generating text akin to human language, they may not always deliver optimal performance in specific applications or domains. Fine-tuning involves updating pre-trained models with new information or data, allowing them to adapt to particular tasks or domains. Fine-tuning pre-trained models is crucial for achieving high accuracy and relevance in generated outputs, especially when dealing with specific and nuanced tasks within various domains.

Reinforcement Learning From Human Feedback

The primary objective of reinforcement learning from human feedback (RLHF) is to leverage human feedback to enhance the efficiency and accuracy of ML models, specifically those employing reinforcement learning methodologies to maximize rewards. The RLHF process involves stages such as data collection, supervised fine-tuning of a language model, building a separate reward model, and optimizing the language model with the reward-based model.

Retrieval-Augmented Generation

LLMs are instrumental in tasks like question answering and language translation. However, inherent challenges, such as potential inaccuracies and the static nature of training data, can impact reliability and user trust. Retrieval-augmented generation (RAG) addresses these issues by seamlessly integrating domain-specific or organizational knowledge into LLMs, enhancing their relevance, accuracy, and utility without necessitating retraining.

Figure 5. Retrieval-augmented generation

The Tech Stack

The LLMOps tech stack encompasses five key areas. The table below exhibits the key components of the five tech stack areas:

Table 4.
LLMOps tech stack components

- Data management: Data storage and retrieval, data processing, quality control, data distribution
- Model management: Hosting the model, model testing, version control and model tracking, model training and fine-tuning
- Model deployment: Frameworks, event-driven architecture
- Prompt engineering and optimization: Prompt development and testing, prompt analysis, prompt versioning, prompt chaining and orchestration
- Monitoring and logging: Performance monitoring, logging

Performance Evaluation

Quantitative methods offer objective metrics, utilizing scores like inception score, Fréchet inception distance, or precision and recall for distributions to quantitatively measure the alignment between generated and real data distributions. Qualitative methods delve into visual and auditory inspection, employing techniques like visual inspection, pairwise comparison, or preference ranking to gauge the realism, coherence, and appeal of generated data. Hybrid methods integrate both quantitative and qualitative approaches, such as human-in-the-loop evaluation, adversarial evaluation, or Turing tests.

What's Next? The Future of Generative AI

Looking at the future of generative AI, three transformative avenues stand prominently on the horizon.

The Genesis of Artificial General Intelligence

The advent of artificial general intelligence (AGI) heralds a transformative era. AGI aims to surpass current AI limitations, allowing systems to excel in tasks beyond predefined domains. It distinguishes itself through autonomous self-control, self-understanding, and the ability to acquire new skills akin to human problem-solving capacities. This juncture marks a critical moment in the pursuit of AGI, envisioning a future where AI systems possess generalized human cognitive abilities and transcend current technological limitations.

Integrating Perceptual Systems Through Human Senses

Sensory AI stands at the forefront of generative AI evolution.
Beyond computer vision, sensory AI encompasses touch, smell, and taste, aiming for a nuanced, human-like understanding of the world. The emphasis on diverse sensory inputs, including tactile sensing, olfactory AI, and gustatory AI, signifies a move toward human-like interaction and recognition capabilities.

Computational Consciousness Modeling

Focused on attributes like fairness, empathy, and transparency, computational consciousness modeling (CoCoMo) employs consciousness modeling, reinforcement learning, and prompt template formulation to instill knowledge and compassion in AI agents. CoCoMo guides generative AI toward a future where ethical and emotional dimensions seamlessly coexist with computational capabilities, fostering responsible and empathetic AI agents.

Parting Thoughts

This article covered generative AI from its foundational concepts to its diverse applications across modalities, and delved into its mechanisms, highlighting the power of the transformer model and the creativity of GANs and VAEs. The journey encompassed business benefits, risk management, and a forward-looking perspective on unprecedented advancements and the potential emergence of AGI, sensory AI, and artificial consciousness. Finally, readers are encouraged to contemplate the future implications and ethical dimensions of generative AI, acknowledging a transformative journey that presents both opportunities and responsibilities in integrating generative AI into our daily lives.

Repositories:
- A curated list of modern Generative Artificial Intelligence projects and services
- Home of CodeT5: Open Code LLMs for Code Understanding and Generation
- StableStudio
- GAN-based Mel-Spectrogram Inversion Network for Text-to-Speech Synthesis

This is an excerpt from DZone's 2024 Trend Report, Enterprise AI: The Emerging Landscape of Knowledge Engineering.
Tuhin Chattopadhyay, CEO at Tuhin AI Advisory and Professor of Practice, JAGSoM
Yifei Wang, Senior Machine Learning Engineer, Meta
Austin Gil, Developer Advocate, Akamai
Tim Spann, Principal Developer Advocate, Zilliz