During my early days as a data engineer (back in 2016), I was responsible for scraping data from different websites. Web scraping means using automated tools to pull large amounts of data from websites, usually from their HTML. I remember building applications around it, digging into the HTML code, and trying to figure out the best solutions for scraping all the data. One of my main challenges was dealing with frequent changes to the websites: for example, the Amazon pages I was scraping changed every one to two weeks. One thought that occurred to me when I started reading about Large Language Models (LLMs) was, "Can I use LLMs to structure data from webpages and avoid all those pitfalls?" Let's see if I can.

## Web Scraping Tools and Techniques

At the time, the main tools I was using were Requests, BeautifulSoup, and Selenium. Each serves a different purpose and targets different types of web environments.

- **Requests** is a Python library for easily making HTTP requests. It performs GET and POST operations against the provided URLs and is frequently used to fetch HTML content that can then be parsed by BeautifulSoup.
- **BeautifulSoup** is a Python library for parsing HTML and XML documents. It builds a parse tree from the page source that lets you access the various elements on the page easily. It is usually paired with libraries like Requests or Selenium, which provide the HTML source code.
- **Selenium** is primarily employed for websites that rely heavily on JavaScript. Unlike BeautifulSoup, Selenium does not simply analyze HTML: it interacts with websites by emulating user actions such as clicks and scrolling. This facilitates data extraction from websites that render content dynamically.

These tools were indispensable when I was trying to extract data from websites.
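To make the division of labor concrete, here is a minimal sketch of the BeautifulSoup pattern described above. It parses a hard-coded HTML snippet (in a real scraper this string would come from a Requests call against a live URL) so it runs offline; the class name `activity` is purely illustrative.

```python
from bs4 import BeautifulSoup

# In a real scraper this HTML would come from requests.get(url).text;
# a literal snippet is used here so the example runs offline.
html = """
<html><body>
  <h1>Activities</h1>
  <ul>
    <li class="activity">Boat trip</li>
    <li class="activity">City tour</li>
  </ul>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all walks the parse tree and returns matching elements
titles = [li.get_text() for li in soup.find_all("li", class_="activity")]
print(titles)  # ['Boat trip', 'City tour']
```

The fragility discussed next comes from exactly this kind of selector: if the site renames the `activity` class, the scraper silently returns nothing.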
However, they also posed challenges: code, tags, and structural elements had to be regularly updated to accommodate changes in a website's layout, complicating long-term maintenance.

## What Are Large Language Models (LLMs)?

Large Language Models (LLMs) are programs that learn by reading and analyzing vast amounts of text data. They can produce human-like text, which makes them effective at processing and comprehending natural language, and they shine in situations where the textual context really matters.

## Integrating LLMs Into Web Scraping

The web scraping process can be greatly optimized by bringing LLMs into it. We take the HTML code of a webpage and feed it to the LLM, which extracts the objects we ask for. This tactic eases maintenance: the markup structure can evolve, but the content itself does not usually change. Here is how the architecture of such an integrated system looks:

1. Getting HTML: Use tools like Selenium or Requests to fetch the HTML content of a webpage. Selenium can handle dynamic content loaded with JavaScript, while Requests is suited for static pages.
2. Parsing HTML: Using BeautifulSoup, parse this HTML into text, removing the noise (footer, header, etc.).
3. Creating Pydantic models: Define the Pydantic model we are going to scrape into. This ensures the extracted data conforms to the pre-defined, typed schema.
4. Generating prompts for LLMs: Design a prompt that tells the LLM what information has to be extracted.
5. Processing by the LLM: The model reads the HTML, understands it, and follows the instructions to process and structure the data.
6. Output of structured data: The LLM returns the output as structured objects defined by the Pydantic model.
This workflow transforms HTML (unstructured data) into structured data using LLMs, solving problems such as non-standard layouts or dynamic modification of the source HTML.

## Integration of LangChain With BeautifulSoup and Pydantic

A static webpage was selected for the example. The idea is to scrape all the activities listed there and present them in a structured way. This method extracts the raw HTML from the static webpage and cleans it before the LLM processes it.

```python
from bs4 import BeautifulSoup
import requests


def extract_html_from_url(url):
    try:
        # Fetch HTML content from the URL using requests
        response = requests.get(url)
        response.raise_for_status()  # Raise an exception for bad responses (4xx and 5xx)

        # Parse HTML content using BeautifulSoup
        soup = BeautifulSoup(response.content, "html.parser")

        # Exclude elements with tag names 'footer' and 'nav'
        excluded_tagNames = ["footer", "nav"]
        for tag_name in excluded_tagNames:
            for unwanted_tag in soup.find_all(tag_name):
                unwanted_tag.extract()

        # Process the soup to maintain hrefs in anchor tags
        for a_tag in soup.find_all("a"):
            href = a_tag.get("href")
            if href:
                a_tag.string = f"{a_tag.get_text()} ({href})"

        # Return text content with preserved hrefs
        return ' '.join(soup.stripped_strings)

    except requests.exceptions.RequestException as e:
        print(f"Error fetching data from {url}: {e}")
        return None
```

The next step is to define the Pydantic objects we are going to scrape from the webpage. Two objects need to be created:

- Activity: A Pydantic object that represents all the metadata related to an activity, with its attributes and data types specified. Some fields are marked as Optional in case they are not available for all activities. Providing a description, examples, and any metadata helps the LLM form a better definition of each attribute.
- ActivityScrapper: The Pydantic wrapper around Activity.
The objective of this wrapper is to make the LLM understand that it needs to scrape several activities.

```python
from pydantic import BaseModel, Field
from typing import Optional


class Activity(BaseModel):
    title: str = Field(description="The title of the activity.")
    rating: float = Field(description="The average user rating out of 10.")
    reviews_count: int = Field(description="The total number of reviews received.")
    travelers_count: Optional[int] = Field(description="The number of travelers who have participated.")
    cancellation_policy: Optional[str] = Field(description="The cancellation policy for the activity.")
    description: str = Field(description="A detailed description of what the activity entails.")
    duration: str = Field(description="The duration of the activity, usually given in hours or days.")
    language: Optional[str] = Field(description="The primary language in which the activity is conducted.")
    category: str = Field(description="The category of the activity, such as 'Boat Trip', 'City Tours', etc.")
    price: float = Field(description="The price of the activity.")
    currency: str = Field(description="The currency in which the price is denominated, such as USD, EUR, GBP, etc.")


class ActivityScrapper(BaseModel):
    Activities: list[Activity] = Field(description="List of all the activities listed in the text")
```

Finally, we configure the LLM. We will use the LangChain library, which provides an excellent toolkit to get started. A key component here is the PydanticOutputParser. Essentially, it translates our object into instructions, as illustrated in the prompt, and also parses the output of the LLM to retrieve the corresponding list of objects.
```python
from langchain.prompts import PromptTemplate
from langchain.output_parsers import PydanticOutputParser
from langchain_openai import ChatOpenAI
from dotenv import load_dotenv

load_dotenv()

llm = ChatOpenAI(temperature=0)
output_parser = PydanticOutputParser(pydantic_object=ActivityScrapper)

prompt_template = """
You are an expert in web scraping and analyzing raw HTML code.
If there is no explicit information, don't make any assumptions.
Extract all objects that match the instructions from the following html
{html_text}
Provide them in a list; also, if there is a next page link, remember to add it to the object.
Please follow carefully the following instructions
{format_instructions}
"""

prompt = PromptTemplate(
    template=prompt_template,
    input_variables=["html_text"],
    partial_variables={"format_instructions": output_parser.get_format_instructions()},
)

chain = prompt | llm | output_parser
```

The final step is to invoke the chain and retrieve the results.

```python
url = "https://www.civitatis.com/es/budapest/"
html_text_parsed = extract_html_from_url(url)
activities = chain.invoke(input={"html_text": html_text_parsed})
activities.Activities
```

Here is what the data looks like. It takes 46 seconds to scrape the entire webpage.

```python
[Activity(title='Paseo en barco al anochecer', rating=8.4, reviews_count=9439, travelers_count=118389, cancellation_policy='Cancelación gratuita', description='En este crucero disfrutaréis de las mejores vistas de Budapest cuando se viste de gala, al anochecer. El barco es panorámico y tiene partes descubiertas.', duration='1 hora', language='Español', category='Paseos en barco', price=21.0, currency='€'),
 Activity(title='Visita guiada por el Parlamento de Budapest', rating=8.8, reviews_count=2647, travelers_count=34872, cancellation_policy='Cancelación gratuita', description='El Parlamento de Budapest es uno de los edificios más bonitos de la capital húngara. Comprobadlo vosotros mismos en este tour en español que incluye la entrada.', duration='2 horas', language='Español', category='Visitas guiadas y free tours', price=27.0, currency='€')
 ...
]
```

## Demo and Full Repository

I have created a quick demo using Streamlit, available here. In the first part, you are introduced to the model. You can add as many rows as you need and specify the name, type, and description of each attribute. This automatically generates a Pydantic model to be used in the web scraping component. The next part lets you enter a URL and scrape all the data by clicking a button. A download button appears when the scraping has finished, allowing you to download the data in JSON format. Feel free to play with it!

## Conclusion

LLMs open up new possibilities for efficiently extracting data from unstructured sources such as websites, PDFs, etc. Automating web scraping with LLMs not only saves time but also helps ensure the quality of the retrieved data. However, sending raw HTML to the LLM increases the token cost and can make the approach inefficient: since HTML is full of tags, attributes, and boilerplate, the cost can rise quickly. It is therefore crucial to preprocess and clean the HTML, removing all the unnecessary metadata and unused information. This keeps the LLM usable as a data extractor for webpages at a reasonable cost. The right tool for the right job!
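To make the cost point above concrete, here is a small stdlib-only sketch that strips tags from a raw HTML snippet and compares sizes. The 4-characters-per-token ratio is a rough rule of thumb, not an exact tokenizer, and the snippet itself is invented for illustration.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects only the visible text, dropping tags and attributes."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())

raw_html = (
    '<div class="card activity" data-id="42">'
    '<h2 class="title"><a href="/tour">Boat trip</a></h2>'
    '<span class="price" data-currency="EUR">21.0</span>'
    '</div>'
)

extractor = TextExtractor()
extractor.feed(raw_html)
clean_text = ' '.join(extractor.chunks)

# Very rough token estimate: ~4 characters per token
print(len(raw_html) // 4, "tokens (raw) vs", len(clean_text) // 4, "tokens (clean)")
print(clean_text)  # Boat trip 21.0
```

Even on this tiny fragment, the markup outweighs the content severalfold; on a full page the savings compound.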
The artificial intelligence (AI) community has long been fascinated by large language models and their impressive capabilities. However, the recent emergence of smaller language models brings a significant paradigm shift in AI development. These models, though compact, are highly efficient and offer scalability, accessibility, and efficiency to both developers and businesses. This article examines the transformative potential of smaller language models and their wide-ranging applications.

## Understanding Smaller Language Models

Compact language models, often referred to as "lite" or "mini" models, are purposefully designed to achieve strong performance while requiring significantly fewer computational resources than their larger counterparts. This is realized through techniques including knowledge distillation, quantization, and pruning.

- **Knowledge distillation** transfers the expertise acquired by a larger model to a smaller one, typically by using the outputs or internal representations of the larger model as targets for the smaller model to emulate. This allows the smaller model to benefit from the knowledge and capabilities of its larger counterpart despite its reduced size.
- **Quantization** reduces the precision of the numerical values used to represent a model's weights and activations. By converting floating-point numbers into fixed-point numbers with fewer bits, quantization reduces the memory footprint and computational complexity of the model without significantly compromising its performance.
- **Pruning** simplifies and compresses the model by identifying and removing redundant connections (weights) between neurons, yielding a more streamlined architecture that is smaller and more efficient while ideally maintaining or even improving its performance.
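As an illustration of the knowledge-distillation idea, here is a PyTorch sketch of a standard distillation loss: a blend of soft-target KL divergence (the student mimics the teacher's softened distribution) and hard-target cross-entropy. The temperature `T` and mixing weight `alpha` are illustrative hyperparameters, not values from any particular paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend of soft-target KL (student mimics teacher) and hard-target CE."""
    # Soft targets: match the teacher's temperature-softened output distribution
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary cross-entropy on the ground-truth labels
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy batch: 4 examples, 10 classes (random stand-ins for real model outputs)
student = torch.randn(4, 10, requires_grad=True)
teacher = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
loss = distillation_loss(student, teacher, labels)
print(loss.item())
```

In practice the teacher logits come from a frozen large model and only the student's parameters receive gradients from this loss.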
Together, these techniques enable compact language models to strike a delicate balance between size and functionality, making them an ideal solution for resource-restricted settings such as mobile applications and edge devices, where computational resources are limited.

## The Emergence of Small Language Models

In the rapidly evolving field of artificial intelligence, the size of a language model has often been synonymous with its capability. While large language models (LLMs) like GPT-4 have dominated the AI landscape, smaller language models are now emerging as potent tools. This shift challenges the long-held notion that bigger is always better.

### Limitations of Large Language Models (LLMs)

LLMs excel in areas like translation, summarization, and question-answering. However, their success comes at a cost:

- High energy consumption: LLMs require substantial computational resources.
- Memory requirements: They demand significant memory.
- Cost: Their computational costs can be prohibitive.

GPU innovation lags behind the growing size of LLMs, hinting at a scaling ceiling.

### The Rise of Smaller Models

Researchers are turning their attention to smaller language models because of their efficiency and versatility. Techniques like knowledge distillation from LLMs into smaller models yield similar performance with reduced computational demands. Transfer learning enables small models to adapt effectively to specific tasks by leveraging knowledge acquired from solving related problems. This approach has demonstrated its efficacy in fields like sentiment analysis and translation, where small language models can achieve comparable or superior results. For instance, consider a scenario where a small language model is initially trained on a large corpus of text data, such as Wikipedia or news articles.
Following this pre-training phase, the model can undergo fine-tuning, where it is further trained on a smaller dataset specifically annotated for sentiment analysis or translation tasks. By fine-tuning on these task-specific datasets, the model learns to discern and extract the features and patterns relevant to sentiment or translation, enabling it to achieve outcomes on par with, or surpassing, those obtained by training from scratch.

## Exploring Leading-Edge Small Language Models

### 1. DeepMind's Chinchilla

Despite its smaller stature, DeepMind's Chinchilla is a formidable contender against larger models, challenging the conventional belief that size equates to superiority. Key features:

- Compact power: With 70 billion parameters, Chinchilla stands tall in performance.
- Data refinement: Trained on an extensive dataset of 1.4 trillion tokens.
- Efficiency unveiled: Chinchilla's research delves into the optimal trade-off between training dataset size, model dimensions, and compute budget, emphasizing efficiency over sheer size.

Its ongoing development underscores the importance of safety and ethical considerations.

### 2. Meta's Llama Models

Meta's Llama models, ranging from 7B to 70B parameters, defy the notion that bigger is always better, excelling particularly in dialogue-based tasks. They are adaptable across various NLP applications, showing prowess at everything from text generation to programming code.

### 3. Stanford's Alpaca

Stanford's Alpaca, born from Meta AI's LLaMa 7B model, demonstrates remarkable performance despite modest resources, targeting instruction-based tasks. Interaction with Alpaca demands caution due to ongoing development nuances.

### 4. Stability AI's StableLM Series

Stability AI's StableLM series offers a blend of efficiency and effectiveness with impressive text generation capabilities. StableLM 1.6B outshines some larger counterparts, underscoring the triumph of efficiency.

## Technological Advancements and Their Implications

- UL2R (Ultra Lightweight 2 Repair) introduces a mixture-of-denoisers objective, enhancing performance across tasks.
- Flan: fine-tuning models on tasks phrased as instructions improves both performance and usability.

## Applications Across Industries

### Natural Language Understanding (NLU) in IoT Devices

Smaller language models revolutionize the functionality of IoT devices by enabling them to comprehend and respond to user queries efficiently. For instance, a smart home assistant equipped with a compact language model can understand commands such as "dim the lights" or "set the thermostat to 72 degrees" without relying heavily on cloud services. This allows for quicker response times and improved privacy.

Example: Consider a smart speaker integrated with a mini language model. When a user asks, "What's the weather forecast for today?" the device processes the query locally and provides an immediate response based on the pre-trained knowledge within the model. This seamless interaction enhances user experience and reduces dependency on external servers.

### Personalized Content Recommendations

Content recommendation systems driven by smaller language models offer personalized suggestions tailored to individual user preferences in real time. By analyzing browsing history, purchase behavior, and other relevant data, these models deliver accurate recommendations across various platforms.

Example: A streaming service uses a lite language model to analyze user viewing habits and preferences. Based on this data, the model suggests movies or TV shows that align with the user's interests.
For instance, if a user frequently watches sci-fi movies, the recommendation system might suggest similar titles, enhancing user engagement and satisfaction.

### Medical Diagnosis and Healthcare

In the healthcare sector, smaller language models assist medical professionals in tasks such as clinical documentation, diagnosis prediction, and drug interaction analysis. By processing medical texts efficiently, these models contribute to improved accuracy and decision-making, ultimately enhancing patient care.

Example: A healthcare application employs a mini language model to assist doctors in diagnosing diseases based on symptoms provided by patients. The model analyzes the symptoms against a vast database of medical knowledge and offers potential diagnoses or treatment recommendations, aiding healthcare providers in delivering timely and accurate care.

### Educational Tools and Language Learning

Language models tailored for educational purposes empower learners with personalized tutoring experiences, language translation, and grammar correction. These models support educators in creating interactive learning materials and adaptive assessment tools, fostering a more engaging and effective learning environment.

Example: A language learning app uses a compact language model to provide personalized feedback and exercises to users. The model identifies areas where the user needs improvement, such as grammar or vocabulary, and offers targeted exercises and explanations to enhance their language skills. This personalized approach accelerates the learning process and improves overall proficiency.

## Code Snippets

Let's explore sample code snippets for building smaller language models in Python. I'll provide examples for an N-gram language model, a neural language model, and Meta's Llama models.

### N-gram Language Model

An N-gram language model is a statistical model used in natural language processing to predict the probability of a word given the previous N-1 words (or tokens) in a sequence of text.
It works by analyzing the frequency of co-occurrences of sequences of N words, known as N-grams, within a corpus of text.

Real-life use case: Consider a smartphone keyboard that suggests the next word while typing a message. This feature often uses an N-gram language model to predict the most probable next word based on the context of the preceding words in the sentence. For example, if the user types "I am going to", the model may predict "the" or "see" as the next word based on the frequency of occurrence of these phrases in the training data.

In the Python code snippet below, we build a simple N-gram language model:

1. We start with a sample text, such as "I love reading blogs about data science on Analytics Vidhya."
2. We tokenize the text into unigrams (individual words) using the split() function.
3. Next, we create bigrams (pairs of consecutive words) by iterating over the list of unigrams.
4. We then compute the probabilities of each bigram occurring in the text. For simplicity, we assume equal probabilities for each bigram.
5. Finally, we predict the probability of a specific bigram, such as "love reading", by querying the probabilities dictionary.

This provides a basic illustration of how an N-gram language model can be implemented in Python to analyze text data and make predictions based on the observed patterns of word sequences.

```python
# Example: Building an N-gram Language Model

# Sample text
text = "I love reading blogs about data science on Analytics Vidhya."

# Tokenize the text into unigrams (1-grams)
unigrams = text.split()

# Create bigrams (2-grams)
bigrams = [(unigrams[i], unigrams[i + 1]) for i in range(len(unigrams) - 1)]

# Compute probabilities (you can use frequency counts or other methods)
# For simplicity, let's assume equal probabilities for each bigram
probabilities = {bigram: 1 / len(bigrams) for bigram in bigrams}

# Example: Predict the probability of the bigram "love reading"
print(f"Probability of 'love reading': {probabilities.get(('love', 'reading'), 0)}")
```

### Neural Language Model

A neural language model uses neural networks to learn the patterns and relationships within a sequence of words. These models can generate coherent and contextually relevant text, making them suitable for tasks such as language generation, machine translation, and text summarization.

Real-life use case: Consider a virtual assistant, like Google Assistant or Siri, that responds to user queries with natural-sounding and contextually appropriate answers. These assistants often use neural language models to understand and generate human-like responses based on user input.

In the Python code snippet below, we construct a neural language model using PyTorch and the Transformer architecture:

1. We load the WikiText2 dataset, which contains a large collection of English-language Wikipedia articles.
2. We tokenize the raw text data using a basic English tokenizer.
3. We build a vocabulary from the tokenized data to convert words into numerical indices.
4. We preprocess the raw text data by converting it into tensors suitable for training the neural network.
5. We define the neural language model architecture, which in this case is based on the Transformer architecture.
The specifics of the model architecture, including the number of layers, hidden units, and attention mechanisms, can be adjusted based on the requirements of the task. We then batchify the preprocessed data to enable efficient training by dividing it into batches, and finally train the model, adjusting the architecture, hyperparameters, and training loop as needed to optimize performance. This provides a foundational framework for building and training neural language models with PyTorch, which can be customized and extended for various NLP tasks.

```python
import torch
from torchtext.datasets import WikiText2
from torchtext.data.utils import get_tokenizer
from torchtext.vocab import build_vocab_from_iterator

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the WikiText2 dataset
train_iter, val_iter, test_iter = WikiText2()
tokenizer = get_tokenizer('basic_english')
vocab = build_vocab_from_iterator(map(tokenizer, train_iter), specials=['<unk>'])
vocab.set_default_index(vocab['<unk>'])

# Convert raw text into tensors
def data_process(raw_text_iter):
    data = [torch.tensor(vocab(tokenizer(item)), dtype=torch.long) for item in raw_text_iter]
    return torch.cat(tuple(filter(lambda t: t.numel() > 0, data)))

train_data = data_process(train_iter)
val_data = data_process(val_iter)
test_data = data_process(test_iter)

# Define your neural language model (e.g., using nn.Transformer)

# Example: Batchify the data for training
def batchify(data, bsz):
    nbatch = data.size(0) // bsz
    data = data.narrow(0, 0, nbatch * bsz)
    data = data.view(bsz, -1).t().contiguous()
    return data.to(device)

batch_size = 32
train_data = batchify(train_data, batch_size)
val_data = batchify(val_data, batch_size)
test_data = batchify(test_data, batch_size)

# Now you can train your neural language model using the Transformer architecture!
# Remember to adjust the model architecture, hyperparameters, and training loop as needed.
```

### Meta's Llama Models

Meta's Llama models are advanced language models designed for fine-tuning and domain adaptation. They are part of the broader set of models from Meta AI, aimed at giving developers powerful natural language processing capabilities.

Real-life use case: Consider a social media platform like Facebook, which could use Llama models to enhance its content generation and recommendation systems. By fine-tuning such models on the platform's vast amount of user-generated content, more relevant and engaging content recommendations can be generated, tailored to individual users' preferences and interests.

In the Python code snippet below, we use a Llama model for text generation:

1. We install the required packages, including PyTorch and the Transformers library.
2. We load a pre-trained LLaMa model and tokenizer; this example refers to a "llama-3B" variant.
3. We specify a prompt, which serves as the starting point for text generation, and encode it with the LlamaTokenizer into input tokens suitable for the model.
4. We generate text by passing the encoded input tokens along with parameters such as the maximum length of the generated text and the number of sequences to generate.
5. Finally, we decode the generated output tokens into human-readable text and print it.

This showcases how Llama models can be leveraged for text generation tasks, such as generating stories, captions, or responses, based on a given prompt.
These models excel at capturing the nuances of natural language and producing coherent, contextually relevant text, making them valuable tools for a wide range of NLP applications.

```python
# Install the required packages
!pip install torch
!pip install transformers

import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

# Load the pre-trained LLaMa model
model_name = "meta-llama/llama-3B"
tokenizer = LlamaTokenizer.from_pretrained(model_name)
model = LlamaForCausalLM.from_pretrained(model_name)

# Example: Generate text using the LLaMa model
prompt = "Once upon a time"
input_ids = tokenizer.encode(prompt, return_tensors="pt")
output = model.generate(input_ids, max_length=50, num_return_sequences=1)
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print("Generated text:", generated_text)
```

## Challenges and Opportunities

Although smaller language models offer many benefits, there are challenges to consider. The techniques used to compress these models may result in a loss of information or decreased performance, which requires careful optimization and fine-tuning. Additionally, ensuring that these models are deployed ethically and without bias is crucial to minimize the risks associated with algorithmic biases. Nevertheless, there is reason for optimism: rapid advancements in model compression algorithms and hardware optimization techniques create significant opportunities for further innovation in this space. As the demand for AI-powered solutions continues to grow, the potential of smaller language models to democratize AI by making it more accessible and affordable across industries and regions is immense.

## Conclusion

To summarize, the emergence of compact language models signifies a significant evolution in the field of AI, presenting an appealing alternative to conventional large-scale models.
Their adaptability, efficiency, and scalability make them an ideal choice for a diverse array of applications, from edge computing to healthcare and education. With smaller language models, companies and developers can explore novel opportunities for advancement while tackling the difficulties of resource limitations and ethical concerns in AI deployment.
If you're eager to learn or understand decision trees, I invite you to explore this article. Alternatively, if decision trees aren't your current focus, you may opt to scroll through social media.

## About Decision Trees

Figure 1: Simple decision tree

The image above shows an example of a simple decision tree. Decision trees are tree-shaped diagrams used for making decisions based on a series of logical conditions. In a decision tree, each node represents a decision statement, and the tree proceeds to make a decision based on whether the given statement is true or false.

There are two main types of decision trees: classification trees and regression trees. A classification tree categorizes the output of the decision statements into discrete categories using if-else logical conditions, while a regression tree outputs numeric values.

In Figure 2, the topmost node of a decision tree is called the root node, while the nodes following the root node are referred to as internal nodes or branches. These branches are characterized by arrows pointing both towards and away from them. At the bottom of the tree are the leaf nodes, which carry the final classification or decision of the tree. Leaf nodes are identifiable by arrows pointing to them, but not away from them.

Figure 2: Nodes of a decision tree

## Primary Objective of Decision Trees

The primary objective of a decision tree is to partition the given data into subsets in a manner that maximizes the purity of the outcomes.

## Advantages of Decision Trees

- Simplicity: Decision trees are straightforward to understand, interpret, and visualize.
- Minimal data preparation: They require minimal effort for data preparation compared to other algorithms.
- Handling of data types: Decision trees can handle both numeric and categorical data efficiently.
- Robustness to non-linear parameters: Non-linear parameters have minimal impact on the performance of decision trees.
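To see a classification tree in code before going further, here is a minimal scikit-learn sketch. The Iris dataset and `max_depth=3` are arbitrary choices for the demo; each printed internal node is an if-else test on one feature, and each leaf carries the final class decision, exactly as described above.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Fit a small classification tree on the classic Iris dataset
iris = load_iris()
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(iris.data, iris.target)

# Print the learned if-else structure: internal nodes test features,
# leaves carry the final class decision
print(export_text(clf, feature_names=list(iris.feature_names)))
print("Training accuracy:", clf.score(iris.data, iris.target))
```

Limiting `max_depth` is one simple guard against the overfitting discussed next.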
Disadvantages of Decision Trees

Overfitting: Decision trees may overfit the training data, capturing noise and leading to poor generalization on unseen data.
High variance: The model may become unstable with small variations in the training data, resulting in high variance.
Low bias, high complexity: Highly complex decision trees have low bias, making them prone to difficulties in generalizing to new data.

Important Terms in Decision Trees

Below are important terms that are also used for measuring impurity in decision trees:

1. Entropy

Entropy is a measure of randomness or unpredictability in a dataset. It quantifies the impurity of the dataset. A dataset with high entropy contains a mix of different classes or categories, making predictions more uncertain. Example: Consider a dataset containing data from various animals, as in Figure 3. If the dataset includes a diverse range of animals with no clear patterns or distinctions, it has high entropy.

Figure 3: Animal datasets

2. Information Gain

Information gain is the measure of the decrease in entropy after splitting the dataset based on a particular attribute or condition. It quantifies the effectiveness of a split in reducing uncertainty. Example: When we split the data into subgroups based on specific conditions (e.g., features of the animals), as in Figure 3, we calculate information gain by subtracting the size-weighted average entropy of the subgroups from the entropy before the split. Higher information gain indicates a more effective split that results in greater homogeneity within subgroups.

3. Gini Impurity

Gini impurity is another measure of impurity or randomness in a dataset. It calculates the probability of misclassifying a randomly chosen element if it were randomly labeled according to the distribution of labels in the dataset. In decision trees, Gini impurity is often used as an alternative to entropy for evaluating splits. Example: Suppose we have a dataset with multiple classes or categories.
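To make these three measures concrete, here is a stdlib-only Python sketch that computes entropy, Gini impurity, and information gain for small label lists (the animal labels are invented for illustration):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity: probability of misclassifying a randomly labeled element."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, subsets):
    """Parent entropy minus the size-weighted entropy of the subsets."""
    n = len(parent)
    weighted = sum(len(s) / n * entropy(s) for s in subsets)
    return entropy(parent) - weighted

# A 50/50 mix is maximally impure; a pure subset has zero impurity,
# so a perfect split recovers all of the parent's entropy as gain.
print(entropy(["cat", "dog", "cat", "dog"]))                     # 1.0
print(gini(["cat", "cat"]))                                      # 0.0
print(information_gain(["cat", "dog", "cat", "dog"],
                       [["cat", "cat"], ["dog", "dog"]]))        # 1.0
```

Note the weighting in `information_gain`: larger subsets contribute more to the post-split entropy, which is why a split that isolates only one element rarely scores well.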
The Gini impurity is high when the classes are evenly distributed or when there is no clear separation between classes. A low Gini impurity indicates that the dataset is relatively pure, with most elements belonging to the same class.

Classifications and Variations

Implementation in Python

The following is used to predict lung cancer in patients.

1. Importing the necessary libraries for data analysis and visualization in Python:

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# to ensure plots are displayed inline in Notebook
%matplotlib inline

# Set Seaborn style for plots
sns.set_style("whitegrid")

# Set default Matplotlib style
plt.style.use("fivethirtyeight")
```

2. Uploading the CSV file containing the data and loading it:

```python
# Load the data from the CSV file
df = pd.read_csv('survey_lung_cancer.csv')

df.head()  # Display the first five rows of the dataframe
```

EDA (Exploratory Data Analysis):

```python
# Count plot using Seaborn to visualize the distribution
# of values in the "LUNG_CANCER" column
sns.countplot(x='LUNG_CANCER', data=df)
```

```python
# Histogram of the AGE column
df['AGE'].plot(kind='hist', bins=20, title='AGE')
plt.gca().spines[['top', 'right']].set_visible(False)
```

3. Iterating through the columns, identifying categorical columns, and appending them to a list:

```python
categorical_col = []
for column in df.columns:
    if df[column].dtype == object and len(df[column].unique()) <= 50:
        categorical_col.append(column)

df['LUNG_CANCER'] = df.LUNG_CANCER.astype("category").cat.codes
```

4. Removing the column "LUNG_CANCER" for further processing:

```python
categorical_col.remove('LUNG_CANCER')
```

5.
Encoding categorical variables using LabelEncoder:

```python
from sklearn.preprocessing import LabelEncoder

# LabelEncoder transforms categorical values into numerical labels
label = LabelEncoder()
for column in categorical_col:
    df[column] = label.fit_transform(df[column])
```

6. Dataset splitting for machine learning with train_test_split:

```python
from sklearn.model_selection import train_test_split

# X contains the features (all columns except 'LUNG_CANCER')
# y contains the target variable ('LUNG_CANCER') from the DataFrame df
X = df.drop('LUNG_CANCER', axis=1)
y = df.LUNG_CANCER

# Performing the split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
```

7. Function for model evaluation and reporting:

Overall, the function below serves as a convenient tool for assessing the performance of classification models and generating detailed reports, facilitating model evaluation and interpretation.

```python
# Import functions from scikit-learn for model evaluation
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# clf: The classifier model to be evaluated
# X_train, y_train: The features and target variable of the training set
# X_test, y_test: The features and target variable of the testing set
def print_score(clf, X_train, y_train, X_test, y_test, train=True):
    if train:
        pred = clf.predict(X_train)
        clf_report = pd.DataFrame(classification_report(y_train, pred, output_dict=True))
        print("Train Result:\n_________________________")
        print(f"Accuracy Score: {accuracy_score(y_train, pred) * 100:.2f}%")
        print("_________________________")
        print(f"CLASSIFICATION REPORT:\n{clf_report}")
        print("_________________________________________________________________________")
        print(f"Confusion Matrix: \n {confusion_matrix(y_train, pred)}\n")
    else:
        pred = clf.predict(X_test)
        clf_report = pd.DataFrame(classification_report(y_test, pred, output_dict=True))
        print("\nTest Result:\n_________________________")
        print(f"Accuracy Score: {accuracy_score(y_test, pred) * 100:.2f}%")
        print("_________________________")
        print(f"CLASSIFICATION REPORT:\n{clf_report}")
        print("_________________________________________________________________________")
        print(f"Confusion Matrix: \n {confusion_matrix(y_test, pred)}\n")
```

Training and evaluation of the decision tree classifier:

Overall, this code provides a comprehensive evaluation of the decision tree classifier's performance on both the training and testing sets, including the accuracy score, classification report, and confusion matrix for each set. During the training process, the decision tree algorithm uses an impurity measure (Gini impurity by default in scikit-learn; entropy and information gain if criterion='entropy' is specified) to recursively split nodes and build a tree that maximizes purity at each step.

```python
from sklearn.tree import DecisionTreeClassifier

tree_clf = DecisionTreeClassifier(random_state=42)
tree_clf.fit(X_train, y_train)

print_score(tree_clf, X_train, y_train, X_test, y_test, train=True)
print_score(tree_clf, X_train, y_train, X_test, y_test, train=False)
```

The results above indicate that the decision tree classifier achieved high accuracy on the training set, with some overfitting evident from the gap between training and testing performance. While the classifier performed well on the testing set, there is room for improvement, particularly in reducing false positives and false negatives. Further tuning of hyperparameters or exploring other algorithms may help improve generalization performance.

8.
Visualization of the decision tree classifier:

```python
# Importing dependencies:
# Image is used to display images in the IPython environment
# StringIO is used to create a file-like object in memory
# export_graphviz is used to export the decision tree in Graphviz DOT format
# pydot is used to interface with the Graphviz library
from IPython.display import Image
from six import StringIO
from sklearn.tree import export_graphviz
import pydot

features = list(df.columns)
features.remove("LUNG_CANCER")
```

```python
dot_data = StringIO()
export_graphviz(tree_clf, out_file=dot_data, feature_names=features, filled=True)
graph = pydot.graph_from_dot_data(dot_data.getvalue())
Image(graph[0].create_png())
```

9. Training and evaluation of the Random Forest classifier:

```python
from sklearn.ensemble import RandomForestClassifier

# Creating an instance of the Random Forest classifier with n_estimators=100,
# which specifies the number of decision trees in the forest
rf_clf = RandomForestClassifier(n_estimators=100)
rf_clf.fit(X_train, y_train)

print_score(rf_clf, X_train, y_train, X_test, y_test, train=True)
print_score(rf_clf, X_train, y_train, X_test, y_test, train=False)
```

The code below will generate heatmaps for both the training and testing sets' confusion matrices. The heatmaps use different shades to represent the counts in the confusion matrix. The diagonal elements (true positives and true negatives) will have higher values and appear lighter, while off-diagonal elements (false positives and false negatives) will have lower values and appear darker.
```python
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

# Compute the confusion matrices used by the heatmaps
# (here from the Random Forest predictions of the previous step)
cm_train = confusion_matrix(y_train, rf_clf.predict(X_train))
cm_test = confusion_matrix(y_test, rf_clf.predict(X_test))

# Create heatmap for training set
plt.figure(figsize=(8, 6))
sns.heatmap(cm_train, annot=True, fmt='d', cmap='viridis', annot_kws={"size": 16})
plt.title('Confusion Matrix for Training Set')
plt.xlabel('Predicted labels')
plt.ylabel('True labels')
plt.show()

# Create heatmap for testing set
plt.figure(figsize=(8, 6))
sns.heatmap(cm_test, annot=True, fmt='d', cmap='plasma', annot_kws={"size": 16})
plt.title('Confusion Matrix for Testing Set')
plt.xlabel('Predicted labels')
plt.ylabel('True labels')
plt.show()
```

XGBoost for Classification

```python
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score

# Instantiate XGBClassifier
xgb_clf = XGBClassifier()

# Train the classifier
xgb_clf.fit(X_train, y_train)

# Predict on the testing set
y_pred = xgb_clf.predict(X_test)

# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
```

The accuracy above indicates that the model's predictions align closely with the actual class labels, demonstrating its effectiveness in distinguishing between the classes. The code below will generate a bar plot showing the relative importance of the top features in the XGBoost model. The importance is typically calculated based on metrics such as gain, cover, or frequency of feature usage across all trees in the ensemble.

```python
from xgboost import plot_importance
import matplotlib.pyplot as plt

# Plot feature importance
plt.figure(figsize=(10, 6))
plot_importance(xgb_clf, max_num_features=10)  # Maximum number of features to show
plt.show()
```

10.
Plotting the first tree in the XGBoost model:

```python
from xgboost import plot_tree

# Plot the first tree
plt.figure(figsize=(10, 20))
plot_tree(xgb_clf, num_trees=0, rankdir='TB')  # Specify the tree number to plot
plt.show()
```

Conclusion

In conclusion, this article has shown how decision trees and their advanced variants, like Random Forest and XGBoost, offer powerful tools for classification and regression tasks in machine learning. Through this journey, we've explored the fundamental concepts of decision trees, including entropy, information gain, and Gini impurity, which form the basis of their decision-making process. As we continue to delve deeper into the realm of machine learning, the versatility and effectiveness of decision trees and their variants underscore their significance in solving real-world problems across diverse domains. Whether it's classifying medical conditions, predicting customer behavior, or optimizing business processes, decision trees remain a cornerstone in the arsenal of machine learning techniques, driving innovation and progress in the field.
The objective behind the solution described in this blog was to share live 3D objects captured by one person, using “normal-looking” glasses, with another person who can then view them in XR (AR, VR, or MR) and/or 3D print them, with an experience similar to what exists today for 2D pictures and 2D printers.

About This Project

While Meta Ray-Ban glasses are not XR headsets (they are smart glasses), they are currently the most unobtrusive, aesthetically “normal”-looking glasses on the market that can be used to capture video (which can then be turned into 3D objects via an intermediary service), as they are indistinguishable from regular Ray-Ban Headliner and Wayfarer glasses aside from the small camera lenses that are mere millimeters in size. See the side-by-side comparison below.

Ray-Ban Wayfarer glasses / Meta Ray-Ban Wayfarer glasses

The Oracle database plays a central role in the solution, as it provides an optimized location for all types of data storage (including 3D objects and point clouds), various inbound and outbound API calls, and AI and spatial operations. Details can be found here. I will start by saying that this process will of course become more streamlined as better hardware, software, and APIs become available; however, the need for workflow logic, interaction with and exposure of APIs, and central storage will remain a consistent requirement of an optimal architecture for this functionality. It is possible to run both Java and JavaScript from within the Oracle database, load and use libraries for those languages, expose these programs as API endpoints, and make calls out from them. It is also possible to simply issue direct HTTP, REST, etc., commands from PL/SQL using the UTL_HTTP.BEGIN_REQUEST call or, for Oracle AI cloud services (or any Oracle cloud services), by using the DBMS_CLOUD.send_request call. This offers a powerful and flexible architecture where the following four combinations are possible.
This being the case, there are several ways to go about the solution described here; for example, by issuing requests directly from the database or from an intermediary external application (such as microservices deployed in a Kubernetes cluster) as shown in the previous diagrams.

Flow

The flow is as follows:

1. The user takes a video with Ray-Bans.
2. The video is automatically sent to Instagram (or cloud storage).
3. The Oracle database calls Instagram to get the video and saves it in the database (or object storage, etc.).
4. The Oracle database sends the video to the photogrammetry/AI service and retrieves the 3D model/capture generated by it.
5. Optionally, further spatial and AI operations are automatically conducted on the 3D model by the Oracle database.
6. Optionally, further manual modifications are made to the 3D model, and a manual workflow step may be added (for example, to gate 3D printing).

From here the 3D capture/model can be 3D printed or viewed and interacted with via an XR (VR, AR, MR) headset - or both can be done in parallel.

3D Printing

1. The Oracle database sends the 3D model (.obj file) to PrusaSlicer, which generates and returns G-code from it.
2. The G-code print job is then sent to the 3D printer via the OctoPrint API server.

XR Viewing and Interaction

1. The 3D model is exposed as a REST (ORDS) endpoint.
2. The XR headset (Magic Leap 2, Vision Pro, Quest, etc.) receives the 3D model from the Oracle database and renders it for viewing and interaction at runtime.

In diagram form, the flow looks roughly like this:

Now let’s elaborate on each step.

Step 1: The User Takes a Video With Ray-Bans

As mentioned earlier, I did not pick Meta Ray-Bans due to their XR functionality. Numerous other glasses have actual XR functionality well beyond Ray-Bans, full-on XR headsets have an increasingly better ability to do 3D scans of various types, and 3D scanners are devoted to extremely accurate, high-resolution scans.
I picked Ray-Bans because they are the glasses that are, in short, the most “normal” looking (without thick lenses, bridges that sit far from the face, extra extensions, etc.). Meta Ray-Bans have a “hey Meta” command that works like Alexa or Siri, though it is fairly limited at this point: it cannot access location services, it can send but not read messages, etc. It's not hard to see how it would be possible to use Vision AI, etc. with them. However, such built-in functionality does not exist currently, and more importantly, there is no API to access any functionality (there are access hacks out there, but this blog will stick to legit, supported approaches), so it is limited for developers at this point. It can play music and, most importantly, take pictures and videos; that is the functionality I am using here. Streaming must be set up in the Meta View app, and the Instagram account being streamed to must be a business account. However, both of these are simple configuration steps that can be found in the documentation and do not require additional cost.

Step 2: Video Is Automatically Sent to Instagram (Or Cloud Storage)

Ray-Ban video recording is limited to one-minute clips, but that is enough for any modern photogrammetry/AI engine to generate a 3D model of small to medium objects. Video taken with the glasses can be stored in cloud services such as iCloud and Google. However, it is not automatically synced until the glasses are placed in the glasses case. This is why I opted for storage in Instagram reels, which, not surprisingly, is supported by the Meta Ray-Bans such that videos can be automatically streamed and saved there as they are taken. Setup steps to stream can be found here.

Step 3: Oracle Database Calls Instagram (App) to Get the Video

Here, the Oracle database itself listens/polls for new Instagram reels/videos using the Instagram Graph API. This requires creating a Meta/Instagram application, etc., and here are the steps involved in doing so.
Register as a Meta developer.

Create an app and submit it for approval. This takes approximately 5 days if everything has been completed correctly for eligibility. This process, in particular for the Instagram Basic Display app type we are creating, is described on this Getting Started page. However, I will provide a few additional screenshots here to elaborate a bit, as certain new items around app types, privileges, and app approval processes are missing from the doc.

First, it is necessary to select a Consumer app type and then the Instagram Basic Display product. Then the app is submitted for approval for the instagram_graph_user_profile and instagram_graph_user_media permissions. Finally, testers are added/invited so that access tokens can be generated.

Once the application is set up and access token(s) acquired, a list of information about the media from the account is obtained by issuing a request in the following format:

https://graph.instagram.com/{{IG_USER_ID}}/media?fields=id,caption,media_url,media_type,timestamp,children{media_url}&limit=10&access_token={{IG_LONG_LIVED_ACCESS_TOKEN}}

Finally, the desired media is filtered out from the JSON returned (i.e., any new videos posted), and the media_url can be used to get the actual media. As stated before, the video can be retrieved from the URL using PL/SQL, JavaScript, or Java from within the Oracle database itself, or via an intermediary service called from the database. It can then be saved in the database, object storage, or other storage and sent to the photogrammetry/AI service.
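Composing that media-listing request programmatically is straightforward; here is a small Python sketch using only the standard library (the user id and token values below are placeholders for illustration, not real credentials):

```python
from urllib.parse import urlencode

def media_list_url(ig_user_id: str, access_token: str) -> str:
    """Compose the Instagram Graph API request used to list recent media."""
    params = urlencode({
        "fields": "id,caption,media_url,media_type,timestamp,children{media_url}",
        "limit": 10,
        "access_token": access_token,
    })
    return f"https://graph.instagram.com/{ig_user_id}/media?{params}"

# Hypothetical id/token, for illustration only
print(media_list_url("1789", "IG_LONG_LIVED_ACCESS_TOKEN"))
```

The same string could then be fetched with `urllib.request.urlopen`, or from PL/SQL with UTL_HTTP as described above; building the URL with `urlencode` avoids escaping mistakes in the `fields` parameter.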
An example of doing this with JavaScript from inside the database can be found in my blog How To Call Cohere and Hugging Face AI From Within an Oracle Database Using JavaScript, and an example using PL/SQL is presented here:

```plsql
CREATE TABLE file_storage (
    id NUMBER PRIMARY KEY,
    filename VARCHAR2(255),
    file_content BLOB
);

DECLARE
    l_http_request  UTL_HTTP.req;
    l_http_response UTL_HTTP.resp;
    l_blob          BLOB;
    l_buffer        RAW(32767);
    l_amount        BINARY_INTEGER := 32767;
    l_pos           INTEGER := 1;
    l_url           VARCHAR2(1000) := 'https://somesite.com/somefile.obj';
BEGIN
    INSERT INTO file_storage (id, filename, file_content)
    VALUES (1, 'file.obj', EMPTY_BLOB())
    RETURNING file_content INTO l_blob;

    -- Open HTTP request to download the file
    l_http_request := UTL_HTTP.begin_request(l_url);
    UTL_HTTP.set_header(l_http_request, 'User-Agent', 'Mozilla/4.0');
    l_http_response := UTL_HTTP.get_response(l_http_request);

    -- Download the file and write it to the BLOB
    LOOP
        BEGIN
            UTL_HTTP.read_raw(l_http_response, l_buffer, l_amount);
            DBMS_LOB.writeappend(l_blob, l_amount, l_buffer);
            l_pos := l_pos + l_amount;
        EXCEPTION
            WHEN UTL_HTTP.end_of_body THEN
                EXIT;
        END;
    END LOOP;

    UTL_HTTP.end_response(l_http_response);
    COMMIT;
    DBMS_OUTPUT.put_line('File downloaded and saved.');
EXCEPTION
    WHEN UTL_HTTP.end_of_body THEN
        UTL_HTTP.end_response(l_http_response);
    WHEN OTHERS THEN
        UTL_HTTP.end_response(l_http_response);
        RAISE;
END;
```

This same technique can be used for any call out from the database and storage of any file/content. Therefore, these snippets can be referred back to for saving any file, including the .obj file(s) generated in the next step.

Step 4: Oracle Database Sends Video to the Photogrammetry/AI Service and Retrieves the 3D Model/Capture Generated by It

There are a few photogrammetry/AI (and NeRF, splat, etc.) services available. I have chosen to use Luma Labs again because it has an API available for direct HTTPS calls, and examples are also given for over 20 programming languages and platforms.
The reference for it can be found here. I will keep things short by giving the succinct curl command for each call in the flow, but the same can be done using PL/SQL, JavaScript, etc. from the database as described earlier. Once a Luma Labs account is created and a luma-api-key obtained, the process of converting the video to a 3D .obj file is as follows:

Create/initiate a capture.

```shell
curl --location 'https://webapp.engineeringlumalabs.com/api/v2/capture' \
  --header 'Authorization: luma-api-key={key}' \
  --data-urlencode 'title=hand'

# example response
# {
#   "signedUrls": {
#     "source": "https://storage.googleapis.com/..."
#   },
#   "capture": {
#     "title": "hand",
#     "type": "reconstruction",
#     "location": null,
#     "privacy": "private",
#     "date": "2024-03-26T15:54:08.268Z",
#     "username": "paulparkinson",
#     "status": "uploading",
#     "slug": "pods-of-kon-66"
#   }
# }
```

This call will return a signedUrls.source URL that is then used to upload the video. Also note the generated slug value returned, which will be used to trigger 3D processing, check processing status, etc.

```shell
curl --location --request PUT 'https://storage.googleapis.com/...' \
  --header 'Content-Type: text/plain' \
  --data 'hand.mov'
```

Once the video file is uploaded, the processing is triggered by issuing a POST request to the slug retrieved in step 1.

```shell
curl --location -g --request POST 'https://webapp.engineeringlumalabs.com/api/v2/capture/{slug}' \
  --header 'Authorization: luma-api-key={key}'
```

If the process is triggered successfully, a value of true will be returned, and the following can be issued to check the status of the capture by calling the capture endpoint.
```shell
curl --location -g 'https://webapp.engineeringlumalabs.com/api/v2/capture/{slug}' \
  --header 'Authorization: luma-api-key={key}'
```

Once the status returned is equal to complete, the 3D capture zip file (which contains the .obj file as well as the .mtl material mapping file and .png texture files) is downloaded and saved by calling the download endpoint. The approaches mentioned earlier can be used to do this and save the file(s).

Step 5: Optionally, Further Spatial and AI Operations Are Automatically Conducted on the 3D Model by the Oracle Database

It is also possible to break down the .obj file and store its various vertices, vertex texture/material mappings, etc. in a table as a point cloud for analysis and manipulation. Here is a simple example of that:

```plsql
create or replace procedure gen_table_from_obj(id number) as
    ord MDSYS.SDO_ORDINATE_ARRAY;
    f   UTL_FILE.FILE_TYPE;
    s   VARCHAR2(2000);
    i   number;
begin
    ord := MDSYS.SDO_ORDINATE_ARRAY();
    ord.extend(3);
    f := UTL_FILE.FOPEN('ADMIN_DIR', 'OBJFROMPHOTOAI.obj', 'R');
    i := 1;
    while true loop
        UTL_FILE.GET_LINE(f, s);
        if (s = '') then
            exit;
        end if;
        -- 'v' lines hold vertex coordinates
        if (REGEXP_SUBSTR(s, '[^ ]*', 1, 1) = 'v') then
            ord(1) := TO_NUMBER(REGEXP_SUBSTR(s, '[^ ]*', 1, 3));
            ord(2) := TO_NUMBER(REGEXP_SUBSTR(s, '[^ ]*', 1, 5));
            ord(3) := TO_NUMBER(REGEXP_SUBSTR(s, '[^ ]*', 1, 7));
            insert into INP_OBJFROMPHOTOAI_TABLE(val_d1, val_d2, val_d3)
            values (ord(1), ord(2), ord(3));
        end if;
        i := i + 1;
    end loop;
end;
/
```

The Oracle database has had a spatial component for decades now, and recent versions have added several operations for different analyses of point clouds, mesh creation, .obj export, etc. These are described in this video. One operation that has existed for a number of releases is the pc_simplify function shown below. This is often referred to as "decimation" by various 3D modeling tools, and it provides the ability to reduce the number of polygons in a mesh, thus reducing overall size.
This is handy for a number of reasons, such as when different clients will use the 3D model: for example, a phone with limited bandwidth versus a client that needs high-poly meshes.

```plsql
procedure pc_simplify(
    pc_table            varchar2,
    pc_column           varchar2,
    id_column           varchar2,
    id                  varchar2,
    result_table_name   varchar2,
    tol                 number,
    query_geom          mdsys.sdo_geometry default null,
    pc_intensity_column varchar2 default null)
DETERMINISTIC PARALLEL_ENABLE;
```

Step 6: Optionally, Further Manual Modifications Can Also Be Made to the 3D Model, and a Manual Approval Can Be Inserted as Part of the Workflow

The 3D model can be loaded from the database and edited in 3D modeling tools like Blender or 3D printing tools such as those from Bambu Lab, among numerous others. Due to the many steps involved in this overall process, the solution is also a good fit for a workflow engine such as the one that exists as part of the Oracle database. In this case, a manual review/approval can be inserted as part of the workflow to prevent sending or printing undesired models. From here the 3D capture/model can be 3D printed or viewed and interacted with via an XR (VR, AR, MR) headset - or both can be done in parallel.

3D Printing

Step 1: Oracle Database Sends the 3D Model (.obj File) to PrusaSlicer, Which Generates and Returns G-Code From It

PrusaSlicer is an extremely robust and successful open-source project/application that takes 3D models (.stl, .obj, .amf) and converts them into G-code instructions for FFF printers or PNG layers for mSLA 3D printers. It supports every conceivable printer and format; however, it does not provide an API, only a CLI. There are a few ways to work/hack around this for automation. One, shown here, is to implement a Spring Boot microservice that takes the .obj file and executes the PrusaSlicer CLI (which must be accessible to the microservice, of course), returning the G-code.
```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.commons.io.FileUtils;
import org.springframework.http.MediaType;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.multipart.MultipartFile;

@RestController
public class SlicerController {

    @PostMapping(value = "/slice", consumes = MediaType.MULTIPART_FORM_DATA_VALUE)
    public byte[] sliceStlFile(@RequestParam("file") MultipartFile file,
                               @RequestParam("config") String configPath)
            throws IOException, InterruptedException {
        Path tempDir = Paths.get(System.getProperty("java.io.tmpdir"));
        Path stlFilePath = Files.createTempFile(tempDir, "stl", ".stl");
        file.transferTo(stlFilePath.toFile());
        Path gcodePath = Files.createTempFile(tempDir, "output", ".gcode");

        // Execute PrusaSlicer CLI command
        String command = String.format("PrusaSlicer --slice --load %s --output %s %s",
                configPath, gcodePath.toString(), stlFilePath.toString());
        Process prusaSlicerProcess = Runtime.getRuntime().exec(command);
        prusaSlicerProcess.waitFor();

        // Return G-code
        byte[] gcodeBytes = FileUtils.readFileToByteArray(gcodePath.toFile());
        Files.delete(stlFilePath);
        Files.delete(gcodePath);
        return gcodeBytes;
    }
}
```

Once the G-code for the .obj file has been returned, it can be passed to the OctoPrint API server for printing.

Step 2: G-Code Print Job Is Then Sent to the 3D Printer via the OctoPrint API Server

OctoPrint is an application for 3D printers that offers a web interface for printer control. It can be installed on essentially any computer (in the case of my setup, a minimal Raspberry Pi) that is connected to the printer. This can even be done over Wi-Fi, cloud, etc., depending on the printer and setup. However, we will keep it to this basic setup. Again, printers have different applications to provide this functionality, but OctoPrint provides a REST API, which allows for programmatic control, including uploading and printing G-code files. First, an API key must be obtained from OctoPrint’s web interface under Settings > API.
Then, the G-code file is uploaded and immediately printed by using a call of this format/content:

```shell
curl -k -X POST "http://octopi.local/api/files/local" \
  -H "X-Api-Key: API_KEY" \
  -F "file=@/path/to/file.gcode" \
  -F "select=true" \
  -F "print=true"
```

XR Viewing and Interaction

Step 1: 3D Model Is Exposed as a REST (ORDS) Endpoint

Any data stored in the Oracle database can be exposed as a REST endpoint. To expose the .obj file for download by the XR headset, we can REST-enable the table created earlier using the following:

```plsql
CREATE OR REPLACE PROCEDURE download_file(p_id IN NUMBER) IS
    l_blob BLOB;
BEGIN
    SELECT file_content INTO l_blob
    FROM file_storage
    WHERE id = p_id;

    -- Use ORDS to deliver the BLOB to the client
    ORDS.enable_download(l_blob);
EXCEPTION
    WHEN NO_DATA_FOUND THEN
        HTP.p('File not found.');
    WHEN OTHERS THEN
        HTP.p('Error retrieving file.');
END download_file;
```

This makes the file (i.e., the .obj, etc. files) accessible via a simple GET call.

```shell
curl -X GET "http://thedbserver/ords/theschema/file/file/{id}" -o "3Dmodelwithobjandtextures.zip"
```

Here, we are exposing and downloading the file by id. The XR headset keeps track of the 3D models it has received and polls for the next id each time. From here, the 3D model can be viewed on computers, phones, etc. as-is. However, it is obviously more interactive to view in an actual XR headset, which is what I will describe next.

Step 2: XR Headset (Magic Leap 2, Vision Pro, Quest, etc.) Receives the 3D Model From the Oracle Database and Renders It for Viewing and Interaction at Runtime

The process for receiving and rendering 3D in Unity as I am showing here (and likewise for Unreal) is the same regardless of the headset used. Interaction with 3D objects (via, e.g., hand tracking, eye gaze, voice, etc.) has also been standardized with OpenXR and WebXR, which Magic Leap, Meta, and others are compliant with. However, Apple (similar to the case of phone, etc.
development) has its own development SDK, ecosystem, etc., and regardless, interaction is not the crux of this blog, so I will only cover the important aspects of the 3D object for viewing. There are a couple of assets on the Unity asset store for doing this conveniently. What is shown below is greatly simplified, but it explains the general approach. First, the 3D model is downloaded using a script like this:

```csharp
using System.Collections;
using System.IO;
using System.Text;
using UnityEngine;
using UnityEngine.Networking;

public class DownloadAndSave3DModel : MonoBehaviour
{
    private string ordsFileUrl = "http://thedbserver/ords/theschema/file/file/{id}";
    private string filePath;
    private string filePathForTextures;

    void Start()
    {
        filePath = Path.Combine(Application.persistentDataPath, "3Dmodelwithobjandtextures.zip");
        StartCoroutine(DownloadFile(ordsFileUrl));
    }

    IEnumerator DownloadFile(string url)
    {
        using (UnityWebRequest webRequest = UnityWebRequest.Get(url))
        {
            yield return webRequest.SendWebRequest();
            if (webRequest.isNetworkError || webRequest.isHttpError)
            {
                Debug.LogError("Error: " + webRequest.error);
            }
            else
            {
                // Unpack the zip here and process each file for the case where it
                // isObjFile, isMtlFile, or isTextureFile.
                // The texture/png files are written to a file like this:
                if (isTextureFile)
                    File.WriteAllBytes(filePathForTextures, webRequest.downloadHandler.data);
                // Whereas the .obj and .mtl files are converted to a stream like this:
                else if (isObjFile)
                {
                    var memoryStream = new MemoryStream(Encoding.UTF8.GetBytes(webRequest.downloadHandler.text));
                    processObjFile(memoryStream);
                }
                else if (isMtlFile)
                {
                    var memoryStream = new MemoryStream(Encoding.UTF8.GetBytes(webRequest.downloadHandler.text));
                    processMtlFile(memoryStream);
                }
            }
        }
    }
}
```

As shown, the texture/.png files in the zip are saved, and the .obj and .mtl files are converted to MemoryStreams for processing/creating the Unity GameObject's MeshRenderer, etc.
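The processObjFile step above boils down to reading the "v" (vertex), "vn" (normal), and "vt" (UV) lines of the Wavefront .obj file. As a language-neutral illustration (Python here for brevity; the C# version follows the same line-by-line pattern), extracting just the vertex coordinates looks like this:

```python
def parse_obj_vertices(obj_text: str):
    """Collect the (x, y, z) coordinates from the 'v' lines of an .obj file."""
    vertices = []
    for line in obj_text.splitlines():
        parts = line.split()
        # 'vn' and 'vt' lines are skipped here; a full loader handles them too
        if parts and parts[0] == "v":
            vertices.append(tuple(float(p) for p in parts[1:4]))
    return vertices

sample = """v 0.0 0.0 0.0
v 1.0 0.0 0.0
vn 0.0 1.0 0.0
f 1 2 1"""
print(parse_obj_vertices(sample))  # [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
```

The "f" (face) lines then index into these vertex, normal, and UV lists to assemble triangles, which is what ultimately populates the Unity Mesh.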
This parsing is similar to how we created a point cloud table from the .obj in the earlier optional step, where we parse the lines of the .obj file; however, it is a bit more complicated, as we also parse the .mtl file (the spec explaining these formats can be found here) and apply the textures to create the resultant 3D model that is rendered for viewing at runtime. The basic logic below creates the GameObject, which can then be placed in the headset wearer's FOV, added to a library menu, etc. for interaction.

```csharp
// Create the Unity GameObject that will be the main result/holder of our 3D object
var gameObject = new GameObject(_name);

// Add a MeshRenderer and MeshFilter to it
var meshRenderer = gameObject.AddComponent<MeshRenderer>();
var meshFilter = gameObject.AddComponent<MeshFilter>();

// Create a Unity Vector object for each of the "v"/Vertices, "vn"/Normals,
// and "vt"/UVs values parsed from each of the lines in the .obj file
new Vector3(x, y, z);

// Create a Unity Mesh and add all of the Vertices, Normals, and UVs to it
var mesh = new Mesh();
mesh.SetVertices(vertices);
mesh.SetNormals(normals);
mesh.SetUVs(0, uvs);

// Similarly parse the .mtl file to create the array of Unity Materials
// (a Unity Material is constructed from a Shader)
var material = new Material(Shader.Find("Standard"));
material.SetTexture(...);
material.SetColor(...);
// collect materials into materialArray

// Add the mesh and materials
meshRenderer.sharedMaterials = materialArray;
meshFilter.sharedMesh = mesh;
```

Conclusion

As you can see, there are many steps to the process; however, hopefully, you have found it interesting to see how it is possible — and will only become easier — to share 3D objects the way we share 2D pictures today. Please let me know if you have any thoughts or questions whatsoever, and thank you very much for reading!

Source Code

The source code for the project can be found here.
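As a language-agnostic illustration of the .obj parsing described above, here is a minimal Go sketch that extracts the "v"/vertex lines; the same pattern applies to "vn" and "vt" lines. The `Vec3` type and function name are my own for this example, not from the project's source:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Vec3 mirrors Unity's Vector3 for this illustration.
type Vec3 struct{ X, Y, Z float64 }

// parseObjVertices extracts the "v" (vertex) lines from .obj content.
// Each such line has the form "v x y z".
func parseObjVertices(obj string) []Vec3 {
	var vertices []Vec3
	for _, line := range strings.Split(obj, "\n") {
		fields := strings.Fields(line)
		if len(fields) == 4 && fields[0] == "v" {
			x, _ := strconv.ParseFloat(fields[1], 64)
			y, _ := strconv.ParseFloat(fields[2], 64)
			z, _ := strconv.ParseFloat(fields[3], 64)
			vertices = append(vertices, Vec3{x, y, z})
		}
	}
	return vertices
}

func main() {
	obj := "v 0 0 0\nv 1 0 0\nv 0 1 0\nf 1 2 3\n"
	fmt.Println(parseObjVertices(obj)) // [{0 0 0} {1 0 0} {0 1 0}]
}
```

The real implementation also resolves the "f"/face indices and the .mtl material references before building the mesh.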
When ChatGPT became a global phenomenon, countless books, papers, and articles about AI (Artificial Intelligence) appeared, but most of them were heavy on theory and mathematics. The series of articles "Introduction to Artificial Intelligence with Code" is a compilation of the most fundamental aspects of AI for beginners, presented with a combination of theory and code (C#) to help readers gain a better understanding of the concepts and ideas discussed in these articles. In the first article of the series, we will introduce propositional logic.

Theory: An Introduction to Propositional Logic

The rules of logic provide precise meanings for propositions. These rules are used to distinguish between valid and invalid mathematical arguments. Alongside its significance in understanding mathematical reasoning, logic also has many applications in computer science, such as designing computer networks, programming, checking program correctness, and so on.

Propositions are the building blocks of the logical edifice of propositional logic. A proposition is a statement that is either true or false but cannot be both true and false simultaneously. The truth value of a proposition (in propositional logic) is referred to as its logical value (true or false). Letters are used to symbolize propositions, much as letters represent variables in programming; the commonly used conventions for these letters are p, q, r, s, and so on.

Many mathematical propositions are created by combining one or more propositions we already have. These new propositions are called compound propositions (denoted temporarily as F), and they are formed from existing propositions using logical operators. Some basic logical operators are AND, OR, and NOT. A classical application of logical operators in computer science is the design of logic gates.
To check the truth value of a compound proposition, we need to apply the rules of logic and consider the truth values of the individual propositions along with the logical operators used.

Coding: Checking the Truth Value of a Compound Proposition (F)

We'll create a set of classes, all related by inheritance, that will allow us to obtain the output of any F from inputs defined a priori. Here is the first class:

```csharp
public abstract class F
{
    public abstract bool Check();
    public abstract IEnumerable<Prop> Props();
}
```

The abstract F class states that all its descendants must implement a Boolean method Check() and an IEnumerable<Prop> method Props(). The former returns the evaluation of the compound proposition, and the latter returns the propositions contained within it.

Because logical operators share some features, we'll create an abstract class to group these features and create a more concise, logical inheritance design. The Op class, which can be seen in the code below, will contain the similarities that every logical operator shares:

```csharp
public abstract class Op : F
{
    public F P { get; set; }
    public F Q { get; set; }

    public Op(F p, F q)
    {
        P = p;
        Q = q;
    }

    public override IEnumerable<Prop> Props()
    {
        return P.Props().Concat(Q.Props());
    }
}
```

The first logical operator, AND, is illustrated below:

```csharp
public class AND : Op
{
    public AND(F p, F q) : base(p, q) { }

    public override bool Check()
    {
        return P.Check() && Q.Check();
    }
}
```

The implementation of the AND class is pretty simple. It receives two arguments that it passes to its parent constructor, and the Check method merely returns the logical AND that is built into C#.
Very similar are the OR, NOT, and Prop classes, which are shown below:

```csharp
// OR class
public class OR : Op
{
    public OR(F p, F q) : base(p, q) { }

    public override bool Check()
    {
        return P.Check() || Q.Check();
    }
}

// NOT class
public class NOT : F
{
    public F P { get; set; }

    public NOT(F p)
    {
        P = p;
    }

    public override bool Check()
    {
        return !P.Check();
    }

    public override IEnumerable<Prop> Props()
    {
        return new List<Prop>(P.Props());
    }
}
```

The Prop class is the one we use for representing propositions in compound propositions. It includes a truthValue field, which is the truth value given to the proposition (true, false), and when the Props() method is called, it returns a List<Prop> whose single element is itself:

```csharp
public class Prop : F
{
    public bool truthValue { get; set; }

    public Prop(bool truthvalue)
    {
        truthValue = truthvalue;
    }

    public override bool Check()
    {
        return truthValue;
    }

    public override IEnumerable<Prop> Props()
    {
        return new List<Prop>() { this };
    }
}
```

Creating and checking F = NOT(p) OR q:

```csharp
var p = new Prop(false);
var q = new Prop(false);
var f = new OR(new NOT(p), q);
Console.WriteLine(f.Check());
p.truthValue = true;
Console.WriteLine(f.Check());
```

The result looks like this:

```
True
False
```

Summary

In this article, we introduced a basic logic — propositional logic — and we also described C# code for representing compound propositions (propositions, logical operators, and so on). In the next article, we'll introduce a very important logic that extends propositional logic: first-order logic.
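For readers who prefer another language, the same composite design translates directly. Here is a compact Go sketch of the evaluator (the type names are my own, mirroring the C# classes above, not part of the article's code):

```go
package main

import "fmt"

// F is the compound-proposition interface: anything with a truth value.
type F interface {
	Check() bool
}

// Prop is an atomic proposition with an assignable truth value.
type Prop struct{ TruthValue bool }

func (p *Prop) Check() bool { return p.TruthValue }

// AND, OR, and NOT combine propositions, mirroring the C# classes.
type AND struct{ P, Q F }

func (a AND) Check() bool { return a.P.Check() && a.Q.Check() }

type OR struct{ P, Q F }

func (o OR) Check() bool { return o.P.Check() || o.Q.Check() }

type NOT struct{ P F }

func (n NOT) Check() bool { return !n.P.Check() }

func main() {
	// F = NOT(p) OR q, as in the C# example
	p := &Prop{TruthValue: false}
	q := &Prop{TruthValue: false}
	f := OR{NOT{p}, q}
	fmt.Println(f.Check()) // true
	p.TruthValue = true
	fmt.Println(f.Check()) // false
}
```

The interface plays the role of the abstract F class; each operator evaluates its children recursively, just as Check() does in the C# version.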
Generative AI development has been democratized, thanks to powerful machine learning models (specifically Large Language Models such as Claude, Meta's Llama 2, etc.) being exposed by managed platforms/services as API calls. This frees developers from infrastructure concerns and lets them focus on the core business problems. It also means that developers are free to use the programming language best suited for their solution. Python has typically been the go-to language for AI/ML solutions, but there is more flexibility in this area.

In this post, you will see how to leverage the Go programming language to use vector databases and techniques such as Retrieval Augmented Generation (RAG) with langchaingo. If you are a Go developer who wants to learn how to build generative AI applications, you are in the right place! If you are looking for introductory content on using Go for AI/ML, feel free to check out my previous blogs and open-source projects in this space. First, let's take a step back and get some context before diving into the hands-on part of this post.

The Limitations of LLMs

Large Language Models (LLMs) and other foundation models have been trained on a large corpus of data, enabling them to perform well at many natural language processing (NLP) tasks. But one of their most important limitations is that most foundation models and LLMs use a static dataset, which often has a specific knowledge cut-off (say, January 2022). For example, if you were to ask about an event that took place after the cut-off date, the model would either fail to answer (which is fine) or, worse, confidently reply with an incorrect response — this is often referred to as hallucination. We also need to consider that LLMs only respond based on the data they were trained on, which limits their ability to accurately answer questions on topics that are either specialized or proprietary.
For instance, if I were to ask a question about a specific AWS service, the LLM may (or may not) be able to come up with an accurate response. Wouldn't it be nice if the LLM could use the official AWS service documentation as a reference?

RAG (Retrieval Augmented Generation) Helps Alleviate These Issues

RAG enhances LLMs by dynamically retrieving external information during the response generation process, thereby expanding the model's knowledge base beyond its original training data. RAG-based solutions incorporate a vector store that can be indexed and queried to retrieve the most recent and relevant information, thereby extending the LLM's knowledge beyond its training cut-off. When an LLM equipped with RAG needs to generate a response, it first queries a vector store to find relevant, up-to-date information related to the query. This process ensures that the model's outputs are not just based on its pre-existing knowledge but are augmented with the latest information, thereby improving the accuracy and relevance of its responses.

But RAG Is Not the Only Way

Although this post focuses solely on RAG, there are other ways to work around this problem, each with its pros and cons:

- Task-specific tuning: Fine-tuning large language models on specific tasks or datasets to improve their performance in those domains.
- Prompt engineering: Carefully designing input prompts to guide language models towards desired outputs, without requiring significant architectural changes.
- Few-shot and zero-shot learning: Techniques that enable language models to adapt to new tasks with limited or no additional training data.

Vector Store and Embeddings

I mentioned vector stores a few times in the last paragraph. These are nothing but databases that store and index vector embeddings, which are numerical representations of data such as text, images, or entities.
Embeddings help us go beyond basic search since they represent the semantic meaning of the source data — hence the term semantic search, which is a technique that understands the meaning and context of words to improve search accuracy and relevance. Vector databases can also store metadata, including references to the original data source (for example, the URL of a web document) of the embedding.

Thanks to generative AI technologies, there has also been an explosion in vector databases. These include established SQL and NoSQL databases that you may already be using in other parts of your architecture — such as PostgreSQL, Redis, MongoDB, and OpenSearch. But there are also databases that are custom-built for vector storage, such as Pinecone, Milvus, and Weaviate. Alright, let's go back to RAG...

What Does a Typical RAG Workflow Look Like?

At a high level, RAG-based solutions have the following workflow, often executed as a cohesive pipeline:

1. Retrieve data from a variety of external sources like documents, images, web URLs, databases, proprietary data sources, etc. This consists of sub-steps such as chunking, which involves splitting up large datasets (e.g., a 100 MB PDF file) into smaller parts (for indexing).
2. Create embeddings: use an embedding model to convert the data into numerical representations.
3. Store/index the embeddings in a vector store.
4. Ultimately, integrate this into a larger application where the contextual data (semantic search results) is provided to the LLM (along with the prompt).

End-To-End RAG Workflow in Action

Each of the workflow steps can be executed with different components. The ones used in this blog include:

- PostgreSQL: It will be used as a vector database, thanks to the pgvector extension. To keep things simple, we will run it in Docker.
- langchaingo: It is a Go port of the langchain framework. It provides plugins for various components, including vector stores.
We will use it for loading data from web URLs and indexing it in PostgreSQL.

- Text and embedding models: We will use the Amazon Bedrock Claude and Titan models (for text and embeddings, respectively) with langchaingo.
- Retrieval and app integration: langchaingo vector store (for semantic search) and chain (for RAG).

You will get a sense of how these individual pieces work. We will cover other variants of this architecture in subsequent blogs.

Before You Begin

Make sure you have:

- Go, Docker, and psql installed (e.g., using Homebrew if you're on a Mac)
- Amazon Bedrock access configured from your local machine (refer to this blog post for details)

Start PostgreSQL on Docker

There is a Docker image we can use!

```shell
docker run --name pgvector --rm -it -p 5432:5432 -e POSTGRES_USER=postgres -e POSTGRES_PASSWORD=postgres ankane/pgvector
```

Activate the pgvector extension by logging into PostgreSQL (using psql) from a different terminal:

```shell
# enter postgres when prompted for the password
psql -h localhost -U postgres -W
```

```sql
CREATE EXTENSION IF NOT EXISTS vector;
```

Load Data Into PostgreSQL (Vector Store)

Clone the project repository:

```shell
git clone https://github.com/build-on-aws/rag-golang-postgresql-langchain
cd rag-golang-postgresql-langchain
```

At this point, I am assuming that your local machine is configured to work with Amazon Bedrock. The first thing we will do is load data into PostgreSQL. In this case, we will use an existing web page as the source of information. I have used this developer guide — but feel free to use your own! Make sure to change the search query accordingly in the subsequent steps.
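Before running the loader, it helps to picture the chunking step mentioned earlier: splitting the source into overlapping pieces for indexing. Here is a minimal, illustrative Go sketch of fixed-size chunking with overlap (langchaingo's recursive character splitter, which the project actually uses, is more sophisticated and respects separators like paragraphs and sentences):

```go
package main

import "fmt"

// chunkText splits text into fixed-size chunks with a small overlap,
// so content cut at a boundary still appears intact in one chunk.
func chunkText(text string, size, overlap int) []string {
	if size <= overlap {
		panic("size must be greater than overlap")
	}
	var chunks []string
	runes := []rune(text)
	for start := 0; start < len(runes); start += size - overlap {
		end := start + size
		if end > len(runes) {
			end = len(runes)
		}
		chunks = append(chunks, string(runes[start:end]))
		if end == len(runes) {
			break
		}
	}
	return chunks
}

func main() {
	chunks := chunkText("abcdefghij", 4, 2)
	fmt.Println(chunks) // [abcd cdef efgh ghij]
}
```

Each chunk is later embedded and indexed individually, which is why the loader below reports a document count rather than a single document.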
```shell
export PG_HOST=localhost
export PG_USER=postgres
export PG_PASSWORD=postgres
export PG_DB=postgres

go run *.go -action=load -source=https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-general-nosql-design.html
```

You should get the following output:

```
loading data from https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-general-nosql-design.html
vector store ready - postgres://postgres:postgres@localhost:5432/postgres?sslmode=disable
no. of documents to be loaded 23
```

Give it a few seconds. Finally, you should see this output if all goes well:

```
data successfully loaded into vector store
```

To verify, go back to the psql terminal and check the tables:

```
\d
```

You should see a couple of tables — langchain_pg_collection and langchain_pg_embedding. These are created by langchaingo since we did not specify them explicitly (that's ok, it's convenient for getting started!). langchain_pg_collection contains the collection name, while langchain_pg_embedding stores the actual embeddings.

| Schema | Name                    | Type  | Owner    |
|--------|-------------------------|-------|----------|
| public | langchain_pg_collection | table | postgres |
| public | langchain_pg_embedding  | table | postgres |

You can introspect the tables:

```sql
select * from langchain_pg_collection;
select count(*) from langchain_pg_embedding;
select collection_id, document, uuid from langchain_pg_embedding LIMIT 1;
```

You will see 23 rows in the langchain_pg_embedding table, since that was the number of langchain documents that our web page source was split into (refer to the application logs above when you loaded the data).

A quick detour into how this works: the data loading implementation is in load.go, but let's look at how we access the vector store instance (in common.go):

```go
brc := bedrockruntime.NewFromConfig(cfg)

embeddingModel, err := bedrock.NewBedrock(bedrock.WithClient(brc), bedrock.WithModel(bedrock.ModelTitanEmbedG1))
//...
```
```go
store, err = pgvector.New(
    context.Background(),
    pgvector.WithConnectionURL(pgConnURL),
    pgvector.WithEmbedder(embeddingModel),
)
```

- pgvector.WithConnectionURL is where the connection information for the PostgreSQL instance is provided.
- pgvector.WithEmbedder is the interesting part, since this is where we can plug in the embedding model of our choice. langchaingo supports Amazon Bedrock embeddings; in this case, I have used the Amazon Bedrock Titan embedding model.

Back to the loading process in load.go. We first get the data in the form of a slice of schema.Document (getDocs function), using the langchaingo built-in HTML loader for this:

```go
docs, err := documentloaders.NewHTML(resp.Body).LoadAndSplit(context.Background(), textsplitter.NewRecursiveCharacter())
```

Then, we load it into PostgreSQL. Instead of writing everything ourselves, we can use the langchaingo vector store abstraction and its high-level AddDocuments function:

```go
_, err = store.AddDocuments(context.Background(), docs)
```

Great. We have set up a simple pipeline to fetch and ingest data into PostgreSQL. Let's make use of it!

Execute Semantic Search

Let's ask a question. I am going with "What tools can I use to design dynamodb data models?", which is relevant to this document which I used as the data source — feel free to tune it as per your scenario.

```shell
export PG_HOST=localhost
export PG_USER=postgres
export PG_PASSWORD=postgres
export PG_DB=postgres

go run *.go -action=semantic_search -query="what tools can I use to design dynamodb data models?" -maxResults=3
```

You should see a similar output — note that we opted to output a maximum of three results (you can change it):

```
vector store ready

============== similarity search results ==============
similarity search info - can build new data models from, or design models based on, existing data models that satisfy your application's data access patterns. You can also import and export the designed data model at the end of the process. For more information, see Building data models with NoSQL Workbench
similarity search score - 0.3141409
============================
similarity search info - NoSQL Workbench for DynamoDB is a cross-platform, client-side GUI application that you can use for modern database development and operations. It's available for Windows, macOS, and Linux. NoSQL Workbench is a visual development tool that provides data modeling, data visualization, sample data generation, and query development features to help you design, create, query, and manage DynamoDB tables. With NoSQL Workbench for DynamoDB, you
similarity search score - 0.3186116
============================
similarity search info - key-value pairs or document storage. When you switch from a relational database management system to a NoSQL database system like DynamoDB, it's important to understand the key differences and specific design approaches.TopicsDifferences between relational data design and NoSQLTwo key concepts for NoSQL designApproaching NoSQL designNoSQL Workbench for DynamoDB Differences between relational data design and NoSQL
similarity search score - 0.3275382
============================
```

Now, what you see here are the top three results (thanks to -maxResults=3). Note that this is not an answer to our question. These are the results from our vector store that are semantically close to the query — the keyword here is semantic. Thanks to the vector store abstraction in langchaingo, we were able to easily ingest our source data into PostgreSQL and use the SimilaritySearch function to get the top N results corresponding to our query (see the semanticSearch function in query.go).

Note that (at the time of writing) the pgvector implementation in langchaingo uses the cosine distance vector operation, but pgvector also supports L2 and inner product; for details, refer to the pgvector documentation.
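Cosine distance, one minus the cosine similarity of two vectors, is easy to compute directly. This small Go sketch illustrates what the vector store does when ranking results; the vectors here are toy values, not real embeddings:

```go
package main

import (
	"fmt"
	"math"
)

// cosineDistance returns 1 - cosine similarity of two equal-length vectors.
// Smaller values mean the vectors (and hence the texts they embed) are
// more semantically similar.
func cosineDistance(a, b []float64) float64 {
	var dot, normA, normB float64
	for i := range a {
		dot += a[i] * b[i]
		normA += a[i] * a[i]
		normB += b[i] * b[i]
	}
	return 1 - dot/(math.Sqrt(normA)*math.Sqrt(normB))
}

func main() {
	query := []float64{1, 0, 0}
	docA := []float64{2, 0, 0} // same direction -> distance 0
	docB := []float64{0, 1, 0} // orthogonal -> distance 1
	fmt.Println(cosineDistance(query, docA)) // 0
	fmt.Println(cosineDistance(query, docB)) // 1
}
```

The "similarity search score" values in the output above are distances of this kind: the lower the score, the closer the match.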
Ok, so far we have:

1. Loaded vector data
2. Executed semantic search

This is the stepping stone to RAG (Retrieval Augmented Generation): let's see it in action!

Intelligent Search With RAG

To execute a RAG-based search, we run the same command as above (almost), only with a slight change in the action (rag_search):

```shell
export PG_HOST=localhost
export PG_USER=postgres
export PG_PASSWORD=postgres
export PG_DB=postgres

go run *.go -action=rag_search -query="what tools can I use to design dynamodb data models?" -maxResults=3
```

Here is the output I got (it might be slightly different in your case):

```
Based on the context provided, the NoSQL Workbench for DynamoDB is a tool that can be used to design DynamoDB data models. Some key points about NoSQL Workbench for DynamoDB:

- It is a cross-platform GUI application available for Windows, macOS, and Linux.
- It provides data modeling capabilities to help design and create DynamoDB tables.
- It allows you to build new data models or design models based on existing data models.
- It provides features like data visualization, sample data generation, and query development to manage DynamoDB tables.
- It helps in understanding the key differences and design approaches when moving from a relational database to a NoSQL database like DynamoDB.

So in summary, NoSQL Workbench for DynamoDB seems to be a useful tool specifically designed for modeling and working with DynamoDB data models.
```

As you can see, the result is not just about "Here are the top X responses for your query." Instead, it's a well-formulated response to the question. Let's peek behind the scenes again to see how it works. Unlike ingestion and semantic search, RAG-based search is not directly exposed by the langchaingo vector store implementation.
For this, we use a langchaingo chain, which takes care of the following:

1. Invokes semantic search
2. Combines the semantic search results with a prompt
3. Sends it to a Large Language Model (LLM), which in this case happens to be Claude on Amazon Bedrock

Here is what the chain looks like (refer to the function ragSearch in query.go):

```go
result, err := chains.Run(
    context.Background(),
    chains.NewRetrievalQAFromLLM(
        llm,
        vectorstores.ToRetriever(store, numOfResults),
    ),
    question,
    chains.WithMaxTokens(2048),
)
```

Let's Try Another One

This was just one example. I tried a different question and increased maxResults to 10, which means that the top 10 results from the vector database will be used to formulate the answer.

```shell
go run *.go -action=rag_search -query="how is NoSQL different from SQL?" -maxResults=10
```

The result (again, it might be different for you):

```
Based on the provided context, there are a few key differences between NoSQL databases like DynamoDB and relational database management systems (RDBMS):

1. Data Modeling:
- In RDBMS, data modeling is focused on flexibility and normalization without worrying much about performance implications. Query optimization doesn't significantly affect schema design.
- In NoSQL, data modeling is driven by the specific queries and access patterns required by the application. The data schema is designed to optimize the most common and important queries for speed and scalability.

2. Data Organization:
- RDBMS organizes data into tables with rows and columns, allowing flexible querying.
- NoSQL databases like DynamoDB use key-value pairs or document storage, where data is organized in a way that matches the queried data shape, improving query performance.

3. Query Patterns:
- In RDBMS, data can be queried flexibly, but queries can be relatively expensive and don't scale well for high-traffic situations.
- In NoSQL, data can be queried efficiently in a limited number of ways defined by the data model, while other queries may be expensive and slow.

4. Data Distribution:
- NoSQL databases like DynamoDB distribute data across partitions to scale horizontally, and the data keys are designed to evenly distribute the traffic across partitions, avoiding hot spots.
- The concept of "locality of reference," keeping related data together, is crucial for improving performance and reducing costs in NoSQL databases.

In summary, NoSQL databases prioritize specific query patterns and scalability over flexible querying, and the data modeling is tailored to these requirements, in contrast with RDBMS, where data modeling focuses on normalization and flexibility.
```

Where to "Go" From Here?

Learning by doing is a good approach. If you've followed along and executed the application thus far, great! I recommend you try out the following:

- langchaingo has support for lots of different models, including ones in Amazon Bedrock (e.g., Meta Llama 2, Cohere, etc.) — try tweaking the model and see if it makes a difference. Is the output better?
- What about the vector database? I demonstrated PostgreSQL, but langchaingo supports others as well (including OpenSearch, Chroma, etc.) - try swapping out the vector store and see how/if the search results differ.
- You probably get the gist, but you can also try out different embedding models. We used Amazon Titan, but langchaingo also supports many others, including the Cohere embed models in Amazon Bedrock.

Wrap Up

This was a simple example to help you better understand the individual steps in building RAG-based solutions. These might change a bit depending on the implementation, but the high-level ideas remain the same. I used langchaingo as the framework, but this doesn't mean you always have to use one. You could also remove the abstractions and call the LLM platforms' APIs directly if you need granular control in your applications or if the framework does not meet your requirements.
Like most of generative AI, this area is rapidly evolving, and I am optimistic that Go developers will have more and more options to build generative AI solutions. If you have feedback or questions, or you would like me to cover something else around this topic, feel free to comment below! Happy building!
Editor's Note: The following is an article written for and published in DZone's 2024 Trend Report, Enterprise AI: The Emerging Landscape of Knowledge Engineering.

Artificial intelligence (AI) has evolved from a futuristic idea into a fundamental aspect of contemporary software development. This evolution has introduced significant milestones, reshaping both our interactions with technology and the methodologies of software creation. This article delves into AI's impact on the realm of software development, highlighting how professionals can adapt to and thrive amidst these transformative changes.

Positive Impacts of AI on Developers' Jobs

AI excels in automating repetitive tasks, ranging from code generation to intricate testing and deployment procedures. Tools like Jenkins and Azure DevOps streamline deployments, enhancing reliability and efficiency, while AI-driven IDEs provide real-time code analysis and bug detection, elevating coding precision and speed. In addition, the advent of AI-assisted tools marks a significant advancement, improving not only coding but also project management.

Negative Impacts of AI on Developers' Jobs

Despite AI's benefits, there's apprehension over job displacement, with predictions suggesting a significant portion of programming roles may become automated. Additionally, the sophistication of AI systems introduces complexity and necessitates a higher level of expertise, potentially sidelining those without specialized knowledge in AI and machine learning (ML). Some AI tools are now capable of generating complex code structures, which may reduce the need for entry-level programming jobs. According to researchers from OpenAI and the University of Pennsylvania, it is predicted that 80% of the U.S. workforce could see an effect on at least 10% of their tasks. Furthermore, as AI systems become more sophisticated, the complexity in understanding and maintaining these systems increases.
For example, the development and maintenance of AI models in platforms like Google's TensorFlow or OpenAI's GPT-3 require specialized knowledge in ML, which is a skill set not all developers possess. Lastly, a heavy reliance on AI tools can lead to a scenario where developers may lack a deep understanding of the underlying code, leading to challenges in troubleshooting and customization.

The Challenge of Staying Up to Date

The fast-paced nature of AI advancements means that tools and techniques can quickly become outdated. For instance, ML frameworks are continuously updated, requiring developers to constantly learn new methodologies. This was evident when TensorFlow 2.0 was released with significant changes from its predecessor, requiring developers to adapt quickly. The need for continuous learning can be overwhelming, especially for developers who are already managing a full workload. The pace of change can lead to skill gaps, as seen in industries like finance and healthcare, where the adoption of AI has outpaced the workforce's ability to keep up with new technologies.

Balancing AI and Human Skills in Development

While AI is unparalleled in its ability to sift through and analyze extensive datasets, it's the human element — creativity, intuition, and ethical foresight — that propels truly innovative solutions. The realm of video gaming serves as a prime example of innovation through creativity, where AI assists in crafting intricate environments and behaviors. Yet it's the human touch that weaves the captivating storylines, character arcs, and the overall design, reflecting a deep understanding of narrative and emotional engagement. Finding the balance for ethical considerations and decision-making is imperative. Particularly in healthcare, AI's capacity to sift through patient data and recommend treatments is revolutionary.
However, it's the human practitioner's role to weigh these suggestions within an ethical framework and make the final call on patient care, ensuring that technology serves humanity's best interests.

AI: A Collaborative Companion, Not a Competitor

Viewing AI as an ally in the development process is crucial for leveraging its full potential without undermining the value of human expertise. For example:

- In cybersecurity, AI's efficiency in identifying threats is invaluable. Nonetheless, it's the human expert's critical thinking and contextual judgment that are irreplaceable in formulating an appropriate response to these threats.
- The advent of collaborative robots (cobots) in manufacturing illustrates the harmonious blend of AI's precision with human dexterity and adaptability, enhancing productivity and safety.

The Symbiotic Relationship Between AI and Human Intelligence

A collaboration between human intelligence and AI's capabilities offers a balanced approach to solving complex challenges, leveraging the strengths of both. In financial sectors, AI excels in processing and analyzing market data to unearth trends. Yet it's the nuanced interpretation and strategic decision-making by humans, considering broader economic and geopolitical factors, that drive impactful outcomes. Leading tech firms, including Google and IBM, underscore the necessity of human oversight in AI's evolution. This ensures that AI technologies not only advance in capabilities but also align with ethical standards and human values, fostering a tech ecosystem that respects and enhances human dignity and welfare. The integration of AI in software development is not about displacing human roles but enriching them. By valuing the unique contributions of human creativity, ethical judgment, and strategic thinking alongside AI's analytical prowess, we pave the way for a future where technology amplifies human potential, driving forward innovation in a manner that is both ethical and impactful.
Leveraging AI for Innovation

The role of AI in software development transcends mere efficiency improvements, acting as a pivotal force for innovation. AI empowers developers to extend the realms of feasibility, facilitating the creation of software solutions that are more advanced, intuitive, and impactful.

AI-Driven Creative Problem-Solving

AI's unparalleled data processing and analysis capabilities unlock novel approaches for creative problem-solving within software development. Take, for example, predictive analytics for enhanced consumer insights. In the e-commerce domain, AI algorithms predict consumer behavior, allowing businesses to customize their offerings. A notable illustration is Amazon's recommendation system, which leverages AI to analyze consumer interactions and tailor shopping experiences accordingly. Additionally, AI has significantly advanced natural language processing (NLP), enabling the development of user interfaces that mimic human conversation. Siri by Apple exemplifies this, utilizing NLP to interpret and respond to user inquiries in a conversational manner.

Pioneering New Software Solutions With AI

AI's application spans a diverse array of industries, driving the development of innovative software solutions. AI plays a crucial role in healthcare by enabling the early detection of diseases and personalizing medical treatments. Google's DeepMind, for instance, has developed algorithms capable of identifying eye diseases from retinal scans, marking a significant leap forward in medical diagnostics. In the fintech sector, AI-driven algorithms offer automated trading systems that meticulously analyze market data to execute trades strategically, optimizing financial outcomes.

Illustrative Case Studies of AI in Action

The integration of AI in real-world development projects showcases its potential to redefine industry standards. Table 1.
Case studies of AI in action Sector Example Automotive Tesla's Autopilot system exemplifies AI's capacity to innovate, employing ML to interpret sensor data for autonomous driving decisions. This represents a harmonious blend of AI's analytical prowess with advanced software engineering techniques. Entertainment Netflix leverages AI for content recommendation and optimization, analyzing viewer preferences to personalize content and guide original production decisions. This not only enhances the user experience but also optimizes content creation strategies. Retail operations Walmart's application of AI in managing inventory and enhancing customer service demonstrates its transformative impact. AI enables Walmart to adjust stock levels dynamically and offer personalized shopping experiences, showcasing the broad applicability and potential of AI across different market segments. Overcoming Challenges in AI Adoption The journey toward integrating AI into software development is fraught with unique challenges. Addressing these effectively demands a strategic focus on education, skill acquisition, and adherence to ethical standards. Bridging the Skills Divide Through Education and Training The swift evolution of AI technologies has precipitated a notable skills gap within the industry, necessitating a concerted effort toward continuous education and specialized training. This commitment to education may encompass engaging in specialized online courses, participating in workshops, and becoming actively involved in AI development communities to stay abreast of the latest trends and tools. Giants like IBM and Microsoft have forged alliances with academic institutions, offering AI and machine learning courses and certifications. These initiatives aim to arm developers with the expertise needed to harness AI technologies effectively. 
Meanwhile, Google has set a precedent with its internal AI training programs, ensuring its workforce remains at the forefront of AI advancements by familiarizing them with the latest tools and methodologies. The future will demand that developers blend AI proficiency with a broad spectrum of skills, including ethical considerations in AI, data science, and specialized industry knowledge. This holistic skill set will enable developers to leverage AI effectively across various application domains.

Simplifying AI Adoption Through Accessible Tools and Resources

The intricacies of AI tools and frameworks present a significant hurdle, particularly for newcomers to the field. Mastery over these technologies necessitates a considerable investment of time and resources. Efforts by companies with platforms such as Amazon SageMaker exemplify the industry's move toward simplifying AI application development. These platforms streamline the process of building, training, and deploying machine learning models, making AI more accessible. The open-source ecosystem also plays a pivotal role in democratizing AI adoption. Tools like TensorFlow and PyTorch are bolstered by extensive documentation and a supportive community, facilitating a smoother learning curve for developers.

Upholding Data Privacy and Security

In an era where AI systems frequently handle sensitive data, ensuring privacy and security is imperative. Adhering to stringent regulations such as GDPR and HIPAA is non-negotiable. IBM's AI ethics guidelines offer a blueprint for crafting AI solutions that honor privacy and security principles. The healthcare industry exemplifies the critical importance of data privacy, too. Firms like Epic Systems have integrated AI into their offerings while strictly complying with patient privacy regulations, setting a standard for ethical AI deployment.
Overcoming the hurdles associated with AI adoption in software development is an endeavor that extends beyond mere technical implementation. It encompasses a holistic approach involving educational outreach, simplification of technological complexities, and a steadfast commitment to ethical practices. By addressing these facets, the industry can pave the way for a future where AI augments development processes in a manner that is both responsible and inclusive.

The Future of AI in Development

The trajectory of AI in software development is set toward groundbreaking shifts, fueled by relentless technological advancements and broader AI integration across diverse sectors. This forward-looking perspective offers insights into potential developments and the opportunities they may unveil.

Emerging AI Trends and Future Directions

As AI becomes increasingly entrenched in software development, we stand on the cusp of significant innovations. AI-powered code analysis platforms illustrate the future of AI in enhancing code quality: these tools are set to extend beyond mere error detection to offer actionable recommendations for optimization, potentially setting new standards for coding efficiency and robustness. And in an era of evolving cyber threats, AI's capacity to preemptively identify and mitigate security risks will be indispensable. Future AI systems are expected to proactively counteract threats, offering a dynamic shield against cyber vulnerabilities.

The future of AI in software development is not merely an extension of its current state but a revolution in how we conceive, develop, and optimize software. As we look ahead, the integration of AI promises not only to streamline development processes but also to inspire innovations that were previously unimaginable. The key to thriving in this evolving landscape lies in embracing continuous learning and interdisciplinary expertise, ensuring developers remain at the forefront of this technological renaissance.
Conclusion

The integration of AI in software development marks a transformative era, bringing both unparalleled opportunities and significant challenges. As innovative, AI-driven solutions reshape the development landscape, it becomes imperative for developers to commit to continuous education in order to balance AI's advanced capabilities with the irreplaceable nuances of human creativity and ethical judgment. Embracing this AI-centric future means not just leveraging its power for efficiency and innovation, but also navigating its complexities with a focus on sustainable and responsible development. Ultimately, the synergy between human intellect and artificial intelligence will define the next frontier in software development, leading to a more efficient, creative, and ethically grounded technological future.

This is an excerpt from DZone's 2024 Trend Report, Enterprise AI: The Emerging Landscape of Knowledge Engineering.
Editor's Note: The following is an article written for and published in DZone's 2024 Trend Report, Enterprise AI: The Emerging Landscape of Knowledge Engineering.

AI continues to transform businesses, but this leads to enterprises facing new challenges in terms of digital transformation and organizational changes. Based on a 2023 Forbes report, those challenges can be summarized as follows:

- Companies whose analytical tech stacks are built around analytical/batch workloads need to start adapting to real-time data processing (Forbes). This change affects not only the way the data is collected, but it also leads to the need for new data processing and data analytics architectural models.
- AI regulations need to be considered as part of AI/ML architectural models. According to Forbes, "Gartner predicts that by 2025, regulations will force companies to focus on AI ethics, transparency, and privacy." Hence, those platforms will need to comply with upcoming standards.
- Specialized AI teams must be built, and they should be capable of not only building and maintaining AI platforms but also collaborating with other teams to support models' lifecycles through those platforms.

The answer to these new challenges seems to be MLOps, or machine learning operations. MLOps builds on top of DevOps and DataOps as an attempt to facilitate the development of machine learning (ML) applications and better manage the complexity of ML systems. The goal of this article is to provide a systematic overview of MLOps architectural challenges and demonstrate ways to manage that complexity.

MLOps Application: Setting Up the Use Case

For this article, our example use case is a financial institution that has been conducting macroeconomic forecasting and investment risk management for years. Currently, the forecasting process is based on partially manual loading and postprocessing of external macroeconomic data, followed by statistical modeling using various tools and scripts based on personal preferences.
However, according to the institution's management, this process is not acceptable due to recently announced banking regulations and security requirements. In addition, the delivery of calculated results is too slow and financially not acceptable compared to competitors in the market. Investment in a new digital solution requires a good understanding of the complexity and the expected cost. It should start with gathering requirements and subsequently building a minimum viable product.

Requirements Gathering

For solution architects, the design process starts with a specification of the problems that the new architecture needs to solve, for example:

- Manual data collection is slow, error prone, and requires a lot of effort
- Real-time data processing is not part of the current data loading approach
- There is no data versioning and, hence, reproducibility is not supported over time
- The model's code is triggered manually on local machines and constantly updated without versioning
- Data and code sharing via a common platform is completely missing
- The forecasting process is not represented as a business process; all the steps are distributed and unsynchronized, and most of them require manual effort
- Experiments with the data and models are not reproducible and not auditable
- Scalability is not supported in case of increased memory consumption or CPU-heavy operations
- Monitoring and auditing of the whole process are currently not supported

The following diagram demonstrates the four main components of the new architecture: monitoring and auditing platform, model deployment platform, model development platform, and data management platform.

Figure 1. MLOps architecture diagram

Platform Design Decisions

The two main strategies to consider when designing an MLOps platform are:

- Developing from scratch vs. selecting a platform
- Choosing between a cloud-based, on-premises, or hybrid model

Developing From Scratch vs. Choosing a Fully Packaged MLOps Platform

Building an MLOps platform from scratch is the most flexible solution. It would provide the possibility to solve any future needs of the company without depending on other companies and service providers. It would be a good choice if the company already has the required specialists and trained teams to design and build an ML platform. A prepackaged solution would be a good option to model a standard ML process that does not need many customizations. One option would even be to buy a pretrained model (e.g., model as a service), if available on the market, and build only the data loading, monitoring, and tracking modules around it. The disadvantage of this type of solution is that if new features need to be added, it might be hard to achieve those additions on time. Buying a platform as a black box often requires building additional components around it. An important criterion to consider when choosing a platform is the possibility to extend or customize it.

Cloud-Based, On-Premises, or Hybrid Deployment Model

Cloud-based solutions are already on the market, with popular options provided by AWS, Google, and Azure. In the case of no strict data privacy requirements and regulations, cloud-based solutions are a good choice due to the unlimited infrastructural resources for model training and model serving. An on-premises solution would be acceptable for very strict security requirements or if the infrastructure is already available within the company. The hybrid solution is an option for companies that already have part of the systems built but want to extend them with additional services, e.g., to buy a pretrained model and integrate it with the locally stored data or incorporate it into an existing business process model.
MLOps Architecture in Practice

The financial institution from our use case does not have enough specialists to build a professional MLOps platform from scratch, but it also does not want to invest in an end-to-end managed MLOps platform due to regulations and additional financial restrictions. The institution's architectural board has decided to adopt an open-source approach and buy tools only when needed. The architectural concept is built around the idea of developing minimalistic components and a composable system. The general idea is built around microservices covering nonfunctional requirements like scalability and availability. Striving for maximal simplicity of the system, the following decisions for the system components were made.

Data Management Platform

The data collection process will be fully automated. There will be a separate data loading component for each data source due to the heterogeneity of external data providers. The database choice is crucial when it comes to writing real-time data and reading large amounts of data. Due to the time-based nature of the macroeconomic data and the institution's already available relational database specialists, they chose the open-source database TimescaleDB. The possibility to provide a standard SQL-based API, perform data analytics, and conduct data transformations using standard relational database GUI clients will decrease the time to deliver a first prototype of the platform. Data versions and transformations can be tracked and saved into separate data versions or tables.

Model Development Platform

The model development process consists of four steps:

1. Data reading and transformation
2. Model training
3. Model serialization
4. Model packaging

Once the model is trained, the parametrized and trained instance is usually stored as a packaged artifact. The most common solution for code storage and versioning is Git.
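The four steps above can be sketched end to end in plain Python. This is a minimal illustration, not the institution's actual pipeline: the indicator data, the least-squares model, and the version string are all invented for the example.

```python
import pickle
import statistics

def read_and_transform(raw_rows):
    # Step 1: data reading and transformation - drop rows with missing values
    return [(x, y) for x, y in raw_rows if x is not None and y is not None]

def train(points):
    # Step 2: model training - ordinary least squares on a single indicator
    xs = [x for x, _ in points]
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(y for _, y in points)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in points)
             / sum((x - mean_x) ** 2 for x in xs))
    return {"slope": slope, "intercept": mean_y - slope * mean_x}

def serialize(model):
    # Step 3: model serialization - turn the trained instance into bytes
    return pickle.dumps(model)

def package(artifact, version):
    # Step 4: model packaging - pair the artifact with release metadata
    return {"artifact": artifact, "version": version}

raw = [(1, 2.1), (2, 3.9), (None, 5.0), (3, 6.2)]
model = train(read_and_transform(raw))
release = package(serialize(model), "1.0.0")
```

In the prototype described here, steps 2 through 4 would run inside the Git-based pipeline, and the packaged artifact would then be pushed to whichever model store the team settles on.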
Furthermore, the financial institution is already equipped with a solution like GitHub, providing functionality to define pipelines for building, packaging, and publishing the code. The architecture of Git-based systems usually relies on a set of distributed worker machines executing the pipelines. That option will be used as part of the minimalistic MLOps architectural prototype to also train the model. After training a model, the next step is to store it in a model repository as a released and versioned artifact. Storing the model in a database as a binary file, a shared file system, or even an artifacts repository are all acceptable options at that stage. Later, a model registry or a blob storage service could be incorporated into the pipeline. A model's API microservice will expose the model's functionality for macroeconomic projections.

Model Deployment Platform

The decision to keep the MLOps prototype as simple as possible applies to the deployment phase as well. The deployment model is based on a microservices architecture. Each model can be deployed using a Docker container as a stateless service and be scaled on demand. That principle applies to the data loading components, too. Once that first deployment step is achieved and the dependencies of all the microservices are clarified, a workflow engine might be needed for orchestrating the established business processes.

Model Monitoring and Auditing Platform

Traditional microservices architectures are already equipped with tools for gathering, storing, and monitoring log data. Tools like Prometheus, Kibana, and Elasticsearch are flexible enough for producing specific auditing and performance reports.

Open-Source MLOps Platforms

A minimalistic MLOps architecture is a good start for the initial digital transformation of a company. However, keeping track of available MLOps tools in parallel is crucial for the next design phase. The following table provides a summary of some of the most popular open-source tools.
Table 1. Open-source MLOps tools for initial digital transformations

- Kubeflow – Makes deployments of ML workflows on Kubernetes simple, portable, and scalable. Functional areas: tracking and versioning, pipeline orchestration, and model deployment.
- MLflow – An open-source platform for managing the end-to-end ML lifecycle. Functional areas: tracking and versioning.
- BentoML – An open standard and SDK for AI apps and inference pipelines; provides features like auto-generation of API servers, REST APIs, gRPC, and long-running inference jobs; and offers auto-generation of Docker container images. Functional areas: tracking and versioning, pipeline orchestration, model development, and model deployment.
- TensorFlow Extended (TFX) – A production-ready platform designed for deploying and managing ML pipelines; includes components for data validation, transformation, model analysis, and serving. Functional areas: model development, pipeline orchestration, and model deployment.
- Apache Airflow, Apache Beam – Flexible frameworks for defining and scheduling complex workflows, data workflows in particular, including ML. Functional areas: pipeline orchestration.

Summary

MLOps is often called DevOps for machine learning, and it is essentially a set of architectural patterns for ML applications. However, despite the similarities with many well-known architectures, the MLOps approach brings some new challenges for MLOps architects. On one side, the focus must be on the compatibility and composition of MLOps services. On the other side, AI regulations will force existing systems and services to constantly adapt to new regulatory rules and standards. I suspect that as the MLOps field continues to evolve, a new type of service providing AI ethics and regulatory analytics will soon become the focus of businesses in the ML domain.
In today's digital age, data has become the cornerstone of decision-making across various domains, from business and healthcare to education and government. The ability to collect, analyze, and derive insights from data has transformed how organizations operate, offering unprecedented opportunities for innovation, efficiency, and growth.

What Is a Data-Driven Approach?

A data-driven approach is a methodology that relies on data analysis and interpretation to guide decision-making and strategy development. This approach encompasses a range of techniques, including data collection, storage, analysis, visualization, and interpretation, all aimed at harnessing the power of data to drive organizational success. Key principles include:

- Data collection – Gathering relevant data from diverse sources is foundational to ensuring its quality and relevance for subsequent analysis.
- Data analysis – Processing and analyzing collected data using statistical and machine learning (ML) techniques reveals valuable insights for informed decision-making.
- Data visualization – Representing insights visually through charts and graphs facilitates understanding and aids decision-makers in recognizing trends and patterns.
- Data-driven decision-making – Integrating data insights into decision-making processes across all levels of an organization enhances risk management and process optimization.
- Continuous improvement – Embracing a culture of ongoing data collection, analysis, and action fosters innovation and adaptation to changing environments.

Data Integration Strategies Using AI

Data integration combines data from various sources for a unified view. Artificial intelligence (AI) improves integration by automating tasks, boosting accuracy, and managing diverse data volumes.
Here are the top four data integration strategies/patterns using AI:

- Automated data matching and merging – AI algorithms, such as ML and natural language processing (NLP), can match and automatically merge data from disparate sources.
- Real-time data integration – AI technologies, such as stream processing and event-driven architectures, can facilitate real-time data integration by continuously ingesting, processing, and integrating data as it becomes available.
- Schema mapping and transformation – AI-driven tools can automate the process of mapping and transforming data schemas from different formats or structures. This includes converting data between relational databases, NoSQL databases, and other data formats, plus handling schema evolution over time.
- Knowledge graphs and graph-based integration – AI can build and query knowledge graphs representing relationships between entities and concepts. Knowledge graphs enable flexible and semantic-driven data integration by capturing rich contextual information and supporting complex queries across heterogeneous data sources.

Data integration is the backbone of modern data management strategies, which are pivotal in providing organizations with a comprehensive understanding of their data landscape. Data integration ensures a cohesive and unified view of organizational data assets by seamlessly combining data from disparate sources, such as databases, applications, and systems. One of the primary benefits of data integration is its ability to enhance data quality. By consolidating data from multiple sources, organizations can identify and rectify inconsistencies, errors, and redundancies, thus improving their data's accuracy and reliability. This, in turn, empowers decision-makers to make informed choices based on trustworthy information. Let's look closely at how we can utilize generative AI for data-related processes.
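As a toy illustration of the first pattern above, the sketch below fuzzy-matches records from two sources on a name field and merges the hits. A real AI-driven pipeline would use learned embeddings or an NLP model rather than plain string similarity; the record fields and the 0.8 threshold here are invented for the example.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    # Normalized string similarity in [0, 1], ignoring case and padding
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def match_and_merge(source_a, source_b, threshold=0.8):
    # Pair each record in source_a with its closest match in source_b,
    # then merge the two records when the match is confident enough.
    merged = []
    for rec_a in source_a:
        best = max(source_b, key=lambda rec_b: similarity(rec_a["name"], rec_b["name"]))
        if similarity(rec_a["name"], best["name"]) >= threshold:
            merged.append({**best, **rec_a})  # rec_a wins on conflicting fields
    return merged

crm = [{"name": "Acme Corp.", "city": "Berlin"}]
billing = [{"name": "ACME Corp", "vat": "DE123"}, {"name": "Globex", "vat": "DE999"}]
unified = match_and_merge(crm, billing)  # one merged Acme record
```

The merge policy (`rec_a` overriding `best`) is a deliberate design choice: when sources disagree on a shared field, one system of record has to win.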
Exploring the Impact of Generative AI on Data-Related Processes

Generative AI has revolutionized various industries and data-related processes in recent years. Generative AI encompasses a wide array of methodologies, spanning generative adversarial networks (GANs) and variational autoencoders (VAEs) to transformer-based models such as GPT (generative pre-trained transformer). These algorithms showcase impressive abilities in producing lifelike images, text, audio, and even videos, which closely emulate human creativity through generating fresh data samples.

Using Generative AI for Enhanced Data Integration

Now, we've come to the practical part of the role of generative AI in enhanced data integration. Below, I've provided some real-world scenarios. This will bring more clarity to AI's role in data integration.

Table 1. Real-world use cases

- Healthcare/image recognition – Generating synthetic medical images for data augmentation in deep learning models: using GANs to create realistic medical images, supplementing limited training data, enhancing the performance of image recognition algorithms, and facilitating tasks like disease diagnosis and medical imaging analysis.
- E-commerce – Automating schema mapping and transformation for product catalog integration: leveraging generative AI techniques, automatically aligning product attributes and specifications from various vendors, creating a unified schema, facilitating seamless integration of product catalogs, and enhancing the shopping experience for customers on e-commerce platforms.
- Social media – Utilizing NLP models to extract metadata from user-generated content: analyzing text-based content, including social media posts or comments; extracting valuable metadata such as sentiment, topic, and user preferences; integrating extracted metadata into recommendation systems; personalizing content delivery based on user preferences; and enhancing user engagement on social media platforms through personalized recommendations.
- Cybersecurity – Using generative AI to detect network traffic anomalies: training on synthetic data resembling real-world patterns, enhancing cybersecurity against threats, and improving intrusion detection and response.
- Financial services – Integrating diverse market data in real time: using generative AI to aggregate data from various sources, enabling informed decisions and trade execution, continuously updating strategies for changing market conditions, and improving investment outcomes and risk management.

Ensuring Data Accuracy and Consistency Using AI and ML

Organizations struggle to maintain accurate and reliable data in today's data-driven world. AI and ML help detect anomalies, identify errors, and automate cleaning processes. Let's look into those patterns a bit closer.

Validation and Data Cleansing

Data validation and cleansing are often laborious tasks, requiring significant time and resources. AI-powered tools streamline and speed up these processes. ML algorithms learn from past data to automatically identify and fix common quality issues. They can standardize formats, fill in missing values, and reconcile inconsistencies. Automating these tasks reduces errors and speeds up data preparation.

Uncovering Patterns and Insights

AI and ML algorithms can uncover hidden patterns, trends, and correlations within datasets. By analyzing vast amounts of data, these algorithms can identify relationships that may not be apparent to human analysts. AI and ML can also understand the underlying causes of data quality issues and develop strategies to address them. For example, ML algorithms can identify common errors or patterns contributing to data inconsistencies. Organizations can then implement new processes to improve data collection, enhance data entry guidelines, or identify employee training needs.
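The cleansing tasks described above, standardizing formats and filling in missing values, can be sketched with plain rules. This is a hand-rolled stand-in: ML-powered tools would learn such rules from past data, and the date formats and median-imputation choice here are illustrative assumptions.

```python
import statistics
from datetime import datetime

def standardize_date(value):
    # Reconcile the formats seen across sources into a single ISO form.
    for fmt in ("%Y-%m-%d", "%d/%m/%Y", "%b %d, %Y"):
        try:
            return datetime.strptime(value, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None  # unrecognized format, flag for manual review

def fill_missing(values):
    # Impute missing numeric readings with the median of the observed ones.
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]

dates = [standardize_date(d) for d in ["2024-01-05", "05/01/2024", "Jan 5, 2024"]]
amounts = fill_missing([10.0, None, 14.0, 12.0])
```

All three date spellings normalize to the same ISO string, and the gap in the numeric series is filled with the median of the observed values.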
Detecting Anomalies in Data

ML models excel at detecting patterns, including deviations from norms. With ML, organizations can analyze large volumes of data, compare them against established patterns, and flag potential issues. Organizations can then identify anomalies and determine how to correct, update, or augment their data to ensure its integrity. Let's have a look at services that can validate data and detect anomalies.

Detecting Anomalies Using Stream Analytics

Azure Stream Analytics, AWS Kinesis, and Google Cloud Dataflow are examples of tools that provide built-in anomaly detection capabilities, both in the cloud and at the edge, enabling vendor-neutral solutions. These platforms offer various functions and operators for anomaly detection, allowing users to monitor anomalies, including temporary and persistent ones. For example, based on my experience building validation using Stream Analytics, here are several key points to consider:

- The model's accuracy improves with more data in the sliding window; data within the timeframe is treated as expected.
- It focuses on event history in the window to spot anomalies, discarding old values as the window moves.
- Functions establish a normal baseline by comparing past data and identifying outliers within a confidence level.
- Set the window size based on the minimum number of events needed for practical training.
- Response time increases with history size, so include only the necessary events for better performance.

Based on ML, you can monitor temporary anomalies like spikes and dips in a time series event stream using the AnomalyDetection_SpikeAndDip operator.
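As a rough illustration of the sliding-window mechanics, the sketch below scores each event against a window of recent history using a z-score. The window size, minimum-history rule, and threshold are invented for the example and are not the operator's actual algorithm, which is configured via a confidence level and window duration instead.

```python
import statistics
from collections import deque

def detect_spikes_and_dips(stream, window=20, min_history=5, z_threshold=3.0):
    # Score each event against the history in a sliding window; old values
    # are discarded automatically as the window moves forward.
    history = deque(maxlen=window)
    anomalies = []
    for t, value in enumerate(stream):
        if len(history) >= min_history:  # minimum events needed for training
            mean = statistics.mean(history)
            spread = statistics.pstdev(history) or 1e-9
            if abs(value - mean) / spread > z_threshold:
                anomalies.append((t, value))
        history.append(value)
    return anomalies

readings = [10, 11, 10, 12, 11, 10, 11, 95, 10, 11]
flagged = detect_spikes_and_dips(readings)  # flags the spike at index 7
```

Raising `z_threshold` plays roughly the role of a higher confidence interval: fewer events qualify as anomalies, which reduces alert volume.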
If a second spike within the same sliding window is smaller than the first, its score might not be significant enough compared to the first spike within the specified confidence level. To address this, consider adjusting the model's confidence level. However, if you receive too many alerts, use a higher confidence interval.

Leveraging Generative AI for Data Transformation and Augmentation

Generative AI helps with data augmentation and transformation, which are also part of the data validation process. Generative models can generate synthetic data that resembles actual data samples. This can be particularly useful when the available dataset is small or lacks diversity. Generative models can also be trained to translate data from one domain to another, or to transform data while preserving its underlying characteristics. For example, sequence-to-sequence models like transformers can be used in NLP for tasks such as language translation or text summarization, effectively transforming the input data into a different representation.

The data transformation process can also be used to solve problems in legacy systems built on an old codebase. Organizations can unlock numerous benefits by transitioning to modern programming languages. For instance, legacy systems are often built on outdated programming languages such as COBOL, Lisp, and Fortran. To modernize them and enhance their performance, we must migrate or rewrite them using modern, high-performance programming languages like Python, C#, or Go. Let's look at the diagram below to see how generative AI can be used to facilitate this migration process:

Figure 1. Using generative AI to rewrite legacy code

The architecture above is based on the following components and workflow: Azure Data Factory is the main ETL (extract, transform, load) service for data orchestration and transformation. It connects to the source Git repositories.
Alternatively, we can use AWS Glue for data integration and Google Cloud Data Fusion for ETL operations. OpenAI is the generative AI service used to transform COBOL and C++ to Python, C#, and Golang (or any other language). The OpenAI service is connected to Data Factory. Alternatives to OpenAI are Amazon SageMaker or Google Cloud AI Platform. Azure Logic Apps and Google Cloud Functions are utility services that provide data mapping and file management capabilities. DevOps CI/CD provides pipelines to validate, compile, and interpret the generated code.

Data Validation and AI: Chatbot Call Center Use Case

An automated call center setup is a great use case to demonstrate data validation. The following example provides an automation and database solution for call centers:

Figure 2. Call center chatbot architecture

The automation and database solution extracts data from the speech bot deployed in call centers or from interactions with real people. It then stores, analyzes, and validates this data using OpenAI's ChatGPT and an AI sentiment analysis service. Subsequently, the analyzed data is visualized using business intelligence (BI) dashboards for comprehensive insights. The processed information is also integrated into customer relationship management (CRM) systems for human validation and further action. The solution ensures accurate understanding and interpretation of customer interactions by leveraging ChatGPT, an advanced NLP model. Using BI dashboards offers intuitive and interactive data visualization capabilities, allowing stakeholders to gain actionable insights at a glance. Integrating the analyzed data into CRM systems enables seamless collaboration between automated analysis and human validation.

Conclusion

In the ever-evolving landscape of enterprise AI, achieving data excellence is crucial. Data and generative AI services that provide data analysis, ETL, and NLP enable robust integration strategies for unlocking the full potential of data assets.
By combining data-driven approaches and advanced technologies, businesses can pave the way for enhanced decision-making, productivity, and innovation through these AI and data services.
Editor's Note: The following is an article written for and published in DZone's 2024 Trend Report, Enterprise AI: The Emerging Landscape of Knowledge Engineering.

Generative AI, a subset of artificial intelligence (AI), stands as a transformative technology. Leveraging deep learning models, it exhibits a unique ability to interpret inputs spanning text, image, audio, video, or code, and to seamlessly generate novel content across various modalities. This innovation has broad applications, ranging from turning textual inputs into visual representations to transforming videos into textual narratives. Its proficiency lies in its capacity to generate high-quality, contextually relevant outputs, a testament to its potential in reshaping content creation. An example is shown in Figure 1, an application of generative AI in which a text prompt has been converted to an image.

Figure 1. DALL·E 2 generates an image from a text prompt

Journey of Generative AI

The fascinating journey of AI started a couple of centuries back, and Table 1 below highlights the key milestones in the evolution of generative AI, covering significant launches and advancements over the years:

Table 1.
Key milestones in the evolution of generative AI

- 1805: First neural network (NN)/linear regression
- 1925: First recurrent neural network (RNN) architecture
- 1958: Multi-layer perceptron (no deep learning)
- 1965: First deep learning
- 1972: Published artificial RNNs
- 1980: Release of autoencoders
- 1986: Invention of backpropagation
- 1990: Introduction of GAN/Curiosity
- 1995: Release of LeNet-5
- 1997: Introduction of LSTM
- 2014: Variational autoencoder, GAN, GRU
- 2017: Transformers
- 2018: GPT, BERT
- 2021: DALL·E
- 2022: Latent diffusion, DALL·E 2, Midjourney, Stable Diffusion, ChatGPT, AudioLM
- 2023: GPT-4, Falcon, Bard, MusicGen, AutoGPT, LongNet, Voicebox, LLaMA
- 2024: Sora, Stable Cascade

Generative AI Across Modalities

Generative AI spans various modalities, as listed in Table 2 below, showcasing its versatile capabilities:

Table 2. Generative AI modalities and major open-source tools

- Text: OpenAI GPT, Transformer models (TensorFlow, PyTorch), BERT (Google)
- Code: CodeT5, PolyCoder
- Image: StyleGAN (NVlabs), DALL·E (OpenAI), CycleGAN (junyanz), BigGAN (Google), Stable Diffusion, StableStudio, Waifu Diffusion
- Audio: WaveNet (DeepMind), Tacotron 2 (Google), MelGAN (descriptinc)
- 3D object: 3D-GANs, PyTorch3D
- Video: Video Generation with GANs, Temporal Generative Adversarial Nets (TGANs)

How Does Generative AI Work?

Generative AI relies on pathbreaking models, including transformer models, generative adversarial networks, and variational autoencoders, to realize its full potential.

The Transformer Model

The transformer architecture relies on a self-attention mechanism, discarding the sequential processing constraints found in recurrent neural networks. The model's attention mechanism allows it to weigh input tokens differently, enabling the capture of long-range dependencies and improving parallelization during training. Transformers consist of an encoder-decoder structure, with multiple layers of self-attention and feedforward sub-layers.
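To make the self-attention mechanism concrete, here is a minimal NumPy sketch of single-head scaled dot-product attention. The dimensions and random weight matrices are purely illustrative assumptions, not taken from any real model:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.

    X: (seq_len, d_model) matrix of token embeddings.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    # Each token scores every other token; scaling by sqrt(d_k) stabilizes gradients.
    scores = Q @ K.T / np.sqrt(d_k)
    # Rows of the attention matrix are probability distributions over positions.
    weights = softmax(scores, axis=-1)
    return weights @ V, weights

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out, attn = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # one contextualized vector per input token
```

Real transformers stack many such heads and interleave them with feedforward sub-layers; GPT-style decoders additionally mask the attention matrix so each token only attends to earlier positions.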
Models like OpenAI's GPT series utilize transformer architectures for autoregressive language modeling, where each token is generated based on the preceding context. Self-attention's ability to handle long-range context dependencies effectively results in coherent and contextually relevant sequences, making transformers a cornerstone in the development of large language models (LLMs) for diverse generative applications like machine translation, text summarization, question answering, and text generation.

Figure 2. Transformer architecture

Generative Adversarial Networks

Comprising two neural networks, namely the discriminator and the generator, generative adversarial networks (GANs) operate through adversarial training to achieve unparalleled results in unsupervised learning. The generator, driven by random noise, endeavors to deceive the discriminator, which, in turn, aims to accurately distinguish between genuine and artificially produced data. This competitive interaction propels both networks toward continuous improvement, generating realistic and high-quality samples. GANs find versatility in a myriad of applications, notably image synthesis, style transfer, and text-to-image synthesis.

Variational Autoencoders

Variational autoencoders (VAEs) are designed to capture and learn the underlying probability distribution of input data, enabling them to generate new samples that share similar characteristics. The architecture of a VAE consists of an encoder network, responsible for mapping input data to a latent space, and a decoder network, which reconstructs the input data from the latent space representation. A key feature of VAEs lies in their ability to model the uncertainty inherent in the data by learning a probabilistic distribution in the latent space. This is achieved through the introduction of a variational inference framework, which incorporates a probabilistic sampling process during training.
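The probabilistic sampling step described above is commonly implemented with the reparameterization trick. The following NumPy sketch illustrates the idea with a placeholder encoder; in a real VAE, the encoder and decoder are trained neural networks, and all names and dimensions here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)

def encode(x):
    # Placeholder encoder: in a real VAE, a neural network predicts
    # the mean and log-variance of the latent distribution q(z|x).
    mu = x.mean() * np.ones(2)
    log_var = np.zeros(2)  # unit variance, for illustration only
    return mu, log_var

def reparameterize(mu, log_var):
    # z = mu + sigma * eps keeps the sampling path differentiable
    # with respect to mu and sigma, which makes training possible.
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_divergence(mu, log_var):
    # KL(q(z|x) || N(0, I)): the regularizer in the VAE loss that
    # pushes the latent distribution toward a standard Gaussian.
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var))

x = rng.normal(size=8)
mu, log_var = encode(x)
z = reparameterize(mu, log_var)  # latent sample fed to the decoder
print(z.shape, kl_divergence(mu, log_var) >= 0)
```

The full training objective adds a reconstruction term from the decoder; sampling new data then amounts to drawing z from N(0, I) and decoding it.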
Their applications span various domains, including image and text generation, and data representation learning in complex high-dimensional spaces.

Figure 3. Q/A generation from image

The State of the Art

Generative AI, with its disruptive innovation, leaves a profound impact across industries.

Generative Use Cases and Applications

Generative AI exhibits a broad range of applications across various industries, revolutionizing processes and fostering innovation. Table 3 showcases how it is reshaping various industries:

Table 3. Applications of generative AI across industries

- Healthcare: Medical image generation and analysis, drug discovery, personalized treatment plans
- Finance: Personalized risk assessment and financial advice, compliance monitoring
- Marketing: Content creation, ad copy generation, personalized marketing campaigns
- Manufacturing: 3D model generation for product design
- Retail: Personalized product recommendations, virtual try-on experiences
- Education: Adaptive learning materials, content generation for e-learning platforms
- Legal: Document summarization, contract drafting, legal research assistance
- Entertainment: Scriptwriting assistance, video game content generation, music composition
- Human resources: Employee training content generation

The Business Benefits

Generative AI offers a myriad of business benefits, including the amplification of creative capabilities, empowering enterprises to autonomously produce expansive and innovative content. It creates significant time and cost efficiencies by automating tasks that previously required human intervention. Hyper-personalized experiences are achieved by leveraging customer data to generate recommendations and offers tailored to individual preferences. Furthermore, generative AI enhances operational efficiency by automating intricate processes, optimizing workflows, and facilitating realistic simulations for training and entertainment.
The technology's adaptive learning capabilities allow continuous improvement based on feedback and new data, culminating in refined performance over time. Lastly, generative AI elevates customer interaction with dynamic AI agents capable of providing responses that mimic human conversation, contributing to an enhanced customer experience.

Managing the Risks of Generative AI

Effectively managing the risks associated with the widespread adoption of generative AI is crucial as this technology transforms various business aspects. Ethical guidelines focused on accuracy, safety, honesty, empowerment, and sustainability provide a framework for responsible AI development. Integrating generative AI requires using reliable data, ensuring transparency, and maintaining a human-in-the-loop approach. Ongoing testing, oversight, and feedback mechanisms are essential to prevent unintended consequences.

Generative AI for Enterprises

This section delves into the key methodologies for enterprises to make a transformative leap in innovation and productivity.

Build Foundation Models

Foundation models (FMs) like BERT and GPT are trained on extensive, generalized, and unlabeled datasets, enabling them to excel in diverse tasks, including language understanding, text and image generation, and natural language conversation. These FMs serve as base models for specialized downstream applications, evolving over a decade to handle increasingly complex tasks. The ability to continually learn from data inputs during inference enhances their effectiveness, supporting tasks like language processing, visual comprehension, code generation, human-centered engagement, and speech-to-text applications.

Figure 4. Foundation model

Bring your own model (BYOM) is a commitment to amplifying the platform's versatility, fostering a collaborative environment, and propelling a new era of AI innovation.
BYOM's promise lies in the freedom to innovate, offering a personalized approach to AI solutions that align with individual visions. Improving an existing model involves a multifaceted approach, encompassing fine-tuning, dataset augmentation, and architectural enhancements.

Fine-Tuning

While pre-trained language models offer the advantage of being trained on massive datasets and generating text akin to human language, they may not always deliver optimal performance in specific applications or domains. Fine-tuning involves updating pre-trained models with new information or data, allowing them to adapt to particular tasks or domains. Fine-tuning pre-trained models is crucial for achieving high accuracy and relevance in generated outputs, especially when dealing with specific and nuanced tasks within various domains.

Reinforcement Learning From Human Feedback

The primary objective of reinforcement learning from human feedback (RLHF) is to leverage human feedback to enhance the efficiency and accuracy of ML models, specifically those employing reinforcement learning methodologies to maximize rewards. The RLHF process involves stages such as data collection, supervised fine-tuning of a language model, building a separate reward model, and optimizing the language model with the reward-based model.

Retrieval-Augmented Generation

LLMs are instrumental in tasks like question answering and language translation. However, inherent challenges, such as potential inaccuracies and the static nature of training data, can impact reliability and user trust. Retrieval-augmented generation (RAG) addresses these issues by seamlessly integrating domain-specific or organizational knowledge into LLMs, enhancing their relevance, accuracy, and utility without necessitating retraining.

Figure 5. Retrieval-augmented generation

The Tech Stack

The LLMOps tech stack encompasses five key areas. The table below exhibits the key components of the five tech stack areas:

Table 4.
LLMOps tech stack components

- Data management: Data storage and retrieval, data processing, quality control, data distribution
- Model management: Hosting the model, model testing, version control and model tracking, model training and fine-tuning
- Model deployment: Frameworks, event-driven architecture
- Prompt engineering and optimization: Prompt development and testing, prompt analysis, prompt versioning, prompt chaining and orchestration
- Monitoring and logging: Performance monitoring, logging

Performance Evaluation

Quantitative methods offer objective metrics, utilizing scores like inception score, Fréchet inception distance, or precision and recall for distributions to quantitatively measure the alignment between generated and real data distributions. Qualitative methods delve into visual and auditory inspection, employing techniques like visual inspection, pairwise comparison, or preference ranking to gauge the realism, coherence, and appeal of generated data. Hybrid methods integrate both quantitative and qualitative approaches, such as human-in-the-loop evaluation, adversarial evaluation, or Turing tests.

What's Next? The Future of Generative AI

Looking at the future of generative AI, three transformative avenues stand prominently on the horizon.

The Genesis of Artificial General Intelligence

The advent of artificial general intelligence (AGI) heralds a transformative era. AGI aims to surpass current AI limitations, allowing systems to excel in tasks beyond predefined domains. It distinguishes itself through autonomous self-control, self-understanding, and the ability to acquire new skills akin to human problem-solving capacities. This juncture marks a critical moment in the pursuit of AGI, envisioning a future where AI systems possess generalized human cognitive abilities and transcend current technological limitations.

Integrating Perceptual Systems Through Human Senses

Sensory AI stands at the forefront of generative AI evolution.
Beyond computer vision, sensory AI encompasses touch, smell, and taste, aiming for a nuanced, human-like understanding of the world. The emphasis on diverse sensory inputs, including tactile sensing, olfactory AI, and gustatory AI, signifies a move toward human-like interaction and recognition capabilities.

Computational Consciousness Modeling

Focused on attributes like fairness, empathy, and transparency, computational consciousness modeling (CoCoMo) employs consciousness modeling, reinforcement learning, and prompt template formulation to instill knowledge and compassion in AI agents. CoCoMo guides generative AI toward a future where ethical and emotional dimensions seamlessly coexist with computational capabilities, fostering responsible and empathetic AI agents.

Parting Thoughts

This article covered generative AI from its foundational concepts to its diverse applications across modalities, and delved into its mechanisms, highlighting the power of the transformer model and the creativity of GANs and VAEs. The journey encompassed business benefits, risk management, and a forward-looking perspective on unprecedented advancements and the potential emergence of AGI, sensory AI, and artificial consciousness. Finally, readers are encouraged to contemplate the future implications and ethical dimensions of generative AI, acknowledging a transformative journey that presents both opportunities and responsibilities in integrating generative AI into our daily lives.

Repositories:
- A curated list of modern Generative Artificial Intelligence projects and services
- Home of CodeT5: Open Code LLMs for Code Understanding and Generation
- StableStudio
- GAN-based Mel-Spectrogram Inversion Network for Text-to-Speech Synthesis

This is an excerpt from DZone's 2024 Trend Report, Enterprise AI: The Emerging Landscape of Knowledge Engineering.
Tuhin Chattopadhyay, CEO at Tuhin AI Advisory and Professor of Practice, JAGSoM
Yifei Wang, Senior Machine Learning Engineer, Meta
Austin Gil, Developer Advocate, Akamai
Tim Spann, Principal Developer Advocate, Zilliz