Posts

Showing posts from March, 2025

Experiment Framework: A Guide to Structuring Your AI Experiments

Introduction

Running AI experiments effectively requires a structured approach to ensure reproducibility, proper logging, and efficient result analysis. This tutorial provides a well-organized Python framework to conduct experiments smoothly by handling configuration, data loading, training, and exporting results.

Why Use a Structured Experiment Framework?

Without a structured framework, AI experiments can become disorganized, leading to difficulties in tracking parameters, reproducing results, and analyzing performance. A well-structured approach ensures:

Consistency: Using fixed configurations and seeds for reproducibility.
Automation: Automating training, evaluation, and result exporting.
Scalability: Easy adaptation for different datasets and models.
Efficiency: Reducing redundant code and saving results systematically.

Implementing the Experiment Framework

Configuration and Setup

We start by defining a configuration class to manage experim...
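A minimal sketch of such a configuration class, assuming a plain Python dataclass with illustrative fields (experiment name, seed, batch size, learning rate, output directory, these names are assumptions, not the post's actual code), might look like this:

# Illustrative sketch of an experiment configuration class; field names and
# defaults are assumptions, not the tutorial's actual code.
import json
import random
from dataclasses import dataclass, asdict
from pathlib import Path

@dataclass
class ExperimentConfig:
    experiment_name: str = "baseline"
    seed: int = 42                  # fixed seed for reproducibility
    batch_size: int = 32
    learning_rate: float = 1e-3
    num_epochs: int = 10
    output_dir: str = "results"

    def apply_seed(self) -> None:
        # Seed the standard RNG; NumPy/PyTorch seeds would be set here as well.
        random.seed(self.seed)

    def save(self) -> None:
        # Persist the configuration next to the results for later reproduction.
        out = Path(self.output_dir) / self.experiment_name
        out.mkdir(parents=True, exist_ok=True)
        (out / "config.json").write_text(json.dumps(asdict(self), indent=2))

if __name__ == "__main__":
    config = ExperimentConfig(experiment_name="demo", num_epochs=3)
    config.apply_seed()
    config.save()
    print(f"Running {config.experiment_name} with seed {config.seed}")

Saving the configuration alongside the results is what makes a run reproducible later: the same JSON file can be reloaded to repeat the experiment with identical settings.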

How to design and run a statistical data analysis?

Designing and running a statistical data analysis involves a structured process to ensure valid, reliable, and actionable results. Below is a step-by-step guide:

1. Define the Research Question or Objective

Purpose: Clearly articulate what you want to investigate or achieve (e.g., "Does a new drug reduce blood pressure compared to a placebo?").
Specificity: Ensure the question is specific, measurable, and feasible.
Hypotheses: Formulate a null hypothesis (H₀, no effect) and an alternative hypothesis (H₁, effect exists).

2. Determine the Study Design

Type of Study:
Experimental: Manipulate variables (e.g., randomized controlled trials).
Observational: Observe without intervention (e.g., cohort, case-control, cross-sectional).

Variables: Identify dependent variables (outcomes) and independent variables (predictors). Consider confounding variables that might affect results.

Population and Sampling: Define the target population. Choose a sampling...
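To make the hypothesis-testing step concrete, here is a small sketch of the blood-pressure example above using a two-sample t-test on simulated data; the numbers, group sizes, and choice of test are illustrative assumptions, not part of the guide itself:

# Illustrative hypothesis test: drug vs. placebo on simulated blood-pressure data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)

# Simulated change in blood pressure (mmHg); negative values mean a reduction.
placebo = rng.normal(loc=-2.0, scale=8.0, size=50)
drug = rng.normal(loc=-8.0, scale=8.0, size=50)

# H0: the mean change is the same in both groups; H1: the means differ.
t_stat, p_value = stats.ttest_ind(drug, placebo, equal_var=False)

print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Reject H0: the drug's effect differs from placebo at the 5% level.")
else:
    print("Fail to reject H0: no significant difference detected.")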

Creating a Dataset for an ML Project with Batch Loading

Machine learning (ML) projects often involve working with large datasets that cannot fit into memory all at once. To efficiently train models, datasets are divided into smaller, manageable batches. This article explains how to create datasets for ML projects, configure batch sizes, access data samples, and understand the importance of batch loading and iterative learning.

Understanding Batch Loading in ML

Why Use Batches?

Memory Efficiency: Loading the entire dataset at once can exceed memory limits, especially with large datasets.
Computational Optimization: Training with batches allows the use of parallel processing on GPUs.
Faster Convergence: Stochastic and mini-batch gradient descent can lead to faster and more stable convergence compared to full-batch training.
Generalization Improvement: Training on different batches helps models generalize better and avoid overfitting.

Configuring Batch Size

The batch size de...
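One common way to implement batch loading is PyTorch's Dataset and DataLoader classes; the sketch below uses synthetic data and an arbitrary batch size of 32, and the article itself may use a different library or dataset:

# Minimal batch-loading sketch with PyTorch; the data is synthetic and the
# batch size is an arbitrary illustrative choice.
import torch
from torch.utils.data import Dataset, DataLoader

class ArrayDataset(Dataset):
    """Wraps feature/label tensors so they can be served in mini-batches."""

    def __init__(self, features: torch.Tensor, labels: torch.Tensor):
        self.features = features
        self.labels = labels

    def __len__(self) -> int:
        return len(self.features)

    def __getitem__(self, idx: int):
        return self.features[idx], self.labels[idx]

if __name__ == "__main__":
    features = torch.randn(1000, 20)          # 1,000 samples, 20 features each
    labels = torch.randint(0, 2, (1000,))     # binary labels

    dataset = ArrayDataset(features, labels)
    loader = DataLoader(dataset, batch_size=32, shuffle=True)

    # Each iteration yields one mini-batch, so the full dataset never has to
    # sit in memory or be processed in a single step.
    for batch_features, batch_labels in loader:
        print(batch_features.shape, batch_labels.shape)
        break

Because the DataLoader shuffles and slices the dataset on demand, the same pattern scales from toy tensors to datasets streamed from disk.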

Hello world!

The story of "Hello, World!" is deeply tied to the history of programming and computer science education. Here's a quick rundown of its origins and significance:

1. Origins in Early Programming

The phrase "Hello, World!" first appeared in programming literature in the 1970s. It was popularized by Brian Kernighan in his book The C Programming Language (1978), co-authored with Dennis Ritchie, the creator of the C language. However, Kernighan had already used it in an earlier 1972 internal Bell Labs tutorial for the B programming language, a precursor to C. The first recorded "Hello, World!" example in B looked like this:

main() { printf("hello, world\n"); }

2. Why "Hello, World!"?

Simplicity: It's a small, easy-to-understand program that demonstrates basic syntax.
Testing: It's often the first thing programmers write when learning a new language.
Debugging: It ensures that the compiler and environm...

Part 1: How to create your own first RAG?

Retrieval-Augmented Generation (RAG) is an AI technique that enhances generative models by integrating a retrieval system to fetch relevant external information before generating responses. This improves the accuracy, relevance, and factual correctness of AI-generated content.

1. Definition of RAG

RAG combines retrieval-based and generation-based AI models. Instead of relying only on pre-trained knowledge, a RAG system retrieves relevant documents from an external knowledge base and incorporates them into the response generation process.

Main Components of a RAG System

Query Understanding: The system processes and refines the input query for retrieval.
Retrieval System: Finds relevant documents using embeddings and similarity search.
Context Processing: Extracts useful content from retrieved documents.
Generation Model: Uses the retrieved content to generate a response.
Post-Processing & Response Validation: Ensures coherence, factuality, and relevance.

2. ...
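To ground those components, here is a tiny self-contained sketch of the retrieve-then-generate flow; the TF-IDF retriever and the placeholder generate() function are assumptions for illustration, where a real RAG system would typically use dense embeddings and an LLM for generation:

# Tiny illustrative RAG loop: retrieve the most relevant documents for a query,
# then assemble the prompt a generation model would answer. The TF-IDF retriever
# and the placeholder generate() are assumptions, not this series' actual stack.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "RAG combines a retriever with a generative model.",
    "Embeddings map text to vectors for similarity search.",
    "Post-processing checks the coherence and factuality of answers.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    # Retrieval System: rank the knowledge-base documents by similarity to the query.
    vectorizer = TfidfVectorizer()
    doc_vectors = vectorizer.fit_transform(documents)
    query_vector = vectorizer.transform([query])
    scores = cosine_similarity(query_vector, doc_vectors)[0]
    top = scores.argsort()[::-1][:k]
    return [documents[i] for i in top]

def generate(query: str, context: list[str]) -> str:
    # Generation Model: in a real system this prompt would be sent to an LLM;
    # here we just return the assembled prompt to show the data flow.
    return "Context:\n- " + "\n- ".join(context) + f"\n\nQuestion: {query}"

if __name__ == "__main__":
    question = "What does a RAG system combine?"
    context = retrieve(question)
    print(generate(question, context))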