• AI4EOSC platform introduction

      

       

    General introduction to Artificial Intelligence and Machine Learning. - Prof. Giang Nguyen (IISAS)


    Machine Learning can be divided into unsupervised learning, supervised learning and reinforcement learning. AI is applied in many different areas, such as politics, transportation and education.

    Recent years have seen rapid development in AI, with ChatGPT being one of the most commonly used tools.

    The best way to approach a Machine Learning solution is to start by analyzing and exploring the dataset (exploratory data analysis, EDA).

    One of the common issues found in datasets is class imbalance.

    Data imbalance means that the classes within a dataset are unequally distributed.

    A binary classification model trained without fixing this problem will be strongly biased toward the majority class.
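
    As a quick illustration of spotting class imbalance during EDA, the sketch below counts the labels and computes how often a model that always predicts the majority class would be right. The labels and the helper names `class_distribution` and `imbalance_ratio` are hypothetical, chosen only for this example.

```python
from collections import Counter

def class_distribution(labels):
    """Return {class: fraction of samples} for a list of labels."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: n / total for cls, n in counts.items()}

def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count."""
    counts = Counter(labels).values()
    return max(counts) / min(counts)

# Hypothetical binary labels: 95 negatives, 5 positives.
labels = [0] * 95 + [1] * 5
distribution = class_distribution(labels)
ratio = imbalance_ratio(labels)

# A model that always predicts the majority class already reaches this
# accuracy, which is why accuracy alone is misleading on imbalanced data.
majority_baseline_accuracy = max(distribution.values())
```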

    • Hadoop
      gradual disappearance of the Big Data technology

    • Undersampling
      Randomly deleting observations from the majority class until its size matches that of the minority class.
    • Oversampling
      Synthetic Minority Over-sampling Technique (SMOTE) - for each minority-class data point, looks at its k nearest neighbors in feature space and creates synthetic points between the point and those neighbors.
    • 1990 - first Deep Learning application.
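
    The undersampling and oversampling items above can be sketched in plain Python. This is a simplified illustration, not the reference SMOTE implementation: the data, the function names and the choice of k are assumptions, and the oversampling step only mimics SMOTE's core idea of interpolating between a minority point and one of its nearest minority neighbors.

```python
import random

def random_undersample(majority, minority, rng):
    """Randomly keep only as many majority samples as there are minority samples."""
    return rng.sample(majority, len(minority)), list(minority)

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def smote_like_oversample(minority, n_new, k, rng):
    """Create n_new synthetic points, each on the segment between a minority
    point and one of its k nearest minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: squared_distance(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic

rng = random.Random(0)  # fixed seed so the sketch is repeatable
majority = [(float(i), float(i % 7)) for i in range(20)]  # hypothetical data
minority = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (2.5, 2.5)]

under_major, under_minor = random_undersample(majority, minority, rng)
new_points = smote_like_oversample(minority, n_new=16, k=2, rng=rng)
oversampled_minority = minority + new_points
```

    Undersampling discards majority data, while the oversampling sketch adds points that stay inside the region already covered by the minority class; both end with equal class sizes.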

    Frameworks: [figure: frameworks]


    Convolutional Neural Networks (CNN)

    • Object detection
      classifies a picture; predicts probability of an object
    • Face recognition
      Is this the correct person? Is this one of the k people in the database?
      Facial recognition was/is widely used by police departments in the United States, although some wrongful arrests have been reported.
    • Image segmentation
      segmenting an image into fragments; assigning a label to each of those

    Techniques:

    • Classic
      region-based segmentation; edge detection segmentation; thresholding; clustering
    • Deep learning
      U-Net: encoder-decoder; Mask R-CNN; DeepLab Versions; Interactive segmentation; Meta’s SAM (Segment Anything Model)
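
    Thresholding, the simplest of the classic techniques listed above, can be sketched as follows. The tiny image and the threshold value are hypothetical; the example assigns one of two labels (foreground or background) to every pixel.

```python
def threshold_segment(image, threshold):
    """Label each pixel 1 (foreground) if brighter than threshold, else 0."""
    return [[1 if pixel > threshold else 0 for pixel in row] for row in image]

# Hypothetical 4x4 grayscale image with intensities in 0..255.
image = [
    [10, 20, 200, 210],
    [15, 25, 220, 230],
    [12, 180, 190, 30],
    [8, 14, 22, 18],
]
mask = threshold_segment(image, threshold=128)
foreground_pixels = sum(map(sum, mask))
```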

    R-CNN and YOLO (You Only Look Once) -> real-time object detection

    Detects several objects in a picture; predicts the probability of each object and where it is located.
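
    To make the shape of a detector's output concrete, the sketch below represents each detected object by a label, a probability and a bounding box, then keeps only the confident ones. The `Detection` class, the box format and the values are assumptions for illustration and do not correspond to the output format of any particular detector.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str          # predicted class
    probability: float  # confidence that the object is present
    box: tuple          # (x_min, y_min, x_max, y_max) in pixels

def keep_confident(detections, min_probability):
    """Drop detections whose probability is below the cut-off."""
    return [d for d in detections if d.probability >= min_probability]

# Hypothetical detector output for a single picture.
detections = [
    Detection("dog", 0.92, (34, 50, 210, 300)),
    Detection("bicycle", 0.81, (180, 90, 420, 310)),
    Detection("cat", 0.12, (10, 10, 60, 40)),
]
confident = keep_confident(detections, min_probability=0.5)
labels = [d.label for d in confident]
```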


    • Natural Language Processing (NLP)
      a subfield of artificial intelligence (AI) and computational linguistics focused on enabling computers to understand, interpret, and generate human language in a way that is both meaningful and useful.
    • DevOps
      a set of practices that combine software development (Dev) and IT operations (Ops) to improve collaboration, automate processes, and deliver software more quickly and reliably.
    • MLOps
      a set of practices that combines machine learning (ML) development and IT operations to streamline the deployment, monitoring, and management of ML models in production.

    A change in data distribution is called data drift. It affects the performance of the ML model used in deployment.

    Data drift leads to concept drift (model drift): a degradation of ML model performance in production.
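
    One very rough way to watch for data drift is to compare a feature's values in production against the training data. The sketch below flags drift when the mean has moved by more than a chosen number of training standard deviations. The data, the function names and the tolerance are assumptions, and the check only catches a shift in the mean; production monitoring would normally rely on proper statistical tests.

```python
from statistics import mean, stdev

def mean_shift_in_std(reference, current):
    """How far the current mean has moved, in units of the reference std dev."""
    return abs(mean(current) - mean(reference)) / stdev(reference)

def has_drifted(reference, current, tolerance=1.0):
    """Flag drift when the mean moved by more than `tolerance` std devs."""
    return mean_shift_in_std(reference, current) > tolerance

# Hypothetical values of one numeric feature.
training_values = [10.0, 11.0, 9.0, 10.5, 9.5, 10.0, 11.5, 8.5]
production_ok = [10.2, 9.8, 10.6, 9.4, 10.0, 10.9, 9.1, 10.0]
production_shifted = [v + 5.0 for v in training_values]

drift_ok = has_drifted(training_values, production_ok)
drift_shifted = has_drifted(training_values, production_shifted)
```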


    NLP Transformers: Benefits and Drawbacks.

    Benefits:

    • highly parallelizable - they can process multiple parts of a sequence at the same time
    • capture long-term dependencies in text

    Drawbacks:

    • high computational demand
    • sensitive to the quality and quantity of the training data
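
    The parallelism and long-range points above both come from self-attention, the core operation of a Transformer. The sketch below is a stripped-down version, assuming no learned projections (queries, keys and values are all the raw input vectors) and a hypothetical 4-token sequence. Each position's output is computed independently of the others, which is what makes the computation parallelizable, and every position can attend to every other one regardless of distance.

```python
import math

def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def self_attention(vectors):
    """Scaled dot-product self-attention with queries = keys = values = vectors
    (no learned projections, to keep the sketch short)."""
    scale = math.sqrt(len(vectors[0]))
    outputs, weights = [], []
    for query in vectors:  # iterations do not depend on each other
        row = softmax([dot(query, key) / scale for key in vectors])
        weights.append(row)
        outputs.append([
            sum(w * value[i] for w, value in zip(row, vectors))
            for i in range(len(query))
        ])
    return outputs, weights

# Hypothetical 2-dimensional embeddings for a 4-token sequence.
tokens = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
outputs, weights = self_attention(tokens)
```

    In this toy input the first token gives more weight to the last token, which has the same embedding, than to its immediate neighbor, showing that attention is driven by content rather than by position.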

    2022 Leading Large Language Models (LLMs) [figure: llms]


    Vision Transformer vs. CNN

    • ViT has a different kind of bias: it explores topological relationships between patches, which lets it capture global and wider-range relations, but at the cost of training that is more demanding in terms of data.
    • ViT is more robust to input image distortions.

    There is no clear winner between CNN and ViT.
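
    The patches mentioned above are obtained by cutting the input image into a grid of small tiles, each of which is flattened into a vector and treated like a token. A minimal sketch, assuming a single-channel image stored as a list of rows and a hypothetical helper name `split_into_patches`:

```python
def split_into_patches(image, patch_size):
    """Cut a 2-D image (list of rows) into flattened, non-overlapping
    patch_size x patch_size patches, in row-major order."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be multiples of patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patches.append([
                image[r][c]
                for r in range(top, top + patch_size)
                for c in range(left, left + patch_size)
            ])
    return patches

# Hypothetical 4x4 single-channel image with pixel values 0..15.
image = [[4 * r + c for c in range(4)] for r in range(4)]
patches = split_into_patches(image, patch_size=2)
```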

    Large Language Models (LLMs)

    • Image Generation (MidJourney)
    • Audio Generation (Whisper)
    • Search Engines (Neeva)
    • Code Generation (Copilot)
    • Text Generation (Jasper)

    Responsible AI (the only way to mitigate AI risks) - a standard for ensuring that AI is safe, trustworthy and unbiased.

    • 1. Privacy
    • 2. Security and Safety
    • 3. Ethics
    • 4. Fairness
    • 5. Accountability
    • 6. Transparency

    Trustworthy AI - a methodology for implementing AI methods in real organizations with fairness, model explainability and accountability at its core.

    The purpose of Artificial Intelligence is to augment (not replace) Human Intelligence.