AI4EOSC platform introduction
General introduction to Artificial Intelligence and Machine Learning. - Prof. Giang Nguyen (IISAS)
Machine Learning can be divided into unsupervised learning, supervised learning and reinforcement learning. AI can be used in many different areas, such as politics, transportation and education.
Recent years have seen rapid development in AI, with ChatGPT being one of the most widely used tools.
The best way to approach a Machine Learning solution is to start by analyzing and exploring the dataset through exploratory data analysis (EDA).
One of the common issues found in datasets is class imbalance.
Data imbalance usually reflects an unequal distribution of classes within a dataset.
Binary classification models trained without addressing this problem tend to be heavily biased toward the majority class.
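To make the bias concrete, here is a minimal sketch with made-up labels (95 negatives, 5 positives; these numbers are an assumption for illustration, not data from the talk). A model that always predicts the majority class looks accurate while never finding a single positive case.

```python
from collections import Counter

def class_distribution(labels):
    """Return the share of each class in the label list."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: n / total for cls, n in counts.items()}

def majority_baseline(labels):
    """Predict the most frequent class for every sample."""
    majority = Counter(labels).most_common(1)[0][0]
    return [majority] * len(labels)

def accuracy(y_true, y_pred):
    """Fraction of samples whose prediction matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that were predicted positive."""
    positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in positives) / len(positives)

# Hypothetical labels: 95 negatives (0) and 5 positives (1).
y = [0] * 95 + [1] * 5
pred = majority_baseline(y)

print(class_distribution(y))  # {0: 0.95, 1: 0.05}
print(accuracy(y, pred))      # 0.95
print(recall(y, pred))        # 0.0
```

Checking the class distribution during EDA, and reporting recall alongside accuracy, is one simple way to notice the problem early.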
- Hadoop
gradual disappearance of the Big Data technology
- Undersampling
Randomly deleting some of the observations from the majority class in order to match the numbers with the minority class.
- Oversampling
Synthetic Minority Over-sampling Technique (SMOTE) - looks at the minority-class data points in feature space and, for each point, considers its k nearest neighbors in order to create synthetic samples between them.
- 1990 - first Deep Learning application.
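The two resampling ideas above (random undersampling and SMOTE-style oversampling) can be sketched in a few lines. This is a simplified illustration, not the reference SMOTE implementation: the function names, the toy 2D points, and the default `k` and `n_new` values are all assumptions made for the example.

```python
import random

def random_undersample(majority, minority, seed=0):
    """Randomly keep as many majority samples as there are minority samples."""
    rng = random.Random(seed)
    return rng.sample(majority, len(minority)), minority

def smote_like(minority, k=2, n_new=4, seed=0):
    """Create synthetic minority points by interpolating between a point
    and a randomly chosen one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` inside the minority class (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic

# Hypothetical toy data: 10 majority points, 3 minority points.
majority = [(float(i), 0.0) for i in range(10)]
minority = [(0.0, 5.0), (1.0, 5.0), (1.0, 6.0)]

kept_majority, _ = random_undersample(majority, minority)
new_minority = smote_like(minority)
print(len(kept_majority), len(new_minority))  # 3 4
```

Undersampling throws information away, while the oversampling sketch adds points that lie between existing minority samples, so neither is free of trade-offs.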
Frameworks:
Convolutional Neural Networks (CNN)
- Object detection
classifies a picture; predicts probability of an object
- Face recognition
Is this the correct person? Is this one of the k people in the database?
Facial recognition has been widely used by police departments in the United States, although some wrongful arrests have been reported.
- Image segmentation
segmenting an image into fragments; assigning a label to each of those
Techniques:
- Classic
region-based segmentation; edge detection segmentation; thresholding; clustering
- Deep learning
U-Net: encoder-decoder; Mask R-CNN; DeepLab Versions; Interactive segmentation; Meta’s SAM (Segment Anything Model)
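Of the classic techniques listed above, thresholding is the simplest to show in code. The sketch below labels each pixel as foreground or background by comparing its intensity with a fixed cut-off; the 4x4 image and the threshold of 128 are hypothetical values chosen for illustration.

```python
def threshold_segment(image, threshold):
    """Label each pixel 1 (foreground) if its intensity exceeds the threshold, else 0."""
    return [[1 if px > threshold else 0 for px in row] for row in image]

# Hypothetical 4x4 grayscale image with intensities in 0..255.
image = [
    [10, 12, 200, 210],
    [11, 13, 205, 220],
    [9, 180, 190, 14],
    [8, 10, 12, 11],
]

mask = threshold_segment(image, threshold=128)
for row in mask:
    print(row)
# [0, 0, 1, 1]
# [0, 0, 1, 1]
# [0, 1, 1, 0]
# [0, 0, 0, 0]
```

The deep learning approaches in the list produce the same kind of per-pixel label map, but learn the decision from data instead of using a single fixed threshold.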
R-CNN and YOLO (You Only Look Once) -> Real-time object detection
Detects several objects in a picture; predicts the probability of each object and where it is located.
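One way to picture a detector's output is as a list of labelled bounding boxes, each with a probability. The sketch below is not the output format of any particular R-CNN or YOLO implementation; the `Detection` class, the box layout and the 0.5 cut-off are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str
    probability: float
    box: tuple  # (x_min, y_min, x_max, y_max) in pixels, a hypothetical layout

def keep_confident(detections, min_probability=0.5):
    """Keep only the detections whose probability reaches the cut-off."""
    return [d for d in detections if d.probability >= min_probability]

# Hypothetical detections for a single picture.
detections = [
    Detection("dog", 0.92, (30, 40, 200, 220)),
    Detection("cat", 0.35, (210, 60, 300, 180)),
    Detection("person", 0.81, (5, 10, 120, 300)),
]

for d in keep_confident(detections):
    print(d.label, d.probability, d.box)
# dog 0.92 (30, 40, 200, 220)
# person 0.81 (5, 10, 120, 300)
```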
- Natural Language Processing (NLP)
a subfield of artificial intelligence (AI) and computational linguistics focused on enabling computers to understand, interpret, and generate human language in a way that is both meaningful and useful.
- DevOps
a set of practices that combine software development (Dev) and IT operations (Ops) to improve collaboration, automate processes, and deliver software more quickly and reliably.
- MLOps
a set of practices that combines machine learning (ML) development and IT operations to streamline the deployment, monitoring, and management of ML models in production.
A change in the distribution of the input data over time is called data drift. It affects the performance of the ML model used in deployment.
Data drift can lead to concept drift (model drift), which is a degradation of ML model performance in production.
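A simple way to watch for data drift is to compare the distribution of a feature at training time with its distribution in production. The sketch below uses the two-sample Kolmogorov-Smirnov statistic for that comparison; this choice of statistic, the toy values and the alert threshold of 0.3 are assumptions for illustration and were not named in the talk.

```python
def ks_statistic(reference, production):
    """Largest gap between the two empirical cumulative distribution functions."""
    ref, prod = sorted(reference), sorted(production)
    largest_gap = 0.0
    # Both empirical CDFs are step functions, so checking every sample point is enough.
    for x in ref + prod:
        cdf_ref = sum(v <= x for v in ref) / len(ref)
        cdf_prod = sum(v <= x for v in prod) / len(prod)
        largest_gap = max(largest_gap, abs(cdf_ref - cdf_prod))
    return largest_gap

# Hypothetical feature values seen during training and later in production.
training_values = [float(i) for i in range(10)]
production_values = [v + 5.0 for v in training_values]  # same shape, shifted by 5

DRIFT_THRESHOLD = 0.3  # hypothetical alert level
stat = ks_statistic(training_values, production_values)
print(round(stat, 2), "drift suspected" if stat > DRIFT_THRESHOLD else "no drift flagged")
```

A value of 0 means the two samples have identical empirical distributions; values closer to 1 mean they barely overlap. In an MLOps setup a check like this would typically run on a schedule and trigger retraining or an alert.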
NLP Transformers: Benefits and Drawbacks.
Benefits:
- highly parallelizable - they can process multiple parts of a sequence at the same time
- capture long-term dependencies in text
Drawbacks:
- high computational demand
- sensitive to the quality and quantity of the training data
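The parallelism and long-range behaviour listed above both come from self-attention. The sketch below is a stripped-down version in plain Python: it uses the token vectors directly as queries, keys and values, whereas real Transformers apply learned projection matrices first (omitted here as a simplifying assumption). The toy token vectors are hypothetical.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Scaled dot-product self-attention with queries = keys = values = tokens."""
    dim = len(tokens[0])
    outputs = []
    # Each position is computed independently of the others,
    # so in practice all positions are processed in parallel.
    for query in tokens:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        # One weight per position in the sequence, however far away it is.
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(dim)]
        )
    return outputs

# Hypothetical sequence of three 2-dimensional token vectors.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(tokens)
for row in attended:
    print([round(x, 3) for x in row])
```

Every position compares itself with every other position, which is also where the high computational demand comes from: the number of score computations grows with the square of the sequence length.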
2022 Leading Large Language Models (LLMs)
Vision Transformer vs. CNN
- ViTs have a different kind of bias: they explore topological relationships between image patches, which lets them also capture global, longer-range relations, but at the cost of training that is more demanding in terms of data.
- ViTs are more robust to input image distortions.
There is no clear winner between CNNs and ViTs.
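The "patches" mentioned above are the input units of a Vision Transformer: the image is cut into small squares, and each square is flattened into a vector that plays the role of a token. A minimal sketch of that first step follows, assuming a single-channel image whose height and width are exact multiples of the patch size; the function name and the 4x4 toy image are hypothetical.

```python
def split_into_patches(image, patch_size):
    """Split a 2D image (list of rows) into flattened, non-overlapping square patches."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [
                image[r][c]
                for r in range(top, top + patch_size)
                for c in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches

# Hypothetical 4x4 image whose pixel values are simply 0..15.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = split_into_patches(image, patch_size=2)
for p in patches:
    print(p)
# [0, 1, 4, 5]
# [2, 3, 6, 7]
# [8, 9, 12, 13]
# [10, 11, 14, 15]
```

Attention then relates every patch to every other patch, which is how a ViT can pick up global relations that a CNN only reaches after stacking many local convolutions.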
Large Language Models (LLMs)
- Image Generation (MidJourney)
- Audio Generation (Whisper)
- Search Engines (Neeva)
- Code Generation (Copilot)
- Text Generation (Jasper)
Responsible AI (the only way to mitigate AI risks) - a standard for ensuring that AI is safe, trustworthy and unbiased.
- 1. Privacy
- 2. Security and Safety
- 3. Ethics
- 4. Fairness
- 5. Accountability
- 6. Transparency
Trustworthy AI - a methodology for implementing AI methods in real organizations with fairness, model explainability and accountability at its core.