Area of expertise | Mathematical morphology |
Doctoral School | ISMME - Ingénierie des Systèmes, Matériaux, Mécanique, Énergétique |
Supervisor | DOKLADAL Petr |
Research unit | Mathématiques et Systèmes |
Keywords | Deep learning, Frugal AI |
Abstract | Artificial Intelligence (AI) tools have become ubiquitous in today's society. At the same time, the environmental impact of AI has become non-negligible because of its carbon footprint. A frugal, rather than data-hungry, AI can improve efficiency, thus addressing a significant challenge given the widespread use of machine learning. A less data-intensive algorithm would consume less energy, but the search for frugality goes even further.
We address this challenge by developing perceptually based models. Like the human visual system, these models are specifically sensitive to perceptually pertinent features such as textures, object contours, and their spatial arrangements. Research in perception has a long history, and its links with image processing began with the detection of perceptually meaningful events. According to a long-standing principle in sensory processing, every large deviation of an image from “uniform noise” should be perceptible, provided that the deviation corresponds to one of an a priori fixed list of geometric structures (lines, curves, closed curves, convex sets, spots, local groups). Desolneux et al. [1] explored the connection between this principle and image processing in a probabilistic setting for the detection of perceptual contours in natural images. Dokladal [2] proposed a link between this probabilistic approach and Mathematical Morphology to detect cracks in materials. AI models can likewise be constrained to be sensitive to perceptually significant primitives, such as lines or edges. This notion runs counter to the mainstream view that constrained models cannot match the performance of unconstrained ones. Yet, at the cost of a small reduction in score, a model that is tuned or constrained to target a desirable function can acquire useful properties. For example, incorporating biologically inspired modules has been shown to confer robustness on deep networks [3]. Further attempts to constrain AI models to perceptually significant features, developed to obtain rotation invariance [4][5], produced interesting results in terms of 1) model size and 2) computational requirements.
Visual processing of simple image elements (such as lines and edges) does not happen in a cognitive vacuum: it may differ when those elements are embedded in natural scenes that resemble what we see every day, as opposed to the featureless backgrounds normally used in the laboratory. We know a good deal about the mechanisms that support vision in a simple setup (i.e., one involving a simple stimulus with no naturally meaningful content). We know virtually nothing about how those mechanisms may change, or be augmented or replaced by new mechanisms, under conditions closer to natural vision (i.e., when the image starts making sense and contains recognizable objects). In this thesis we will study how visual primitives (lines, edges, junctions) interact and how an AI model could use their spatial relations to exploit the semantic information [6] in the image efficiently in order to recognize objects and scenes. The features sensitive to these primitives will be fitted to data via learning. A promising tool for the efficient encoding of spatial arrangements and relations is the graph convolutional network (GCN), introduced by Bruna et al. [7] and later developed by Kipf and Welling [8] into the architecture now known as GCN. After [8], the modeling of graph topology remained at the level of immediate neighbors until Zhu et al. [9] proposed H2GCN, which encodes higher-order network information from intermediate layers, and Qian et al. [10] showed that the performance of GCNs is related to the alignment among features, graph, and ground truth. More recently, Wang et al. [11] proposed integrating graph motif-structure information into the convolution operation of each layer. Encoding the geometry and topology of perceptually significant primitives from an image into a graph offers significant potential benefits.
Indeed, a perceptually based model will not only be more efficient in terms of computational requirements, but also data-frugal, faster to train, and more robust to adversarial attacks. Such models will pave the way for a sustainable future with energy-efficient, environmentally friendly AI. |
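To make the graph convolution mentioned in the abstract concrete, the following is a minimal sketch of a single layer using the propagation rule of Kipf and Welling [8], H' = ReLU(D^(-1/2) (A + I) D^(-1/2) H W). It is an illustration only, not the model proposed for the thesis. The toy graph (three primitives linked in a chain), the one-hot node features, the random weights, and the function names `normalize_adjacency` and `gcn_layer` are all assumptions made for the example.

```python
import numpy as np


def normalize_adjacency(adj):
    """Return D^(-1/2) (A + I) D^(-1/2), the renormalized adjacency used in [8]."""
    a_tilde = adj + np.eye(adj.shape[0])      # add self-loops
    deg = a_tilde.sum(axis=1)                 # node degrees, self-loops included
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Scale row i by d_i^(-1/2) and column j by d_j^(-1/2).
    return a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(adj, features, weights):
    """One graph convolution: aggregate neighbor features, project, apply ReLU."""
    a_hat = normalize_adjacency(adj)
    return np.maximum(a_hat @ features @ weights, 0.0)


# Hypothetical toy graph: three visual primitives connected in a chain 0 - 1 - 2.
adj = np.array([[0.0, 1.0, 0.0],
                [1.0, 0.0, 1.0],
                [0.0, 1.0, 0.0]])
features = np.eye(3)                          # assumed one-hot node features
rng = np.random.default_rng(0)
weights = rng.normal(size=(3, 2))             # assumed random (untrained) weights

out = gcn_layer(adj, features, weights)
print(out.shape)
```

Each node's output depends only on itself and its immediate neighbors, which is the one-hop limitation that the higher-order and motif-based variants [9][11] aim to overcome.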
Profile | Completed M2 programme,
AI coding skills (TensorFlow, PyTorch), excellent academic record |
Funding | Other type of funding - |
©2009 Mines ParisTech
|