Gustavo Chávez, Ph.D.

San Francisco Bay Area
1K followers · 500+ connections

About

Earlier this year, I joined Medium to lead the data science team.

Previously, I…

Publications

  • Robust and Accurate Stopping Criteria for Adaptive Randomized Sampling In Matrix-Free HSS Construction

    SIAM Journal on Scientific Computing (SISC)

    We present new algorithms for randomized construction of hierarchically semi-separable (HSS) matrices, addressing several practical issues. The HSS construction algorithms use a partially matrix-free, adaptive randomized projection scheme to determine the maximum off-diagonal block rank. We develop both relative and absolute stopping criteria to determine the minimum dimension of the random projection matrix that is sufficient for the desired accuracy. Two strategies are discussed for adaptively enlarging the random sample matrix: repeated doubling of the number of random vectors, and iteratively incrementing the number of random vectors by a fixed amount. The relative and absolute stopping criteria are based on probabilistic bounds for the Frobenius norm of the random projection of the Hankel blocks of the input matrix. We discuss the parallel implementation and the computation and communication costs of both variants. Parallel numerical results for a range of applications, including boundary element method matrices and quantum chemistry Toeplitz matrices, show the effectiveness, scalability, and numerical robustness of the proposed algorithms.

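The adaptive doubling strategy described in the abstract can be sketched in simplified form. The snippet below is a dense, didactic stand-in for the paper's partially matrix-free HSS sampling, not the STRUMPACK implementation: it grows a Gaussian sample matrix until a relative Frobenius-norm stopping criterion is met. All function names and defaults are illustrative.

```python
import numpy as np

def adaptive_random_range(A, tol=1e-8, d0=8, max_cols=None, seed=0):
    """Grow a Gaussian sample matrix by doubling until the residual
    ||A - Q Q^T A||_F falls below tol * ||A||_F (a relative criterion).

    Didactic version: the sample matrix is redrawn at each step rather
    than appended to, and A is a dense array rather than a matrix-free
    operator as in the paper.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    max_cols = max_cols or n
    d = d0
    norm_A = np.linalg.norm(A, "fro")
    while True:
        Omega = rng.standard_normal((n, d))   # random projection matrix
        Y = A @ Omega                         # sample the range of A
        Q, _ = np.linalg.qr(Y)                # orthonormal range basis
        resid = np.linalg.norm(A - Q @ (Q.T @ A), "fro")
        if resid <= tol * norm_A or d >= max_cols:
            return Q, resid
        d = min(2 * d, max_cols)              # doubling strategy

# Example: a rank-5 matrix is captured once d exceeds its rank.
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 200))
Q, resid = adaptive_random_range(A, tol=1e-10)
```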
  • A Study of Clustering Techniques and Hierarchical Matrix Formats for Kernel Ridge Regression

    2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

    We present memory-efficient and scalable algorithms for kernel methods used in machine learning. Using hierarchical matrix approximations for the kernel matrix, the memory requirements, the number of floating point operations, and the execution time are drastically reduced compared to standard dense linear algebra routines. We consider both the general H-matrix hierarchical format and Hierarchically Semi-Separable (HSS) matrices. Furthermore, we investigate the impact of several preprocessing and clustering techniques on the hierarchical matrix compression. Effective clustering of the input leads to a ten-fold increase in the efficiency of the compression. The algorithms are implemented using the STRUMPACK solver library. These results confirm that, with correct tuning of the hyperparameters, classification using kernel ridge regression with the compressed matrix does not lose prediction accuracy compared to the exact (uncompressed) kernel matrix, and that our approach can be extended to O(1M) datasets, for which computation with the full kernel matrix becomes prohibitively expensive. We present numerical experiments in a distributed memory environment on up to 1,024 processors of NERSC's Cori supercomputer, using datasets well known to the machine learning community that range in dimension from 8 up to 784.

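For reference, the kernel ridge regression that the paper accelerates reduces, in its dense form, to a single regularized linear solve. The sketch below is the small-scale dense baseline, with illustrative function names; the paper's contribution is replacing the kernel matrix K with an H/HSS approximation so this computation scales to ~1M points.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian (RBF) kernel matrix K[i, j] = exp(-gamma * ||x_i - y_j||^2)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def krr_fit(X, y, lam=1e-3, gamma=1.0):
    """Solve (K + lam * I) alpha = y for the dual weights alpha (dense)."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def krr_predict(X_train, alpha, X_new, gamma=1.0):
    """Evaluate the fitted model at new points."""
    return rbf_kernel(X_new, X_train, gamma) @ alpha

# Tiny illustration: fit a noisy sine and predict on the training set.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (200, 1))
y = np.sin(X[:, 0]) + 0.05 * rng.standard_normal(200)
alpha = krr_fit(X, y, lam=1e-2, gamma=2.0)
pred = krr_predict(X, alpha, X, gamma=2.0)
```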
  • Parallel accelerated cyclic reduction preconditioner for three-dimensional elliptic PDEs with variable coefficients

    Journal of Computational and Applied Mathematics (Elsevier)

    We present a robust and scalable preconditioner for the solution of large-scale linear systems that arise from the discretization of elliptic PDEs amenable to rank compression. The preconditioner is based on hierarchical low-rank approximations and the cyclic reduction method. The setup and application phases of the preconditioner achieve log-linear complexity in memory footprint and number of operations, and numerical experiments exhibit good weak and strong scalability at large processor counts in a distributed memory environment. Numerical experiments with linear systems that feature symmetry and nonsymmetry, definiteness and indefiniteness, and constant and variable coefficients demonstrate the preconditioner's applicability and robustness. Furthermore, it is possible to control the number of iterations via the accuracy threshold of the hierarchical matrix approximations and their arithmetic operations, and via tuning of the admissibility condition parameter. Together, these parameters allow for optimization of the memory requirements and performance of the preconditioner.

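The cyclic reduction idea at the heart of the preconditioner can be illustrated on a small tridiagonal system. The sketch below is dense and didactic, not the paper's distributed implementation: each level eliminates the odd-indexed unknowns via a Schur complement on the even-indexed ones; the accelerated variant additionally keeps the Schur complements in compressed hierarchical form.

```python
import numpy as np

def cyclic_reduction_solve(A, b):
    """Solve A x = b for tridiagonal A by recursive odd/even elimination.

    At each level, odd-indexed unknowns are eliminated via a Schur
    complement; for a tridiagonal A the reduced system is again
    tridiagonal, so the recursion halves the problem size each level.
    """
    n = len(b)
    if n <= 2:
        return np.linalg.solve(A, b)
    odd = np.arange(1, n, 2)     # unknowns eliminated at this level
    even = np.arange(0, n, 2)    # unknowns passed to the next level
    Aoo, Aoe = A[np.ix_(odd, odd)], A[np.ix_(odd, even)]
    Aeo, Aee = A[np.ix_(even, odd)], A[np.ix_(even, even)]
    Aoo_inv_Aoe = np.linalg.solve(Aoo, Aoe)
    Aoo_inv_bo = np.linalg.solve(Aoo, b[odd])
    S = Aee - Aeo @ Aoo_inv_Aoe              # Schur complement
    x = np.empty(n)
    x[even] = cyclic_reduction_solve(S, b[even] - Aeo @ Aoo_inv_bo)
    x[odd] = Aoo_inv_bo - Aoo_inv_Aoe @ x[even]   # back-substitution
    return x

# 1D Poisson test matrix: tridiag(-1, 2, -1).
n = 63
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x = cyclic_reduction_solve(A, b)
```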
  • Robust and scalable hierarchical matrix-based fast direct solver and preconditioner for the numerical solution of elliptic partial differential equations

    KAUST Ph.D. Thesis

    This dissertation introduces a novel fast direct solver and preconditioner for the solution of block tridiagonal linear systems that arise from the discretization of elliptic partial differential equations on a Cartesian product mesh, such as the variable-coefficient Poisson equation, the convection-diffusion equation, and the wave Helmholtz equation in heterogeneous media.
    The algorithm extends the traditional cyclic reduction method with hierarchical matrix techniques. The resulting method exposes substantial concurrency, and its arithmetic operations and memory consumption grow only log-linearly with problem size, assuming bounded rank of off-diagonal matrix blocks, even for problems with arbitrary coefficient structure. The method can be used as a standalone direct solver with tunable accuracy, or as a black-box preconditioner in conjunction with Krylov methods.
    The challenges that distinguish this work from other thrusts in this active field are the hybrid distributed-shared parallelism that demonstrates the algorithm at large scale, full three-dimensionality, and the three stressors of current state-of-the-art multigrid technology: high-wavenumber Helmholtz (indefiniteness), high-Reynolds convection (nonsymmetry), and high-contrast diffusion (inhomogeneity).
    Numerical experiments corroborate the robustness, accuracy, and complexity claims and provide a baseline of the performance and memory footprint by comparisons with competing approaches such as the multigrid solver hypre, and the STRUMPACK implementation of the multifrontal factorization with hierarchically semi-separable matrices. The companion implementation can utilize many thousands of cores of Shaheen, KAUST's Haswell-based Cray XC-40 supercomputer, and compares favorably with other implementations of hierarchical solvers in terms of time-to-solution and memory consumption.

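As a minimal illustration of the "black-box preconditioner in conjunction with Krylov methods" usage mode, the sketch below runs SciPy's GMRES on a 2D Poisson system with an incomplete LU factorization standing in for the hierarchical preconditioner (which is not available in SciPy); the matrix, grid size, and drop tolerance are illustrative.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# 2D Poisson matrix on a 32x32 grid (5-point stencil via Kronecker sums).
n = 32
T = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n))
A = (sp.kron(sp.eye(n), T) + sp.kron(T, sp.eye(n))).tocsc()
b = np.ones(n * n)

# Approximate factorization used as a preconditioner: ILU here is a
# stand-in for an ACR-style hierarchical factorization.
ilu = spla.spilu(A, drop_tol=1e-3)
M = spla.LinearOperator(A.shape, ilu.solve)

# Preconditioned Krylov solve; info == 0 signals convergence.
x, info = spla.gmres(A, b, M=M)
```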
  • Accelerated Cyclic Reduction: A distributed-memory fast solver for structured linear systems

    Parallel Computing (Elsevier)

    We present Accelerated Cyclic Reduction (ACR), a distributed-memory fast solver for rank-compressible block tridiagonal linear systems arising from the discretization of elliptic operators, developed here for three dimensions. Algorithmic synergies between cyclic reduction and hierarchical matrix arithmetic operations result in a solver with log-linear arithmetic complexity and an O(kN log N) memory footprint, where N is the number of degrees of freedom and k is the rank of a block in the hierarchical approximation, and which exhibits substantial concurrency. We provide a baseline for performance and applicability by comparing with the multifrontal method with and without hierarchically semi-separable matrices, with algebraic multigrid, and with the classic cyclic reduction method. Over a set of large-scale elliptic systems with features of nonsymmetry and indefiniteness, the robustness of the direct solvers extends beyond that of the multigrid solver, and relative to the multifrontal approach ACR has lower or comparable execution time and size of the factors, with substantially lower numerical ranks. ACR exhibits good strong and weak scaling in a distributed context and, as with any direct solver, is advantageous for problems that require the solution of multiple right-hand sides. Numerical experiments show that the rank k is O(1) for the Poisson equation and O(n) for the indefinite Helmholtz equation. The solver is ideal in situations where low-accuracy solutions are sufficient, or otherwise as a preconditioner within an iterative method.

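The O(1) rank behavior for the Poisson equation can already be seen in one dimension: the inverse of the 1D Poisson matrix is semiseparable, so its off-diagonal blocks have numerical rank 1. This is exactly the rank-compressibility that the solver exploits. A small check, assuming nothing beyond NumPy:

```python
import numpy as np

# 1D Poisson matrix: tridiag(-1, 2, -1). Its inverse is semiseparable,
# so every off-diagonal block has (numerical) rank 1.
n = 256
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
Ainv = np.linalg.inv(A)

block = Ainv[: n // 2, n // 2 :]              # upper off-diagonal block
s = np.linalg.svd(block, compute_uv=False)    # singular values
numerical_rank = int(np.sum(s > 1e-10 * s[0]))
```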
  • A Direct Elliptic Solver Based on Hierarchically Low-Rank Schur Complements

    Proceedings of Domain Decomposition Methods in Science and Engineering XXIII

    A parallel fast direct solver for rank-compressible block tridiagonal linear systems is presented. Algorithmic synergies between cyclic reduction and hierarchical matrix arithmetic operations result in a solver with O(N log² N) arithmetic complexity and an O(N log N) memory footprint. We provide a baseline for performance and applicability by comparing with well-known implementations of the H-LU factorization and algebraic multigrid within a shared-memory parallel environment that leverages the concurrency features of the method. Numerical experiments reveal that this method is comparable with other fast direct solvers based on hierarchical matrices, such as H-LU, and that it can tackle problems where algebraic multigrid fails to converge.

  • Marching surfaces: Isosurface approximation using G1 multi-sided surfaces

    Marching surfaces is a method for isosurface extraction and approximation based on a G1 multi-sided patch interpolation scheme. Given a 3D grid of scalar values, an underlying curve network is formed using second-order information and cubic Hermite splines. Circular arc fitting defines the tangent vectors for the Hermite curves at specified isovalues. Once the boundary curve network is formed, a loop of curves is determined for each grid cell and then interpolated with multi-sided surface patches, which are G1 continuous at the joins. The data economy of the method and its continuity-preserving properties provide an effective compression scheme, ideal for indirect volume rendering on mobile devices or collaboration over the Internet, while enhancing visual fidelity. The use of multi-sided patches enables a more natural way to approximate isosurfaces than using a fixed number of sides or polygons, as proposed in the literature. This assertion is supported with comparisons to the traditional Marching Cubes algorithm and other G1 methods.

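The cubic-Hermite-plus-circular-arc-fitting ingredient can be sketched for a single curve. The snippet below is illustrative only: it approximates a quarter circle with one Hermite segment whose tangent magnitudes come from the standard cubic arc-fit constant 4(√2−1)/3 (a textbook value, not a figure taken from the paper).

```python
import numpy as np

def hermite(p0, p1, m0, m1, t):
    """Evaluate a cubic Hermite curve with endpoints p0, p1 and
    endpoint tangents m0, m1 at parameter values t in [0, 1]."""
    t = np.asarray(t, dtype=float)
    h00 = 2 * t**3 - 3 * t**2 + 1       # Hermite basis functions
    h10 = t**3 - 2 * t**2 + t
    h01 = -2 * t**3 + 3 * t**2
    h11 = t**3 - t**2
    return (np.outer(h00, p0) + np.outer(h10, m0)
            + np.outer(h01, p1) + np.outer(h11, m1))

# Quarter circle: endpoints on the unit circle; tangent directions are
# the circle tangents, with magnitude 3k from the cubic arc-fit value.
k = 4 * (np.sqrt(2) - 1) / 3
p0, p1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
m0, m1 = np.array([0.0, 3 * k]), np.array([-3 * k, 0.0])
pts = hermite(p0, p1, m0, m1, np.linspace(0, 1, 50))
radii = np.linalg.norm(pts, axis=1)     # should stay close to 1
```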
  • Lightweight visualization for high-quality materials on WebGL

    Proceedings of the 18th International Conference on 3D Web Technology

    We propose an architecture for lightweight visualization of high-quality 3D objects based on data compression, data streaming, virtual texturing, and WebGL. Our method retains the visual fidelity of the original scene, improves loading time, and maintains real-time rendering speed. We assume that the user is restricted to low-performance GPUs and slow Internet connections (1 megabit per second or lower). For geometry compression, we use entropy-encoding techniques that achieve up to 95% storage savings. Textures are stored as sets of tiles, which feed the virtual texturing engine. Using the Crunch library, tiles are compressed with results similar to JPEG but with much faster transcoding to DXT on the GPU. The initial 27.7 MB dataset takes an average of 5 minutes to load; our approach takes less than 5 seconds on average. A wide range of applications benefit from our architecture, including e-commerce, cultural heritage, virtual worlds, videogames, and scientific visualization.

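The geometry-compression idea (quantize, delta-encode, entropy-encode) can be sketched in a few lines. The snippet below is a rough, hypothetical stand-in for the paper's pipeline, using 14-bit quantization and zlib in place of the actual encoder; the achievable ratio depends heavily on the mesh, so the 95% figure is not reproduced here.

```python
import numpy as np
import zlib

# Synthetic "mesh": a smooth vertex path, so consecutive positions are close.
rng = np.random.default_rng(0)
verts = np.cumsum(rng.normal(0.0, 0.01, (10000, 3)), axis=0)

# Quantize each coordinate to 14-bit integers over its bounding box.
lo, hi = verts.min(axis=0), verts.max(axis=0)
q = np.round((verts - lo) / (hi - lo) * (2**14 - 1)).astype(np.uint16)

# Delta-encode: small steps between neighbors compress well.
delta = np.empty_like(q)
delta[0] = q[0]
delta[1:] = q[1:] - q[:-1]          # wraps mod 2^16, still invertible

# Entropy-encode the delta stream and measure the savings vs raw float32.
compressed = zlib.compress(delta.tobytes(), 9)
raw_bytes = verts.astype(np.float32).nbytes
ratio = 1.0 - len(compressed) / raw_bytes   # fraction of storage saved
```

Decoding reverses the steps: a modular cumulative sum recovers the quantized integers exactly, which are then rescaled by the stored bounding box.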

Courses

  • Stanford ICME: Deep Learning

    Summer 2018

  • Stanford ICME: Deep Learning for Natural Language Processing

    Summer 2018

  • Stanford ICME: Machine Learning

    Summer 2018

  • Stanford ICME: R for Statistical Computing

    Summer 2018

  • Stanford ICME: Statistics

    Summer 2018

  • Stanford ICME: D3.js Data-Driven Documents

    Summer 2018
