new_langs_and_paradigms_in_hpc



A talk about new programming languages and paradigms in HPC for a seminar at Uni HH

On GitHub: Ahti / new_langs_and_paradigms_in_hpc

New programming languages and paradigms in HPC

Seminar „Neueste Trends im Hochleistungsrechnen“ ("Latest Trends in High-Performance Computing")

Lukas Stabe / 2015-12-07

Structure

  • Introduction
  • Definitions
  • Advantages of new languages/paradigms
  • Problems
  • Examples
    • SciPy
    • Rust
    • Swift/T
    • OpenMP 4
  • Conclusion

Definitions: Language

  • „A programming language is a formal constructed language designed to communicate instructions to a machine, particularly a computer.“ – Wikipedia
  • A programming language defines how you tell the computer to do something
  • Languages are closely related to their standard library
    • Boundaries are often unclear
So a language is what specifies: "you write X and the computer does Y"

Definitions: Paradigm

  • „A programming paradigm is a fundamental style of computer programming, serving as a way of building the structure and elements of computer programs.“ – Wikipedia
  • Describes a way to approach problems
  • Defines common patterns
  • Often explicitly forbids some anti-patterns
- Anti-patterns are usage/programming patterns that are deemed bad practice
- Examples of paradigms: agent-oriented, automata-based, data-driven, declarative, dataflow, reactive, functional, logic, imperative/procedural, inductive programming, natural language programming, object-oriented (OOP)

Definitions: Relation

  • „Capabilities and styles of various programming languages are defined by their supported programming paradigms; some programming languages are designed to follow only one paradigm, while others support multiple paradigms.“ – Wikipedia
  • Most languages support a mix of paradigms
  • Standard library may be written with a concrete paradigm in mind
- C++: object-oriented and imperative
- C: procedural, but people have built class systems on top of it
- The C standard library (part of the language standard) often returns only a status code from functions and returns the actual data by reference (modifying state)
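
As a small illustration of one language supporting several paradigms, the following Rust sketch computes the same sum of squares twice, once in an imperative style and once in a functional style. The function names are made up for this example and do not come from the talk.

```rust
// Imperative style: mutate an accumulator step by step.
fn sum_of_squares_imperative(values: &[i64]) -> i64 {
    let mut total = 0;
    for v in values {
        total += v * v;
    }
    total
}

// Functional style: describe the result as a chain of transformations.
fn sum_of_squares_functional(values: &[i64]) -> i64 {
    values.iter().map(|v| v * v).sum()
}

fn main() {
    let data = [1, 2, 3, 4];
    println!("imperative: {}", sum_of_squares_imperative(&data));
    println!("functional: {}", sum_of_squares_functional(&data));
}
```

Both functions return the same value; which style reads better is largely a matter of the conventions of the surrounding code and library.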

Advantages of new languages/paradigms

  • Simplify development
  • Fewer kinds of errors possible
  • Produces easier-to-maintain code
    • Easier to write (in a good/idiomatic manner) for inexperienced programmers
    • This is a result of the community surrounding the language
    • Unit-testing
    • Documentation
  • Better utilize available resources
- Trusty old C/Fortran: why use something else?
- Concrete examples follow later
- Simplify development: avoid the complexities of e.g. C (pointers), or simplify things like IPC (MPI)
- Fewer kinds of errors: type system, memory safety
- Easier to write: there are lots of inexperienced programmers in HPC (scientists)
- Easier to maintain: important in HPC, because scientists write the code and cluster operators maintain it
- Available resources: accelerators, vector units
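
To make "fewer kinds of errors" a little more concrete, here is a minimal Rust sketch of how a type system can rule out one class of mistakes. The function `max_value` is a hypothetical example, not something from the talk: it returns an `Option<f64>`, so the "no data" case is part of the type and cannot be silently ignored the way a null pointer or an unchecked status code can.

```rust
// Returns the largest value, or None for an empty slice.
// The caller must unpack the Option before using the number inside,
// so the empty case cannot be forgotten.
fn max_value(values: &[f64]) -> Option<f64> {
    values.iter().copied().fold(None, |best, v| match best {
        Some(b) if b >= v => Some(b),
        _ => Some(v),
    })
}

fn main() {
    match max_value(&[1.5, 4.0, 2.5]) {
        Some(m) => println!("maximum: {}", m),
        None => println!("no data"),
    }
}
```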

Problems

  • A large existing codebase of C/Fortran code
  • Smaller ecosystem of libraries/tools (esp. related to HPC)
  • Huge expertise of experienced programmers
  • C/Fortran compilers have been worked on for decades, so they can optimize code extremely well
- Existing codebase: can be reused through a C FFI (foreign function interface)
- Smaller ecosystem (e.g. Valgrind): C FFI
- Expertise: important because in HPC we need performance
- Optimizations: compile down to C?
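
The "C FFI" answer can be sketched in a few lines. This example calls `abs` from the C standard library directly from Rust; it assumes a platform where Rust programs link against the C library by default and a toolchain recent enough to accept `unsafe extern` blocks (Rust 1.82 or later). The wrapper name `c_abs` is made up for the illustration.

```rust
use std::os::raw::c_int;

// Declaration of `abs` from the C standard library.
unsafe extern "C" {
    fn abs(input: c_int) -> c_int;
}

// Safe wrapper: the unsafe foreign call is confined to one place.
fn c_abs(x: c_int) -> c_int {
    unsafe { abs(x) }
}

fn main() {
    println!("abs(-3) according to C: {}", c_abs(-3));
}
```

The same mechanism is what lets a newer language reuse existing C libraries instead of rewriting them.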

Example: SciPy

  • Python library
  • Wraps compiled Fortran and C code
  • Write program flow and high-level structure in Python
  • Keep hotspots in compiled code
  • Near-native performance

Example: Rust

  • Compiled low-level language
  • Strong type and generics system with type inference
  • Guarantees memory safety
  • Thread-safety
  • MPI bindings in development
- Thread safety: only one reference at a time may write to a piece of data
- No conclusive performance benchmarks: depends on algorithm, implementation, ...
- Seems like it can be as fast as C in some cases
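
A minimal sketch of the thread-safety point, using only the standard library (scoped threads, available since Rust 1.63). The function `square_in_parallel` is a hypothetical example: each thread receives an exclusive, non-overlapping mutable slice, and the compiler would reject a version in which two threads could write to the same element without synchronization.

```rust
use std::thread;

// Squares every element in place, splitting the work across threads.
// `chunks_mut` hands each thread its own disjoint mutable slice.
fn square_in_parallel(data: &mut [f64], num_threads: usize) {
    let n = num_threads.max(1);
    // Round up so that at most `n` chunks are produced; never zero.
    let chunk_len = ((data.len() + n - 1) / n).max(1);
    thread::scope(|scope| {
        for chunk in data.chunks_mut(chunk_len) {
            scope.spawn(move || {
                for x in chunk.iter_mut() {
                    *x = *x * *x;
                }
            });
        }
        // All spawned threads are joined when the scope ends.
    });
}

fn main() {
    let mut data = vec![1.0, 2.0, 3.0, 4.0, 5.0];
    square_in_parallel(&mut data, 2);
    println!("{:?}", data);
}
```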

Example: Rust

fn main() {
    // A simple integer calculator:
    // `+` or `-` means add or subtract by 1
    // `*` or `/` means multiply or divide by 2
    let program = "+ + * - /";
    let mut accumulator = 0;

    for token in program.chars() {
        match token {
            '+' => accumulator += 1,
            '-' => accumulator -= 1,
            '*' => accumulator *= 2,
            '/' => accumulator /= 2,
            _ => { /* ignore everything else */ }
        }
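    }

    // Print the result so the program has a visible effect.
    println!("The program \"{}\" calculates the value {}", program, accumulator);
}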
- point out that there is not a single type explicitly written down, yet everything is statically typed

Example: Rust

extern crate mpi;

use mpi::traits::*;

fn main() {
    let universe = mpi::initialize().unwrap();
    let world = universe.world();
    let size = world.size();
    let rank = world.rank();

    if size != 2 {
        panic!("Size of MPI_COMM_WORLD must be 2, but is {}!", size);
    }

    match rank {
        0 => {
            let msg = vec![4.0f64, 8.0, 15.0];
            world.process_at_rank(rank + 1).send(&msg[..]);
        }
        1 => {
            let (msg, status) = world.receive_vec::<f64>();
            println!("Process {} got message {:?}.\nStatus is: {:?}",
                rank, msg, status);
        }
        _ => unreachable!()
    }
}
- conclusion: rust might not be much easier, but it prevents whole classes of errors from being made

Example: Swift/T

  • Swift script translates into MPI program
  • Calls leaf tasks written in C, C++, Fortran, Python, R, Tcl, Julia, Qt Script, or executable programs
  • Coordinates data flow between leaf tasks
  • Executes leaf tasks concurrently where possible
- Remember when I said "simplify IPC"? This is it
- Swift scripts look a lot like C, but each function call actually executes a leaf task, and those are ...

Example: Swift/T

int X = 100, Y = 100;
int A[][];
int B[];
foreach x in [0:X-1] {
  foreach y in [0:Y-1] {
    if (check(x, y)) {
      A[x][y] = g(f(x), f(y));
    } else {
      A[x][y] = 0;
    }
  }
  B[x] = sum(A[x]);
}

- In the accompanying diagram, blue dots are tasks that are spawned
- Full arrows represent data flow
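
For comparison, here is a rough hand-written analogy of the loop nest above in Rust. It is only a sketch: `check`, `f` and `g` are hypothetical stand-ins for the leaf tasks, and the concurrency is spelled out manually with one thread per row, whereas the point of Swift/T is that such concurrency follows from the data flow of the script.

```rust
use std::thread;

// Hypothetical stand-ins for the leaf tasks `check`, `f` and `g`.
fn check(x: i64, y: i64) -> bool { (x + y) % 2 == 0 }
fn f(x: i64) -> i64 { x * x }
fn g(a: i64, b: i64) -> i64 { a + b }

// Computes B[x] = sum over y of A[x][y]; every row is an independent task.
fn row_sums(width: i64, height: i64) -> Vec<i64> {
    thread::scope(|scope| {
        // Spawn one task per row; rows do not depend on each other.
        let handles: Vec<_> = (0..width)
            .map(|x| {
                scope.spawn(move || {
                    (0..height)
                        .map(|y| if check(x, y) { g(f(x), f(y)) } else { 0 })
                        .sum::<i64>()
                })
            })
            .collect();
        // Collect the per-row results in order.
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    println!("{:?}", row_sums(4, 4));
}
```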

Example: OpenMP 4

  • Compiler directives on top of C, C++ and Fortran
  • Interesting new features in version 4
    • SIMD directive
      • Uses vector units like AVX/SSE and NEON to do multiple numeric operations in parallel on one core
      • Works combined with omp parallel
    • TARGET directive
      • Runs code on accelerators
      • Transfers input and output data back and forth
- You probably all know OpenMP
- Explain vector instructions
- Oliver already talked about accelerators in depth
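
As a conceptual aid for the vector-instruction explanation, the following Rust sketch models what a SIMD unit does: it processes a fixed number of "lanes" per step, then handles the leftover elements one by one. The lane count of 4 is an assumption (four `f32` values fit in a 128-bit SSE register), and the code uses no real SIMD intrinsics; whether a compiler turns the inner loop into actual vector instructions depends on the compiler and target.

```rust
// Assumed vector width: four f32 values, as in a 128-bit SSE register.
const LANES: usize = 4;

// Element-wise addition c = a + b, structured like a vectorized loop.
fn vadd(a: &[f32], b: &[f32], c: &mut [f32]) {
    assert!(a.len() == b.len() && b.len() == c.len());
    let vector_part = a.len() - a.len() % LANES;

    // Full vectors: one step here corresponds to one vector instruction.
    for start in (0..vector_part).step_by(LANES) {
        for lane in 0..LANES {
            c[start + lane] = a[start + lane] + b[start + lane];
        }
    }

    // Remainder: elements that do not fill a whole vector.
    for i in vector_part..a.len() {
        c[i] = a[i] + b[i];
    }
}

fn main() {
    let a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0];
    let b = [10.0; 6];
    let mut c = [0.0; 6];
    vadd(&a, &b, &mut c);
    println!("{:?}", c);
}
```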

Example: OpenMP 4

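void vadd_openmp(float *a, float *b, float *c, int len)
{
    /* Copy a and b to the accelerator; copy c back when the region ends. */
    #pragma omp target map(to:a[0:len],b[0:len],len) map(from:c[0:len])
    {
        int i;
        /* Distribute the loop iterations across threads on the device. */
        #pragma omp parallel for
        for (i = 0; i < len; i++)
            c[i] = a[i] + b[i];
    }
}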
- In the map clause, "to" copies data to the accelerator and "from" copies it back

Conclusion

  • New languages and paradigms can provide big benefits
    • Easier development
    • Easier-to-maintain code
    • Utilize new types of hardware
  • They need to overcome some significant challenges
    • Large existing codebase/ecosystem
    • Raw speed
  • Nothing can replace C/C++/Fortran right now
    • Rust looks promising
