Embedded Domain-specific Languages in C++




My talk at the C++ Users Group Dortmund/Bochum: "Domain-specific Languages in C++"

On GitHub: flanfly/dsel-in-cpp

Kai Michaelis // @_cibo_

Domain-specific Languages

Graphical Programming (LabVIEW)

Spreadsheet (Excel)

Query Languages (SQL, Datalog, Regexp)

It's all about Abstractions

Function

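#include <cmath>

// Newton's method (assumes x >= 0)
double square_root(double x)
{
  double ret = x;

  // Relative tolerance, so that large x cannot loop forever
  while(std::abs(ret * ret - x) > 0.0001 * x)
  {
    ret = (ret + x / ret) / 2;
  }

  return ret;
}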

Parameterized Types

template<typename T>
struct linlist
{
  linlist<T> *next, *prev;
  T value;
};

Polymorphism

class operation_base
{
public:
  virtual ~operation_base() = default;
  virtual void doit() = 0;
};

Metalinguistic Abstraction

std::cout << "Hello, World" << std::endl;
std::cout << std::hex << std::setfill('0') << std::setw(8) << 1337 << std::endl;
vs.
printf("%08x\n",1337);
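#include <iostream>
#include <string>

struct custom_type
{
	int a;
	std::string b;
};

// Must be a free function: the stream is the left operand of <<
std::ostream& operator<<(std::ostream& os, const custom_type& ct)
{
	os << ct.a << ": " << ct.b;
	return os;
}

int main()
{
	custom_type ct{1,"one"};

	// Prints "1: one" into standard output
	std::cout << ct << std::endl;
}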

Overloadable operators in C++

a = b
a += b
a -= b
a *= b
a /= b
a %= b
a &= b
a |= b
a ^= b
a <<= b
a >>= b
+a
-a
~a
a + b
a - b
a * b
a / b
a % b
a & b
a | b
a ^ b
a << b
a >> b
!a
++a
--a
a++
a--
a && b
a || b
a == b
a <= b
a >= b
a < b
a > b
a != b
a[b]
*a
a(a1, a2, ...)
(type)a
a, b

Conventionally return bool (the language does not require it)

!a
a && b
a || b
a == b
a <= b
a >= b
a < b
a > b
a != b

Conventionally return an lvalue reference (the language does not require it)

a = b
a += b
a -= b
a *= b
a /= b
a %= b
a &= b
a |= b
a ^= b
a <<= b
a >>= b
++a
--a
a[b]
*a
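The language only prescribes these return types for the built-in operators; an overloaded operator may return any type, and DSELs depend on exactly that. A minimal sketch, assuming a hypothetical query DSEL with made-up `column` and `condition` types, in which `==` and `&&` build a condition string instead of yielding `bool`:

```cpp
#include <string>

// Hypothetical column handle for a query DSEL
struct column { std::string name; };

// Hypothetical condition node: only the text of the condition
struct condition { std::string text; };

// operator== returns a condition, not bool: nothing is compared here
condition operator==(const column& c, int value)
{
  return condition{c.name + " = " + std::to_string(value)};
}

// operator&& combines two conditions into one.
// Note: an overloaded && always evaluates both operands (no short-circuit).
condition operator&&(const condition& lhs, const condition& rhs)
{
  return condition{"(" + lhs.text + " AND " + rhs.text + ")"};
}
```

With `column age{"age"}, id{"id"};`, the expression `(age == 55) && (id == 1)` yields a `condition` whose text is `(age = 55 AND id = 1)`.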

Simple example: Logical Implication

a ← b  (read "a if b", equivalent to a || !b)

#include <iostream>

struct impl_bool
{
	impl_bool(bool b) : inner(b) {}
	bool inner;
};

// 0_b and 1_b create impl_bool values
impl_bool operator""_b(unsigned long long b) { return impl_bool(b != 0); }

// "a <- b" is tokenized as "a < (-b)": unary minus does nothing,
// operator< computes the implication
impl_bool operator-(impl_bool b) { return b; }
impl_bool operator<(impl_bool b1, impl_bool b2)
	{ return !b2.inner || (b1.inner && b2.inner); }
std::ostream& operator<<(std::ostream& os, impl_bool b2)
	{ os << b2.inner; return os; }

int main(int argc, char** argv)
{
	std::cout << "false <- false = " << (0_b <- 0_b) << std::endl;
	std::cout << "false <- true  = " << (0_b <- 1_b) << std::endl;
	std::cout << "true <- false  = " << (1_b <- 0_b) << std::endl;
	std::cout << "true <- true   = " << (1_b <- 1_b) << std::endl;

	return 0;
}
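Because `a <- b` is just `a < (-b)`, the same trick works in constant expressions, so the truth table can be checked by the compiler. A sketch with a hypothetical `cimpl_bool` type and `_cb` suffix (not part of the talk); unlike the impl_bool example, `operator<` here returns plain `bool` for brevity:

```cpp
// Hypothetical constexpr variant of impl_bool
struct cimpl_bool
{
  bool inner;
};

constexpr cimpl_bool operator""_cb(unsigned long long b) { return cimpl_bool{b != 0}; }

// Unary minus is the identity; it only exists so that "<-" parses
constexpr cimpl_bool operator-(cimpl_bool b) { return b; }

// a <- b is true unless b is true and a is false
constexpr bool operator<(cimpl_bool a, cimpl_bool b) { return a.inner || !b.inner; }

static_assert((0_cb <- 0_cb) == true,  "false <- false");
static_assert((0_cb <- 1_cb) == false, "false <- true");
static_assert((1_cb <- 0_cb) == true,  "true <- false");
static_assert((1_cb <- 1_cb) == true,  "true <- true");
```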

Useful example: Datalog

Relational

Name                  Age
Scott Meyers          55
Bjarne Stroustrup     64
Andrei Alexandrescu   45
James Gosling         59

Datalog

Person("Scott Meyers",55).
Person("Bjarne Stroustrup",64).
Person("Andrei Alexandrescu",45).
Person("James Gosling",59).

Answer(X) :- Person(X,55).
% result: Answer("Scott Meyers").

Answer(X) :- Person(X,Y), Y ≤ 55.
% result: Answer("Scott Meyers"). Answer("Andrei Alexandrescu").

Answer(X) :- Person(X,Y), Y ≤ 45.
Answer(X) :- Person(X,Y), Y ≥ 60.
% result: Answer("Bjarne Stroustrup"). Answer("Andrei Alexandrescu").
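In relational terms, each of these rules is a selection followed by a projection on the Person relation. A sketch of the rule with `Y ≤ 55` in plain C++, assuming the relation is stored as a vector of (name, age) pairs; the function name `answer_at_most` is hypothetical:

```cpp
#include <string>
#include <utility>
#include <vector>

using person = std::pair<std::string, int>;

// Answer(X) :- Person(X,Y), Y <= max_age.
// Selection (the age filter) followed by projection (keep only the name).
std::vector<std::string> answer_at_most(const std::vector<person>& persons, int max_age)
{
  std::vector<std::string> answer;
  for (const auto& [name, age] : persons)
    if (age <= max_age)
      answer.push_back(name);
  return answer;
}
```

The Datalog version states only *what* belongs in Answer; the C++ version also fixes *how* it is computed.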
Edge("a","b"). Edge("a","c"). Edge("b","d"). Edge("c","e").
Edge("a","e"). Edge("a","b"). Edge("d","f"). Edge("e","f").

Path(X,Y) :- Edge(X,Y).
Path(X,Z) :- Path(X,Y), Edge(Y,Z).
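The recursive Path rule computes the transitive closure of Edge. One standard way to evaluate it is naive bottom-up evaluation: apply the rules repeatedly until no new facts appear. The sketch below assumes relations are sets of string pairs; `edge_set` and `eval_path` are hypothetical names, and this is not necessarily how the talk's `eval` works:

```cpp
#include <set>
#include <string>
#include <utility>

using edge_set = std::set<std::pair<std::string, std::string>>;

// Naive bottom-up evaluation of
//   Path(X,Y) :- Edge(X,Y).
//   Path(X,Z) :- Path(X,Y), Edge(Y,Z).
// Repeats the second rule until no new Path facts are derived (fixpoint).
edge_set eval_path(const edge_set& edge)
{
  edge_set path = edge;                  // first rule
  bool changed = true;
  while (changed)
  {
    changed = false;
    edge_set derived;
    for (const auto& [x, y] : path)      // second rule: join Path and Edge on Y
      for (const auto& [y2, z] : edge)
        if (y == y2 && path.count({x, z}) == 0)
          derived.insert({x, z});
    if (!derived.empty())
    {
      path.insert(derived.begin(), derived.end());
      changed = true;
    }
  }
  return path;
}
```

For the Edge facts above this yields 11 Path facts, e.g. Path("a","f"). A semi-naive evaluator would join only the facts derived in the previous round instead of all of `path`.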

Datalog as DSEL in C++

// Extensional
rel_ptr Edge(new relation());
insert(Edge,"a","b");
// ...

// Intensional
parse Path("Path");
Path("X"_dl,"Y"_dl) << Edge("X"_dl,"Y"_dl);
Path("X"_dl,"Z"_dl) << Path("X"_dl,"Y"_dl),Edge("Y"_dl,"Z"_dl);
struct parse
{
  parse(std::string n);

  template<typename... Tail>
  parse_i operator()(Tail&&... tail)
  {
    std::vector<variable> vars;
    fill(vars,std::forward<Tail>(tail)...);
    return parse_i(*this,vars);
  }

  std::string name;
  std::vector<rule_ptr> rules;
};
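The slide leaves `variable`, `fill` and the `"X"_dl` literal undefined. One plausible way to write them, assuming a `variable` is just a named placeholder; all three definitions are hypothetical:

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical Datalog variable: only a name such as "X"
struct variable
{
  std::string name;
};

// "X"_dl creates a variable named X
variable operator""_dl(const char* str, std::size_t len)
{
  return variable{std::string(str, len)};
}

// Recursion base case: no arguments left
inline void fill(std::vector<variable>&) {}

// Append the first argument, then recurse on the rest
template<typename Head, typename... Tail>
void fill(std::vector<variable>& vars, Head&& head, Tail&&... tail)
{
  vars.push_back(std::forward<Head>(head));
  fill(vars, std::forward<Tail>(tail)...);
}
```

In C++17 a fold expression could replace the recursion; the recursive form also compiles as C++11 apart from the literal operator, which is C++11 as well.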
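Path("X"_dl,"Z"_dl) << Path("X"_dl,"Y"_dl),Edge("Y"_dl,"Z"_dl);
// Every call yields a parse_i:
parse_i << parse_i,parse_i;
// Overloads that combine them into a rule (parse_h):
parse_h operator,(parse_h h, parse_i i);
parse_h operator<<(parse_i lhs, parse_i rhs);
// << binds tighter than the comma, so the expression reduces as:
parse_i << parse_i,parse_i;
parse_h,parse_i;
parse_h;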
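The reduction can be reproduced in a few self-contained lines. In this sketch `atom` and `rule` are hypothetical stand-ins for `parse_i` and `parse_h`, and `relation_name` replaces `parse::operator()`; the point is only that `head << first` starts a rule and each `, next` appends a body atom:

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins: atom plays the role of parse_i, rule of parse_h
struct atom
{
  std::string relation;
  std::vector<std::string> args;
};

struct rule
{
  atom head;
  std::vector<atom> body;
};

// head << first : start a rule with one body atom
rule operator<<(atom head, atom first)
{
  return rule{std::move(head), {std::move(first)}};
}

// rule , next : append another body atom
rule operator,(rule r, atom next)
{
  r.body.push_back(std::move(next));
  return r;
}

// Hypothetical relation handle whose call operator builds an atom
struct relation_name
{
  std::string name;

  template<typename... Args>
  atom operator()(Args... args) const
  {
    return atom{name, {std::string(args)...}};
  }
};
```

When storing the result, the whole expression needs parentheses, as in `rule r = (Path("X","Z") << Path("X","Y"), Edge("Y","Z"));`; otherwise the comma in a declaration is read as a declarator separator.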
std::map<std::string,rel_ptr> edb;
std::multimap<std::string,rule_ptr> idb;

std::for_each(Path.rules.begin(),Path.rules.end(),[&](rule_ptr r)
  { idb.insert(std::make_pair(r->head.name,r)); });
edb.insert(std::make_pair("Edge",Edge));

rel_ptr res = eval("Path",idb,edb);

Practical example: Disassembler Framework

AMD64 a.k.a. Intel 64 a.k.a. x86_64

  • Backward compatible with 30-year-old 8086/8088 assembly.
  • Endless instruction set extensions: MMX, SSE1-4, AVX1/2, SMX...
  • Multiple modes and privilege levels.
  • Registers were extended from 16 bits to 32 bits to 64 bits over time.
main[ *generic_prfx >>                 0x14 >> imm8  ]
main[ *generic_prfx >> opsize_prfx >>  0x15 >> imm16 ]
main[ *generic_prfx >>                 0x15 >> imm32 ]
main[ *generic_prfx >> rexw_prfx >>    0x15 >> imm32 ]
main[ *generic_prfx >>                 0x80 >> rm8_2 >> imm8 ]
main[ *generic_prfx >> rex_prfx >>     0x80 >> rm8_2 >> imm8 ]
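How these rules become data is not shown here. A minimal sketch, assuming each rule is an expression tree: unary `*` wraps its operand in a repetition node, `>>` appends to a sequence, and integer literals become opcode tokens. The `expr` type and the `token`/`show` helpers are hypothetical; the real framework also matches bytes and runs actions, which this omits:

```cpp
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Hypothetical grammar node: a token, a sequence, or zero-or-more repetition
struct expr
{
  enum class kind { token, seq, star };
  kind k;
  std::string text;          // token name
  std::vector<expr> parts;   // operands of seq and star
};

expr token(std::string name) { return expr{expr::kind::token, std::move(name), {}}; }

// a >> b : sequence, flattened so that a >> b >> c has three parts
expr operator>>(expr a, expr b)
{
  if (a.k != expr::kind::seq)
    a = expr{expr::kind::seq, "", {a}};
  a.parts.push_back(std::move(b));
  return a;
}

// a >> 0x14 : an integer is a literal opcode byte
expr operator>>(expr a, int opcode)
{
  char buf[16];
  std::snprintf(buf, sizeof buf, "0x%02x", static_cast<unsigned>(opcode));
  return std::move(a) >> token(buf);
}

// *a : zero or more repetitions of a
expr operator*(expr a) { return expr{expr::kind::star, "", {std::move(a)}}; }

// Render the tree, e.g. "(generic_prfx* 0x14 imm8)"
std::string show(const expr& e)
{
  switch (e.k)
  {
    case expr::kind::token: return e.text;
    case expr::kind::star:  return show(e.parts[0]) + "*";
    case expr::kind::seq:
    {
      std::string s = "(";
      for (std::size_t i = 0; i < e.parts.size(); ++i)
        s += (i ? " " : "") + show(e.parts[i]);
      return s + ")";
    }
  }
  return "";
}
```

Unary `*` binds tighter than `>>`, and `>>` groups left to right, so `*generic_prfx >> 0x14 >> imm8` always has an `expr` on the left of every `>>`; no overload with an `int` on the left is needed.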
imm8 [ "imm@........"_e] = [](sm& st)
{
	st.state.imm = constant(st.capture_groups.at("imm"));
};
imm16[ imm8 >> "imm@........"_e] = [](sm& st)
{
	st.state.imm = constant(be16toh(st.capture_groups.at("imm")));
};
imm32[ imm16 >> "imm@........"_e >> "imm@........"_e] = [](sm& st)
{
	st.state.imm = constant(be32toh(st.capture_groups.at("imm")));
};
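The `"imm@........"_e` patterns read like bit patterns with a named capture group. A sketch of matching one byte against such a pattern, under an assumed syntax (optional `name@` prefix, then 8 characters, most significant bit first: `0`/`1` match exactly, `.` matches any bit); `match_byte` is hypothetical, not the framework's API. With this stream-order concatenation the bytes 0x12 0x34 yield 0x1234, while x86 stores immediates little-endian; that may be why the slide applies `be16toh`/`be32toh` on a little-endian host, though the slides do not say so:

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical matcher for one byte against a pattern such as "imm@........".
// Bits matched by '.' are appended to the named capture group.
bool match_byte(const std::string& pattern, std::uint8_t byte,
                std::map<std::string, std::uint64_t>& groups)
{
  std::string name;
  std::string bits = pattern;
  const auto at = pattern.find('@');
  if (at != std::string::npos)
  {
    name = pattern.substr(0, at);
    bits = pattern.substr(at + 1);
  }
  if (bits.size() != 8)
    return false;

  std::uint64_t captured = 0;
  unsigned width = 0;
  for (int i = 0; i < 8; ++i)
  {
    const unsigned bit = (byte >> (7 - i)) & 1u;  // MSB first
    switch (bits[i])
    {
      case '0': if (bit != 0) return false; break;
      case '1': if (bit != 1) return false; break;
      case '.': captured = (captured << 1) | bit; ++width; break;
      default:  return false;  // unknown pattern character
    }
  }

  // Only touch the capture groups after the whole byte matched
  if (!name.empty() && width > 0)
  {
    std::uint64_t& group = groups[name];
    group = (group << width) | captured;
  }
  return true;
}
```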

Positive:

  • Nearly one-to-one relationship between Intel spec and disassembler code.
  • Easier maintenance and debugging.
  • Simple addition of edge cases.

Negative:

  • Large amount of code compared to conventional implementations.
  • Not easily understood.
  • Compared to a "real" DSL, the code is still tied to C++.

Other DSEL in C++

  • Xpressive: Regular expressions embedded in C++ http://www.boost.org/doc/libs/1_55_0/doc/html/xpressive/user_s_guide.html
  • Spirit Parser Framework: DSEL to specify type 2 grammars in C++ http://boost-spirit.com/home/