PhD Thesis Introduction – Extracting A Knowledge Base a.k.a. Traces – Formal Modeling






PhD Thesis Introduction

William Durand - May 31, 2013

PhD Topic

Automated Test Generation for applications and production machines in a Model-based Testing approach.


In Other Words

Given a piece of software running in a production environment, would it be possible to:

extract a knowledge base, formalize it as a model, and use that model to generate tests and/or specifications?

World Domination Plan

  • January - June 2013: reading and creating a snapshot of all Level 2 applications;
  • June - September 2013: extracting traces from existing applications;
  • September 2013 - March 2014: generating partial models;
  • March 2014 - February 2015: performing robustness testing on applications based on partial models;
  • February 2015 - August 2016: performing conformance testing on applications based on partial models, reusing the tests generated in the previous step;
  • August - December 2016: results analysis, the end!

Extracting A Knowledge Base a.k.a. Traces

Context (1/2)

Michelin has many applications spread across different levels:

  • L4: Business Software
  • L3: Factory Management (a rather virtual level, as it is not used much)
  • L2: Supervision / Workshop Management
  • L1: Automata

These levels can exchange data with one another.

Context (2/2)

We focus on Level 2 applications, but even these differ from one another in many ways, such as:

  • Programming Language
  • Framework
  • Design
  • Version

Hypotheses

  • Applications deployed in production behave as expected;
  • Existing specifications, if any, are not taken into account.

So?

Given how much Level 2 applications differ, the best way to create a technology-agnostic knowledge base is to use a black-box approach.

How?

The idea is to monitor a running application and record its incoming and outgoing data; these records are what we call traces.

The second step is to leverage these traces in order to build a formal model.
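The first step, recording traces, can be illustrated with a minimal Python sketch. It assumes, for the sake of the example, that the application can be reduced to a function mapping an incoming message to an outgoing one; `record_traces` and `echo_service` are hypothetical names, not part of the original work.

```python
from typing import Any, Callable, Dict, List

Trace = Dict[str, Any]


def record_traces(handler: Callable[[Any], Any], traces: List[Trace]) -> Callable[[Any], Any]:
    """Wrap a handler so that every input/output pair is appended to `traces`.

    The handler is treated as a black box: only what goes in and what
    comes out is observed, never its internals.
    """
    def monitored(incoming: Any) -> Any:
        outgoing = handler(incoming)
        traces.append({"in": incoming, "out": outgoing})
        return outgoing

    return monitored


# Hypothetical application under observation.
def echo_service(message: str) -> str:
    return message.upper()


traces: List[Trace] = []
service = record_traces(echo_service, traces)
service("start")
service("stop")
print(traces)  # [{'in': 'start', 'out': 'START'}, {'in': 'stop', 'out': 'STOP'}]
```

The wrapper does not need to know anything about the application's language, framework or design, which is the point of the black-box approach.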

Formal Modeling

Formal Model

A formal model of a system is a mathematical model of it, at some chosen level of abstraction.

Its purpose is to permit precise understanding, specification, and analysis of the system.

Why?

A formal model makes it possible to verify the system's properties more thoroughly than empirical testing does.

The metamodels used by most formal methods are often limited in order to enhance provability.

How?

An expert system can produce a model from given traces.

Expert System

In Artificial Intelligence (AI), an expert system is a computer system that emulates the decision-making ability of a human expert. It is designed to solve complex problems by reasoning about knowledge, rather than by following a procedure written by a developer, as is the case in conventional programming.

How?

An expert system relies on a set of inference rules.

First-order predicate calculus is a way to write these rules.

Once we obtain a decent model, we can use it to generate tests and specifications.
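To make the role of inference rules more concrete, here is a tiny forward-chaining engine in Python, one common way for an expert system to apply its rules: each rule checks the known facts and, when its condition holds, adds a new fact, until nothing new can be inferred. The facts and rules are hypothetical, and they are simplified to conditions over plain facts; real first-order rules would also bind variables.

```python
from typing import Callable, List, Set, Tuple

# A rule pairs a condition over the current facts with the fact it concludes.
Rule = Tuple[Callable[[Set[str]], bool], str]


def forward_chain(facts: Set[str], rules: List[Rule]) -> Set[str]:
    """Apply the rules repeatedly until no new fact can be inferred."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for condition, conclusion in rules:
            if conclusion not in known and condition(known):
                known.add(conclusion)
                changed = True
    return known


# Hypothetical rules: the second one can only fire after the first one has.
rules: List[Rule] = [
    (lambda f: {"has_username", "has_password"} <= f, "has_credentials"),
    (lambda f: {"has_credentials", "is_post"} <= f, "is_login_step"),
]

inferred = forward_chain({"is_post", "has_username", "has_password"}, rules)
print(sorted(inferred))
# ['has_credentials', 'has_password', 'has_username', 'is_login_step', 'is_post']
```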

Generating Tests And Specifications

Which Tests?

Only functional tests will be generated.

A formal model gives us the ability to test the robustness of the applications as well as their conformance.

Which Specifications?

No definitive answer yet, but here are some ideas:

  • Functional Specification
  • Use Case
  • User Story
  • API documentation

Work In Progress

Web Applications Testing Based On HTTP Traces

It is the same thing but... different.

Why?

To get an overview of the whole approach.

To detect potential issues before building the real thing.

Extracting A Knowledge Base a.k.a. Traces

Traces are stored using HTTP Archive (HAR), a JSON-based archive format for HTTP transactions captured by HTTP sniffers.

Traces can be recorded either in a web browser, or by a proxy that records all the traffic (not implemented yet).

This matches the black-box approach described earlier. Moreover, HTTP traces are understandable by a domain expert, and easy to process.

{
  "request": {
    "method": "GET",
    "url": "https://github.com/",
    "httpVersion": "HTTP/1.1",
    "headers": [],
    "queryString": [],
    "cookies": []
  },
  "response": {
    "status": 200,
    "statusText": "OK",
    "headers": [],
    "cookies": [],
    "content": {}
  }
}
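In a HAR file, entries like this one sit in the `entries` array of a top-level `log` object, one request/response pair per transaction. As a rough sketch, they could be read in Python with the standard `json` module; the sample document below is trimmed to the fields used here, so it is not a complete HAR file.

```python
import json
from typing import Any, Dict, List


def load_entries(har_text: str) -> List[Dict[str, Any]]:
    """Return the recorded request/response pairs of a HAR document."""
    return json.loads(har_text)["log"]["entries"]


def summarize(entry: Dict[str, Any]) -> str:
    """Render one entry as 'METHOD URL -> STATUS'."""
    request, response = entry["request"], entry["response"]
    return f'{request["method"]} {request["url"]} -> {response["status"]}'


# Trimmed sample: a real HAR file carries many more fields per entry.
har_text = json.dumps({
    "log": {
        "version": "1.2",
        "entries": [
            {
                "request": {"method": "GET", "url": "https://github.com/"},
                "response": {"status": 200, "statusText": "OK"},
            }
        ],
    }
})

for entry in load_entries(har_text):
    print(summarize(entry))  # GET https://github.com/ -> 200
```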

Filtering Traces

First, noise has to be removed from the set of traces: everything is recorded, but not everything is relevant.
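What counts as noise depends on the application. As a sketch only, the Python filter below assumes two hypothetical criteria: requests for static assets (identified by their file extension) and requests sent to hosts other than the application under test. The extension list and the `stats.example.com` host are illustrative.

```python
from typing import Any, Dict, List
from urllib.parse import urlparse

# Hypothetical noise criterion: URL paths ending with a static-asset extension.
STATIC_EXTENSIONS = (".css", ".js", ".png", ".jpg", ".gif", ".ico", ".woff")


def is_relevant(entry: Dict[str, Any], app_host: str) -> bool:
    """Keep requests sent to the application itself for non-static resources."""
    url = urlparse(entry["request"]["url"])
    if url.hostname != app_host:
        return False  # third-party traffic (analytics, CDNs, ...)
    return not url.path.lower().endswith(STATIC_EXTENSIONS)


def sanitize(entries: List[Dict[str, Any]], app_host: str) -> List[Dict[str, Any]]:
    return [entry for entry in entries if is_relevant(entry, app_host)]


entries = [
    {"request": {"method": "GET", "url": "https://github.com/"}},
    {"request": {"method": "GET", "url": "https://github.com/assets/app.css"}},
    {"request": {"method": "GET", "url": "https://stats.example.com/collect"}},
    {"request": {"method": "POST", "url": "https://github.com/session"}},
]

for entry in sanitize(entries, "github.com"):
    print(entry["request"]["method"], entry["request"]["url"])
# GET https://github.com/
# POST https://github.com/session
```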

Sanitized Traces

Formal Modeling

Well, not so formal for now...

Example of First-Order Predicate Calculus

IF
    "request.method" IS POST
    AND "request.data" CONTAINS A USERNAME
    AND "request.data" CONTAINS A PASSWORD
    AND "previous_request" WAS A LOGIN PAGE
THEN
    // this is a login step
ENDIF

First-Order Logic connectives: AND, OR, NOT

Predicates: CONTAINS A USERNAME, CONTAINS A PASSWORD, WAS A LOGIN PAGE

Thanks to this inference rule, the two "steps" highlighted in red are combined into a single one named Login.
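The rule above could be sketched in Python as follows, assuming each step is a dictionary with a method, a URL and optional form data. The three predicates are hypothetical heuristics (form field names and a `/login` URL suffix), not the ones used in this work; when the rule fires, the POST is merged with the login page that precedes it.

```python
from typing import Any, Dict, List, Optional

Step = Dict[str, Any]


# Hypothetical predicates: naive heuristics on form field names and URLs.
def contains_username(data: Dict[str, str]) -> bool:
    return any("user" in name.lower() or "login" in name.lower() for name in data)


def contains_password(data: Dict[str, str]) -> bool:
    return any("password" in name.lower() for name in data)


def was_login_page(step: Optional[Step]) -> bool:
    return step is not None and step["method"] == "GET" and step["url"].endswith("/login")


def is_login_step(step: Step, previous: Optional[Step]) -> bool:
    """IF method IS POST AND data CONTAINS A USERNAME AND A PASSWORD
    AND the previous request WAS A LOGIN PAGE THEN this is a login step."""
    data = step.get("data", {})
    return (
        step["method"] == "POST"
        and contains_username(data)
        and contains_password(data)
        and was_login_page(previous)
    )


def apply_login_rule(steps: List[Step]) -> List[str]:
    """Merge each (login page, credentials POST) pair into a single 'Login' step."""
    model: List[str] = []
    previous: Optional[Step] = None
    for step in steps:
        if is_login_step(step, previous):
            model[-1] = "Login"  # the previous step was the login page
        else:
            model.append(f'{step["method"]} {step["url"]}')
        previous = step
    return model


steps = [
    {"method": "GET", "url": "https://github.com/login"},
    {"method": "POST", "url": "https://github.com/session",
     "data": {"login": "alice", "password": "secret"}},
    {"method": "GET", "url": "https://github.com/alice"},
]
print(apply_login_rule(steps))  # ['Login', 'GET https://github.com/alice']
```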

Result Model

This model shows a real and meaningful user scenario.

Generating Tests And Specifications

This model allows us to generate test code; however, more rules have to be defined in order to produce assertions.

Assertion

A confident and forceful statement of fact or belief.

In testing, it is used to check a property in a test method:

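using NUnit.Framework;

[TestFixture]
public class RepoTest
{
    [Test]
    public void TestStatusIsUnknownByDefault()
    {
        // A newly created repository must start with an unknown status
        Repo repository = new Repo();
        Assert.That(repository.Status, Is.EqualTo(Status.Unknown));
    }
}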

Code Generation (WIP)

<?php

namespace Generated;

class MyTest extends FunctionalTestCase
{
    public function test0()
    {
        // Replay the GET request recorded in the trace
        $crawler  = $this->cli->request('GET', 'https://github.com/');
        $response = $this->cli->getResponse();

        // assertions here
    }
}
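One possible assertion rule, assuming that replaying a recorded request should yield the recorded status code, is to assert on the response status. The Python sketch below generates a test class shaped like the one above from HAR-like entries. `FunctionalTestCase` and `$this->cli` come from the slide; the generated `assertEquals(...)` / `getStatusCode()` line is an assumption about the test client (they are the PHPUnit assertion and the Symfony HttpFoundation `Response` method).

```python
from typing import Any, Dict, List

# Skeleton of the generated PHP class; doubled braces are literal braces.
CLASS_TEMPLATE = """<?php

namespace Generated;

class MyTest extends FunctionalTestCase
{{
{methods}
}}
"""

# One test method per recorded entry. URLs are assumed not to contain quotes.
METHOD_TEMPLATE = """    public function test{index}()
    {{
        $crawler  = $this->cli->request('{method}', '{url}');
        $response = $this->cli->getResponse();

        // Assertion rule (assumption): the replayed request returns the recorded status
        $this->assertEquals({status}, $response->getStatusCode());
    }}"""


def generate_test_class(entries: List[Dict[str, Any]]) -> str:
    methods = [
        METHOD_TEMPLATE.format(
            index=i,
            method=entry["request"]["method"],
            url=entry["request"]["url"],
            status=entry["response"]["status"],
        )
        for i, entry in enumerate(entries)
    ]
    return CLASS_TEMPLATE.format(methods="\n\n".join(methods))


entries = [
    {"request": {"method": "GET", "url": "https://github.com/"}, "response": {"status": 200}},
]
print(generate_test_class(entries))
```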

What About Generating A Specification?

At this time, it is only possible to retrieve scenarios.

Retrieving use cases may be possible by using a combination of inference rules and language recognition.

This has not been explored yet.

Perspectives

  • Consolidating the abstract model by exploring new paths and gathering more sets of traces.
  • Reusing these principles for Michelin's applications.

Thank You.

Questions?

William Durand - PhD student at Michelin/LIMOS