Interprocess Communication

Pierre Rioux

ACE lab Developer MeetingJune 2016

Table of content

Prologue: what are processes?
IPC mechanisms:
- signals
- environment variables
- files
- pipes
- fifos
- sockets
Epilogue: Quiz!

About processes

The obvious stuff: a UNIX process is:

an area of a computer's memory
where the processor executes some instructions
and manages some data

Processes attributes 1

Processes have these attributes:

a unique ID (called pid, 1 to 65535)
a UNIX owner
a UNIX group
various time accounting

These things aren't that important for this discussion.

Processes attributes 2

The operating system manages all the processes.

processes are organized in a tree
each process has a parent process
so each process also as a parent process ID (ppid)

A tree of processes

Some important attributes

All processes have these three things which will be important:

environment variables
signal handlers
file descriptors

These are the essential components of IPC.

How processes are created

Surprisingly, the operating system has no mechanism for creating new processes from scratch.

A process can only be created by another process...

... by making a perfect copy of itself.

This is also how the tree structure emerges. The copy becomes a child of the original process.

How processes are destroyed

They can simply finish their job
They can be terminated by the operating system or by other processes
And then the tree structure gets adjusted (no details here)

What IPC is about

This presentation is about the general concept of how two processes can send information to each other.

Two IPC situations...

Two processes on the same computer

Two processes on the different computers

One at a time!

Wait! What if the two processes run separately:

one after the other...
never at the same time...

... there is only one way they can send information to each other. Can you guess how?

By creating files.

Signals

Processes can send and receive signals.

They are a sort of ultra-simple message.

The only content is the type of signal, a simple number.

the types are encoded as a number from 1 to N
MacOS X (BSD-like) has about 30 signals
Linux has about 60 signals
they act like very quick pings
for convenience, short names are given to each type

E.g: Signal #11 is called SIGSEGV.

List of signals

You can get a full list of signals using the kill -l command:

macosx% kill -l
 1) SIGHUP       2) SIGINT       3) SIGQUIT      4) SIGILL
 5) SIGTRAP      6) SIGABRT      7) SIGEMT       8) SIGFPE
 9) SIGKILL     10) SIGBUS      11) SIGSEGV     12) SIGSYS
13) SIGPIPE     14) SIGALRM     15) SIGTERM     16) SIGURG
17) SIGSTOP     18) SIGTSTP     19) SIGCONT     20) SIGCHLD
21) SIGTTIN     22) SIGTTOU     23) SIGIO       24) SIGXCPU
25) SIGXFSZ     26) SIGVTALRM   27) SIGPROF     28) SIGWINCH
29) SIGINFO     30) SIGUSR1     31) SIGUSR2

Signals On OS X

NAME            Default Action          Description
SIGHUP          terminate process       terminal line hangup
SIGINT          terminate process       interrupt program
SIGQUIT         create core image       quit program
SIGILL          create core image       illegal instruction
SIGTRAP         create core image       trace trap
SIGABRT         create core image       abort(3) call (formerly SIGIOT)
SIGEMT          create core image       emulate instruction executed
SIGFPE          create core image       floating-point exception
SIGKILL         terminate process       kill program
SIGBUS          create core image       bus error
SIGSEGV         create core image       segmentation violation

... etc...

Ref: sigvec(2)

Signals on Linux

Signal     Value     Action   Comment
SIGHUP        1       Term    Hangup detected on controlling terminal
                              or death of controlling process
SIGINT        2       Term    Interrupt from keyboard
SIGQUIT       3       Core    Quit from keyboard
SIGILL        4       Core    Illegal Instruction
SIGABRT       6       Core    Abort signal from abort(3)
SIGFPE        8       Core    Floating point exception
SIGKILL       9       Term    Kill signal
SIGSEGV      11       Core    Invalid memory reference

... etc...

Ref: signal(7)

Sending signals 1

Signals are sent from one process to another on the same computer.

To send a signal you need two values:

the number for the type of signal, e.g. 11
(though it's better to use constants, e.g. SIGSEGV instead)
the process ID (pid) of the receiver
most languages have a simple method or function that takes these two values and send the signal

Sending signals 2

These examples will all send the signal #15, SIGTERM to a process with pid 1234 :

Bash:
```
kill -TERM 1234 # same as kill -15 1234
```

Perl:

kill(SIGTERM,1234); # SIGTERM is a constant

PHP:
```
posix_kill(1234, SIGTERM);
```
Ruby:
```
Process.kill("TERM", 1234)
```

Receiving signals 1

A process can set up handlers to run a piece of code when a signal is received.

not all signals can be handled (e.g. #9 a.k.a. SIGKILL)
the handler is run asynchronously from the code of the process
handlers are not re-entrant
only one signal at a time can be handled
there is a default behavior for most signals
you can't tell what other process has sent the signal!

Receiving signals 2

In all languages, this consists in setting up a method or subroutine to handle a signal, and then telling the OS to invoke that method when the signal is received.

Bash:

function oh_no { echo Hello }   
trap oh_no SIGUSR1

Perl:

sub oh_no {
  print "Hello\n"
}
$SIG{'USR1'} = \&oh_no;

PHP:

function oh_no($s) { echo "Hello" }   
pcntl_signal(SIGUSR1, "oh_no");

Ruby:

Signal.trap("USR1") do
  puts "Hello"
end

Common signals

Signal name Default effect Trappable? Use it for INT (2) terminates process Yes User wants out? TERM (15) terminates process Yes Clean up properly KILL (9) terminates process Nope! Can't do anything about it USR1, USR2 No effect Yes! Anything you want

Example A, 1

Script test_inc.rb

#!/usr/bin/ruby

float   = 0.0
counter = 0

while true do
  float = float.next_float
  counter += 1
  puts "Float=" + float.to_s if counter % 1000000 == 0
end

We get about two reports per second on my Mac mini:

unix% ruby test_inc.rb
Float=4.940656e-318
Float=9.881313e-318
Float=1.482197e-317
Float=1.9762626e-317
Float=2.470328e-317
Float=2.964394e-317
...

Example A, 2

Improved script test_inc.rb

#!/usr/bin/ruby

float = 0.0
Signal.trap("USR1") { puts "Float=" + float.to_s }

while true do
  float = float.next_float
end

In two terminals:

unix% ruby test_inc.rb                 
Float=1.158144256e-315                          
Float=1.826497665e-315                          
Float=1.014102923e-314

unix% kill -USR1 3272                        
unix% kill -USR1 3272                        
unix% kill -USR1 3272

Example B, 1

In CBRAIN we have a maintenance process that regularly archives some tasks' work directories. The main Ruby loop looks like this:

# Ruby from CBRAIN (simplified)
task_list.each do |task|   # iterates over each task, likely hundreds
  task.archive_work_dir()  # this can take several minutes
done

Sometimes, maintenance (or shutdowns) need to be performed while this process has already started.

If we kill this process at an arbitrary time, we will likely end up with:

work directories only partly archived
the database state inconsistent
probably leftover temporary files

Example B, 2

What we do instead is allow the admin to tell the process to finish up whatever task is currently being archived, then stop before doing the next.

# Ruby from CBRAIN (simplified)

keep_going = true

Signal.trap("TERM")  { keep_going = false }

task_list.each do |task|   # iterates over each task, likely hundreds
  task.archive_work_dir()  # this can take several minutes
  break if ! keep_going    # break from the loop if signal received!
done

This way we know that no task is left at an inconsistent state.

Environment variables

Each process maintains a set of environment variables.

they are simple pairs of key=value
all keys and values are strings
by convention, keys are UPPERCASE (but not enforced)
e.g. USER=prioux
a few have special meanings for the operating system (e.g. PATH)
but most are arbitrary strings that programs agree on by convention or by specification

Misunderstandings

People think setting environment variables affect magically all the programs they run. In fact:

each process has its own private set of environment variables
a process can set, change or delete its own variables
but it is not possible for a process to change the environment variables of any other process

Puzzle: so how can this be even used for IPC ?

Environment variablesare inherited

Processes are created by forking from other processes.

a parent process #123 sets ABC=def in its environment
when the process fork(), the child process #987 also has ABC=def
all environment variables are copied exactly, in fact

The end result is that a process has a capability to transfer a set of X=y values to their child processes.

Obviously, this communication mechanism is unidirectional: the information flows only from parents to their children.

Also, it can only be done once, just before the child process is created.

fork() often leads to exec()

If processes create children that are identical, how can there be processes of different types?

yes, after fork(), the child process is identical
but often, the child process elects to change its executable code by invoking exec()
this replaces the code, but keeps the environment variables intact

From the shell (bash, sh, tcsh etc)

Interactively, the most common situation is:

set environment variables of a shell process
the shell itself likely doesn't care about those variables
but other processes launched will inherit them and use them

unix% export VISUAL=/bin/vim  # how bash sets its own environment variables
unix% git commit              # will launch vim to edit commit message
unix% export VISUAL=/bin/nano
unix% git commit              # now will launch nano to edit commit message

The env command 1

One of the simplest example is the UNIX command called env .

When run with no arguments:

it just prints all the environment variables it has received from its parent

unix% env
ABC=def

This is a great way to inspect what are the environment variables currently set in your shell.

The env command 2

When run with some ENV assignments in front of another command in argument...

env A=b C=d E=f command arg1 arg2 arg3 ...

it will change its own environment (A=b, C=d, etc)
then launch the command as a subprocess, which will inherit those variables
the environment of the shell running env is, itself, of course not modified in any way

unix% env
ABC=def
unix% env TMPDIR=/home/prioux/tmp minc_scramble abcde_t1.mnc.gz
minc scramble 1.0 : file abcde_t1.mnc.gz scrambled in /home/prioux/tmp/abcde_t1.mnc.gz
unix% env
ABC=def
unix% export TMPDIR=/home/prioux/tmp  # change this in SHELL level now
unix% env
ABC=def
TMPDIR=/home/prioux/tmp

System variables: examples

(Very few) environment variables have system meanings; the values they have directly affect some system aspect of the process.

Name Role Example PATH dirs to search for executables (when running exec())

/bin:/usr/bin:/sbin

LD_LIBRARY_PATH dirs to search for libraries

/lib:/usr/lib

User or app variables: examples

Most environment variables are specific to some programs, or are defined by conventions.

Name Role Example VISUAL if set, what program to use to launch a visual text editor

/bin/vim

EDITOR if set, what program to use to launch a text editor

/bin/ed

PAGER if set, what program to use to launch a pager

/usr/bin/less

DISPLAY X-window client programs get the socket to the X-window server

/tmp/socks-5/xwin:0

LESS options for the pager less

MeqisXRF

In the manual pages

Most manual pages describe, at their end, the environment variables used by their programs.

unix% man less
LESS(1)                                               LESS(1)

NAME
       less - opposite of more

SYNOPSIS
       less -?
       less --help
       less -V

(skipped through using 'more')

ENVIRONMENT VARIABLES
       EDITOR The name of the editor (used for the v command).

       LANG   Language for determining the character set.

       LESS   Options which are passed to less automatically.

(skipped through using 'more')

unix% export PAGER=less    # from now on I page using 'less'
unix% export LESS=MeqisXRF # also, 'less' will default to these options

Example with 'mt'

mt is a UNIX command to manipulate magnetic tapes.

unix% mt -f /dev/nst0 status
unix% mt -f /dev/nst0 rewind
unix% mt -f /dev/nst0 fsf 3
unix% man mt # can I make this simpler?

unix% export TAPE=/dev/nst0  # seems the mt commands knows how to use this
unix% mt status
unix% mt rewind
unix% mt fsf 3

'source' statements

Some text files, known as initialization scripts, often set environments variables.

Some are application-specific:e.g. the init.sh in the MNI Quarantine.

Some are used for user preferences or customizations:e.g. $HOME/.bashrc

In all cases, the script must be sourced, not executed.

unix% cat init.sh
export ABC=def
unix% bash init.sh   # will not do anything: it's a distinct subprocess!
unix% echo $ABC
unix% source init.sh # affects current shell: it reads the lines itself
unix% echo $ABC
def

File descriptors

Each process maintains a private list of file descriptors (FDs)

it's an array, indexed at 0
each entry corresponds to communication channel
they may not all be in use
they can be unidirectional or bidirectional

FDs for other things than files

Typically they are connected to files... but they can be many other things too:

anonymous pipes to other processes
named pipes
network sockets
UNIX-domain sockets
block devices (disks, tapes etc)
character devices (keyboards, ttys, speakers etc)

Processes can have lots of FDs

Modern LINUX systems allocate a lot of FDs to each process.

It used to be that a process could have a maximum of something like 64 FDs, nowadays it's more like 1024! Below: file descriptors for process mysqld on Ludmer server (excerpt)

COMMAND  FD   TYPE       DEVICE   SIZE/OFF NAME
mysqld    0r   CHR          1,3        0t0 /dev/null
mysqld    1w   REG          8,5      18042 /var/log/mysqld.log
mysqld    2w   REG          8,5      18042 /var/log/mysqld.log
mysqld    3uW  REG          8,5 1226833920 /var/lib/mysql/ibdata1
mysqld    4u   REG          8,4          0 /tmp/ibHSKNG9 (deleted)
mysqld    5u   REG          8,4        473 /tmp/ibP57ycj (deleted)
mysqld    6u   REG          8,4        103 /tmp/ibLQzkIs (deleted)
mysqld    7u   REG          8,4          0 /tmp/ibfRYmeC (deleted)
mysqld    8uW  REG          8,5    5242880 /var/lib/mysql/ib_logfile0
mysqld    9uW  REG          8,5    5242880 /var/lib/mysql/ib_logfile1
mysqld   10u  IPv4     55802736        0t0 *:mysql (LISTEN)
mysqld   11u   REG          8,4       1467 /tmp/ibROAYLL (deleted)
mysqld   12u  unix 0xfff882ed80        0t0 /var/lib/mysql/mysql.sock
mysqld   13u  IPv4     61704300        0t0 192.168.122.1:mysql->ccna:48212 (ESTABLISHED)
mysqld   15u   REG          8,5      30720 /var/lib/mysql/wp_ludmer/wp_options.MYI
mysqld   16u   REG          8,5     580904 /var/lib/mysql/wp_ludmer/wp_options.MYD
mysqld   37u   REG          8,5       4096 /var/lib/mysql/canadachina/MMSE.MYI
mysqld   38u   REG          8,5       1704 /var/lib/mysql/canadachina/MMSE.MYD
mysqld   52u   REG          8,5      25300 /var/lib/mysql/wp_nist/wp_usermeta.MYD

FDs are reusable

File descriptors go through a cycle of re-use:

they are opened and connected to a resource data is sent and/or received they are closed they can be reopened to another resource etc

A common confusion

File descriptors are lower level than file handles.

What we typically call a file handle (FH) is often the product of a library or language providing an API to the file descriptors.

The most common one is the C stdio library, which works on structures that hide the file descriptors.

Opening a file in C using file descriptors

int fd;
fd = open("/tmp/my_file",O_RDONLY);

Opening a file in C using the stdio file handles

File *fh;
fh = fopen("/tmp/my_file","r");

The three standard FDs

A common convention for UNIX processes are three standard FDs

Standard input ('STDIN', file descriptor 0)a stream going into the process
Standard output ('STDOUT' file descriptor 1)a stream going out of the process
Standard error ('STDERR', file descriptor 2)a stream going out the process too

Standard streamsconnect to anything

Processes do not care what theirs standards streams are connected to.

They can be connected to files, the streams of other processes, terminals, etc, just like any other FDs.

Pipeline building blocks

The standard channels allow building pipelines by linking processes together

Some useful building blocks

UNIX commands are part of a toolbox of processes that can be used to build such pipelines.

# 'cat' command: copies all input to output
unix% cat
hello how are you?
^D
hello how are you?

# 'echo' command: sends as string each argument, each separated by one space
unix% echo hi    how    "are  you" my,friend
hi how are  you my,friend

# 'tee' command: copies all input to output, and a copy to a separate file
unix% tee myout
hi there
^D
hi there
unix% cat myout
hi there

# 'head' command: copies only the first N lines of of input to output
unix% head -2
line 1
line 2
line 3
^D
line 1
line 2

# 'tail' command: copies only the last N lines of of input to output
unix% tail -2
line 1
line 2
line 3
^D
line 2
line 3

# 'grep' command: select only the input lines matching a regex and sends them to output
unix% grep pierre
natacha drinks coffee
but pierre eats chocolate
and tristan writes code
^D
but pierre eats chocolate

# 'sort' command: reads all lines, sorts them, then sends them to output
unix% sort
natacha drinks coffee
but pierre eats chocolate
and tristan writes code
^D
and tristan writes code
but pierre eats chocolate
natacha drinks coffee

# 'cut' command: select part of lines and sends them to output
unix% cut -d: -f2,4
pierre:rioux:le:fou
alanis:morissette:the:singer
justin:trudeau:le:politicien
^D
rioux:fou
morissette:singer
trudeau:politicien

# 'uniq' command: removes successive duplicate lines and sends them to output
unix% uniq
hello
hello
goodbye
hello
^D
hello
goodbye
hello

# 'gzip' command: compress input stream and sends the compressed result to output
unix% gzip
pierre eats chocolate
^D
$@~Fz-#@0

Input stream: files or STDIN

Some commands act differently depending on how they are invoked, when it comes to their inputs:

With no arguments, they read from STDIN
(all of the previous examples did that)
With arguments, they read from each file in turn

unix% cat
hello how are you?
^D
hello how are you?

unix% cat file
line 1 of 'file'
line 2 of 'file'
line 3 of 'file'

Grep & Cut

The grep and cut command, when used together to parse a data stream, might remind people of some relational DB primitives

# A sample (dummy) UNIX password file
prioux:x:1037:1037:Pierre Rioux:/home/prioux:/bin/bash
mero:x:1047:1047:Marc Rousseau:/home/mero:/bin/bash
sdas:x:1055:1055:Samir Das:/home/sdas:/bin/tcsh
nbeck:x:1055:1055:Natacha Beck:/home/sdas:/bin/sh
tglat:x:1055:1055:Tristan Glatard:/home/sdas:/bin/tcsh

Find all users of the tcsh shell:

unix% cat /etc/passwd | grep /bin/tcsh | cut -d: -f5
Samir Das
Tristan Glatard

So, what do grep and cut correspond to?

grep : the where clauses of a SQL select statement cut : the select `columns` part.

Bash redirection

The bash shell (and other shells too) have syntax to specify how to connect the standard channels to files before the command is run.

unix% program 0< file # fd 0 = stdin
unix% program  < file

unix% program 1> file # fd 1 = stdout
unix% program  > file

unix% program 2> file # fd 2 = stderr

unix% program 0< abc 1> def 2> xyz

Bash piping

The standard output of a process can be connected to the standard input of another one.

unix% wc -l memoirs.txt
7233

unix% cat memoirs.txt | wc -l
7233

Standard error handling

often used for diagnostics or messages
can be redirected or merged with standard output

unix% ls -l abc.txt def.txt
def.txt: no such file or directory
-rwxr-xr-x prioux wheel 12311 abc.txt
unix% ls -l abc.txt def.txt > out 2> err
unix% ls -l abc.txt def.txt | wc -l
def.txt: no such file or directory
1
unix% ls -l abc.txt def.txt 2>&1 | wc -l
2

Anonymous pipes

they are features provided by the operating system
they're used by the shell when piping "command | command"
a pipe is unidirectional
it connects a writer file descriptor of a process to a reader file descriptor of another process
in between is a memory buffer, managed by the operating system
the two processes have no real control over that buffer

Buffer behavior

The buffer's behavior can be compared to a barrel.

process A, the writer, is pouring data in
process B, the reader, is slurping data out

Unlike a barrel of water, the data's I/O order is kept.

A filled buffer

When the writer produces data faster than the reader can process them:

the buffer fills completely
the operating system will suspend process A
however, process B will run at full speed

An empty buffer

When the writer produces data slower than the reader can process them:

the buffer empties completely
the operating system will suspend process B
however, process A will run at full speed

Buffer specs

in good old times, the buffer's size was 4096 bytes (4 Kib)
in most modern UNIX systems, it is now 65036 bytes (64 Kib)
blocking the reader or writer makes the system as efficient as possible

Setting up a pipe

If you look up the system calls of a UNIX system you'll be surprised to discover that:

given two existing processes A and B...
... there is no API to create a pipe between them!

The pipe() system call

There is however a system call that seems promising:

it is called pipe()
it creates a buffer
it connects to it two file descriptors of the process
one for reading, one for writing
er... how is that useful ?!?
... what can we do?

Pipe and fork...

The solution is for the process to fork() after pipe()

we get two processes
both have access to both file descriptors
the memory buffer is unique and shared

Pipe, fork and close

Then, each process decides to close() one of their file descriptors.

To send data from parent to child:

the parent closes the read descriptor
the child closes the write descriptor

Pipe, fork and close 2

To send data from child to parent:

the parent closes the write descriptor
the child closes the read descriptor

An example in plain Perl

#!/bin/perl

# Standard perl functions; no Perl libs, no strict, no nothin'
pipe(READFD,WRITEFD);

if (fork) { # this block is the parent process
  close(READFD);
  print WRITEFD "Good morning my son";
  print "I am parent process $$ and I exit now.\n";
  exit 0;
} else { # this block is the child process
  close(WRITEFD);
  $a=<READFD>;
  print "I am child process $$ and I read '$a' from my parent. Exiting now.\n";
  exit 0;
}

unix% perl pipe.pl
I am parent process 24891 and I exit now.
I am child process 24892 and I read 'Good morning my son' from my parent. Exiting now.

popen() to the rescue

these pipe(), fork(), and close() can be cumbersome
yet piping to or from another process is a common need
many languages have libraries with wrappers to do all that:

C :
```
fh=popen("ls -l","r");
```

Perl :

open(FH,"ls -l|");
# or
my $fh = IO::File->new("ls -l|");

PHP:
```
$fh = popen("ls -l","r");
```

Ruby:

fh = IO.popen("ls -l","r")

Toolbox approach in action

Some commands will use piping internally without telling you.

An example is tar -czf out.tar.gz somedir

The tar command does not implement compression. It delegates this to a gzip child process that it pipes to.

unix% tar -czf /tmp/dummy.tar.gz MainStore &
[1] 28976

unix% pstree -p -a -c 28976
tar,28976 -czf /tmp/dummy.tar.gz MainStore
  |
  +--gzip,28977

unix% lsof -w -b -p 28976
COMMAND   PID  FD   TYPE  NAME
tar     28976   0u   CHR  /dev/pts/0
tar     28976   1u   CHR  /dev/pts/0
tar     28976   2u   CHR  /dev/pts/0
tar     28976   3r   REG  /data/MainStore/32/69/00/01221_2_t1.mnc.gz
tar     28976   4w  FIFO  pipe

unix% lsof -w -b -p 28977
COMMAND   PID  FD   TYPE  NAME
gzip    28977   0r  FIFO  pipe
gzip    28977   1w   REG  /tmp/dummy.tar.gz
gzip    28977   2u   CHR  /dev/pts/0

Named pipes (a.k.a. FIFOs)

What if we wanted to establish a pipe between two unrelated processes?

the processes need to 'find' each other
but PIDs aren't good enough
process names and commands are not reliable
we may want many such pipes
so it would be nice if we could give them names

Instead of establishing a registry, or adding new system calls, the UNIX designers chose the filesystem as a meeting place.

Introducing FIFO files

(a.k.a. named pipes oh darn it)

unix% mkfifo abcde # on some system it's mknod
unix% ls -l abcde
prw-r--r--  1 prioux  staff  0 Jul 27 21:46 abcde

this creates an entry in the filesystem (yes, an inode is used)
this entry is permanent until deleted
it is purely a place holder for communication
no buffers or file descriptors are associated with it
no space on the filesystem is used
the size stays at 0
normal file access restrictions apply

How they are used

Given a FIFO file, a process can, using standard file operations:

write data to it
read data from it

Except, the data is not going to disk at all!

How two processes communicate with FIFOs

Processes can independantly start reading from, or writing to, the FIFO file.

But the OS takes over!

However, the data requests are intercepted (kind of) and they instead go to a memory buffer just like for anonymous pipes.

Other properties of FIFOs

a process can start reading or writing even if the other process does not yet exist
in that case, the first process will block
FIFOs can be reused
you can delete the FIFO file while in use (!)
FIFO files can only be created on some filesystem types

An example in BASH

Preparation:

unix% mkfifo robot.pdf    # this file is neither a robot nor a PDF!
unix% ls -l robot.pdf
prw-r--r--  1 prioux  staff  0 Jul 28 10:52 robot.pdf|

In two separate shells, side by side:

unix% echo "Ceci n'est pas une pipe" > robot.pdf

unix% cat < robot.pdf           
Ceci n'est pas une pipe

A mistake I made

FIFOs only allow processes running on the same computer to exchange data.

Sockets

they can connect two processes on the same computer
they can connect two processes on different computers

Note: this is the only IPC mechanism that can connect processes on different computers!

Socket properties

just like pipes, the processes use file descriptors
they are bidirectional
... each process reads and writes to one file descriptor
the streams of bytes are buffered by the operating systems
... which can lead to blocking
care must be taken not to enter a deadlock
fortunately the OS provides calls to check if data is available or not without blocking

Network sockets vs UNIX-domain sockets

These two types of IPC end up providing the same feature, that is a bidirectional data flow.

They differ in how the data is transported and how the processes find themselves.

For network sockets

the stream is sent in packets over the network (duh)
the processes find each other by interfaces, IP addresses, port numbers, etc
the processes can be on two different computers

For UNIX-domain sockets

the stream is sent in memory buffers within the OS
the processes find each other with a socketfile
the socketfile is a placeholder just like for named pipes
the processes have to be on the same computer.

Summarizing another presentation

I already presented computer networking in details back in 2013, including network sockets.

That presentation is available in the Documents tab of the ACELab Redmine tracker, in project ACELab IT.

In that presentation...

interfaces
IP addresses
port numbers
packets
etc...

How processes find each other

Here's a quick summary of how a connection is set up. This applies to both types of sockets.

one process establish itself as a listenerNetwork: (IP, port) vs UNIX: (socketfile)
another process connects to it by specifying the same parameters
the listener accepts the connection
the operating system(s) transport the stream of bytes
the two processes read and write to their file descriptors

Many-to-one connections

Sockets are the only IPC mechanism that allow many processes to communicate with the listener process.

Access control

There is no particular access control for this type of IPC based on the owner or groups of the processes involved.

UNIX-domain sockets: the socketfile's permissions apply
network sockets: the external network routing, firewall etc
the listener process can reject connections based on source...
... or (most often) on content exchanged (protocol violations)

IPC mechanisms overview

IPC Bidirectional? Multi computer? Process relationships Signals no no any (same owner) Environment variables no no forked children only Pipes no no forked children only Named Pipes no no any, but fifo file permissions apply Network sockets yes yes any, but firewalls etc apply UNIX-domain sockets yes no any, but socketfile permissions apply

A quiz

This is based on a true story, here at the lab.

Setting: We had a UNIX machine with small disk space. Largest partition: 80G dedicated to the LORIS development MySQL DBs.

the disk is nearly full.
we identified a BIGDB inside the MySQL server using about 5G
we think it's filled with obsolete 'temp' data
we want to delete the DB after making a backup of the data
- We cannot do this, we don't have the disk space for file.sql:

unix% mysqldump [options] BIGDB > file.sql

Quiz: the command

Instead I ran this:

unix% mysqldump [options] BIGDB | gzip -c | ssh me@remote.host 'cat > dump.sql.gz'

Using what you all collectively know about UNIX:

how many processes are involved? which processes are they? which IPC mechanisms are used?

Quiz: the diagrams

unix% mysqldump [options] BIGDB | gzip -c | ssh me@remote.host 'cat > dump.sql.gz'

Let's identify the processes.

The easy ones: the processes we see on the command line:

They run on two different computers, right?

unix% mysqldump [options] BIGDB | gzip -c | ssh me@remote.host 'cat > dump.sql.gz'

They were are all started by some other processes, right? And what about the DB server process itself?

Note: two sshd processes are in fact involved...

unix% mysqldump [options] BIGDB | gzip -c | ssh me@remote.host 'cat > dump.sql.gz'

Now let's draw the connections to files. These are standard file operations, not IPC.

unix% mysqldump [options] BIGDB | gzip -c | ssh me@remote.host 'cat > dump.sql.gz'

OK now, what are all the IPC mechanisms operating between these processes?

Recap: Signals, Environment Variables, Anonymous Pipes, Named Pipes, Network Sockets, UNIX-Domain Sockets

unix% mysqldump [options] BIGDB | gzip -c | ssh me@remote.host 'cat > dump.sql.gz'

Thank you

I'm hoping this overview of IPC has help you all in understanding better how the applications we write, and the commands we run, interact with each other and with the system.

Presented in four parts: June 2016 - October 2016

I'm willing to present again any previous parts of this presentation, for those who have missed them. (This offer includes the GIT, the File system, and the Networking presentations too)

1.1

Interprocess Communication Pierre Rioux ACE lab Developer MeetingJune 2016

Interprocess Communication – Pierre Rioux – Table of content

prioux