PHP File Abstraction – using the Flysystem library





phpne-talk

PHPNE talk notes (February 2014)

On Github judgej / phpne-talk

PHP File Abstraction

using the Flysystem library

Talk by Jason Judge / @JasonDJudge

Overview

  • Some of this will be basic stuff, but hopefully interesting, and hopefully something new

Accessing files in PHP

Types of operations

  • Whole-file operations
  • Directory operations
  • Streams
  • Simple operations on whole files

Whole File Operations

Handling files as single "lumps" of data: strings or arrays.

// read a file as a string
$string = file_get_contents($filename);

// create a file from a string
file_put_contents($filename, $string);

// make a copy of a file
copy($source, $destination);

// move a file
rename($oldname, $newname);

// delete a file
unlink($filename);
						
  • Simple operations on whole files
  • Files accessed by pathnames on the local filesystem (more on this later)
  • PHP accepts / as the directory separator on every platform, which makes things easier
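
A minimal sketch of the whole-file functions working together, assuming a writable system temp directory. The file names are made up for illustration.

```php
<?php
// Work in a scratch directory so nothing outside it is touched.
$dir = sys_get_temp_dir() . '/phpne_' . uniqid();
mkdir($dir);

$original = $dir . '/original.txt';
$copy     = $dir . '/copy.txt';
$moved    = $dir . '/moved.txt';

// Create a file from a string, then read it back as a string.
file_put_contents($original, "line one\nline two\n");
$string = file_get_contents($original);

// Read the same file as an array of lines.
$lines = file($original, FILE_IGNORE_NEW_LINES);

// Copy, move, then delete.
copy($original, $copy);
rename($copy, $moved);
$moved_exists = file_exists($moved);
unlink($moved);
unlink($original);
rmdir($dir);
```

Every call takes a plain pathname; nothing is opened or closed by hand.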

Directory Operations

Directories can be created, removed, moved, or scanned for their contents.

// create a directory
mkdir($path);

// delete a directory
rmdir($path);

// move a directory
rename($oldname, $newname);

// scan a directory for its contents
$entries = scandir($path);
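
A short sketch of the directory functions in use, again assuming a writable temp directory. The directory and file names are illustrative only.

```php
<?php
$base = sys_get_temp_dir() . '/phpne_dirs_' . uniqid();

// Create nested directories in one call (third argument = recursive).
mkdir($base . '/a/b', 0777, true);
touch($base . '/a/file.txt');

// scandir() returns names sorted alphabetically, including "." and "..".
$entries = array_values(array_diff(scandir($base . '/a'), ['.', '..']));

// Move (rename) a directory.
rename($base . '/a/b', $base . '/a/c');
$renamed = is_dir($base . '/a/c');

// rmdir() only removes empty directories, so clean up from the inside out.
unlink($base . '/a/file.txt');
rmdir($base . '/a/c');
rmdir($base . '/a');
rmdir($base);
```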
						

Resources

  • A primitive PHP data type
  • A resource variable references an external resource
  • Resource variables created and destroyed by specialised functions
  • Resources are not objects
  • About 120 resource types
  • Well over 1000 functions for handling resources
  • One resource type: streams
  • primitive data type like an int or string
  • around long before PHP was OOP
  • external resources include: database connections, Flash object, images
  • an external resource is a
  • a dozen or more specialised functions created to handle every single resource type

Streams

A stream is a resource type with streamable behaviour:

  • Linear sequence of data – a start, middle and end
  • No structure inherent in the stream data
  • You progress through a stream, with a pointer tracking where you are
  • Some streams allow you to rewind or seek to arbitrary points
  • A stream may be readable/writable/seekable
  • You choose how to open the stream – read/write/both
  • A stream maps onto a file, providing access to that file
  • writing can take the form of rewriting from the start, or appending to what may already be there
  • operations on streams let you read and write characters, lines, strings
  • a directory can be opened as a stream too, providing a list of files and sub-directories

Advantages of Streams

  • Good for memory (with enormous files)
  • Rich set of functions (nearly 50)
  • Can read enormous directories
  • Can append to files
  • No need to keep a whole file in memory - it can be streamed and used a chunk at a time
  • So far we have talked about static files on the local file system - wrappers allow more

Stream Operations

// open a path as a stream
$handle = fopen($filename, $mode);
// write a string to a stream
fputs($handle, $string);
// read bytes from a stream
$data = fread($handle, $length);
// jump to a location in a stream (moves the file pointer)
fseek($handle, $offset);
// close a stream and destroy the resource
fclose($handle);

// Similar for directories:
opendir(); readdir(); closedir();

// and a bunch of stream_*() functions for manipulating streams
						
  • A variable holding a stream resource is referred to as a handle
  • There are around 50 functions for operating on streams; this is just a handful
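
To make the handle idea concrete, here is a small sketch that writes, appends, reads line by line and seeks. It assumes a writable temp directory; the contents are arbitrary.

```php
<?php
$filename = tempnam(sys_get_temp_dir(), 'phpne');

// Open for writing ("w" truncates), write two lines, close.
$handle = fopen($filename, 'w');
fputs($handle, "first line\n");
fputs($handle, "second line\n");
fclose($handle);

// Open in append mode and add a line at the end.
$handle = fopen($filename, 'a');
fputs($handle, "third line\n");
fclose($handle);

// Read back a line at a time; only one line is held in memory at once.
$handle = fopen($filename, 'r');
$count = 0;
while (($line = fgets($handle)) !== false) {
    $count++;
}

// Seek back to byte 6 and read four bytes from there.
fseek($handle, 6);
$chunk = fread($handle, 4);

fclose($handle);
unlink($filename);
```

The same loop works on a file of any size, which is the memory advantage of streams.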

Stream Wrappers (1)

aka Protocol Handlers

  • Every stream is handled by a wrapper
  • The wrapper is specified at the start of the path as a scheme: scheme://target
  • Schemes include: file, ftp, http, php, phar, and more
  • Default scheme is file://
  • php:// wrapper gives access to stdin, stdout, memory
  • A wrapper handles the functionality; the means to access the stream
  • The target is a directory, but may also include login details and other information required to connect
  • Explain stdin, stdout, stderr
  • Access to memory is like writing a temporary file, but it can reside in memory (though can overflow to disk)
  • Memory temporary files have no filename
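
A quick sketch of the php:// wrapper used as a nameless temporary file. It uses php://temp, which stays in memory until it passes a size limit (2 MB by default) and then spills to disk.

```php
<?php
// No filename is involved: the stream exists only while the handle is open.
$handle = fopen('php://temp', 'r+');
fputs($handle, "hello\nworld\n");

// The pointer is now at the end, so rewind before reading.
rewind($handle);
$first = fgets($handle);
$rest  = stream_get_contents($handle);
fclose($handle);

// List the schemes registered on this PHP installation.
$wrappers = stream_get_wrappers();
```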

Stream Wrappers (2)

  • All wrappers can be used in the whole-file operations we looked at earlier: readfile('http://phpne.org.uk/about')
  • Custom wrappers can be created
  • Whole-file operations are just wrappers for collections of stream operations
  • A custom wrapper can be very powerful
  • A custom wrapper will contain an entire protocol for an abstraction to access remote resources

Custom Stream Wrappers

Register Your Stream Wrapper

// Register the class
stream_wrapper_register('mywrapper', 'MyStreamProtocol');

// Use the wrapper
$fp = fopen('mywrapper://some/path/to/a/data/source', 'r');
// Send the rest of the stream to the output
// (readfile() takes a path, not a handle)
fpassthru($fp);
fclose($fp);
						
  • The stream wrapper can be namespaced
  • Not sure if autoloaded, but I assume so
  • Examples are Google cloud storage, where the "google" wrapper is created

Custom Stream Wrappers

Define Your Stream Wrapper (1)

class MyStreamProtocol {
    // Open a stream
    function stream_open($path, $mode, $options, &$opened_path) {...}

    function stream_close() {...}
    function stream_read($count) {...}
    function stream_write($data) {...}
    function stream_eof() {...}
    function stream_tell() {...}
    function stream_seek($offset, $whence) {...}
    ...
}
						

Custom Stream Wrappers

Define Your Stream Wrapper (2)

class MyStreamProtocol {
    ...
    // File-based operations for use with touch(), unlink(), rename(), stat()
    function unlink($path) {...}
    function rename($path_from, $path_to) {...}
    function rmdir($path, $options) {...}
    function url_stat($path, $flags) {...}
    ...
}
						
  • There is no interface for the wrapper class.
  • That is because most of the methods are optional: implement what is relevant.

Custom Stream Wrappers

Define Your Stream Wrapper (3)

class MyStreamProtocol {
    ...
    // Directory-based operations for use with mkdir(), rmdir(), opendir()
    function mkdir($path, $mode, $options) {...}
    function rmdir($path, $options) {...}
    function dir_opendir($path, $options) {...}
    function dir_readdir() {...}
    function dir_closedir() {...}
}
						
  • Interesting example: http://www.tuxradar.com/practicalphp/15/11/1
  • The example lets you write text to the stream (stored in a variable, so a memory-based stream)
  • then when you read the stream the text is converted to Morse code.
  • However, a filter may be a better way to handle that.
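
As a rough sketch of the pieces fitting together, here is a small wrapper that keeps its data in a static array. The class name VariableStream and the var:// scheme are made up for illustration; only the method names and stream_wrapper_register() come from PHP itself, and a real wrapper would need more care over modes and errors.

```php
<?php
// Hypothetical wrapper: "var://name" reads and writes an in-memory string.
class VariableStream
{
    public $context;            // set by PHP when a context is passed in
    private static $data = [];  // storage shared by all streams, keyed by name
    private $name;
    private $position = 0;

    public function stream_open($path, $mode, $options, &$opened_path)
    {
        $this->name = parse_url($path, PHP_URL_HOST);
        // Start empty for a new name, or when opened in a "w" mode.
        if (!isset(self::$data[$this->name]) || strpos($mode, 'w') === 0) {
            self::$data[$this->name] = '';
        }
        $this->position = 0;
        return true;
    }

    public function stream_read($count)
    {
        $chunk = (string) substr(self::$data[$this->name], $this->position, $count);
        $this->position += strlen($chunk);
        return $chunk;
    }

    public function stream_write($data)
    {
        // Overwrite from the current position, extending the string if needed.
        self::$data[$this->name] = substr_replace(
            self::$data[$this->name], $data, $this->position, strlen($data)
        );
        $this->position += strlen($data);
        return strlen($data);
    }

    public function stream_eof()
    {
        return $this->position >= strlen(self::$data[$this->name]);
    }

    public function stream_tell()
    {
        return $this->position;
    }

    public function stream_seek($offset, $whence)
    {
        if ($whence !== SEEK_SET || $offset < 0) {
            return false; // only absolute seeks in this sketch
        }
        $this->position = $offset;
        return true;
    }

    public function stream_stat()
    {
        return ['size' => strlen(self::$data[$this->name])];
    }

    public function unlink($path)
    {
        unset(self::$data[parse_url($path, PHP_URL_HOST)]);
        return true;
    }
}

stream_wrapper_register('var', 'VariableStream');

// Whole-file functions and stream functions both go through the wrapper.
file_put_contents('var://greeting', 'Hello, PHPNE');
$text = file_get_contents('var://greeting');

$fp = fopen('var://greeting', 'r');
$first = fread($fp, 5);
fclose($fp);
```

Only the methods needed for these calls are implemented, which is the "implement what is relevant" point in practice.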

Stream Filters and Options

  • Filters can be added to streams
  • Custom filters can be created
  • Options can be passed to wrappers (called "contexts")
  • Options are specific to each wrapper
  • Filters can be chained together
  • Different filters can be added for input and output
  • A filter allows the stream to be transformed without having to deal with it in your application
  • e.g. a filter can perform compression, or decompression or encryption as data passes through it
  • or even convert to and from Morse Code.
  • Example options: tell a http stream whether to use POST or GET
  • Example options: tell an ftp stream whether to overwrite existing files or not
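
A small sketch of both ideas, using the built-in string.rot13 filter in place of Morse code. The URL in the comment is a placeholder, and no network request is made here.

```php
<?php
$handle = fopen('php://temp', 'r+');

// Everything written from here on passes through the ROT13 filter.
$filter = stream_filter_append($handle, 'string.rot13', STREAM_FILTER_WRITE);
fputs($handle, "Hello");

// Remove the filter so later writes are left untouched.
stream_filter_remove($filter);
fputs($handle, " World");

rewind($handle);
$contents = stream_get_contents($handle);
fclose($handle);

// A context carries wrapper-specific options, here for the http wrapper.
$context = stream_context_create([
    'http' => [
        'method'  => 'POST',
        'header'  => "Content-Type: application/x-www-form-urlencoded\r\n",
        'content' => http_build_query(['name' => 'value']),
    ],
]);
// It would be passed as the third argument, for example:
// file_get_contents('http://example.com/form', false, $context);
$options = stream_context_get_options($context);
```

The application code only writes plain text; the transformation happens inside the stream.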

Put it all Together

  • This may not be strictly how PHP has the subsystems organised, but it is how it roughly looks to us.
  • Flysystem attempts to cover the first two layers.

Why Abstract?

Since the stream wrappers already seem to abstract everything, why abstract more?

  • One interface for handling files anywhere
  • Remote storage complexities hidden from the application logic; loose coupling
  • Swapping storage should be simple and quick; less lock-in
  • Testing - emulate a filesystem easily
  • This is all nice, so why would you want to abstract it further?
  • The complexity is still there, but it is hidden inside an abstraction, away from the application
  • Some servers do not allow local files to be written, and you want to migrate an application to it quickly
  • When testing, by emulating a filesystem, you have full control of how that filesystem behaves
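
To illustrate the loose coupling and testing points, here is a sketch of application code written against an interface, with an in-memory implementation standing in for real storage. FileStore, MemoryStore and saveReport() are hypothetical names for this example, not part of Flysystem.

```php
<?php
// Hypothetical interface; the names are illustrative, not Flysystem's API.
interface FileStore
{
    public function has($path);
    public function read($path);
    public function put($path, $contents);
}

// An in-memory implementation: handy as a test double.
class MemoryStore implements FileStore
{
    private $files = [];

    public function has($path)
    {
        return isset($this->files[$path]);
    }

    public function read($path)
    {
        if (!$this->has($path)) {
            throw new RuntimeException("File not found: $path");
        }
        return $this->files[$path];
    }

    public function put($path, $contents)
    {
        $this->files[$path] = $contents;
    }
}

// Application code depends only on the interface, not on where files live.
function saveReport(FileStore $store, $name, $body)
{
    $store->put('reports/' . $name, $body);
}

$store = new MemoryStore();
saveReport($store, 'feb.txt', 'All good');
$saved = $store->read('reports/feb.txt');
```

Swapping MemoryStore for a local, FTP or cloud implementation would not change saveReport() at all.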

Introducing Flysystem

  • Composer/packagist installation
  • Extendable using the adapter pattern
  • Works on PHP5.3, tested to PHP5.6
  • Supports streaming and whole files
  • Caching built-in (DI)
  • Copy files across different filesystems
  • The adapter pattern uses an internal interface, and adapters to convert that to external interfaces
  • The cache uses Dependency Injection, so you can choose different caching options
  • Copying files between filesystems is about putting two streams back-to-back

Installing

composer install

{
    "require": {
        "league/flysystem": "0.2.*"
    }
}

What is Flysystem

"Flysystem is a filesystem abstraction which allows you to easily swap out a local filesystem for a remote one."

 

Author Frank de Jonge

github.com/thephpleague/flysystem

  • It is an object, unlike all of PHP's built-in file and stream handling.

Flysystem Architecture

Local Files

To access the local filesystem:

// The filesystem class and the adapter.
use League\Flysystem\Filesystem;
use League\Flysystem\Adapter\Local as Adapter;

// Create a local file filesystem object.
$filesystem = new Filesystem(new Adapter('path/to/root'));
// Get the content of a file.
$file_content = $filesystem->read('sub_dir/myfile.txt');
  • The filesystem object provides access to files through the adapter
  • The local file adapter needs a root directory to build any further directories on

FTP

Other adapters can be dropped into the constructor:

use League\Flysystem\Filesystem;
use League\Flysystem\Adapter\Ftp as Adapter;

$filesystem = new Filesystem(new Adapter(array(
    'host' => 'ftp.example.com',
    'username' => 'username',
    'password' => 'password',
)));
  • The FTP adapter uses the standard PHP FTP functions to perform its tasks

Dropbox

Some adapters need external libraries to be included:

use Dropbox\Client;
use League\Flysystem\Filesystem;
use League\Flysystem\Adapter\Dropbox as Adapter;

$client = new Client($token, $appName);
$filesystem = new Filesystem(new Adapter($client, 'optional/path/prefix'));

The Dropbox library in turn needs OAuth and its libraries.

It is all nicely handled by DI.

  • The DI can be handled manually in small projects, or a container used to ease use in frameworks.

Operations (1)

Writing complete files:

$fs = new Filesystem($adapter);
// Create a new file (including directories) and set permissions
$fs->write($filename, $content);

// Overwrite the contents of an existing file
$fs->update($filename, $content);

// Write or update, depending on whether the file exists
$fs->put($filename, $content);

// Append to a file?
  • Non-streaming - whole files are written from a string
  • write will raise an exception if the file already exists
  • update will raise an exception if the file does not exist
  • probably safer to use put() if you are unsure
  • Does not appear to be a way to append at this time

Operations (2)

Reading complete files:

$fs = new Filesystem($adapter);
// Read the complete contents of a file as a string
$content = $fs->read($filename);

// Tells us if the file exists
$exists = $fs->has($filename);

Operations (3)

File operations:

$fs = new Filesystem($adapter);
// Delete a file
$fs->delete($filename);

// Rename or move a file
$fs->rename($filename, $newname);

// Copy a file?
  • There is no operation to copy or duplicate a file, but there is an elegant way detailed later

Operations (4)

Directory operations:

$fs = new Filesystem($adapter);
// Create a directory (to any number of levels)
$fs->createDir($path);

// Remove a directory
$fs->deleteDir($path);

// Move a directory? Same as renaming a file
$fs->rename($path, $newpath);
  • Not all adapters support moving of directories, so be careful and test that

Metadata

Minimum supported metadata for all adapters:

$fs = new Filesystem($adapter);
// Get the Mimetype for a file
$mimetype = $fs->getMimetype($filename);

// Get the last update time for a file
$timestamp = $fs->getTimestamp($filename);

// Get the size of a file in bytes
$size = $fs->getSize($filename);
  • All metadata is cached, so we don't have round-trips to the remote store each time
  • The adapter may return any number of additional metadata items (as we will see in the demo)
  • Fetching specific additional metadata fields will mean writing a plugin (not sure why though)

File Visibility (1)

The visibility of a file: can it be seen by others or just the current user?

// Two states allowed
AdapterInterface::VISIBILITY_PRIVATE // 'private'
AdapterInterface::VISIBILITY_PUBLIC  // 'public'
  • The visibility is very high level - may not be enough for some use-cases

File Visibility (2)

The visibility can be:

  • set globally, for all new files to take on
  • set for each file creating operation (write)
  • applied to existing files
// New files
$fs->write($filename, 'contents', ['visibility' => $visibility]);

// Existing files
$visibility = $fs->getVisibility($filename);
$fs->setVisibility($filename, $visibility);

// Or set globally when instantiating the Filesystem
$fs = new League\Flysystem\Filesystem($adapter, $cache, [
    'visibility' => AdapterInterface::VISIBILITY_PRIVATE
]);
  • It would be nice to see some use-cases on how the visibility choices arose.
  • Would also be nice to be able to extend the visibility to cater for other needs.

File Visibility (3)

Implementation example, for local files

  • public: 0644 (owner=rw, group=r, other=r)
  • private: 0600 (owner=rw)
  • Other adapters will use other methods
  • The broad-brush approach is probably just what is common between the adapters
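
A sketch of how that mapping could look for local files on a Unix-like system. The two helper functions are hypothetical and are not Flysystem's own code.

```php
<?php
// Hypothetical helper: map the two visibility states onto permission bits.
function setLocalVisibility($filename, $visibility)
{
    $modes = ['public' => 0644, 'private' => 0600];
    if (!isset($modes[$visibility])) {
        throw new InvalidArgumentException("Unknown visibility: $visibility");
    }
    return chmod($filename, $modes[$visibility]);
}

// Hypothetical helper: "public" means readable by group or other.
function getLocalVisibility($filename)
{
    clearstatcache(true, $filename);
    return (fileperms($filename) & 0044) ? 'public' : 'private';
}

$filename = tempnam(sys_get_temp_dir(), 'vis');

setLocalVisibility($filename, 'private');
$private = getLocalVisibility($filename);

setLocalVisibility($filename, 'public');
$public = getLocalVisibility($filename);

unlink($filename);
```

A remote adapter would map the same two words onto whatever access controls its storage offers.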

Listing directories (1)

// List the contents of a directory, recursively if required
$fs->listContents($path [, bool $recurse])
  • Returns an array of arrays
  • Each item is a "file" or a "dir"
  • A basename (e.g. myfile.txt) and a path for each item
  • Any other metadata requested and cached for this item
  • Any metadata the adapter wants to return
  • The outer array holds the items; each inner array holds the attributes of one item
  • Would probably be more useful to return objects
  • The path for each item is the full path to an item, which includes filenames (probably a misnomer)
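
To show the shape of that result, here is a local stand-in built on scandir(). The function listLocalContents() is hypothetical; the type, path and basename keys follow the description above, and a real adapter may add more.

```php
<?php
// Hypothetical stand-in for listContents() on a local directory.
function listLocalContents($root, $path = '', $recurse = false)
{
    $items = [];
    $directory = rtrim($root . '/' . $path, '/');

    foreach (array_diff(scandir($directory), ['.', '..']) as $name) {
        $item_path = ltrim($path . '/' . $name, '/');
        $is_dir = is_dir($root . '/' . $item_path);

        $items[] = [
            'type'     => $is_dir ? 'dir' : 'file',
            'path'     => $item_path,
            'basename' => $name,
        ];

        if ($is_dir && $recurse) {
            $items = array_merge($items, listLocalContents($root, $item_path, true));
        }
    }

    return $items;
}

// Build a tiny tree, list it recursively, then clean up.
$root = sys_get_temp_dir() . '/phpne_list_' . uniqid();
mkdir($root . '/docs', 0777, true);
touch($root . '/docs/readme.txt');

$listing = listLocalContents($root, '', true);

unlink($root . '/docs/readme.txt');
rmdir($root . '/docs');
rmdir($root);
```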

Listing directories (2)

  • Custom metadata is very useful
  • $fs->listPaths($path [, bool $recurse]) Return just a list of paths
  • $fs->listWith($keys, $path [, bool $recurse]) Make sure metadata is requested (fields listed in $keys)
  • Custom metadata: e.g. owner, group and full file permissions, public URLs (for Flickr pics)
  • Some metadata comes for free when listing a directory, and will always be included by the adapter
  • Some metadata needs an additional call to the remote site, and so won't be fetched unless asked for

Flysystem and Streams (1)

Use streams instead of strings to read and write files

// Create a file from a stream
$fs->writeStream($filename, $stream);

// Update a file from a stream
$fs->updateStream($filename, $stream);

// Return a file as a stream (a resource)
$stream = $fs->readStream($filename);
  • If the remote storage does not support direct streams, then Flysystem will read a whole remote file into a temporary stream and return that

Flysystem and Streams (2)

There is no copy operation in Flysystem; few if any remote storages support it.

Instead, put two streams back-to-back:

// Copy one file to a new file
$fs->writeStream(
    'destination.txt',
    $fs->readStream('source.txt')
);
  • This will still stream the source file to the local server, then back out to the remote server.
  • The files could be copied between two separate filesystems this way
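
The same back-to-back idea in plain PHP, as a sketch: two local temporary files stand in for two different storage backends, and stream_copy_to_stream() moves the data between the open handles.

```php
<?php
$source_name = tempnam(sys_get_temp_dir(), 'src');
$dest_name   = tempnam(sys_get_temp_dir(), 'dst');
file_put_contents($source_name, str_repeat("0123456789\n", 1000));

$source = fopen($source_name, 'r');
$dest   = fopen($dest_name, 'w');

// Data moves across in chunks; the whole file is never held in a PHP string.
$bytes = stream_copy_to_stream($source, $dest);

fclose($source);
fclose($dest);

$identical = (md5_file($source_name) === md5_file($dest_name));
unlink($source_name);
unlink($dest_name);
```

Either handle could just as well come from a remote wrapper.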

Flysystem and Streams (3)

So far no support for returning a stream to an open file for writing.

Some remote storage would support this, and some would not.

// This would be useful (but not implemented):
$stream = $fs->openWriteStream('destination.txt', 'a');

fputs($stream, 'foo');

fclose($stream);

Alternative Abstractions (1)

  • For Laravel, it is simple to swap it out through its facade, though that is essentially setting a single, global filestore for the whole application.

Alternative Abstractions (2)

  • KnpLabs/Gaufrette takes a similar approach, and has been around for a couple of years.
  • discordier/php-filesystem tries to get to some raw file access methods, though seems to have stopped development before many adapters were written. Implements filesystem iterators, which many other libraries do not.
  • zikula has abstracted the filesystem and is worth looking at for its coverage of operations.

Enhancements - Adapters

Some interesting adapters

  • BBC iPlayer
  • Flickr (finish off)
  • PDO/MySQL
  • IMAP
  • Google Cloud Storage
  • Test adapter skeleton
  • The test adapter could be set with rules on how to respond to requests
  • iPlayer includes channels, categories, thumbnails, link to view page
  • IMAP - emails are just text files anyway; the adapter could also pull attachments out into virtual folders

Enhancements - Features

  • Support for paged files (using streams)
  • Writeable streams
  • Directory listings (and files) as objects
  • Directory listings as streams
  • find() operation
  • Extend permissions
  • copy() operation
  • Flyception - Flyserver adapter that uses a custom stream wrapper that calls the Flyserver adapter for its functionality
  • Paged files would include data that can only be retrieved in pages, e.g. entries in a blog
  • Writeable streams would be supported by only some adapters and storage destinations
  • Directories as objects offer better handling of missing metadata
  • find() could search for matching files using any criteria - file globbing, metadata match.
  • The permissions need more use-cases, so this is just a hunch at this stage.
  • One permissions case: read-only files, generated virtually by the adapter
  • The copy() operation could make use of the ability for a remote storage to duplicate files where it can, otherwise would just stream back-to-back.

Summary

  • PHP file handling is flexible and powerful
  • PHP provides some abstraction already
  • We like to abstract files, directories and streams
  • Flysystem brings file abstraction into the OOP world
  • Flysystem makes it easier to swap remote storage in and out
  • Flysystem is still young, and will develop further

Conclusions

  • We can no longer assume files will just be written to the local filesystem
  • An OO wrapper for file access will make switching storage destinations easier
  • An agreed interface would be good for portability (PSR?)
  • Streams and stream wrappers are underused and underappreciated
  • Streams are still a core part of file handling, and will be for a long time
  • Flysystem is gaining momentum and is worth checking out...
  • ...but be aware it is still finding its feet

Demo (1)

  • An adapter to access some of Flickr as an abstract filesystem
  • Uses OAuth to log into Flickr
  • Directory structure is emulated, as are the files
  • Metadata includes various Flickr properties of the images
  • Demo does not write images at this time, but it would be simple

Demo (2)

The demo architecture

Demo (3)

The demo fetches the contents of a "path" on Flickr

// $app['flickrapi'] is a DI container for the Flickr API
// FlyAdapter is a custom adapter that abstracts Flickr
$filesystem = new Filesystem(
    new Academe\Flickr\FlyAdapter($app['flickrapi'])
);

// Get the path from the user.
$path = $request->get('path', '');

// Get the items on Flickr at that path
$items = $filesystem->listContents($path);

Demo (4)

If a file is selected, then fetch it for display or download

// Filename selected by user
$file = $request->get('file', '');

if (!empty($file)) {
    // Get metadata for the file.
    $file_metadata = $filesystem->getWithMetadata("$path/$file", array());

    // Get the content of the file.
    $file_content = $filesystem->read("$path/$file", array());

    // The file_content can be streamed, or displayed inline
    // depending on its mimetype
}
  • The point of this is to show how simple the code for the application is - it knows nothing about where the files come from.

Demo (5)

The virtual directory structure:

/
|--contacts
|  |--friends
|  |--family
|  |--both
|  |--neither
|     |--metadata.csv
|     |--user1
|     |--user2
|        |--photostream
|           |--image1_small.jpg
|           |--image1_largesquare.jpg
|           |--image2_small.jpg
|           |--image2_largesquare.jpg
|           |--...
|        |--sets
|        |--non-sets
|     |--userX
|--me

Link to the demo (will need a Yahoo/Flickr login)

  • Note metadata.csv - it summarises the files in the directory as a CSV file
  • Any number of additional sizes could be included.

THE END

I hope you found this useful.

Slides will be online, and updated as necessary.

Feedback: @JasonDJudge, jason@academe.co.uk, github.com/judgej/phpne-talk
