Web services client for Apache Tika

1.0.0 2014-04-23 05:05 UTC


This PHP client interacts with the Tika REST Server for extracting content and metadata from a [wide variety of document file types][types]. There are [alternative PHP libraries][alternatives] that use the Tika command line client, but instantiating the JVM for each operation is slow and costly.
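For orientation, here is a rough sketch of the kind of HTTP exchange involved in talking to a Tika server. It is not this library's code: it uses PHP's built-in HTTP stream wrapper rather than any HTTP library, and the function names `tikaRequestOptions` and `tikaExtractText` are made up for illustration. It also assumes the server exposes a `PUT /tika` endpoint that returns plain text when asked with `Accept: text/plain`.

```php
<?php
// Illustrative sketch only: hypothetical helper names, not part of this library.

/**
 * Build stream-context options for uploading a document body to a Tika server.
 */
function tikaRequestOptions(string $body, string $accept = 'text/plain'): array
{
    return [
        'http' => [
            'method'  => 'PUT',
            'header'  => "Accept: {$accept}\r\n",
            'content' => $body,
        ],
    ];
}

/**
 * Send a local file to an already-running server and return the extracted text.
 * Assumes the server accepts PUT requests at /tika.
 */
function tikaExtractText(string $baseUrl, string $path): string
{
    $body = file_get_contents($path);
    if ($body === false) {
        throw new RuntimeException("Cannot read {$path}");
    }

    $context = stream_context_create(tikaRequestOptions($body));
    $text = file_get_contents(rtrim($baseUrl, '/') . '/tika', false, $context);
    if ($text === false) {
        throw new RuntimeException("No usable response from {$baseUrl}");
    }

    return $text;
}
```

With a server listening on `http://localhost:9998`, a call such as `tikaExtractText('http://localhost:9998', 'TestPDF.pdf')` would return the document text. The JVM starts once with the server, so each extraction costs an HTTP round trip and no new process.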

This client is built on Guzzle.

Project Setup

This project is installed with Composer.

In the shell, you can run this command:

composer require bangpound/tika-rest-client

Or you can edit your composer.json file to include this requirement:

    "require": {
        "bangpound/tika-rest-client": "^1.0"
    }


$client = new Bangpound\Tika\Client('http://localhost:9998');
$response = $client->tika(array(
    'file' => 'TestPDF.pdf',
));

// Metadata varies by file and file type, so refer to the Apache Tika docs for details.
$all_metadata = $response->metadata;

// If you know the metadata element you want to retrieve, specify it as the argument
// to the response's metadata method.
$author = $response->metadata('author');

// Extracted content can be retrieved as a SimpleXMLElement or a string of XML.
$content_xml = $response->getBody();
$page_2 = $content_xml->children()->div[1];

$content_text = $response->getBody(true);
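
The page lookup above relies on the structure of the XHTML that Tika returns. As a self-contained sketch of that idea, the snippet below parses a hand-written sample with `SimpleXMLElement` and collects the text of each page. The sample markup, the `div class="page"` wrapper, and the `tikaPageTexts` helper are assumptions made for illustration. Real output depends on the file type and the Tika version.

```php
<?php
// Illustrative sketch only: the sample document and helper name are hypothetical.

/**
 * Return the plain text of every <div class="page"> in an XHTML string.
 */
function tikaPageTexts(string $xhtml): array
{
    $doc = new SimpleXMLElement($xhtml);

    // XPath cannot match elements in a default namespace until the
    // namespace is registered under a prefix.
    $doc->registerXPathNamespace('x', 'http://www.w3.org/1999/xhtml');

    $texts = [];
    foreach ($doc->xpath('//x:div[@class="page"]') as $page) {
        // textContent concatenates all text nodes under the element.
        $texts[] = trim(dom_import_simplexml($page)->textContent);
    }

    return $texts;
}

// Hand-written stand-in for the XHTML of a two-page PDF.
$sample = <<<'XML'
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>TestPDF</title></head>
  <body>
    <div class="page"><p>First page text.</p></div>
    <div class="page"><p>Second page text.</p></div>
  </body>
</html>
XML;

$texts = tikaPageTexts($sample);
```

Here `$texts[1]` holds the text of the second page, `Second page text.`. Using XPath with a registered prefix avoids depending on the exact position of the `body` element in the tree.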


The Tika REST Client's test suite is incomplete. To run it, install the dev dependencies and then run PHPUnit.

composer install


This code is released under the MIT license.