Warning: These documents are under active development and subject to change (version 2.1.0-beta).
The latest release documents are at: https://purl.dataone.org/architecture

Apache Configuration for DataONE Services

This document describes the configuration directives that must be enabled so that Apache correctly processes the REST URLs used by the DataONE service interfaces.

Parameters in question:

AllowEncodedSlashes:

(Off)|On

The AllowEncodedSlashes directive allows URLs that contain encoded path separators (%2F for / and, on systems that use it as a path separator, %5C for \) to be used. Normally such URLs are refused with a 404 (Not Found) error.

AcceptPathInfo:

Off|On|(Default)

This directive controls whether requests that contain trailing pathname information that follows an actual filename (or non-existent file in an existing directory) will be accepted or rejected.

Both of these must be set to On for Member Node and Coordinating Node services, so that URLs containing identifiers as path elements (e.g. for MN_crud.get()) are not rejected or mishandled by the Apache web server.

These parameters must be in effect for the section of the web server configuration handling DataONE service requests.
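As an illustration of what "identifiers as path elements" means for a client, the following minimal sketch percent-encodes an identifier so that it occupies exactly one path element. The base URL is the test URL used in the examples in this document, and the function names encode_pid and build_url are hypothetical, not part of any DataONE library.

```python
from urllib.parse import quote

# Test URL from the examples in this document; a real service has its own.
BASE_URL = "http://localhost/test.cgi"


def encode_pid(pid: str) -> str:
    """Percent-encode an identifier so it occupies a single path element."""
    # safe="" makes quote() encode "/" too, which it leaves alone by default.
    return quote(pid, safe="")


def build_url(pid: str) -> str:
    """Append the encoded identifier to the base URL as one path element."""
    return f"{BASE_URL}/{encode_pid(pid)}"


if __name__ == "__main__":
    for pid in ("bogus/stuff", "bogus/stuff?var=value"):
        print(build_url(pid))
```

The two identifiers in the loop produce the request URLs used in the examples in this document; without the two directives set to On, Apache refuses such URLs.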

Examples

The following examples show how Apache responds under different configurations.

The version of Apache being examined was:

Apache/2.2.14 (Unix) DAV/2 mod_ssl/2.2.14 OpenSSL/0.9.8l PHP/5.3.1 mod_perl/2.0.4 Perl/v5.10.1

A simple Perl CGI script was installed in the web server's document root, which had ExecCGI enabled. The script:

$ cat htdocs/test.cgi

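#!/usr/bin/perl
use strict;
use warnings;

# Print every environment variable the web server passes to the script.
print "Content-type: text/html\n\n";
foreach my $key (keys %ENV) {
    print "$key --> $ENV{$key}\n";
}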

Only relevant output from the script is provided in the examples below.


AllowEncodedSlashes: Off
AcceptPathInfo: Off
Request: http://localhost/test.cgi/bogus%2Fstuff
PID Equivalent: “bogus/stuff”
Error Message: [Mon Dec 13 15:45:00 2010] [info] [client ::1] found %2f (encoded '/') in URI (decoded='/test.cgi/bogus/stuff'), returning 404
Response: Default 404

AllowEncodedSlashes: On
AcceptPathInfo: Off
Request: http://localhost/test.cgi/bogus%2Fstuff
PID Equivalent: “bogus/stuff”
Error Message: [Mon Dec 13 15:46:08 2010] [error] [client ::1] AcceptPathInfo off disallows user's path: /Applications/XAMPP/xamppfiles/htdocs/test.cgi
Response: Default 404

AllowEncodedSlashes: Off
AcceptPathInfo: On
Request: http://localhost/test.cgi/bogus%2Fstuff
PID Equivalent: “bogus/stuff”
Error Message: [Mon Dec 13 15:46:48 2010] [info] [client ::1] found %2f (encoded '/') in URI (decoded='/test.cgi/bogus/stuff'), returning 404
Response: Default 404

AllowEncodedSlashes: On
AcceptPathInfo: On
Request: http://localhost/test.cgi/bogus%2Fstuff
PID Equivalent: “bogus/stuff”
Error Message: None
Response:
SCRIPT_NAME --> /test.cgi
SERVER_NAME --> localhost
SERVER_ADMIN --> you@example.com
PATH_INFO --> /bogus/stuff
REQUEST_METHOD --> GET
HTTP_ACCEPT --> */*
SCRIPT_FILENAME --> /Applications/XAMPP/xamppfiles/htdocs/test.cgi
VERSIONER_PERL_PREFER_32_BIT --> no
SERVER_SOFTWARE --> Apache/2.2.14 (Unix) DAV/2 mod_ssl/2.2.14 OpenSSL/0.9.8l PHP/5.3.1 mod_perl/2.0.4 Perl/v5.10.1
QUERY_STRING -->
REMOTE_PORT --> 50155
HTTP_USER_AGENT --> curl/7.19.7 (universal-apple-darwin10.0) libcurl/7.19.7 OpenSSL/0.9.8l zlib/1.2.3
SERVER_SIGNATURE -->
SERVER_PORT --> 80
REMOTE_ADDR --> ::1
SERVER_PROTOCOL --> HTTP/1.1
PATH --> /usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin:/opt/local/bin:/usr/local/git/bin
REQUEST_URI --> /test.cgi/bogus%2Fstuff
GATEWAY_INTERFACE --> CGI/1.1
SERVER_ADDR --> ::1
DOCUMENT_ROOT --> /Applications/XAMPP/xamppfiles/htdocs
PATH_TRANSLATED --> /Applications/XAMPP/xamppfiles/htdocs/bogus/stuff
HTTP_HOST --> localhost
VERSIONER_PERL_VERSION --> 5.10.0
UNIQUE_ID --> TQaGaEprSyIAAFOcw20AAAAB

AllowEncodedSlashes: On
AcceptPathInfo: On
Request: http://localhost/test.cgi/bogus%2Fstuff%3Fvar%3Dvalue
PID Equivalent: “bogus/stuff?var=value”
Error Message: None
Response:
SCRIPT_NAME --> /test.cgi
SERVER_NAME --> localhost
SERVER_ADMIN --> you@example.com
PATH_INFO --> /bogus/stuff?var=value
REQUEST_METHOD --> GET
HTTP_ACCEPT --> */*
SCRIPT_FILENAME --> /Applications/XAMPP/xamppfiles/htdocs/test.cgi
VERSIONER_PERL_PREFER_32_BIT --> no
SERVER_SOFTWARE --> Apache/2.2.14 (Unix) DAV/2 mod_ssl/2.2.14 OpenSSL/0.9.8l PHP/5.3.1 mod_perl/2.0.4 Perl/v5.10.1
QUERY_STRING -->
REMOTE_PORT --> 64650
HTTP_USER_AGENT --> curl/7.19.7 (universal-apple-darwin10.0) libcurl/7.19.7 OpenSSL/0.9.8l zlib/1.2.3
SERVER_SIGNATURE -->
SERVER_PORT --> 80
REMOTE_ADDR --> ::1
SERVER_PROTOCOL --> HTTP/1.1
PATH --> /usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin:/opt/local/bin:/usr/local/git/bin
REQUEST_URI --> /test.cgi/bogus%2Fstuff%3Fvar%3Dvalue
GATEWAY_INTERFACE --> CGI/1.1
SERVER_ADDR --> ::1
DOCUMENT_ROOT --> /Applications/XAMPP/xamppfiles/htdocs
PATH_TRANSLATED --> /Applications/XAMPP/xamppfiles/htdocs/bogus/stuff?var=value
HTTP_HOST --> localhost
VERSIONER_PERL_VERSION --> 5.10.0
UNIQUE_ID --> TQaK80prSyIAAFOexIUAAAAD

AllowEncodedSlashes: On
AcceptPathInfo: On
Request: http://localhost/test.cgi/bogus%2Fstuff%3Fvar%3Dvalue?var2=value2
PID Equivalent: “bogus/stuff?var=value” with a query string at the end
Error Message: None
Response:
SCRIPT_NAME --> /test.cgi
SERVER_NAME --> localhost
SERVER_ADMIN --> you@example.com
PATH_INFO --> /bogus/stuff?var=value
REQUEST_METHOD --> GET
HTTP_ACCEPT --> */*
SCRIPT_FILENAME --> /Applications/XAMPP/xamppfiles/htdocs/test.cgi
VERSIONER_PERL_PREFER_32_BIT --> no
SERVER_SOFTWARE --> Apache/2.2.14 (Unix) DAV/2 mod_ssl/2.2.14 OpenSSL/0.9.8l PHP/5.3.1 mod_perl/2.0.4 Perl/v5.10.1
QUERY_STRING --> var2=value2
REMOTE_PORT --> 49339
HTTP_USER_AGENT --> curl/7.19.7 (universal-apple-darwin10.0) libcurl/7.19.7 OpenSSL/0.9.8l zlib/1.2.3
SERVER_SIGNATURE -->
SERVER_PORT --> 80
REMOTE_ADDR --> ::1
SERVER_PROTOCOL --> HTTP/1.1
PATH --> /usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin:/opt/local/bin:/usr/local/git/bin
REQUEST_URI --> /test.cgi/bogus%2Fstuff%3Fvar%3Dvalue?var2=value2
GATEWAY_INTERFACE --> CGI/1.1
SERVER_ADDR --> ::1
DOCUMENT_ROOT --> /Applications/XAMPP/xamppfiles/htdocs
PATH_TRANSLATED --> /Applications/XAMPP/xamppfiles/htdocs/bogus/stuff?var=value
HTTP_HOST --> localhost
VERSIONER_PERL_VERSION --> 5.10.0
UNIQUE_ID --> TQaLPEprSyIAAFOdxIcAAAAC

AllowEncodedSlashes: On
AcceptPathInfo: On
Request: http://localhost/test.cgi/bogus%2Fstuff%3Fvar=value?var2=value2
PID Equivalent: “bogus/stuff?var=value” with a query string at the end
Error Message: None
Response:
SCRIPT_NAME --> /test.cgi
SERVER_NAME --> localhost
SERVER_ADMIN --> you@example.com
PATH_INFO --> /bogus/stuff?var=value
REQUEST_METHOD --> GET
HTTP_ACCEPT --> */*
SCRIPT_FILENAME --> /Applications/XAMPP/xamppfiles/htdocs/test.cgi
VERSIONER_PERL_PREFER_32_BIT --> no
SERVER_SOFTWARE --> Apache/2.2.14 (Unix) DAV/2 mod_ssl/2.2.14 OpenSSL/0.9.8l PHP/5.3.1 mod_perl/2.0.4 Perl/v5.10.1
QUERY_STRING --> var2=value2
REMOTE_PORT --> 59889
HTTP_USER_AGENT --> curl/7.19.7 (universal-apple-darwin10.0) libcurl/7.19.7 OpenSSL/0.9.8l zlib/1.2.3
SERVER_SIGNATURE -->
SERVER_PORT --> 80
REMOTE_ADDR --> ::1
SERVER_PROTOCOL --> HTTP/1.1
PATH --> /usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin:/opt/local/bin:/usr/local/git/bin
REQUEST_URI --> /test.cgi/bogus%2Fstuff%3Fvar=value?var2=value2
GATEWAY_INTERFACE --> CGI/1.1
SERVER_ADDR --> ::1
DOCUMENT_ROOT --> /Applications/XAMPP/xamppfiles/htdocs
PATH_TRANSLATED --> /Applications/XAMPP/xamppfiles/htdocs/bogus/stuff?var=value
HTTP_HOST --> localhost
VERSIONER_PERL_VERSION --> 5.10.0
UNIQUE_ID --> TQaNjkprSyIAAFOfxYgAAAAE

AllowEncodedSlashes: On
AcceptPathInfo: On
Request: http://localhost/test.cgi/bogus%2Fstuff/something/else
PID Equivalent: “bogus/stuff” with additional path elements at the end
Error Message: None
Response:
SCRIPT_NAME --> /test.cgi
SERVER_NAME --> localhost
SERVER_ADMIN --> you@example.com
PATH_INFO --> /bogus/stuff/something/else
REQUEST_METHOD --> GET
HTTP_ACCEPT --> */*
SCRIPT_FILENAME --> /Applications/XAMPP/xamppfiles/htdocs/test.cgi
VERSIONER_PERL_PREFER_32_BIT --> no
SERVER_SOFTWARE --> Apache/2.2.14 (Unix) DAV/2 mod_ssl/2.2.14 OpenSSL/0.9.8l PHP/5.3.1 mod_perl/2.0.4 Perl/v5.10.1
QUERY_STRING -->
REMOTE_PORT --> 57774
HTTP_USER_AGENT --> curl/7.19.7 (universal-apple-darwin10.0) libcurl/7.19.7 OpenSSL/0.9.8l zlib/1.2.3
SERVER_SIGNATURE -->
SERVER_PORT --> 80
REMOTE_ADDR --> ::1
SERVER_PROTOCOL --> HTTP/1.1
PATH --> /usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin:/opt/local/bin:/usr/local/git/bin
REQUEST_URI --> /test.cgi/bogus%2Fstuff/something/else
GATEWAY_INTERFACE --> CGI/1.1
SERVER_ADDR --> ::1
DOCUMENT_ROOT --> /Applications/XAMPP/xamppfiles/htdocs
PATH_TRANSLATED --> /Applications/XAMPP/xamppfiles/htdocs/bogus/stuff/something/else
HTTP_HOST --> localhost
VERSIONER_PERL_VERSION --> 5.10.0
UNIQUE_ID --> TQaQiEprSyIAAFOixfMAAAAF

Configuration

As of Apache 2.2.14, there are some bugs that affect the AllowEncodedSlashes setting.

Bug 46830:

If “AllowEncodedSlashes On” is set in the global context, it is not inherited by virtual hosts. You must explicitly set “AllowEncodedSlashes On” in every <VirtualHost> container.

The documentation for how the different configuration sections are merged (http://httpd.apache.org/docs/2.2/sections.html) says “Sections inside <VirtualHost> sections are applied after the corresponding sections outside the virtual host definition. This allows virtual hosts to override the main server configuration.” The global setting would therefore be expected to apply inside each virtual host unless it is overridden there; because of this bug, it does not.

Virtual hosts are used in many default Apache configurations. In Ubuntu, the default VirtualHost container is set up in /etc/apache2/sites-available/default.
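Because the directive has to be repeated in every virtual host, it is easy to miss one. The following is a rough sketch of a checker that scans configuration text for <VirtualHost> containers that do not set a directive to On. The function name vhosts_missing and the sample configuration (including the host name mn.example.org) are made up for illustration, and the sketch does not follow Include directives, so it only sees the text it is given.

```python
import re

# Matches one <VirtualHost ...> ... </VirtualHost> container.
VHOST_RE = re.compile(
    r"<VirtualHost\s+([^>]*)>(.*?)</VirtualHost>",
    re.IGNORECASE | re.DOTALL,
)


def vhosts_missing(conf_text: str, directive: str) -> list[str]:
    """Return the address of each virtual host that lacks `directive On`."""
    # A commented-out line starts with "#", so it cannot match this
    # line-anchored pattern.
    setting = re.compile(
        rf"^[ \t]*{re.escape(directive)}[ \t]+On[ \t]*$",
        re.IGNORECASE | re.MULTILINE,
    )
    return [
        address.strip()
        for address, body in VHOST_RE.findall(conf_text)
        if not setting.search(body)
    ]


# Hypothetical configuration: the global setting is present, but only the
# first virtual host repeats it.
SAMPLE_CONF = """
AllowEncodedSlashes On
AcceptPathInfo On

<VirtualHost *:80>
    ServerName mn.example.org
    AllowEncodedSlashes On
    AcceptPathInfo On
</VirtualHost>

<VirtualHost *:443>
    ServerName mn.example.org
    # AllowEncodedSlashes On
</VirtualHost>
"""

if __name__ == "__main__":
    print(vhosts_missing(SAMPLE_CONF, "AllowEncodedSlashes"))
```

In the sample, the first container shows the form the configuration needs to take; the second is reported because its only AllowEncodedSlashes line is commented out.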

Bug 35256:

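%2F is decoded in PATH_INFO, although the documentation for AllowEncodedSlashes says that no decoding will be done.

The consequence of this bug is that only the last path element of a URL can contain slashes: once PATH_INFO has been decoded, a slash that was part of an identifier can no longer be distinguished from a path separator.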
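The loss of information can be shown in a few lines. This sketch uses Python's urllib.parse.unquote as a stand-in for the decoding Apache applies when it builds PATH_INFO; the variable names are illustrative only.

```python
from urllib.parse import unquote

# Two different requests: in the first, the identifier "bogus/stuff" is
# followed by extra path elements; in the second, every "/" is a separator.
encoded_id = "/bogus%2Fstuff/something/else"
plain_path = "/bogus/stuff/something/else"

# Decoding the whole path at once erases the difference between them.
same_after_decoding = unquote(encoded_id) == unquote(plain_path)

if __name__ == "__main__":
    print(unquote(encoded_id))
    print(same_after_decoding)
```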

Conclusions

  1. AllowEncodedSlashes and AcceptPathInfo must both be set to On.
  2. Query parameters can be appended to the URL, provided the identifier embedded in the path is properly encoded.
  3. Adding path elements beyond the encoded identifier segment requires additional processing, namely custom parsing of the REQUEST_URI environment variable passed on by the web server.
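The custom parsing mentioned in the last point might look like the following sketch. It assumes the service can read the undecoded REQUEST_URI and SCRIPT_NAME values shown in the example responses; the function name split_request_uri is hypothetical, and the prefix check is deliberately simple.

```python
from urllib.parse import unquote


def split_request_uri(request_uri: str, script_name: str) -> tuple[list[str], str]:
    """Return (decoded path elements after script_name, raw query string)."""
    # The first literal "?" starts the query string; an encoded %3F does not.
    raw_path, _, query = request_uri.partition("?")
    if not raw_path.startswith(script_name):
        raise ValueError(f"{request_uri!r} is not under {script_name!r}")
    remainder = raw_path[len(script_name):].lstrip("/")
    # Split on literal "/" first, then decode each element on its own, so an
    # encoded slash (%2F) stays inside the element it belongs to.
    elements = [unquote(part) for part in remainder.split("/")] if remainder else []
    return elements, query


if __name__ == "__main__":
    for uri in (
        "/test.cgi/bogus%2Fstuff/something/else",
        "/test.cgi/bogus%2Fstuff%3Fvar%3Dvalue?var2=value2",
    ):
        print(split_request_uri(uri, "/test.cgi"))
```

In a CGI script the two arguments would come from the REQUEST_URI and SCRIPT_NAME environment variables. Splitting before decoding is the design choice that matters here: it keeps the identifier intact as the first element even when further path elements follow it.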