Crate npath

Normalized Paths

npath is a Rust library providing methods for cross-platform lexical path processing and normalization. These methods are implemented in extension traits to Path and PathBuf.

Usage

Add npath to your Cargo.toml:

[dependencies]
npath = { git = "https://github.com/gdzx/npath" }

Import the following traits:

use npath::{NormPathExt, NormPathBufExt};

Overview

std::path lacks methods for lexical path processing, which:

  • Do not rely on system calls.
  • Remove the need to handle I/O errors.
  • Support more operations.
  • Allow processing paths to files or directories that do not exist.

The following sections outline the main features this library provides.

Joining paths

One of the most basic operations is joining two paths. Trying to get C:\Users\User\Documents\C:\foo using Path::join can yield an entirely different path:

use std::path::Path;

assert_eq!(
    Path::new(r"C:\Users\User\Documents").join(r"C:\foo"),
    Path::new(r"C:\foo"),
);

Although paths are represented by strings, Path::join is a high-level method that inspects its second argument to determine whether it is absolute, and if so returns it unchanged. In contrast, the fundamental operation of appending one path to another by string concatenation is called a lexical join.

NormPathExt::lexical_join joins two paths with an operation similar to string concatenation, only adding a path separator in-between if needed. Path::join is a refinement of a lexical join:

use std::path::{Path, PathBuf};
use npath::NormPathExt;

fn join(base: &Path, path: &Path) -> PathBuf {
    if path.is_absolute() {
        path.to_path_buf()
    } else {
        base.lexical_join(path)
    }
}
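To make the idea of a lexical join concrete, here is a minimal sketch that uses only the standard library. The name lexical_join_sketch is hypothetical and this is not npath's implementation; it only illustrates "concatenate, adding a separator if needed", and it ignores Windows prefixes entirely.

```rust
use std::path::{is_separator, Path, PathBuf, MAIN_SEPARATOR};

/// Hypothetical helper (not npath's implementation): concatenate `base` and
/// `path`, inserting one separator only when neither side already has one.
fn lexical_join_sketch(base: &Path, path: &Path) -> PathBuf {
    let base_str = base.to_string_lossy();
    let path_str = path.to_string_lossy();

    // A separator is needed only between two non-empty operands that do not
    // already meet at a separator.
    let needs_separator = !base_str.is_empty()
        && !path_str.is_empty()
        && !base_str.ends_with(is_separator)
        && !path_str.starts_with(is_separator);

    let mut joined = base.as_os_str().to_os_string();
    if needs_separator {
        joined.push(MAIN_SEPARATOR.to_string());
    }
    joined.push(path.as_os_str());
    PathBuf::from(joined)
}
```

Unlike Path::join, this sketch appends an absolute second argument instead of returning it: joining /srv and /etc/passwd gives /srv/etc/passwd.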

Normalization

If you want to check whether two paths are identical, you need to transform them into a form that allows comparison. Rust provides std::fs::canonicalize (also available as Path::canonicalize), which returns the true canonical path on the filesystem:

use std::path::Path;

assert_eq!(
    Path::new("/srv").join("file.txt").canonicalize()?,
    Path::new("/srv").join("bar//../file.txt").canonicalize()?,
);

Path::canonicalize requires a concrete path (that refers to an existing file or directory on the filesystem) or it will return an error. NormPathExt::normalized eliminates the intermediate components ., .., or duplicate / through pure lexical processing. It is the building block for comparing paths, ensuring a path is restricted to some base path, or for finding the relative path between two paths. It yields the shortest lexically equivalent path: it is normalized.
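The lexical elimination of ., .., and duplicate separators can be sketched with std::path::Component alone. The function name normalize_sketch is hypothetical, and the choices made for edge cases (returning . for an empty result, keeping leading .. in relative paths, treating .. at the root as the root) are assumptions of this sketch, not documented behavior of NormPathExt::normalized.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical helper (not npath's implementation): lexically normalize a
/// path without touching the filesystem.
fn normalize_sketch(path: &Path) -> PathBuf {
    let mut out = PathBuf::new();
    // `components()` already collapses duplicate separators.
    for component in path.components() {
        match component {
            // `.` never changes the location.
            Component::CurDir => {}
            Component::ParentDir => {
                let last = out.components().next_back();
                let can_pop = matches!(last, Some(Component::Normal(_)));
                let at_root =
                    matches!(last, Some(Component::RootDir | Component::Prefix(_)));
                if can_pop {
                    // `name/..` cancels out.
                    out.pop();
                } else if !at_root {
                    // Leading `..` of a relative path must be kept.
                    out.push("..");
                }
                // At the root, `..` is dropped: `/..` is `/`.
            }
            other => out.push(other.as_os_str()),
        }
    }
    if out.as_os_str().is_empty() {
        out.push(".");
    }
    out
}
```

With this sketch, /srv/bar//../file.txt and /srv/file.txt normalize to the same value, so they can be compared even when neither exists.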

NormPathExt::resolved uses both approaches: the longest prefix whose individual components exist is canonicalized, the remaining path is normalized, and adjoined to it. The purpose is to circumvent the limitations of normalization, while still being able to apply it to paths that do not exist.
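The combination described above can be approximated with the standard library as follows. This is a rough sketch under stated assumptions, not the implementation of NormPathExt::resolved: the names resolve_sketch and push_normalized are hypothetical, relative paths are first made absolute against the current directory, and the longest existing prefix is found with Path::ancestors.

```rust
use std::io;
use std::path::{Component, Path, PathBuf};

/// Lexically append `rest` to `out`: skip `.`, pop on `..`.
fn push_normalized(out: &mut PathBuf, rest: &Path) {
    for component in rest.components() {
        match component {
            Component::CurDir => {}
            Component::ParentDir => {
                // `pop` is a no-op at the root, so `..` cannot climb above it.
                out.pop();
            }
            other => out.push(other.as_os_str()),
        }
    }
}

/// Hypothetical helper: canonicalize the longest existing prefix, then
/// normalize the remainder lexically and append it.
fn resolve_sketch(path: &Path) -> io::Result<PathBuf> {
    // Assumption: relative paths are interpreted against the current directory.
    let absolute = std::env::current_dir()?.join(path);

    // `ancestors()` goes from the full path up to the root, so the first
    // existing entry is the longest existing prefix.
    let prefix = absolute
        .ancestors()
        .find(|p| p.exists())
        .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, "no existing ancestor"))?;
    let rest = absolute
        .strip_prefix(prefix)
        .expect("an ancestor is always a prefix of the path");

    let mut resolved = prefix.canonicalize()?;
    push_normalized(&mut resolved, rest);
    Ok(resolved)
}
```

Symlinks in the existing part of the path are resolved by the filesystem, while the part that does not exist is only processed lexically.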

Restricting paths

Web servers are exposed to path traversal vulnerabilities that allow an attacker to access files outside of some base directory. Path::join with the base directory /srv and a user-supplied path can yield a path outside of /srv:

use std::path::{Path, PathBuf};

assert_eq!(
    Path::new("/srv").join("/etc/passwd"),
    PathBuf::from("/etc/passwd")
);

Only accepting relative paths is not sufficient:

use std::path::{Path, PathBuf};
use npath::NormPathExt;

assert_eq!(
    Path::new("/srv").join("../etc/passwd").normalized(),
    PathBuf::from("/etc/passwd")
);

Stripping .. prefixes is not enough either:

use std::path::{Path, PathBuf};
use npath::NormPathExt;

assert_eq!(
    Path::new("/srv").join("foo/../../etc/passwd").normalized(),
    PathBuf::from("/etc/passwd")
);

If the user-provided path only needs to be a single path component, the programmer can reject any string containing path separators and filter out "..". Otherwise, the inner .. components need to be simplified, and the leading .. components eliminated. Normalization is at the core of the methods this library provides for that purpose.
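One way to picture "simplify the inner .. components and eliminate the leading ones" is the sketch below, which uses only the standard library. The name confined_join_sketch is hypothetical and the code is not one of npath's methods; treating an absolute user path as relative to the base is an assumption of this sketch.

```rust
use std::ffi::OsStr;
use std::path::{Component, Path, PathBuf};

/// Hypothetical helper (not npath's API): join `user_path` under `base` so
/// that the result can never lexically escape `base`.
fn confined_join_sketch(base: &Path, user_path: &Path) -> PathBuf {
    let mut kept: Vec<&OsStr> = Vec::new();
    for component in user_path.components() {
        match component {
            Component::Normal(name) => kept.push(name),
            // An inner `..` cancels the previous component; a leading `..`
            // finds nothing to pop and is simply dropped.
            Component::ParentDir => {
                kept.pop();
            }
            // `.`, the root, and Windows prefixes are ignored, so an
            // absolute user path is treated as relative to `base`.
            Component::CurDir | Component::RootDir | Component::Prefix(_) => {}
        }
    }
    let mut joined = base.to_path_buf();
    joined.extend(kept);
    joined
}
```

With base /srv, the inputs /etc/passwd, ../etc/passwd, and foo/../../etc/passwd all map to /srv/etc/passwd. The guarantee is purely lexical: a symlink inside the base directory can still point elsewhere.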

Limitations

Because lexical path processing never consults the filesystem, it can change which concrete object a path points to.

Normalization

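If /a/b is a symlink to /d/e, then for /a/b/../c:

  • Path::canonicalize follows the symlink and returns /d/c (provided it exists).
  • NormPathExt::normalized works lexically and returns /a/c.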

Windows

Common Windows filesystems are case-insensitive: foo.txt, FOO.TXT, and fOo.txT point to the same file. Additionally, the mapping from lowercase to uppercase letters across the Unicode range is stored in the filesystem and depends on when the filesystem was created. This library performs case-insensitive comparisons only for the ASCII character set (the first 128 Unicode characters).
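An ASCII-only case-insensitive comparison can be sketched with OsStr::eq_ignore_ascii_case from the standard library. The function name eq_ascii_case_insensitive is hypothetical and this is not npath's code; it only shows why non-ASCII letters such as é and É are treated as different.

```rust
use std::path::{Component, Path};

/// Hypothetical helper: compare two paths component by component, ignoring
/// case only for ASCII letters in normal components.
fn eq_ascii_case_insensitive(a: &Path, b: &Path) -> bool {
    let mut left = a.components();
    let mut right = b.components();
    loop {
        match (left.next(), right.next()) {
            // Both paths ended at the same time: equal.
            (None, None) => return true,
            (Some(Component::Normal(x)), Some(Component::Normal(y))) => {
                if !x.eq_ignore_ascii_case(y) {
                    return false;
                }
            }
            // Roots, prefixes, `.` and `..` must match exactly.
            (Some(x), Some(y)) if x == y => {}
            _ => return false,
        }
    }
}
```

Here foo.txt and FOO.TXT compare equal, while é.txt and É.txt do not, even though a Windows filesystem may treat the latter pair as the same file.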

TODO

  • Special Windows prefixes.

Traits

NormPathBufExt: Extension trait for PathBuf.

NormPathExt: Extension trait for Path.