Curve Similarity

The rapidgeo.similarity module provides algorithms for measuring the similarity between two polygonal curves. These algorithms are essential for comparing trajectories, GPS tracks, and other geographic paths.

Overview

Two primary similarity measures are available:

Fréchet Distance:

Considers the ordering of points along each curve. Often described as the “dog walking” distance - imagine a person walking their dog, each along their own path. The Fréchet distance is the shortest leash length that allows both to complete their walks.

Hausdorff Distance:

Measures the maximum distance from any point in one curve to the closest point in the other curve. Order-independent and symmetric.

All functions work with sequences of LngLat coordinates and return distances in meters.

Core Modules

Discrete Fréchet distance implementation.

The Fréchet distance measures similarity between polygonal curves while considering the ordering of points along each curve. Often described as the “dog walking” distance.

rapidgeo.similarity.frechet.discrete_frechet(polyline_a, polyline_b)

Calculate discrete Fréchet distance between two polylines.

The Fréchet distance measures the similarity between two polygonal curves. It considers the ordering of points along each curve, making it suitable for comparing trajectories, paths, or time-series data.

This implementation uses dynamic programming for optimal alignment between curves. Uses Haversine distance for point-to-point calculations.

Parameters:
  • polyline_a (List[LngLat]) – First polyline as list of coordinates

  • polyline_b (List[LngLat]) – Second polyline as list of coordinates

Returns:

Fréchet distance in meters

Return type:

float

Raises:

ValueError – If either polyline is empty

Examples

>>> from rapidgeo import LngLat
>>> from rapidgeo.similarity.frechet import discrete_frechet
>>> path1 = [LngLat(0.0, 0.0), LngLat(1.0, 0.0)]
>>> path2 = [LngLat(0.0, 1.0), LngLat(1.0, 1.0)]
>>> distance = discrete_frechet(path1, path2)
rapidgeo.similarity.frechet.discrete_frechet_with_threshold(polyline_a, polyline_b, threshold)

Calculate discrete Fréchet distance with early termination threshold.

Optimized version that can terminate early if the distance exceeds a threshold. Useful when you only need to know if curves are within a certain similarity bound.

Parameters:
  • polyline_a (List[LngLat]) – First polyline as list of coordinates

  • polyline_b (List[LngLat]) – Second polyline as list of coordinates

  • threshold (float) – Maximum distance threshold in meters

Returns:

Fréchet distance in meters, or value > threshold if exceeded

Return type:

float

Raises:

ValueError – If either polyline is empty

Examples

>>> from rapidgeo.similarity.frechet import discrete_frechet_with_threshold
>>> distance = discrete_frechet_with_threshold(path1, path2, 1000.0)

Hausdorff distance implementation.

The Hausdorff distance measures the maximum distance between any point in one set and the closest point in another set. Order-independent similarity measure.

rapidgeo.similarity.hausdorff.hausdorff(polyline_a, polyline_b)

Calculate Hausdorff distance between two polylines.

The Hausdorff distance measures the maximum distance between any point in one set and the closest point in another set. Unlike Fréchet distance, it doesn’t consider the ordering of points along curves.

This is the symmetric version: max(directed_hausdorff(A,B), directed_hausdorff(B,A)). Uses Haversine distance for point-to-point calculations.

Parameters:
  • polyline_a (List[LngLat]) – First polyline as list of coordinates

  • polyline_b (List[LngLat]) – Second polyline as list of coordinates

Returns:

Hausdorff distance in meters

Return type:

float

Raises:

ValueError – If either polyline is empty

Examples

>>> from rapidgeo import LngLat
>>> from rapidgeo.similarity.hausdorff import hausdorff
>>> points1 = [LngLat(0.0, 0.0), LngLat(1.0, 0.0)]
>>> points2 = [LngLat(0.0, 1.0), LngLat(1.0, 1.0)]
>>> distance = hausdorff(points1, points2)
rapidgeo.similarity.hausdorff.hausdorff_with_threshold(polyline_a, polyline_b, threshold)

Calculate Hausdorff distance with early termination threshold.

Optimized version that can terminate early if the distance exceeds a threshold. Useful for filtering point sets based on similarity bounds.

Parameters:
  • polyline_a (List[LngLat]) – First polyline as list of coordinates

  • polyline_b (List[LngLat]) – Second polyline as list of coordinates

  • threshold (float) – Maximum distance threshold in meters

Returns:

Hausdorff distance in meters, or value > threshold if exceeded

Return type:

float

Raises:

ValueError – If either polyline is empty

Examples

>>> from rapidgeo.similarity.hausdorff import hausdorff_with_threshold
>>> distance = hausdorff_with_threshold(points1, points2, 500.0)

Fréchet Distance

The discrete Fréchet distance is ideal for comparing trajectories where point order matters, such as GPS tracks or movement paths.

Basic Usage:

from rapidgeo.distance import LngLat
from rapidgeo.similarity.frechet import discrete_frechet

# Two similar GPS tracks with slight variations
track_a = [
    LngLat.new_deg(-122.4194, 37.7749),  # San Francisco
    LngLat.new_deg(-122.4180, 37.7755),  # Slight detour
    LngLat.new_deg(-122.4000, 37.7800),  # Continuing north
    LngLat.new_deg(-122.3900, 37.7850),  # End point
]

track_b = [
    LngLat.new_deg(-122.4194, 37.7749),  # Same start
    LngLat.new_deg(-122.4170, 37.7760),  # Different route
    LngLat.new_deg(-122.4010, 37.7790),  # Close to track_a
    LngLat.new_deg(-122.3900, 37.7850),  # Same end
]

# Calculate similarity
distance = discrete_frechet(track_a, track_b)
print(f"Fréchet distance: {distance:.1f} meters")

With Early Termination:

For performance when you only need to know if curves are similar within a threshold:

from rapidgeo.similarity.frechet import discrete_frechet_with_threshold

threshold = 50.0  # 50 meters
distance = discrete_frechet_with_threshold(track_a, track_b, threshold)

if distance <= threshold:
    print("Tracks are similar within 50 meters")
else:
    print(f"Tracks differ by at least {distance:.1f} meters")

Hausdorff Distance

The Hausdorff distance is useful when point order doesn’t matter, such as comparing building outlines or coastline segments.

Basic Usage:

from rapidgeo.similarity.hausdorff import hausdorff

# Two representations of the same geographic feature
coastline_detailed = [
    LngLat.new_deg(-123.0, 37.0),
    LngLat.new_deg(-122.9, 37.1),
    LngLat.new_deg(-122.8, 37.0),
    LngLat.new_deg(-122.7, 37.1),
    LngLat.new_deg(-122.6, 37.0),
]

coastline_simplified = [
    LngLat.new_deg(-123.0, 37.0),
    LngLat.new_deg(-122.8, 37.05),  # Simplified curve
    LngLat.new_deg(-122.6, 37.0),
]

# Calculate maximum deviation
max_distance = hausdorff(coastline_detailed, coastline_simplified)
print(f"Maximum deviation: {max_distance:.1f} meters")

With Early Termination:

from rapidgeo.similarity.hausdorff import hausdorff_with_threshold

threshold = 100.0  # 100 meters
distance = hausdorff_with_threshold(coastline_detailed, coastline_simplified, threshold)

if distance <= threshold:
    print("Simplified coastline is within 100 meters of original")
else:
    print(f"Simplification error exceeds {threshold} meters")

Practical Applications

GPS Track Comparison:

Compare recorded GPS tracks to detect similar routes:

def are_routes_similar(route1, route2, tolerance_m=25.0):
    """Check if two GPS routes are similar within tolerance."""
    distance = discrete_frechet_with_threshold(route1, route2, tolerance_m)
    return distance <= tolerance_m

# Example usage
morning_commute = [LngLat.new_deg(-122.4, 37.7), LngLat.new_deg(-122.3, 37.8)]
evening_commute = [LngLat.new_deg(-122.4, 37.7), LngLat.new_deg(-122.3, 37.8)]

if are_routes_similar(morning_commute, evening_commute):
    print("Similar commute pattern detected")

Map Data Quality Assessment:

Check how well simplified data represents original features:

def assess_simplification_quality(original, simplified, max_error_m=10.0):
    """Assess quality of line simplification."""
    error = hausdorff(original, simplified)

    if error <= max_error_m:
        return "Excellent quality"
    elif error <= max_error_m * 2:
        return "Good quality"
    elif error <= max_error_m * 5:
        return "Acceptable quality"
    else:
        return "Poor quality"

# Example usage
detailed_border = [/* many points */]
simplified_border = [/* fewer points */]

quality = assess_simplification_quality(detailed_border, simplified_border)
print(f"Simplification quality: {quality}")

Trajectory Clustering:

Group similar trajectories using Fréchet distance:

def find_similar_trajectories(trajectories, reference, threshold_m=50.0):
    """Find all trajectories similar to a reference trajectory."""
    similar = []

    for i, trajectory in enumerate(trajectories):
        distance = discrete_frechet_with_threshold(reference, trajectory, threshold_m)
        if distance <= threshold_m:
            similar.append((i, trajectory, distance))

    # Sort by similarity (ascending distance)
    similar.sort(key=lambda x: x[2])
    return similar

# Example usage
all_routes = [route1, route2, route3, route4]
reference_route = route1

similar_routes = find_similar_trajectories(all_routes, reference_route)
print(f"Found {len(similar_routes)} similar routes")

Algorithm Comparison

When to Use Fréchet Distance:

  • Comparing GPS tracks or trajectories

  • Order of points matters (start → end sequence)

  • Want to find similar paths or routes

  • Temporal data where sequence is important

When to Use Hausdorff Distance:

  • Comparing shapes or outlines

  • Order of points doesn’t matter

  • Assessing simplification quality

  • Measuring maximum deviation between curves

Algorithm Complexity:

  • Fréchet: Requires storing intermediate results, memory usage grows with input size

  • Hausdorff: Uses constant memory regardless of input size

  • Both algorithms compare each point in one curve to points in the other curve

Security Limits

To prevent memory exhaustion, the following limits are enforced:

Fréchet Distance: * Maximum 10,000 points per curve * Memory usage proportional to n × m

Hausdorff Distance: * Maximum 50,000 points per curve * Constant memory usage regardless of input size

Attempting to process curves exceeding these limits will raise a ValueError.

Example:

# This will raise ValueError: curve size limited
huge_curve = [LngLat.new_deg(i*0.001, 0) for i in range(15000)]
small_curve = [LngLat.new_deg(0, 0)]

try:
    distance = discrete_frechet(huge_curve, small_curve)
except ValueError as e:
    print(f"Error: {e}")

Error Conditions

Common errors and their causes:

ValueError: “Empty input curves”

One or both input curves contain no points.

ValueError: “curve size limited”

Input curves exceed security limits (see above).

TypeError

Input is not a sequence of LngLat objects.

Example Error Handling:

def safe_frechet_distance(curve1, curve2):
    """Calculate Fréchet distance with error handling."""
    try:
        return discrete_frechet(curve1, curve2)
    except ValueError as e:
        if "Empty input" in str(e):
            print("Error: Cannot compare empty curves")
        elif "size limited" in str(e):
            print("Error: Input curves too large")
        return None
    except Exception as e:
        print(f"Unexpected error: {e}")
        return None

Tips and Best Practices

Preprocessing:

# Remove duplicate consecutive points to improve performance
def remove_duplicates(curve):
    if not curve:
        return curve

    result = [curve[0]]
    for point in curve[1:]:
        if point != result[-1]:  # LngLat implements __eq__
            result.append(point)
    return result

cleaned_curve = remove_duplicates(noisy_gps_track)

Performance Optimization:

# Use threshold variants when possible
threshold = 100.0  # meters

# Faster - early termination
distance = discrete_frechet_with_threshold(curve1, curve2, threshold)

# Slower - always computes full distance
distance = discrete_frechet(curve1, curve2)

Coordinate Precision:

# High precision for small-scale comparisons
building_corner_a = LngLat.new_deg(-122.419400, 37.774900)
building_corner_b = LngLat.new_deg(-122.419401, 37.774901)

# Lower precision acceptable for large-scale comparisons
city_a = LngLat.new_deg(-122.42, 37.77)
city_b = LngLat.new_deg(-122.41, 37.78)