Coordinate Format Detection
rapidgeo automatically detects and converts coordinate data from various formats into its standard longitude, latitude (lng, lat) representation. This system handles the common problem of coordinate data coming in different structures and orderings.
How Format Detection Works
The system uses a two-stage approach:
Structure Detection: Examines the data type and structure to identify the format
Coordinate Detection: For ambiguous formats, analyzes coordinate values to determine lng,lat vs lat,lng ordering
The detection process follows this hierarchy:
NumPy arrays (when available) - fastest path with zero-copy when possible
Python lists with format-specific parsing
Automatic coordinate ordering detection for ambiguous cases
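The structure-detection stage can be pictured with a small pure-Python sketch. It is illustrative only and is not rapidgeo's implementation: the helper name `detect_structure` and its return labels are hypothetical, and the NumPy fast path is left out so that the example needs nothing beyond the standard library.

```python
from typing import Any, Sequence


def detect_structure(data: Sequence[Any]) -> str:
    """Classify input by the type of its first element (hypothetical helper)."""
    if len(data) == 0:
        return "empty"
    first = data[0]
    if isinstance(first, dict):
        return "geojson"   # e.g. {"coordinates": [lng, lat]}
    if isinstance(first, (list, tuple)):
        return "pairs"     # e.g. (lng, lat) or [lng, lat]
    if isinstance(first, (int, float)):
        return "flat"      # e.g. [lng1, lat1, lng2, lat2, ...]
    raise TypeError(f"unsupported element type: {type(first).__name__}")


print(detect_structure([(-122.4194, 37.7749)]))                  # pairs
print(detect_structure([-122.4194, 37.7749]))                    # flat
print(detect_structure([{"coordinates": [-122.4194, 37.7749]}])) # geojson
```

Only the "pairs" and "flat" structures would then need the second stage, since their ordering is not fixed by the format itself.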
Supported Input Formats
Tuple/List Format
Coordinate pairs as tuples or lists:
from rapidgeo.formats import coords_to_lnglat
# Tuple format - detected automatically
coords = [
    (-122.4194, 37.7749),  # San Francisco
    (-74.0060, 40.7128),  # New York
    (-87.6298, 41.8781),  # Chicago
]
result = coords_to_lnglat(coords)
# Also works with lists
coords = [
    [-122.4194, 37.7749],
    [-74.0060, 40.7128],
]
result = coords_to_lnglat(coords)
Flat Array Format
Coordinates as a flat array of alternating longitude and latitude values:
# Flat array: [lng1, lat1, lng2, lat2, ...]
coords = [
    -122.4194, 37.7749,  # San Francisco
    -74.0060, 40.7128,  # New York
    -87.6298, 41.8781,  # Chicago
]
result = coords_to_lnglat(coords)
This format is common in graphics APIs, database storage, and NumPy arrays.
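As a rough illustration of how a flat array maps onto coordinate pairs, the sketch below groups alternating values with slicing. The helper `pair_up` is hypothetical, and rejecting odd-length input is an assumption of this sketch rather than documented rapidgeo behavior.

```python
from typing import List, Sequence, Tuple


def pair_up(flat: Sequence[float]) -> List[Tuple[float, float]]:
    """Group [a1, b1, a2, b2, ...] into [(a1, b1), (a2, b2), ...]."""
    if len(flat) % 2 != 0:
        # Assumption of this sketch: an odd number of values is an error.
        raise ValueError("flat coordinate array must have an even length")
    # Even indices hold the first value of each pair, odd indices the second.
    return list(zip(flat[0::2], flat[1::2]))


flat = [-122.4194, 37.7749, -74.0060, 40.7128]
print(pair_up(flat))  # [(-122.4194, 37.7749), (-74.006, 40.7128)]
```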
GeoJSON-like Format
Dictionary objects with coordinate arrays following GeoJSON Point structure:
coords = [
    {"coordinates": [-122.4194, 37.7749]},  # San Francisco
    {"coordinates": [-74.0060, 40.7128]},  # New York
    {"coordinates": [-87.6298, 41.8781]},  # Chicago
]
result = coords_to_lnglat(coords)
The coordinates array must contain exactly two elements: [longitude, latitude].
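The two-element rule can be sketched in plain Python as follows. This is a hypothetical stand-in (`extract_pairs` is not a rapidgeo function); it simply mirrors the error types listed under Format Errors, where a missing key raises `KeyError` and a wrong element count raises `ValueError`.

```python
from typing import Any, Dict, List, Sequence, Tuple


def extract_pairs(points: Sequence[Dict[str, Any]]) -> List[Tuple[float, float]]:
    """Pull (lng, lat) tuples out of GeoJSON-like point dictionaries."""
    pairs = []
    for point in points:
        coordinates = point["coordinates"]  # KeyError if the key is missing
        if len(coordinates) != 2:
            raise ValueError(f"expected 2 coordinates, got {len(coordinates)}")
        lng, lat = coordinates
        pairs.append((float(lng), float(lat)))
    return pairs


print(extract_pairs([{"coordinates": [-122.4194, 37.7749]}]))
# [(-122.4194, 37.7749)]
```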
NumPy Array Format
When the numpy feature is enabled, various NumPy array formats are supported:
import numpy as np
from rapidgeo.formats import coords_to_lnglat
# 2D array (N, 2) - fastest path
coords = np.array([
    [-122.4194, 37.7749],
    [-74.0060, 40.7128],
])
result = coords_to_lnglat(coords)
# 1D flat array
coords = np.array([-122.4194, 37.7749, -74.0060, 40.7128])
result = coords_to_lnglat(coords)
# Dynamic arrays also supported
coords = np.array([[-122.4194, 37.7749], [-74.0060, 40.7128]], dtype=object)
result = coords_to_lnglat(coords)
Automatic Coordinate Ordering Detection
For tuple and flat array formats, the system automatically determines whether coordinates are in lng,lat or lat,lng order by analyzing the coordinate values.
Detection Algorithm
The algorithm uses statistical analysis of coordinate ranges:
Validation: Checks whether coordinates fit within valid ranges:
- Longitude: -180° to +180°
- Latitude: -90° to +90°
Sampling: Examines up to 100 coordinate pairs for performance
Scoring: Counts valid coordinates for each interpretation (lng,lat vs lat,lng)
Confidence: Stops sampling early once one interpretation reaches the 95% confidence threshold
Decision: Returns the ordering with more valid coordinates; on a tie, it falls back to lng,lat
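The scoring steps above can be sketched in a few lines of plain Python. This is a simplified illustration, not rapidgeo's code: the names `guess_order` and `MAX_SAMPLES` are hypothetical, the early-termination rule is omitted because its exact form is not specified here, and ties are resolved in favor of lng,lat to match the documented fallback.

```python
from typing import Sequence, Tuple

Pair = Tuple[float, float]
MAX_SAMPLES = 100  # sampling limit described above


def is_valid_lng_lat(lng: float, lat: float) -> bool:
    """True when lng is within ±180° and lat is within ±90°."""
    return -180.0 <= lng <= 180.0 and -90.0 <= lat <= 90.0


def guess_order(pairs: Sequence[Pair]) -> str:
    """Return "lng_lat" or "lat_lng" by counting valid pairs per interpretation."""
    lng_lat_score = 0
    lat_lng_score = 0
    for first, second in pairs[:MAX_SAMPLES]:
        if is_valid_lng_lat(first, second):
            lng_lat_score += 1
        if is_valid_lng_lat(second, first):
            lat_lng_score += 1
    # Ties (including empty input) fall back to lng,lat.
    return "lat_lng" if lat_lng_score > lng_lat_score else "lng_lat"


print(guess_order([(-122.4194, 37.7749), (-74.0060, 40.7128)]))  # lng_lat
print(guess_order([(37.7749, -122.4194), (40.7128, -74.0060)]))  # lat_lng
print(guess_order([(45.0, 60.0), (30.0, -80.0)]))                # lng_lat
```

In this sketch a pair contributes to the decision only when it is valid in exactly one order; pairs valid in both orders, or in neither, raise or leave both scores equally.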
Examples of Automatic Detection
Clear lng,lat format (negative longitudes in Western Hemisphere):
# The first pair can only be lng,lat: -122.4194 lies outside the ±90° latitude range
coords = [
    (-122.4194, 37.7749),  # San Francisco: lng=-122° (cannot be a latitude)
    (-74.0060, 40.7128),  # New York: valid in either order on its own
]
result = coords_to_lnglat(coords)
# Result: coordinates used as-is
Clear lat,lng format (detected and corrected):
# These appear to be lat,lng and will be automatically swapped
coords = [
    (37.7749, -122.4194),  # San Francisco: 37° lat, -122° lng
    (40.7128, -74.0060),  # New York: 40° lat, -74° lng
]
result = coords_to_lnglat(coords)
# Result: automatically corrected to lng,lat order
Ambiguous coordinates (fallback to lng,lat):
# These could be valid in either order
coords = [
    (45.0, 60.0),  # Both values within ±90°
    (30.0, -80.0),  # Could be interpreted either way
]
result = coords_to_lnglat(coords)
# Result: treats as lng,lat (default assumption)
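To see why some pairs settle the ordering and others do not, the sketch below labels a single pair by the orderings in which it is valid. The helper `classify_pair` is hypothetical and only applies the range checks listed under Detection Algorithm.

```python
def in_range(lng: float, lat: float) -> bool:
    """True when lng is within ±180° and lat is within ±90°."""
    return -180.0 <= lng <= 180.0 and -90.0 <= lat <= 90.0


def classify_pair(first: float, second: float) -> str:
    """Report which orderings a single pair is valid under."""
    as_lng_lat = in_range(first, second)
    as_lat_lng = in_range(second, first)
    if as_lng_lat and as_lat_lng:
        return "either"
    if as_lng_lat:
        return "lng_lat"
    if as_lat_lng:
        return "lat_lng"
    return "neither"


print(classify_pair(-122.4194, 37.7749))  # lng_lat
print(classify_pair(37.7749, -122.4194))  # lat_lng
print(classify_pair(-74.0060, 40.7128))   # either
print(classify_pair(200.0, 95.0))         # neither
```

A pair is decisive only when one of its values has a magnitude above 90, which is why New York's (-74.0060, 40.7128) is valid in both orders while San Francisco's (-122.4194, 37.7749) is not.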
Performance Characteristics
Format Detection Speed
NumPy 2D arrays: Zero-copy for contiguous arrays (~1μs)
Flat arrays: Direct memory copy (~10μs for 1000 points)
Tuple lists: Python iteration required (~100μs for 1000 points)
GeoJSON objects: Dictionary access overhead (~500μs for 1000 points)
Detection Optimizations
Early termination: Stops when 95% confidence reached (typically after 10-20 samples)
Sampling limit: Maximum 100 coordinates analyzed regardless of input size
Zero-copy paths: Direct memory access for compatible NumPy arrays
Format caching: Structure detection happens once per input
Memory Usage
Zero additional memory: For already-correct lng,lat format
Single copy: For format conversion (number of points × 2 × 8 bytes)
Minimal overhead: Detection uses <1KB regardless of input size
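The single-copy figure is easy to work out by hand. The sketch below assumes the 8 bytes refer to one 64-bit float per value, which is an interpretation of the formula above rather than a documented detail; `conversion_bytes` is a hypothetical helper.

```python
BYTES_PER_VALUE = 8   # assumption: one 64-bit float per coordinate value
VALUES_PER_POINT = 2  # longitude and latitude


def conversion_bytes(num_points: int) -> int:
    """Approximate size of the converted copy for num_points coordinates."""
    return num_points * VALUES_PER_POINT * BYTES_PER_VALUE


print(conversion_bytes(1_000))      # 16000 bytes
print(conversion_bytes(1_000_000))  # 16000000 bytes
```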
Error Handling and Edge Cases
Format Errors
from rapidgeo.formats import coords_to_lnglat
# Empty input - returns empty list
coords = []
result = coords_to_lnglat(coords) # []
# Malformed GeoJSON - raises KeyError
coords = [{"not_coordinates": [1.0, 2.0]}]
try:
    result = coords_to_lnglat(coords)
except KeyError as e:
    print(f"Missing coordinates key: {e}")
# Wrong coordinate count - raises ValueError
coords = [{"coordinates": [1.0]}]  # Only one coordinate
try:
    result = coords_to_lnglat(coords)
except ValueError as e:
    print(f"Invalid coordinate array: {e}")
Invalid Coordinates
The system passes out-of-range coordinates through unchanged, and they do not influence format detection:
# Out-of-range coordinates are preserved
coords = [
    (-122.4194, 37.7749),  # Valid
    (200.0, 95.0),  # Invalid (out of range)
    (-74.0060, 40.7128),  # Valid
]
result = coords_to_lnglat(coords)
# Detection based only on valid coordinates
# Invalid coordinates passed through unchanged
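Because out-of-range values are passed through rather than rejected, callers who need clean data may want to screen them separately. The sketch below is one possible approach, not part of rapidgeo: `split_by_range` is a hypothetical helper that works on plain (lng, lat) tuples already known to be in lng,lat order, not on rapidgeo's return type.

```python
from typing import Iterable, List, Tuple

Pair = Tuple[float, float]


def split_by_range(pairs: Iterable[Pair]) -> Tuple[List[Pair], List[Pair]]:
    """Separate (lng, lat) tuples into in-range and out-of-range lists."""
    in_range: List[Pair] = []
    out_of_range: List[Pair] = []
    for lng, lat in pairs:
        if -180.0 <= lng <= 180.0 and -90.0 <= lat <= 90.0:
            in_range.append((lng, lat))
        else:
            out_of_range.append((lng, lat))
    return in_range, out_of_range


good, bad = split_by_range([
    (-122.4194, 37.7749),
    (200.0, 95.0),
    (-74.0060, 40.7128),
])
print(good)  # [(-122.4194, 37.7749), (-74.006, 40.7128)]
print(bad)   # [(200.0, 95.0)]
```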
Handling Mixed Data
# Mix of valid and invalid affects confidence but not correctness
coords = [
    (-122.4194, 37.7749),  # Clearly lng,lat (-122.4194 cannot be a latitude)
    (0.0, 0.0),  # Ambiguous but valid both ways
    (-74.0060, 40.7128),  # Valid both ways (both values within ±90°)
    (500.0, 600.0),  # Invalid coordinates
]
result = coords_to_lnglat(coords)
# Algorithm detects lng,lat from the unambiguous pair
Practical Usage Examples
Converting GPS Track Data
from rapidgeo.formats import coords_to_lnglat
# GPS data might come in various formats
def standardize_gps_track(track_data):
    """Convert any GPS track format to standard LngLat."""
    return coords_to_lnglat(track_data)
# Works with different input formats
gps_track_tuples = [(-122.41, 37.77), (-122.42, 37.78)]
gps_track_flat = [-122.41, 37.77, -122.42, 37.78]
gps_track_geojson = [
    {"coordinates": [-122.41, 37.77]},
    {"coordinates": [-122.42, 37.78]},
]
# All produce identical results
track1 = standardize_gps_track(gps_track_tuples)
track2 = standardize_gps_track(gps_track_flat)
track3 = standardize_gps_track(gps_track_geojson)
Working with DataFrames
import pandas as pd
from rapidgeo.formats import coords_to_lnglat
# DataFrame with separate lat/lng columns
df = pd.DataFrame({
    'latitude': [37.7749, 40.7128, 41.8781],
    'longitude': [-122.4194, -74.0060, -87.6298],
})
# Convert to coordinate pairs (note: lat,lng order from DataFrame)
coord_pairs = list(zip(df['latitude'], df['longitude']))
# System will detect this is lat,lng and correct it
standardized = coords_to_lnglat(coord_pairs)
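Detection can only correct lat,lng input when at least one value falls outside ±90°. When the columns are already named, a safer pattern is to build the pairs in lng,lat order yourself. The sketch below uses a plain dictionary of lists as a stand-in for DataFrame columns so that it runs without pandas; the data values are illustrative.

```python
# Stand-in for DataFrame columns; every value here is within ±90°,
# so the ordering could not be inferred from the numbers alone.
columns = {
    "latitude": [45.0, 30.0],
    "longitude": [60.0, -80.0],
}

# Build pairs explicitly as (lng, lat) instead of relying on detection.
lng_lat_pairs = list(zip(columns["longitude"], columns["latitude"]))
print(lng_lat_pairs)  # [(60.0, 45.0), (-80.0, 30.0)]
```

Under the scoring rule described earlier, in-range pairs supplied in lng,lat order always score at least as high for lng,lat as for lat,lng, so they would not be swapped.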
API Integration
from rapidgeo.formats import coords_to_lnglat
def process_api_coordinates(api_response):
    """Handle coordinates from external API."""
    # API might return various formats
    if 'coordinates' in api_response:
        # GeoJSON-style
        coords = [{"coordinates": coord} for coord in api_response['coordinates']]
    elif 'points' in api_response:
        # Flat array style
        coords = api_response['points']
    else:
        # Assume tuple/list format
        coords = api_response['data']
    # Automatic detection and conversion
    return coords_to_lnglat(coords)
Integration with Other rapidgeo Functions
The format detection system is automatically used by other rapidgeo functions:
from rapidgeo import polyline, distance, simplify
from rapidgeo.formats import coords_to_lnglat  # needed for the haversine call below
# These functions automatically detect coordinate formats
coords = [(37.7749, -122.4194), (40.7128, -74.0060)] # lat,lng format
# Automatically detected and corrected to lng,lat internally
encoded = polyline.encode(coords)
# Distance calculation also handles format detection
dist = distance.geo.haversine(*coords_to_lnglat(coords[:2]))
# Simplification with automatic format handling
simplified = simplify.douglas_peucker(coords, tolerance=0.001)
Best Practices
Input Validation
While the system is robust, validating your input helps catch issues early:
from rapidgeo.formats import coords_to_lnglat
def validate_and_convert_coordinates(coords):
    """Safely convert coordinates with validation."""
    if not coords:
        return []
    # Note: this check rejects NumPy arrays; widen it if you pass arrays
    if not isinstance(coords, (list, tuple)):
        raise TypeError("Coordinates must be a list or tuple")
    # Convert and let the system handle format detection
    try:
        return coords_to_lnglat(coords)
    except (ValueError, KeyError, TypeError) as e:
        raise ValueError(f"Invalid coordinate format: {e}") from e
Performance Tips
For maximum performance with large datasets:
import numpy as np
from rapidgeo.formats import coords_to_lnglat
# Use NumPy arrays when possible (fastest)
coords = np.array([[-122.4194, 37.7749], [-74.0060, 40.7128]])
result = coords_to_lnglat(coords) # Zero-copy path
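# Pre-convert to correct format if you know the ordering
# (skips detection overhead for very large datasets)
from rapidgeo import LngLat  # assumed import path for LngLat
coord_pairs = [(-122.4194, 37.7749), (-74.0060, 40.7128)]
coordinates_are_lng_lat = True  # set this from what you know about your data source
if coordinates_are_lng_lat:
    # Direct conversion without detection
    result = [LngLat.new_deg(lng, lat) for lng, lat in coord_pairs]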
Format Consistency
Within the same application, try to standardize on one coordinate format:
# Good: Consistent format throughout application
COORDINATE_FORMAT = "lng_lat_tuples" # or "flat_array", "geojson", etc.
def load_coordinates(source):
    """Load coordinates in standardized format."""
    raw_data = fetch_from_source(source)  # fetch_from_source is a placeholder for your own loader
    return coords_to_lnglat(raw_data)  # Always returns LngLat format