10 Commits
1.4.0 ... 1.4.1

Author SHA1 Message Date
Andy
9952758b38 feat(changelog): Update changelog with enhanced tagging configuration and improvements 2025-08-08 05:03:57 +00:00
Andy
f56e7c1ec8 chore(release): Bump version to 1.4.1 and update changelog with title caching features 2025-08-08 04:57:32 +00:00
Andy
096b7d70f8 Merge remote-tracking branch 'origin/main' into feature/title-caching 2025-08-08 04:50:46 +00:00
Andy
460878777d refactor(tags): Simplify Simkl search logic and soft-fail when no results found 2025-08-07 17:56:36 +00:00
Andy
9eb6bdbe12 feat(tags): Enhance tag_file function to prioritize provided TMDB ID if --tmdb is used 2025-08-06 22:15:16 +00:00
Andy
41d203aaba feat(config): Add options for tagging with group name and IMDB/TMDB details, plus a Simkl API fallback endpoint when no TMDB API key is set 2025-08-06 21:34:14 +00:00
Andy
0c6909be4e feat(dl): Update language option default to 'orig' when no -l is set, avoiding the hardcoded 'en' 2025-08-06 21:33:23 +00:00
Andy
f0493292af feat: Implement title caching system to reduce API calls
- Add configurable title caching with fallback support
- Cache titles for 30 minutes by default, with 24-hour fallback on API failures
- Add --no-cache and --reset-cache CLI flags for cache control
- Implement region-aware caching to handle geo-restricted content
- Use SHA256 hashing for cache keys to handle complex title IDs
- Add cache configuration variables to config system
- Document new caching options in example config

This caching system significantly reduces redundant API calls when debugging
or modifying CLI parameters, improving both performance and reliability.
2025-08-06 17:08:58 +00:00
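The two new flags surface on the `dl` command (see the dl.py diff below); a quick sketch using the README's SERVICE_NAME / CONTENT_ID placeholders:

```bash
# Bypass the title cache for this run only
unshackle dl --no-cache SERVICE_NAME CONTENT_ID

# Clear cached title metadata before fetching fresh data
unshackle dl --reset-cache SERVICE_NAME CONTENT_ID
```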
Andy
ead05d08ac fix(subtitle): Handle ValueError in subtitle filtering when time references contain multiple colons 2025-08-06 01:28:03 +00:00
Andy
8c1f51a431 refactor: Remove Dockerfile and .dockerignore from the repository 2025-08-05 23:56:07 +00:00
14 changed files with 543 additions and 224 deletions

View File

@@ -1,62 +0,0 @@
# Logs and temporary files
Logs/
logs/
temp/
*.log
# Sensitive files
key_vault.db
unshackle/WVDs/
unshackle/PRDs/
unshackle/cookies/
*.prd
*.wvd
# Cache directories
unshackle/cache/
__pycache__/
*.pyc
*.pyo
*.pyd
.Python
# Development files
.git/
.gitignore
.vscode/
.idea/
*.swp
*.swo
# Documentation and plans
plan/
CONTRIBUTING.md
CONFIG.md
AGENTS.md
OLD-CHANGELOG.md
cliff.toml
# Installation scripts
install.bat
# Test files
*test*
*Test*
# Virtual environments
venv/
env/
.venv/
# OS generated files
.DS_Store
Thumbs.db

View File

@@ -5,6 +5,51 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [1.4.1] - 2025-08-08
### Added
- **Title Caching System**: Intelligent title caching to reduce redundant API calls
- Configurable title caching with 30-minute default cache duration
- 24-hour fallback cache on API failures for improved reliability
- Region-aware caching to handle geo-restricted content properly
- SHA256 hashing for cache keys to handle complex title IDs
- Added `--no-cache` CLI flag to bypass caching when needed
- Added `--reset-cache` CLI flag to clear existing cache data
- New cache configuration variables in config system
- Documented caching options in example configuration file
- Significantly improves performance when debugging or modifying CLI parameters
- **Enhanced Tagging Configuration**: New options for customizing tag behavior
- Added `tag_group_name` config option to control group name inclusion in tags
- Added `tag_imdb_tmdb` config option to control IMDB/TMDB details in tags
- Added Simkl API endpoint support as fallback when no TMDB API key is provided
- Enhanced tag_file function to prioritize provided TMDB ID when `--tmdb` flag is used
- Improved TMDB ID handling with better prioritization logic
### Changed
- **Language Selection Enhancement**: Improved default language handling
- Updated language option default to 'orig' when no `-l` flag is set
- Avoids hardcoded 'en' default and respects original content language
- **Tagging Logic Improvements**: Simplified and enhanced tagging functionality
- Simplified Simkl search logic with soft-fail when no results found
- Enhanced tag_file function with better TMDB ID prioritization
- Improved error handling in tagging operations
### Fixed
- **Subtitle Processing**: Enhanced subtitle filtering for edge cases
- Fixed ValueError in subtitle filtering for multiple colons in time references
- Improved handling of subtitles containing complex time formatting
- Better error handling for malformed subtitle timestamps
### Removed
- **Docker Support**: Removed Docker configuration from repository
- Removed Dockerfile and .dockerignore files
- Cleaned up README.md Docker-related documentation
- Focuses on direct installation methods
## [1.4.0] - 2025-08-05
### Added

View File

@@ -1,78 +0,0 @@
FROM python:3.12-slim
# Set environment variables to reduce image size
ENV PYTHONDONTWRITEBYTECODE=1 \
PYTHONUNBUFFERED=1 \
UV_CACHE_DIR=/tmp/uv-cache
# Add container metadata
LABEL org.opencontainers.image.description="Docker image for Unshackle with all required dependencies for downloading media content"
# Install base dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
wget \
gnupg \
git \
curl \
build-essential \
cmake \
pkg-config \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
# Set up repos for mkvtools and bullseye for ccextractor
RUN wget -O /etc/apt/keyrings/gpg-pub-moritzbunkus.gpg https://mkvtoolnix.download/gpg-pub-moritzbunkus.gpg \
&& echo "deb [signed-by=/etc/apt/keyrings/gpg-pub-moritzbunkus.gpg] https://mkvtoolnix.download/debian/ bookworm main" >> /etc/apt/sources.list \
&& echo "deb-src [signed-by=/etc/apt/keyrings/gpg-pub-moritzbunkus.gpg] https://mkvtoolnix.download/debian/ bookworm main" >> /etc/apt/sources.list \
&& echo "deb http://ftp.debian.org/debian bullseye main" >> /etc/apt/sources.list
# Install all dependencies from apt
RUN apt-get update && apt-get install -y \
ffmpeg \
ccextractor \
mkvtoolnix \
aria2 \
libmediainfo-dev \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
# Install shaka packager
RUN wget https://github.com/shaka-project/shaka-packager/releases/download/v2.6.1/packager-linux-x64 \
&& chmod +x packager-linux-x64 \
&& mv packager-linux-x64 /usr/local/bin/packager
# Install N_m3u8DL-RE
RUN wget https://github.com/nilaoda/N_m3u8DL-RE/releases/download/v0.3.0-beta/N_m3u8DL-RE_v0.3.0-beta_linux-x64_20241203.tar.gz \
&& tar -xzf N_m3u8DL-RE_v0.3.0-beta_linux-x64_20241203.tar.gz \
&& mv N_m3u8DL-RE /usr/local/bin/ \
&& chmod +x /usr/local/bin/N_m3u8DL-RE \
&& rm N_m3u8DL-RE_v0.3.0-beta_linux-x64_20241203.tar.gz
# Create binaries directory and add symlinks for all required executables
RUN mkdir -p /app/binaries && \
ln -sf /usr/bin/ffprobe /app/binaries/ffprobe && \
ln -sf /usr/bin/ffmpeg /app/binaries/ffmpeg && \
ln -sf /usr/bin/mkvmerge /app/binaries/mkvmerge && \
ln -sf /usr/local/bin/N_m3u8DL-RE /app/binaries/N_m3u8DL-RE && \
ln -sf /usr/local/bin/packager /app/binaries/packager && \
ln -sf /usr/local/bin/packager /usr/local/bin/shaka-packager && \
ln -sf /usr/local/bin/packager /usr/local/bin/packager-linux-x64
# Install uv
RUN pip install --no-cache-dir uv
# Set working directory
WORKDIR /app
# Copy dependency files and README (required by pyproject.toml)
COPY pyproject.toml uv.lock README.md ./
# Copy source code first
COPY unshackle/ ./unshackle/
# Install dependencies with uv (including the project itself)
RUN uv sync --frozen --no-dev
# Set entrypoint to allow passing commands directly to unshackle
ENTRYPOINT ["uv", "run", "unshackle"]
CMD ["-h"]

View File

@@ -42,45 +42,6 @@ uv tool install git+https://github.com/unshackle-dl/unshackle.git
uvx unshackle --help # or just `unshackle` once PATH updated
```
### Docker Installation
Run unshackle using our pre-built Docker image from GitHub Container Registry:
```bash
# Run with default help command
docker run --rm ghcr.io/unshackle-dl/unshackle:latest
# Check environment dependencies
docker run --rm ghcr.io/unshackle-dl/unshackle:latest env check
# Download content (mount directories for persistent data)
docker run --rm \
-v "$(pwd)/unshackle/downloads:/app/downloads" \
-v "$(pwd)/unshackle/cookies:/app/unshackle/cookies" \
-v "$(pwd)/unshackle/services:/app/unshackle/services" \
-v "$(pwd)/unshackle/WVDs:/app/unshackle/WVDs" \
-v "$(pwd)/unshackle/PRDs:/app/unshackle/PRDs" \
-v "$(pwd)/unshackle/unshackle.yaml:/app/unshackle.yaml" \
ghcr.io/unshackle-dl/unshackle:latest dl SERVICE_NAME CONTENT_ID
# Run interactively for configuration
docker run --rm -it \
-v "$(pwd)/unshackle/cookies:/app/unshackle/cookies" \
-v "$(pwd)/unshackle/services:/app/unshackle/services" \
-v "$(pwd)/unshackle.yaml:/app/unshackle.yaml" \
ghcr.io/unshackle-dl/unshackle:latest cfg
```
**Alternative: Build locally**
```bash
# Clone and build your own image
git clone https://github.com/unshackle-dl/unshackle.git
cd unshackle
docker build -t unshackle .
docker run --rm unshackle env check
```
> [!NOTE]
> After installation, you may need to add the installation path to your PATH environment variable if prompted.

View File

@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
[project]
name = "unshackle"
version = "1.4.0"
version = "1.4.1"
description = "Modular Movie, TV, and Music Archival Software."
authors = [{ name = "unshackle team" }]
requires-python = ">=3.10,<3.13"

View File

@@ -143,7 +143,7 @@ class dl:
"-l",
"--lang",
type=LANGUAGE_RANGE,
default="en",
default="orig",
help="Language wanted for Video and Audio. Use 'orig' to select the original language, e.g. 'orig,en' for both original and English.",
)
@click.option(
@@ -240,6 +240,8 @@ class dl:
help="Max workers/threads to download with per-track. Default depends on the downloader.",
)
@click.option("--downloads", type=int, default=1, help="Amount of tracks to download concurrently.")
@click.option("--no-cache", "no_cache", is_flag=True, default=False, help="Bypass title cache for this download.")
@click.option("--reset-cache", "reset_cache", is_flag=True, default=False, help="Clear title cache before fetching.")
@click.pass_context
def cli(ctx: click.Context, **kwargs: Any) -> dl:
return dl(ctx, **kwargs)
@@ -436,6 +438,7 @@ class dl:
**__: Any,
) -> None:
self.tmdb_searched = False
self.search_source = None
start_time = time.time()
# Check if dovi_tool is available when hybrid mode is requested
@@ -460,7 +463,7 @@ class dl:
self.log.info("Authenticated with Service")
with console.status("Fetching Title Metadata...", spinner="dots"):
titles = service.get_titles()
titles = service.get_titles_cached()
if not titles:
self.log.error("No titles returned, nothing to download...")
sys.exit(1)
@@ -493,34 +496,34 @@ class dl:
if self.tmdb_id:
tmdb_title = tags.get_title(self.tmdb_id, kind)
else:
self.tmdb_id, tmdb_title = tags.search_tmdb(title.title, title.year, kind)
self.tmdb_id, tmdb_title, self.search_source = tags.search_show_info(title.title, title.year, kind)
if not (self.tmdb_id and tmdb_title and tags.fuzzy_match(tmdb_title, title.title)):
self.tmdb_id = None
if list_ or list_titles:
if self.tmdb_id:
console.print(
Padding(
f"TMDB -> {tmdb_title or '?'} [bright_black](ID {self.tmdb_id})",
f"Search -> {tmdb_title or '?'} [bright_black](ID {self.tmdb_id})",
(0, 5),
)
)
else:
console.print(Padding("TMDB -> [bright_black]No match found[/]", (0, 5)))
console.print(Padding("Search -> [bright_black]No match found[/]", (0, 5)))
self.tmdb_searched = True
if isinstance(title, Movie) and (list_ or list_titles) and not self.tmdb_id:
movie_id, movie_title = tags.search_tmdb(title.name, title.year, "movie")
movie_id, movie_title, _ = tags.search_show_info(title.name, title.year, "movie")
if movie_id:
console.print(
Padding(
f"TMDB -> {movie_title or '?'} [bright_black](ID {movie_id})",
f"Search -> {movie_title or '?'} [bright_black](ID {movie_id})",
(0, 5),
)
)
else:
console.print(Padding("TMDB -> [bright_black]No match found[/]", (0, 5)))
console.print(Padding("Search -> [bright_black]No match found[/]", (0, 5)))
if self.tmdb_id:
if self.tmdb_id and getattr(self, 'search_source', None) != 'simkl':
kind = "tv" if isinstance(title, Episode) else "movie"
tags.external_ids(self.tmdb_id, kind)
if self.tmdb_year:
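With `orig` as the new `-l/--lang` default, a plain invocation now follows the original content language; the option's help text suggests `orig,en` for both. A sketch with the same placeholders as above:

```bash
# Original language only (new default, no -l needed)
unshackle dl SERVICE_NAME CONTENT_ID

# Original language plus English
unshackle dl -l orig,en SERVICE_NAME CONTENT_ID
```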

View File

@@ -1 +1 @@
__version__ = "1.4.0"
__version__ = "1.4.1"

View File

@@ -85,11 +85,17 @@ class Config:
self.set_terminal_bg: bool = kwargs.get("set_terminal_bg", False)
self.tag: str = kwargs.get("tag") or ""
self.tag_group_name: bool = kwargs.get("tag_group_name", True)
self.tag_imdb_tmdb: bool = kwargs.get("tag_imdb_tmdb", True)
self.tmdb_api_key: str = kwargs.get("tmdb_api_key") or ""
self.update_checks: bool = kwargs.get("update_checks", True)
self.update_check_interval: int = kwargs.get("update_check_interval", 24)
self.scene_naming: bool = kwargs.get("scene_naming", True)
self.title_cache_time: int = kwargs.get("title_cache_time", 1800) # 30 minutes default
self.title_cache_max_retention: int = kwargs.get("title_cache_max_retention", 86400) # 24 hours default
self.title_cache_enabled: bool = kwargs.get("title_cache_enabled", True)
@classmethod
def from_yaml(cls, path: Path) -> Config:
if not path.exists():

View File

@@ -21,6 +21,7 @@ from unshackle.core.constants import AnyTrack
from unshackle.core.credential import Credential
from unshackle.core.drm import DRM_T
from unshackle.core.search_result import SearchResult
from unshackle.core.title_cacher import TitleCacher, get_account_hash, get_region_from_proxy
from unshackle.core.titles import Title_T, Titles_T
from unshackle.core.tracks import Chapters, Tracks
from unshackle.core.utilities import get_ip_info
@@ -42,6 +43,12 @@ class Service(metaclass=ABCMeta):
self.session = self.get_session()
self.cache = Cacher(self.__class__.__name__)
self.title_cache = TitleCacher(self.__class__.__name__)
# Store context for cache control flags and credential
self.ctx = ctx
self.credential = None # Will be set in authenticate()
self.current_region = None # Will be set based on proxy/geolocation
if not ctx.parent or not ctx.parent.params.get("no_proxy"):
if ctx.parent:
@@ -79,6 +86,15 @@ class Service(metaclass=ABCMeta):
).decode()
}
)
# Store region from proxy
self.current_region = get_region_from_proxy(proxy)
else:
# No proxy, try to get current region
try:
ip_info = get_ip_info(self.session)
self.current_region = ip_info.get("country", "").lower() if ip_info else None
except Exception:
self.current_region = None
# Optional Abstract functions
# The following functions may be implemented by the Service.
@@ -123,6 +139,9 @@ class Service(metaclass=ABCMeta):
raise TypeError(f"Expected cookies to be a {CookieJar}, not {cookies!r}.")
self.session.cookies.update(cookies)
# Store credential for cache key generation
self.credential = credential
def search(self) -> Generator[SearchResult, None, None]:
"""
Search by query for titles from the Service.
@@ -187,6 +206,52 @@ class Service(metaclass=ABCMeta):
This can be useful to store information on each title that will be required like any sub-asset IDs, or such.
"""
def get_titles_cached(self, title_id: str = None) -> Titles_T:
"""
Cached wrapper around get_titles() to reduce redundant API calls.
This method checks the cache before calling get_titles() and handles
fallback to cached data when API calls fail.
Args:
title_id: Optional title ID for cache key generation.
If not provided, will try to extract from service instance.
Returns:
Titles object (Movies, Series, or Album)
"""
# Try to get title_id from service instance if not provided
if title_id is None:
# Different services store the title ID in different attributes
if hasattr(self, "title"):
title_id = self.title
elif hasattr(self, "title_id"):
title_id = self.title_id
else:
# If we can't determine title_id, just call get_titles directly
self.log.debug("Cannot determine title_id for caching, bypassing cache")
return self.get_titles()
# Get cache control flags from context
no_cache = False
reset_cache = False
if self.ctx and self.ctx.parent:
no_cache = self.ctx.parent.params.get("no_cache", False)
reset_cache = self.ctx.parent.params.get("reset_cache", False)
# Get account hash for cache key
account_hash = get_account_hash(self.credential)
# Use title cache to get titles with fallback support
return self.title_cache.get_cached_titles(
title_id=str(title_id),
fetch_function=self.get_titles,
region=self.current_region,
account_hash=account_hash,
no_cache=no_cache,
reset_cache=reset_cache,
)
@abstractmethod
def get_tracks(self, title: Title_T) -> Tracks:
"""

View File

@@ -0,0 +1,240 @@
from __future__ import annotations
import hashlib
import logging
from datetime import datetime, timedelta
from typing import Optional
from unshackle.core.cacher import Cacher
from unshackle.core.config import config
from unshackle.core.titles import Titles_T
class TitleCacher:
"""
Handles caching of Title objects to reduce redundant API calls.
This wrapper provides:
- Region-aware caching to handle geo-restricted content
- Automatic fallback to cached data when API calls fail
- Cache lifetime extension during failures
- Cache hit/miss statistics for debugging
"""
def __init__(self, service_name: str):
self.service_name = service_name
self.log = logging.getLogger(f"{service_name}.TitleCache")
self.cacher = Cacher(service_name)
self.stats = {"hits": 0, "misses": 0, "fallbacks": 0}
def _generate_cache_key(
self, title_id: str, region: Optional[str] = None, account_hash: Optional[str] = None
) -> str:
"""
Generate a unique cache key for title data.
Args:
title_id: The title identifier
region: The region/proxy identifier
account_hash: Hash of account credentials (if applicable)
Returns:
A unique cache key string
"""
# Hash the title_id to handle complex IDs (URLs, dots, special chars)
# This ensures consistent length and filesystem-safe keys
title_hash = hashlib.sha256(title_id.encode()).hexdigest()[:16]
# Start with base key using hash
key_parts = ["titles", title_hash]
# Add region if available
if region:
key_parts.append(region.lower())
# Add account hash if available
if account_hash:
key_parts.append(account_hash[:8]) # Use first 8 chars of hash
# Join with underscores
cache_key = "_".join(key_parts)
# Log the mapping for debugging
self.log.debug(f"Cache key mapping: {title_id} -> {cache_key}")
return cache_key
def get_cached_titles(
self,
title_id: str,
fetch_function,
region: Optional[str] = None,
account_hash: Optional[str] = None,
no_cache: bool = False,
reset_cache: bool = False,
) -> Optional[Titles_T]:
"""
Get titles from cache or fetch from API with fallback support.
Args:
title_id: The title identifier
fetch_function: Function to call to fetch fresh titles
region: The region/proxy identifier
account_hash: Hash of account credentials
no_cache: Bypass cache completely
reset_cache: Clear cache before fetching
Returns:
Titles object (Movies, Series, or Album)
"""
# If caching is globally disabled or no_cache flag is set
if not config.title_cache_enabled or no_cache:
self.log.debug("Cache bypassed, fetching fresh titles")
return fetch_function()
# Generate cache key
cache_key = self._generate_cache_key(title_id, region, account_hash)
# If reset_cache flag is set, clear the cache entry
if reset_cache:
self.log.info(f"Clearing cache for {cache_key}")
cache_path = (config.directories.cache / self.service_name / cache_key).with_suffix(".json")
if cache_path.exists():
cache_path.unlink()
# Try to get from cache
cache = self.cacher.get(cache_key, version=1)
# Check if we have valid cached data
if cache and not cache.expired:
self.stats["hits"] += 1
self.log.debug(f"Cache hit for {title_id} (hits: {self.stats['hits']}, misses: {self.stats['misses']})")
return cache.data
# Cache miss or expired, try to fetch fresh data
self.stats["misses"] += 1
self.log.debug(f"Cache miss for {title_id}, fetching fresh data")
try:
# Attempt to fetch fresh titles
titles = fetch_function()
if titles:
# Successfully fetched, update cache
self.log.debug(f"Successfully fetched titles for {title_id}, updating cache")
cache = self.cacher.get(cache_key, version=1)
cache.set(titles, expiration=datetime.now() + timedelta(seconds=config.title_cache_time))
return titles
except Exception as e:
# API call failed, check if we have fallback cached data
if cache and cache.data:
# We have expired cached data, use it as fallback
current_time = datetime.now()
max_retention_time = cache.expiration + timedelta(
seconds=config.title_cache_max_retention - config.title_cache_time
)
if current_time < max_retention_time:
self.stats["fallbacks"] += 1
self.log.warning(
f"API call failed for {title_id}, using cached data as fallback "
f"(fallbacks: {self.stats['fallbacks']})"
)
self.log.debug(f"Error was: {e}")
# Extend cache lifetime
extended_expiration = current_time + timedelta(minutes=5)
if extended_expiration < max_retention_time:
cache.expiration = extended_expiration
cache.set(cache.data, expiration=extended_expiration)
return cache.data
else:
self.log.error(f"API call failed and cached data for {title_id} exceeded maximum retention time")
# Re-raise the exception if no fallback available
raise
def clear_all_title_cache(self):
"""Clear all title caches for this service."""
cache_dir = config.directories.cache / self.service_name
if cache_dir.exists():
for cache_file in cache_dir.glob("titles_*.json"):
cache_file.unlink()
self.log.info(f"Cleared cache file: {cache_file.name}")
def get_cache_stats(self) -> dict:
"""Get cache statistics."""
total = sum(self.stats.values())
if total > 0:
hit_rate = (self.stats["hits"] / total) * 100
else:
hit_rate = 0
return {
"hits": self.stats["hits"],
"misses": self.stats["misses"],
"fallbacks": self.stats["fallbacks"],
"hit_rate": f"{hit_rate:.1f}%",
}
def get_region_from_proxy(proxy_url: Optional[str]) -> Optional[str]:
"""
Extract region identifier from proxy URL.
Args:
proxy_url: The proxy URL string
Returns:
Region identifier or None
"""
if not proxy_url:
return None
# Try to extract region from common proxy patterns
# e.g., "us123.nordvpn.com", "gb-proxy.example.com"
import re
# Pattern for NordVPN style
nord_match = re.search(r"([a-z]{2})\d+\.nordvpn", proxy_url.lower())
if nord_match:
return nord_match.group(1)
# Pattern for country code at start
cc_match = re.search(r"([a-z]{2})[-_]", proxy_url.lower())
if cc_match:
return cc_match.group(1)
# Pattern for country code subdomain
subdomain_match = re.search(r"://([a-z]{2})\.", proxy_url.lower())
if subdomain_match:
return subdomain_match.group(1)
return None
def get_account_hash(credential) -> Optional[str]:
"""
Generate a hash for account identification.
Args:
credential: Credential object
Returns:
SHA1 hash of the credential or None
"""
if not credential:
return None
# Use existing sha1 property if available
if hasattr(credential, "sha1"):
return credential.sha1
# Otherwise generate hash from username
if hasattr(credential, "username"):
return hashlib.sha1(credential.username.encode()).hexdigest()
return None
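A minimal sketch of driving this module directly; `fetch_from_api` is a hypothetical stand-in for a real service's `get_titles()`:

```python
# Hypothetical usage; names mirror the module above
cacher = TitleCacher("EXAMPLE")  # per-service cache namespace

region = get_region_from_proxy("https://us1234.nordvpn.com:80")  # -> "us"

def fetch_from_api():
    ...  # a real service would hit its catalogue endpoint here

titles = cacher.get_cached_titles(
    title_id="series/12345",        # SHA256-hashed into the cache key
    fetch_function=fetch_from_api,  # only called on a miss or expiry
    region=region,                  # keeps geo-specific results separate
    account_hash=None,              # anonymous session in this sketch
)
print(cacher.get_cache_stats())     # e.g. {'hits': 0, 'misses': 1, 'fallbacks': 0, 'hit_rate': '0.0%'}
```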

View File

@@ -870,7 +870,18 @@ class Subtitle(Track):
elif sdh_method == "filter-subs":
# Force use of filter-subs
sub = Subtitles(self.path)
sub.filter(rm_fonts=True, rm_ast=True, rm_music=True, rm_effects=True, rm_names=True, rm_author=True)
try:
sub.filter(rm_fonts=True, rm_ast=True, rm_music=True, rm_effects=True, rm_names=True, rm_author=True)
except ValueError as e:
if "too many values to unpack" in str(e):
# Retry without name removal if the error is due to multiple colons in time references
# This can happen with lines like "at 10:00 and 2:00"
sub = Subtitles(self.path)
sub.filter(
rm_fonts=True, rm_ast=True, rm_music=True, rm_effects=True, rm_names=False, rm_author=True
)
else:
raise
sub.save()
return
elif sdh_method == "auto":
@@ -906,7 +917,18 @@ class Subtitle(Track):
)
else:
sub = Subtitles(self.path)
sub.filter(rm_fonts=True, rm_ast=True, rm_music=True, rm_effects=True, rm_names=True, rm_author=True)
try:
sub.filter(rm_fonts=True, rm_ast=True, rm_music=True, rm_effects=True, rm_names=True, rm_author=True)
except ValueError as e:
if "too many values to unpack" in str(e):
# Retry without name removal if the error is due to multiple colons in time references
# This can happen with lines like "at 10:00 and 2:00"
sub = Subtitles(self.path)
sub.filter(
rm_fonts=True, rm_ast=True, rm_music=True, rm_effects=True, rm_names=False, rm_author=True
)
else:
raise
sub.save()
def reverse_rtl(self) -> None:
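The guarded failure is a two-way unpack on a colon split applied to cues containing time references; the split shown here is an assumption about the filtering library's internals, not code from this repo:

```python
cue = "Meet me at 10:00, then again at 2:00"
try:
    speaker, text = cue.split(":")  # splits into 3 parts -> ValueError
except ValueError as exc:
    print(exc)  # too many values to unpack (expected 2)
```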

View File

@@ -44,6 +44,89 @@ def fuzzy_match(a: str, b: str, threshold: float = 0.8) -> bool:
return ratio >= threshold
def search_simkl(title: str, year: Optional[int], kind: str) -> Tuple[Optional[dict], Optional[str], Optional[int]]:
"""Search Simkl API for show information by filename (no auth required)."""
log.debug("Searching Simkl for %r (%s, %s)", title, kind, year)
# Construct appropriate filename based on type
filename = f"{title}"
if year:
filename = f"{title} {year}"
if kind == "tv":
filename += " S01E01.mkv"
else: # movie
filename += " 2160p.mkv"
try:
resp = requests.post("https://api.simkl.com/search/file", json={"file": filename}, headers=HEADERS, timeout=30)
resp.raise_for_status()
data = resp.json()
log.debug("Simkl API response received")
# Handle case where SIMKL returns empty list (no results)
if isinstance(data, list):
log.debug("Simkl returned list (no matches) for %r", filename)
return None, None, None
# Handle TV show responses
if data.get("type") == "episode" and "show" in data:
show_info = data["show"]
show_title = show_info.get("title")
show_year = show_info.get("year")
# Verify title matches and year if provided
if not fuzzy_match(show_title, title):
log.debug("Simkl title mismatch: searched %r, got %r", title, show_title)
return None, None, None
if year and show_year and abs(year - show_year) > 1: # Allow 1 year difference
log.debug("Simkl year mismatch: searched %d, got %d", year, show_year)
return None, None, None
tmdb_id = show_info.get("ids", {}).get("tmdbtv")
if tmdb_id:
tmdb_id = int(tmdb_id)
log.debug("Simkl -> %s (TMDB ID %s)", show_title, tmdb_id)
return data, show_title, tmdb_id
# Handle movie responses
elif data.get("type") == "movie" and "movie" in data:
movie_info = data["movie"]
movie_title = movie_info.get("title")
movie_year = movie_info.get("year")
# Verify title matches and year if provided
if not fuzzy_match(movie_title, title):
log.debug("Simkl title mismatch: searched %r, got %r", title, movie_title)
return None, None, None
if year and movie_year and abs(year - movie_year) > 1: # Allow 1 year difference
log.debug("Simkl year mismatch: searched %d, got %d", year, movie_year)
return None, None, None
ids = movie_info.get("ids", {})
tmdb_id = ids.get("tmdb") or ids.get("moviedb")
if tmdb_id:
tmdb_id = int(tmdb_id)
log.debug("Simkl -> %s (TMDB ID %s)", movie_title, tmdb_id)
return data, movie_title, tmdb_id
except (requests.RequestException, ValueError, KeyError) as exc:
log.debug("Simkl search failed: %s", exc)
return None, None, None
def search_show_info(title: str, year: Optional[int], kind: str) -> Tuple[Optional[int], Optional[str], Optional[str]]:
"""Search for show information, trying Simkl first, then TMDB fallback. Returns (tmdb_id, title, source)."""
simkl_data, simkl_title, simkl_tmdb_id = search_simkl(title, year, kind)
if simkl_data and simkl_title and fuzzy_match(simkl_title, title):
return simkl_tmdb_id, simkl_title, "simkl"
tmdb_id, tmdb_title = search_tmdb(title, year, kind)
return tmdb_id, tmdb_title, "tmdb"
def search_tmdb(title: str, year: Optional[int], kind: str) -> Tuple[Optional[int], Optional[str]]:
api_key = _api_key()
if not api_key:
@@ -202,10 +285,8 @@ def tag_file(path: Path, title: Title, tmdb_id: Optional[int] | None = None) ->
log.debug("Tagging file %s with title %r", path, title)
standard_tags: dict[str, str] = {}
custom_tags: dict[str, str] = {}
# To add custom information to the tags
# custom_tags["Text to the left side"] = "Text to the right side"
if config.tag:
if config.tag and config.tag_group_name:
custom_tags["Group"] = config.tag
description = getattr(title, "description", None)
if description:
@@ -216,12 +297,6 @@ def tag_file(path: Path, title: Title, tmdb_id: Optional[int] | None = None) ->
description = truncated + "..."
custom_tags["Description"] = description
api_key = _api_key()
if not api_key:
log.debug("No TMDB API key set; applying basic tags only")
_apply_tags(path, custom_tags)
return
if isinstance(title, Movie):
kind = "movie"
name = title.name
@@ -234,32 +309,60 @@ def tag_file(path: Path, title: Title, tmdb_id: Optional[int] | None = None) ->
_apply_tags(path, custom_tags)
return
tmdb_title: Optional[str] = None
if tmdb_id is None:
tmdb_id, tmdb_title = search_tmdb(name, year, kind)
log.debug("Search result: %r (ID %s)", tmdb_title, tmdb_id)
if not tmdb_id or not tmdb_title or not fuzzy_match(tmdb_title, name):
log.debug("TMDB search did not match; skipping external ID lookup")
if config.tag_imdb_tmdb:
# If tmdb_id is provided (via --tmdb), skip Simkl and use TMDB directly
if tmdb_id is not None:
log.debug("Using provided TMDB ID %s for tags", tmdb_id)
else:
# Try Simkl first for automatic lookup
simkl_data, simkl_title, simkl_tmdb_id = search_simkl(name, year, kind)
if simkl_data and simkl_title and fuzzy_match(simkl_title, name):
log.debug("Using Simkl data for tags")
if simkl_tmdb_id:
tmdb_id = simkl_tmdb_id
show_ids = simkl_data.get("show", {}).get("ids", {})
if show_ids.get("imdb"):
standard_tags["IMDB"] = f"https://www.imdb.com/title/{show_ids['imdb']}"
if show_ids.get("tvdb"):
standard_tags["TVDB"] = f"https://thetvdb.com/dereferrer/series/{show_ids['tvdb']}"
if show_ids.get("tmdbtv"):
standard_tags["TMDB"] = f"https://www.themoviedb.org/tv/{show_ids['tmdbtv']}"
# Use TMDB API for additional metadata (either from provided ID or Simkl lookup)
api_key = _api_key()
if not api_key:
log.debug("No TMDB API key set; applying basic tags only")
_apply_tags(path, custom_tags)
return
tmdb_url = f"https://www.themoviedb.org/{'movie' if kind == 'movie' else 'tv'}/{tmdb_id}"
standard_tags["TMDB"] = tmdb_url
try:
ids = external_ids(tmdb_id, kind)
except requests.RequestException as exc:
log.debug("Failed to fetch external IDs: %s", exc)
ids = {}
else:
log.debug("External IDs found: %s", ids)
tmdb_title: Optional[str] = None
if tmdb_id is None:
tmdb_id, tmdb_title = search_tmdb(name, year, kind)
log.debug("TMDB search result: %r (ID %s)", tmdb_title, tmdb_id)
if not tmdb_id or not tmdb_title or not fuzzy_match(tmdb_title, name):
log.debug("TMDB search did not match; skipping external ID lookup")
_apply_tags(path, custom_tags)
return
imdb_id = ids.get("imdb_id")
if imdb_id:
standard_tags["IMDB"] = f"https://www.imdb.com/title/{imdb_id}"
tvdb_id = ids.get("tvdb_id")
if tvdb_id:
tvdb_prefix = "movies" if kind == "movie" else "series"
standard_tags["TVDB"] = f"https://thetvdb.com/dereferrer/{tvdb_prefix}/{tvdb_id}"
tmdb_url = f"https://www.themoviedb.org/{'movie' if kind == 'movie' else 'tv'}/{tmdb_id}"
standard_tags["TMDB"] = tmdb_url
try:
ids = external_ids(tmdb_id, kind)
except requests.RequestException as exc:
log.debug("Failed to fetch external IDs: %s", exc)
ids = {}
else:
log.debug("External IDs found: %s", ids)
imdb_id = ids.get("imdb_id")
if imdb_id:
standard_tags["IMDB"] = f"https://www.imdb.com/title/{imdb_id}"
tvdb_id = ids.get("tvdb_id")
if tvdb_id:
tvdb_prefix = "movies" if kind == "movie" else "series"
standard_tags["TVDB"] = f"https://thetvdb.com/dereferrer/{tvdb_prefix}/{tvdb_id}"
merged_tags = {
**custom_tags,
@@ -269,6 +372,8 @@ def tag_file(path: Path, title: Title, tmdb_id: Optional[int] | None = None) ->
__all__ = [
"search_simkl",
"search_show_info",
"search_tmdb",
"get_title",
"get_year",

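A hedged example of the new search entry point; the title and year are placeholders:

```python
tmdb_id, matched_title, source = search_show_info("Some Show", 2023, "tv")
if tmdb_id:
    # source is "simkl" when the Simkl lookup matched, else "tmdb"
    print(f"{matched_title} (TMDB {tmdb_id}) via {source}")
else:
    print("no confident match; tagging falls back to basic tags")
```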
View File

@@ -1,6 +1,12 @@
# Group or Username to postfix to the end of all download filenames following a dash
tag: user_tag
# Enable/disable tagging with group name (default: true)
tag_group_name: true
# Enable/disable tagging with IMDB/TMDB/TVDB details (default: true)
tag_imdb_tmdb: true
# Set terminal background color (custom option not in CONFIG.md)
set_terminal_bg: false
@@ -15,6 +21,12 @@ update_checks: true
# How often to check for updates, in hours (default: 24)
update_check_interval: 24
# Title caching configuration
# Cache title metadata to reduce redundant API calls
title_cache_enabled: true # Enable/disable title caching globally (default: true)
title_cache_time: 1800 # Cache duration in seconds (default: 1800 = 30 minutes)
title_cache_max_retention: 86400 # Maximum cache retention for fallback when API fails (default: 86400 = 24 hours)
# Muxing configuration
muxing:
set_title: false
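With the defaults above, a title fetched at time T is served from cache until T + 30 minutes; if a later refresh fails, the stale entry remains usable as a fallback until T + 24 hours, with the cacher extending the expired entry in five-minute steps up to that cap.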

uv.lock generated
View File

@@ -1505,7 +1505,7 @@ wheels = [
[[package]]
name = "unshackle"
version = "1.4.0"
version = "1.4.1"
source = { editable = "." }
dependencies = [
{ name = "appdirs" },