Automate Things

In the last issue of our newsletter we said we would create a script to bring a disorganized media library under control. Instead of going through 2TB of movies and TV shows and manually cleaning filenames, managing duplicates, and combining folders, we can create a script to reliably do the same thing! As a free bonus we get to practice our skills in Python. Sure, you could do this in PowerShell, VBS, bash, or Perl as well, but let’s exercise our brains a bit.

Before we start it’s a good idea to jot down ideas for our script. What kinds of inputs will it see? What outputs do we want to get out of it? How do we want to pass arguments to it? What are some of the steps our files must go through during their transformation from ugly to pristine? What edge cases could possibly interfere with those steps?

We know it’s going to see filenames that contain a lot of repeating strings, like “BRRip” and “x264” and “[eng]”. I made a list of all the things I can see that have to go just from scrolling through the mess. I’m going to need to evaluate a messy folder name full of messy filenames and try to create a clean folder full of clean files. Given my propensity for typos, I don’t want to pass arguments to this thing from the command line. (One time I chown’d an entire /var directory and roached a ‘nix box less than 2 hours after getting it configured.) Edge cases surrounding the file extensions and duplicates could complicate things.

So now that we have an organized idea of what we want to accomplish, let’s rough out some pseudocode! Roughing out a design with pseudocode makes prototyping a lot easier and more orderly. In the case of this particular project I even used many of my lines of pseudocode as code comments in the final product!

My pseudocode looks like this…..

Check to see if an input folder was supplied.

Verify that the target folder exists and is writable.

Create an array of subfolders of the parent folder. 

For each subfolder...

  Clean the folder name and create a new folder. 

  Replace dots with spaces.

  Replace ( and ) with [ and ].

  Replace commonly found torrent group names.

  For each file...  

    Copy each file into the new folder with proper names. 

    Include artwork and subtitles.

    Replace dots with spaces.

    Replace ( and ) with [ and ].

    Replace commonly found torrent group names.  

    Delete the improperly named file.

  Delete the improperly named folder.

That’s a pretty good start. Looks like we need to program two loops that manipulate the hell out of a couple of strings, and then use the strings to create folders and move files around. Each time we do something risky with a file we’ll put a check in the way to make sure things go smoothly. 

The tricky part here is removing the unwanted substrings from the folder name and file names. We want to remove a lot of things as quickly as possible. Assuming we need to remove 4 substrings per string and there are 4 strings per folder with 1,000 folders, this works out to 16,000 iterations if we were to use a loop to scan for each removal. Even then, the removals can step on each other. What if we want to remove ‘abc’ and ‘bcd’ from the string ‘abcde’? A regular loop that removes ‘abc’ first produces ‘de’, while one that removes ‘bcd’ first produces ‘ae’, so the result depends on the order of the loop.
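Here is a tiny sketch of that ordering problem. The helper name remove_sequentially is made up for illustration; it does nothing more than call str.replace once per substring, which is roughly what a naive loop would do.

```python
def remove_sequentially(text, substrings):
    """Remove each substring in turn with a plain loop (hypothetical helper)."""
    for sub in substrings:
        text = text.replace(sub, "")
    return text


# Same input, same removals, different order, different result.
first = remove_sequentially("abcde", ["abc", "bcd"])   # 'abc' is removed first
second = remove_sequentially("abcde", ["bcd", "abc"])  # 'bcd' is removed first
print(first, second)
```

Once the first removal has run, the second substring no longer exists in what is left, so whichever one goes first decides the output.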

What we need is a function that we can call to do this work for us using regular expressions. Because we’re using Python, we can use dicts (short for dictionary) to store key:value pairs for our replacements. This has the added benefit of letting us set a specific replacement for each substring. In our case we’re going to use this to replace every period in the filename with a space, while most other substrings get replaced with an empty string.

Luckily I found bgusach’s GitHub Gist (located here) that does mostly what we want. We’re going to hard-code the second argument and actually use two copies of it, one crafted to clean our folder names and the other tailored for cleaning our filenames.
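For reference, the general two-argument form looks roughly like the sketch below. This is my own minimal version of the idea, not a copy of the gist; the name multi_replace and the sample strings are assumptions for the demo.

```python
import re


def multi_replace(text, replacements):
    """Apply every {match: replacement} pair in a single pass over text."""
    # Longest keys first, so 'abc' wins over 'ab' where both could match.
    substrs = sorted(replacements, key=len, reverse=True)
    # One big alternation of literal (escaped) substrings.
    pattern = re.compile("|".join(map(re.escape, substrs)))
    # Look each match up in the dict to find its replacement.
    return pattern.sub(lambda match: replacements[match.group(0)], text)


sample = multi_replace("Some.Movie.720p.x264", {".": " ", "720p": "", "x264": ""})
cleaned = sample.rstrip()
print(repr(cleaned))
```

Because the regex scans the string once and never rescans its own output, the result no longer depends on the order the pairs were written in. Note that this sketch assumes the dict is not empty.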

Here’s the folder cleaning function…..

# A function to prepare the folder name using our filters.
def cleanDir(string):
  # Define the dictionary of {'matches': 'replacements'}
  replacements = {'()': '', '[': '', ']': '', '{': '', '}': '', '.': ' ', '(': '', ')': '', 'BrRip': '', 'BRRip': '', 'XviD': '', 'BluRay': '', 'YIFY': '', '[YTS.AG]': '', '[YTS.PE]': '', 'HDTS': '', '720p': '', 'x264': '', 'AC3': '', '-': '', '1080p': '', ',': ''}
  # Place longer ones first to keep shorter substrings from matching where the longer ones should take place
  # For instance given the replacements {'ab': 'AB', 'abc': 'ABC'} against the string 'hey abc', it should produce
  # 'hey ABC' and not 'hey ABc'
  substrs = sorted(replacements, key=len, reverse=True)
  # Create a big OR regex that matches any of the substrings to replace
  regexp = re.compile('|'.join(map(re.escape, substrs)))
  # For each match, look up the new string in the replacements
  return regexp.sub(lambda match: replacements[match.group(0)], string)

We need to have two functions because filenames have extensions, which need to stay exempt from cleaning. Remember that we’re replacing periods with whitespace, so to avoid losing the extension during processing we’re going to chop it off before cleaning and add it back on afterwards. We implement this crudely by simply counting 4 characters from the end of the input string, which means anything with an extension longer than three letters, like .torrent files, still gets corrupted, but I don’t care. If you do care, it would be wise to add your own handler for that. It could also be better if we used a second argument inside a single function to switch between folder cleaning and file cleaning, but I really only need to run this script once and my media library should be good to go. For what it is, doubling the programming time to improve program efficiency by 10% just doesn’t make sense. We’re trying to do this as easily as possible.
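If you do care about longer extensions, one possible handler is os.path.splitext from the standard library, which splits at the last dot instead of counting characters. This is only a sketch of the idea and is not what the script in this post does; split_name is a made-up wrapper for the demo.

```python
import os


def split_name(filename):
    """Split a filename into (stem, extension) at the last dot."""
    stem, ext = os.path.splitext(filename)
    return stem, ext


print(split_name("Some.Movie.2017.720p.mkv"))
print(split_name("Some.Movie.2017.torrent"))
print(split_name("poster.jpeg"))
```

The catch is a file with dots in its name but no real extension: everything after the last dot would be treated as the extension and skipped by the cleaning step.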

Here’s the file cleaning function…..

# A function to prepare the filename using our filters.
def cleanFile(string):
  # Separate the last 4 characters (the file extension in the case of media, images, and subtitles which are all we need).
  stringExt = string[-4:]
  string = string[:-4]
  # Define the dictionary of {'matches': 'replacements'}
  replacements = {'()': '', '[': '', ']': '', '{': '', '}': '', '(': '', ')': '', 'BrRip': '', 'BRRip': '', 'XviD': '', 'BluRay': '', 'YIFY': '', '[YTS.AG]': '', '[YTS.PE]': '', 'HDTS': '', '720p': '', 'x264': '', 'AC3': '', '-': '', '1080p': '', '.': ' '}
  # Place longer ones first to keep shorter substrings from matching where the longer ones should take place
  # For instance given the replacements {'ab': 'AB', 'abc': 'ABC'} against the string 'hey abc', it should produce
  # 'hey ABC' and not 'hey ABc'.
  substrs = sorted(replacements, key=len, reverse=True)
  # Create a big OR regex that matches any of the substrings to replace.
  regexp = re.compile('|'.join(map(re.escape, substrs)))
  # For each match, look up the new string in the replacements and re-add the extension.
  return regexp.sub(lambda match: replacements[match.group(0)], string).rstrip(' ') + stringExt

# Verify that the target folder exists and is writable.
if not os.access(inputDir, os.W_OK):
  print("inputDir not writable!")
  sys.exit()

And believe it or not that was the hard part! The easy part is creating an array full of subdirectories and then iterating through it, scrubbing all the strings. The remaining logic is fully commented, which pretty much explains removing the remaining whitespace, checking for duplicates, creating the folders, copying the files, and deleting the originals…..

# For each folder…
for dir in os.walk(inputDir):
  # Apply our filters to the input folder.
  oldDir = dir[0]
  newDir = cleanDir(oldDir)
  newDir = newDir.replace("   ", " ")
  newDir = newDir.replace("  ", " ")
  # Check if a folder already exists and create one if it does not.
  if not os.path.exists(newDir):
    try:
      os.mkdir(newDir)
    except OSError:
      print("newDir not writable!")
  # Scan the folder for files.
  oldFiles = os.listdir(oldDir)
  # For each file within a folder…
  for oldFile in oldFiles:
    oldFilePath = os.path.join(oldDir, oldFile)
    # Skip anything that is not a file, such as a subfolder. (os.path.isfile follows symlinks.)
    if os.path.isfile(oldFilePath):
      # Apply our filters to the input file.
      newFile = cleanFile(oldFile)
      print(newFile)
      newFilePath = newDir + '/' + newFile
      newFilePath = newFilePath.replace("   ", " ")
      newFilePath = newFilePath.replace("  ", " ")
      # A file that is already clean would otherwise be seen as its own duplicate.
      if newFilePath == oldFilePath:
        continue
      # Check that the target file doesn't already exist.
      if os.path.isfile(newFilePath):
        # Prefix the filename with 1_ if a file already exists with the same name.
        newFilePath = newDir + '/1_' + newFile
      # Move the file to the new directory.
      os.rename(oldFilePath, newFilePath)
  # After all files are processed try to delete the folder.
  # We could use shutil.rmtree instead but for this it's better to have errors with duplicates
  # that can be corrected instead of errors and deleted originals.
  if oldDir != newDir:
    try:
      os.rmdir(oldDir)
    except OSError:
      print("Cannot Delete oldDir!")
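One thing to watch with the 1_ prefix: if a third file cleans down to the same name, the 1_ name is already taken too, and on Linux or macOS os.rename will quietly replace an existing destination file. If that worries you, a counter that keeps going until it finds a free name is one way around it. The helper below is a hypothetical sketch and is not part of my script.

```python
import os


def unique_path(folder, filename):
    """Return a path inside folder that does not exist yet (hypothetical helper)."""
    candidate = os.path.join(folder, filename)
    counter = 1
    # Keep bumping the numeric prefix until the name is free.
    while os.path.exists(candidate):
        candidate = os.path.join(folder, "{}_{}".format(counter, filename))
        counter += 1
    return candidate
```

You would call it as newFilePath = unique_path(newDir, newFile) right before the rename, in place of the 1_ check.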

When we put it all together it looks something like this…..

# HR_Media_Organizer
import sys, os, re

inputDir = '/home/justin/Desktop/testDir'

# Change into the input folder.
os.chdir(inputDir)

# A function to prepare the folder name using our filters.
def cleanDir(string):
  # Define the dictionary of {'matches': 'replacements'}
  replacements = {'()': '', '[': '', ']': '', '{': '', '}': '', '.': ' ', '(': '', ')': '', 'BrRip': '', 'BRRip': '', 'XviD': '', 'BluRay': '', 'YIFY': '', '[YTS.AG]': '', '[YTS.PE]': '', 'HDTS': '', '720p': '', 'x264': '', 'AC3': '', '-': '', '1080p': '', ',': ''}
  # Place longer ones first to keep shorter substrings from matching where the longer ones should take place
  # For instance given the replacements {'ab': 'AB', 'abc': 'ABC'} against the string 'hey abc', it should produce
  # 'hey ABC' and not 'hey ABc'
  substrs = sorted(replacements, key=len, reverse=True)
  # Create a big OR regex that matches any of the substrings to replace
  regexp = re.compile('|'.join(map(re.escape, substrs)))
  # For each match, look up the new string in the replacements
  return regexp.sub(lambda match: replacements[match.group(0)], string)

 

# A function to prepare the filename using our filters.
def cleanFile(string):
  # Separate the last 4 characters (the file extension in the case of media, images, and subtitles which are all we need).
  stringExt = string[-4:]
  string = string[:-4]
  # Define the dictionary of {'matches': 'replacements'}
  replacements = {'()': '', '[': '', ']': '', '{': '', '}': '', '(': '', ')': '', 'BrRip': '', 'BRRip': '', 'XviD': '', 'BluRay': '', 'YIFY': '', '[YTS.AG]': '', '[YTS.PE]': '', 'HDTS': '', '720p': '', 'x264': '', 'AC3': '', '-': '', '1080p': '', '.': ' '}
  # Place longer ones first to keep shorter substrings from matching where the longer ones should take place
  # For instance given the replacements {'ab': 'AB', 'abc': 'ABC'} against the string 'hey abc', it should produce
  # 'hey ABC' and not 'hey ABc'.
  substrs = sorted(replacements, key=len, reverse=True)
  # Create a big OR regex that matches any of the substrings to replace.
  regexp = re.compile('|'.join(map(re.escape, substrs)))
  # For each match, look up the new string in the replacements and re-add the extension.
  return regexp.sub(lambda match: replacements[match.group(0)], string).rstrip(' ') + stringExt

# Verify that the target folder exists and is writable.
if not os.access(inputDir, os.W_OK):
  print("inputDir not writable!")
  sys.exit()

 

# For each folder…
for dir in os.walk(inputDir):
  # Apply our filters to the input folder.
  oldDir = dir[0]
  newDir = cleanDir(oldDir)
  newDir = newDir.replace("   ", " ")
  newDir = newDir.replace("  ", " ")
  # Check if a folder already exists and create one if it does not.
  if not os.path.exists(newDir):
    try:
      os.mkdir(newDir)
    except OSError:
      print("newDir not writable!")
  # Scan the folder for files.
  oldFiles = os.listdir(oldDir)
  # For each file within a folder…
  for oldFile in oldFiles:
    oldFilePath = os.path.join(oldDir, oldFile)
    # Skip anything that is not a file, such as a subfolder. (os.path.isfile follows symlinks.)
    if os.path.isfile(oldFilePath):
      # Apply our filters to the input file.
      newFile = cleanFile(oldFile)
      print(newFile)
      newFilePath = newDir + '/' + newFile
      newFilePath = newFilePath.replace("   ", " ")
      newFilePath = newFilePath.replace("  ", " ")
      # A file that is already clean would otherwise be seen as its own duplicate.
      if newFilePath == oldFilePath:
        continue
      # Check that the target file doesn't already exist.
      if os.path.isfile(newFilePath):
        # Prefix the filename with 1_ if a file already exists with the same name.
        newFilePath = newDir + '/1_' + newFile
      # Move the file to the new directory.
      os.rename(oldFilePath, newFilePath)
  # After all files are processed try to delete the folder.
  # We could use shutil.rmtree instead but for this it's better to have errors with duplicates
  # that can be corrected instead of errors and deleted originals.
  if oldDir != newDir:
    try:
      os.rmdir(oldDir)
    except OSError:
      print("Cannot Delete oldDir!")

And there you have it! This script will turn a folder named “Jigsaw (2017) [1080p] [YTS.AG]” into “Jigsaw 2017”, which is exactly what I want. You might want something different, so you certainly shouldn’t attempt to run this script right out of the box without testing some samples. Obviously there is no warranty on this, and if you mess up anything I take no responsibility. I just want to illustrate how approachable programming really is, and that programmatically solving problems is oftentimes a lot easier than doing things manually. It’s simply a matter of establishing a repeatable system that you can describe, and then putting that description into your syntax of choice.
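If you want a cheap way to test some samples before letting anything loose on real files, a dry run that only reports what would change is one option. The sketch below is an assumption of mine and not part of the script above: plan_renames and strip_tags are made-up names, and strip_tags is just a stand-in for cleanDir or cleanFile.

```python
import re


def plan_renames(names, clean):
    """Return (old, new) pairs for the names that clean() would change."""
    plan = []
    for name in names:
        # Collapse any run of whitespace left behind by the cleaning step.
        new_name = re.sub(r"\s+", " ", clean(name)).strip()
        if new_name != name:
            plan.append((name, new_name))
    return plan


def strip_tags(name):
    # Stand-in cleaner for the demo; swap in cleanDir or cleanFile.
    return name.replace("[1080p]", "").replace("[YTS.AG]", "")


samples = ["Jigsaw 2017 [1080p] [YTS.AG]", "Clean Name"]
for old, new in plan_renames(samples, strip_tags):
    print("{!r} -> {!r}".format(old, new))
```

Nothing here touches the disk, so you can feed it the output of os.listdir and read through the plan before committing to it.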

Download this code on GitHub!