Creating a Static Web-site from a WordPress Blog

I have an old WordPress Blog that I created circa 2005, which is no longer updated. In order to remove the need to (constantly) update the version of WordPress, themes, plugins etc. and to create a version suitable for archival purposes, I decided to convert the blog to a Static web-site.

It was difficult to find a suitable WordPress plugin to do the job, so I decided to look for alternative approaches. One way of doing this is to use a software application such as HTTrack, a free (GPL, libre/free software) and easy-to-use offline browser utility that creates a copy of a web-site on your local computer.

https://www.httrack.com

It allows you to download a World Wide Web site from the Internet to a local directory, building recursively all directories, getting HTML, images, and other files from the server to your computer. HTTrack arranges the original site’s relative link-structure. Simply open a page of the “mirrored” website in your browser, and you can browse the site from link to link, as if you were viewing it online. HTTrack can also update an existing mirrored site, and resume interrupted downloads. HTTrack is fully configurable, and has an integrated help system.

The existing blog consists of just under 600 posts/pages and has approx 150MB of media files.

Before starting, I backed up the WordPress directory and related MySQL database on the server.

An initial run of HTTrack successfully created a static version of the blog that seemed to work OK, but it also highlighted a number of issues and limitations.

  • Comments do not transfer – This is not a big issue for me, although it should be possible to come up with a suitable work-around for handling these if necessary.

The first thing I did was to check the WordPress plugins being used, removing those that were not relevant for a static website. I also wrote a small plugin to remove/disable the current Responsive Images feature in WordPress.

It was interesting to note the variety of techniques used for embedding YouTube videos in the blog pages. These had evolved over the time that the blog was written (10 years+), so I decided to make them consistent throughout the site using the embed code provided on the YouTube site.
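As a rough illustration of that clean-up, the sketch below pulls the video ID out of a few common YouTube URL forms and produces one consistent iframe embed. The function names and the list of URL forms are my own assumptions rather than a record of what the blog contained, and the iframe attributes only approximate the embed code that YouTube provides.

```python
from urllib.parse import urlparse, parse_qs

def youtube_video_id(url):
    """Return the video ID from a YouTube URL, or None if the form is not recognised."""
    parts = urlparse(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    if host == "youtu.be":
        # Short links: https://youtu.be/<id>
        return parts.path.lstrip("/") or None
    if host in ("youtube.com", "youtube-nocookie.com"):
        if parts.path == "/watch":
            # Watch pages: /watch?v=<id>
            return parse_qs(parts.query).get("v", [None])[0]
        for prefix in ("/embed/", "/v/"):
            # iframe embeds and the older object/embed style
            if parts.path.startswith(prefix):
                rest = parts.path[len(prefix):]
                return rest.split("/")[0].split("&")[0] or None
    return None

def embed_code(video_id, width=560, height=315):
    """Build a single, consistent iframe embed for the given video ID."""
    return ('<iframe width="%d" height="%d" '
            'src="https://www.youtube.com/embed/%s" '
            'frameborder="0" allowfullscreen></iframe>' % (width, height, video_id))
```

Any URL form the function does not recognise returns None, so those posts can be checked by hand.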

I also decided to migrate the blog from HTTP to HTTPS as part of the exercise. The WordPress theme being used has some hard-coded urls using HTTP so I edited these to use HTTPS.

I then ran the HTTrack utility to create a local copy of the blog. In fact, I ran the utility a number of times, checking and tweaking details in the blog between each run. The utility provides a helpful log of the warning/error messages that occur during the run.
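For repeated runs like these, the command-line version of HTTrack can be scripted. The sketch below is only an assumption about how that might look: the site URL, output directory and filter are hypothetical placeholders, and it assumes the httrack executable is installed and on the PATH.

```python
import subprocess

def httrack_command(site_url, output_dir, filters=()):
    """Build the argument list for a basic HTTrack mirror run."""
    # -O sets the directory that the mirror and log files are written to
    return ["httrack", site_url, "-O", output_dir] + list(filters)

def mirror_site(site_url, output_dir, filters=()):
    """Run HTTrack; assumes the httrack executable is on the PATH."""
    subprocess.run(httrack_command(site_url, output_dir, filters), check=True)
```

A call such as mirror_site("https://blog.example.com/", "mirror", ["+*.example.com/*"]) would then repeat the same mirror after each round of tweaks.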

Final steps included mass-file editing of the local files to convert HTTP references to HTTPS and other clean-up operations on the final files.
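The HTTP to HTTPS conversion can be sketched in Python along the following lines. This is a minimal sketch rather than the exact procedure I followed: the function name, the domain argument and the list of file extensions are assumptions, and it only rewrites links that point at the blog's own domain.

```python
import os

def rewrite_http_links(root_dir, domain, extensions=(".html", ".htm", ".css", ".js")):
    """Replace http://<domain> with https://<domain> in text files under root_dir.

    Returns the list of files that were changed.
    """
    old = "http://" + domain
    new = "https://" + domain
    changed = []
    for dir_name, _sub_dirs, files in os.walk(root_dir):
        for name in files:
            if not name.lower().endswith(extensions):
                continue
            path = os.path.join(dir_name, name)
            # newline="" keeps the original line endings; surrogateescape lets
            # bytes that are not valid UTF-8 pass through unchanged
            with open(path, encoding="utf-8", errors="surrogateescape", newline="") as fh:
                text = fh.read()
            if old in text:
                with open(path, "w", encoding="utf-8", errors="surrogateescape", newline="") as fh:
                    fh.write(text.replace(old, new))
                changed.append(path)
    return changed
```

Run it against a copy of the mirrored files first, so the result can be compared with the originals.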

I also created a zipped up copy of the original blog files and MySQL database, which have now been archived, in case I ever need to return to the original blog.

One item not addressed during this process was how to handle referenced URLs which have expired. An expired URL is one which is no longer valid because the original destination has changed or disappeared completely. I have a few ideas how to handle this phenomenon if I can ever find the time to explore it further.

Other Links

http://redirectdetective.com/creating-redirects.html

Internet Shortcuts to HTML file – Python

I had problems with the DOS batch file version of this utility so decided to replace it with a more robust version written in Python.

import os
import datetime

# Convert all .url files in the current directory and its sub-directories
# into a single html file, plus a plain-text list of the URLs

now = datetime.datetime.now()
date_str = now.strftime("%d-%m-%Y")
time_str = now.strftime("%H:%M:%S")
stamp = now.strftime("%d%m%Y")

nameOutput = "Links" + stamp + ".html"
nameList = "Links" + stamp + "_list.txt"
print('Generating list ' + nameOutput + ' at ' + time_str + ' on ' + date_str)

dir_path = os.getcwd()

# Both output files are closed automatically when the with block ends
with open(nameOutput, 'w') as outputFile, open(nameList, 'w') as listFile:
  outputFile.write('<h1>List Generated at ' + time_str + ' on ' + date_str + '</h1>\n')

  for dirName, subDirs, files in os.walk(dir_path):
    outputFile.write('<h3>' + dirName + '</h3>\n')
    outputFile.write('<ol>\n')
    for name in files:
      if name.endswith(".url"):
        # The shortcut's file name (without .url) becomes the link text
        base, ext = os.path.splitext(name)
        # os.path.join builds the path instead of hard-coding '\\'
        with open(os.path.join(dirName, name)) as fh:
          for line in fh:
            if line.startswith("URL="):
              # Text after the first "URL=", without surrounding whitespace
              href = line.split("URL=", 1)[1].strip()
              outputFile.write('<li><a href="' + href + '" target="_blank">' + base + '</a></li>\n')
              listFile.write(href + '\n')
              break
    outputFile.write('</ol>\n')

When invoked, this version will go through the directory and all sub-directories to generate an html file of all the URLs it finds. It will also create a list of the URLs in a separate file.

eMail Archiving and Outlook

I have a number of email archives, in total about 5GB, stored as .pst files from Microsoft Outlook and going back to around 1998.

Without a copy of Outlook, the files have been pretty useless, just taking up space, but now I have an Office 365 subscription I can view and access them again.

A couple of points to note:

Using Outlook with your Microsoft account

My Microsoft account has 2-step verification enabled, so when I installed Outlook on my laptop, connecting to my Microsoft account required me to generate an app password to use in place of my normal account password. This is because the app can’t prompt you to enter a security code when you try to sign in.

If you try to use your normal password you just get repeated requests to enter your password.

  1. Go to the Security basics page (https://account.microsoft.com/security) and sign in to your Microsoft account.
  2. Select more security options.
  3. Under App passwords, select Create a new app password. A new app password is generated and appears on your screen.
  4. Enter this app password where you would enter your normal password.

Outlook blocked access to attachments

Some of the emails in the .pst files had attachments which Outlook refused to display due to security concerns.

I came across a method for enabling access to these attachments by making a change to the Windows Registry (Windows 7).

Only modify the Registry if you know what you are doing!! Serious problems might occur if you modify the registry incorrectly. Before you modify it, back up the registry for restoration in case problems occur.

Add a new string value Level1Remove to the registry key

HKEY_CURRENT_USER\Software\Microsoft\Office\16.0\Outlook\Security

where the value is the list of file extensions, separated by semicolons, e.g. .exe;.url

To block specific file types add a new string value Level1Add instead.
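The same change could be scripted with Python's standard winreg module. The sketch below is an assumption on my part, not something I ran as part of this exercise: the helper names are hypothetical, the key path is the one shown above, and the function that writes to the registry only works on Windows.

```python
OUTLOOK_SECURITY_KEY = r"Software\Microsoft\Office\16.0\Outlook\Security"

def level1_value(extensions):
    """Join file extensions into the semicolon-separated form, e.g. '.exe;.url'."""
    cleaned = []
    for ext in extensions:
        ext = ext.strip().lower()
        if not ext:
            continue
        if not ext.startswith("."):
            ext = "." + ext
        if ext not in cleaned:
            cleaned.append(ext)
    return ";".join(cleaned)

def set_level1_remove(extensions):
    """Write the Level1Remove string value under HKEY_CURRENT_USER (Windows only)."""
    import winreg  # standard library module, available on Windows only
    with winreg.CreateKey(winreg.HKEY_CURRENT_USER, OUTLOOK_SECURITY_KEY) as key:
        winreg.SetValueEx(key, "Level1Remove", 0, winreg.REG_SZ, level1_value(extensions))
```

The earlier warning still applies: back up the registry before calling set_level1_remove, and restart Outlook afterwards so that it picks up the new value.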


On a separate issue, I also came across an article on enabling the “Group Policy Editor” (gpedit.msc) in Windows 7 Home Edition – although the original article does include the following disclaimer.

DISCLAIMER: This tutorial has been shared for the sake of knowledge sharing. Patching system files or using 3rd party software might be dangerous for your computer. We do not recommend it and we’ll not be responsible if it harms your system.

https://www.askvg.com/how-to-enable-group-policy-editor-gpedit-msc-in-windows-7-home-premium-home-basic-and-starter-editions/

Internet Shortcuts to HTML file

While trawling around the web, I often save shortcuts to websites that I’ve visited.

Each shortcut is saved as a special filetype with the suffix .url and is recognised by Windows as an Internet Shortcut. The name of the file is a description of the link (usually) taken from the contents of the webpage title html tag and the contents of the file include the actual url.

For example,

openEyes – Eye tracking for the masses.url

has the following contents

[InternetShortcut]
URL=http://thirtysixthspan.com/openEyes/
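Because the contents follow the familiar INI layout, a .url file can also be read with Python's standard configparser module. The sketch below is only an illustration; the function name is my own, and it assumes the shortcut has an [InternetShortcut] section like the example above.

```python
import configparser

def read_shortcut_url(text):
    """Return the URL stored in the text of a .url file, or None if there isn't one."""
    # interpolation=None stops '%' characters in URLs being treated specially;
    # strict=False tolerates shortcuts that repeat a section or key
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(text)
    return parser.get("InternetShortcut", "URL", fallback=None)
```

Passing it the two lines shown above returns http://thirtysixthspan.com/openEyes/.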

I came across a technique for combining one or more .url files into a single .html file originally described on the mozillaZine website (http://mozillazine.org) in the forum post

Convert links: desktop shortcut to bookmark?

This describes a method using a DOS batch/cmd file to automate the process (I’d forgotten how powerful DOS batch/cmd files can be). Here is my modified version, saved into a file called url2html.cmd:

@echo off
setlocal enabledelayedexpansion

cd /d "%~dp1"
set output="LINKS_%RANDOM%.html"
echo ^<ol^> >> %output%
for /f "tokens=* delims=" %%t in ('dir /b "%~dp1*.url"') do (
    set strLine2=%%t
    type "%%t" | find "URL=" > u2htemp
    set /p strLine1= < u2htemp
    echo ^<li^>^<a href="!strLine1:~4!"^>!strLine2:~0,-4!^</a^>^</li^> >> %output%
)
echo ^<^/ol^> >> %output%

del u2htemp

This code can be run as follows from a command line prompt:

C:\mutils\url2html.cmd "c:\urls\openEyes - Eye tracking for the masses.url" 

and will add all the .url files in the same directory as the specified .url file to a single html file.

It is set up in this way so that it can be easily invoked from Windows File Explorer by right-clicking on a .url file and selecting the Send to option. This can be enabled as follows:

Open the Send to special folder (e.g. by using WindowsKey-R and typing shell:SendTo) and drag a shortcut to your newly created .cmd file there. The command url2html will then be added to the Send to option.

There are some interesting points to note regarding some of the commands used in this script.

setlocal enabledelayedexpansion

I found a description of this, with an example, at EnableDelayedExpansion

Delayed Expansion will cause variables within a batch file to be expanded at execution time rather than at parse time. This option is turned on with the SETLOCAL EnableDelayedExpansion command.

Variable expansion means replacing a variable (e.g. %windir%) with its value C:\WINDOWS

By default expansion will happen just once, before each line is executed. The delayed expansion is performed each time the line is executed, or for each loop in a FOR looping command. For simple commands this will make no noticeable difference, but with loop commands like FOR, compound or bracketed expressions, delayed expansion will allow you to always see the current value of the variable.

When delayed expansion is in effect, variables can be immediately read using !variable_name!. You can still read and use %variable_name%, but that will continue to show the initial value (expanded at the beginning of the line).

Another interesting concept is the handling/interpretation of arguments passed to the cmd file.

cd /d "%~dp1"

This is used to change to the directory containing the url file passed as an argument when executing the cmd file.
The modifier %~dp1 expands the first argument (the url filename) and extracts just its drive letter and path.

Details can be found from Microsoft at
Using batch parameters
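To illustrate what %~dp1 yields, here is a rough Python equivalent that uses the standard ntpath module (Windows path handling, usable on any platform). The function name is my own, and it only approximates the batch modifier: it assumes the argument is already a full path and does not resolve relative paths.

```python
import ntpath

def drive_and_path(full_path):
    """Approximate %~dp1: the drive letter plus the directory, with a trailing backslash."""
    drive, rest = ntpath.splitdrive(full_path)
    directory = ntpath.dirname(rest)
    if not directory.endswith("\\"):
        directory += "\\"
    return drive + directory
```

For the earlier example, "c:\urls\openEyes - Eye tracking for the masses.url", this gives c:\urls\, which is the directory that the cd /d command changes to.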

5.1 Surround Sound in VLC Player

I use a Cambridge Soundworks Desktop Theater 5.1 DTT2200 sound system with my PC (Windows 7).

Set up all the speakers and check that they are all working at the system level.

The instructions to set VLC media player to play through all speakers are as follows:

Open VLC preferences (Tools…Preferences) and under the “Show settings” section, select All.

Under the Audio section, expand the Output modules section and select DirectX (or Dolby Digital).

Pick the output device you’re using, which has been set up for 5.1 sound, and specify the Speaker configuration as 5.1.

Save the preferences and restart the VLC player. Audio will be played using all the speakers.

References

VLC media player

VLC is a free and open source cross-platform multimedia player and framework that plays most multimedia files as well as DVDs, Audio CDs, VCDs, and various streaming protocols.

http://www.videolan.org

DTT2200

User guide available

DTT2200 User Guide (pdf)

(from http://files2.europe.creative.com/manualdn/Manuals/TSD/727/English.pdf )