A wiki server backed by manually edited text files, with a selectable filter that converts them to HTML and a caching mechanism.
I've tried using purely web-based wikis like MediaWiki, but the database and server maintenance they involve has always ended up with me neglecting them. This time, I'm going to try something a bit different.
- Python 3 (3.3 or newer)
- pytz
- python-dateutil
- lxml
- chardet
- python-magic
- tornado
- An OS that supports the built-in fcntl module
- asciidoc in PATH (optional)
- markdown for Python (optional)
An example configuration:

```xml
<?xml version="1.0" ?>
<configuration>
<log-level>DEBUG</log-level><!-- Passed to logging module -->
<bind-address>127.0.0.1</bind-address><!-- OPTIONAL: address to bind to -->
<bind-port>8080</bind-port><!-- Port to bind to -->
<document-root>testdata/test_root</document-root><!-- Root of the directory tree containing files which will be processed and served -->
<preview-lines>5</preview-lines><!-- OPTIONAL: When performing a search, show this many lines from the source document -->
<worker-threads>4</worker-threads><!-- OPTIONAL: Number of all-purpose worker threads to spawn. DEFAULT: 1 -->
<runtime-vars>4</runtime-vars><!-- Storage for runtime variables separate from the cache -->
<cache dir="testdata/test_cache"><!-- dir=Root of cache directory -->
<checksum-function>sha1</checksum-function><!-- Checksum algorithm used on the files to be processed to determine cache state -->
<max-age>86400</max-age><!-- OPTIONAL: Whenever a scrub is performed, delete files that are older than this age (seconds) -->
<max-entries>2048</max-entries><!-- OPTIONAL: Use an LRU algorithm to limit the approximate maximum number of entries in the cache -->
<auto-scrub /><!-- OPTIONAL: When the LRU algorithm hits the maximum number of entries, automatically scrub the cache to clear up free slots -->
<dispatcher-thread /><!-- OPTIONAL: Use the DispatcherCache class instead, which will perform automatic scrubbing in a separate thread -->
<send-etags /><!-- OPTIONAL: Send Etags based on checksum algorithm -->
</cache>
<search-cache><!-- OPTIONAL: Simply having this element present enables cached searches -->
<max-age>3600</max-age><!-- OPTIONAL: Whenever a scrub is performed, delete files that are older than this age (seconds) -->
<max-entries>32</max-entries><!-- OPTIONAL: Use an LRU algorithm to limit the approximate maximum number of entries in the cache -->
<auto-scrub /><!-- OPTIONAL: When the LRU algorithm hits the maximum number of entries, automatically scrub the cache to clear up free slots -->
</search-cache>
<processors>
<encoding>utf8</encoding><!-- Output encoding passed to all the processors -->
<processor>asciidoc-xhtml11</processor><!-- OPTIONAL: Sets the default processor used to convert files to HTML -->
<!-- If no default processor is specified, the 'autoraw-nocache' processor is used -->
<processor extensions="txt foo">asciidoc-xhtml11</processor><!-- For the extensions txt and foo, use this processor to convert -->
<processor extensions="bar">asciidoc-html5</processor><!-- For the extensions bar, used asciidoc-html5 instead -->
</processors>
</configuration>
```
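As a rough sketch of how such a file might be consumed with the standard library's xml.etree.ElementTree (the helper and the returned keys are illustrative, not the server's actual API):

```python
import xml.etree.ElementTree as ET

def load_config(path):
    """Illustrative helper: read the XML configuration into a plain dict."""
    root = ET.parse(path).getroot()
    cache = root.find("cache")
    return {
        "log_level": root.findtext("log-level", "INFO"),
        "bind_address": root.findtext("bind-address", "127.0.0.1"),
        "bind_port": int(root.findtext("bind-port", "8080")),
        "document_root": root.findtext("document-root"),
        "cache_dir": None if cache is None else cache.get("dir"),
        "checksum": "sha1" if cache is None else cache.findtext("checksum-function", "sha1"),
        # Presence-only flags such as <send-etags /> just need an existence test.
        "send_etags": cache is not None and cache.find("send-etags") is not None,
    }
```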
Since Python's argparse module is used for parsing arguments, --help will print a usage summary:

```
usage: server.py [ options ] -c config.xml

optional arguments:
  -h, --help            show this help message and exit
  --config CONFIG.XML, -c CONFIG.XML
                        XML configuration file
  --scrub               Instead of running the server, just do a cache scrub
  --bind-address ADDRESS
                        Bind to ADDRESS instead of the address specified in
                        configuration
  --bind-port PORT      Bind to PORT instead of the port specified in
                        configuration
```
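An argparse setup along these lines would produce that help text; this is a sketch only, and the real server.py may differ in details:

```python
import argparse

parser = argparse.ArgumentParser(usage="server.py [ options ] -c config.xml")
parser.add_argument("--config", "-c", metavar="CONFIG.XML", required=True,
                    help="XML configuration file")
parser.add_argument("--scrub", action="store_true",
                    help="Instead of running the server, just do a cache scrub")
parser.add_argument("--bind-address", metavar="ADDRESS",
                    help="Bind to ADDRESS instead of the address specified in configuration")
parser.add_argument("--bind-port", metavar="PORT", type=int,
                    help="Bind to PORT instead of the port specified in configuration")
args = parser.parse_args()
```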
Design notes:

- File locking everywhere to keep things consistent within the server processes and threads.
- Raw source files will be used as the input, which can be modified whenever.
- A caching system with a directory tree that corresponds to the source (asciidoc, etc.) structure.
  - Web server processes will have a shared lock on a toplevel .lock file.
  - Other threads and processes will have an exclusive lock on a toplevel .lock file (see the locking sketch after this list).
- The caching system will have two methods of cleanup (see the scrub sketch below).
  - A time-to-live system with a configurable maximum age. The cleanup process will have to be scheduled in cron or something similar.
  - An optional LRU-based system that will delete the oldest entries when there are too many sitting around. A thread running in the background will have this task dispatched to it.
- The caching system will use file size, file modification time, and a configurable checksum to check for changes in source files (see the fingerprint sketch below).
- The actual filter will be configurable and replaceable, with asciidoc as both the initial and reference implementation (see the filter sketch below).
- Any source file revision control is left to the person managing the source directory tree.
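As a rough illustration of the locking scheme, here is a minimal sketch using the built-in fcntl module; the .lock path and the surrounding structure are assumptions for illustration, not the server's actual code:

```python
import fcntl

# Readers (web server processes) take a shared lock on the toplevel .lock
# file; maintenance threads and processes take an exclusive one instead
# (fcntl.LOCK_EX), which waits until all shared holders are done.
with open("testdata/test_cache/.lock", "a+") as lockfile:
    fcntl.flock(lockfile, fcntl.LOCK_SH)
    try:
        pass  # read from the cache; other readers may hold LOCK_SH concurrently
    finally:
        fcntl.flock(lockfile, fcntl.LOCK_UN)
```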
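A sketch of the two cleanup methods combined into one scrub pass, under the assumption that cache entries are plain files whose mtime doubles as a last-use timestamp; the function name and defaults mirror the example configuration but are otherwise hypothetical:

```python
import os
import time

def scrub(cache_dir, max_age=86400, max_entries=2048):
    """Illustrative scrub: a TTL pass followed by an LRU trim."""
    # Collect (mtime, path) for every cache entry, oldest first.
    entries = sorted(
        (os.path.getmtime(os.path.join(d, f)), os.path.join(d, f))
        for d, _subdirs, files in os.walk(cache_dir)
        for f in files
        if f != ".lock")
    now = time.time()
    # Time-to-live pass: delete anything older than max_age seconds.
    while entries and now - entries[0][0] > max_age:
        os.remove(entries.pop(0)[1])
    # LRU pass: trim the oldest entries until the count fits max_entries.
    while len(entries) > max_entries:
        os.remove(entries.pop(0)[1])
```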
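A sketch of that change check: an entry is considered stale when any of the three values differs from what was recorded at cache time. hashlib.new() accepts the algorithm name from <checksum-function>; the helper itself is illustrative:

```python
import hashlib
import os

def source_fingerprint(path, algorithm="sha1"):
    """Illustrative fingerprint: size, mtime, and a configurable checksum."""
    # Size and mtime are cheap early-out checks; the checksum catches
    # edits that leave both unchanged.
    st = os.stat(path)
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return (st.st_size, st.st_mtime, digest.hexdigest())
```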
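Since asciidoc runs as an external command found in PATH, the reference filter can be little more than a subprocess pipe. This wrapper is hypothetical, though -b (backend), -o - (write to stdout), and - (read stdin) are standard asciidoc options:

```python
import subprocess

def asciidoc_filter(source_bytes, backend="xhtml11"):
    """Illustrative filter: pipe the raw source through asciidoc."""
    proc = subprocess.Popen(
        ["asciidoc", "-b", backend, "-o", "-", "-"],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    html, _ = proc.communicate(source_bytes)
    if proc.returncode != 0:
        raise RuntimeError("asciidoc exited with status %d" % proc.returncode)
    return html
```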