Commit

Remove outdated items and junk to prep for release
pmyteh committed Jan 5, 2016
1 parent e894d41 commit c4e1bcf
Showing 9 changed files with 36 additions and 176 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -12,3 +12,4 @@ dton-warcs
*.pyc
*.csv
*.tsv
.~lock*
6 changes: 3 additions & 3 deletions README.md
@@ -3,8 +3,6 @@ handclassifier
A quick-and-dirty python GUI for facilitating hand-classifying text and
web content into arbitrary categories.

This is still rudimentary and the API should not be considered stable.

The basic framework is to use a tkinter gui window to present the possible
classes for each document, with the document itself presented in another
window:
@@ -19,7 +17,9 @@ window:
from a MongoDB instance

This code is largely by Tom Nicholls, based upon earlier work by Jonathan
Bright.
Bright. Some example scripts are provided, together with a related piece of
code which classifies pairs of content against each other; this is earlier and
very rough, but may prove interesting.

Copyright 2013-2015, Tom Nicholls and Jonathan Bright
contact: [email protected]
65 changes: 0 additions & 65 deletions article_list_create.py

This file was deleted.

34 changes: 32 additions & 2 deletions darlington_classifier.py
@@ -11,8 +11,38 @@
# This can be installed with 'pip install warctools'. Beware that there are
# several old versions floating around under different names in the index.
from hanzo.warctools import WarcRecord
from warcresponseparse import *

from hanzo.httptools import RequestMessage, ResponseMessage

#####
#UTILITY FUNCTIONS
#####
def parse_http_response(record):
    """Parses the payload of an HTTP 'response' record, returning code,
    content type and body.
    Adapted from github's internetarchive/warctools hanzo/warcfilter.py,
    commit 1850f328e31e505569126b4739cec62ffa444223. MIT licenced."""
    message = ResponseMessage(RequestMessage())
    remainder = message.feed(record.content[1])
    message.close()
    if remainder or not message.complete():
        if remainder:
            print 'trailing data in http response for', record.url
        if not message.complete():
            print 'truncated http response for', record.url
    header = message.header

    mime_type = [v for k,v in header.headers if k.lower() == b'content-type']
    if mime_type:
        mime_type = mime_type[0].split(b';')[0]
    else:
        mime_type = None

    return header.code, mime_type, message.get_body()
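
The content-type handling in `parse_http_response` can be tried out without a WARC file. What follows is only a minimal sketch, assuming the headers arrive as a list of `(name, value)` byte-string pairs like `header.headers` above. The helper name `extract_mime_type` is hypothetical and is not part of warctools or this commit. Unlike the original, the sketch also strips surrounding whitespace from the result.

```python
def extract_mime_type(headers):
    """Return the bare MIME type from (name, value) header pairs, or None.

    Hypothetical helper mirroring the content-type logic in
    parse_http_response; `headers` is assumed to hold byte strings.
    """
    # Header names are case-insensitive, so compare in lower case.
    values = [v for k, v in headers if k.lower() == b'content-type']
    if not values:
        return None
    # Drop parameters such as "; charset=utf-8", then trim whitespace
    # (the trim is an addition that the original code does not perform).
    return values[0].split(b';')[0].strip()


# Illustrative headers, not taken from a real capture.
example_headers = [
    (b'Server', b'nginx'),
    (b'Content-Type', b'text/html; charset=utf-8'),
]
mime = extract_mime_type(example_headers)
```

Keeping this logic in a small pure function would make it possible to check the parameter-stripping behaviour separately from the WARC parsing.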

#####
#MAIN
#####
categories = ("1 - Information transmission",
"2 - Electronic service delivery",
"3 - Participation and collaboration",
5 changes: 0 additions & 5 deletions govUK_classifier.py
@@ -48,11 +48,6 @@
# the Wayback classfier as it's fetched through the Wayback index.
# Not sending it through here as the second part of the tuple
# saves a good deal of memory.
# TODO: Could make this a FilePart or similar to vastly
# reduce the memory load if this is a problem.
# TODO: Could change interface to pass the mimetype - maybe
# make it easier to send to an appropriate program, or to name
# the file correctly when it's sent to a web browser?
content.append((row[0],None))

# Shuffle content so it's not in alphabetical order for classifying
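
The pattern described in the comments above (store only the identifier, leave the payload slot as `None`, then shuffle) might look roughly like the sketch below. The `rows` list and the fixed seed are illustrative assumptions and are not taken from `govUK_classifier.py`.

```python
import random

# Hypothetical rows, standing in for lines read from a CSV of URLs.
rows = [
    ["http://example.org/a"],
    ["http://example.org/b"],
    ["http://example.org/c"],
]

content = []
for row in rows:
    # Keep only the identifier; the payload slot stays None so the
    # document body is fetched later rather than held in memory.
    content.append((row[0], None))

# Shuffle in place so items are not presented in input order.
# A seeded Random instance is used only to make the sketch repeatable.
random.Random(0).shuffle(content)
```

In real use the seed would probably be dropped, so that each classification session sees a different order.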
2 changes: 0 additions & 2 deletions handclassifier/handclassifier.py
@@ -1,8 +1,6 @@
"""A quick-and-dirty python GUI for facilitating hand-classifying text and
web content into arbitrary categories.
This is still rudimentary and the API should not be considered stable.
The basic framework is to use a tkinter gui window to present the possible
classes for each document, with the document itself presented in another
window:
66 changes: 0 additions & 66 deletions news_classifier.py

This file was deleted.

33 changes: 0 additions & 33 deletions warcresponseparse.py

This file was deleted.

Binary file removed warcresponseparse.pyc
Binary file not shown.
