v2
alexlitel committed Sep 26, 2017
1 parent 8506a80 commit 3f8328a
Showing 19 changed files with 1,678 additions and 1,421 deletions.
19 changes: 5 additions & 14 deletions README.md
@@ -7,31 +7,22 @@ This repo houses the backend portion of a project collecting the daily tweets of
Licensed under [MIT](http://www.opensource.org/licenses/mit-license.php)

## How it works
This project is designed to run on a service like Heroku, interfacing with the Twitter API at a set interval to make sure tweets are captured. The app culls data from a Twitter list following all the relevant congressional accounts, the most anonymous way of following a Twitter account. If you follow this strategy (designed to minimize the chances of being blocked), I recommend using an undetectable private Twitter list in combination with either a private Twitter account or a burner account you never use. This app does not presently initialize the list or automate the following process, though I might create some version of the latter in the future.
This project is designed to run on a service like Heroku, interfacing with the Twitter API at a set interval to make sure tweets are captured. The app culls data from a Twitter list following all the relevant congressional accounts, the most anonymous way of following a Twitter account. If you follow this strategy (designed to minimize the chances of being blocked), I recommend using an undetectable private Twitter list in combination with either a private Twitter account or a burner account you never use. To collect tweets, the app iterates through Twitter search queries.

To track tweets and a few other data points, the app uses a small Redis store that contains some stringified data that gets parsed when the app runs. To reduce unwieldiness, the app transforms the received Twitter tweet data into much smaller objects with a few properties like text, screen name, date, and id. It includes both retweets and full text of quoted tweets. At the end of the day (EST), the app empties out the previous day's tweet data into JSON dumps of tweets (generated using data from `data/users.json`, stored on the Redis store).
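
As a rough illustration of the slimming and stringifying described above, here is a minimal sketch. The helper names (`slimTweet`, `serializeTweets`, `unserializeTweets`) and the output property names are assumptions made for this example, not the project's actual code. The input field names follow Twitter's v1.1 tweet payload, and the sketch ignores retweets and quoted tweets.

```javascript
// Hypothetical helper: keep only a handful of fields from a full tweet payload.
// Input field names (id_str, created_at, full_text/text, user.screen_name)
// follow Twitter's v1.1 tweet object; the output shape is an assumption.
const slimTweet = tweet => ({
  id: tweet.id_str,
  screen_name: tweet.user.screen_name,
  date: tweet.created_at,
  text: tweet.full_text || tweet.text,
})

// Stringify the slimmed tweets for storage and parse them back on the next run.
const serializeTweets = tweets => JSON.stringify(tweets.map(slimTweet))
const unserializeTweets = str => JSON.parse(str)

// A made-up payload with extra fields that the slimmed version drops.
const raw = {
  id_str: '912345678901234567',
  created_at: 'Tue Sep 26 14:03:00 +0000 2017',
  full_text: 'Example tweet text',
  user: { screen_name: 'example_member', followers_count: 1000 },
  entities: { hashtags: [], urls: [] },
}

const stored = serializeTweets([raw])
const restored = unserializeTweets(stored)
console.log(restored[0].screen_name)
```

Keeping the stored value a plain JSON string means it can sit in a single Redis key or hash field and be parsed in one step when the app starts.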

The app uses the Github API to commit JSON data (and a small MD file/Jekyll post for some frontend/RSS stuff) to the frontend repo. I have set up and recommend a secondary account so you do not have 30 extra commits at the end of the month on your page.
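
The commit step can be pictured as a blob, tree, commit, ref sequence. The sketch below is an approximation under stated assumptions: `commitFiles` and `createStubClient` are hypothetical names, the client is a hand-written stub rather than the real `github` package, and only the method names (`getShaOfCommitRef`, `createBlob`, `getTree`, `createTree`, `createCommit`, `updateReference`) are taken from the `src/github.js` changes in this commit.

```javascript
// Sketch of the blob -> tree -> commit -> ref sequence.
// `client` is any object exposing the method names used in src/github.js.
const commitFiles = async (client, repo, files, message) => {
  // 1. Find the SHA of the current head commit.
  const headSha = (await client.repos
    .getShaOfCommitRef({ ...repo, ref: 'heads/master' })).data.sha

  // 2. Create one base64-encoded blob per file.
  const blobs = await Promise.all(files.map(async (file) => {
    const content = Buffer.from(file.content).toString('base64')
    const { sha } = (await client.gitdata
      .createBlob({ ...repo, content, encoding: 'base64' })).data
    return { path: file.path, mode: '100644', type: 'blob', sha }
  }))

  // 3. Merge the new blobs into the existing tree (new paths win).
  const { tree } = (await client.gitdata
    .getTree({ ...repo, sha: headSha, recursive: true })).data
  const merged = tree
    .filter(item => !blobs.some(blob => blob.path === item.path))
    .concat(blobs)
  const treeSha = (await client.gitdata.createTree({ ...repo, tree: merged })).data.sha

  // 4. Create a commit pointing at the new tree, with the old head as parent.
  const commitSha = (await client.gitdata.createCommit({
    ...repo, message, tree: treeSha, parents: [headSha],
  })).data.sha

  // 5. Move the branch reference to the new commit.
  await client.gitdata.updateReference({
    ...repo, ref: 'heads/master', sha: commitSha, force: true,
  })
  return commitSha
}

// Hypothetical stub that records call order instead of talking to GitHub.
const createStubClient = (calls) => {
  const reply = (name, data) => async () => { calls.push(name); return { data } }
  return {
    repos: { getShaOfCommitRef: reply('getShaOfCommitRef', { sha: 'head-sha' }) },
    gitdata: {
      createBlob: reply('createBlob', { sha: 'blob-sha' }),
      getTree: reply('getTree', {
        tree: [{ path: 'README.md', mode: '100644', type: 'blob', sha: 'old-sha' }],
      }),
      createTree: reply('createTree', { sha: 'tree-sha' }),
      createCommit: reply('createCommit', { sha: 'commit-sha' }),
      updateReference: reply('updateReference', {}),
    },
  }
}
```

Calling `commitFiles(createStubClient([]), { owner: 'someone', repo: 'site' }, files, 'Add tweets')` resolves to the stubbed commit SHA. Swapping in a real API client would presumably need authentication and error handling that this sketch leaves out.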

*Note: By default, the app does not collect replies to tweets from users outside the list, though there is a configurable option to do so.*

### Maintenance
In addition to the app collating tweets from a list, there is a highly customizable maintenance process that allows for the easy updating and organization of user datasets and the Twitter list and Redis store powering the project. The maintenance process checks the local user datasets against the Twitter list, and current legislator and social media datasets from [@unitedstates/congress-legislators](https://github.com/unitedstates/congress-legislators) to look for outdated information, and if there is any outdated info, will update the datasets accordingly. Server-side or with a local Redis store, this process checks for reactivated and deactivated accounts, and deletes any accounts from the current user dataset that were deactivated long enough ago (more than 30 days) for Twitter to delete the account from its servers.

In addition to maintaining the datasets, the process handles store and list initialization, and post-build updates of the store. Depending on the environment and configuration, the maintenance process can update files and/or store, and commit the updated datasets to Github with a message and body.

The maintenance process' behavior can be modified based on the options described in the section below.


### App/Maintenance options
There are a number of options that you can pass to the app and maintenance processes to customize their behavior. The options are passed as flags when running the `update` or `main` files (e.g. `node lib/update --exampleFlag=foo` or `babel-node src/update.js --exampleFlag=foo`, etc).
The maintenance process' behavior can be modified based on the options described below.

##### App
* **`collect-replies`**: allows app to collect replies to tweets from accounts from users outside the list.

*aliases: `c`, `cr`, `collect`, `collectreplies`*
##### Options
There are a number of options that you can pass to the maintenance process to customize its behavior. The options are passed as flags when running the `update` file (e.g. `node lib/update --exampleFlag=foo` or `babel-node src/update.js --exampleFlag=foo`, depending on environment).

##### Maintenance
* **`format-only`**: in local environment, simply sorts the dataset files tidily.

*aliases: `format`, `ff`, `formatfiles`, `formatonly`, `fo`, `fmt`*
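
To make the flag-and-alias convention concrete, here is a small sketch of how `--name=value` flags and aliases could be normalized. It is not the project's actual parser: `ALIASES`, `canonicalName` and `parseFlags` are hypothetical names, and only the `format-only` alias list is copied from the README. Values containing `=` are not handled.

```javascript
// Alias list for `format-only` copied from the README; the rest is hypothetical.
const ALIASES = {
  'format-only': ['format', 'ff', 'formatfiles', 'formatonly', 'fo', 'fmt'],
}

// Map an alias to its canonical flag name; unknown names pass through as-is.
const canonicalName = name =>
  Object.keys(ALIASES)
    .find(flag => flag === name || ALIASES[flag].includes(name)) || name

// Turn ['--fmt', '--exampleFlag=foo'] into an options object.
// A flag without a value is treated as boolean true.
const parseFlags = argv => argv
  .filter(arg => arg.startsWith('-'))
  .reduce((opts, arg) => {
    const [rawName, value] = arg.replace(/^-+/, '').split('=')
    return { ...opts, [canonicalName(rawName)]: value === undefined ? true : value }
  }, {})

console.log(parseFlags(['--fmt', '--exampleFlag=foo']))
```

In a real entry point the argument list would presumably come from `process.argv.slice(2)`.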
@@ -109,7 +100,7 @@ You'll need the following environmental variables set in a `.env` file in the di
**Optional variables**: There's a `TZ` variable for helping the `moment-timezone` module operate, but that defaults to `America/New_York` in its absence and isn't needed. For the self-updating maintenance process, there's a `SELF_REPO` variable for the quasi-recursive updates. Make sure you have Github repos set up for deployment, otherwise those parts of the app may fail. Deploying the app will automatically run unit tests and linters.

## Testing
To test the app, simply run `yarn test` to lint and run Jest tests and other fun stuff. For v1, I created a mock API to handle the requests to Github content, Github's API and Twitter's API, and added a variety of utilities.
To test the app, simply run `yarn test` to lint and run Jest tests and other fun stuff. As of v1, there's a mock API to handle the requests to Github content, Github's API and Twitter's API, along with a variety of utilities.

## Issues, etc.
If you come across any issues, don't hesitate to file an issue in this repo, make a pull request or [send an email](mailto:alexlitelATgmailDOTcom).
15 changes: 8 additions & 7 deletions package.json
@@ -21,10 +21,11 @@
"testRegex": "test/.+.test.js"
},
"dependencies": {
"babel-core": "^6.26.0",
"bluebird": "^3.5.0",
"dotenv": "^4.0.0",
"ent": "^2.2.0",
"flat": "^2.0.1",
"flat": "^4.0.0",
"github": "^11.0.0",
"lodash": "^4.17.4",
"moment": "^2.18.1",
@@ -39,22 +40,22 @@
},
"devDependencies": {
"babel-cli": "^6.24.1",
"babel-jest": "^20.0.3",
"babel-jest": "^21.0.2",
"babel-plugin-syntax-class-properties": "^6.13.0",
"babel-plugin-transform-class-properties": "^6.24.1",
"babel-plugin-transform-object-rest-spread": "^6.23.0",
"babel-polyfill": "^6.23.0",
"babel-preset-env": "^1.5.2",
"coveralls": "^2.13.1",
"cross-env": "^5.0.1",
"eslint": "^3.19.0",
"eslint-config-airbnb-base": "^11.2.0",
"eslint": "^4.7.2",
"eslint-config-airbnb-base": "^12.0.0",
"eslint-plugin-import": "^2.3.0",
"husky": "^0.13.4",
"jest": "^20.0.4",
"husky": "^0.14.3",
"jest": "^21.1.0",
"nock": "^9.0.14",
"nodemon": "^1.11.0",
"redis-mock": "^0.17.0",
"redis-mock": "^0.20.0",
"rimraf": "^2.6.1"
},
"engines": {
46 changes: 19 additions & 27 deletions src/app.js
@@ -1,16 +1,16 @@
import _ from 'lodash'
import {
TwitterHelper,
TwitterHelper,
} from './twitter'
import GithubHelper from './github'
import {
configureMaintenance,
configureMaintenance,
} from './maintenance'
import {
createTimeObj,
getTime,
serializeObj,
unserializeObj,
createTimeObj,
getTime,
serializeObj,
unserializeObj,
} from './util'


@@ -23,45 +23,37 @@ export class App {
try {
const isActive = !!await this.redisClient.existsAsync('app')
const data = isActive ?
unserializeObj(await this.redisClient.hgetallAsync('app'))
: await this.init()
unserializeObj(await this.redisClient.hgetallAsync('app'))
: await this.init()

data.time = _.chain(data)
.pick(['initDate', 'lastRun', 'lastUpdate'])
.mapValues(v => _.isNil(v) ? null : getTime(v))
.thru(timeProps => createTimeObj(timeProps))
.value()
.pick(['initDate', 'lastRun', 'lastUpdate'])
.mapValues(v => _.isNil(v) ? null : getTime(v))
.thru(timeProps => createTimeObj(timeProps))
.value()

if (!data.lastRun) {
data.lastRun = getTime().startOf('day').format()
}

const twitterClient = new TwitterHelper(this.config.TWITTER_CONFIG, this.config.LIST_ID)
const twitterData = await twitterClient.run(data)

const newData = {}

if (twitterData.sinceId !== undefined && twitterData.sinceId !== 'undefined') {
if (twitterData.sinceId && twitterData.sinceId !== undefined && twitterData.sinceId !== 'undefined') {
newData.sinceId = twitterData.sinceId
}

if (data.time.yesterdayDate || twitterData.tweets.length > 0) {
newData.tweets = await _.uniqBy(data.time.yesterdayDate ?
twitterData.tweets.today :
data.tweets.concat(twitterData.tweets), 'id')
twitterData.tweets.today :
data.tweets.concat(twitterData.tweets), 'id')
}

if (data.time.yesterdayDate) {
newData.collectSince = newData.sinceId

if (this.options.collectReplies) {
data.ids = {}
data.ids.all = data.accounts.map(account => account.id)
data.ids.toCheck = (await twitterClient
.getActiveUsers(data.time))
.map(account => account.id_str)

twitterData.tweets.yesterday = (await twitterClient
.run(data, { collectReplies: true })).tweets
.concat(twitterData.tweets.yesterday)
}

data.tweets = await _.uniqBy(data.tweets.concat(twitterData.tweets.yesterday), 'id')

await new GithubHelper(this.config.GITHUB_TOKEN, this.config.GITHUB_CONFIG).run(data)
6 changes: 5 additions & 1 deletion src/config.js
@@ -1,3 +1,4 @@
/* eslint-disable */
export const TWITTER_CONFIG = {
access_token: process.env.ACCESS_TOKEN,
access_token_secret: process.env.ACCESS_TOKEN_SECRET,
@@ -6,8 +7,8 @@ export const TWITTER_CONFIG = {
}

export const TIME_ZONE = process.env.TZ || 'America/New_York'
export const LIST_ID = process.env.LIST_ID
export const REDIS_URL = process.env.REDIS_URL || 'redis://localhost:6379'
export const LIST_ID = process.env.LIST_ID
export const GITHUB_TOKEN = process.env.GITHUB_TOKEN
export const GITHUB_USER = process.env.GITHUB_USER
export const SITE_REPO = process.env.SITE_REPO
@@ -25,3 +26,6 @@ export const APP_CONFIG = {
LIST_ID,
GITHUB_TOKEN,
}


/* eslint-enable */
100 changes: 49 additions & 51 deletions src/github.js
@@ -3,11 +3,10 @@ import unionBy from 'lodash/unionBy'
import toPairs from 'lodash/toPairs'
import bluebird from 'bluebird'
import {
BuildMd,
BuildMd,
} from './helpers'

export default class GithubHelper {

checkValidity() {
if (!this.token) throw new Error('Missing Github auth token')
else if (!this.config) throw new Error('Missing Github user and repo')
@@ -24,32 +23,31 @@ export default class GithubHelper {

if (recursive) {
promises = await toPairs(data.toWrite).map(pair =>
[pair[0].replace(/_/g, '-'), JSON.stringify(pair[1])],
)
[pair[0].replace(/_/g, '-'), JSON.stringify(pair[1])])
} else {
promises = [await JSON.stringify(data.tweets),
await BuildMd.generateMeta(data.time.yesterdayDate),
]
}

return bluebird.map(promises, async (item, i) => {
const buffer = await new Buffer(recursive ? item[1] : item).toString('base64')
const buffer = await Buffer.from(recursive ? item[1] : item).toString('base64')
const promiseData = (await this
.client
.gitdata
.createBlob({
...this.config,
content: buffer,
encoding: 'base64',
})).data
.client
.gitdata
.createBlob({
...this.config,
content: buffer,
encoding: 'base64',
})).data

let blobPath
if (recursive) {
blobPath = `data/${item[0]}.json`
} else {
blobPath = i === 0
? `data/${data.time.yesterdayDate}.json`
: `_posts/${data.time.yesterdayDate}--tweets.md`
? `data/${data.time.yesterdayDate}.json`
: `_posts/${data.time.yesterdayDate}--tweets.md`
}

return Object.assign(promiseData, {
@@ -68,12 +66,12 @@ export default class GithubHelper {
try {
await this.checkValidity()
return (await this.client
.repos
.getShaOfCommitRef({
...this.config,
ref: 'heads/master',
...opts,
})).data.sha
.repos
.getShaOfCommitRef({
...this.config,
ref: 'heads/master',
...opts,
})).data.sha
} catch (e) {
return Promise.reject(e)
}
@@ -83,13 +81,13 @@ export default class GithubHelper {
async getTree(time, sha, blobs, recursive) {
try {
await this.checkValidity()
const tree = (await this.client
.gitdata
.getTree({
...this.config,
sha,
recursive: true,
})).data.tree
const { tree } = (await this.client
.gitdata
.getTree({
...this.config,
sha,
recursive: true,
})).data
if (!recursive) {
if (time.deleteDate) {
return tree.filter(item => !item.path.includes(time.deleteDate)).concat(blobs)
@@ -106,11 +104,11 @@ export default class GithubHelper {
try {
await this.checkValidity()
return (await this.client
.gitdata
.createTree({
...this.config,
tree,
})).data.sha
.gitdata
.createTree({
...this.config,
tree,
})).data.sha
} catch (e) {
return Promise.reject(e)
}
@@ -122,13 +120,13 @@ export default class GithubHelper {

const parents = typeof prevCommitSha === 'object' ? prevCommitSha : [prevCommitSha]
return (await this.client
.gitdata
.createCommit({
...this.config,
message: message || `Add tweets for ${time.yesterdayDate}`,
tree: treeSha,
parents,
})).data.sha
.gitdata
.createCommit({
...this.config,
message: message || `Add tweets for ${time.yesterdayDate}`,
tree: treeSha,
parents,
})).data.sha
} catch (e) {
return Promise.reject(e)
}
@@ -139,13 +137,13 @@ export default class GithubHelper {
try {
await this.checkValidity()
return this.client
.gitdata
.updateReference({
...this.config,
ref: 'heads/master',
sha,
force: true,
})
.gitdata
.updateReference({
...this.config,
ref: 'heads/master',
sha,
force: true,
})
} catch (e) {
return Promise.reject(e)
}
@@ -163,11 +161,11 @@ export default class GithubHelper {
const headSha = await this.getLatestCommitSha()

await this
.createBlobs(data, recursive)
.then(blobs => this.getTree(data.time, headSha, blobs, recursive))
.then(tree => this.createTree(tree))
.then(createdTree => this.createCommit(createdTree, data.time, headSha, message))
.then(commit => this.updateReference(commit))
.createBlobs(data, recursive)
.then(blobs => this.getTree(data.time, headSha, blobs, recursive))
.then(tree => this.createTree(tree))
.then(createdTree => this.createCommit(createdTree, data.time, headSha, message))
.then(commit => this.updateReference(commit))

return {
success: true,