V2.5 #763

Open
wants to merge 59 commits into
base: master
59 commits
4611bbf
changed schema based on scrape type, simplified signing requests
fadytaher Oct 15, 2021
d2dd799
changed repo
fadytaher Oct 18, 2021
cb9d383
adding build folder
fadytaher Oct 18, 2021
4a9b531
added imageUrl
fadytaher Oct 18, 2021
0c1a5a4
added imageUrl
fadytaher Oct 18, 2021
8fda6da
fixed webvideo url
fadytaher Oct 18, 2021
0abbf04
handling status code
fadytaher Oct 19, 2021
e87513b
removing logs
fadytaher Oct 19, 2021
2421e41
version marker
fadytaher Oct 19, 2021
d0410b1
version marker
fadytaher Oct 19, 2021
1ae8e99
versionmarker4
fadytaher Oct 19, 2021
bc8a5e6
excluding empty items
fadytaher Oct 20, 2021
7f95bc9
removing empty objects
fadytaher Oct 20, 2021
cf1d2ea
add jar
fadytaher Nov 5, 2021
03ea4df
changed userInfo endpoint
fadytaher Nov 10, 2021
828d809
removed unrequired logging
fadytaher Nov 10, 2021
5ab86ce
version 3
fadytaher Nov 10, 2021
18629cf
added versioning message
fadytaher Nov 10, 2021
da05e47
Merge pull request #1 from fadytaher/marker.v3
fadytaher Nov 16, 2021
ee47012
using tag
fadytaher Nov 16, 2021
1526063
v2.0.2
fadytaher Nov 16, 2021
c9b9747
Merge pull request #2 from fadytaher/marker.v4
fadytaher Nov 16, 2021
fb7b113
throwing error if userProfile return empty UserInfo
fadytaher Nov 23, 2021
a4bee3e
throwing `user doesn't exist` if userInfo is empty
fadytaher Nov 23, 2021
d9aa5a8
tune error thrown if userProfile doesn't exist
fadytaher Nov 23, 2021
f9a5db2
marker v5, 2.0.4
fadytaher Nov 23, 2021
ada027c
Merge pull request #3 from fadytaher/marker.v5
fadytaher Nov 23, 2021
4ea3342
parse signedURL response
fadytaher Dec 2, 2021
c129bbf
Merge pull request #4 from fadytaher/marker.v6
fadytaher Dec 2, 2021
541e72c
v2.1.0
fadytaher Dec 2, 2021
e4b78de
Merge pull request #5 from fadytaher/v2.1.0
fadytaher Dec 2, 2021
78dece1
stopping finite looping on hasMore attribute
fadytaher Dec 15, 2021
48c33f8
Merge pull request #6 from fadytaher/v2.1.1
fadytaher Dec 15, 2021
de9a8c2
handling tiktok post short url
fadytaher Feb 15, 2022
7b591d3
Merge pull request #7 from fadytaher/v2.1.2
fadytaher Feb 15, 2022
ebc5079
centralize long/short url handling
fadytaher Feb 15, 2022
edaf36f
Merge pull request #8 from fadytaher/v2.1.3
fadytaher Feb 15, 2022
ca83409
firing 2 requests to handle AWS issue
fadytaher Feb 17, 2022
3fe2d29
Merge pull request #9 from fadytaher/v2.1.4
fadytaher Feb 17, 2022
908063f
adding infinite loop safeguard
fadytaher Feb 17, 2022
582e069
adding logs
fadytaher Feb 17, 2022
3d46b24
adding generated url in response items
fadytaher Feb 17, 2022
b4b45fb
build project
fadytaher Feb 17, 2022
6e2b513
adding longUrl to output
fadytaher Feb 17, 2022
6ed1c2e
Merge pull request #10 from fadytaher/v2.1.5
fadytaher Feb 17, 2022
05d1e8d
using head instead of get for getVideoLink
fadytaher Feb 17, 2022
36e08a9
Merge pull request #11 from fadytaher/v2.1.6
fadytaher Feb 17, 2022
5552a80
checking requests statuscode
fadytaher Apr 1, 2022
a999e47
Merge pull request #12 from fadytaher/v2.1.7
fadytaher Apr 1, 2022
107a1a9
checking for status codes and head request
fadytaher Apr 1, 2022
62559dd
Merge pull request #13 from fadytaher/v2.1.8
fadytaher Apr 1, 2022
dc38fd2
updated error messages
fadytaher Apr 1, 2022
35fd130
Merge pull request #14 from fadytaher/v2.2
fadytaher Apr 1, 2022
85ede7e
Getting user profile data from HTML content
fadytaher May 17, 2022
671385e
updated version
fadytaher May 17, 2022
2669c1d
Merge pull request #15 from fadytaher/v2.3
fadytaher May 17, 2022
b69ec64
version 2.4, adding proxy
fadytaher May 19, 2022
b8b7dd3
Merge pull request #16 from fadytaher/v2.4
fadytaher May 19, 2022
9415936
logging proxy
fadytaher May 19, 2022
1 change: 0 additions & 1 deletion .gitignore
@@ -1,6 +1,5 @@
node_modules/
reverse_eng/
build/
coverage/
garbage/

14 changes: 14 additions & 0 deletions build/constant/index.d.ts
@@ -0,0 +1,14 @@
declare const _default: {
    scrape: string[];
    chronologicalTypes: string[];
    history: string[];
    requiredSession: string[];
    sourceType: {
        user: number;
        music: number;
        trend: number;
    };
    verifyFp: () => never;
    userAgent: () => string;
};
export = _default;
49 changes: 49 additions & 0 deletions build/constant/index.js
@@ -0,0 +1,49 @@
"use strict";
module.exports = {
    scrape: [
        'user',
        'hashtag',
        'trend',
        'music',
        'discover_user',
        'discover_hashtag',
        'discover_music',
        'history',
        'video',
        'from-file',
        'userprofile',
    ],
    chronologicalTypes: ['user'],
    history: ['user', 'hashtag', 'trend', 'music'],
    requiredSession: ['user', 'hashtag', 'trend', 'music'],
    sourceType: {
        user: 8,
        music: 11,
        trend: 12,
    },
    verifyFp: () => {
        const variants = [];
        return variants[Math.floor(Math.random() * variants.length)];
    },
    userAgent: () => {
        const os = [
            'Macintosh; Intel Mac OS X 10_15_7',
            'Macintosh; Intel Mac OS X 10_15_5',
            'Macintosh; Intel Mac OS X 10_11_6',
            'Macintosh; Intel Mac OS X 10_6_6',
            'Macintosh; Intel Mac OS X 10_9_5',
            'Macintosh; Intel Mac OS X 10_10_5',
            'Macintosh; Intel Mac OS X 10_7_5',
            'Macintosh; Intel Mac OS X 10_11_3',
            'Macintosh; Intel Mac OS X 10_10_3',
            'Macintosh; Intel Mac OS X 10_6_8',
            'Macintosh; Intel Mac OS X 10_10_2',
            'Macintosh; Intel Mac OS X 10_10_3',
            'Macintosh; Intel Mac OS X 10_11_5',
            'Windows NT 10.0; Win64; x64',
            'Windows NT 10.0; WOW64',
            'Windows NT 10.0',
        ];
        return `Mozilla/5.0 (${os[Math.floor(Math.random() * os.length)]}) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/${Math.floor(Math.random() * 3) + 87}.0.${Math.floor(Math.random() * 190) + 4100}.${Math.floor(Math.random() * 50) + 140} Safari/537.36`;
    },
};
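Review note on `verifyFp` in the file above: `variants` is an empty array in this build, so the index expression evaluates to `variants[0]` and the function returns `undefined`. The `never` return type in `build/constant/index.d.ts` is consistent with that empty list. A minimal sketch of a guarded picker follows, assuming callers would prefer an explicit error over a silent `undefined`. `pickRandom` and its `label` parameter are hypothetical names and are not part of this PR.

```javascript
// Hypothetical helper (not part of this PR): pick a random element from a
// list, failing loudly when the list is empty instead of returning undefined.
function pickRandom(items, label = 'list') {
    if (!Array.isArray(items) || items.length === 0) {
        throw new Error(`Cannot pick a random value from an empty ${label}`);
    }
    // Math.random() is in [0, 1), so the index is always within bounds.
    return items[Math.floor(Math.random() * items.length)];
}

// Example use with a non-empty list of operating-system strings.
const os = ['Windows NT 10.0; Win64; x64', 'Windows NT 10.0'];
console.log(pickRandom(os, 'os list'));
```

Whether an empty `variants` list should raise or fall back to a caller-supplied value is a design decision for the maintainers; the sketch only illustrates the first option.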
21 changes: 21 additions & 0 deletions build/core/Downloader.d.ts
@@ -0,0 +1,21 @@
/// <reference types="node" />
import { CookieJar } from 'request';
import { MultipleBar } from '../helpers';
import { DownloaderConstructor, PostCollector, DownloadParams, Headers } from '../types';
export declare class Downloader {
    progress: boolean;
    mbars: MultipleBar;
    progressBar: any[];
    private proxy;
    noWaterMark: boolean;
    filepath: string;
    bulk: boolean;
    headers: Headers;
    cookieJar: CookieJar;
    constructor({ progress, proxy, noWaterMark, headers, filepath, bulk, cookieJar }: DownloaderConstructor);
    private get getProxy();
    addBar(type: boolean, len: number): any[];
    toBuffer(item: PostCollector): Promise<Buffer>;
    downloadPosts({ zip, folder, collector, fileName, asyncDownload }: DownloadParams): Promise<unknown>;
    downloadSingleVideo(post: PostCollector): Promise<void>;
}
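Review note on the private `getProxy` getter declared above: in `build/core/Downloader.js` it is a three-way branch covering an array of proxies, a `socks4://` or `socks5://` string, and any other string. The sketch below restates that decision as a standalone function, under several assumptions: `resolveProxy` is a hypothetical name, it returns the raw address instead of constructing a `SocksProxyAgent`, it treats a missing proxy as an empty string, and unlike the build output it also applies the SOCKS check to an entry picked from an array.

```javascript
// Hypothetical standalone version of the proxy decision (not the PR's code).
// Returns { socks, address }; address is '' when no proxy is configured.
function resolveProxy(proxy) {
    let address = '';
    if (Array.isArray(proxy)) {
        // Pick one entry at random; an empty array means "no proxy".
        address = proxy.length ? proxy[Math.floor(Math.random() * proxy.length)] : '';
    } else if (typeof proxy === 'string') {
        address = proxy;
    }
    // Assumption: only an explicit socks4:// or socks5:// scheme prefix counts as SOCKS.
    const socks = /^socks[45]:\/\//.test(address);
    return { socks, address };
}

console.log(resolveProxy('socks5://127.0.0.1:1080')); // socks is true
console.log(resolveProxy(['10.0.0.1:8080']));         // socks is false
```

A caller would still need to wrap a SOCKS address in an agent and an HTTP address in a `proxy` option, as the build output does.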
153 changes: 153 additions & 0 deletions build/core/Downloader.js
@@ -0,0 +1,153 @@
"use strict";
var __importDefault = (this && this.__importDefault) || function (mod) {
    return (mod && mod.__esModule) ? mod : { "default": mod };
};
Object.defineProperty(exports, "__esModule", { value: true });
const request_1 = __importDefault(require("request"));
const request_promise_1 = __importDefault(require("request-promise"));
const fs_1 = require("fs");
const bluebird_1 = require("bluebird");
const archiver_1 = __importDefault(require("archiver"));
const socks_proxy_agent_1 = require("socks-proxy-agent");
const async_1 = require("async");
const helpers_1 = require("../helpers");
class Downloader {
    constructor({ progress, proxy, noWaterMark, headers, filepath, bulk, cookieJar }) {
        this.progress = true || progress;
        this.progressBar = [];
        this.noWaterMark = noWaterMark;
        this.headers = headers;
        this.filepath = filepath;
        this.mbars = new helpers_1.MultipleBar();
        this.proxy = proxy;
        this.bulk = bulk;
        this.cookieJar = cookieJar;
    }
    get getProxy() {
        if (Array.isArray(this.proxy)) {
            const selectProxy = this.proxy.length ? this.proxy[Math.floor(Math.random() * this.proxy.length)] : '';
            return {
                socks: false,
                proxy: selectProxy,
            };
        }
        if (this.proxy.indexOf('socks4://') > -1 || this.proxy.indexOf('socks5://') > -1) {
            return {
                socks: true,
                proxy: new socks_proxy_agent_1.SocksProxyAgent(this.proxy),
            };
        }
        return {
            socks: false,
            proxy: this.proxy,
        };
    }
    addBar(type, len) {
        this.progressBar.push(this.mbars.newBar(`Downloading (${!type ? 'WITH WM' : 'WITHOUT WM'}) :id [:bar] :percent`, {
            complete: '=',
            incomplete: ' ',
            width: 30,
            total: len,
        }));
        return this.progressBar[this.progressBar.length - 1];
    }
    toBuffer(item) {
        return new Promise((resolve, reject) => {
            const proxy = this.getProxy;
            let r = request_1.default;
            let barIndex;
            let buffer = Buffer.from('');
            if (proxy.proxy && !proxy.socks) {
                r = request_1.default.defaults({ proxy: `http://${proxy.proxy}/` });
            }
            if (proxy.proxy && proxy.socks) {
                r = request_1.default.defaults({ agent: proxy.proxy });
            }
            r.get({
                url: item.videoUrlNoWaterMark ? item.videoUrlNoWaterMark : item.videoUrl,
                headers: this.headers,
                jar: this.cookieJar,
            })
                .on('response', response => {
                    const len = parseInt(response.headers['content-length'], 10);
                    if (this.progress && !this.bulk && len) {
                        barIndex = this.addBar(!!item.videoUrlNoWaterMark, len);
                    }
                    if (this.progress && !this.bulk && !len) {
                        console.log(`Empty response! You can try again with a proxy! Can't download video: ${item.id}`);
                    }
                })
                .on('data', chunk => {
                    if (chunk.length) {
                        buffer = Buffer.concat([buffer, chunk]);
                        if (this.progress && !this.bulk && barIndex && barIndex.hasOwnProperty('tick')) {
                            barIndex.tick(chunk.length, { id: item.id });
                        }
                    }
                })
                .on('end', () => {
                    resolve(buffer);
                })
                .on('error', () => {
                    reject(new Error(`Cant download video: ${item.id}. If you were using proxy, please try without it.`));
                });
        });
    }
    downloadPosts({ zip, folder, collector, fileName, asyncDownload }) {
        return new Promise((resolve, reject) => {
            const saveDestination = zip ? `${fileName}.zip` : folder;
            const archive = archiver_1.default('zip', {
                gzip: true,
                zlib: { level: 9 },
            });
            if (zip) {
                const output = fs_1.createWriteStream(saveDestination);
                archive.pipe(output);
            }
            async_1.forEachLimit(collector, asyncDownload, (item, cb) => {
                this.toBuffer(item)
                    .then(async (buffer) => {
                        if (buffer.length) {
                            item.downloaded = true;
                            if (zip) {
                                archive.append(buffer, { name: `${item.id}.mp4` });
                            }
                            else {
                                await bluebird_1.fromCallback(cback => fs_1.writeFile(`${saveDestination}/${item.id}.mp4`, buffer, cback));
                            }
                        }
                        else {
                            item.downloaded = false;
                        }
                        cb(null);
                    })
                    .catch(() => {
                        item.downloaded = false;
                        cb(null);
                    });
            }, error => {
                if (error) {
                    return reject(error);
                }
                if (zip) {
                    archive.finalize();
                    archive.on('end', () => resolve(''));
                }
                else {
                    resolve('');
                }
            });
        });
    }
    async downloadSingleVideo(post) {
        const proxy = this.getProxy;
        let url = post.videoUrlNoWaterMark;
        if (!url) {
            url = post.videoUrl;
        }
        const options = Object.assign(Object.assign({ uri: url, method: 'GET', jar: this.cookieJar, headers: this.headers, encoding: null }, (proxy.proxy && proxy.socks ? { agent: proxy.proxy } : {})), (proxy.proxy && !proxy.socks ? { proxy: `http://${proxy.proxy}/` } : {}));
        const result = await request_promise_1.default(options);
        await bluebird_1.fromCallback(cb => fs_1.writeFile(`${this.filepath}/${post.id}.mp4`, result, cb));
    }
}
exports.Downloader = Downloader;
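Review note on `toBuffer` above: every `data` event calls `Buffer.concat([buffer, chunk])`, which copies everything received so far each time a chunk arrives. A common alternative is to keep the chunks in an array and concatenate once at the end. The sketch below shows that pattern in isolation. `ChunkCollector` is a hypothetical helper that this PR does not define, and it assumes chunks arrive as `Buffer` instances.

```javascript
// Hypothetical helper (not part of this PR): accumulate chunks and
// concatenate them once, instead of re-copying on every 'data' event.
class ChunkCollector {
    constructor() {
        this.chunks = [];
        this.length = 0;
    }

    // Store a chunk; empty chunks are skipped, mirroring the length check in toBuffer.
    push(chunk) {
        if (chunk.length) {
            this.chunks.push(chunk);
            this.length += chunk.length;
        }
    }

    // Build the final Buffer in a single copy. Passing the total length
    // lets Buffer.concat skip recomputing it.
    toBuffer() {
        return Buffer.concat(this.chunks, this.length);
    }
}

const collector = new ChunkCollector();
collector.push(Buffer.from('ab'));
collector.push(Buffer.from('cd'));
console.log(collector.toBuffer().length);
```

In a stream handler this would be wired as `.on('data', chunk => collector.push(chunk))` and `.on('end', () => resolve(collector.toBuffer()))`. Whether the extra copying matters in practice depends on video sizes, which this page does not state.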
88 changes: 88 additions & 0 deletions build/core/TikTok.d.ts
@@ -0,0 +1,88 @@
/// <reference types="node" />
import { CookieJar } from 'request';
import { EventEmitter } from 'events';
import { PostCollector, TikTokConstructor, Result, MusicMetadata, UserMetadata, HashtagMetadata, Headers } from '../types';
import { Downloader } from '../core';
export declare class TikTokScraper extends EventEmitter {
    private mainHost;
    private userIdStore;
    private download;
    private filepath;
    private json2csvParser;
    private filetype;
    private input;
    private proxy;
    private strictSSL;
    private number;
    private since;
    private asyncDownload;
    private asyncScraping;
    private collector;
    private event;
    private scrapeType;
    private cli;
    private spinner;
    private byUserId;
    private storeHistory;
    private historyPath;
    private idStore;
    Downloader: Downloader;
    private storeValue;
    private maxCursor;
    private noWaterMark;
    private noDuplicates;
    private timeout;
    private bulk;
    private validHeaders;
    private csrf;
    private zip;
    private fileName;
    private test;
    private hdVideo;
    private webHookUrl;
    private method;
    private httpRequests;
    headers: Headers;
    private sessionList;
    private verifyFp;
    private store;
    cookieJar: CookieJar;
    constructor({ download, filepath, filetype, proxy, strictSSL, asyncDownload, cli, event, progress, input, number, since, type, by_user_id, store_history, historyPath, noWaterMark, useTestEndpoints, fileName, timeout, bulk, zip, test, hdVideo, webHookUrl, method, headers, verifyFp, sessionList, }: TikTokConstructor);
    private get fileDestination();
    private get folderDestination();
    private get getApiEndpoint();
    private get getProxy();
    private request;
    private returnInitError;
    scrape(): Promise<Result | any>;
    private withoutWatermark;
    private extractVideoId;
    private getUrlWithoutTheWatermark;
    private mainLoop;
    private submitScrapingRequest;
    private saveCollectorData;
    saveMetadata({ json, csv }: {
        json: any;
        csv: any;
    }): Promise<void>;
    private getDownloadedVideosFromHistory;
    private storeDownloadProgress;
    private mapItem;
    private collectPosts;
    private getValidHeaders;
    private scrapeData;
    private getTrendingFeedQuery;
    private getMusicFeedQuery;
    private getHashTagId;
    private getUserId;
    getUserProfileInfo(): Promise<UserMetadata>;
    getHashtagInfo(): Promise<HashtagMetadata>;
    getMusicInfo(): Promise<MusicMetadata>;
    signUrl(): Promise<any>;
    signGivenUrl(url: any): Promise<any>;
    private getVideoMetadataFromHtml;
    private getVideoLink;
    private getVideoMetadata;
    getVideoMeta(html?: boolean): Promise<PostCollector>;
    private sendDataToWebHookUrl;
}