javhd crawler: Python crawler code for learning

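Account settings

Open the file and set your account's cert on the second line.
This is the string that follows credentials in the Login URL of the "Welcome to JAVHD" email you receive after registering with javhd.
For example: secure.javhd.com/login/?credentials=[the string goes here]&lang=en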
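
As a minimal sketch of pulling that string out of the Login URL, the standard library's urllib.parse can read the credentials query parameter. The function name extract_credentials and the placeholder value EXAMPLE_STRING are illustrative assumptions and are not part of the original script.

```python
from urllib.parse import urlparse, parse_qs

def extract_credentials(login_url):
    """Return the value of the `credentials` query parameter, or None."""
    # Add a scheme so the host is parsed as a host rather than as part of the path
    if '://' not in login_url:
        login_url = 'https://' + login_url
    query = parse_qs(urlparse(login_url).query)
    values = query.get('credentials')
    return values[0] if values else None

# Placeholder value, for illustration only
print(extract_credentials('secure.javhd.com/login/?credentials=EXAMPLE_STRING&lang=en'))
```

Note that parse_qs decodes percent-escapes and turns + into a space, so if your string contains a + it is safer to copy it by hand.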

Video quality setting

The third line sets the resolution of the videos to crawl.
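
The script only accepts a source whose resolution equals this value exactly. As a hedged sketch of one alternative, the helper below falls back to the highest available resolution that does not exceed the requested one. The name pick_source and the sample data are hypothetical; only the 'res' and 'src' keys are taken from the script.

```python
def pick_source(sources, wanted):
    """Pick the entry with the highest 'res' that is not above `wanted`."""
    # Keep only entries at or below the requested resolution
    candidates = [s for s in sources if int(s['res']) <= wanted]
    if not candidates:
        return None
    return max(candidates, key=lambda s: int(s['res']))

# Hypothetical data shaped like the 'sources' list the script reads
sample = [
    {'res': '480', 'src': 'https://example.com/480.mp4'},
    {'res': '720', 'src': 'https://example.com/720.mp4'},
]
print(pick_source(sample, 1080))
```

Whether a fallback is desirable is a design choice: an exact match avoids silently saving lower-quality files, while a fallback avoids skipping videos entirely.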

Python crawler setup:

  • CentOS 6
  • Python 3 compiled and installed from source; this also installs pip3, which can be used directly
  • requests library: pip3 install requests

javhd video crawler code

import requests,re,threading,os
cert='YWsxM240YWprOHxhZ3RrZWRjaXdo'
quality=1080
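class spider:
    def __init__(self,sp):
        # sp: wanted resolution, e.g. 1080
        self.sp=sp
    def page(self,flag):
        # Build the URL of page number `flag` of the "just added" listing
        page_url='https://javhd.com/zh/japanese-porn-videos/justadded/all/'+str(flag)
        return page_url
    def req(self):
        # Log in with the credentials string and return the session that holds the cookies
        req=requests.Session()
        response=req.get('https://secure.javhd.com/login/index/direct?credentials='+cert+'&back=javhd.com&lang=zh', allow_redirects=False)
        req.get(response.headers['location'])
        return req
    def find_info(self,page_url):
        # Return a list of (video id, title) tuples scraped from a listing page
        req=requests.get(page_url)
        info=re.findall(r'clickitem="(.*?)".*?t ">\n(.*?)\n.*?</span>',str(req.text),re.M)
        return info
    def find_mp4(self,video_id,reqget):
        # Fetch the player JSON for one video through the logged-in session
        url='https://javhd.com/zh/player/'+str(video_id)+'?type=vjs'
        req=reqget.get(url)
        return req.json()
    def sources_mp4(self,player_info,reqget):
        # Return the redirect target of the source matching the wanted resolution, or None
        for i in player_info['sources']:
            if int(i['res'])==self.sp:
                w=reqget.get(i['src'],allow_redirects=False)
                return w.headers['location']
        return None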
def Handler(start, end, url, filename):
    # Download bytes start..end (inclusive) and write them at the same offset in the file
    headers = {'Range': 'bytes=%d-%d' % (start, end)}
    with requests.get(url, headers=headers,stream=True) as r:
        with open(filename+'.mp4', "r+b") as fp:
            fp.seek(start)
            fp.write(r.content)
def download(url,title, num_thread = 10):
    r = requests.head(url)
    try:
        file_name = title
        file_size = int(r.headers['content-length'])
    except (KeyError, ValueError):
        print("Check the URL, or the server does not support multithreaded download")
        return
    # Pre-allocate the file so each thread can write at its own offset
    fp = open(file_name+'.mp4', "wb")
    fp.truncate(file_size)
    fp.close()
    part = file_size // num_thread
    threads = []
    for i in range(num_thread):
        start = part * i
        # Range offsets are inclusive, so each part ends one byte before the next starts
        if i == num_thread - 1:
            end = file_size - 1
        else:
            end = start + part - 1
        t = threading.Thread(target=Handler, kwargs={'start': start, 'end': end, 'url': url, 'filename': file_name}, daemon=True)
        t.start()
        threads.append(t)

    # Wait for all download threads to finish
    for t in threads:
        t.join()
    print('%s download finished' % file_name)
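def run():
    s=spider(quality)
    reqget=s.req()
    flag=1
    while True:
        page=s.page(flag)
        info=s.find_info(page)
        # Stop when a listing page yields no videos
        if not info:
            break
        for i in info:
            title=i[1].strip()
            print(title)
            # Skip videos that have already been downloaded
            if os.path.exists(title+'.mp4'):
                continue
            mp4_dict=s.find_mp4(i[0],reqget)
            # Resolve the source URL once and reuse it
            mp4_url=s.sources_mp4(mp4_dict,reqget)
            if mp4_url is None:
                print('No source at %sp, skipped' % quality)
                continue
            print(mp4_url)
            download(mp4_url,title)
        flag+=1
if __name__=='__main__':
    run()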
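
When a file is split across threads, note that HTTP Range offsets are inclusive at both ends, so adjacent chunks should not share a boundary byte and the last chunk should end at file_size - 1. The arithmetic can be sketched on its own, without any network access. The helper name split_ranges is hypothetical, and the sketch assumes file_size is at least num_parts.

```python
def split_ranges(file_size, num_parts):
    """Return inclusive (start, end) byte offsets covering 0..file_size-1."""
    # Assumes file_size >= num_parts, so every part holds at least one byte
    part = file_size // num_parts
    ranges = []
    for i in range(num_parts):
        start = part * i
        # The last part absorbs the remainder of the integer division
        end = file_size - 1 if i == num_parts - 1 else start + part - 1
        ranges.append((start, end))
    return ranges

for start, end in split_ranges(1000, 3):
    print('Range: bytes=%d-%d' % (start, end))
```

Because the ranges are disjoint and contiguous, their lengths add up to exactly file_size, which is an easy property to check before starting any download.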