mirror of
https://github.com/fish2018/pansou.git
synced 2026-05-06 21:51:31 +08:00
huban: fix search URL, refactor to HTML parsing, add cover image retrieval
133
plugin/huban/html结构分析.md
Normal file
@@ -0,0 +1,133 @@
# Huban HTML Data Structure Analysis

## Basic Information

- **Data source type**: HTML web pages
- **Search URL format**: `http://xsayang.fun:12512/index.php/vod/search/wd/{keyword}.html`
- **Detail URL format**: `http://xsayang.fun:12512/index.php/vod/detail/id/{resource ID}.html`
- **Data characteristics**: Pages of a video-on-demand (VOD) system that expose film and TV resource data as HTML
- **Special note**: Uses HTML parsing in place of the JSON API; the HTML structure is the same one used by the erxiao/zhizhen/muou plugins

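The two URL formats above can be sketched as small helpers. This is a minimal illustration, not the plugin's code: `baseURL`, `buildSearchURL`, and `buildDetailURL` are hypothetical names, and the use of `url.PathEscape` is an assumption made here because the keyword sits in the URL path (the plugin code in this commit calls `url.QueryEscape`, which encodes a space as `+` rather than `%20`).

```go
package main

import (
	"fmt"
	"net/url"
)

// baseURL is the host named in this document; treat it as configuration.
const baseURL = "http://xsayang.fun:12512"

// buildSearchURL returns the search page URL for a keyword.
// The keyword is escaped as a path segment (an assumption of this sketch).
func buildSearchURL(keyword string) string {
	return fmt.Sprintf("%s/index.php/vod/search/wd/%s.html", baseURL, url.PathEscape(keyword))
}

// buildDetailURL returns the detail page URL for a resource ID.
func buildDetailURL(id string) string {
	return fmt.Sprintf("%s/index.php/vod/detail/id/%s.html", baseURL, url.PathEscape(id))
}

func main() {
	fmt.Println(buildSearchURL("the matrix"))
	fmt.Println(buildDetailURL("12345"))
}
```

Whether the site expects `%20` or `+` for spaces is not stated in this document, so the escaping function may need to be swapped after testing against the real server.
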
## HTML Page Structure

### Search Results Page (`.module-search-item`)

The search results page contains multiple search items. Each item has the HTML structure below. The labels 分类 (category), 导演 (director), 主演 (cast), 年份 (year), and 剧情 (plot) are literal strings emitted by the site and are kept as-is because the parser matches on them:

```html
<div class="module-search-item">
  <div class="module-item-pic">
    <img data-src="https://..." />
  </div>
  <div class="module-item-text">
    <div class="video-info-header">
      <h3><a href="/index.php/vod/detail/id/12345.html">Movie title</a></h3>
      <span class="video-info-remarks">HD</span>
    </div>
    <div class="video-info-items">
      <div class="video-info-item">
        <span class="video-info-itemtitle">分类:</span>
        <span class="video-info-item">Action</span>
      </div>
      <div class="video-info-item">
        <span class="video-info-itemtitle">导演:</span>
        <span class="video-info-item">Director name</span>
      </div>
      <div class="video-info-item">
        <span class="video-info-itemtitle">主演:</span>
        <span class="video-info-item">Actor 1, Actor 2</span>
      </div>
      <div class="video-info-item">
        <span class="video-info-itemtitle">年份:</span>
        <span class="video-info-item">2024</span>
      </div>
      <div class="video-info-item">
        <span class="video-info-itemtitle">剧情:</span>
        <span class="video-info-item">This is a wonderful movie...</span>
      </div>
    </div>
  </div>
</div>
```

### Detail Page (`.mobile-play` and `#download-list`)

The detail page contains the poster image and the download links:

```html
<div class="mobile-play">
  <img class="lazyload" data-src="https://poster-url.jpg" />
</div>

<div id="download-list">
  <div class="module-row-one">
    <div class="module-row-text">
      <span data-clipboard-text="https://pan.quark.cn/s/xxxxx">Quark Drive</span>
    </div>
  </div>
  <div class="module-row-one">
    <div class="module-row-text">
      <span data-clipboard-text="https://pan.baidu.com/s/xxxxx?pwd=xxxx">Baidu Netdisk</span>
    </div>
  </div>
</div>
```

## CSS Selector Reference

### Search Result Extraction

- **Search result container**: `.module-search-item`
- **Title**: `.video-info-header h3 a` (text content)
- **Detail page link**: `.video-info-header h3 a` (`href` attribute)
- **Cover image**: `.module-item-pic > img` (`data-src` attribute)
- **Quality/status**: `.video-info-header .video-info-remarks` (text content)

### Detail Page Download Link Extraction

- **Poster image**: `.mobile-play .lazyload` (`data-src` attribute)
- **Download link container**: `#download-list .module-row-one`
- **Download link**: `[data-clipboard-text]` (`data-clipboard-text` attribute)

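Once the `href` of the detail page link has been read, the resource ID still has to be pulled out of it. The sketch below uses the same pattern as the `detailIDRegex` added in this commit (`/id/(\d+)`); the helper name `extractDetailID` and its two-value return are illustrative assumptions, and only the standard library is used.

```go
package main

import (
	"fmt"
	"regexp"
)

// detailIDRegex captures the numeric ID that follows "/id/" in a detail link.
var detailIDRegex = regexp.MustCompile(`/id/(\d+)`)

// extractDetailID returns the resource ID found in href, or false when the
// link does not contain one. The function name is hypothetical.
func extractDetailID(href string) (string, bool) {
	m := detailIDRegex.FindStringSubmatch(href)
	if len(m) < 2 {
		return "", false
	}
	return m[1], true
}

func main() {
	id, ok := extractDetailID("/index.php/vod/detail/id/12345.html")
	fmt.Println(id, ok)
}
```

Returning a boolean alongside the ID lets the caller skip items whose link has an unexpected shape instead of building a detail URL from an empty string.
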
## Supported Cloud Drive Types

- **Quark Drive**: `https://pan.quark.cn/s/{share code}`
- **Baidu Netdisk**: `https://pan.baidu.com/s/{share code}?pwd={password}`
- **Aliyun Drive**: `https://www.aliyundrive.com/s/{share code}`
- **Xunlei Drive**: `https://pan.xunlei.com/s/{share code}`
- **Tianyi Cloud**: `https://cloud.189.cn/t/{share code}`
- **UC Drive**: `https://drive.uc.cn/s/{share code}`
- **115 Drive**: `https://115.com/s/{share code}`
- **123 Drive**: `https://123pan.com/s/{share code}`
- **PikPak**: `https://mypikpak.com/s/{share code}`
- **China Mobile Cloud**: `https://caiyun.feixin.10086.cn/{share code}`
- **Magnet links**: `magnet:?xt=urn:btih:{hash}`
- **ED2K links**: `ed2k://|file|...`

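The URL shapes listed above are enough to classify a link by prefix. The following is a rough sketch under stated assumptions: `detectLinkType` and the table `prefixTypes` are hypothetical, the short type labels (for example `"mobile"`) are chosen for illustration, and the plugin itself uses per-drive regular expressions rather than this prefix table.

```go
package main

import (
	"fmt"
	"strings"
)

// prefixTypes maps the URL prefixes from the list above to short labels.
// The labels are illustrative and may differ from the plugin's own values.
var prefixTypes = []struct{ prefix, kind string }{
	{"https://pan.quark.cn/s/", "quark"},
	{"https://pan.baidu.com/s/", "baidu"},
	{"https://www.aliyundrive.com/s/", "aliyun"},
	{"https://pan.xunlei.com/s/", "xunlei"},
	{"https://cloud.189.cn/t/", "tianyi"},
	{"https://drive.uc.cn/s/", "uc"},
	{"https://115.com/s/", "115"},
	{"https://123pan.com/s/", "123"},
	{"https://mypikpak.com/s/", "pikpak"},
	{"https://caiyun.feixin.10086.cn/", "mobile"},
	{"magnet:?xt=urn:btih:", "magnet"},
	{"ed2k://", "ed2k"},
}

// detectLinkType returns the label for a link, or "" when it is unrecognized.
// Plain http links are treated like their https counterparts.
func detectLinkType(link string) string {
	link = strings.TrimSpace(link)
	if strings.HasPrefix(link, "http://") {
		link = "https://" + strings.TrimPrefix(link, "http://")
	}
	for _, pt := range prefixTypes {
		if strings.HasPrefix(link, pt.prefix) {
			return pt.kind
		}
	}
	return ""
}

func main() {
	fmt.Println(detectLinkType("https://pan.quark.cn/s/xxxxx"))
	fmt.Println(detectLinkType("https://pan.baidu.com/s/xxxxx?pwd=xxxx"))
}
```

A prefix table is easy to extend but stricter about host spelling than a regular expression; links on alternate hosts of the same provider would need their own entries.
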
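## Data Flow

### Search Flow

1. **Build the search URL**: `http://xsayang.fun:12512/index.php/vod/search/wd/{keyword}.html`
2. **Send the HTTP request**: fetch the search results page
3. **Parse the HTML**: parse the page with goquery
4. **Extract search items**: iterate over the `.module-search-item` elements
5. **Fetch details asynchronously**: request detail pages concurrently to obtain the download links
6. **Cache management**: cache detail-page results in a `sync.Map` with a TTL of 1 hour
7. **Keyword filtering**: filter out irrelevant results
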
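Step 6 names a `sync.Map` cache with a one-hour TTL. One way such a cache could enforce the TTL is to store a timestamp with each entry and check it on read, as sketched below. This is an assumption about the mechanism, not the plugin's implementation: `ttlCache`, `cacheEntry`, and `manualClock` are hypothetical types, and the cached value is simplified to a slice of link strings.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// cacheTTL mirrors the 1-hour TTL stated in this document.
const cacheTTL = time.Hour

// cacheEntry pairs a cached value with the time it was stored.
type cacheEntry struct {
	links    []string
	storedAt time.Time
}

// ttlCache is a sync.Map wrapper that treats entries older than ttl as missing.
type ttlCache struct {
	entries sync.Map
	ttl     time.Duration
	now     func() time.Time // injectable clock, handy for tests
}

func newTTLCache(ttl time.Duration, now func() time.Time) *ttlCache {
	return &ttlCache{ttl: ttl, now: now}
}

// Store saves links under key together with the current time.
func (c *ttlCache) Store(key string, links []string) {
	c.entries.Store(key, cacheEntry{links: links, storedAt: c.now()})
}

// Load returns the cached links, deleting and skipping expired entries.
func (c *ttlCache) Load(key string) ([]string, bool) {
	raw, ok := c.entries.Load(key)
	if !ok {
		return nil, false
	}
	entry := raw.(cacheEntry)
	if c.now().Sub(entry.storedAt) > c.ttl {
		c.entries.Delete(key)
		return nil, false
	}
	return entry.links, true
}

// manualClock is a hand-advanced clock used to demonstrate expiry.
type manualClock struct{ t time.Time }

func (m *manualClock) Now() time.Time          { return m.t }
func (m *manualClock) Advance(d time.Duration) { m.t = m.t.Add(d) }

func main() {
	cache := newTTLCache(cacheTTL, time.Now)
	cache.Store("12345", []string{"https://pan.quark.cn/s/xxxxx"})
	links, ok := cache.Load("12345")
	fmt.Println(links, ok)
}
```

Checking expiry lazily on `Load` avoids a background sweeper, at the cost of keeping stale entries in memory until they are next requested.
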
## Concurrency Control

- **Maximum concurrency**: 20 (`MaxConcurrency`)
- **Search timeout**: 8 seconds (`DefaultTimeout`)
- **Detail page timeout**: 6 seconds (`DetailTimeout`)
- **Cache TTL**: 1 hour (`cacheTTL`)

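The concurrency limit can be illustrated with a buffered channel used as a semaphore, which is the pattern `enhanceWithDetails` follows in this commit. The sketch below is self-contained and simplified: `runLimited` is a hypothetical helper, and the peak-concurrency tracking exists only to make the limit observable.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// maxConcurrency mirrors the MaxConcurrency value of 20 from this document.
const maxConcurrency = 20

// runLimited runs task(i) for i in [0, n) with at most limit tasks in flight
// and returns the highest number of tasks observed running at once.
func runLimited(n, limit int, task func(i int)) int64 {
	sem := make(chan struct{}, limit) // buffered channel acts as a semaphore
	var wg sync.WaitGroup
	var inFlight, peak int64

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot
			defer func() { <-sem }() // release the slot

			cur := atomic.AddInt64(&inFlight, 1)
			defer atomic.AddInt64(&inFlight, -1)
			// Record the peak with a compare-and-swap loop.
			for {
				old := atomic.LoadInt64(&peak)
				if cur <= old || atomic.CompareAndSwapInt64(&peak, old, cur) {
					break
				}
			}
			task(i)
		}(i)
	}
	wg.Wait()
	return atomic.LoadInt64(&peak)
}

func main() {
	results := make([]int, 50)
	peak := runLimited(len(results), maxConcurrency, func(i int) { results[i] = i * i })
	fmt.Println("peak concurrency:", peak, "last result:", results[49])
}
```

Each goroutine writes to its own slice index, so the results need no mutex; a shared `append` target, as in the plugin, does need one.
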
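## Performance Statistics

- **Search requests**: total number of search requests
- **Average search time**: average duration of a single search (milliseconds)
- **Detail page requests**: total number of detail-page requests
- **Average detail page time**: average duration of a single detail-page request (milliseconds)
- **Cache hits**: number of detail-page cache hits
- **Cache misses**: number of detail-page cache misses
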
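The averages above are derived from a request counter and a running total in nanoseconds, both updated atomically. The sketch below shows that arithmetic in isolation; `requestStats` and its methods are hypothetical names, whereas the plugin keeps the equivalent counters as package-level `int64` variables.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// requestStats accumulates a request count and total latency in nanoseconds.
type requestStats struct {
	requests   int64
	totalNanos int64
}

// record adds one request with the given duration; safe for concurrent use.
func (s *requestStats) record(d time.Duration) {
	atomic.AddInt64(&s.requests, 1)
	atomic.AddInt64(&s.totalNanos, d.Nanoseconds())
}

// avgMillis returns the mean latency in milliseconds, or 0 with no requests.
func (s *requestStats) avgMillis() float64 {
	n := atomic.LoadInt64(&s.requests)
	if n == 0 {
		return 0
	}
	return float64(atomic.LoadInt64(&s.totalNanos)) / float64(n) / 1e6
}

func main() {
	var search requestStats
	search.record(8 * time.Millisecond)
	search.record(12 * time.Millisecond)
	fmt.Printf("average search time: %.1f ms\n", search.avgMillis())
}
```

Guarding the division with a zero check matters because the statistics may be read before any request has completed.
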
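## Notes

1. **HTML parsing**: the goquery library is used to parse HTML
2. **Asynchronous detail fetching**: search results carry only basic information; the download links require an asynchronous request to the detail page
3. **Concurrency control**: a semaphore limits concurrency to 20
4. **Cache management**: detail-page results are cached in a `sync.Map` to avoid repeated requests
5. **Link validation**: invalid links are filtered out (for example, ones containing `javascript:` or `#`)
6. **Password extraction**: the `?pwd=` parameter is extracted from the URL as the password
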
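Notes 5 and 6 can be sketched with the standard library alone. The password pattern below is the `passwordRegex` that appears in the plugin source (`\?pwd=([0-9a-zA-Z]+)`). The validation helper `isCandidateLink` is a hypothetical stand-in: the body of the plugin's `isValidNetworkDriveURL` is not shown in this commit view, so the rule used here (reject empty values, links starting with `#`, and `javascript:` URLs) is an assumption.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// passwordRegex captures the value of a "?pwd=" query parameter.
var passwordRegex = regexp.MustCompile(`\?pwd=([0-9a-zA-Z]+)`)

// isCandidateLink rejects empty values, in-page anchors, and javascript: URLs.
// This is an assumed rule, not the plugin's actual validation logic.
func isCandidateLink(link string) bool {
	link = strings.TrimSpace(link)
	if link == "" || strings.HasPrefix(link, "#") {
		return false
	}
	return !strings.HasPrefix(strings.ToLower(link), "javascript:")
}

// extractPassword returns the share password embedded in link, or "".
func extractPassword(link string) string {
	m := passwordRegex.FindStringSubmatch(link)
	if len(m) < 2 {
		return ""
	}
	return m[1]
}

func main() {
	link := "https://pan.baidu.com/s/xxxxx?pwd=ab12"
	fmt.Println(isCandidateLink(link), extractPassword(link))
}
```

Because the pattern anchors on `?pwd=`, a password passed as a later parameter (`&pwd=`) would not be matched; that limitation is inherited from the pattern as written.
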
@@ -2,41 +2,58 @@ package huban

import (
    "fmt"
    "io"
    "net/http"
    "net/url"
    "regexp"
    "strings"
    "time"
    "context"
    "sync"
    "sync/atomic"

    "github.com/PuerkitoBio/goquery"
    "pansou/model"
    "pansou/plugin"
    "pansou/util/json"
)

const (
    // Default timeouts - tuned to shorter values
    // Default timeouts
    DefaultTimeout = 8 * time.Second
    DetailTimeout = 6 * time.Second

    // HTTP connection pool configuration
    MaxIdleConns = 200
    MaxIdleConnsPerHost = 50
    MaxConnsPerHost = 100
    IdleConnTimeout = 90 * time.Second

    // Concurrency control
    MaxConcurrency = 20

    // Cache TTL
    cacheTTL = 1 * time.Hour

    // Request-origin control - enabled by default for better security
    EnableRefererCheck = false

    // Debug log switch
    DebugLog = false
)

// Performance statistics (atomic operations)
var (
    searchRequests int64 = 0
    totalSearchTime int64 = 0 // nanoseconds
    searchRequests int64 = 0
    totalSearchTime int64 = 0 // nanoseconds
    detailPageRequests int64 = 0
    totalDetailTime int64 = 0 // nanoseconds
    cacheHits int64 = 0
    cacheMisses int64 = 0
)

// Detail page cache
var (
    detailCache sync.Map
    cacheMutex sync.RWMutex
)

// Request-origin control configuration
@@ -59,6 +76,9 @@ var (
    // Regular expressions for password extraction
    passwordRegex = regexp.MustCompile(`\?pwd=([0-9a-zA-Z]+)`)
    password115Regex = regexp.MustCompile(`password=([0-9a-zA-Z]+)`)

    // Regular expression for extracting the detail-page ID
    detailIDRegex = regexp.MustCompile(`/id/(\d+)`)

    // Regular expressions for common cloud-drive links (16 types supported)
    quarkLinkRegex = regexp.MustCompile(`https?://pan\.quark\.cn/s/[0-9a-zA-Z]+`)
@@ -149,7 +169,7 @@ func (p *HubanAsyncPlugin) SearchWithResult(keyword string, ext map[string]inter
    return p.AsyncSearchWithResult(keyword, p.searchImpl, p.MainCacheKey, ext)
}

// searchImpl implements the search (dual-domain support)
// searchImpl implements the search - HTML-parsing version
func (p *HubanAsyncPlugin) searchImpl(client *http.Client, keyword string, ext map[string]interface{}) ([]model.SearchResult, error) {
    // Performance statistics
    start := time.Now()
@@ -164,259 +184,296 @@ func (p *HubanAsyncPlugin) searchImpl(client *http.Client, keyword string, ext m
        client = p.optimizedClient
    }

    // Define two domains - primary/backup mode
    urls := []string{
        fmt.Sprintf("http://xsayang.fun:12512/api.php/provide/vod?ac=detail&wd=%s", url.QueryEscape(keyword)),
        fmt.Sprintf("http://103.45.162.207:20720/api.php/provide/vod?ac=detail&wd=%s", url.QueryEscape(keyword)),
    }

    // Primary/backup mode: prefer the first domain and switch to the second on failure
    for i, searchURL := range urls {
        if results, err := p.tryRequest(searchURL, client); err == nil {
            return results, nil
        } else if i == 0 {
            // The first domain failed; log it but keep trying the second
            // fmt.Printf("[%s] 域名1失败,尝试域名2: %v\n", p.Name(), err)
        }
    }

    return nil, fmt.Errorf("[%s] 所有域名都请求失败", p.Name())
}
    // 1. Build the search URL
    searchURL := fmt.Sprintf("http://103.45.162.207:20720/index.php/vod/search/wd/%s.html", url.QueryEscape(keyword))

// tryRequest tries a request against a single domain
func (p *HubanAsyncPlugin) tryRequest(searchURL string, client *http.Client) ([]model.SearchResult, error) {
    // Create the HTTP request
    // 2. Create a context with a timeout
    ctx, cancel := context.WithTimeout(context.Background(), DefaultTimeout)
    defer cancel()

    // 3. Create the request
    req, err := http.NewRequestWithContext(ctx, "GET", searchURL, nil)
    if err != nil {
        return nil, fmt.Errorf("创建搜索请求失败: %w", err)
        return nil, fmt.Errorf("[%s] 创建请求失败: %w", p.Name(), err)
    }

    // Set the request headers

    // 4. Set the request headers
    req.Header.Set("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36")
    req.Header.Set("Accept", "application/json, text/plain, */*")
    req.Header.Set("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8")
    req.Header.Set("Accept-Language", "zh-CN,zh;q=0.9,en;q=0.8")
    req.Header.Set("Connection", "keep-alive")
    req.Header.Set("Cache-Control", "no-cache")

    // Send the request
    req.Header.Set("Referer", "http://103.45.162.207:20720/")

    // 5. Send the request
    resp, err := p.doRequestWithRetry(req, client)
    if err != nil {
        return nil, fmt.Errorf("搜索请求失败: %w", err)
        return nil, fmt.Errorf("[%s] 搜索请求失败: %w", p.Name(), err)
    }
    defer resp.Body.Close()

    // Parse the JSON response
    body, _ := io.ReadAll(resp.Body)

    var apiResponse HubanAPIResponse
    if err := json.Unmarshal(body, &apiResponse); err != nil {
        return nil, fmt.Errorf("解析JSON响应失败: %w", err)

    if resp.StatusCode != 200 {
        return nil, fmt.Errorf("[%s] 搜索请求返回状态码: %d", p.Name(), resp.StatusCode)
    }

    // Check the API response status
    if apiResponse.Code != 1 {
        return nil, fmt.Errorf("API返回错误: %s", apiResponse.Msg)

    // 6. Parse the search results page
    doc, err := goquery.NewDocumentFromReader(resp.Body)
    if err != nil {
        return nil, fmt.Errorf("[%s] 解析搜索页面失败: %w", p.Name(), err)
    }

    // Parse the search results

    // 7. Extract the search results
    var results []model.SearchResult
    for _, item := range apiResponse.List {
        if result := p.parseAPIItem(item); result.Title != "" {

    doc.Find(".module-search-item").Each(func(i int, s *goquery.Selection) {
        result := p.parseSearchItem(s, keyword)
        if result.UniqueID != "" {
            results = append(results, result)
        }
    })

    // 8. Fetch detail-page information asynchronously
    enhancedResults := p.enhanceWithDetails(client, results)

    // 9. Filter by keyword
    return plugin.FilterResultsByKeyword(enhancedResults, keyword), nil
}

// parseSearchItem parses a single search result item
func (p *HubanAsyncPlugin) parseSearchItem(s *goquery.Selection, keyword string) model.SearchResult {
    result := model.SearchResult{}

    // Extract the detail-page link and ID
    detailLink, exists := s.Find(".video-info-header h3 a").First().Attr("href")
    if !exists {
        return result
    }

    return results, nil
}

// HubanAPIResponse is the API response structure
type HubanAPIResponse struct {
    Code int `json:"code"`
    Msg string `json:"msg"`
    Page int `json:"page"`
    PageCount int `json:"pagecount"`
    Limit interface{} `json:"limit"` // may be a string or a number
    Total int `json:"total"`
    List []HubanAPIItem `json:"list"`
}
    // Extract the ID
    matches := detailIDRegex.FindStringSubmatch(detailLink)
    if len(matches) < 2 {
        return result
    }
    itemID := matches[1]

// HubanAPIItem is an API data item
type HubanAPIItem struct {
    VodID int `json:"vod_id"`
    VodName string `json:"vod_name"`
    VodActor string `json:"vod_actor"`
    VodDirector string `json:"vod_director"`
    VodDownFrom string `json:"vod_down_from"`
    VodDownURL string `json:"vod_down_url"`
    VodRemarks string `json:"vod_remarks"`
    VodPubdate string `json:"vod_pubdate"`
    VodArea string `json:"vod_area"`
    VodLang string `json:"vod_lang"`
    VodYear string `json:"vod_year"`
    VodContent string `json:"vod_content"`
    VodBlurb string `json:"vod_blurb"`
    VodPic string `json:"vod_pic"`
}

// parseAPIItem parses an API data item
func (p *HubanAsyncPlugin) parseAPIItem(item HubanAPIItem) model.SearchResult {
    // Build the unique ID
    uniqueID := fmt.Sprintf("%s-%d", p.Name(), item.VodID)

    // Build the title
    title := strings.TrimSpace(item.VodName)
    uniqueID := fmt.Sprintf("%s-%s", p.Name(), itemID)

    // Extract the title
    title := strings.TrimSpace(s.Find(".video-info-header h3 a").First().Text())
    if title == "" {
        return model.SearchResult{}
        return result
    }

    // Build the description (the data needs cleaning)
    content := p.buildContent(item)

    // Parse the download links (huban-specific format)
    links := p.parseHubanLinks(item.VodDownFrom, item.VodDownURL)

    // Extract the category
    category := strings.TrimSpace(s.Find(".video-info-items").First().Find(".video-info-item").First().Text())

    // Extract the director
    directorElement := s.Find(".video-info-items").FilterFunction(func(i int, item *goquery.Selection) bool {
        title := strings.TrimSpace(item.Find(".video-info-itemtitle").Text())
        return strings.Contains(title, "导演")
    })
    director := strings.TrimSpace(directorElement.Find(".video-info-item").Text())

    // Extract the cast
    actorElement := s.Find(".video-info-items").FilterFunction(func(i int, item *goquery.Selection) bool {
        title := strings.TrimSpace(item.Find(".video-info-itemtitle").Text())
        return strings.Contains(title, "主演")
    })
    actor := strings.TrimSpace(actorElement.Find(".video-info-item").Text())

    // Extract the year
    year := strings.TrimSpace(s.Find(".video-info-items").Last().Find(".video-info-item").First().Text())

    // Extract the quality/status
    quality := strings.TrimSpace(s.Find(".video-info-header .video-info-remarks").Text())

    // Extract the plot summary
    plotElement := s.Find(".video-info-items").FilterFunction(func(i int, item *goquery.Selection) bool {
        title := strings.TrimSpace(item.Find(".video-info-itemtitle").Text())
        return strings.Contains(title, "剧情")
    })
    plot := strings.TrimSpace(plotElement.Find(".video-info-item").Text())

    // Extract the cover image
    coverImage, _ := s.Find(".module-item-pic > img").Attr("data-src")

    // Build the content description
    var contentParts []string
    if category != "" {
        contentParts = append(contentParts, fmt.Sprintf("分类: %s", category))
    }
    if director != "" {
        contentParts = append(contentParts, fmt.Sprintf("导演: %s", director))
    }
    if actor != "" {
        contentParts = append(contentParts, fmt.Sprintf("主演: %s", actor))
    }
    if quality != "" {
        contentParts = append(contentParts, fmt.Sprintf("质量: %s", quality))
    }
    if plot != "" {
        contentParts = append(contentParts, fmt.Sprintf("剧情: %s", plot))
    }

    // Build the tags
    var tags []string
    if item.VodYear != "" {
        tags = append(tags, item.VodYear)
    if year != "" {
        tags = append(tags, year)
    }
    // area is usually empty, so it is not added

    // Build the image array
    var images []string
    if coverImage != "" {
        images = append(images, coverImage)
    }

    return model.SearchResult{
        UniqueID: uniqueID,
        Title: title,
        Content: content,
        Links: links,
        Content: strings.Join(contentParts, " | "),
        Images: images,
        Tags: tags,
        Channel: "", // Channel is empty for plugin search results
        Datetime: time.Time{}, // use the zero value rather than nil, following the jikepan plugin convention
        Channel: "",
        Datetime: time.Time{},
    }
}

// buildContent builds the content description (stripping special characters)
func (p *HubanAsyncPlugin) buildContent(item HubanAPIItem) string {
    var contentParts []string

    // Clean the cast field (strip leading and trailing commas)
    if item.VodActor != "" {
        actor := strings.Trim(item.VodActor, ",")
        actor = strings.TrimSpace(actor)
        if actor != "" {
            contentParts = append(contentParts, fmt.Sprintf("主演: %s", actor))
        }
    }

    // Clean the director field (strip leading and trailing commas)
    if item.VodDirector != "" {
        director := strings.Trim(item.VodDirector, ",")
        director = strings.TrimSpace(director)
        if director != "" {
            contentParts = append(contentParts, fmt.Sprintf("导演: %s", director))
        }
    }

    if item.VodYear != "" {
        contentParts = append(contentParts, fmt.Sprintf("年份: %s", item.VodYear))
    }

    if item.VodRemarks != "" {
        contentParts = append(contentParts, fmt.Sprintf("状态: %s", item.VodRemarks))
    }

    return strings.Join(contentParts, " | ")
}
// enhanceWithDetails fetches detail-page information asynchronously
func (p *HubanAsyncPlugin) enhanceWithDetails(client *http.Client, results []model.SearchResult) []model.SearchResult {
    var enhancedResults []model.SearchResult
    var wg sync.WaitGroup
    var mu sync.Mutex

// parseHubanLinks parses links in the huban-specific format
func (p *HubanAsyncPlugin) parseHubanLinks(vodDownFrom, vodDownURL string) []model.Link {
    if vodDownFrom == "" || vodDownURL == "" {
        return nil
    }

    // Split the drive types on $$$
    fromParts := strings.Split(vodDownFrom, "$$$")
    urlParts := strings.Split(vodDownURL, "$$$")

    var links []model.Link
    minLen := len(fromParts)
    if len(urlParts) < minLen {
        minLen = len(urlParts)
    }

    for i := 0; i < minLen; i++ {
        linkType := p.mapHubanCloudType(fromParts[i])
        if linkType == "" {
            continue
        }

        // Parse the multiple links of a single drive type
        // Format: "source$link1#title1$link2#title2#"
        urlSection := urlParts[i]

        // Strip the source prefix (e.g. "小虎斑$")
        if strings.Contains(urlSection, "$") {
            urlSection = urlSection[strings.Index(urlSection, "$")+1:]
        }

        // Split multiple links on #
        linkParts := strings.Split(urlSection, "#")
        for j := 0; j < len(linkParts); j++ {
            linkURL := strings.TrimSpace(linkParts[j])

            // Skip empty links and titles (titles are usually not in link format)
            if linkURL == "" || !p.isValidNetworkDriveURL(linkURL) {
                continue
    // Create a semaphore to limit concurrency
    semaphore := make(chan struct{}, MaxConcurrency)

    for _, result := range results {
        wg.Add(1)
        go func(result model.SearchResult) {
            defer wg.Done()
            semaphore <- struct{}{} // acquire the semaphore
            defer func() { <-semaphore }() // release the semaphore

            // Extract the itemID from the UniqueID
            parts := strings.Split(result.UniqueID, "-")
            if len(parts) < 2 {
                mu.Lock()
                enhancedResults = append(enhancedResults, result)
                mu.Unlock()
                return
            }

            // Extract the password
            password := p.extractPassword(linkURL)

            links = append(links, model.Link{
                Type: linkType,
                URL: linkURL,
                Password: password,
            })
        }
            itemID := parts[1]

            // Check the cache
            if cached, ok := detailCache.Load(itemID); ok {
                atomic.AddInt64(&cacheHits, 1)
                r := cached.(model.SearchResult)
                mu.Lock()
                enhancedResults = append(enhancedResults, r)
                mu.Unlock()
                return
            }

            atomic.AddInt64(&cacheMisses, 1)

            // Fetch the detail-page links and images
            detailLinks, detailImages := p.fetchDetailLinksAndImages(client, itemID)
            result.Links = detailLinks

            // Merge images: prefer the detail-page poster; otherwise keep the search-result image
            if len(detailImages) > 0 {
                result.Images = detailImages
            }

            // Cache the result
            detailCache.Store(itemID, result)

            mu.Lock()
            enhancedResults = append(enhancedResults, result)
            mu.Unlock()
        }(result)
    }

    // Deduplicate (duplicate links may exist)
    return p.deduplicateLinks(links)

    wg.Wait()
    return enhancedResults
}

// mapHubanCloudType maps huban-specific cloud-drive identifiers
func (p *HubanAsyncPlugin) mapHubanCloudType(apiType string) string {
    switch strings.ToUpper(apiType) {
    case "UCWP":
        return "uc"
    case "KKWP":
        return "quark"
    case "ALWP":
        return "aliyun"
    case "BDWP":
        return "baidu"
    case "123WP":
        return "123"
    case "115WP":
        return "115"
    case "TYWP":
        return "tianyi"
    case "XYWP":
        return "xunlei"
    case "WYWP":
        return "weiyun"
    case "LZWP":
        return "lanzou"
    case "JGYWP":
        return "jianguoyun"
    case "PKWP":
        return "pikpak"
    default:
        return ""
// fetchDetailLinksAndImages fetches the download links and images from the detail page
func (p *HubanAsyncPlugin) fetchDetailLinksAndImages(client *http.Client, itemID string) ([]model.Link, []string) {
    // Performance statistics
    start := time.Now()
    atomic.AddInt64(&detailPageRequests, 1)
    defer func() {
        duration := time.Since(start).Nanoseconds()
        atomic.AddInt64(&totalDetailTime, duration)
    }()

    detailURL := fmt.Sprintf("http://103.45.162.207:20720/index.php/vod/detail/id/%s.html", itemID)

    // Create a context with a timeout
    ctx, cancel := context.WithTimeout(context.Background(), DetailTimeout)
    defer cancel()

    // Create the request
    req, err := http.NewRequestWithContext(ctx, "GET", detailURL, nil)
    if err != nil {
        return nil, nil
    }

    // Set the request headers
    req.Header.Set("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36")
    req.Header.Set("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8")
    req.Header.Set("Accept-Language", "zh-CN,zh;q=0.9,en;q=0.8")
    req.Header.Set("Connection", "keep-alive")
    req.Header.Set("Referer", "http://103.45.162.207:20720/")

    // Send the request (with retries)
    resp, err := p.doRequestWithRetry(req, client)
    if err != nil {
        return nil, nil
    }
    defer resp.Body.Close()

    if resp.StatusCode != 200 {
        return nil, nil
    }

    doc, err := goquery.NewDocumentFromReader(resp.Body)
    if err != nil {
        return nil, nil
    }

    var links []model.Link
    var images []string

    // Extract the poster image from the detail page
    if posterURL, exists := doc.Find(".mobile-play .lazyload").Attr("data-src"); exists && posterURL != "" {
        images = append(images, posterURL)
    }

    // Locate the download-link section
    doc.Find("#download-list .module-row-one").Each(func(i int, s *goquery.Selection) {
        // Extract the link from the data-clipboard-text attribute
        if linkURL, exists := s.Find("[data-clipboard-text]").Attr("data-clipboard-text"); exists {
            // Filter out invalid links
            if p.isValidNetworkDriveURL(linkURL) {
                if linkType := p.determineLinkType(linkURL); linkType != "" {
                    link := model.Link{
                        Type: linkType,
                        URL: linkURL,
                        Password: "", // most drives do not need a password
                    }
                    links = append(links, link)
                }
            }
        }
    })

    return links, images
}

// isValidNetworkDriveURL checks whether the URL is a valid cloud-drive link
func (p *HubanAsyncPlugin) isValidNetworkDriveURL(url string) bool {
    // Filter out obviously invalid links
@@ -497,27 +554,11 @@ func (p *HubanAsyncPlugin) extractPassword(url string) string {
    return ""
}

// deduplicateLinks removes duplicate links
func (p *HubanAsyncPlugin) deduplicateLinks(links []model.Link) []model.Link {
    seen := make(map[string]bool)
    var result []model.Link

    for _, link := range links {
        key := fmt.Sprintf("%s-%s", link.Type, link.URL)
        if !seen[key] {
            seen[key] = true
            result = append(result, link)
        }
    }

    return result
}

// doRequestWithRetry performs an HTTP request with retries (retry strategy tuned for the JSON API)
// doRequestWithRetry performs an HTTP request with retries
func (p *HubanAsyncPlugin) doRequestWithRetry(req *http.Request, client *http.Client) (*http.Response, error) {
    maxRetries := 2 // fewer retries for the JSON API
    maxRetries := 2
    var lastErr error

    for i := 0; i < maxRetries; i++ {
        resp, err := client.Do(req)
        if err == nil {
@@ -529,13 +570,13 @@ func (p *HubanAsyncPlugin) doRequestWithRetry(req *http.Request, client *http.Cl
        } else {
            lastErr = err
        }

        // Fast retry for the JSON API: wait only briefly

        // Fast retry: wait only briefly
        if i < maxRetries-1 {
            time.Sleep(100 * time.Millisecond) // changed from seconds to 100 ms
            time.Sleep(100 * time.Millisecond)
        }
    }

    return nil, fmt.Errorf("[%s] 请求失败,重试%d次后仍失败: %w", p.Name(), maxRetries, lastErr)
}

@@ -543,16 +584,30 @@ func (p *HubanAsyncPlugin) doRequestWithRetry(req *http.Request, client *http.Cl
func (p *HubanAsyncPlugin) GetPerformanceStats() map[string]interface{} {
    totalRequests := atomic.LoadInt64(&searchRequests)
    totalTime := atomic.LoadInt64(&totalSearchTime)

    detailRequests := atomic.LoadInt64(&detailPageRequests)
    detailTime := atomic.LoadInt64(&totalDetailTime)
    hits := atomic.LoadInt64(&cacheHits)
    misses := atomic.LoadInt64(&cacheMisses)

    var avgTime float64
    if totalRequests > 0 {
        avgTime = float64(totalTime) / float64(totalRequests) / 1e6 // convert to milliseconds
    }

    var avgDetailTime float64
    if detailRequests > 0 {
        avgDetailTime = float64(detailTime) / float64(detailRequests) / 1e6 // convert to milliseconds
    }

    return map[string]interface{}{
        "search_requests": totalRequests,
        "avg_search_time_ms": avgTime,
        "search_requests": totalRequests,
        "avg_search_time_ms": avgTime,
        "total_search_time_ns": totalTime,
        "detail_page_requests": detailRequests,
        "avg_detail_time_ms": avgDetailTime,
        "total_detail_time_ns": detailTime,
        "cache_hits": hits,
        "cache_misses": misses,
    }
}