Add plugin kkmao

This commit is contained in:
www.xueximeng.com
2025-11-26 18:49:45 +08:00
parent b01da5847e
commit 5bd5e26805
3 changed files with 557 additions and 0 deletions


@@ -0,0 +1,159 @@
# kkmao (夸克猫) HTML Structure Analysis
## Site Information
- **Site name**: 夸克猫资源 (Kuakemao Resources)
- **Domain**: `www.kuakemao.com`
- **Type**: Quark netdisk movie/TV resource sharing site (WordPress theme site)
- **Characteristics**: each post provides 1 to N Quark netdisk links; the body structure is highly uniform and contains only Quark netdisk links
## Search Page Structure
### 1. Search Entry Point
```
https://www.kuakemao.com/?s={keyword}
Example:
https://www.kuakemao.com/?s=物
```
- Raw UTF-8 Chinese and URL-encoded keywords both work
- The page is a standard WordPress search results page
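Building the search URL from a raw keyword can be sketched with the standard library alone (a minimal sketch; `buildSearchURL` is an illustrative name, the host and `s` parameter come from the rule above):

```go
package main

import (
	"fmt"
	"net/url"
)

// buildSearchURL percent-encodes the keyword into the WordPress
// search query parameter `s`.
func buildSearchURL(keyword string) string {
	return fmt.Sprintf("https://www.kuakemao.com/?s=%s", url.QueryEscape(keyword))
}

func main() {
	// UTF-8 bytes of 物 are percent-encoded as %E7%89%A9.
	fmt.Println(buildSearchURL("物"))
}
```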
### 2. Result Container
- **Parent container**: `section.container > div.content-wrap > div.content`
- **Result item**: `article.excerpt` (carries ordinal class names such as `excerpt-1/2`)
### 3. Structure of a Single Result
#### Cover / Detail Link
```html
<a class="focus" href="https://www.kuakemao.com/653.html">
<img data-src="https://img.kuakemao.com/.../c4ac4195bed96c7-220x150.webp" class="thumb">
</a>
```
- The `href` is the detail page address, of the form `/{number}.html`
#### Title
```html
<header>
<h2>
<a href="https://www.kuakemao.com/653.html"
title="某种物质 (2024) 夸克网盘 法国 恐怖 4K 豆瓣7.5 - 夸克猫资源">
某种物质 (2024) 夸克网盘 法国 恐怖 4K 豆瓣7.5
</a>
</h2>
</header>
```
- Fields to extract:
  - **Title**: `h2 > a` text
  - **Detail page URL**: the `href` of `h2 > a`
#### Summary
```html
<p class="note">
某种物质 夸克网盘资源 https://pan.quark.cn/s/631243a6189a ...
</p>
```
- Used to populate `SearchResult.Content`
- The text occasionally contains bare Quark links, but the detail page must still be fetched for the canonical link
#### Metadata
```html
<div class="meta">
<time>2025-11-26</time>
<a class="cat" href="https://www.kuakemao.com/dy">电影</a>
<span class="pv">阅读(...)</span>
</div>
```
- **Publish time**: `<time>` text (`YYYY-MM-DD`)
- **Category tag**: `.meta a.cat` text
## Detail Page Structure
### 1. URL Pattern
```
https://www.kuakemao.com/{articleID}.html
Example: https://www.kuakemao.com/653.html
```
- The article ID can be extracted from `/{id}.html` and used as the unique ID
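The plugin code in this commit implements exactly this rule; as a self-contained sketch:

```go
package main

import (
	"fmt"
	"regexp"
)

// articleIDRegex captures the numeric article ID from detail URLs
// of the form https://www.kuakemao.com/{id}.html
var articleIDRegex = regexp.MustCompile(`/(\d+)\.html`)

// extractArticleID returns the ID, or "" when the URL does not match.
func extractArticleID(detailURL string) string {
	if m := articleIDRegex.FindStringSubmatch(detailURL); len(m) >= 2 {
		return m[1]
	}
	return ""
}

func main() {
	fmt.Println(extractArticleID("https://www.kuakemao.com/653.html")) // 653
}
```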
### 2. Main Nodes
- **Title**: `.article-title`
- **Meta info**: `.article-meta .item` (date, category, view count, etc.)
- **Body container**: `.article-content`
### 3. Quark Link Location
```html
<div class="article-content">
<h2>某种物质 夸克网盘资源</h2>
<p>
<a rel="nofollow" href="https://pan.quark.cn/s/631243a6189a" target="_blank">
https://pan.quark.cn/s/631243a6189a
</a>
</p>
...
</div>
```
- All download links live inside `.article-content`
- Only the Quark domain (`pan.quark.cn`) appears
- The access code, when present, usually follows the link in the same paragraph; parse for the keywords `提取码/密码/pwd/code`
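The link and access-code matching above boils down to two regular expressions; a minimal sketch (matching both ASCII and full-width colons, which Chinese pages mix freely):

```go
package main

import (
	"fmt"
	"regexp"
)

// quarkRegex matches canonical Quark share links.
var quarkRegex = regexp.MustCompile(`https?://pan\.quark\.cn/s/[0-9A-Za-z]+`)

// pwdPatterns cover the keyword variants noted above: 提取码/密码/pwd/code.
var pwdPatterns = []*regexp.Regexp{
	regexp.MustCompile(`提取码[:：]?\s*([0-9A-Za-z]+)`),
	regexp.MustCompile(`密码[:：]?\s*([0-9A-Za-z]+)`),
	regexp.MustCompile(`pwd\s*[=:]\s*([0-9A-Za-z]+)`),
	regexp.MustCompile(`code\s*[=:]\s*([0-9A-Za-z]+)`),
}

// matchPassword returns the first access code found in text, or "".
func matchPassword(text string) string {
	for _, p := range pwdPatterns {
		if m := p.FindStringSubmatch(text); len(m) >= 2 {
			return m[1]
		}
	}
	return ""
}

func main() {
	para := "https://pan.quark.cn/s/631243a6189a 提取码: abcd"
	fmt.Println(quarkRegex.FindString(para))
	fmt.Println(matchPassword(para))
}
```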
## CSS Selector Cheat Sheet
| Field | Selector / Rule | Notes |
|-------|-----------------|-------|
| Result list | `article.excerpt` | iterate over search results |
| Title | `article.excerpt h2 a` | text & `href` |
| Summary | `article.excerpt p.note` | text description |
| Category | `article.excerpt .meta a.cat` | 0 or 1 per item |
| Publish time | `article.excerpt .meta time` | `YYYY-MM-DD` |
| Detail body | `.article-content` | contains all download info |
| Quark link | `.article-content a[href*="pan.quark.cn"]` | the `href` is the download URL |
| Access code | link text / parent node text | keywords: `提取码/密码/pwd/code` |
## Implementation Notes
1. **Request strategy**
   - Search page: `GET https://www.kuakemao.com/?s={keyword}`
   - Set a regular browser UA and Referer; add retries if needed
2. **List parsing**
   - Iterate over `article.excerpt`; extract title, summary, category, and time
   - Extract `articleID` from the detail URL as the unique-ID suffix
3. **Detail page fetching**
   - Enter `.article-content` and collect `a[href*="pan.quark.cn"]`
   - A single post may provide multiple Quark links; return all of them
   - Match the access code against parent/sibling node text
4. **Link filtering**
   - The site only offers Quark netdisk; ignore all other domains
5. **Result construction**
   - `UniqueID = kkmao-{articleID}`
   - Leave `Channel` empty
   - `Datetime` uses the `<time>` from the search results page (format `2006-01-02`)
   - `Links` contains only entries with `Type="quark"`
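The `Datetime` parsing in step 5 (with a fallback to the current time when the `<time>` element is missing) can be sketched with the standard `time` package:

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// parsePublishTime parses the search page's <time> text (YYYY-MM-DD,
// i.e. Go layout "2006-01-02"), falling back to time.Now() when the
// field is empty or malformed.
func parsePublishTime(value string) time.Time {
	value = strings.TrimSpace(value)
	layouts := []string{"2006-01-02", "2006-01-02 15:04:05", time.RFC3339}
	for _, layout := range layouts {
		if t, err := time.Parse(layout, value); err == nil {
			return t
		}
	}
	return time.Now()
}

func main() {
	fmt.Println(parsePublishTime("2025-11-26").Format("2006-01-02"))
}
```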
## Example Flow
```
Keyword: 物
Search page: https://www.kuakemao.com/?s=物
- Parse article.excerpt
- Obtain the title "某种物质 (2024)..." and the detail link https://www.kuakemao.com/653.html
Detail page: https://www.kuakemao.com/653.html
- Find <a href="https://pan.quark.cn/s/631243a6189a"> inside .article-content
Result:
UniqueID: kkmao-653
Title: 某种物质 (2024) 夸克网盘 法国 恐怖 4K 豆瓣7.5
Content: summary from the search results page
Links: [{Type:"quark", URL:"https://pan.quark.cn/s/631243a6189a", Password:""}]
Tags: ["电影"]
Datetime: 2025-11-26
```
## Caveats
1. The `<time>` on the search page may be missing; fall back to the current time
2. Bare links in `.note` can be ignored; treat the detail page data as authoritative
3. The pages load quickly, but a 10-12 second timeout with 2-3 retries is still recommended
4. The site only hosts Quark netdisk links, so the plugin can simply drop other domains
5. Article bodies contain many `<h2>` and `<pre>` elements; when parsing access codes, walk the parent node text to avoid misses

plugin/kkmao/kkmao.go Normal file

@@ -0,0 +1,397 @@
package kkmao

import (
	"context"
	"fmt"
	"net/http"
	"net/url"
	"regexp"
	"strings"
	"sync"
	"time"

	"github.com/PuerkitoBio/goquery"

	"pansou/model"
	"pansou/plugin"
)
var (
	articleIDRegex = regexp.MustCompile(`/(\d+)\.html`)
	quarkRegex     = regexp.MustCompile(`https?://pan\.quark\.cn/s/[0-9A-Za-z]+`)
	// pwdPatterns accept both the ASCII and the full-width colon,
	// since Chinese pages mix the two freely.
	pwdPatterns = []*regexp.Regexp{
		regexp.MustCompile(`提取码[:：]?\s*([0-9A-Za-z]+)`),
		regexp.MustCompile(`密码[:：]?\s*([0-9A-Za-z]+)`),
		regexp.MustCompile(`pwd\s*[=:]\s*([0-9A-Za-z]+)`),
		regexp.MustCompile(`code\s*[=:]\s*([0-9A-Za-z]+)`),
	}
	detailCache          = sync.Map{}
	cacheTTL             = 1 * time.Hour
	cacheCleanupInterval = 30 * time.Minute
)

type detailCacheEntry struct {
	links     []model.Link
	expiresAt time.Time
}

const (
	pluginName            = "kkmao"
	defaultPriority       = 2
	searchTimeout         = 12 * time.Second
	detailTimeout         = 10 * time.Second
	maxConcurrency        = 8
	maxIdleConns          = 64
	maxIdlePerHost        = 8
	maxConnsPerHost       = 32
	idleConnLifetime      = 90 * time.Second
	tlsHandshakeTimeout   = 10 * time.Second
	expectContinueTimeout = 1 * time.Second
	searchMaxRetries      = 3
	detailMaxRetries      = 2
	retryBaseDelay        = 200 * time.Millisecond
)
// KkMaoPlugin is the kkmao (夸克猫) plugin.
type KkMaoPlugin struct {
	*plugin.BaseAsyncPlugin
	client *http.Client
}

func init() {
	plugin.RegisterGlobalPlugin(NewKkMaoPlugin())
	go startDetailCacheCleaner()
}

// NewKkMaoPlugin constructs the plugin with its shared HTTP client.
func NewKkMaoPlugin() *KkMaoPlugin {
	return &KkMaoPlugin{
		BaseAsyncPlugin: plugin.NewBaseAsyncPlugin(pluginName, defaultPriority),
		client:          newHTTPClient(),
	}
}

// Search is the backward-compatible entry point.
func (p *KkMaoPlugin) Search(keyword string, ext map[string]interface{}) ([]model.SearchResult, error) {
	result, err := p.SearchWithResult(keyword, ext)
	if err != nil {
		return nil, err
	}
	return result.Results, nil
}

// SearchWithResult is the main search implementation.
func (p *KkMaoPlugin) SearchWithResult(keyword string, ext map[string]interface{}) (model.PluginSearchResult, error) {
	return p.AsyncSearchWithResult(keyword, p.searchImpl, p.MainCacheKey, ext)
}
func newHTTPClient() *http.Client {
	transport := &http.Transport{
		MaxIdleConns:          maxIdleConns,
		MaxIdleConnsPerHost:   maxIdlePerHost,
		MaxConnsPerHost:       maxConnsPerHost,
		IdleConnTimeout:       idleConnLifetime,
		TLSHandshakeTimeout:   tlsHandshakeTimeout,
		ExpectContinueTimeout: expectContinueTimeout,
		ForceAttemptHTTP2:     true,
	}
	return &http.Client{
		Transport: transport,
		Timeout:   searchTimeout,
	}
}
func (p *KkMaoPlugin) searchImpl(client *http.Client, keyword string, ext map[string]interface{}) ([]model.SearchResult, error) {
	// Prefer the plugin's tuned client over the one passed in by the framework.
	if p.client != nil {
		client = p.client
	}
	searchURL := fmt.Sprintf("https://www.kuakemao.com/?s=%s", url.QueryEscape(keyword))

	ctx, cancel := context.WithTimeout(context.Background(), searchTimeout)
	defer cancel()

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, searchURL, nil)
	if err != nil {
		return nil, fmt.Errorf("[%s] 创建请求失败: %w", p.Name(), err)
	}
	setCommonHeaders(req, "https://www.kuakemao.com/")

	resp, err := p.doRequestWithRetry(req, client, searchMaxRetries, retryBaseDelay)
	if err != nil {
		return nil, fmt.Errorf("[%s] 搜索请求失败: %w", p.Name(), err)
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("[%s] 搜索返回状态码: %d", p.Name(), resp.StatusCode)
	}

	doc, err := goquery.NewDocumentFromReader(resp.Body)
	if err != nil {
		return nil, fmt.Errorf("[%s] 解析搜索页面失败: %w", p.Name(), err)
	}

	var (
		results []model.SearchResult
		wg      sync.WaitGroup
		mu      sync.Mutex
		sem     = make(chan struct{}, maxConcurrency)
	)

	doc.Find("article.excerpt").Each(func(_ int, item *goquery.Selection) {
		titleSel := item.Find("header h2 a")
		title := strings.TrimSpace(titleSel.Text())
		detailURL, ok := titleSel.Attr("href")
		if !ok || title == "" || detailURL == "" {
			return
		}
		articleID := extractArticleID(detailURL)
		if articleID == "" {
			return
		}
		summary := strings.TrimSpace(item.Find("p.note").Text())

		var tags []string
		category := strings.TrimSpace(item.Find(".meta a.cat").First().Text())
		if category != "" {
			tags = append(tags, category)
		}
		rawTime := strings.TrimSpace(item.Find(".meta time").Text())
		publishTime := parsePublishTime(rawTime)

		// Fetch detail pages concurrently, bounded by the semaphore.
		wg.Add(1)
		sem <- struct{}{}
		go func(title, detailURL, articleID, summary string, tags []string, publishTime time.Time) {
			defer wg.Done()
			defer func() { <-sem }()
			links := p.fetchDetailLinks(client, detailURL, articleID)
			if len(links) == 0 {
				return
			}
			result := model.SearchResult{
				UniqueID: fmt.Sprintf("%s-%s", p.Name(), articleID),
				Title:    title,
				Content:  summary,
				Links:    links,
				Tags:     tags,
				Channel:  "",
				Datetime: publishTime,
			}
			mu.Lock()
			results = append(results, result)
			mu.Unlock()
		}(title, detailURL, articleID, summary, tags, publishTime)
	})
	wg.Wait()

	return plugin.FilterResultsByKeyword(results, keyword), nil
}
func extractArticleID(detailURL string) string {
	matches := articleIDRegex.FindStringSubmatch(detailURL)
	if len(matches) >= 2 {
		return matches[1]
	}
	return ""
}

func parsePublishTime(value string) time.Time {
	value = strings.TrimSpace(value)
	if value == "" {
		return time.Now()
	}
	layouts := []string{
		"2006-01-02",
		"2006-01-02 15:04:05",
		time.RFC3339,
	}
	for _, layout := range layouts {
		if t, err := time.Parse(layout, value); err == nil {
			return t
		}
	}
	return time.Now()
}
func (p *KkMaoPlugin) fetchDetailLinks(client *http.Client, detailURL, articleID string) []model.Link {
	if cached, ok := detailCache.Load(articleID); ok {
		if entry, valid := cached.(detailCacheEntry); valid {
			if time.Now().Before(entry.expiresAt) && len(entry.links) > 0 {
				return entry.links
			}
			detailCache.Delete(articleID)
		}
	}

	ctx, cancel := context.WithTimeout(context.Background(), detailTimeout)
	defer cancel()

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, detailURL, nil)
	if err != nil {
		return nil
	}
	setCommonHeaders(req, detailURL)

	resp, err := p.doRequestWithRetry(req, client, detailMaxRetries, retryBaseDelay)
	if err != nil {
		return nil
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil
	}

	doc, err := goquery.NewDocumentFromReader(resp.Body)
	if err != nil {
		return nil
	}

	links := extractQuarkLinks(doc)
	if len(links) > 0 {
		detailCache.Store(articleID, detailCacheEntry{
			links:     links,
			expiresAt: time.Now().Add(cacheTTL),
		})
	}
	return links
}
func extractQuarkLinks(doc *goquery.Document) []model.Link {
	var (
		results []model.Link
		seen    = make(map[string]struct{})
	)
	doc.Find(".article-content a[href]").Each(func(_ int, link *goquery.Selection) {
		href, _ := link.Attr("href")
		href = strings.TrimSpace(href)
		if href == "" {
			return
		}
		loc := quarkRegex.FindString(href)
		if loc == "" {
			return
		}
		if _, exists := seen[loc]; exists {
			return
		}
		password := extractPassword(link)
		results = append(results, model.Link{
			Type:     "quark",
			URL:      loc,
			Password: password,
		})
		seen[loc] = struct{}{}
	})
	return results
}

func extractPassword(link *goquery.Selection) string {
	if pwd := matchPassword(link.Text()); pwd != "" {
		return pwd
	}
	if title, ok := link.Attr("title"); ok {
		if pwd := matchPassword(title); pwd != "" {
			return pwd
		}
	}
	if parent := link.Parent(); parent != nil && parent.Length() > 0 {
		if pwd := matchPassword(parent.Text()); pwd != "" {
			return pwd
		}
		if next := parent.Next(); next.Length() > 0 {
			if pwd := matchPassword(next.Text()); pwd != "" {
				return pwd
			}
		}
	}
	if sibling := link.Next(); sibling.Length() > 0 {
		if pwd := matchPassword(sibling.Text()); pwd != "" {
			return pwd
		}
	}
	return ""
}

func matchPassword(text string) string {
	text = strings.TrimSpace(text)
	if text == "" {
		return ""
	}
	for _, pattern := range pwdPatterns {
		if matches := pattern.FindStringSubmatch(text); len(matches) >= 2 {
			return strings.TrimSpace(matches[1])
		}
	}
	return ""
}
func setCommonHeaders(req *http.Request, referer string) {
	req.Header.Set("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36")
	req.Header.Set("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8")
	req.Header.Set("Accept-Language", "zh-CN,zh;q=0.9,en;q=0.8")
	req.Header.Set("Connection", "keep-alive")
	req.Header.Set("Referer", referer)
}
func (p *KkMaoPlugin) doRequestWithRetry(req *http.Request, client *http.Client, maxRetries int, baseDelay time.Duration) (*http.Response, error) {
	var lastErr error
	for attempt := 0; attempt < maxRetries; attempt++ {
		resp, err := client.Do(req.Clone(req.Context()))
		if err == nil && resp.StatusCode == http.StatusOK {
			return resp, nil
		}
		if resp != nil {
			resp.Body.Close()
			// Record the bad status so the final error is informative
			// even when client.Do itself returned no error.
			lastErr = fmt.Errorf("状态码: %d", resp.StatusCode)
		}
		if err != nil {
			lastErr = err
		}
		if attempt < maxRetries-1 {
			// Exponential backoff: baseDelay, 2x, 4x, ...
			time.Sleep(baseDelay * time.Duration(1<<attempt))
		}
	}
	return nil, fmt.Errorf("重试 %d 次后失败: %w", maxRetries, lastErr)
}
func startDetailCacheCleaner() {
	ticker := time.NewTicker(cacheCleanupInterval)
	defer ticker.Stop()
	for range ticker.C {
		now := time.Now()
		detailCache.Range(func(key, value interface{}) bool {
			entry, ok := value.(detailCacheEntry)
			if !ok || now.After(entry.expiresAt) {
				detailCache.Delete(key)
			}
			return true
		})
	}
}