12 Commits

Author SHA1 Message Date
b650030f26 README 2025-10-22 02:25:35 +11:00
b136556822 test(unscrambleParagraphs): try a crawling method without playwright 2025-10-22 02:17:11 +11:00
17c3859e9e feat(logging): Implement structured logging and debug mode
fix: Windows cannot download novels correctly
2025-10-17 01:36:23 +11:00
11fccdb05f ci(goreleaser): Install templ before generating templates
Adds a `go install` hook to `.goreleaser.yaml` to ensure the `templ` binary is
installed and up-to-date before `templ generate` is executed. This prevents
potential build failures in CI/CD environments where `templ` might not be
pre-installed or could be an outdated version, making the release process
more robust and self-contained.
2025-10-06 18:20:47 +11:00
af968cbc9a ci(workflow): Upgrade GitHub Actions in release workflow
Updated the major versions of several GitHub Actions used in the release workflow:
- actions/checkout from v4 to v5
- actions/setup-go from v4 to v5
- goreleaser/goreleaser-action from v5 to v6

This ensures we are using the latest features, bug fixes, and security updates provided by these actions.
2025-10-06 18:11:27 +11:00
08e6280c34 feat: Add NFPM packaging and defer Playwright installation
This commit introduces NFPM configuration in `.goreleaser.yaml` to
generate native packages for various Linux distributions (e.g., .deb,
.rpm, .apk). This provides a more streamlined installation experience
for Linux users.

The Playwright browser installation logic has been moved from `main.go`
to the `Run` function of the `download` command. This change ensures
that Playwright binaries are only downloaded and installed when the
`download` command is actually invoked, improving initial application
startup performance and reducing unnecessary overhead for other commands.

The Goreleaser configuration has also been updated to version 2 syntax
and the `arm` architecture has been removed from builds.
2025-10-06 18:07:54 +11:00
34179b4dc0 Create LICENSE 2025-10-06 18:03:04 +11:00
b0f8f31dcc feat: Add concurrency and headless options for downloads
This commit introduces new features for controlling the download process:

-   **Concurrency**: Users can now specify the number of concurrent volume downloads using the `--concurrency` flag. This significantly speeds up the download of entire novels.
-   **Headless Mode**: A `--headless` flag has been added to control whether the browser operates in headless mode (without a visible UI). This is useful for debugging or running in environments without a display.

**Changes include:**

-   Updated `download` command to accept `--concurrency` and `--headless` flags.
-   Refactored `bilinovel` downloader to support `BilinovelNewOption` for configuring headless mode and concurrency.
-   Implemented a page pool and concurrency control mechanism within the `bilinovel` downloader to manage concurrent browser page usage.
-   Added `DownloadNovel` and `DownloadVolume` methods to the `bilinovel` downloader, utilizing goroutines and wait groups for parallel processing.
-   Updated `.vscode/launch.json` with new configurations for testing novel and volume downloads with the new options.
2025-10-06 10:20:36 +11:00
6084386989 refactor(bilinovel): Migrate browser automation from Chromedp to Playwright
This commit replaces the `chromedp` library with `playwright-go` for browser automation within the Bilinovel downloader.

Changes include:
*   Updated `Bilinovel` struct to manage Playwright browser, context, and page instances.
*   Rewrote `initBrowser` and `Close` methods to use Playwright's API for browser lifecycle management.
*   Refactored `processContentWithChromedp` to `processContentWithPlaywright`, adapting the logic to use Playwright's page evaluation capabilities.
*   Removed unused `context` and `time` imports.
*   Added HTML cleanup in `getChapterByPage` to remove `class` attributes from images and `data-k` attributes from all elements, improving content consistency.
2025-10-06 07:58:31 +11:00
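The `data-k` attribute cleanup mentioned in this commit is done with goquery in the diff. As a dependency-free illustration of the same idea, the sketch below swaps in a regular expression; `stripDataK` and the pattern are hypothetical and were not taken from the repository. A regular expression cannot tell markup from text content, so this is only suitable for a quick demonstration.

```go
package main

import (
	"fmt"
	"regexp"
)

// dataKAttr matches an attribute whose name starts with "data-k", together
// with an optional quoted or unquoted value. Illustrative only.
var dataKAttr = regexp.MustCompile(`\s+data-k[^\s=>]*(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+))?`)

// stripDataK removes every data-k* attribute from the given HTML fragment.
func stripDataK(html string) string {
	return dataKAttr.ReplaceAllString(html, "")
}

func main() {
	fmt.Println(stripDataK(`<p data-k="a1" id="x">text</p>`)) // <p id="x">text</p>
}
```

Working on the parsed document, as the commit does, avoids accidental matches inside text nodes.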
f1320cb978 Merge pull request #2 from sarymo/patch-3
fix: normalize path separators in wrapper.go
2025-09-03 13:02:07 +10:00
sarymo
434d5f54bd Update wrapper.go 2025-09-03 08:39:30 +08:00
b8cd053b00 refactor: improve network event handling and cleanup of hidden elements in Bilinovel processing 2025-08-24 20:51:09 +10:00
17 changed files with 807 additions and 206 deletions

28
.github/workflows/release.yml vendored Normal file
@@ -0,0 +1,28 @@
name: release
on:
push:
tags:
- "v*"
permissions:
contents: write
jobs:
goreleaser:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v5
- name: Set up Go
uses: actions/setup-go@v5
- name: Run GoReleaser
uses: goreleaser/goreleaser-action@v6
with:
distribution: goreleaser
version: latest
args: release --clean
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

@@ -1,6 +1,8 @@
version: 2
project_name: bilinovel-downloader project_name: bilinovel-downloader
before: before:
hooks: hooks:
- go install github.com/a-h/templ/cmd/templ@latest
- templ generate - templ generate
builds: builds:
- env: - env:
@@ -12,16 +14,15 @@ builds:
goarch: goarch:
- amd64 - amd64
- arm64 - arm64
- arm
- "386" - "386"
ldflags: ldflags:
- -s -w -X bilinovel-downloader/cmd.Version={{ .Version }} - -s -w -X bilinovel-downloader/cmd.Version={{ .Version }}
flags: flags:
- -trimpath - -trimpath
archives: archives:
- format: tar.gz - formats: ["tar.gz"]
format_overrides: format_overrides:
- format: zip - formats: ["zip"]
goos: windows goos: windows
wrap_in_directory: true wrap_in_directory: true
release: release:
@@ -29,3 +30,17 @@ release:
upx: upx:
- enabled: true - enabled: true
compress: best compress: best
nfpms:
- id: bilinovel-downloader
homepage: https://github.com/bestnite/bilinovel-downloader
maintainer: Nite <admin@nite07.com>
license: "MIT"
formats:
- apk
- deb
- rpm
- termux.deb
- archlinux
provides:
- bilinovel-downloader

19
.vscode/launch.json vendored
@@ -2,7 +2,7 @@
"version": "0.2.0", "version": "0.2.0",
"configurations": [ "configurations": [
{ {
"name": "download", "name": "novel",
"type": "go", "type": "go",
"request": "launch", "request": "launch",
"mode": "auto", "mode": "auto",
@@ -10,7 +10,22 @@
"args": [ "args": [
"download", "download",
"-n", "-n",
"3095", "2727",
"--concurrency",
"5"
]
},
{
"name": "volume",
"type": "go",
"request": "launch",
"mode": "auto",
"program": "${workspaceFolder}",
"args": [
"download",
"-n=2388",
"-v=84522",
"--debug=true"
] ]
} }
] ]

21
LICENSE Normal file
@@ -0,0 +1,21 @@
MIT License
Copyright (c) 2025 Nite
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

@@ -22,3 +22,8 @@
```bash ```bash
bilinovel-downloader pack -d <directory path> bilinovel-downloader pack -d <directory path>
``` ```
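## Algorithm analysis
The program currently crawls with playwright to get around bilinovel's anti-scraping measures (decoy paragraphs and paragraph reordering).
Even so, bilinovel's algorithm has been analyzed briefly; see the [code](./test/no_playwright_method_test.go). That code works at present, but if bilinovel frequently changes how the initial seed is computed or how the algorithm is implemented, the reordering method will stop working, which is why the program still uses playwright.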

@@ -1,16 +1,19 @@
package cmd package cmd
import ( import (
"bilinovel-downloader/downloader"
"bilinovel-downloader/downloader/bilinovel" "bilinovel-downloader/downloader/bilinovel"
"bilinovel-downloader/epub" "bilinovel-downloader/epub"
"bilinovel-downloader/model" "bilinovel-downloader/model"
"bilinovel-downloader/text" "bilinovel-downloader/text"
"encoding/json" "encoding/json"
"fmt" "fmt"
"log" "io"
"log/slog"
"os" "os"
"path/filepath" "path/filepath"
"github.com/playwright-community/playwright-go"
"github.com/spf13/cobra" "github.com/spf13/cobra"
) )
@@ -19,18 +22,31 @@ var downloadCmd = &cobra.Command{
Short: "Download a novel or volume", Short: "Download a novel or volume",
Long: "Download a novel or volume", Long: "Download a novel or volume",
Run: func(cmd *cobra.Command, args []string) { Run: func(cmd *cobra.Command, args []string) {
err := runDownloadNovel() slog.Info("Installing playwright")
err := playwright.Install(&playwright.RunOptions{
Browsers: []string{"chromium"},
Stdout: io.Discard,
})
if err != nil { if err != nil {
log.Printf("failed to download novel: %v", err) slog.Error("failed to install playwright", slog.Any("error", err))
return
}
err = runDownloadNovel()
if err != nil {
slog.Error("failed to download novel", slog.Any("error", err))
return
} }
}, },
} }
type downloadCmdArgs struct { type downloadCmdArgs struct {
NovelId int `validate:"required"` NovelId int `validate:"required"`
VolumeId int `validate:"required"` VolumeId int `validate:"required"`
outputPath string outputPath string
outputType string outputType string
concurrency int
debug bool
} }
var ( var (
@@ -42,18 +58,23 @@ func init() {
downloadCmd.Flags().IntVarP(&downloadArgs.VolumeId, "volume-id", "v", 0, "volume id") downloadCmd.Flags().IntVarP(&downloadArgs.VolumeId, "volume-id", "v", 0, "volume id")
downloadCmd.Flags().StringVarP(&downloadArgs.outputPath, "output-path", "o", "novels", "output path") downloadCmd.Flags().StringVarP(&downloadArgs.outputPath, "output-path", "o", "novels", "output path")
downloadCmd.Flags().StringVarP(&downloadArgs.outputType, "output-type", "t", "epub", "output type, epub or text") downloadCmd.Flags().StringVarP(&downloadArgs.outputType, "output-type", "t", "epub", "output type, epub or text")
downloadCmd.Flags().BoolVar(&downloadArgs.debug, "debug", false, "debug mode")
downloadCmd.Flags().IntVar(&downloadArgs.concurrency, "concurrency", 3, "concurrency of downloading volumes")
RootCmd.AddCommand(downloadCmd) RootCmd.AddCommand(downloadCmd)
} }
func runDownloadNovel() error { func runDownloadNovel() error {
downloader, err := bilinovel.New() downloader, err := bilinovel.New(bilinovel.BilinovelNewOption{
Concurrency: downloadArgs.concurrency,
Debug: downloadArgs.debug,
})
if err != nil { if err != nil {
return fmt.Errorf("failed to create downloader: %v", err) return fmt.Errorf("failed to create downloader: %v", err)
} }
// Make sure resources are released when the function returns // Make sure resources are released when the function returns
defer func() { defer func() {
if closeErr := downloader.Close(); closeErr != nil { if closeErr := downloader.Close(); closeErr != nil {
log.Printf("Failed to close downloader: %v", closeErr) slog.Info("Failed to close downloader", slog.Any("error", closeErr))
} }
}() }()
@@ -63,16 +84,10 @@ func runDownloadNovel() error {
if downloadArgs.VolumeId == 0 { if downloadArgs.VolumeId == 0 {
// Download the whole novel // Download the whole novel
novel, err := downloader.GetNovel(downloadArgs.NovelId, true) err := downloadNovel(downloader, downloadArgs.NovelId)
if err != nil { if err != nil {
return fmt.Errorf("failed to get novel: %v", err) return fmt.Errorf("failed to get novel: %v", err)
} }
for _, volume := range novel.Volumes {
err = downloadVolume(downloader, volume.Id)
if err != nil {
return fmt.Errorf("failed to download volume: %v", err)
}
}
} else { } else {
// Download a single volume // Download a single volume
err = downloadVolume(downloader, downloadArgs.VolumeId) err = downloadVolume(downloader, downloadArgs.VolumeId)
@@ -84,7 +99,59 @@ func runDownloadNovel() error {
return nil return nil
} }
func downloadVolume(downloader model.Downloader, volumeId int) error { func downloadNovel(downloader downloader.Downloader, novelId int) error {
novelInfo, err := downloader.GetNovel(novelId, true, nil)
if err != nil {
return fmt.Errorf("failed to get novel info: %w", err)
}
skipVolumes := make([]int, 0)
for _, volume := range novelInfo.Volumes {
jsonPath := filepath.Join(downloadArgs.outputPath, fmt.Sprintf("volume-%d-%d.json", downloadArgs.NovelId, volume.Id))
err = os.MkdirAll(filepath.Dir(jsonPath), 0755)
if err != nil {
return fmt.Errorf("failed to create directory: %v", err)
}
_, err = os.Stat(jsonPath)
if err == nil {
// Already downloaded
skipVolumes = append(skipVolumes, volume.Id)
}
}
novel, err := downloader.GetNovel(novelId, false, skipVolumes)
if err != nil {
return fmt.Errorf("failed to download novel: %w", err)
}
for _, volume := range novel.Volumes {
jsonPath := filepath.Join(downloadArgs.outputPath, fmt.Sprintf("volume-%d-%d.json", downloadArgs.NovelId, volume.Id))
err = os.MkdirAll(filepath.Dir(jsonPath), 0755)
if err != nil {
return fmt.Errorf("failed to create directory: %v", err)
}
jsonFile, err := os.Create(jsonPath)
if err != nil {
return fmt.Errorf("failed to create json file: %v", err)
}
err = json.NewEncoder(jsonFile).Encode(volume)
if err != nil {
return fmt.Errorf("failed to encode json file: %v", err)
}
switch downloadArgs.outputType {
case "epub":
err = epub.PackVolumeToEpub(volume, downloadArgs.outputPath, downloader.GetStyleCSS(), downloader.GetExtraFiles())
if err != nil {
return fmt.Errorf("failed to pack volume: %v", err)
}
case "text":
err = text.PackVolumeToText(volume, downloadArgs.outputPath)
if err != nil {
return fmt.Errorf("failed to pack volume: %v", err)
}
}
}
return nil
}
func downloadVolume(downloader downloader.Downloader, volumeId int) error {
jsonPath := filepath.Join(downloadArgs.outputPath, fmt.Sprintf("volume-%d-%d.json", downloadArgs.NovelId, volumeId)) jsonPath := filepath.Join(downloadArgs.outputPath, fmt.Sprintf("volume-%d-%d.json", downloadArgs.NovelId, volumeId))
err := os.MkdirAll(filepath.Dir(jsonPath), 0755) err := os.MkdirAll(filepath.Dir(jsonPath), 0755)
if err != nil { if err != nil {

@@ -4,4 +4,6 @@ import (
"github.com/spf13/cobra" "github.com/spf13/cobra"
) )
var RootCmd = &cobra.Command{} var RootCmd = &cobra.Command{
Use: "bilinovel-downloader",
}

@@ -6,7 +6,7 @@ import (
"github.com/spf13/cobra" "github.com/spf13/cobra"
) )
const ( var (
Version = "dev" Version = "dev"
) )

@@ -4,24 +4,23 @@ import (
"bilinovel-downloader/model" "bilinovel-downloader/model"
"bilinovel-downloader/utils" "bilinovel-downloader/utils"
"bytes" "bytes"
"context"
"crypto/sha256" "crypto/sha256"
_ "embed" _ "embed"
"fmt" "fmt"
"log" "log/slog"
"net/http" "net/http"
"os" "os"
"path" "path"
"path/filepath" "path/filepath"
"regexp" "regexp"
"slices"
"strconv" "strconv"
"strings" "strings"
"time" "sync"
"github.com/PuerkitoBio/goquery" "github.com/PuerkitoBio/goquery"
mapper "github.com/bestnite/font-mapper" mapper "github.com/bestnite/font-mapper"
"github.com/chromedp/cdproto/network" "github.com/playwright-community/playwright-go"
"github.com/chromedp/chromedp"
) )
//go:embed read.ttf //go:embed read.ttf
@@ -36,27 +35,50 @@ type Bilinovel struct {
restyClient *utils.RestyClient restyClient *utils.RestyClient
// Reused browser instance // Reused browser instance
allocCtx context.Context browser playwright.Browser
allocCancel context.CancelFunc browserContext playwright.BrowserContext
browserCtx context.Context pages map[string]playwright.Page
browserCancel context.CancelFunc concurrency int
concurrentChan chan any
logger *slog.Logger
} }
func New() (*Bilinovel, error) { type BilinovelNewOption struct {
Concurrency int
Debug bool
}
func New(option BilinovelNewOption) (*Bilinovel, error) {
fontMapper, err := mapper.NewGlyphOutlineMapper(readTTF, miLantingTTF) fontMapper, err := mapper.NewGlyphOutlineMapper(readTTF, miLantingTTF)
if err != nil { if err != nil {
return nil, fmt.Errorf("failed to create font mapper: %v", err) return nil, fmt.Errorf("failed to create font mapper: %v", err)
} }
restyClient := utils.NewRestyClient(50) restyClient := utils.NewRestyClient(50)
var logLevel slog.Level
if option.Debug {
logLevel = slog.LevelDebug
} else {
logLevel = slog.LevelInfo
}
handlerOptions := &slog.HandlerOptions{
Level: logLevel,
}
b := &Bilinovel{ b := &Bilinovel{
fontMapper: fontMapper, fontMapper: fontMapper,
textOnly: false, textOnly: false,
restyClient: restyClient, restyClient: restyClient,
pages: make(map[string]playwright.Page),
concurrency: option.Concurrency,
concurrentChan: make(chan any, option.Concurrency),
logger: slog.New(slog.NewTextHandler(os.Stdout, handlerOptions)),
} }
// Initialize the browser instance // Initialize the browser instance
err = b.initBrowser() err = b.initBrowser(option.Debug)
if err != nil { if err != nil {
return nil, fmt.Errorf("failed to init browser: %v", err) return nil, fmt.Errorf("failed to init browser: %v", err)
} }
@@ -73,47 +95,38 @@ func (b *Bilinovel) GetExtraFiles() []model.ExtraFile {
} }
// initBrowser initializes the browser instance // initBrowser initializes the browser instance
func (b *Bilinovel) initBrowser() error { func (b *Bilinovel) initBrowser(debug bool) error {
// Create chromedp options pw, err := playwright.Run()
opts := append(chromedp.DefaultExecAllocatorOptions[:],
chromedp.Flag("headless", true),
chromedp.Flag("disable-gpu", true),
chromedp.Flag("disable-dev-shm-usage", true),
chromedp.Flag("disable-extensions", true),
chromedp.Flag("no-sandbox", true),
chromedp.Flag("disable-background-timer-throttling", true),
chromedp.Flag("disable-backgrounding-occluded-windows", true),
chromedp.Flag("disable-renderer-backgrounding", true),
)
var err error
b.allocCtx, b.allocCancel = chromedp.NewExecAllocator(context.Background(), opts...)
b.browserCtx, b.browserCancel = chromedp.NewContext(b.allocCtx)
// Warm up the browser by navigating to a blank page
err = chromedp.Run(b.browserCtx, chromedp.Navigate("about:blank"))
if err != nil { if err != nil {
b.closeBrowser() return fmt.Errorf("could not start playwright: %w", err)
return fmt.Errorf("failed to initialize browser: %v", err)
} }
log.Println("Browser initialized successfully") b.browser, err = pw.Chromium.Launch(playwright.BrowserTypeLaunchOptions{
Headless: playwright.Bool(!debug),
Devtools: playwright.Bool(debug),
})
if err != nil {
return fmt.Errorf("could not launch browser: %w", err)
}
b.browserContext, err = b.browser.NewContext()
if err != nil {
return fmt.Errorf("could not create browser context: %w", err)
}
b.logger.Info("Browser initialized successfully")
return nil return nil
} }
// closeBrowser closes the browser instance // Close releases resources
func (b *Bilinovel) closeBrowser() {
if b.browserCancel != nil {
b.browserCancel()
}
if b.allocCancel != nil {
b.allocCancel()
}
}
// Close releases resources when the downloader is closed
func (b *Bilinovel) Close() error { func (b *Bilinovel) Close() error {
b.closeBrowser() if b.browser != nil {
if err := b.browser.Close(); err != nil {
b.logger.Error("could not close browser", slog.Any("error", err))
}
b.browser = nil
b.browserContext = nil
}
return nil return nil
} }
@@ -124,8 +137,8 @@ func (b *Bilinovel) GetStyleCSS() string {
return string(styleCSS) return string(styleCSS)
} }
func (b *Bilinovel) GetNovel(novelId int, skipChapter bool) (*model.Novel, error) { func (b *Bilinovel) GetNovel(novelId int, skipChapterContent bool, skipVolumes []int) (*model.Novel, error) {
log.Printf("Getting novel %v\n", novelId) b.logger.Info("Getting novel", slog.Int("novelId", novelId))
novelUrl := fmt.Sprintf("https://www.bilinovel.com/novel/%v.html", novelId) novelUrl := fmt.Sprintf("https://www.bilinovel.com/novel/%v.html", novelId)
resp, err := b.restyClient.R().Get(novelUrl) resp, err := b.restyClient.R().Get(novelUrl)
@@ -154,7 +167,7 @@ func (b *Bilinovel) GetNovel(novelId int, skipChapter bool) (*model.Novel, error
novel.Authors = append(novel.Authors, strings.TrimSpace(s.Text())) novel.Authors = append(novel.Authors, strings.TrimSpace(s.Text()))
}) })
volumes, err := b.getAllVolumes(novelId, skipChapter) volumes, err := b.getAllVolumes(novelId, skipChapterContent, skipVolumes)
if err != nil { if err != nil {
return nil, fmt.Errorf("failed to get novel volumes: %v", err) return nil, fmt.Errorf("failed to get novel volumes: %v", err)
} }
@@ -163,8 +176,8 @@ func (b *Bilinovel) GetNovel(novelId int, skipChapter bool) (*model.Novel, error
return novel, nil return novel, nil
} }
func (b *Bilinovel) GetVolume(novelId int, volumeId int, skipChapter bool) (*model.Volume, error) { func (b *Bilinovel) GetVolume(novelId int, volumeId int, skipChapterContent bool) (*model.Volume, error) {
log.Printf("Getting volume %v of novel %v\n", volumeId, novelId) b.logger.Info("Getting volume of novel", slog.Int("volumeId", volumeId), slog.Int("novelId", novelId))
novelUrl := fmt.Sprintf("https://www.bilinovel.com/novel/%v/catalog", novelId) novelUrl := fmt.Sprintf("https://www.bilinovel.com/novel/%v/catalog", novelId)
resp, err := b.restyClient.R().Get(novelUrl) resp, err := b.restyClient.R().Get(novelUrl)
@@ -238,7 +251,7 @@ func (b *Bilinovel) GetVolume(novelId int, volumeId int, skipChapter bool) (*mod
idRegexp := regexp.MustCompile(`/novel/(\d+)/(\d+).html`) idRegexp := regexp.MustCompile(`/novel/(\d+)/(\d+).html`)
if !skipChapter { if !skipChapterContent {
for i := range volume.Chapters { for i := range volume.Chapters {
matches := idRegexp.FindStringSubmatch(volume.Chapters[i].Url) matches := idRegexp.FindStringSubmatch(volume.Chapters[i].Url)
if len(matches) > 0 { if len(matches) > 0 {
@@ -261,8 +274,8 @@ func (b *Bilinovel) GetVolume(novelId int, volumeId int, skipChapter bool) (*mod
return volume, nil return volume, nil
} }
func (b *Bilinovel) getAllVolumes(novelId int, skipChapter bool) ([]*model.Volume, error) { func (b *Bilinovel) getAllVolumes(novelId int, skipChapterContent bool, skipVolumes []int) ([]*model.Volume, error) {
log.Printf("Getting all volumes of novel %v\n", novelId) b.logger.Info("Getting all volumes of novel", slog.Int("novelId", novelId))
catelogUrl := fmt.Sprintf("https://www.bilinovel.com/novel/%v/catalog", novelId) catelogUrl := fmt.Sprintf("https://www.bilinovel.com/novel/%v/catalog", novelId)
resp, err := b.restyClient.R().Get(catelogUrl) resp, err := b.restyClient.R().Get(catelogUrl)
@@ -289,27 +302,63 @@ func (b *Bilinovel) getAllVolumes(novelId int, skipChapter bool) ([]*model.Volum
} }
}) })
volumes := make([]*model.Volume, 0) volumes := make([]*model.Volume, len(volumeIds))
var wg sync.WaitGroup
var mu sync.Mutex // mutex guarding writes to volumes
for i, volumeIdStr := range volumeIds { for i, volumeIdStr := range volumeIds {
volumeId, err := strconv.Atoi(volumeIdStr) wg.Add(1)
if err != nil { b.concurrentChan <- struct{}{} // acquire a concurrency slot
return nil, fmt.Errorf("failed to convert volume id: %v", err)
} go func(i int, volumeIdStr string) {
volume, err := b.GetVolume(novelId, volumeId, skipChapter) defer wg.Done()
if err != nil { defer func() { <-b.concurrentChan }() // release the concurrency slot
return nil, fmt.Errorf("failed to get volume info: %v", err)
} volumeId, err := strconv.Atoi(volumeIdStr)
volume.SeriesIdx = i if err != nil {
volumes = append(volumes, volume) b.logger.Error("failed to convert volume id", slog.String("volumeIdStr", volumeIdStr), slog.Any("error", err))
return
}
if slices.Contains(skipVolumes, volumeId) {
return
}
volume, err := b.GetVolume(novelId, volumeId, skipChapterContent)
if err != nil {
b.logger.Error("failed to get volume info", slog.Int("novelId", novelId), slog.Int("volumeId", volumeId), slog.Any("error", err))
return
}
volume.SeriesIdx = i
// Close the browser tab
pwPageKey := fmt.Sprintf("%v-%v", novelId, volumeId)
if pwPage, ok := b.pages[pwPageKey]; ok {
_ = pwPage.Close()
delete(b.pages, pwPageKey)
}
mu.Lock()
volumes[i] = volume
mu.Unlock()
}(i, volumeIdStr)
} }
return volumes, nil wg.Wait()
// Filter out nil volumes that failed to download
filteredVolumes := make([]*model.Volume, 0, len(volumes))
for _, vol := range volumes {
if vol != nil {
filteredVolumes = append(filteredVolumes, vol)
}
}
return filteredVolumes, nil
} }
func (b *Bilinovel) GetChapter(novelId int, volumeId int, chapterId int) (*model.Chapter, error) { func (b *Bilinovel) GetChapter(novelId int, volumeId int, chapterId int) (*model.Chapter, error) {
log.Printf("Getting chapter %v of novel %v\n", chapterId, novelId) b.logger.Info("Getting chapter of novel", slog.Int("chapterId", chapterId), slog.Int("novelId", novelId))
page := 1 pageNum := 1
chapter := &model.Chapter{ chapter := &model.Chapter{
Id: chapterId, Id: chapterId,
NovelId: novelId, NovelId: novelId,
@@ -317,22 +366,33 @@ func (b *Bilinovel) GetChapter(novelId int, volumeId int, chapterId int) (*model
Url: fmt.Sprintf("https://www.bilinovel.com/novel/%v/%v.html", novelId, chapterId), Url: fmt.Sprintf("https://www.bilinovel.com/novel/%v/%v.html", novelId, chapterId),
} }
for { for {
hasNext, err := b.getChapterByPage(chapter, page) pwPageKey := fmt.Sprintf("%v-%v", novelId, volumeId)
if _, ok := b.pages[pwPageKey]; !ok {
pwPage, err := b.browserContext.NewPage()
if err != nil {
return nil, fmt.Errorf("failed to create browser page: %w", err)
}
b.pages[pwPageKey] = pwPage
}
hasNext, err := b.getChapterByPage(b.pages[pwPageKey], chapter, pageNum)
if err != nil { if err != nil {
return nil, fmt.Errorf("failed to download chapter: %w", err) return nil, fmt.Errorf("failed to download chapter: %w", err)
} }
if !hasNext { if !hasNext {
break break
} }
page++ pageNum++
} }
return chapter, nil return chapter, nil
} }
func (b *Bilinovel) getChapterByPage(chapter *model.Chapter, page int) (bool, error) { var nextPageUrlRegexp = regexp.MustCompile(`url_next:\s?['"]([^'"]*?)['"]`)
log.Printf("Getting chapter %v by page %v\n", chapter.Id, page) var cleanNextPageUrlRegexp = regexp.MustCompile(`(_\d+)?\.html$`)
Url := strings.TrimSuffix(chapter.Url, ".html") + fmt.Sprintf("_%v.html", page) func (b *Bilinovel) getChapterByPage(pwPage playwright.Page, chapter *model.Chapter, pageNum int) (bool, error) {
b.logger.Info("Getting chapter by page", slog.Int("chapter", chapter.Id), slog.Int("page", pageNum))
Url := strings.TrimSuffix(chapter.Url, ".html") + fmt.Sprintf("_%v.html", pageNum)
hasNext := false hasNext := false
headers := map[string]string{ headers := map[string]string{
@@ -353,8 +413,9 @@ func (b *Bilinovel) getChapterByPage(chapter *model.Chapter, page int) (bool, er
} }
html := resp.Body() html := resp.Body()
// Fix the scrambled paragraph order // Fix the scrambled paragraph order
resortedHtml, err := b.processContentWithChromedp(string(html)) resortedHtml, err := b.processContentWithPlaywright(pwPage, string(html))
if err != nil { if err != nil {
return false, fmt.Errorf("failed to process html: %w", err) return false, fmt.Errorf("failed to process html: %w", err)
} }
@@ -363,7 +424,18 @@ func (b *Bilinovel) getChapterByPage(chapter *model.Chapter, page int) (bool, er
return false, fmt.Errorf("failed to parse html: %w", err) return false, fmt.Errorf("failed to parse html: %w", err)
} }
if page == 1 { // Determine whether the chapter has a next page
n := nextPageUrlRegexp.FindStringSubmatch(resortedHtml)
if len(n) != 2 {
return false, fmt.Errorf("failed to determine whether there is a next page")
}
s := cleanNextPageUrlRegexp.ReplaceAllString(n[1], "")
if strings.Contains(Url, s) {
hasNext = true
}
if pageNum == 1 {
chapter.Title = doc.Find("#atitle").Text() chapter.Title = doc.Find("#atitle").Text()
} }
content := doc.Find("#acontent").First() content := doc.Find("#acontent").First()
@@ -371,7 +443,7 @@ func (b *Bilinovel) getChapterByPage(chapter *model.Chapter, page int) (bool, er
content.Find("center").Remove() content.Find("center").Remove()
content.Find(".google-auto-placed").Remove() content.Find(".google-auto-placed").Remove()
if strings.Contains(resp.String(), `font-family: "read"`) { if strings.Contains(resortedHtml, `font-family: "read"`) {
html, err := content.Find("p").Last().Html() html, err := content.Find("p").Last().Html()
if err != nil { if err != nil {
return false, fmt.Errorf("failed to get html: %v", err) return false, fmt.Errorf("failed to get html: %v", err)
@@ -402,6 +474,7 @@ func (b *Bilinovel) getChapterByPage(chapter *model.Chapter, page int) (bool, er
imageFilename := fmt.Sprintf("%x%s", string(imageHash[:]), path.Ext(imgUrl)) imageFilename := fmt.Sprintf("%x%s", string(imageHash[:]), path.Ext(imgUrl))
s.SetAttr("src", imageFilename) s.SetAttr("src", imageFilename)
s.SetAttr("alt", imgUrl) s.SetAttr("alt", imgUrl)
s.RemoveAttr("class")
img, err := b.getImg(imgUrl) img, err := b.getImg(imgUrl)
if err != nil { if err != nil {
return return
@@ -416,6 +489,19 @@ func (b *Bilinovel) getChapterByPage(chapter *model.Chapter, page int) (bool, er
}) })
} }
doc.Find("*").Each(func(i int, s *goquery.Selection) {
if len(s.Nodes) > 0 && len(s.Nodes[0].Attr) > 0 {
// Walk all attributes of the element
for _, attr := range s.Nodes[0].Attr {
// Check whether the attribute name starts with "data-k"
if strings.HasPrefix(attr.Key, "data-k") {
// If it does, remove the attribute
s.RemoveAttr(attr.Key)
}
}
}
})
htmlStr, err := content.Html() htmlStr, err := content.Html()
if err != nil { if err != nil {
return false, fmt.Errorf("failed to get html: %v", err) return false, fmt.Errorf("failed to get html: %v", err)
@@ -430,7 +516,7 @@ func (b *Bilinovel) getChapterByPage(chapter *model.Chapter, page int) (bool, er
} }
func (b *Bilinovel) getImg(url string) ([]byte, error) { func (b *Bilinovel) getImg(url string) ([]byte, error) {
log.Printf("Getting img %v\n", url) b.logger.Info("Getting img", slog.String("url", url))
resp, err := b.restyClient.R().SetHeader("Referer", "https://www.bilinovel.com").Get(url) resp, err := b.restyClient.R().SetHeader("Referer", "https://www.bilinovel.com").Get(url)
if err != nil { if err != nil {
return nil, err return nil, err
@@ -439,9 +525,17 @@ func (b *Bilinovel) getImg(url string) ([]byte, error) {
return resp.Body(), nil return resp.Body(), nil
} }
// processContentWithChromedp processes the content with the reused browser instance // processContentWithPlaywright processes the content with the reused browser instance
func (b *Bilinovel) processContentWithChromedp(htmlContent string) (string, error) { func (b *Bilinovel) processContentWithPlaywright(page playwright.Page, htmlContent string) (string, error) {
tempFile, err := os.CreateTemp("", "bilinovel-temp-*.html") // Replace window.location.replace to prevent page redirects
htmlContent = strings.ReplaceAll(htmlContent, "window.location.replace", "console.log")
tempPath := filepath.Join(os.TempDir(), "bilinovel-downloader")
err := os.MkdirAll(tempPath, 0755)
if err != nil {
return "", fmt.Errorf("failed to create temp dir: %w", err)
}
tempFile, err := os.CreateTemp(tempPath, "temp-*.html")
if err != nil { if err != nil {
return "", fmt.Errorf("failed to create temp file: %w", err) return "", fmt.Errorf("failed to create temp file: %w", err)
} }
@@ -454,55 +548,92 @@ func (b *Bilinovel) processContentWithChromedp(htmlContent string) (string, erro
tempFile.Close() tempFile.Close()
tempFilePath := tempFile.Name() tempFilePath := tempFile.Name()
// Create a child context for the current task // // Block requests
ctx, cancel := context.WithTimeout(b.browserCtx, 30*time.Second) // googleAdsDomains := []string{
defer cancel() // "adtrafficquality.google",
// "doubleclick.net",
// "googlesyndication.com",
// "googletagmanager.com",
// "hm.baidu.com",
// "cloudflareinsights.com",
// "fsdoa.js", // adblock detection
// "https://www.linovelib.com/novel/", // prevent redirects from the local file to the online page
// }
// err = page.Route("**/*", func(route playwright.Route) {
// for _, d := range googleAdsDomains {
// if strings.Contains(route.Request().URL(), d) {
// b.logger.Debug("blocking request", slog.String("url", route.Request().URL()))
// err := route.Abort("aborted")
// if err != nil {
// b.logger.Debug("failed to block request", route.Request().URL(), err)
// }
// return
// }
// }
// _ = route.Continue()
// })
// if err != nil {
// return "", fmt.Errorf("failed to intercept requests: %w", err)
// }
var processedHTML string _, err = page.ExpectResponse(func(url string) bool {
return strings.Contains(url, "chapterlog.js")
}, func() error {
_, err = page.Goto("file://" + filepath.ToSlash(tempFilePath))
if err != nil {
return fmt.Errorf("could not navigate to file: %w", err)
}
return nil
}, playwright.PageExpectResponseOptions{
Timeout: playwright.Float(10000),
})
if err != nil {
return "", fmt.Errorf("failed to wait for network request to finish: %w", err)
}
// Run the processing task err = page.Locator("#acontent").WaitFor(playwright.LocatorWaitForOptions{
err = chromedp.Run(ctx, State: playwright.WaitForSelectorStateVisible,
network.Enable(), Timeout: playwright.Float(10000),
})
if err != nil {
return "", fmt.Errorf("could not wait for #acontent: %w", err)
}
// Wait for JavaScript execution to finish // Iterate over all child elements of #acontent, check via window.getComputedStyle().display whether each is none, and remove it from the page if so
chromedp.ActionFunc(func(ctx context.Context) error { result, err := page.Evaluate(`
// Listen for network events (function() {
networkEventChan := make(chan bool, 1) const acontent = document.getElementById('acontent');
requestID := "" if (!acontent) {
chromedp.ListenTarget(ctx, func(ev interface{}) { return 'acontent element not found';
switch ev := ev.(type) { }
case *network.EventRequestWillBeSent:
if strings.Contains(ev.Request.URL, "chapterlog.js") { let removedCount = 0;
requestID = ev.RequestID.String() const elements = acontent.querySelectorAll('*');
}
case *network.EventLoadingFinished: // Iterate from back to front so that removing elements does not affect the indices
if ev.RequestID.String() == requestID { for (let i = elements.length - 1; i >= 0; i--) {
networkEventChan <- true const element = elements[i];
} const computedStyle = window.getComputedStyle(element);
if (computedStyle.display === 'none' || computedStyle.transform == 'matrix(0, 0, 0, 0, 0, 0)') {
element.remove();
removedCount++;
} }
}) }
go func() { return 'Removed ' + removedCount + ' hidden elements';
select { })()
case <-networkEventChan: `)
case <-time.After(30 * time.Second):
log.Println("Timeout waiting for external script")
case <-ctx.Done():
log.Println("Context cancelled")
}
}()
return nil
}),
// Navigate to the local file
chromedp.Navigate("file://"+filepath.ToSlash(tempFilePath)),
// Wait for the page to finish loading
chromedp.WaitVisible(`#acontent`, chromedp.ByID),
// Get the page's HTML
chromedp.OuterHTML("html", &processedHTML, chromedp.ByQuery),
)
if err != nil { if err != nil {
return "", fmt.Errorf("chromedp execution failed: %w", err) return "", fmt.Errorf("failed to remove hidden elements: %w", err)
}
b.logger.Debug("Hidden elements removal result", slog.Any("count", result))
processedHTML, err := page.Content()
if err != nil {
return "", fmt.Errorf("could not get page content: %w", err)
} }
return processedHTML, nil return processedHTML, nil
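
Both the old and the new code build the navigation target as "file://" + filepath.ToSlash(tempFilePath). On Windows that concatenation puts the drive letter where a URL parser expects the host (file://C:/...), which browsers may or may not tolerate. A minimal sketch of a stricter alternative follows; fileURL is a hypothetical helper that is not part of this commit, and it assumes the input is already an absolute path.

```go
package main

import (
	"fmt"
	"net/url"
	"path/filepath"
	"strings"
)

// fileURL is a hypothetical helper (not in the commit) that converts an
// absolute filesystem path into a file:// URL with an empty host.
func fileURL(absPath string) string {
	p := filepath.ToSlash(absPath)
	// Windows paths such as C:/Users/... need a leading slash so the
	// drive letter ends up in the URL path instead of the host position.
	if !strings.HasPrefix(p, "/") {
		p = "/" + p
	}
	// url.URL percent-encodes characters such as spaces in the path.
	return (&url.URL{Scheme: "file", Path: p}).String()
}

func main() {
	fmt.Println(fileURL("/tmp/bilinovel-downloader/temp 1.html"))
	fmt.Println(fileURL("C:/Users/me/AppData/Local/Temp/temp-1.html"))
}
```

Whether the extra normalization matters depends on how the browser driver handles malformed file URLs, so treat this as an option to evaluate rather than a required fix.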

12
downloader/downloader.go Normal file

@@ -0,0 +1,12 @@
package downloader
import "bilinovel-downloader/model"
type Downloader interface {
GetNovel(novelId int, skipChapterContent bool, skipVolumes []int) (*model.Novel, error)
GetVolume(novelId int, volumeId int, skipChapterContent bool) (*model.Volume, error)
GetChapter(novelId int, volumeId int, chapterId int) (*model.Chapter, error)
GetStyleCSS() string
GetExtraFiles() []model.ExtraFile
Close() error
}
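
Moving the interface into its own package makes it easier to substitute a fake in tests. The sketch below shows one possible in-memory fake. The Novel, Volume and Chapter structs are simplified, hypothetical stand-ins for the model package types, the interface is trimmed to three methods, and treating skipVolumes as a list of volume IDs is an assumption; the diff does not say whether it holds IDs or indices.

```go
package main

import "fmt"

// Simplified, hypothetical stand-ins for the model package types.
type Chapter struct {
	ID      int
	Content string
}

type Volume struct {
	ID       int
	Chapters []Chapter
}

type Novel struct {
	ID      int
	Volumes []Volume
}

// Downloader mirrors a trimmed subset of the interface in the diff.
type Downloader interface {
	GetNovel(novelId int, skipChapterContent bool, skipVolumes []int) (*Novel, error)
	GetVolume(novelId int, volumeId int, skipChapterContent bool) (*Volume, error)
	Close() error
}

// fakeDownloader serves canned volumes from memory.
type fakeDownloader struct {
	volumes []Volume
}

// Compile-time check that the fake satisfies the interface.
var _ Downloader = (*fakeDownloader)(nil)

func contains(ids []int, id int) bool {
	for _, v := range ids {
		if v == id {
			return true
		}
	}
	return false
}

func (f *fakeDownloader) GetNovel(novelId int, skipChapterContent bool, skipVolumes []int) (*Novel, error) {
	novel := &Novel{ID: novelId}
	for _, v := range f.volumes {
		// Assumption: skipVolumes holds volume IDs to leave out.
		if contains(skipVolumes, v.ID) {
			continue
		}
		vol, err := f.GetVolume(novelId, v.ID, skipChapterContent)
		if err != nil {
			return nil, err
		}
		novel.Volumes = append(novel.Volumes, *vol)
	}
	return novel, nil
}

func (f *fakeDownloader) GetVolume(novelId int, volumeId int, skipChapterContent bool) (*Volume, error) {
	for _, v := range f.volumes {
		if v.ID != volumeId {
			continue
		}
		out := Volume{ID: v.ID}
		for _, c := range v.Chapters {
			if skipChapterContent {
				c.Content = "" // keep the chapter entry, drop its body
			}
			out.Chapters = append(out.Chapters, c)
		}
		return &out, nil
	}
	return nil, fmt.Errorf("volume %d not found", volumeId)
}

func (f *fakeDownloader) Close() error { return nil }

func main() {
	f := &fakeDownloader{volumes: []Volume{{ID: 1}, {ID: 2}}}
	novel, err := f.GetNovel(2727, true, []int{2})
	fmt.Println(len(novel.Volumes), err)
}
```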


@@ -339,6 +339,8 @@ func addDirContentToZip(zipWriter *zip.Writer, dirPath string, method uint16) er
return err return err
} }
relPath = filepath.ToSlash(relPath)
file, err := os.Open(filePath) file, err := os.Open(filePath)
if err != nil { if err != nil {
return err return err

12
go.mod

@@ -6,24 +6,20 @@ require (
github.com/PuerkitoBio/goquery v1.10.3 github.com/PuerkitoBio/goquery v1.10.3
github.com/a-h/templ v0.3.943 github.com/a-h/templ v0.3.943
github.com/bestnite/font-mapper v0.0.0-20250823155658-56c76d820267 github.com/bestnite/font-mapper v0.0.0-20250823155658-56c76d820267
github.com/chromedp/cdproto v0.0.0-20250803210736-d308e07a266d
github.com/chromedp/chromedp v0.14.1
github.com/go-resty/resty/v2 v2.16.5 github.com/go-resty/resty/v2 v2.16.5
github.com/google/uuid v1.6.0 github.com/google/uuid v1.6.0
github.com/playwright-community/playwright-go v0.5200.1
github.com/spf13/cobra v1.9.1 github.com/spf13/cobra v1.9.1
) )
require ( require (
github.com/andybalholm/cascadia v1.3.3 // indirect github.com/andybalholm/cascadia v1.3.3 // indirect
github.com/chromedp/sysutil v1.1.0 // indirect github.com/deckarep/golang-set/v2 v2.8.0 // indirect
github.com/go-json-experiment/json v0.0.0-20250813233538-9b1f9ea2e11b // indirect github.com/go-jose/go-jose/v3 v3.0.4 // indirect
github.com/gobwas/httphead v0.1.0 // indirect github.com/go-stack/stack v1.8.1 // indirect
github.com/gobwas/pool v0.2.1 // indirect
github.com/gobwas/ws v1.4.0 // indirect
github.com/golang/freetype v0.0.0-20170609003504-e2365dfdc4a0 // indirect github.com/golang/freetype v0.0.0-20170609003504-e2365dfdc4a0 // indirect
github.com/inconshreveable/mousetrap v1.1.0 // indirect github.com/inconshreveable/mousetrap v1.1.0 // indirect
github.com/spf13/pflag v1.0.7 // indirect github.com/spf13/pflag v1.0.7 // indirect
golang.org/x/image v0.30.0 // indirect golang.org/x/image v0.30.0 // indirect
golang.org/x/net v0.43.0 // indirect golang.org/x/net v0.43.0 // indirect
golang.org/x/sys v0.35.0 // indirect
) )

48
go.sum

@@ -1,49 +1,48 @@
github.com/PuerkitoBio/goquery v1.10.3 h1:pFYcNSqHxBD06Fpj/KsbStFRsgRATgnf3LeXiUkhzPo= github.com/PuerkitoBio/goquery v1.10.3 h1:pFYcNSqHxBD06Fpj/KsbStFRsgRATgnf3LeXiUkhzPo=
github.com/PuerkitoBio/goquery v1.10.3/go.mod h1:tMUX0zDMHXYlAQk6p35XxQMqMweEKB7iK7iLNd4RH4Y= github.com/PuerkitoBio/goquery v1.10.3/go.mod h1:tMUX0zDMHXYlAQk6p35XxQMqMweEKB7iK7iLNd4RH4Y=
github.com/a-h/templ v0.3.906 h1:ZUThc8Q9n04UATaCwaG60pB1AqbulLmYEAMnWV63svg=
github.com/a-h/templ v0.3.906/go.mod h1:FFAu4dI//ESmEN7PQkJ7E7QfnSEMdcnu7QrAY8Dn334=
github.com/a-h/templ v0.3.943 h1:o+mT/4yqhZ33F3ootBiHwaY4HM5EVaOJfIshvd5UNTY= github.com/a-h/templ v0.3.943 h1:o+mT/4yqhZ33F3ootBiHwaY4HM5EVaOJfIshvd5UNTY=
github.com/a-h/templ v0.3.943/go.mod h1:oCZcnKRf5jjsGpf2yELzQfodLphd2mwecwG4Crk5HBo= github.com/a-h/templ v0.3.943/go.mod h1:oCZcnKRf5jjsGpf2yELzQfodLphd2mwecwG4Crk5HBo=
github.com/andybalholm/cascadia v1.3.3 h1:AG2YHrzJIm4BZ19iwJ/DAua6Btl3IwJX+VI4kktS1LM= github.com/andybalholm/cascadia v1.3.3 h1:AG2YHrzJIm4BZ19iwJ/DAua6Btl3IwJX+VI4kktS1LM=
github.com/andybalholm/cascadia v1.3.3/go.mod h1:xNd9bqTn98Ln4DwST8/nG+H0yuB8Hmgu1YHNnWw0GeA= github.com/andybalholm/cascadia v1.3.3/go.mod h1:xNd9bqTn98Ln4DwST8/nG+H0yuB8Hmgu1YHNnWw0GeA=
github.com/bestnite/font-mapper v0.0.0-20250823155658-56c76d820267 h1:nmUTJV2u/0XmVjQ++VIy/Hu+MtxdpQvOevvcSZtUATA= github.com/bestnite/font-mapper v0.0.0-20250823155658-56c76d820267 h1:nmUTJV2u/0XmVjQ++VIy/Hu+MtxdpQvOevvcSZtUATA=
github.com/bestnite/font-mapper v0.0.0-20250823155658-56c76d820267/go.mod h1:cfB1e9YhoI/QWrXPp3h6QVAKU6iCI2ifbjRPHP3xf/0= github.com/bestnite/font-mapper v0.0.0-20250823155658-56c76d820267/go.mod h1:cfB1e9YhoI/QWrXPp3h6QVAKU6iCI2ifbjRPHP3xf/0=
github.com/chromedp/cdproto v0.0.0-20250803210736-d308e07a266d h1:ZtA1sedVbEW7EW80Iz2GR3Ye6PwbJAJXjv7D74xG6HU=
github.com/chromedp/cdproto v0.0.0-20250803210736-d308e07a266d/go.mod h1:NItd7aLkcfOA/dcMXvl8p1u+lQqioRMq/SqDp71Pb/k=
github.com/chromedp/chromedp v0.14.1 h1:0uAbnxewy/Q+Bg7oafVePE/6EXEho9hnaC38f+TTENg=
github.com/chromedp/chromedp v0.14.1/go.mod h1:rHzAv60xDE7VNy/MYtTUrYreSc0ujt2O1/C3bzctYBo=
github.com/chromedp/sysutil v1.1.0 h1:PUFNv5EcprjqXZD9nJb9b/c9ibAbxiYo4exNWZyipwM=
github.com/chromedp/sysutil v1.1.0/go.mod h1:WiThHUdltqCNKGc4gaU50XgYjwjYIhKWoHGPTUfWTJ8=
github.com/cpuguy83/go-md2man/v2 v2.0.6/go.mod h1:oOW0eioCTA6cOiMLiUPZOpcVxMig6NIQQ7OS05n1F4g= github.com/cpuguy83/go-md2man/v2 v2.0.6/go.mod h1:oOW0eioCTA6cOiMLiUPZOpcVxMig6NIQQ7OS05n1F4g=
github.com/go-json-experiment/json v0.0.0-20250813233538-9b1f9ea2e11b h1:6Q4zRHXS/YLOl9Ng1b1OOOBWMidAQZR3Gel0UKPC/KU= github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/go-json-experiment/json v0.0.0-20250813233538-9b1f9ea2e11b/go.mod h1:TiCD2a1pcmjd7YnhGH0f/zKNcCD06B029pHhzV23c2M= github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c=
github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/deckarep/golang-set/v2 v2.8.0 h1:swm0rlPCmdWn9mESxKOjWk8hXSqoxOp+ZlfuyaAdFlQ=
github.com/deckarep/golang-set/v2 v2.8.0/go.mod h1:VAky9rY/yGXJOLEDv3OMci+7wtDpOF4IN+y82NBOac4=
github.com/go-jose/go-jose/v3 v3.0.4 h1:Wp5HA7bLQcKnf6YYao/4kpRpVMp/yf6+pJKV8WFSaNY=
github.com/go-jose/go-jose/v3 v3.0.4/go.mod h1:5b+7YgP7ZICgJDBdfjZaIt+H/9L9T/YQrVfLAMboGkQ=
github.com/go-resty/resty/v2 v2.16.5 h1:hBKqmWrr7uRc3euHVqmh1HTHcKn99Smr7o5spptdhTM= github.com/go-resty/resty/v2 v2.16.5 h1:hBKqmWrr7uRc3euHVqmh1HTHcKn99Smr7o5spptdhTM=
github.com/go-resty/resty/v2 v2.16.5/go.mod h1:hkJtXbA2iKHzJheXYvQ8snQES5ZLGKMwQ07xAwp/fiA= github.com/go-resty/resty/v2 v2.16.5/go.mod h1:hkJtXbA2iKHzJheXYvQ8snQES5ZLGKMwQ07xAwp/fiA=
github.com/gobwas/httphead v0.1.0 h1:exrUm0f4YX0L7EBwZHuCF4GDp8aJfVeBrlLQrs6NqWU= github.com/go-stack/stack v1.8.1 h1:ntEHSVwIt7PNXNpgPmVfMrNhLtgjlmnZha2kOpuRiDw=
github.com/gobwas/httphead v0.1.0/go.mod h1:O/RXo79gxV8G+RqlR/otEwx4Q36zl9rqC5u12GKvMCM= github.com/go-stack/stack v1.8.1/go.mod h1:dcoOX6HbPZSZptuspn9bctJ+N/CnF5gGygcUP3XYfe4=
github.com/gobwas/pool v0.2.1 h1:xfeeEhW7pwmX8nuLVlqbzVc7udMDrwetjEv+TZIz1og=
github.com/gobwas/pool v0.2.1/go.mod h1:q8bcK0KcYlCgd9e7WYLm9LpyS+YeLd8JVDW6WezmKEw=
github.com/gobwas/ws v1.4.0 h1:CTaoG1tojrh4ucGPcoJFiAQUAsEWekEWvLy7GsVNqGs=
github.com/gobwas/ws v1.4.0/go.mod h1:G3gNqMNtPppf5XUz7O4shetPpcZ1VJ7zt18dlUeakrc=
github.com/golang/freetype v0.0.0-20170609003504-e2365dfdc4a0 h1:DACJavvAHhabrF08vX0COfcOBJRhZ8lUbR+ZWIs0Y5g= github.com/golang/freetype v0.0.0-20170609003504-e2365dfdc4a0 h1:DACJavvAHhabrF08vX0COfcOBJRhZ8lUbR+ZWIs0Y5g=
github.com/golang/freetype v0.0.0-20170609003504-e2365dfdc4a0/go.mod h1:E/TSTwGwJL78qG/PmXZO1EjYhfJinVAhrmmHX6Z8B9k= github.com/golang/freetype v0.0.0-20170609003504-e2365dfdc4a0/go.mod h1:E/TSTwGwJL78qG/PmXZO1EjYhfJinVAhrmmHX6Z8B9k=
github.com/google/go-cmp v0.5.9/go.mod h1:17dUlkBOakJ0+DkrSSNjCkIjxS6bF9zb3elmeNGIjoY=
github.com/google/go-cmp v0.6.0 h1:ofyhxvXcZhMsU5ulbFiLKl/XBFqE1GSq7atu8tAmTRI= github.com/google/go-cmp v0.6.0 h1:ofyhxvXcZhMsU5ulbFiLKl/XBFqE1GSq7atu8tAmTRI=
github.com/google/go-cmp v0.6.0/go.mod h1:17dUlkBOakJ0+DkrSSNjCkIjxS6bF9zb3elmeNGIjoY= github.com/google/go-cmp v0.6.0/go.mod h1:17dUlkBOakJ0+DkrSSNjCkIjxS6bF9zb3elmeNGIjoY=
github.com/google/uuid v1.6.0 h1:NIvaJDMOsjHA8n1jAhLSgzrAzy1Hgr+hNrb57e+94F0= github.com/google/uuid v1.6.0 h1:NIvaJDMOsjHA8n1jAhLSgzrAzy1Hgr+hNrb57e+94F0=
github.com/google/uuid v1.6.0/go.mod h1:TIyPZe4MgqvfeYDBFedMoGGpEw/LqOeaOT+nhxU+yHo= github.com/google/uuid v1.6.0/go.mod h1:TIyPZe4MgqvfeYDBFedMoGGpEw/LqOeaOT+nhxU+yHo=
github.com/inconshreveable/mousetrap v1.1.0 h1:wN+x4NVGpMsO7ErUn/mUI3vEoE6Jt13X2s0bqwp9tc8= github.com/inconshreveable/mousetrap v1.1.0 h1:wN+x4NVGpMsO7ErUn/mUI3vEoE6Jt13X2s0bqwp9tc8=
github.com/inconshreveable/mousetrap v1.1.0/go.mod h1:vpF70FUmC8bwa3OWnCshd2FqLfsEA9PFc4w1p2J65bw= github.com/inconshreveable/mousetrap v1.1.0/go.mod h1:vpF70FUmC8bwa3OWnCshd2FqLfsEA9PFc4w1p2J65bw=
github.com/ledongthuc/pdf v0.0.0-20220302134840-0c2507a12d80 h1:6Yzfa6GP0rIo/kULo2bwGEkFvCePZ3qHDDTC3/J9Swo= github.com/mitchellh/go-ps v1.0.0 h1:i6ampVEEF4wQFF+bkYfwYgY+F/uYJDktmvLPf7qIgjc=
github.com/ledongthuc/pdf v0.0.0-20220302134840-0c2507a12d80/go.mod h1:imJHygn/1yfhB7XSJJKlFZKl/J+dCPAknuiaGOshXAs= github.com/mitchellh/go-ps v1.0.0/go.mod h1:J4lOc8z8yJs6vUwklHw2XEIiT4z4C40KtWVN3nvg8Pg=
github.com/orisano/pixelmatch v0.0.0-20220722002657-fb0b55479cde h1:x0TT0RDC7UhAVbbWWBzr41ElhJx5tXPWkIHA2HWPRuw= github.com/playwright-community/playwright-go v0.5200.1 h1:Sm2oOuhqt0M5Y4kUi/Qh9w4cyyi3ZIWTBeGKImc2UVo=
github.com/orisano/pixelmatch v0.0.0-20220722002657-fb0b55479cde/go.mod h1:nZgzbfBr3hhjoZnS66nKrHmduYNpc34ny7RK4z5/HM0= github.com/playwright-community/playwright-go v0.5200.1/go.mod h1:UnnyQZaqUOO5ywAZu60+N4EiWReUqX1MQBBA3Oofvf8=
github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM=
github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
github.com/russross/blackfriday/v2 v2.1.0/go.mod h1:+Rmxgy9KzJVeS9/2gXHxylqXiyQDYRxCVz55jmeOWTM= github.com/russross/blackfriday/v2 v2.1.0/go.mod h1:+Rmxgy9KzJVeS9/2gXHxylqXiyQDYRxCVz55jmeOWTM=
github.com/spf13/cobra v1.9.1 h1:CXSaggrXdbHK9CF+8ywj8Amf7PBRmPCOJugH954Nnlo= github.com/spf13/cobra v1.9.1 h1:CXSaggrXdbHK9CF+8ywj8Amf7PBRmPCOJugH954Nnlo=
github.com/spf13/cobra v1.9.1/go.mod h1:nDyEzZ8ogv936Cinf6g1RU9MRY64Ir93oCnqb9wxYW0= github.com/spf13/cobra v1.9.1/go.mod h1:nDyEzZ8ogv936Cinf6g1RU9MRY64Ir93oCnqb9wxYW0=
github.com/spf13/pflag v1.0.6 h1:jFzHGLGAlb3ruxLB8MhbI6A8+AQX/2eW4qeyNZXNp2o=
github.com/spf13/pflag v1.0.6/go.mod h1:McXfInJRrz4CZXVZOBLb0bTZqETkiAhM9Iw0y3An2Bg= github.com/spf13/pflag v1.0.6/go.mod h1:McXfInJRrz4CZXVZOBLb0bTZqETkiAhM9Iw0y3An2Bg=
github.com/spf13/pflag v1.0.7 h1:vN6T9TfwStFPFM5XzjsvmzZkLuaLX+HS+0SeFLRgU6M= github.com/spf13/pflag v1.0.7 h1:vN6T9TfwStFPFM5XzjsvmzZkLuaLX+HS+0SeFLRgU6M=
github.com/spf13/pflag v1.0.7/go.mod h1:McXfInJRrz4CZXVZOBLb0bTZqETkiAhM9Iw0y3An2Bg= github.com/spf13/pflag v1.0.7/go.mod h1:McXfInJRrz4CZXVZOBLb0bTZqETkiAhM9Iw0y3An2Bg=
github.com/stretchr/objx v0.1.0/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME=
github.com/stretchr/testify v1.7.0/go.mod h1:6Fq8oRcR53rry900zMqJjRRixrwX3KX962/h/Wwjteg=
github.com/stretchr/testify v1.10.0 h1:Xv5erBjTwe/5IxqUQTdXv5kgmIvbHo3QQyRwhJsOfJA=
github.com/stretchr/testify v1.10.0/go.mod h1:r2ic/lqez/lEtzL7wO/rwa5dbSLXVDPFyf8C91i36aY=
github.com/yuin/goldmark v1.4.13/go.mod h1:6yULJ656Px+3vBD8DxQVa3kxgyrAnzto9xy5taEt/CY= github.com/yuin/goldmark v1.4.13/go.mod h1:6yULJ656Px+3vBD8DxQVa3kxgyrAnzto9xy5taEt/CY=
golang.org/x/crypto v0.0.0-20190308221718-c2843e01d9a2/go.mod h1:djNgcEr1/C05ACkg1iLfiJU5Ep61QUkGW8qpdssI0+w= golang.org/x/crypto v0.0.0-20190308221718-c2843e01d9a2/go.mod h1:djNgcEr1/C05ACkg1iLfiJU5Ep61QUkGW8qpdssI0+w=
golang.org/x/crypto v0.0.0-20210921155107-089bfa567519/go.mod h1:GvvjBRRGRdwPK5ydBHafDWAxML/pGHZbMvKqRZ5+Abc= golang.org/x/crypto v0.0.0-20210921155107-089bfa567519/go.mod h1:GvvjBRRGRdwPK5ydBHafDWAxML/pGHZbMvKqRZ5+Abc=
@@ -67,8 +66,6 @@ golang.org/x/net v0.15.0/go.mod h1:idbUs1IY1+zTqbi8yxTbhexhEEk5ur9LInksu6HrEpk=
golang.org/x/net v0.21.0/go.mod h1:bIjVDfnllIU7BJ2DNgfnXvpSvtn8VRwhlsaeUTyUS44= golang.org/x/net v0.21.0/go.mod h1:bIjVDfnllIU7BJ2DNgfnXvpSvtn8VRwhlsaeUTyUS44=
golang.org/x/net v0.25.0/go.mod h1:JkAGAh7GEvH74S6FOH42FLoXpXbE/aqXSrIQjXgsiwM= golang.org/x/net v0.25.0/go.mod h1:JkAGAh7GEvH74S6FOH42FLoXpXbE/aqXSrIQjXgsiwM=
golang.org/x/net v0.33.0/go.mod h1:HXLR5J+9DxmrqMwG9qjGCxZ+zKXxBru04zlTvWlWuN4= golang.org/x/net v0.33.0/go.mod h1:HXLR5J+9DxmrqMwG9qjGCxZ+zKXxBru04zlTvWlWuN4=
golang.org/x/net v0.39.0 h1:ZCu7HMWDxpXpaiKdhzIfaltL9Lp31x/3fCP11bc6/fY=
golang.org/x/net v0.39.0/go.mod h1:X7NRbYVEA+ewNkCNyJ513WmMdQ3BineSwVtN2zD/d+E=
golang.org/x/net v0.43.0 h1:lat02VYK2j4aLzMzecihNvTlJNQUq316m2Mr9rnM6YE= golang.org/x/net v0.43.0 h1:lat02VYK2j4aLzMzecihNvTlJNQUq316m2Mr9rnM6YE=
golang.org/x/net v0.43.0/go.mod h1:vhO1fvI4dGsIjh73sWfUVjj3N7CA9WkKJNQm2svM6Jg= golang.org/x/net v0.43.0/go.mod h1:vhO1fvI4dGsIjh73sWfUVjj3N7CA9WkKJNQm2svM6Jg=
golang.org/x/sync v0.0.0-20190423024810-112230192c58/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM= golang.org/x/sync v0.0.0-20190423024810-112230192c58/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
@@ -84,14 +81,11 @@ golang.org/x/sys v0.0.0-20210615035016-665e8c7367d1/go.mod h1:oPkhp1MJrh7nUepCBc
golang.org/x/sys v0.0.0-20220520151302-bc2c85ada10a/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= golang.org/x/sys v0.0.0-20220520151302-bc2c85ada10a/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.0.0-20220722155257-8c9f86f7a55f/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= golang.org/x/sys v0.0.0-20220722155257-8c9f86f7a55f/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.5.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= golang.org/x/sys v0.5.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.6.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.8.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= golang.org/x/sys v0.8.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.12.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= golang.org/x/sys v0.12.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.17.0/go.mod h1:/VUhepiaJMQUp4+oa/7Zr1D23ma6VTLIYjOOTFZPUcA= golang.org/x/sys v0.17.0/go.mod h1:/VUhepiaJMQUp4+oa/7Zr1D23ma6VTLIYjOOTFZPUcA=
golang.org/x/sys v0.20.0/go.mod h1:/VUhepiaJMQUp4+oa/7Zr1D23ma6VTLIYjOOTFZPUcA= golang.org/x/sys v0.20.0/go.mod h1:/VUhepiaJMQUp4+oa/7Zr1D23ma6VTLIYjOOTFZPUcA=
golang.org/x/sys v0.28.0/go.mod h1:/VUhepiaJMQUp4+oa/7Zr1D23ma6VTLIYjOOTFZPUcA= golang.org/x/sys v0.28.0/go.mod h1:/VUhepiaJMQUp4+oa/7Zr1D23ma6VTLIYjOOTFZPUcA=
golang.org/x/sys v0.35.0 h1:vz1N37gP5bs89s7He8XuIYXpyY0+QlsKmzipCbUtyxI=
golang.org/x/sys v0.35.0/go.mod h1:BJP2sWEmIv4KK5OTEluFJCKSidICx8ciO85XgH3Ak8k=
golang.org/x/telemetry v0.0.0-20240228155512-f48c80bd79b2/go.mod h1:TeRTkGYfJXctD9OcfyVLyj2J3IxLnKwHJR8f4D8a3YE= golang.org/x/telemetry v0.0.0-20240228155512-f48c80bd79b2/go.mod h1:TeRTkGYfJXctD9OcfyVLyj2J3IxLnKwHJR8f4D8a3YE=
golang.org/x/term v0.0.0-20201126162022-7de9c90e9dd1/go.mod h1:bj7SfCRtBDWHUb9snDiAeCFNEtKQo2Wmx5Cou7ajbmo= golang.org/x/term v0.0.0-20201126162022-7de9c90e9dd1/go.mod h1:bj7SfCRtBDWHUb9snDiAeCFNEtKQo2Wmx5Cou7ajbmo=
golang.org/x/term v0.0.0-20210927222741-03fcf44c2211/go.mod h1:jbD1KX2456YbFQfuXm/mYQcufACuNUgVhRMnK/tPxf8= golang.org/x/term v0.0.0-20210927222741-03fcf44c2211/go.mod h1:jbD1KX2456YbFQfuXm/mYQcufACuNUgVhRMnK/tPxf8=
@@ -120,4 +114,6 @@ golang.org/x/tools v0.13.0/go.mod h1:HvlwmtVNQAhOuCjW7xxvovg8wbNq7LwfXh/k7wXUl58
golang.org/x/tools v0.21.1-0.20240508182429-e35e4ccd0d2d/go.mod h1:aiJjzUbINMkxbQROHiO6hDPo2LHcIPhhQsa9DLh0yGk= golang.org/x/tools v0.21.1-0.20240508182429-e35e4ccd0d2d/go.mod h1:aiJjzUbINMkxbQROHiO6hDPo2LHcIPhhQsa9DLh0yGk=
golang.org/x/xerrors v0.0.0-20190717185122-a985d3407aa7/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0= golang.org/x/xerrors v0.0.0-20190717185122-a985d3407aa7/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0= gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
gopkg.in/yaml.v3 v3.0.0-20200313102051-9f266ea9e77c/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM= gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=


@@ -1,16 +0,0 @@
package model
type ExtraFile struct {
Data []byte
Path string
ManifestItem ManifestItem
}
type Downloader interface {
GetNovel(novelId int, skipChapter bool) (*Novel, error)
GetVolume(novelId int, volumeId int, skipChapter bool) (*Volume, error)
GetChapter(novelId int, volumeId int, chapterId int) (*Chapter, error)
GetStyleCSS() string
GetExtraFiles() []ExtraFile
Close() error
}


@@ -1,6 +1,14 @@
package model package model
import "encoding/xml" import (
"encoding/xml"
)
type ExtraFile struct {
Data []byte
Path string
ManifestItem ManifestItem
}
type DublinCoreMetadata struct { type DublinCoreMetadata struct {
XMLName xml.Name `xml:"metadata"` XMLName xml.Name `xml:"metadata"`


@@ -8,12 +8,12 @@ import (
) )
func TestBilinovel_GetNovel(t *testing.T) { func TestBilinovel_GetNovel(t *testing.T) {
bilinovel, err := bilinovel.New() bilinovel, err := bilinovel.New(bilinovel.BilinovelNewOption{Concurrency: 5})
bilinovel.SetTextOnly(true) bilinovel.SetTextOnly(true)
if err != nil { if err != nil {
t.Fatalf("failed to create bilinovel: %v", err) t.Fatalf("failed to create bilinovel: %v", err)
} }
novel, err := bilinovel.GetNovel(4519, false) novel, err := bilinovel.GetNovel(2727, false, nil)
if err != nil { if err != nil {
t.Fatalf("failed to get novel: %v", err) t.Fatalf("failed to get novel: %v", err)
} }
@@ -25,12 +25,12 @@ func TestBilinovel_GetNovel(t *testing.T) {
} }
func TestBilinovel_GetVolume(t *testing.T) { func TestBilinovel_GetVolume(t *testing.T) {
bilinovel, err := bilinovel.New() bilinovel, err := bilinovel.New(bilinovel.BilinovelNewOption{Concurrency: 1})
bilinovel.SetTextOnly(true) bilinovel.SetTextOnly(true)
if err != nil { if err != nil {
t.Fatalf("failed to create bilinovel: %v", err) t.Fatalf("failed to create bilinovel: %v", err)
} }
volume, err := bilinovel.GetVolume(1410, 52748, false) volume, err := bilinovel.GetVolume(2727, 129092, false)
if err != nil { if err != nil {
t.Fatalf("failed to get volume: %v", err) t.Fatalf("failed to get volume: %v", err)
} }
@@ -42,11 +42,12 @@ func TestBilinovel_GetVolume(t *testing.T) {
} }
func TestBilinovel_GetChapter(t *testing.T) { func TestBilinovel_GetChapter(t *testing.T) {
bilinovel, err := bilinovel.New() bilinovel, err := bilinovel.New(bilinovel.BilinovelNewOption{Concurrency: 1})
bilinovel.SetTextOnly(true)
if err != nil { if err != nil {
t.Fatalf("failed to create bilinovel: %v", err) t.Fatalf("failed to create bilinovel: %v", err)
} }
chapter, err := bilinovel.GetChapter(1410, 52748, 52752) chapter, err := bilinovel.GetChapter(2727, 129092, 129094)
if err != nil { if err != nil {
t.Fatalf("failed to get chapter: %v", err) t.Fatalf("failed to get chapter: %v", err)
} }
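
In these tests SetTextOnly is called on the value returned by New before err is checked, so a failed constructor that returns a nil pointer could cause a panic instead of a clean test failure. A minimal sketch of the safer ordering follows; client, newClient and setup are hypothetical stand-ins, not the real bilinovel API.

```go
package main

import (
	"errors"
	"fmt"
)

// client is a hypothetical stand-in for the value returned by bilinovel.New.
type client struct{ textOnly bool }

func (c *client) SetTextOnly(v bool) { c.textOnly = v }

// newClient is a hypothetical constructor that rejects non-positive concurrency.
func newClient(concurrency int) (*client, error) {
	if concurrency <= 0 {
		return nil, errors.New("concurrency must be positive")
	}
	return &client{}, nil
}

// setup checks the constructor error before touching the returned pointer.
func setup(concurrency int) (*client, error) {
	c, err := newClient(concurrency)
	if err != nil {
		return nil, fmt.Errorf("failed to create client: %w", err)
	}
	c.SetTextOnly(true)
	return c, nil
}

func main() {
	if _, err := setup(0); err != nil {
		fmt.Println("setup failed:", err)
	}
	c, _ := setup(1)
	fmt.Println("text only:", c.textOnly)
}
```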


@@ -0,0 +1,318 @@
package test
import (
"fmt"
"log"
"strings"
"testing"
"github.com/PuerkitoBio/goquery"
)
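// unscrambleParagraphs takes a list of scrambled paragraphs and
// restores the correct reading order based on chapterID.
// Algorithm source: https://www.bilinovel.com/themes/zhmb/js/chapterlog.js?v1006c1
// Deobfuscation tools: https://obf-io.deobfuscate.io http://jsnice.org
// This approach works, but if bilinovel frequently changes how the initial seed is computed or how the algorithm is implemented, the reordering will break, so playwright may still be the best solution.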
func unscrambleParagraphs(scrambledParagraphs []*goquery.Selection, chapterID int) []*goquery.Selection {
j := len(scrambledParagraphs)
// Per the JS logic, no reordering is done when there are 20 or fewer paragraphs
if j <= 20 {
return scrambledParagraphs
}
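// 1. Faithfully replicate the JS pseudo-random number generator and shuffle to obtain the correct index mapping.
// Initialize the seed
ms := int64(chapterID*127 + 235)
// The value slice holds the relative indices (0, 1, 2...) of the paragraphs from index 20 onward that are to be shuffled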
value := make([]int, j-20)
for i := range value {
value[i] = i
}
// Run exactly the same Fisher-Yates-like shuffle as the JS
for i := len(value) - 1; i > 0; i-- {
ms = (ms*9302 + 49397) % 233280
prop := int(float64(ms) / 233280.0 * float64(i+1))
// Swap the elements
value[i], value[prop] = value[prop], value[i]
}
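// 2. Build the final index mapping table (aProperties).
// This table tells us which position in the correctly ordered list each item of the scrambled list belongs at.
aProperties := make([]int, j)
// The first 20 paragraphs keep their order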
for i := range 20 {
aProperties[i] = i
}
// Later paragraphs use the shuffled indices plus an offset of 20
for i := range value {
aProperties[i+20] = value[i] + 20
}
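// 3. Restore the correct order from the scrambled list using the index mapping.
// JS logic: elements[aProperties[i]] = out[i].node
// In other words: item `i` of the scrambled list (scrambledParagraphs[i])
// belongs at position `aProperties[i]` in the final sorted list.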
correctlyOrdered := make([]*goquery.Selection, j)
for i := range j {
correctPosition := aProperties[i]
correctlyOrdered[correctPosition] = scrambledParagraphs[i]
}
return correctlyOrdered
}
func TestResortDom(t *testing.T) {
// --- Step 1: Prepare the raw HTML ---
// Paste the complete, unprocessed HTML source fetched via an HTTP request here.
// The raw HTML you provided earlier is used here as an example.
unprocessedHtmlContent := `
<!DOCTYPE html>
<html lang="zh-Hans">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>女主角? 圣女? 不,我是全业女仆(自豪)! 第1章 第1话 目标成为女仆的少女_哔哩轻小说</title>
<meta name="keywords" content="女主角? 圣女? 不,我是全业女仆(自豪)!,第1话 目标成为女仆的少女,哔哩轻小说" />
<meta name="description" content="哔哩轻小说提供 あてきち 所创作的 女主角? 圣女? 不,我是全业女仆(自豪)! 第1章 第1话 目标成为女仆的少女 在线阅读与TXT,epub下载" />
<meta name="viewport" content="initial-scale=1.0,minimum-scale=1.0,user-scalable=yes,width=device-width" />
<meta name="theme-color" content="#232323" media="(prefers-color-scheme: dark)" />
<meta name="applicable-device" content="mobile" />
<link rel="stylesheet" href="https://www.bilinovel.com/themes/zhmb/css/read.css?v0409c2">
<link rel="stylesheet" href="https://www.bilinovel.com/themes/zhmb/css/chapter.css?v1126a9">
<link rel="dns-preconnect" href="https://www.bilinovel.com">
<link rel="alternate" hreflang="zh-Hant" href="https://tw.linovelib.com/novel/4126/236197.html" />
<script src="https://www.bilinovel.com/themes/zhmb/js/jquery-3.3.1.js"></script>
<script type="text/javascript" src="/scripts/darkmode.js"></script>
<script async src="https://www.bilinovel.com/themes/zhmb/js/lazysizes.min.js"></script>
<script src="https://www.bilinovel.com/scripts/common.js?v0922a3"></script>
<script src="https://www.bilinovel.com/scripts/zation.js?v1004a4"></script>
<style>.center-note{text-align: center; margin: 0; height: 50vh; display: flex ; justify-content: center; align-items: center;}.sum1{display:none}.footlink a{box-shadow: 0 0 1px rgba(150,150,150,.6);}.footlink a:nth-child(1){display: inline-block;margin-bottom: 10px;width: 90%;}.footlink a:nth-child(2){padding: 5px 10px;float: left;width: 35%;margin-left: 5%;}.footlink a:nth-child(3){padding: 5px 10px;float: right;width: 35%;margin-right: 5%;}.footlink a:nth-child(4){display: inline-block;margin-top: 10px;width: 90%;}#acontent{text-align: unset;}</style>
<script type="text/javascript">var ual = navigator.language.toLowerCase();var isWindows = navigator.platform.toLowerCase().includes("win");if(ual == 'zh-tw' || ual == 'zh-hk'){window.location.replace("https://tw.linovelib.com/novel/4126/236197.html");}if (ual === 'zh-cn' && isWindows) { window.location.replace("https://www.linovelib.com/novel/4126/236197.html");}</script>
</head>
<body id="aread">
<script type="text/javascript">var ReadParams={url_previous:'/novel/4126/236196.html',url_next:'/novel/4126/236197_2.html',url_index:'/novel/4126/catalog',url_articleinfo:'/novel/4126/vol_236194.html',url_image:'https://www.bilinovel.com/files/article/image/4/4126/4126s.jpg',url_home:'https://www.bilinovel.com/',articleid:'4126',articlename:'女主角? 圣女? 不,我是全业女仆(自豪)!',subid:'/4',author:'あてきち',chapterid:'236197',page:'1',chaptername:'第1章 第1话 目标成为女仆的少女',chapterisvip:'0',userid:'0',readtime:'1761057661'}</script>
<div class="main">
<div id="abox" class="abox">
<div id="apage" class="apage">
<div class="atitle"><h1 id="atitle">第1话 目标成为女仆的少女</h1><h3>第1章</h3></div>
<div id="acontent" class="contente"><div class="cgo"><!--<script async src="https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js?client=ca-pub-8799828951681010"
crossorigin="anonymous"></script>
<ins class="adsbygoogle"
style="display:block"
data-ad-client="ca-pub-8799828951681010"
data-ad-slot="2277430192"
data-ad-format="auto"
data-full-width-responsive="true"></ins>
<script>
(adsbygoogle = window.adsbygoogle || []).push({});
</script>--></div><p>「欢迎回来,老爷。」</p>
<br>
<p>一位少女恭敬地弯腰向走进木质大门的绅士致意。</p>
<p>少女穿着一件做工精致的黑色连衣裙,上面系着花边以及刺绣、并不华丽的纯白围裙,梳成编辫的黑发上系着可爱的蕾丝头带。</p>
<br>
<p>无论从哪个角度看,都是迎接主人归来的女仆样子。</p>
<br>
<p>「啊,我回来了」</p>
<br>
<p>绅士把帽子和大衣交给恭敬地弯腰的女仆,用温柔的语气回答。</p>
<br>
<p>「我马上为您准备茶水。请问您想要哪一款?」</p>
<p>「那么,我想要一杯伯爵红茶。」</p>
<p>「要加牛奶之类的吗?」</p>
<p>「不,不用了。」</p>
<p>「遵命。茶点要什么呢?」</p>
<p>「嗯,就交给你吧。拜托了?」</p>
<br>
<p>对着绅士的话语,身为女仆的少女露出了轻柔的微笑。她可能只有十五、六岁吧。脸上还带着稚气,但未来值得期待,可爱又温柔的容貌。</p>
<br>
<p>「请交给我,我会准备合您口味的茶点。」</p>
<p>「啊,拜托了。」</p>
<br>
<p>女仆少女将帽子和大衣挂在衣架上,然后引导绅士到餐桌。</p>
<br>
<p></p>
<br>
<p>「那么,我要出门了。」</p>
<p>「好的,老爷」</p>
<p>「下次回来时,如果能再让妳接待就好了……」</p>
<br>
<p>「下次她想要带朋友在露台喝茶,也希望你能照顾他们。」</p>
<br>
<p>轻轻敲门后,听到「请进」的回答,少女走进了房间行礼。</p>
<p>一个少女嘟囔着。那是一位身穿简素蓝色连衣裙的少女。闪闪发光的银色头发留到了胸口。有着神秘的琉璃色瞳孔的美丽可爱少女站在母亲身旁。</p>
<p>送走绅士后,女仆少女前往总管的房间。</p>
<br>
<p>「欸,对我不需要用这种说话方式吧?……律子酱。」</p>
<p>薪水丰厚的兼职让她顺利存下了留学费用,留学之日即将到来。</p>
<p>「拜托了!」</p>
<br>
<p>女仆少女律子满脸笑容地回答。</p>
<br>
<p>「话说回来,律子酱。上次来的坂上夫人很喜欢你呢。上次寄来的邮件里相当称赞。她说下次还打算指名。」</p>
<br>
<p>「失礼了Miss 阿曼达。关于刚才离开宅邸的老爷报告……」</p>
<br>
<p>被叫做律子的女仆少女张开眼,刚才还散发着女仆气息的模样一下子变回稚气十足的少女,她嘟起嘴说道。</p>
<br>
<p>「这样很好啊!」</p>
<p>女仆少女律子满脸笑容地回答。</p>
<p>对担忧这一点的父母来说,当时的律子的情况无疑让人开心。</p>
<br>
<p>因此,父母并未反对女儿出人意表的宣言。</p>
<br>
<p>标题叫『深窗的公主的悲恋』。</p>
<p>优雅的动作,没有任何不自然的温柔笑容。仿佛是女仆典范一般的少女。看着她的身影,总管阿曼达皱了皱眉。不,这是因为……</p>
<p>「怎么了?瑟蕾丝蒂?」</p>
<br>
<p>「啊,拜托了。那么……」</p>
<p>「一路顺风,老爷。」</p>
<p>「遵命。我会将您的意愿转达给<ruby>女仆总管<rp>(</rp><rt>家政妇</rt><rp>)</rp></ruby>。」</p>
<p>(公主身后的女仆们是多么的优秀啊!)</p>
<br>
<p>「你真的很喜欢做这种工作呢。这样一来就得早晨开始准备了。下次我会去问问她们的希望。」</p>
<br>
<p>这部电影以旧时英国贵族的故事为题材。描述了一位在呵护下长大的贵族千金,偶然认识一位平民青年,并陷入爱河的故事。最后,因为身份差异,两人自尽,悲剧结局。</p>
<br>
<p>父母看着律子的身影,感到非常开心。</p>
<p>女仆们使出各种手段帮助她与男子相会。</p>
<p>在女仆的影响下,律子对各种事物产生了兴趣,玩耍、笑声、学习,成长为一个非常优秀的女儿。自从遇见女仆以来,好奇心无止境,虽然年龄和性格相比有些幼稚,但对父母来说,女仆这个存在也是让人有好感的。</p>
<br>
<p>她的名字是瑞波律子,二十岁,现在是大学二年级的学生。</p>
<br>
<p>「我讨厌那个名字啊。明明是日本人,却叫阿曼达……」 <span style="color: rgb(61142185);">(*亚万田日语念成阿曼达)</span></p>
<br>
<p>当然,因为主角是英国贵族千金,所以电影里并没有描绘女仆们努力的场景。但正因为如此,律子对在幕后默默支持的女仆们十分感动。</p>
<p>「……本来应该是这样的啊。」</p>
<p>来这家女仆咖啡厅的客人并不仅仅是男性。这家店的男女客人比例几乎是一比一。</p>
<br>
<p>会员制高级女仆咖啡厅『<ruby>贵族的日常<rp>(</rp><rt>Noble's One Day</rt><rp>)</rp></ruby>』。</p>
<br>
<p>生活了六年,律子慢慢的成长,但她却不对事物报持热情。喜欢的玩具和书籍都没有,看电视也不会表现出太多兴趣。</p>
<br>
<p>「拜托了!」</p>
<br>
<p>那是瑞波律子还不懂爱情的六岁春天的事……先不管给一个六岁小孩看悲恋电影的问题。</p>
<p>是被称为女仆总管的女性,亚万田凪沙创建的店。</p>
<br>
<p>「我在大学毕业后,想在英国成为真正的女仆!」</p>
<br>
<p>「好的,请放心交给我!」</p>
<br>
<p>「欸,真的吗!? 就是上周来过的那位温柔的女士吗?」</p>
<p>男士需穿着西装,女士需穿着礼服,这是服装规定。特别为女性客人提供服装租赁服务,因此女性客人可以享受穿着平时难得一穿的贵族少女或贵妇风格的洋装,扮演女主人的角色。</p>
<br>
<p>虽然二十岁了,律子的脸庞略显年幼,她是这家店最受欢迎的女仆。</p>
<br>
<p>看过这部电影的观众都为两人的悲恋流泪,感动不已。</p>
<br>
<p>从那时起,律子就迷上了女仆。她向父母说明了女仆是多么伟大的存在,并激动地宣布有一天她也会成为女仆。</p>
<p>完全预约制,到店时会有指名的女仆迎接。此时店员会完全扮演女仆角色,客人不是客人身份,而是扮演女仆的主人,享受其中。</p>
<br>
<p>一切都是顺风顺水。距离成为女仆只剩下最后一步!</p>
<br>
<p>美丽的行礼后,少女向绅士回以温柔的微笑。绅士推开门离开了。</p>
<br>
<p>律子的梦想是成为女仆。原因非常简单,那是因为她小时候看过的一部电影。</p>
<br>
<p>在父母的支持下,律子在大学学习外语、历史、文学、礼仪等,以成为女仆为目标,在本格派女仆咖啡厅进行女仆训练的日常。</p>
<br>
<p>「那么,我也可以帮忙准备衣服和化妆吗?」</p>
<p>「讨厌!再让我扮一下女仆也没关系嘛,亚万田小姐!」</p>
<br>
<p>绅士略显羞涩地说着,女仆的少女露出了微笑回答。</p>
<p>然而,律子却对另一方面感动不已。</p>
<br>
<br>
<br>
<br>
<br>
<p>支付是预付制,店内不谈金钱。没有菜单,女仆会自然接受点单。客人只需要享受那片刻的主人时光即可。</p>
<p>女主角的贵族千金拥有很温柔的人格,所以她的女仆们也非常喜爱她。</p>
<br>
<p>为了筹集到英国留学的资金,进入大学的律子开始寻找兼职工作。她认为对未来有帮助的工作是最好的,于是找到了这家女仆咖啡厅。</p><div class="cgo"><script async src="https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js?client=ca-pub-8799828951681010"
crossorigin="anonymous"></script>
<ins class="adsbygoogle"
style="display:block"
data-ad-client="ca-pub-8799828951681010"
data-ad-slot="9085546976"
data-ad-format="auto"
data-full-width-responsive="true"></ins>
<script>
(adsbygoogle = window.adsbygoogle || []).push({});
</script></div>
</div>
</div>
</div>
<div id="toptext" class="toptext" style="display:none;"></div>
<div id="bottomtext" class="bottomtext" style="display:none;"></div>
<div id="operatetip" class="operatetip" style="display:none;" onclick="this.style.display='none';">
<div class="tipl"><p>翻上页</p></div>
<div class="tipc"><p>呼出功能<br><br><small>漫画&插图<br>建议使用上下翻页</small><br><br><small>【翻页模式】章评·默认隐藏</small></p></div>
<div class="tipr"><p>翻下页</p></div>
</div>
</div>
<div id="footlink" class="footlink"><a onclick="window.location.href = ReadParams.url_previous;">序章 路多帕克家的大小姐以及万能女仆</a><a onclick="window.location.href = ReadParams.url_index;">目录</a><a onclick="window.location.href = ReadParams.url_articleinfo;">书页</a><a onclick="window.location.href = ReadParams.url_next;">下一頁</a></div>
<script>$(document).ready(function(){var prevpage="/novel/4126/236196.html";var nextpage="/novel/4126/236197_2.html";var bookpage="/novel/4126.html";$("body").keydown(function(event){var isInput=event.target.tagName==='INPUT'||event.target.tagName==='TEXTAREA';if(!isInput){if(event.keyCode==37){location=prevpage}else if(event.keyCode==39){location=nextpage}}})});</script>
<script type="text/javascript" src="https://www.bilinovel.com/themes/zhmb/js/readtools.js?42sfaj-8"></script>
<script type="text/javascript" src="https://www.bilinovel.com/scripts/json2.js"></script>
<script type="text/javascript" src="https://www.bilinovel.com/themes/zhmb/js/chapterlog.js?v1006c1"></script>
<script async src="https://www.googletagmanager.com/gtag/js?id=G-1K4JZ603WH"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', 'G-1K4JZ603WH');
</script>
<script>
var _hmt = _hmt || [];
(function() {
var hm = document.createElement("script");
hm.src = "https://hm.baidu.com/hm.js?6f9595b2c4b57f95a93aa5f575a77fb0";
var s = document.getElementsByTagName("script")[0];
s.parentNode.insertBefore(hm, s);
})();
</script>
<!--<script>
if ('serviceWorker' in navigator) {
navigator.serviceWorker.getRegistrations().then(function(registrations) {
for (let registration of registrations) {
registration.unregister();
}
});
}
</script>-->
<script defer src="https://static.cloudflareinsights.com/beacon.min.js/vcd15cbe7772f49c399c6a5babf22c1241717689176015" integrity="sha512-ZpsOmlRQV6y907TI0dKBHq9Md29nnaEIPlkf84rnaERnq6zvWvPUqr2ft8M1aS28oN72PdrCzSjY4U6VaAw1EQ==" data-cf-beacon='{"version":"2024.11.0","token":"192783771d59492782cd05bd12eb61b9","r":1,"server_timing":{"name":{"cfCacheStatus":true,"cfEdge":true,"cfExtPri":true,"cfL4":true,"cfOrigin":true,"cfSpeedBrain":true},"location_startswith":null}}' crossorigin="anonymous"></script>
</body>
</html>`
// --- Step 2: Parse the HTML and extract key information ---
doc, err := goquery.NewDocumentFromReader(strings.NewReader(unprocessedHtmlContent))
if err != nil {
log.Fatalf("failed to parse HTML: %v", err)
}
chapterID := 236197
// --- Step 3: Collect all paragraphs that need reordering ---
var scrambledParagraphs []*goquery.Selection
doc.Find("#acontent p").Each(func(i int, s *goquery.Selection) {
// Only add non-empty paragraphs, consistent with the JS logic
if len(strings.TrimSpace(s.Text())) > 0 {
scrambledParagraphs = append(scrambledParagraphs, s)
}
})
fmt.Printf("Found %d scrambled paragraphs in the raw HTML; ready to reorder.\n\n", len(scrambledParagraphs))
// --- Step 4: Run the reordering algorithm ---
correctlyOrderedParagraphs := unscrambleParagraphs(scrambledParagraphs, chapterID)
// --- Step 5: Print the final result ---
fmt.Println("--- Final content restored to the correct order ---")
for i, p := range correctlyOrderedParagraphs {
fmt.Printf("%d: %s\n", i+1, p.Text())
}
}
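
TestResortDom only prints the reordered paragraphs and asserts nothing, so a regression in the shuffle would go unnoticed. One way to check the mapping without a browser or goquery is a round trip on plain strings, sketched below. permutation reuses the seed formula and constants from unscrambleParagraphs above; scramble is a hypothetical inverse added purely for the check and is not taken from chapterlog.js.

```go
package main

import "fmt"

// permutation returns the index mapping (aProperties in the test above):
// the scrambled item at index i belongs at position perm[i].
func permutation(n, chapterID int) []int {
	perm := make([]int, n)
	for i := range perm {
		perm[i] = i
	}
	if n <= 20 {
		return perm // short chapters are left untouched
	}
	seed := int64(chapterID)*127 + 235
	tail := perm[20:] // only paragraphs from index 20 onward are shuffled
	for i := len(tail) - 1; i > 0; i-- {
		seed = (seed*9302 + 49397) % 233280
		j := int(float64(seed) / 233280.0 * float64(i+1))
		tail[i], tail[j] = tail[j], tail[i]
	}
	return perm
}

// unscramble places each scrambled item at its mapped position.
func unscramble(scrambled []string, chapterID int) []string {
	out := make([]string, len(scrambled))
	for i, p := range permutation(len(scrambled), chapterID) {
		out[p] = scrambled[i]
	}
	return out
}

// scramble is a hypothetical inverse of unscramble, used only for testing.
func scramble(ordered []string, chapterID int) []string {
	out := make([]string, len(ordered))
	for i, p := range permutation(len(ordered), chapterID) {
		out[i] = ordered[p]
	}
	return out
}

// equal reports whether two string slices have the same contents.
func equal(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	ordered := make([]string, 30)
	for i := range ordered {
		ordered[i] = fmt.Sprintf("paragraph %d", i)
	}
	restored := unscramble(scramble(ordered, 236197), 236197)
	fmt.Println("round trip ok:", equal(restored, ordered))
}
```

A round trip only proves that the two functions are inverses of each other. Confirming that the mapping matches the site still requires comparing against a chapter whose correct order is known.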