# Paroles.net Scraper

A Python package to fetch song lyrics from [paroles.net](https://www.paroles.net/).

## Features

- Fetches song lyrics from paroles.net
- Cleans up advertisement content from lyrics
- Handles URL construction for different artists and songs
- Command-line interface for easy usage
- Comprehensive test suite
- Installable Python package

## Installation

1. Clone or download this repository
2. Install the package in development mode:

   ```bash
   pip install -e .
   ```

   Or, if you're using uv:

   ```bash
   uv sync
   ```

## Usage

### Command Line Interface

After installation, the `paroles-scraper` command accepts the artist and song title in either of two formats:

1. A single argument, with the artist and title separated by a dash:

   ```bash
   paroles-scraper "Artist Name - Song Title"
   ```

2. Two separate arguments:

   ```bash
   paroles-scraper "Artist Name" "Song Title"
   ```

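Both formats reduce to the same artist/title pair. The sketch below shows one way a CLI could normalize them with `argparse`. It is a hypothetical illustration, not the package's actual entry point: the function name `parse_cli_args` is invented here, and splitting on the first ` - ` (space, dash, space) is an assumption about how the single-argument form is parsed.

```python
from __future__ import annotations

import argparse


def parse_cli_args(argv: list[str] | None = None) -> tuple[str, str]:
    """Return (artist, title) from either supported argument format.

    Hypothetical sketch; the real paroles-scraper entry point may differ.
    """
    parser = argparse.ArgumentParser(prog="paroles-scraper")
    parser.add_argument(
        "query",
        nargs="+",
        help='either "Artist - Title" or "Artist" "Title"',
    )
    args = parser.parse_args(argv)

    if len(args.query) == 2:
        # Two-argument form: artist and title are already separate.
        artist, title = args.query
    elif len(args.query) == 1 and " - " in args.query[0]:
        # Single-argument form (assumed separator " - "): split on the first
        # occurrence only, so titles that contain " - " stay intact.
        artist, title = args.query[0].split(" - ", 1)
    else:
        # parser.error() prints the message to stderr and exits with status 2.
        parser.error('expected "Artist - Title" or two arguments: "Artist" "Title"')

    return artist.strip(), title.strip()
```

For example, `parse_cli_args(["SCORPIONS - SEND ME AN ANGEL"])` and `parse_cli_args(["Ed Sheeran", "Shape of You"])` both yield an `(artist, title)` tuple that can be passed straight to a lyrics lookup.
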
### Examples

```bash
paroles-scraper "SCORPIONS - SEND ME AN ANGEL"
paroles-scraper "Ed Sheeran" "Shape of You"
paroles-scraper "Imagine Dragons" "Believer"
```

### As a Python Package

You can also use the package directly in your Python code:

```python
from paroles_net_scraper import get_song_lyrics

lyrics = get_song_lyrics("Ed Sheeran", "Shape of You")
print(lyrics)
```

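When looking up several songs, it is polite to space out requests to paroles.net. The following is a minimal sketch, assuming `get_song_lyrics(artist, title)` behaves as in the example above. `fetch_many` and its `delay` parameter are hypothetical helpers that are not part of the package; the `fetch` argument exists only so the helper can be exercised without network access.

```python
from __future__ import annotations

import time
from typing import Callable, Iterable


def fetch_many(
    songs: Iterable[tuple[str, str]],
    fetch: Callable[[str, str], object] | None = None,
    delay: float = 1.0,
) -> dict[tuple[str, str], object]:
    """Fetch lyrics for several (artist, title) pairs, pausing between requests.

    Hypothetical helper; `fetch` defaults to paroles_net_scraper.get_song_lyrics.
    """
    if fetch is None:
        # Imported lazily so the helper can be tested with a stub fetcher.
        from paroles_net_scraper import get_song_lyrics as fetch

    results: dict[tuple[str, str], object] = {}
    for index, (artist, title) in enumerate(songs):
        if index > 0 and delay > 0:
            time.sleep(delay)  # wait between requests, not before the first one
        results[(artist, title)] = fetch(artist, title)
    return results
```

For example, `fetch_many([("Ed Sheeran", "Shape of You"), ("Imagine Dragons", "Believer")])` returns a dict keyed by `(artist, title)`, waiting one second between the two lookups.
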
## Testing

The project includes a comprehensive test suite using pytest. To run the tests:

```bash
pytest tests/ -v
```

Or, if you're using uv:

```bash
uv run pytest tests/ -v
```

## Code Quality

This project uses Ruff for linting and formatting, and pre-commit hooks to ensure code quality.

### Linting and Formatting

- **Ruff** is used for both linting and formatting
- **pre-commit** hooks are installed to automatically check and format code before committing

To manually run linting:

```bash
ruff check .
```

To automatically fix linting issues:

```bash
ruff check . --fix
```

To format code:

```bash
ruff format .
```

### Pre-commit Hooks

Once they have been installed into the repository with `pre-commit install`, the hooks run automatically whenever you commit changes. To run all pre-commit hooks manually against every file:

```bash
pre-commit run --all-files
```

### Development Dependencies

To install the development dependencies, including Ruff and pre-commit:

```bash
uv pip install -e ".[dev]"
```

The quotes keep shells such as zsh from treating `[dev]` as a glob pattern.

Or using Make:

```bash
make install-dev
```

## CI/CD

This project includes a GitLab CI configuration (`.gitlab-ci.yml`) that:

- Runs tests on multiple Python versions
- Builds the package using uv
- Can deploy to PyPI (when configured with credentials)

To use the GitLab CI pipeline:

1. Push your code to a GitLab repository
2. Ensure your GitLab runner is configured
3. Set up PyPI credentials as CI/CD variables if you want to deploy

## Building and Uploading to the Gitea Package Registry

This project includes scripts to build and upload packages to Gitea's PyPI package registry.

### Prerequisites

1. Create a personal access token in Gitea with package registry permissions
2. Configure `~/.pypirc` with your Gitea credentials:

   ```ini
   [distutils]
   index-servers = gitea

   [gitea]
   repository = https://gitea.parano.ch/api/packages/herel/pypi
   username = herel
   password = YOUR_GITEA_PERSONAL_ACCESS_TOKEN
   ```

You can use the template provided in `GITEA_PYPI_CONFIG_TEMPLATE.txt` as a starting point.

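Before uploading, it can help to confirm that `~/.pypirc` is complete without printing the token. A minimal sketch using the standard library's `configparser`; `check_pypirc` is a hypothetical helper, not one of the project's scripts, and treating any value that starts with `YOUR_` as an unfilled placeholder is an assumption based on the example above.

```python
from __future__ import annotations

import configparser
from pathlib import Path

REQUIRED_KEYS = ("repository", "username", "password")


def check_pypirc(path: Path | None = None, section: str = "gitea") -> list[str]:
    """Return problems found in a .pypirc file; an empty list means it looks usable.

    Hypothetical helper. Values are never included in the messages, so the
    token does not leak into logs.
    """
    path = path or Path.home() / ".pypirc"
    # interpolation=None: a token containing "%" must not be expanded.
    config = configparser.ConfigParser(interpolation=None)
    if not config.read(path):
        return [f"{path} not found or unreadable"]

    problems = []
    servers = config.get("distutils", "index-servers", fallback="").split()
    if section not in servers:
        problems.append(f"'{section}' is not listed in [distutils] index-servers")
    if not config.has_section(section):
        problems.append(f"missing [{section}] section")
        return problems

    for key in REQUIRED_KEYS:
        value = config.get(section, key, fallback="").strip()
        if not value:
            problems.append(f"[{section}] has no '{key}'")
        elif value.startswith("YOUR_"):
            # Assumed placeholder convention, as in the example config.
            problems.append(f"[{section}] '{key}' still holds a placeholder")
    return problems
```

Calling `check_pypirc()` with no arguments inspects `~/.pypirc`; with the example config above, unchanged, it reports only that `password` still holds a placeholder.
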
### Using the Build and Upload Scripts (Development Only)

These scripts are intended only for developers working on the project; they are not installed with the package.

#### Option 1: Using the shell script

```bash
./scripts/build_and_upload.sh
```

#### Option 2: Using the Python script

```bash
uv run scripts/build_and_upload.py
```

#### Option 3: Using Make

```bash
make upload
```

### Manual Build and Upload

If you prefer to build and upload manually:

1. Build the package:

   ```bash
   uv build
   ```

2. Upload to Gitea:

   ```bash
   uv run twine upload --repository gitea dist/*
   ```

### Package Availability

The package is published to Gitea's PyPI registry under the `herel` user namespace. In Gitea, packages belong to an owner (a user or an organization) rather than to a specific repository.

You can view and manage your packages at: https://gitea.parano.ch/herel/-/packages

So although the package is associated with the `herel` account, which also owns this repository, it is not directly linked to this repository in Gitea's package registry.

### Installing from Gitea PyPI

To install the package from Gitea's registry:

```bash
pip install --index-url https://YOUR_GITEA_TOKEN@gitea.parano.ch/api/packages/herel/pypi/simple --no-deps paroles-net-scraper
```

Replace `YOUR_GITEA_TOKEN` with your personal access token. Because `--no-deps` stops pip from installing dependencies, install `requests` and `beautifulsoup4` separately.
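
Typing the token into the command line leaves it in your shell history. The following is a minimal sketch of an alternative. It assumes the token is stored in an environment variable named `GITEA_TOKEN`, which is a hypothetical name, and it builds the same index URL as the `pip install` command above, percent-encoding the token. The URL is then handed to pip through `PIP_INDEX_URL`, since pip reads its options from `PIP_*` environment variables, so the token does not appear in pip's argument list.

```python
from __future__ import annotations

import os
import subprocess
import sys
from urllib.parse import quote

# Same index URL as the pip command above, with the token as URL user info.
INDEX_TEMPLATE = "https://{token}@gitea.parano.ch/api/packages/herel/pypi/simple"


def gitea_index_url(token: str) -> str:
    """Build the Gitea index URL, percent-encoding the token."""
    return INDEX_TEMPLATE.format(token=quote(token, safe=""))


def install_from_gitea(package: str = "paroles-net-scraper") -> int:
    """Install `package` with pip, reading the token from $GITEA_TOKEN (hypothetical name)."""
    token = os.environ.get("GITEA_TOKEN")
    if not token:
        raise RuntimeError("set GITEA_TOKEN to your Gitea personal access token")
    # Pass the index URL via the environment instead of the command line.
    env = dict(os.environ, PIP_INDEX_URL=gitea_index_url(token))
    cmd = [sys.executable, "-m", "pip", "install", "--no-deps", package]
    return subprocess.run(cmd, env=env, check=False).returncode
```

`install_from_gitea()` returns pip's exit code, so a wrapper script can fail loudly when the installation does not succeed.
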

## How it works

The package builds a paroles.net URL from the artist name and song title, downloads that page, and parses the HTML with BeautifulSoup, keeping only the lyrics text and filtering out advertisements and other unwanted content.
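
The two steps can be sketched with the standard library alone. This is an illustration under stated assumptions, not the package's code. The URL pattern `https://www.paroles.net/<artist-slug>/paroles-<title-slug>`, the slug rules, and the lyrics container `<div class="song-text">` are all assumptions about the site's layout. The sketch also swaps BeautifulSoup for the standard `html.parser` module so it runs without extra dependencies, and it treats nested `<div>`, `<script>`, and `<style>` elements inside the container as ad slots to skip.

```python
from __future__ import annotations

import re
import unicodedata
from html.parser import HTMLParser

BASE_URL = "https://www.paroles.net"  # assumed URL layout, see lead-in


def slugify(text: str) -> str:
    """Lowercase ASCII slug, e.g. "Céline Dion" -> "celine-dion"."""
    ascii_text = (
        unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def build_lyrics_url(artist: str, title: str) -> str:
    return f"{BASE_URL}/{slugify(artist)}/paroles-{slugify(title)}"


class LyricsExtractor(HTMLParser):
    """Collect text inside the (assumed) lyrics div, skipping nested divs and scripts."""

    def __init__(self, container_class: str = "song-text") -> None:
        super().__init__()
        self.container_class = container_class
        self.depth = 0  # <div> nesting depth inside the container; 0 = outside
        self.skip = 0  # > 0 while inside a skipped element (assumed ad slot)
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if self.depth == 0:
            classes = (dict(attrs).get("class") or "").split()
            if tag == "div" and self.container_class in classes:
                self.depth = 1
        elif tag == "div":
            self.depth += 1
            self.skip += 1
        elif tag in ("script", "style"):
            self.skip += 1
        elif tag == "br" and not self.skip:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if self.depth == 0:
            return
        if tag == "div":
            self.depth -= 1
            if self.depth > 0:
                self.skip -= 1  # closed a nested (skipped) div
        elif tag in ("script", "style"):
            self.skip -= 1

    def handle_data(self, data):
        if self.depth > 0 and not self.skip:
            self.parts.append(data)

    def text(self) -> str:
        # Strip each line and drop blank ones (this also drops stanza breaks).
        lines = (line.strip() for line in "".join(self.parts).splitlines())
        return "\n".join(line for line in lines if line)


def extract_lyrics(html: str) -> str:
    parser = LyricsExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Under these assumptions, `build_lyrics_url("SCORPIONS", "SEND ME AN ANGEL")` returns `https://www.paroles.net/scorpions/paroles-send-me-an-angel`; downloading that page (for example with `requests.get`) and passing its `.text` to `extract_lyrics` completes the pipeline.
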

## Disclaimer

This package is for educational purposes only. Please respect the terms of service of paroles.net and use this package responsibly. Consider the legal and ethical implications of web scraping before using this tool.

## Dependencies

- Python 3.7+
- requests
- beautifulsoup4
- pytest (for running tests)
- uv (for dependency management and packaging)