podcastparser

podcastparser is a simple and fast podcast feed parser library in Python. The two primary users of the library are the gPodder Podcast Client and the gpodder.net web service.

The following feed types are supported: RSS and Atom.

Supported specifications include Podlove Simple Chapters (versions 1.1 and 1.2).

These formats only specify the possible markup elements and attributes. We recommend that you also read the Podcast Feed Best Practice guide if you want to optimize your feeds for best display in podcast clients.

Where times and durations are used, the values are expected to be formatted either as seconds or as RFC 2326 Normal Play Time (NPT).
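To illustrate the two accepted time formats, here is a minimal sketch that converts such a value to seconds. The helper `npt_to_seconds` is hypothetical and not part of podcastparser's API; it only handles plain seconds and the `hh:mm:ss` form with optional fractional seconds.

```python
def npt_to_seconds(value: str) -> float:
    """Convert '128' or '01:02:03.5' to seconds (hypothetical helper)."""
    value = value.strip()
    if ":" not in value:
        # Plain seconds, optionally with a fractional part
        return float(value)
    # NPT hh:mm:ss form; the seconds field may carry a fraction
    hours, minutes, seconds = value.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


print(npt_to_seconds("128"))
print(npt_to_seconds("01:00:00"))
```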

Example

Using the built-in urllib.request module from Python 3:

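import pprint
import urllib.request

import podcastparser

feedurl = 'http://example.com/feed.xml'

# Open the feed and hand the file-like response to the parser;
# the with block closes the connection afterwards
with urllib.request.urlopen(feedurl) as response:
    parsed = podcastparser.parse(feedurl, response)

# parsed is a dict
pprint.pprint(parsed)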

Using Requests:

import podcastparser
import requests

url = 'https://example.net/podcast.atom'

with requests.get(url, stream=True) as response:
    response.raw.decode_content = True
    parsed = podcastparser.parse(url, response.raw)

# parsed is a dict
import pprint
pprint.pprint(parsed)
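
Once a feed is parsed, the resulting dict can be walked like any other Python data structure. The sketch below assumes that episodes are stored under an `episodes` key and that each episode carries a `title` and a list of `enclosures` with a `url` entry. These key names are assumptions to check against the module documentation, and `enclosure_urls` is a hypothetical helper, not part of podcastparser.

```python
def enclosure_urls(parsed: dict) -> list:
    """Collect (episode title, enclosure URL) pairs from a parsed feed dict."""
    pairs = []
    # The key names used here are assumptions, not confirmed by this document
    for episode in parsed.get("episodes", []):
        for enclosure in episode.get("enclosures", []):
            pairs.append((episode.get("title"), enclosure.get("url")))
    return pairs


# Hand-written stand-in for the dict returned by podcastparser.parse()
sample = {
    "title": "Example Podcast",
    "episodes": [
        {
            "title": "Episode 1",
            "enclosures": [{"url": "http://example.com/ep1.mp3"}],
        },
    ],
}
print(enclosure_urls(sample))
```

Using `.get()` with defaults keeps the helper from raising if a feed has no episodes or an episode has no enclosures.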

Supported XML Elements and Attributes

For both RSS and Atom feeds, only a subset of elements (those that are relevant to podcast client applications) is parsed. This section describes which elements and attributes are parsed and how the contents are interpreted/used.

RSS

rss@xml:base
Base URL for all relative links in the RSS file.
rss/channel
Podcast.
rss/channel/title
Podcast title (whitespace is squashed).
rss/channel/link
Podcast website.
rss/channel/description
Podcast description (whitespace is squashed).
rss/channel/itunes:summary
Podcast description (whitespace is squashed).
rss/channel/image/url
Podcast cover art.
rss/channel/itunes:image
Podcast cover art (alternative).
rss/channel/itunes:type
Podcast type (whitespace is squashed). Either 'episodic' or 'serial'.
rss/channel/itunes:keywords
Podcast keywords (whitespace is squashed).
rss/channel/atom:link@rel=payment
Podcast payment URL (e.g. Flattr).
rss/channel/generator
A string indicating the program used to generate the channel (e.g. MightyInHouse Content System v2.3).
rss/channel/language
Podcast language.
rss/channel/itunes:author
The group responsible for creating the show.
rss/channel/itunes:owner
The podcast owner contact information. The <itunes:owner> tag information is for administrative communication about the podcast and isn't displayed in Apple Podcasts.
rss/channel/itunes:category
The show category information.
rss/channel/itunes:explicit
Indicates whether podcast contains explicit material.
rss/channel/itunes:new-feed-url
The new podcast RSS Feed URL.
rss/channel/podcast:locked
Whether the podcast is currently locked from being transferred.
rss/channel/podcast:funding
Funding link for podcast.
rss/redirect/newLocation
The new podcast RSS Feed URL.
rss/channel/item
Episode.
rss/channel/item/guid
Episode unique identifier (GUID), mandatory.
rss/channel/item/title
Episode title (whitespace is squashed).
rss/channel/item/link
Episode website.
rss/channel/item/description
Episode description. If it contains HTML, it is returned as description_html; otherwise it is returned as description (whitespace is squashed). See Mozilla's article "Why RSS Content Module is Popular".
rss/channel/item/itunes:summary
Episode description (whitespace is squashed).
rss/channel/item/itunes:subtitle
Episode subtitle / one-line description (whitespace is squashed).
rss/channel/item/content:encoded
Episode description in HTML. Best source for description_html.
rss/channel/item/itunes:duration
Episode duration.
rss/channel/item/pubDate
Episode publication date.
rss/channel/item/atom:link@rel=payment
Episode payment URL (e.g. Flattr).
rss/channel/item/atom:link@rel=enclosure
File download URL (@href), size (@length) and mime type (@type).
rss/channel/item/itunes:image
Episode art URL.
rss/channel/item/media:thumbnail
Episode art URL.
rss/channel/item/media:group/media:thumbnail
Episode art URL.
rss/channel/item/media:content
File download URL (@url), size (@fileSize) and mime type (@type).
rss/channel/item/media:group/media:content
File download URL (@url), size (@fileSize) and mime type (@type).
rss/channel/item/enclosure
File download URL (@url), size (@length) and mime type (@type).
rss/channel/item/psc:chapters
Podlove Simple Chapters, version 1.1 and 1.2.
rss/channel/item/psc:chapters/psc:chapter
Chapter entry (@start, @title, @href and @image).
rss/channel/item/itunes:explicit
Indicates whether episode contains explicit material.
rss/channel/item/itunes:author
The group responsible for creating the episode.
rss/channel/item/itunes:season
The season number of the episode.
rss/channel/item/itunes:episode
An episode number.
rss/channel/item/itunes:episodeType
The episode type. This flag is used if an episode is a trailer or bonus content.
rss/channel/item/podcast:chapters
The URL of a JSON file describing the chapters. Only the URL is added to the data, as fetching an external URL would be unsafe.
rss/channel/item/podcast:person
A person involved in the episode, e.g. host or guest.
rss/channel/item/podcast:transcript
The URL of the transcript file associated with this episode.
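
As a rough illustration of how a few of the RSS elements above map to episode data, the following sketch reads a tiny hand-written feed with the standard library's `xml.etree.ElementTree`. It is not how podcastparser is implemented; the sample feed, the `squash` helper and the output keys are all made up for this example.

```python
import xml.etree.ElementTree as ET

RSS = """<rss version="2.0">
  <channel>
    <title>  Example
        Podcast </title>
    <item>
      <guid>ep-1</guid>
      <title>Episode   1</title>
      <enclosure url="http://example.com/ep1.mp3" length="1234" type="audio/mpeg"/>
    </item>
  </channel>
</rss>"""


def squash(text):
    """Collapse runs of whitespace into single spaces and trim the ends."""
    return " ".join(text.split())


channel = ET.fromstring(RSS).find("channel")
podcast_title = squash(channel.findtext("title"))

episodes = []
for item in channel.findall("item"):
    enclosure = item.find("enclosure")
    episodes.append({
        "guid": item.findtext("guid"),
        "title": squash(item.findtext("title")),
        # rss/channel/item/enclosure: @url, @length and @type
        "url": enclosure.get("url"),
        "size": int(enclosure.get("length")),
        "mime_type": enclosure.get("type"),
    })

print(podcast_title)
print(episodes)
```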

Atom

For Atom feeds, podcastparser will handle the following elements and attributes:

atom:feed
Podcast.
atom:feed/atom:title
Podcast title (whitespace is squashed).
atom:feed/atom:subtitle
Podcast description (whitespace is squashed).
atom:feed/atom:icon
Podcast cover art.
atom:feed/atom:link@href
Podcast website.
atom:feed/atom:entry
Episode.
atom:feed/atom:entry/atom:id
Episode unique identifier (GUID), mandatory.
atom:feed/atom:entry/atom:title
Episode title (whitespace is squashed).
atom:feed/atom:entry/atom:link@rel=enclosure
File download URL (@href), size (@length) and mime type (@type).
atom:feed/atom:entry/atom:link@rel=(self|alternate)
Episode website.
atom:feed/atom:entry/atom:link@rel=payment
Episode payment URL (e.g. Flattr).
atom:feed/atom:entry/atom:content
Episode description (in HTML or plaintext).
atom:feed/atom:entry/atom:published
Episode publication date.
atom:feed/atom:entry/media:thumbnail
Episode art URL.
atom:feed/atom:entry/media:group/media:thumbnail
Episode art URL.
atom:feed/atom:entry/psc:chapters
Podlove Simple Chapters, version 1.1 and 1.2.
atom:feed/atom:entry/psc:chapters/psc:chapter
Chapter entry (@start, @title, @href and @image).
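
Atom elements live in the Atom XML namespace, and the meaning of an `atom:link` depends on its `rel` attribute, as the list above shows. The sketch below demonstrates that lookup with the standard library's `xml.etree.ElementTree` on a hand-written feed. It is an illustration only and not podcastparser's implementation; the sample data and output keys are invented for this example.

```python
import xml.etree.ElementTree as ET

# Namespace prefix mapping used for element lookups
NS = {"atom": "http://www.w3.org/2005/Atom"}

ATOM = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Podcast</title>
  <entry>
    <id>urn:example:ep-1</id>
    <title>Episode 1</title>
    <link rel="alternate" href="http://example.com/ep1.html"/>
    <link rel="enclosure" href="http://example.com/ep1.ogg"
          length="5678" type="audio/ogg"/>
  </entry>
</feed>"""

feed = ET.fromstring(ATOM)

entries = []
for entry in feed.findall("atom:entry", NS):
    # Index the entry's links by their rel attribute
    links = {link.get("rel"): link for link in entry.findall("atom:link", NS)}
    enclosure = links["enclosure"]
    entries.append({
        "guid": entry.findtext("atom:id", namespaces=NS),
        "title": entry.findtext("atom:title", namespaces=NS),
        "website": links["alternate"].get("href"),
        # atom:link@rel=enclosure: @href, @length and @type
        "url": enclosure.get("href"),
        "size": int(enclosure.get("length")),
        "mime_type": enclosure.get("type"),
    })

print(entries)
```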

The podcastparser module

.. automodule:: podcastparser
   :members:

Unsupported Namespaces

This is a list of podcast-related XML namespaces that are not yet supported by podcastparser, but might be in the future.

Chapter Marks

  • rawvoice RSS: Rating, Frequency, Poster, WebM, MP4, Metamark (kind of chapter-like markers)
  • IGOR: Chapter Marks

Others

  • libSYN RSS Extensions: contactPhone, contactEmail, contactTwitter, contactWebsite, wallpaper, pdf, background
  • Comment API: Comments to a given item (readable via RSS)
  • MVCB: Error Reports To Field (usually a mailto: link)
  • Syndication Module: Update period, frequency and base (for skipping updates)
  • Creative Commons RSS: Creative commons license for the content
  • Pheedo: Original link to website and original link to enclosure (without going through pheedo redirect)
  • WGS84: Geo-Coordinates per item
  • Conversations Network: Intro duration in milliseconds (for skipping the intro), ratings
  • purl DC Elements: dc:creator (author / creator of the podcast, possibly with e-mail address)
  • Tristana: tristana:self (canonical URL to feed)
  • Blip: Show name, show page, picture, username, language, rating, thumbnail_src, license
