Notify


This is a simple script I threw together last weekend, but it really shows the power of combining Python with the API-driven world we live in these days.

Quick tip to Product Makers: If your product doesn’t have an API then just don’t even bother releasing it!

Enough ranting, on to the code!

Create your Virtual Environment

So, the first thing you need to do is set up a virtual environment to keep your Python dependencies separate from your system Python. Note that I will be using Python 3 throughout.

virtualenv -p python3 some_folder/venv
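
Then activate the environment so that pip installs packages into it rather than system-wide:

source some_folder/venv/bin/activate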

Install feedparser

We are going to start with the code that parses the RSS feed, and for that we will need the feedparser module. So let's use pip to install it:

pip install feedparser

Using feedparser

Create a Python file and name it whatever you want. I am going to name mine main.py, because I have been writing a lot of C code lately for my operating systems class, and because naming things is hard.

Let’s take a look at the first snippet of code:

import feedparser

rss = 'https://www.reddit.com/r/Python/.rss'
feed = feedparser.parse(rss)
for entry in feed['entries']:
    print(entry)

This will print a large mess of JSON-like data (Python dicts) to the terminal. Some people like to use pprint to nicely format the dict structures, but I usually just throw them into a JSON formatter so I can get a better feel for what fields exist and what I may want to extract.
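
For example, to pretty-print just the first entry and see which fields it contains:

import pprint

pprint.pprint(feed['entries'][0])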

Let us say that for this example we want the title and URL of each post to the /r/Python subreddit.

We can get that with:

import feedparser

rss = 'https://www.reddit.com/r/Python/.rss'
feed = feedparser.parse(rss)
for entry in feed['entries']:
    title = entry['title']
    url = entry['links'][0]['href']
    print('{} - {}'.format(title, url))

Adding the ability to search for keywords

Now, that is fairly useful, but you could do the same with IFTTT without writing a line of code. Let's make this script a little more useful by having it match only when certain keywords appear in the title of an entry, and let's add a simple text-file database to keep track of the entries we have seen before.

For a lot of RSS feeds we could skip the database and use the date to decide whether an entry is new. But in some feeds, entries are added with older dates even though they are actually “new”. That is one reason I am using the text-file database. I could get a bit fancier and use a local SQLite database, but for a script this simple that is really just overkill, in my humble opinion.

Now, here is our updated code with those features:

# Just some sample keywords to search for in the title
key_words = ['Flask', 'Pyramid', 'JOB']

# get the urls we have seen before; the file may not
# exist yet on the first run
try:
    with open('viewed_urls.txt', 'r') as f:
        urls = [url.rstrip() for url in f]  # strip the trailing '\n'
except FileNotFoundError:
    urls = []

def contains_wanted(in_str):
    # returns true if the in_str contains a keyword
    # we are interested in. Case-insensitive
    for wrd in key_words:
        if wrd.lower() in in_str:
            return True
    return False

def url_is_new(urlstr):
    # returns true if the url string is not already in the
    # list of urls loaded from the text file
    return urlstr not in urls

rss = 'https://www.reddit.com/r/Python/.rss'
feed = feedparser.parse(rss)
for entry in feed['entries']:
    url = entry['links'][0]['href']
    title = entry['title']

    if contains_wanted(title.lower()) and url_is_new(url):
        print('{} - {}'.format(title, url))

        msgtitle = title
        msg = '{}\n{}'.format(title, url)

        # send_pb_msg(msgtitle, msg)  # uncomment once send_pb_msg is defined (next section)

        with open('viewed_urls.txt', 'a') as f:
            f.write('{}\n'.format(url))

Adding notifications

Now, let’s use a service called Pushbullet to make this script really useful. If you are not familiar with Pushbullet, it allows you to send and receive data on all your devices at once.

We will use a Python library called pushbullet.py (Yes, the .py is actually in the name) which is a thin wrapper around the Pushbullet REST API.
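
It lives on PyPI under that same name, so install it into the virtual environment:

pip install pushbullet.py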

To use Pushbullet's API through the wrapper library you will need to sign up for an API Access Token at: https://www.pushbullet.com/#settings/account

A note on API key security

Obviously, it’s not secure to keep your API key in source code, especially if you are uploading that code to GitHub. A better approach is to set the key in an environment variable and read it from your script, as sketched below. For a script running on your own, non-shared machine, though, hard-coding it will work.
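
If you do go the environment-variable route, here is a minimal sketch. Note that PB_ACCESS_TOKEN is just a name I picked for this example; use whatever you like:

import os

# assumes you ran something like: export PB_ACCESS_TOKEN='your-token-here'
ACCESS_TOKEN = os.environ['PB_ACCESS_TOKEN']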

The pushbullet function

Replace the sample API key below with your real API key.

from pushbullet import Pushbullet

def send_pb_msg(title, msg):
    ACCESS_TOKEN = 'THIS IS NOT MY TOKEN'

    pb = Pushbullet(ACCESS_TOKEN)
    pb.push_note(title, msg)
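
To give it a quick test, call the function from a Python shell with your real token in place; a note should show up on all of your Pushbullet devices:

send_pb_msg('Test', 'Hello from the RSS notifier')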

Running the script repeatedly with cron

To run this script repeatedly, I created a simple bash script that I call from a cron job. For more info on cron, see my earlier tutorial: How to use Cron

The bash script run_main.sh

#!/bin/bash

cd /home/tim/rss_notifier
source venv/bin/activate
python main.py
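
Make the script executable so cron can run it:

chmod +x /home/tim/rss_notifier/run_main.sh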

And here is the cron entry I added with crontab -e; it runs the script every five minutes and appends all output to a log file:

*/5 * * * * /home/tim/rss_notifier/run_main.sh >> /home/tim/rss_notifier/run_main.log 2>&1 

The full script main.py

Putting it all together, here is the complete script:

import feedparser
from pushbullet import Pushbullet



def send_pb_msg(title, msg):
    ACCESS_TOKEN = 'NOT MY ACCESS TOKEN'

    pb = Pushbullet(ACCESS_TOKEN)
    pb.push_note(title, msg)

# Just some sample keywords to search for in the title
key_words = ['Flask', 'Pyramid', 'JOB']

# get the urls we have seen before; the file may not
# exist yet on the first run
try:
    with open('viewed_urls.txt', 'r') as f:
        urls = [url.rstrip() for url in f]  # strip the trailing '\n'
except FileNotFoundError:
    urls = []

def contains_wanted(in_str):
    # returns true if the in_str contains a keyword
    # we are interested in. Case-insensitive
    for wrd in key_words:
        if wrd.lower() in in_str:
            return True
    return False

def url_is_new(urlstr):
    # returns true if the url string is not already in the
    # list of urls loaded from the text file
    return urlstr not in urls

rss = 'https://www.reddit.com/r/Python/.rss'
feed = feedparser.parse(rss)
for entry in feed['entries']:
    url = entry['links'][0]['href']
    title = entry['title']

    if contains_wanted(title.lower()) and url_is_new(url):
        print('{} - {}'.format(title, url))

        msgtitle = title
        msg = '{}\n{}'.format(title, url)

        send_pb_msg(msgtitle, msg)

        with open('viewed_urls.txt', 'a') as f:
            f.write('{}\n'.format(url))
