Monday, June 9, 2014

Building a Customized News Service with Google Apps Script

A big part of my job involves staying on top of current technology events and news. There are several technology-related news sites that I read daily, and typically I scan them first thing in the morning for anything I consider to be important before reading other items of interest. It’s a bit tedious to open each site and skim every story to find the ones I care about, and even the blog readers I use aren't well-suited for a quick morning summary of important items.

I've been playing around with Google Apps Script for a while now, and I’m pretty impressed with the versatility and power available via the APIs coupled with the simplicity of creating and running scripts. So, I decided to see if I could write a script that would scan the news sites I read each morning and deliver a customized news feed to my Gmail inbox, along with a summary of my day’s calendar. I was originally going to try the same thing with AppEngine and cron tasks, but it turns out Apps Script is more than enough to handle this job. I’m pretty happy with the results, and in this post I’ll discuss how I created the script.

Overview

The script itself is pretty simple: it scans a set of news feeds, creates links to the stories, and checks each headline for specific keywords. Headlines that match are called out separately from the rest of the news. It also reads my calendar for the day’s events and tells me if I have any upcoming events that I have not yet RSVP’d to.

Getting started with Google Apps Script development is pretty easy; just make sure that you have a Google account, are signed in, and visit script.google.com.

To get this script to work, there are a couple of things you'll need to do:
  1. Enable the Calendar API. From within the script editor, select Resources -> Advanced Google Services, then enable Calendar API.
  2. You also need to enable the Calendar API from within the Google Developers Console. There's a link to do this in the dialog in step 1.

What It Looks Like

You create scripts using the Apps Script IDE, which is of course hosted in the cloud. It's not exactly Eclipse or Visual Studio, but it's very functional and gets the job done:

[click image for a larger view]

When the script runs, it sends an email to the address of whoever is running it. The screenshot below shows what the resulting email looks like:

[click image for a larger view]

At the top, I have my daily events including ones that need my response, followed by my curated top stories, then the rest of the news items.

The script relies on some global settings to work, which I've listed here:

// data feed URLs
var dataSources = [
  "http://gigaom.com/feed/",
  "http://feeds.reuters.com/reuters/technologyNews?format=xml",
  "http://www.engadget.com/rss-hd.xml",
  "http://feeds2.feedburner.com/thenextweb",
  "http://feeds.arstechnica.com/arstechnica/index?format=xml",
  "http://www.forbes.com/technology/feed/",
  "http://www.pcworld.com/index.rss"
];
// keyword triggers
var keyWords = [
  "chrome", "chromebook", "chromeos", "google", "android", "gmail", "cloud", "app engine",
  "appengine", "compute engine", "microsoft", "facebook", "apple", "windows phone", "windows 8"
];
// List to hold headlines that contain keywords
var topStories = [];
// Settings
var HEADLINE_LIMIT = 15;            // Number of headlines per news source
var EMAIL_TITLE = "The Day Ahead";  // What to title the email
var DAYS_AHEAD = 7;                 // Number of days out to scan events
The main function of this script is called deliverNews(). It looks like this:

function deliverNews()
{
  var newsMsg = ""; // will hold the completed HTML to email
  var deliverAddress = Session.getActiveUser().getEmail();
  var calEventsStr = "<h2>Calendar</h2>";
  // get a list of today's events
  var calEvents = getEventsForToday();
  if (calEvents.length > 0) {
    calEventsStr += "<p>You have " + calEvents.length + " events today</p>";
    calEventsStr += buildEventsHTML(calEvents);
  }
  else {
    calEventsStr += "<p>No events today</p>";
  }
  // Get upcoming calendar events that have not been responded to
  calEvents = getEventsMissingResponse();
  if (calEvents.length > 0) {
    calEventsStr += "<p>You have " + calEvents.length + " events in the next " +
      DAYS_AHEAD + " days that you have not RSVP'd to:</p>";
    calEventsStr += buildEventsHTML(calEvents);
  }
  // Collect the headlines from the feeds and filter the top stories
  var feedStoriesStr = "";
  for (var i=0; i < dataSources.length; i++) {
    feedStoriesStr += retrieveFeedItems(dataSources[i]);
  }
  // Generate the Top Stories list that was created based on keywords
  var topStoriesStr = "<h2>Top Stories</h2>";
  if (topStories.length > 0) {
    topStoriesStr += "<ul>";
    for (var k=0; k<topStories.length; k++) {
      topStoriesStr += "<li style='font-weight:bold'><a href='" + topStories[k].link + "'>" +
        topStories[k].title + "</a></li>\n";
    }
    topStoriesStr += "</ul>";
  }
  // put all the data together
  newsMsg = "<h1>" + EMAIL_TITLE + "</h1>\n" + calEventsStr + topStoriesStr + feedStoriesStr;
  // Deliver the email message as HTML to the recipient
  GmailApp.sendEmail(deliverAddress, EMAIL_TITLE, "", { htmlBody: newsMsg });
  Logger.log(newsMsg.length);
}
The deliverNews function does several things:
  1. Reads the Calendar for the day's upcoming events
  2. Checks the week ahead to see if there are any events I have not RSVP'd to
  3. Reads the news feeds and builds the Top Stories list
  4. Puts the rest of the news stories into their own sections

Reading the Calendar

Let’s start by looking at the code to generate the calendar summary. Google Apps Script has an advanced service called Calendar, which is a thin wrapper that closely mirrors the Calendar REST API.

To get the events for the upcoming day, I use the Calendar service to retrieve the events using time bounds and a setting to expand recurring events into single instances:

function getEventsForToday() {
  var returnEvents = null;
  // set the lower bound at midnight (including milliseconds)
  var today1 = new Date();
  today1.setHours(0, 0, 0, 0);
  // set the upper bound at 23:59:59.999
  var today2 = new Date();
  today2.setHours(23, 59, 59, 999);
  // Create ISO strings to pass to Calendar API
  var ds1 = today1.toISOString();
  var ds2 = today2.toISOString();
  var result = Calendar.Events.list("primary", {singleEvents: true, timeMin: ds1, timeMax: ds2});
  // Get the events (fall back to an empty list so callers can check .length)
  returnEvents = result.items || [];
  return returnEvents;
}
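
The day-bounds logic can also be pulled out into a small helper, which makes it easier to check on its own. This is only a sketch: `dayBounds` is a hypothetical name, not part of the script, and it assumes the script's local time zone is the one you care about.

```javascript
// Hypothetical helper (not in the original script): returns the first and
// last millisecond of the local calendar day containing `date`, as ISO
// strings suitable for the timeMin/timeMax parameters.
function dayBounds(date) {
  var start = new Date(date.getTime());
  start.setHours(0, 0, 0, 0);       // 00:00:00.000 local time
  var end = new Date(date.getTime());
  end.setHours(23, 59, 59, 999);    // 23:59:59.999 local time
  return { timeMin: start.toISOString(), timeMax: end.toISOString() };
}

// Example: bounds for June 9, 2014 (month index 5), 10:30 local time
var bounds = dayBounds(new Date(2014, 5, 9, 10, 30));
```

Passing the date in as a parameter, rather than calling `new Date()` inside the helper, is what makes it possible to try the function against a fixed day.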
The getEventsForToday() function returns a list of Event resources. I then just need to scan each event for the title, URL, and date and build the resulting list:

function buildEventsHTML(calEvents) {
  var str="";
  str += "<ul>";
  for (var i=0; i < calEvents.length; i++) {
    // Gotcha! All-day events don't have a dateTime, just a date, so need to check
    var dateStr = convertDate(calEvents[i].start.dateTime ?
                              calEvents[i].start.dateTime :
                              calEvents[i].start.date).toLocaleString();
    str += "<li><a href='" + calEvents[i].htmlLink + "'>" +
      calEvents[i].summary + "</a> " + dateStr + "</li>";
  }
  str += "</ul>";
  return str;
}
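
One thing to keep in mind: event summaries (and, later, feed titles) are dropped into the HTML as-is, so a title containing `<`, `&`, or a quote could garble the email. A minimal sketch of one way to guard against that follows. Both `escapeHtml` and `buildLinkItem` are hypothetical helpers, not part of the script.

```javascript
// Hypothetical helper: escape the characters that are special in HTML text
// and in quoted attribute values. "&" is replaced first so the entities
// produced by the later replacements are not escaped a second time.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical helper: build one list item with an escaped link and title.
function buildLinkItem(url, title) {
  return "<li><a href='" + escapeHtml(url) + "'>" + escapeHtml(title) + "</a></li>";
}

var item = buildLinkItem("http://example.com/?a=1&b=2", "Q&A: <live> at 5");
```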

You’ll probably notice that I wrote my own convertDate function to generate a Date. Why? Because the Calendar service returns dates as ISO 8601 (RFC 3339) strings. In ECMAScript 5, you can pass ISO date strings directly to the Date() constructor to create a date, but as of this writing the JavaScript version that Apps Script appears to be using doesn’t support this, so I made my own:

function convertDate(tStr) {
  // Note: the UTC offset is matched but not applied, so the result is the
  // event's wall-clock time interpreted in the script's time zone
  var dateTimeRE = /(\d+)-(\d+)-(\d+)T(\d+):(\d+):(\d+)(?:\.\d+)?(Z|[+\-]\d+:\d+)/;
  var dateRE = /(\d+)-(\d+)-(\d+)/;
  var match = tStr.match(dateTimeRE);
  if (!match)
    match = tStr.match(dateRE);
  var nums = [];
  if (match) {
    // only the first six groups are numeric; stop before the offset group
    for (var i = 1; i < match.length && i <= 6; i++) {
      nums.push(parseInt(match[i], 10));
    }
    if (match.length > 4) {
      // YYYY-MM-DDTHH:MM:SS
      return(new Date(nums[0], nums[1] - 1, nums[2], nums[3], nums[4], nums[5]));
    }
    else {
      // YYYY-MM-DD
      return(new Date(nums[0], nums[1] - 1, nums[2]));
    }
  }
  else return null;
}
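
Note that convertDate discards the UTC offset, which works as long as the calendar and the script share a time zone. If that assumption doesn't hold, a variant that applies the offset might look like the sketch below. `parseRfc3339` is a hypothetical name, and the regular expression only covers the common `YYYY-MM-DD` and `YYYY-MM-DDTHH:MM:SS` shapes with a `Z` or `+HH:MM` / `-HH:MM` suffix.

```javascript
// Hypothetical alternative to convertDate that honors the UTC offset.
// Returns a Date for the exact instant, or null if the string doesn't match.
function parseRfc3339(tStr) {
  var re = /^(\d{4})-(\d{2})-(\d{2})(?:T(\d{2}):(\d{2}):(\d{2})(?:\.\d+)?(Z|[+\-]\d{2}:\d{2}))?$/;
  var m = re.exec(tStr);
  if (!m) return null;
  var year = parseInt(m[1], 10);
  var month = parseInt(m[2], 10) - 1;   // Date months are zero-based
  var day = parseInt(m[3], 10);
  if (!m[4]) {
    // Date-only value (all-day event): use local midnight
    return new Date(year, month, day);
  }
  // Treat the wall-clock fields as if they were UTC...
  var utcMs = Date.UTC(year, month, day,
                       parseInt(m[4], 10), parseInt(m[5], 10), parseInt(m[6], 10));
  // ...then subtract the offset to get the real instant
  var offsetMinutes = 0;
  if (m[7] !== "Z") {
    var sign = m[7].charAt(0) === "-" ? -1 : 1;
    offsetMinutes = sign * (parseInt(m[7].substr(1, 2), 10) * 60 +
                            parseInt(m[7].substr(4, 2), 10));
  }
  return new Date(utcMs - offsetMinutes * 60000);
}

// 09:30 at UTC-7 is the same instant as 16:30 UTC
var meeting = parseRfc3339("2014-06-09T09:30:00-07:00");
```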

Next, I need to find events that I have not RSVP’d to. I use the Calendar service to retrieve events for the next DAYS_AHEAD days (seven, by default), then scan each one to find the attendee record that corresponds to my email address. If one is found, I check its responseStatus field, which will be "needsAction" if I haven't responded to it:

function getEventsMissingResponse() {
  var d = new Date();
  var now = d.toISOString();
  var then = new Date(d.getTime() + (1000 * 60 * 60 * 24 * DAYS_AHEAD)).toISOString();
  // look up my address once instead of on every attendee
  var myEmail = Session.getActiveUser().getEmail();
  var returnEvents = [];
  // Find future events that have not been responded to yet
  var events = Calendar.Events.list("primary", {singleEvents: true, timeMin: now, timeMax: then});
  var items = events.items || [];
  for (var i=0; i < items.length; i++) {
    var attendees = items[i].attendees;
    if (attendees) {
      for (var j=0; j<attendees.length; j++) {
        if (attendees[j].email && attendees[j].email == myEmail) {
          if (attendees[j].responseStatus == "needsAction") {
            returnEvents.push(items[i]);
            break;
          }
        }
      }
    }
  }
  // log the number of matching events (the response object itself has no length)
  Logger.log("%s Calendar events with no RSVP", returnEvents.length);
  return returnEvents;
}
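
The attendee scan doesn't depend on Apps Script at all, so it can be tried out against plain objects. Here is a rough sketch, assuming event objects shaped like the ones the Calendar service returns (an optional `attendees` array whose entries have `email` and `responseStatus`). `filterEventsNeedingResponse` and the sample data are made up for illustration, and the case-insensitive email comparison is my own assumption rather than something the script does.

```javascript
// Hypothetical pure version of the attendee scan: returns the events in
// `items` where the attendee matching `myEmail` still has "needsAction".
function filterEventsNeedingResponse(items, myEmail) {
  var wanted = myEmail.toLowerCase();
  var pending = [];
  for (var i = 0; i < items.length; i++) {
    var attendees = items[i].attendees || [];   // events may have no attendees
    for (var j = 0; j < attendees.length; j++) {
      var a = attendees[j];
      if (a.email && a.email.toLowerCase() === wanted &&
          a.responseStatus === "needsAction") {
        pending.push(items[i]);
        break;   // one match per event is enough
      }
    }
  }
  return pending;
}

// Made-up sample events
var sampleItems = [
  { summary: "Design review",
    attendees: [{ email: "me@example.com", responseStatus: "needsAction" }] },
  { summary: "Team lunch",
    attendees: [{ email: "me@example.com", responseStatus: "accepted" }] },
  { summary: "Focus time" }   // no attendees list at all
];
var pending = filterEventsNeedingResponse(sampleItems, "Me@Example.com");
```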

Getting the News

There are two parts to this portion of the script: getting the news headlines and generating the categorized results, and extracting the important headlines and elevating them to the Top Stories section.

The code uses the UrlFetchApp service to read each feed and XmlService to parse the result into a document tree that I can use to extract each item’s title and link. Right now, the script only parses RSS 2.0 feeds, but since practically everyone supports that format, it’s what I decided to code to. Adding support for Atom would be easy enough as well.

function retrieveFeedItems(feedUrl) {
  var feedDoc = null;
  var str = "";
  var itemCount = 0;
  var root = null;
  var type = "unknown";
  // to avoid having one bad feed take down the entire script,
  // wrap both the fetching and the parsing in a try-catch block
  try {
    var feedSrc = UrlFetchApp.fetch(feedUrl).getContentText();
    feedDoc = XmlService.parse(feedSrc);
    if (feedDoc)
      root = feedDoc.getRootElement();
  }
  catch (e) {
    Logger.log("Error reading feed: " + feedUrl);
    Logger.log(e);
  }
  // detect the kind of feed this is. Right now only handles RSS 2.0
  // but adding other formats would be easy enough
  if (root && root.getName() == "rss") {
    var versionAttr = root.getAttribute("version");
    if (versionAttr && versionAttr.getValue() == "2.0")
      type = "rss2";
  }
  if (type == "rss2") {
    str += "<div>";
    var channel = root.getChild("channel");
    var items = channel.getChildren("item");
    str += "<h2><a href='"+channel.getChildText("link")+"'>"+channel.getChildText("title")+"</a></h2>\n";
    Logger.log("%s items from %s", items.length, channel.getChildText("title"));
    // Limit the number of headlines
    itemCount = (items.length > HEADLINE_LIMIT ? HEADLINE_LIMIT : items.length);
    str += "<ul>";
    for (var i=0; i < itemCount; i++) {
      var keywordFound = false;
      // getChildText returns null for a missing element, so default to ""
      var strTitle = items[i].getChildText("title") || "";
      var strLink = items[i].getChildText("link") || "";
      // If the title triggers a keyword, add it to the topStories list
      for (var j=0; j < keyWords.length; j++) {
        // simple index search, could be vastly improved
        if ( strTitle.toLowerCase().indexOf(keyWords[j]) != -1) {
          topStories.push( {title: strTitle, link: strLink} );
          keywordFound=true;
          break;
        }
      }
      // If we didn't add this item to the topStories, add it to the main news
      if (!keywordFound) {
        str += "<li><a href='" + strLink + "'>" + strTitle + "</a></li>\n";
      }
      Logger.log(strTitle);
    }
    str += "</ul></div>\n";
  }
  return str;
}
Now, by itself, this would result in a pretty large set of headlines, and I don’t want to have to scan each one to see if it is important. Instead, I have the script look at each headline for a set of keywords that I’ve defined. If a headline contains a keyword, then that headline is put into a special array. If the headline didn’t contain a keyword, it gets added to the feed’s category normally.
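
The plain substring search has a known weakness: a keyword like "apple" also matches "pineapple". One possible improvement is to match whole words only. The sketch below is an illustration, not what the script does; `buildKeywordMatcher` is a hypothetical helper, and the short keyword list is a subset chosen for the example.

```javascript
// Hypothetical helper: builds a function that tests a headline against a
// list of keywords, matching whole words only and ignoring case.
function buildKeywordMatcher(keywords) {
  var escaped = [];
  for (var i = 0; i < keywords.length; i++) {
    // escape any characters that are special inside a regular expression
    escaped.push(keywords[i].replace(/[.*+?^${}()|[\]\\]/g, "\\$&"));
  }
  // \b anchors each keyword at word boundaries; no "g" flag, so test()
  // keeps no state between calls
  var re = new RegExp("\\b(?:" + escaped.join("|") + ")\\b", "i");
  return function (title) {
    return re.test(title || "");
  };
}

var isTopStory = buildKeywordMatcher(["apple", "cloud", "app engine"]);
var hit = isTopStory("Apple unveils new phone");   // whole-word match
var miss = isTopStory("Pineapple prices rise");    // substring only, so no match
```

The trade-off is that whole-word matching also skips plurals such as "clouds", so the keyword list may need a few extra entries.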

Sending the Email

Now all that’s left to do is send the email. The GmailApp service is used for this, and it provides the ability to send an email with HTML content. Let's revisit the last part of the deliverNews function:

// put all the data together
newsMsg = "<h1>" + EMAIL_TITLE + "</h1>\n" + calEventsStr + topStoriesStr + feedStoriesStr;
// Deliver the email message as HTML to the recipient
GmailApp.sendEmail(deliverAddress, EMAIL_TITLE, "", { htmlBody: newsMsg });
Logger.log(newsMsg.length);
At this point in the script, the string variables calEventsStr, topStoriesStr, and feedStoriesStr contain the HTML code for each of the sections. All that’s left to do is put them together into one result and send that content via the GmailApp.
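
The third argument to GmailApp.sendEmail is the plain-text body, which the script leaves empty, so a mail client that can't render HTML would show nothing. If that matters to you, a crude fallback can be derived from the HTML. This is a rough sketch only: `htmlToPlainText` is a hypothetical helper, it drops the link URLs, and regex-based tag stripping is only reasonable here because the script generates the markup itself.

```javascript
// Hypothetical helper: turn the generated HTML into a rough plain-text
// version, one heading or list item per line. Link targets are dropped.
function htmlToPlainText(html) {
  return html
    .replace(/<\/(h[1-6]|p|li|ul|div)>/gi, "\n")   // block ends become newlines
    .replace(/<[^>]+>/g, "")                        // strip remaining tags
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, "&")                         // decode "&" last
    .replace(/\n{2,}/g, "\n")                       // collapse blank lines
    .replace(/^\s+|\s+$/g, "");                     // trim both ends
}

var sampleHtml = "<h1>The Day Ahead</h1>\n" +
  "<ul><li><a href='http://example.com'>Story one</a></li>\n" +
  "<li>Story two</li></ul>";
var plain = htmlToPlainText(sampleHtml);

// Inside deliverNews, the call would then become:
//   GmailApp.sendEmail(deliverAddress, EMAIL_TITLE, htmlToPlainText(newsMsg),
//                      { htmlBody: newsMsg });
```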

Automating the Script

Now I have a script that can be manually run whenever I want to generate the email, but that’s obviously not ideal. What I really want is a script that runs for me on a pre-set schedule. Google Apps Script makes this possible by enabling what are called “Triggers”.

I want my email delivered to me every morning before I wake up. To set that up, I select “Project Triggers” under the Resources menu. This brings up a dialog that lets me set a timed trigger:


[click image for a larger view]

In this case, I have my Trigger set to run daily between 6 and 7 in the morning. That’s really all there is to it - you just choose the function you want executed and when you want it run. Apps Script does the rest!
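
Triggers can also be created from code with the ScriptApp service instead of the dialog. The sketch below sets up roughly the same daily schedule. `createDailyTrigger` is a hypothetical function name, it is meant to be run once by hand from the script editor, and it only works inside Apps Script, where `ScriptApp` is available.

```javascript
// Hypothetical one-time setup function: schedules deliverNews to run once
// a day during the 6 o'clock hour (in the script's time zone).
function createDailyTrigger() {
  return ScriptApp.newTrigger("deliverNews")  // name of the function to run
    .timeBased()                              // clock-driven trigger
    .everyDays(1)                             // repeat every day
    .atHour(6)                                // sometime within the 6:00 hour
    .create();
}
```

Running this more than once would create duplicate triggers, so it is worth checking the trigger dialog afterward.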

The full script code is available here if you want to download it and modify it for your own needs.