SQL for APIs: limiting the filter or filtering the limit?

A quick dive into a SQL query and the nuances of filters and limits.

Morad Ankri · Jul 30, 2019

Recently I worked with @peckjon on a Transposit connector for Algorithmia and an app that uses it. We ended up with a pretty interesting demonstration not just of the Algorithmia connector, but also the power of SQL for APIs.

We wanted to create a SQL query that gets commit messages from GitHub for a given repository and sends them to a sentiment analysis algorithm in Algorithmia. This is easy to do with Transposit and the final query looks like this:

SELECT *
FROM (SELECT C.commit.message, S.result[0].sentiment
      FROM github.list_commits AS C
        JOIN algorithmia.call_algorithm AS S
          ON S.$body.document = C.commit.message
      WHERE C.owner='seattlerb' AND C.repo='heckle'
        AND S.user='nlp' AND S.algorithm='SentimentAnalysis'
      LIMIT 100)
WHERE sentiment > 0.2 OR sentiment < -0.2
ORDER BY sentiment

This query has few interesting nuances, but first let's start with the basics.

To get the commit messages from GitHub for a given repository we can use this query:

SELECT *
FROM github.list_commits
WHERE owner='seattlerb' AND repo='heckle'

To send text to Algorithmia for sentiment analysis we can use this query:

SELECT *
FROM algorithmia.call_algorithm
WHERE user='nlp' AND algorithm='SentimentAnalysis'
  AND $body.document = 'text to send to the sentiment analysis algorithm'

We can use JOIN to combine these queries together:

SELECT C.commit.message, S.result[0].sentiment
FROM github.list_commits AS C
  JOIN algorithmia.call_algorithm AS S
    ON S.$body.document = C.commit.message
WHERE C.owner='seattlerb' AND C.repo='heckle'
  AND S.user='nlp' AND S.algorithm='SentimentAnalysis'

As expected, this query will get the list of commits from GitHub and for each one will call the sentiment analysis algorithm with the commit message.

When you run a similar query in a relational database the DB engine will try to match all the rows in both tables based on the condition in the ON clause.

Transposit will basically do the same with the underlying APIs, except for two differences:

Another interesting part in this query is in the last WHERE clause:

WHERE sentiment > 0.2 OR sentiment < -0.2

as you can see this WHERE is in the outer SELECT, and the JOIN is in a subquery.

We can reduce this type of query to the following template:

SELECT *
FROM (SELECT *
      FROM source
      WHERE [subquery filters]
      LIMIT 100)
WHERE [outer query filters]

At first look this query looks the same as:

SELECT *
FROM source
WHERE ([subquery filters]) AND ([outer query filters])
LIMIT 100

But these queries are quite different: the first query tells the SQL engine to do the subquery filter and the limit 100 before the outer query filters and the second query applies the limit after both queries.

Say what you mean. Mean what you say.

These two queries mean different things and can yield quite different results. The power of SQL is that you can do a lot by saying a little. As with any high level language, it’s important to mean what you say. Do you want to find “interesting” messages among the first 100 commits or do you want to find the first 100 polarizing commit messages? The former produces a well—understood amount of work; on a bland repository, the latter could churn until you hit a rate limit. Just like SQL for traditional databases, with SQL for APIs it’s important to say what you mean (and go crazy in those commit messages).