Learning BigQuery + Google Sheets I

I went down the road of studying BigQuery and Google Sheets, inspired by a suggestion from Michael Manoochehri:

a powerful and lucrative integration is BigQuery/Google Sheets via Apps Script. Some of our customers use this combo for report generation once we (Switchboard) provide foundational data in BigQuery

From What is BigQuery? | BigQuery | Google Cloud Platform:

Storing and querying massive datasets can be time consuming and expensive without the right hardware and infrastructure. Google BigQuery is an enterprise data warehouse that solves this problem by enabling super-fast SQL queries using the processing power of Google's infrastructure. Simply move your data into BigQuery and let us handle the hard work. You can control access to both the project and your data based on your business needs, such as giving others the ability to view or query your data.

What have I done so far and what are the next steps?

As a learning exercise, I'm using Apps Script to write out data about all the public datasets into a Google Sheet and to create a visualization of the datasets: BigQuery Learning (public view with Google account). Most of the code in the project is borrowed from the sample tutorial code listed at BigQuery Service | Apps Script | Google Developers. When I run the sample code, whose core functionality is in the SQL query:

'SELECT TOP(word, 300) AS word, COUNT(*) AS word_count ' +
      'FROM publicdata:samples.shakespeare WHERE LENGTH(word) > 10;'

Roughly translated into plain English, this query says: compute the 300 most common words whose length is greater than 10 characters (along with the number of times each word occurs) in the public Shakespeare corpus. The rest of the code then stores that list in a Google Sheet; a condensed sketch of that code follows the table below. Here are the top ten words:

word            word_count
counterfeit     28
remembrance     24
countenance     24
acquaintance    23
satisfaction    20
entertainment   20
displeasure     20
sovereignty     19
imagination     19
disposition     19
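
For context, here is a condensed sketch of what that borrowed Apps Script roughly looks like: it submits the query through the BigQuery advanced service, polls until the job completes, and writes the rows into a new spreadsheet. Treat it as a sketch only; the project ID is a placeholder, and the BigQuery advanced service has to be enabled for the script.

function writeShakespeareWordsToSheet() {
  // Placeholder project ID; the BigQuery advanced service must be enabled.
  var projectId = 'my-project-id';

  // The same query as above: the 300 most common long words in Shakespeare.
  var request = {
    query: 'SELECT TOP(word, 300) AS word, COUNT(*) AS word_count ' +
           'FROM publicdata:samples.shakespeare WHERE LENGTH(word) > 10;'
  };
  var queryResults = BigQuery.Jobs.query(request, projectId);
  var jobId = queryResults.jobReference.jobId;

  // Poll until the query job completes.
  while (!queryResults.jobComplete) {
    Utilities.sleep(500);
    queryResults = BigQuery.Jobs.getQueryResults(projectId, jobId);
  }

  // Gather all pages of results.
  var rows = queryResults.rows;
  while (queryResults.pageToken) {
    queryResults = BigQuery.Jobs.getQueryResults(projectId, jobId, {
      pageToken: queryResults.pageToken
    });
    rows = rows.concat(queryResults.rows);
  }
  if (!rows || rows.length === 0) {
    Logger.log('No rows returned.');
    return;
  }

  // Write a header row plus the data into a new spreadsheet.
  var sheet = SpreadsheetApp.create('Shakespeare word counts').getActiveSheet();
  sheet.appendRow(queryResults.schema.fields.map(function(f) { return f.name; }));
  var data = rows.map(function(row) {
    return row.f.map(function(cell) { return cell.v; });
  });
  sheet.getRange(2, 1, data.length, data[0].length).setValues(data);
}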

Of the many public data sets to play with, I've chosen the Shakespeare data set, not only because it is used in the Google tutorials, but also because it is small (and therefore you're less likely to accidentally spend too much money running an inefficient query).
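
On the cost point: if I understand the API correctly, you can estimate how much data a query would scan before running it by submitting it as a dry run, in which case nothing executes and nothing is billed. A minimal sketch, again with a placeholder project ID:

function estimateBytesScanned() {
  var projectId = 'my-project-id';  // placeholder; use your own billing project
  var request = {
    query: 'SELECT TOP(word, 300) AS word, COUNT(*) AS word_count ' +
           'FROM publicdata:samples.shakespeare WHERE LENGTH(word) > 10;',
    dryRun: true  // validate and estimate only; the query is not executed
  };
  var response = BigQuery.Jobs.query(request, projectId);
  // The Shakespeare sample table is only a few megabytes, so this stays tiny.
  Logger.log('Estimated bytes processed: %s', response.totalBytesProcessed);
}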

I wrote a non-trivial query on the database to calculate the number of words in each of the corpora (shown below): Shakespeare corpora by descending word count.

Does the BigQuery API provide access to saved queries? That is, can I list my saved queries, read their contents, write new saved queries, and even run them? (The last one, on second thought, is not such a big deal, since the API already provides functionality to run queries.)

/* list corpora by descending total number of words */
SELECT
  corpus,
  corpus_date,
  SUM(word_count) AS num_words
FROM
  publicdata:samples.shakespeare
GROUP BY
  corpus, corpus_date
ORDER BY
  num_words DESC

From the query interface, you can see a number of options for what to do with the output of the query, including:

  • Download as CSV
  • Download as JSON
  • Save as Table (an API counterpart is sketched below)
  • Save as Google Sheets

BigQuery: list corpora by descending total number of words
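
As a rough sketch of how "Save as Table" looks from script rather than the web UI, a query job can be given a destination table via the API. The project and dataset names below are placeholders for ones you own:

function saveCorporaCountsToTable() {
  var projectId = 'my-project-id';  // placeholder billing project
  var job = {
    configuration: {
      query: {
        // The "corpora by descending word count" query from above.
        query: 'SELECT corpus, corpus_date, SUM(word_count) AS num_words ' +
               'FROM publicdata:samples.shakespeare ' +
               'GROUP BY corpus, corpus_date ORDER BY num_words DESC',
        destinationTable: {
          projectId: projectId,
          datasetId: 'scratch',            // placeholder dataset you own
          tableId: 'shakespeare_corpora'
        },
        writeDisposition: 'WRITE_TRUNCATE'  // overwrite the table on re-runs
      }
    }
  };
  var inserted = BigQuery.Jobs.insert(job, projectId);
  Logger.log('Query job started: %s', inserted.jobReference.jobId);
}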

After dipping my toes into using BigQuery on public data sets, I wanted to learn more about the data sets themselves that Google has made available. And not surprisingly, you can use BigQuery to learn about those data sets. Stay tuned for a write-up on what I learned.
