Learning BigQuery + Google Sheets II

BigQuery comes with a collection of quite useful public datasets. If you ever want to use any of these datasets, you should know how much data is contained in each, because Google charges by the amount of data processed. If you're learning how to use BigQuery, start with smaller datasets so that mistakes cost a lot less money and time.
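To make the link between size and cost concrete, here is a minimal sketch that turns bytes processed into an estimated charge. The function name `estimateQueryCost` is mine, the price is a parameter you supply rather than a quoted rate, and billing per 2^40 bytes is an assumption to check against current pricing.

```javascript
// Sketch: estimated charge for a query, given the bytes it processes.
// `pricePerTiB` is an assumption supplied by the caller, not a quoted rate.
// Billing per 2^40 bytes is also an assumption; check current pricing.
function estimateQueryCost(bytesProcessed, pricePerTiB) {
  const BYTES_PER_TIB = Math.pow(2, 40);
  return (bytesProcessed / BYTES_PER_TIB) * pricePerTiB;
}
```

Under these assumptions, a query that scans a whole table costs roughly in proportion to the table's size, which is why a mistake on a small table is cheap.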

I wrote some Google Apps Script code to compile a spreadsheet of all the tables in the public BigQuery datasets. Tables range from 0 bytes to 7.5 terabytes in size. Here's a histogram of total database size of Google BigQuery public datasets (log scale):
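This is not the script I actually ran, but a minimal sketch of how such a compilation could look. It assumes `bq` behaves like the Apps Script BigQuery advanced service (`Datasets.list`, `Tables.list`, `Tables.get`). The function name `collectTableSizes` and the `projectId` parameter are hypothetical, and pagination is left out.

```javascript
// Sketch only: `bq` is expected to behave like the Apps Script BigQuery
// advanced service; `projectId` is whichever project hosts the datasets.
// Pagination (nextPageToken) is ignored to keep the example short.
function collectTableSizes(bq, projectId) {
  const rows = [["dataset", "table", "bytes"]];
  const datasets = bq.Datasets.list(projectId).datasets || [];
  for (const ds of datasets) {
    const datasetId = ds.datasetReference.datasetId;
    const tables = bq.Tables.list(projectId, datasetId).tables || [];
    for (const t of tables) {
      const tableId = t.tableReference.tableId;
      const meta = bq.Tables.get(projectId, datasetId, tableId);
      // numBytes arrives as a string in the API response.
      rows.push([datasetId, tableId, Number(meta.numBytes)]);
    }
  }
  return rows;
}
```

Inside Apps Script, with the BigQuery advanced service enabled, you would pass the global `BigQuery` object as `bq`. The returned rows can then be written to a sheet with `getRange(1, 1, rows.length, 3).setValues(rows)`.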

[Figure: Histogram of total database size of Google BigQuery public datasets (log scale)]
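For readers who want to reproduce a log-scale histogram like this from a list of sizes, one possible approach is to count tables per power of ten. This is a sketch: `logHistogram` and its input array are hypothetical names, and it is not necessarily how the chart above was produced.

```javascript
// Sketch: count sizes per power-of-ten bucket (a log-scale histogram).
// `sizesInBytes` is hypothetical input: one number per table.
function logHistogram(sizesInBytes) {
  const counts = {};
  for (const bytes of sizesInBytes) {
    // log10(0) is -Infinity, so empty tables get their own bucket.
    const bucket = bytes === 0 ? "empty" : Math.floor(Math.log10(bytes));
    counts[bucket] = (counts[bucket] || 0) + 1;
  }
  return counts;
}
```

Each numeric key is an exponent, so bucket 3 holds sizes from 1,000 up to just under 10,000 bytes. Empty tables are kept in a separate bucket because they have no place on a log axis.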

There's a lot more to say; in the days to come, I will unpack this thumbnail sketch of my computation and lay out possible future directions.
