[Header image: just the logo of Google BigQuery. Stock photos that fit are kind of boring: a spreadsheet, a magnifying glass, more spreadsheets, binoculars?]

Starting with Google BigQuery

19 October, 2021

Notes made while using Google BigQuery for the first time. They cover creating my first BigQuery table, importing data via Python, running queries, deleting data, and altering table schemas.

Creating a BigQuery project

See the "Before you begin" section in BigQuery Quickstart

In short, these are the steps:

  • Create a Google Cloud project
  • Enable the BigQuery API
  • Create a service account and download a service account key (a quick sanity check is sketched below)
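
With the key downloaded, a quick sanity check is to build a client from it and list the project's datasets. A minimal sketch, assuming the key file was saved as key.json (adjust the path to wherever you put it):

# Sanity check: can we authenticate and talk to BigQuery at all?
from google.cloud import bigquery
from google.oauth2.service_account import Credentials

credentials = Credentials.from_service_account_file('key.json')
client = bigquery.Client(
    credentials=credentials,
    project=credentials.project_id,
)

# Prints nothing if the project has no datasets yet, which is fine.
for dataset in client.list_datasets():
    print(dataset.dataset_id)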

Importing data from Python to a BigQuery table

Installing requirements

Install dependencies from the file below by running

# preferably do this in a local virtualenv
pip install -r requirements.txt

# Install these dependencies with
# "pip install -r requirements.txt"
pandas==1.3.2
requests==2.26.0
google-auth==2.0.1
google-cloud-bigquery==2.28.1
pyarrow==5.0.0

Now, either paste your downloaded service account JSON into the file below, or export it as an env var, like so:

export SA_CREDENTIALS='{
  "type": "service_account",
  "...etc"
}'

You can then run the file below with

python bigquery_import.py

# 1. Install dependencies
# 2. Set the service account json in code,
#    or with the env var "SA_CREDENTIALS"
# 3. Run this file to import test data:
#    "python3 bigquery_import.py"
import os
import json

from google.cloud import bigquery
from google.oauth2.service_account import Credentials
import pandas

TABLE_NAME = 'nameexample'
DATASET_NAME = 'example'


def get_service_account():
    service_account_json = os.environ.get('SA_CREDENTIALS')
    # For local testing, you can uncomment the lines below and paste in the
    # service account json downloaded from Google Cloud
    #
    # service_account_json = json.dumps({
    #     "type": "service_account",
    #     "project_id": "myexampleproject",
    #     "private_key_id": "975c1e87f065c76c9915887acc1afedeb59600b0",
    #     "private_key": "-----BEGIN PRIVATE KEY-----\ndummyexample/12345\==\n-----END PRIVATE KEY-----\n",
    #     "client_email": "firebase-adminsdk-r1to@myexampleproject.iam.gserviceaccount.com",
    #     "client_id": "123457110251480388426",
    #     "auth_uri": "https://accounts.google.com/o/oauth2/auth",
    #     "token_uri": "https://oauth2.googleapis.com/token",
    #     "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
    #     "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/firebase-adminsdk-r7tvo%40myexampleproject.iam.gserviceaccount.com"
    # })
    if not service_account_json:
        raise Exception('Missing env var SA_CREDENTIALS')
    try:
        service_account = json.loads(service_account_json)
    except Exception:
        print('Could not load SA_CREDENTIALS as json!')
        raise
    # Env vars often escape the newlines in the private key; undo that here.
    pk = service_account.get('private_key')
    service_account['private_key'] = pk.replace('\\n', '\n')
    try:
        # This validates the credentials
        Credentials.from_service_account_info(
            service_account
        )
        return service_account
    except Exception:
        print('JSON credentials are wrong or malformatted!')
        raise


def get_project_id():
    sa = get_service_account()
    return sa['project_id']


def get_client():
    service_account = get_service_account()
    google_credentials = Credentials.from_service_account_info(
        service_account
    )
    # Pass the project explicitly so the client does not have to
    # resolve it from the environment.
    client = bigquery.Client(
        credentials=google_credentials,
        project=service_account['project_id'],
    )
    return client


def _create_dataset(*, project_id: str, dataset_name: str):
    client = get_client()
    dataset_id = f'{project_id}.{dataset_name}'
    dataset = bigquery.Dataset(dataset_id)
    dataset.location = "europe-north1"
    dataset = client.create_dataset(
        dataset,
        timeout=30
    )
    print(f'Created dataset {dataset.project}.{dataset.dataset_id}')
    return dataset


def _create_table(
    *,
    project_id: str,
    dataset_name: str,
    table_name: str,
    schema
):
    client = get_client()
    table = bigquery.Table(
        f"{project_id}.{dataset_name}.{table_name}",
        schema=schema,
    )
    table = client.create_table(table)
    print(f'Created table {table.project}.{table.dataset_id}.{table.table_id}')
    return table


def _delete_dataset(*, project_id: str, dataset_name: str):
    client = get_client()
    dataset_id = f'{project_id}.{dataset_name}'
    dataset = bigquery.Dataset(dataset_id)
    dataset.location = "europe-north1"
    client.delete_dataset(
        dataset,
        timeout=30
    )
    print(f'Deleted dataset {dataset_id}')
    return dataset


def _delete_table(
    *,
    project_id: str,
    dataset_name: str,
    table_name: str,
):
    client = get_client()
    table_id = f"{project_id}.{dataset_name}.{table_name}"
    table = client.delete_table(table_id)
    print(f'Deleted table {table_id}')
    return table


def do_initial_setup():
    project_id = get_project_id()
    _create_dataset(
        project_id=project_id,
        dataset_name=DATASET_NAME,
    )
    _create_table(
        project_id=project_id,
        dataset_name=DATASET_NAME,
        table_name=TABLE_NAME,
        schema=bq_schema
    )


def send_to_bq(report_records):
    client = get_client()
    project_id = get_project_id()
    dataframe = pandas.DataFrame(
        report_records,
        # In the loaded table, the column order reflects the order of the
        # columns in the DataFrame.
        columns=bq_columns,
    )
    job_config = bigquery.LoadJobConfig(
        # Specify a (partial) schema. All columns are always written to the
        # table. The schema is used to assist in data type definitions.
        schema=bq_schema,
        # Optionally, set the write disposition. BigQuery appends loaded rows
        # to an existing table by default, but with WRITE_TRUNCATE write
        # disposition it replaces the table with the loaded data.
        write_disposition="WRITE_TRUNCATE",
    )
    job = client.load_table_from_dataframe(
        dataframe,
        project=project_id,
        destination=f'{DATASET_NAME}.{TABLE_NAME}',
        job_config=job_config
    )
    job.result()  # Wait for the job to complete.
    print(f"Uploaded today's records to {DATASET_NAME}.{TABLE_NAME}")


bq_types = {
    'first_name': bigquery.enums.SqlTypeNames.STRING,
    'last_name': bigquery.enums.SqlTypeNames.STRING,
}
bq_columns = list(bq_types.keys())
bq_schema = list(map(
    lambda item: bigquery.SchemaField(item[0], item[1]),
    bq_types.items()
))


if __name__ == "__main__":
    print(f'Creating {DATASET_NAME}.{TABLE_NAME}')
    try:
        get_service_account()
    except Exception:
        print('Could not get service account. Is the service account json set?')
        raise
    try:
        do_initial_setup()
    except Exception:
        print('Could not add table. Assuming it has already been created. Continuing.')
    data = [
        {
            'first_name': 'Tomas',
            'last_name': 'Albertsen'
        }
    ]
    send_to_bq(data)

Selecting, updating and deleting BigQuery data

BigQuery can be queried with SQL. Queries can be run via an interactive console in the browser, or via libraries as shown below.

With interactive console

See Running interactive queries

The queries are written in standard SQL.

For instance, querying data

SELECT * FROM myproject.mydataset.mytable
  WHERE first_name = 'Tomas';

...deleting data...

DELETE FROM myproject.mydataset.mytable
  WHERE first_name = 'Tomas';

...or updating:

UPDATE myproject.mydataset.mytable
  SET last_name = 'Newname'
  WHERE first_name = 'Tomas';

Via a library

The queries can be sent via the library client too:

# python
client = get_client()
query = """
    SELECT name, SUM(number) as total_people
    FROM `your-project.dataset.table_name`
    WHERE state = 'TX'
    GROUP BY name, state
    ORDER BY total_people DESC
    LIMIT 20
"""
query_job = client.query(query)
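
client.query only starts the job. To read the rows, wait for it with result(), or pull everything into a DataFrame (pandas and pyarrow are already in the requirements above). A rough sketch continuing from the snippet above; the attribute names match the columns selected in the query:

# Wait for the query to finish and iterate over the rows.
rows = query_job.result()
for row in rows:
    print(row.name, row.total_people)

# Alternatively, load the result straight into a pandas DataFrame.
dataframe = query_job.to_dataframe()
print(dataframe.head())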

Modifying table schemas

Only three modifications can be made to the table schema:

  • Adding columns to a schema definition
  • Deleting, or dropping, a column from a schema definition
  • Relaxing a column's mode from REQUIRED to NULLABLE

Dropping column

Dropping columns can be done via the interactive console.

ALTER TABLE `your-project.dataset.table_name`
  DROP COLUMN IF EXISTS NAME
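
As far as I can tell, the Python client has no dedicated method for dropping a column, but the same DDL statement can be sent through it as a regular query. A sketch, reusing get_client from the import script above:

# Drop a column by running the ALTER TABLE DDL through the client.
client = get_client()
query_job = client.query("""
    ALTER TABLE `your-project.dataset.table_name`
    DROP COLUMN IF EXISTS NAME
""")
query_job.result()  # Wait for the DDL statement to finish.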

Adding column

Via the interactive console

ALTER TABLE `your-project.dataset.table_name`
  ADD COLUMN NAME STRING

or via the library

table_id = "your-project.dataset.table_name"
table = client.get_table(table_id)

original_schema = table.schema

# Creates a copy of the schema.
new_schema = original_schema[:]
new_schema.append(
    bigquery.SchemaField("phone", "STRING")
)

table.schema = new_schema
table = client.update_table(table, ["schema"])

print("A new column has been added.")