Starting with Google BigQuery
Notes made while using Google BigQuery for the first time. This includes creating my first BigQuery table, importing via python, queries, deletion and altering table schemas.
Creating a BigQuery project
See the “Before you begin” section in BigQuery Quickstart
In short, these are the steps
- Create a Google Cloud project
- Enable the BigQuery API.
- Create a service account and download a service account key
Importing data from Python to a BigQuery table
Installing requirements
Install dependencies from the file below by running
# preferably do this in a local virtualenv
pip install -r requirements.txt
Now, either paste your downloaded service account json into the file below, or export it as an env var ala
export SA_ACCOUNT='{
"type": "service_account",
"...etc"
}'
You can then run the file below with
python bigquery_import.py
Selecting, updating and deleting BigQuery data
BigQuery can be queried with SQL. Queries can be run via an interactive console in the browser, or via libraries as shown below.
With interactive console
See Running interactive queries
The queries are similar to SQL.
For instance, querying data
SELECT * FROM myproject.mydataset.mytable
WHERE first_name = 'Tomas';
…deleting data…
DELETE FROM myproject.mydataset.mytable
WHERE first_name = 'Tomas';
…or updating:
UPDATE myproject.mydataset.mytable
SET last_name = 'Newname'
WHERE first_name = 'Tomas';
Via a library
The queries can be sent via the library client too:
# python
client = get_client();
query = """
SELECT name, SUM(number) as total_people
FROM `your-project.dataset.table_name`
WHERE state = 'TX'
GROUP BY name, state
ORDER BY total_people DESC
LIMIT 20
"""
query_job = client.query(query)
Modifying table schemas
Only three modifications can be made to the table schema.
- Adding columns to a schema definition
- Deleting, or dropping, a column from a schema definition
- Relaxing a column’s mode from REQUIRED to NULLABLE
Dropping column
Dropping columns can be done via the interactive console.
ALTER TABLE your-project.dataset.table_name
DROP COLUMN IF EXISTS NAME
Adding column
Via the interactive console
ALTER TABLE your-project.dataset.table_name
ADD COLUMN NAME STRING
or via the library
table_id = "your-project.dataset.table_name"
table = client.get_table(table_id)
original_schema = table.schema
# Creates a copy of the schema.
new_schema = original_schema[:]
new_schema.append(
bigquery.SchemaField("phone", "STRING")
)
table.schema = new_schema
table = client.update_table(table, ["schema"])
print("A new column has been added.")