Week 12 — My Journey into Data Analytics — DA Minidegree Review — CXL Institute
Welcome to the 12th and the Last week of my Journey into Digital Analytics with CXL. This course has been a necessary step up in my career and helped me to become an efficient Data-Driven Marketer. As CXL states, these courses are actually difficult but believe me, they are worth the Hype.
This week I covered the Bigquery module apart from Advance Google Tag manager, Excel sheets for marketers, and Google Data studio. Let’s get an overview of what Bigquery is, do you need it for your business or client right now.
Google BigQuery is a serverless, highly scalable, and cost-effective multi-cloud warehouse for analytics data. It’s a place where you can:
- Store your data
- Query your data
- Create nice (and free!) dashboards in Google Data Studio (or any other visualization tool, for the price of that tool)
Some of the Benefits of using Bigquery
- Google Analytics connected to BigQuery enables you to answer more, deeper business questions in more detail.
- Analyzing massive amounts of traffic data will be done in the cloud. The Data is unsampled, unlike Standard GA.
- Many native platforms limit the amount of historical data you can access. For example, Google Search Console offers six months of historical data within its native interface. But with BigQuery you can use it to store all your past data, giving a complete overview of your historical performance.
- It is easy to import custom data sets and use them with your Google Analytics data.
- With hit-based data, you can analyze what is happening on your website on a very granular level (second by second, filtering by dimensions…), sequence of interactions down to a particular session level.
- With BigQuery you can access individual user’s hits (anonymized data), which can help you personalize your website for the next time they visit it.
How to Get started
To start working with it, you have to create (or log in to) a Gmail account and then go to Google Cloud Console to create a Cloud Project. The creation of these elements is straightforward.
Note: When you enter a Cloud account, it asks you to provide a credit card to get $300 in credits to test the platform. They do not automatically charge your credit card if the credits run out.
If you don’t want to enter your credit card and only want to play with BigQuery and public data (there are plenty of public datasets within BigQuery), you can use a BigQuery sandbox.
Just enter a BigQuery service after creating a Cloud Project and accepting all the terms, etc. You’ll see a “Sandbox” label in the top-left corner.
You can access BigQuery in multiple ways:
• You can access BigQuery and run jobs from your web browser
• Developers can use bq command-line tool
- the python-based tool that can access BigQuery from the command line
•Developers can also leverage the Service API
- RESTful API to access BigQuery programmatically
- Requires authorization by OAuth2
•Integration Possible with Third-party Tools
- Visualization and Statistical Tools tools like Tableau, QlikView, R, etc.
• You can export data in a .csv file, JSON, or to Google Cloud Storage
Big query Structure
BigQuery is structured as a hierarchy with 4 levels:
• Projects: Top-level containers in the Google Cloud Platform that store the data. Within the project, you can configure settings, permissions, and other metadata that describe your applications. Each project has a name, ID, and number that you’ll use as identifiers
• Datasets: Within projects, datasets hold one or more tables of data. All tables must belong to a dataset. You must create a dataset before loading data into BigQuery. You can even access the public Datasets.
• Tables: Within datasets, tables are row-column structures that hold actual data. Each table has a schema that describes the data contained in the table, including field names, types, and descriptions
BigQuery supports the following table types:
- Native tables: tables backed by native BigQuery storage
- External tables: tables backed by storage external to BigQuery
- Views: virtual tables defined by a SQL query
• Jobs: The tasks you are performing on the data, such as running queries, loading data, and exporting data. You can run multiple jobs concurrently. Completed jobs are listed in the Jobs collection
There are four types of jobs:
- Load: load data into a table
- Query: run a query against BigQuery data •
- Extract: export a BigQuery table to Google Cloud Storage
- Copy: copy an existing table into another new or existing table
The taxonomy of BigQuery flows as follows:
Projects > datasets > tables and views.
Within a project, you can create/delete/copy datasets and tables
The BigQuery interface
- Query history. Queries you’ve run previously. It’s especially useful when you run tests but forget to register the best queries you might need later.
- Saved query. Where to find your registered queries. Name them clearly so you can find them quickly later.
- Job history. The history of what happened in BigQuery — imports, exports, task history, etc.
- Transfers. Where you see and configure Data Transfers, a Google service to import Google data (e.g. Ads, Play, YouTube) into BigQuery.
- Scheduled queries. Register queries and runs them every hour/day/week, etc.
- BI engines. A new feature that integrates with familiar Google tools like Google Data Studio to accelerate data exploration and analysis. A BigQuery enhancer.
- Resources. Pin a project to see it at the top of the list. It’s super useful when you have lots of projects but work often on just a few. You can also add a public dataset to play with data if you want to learn the tool but don’t have your own.
How to Query Data?
BigQuery uses a SQL-like language for querying and manipulating data. SQL statements are used to perform various database tasks, such as querying data, creating tables, and updating databases.
Basic queries contain the following components:
• SELECT (required): identifies the columns to be included in the query
• FROM (required): the table that contains the columns in the SELECT statement
• WHERE: a condition for filtering records
• ORDER BY: how to sort the result set
• GROUP BY: how to aggregate data in the result set
SELECT year, state, is_male, gestation_weeks FROM [bigquery-public-data:samples.natality]
Example — Time Spent Per session per user
Bidding Adieu and Promising many more articles on Digital Analytics as my journey continues in the real world.