GA4データをBigQueryに入れて、そこからSQLやPythonで分析していく内容を共有します。
1. 管理画面からBigQueryの画面へ移行
data:image/s3,"s3://crabby-images/324be/324bef75bdd0b0a41c30b1c71544ac2f4c80e2a8" alt="Image in a image block"
2. GA4のサンプルデータセットを追加
data:image/s3,"s3://crabby-images/0c29e/0c29e2e8f9fd26ee251dd9b88e26828d692e0ebc" alt="Image in a image block"
data:image/s3,"s3://crabby-images/b23ef/b23ef97169947ff871fa7d61651057eeed86131b" alt="Image in a image block"
data:image/s3,"s3://crabby-images/8672b/8672be7c3013ed2b1188ea17c956d38eff78dca7" alt="Image in a image block"
3. SQLで集計する
✔️ 購入イベントの一覧を見る
data:image/s3,"s3://crabby-images/22cd6/22cd6945d1b6b9953d87ae566a5d15cd46275507" alt="Image in a image block"
select *
from `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_20210131`
where event_name = "purchase"
✔️ UNNESTで分解して確認する
data:image/s3,"s3://crabby-images/f8ece/f8ecebf3a24a32cf17f84fe95debf098648496fa" alt="Image in a image block"
select params
from `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_20210131`, unnest(event_params) as params
✔️ 日別でユニークユーザー数を確認
data:image/s3,"s3://crabby-images/db10e/db10ea935289ec95e9434405cfd1679880e9ed72" alt="Image in a image block"
select
event_date,
count(distinct user_pseudo_id) as UU
from `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_202101*`
group by event_date
order by event_date
✔️ Session IDとページロケーションの一覧を取得
data:image/s3,"s3://crabby-images/395eb/395eb36b019916aa80de1068fd1f76cc5114de31" alt="Image in a image block"
select
event_date,
(select value.int_value from unnest(event_params) where key = 'ga_session_id'),
(select value.string_value from unnest(event_params) where key = 'page_location')
from `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_202101*`
✔️ 日別のSession数
data:image/s3,"s3://crabby-images/84173/841738a84fdd504ca33de222bd1cfc9ef4df4204" alt="Image in a image block"
with session_id_tb as (
select
event_date,
user_pseudo_id,
(select value.int_value from unnest(event_params) where key = "ga_session_id") as ga_session_id
from
`bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_202101*`
)
select
event_date,
count(distinct concat(user_pseudo_id, ga_session_id)) session_num
from
session_id_tb
group by event_date
order by event_date
✔️ LTVを確認する
data:image/s3,"s3://crabby-images/bf440/bf440bb81f9801ac73014e016ef40c5829e6fd96" alt="Image in a image block"
select
user_pseudo_id,
sum(user_ltv.revenue) as sum_amt
from
`bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_20210131`
group by user_pseudo_id
order by sum_amt desc
デバイス別の訪問数を確認
data:image/s3,"s3://crabby-images/84925/849252522715a21e35a2cceefa022560a0fe6699" alt="Image in a image block"
select
device.category,
count(*) cnt,
count(*) / sum(count(*)) over() as share
from
`bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_20210131`
group by device.category
order by cnt desc
国別の訪問数を確認
data:image/s3,"s3://crabby-images/9da4f/9da4f9cafee5d05106b8ff314921d618ed92381a" alt="Image in a image block"
select
geo.country,
count(*) cnt,
count(*) / sum(count(*)) over() as share
from
`bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_20210131`
group by geo.country
order by cnt desc
トラフィックソース(チャンネル)別の訪問数を確認
data:image/s3,"s3://crabby-images/10eac/10eac0e66c2a9d3857326029e230a85b3966708b" alt="Image in a image block"
select
traffic_source.medium,
count(*) cnt,
count(*) / sum(count(*)) over() as share
from
`bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_20210131`
group by traffic_source.medium
order by cnt desc
ユーザーID毎のデータを取得
data:image/s3,"s3://crabby-images/b9cd1/b9cd166c9424fe732312b700c1411fb1b875167e" alt="Image in a image block"
with user_tb as(
select
event_date,
event_name,
user_pseudo_id,
(select value.int_value from unnest(event_params) where key = "ga_session_id") as ga_session_id
from
`bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_202101*`
)
select
user_pseudo_id,
count(distinct concat(user_pseudo_id, ga_session_id)) as session_num,
countif(event_name = "page_view") as pv_num,
countif(event_name = "purchase") as purchase_num,
from
user_tb
group by user_pseudo_id
order by session_num desc
自分のプロジェクトにデータセットを移行する
data:image/s3,"s3://crabby-images/e1398/e13980caa0beb1991571a7c98fb572b49a44d64b" alt="Image in a image block"
data:image/s3,"s3://crabby-images/62dcc/62dcc28ba66648a7ddb8474dae0414d6e7323bc5" alt="Image in a image block"
💡
地域は東京にした方が良いです。
data:image/s3,"s3://crabby-images/e6692/e6692636c1aeb1e4692d6e1f39ee03e6bbd11890" alt="Image in a image block"
create or replace table 'test-420302.test2.access_log_ga4' as
with user_tb as(
select
event_date,
event_name,
user_pseudo_id,
(select value.int_value from unnest(event_params) where key = "ga_session_id") as ga_session_id
from
`bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_202101*`
)
select
user_pseudo_id,
count(distinct concat(user_pseudo_id, ga_session_id)) as session_num,
countif(event_name = "page_view") as pv_num,
countif(event_name = "purchase") as purchase_num,
from
user_tb
group by user_pseudo_id
order by session_num desc
data:image/s3,"s3://crabby-images/32abd/32abd01a3ee66283df83b1d94019e93a331e2f35" alt="Image in a image block"
Pythonで分析する
Google ColabからBigQueryデータを読み込む
from google.colab import auth
auth.authenticate_user()
ライブラリをインポート
from google.cloud import bigquery
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.cluster import KMeans
プロジェクトを読み込んでデータフレームを作成
project_id = "test-420302"
client = bigquery.Client(project=project_id)
query= """
select *
from `test-420302.test.access_log_ga4`
"""
df = client.query(query).to_dataframe()
df.head()
data:image/s3,"s3://crabby-images/de36c/de36cdca25c2323bb90142aa8f00a2bed3391e62" alt="Image in a image block"
ユーザーIDを削除
df = df.drop("user_pseudo_id", axis=1)
df.head()
data:image/s3,"s3://crabby-images/4d466/4d466fc0ea14c4c79fb3f86c8e76f5bb21b642e6" alt="Image in a image block"
基本的なクラスタ分析をする
## クラスタ分析
km = KMeans(n_clusters=3)
cluster = km.fit_predict(df)
df["cluster"] = cluster
df.head()
data:image/s3,"s3://crabby-images/5ae4f/5ae4f86508064c4ad73d4e46fb78a39cd16be74b" alt="Image in a image block"
クラスタ分析の集計をする
pd.concat([df.groupby("cluster").mean(),df["cluster"].value_counts()], axis=1)
data:image/s3,"s3://crabby-images/ff1ab/ff1ab95aef003f99300c34179a052babe54eb715" alt="Image in a image block"
クラスタ分析に最適な分類グループ数を探す
sse_list = []
for i in range(1,10):
km = KMeans(n_clusters=i)
cluster = km.fit_predict(df)
sse = km.inertia_
sse_list.append(sse)
plt.figure(figsize=(10,5))
plt.plot(range(1,10), sse_list)
plt.show()
data:image/s3,"s3://crabby-images/126c0/126c0b012df5f07d222eb7b1f68d99123518aa90" alt="Image in a image block"
最適なグループ数(5)で再度クラスタ分析
## クラスタ分析
km = KMeans(n_clusters=5)
cluster = km.fit_predict(df)
df["cluster"] = cluster
pd.concat([df.groupby("cluster").mean(),df["cluster"].value_counts()], axis=1)
data:image/s3,"s3://crabby-images/d6c7a/d6c7a7ce07143ed5ca87a337ee419e73c177eb71" alt="Image in a image block"