Sometimes it is not possible to query Google BigQuery to extract Google Analytics 360 data. There is an API for accessing GA data directly, but it is poorly documented.
What we did here was to create an Azure Function (in Python) that reads a file of query definitions from our Azure Blob Storage (an Excel sheet in the code below) and makes requests to https://www.googleapis.com/analytics/v3/data/ga.
The Analytics API accepts the following parameters: "ids", "start-date", "end-date", "metrics", "dimensions", "filters", and "segment". Due to our limited experience with GBQ (and the fact that our file already had the metrics, dimensions, filters, and segment filled in), we decided to go the API way.
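To make the request shape concrete, here is a minimal sketch of a single query against the API. All values are placeholder examples, not our actual profile or metrics; the token would come from a service account as shown further down.

import requests

# Placeholder profile (view) ID, date range and metric for illustration only.
payload = {
    "access_token": "<ACCESS TOKEN>",
    "ids": "ga:12345678",
    "start-date": "2020-11-01",
    "end-date": "2020-11-30",
    "metrics": "ga:sessions",
    "dimensions": "ga:date",
}
res = requests.get("https://www.googleapis.com/analytics/v3/data/ga", params=payload)

# Each returned row lists the dimension values first, then the metric values,
# e.g. ["20201101", "1234"].
print(res.json().get("rows", []))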
There are, however, a few limitations with this API. Google limits the rate at which data can be accessed (at least as of November 2020), so we had to build some workarounds for the different issues the API can raise.
An Azure Function is also only allowed a limited execution time (wall-clock time, not CPU cycles). If the Azure Function can't complete the request within this window, it returns a 502 Bad Gateway page. To handle this we decided not to retry inside the function when the Google Analytics API fails; instead we return that part as empty and let ADF (Azure Data Factory) do multiple retries against the Azure Function.
The following code was used in the Azure Function.
import logging
import json

import azure.functions as func
from azure.identity import ManagedIdentityCredential
from azure.keyvault.secrets import SecretClient
from azure.storage.blob import ContainerClient
from oauth2client.service_account import ServiceAccountCredentials
import pandas as pd
import requests

# Key Vault URL and the names of the secrets holding the Google
# service-account key and the storage-account connection string.
KV_URL = "https://<KV>.vault.azure.net/"
KV_ANALYTICS_SECRET = "<SECRET>"
KV_BLOB_SECRET = "<BLOB SECRET>"

# Read-only scope is enough, since we only query data.
SCOPE = "https://www.googleapis.com/auth/analytics.readonly"
def read_blob_xls_file(connection_string, container, blob):
    """Read an Excel blob from Azure Blob Storage into a DataFrame."""
    con = ContainerClient.from_connection_string(
        conn_str=connection_string, container_name=container
    )
    df = pd.read_excel(con.get_blob_client(blob).download_blob().readall())
    return df
def write_df_to_blob(df, connection_string, container, blob_name):
    """Write a DataFrame to Azure Blob Storage as a CSV blob."""
    ContainerClient.from_connection_string(
        conn_str=connection_string, container_name=container
    ).get_blob_client(blob_name).upload_blob(df.to_csv(index=False))
def ga_lookup(df, token):
    """Run one Analytics API request per row in df and collect the returned rows."""
    url = "https://www.googleapis.com/analytics/v3/data/ga"
    result = []
    variables = (
        "ids",
        "start-date",
        "end-date",
        "metrics",
        "dimensions",
        "filters",
        "segment",
    )
    for _, row in df.iterrows():
        payload = {"access_token": token}
        # Only send the parameters that are filled in for this row; read_excel
        # gives NaN for blank cells, and NaN is truthy, so check notna first.
        for var in variables:
            if pd.notna(row[var]) and row[var]:
                payload[var] = row[var]
        res = requests.get(url, params=payload)
        if not res:
            # Non-2xx answer (e.g. rate limiting): log it and return this part
            # as empty instead of retrying; ADF retries the whole function.
            logging.error(payload)
            logging.error(res.text)
            continue
        # Assumes one dimension (the date) and one metric per query.
        for res_row in res.json().get("rows", []):
            result.append(
                {
                    "MetricId": row["componentName"],
                    "dimensions": row["dimensions"],
                    "metrics": row["metrics"],
                    "gAdate": res_row[0],
                    "amount": res_row[1],
                }
            )
    return pd.DataFrame(result)
def get_ga_token(key_dict: str) -> str:
    """Exchange the service-account JSON key for a short-lived access token."""
    return ServiceAccountCredentials.from_json_keyfile_dict(
        json.loads(key_dict),
        scopes=SCOPE,
    ).get_access_token().access_token
def get_kv_secret(kv_url: str, secret: str) -> str:
    """Fetch a secret from Key Vault using the function's managed identity."""
    return SecretClient(
        vault_url=kv_url,
        credential=ManagedIdentityCredential(),
    ).get_secret(secret).value
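The code above only contains the helpers; the HTTP entry point that ties them together is not shown. Below is a minimal sketch of what such a main function could look like. The container and blob names are made-up placeholders, not our actual setup; the important part is that any failure surfaces as a non-200 response so that the retries configured in ADF kick in.

def main(req: func.HttpRequest) -> func.HttpResponse:
    # Hypothetical HttpTrigger entry point; container/blob names are placeholders.
    try:
        blob_conn = get_kv_secret(KV_URL, KV_BLOB_SECRET)
        token = get_ga_token(get_kv_secret(KV_URL, KV_ANALYTICS_SECRET))

        # Read the query definitions, run them against the Analytics API,
        # and write the combined result back to Blob Storage as a CSV.
        queries = read_blob_xls_file(blob_conn, "config", "ga-queries.xlsx")
        result = ga_lookup(queries, token)
        write_df_to_blob(result, blob_conn, "output", "ga-result.csv")
        return func.HttpResponse(f"Wrote {len(result)} rows.", status_code=200)
    except Exception:
        logging.exception("GA extraction failed")
        # A non-200 response makes the ADF activity fail, which triggers its retries.
        return func.HttpResponse("GA extraction failed", status_code=500)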