Rethinking Data Warehouse Design with BigQuery

Leveraging Google BigQuery capabilities to optimize Data Warehouse Models

Mabrouk GADRI
17 min read · Apr 15, 2020
Photo by Khara Woods on Unsplash

In the process of data warehouse design, BigQuery gives the data modeler the long overdue freedom to focus on the most common usage of the ingested data without dismissing the less frequent queries submitted by end users and data analysts. In other words, if end users and analysts can tell us in advance which types of select queries will dominate in terms of frequency, the data modeler has a real opportunity to optimize cost and overall query efficiency by using native BigQuery features like nesting and by benefiting from architectural components like its columnar storage system.

With a traditional RDBMS, this aspect tends to be overlooked : the technical constraints of a relational database (lack of nesting features, cost of joins, etc.) force the data modeler to design a schema that has to answer every possible query (both summary and detail ones), ending up with the most granular model and sacrificing significant performance and cost gains for the most common use cases.

In the case of BigQuery, there is significant room to improve the efficiency of a Star Schema that is simply not available with traditional database systems. One of Google's most popular recommendations is to denormalize the data to the most acceptable extent, because extracting nested data in BigQuery is far more efficient than performing joins. But is that the only benefit of denormalization ?

This article explores that route on a concrete, traditional sales star schema and hopefully gives data modelers and data analysts some clues for optimizing the design and usage of a BigQuery data warehouse.
There are also some query plan explanations and comparisons that you can skip if you are only interested in modeling.

Table partitioning and column clustering won’t be discussed in this article as they are common features among data warehouse systems. In the case of BigQuery, partitioning is definitely a good practice for large tables. Clustering is usually beneficial when used carefully but may sometimes slow down certain queries. I will probably discuss the topic in a future post or share an existing blog which addresses it.

The example used is a buffed up Adventure Works sales star schema (shared online by Microsoft):

  • traditional_sales_fact_table has 7 million rows and weighs about 750MB
  • traditional_customer_dimension : 5 million rows / 350MB
  • traditional_product_dimension and traditional_date_dimension are small tables.

This model will be our base of comparison throughout this article.
It is not a crazy big model, but it is enough to work with and to draw conclusions about improvement practices that can be applied to bigger tables.

We will also make some assumptions that will drive our modeling. They may not apply all at once to all business cases but they are definitely quite common in the data warehousing world.

Before exploring potential optimization opportunities for our traditional model, let’s assume that end users and analysts need the data warehouse to answer the following business queries :

  1. For a given month, I want to analyze my order amounts and quantities by customer country region and by the Product Category hierarchy
  2. In a given quarter, how much were the sales amount and order count by product category ?
  3. What is the average delivery time by Customer City and product Weight class in a given quarter ?
  4. What is the distribution of my customers by postal code in a given country ?
  5. How much of products A and B did we sell to customers X and Y since date Z ?

Our next and final assumption is that queries 1, 2 and 3 are of the type that will be frequently sent to the data warehouse and that 4 and 5 are less common.

Based on everything already mentioned, we can now tweak the model in a way that generally improves query performance, saves cost under an on-demand pricing model and consumes fewer compute slots under a flat-rate pricing model.

Remodeling the fact table

Business analysts in this scenario seem to be fans of analyzing their sales data at the month level and above. Right there lies an opportunity to coarsen the granularity of the facts to the order month level.
At the same time, we don’t want to lose the ability to drill down to the order date level. In this case, we can use an Array (or what Google calls a repeated record) combined with a Struct to store this type of detailed data.
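To make the shape concrete, here is a minimal, hypothetical sketch of such a structure (table and field names here are purely illustrative; the real creation script comes later) :

CREATE TABLE myDs.example_nested_fact (
  OrderMonth STRING,
  SalesAmount NUMERIC,
  -- one row per month, with the order-level detail kept as a repeated record
  SalesDetails ARRAY<STRUCT<OrderDate TIMESTAMP, SalesAmount NUMERIC, OrderQuantity INT64>>
);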

We will get back to the creation script of the fact table after reviewing the dimensions.

Rethinking dimension tables

The new product dimension

If the product category hierarchy is used more often than the product details (which we will of course keep for the less frequent needs), it is useful to raise the granularity of the product dimension to the product subcategory and store the lower level in a nested structure, i.e., an Array.

In a real-world scenario, the data modeler will probably have to create an empty table first, but in this case we will create the product dimension and load it in the same query :

CREATE TABLE myDs.new_product_dimension AS
SELECT
ProductSubCatgeoryCode || '/' || ProductCatgeoryCode AS ProductHierarchyKey,
ProductSubCatgeoryCode,
ProductSubCatgeory,
ProductCatgeoryCode,
ProductCategory,
-- product-level details are tucked into a repeated record (array of structs)
ARRAY_AGG( STRUCT( ProductId,
ProductName,
ListPrice,
Size,
Weight,
StartDate,
EndDate,
Status
)
) AS ProductData
FROM myDs.traditional_product_dimension
GROUP BY
ProductSubCatgeoryCode, ProductSubCatgeory, ProductCatgeoryCode, ProductCategory;

Notice that, in addition to aggregating product details into a repeated field, the product surrogate key ProductKey was ditched while keeping ProductId, which is the product business key (or SKU) at the detail level of the dimension.
At the top level of the dimension, a concatenation of the subcategory code and the category code is used as the dimension key.
Both of these practices will simplify and accelerate a lot of queries, as we will see later when we answer the 5 business questions.
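As a quick illustration (the subcategory code in the filter is made up), the nested product details remain directly accessible with a simple UNNEST :

SELECT prd_det.ProductId, prd_det.ProductName, prd_det.ListPrice
FROM myDs.new_product_dimension, UNNEST(ProductData) AS prd_det
WHERE ProductSubCatgeoryCode = 'RB'; -- 'RB' is an illustrative code, not taken from the dataset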

The new customer dimension

In the customer table (which is a relatively large one in our case), users don’t typically look for specific attributes like a customer’s Id or postal code. However, they do aggregate data by attributes like gender, marital status and city.
These attributes do not form a natural hierarchy, but in terms of cardinality, the count of every combination of them is much lower than that of the traditional customer dimension at the customer key granularity. We can remodel the customer dimension like this :

CREATE TABLE myDs.new_customer_dimension AS
SELECT
SHA1(CONCAT(MaritalStatus, Gender, City, StateProvince, CountryRegion)) AS CustomerAttributesKey,
MaritalStatus,
Gender,
City,
StateProvince,
CountryRegion,
-- customer-level details are tucked into a repeated record
ARRAY_AGG( STRUCT( CustomerId, BirthDate, PostalCode ) ) AS CustomerData
FROM myDs.traditional_customer_dimension
GROUP BY MaritalStatus, Gender, City, StateProvince, CountryRegion;

Specific attributes like CustomerId, BirthDate and PostalCode, which made the customer dimension a large one, are now encapsulated in an array.
The new customer dimension becomes much smaller, so whenever it is used in a join to retrieve one of its top-level attributes, chances are BigQuery will broadcast it instead of shuffling it.
In this case we have no choice but to create a surrogate key (CustomerAttributesKey) using a hashing function, as there are too many top-level attributes which are not correlated with each other.
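For illustration, here is a hedged sketch of how the compact top level can be queried on its own (the country value is an arbitrary example) :

SELECT CustomerAttributesKey, City, ARRAY_LENGTH(CustomerData) AS CustomerCount
FROM myDs.new_customer_dimension
WHERE CountryRegion = 'France' -- arbitrary example value
ORDER BY CustomerCount DESC;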

About the date dimension

The date dimension is a small table, but having to include it in every query might be annoying to some folks (including me).
It may be useful to keep it in the dataset for third-party reporting tools, but for SQL queries, I think (and it is a personal preference) that it is slightly better to replace it with a well documented user-defined function (UDF) which takes a date as input and converts it to another level of the calendar hierarchy.
As a side note, using a UDF instead of a broadcast table is slightly better in terms of slot time.

This is an incomplete example of such a function written in JavaScript. The same can be accomplished with TIMESTAMP_TRUNC and FORMAT_TIMESTAMP in a SQL UDF.

CREATE OR REPLACE FUNCTION myDs.getDatePart(inputTimestamp TIMESTAMP, outputDatePartName STRING)
RETURNS STRING LANGUAGE js AS
"""
function getQuarter(d) {
  const q = [1, 2, 3, 4];
  return "Q" + q[Math.floor(d.getUTCMonth() / 3)].toString();
}
function getYearMonth(d) {
  const months = ["01", "02", "03", "04", "05", "06", "07", "08", "09", "10", "11", "12"];
  return d.getUTCFullYear() + "-" + months[d.getUTCMonth()];
}

var result;

switch (outputDatePartName.toUpperCase()) {
  case "YEAR":
    // toString() so that the function always returns a STRING as declared
    result = inputTimestamp.getUTCFullYear().toString();
    break;
  case "QUARTER":
    result = getQuarter(inputTimestamp);
    break;
  case "YEAR-MONTH":
    result = getYearMonth(inputTimestamp);
    break;
  default:
    throw "output format not recognized";
}

return result;
""";

Back to the fact table

The new sales facts will remain in one single table that contains both summarized and detailed data. We will also store preaggregated measures in order to read less off the table and improve performance for the majority of the queries.

This script creates and loads the new fact table :

CREATE TABLE myDs.new_sales_fact_table AS
SELECT
myDs.getDatePart( fact.OrderDate , 'YEAR-MONTH') AS OrderMonth,
trd_prdt.ProductSubCatgeoryCode || '/' || trd_prdt.ProductCatgeoryCode AS ProductHierarchyKey,
SHA1(CONCAT(trd_cust.MaritalStatus, trd_cust.Gender, trd_cust.City,
trd_cust.StateProvince, trd_cust.CountryRegion
)
) AS CustomerAttributesKey,
-- preaggregated additive measures at the month / subcategory / customer-profile grain
SUM( fact.SalesAmount ) AS SalesAmount,
SUM( fact.OrderQuantity ) AS OrderQuantity,
SUM( fact.DiscountAmount ) AS DiscountAmount,
SUM( fact.TaxAmt ) AS TaxAmt,
-- HLL sketch for later approximate distinct counts of orders
HLL_COUNT.INIT(SalesOrderNumber) AS SalesOrderNumbers,
-- order-line details kept as a repeated record for drill-down queries
ARRAY_AGG(
STRUCT(
trd_prdt.ProductId , trd_cust.CustomerId,
fact.OrderDate, fact.ShipDate, fact.SalesOrderNumber, fact.SalesOrderLineNumber,
fact.OrderQuantity, fact.UnitPrice, fact.SalesAmount, fact.DiscountAmount, fact.TaxAmt
)
) AS SalesDetails
FROM
myDs.traditional_sales_fact_table AS fact
LEFT JOIN myDs.traditional_product_dimension AS trd_prdt USING (ProductKey)
LEFT JOIN myDs.traditional_customer_dimension AS trd_cust USING (CustomerKey)
GROUP BY
1,2,3;

The new fact table’s row count of 400K is a fraction of the original 7 million.
The first level of the table is now at the OrderMonth, product subcategory (ProductHierarchyKey) and customer profile (CustomerAttributesKey) grain.

Additive measures like Sales Amount and Order Quantity are also precomputed at this first level.
Distinct counts can also be preaggregated: BigQuery provides a way (using an algorithm called HyperLogLog++) to precompute fixed-length data sketches that can be either used directly or merged later to obtain approximate distinct counts at higher levels of aggregation, with a margin of error down to about 1%! In this case we are using it to count the sales orders.
So when exact counts are not a requirement, HyperLogLog++ functions can be used extensively to compute distinct counts with much better performance and generally far less processed data and slot time than a traditional COUNT(DISTINCT).
The HLL_COUNT.INIT function in the query above computes a sketch describing the distribution of SalesOrderNumber that can later be aggregated using HLL_COUNT.MERGE_PARTIAL and HLL_COUNT.EXTRACT. (More on that probably in a future post)
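As a quick hedged example of the “directly used” case, the stored sketches can be rolled up in one pass with HLL_COUNT.MERGE, which collapses sketches straight into an approximate count :

SELECT OrderMonth, HLL_COUNT.MERGE(SalesOrderNumbers) AS ApproxOrderCount
FROM myDs.new_sales_fact_table
GROUP BY OrderMonth;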

The new model is finally ready and this is what it looks like :

The New Sales Star Schema

As we can see, dimension row counts and table sizes are much lower, but the fact table size increased a bit due to the added preaggregated measures and HLL sketches.

Both dimension and fact tables have been converted into nested structures so that the model is more efficient for the majority of the queries.

The idea of creating preaggregated fact tables is not new in data warehousing, but it is often avoided by modelers: in a traditional RDBMS they didn’t have to care much about the size of processed data (which matters in BigQuery), and, more importantly, creating aggregated fact tables generates a lot of complexity and uncertainty for analysts, who end up constantly searching for the right set of tables and, in some cases, are forced to join aggregated and detailed tables to obtain their results. As we will see later, BigQuery solves both problems by eliminating the need to join aggregated and detailed data, and it opens the door to many more possibilities, like creating nested structures inside dimension tables.

Running the select queries on both models

The following is aimed at testing the efficiency of the new model compared to the traditional one.
We will focus on the size of processed data and the compute slot time used.

Q1 — For a given month, I want to analyze my order amounts and quantities by customer country region and by the Product Category hierarchy

Q1 Answer on the original model

SELECT
cust.CountryRegion,
prdt.ProductCategory,
prdt.ProductSubCatgeory,
SUM(fact.SalesAmount) as SalesAmount,
SUM( OrderQuantity) as OrderQty
FROM
myDs.traditional_sales_fact_table fact
LEFT JOIN myDs.traditional_customer_dimension cust USING ( CustomerKey )
LEFT JOIN myDs.traditional_product_dimension prdt USING ( ProductKey )
LEFT JOIN myDs.traditional_date_dimension dat ON dat.date = fact.OrderDate
WHERE dat.YearMonth = '2014-01'
GROUP BY 1,2,3;

Q1 execution results on the original model:

The query executed in 4.4 seconds, processed 415 MB, shuffled 468 MB and consumed 29 seconds of slot time.
The shuffle operation is due to the considerable size of the customer table.
This is confirmed by the “JOIN EACH WITH EACH” that we can see in the join stage :

Q1 Answer on the new model

SELECT
cust.CountryRegion,
prdt.ProductCategory ,
prdt.ProductSubCatgeory,
SUM(fact.SalesAmount) as SalesAmount,
SUM(OrderQuantity) as OrderQty
FROM
myDs.new_sales_fact_table fact
LEFT JOIN myDs.new_customer_dimension cust USING ( CustomerAttributesKey )
LEFT JOIN myDs.new_product_dimension prdt USING ( ProductHierarchyKey )
WHERE fact.OrderMonth = '2014-01'
GROUP BY 1,2,3;

Q1 execution results on the new model

As expected, a drastic improvement on all levels : 29 MB of processed data (instead of 415 MB) in 0.6 seconds (instead of 4.4 seconds), no shuffle, only broadcasts ( “JOIN EACH WITH ALL”) :

BigQuery’s columnar storage allows tables to behave, depending on the query, either as small tables that can be broadcast or as larger tables when the user needs some detailed data.
Even when detailed data is queried, there is still a chance that BigQuery will perform a broadcast join, since each element of a struct in an array is itself stored in a separate column !

The advantage of getting rid of surrogate keys when possible

If the data analyst knows the product business well enough, he can directly filter the fact table on the product hierarchy by using the business keys he already knows.
The previous query becomes :

SELECT
cust.CountryRegion,
SPLIT(fact.ProductHierarchyKey,'/')[ORDINAL(1)] AS ProductSubCategoryCode,
SPLIT(fact.ProductHierarchyKey,'/')[ORDINAL(2)] AS ProductCategoryCode,
SUM(fact.SalesAmount) as SalesAmount,
SUM(OrderQuantity) as OrderQty
FROM
myDs.new_sales_fact_table fact
LEFT JOIN myDs.new_customer_dimension cust USING ( CustomerAttributesKey )
WHERE fact.OrderMonth = '2014-01'
GROUP BY 1,2,3;

This query performs slightly better than the previous one. In another scenario where the dimension is bigger, this last query may be much more efficient.

Having composite business keys in the fact table turns out to be very useful in other scenarios where the user needs to filter a big fact table by the attributes of another big dimension; a good practice there is to push part of the dimension filter to the fact table before using other dimension data. I will detail this with an example in a future shorter post :p.
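In the meantime, here is a very rough sketch of the idea (the category code ‘BK’ is made up for illustration) : the composite key lets us filter the fact table directly before joining the dimension for readable labels.

SELECT
prdt.ProductSubCatgeory,
SUM(fact.SalesAmount) AS SalesAmount
FROM
myDs.new_sales_fact_table fact
LEFT JOIN myDs.new_product_dimension prdt USING ( ProductHierarchyKey )
WHERE fact.ProductHierarchyKey LIKE '%/BK' -- push the category filter onto the fact table; 'BK' is a made-up code
AND fact.OrderMonth = '2014-01'
GROUP BY 1;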

Next Question:

Q2 — In a given quarter, how much were the sales amount and order count by product category ?

This is another summary-type query but with a distinct count in it:

Q2 Answer on the original model

SELECT
prdt.ProductCategory,
SUM(fact.SalesAmount) as SalesAmount,
COUNT(DISTINCT SalesOrderNumber) as ordersCount

FROM
myDs.traditional_sales_fact_table fact
LEFT JOIN myDs.traditional_product_dimension prdt USING ( ProductKey )
LEFT JOIN myDs.traditional_date_dimension dt ON dt.Date = fact.OrderDate

WHERE dt.Year = 2014 AND dt.Quarter = 'Q2'
GROUP BY 1
ORDER BY ordersCount DESC;

Q2 execution results on the original model

274 MB of processed data in 2.3 seconds, with 7 seconds of slot time consumed :

Q2 Answer on the new model
Here we need to go into the detailed fact data, but we can use the aggregated part of the product dimension !

SELECT
prdt.ProductCategory,
SUM(sales_det.SalesAmount) as SalesAmount,
COUNT(DISTINCT sales_det.SalesOrderNumber) AS ordersCount

FROM
myDs.new_sales_fact_table fact , UNNEST(fact.SalesDetails) AS sales_det
LEFT JOIN myDs.new_product_dimension prdt USING ( ProductHierarchyKey )

WHERE fact.OrderMonth BETWEEN '2014-04' AND '2014-06'
GROUP BY 1
ORDER BY ordersCount DESC;

Q2 Execution results on the new model

This query also performs significantly better than the one sent to the original model (176 MB vs 274 MB and 3 slot-seconds vs 7 slot-seconds), since it does not have to scan a large ProductKey column in the fact table; it scans ProductHierarchyKey instead, which is much smaller because it lives in the aggregated, top level of the table.

What is also noteworthy is that the “myDs.new_sales_fact_table fact , UNNEST(fact.SalesDetails) AS sales_det” part of the query, which is called a correlated cross join, costs the engine almost nothing, as it appears directly in a read phase of the plan. This means it is a fast select operation and not a traditional join, which is good news for the subsequent, more detailed queries on the new model.
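For readers less familiar with the comma syntax, it is simply shorthand for an explicit correlated CROSS JOIN UNNEST, as in this small sketch :

SELECT sales_det.SalesOrderNumber, sales_det.SalesAmount
FROM myDs.new_sales_fact_table AS fact
CROSS JOIN UNNEST(fact.SalesDetails) AS sales_det
LIMIT 10;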

Answering Q2 using HyperLogLog functions

At a high level of aggregation, measures such as order count are rarely required to be 100% exact. The previous query can run much faster and much more efficiently if we settle for 99%-accurate order counts :

SELECT
ProductCategory,
SalesAmount,
HLL_COUNT.EXTRACT(ordersCount) AS ordersCount
FROM
(SELECT
prdt.ProductCategory,
SUM(fact.SalesAmount) as SalesAmount,
HLL_COUNT.MERGE_PARTIAL(fact.SalesOrderNumbers) AS ordersCount

FROM
myDs.new_sales_fact_table fact
LEFT JOIN myDs.new_product_dimension prdt USING ( ProductHierarchyKey )
WHERE fact.OrderMonth BETWEEN '2014-04' AND '2014-06'
GROUP BY 1
)
ORDER BY ordersCount DESC;

Q2 Execution Results on the new model using HLL functions

As we can see, performance and cost metrics are fractions of what we saw with the previous query.
Basically, in this example, MERGE_PARTIAL takes the SalesOrderNumbers sketches that were calculated at the granularity of the fact table and merges them into new sketches at the product category granularity; finally, the EXTRACT function yields a 99%-accurate approximation of the desired distinct count.

Q3 — What is the average delivery time by Customer City and product Weight class in a given quarter ?

This is a query where we have to go into detailed data in both facts and dimensions.

Q3 Answer on the original model

SELECT
cust.City,
IF(prdt.Weight > 10, 'Heavy', 'Not Heavy') AS WeightClass,
AVG(TIMESTAMP_DIFF(fact.ShipDate,fact.OrderDate, DAY)) AS AvgDeliveryTime
FROM
myDs.traditional_sales_fact_table fact
LEFT JOIN myDs.traditional_product_dimension prdt USING ( ProductKey )
LEFT JOIN myDs.traditional_customer_dimension cust USING ( CustomerKey )
LEFT JOIN myDs.traditional_date_dimension dat ON dat.date = fact.OrderDate
WHERE dat.Year = 2014 AND dat.Quarter = 'Q2'
GROUP BY 1,2;

Q3 Execution Results on the original model

301 MB of processed data and 22 slot-seconds. The results are not thrilling because we are joining two relatively large tables : facts with customers.

Q3 Answer on the new model

In this query we are unnesting the fact table and the product table but we are keeping the compact version of the customer table.

SELECT
cust.City,
WeightClass,
AVG(TIMESTAMP_DIFF(sales_det.ShipDate,sales_det.OrderDate, DAY)) AS AvgDeliveryTime
FROM
myDs.new_sales_fact_table fact, UNNEST(fact.SalesDetails) AS sales_det
LEFT JOIN (
SELECT ProductId, IF(prd_det.Weight > 10, 'Heavy', 'Not Heavy') AS WeightClass
FROM myDs.new_product_dimension , UNNEST(ProductData) AS prd_det

) prdt USING ( ProductId )

LEFT JOIN myDs.new_customer_dimension cust USING ( CustomerAttributesKey )

WHERE fact.OrderMonth BETWEEN '2014-04' AND '2014-06'

GROUP BY 1,2;

Q3 Execution Results on the new model

By now, we know that this type of query is more efficient on the new model without having to look at the execution details. This was just an example that shows a join between an unnested fact table and an unnested product dimension table.

Q4 — What is the distribution of my customers by postal code in a given country ?

This is a query on a single table, the customer dimension, with a medium level of aggregation. Postal code is a detail-level attribute that is not frequently used.

Q4 Answer on the original model

SELECT PostalCode,COUNT(*) 
FROM myDs.traditional_customer_dimension
WHERE CountryRegion = 'Canada'
GROUP BY PostalCode;

Q4 Execution results on the original model

Note the 89 MB of processed data.

Q4 Answer on the new model

On the new model, we have to unnest the customer details from the table.

SELECT CustDet.PostalCode,COUNT(*)
FROM myDs.new_customer_dimension, UNNEST(CustomerData) AS CustDet
WHERE CountryRegion = 'Canada'
GROUP BY CustDet.PostalCode;

Q4 Execution results on the new model

60% less processed data, because the country column weighs a lot less in the new model.

This query, however, needs slightly more slot time to perform the correlated cross join that flattens the set of columns together.

Q5 — How much of products A and B did we sell to customers X and Y since date Z ?

This is an example where an analyst needs to go into the details of the data, or where a developer needs to run some unit tests to validate their code.

Q5 Answer on the original model

Everything is joined together :

SELECT
prdt.ProductId,
cust.CustomerId,
SUM(fact.OrderQuantity ) as OrderQuantity
FROM
myDs.traditional_sales_fact_table fact
INNER JOIN myDs.traditional_product_dimension prdt USING (ProductKey)

INNER JOIN myDs.traditional_customer_dimension cust USING (CustomerKey)

WHERE fact.OrderDate >= '2014-01-01'

AND ProductId IN ('TT-R982','TT-M928') AND CustomerId IN ('AW00028411','AW00017645')

GROUP BY 1,2;

Q5 Execution results on the original model

306 MB of processed data and almost no shuffle despite the size of the customer table. The BigQuery optimizer chose to filter the dimension tables before executing the join.

Q5 Answer on the new model

This query is not what a developer would write, as there is a much more direct way to get the answer, but the idea here is to evaluate the cost of a detailed query on the new model.

SELECT
prdt.ProductId,
cust.CustomerId,
SUM(sales_det.OrderQuantity ) as OrderQuantity
FROM
myDs.new_sales_fact_table fact, UNNEST(fact.SalesDetails) AS sales_det
INNER JOIN (
SELECT ProductId
FROM myDs.new_product_dimension , UNNEST(ProductData) AS prd_det
) prdt USING ( ProductId )

INNER JOIN (SELECT CustomerId
FROM myDs.new_customer_dimension, UNNEST(CustomerData) AS cust_det
) cust USING ( CustomerId )

WHERE sales_det.OrderDate >= '2014-01-01' AND prdt.ProductId IN ('TT-R982','TT-M928') AND cust.CustomerId IN ('AW00028411','AW00017645')

GROUP BY 1,2;

Q5 Execution Results on the new model

This time, the original model outperforms the new model, by as little as 1% of processed data. This is mainly because the join columns, CustomerId and ProductId, which are Unicode string business keys, need more storage space than CustomerKey and ProductKey, which are of integer type.

The following query benefits from the fact that business keys, rather than surrogate keys, are stored in the fact table, and it beats the query on the original model :

SELECT
sales_det.ProductId,
sales_det.CustomerId,
SUM(sales_det.OrderQuantity ) as OrderQuantity
FROM
myDs.new_sales_fact_table fact, UNNEST(fact.SalesDetails) AS sales_det

WHERE sales_det.OrderDate >= '2014-01-01' AND sales_det.ProductId IN ('TT-R982','TT-M928') AND sales_det.CustomerId IN ('AW00028411','AW00017645')

GROUP BY 1,2;

Q5 Execution Results on the new model (V2)

Around 15% less processed data and a much faster result :

Conclusions

- Modeling a data warehouse on BigQuery is a fun process, as it supports a multitude of ways to address analytic use cases, allowing developers to focus equally on business requirements and technical choices.

- Setting the granularity of facts and dimensions according to the general usage of the tables can improve overall performance and cost by orders of magnitude, regardless of the pricing model.

- Detail-level queries perform equally well on a nested model as on a flattened model. Exploding a nested structure is almost seamless in BigQuery.

- In addition to avoiding shuffle operations (even if BigQuery executes them in-memory), denormalizing is a cost saver because submitted queries rarely join all of the tables at their maximum detail. So even if a dimension is small, aggregating it and storing its details in an array is not always a bad idea, because it also reduces the size of the join column on the fact table side.

- Large dimensions, like internet customers or a big product catalog, work better when they are aggregated to a set of attributes with much lower cardinality, while keeping overly specific attributes in an array of structs. An aggregated dimension is more likely to be broadcast than shuffled.

- Meaningless surrogate keys are generally not a good choice. Business keys or a combination of them are to be preferred instead. Otherwise, the modeler might leave significant efficiency gains on the table.

- It is still better to keep some dimensions in their own separate tables so that dimension data remains manageable, easily updated in one place and shareble accross the data warehouse.

- In a heavily nested model, some queries may be a little more complex to write, but it comes down to building new habits.

- If a fact table has to answer very different needs at the same time, it may be worth creating different structures of the same fact table and dedicating each one exclusively to the appropriate business team.


Mabrouk GADRI

As a Data Consultant, I focus on using the best available technologies at their full potential.