Front-end Google BigQuery with an 4.5 application - Part 3


This three part article shows how to set up a Google BigQuery project, how to front-end that project with a sample application available for download as a GitHub repository, and how the engineering behind that sample application works.

Part one explained how to set up a Google Cloud Platform account, and how to build a basic BigQuery project.

Part two first described the sample application user experience, and then focused on application configuration and the security set-up.

Part three here explains the engineering behind the application.

I want to say thank you to Eyal Peled, of Google, for taking time to answer some questions I had about BigQuery.

Now, we'll look at the C# code for this application. Since BigQuery is not ODBC compliant, I could not use Object Relational Mapping for this application. Instead, the application used an object created from a custom class - BQObject - to communicate with the BigQuery Project. I made the BQObject code available here as a Github repository share.

In the BQObject class, method returnService() returns a BigqueryService object. The using statement opens the client_secrets JSON file. The MapPath function dynamically maps the virtual path of this file to its physical path, which means the application can go in any directory. Use the Project Name value from the New Project window (Part 1 / image 3) for the ApplicationName variable in the Service block of this method.

The using statement block talks with the Google authentication servers to set the credential object, based on the client_secrets.json file opened in the using statement. In the AuthorizeAsync method call, the second parameter "scopes" tells the authentication servers the specific Google Cloud services the credential will cover. Here, I set it to BigqueryService.Bigquery, to match the API I picked in Part 1, Figure 5. The next block creates and initializes the BigQuery service, which the returnService method ultimately returns.

The returnDT() method queries the BigQuery dataset I built in part one. It returns a C# datatable. The method runs the queryParam parameter as its query against the BigQueryDemoAppDS dataset. The BigQuery response has a JSON structure with metadata that will help parse the response data set. I set the QueryResponse object "response" to the JobsResource object "j" Query method. In the Query method, I used the project name string - in this case "glass-mantra-589" - from the Google Cloud Platform project setup as the second parameter.

The JSON response object metadata includes the result set column names. The for loop in returnDT() extracts them from the JSON response object Schema property and uses them for the column names of the datatable object the returnDT() method will return. In the response object, the column headers might have non-alphanumeric characters. BigQuery can only handle alphanumeric characters in column aliases. Therefore, this application must simulate non-alphanumeric characters, and the replace functions here handle the mappings in the for loop.

The foreach loop pulls the data values from the JSON response object into the datatable object the method returns. It iterates through the response Rows property one row at a time. A BigQuery function returns null if it can't calculate a value. In the loop, the first if-block looks for one row and one column in the result set. If the data value is null, the block places a one-column row, with an information message, in the datatable.

The second if-block handles a quantiles function result set that returned a NULL result set - in other words, no calculated values. The JSON response object will have a null value in the second column and this block places a two-column row, with appropriate messages, in the datatable.

The third if-block handles all single-value result sets and a multiple-value result set from a quantiles function call. For a multiple-value result set, this if-block places the row values, one column at a time, into the datatable. Finally, returnDT() returns the finished datatable.

The sample application has one form - Default.aspx. See the code-behind here as a GitHub repository. We'll look at this code now.

The user sets the payroll dropdown values in the form. A value of 1 maps to 1000 to match the original Census Bureau file values. On the form, gridview GV1 shows the result sets. It handles one or more rows and shows an appropriate message if a BigQuery calculation returns null.

If DDLQuery, the function dropdown, has "QUANTILES EMPLOYEE COUNT" visible, the Page_Load method first sets DDLQuantiles, the quantiles dropdown, visible. The method also shows the associated labels. This line hides DDLQuantiles and the labels for other DDLQuery picks. Next, the method loads DDLQuantiles with text / value pairs from 2 to 100.

The calcStats() method takes the paramString parameter when the btnSubmit_Click() method calls it. The paramString parameter is the assembled query to run against the BigQueryDemoAppDS dataset. The calcStats() method first creates localBQObject, a BQObject object. In a try-catch block, calcStats() calls the localBQObject.returnDT() method, passing paramString as the argument. It sets the GV1 datasource to this call. If the try block throws an error, the catch block formats the error text and shows it in a JavaScript alert.

The btnSubmit_Click() method runs when the user clicks the submit button. In the queryString variable, it builds the SELECT statement the application runs against the BigQueryDemoAppDS dataset, based on the control values the user picked. The switch statement maps the selected value of the DDLQuery dropdown to the correct BigQuery function. BigQuery doesn't like spaces and certain other characters in column aliases. Therefore, in the column name aliases in the switch statement cases, I used underscores for spaces and I mapped the characters I wanted over to Unicode equivalents like this:

    $ <-> x24
    ( <-> x28
    ) <-> x29
     = <-> x3d
   ^ <-> x5e

The BQObject returnDT() method maps the Unicode strings back to the intended characters. Note that BigQuery expects to see a

{dataset name}.{table name}

table name syntax in the FROM clause. As the btnSubmit_Click() method builds the query string, method endOfQueryString() adds " AND" to the end of query string.

The foreach loop pulls non-default values from the DropDownList controls, and adds them to queryString. The endOfQueryString function first adds " AND" to queryString. For the ZIP dropdown list DDLZIP, the loop adds a "ZIP LIKE" clause that looks at the first character in the ZIP column. For the other dropdowns, it adds a BETWEEN clause. If the user never changed the value of a specific dropdown from its default "ANY" value, the loop ignores that dropdown.

If the user left all the dropdown values unchanged, queryString would end with "WHERE ". This would crash BigQuery, so the last if statement in btnSubmit_click() substrings "WHERE " out of queryString if necessary. Finally, btnSubmit_Click() calls calcStats to fill the GV1 gridview control.

The DDLQueryClause_SelectedIndexChanged method hides or unhides the Quantiles dropdown DDLQuantiles, and the associated labels. If the user picks the quantiles function in that dropdown, this method sets these three controls visible, or hidden otherwise.

BigQueryDemoApp shows how to easily front-end a BigQuery dataset with an application. BigQuery is a powerful, flexible product that offers an affordable way to crunch large data volumes.