A relational database like MariaDB (MySQL) is great for storing data that are related in a complex way. However, manually entering data into a database table is a tedious process.
Data entry can be performed using SQL INSERT
statements, and changed using UPDATE
statements. However, this is extremely tedious for a person. Web forms can be created for data entry purposes. However, it is a hassle to maintain the data entry web forms as tables in a database are changed.
Google Sheets, on the other hand, is great for user interaction because a sheet is, essentially, a big table. Most users are familiar with working with tables. However, Google Sheets is not a suitable platform to perform queries because it lacks relational operators (JOIN
).
This article proposes an approach to utilize Google Sheets for data entry/display, but use a relational database in the “backend” to store, organize and potentially perform queries.
In database design, a canonical table has the minimum mandatory columns associated with what each row is supposed to represent. The idea is that additional information for each row can be associated with a row using an associative table.
As an example, let us consider a table called people
where each row represents a person. For simplicity, let us use the following:
id | lastname | firstname |
---|---|---|
1 | Auyeunt | Tak |
2 | chuster | Randy |
In this example, 1 (just an integer value) is the internal ID of Auyeung, Tak
.
Likewise, let define a projects
table:
id | shortnamne |
---|---|
1 | relaTable |
2 | discordBot |
3 | moleculez |
In this example, let us also define a roles
table:
id | shortname |
---|---|
1 | advisor |
2 | manager |
3 | intern |
Let’s say we want Auyeung, Tak
to be an advisor
of the project discordBot
, this is represented by an associative table projectRoleAssignment
that relates people, projects and role instances:
id | personId | projectId | roleId | notes |
---|---|---|---|---|
1 | 1 | 2 | 1 | Tak advises the technical programming aspect. |
In this case, personId
of the table projectRoleAssignment
is called a “foreign key” that refers to the “primary key” of id
in the people
table.
In SQL, a SELECT
statement makes use of JOIN
to relate foreign keys to the respective tables and retrieve items from the referenced tables. For example, to generate a human readable report (as a table), the following SQL statement can be used:
SELECT
people.lastname, people.firstname,
projects.shortname, roles.shortname
FROM
JOIN (people, projects, roles)
projectRoleAssignment ON
= people.id AND
projectRoleAssignment.personId = projects.id AND
projectRoleAssignment.projectId = roles.id projectRoleAssignment.roleId
SQL is a flexible and powerful query language that can be used to extract a variety of information from a database.
At first glance, this should not be too hard. After all, a table really is just a table, and so is a sheet in a spreadsheet.
The difficulties are associated with user interactions with sheets-as-tables. Let us examine each difficulty and a possible solution.
The internal IDs, such as id
of the tables people
, projects
and etc. are really just a counter that keeps counting up. In a database, there is meta data associated with each table to keep track of these IDs. A spreadsheet cannot track an auto-incrementing counter easily.
This difficulty can be fixed by the following:
When a new associative entry (such as the ones in the projectRoleAssignment
table) is created, it is best not to rely on the user to find the IDs of the foreign (referenced) tables. Google Sheets has a data validation feature that can limit the selection of values of cells to a drop down box.
This is a useful feature, but by itself it does not solve the problem. This is because this feature cannot separate what is displayed in a dropdown box versus the data value actually put into a cell. This means there is no way to see only names (of people) in the dropdown box, and enter the associated ID in a cell.
The problem can be solved by adding a column in the sheet that corresponds to a foreign table, in this example, the sheet corresponding to the people
table. This extra column (only on the Google Sheets side) is a computed value based on the id, last name, and first name of each row. For example, it may look like Auyeung, Tak (1)
. To be consistent and systematic, this column can be named personDDO
(DD for dropdown, O for output).
In the sheet corresponding to projectRoleAssignment
, an extra column is used to capture this validated data value. This is column that is extra and only exists in Google Sheet but not in the database table. Upon update of this column, a Google Script function runs to update the actual personId
column by extracting the ID from the human-readable and yet machine-parsable value, such as Auyeung, Tak [1]
. This extra column needs to be populated when a raw database table is brought into the spreadsheet. I think Google Script is needed to initialize, update and utilize the values of this column. This column in projectRoleAssignment
can be named personDDI
(person ID dropdown input).
Google Script is ECMAScript, and as such, it has extensive support of regular expressions to perform this task.
Google Sheets has a trigger onEdit
that is called when a sheet is updated. This trigger is particularly helpful when a new row is inserted, then a script can automatically fill in the auto-incrementing ID based on values in the meta-data sheet.
Conversely, when a foreign table is updated, onEdit
can also update the human-readable but machine-parsable values.
Google Sheets formulae can be directly applied in many cases. ARRAYFORMULA
is also very helpful to populate values in an entire column, combined with COUNTIF
and OFFSET
. Note that some built-in functions may not behave “as expected” when their parameters is a 2D range (returned by OFFSET
). This means some custom functions may need to be written to achieve a clean architecture.
In terms of meta-data, Unless performance is an issue, one method is to use JSON format. This allows the value of one cell to be structured in a flexible way to maintain meta-data for a variety of uses.
The Meta-data Sheet tracks data that is specific to a TDS (Table Data Sheet). Some of this data is non-volatile and needs to be tracked, Some of this data is for convenience so a complex formula only needs to be specified once.
For each TDS, the following items should be tracked in the meta-data sheet:
COUNTA
formula in Google SheetA TDS represents an actual table in SQL. However, it also has some additional columns:
As a user uses the dropdown box of the selector text of a foreign key, the corresponding SQL table cell (actual data) needs to be updated. The extraction of the ID is already mentioned above. However, the actual mechanism is little tricky.
This is because a Sheet formula cannot be used for this purpose, otherwise it will lead to a circular dependency between the actual data cell and the selector text cell of the foreign key. The selector text value can use a Sheet forula (VLOOKUP
), then the other way around needs to be updated using GS.
This update can be triggered using the onEdit
trigger. An event object parameter is passed to the onEdit
handler. Two members of this parameter are useful. range
identifies which cell is changed (the actual data column should be computable from the selector column), and value
specifies the new value. These two members help determine which SQL table data cell to update, and what value to update it with.
Any function definition associated with the App Script (Extensions | App Script) can be invoked in a cell formula.
The trick of writing custom functions is to understand that each parameter can be a single value or an arry of an array.
The following is a very simple function that only works on a singular value:
function test1(x) {
return x+1
}
You can, in fact, use this function in the formula of a cell.
Next, it is important to understand that a custom function can return a two-dimentional array. In this case, the two-dimentional array can be used in an ARRAYFORMULA
to populate the values of multiple cells. For example,
function fourValues()
{return [ [0, 1], [2, 3]]
}
In this case, this specifies, using the cell actually containing the formula as the leftmost column and topmost row, the following cell values:
0 | 1 |
2 | 3 |
If a custom function is to work with the OFFSET
built-in function, then special considerations are required. This is because OFFSET
returns a two-dimentional array. A custom function can also return a two-dimensional array. In order for a custom to determine what to do, it is important to first check whether each of the parameters is a 2D array or a single value. Assuming param
is one of the parameters, Array.isArray(param)
returns true if-and-only-if param
is an array.
Knowing the dimensionality of a parameter does not automatically determine the dimensionality of the return value. In some cases, such as concatenation of strings, it may need to concatenate across cells of the same row, but not across rows. The following shows such a function:
function hconcatenate(x)
{let inArray = (Array.isArray(x) ? x : [ [x] ])
let result = [] // each array element is a row
for (let aRow of inArray)
{let rowCat = ""
for (let aCell of aRow)
{+= aCell
rowCat
}.push([ rowCat ]) // each row of result only has 1 cell
result
}return result
}
However, there are times when the concatenation involves separators as singular items. For example, a formula be specify HCONCATENATE(A1:B2,", ",A3,B4)
. In order to accommodate any number of parameters, the ...
notation can be used to specify a name that captures “the rest of the parameters”. Processing this kind of configuration, gets a little tricky.
function hconcatenate(...x)
{let result = []
let mapResult = x.map(p => { return Array.isArray(p) ? p.length : 1})
let numRows = Math.max(...mapResult)
for (let i = 0; i < numRows; ++i)
{let rowString = ""
for (let p of x)
{+= Array.isArray(p)? ((p.length > i) ? p[i].join("") : "") : p
rowString
}.push([ rowString ])
result
}return result
}
Google Sheets has no easy way to maintain general variables used by Google Script. An easy solution is to use a sheet, potentially hidden, and its cells to maintain values needed by the Google Script associated with a Google Sheets document. The use of JSON makes it possible to store an entire object as the value of one cell.
The following are examples of how to get and set the value of a cell in a Google Sheets document, along with functions to test the get
and set
functions.
function getMyObject(sheetName, row, column)
{return SpreadsheetApp.getActive().getSheetByName(sheetName).getRange(row, column).getValue()
}
function setMyObject(sheetName, row, column, value)
{.getActive().getSheetByName(sheetName).getRange(row, column).setValue(value)
SpreadsheetApp
}
function testGetMyObject()
{console.log(getMyObject("meta-data",2,3))
}
function testSetMyObject()
{console.log(setMyObject("meta-data",2,3,"this is a test"))
}
One of the main difficulties of the concept of Google Sheet-MySQL integration is synchronization. This is particularly difficult when Google Script cannot make a network connection directly to the database server via JDBC.
Express
framework has many extensions to support more convenient data extraction out of a POST request.