Evil Multidimensional Arrays

Developers have traditionally been conditioned to create temporary databases or temporary working files outside of program code whenever they need “database-oriented” operations. In Perl, the alternative is to define and store arrays inside another array, a technique that is scary for the novice developer.

Introduction

It’s hard to wrap your head around having any type of array as an element inside another array. The technique is not impossible, but it takes some practice to master. After a few simple applications, such as a list array whose elements are hashes, one per unique person, it becomes a brain-dead technique to incorporate into your code and an essential tool for sorting and traversing data. I use this technique when the amount of data is relatively small and the output is generally for utility reporting. Multidimensional arrays do consume memory; however, memory is fairly abundant these days and can handle fairly substantial arrays.
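As a minimal sketch of the “hashes stored in a list array” case mentioned above, the following builds a small list of persons and sorts it by one field. The names, the uid field and the data itself are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample data: each element of @people is a reference to a hash.
my @people = (
    { name => 'Carol', uid => 1003 },
    { name => 'Alice', uid => 1001 },
    { name => 'Bob',   uid => 1002 },
);

# Sort the top-level array by a field stored inside each hash.
my @by_name = sort { $a->{name} cmp $b->{name} } @people;

# Traverse the sorted list and build one report line per person.
my @report = map { sprintf '%-6s %d', $_->{name}, $_->{uid} } @by_name;
print "$_\n" for @report;
```

Sorting by uid instead would only need the comparison changed to $a->{uid} <=> $b->{uid}.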

Devil in the Details

As a foundational review, Perl has 3 types of variables:

Scalar - A place in memory that stores either a single value (a string or a number) or a reference to another place in memory (i.e. another variable).
Array - Also known as a “list array”. An ordered variable that holds zero or more scalars. Indexes are numeric and start with element “0”.
Hash - A key/value array whose scalar value is looked up by its associated key. This array type is unordered, though the “sort” operation can be applied to the list of keys (as returned by “keys”) so that the keys can be walked in sorted order, which by default is string order.

Because a reference is itself a scalar, an array or a hash can hold references to other arrays and hashes. That is how multidimensional structures are built in Perl.
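A short sketch of the three variable types, and of how references let one structure hold another, might look like this. All variable names and values are illustrative assumptions.

```perl
use strict;
use warnings;

my $count = 3;                                          # scalar holding a literal
my @hosts = ('alpha', 'beta', 'gamma');                 # array, indexed from 0
my %shell = (alice => '/bin/bash', bob => '/bin/zsh');  # hash, unordered

# A reference is a scalar, so it can be stored as an array or hash element.
my %inventory = (
    hosts  => \@hosts,     # reference to an existing array
    shells => \%shell,     # reference to an existing hash
    ports  => [22, 443],   # reference to an anonymous array
);

my $first_host  = $inventory{hosts}[0];     # first element of the nested array
my $bob_shell   = $inventory{shells}{bob};  # value in the nested hash
my @sorted_keys = sort keys %inventory;     # keys in string order

print "$count hosts, first is $first_host\n";
print "bob uses $bob_shell\n";
print "top-level keys: @sorted_keys\n";
```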

At the outset, my basic approach in developing Perl code is to choose one data source that will seed the keys and load it into a top-level array. I then parse through the other data sources and build on that initial array structure, attaching more arrays as appropriate. The type of array that I construct depends on how I need to parse it. This technique helps me break the “data model” down into consumable pieces and lets me focus on a more detailed level without losing perspective of the whole “virtual data” landscape.
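The seed-then-build approach could be sketched roughly as follows. The two input lists are hypothetical stand-ins for real data sources, and the field layout is an assumption made only for this example. The comment above %users shows one way of documenting the structure inline.

```perl
use strict;
use warnings;

# Hypothetical inputs standing in for two separate data sources.
my @passwd_lines = ('alice:1001', 'bob:1002');
my @group_lines  = ('wheel:alice,bob', 'dev:alice');

# Structure being built:
#   %users = ( login => { uid => number, groups => [ group names ] } )
my %users;

# Step 1: seed the top level from the first source.
for my $line (@passwd_lines) {
    my ($login, $uid) = split /:/, $line;
    $users{$login} = { uid => $uid, groups => [] };
}

# Step 2: build on the seeded structure from the second source.
for my $line (@group_lines) {
    my ($group, $members) = split /:/, $line;
    for my $login (split /,/, $members) {
        next unless exists $users{$login};   # skip logins that were not seeded
        push @{ $users{$login}{groups} }, $group;
    }
}

for my $login (sort keys %users) {
    printf "%s (%d): %s\n", $login, $users{$login}{uid},
        join(',', @{ $users{$login}{groups} });
}
```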

I am big on sufficient inline documentation in the code without regurgitating the code. This is especially important with multidimensional arrays: once you incorporate more than 2 array levels, I find it important to insert comments that document the array structures. This has saved me time in the long term when I have had to come back and maintain the code, not to mention sparing whoever else maintains your code the horrors that would have them describing you with a continuous stream of four-letter words.

Practical examples where I have incorporated arrays in arrays include a simple case of storing key/value pairs from an LDIF file, where each entry, identified by its distinguished name (dn), is stored as an element in a top-level list array. Where an attribute key is not unique within an entry (e.g. group members in a posixGroup object class), those values become a list array stored as the value of a hash element.
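A simplified sketch of the LDIF case is shown below. The LDIF fragment is invented, and the parser is an assumption-laden illustration: it ignores continuation lines, comments and base64 (“::”) values, and for uniformity it stores every attribute as a list array rather than only the repeated ones.

```perl
use strict;
use warnings;

# Hypothetical LDIF fragment; entries are separated by blank lines.
my $ldif = <<'END';
dn: cn=admins,ou=groups,dc=example,dc=com
objectClass: posixGroup
cn: admins
memberUid: alice
memberUid: bob

dn: cn=dev,ou=groups,dc=example,dc=com
objectClass: posixGroup
cn: dev
memberUid: carol
END

# Structure: @entries = ( { attribute => [ values ] }, ... )
my @entries;
for my $record (split /\n{2,}/, $ldif) {
    my %entry;
    for my $line (split /\n/, $record) {
        my ($attr, $value) = $line =~ /^([^:]+):\s*(.*)$/ or next;
        push @{ $entry{$attr} }, $value;   # repeated attributes accumulate
    }
    push @entries, \%entry if %entry;
}

# One line per entry: its dn followed by its members, if any.
for my $entry (@entries) {
    my $members = join ',', @{ $entry->{memberUid} || [] };
    print "$entry->{dn}[0]: $members\n";
}
```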

The most complicated example was where I needed to audit the “sudo” rights a user has. I accomplished this by parsing the sudoers file and associating the user, host, group and command alias sets together by reference, with utility subroutines that would dump the detail out of the related array structure for a given input reference. This involved loading individual array sets according to the alias type. Reporting then had two modes for associating the rulesets together logically: by user, showing which hosts and which commands they can run, or by host, showing which users were authorized to run sudo and for which commands. There were some limitations and assumptions here (e.g. how sudo handles group-based rules) that the reporting could not accommodate, but this provided an 80% solution where there was no solution.
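The general shape of that sudoers audit can be sketched as below. This is not the original script: the sudoers lines are made up, the two regular expressions handle only the simplest alias and rule syntax, and the expand subroutine is a hypothetical helper. Group-based rules, Runas specifications and tags are not handled.

```perl
use strict;
use warnings;

# Hypothetical, heavily simplified sudoers content.
my @sudoers = (
    'User_Alias ADMINS = alice, bob',
    'Host_Alias WEB = web1, web2',
    'Cmnd_Alias RESTART = /sbin/service httpd restart',
    'ADMINS WEB = RESTART',
);

# Structures:
#   %alias = ( alias type => { alias name => [ members ] } )
#   @rules = ( { users => name, hosts => name, cmnds => name }, ... )
my (%alias, @rules);
for my $line (@sudoers) {
    if ($line =~ /^(User|Host|Cmnd)_Alias\s+(\w+)\s*=\s*(.+)$/) {
        $alias{$1}{$2} = [ split /\s*,\s*/, $3 ];
    }
    elsif ($line =~ /^(\S+)\s+(\S+)\s*=\s*(.+)$/) {
        push @rules, { users => $1, hosts => $2, cmnds => $3 };
    }
}

# Utility sub: expand a name through its alias set, or return it unchanged.
sub expand {
    my ($type, $name) = @_;
    return @{ $alias{$type}{$name} || [$name] };
}

# Report by user: which hosts and which commands.
my @report;
for my $rule (@rules) {
    for my $user (expand('User', $rule->{users})) {
        my $hosts = join ',', expand('Host', $rule->{hosts});
        my $cmnds = join ',', expand('Cmnd', $rule->{cmnds});
        push @report, "$user on $hosts: $cmnds";
    }
}
print "$_\n" for @report;
```

Reporting by host instead would loop over the expanded host list in the outer loop and join the expanded users.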

Here are a few “how-to’s” that give detailed instructions and examples for multidimensional arrays:

Last modified February 25, 2021: version 2.0 (70b449f)