Just Counting – a tool ecology for personal numeric information

Numbers are part of day-to-day life, from household budgeting to climate change. But, for many, dealing with numeric information is daunting, with multiple step changes in complexity as well as managing information sources. This paper explores this space through an ecosystem of interconnected prototype tools: TSoW interpreting orders of magnitude; calQ, a four-function calculator that shifts seamlessly to micro-spreadsheet; WS2 embedding spreadsheet-like features in web pages; and myData collating and connecting the diverse data sources. Collectively, these offer an envisionment to prompt discussion both of how end-users can more easily deal with numeric information and of the necessary technical infrastructure.


INTRODUCTION
"The power to understand and predict the quantities of the world should not be restricted to those with a freakish knack for manipulating abstract symbols." Bret Victor [33]
Numbers are part of day-to-day life, from household budgeting to making sense of global warming and planning academic projects. This was particularly clear during the Covid pandemic, when viral spread factors (R) and exponential growth became common parlance. However, there is constant talk of a 'numeracy crisis' [35], which affects the everyday life of many but also has large-scale effects. In terms of economics, a report commissioned by the UK charity National Numeracy suggests that poor numeracy costs the UK economy £25 billion a year [22], an impact that will be replicated across the world. Perhaps the greatest deficit, though, is in terms of governance, as so many vital national and international issues, such as those above, require a numerically informed citizenry.
Of course, this paper does not solve this issue, but it does attempt to explore how appropriate tools could help. Note that the focus throughout will be on 'data in the small', the scraps of numerical and structured information we encounter day by day in professional and domestic lives. So, while in some ways this can be seen as data-science-in-the-small, it is closer to the world of personal information management (PIM) [4,17]; that is, the goal is personal numeric information management.
The theoretical roots of this work go back many years, including Kay's analysis of VisiCalc [18], systems such as HyperCard, Victor's idea of 'explorable explanations' [32], and more recent recognition of the need for 'qualitative-quantitative' reasoning [8]. This work also relates to end-user development / end-user programming research; however, Barricelli et al.'s systematic 2019 literature review [2] suggests a gap in personal numeric information.
This paper presents an ecosystem of interconnected prototype tools (Fig. 1) that explore this space: calQ, a four-function calculator that shifts seamlessly to micro-spreadsheet; myData, collating and connecting the diverse data sources we encounter; WS2, a WordPress plugin to embed spreadsheet-like features in web pages; and TSoW (the size of Wales), interpreting unfamiliar orders of magnitude. These tools offer an envisionment to prompt discussion of both the way end-users can more easily deal with numeric information and the background technical infrastructure necessary for this to happen.
The goal is not to promote these tools individually, nor even in composite, but more to use them as a technology probe [15] to help think about these issues and prompt further work.

PRINCIPLES
A number of design principles have driven both the specific tools and the collection as a whole. This principles-focused approach has a long history within HCI and innovation. On the academic side this includes Harold Thimbleby's generative user engineering principles (GUEPs) [29,30] and Thomas Green's cognitive dimensions [5,13].
Commercial design examples include both the Xerox Star [27], which gave rise to the graphical user interface, and, more recently, the popular personal knowledge management tool Obsidian [23].
The first principle, incremental/smooth transitions, is most pervasive. In all aspects of learning, but especially numbers where people have existing anxiety, we need to create smooth paths between levels of knowledge, and avoid barriers that need to be climbed to achieve the next stage.
The second principle, the leaves are golden [7], recognises and values how people use many ways to collect and store their information. Rather than creating an all-encompassing system, a more user-centred approach is to accept as far as possible the existing ways people manage their data and work with these. There may need to be small adaptations, but the digital system should do the heavy lifting of dealing with multiple (sometimes weakly conflicting) kinds of heterogeneous data. Such a system may still import data in order to more effectively deal with it, but anything held centrally is a cache; the user's own data is golden.
The third principle is to interrelate, where possible, dual intensional/extensional representations; that is, to visualise both the formula/algorithm (intensional representation) and its outputs (extensional representation) either simultaneously, or nearly so. This reduces hiddenness, one of Green's earliest cognitive dimensions [13], and also increases comprehension, as intensional and extensional representations allow different forms of critique. For example, in the formula "max(100,min(0,percent))", it is easy to see that we have the right maximum and minimum bounds for a percentage, but we might need to look at some examples of how the transformation works to realise we have min and max precisely the wrong way round (a common mistake the author makes).
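The percentage example can be made concrete with a minimal Python sketch. The function names and sample values are illustrative assumptions, not part of any of the tools. Listing a few outputs (the extensional view) next to the formula (the intensional view) shows at once that the formula as written returns 100 for every input, whereas swapping min and max clamps as intended.

```python
def clamp_as_written(percent):
    # The formula quoted in the text, with min and max the wrong way round.
    return max(100, min(0, percent))


def clamp_intended(percent):
    # Swapping min and max gives the intended clamp to the range 0..100.
    return min(100, max(0, percent))


# A handful of illustrative inputs, including out-of-range values.
samples = [-20, 0, 35, 100, 150]
for p in samples:
    print(p, clamp_as_written(p), clamp_intended(p))
```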
Rapid incremental feedback and continuous representation have always been a key feature of direct manipulation [12,26]. Where there are dual intensional/extensional representations, this implies the need for immediate effect as seen in spreadsheet recalculations or, albeit slightly indirectly, in notebook-style coding interfaces such as Jupyter notebooks. At a more theoretical level, this is also related to code-data duality, the notion that in some senses computation and its output differ principally in when you look at them: variables are past computation remembered. Papert's Turtle Graphics exploited this principle: the execution of the Logo code was evident in the trace left behind on paper [24]. This is equally true for all ages; indeed, it is often when students debug their code, running through it line by line, seeing the variables change, that they really understand its meaning.
Finally, undergirding several of the prototypes is the principle that use is development, or more directly: the best path to good code is through real use cases. For example, Knuth's literate programming [19] arose from his need to manage the code of TeX. More recently, start-ups talk about 'eating our own dogfood' [14], that is, using their own software. Concrete examples improve communication and make it more likely that systems will fulfil real purposes, chiming with Victor's vision for greater access to mathematics and coding [34]. Looking forward, the presence of example-driven design offers many opportunities for AI interventions.

DESIGN FOR A LIGHTWEIGHT NUMERIC ECOSYSTEM

Process
Figure 1 shows four aspects in the use of numeric and structured information. The ecosystem includes the new tools described in this paper (in italics below) as well as some existing tools.
Capture: Information is encountered in news items, reports, or in domestic settings such as receipts. Currently data items can be captured using the web-clipping research tool Snip!t [9] and imported from the popular Readwise application. These are, respectively, proof-of-concept for deep integration and use of commercial APIs. They are both text-based, but iVolver [21] demonstrates that it is also possible to recover the underlying data from graphical representations such as pie charts.
Combine: Often one wants to combine data from multiple sources, for example, regional health spending from a government report with population numbers and demographics from census results. Downloading and managing this data is often messy and error-prone. myData allows users to identify different data sources and to describe the way these should be transformed and recombined, whilst automatically maintaining the relation between derived data and sources, both in terms of provenance and live data.
Calculate: In some cases the capture and recording of data is sufficient, but often some form of calculation may be required. This may be as simple as adding up expenditure on certain categories, but may also include more complicated 'back of the envelope' models. Within this aspect calQ deals with small sums and WS2 with complex calculations and richer data.
Comprehend: Both WS2 and TSoW aim to aid the understanding of data published on the web. TSoW is focused on individual numbers themselves and WS2 helps users to both expose and explore the calculations behind published data.
We will now look at the four prototype tools in more detail.

calQ – re-imagining a desk calculator
At first glance, calQ is just a web-based four-function calculator, like many others but with the notable addition of a virtual till roll. It used to be common for desktop calculators in shops and offices to print a record of calculations, making it easy to check past sums and also re-enter numbers that had previously been calculated. This was lost in hand-held calculators with tiny fixed displays, but is possible again with larger laptop and phone screens. In addition to being visually able to scroll back over past computations, the till roll enables past results to be copied into the current calculation using a single click or touch, largely obviating the need for explicit memory operations.
When the till roll gets too full a new roll can be started and the old one is saved, not unlike putting the cut-off paper roll in a drawer or file.
Copied values reference the step they came from (e.g. '@5' for step 5). This serves as an aide-mémoire to help trace the path of calculation (intensional/extensional), but also hints at further functionality. Any previous step in the calculation can be given a short name, which is then automatically shown in any (past or future) formulae that reference it (Fig. 2, left), making the till roll look more like a human description (e.g. 'amount = cost + cost × tax rate'). Past steps can also be edited, resulting in recalculation (immediate effect), so that the four-function calculator becomes a one-dimensional spreadsheet. A past till roll can also be copied to allow a weekly calculation to be redone simply by changing a few values. These saved till rolls can be exported as a spreadsheet (see Fig. 2, right) or transformed into reusable functions.
Note how calQ offers a smooth transition between four-function use and abstracted reusable (fairly simple) code. At each stage the calculation is not abstract; it is working on specific values (intensional/extensional), but it can be abstracted by reuse and so allow generalisation into formulae or code (use as development).
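The till roll as a one-dimensional spreadsheet can be sketched in a few lines of Python. This is an illustrative model only, not calQ's actual implementation: the class `TillRoll`, its methods and the sample values are all assumptions made for this sketch. Steps are either plain numbers or simple `left op right` formulae whose operands may be '@n' references or step names; values are computed on demand, so editing a past step is reflected in every later step that depends on it.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


class TillRoll:
    """Hypothetical model of a till roll: numbered steps, optional names."""

    def __init__(self):
        self.steps = []   # each step is a number or a (left, op, right) tuple
        self.names = {}   # short name -> 1-based step number

    def add(self, step, name=None):
        self.steps.append(step)
        if name:
            self.names[name] = len(self.steps)
        return len(self.steps)  # the step number, usable as '@n'

    def edit(self, number, step):
        # Replace a past step; dependants pick up the change when evaluated.
        self.steps[number - 1] = step

    def _operand(self, x):
        # Operands are plain numbers, '@n' references, or step names.
        if isinstance(x, str):
            number = int(x[1:]) if x.startswith("@") else self.names[x]
            return self.value(number)
        return x

    def value(self, number):
        step = self.steps[number - 1]
        if isinstance(step, tuple):
            left, op, right = step
            return OPS[op](self._operand(left), self._operand(right))
        return step

    def _label(self, x):
        # Prefer a step's short name over its '@n' reference when one exists.
        if isinstance(x, str) and x.startswith("@"):
            by_number = {n: name for name, n in self.names.items()}
            return by_number.get(int(x[1:]), x)
        return str(x)

    def describe(self, number):
        step = self.steps[number - 1]
        target = self._label(f"@{number}")
        if isinstance(step, tuple):
            left, op, right = step
            return f"{target} = {self._label(left)} {op} {self._label(right)}"
        return f"{target} = {step}"


roll = TillRoll()
roll.add(80.0, name="cost")                  # step 1
roll.add(0.25, name="tax rate")              # step 2
roll.add(("@1", "*", "@2"), name="tax")      # step 3
roll.add(("@1", "+", "@3"), name="amount")   # step 4
description = roll.describe(4)
before = roll.value(4)
roll.edit(1, 100.0)                          # change the cost in step 1
after = roll.value(4)
print(description, before, after)
```

Because named steps are displayed by name, the final step reads like a human description rather than a chain of '@n' references, and the same roll can be reused with new values simply by editing the early steps.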

myData – collating your scattered data
Perhaps you have a CSV download of accepted papers with authors and titles from a conference management system that need to go onto a website, or you might repeatedly do the same transformations to the arcane spreadsheet you download from the university finance system. The reader will be able to think of similar examples in domestic and academic life. Typically all this data is in different formats and spread over different machines and cloud services. myData attempts to address these issues by creating a place to record, document, transform and combine the various multiple data sources, many quite small, that we live with day by day. Following the 'leaves are golden' design principle [7], the original sources, whether a spreadsheet on your laptop or the data from a government website, are seen as primary. In essence, myData treats diverse sources as though they were an informal federated database [25]. Currently there is no explicit authoring front end, because the raw data is edited at its source and the final destination is usually a web page through WordPress plugins, or bespoke applications using myData services behind the scenes.
Under the hood, myData is built using nestable transformation blocks (Fig. 3). The set of blocks is extensible, with each block configured using JSON as an abstract notation with no fixed concrete syntax. Further blocks allow different data sources to be combined, including (should you so wish) with SQL queries. SQL is included partly as proof of principle, but does mean that, for example, a local Excel spreadsheet (accessed using a cloud service such as Dropbox), a Google Doc and a CSV document from an official website can be queried as if they were tables in the same database.
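The idea of querying separate sources as if they were tables in one database can be illustrated with a small Python sketch using the standard `sqlite3` and `csv` modules. This is not how myData is implemented; the table names, column names and values are made-up placeholders, with CSV text standing in for a download from an official website and a list of rows standing in for a local spreadsheet.

```python
import csv
import io
import sqlite3

# Two stand-in sources with invented values (assumptions for illustration).
SPENDING_CSV = "region,spend\nNorth,120\nSouth,300\n"
POPULATION_ROWS = [("North", 40), ("South", 60)]


def load_sources():
    """Copy both sources into one in-memory database (a cache, not a master)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE spending (region TEXT, spend REAL)")
    db.execute("CREATE TABLE population (region TEXT, people REAL)")
    reader = csv.DictReader(io.StringIO(SPENDING_CSV))
    db.executemany(
        "INSERT INTO spending VALUES (?, ?)",
        [(row["region"], float(row["spend"])) for row in reader],
    )
    db.executemany("INSERT INTO population VALUES (?, ?)", POPULATION_ROWS)
    return db


def spend_per_person(db):
    """Join the two sources with plain SQL."""
    query = """
        SELECT s.region, s.spend / p.people
        FROM spending AS s
        JOIN population AS p ON s.region = p.region
        ORDER BY s.region
    """
    return db.execute(query).fetchall()


result = spend_per_person(load_sources())
print(result)
```

In keeping with the 'leaves are golden' principle, the in-memory database here is rebuilt from the sources each time, so the original files remain the primary copy.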

WS2 (workspace)
WS2 has its roots in two use cases.
The first is computational. Say you have a mortgage-calculation spreadsheet, with several modifiable parameters such as inflation and interest rates. This allows experimentation, but, in order to compare different scenarios, you need to copy critical results into cells of a different worksheet, risking copy-paste errors, and requiring extensive re-working if the model changes. You could use the spreadsheet's scripting language, but this is a major step in terms of complexity. It seems as though it should be easy to transform the parameterised worksheet into something like a function.
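What 'something like a function' might mean can be sketched in Python. This is not WS2 notation: it simply uses the standard fixed-rate repayment formula, and the loan amount, term and rates are hypothetical values chosen for illustration. Once the worksheet is a function of its parameters, comparing scenarios is a matter of calling it several times rather than copying cells.

```python
def monthly_payment(principal, annual_rate, years):
    """Monthly payment on a fixed-rate repayment loan."""
    n = years * 12            # number of monthly payments
    r = annual_rate / 12      # monthly interest rate
    if r == 0:
        return principal / n  # no interest: just spread the principal
    return principal * r / (1 - (1 + r) ** -n)


# Hypothetical scenarios: the same loan under three interest rates.
scenarios = {"low": 0.03, "base": 0.05, "high": 0.07}
comparison = {
    name: round(monthly_payment(200_000, rate, 25), 2)
    for name, rate in scenarios.items()
}
for name, payment in comparison.items():
    print(name, payment)
```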

Figure 4: WS2 web page embedding – simple calculations
The second use case is about presentation. Sometimes one wants to make a web-based calculator, say for wind resistance whilst cycling. Victor's reactive documents are a richer example of this [32]. It is possible to do this by writing JavaScript code, potentially making use of a framework such as React. However, this is again a major step in skills for the typical web content author.
WS2 itself consists of two parts. The first, addressing the first use case, is an extensible block-based notation. Like myData's transformation blocks, WS2 uses a JSON-based abstract syntax, but with a plugin scheme for block visualisations and editors. WS2 has a declarative live-updating computational model, like that in a spreadsheet (immediate effect). The standard blocks include a table block that is rather like a spreadsheet table, but slightly more structured in that columns have default calculation formulae, avoiding many of the unexpected problems when formulae are not copied correctly as tables are updated. Tables can be based on existing data (from the myData API) or built as sequences with stopping criteria, allowing fully visible loop-like behaviour whilst still in an overall declarative framework. Note the code-data duality here, in this case between a loop and a table. Also, like calQ, this adopts a use is development paradigm, building abstraction bottom-up with examples of actual data at each stage.
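A table built as a sequence with a stopping criterion can be pictured with the following Python sketch. It does not use WS2's JSON notation, which is not specified here; the function `build_table`, its parameters and the example figures are assumptions for illustration. Each column has a single default formula applied to every new row, and every iteration of the 'loop' remains visible as a row of the table.

```python
def build_table(first_row, column_formulae, stop, max_rows=1000):
    """Grow a table row by row until the stopping criterion holds.

    column_formulae maps each column name to a function of the previous
    row, playing the role of that column's default formula.
    """
    rows = [dict(first_row)]
    while not stop(rows[-1]) and len(rows) < max_rows:
        prev = rows[-1]
        rows.append({col: f(prev) for col, f in column_formulae.items()})
    return rows


# Hypothetical example: a balance growing by 10% a year until it doubles.
table = build_table(
    first_row={"year": 0, "balance": 100.0},
    column_formulae={
        "year": lambda prev: prev["year"] + 1,
        "balance": lambda prev: prev["balance"] * 1.1,
    },
    stop=lambda row: row["balance"] >= 200.0,
)
for row in table:
    print(row["year"], round(row["balance"], 2))
```

Because the formula belongs to the column rather than to individual cells, adding rows cannot leave a cell with a stale or miscopied formula.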
The second part of WS2, addressing the second use case, is that it can be embedded into web pages using a JavaScript library or plugin for WordPress (which powers more than 40% of all websites worldwide [11]). Figure 4 shows a portion of a simple calculation page, which allows the reader to explore different levels of Universal Basic Income and its implications for tax rates and the national budget.

TSoW (The Size of Wales)
In the UK it is common for news presenters, when talking about large land areas, to use Wales or the Isle of Wight as units of measurement, for example, "an iceberg the size of Wales". Similar expressions are found elsewhere; for example, an Australian documentary described the annual water loss in the Murray Darling River Basin as "two Sydney-harbours full" [1].
However, many numbers we read have no such day-to-day explanations; for example, a BBC News web item described ice loss from Greenland as "over 35,000 cubic metres of ice" [20], which is about enough to fill a largish street with ice, hardly world changing. In fact the true amount was a million times more: the reporter had misread "km³" in a UNESCO report [31] to mean one thousand cubic metres rather than a cubic kilometre.
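The factor of a million follows directly from the units, as this short Python check shows (the variable names are illustrative): a cubic kilometre is 1000 × 1000 × 1000 cubic metres, whereas the misreading treats it as only one thousand.

```python
M_PER_KM = 1_000
M3_PER_KM3 = M_PER_KM ** 3   # cubic metres in one cubic kilometre
MISREAD_M3 = 1_000           # "km3" misread as "one thousand cubic metres"

# How many times too small the misread figure is.
error_factor = M3_PER_KM3 // MISREAD_M3
print(error_factor)
```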
Imagine if the reporter or editor were able to see the everyday description alongside the number, both to communicate with the reader (when the figure is correct) and for their own understanding (when the figure is wrong!). TSoW does this. Figure 5 shows it being applied to the text of web material about the damage caused by Hurricane Ida, converting "478 square kilometers" into "1¼ times the size of the Isle of Wight" (which has an area of 146.8 sq miles [16]) accompanied by (an envisionment of) a graphical view of the comparison. The informal scale comparisons are configurable; so, for example, the Italian option would read "about twice the size of Elba" (at 223 sq km [28]) as well as give sizes in square kilometres rather than square miles.
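The conversion behind such a comparison can be sketched in Python. This is an illustrative reconstruction, not TSoW's code: the function `familiar_size`, the `denominator` parameter and the rounding rule are assumptions, while the reference areas are the figures quoted above. The ratio of the two areas is rounded to a simple fraction so that it reads like everyday speech.

```python
from fractions import Fraction

KM2_PER_SQ_MILE = 1.609344 ** 2  # square kilometres in one square mile

# Reference areas in square kilometres, using the figures quoted in the text.
REFERENCES = {
    "the Isle of Wight": 146.8 * KM2_PER_SQ_MILE,
    "Elba": 223.0,
}


def familiar_size(area_km2, reference, denominator=4):
    """Express an area as a multiple of a reference area.

    The multiple is rounded to the nearest 1/denominator (quarters by
    default; denominator=1 rounds to a whole number).
    """
    ratio = area_km2 / REFERENCES[reference]
    multiple = Fraction(round(ratio * denominator), denominator)
    whole, part = divmod(multiple, 1)
    if part == 0:
        text = str(whole)
    elif whole == 0:
        text = str(part)
    else:
        text = f"{whole} {part}"
    return f"{text} times the size of {reference}"


print(familiar_size(478, "the Isle of Wight"))
print(familiar_size(478, "Elba", denominator=1))
```

Rounding to the nearest quarter reproduces the Isle of Wight comparison above, while rounding to a whole number gives the coarser "about twice" style of the Elba example.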

TOWARDS A RICHER ECOSYSTEM
We have seen a range of exploratory tools and how they embody the design principles in Section 2. However, as has been emphasised, the most important thing is that these are an experimental ecosystem. The tools serve to illustrate both the space of potential tool support for lightweight numerical information and the way in which they could operate together. We have seen examples of this during the tool descriptions and Figure 1 illustrates some of these interconnection paths.
The connections are of two kinds: extensional feeds, where data flows between tools; and intensional feeds, where recipes (formulae) created in one tool are used in another. Furthermore, some are live connections, notably for external data in the myData-WS2 path, but others, including all of the intensional flows, are through copy and paste. Ideally more links would be live, at least in the sense that full provenance is retained so that one knows where the data originates.

CONCLUSION AND FURTHER WORK
Each tool has novel features, and opportunities for further work. They vary in maturity from calQ, which is now in an incremental improvement stage, to TSoW, which is a proof of concept. Better authoring is required: in particular myData could use novel query mechanisms such as Query-by-Browsing, or Query-through-Drilldown, as featured at previous AVI [6,10]. However, the main goal of this paper has been to consider how the tools together can form more fluid flows, both of data (extensional) and formulae/code (intensional). In general, the former is more mature than the latter, but both require further work.
This paper is intended to open up discussion. If it has prompted you to consider better tools, or ways that existing tools could be connected more deeply and easily, it has done its job. We critically need to open up numeric information to the whole citizenry.

Figure 5: TSoW in action – the area of 478 square kilometers is presented in units familiar to the user (envisionment using extract from BBC Bitesize web page about Hurricane Ida [3])