# Excel / LibreOffice Calc – text and data processing

I’ll show how to parse text, extract numbers and calculate averages on an example of performance data. I have performed a dozen of performance scenarios, each repeated four times to get an average. Every sample contained three values: time in seconds of execution of the first step, time in seconds of execution of the second step, and memory consumption in MB. For instance, `17.6 | 7.4 | 687`

meant:

- step 1 took 17.6 seconds
- step 2 took 7.4 seconds
- the application used 687 MB of memory

I pasted the results of four runs into four consecutive cells for every scenario:

How to extract the numbers from those cells to calculate averages?

Table of contents

## Parse the title

Let’s first change the title – it is an output from a `git log --oneline`

command, but all that’s important in this summary is the title alone without the “pick” and hash texts. For example, instead of *“pick 4e28481 Performance: Added performance timers”* we want just *“Performance: Added performance timers”*.

Apparently, this is a simple task, because the text to be removed has always the same length: 4 characters of *“pick”*, 1 for space, 7 characters of the hash and again 1 for space; 13 characters in total. We have to use a function that will take a fragment of a string: `MID`

. It accepts 3 parameters:

- the source string, it will be
`A2`

- from which character the fragment should start: it will be
`14`

(ignore the first 13 characters and start from the 14th) - the length of the fragment: unfortunately we can’t just ask for all text until the end without providing the length. It will be equal to the length of the original text minus the 13 characters we want to be removed:
`LEN(A2) - 13`

- the general formula is:
`LEN(A2) - starting position + 1`

. In our case, it is`LEN(A2) - 14 + 1`

- sometimes it’s possible to simply provide any larger number, e.g.
`LEN(A2)`

or`100`

, but it’s more confusing

I put that formula into the first available cell in the second row: `F2`

. Then I used the fill feature to copy the formula onto other rows:

The last step is overwriting the column `A`

with values from column `F`

. Note that it’s not enough to copy the column `F`

onto `A`

, because there are formulas in the former column – moreover, they are dependent on the latter column. Doing so will result in an `#REF!`

error.

The correct way is: select the `F`

column by clicking the column header, right-click the `A`

column and select **Paste Special**, then **Values** or **Text**:

Finally, I dropped the `F`

column by right-clicking it and selecting **Delete**.

## Extract from a single cell

Let’s first parse one cell: `B2`

. It contains the following string: `17.6 | 7.4 | 687`

. How to extract the three values from it? One of the simplest methods is treating the `|`

characters as delimiters. First, we need to find positions of the two `|`

characters, and then copy the text between them using simple functions:

Information

**Excel** uses `,`

instead of `;`

for separating functions’ parameters for the US locale.

- I store the position of the first
`|`

character in the cell`F2`

:`=FIND("|"; B2)`

- I store the position of the second
`|`

character in the cell`G2`

. To find a second or next the same character or substring, just start looking*after*the position of the previous occurrence (the last parameter):`=FIND("|"; B2; F2+1)`

- Let’s do some math now:

- I store the first value (
`17.6`

) in the cell`H2`

. Because it starts from the beginning of the string, it’s best to use the`LEFT`

function:`=LEFT(B2; F2-2)`

. Note that the result is a string, not a number, so it cannot be used i.a. to calculate the average. To do so, we have to convert string to a number, e.g. by using the`NUMBERVALUE`

function. Its first parameter is the string to be converted to a number, and the second parameter is the decimal point character. Finally, the cell`H2`

should contain this function:`=NUMBERVALUE(LEFT(B2; F2-2); ".")`

- I store the second value (
`7.4`

) in the cell`I2`

. Because it starts in the middle of the string, it’s best to use the`MID`

function:`=MID(B2; F2+2; G2-F2-3)`

. Similarly, it should be converted to a number, so the final formula is:`=NUMBERVALUE(MID(B2; F2+2; G2-F2-3); ".")`

- I store the last value (
`687`

) in the cell`J2`

. Because it ends with the string, I can use the`RIGHT`

function:`=RIGHT(B2; LEN(B2)-G2-1)`

. Again, the formula after conversion is:`=NUMBERVALUE(RIGHT(B2; LEN(B2)-G2-1); ".")`

Now we have all values extracted from a single cell:

## Extract from other cells in a row

What should be done in order to extract the second cell, `C2`

? We could copy the same formulas as in the cells `F2`

..`J2`

, but we would have to edit every one to change the reference from `B2`

to `C2`

. It’s not extremely tedious, but there are two more cells to be parsed. It’s worth learning an other way, which will allow you to copy cells faster.

### The INDIRECT function

The key is to store a part of the reference to the cell that is being processed. In my case, that cell was `B2`

and it occurred in every function. Copying the functions would be faster if I extracted the reference `B2`

or its part to a separate cell.

Let’s add a column before the `F`

column and put the value `B`

into the cell `F2`

.

The function in `G2`

(formerly `F2`

) is `=FIND("|"; B2)`

, and it should be `=FIND("|"; F2 & "2")`

. Observe that the previous reference, `B2`

, is just a concatenation of `B`

and `2`

. Since `B`

is stored in the cell `F2`

, the address is not `F2 & "2"`

. However, `F2 & "2"`

is just a string, not an address. To make an address, we can use the `INDIRECT`

function: `INDIRECT(F2 & "2")`

. So we just need to replace `B2`

address with that function. The new formula in the cell `G2`

will be: `=FIND("|"; INDIRECT(F2 & "2"))`

.

The other formulas will be:

`H2`

:`=FIND("|"; INDIRECT(F2 & "2"); G2+1)`

`I2`

:`=NUMBERVALUE(LEFT(INDIRECT(F2 & "2"); G2-2); ".")`

`J2`

:`=NUMBERVALUE(MID(INDIRECT(F2 & "2"); G2+2; H2-G2-3); ".")`

`K2`

:`=NUMBERVALUE(RIGHT(INDIRECT(F2 & "2"); LEN(INDIRECT(F2 & "2"))-H2-1); ".")`

I know that the formulas got longer now, but they can be easily reused.

Just select the `F`

..`K`

columns, copy to clipboard, select the `L`

column and paste. Select the `R`

column and paste. And select the `X`

column and paste.

Now change the value in `L2`

to `C`

, in `R2`

to `D`

, and in `X2`

to `E`

. All the other formulas were automatically recalculated for the other data.

## Extract from other rows

We have the values extracted from a single row. Select the cells `F2`

..`AC2`

and copy them to the other rows, e.g. by using filling the same as in the above example with titles.

You may notice that every row contains the same values. Where is the problem? That the reference in `INDIRECT(F2 & "2")`

in the second row is changed to `INDIRECT(F3 & "2")`

in the other row. The `"2"`

value, which states for the address’ row number, is not changed. There is a quick way to solve it – use the `ROW`

function which returns the number of current row.

So, I replaced `INDIRECT(F2 & "2")`

with `INDIRECT(F2 & ROW())`

in every formula in the cells `FG`

..`K2`

. I repeated copying those cells to the next columns (see the last two steps in the previous chapter): to `L`

, `R`

, `X`

. Then I copied the second row (`F2`

..`AC2`

) again over the other rows, and now all values are correctly extracted.

## There is too much data

No wonder, all data is visible only after zooming out:

What’s important, not all of this data is relevant. The columns `B`

..`E`

contain raw data. The columns `F`

,`G`

, `H`

, and repeated four times for every parsed cell, contain auxiliary functions. It’s a good idea to hide them all. Select them either as the range `B`

..`H`

or one by one while holding the `CTRL`

key, then right-click one of the selected headers and choose **Hide**.

The data is more concise, but not very clear. Some headings would be helpful.

## Averages

Let’s calculate the averages now.

Put the following formula to the cell `AD2`

: `=AVERAGE(I2; O2; U2; AA2)`

to get the first timer’s average. Copy it to two next cells to calculate the second timer’s average and memory consumption’s average. Copy to the other rows for full data:

Let’s put the data onto a graph.

## Chart

I’ll show creating the chart in **LibreOffice Calc**, but it’s nearly the same in **Excel**.

- Select the column
`A`

, hold the`CTRL`

key and select also`AD`

,`AE`

,`AF`

. - Select the menu
**Insert > Chart**. - Choose
**Column and Line**(1 line) >**Stacked**. - Click the
**Finish**button. - The graph is visible, but because time and memory have a totally different scale, not much is visible. Let’s use a different scale for timers and memory.
- Select the line on the graph.
- Select
**Format > Format Selection**(it’s**Properties**in**Excel**). - Choose the
**Secondary Y axis**option and click the**OK**button. - Move the chart by its border under the data.
- Change the labels to be better visible (they should be radically trimmed in length by the way).

Check the following video guide:

## Summary of functions

`LEFT(text; length)`

– take the leftmost part of a text – the first`length`

characters.`MID(text; from; length)`

– take the middle of a text –`length`

characters from the`from`

position.`RIGHT(text; length)`

– take the rightmost part of a text – the last`length`

characters.`LEN(text)`

– get the length of text.`FIND(search_in; search_for [; from])`

– get the position of the first occurrence of`search_for`

within`search_in`

. Optionally start searching from the`from`

position.`NUMBERVALUE(text; decimal_separator)`

– convert a text into a number.`INDIRECT("A2")`

– convert a string into an address reference.`AVERAGE(range)`

– get the average value from numbers.

## Leave a Reply