What do your rules look like? editrules 1.8-x answers with the help of igraph

We (Edwin de Jonge and I) have recently updated our editrules package. The most important new features include (beta) support for categorical data. In this post, however, I'm going to show some visualizations we included, made possible by Gabor Csardi's awesome igraph package.

Make sure you run

before trying the code below.

First, let's load editrules' built-in editset:

Here, edits is a data.frame with a "name" column naming the rules, an "edit" column with a character representation of the edit rules, and a "description" column. The variables have the following meaning: t: turnover, ct: total cost, p: profit, ch: housing costs and cp: personnel cost. The rules demand that balance accounts add up (e.g. cost + profit equals turnover) and impose some sanity checks (e.g. profit cannot exceed 60% of turnover).

The sanity checks here are completely fictional. To do anything useful with these rules, turn them into an editmatrix.
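To make the idea of a matrix representation concrete, here is a minimal sketch in plain Python. It is not editrules' own code and does not reproduce the editmatrix object. It only shows how each linear rule becomes one row of coefficients plus an operator and a constant. The balance rules and the 60% profit rule come from the description above; the rule names used as keys and the helper to_matrix are assumptions made for illustration.

```python
# Illustrative only: a tiny stand-in for an edit matrix, not editrules' internals.
VARS = ["t", "ct", "p", "ch", "cp"]

# Each rule: ({variable: coefficient}, operator, constant), meaning
#   sum(coefficient * variable)  <operator>  constant
# The rule names are assumed labels.
RULES = {
    "b1": ({"t": 1, "ct": -1, "p": -1}, "==", 0),    # t == ct + p
    "b2": ({"ct": 1, "ch": -1, "cp": -1}, "==", 0),  # ct == ch + cp
    "s1": ({"p": 1, "t": -0.6}, "<=", 0),            # p <= 0.6 * t
}

def to_matrix(rules, variables):
    """Return (names, rows, ops, consts) with one coefficient row per rule."""
    names = list(rules)
    rows = [[rules[n][0].get(v, 0) for v in variables] for n in names]
    ops = [rules[n][1] for n in names]
    consts = [rules[n][2] for n in names]
    return names, rows, ops, consts

names, rows, ops, consts = to_matrix(RULES, VARS)
for name, row, op, const in zip(names, rows, ops, consts):
    print(name, row, op, const)
```

A zero in a row means the variable does not occur in that rule, which is exactly the information the graph plots later in this post are built from.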

Although the matrix representation and the textual representation have their merits, it is hard to see from either which rules are (indirectly) related via shared variables. This can be visualized by plotting the rules as a graph, where each variable and each edit is a node, and a variable node is connected to an edit node if the variable occurs in that edit. Just do

to get the plot below.
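The graph construction itself is simple enough to sketch without editrules or igraph. The snippet below is a rough illustration in plain Python: it links every edit to the variables it contains, finds the connected components of that graph, and counts how many variables each pair of edits shares. The mapping EDIT_VARS is a hypothetical subset of rules chosen for the example, and the function names are my own, not part of either package.

```python
from collections import defaultdict
from itertools import combinations

# Assumed rule -> variables mapping (hypothetical, for illustration only).
EDIT_VARS = {
    "b1": {"t", "ct", "p"},
    "b2": {"ct", "ch", "cp"},
    "s1": {"p", "t"},
    "s2": {"cp", "t"},
    "p3": {"cp"},
}

def bipartite_edges(edit_vars):
    """One edge per (edit, variable) pair where the variable occurs in the edit."""
    return sorted((e, v) for e, vs in edit_vars.items() for v in vs)

def components(edit_vars):
    """Connected components of the edit/variable graph (depth-first search)."""
    adj = defaultdict(set)
    for e, v in bipartite_edges(edit_vars):
        adj[("edit", e)].add(("var", v))
        adj[("var", v)].add(("edit", e))
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        comps.append(comp)
    return comps

def shared_variables(edit_vars):
    """Edit-to-edit links, weighted by the number of shared variables."""
    return {
        (a, b): len(edit_vars[a] & edit_vars[b])
        for a, b in combinations(sorted(edit_vars), 2)
        if edit_vars[a] & edit_vars[b]
    }

print(len(components(EDIT_VARS)))  # number of independent blocks
print(shared_variables(EDIT_VARS))
```

A single component means the rules cannot be handled as separate, smaller problems; more than one component would mean the rule set splits into independent blocks.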

The round, blue nodes represent variables and the square nodes represent edit rules.
You can see at a glance that everything is connected, so the editmatrix cannot be split into independent blocks (submatrices). If you want to leave out the variables and just see how the edits are connected, use

to get

Here, (slightly) thicker lines indicate that more variables are shared. Plotting connections between variables can be done with

which you can try for yourself. We can do cooler stuff. For example, let's define a faulty record and detect which rules it violates.

So only edit s2 is violated. We can visualize this as well.
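As a rough illustration of what such a violation check does, the sketch below writes a few rules as plain Python predicates and lists the ones a record fails. Everything in it is an assumption for the sake of the example: the 0.3 threshold in s2, the exact form of p3, and the record values, which were picked so that only s2 fails. It is not the editrules implementation.

```python
# Hypothetical rules as predicates on a record (a dict of variable values).
RULES = {
    "b1": lambda r: r["t"] == r["ct"] + r["p"],    # turnover balance
    "b2": lambda r: r["ct"] == r["ch"] + r["cp"],  # cost balance
    "s1": lambda r: r["p"] <= 0.6 * r["t"],        # profit sanity check
    "s2": lambda r: r["cp"] <= 0.3 * r["t"],       # assumed threshold
    "p3": lambda r: r["cp"] > 0,                   # assumed positivity rule
}

def violated(rules, record):
    """Names of the rules that the record does not satisfy."""
    return [name for name, ok in rules.items() if not ok(record)]

# Assumed record: both balances hold, but personnel cost is too high.
record = {"ct": 100, "ch": 30, "cp": 70, "p": 30, "t": 130}
print(violated(RULES, record))
```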


Edit graph with violated edits colored

This shows the complexity of error localization at a glance. We can try to adapt cp, but this might cause a violation of b2 or p3. So here's the central question in error localization: what is the least (weighted) number of variables we have to change such that all violated rules can be satisfied without causing new violations? editrules was actually written to answer this question. There are several functions performing this task, but here we'll use the low-level errorLocalizer function and plot the result.

Edit graph, with violated edits and variables to adapt colored red


So, in order to repair the record, the turnover t needs to be altered and, to make sure no other rules are violated, the profit p has to be altered as well.
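To see why two variables are needed, the search can be sketched as a brute force over subsets of variables: free the variables in a subset, keep the rest at their recorded values, and test whether the rules can still be satisfied. The sketch below does this in plain Python with exact fractions, using Fourier-Motzkin elimination for the feasibility test. It is a naive illustration with equal weights, not a description of how errorLocalizer works internally. The rule set (including the 0.3 thresholds and the positivity rules) and the record are assumptions.

```python
from fractions import Fraction as F
from itertools import combinations

def leq(coefs, const=0, strict=False):
    """Constraint  sum(coef * var) + const <= 0  (or < 0 when strict)."""
    return ({v: F(c) for v, c in coefs.items()}, F(const), strict)

def eq(coefs):
    """An equality is written as a pair of opposite inequalities."""
    return [leq(coefs), leq({v: -c for v, c in coefs.items()})]

# Hypothetical rule set; the 0.3 thresholds and positivity rules are assumed.
RULES = (
    eq({"ct": 1, "p": 1, "t": -1})        # b1: t == ct + p
    + eq({"ch": 1, "cp": 1, "ct": -1})    # b2: ct == ch + cp
    + [
        leq({"p": 1, "t": F(-3, 5)}),     # s1: p  <= 0.6 t
        leq({"cp": 1, "t": F(-3, 10)}),   # s2: cp <= 0.3 t
        leq({"ch": 1, "t": F(-3, 10)}),   # s3: ch <= 0.3 t
    ]
    + [leq({v: -1}, strict=True) for v in ("t", "ch", "cp", "ct")]  # v > 0
)

def substitute(rules, fixed):
    """Plug the fixed values into every constraint."""
    out = []
    for coefs, const, strict in rules:
        free = {v: c for v, c in coefs.items() if v not in fixed}
        const = const + sum(c * fixed[v] for v, c in coefs.items() if v in fixed)
        out.append((free, const, strict))
    return out

def feasible(cons, variables):
    """Fourier-Motzkin elimination: True if the constraints have a solution."""
    for v in variables:
        upper = [c for c in cons if c[0].get(v, 0) > 0]
        lower = [c for c in cons if c[0].get(v, 0) < 0]
        rest = [c for c in cons if c[0].get(v, 0) == 0]
        for uc, uk, us in upper:
            for lc, lk, ls in lower:
                a, b = uc[v], -lc[v]  # both positive
                coefs = {
                    w: uc.get(w, 0) * b + lc.get(w, 0) * a
                    for w in set(uc) | set(lc) if w != v
                }
                rest.append((coefs, uk * b + lk * a, us or ls))
        cons = rest
    # Only constants are left: each must satisfy its own (strict) inequality.
    return all(k < 0 if strict else k <= 0 for _, k, strict in cons)

def localize(rules, record):
    """Smallest set of variables whose release makes the record repairable."""
    names = sorted(record)
    for size in range(len(names) + 1):
        for adapt in combinations(names, size):
            fixed = {v: x for v, x in record.items() if v not in adapt}
            if feasible(substitute(rules, fixed), adapt):
                return set(adapt)
    return None

record = {"ct": 100, "ch": 30, "cp": 70, "p": 30, "t": 130}
print(sorted(localize(RULES, record)))
```

Under these assumptions, no single variable can be changed to repair the record, because the two balance equations pin it back to its recorded value. The number of subsets grows exponentially with the number of variables, which is why a dedicated search is needed for realistic rule sets.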

If you don't like the colors or want to play with the igraph objects yourself, see the as.igraph or adjacency functions.

Oh, and if you wonder which values are possible for p and t, just substitute all the other values into the editmatrix:

The solution set to the above system of equations is the set of possible values for t and p.
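Substituting values is plain bookkeeping: every term with a known variable moves to the constant side. Here is a minimal sketch of that step in plain Python, independent of editrules. The three rules, the 0.3 threshold in s2, the substituted values, and the helper subst are all assumptions made for illustration.

```python
from fractions import Fraction as F

# Hypothetical rules: ({variable: coefficient}, operator, constant), meaning
#   sum(coefficient * variable)  <operator>  constant
RULES = {
    "b1": ({"t": 1, "ct": -1, "p": -1}, "==", 0),   # t == ct + p
    "s1": ({"p": 1, "t": F(-3, 5)}, "<=", 0),       # p  <= 0.6 t
    "s2": ({"cp": 1, "t": F(-3, 10)}, "<=", 0),     # cp <= 0.3 t (assumed)
}

def subst(rules, values):
    """Move every term with a known variable to the constant side."""
    out = {}
    for name, (coefs, op, const) in rules.items():
        left = {v: c for v, c in coefs.items() if v not in values}
        moved = sum(c * values[v] for v, c in coefs.items() if v in values)
        out[name] = (left, op, const - moved)
    return out

# Assumed values for the variables that are kept as recorded.
reduced = subst(RULES, {"ct": 100, "ch": 30, "cp": 70})
for name, (left, op, const) in reduced.items():
    print(name, left, op, const)
```

With these assumed numbers the reduced system reads t - p == 100, p - 0.6 t <= 0 and -0.3 t <= -70, so t must lie between 700/3 (about 233.3) and 250, with p = t - 100.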


2 Responses to What do your rules look like? editrules 1.8-x answers with the help of igraph

  1. Robin says:

    Very interesting and potentially extremely helpful. This seems to be limited to a set of linear inequalities. How could if-then type statements be modified into linear inequalities to be used with this package?

    • mark says:

      Thanks for your response.

      You bring up a good point. The thing is that relations like "IF x > 0 THEN y > 0" are not linear. They are, however, quite common: for example, if x is the number of people employed and y the amount of wages paid by an establishment. So far the bad news. The good news is that we are planning to include functionality for such rules, probably for version 3.0 of the package. Versions 1.8-2.0 will have some improved functionality (and probably bugfixes) for categorical edits.

      We are very interested in users' experiences with the package, so don't hesitate to send us bug reports or suggestions!
