[aplusdev] Primitive function Deal?

stevan apter sapter at earthlink.net
Fri Sep 7 19:39:08 EDT 2001


----- Original Message ----- 
From: "Alexander Skomorokhov" <askom at obninsk.com>
To: "stevan apter" <sapter at earthlink.net>; <aplusdev at d13.com>
Sent: Friday, September 07, 2001 1:36 PM
Subject: RE: [aplusdev] Primitive function Deal?


> Stevan,
> 
> I just did one more experiment. I copied on a floppy:
> 17.06.2001  21:23                5.120 k.exe
> 17.06.2001  21:28              175.104 k20.dll
> 07.09.2001  04:04                3.038 olap.k           /* you script
> 
> (totally only 183K!). Put this floppy to a machine (W2000), which never
> heard about K, change to a: and typed in DOS prompt
> 
> k olap.k
> 
> And it works! Terrific.

yeah, isn't a small executable nice?

> > it bears repeating that A+ has everything you need to duplicate
> > (or improve) the k toy.
> I probably try to do this. Are you ready to answer some questions
> if they arise?

i'll try, but my A+ is really really rusty.  (also:  i'm embarrassed
to admit but i can't figure out how to boot A+ from the CD brian & john
so kindly provided.  perhaps someone can tell me how make this happen
(i'm on NT)).

> 
> The first one is (probably stupid one) meaning of:
> "data in the form of a hierarchical star scheme, denormalized".
> Could you please explain this in brief?

i'll do my best, although database specialists might justly accuse me of
mangling standard concepts.

start with a table Fact like this:

Fact:
product location time price qty
------- -------- ---- ----- ---

where records in Fact are transactions.  so a typical record of Fact is:

thumbscrew, usa-new jersey-short hills, 19990909, 10.99, 3

you can have duplicates records in Fact -- they represent transactions.
price and qty are "measure" variables; the other fields are "coordinates"
or "dimension fields".

Fact is denormalized, insofar as 'thumbscrew' can exist many times in 
the product field.

so normalize Fact by creating three "dimension" tables, each of which
has one field and contains no duplicates:

Product:
product
----

Location:
product
-------

Time:
time
----

and then replace the product, location, and time fields in Fact with
indices into the corresponding tables.

the Fact table points to the three dimension tables, so the system is
called a "star scheme" (i guess because it looks like one when a professional
dba type gets the job of documenting it).

now expand each dimension table by breaking out the parts of its single
field into several fields:

Product:
industry color size
-------- ----- ----

Location:
country state city
------- ----- ----

Time
year month day
---- ----- ---

still no duplicate records in the dimension tables, and the Fact indices
are still pointers into the corresponding dimension tables.  now your
dimensions are "hierarchical".  reading right to left in the Product
table, you have size within color within industry.

the dimension tables aren't fully normalized, but you can play the same
game with them.  e.g. create Industry, Color, and Size tables of one field
each:

Industry:
industry
--------

Color:
color
-----

Size:
size
----

and replace the industry, color, and size fields of Product with pointers
into the corresponding dimension tables.  now you've got something professional
db types call a "snowflake scheme".  i think.  at any rate, it's quite an 
elegant structure, and easy to comprehend.

you can imagine generalizing this so you can have highly irregular dimensions.

my olap toy has hierarchical dimensions, but isn't normalized.  e.g. the
first three records of Fact are:

  +Fact[;!3]
((`office;`blue;`small;`Paris;`France;2000;2;2;11;23.0)
 (`office;`green;`medium;`Rome;`Italy;1997;1;27;377;520.0)
 (`office;`purple;`tiny;`Milan;`Italy;1995;0;7;106;522.0))

which would cause Real Database Professionals to gag and run screaming from
the room.  

my starting point in a "serious" implementation for A+ would be to spend some
time designing the data-structures for a full-blown snowflake.  obviously the fields
are float-vectors, pointer fields are integers, and tables are contexts (or are they
slotfillers?  i forget.  in fact, does A+ distinguish floats and ints?  boy it's
been a while.)

then i would probably write a function 'select' which implements

    select agg(measure),...,agg(measure)
    where a.b.c.d=x,...,e.f.g=y
    group by s.t.u,...,v.w

which is the key multi-dimensional selection query.  as arthur whitney has done in
kdb, the dots in the example mean that you chase pointers through dimensions.

if you're going to allow programmatic access to the data, you have to implement
a parser, which i recommend that you defer.  instead, write the select function
as a function of 3 arguments, represented in some way:  a list of agg(measure)'s,
a list of dimension=value's, and a list of axes.

notice that multiple aggregations means that each "cell" contains several values,
and that more than 2 axes means that each physical axis of the display contains
several rows/columns of dimensional information.  have a look at excel's pivot
table to see how this might look.

to start with, i wouldn't try to get fancy by drilling down from the current
selection.  each time the user clicks to drill, i would return to the whole
Fact table and apply the selection query.  but that's a performance detail, so
don't worry about it.

hope this gives you some ideas - i need a beer.


> 
> > and the A+ gui could be used to spectacular effect.  as i recall,
> > there is a primitive matrix widget with labelled rows and columns, which
> olap.k has to
> > fake with k's table widget.
> If it works in "ASCII-mode", then fancy GUI is not a problem at all:-)
> 
> > in some remarks about language benchmarking, arthur suggested
> > that the code for an
> > rdbms is a better measure than incrementing an integer a million
> > times.  that's a lot
> > to ask from most languages, but perhaps an olap toy is not.
> Interesting remark. Well,  incrementing an integer doesn't look like a good
> benchmark any way.
> 
> Regards,
> Sasha.
> 




More information about the apluslist mailing list