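Sorry, the questions are in English. If the points offered seem too low, I can add more! Any help from the experts here would be appreciated, thanks!
10 points per question!
I. Data warehouse design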
(1) Enumerate three classes of schemas that are popularly used for modeling data warehouses.
(2) Draw a snowflake schema diagram for the Big_University data warehouse, which consists of four dimensions (student, course, semester, and instructor) and two measures (count and avg_grade). At the lowest concept level, avg_grade is the actual grade of the student; at higher concept levels, avg_grade is the average grade for the given student, course, semester, and instructor.
(3) Starting with the base cuboid (student, course, semester, instructor), what specific OLAP operations should be performed in order to list the average grade of each student who has taken a “CS” course (e.g., roll up from “semester” to “year”)?
(4) If each dimension contains 5 levels (including all), e.g., student < major < status < university < all, how many cuboids are there in this data cube (including the base cuboid and the apex cuboid)?
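For (4), the counting can be sketched in a few lines. This is only an illustration of the usual rule that a cuboid picks one level per dimension, where "all" counts as a level; the dictionary and the function name `count_cuboids` are made up for this example.

```python
from math import prod

# Number of levels per dimension, counting the virtual top level "all",
# as stated in question (4).
levels_including_all = {
    "student": 5,
    "course": 5,
    "semester": 5,
    "instructor": 5,
}

def count_cuboids(levels):
    """Each cuboid chooses exactly one level (possibly 'all') per dimension,
    so the total is the product of the per-dimension level counts."""
    return prod(levels.values())

total_cuboids = count_cuboids(levels_including_all)
print(total_cuboids)  # 625, i.e. 5 ** 4
```

If the problem had instead given the level count without "all", one would add 1 per dimension before multiplying.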
II. Data cube computation
Suppose a base cuboid has 3 dimensions, (A, B, C), with the number of cells shown below: |A| = 1,000,000, |B| = 100, and |C| = 1,000. Suppose each dimension is partitioned evenly into 10 portions for chunking.
(1) Assuming each dimension has only one level, draw the complete lattice of the cube.
(2) If each cube cell stores one measure with 4 bytes, what is the total size of the computed cube if the cube is dense?
(3) If the cube is very sparse, describe an effective multidimensional array structure to store the sparse cube.
(4) State the order for computing the chunks in the cube which requires the least amount of space, and compute the total amount of main memory space required for computing the 2-D planes.
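The arithmetic behind (1), (2), and (4) can be checked with a rough sketch like the one below. It assumes the usual accounting for multiway array aggregation (keep the whole smallest 2-D plane, one row of the next plane, and one chunk of the last), and all function names here are hypothetical helpers, not part of any library.

```python
from itertools import combinations, permutations
from math import prod

# Cardinalities and settings taken from the problem statement.
card = {"A": 1_000_000, "B": 100, "C": 1_000}
BYTES_PER_CELL = 4
PARTITIONS = 10

def lattice(dims):
    """All group-bys of the cube, from the apex () up to the base cuboid."""
    return [c for k in range(len(dims) + 1) for c in combinations(dims, k)]

def dense_cube_cells(card):
    """Sum of cell counts over every cuboid, assuming a fully dense cube."""
    return sum(prod(card[d] for d in cuboid) for cuboid in lattice(list(card)))

def plane_memory_cells(card, scan_order, partitions=PARTITIONS):
    """Cells kept in memory for the three 2-D planes when chunks are scanned
    with scan_order[0] varying fastest and scan_order[2] varying slowest."""
    fast, mid, slow = scan_order
    chunk = {d: card[d] // partitions for d in card}
    whole_plane = card[fast] * card[mid]   # plane that omits the slowest dim
    one_row = card[fast] * chunk[slow]     # plane that omits the middle dim
    one_chunk = chunk[mid] * chunk[slow]   # plane that omits the fastest dim
    return whole_plane + one_row + one_chunk

cuboids = lattice(list(card))
total_cells = dense_cube_cells(card)
total_bytes = total_cells * BYTES_PER_CELL

# Try every scan order and keep the one with the smallest footprint.
best_order = min(permutations(card), key=lambda o: plane_memory_cells(card, o))
best_cells = plane_memory_cells(card, best_order)

print(len(cuboids))                 # 8 cuboids in the lattice
print(total_cells, total_bytes)     # 101101101101 cells, 404404404404 bytes
print(best_order, best_cells)       # ('B', 'C', 'A') 20100000
print(best_cells * BYTES_PER_CELL)  # 80400000 bytes
```

Under these assumptions the dense cube size equals (|A|+1)(|B|+1)(|C|+1) cells, and the cheapest scan order leaves the largest dimension, A, varying slowest so that only the small BC plane has to be held in full.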
III. Mining association rules
Suppose we have the following transactional data.
TID Items_bought
T100 {K, A, D, B}
T200 {D, A, C, E, B}