Generating test data that matches existing data

I’ve had to create test data a number of times, including data for tables that had mandatory foreign keys to existing tables. It was not feasible to just create new master rows for my test data; I wanted to refer to a random sample of existing data; but the code that generates the test data had to perform reasonably well, even though it had to pick out some random values from a very large table.

Solution? A combination of the new 10g SAMPLE operator, and DBMS_RANDOM. To illustrate:


(create a “very large table”)
SQL> create table t as
2 select rownum n, dbms_random.string(‘a’,30) v
3 from all_objects;

Table created.

SQL> select count(*) from t;

COUNT(*)
———-
40981

(get a random sample from the table)
SQL> select n, substr(v,1,30) from t sample(0.01)
2 order by dbms_random.value;

N SUBSTR(V,1,30)
———- ——————————
11852 xSsdmFtGqkymbKCFoZwUzNxpJAPwaV
8973 RGyNjqMfVayKdiKFGvLYuAFYUpIbCw
25295 eJJtoieSWtzUTIZXCbOLzmdmWHHPOy
297 hiTxUPYKzWKAjFRYTTfJSSCuOwGGmG
1924 yZucJWgkFviAIeXiSCuNeUuDjClvxt
40646 wMTumPxfBMoAcNtVMptoPchILHTXJa

6 rows selected.

SQL> set serveroutput on

(Get a single value chosen at random)
SQL> declare
2 cursor cur_t is
3 select n from t sample(0.01)
4 order by dbms_random.value;
5 l number;
6 begin
7 open cur_t;
8 fetch cur_t into l;
9 dbms_output.put_line(l);
10 close cur_t;
11* end;
SQL> /
21098

PL/SQL procedure successfully completed.


My test code would open the cursor, fetch as many values as it needed, and then close it. If the cursor ran out of values (e.g. the sample was too small for the desired amount of test data, which varied), my code just re-opened the cursor to fetch another set of random values from the large table.

The reason I sort the sample by dbms_random.value is so that if I only want one value, it is not weighted towards rows found nearer the start of the table.

Note: If I didn’t really care about the sample being picked at random from throughout the table, I could have just selected from the table “where rownum < n".