Data Load Tips for Oracle

Disabling Foreign Key Constraints For Certain Rows

You might want to populate a table with many rows of bogus test data. You can disable a table's constraints while you load the rows, then re-enable them afterwards without checking the integrity of the loaded rows.

The following SQL example demonstrates this using applications in regions: the region ID on each application is a foreign key to a regions lookup table. The trick is to re-enable the constraint using "enable novalidate". This doesn't validate existing rows when the constraint is enabled, but does validate any inserts and updates from that point onwards.

-- Table for our regions
create table test_regions (
  region_id   integer constraint test_region_id_pk primary key,
  description varchar2(100)
);

-- create a bunch of regions
insert into test_regions values (1,'N');
insert into test_regions values (2,'E');
insert into test_regions values (3,'S');
insert into test_regions values (4,'W');
insert into test_regions values (5,'NE');
insert into test_regions values (6,'SE');
insert into test_regions values (7,'SW');
insert into test_regions values (8,'NW');

-- Table for our applications
create table test_apps (
  app_id    integer constraint test_app_id_pk primary key,
  region_id integer constraint test_region_id_fk references test_regions
);

-- sequence for app_id
create sequence test_app_id_seq start with 1000;

-- create a few test apps
insert into test_apps values (test_app_id_seq.nextval,1);
insert into test_apps values (test_app_id_seq.nextval,8);
insert into test_apps values (test_app_id_seq.nextval,8);
insert into test_apps values (test_app_id_seq.nextval,6);
insert into test_apps values (test_app_id_seq.nextval,7);

-- verify the referential integrity check (gives ORA-02291)
insert into test_apps values (test_app_id_seq.nextval,10);

-- create a bunch of dummy apps with bad region ids
alter table test_apps modify constraint test_region_id_fk disable;

insert into test_apps
select test_app_id_seq.nextval, 10 from all_objects;

alter table test_apps modify constraint test_region_id_fk enable novalidate;

-- verify the referential integrity check again (gives ORA-02291)
insert into test_apps values (test_app_id_seq.nextval,10);

The above example also demonstrates selecting from ALL_OBJECTS in the insert as a handy way of generating lots of rows (this view usually contains at least 10,000 rows). Note that you can still update the "bad" rows as long as you don't touch the "bad" column. For example, this will work...

update test_apps set app_id = 20000 where app_id = 10000;

...even though the region_id for app_id 10,000 is invalid (10). However, this won't work:

update test_apps set region_id = 12 where app_id = 10000;
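If you later want full integrity back, one approach (just a sketch, using the tables above) is to delete the offending rows and then re-enable the constraint with full validation:

-- remove the rows whose region_id has no matching region
delete from test_apps a
where not exists (
  select null from test_regions r
  where r.region_id = a.region_id
);

-- with the bad rows gone, a full validation succeeds
alter table test_apps modify constraint test_region_id_fk enable validate;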

Randomising Row Order

If you want to select rows from a table in random order, create a view that orders by a hash of the current time in 100ths of a second (from DBMS_UTILITY.GET_TIME) combined with a column value.

Using the TEST_REGIONS table from the above example:

CREATE OR REPLACE VIEW rnd_test_regions AS
SELECT DBMS_UTILITY.GET_HASH_VALUE (
         TO_CHAR(dbms_utility.get_time) || description
         , 2
         , 1048576
       ) "RANDOM_ORDER"
     , r.*
FROM test_regions r
ORDER BY RANDOM_ORDER;
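GET_TIME moves on between executions, so the view returns its rows in a different order each time you query it. That makes a simple top-N query a handy way to pull a random sample:

-- grab three rows at random
select region_id, description
from rnd_test_regions
where rownum <= 3;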

Working with CSV files

As of Oracle 9i you can reference files such as CSVs stored outside the database as if they were tables (this feature is called "External Tables"). This is a fairly simplistic introduction.

Let's say you have a CSV of users like this:

10000,michael.hinds@quest.ie,Michael,Hinds
10001,keith.daly@quest.ie,Keith,Daly
10002,tomas.walsh@quest.ie,Tomas,Walsh

The file must be on the database server. For this example we'll say the database OS is Linux and the file is at /home/oracle/users.csv.

First of all we create a directory object pointing at the directory containing the file:

create or replace directory CSV_DIR AS '/home/oracle';
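If the external table will be created by a different user to the owner of the directory object, that user needs to be granted access to it first. For example, assuming a hypothetical user SCOTT:

-- read for the CSV itself, write for the bad/log files
grant read, write on directory CSV_DIR to scott;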

Now we can create an Oracle table to store the CSV data:

create table user_data_csv (
  user_id     integer,
  email       varchar2(100),
  first_names varchar2(100),
  last_name   varchar2(100)
)
organization external (
  type oracle_loader
  default directory CSV_DIR
  access parameters (
    records delimited by newline
    badfile 'users.bad'
    logfile 'users.log'
    fields terminated by ','
  )
  location ('users.csv')
)
reject limit unlimited
/
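At this point the file can be queried like any other table. A quick sanity check:

-- should return the three rows from users.csv
select * from user_data_csv;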

Oracle processes the external file every time the table is accessed, so querying it is horribly inefficient compared to a normal Oracle table. If you are running queries against CSV data that is relatively static, it's a good idea to copy it from the external table into an internal one. For example:

create table user_data as select * from user_data_csv;

You can of course add keys and indexes to the internal table afterwards, if required.
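For example (the constraint and index names here are just illustrative):

alter table user_data add constraint user_data_pk primary key (user_id);
create index user_data_email_i on user_data (email);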

WARNING: External tables cause exports to fail! I drop the external table as soon as I've copied the data to the internal one.
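Dropping the external table only removes its definition from the data dictionary; the CSV file itself is untouched:

drop table user_data_csv;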


Jan 2006