@@ -7,7 +7,8 @@ Schema hints are taken from [a post on Meta.StackExchange](http://meta.stackexch
77## Dependencies
88
99 - [`lxml`](http://lxml.de/installation.html)
10- - [`psychopg2`](http://initd.org/psycopg/docs/install.html)
10+ - [`psycopg2`](http://initd.org/psycopg/docs/install.html)
11+ - [`libarchive-c`](https://pypi.org/project/libarchive-c/)
1112
1213## Usage
1314
@@ -18,14 +19,14 @@ Schema hints are taken from [a post on Meta.StackExchange](http://meta.stackexch
1819 ` Badges.xml ` , ` Votes.xml ` , ` Posts.xml ` , ` Users.xml ` , ` Tags.xml ` .
1920 - In some old dumps, the filenames use different letter case.
2021 - Execute in the current folder (in parallel, if desired):
21- - ` python load_into_pg.py Badges `
22- - ` python load_into_pg.py Posts `
23- - ` python load_into_pg.py Tags ` (not present in earliest dumps)
24- - ` python load_into_pg.py Users `
25- - ` python load_into_pg.py Votes `
26- - ` python load_into_pg.py PostLinks `
27- - ` python load_into_pg.py PostHistory `
28- - ` python load_into_pg.py Comments `
22+ - ` python load_into_pg.py -t Badges `
23+ - ` python load_into_pg.py -t Posts `
24+ - ` python load_into_pg.py -t Tags ` (not present in earliest dumps)
25+ - ` python load_into_pg.py -t Users `
26+ - ` python load_into_pg.py -t Votes `
27+ - ` python load_into_pg.py -t PostLinks `
28+ - ` python load_into_pg.py -t PostHistory `
29+ - ` python load_into_pg.py -t Comments `
2930 - Finally, after all the initial tables have been created:
3031 - ` psql stackoverflow < ./sql/final_post.sql `
3132 - If you used a different database name, make sure to use that instead of
@@ -34,7 +35,25 @@ Schema hints are taken from [a post on Meta.StackExchange](http://meta.stackexch
3435 - ` psql stackoverflow < ./sql/optional_post.sql `
3536 - Again, remember to use the correct database name here, if not `stackoverflow`.
3637
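The per-table commands above can be launched in parallel from a small wrapper. The following is only a sketch: `build_commands` and `run_all` are hypothetical helper names, not part of the repository, and the sketch assumes `load_into_pg.py` sits in the current folder and accepts the `-t` switch shown in the list.

```python
import subprocess
import sys

# Table names as listed in the usage steps; Tags is absent from the earliest dumps.
TABLES = ["Badges", "Posts", "Tags", "Users", "Votes",
          "PostLinks", "PostHistory", "Comments"]


def build_commands(tables, script="load_into_pg.py"):
    """Return one argv list per table, each using the -t switch."""
    return [[sys.executable, script, "-t", table] for table in tables]


def run_all(tables=TABLES, script="load_into_pg.py"):
    """Start one loader process per table, wait for all, return {table: exit code}."""
    procs = {
        table: subprocess.Popen(cmd)
        for table, cmd in zip(tables, build_commands(tables, script))
    }
    return {table: proc.wait() for table, proc in procs.items()}
```

Calling `run_all()` starts all eight loaders at once; pass a shorter list (for example without `Tags`) when a dump lacks a file. A non-zero exit code in the returned dictionary marks a table that should be retried before running `sql/final_post.sql`.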
37- ## Caveats
38+ ## Loading a complete stackexchange project
39+
40+ You can use the script to download a given stackexchange compressed file from
41+ [archive.org](https://ia800107.us.archive.org/27/items/stackexchange/) and load
42+ all the tables at once, using the ` -s ` switch.
43+
44+ You will need the `urllib` and `libarchive` modules; `libarchive` is provided by the `libarchive-c` package listed under Dependencies.
45+
46+ If you give a schema name using the ` -n ` switch, all the tables will be moved
47+ to the given schema. The script creates this schema.
48+
49+ To load the _dba.stackexchange.com_ project in the `dba` schema, you would execute:
50+ ` ./load_into_pg.py -s dba -n dba `
51+
52+ The paths are not changed in the final scripts ` sql/final_post.sql ` and
53+ `sql/optional_post.sql`. To run them, first set the _search_path_ to your
54+ schema name: ` SET search_path TO <myschema>; `
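One way to apply the `search_path` step without editing the SQL files is to prepend the `SET` statement to the script text before handing it to `psql`. This is a minimal sketch: `with_search_path` is a hypothetical helper, not something the repository provides, and it assumes the schema name is a plain unquoted identifier.

```python
import re

# Accept only simple unquoted identifiers; anything else would need quoting rules.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def with_search_path(sql_text, schema):
    """Return sql_text prefixed with a SET search_path statement for schema."""
    if not _IDENT.fullmatch(schema):
        raise ValueError(f"unsupported schema name: {schema!r}")
    return f"SET search_path TO {schema};\n{sql_text}"
```

For instance, the result of `with_search_path(open("sql/final_post.sql").read(), "dba")` could be piped to `psql stackoverflow` on standard input, so the `SET` and the script run in the same session.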
55+
56+ ## Caveats and TODOs
3857
3958 - It prepares some indexes and views which may not be necessary for your analysis.
4059 - The ` Body ` field in ` Posts ` table is NOT populated by default. You have to use ` --with-post-body ` argument to include it.