forked from mediawiki-client-tools/mediawiki-dump-generator
Open
Description
When I tried to resume my wiki dump after stopping it with Ctrl+C, I got an AttributeError.
Here is the output from the first run:
wdg --delay 0.5 https://dragonvale.fandom.com/wiki/DragonVale_Wiki --xml --images
Checking API... https://dragonvale.fandom.com/api.php
API is OK: https://dragonvale.fandom.com/api.php
Checking index.php... https://dragonvale.fandom.com/index.php
check_index(): Trying Special:Random...
POST https://dragonvale.fandom.com/index.php {'title': 'Special:Random'} 301
GET https://dragonvale.fandom.com/wiki/Special:Random {'title': 'Special:Random'} 302
GET https://dragonvale.fandom.com/wiki/Orange_Flame_of_Corruption {'title': 'Special:Random'} 200
index.php available probability: 90% (0.9)
index.php is OK
No --path argument provided. Defaulting to:
[working_directory]/[domain_prefix]-[date]-wikidump
Which expands to:
./dragonvale.fandom.com-20250828-wikidump
Undo monkey patch...
#########################################################################
# Welcome to DumpGenerator 4.4.4 by WikiTeam3 (GPL v3) #
# More info at: <https://github.com/saveweb/wikiteam3> #
# Copyright (C) 2011-2025 WikiTeam developers #
#########################################################################
Analysing https://dragonvale.fandom.com/api.php
No suitable dump found at Internet Archive
Trying generating a new dump into a new directory...
https://dragonvale.fandom.com/api.php
Retrieving the XML for every page from the beginning
Retrieving the XML for every page
Loading page titles from namespaces = all
Excluding titles from namespaces = None
38 namespaces found
Retrieving titles in the namespace 0
Retrieving titles in the namespace 1
Retrieving titles in the namespace 2
Retrieving titles in the namespace 3
Retrieving titles in the namespace 4
Retrieving titles in the namespace 5
Retrieving titles in the namespace 6
Delay 0.5s: Session delay: wikiteam3.dumpgenerator.api.page_titles ^C.Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/site-packages/mwclient/listing.py", line 63, in __next__
item = next(self._iter)
StopIteration
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.13/bin/wikiteam3dumpgenerator", line 8, in <module>
sys.exit(main())
~~~~^^
File "/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/site-packages/wikiteam3/dumpgenerator/__init__.py", line 4, in main
DumpGenerator()
~~~~~~~~~~~~~^^
File "/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/site-packages/wikiteam3/dumpgenerator/dump/generator.py", line 86, in __init__
DumpGenerator.createNewDump(config=config, other=other)
~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/site-packages/wikiteam3/dumpgenerator/dump/generator.py", line 112, in createNewDump
generate_XML_dump(config=config, session=other.session)
~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/site-packages/wikiteam3/dumpgenerator/dump/xmldump/xml_dump.py", line 146, in generate_XML_dump
doXMLExportDump(config, session, xmlfile, lastPage)
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/site-packages/wikiteam3/dumpgenerator/dump/xmldump/xml_dump.py", line 71, in doXMLExportDump
for title in read_titles(config, session=session, start=start):
~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/site-packages/wikiteam3/dumpgenerator/api/page_titles.py", line 266, in read_titles
getPageTitles(config=config, session=session)
~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/site-packages/wikiteam3/dumpgenerator/api/page_titles.py", line 215, in getPageTitles
for title in titles:
^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/site-packages/wikiteam3/dumpgenerator/api/page_titles.py", line 47, in getPageTitlesAPI
for page in site.allpages(namespace=namespace):
~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/site-packages/mwclient/listing.py", line 188, in __next__
info = super(GeneratorList, self).__next__()
File "/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/site-packages/mwclient/listing.py", line 69, in __next__
self.load_chunk()
~~~~~~~~~~~~~~~^^
File "/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/site-packages/mwclient/listing.py", line 199, in load_chunk
return super(GeneratorList, self).load_chunk()
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
File "/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/site-packages/mwclient/listing.py", line 99, in load_chunk
data = self.site.get(
'query', (self.generator, self.list_name),
*[(str(k), v) for k, v in self.args.items()]
)
File "/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/site-packages/mwclient/client.py", line 300, in get
return self.api(action, 'GET', *args, **kwargs)
~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/site-packages/mwclient/client.py", line 358, in api
info = self.raw_api(action, http_method, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/site-packages/mwclient/client.py", line 569, in raw_api
res = self.raw_call('api', data, retry_on_error=retry_on_error,
http_method=http_method)
File "/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/site-packages/mwclient/client.py", line 497, in raw_call
stream = self.connection.request(http_method, url, **args)
File "/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/site-packages/requests/sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/site-packages/wikiteam3/utils/monkey_patch.py", line 142, in new_send
Delay(msg=self.delay_msg, config=self.config)
~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/site-packages/wikiteam3/dumpgenerator/cli/delay.py", line 57, in __init__
time.sleep(delay)
~~~~~~~~~~^^^^^^^
KeyboardInterrupt
Undo monkey patch...
And here is the output from the second run, where I tried to resume:
wdg --delay 0.5 https://dragonvale.fandom.com/wiki/DragonVale_Wiki --xml --images
Checking API... https://dragonvale.fandom.com/api.php
API is OK: https://dragonvale.fandom.com/api.php
Checking index.php... https://dragonvale.fandom.com/index.php
check_index(): Trying Special:Random...
POST https://dragonvale.fandom.com/index.php {'title': 'Special:Random'} 301
GET https://dragonvale.fandom.com/wiki/Special:Random {'title': 'Special:Random'} 302
GET https://dragonvale.fandom.com/wiki/Valeodon_Dragon/Pedestal {'title': 'Special:Random'} 200
index.php available probability: 90% (0.9)
index.php is OK
No --path argument provided. Defaulting to:
[working_directory]/[domain_prefix]-[date]-wikidump
Which expands to:
./dragonvale.fandom.com-20250828-wikidump
Undo monkey patch...
#########################################################################
# Welcome to DumpGenerator 4.4.4 by WikiTeam3 (GPL v3) #
# More info at: <https://github.com/saveweb/wikiteam3> #
# Copyright (C) 2011-2025 WikiTeam developers #
#########################################################################
Analysing https://dragonvale.fandom.com/api.php
Warning!: "./dragonvale.fandom.com-20250828-wikidump" path exists
There is a dump in "./dragonvale.fandom.com-20250828-wikidump", probably incomplete.
If you choose resume, to avoid conflicts, some parameters you have chosen in the current session will be ignored
and the parameters available in "./dragonvale.fandom.com-20250828-wikidump/config.json" will be loaded.
Do you want to resume (y/n)? y
You have selected: YES
Loading config file to resume...
Resuming previous dump process...
Resuming XML dump from "Main Page" (revision id 2043)
https://dragonvale.fandom.com/api.php
Removing the last chunk of past XML dump: it is probably incomplete.
len(incomplete_segment.encode("utf-8")) returned 3938, while os.path.getsize(filename) returned 3938, so fh.truncate() would be fh.truncate(0), which would be illegal. Something is seriously wrong here!
Adding newline to end of ./dragonvale.fandom.com-20250828-wikidump/dragonvale.fandom.com-20250828-history.xml
WARNING: will try to start the download...
Retrieving the XML for every page
Failed to find title in last trunk XML: b'<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.11/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.11/ http://www.mediawiki.org/xml/export-0.11.xsd" version="0.11" xml:lang="en">\n <siteinfo>\n <sitename>DragonVale Wiki</sitename>\n <dbname>dragonvale</dbname>\n <base>https://dragonvale.fandom.com/wiki/DragonVale_Wiki</base>\n <generator>MediaWiki 1.43.1</generator>\n <case>first-letter</case>\n <namespaces>\n <namespace key="-2" case="first-letter">Media</namespace>\n <namespace key="-1" case="first-letter">Special</namespace>\n <namespace key="0" case="first-letter"/>\n <namespace key="1" case="first-letter">Talk</namespace>\n <namespace key="2" case="first-letter">User</namespace>\n <namespace key="3" case="first-letter">User talk</namespace>\n <namespace key="4" case="first-letter">DragonVale Wiki</namespace>\n <namespace key="5" case="first-letter">DragonVale Wiki talk</namespace>\n <namespace key="6" case="first-letter">File</namespace>\n <namespace key="7" case="first-letter">File talk</namespace>\n <namespace key="8" case="first-letter">MediaWiki</namespace>\n <namespace key="9" case="first-letter">MediaWiki talk</namespace>\n <namespace key="10" case="first-letter">Template</namespace>\n <namespace key="11" case="first-letter">Template talk</namespace>\n <namespace key="12" case="first-letter">Help</namespace>\n <namespace key="13" case="first-letter">Help talk</namespace>\n <namespace key="14" case="first-letter">Category</namespace>\n <namespace key="15" case="first-letter">Category talk</namespace>\n <namespace key="110" case="first-letter">Forum</namespace>\n <namespace key="111" case="first-letter">Forum talk</namespace>\n <namespace key="112" case="first-letter">Data</namespace>\n <namespace key="113" case="first-letter">Data talk</namespace>\n <namespace key="114" case="first-letter">DragonVale World Wiki</namespace>\n <namespace 
key="115" case="first-letter">DragonVale World Wiki talk</namespace>\n <namespace key="420" case="first-letter">GeoJson</namespace>\n <namespace key="421" case="first-letter">GeoJson talk</namespace>\n <namespace key="500" case="first-letter">User blog</namespace>\n <namespace key="501" case="first-letter">User blog comment</namespace>\n <namespace key="502" case="first-letter">Blog</namespace>\n <namespace key="503" case="first-letter">Blog talk</namespace>\n <namespace key="828" case="first-letter">Module</namespace>\n <namespace key="829" case="first-letter">Module talk</namespace>\n <namespace key="1200" case="first-letter">Message Wall</namespace>\n <namespace key="1201" case="first-letter">Thread</namespace>\n <namespace key="1202" case="first-letter">Message Wall Greeting</namespace>\n <namespace key="2000" case="first-letter">Board</namespace>\n <namespace key="2001" case="first-letter">Board Thread</namespace>\n <namespace key="2002" case="first-letter">Topic</namespace>\n <namespace key="2900" case="first-letter">Map</namespace>\n <namespace key="2901" case="first-letter">Map talk</namespace>\n </namespaces>\n </siteinfo>\n <page>\n <title>Main Page</title>\n <ns>0</ns>\n <id>2043</id>\n <redirect title="DragonVale Wiki"/>\n <revision>\n <id>3903</id>\n <timestamp>2011-09-10T18:44:21Z</timestamp>\n <contributor>\n <username>Default</username>\n <id>49312</id>\n </contributor>\n <comment>moved [[Main Page]] to [[DragonVale Wiki]]: SEO</comment>\n <origin>3903</origin>\n <model>wikitext</model>\n <format>text/x-wiki</format>\n <text bytes="29" sha1="nxiu2vtgfda3mrzza98jtsj2xu9ffo7" xml:space="preserve">#REDIRECT [[DragonVale Wiki]]</text>\n <sha1>nxiu2vtgfda3mrzza98jtsj2xu9ffo7</sha1>\n </revision>\n </page>\n</mediawiki>'
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.13/bin/wikiteam3dumpgenerator", line 8, in <module>
sys.exit(main())
~~~~^^
File "/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/site-packages/wikiteam3/dumpgenerator/__init__.py", line 4, in main
DumpGenerator()
~~~~~~~~~~~~~^^
File "/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/site-packages/wikiteam3/dumpgenerator/dump/generator.py", line 84, in __init__
DumpGenerator.resumePreviousDump(config=config, other=other)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/site-packages/wikiteam3/dumpgenerator/dump/generator.py", line 169, in resumePreviousDump
generate_XML_dump(
~~~~~~~~~~~~~~~~~^
config=config,
^^^^^^^^^^^^^^
session=other.session,
^^^^^^^^^^^^^^^^^^^^^^
resume=True,
^^^^^^^^^^^^
)
^
File "/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/site-packages/wikiteam3/dumpgenerator/dump/xmldump/xml_dump.py", line 146, in generate_XML_dump
doXMLExportDump(config, session, xmlfile, lastPage)
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/site-packages/wikiteam3/dumpgenerator/dump/xmldump/xml_dump.py", line 62, in doXMLExportDump
start = lastPage.find('title').text
^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'text'
I think this happens because the first run was interrupted before it had retrieved all the page titles (it stopped while listing namespace 6).
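For reference, here is a minimal sketch of how a `.find('title')` lookup can return `None` on a chunk like the one printed in the log, which then produces the same kind of AttributeError. It assumes the standard-library `xml.etree.ElementTree` parser; whether wikiteam3 uses this parser or this lookup is not confirmed here, and `last_title` is a hypothetical helper, not a wikiteam3 function. The export XML declares a default namespace, so an unqualified lookup from the root misses the element, while a namespace-aware search with a `None` check does not crash.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Shortened stand-in for the chunk shown in the log (same root namespace).
CHUNK = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.11/" version="0.11" xml:lang="en">
  <page>
    <title>Main Page</title>
    <ns>0</ns>
    <id>2043</id>
  </page>
</mediawiki>"""

# Prefix "mw" is an arbitrary local alias for the export namespace.
NS = {"mw": "http://www.mediawiki.org/xml/export-0.11/"}


def last_title(chunk: bytes) -> Optional[str]:
    """Hypothetical helper: return the last <title> text, or None if absent."""
    root = ET.fromstring(chunk)
    titles = root.findall(".//mw:title", NS)
    return titles[-1].text if titles else None


root = ET.fromstring(CHUNK)

# Unqualified lookup: no direct, un-namespaced <title> child exists,
# so calling .text on the result would raise AttributeError.
print(root.find("title"))

# Namespace-aware lookup with a guard instead of a bare .text access.
print(last_title(CHUNK))
```

With a guard like this, the resume step could fall back to starting from the first title instead of raising, though that is a design choice for the maintainers.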
Also, unrelated to this: when I resume a dump with --resume --path and pass a different --delay value, the new value does not override the one from the original run.
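To illustrate the behaviour I would expect for the delay, here is a rough sketch. It is not how wikiteam3 loads its config; `merge_resume_config`, the `overridable` tuple, and the dictionary keys are all hypothetical names chosen for the example. The idea is that the saved config.json stays authoritative, except for a short allow-list of settings (such as the delay) that are safe to change between runs.

```python
def merge_resume_config(saved: dict, cli: dict, overridable=("delay",)) -> dict:
    """Hypothetical merge: saved values win, except allow-listed keys
    that were explicitly given on the command line (not None)."""
    merged = dict(saved)  # copy so the loaded config is not mutated
    for key in overridable:
        if cli.get(key) is not None:
            merged[key] = cli[key]
    return merged


# Values as they might look after the first run (illustrative only).
saved = {"delay": 0.5, "xml": True, "images": True}

# On resume the user passes a new delay; other flags are still ignored.
print(merge_resume_config(saved, {"delay": 2.0, "xml": False}))
# {'delay': 2.0, 'xml': True, 'images': True}
```

Keeping everything else locked to the saved values would preserve the "to avoid conflicts" behaviour described in the resume prompt.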