Possible engine changes and their impact on our plans
The game is a 64-bit executable
After ZUN found the switch to enable SSE2 instructions during the development of Double Dealing Character (which threw me off for 2 hours when preparing the binary hacks for the trial version), flipping the "64-bit switch" is the scariest compiler-related change that has yet to happen.
There are two main problems here. Firstly, portability to 64-bit wasn't exactly a concern when writing thcrap, since it is a just-in-time memory patcher targeting 32-bit games. As a result, the code may contain quite a number of implicit 32-bit assumptions.
Once there is a working 64-bit release of OllyDbg, I will immediately start porting thcrap to 64-bit. This will probably take ½ to 1 week.
Probability
Thus, if we assume the data of Steam's monthly hardware survey to be an accurate representation of the OS distribution among gamers, 18.30% (as of July 2014) of ZUN's potential audience would not be able to play the game if it were 64-bit.
In contrast, Windows XP now has a market share of 4.94% in this survey, and ZUN still supports it as of Impossible Spell Card. Note that, although the initial trial and full versions of Double Dealing Character did not run on XP, this was fixed in their corresponding updates.
I also don't think that people generally switch to the 64-bit edition of the same OS during its lifetime. So, all in all, it looks like the tipping point might only be reached once Windows 9 is released...
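To put the two survey figures side by side (assuming, as above, that the survey is representative): the share of players who can't run a 64-bit executable is roughly 3.7 times the XP share that ZUN still bothers to support.

```python
# Steam hardware survey figures quoted above (July 2014), in percent.
share_no_64bit = 18.30  # players who could not run a 64-bit build
share_winxp = 4.94      # players on Windows XP, which ZUN still supports

# How many times larger the locked-out audience would be than the XP audience.
ratio = share_no_64bit / share_winxp
print(f"{ratio:.1f}x")
```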
ZUN suddenly starts to think that .NET is a good idea
- → We’re FUCKED
- Currently (April 2014), I don't think that .NET support should be part of thcrap's patching scope
Code
- Tasofro-style segmented data file loading code (the complete data file is never in one contiguous block of memory)
- → We’re FUCKED
- … okay, well, if it’s halfway sane (see DynaMarisa), we could do it and have the patch one day later
- PC-98 games used to do this to some extent...
Formats
- Vastly different message file format
- → We’re FUCKED (for real this time)
- Hasn’t happened since th06
- Slightly different message file format (e.g. structure sizes have changed) or new opcodes relevant to us
- → Requires a new format descriptor and, for the former, possibly an update to the .msg patcher
- Nothing that would stop us
- The former hasn’t happened since th09, the latter not since th11
Development
Plaintext vs. Images
Steps 1 and 2 are required for every official version of a game.
Step 0: Collect patch-relevant information about the game
This is not necessary for the patch itself, only to ease the work of the translators. This needs to be done by other people.
- Which characters are there, and where do they appear? (Bosses, midbosses, additional characters talking) Insert these names into MediaWiki:Thpatch-chars.
- We need a complete list of spell cards with their correct IDs to verify these later. Thus, get another person to play through the game on every difficulty (just continue through all the way).
- Do we have people with good Japanese knowledge wanting to help? If yes: spend your time on transcribing images, not dialogue or spells or anything else. We're going to dump all plaintext in the course of this workflow, but we can't dump text from images.
General support
Step 1: Hash the game
- sha256sum th??.exe
- yes, SHA-256 because I trust you that little
- after this, people can already select the game in the configuration tool
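If sha256sum isn't at hand (say, on a Windows machine without coreutils), Python's standard hashlib computes the same hash. A minimal sketch that mimics sha256sum's output format for every th??.exe in the current directory:

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks (1 MiB by default)."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Same output format as `sha256sum th??.exe`: "<hash>  <file name>"
    for exe in sorted(Path(".").glob("th??.exe")):
        print(f"{sha256_of_file(exe)}  {exe.name}")
```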
Step 2: Search for breakpoints
file_size
- String search for “Decode” should get us near the loading code
- if not there anymore (new logging format or more aggressive compiler optimization?), trace back from ReadFile calls
- An address where both the file name and the file size are in registers at the same time
- if not applicable, add a separate file_name breakpoint
file_load
- Function call shortly after file_size which returns the fully unpacked and decrypted file
- if there is no such thing anymore, we’re FUCKED (this is exactly why we don’t do Tasofro games ourselves)
- ... except, of course, if the "function call" is merely inlined - see th06, th08 and th09
- file_buffer: register that contains the address of the final file buffer
file_loaded
- some place near the end of the file_load function
- should require no parameters of its own; if the function allocates a new buffer, though (th08 and th09 do), specify that in file_buffer here
update_poll
- Keyed to BGM switches
- String search for “Streming BGM PreLoad” or “bgmfile is not find” [sic]
- Beginning of that function
Step 3: Dump all data
With these breakpoints, we now have an on-the-fly data dumper, without having to know anything about the .dat format.
So... let's write a small chunk of assembly to dump it all!
- Set a debugging breakpoint on file_size
- -> file_load is this function. Confirm that, there should be lots of calls to it.
- -> file_table should be in some register.
- Step out of this function to free up critical sections and stuff.
- Search for all calls to KERNEL32.HeapFree - there shouldn't be too many.
- -> heap_free is a wrapper around this function, accepting only one parameter.
With these values, find a nice spot, adjust the code and paste it there, and jump to it.
Dump the entire game archive (file_dump_loop)
Description: In this example, file_load takes the file name in ECX and the target address for the file size in EDX.
Address: A nice place
Code:
- be 00000000 | mov esi,file_table
- 83ec 04 | sub esp,4 ; allocate a local variable to store the file size
- 89e2 | mov edx,esp
- 8b0e | mov ecx,dword ptr ds:[esi]
- 85c9 | test ecx,ecx ; end of list?
- 74 13 | je short +0x15 ; to int3
- 31c0 | xor eax,eax
- 50 | push eax
- e8 00000000 | call file_load
- 50 | push eax
- e8 00000000 | call heap_free
- 83c6 10 | add esi,10 ; next file_table entry, 0x10 bytes further
- eb e5 | jmp short -0x19 ; back to mov edx,esp
- cc | int3 ; that's it, we're done
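The control flow of file_dump_loop can be modeled in a few lines of Python, which might help when adapting it to a game whose file_table looks different. This sketch only assumes what the listing implies: entries are 0x10 bytes apart, each starts with a 32-bit little-endian pointer to the file name, and a NULL pointer ends the list. The fake memory layout and the file names are made up for the example, and the file_load/heap_free calls are left out.

```python
import struct

ENTRY_SIZE = 0x10  # `add esi,10` in the listing: entries are 16 bytes apart


def read_cstring(memory, ptr):
    """Read a NUL-terminated ASCII string starting at offset `ptr`."""
    end = memory.index(b"\0", ptr)
    return memory[ptr:end].decode("ascii")


def walk_file_table(memory, table_ptr):
    """Yield the file name of every entry until a NULL name pointer."""
    ptr = table_ptr
    while True:
        (name_ptr,) = struct.unpack_from("<I", memory, ptr)  # mov ecx,[esi]
        if name_ptr == 0:                                     # test ecx,ecx / je
            return
        yield read_cstring(memory, name_ptr)                  # file_load would go here
        ptr += ENTRY_SIZE                                     # add esi,10


def build_fake_memory(names):
    """Hypothetical layout: file_table at offset 0, name strings right after it."""
    memory = bytearray((len(names) + 1) * ENTRY_SIZE)  # zeroed, incl. terminator entry
    for i, name in enumerate(names):
        name_ptr = len(memory)
        memory += name.encode("ascii") + b"\0"
        struct.pack_into("<I", memory, i * ENTRY_SIZE, name_ptr)
    return bytes(memory)


mem = build_fake_memory(["title.anm", "st01.ecl", "bgm.fmt"])
print(list(walk_file_table(mem, 0)))  # ['title.anm', 'st01.ecl', 'bgm.fmt']
```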
Step 4: Upload images
For the longest time, I was terribly scared of those... but once I did the implementation, it actually turned out to be the easiest thing to patch! Because it's also the one thing that requires the most effort to translate, we'll start with them, so that the image editors can immediately get to work.
So far, the ANM format only changed with Subterranean Animism, and has stayed constant since then. Script instructions have come and gone, yeah, but that's nothing we care about.
This means that, as soon as we have general dumping support, we'll also have image patching and sprite boundary dumping support. All it takes now is a simple thanm x on every file. Then, just look through the extracted images to see what can be translated, upload that, and the rest is up to the image editors.
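Running thanm over a whole dump directory is easy to script. A minimal sketch, assuming the dumped files sit in one directory with their .anm extensions intact; the `x` argument form is copied from the text above, so if your thtk build prints a different usage line, adjust `thanm_commands` accordingly.

```python
import subprocess
from pathlib import Path


def thanm_commands(dump_dir):
    """Build one `thanm x <file>` command line per .anm file in dump_dir."""
    anm_files = sorted(Path(dump_dir).glob("*.anm"))
    return [["thanm", "x", str(path)] for path in anm_files]


def extract_all(dump_dir):
    """Run thanm on every dumped .anm file; check=True stops at the first failure."""
    for cmd in thanm_commands(dump_dir):
        subprocess.run(cmd, check=True)
```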
In-game dialogue
Step 5: Fix the buffer overflow in MSG rendering
For some reason, ZUN sprintfs every line into a fixed-width char buffer. Since the dialogue strings don't contain format information, we could just remove this sprintf call and pass the raw text pointer to the rendering function instead...
... but we fix it by doing our safe sprintf. Yes, there are instances where the input is actually a format string: the Music Room (for unheard tracks) or the Spell Practice menu come to mind. The Music Room usually is the fastest way to invoke a relevant sprintf call.
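Here's a small Python model of what changes between ZUN's fixed-width buffer and a safe sprintf. It is only a model: Python can't actually overflow a buffer, so the fixed-width version raises OverflowError where C's sprintf would silently write past the end. Encoding the result as UTF-8 is an assumption about how translated text reaches the buffer, and both function names are made up for this sketch.

```python
def sprintf_fixed(buffer, fmt, *args):
    """Model of formatting into a fixed-width char buffer.

    C's sprintf would silently write past the end of the buffer; this
    model raises OverflowError instead so the overflow becomes visible.
    """
    data = (fmt % args).encode("utf-8") + b"\0"  # +1 byte for the NUL terminator
    if len(data) > len(buffer):
        raise OverflowError(f"needs {len(data)} bytes, buffer has {len(buffer)}")
    buffer[: len(data)] = data
    return buffer


def sprintf_safe(fmt, *args):
    """Format first, then allocate a buffer exactly as large as the result."""
    data = (fmt % args).encode("utf-8") + b"\0"
    return bytearray(data)
```

The point of the safe variant is that the buffer size follows from the formatted result instead of being fixed in advance, so a long translation can never outgrow it.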
Step 6: Investigate the message format
- probably hasn’t changed (it hasn't since th11)
- if it has, quickly hack thmsg to be able to dump it to a plaintext file
Step 7: Convert message dumps to wiki code
- using the old and crappy msg2wiki C++ thing I wrote once
- Replace tokens
- Add page header and footer boilerplate
- Hit the “make available for translation” thingy
- Translation is now possible.
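The token replacement and boilerplate steps boil down to a few string operations. This Python sketch is not a port of msg2wiki: the token syntax in the example is hypothetical, and only the header and footer strings are taken from the boilerplate given in this section.

```python
# Page boilerplate, copied from the Header/Footer given in this section.
HEADER = (
    "<languages /><translate><!--T:0-->"
    "<!-- Optional message you want to include at the top of the page -->"
    "</translate>"
)
FOOTER = "{{SubpageCategory}}{{LanguageCategory|story}}"


def dump_to_wiki(dump_text, tokens):
    """Replace raw tokens in a plaintext .msg dump, then wrap the result
    in the page header and footer."""
    for token, replacement in tokens.items():
        dump_text = dump_text.replace(token, replacement)
    return "\n".join([HEADER, dump_text, FOOTER])


# "@P@" is a made-up token, standing in for whatever the dump actually contains.
print(dump_to_wiki("@P@: Hello", {"@P@": "Reimu"}))
```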
Header
(Don't forget to set the .msg data format in the main .js file!)
<languages /><translate><!--T:0--><!-- Optional message you want to include at the top of the page --></translate>
Footer
{{SubpageCategory}}{{LanguageCategory|story}}
Step 8: Additional binary hacks
For recent games, it is necessary to correctly calculate the width of a text box by using thcrap's own GetTextExtentForFont routine.
Endings
Not different from in-game dialogue at all.
Step 9: Investigate the ending format
This ties with ANM for the most consistent Touhou data format. The new one hasn't changed since th10, so this step will most likely be a non-issue.
Step 10: Convert ending dumps to wiki code
See above.
Spell cards
Oh boy. Ever since th10, this is probably the biggest minefield in Touhou code as far as translation is concerned.
Step 11: Fix the buffer overflows in spell card rendering and do correct alignment
They have been there ever since th06, and we have to get rid of them before putting spell names up for translation. Otherwise, translators can not only crash the game, but also possibly corrupt score.dat, just by inputting a sufficiently long spell name. And the original limits are very strict, especially when taking Greek or Cyrillic UTF-8 text into account.
- Add a "translation" for the first spell card consisting of, like, 1 KB of Lorem ipsum, to the sandbox patch. If the game can display this without crashing, rejoice, buy ZUN a beer, and continue with Step 12. Otherwise...
- Look for the functions by tracing back TextOutA calls. Normally, it's text output function (4) ← spell name output function (3) ← spell name processing function (2) ← ECL parser (1).
- In (4), locate and label the pointer to the default font, used for dialog, spell cards and pretty much anything else that appears in the regular font size. This usually is the default case of the switch at the top. Case 2 should be the font for ruby.
- NOP out the sprintf and strlen at the beginning of (3); we don't need those anyway.
- Do the alignment hacks. Shuffle the rest of the function around in such a way as to fit in our call to GetTextExtentForFont.
- If necessary, replace the text pointer in (3) that gets passed as a parameter to (4).
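For the Lorem ipsum test in the first point, and to see why Greek or Cyrillic translations hit the limits so much sooner, a couple of Python helpers are enough. Greek and Cyrillic letters take 2 bytes each in UTF-8, so a Russian spell name needs twice as many bytes as an English one of the same length. The 1024-byte target is just the "1 KB" from the text; how the string gets into the sandbox patch is up to you.

```python
LOREM = (
    "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod "
    "tempor incididunt ut labore et dolore magna aliqua. "
)


def utf8_size(text):
    """Number of bytes `text` occupies once encoded as UTF-8."""
    return len(text.encode("utf-8"))


def stress_spell_name(min_bytes=1024):
    """Repeat Lorem ipsum until the UTF-8 text is at least `min_bytes` long."""
    text = LOREM
    while utf8_size(text) < min_bytes:
        text += LOREM
    return text


print(utf8_size("Reimu"), utf8_size("Рейму"))  # 5 10
```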
Step 12: Set up breakpoints
For spell name patching, we need up to four variables:
spell_id
- The spell number as given by the ECL file
- Found in (1) near the call to (2)
spell_id_real
- The real spell number, including a difficulty offset
- Found shortly after spell_id
spell_rank
- A value between 0 and 3, indicating the difficulty level this spell appears in.
- This is used in the result or Spell Practice menus, where we only have spell_id_real and thus wouldn't be able to go back to the base ID of a particular spell.
spell_name
- The register to write the translated spell name to.
- This breakpoint should be set in (2) near the call to (3). By deferring spell name fetching as long as possible, we don't have to fix all the buffer overflows in (2).
- Also, keep in mind that cave_exec: false is a thing (although it shouldn't be necessary anymore with deferred fetching).
While locating these breakpoints, assign labels to the "ECL parameter getter" functions according to the type of their return value.
Step 13: Investigate the ECL format
And most likely, the ECL format has changed again, adding a few new opcodes (and other stuff we hopefully don't care about), so that simply specifying the last game will give "id ### was not found in the format table" errors.
Thus begins the trial-and-error process of finding new and changed opcodes. For every new instruction:
- Look up the raw instruction in the ECL file in a hex editor
- Look up the chunk of code that handles the opcode and read the parameter types. If necessary, set a breakpoint, run the game until the instruction is reached, and observe closer how the parameters are used.
- Recompile thecl with the new information.
- Still errors? Go back to the first point of this list.
And yes, we do specify the correct types in this step. Sure, we could just add "S" everywhere and quickly get that thing to work. But we have the resources to do better, and it's not worth doing crappy work now and annoying someone actually using our modified thtk later.
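The trial-and-error loop above amounts to "list the opcodes with no format entry, work out their types, add them, repeat". A Python sketch of that bookkeeping: the table excerpt and opcode numbers are hypothetical, "S" comes from the text, and "f" as a float type is an assumption about thecl's format letters. The real table lives in thecl's source, which is what gets recompiled.

```python
# Hypothetical excerpt of a format table: opcode -> parameter type string.
FORMATS = {
    0: "",
    10: "S",
    11: "SS",
    12: "Sf",
}


def missing_opcodes(opcodes, formats):
    """Return the sorted, de-duplicated opcodes that have no format entry."""
    return sorted({op for op in opcodes if op not in formats})


def report(opcodes, formats):
    """Print one error per unknown opcode, worded like the error quoted above."""
    for op in missing_opcodes(opcodes, formats):
        print(f"id {op} was not found in the format table")


stage = [0, 10, 12, 99, 99, 140]  # opcodes seen in some stage's ECL file
report(stage, FORMATS)
```

After reading the handler code for each reported opcode, you'd add its entry (say `FORMATS[99] = "SS"`) and run the check again until nothing is reported.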
Step 14: Convert spell names to wiki code
At least that step is pretty straightforward.
- Grep spell card name instructions out of all files, do iconv -f shift-jis -t utf-8, do some sed magic to bring it into a simpler format, and sort it. A bash one-liner.
- Look for duplicated spell IDs and set the correct number according to the difficulty, by looking at the flags near the instruction containing the spell name.
- Run that corrected dump through ecsgrep2mw.py
- Do a bit of search-and-replace for the character names
- ... and post stuff on the wiki.
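If the bash one-liner gets unwieldy, the decode, extract, sort and duplicate-check part translates directly to Python. This is a sketch: the `spell(ID, "name")` line format is hypothetical, so adapt the regex to whatever the dumped spell card name instruction actually looks like. Fixing the duplicated IDs still has to be done by hand from the difficulty flags, as described in the second point.

```python
import re
from collections import defaultdict

# Hypothetical line format for a spell card name instruction in a thecl dump.
SPELL_LINE = re.compile(r'spell\((\d+), "([^"]*)"\)')


def extract_spells(raw_dump):
    """Shift-JIS dump bytes -> sorted list of (spell_id, name) tuples."""
    text = raw_dump.decode("shift_jis")  # the `iconv -f shift-jis -t utf-8` step
    spells = [(int(m.group(1)), m.group(2)) for m in SPELL_LINE.finditer(text)]
    return sorted(spells)                # the `sort` step


def duplicate_ids(spells):
    """Map each spell ID that occurs more than once to all of its names."""
    names_by_id = defaultdict(list)
    for spell_id, name in spells:
        names_by_id[spell_id].append(name)
    return {i: names for i, names in names_by_id.items() if len(names) > 1}
```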
Step 15: Skipgame support
Workflow for new full versions of games we already have trial support for
With an existing trial build, we already have the technical support worked out, and it only needs the addresses and other small adjustments to work with the full version. Since the audience will be much larger, we need to be all the more careful here. Thus, we port all the technical support before doing anything else.
The new workflow is as follows (changes in bold):
- Step 0: Collect patch-relevant information about the game
- Step 1: Hash the game
- Step 2: Search breakpoints. Immediately search for update_poll and release that to base_tsa, keep the rest in the sandbox
- Step 3: Port all existing base_tsa binary hacks and breakpoints to the new build (really, it's better to leave the game untranslated for 15-30 minutes than to risk buffer overflows with some language)
- Step 4: Dump all data. Quickly check if any files differ (text.anm probably does) and whether everything ports over correctly
- Step 5: Add binary hacks and breakpoints to base_tsa
- Step 6: Upload images
- Step 7: Investigate the message format
- Step 8: Convert message dumps to wiki code
- Step 9: Investigate the ending format
- Step 10: Convert ending dumps to wiki code
- Step 11: Investigate the ECL format
- Step 12: Convert spell names to wiki code
- Step 13: Skipgame support
- Step 14: Go to sleep. All the important stuff is translatable now.
- Step 15: Do the kludgy Music Room workaround thingy
- Step 16: Use Resource Hacker to translate the resolution dialog, then ship that changed dialog as one large binary hack blob.
- Step 17: Leisurely pick out translatable hardcoded strings