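[Repost] [Godsend] How to crack a new Touhou game in a short time

drzzm32 · Posted 2014-12-2 21:34:05

Last edited by drzzm32 on 2014-12-2 21:34

Note: the source may be blocked by the firewall.

Original URL: https://thpatch.net/wiki/How_to_patch_a_new_Touhou_game_in_a_couple_of_hours

Related site: https://thpatch.net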

How to patch a new Touhou game in a couple of hours


Possible engine changes and their impact on our plans

The game is a 64-bit executable

After ZUN found the switch to enable SSE2 instructions during the development of Double Dealing Character (which threw me off for 2 hours when preparing the binary hacks for the trial version), flipping the "64-bit switch" is the scariest compiler-related change that is yet to happen.

There are two main problems here. Firstly, portability to 64-bit wasn't exactly of interest when writing thcrap, due to its nature of being a just-in-time memory patcher targeted at 32-bit games. As a result, the code may contain quite a number of implicit 32-bit assumptions.

Secondly, there are certain parts that would require a full rewrite for 64-bit. The most troubling issue here would be this essential bit of inline assembly, which is the reason the breakpoint system works at all. Microsoft removed support for inline assembly in its 64-bit compilers and tells developers to use compiler intrinsics instead. While that's certainly a nicer way for those people who previously used inline assembly to increase performance, it's terrible for us: We use inline assembly because we must access and manipulate the stack directly, since C itself lacks a way to do this. So how would we rewrite that part then? Additional .asm files (and configuring the build environment to work with them... urgh)? Taking advantage of how function calls work on x64? I don't even know myself.

Once there is a working 64-bit release of OllyDbg, I will immediately start porting thcrap to 64-bit. This will probably take ½ - 1 week.
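
Noticing that a new build is 64-bit in the first place is the easy part. As a minimal sketch (not part of thcrap; the helper names below are made up for illustration), the Machine field of the PE header can be read directly. The two constants are the standard PE/COFF values for x86 and x64, and fake_pe only builds a stub header for demonstration, not a runnable executable.

```python
import struct

# Machine field values from the PE/COFF specification.
IMAGE_FILE_MACHINE_I386 = 0x014C
IMAGE_FILE_MACHINE_AMD64 = 0x8664


def pe_machine(data: bytes) -> int:
    """Return the COFF Machine field of a PE image held in memory."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # e_lfanew (offset of the PE header) lives at 0x3C in the DOS header.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # The Machine field is the first 16-bit word after the signature.
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    return machine


def is_64_bit(data: bytes) -> bool:
    """True if the image targets x64."""
    return pe_machine(data) == IMAGE_FILE_MACHINE_AMD64


def fake_pe(machine: int) -> bytes:
    """Build a stub DOS + PE header for demonstration (hypothetical helper)."""
    dos = bytearray(0x40)
    dos[:2] = b"MZ"
    struct.pack_into("<I", dos, 0x3C, 0x40)
    return bytes(dos) + b"PE\0\0" + struct.pack("<H", machine)
```

For a real game, the bytes would come from something like open("th??.exe", "rb").read(), with the actual file name filled in.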

Probability

Unlike SSE2, which is available in every CPU built after 2003, a 64-bit build is nothing you enable just because you can: the entire operating system, with all its APIs, needs to be 64-bit as well.

Thus, if we assume the data of Steam's monthly hardware survey to be an accurate representation of OS distribution among gamers, 18.30% (as of July 2014) of ZUN's potential audience would not be able to play the game if it were 64-bit.

In contrast, Windows XP now has a market share of 4.94% in this survey, and ZUN is still supporting it as of Impossible Spell Card. Note that, although the initial trial and full versions of Double Dealing Character did not run on XP, this was fixed in their corresponding updates.

I also don't think that people generally buy a 64-bit version of the same 32-bit OS within its lifetime. So all in all, it looks like the tipping point might only be reached once Windows 9 is released...

ZUN suddenly starts to think that .NET is a good idea

→ We’re FUCKED
- Currently (April 2014), I don't think that .NET support should be part of thcrap's patching scope

Code

Tasofro-style segmented data file loading code (complete data file is never in one contiguous block of memory)
- → We’re FUCKED
  - … okay, well, if it’s halfway sane (see DynaMarisa), we could do it and have the patch one day later
- PC-98 games used to do this to some extent...

Formats

Vastly different message file format
- → We’re FUCKED (for real this time)
- Hasn’t happened since th06

Slightly different message file format (e.g. structure sizes have changed) or new opcodes relevant to us
- → Requires a new format descriptor and, for the former, possibly an update to the .msg patcher
- Nothing that would stop us
- The former hasn’t happened since th09, the latter not since th11

Development

Plaintext vs. Images

Steps 1 and 2 are required for every official version of a game.

Step 0: Collect patch-relevant information about the game

This is not necessary for the patch itself, only to ease the work of the translators. This needs to be done by other people.

Which characters are there, and where do they appear? (Bosses, midbosses, additional characters talking) Insert these names into MediaWiki:Thpatch-chars.
We need a complete list of spell cards with their correct IDs to verify these later. Thus, get another person to play through the game on every difficulty (just continue through all the way).
Do we have people with good Japanese knowledge wanting to help? If yes: Spend your time on transcribing images, not dialogue or spells or anything else. We're going to dump all plaintext in the course of this workflow, but we can't dump text from images.

General support

Step 1: Hash the game

sha256sum th??.exe
yes, SHA-256 because I trust you that little
after this, people can already select the game in the configuration tool
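
Where sha256sum is not at hand (on a stock Windows box, for instance), the same hash can be computed with Python's standard hashlib. This is only a sketch of an equivalent; sha256_of_file is a made-up helper name, and the file is read in chunks so that large executables need not fit in memory at once.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the lowercase hex SHA-256 digest of the file at path."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned string should match the first column of the sha256sum output for the same file.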

Step 2: Search for breakpoints

file_size

String search for “Decode” should get us near the loading code
- if not there anymore (new logging format or more aggressive compiler optimization?), trace back from ReadFile calls
Address which has file name and file size in some register at the same time
if not applicable, add separate file_name breakpoint

file_load

Function call shortly after file_size which returns the fully unpacked and decrypted file
- if there is no such thing anymore, we’re FUCKED (this is exactly why we don’t do Tasofro games ourselves)
- ... except, of course, if the "function call" is merely inlined - see th06, th08 and th09
file_buffer: register that contains the address of the final file buffer

file_loaded

some place near the end of the file_load function
should require no parameters on its own – if the function allocates a new buffer though (th08 and th09 do), specify that in file_buffer here

update_poll

Keyed to BGM switches
String search for “Streming BGM PreLoad” or “bgmfile is not find” [sic]
Beginning of that function

Step 3: Dump all data

With these breakpoints, we now have an on-the-fly data dumper, without having to know anything about the .dat format.

So... let's write a small chunk of assembly to dump it all!

Set a debugging breakpoint on file_size
- -> file_load is this function. Confirm that, there should be lots of calls to it.
- -> file_table should be in some register.
Step out of this function to free up critical sections and stuff.
Search for all calls to KERNEL32.HeapFree - there shouldn't be too many.
- -> heap_free is a wrapper around this function, accepting only one parameter.

With these values, search for a nice spot, adjust and paste the code somewhere, and jump to it.

Dump the entire game archive (file_dump_loop)
Description	In this example, file_load takes the file name in ECX and the target address for the file size in EDX.
Address	A nice place
Code (hex)	be 00000000 83ec 04 89e2 8b0e 85c9 74 13 31c0 50 e8 00000000 50 e8 00000000 83c6 10 eb e5 cc
Code (assembly)
    mov esi,file_table
    sub esp,4 ; allocate a local variable to store the file size
    mov edx,esp
    mov ecx,dword ptr ds:[esi]
    test ecx,ecx ; end of list?
    je short +0x15
    xor eax,eax
    push eax
    call file_load
    push eax
    call heap_free
    add esi,10
    jmp short -0x18
    int3 ; that's it, we're done
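
For readers less used to reading x86, the control flow of that loop can be modelled in a few lines of Python. This is only an illustration of the logic, not something that runs inside the game. It assumes what the listing suggests: 0x10-byte table entries whose first dword is the file name pointer, with a zero pointer ending the list. The file_load and heap_free arguments are stand-in callbacks.

```python
import struct
from typing import Callable, Iterator

ENTRY_SIZE = 0x10  # matches "add esi,10" (hexadecimal) in the listing


def iter_name_pointers(table: bytes) -> Iterator[int]:
    """Yield the first dword of each entry until a zero pointer is hit."""
    for offset in range(0, len(table) - 3, ENTRY_SIZE):
        (name_ptr,) = struct.unpack_from("<I", table, offset)
        if name_ptr == 0:  # "test ecx,ecx" / "je": end of list
            return
        yield name_ptr


def dump_all(table: bytes,
             file_load: Callable[[int], int],
             heap_free: Callable[[int], None]) -> int:
    """Load and immediately free every file; return how many were visited."""
    count = 0
    for name_ptr in iter_name_pointers(table):
        buffer = file_load(name_ptr)  # the dumping breakpoints fire in here
        heap_free(buffer)             # the buffer itself is not needed
        count += 1
    return count
```

The point of the design is that nothing in the loop touches the file contents: loading each file is enough, because the breakpoints set up in Step 2 do the actual dumping.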

Step 4: Upload images

For the longest time, I was terribly scared of those... but once I did the implementation, it actually turned out to be the easiest thing to patch! Because it's also the one thing that requires the most effort to translate, we'll start with them, so that the image editors can immediately get to work.

So far, the ANM format only changed with Subterranean Animism, and has since then been constant. Script instructions have come and gone, yeah, but that's nothing we care about.

This means that, as soon as we have general dumping support, we'll also have image patching and sprite boundary dumping support. All that it now takes is a simple thanm x on every file. Then, just look through the extracted images to see what can be translated, upload that, and the rest is up to the image editors.
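
Running thanm over every dumped file is easy to script. The sketch below only builds the command lines and does not run anything. It takes the "thanm x" syntax from the text above as given, although the exact arguments may differ between thtk versions. The dump directory layout and the .anm extension filter are assumptions.

```python
from pathlib import Path
from typing import List


def thanm_commands(dump_dir: str, thanm: str = "thanm") -> List[List[str]]:
    """Build one "thanm x <file>" command per .anm file below dump_dir."""
    anm_files = sorted(Path(dump_dir).rglob("*.anm"))
    return [[thanm, "x", str(path)] for path in anm_files]
```

Each returned list can then be handed to subprocess.run(cmd, check=True), ideally with a separate working directory per archive so the extracted images don't overwrite each other.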

In-game dialogue

Step 5: Fix the buffer overflow in MSG rendering

For some reason, ZUN sprintfs every line into a fixed-width char buffer. Since the strings don't contain format information (and those that do get re-implemented our way), we just remove this sprintf call, passing the raw text pointer to the rendering function instead.

... by doing our safe sprintf. Yes, there are instances where the input is actually a format string - the Music Room (for unheard tracks) or the Spell Practice menu come to mind. The Music Room usually is the fastest way to invoke a relevant sprintf call.

Step 6: Investigate the message format

probably hasn’t changed (it didn’t since th11)
- if it has, quickly hack thmsg to be able to dump it to a plaintext file

Step 7: Convert message dumps to wiki code

using the old and crappy msg2wiki C++ thing I wrote once
Replace tokens
Add page header and footer boilerplate
Hit the “make available for translation” thingy
Translation is now possible.

Header (Don't forget to set the .msg data format in the main .js file!)
<languages /><translate></translate>

Footer
{{SubpageCategory}}{{LanguageCategory|story}}

Step 8: Additional binary hacks

For recent games, it is necessary to correctly calculate the width of a text box by using thcrap's own GetTextExtentForFont routine.

Endings

Not different from in-game dialogue at all.

Step 9: Investigate the ending format

This ties with ANM for the most consistent Touhou data format. The new one hasn't changed since th10, so this step will most likely be a non-issue.

Step 10: Convert ending dumps to wiki code

See above.

Spell cards

Oh boy. Ever since th10, this is probably the biggest minefield in Touhou code as far as translation is concerned.

Step 11: Fix the buffer overflows in spell card rendering and do correct alignment

They have been there ever since th06, and we have to get rid of them before putting spell names up for translation. Otherwise, translators can not only crash the game, but also possibly corrupt score.dat, just by inputting a sufficiently long spell name. And the original limits are very strict, especially when taking Greek or Cyrillic UTF-8 text into account.

Add a "translation" for the first spell card consisting of, like, 1 KB of Lorem ipsum, to the sandbox patch. If the game can display this without crashing, rejoice, buy ZUN a beer, and continue with Step 12. Otherwise...

Look for the functions by tracing back TextOutA calls. Normally, it's text output function (4) ← spell name output function (3) ← spell name processing function (2) ← ECL parser (1).
In (4), locate and label the pointer to the default font, used for dialog, spell cards and pretty much anything else that appears in the regular font size. This usually is the default case of the switch at the top. Case 2 should be the font for ruby.

NOP out the sprintf and strlen at the beginning of (3) - we don't need those anyway.
Do the alignment hacks. Shuffle the rest of the function around in such a way as to fit in our call to GetTextExtentForFont.
If necessary, replace the text pointer in (3) that gets passed as a parameter to (4).
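
The stress test and the UTF-8 concern from this step can be illustrated with a short sketch. The helper names and buffer sizes below are invented for the example and are not the game's real limits. The point is only that fixed-size buffers count bytes, and that Greek or Cyrillic letters take two bytes each in UTF-8.

```python
LOREM = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. "


def stress_name(min_bytes: int = 1024) -> str:
    """Repeat Lorem ipsum until its UTF-8 encoding is at least min_bytes long."""
    unit = len(LOREM.encode("utf-8"))
    repeats = -(-min_bytes // unit)  # ceiling division
    return LOREM * repeats


def fits(name: str, buffer_size: int) -> bool:
    """True if the UTF-8 bytes plus a terminating NUL fit into buffer_size."""
    return len(name.encode("utf-8")) + 1 <= buffer_size
```

For example, the five-letter Cyrillic word "Искра" needs 11 bytes including the terminator, while the five-letter "Spark" needs only 6, so a limit that looks generous for English can already be too tight for other scripts.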

Step 12: Set up breakpoints

For spell name patching, we need up to four variables:

spell_id
- The spell number as given by the ECL file
- Found in (1) near the call to (2)

spell_id_real
- The real spell number, including a difficulty offset
- Found shortly after spell_id

spell_rank
- A value between 0 and 3, indicating the difficulty level this spell appears in.
- This is used in the result or Spell Practice menus where we only have spell_id_real and thus wouldn't be able to go back to the base ID of a particular spell.

spell_name
- The register to write the translated spell name to.
- This breakpoint should be set in (2) near the call to (3). By deferring spell name fetching as long as possible, we don't have to fix all the buffer overflows in (2).
- Also, keep in mind that cave_exec: false is a thing (although it shouldn't be necessary anymore with deferred fetching)
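
How these IDs relate can be sketched as follows. The sketch rests on two assumptions that the text implies but does not spell out: that the difficulty offset in spell_id_real is simply spell_rank added to the base ID, and that a lookup tries the difficulty-specific ID first before falling back to the base ID. The function names and the dictionary shape are made up for illustration.

```python
from typing import Dict, Optional


def base_spell_id(spell_id_real: int, spell_rank: int) -> int:
    """Recover the base ID, assuming spell_id_real = base ID + spell_rank."""
    if not 0 <= spell_rank <= 3:
        raise ValueError("spell_rank must be between 0 and 3")
    return spell_id_real - spell_rank


def lookup_spell_name(names: Dict[int, str],
                      spell_id_real: int,
                      spell_rank: int) -> Optional[str]:
    """Prefer a difficulty-specific translation, else fall back to the base ID."""
    exact = names.get(spell_id_real)
    if exact is not None:
        return exact
    return names.get(base_spell_id(spell_id_real, spell_rank))
```

With such a fallback, translators would only need one entry per spell unless a name really differs between difficulties.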

While locating these breakpoints, assign labels to the "ECL parameter getter" functions according to the type of their return value.

Step 13: Investigate the ECL format

At this point, we again depend on Touhou Toolkit; not only for the complete list of spell names with their IDs, but also to create the replacement ECLs for the Skipgame patch.

And most likely, the ECL format has changed again, adding a few new opcodes (and other stuff we hopefully don't care about), so that simply specifying the last game will give "id ### was not found in the format table" errors.

Thus begins the trial-and-error process of finding new and changed opcodes. For every new instruction:

Look up the raw instruction in the ECL file in a hex editor
Look up the chunk of code that handles the opcode and read the parameter types. If necessary, set a breakpoint, run the game until the instruction is reached, and observe closer how the parameters are used.
Recompile thecl with the new information.
Still errors? Go back to step 1.

And yes, we do specify the correct types in this step. Sure, we can just add "S" everywhere and quickly get that thing to work. But we have the resources to do better, and it's not worth doing crappy work now and annoying someone actually using our modified thtk later.

Step 14: Convert spell names to wiki code

At least that step is pretty straightforward.

Grep spell card name instructions out of all files, do iconv -f shift-jis -t utf-8, do some sed magic to bring it into a simpler format, and sort it. A bash one-liner.
Look for duplicated spell IDs and set the correct number according to the difficulty, by looking at the flags near the instruction containing the spell name.
Run that corrected dump through ecsgrep2mw.py
Do a bit of search-and-replace for the character names
... and post stuff on the wiki.
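
The decoding, sorting and duplicate-hunting part of this list can be sketched in Python as well. The input shape, a list of (ID, raw Shift-JIS bytes) pairs already pulled out of the ECL dumps, is an assumption, and this is not the actual one-liner or ecsgrep2mw.py. Python's shift_jis codec stands in for iconv here; real game data may need cp932 instead.

```python
from collections import Counter
from typing import List, Tuple


def decode_spells(raw: List[Tuple[int, bytes]]) -> List[Tuple[int, str]]:
    """Decode Shift-JIS spell names and sort them by ID, then by name."""
    return sorted((spell_id, name.decode("shift_jis")) for spell_id, name in raw)


def duplicated_ids(spells: List[Tuple[int, str]]) -> List[int]:
    """Return the IDs that occur more than once and need manual correction."""
    counts = Counter(spell_id for spell_id, _ in spells)
    return sorted(spell_id for spell_id, n in counts.items() if n > 1)
```

Every ID reported by duplicated_ids is one where the difficulty flags near the instruction have to be checked by hand, as described in the second item above.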

Step 15: Skipgame support

Workflow for new full versions of games we already have trial support for

With an existing trial build, we already have the technical support worked out and it only needs the addresses and small other adjustments to work with the full version. Since the audience will be much larger, we need to be all the more careful here. Thus, we port all the technical support before doing anything else.

The new workflow is as follows (changes in bold):

Step 0: Collect patch-relevant information about the game
Step 1: Hash the game
Step 2: Search breakpoints. Immediately search for update_poll and release that to base_tsa, keep the rest in the sandbox
Step 3: Port all existing base_tsa binary hacks and breakpoints to the new build (really, it's better to leave the game untranslated for 15-30 minutes than to risk buffer overflows with some language)
Step 4: Dump all data. Quickly check if any files differ (text.anm probably does) and whether everything ports over correctly
Step 5: Add binary hacks and breakpoints to base_tsa
Step 6: Upload images
Step 7: Investigate the message format
Step 8: Convert message dumps to wiki code
Step 9: Investigate the ending format
Step 10: Convert ending dumps to wiki code
Step 11: Investigate the ECL format
Step 12: Convert spell names to wiki code
Step 13: Skipgame support
Step 14: Go to sleep. All the important stuff is translatable now.
Step 15: Do the kludgy Music Room workaround thingy
Step 16: Use Resource Hacker to translate the resolution dialog, then ship that changed dialog as one large binary hack blob.
Step 17: Leisurely pick out translatable hardcoded strings

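凯风快晴 · Posted 2014-12-2 22:54:45

A wall of English; looks really impressive

But... I can't understand it

lrdcq · Posted 2014-12-3 00:35:48

Mm~ the usual file structure analysis approach~~~

So exhausting~ If ZUN really switched to a new structure, it would kill people~

fancydz · Posted 2014-12-3 19:33:12

Guinea pig reporting: it's viewable from inside the firewall ~~(although I still went over the wall out of habit)~~. Writing this much, our foreign friends really put in the effort.

lixiang5628638 · Posted 2014-12-4 01:44:07

Last edited by lixiang5628638 on 2014-12-4 02:09

This article is about the many problems people abroad run into when patching Touhou games, such as the code rewrites that would be needed if ZUN moved from 32-bit to 64-bit Windows programming, and how much harder modding would get if ZUN started programming in .NET.

Windows really isn't very reliable; Microsoft wouldn't be Microsoft if it didn't screw people over. I hope future Touhou originals will be developed on other systems.

Also: that picture is pretty funny. It says the foreigners are under enormous pressure when translating text and images. (You need special Photoshop skills)

天象 · Posted 2014-12-4 14:26:44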

ZUN suddenly starts to think that .NET is a good idea
→ We’re FUCKED

八咫乌空 · Posted 2014-12-5 10:38:07

WE 'RE FUCKED(ABOUT .NET)

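wz520 · Posted 2014-12-5 19:50:03

I couldn't understand anything except "We’re FUCKED"

Is it asking: if ZUN's future games go 64-bit or use .NET, how is thcrap, this general-purpose Touhou patching tool, supposed to support them?

十二 · Posted 2014-12-7 17:19:08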

Search - 90%
Debugging engine - 70%
Analysis - 15% :(

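It may take another two years before there is a reliable OD (OllyDbg)

WinDbg can inject code; the only issue is how hard it is to get started with.

A VM's complexity depends entirely on its purpose. After all, .NET was never designed to protect the user's code.

drzzm32 · Posted 2014-12-7 18:11:44

十二 posted on 2014-12-7 17:19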
Search - 90%
Debugging engine - 70%
Analysis - 15% :(

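The core of game localization is still unpacking... I'm not sure whether Touhou's packing and unpacking use a custom algorithm...
Reverse engineering is really just there for analyzing the unpacking...
The game code itself doesn't seem that important
.NET just makes development a bit faster, but the code is hard to protect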
