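drzzm32 posted on 2014-12-2 21:34:05

【Repost】【Gem】How to crack a new Touhou game in a short time

Last edited by drzzm32 on 2014-12-2 21:34

Note: the original page may only be reachable from outside the Great Firewall.

Original URL: https://thpatch.net/wiki/How_to_patch_a_new_Touhou_game_in_a_couple_of_hours

Related site: https://thpatch.net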

How to patch a new Touhou game in a couple of hours

Contents

[*]1 Possible engine changes and their impact on our plans

[*]1.1 The game is a 64-bit executable

[*]1.1.1 Probability
[*]1.2 ZUN suddenly starts to think that .NET is a good idea
[*]1.3 Code
[*]1.4 Formats
[*]2 Development

[*]2.1 Step 0: Collect patch-relevant information about the game
[*]2.2 General support

[*]2.2.1 Step 1: Hash the game
[*]2.2.2 Step 2: Search for breakpoints

[*]2.2.2.1 file_size
[*]2.2.2.2 file_load
[*]2.2.2.3 file_loaded
[*]2.2.2.4 update_poll
[*]2.2.3 Step 3: Dump all data

[*]2.2.3.1 Dump the entire game archive

[*]2.3 Step 4: Upload images
[*]2.4 In-game dialogue

[*]2.4.1 Step 5: Fix the buffer overflow in MSG rendering
[*]2.4.2 Step 6: Investigate the message format
[*]2.4.3 Step 7: Convert message dumps to wiki code

[*]2.4.3.1 Header
[*]2.4.3.2 Footer
[*]2.4.4 Step 8: Additional binary hacks
[*]2.5 Endings

[*]2.5.1 Step 9: Investigate the ending format
[*]2.5.2 Step 10: Convert ending dumps to wiki code
[*]2.6 Spell cards

[*]2.6.1 Step 11: Fix the buffer overflows in spell card rendering and do correct alignment
[*]2.6.2 Step 12: Set up breakpoints
[*]2.6.3 Step 13: Investigate the ECL format
[*]2.6.4 Step 14: Convert spell names to wiki code
[*]2.7 Step 15: Skipgame support
[*]3 Workflow for new full versions of games we already have trial support for

Possible engine changes and their impact on our plans
The game is a 64-bit executable
After ZUN found the switch to enable SSE2 instructions during the development of Double Dealing Character (which threw me off for 2 hours when preparing the binary hacks for the trial version), flipping the "64-bit switch" is the scariest compiler-related setting that has yet to happen.
There are two main problems here. Firstly, portability to 64-bit wasn't exactly of interest when writing thcrap, due to its nature of being a just-in-time memory patcher targeted at 32-bit games. As a result, the code may contain quite a number of implicit 32-bit assumptions.
Secondly, there are certain parts that would require a full rewrite for 64-bit. The most troubling issue here would be this essential bit of inline assembly, which is the reason the breakpoint system works at all. Microsoft removed support for inline assembly in its 64-bit compilers and tells developers to use compiler intrinsics instead. While that's certainly a nicer way for those people who previously used inline assembly to increase performance, it's terrible for us: we use inline assembly because we must access and manipulate the stack directly, since C itself lacks a way to do this. So how would we rewrite that part then? Additional .asm files (and configuring the build environment to work with them... urgh)? Taking advantage of how function calls work on x64? I don't even know myself.
Once there is a working 64-bit release of OllyDbg, I will immediately start porting thcrap to 64-bit. This will probably take ½ to 1 week.
Probability
Unlike SSE2, which is available in every CPU built after 2003, a 64-bit build is nothing you enable just because you can: the entire operating system, with all its APIs, needs to be 64-bit as well.
Thus, if we assume the data of Steam's monthly hardware survey to be an accurate representation of OS distribution among gamers, 18.30% (as of July 2014) of ZUN's potential audience, namely those still on a 32-bit OS, would not be able to play the game if it was 64-bit.
In contrast, Windows XP now has a market share of 4.94% in this survey, and ZUN is still supporting it as of Impossible Spell Card. Note that, although the initial trial and full versions of Double Dealing Character did not run on XP, this was fixed in their corresponding updates.
I also don't think that people generally buy a 64-bit version of the same 32-bit OS within its lifetime. So all in all, it looks like the tipping point might only be reached once Windows 9 is released...
ZUN suddenly starts to think that .NET is a good idea
[*]→ We’re FUCKED

[*]Currently (April 2014), I don't think that .NET support should be part of thcrap's patching scope
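Both worst cases above can be recognized before ever attaching a debugger, just from the executable's PE headers. The following Python sketch is our own illustration, not part of thcrap (`inspect_pe` is a hypothetical helper), using offsets from Microsoft's PE/COFF specification: a PE32+ optional header means a 64-bit build, and a non-empty CLR runtime header data directory (index 14) marks a .NET assembly.

```python
import struct
import sys

MACHINE_NAMES = {0x014C: "i386", 0x8664: "AMD64"}  # IMAGE_FILE_MACHINE_* values
PE32, PE32_PLUS = 0x10B, 0x20B                     # optional header magic values
CLR_DIR = 14  # IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR: the .NET (CLR) runtime header


def inspect_pe(data: bytes) -> dict:
    """Report the target machine, whether the image is PE32+ (64-bit), and whether it is .NET."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    (pe_off,) = struct.unpack_from("<I", data, 0x3C)  # e_lfanew
    if data[pe_off:pe_off + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    (machine,) = struct.unpack_from("<H", data, pe_off + 4)
    opt = pe_off + 24  # optional header follows the 4-byte signature and 20-byte COFF header
    (magic,) = struct.unpack_from("<H", data, opt)
    if magic == PE32:
        count_off, dirs_off = opt + 92, opt + 96
    elif magic == PE32_PLUS:
        count_off, dirs_off = opt + 108, opt + 112
    else:
        raise ValueError(f"unknown optional header magic {magic:#x}")
    (num_dirs,) = struct.unpack_from("<I", data, count_off)
    is_dotnet = False
    if num_dirs > CLR_DIR:
        # Each data directory entry is an (RVA, size) pair of 32-bit values
        rva, size = struct.unpack_from("<II", data, dirs_off + 8 * CLR_DIR)
        is_dotnet = rva != 0 and size != 0
    return {
        "machine": MACHINE_NAMES.get(machine, hex(machine)),
        "is_64bit": magic == PE32_PLUS,
        "is_dotnet": is_dotnet,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        print(inspect_pe(f.read()))
```

Run it with the game executable's path as the only argument. Note that a .NET "AnyCPU" build is still PE32, which is why the CLR check is independent of the 64-bit check.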

Code
[*]Tasofro-style segmented data file loading code (complete data file is never in one contiguous block of memory)

[*]→ We’re FUCKED

[*]… okay, well, if it’s halfway sane (see DynaMarisa), we could do it and have the patch one day later
[*]PC-98 games used to do this to some extent...

Formats
[*]Vastly different message file format

[*]→ We’re FUCKED (for real this time)
[*]Hasn’t happened since th06



[*]Slightly different message file format (e.g. structure sizes have changed) or new opcodes relevant to us

[*]→ Requires a new format descriptor and, for the former, possibly an update to the .msg patcher
[*]Nothing that would stop us
[*]The former hasn’t happened since th09, the latter not since th11

Development
[Image: Plaintext vs. Images] https://thpatch.net/w/images/thumb/e/ef/Plaintext_vs_Images.png/180px-Plaintext_vs_Images.png



Steps 1 and 2 are required for every official version of a game.
Step 0: Collect patch-relevant information about the game
This is not necessary for the patch itself, only to ease the work of the translators. This needs to be done by other people.
[*]Which characters are there, and where do they appear? (Bosses, midbosses, additional characters talking) Insert these names into MediaWiki:Thpatch-chars.
[*]We need a complete list of spell cards with their correct IDs to verify these later. Thus, get another person to play through the game on every difficulty (just continue through all the way).
[*]Do we have people with good Japanese knowledge wanting to help? If yes: spend your time on transcribing images, not dialogue or spells or anything else. We're going to dump all plaintext in the course of this workflow, but we can't dump text from images.
General support
Step 1: Hash the game
[*]sha256sum th??.exe
[*]yes, SHA-256 because I trust you that little
[*]after this, people can already select the game in the configuration tool
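For reference, the `sha256sum th??.exe` call above can be reproduced with Python's hashlib, e.g. on a system without coreutils. A minimal sketch; `sha256_of_file` is our own helper name, and the output mimics sha256sum's "digest, two spaces, file name" format.

```python
import glob
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Same file pattern as `sha256sum th??.exe`, run in the game directory
    for exe in sorted(glob.glob("th??.exe")):
        print(f"{sha256_of_file(exe)}  {exe}")
```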
Step 2: Search for breakpoints
file_size
[*]String search for “Decode” should get us near the loading code

[*]if not there anymore (new logging format or more aggressive compiler optimization?), trace back from ReadFile calls
[*]Address which has file name and file size in some register at the same time
[*]if not applicable, add separate file_name breakpoint
file_load
[*]Function call shortly after file_size which returns the fully unpacked and decrypted file

[*]if there is no such thing anymore, we’re FUCKED (this is exactly why we don’t do Tasofro games ourselves)
[*]... except, of course, if the "function call" is merely inlined - see th06, th08 and th09
[*]file_buffer: register that contains the address of the final file buffer
file_loaded
[*]some place near the end of the file_load function
[*]should require no parameters on its own – if the function allocates a new buffer though (th08 and th09 do), specify that in file_buffer here
update_poll
[*]Keyed to BGM switches
[*]String search for “Streming BGM PreLoad” or “bgmfile is not find”
[*]Beginning of that function
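The string searches in this step are normally done right in the debugger. As a quick pre-check, here is a Python sketch (helper names are ours) that lists the raw file offsets of the three strings named above; the misspellings are ZUN's own, so they have to be searched verbatim. These are file offsets, not virtual addresses, so the code that references them still has to be located in the debugger.

```python
import sys

# Strings from Step 2: "Decode" leads near the loading code (file_size/file_load),
# the BGM messages lead to update_poll. Typos are intentional (they are in the game).
NEEDLES = [b"Decode", b"Streming BGM PreLoad", b"bgmfile is not find"]


def find_all(data: bytes, needle: bytes) -> list[int]:
    """Return every file offset at which needle occurs in data (overlaps included)."""
    offsets = []
    pos = data.find(needle)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(needle, pos + 1)
    return offsets


def scan(data: bytes) -> dict[bytes, list[int]]:
    return {needle: find_all(data, needle) for needle in NEEDLES}


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        for needle, offsets in scan(f.read()).items():
            print(needle.decode("ascii"), [hex(o) for o in offsets])
```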
Step 3: Dump all data
With these breakpoints, we now have an on-the-fly data dumper, without having to know anything about the .dat format.
So... let's write a small chunk of assembly to dump it all!
[*]Set a debugging breakpoint on file_size

[*]-> file_load is this function. Confirm this; there should be lots of calls to it.
[*]-> file_table should be in some register.
[*]Step out of this function to free up critical sections and stuff.
[*]Search for all calls to KERNEL32.HeapFree - there shouldn't be too many.

[*]-> heap_free is a wrapper around this function, accepting only one parameter.

With these values, find a nice spot, adjust and paste the code there, and jump to it.
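Dump the entire game archive (file_dump_loop)
Description: In this example, file_load takes the file name in ECX and the target address for the file size in EDX.
Address: A nice place
Code (machine code next to its disassembly; OllyDbg notation, so immediates like 10 are hex):
[*]be 00000000    mov esi,file_table           ; ESI walks the file table
[*]83ec 04        sub esp,4                    ; allocate a local variable to store the file size
[*]89e2           mov edx,esp                  ; loop start: EDX = address of that variable
[*]8b0e           mov ecx,dword ptr ds:[esi]   ; ECX = file name of the current entry
[*]85c9           test ecx,ecx                 ; end of list?
[*]74 13          je short +0x15               ; yes: jump to the int3 at the end
[*]31c0           xor eax,eax
[*]50             push eax                     ; file_load's stack parameter, passed as 0
[*]e8 00000000    call file_load               ; load the file; our breakpoints dump it
[*]50             push eax                     ; EAX = the loaded file buffer
[*]e8 00000000    call heap_free               ; free it again
[*]83c6 10        add esi,10                   ; next entry (0x10 bytes each)
[*]eb e5          jmp short -0x19              ; back to "mov edx,esp", since the calls may clobber EDX
[*]cc             int3                         ; that's it, we're done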
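If you'd rather prepare the patch bytes outside the debugger, the only non-constant parts of the listing above are the file_table operand and the two `e8` call displacements, which are relative to the end of the 5-byte call instruction. A Python sketch of that arithmetic, assuming the same register usage and layout as the listing; `assemble_dump_loop` and all addresses you feed it are hypothetical.

```python
import struct


def rel32(insn_addr: int, target: int, insn_len: int = 5) -> bytes:
    """Displacement for an e8 rel32 call: relative to the end of the instruction."""
    return struct.pack("<i", target - (insn_addr + insn_len))


def assemble_dump_loop(base: int, file_table: int, file_load: int, heap_free: int) -> bytes:
    """Rebuild the file_dump_loop bytes for code placed at address `base`."""
    code = bytearray()
    code += b"\xbe" + struct.pack("<I", file_table)  # mov esi,file_table
    code += b"\x83\xec\x04"                           # sub esp,4
    loop = base + len(code)
    code += b"\x89\xe2"                               # loop: mov edx,esp
    code += b"\x8b\x0e"                               # mov ecx,[esi]
    code += b"\x85\xc9"                               # test ecx,ecx
    code += b"\x74\x13"                               # je done (0x13 bytes past this jump)
    code += b"\x31\xc0\x50"                           # xor eax,eax / push eax
    code += b"\xe8" + rel32(base + len(code), file_load)  # call file_load
    code += b"\x50"                                   # push eax
    code += b"\xe8" + rel32(base + len(code), heap_free)  # call heap_free
    code += b"\x83\xc6\x10"                           # add esi,0x10
    jmp_addr = base + len(code)
    code += b"\xeb" + struct.pack("<b", loop - (jmp_addr + 2))  # jmp short loop
    code += b"\xcc"                                   # done: int3
    return bytes(code)
```

`assemble_dump_loop(...).hex(" ")` gives a byte string you can paste with the debugger's binary edit; the short jumps come out as `74 13` and `eb e5`, matching the listing.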




Step 4: Upload images
For the longest time, I was terribly scared of those... but once I did the implementation, it actually turned out to be the easiest thing to patch! Because images are also the one thing that requires the most effort to translate, we'll start with them, so that the image editors can immediately get to work.
So far, the ANM format only changed with Subterranean Animism, and has been constant ever since. Script instructions have come and gone, yeah, but that's nothing we care about.
This means that, as soon as we have general dumping support, we'll also have image patching and sprite boundary dumping support. All it takes now is a simple thanm x on every file. Then, just look through the extracted images to see what can be translated, upload that, and the rest is up to the image editors.
In-game dialogue
Step 5: Fix the buffer overflow in MSG rendering
For some reason, ZUN sprintfs every line into a fixed-width char buffer. Since the strings don't contain format information (and those that do get re-implemented our way), we could just remove this sprintf call and pass the raw text pointer to the rendering function instead... but we actually fix it by replacing the call with our own safe sprintf. Yes, there are instances where the input actually is a format string: the Music Room (for unheard tracks) or the Spell Practice menu come to mind. The Music Room usually is the fastest way to invoke a relevant sprintf call.
Step 6: Investigate the message format
[*]probably hasn’t changed (it hasn’t since th11)

[*]if it has, quickly hack thmsg to be able to dump it to a plaintext file

Step 7: Convert message dumps to wiki code
[*]using the old and crappy msg2wiki C++ thing I wrote once
[*]Replace tokens
[*]Add page header and footer boilerplate
[*]Hit the “make available for translation” thingy
[*]Translation is now possible.
Header
(Don't forget to set the .msg data format in the main .js file!)
<languages />
<translate>
<!--T:0-->
<!-- Optional message you want to include at the top of the page -->
</translate>
Footer
{{SubpageCategory}}
{{LanguageCategory|story}}
Step 8: Additional binary hacks
For recent games, it is necessary to correctly calculate the width of a text box by using thcrap's own GetTextExtentForFont routine.
Endings
Not different from in-game dialogue at all.
Step 9: Investigate the ending format
This ties with ANM for the most consistent Touhou data format. The new one hasn't changed since th10, so this step will most likely be a non-issue.
Step 10: Convert ending dumps to wiki code
See above.
Spell cards
Oh boy. Ever since th10, this is probably the biggest minefield in Touhou code as far as translation is concerned.
Step 11: Fix the buffer overflows in spell card rendering and do correct alignment
They have been there ever since th06, and we have to get rid of them before putting spell names up for translation. Otherwise, translators can not only crash the game, but also possibly corrupt score.dat, just by entering a sufficiently long spell name. And the original limits are very strict, especially when taking Greek or Cyrillic UTF-8 text into account.
[*]Add a "translation" for the first spell card consisting of, like, 1 KB of Lorem ipsum, to the sandbox patch. If the game can display this without crashing, rejoice, buy ZUN a beer, and continue with Step 12. Otherwise...

[*]Look for the functions by tracing back TextOutA calls. Normally, it's text output function (4) ← spell name output function (3) ← spell name processing function (2) ← ECL parser (1).
[*]In (4), locate and label the pointer to the default font, used for dialog, spell cards and pretty much anything else that appears in the regular font size. This usually is the default case of the switch at the top. Case 2 should be the font for ruby.

[*]NOP out the sprintf and strlen at the beginning of (3) - we don't need those anyway.
[*]Do the alignment hacks. Shuffle the rest of the function around in such a way as to fit in our call to GetTextExtentForFont.
[*]If necessary, replace the text pointer in (3) that gets passed as a parameter to (4).
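To put numbers on why the original limits are so tight: Greek and Cyrillic letters take 2 bytes each in UTF-8 and kana/kanji take 3 (versus 2 in Shift-JIS), so a name that looks short can still overflow a buffer sized for the Japanese original. The Python sketch below (helper names and the buffer sizes you test against are hypothetical) measures that and also builds the ~1 KB Lorem ipsum stress-test "translation" from the first point of this step.

```python
LOREM = ("Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod "
         "tempor incididunt ut labore et dolore magna aliqua. ")


def utf8_size(text: str) -> int:
    """Bytes the text occupies once encoded as UTF-8 (without terminator)."""
    return len(text.encode("utf-8"))


def fits(text: str, buffer_size: int) -> bool:
    """True if text plus its NUL terminator fits into a char buffer of buffer_size bytes."""
    return utf8_size(text) + 1 <= buffer_size


def stress_name(min_bytes: int = 1024) -> str:
    """Repeat Lorem ipsum until it is at least min_bytes long in UTF-8."""
    text = LOREM
    while utf8_size(text) < min_bytes:
        text += LOREM
    return text


# ASCII: 1 byte per character; Greek/Cyrillic: 2; kana/kanji: 3 in UTF-8
for sample in ("Sign", "Знак", "スペル"):
    print(sample, utf8_size(sample), fits(sample, 8))  # 8 = hypothetical buffer size
```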
Step 12: Set up breakpoints
For spell name patching, we need up to four variables:
[*]spell_id: The spell number as given by the ECL file. Found in (1) near the call to (2).
[*]spell_id_real: The real spell number, including a difficulty offset. Found shortly after spell_id.
[*]spell_rank: A value between 0 and 3, indicating the difficulty level this spell appears in. This is used in the result or Spell Practice menus, where we only have spell_id_real and thus wouldn't be able to go back to the base ID of a particular spell.
[*]spell_name: The register to write the translated spell name to. This breakpoint should be set in (2) near the call to (3). By deferring spell name fetching as long as possible, we don't have to fix all the buffer overflows in (2).
Also, keep in mind that cave_exec: false is a thing (although it shouldn't be necessary anymore with deferred fetching).
While locating these breakpoints, assign labels to the "ECL parameter getter" functions according to the type of their return value.
Step 13: Investigate the ECL format
At this point, we again depend on Touhou Toolkit; not only for the complete list of spell names with their IDs, but also to create the replacement ECLs for the Skipgame patch.
And most likely, the ECL format has changed again, adding a few new opcodes (and other stuff we hopefully don't care about), so that simply specifying the last game will give "id ### was not found in the format table" errors.
Thus begins the trial-and-error process of finding new and changed opcodes. For every new instruction:
[*]Look up the raw instruction in the ECL file in a hex editor
[*]Look up the chunk of code that handles the opcode and read the parameter types. If necessary, set a breakpoint, run the game until the instruction is reached, and observe closer how the parameters are used.
[*]Recompile thecl with the new information.
[*]Still errors? Go back to the first point of this list.
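For the hex-editor part, it helps to split the raw bytes of a sub into instructions first. The sketch below assumes the th10+ instruction header layout as implemented in Touhou Toolkit's thecl (16 bytes: time, opcode, total size including the header, parameter mask, rank mask, parameter count, one final dword); that layout is our reading of thtk, not something stated on this page, so double-check it against the thtk version you're extending. `walk_instructions` is a hypothetical helper.

```python
import struct
from typing import Iterator, NamedTuple

# Assumed th10+ header: uint32 time, uint16 opcode, uint16 size (header included),
# uint16 param_mask, uint8 rank_mask, uint8 param_count, uint32 (unused here)
HEADER = struct.Struct("<IHHHBBI")


class Instr(NamedTuple):
    offset: int
    time: int
    opcode: int
    size: int
    param_mask: int
    rank_mask: int
    param_count: int
    params: bytes


def walk_instructions(code: bytes, start: int = 0) -> Iterator[Instr]:
    """Yield instructions from a sub body until the data runs out."""
    pos = start
    while pos + HEADER.size <= len(code):
        time, opcode, size, param_mask, rank_mask, param_count, _ = HEADER.unpack_from(code, pos)
        if size < HEADER.size or pos + size > len(code):
            raise ValueError(f"bogus instruction size {size} at {pos:#x}")
        params = code[pos + HEADER.size:pos + size]
        yield Instr(pos, time, opcode, size, param_mask, rank_mask, param_count, params)
        pos += size
```

Printing `opcode` next to `params.hex(" ")` for each unknown instruction gives you the raw parameter bytes to compare against the handler code.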
And yes, we do specify the correct types in this step. Sure, we can just add "S" everywhere and quickly get that thing to work. But we have the resources to do better, and it's not worth doing crappy work now and annoying someone actually using our modified thtk later.
Step 14: Convert spell names to wiki code
At least that step is pretty straightforward.
[*]Grep spell card name instructions out of all files, do iconv -f shift-jis -t utf-8, do some sed magic to bring it into a simpler format, and sort it. A bash one-liner.
[*]Look for duplicated spell IDs and set the correct number according to the difficulty, by looking at the flags near the instruction containing the spell name.
[*]Run that corrected dump through ecsgrep2mw.py
[*]Do a bit of search-and-replace for the character names
[*]... and post stuff on the wiki.
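The duplicate check from this step can be sketched in Python as well. This is not ecsgrep2mw.py, and every rule in it is an assumption to verify against the actual game: the grep output is taken to be already reduced to (spell ID, rank flags, name) tuples, the low four bits of the flags are taken to mean Easy/Normal/Hard/Lunatic, and the per-difficulty number is taken to be the base ID plus the difficulty index (0 to 3, like spell_rank in Step 12). `expand_spells` is a hypothetical helper.

```python
DIFFICULTIES = ("Easy", "Normal", "Hard", "Lunatic")


def expand_spells(entries):
    """Map (base_id, rank_mask, name) tuples to {per-difficulty ID: (difficulty, name)}.

    Assumptions: bits 0-3 of rank_mask are Easy..Lunatic (higher bits ignored),
    and the per-difficulty ID is base_id + difficulty index.
    """
    spells = {}
    for base_id, rank_mask, name in entries:
        for rank, difficulty in enumerate(DIFFICULTIES):
            if not rank_mask & (1 << rank):
                continue
            real_id = base_id + rank
            # The same spell may be grepped from several files; only differing names are a problem
            if real_id in spells and spells[real_id][1] != name:
                raise ValueError(f"spell {real_id} has conflicting names")
            spells[real_id] = (difficulty, name)
    return dict(sorted(spells.items()))
```

With this model, one spell slot that uses a different name on Hard/Lunatic than on Easy/Normal ends up as four separate numbered entries, which is the kind of duplicate this step is about.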
Step 15: Skipgame support
Workflow for new full versions of games we already have trial support for
With an existing trial build, we already have the technical support worked out, and it only needs the addresses and some other small adjustments to work with the full version. Since the audience will be much larger, we need to be all the more careful here. Thus, we port all the technical support before doing anything else.
The new workflow is as follows:
[*]Step 0: Collect patch-relevant information about the game
[*]Step 1: Hash the game
[*]Step 2: Search breakpoints. Immediately search for update_poll and release that to base_tsa, keep the rest in the sandbox
[*]Step 3: Port all existing base_tsa binary hacks and breakpoints to the new build (really, it's better to leave the game untranslated for 15-30 minutes than to risk buffer overflows with some language)
[*]Step 4: Dump all data. Quickly check if any files differ (text.anm probably does) and whether everything ports over correctly
[*]Step 5: Add binary hacks and breakpoints to base_tsa
[*]Step 6: Upload images
[*]Step 7: Investigate the message format
[*]Step 8: Convert message dumps to wiki code
[*]Step 9: Investigate the ending format
[*]Step 10: Convert ending dumps to wiki code
[*]Step 11: Investigate the ECL format
[*]Step 12: Convert spell names to wiki code
[*]Step 13: Skipgame support
[*]Step 14: Go to sleep. All the important stuff is translatable now.
[*]Step 15: Do the kludgy Music Room workaround thingy
[*]Step 16: Use Resource Hacker to translate the resolution dialog, then ship that changed dialog as one large binary hack blob.
[*]Step 17: Leisurely pick out translatable hardcoded strings



凯风快晴 posted on 2014-12-2 22:54:45

A whole pile of English, looks really impressive


But... I can't understand it

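lrdcq posted on 2014-12-3 00:35:48

Hmm~ the usual file-structure analysis approach~~~ So exhausting~ If ZUN really switched to a new structure, it would be the death of them~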

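fancydz posted on 2014-12-3 19:33:12

Guinea pig reporting: it's viewable from inside the Wall (even though I still jumped over it out of habit). Writing this much, our foreign friends really went all out.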

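lixiang5628638 posted on 2014-12-4 01:44:07

Last edited by lixiang5628638 on 2014-12-4 02:09

This article says that foreigners run into many problems when patching Touhou, for example the code rewrites forced by the various system changes if ZUN moves from 32-bit to 64-bit Windows programming, and the trouble with modifying the game if ZUN programs in .NET.

As expected, Windows isn't all that reliable; Microsoft wouldn't be Microsoft if it didn't screw people over. I hope future mainline Touhou games will be made on other systems.

P.S.: That image is pretty interesting; it shows how much pressure the foreigners are under when translating text versus images. (You need special Photoshop skills)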

天象 posted on 2014-12-4 14:26:44

ZUN suddenly starts to think that .NET is a good idea
→ We’re FUCKED

八咫乌空 posted on 2014-12-5 10:38:07

WE 'RE FUCKED(ABOUT .NET)

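wz520 posted on 2014-12-5 19:50:03

Couldn't understand a thing except "We’re FUCKED"

Does it mean: if ZUN's future games go 64-bit or use .NET, how is thcrap, this universal Touhou patch-making tool, supposed to support them?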

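十二 posted on 2014-12-7 17:19:08

Search - 90%
Debugging engine - 70%
Analysis - 15% :(

Maybe it'll take another two years before there's a reliable OD (OllyDbg).

WinDbg can inject code; the only problem is how hard it is to get started with.

The complexity of a VM depends entirely on its purpose. After all, .NET was never meant to protect users' code.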

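drzzm32 posted on 2014-12-7 18:11:44

十二 posted on 2014-12-7 17:19
Search - 90%
Debugging engine - 70%
Analysis - 15% :(


The core of translating a game is still unpacking... I don't know whether Touhou's packing/unpacking uses its own algorithm...
Reverse engineering here basically just means analyzing the unpacking...
The game code itself doesn't seem that important.
.NET just makes development a bit faster, but it makes the code hard to protect.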