data-mining games

Strangelove

AI Researcher
AKA
hitoshura
with the pc release of rebirth on the horizon, i would be interested in using the tools mentioned here to extract the text from the pc version and have an easily readable copy of the script(s) and just poke around and see if there's anything interesting there

in the distant past i messed around using other tools to see what i could find (using a iirc ffx tool i managed to find some unused voice lines in 'digital devil saga' that are just text in the actual game). for a few nds games like ffxii revenant wings and xenosaga 1&2 i managed to extract the text, although that was because they were just easily accessible text files and i didn't need to mess around too much. at one point i managed to extract some of the localisation files from uncharted 1 but i don't remember the process i used now, it might have been by trying that same ffx tool whose name i don't remember, but i might have burnt what i got onto a dvd somewhere (that's how long ago this was, people were still burning dvds)

from the above examples it should be clear that i'm mainly interested in getting the language related data out of the games rather than extracting models or romhacking (which i'd be interested in primarily to learn how to access the text). in my limited reading so far one hurdle for a lot of games, particularly older ones, is the way different developers would use different file types to store things so the method that works for one game wouldn't necessarily work for another. i don't know if this is easier with the rise of more devs using 3rd party engines like unreal and unity rather than proprietary ones. that's my wishful thinking, because trying to work out some of this stuff is way over my simple little head lol. you can't tell me i need to use a hex editor, i am a stupid baby

stuff i was going to be experimenting with first:

  • death stranding: i have a pc copy i need to reinstall and then look around and hopefully, it's just sitting in the open somewhere
  • front mission 1st: was going to be looking at the nds release which i was under the impression included additional story content that isn't in the recent remake, but i might be wrong on that. if the nds version is too much for me to work out i might look at the remake in the hopes it will be simpler

and a kind of wish list of varying difficulties:

  • final fantasy xvi: quick googling tells me this might be a unique engine rather than unreal like other recent square enix titles so i don't know if other tools will work there
  • xenosaga 1/2/3: i am apprehensive about older disc-based games like these because i get the impression they need more skill than just poking around a pc game's files until you find the localisation stuff, but it would be super cool to me personally to get all the text out of these
 

Strangelove

AI Researcher
AKA
hitoshura
unrelated to any of the above games, i installed alien isolation on a whim and all the localisation files are just. right there in easily identifiable and accessible .txt files

thank you lord, why can't everyone just do this :sadpanda:
 

Strangelove

AI Researcher
AKA
hitoshura
i don't know why i thought newer games would be easier to work with when you don't know anything about what you're doing, this is still too difficult for me lol. i have tried various tools (umodel, watto's game extractor, another one but i'm forgetting the name now) but am still not having much luck. with games that have multiple localisations like most modern ones i can at least find files/folders labelled with language codes but i haven't managed to extract anything useful from them yet.

in no particular order, a list of games i have tried messing around with (and version used) to little avail:

  • death stranding (original version, epic game store): i haven't used any sort of tools on this yet, i just looked through the installed files but they weren't named in a way that's plainly legible to me (a huge dumbass). i downloaded a mod to add ukrainian language support that seems to replace the french one(???) which involves replacing a file, which should tell me which file at least contains the french localisation. but i downloaded the mod on a different computer, and all the files are large container files named like "bh12hfmsokfofhaor" so i need to sort out these files so they're on the same computer first
  • tales of arise (base game, pc game pass): i did at least manage to find directories that suggest they're used for localisation (names including "en/jp/es/ko/de/fr/etc.") but when i tried to open what i found there they all seemed to have the same contents and none of it was the text from the game. i tried using umodel for the first time on this but it kept crashing so i might have messed up somewhere
  • persona 3 reload (base game, pc game pass): again, found files/folders with names suggesting localisation files but haven't managed to access them
  • crisis core reunion (pc version, downloaded from somewhere lol): got it just to test the ff7remake text tools but haven't gotten around to it yet
  • caligula overdose (pc version, let's not ask where it's from): same as previous attempts, no success
  • front mission 1st remake (pc version): using the program the name of which i can't remember now (a file viewer/extractor that opens in your browser), i could look around the files and see some things that might contain text but i couldn't do anything with them at the time. just by browsing the files in windows explorer i did find some video files which have subtitles in various languages, which is at least something.
  • front mission 1st (nds japanese version): files extracted, but i couldn't locate the text
  • final fantasy xii revenant wings (nds japanese & euro version): one of the first ones i tried again, since i did it before somehow. while i was mainly going for the japanese text, i tried the euro version as well to use the localised files (which in a multi-lang release will be labelled "en/fr/es/de/etc.") to determine the names and locations of files that are likely to have text in the japanese version without solely relying on names. (also it's easier to search the language codes first to find them than rummaging through everything.) iirc i did find some files, although for languages like spanish and french accented letters weren't displaying properly when i opened them but that's probably on me
  • hayarigami 1 ds (nds japanese version): failed to find the text so i consoled myself by listening to the low res version of the ending song from the game files. i just saw that there's a collection of 1, 2 & 3 from 2023 for the switch, ps4 and ps5 that i didn't even know about which might yield better results (but probably not with me doing it lol)
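
a rough python sketch of two things that come up in the list above: searching extracted files for language codes in their names, and the accented letters not displaying properly (which is usually the file being opened with the wrong text encoding). everything here is an assumption for illustration, not how any of these games actually store text: the function names, the list of language codes and the list of encodings are all made up and would need adjusting per game.

```python
import re
from pathlib import Path

# hypothetical set of language codes; extend it for the game in question
LANG_CODES = {"en", "jp", "ja", "fr", "es", "de", "it", "ko"}


def find_language_paths(root, codes=LANG_CODES):
    """Return files/folders under root whose own name contains a language
    code as a separate token (so "msg_fr.bin" matches, "french.txt" doesn't)."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        # split the name on anything that isn't a letter: "msg_fr.bin" -> msg, fr, bin
        tokens = re.split(r"[^a-z]+", path.name.lower())
        if codes.intersection(tokens):
            hits.append(path)
    return hits


def decode_with_fallback(data, encodings=("utf-8", "shift_jis", "cp1252")):
    """Try each encoding in order and return (text, encoding_name) for the
    first one that decodes without an error.

    This is only a heuristic: a wrong encoding can sometimes decode without
    an error and still give garbage, so the order of the list matters and
    the result should be checked by eye."""
    for enc in encodings:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # latin-1 never fails, so it works as a last resort for eyeballing
    return data.decode("latin-1"), "latin-1"
```

usage would be something like `find_language_paths("extracted_rom")` to list candidate files, then `decode_with_fallback(path.read_bytes())` on each one. this only helps when the text is stored as plain text in a standard encoding; it won't do anything for compressed or packed container files.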

idk if there's a step by step tutorial out there to walk you through doing this stuff, i feel like i need a lot of help. tales of arise is leaving game pass in under 2 weeks so i would like to try and crack that before then. i have found a text dump from arise but it doesn't seem to be complete (searching the file for a line of dialogue from the game didn't get any results)
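
one caveat on searching a text dump for a line of dialogue: an exact search can miss a line that is actually there, because of curly vs straight quotes, ellipsis characters, different capitalisation or line breaks in the middle of the sentence. a minimal python sketch of a more forgiving search, with hypothetical function names, assuming the dump is already readable text:

```python
import re
import unicodedata


def normalize(text):
    """Fold case and unicode variants, strip punctuation, collapse whitespace."""
    # NFKC turns things like the single "…" character into "..."
    text = unicodedata.normalize("NFKC", text).casefold()
    # drop punctuation so quote and ellipsis styles stop mattering
    text = re.sub(r"[^\w\s]", "", text)
    # collapse runs of spaces and line breaks into single spaces
    return " ".join(text.split())


def dump_contains(dump_text, line):
    """True if the line appears in the dump once both are normalized."""
    return normalize(line) in normalize(dump_text)
```

if `dump_contains` still returns False for several lines you know are in the game, that is better evidence the dump is incomplete than a single failed exact search.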
 

X-SOLDIER

Harbinger O Great Justice
AKA
X
While I've never attempted data mining, as it's significantly outside of my own skill set, I do have a bit of exposure to some of the folks who do it via the Souls community with things like Bloodborne & Elden Ring. I've dropped very briefly into the DMs or Discords of those groups to ask whether anyone knows of particular bits of game data that I can't find documentation for, when I'm poking around trying to get a sense of game dev cycles from the way that the design happens to interconnect. There's a section of this video on one of the obscure Dark Souls 2 puzzles that gets into the weeds a little bit about data mining. It notes that even with all of the reverse engineering & whatnot that's been done with it, no one in the community actually knows what the game engine is called, and Dantelion is just the community nickname for it.


In general this is one of those times where the Internet is most useful not as a repository of knowledge, but rather as an index of knowledge. While it's unlikely there's a guide that'll be able to get you through it that'll be broadly applicable across a wide range of titles that aren't doing simple .txt files – there's almost certainly other people who're looking to do this as well and who oftentimes have complementary skill sets to your own, where figuring those things out is vastly more viable as a group effort. Given the different ways that various dev teams approach things, I'd expect that some of those will end up being more concentrated around various communities specific to those games.

The only person I can think of here who'd likely have a solid perspective on some of that'd be @Shademp, with all of the DoC stuff he's dug through over the years that's REALLY deep into the weeds of everything with the game and how locked away various things can be. I know he's also worked pretty closely with the Speedrunning community around it as well, and there tend to be a lot of TAS & other development-minded folks who have a particular familiarity with tools like that and who are always integral in assisting with routing & optimization for runs of various games. That can mean they have particularly specialized knowledge that can point you in the right direction, even if sometimes it's just knowing which approaches NOT to take. If you're not finding anything that's getting what you're looking for, I'd expect that a Discord / Reddit / Forum that does speedrunning for some of those games is likely to have at least a couple of folks with some experience in dialogue/text extraction who might be able to give you a hand.


X :neo:
 