I created a new Git project on my GitHub profile today as I began some work on a possible conference presentation. I was surprised to see a message that said I’d received an achievement badge because I’d “contributed code to the 2020 GitHub Archive Program and now have a badge for it. Thank you for being part of the program!”

Clicking through the Archive program link to find out more, I saw that “On 02/02/2020 GitHub captured a snapshot of every active public repository. Those millions of repos were then archived to hardened film designed to last for 1,000 years, and stored in the GitHub Arctic Code Vault in a decommissioned coal mine deep beneath an Arctic mountain in Svalbard, Norway.”

Which sounds kind of cool, in more ways than one. However, I’m not excited that I never really got a way to opt out of that archive. Although the achievement badge notification says something about being able to opt out in settings, clicking through to settings doesn’t take me anywhere that makes it clear which setting I’d need to adjust. Further, if they’ve already “archived to hardened film designed to last for 1,000 years”, I’m thinking any setting I change now is sort of moot anyway.

This isn’t the only use of public code that GitHub has announced lately: their new Copilot program uses the source of public code repositories, apparently regardless of the license chosen by the repository owner. Starting to wonder if I need to look more seriously into GitLab’s offerings….

Was dismayed to discover this morning that O’Reilly is no longer putting on in-person conferences, including the wonderful OSCON conference I so enjoyed both attending and presenting at. I tripped across that news today when I went to find links to my previous talks (2014, 2016). Both talks were based around the idea of delivering the bad news that your build is broken by way of obnoxious Furby chatter. I had submitted talk topics for several years before that first talk got picked up – guess the conference reviewers similarly thought Furbies might be hard to look away from.

So, farewell, OSCON, Strata, and an abundance of other conferences. I’ve been finding my geek conference fix in other places of late, more related to cyber, and it’s not as if there isn’t an abundance of ways to learn in person and online. But OSCON will forever hold a sweet spot in my heart.

Succumbed to temptation today and bought a laptop. I’ve been thinking about it for a while. In two more weeks, I’ll need to hand back the one I’ve been using from work. This MacBook has stood me well through college and capture-the-flag events, and I’ll be sad to see it go, particularly since it’ll take another week after that before my new one arrives. That said, 32GB of RAM, a 1 TB NVMe drive, an NVIDIA GPU with 8GB, and an AMD Ryzen chip: it’s going to put this poor box to shame. I’m going to have to grow my chops in reverse engineering and cyber exploitation to match it!


You may have seen a few more geek notes on here of late. I’ve really enjoyed jumping into CTFs. My objective isn’t to win, but to find more ways to solve puzzles.

This weekend’s adventures were a little different, though. My company sponsors UMBC’s CyberDawgs team, and they’ve asked us to contribute challenges to their upcoming CTF. I tasked our IRAD team with coming up with a few, and I wrote a couple as well. So this weekend I spent some time normalizing our submissions’ README files and doing a final test of the submissions.

One of the submissions was really giving me trouble. The IRAD team member who’d developed it had demonstrated it to us, but the solution instructions in the README just weren’t “clicking” well enough for me to reproduce a solve, much less help anyone else understand how to solve it. It’s customary in CTFs to have a Discord channel where mentors can offer assistance to those on the right track; given that I don’t want to be up all night providing that support myself, I thought it best to write a walkthrough that someone else could use.

Not only did I “crack” it (helped, of course, by the solution instructions in his README), but I was then able to provide a linked, reproducible recipe using a tool called CyberChef that is really useful for a lot of CTF grunt work. I’m avoiding linking to the recipe or giving any more info on the challenge, of course, given that hopefully there’ll be lots of folks taking a crack at it in early May. I’m now more confident, though, that there may be some folks who solve it AND I better understand a particular kind of encryption approach.

I gave a talk in November to a local high school about computer science as a career field. Aha, I think – I’ve given this talk before – I’ll just brush up my well-prepared slide deck.

My slide deck has a graphic in it that looks something like the below. All credit to Daniel van der Ende and his work on the GitHub Data Challenge in 2014. It’s an interesting way to show the various combinations of languages that are used in projects today. It’s actually common nowadays for a project to have multiple types of code in it. Often there’ll be the front-end (often JavaScript + HTML + CSS) with some sort of back-end. The point I wanted to convey in the original presentation was that software engineers often need to know more than one language. I would then riff lightly on which of the languages they could see in my slide I’d worked with in some form or fashion. (In the snippet of the image you can see: Perl, Scala, Go, JavaScript, Ruby, and Lua. I did just enough CoffeeScript to not want to do it anymore…)

Well, now it’s 2021. The slide information needs to be updated, and Mr. van der Ende has not updated his image, but he was kind enough to make available his source code and a handy README file which walks (loosely) through how to get the data.

Challenges solved so far:

  • getting access to BigQuery
  • finding new sources of the data, since the dataset van der Ende references doesn’t seem to exist anymore
  • making BigQuery convinced that I have permission to run queries
  • updating the query to match the new data source, including figuring out how to flatten arrays – really not in his original flow
  • downloading MySQL to my developer machine and setting up a database and username/password combo
  • updating van der Ende’s code to read directly from a CSV, rather than assuming I’m using a JSON file
  • getting PHP to work on my developer workstation – this particular box has done lots of things for me lately, but PHP hasn’t been one of them
  • figuring out how to populate the languages list the code asked for, given the languages represented in the dataset I downloaded. (For the record, awk, sort, uniq was the happy combo.)
  • uh, figuring out a better way to ingest the CSV, since pulling in the full file at once took up too much memory for my computer
  • (more to come undoubtedly to get it working…)
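For the record, the awk, sort, uniq combo looked roughly like the sketch below. This is a reconstruction rather than the exact command I ran: the function name is made up for illustration, and it assumes a simple CSV export with a header row, no quoted commas, and the language in a known column.

```shell
#!/bin/bash
# Hypothetical helper: print the distinct values found in one CSV column.
# Assumptions: a header row, no quoted or embedded commas, and the caller
# knows which column number holds the language name.
unique_column_values() {
  local csv_file=$1 column=${2:-1}
  # Skip the header (NR > 1), print the chosen column, drop empty values,
  # then sort so that uniq can collapse the duplicates.
  awk -F',' -v col="$column" 'NR > 1 && $col != "" { print $col }' "$csv_file" \
    | sort | uniq
}
```

Something like `unique_column_values results.csv 2 > languages.txt` would then produce the list, where `results.csv` and the column number stand in for whatever the BigQuery export actually looks like.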

Note: I ultimately ran into enough things with it that I left the original image. Still on my todo list to bring this to resolution…

My master’s classes keep sending us into Wireshark to analyze packet files. I thought I had a decent understanding of how to use Wireshark from some previous experience through work, but I keep finding new tricks as I try to figure out things about unknown protocols. Note that I’m using Wireshark 3.0.3, because that’s what’s installed in the lab infrastructure. I am aware that Wireshark 3.4 is out: my plan is to play with that version on my personal computer to see new goodies.

Copy and Paste

We keep needing to fill out spreadsheets of interesting things learned. We’re running Wireshark through a VDI infrastructure and I’m typically doing my homework on a laptop, so with limited screen real estate, even my touch typing skills aren’t helpful enough. The Copy capability in Wireshark lets me capture just the value for the field – highly useful for things like MAC addresses.

Protocol Hierarchy

Forget about randomly traversing files that include 100K packets – let the protocol hierarchy show likely interesting data points within the file. Filter by said protocol, and data patterns emerge. Also worth calling out are the Conversations and Endpoints statistics areas. They’re nice ways to get a holistic view of what’s going on in the file and what might be worth diving into.
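The same summaries are available without the GUI through tshark, the command-line tool that ships alongside Wireshark. A minimal sketch, assuming tshark is installed and on the PATH; the function names are mine, and I haven’t checked the output format against the lab’s 3.0.3 install:

```shell
#!/bin/bash
# Hypothetical wrappers around tshark's statistics (-z) options.
# -r reads a capture file; -q suppresses the per-packet listing so only
# the requested statistics table is printed.
pcap_overview() {
  local pcap=$1
  tshark -r "$pcap" -q -z io,phs          # protocol hierarchy
  tshark -r "$pcap" -q -z conv,tcp        # TCP conversations
  tshark -r "$pcap" -q -z endpoints,ip    # IPv4 endpoints
}

# Show only the packets that match a display filter, e.g. a protocol name.
packets_for_protocol() {
  local pcap=$1 display_filter=$2
  tshark -r "$pcap" -Y "$display_filter"
}
```

Running `pcap_overview capture.pcap` and then something like `packets_for_protocol capture.pcap bacnet` mirrors the “look at the hierarchy, then filter by protocol” flow; `capture.pcap` is a placeholder file name.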

Statistics -> …

We’re looking at SCADA pcap files, including BACnet. Delighted to find a traversal means for BACnet that let me inspect the devices and services seen in the pcap. I was less happy to see that iFix wasn’t in the list, and that Wireshark just treats it as plain TCP (again, with my older version of Wireshark, with its default set of dissectors, etc). Possibilities for expansion.

Expert Analysis

There’s a menu option for ‘Expert Analysis’ that I hadn’t played with before. Add its data, and then allow it to create filters to show just that data – voila. Evidence of TCP retransmissions? Yes, please.
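The filter that the expert view builds for retransmissions can also be applied from the command line. A small sketch, assuming tshark is available; the function name and the choice of output fields are mine, while `tcp.analysis.retransmission` is the display filter Wireshark itself uses for these packets:

```shell
#!/bin/bash
# Hypothetical helper: list the frames Wireshark flags as TCP retransmissions.
# -Y applies a display filter; -T fields with -e prints just the chosen
# columns (frame number, source and destination IP) instead of full summaries.
list_retransmissions() {
  local pcap=$1
  tshark -r "$pcap" -Y 'tcp.analysis.retransmission' \
    -T fields -e frame.number -e ip.src -e ip.dst
}
```

Piping the result through `wc -l` gives a quick count, which is handy for the spreadsheet-filling side of the homework.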

My master’s class had us writing Yara rules for our project lab. Given that I recently gave a talk at DataWorks MD that took a brief foray into describing the use of Yara rules for static malware analysis – well – I was prepared for and looking forward to this particular lab.

The challenging part of the lab: to help us understand how analysts decide which byte(s) to check for hex strings, the lab had us use the Linux utility hexeditor. As instructed, we were to

  • sudo hexeditor
  • use the keyboard’s arrows to navigate into a particular file
  • press Ctrl-W to invoke ‘search’
  • use the arrows to navigate to the hex search option, as opposed to text search
  • type in the appropriate hex string. Note: the hex string could be longer than the editor would show us in its entry window. With a long enough string, we were then working blind with typos
  • if the hex string was found, jot down at what byte position so that we could later use that in our Yara rules

Bleah… Too many opportunities for typos. Too slow, as we needed to iterate across five files. _Really_ too slow when you consider we were doing this in a VM hosted on university infrastructure, using its GUI via NoMachine.

Improvement 1: sudo hexeditor filename at least got me into a particular file, and importantly, let my file history show me what files I had already interacted with.

I then looked for command-line options to hand hexeditor a search string. That would at least let me repeat previous commands and edit the filename or the hex string. Unfortunately, hexeditor doesn’t support anything of that sort. grep would apparently have told me whether the pattern existed in the file, but not given me the byte location.

Long-ish story short, although the lab itself had no reason to cause me to do this, and it certainly took me longer to work this out than to just hand jam it, I now have scripts to iterate over a set of files and a set of hex strings to determine if the hex string is represented in the files, and if so, where. My geek demon is satisfied this evening, and I’m holding onto the files here to help in CTFs or other future geekish fun. Credit to here for the general approach for finding hex data locations in files, and here for helping work out the problem of iterating over lines that contain spaces.

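#!/bin/bash

# test_hex_find.sh
# Examine file for hex value
# Argument 1: file name to check
# Argument 2: hex string to look for, as space-separated bytes

# od dumps the file as hex bytes, sed strips od's offset column, and tr joins
# the lines, so every byte takes up three characters ("C6 "). grep -b -o then
# reports the character offset of each match within that one long line.
positions=$(od -v -t x1 "$1" | sed 's/[^ ]* * //' | tr '\012' ' ' | grep -b -i -o "$2" | sed 's/:.*//')

if [ -n "$positions" ]
then
  echo "filename: $1, hex value: $2"
  # Three characters per byte, so dividing by 3 gives the byte offset.
  # Loop in case the hex string occurs more than once in the file.
  for position in $positions; do
    printf '%06X\n' $(( position/3 ))
  done
fi

#!/bin/bash

# find_hex.sh
# Run test_hex_find.sh for every .exe in the current directory against
# every hex string listed in hex_strings.txt (one quoted string per line)

# xargs -n1 strips the surrounding quotes and emits one string per line;
# splitting only on newlines keeps each string, spaces included, together.
IFS=$'\n' hex_strings=( $(xargs -n1 <hex_strings.txt) )

# Show the hex strings that were loaded
for hex_string in "${hex_strings[@]}"; do
  echo "$hex_string"
done

for file in *.exe; do
  for hex_string in "${hex_strings[@]}"; do
    ./test_hex_find.sh "$file" "$hex_string"
  done
done

hex_strings.txt:
"C6 45 F4 74 C6 45 F5 6C C6 45 F6 76 C6 45 F7 63 C6 45 F8 2E C6 45 F9 6E C6 45 FA 6C C6 45 FB 73"
"8A 04 17 8B FB 34 A7 46 88 02 83 C9 FF"
"5C EC AB AE 81 3C C9 BC D5 A5 42 F4 54 91 04 28 34 34 79 80 6F 71 D5 52 1E 2A 0D"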

Yeah, this kind of joke is just my kind. Thank you, Ian Coldwater, for enlivening my day. Thinking about posting it at work, too.

As the leader of our Women In Tech group for work, I particularly appreciate the pun-blaming on MOM! All the better that it’s the capitalized, exclamation-pointed version.

While my thoughts are fresh on my latest CTF…

Pluses:

  • Throughout the event, in top 3. Currently in top 2, but closing out for the day to get other things done.
  • Figured out a few things: interrogating VMDKs via extracting them; linking up a shared drive in Kali
  • Had some success with Python scripting to interrogate Word documents to find hidden data, as well as to find MD5 and SHA-1 hashes. SHA-1 grep string was: ‘[0-9A-Fa-f]{40}’
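For next time, here is roughly how that grep string could be used straight from the shell. This is a sketch rather than what I ran during the event: the function names are mine, the MD5 pattern is my assumption by analogy (an MD5 hash is 32 hex characters), and it assumes the data has already been extracted into a file grep can read.

```shell
#!/bin/bash
# Hypothetical helpers: pull candidate hashes out of a file.
# -a treats binary files as text, -o prints only the matching text,
# -E enables the {n} repetition syntax, and -w requires the match to sit
# on word boundaries so a longer hex run is not reported as a shorter hash.
find_sha1() {
  grep -a -o -w -E '[0-9A-Fa-f]{40}' "$1"
}

find_md5() {
  grep -a -o -w -E '[0-9A-Fa-f]{32}' "$1"
}
```

A match only says “this looks like a hash”, of course. Any 40 hex characters in a row will turn up, so the results still need eyeballing.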

Need to learn:

  • reverse engineering to interrogate malware or other executables
  • faster ways to traverse Wireshark data. Getting protocol statistics is a good starting point – want to get better there
  • executing random files – need VMs stood up for Windows to have them ready to roll…

Hmmm – I thought the CTF was closing out tonight, but it’s not until Sunday night. I need to tread carefully here, for the sake of my health and marriage.