I recently started an RSS crawler after the announcement that the RPGBA was shutting down. My goals were rather simple: 1) collate information for personal use, and 2) apply some level of natural language processing for tagging. Sounded good on paper when I first sketched it out.
And now onto the problem list.
#1: RSS Feeds are a PITA
Quite simply, RSS has standards. Multiple standards. So what works for RSS doesn’t necessarily work for an ATOM based output. Both RSS and ATOM have variants. Some fields you’d like to see just are not present in one versus the other. The vast majority of the differences are “nice to have” rather than critical. So I just adjusted the code.
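To illustrate the kind of adjustment involved, here is a minimal sketch (not the crawler's actual code) that maps RSS 2.0 items and Atom entries onto one common shape. The function name `parse_entries` and the choice of three fields are assumptions for the example; real feeds have more variants than this handles.

```python
import xml.etree.ElementTree as ET

# Atom elements live in this XML namespace; RSS 2.0 core elements have none.
ATOM_NS = "{http://www.w3.org/2005/Atom}"


def parse_entries(xml_text):
    """Return a list of dicts (title, link, summary) from RSS 2.0 or Atom."""
    root = ET.fromstring(xml_text)
    entries = []
    if root.tag == ATOM_NS + "feed":
        # Atom: the link is an attribute (href), the summary is <summary>.
        for entry in root.findall(ATOM_NS + "entry"):
            link_el = entry.find(ATOM_NS + "link")
            entries.append({
                "title": entry.findtext(ATOM_NS + "title", default=""),
                "link": link_el.get("href", "") if link_el is not None else "",
                "summary": entry.findtext(ATOM_NS + "summary", default=""),
            })
    else:
        # RSS 2.0: the link is element text, the summary is <description>.
        for item in root.iter("item"):
            entries.append({
                "title": item.findtext("title", default=""),
                "link": item.findtext("link", default=""),
                "summary": item.findtext("description", default=""),
            })
    return entries
```

Missing fields simply come back as empty strings, which matches the "nice to have" nature of most of the differences.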
#2: Contextual Analysis
To achieve reasonable context extraction, the post summary generally doesn’t have sufficient length. Thus, I used a web API to extract the full text of articles on the blogs I was curating. No surprise, it has problems extracting reasonable text from many sites, sometimes because of malformed HTML and sometimes for other reasons. I’d often get the site’s word cloud of tags, categories, etc. Even worse, I’d pick up the “Blogs I Read” section.
Grabbing the full list of tags/categories has major issues for contextual categorization. It floods the detector with a large list of keywords that aren’t actually tied to the specific post.
Unfortunately, my few test sites didn’t have the text extraction problem so I collated a whole lot of nonsensical data over the course of a few weeks. When I sat down to build an initial taxonomy, the problem was obvious.
Solving the problem wasn’t easy. First, I took a look at sentence length to avoid overly short sentences. Some of the extraction returns huge swaths of text. For that, I looked first at overall sentence length and then at words per sentence. In the end, it’s simply not tractable to solve. Too many writers use complicated sentences, compounds piled upon compounds. The result is better but can never be precise.
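A rough sketch of that sort of sentence-length filter is shown here. The function name, the naive punctuation-based sentence split, and the thresholds (`min_words=4`, `max_words=60`) are all assumptions picked for illustration, not the values I actually settled on.

```python
import re


def plausible_sentences(text, min_words=4, max_words=60):
    """Keep only sentences whose word count falls inside a plausible range.

    Very short "sentences" tend to be tag or category lists; very long ones
    tend to be runaway extraction. Both thresholds are illustrative guesses.
    """
    # Naive split: break after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    kept = []
    for sentence in sentences:
        word_count = len(sentence.split())
        if min_words <= word_count <= max_words:
            kept.append(sentence)
    return kept
```

A long, legitimately compound sentence gets dropped by the same rule that drops runaway extraction, which is exactly why this approach can never be precise.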
#3: Keyword Extraction
For keyword extraction, I again turned to a natural language processing API. As you might expect, if the target text contains non-essential information, it returns garbage. Worse, it ties the post to a huge cloud of tags that aren’t relevant, as mentioned in #2.
The target information I wanted is obviously web site data. The text extraction engine quite often returns HTML fragments. Chunks of HTML muck up the keyword extractor. Originally, I did some keyword cleanup but it was insufficient.
I eventually wrote a normalization routine that attempts to make capitalization sane across the keywords. It also strips out often seen HTML fragments.
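As a hedged sketch of what such a routine can look like: the version below strips anything that looks like an HTML tag, decodes entities, collapses whitespace, and lowercases so that differently capitalized copies of a keyword collapse into one. Lowercasing is just one way to make capitalization "sane", and the function name is hypothetical; my actual routine differs in the details.

```python
import html
import re

# Anything between angle brackets is treated as an HTML fragment.
TAG_RE = re.compile(r"<[^>]*>")


def normalize_keywords(raw_keywords):
    """Return cleaned, lowercased, de-duplicated keywords in first-seen order."""
    seen = set()
    cleaned = []
    for raw in raw_keywords:
        text = html.unescape(raw)           # "&amp;" becomes "&", etc.
        text = TAG_RE.sub(" ", text)        # drop tag-like fragments
        text = re.sub(r"\s+", " ", text).strip().lower()
        if text and text not in seen:       # skip empties and duplicates
            seen.add(text)
            cleaned.append(text)
    return cleaned
```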
In the end, I tossed out about 6 weeks of aggregation. The garbage text I allowed to flow through the process overwhelmed the useful results. I’ve now started to accumulate data once again. It’s better but far from perfect. The goal is not to have perfect inputs but rather to extract some knowledge from the noise.
Blog writers… some of you use some seriously long sentences. Shorter is better. Break up those massively compound sentences. The idea will be more clearly expressed. Just my $0.02.
Well, sometimes I have edge case bugs that are hard to track down. The failure to generate female names via the Medieval Names Generator was not one of those. I’m certain it worked at one point, but I must have broken it in a change I have yet to find in the change log history.
To fix the issue, I tweaked the interface slightly to use a robust paradigm found in many of my other generators. Unfortunately, it had been broken for many months.
My thanks to the anonymous individual who dropped me feedback to let me know it was busted. My apologies to those who tried to generate female names and got only male results.
Bugs. I code them daily.
I’ve been running mini-conventions for my old gamer crew for over six years. The reality is that it is very simple and overly complicated all at the same time. These are my experiences and suggestions. Your experience may vary widely from mine.
People expect different things from gatherings of any sort. A mini-convention can take on many fronts and may vary from year to year. Constraints can come from both the location and from the people who are going to participate.
Participants are quite often flexible on what they want to do. However, that is a double-edged sword. Pick what you are going to do/play/experience for the weekend.
Open-ended situations can be awesome or burn friendships. The plan doesn’t have to be specific games at specific times. However, it should indicate some idea of what will be played and how often. Pick a general thematic element that fits for what you want and communicate that frequently.
Money Up Front
If you are booking a location, know the cost, then get the money up front from every participant. Either they can afford it, or not. It’s a given from the start; not something that is discussed at the end.
I’ve done this badly over the last 6 years. I’ve always known that I could deal with the costs if a non-payment occurs. It is the worst decision I could have made. Get the payment up front. Whether it costs $150 or $20 for the weekend, people need to pay before they attend. No exceptions. That way, monetary disputes can be completely avoided.
My particular situation started out as hotel based and evolved to a location with the ability to support 10x the number of participants. This will vary by each approach but make sure you have sufficient space for everyone. Then add in some additional space if it is a physical building. Keep in mind that every particular game requires specific infrastructure. E.g., playing Magic takes a lot of table top space in comparison to an early edition of D&D.
Also, depending on your approach, every location has limitations. While my current one would house 30 people, there are only 2 bathrooms. Even with 5-7 people, that can present a challenge. It’s never been an issue for us but is something to consider.
That brings up personal space. Normal conventions allow you to retreat to a hotel room to unwind, recoup, and recover. The more confined spaces do not allow that. Sickness, health issues, and even just a break from people require a bit of space. Likewise, disagreements and arguments. Occasionally we all should just walk away from a discussion and vent elsewhere rather than allowing it to escalate.
Along the thematic lines, pick what you are going to do early. Have a backup plan for when the primary game doesn’t happen. Then have a backup for the backup. Given the small size of what I do, this is difficult. Interests may take everything planned off the table or leave you with no running games if game masters bail out at the last moment. As the organizer, be prepared to step in and run stuff at the last minute or work with the attendees to pick something and a game master. Organic flow can allow someone to dominate the situation at the expense of other participants.
Food & Cooking
If you are in a city, food is not much of a consideration. If you are not, you need a menu or at least a plan. My group does it via shared chaos: hit a grocery store and buy stuff. That means I get a lot of food left over when it ends, and those bits are never what I’d normally eat. Trying to figure out what to do with an 8-person cheese, salami and cracker plate solo is a strange challenge.
Pick a menu. Adjust to the participants. Communicate, adjust and readjust. It should not be complicated but if you have to mix vegans, omnivores, and junk food addicts… Snacks and drinks should be provided by the individual participants. Core meals you can deal with.
On the cooking front, my mantra is simple: if you didn’t cook, you clean. That means you wash and dry everything involved. There are the lazy people who will attempt to do neither. Never invite them back. You’ll never know who they are until they show the selfishness that defines them.
Talk to everyone who has run something similar. Every situation is different. You are going to have to adapt on the fly. Not just the first time, but each time.
The experience is a whole lot of chaos but well worth the effort.
Yep, rolled out another variant of the classically chaotic Deck of Many Things. This time supporting the 5E system.
Interestingly, 5E allows for players to state the number of cards they wish to draw and then draw them sequentially rather than simultaneously. Most cards return to the deck after being drawn so this change allows duplicate cards to be drawn unless a card effect stops the drawing process.
Also, no limit is placed on the number of cards the character chooses to draw. Prior variants capped the number of drawn cards at up to 4. I previously discussed the Origins of the Deck, which I may update in the future with the new 5E details.
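The draw mechanic described above can be sketched in a few lines. This is an illustration only, not the generator's code: the card names are placeholders, and which cards end the drawing or leave the deck is passed in as hypothetical parameters rather than taken from the rules.

```python
import random


def draw_cards(deck, declared, ends_drawing=(), removed_when_drawn=(), rng=None):
    """Draw `declared` cards one at a time, returning most cards to the deck.

    `ends_drawing` and `removed_when_drawn` are assumed inputs naming the
    cards that stop further draws or do not return to the deck.
    """
    rng = rng or random.Random()
    deck = list(deck)              # copy so the caller's deck is untouched
    drawn = []
    for _ in range(declared):      # no upper limit on `declared`
        if not deck:
            break
        card = rng.choice(deck)    # sequential draw from the current deck
        drawn.append(card)
        if card in removed_when_drawn:
            deck.remove(card)      # this card does not go back in
        if card in ends_drawing:
            break                  # a card effect stops the drawing
    return drawn
```

Because a drawn card normally stays in the deck, the same card can come up more than once in a single declared draw.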
I fixed a minor issue in the Labyrinth Lord treasure generation system. Specifically, the issue was not capping level for druids (from AEC) to a maximum of 14. The other spell casting classes have spell progressions to level 20. Due to the lack of a cap, occasionally the spell book creation process failed for druids.
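The fix amounts to clamping the level before looking up the spell progression. A minimal sketch, assuming a per-class cap table (the names below are made up for the example, and only the druid cap of 14 and the 20-level progressions come from the description above):

```python
# Spell progressions run to level 20 unless a class says otherwise.
DEFAULT_MAX_CASTER_LEVEL = 20
MAX_CASTER_LEVEL = {"druid": 14}  # AEC druids top out at level 14


def capped_level(char_class, level):
    """Clamp a caster's level to the range its spell progression covers."""
    cap = MAX_CASTER_LEVEL.get(char_class.lower(), DEFAULT_MAX_CASTER_LEVEL)
    return max(1, min(level, cap))
```

With the clamp in place, a level 18 druid is treated as level 14 for spell book creation instead of indexing past the end of the progression.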
Given the limited nature of 5E’s random treasure table, the book is far shorter than those for the other systems. Quite a few people like these on-demand PDFs to avoid rolling a pile of dice during ad-hoc sessions, so I rolled one out just for them.
Enjoy and Happy 2015.
I’ve expanded the original generator into an index page, Modern Business Names, over the last few days. Over the last 72 hours, I’ve added 23 specialized generators to the index and re-factored 3 existing utilities to include additional data.
Today, the additions were Professional, Scientific and Engineering services. The 9 new generators cover a variety of engineering, information technology, and other professional services that include over three hundred thousand unique names.
I have 20-30 more specialized entries to add. My goal is to have those done by the end of 2014 but I also started two new RPG specific projects I’d like to complete. One is an entirely new treasure generation system for general fantasy systems. The other is too nascent to discuss.
I hope you all have a great start to 2015.