Automated Monitoring of Security-Related Events in Urban Areas

Along with my colleagues Ivan Dimitrijević and Milan Lipovac, I recently took part in the international conference “Urban Security and Urban Development”, hosted by the Faculty of Architecture and the Faculty of Security Studies of the University of Belgrade. During the plenary session, we presented our pilot research “Automated monitoring of Security-Related Events in Urban Areas” and I will explain our main ideas and results briefly in the following text.

Introduction

As with any other analysis, relevant and timely data make security assessments better aligned with the actual security dynamics in a given zone of interest. When we think about urban security, what should come to mind is the interplay between architectural solutions and the social context in which they are embedded. To better understand this important interaction from a security perspective, we need a lot of so-called static data, i.e. data that changes relatively slowly, such as demographics, social structure, education level, etc. But to get a more complete picture and to better understand the crime patterns associated with a certain place, we also need quality insight into the real-time dynamics of crime events.

Data about crime events are highly dynamic, so collection cycles have to be short. Police organizations are naturally very good at collecting and organizing such data, but they most often do not have the capacity to provide a direct link to their databases for other stakeholders, especially for academics and researchers. There are some exceptions, but we should not be too optimistic about the possibility of getting daily updates on crime frequencies from the police, both because of their inherently closed nature and because of the complex and slow administrative procedures for obtaining such data.

Having that in mind, the following question naturally arises: can we find another way to correctly approximate spatial and temporal crime dynamics?

We argue that there is another way, and we are trying to build such a system and task it with non-stop monitoring of the situation. Please note that we are at the beginning of our journey, but we nevertheless think this is the best moment to present our ideas, because we are now in the best position to incorporate critiques into our project.

Selecting Data Sources

Since we knew that we were looking for current and reasonably detailed crime reports concerning personal and property security, exploring local news sources was an obvious choice. There we found most of what we needed, but in the form of a large quantity of unstructured text. So, in order to take full advantage of the selected data source, we had to find a way to deal with it automatically.

In other words, we needed to find a way to: (1) discriminate between crime and non-crime events in the news stories, (2) parse and structure the crime stories, extracting as much structured data as possible, and (3) store the relevant data in a database.

So we were now confronted with text-classification and text-mining tasks, which required us to do a little programming and, eventually, some statistics and machine learning.
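To make the three steps above a bit more concrete, here is a minimal skeleton of such a pipeline. The helper functions (`fetch_articles`, `is_crime_related`, `extract_fields`) are hypothetical placeholders for illustration only, not our actual implementation.

```python
# Hypothetical skeleton of the monitoring pipeline (illustrative, not production code).
import sqlite3

def fetch_articles(source_url):
    """Placeholder: download and return raw article texts from a news source."""
    return []

def is_crime_related(text):
    """Placeholder: step (1) - binary crime / non-crime classification."""
    return False

def extract_fields(text):
    """Placeholder: step (2) - pull out structured data (type, location, date...)."""
    return {"type": None, "location": None, "date": None}

def run_pipeline(source_url, db_path="events.db"):
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS events (type TEXT, location TEXT, date TEXT)")
    for text in fetch_articles(source_url):
        if is_crime_related(text):                              # step (1)
            fields = extract_fields(text)                       # step (2)
            conn.execute("INSERT INTO events VALUES (?, ?, ?)", # step (3)
                         (fields["type"], fields["location"], fields["date"]))
    conn.commit()
    conn.close()
```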

Working with Text Data

To give you some intuition about text classification, I will skip the technical details and use plain language. Suppose I give you a bunch of documents, each of which has a label attached; let's say there are five labels. You read the documents and hypothesize why each document was given a particular label based on its content (namely, its words). Finally, I present you with a new document and ask you to label it based on what you learned in the previous steps. I suppose you would make a correct guess in most cases. The real challenge, however, is to teach a machine to do the same task. That's where statistical learning (or machine learning, as it is called nowadays) comes into play. It's a fairly complex topic, so I'll just mention a few things relevant to our project.

A Few Relevant Machine Learning Concepts

The kind of learning I just described is called supervised learning, since you have been given example documents and you base your decision about new documents on those examples. When you need to discriminate between just two categories, for example between crime and non-crime events, you have a so-called binary classification problem. You need to restate this binary classification problem numerically, which means you have to represent the collected textual corpus with numbers and apply quantitative procedures so that the computer can learn something about those texts and make a decision.

You basically need to transform text into numbers and define a hypothesis function that applies some logic or model to unknown data and tries to guess its label. Then you need a so-called cost function (or error function) that compares these guesses with the real labels and calculates an error, in order to check the precision of the classification. The whole point is for the computer to use the presented data to establish a so-called decision boundary which splits the dataset into classes or categories; in the case of binary classification, two of them.
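As an illustration only (not our model), here is a toy numeric version of these ideas: documents reduced to a single made-up feature value, a threshold-based hypothesis function, and a simple error-rate cost function used to pick a decision boundary.

```python
# Toy illustration of hypothesis and cost functions for binary classification.
# The feature values and labels below are made up for demonstration purposes.
features = [2.0, 7.5, 1.1, 9.3, 4.8]      # e.g. one numerical score per document
labels   = [0,   1,   0,   1,   1]        # 1 = crime, 0 = non-crime

def hypothesis(x, boundary):
    """Guess the label: everything above the decision boundary is class 1."""
    return 1 if x >= boundary else 0

def cost(boundary):
    """Fraction of documents whose guessed label differs from the real one."""
    errors = sum(hypothesis(x, boundary) != y for x, y in zip(features, labels))
    return errors / len(labels)

# 'Learning' here is just picking the candidate boundary with the smallest error.
best = min([1.0, 3.0, 5.0, 7.0], key=cost)
print(best, cost(best))   # 3.0 0.0 on this toy data
```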


Our Approach to Text Classification

First of all, when you have a textual fragment and want to transform it into numerical form, you assign a numerical value, a so-called weight, to words. One thing you can do is count how frequently a word or term appears in a given document or corpus; this metric is called Term Frequency (TF). The problem is that many insignificant (stop) words like 'a', 'the', etc. carry minimal meaning but are very frequent. On the other hand, you can also measure how unique a word is, i.e. how infrequently it occurs across all documents; this is called Inverse Document Frequency (IDF), and it should point out more meaningful words. Finally, you can calculate the product of these two measures, called TF-IDF, and get one of the standard metrics of how much information a word provides.
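Here is a minimal hand-rolled sketch of these three measures on a toy English corpus (in practice one would typically reach for a library implementation such as scikit-learn's TfidfVectorizer, and the exact weighting formulas vary):

```python
import math
from collections import Counter

# Toy corpus: each "document" is just a list of lowercase tokens.
docs = [
    ["robbery", "in", "the", "city", "center"],
    ["new", "park", "opened", "in", "the", "city"],
    ["armed", "robbery", "reported", "near", "the", "station"],
]

def tf(term, doc):
    """Term Frequency: how often the term appears in one document."""
    return Counter(doc)[term] / len(doc)

def idf(term, corpus):
    """Inverse Document Frequency: rarer terms across documents get higher weight."""
    containing = sum(1 for d in corpus if term in d)
    return math.log(len(corpus) / containing) if containing else 0.0

def tf_idf(term, doc, corpus):
    return tf(term, doc) * idf(term, corpus)

print(tf_idf("the", docs[0], docs))       # 0.0 - 'the' appears in every document
print(tf_idf("robbery", docs[0], docs))   # higher weight - rarer and more informative
```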

So the main problem is to select the text features that are most indicative of a certain class of articles. Attributes that contribute more information will have a higher information-gain value and can be selected, whereas those that do not add much information will have a lower score and can be removed from our classification model. We can measure this with an entropy score: in this context, the entropy is minimal (equal to 0) if some term "j" appears in only one category, in which case the term probably has good discriminatory power for the categorization task. Conversely, the entropy is maximal if the term "j" appears in all categories with roughly the same frequency, which means it is not a good feature for representing the documents of a certain class.
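A small sketch of how such an entropy score can be computed per term is shown below; this is one illustrative formulation (entropy of the term's distribution over categories), not necessarily the exact measure we will end up using.

```python
import math

def term_entropy(counts_per_category):
    """Entropy of a term's distribution over categories.

    counts_per_category: how many times the term appears in each category.
    Returns 0 when the term occurs in a single category (good discriminator),
    and the maximum value log2(k) when it is spread evenly over k categories.
    """
    total = sum(counts_per_category)
    probs = [c / total for c in counts_per_category if c > 0]
    return -sum(p * math.log2(p) for p in probs)

print(term_entropy([12, 0, 0]))   # 0.0 -> appears only in one category
print(term_entropy([4, 4, 4]))    # ~1.585 = log2(3) -> evenly spread, weak feature
```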


A thing to note here is that, although statistics is of much help when selecting the most representative features from text, deep knowledge of the problem domain is also needed.

We made the assumption that a term which is characteristic of a category must appear in a greater number of documents belonging to that category than in other categories, and that it should also appear more frequently than unrelated terms. We then tried to empirically determine a set of representative words for the types of criminal activities we are interested in, namely robberies, attacks, and similar events.

The first thing we did was select news sources covering the Belgrade city region, which was the focus of our study. We investigated a few notable news sites, such as Blic, Kurir and Vecernje Novosti. While Blic and Kurir have the most visitors, and Blic has an extensive archive reaching all the way back to the year 2000, the content provided by Vecernje Novosti was the most detailed and best suited for our analysis.

So we politely scraped five years of Belgrade local news from the Vecernje Novosti website using a Python script and obtained a sample of 10,725 news articles. We then cleaned and transformed this data and saved it to local storage in order to build a sample of the relevant textual corpus.
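For illustration, a minimal "polite" scraping loop of this kind might look like the sketch below. The archive URL pattern and CSS selector are hypothetical placeholders, not the actual structure of the Vecernje Novosti site.

```python
import time
import requests
from bs4 import BeautifulSoup

# Hypothetical archive URL pattern - a placeholder for illustration only.
ARCHIVE_URL = "https://example.com/beograd/archive?page={page}"

def scrape_archive(pages, delay=2.0):
    articles = []
    for page in range(1, pages + 1):
        response = requests.get(ARCHIVE_URL.format(page=page), timeout=30)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, "html.parser")
        # Assume (hypothetically) each article body sits in a <div class="article-text">.
        for node in soup.select("div.article-text"):
            articles.append(node.get_text(strip=True))
        time.sleep(delay)  # be polite: pause between requests
    return articles
```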


After that, we began hunting for text features; this is, by the way, called the feature engineering or feature selection phase. But then we encountered our first problem: the number of relevant news articles was very low, just below 7 percent of the total sample. That's why we decided to scrape more data from other websites, covering a larger time span at the same time, in order to manually build our first set of representative words, which we could later improve.

Once we enriched our sample, we began to process it. First, we removed the so-called stop words, which are considered noise. Then we looked at term frequencies and similar measures. Finally, we used our domain knowledge and intuition. We came up with a dictionary of 513 more or less indicative words. This is still very rough, but in the future we plan to optimize the size and content of this set. For those of you who speak Serbian, some of the words are shown on the slide. Notice that we used only the fixed part of each word, the part that does not change with tense, gender, case, etc. We used the same technique for the detection of locations in the text, but this is rather primitive and we hope to soon incorporate much more detailed geocoding using OpenStreetMap data and a few other Python libraries.

We then developed a rudimentary scoring function as a proof of concept. The algorithm was very basic: if a significant word is found, raise the relevance score of the article by a certain amount. We then scored two groups of 200 articles from the Crime and Non-Crime classes that we had earlier classified manually, and compared the mean scores of these groups to check whether the scores are informative and usable for setting a decision boundary. It turned out that there is a notable difference between the Crime and Non-Crime class mean scores, but there was also some overlap. The Crime class mean relevance score was 14.05 points with a standard deviation of 5.11 points, while the Non-Crime class mean was 8.7, with a standard deviation of 3.54.

Based on these results, we wanted to see what would happen if we said that it is safe to classify an article with a score equal to or larger than 14.05 as a crime event, and any article with a score below 3.54 as a non-crime event. In that case, the area of reduced certainty lies between these two values, and we split this range with our decision boundary in the middle, at the value of 10.98.
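A simplified sketch of the whole scoring-and-thresholding idea follows. The prefixes and weights below are illustrative placeholders, not our actual 513-entry dictionary; only the decision boundary value is taken from the text above.

```python
# Simplified version of the dictionary-based scoring idea (illustrative weights).
INDICATIVE_PREFIXES = {
    "plja": 3,    # hypothetical stem for robbery-related words
    "napad": 3,   # ... attack
    "uhaps": 2,   # ... arrest
    "povred": 1,  # ... injury
}

DECISION_BOUNDARY = 10.98

def relevance_score(text):
    score = 0
    for word in text.lower().split():
        for prefix, weight in INDICATIVE_PREFIXES.items():
            if word.startswith(prefix):
                score += weight
                break  # count each word at most once
    return score

def classify(text):
    return "crime" if relevance_score(text) >= DECISION_BOUNDARY else "non-crime"
```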

With these settings, our primitive and certainly somewhat overfitted test classifier was able to correctly classify 72.5% of the cases from our sample corpus. We also noticed that a significant number of the misclassified articles were either short crime texts, where there are fewer words to score and so the overall score cannot grow large enough to pass the decision boundary, or reports of emergency situations and traffic accidents, which are described with much of the same vocabulary as violent attacks. So we could say that this is a kind of proof-of-concept result, and in future iterations we plan to improve our model significantly.

Demo Visualizations

[Figure: Collected Data Sample]
[Figure: Relevant events excerpt]
[Figure: Distribution of sampled articles by municipality of Belgrade]
[Figure: Interactive map (still picture)]
[Figure: Events Heatmap]

Coding Challenge: Simple AI for Tic-Tac-Toe Game

This example relies heavily on the content of an excellent two-part course by Rice University on Coursera, “Principles of Computing with Python”, which I took a few years ago. This time, however, I used JavaScript and built the solution from the ground up, without provided classes or any kind of help. Additionally, I experimented with a few variants of the AI in order to check the effectiveness of the proposed solution.

The main problem here is making the computer player choose rational moves in a game of Tic-Tac-Toe. There are many solutions available on the internet (e.g. http://neverstopbuilding.com/minimax), but the one I chose was a simple yet pretty effective implementation of the Monte Carlo method.

Since this was a simple exercise developed in a short time, there are many things that could be perfected, debugged, refactored, etc. Also, I've only allowed the human player to play first. But I wasn't developing a game; rather, I was exploring the effectiveness of the proposed decision-making algorithm, and in order to do that I needed a reasonably decent user interface.

Basic technical stuff

First of all, the board is not represented as a two-dimensional array but as a one-dimensional one. The reason is simple: one-dimensional arrays are easier to deal with, and CSS enabled me to display the board as a 3×3 square matrix indexed from 0 to 8:

0 | 1 | 2
3 | 4 | 5
6 | 7 | 8

Then, eight winning patterns were defined (three columns + three rows + two diagonals): 0-3-6, 1-4-7, 2-5-8, 0-1-2, 3-4-5, 6-7-8, 0-4-8 and finally 6-4-2. As I said, the human player always goes first, and their moves are represented as H, while the computer player's moves are C. This makes the game sound more hardcore than X vs O 😉
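The actual project is written in JavaScript; just to make the board representation concrete, here is an equivalent sketch in Python of the flat board, the eight winning patterns, and a simple winner check.

```python
# Flat 3x3 board indexed 0-8; 'H' = human, 'C' = computer, None = empty field.
WIN_PATTERNS = [
    (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
    (0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
    (0, 4, 8), (6, 4, 2),              # diagonals
]

def winner(board):
    """Return 'H' or 'C' if a player completes a winning pattern, otherwise None."""
    for a, b, c in WIN_PATTERNS:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None

board = ['H', 'C', None,
         'H', 'C', None,
         'H', None, None]
print(winner(board))  # 'H' - the first column (0-3-6) is complete
```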

The AI Algorithm

I implemented three variants of the computer player's decision-making algorithm. First, there is the Naive Random mode, which just serves to show what an ineffective, or plainly stupid, AI looks like. Its decisions are based on a random selection of one of the available moves. And as expected, it sure is stupid, as you can see from the example below.

[Figure: Naive Random Mode]

Next, there is the main driver behind the AI, the so-called Smart Mode. This mode is based on what could be loosely described as the Monte Carlo method. In this mode, the computer player is given a board state. It then saves this state and continues to play from that point on, making random moves for both the human and the computer player until the game is over.

In other words, the computer simulates a random outcome of the game for a given board state. When the game is over, the result is taken into account: if it is a win, the winner's moves are scored with a +1 weight factor, while the loser's moves are scored with a -1 weight factor. In case of a tie, the game is simply ignored. The point is that the AI now has information about the most useful moves, the ones most strongly associated with a winning outcome, so that it can choose the best one in a given context. These scores are kept in a separate table representation that holds a score for every field of the board.
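Here is a rough Python sketch of that idea (the real implementation is in JavaScript and differs in details), reusing the winner() helper from the previous sketch: run many random playouts from the current board, add +1 to the winner's fields and -1 to the loser's fields in a score table, and finally pick the empty field with the highest score.

```python
import random

EMPTY = None

def legal_moves(board):
    return [i for i, field in enumerate(board) if field is EMPTY]

def random_playout(board, player):
    """Play random moves for both sides until the game ends; return winner and moves made."""
    board = list(board)
    moves = {'H': [], 'C': []}
    while legal_moves(board) and winner(board) is None:
        move = random.choice(legal_moves(board))
        board[move] = player
        moves[player].append(move)
        player = 'H' if player == 'C' else 'C'
    return winner(board), moves

def best_move(board, trials=1000):
    scores = [0] * 9                      # one score per board field
    for _ in range(trials):
        result, moves = random_playout(board, 'C')
        if result is None:                # tie: ignore the game
            continue
        loser = 'H' if result == 'C' else 'C'
        for move in moves[result]:
            scores[move] += 1             # winner's moves rewarded
        for move in moves[loser]:
            scores[move] -= 1             # loser's moves penalised
    return max(legal_moves(board), key=lambda i: scores[i])
```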

With this mode of decision making, the computer player is significantly better and smarter. That is to say, the AI takes its winning opportunities well and blocks the opponent relatively consistently (the algorithm is not 'fatalistic' like the one in the link above, and I would say it is less computationally intensive). Still, it makes obvious mistakes from time to time. During testing I discovered that Smart Mode is particularly vulnerable to 'second column' or 'second row' win patterns by the human player, as you can see in the picture below.

[Figure: Smart Mode]

Since Smart Mode wasn't smart enough, I tried to improve its effectiveness with what I call Adaptive Mode: the mode in which the real human player's winning moves are scored as well as the simulated ones. I thought it would be interesting to try to improve or correct the simulated moves' scores with real, intelligent user moves, so I gave human moves more weight in scoring. The goal is to give the AI an opportunity to learn from previous mistakes. The results are promising, although, since you should not give too much weight to human moves (in order to avoid seriously biasing the results of the Monte Carlo simulation), Adaptive Mode shows its advantages only over time.
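A sketch of the adaptive twist, again illustrative rather than the actual JavaScript code: after a real game that the human wins, the human's moves update the same score table as the simulated ones, only with a larger (hypothetical) weight factor, so the AI starts favouring, and thus blocking, those fields.

```python
HUMAN_WIN_WEIGHT = 3   # hypothetical: real winning moves count more than simulated ones

def learn_from_human_win(scores, human_moves):
    """After the human wins a real game, reward their moves in the score table
    so the AI favours (and therefore blocks) those fields in future games."""
    for move in human_moves:
        scores[move] += HUMAN_WIN_WEIGHT
    return scores
```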

As I have noticed, Smart Mode is fairly consistent in terms of the moves it makes; it does not vary much. Adaptive Mode, on the contrary, slowly corrects itself and gradually converges on better moves. In the example below you can see how the Adaptive Mode's move is at first exactly the same as the Smart Mode's, since no human moves have been accounted for yet. But over time, as it gains experience, it corrects its playing strategy and blocks the human from winning the same way as before.

The trick here is to experiment with the weight factors in order to get optimal AI behavior.

[Figure: Adaptive Mode]

For those interested, you can pick up the source code (HTML/CSS/JS) on my GitHub at https://github.com/zkucekovic/Monte-Carlo-Tic-Tac-Toe. As always, your feedback is more than welcome!


Graphing Political Proximity

Network representations and the exploration of political and social structures are among the topics I am highly interested in. Here are some samples of my work.

1. Proximity indicators in public political statements

[Figure: Political statements transformed into a graph]

Project description

The purpose of this mini-project was to compile a dataset from local media sources, analyze the content of the scraped news articles, and code the statements of various political parties' representatives about the possibility of mutual cooperation. Statements were classified into three categories: positive, negative and neutral. The time span of the study was two months.

The data was then transformed and converted into a network using the NetworkX Python module. Finally, the graph was stylized using Gephi.

The numerical representations of proximity (edge weights) used to build the graph are listed below; a short sketch of how such data can be turned into a NetworkX graph follows the list:

ldk {'vv': -1, 'pdk': -11, 'aak': -3, 'nisma': -1, 'akr': 2, 'alternativa': 2}

akr {'pdk': -2, 'aak': -2, 'nisma': -2, 'vv': 2, 'alternativa': 1}

vv {'pdk': -2, 'ldk': 1, 'akr': 1, 'sl': -1, 'nisma': 1}

aak {'pdk': 0, 'nisma': 2, 'ldk': -1, 'vv': 1, 'sl': -1}

alternativa {'pdk': -2, 'aak': -1, 'nisma': -1}

pdk {'ldk': -3, 'nisma': 4, 'aak': 3, 'vv': -1}

nisma {'aak': 3, 'pdk': 0}

sl {'pdk': 1, 'ldk': 1, 'sl': -2}
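As an illustration of the graph-building step, here is a minimal sketch of how a dictionary like the one above can be turned into a directed weighted graph with NetworkX and exported for styling in Gephi (only an excerpt of the data is reproduced here).

```python
import networkx as nx

# Proximity indicators coded from public statements (excerpt of the data above).
proximity = {
    "ldk": {"vv": -1, "pdk": -11, "aak": -3, "nisma": -1, "akr": 2, "alternativa": 2},
    "akr": {"pdk": -2, "aak": -2, "nisma": -2, "vv": 2, "alternativa": 1},
    "pdk": {"ldk": -3, "nisma": 4, "aak": 3, "vv": -1},
}

G = nx.DiGraph()
for source, targets in proximity.items():
    for target, weight in targets.items():
        G.add_edge(source, target, weight=weight)

# Node size ~ outdegree (statements given), node colour ~ indegree (statements received).
print(G.out_degree("ldk"), G.in_degree("pdk"))

# Export to GEXF so the graph can be styled in Gephi.
nx.write_gexf(G, "statements_graph.gexf")
```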

The final graph depicts the structure of political relations based on the proximity of the parties. Darker edges mean higher weights, while the numerical edge-weight labels are colored from red (lower weight) to green (higher weight). The size of a node represents its outdegree (the bigger the node, the more statements the party gave), and the color of a node represents its indegree (the darker the color, the more statements were made about the party).

Results interpretation

The available data suggests that the political scene in Pristina is notably divided into two blocks, which generally corresponds to the pre-electoral coalitions. In case the PDK-AAK-NISMA coalition fails to form a government, the graph suggests a possible alignment between Vetëvendosje and some of the parties from the LDK-AKR-Alternativa coalition, since they expressed more positive statements about each other and their proximity index is thus higher. The data also shows that the Serbian List (SL) is facing cohesion problems, since its leaders issued negative statements about each other. Of course, predictions of post-electoral dynamics based only on these indicators are not very useful, since public statements should not be interpreted as a reliable clue to real political intentions; rather, they are just general contextual features of the current political setting.

2. Proximity indicators in voting patterns

[Figure: Social Network Model of the Kosovo Assembly (2014-2016)]
[Figure: Investigating proximity within a parliamentary group]

Project description

Considering that cohesion is one of the most important measures in the exploration of intraparty and coalition dynamics, there is a need to model political proximity relations as accurately as possible. The graph shown above is based on "roll-call votes" (RCV), i.e. votes of MPs recorded by the electronic voting system and later published on the website of the Kosovo Assembly. Even though these recordings comprise the largest number of voting activities of Kosovo MPs, it should be noted that the electronic system does not cover all voting procedures of the Kosovo Assembly, primarily because MPs sometimes use voting methods other than the electronic system (secret ballot, for example). However, these situations are not very frequent in the Kosovo Assembly, which makes RCV a reliable source for research on the voting behaviour of Kosovo MPs. The time span of the research is December 2014 to December 2016, i.e. the first two years of Isa Mustafa's Government.

The sampled RCV records included five different types of data for every bill or motion voted upon in the Kosovo Assembly: the date of the voting procedure, the name of the MP, her/his party affiliation, whether she/he was present at the voting, and her/his vote in three categories ("yes", "no", "abstain"). This data was later transformed into a "similarity matrix" which was used as the basis for the graph (again, using the NetworkX module and Gephi).
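To make the "similarity matrix" step more concrete, here is a hedged sketch: for every pair of MPs, count the share of bills on which both were present and cast the same vote, and use that share as the edge weight. The MP names and votes are made up, and the actual coding scheme used in the research may differ in details.

```python
import itertools
import networkx as nx

# Toy RCV records: {MP name: {bill id: vote}}, vote in {"yes", "no", "abstain"} or None if absent.
votes = {
    "mp_a": {"bill1": "yes", "bill2": "no",  "bill3": "yes"},
    "mp_b": {"bill1": "yes", "bill2": "no",  "bill3": None},
    "mp_c": {"bill1": "no",  "bill2": "yes", "bill3": "abstain"},
}

def agreement(mp1, mp2):
    """Share of bills where both MPs were present and voted the same way."""
    common = [b for b in votes[mp1] if votes[mp1][b] and votes[mp2].get(b)]
    if not common:
        return 0.0
    same = sum(votes[mp1][b] == votes[mp2][b] for b in common)
    return same / len(common)

G = nx.Graph()
for mp1, mp2 in itertools.combinations(votes, 2):
    G.add_edge(mp1, mp2, weight=agreement(mp1, mp2))

print(G["mp_a"]["mp_b"]["weight"])  # 1.0 - identical votes on the two shared bills
```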

Interpretation

In summary, the voting behavior of Kosovo MPs reflected the political dynamics of their parties, which could be seen in many examples of major political events. Two measures of cohesion (a kind of political proximity metric) of the parliamentary groups were used in the research, and one of them (the voting agreement index) outperformed the other (a social-network modularity measure). Including datetime codes in the dataset enabled longitudinal tracking of changes in parliamentary groups' cohesion, which most of the time faithfully reflected the actual political dynamics. Consequently, the graph above corresponds well with the actual structure of the Parliament at the time, and the division between the opposition (the part of the graph above the red line) and the ruling parties (below the line) is clearly visible even at the highest level of aggregation.