3. twied.multiind — Multi-Indicator Location Inference

3.1. Location Inference Thread

class twied.multiind.inference.Indicators[source]

Simple enum which stores values for each of the Indicators.

class twied.multiind.inference.InferThread(dbcol, config, inf_id=None, proc_id='default', test=False, tweetfunc=None, tweetint=1000)[source]

This class manages the location inference of users by creating and managing multiple threads of indicators. The class sets up a number of threads to infer tweets simultaneously. Each of these threads in turn sets up its own threads to contact each of the indicators.

This class takes a MongoDB collection from which the tweets are retrieved, and a config object which contains all of the configuration for the indicators. The class sets up all of the indicators in the package:

  • MessageIndicator - uses the message field to find toponyms.
  • TZIndicator - uses the timezone the user is in.
  • TZOffsetIndicator - uses the timezone offset the user is in.
  • LocFieldIndicator - uses toponyms in the user's location field.
  • CoordinateIndicator - finds coordinates in the user's location field.
  • WebsiteIndicator - uses the TLD of the user's website address.
  • GeotagIndicator - uses the geotag on the user's tweet.

Each of these indicators is contacted to return estimates of the user's location. These estimates are then ‘stacked’ to determine the area where the weight is highest. This class writes the inferred polygon and other information back onto the tweet object in MongoDB.

Note

This class is not multi-core. Using multiple cores has to be set up manually, or by running multiple processes of this class that point at different sections of the data.

Initialise the MI location inference.

Parameters:
  • dbcol – The MongoDB collection to recover tweets from.
  • config – The configparser object containing the configuration.
  • inf_id – The inference ID. Has no impact on the inference but is stored in the database alongside the inferred location to act as a tag for the inference task which inferred the location.
  • proc_id – The process name, also stored alongside the inferred location.
  • test (bool) – If True will not use the GeotagIndicator.
  • tweetfunc – A function called after every tweetint inferred tweets; it is passed a string describing the current inference status.
  • tweetint – The number of tweets before each call of the tweetfunc.
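As a rough illustration of how tweetfunc and tweetint interact, the loop below calls a callback once every tweetint processed tweets. It is a minimal sketch, not InferThread's code: the helper run_with_progress, the stand-in process function and the status message format are all hypothetical.

```python
from typing import Callable, Iterable, Optional


def run_with_progress(tweets: Iterable[dict],
                      process: Callable[[dict], object],
                      tweetfunc: Optional[Callable[[str], None]] = None,
                      tweetint: int = 1000) -> int:
    """Process each tweet, reporting status every `tweetint` tweets (hypothetical helper)."""
    count = 0
    for twt in tweets:
        process(twt)
        count += 1
        # Report on exact multiples of tweetint only.
        if tweetfunc is not None and count % tweetint == 0:
            tweetfunc(f"Inferred {count} tweets")  # message format is invented
    return count


# Usage: collect the status strings instead of sending them anywhere.
messages = []
total = run_with_progress(({"id": i} for i in range(2500)), lambda t: None,
                          tweetfunc=messages.append, tweetint=1000)
```

With 2500 tweets and tweetint=1000 the callback fires twice, after the 1000th and 2000th tweet.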
add_ind(task)[source]

Processes a task. The task is a tuple of an Indicator and a field. This method is used in parallel.

Parameters:task – The tuple of the Indicator and the field.
Returns:The result of passing the field through the indicator.
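A minimal sketch of fanning (indicator, field) tasks out in parallel with the standard-library ThreadPoolExecutor. It assumes, for illustration only, that add_ind simply calls the indicator's get_loc method on the field; EchoIndicator is a hypothetical stand-in for a real indicator.

```python
from concurrent.futures import ThreadPoolExecutor


class EchoIndicator:
    """Hypothetical stand-in exposing the get_loc(field) method the indicators share."""

    def get_loc(self, field):
        return [(field.upper(), 1.0)]


def add_ind(task):
    # Unpack the (indicator, field) tuple and pass the field through the indicator.
    indicator, field = task
    return indicator.get_loc(field)


tasks = [(EchoIndicator(), "london"), (EchoIndicator(), "paris")]
with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
    # Executor.map returns results in task order, not completion order.
    results = list(pool.map(add_ind, tasks))
```

Threads are a reasonable fit when the indicators spend most of their time waiting on web services or databases; in CPython they do not speed up CPU-bound work.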
infer(query, field='locinf.mi')[source]

Starts the location inference task.

Parameters:
  • query – The query used to select tweets from the MongoDB.
  • field – The name of the field to write the inferred location to. For example with the field ‘locinf.mi’ (as default), the final polygon would be written to ‘locinf.mi.poly’.
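The field argument behaves like MongoDB dot notation: inside a $set update, a dotted key such as locinf.mi.poly writes into the nested document. The sketch below only builds such an update document; the helper name build_update and the extra inf_id key are hypothetical, and no database call is made.

```python
def build_update(field, poly, extra=None):
    """Build a MongoDB $set document that writes `poly` to `<field>.poly` (hypothetical helper)."""
    update = {f"{field}.poly": poly}
    # Any extra values are written alongside the polygon under the same prefix.
    for key, value in (extra or {}).items():
        update[f"{field}.{key}"] = value
    return {"$set": update}


poly = [(51.5, -0.1), (51.6, -0.1), (51.6, 0.0)]
doc = build_update("locinf.mi", poly, {"inf_id": "run1"})
# doc == {"$set": {"locinf.mi.poly": poly, "locinf.mi.inf_id": "run1"}}
```

With PyMongo, a document like this could be passed as the update argument of Collection.update_one together with a filter on the tweet's _id.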
process_tweet(twt)[source]

Process a single tweet.

Parameters:twt – The tweet object.
Returns:Tuple of the inferred polygons, the number of indicators with a result, the number of polygons returned, the maximum ‘height’ of the stacked polygons, and the tweet object.

3.2. Polygon Stacker

twied.multiind.polystacker.coord2grid(p, scale, offset)[source]

Translates a (lat, long) coordinate into a point on the grid.

Parameters:
  • p – The (lat, long) coordinate to translate
  • scale – Scale of the grid
  • offset – Offset of the grid
Returns:

Grid coordinate the point is related to.

twied.multiind.polystacker.find_bounds(coords, border)[source]

Finds the top left and bottom right coordinates of the grid which contains the set of points.

Parameters:
  • coords – Set of coordinates to find the area around
  • border – Padding (in degrees) around the bounding area to return
Returns:

Top left coordinate, bottom right coordinate, coordinate area tuple
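A minimal bounding-box sketch. It assumes points are (x, y) = (longitude, latitude) pairs, matching the plot_area defaults p1=(-180, 90) and p2=(180, -90), and it interprets the third return value as the (width, height) of the padded box; both are assumptions, not documented behaviour.

```python
def find_bounds(coords, border):
    """Return top-left, bottom-right and (width, height) of the padded bounding box (sketch)."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    top_left = (min(xs) - border, max(ys) + border)
    bottom_right = (max(xs) + border, min(ys) - border)
    size = (bottom_right[0] - top_left[0], top_left[1] - bottom_right[1])
    return top_left, bottom_right, size


# London and Paris as (lon, lat), padded by one degree.
tl, br, size = find_bounds([(-0.1, 51.5), (2.35, 48.85)], border=1.0)
```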

twied.multiind.polystacker.generate_polygon(coords, scale)[source]

Generate a polygon that encompasses a set of grid coordinates.

Parameters:
  • coords – List of coordinates to draw polygon around
  • scale – Resolution of the grid in points per degree; each grid square spans 1/scale degrees
Returns:

Polygon of area representing the set of points

twied.multiind.polystacker.get_highest(mask, scale, offset)[source]

Returns a list of coordinates that have the largest weight in a grid.

Parameters:
  • mask – A numpy matrix of the weight for each grid point
  • scale – Scaling used to generate the grid
  • offset – Offset used to generate the grid
Returns:

Array of points of the highest weight, maximum value in the grid
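The sketch below finds every cell holding the maximum weight and converts it back to world coordinates. It uses plain nested lists so it runs without NumPy (with a NumPy array, np.argwhere(mask == mask.max()) gives the same cell indices). The cell-to-coordinate mapping, with offset as the top-left corner and rows running south, is an assumption.

```python
def get_highest(mask, scale, offset):
    """Return world coordinates of all max-weight cells, plus the max weight (sketch)."""
    peak = max(max(row) for row in mask)
    points = []
    for r, row in enumerate(mask):
        for c, value in enumerate(row):
            if value == peak:
                # Assumed mapping: columns run east from offset[0], rows run south from offset[1].
                points.append((offset[0] + c / scale, offset[1] - r / scale))
    return points, peak


points, peak = get_highest([[0, 1, 0], [2, 0, 2]], scale=1.0, offset=(-180, 90))
# points == [(-180.0, 89.0), (-178.0, 89.0)], peak == 2
```

Returning every tied cell, rather than a single argmax, is what lets the final polygon contain several separate contours.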

twied.multiind.polystacker.grid2coord(p, scale, offset)[source]

Translates a grid point to a (lat, long) coordinate.

Parameters:
  • p – Point on the grid
  • scale – Scale of the grid
  • offset – Offset of the grid
Returns:

World coordinate the point is related to.
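coord2grid and grid2coord are inverses up to the grid resolution. A minimal sketch, assuming scale is points per degree, offset is the grid's top-left corner and points are (x, y) = (longitude, latitude); the package's own axis order and rounding may differ.

```python
def coord2grid(p, scale, offset):
    """Map a world (x, y) point to the nearest (column, row) grid cell (assumed conventions)."""
    x, y = p
    return round((x - offset[0]) * scale), round((offset[1] - y) * scale)


def grid2coord(p, scale, offset):
    """Map a (column, row) grid cell back to a world (x, y) point."""
    col, row = p
    return offset[0] + col / scale, offset[1] - row / scale


offset = (-180, 90)                      # top-left corner of a whole-world grid
cell = coord2grid((-0.12, 51.5), 10, offset)
back = grid2coord(cell, 10, offset)
# cell == (1799, 385); back is approximately (-0.1, 51.5)
```

Because coord2grid rounds to the nearest cell, a round trip can move a point by up to 0.5/scale degrees along each axis.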

twied.multiind.polystacker.infer_location(polys, demo=False)[source]

Stacks a list of weighted polygons and returns the area with the highest weight in the form of a polygon. This polygon may contain multiple contours.

Parameters:
  • polys – Array of indicators with each having an array of (polygon coords, list)
  • demo – Whether to plot diagrams (default=False)
Returns:

Polygon of the highest stacked area from the polygons

twied.multiind.polystacker.plot_area(polys, scale=1.0, p1=(-180, 90), p2=(180, -90))[source]

Plot an array of weighted polygons on a grid and return the weighted grid.

Parameters:
  • polys – Polygons to plot
  • scale – The resolution of the grid - scale is the number of points per degree
  • p1 – Top left coordinate of the grid to generate
  • p2 – Bottom right coordinate of the grid to generate
Returns:

Numpy matrix grid, scale of grid, offset of grid
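The stacking idea behind plot_area and infer_location can be sketched in pure Python: rasterise each weighted polygon onto a grid by testing every cell centre with a ray-casting point-in-polygon check, and add the polygon's weight to every cell it covers. This is a brute-force illustration under assumed conventions ((polygon, weight) pairs, (x, y) = (longitude, latitude) vertices, p1 top-left, p2 bottom-right), not the package's implementation; for large grids a vectorised test such as matplotlib.path.Path.contains_points would be much faster.

```python
def point_in_polygon(x, y, poly):
    """Ray-casting test: True if (x, y) lies inside the polygon `poly`."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # An edge counts if it straddles the horizontal line through y
        # and crosses it to the right of x.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def plot_area(polys, scale=1.0, p1=(-180, 90), p2=(180, -90)):
    """Stack (polygon, weight) pairs onto a grid; returns (grid, scale, offset)."""
    cols = int((p2[0] - p1[0]) * scale)
    rows = int((p1[1] - p2[1]) * scale)
    grid = [[0.0] * cols for _ in range(rows)]
    for poly, weight in polys:
        for r in range(rows):
            y = p1[1] - (r + 0.5) / scale  # centre of row r, counted south from p1
            for c in range(cols):
                x = p1[0] + (c + 0.5) / scale  # centre of column c, counted east from p1
                if point_in_polygon(x, y, poly):
                    grid[r][c] += weight
    return grid, scale, p1


# Two overlapping squares; the overlap receives 1.0 + 2.0 = 3.0.
polys = [([(0, 0), (3, 0), (3, 3), (0, 3)], 1.0),
         ([(1, 1), (4, 1), (4, 4), (1, 4)], 2.0)]
grid, scale, offset = plot_area(polys, scale=1.0, p1=(0, 4), p2=(4, 0))
# grid == [[0, 2, 2, 2], [1, 3, 3, 2], [1, 3, 3, 2], [1, 1, 1, 0]]
```

The cells holding the maximum (here the 2x2 block of 3.0) are the area an infer_location-style step would turn back into a polygon.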

3.3. Indicators

class twied.multiind.indicators.indicator.Indicator[source]

Base class for an indicator.

get_weight(belief, overloadw=None)[source]

Gets the weight of a polygon based on a belief for the polygon and the weight of this indicator.

Parameters:
  • belief (float) – Value in [0, 1] of how confident the estimation is.
  • overloadw (float) – (optional) If None will use normal weight, otherwise can override the weight of the indicator.
Returns:

The weight of the polygon

Return type:

float
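A minimal sketch of get_weight, assuming the weight is simply the belief multiplied by the indicator's weight (or by overloadw when it is given). The multiplicative rule and the weight attribute name are assumptions; the documentation above does not state the formula.

```python
class Indicator:
    """Minimal stand-in for the Indicator base class (sketch)."""

    def __init__(self, weight=1.0):
        self.weight = weight  # hypothetical attribute holding the indicator's weight

    def get_weight(self, belief, overloadw=None):
        if not 0.0 <= belief <= 1.0:
            raise ValueError("belief must be in [0, 1]")
        # Use the override when given, otherwise the indicator's own weight.
        w = self.weight if overloadw is None else overloadw
        return belief * w  # assumed multiplicative combination


ind = Indicator(weight=0.8)
w1 = ind.get_weight(0.5)
w2 = ind.get_weight(0.5, overloadw=2.0)
```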

point_to_poly(point, belief, overloadw=None)[source]

Translates a (lat, lon) point into a circular polygon of 0.1 degree radius.
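A sketch of turning a point into an approximately circular polygon with a 0.1 degree radius. The vertex count of 16 is an arbitrary choice, and this version returns only the vertices; the real method also takes belief and overloadw, which the sketch leaves out. The circle is drawn in degree space, so on the ground it is an ellipse that narrows east-west away from the equator.

```python
import math


def point_to_poly(point, radius=0.1, n_vertices=16):
    """Return n_vertices (lat, lon) points on a circle of `radius` degrees around `point`."""
    lat, lon = point
    poly = []
    for k in range(n_vertices):
        angle = 2 * math.pi * k / n_vertices
        poly.append((lat + radius * math.sin(angle), lon + radius * math.cos(angle)))
    return poly


poly = point_to_poly((51.5, -0.12))
```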

Parameters:
  • point (tuple) – The (lat, lon) point to get a polygon around.
  • belief (float) – Value in [0, 1] of how confident the estimation is.
  • overloadw (float) – (optional) If None will use normal weight, otherwise can override the weight of the indicator.
Returns:

class twied.multiind.indicators.coordinateindicator.CoordinateIndicator(config)[source]

Indicator for users with coordinates in their location field.

Initialise the indicator.

Parameters:config – The config object for the MI technique.
get_loc(string)[source]

Returns a point polygon if the user's location field contains coordinates.

Parameters:string – The user's location field.
Returns:Array of polygons.
regex = '(-?\\d{1,2}\\.\\d{6})\\s?,\\s?(-?\\d{1,2}\\.\\d{6})'
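The regex attribute above can be exercised directly with Python's re module. The wrapper find_coords is hypothetical, and it assumes the two captured numbers are (lat, lon). As written, each number may have at most two digits before the decimal point and exactly six after it, so a longitude such as -122.419416 does not match.

```python
import re

# The pattern from the regex attribute, written as a raw string.
COORD_RE = re.compile(r'(-?\d{1,2}\.\d{6})\s?,\s?(-?\d{1,2}\.\d{6})')


def find_coords(location):
    """Return the first (lat, lon) pair found in `location`, or None (hypothetical helper)."""
    match = COORD_RE.search(location)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


print(find_coords("London 51.507351, -0.127758"))  # (51.507351, -0.127758)
print(find_coords("37.774929, -122.419416"))       # None
```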
class twied.multiind.indicators.geotagindicator.GeotagIndicator(config)[source]

Indicator which returns a single point if the tweet has a geotag on it.

Initialise the indicator.

Parameters:config – The config object for the MI technique.
get_loc(geofield)[source]

Returns a polygon around the location in the geotag if it exists.

Parameters:geofield – The user's ‘geotag’ field on the tweet object.
Returns:Array of single polygon around the coordinate of the user.
class twied.multiind.indicators.gislocfieldindicator.GISLocFieldIndicator(config)[source]

Indicator which finds toponyms in the location field and maps them to an area or point using the Gisgraphy gazetteer.

This is a reimplementation of the LocFieldIndicator class which uses the Gisgraphy (http://gisgraphy.com/) open source gazetteer instead of the Geonames (http://www.geonames.org/) gazetteer. In testing, this service was found to have a less useful search feature; however, it was not API-limited, as the service could be hosted locally.

Initialise the indicator.

Parameters:config – The config object for the MI technique.
get_loc(location)[source]
get_polys(location, gadmpoly)[source]
exception twied.multiind.indicators.gislocfieldindicator.GisgraphyException(value)[source]

Exception object thrown when there is an error in getting the location from the Gisgraphy service.

exception twied.multiind.indicators.locfieldindicator.GeonamesException(value)[source]
class twied.multiind.indicators.locfieldindicator.LocFieldIndicator(config)[source]

Indicator which finds toponyms in the location field and maps them to an area or point using the Geonames gazetteer.

get_loc(location)[source]
class twied.multiind.indicators.messageindicator.MessageIndicator(config)[source]

Indicator which finds place names in tweet text using DBpedia Spotlight and maps them to an area or point location.

get_loc(message)[source]
class twied.multiind.indicators.tzindicator.TZIndicator(config)[source]

Indicator which gets an area for the timezone the user is in.

get_loc(tz)[source]

3.4. Interfaces

class twied.multiind.interfaces.dbinterfaces.CountryPolyInterface(dbloc)[source]

Interface with the Country polygon database table.

Initialise the database interface.

Parameters:dbloc – The location of the sqlite3 database.
destroy()[source]

Close the connection.

get_polys(name, weight)[source]

Gets the polygons for the name of a country.

Parameters:
  • name – The name of the area.
  • weight – The weight of the returned polygons.
Returns:

The array of polygons and weights.
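A minimal sketch of this database-interface pattern using the standard sqlite3 module. The table name countries, its name and polygon columns, and storing each polygon as JSON text are all hypothetical, since the schema is not documented here; the in-memory database keeps the example self-contained.

```python
import json
import sqlite3


class CountryPolyInterface:
    """Sketch of a sqlite3-backed polygon lookup (hypothetical schema)."""

    def __init__(self, dbloc):
        self.conn = sqlite3.connect(dbloc)

    def get_polys(self, name, weight):
        # Parameterised query; every matching polygon is paired with the same weight.
        cur = self.conn.execute("SELECT polygon FROM countries WHERE name = ?", (name,))
        return [(json.loads(row[0]), weight) for row in cur.fetchall()]

    def destroy(self):
        self.conn.close()


iface = CountryPolyInterface(":memory:")
iface.conn.execute("CREATE TABLE countries (name TEXT, polygon TEXT)")
iface.conn.execute("INSERT INTO countries VALUES (?, ?)",
                   ("Iceland", json.dumps([[-24.5, 63.3], [-13.5, 63.3], [-13.5, 66.6]])))
result = iface.get_polys("Iceland", 0.5)
iface.destroy()
```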

class twied.multiind.interfaces.dbinterfaces.GADMPolyInterface(dbloc)[source]

Interface with the GADMPoly database table.

Initialise the database interface.

Parameters:dbloc – The location of the sqlite3 database.
destroy()[source]

Close the connection.

get_polys(name, weight)[source]

Gets the polygons for the name of an administrative district.

Parameters:
  • name – The name of the area.
  • weight – The weight of the returned polygons.
Returns:

The array of polygons and weights.

class twied.multiind.interfaces.dbinterfaces.TZPolyInterface(dbloc)[source]

Interface with the Timezone polygon database table.

Initialise the database interface.

Parameters:dbloc – The location of the sqlite3 database.
destroy()[source]

Close the connection.

get_polys(name, weight)[source]

Gets the polygons for a timezone.

Parameters:
  • name – The name of the area.
  • weight – The weight of the returned polygons.
Returns:

The array of polygons and weights.

get_polys_america(code, weight)[source]

Gets the polygons for American timezones.

Parameters:
  • code – The American code string.
  • weight – The weight of the returned polygons.
Returns:

The array of polygons and weights.

get_polys_offset(offset, weight)[source]

Gets the polygons for a timezone using the offset value.

Parameters:
  • offset – The offset of the timezone.
  • weight – The weight of the returned polygons.
Returns:

The array of polygons and weights.

twied.multiind.interfaces.dbinterfaces.proc_polystr(polys, weight)[source]

Process a string of polygons into an array of polygons with a weight.

Parameters:
  • polys – The string of the polygon data.
  • weight – The weight of the polygons.
Returns:

Array of tuples of (polygon, weight).

class twied.multiind.interfaces.webinterfaces.DBPInterface[source]

Interface for access to the DBpedia API.

destroy()[source]

Close the interface and destroy the pool.

extract_name(text)[source]

Extracts the name of the page (the last field) from a DBpedia URL.

Parameters:text – The DBpedia URL.
Returns:The name of the page.
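A one-line sketch of taking the last path field of a DBpedia resource URL. It is an illustration only; whether the real method also strips trailing slashes or decodes percent-escapes (for example with urllib.parse.unquote) is not documented.

```python
def extract_name(text):
    """Return the last '/'-separated field of a URL, ignoring a trailing slash (sketch)."""
    return text.rstrip("/").rsplit("/", 1)[-1]


name = extract_name("http://dbpedia.org/resource/New_York_City")
# name == "New_York_City"
```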

req(name)[source]

Request the information from the DBpedia page with the given name.

Parameters:name – The name of the DBPedia page.
Returns:The JSON result of the page.
exception twied.multiind.interfaces.webinterfaces.DBPSpotlightException(value)[source]

Exception for when there is an issue with DBpedia Spotlight.

class twied.multiind.interfaces.webinterfaces.DBPSpotlightInterface(config)[source]

Interface to access the DBpedia spotlight API.

Initialise the DBpedia Spotlight interface.

Parameters:config – The configparser object.
destroy()[source]

Close the interface and destroy the pool.

req(text, delay=0.5)[source]

Request the result for some text from DBpedia Spotlight.

Parameters:
  • text – The text to request information for.
  • delay – The number of seconds to wait before retrying if no result is returned. The delay doubles after each failure, up to a maximum of 30 seconds, after which an exception is thrown.
Returns:

The JSON result from the service.
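The retry behaviour described for delay is a form of exponential backoff. The sketch below reads it as: wait delay seconds after each empty result, double the wait each time, and raise once the next wait would exceed 30 seconds. That reading, the fetch callable and ServiceException (standing in for DBPSpotlightException) are assumptions; sleep is injectable so the example runs instantly.

```python
import time


class ServiceException(Exception):
    """Hypothetical stand-in for DBPSpotlightException."""


def req_with_backoff(fetch, text, delay=0.5, max_delay=30.0, sleep=time.sleep):
    """Call fetch(text) until it returns a truthy result, doubling the wait after each miss."""
    while True:
        result = fetch(text)
        if result:
            return result
        if delay > max_delay:
            # The next wait would exceed the cap, so give up.
            raise ServiceException(f"no result for {text!r}")
        sleep(delay)
        delay *= 2


# Usage: a fake service that answers on the third call; sleeps are recorded, not performed.
slept = []
responses = iter([None, None, {"result": "ok"}])
result = req_with_backoff(lambda t: next(responses), "Paris is lovely", sleep=slept.append)
# result == {"result": "ok"}, slept == [0.5, 1.0]
```

Under this reading, a service that never answers is retried after waits of 0.5, 1, 2, 4, 8 and 16 seconds before the exception is raised.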

exception twied.multiind.interfaces.webinterfaces.GeonamesDecodeException(value)[source]

Exception for when Geonames returns a result which cannot be decoded.

class twied.multiind.interfaces.webinterfaces.GeonamesInterface(config)[source]

Interface for access to the Geonames API.

Initialise the Geonames interface.

Parameters:config – The configparser object.
destroy()[source]

Close the interface and destroy the pool.

req(query)[source]

Request the result from the API.

Parameters:query – The string to search.
Returns:The JSON result from the API.
class twied.multiind.interfaces.webinterfaces.GisgraphyInterface(config)[source]

Interface for access to the Gisgraphy API.

Initialise the Gisgraphy interface.

Parameters:config – The configparser object.
destroy()[source]

Close the interface and destroy the pool.

req(query)[source]

Request the result from the API.

Parameters:query – The string to search.
Returns:The JSON result from the API.
twied.multiind.interfaces.webinterfaces.filter_emoji(text)[source]

Filters out emoji characters in a string.

Parameters:text – The string to filter.
Returns:The text string without the emoji characters.
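A minimal sketch that strips characters in the main emoji code-point ranges with a regular expression. The chosen ranges are an assumption and deliberately coarse: they also remove non-emoji symbols and dingbats (such as ✓) while leaving accented letters intact.

```python
import re

# Coarse, assumed emoji ranges; extend as needed.
EMOJI_RE = re.compile(
    "["
    "\U0001F300-\U0001FAFF"  # main emoji blocks (pictographs, emoticons, transport, ...)
    "\U0001F1E6-\U0001F1FF"  # regional indicator letters used for flags
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "\uFE0F"                 # variation selector-16 (emoji presentation)
    "]+"
)


def filter_emoji(text):
    """Return `text` with characters in the ranges above removed."""
    return EMOJI_RE.sub("", text)


cleaned = filter_emoji("caf\u00e9 \U0001F600 time \u2600\uFE0F")
# cleaned == "café  time "
```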
twied.multiind.interfaces.webinterfaces.req_using_pool(pool, page, data)[source]

Performs a GET request on a thread pool to a given page with the given GET data.

Parameters:
  • pool – The pool to perform the request on.
  • page – The page to contact.
  • data – The GET data for the request.
Returns:

The result of the request.