Let’s say that I want to see all the latest posts that contain
the word “auburn”. I can paste the following URL into any web browser and
Twitter will send me the last ten tweets containing the word “auburn.”
http://search.twitter.com/search.json?q=auburn&result_type=recent
The response is shown further below. Any HTML in the response can be stripped using RapidMiner or other software packages.
The last part of the URL (“&result_type=recent”)
returns the most recent posts. If you remove it, you will get a mix of the
most recent posts and some of the most popular tweets.
By default this will only give you the last 10 posts that
Twitter finds containing “auburn.” If you want to increase the number of posts
that are loaded, you can add a parameter to the end of the URL. To load the
most recent 100 posts, add “&rpp=100”, where rpp stands for results per page
and 100 is the number of posts we wish to see.
http://search.twitter.com/search.json?q=auburn&result_type=recent&rpp=100
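If you would rather build these URLs in a script than type them by hand, here is a minimal Python sketch. The function name build_search_url is my own invention for illustration; the endpoint and the q, result_type, and rpp parameters are the ones described in this post. It only assembles the URL string and does not contact Twitter.

```python
from urllib.parse import urlencode

# Endpoint used throughout this post.
BASE_URL = "http://search.twitter.com/search.json"


def build_search_url(query, result_type="recent", rpp=None):
    """Assemble a search URL (hypothetical helper, not part of any Twitter library)."""
    params = {"q": query}
    if result_type is not None:
        # Leave result_type out to get the mix of recent and popular tweets.
        params["result_type"] = result_type
    if rpp is not None:
        # rpp = results per page.
        params["rpp"] = rpp
    return BASE_URL + "?" + urlencode(params)


print(build_search_url("auburn", rpp=100))
```

Passing result_type=None drops that parameter, which corresponds to removing “&result_type=recent” from the URL.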
If you want to search for multiple terms at a time, you can do that in two different ways. First, you can put all the terms together in quotation marks (example: search for "auburn tigers" by using q="auburn tigers"). The browser will automatically transform this into the second form: q=%22auburn%20tigers%22. The second form should be used when you are going to add further search parameters after the query text.
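The conversion from the quoted form to the percent-encoded form can also be done in Python. This is a small sketch using the standard library's urllib.parse.quote; the helper name encode_phrase is hypothetical. I use quote rather than quote_plus because quote writes a space as %20, which matches the form shown in this post.

```python
from urllib.parse import quote


def encode_phrase(phrase):
    """Wrap a multi-word phrase in double quotes and percent-encode it."""
    return quote('"' + phrase + '"')


encoded = encode_phrase("auburn tigers")
print(encoded)

# Further parameters can then be appended safely after the query text.
url = "http://search.twitter.com/search.json?q=" + encoded + "&result_type=recent"
print(url)
```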
The data that is returned contains a lot of information that
might not be useful for your project, but some of it is interesting. The following is the result for the
single most recent Twitter post containing “auburn”.
{"completed_in":0.056,"max_id":319540687655825409,"max_id_str":"319540687655825409","next_page":"?page=2&max_id=319540687655825409&q=auburn&rpp=1","page":1,"query":"auburn","refresh_url":"?since_id=319540687655825409&q=auburn","results":[{"created_at":"Wed,
03 Apr 2013 20:03:31 +0000","from_user":"M0******","from_user_id":21097****,"from_user_id_str":"21097****","from_user_name":"M*******M******","geo":null,"id":3195406876********,"id_str":"3195406876*********","iso_language_code":"en","metadata":{"result_type":"recent"},"profile_image_url":"http:\/\/a0.twimg.com\/profile_images\/1790229258\/image_normal.jpg","profile_image_url_https":"https:\/\/si0.twimg.com\/profile_images\/1790229258\/image_normal.jpg","source":"<a
href="http:\/\/twitter.com\/download\/iphone">Twitter
for iPhone<\/a>","text":"RT @AuburnUPC: We
are excited to announce the Auburn Airwaves concert line-up! We are presenting Train, Hot Chelle Rae, and
Green River
Ordinance."}],"results_per_page":1,"since_id":0,"since_id_str":"0"}
From this response we can see:
- When the post was created: Wed, 03 Apr 2013 20:03:31 +0000
- Who the user was: "from_user":"M0******", "from_user_id":21097****, "from_user_id_str":"21097****", "from_user_name":"M*******M******" (all partially redacted for this post)
- Whether the tweet was geotagged. In this case she doesn't geotag her tweets, but if she did you would see where she posted from: "geo":null
- The unique ID number of the tweet: 319540687********* (partially redacted for this post)
- The URLs of the user's profile picture: http://a0.twimg.com/profile_images/1790229258/image_normal.jpg and https://si0.twimg.com/profile_images/1790229258/image_normal.jpg
- That the post was submitted using the user's iPhone: "Twitter for iPhone"
- The text of the post: "RT @AuburnUPC: We are excited to announce the Auburn Airwaves concert line-up! We are presenting Train, Hot Chelle Rae, and Green River Ordinance."
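The fields listed above can also be pulled out with a few lines of Python instead of reading the raw JSON by eye. The sketch below is only an illustration: the sample is a trimmed stand-in for the response shown earlier, the user name and ID in it are placeholders I made up because the real ones are redacted, and the summarize function is a hypothetical helper. The HTML in the source field is stripped with a simple regular expression, which is enough for a single link tag like this one.

```python
import json
import re

# Trimmed stand-in for the response above. Values marked "placeholder"
# are made up, because the post redacts the real ones.
sample = json.dumps({
    "results": [{
        "created_at": "Wed, 03 Apr 2013 20:03:31 +0000",
        "from_user": "example_user",  # placeholder
        "geo": None,
        "id_str": "0",  # placeholder
        "source": '<a href="http://twitter.com/download/iphone">Twitter for iPhone</a>',
        "text": "RT @AuburnUPC: We are excited to announce the Auburn Airwaves "
                "concert line-up! We are presenting Train, Hot Chelle Rae, "
                "and Green River Ordinance.",
    }]
})


def summarize(raw_json):
    """Return one small dict per tweet with the fields discussed in the post."""
    data = json.loads(raw_json)
    rows = []
    for tweet in data.get("results", []):
        rows.append({
            "created_at": tweet.get("created_at"),
            "user": tweet.get("from_user"),
            "geo": tweet.get("geo"),
            # Drop the HTML link tag and keep only the client name.
            "client": re.sub(r"<[^>]+>", "", tweet.get("source") or ""),
            "text": tweet.get("text"),
        })
    return rows


for row in summarize(sample):
    print(row)
```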
Final Thoughts: I don't use Twitter, but for those of you who do, please know that this data is retrievable without your permission by anybody with an internet connection. I chose to redact the private information of the Twitter user whose tweet I used as an example, but anyone with a web browser could have gotten it. We can use this for big data because Twitter makes it available so that developers can access the data free of charge. So Twitter users beware!!
There are many more search operators that can be used to record tweets and narrow the search. They can be found here: https://dev.twitter.com/docs/using-search
Sources:
This post contains information from the website: http://nealcaren.web.unc.edu/pizza-twitter-and-apis/.
It is a great resource for information on Big Data as it relates to sociology. It is
written by a professor from UNC Chapel Hill who uses social media to conduct research,
and his website has a large number of tutorials on Python and APIs.