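Querying MongoDB with a Map-Reduce function

I have streamed about 250k tweets and saved them to MongoDB. Here I am retrieving them, as you can see, based on a word or keyword that appears in the tweet.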

    Mongo mongo = new Mongo("localhost", 27017);
    DB db = mongo.getDB("TwitterData");
    DBCollection collection = db.getCollection("publicTweets");
    // Return only the tweet text and leave out the _id field.
    BasicDBObject fields = new BasicDBObject().append("tweet", 1).append("_id", 0);
    // Match every tweet whose text contains "autobiography".
    BasicDBObject query = new BasicDBObject("tweet", new BasicDBObject("$regex", "autobiography"));
    DBCursor cur = collection.find(query, fields);

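What I want to do is use Map-Reduce: categorize the tweets by keyword in the map function, then pass them to the reduce function to count the number of tweets in each category, somewhat like what you can see here. In that example he is counting pages, since that is a simple number. I want to do something like the following:

    "if (this.tweet.contains("kword1")) " +
    "category = 'kword1 tweets'; " +
    "else if (this.tweet.contains("kword2")) " +
    "category = 'kword2 tweets';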

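and then use the reduce function to get the counts, just as in the example program.

I know the syntax is incorrect, but that is roughly what I want to do. Is there any way to achieve it? Thanks!

PS: Oh, and I am coding in Java, so Java syntax would be highly appreciated. Thanks!

The posted code produces the following output: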

    { "tweet" : "An autobiography is a book that reveals nothing bad about its writer except his memory."}
    { "tweet" : "I refuse to read anything that's not real the only thing I've read since biff books is Jordan's autobiography #lol"}
    { "tweet" : "well we've had the 2012 publication of Ashley's Good Books, I predict 2013 will be seeing an autobiography ;)"}

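Of course, these are all tweets containing the word "autobiography". What I want is to do this inside the map function: categorize each such tweet as an "autobiography tweet" (and likewise for the other keywords), then send it to the reduce function to count everything and return the number of tweets containing each word.

Something like: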

    {"_id" : "Autobiography Tweets" , "value" : { "publicTweets" : 3.0}}
    {"_id" : "Biography Tweets" , "value" : { "publicTweets" : 15.0}}

You might want to try the following:

    String map = "function() { " +
        " var regex1 = new RegExp('autobiography', 'i'); " +
        " var regex2 = new RegExp('book', 'i'); " +
        " if (regex1.test(this.tweet) ) " +
        " emit('Autobiography Tweet', 1); " +
        " else if (regex2.test(this.tweet) ) " +
        " emit('Book Tweet', 1); " +
        " else " +
        " emit('Uncategorized Tweet', 1); " +
        "}";

    String reduce = "function(key, values) { " +
        " return Array.sum(values); " +
        "}";

    MapReduceCommand cmd = new MapReduceCommand(collection, map, reduce,
        null, MapReduceCommand.OutputType.INLINE, null);
    MapReduceOutput out = collection.mapReduce(cmd);

    try {
        for (DBObject o : out.results()) {
            System.out.println(o.toString());
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
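If you want to check the categorization logic without a running MongoDB instance, the map and reduce steps above can be mirrored in plain Java. This is only a sketch: the class name `TweetCategoryCount`, its method names, and the sample tweets are made up for illustration and are not part of the MongoDB driver.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper that imitates the map/reduce pair locally.
public class TweetCategoryCount {
    // Same keywords as the JavaScript map function, matched case-insensitively.
    private static final Pattern AUTOBIOGRAPHY =
            Pattern.compile("autobiography", Pattern.CASE_INSENSITIVE);
    private static final Pattern BOOK =
            Pattern.compile("book", Pattern.CASE_INSENSITIVE);

    // Mirrors the map step: the first matching keyword decides the category.
    static String categorize(String tweet) {
        if (AUTOBIOGRAPHY.matcher(tweet).find()) {
            return "Autobiography Tweet";
        } else if (BOOK.matcher(tweet).find()) {
            return "Book Tweet";
        }
        return "Uncategorized Tweet";
    }

    // Mirrors emit(category, 1) followed by a reduce that sums the values.
    static Map<String, Integer> countByCategory(List<String> tweets) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String tweet : tweets) {
            counts.merge(categorize(tweet), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample data, not taken from the real collection.
        List<String> sample = Arrays.asList(
                "An autobiography is a book about its writer.",
                "Reading a good BOOK tonight",
                "Nothing relevant here");
        System.out.println(countByCategory(sample));
    }
}
```

Note that because of the `else if` chain, a tweet containing both "autobiography" and "book" is counted only under the first category, which is also how the map function above behaves.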

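Although you have already accepted Kay's answer and this one may well be overlooked, I would like to suggest an alternative solution.

The MongoDB documentation has an article on how to do full-text search in Mongo. To quickly search a text field for individual words, they recommend preparing the documents: split the text field into an array of individual words, store that array in the document alongside the full text, and create an index on the array.

After that you can find all documents containing a specific word very quickly, because your search query can (1) use an index and (2) avoid regular expressions, which can be very expensive.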
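The preparation step described above could look roughly like this in Java. This is a minimal sketch under assumptions: the class `TweetWords`, the method `tokenize`, and the idea of storing the result in a field called `words` are illustrative names I picked, not something prescribed by the MongoDB article.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: turns a tweet into the word array to store next to it.
public class TweetWords {
    // Lower-cases the text, splits on anything that is not a letter, digit,
    // or apostrophe, and keeps each distinct word once, in order of appearance.
    static List<String> tokenize(String text) {
        Set<String> words = new LinkedHashSet<>();
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}']+");
        for (String token : tokens) {
            if (!token.isEmpty()) { // a leading delimiter yields an empty first token
                words.add(token);
            }
        }
        return new ArrayList<>(words);
    }

    public static void main(String[] args) {
        // The resulting list would be saved in the assumed "words" field
        // of the tweet document, and that field would then be indexed.
        System.out.println(tokenize("An autobiography is a book. A BOOK!"));
    }
}
```

Once each document carries such an array and the array field is indexed, an equality query on that field for a single word (for example "autobiography") should be able to use the index. Be aware that this matches whole words only, so unlike the regex it would not find "autobiographies" when you search for "autobiography".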
